Reading the last n lines from a file without using lots of memory


philippeback
I wonder how you guys would read the last n lines from a file in Pharo without reading through the whole thing.

Is there code doing just that somewhere?

The code I have is a shell script doing a 'tac file | tail -200 > /temp/something'

I can always do that through OSProcess but wondered if there was something available.

Reading through the whole file with a 200-entry FIFO circular buffer seems a bit silly.

TIA
Phil
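For reference, the bounded-buffer approach Phil mentions can be sketched as follows. It still reads the whole file once, but never keeps more than n lines in memory. This is a sketch under assumptions: lastLinesOf is a hypothetical name, lines are taken to be separated by line feeds, n is at least 1, and the input is any character read stream that understands #atEnd and #upTo:.

```smalltalk
"Hypothetical helper held in a block: answer the last n lines (n >= 1)
of a character stream, keeping at most n lines in memory at any time."
| lastLinesOf |
lastLinesOf := [ :stream :n |
	| ring count |
	ring := Array new: n.
	count := 0.
	[ stream atEnd ] whileFalse: [
		count := count + 1.
		"Line number count goes into slot ((count - 1) \\ n) + 1, overwriting the oldest line"
		ring at: (count - 1) \\ n + 1 put: (stream upTo: Character lf) ].
	"Unroll the ring, oldest retained line first"
	((count - n + 1 max: 1) to: count) collect: [ :k | ring at: (k - 1) \\ n + 1 ] ].
lastLinesOf value: (ReadStream on: 'a', (String with: Character lf), 'b') value: 1.
```

Using an Array as a ring avoids shifting elements on every line; an OrderedCollection with #addLast: and #removeFirst would be an equally valid FIFO.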



Re: Reading the last n lines from a file without using lots of memory

Guillermo Polito
If you have a MultiByteFileStream you can do

stream position: (stream size - 200 max: 0).
stream next: 200.

Have a look at RemoteString, the class used to read method source code from the .sources and .changes files without loading those files into memory.



Re: Reading the last n lines from a file without using lots of memory

philippeback
Thanks.

But this would give me the last 200 characters; I am interested in the last 200 lines.

Now, I did this: 

command := 'tac ', file fullName, ' | head -200'.
^ (PipeableOSProcess command: command) output.

which did the trick but isn't portable at all (Linux here).

I had a look at RemoteString as well.

Phil



Re: Reading the last n lines from a file without using lots of memory

Sven Van Caekenberghe-2
http://stackoverflow.com/questions/10164597/how-would-you-implement-tail-efficiently
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c

I would read backwards from the end of the file into a growing buffer.
It would be great fun doing that in Pharo.
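Following Sven's suggestion, here is a sketch of the backwards approach. Instead of a growing buffer it reads fixed-size 4 KB chunks from the end of the stream towards the start, counts line-feed bytes, and stops once the n-th one is found, so only the tail of the file is touched. Assumptions: tailOf is a hypothetical name; n is at least 1; the stream is binary and positionable (understands #size, #position: and #next:); lines end with line feeds (byte 10), so a CR-only file such as an old .changes file would need 13 instead; and the image provides ByteArray>>utf8Decoded (older images would need a different decoder).

```smalltalk
"Hypothetical helper held in a block: answer the text of the last n lines (n >= 1)
of a binary positionable stream by scanning chunks backwards for line feeds."
| tailOf |
tailOf := [ :stream :n |
	| chunkSize limit pos newlines start |
	chunkSize := 4096.
	limit := stream size.
	"A final line feed ends the last line; do not count it as a separator"
	limit > 0 ifTrue: [
		stream position: limit - 1.
		stream next = 10 ifTrue: [ limit := limit - 1 ] ].
	pos := limit.
	newlines := 0.
	start := nil.
	[ start isNil and: [ pos > 0 ] ] whileTrue: [
		| chunkStart chunk |
		chunkStart := pos - chunkSize max: 0.
		stream position: chunkStart.
		chunk := stream next: pos - chunkStart.
		"Walk this chunk from its last byte to its first"
		chunk size to: 1 by: -1 do: [ :i |
			(start isNil and: [ (chunk at: i) = 10 ]) ifTrue: [
				newlines := newlines + 1.
				"The wanted text begins just after the n-th line feed from the end"
				newlines = n ifTrue: [ start := chunkStart + i ] ] ].
		pos := chunkStart ].
	"Fewer than n lines in total: answer everything"
	start ifNil: [ start := 0 ].
	stream position: start.
	(stream next: stream size - start) utf8Decoded ].
```

Because a line-feed byte never occurs inside a multi-byte UTF-8 sequence, counting raw bytes is safe for UTF-8 text, and the final read always starts on a character boundary. For a real file one would call it from something like aFileReference binaryReadStreamDo: [ :stream | (tailOf value: stream value: 200) lines ], assuming the file stream supports the selectors listed above.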


Re: Reading the last n lines from a file without using lots of memory

David T. Lewis
In reply to this post by philippeback
On Fri, Jun 27, 2014 at 06:50:11PM +0200, [hidden email] wrote:

> Thanks.
>
> But this would give me the 200 last chars. I am interested in the 200 last
> lines.
>
> Now, I did this:
>
> command := 'tac ', file fullName, ' | head -200'.
> ^ (PipeableOSProcess command: command) output.
>
> which did the trick but isn't portable at all (Linux here).

<OT>
Don't forget to close the pipes on that PipeableOSProcess when you are done using it :-)
</OT>

Dave


Re: Reading the last n lines from a file without using lots of memory

philippeback


On 28 Jun 2014 01:18, "David T. Lewis" <[hidden email]> wrote:
> <OT>
> Don't forget to close the pipes on that PipeableOSProcess when you are done using it :-)
> </OT>

Isn't output closing them? Command is a complete string here. 

Phil

Re: Reading the last n lines from a file without using lots of memory

David T. Lewis
On Sat, Jun 28, 2014 at 08:19:41AM +0200, [hidden email] wrote:

> On 28 Jun 2014 01:18, "David T. Lewis" <[hidden email]> wrote:
> > <OT>
> > Don't forget to close the pipes on that PipeableOSProcess when you are done using it :-)
> > </OT>
>
> Isn't output closing them? Command is a complete string here.
>

No, one pipe handle will still be open. Use #closePipes to close it.

PipeableOSProcess is designed to participate in a "pipeline" of commands,
and in that environment each element of the pipeline is responsible for
closing the pipe from its predecessor.

Dave
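Applying Dave's advice to Phil's command, here is a sketch that always releases the remaining pipe handle, even if reading the output fails. It reuses Phil's Linux-only command, assumes file is a FileReference already in scope, and relies only on PipeableOSProcess class>>command:, #output, and the #closePipes selector Dave names above.

```smalltalk
"Sketch: run the command, read its output, and always release the pipe handle.
file is assumed to be a FileReference in scope; tac and head make this Linux-only."
| process result |
process := PipeableOSProcess command: 'tac ', file fullName, ' | head -200'.
[ result := process output ] ensure: [ process closePipes ].
result
```

Wrapping the read in #ensure: means the handle is closed on both the normal and the error path, which matters in a long-running image that may call this repeatedly.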
 


Re: Reading the last n lines from a file without using lots of memory

Denis Kudriashov
In reply to this post by Sven Van Caekenberghe-2
It should be trivial with XStream



Re: Reading the last n lines from a file without using lots of memory

Sven Van Caekenberghe-2

On 29 Jun 2014, at 16:18, Denis Kudriashov <[hidden email]> wrote:

> It should be trivial with XStream

I would love to see that code.

We have Xtreams building happily

  https://ci.inria.fr/pharo-contribution/job/Xtreams/

it is waiting in the trenches, waiting to be used...


Re: Reading the last n lines from a file without using lots of memory

philippeback


On 29 Jun 2014 16:42, "Sven Van Caekenberghe" <[hidden email]> wrote:
>
>
> On 29 Jun 2014, at 16:18, Denis Kudriashov <[hidden email]> wrote:
>
> > It should be trivial with XStream
>
> I would love to see that code.
>
> We have Xtreams building happily
>
>   https://ci.inria.fr/pharo-contribution/job/Xtreams/
>
> it is waiting in the trenches, waiting to be used...
>

Xtreams looks like a very powerful capability.

Now how would one use it for this case?

Re: Reading the last n lines from a file without using lots of memory

Denis Kudriashov
I have now looked at the Xtreams image. I don't actually use Xtreams, but I looked at it some time ago. Here is how I solved your task:

s:= FileLocator changes reading .
reversed := [ :out | s -= 1. [s position=0] whileFalse: [ out put: s peek. s -- 1 ] ] reading.
lines := (reversed ending: Character cr asInteger) slicing.
lastLines := (lines limiting: 50) collect: [ :eachReversedLine |
	(eachReversedLine rest reversed reading encoding: #utf8) rest ].
lastLines reversed

And now I have some questions for the stream maintainers:
1) Can we add a #reversing transformation? (Maybe it already exists?)
2) Why does "slicing collecting" not work, so that "slicing collect:" has to be used instead? The latter does not produce a stream; it processes the whole source stream with the collect block. I know it is possible to build another reading block for "slicing collecting", but what is the reason it does not work out of the box? (It was not clear to me after reading the Google docs.)

Best regards,
Denis

