Faster FileStream experiments


Re: Faster FileStream experiments

Colin Putney

On 27-Nov-09, at 8:13 AM, Randal L. Schwartz wrote:

>>>>>> "Colin" == Colin Putney <[hidden email]> writes:
>
> Colin> ...and code, perhaps? I did a bit of poking around, but  
> couldn't find
> Colin> anything on the web that said what the license actually is.  
> Can you be
> Colin> more specific than "liberal?"
>
> MLS made it clear at the meeting that Cincom's default release model  
> is now
> "open source" except for things that are business differentiating,  
> and in
> fact, in particular, they would really like to see Xtreams adopted  
> widely, so
> the license would have to be MIT-like for that to happen.
>
> I'm sure if we poked Arden or James Robertson we could get a  
> statement of
> license for Xtreams available rather quickly.

I'm not going to hold my breath on that one. When Vassili wrote  
Announcements, I tried to get Cincom to attach an open source license  
to it. They loved the idea, wanted Announcements to be adopted widely,  
etc. Very positive, but never actually did it. Eventually, I wrote a  
new implementation from scratch in less time than I had already wasted  
dealing with Cincom.

This was a few years ago, and maybe things have changed at Cincom, but  
given that they haven't actually attached a license yet, I'd be very  
surprised if the shortest path to Xtreams-like functionality in Squeak  
involved the Cincom code.

Colin



Re: Faster FileStream experiments

Diego Gomez Deck
In reply to this post by Nicolas Cellier
On Fri, 27-11-2009 at 15:22 +0100, Nicolas Cellier wrote:

> 2009/11/27 Diego Gomez Deck <[hidden email]>:
> > On Fri, 27-11-2009 at 06:15 -0600, Ralph Johnson wrote:
> >> > I think we need a common superclass for Streams and Collection named
> >> > Iterable where #do: is abstract and #select:, #collect:, #reject:,
> >> > #count:, #detect:, etc (and quite a lot of the messages in enumerating
> >> > category of Collection) are implemented based on #do:
> >> >
> >> > Of course Stream can refine the #select:/#reject methods to answer a
> >> > FilteredStream that decorates the receiver and apply the filtering on
> >> > the fly.  In the same way #collect: can return a TransformedStream that
> >> > decorates the receiver, etc.
> >>
> >> Since Stream can't reuse #select: and #collect: (or #count, and
> >> #detect: on an infinite stream is risky),
> >
> > Stream and Collection are just the 2 refinements of Iterable that we're
> > talking about in this thread, but there are a lot of classes that can
> > benefit from Iterable as a super-class.
> >
> > On the other side, Stream has #do: (and #atEnd/#next pair) and it's also
> > risky for infinite streams. To push this discussion forward, Is
> > InfiniteStream a real Stream?
> >
> >> they shouldn't be in the
> >> superclass. In that case, what is its purpose?
> >>
> >> i think it is fine to give Stream the same interface as Collection.  I
> >> do this, too.  But they will share very little code, and so there is
> >> no need to give them a common superclass.
> >>
> >> -Ralph Johnson
> >
> > Cheers,
> >
> > -- Diego
> >
>
> #select: and #collect: are not necessarily dangerous even on infinite
> stream once you see them as filters and implement them with a lazy
> block evaluation : Stream select: aBlock should return a SelectStream
> (find a better name here :).
> Then you would use it with #next, as any other InfiniteStream.

Sure, it was my point... The only risk with InfiniteStreams is #do:

My proposal is to create an Iterable class, with default implementations of
#select:, #collect:, etc. all based on #do: (just like Collection
implements #size based on #do:, but most collections override it
with a faster version).  These defaults are, at the same time, naive
implementations and documentation of the expected behaviour, all written
in terms of #do:.
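
For illustration, a minimal sketch of such naive defaults written purely in terms of #do: (the Iterable class and this particular formulation are hypothetical, part of the proposal rather than existing code):

Iterable >> select: aBlock
    "Naive default: answer a collection of the elements for which aBlock
    evaluates to true, using nothing but #do: on the receiver."
    | selected |
    selected := OrderedCollection new.
    self do: [:each | (aBlock value: each) ifTrue: [selected add: each]].
    ^selected

Iterable >> count: aBlock
    "Naive default: answer how many elements satisfy aBlock, again using only #do:."
    | tally |
    tally := 0.
    self do: [:each | (aBlock value: each) ifTrue: [tally := tally + 1]].
    ^tally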

Stream implements #select:, #collect: (and messages of that kind) by
answering an IterableDecorator that performs the selection/collection/etc.
in a lazy way.
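
A rough sketch of what such a lazy decorator's #next might look like (the class name, inst vars, and the nil-at-end convention are assumptions for illustration, not an existing implementation):

SelectStream >> next
    "Lazily pull elements from the decorated source stream until one
    satisfies filterBlock; answer nil once the source is exhausted."
    [source atEnd] whileFalse:
        [| element |
        element := source next.
        (filterBlock value: element) ifTrue: [^element]].
    ^nil

Used as, say, (aReadStream select: [:c | c isVowel]) next, the filtering happens on the fly, without building an intermediate collection.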

There are also some other useful decorators to implement, like
IterableComposite (a union of several iterables that can be handled
like one).

The FilterIterator/CollectorIterator can also be used to select/collect
lazily on collections.

For InfiniteStreams (Random, Fibonacci numbers, etc.) I propose creating
a kind of "Generator" that is "less" than a Stream and less than an
Iterator (they have no concept of #atEnd, #do: doesn't make sense,
etc.).  Anyway, I'm not sure how many InfiniteStreams we have in current
Squeak. I remember Random was a Stream in Smalltalk-80, but I'm not sure
of the current state in Squeak.

Cheers,

-- Diego




Re: Faster FileStream experiments

Eliot Miranda-2
In reply to this post by Nicolas Cellier
Hi Nicholas,

    here are my timings from Cog.  Only the ratios correspond since the source file is of a different size, my machine is different, and Cog runs at very different speeds to the interpreter.  With that in mind...

t1 is nextLine over the sources file via StandardFileStream
t2 is nextLine over the sources file via BufferedFileStream
t3 is next over the sources file via StandardFileStream
t4 is next over the sources file via BufferedFileStream

Cog: an OrderedCollection(11101 836 9626 2306)

Normalizing to the first measurement: 1.0 0.075 0.867 0.208

Your ratios are 1.0 0.206 4.827 0.678

I'd say BufferedFileStream is waaaaay faster :)



P.S. your timing doit revealed a bug in Cog which is why it has taken a while to respond with the results :)  The doit's temp names are encoded and appended to the method as extra bytes.  The JIT wasn't ignoring these extra bytes, and your doit just happened to cause the JIT to follow a null pointer mistakenly scanning these extra bytes.  So thank you :)


On Wed, Nov 18, 2009 at 3:10 AM, Nicolas Cellier <[hidden email]> wrote:
I just gave a try to the BufferedFileStream.
As usual, code is MIT.
Implementation is rough, readOnly, partial (no support for basicNext
crap & al), untested (certainly has bugs).
Early timing experiments have shown a 5x to 7x speed up on [stream
nextLine] and [stream next] micro-benchmarks.
See the class comment of the attachment.

Reminder: This bench is versus StandardFileStream.
StandardFileStream is the "fast" version; CrLf and MultiByte are far worse!
This still leaves some more room...

Integrating and testing a read/write version is a lot harder than this
experiment, but we should really do it.

Nicolas







Re: Faster FileStream experiments

Nicolas Cellier
2009/11/27 Eliot Miranda <[hidden email]>:

> Hi Nicholas,
>     here are my timings from Cog.  Only the ratios correspond since the
> source file is of a different size, my machine is different, and Cog runs at
> very different speeds to the interpreter.  With that in mind...
> t1 is nextLine over the sources file via StandardFileStream
> t2 is nextLine over the sources file via BufferedFileStream
> t3 is next over the sources file via StandardFileStream
> t4 is next over the sources file via BufferedFileStream
> Cog: an OrderedCollection(11101 836 9626 2306)
> Normalizing to the first measurement: 1.0 0.075 0.867 0.208
> Your ratios are 1.0 0.206 4.827 0.678
>
> I'd say BufferedFileStream is waaaaay faster :)
>

Impressive.
I presume every Smalltalk message is accelerated while primitive calls
remain expensive...

>
> P.S. your timing doit revealed a bug in Cog which is why it has taken a
> while to respond with the results :)  The doit's temp names are encoded and
> appended to the method as extra bytes.  The JIT wasn't ignoring these extra
> bytes, and your doit just happened to cause the JIT to follow a null pointer
> mistakenly scanning these extra bytes.  So thank you :)
>

Oh, you discovered my secret for finding bugs: (bad) luck

Nicolas

> On Wed, Nov 18, 2009 at 3:10 AM, Nicolas Cellier
> <[hidden email]> wrote:
>>
>> I just gave a try to the BufferedFileStream.
>> As usual, code is MIT.
>> Implementation is rough, readOnly, partial (no support for basicNext
>> crap & al), untested (certainly has bugs).
>> Early timing experiments have shown a 5x to 7x speed up on [stream
>> nextLine] and [stream next] micro benchmarks
>> See class comment of attachment
>>
>> Reminder: This bench is versus StandardFileStream.
>> StandardFileStream is the "fast" version, CrLf anf MultiByte are far
>> worse!
>> This still let some more room...
>>
>> Integrating and testing a read/write version is a lot harder than this
>> experiment, but we should really do it.
>>
>> Nicolas
>>
>>
>>
>
>
>
>
>


Re: Faster FileStream experiments

Eliot Miranda-2


On Fri, Nov 27, 2009 at 1:49 PM, Nicolas Cellier <[hidden email]> wrote:
2009/11/27 Eliot Miranda <[hidden email]>:
> Hi Nicholas,
>     here are my timings from Cog.  Only the ratios correspond since the
> source file is of a different size, my machine is different, and Cog runs at
> very different speeds to the interpreter.  With that in mind...
> t1 is nextLine over the sources file via StandardFileStream
> t2 is nextLine over the sources file via BufferedFileStream
> t3 is next over the sources file via StandardFileStream
> t4 is next over the sources file via BufferedFileStream
> Cog: an OrderedCollection(11101 836 9626 2306)
> Normalizing to the first measurement: 1.0 0.075 0.867 0.208
> Your ratios are 1.0 0.206 4.827 0.678
>
> I'd say BufferedFileStream is waaaaay faster :)
>

Impressive.
I presume every Smalltalk message is accelerated while primitive calls
remain expensive...

Exactly.  Or rather, the primitives which aren't implemented in machine code are even slower to invoke from machine code than in the interpreter.  Machine code primitives exist for SmallInteger + - / * // \\ % > >= < <= = ~=, for Float + - * / > >= < <= = ~=, for Object == at: ByteString at: and for BlockClosure value[:value:value:value:].  Once I reimplement the object representation I'll be happy to implement Object>>at:put: ByteString>>at:put: Behavior>>basicNew & Behavior>>basicNew: which should result in another significant step in performance.


>
> P.S. your timing doit revealed a bug in Cog which is why it has taken a
> while to respond with the results :)  The doit's temp names are encoded and
> appended to the method as extra bytes.  The JIT wasn't ignoring these extra
> bytes, and your doit just happened to cause the JIT to follow a null pointer
> mistakenly scanning these extra bytes.  So thank you :)
>

Oh, you discovered my secret for finding bugs: (bad) luck

:) :)
 

Nicolas

> On Wed, Nov 18, 2009 at 3:10 AM, Nicolas Cellier
> <[hidden email]> wrote:
>>
>> I just gave a try to the BufferedFileStream.
>> As usual, code is MIT.
>> Implementation is rough, readOnly, partial (no support for basicNext
>> crap & al), untested (certainly has bugs).
>> Early timing experiments have shown a 5x to 7x speed up on [stream
>> nextLine] and [stream next] micro benchmarks
>> See class comment of attachment
>>
>> Reminder: This bench is versus StandardFileStream.
>> StandardFileStream is the "fast" version, CrLf anf MultiByte are far
>> worse!
>> This still let some more room...
>>
>> Integrating and testing a read/write version is a lot harder than this
>> experiment, but we should really do it.
>>
>> Nicolas
>>
>>
>>
>
>
>
>
>





Re: Faster FileStream experiments

Nicolas Cellier
In reply to this post by Colin Putney
2009/11/27 Colin Putney <[hidden email]>:

>
> On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:
>
>> I implemented IOHandle for this, see http://wiki.squeak.org/squeak/996.
>> I have not maintained it since about 2003, but the idea is
>> straightforward.
>
> Yes. I looked into IOHandle when implementing Filesystem, but decided to go
> with a new (simpler, but limited) implementation that would let me explore
> the requirements for the stream architecture I had in mind.
>
>> My purpose at that time was to :
>>
>>  * Separate the representation of external IO channels from the
>> represention
>>   of streams and communication protocols.
>>  * Provide a uniform representation of IO channels similar to the unix
>> notion
>>   of treating everything as a 'file'.
>>  * Simplify future refactoring of Socket and FileStream.
>>  * Provide a place for handling asynchronous IO events. Refer to the aio
>>   handling in the unix VM. Files, Sockets, and AsyncFiles could (should)
>> use
>>   a common IO event handling mechanism (aio event signaling a Smalltalk
>> Semaphore).
>
> Indeed. Filesystem comes at this from the other direction, but I think we
> want to end up in the same place. For now I've done TSTTCPW, which is use
> the primitives from the FilePlugin. But eventually I want to improve the
> plumbing. You've done some important work here - perhaps Filesystem can use
> AioPlugin at some point.
>
> Colin
>
>

I wonder why level 3 stdio was used (FILE * fopen, fclose ...) rather
than level 2 (int fid, open, close, ...) in the file plugin... Better
portability?

Nicolas


Re: Faster FileStream experiments

Nicolas Cellier
In reply to this post by Diego Gomez Deck
2009/11/27 Diego Gomez Deck <[hidden email]>:

> On Fri, 27-11-2009 at 15:22 +0100, Nicolas Cellier wrote:
>> 2009/11/27 Diego Gomez Deck <[hidden email]>:
>> > On Fri, 27-11-2009 at 06:15 -0600, Ralph Johnson wrote:
>> >> > I think we need a common superclass for Streams and Collection named
>> >> > Iterable where #do: is abstract and #select:, #collect:, #reject:,
>> >> > #count:, #detect:, etc (and quite a lot of the messages in enumerating
>> >> > category of Collection) are implemented based on #do:
>> >> >
>> >> > Of course Stream can refine the #select:/#reject methods to answer a
>> >> > FilteredStream that decorates the receiver and apply the filtering on
>> >> > the fly.  In the same way #collect: can return a TransformedStream that
>> >> > decorates the receiver, etc.
>> >>
>> >> Since Stream can't reuse #select: and #collect: (or #count, and
>> >> #detect: on an infinite stream is risky),
>> >
>> > Stream and Collection are just the 2 refinements of Iterable that we're
>> > talking about in this thread, but there are a lot of classes that can
>> > benefit from Iterable as a super-class.
>> >
>> > On the other side, Stream has #do: (and #atEnd/#next pair) and it's also
>> > risky for infinite streams. To push this discussion forward, Is
>> > InfiniteStream a real Stream?
>> >
>> >> they shouldn't be in the
>> >> superclass. In that case, what is its purpose?
>> >>
>> >> i think it is fine to give Stream the same interface as Collection.  I
>> >> do this, too.  But they will share very little code, and so there is
>> >> no need to give them a common superclass.
>> >>
>> >> -Ralph Johnson
>> >
>> > Cheers,
>> >
>> > -- Diego
>> >
>>
>> #select: and #collect: are not necessarily dangerous even on infinite
>> stream once you see them as filters and implement them with a lazy
>> block evaluation : Stream select: aBlock should return a SelectStream
>> (find a better name here :).
>> Then you would use it with #next, as any other InfiniteStream.
>
> Sure, it was my point... The only risk with InfiniteStreams is #do:
>
> My proposal is to create Iterable class, with default implementations of
> #select:, #collect:, etc all based on #do: (Just like Collection
> implements #size based on #do: but most collections just overwrite it
> with a faster version).  This implementation is (at the same time) naive
> implementations and documentation of the expected behaviour all writren
> in terms of #do:.
>
> Stream implements #select:, #collect: (and those type of messages)
> answering a IterableDecorator that make the selection/collection/etc in
> lazy way.
>
> There are also some other useful decorators to implement: like
> IterableComposite (an union of several iterables that can be handled
> like one).
>
> The FilterIterator/CollectorIterator can also be used to select/collect
> lazyly on collections.
>
> For InfiniteStreams (Random, Fibonacci Numbers, etc) I propose to create
> a type of "Generators" that are "less" than a Stream and less than a
> Iterator (they have not concept of #atEnd, #do: doesn't make sense,
> etc).  Anyway, I'm not sure how many InfiniteStream we have in current
> Squeak. I remember Random was a Stream in Smalltalk/80, but not sure the
> current state in Squeak.
>

Oh, they could have a very simple concept:

atEnd
    ^false

do: aBlock
    [aBlock value: self next] repeat

But we might want to discourage such usage as well, indeed.

Nicolas

> Cheers,
>
> -- Diego
>
>
>
>


Re: Faster FileStream experiments

Eliot Miranda-2
In reply to this post by Nicolas Cellier


On Fri, Nov 27, 2009 at 2:24 PM, Nicolas Cellier <[hidden email]> wrote:
2009/11/27 Colin Putney <[hidden email]>:
>
> On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:
>
>> I implemented IOHandle for this, see http://wiki.squeak.org/squeak/996.
>> I have not maintained it since about 2003, but the idea is
>> straightforward.
>
> Yes. I looked into IOHandle when implementing Filesystem, but decided to go
> with a new (simpler, but limited) implementation that would let me explore
> the requirements for the stream architecture I had in mind.
>
>> My purpose at that time was to :
>>
>>  * Separate the representation of external IO channels from the
>> represention
>>   of streams and communication protocols.
>>  * Provide a uniform representation of IO channels similar to the unix
>> notion
>>   of treating everything as a 'file'.
>>  * Simplify future refactoring of Socket and FileStream.
>>  * Provide a place for handling asynchronous IO events. Refer to the aio
>>   handling in the unix VM. Files, Sockets, and AsyncFiles could (should)
>> use
>>   a common IO event handling mechanism (aio event signaling a Smalltalk
>> Semaphore).
>
> Indeed. Filesystem comes at this from the other direction, but I think we
> want to end up in the same place. For now I've done TSTTCPW, which is use
> the primitives from the FilePlugin. But eventually I want to improve the
> plumbing. You've done some important work here - perhaps Filesystem can use
> AioPlugin at some point.
>
> Colin
>
>

I wonder why level 3 stdio was used (FILE * fopen, fclose ...) rather
than level 2 (int fid, open, close, ...) in file plugin... Better
portability ?

level 2 isn't really a level, it's a section of the unix manual pages.  Section 2 is the system calls (which really define what unix is).  Section 3 is libraries.  So only the stdio library in section 3 is portable across C implementations.  So yes, you're right, the use of the C library's stdio facilities was chosen for portability.

Nicolas





Re: Faster FileStream experiments

Eliot Miranda-2
An approach I like is to add an endOfStreamValue inst var to Stream and answer its value when at end.  This way nil does not have to be the endOfStreamValue, for example -1 might be much more convenient for a binary stream, and streams can answer nil without confusing their clients.  atEnd can be implemented as
    atEnd
        ^self peek = self endOfStreamValue

You can arrange to make streams raise an end-of-stream exception instead of the endOfStreamValue by using some convention on the contents of endOfStreamValue, such as if it is == to the stream itself (although I note that in the Teleplace image the exception EndOfStream is defined but not used).


Of course, stream primitives get in the way of adding inst vars to stream classes ;)

IMO this is a much more useful scheme than making nil the only endOfStream value.
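
A minimal sketch of the idea applied to an in-memory ReadStream (the inst var and accessor come from the proposal above; the rest is illustrative, not the Teleplace implementation):

Stream >> endOfStreamValue
    "Answer the value used to mark that the stream is past its end (nil, -1, ...)."
    ^endOfStreamValue

ReadStream >> next
    "Answer the next element, or the configured endOfStreamValue when past the end."
    position >= readLimit ifTrue: [^self endOfStreamValue].
    ^collection at: (position := position + 1)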

On Fri, Nov 27, 2009 at 2:33 PM, Eliot Miranda <[hidden email]> wrote:


On Fri, Nov 27, 2009 at 2:24 PM, Nicolas Cellier <[hidden email]> wrote:
2009/11/27 Colin Putney <[hidden email]>:
>
> On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:
>
>> I implemented IOHandle for this, see http://wiki.squeak.org/squeak/996.
>> I have not maintained it since about 2003, but the idea is
>> straightforward.
>
> Yes. I looked into IOHandle when implementing Filesystem, but decided to go
> with a new (simpler, but limited) implementation that would let me explore
> the requirements for the stream architecture I had in mind.
>
>> My purpose at that time was to :
>>
>>  * Separate the representation of external IO channels from the
>> represention
>>   of streams and communication protocols.
>>  * Provide a uniform representation of IO channels similar to the unix
>> notion
>>   of treating everything as a 'file'.
>>  * Simplify future refactoring of Socket and FileStream.
>>  * Provide a place for handling asynchronous IO events. Refer to the aio
>>   handling in the unix VM. Files, Sockets, and AsyncFiles could (should)
>> use
>>   a common IO event handling mechanism (aio event signaling a Smalltalk
>> Semaphore).
>
> Indeed. Filesystem comes at this from the other direction, but I think we
> want to end up in the same place. For now I've done TSTTCPW, which is use
> the primitives from the FilePlugin. But eventually I want to improve the
> plumbing. You've done some important work here - perhaps Filesystem can use
> AioPlugin at some point.
>
> Colin
>
>

I wonder why level 3 stdio was used (FILE * fopen, fclose ...) rather
than level 2 (int fid, open, close, ...) in file plugin... Better
portability ?

level 2 isn't really a level, it's a section of the unix manual pages.  Section 2 is the system calls (which really define what unix is).  Section 3 is libraries.  So only the stdio library in section 3 is portable across C implementations.  So yes, you're right, the use of the C library's stdio facilities was chosen for portability.

Nicolas






Re: Faster FileStream experiments

Nicolas Cellier
2009/11/27 Eliot Miranda <[hidden email]>:

> An approach I like is to add an endOfStreamValue inst var to Stream and
> answer its value when at end.  This way nil does not have to be the
> endOfStreamValue, for example -1 might be much more convenient for a binary
> stream, and streams can answer nil without confusing their clients.  atEnd
> can be implemented as
>     atEnd
>         ^self peek = self endOfStreamValue
> You can arrange to make streams raise an end-of-stream exception instead of
> the endOfStreamValue by using some convention on the contents of
> endOfStreamValue, such as if it is == to the stream itself (although I note
> that in the Teleplace image the exception EndOfStrean is defined bit not
> used).
>
> Of course, stream primitives get in the way of adding inst vars to stream
> classes ;)
> IMO this is a much more useful scheme than making nil the only endOfStream
> value.
>

The last time I proposed to have an inst var endOfStreamAction was here:
http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
Abusing the fact that nil value -> nil, I could even leave this inst var
uninitialized and be backward compatible
(initializing it with a ValueHolder on nil would do as well).
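
A hedged sketch of that trick (the selector and the way the action is invoked are assumptions, not an existing implementation):

ReadStream >> pastEnd
    "Answer the result of the endOfStreamAction when reading past the end.
    Left uninitialized, 'nil value' answers nil, so existing clients still get
    nil; installing a block such as [EndOfStream signal] would raise instead."
    ^endOfStreamAction value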

Nicolas

> On Fri, Nov 27, 2009 at 2:33 PM, Eliot Miranda <[hidden email]>
> wrote:
>>
>>
>> On Fri, Nov 27, 2009 at 2:24 PM, Nicolas Cellier
>> <[hidden email]> wrote:
>>>
>>> 2009/11/27 Colin Putney <[hidden email]>:
>>> >
>>> > On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:
>>> >
>>> >> I implemented IOHandle for this, see
>>> >> http://wiki.squeak.org/squeak/996.
>>> >> I have not maintained it since about 2003, but the idea is
>>> >> straightforward.
>>> >
>>> > Yes. I looked into IOHandle when implementing Filesystem, but decided
>>> > to go
>>> > with a new (simpler, but limited) implementation that would let me
>>> > explore
>>> > the requirements for the stream architecture I had in mind.
>>> >
>>> >> My purpose at that time was to :
>>> >>
>>> >>  * Separate the representation of external IO channels from the
>>> >> represention
>>> >>   of streams and communication protocols.
>>> >>  * Provide a uniform representation of IO channels similar to the unix
>>> >> notion
>>> >>   of treating everything as a 'file'.
>>> >>  * Simplify future refactoring of Socket and FileStream.
>>> >>  * Provide a place for handling asynchronous IO events. Refer to the
>>> >> aio
>>> >>   handling in the unix VM. Files, Sockets, and AsyncFiles could
>>> >> (should)
>>> >> use
>>> >>   a common IO event handling mechanism (aio event signaling a
>>> >> Smalltalk
>>> >> Semaphore).
>>> >
>>> > Indeed. Filesystem comes at this from the other direction, but I think
>>> > we
>>> > want to end up in the same place. For now I've done TSTTCPW, which is
>>> > use
>>> > the primitives from the FilePlugin. But eventually I want to improve
>>> > the
>>> > plumbing. You've done some important work here - perhaps Filesystem can
>>> > use
>>> > AioPlugin at some point.
>>> >
>>> > Colin
>>> >
>>> >
>>>
>>> I wonder why level 3 stdio was used (FILE * fopen, fclose ...) rather
>>> than level 2 (int fid, open, close, ...) in file plugin... Better
>>> portability ?
>>
>> level 2 isn't really a level, its a section of the unix manual pages.
>>  Section 2 is the system calls (which really define what unix is).  Section
>> 3 is libraries.  So only the stdio library in section 3 is portable across C
>> implementations.  So yes, you're right, the use of the C library's stdio
>> facilities was chosen for portability.
>>>
>>> Nicolas
>>>
>>
>
>
>
>
>


Re: Faster FileStream experiments

Igor Stasenko
2009/11/28 Nicolas Cellier <[hidden email]>:

> 2009/11/27 Eliot Miranda <[hidden email]>:
>> An approach I like is to add an endOfStreamValue inst var to Stream and
>> answer its value when at end.  This way nil does not have to be the
>> endOfStreamValue, for example -1 might be much more convenient for a binary
>> stream, and streams can answer nil without confusing their clients.  atEnd
>> can be implemented as
>>     atEnd
>>         ^self peek = self endOfStreamValue
>> You can arrange to make streams raise an end-of-stream exception instead of
>> the endOfStreamValue by using some convention on the contents of
>> endOfStreamValue, such as if it is == to the stream itself (although I note
>> that in the Teleplace image the exception EndOfStrean is defined bit not
>> used).
>>
>> Of course, stream primitives get in the way of adding inst vars to stream
>> classes ;)
>> IMO this is a much more useful scheme than making nil the only endOfStream
>> value.
>>
>
> Last time I proposed to have an inst var endOfStreamAction was here
> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
> .
> Abusing nil value -> nil, I could even let this inst var
> un-initialized and be backward compatible
> (initializing with a ValueHolder on nil would do as well)
>

Nicolas, have you considered introducing methods which allow
gracefully handling the end-of-stream while reading?
Something like:

nextIfAtEnd: aBlock
and
next: number ifAtEnd: aBlock


Then the caller may choose to either write:

char := stream nextIfAtEnd: [nil]

or handle the end of stream differently, like leaving the loop:

char := stream nextIfAtEnd: [^ results]

The benefit of such an approach is that code which reads the stream
doesn't need to additionally
test the stream state (atEnd) between #next sends, nor does it
require some unique value (like nil) to be returned by #next
when reaching the end of the stream.
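
A minimal sketch of the first of these on Stream (illustrative only; the selector is being proposed here, not existing API):

Stream >> nextIfAtEnd: aBlock
    "Answer the next element, or the value of aBlock if the stream is at its end."
    self atEnd ifTrue: [^aBlock value].
    ^self next

With that, the 'stream nextIfAtEnd: [^ results]' pattern above leaves the enclosing method as soon as the stream runs dry, without a separate atEnd test on each iteration.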

> Nicolas
>


--
Best regards,
Igor Stasenko AKA sig.


Re: Faster FileStream experiments

Eliot Miranda-2


On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]> wrote:
2009/11/28 Nicolas Cellier <[hidden email]>:
> 2009/11/27 Eliot Miranda <[hidden email]>:
>> An approach I like is to add an endOfStreamValue inst var to Stream and
>> answer its value when at end.  This way nil does not have to be the
>> endOfStreamValue, for example -1 might be much more convenient for a binary
>> stream, and streams can answer nil without confusing their clients.  atEnd
>> can be implemented as
>>     atEnd
>>         ^self peek = self endOfStreamValue
>> You can arrange to make streams raise an end-of-stream exception instead of
>> the endOfStreamValue by using some convention on the contents of
>> endOfStreamValue, such as if it is == to the stream itself (although I note
>> that in the Teleplace image the exception EndOfStrean is defined bit not
>> used).
>>
>> Of course, stream primitives get in the way of adding inst vars to stream
>> classes ;)
>> IMO this is a much more useful scheme than making nil the only endOfStream
>> value.
>>
>
> Last time I proposed to have an inst var endOfStreamAction was here
> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
> .
> Abusing nil value -> nil, I could even let this inst var
> un-initialized and be backward compatible
> (initializing with a ValueHolder on nil would do as well)
>

Nicolas, have you considered introducing methods which allow
graciously handle the end-of-stream while reading?
Something like:

nextIfAtEnd: aBlock
and
next: number ifAtEnd: aBlock


then caller may choose to either write:

char := stream nextIfAtEnd: [nil]

or handle end of stream differently, like leaving the loop:

char := stream nextIfAtEnd: [^ results]

the benefit of such approach that code which reads the stream , don't
needs to additionally
test stream state (atEnd) in iteration between #next sends neither
requires some unique value (like nil) returned by #next
when reaching end of stream.

IMO the block creation is too expensive for streams.  The defaultHandler approach for an EndOfStream exception is also too expensive.  The endOfStreamValue inst var is a nice trade-off between flexibility, efficiency and simplicity.  You can always write
     [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
        [...do stuff...

 

> Nicolas
>


--
Best regards,
Igor Stasenko AKA sig.





Re: Faster FileStream experiments

Igor Stasenko
2009/11/28 Eliot Miranda <[hidden email]>:

>
>
> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/11/28 Nicolas Cellier <[hidden email]>:
>> > 2009/11/27 Eliot Miranda <[hidden email]>:
>> >> An approach I like is to add an endOfStreamValue inst var to Stream and
>> >> answer its value when at end.  This way nil does not have to be the
>> >> endOfStreamValue, for example -1 might be much more convenient for a
>> >> binary
>> >> stream, and streams can answer nil without confusing their clients.
>> >>  atEnd
>> >> can be implemented as
>> >>     atEnd
>> >>         ^self peek = self endOfStreamValue
>> >> You can arrange to make streams raise an end-of-stream exception
>> >> instead of
>> >> the endOfStreamValue by using some convention on the contents of
>> >> endOfStreamValue, such as if it is == to the stream itself (although I
>> >> note
>> >> that in the Teleplace image the exception EndOfStrean is defined bit
>> >> not
>> >> used).
>> >>
>> >> Of course, stream primitives get in the way of adding inst vars to
>> >> stream
>> >> classes ;)
>> >> IMO this is a much more useful scheme than making nil the only
>> >> endOfStream
>> >> value.
>> >>
>> >
>> > Last time I proposed to have an inst var endOfStreamAction was here
>> >
>> > http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>> > .
>> > Abusing nil value -> nil, I could even let this inst var
>> > un-initialized and be backward compatible
>> > (initializing with a ValueHolder on nil would do as well)
>> >
>>
>> Nicolas, have you considered introducing methods which allow
>> graciously handle the end-of-stream while reading?
>> Something like:
>>
>> nextIfAtEnd: aBlock
>> and
>> next: number ifAtEnd: aBlock
>>
>>
>> then caller may choose to either write:
>>
>> char := stream nextIfAtEnd: [nil]
>>
>> or handle end of stream differently, like leaving the loop:
>>
>> char := stream nextIfAtEnd: [^ results]
>>
>> the benefit of such approach that code which reads the stream , don't
>> needs to additionally
>> test stream state (atEnd) in iteration between #next sends neither
>> requires some unique value (like nil) returned by #next
>> when reaching end of stream.
>
> IMO the block creation is too expensive for streams.  The defaultHandler
> approach for and EndOfStream exception is also too expensive.  The
> endOfStreamValue inst var is a nice trade-off between flexibility,
> efficiency and simplicity.  You can always write
>      [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>         [...do stuff...
>

Hmm, can you elaborate: at what point do you see an expensive block creation?
A block closure is created once at compile time, and then passed like
any other object by reading it
from the literal frame of the method (and likewise, you can use 'stream
nextIfAtEnd: nil', right?). Only if it is going to be activated (by
sending #value) is a corresponding block context created in order to
evaluate the block, and that happens only when you reach the end of the
stream.

It is more expensive because of passing an extra argument, i.e. using
#nextIfAtEnd: instead of #next, but not because of passing a block,
IMO.

>>
>> > Nicolas
>> >
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>


--
Best regards,
Igor Stasenko AKA sig.


Re: Faster FileStream experiments

Levente Uzonyi-2
On Sat, 28 Nov 2009, Igor Stasenko wrote:

> 2009/11/28 Eliot Miranda <[hidden email]>:
>>
>>
>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]> wrote:
>>>
>>> 2009/11/28 Nicolas Cellier <[hidden email]>:
>>>> 2009/11/27 Eliot Miranda <[hidden email]>:
>>>>> An approach I like is to add an endOfStreamValue inst var to Stream and
>>>>> answer its value when at end.  This way nil does not have to be the
>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>> binary
>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>  atEnd
>>>>> can be implemented as
>>>>>     atEnd
>>>>>         ^self peek = self endOfStreamValue
>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>> instead of
>>>>> the endOfStreamValue by using some convention on the contents of
>>>>> endOfStreamValue, such as if it is == to the stream itself (although I
>>>>> note
>>>>> that in the Teleplace image the exception EndOfStrean is defined bit
>>>>> not
>>>>> used).
>>>>>
>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>> stream
>>>>> classes ;)
>>>>> IMO this is a much more useful scheme than making nil the only
>>>>> endOfStream
>>>>> value.
>>>>>
>>>>
>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>
>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>> .
>>>> Abusing nil value -> nil, I could even let this inst var
>>>> un-initialized and be backward compatible
>>>> (initializing with a ValueHolder on nil would do as well)
>>>>
>>>
>>> Nicolas, have you considered introducing methods which allow
>>> graciously handle the end-of-stream while reading?
>>> Something like:
>>>
>>> nextIfAtEnd: aBlock
>>> and
>>> next: number ifAtEnd: aBlock
>>>
>>>
>>> then caller may choose to either write:
>>>
>>> char := stream nextIfAtEnd: [nil]
>>>
>>> or handle end of stream differently, like leaving the loop:
>>>
>>> char := stream nextIfAtEnd: [^ results]
>>>
>>> the benefit of such approach that code which reads the stream , don't
>>> needs to additionally
>>> test stream state (atEnd) in iteration between #next sends neither
>>> requires some unique value (like nil) returned by #next
>>> when reaching end of stream.
>>
>> IMO the block creation is too expensive for streams.  The defaultHandler
>> approach for and EndOfStream exception is also too expensive.  The
>> endOfStreamValue inst var is a nice trade-off between flexibility,
>> efficiency and simplicity.  You can always write
>>      [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>         [...do stuff...
>>
>
> hmm, can you elaborate, at what point you see an expensive block creation?
> A block closure is created once at compiling stage, and then passed as
> any other object by reading it
> from literal frame of method (and as well as , you can use 'stream
In this case the block is copied and initialized every time you send
#nextIfAtEnd:. It is only activated at the end of the stream, so most of
the time it is just garbage.

Levente

> nextIfAtEnd: nil' , right?). And only if its going to be activated (by
> sending #value), a corresponding block context is created in order to
> evaluate the block. But it happens only when you reaching the end of
> stream.
>
> It is more expensive because of passing extra argument, i.e. use
> #nextIfAtEnd: instead of #next , but not because of passing block,
> IMO.
>
>>>
>>>> Nicolas
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>


Re: Faster FileStream experiments

Nicolas Cellier
2009/11/28 Levente Uzonyi <[hidden email]>:

> On Sat, 28 Nov 2009, Igor Stasenko wrote:
>
>> 2009/11/28 Eliot Miranda <[hidden email]>:
>>>
>>>
>>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]>
>>> wrote:
>>>>
>>>> 2009/11/28 Nicolas Cellier <[hidden email]>:
>>>>>
>>>>> 2009/11/27 Eliot Miranda <[hidden email]>:
>>>>>>
>>>>>> An approach I like is to add an endOfStreamValue inst var to Stream
>>>>>> and
>>>>>> answer its value when at end.  This way nil does not have to be the
>>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>>> binary
>>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>>  atEnd
>>>>>> can be implemented as
>>>>>>     atEnd
>>>>>>         ^self peek = self endOfStreamValue
>>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>>> instead of
>>>>>> the endOfStreamValue by using some convention on the contents of
>>>>>> endOfStreamValue, such as if it is == to the stream itself (although I
>>>>>> note
>>>>>> that in the Teleplace image the exception EndOfStrean is defined bit
>>>>>> not
>>>>>> used).
>>>>>>
>>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>>> stream
>>>>>> classes ;)
>>>>>> IMO this is a much more useful scheme than making nil the only
>>>>>> endOfStream
>>>>>> value.
>>>>>>
>>>>>
>>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>>
>>>>>
>>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>>> .
>>>>> Abusing nil value -> nil, I could even let this inst var
>>>>> un-initialized and be backward compatible
>>>>> (initializing with a ValueHolder on nil would do as well)
>>>>>
>>>>
>>>> Nicolas, have you considered introducing methods which allow
>>>> graciously handle the end-of-stream while reading?
>>>> Something like:
>>>>
>>>> nextIfAtEnd: aBlock
>>>> and
>>>> next: number ifAtEnd: aBlock
>>>>
>>>>
>>>> then caller may choose to either write:
>>>>
>>>> char := stream nextIfAtEnd: [nil]
>>>>
>>>> or handle end of stream differently, like leaving the loop:
>>>>
>>>> char := stream nextIfAtEnd: [^ results]
>>>>
>>>> the benefit of such approach that code which reads the stream , don't
>>>> needs to additionally
>>>> test stream state (atEnd) in iteration between #next sends neither
>>>> requires some unique value (like nil) returned by #next
>>>> when reaching end of stream.
>>>
>>> IMO the block creation is too expensive for streams.  The defaultHandler
>>> approach for and EndOfStream exception is also too expensive.  The
>>> endOfStreamValue inst var is a nice trade-off between flexibility,
>>> efficiency and simplicity.  You can always write
>>>      [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>>         [...do stuff...
>>>
>>
>> hmm, can you elaborate, at what point you see an expensive block creation?
>> A block closure is created once at compiling stage, and then passed as
>> any other object by reading it
>> from literal frame of method (and as well as , you can use 'stream
>
> In this case the block is copied and initialized every time you send
> #nextIfAtEnd:. It is only activated at the end of the stream, so most of the
> time it is just garbage.
>
> Levente
>

http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512.html

Nicolas

>> nextIfAtEnd: nil' , right?). And only if its going to be activated (by
>> sending #value), a corresponding block context is created in order to
>> evaluate the block. But it happens only when you reaching the end of
>> stream.
>>
>> It is more expensive because of passing extra argument, i.e. use
>> #nextIfAtEnd: instead of #next , but not because of passing block,
>> IMO.
>>
>>>>
>>>>> Nicolas
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>


Re: Faster FileStream experiments

Igor Stasenko
In reply to this post by Levente Uzonyi-2
2009/11/28 Levente Uzonyi <[hidden email]>:

> On Sat, 28 Nov 2009, Igor Stasenko wrote:
>
>> 2009/11/28 Eliot Miranda <[hidden email]>:
>>>
>>>
>>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]>
>>> wrote:
>>>>
>>>> 2009/11/28 Nicolas Cellier <[hidden email]>:
>>>>>
>>>>> 2009/11/27 Eliot Miranda <[hidden email]>:
>>>>>>
>>>>>> An approach I like is to add an endOfStreamValue inst var to Stream
>>>>>> and
>>>>>> answer its value when at end.  This way nil does not have to be the
>>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>>> binary
>>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>>  atEnd
>>>>>> can be implemented as
>>>>>>     atEnd
>>>>>>         ^self peek = self endOfStreamValue
>>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>>> instead of
>>>>>> the endOfStreamValue by using some convention on the contents of
>>>>>> endOfStreamValue, such as if it is == to the stream itself (although I
>>>>>> note
>>>>>> that in the Teleplace image the exception EndOfStrean is defined bit
>>>>>> not
>>>>>> used).
>>>>>>
>>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>>> stream
>>>>>> classes ;)
>>>>>> IMO this is a much more useful scheme than making nil the only
>>>>>> endOfStream
>>>>>> value.
>>>>>>
>>>>>
>>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>>
>>>>>
>>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>>> .
>>>>> Abusing nil value -> nil, I could even let this inst var
>>>>> un-initialized and be backward compatible
>>>>> (initializing with a ValueHolder on nil would do as well)
>>>>>
>>>>
>>>> Nicolas, have you considered introducing methods which allow
>>>> graciously handle the end-of-stream while reading?
>>>> Something like:
>>>>
>>>> nextIfAtEnd: aBlock
>>>> and
>>>> next: number ifAtEnd: aBlock
>>>>
>>>>
>>>> then caller may choose to either write:
>>>>
>>>> char := stream nextIfAtEnd: [nil]
>>>>
>>>> or handle end of stream differently, like leaving the loop:
>>>>
>>>> char := stream nextIfAtEnd: [^ results]
>>>>
>>>> the benefit of such approach that code which reads the stream , don't
>>>> needs to additionally
>>>> test stream state (atEnd) in iteration between #next sends neither
>>>> requires some unique value (like nil) returned by #next
>>>> when reaching end of stream.
>>>
>>> IMO the block creation is too expensive for streams.  The defaultHandler
>>> approach for and EndOfStream exception is also too expensive.  The
>>> endOfStreamValue inst var is a nice trade-off between flexibility,
>>> efficiency and simplicity.  You can always write
>>>      [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>>         [...do stuff...
>>>
>>
>> hmm, can you elaborate, at what point you see an expensive block creation?
>> A block closure is created once at compiling stage, and then passed as
>> any other object by reading it
>> from literal frame of method (and as well as , you can use 'stream
>
> In this case the block is copied and initialized every time you send
> #nextIfAtEnd:. It is only activated at the end of the stream, so most of the
> time it is just garbage.
>
Ah, yes... I forgot about that.

Well, you can move the block out of the loop:

| block |
block := [ self foo ].
[ stream nextIfAtEnd: block .. ] repeat.

but of course it's not always possible, and not the first thought that
comes to mind when you use blocks while coding.

Btw, I think this is a good field for compiler/runtime optimizations: avoiding
excessive closure creation inside loops/nested blocks.

> Levente
>
>> nextIfAtEnd: nil' , right?). And only if its going to be activated (by
>> sending #value), a corresponding block context is created in order to
>> evaluate the block. But it happens only when you reaching the end of
>> stream.
>>
>> It is more expensive because of passing extra argument, i.e. use
>> #nextIfAtEnd: instead of #next , but not because of passing block,
>> IMO.
>>
>>>>
>>>>> Nicolas
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: Faster FileStream experiments

Andreas.Raab
In reply to this post by Nicolas Cellier
Hi Nicolas -

I finally got around to looking at this stuff. A couple of comments:

* Regardless of what the long-term solution is, I could really, really
use the performance improvements of BufferedFileStream. How can we bring
this to a usable point?

* I'm not sure I like the subclassing of StandardFileStream - I would
probably opt to subclass FileStream, adopt the primitives and write the
stuff on top from scratch (this also allows us to keep a filePosition
which is explicitly updated etc).

* It is highly likely that read performance is dramatically more
important than write performance in most cases. It may be worthwhile to
start with just buffering reads and have writes go unbuffered. This also
preserves current semantics, allowing us to gradually phase in buffered
writes where desired (i.e., using #flushAfter: aBlock; a sketch follows
below). This would make BufferedFileStream instantly useful for our
production uses.
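
For illustration, one possible shape for that selector (hypothetical; #flushAfter: is only being suggested here, not existing API):

BufferedFileStream >> flushAfter: aBlock
    "Evaluate aBlock with writes going to the buffer, then flush the buffer
    to the underlying file, even if aBlock terminates abnormally."
    ^aBlock ensure: [self flush]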

In any case, I *really* like the direction. If we can get this into a
usable state it would allow us to replace the sources and changes files
with buffered versions. As a result I would expect measurable speedups
in some of the macro benchmarks and other common operations (Object
compileAll for example).

Cheers,
   - Andreas

Nicolas Cellier wrote:

> 2009/11/28 Levente Uzonyi <[hidden email]>:
>> On Sat, 28 Nov 2009, Igor Stasenko wrote:
>>
>>> 2009/11/28 Eliot Miranda <[hidden email]>:
>>>>
>>>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]>
>>>> wrote:
>>>>> 2009/11/28 Nicolas Cellier <[hidden email]>:
>>>>>> 2009/11/27 Eliot Miranda <[hidden email]>:
>>>>>>> An approach I like is to add an endOfStreamValue inst var to Stream
>>>>>>> and
>>>>>>> answer its value when at end.  This way nil does not have to be the
>>>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>>>> binary
>>>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>>>  atEnd
>>>>>>> can be implemented as
>>>>>>>     atEnd
>>>>>>>         ^self peek = self endOfStreamValue
>>>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>>>> instead of
>>>>>>> the endOfStreamValue by using some convention on the contents of
>>>>>>> endOfStreamValue, such as if it is == to the stream itself (although I
>>>>>>> note
>>>>>>> that in the Teleplace image the exception EndOfStrean is defined bit
>>>>>>> not
>>>>>>> used).
>>>>>>>
>>>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>>>> stream
>>>>>>> classes ;)
>>>>>>> IMO this is a much more useful scheme than making nil the only
>>>>>>> endOfStream
>>>>>>> value.
>>>>>>>
>>>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>>>
>>>>>>
>>>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>>>> .
>>>>>> Abusing nil value -> nil, I could even let this inst var
>>>>>> un-initialized and be backward compatible
>>>>>> (initializing with a ValueHolder on nil would do as well)
>>>>>>
>>>>> Nicolas, have you considered introducing methods which allow
>>>>> graciously handle the end-of-stream while reading?
>>>>> Something like:
>>>>>
>>>>> nextIfAtEnd: aBlock
>>>>> and
>>>>> next: number ifAtEnd: aBlock
>>>>>
>>>>>
>>>>> then caller may choose to either write:
>>>>>
>>>>> char := stream nextIfAtEnd: [nil]
>>>>>
>>>>> or handle end of stream differently, like leaving the loop:
>>>>>
>>>>> char := stream nextIfAtEnd: [^ results]
>>>>>
>>>>> the benefit of such approach that code which reads the stream , don't
>>>>> needs to additionally
>>>>> test stream state (atEnd) in iteration between #next sends neither
>>>>> requires some unique value (like nil) returned by #next
>>>>> when reaching end of stream.
>>>> IMO the block creation is too expensive for streams.  The defaultHandler
>>>> approach for and EndOfStream exception is also too expensive.  The
>>>> endOfStreamValue inst var is a nice trade-off between flexibility,
>>>> efficiency and simplicity.  You can always write
>>>>      [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>>>         [...do stuff...
>>>>
>>> hmm, can you elaborate, at what point you see an expensive block creation?
>>> A block closure is created once at compiling stage, and then passed as
>>> any other object by reading it
>>> from literal frame of method (and as well as , you can use 'stream
>> In this case the block is copied and initialized every time you send
>> #nextIfAtEnd:. It is only activated at the end of the stream, so most of the
>> time it is just garbage.
>>
>> Levente
>>
>
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512.html
>
> Nicolas
>
>>> nextIfAtEnd: nil' , right?). And only if its going to be activated (by
>>> sending #value), a corresponding block context is created in order to
>>> evaluate the block. But it happens only when you reaching the end of
>>> stream.
>>>
>>> It is more expensive because of passing extra argument, i.e. use
>>> #nextIfAtEnd: instead of #next , but not because of passing block,
>>> IMO.
>>>
>>>>>> Nicolas
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Igor Stasenko AKA sig.
>>>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>
>>
>>
>
>



Re: Re: Faster FileStream experiments

Nicolas Cellier
2009/12/1 Andreas Raab <[hidden email]>:
> Hi Nicolas -
>
> I finally got around to looking at this stuff. A couple of comments:
>
> * Regardless of what the long-term solution is, I could really, really use
> the performance improvements of BufferedFileStream. How can we bring this to
> a usable point?
>

First, the code for read/write I provided was completely bogus; I now
have a better one passing some tests.
Meanwhile, I started to have a look at XTream and played a bit with these ideas:
- separate read/write Streams
- every ReadStream has a source, every WriteStream has a destination
- have different kinds of Read/Write streams: Collection/File/Buffered/...
- a separate IOHandle for handling basic primitives
A big part of XTream is the way to transform Streams using blocks,
especially the most powerful one, transforming: [:inputStream
:outputStream | ...].
Another point is the uniform usage of an EndOfStream exception (Incomplete).
I started to play with an endOfStreamAction alternative.
Another point is the usage of a Buffer object: this piece allows
implementing read/write streams acting on the same sequence. It is also a
key to performance...
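
As an illustration of that block-based style (the selector and the two-argument block come from the post above; the #get/#put: calls are a hedged guess at the XTream API, not verified):

    "Hypothetical: a read stream whose elements are upcased on the fly."
    upcased := aReadStream transforming:
        [:in :out | out put: in get asUppercase].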

XTream also totally changes the API (put, get, etc.), but it does not
have to (or maybe it does have to be XTreme to deserve its name).

> * I'm not sure I like the subclassing of StandardFileStream - I would
> probably opt to subclass FileStream, adopt the primitives and write the
> stuff on top from scratch (this also allows us to keep a filePosition which
> is explicitly updated etc).
>

My very basic approach for short-term performance would be:
- introduce IOHandle in the image for handling primitives (only for files
at first, and without modifying StandardFileStream, just
duplicating it to stay minimal)
- introduce a BufferedReadStream and a BufferedReadWriteStream under
PositionableStream using this IOHandle as source
- keep the same external API, only hack a few creation methods (a sketch
follows below)...
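
A hedged sketch of the kind of creation-method hack meant here (BufferedReadStream and IOHandle are the classes being proposed above; the exact creation protocol is an assumption):

FileStream class >> readOnlyFileNamed: fileName
    "Keep the public entry point, but answer a buffered read stream
    wrapping an IOHandle instead of a StandardFileStream."
    ^BufferedReadStream on: (IOHandle readOnlyFileNamed: fileName)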

As a second step we will have to decide what to do with
MultiByteFileStream: it is a performance bottleneck too.
For a start, I would simply wrap it around a buffered one...

> * It is highly likely that read performance is dramatically more important
> than write performance in most cases. It may be worthwhile to start with
> just buffering reads and have writes go unbuffered. This also preserves
> current semantics, allowing to gradually phase in buffered writes where
> desired (i.e., using #flushAfter: aBlock). This would make
> BufferedFileStream instantly useful for our production uses.
>
> In any case, I *really* like the direction. If we can get this into a usable
> state it would allow us to replace the sources and changes files with
> buffered versions. As a result I would expect measurable speedups in some of
> the macro benchmarks and other common operations (Object compileAll for
> example).
>

Concerning macro benchmarks, StandardFileStream reading already
performs well in the case of pure random access (upTo: is already buffered).
The gain is for more sequence-oriented algorithms. However, chances
are that a loaded package has its source laid out sequentially in the changes file,
and condenseChanges also organizes source code that way, so Object
compileAll might show a difference eventually.

Nicolas

> Cheers,
>  - Andreas
>
> Nicolas Cellier wrote:
>>
>> 2009/11/28 Levente Uzonyi <[hidden email]>:
>>>
>>> On Sat, 28 Nov 2009, Igor Stasenko wrote:
>>>
>>>> 2009/11/28 Eliot Miranda <[hidden email]>:
>>>>>
>>>>> On Fri, Nov 27, 2009 at 4:40 PM, Igor Stasenko <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>> 2009/11/28 Nicolas Cellier <[hidden email]>:
>>>>>>>
>>>>>>> 2009/11/27 Eliot Miranda <[hidden email]>:
>>>>>>>>
>>>>>>>> An approach I like is to add an endOfStreamValue inst var to Stream
>>>>>>>> and
>>>>>>>> answer its value when at end.  This way nil does not have to be the
>>>>>>>> endOfStreamValue, for example -1 might be much more convenient for a
>>>>>>>> binary
>>>>>>>> stream, and streams can answer nil without confusing their clients.
>>>>>>>>  atEnd
>>>>>>>> can be implemented as
>>>>>>>>    atEnd
>>>>>>>>        ^self peek = self endOfStreamValue
>>>>>>>> You can arrange to make streams raise an end-of-stream exception
>>>>>>>> instead of
>>>>>>>> the endOfStreamValue by using some convention on the contents of
>>>>>>>> endOfStreamValue, such as if it is == to the stream itself (although
>>>>>>>> I
>>>>>>>> note
>>>>>>>> that in the Teleplace image the exception EndOfStrean is defined bit
>>>>>>>> not
>>>>>>>> used).
>>>>>>>>
>>>>>>>> Of course, stream primitives get in the way of adding inst vars to
>>>>>>>> stream
>>>>>>>> classes ;)
>>>>>>>> IMO this is a much more useful scheme than making nil the only
>>>>>>>> endOfStream
>>>>>>>> value.
>>>>>>>>
>>>>>>> Last time I proposed to have an inst var endOfStreamAction was here
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-June/009536.html
>>>>>>> .
>>>>>>> Abusing nil value -> nil, I could even let this inst var
>>>>>>> un-initialized and be backward compatible
>>>>>>> (initializing with a ValueHolder on nil would do as well)
>>>>>>>
>>>>>> Nicolas, have you considered introducing methods which allow
>>>>>> graciously handle the end-of-stream while reading?
>>>>>> Something like:
>>>>>>
>>>>>> nextIfAtEnd: aBlock
>>>>>> and
>>>>>> next: number ifAtEnd: aBlock
>>>>>>
>>>>>>
>>>>>> then caller may choose to either write:
>>>>>>
>>>>>> char := stream nextIfAtEnd: [nil]
>>>>>>
>>>>>> or handle end of stream differently, like leaving the loop:
>>>>>>
>>>>>> char := stream nextIfAtEnd: [^ results]
>>>>>>
>>>>>> the benefit of such approach that code which reads the stream , don't
>>>>>> needs to additionally
>>>>>> test stream state (atEnd) in iteration between #next sends neither
>>>>>> requires some unique value (like nil) returned by #next
>>>>>> when reaching end of stream.
>>>>>
>>>>> IMO the block creation is too expensive for streams.  The
>>>>> defaultHandler
>>>>> approach for and EndOfStream exception is also too expensive.  The
>>>>> endOfStreamValue inst var is a nice trade-off between flexibility,
>>>>> efficiency and simplicity.  You can always write
>>>>>     [(value := stream next) ~~ stream endOfStreamValue] whileTrue:
>>>>>        [...do stuff...
>>>>>
>>>> hmm, can you elaborate, at what point you see an expensive block
>>>> creation?
>>>> A block closure is created once at compiling stage, and then passed as
>>>> any other object by reading it
>>>> from literal frame of method (and as well as , you can use 'stream
>>>
>>> In this case the block is copied and initialized every time you send
>>> #nextIfAtEnd:. It is only activated at the end of the stream, so most of
>>> the
>>> time it is just garbage.
>>>
>>> Levente
>>>
>>
>>
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-November/122512.html
>>
>> Nicolas
>>
>>>> nextIfAtEnd: nil' , right?). And only if its going to be activated (by
>>>> sending #value), a corresponding block context is created in order to
>>>> evaluate the block. But it happens only when you reaching the end of
>>>> stream.
>>>>
>>>> It is more expensive because of passing extra argument, i.e. use
>>>> #nextIfAtEnd: instead of #next , but not because of passing block,
>>>> IMO.
>>>>
>>>>>>> Nicolas
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Igor Stasenko AKA sig.
>>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>
>>>
>>>
>>>
>>
>>
>
>
>


Re: Faster FileStream experiments

Andreas.Raab
Nicolas Cellier wrote:
> Concerning macro benchmark, StandardFileStream reading is already
> performant in case of pure Random access (upTo: is already buffered).
> The gain is for more sequence oriented algorithms. However, chances
> are that a loaded package has its source sequentially laid in changes,
> condenseChanges also organize source code that way, so Object
> compileAll might show a difference eventually.

Oh, it will. Here are the leaves for "Object compileAll":

**Leaves**
71.0 (1,149)  StandardFileStream  primRead:into:startingAt:count:
2.0 (32)  ByteString  at:put:
1.8 (29)  CompiledMethod  flushCache

That says that if you speed up #next by a factor of 5x (which is trivial
using BufferedFileStream) it'll make compileAll 2-3x faster overall
(roughly: 71%/5 + 29% ≈ 43% of the original time). I
think we'll see similar 2x speedups for other common operations on
source code (recent changes, browsing versions, etc.).

Faster I/O can make a *huge* difference in speed for the whole system.

Cheers,
   - Andreas


Re: Re: Faster FileStream experiments

Nicolas Cellier
2009/12/1 Andreas Raab <[hidden email]>:

> Nicolas Cellier wrote:
>>
>> Concerning macro benchmark, StandardFileStream reading is already
>> performant in case of pure Random access (upTo: is already buffered).
>> The gain is for more sequence oriented algorithms. However, chances
>> are that a loaded package has its source sequentially laid in changes,
>> condenseChanges also organize source code that way, so Object
>> compileAll might show a difference eventually.
>
> Oh, it will. Here are the leaves for "Object compileAll":
>
> **Leaves**
> 71.0 (1,149)  StandardFileStream  primRead:into:startingAt:count:
> 2.0 (32)  ByteString  at:put:
> 1.8 (29)  CompiledMethod  flushCache
>
> That says that if you speed up #next by a factor of 5x (which is trivial
> using BufferedFileStream) it'll make compileAll 2-3x faster overall. I think
> we'll see similar 2x speedups for other common operations on source code
> (recent changes, browsing versions etc).
>
> Faster I/O can make a *huge* difference in speed for the whole system.
>
> Cheers,
>  - Andreas
>
>

Oh yes, but this is MultiByteFileStream, which reads characters one by one...
A StandardFileStream would already be much more performant.

Nicolas
