squeak XTream


squeak XTream

Nicolas Cellier
I just published a very early version of squeak extended streams to
http://www.squeaksource.com/XTream.html
- with code from IOHandle ported in trunk (as is)
- with my own implementation of Cincom XTream ideas
I licensed it MIT, even IOHandle, because these are large chunks of
existing Squeak source. David, is that OK?
Anyway, this code is not used yet...

The stream stacking/piping/composing/transforming (choose a name) is
based on the major XTream idea that a stream should be transformed through a
block taking :inputStream and :outputStream arguments.
This enables virtually any processing (inflating, deflating,
encoding, decoding, etc.).
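
For example, an upcasing filter would be nothing more than a block like
this (a sketch only; the exact selectors may still change):

    upcasing := [:in :out |
        [in atEnd] whileFalse: [out nextPut: in next asUppercase]].
    "hypothetical usage, once the plumbing exists:
     'squeak' readXtream transform: upcasing"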

It is not yet functional, but the basic infrastructure is there: XtreamFIFO

Another noticeable point:
I avoided the EndOfStream notification and replaced it with an
endOfStreamAction instance variable, which is quite fun.
However, with stream composition this will require some more work, and
might be tricky...
Anyway, EndOfStream is one of the possible options offered by
endOfStreamAction, so there is no feasibility problem.
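
A minimal sketch of what I mean (selector and exact semantics still tentative):

    stream endOfStreamAction: [nil].                      "just answer nil at the end"
    stream endOfStreamAction: [Error signal: 'past end']. "or keep an exception-style behaviour"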

The end conditions for blocking streams (like a pipe being closed) are
certainly not obvious to handle and will deserve more work too.

One thing that annoys me is this BufferedReadWriteXtream, the only
read/write stream in the collection so far.
It does not fit well in the hierarchy... Maybe I'll have to separate
out the buffer too.

I will concentrate now on providing basic collection functionalities.
Then basic file functionalities, probably starting with a
StandardFileStream as source, then maybe an IOFileHandle:
- buffered read/write (equivalent to StandardFileStream)
- line end conversions (equivalent to CrLfFileStream; see the sketch below)
- encoding conversions via transform: operations (equivalent to
MultiByteFileStream)
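
For instance, a CRLF/LF to CR conversion could be written as just another
transform block, roughly like this (a sketch; it assumes the read side
supports peek):

    crlfToCr := [:in :out |
        [in atEnd] whileFalse: [ | ch |
            ch := in next.
            ch = Character lf
                ifTrue: [out nextPut: Character cr]   "a lone LF becomes a CR"
                ifFalse: [
                    out nextPut: ch.
                    (ch = Character cr and: [in atEnd not and: [in peek = Character lf]])
                        ifTrue: [in next]]]].   "swallow the LF of a CRLF pair"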

The plan will then be to implement more classes of the Stream
hierarchy. Most should just be implemented with a special transform
block...

Then it would be interesting to start unifying Socket, Pipe and
async files...

If anyone wants to join in, the repository is open for writing.

Nicolas


Re: squeak XTream

Andreas.Raab
Hi Nicolas -

I suppose it's too tempting to write a new stream library rather than
just speeding up the existing files ;-) I like the basic approach
although adoption will be a tad more difficult than just having a faster
version of FileStream.

> The stream stacking/piping/composing/transforming (choose a name) is
> based on major XTream idea that stream should be transformed through a
> block taking an :inputStream and an :outputStream arguments.
> This enables virtually any processing (like inflating deflating
> encoding decoding etc...)

Fair enough as a general architecture, but there really, really should be
a way of stacking things up directly, i.e., along the lines of:

file := FileHandle open: 'file.txt' mode: 'rb'.
stream := UTF8EncodingXTream on: BufferedXtream on: FileXtream on: file.

I wouldn't want to write this using blocks as input and output.
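
(Spelled out with the nesting explicit, and the same hypothetical class names,
that would be something like:)

    file := FileHandle open: 'file.txt' mode: 'rb'.
    stream := UTF8EncodingXTream on: (BufferedXtream on: (FileXtream on: file)).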

> Other noticeable point:
> I avoided EndOfStream notification and replaced by an
> endOfStreamAction instance variable, which is quite fun.

Interesting idea. I'm not sure if this is feasible in the long term,
though, because writing code for pieces "in the middle" will be tricky if
you don't know whether you're being fed an exception or not.

> I will concentrate now on providing basic collection functionalities.
> Then basic file functionalities, probably starting with a
> StandardFileStream as source, then a IOFileHandle maybe.
> - buffered read/write (equivalent to StandardFileStream)
> - line end conversions (equivalent to CrLfFileStream)
> - encoding conversions via transform: operations (equivalent to
> MultiByteFileStream)

Well, I'm still in the business of shopping for a faster version of
FileStream ;-)

Cheers,
   - Andreas


Re: Re: squeak XTream

Colin Putney

On 1-Dec-09, at 9:55 PM, Andreas Raab wrote:

>
> file := FileHandle open: 'file.txt' mode: 'rb'.
> stream := UTF8EncodingXTream on: BufferedXtream on: FileXtream on:  
> file.

For Filesystem, I've been working on something like this:

stream := aReference writeStream
        encoding: #utf8;
        buffer: 1024;
        yourself

The stream is responsible for managing the pipeline between itself
and the handle.

If Nicolas is writing a separate Xtreams package, though, it's going  
to overlap a lot with Filesystem. Perhaps I should just depend on  
Xtreams, rather than duplicate the functionality.

Colin


IOHandle license is now MIT (was: squeak XTream)

David T. Lewis
In reply to this post by Nicolas Cellier
On Wed, Dec 02, 2009 at 05:06:02AM +0100, Nicolas Cellier wrote:
> I just published a very early version of squeak extended streams to
> http://www.squeaksource.com/XTream.html
> - with code form IOHandle ported in trunk (as is)
> - with my own implementation of Cincom XTream ideas
> I licensed MIT, even IOHandle because these are large chunks of
> existing Squeak source. David is it OK ?

I hereby release all IOHandle code under the MIT license.

Dave

p. s. "XTream" -- what a great name!



Re: Re: squeak XTream

Nicolas Cellier
In reply to this post by Colin Putney
2009/12/2 Colin Putney <[hidden email]>:

>
> On 1-Dec-09, at 9:55 PM, Andreas Raab wrote:
>
>>
>> file := FileHandle open: 'file.txt' mode: 'rb'.
>> stream := UTF8EncodingXTream on: BufferedXtream on: FileXtream on: file.
>
> For Filesystem, I've been working on something like this:
>
> stream := aReference writeStream
>        encoding: #utf8;
>        buffer: 1024;
>        yourself
>
> The stream is responsible for managing the pipeline between its self and the
> handle.
>
> If Nicolas is writing a separate Xtreams package, though, it's going to
> overlap a lot with Filesystem. Perhaps I should just depend on Xtreams,
> rather than duplicate the functionality.
>
> Colin
>
>

Xtream is not functional yet, it is just a three-evening shot.
Pipelines especially are quite tricky with a forked process... I have
to rest a bit and think.
This kind of implementation natively has good parallelism properties;
unfortunately this won't exploit multiple cores/processors any time soon
in Smalltalk...
A few months ago, I implemented a simple Wrapper-like scheme, but was
not satisfied with end-of-stream handling. Both EndOfStream exception
capture and atEnd tests are expensive when processing elements 1 by 1.
Maybe I'll have to go back to such a simpler scheme though.
Definitely, we should exchange code/ideas.

Nicolas


Re: Re: squeak XTream

David T. Lewis
On Wed, Dec 02, 2009 at 05:26:53PM +0100, Nicolas Cellier wrote:
>
> Xtream is not functional yet, it is just a three evenings shot.
> Especially pipelines are quite tricky with a forked process... I got
> to rest a bit and think.

Are you referring to pipelines in the sense of unix command pipelines?
This is indeed tricky, and requires for sure that streams on pipes
be set to non-blocking mode, otherwise you lock up the VM (*). If
you have the CommandShell package loaded, you may want to look at
the tests in category "testing-pipelines" in CommandShellTestCase
for examples.

In CommandShell, the necessary methods are just hacked into subclasses
of StandardFileStream. The basic approach is to set non-blocking
mode for pipe reads, then use AIO notification to signal data
available. The process that is reading data from the pipe waits on
a semaphore, then reads the available data. This prevents the VM
lockup problem, and allows lots of processes to concurrently read
from pipe streams.
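
Roughly, the reader side has this shape (the names below are made up for
illustration; the real code is in CommandShell and OSProcess):

    readerLoop
        "Sketch only. An AIO event handler signals dataAvailableSemaphore
        when the pipe has data; we then read whatever is buffered without
        blocking the whole VM."
        [pipeIsClosed] whileFalse: [
            dataAvailableSemaphore wait.
            self handleData: self readAvailableData]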

I'm sure the StandardFileStream hack is not what you want, but it
might provide some ideas.

(*) Of course if you use threads in the VM, this would be different,
but that is another subject I think.

Dave



Re: Re: squeak XTream

Eliot Miranda-2
In reply to this post by Nicolas Cellier


On Wed, Dec 2, 2009 at 8:26 AM, Nicolas Cellier <[hidden email]> wrote:
> 2009/12/2 Colin Putney <[hidden email]>:
>>
>> On 1-Dec-09, at 9:55 PM, Andreas Raab wrote:
>>
>>>
>>> file := FileHandle open: 'file.txt' mode: 'rb'.
>>> stream := UTF8EncodingXTream on: BufferedXtream on: FileXtream on: file.
>>
>> For Filesystem, I've been working on something like this:
>>
>> stream := aReference writeStream
>>        encoding: #utf8;
>>        buffer: 1024;
>>        yourself
>>
>> The stream is responsible for managing the pipeline between its self and the
>> handle.
>>
>> If Nicolas is writing a separate Xtreams package, though, it's going to
>> overlap a lot with Filesystem. Perhaps I should just depend on Xtreams,
>> rather than duplicate the functionality.
>>
>> Colin
>
> Xtream is not functional yet, it is just a three evenings shot.
> Especially pipelines are quite tricky with a forked process... I got
> to rest a bit and think.
> This kind of implementation natively has good parallelism properties,
> unfortunately this won't exploit multi-core/processors any time soon
> in Smalltalk...
> A few month ago, I implemented a simple Wrapper-like scheme, but was
> not satisfied with end of stream handling. Both EndOfStream exception
> capture and atEnd tests are expensive when processing elements 1 by 1.
> Maybe I'll have to turn to such a more simple scheme though.
> Definitely, we should exchange code/ideas.

The scheme that works is to use an endOfStreamValue, which can be given a value (e.g. the stream itself) that causes EndOfStream to be raised at end of stream, e.g.

pastEnd
    ^endOfStreamValue == self
        ifTrue: [(EndOfStream for: self) signal]
        ifFalse: [endOfStreamValue]

Then the only argument is over what the default should be.  Providing convenience methods on the class side can take the pain out of that.
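
e.g. something along these lines (just a sketch; the selector names are up
for debate, and ReadStream here is only a placeholder):

    ReadStream class >> on: aCollection endOfStreamValue: anObject
        ^(self on: aCollection)
            endOfStreamValue: anObject;
            yourself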

N.B.  The EndOfStream exception has no inst var or accessor for the stream on which it is raised.  This is a bug :)

> Nicolas





Re: Re: squeak XTream

Eliot Miranda-2
In reply to this post by David T. Lewis


On Wed, Dec 2, 2009 at 9:13 AM, David T. Lewis <[hidden email]> wrote:
> On Wed, Dec 02, 2009 at 05:26:53PM +0100, Nicolas Cellier wrote:
>>
>> Xtream is not functional yet, it is just a three evenings shot.
>> Especially pipelines are quite tricky with a forked process... I got
>> to rest a bit and think.

> Are you referring to pipelines in the sense of unix command pipelines?
> This is indeed tricky, and requires for sure that streams on pipes
> be set to non-blocking mode, otherwise you lock up the VM (*). If
> you have the CommandShell package loaded, you may want to look at
> the tests in category "testing-pipelines" in CommandShellTestCase
> for examples.
>
> In CommandShell, the necessary methods are just hacked into subclasses
> of StandardFileStream. The basic approach is to set non-blocking
> mode for pipe reads, then use AIO notification to signal data
> available. The process that is reading data from the pipe waits on
> a semaphore, then reads the available data. This prevents the VM
> lockup problem, and allows lots of processes to concurrently read
> from pipe streams.
>
> I'm sure the StandardFileStream hack is not what you want, but it
> might provide some ideas.
>
> (*) Of course if you use threads in the VM, this would be different,
> but that is another subject I think.

Yes it is, but FYI it is fixed in the threaded VM.  So hopefully this issue will go away soon :)
 

> Dave






Re: Re: squeak XTream

Eliot Miranda-2
In reply to this post by Colin Putney
Hi Colin,

On Wed, Dec 2, 2009 at 6:46 AM, Colin Putney <[hidden email]> wrote:

> On 1-Dec-09, at 9:55 PM, Andreas Raab wrote:
>
>> file := FileHandle open: 'file.txt' mode: 'rb'.
>> stream := UTF8EncodingXTream on: BufferedXtream on: FileXtream on: file.
>
> For Filesystem, I've been working on something like this:
>
> stream := aReference writeStream
>        encoding: #utf8;
>        buffer: 1024;
>        yourself

please make that accessor bufferSize: :)  I could well imagine wanting to pass in a specific buffer (e.g. one that records access history for debugging) and would want to use buffer: as the accessor :)
 

> The stream is responsible for managing the pipeline between itself and the handle.
>
> If Nicolas is writing a separate Xtreams package, though, it's going to overlap a lot with Filesystem. Perhaps I should just depend on Xtreams, rather than duplicate the functionality.
>
> Colin





Re: Re: squeak XTream

Colin Putney

On 2-Dec-09, at 9:22 AM, Eliot Miranda wrote:

> stream := aReference writeStream
>        encoding: #utf8;
>        buffer: 1024;
>        yourself
>
> please make that accessor bufferSize: :)  I could well imagine  
> wanting to pass in a specific buffer (e.g. one that records access  
> history for debugging) and would want to use buffer: as the  
> accessor :)

Sure, #bufferSize: is certainly more expressive.


Re: Re: squeak XTream

Colin Putney
In reply to this post by Nicolas Cellier

On 2-Dec-09, at 8:26 AM, Nicolas Cellier wrote:

> Xtream is not functional yet, it is just a three evenings shot.

At the rate you're going, you'll have caught up to the functionality  
in Filesystem in no time. ;-)

> Especially pipelines are quite tricky with a forked process... I got
> to rest a bit and think.
> This kind of implementation natively has good parallelism properties,
> unfortunately this won't exploit multi-core/processors any time soon
> in Smalltalk...

I'm guessing you mean running each stage of the stream in a separate  
Smalltalk Process, using the Pipes and Filters pattern?
Stephen Pair did some neat stuff with that a few years ago. It's  
indeed tricky. I wonder if it's worth it in this case, though, exactly  
because Smalltalk doesn't exploit multiprocessors. Flow of control  
inside a stream might be complicated without parallelism, but it's  
probably easier to debug.

> A few month ago, I implemented a simple Wrapper-like scheme, but was
> not satisfied with end of stream handling. Both EndOfStream exception
> capture and atEnd tests are expensive when processing elements 1 by 1.
> Maybe I'll have to turn to such a more simple scheme though.

I don't understand the issue with EndOfStream exceptions. Throwing and
catching an exception is expensive, yes, but that should happen only
once, right? Unless you're setting up an exception handler inside a loop,
the expense of a single exception shouldn't be a problem.
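
What I have in mind is the usual shape, with the single handler outside the
loop (a sketch; EndOfStream and #process: are just stand-in names):

    [[self process: stream next] repeat]
        on: EndOfStream
        do: [:ex | ex return]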

> Definitely, we should exchange code/ideas.

Agreed. We may find that doing parallel development with lots of cross-
pollination is the best way to explore the design space.

Colin


Re: Re: squeak XTream

Eliot Miranda-2


On Wed, Dec 2, 2009 at 10:49 AM, Colin Putney <[hidden email]> wrote:

> On 2-Dec-09, at 8:26 AM, Nicolas Cellier wrote:
>
>> Xtream is not functional yet, it is just a three evenings shot.
>
> At the rate you're going, you'll have caught up to the functionality in Filesystem in no time. ;-)
>
>> Especially pipelines are quite tricky with a forked process... I got
>> to rest a bit and think.
>> This kind of implementation natively has good parallelism properties,
>> unfortunately this won't exploit multi-core/processors any time soon
>> in Smalltalk...
>
> I'm guessing you mean running each stage of the stream in a separate Smalltalk Process, using the Pipes and Filters pattern?
> Stephen Pair did some neat stuff with that a few years ago. It's indeed tricky. I wonder if it's worth it in this case, though, exactly because Smalltalk doesn't exploit multiprocessors. Flow of control inside a stream might be complicated without parallelism, but it's probably easier to debug.
>
>> A few month ago, I implemented a simple Wrapper-like scheme, but was
>> not satisfied with end of stream handling. Both EndOfStream exception
>> capture and atEnd tests are expensive when processing elements 1 by 1.
>> Maybe I'll have to turn to such a more simple scheme though.
>
> I don't understand the issue with EndOfStream exceptions. Throwing and catching an exception is expensive, yes, but that should happen only once, right? Unless you're setting up exception handler inside a loop, the expense of a single exception shouldn't be a problem.

Exception search and delivery is, uh, /expensive/.  The cost of propagating an EndOfStream exception to its defaultAction and returning nil is huge compared to simply answering an end-of-stream value.  So unless one really wants exception handling one should strive to avoid raising EndOfStream exceptions at end of stream.

I think in VisualWorks we noticed the extreme expense in the ChangeList scanner where one is creating lots of streams on strings corresponding to each chunk.  The end-of-stream exceptions on all these streams when doing something like scanning a changes file would add up to a significant percentage of the entire parse time.  So believe me, it does add up.
 


>> Definitely, we should exchange code/ideas.
>
> Agreed. We may find that doing parallel development with lots of cross-pollination is the best way to explore the design space.
>
> Colin





Re: Re: squeak XTream

Nicolas Cellier
In reply to this post by Colin Putney
2009/12/2 Colin Putney <[hidden email]>:
>
> On 2-Dec-09, at 8:26 AM, Nicolas Cellier wrote:
>
>> Xtream is not functional yet, it is just a three evenings shot.
>
> At the rate you're going, you'll have caught up to the functionality in
> Filesystem in no time. ;-)
>

Hmm, I need to rest too...

>> Especially pipelines are quite tricky with a forked process... I got
>> to rest a bit and think.
>> This kind of implementation natively has good parallelism properties,
>> unfortunately this won't exploit multi-core/processors any time soon
>> in Smalltalk...
>
> I'm guessing you mean running each stage of the stream in a separate
> Smalltalk Process, using the Pipes and Filters pattern?
> Stephen Pair did some neat stuff with that a few years ago. It's indeed
> tricky. I wonder if it's worth it in this case, though, exactly because
> Smalltalk doesn't exploit multiprocessors. Flow of control inside a stream
> might be complicated without parallelism, but it's probably easier to debug.
>

Yes

>> A few month ago, I implemented a simple Wrapper-like scheme, but was
>> not satisfied with end of stream handling. Both EndOfStream exception
>> capture and atEnd tests are expensive when processing elements 1 by 1.
>> Maybe I'll have to turn to such a more simple scheme though.
>
> I don't understand the issue with EndOfStream exceptions. Throwing and
> catching an exception is expensive, yes, but that should happen only once,
> right? Unless you're setting up exception handler inside a loop, the expense
> of a single exception shouldn't be a problem.
>

Yes, my naive implementation of stream wrappers did have to handle the
exception inside the loop.
I don't remember why...
Moreover, I first used Notification, and that was dangerous because an
uncaught Notification can hit an upper handler.
The case where the number of elements changes through a Wrapper can be
tricky (think of a select: operation).
Maybe I should take a fresh look at it now.

Now I would handle it with an endOfStreamAction anyway...

somewhere in initialization:
    endOfStreamMark := Object new.
    source endOfStreamAction: nil -> endOfStreamMark.

and then:

next
    | anElement |
    ^(anElement := source next) == endOfStreamMark
        ifTrue: [endOfStreamAction value]
        ifFalse: [anElement]

My own endOfStreamAction has presumably been set higher up in the chain.

Nicolas

>> Definitely, we should exchange code/ideas.
>
> Agreed. We may find that doing parallel development with lots of
> cross-pollination is the best way to explore the design space.
>
> Colin
>
>

One day, we may converge :)

Nicolas


Re: Re: squeak XTream

Nicolas Cellier
In reply to this post by Eliot Miranda-2
2009/12/2 Eliot Miranda <[hidden email]>:

>
>
> On Wed, Dec 2, 2009 at 10:49 AM, Colin Putney <[hidden email]> wrote:
>>
>> On 2-Dec-09, at 8:26 AM, Nicolas Cellier wrote:
>>
>>> Xtream is not functional yet, it is just a three evenings shot.
>>
>> At the rate you're going, you'll have caught up to the functionality in
>> Filesystem in no time. ;-)
>>
>>> Especially pipelines are quite tricky with a forked process... I got
>>> to rest a bit and think.
>>> This kind of implementation natively has good parallelism properties,
>>> unfortunately this won't exploit multi-core/processors any time soon
>>> in Smalltalk...
>>
>> I'm guessing you mean running each stage of the stream in a separate
>> Smalltalk Process, using the Pipes and Filters pattern?
>> Stephen Pair did some neat stuff with that a few years ago. It's indeed
>> tricky. I wonder if it's worth it in this case, though, exactly because
>> Smalltalk doesn't exploit multiprocessors. Flow of control inside a stream
>> might be complicated without parallelism, but it's probably easier to debug.
>>
>>> A few month ago, I implemented a simple Wrapper-like scheme, but was
>>> not satisfied with end of stream handling. Both EndOfStream exception
>>> capture and atEnd tests are expensive when processing elements 1 by 1.
>>> Maybe I'll have to turn to such a more simple scheme though.
>>
>> I don't understand the issue with EndOfStream exceptions. Throwing and
>> catching an exception is expensive, yes, but that should happen only once,
>> right? Unless you're setting up exception handler inside a loop, the expense
>> of a single exception shouldn't be a problem.
>
> Exception search and delivery is, uh, /expensive/.  The cost of propagating
> an EndOfStream exception to its defaultAction and returning nil is huge
> compared to simply answering an end-of-stream value.  So unless one really
> wants exception handling one should strive to avoid raising EndOfStream
> exceptions at end of stream.
> I think in VisualWorks we noticed the extreme expense in the ChangeList
> scanner where one is creating lots of streams on strings corresponding to
> each chunk.  The end-of-stream exceptions on all these streams when doing
> something like scanning a changes file would add up to a significant
> percentage of the entire parse time.  So believe me, it does add up.
>

Yes, I arrived at the same conclusion:
an EndOfStream exception beats an atEnd test quite soon, even for small
collections, but it is worse than an == endMark test except for very big
collections (large files).

A Notification with a default ^nil action is the worst thing possible, both
for efficiency (a whole stack walk) and for not scaling well in
complexity (an upper stream catching an uncaught notification that
should have returned nil in a low-level function using streams...).

I arrived at a similar conclusion, but prefer an endOfStreamAction to an
endOfStreamValue because I like to be able to use a home (non-local) return
sometimes:
    stream endOfStreamAction: [^self]
and I get the endOfStreamValue behaviour at not much higher cost:
    stream endOfStreamAction: nil -> endOfStreamValue
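
The comparison is easy to reproduce on a plain ReadStream in a workspace
(a rough sketch; nil plays the endMark role here, since that is what #next
answers past the end):

    | data s1 s2 t1 t2 |
    data := String new: 1000000 withAll: $x.
    s1 := ReadStream on: data.
    t1 := [[s1 atEnd] whileFalse: [s1 next]] timeToRun.   "atEnd test per element"
    s2 := ReadStream on: data.
    t2 := [[s2 next == nil] whileFalse] timeToRun.        "== endMark test per element"
    {t1. t2}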

Cheers

Nicolas

>>
>>> Definitely, we should exchange code/ideas.
>>
>> Agreed. We may find that doing parallel development with lots of
>> cross-pollination is the best way to explore the design space.
>>
>> Colin
>>
>
>
>
>
>


Re: Re: squeak XTream

Igor Stasenko
2009/12/2 Nicolas Cellier <[hidden email]>:

> 2009/12/2 Eliot Miranda <[hidden email]>:
>>
>>
>> On Wed, Dec 2, 2009 at 10:49 AM, Colin Putney <[hidden email]> wrote:
>>>
>>> On 2-Dec-09, at 8:26 AM, Nicolas Cellier wrote:
>>>
>>>> Xtream is not functional yet, it is just a three evenings shot.
>>>
>>> At the rate you're going, you'll have caught up to the functionality in
>>> Filesystem in no time. ;-)
>>>
>>>> Especially pipelines are quite tricky with a forked process... I got
>>>> to rest a bit and think.
>>>> This kind of implementation natively has good parallelism properties,
>>>> unfortunately this won't exploit multi-core/processors any time soon
>>>> in Smalltalk...
>>>
>>> I'm guessing you mean running each stage of the stream in a separate
>>> Smalltalk Process, using the Pipes and Filters pattern?
>>> Stephen Pair did some neat stuff with that a few years ago. It's indeed
>>> tricky. I wonder if it's worth it in this case, though, exactly because
>>> Smalltalk doesn't exploit multiprocessors. Flow of control inside a stream
>>> might be complicated without parallelism, but it's probably easier to debug.
>>>
>>>> A few month ago, I implemented a simple Wrapper-like scheme, but was
>>>> not satisfied with end of stream handling. Both EndOfStream exception
>>>> capture and atEnd tests are expensive when processing elements 1 by 1.
>>>> Maybe I'll have to turn to such a more simple scheme though.
>>>
>>> I don't understand the issue with EndOfStream exceptions. Throwing and
>>> catching an exception is expensive, yes, but that should happen only once,
>>> right? Unless you're setting up exception handler inside a loop, the expense
>>> of a single exception shouldn't be a problem.
>>
>> Exception search and delivery is, uh, /expensive/.  The cost of propagating
>> an EndOfStream exception to its defaultAction and returning nil is huge
>> compared to simply answering an end-of-stream value.  So unless one really
>> wants exception handling one should strive to avoid raising EndOfStream
>> exceptions at end of stream.
>> I think in VisualWorks we noticed the extreme expense in the ChangeList
>> scanner where one is creating lots of streams on strings corresponding to
>> each chunk.  The end-of-stream exceptions on all these streams when doing
>> something like scanning a changes file would add up to a significant
>> percentage of the entire parse time.  So believe me, it does add up.
>>
>
> Yes, I arrived to same conlcusion:
> Exception is better than atEnd test quite soon for small collection.
> Exception is worse than == endMark test except for very big
> collections (large files).
>
> Notification with default ^nil action is the worse thing possible both
> for efficiency (whole stack walk) and for not scaling well in
> complexity (an upper stream catching an un-caught notification that
> should have returned nil in a low level function using streams...)
>
> I arrived to similar conclusion but prefer an endOfStreamAction to an
> endOfStreamValue because I like to be able to use a home return
> sometimes
>    stream endOfStreamAction: [^self]
> and I have the endOfStreamValue at not much higher cost:
>    stream endOfStreamAction: nil->endOfStreamValue
>

Yes. Using some state ivar (be it endOfStreamAction or
endOfStreamValue) and sending #value to it when meeting the end of the
stream is the most flexible & least expensive thing I can imagine.
A stream user could then put a non-local return in the block, signal an
exception, or simply answer nil... whatever he may desire.
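
It works because both blocks and Associations (the nil -> value trick)
understand #value; for example, both of these evaluate to #pastEnd:

    [#pastEnd] value.          "a block evaluates to its result"
    (nil -> #pastEnd) value.   "an Association answers its value part"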

> Cheers
>
> Nicolas
>
>>>
>>>> Definitely, we should exchange code/ideas.
>>>
>>> Agreed. We may find that doing parallel development with lots of
>>> cross-pollination is the best way to explore the design space.
>>>
>>> Colin
>>>
>>
>>
>>
>>
>>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: Re: squeak XTream

Levente Uzonyi-2
In reply to this post by Andreas.Raab
On Tue, 1 Dec 2009, Andreas Raab wrote:

> Well, I'm still in the business of shopping for a faster version of
> FileStream ;-)
>

I uploaded two packages to the inbox (Files-ul.37 and Multilingual-ul.68)
which aim to speed up the read performance of FileStreams. Files-ul.37
adds read buffering to StandardFileStream and subclasses, while
Multilingual-ul.68 inlines a few methods. With these changes I measured
~13.5x speedup for:

Smalltalk garbageCollect.
[ FileStream readOnlyFileNamed: 'yourlargefile.txt' do: [ :file |
  [ file next == nil ] whileFalse ] ] timeToRun

and ~39.5x speedup for:

Smalltalk garbageCollect.
[ StandardFileStream readOnlyFileNamed: 'yourlargefile.txt' do: [ :file |
  [ file next == nil ] whileFalse ] ] timeToRun

(where yourlargefile.txt is a 1 MB file with random data)
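
The idea behind the read buffering is roughly the following (a generic
sketch, not the actual code in Files-ul.37; buffer, position, limit and
refillBuffer are assumed names):

    next
        "Serve the next character from an in-image buffer, refilling it
        from the file primitive only when it runs out."
        position >= limit ifTrue: [self refillBuffer].
        limit = 0 ifTrue: [^nil].   "the refill read nothing: end of file"
        position := position + 1.
        ^buffer at: position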

The code is far from clean, some tests fail (15 of 37 BitmapStreamTests),
but it seems to be working (though I'm sure there are a few bugs). I would
be happy if someone could find out why the tests fail. Also a better
test suite would be helpful.

Note that as soon as a write is performed the read buffer is killed, so
mixed read/write behavior is probably slower than before. Also note that
loading the changes may harm your image/changes file.


Levente


Re: Re: squeak XTream

Stephen Pair
In reply to this post by Colin Putney
On Wed, Dec 2, 2009 at 1:49 PM, Colin Putney <[hidden email]> wrote:

> On 2-Dec-09, at 8:26 AM, Nicolas Cellier wrote:
>
>> Especially pipelines are quite tricky with a forked process... I got
>> to rest a bit and think.
>> This kind of implementation natively has good parallelism properties,
>> unfortunately this won't exploit multi-core/processors any time soon
>> in Smalltalk...
>
> I'm guessing you mean running each stage of the stream in a separate Smalltalk Process, using the Pipes and Filters pattern?
> Stephen Pair did some neat stuff with that a few years ago. It's indeed tricky. I wonder if it's worth it in this case, though, exactly because Smalltalk doesn't exploit multiprocessors. Flow of control inside a stream might be complicated without parallelism, but it's probably easier to debug.

Some time ago I implemented dynamic bindings, which give you the ability to get and set process-local (or more accurately, stack-local) variables.  I had also implemented a SharedStream class that firmed up the semantics of underflow vs. end of stream, as well as overflow (if you wanted to cap the size of the buffer).  It was process-safe (as the name implies) and had blocking and non-blocking protocols.  It could handle any kind of object and was efficient for bytes and characters.

I later read a paper on a more expressive model for concurrency (can't recall the author) that discussed conjunctive and disjunctive joints and I created an implementation.  With the protocol I chose, you could do something like:

   ([a doSomething] | [b doSomething]) value

That would evaluate both blocks concurrently and answer the first one to complete (terminating the laggard).  The stack is stitched together in such a way that an exception signaled in either block would propagate out and could be handled naturally with an on:do:; when in a handler, both of the child processes would be suspended.  This was the disjunctive joint.  Similarly, you could do:

   ([a doSomething] & [b doSomething]) value

This would evaluate both blocks concurrently, but not return an answer until both completed (and it would return the answer of the last block).  It has the same exception handling semantics.  This was the conjunctive joint.  You could also chain as many blocks as you wanted to run them concurrently.

From there it was a natural extension to want to do something like:

   (someString >> [x grep: 'foo'] >> [x sort]) value

Where >> represented a pipe (in the unix sense) and used SharedStream and the concurrency work to connect up the processes.  I used dynamic bindings to define stdin and stdout for each of the processes.  With some contortions I managed to reduce it to something like:

   (someString >> (Grep with: 'foo') >> Sort) value.

I imagine that with implicit self sends (à la Self or Newspeak), it could be reduced even further:

   (someString >> grep: 'foo' >> sort) value

- Stephen