I just gave a try to the BufferedFileStream.
As usual, code is MIT.
Implementation is rough, readOnly, partial (no support for the basicNext crap et al.), untested (it certainly has bugs).
Early timing experiments have shown a 5x to 7x speedup on [stream nextLine] and [stream next] micro-benchmarks; see the class comment of the attachment.

Reminder: this benchmark is versus StandardFileStream. StandardFileStream is the "fast" version; CrLf and MultiByte are far worse! So there is still more room for improvement...

Integrating and testing a read/write version is a lot harder than this experiment, but we should really do it.

Nicolas

BufferedFileStream.st (13K)
Hello Nicolas,
thanks for taking the time to implement this idea.

Since you are going to introduce something more clever than simple-minded primitive-based file operations, I think it's worth thinking about creating separate classes for buffering/caching. Let's call it readStrategy, or writeStrategy, or cacheStrategy. The idea is to redirect all read/write/seek operations to a special layer which, depending on its implementation, could choose whether a given operation will be just a dumb primitive call or something more clever, like read-ahead etc. Then all streams (not only file streams) could be created using a chosen strategy, depending on the user's will.

About the BufferedFileStream implementation: there is some room for improvement. The cache should remember its own starting position + size; then in #skip: you simply do

  self primSetPosition: fileID to: filePosition \\ bufferSize.

without touching the buffer, because you can't predict what operation follows (it could be another #skip:, or truncate, or close), which would make your read-ahead redundant.

The cache should be refreshed only on a direct read request, when some data that needs to be read is outside the range covered by the cache. Let me illustrate the case which shows the suboptimal #skip: behavior:

  ........>........[..........<..........]........

Here, [ ] encloses the cached data, and > is the file position after a #skip: send. Then the caller wants to read bytes up to the < marker. In your case, #skip: will refresh the cache, causing part of the data which was already in the buffer to be re-read, while it is possible to reuse the already-cached data and read only the bytes between > and [ ; the rest can be delivered from the cache. Also, since after the read request the file pointer will point at the < marker, we are still inside the cache and don't need to refresh it.

2009/11/18 Nicolas Cellier <[hidden email]>:
> I just gave a try to the BufferedFileStream.
> [...]

--
Best regards,
Igor Stasenko AKA sig.
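Igor's cache-window idea can be made concrete outside Smalltalk. The following is a minimal illustrative sketch in Python, not the actual BufferedFileStream code; the class and method names are invented for the example. The key points it demonstrates are the ones from the message above: the buffer remembers its own start position and size, #skip: only moves the logical position, and the buffer is refreshed lazily, on the first read that falls outside the cached window.

```python
import io

class LazyBufferedReader:
    """Illustrative sketch of a lazy read buffer (invented names).

    skip() never touches the buffer; read() refills it only when the
    requested range is outside the cached window. Reads larger than
    buffer_size are truncated in this sketch, for brevity."""

    def __init__(self, raw, buffer_size=4096):
        self.raw = raw                  # underlying file-like object
        self.buffer_size = buffer_size
        self.pos = 0                    # logical stream position
        self.buf_start = 0              # file offset of the cached data
        self.buf = b""                  # cached bytes
        self.refills = 0                # instrumentation, for the demo only

    def skip(self, n):
        # Just move the logical position; we can't predict what comes next
        # (another skip, a close...), so reading ahead here would be wasted.
        self.pos += n

    def _fill(self):
        self.raw.seek(self.pos)
        self.buf_start = self.pos
        self.buf = self.raw.read(self.buffer_size)
        self.refills += 1

    def read(self, n):
        # Refresh only if [pos, pos+n) is not covered by the cached window.
        inside = (self.buf_start <= self.pos and
                  self.pos + n <= self.buf_start + len(self.buf))
        if not inside:
            self._fill()
        offset = self.pos - self.buf_start
        data = self.buf[offset:offset + n]
        self.pos += len(data)
        return data
```

With this shape, two consecutive skips cost nothing, and a read that lands back inside the already-cached window is served from memory without any primitive call.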
2009/11/18 Igor Stasenko <[hidden email]>:
> Hello Nicolas,
> thanks for taking a time implementing this idea.
>
> Since you are going to introduce something more clever than simple-minded
> primitive based file operations, i think its worth to think about
> creating a separate classes for buffering/caching.
> [...]
> So, then all streams (not only file stream) could be created using
> choosen strategy depending on user's will.

Yes, delegating is a very good idea. I'm quite sure other Smalltalks do that already (I did not want to be tainted, so I just kept away, reinventing my own wheel). This trial was a minimal proof of concept; it cannot decently pretend to be a clean rewrite.

> About BufferedFileStream implementation. There are some room for improvement:
> cache should remember own starting position + size
> [...]
> Also, since after read request, a file pointer will point at < marker,
> we are still inside a cache, and don't need to refresh it.

Agreed, my current buffer implementation is not lazy enough. It does read ahead before knowing whether that is really necessary :(

If I understand correctly, you would avoid throwing the buffer away until you are sure it won't be reused. I'm not sure the use cases are worth the subtle complications; two consecutive #skip: sends should be rare...

Anyway, all these tricks had better be hidden in a private policy object indeed, otherwise the future subclasses which would inevitably flourish under BufferedFileStream (the Squeak entropy) might well break this masterpiece :)

Cheers

Nicolas
2009/11/18 Nicolas Cellier <[hidden email]>:
> 2009/11/18 Igor Stasenko <[hidden email]>:
>> Hello Nicolas,
>> thanks for taking a time implementing this idea.
>> [...]
>
> Yes, delegating is a very good idea.
> Quite sure other smalltalks do that already (I did not want to be
> tainted, so just kept away, reinventing my own wheel).
> This trial was a minimal proof of concept, it cannot decently pretend
> being a clean rewrite.

But it has shown us the potential for improvements. Seriously, a 5x-7x speedup is not something we can just forget and throw away.

> Agree, my current buffer implementation is not lazy enough.
> It does read ahead before knowing if really necessary :(
>
> If I understand it, you would avoid throwing the buffer away until you
> are sure it won't be reused.
> Not sure if the use cases are worth the subtle complications. Two
> consecutive skip: should be rare...

Yes, it is rare and quite unlikely, but you caught my intent clearly: do not throw away the buffer unless it is deemed necessary. Let's keep in mind that any memory operation is orders of magnitude faster than a disk operation; moreover, the filesystem could be a remotely mounted drive, which adds even more latency to all file-based operations. So fighting that with a cache is a good strategy.

> Anyway, all these tricks should better be hidden in a private policy
> Object indeed, otherwise future subclasses which would inevitably
> flourish under BufferedFileStream (the Squeak entropy) might well
> break this masterpiece :)

Right. A separate layer makes a clean room for experiments, without the need to rewrite the whole stream class hierarchy, and especially the subclasses, where things start exploding exponentially. There should be a very thin layer based on the most simple operations (read, write, seek), with the rest of the stream interface built on top of it.

So, if we can identify this thin layer and make it pluggable, then we can be sure that at least some part of the stream library can be easily customized; and if this part works well, we can be sure the streams are in good shape, without needing to visit and test numerous methods in multiple (sub)classes, which is quite messy.

--
Best regards,
Igor Stasenko AKA sig.
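The "very thin pluggable layer" described above can be sketched like this. This is an illustrative Python sketch under the thread's assumptions, not Squeak code; all class and method names (IOStrategy, InMemoryStrategy, next_line) are invented for the example. The point is that the rich stream protocol is written entirely against the three thin-layer operations, so only the strategy needs swapping or testing.

```python
class IOStrategy:
    """The thin layer: only the most simple operations live here.
    Concrete strategies decide whether a call is a dumb primitive
    call or something cleverer (buffering, read-ahead, ...)."""
    def read(self, n): raise NotImplementedError
    def write(self, data): raise NotImplementedError
    def seek(self, pos): raise NotImplementedError

class InMemoryStrategy(IOStrategy):
    """Stands in for the 'dumb primitive call' file strategy."""
    def __init__(self, data=b""):
        self.data = bytearray(data)
        self.pos = 0
    def read(self, n):
        chunk = bytes(self.data[self.pos:self.pos + n])
        self.pos += len(chunk)
        return chunk
    def write(self, data):
        self.data[self.pos:self.pos + len(data)] = data
        self.pos += len(data)
    def seek(self, pos):
        self.pos = pos

class Stream:
    """The rest of the stream interface is built on the thin layer only:
    every rich method reduces to strategy.read/write/seek."""
    def __init__(self, strategy):
        self.strategy = strategy
    def next(self):
        return self.strategy.read(1)
    def next_line(self):
        out = bytearray()
        while True:
            c = self.strategy.read(1)
            if c in (b"", b"\n"):
                return bytes(out)
            out += c
```

Swapping InMemoryStrategy for a file-backed or socket-backed strategy (or a buffered wrapper around either) would leave Stream untouched, which is exactly the customization point the message argues for.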
On Wed, Nov 18, 2009 at 3:10 AM, Nicolas Cellier <[hidden email]> wrote:
> I just gave a try to the BufferedFileStream.

Just want to wish you every encouragement! This is *really* useful work.
2009/11/18 Eliot Miranda <[hidden email]>:
> On Wed, Nov 18, 2009 at 3:10 AM, Nicolas Cellier <[hidden email]> wrote:
>> I just gave a try to the BufferedFileStream.
>> [...]
>
> Just want to wish you every encouragement! This is *really* useful work.

Beware, I just wrote it from scratch and did not even run one single method since the read/write refactoring... So far, I have rather spent my spare time commenting the implementation (see the class comment too), in case some good souls want to analyze/try it.

It should be reasonably optimized for the readOnly and random read/write cases. For append-only it might not be optimal, due to useless attempts to read past the end, but that should not cost that much. For read/append there is probably room for more efficiency too, but a major improvement vs StandardFileStream should already show up. I'm not sure we really need to introduce these optimizations.

The path to a cleaner/faster stream library is longer than just this little step. Besides testing, we'd have to refactor the hierarchy, insulate all instance variables, and delegate as much as possible, as Igor suggested. We'd better continue on the cleaning path and not just add another FileStream subclass complexifying a bit further an unnecessarily complex library.

Nicolas

BufferedFileStream.st (18K)
On 26-Nov-09, at 2:48 PM, Nicolas Cellier wrote:

> The path to a cleaner/faster stream library is longer than just this
> little step.
> [...]
> We'd better continue on the cleaning path and not just add another
> FileStream subclass complexifying a bit more an unecessarily complex
> library.

I've been thinking about this too. For Filesystem, I've only implemented very basic stream functionality so far. But I do intend to develop its stream functionality further, and to go in a very different direction from the existing design. Some design elements:

- Using handles to decouple the streams from the storage they're operating on. The same stream class should be able to read or write to collections, sockets, files etc.

- Separating ReadStream from WriteStream. I find code that both reads and writes to a particular stream to be very rare in practice, and in the cases where it does happen, reading and writing are separate activities, so using separate streams wouldn't introduce problems. On the other hand, a lot of the complexity in the existing hierarchy stems from the mingling of read and write functionality.

- Simplified protocols. The existing stream classes have accumulated a lot of cruft that should be implemented as objects that use streams, rather than being streams themselves. Examples include fileIn, fileOut, ReferenceStream etc.

- Composition rather than inheritance. As I go about implementing string encoding, buffering, compression etc., I plan to enable the creation of stream pipelines to provide combinations of functionality. Instead of implementing a BufferedUtf8DeflateFileStream, I want to create a sequence of streams like this:

  WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle

- Grow the new streams parallel to the existing ones. Rather than trying to maintain backwards compatibility, leave the old streams in place and continue to improve them while the new ones are being developed. Migration to the new streams can happen gradually. If the new streams don't attract any users, obviously I'm on the wrong track. :-)

So I've been watching your cleanup efforts with interest, particularly the buffering stuff. Keep it up!

Colin
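The pipeline idea above can be sketched concretely. This is an illustrative Python sketch, not the proposed Smalltalk code; the stage names mirror the diagram, BytesSink stands in for the Handle, and everything else is invented for the example. Each stage exposes the same tiny write/close protocol and wraps the next stage downstream.

```python
import zlib

class Utf8Encoder:
    """Turns text into UTF-8 bytes, then hands them downstream."""
    def __init__(self, downstream): self.down = downstream
    def write(self, text): self.down.write(text.encode("utf-8"))
    def close(self): self.down.close()

class DeflateCompressor:
    """Compresses the byte stream with zlib/deflate."""
    def __init__(self, downstream):
        self.down = downstream
        self.z = zlib.compressobj()
    def write(self, data): self.down.write(self.z.compress(data))
    def close(self):
        self.down.write(self.z.flush())  # emit any pending compressed bytes
        self.down.close()

class Buffer:
    """Accumulates bytes and writes downstream in large chunks."""
    def __init__(self, downstream, size=4096):
        self.down = downstream
        self.size = size
        self.pending = bytearray()
    def write(self, data):
        self.pending += data
        if len(self.pending) >= self.size:
            self.down.write(bytes(self.pending))
            self.pending.clear()
    def close(self):
        self.down.write(bytes(self.pending))
        self.pending.clear()
        self.down.close()

class BytesSink:
    """Stands in for the Handle at the end of the pipeline."""
    def __init__(self):
        self.data = bytearray()
        self.closed = False
    def write(self, data): self.data += data
    def close(self): self.closed = True
```

Usage follows the diagram directly: `Utf8Encoder(DeflateCompressor(Buffer(BytesSink())))` builds WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle, and no BufferedUtf8DeflateFileStream class ever needs to exist.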
>>>>> "Nicolas" == Nicolas Cellier <[hidden email]> writes:
Nicolas> The path to a cleaner/faster stream library is longer than just this
Nicolas> little step. [...] We'd better continue on the cleaning path and not
Nicolas> just add another FileStream subclass complexifying a bit more an
Nicolas> unecessarily complex library.

Michael Lucas-Smith gave a nice talk on Xtreams at the Portland Linux Users Group. The most interesting thing out of this is the notion that #atEnd is just plain wrong. For some streams, computing #atEnd is impossible; for most streams, it's just expensive. Instead, Xtreams takes the approach that #do: suffices for most people, and for those cases where it can't, an exception when you read past the end of the stream can provide the proper exit from your loop. Then your loop can concentrate on what happens most of the time, instead of what happens rarely.

Xtreams is under a liberal license, and is currently in the Cincom public store. Instead of reinventing yet another stream package, we should be looking at Xtreams, I think. (As a side effect, Xtreams has as a test a very nice PEG parsing package... so we'd get DSLs relatively for free.)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
On 26-Nov-09, at 9:36 PM, Randal L. Schwartz wrote:

> Xtreams is under a liberal license, and is currently in the Cincom public
> store.
>
> Instead of reinventing yet another stream package, we should be looking at
> Xtreams, I think.

Very cool. We definitely need to steal ideas from them. ...and code, perhaps? I did a bit of poking around, but couldn't find anything on the web that said what the license actually is. Can you be more specific than "liberal"?

Colin
2009/11/27 Colin Putney <[hidden email]>:
> On 26-Nov-09, at 2:48 PM, Nicolas Cellier wrote:
> [...]
>
> - Separating ReadStream from WriteStream. I find code that both reads and
> writes to a particular stream to be very rare in practice, and in cases
> where it does happen, reading and writing are separate activities and using
> separate streams wouldn't introduce problems. [...]

Yes, it's mostly a read-append stream usage for the change log... However, a buffered implementation will be difficult with separate read/write buffers in the rare case where we do need read/write capabilities: writing might trash the read buffer, so they are not independent.

> - Simplified protocols. The existing stream classes have accumulated a lot
> of cruft that should be implemented as objects use streams rather than being
> streams themselves. Examples include fileIn, fileOut, RefrenceStream etc.

Yes, packaging and modularization of the core...

> - Composition rather than inheritance. [...]
>
> WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle

Agreed again.

> - Grow the new streams parallel to the existing ones. [...]
>
> So I've been watching your cleanup efforts with interest, particularly the
> buffering stuff. Keep it up!

Obviously, it's just a piece of a larger puzzle.

> Colin

Nicolas
> Nicolas> The path to a cleaner/faster stream library is longer than just this
> Nicolas> little step. [...]
>
> Michael Lucas-Smith gave a nice talk on Xtreams at the Portland Linux Users
> Group. The most interesting thing out of this is the notion that #atEnd is
> just plain wrong. [...] Then, your
> loop can concentrate on what happens most of the time, instead of what happens
> rarely.

I think we need a common superclass of Stream and Collection named Iterable, where #do: is abstract and #select:, #collect:, #reject:, #count:, #detect:, etc. (and quite a lot of the messages in the enumerating category of Collection) are implemented based on #do:.

Of course, Stream can refine the #select:/#reject: methods to answer a FilteredStream that decorates the receiver and applies the filtering on the fly. In the same way, #collect: can return a TransformedStream that decorates the receiver, etc.

Just my 2 cents.

Cheers,

-- Diego
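The Iterable idea above can be sketched quickly. This is an illustrative Python sketch, not a proposal for actual Squeak code; Smalltalk's #do:/#select:/#collect: are spelled do_/select_/collect_ here, and the Interval example class is invented. Subclasses provide only do_, and the whole enumerating protocol derives from it.

```python
from abc import ABC, abstractmethod

class Iterable(ABC):
    """Common superclass: everything derives from do_()."""

    @abstractmethod
    def do_(self, block):
        """Evaluate block for each element."""

    def select_(self, pred):
        out = []
        self.do_(lambda e: out.append(e) if pred(e) else None)
        return out

    def collect_(self, block):
        out = []
        self.do_(lambda e: out.append(block(e)))
        return out

    def count_(self, pred):
        return len(self.select_(pred))

    def detect_(self, pred):
        # Simplified: enumerates fully rather than stopping at the first
        # hit, which is the exact weakness Ralph raises further down.
        hits = self.select_(pred)
        if hits:
            return hits[0]
        raise ValueError("no element satisfies the predicate")

class Interval(Iterable):
    """A Collection-like Iterable over start..stop, inclusive."""
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
    def do_(self, block):
        for i in range(self.start, self.stop + 1):
            block(i)
```

A Stream subclass would override select_/collect_ to answer lazy decorating streams instead of eager lists, as the message suggests.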
2009/11/27 Diego Gomez Deck <[hidden email]>:
>> Nicolas> The path to a cleaner/faster stream library is longer than just this
>> Nicolas> little step. [...]
>
> I think we need a common superclass for Streams and Collection named
> Iterable where #do: is abstract and #select:, #collect:, #reject:,
> #count:, #detect:, etc (and quite a lot of the messages in enumerating
> category of Collection) are implemented based on #do:
> [...]
> Just my 2 cents.

Yes, this is the gst (GNU Smalltalk) approach, and it seems a good one.
> I think we need a common superclass for Streams and Collection named
> Iterable where #do: is abstract and #select:, #collect:, #reject:,
> #count:, #detect:, etc (and quite a lot of the messages in enumerating
> category of Collection) are implemented based on #do:
>
> Of course Stream can refine the #select:/#reject methods to answer a
> FilteredStream that decorates the receiver and apply the filtering on
> the fly. In the same way #collect: can return a TransformedStream that
> decorates the receiver, etc.

Since Stream can't reuse #select: and #collect: (or #count:; and #detect: on an infinite stream is risky), they shouldn't be in the superclass. In that case, what is its purpose?

I think it is fine to give Stream the same interface as Collection. I do this, too. But they will share very little code, and so there is no need to give them a common superclass.

-Ralph Johnson
2009/11/27 Diego Gomez Deck <[hidden email]>:

> Nicolas> The path to a cleaner/faster stream library is longer than just this
> [...]

Maybe I'm wrong, but I think traits are a good (better) solution for that kind of problem: #do: can be a required method, and you can implement the remaining methods based on #do:.
On Friday, 2009-11-27, at 06:15 -0600, Ralph Johnson wrote:
> > I think we need a common superclass for Streams and Collection named
> > Iterable where #do: is abstract and #select:, #collect:, #reject:,
> > #count:, #detect:, etc [...]
>
> Since Stream can't reuse #select: and #collect: (or #count, and
> #detect: on an infinite stream is risky),

Stream and Collection are just the two refinements of Iterable that we're talking about in this thread, but there are a lot of classes that could benefit from Iterable as a superclass.

On the other side, Stream has #do: (and the #atEnd/#next pair), and it's also risky for infinite streams. To push this discussion forward: is an InfiniteStream a real Stream?

> they shouldn't be in the
> superclass. In that case, what is its purpose?
>
> i think it is fine to give Stream the same interface as Collection. I
> do this, too. But they will share very little code, and so there is
> no need to give them a common superclass.
>
> -Ralph Johnson

Cheers,

-- Diego
2009/11/27 Diego Gomez Deck <[hidden email]>:
> El vie, 27-11-2009 a las 06:15 -0600, Ralph Johnson escribió:
>> [...]
>
> On the other side, Stream has #do: (and #atEnd/#next pair) and it's also
> risky for infinite streams. To push this discussion forward, Is
> InfiniteStream a real Stream?

#select: and #collect: are not necessarily dangerous, even on an infinite stream, once you see them as filters and implement them with lazy block evaluation: Stream select: aBlock should return a SelectStream (find a better name here :)). Then you would use it with #next, as with any other InfiniteStream.
2009/11/27 Colin Putney <[hidden email]>:
> On 26-Nov-09, at 2:48 PM, Nicolas Cellier wrote:
> [...]
>
> - Composition rather than inheritance. As I go about implementing string
> encoding, buffering, compression etc. I plan to enable the creation of
> stream pipelines to provide combinations of functionality. Instead of
> implementing BufferedUtf8DelfateFilestream, I want to create a sequence of
> streams like this:
>
> WriteStream -> Utf8Encoder -> DeflateCompressor -> Buffer -> Handle

+100. Just yesterday I was thinking about the same design principle: composition. I call it a StreamAdaptor. It should carry a minimal set of methods providing the basic operations (read/write/seek etc.), and it should also support pipelining in the same way as you illustrated above.

Let's say that initially we create a stream which works with a file:

  Stream -> FileAdaptor

Then we want it to be buffered:

  stream adaptor: (stream adaptor beBuffered)
  Stream -> BufferAdaptor -> FileAdaptor

Then we want it to be compressed:

  stream adaptor: (ZipAdaptor on: stream adaptor)
  Stream -> DeflateCompressor -> BufferAdaptor -> FileAdaptor

and so on. It is easy to see that if we want to create the same structure for a socket connection, all we need is to use a socket adaptor in the chain, while the rest doesn't require any modification.

> - Grow the new streams parallel to the existing ones. [...]
>
> So I've been watching your cleanup efforts with interest, particularly the
> buffering stuff. Keep it up!

--
Best regards,
Igor Stasenko AKA sig.
On Thu, Nov 26, 2009 at 08:56:08PM -0800, Colin Putney wrote:
> I've been thinking about this too. For Filesystem, I've only
> implemented very basic stream functionality so far. [...]
>
> - Using handles to decouple the streams from the storage they're
> operating on. The same stream class should be able to read or write to
> collections, sockets, files etc.

I implemented IOHandle for this; see http://wiki.squeak.org/squeak/996. I have not maintained it since about 2003, but the idea is straightforward. My purpose at that time was to:

* Separate the representation of external IO channels from the representation of streams and communication protocols.
* Provide a uniform representation of IO channels, similar to the Unix notion of treating everything as a 'file'.
* Simplify future refactoring of Socket and FileStream.
* Provide a place for handling asynchronous IO events. Refer to the aio handling in the Unix VM. Files, Sockets, and AsyncFiles could (should) use a common IO event handling mechanism (an aio event signaling a Smalltalk Semaphore).

Since then I have added aio event handling for files (AioPlugin, see http://wiki.squeak.org/squeak/3384), which is a layer on top of Ian's aio event handling in the Unix and OS X VMs and is mainly useful for handling Unix pipes. But I still think that a more unified view of "handles for IO channels" is a good idea. The completely separate representation of files and sockets in Squeak still feels wrong to me, maybe just because I am accustomed to Unix systems.

Dave
>>>>> "Colin" == Colin Putney <[hidden email]> writes:
Colin> ...and code, perhaps? I did a bit of poking around, but couldn't find
Colin> anything on the web that said what the license actually is. Can you be
Colin> more specific than "liberal?"

MLS made it clear at the meeting that Cincom's default release model is now "open source", except for things that are business-differentiating. In fact, they would really like to see Xtreams adopted widely, so the license would have to be MIT-like for that to happen. I'm sure if we poked Arden or James Robertson we could get a statement of the license for Xtreams rather quickly.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
On 27-Nov-09, at 8:03 AM, David T. Lewis wrote:

> I implemented IOHandle for this, see http://wiki.squeak.org/squeak/996.
> I have not maintained it since about 2003, but the idea is
> straightforward.

Yes. I looked into IOHandle when implementing Filesystem, but decided to go with a new (simpler, but limited) implementation that would let me explore the requirements for the stream architecture I had in mind.

> My purpose at that time was to:
> [...]

Indeed. Filesystem comes at this from the other direction, but I think we want to end up in the same place. For now I've done TSTTCPW, which is to use the primitives from the FilePlugin. But eventually I want to improve the plumbing. You've done some important work here; perhaps Filesystem can use AioPlugin at some point.

Colin