Hi guys. I found a problem when exporting/importing chunks with MultiByteBinaryOrTextStream and I think #skipSeparators is wrong.
The code is: MultiByteBinaryOrTextStream >> skipSeparators
[self atEnd] whileFalse: [ self basicNext isSeparator ifFalse: [
^ self position: self position - 1]] Problem is that the position is not save, as it is done in MultiByteFileStream >> skipSeparators
| state character | [ state := converter saveStateOf: self.
(character := self next) ifNil: [ false ] ifNotNil: [ character isSeparator ] ] whileTrue.
character ifNotNil: [ converter restoreStateOf: self with: state ] If I copy the implementation from MultiByteFileStream to MultiByteBinaryOrTextStream, my tests work as expected. But I am not a stream expert. So, what do you think? is this change correct?
Thanks, Mariano http://marianopeck.wordpress.com |
Mariano,
What would be wrong with something more elegant, like consumeWhitespace "Strip whitespaces from the input stream." [ readStream atEnd not and: [ readStream peek isSeparator ] ] whileTrue: [ readStream next ] ? In my book #peek, which means a one-element pushback buffer, is a whole lot less of a requirement that a general #position[:] interface which actually assumes that all contents is in memory, a contradiction to the concept of streaming. Maybe these uglier versions are more efficient, but who can say for sure they are correct ? ;-) Sven On 06 Sep 2012, at 10:26, Mariano Martinez Peck <[hidden email]> wrote: > Hi guys. I found a problem when exporting/importing chunks with MultiByteBinaryOrTextStream and I think #skipSeparators is wrong. > The code is: > > MultiByteBinaryOrTextStream >> skipSeparators > > [self atEnd] whileFalse: [ > self basicNext isSeparator ifFalse: [ > ^ self position: self position - 1]] > > Problem is that the position is not save, as it is done in > > MultiByteFileStream >> skipSeparators > > | state character | > [ > state := converter saveStateOf: self. > (character := self next) > ifNil: [ false ] > ifNotNil: [ character isSeparator ] ] whileTrue. > character ifNotNil: [ > converter restoreStateOf: self with: state ] > > > If I copy the implementation from MultiByteFileStream to MultiByteBinaryOrTextStream, my tests work as expected. > But I am not a stream expert. So, what do you think? is this change correct? > > Thanks, > > -- > Mariano > http://marianopeck.wordpress.com -- Sven Van Caekenberghe http://stfx.eu Smalltalk is the Red Pill |
On Thu, Sep 6, 2012 at 10:44 AM, Sven Van Caekenberghe <[hidden email]> wrote: Mariano, Well, that seems to work also :)
I have no idea. I would like to fix it with any of the 2 possibilities. Everything is better than being broken. So any other opinion? ;-) Mariano http://marianopeck.wordpress.com |
In reply to this post by Sven Van Caekenberghe-2
On 06.09.2012 10:44, Sven Van Caekenberghe wrote:
> Mariano, > > What would be wrong with something more elegant, like > > consumeWhitespace > "Strip whitespaces from the input stream." > > [ readStream atEnd not and: [ readStream peek isSeparator ] ] > whileTrue: [ > readStream next ] > > ? > > In my book #peek, which means a one-element pushback buffer, is a whole lot less of a requirement that a general #position[:] interface which actually assumes that all contents is in memory, a contradiction to the concept of streaming. position: protocol (also to implement peek) isn't exactly heresy. :) Though, neglecting converter state (which in one case involves more than position) when one does so is a bad idea. Rewriting #peek to employ a buffer instead of position: for positionable streams(double dispatched through saveStateOf:) purely for the sake of not dealing with position: seems like a bad use of time imho, though you'd lose the compexity of dd'ing through the converter, dealing with the buffer translates into quite a few changes elsewhere. For non-positionable streams, a buffer is really the only choice though, and isn't _that_ complicated to deal with in face of no arbitrary position: changes. (Ironically in light of the above, the use cases of CompoundTextConverter, which is the one which _does_ care about other state than position in saveStateOf:, should never necessitate arbitary position: changes, so it seems targetted purely at making peek work) > Maybe these uglier versions are more efficient, but who can say for sure they are correct ? > > ;-) > > Sven When it comes to streams on in-memory collections (like MBBOTS), using peek shouldn't be very noticeable. For streams with external sources, position: can be costly, inlining the position book-keeping peek does (as MBFS>>skipSeparators does) might be justified. With the amount of whitespace one normally parse through, I sort of doubt it even in that case. Anyways, who can say for sure the peek approach is correct? ;) As mentioned above, for MultiByteBinaryOrTextStreams using CompoundTextConverters it won't be., since #peek in MBBOTS doesn't account for converter state. (Just like skipSeparators, which in addition does not account for multibyte elements, the source of the bug Mariano ran into in the first place, and why peek fixes his non-CompTextConv usecase) TLDR; - Yap, using peek is cleaner, and is a nicer implementation as long as internally you use a buffer / position: is cheap. - MultiByteBinaryOrTextStream>>#peek needs fixing, along the lines Mariano originally intended for skipSeparaters wrt. copying what is done in MultiByteFileStreams, in order to work correctly with all converters. Cheers, Henry |
Free forum by Nabble | Edit this page |