#skipSeparators doesn't work for MultiByteBinaryOrTextStream

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

#skipSeparators doesn't work for MultiByteBinaryOrTextStream

Mariano Martinez Peck
Hi guys. I found a problem when exporting/importing chunks with MultiByteBinaryOrTextStream and I think #skipSeparators is wrong. 
The code is:

MultiByteBinaryOrTextStream  >> skipSeparators

[self atEnd] whileFalse: [
self basicNext isSeparator ifFalse: [
^ self position: self position - 1]]

Problem is that the position is not save, as it is done in

MultiByteFileStream >> skipSeparators

| state character |
state := converter saveStateOf: self.
(character := self next) 
ifNil: [ false ]
ifNotNil: [ character isSeparator ] ] whileTrue.
character ifNotNil: [ 
converter restoreStateOf: self with: state ] 


If I copy the implementation from MultiByteFileStream to MultiByteBinaryOrTextStream, my tests work as expected. 
But I am not a stream expert. So, what do you think? is this change correct?

Thanks,

--
Mariano
http://marianopeck.wordpress.com

Reply | Threaded
Open this post in threaded view
|

Re: #skipSeparators doesn't work for MultiByteBinaryOrTextStream

Sven Van Caekenberghe-2
Mariano,

What would be wrong with something more elegant, like

consumeWhitespace
        "Strip whitespaces from the input stream."

        [ readStream atEnd not and: [ readStream peek isSeparator ] ]
                whileTrue: [
                        readStream next ]

?

In my book #peek, which means a one-element pushback buffer, is a whole lot less of a requirement that a general #position[:] interface which actually assumes that all contents is in memory, a contradiction to the concept of streaming.

Maybe these uglier versions are more efficient, but who can say for sure they are correct ?

;-)

Sven

On 06 Sep 2012, at 10:26, Mariano Martinez Peck <[hidden email]> wrote:

> Hi guys. I found a problem when exporting/importing chunks with MultiByteBinaryOrTextStream and I think #skipSeparators is wrong.
> The code is:
>
> MultiByteBinaryOrTextStream  >> skipSeparators
>
> [self atEnd] whileFalse: [
> self basicNext isSeparator ifFalse: [
> ^ self position: self position - 1]]
>
> Problem is that the position is not save, as it is done in
>
> MultiByteFileStream >> skipSeparators
>
> | state character |
> [
> state := converter saveStateOf: self.
> (character := self next)
> ifNil: [ false ]
> ifNotNil: [ character isSeparator ] ] whileTrue.
> character ifNotNil: [
> converter restoreStateOf: self with: state ]
>
>
> If I copy the implementation from MultiByteFileStream to MultiByteBinaryOrTextStream, my tests work as expected.
> But I am not a stream expert. So, what do you think? is this change correct?
>
> Thanks,
>
> --
> Mariano
> http://marianopeck.wordpress.com

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill




Reply | Threaded
Open this post in threaded view
|

Re: #skipSeparators doesn't work for MultiByteBinaryOrTextStream

Mariano Martinez Peck


On Thu, Sep 6, 2012 at 10:44 AM, Sven Van Caekenberghe <[hidden email]> wrote:
Mariano,

What would be wrong with something more elegant, like

consumeWhitespace
        "Strip whitespaces from the input stream."

        [ readStream atEnd not and: [ readStream peek isSeparator ] ]
                whileTrue: [
                        readStream next ]

?

In my book #peek, which means a one-element pushback buffer, is a whole lot less of a requirement that a general #position[:] interface which actually assumes that all contents is in memory, a contradiction to the concept of streaming.

Well, that seems to work also :)
 

Maybe these uglier versions are more efficient, but who can say for sure they are correct ?


I have no idea. I would like to fix it with any of the 2 possibilities. Everything is better than being broken. 
So any other opinion?

 
;-)

Sven

On 06 Sep 2012, at 10:26, Mariano Martinez Peck <[hidden email]> wrote:

> Hi guys. I found a problem when exporting/importing chunks with MultiByteBinaryOrTextStream and I think #skipSeparators is wrong.
> The code is:
>
> MultiByteBinaryOrTextStream  >> skipSeparators
>
>       [self atEnd] whileFalse: [
>               self basicNext isSeparator ifFalse: [
>                       ^ self position: self position - 1]]
>
> Problem is that the position is not save, as it is done in
>
> MultiByteFileStream >> skipSeparators
>
>       | state character |
>       [
>               state := converter saveStateOf: self.
>               (character := self next)
>                       ifNil: [ false ]
>                       ifNotNil: [ character isSeparator ] ] whileTrue.
>       character ifNotNil: [
>               converter restoreStateOf: self with: state ]
>
>
> If I copy the implementation from MultiByteFileStream to MultiByteBinaryOrTextStream, my tests work as expected.
> But I am not a stream expert. So, what do you think? is this change correct?
>
> Thanks,
>
> --
> Mariano
> http://marianopeck.wordpress.com

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill







--
Mariano
http://marianopeck.wordpress.com

Reply | Threaded
Open this post in threaded view
|

Re: #skipSeparators doesn't work for MultiByteBinaryOrTextStream

Henrik Sperre Johansen
In reply to this post by Sven Van Caekenberghe-2
On 06.09.2012 10:44, Sven Van Caekenberghe wrote:

> Mariano,
>
> What would be wrong with something more elegant, like
>
> consumeWhitespace
> "Strip whitespaces from the input stream."
>
> [ readStream atEnd not and: [ readStream peek isSeparator ] ]
> whileTrue: [
> readStream next ]
>
> ?
>
> In my book #peek, which means a one-element pushback buffer, is a whole lot less of a requirement that a general #position[:] interface which actually assumes that all contents is in memory, a contradiction to the concept of streaming.
Considering we are talking of PositionableStream subclasses, using the
position: protocol (also to implement peek) isn't exactly heresy. :)
Though, neglecting converter state (which in one case involves more than
position) when one does so is a bad idea.

Rewriting #peek to employ a buffer instead of position: for positionable
streams(double dispatched through saveStateOf:) purely for the sake of
not dealing with position: seems like a bad use of time imho, though
you'd lose the compexity of dd'ing through the converter, dealing with
the buffer translates into quite a few changes elsewhere.

For non-positionable streams, a buffer is really the only choice though,
and isn't _that_ complicated to deal with in face of no arbitrary
position: changes.

(Ironically in light of the above, the use cases of
CompoundTextConverter, which is the one which _does_ care about other
state than position in saveStateOf:, should never necessitate arbitary
position: changes, so it seems targetted purely at making peek work)
> Maybe these uglier versions are more efficient, but who can say for sure they are correct ?
>
> ;-)
>
> Sven
When it comes to streams on in-memory collections (like MBBOTS), using
peek shouldn't be very noticeable.
For streams with external sources, position: can be costly, inlining the
position book-keeping peek does (as MBFS>>skipSeparators does) might be
justified.
With the amount of whitespace one normally parse through, I sort of
doubt it even in that case.

Anyways, who can say for sure the peek approach is correct? ;)
As mentioned above, for MultiByteBinaryOrTextStreams using
CompoundTextConverters it won't be., since #peek in MBBOTS doesn't
account for converter state.
(Just like skipSeparators, which in addition does not account for
multibyte elements, the source of the bug Mariano ran into in the first
place, and why peek fixes his non-CompTextConv usecase)

TLDR;
- Yap, using peek is cleaner, and is a nicer implementation as long as
internally you use a buffer / position: is cheap.
- MultiByteBinaryOrTextStream>>#peek needs fixing, along the lines
Mariano originally intended for skipSeparaters wrt. copying what is done
in MultiByteFileStreams, in order to work correctly with all converters.

Cheers,
Henry