Re: New Files in Pharo - Migration Guide, How To's and examples


Richard O'Keefe
The idea of stacking things like character coding and buffering &c
by using layers of wrapping is very much in the spirit of Java and
is neither good for performance nor helpful for correctness.


On 20 March 2018 at 05:19, Guillermo Polito <[hidden email]> wrote:
Hi all,

I've written up some notes summarizing the new APIs provided by the combination of the new File implementation and the Zn encoders. They basically all follow the decorator pattern, stacking responsibilities such as buffering, encoding, and line-ending conversion.

Please, do not hesitate to give your feedback.

Guille


1. Basic Files

By default, files are binary and not buffered.

(File named: 'name') readStream.
(File named: 'name') readStreamDo: [ :stream | ... ].
(File named: 'name') writeStream.
(File named: 'name') writeStreamDo: [ :stream | ... ].


2. Encoding

To add encoding, wrap a stream with a corresponding ZnCharacterRead/WriteStream.

"Reading"
utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.

"Writing"
utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.

3. Buffering

To add buffering, wrap a stream with a corresponding ZnBufferedRead/WriteStream.

bufferedReadStream := ZnBufferedReadStream on: aStream.
bufferedWriteStream := ZnBufferedWriteStream on: aStream.

In general it is better to buffer the reading of the binary file and apply the encoding on the buffer in memory than the other way around. Compare:

[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
	(ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') upToEnd ]]
		timeToRun. "0:00:00:09.288"

[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
	(ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding: 'utf8')) upToEnd ]]
		timeToRun. "0:00:00:14.189"

4. File System

By default, file system files are buffered and UTF-8 encoded to keep backwards compatibility.

'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].

FileReference also provides access to plain binary files through the #binaryRead/WriteStream messages. Binary streams are buffered by default as well.

'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ... ].
'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | ... ].

If you want a file with another encoding (to come in PR https://github.com/pharo-project/pharo/pull/1134), you can specify it when obtaining the stream:

'name' asFileReference
    readStreamEncoded: 'utf16'
    do: [ :bufferedUtf16Stream | ... ].

'name' asFileReference 
    writeStreamEncoded: 'utf8'
    do: [ :bufferedUtf8Stream | ... ].

5. Line Ending Conventions

If you want to write files following a specific line ending convention, use ZnNewLineWriterStream.
This stream decorator transforms any line ending (cr, lf, crlf) into a chosen line ending.
By default it uses the platform's line ending convention.

lineWriter := ZnNewLineWriterStream on: aStream.

If you want another line ending convention, you can choose one explicitly:

lineWriter forCr.
lineWriter forLf.
lineWriter forCrLf.
lineWriter forPlatformLineEnding.
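
Putting the pieces together, the decorators above can be stacked in one expression. A sketch (hedged: aBinaryWriteStream stands for a raw binary stream obtained as in section 1; the message names are the ones used above):

```smalltalk
"Stack buffering, UTF-8 encoding and platform line-ending conversion
on top of a raw binary write stream. aBinaryWriteStream is assumed
to come from, e.g., (File named: 'name') writeStream."
lineWriter := ZnNewLineWriterStream on:
	(ZnCharacterWriteStream
		on: (ZnBufferedWriteStream on: aBinaryWriteStream)
		encoding: 'utf8').
lineWriter
	nextPutAll: 'first line';
	nextPut: Character cr;
	nextPutAll: 'second line';
	flush.
```

Note the final #flush: with a buffered layer in the stack, written data may stay in memory until the buffer is flushed.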

--

   

Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13



Guillermo Polito
Well, I'd say we did it in the name of modularity. And yes, I believe that having separate responsibilities helps in designing, testing, and ensuring the correctness of each part in isolation.

I've also done some profiling, and it does not look like we've lost performance either (reading and decoding a 35MB file):

[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
(ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') next: binaryFile size.
]] timeToRun. "0:00:00:01.976"

[file := Smalltalk sourcesFile fullName.
(MultiByteFileStream fileNamed: file)
converter: (TextConverter newForEncoding: 'utf8');
upToEnd
] timeToRun. "0:00:00:02.147"



On Mon, Mar 19, 2018 at 5:51 PM, Richard O'Keefe <[hidden email]> wrote:
The idea of stacking things like character coding and buffering &c
by using layers of wrapping is very much in the spirit of Java and
is neither good for performance nor helpful for correctness.









Stephane Ducasse-3
Thanks Guille for this great post!
I will turn it into a doc :)






Stephane Ducasse-3
In reply to this post by Guillermo Polito
I love the idea of measuring statement :)





Sean P. DeNigris
In reply to this post by Guillermo Polito
Guillermo Polito wrote
> I've done also some profiling and it does not look like we've lost in
> performance either (reading and decoding a 35MB file):

Your 1st (new API) example: "0:00:00:01.535"
Your 2nd (old API) example: "0:00:00:01.732"
*** but ***
Your 1st (new API) example using the same consuming selector as example #2
(#upToEnd): "0:00:00:07.816", or 4.5x slower



-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Sven Van Caekenberghe-2


> On 20 Mar 2018, at 00:19, Sean P. DeNigris <[hidden email]> wrote:
>
> Guillermo Polito wrote
>> I've done also some profiling and it does not look like we've lost in
>> performance either (reading and decoding a 35MB file):
>
> Your 1st (new API) example: "0:00:00:01.535"
> Your 2nd (old API) example:"0:00:00:01.732"
> *** but ***
> Your 1st (new API) example using the same consuming selector as example #1
> (#upToEnd): "0:00:00:07.816" or 4.5x slower

Yes and no ;-)

Look at MultiByteFileStream>>#upToEnd, where they cheat a bit and basically do the same as example 1, using #next:.

The Zn #upToEnd implementation does not use size knowledge (as streams may be infinite).

As a side note: using #size and #position[:] on variable-length encoded streams like UTF-8 is *very* dangerous and should be avoided. Counting/indexing bytes in the underlying stream is *not* the same as counting/indexing decoded characters, and can be a source of subtle bugs.
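
A two-line illustration of that byte/character mismatch (a sketch; it assumes Zn's String>>#utf8Encoded, which answers the encoded bytes as a ByteArray):

```smalltalk
"One decoded character, but two bytes in the underlying UTF-8 stream:
positions counted in bytes and in characters drift apart immediately."
'é' size.             "1 (characters)"
'é' utf8Encoded size. "2 (bytes)"
```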




Guillermo Polito



Sean,

Consider that the buffered stream can also be reused in the context of sockets, where size has no meaning.

Basically the cost there is in growing and copying the internal buffer used to read. For the 35MB file this buffer ends up being ~35MB, so it has to grow and be copied several times... Using #next: preallocates the right size for the internal buffer.

Now, I'd say first that if people want performance they should use the right API :).

Maybe the ZnBufferedReadStream could ask its underlying stream for its #preferredInternalBufferSizeForUpToAll. But that would contaminate the whole stream hierarchy... Is it worth it? Dunno...
 









Denis Kudriashov
We could cover this case with an extra variable, maxBufferSize.
By default it would be equal to the size given in #sizeBuffer:, and special users could specify a concrete value. The FileHandler could use the file size for this.
That would also improve the buffer size for small files, where we do not need a big buffer.




Denis Kudriashov

2018-03-20 10:15 GMT+01:00 Denis Kudriashov <[hidden email]>:
We can cover this case by extra variable maxBufferSize.

Operations like #upToEnd would allocate the maximum buffer.
#next:-related methods could also be improved by it. We would need an extra method:

ZnBufferedReadStream >> nextBuffer: requiredSize
	| newBufferSize |
	newBufferSize := requiredSize min: maxBufferSize max: minBufferSize.
	buffer size = newBufferSize ifFalse: [
		buffer := self collectionSpecies new: newBufferSize ].
	limit := stream readInto: buffer startingAt: 1 count: buffer size.
	position := 1

But it would require another variable, minBufferSize.






Sven Van Caekenberghe-2
I would very much prefer to take things slowly.

First we have to make the whole migration work; we are not there yet, and there are lots of important things to fix.

If there are serious performance regressions, then yes we have to fix them.

But improving things or adding features should be done when everything is stable again.



Re: New Files in Pharo - Migration Guide, How To's and examples

Henrik Sperre Johansen
This is great!
And, +1 on the need for some more convenience constructors...

Word of warning (which would be nice to mention in the guide): using
writeStreamDo: with manually constructed buffered writers will not #flush the
buffered stream automatically on close, so it's crucial this is done
manually.

charGen := Random seed: 16543986234.
output := String new: 1024*1024*50.
1 to: output size do: [ :ix |
	output at: ix put: ((charGen nextInt: 256) - 1) asCharacter ].

[(File named: 'utf16test.txt') writeStreamDo: [ :ws | | encStream |
	encStream := ZnCharacterWriteStream
		on: (ZnBufferedWriteStream on: ws)
		encoding: 'utf16'.
	encStream nextPutAll: output ]] timeToRun.

FileSize: 102 336 KB.

[(File named: 'utf16test2.txt') writeStreamDo: [ :ws | | encStream |
	encStream := ZnCharacterWriteStream
		on: (ZnBufferedWriteStream on: ws)
		encoding: 'utf16'.
	encStream nextPutAll: output; flush ]] timeToRun.

FileSize: 102 400 KB.

It's about 4 times slower than the corresponding

[FileStream forceNewFileNamed: 'utf16test2.txt' do: [ :ws |
	ws converter: UTF16TextConverter new.
	ws nextPutAll: output ]] timeToRun

but it's 100MB of UTF16, and 4 secs vs 1, so probably not relevant to real
use cases.

Another addition to the guide, valuable to those of us still stuck in the
stone age, would be code equivalents to the FileStream methods:
forceNewFileNamed:do:
newFileNamed:do:
oldFileNamed:do:
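For what it's worth, here is a rough sketch of what two of those equivalents could look like with the new API. The file names, the existence check via FileReference, and the error handling are illustrative assumptions, not a confirmed protocol:

```smalltalk
"forceNewFileNamed:do: — unconditionally (re)create the file (sketch)"
(File named: 'out.bin') writeStreamDo: [ :ws |
	ws nextPutAll: #[104 101 108 108 111] ].

"newFileNamed:do: — refuse to overwrite an existing file (sketch,
 assuming FileReference>>#exists is available)"
'out.bin' asFileReference exists
	ifTrue: [ Error signal: 'File already exists' ]
	ifFalse: [ (File named: 'out.bin') writeStreamDo: [ :ws |
		ws nextPutAll: #[104 101 108 108 111] ] ].
```

An oldFileNamed:do: equivalent is less obvious (append vs. read-write semantics), so it is left out here.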

Cheers,
Henry




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Re: New Files in Pharo - Migration Guide, How To's and examples

EstebanLM


> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen <[hidden email]> wrote:
>
> This is great!
> And, +1 on the need for some more convenience constructors...
>
> Word of warning (which would be nice to mention in the guide); using
> writeStreamDo: and manually constructed buffered writers will not #flush the
> buffered stream automatically on close, so it's crucial this is done
> manually.

well… in fact it should flush it, shouldn't it?

>
> charGen := Random seed: 16543986234.
> output := String new: 1024*1024*50.
> 1 to: output size do: [ :ix | output at: ix put: ((charGen nextInt: 256) -
> 1) asCharacter ].
>
> [(File named: 'utf16test.txt') writeStreamDo: [:ws | |encStream|
> encStream := ZnCharacterWriteStream
> on: (ZnBufferedWriteStream on: ws)
> encoding: 'utf16'.
> encStream nextPutAll: output
> ]] timeToRun.
>
> FileSize: 102 336 KB.
>
> [(File named: 'utf16test2.txt') writeStreamDo: [:ws | |encStream|
> encStream := ZnCharacterWriteStream
> on: (ZnBufferedWriteStream on: ws)
> encoding: 'utf16'.
> encStream nextPutAll: output;flush
> ]] timeToRun.
>
> FileSize: 102 400 KB.
>
> It's about 4 times slower than the corresponding
>
> [FileStream forceNewFileNamed: 'utf16test2.txt' do: [:ws |
> ws converter: UTF16TextConverter new.
> ws nextPutAll: output
> ]] timeToRun
>
> but it's 100MB of UTF16, and 4 secs vs 1, so probably not relevant to real
> use cases .

It is always relevant.
I wonder why it would be 4x slower than the 2nd one.

Esteban

>
> Another addition to the guide, valuable to those of us still stuck in the
> stone age, would be code equivalents to the FileStream methods:
> forceNewFileNamed:do:
> newFileNamed:do:
> oldFileNamed:do:
>
> Cheers,
> Henry
>
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>



Re: New Files in Pharo - Migration Guide, How To's and examples

Henrik Sperre Johansen
EstebanLM wrote
>> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen <[hidden email]> wrote:
>>
>> This is great!
>> And, +1 on the need for some more convenience constructors...
>>
>> Word of warning (which would be nice to mention in the guide); using
>> writeStreamDo: and manually constructed buffered writers will not #flush
>> the
>> buffered stream automatically on close, so it's crucial this is done
>> manually.
>
> well… in fact it should flush it, isn’t?
>
>>

The binary filestream flushes to disk on each write operation IIRC (or, at
least, when it is being closed), but it has no way to know it is wrapped by a
buffered stream, which also needs to be flushed to actually write the
remaining buffer contents to the file.
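A defensive pattern for this, based on Henrik's working example above (the file name is illustrative): send #flush to the outermost stream yourself, inside an #ensure: so it also runs when the block exits early.

```smalltalk
(File named: 'out.txt') writeStreamDo: [ :ws |
	| encoded |
	encoded := ZnCharacterWriteStream
		on: (ZnBufferedWriteStream on: ws)
		encoding: 'utf8'.
	"Closing the file will not flush the wrappers, so do it explicitly."
	[ encoded nextPutAll: 'some text' ] ensure: [ encoded flush ] ]
```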

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Re: New Files in Pharo - Migration Guide, How To's and examples

Thierry Goubier
2018-03-20 13:20 GMT+01:00 Henrik Sperre Johansen
<[hidden email]>:

> EstebanLM wrote
>>> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen <[hidden email]> wrote:
>>>
>>> This is great!
>>> And, +1 on the need for some more convenience constructors...
>>>
>>> Word of warning (which would be nice to mention in the guide); using
>>> writeStreamDo: and manually constructed buffered writers will not #flush
>>> the
>>> buffered stream automatically on close, so it's crucial this is done
>>> manually.
>>
>> well… in fact it should flush it, isn’t?
>>
>>>
>
> The binary filestream flushes to disk on each write operation iirc (or, at
> least when it is being closed), but it has no way to know it is wrapped by a
> buffered stream which also needs to be flushed to actually write the
> remaining buffer contents to file.

Could a binary filestream that has flush requirements (and close
requirements as well) be made aware of the presence of the buffered
stream above it, to ensure a close / flush is done correctly?

Should structures where you have multiple buffered streams concurrently
opened on a single binary file stream be considered as not exactly
correct?

Thierry

>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>


Re: New Files in Pharo - Migration Guide, How To's and examples

AndrewBlack
In reply to this post by Richard O'Keefe
I like the structure of streams wrapping streams from the point of view of modularity of implementation and flexibility of reuse (as in applying buffering and encoding to in-memory or network streams).

But I don’t like exposing all the machinery to the user.  For example, instead of

        bufferedReadStream := ZnBufferedReadStream on: aReadStream.
        bufferedWriteStream := ZnBufferedWriteStream on: aWriteStream.

why not

        bufferedReadStream := ZnBuffer on: aReadStream.
        bufferedWriteStream := ZnBuffer on: aWriteStream

where the class that implements the buffering looks at the stream and decides whether to do read or write buffering (or both, for a read-write stream).

The meaning of #position: on an encoded Unicode stream is perfectly well-defined.  It’s just not the same as the meaning of position on the underlying byteStream — one counts in codePoints, the other in bytes. The old file streams got this wrong; I’m hoping that the new ones get it right.  However, the comment that

        "As a side note: using #size and #position[:] on variable encoded streams
        like UTF8 is *very* dangerous and should be avoided.”

makes me worry that they do not.  Please, either implement #size and #position[:]   correctly on the encoded stream, or don’t implement them at all!  Do *not* implement them incorrectly, and then tell me to avoid them because they are dangerous!

        Andrew



Re: New Files in Pharo - Migration Guide, How To's and examples

Sven Van Caekenberghe-2
In reply to this post by Henrik Sperre Johansen


> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen <[hidden email]> wrote:
>
> This is great!
> And, +1 on the need for some more convenience constructors...
>
> Word of warning (which would be nice to mention in the guide); using
> writeStreamDo: and manually constructed buffered writers will not #flush the
> buffered stream automatically on close, so it's crucial this is done
> manually.

Yes, you are right, but it is clearly documented in the class comment of ZnBufferedWriteStream

There is also ZnBufferedWriteStream class>>#on:do: to help

Now, these are system classes more than user classes. Normal users should get a ready made stream not knowing much about the construction process.
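Assuming #on:do: flushes the buffered stream when the block completes (that is the point of the helper as described here), Henrik's example might be rewritten as this sketch:

```smalltalk
(File named: 'utf16test.txt') writeStreamDo: [ :ws |
	ZnBufferedWriteStream on: ws do: [ :buffered |
		(ZnCharacterWriteStream on: buffered encoding: 'utf16')
			nextPutAll: output ] ]
```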

> charGen := Random seed: 16543986234.
> output := String new: 1024*1024*50.
> 1 to: output size do: [ :ix | output at: ix put: ((charGen nextInt: 256) -
> 1) asCharacter ].
>
> [(File named: 'utf16test.txt') writeStreamDo: [:ws | |encStream|
> encStream := ZnCharacterWriteStream
> on: (ZnBufferedWriteStream on: ws)
> encoding: 'utf16'.
> encStream nextPutAll: output
> ]] timeToRun.
>
> FileSize: 102 336 KB.
>
> [(File named: 'utf16test2.txt') writeStreamDo: [:ws | |encStream|
> encStream := ZnCharacterWriteStream
> on: (ZnBufferedWriteStream on: ws)
> encoding: 'utf16'.
> encStream nextPutAll: output;flush
> ]] timeToRun.
>
> FileSize: 102 400 KB.
>
> It's about 4 times slower than the corresponding
>
> [FileStream forceNewFileNamed: 'utf16test2.txt' do: [:ws |
> ws converter: UTF16TextConverter new.
> ws nextPutAll: output
> ]] timeToRun
>
> but it's 100MB of UTF16, and 4 secs vs 1, so probably not relevant to real
> use cases .

Ah, but you are cheating (deliberately or not ;-).

Your test string is a ByteString not a WideString, and UTF16TextConverter contains a fast path that basically does nothing.

Try another string, like

1 to: output size do: [ :ix |
	output at: ix put: ((charGen nextInt: 1024) - 1) asCharacter ].

With that, Zn is the same speed, while the old one does not even finish for me.

I am not sure we have to optimise UTF16, maybe ... adding the same fast path would not be too difficult I guess.

> Another addition to the guide, valuable to those of us still stuck in the
> stone age, would be code equivalents to the FileStream methods:
> forceNewFileNamed:do:
> newFileNamed:do:
> oldFileNamed:do:
>
> Cheers,
> Henry
>
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>



Re: New Files in Pharo - Migration Guide, How To's and examples

Sven Van Caekenberghe-2
In reply to this post by AndrewBlack


> On 20 Mar 2018, at 14:51, Andrew P. Black <[hidden email]> wrote:
>
> I like the structure of streams wrapping streams from the point of view of modularity of implementation and flexibility of reuse (as in applying buffering and encoding to in-memory or network streams).
>
> But I don’t like exposing al the machinery to the user.  For example, instead of
>
> bufferedReadStream := ZnBufferedReadStream on: aReadStream.
> bufferedWriteStream := ZnBufferedWriteStream on: aWriteStream.
>
> why not
>
> bufferedReadStream := ZnBuffer on: aReadStream.
> bufferedWriteStream := ZnBuffer on: aWriteStream
>
> where the class that implements the buffering looks at the stream and decides whether to do read or write buffering (or both, for a read-write stream).

Hmm, then I would prefer real object messages, like #buffered, not class facades.
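Such messages could be plain extension methods; a hypothetical sketch (these selectors and the receiver classes are assumptions for illustration, not existing code):

```smalltalk
"Hypothetical extensions, following the #buffered idea"
ReadStream >> buffered
	"Answer a buffered view of the receiver."
	^ ZnBufferedReadStream on: self

WriteStream >> buffered
	^ ZnBufferedWriteStream on: self
```

so that one could write, e.g., `(File named: 'name') readStream buffered` instead of naming the wrapper class at every call site.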

> The meaning of #position: on an encoded Unicode stream is perfectly well-defined.  It’s just not the same as the meaning of position on the underlying byteStream — one counts in codePoints, the other in bytes.

Yes, agreed.

> The old file streams got this wrong; I’m hoping that the new ones get it right.  However, the comment that
>
> "As a side note: using #size and #position[:] on variable encoded streams
> like UTF8 is *very* dangerous and should be avoided.”
>
> makes me worry that they do not.  Please, either implement #size and #position[:]   correctly on the encoded stream, or don’t implement them at all!  Do *not* implement them incorrectly, and then tell me to avoid them because they are dangerous!

Well, I will repeat my opinion: #size #position #position: #reset have no place in streams.

I have an experimental implementation of a ZnPositionableReadStream that adds 'excursion behaviour' as an add-on on top of more primitive streams that do not. It allows for code like

  stream savingPositionDo: [ "read ahead" ].
  "continue reading as if nothing happened"

Here is the class comment.

I am ZnPositionableReadStream.
I am polymorphic with (the most used/important methods of) ReadStream and PositionableStream.

I wrap another read stream and store the elements that I read in a sliding circular buffer so that I am able to go back to any position inside that buffer.

Essentially, I implement #position and #position: to be used to back out of reading ahead.

Note that the size of my buffer limits how far I can go backwards. A SubscriptOutOfBounds exception will be signalled when an attempt is made to go too far backwards.

The index returned by #position should be considered abstract, without concrete meaning, but it is currently implemented as the count of elements read by #next on the wrapped stream. On a simple stream over an in memory collection, that will be equivalent to an integer index into that collection. But on network streams or streams that were already further along, this will no longer be the case.

The most elementary example of my capabilities can be seen in my implementation of #peek. See also the unit tests #testPlainExcursion and #testSearch.

Of course, backing out of an excursion is only possible within the window of the buffer size.

Implementation

- stream <ReadStream> the read stream that I wrap and add positioning to
- buffer <String|ByteArray> sliding, circular buffer
- index <PositiveInteger> zero based index into buffer, where next will be stored
- count <PositiveInteger> number of next operations done on wrapped stream
- delta <PositiveInteger> number of positions that I was moved backwards

The real core methods are #next, #atEnd, #position and #position: and are used to implement the rest.

Part of Zinc HTTP Components
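A small usage sketch derived from this class comment (the constructor #on: and the block's return value are assumptions based on the description above):

```smalltalk
| stream lookahead |
stream := ZnPositionableReadStream on: 'GET /index.html' readStream.
"Read ahead; the position is restored when the block finishes."
lookahead := stream savingPositionDo: [ stream next: 3 ].
"A subsequent read still starts from the beginning,
 as long as we stayed within the buffer window."
stream upToEnd.
```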




Re: New Files in Pharo - Migration Guide, How To's and examples

Guillermo Polito
Hi all,

Don't forget also that these are the low-level bricks used in Pharo. They are required by the bootstrap and the minimal image, so keeping dependencies small and controlled is a big +.

Magic and automatic stuff can be added on top. Nothing prevents adding a little layer on top as a separate library (I even encourage people to do so!): a builder, or a DSL to ease the construction of streams in the most performant way possible.
Pharo has extension methods, and even this could be done with some extra objects managing the creation of the stream without being too complex.

Guille

On Tue, Mar 20, 2018 at 3:12 PM, Sven Van Caekenberghe <[hidden email]> wrote:


> [...]

--
Guille Polito
Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io
Phone: +33 06 52 70 66 13