New Files in Pharo - Migration Guide, How To's and examples

Guillermo Polito
Hi all,

I've spent a few minutes summarizing the new APIs provided by the combination of the new File implementation and the Zn encoders. They all basically follow the decorator pattern to stack different responsibilities such as buffering, encoding, and line ending conversions.

Please, do not hesitate to give your feedback.

Guille


1. Basic Files

By default, files are binary and not buffered.

(File named: 'name') readStream.
(File named: 'name') readStreamDo: [ :stream | ... ].
(File named: 'name') writeStream.
(File named: 'name') writeStreamDo: [ :stream | ... ].
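
A minimal round-trip sketch using only the messages listed above. The file name 'demo.bin' is a hypothetical scratch file, and the sketch assumes it does not already exist with other content:

```smalltalk
| file bytes |
"'demo.bin' is a hypothetical scratch file in the working directory."
file := File named: 'demo.bin'.

"The stream is binary, so we write bytes, not characters."
file writeStreamDo: [ :stream |
	stream nextPutAll: #[80 104 97 114 111] ].

"readStreamDo: closes the stream after the block and answers the block's value."
bytes := file readStreamDo: [ :stream | stream upToEnd ].

"bytes should now be a ByteArray holding the ASCII codes of 'Pharo'."
bytes asString.

"Clean up the scratch file."
file delete.
```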


2. Encoding

To add encoding, wrap a stream with a corresponding ZnCharacterRead/WriteStream.

"Reading"
utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.

"Writing"
utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.
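
As an illustration, here is a small in-memory round trip that should work without touching the disk. It assumes a plain ByteArray stream is an acceptable stand-in for aBinaryStream:

```smalltalk
| original bytes text |
"A string with one non-ASCII character (e with acute accent, code point 233)."
original := 'caf' , (String with: (Character value: 233)).

"Encode: characters go in, UTF-8 bytes land in the ByteArray."
bytes := ByteArray streamContents: [ :binaryOut |
	(ZnCharacterWriteStream on: binaryOut encoding: 'utf8')
		nextPutAll: original ].

"Decode: wrap a binary read stream to get characters back."
text := (ZnCharacterReadStream on: bytes readStream encoding: 'utf8') upToEnd.

"text should equal original; bytes should be one element longer than original,
because the accented character takes two bytes in UTF-8."
text = original.
```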

3. Buffering

To add buffering, wrap a stream with a corresponding ZnBufferedRead/WriteStream.

bufferedReadStream := ZnBufferedReadStream on: aStream.
bufferedWriteStream := ZnBufferedWriteStream on: aStream.

In general, it is better to buffer the reads on the binary file and apply the decoding to the in-memory buffer than the other way around. Compare the two timings:

"Buffer the binary file first, then decode"
[ file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
    (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') upToEnd ] ] timeToRun. "0:00:00:09.288"

"Decode the unbuffered binary file first, then buffer"
[ file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
    (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding: 'utf8')) upToEnd ] ] timeToRun. "0:00:00:14.189"
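
The same stacking order can be sketched for the write side. This is only an illustration: 'demo.txt' is a hypothetical file name, and the explicit flush is there because the buffer sits between the encoder and the file:

```smalltalk
"'demo.txt' is a hypothetical scratch file."
(File named: 'demo.txt') writeStreamDo: [ :binaryFile |
	| buffered characters |
	"Innermost decorator: buffer the raw bytes going to the file."
	buffered := ZnBufferedWriteStream on: binaryFile.
	"Outermost decorator: encode characters into the buffer."
	characters := ZnCharacterWriteStream on: buffered encoding: 'utf8'.
	characters nextPutAll: 'buffered and encoded'.
	"Push any bytes still held in the buffer down to the file before it is closed."
	buffered flush ].
```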

4. File System

By default, file system files are buffered and utf8 encoded to keep backwards compatibility.

'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].

FileSystem file references also provide access to plain binary files through the #binaryReadStream and #binaryWriteStream messages. These binary streams are buffered by default as well.

'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ... ].
'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | ... ].
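
A small sketch combining both kinds of stream, assuming the default encoding is utf8 as described above. The file name 'demo.txt' is hypothetical:

```smalltalk
| reference bytes |
"'demo.txt' is a hypothetical scratch file."
reference := 'demo.txt' asFileReference.
reference ensureDelete.

"Write one non-ASCII character (code point 233) through the default text stream."
reference writeStreamDo: [ :out |
	out nextPut: (Character value: 233) ].

"Read the raw bytes back through the binary stream."
bytes := reference binaryReadStreamDo: [ :in | in upToEnd ].

"If the text stream encoded as UTF-8, bytes should be #[195 169]."
bytes.

reference ensureDelete.
```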

If you want a file with another encoding (to come in the PR https://github.com/pharo-project/pharo/pull/1134), you can specify it while obtaining the stream:

'name' asFileReference
    readStreamEncoded: 'utf16'
    do: [ :bufferedUtf16Stream | ... ].

'name' asFileReference
    writeStreamEncoded: 'utf8'
    do: [ :bufferedUtf8Stream | ... ].

5. Line Ending Conventions

If you want to write files following a specific line ending convention, use the ZnNewLineWriterStream.
This stream decorator will transform any line ending (cr, lf, crlf) into a defined line ending.
By default it chooses the platform line ending convention.

lineWriter := ZnNewLineWriterStream on: aStream.

If you want to choose another line ending convention you can do:

lineWriter forCr.
lineWriter forLf.
lineWriter forCrLf.
lineWriter forPlatformLineEnding.
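
A minimal in-memory sketch of the normalization, assuming an ordinary String write stream can stand in for aStream. Characters are fed one at a time with #nextPut: to stay on the most basic protocol:

```smalltalk
| result |
result := String streamContents: [ :out |
	| lineWriter |
	lineWriter := ZnNewLineWriterStream on: out.
	"Force lf regardless of the platform."
	lineWriter forLf.
	"The input mixes crlf and cr line endings."
	('one' , String crlf , 'two' , String cr , 'three')
		do: [ :each | lineWriter nextPut: each ] ].

"result should contain the three words separated by single lf characters."
result.
```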

--

   

Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: New Files in Pharo - Migration Guide, How To's and examples

Denis Kudriashov
Hi Guille.

What do you think about adding helpful converting methods?

aStream buffered.
aStream encodedWith: 'utf8'.
aStream utf8Encoded. "because it is a very common case"
aStream decodedFrom: 'utf8'.
aStream utf8Decoded. "because it is a very common case"
aStream withLineEnding: String cr.
aStream withPlatformLineEnding.

(all methods will return new streams)



2018-03-19 17:19 GMT+01:00 Guillermo Polito <[hidden email]>:


Re: New Files in Pharo - Migration Guide, How To's and examples

Esteban A. Maringolo
2018-03-19 16:32 GMT-03:00 Denis Kudriashov <[hidden email]>:
>
> Hi Guille.

> What do you think about adding helpful converting methods?

I'd like them. Anything that saves me from passing a string literal as
parameter is good. :)

> aStream buffered.

I'd only change this one, to #beBuffered.

> aStream withLineEnding: String cr.
> aStream withPlatformLineEnding.

These two seem ambiguous to me, #with prefix signals something else in my head.

I'd remove the "with" and use instead:

  aStream lineEnding: String cr.
  aStream usePlatformLineEnding.

Regards!



Esteban A. Maringolo


Re: New Files in Pharo - Migration Guide, How To's and examples

Stephane Ducasse-3
+1 to Esteban's great suggestions!




Re: New Files in Pharo - Migration Guide, How To's and examples

Denis Kudriashov
In reply to this post by Esteban A. Maringolo


2018-03-19 20:39 GMT+01:00 Esteban A. Maringolo <[hidden email]>:
> 2018-03-19 16:32 GMT-03:00 Denis Kudriashov <[hidden email]>:
>> Hi Guille.
>>
>> What do you think about adding helpful converting methods?
>
> I'd like them. Anything that saves me from passing a string literal as
> parameter is good. :)
>
>> aStream buffered.
>
> I'd only change this one, to #beBuffered.

But #be sounds like it modifies the receiver.
The idea is the same as #reversed or #sorted for collections.

>> aStream withLineEnding: String cr.
>> aStream withPlatformLineEnding.
>
> These two seem ambiguous to me, #with prefix signals something else in my head.
>
> I'd remove the "with" and use instead:
>
>   aStream lineEnding: String cr.
>   aStream usePlatformLineEnding.

Same here. They will return new instances wrapping the receiver. And #with is like #copyWith: for arrays.
Maybe something like:

aStream forcedLineEnding: String cr.
aStream forcedPlatformLineEnding.

> Regards!
>
> Esteban A. Maringolo



Re: New Files in Pharo - Migration Guide, How To's and examples

Nicolas Cellier


2018-03-19 21:21 GMT+01:00 Denis Kudriashov <[hidden email]>:
> 2018-03-19 20:39 GMT+01:00 Esteban A. Maringolo <[hidden email]>:
>> 2018-03-19 16:32 GMT-03:00 Denis Kudriashov <[hidden email]>:
>>> Hi Guille.
>>>
>>> What do you think about adding helpful converting methods?
>>
>> I'd like them. Anything that saves me from passing a string literal as
>> parameter is good. :)
>>
>>> aStream buffered.
>>
>> I'd only change this one, to #beBuffered.
>
> But #be sounds like it modifies receiver.
> But idea is same as #reversed or #sorted for collections.

-1 for #beBuffered: it's not about adding state to the stream.
As Denis said, it's a transformation producing a new stream.

>>> aStream withLineEnding: String cr.
>>> aStream withPlatformLineEnding.
>>
>> These two seem ambiguous to me, #with prefix signals something else in my head.
>>
>> I'd remove the "with" and use instead:
>>
>>   aStream lineEnding: String cr.
>>   aStream usePlatformLineEnding.
>
> Same here. They will return new instances wrapping receiver. And #with is like #copyWith: for arrays.
> Maybe something like:
>
> aStream forcedLineEnding: String cr.
> aStream forcedPlatformLineEnding.
>
>> Regards!
>>
>> Esteban A. Maringolo




Re: New Files in Pharo - Migration Guide, How To's and examples

Esteban A. Maringolo
2018-03-19 17:42 GMT-03:00 Nicolas Cellier <[hidden email]>:
> 2018-03-19 21:21 GMT+01:00 Denis Kudriashov <[hidden email]>:
>>
>> 2018-03-19 20:39 GMT+01:00 Esteban A. Maringolo <[hidden email]>:
>>>
>>> 2018-03-19 16:32 GMT-03:00 Denis Kudriashov <[hidden email]>:

>>> > aStream buffered.
>>> I'd only change this one, to #beBuffered.

>> But #be sounds like it modifies receiver.
>> But idea is same as #reversed or #sorted for collections.
> -1 for beBuffered, it's not about adding states to the stream.
> As Denis said, it's a transform producing a new stream.


I didn't understand it was returning a transformed instance, I
actually thought it was the opposite, so if that is the case then
Denis' suggested selectors are fine.

Thanks for the clarification.

Esteban A. Maringolo


Re: New Files in Pharo - Migration Guide, How To's and examples

Stephane Ducasse-3
In reply to this post by Denis Kudriashov
Ah, if that is the case then I agree with you, Denis :)
This is why having good names is important.



Re: New Files in Pharo - Migration Guide, How To's and examples

Guillermo Polito
In reply to this post by Esteban A. Maringolo
Yes, it would be good to have those kinds of methods. But there are some other constraints to take into account :)
  - introducing these without introducing duplications is right now complicated because streams do not share a hierarchy
  - having a common hierarchy is maybe complicated: ZnStreams should for the moment be backwards compatible with Pharo>4?
  - using traits to avoid duplications would be good, but for now traits are forbidden from kernel packages
        (otherwise we need to add traits in the kernel, + trait support in the class builder, runtime,... and the image will grow MBs and the build time will grow too)

About the naming, I'm for

 - #buffered / #bufferedWithBufferSize: size
 - #encoded / #utf8Encoded / #encodedAs: encoding
 - #usingCr / #usingLf / #usingCrLf / #usingPlatformLineEndingConvention / #usingLineConvention: aConvention
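
To make the intent of these names concrete, here is a rough sketch of what a chain such as "aBinaryStream buffered utf8Encoded usingLf" might expand to with the decorators that exist today. The chained selectors are only the proposals from this thread and are not implemented; the in-memory ByteArray stream is an assumption standing in for a file stream:

```smalltalk
| binary buffered stacked |
"An in-memory binary stream standing in for a file."
binary := ByteArray new writeStream.

"Hypothetical #buffered: wrap the receiver, answer a new stream."
buffered := ZnBufferedWriteStream on: binary.

"Hypothetical #utf8Encoded followed by #usingLf: two more wrappers."
stacked := (ZnNewLineWriterStream on:
	(ZnCharacterWriteStream on: buffered encoding: 'utf8'))
		forLf;
		yourself.

"Write through the outermost decorator; the cr should be rewritten as lf."
stacked
	nextPut: $a;
	nextPut: Character cr;
	nextPut: $b.

"Flush the buffering layer so the bytes reach the underlying stream."
buffered flush.

"The underlying stream should now hold #[97 10 98]; it was never modified
in place, only wrapped, which is the point made about naming in this thread."
binary contents.
```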

On Mon, Mar 19, 2018 at 9:53 PM, Esteban A. Maringolo <[hidden email]> wrote: