[VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Ladislav Lenart
Hi,

I've run into a strange issue with encoded streams. Suppose I
have the following two streams:

    stream1 := ('file' asFilename withEncoding: #'iso8859-2') writeStream.
    stream2 := ('file' asFilename withEncoding: #'ms_cp_1252') writeStream.

The problem is that stream1 is

    EncodedStream on: ExternalWriteStream

but stream2 is just

    ExternalWriteStream.

My problem is that with stream1 I can freely switch between #binary and
#text while writing a file. But I cannot do this with stream2, because
after the first #binary it forgets its encoding (#'ms_cp_1252'), and
sending #text to it makes the stream internally encoded using the
_platform default encoding_ (which in my case is #'iso8859-1').

The cause of all this seems to be in
EncodedStreamConstructor>>addEncodingTo: method, namely the second part
of the condition:

    ((encoder isMemberOf: ByteStreamEncoder)
        and: [encoder encoder definitionClass notNil]).

If I manually create EncodedStream with #'ms_cp_1252' encoding, all
works as expected.
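
A minimal sketch of that manual construction, for comparison with the
#withEncoding: shortcut. It assumes that EncodedStream class>>on:encodedBy:
and StreamEncoder class>>new: are available in the image and that the
encoder accepts the #'ms_cp_1252' symbol; the file name and the bytes
written are placeholders.

```smalltalk
| raw stream |
"Open the file as a plain external stream and put it in binary mode
 (assumption: the wrapper expects a byte-oriented underlying stream)."
raw := 'file' asFilename writeStream.
raw binary.

"Wrap it explicitly, so the encoder lives in the EncodedStream itself."
stream := EncodedStream on: raw encodedBy: (StreamEncoder new: #'ms_cp_1252').

stream nextPutAll: 'some text'.      "written through the encoder"
stream binary.
stream nextPutAll: #[1 2 3].         "raw bytes, encoder bypassed"
stream text.
stream nextPutAll: 'more text'.      "expected to use ms_cp_1252 again"
stream close.
```

Because the wrapper keeps its own encoder, flipping to #binary and back to
#text should not fall back to the platform default encoding, which matches
the stream1 behavior described above.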

So the first thing that interests me is the (surely ingenious) reason for
this behavior.
Personally, I dislike this behavior for two reasons:
    1) it does not provide the same functionality (in my case its
absence leads to an error)
    2) when I use #withEncoding: on aFilename, I simply expect
#writeStream to return an EncodedStream.

Thanks for any explanation,

Ladislav Lenart

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

kobetic
I suspect that the reason for this behavior is performance. For the
selected few encodings we can use the VM's optimized String types (see
subclasses of ByteEncodedString to see which those are) to handle the
byte<->character transformation. This is presumably noticeably faster
than running it through the Encoder machinery.

However I don't see a reason why binary/text behavior couldn't be
polymorphic between the two, given that BufferedExternalStreams have the
binary flag as well. I'll poke around some more and probably create an
AR for this issue. I'll post the details here.

Is there any other functionality that you see missing from the bare
external stream?

Thanks for reporting this.

Martin

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Ladislav Lenart
Martin Kobetic wrote:

> I suspect that the reason for this behavior is performance. For the
> selected few encodings we can use the VM's optimized String types (see
> subclasses of ByteEncodedString to see which those are) to handle the
> byte<->character transformation. This is presumably noticeably faster
> than running it through the Encoder machinery.
>
> However I don't see a reason why binary/text behavior couldn't be
> polymorphic between the two, given that BufferedExternalStreams have
> the binary flag as well. I'll poke around some more and probably
> create an AR for this issue. I'll post the details here.

Thank you for the explanation, I thought about something like this but
was not sure...

The reason is this:

    | stream |
    stream := ('file' asFilename withEncoding: #'ms_cp_1252') writeStream.
        "stream ioBuffer bufferClass is now MSCP1252String"
    stream binary.
        "stream ioBuffer bufferClass is now ByteArray"
    stream text.
        "stream ioBuffer bufferClass is now ISO8859L1String
         (from String class>>defaultPlatformClass)"

So the stream simply forgets its previous text encoding...

>
> Is there any other functionality that you see missing from the bare
> external stream ?

Well, I am glad you asked, but unfortunately for me, I can't remember any
other functionality that I am missing so far... :-)

But sometimes, when I recall my few "good old Java days", one thing that
I liked about it was its approach to streams. They were more like filters
that one can nest inside one another. I think I like it because it prefers
composition over inheritance, and thus leads to simpler components and
more flexibility.

Ladislav Lenart

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

kobetic
In reply to this post by Ladislav Lenart
OK, I've created AR#50174: "ExternalBufferedStreams can loose the
original encoding when flipping the binary/text mode" for this. I have
added you to the list of parties to notify on updates so you should
receive a notice soon.

Thanks again,

Martin

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

kobetic
In reply to this post by Ladislav Lenart
Ladislav Lenart wrote:

> The reason is this:
>
>    | stream |
>    stream := ('file' asFilename withEncoding: #'ms_cp_1252') writeStream.
>        "stream ioBuffer bufferClass is now MSCP1252String"
>    stream binary.
>        "stream ioBuffer bufferClass is now ByteArray"
>    stream text.
>        "stream ioBuffer bufferClass is now ISO8859L1String
>         (from String class>>defaultPlatformClass)"
>
> So the stream simply forgets its previous text encoding...

Yup, I observed the same behavior. You'll see more on that from me in
the AR notification.

>> Is there any other functionality that you see missing from the bare
>> external stream ?
>
>
> Well, I am glad you ask, but unfortunately for me, I can't remember any
> functionality that I am missing so far... :-)
>
> But sometimes, when I recall my few "good old Java days", one thing that
> I liked about it was its approach to streams. It was more like filters
> that one can nest to each other. I think I like it because it prefers
> composition over inheritance and thus it leads to simpler components and
> gains more flexibility.

Actually, this has been on my mind for a while. Stream composition would
be very useful in many scenarios that I work with. Personally I don't
see any realistic possibility of the existing Stream hierarchy to adopt
composition style. It's a core piece and there's just way too much stuff
out there that undoubtedly depends on its little quirks. I suspect that
the only pragmatic way forward is starting a brand new implementation.
Moreover I can't stand some of the "features" the old stuff has, e.g.
the EndOfStreamNotification makes for some very entertaining debugging
sessions. Yet there's no way you could rip that behavior out of the
existing system. So starting over seems like the most viable option to
me. It would probably make a pretty good open source project too, maybe
even <gasp> portable between dialects :-).

Martin

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

anthony lander

On 18-Jan-06, at 2:48 PM, Martin Kobetic wrote:

> Actually, this has been on my mind for a while. Stream composition
> would be very useful in many scenarios that I work with. Personally I
> don't see any realistic possibility of the existing Stream hierarchy
> to adopt composition style.

I don't think it has to adopt a new style. Instead, you can write a
StreamWrapper hierarchy (okay...loaded name) which provides services on
top of a base stream. no?
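
To make the wrapper idea concrete, here is a rough sketch in classic chunk
(file-in) format. Everything in it is hypothetical: StreamWrapper and
UppercaseWrapper are invented names, the category is arbitrary, and it
assumes the image still accepts the traditional
subclass:instanceVariableNames:... class definition message.

```smalltalk
"Hypothetical sketch: a wrapper that forwards writes to a base stream."
Object subclass: #StreamWrapper
    instanceVariableNames: 'base'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'!

!StreamWrapper class methodsFor: 'instance creation'!
on: aStream
    "Answer a wrapper layered on top of aStream."
    ^self new setBase: aStream! !

!StreamWrapper methodsFor: 'private'!
setBase: aStream
    base := aStream! !

!StreamWrapper methodsFor: 'writing'!
nextPut: anObject
    "Default service: pass the element through unchanged."
    ^base nextPut: anObject!

nextPutAll: aCollection
    "Route every element through nextPut: so subclasses override one method."
    aCollection do: [:each | self nextPut: each].
    ^aCollection!

close
    ^base close! !

"Hypothetical service: upper-case characters on their way to the base stream."
StreamWrapper subclass: #UppercaseWrapper
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'!

!UppercaseWrapper methodsFor: 'writing'!
nextPut: aCharacter
    ^base nextPut: aCharacter asUppercase! !
```

In a workspace the wrapper would then be used like this:

```smalltalk
| inner out |
inner := WriteStream on: String new.
out := UppercaseWrapper on: inner.
out nextPutAll: 'hello'.
inner contents    "'HELLO'"
```

The base stream is left untouched; each wrapper adds one service, and
wrappers can be stacked because a wrapper only needs nextPut: from whatever
it wraps.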

   -anthony

--
PGP key at http://anthony.etherealplanet.org
3FF8 6319 CADA 2D21 03BB 175F 3382 822A 502F AE80

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

kobetic
anthony lander wrote:
>> Actually, this has been on my mind for a while. Stream composition
>> would be very useful in many scenarios that I work with. Personally I
>> don't see any realistic possibility of the existing Stream hierarchy
>> to adopt composition style.
>
>
> I don't think it has to adopt a new style. Instead, you can write a
> StreamWrapper hierarchy (okay...loaded name) which provides services on
> top of a base stream. no?

That might be doable if you're happy with the stuff that you're
wrapping. I'd be more inclined to start from scratch reusing/copying
pieces that are worth it. I'm inclined to agree with Ladislav that the
single-inheritance enforced structure just doesn't work well for
Streams. Also I don't like many of the design decisions made, the whole
Internal/External schizophrenia, the character/byte assumptions
sprinkled all over, PeekableStreams using positioning for peeking and
breaking on buffer boundaries for external streams, end of stream
handling, and on and on. To me there is just too much duct tape all
around that thing for it to be worth wrapping some more :-).

I want something elegant, like Vassili's Announcements :-).

RE: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Terry Raymond
Martin

If you are really ambitious I would recommend recreating
the stream hierarchy using VW traits. Ultimately, you may
not want to use traits, but because there is so much common
functionality in the various stream classes it may be that
the easiest way to sort it all out is to use something like
traits.

Terry
 
===========================================================
Terry Raymond       Smalltalk Professional Debug Package
Crafted Smalltalk
80 Lazywood Ln.
Tiverton, RI  02878
(401) 624-4517      [hidden email]
<http://www.craftedsmalltalk.com>
===========================================================

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Reinout Heeck-2
Terry Raymond wrote:
> If you are really ambitious I would recommend recreating
> the stream hierarchy using VW traits. Ultimately, you may
> not want to use traits but because there is so much common
> functionality in the various stream classes it may be that
> the easiest way to sort it all out is to use something like
> traits.

This has been studied in the case of Squeak, see
  http://www.cs.pdx.edu/~black/publications/refactoringsACM.pdf
for inspiration.


R
-

Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

kobetic
I agree that traits would probably work well, but they are still too
"experimental". If you use them in a core piece like streams,
you're pretty much forcing traits into the core as well. I'm not sure
I'd want to fight that battle too.
Moreover, you can probably get quite far just by using delegation.
For example, I was thinking we could bury the read/write aspect
inside the stream with some sort of StreamAccess strategy, so the
read/write aspect doesn't even show in the hierarchy anymore.
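
One way to picture the StreamAccess idea is sketched below, in classic
chunk (file-in) format. All names (SketchStream, ReadAccess, WriteAccess
and their selectors) are made up for illustration, and the sketch assumes
the traditional subclass:instanceVariableNames:... definition message is
accepted. The point is only that readability and writability live in a
delegate object rather than in separate read and write subclasses.

```smalltalk
"Hypothetical strategy: permits reading, refuses writing."
Object subclass: #ReadAccess
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'!

!ReadAccess methodsFor: 'access'!
nextOn: aStream
    ^aStream primNext!

put: anObject on: aStream
    ^self error: 'stream is read-only'! !

"Hypothetical strategy: permits writing, refuses reading."
Object subclass: #WriteAccess
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'!

!WriteAccess methodsFor: 'access'!
nextOn: aStream
    ^self error: 'stream is write-only'!

put: anObject on: aStream
    ^aStream primPut: anObject! !

"One stream class; the access strategy decides what is allowed."
Object subclass: #SketchStream
    instanceVariableNames: 'collection position access'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'!

!SketchStream class methodsFor: 'instance creation'!
on: aCollection access: anAccess
    ^self new setCollection: aCollection access: anAccess! !

!SketchStream methodsFor: 'private'!
setCollection: aCollection access: anAccess
    collection := aCollection.
    position := 0.
    access := anAccess!

primNext
    "Advance and answer the next element (no end-of-stream handling here)."
    position := position + 1.
    ^collection at: position!

primPut: anObject
    "Append to the underlying collection (assumed to understand add:)."
    ^collection add: anObject! !

!SketchStream methodsFor: 'accessing'!
next
    ^access nextOn: self!

nextPut: anObject
    ^access put: anObject on: self! !
```

A reader would be created with "SketchStream on: #(1 2 3) access:
ReadAccess new" and a writer with "SketchStream on: OrderedCollection new
access: WriteAccess new"; sending nextPut: to the reader signals an error
from the strategy object, not from a missing method in the hierarchy.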

However that paper definitely looks interesting, regardless of the
implementation strategy used. Thanks, Reinout! Another source of
inspiration might be strongtalk's refactored library. I haven't looked
yet, but that one is supposed to be a fairly extensive revision of the
usual st library.

Also I was wondering if there are good examples of stream hierarchy
design, outside of Smalltalk. I hear both positive and negative comments
on the Java ones. Any hands-on experience with others out there?




Re: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Colin Putney
Martin Kobetic wrote:

> However that paper definitely looks interesting, regardless of the
> implementation strategy used. Thanks, Reinout! Another source of
> inspiration might be strongtalk's refactored library. I haven't looked
> yet, but that one is supposed to be a fairly extensive revision of the
> usual st library.

Nice idea. I'm interested in the Strongtalk collections hierarchy as
well. But note that Strongtalk supports mix-in classes, which are
similar to traits. I wouldn't be surprised if the stream hierarchy made
extensive use of them.

Colin

RE: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Terry Raymond
In reply to this post by kobetic
Martin

> -----Original Message-----
> From: Martin Kobetic [mailto:[hidden email]]
> Sent: Thursday, January 19, 2006 9:54 AM
> To: Reinout Heeck
> Cc: 'Vwnc'
> Subject: Re: [VW7.3][Linux/Debian] - Encoded streams and switching between
> binary and text
>
> I agree that traits would probably work well, but they are still too
> "experimental". If you use that in such a core piece like streams,
> you're pretty much forcing traits into the core as well. I'm not sure
> I'd want to fight that battle as well.
> Moreover you can probably get by quite far just by using delegation as
> well. For example I was thinking we could bury the read/write aspect
> inside the stream with some sort of StreamAccess strategy, so the
> read/write aspect doesn't even show in the hierarchy anymore.

That is why I stated that "Ultimately, you may not want to use traits..."
But I think using traits to do the design would be a big help, because
they make it easier to partition the common functions while
still allowing you to create fully capable classes.

The problem is that when we try to create a hierarchy as interwoven
as Streams, we end up with a lot of design artifacts which make it
difficult to understand what we are really trying to accomplish. Using
traits would remove all or most of those artifacts and make it much
easier to see what we are really trying to do.

Terry
 
RE: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

Alan Knight-2
So the answer to trying to design a stream hierarchy that has more reliance on composition and less on inheritance is to use multiple inheritance?

--
Alan Knight [|], Cincom Smalltalk Development
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk

"The Static Typing Philosophy: Make it fast. Make it right. Make it run." - Niall Ross

Re: [Bulk] RE: [VW7.3][Linux/Debian] - Encoded streams and switching between binary and text

kobetic
I see the composition aspect as mostly orthogonal to how the
components of that composition are themselves implemented. I assume that
traits would be able to help with the latter, so that we don't have to
work around issues like the read/write/read-write aspect of streams. But
as I said before, I think that delegation might just be good enough for
that as well, even if it's slightly less obvious than a traits-based
equivalent.

Alan Knight wrote:

> So the answer to trying to design a stream hierarchy that has more
> reliance on composition and less on inheritance is to use multiple
> inheritance?
>