testCodecLatin1 and testCodecUtf8Bom

Michael Lucas-Smith-3

testCodecLatin1 and testCodecUtf8Bom

Hi All,

#testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
the Unicode string is created, #asByteArray is sent to it to "turn it
back to bytes", but of course this is impossible: it's already a
TwoByteString in VisualWorks, and to turn that back to bytes you need to
specify an encoding using #asByteArrayEncoding: 'utf-8' (for example).

Can these tests be written with a custom compare operation? The comment
in #testCodecLatin1 explains the problem, namely that comparing Unicode
strings using #= is not necessarily going to work across platforms, but
perhaps something like this might work everywhere:

compare: a with: b
    a size = b size ifFalse: [^false].
    1 to: a size do: [:index |
        (a at: index) = (b at: index) ifFalse: [^false]].
    ^true

Cheers,
Michael
_______________________________________________
seaside-dev mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/seaside-dev
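
To illustrate the point about naming the encoding explicitly, here is a minimal workspace sketch. It assumes VisualWorks and the #asByteArrayEncoding: selector mentioned in the message above; the variable names and the sample string are illustrative only.

```smalltalk
| text bytes |
"Build 'café' from a code point so the snippet does not depend on the source-file encoding; 233 is é."
text := 'caf' , (String with: (Character value: 233)).
"Name the encoding explicitly, as suggested above, instead of sending #asByteArray."
bytes := text asByteArrayEncoding: 'utf-8'.
"In UTF-8, é occupies two bytes, so the byte count exceeds the character count."
bytes size > text size
```

The byte sequence depends entirely on the encoding named, which is why a bare #asByteArray on an already-decoded string is ambiguous.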

Paolo Bonzini-2

Re: testCodecLatin1 and testCodecUtf8Bom

On Sat, May 15, 2010 at 01:19, Michael Lucas-Smith
<[hidden email]> wrote:
> Hi All,
>
> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once the
> Unicode string is created, #asByteArray is sent to it to "turn it back to
> bytes" but of course this is impossible, it's already a TwoByteString in
> VisualWorks and to turn that back to bytes, you need to specify an encoding
> using #asByteArrayEncoding: 'utf-8' (for example).

Maybe instead of

    self assert: (codec decode: bom , self utf8String) asByteArray
        = self decodedString asByteArray.

we need

    self assert: (codec encode: (codec decode: bom , self utf8String))
        = (codec encode: self decodedString).

> Can these tests be written with a custom compare operation?

That too; however:

> the comment in
> #testCodecLatin1 explains the problem - that comparing unicode strings using
> #= is not necessarily going to work across platforms, but perhaps something
> like this might work everywhere:
>
> compare: a with: b
> a size = b size ifFalse: [^false].
> 1 to: a size do: [:index | (a at: index) = (b at: index) ifFalse:
> [^false]].
> ^true

That would be the same as #=, wouldn't it? Is comparing Unicode
strings using #= still not portable *if we know the two strings have
the same encoding*, or at least were produced by the same codec?

Paolo

Philippe Marschall

Re: testCodecLatin1 and testCodecUtf8Bom

In reply to this post by Michael Lucas-Smith-3

2010/5/15 Michael Lucas-Smith <[hidden email]>:
> Hi All,
>
> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once the
> Unicode string is created, #asByteArray is sent to it to "turn it back to
> bytes" but of course this is impossible, it's already a TwoByteString in
> VisualWorks and to turn that back to bytes, you need to specify an encoding
> using #asByteArrayEncoding: 'utf-8' (for example).

You're right. Looking at it again, using #asByteArray seems
problematic due to its underspecified nature.

> Can these tests be written with a custom compare operation? the comment in
> #testCodecLatin1 explains the problem - that comparing unicode strings using
> #= is not necessarily going to work across platforms, but perhaps something
> like this might work everywhere:
>
> compare: a with: b
> a size = b size ifFalse: [^false].
> 1 to: a size do: [:index | (a at: index) = (b at: index) ifFalse:
> [^false]].
> ^true

If that works with everybody, sure.

Cheers
Philippe

Paolo Bonzini-2

Re: testCodecLatin1 and testCodecUtf8Bom

On Sun, May 16, 2010 at 15:00, Philippe Marschall
<[hidden email]> wrote:

> 2010/5/15 Michael Lucas-Smith <[hidden email]>:
>> Hi All,
>>
>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once the
>> Unicode string is created, #asByteArray is sent to it to "turn it back to
>> bytes" but of course this is impossible, it's already a TwoByteString in
>> VisualWorks and to turn that back to bytes, you need to specify an encoding
>> using #asByteArrayEncoding: 'utf-8' (for example).
>
> You're right. Looking at it again, using #asByteArray seems
> problematic due to its underspecified nature.
>
>> Can these tests be written with a custom compare operation? the comment in
>> #testCodecLatin1 explains the problem - that comparing unicode strings using
>> #= is not necessarily going to work across platforms, but perhaps something
>> like this might work everywhere:
>>
>> compare: a with: b
>> a size = b size ifFalse: [^false].
>> 1 to: a size do: [:index | (a at: index) = (b at: index) ifFalse:
>> [^false]].
>> ^true
>
> If that works with everybody, sure.

It doesn't work for gst. I think it's simplest to add something like
#assert:equalsDecodedString: and let the platform package define it.

Paolo
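
A minimal sketch of what such a hook could look like, assuming it is defined on the codec test case and overridden by each platform package. Only the selector comes from the message above; the default body shown here (an element-wise comparison along the lines of Michael's #compare:with:) is an assumption, and per the message it would not be suitable for gst as-is.

```smalltalk
assert: aString equalsDecodedString: expected
    "Hypothetical default implementation; platform packages are expected to override it."
    self assert: aString size = expected size.
    1 to: expected size do: [:index |
        self assert: (aString at: index) = (expected at: index)]
```

A test would then read, for example, `self assert: (codec decode: bom , self utf8String) equalsDecodedString: self decodedString`, leaving the details of the comparison to the platform.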

Paolo Bonzini-2

Re: testCodecLatin1 and testCodecUtf8Bom

In reply to this post by Michael Lucas-Smith-3

On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
> Hi All,
>
> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
> the Unicode string is created, #asByteArray is sent to it to "turn it
> back to bytes" but of course this is impossible

It's a hack, but if you use #asString it works in GNU Smalltalk too.

Paolo

Paolo Bonzini-2

Re: testCodecLatin1 and testCodecUtf8Bom

On 05/25/2010 09:50 AM, Paolo Bonzini wrote:
> On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
>> Hi All,
>>
>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
>> the Unicode string is created, #asByteArray is sent to it to "turn it
>> back to bytes" but of course this is impossible
>
> It's a hack, but if you use #asString it works in GNU Smalltalk too.

... and if I do that in #encode: it works. Sorry for not thinking about
it earlier. My excuse is that it is a bit gross. :-)

Paolo

Philippe Marschall

Re: testCodecLatin1 and testCodecUtf8Bom

2010/5/25 Paolo Bonzini <[hidden email]>:

> On 05/25/2010 09:50 AM, Paolo Bonzini wrote:
>>
>> On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
>>>
>>> Hi All,
>>>
>>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
>>> the Unicode string is created, #asByteArray is sent to it to "turn it
>>> back to bytes" but of course this is impossible
>>
>> It's a hack, but if you use #asString it works in GNU Smalltalk too.

Would #greaseString work as well?

> ... and if I do that in #encode: it works. Sorry for not thinking about it
> earlier. My excuse is that it is a bit gross. :-)

Would comparing using streams and sending work or would that fail as well?

Cheers
Philippe

Paolo Bonzini-2

Re: testCodecLatin1 and testCodecUtf8Bom

On 05/26/2010 09:58 PM, Philippe Marschall wrote:

> 2010/5/25 Paolo Bonzini <[hidden email]>:
>> On 05/25/2010 09:50 AM, Paolo Bonzini wrote:
>>>
>>> On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
>>>>
>>>> Hi All,
>>>>
>>>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
>>>> the Unicode string is created, #asByteArray is sent to it to "turn it
>>>> back to bytes" but of course this is impossible
>>>
>>> It's a hack, but if you use #asString it works in GNU Smalltalk too.
>
> would #greaseString work as well?

Yes, but I can do that in the codec as well. That's simpler and means
the latest SqueakSource tests should work in gst too.

Paolo