testCodecLatin1 and testCodecUtf8Bom

testCodecLatin1 and testCodecUtf8Bom

Michael Lucas-Smith-3
Hi All,

#testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
the Unicode string is created, #asByteArray is sent to it to "turn it
back to bytes" but of course this is impossible, it's already a
TwoByteString in VisualWorks and to turn that back to bytes, you need to
specify an encoding using #asByteArrayEncoding: 'utf-8' (for example).

Can these tests be written with a custom compare operation? The comment
in #testCodecLatin1 explains the problem (comparing Unicode strings
using #= is not necessarily going to work across platforms), but
perhaps something like this might work everywhere:

compare: a with: b
    "Answer true if a and b have the same size and equal elements at every index."
    a size = b size ifFalse: [^false].
    1 to: a size do: [:index |
        (a at: index) = (b at: index) ifFalse: [^false]].
    ^true

Cheers,
Michael
_______________________________________________
seaside-dev mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/seaside-dev

Re: testCodecLatin1 and testCodecUtf8Bom

Paolo Bonzini-2
On Sat, May 15, 2010 at 01:19, Michael Lucas-Smith
<[hidden email]> wrote:
> Hi All,
>
> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once the
> Unicode string is created, #asByteArray is sent to it to "turn it back to
> bytes" but of course this is impossible, it's already a TwoByteString in
> VisualWorks and to turn that back to bytes, you need to specify an encoding
> using #asByteArrayEncoding: 'utf-8' (for example).

Maybe instead of

 self assert: (codec decode: bom , self utf8String) asByteArray
     = self decodedString asByteArray.

we need

 self assert: (codec encode: (codec decode: bom , self utf8String))
     = (codec encode: self decodedString).

> Can these tests be written with a custom compare operation?

That too; however:

> The comment in
> #testCodecLatin1 explains the problem - that comparing unicode strings using
> #= is not necessarily going to work across platforms, but perhaps something
> like this might work everywhere:
>
> compare: a with: b
>   a size = b size ifFalse: [^false].
>   1 to: a size do: [:index |
>       (a at: index) = (b at: index) ifFalse: [^false]].
>   ^true

That would be the same as #=, wouldn't it?  Is comparing Unicode
strings using #= not portable *if we know the two strings have the
same encoding* or at least were produced by the same codec?

Paolo

Re: testCodecLatin1 and testCodecUtf8Bom

Philippe Marschall
In reply to this post by Michael Lucas-Smith-3
2010/5/15 Michael Lucas-Smith <[hidden email]>:
> Hi All,
>
> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once the
> Unicode string is created, #asByteArray is sent to it to "turn it back to
> bytes" but of course this is impossible, it's already a TwoByteString in
> VisualWorks and to turn that back to bytes, you need to specify an encoding
> using #asByteArrayEncoding: 'utf-8' (for example).

You're right; looking at it again, using #asByteArray seems
problematic due to its underspecified nature.

> Can these tests be written with a custom compare operation? The comment in
> #testCodecLatin1 explains the problem - that comparing unicode strings using
> #= is not necessarily going to work across platforms, but perhaps something
> like this might work everywhere:
>
> compare: a with: b
>   a size = b size ifFalse: [^false].
>   1 to: a size do: [:index |
>       (a at: index) = (b at: index) ifFalse: [^false]].
>   ^true

If that works with everybody, sure.

Cheers
Philippe

Re: testCodecLatin1 and testCodecUtf8Bom

Paolo Bonzini-2
On Sun, May 16, 2010 at 15:00, Philippe Marschall
<[hidden email]> wrote:

> 2010/5/15 Michael Lucas-Smith <[hidden email]>:
>> Hi All,
>>
>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once the
>> Unicode string is created, #asByteArray is sent to it to "turn it back to
>> bytes" but of course this is impossible, it's already a TwoByteString in
>> VisualWorks and to turn that back to bytes, you need to specify an encoding
>> using #asByteArrayEncoding: 'utf-8' (for example).
>
> You're right; looking at it again, using #asByteArray seems
> problematic due to its underspecified nature.
>
>> Can these tests be written with a custom compare operation? The comment in
>> #testCodecLatin1 explains the problem - that comparing unicode strings using
>> #= is not necessarily going to work across platforms, but perhaps something
>> like this might work everywhere:
>>
>> compare: a with: b
>>   a size = b size ifFalse: [^false].
>>   1 to: a size do: [:index |
>>       (a at: index) = (b at: index) ifFalse: [^false]].
>>   ^true
>
> If that works with everybody, sure.

It doesn't work for gst.  I think it's simplest to add something like
#assert:equalsDecodedString: and let the platform package define it.

Paolo

Re: testCodecLatin1 and testCodecUtf8Bom

Paolo Bonzini-2
In reply to this post by Michael Lucas-Smith-3
On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
> Hi All,
>
> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
> the Unicode string is created, #asByteArray is sent to it to "turn it
> back to bytes" but of course this is impossible

It's a hack, but if you use #asString it works in GNU Smalltalk too.

Paolo

Re: testCodecLatin1 and testCodecUtf8Bom

Paolo Bonzini-2
On 05/25/2010 09:50 AM, Paolo Bonzini wrote:
> On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
>> Hi All,
>>
>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
>> the Unicode string is created, #asByteArray is sent to it to "turn it
>> back to bytes" but of course this is impossible
>
> It's a hack, but if you use #asString it works in GNU Smalltalk too.

... and if I do that in #encode: it works.  Sorry for not thinking about
it earlier.  My excuse is that it is a bit gross.  :-)

Paolo

Re: testCodecLatin1 and testCodecUtf8Bom

Philippe Marschall
2010/5/25 Paolo Bonzini <[hidden email]>:

> On 05/25/2010 09:50 AM, Paolo Bonzini wrote:
>>
>> On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
>>>
>>> Hi All,
>>>
>>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
>>> the Unicode string is created, #asByteArray is sent to it to "turn it
>>> back to bytes" but of course this is impossible
>>
>> It's a hack, but if you use #asString it works in GNU Smalltalk too.

Would #greaseString work as well?

> ... and if I do that in #encode: it works.  Sorry for not thinking about it
> earlier.  My excuse is that it is a bit gross.  :-)

Would comparing using streams and sending work, or would that fail as well?

Cheers
Philippe
_______________________________________________
seaside-dev mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/seaside-dev
Reply | Threaded
Open this post in threaded view
|

Re: testCodecLatin1 and testCodecUtf8Bom

Paolo Bonzini-2
On 05/26/2010 09:58 PM, Philippe Marschall wrote:

> 2010/5/25 Paolo Bonzini<[hidden email]>:
>> On 05/25/2010 09:50 AM, Paolo Bonzini wrote:
>>>
>>> On 05/15/2010 01:19 AM, Michael Lucas-Smith wrote:
>>>>
>>>> Hi All,
>>>>
>>>> #testCodecLatin1 and #testCodecUtf8Bom fail for me on VisualWorks. Once
>>>> the Unicode string is created, #asByteArray is sent to it to "turn it
>>>> back to bytes" but of course this is impossible
>>>
>>> It's a hack, but if you use #asString it works in GNU Smalltalk too.
>
> Would #greaseString work as well?

Yes, but I can do that in the codec as well.  That's simpler and means
the latest SqueakSource tests should work in gst too.

Paolo