Possible Bug: String>>#= treats nulls as a terminator

Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
Is this correct?

(String with: 12 asCharacter with: 0 asCharacter) =
    (String with: 12 asCharacter with: 0 asCharacter with: 32 asCharacter)

Other string methods, like #copyAfter:, don't treat null the same way.
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
My example and thread title were wrong. It skips null *and* various control chars entirely when comparing:
(0 to: 255) select: [:each |
        (String with: $a with: $b) =
                (String with: $a with: each asCharacter with: $b)]

which yields:
anArray( 0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 173)

The GS Prog Guide (p. 77) says the ICU lib handles string comparisons internally, and it seems to ignore these characters for the sake of normalization.

But that means two Strings can be #= while having different #sizes and different characters at the same indexes, that comparisons between Strings containing binary data aren't reliable, and that other String methods aren't consistent with #=:
| one two |
one := String with: $a with: 0 asCharacter with: $b.
two := String with: $a with: $b.
one = two
        and: [(one at: 1 equals: two) not
                and: [(two at: 1 equals: one) not]]

And since GsFile #next and #contents are character based:
(GsFile open: 'bin.one' mode: 'wb' onClient: false)
        nextPutAll: #[100 25 200];
        close.
(GsFile open: 'bin.two' mode: 'wb' onClient: false)
        nextPutAll: #[100 200];
        close.
(GsFile open: 'bin.one' mode: 'rb' onClient: false) contents =
        (GsFile open: 'bin.two' mode: 'rb' onClient: false) contents.

Consider this more as a "heads-up" for users than a bug report, since this is apparently the intended, documented behavior.
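
For contrast with the GsFile example above, here is a minimal sketch of a byte-oriented comparison. It is written in Python, not GemStone Smalltalk, and says nothing about what GsFile itself offers; the helper names write_bytes and same_contents are hypothetical, and the files are created in a temporary directory. It only illustrates that comparing the raw bytes keeps the 25 byte significant:

```python
import os
import tempfile


def write_bytes(path, data):
    """Write raw bytes to path (binary mode, no text translation)."""
    with open(path, "wb") as f:
        f.write(bytes(data))


def same_contents(path_a, path_b):
    """Compare two files byte for byte; no collation or normalization."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return a.read() == b.read()


# Hypothetical file names mirroring the GsFile example, in a temp directory.
workdir = tempfile.mkdtemp()
bin_one = os.path.join(workdir, "bin.one")
bin_two = os.path.join(workdir, "bin.two")
write_bytes(bin_one, [100, 25, 200])
write_bytes(bin_two, [100, 200])

# True: the extra control byte makes the contents different.
files_differ = not same_contents(bin_one, bin_two)
```

Because the contents are compared as bytes rather than as characters, no byte value is skipped.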

> Sent: Friday, January 26, 2018 at 2:20 AM
> From: "monty via Glass" <[hidden email]>
> To: [hidden email]
> Subject: [Glass] Possible Bug: String>>#= treats nulls as a terminator

Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
Monty,

Good points ... this "unexpected" behavior of Unicode strings with
respect to control characters has been hard for us to grapple with
internally as well, but this is Unicode being Unicode. I did notice that,
with the exception of code point 173, all of the code points you list
are indeed control characters according to the Unicode character table[1].

Code point 173 is a "Soft Hyphen"[2] and doesn't really seem to fit the
description of a control character, so I'm now curious if we might have
a bug here, either in our implementation, the implementation of libICU,
or my understanding :)

I'm curious how you ran across this behavior? The control characters
wouldn't seem to be a normal part of strings intended for display ...

I'm asking because if there is a use case for providing the old literal
byte comparison operators we can make them available.

Dale

[1] https://unicode-table.com/en/#control-character
[2] https://unicode-table.com/en/00AD/
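
The classification above can be checked against the Unicode general categories. The following is a small sketch in Python (not GemStone code) using the standard unicodedata module; the variable names are illustrative, and the list of code points is copied from the earlier post. It shows how the reported values are categorized, and makes no claim about how libICU decides what to ignore:

```python
import unicodedata

# Code points reported earlier in the thread as ignored by String>>#=.
reported = (
    list(range(0, 9)) + list(range(14, 32))
    + list(range(127, 133)) + list(range(134, 160)) + [173]
)

# General category of each reported code point ("Cc" = control, "Cf" = format).
categories = {cp: unicodedata.category(chr(cp)) for cp in reported}

# Reported code points that are not in the control category.
not_control = [cp for cp in reported if categories[cp] != "Cc"]

# Control characters in 0..255 that were NOT reported as ignored.
unreported_controls = [
    cp for cp in range(256)
    if unicodedata.category(chr(cp)) == "Cc" and cp not in reported
]
```

Here not_control contains only 173, which has category Cf (format) rather than Cc, and unreported_controls is 9 through 13 plus 133, that is, tab, the line-break characters, and NEL.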

On 01/27/2018 01:57 AM, monty via Glass wrote:


Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
I was writing tests for stream converter classes that do encoding/decoding from various encodings. But any use of Strings to store binary data is a use case. ByteArray is more appropriate, but GsFile is still byte-character based by default, even when you open files in binary mode (which I assume just disables line ending normalization on Windows).

> Sent: Saturday, January 27, 2018 at 12:18 PM
> From: "Dale Henrichs via Glass" <[hidden email]>
> To: [hidden email]
> Subject: Re: [Glass] Possible Bug: String>>#= treats nulls as a terminator

Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list


On 01/29/2018 01:16 AM, monty via Glass wrote:
> I was writing tests for stream converter classes that do encoding/decoding from various encodings. But any use of Strings to store binary data is a use case. ByteArray is more appropriate, but GsFile is still byte-character based by default, even when you open files in binary mode (which I assume just disables line ending normalization on Windows).
This seems like a GemStone bug at the end of the day ... ByteArray and
Utf8 are the two classes that _should_ be used, but if GsFile is not
handling them well, then that is an issue for us ... I will check this
out ...

Thanks,

Dale


Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
The real problem is String>>#=. It's bizarre that two SequenceableCollections can be #= yet have different #sizes and that for every shared index i, it's not necessarily true that "(one at: i) = (two at: i)":
| one two |

one := String with: $a with: 25 asCharacter with: $b.
two := one copyWithout: one second.
one = two
        and: [one asArray ~= two asArray
                and: [
                        (1 to: (one size min: two size)) anySatisfy: [:i |
                                (one at: i) ~= (two at: i)]]].

Java and C# model strings as immutable indexed collections of UTF-16 16-bit code units (meaning surrogate pair-encoded code points require two units), and no normalization is done during comparisons. Instead there are special methods, like Normalize(), that convert a string into a chosen normalized form, and normalized comparisons can then be done on the converted strings. Ignoring the choice of UTF-16, this seems like a better, safer approach if you're still committed to treating strings as indexable character collections.

But I'm not sure how you can fix String or GsFile without breaking backwards compatibility.
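
The "exact comparison by default, normalization on request" model described above can be sketched as follows. This uses Python rather than Java or C#, so strings are sequences of code points instead of UTF-16 code units, and unicodedata.normalize stands in for Normalize(); normalized_equal is a hypothetical helper, not an existing API:

```python
import unicodedata


def normalized_equal(a, b, form="NFC"):
    """Compare two strings after converting both to the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "caf\u00e9"   # 'e' with acute accent as one code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent (U+0301)

# Plain equality compares code point by code point, so these differ.
plain_equal = precomposed == decomposed                        # False

# Equality after explicit normalization is opt-in.
canonical_equal = normalized_equal(precomposed, decomposed)    # True

# Normalization leaves control characters alone, so they still count.
control_equal = normalized_equal("a\x19b", "ab")               # False
```

With this split, plain equality stays consistent with size and indexing (the two strings above have 4 and 5 elements and compare unequal), while callers who want canonical equivalence ask for it explicitly.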

> Sent: Monday, January 29, 2018 at 11:44 AM
> From: "Dale Henrichs via Glass" <[hidden email]>
> To: [hidden email]
> Subject: Re: [Glass] Possible Bug: String>>#= treats nulls as a terminator

Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
Another one:
| one two |

one := 'Köln'.
two := String with: $K with: $o with: 16r308 asCharacter with: $l with: $n.
one = two
        and: [one size ~= two size
                and: [(one endsWith: two) not
                        and: [(one beginsWith: two) not
                                and: [(two endsWith: one) not
                                        and: [(two beginsWith: one) not]]]]].

> Sent: Tuesday, January 30, 2018 at 2:34 AM
> From: "monty via Glass" <[hidden email]>
> To: [hidden email]
> Subject: Re: [Glass] Possible Bug: String>>#= treats nulls as a terminator

Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
In reply to this post by GLASS mailing list
Hi Monty,


> On 30.01.2018, at 08:34, monty via Glass <[hidden email]> wrote:
>
> The real problem is String>>#=. It's bizarre that two SequenceableCollections can be #= yet have different #sizes and that for every shared index i, it's not necessarily true that "(one at: i) = (two at: i)":

This is, however, in line with Unicode …
See this very on-point discussion of the matter:

https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/

Best regards
        -Tobias

> | one two |
>
> one := String with: $a with: 25 asCharacter with: $b.
> two := one copyWithout: one second.
> one = two
> and: [one asArray ~= two asArray
> and: [
> (1 to: (one size min: two size)) anySatisfy: [:i |
> (one at: i) ~= (two at: i)]]].
>
> Java and C# model strings as immutable indexed collections of UTF-16 16-bit code units (meaning surrogate pair-encoded code points require two units), and no normalization is done during comparisons. Instead there are special methods, like Normalize(), that convert a string into a chosen normalized form, and normalized comparisons can then be done on the converted strings. Ignoring the choice of UTF-16, this seems like a better, safer approach if you're still committed to treating strings as indexable character collections.
>
> But I'm not sure how you can fix String or GsFile without breaking backwards compatibility.
>
>> Sent: Monday, January 29, 2018 at 11:44 AM
>> From: "Dale Henrichs via Glass" <[hidden email]>
>> To: [hidden email]
>> Subject: Re: [Glass] Possible Bug: String>>#= treats nulls as a terminator
>>
>>
>>
>> On 01/29/2018 01:16 AM, monty via Glass wrote:
>>> I was writing tests for stream converter classes that do encoding/decoding from various encodings. But any use of Strings to store binary data is a use case. ByteArray is more appropriate, but GsFile is still byte-character based by default, even when you open files in binary mode (which I assume just disables line ending normalization on Windows).
>> This seems like a GemStone bug at the end of the day ... ByteArray and
>> Utf8 are the two classes that _should_ be used, but if GsFile is not
>> handling them well, then that is an issue for us ... I will check this
>> out ...
>>
>> Thanks,
>>
>> Dale
>>
>>>
>>>> Sent: Saturday, January 27, 2018 at 12:18 PM
>>>> From: "Dale Henrichs via Glass" <[hidden email]>
>>>> To: [hidden email]
>>>> Subject: Re: [Glass] Possible Bug: String>>#= treats nulls as a terminator
>>>>
>>>> Monty,
>>>>
>>>> Good points ... this "unexpected" behavior of Unicode strings with
>>>> respect to control characters has been hard for us to grapple with
>>>> internally as well, but this is unicode being unicode. I did notice that
>>>> with the exception of code point 173, all of the code points you list
>>>> are indeed control characters according to the Unicode character table[1].
>>>>
>>>> Code point 173 is a "Soft Hyphen"[2] and doesn't really seem to fit the
>>>> description of a control character, so I'm now curious if we might have
>>>> a bug here, either in our implementation, the implementation of libICU
>>>> or my understanding:)
>>>>
>>>> I'm curious how you ran across this behavior? The control characters
>>>> wouldn't seem to be a normal part of strings intended for display ...
>>>>
>>>> I'm asking because if there is a use case for providing the old literal
>>>> byte comparison operators we can make them available.
>>>>
>>>> Dale
>>>>
>>>> [1] https://unicode-table.com/en/#control-character
>>>> [2] https://unicode-table.com/en/00AD/
>>>>


Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
You misunderstood the issue. If you choose one string representation (like an indexed collection of code points) but use another (like normalized EGSs) when doing basic comparisons, you get these inconsistencies that arguably violate the underlying indexable collection interface contract (like #= being true while #beginsWith: and #endsWith: are false).

Perl 6, which your article mentions, models strings as indexed, _pre-normalized_ collections of EGSs[0]:
        "Köln" eq "Ko\x308ln" && "Köln".chars == "Ko\x308ln".chars && "Köln".codes == "Ko\x308ln".codes && "Köln".starts-with("Ko\x308ln") && "Köln".ends-with("Ko\x308ln")

('chars' is the length in EGSs, while 'codes' is the length in code points.) The Java/C# approach is more basic, but it's still consistent, forcing you to manually normalize strings before comparing them by code unit, if you want a normalized comparison.
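
For comparison, here is a minimal Python sketch of that explicit-normalization approach. Python is only standing in for Java/C# here; `unicodedata.normalize` is the standard-library call, and the variable names are purely illustrative. Plain `==` compares code points, so the composed and decomposed spellings differ until both sides are converted to the same normalization form:

```python
import unicodedata

composed = "K\u00f6ln"      # 'ö' as one code point (U+00F6)
decomposed = "Ko\u0308ln"   # 'o' followed by U+0308 COMBINING DIAERESIS

# Raw comparison is by code point: different sizes, not equal.
raw_equal = composed == decomposed
sizes = (len(composed), len(decomposed))

# Normalize both sides to NFC first, then compare by code point.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
normalized_equal = nfc_a == nfc_b

print(raw_equal, sizes, normalized_equal, len(nfc_a) == len(nfc_b))
```

Equality, size and indexing stay consistent with each other because normalization is a visible conversion step rather than something hidden inside the comparison.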

Anyway, I would recommend adding character (code point)-based comparison messages to String, and a #byteContents/#binaryContents message to GsFile, or even better, #ascii/#binary toggles like Pharo/Squeak have so you can set GsFile to #binary and use #next (instead of #nextByte) and #contents normally.

0: https://github.com/MoarVM/MoarVM/blob/master/docs/strings.asciidoc#normalization


Re: Possible Bug: String>>#= treats nulls as a terminator

GLASS mailing list
In reply to this post by GLASS mailing list
One more, and easily the worst:

| one two |
one := #[97 150 98] asString.
two := #[97 98] asString.
one = two
        and: [one hash ~= two hash]
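
Why this one matters most: hashed collections look an element up by its hash before they ever send #=, so two equal strings with different hashes can both end up in the same Set, or fail to be found in it. A small Python model of the effect follows. `LooseString` is a hypothetical stand-in, not GemStone's implementation: its equality skips control characters (Unicode general category Cc), while its hash is computed from the raw code points.

```python
import unicodedata


class LooseString:
    """Hypothetical model: == ignores control characters, hash does not."""

    def __init__(self, text):
        self.text = text

    def _without_controls(self):
        # Drop characters in Unicode general category Cc (C0/C1 controls).
        return "".join(ch for ch in self.text
                       if unicodedata.category(ch) != "Cc")

    def __eq__(self, other):
        return (isinstance(other, LooseString)
                and self._without_controls() == other._without_controls())

    def __hash__(self):
        # Deliberately computed from the raw code points, so it can
        # disagree with __eq__.
        return sum(ord(ch) for ch in self.text)


one = LooseString("a\x96b")   # same code points as #[97 150 98] asString
two = LooseString("ab")       # same code points as #[97 98] asString

equal_but_hash_differs = one == two and hash(one) != hash(two)
both_kept = len({one, two})   # a set should keep only one of two equal elements
found = two in {one}          # lookup goes by hash first

print(equal_but_hash_differs, both_kept, found)
```

In this model, deriving the hash from the same control-stripped form that the equality test compares would restore the contract.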
