[3.4.0] Strange behaviour with indices ...

GLASS mailing list

I've created several indices on an IdentitySet holding domain objects. One indexed attribute is a telephone number, which is a String (or a Unicode string). The GsIndexSpec looks like this:

GsIndexSpec new
...
unicodeStringOptimizedIndex: 'each.address.pm_telnr'
collator: (IcuCollator forLocaleNamed: 'de_DE')
options: GsIndexOptions btreePlusIndex + GsIndexOptions optimizedComparison;
...
yourself.


I then ran a combined range query WITH the indices built and got NO results:

| telephone aqmListFound eachCATIGeneralStudy aSet aGsQuery |
eachCATIGeneralStudy := WCATIServiceClass dataRootInstance searchStudyGlobalViaNumber: '18020201'.
aGsQuery := GsQuery fromString: 'lowerBound <= each.address.pm_telnr <= upperBound'.
aGsQuery
bind: 'lowerBound' to: '047' ;
bind: 'upperBound' to: '0479'.
(aGsQuery on: eachCATIGeneralStudy getAddressManagement getAddressQueueMemberships) asArray.

The same query WITHOUT the indices built returns 155 entries (and the results seem to be correct).

When I split the query into two subqueries, I get answers whether or not the indices are built. First the upperBound query:

| telephone aqmListFound eachCATIGeneralStudy aSet aGsQuery |

eachCATIGeneralStudy := WCATIServiceClass dataRootInstance searchStudyGlobalViaNumber: '18020201'.

aGsQuery := GsQuery fromString: 'each.address.pm_telnr <= upperBound'.
aGsQuery
bind: 'upperBound' to: '0479'.
(aGsQuery on: eachCATIGeneralStudy getAddressManagement getAddressQueueMemberships) asArray.

This returns 200 items (and the result seems to be correct).

Then I execute the lowerBound query and again get answers, whether or not the indices are built:

| telephone aqmListFound eachCATIGeneralStudy aSet aGsQuery |

eachCATIGeneralStudy := WCATIServiceClass dataRootInstance searchStudyGlobalViaNumber: '18020201'.

aGsQuery := GsQuery fromString: 'lowerBound <= each.address.pm_telnr'.
aGsQuery
bind: 'lowerBound' to: '047'.
(aGsQuery on: eachCATIGeneralStudy getAddressManagement getAddressQueueMemberships) asArray.
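
Since both one-sided queries return plausible results, one possible interim workaround is to run them separately and intersect the two result sets. The sketch below illustrates the idea in Python rather than GemStone Smalltalk. The telephone numbers are hypothetical, and Python's code-point string ordering only stands in for the ICU collation the real index uses.

```python
# Hypothetical telephone numbers; not data from the real repository.
numbers = ["0461", "047", "0471234", "0479", "04791", "048", "0301"]

lower_bound, upper_bound = "047", "0479"

# The two one-sided queries, each of which works on its own.
at_least_lower = {n for n in numbers if lower_bound <= n}
at_most_upper = {n for n in numbers if n <= upper_bound}

# Intersecting them selects the same elements as the combined range query.
workaround = at_least_lower & at_most_upper
combined = {n for n in numbers if lower_bound <= n <= upper_bound}

print(sorted(workaround))
```

The intersection costs two index lookups instead of one, so it is only a stopgap until the combined query behaves correctly.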


That does not look promising ...


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list

Ok, I changed the GsIndexSpec to:

GsIndexSpec new

...

stringOptimizedIndex: 'each.address.pm_telnr'
collator: (IcuCollator forLocaleNamed: 'de_DE')
options: GsIndexOptions btreePlusIndex + GsIndexOptions optimizedComparison;

...

and now the range query (a <= each <= b) works and returns results. I don't know why, since the Unicode index should work as well. The attribute values were always String or Unicode7 instances.

Marten


In reply to the post by Marten Feldtmann via Glass of 27 February 2018, 18:36.

Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list

Marten,

This is indeed strange behavior ... the code paths used by these two indexSpecs _are_ different, but at the end of the day, the index behavior should have been the same ... I'm assuming that you are using a GsDevKit/GLASS repository and that `CharacterCollection isInUnicodeComparisonMode` evaluates to true...

Could you supply me with a simplified test case that reproduces the problem? 

I suspect that the problem might be related to the data involved. I’m not at my computer right now, so I can’t determine whether a trivial data set reproduces the problem or not.

Dale

In reply to the post by Marten Feldtmann via Glass of 02/27/2018, 11:20 AM.

Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list

CharacterCollection isInUnicodeComparisonMode evaluates to true ...

Marten

In reply to the post by Dale Henrichs via Glass of 28 February 2018, 00:24.

Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list
In reply to this post by GLASS mailing list

Marten,

Just want to let you know that I've finally cleared enough time to reproduce the issue you've reported, and hopefully I will get to the bottom of the problem shortly ...

Dale



Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list

Marten,

I found the source of the bug. The method BtreePlusComparisonQuerySpec>>unicodePerformSelector: is incorrect:

  unicodePerformSelector: anOpCode
  ^ rangeIndex constraintType == #'symbol'
    ifTrue: [ self optimizedUnicodeSymbolPerformSelectors at: opCode ]
    ifFalse: [
      rangeIndex constraintType == #'string'
        ifTrue: [ self optimizedUnicodeStringPerformSelectors at: opCode ]
        ifFalse: [
          "rangeIndex constraintType == #unicodeString or nil"
          self performSelector: opCode ] ]

The method looks up opCode rather than its argument anOpCode. It should be:

  unicodePerformSelector: anOpCode
  ^ rangeIndex constraintType == #'symbol'
    ifTrue: [ self optimizedUnicodeSymbolPerformSelectors at: anOpCode ]
    ifFalse: [
      rangeIndex constraintType == #'string'
        ifTrue: [ self optimizedUnicodeStringPerformSelectors at: anOpCode ]
        ifFalse: [
          "rangeIndex constraintType == #unicodeString or nil"
          self performSelector: anOpCode ] ]


We'll have to expand our test coverage to cover this particular case ... I've submitted internal bug 47509, "BtreePlusComparisonQuerySpec>>unicodePerformSelector: is incorrect", and expect the fix to show up in 3.4.2.
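
The class of bug here is a method that ignores its argument and reads a similarly named piece of instance state instead. The toy model below is written in Python, uses hypothetical names throughout, and is not GemStone's implementation. It assumes that a range query asks for a selector once per bound, which would explain why single-bound queries kept working while the two-bound query did not.

```python
class RangeQuerySpec:
    """Toy stand-in for a query spec; every name here is hypothetical."""

    SELECTORS = {">=": "greater_or_equal", "<=": "less_or_equal"}

    def __init__(self, op_code):
        # State recorded when the spec was created.
        self.op_code = op_code

    def selector_buggy(self, an_op_code):
        # Bug: the argument is ignored and the instance state is used.
        return self.SELECTORS[self.op_code]

    def selector_fixed(self, an_op_code):
        # Fix: look up the operator that was actually passed in.
        return self.SELECTORS[an_op_code]


spec = RangeQuerySpec(">=")

# A two-bound query needs one selector per bound.
buggy = [spec.selector_buggy(op) for op in (">=", "<=")]
fixed = [spec.selector_fixed(op) for op in (">=", "<=")]

print(buggy)  # both bounds end up with the same comparison
print(fixed)  # each bound gets its own comparison
```

A regression test that exercises both bounds in a single query, as in the last lines above, is the kind of coverage that would catch this.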

FWIW, using #stringOptimizedIndex:

          aGsIndexSpec
            stringOptimizedIndex: 'each.address.pm_telnr'
            options:
              GsIndexOptions btreePlusIndex + GsIndexOptions optimizedComparison

should be significantly faster than unicodeStringOptimizedIndex:

          aGsIndexSpec
            unicodeStringOptimizedIndex: 'each.address.pm_telnr'
            options:
              GsIndexOptions btreePlusIndex + GsIndexOptions optimizedComparison

Dale

In reply to the post by Dale Henrichs of 03/19/2018, 04:48 PM.

Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list
Dale, you're busy, but did you read the whole String>>#= thread? This:
| one two |
one := 'Köln'.
two :=  String with: $K with: $o with: 16r308 asCharacter with: $l with: $n.
one = two
        and: [one hash ~= two hash]

is absolutely a bug, not just a surprise. To be consistent with #=, String>>#hash must normalize its string (with ICU) before computing its hash.

(IMO #=, #<=, etc doing implicit normalization before comparing was a mistake. Either the Java/C# way of making the user manually normalize the strings before comparing or the Perl6 way of storing strings internally in a pre-normalized form would have been more intuitive and consistent. I assume it's too late to change now.)
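
The contract at stake is that objects which compare equal must also hash equal. A minimal sketch in Python, assuming NFC as the normalization form and using a hypothetical wrapper class, shows equality and hash derived from the same normalized key. Note that plain Python strings compare by code point, unlike the GemStone behavior described above, so the wrapper is what models the normalizing comparison.

```python
import unicodedata


class NormalizedText:
    """Hypothetical wrapper whose equality and hash both use NFC."""

    def __init__(self, text):
        self.text = text
        self._key = unicodedata.normalize("NFC", text)

    def __eq__(self, other):
        if not isinstance(other, NormalizedText):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Hash the same normalized form that __eq__ compares.
        return hash(self._key)


one = "K\u00f6ln"   # precomposed o-umlaut (U+00F6)
two = "Ko\u0308ln"  # 'o' followed by combining diaeresis (U+0308)

wrapped_one, wrapped_two = NormalizedText(one), NormalizedText(two)

print(one == two)                              # raw code-point comparison
print(wrapped_one == wrapped_two)              # normalized comparison
print(hash(wrapped_one) == hash(wrapped_two))  # hash follows equality
```

Because both methods read the same precomputed key, they cannot drift apart, which is the property the #= and #hash pair above is missing.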

Re: [3.4.0] Strange behaviour with indices ...

GLASS mailing list
Hi,

Skimming this thread… we uncovered that problem as well in our migration from GS 2.4 to GS 3.2 (or later).
We started seeing strange differences and only recently tracked down the issue:

Even when using the locale en_US_POSIX, there seem to be cases where Strings are not compared by comparing their integer code point values. Example Strings:
string1 := String with: (Character codePoint: 16r00E9).
string2 := String with: (Character codePoint: 16r0065) with: (Character codePoint: 16r0301).

Even with the default IcuLocale set to en_US_POSIX, these strings are #=-equal:
string1 = string2 "=> true"
While this is not GemStone2-compatible behavior, it's not that bad for the Strings to be equal according to #=, as they are in fact canonically equivalent. However, the Strings do not have the same #hash:
string1 hash = string2 hash "=> false"

I’m primarily concerned about the places where we use dictionaries with strings as keys, now that we are upgrading from GS 2.4 to GS 3.2 (or later).
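
One defensive option for string-keyed dictionaries, independent of how #hash is eventually resolved, is to normalize keys explicitly on insertion and on lookup. The sketch below is in Python and assumes NFC as the canonical form; the helper name is hypothetical, and no GemStone normalization API is assumed because none appears in this thread.

```python
import unicodedata

string1 = "\u00e9"        # precomposed e-acute (U+00E9)
string2 = "\u0065\u0301"  # 'e' followed by combining acute accent (U+0301)


def canonical(key):
    """Hypothetical helper: bring a key into one canonical form (NFC)."""
    return unicodedata.normalize("NFC", key)


# Without normalization the two spellings land in different slots.
raw = {string1: "first"}
found_raw = string2 in raw

# Normalizing on the way in and on the way out makes them one key.
table = {canonical(string1): "first"}
found_normalized = canonical(string2) in table

print(found_raw, found_normalized)
```

The cost is one normalization pass per access, and existing dictionaries would need their keys re-added in canonical form.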

Cheers,
Johan

In reply to the post by monty via Glass of 20 Mar 2018, 10:17.