Unexpected string sorting anomaly [since forever]

Richard Sargent
Administrator
I have learned (the hard way) that VA Smalltalk has an unusual sorting characteristic for Strings, and has had it for a very long time. If you sort a collection of strings that contains strings differing from each other only in case, the sort is not stable: sometimes one will sort before the other, and sometimes they will sort the other way.

'false'  <  'FALSE'    false
'FALSE'  <  'false'    false <<<

'false'  =  'FALSE'    false
'FALSE'  =  'false'    false

'false'  >  'FALSE'    false <<<
'FALSE'  >  'false'    false

'false'  ~=  'FALSE'    true
'FALSE'  ~=  'false'    true

'false'  <=  'FALSE'    true <<<
'FALSE'  <=  'false'    true

'false'  >=  'FALSE'    true
'FALSE'  >=  'false'    true <<<


$h  <  $H    false
$H  <  $h    false <<<

$h  =  $H    false
$H  =  $h    false

$h  >  $H    false <<<
$H  >  $h    false

$h  ~=  $H    true
$H  ~=  $h    true

$h  <=  $H    true <<<
$H  <=  $h    true

$h  >=  $H    true
$H  >=  $h    true <<<
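A minimal Python model of the consequence for sorting, assuming the comparison behaves as if both strings were lowercased before comparing (an assumption read off the table above, not taken from VA Smalltalk's source). Under such a comparison 'false' and 'FALSE' are equivalent, so their final order is decided by the input order and the sort algorithm, not by the comparison itself. Python's `sorted` is guaranteed stable, so it simply preserves input order; an unstable algorithm could produce either order.

```python
# Hypothetical model: compare strings as if both were lowercased first.
def ci_key(s):
    return s.lower()

# The same two words, supplied in opposite input orders.
run_a = sorted(["false", "FALSE"], key=ci_key)
run_b = sorted(["FALSE", "false"], key=ci_key)

# The key cannot tell the words apart, so each result mirrors its input order.
print(run_a)  # ['false', 'FALSE']
print(run_b)  # ['FALSE', 'false']
```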

--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/va-smalltalk/9cba5856-bacb-4c7e-b631-3ea41a6f0f33%40googlegroups.com.

Re: Unexpected string sorting anomaly [since forever]

Hans-Martin Mosner-3
I wouldn't call that wrong; it's just a characteristic of a partial ordering, which can be overcome by more complex, locale-specific rules.
In Smalltalk, we've historically had a complete ordering (originally ASCII with the exception of the special up-arrow return and left-arrow assignment glyphs which were mapped to ASCII code points 16r5E and 16r5F) which sorted uppercase before lowercase, so 'TRUE' < 'false' but 'FALSE' < 'true'.
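Python's default string comparison is the same kind of complete, code-point ordering, so it can serve as a quick model of the classic behaviour (a Python sketch, not Smalltalk; the up-arrow/left-arrow remapping is not modelled). Every ASCII uppercase letter (codes 65 to 90) precedes every lowercase letter (97 to 122), which gives exactly the 'TRUE' < 'false' and 'FALSE' < 'true' results described here.

```python
words = ["true", "false", "TRUE", "FALSE"]

# Default str comparison orders by code point, so uppercase words come first.
print(sorted(words))     # ['FALSE', 'TRUE', 'false', 'true']
print("TRUE" < "false")  # True
print("FALSE" < "true")  # True
print("false" < "TRUE")  # False: 'f' (102) is greater than 'T' (84)
```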
With locale collation, things get a lot more complicated (and admittedly VAST doesn't do everything right here). Case differences play almost no role when you want to achieve dictionary order.
In some natural languages there are different rules which were developed in different contexts, and you can't really say that one is wrong while the other is right. See https://german.stackexchange.com/questions/52765/ordering-german-special-characters-and-those-from-other-languages-when-sorting for a discussion of German sort order, for example.
There are cases when single characters should be treated like character groups (for example, ß should be sorted like ss), and for case differences you might want to have a precedence rule such that case differences only matter if the words are completely equal when compared as lowercase. This would lead to an order 'THRU' < 'thru' < 'TRUE' < 'true' which probably feels most natural to most people.
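A minimal sketch of that precedence rule as a two-level sort key, in Python: compare case-insensitively first, and let case decide only among words that are otherwise equal. `str.casefold()` is used here as a stand-in for the ß-to-ss rule; it is Unicode case folding, not a locale-aware collation, so it does not implement the other German rules discussed in the link above.

```python
def collation_key(s):
    # Primary level: case-insensitive comparison; casefold() also maps "ß" to "ss".
    # Secondary level: break ties by code point, so uppercase precedes lowercase.
    return (s.casefold(), s)

words = ["true", "TRUE", "thru", "THRU"]
print(sorted(words, key=collation_key))  # ['THRU', 'thru', 'TRUE', 'true']
print("straße".casefold() == "strasse")  # True
```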

What you show in your example is basically a partial, case-insensitive comparison: the "<" and ">" messages return false when the receiver and the argument differ only in letter case, because then neither is considered less than or greater than the other. For a practical application, you should define a comparison method which orders strings the way you want them ordered :-)
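The partial comparison, and a comparison method that turns it into a complete ordering, can be sketched in Python. `ci_lt` and `ci_le` are hypothetical models of the behaviour in the original table, assuming case is simply ignored; `compare` is an illustrative three-way comparison (not a VA Smalltalk API) that breaks case-only ties by code point, so the result no longer depends on input order.

```python
from functools import cmp_to_key

# Hypothetical models of the observed operators: case is ignored.
def ci_lt(a, b):
    return a.lower() < b.lower()

def ci_le(a, b):
    return a.lower() <= b.lower()

# Neither string is "less than" the other, yet "<=" holds in both directions,
# even though the two strings are not equal.
print(ci_lt("false", "FALSE"), ci_lt("FALSE", "false"))  # False False
print(ci_le("false", "FALSE"), ci_le("FALSE", "false"))  # True True

def compare(a, b):
    """Three-way comparison: case-insensitive first, code point as tie-break."""
    ka, kb = (a.lower(), a), (b.lower(), b)
    return (ka > kb) - (ka < kb)  # -1, 0 or 1

print(sorted(["false", "FALSE"], key=cmp_to_key(compare)))  # ['FALSE', 'false']
print(sorted(["FALSE", "false"], key=cmp_to_key(compare)))  # ['FALSE', 'false']
```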

In reply to Richard Sargent's message of Thursday, 7 May 2020, 23:31:03 UTC+2.