Wrong sorting in my locale

Janko Mivšek
Hi guys

In VW7.7 I just noticed a pretty funny sorting in Slovenian locale
#'sl_SI.UTF-8' :

  #('Žu' 'Zu') asSortedCollection returns ('Zu' 'Žu') "correct"
  #('Ža' 'Zu') asSortedCollection returns ('Ža' 'Zu') "wrong!"

What is the reason for that? Is there a patch somewhere to solve that
sorting to be correct?

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Wrong sorting in my locale

Alan Knight-2
My understanding is that this is how those characters sort under the general Unicode Collation Algorithm, which is what VW 7.7 implements. Unicode defines <Z-caron> as being equivalent to <Z><caron>, and so it sorts comparing the caron against the $u. To have them sort appropriately for Slovenian, locale-specific collation rules would need to be implemented. That is work we have in progress, and we'd expect it to be available in the vw-dev builds in the next few weeks, but it is not in any released version right now.
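
One way to see why both results in the original post are consistent with a default Unicode ordering: in the Unicode Collation Algorithm's default table, a diacritic such as the caron only breaks ties after the base letters have been compared. The following is a minimal sketch of that two-level idea in plain Smalltalk. It is not the VisualWorks implementation; the names baseOf, lexCompare, primary, secondary and ucaLikeBefore are hypothetical, and the table covers only Ž and ž.

```smalltalk
"Simplified two-level comparison. An illustration only, NOT the VW 7.7 code."
| baseOf lexCompare primary secondary ucaLikeBefore |

"Hypothetical mini-table: accented letter -> base letter."
baseOf := Dictionary new.
baseOf at: $Ž put: $Z; at: $ž put: $z.

"Compare two integer sequences element by element.
 Answers true or false, or nil when the sequences are equal."
lexCompare := [:x :y | | result |
    result := nil.
    1 to: (x size min: y size) do: [:i |
        (result isNil and: [(x at: i) ~= (y at: i)])
            ifTrue: [result := (x at: i) < (y at: i)]].
    (result isNil and: [x size ~= y size])
        ifTrue: [result := x size < y size].
    result].

"Level 1: base letters only. Level 2: 1 where a caron was stripped, else 0."
primary := [:s | s asArray collect: [:ch | (baseOf at: ch ifAbsent: [ch]) asInteger]].
secondary := [:s | s asArray collect: [:ch | (baseOf includesKey: ch) ifTrue: [1] ifFalse: [0]]].

"Sort block: base letters decide first, the caron only breaks ties."
ucaLikeBefore := [:a :b | | r |
    r := lexCompare value: (primary value: a) value: (primary value: b).
    r isNil ifTrue: [r := lexCompare value: (secondary value: a) value: (secondary value: b)].
    r isNil or: [r]].

(#('Žu' 'Zu') asSortedCollection: ucaLikeBefore) asArray.  "under this model: ('Zu' 'Žu')"
(#('Ža' 'Zu') asSortedCollection: ucaLikeBefore) asArray.  "under this model: ('Ža' 'Zu')"
```

In this model 'Ža' and 'Zu' already differ at the base-letter level ($a before $u), so the caron is never consulted, while 'Žu' and 'Zu' tie on base letters and the caron decides.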

At 06:05 AM 2010-08-25, Janko Mivšek wrote:
> In VW7.7 I just noticed a pretty funny sorting in Slovenian locale
> #'sl_SI.UTF-8' :
>
>   #('Žu' 'Zu') asSortedCollection returns ('Zu' 'Žu') "correct"
>   #('Ža' 'Zu') asSortedCollection returns ('Ža' 'Zu') "wrong!"
>
> What is the reason for that? Is there a patch somewhere to solve that
> sorting to be correct?

--
Alan Knight, Engineering Manager, Cincom Smalltalk

Re: Wrong sorting in my locale

Holger Kleinsorgen-4
In reply to this post by Janko Mivšek
To use a locale-specific collation policy:

1. patch

LocaleLocalizationComponent class>install_sl_Locale
LocaleLocalizationComponent class>install_sl_SI_Locale

to use the desired collation selector instead of #englishCollate:to:

2. send

StringCollationPolicy collationAlgorithm:
#IWantToUseTheLocaleSpecificCollationAlgorithm

(any other symbol except #Fastest, #UnicodeNormal,
#UnicodeWithPunctuation will do, too)


Re: Wrong sorting in my locale

Alan Knight-2
That would work, but for Slovenian it omits one detail: you would also have to write the collation selector. There are a few such additional selectors, but not very many.
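
As a rough idea of what Slovenian-specific comparison logic might look like, here is a hedged sketch that ranks characters by their position in the 25-letter Slovenian alphabet. How such logic would be packaged as a collation selector comparable to #englishCollate:to: is not shown in this thread, so the sketch uses an explicit sort block instead. The names lower, upper, rankOf and slovenianBefore are hypothetical, and the handling of characters outside the alphabet is an arbitrary simplification.

```smalltalk
"Hypothetical sketch of a Slovenian ordering; not code shipped with VisualWorks."
| lower upper rankOf slovenianBefore |
lower := 'abcčdefghijklmnoprsštuvzž'.
upper := 'ABCČDEFGHIJKLMNOPRSŠTUVZŽ'.

"Position in the Slovenian alphabet, ignoring case. Characters outside it
 (q, w, x, y, digits, punctuation) sort after all letters, by code point."
rankOf := [:ch | | i |
    i := lower indexOf: ch.
    i = 0 ifTrue: [i := upper indexOf: ch].
    i = 0 ifTrue: [lower size + 1 + ch asInteger] ifFalse: [i]].

"Sort block: the first differing rank decides, then the shorter string wins."
slovenianBefore := [:a :b | | result |
    result := nil.
    1 to: (a size min: b size) do: [:k | | ra rb |
        ra := rankOf value: (a at: k).
        rb := rankOf value: (b at: k).
        (result isNil and: [ra ~= rb]) ifTrue: [result := ra < rb]].
    result isNil ifTrue: [a size <= b size] ifFalse: [result]].

(#('Ža' 'Zu') asSortedCollection: slovenianBefore) asArray.  "with this block: ('Zu' 'Ža')"
(#('Žu' 'Zu') asSortedCollection: slovenianBefore) asArray.  "with this block: ('Zu' 'Žu')"
```

Because Ž gets its own rank after Z, every word starting with Z sorts before every word starting with Ž, which is the ordering the original post expects.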

At 05:44 PM 2010-08-25, Holger Kleinsorgen wrote:
> to use a locale-specific collation policy:
>
> 1. patch
>
> LocaleLocalizationComponent class>install_sl_Locale
> LocaleLocalizationComponent class>install_sl_SI_Locale
>
> to use the desired collation selector instead of #englishCollate:to:
>
> 2. send
>
> StringCollationPolicy collationAlgorithm:
> #IWantToUseTheLocaleSpecificCollationAlgorithm
>
> (any other symbol except #Fastest, #UnicodeNormal,
> #UnicodeWithPunctuation will do, too)


Re: Wrong sorting in my locale

Janko Mivšek
In reply to this post by Holger Kleinsorgen-4
Thanks Holger and Alan,

I actually did a Slovenian-specific collation as an override of
StringCollationPolicy in a special package. When loaded, it changes the
collation policy image-wide to Slovenian only. Not a universal solution,
but good enough until the general one comes around.

Best regards
Janko
