Hello
According to http://www.unicode.org/cldr/charts/27/collation/de.html the German phonebook sort order is

a A ä Ä ą̈ Ą̈ ǟ Ǟ ạ̈ Ạ̈ ḁ̈ Ḁ̈ b B c C d D e E f F g G h H i I j J k K l L m M n N o O ö Ö ǫ̈ Ǫ̈ ȫ Ȫ ơ̈ Ơ̈ ợ̈ Ợ̈ ọ̈ Ọ̈ p P q Q r R s S ss ß t T u U ü Ü ǘ Ǘ ǜ Ǜ ǚ Ǚ ų̈ Ų̈ ǖ Ǖ ư̈ Ư̈ ự̈ Ự̈ ụ̈ Ụ̈ ṳ̈ Ṳ̈ ṷ̈ Ṷ̈ ṵ̈ Ṵ̈ v V w W x X y Y z Z

I wonder why it looks like this. It contains a lot of characters which never appear in a German text.

For Spanish there are 'traditional' and 'standard' orders:

http://www.unicode.org/cldr/charts/27/collation/es.html

standard

a A á Á b B c C d D e E é É f F g G h H i I í Í j J k K l L m M n N ñ Ñ ņ̃ Ņ̃ ṇ̃ Ṇ̃ ṋ̃ Ṋ̃ ṉ̃ Ṉ̃ o O ó Ó p P q Q r R s S t T u U ú Ú ü Ü v V w W x X y Y z Z

traditional

a A á Á b B c C ch Ch CH cĥ Cĥ CĤ cȟ Cȟ CȞ cḧ Cḧ CḦ cḣ Cḣ CḢ cḩ Cḩ CḨ cḥ Cḥ CḤ cḫ Cḫ CḪ cẖ Cẖ d D e E é É f F g G h H i I í Í j J k K l L ll Ll LL lĺ Lĺ LĹ lľ Lľ LĽ lļ Lļ LĻ lḷ Lḷ LḶ lḹ Lḹ LḸ lḽ Lḽ LḼ lḻ Lḻ LḺ m M n N ñ Ñ ņ̃ Ņ̃ ṇ̃ Ṇ̃ ṋ̃ Ṋ̃ ṉ̃ Ṉ̃ o O ó Ó p P q Q r R s S t T u U ú Ú ü Ü v V w W x X y Y z Z

French is not easily found on
http://www.unicode.org/cldr/charts/27/collation/index.html
and seems to be defined elsewhere:
http://unicode.org/repos/cldr/tags/release-27/common/collation/fr.xml

Suggestions and hints are welcome

--Hannes
_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
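A possible explanation for the unusual entries in the German chart, offered as an assumption based on how canonical equivalence works rather than on anything the chart page itself says: the phonebook order only needs to reposition ä, ö and ü, but every character or sequence that is canonically equivalent to one of those letters plus further combining marks is carried along with it. The minimal Ruby sketch below uses only String#unicode_normalize from the standard library; the helper name decomposition is made up for illustration.

```ruby
# Hypothetical helper: list the code points of the canonical
# decomposition (NFD) of a string as "U+XXXX" labels.
def decomposition(str)
  str.unicode_normalize(:nfd).codepoints.map { |cp| format("U+%04X", cp) }
end

samples = {
  "a with diaeresis"            => "\u00E4",        # ä
  "a with diaeresis and macron" => "\u01DF",        # ǟ
  "a with ogonek + diaeresis"   => "\u0105\u0308",  # ą̈ as listed in the chart
  "a with diaeresis + ogonek"   => "\u00E4\u0328"   # same letter, marks typed in another order
}

samples.each do |label, text|
  puts "#{label}: #{decomposition(text).join(' ')}"
end
```

All four samples start with U+0061 (a) and contain U+0308 (combining diaeresis), and the last two decompose to the same sequence, which would be consistent with them being listed next to ä.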
Hannes,
For GemStone, we are using the ICU library [1]. We have Unicode7, Unicode16 and Unicode32 classes (subclasses of CharacterCollection) for internal Strings and the class Utf8 (a subclass of ByteArray) for UTF-8 encoded strings ...

The ICU library provides the primitive implementations for working with the Unicode* and Utf8 classes.

When we started considering Unicode support, we looked at what it would take to support collation (our main reason for looking at Unicode in the first place). When we saw just how complicated the collation rules can be [2], we were glad to see that someone had already done the hard work [1] ...

Reconciling our legacy String implementations (String, DoubleByteString, and QuadByteString) with the Unicode* classes was also interesting, because the rules for Unicode equality and our legacy equality implementation were not quite compatible.

If you are interested in more information, I can share additional details ...

Dale

[1] http://site.icu-project.org/
[2] http://unicode.org/reports/tr10/

On 12/07/2015 11:54 AM, H. Hirzel wrote:
> [...]
> Suggestions and hints are welcome
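To illustrate the kind of mismatch between Unicode equality and a plain element-by-element comparison, here is a small Ruby sketch. It is not GemStone code, and whether this is the exact incompatibility meant above is an assumption. The helper canonically_equal? is a made-up name; String#unicode_normalize is part of Ruby's standard library.

```ruby
# Two spellings of the same word "Älg":
composed   = "\u00C4lg"   # U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308lg"  # "A" followed by U+0308 COMBINING DIAERESIS

# Hypothetical helper: treat two strings as equal when their
# NFC-normalized forms are identical (canonical equivalence).
def canonically_equal?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

# String#== compares the stored content, so the two spellings differ.
plain_equal   = (composed == decomposed)
unicode_equal = canonically_equal?(composed, decomposed)

puts "plain comparison:      #{plain_equal}"
puts "canonical equivalence: #{unicode_equal}"
puts "UTF-8 byte lengths:    #{composed.bytesize} vs #{decomposed.bytesize}"
```

A comparison that looks only at the stored code points or bytes reports the two strings as different, while a comparison that respects canonical equivalence reports them as equal.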
Dale
Thank you for your answer with the links to the ICU library and the notes about the classes in GemStone. It is noteworthy that you have a class Utf8 as a subclass of ByteArray.

I understand that GemStone uses the ICU library and thus does not implement the algorithms in Smalltalk.

I am currently looking into what the ICU library provides.

I also found a Ruby library [2] which implements CLDR [3]. It has methods like this:

"Alphabetize a list using regular Ruby sort:"

$> ["Art", "Wasa", "Älg", "Ved"].sort
$> ["Art", "Ved", "Wasa", "Älg"]

"Alphabetize a list using TwitterCLDR’s locale-aware sort:"

$> ["Art", "Wasa", "Älg", "Ved"].localize(:de).sort.to_a
$> ["Älg", "Art", "Ved", "Wasa"]

I hope that, given such an example, it would not be too difficult to reimplement a similar sort algorithm in Squeak/Cuis/Pharo. Currently the interest is in getting sorting done in a cross-dialect way.

--Hannes

[2] https://blog.twitter.com/2012/twittercldr-improving-internationalization-support-in-ruby
[3] Unicode Common Locale Data Repository http://cldr.unicode.org/index

On 12/7/15, Dale Henrichs <[hidden email]> wrote:
> [...]
> If you are interested in more information, I can share additional
> details ...
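As a rough idea of what a reimplementation could start from, here is a sort-key sketch in plain Ruby. It is not the Unicode Collation Algorithm and not TwitterCLDR's implementation; it only approximates German ordering by comparing base letters first and using the original spelling as a tie-breaker. The names german_sort_key and PHONEBOOK_EXPANSIONS are made up for illustration.

```ruby
# Phonebook-style expansions (ä -> ae etc.); an assumption modelled on
# the usual description of German phonebook ordering.
PHONEBOOK_EXPANSIONS = {
  "\u00E4" => "ae", "\u00F6" => "oe", "\u00FC" => "ue",
  "\u00C4" => "Ae", "\u00D6" => "Oe", "\u00DC" => "Ue"
}.freeze

# Hypothetical helper: build a two-part key [primary, tie_breaker].
# primary:     base letters only, lower-cased, with ß treated as ss
# tie_breaker: the NFC-normalized original word
def german_sort_key(word, phonebook: false)
  text = word.unicode_normalize(:nfc)
  expanded = phonebook ? text.gsub(/[\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC]/, PHONEBOOK_EXPANSIONS) : text
  primary = expanded.unicode_normalize(:nfd)  # split letters from their marks
                    .gsub(/\p{Mn}/, "")       # drop the combining marks
                    .downcase
                    .gsub("\u00DF", "ss")     # ß sorts like ss
  [primary, text]
end

words = ["Art", "Wasa", "\u00C4lg", "Ved"]    # "\u00C4lg" is "Älg"
puts words.sort.inspect
puts words.sort_by { |w| german_sort_key(w) }.inspect

names = ["Afrika", "\u00C4rger", "Adler"]     # "\u00C4rger" is "Ärger"
puts names.sort_by { |w| german_sort_key(w) }.inspect
puts names.sort_by { |w| german_sort_key(w, phonebook: true) }.inspect
```

For the four words from the TwitterCLDR example, the sketch puts "Älg" before "Art", matching the locale-aware result quoted above. The second list shows where the dictionary-style and phonebook-style keys differ: "Ärger" sorts after "Afrika" in the first case and before it in the second. A real implementation would still need the weight tables and tailoring rules from CLDR.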