Are there any libraries for GLASS that will take UTF-8 and transliterate/anglicize it to ASCII, so that something like

émile becomes emile
wœrl becomes woerl

etc.?

I ask because we are about to roll out GLASS-based e-commerce stuff into Europe, and our merchant processor can't handle non-ASCII characters in things like the required credit card name, shipping names, etc.

In our existing Perl system we have code that converts from UTF-8 to plain old ASCII following a basic set of rules like the couple of examples above. What it ends up as doesn't matter so much, as long as someone looking at it can understand how you got from one to the other and doesn't start scratching their head.
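Below is a minimal sketch of the rule-based approach described above. It is written in Python purely for illustration; it is not GemStone code and not the Perl code mentioned. The `SPECIAL_CASES` table and the `to_ascii` name are hypothetical. The assumption is that explicit rules are only needed for letters like œ or ß, which Unicode normalization cannot split into a base letter plus an accent. Everything else falls back to dropping the accent.

```python
import unicodedata

# Hypothetical rule table: letters with no canonical decomposition into
# "base letter + accent", so they need an explicit ASCII spelling.
SPECIAL_CASES = {
    "œ": "oe", "Œ": "OE",
    "æ": "ae", "Æ": "AE",
    "ß": "ss",
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "þ": "th", "Þ": "Th",
}


def to_ascii(text: str, unknown: str = "?") -> str:
    """Rule-based transliteration of a Unicode string to plain ASCII."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        elif ch in SPECIAL_CASES:
            out.append(SPECIAL_CASES[ch])
        else:
            # NFD splits e.g. "é" into "e" + U+0301 (combining acute);
            # keeping only the ASCII part of the result drops the accent.
            base = "".join(c for c in unicodedata.normalize("NFD", ch) if c.isascii())
            # Characters with no ASCII base at all (e.g. CJK) become the marker.
            out.append(base or unknown)
    return "".join(out)


print(to_ascii("émile"), to_ascii("wœrl"))  # emile woerl
```

With this fallback, "Müller" becomes "Muller". A German-oriented rule set would usually map ü to "ue" instead, and the table is where such locale-specific choices would go.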
The standard UTF-8 conversion in GLASS uses a primitive call for performance (UTF8PrimitiveEncoding). The class UTF8Encoding does the UTF-8 conversion in Smalltalk, so I assume it would be possible to do the type of conversion you want by subclassing UTF8Encoding and then using that class before shipping strings to ASCII-only sites.
Dale
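Dale's idea of putting the conversion into the encoding step itself has a rough analog in Python. It is sketched here for illustration only and is not the GLASS UTF8Encoding API. The analog is a custom codec error handler registered with `codecs.register_error`, which the ASCII encoder calls only for characters it cannot encode. The handler name `"translit"` and the `FALLBACK` table are hypothetical.

```python
import codecs
import unicodedata

# Hypothetical fallback table for letters with no accent-free decomposition.
FALLBACK = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ß": "ss", "ø": "o", "Ø": "O"}


def translit_handler(exc):
    """Codec error handler: receives only the characters ASCII can't encode."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        if ch in FALLBACK:
            pieces.append(FALLBACK[ch])
        else:
            # Decompose and keep only the ASCII part, i.e. drop the accent.
            stripped = "".join(c for c in unicodedata.normalize("NFD", ch) if c.isascii())
            pieces.append(stripped or "?")
    # Return the (ASCII) replacement and the position to resume encoding from.
    return "".join(pieces), exc.end


codecs.register_error("translit", translit_handler)


def to_ascii_bytes(text: str) -> bytes:
    return text.encode("ascii", errors="translit")
```

Plain ASCII input passes through the normal encoder untouched. Only the non-ASCII characters take the slower Python path through the handler.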
On 2010/2/22, Sean Allen wrote:
> Are there any libraries for glass that will take utf8 and transliterate/anglicize it to ascii [...]

In Switzerland we have a proverb: you may ask anything if you don't fear the answer.

A couple of simple hacks may get you somewhere [1], but there are more heavyweight solutions that get you farther [2]. You may try [3] with:

    NFD; [:Nonspacing Mark:] Remove; NFC

Doing it "right" requires quite a bit of knowledge about your input and output.

[1] http://www.alistapart.com/articles/accent-folding-for-auto-complete/
[2] http://userguide.icu-project.org/transforms/general
[3] http://demo.icu-project.org/icu-bin/translit

Cheers
Philippe
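Philippe's ICU rule `NFD; [:Nonspacing Mark:] Remove; NFC` can be approximated with Python's standard `unicodedata` module. This is an illustrative sketch, not ICU itself. Nonspacing marks are Unicode general category `Mn`, and that is the category the filter below removes.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximation of the ICU transform 'NFD; [:Nonspacing Mark:] Remove; NFC'."""
    decomposed = unicodedata.normalize("NFD", text)  # step 1: NFD
    # Step 2: remove nonspacing marks (general category "Mn").
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)  # step 3: NFC
```

Note that "wœrl" comes through unchanged. œ has no canonical decomposition, so this rule alone does not guarantee ASCII output. Letters like œ, ß or ø still need explicit rules.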
Sean,
On 22.02.2010, at 17:17, Sean Allen wrote:
> Are there any libraries for glass that will take utf8 and transliterate/anglicize it to ascii [...]

Could you please drop me a note if you achieve something? I have a similar problem. If you create a page in Pier, it does URL-safe encoding, which works but doesn't look that good. A transliterated string would make much nicer URLs. Not to mention that this should then be added to Pier ;)

Norbert
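For the Pier URL case, a slug generator could sit on top of the same accent stripping. This is a minimal Python sketch, and `slugify` is a hypothetical helper, not part of Pier. As written, it silently drops any character that has no accent-free ASCII form. For example, œ would vanish unless a rule table were applied first.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical slug helper: strip accents, lowercase, hyphenate."""
    # Strip accents: NFD, then remove nonspacing marks (category "Mn").
    decomposed = unicodedata.normalize("NFD", title)
    no_marks = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Drop anything still non-ASCII (e.g. œ, ß would need a rule table first).
    ascii_only = no_marks.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one hyphen, trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```
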
In Pharo, I have been testing, and you can just take each string, convert it to hex, and replace the values that match, e.g. F1 with n (F1 is ñ). That would be a quick and dirty way to deal with it. But in GemStone this only works up to a point: in Pharo, 'jœ' at: 2 returns $œ, but in GemStone it throws an error. The string turned out to be a QuadByteString, i.e. neither a String nor a DoubleByteString.

On Thu, Feb 25, 2010 at 4:01 AM, Norbert Hartl wrote:
> could you please drop me a note if you achieve something. I have a similar problem. [...]
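The "look up the hex value and substitute" idea maps directly onto a table keyed by code point. Below is a Python sketch for illustration; the table contents are hypothetical. `str.translate` looks up each character's code point in the table and replaces it if it is present. Python strings are sequences of code points, so `"jœ"[1]` returns `"œ"`, much as `'jœ' at: 2` does in Pharo.

œ is U+0153, which is above 0xFF, so it cannot be stored in a one-byte-per-character string. That is consistent with GemStone needing a wider string class for such text. The thread doesn't establish why a QuadByteString in particular was produced.

```python
# Hypothetical substitution table keyed by code point (0xF1 is ñ).
CODEPOINT_MAP = {
    0xF1: "n",    # ñ
    0xD1: "N",    # Ñ
    0xE9: "e",    # é
    0x153: "oe",  # œ: above 0xFF, does not fit in a single byte
    0x152: "OE",  # Œ
}


def replace_by_codepoint(text: str) -> str:
    # translate() maps each character's code point through the table;
    # code points not in the table are left unchanged.
    return text.translate(CODEPOINT_MAP)
```
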