transliteration/anglicisation of utf8

SeanTAllen
Are there any libraries for GLASS that will take UTF-8 and transliterate/anglicize it to ASCII, so that something like

émile becomes emile
wœrl becomes woerl
etc.?

I ask because we are about to roll out GLASS-based e-commerce stuff in Europe, and our merchant processor can't handle non-ASCII characters in things like the required credit card name, shipping names, etc.

In our existing Perl system we have code that converts from UTF-8 to plain old ASCII following a basic set of rules like the couple of examples above. What it ends up as doesn't matter so much, as long as someone looking at it can understand how you got from one to the other and doesn't start scratching their head.
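A rule table like the one described for the Perl system can be sketched in a few lines. This is a minimal Python sketch, not GLASS/Smalltalk code; the table `RULES`, the function `to_ascii`, and the `?` fallback are hypothetical and cover only a handful of characters.

```python
# Hypothetical rule table in the spirit of the Perl rules described above.
# Keys are single precomposed characters, written as escapes to be explicit.
RULES = {
    "\u00e9": "e",   # é
    "\u00e8": "e",   # è
    "\u00ea": "e",   # ê
    "\u00e0": "a",   # à
    "\u00e4": "a",   # ä
    "\u00f6": "o",   # ö
    "\u00fc": "u",   # ü
    "\u00f1": "n",   # ñ
    "\u00e7": "c",   # ç
    "\u00df": "ss",  # ß
    "\u00e6": "ae",  # æ
    "\u0153": "oe",  # œ
    "\u0152": "OE",  # Œ
    "\u00c9": "E",   # É
}

# str.maketrans accepts a dict mapping 1-char keys to replacement strings of any length.
TABLE = str.maketrans(RULES)


def to_ascii(text: str, fallback: str = "?") -> str:
    """Apply the rule table, then replace anything still non-ASCII with `fallback`."""
    folded = text.translate(TABLE)
    return "".join(ch if ord(ch) < 128 else fallback for ch in folded)


if __name__ == "__main__":
    print(to_ascii("\u00e9mile"))  # emile
    print(to_ascii("w\u0153rl"))   # woerl
```

The visible `?` fallback keeps unmapped characters obvious to whoever reads the result, rather than silently dropping them.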

Re: transliteration/anglicisation of utf8

Dale
The standard UTF-8 conversion in GLASS uses a primitive call for performance (UTF8PrimitiveEncoding). The class UTF8Encoding does the UTF-8 conversion in Smalltalk, so I assume it would be possible to do the type of conversion you want by subclassing UTF8Encoding and then using that class before shipping strings to ASCII-only sites.
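The idea here is to do the folding inside the encoder rather than as a separate pass. As a rough analogue only (Python, not GemStone), Python's `codecs.register_error` lets you plug a custom handler into the ASCII encoder; the handler name `transliterate`, the `FOLD` table, and `utf8_to_ascii` are hypothetical.

```python
import codecs

# Hypothetical fold table; the replacements must themselves be ASCII.
FOLD = {
    "\u00e9": "e",   # é
    "\u00f1": "n",   # ñ
    "\u00df": "ss",  # ß
    "\u0153": "oe",  # œ
    "\u0152": "OE",  # Œ
}


def transliterate_errors(exc):
    """Error handler called by the ASCII encoder for each run of unencodable characters."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    replacement = "".join(FOLD.get(ch, "?") for ch in run)
    # Return the replacement text and the position where encoding should resume.
    return replacement, exc.end


codecs.register_error("transliterate", transliterate_errors)


def utf8_to_ascii(data: bytes) -> bytes:
    """Decode UTF-8 bytes, then re-encode as ASCII using the fold table."""
    return data.decode("utf-8").encode("ascii", errors="transliterate")
```

For example, `utf8_to_ascii("wœrl".encode("utf-8"))` should give `b"woerl"`. Keeping the rules in the encoder means every string shipped to the ASCII-only side goes through the same path.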

Dale

Re: transliteration/anglicisation of utf8

Philippe Marschall
In reply to this post by SeanTAllen
2010/2/22 Sean Allen <[hidden email]>:

In Switzerland we have a proverb:
You may ask anything if you don't fear the answer.

A couple of simple hacks may get you somewhere [1], but there are more heavyweight solutions that get you farther [2]. You may try [3] with the rule:
NFD; [:Nonspacing Mark:] Remove; NFC

Doing it "right" requires quite a bit of knowledge about your input and output.
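The ICU rule above (decompose, drop nonspacing marks, recompose) can be approximated with Python's standard `unicodedata` module. This is a sketch assuming Python 3, not ICU itself; `strip_nonspacing_marks` is a hypothetical name. Characters without a canonical decomposition, such as œ, pass through unchanged, so the output is not guaranteed to be ASCII.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Rough Python analogue of the ICU rule 'NFD; [:Nonspacing Mark:] Remove; NFC'."""
    decomposed = unicodedata.normalize("NFD", text)  # é -> e + U+0301 COMBINING ACUTE
    kept = "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn"  # Mn = Nonspacing_Mark
    )
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_nonspacing_marks("\u00e9mile"))  # emile
    print(strip_nonspacing_marks("w\u0153rl"))   # wœrl (œ has no decomposition)
```

This handles the accented-letter cases cheaply; ligatures and letters like ß or ø still need explicit rules on top.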

 [1] http://www.alistapart.com/articles/accent-folding-for-auto-complete/
 [2] http://userguide.icu-project.org/transforms/general
 [3] http://demo.icu-project.org/icu-bin/translit

Cheers
Philippe

Re: transliteration/anglicisation of utf8

NorbertHartl
In reply to this post by SeanTAllen
Sean,

On 22.02.2010, at 17:17, Sean Allen wrote:

Could you please drop me a note if you come up with something? I have a similar problem. If you create a page in Pier, it does URL-safe encoding, which works but doesn't look that good. Having transliterated strings would make for much nicer URLs. Not that I'm saying this should then be added to Pier ;)
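For the URL use case, one common approach is to strip accents and then collapse everything else into hyphens. A minimal sketch in Python; `slugify` is a hypothetical helper, not part of Pier. Because it simply drops non-ASCII leftovers, a word like wœrl becomes `wrl` unless a rule table runs first.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical helper: turn a page title into an ASCII, URL-friendly slug."""
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Dropping non-ASCII code points removes the marks (and, caveat, letters like œ or ß).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one hyphen, trimming the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


if __name__ == "__main__":
    print(slugify("\u00c9mile's Caf\u00e9"))  # emile-s-cafe
```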

Norbert

Re: transliteration/anglicisation of utf8

SeanTAllen
In Pharo, I have been testing, and you can just take each string, convert it to hex, and replace the codes that match, like

F1 with n (F1 is ñ)

That would be a quick and dirty way to deal with it,
but in GemStone this only works up to a point:

This in Pharo:

'jœ' at: 2

returns



but in GemStone it throws an error saying the string was a QuadByteString, i.e. neither of class String nor DoubleByteString.
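In a string type that indexes by code point, the "replace by hex value" idea is straightforward, because indexing like `'jœ' at: 2` yields the whole character rather than a byte. A Python sketch, assuming Python 3 `str` semantics; the table `BY_CODE_POINT` and `replace_by_code_point` are hypothetical, and this does not reproduce or explain the GemStone error.

```python
# Hypothetical table keyed by code point, following the "F1 with n" idea.
BY_CODE_POINT = {
    0xF1: "n",    # ñ
    0xE9: "e",    # é
    0xDF: "ss",   # ß
    0x153: "oe",  # œ (above 0xFF, so it does not fit a one-byte-per-character string)
}


def replace_by_code_point(text: str, fallback: str = "?") -> str:
    """Walk the string one code point at a time and swap non-ASCII ones via the table."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        else:
            out.append(BY_CODE_POINT.get(cp, fallback))
    return "".join(out)


if __name__ == "__main__":
    print(format(ord("\u00f1"), "X"))        # F1
    print(replace_by_code_point("j\u0153"))  # joe
```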



On Thu, Feb 25, 2010 at 4:01 AM, Norbert Hartl <[hidden email]> wrote:

> Sean,
>
> could you please drop me a note if you achieve something. I have a similar problem. If you create a page in pier it does url safe encoding which works but doesn't look that good. Having transliterated string would make much nicer URLs. Not to say this should be added to pier then ;)
>
> Norbert