Working with URLs that contain non-Latin characters
I was ready to show a friend the Pharo web capabilities with the
classic "myString asUrl retrieveContents", but the friend gave me a
URL that contains non-Latin characters, and then I got an
> and browsing around I discovered a useful method...
> encoder := ZnCharacterEncoder detectEncoding: bytes
> "==> a ZnSimplifiedByteEncoder('iso88591' strict)"
> now the following works...
> (ZnPercentEncoder new characterEncoder: encoder ) decode: x.
Right, but that guess is wrong (check the resulting string).
Since we are talking about Chinese characters that are outside the allowed range for #iso88591 (#latin1), that is logical.
Again, to my understanding, without further context, when %BF%A6%CA%B2 is encountered in the query part of a URL, it is first percent-decoded, then UTF-8 decoded. That is what #asUrl assumes, and it is what leads to the error: interpreted that way, the byte sequence BF A6 CA B2 is not a legal UTF-8 encoding (0xBF is a continuation byte and cannot start a character).
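To make the two steps concrete outside of Pharo, here is a minimal sketch in Python (not Zinc code, just an illustration of the same byte-level behaviour). It percent-decodes the sequence to raw bytes and then tries three character encodings. The choice of GBK is purely my assumption about what the server might have used; nothing in the URL itself says so.

```python
from urllib.parse import unquote_to_bytes

# Step 1: percent-decode the query fragment to raw bytes.
raw = unquote_to_bytes("%BF%A6%CA%B2")  # the four bytes BF A6 CA B2

# Step 2a: strict UTF-8 decoding fails, because 0xBF is a continuation
# byte and cannot appear at the start of a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Step 2b: Latin-1 assigns a character to every byte value, so decoding
# always "succeeds", which is why a detector can settle on it even though
# the result is not Chinese text.
latin1_text = raw.decode("latin-1")

# Step 2c: ASSUMPTION: the bytes are in a double-byte Chinese encoding
# such as GBK. Four bytes then decode to two characters.
gbk_text = raw.decode("gbk")

print(utf8_ok, repr(latin1_text), len(gbk_text))
```

The point is only that the bytes are valid under more than one encoding, so the right one has to come from context (for example the page or server that produced the URL) rather than from guessing.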
> So maybe that helps explain it,
> but I don't know how to join the dots to make it work out of the box
> with "asUrl retrieveContents"
> cheers -ben