Hi,
today I had a problem with non-ASCII characters in the URL. The reason for this problem seems to be that URLs are encoded differently than they are decoded: encoding is done with WAUrlEncoder, while decoding is done with URLEncoder (at least on VisualWorks). The URLEncoder uses UTF-8 to encode/decode URLs, but the WAUrlEncoder percent-escapes characters byte-wise. So if you encode 'ü' (252 as Integer) with WAUrlEncoder you get '%FC', while the URLEncoder produces '%C3%BC'.

If the URLEncoder is also used for encoding, everything works fine, and the browser even shows the characters properly in the address bar. I'm not sure whether that's just a problem of the VW port, but I still don't understand why the WAUrlEncoder doesn't encode with UTF-8, even though that's recommended in the RFC (at least that's what Wikipedia said ;-) ).

Kind Regards
Karsten

--
Karsten Kusche - Dipl.Inf. - [hidden email]
Tel: +49 3496 21 43 29
Georg Heeg eK - Köthen
Handelsregister: Amtsgericht Dortmund A 12812

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
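The difference between the two encoders can be sketched in Python (an illustrative translation of the behavior described above, not Seaside or VisualWorks code): byte-wise escaping takes the character's single Latin-1 byte value, while UTF-8 escaping first encodes the character to UTF-8 and then escapes each resulting byte.

```python
from urllib.parse import quote

umlaut = "ü"  # code point 252

# Byte-wise escaping, as described for WAUrlEncoder: the character's
# single Latin-1 byte value is percent-escaped directly.
bytewise = quote(umlaut.encode("latin-1"))

# UTF-8 escaping, as described for URLEncoder: the character is first
# encoded to UTF-8 (two bytes here), then each byte is percent-escaped.
utf8 = quote(umlaut.encode("utf-8"))

print(bytewise)  # %FC
print(utf8)      # %C3%BC
```

A decoder can only round-trip one of these forms; if the two sides pick different conventions, the URL breaks exactly as reported.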
It's not quite as straightforward as always using UTF-8, unfortunately (in fact, there basically isn't a *right* answer that is guaranteed to work everywhere). But if you are using UTF-8 as your page encoding, you should be using it to encode URLs before percent-escaping. WAUrlEncoder in 2.9 does this, but it looks like the one in 2.8 doesn't.
There has been a bunch of discussion about encodings on the development list in the past few weeks. Encodings + the web = a mess, really. :)

Julian

On Fri, Feb 6, 2009 at 3:46 PM, Karsten <[hidden email]> wrote:
> Hi, [...]
Hi Julian,
glad this has been fixed :-)

Karsten

Julian Fitzell wrote:
> It's not quite as straightforward as always using UTF-8, unfortunately
> (actually there basically isn't a *right* answer that is guaranteed to
> work everywhere). But if you are using UTF-8 as your page encoding you
> should be using it to encode URLs before percent escaping. WAUrlEncoder
> in 2.9 does this but it looks like the one in 2.8 doesn't.
2009/2/6 Karsten <[hidden email]>:
> Hi,
>
> today I had a problem with non-ASCII characters in the URL. The reason for
> this problem seemed to be that URLs are encoded differently than they
> are decoded. Encoding is done with WAUrlEncoder and decoding is done with
> URLEncoder (at least on VisualWorks). The URLEncoder uses UTF-8 to
> encode/decode URLs, but the WAUrlEncoder percent-escapes characters byte-wise.
> So if you encode 'ü' (252 as Integer) with WAUrlEncoder you get '%FC', while
> the URLEncoder produces '%C3%BC'.

That is how things are in Seaside 2.8, at least for WAUrlEncoder; I can't say anything about URLEncoder.

> If the URLEncoder is also used for encoding everything works fine and even
> the browser shows the characters properly in the address bar. I'm not sure
> if that's just a problem of the VW port, but still I don't understand why
> the WAUrlEncoder doesn't encode with UTF-8, even though that's recommended in
> the RFC (at least that's what Wikipedia said ;-) ).

First, URLEncoder is a VW class, which means we can't use it. Second, specs and Wikipedia entries don't matter. The only thing that matters is what browsers and people do. Browsers are quite good at ignoring specs, mostly because authors are even better at ignoring them.

So why don't we encode to UTF-8? Mostly because a lot of people use UTF-8 as the internal encoding of their images. This means that data which is already UTF-8 must not be encoded to UTF-8 again. While this is fine for those people, it breaks things for the people who want to use the native encoding internally and UTF-8 externally. Additionally, encoding to or decoding from UTF-8 is wrong if you want to use ISO-8859-1 externally.

In Seaside 2.8 you can only indirectly specify your desired internal encoding through the choice (or configuration) of your server adapter. However, in WAUrlEncoder we don't have access to this information.
The fix in Seaside 2.9 is that we have access to whether we need to do encoding, and to what ;-)

Honestly, I think the combination of WAUrlEncoder not doing encoding to UTF-8 and URLEncoder doing UTF-8 decoding is broken. But that is a VW port issue.

Cheers
Philippe
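The double-encoding hazard described above can be sketched in Python (an illustration of the general pitfall, not Seaside code): if strings are already stored internally as UTF-8 bytes, pushing them through a UTF-8 encoder a second time produces classic mojibake.

```python
# A string stored internally as UTF-8 bytes, as in images that use
# UTF-8 as their internal encoding.
text = "ü"
once = text.encode("utf-8")  # b'\xc3\xbc' -- correct on the wire

# Encoding again: the framework treats each byte as a character
# (here modeled as Latin-1) and re-encodes it to UTF-8.
twice = once.decode("latin-1").encode("utf-8")

print(once)                   # b'\xc3\xbc'
print(twice)                  # b'\xc3\x83\xc2\xbc' -- double-encoded
print(twice.decode("utf-8"))  # 'Ã¼' -- what the browser would display
```

This is why a framework that cannot tell whether data is already externally encoded cannot safely apply UTF-8 unconditionally.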
2009/2/7 Philippe Marschall <[hidden email]>:
> 2009/2/6 Karsten <[hidden email]>:
>> [...]
>> So if you encode 'ü' (252 as Integer) with WAUrlEncoder you get '%FC', while
>> the URLEncoder produces '%C3%BC'.
>
> That is how things are in Seaside 2.8, at least for WAUrlEncoder; I
> can't say anything about URLEncoder.
>
> [...]
>
> In Seaside 2.8 you can only indirectly specify your desired internal
> encoding through the choice (or configuration) of your server adapter.
> However, in WAUrlEncoder we don't have access to this information.
> The fix in Seaside 2.9 is that we have access to whether we need to do
> encoding, and to what ;-)
>
> Honestly, I think the combination of WAUrlEncoder not doing encoding to
> UTF-8 and URLEncoder doing UTF-8 decoding is broken. But that is a VW
> port issue.

Forget this. The behavior you describe is consistent with WAKomEncoded. It's OK as long as no non-ASCII data generated on the server is submitted back ;-)

Cheers
Philippe
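The consistency point can be sketched in Python (a hypothetical illustration, not WAKomEncoded itself): a byte-wise escaper paired with a byte-wise unescaper round-trips cleanly, while pairing the same byte-wise escaper with a UTF-8 decoder fails on the very same input.

```python
from urllib.parse import quote, unquote_to_bytes

text = "ü"

# Consistent pair: byte-wise (Latin-1) escape, byte-wise unescape.
escaped = quote(text.encode("latin-1"))           # '%FC'
roundtrip = unquote_to_bytes(escaped).decode("latin-1")
print(roundtrip == text)  # True

# Inconsistent pair: byte-wise escape but UTF-8 decode, as in the
# WAUrlEncoder/URLEncoder mismatch -- the lone 0xFC byte is not valid UTF-8.
try:
    unquote_to_bytes(escaped).decode("utf-8")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)
```

In other words, either convention works in isolation; it is only the mismatch between the two sides that breaks the URL.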