WAUrlEncoder


WAUrlEncoder

Karsten Kusche
Hi,

today I had a problem with non-ASCII characters in the URL. The reason
for this problem seemed to be that URLs are encoded differently from
how they are decoded. Encoding is done with WAUrlEncoder and decoding is
done with URLEncoder (at least on VisualWorks). The URLEncoder uses UTF-8
to encode/decode URLs, but the WAUrlEncoder percent-escapes each character
bytewise. So if you encode 'ü' (252 as Integer) with WAUrlEncoder you get
'%FC', while the URLEncoder produces '%C3%BC'.
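
The two results can be reproduced outside Smalltalk. The following is only a sketch in Python, using the standard `urllib.parse` module as a stand-in for the two encoders; it is not the Seaside or VisualWorks code, and the Latin-1 codec is an assumption used here to model "one byte per character":

```python
from urllib.parse import quote, unquote

text = "ü"  # code point 252

# Bytewise escaping: one byte per character (modelled with Latin-1),
# matching the result reported for WAUrlEncoder.
bytewise = quote(text, encoding="latin-1")

# Encode to UTF-8 first, then percent-escape, matching the result
# reported for URLEncoder.
utf8 = quote(text, encoding="utf-8")

print(bytewise)  # %FC
print(utf8)      # %C3%BC

# Decoding the bytewise form as UTF-8 does not give 'ü' back,
# because the single byte 0xFC is not valid UTF-8.
mismatch = unquote(bytewise, encoding="utf-8", errors="replace")
print(mismatch == text)  # False
```

The last line is the mismatch in a nutshell: what one side writes as '%FC' the other side cannot read as UTF-8.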

If the URLEncoder is also used for encoding, everything works fine and
even the browser shows the characters properly in the address bar. I'm
not sure if that's just a problem of the VW port, but still I don't
understand why the WAUrlEncoder doesn't encode with UTF-8, even though
that's recommended in the RFC (at least that's what Wikipedia said ;-) ).

Kind Regards
Karsten

--
Karsten Kusche - Dipl.Inf. - [hidden email]
Tel: +49 3496 21 43 29
Georg Heeg eK - Köthen
Handelsregister: Amtsgericht Dortmund A 12812

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: WAUrlEncoder

Julian Fitzell-2
It's not quite as straightforward as always using UTF-8, unfortunately (actually there basically isn't a *right* answer that is guaranteed to work everywhere). But if you are using UTF-8 as your page encoding you should be using it to encode URLs before percent escaping. WAUrlEncoder in 2.9 does this but it looks like the one in 2.8 doesn't.
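
To make the "encode with the page encoding, then percent-escape" order concrete, here is a small hand-rolled sketch in Python. The names `escape_bytes` and `encode_for_url` are made up for this illustration and do not correspond to anything in Seaside; the set of characters left unescaped is the "unreserved" set from RFC 3986:

```python
import string

# Bytes that may appear unescaped (RFC 3986 "unreserved" characters).
UNRESERVED = frozenset((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def escape_bytes(data: bytes) -> str:
    """Percent-escape every byte that is not an unreserved character."""
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def encode_for_url(text: str, page_encoding: str = "utf-8") -> str:
    """Step 1: encode text with the page encoding. Step 2: percent-escape the bytes."""
    return escape_bytes(text.encode(page_encoding))


print(encode_for_url("ü"))             # %C3%BC
print(encode_for_url("ü", "latin-1"))  # %FC
```

The percent-escaping step itself knows nothing about characters; which bytes it sees depends entirely on the encoding chosen in step 1.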

There has been a bunch of discussion about encodings on the development list in the past few weeks. Encodings + the web = a mess, really. :)

Julian




Re: WAUrlEncoder

Karsten Kusche
Hi Julian,

glad this has been fixed :-)

Karsten




Re: WAUrlEncoder

Philippe Marschall
In reply to this post by Karsten Kusche
2009/2/6 Karsten <[hidden email]>:
> Hi,
>
> today I had a problem with non-ASCII characters in the URL. The reason for
> this problem seemed to be that URLs are encoded differently from how they
> are decoded. Encoding is done with WAUrlEncoder and decoding is done with
> URLEncoder (at least on VisualWorks). The URLEncoder uses UTF8 to
> encode/decode URLs, but the WAUrlEncoder stores characters bytewise with %.
> So if you encode 'ü' (252 as Integer) with WAUrlEncoder you get '%FC', while
> the URLEncoder produces '%C3%BC'.

That is how things are in Seaside 2.8, at least for WAUrlEncoder; I
can't say anything about URLEncoder.

> If the URLEncoder is also used for encoding everything works fine and even
> the browser shows the characters properly in the address bar. I'm not sure
> if that's just a problem of the VW port, but still I don't understand why
> the WAUrlEncoder doesn't encode with UTF8, even though that's recommended in
> the rfc (at least that's what Wikipedia said ;-) ).

First, URLEncoder is a VW class which means we can't use it.
Second, specs and Wikipedia entries don't matter. The only thing that
matters is what browsers and people do. Browsers are quite good at
ignoring specs mostly because authors are way better at ignoring
specs.

So why don't we encode to utf-8? Mostly because a lot of people use
utf-8 as an internal encoding for their images. This means that data
that is already utf-8 must not be encoded to utf-8 again. While this is
fine for those people, it screws the people who want to use the native
encoding internally and utf-8 externally. Additionally, encoding to
utf-8 or decoding from utf-8 is wrong if you want to use ISO-8859-1
externally.
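
The double-encoding problem can be shown with a short Python sketch. It assumes the "utf-8 internally" setup means strings hold the raw UTF-8 bytes, one byte per character, which is modelled here by decoding the bytes as Latin-1; the Smalltalk side is not shown:

```python
from urllib.parse import quote

# Assumption: the image stores 'ü' as its two UTF-8 bytes, one per character.
internal = "ü".encode("utf-8").decode("latin-1")  # characters 0xC3 and 0xBC

# Bytewise escaping leaves the already-encoded bytes alone: correct here.
bytewise = quote(internal, encoding="latin-1")

# Encoding to utf-8 again before escaping encodes each byte a second time.
double = quote(internal, encoding="utf-8")

print(bytewise)  # %C3%BC
print(double)    # %C3%83%C2%BC
```

So for an image that is already utf-8 internally, the bytewise result is the right one and the extra encoding step corrupts it.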

In Seaside 2.8 you can only indirectly specify your desired internal
encoding through the choice (or configuration) of your Server adapter.
However in WAUrlEncoder we don't have access to this information. The
fix in Seaside 2.9 is that we have access to whether we need to do
encoding and to what ;-)
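
As a rough illustration of that idea (not the actual Seaside 2.9 code), an encoder that is told whether to encode, and to which codec, could look like this in Python. The function name `percent_encode` and its `external_codec` parameter are invented for the sketch:

```python
from typing import Optional
from urllib.parse import quote


def percent_encode(text: str, external_codec: Optional[str]) -> str:
    """Percent-escape text, encoding it first only if a codec is configured.

    external_codec=None models an image whose strings already hold encoded
    bytes (one byte per character), so they are escaped bytewise. Characters
    above 255 raise UnicodeEncodeError in that mode.
    """
    if external_codec is None:
        return quote(text, safe="", encoding="latin-1")
    return quote(text, safe="", encoding=external_codec)


print(percent_encode("ü", "utf-8"))       # %C3%BC
print(percent_encode("ü", "iso-8859-1"))  # %FC
print(percent_encode("Ã¼", None))         # %C3%BC (bytes were already utf-8)
```

The point is only that the decision lives in configuration passed to the encoder, rather than being hard-wired into it.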

Honestly I think the combination of WAUrlEncoder not doing encoding to
utf-8 and URLEncoder doing utf-8 decoding is broken. But that is a VW
port issue.

Cheers
Philippe

Re: WAUrlEncoder

Philippe Marschall
2009/2/7 Philippe Marschall <[hidden email]>:

> Honestly I think the combination of WAUrlEncoder not doing encoding to
> utf-8 and URLEncoder doing utf-8 decoding is broken. But that is a VW
> port issue.

Forget this. The behavior you describe is consistent with
WAKomEncoded. It's ok as long as no non-ASCII data generated on the
server is submitted back ;-)

Cheers
Philippe