Comment #5 on issue 43 by norbert.hartl: url decoding not correct in all
cases
http://code.google.com/p/glassdb/issues/detail?id=43
I think the replacement of + and %20 inside HyURI is correct. The URI itself
has no encoding; it is encoded before being embedded into a request. In URI
encoding, + is a reserved character. But according to the type
application/x-www-form-urlencoded, + is an encoded space, so it is valid in
the query string if the GET request is taken to be URL encoded in that way.
I think HyURI fails to encode the values in writeQueriesTo:.
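
To illustrate the two meanings of + described above, here is a minimal sketch in Python rather than in the Smalltalk of HyURI. It only uses the standard urllib.parse functions; the sample value is made up for illustration and nothing here is taken from the HyURI code.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b+c"  # hypothetical query value with a space and a literal plus

# Generic URI percent-encoding: space becomes %20, the reserved + becomes %2B.
uri_encoded = quote(value, safe="")

# application/x-www-form-urlencoded: space becomes +, a literal + becomes %2B.
form_encoded = quote_plus(value)

# Decoding must use the matching convention, otherwise + is misread.
as_uri = unquote(form_encoded)        # leaves + untouched
as_form = unquote_plus(form_encoded)  # turns + back into a space
```

If the values are not encoded when the query is written, a literal + in a value cannot be told apart from an encoded space on the receiving side.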
For the decoding part, I think neither of the parts in the above
implementation is reliable. There are too many standards floating around,
so one cannot tell which encoding the URL is really in. A good bet/convention
would be to look at the character set given in the HTTP header when
receiving a page. It is considered valid to assume the URL is in the same
encoding as the document.
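
The convention above could be sketched like this, again in Python and only as an illustration. The function names are hypothetical, the header value is assumed to be the raw Content-Type string of the page the URL came from, and falling back to UTF-8 when no charset is given is an assumption of this sketch.

```python
from email.message import Message
from urllib.parse import unquote

def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)

def decode_with_page_charset(url, content_type):
    """Percent-decode url, assuming it uses the encoding of its page."""
    charset = charset_from_content_type(content_type)
    return unquote(url, encoding=charset)

decoded = decode_with_page_charset("caf%E9", "text/html; charset=ISO-8859-1")
```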
Constructing a URL without a page does not give any hints about the encoding.
Looking at modern implementations and IRI decoding, UTF-8 should be
considered the default. A decoding would therefore start by treating every
octet received as an ASCII character. Then every %xx is converted into an octet.
After that, an attempt to decode UTF-8 should be made, and any octets that
are not valid (no UTF-8 sequence) should be re-encoded in percent notation.
This would decode the default and leave any non-default
encoding to a later processing step. I know this does not sound very
practical, but there aren't many alternatives to get it right.
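
The steps above can be sketched as follows. This is a minimal Python illustration, not the HyURI implementation; the function name is hypothetical, and the input is assumed to be a pure ASCII string as received on the wire.

```python
from urllib.parse import unquote_to_bytes

def lenient_utf8_decode(url):
    """Decode %xx as UTF-8 where possible, keep other octets as %XX."""
    # Step 1: treat every received octet as an ASCII character.
    raw = url.encode("ascii")
    # Step 2: convert every %xx into the octet it stands for.
    octets = unquote_to_bytes(raw)
    # Step 3: attempt UTF-8; invalid octets are smuggled through as lone
    # surrogates U+DC80..U+DCFF by the surrogateescape error handler.
    text = octets.decode("utf-8", errors="surrogateescape")
    # Step 4: re-encode the octets that were not valid UTF-8 as %XX.
    return "".join(
        "%%%02X" % (ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )
```

For example, "caf%C3%A9" decodes to "café", while the Latin-1 form "caf%E9" comes back unchanged as "caf%E9" and is left for a later processing step. One caveat of this sketch: a decoded %25 yields a literal % that can no longer be told apart from the re-encoded octets, so a real implementation would probably have to keep %25 encoded.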