Posted by
Sven Van Caekenberghe-2 on
Jun 11, 2015; 2:08pm
URL: https://forum.world.st/ZnClient-and-percent-characters-tp4831433p4831705.html
> On 11 Jun 2015, at 08:35, Sven Van Caekenberghe <[hidden email]> wrote:
>
> @everybody
>
> The key method that defines how the query part of a URL is percent-encoded is ZnResourceMetaUtils class>>#querySafeSet
>
> Years ago, Zinc HTTP Components followed a better-safe-than-sorry approach of encoding almost every character except for the ones that are safe in all contexts.
>
> Later on, we began reading the specs more carefully and decided to follow them more closely, which is why there are now different safe sets.
>
> Now, we can (and should) all read the different specs, and try to learn from things in the wild as well as from other implementations.
>
> The quote from http://en.wikipedia.org/wiki/Query_string was incomplete: it said 'for HTML 5 when submitting a form using GET', which is a very specific context.
>
> ZnUrl was written against RFC 3986 mostly.
>
> Now, maybe we made a mistake, maybe not.
I looked into this a bit more, and I am confused.
My strictest reading of RFC 3986 (which obsoletes RFC 2396) says in section 3.4 Query:
query = *( pchar / "/" / "?" )
where
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
which I understand to allow the comma (,).
The description above is what is in ZnResourceMetaUtils class>>#querySafeSet, with the noted exceptions (=, & and +, because we interpret the query as key-value pairs).
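To make that reading concrete, the grammar can be transcribed into a character set and used to encode a value. This is only a minimal Python sketch built on the standard library, not the Zinc code; the names QUERY_ALLOWED, KEY_VALUE_SAFE and encode_query_value are invented for illustration, and the key-value set simply mirrors the =, & and + exceptions mentioned above.

```python
import string
from urllib.parse import quote

# Character classes transcribed from RFC 3986 (pct-encoded is not a single
# character, so it is left out of the sets).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")

# query = *( pchar / "/" / "?" )
QUERY_ALLOWED = PCHAR | set("/?")

# Hypothetical key-value variant: '=', '&' and '+' act as delimiters in a
# key-value query, so they must be encoded inside keys and values.
KEY_VALUE_SAFE = QUERY_ALLOWED - set("=&+")


def encode_query_value(value: str) -> str:
    """Percent-encode every character outside KEY_VALUE_SAFE (as UTF-8)."""
    return quote(value, safe="".join(sorted(KEY_VALUE_SAFE)))


print("," in QUERY_ALLOWED)           # True
print(encode_query_value("a,b&c=d"))  # a,b%26c%3Dd
```

Under this reading the comma passes through untouched, while & and = inside a value are escaped.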
I read http://www.w3.org/Addressing/URL/uri-spec.html the same way.
That being said, there are counterexamples, like when you search for foo,bar in Google using Google Chrome, which then results in the URL:
https://www.google.be/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo%2Cbar
Or when you do
$ curl -G -v --data-urlencode "foo=one,two" "http://zn.stfx.eu/echo?q=1,2,3&x=a,b"
* Hostname was NOT found in DNS cache
* Trying 46.137.113.215...
* Connected to zn.stfx.eu (46.137.113.215) port 80 (#0)
> GET /echo?q=1,2,3&x=a,b&foo=one%2Ctwo HTTP/1.1
> User-Agent: curl/7.37.1
> Host: zn.stfx.eu
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 11 Jun 2015 14:23:06 GMT
* Server Zinc HTTP Components 1.0 is not blacklisted
< Server: Zinc HTTP Components 1.0
< Content-Type: text/plain;charset=utf-8
< Content-Length: 421
< Vary: Accept-Encoding
<
This is Zinc HTTP Components echoing your request !
Running a ZnManagingMultiThreadedServer(running 8083)
GET request for /echo?q=1,2,3&x=a,b&foo=one,two
with headers
X-Forwarded-Server: ip-10-226-6-28.eu-west-1.compute.internal
X-Forwarded-Host: zn.stfx.eu
X-Zinc-Remote-Address: 127.0.0.1
User-Agent: curl/7.37.1
Host: localhost:8083
Connection: Keep-Alive
Accept: */*
X-Forwarded-For: 81.83.7.35
* Connection #0 to host zn.stfx.eu left intact
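The echo illustrates that, once decoded, the literal comma and %2C arrive as the same value. A small Python sketch of that round trip, using only the standard library (an illustration of percent-decoding in general, not of what Zinc does internally):

```python
from urllib.parse import parse_qsl, urlsplit

# The request target curl actually sent, copied from the trace above.
target = "/echo?q=1,2,3&x=a,b&foo=one%2Ctwo"

# parse_qsl splits the query on '&' and '=' and then percent-decodes
# each key and value.
pairs = parse_qsl(urlsplit(target).query)
print(pairs)
```

The pair for foo comes back as 'one,two', just like the q and x values that were never encoded.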
Reading about JavaScript's encodeURI and encodeURIComponent functions does not help either (the former keeps the comma, the latter encodes it).
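The difference between the two functions comes down to their sets of unescaped characters. The following Python sketch approximates both with urllib.parse.quote; the helper names encode_uri and encode_uri_component are made up here, the safe sets are transcribed from the ECMAScript specification, and edge cases such as JavaScript's handling of lone surrogates are ignored.

```python
from urllib.parse import quote

# Characters each JavaScript function leaves unescaped in addition to
# ASCII letters and digits.
ENCODE_URI_SAFE = ";,/?:@&=+$-_.!~*'()#"
ENCODE_URI_COMPONENT_SAFE = "-_.!~*'()"


def encode_uri(text: str) -> str:
    """Approximation of JavaScript's encodeURI."""
    return quote(text, safe=ENCODE_URI_SAFE)


def encode_uri_component(text: str) -> str:
    """Approximation of JavaScript's encodeURIComponent."""
    return quote(text, safe=ENCODE_URI_COMPONENT_SAFE)


print(encode_uri("foo,bar"))            # foo,bar
print(encode_uri_component("foo,bar"))  # foo%2Cbar
print(quote("foo,bar"))                 # foo%2Cbar (Python's default safe set)
```

So the whole-URI encoder treats the comma as a legal delimiter, while the component encoder (like Python's default) escapes it to be safe in any position.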
I know there are some other people on this list that might have an opinion, so let's try to figure this out together.
> But maybe it also would be a good idea to allow users to decide this for themselves on a case by case basis.
>
>> On 11 Jun 2015, at 05:18, Jimmie Houchin <[hidden email]> wrote:
>>
>> Thanks for the reply.
>>
>> I implemented Peter's suggestion as an easy keep moving solution.
>>
>> As I said, I am not expert in what is or is not legal according to the standards.
>> However, Python's urllib library encodes commas by default in its quote and urlencode functions.
>>
>> _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>> b'abcdefghijklmnopqrstuvwxyz'
>> b'0123456789'
>> b'_.-')
>>
>> https://docs.python.org/3/library/urllib.parse.html
>> https://hg.python.org/cpython/file/3.4/Lib/urllib/parse.py
>>
>> That's at least how one major language understands the standard. And Python 2.7 is the same.
>>
>> According to Wikipedia
>> http://en.wikipedia.org/wiki/Query_string
>> • Characters that cannot be converted to the correct charset are replaced with HTML numeric character references[9]
>> • SPACE is encoded as '+'
>> • Letters (A–Z and a–z), numbers (0–9) and the characters '*','-','.' and '_' are left as-is
>>
>> It appeared in the stackoverflow article I quoted previously that ASP.NET encodes commas. I could misunderstand or be reading into it.
>> http://stackoverflow.com/questions/8828702/why-is-the-comma-url-encoded
>> Just a little more information to add to the discussion.
>>
>> Thanks.
>>
>> Jimmie
>>
>>
>>
>>
>> On 06/10/2015 05:56 PM, Norbert Hartl wrote:
>>> Just to clarify:
>>>
>>> "
>>> Characters in the "reserved" set are not reserved in
>>> all contexts.
>>>
>>> The set of characters actually reserved within any given URI
>>> component is defined by that component. In general, a character is
>>> reserved if the semantics of the URI changes if the character is
>>> replaced with its escaped US-ASCII encoding."
>>>
>>> If I were you I'd subclass ZnUrl and implement
>>> #encodeQuery:on:
>>> on that class. You could have an extension method in ZnResourceMetaUtils that returns the character set you need to have encoded. In ZnClient you just set your ZnUrl derived class object as #url:
>>> Cannot think of anything better for a quick resolve of your problem.
>>> Norbert
>>>> On 11.06.2015 at 00:26, Jimmie Houchin <[hidden email]> wrote:
>>>>
>>>> I am not an expert on URIs or encoding. However, this is a requirement of the API I am using and I am required to submit an encoded URI with %2C and no commas.
>>>>
>>>> As far as commas needing to be escaped, it seems from other sources that they should be.
>>>>
>>>> From https://www.ietf.org/rfc/rfc2396.txt
>>>> The plus "+", dollar "$", and comma "," characters have been added to
>>>> those in the "reserved" set, since they are treated as reserved
>>>> within the query component.
>>>>
>>>> This states that commas are reserved within the query component.
>>>>
>>>>
>>>> http://stackoverflow.com/questions/8828702/why-is-the-comma-url-encoded
>>>>
>>>>
>>>> Regardless of what is or is not required, I do need the ability to have a query string with commas encoded as %2C in order to use the API, which states:
>>>>
>>>> fields: Optional An URL encoded (%2C) comma separated list of instrument fields that are to be returned in the response. The instrument field will be returned regardless of the input to this query parameter. Please see the Response Parameters section below for a list of valid values.
>>>>
>>>> Which will look like this or something similar.
>>>>
>>>> fields=displayName%2Cinstrument%2Cpip
>>>>
>>>>
>>>> Thanks.
>>>>
>>>> Jimmie
>>>>
>>>>
>>>> On 06/10/2015 03:27 PM, Norbert Hartl wrote:
>>>>> That's because the comma does not need to be escaped in the query part of the uri.
>>>>>
>>>>> Norbert
>>>>>
>>>>>
>>>>>> On 10.06.2015 at 22:00, Jimmie Houchin <[hidden email]> wrote:
>>>>>>
>>>>>> On 06/10/2015 10:32 AM, Sven Van Caekenberghe wrote:
>>>>>>
>>>>>>>> On 10 Jun 2015, at 17:24, David <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> On Wed, 10 Jun 2015 10:14:37 -0500, Jimmie Houchin <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am attempting to use ZnClient to request data. The request requires
>>>>>>>>> a %2C (comma) delimited string as part of the query. Below is a
>>>>>>>>> snippet.
>>>>>>>>>
>>>>>>>>> znClient
>>>>>>>>> addPath: '/v1/instruments';
>>>>>>>>> queryAt: 'fields' putAll: 'displayName%2Cinstrument%2Cpip';
>>>>>>>>> get ;
>>>>>>>>> contents)
>>>>>>>>>
>>>>>>>>> The string 'displayName%2Cinstrument%2Cpip'
>>>>>>>>> is being converted to 'displayName%252Cinstrument%252Cpip'
>>>>>>>>> which causes the request to fail.
>>>>>>>>>
>>>>>>>>> The query needs to be
>>>>>>>>> fields=displayName%2Cinstrument%2Cpip
>>>>>>>>>
>>>>>>>>> I have not found how to do this correctly.
>>>>>>>>> Any help greatly appreciated.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Jimmie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Maybe a silly thing, but since %2C = , ... Did you already try to
>>>>>>>> let it do the encoding itself? Like
>>>>>>>> znClient
>>>>>>>> addPath: '/v1/instruments';
>>>>>>>> queryAt: 'fields' putAll: 'displayName,instrument,pip';
>>>>>>>> get ;
>>>>>>>> contents)
>>>>>>>>
>>>>>>>> I suspect it is using encoding internally, that is why % is also
>>>>>>>> encoded if you try to put it.
>>>>>>>>
>>>>>>>> I hope that works
>>>>>>>>
>>>>>>> Not silly and no need to suspect, but absolutely correct !
>>>>>>>
>>>>>>> Sven
>>>>>>>
>>>>>> My apologies for not having full disclosure.
>>>>>>
>>>>>> Pharo 4, new image, freshly installed Zinc stable version.
>>>>>> Xubuntu 15.04
>>>>>>
>>>>>>
>>>>>>
>>>>>> That is what I thought would happen and what I tried first. But it is not being encoded from what I can find.
>>>>>>
>>>>>> Inspect this in a workspace/playground.
>>>>>>
>>>>>> ZnClient new
>>>>>> https;
>>>>>> host: 'google.com';
>>>>>> addPath: '/commaTest';
>>>>>> queryAt: 'fields' put: 'displayName,instrument,pip';
>>>>>> yourself
>>>>>>
>>>>>> View the request / requestLine / uri. The commas are still present in the URI.
>>>>>> So I tried encoding myself and get the other error.
>>>>>>
>>>>>> Of course Google won't understand this and in this snippet won't receive it.
>>>>>>
>>>>>> And please let me know if I am doing something wrong.
>>>>>>
>>>>>> Any help greatly appreciated.
>>>>>>
>>>>>> Jimmie
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>