I have just come across a very odd 'bug'? in Zinc. Evaluating the two lines below returns the same result for the various domains I tested, but not for Delicious. In that case, HTTPClient returns what I expected, i.e. the contents of the page, but the Zinc client returns an undefined object. Any thoughts on what is going on? User agent, maybe?
response := HTTPClient httpGet: 'http://www.delicious.com'.
response2 := ZnHttpClient new url: 'http://www.delicious.com'; get.

Cheers
Andy
ZnHttpClient is sending a Connection: close header in the request which is causing delicious to return the headers, but not the actual content entity.
It's doing this in ZnHttpClient>>method:for:headers:data:limit:. Commenting out 'request setConnectionClose' in that method allows the get request to work.

On Apr 30, 2011 9:23pm, Andy Burnett <[hidden email]> wrote:
> I have just come across a very odd 'bug'? in Zinc. Evaluating the two lines (below) will return the same result for the various domains I tested, but not for Delicious. In their case, the HTTPClient returns what I expected - i.e. the contents of the page. But, the Zinc client returns an undefined object. Any thoughts on what is going on - user agent maybe??
>
> response := HTTPClient httpGet:'http://www.delicious.com'.
> response2 := ZnHttpClient new url:'http://www.delicious.com'; get.
>
> Cheers
> Andy
On 01 May 2011, at 04:28, [hidden email] wrote:
> ZnHttpClient is sending a Connection: close header in the request which is causing delicious to return the headers, but not the actual content entity.
>
> It's doing this in ZnHttpClient>>method:for:headers:data:limit:. Commenting out 'request setConnectionClose' in that method allows the get request to work.

Actually it is a deeper bug, related to what Esteban reported.

As far as I can tell right now, certain servers respond to requests with 'Connection: Close' by not including a 'Content-Length' (most notably Google GWS, but apparently others too). The idea is then to read the content #upToEnd. ZnEntityReader does have a provision for that, but somehow it got disabled because this behavior is not very common (as far as I remember, but I have to check again, 'Content-Length' is required with HTTP/1.1, but there might be finer points in the specs).

Enabling it with #allowReadingUpToEnd should have fixed it, but it seems to break lots of other code. I have to investigate this further. Anyway, I now know where to look, so I'll get there.

Sven
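On the 'finer points in the specs': as I read the HTTP/1.1 message-length rules (RFC 2616, section 4.4), 'Content-Length' is not strictly required on a response. A chunked Transfer-Encoding takes precedence, then Content-Length, and only if neither is present does the client read until the server closes the connection. The sketch below shows that decision order in Python rather than Smalltalk. It is not Zinc code, the name body_framing is made up for illustration, and it ignores responses that never carry a body (HEAD, 204, 304).

```python
def body_framing(headers):
    """Decide how a response body is delimited.

    `headers` is a plain dict of header name -> value (hypothetical input
    shape, not any real client's API). Returns a (kind, length) tuple.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # 1. A final transfer coding of "chunked" wins over everything else.
    codings = [c.strip().lower()
               for c in normalized.get("transfer-encoding", "").split(",")
               if c.strip()]
    if codings and codings[-1] == "chunked":
        return ("chunked", None)

    # 2. Otherwise an explicit Content-Length gives the exact byte count.
    if "content-length" in normalized:
        return ("length", int(normalized["content-length"]))

    # 3. Otherwise the body runs until the server closes the connection,
    #    which is the "read #upToEnd" case discussed above.
    return ("until-close", None)
```

With this ordering, a response that sends 'Connection: close' and no 'Content-Length' falls into the third case, so a reader that refuses to read up to the end of the stream would come back with no entity at all.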
The URL in this case is responding with chunked encoding in the content, so the reader shouldn't need to rely on the Content-Length header. When I was testing the delicious URL I noticed that they have an exceedingly short idle timeout before the server shuts down the connection. Removing the Connection: close header from the request was a workaround: it puts the request into keep-alive mode, which delicious supports by default, but their idle timeout is so short that the request effectively works like a single connection if follow-up requests aren't issued immediately.
On Sun, May 1, 2011 at 5:16 AM, Sven Van Caekenberghe <[hidden email]> wrote:
-- Matt Kennedy
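For reference, a chunked body carries its own framing: each chunk is a hexadecimal size line, that many bytes of data, and a CRLF, with a zero-size chunk marking the end. That is why no Content-Length is needed to find the end of the entity. Here is a minimal Python sketch of such a reader, assuming a blocking binary stream with read and readline. The name read_chunked is hypothetical, and this is not how ZnEntityReader is implemented.

```python
import io


def read_chunked(stream):
    """Decode a chunked HTTP body from a binary stream and return the bytes."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("connection closed before the final chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_text = size_line.split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        if size == 0:
            break  # the zero-size chunk terminates the body
        chunk = stream.read(size)
        if len(chunk) != size:
            raise EOFError("connection closed in the middle of a chunk")
        body += chunk
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip optional trailer headers up to the blank line (or end of stream).
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    return bytes(body)


# Usage with an in-memory stream standing in for a socket:
raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = read_chunked(io.BytesIO(raw))
```

A real client would also have to handle timeouts and a socket whose read can return fewer bytes than requested; the in-memory stream above sidesteps both.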
OK, given what you - and Matt - have found, I think I will wait until you have a tweaked version of the code. In the meantime, it looks like the existing HTTPClient will have to suffice. I am trying to do some screen scraping of Delicious, so I really need something that will work with that server.

Cheers
Andy
On 02 May 2011, at 00:11, Andy Burnett wrote:
> OK, given what you - and Matt - have found, I think I will wait until you
> have a tweaked version of the code. In the meantime, it looks like the
> existing HttpClient will have to suffice - I am trying to do some screen
> scraping of Delicious, so I really need something that will work with that
> server.

Please try again if you still have time; if not, no problem.

Sven
Great! That now works.
TVM