Strange results from Zinc for the www.delicious.com domain


Andy Burnett
I have just come across a very odd 'bug' (?) in Zinc. For the various domains I tested, the two lines below return the same result, but not for Delicious. In that case HTTPClient returns what I expected, i.e. the contents of the page, but the Zinc client returns an undefined object. Any thoughts on what is going on? The user agent, maybe?

response := HTTPClient httpGet:'http://www.delicious.com'.
response2 := ZnHttpClient new url:'http://www.delicious.com'; get.

Cheers
Andy




Re: Strange results from Zinc for the www.delicious.com domain

Matt Kennedy
ZnHttpClient sends a 'Connection: close' header in the request, which causes Delicious to return the response headers but not the actual content entity.

It does this in ZnHttpClient>>method:for:headers:data:limit:. Commenting out 'request setConnectionClose' in that method makes the GET request work.

In reply to this post by Andy Burnett (Apr 30, 2011, 9:23pm)

Re: Strange results from Zinc for the www.delicious.com domain

Sven Van Caekenberghe

On 01 May 2011, at 04:28, [hidden email] wrote:

> ZnHttpClient is sending a Connection: close header in the request which is causing delicious to return the headers, but not the actual content entity.
>
> It's doing this in ZnHttpClient>>method:for:headers:data:limit:. Commenting out 'request setConnectionClose' in that method allows the get request to work.

Actually, it is a deeper bug, related to what Esteban reported. As far as I can tell right now, certain servers respond to requests carrying 'Connection: close' without including a 'Content-Length' header (most notably Google GWS, but apparently others too). The idea is then to read the content #upToEnd. ZnEntityReader does have a provision for that, but somehow it got disabled because this behavior is not very common. (As far as I remember, 'Content-Length' is required with HTTP/1.1, but I have to check that again; there might be finer points in the specs.) Enabling it with #allowReadingUpToEnd should have fixed the problem, but it seems to break lots of other code. I have to investigate this further.

Anyway, I now know where to look, so I'll get there.

Sven
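The read-until-close fallback described above can be sketched as a Pharo workspace script. This is a minimal illustration under stated assumptions, not Zinc's ZnEntityReader code: the block names bodyStrategy and readBody are hypothetical, headers are assumed to be a Dictionary with lowercase keys, and the body is read from an in-memory ReadStream rather than a socket. It follows a simplified version of the HTTP/1.1 message-length rules (RFC 2616, section 4.4): chunked transfer encoding takes precedence, then Content-Length, and otherwise the response body runs until the server closes the connection.

```smalltalk
"Hypothetical sketch, not Zinc code: decide how a response body is delimited.
 Assumes header names are lowercase Dictionary keys and the body is a String stream.
 Simplification: only an exact 'chunked' Transfer-Encoding value is recognised."
| bodyStrategy readBody |
bodyStrategy := [ :hdrs |
	(hdrs at: 'transfer-encoding' ifAbsent: [ '' ]) asLowercase = 'chunked'
		ifTrue: [ #chunked ]
		ifFalse: [
			(hdrs includesKey: 'content-length')
				ifTrue: [ #contentLength ]
				ifFalse: [ #readUpToEnd ] ] ].

readBody := [ :hdrs :stream |
	| strategy |
	strategy := bodyStrategy value: hdrs.
	strategy = #chunked ifTrue: [ Error signal: 'chunked decoding is not covered by this sketch' ].
	strategy = #contentLength
		ifTrue: [ "Read exactly the announced number of characters"
			stream next: (hdrs at: 'content-length') asInteger ]
		ifFalse: [ "No length and no chunking: the body ends when the server closes"
			stream upToEnd ] ].

"Close-delimited response: answers 'hello'"
readBody value: Dictionary new value: (ReadStream on: 'hello').
```

Reading #upToEnd is only correct when the server really does close the connection after the response; on a kept-alive connection the read would instead wait until the server's idle timeout closes it.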


 

Re: Strange results from Zinc for the www.delicious.com domain

Matt Kennedy
In this case the URL responds with chunked transfer encoding, so the reader shouldn't need to rely on the Content-Length header. While testing the Delicious URL I noticed that the server has an exceedingly short idle timeout before it closes the connection. Removing the Connection: close header from the request was only a workaround: it puts the request into keep-alive mode, which Delicious supports by default, but their idle timeout is so short that the connection effectively behaves like a single-use one unless follow-up requests are issued immediately.

In reply to this post by Sven Van Caekenberghe (Sun, May 1, 2011, 5:16 AM)

--
Matt Kennedy
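Matt's point is that a chunked response carries its own framing, so no Content-Length is needed: each chunk is preceded by its size in hexadecimal on its own line, and a zero-size chunk ends the body. A minimal Pharo sketch of that decoding follows, assuming the whole body is already available as a String stream; decodeChunked is a hypothetical name, not part of Zinc, chunk extensions (after ';') are ignored, and trailer headers after the last chunk are not parsed.

```smalltalk
"Hypothetical sketch, not Zinc's implementation: decode an HTTP/1.1 chunked body.
 Assumes an in-memory String stream; ignores chunk extensions and trailers."
| decodeChunked |
decodeChunked := [ :stream |
	| out line hex size |
	out := WriteStream on: String new.
	[ "Size line: hex digits, optional ;extensions, terminated by CRLF"
	  line := stream upTo: Character lf.
	  hex := ((line copyUpTo: $;) copyWithout: Character cr) trimBoth asUppercase.
	  size := hex inject: 0 into: [ :acc :char | acc * 16 + char digitValue ].
	  size > 0 ]
		whileTrue: [
			out nextPutAll: (stream next: size).
			"Skip the CRLF that terminates the chunk data"
			stream upTo: Character lf ].
	out contents ].

"Two chunks ('Wiki', 'pedia') followed by the zero-size last chunk: answers 'Wikipedia'"
decodeChunked value: (ReadStream on: '4', String crlf, 'Wiki', String crlf, '5', String crlf, 'pedia', String crlf, '0', String crlf, String crlf).
```

Because the zero-size chunk marks the end of the body, a client decoding this way knows when the response is complete without waiting for the server to close the connection.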

Re: Strange results from Zinc for the www.delicious.com domain

Andy Burnett
In reply to this post by Sven Van Caekenberghe
OK, given what you and Matt have found, I think I will wait until you have a tweaked version of the code. In the meantime, it looks like the existing HTTPClient will have to suffice: I am trying to do some screen scraping of Delicious, so I really need something that works with that server.

Cheers
Andy

Re: Strange results from Zinc for the www.delicious.com domain

Sven Van Caekenberghe

On 02 May 2011, at 00:11, Andy Burnett wrote:

> OK, given what you - and Matt - have found, I think I will wait until you
> have a tweaked version of the code. In the meantime, it looks like the
> existing HttpClient will have to suffice - I am trying to do some screen
> scraping of Delicious, so I really need something that will work with that
> server.

Please try again if you still have time; if not, no problem.

Sven



Re: Strange results from Zinc for the www.delicious.com domain

Andy Burnett
Great! That works now.

TVM