The Inbox: WebClient-Core-tobe.120.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: WebClient-Core-tobe.120.mcz

commits-2
A new version of WebClient-Core was added to project The Inbox:
http://source.squeak.org/inbox/WebClient-Core-tobe.120.mcz

==================== Summary ====================

Name: WebClient-Core-tobe.120
Author: tobe
Time: 7 May 2020, 8:03:26.842149 pm
UUID: e511b96d-6164-458e-bebd-4bb4f42cfb72
Ancestors: WebClient-Core-nice.119

If we get a 304, do not try to parse content

In particular, since content-length is typically not set for a 304 response (https://tools.ietf.org/html/rfc7230#section-3.3.2) and we somehow failed to notice the connection being closed, we ended up with a timeout instead. This caused some Metacello tests to time out.

=============== Diff against WebClient-Core-nice.119 ===============

Item was changed:
  ----- Method: WebResponse>>getContent (in category 'private') -----
  getContent
  "Do not read any content if this was a HEAD request or code is 204 (no content)"
+ (request method = 'HEAD' or: [code = 204 or: [code = 304]]) ifTrue:[^''].
- (request method = 'HEAD' or: [code = 204]) ifTrue:[^''].
  ^super getContent!


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: WebClient-Core-tobe.120.mcz

Tobias Pape

> On 07.05.2020, at 20:03, [hidden email] wrote:
>
> A new version of WebClient-Core was added to project The Inbox:
> http://source.squeak.org/inbox/WebClient-Core-tobe.120.mcz
>
> ==================== Summary ====================
>
> Name: WebClient-Core-tobe.120
> Author: tobe
> Time: 7 May 2020, 8:03:26.842149 pm
> UUID: e511b96d-6164-458e-bebd-4bb4f42cfb72
> Ancestors: WebClient-Core-nice.119
>
> If we get a 304, do not try to parse content
>
> In particular, since content-length is typically not set for a 304 response (https://tools.ietf.org/html/rfc7230#section-3.3.2) and we somehow failed to notice the connection being closed, we ended up with a timeout instead. This caused some Metacello tests to time out.
>
> =============== Diff against WebClient-Core-nice.119 ===============
>
> Item was changed:
>  ----- Method: WebResponse>>getContent (in category 'private') -----
>  getContent
>   "Do not read any content if this was a HEAD request or code is 204 (no content)"
> + (request method = 'HEAD' or: [code = 204 or: [code = 304]]) ifTrue:[^''].
> - (request method = 'HEAD' or: [code = 204]) ifTrue:[^''].
>   ^super getContent!
>

Hey Tom

interesting catch.
But we should not reach here. 304 is a redirect which ought to be handled BEFORE any getContent…

Best regards
        -Tobias

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: WebClient-Core-tobe.120.mcz

Beckmann, Tom
Hey Tobi,

we do still call `self content`, even if we go into the redirect branch, leading to the described timeout behavior:

WebClient>>#sendRequest:contentBlock
...

        "Handle redirect if needed"
        (self allowRedirect and:[response isRedirect]) ifTrue:[
                "Eat up the content of the previous response"
                response content.
                repeatRedirect := self redirect: request from: response.
        ].
...

I also felt like there should be a more general fix here. From what I read, if no content-length is given, the client should read data until the server closes the connection, however, the socket in SqueakSSL kept reporting that the socket was connected when I tried to fetch data from the GitHub API, until we timed out. This seemed like it would also constitute a correct (and also much simpler) fix, just that maybe we should make this list of codes where we do not need to read data even longer. Maybe I'm also missing another code path somewhere that should catch this case though.

Best,
Tom
________________________________________
From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
Sent: Thursday, May 7, 2020 8:09:27 PM
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Inbox: WebClient-Core-tobe.120.mcz

> On 07.05.2020, at 20:03, [hidden email] wrote:
>
> A new version of WebClient-Core was added to project The Inbox:
> http://source.squeak.org/inbox/WebClient-Core-tobe.120.mcz
>
> ==================== Summary ====================
>
> Name: WebClient-Core-tobe.120
> Author: tobe
> Time: 7 May 2020, 8:03:26.842149 pm
> UUID: e511b96d-6164-458e-bebd-4bb4f42cfb72
> Ancestors: WebClient-Core-nice.119
>
> If we get a 304, do not try to parse content
>
> In particular, since content-length is typically not set for a 304 response (https://tools.ietf.org/html/rfc7230#section-3.3.2) and we somehow failed to notice the connection being closed, we ended up with a timeout instead. This caused some Metacello tests to time out.
>
> =============== Diff against WebClient-Core-nice.119 ===============
>
> Item was changed:
>  ----- Method: WebResponse>>getContent (in category 'private') -----
>  getContent
>       "Do not read any content if this was a HEAD request or code is 204 (no content)"
> +     (request method = 'HEAD' or: [code = 204 or: [code = 304]]) ifTrue:[^''].
> -     (request method = 'HEAD' or: [code = 204]) ifTrue:[^''].
>       ^super getContent!
>

Hey Tom

interesting catch.
But we should not reach here. 304 is a redirect which ought to be handled BEFORE any getContent…

Best regards
        -Tobias


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: WebClient-Core-tobe.120.mcz

Tobias Pape
Hi

> On 07.05.2020, at 20:24, Beckmann, Tom <[hidden email]> wrote:
>
> Hey Tobi,
>
> we do still call `self content`, even if we go into the redirect branch, leading to the described timeout behavior:
>
> WebClient>>#sendRequest:contentBlock
> ...
>
>        "Handle redirect if needed"
>        (self allowRedirect and:[response isRedirect]) ifTrue:[
>                "Eat up the content of the previous response"
>                response content.
>                repeatRedirect := self redirect: request from: response.
>        ].
> ...
>
> I also felt like there should be a more general fix here. From what I read, if no content-length is given, the client should read data until the server closes the connection, however, the socket in SqueakSSL kept reporting that the socket was connected when I tried to fetch data from the GitHub API, until we timed out. This seemed like it would also constitute a correct (and also much simpler) fix, just that maybe we should make this list of codes where we do not need to read data even longer. Maybe I'm also missing another code path somewhere that should catch this case though.

Yeah, in the end it is a robuster thing. Maybe we should catch all 3xx here? they ought all have no content, right?

Best regards
        -Tobias


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: WebClient-Core-tobe.120.mcz

Beckmann, Tom
Ah I found the relevant part:
"   1.  Any response to a HEAD request and any response with a 1xx
       (Informational), 204 (No Content), or 304 (Not Modified) status
       code is always terminated by the first empty line after the
       header fields, regardless of the header fields present in the
       message, and thus cannot contain a message body."
- https://tools.ietf.org/html/rfc7230#section-3.3.3

There appears to be no special cases for the remaining 3xx codes. In fact I believe I do remember seeing HTML content containing a link back in the days on some 301 pages.

So could we change the line to

(request method = 'HEAD' or: [(code between: 100 and: 199) or: [code = 204 or: [code = 304]]])

when merging to trunk (and maybe include the reference to the RFC)? Or should I create a new commit to the inbox?

Best,
Tom
________________________________________
From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
Sent: Thursday, May 7, 2020 8:41:18 PM
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Inbox: WebClient-Core-tobe.120.mcz

Hi

> On 07.05.2020, at 20:24, Beckmann, Tom <[hidden email]> wrote:
>
> Hey Tobi,
>
> we do still call `self content`, even if we go into the redirect branch, leading to the described timeout behavior:
>
> WebClient>>#sendRequest:contentBlock
> ...
>
>        "Handle redirect if needed"
>        (self allowRedirect and:[response isRedirect]) ifTrue:[
>                "Eat up the content of the previous response"
>                response content.
>                repeatRedirect := self redirect: request from: response.
>        ].
> ...
>
> I also felt like there should be a more general fix here. From what I read, if no content-length is given, the client should read data until the server closes the connection, however, the socket in SqueakSSL kept reporting that the socket was connected when I tried to fetch data from the GitHub API, until we timed out. This seemed like it would also constitute a correct (and also much simpler) fix, just that maybe we should make this list of codes where we do not need to read data even longer. Maybe I'm also missing another code path somewhere that should catch this case though.

Yeah, in the end it is a robuster thing. Maybe we should catch all 3xx here? they ought all have no content, right?

Best regards
        -Tobias



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: WebClient-Core-tobe.120.mcz

Tobias Pape

> On 07.05.2020, at 20:55, Beckmann, Tom <[hidden email]> wrote:
>
> Ah I found the relevant part:
> "   1.  Any response to a HEAD request and any response with a 1xx
>       (Informational), 204 (No Content), or 304 (Not Modified) status
>       code is always terminated by the first empty line after the
>       header fields, regardless of the header fields present in the
>       message, and thus cannot contain a message body."
> - https://tools.ietf.org/html/rfc7230#section-3.3.3
>
> There appears to be no special cases for the remaining 3xx codes. In fact I believe I do remember seeing HTML content containing a link back in the days on some 301 pages.
>
> So could we change the line to
>
> (request method = 'HEAD' or: [(code between: 100 and: 199) or: [code = 204 or: [code = 304]]])
>
> when merging to trunk (and maybe include the reference to the RFC)? Or should I create a new commit to the inbox?

New commit and then both to the trunk :)
shipit ;)

-t

>
> Best,
> Tom
> ________________________________________
> From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
> Sent: Thursday, May 7, 2020 8:41:18 PM
> To: The general-purpose Squeak developers list
> Subject: Re: [squeak-dev] The Inbox: WebClient-Core-tobe.120.mcz
>
> Hi
>> On 07.05.2020, at 20:24, Beckmann, Tom <[hidden email]> wrote:
>>
>> Hey Tobi,
>>
>> we do still call `self content`, even if we go into the redirect branch, leading to the described timeout behavior:
>>
>> WebClient>>#sendRequest:contentBlock
>> ...
>>
>>       "Handle redirect if needed"
>>       (self allowRedirect and:[response isRedirect]) ifTrue:[
>>               "Eat up the content of the previous response"
>>               response content.
>>               repeatRedirect := self redirect: request from: response.
>>       ].
>> ...
>>
>> I also felt like there should be a more general fix here. From what I read, if no content-length is given, the client should read data until the server closes the connection, however, the socket in SqueakSSL kept reporting that the socket was connected when I tried to fetch data from the GitHub API, until we timed out. This seemed like it would also constitute a correct (and also much simpler) fix, just that maybe we should make this list of codes where we do not need to read data even longer. Maybe I'm also missing another code path somewhere that should catch this case though.
>
> Yeah, in the end it is a robuster thing. Maybe we should catch all 3xx here? they ought all have no content, right?
>
> Best regards
>        -Tobias
>
>
>



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: WebClient-Core-tobe.120.mcz

fniephaus
On Thu, May 7, 2020 at 9:11 PM Tobias Pape <[hidden email]> wrote:

>
>
> > On 07.05.2020, at 20:55, Beckmann, Tom <[hidden email]> wrote:
> >
> > Ah I found the relevant part:
> > "   1.  Any response to a HEAD request and any response with a 1xx
> >       (Informational), 204 (No Content), or 304 (Not Modified) status
> >       code is always terminated by the first empty line after the
> >       header fields, regardless of the header fields present in the
> >       message, and thus cannot contain a message body."
> > - https://tools.ietf.org/html/rfc7230#section-3.3.3
> >
> > There appears to be no special cases for the remaining 3xx codes. In fact I believe I do remember seeing HTML content containing a link back in the days on some 301 pages.
> >
> > So could we change the line to
> >
> > (request method = 'HEAD' or: [(code between: 100 and: 199) or: [code = 204 or: [code = 304]]])
> >
> > when merging to trunk (and maybe include the reference to the RFC)? Or should I create a new commit to the inbox?
>
> New commit and then both to the trunk :)
> shipit ;)

+1

>
> -t
>
> >
> > Best,
> > Tom
> > ________________________________________
> > From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
> > Sent: Thursday, May 7, 2020 8:41:18 PM
> > To: The general-purpose Squeak developers list
> > Subject: Re: [squeak-dev] The Inbox: WebClient-Core-tobe.120.mcz
> >
> > Hi
> >> On 07.05.2020, at 20:24, Beckmann, Tom <[hidden email]> wrote:
> >>
> >> Hey Tobi,
> >>
> >> we do still call `self content`, even if we go into the redirect branch, leading to the described timeout behavior:
> >>
> >> WebClient>>#sendRequest:contentBlock
> >> ...
> >>
> >>       "Handle redirect if needed"
> >>       (self allowRedirect and:[response isRedirect]) ifTrue:[
> >>               "Eat up the content of the previous response"
> >>               response content.
> >>               repeatRedirect := self redirect: request from: response.
> >>       ].
> >> ...
> >>
> >> I also felt like there should be a more general fix here. From what I read, if no content-length is given, the client should read data until the server closes the connection, however, the socket in SqueakSSL kept reporting that the socket was connected when I tried to fetch data from the GitHub API, until we timed out. This seemed like it would also constitute a correct (and also much simpler) fix, just that maybe we should make this list of codes where we do not need to read data even longer. Maybe I'm also missing another code path somewhere that should catch this case though.
> >
> > Yeah, in the end it is a robuster thing. Maybe we should catch all 3xx here? they ought all have no content, right?
> >
> > Best regards
> >        -Tobias
> >
> >
> >
>
>
>