SstHttpClient - how to follow 301 Moved Permanently


SstHttpClient - how to follow 301 Moved Permanently

jtuchel
Once again I am overlooking something very obvious...


What I am trying to do is download an RSS feed. The code so far looks like this:

    | client response fullUrl headers |

    client := SstHttpClient forTransportScheme: 'httpsl'.
    fullUrl := 'https://www.kontolino.de/feed'.
    "headers is never assigned in this snippet, so nil is passed below"

    [
        client startUp.
        response := client get: fullUrl sstAsUrl using: nil withHeaders: headers]
            ensure: [client shutDown].

    response inspect.


What I get is an SstError 153 which contains the following headers (among others):

SstHttpResponseHeader{
HTTP/1.1 301 Moved Permanently
Location: https://www.kontolino.de/feed/


Notice the trailing slash in the Location header. So of course I changed the URL and added a trailing slash, but the response is the same.

So I tried curl (I love this tool more and more). By default it does not follow the redirect and gets the same result as the code above. But it has a -L option to follow redirects, and with that it does its job perfectly fine.

So what do I have to do to get the actual file from the server?

--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/va-smalltalk.
For more options, visit https://groups.google.com/d/optout.

Re: SstHttpClient - how to follow 301 Moved Permanently

Seth Berman
Hi Joachim,

I looked at the headers that Google Chrome sends for that request. I see

    GET /feed/ HTTP/1.1
    Host: www.kontolino.de

The SstHttpClient sends

    GET /feed/ HTTP/1.1
    Host: www.kontolino.de:443

There is a bit of logic (which I'm guessing is wrong or not to spec) in SstHttpClient>>buildHttpGETFor:using: that fills in the default port for the connection type (443) if it is not in the URL. I'm guessing you have some customizations that make 443 not the correct choice, or something redirects if you send anything on that port?

So, just comment out the section below and it will work.
In the meantime, I'll have to see what the spec says.
I do see that not following redirects by default is to spec (RFC 2616 10.3.3).
I do believe that an optional "follow redirect" setting would be good to have.
Comment this out in the method mentioned above for this URL:

    (hostAddress includes: $:)
        ifFalse: [
            hostAddress := '%1:%2' bindWith: hostAddress with: aUrl class defaultPort printString].
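
For readers who want to see the effect of that fragment in isolation, here is a minimal workspace sketch. It is not the SstHttpClient source: the variable names and literal values are assumptions chosen for illustration, and plain concatenation stands in for bindWith:with:.

```smalltalk
| hostAddress defaultPort |
"Illustrative values only: the host from this thread and the https default port."
hostAddress := 'www.kontolino.de'.
defaultPort := 443.

"Same shape as the fragment above: append the port only if none is present."
(hostAddress includes: $:)
    ifFalse: [hostAddress := hostAddress , ':' , defaultPort printString].

"hostAddress is now 'www.kontolino.de:443', which is the Host value the server redirected."
hostAddress
```

With the fragment commented out, hostAddress stays 'www.kontolino.de', which matches what Chrome sends.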


Re: SstHttpClient - how to follow 301 Moved Permanently

jtuchel
Hi Seth,

thanks a lot for looking into this, and so fast!

Your suggestion works for me as well. I wonder if something else is broken by it...

I guess you are right that not following a redirect by default is in line with the spec. Otherwise curl would simply do it instead of requiring an extra option.


Joachim




Re: SstHttpClient - how to follow 301 Moved Permanently

Seth Berman
Hi Joachim,

According to the Host section of RFC 2616 (section 14.23, https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23):
"A "host" without any trailing port information implies the default port for the service requested (e.g., "80" for an HTTP URL). For example, a request on the origin server for <http://www.w3.org/pub/WWW/> would properly include:
       GET /pub/WWW/ HTTP/1.1
       Host: www.w3.org"

So the SstHttpClient doesn't really need to append the default port of a Url type explicitly... in fact, this is a good example of why it shouldn't.
The problem arises if someone has subclassed the Url class and supplied a custom answer for #defaultPort: removing this section would then likely break customer code, since the port would no longer be appended.

I suppose that since we know the protocol and the default port for the protocol, we can skip appending the port when the Url's defaultPort matches it.
If the defaultPort is different from the default for the protocol, then we know we should append it.
I think that solves this basic issue while keeping everything compatible.

- Seth

On Friday, June 22, 2018 at 9:52:43 AM UTC-4, Joachim Tuchel wrote:
To answer your question about the redirect itself: Apache is configured to redirect all requests coming in for http://www.kontolino.de to https://kontolino.de
But what's funny is that the :443 vhost does not redirect... Maybe WordPress adds it (like Seaside does)...?

Joachim




Re: SstHttpClient - how to follow 301 Moved Permanently

jtuchel
Seth,

your suggestion sounds right.

But I guess if anybody overrode #defaultPort, they had better know exactly what they are doing, because the default port for a protocol is not something you'd play with; it is defined by some kind of consensus or an RFC (I am far from being an expert in IP standardization). So I'd regard such changes as similar to messing with a private protocol: you'd better know what you're doing...

I am not sure whether your suggestion isn't a bit like a cat biting its own tail (that is, circular): where would you get the default port from in order to decide whether to append the port number or not? If it isn't the URL, it's a new, second place. What if people go and change or override something there as well?

Joachim





Re: SstHttpClient - how to follow 301 Moved Permanently

Seth Berman
Hi Joachim,

SstHttpUrl class>>defaultPort and SstHttpsUrl class>>defaultPort are what the incoming aUrl argument is checked against.
These are hardcoded to 80 and 443.

    (aUrl scheme = 'http' and: [aUrl class defaultPort ~= self httpUrlClass defaultPort]) ifTrue: ["append"].
    (aUrl scheme = 'https' and: [aUrl class defaultPort ~= self httpsUrlClass defaultPort]) ifTrue: ["append"].

    httpUrlClass -> SstHttpUrl
    httpsUrlClass -> SstHttpsUrl

The only assumption here is that httpUrlClass defaultPort and httpsUrlClass defaultPort answer the default ports for the http and https protocols.
I think this is reasonable enough. We don't design for external changes to our base code... only for extending it.
Though in many respects, this will handle both. The user could change SstHttpUrl class>>defaultPort to 8080 and this would all still work. It just means that nothing will be appended if the incoming Url has a default port of 8080, and :80 will always be appended for an incoming Url with a default port of 80.
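
The three cases described above can be walked through in a workspace. This is only a sketch of the comparison, not the actual change: hostHeaderFor is a hypothetical helper block, the port numbers are passed in as plain integers instead of being read from Url classes, and it ignores URLs that already carry an explicit port.

```smalltalk
| hostHeaderFor |
"Hypothetical helper block, not part of SstHttpClient. Arguments: the host name,
 the default port of the incoming Url's class, and the default port of the base Url class."
hostHeaderFor := [:host :urlDefaultPort :baseDefaultPort |
    urlDefaultPort = baseDefaultPort
        ifTrue: [host]
        ifFalse: [host , ':' , urlDefaultPort printString]].

"Standard https: both sides answer 443, so this answers 'www.kontolino.de'."
hostHeaderFor value: 'www.kontolino.de' value: 443 value: 443.

"A Url subclass that overrides #defaultPort to 8443: answers 'www.kontolino.de:8443'."
hostHeaderFor value: 'www.kontolino.de' value: 8443 value: 443.

"Base class changed to 8080, incoming Url class still at 80: answers 'www.kontolino.de:80'."
hostHeaderFor value: 'www.kontolino.de' value: 80 value: 8080.
```

Evaluate each expression separately (for example with Display or Inspect) to see its result.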

I've got these changes, but they won't make 9.1 since we are past code cutoff.
I've also got the changes to handle redirects (a new allowRedirects: setting) and a retry count (a new retryCount: setting).
You want the retryCount because it's possible to get infinite redirects.
It will also handle redirect code 303 ("See Other"), which means that if the request is a POST, it must be converted to a GET with the content zeroed out and then redirected.
This way you don't get multiple POSTs when you are redirected, which is regarded as highly unfavorable.

I'll put them in after this release and get them into the first ECAP we do for 9.2.
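
The redirect behaviour described above (a bounded number of redirects, and a 303 turning a POST into a GET) can be sketched without any network access. The sketch below is an illustration under stated assumptions, not the planned implementation: it does not use allowRedirects: or retryCount:, since their exact signatures are not shown in this thread, the example.org URLs are made up, and a Dictionary of canned answers stands in for the server.

```smalltalk
| responses method url redirectsLeft reply status |
"Canned answers standing in for a server: url -> #(status location).
 Everything here is illustrative; no SstHttpClient API is used."
responses := Dictionary new.
responses at: 'https://example.org/form' put: #(303 'https://example.org/done').
responses at: 'https://example.org/done' put: #(200 '').

method := 'POST'.
url := 'https://example.org/form'.
"Upper bound on redirects followed, so a redirect loop cannot run forever."
redirectsLeft := 5.

[reply := responses at: url.
 status := reply at: 1.
 "Only the two redirect codes mentioned in this thread are handled here."
 (#(301 303) includes: status) and: [redirectsLeft > 0]]
    whileTrue: [
        "303 See Other: the follow-up request becomes a GET without content."
        status = 303 ifTrue: [method := 'GET'].
        url := reply at: 2.
        redirectsLeft := redirectsLeft - 1].

"Answers an Array holding 'GET', 'https://example.org/done' and 200."
Array with: method with: url with: status
```

If the canned answers redirected a URL to itself, the loop would stop after five hops with the last redirect status still in status, which is the situation the retry count guards against.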

-- Seth
