Server timeouts and 504 return codes


Server timeouts and 504 return codes

timrowledge
A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed, the error came back much more quickly than the 45-second timeout that we seem to have set for our HTTP connections.

I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be, so excuse the stupid-question syndrome - I know this isn't Quora, where stupid questions are the order of the day.
Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config?
Am I right in imagining that we can't normally affect that timeout?

If I have any reasonable grasp on this, then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we currently have) and retry the connection. Any other error, or a timeout at *our* end, would still be best handled as an error.

Except of course a 418, which has well-defined error handling...
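
Something like this, perhaps (a sketch only - it assumes WebClient's class-side #httpGet: and WebResponse>>#code, and the retry/back-off policy is made up):

| url response attempt |
url := 'http://map.squeak.org/sm/...'.
attempt := 0.
[response := WebClient httpGet: url.
 response code = 504 and: [attempt < 3]] whileTrue: [
	attempt := attempt + 1.
	"back off rather than hammering the gateway again"
	(Delay forSeconds: (2 raisedTo: attempt)) wait].
response code = 200
	ifFalse: [self error: 'HTTP ', response code printString, ' fetching ', url]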

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.




Re: Server timeouts and 504 return codes

Tobias Pape
Hi

> On 27.01.2019, at 02:53, tim Rowledge <[hidden email]> wrote:
>
> A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed, the error came back much more quickly than the 45-second timeout that we seem to have set for our HTTP connections.
>
> I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be, so excuse the stupid-question syndrome - I know this isn't Quora, where stupid questions are the order of the day.
> Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config?
> Am I right in imagining that we can't normally affect that timeout?
>

Well, we can.

What happens here:

- All our websites, including all HTTP services such as the Map, arrive together at squeak.org, aka alan.box.squeak.org.
  That is an nginx server, and also the server that eventually spits out the 504.
- alan then sees we want a connection to the Map and makes an HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_),
  and upon response relays that back to us.

- if ted fails to respond in 60s, alan gives a 504.

Simple as that. This limits the possibility that we wait too long (i.e. >60s) on ted.
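
For the curious, the relevant part of such a setup looks roughly like this (a sketch - the hostnames are ours, but the exact directives and values on alan are assumptions):

server {
    server_name map.squeak.org;
    location / {
        proxy_pass http://ted.box.squeak.org;
        # if ted doesn't answer within this window,
        # nginx itself answers with 504 Gateway Time-out
        proxy_read_timeout 60s;
    }
}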

Elephant in the room: why not talk to ted directly? The nginx on alan is configured as hardened as I thought best, and can actually handle a multitude of requests much better than our Squeak-based "application servers". This distinction between reverse proxy and application server is, btw, quite standard and enables some things. For example:

We can tune a lot of things on alan with regards to how it should handle things. The simplest being:

- we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
  that's where the 60s come from, and we could simply crank it up.
  - HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even at the TCP level.
  - so increasing it just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)

but also:
- we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
  - we could make alan not even ask ted when we know the answer already (see the sketch below).
  - Attention: we need a lot of information on what is stable and what is not to do this.
  - (it's tempting to try, though)
  - (we probably want that for squeaksource/source.squeak for the MCZ requests, but we lose the download statistics then…)
- Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FastCGI, for example, reduces that, and it is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have an implementation in Squeak.
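
Such a cache would look roughly like this (a sketch; zone name, path, and validity time are assumptions, not our actual config):

# declared once at http level: a small cache zone
proxy_cache_path /var/cache/nginx/squeak keys_zone=squeak:10m max_size=1g;

location / {
    proxy_cache squeak;
    proxy_cache_valid 200 10m;    # keep good answers for 10 minutes
    proxy_pass http://ted.box.squeak.org;
}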

> If I have any reasonable grasp on this, then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we currently have) and retry the connection. Any other error, or a timeout at *our* end, would still be best handled as an error.

All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that.
I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's OK to pass that through.


>
> Except of course a 418 which has well defined error handling...
>

At least not 451…

Best regards
        -Tobias

> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.
>
>
>



Re: Server timeouts and 504 return codes

Levente Uzonyi
On Sun, 27 Jan 2019, Tobias Pape wrote:

> Hi
>
>> On 27.01.2019, at 02:53, tim Rowledge <[hidden email]> wrote:
>>
>> A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed, the error came back much more quickly than the 45-second timeout that we seem to have set for our HTTP connections.
>>
>> I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be, so excuse the stupid-question syndrome - I know this isn't Quora, where stupid questions are the order of the day.
>> Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config?
>> Am I right in imagining that we can't normally affect that timeout?
>>
>
> Well, we can.
>
> What happens here:
>
> - All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
>  That is an nginx server. And also the server who eventually spits out the 504.
> - alan then sees we want a connection to the Map, and does a HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
>  and upon response gets us that back.
>
> - if ted fails to respond in 60s, alan gives a 504.
>
> Simple as that. This limits the possibility that we wait too long (ie >60s) on ted.
>
> Elephant in the room: why not directly ted? the nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
>
> We can tune a lot of things on alan with regards to how it should handle things. The simplest being:
>
> - we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
>  that's where the 60s come from, and we could simply crank it up.
>  - HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
>  - so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
Tim reported timeouts shorter than 45s, so it is very likely an issue with
the SqueakMap image.

>
> but also:
> - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
>  - we could make alan not even ask ted when we know the answer already.
>  - Attention: we need a lot of information on what is stable and what not to do this.
>  - (its tempting to try, tho)
>  - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)

If squeaksource/mc used ETags, then the squeaksource image could simply
return 304 and let nginx serve the cached mczs while keeping the
statistics updated.
That would also let us save bandwidth by not downloading files already
sitting in the client's package cache.
We could also use nginx to serve files instead of the image, but then the
image would have to know that it's sitting behind nginx.
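
The exchange would look something like this (illustrative only - the URL and hash value are made up, and nothing on our servers emits ETags today):

GET /trunk/Kernel-ul.1234.mcz HTTP/1.1
Host: source.squeak.org
If-None-Match: "3f5a9c"

HTTP/1.1 304 Not Modified
ETag: "3f5a9c"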

> - Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FastCGI, for example, reduces that, and it is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have an implementation in Squeak.

I'm 99% sure the HTTP overhead is negligible.

Levente

>
>> If I have any reasonable grasp on this, then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we currently have) and retry the connection. Any other error, or a timeout at *our* end, would still be best handled as an error.
>
> All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that.
> I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's OK to pass that through.
>
>
>>
>> Except of course a 418 which has well defined error handling...
>>
>
> At least not 451…
>
> Best regards
> -Tobias
>
>> tim
>> --
>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.
>>
>>
>>


Re: Server timeouts and 504 return codes

Tobias Pape
Hi

> On 27.01.2019, at 18:50, Levente Uzonyi <[hidden email]> wrote:
>
> On Sun, 27 Jan 2019, Tobias Pape wrote:
>
>> Hi
>>
>>> On 27.01.2019, at 02:53, tim Rowledge <[hidden email]> wrote:
>>> A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed, the error came back much more quickly than the 45-second timeout that we seem to have set for our HTTP connections.
>>> I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be, so excuse the stupid-question syndrome - I know this isn't Quora, where stupid questions are the order of the day. Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config?
>>> Am I right in imagining that we can't normally affect that timeout?
>>
>> Well, we can.
>>
>> What happens here:
>>
>> - All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
>> That is an nginx server. And also the server who eventually spits out the 504.
>> - alan then sees we want a connection to the Map, and does a HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
>> and upon response gets us that back.
>>
>> - if ted fails to respond in 60s, alan gives a 504.
>>
>> Simple as that. This limits the possibility that we wait too long (ie >60s) on ted.
>>
>> Elephant in the room: why not directly ted? the nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
>>
>> We can tune a lot of things on alan with regards to how it should handle things. The simplest being:
>> - we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
>> that's where the 60s come from, and we could simply crank it up.
>> - HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
>> - so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
>
> Tim reported timeouts shorter than 45s, so it is very likely an issue with the SqueakMap image.

But then we wouldn't get a 504. A 504 explicitly means: the upstream timed out.

What we have is:

/etc/nginx/conf.d/proxy.conf

### proxy-timeouts ###
proxy_connect_timeout   30;   # seconds allowed to establish the TCP connection to ted
proxy_send_timeout      90;   # max seconds between two successive writes to ted
proxy_read_timeout      90;   # max seconds between two successive reads from ted; exceeded => 504

And _not_ being able to connect could also mean a 504.

But _that_ in turn would mean the Map is so overloaded it cannot accept new connections, and that would be a bummer.


>>
>> but also:
>> - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
>> - we could make alan not even ask ted when we know the answer already.
>> - Attention: we need a lot of information on what is stable and what not to do this.
>> - (its tempting to try, tho)
>> - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
>
> If squeaksource/mc used ETags, then the squeaksource image could simply return 304 and let nginx serve the cached mczs while keeping the
> statistics updated.

I think I already had something like that in SqueakSource3.

> That would also let us save bandwidth by not downloading files already sitting in the client's package cache.
> We could also use nginx to serve files instead of the image, but then the image would have to know that it's sitting behind nginx.

You can do something like that with nginx (_and_ notify the server). That would be around 20 lines of nginx config and 50 lines in SqueakSource3.
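
Presumably something along the lines of nginx's X-Accel-Redirect, e.g. (a sketch; the location name and path are assumptions):

# nginx: an internal-only location that actually serves the files
location /mcz-files/ {
    internal;                        # reachable only via X-Accel-Redirect
    alias /srv/squeaksource/files/;
}

# The image handles the request (auth, download statistics), then replies
# with just a header:
#     X-Accel-Redirect: /mcz-files/Kernel-ul.1234.mcz
# and nginx streams the file itself.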

>
>> - Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FastCGI, for example, reduces that, and it is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have an implementation in Squeak.
>
> I'm 99% sure http overhead is negligible.

Probably. But I don't know.

Best regards
        -Tobias

>
> Levente
>
>>
>>> If I have any reasonable grasp on this, then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we currently have) and retry the connection. Any other error, or a timeout at *our* end, would still be best handled as an error.
>>
>> All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that.
>> I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's OK to pass that through.
>>
>>
>>> Except of course a 418 which has well defined error handling...
>>
>> At least not 451…
>>
>> Best regards
>> -Tobias
>>
>>> tim
>>> --
>>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>>> You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.



Re: Server timeouts and 504 return codes

Chris Muller-3
In reply to this post by Levente Uzonyi
Hi guys,

> >> A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed, the error came back much more quickly than the 45-second timeout that we seem to have set for our HTTP connections.
> >>
> >> I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be, so excuse the stupid-question syndrome - I know this isn't Quora, where stupid questions are the order of the day.
> >> Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config?
> >> Am I right in imagining that we can't normally affect that timeout?
> >>
> >
> > Well, we can.
> >
> > What happens here:
> >
> > - All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
> >  That is an nginx server. And also the server who eventually spits out the 504.
> > - alan then sees we want a connection to the Map, and does a HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
> >  and upon response gets us that back.

Thanks for the great explanation!  I want to learn more about
admin'ing, so it's great to have this in-context example of a
reverse proxy; thanks for setting that up!

> > - if ted fails to respond in 60s, alan gives a 504.

60s seems like an ideally balanced timeout setting -- the longest any
possible request should be expected to wait ... and yet clients can
still shorten it to 45s or 30s if they want a shorter timeout.

> > Simple as that. This limits the possibility that we wait too long (ie >60s) on ted.
> >
> > Elephant in the room: why not directly ted? the nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
> >
> > We can tune a lot of things on alan with regards to how it should handle things. The simplest being:
> >
> > - we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
> >  that's where the 60s come from, and we could simply crank it up.
> >  - HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
> >  - so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
>
> Tim reported timeouts shorter than 45s, so it is very likely an issue with
> the SqueakMap image.

Yes, the SqueakMap server image is one part of the dynamic, but I
think another is a bug in the trunk image.  I think the reason Tim is
not seeing 45 seconds before the error is that the timeout setting of
the high-level client is not being passed all the way down to the
lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
SocketStream --> Socket.  By the time it gets down to Socket, which
does the actual work, it's operating on its own 30-second timeout.
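
(If that is the bug, the fix is conceptually just threading the value through every layer; a sketch, assuming WebClient's #timeout: accessor - the point being that the 45 must end up as the Socket-level wait rather than being ignored:)

| client response |
client := WebClient new.
client timeout: 45.    "should propagate down through SocketStream to Socket"
response := client httpGet: 'http://map.squeak.org/sm/...'.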

It is a fixed amount of time, I *think* still between 30 and 45
seconds, that it takes the SqueakMap server to save its model after an
update (e.g., adding a Release, etc.).  It's so long because the
server is running on a very old 3.x image, on an interpreter VM.  It's
running an HttpView2 app which doesn't even compile in modern Squeak.
That's why it hasn't been brought forward yet, but I am working on a
new API service to replace it, with the eventual goal of SqueakMap
being an "App Store" experience, and it will not suffer timeouts.

> > but also:
> > - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
> >  - we could make alan not even ask ted when we know the answer already.
> >  - Attention: we need a lot of information on what is stable and what not to do this.
> >  - (its tempting to try, tho)
> >  - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
>
> If squeaksource/mc used ETags, then the squeaksource image could simply
> return 304 and let nginx serve the cached mczs while keeping the
> statistics updated.

Tim's email was about SqueakMap, not SqueakSource.  SqueakSource
serves the mcz's straight off the hard-drive platter.  We don't need
to trade away download statistics to save a few ms on an mcz request.

> That would also let us save bandwidth by not downloading files already
> sitting in the client's package cache.

How so?  Isn't the package-cache checked before hitting the server at
all?  It certainly should be.

Best,
  Chris


> We could also use nginx to serve files instead of the image, but then the
> image would have to know that it's sitting behind nginx.
>
> > - Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FastCGI, for example, reduces that, and it is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have an implementation in Squeak.
>
> I'm 99% sure http overhead is negligible.
>
> Levente
>
> >
> >> If I have any reasonable grasp on this, then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we currently have) and retry the connection. Any other error, or a timeout at *our* end, would still be best handled as an error.
> >
> > All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that.
> > I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's OK to pass that through.
> >
> >
> >>
> >> Except of course a 418 which has well defined error handling...
> >>
> >
> > At least not 451…
> >
> > Best regards
> >       -Tobias
> >
> >> tim
> >> --
> >> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> >> You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.
> >>
> >>
> >>


Re: Server timeouts and 504 return codes

Levente Uzonyi
On Sun, 27 Jan 2019, Chris Muller wrote:

> Hi guys,
>
>> >> A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed, the error came back much more quickly than the 45-second timeout that we seem to have set for our HTTP connections.
>> >>
>> >> I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be, so excuse the stupid-question syndrome - I know this isn't Quora, where stupid questions are the order of the day.
>> >> Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config?
>> >> Am I right in imagining that we can't normally affect that timeout?
>> >>
>> >
>> > Well, we can.
>> >
>> > What happens here:
>> >
>> > - All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
>> >  That is an nginx server. And also the server who eventually spits out the 504.
>> > - alan then sees we want a connection to the Map, and does a HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
>> >  and upon response gets us that back.
>
> Thanks for the great explanation!  I want to learn more about
> admin'ing, so it's great to have this in-context example of a
> reverse proxy; thanks for setting that up!
>
>> > - if ted fails to respond in 60s, alan gives a 504.
>
> 60s seems like an ideally balanced timeout setting -- the longest any
> possible request should be expected to wait ... and yet clients can
> still shorten it to 45s or 30s if they want a shorter timeout.
>
>> > Simple as that. This limits the possibility that we wait too long (ie >60s) on ted.
>> >
>> > Elephant in the room: why not directly ted? the nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
>> >
>> > We can tune a lot of things on alan with regards to how it should handle things. The simplest being:
>> >
>> > - we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
>> >  that's where the 60s come from, and we could simply crank it up.
>> >  - HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
>> >  - so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
>>
>> Tim reported timeouts shorter than 45s, so it is very likely an issue with
>> the SqueakMap image.
>
> Yes, the SqueakMap server image is one part of the dynamic, but I
> think another is a bug in the trunk image.  I think the reason Tim is
> not seeing 45 seconds before the error is that the timeout setting of
> the high-level client is not being passed all the way down to the
> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
> SocketStream --> Socket.  By the time it gets down to Socket, which
> does the actual work, it's operating on its own 30-second timeout.
I would expect subsecond response times. 30 seconds is just unacceptably
long.

>
> It is a fixed amount of time, I *think* still between 30 and 45
> seconds, that it takes the SqueakMap server to save its model after an
> update (e.g., adding a Release, etc.).  It's so long because the
> server is running on a very old 3.x image, interpreter VM.  It's
> running a HttpView2 app which doesn't even compile in modern Squeak.
> That's why it hasn't been brought forward yet, but I am working on a
> new API service to replace it with the eventual goal of SqueakMap
> being an "App Store" experience, and it will not suffer timeouts.
>
>> > but also:
>> > - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
>> >  - we could make alan not even ask ted when we know the answer already.
>> >  - Attention: we need a lot of information on what is stable and what not to do this.
>> >  - (its tempting to try, tho)
>> >  - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
>>
>> If squeaksource/mc used ETags, then the squeaksource image could simply
>> return 304 and let nginx serve the cached mczs while keeping the
>> statistics updated.
>
> Tim's email was about SqueakMap, not SqueakSource.  SqueakSource
That part of the thread changed direction. It happens sometimes.

> serves the mcz's straight off the hard-drive platter.  We don't need
> to trade away download statistics to save a few ms on a mcz request.

Download statistics would stay the same despite being flawed (e.g.
you'll download everything multiple times even if those files are sitting
in your package cache).
You would save seconds, not milliseconds by not downloading files again.

>
>> That would also let us save bandwidth by not downloading files already
>> sitting in the client's package cache.
>
> How so?  Isn't the package-cache checked before hitting the server at
> all?  It certainly should be.

No, it's not. Currently that's not possible, because different files can
have the same name. And currently we have no way to tell them apart.

Levente

>
> Best,
>  Chris
>
>
>> We could also use nginx to serve files instead of the image, but then the
>> image would have to know that it's sitting behind nginx.
>>
>> > - Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FastCGI, for example, reduces that, and it is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have an implementation in Squeak.
>>
>> I'm 99% sure http overhead is negligible.
>>
>> Levente
>>
>> >
>> >> If I have any reasonable grasp on this, then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we currently have) and retry the connection. Any other error, or a timeout at *our* end, would still be best handled as an error.
>> >
>> > All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that.
>> > I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's OK to pass that through.
>> >
>> >
>> >>
>> >> Except of course a 418 which has well defined error handling...
>> >>
>> >
>> > At least not 451…
>> >
>> > Best regards
>> >       -Tobias
>> >
>> >> tim
>> >> --
>> >> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> >> You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.
>> >>
>> >>
>> >>


Re: Server timeouts and 504 return codes

Chris Muller-4
Hi Levente,

> > Yes, the SqueakMap server image is one part of the dynamic, but I
> > think another is a bug in the trunk image.  I think the reason Tim is
> > not seeing 45 seconds before error is because the timeout setting of
> > the high-up client is not being passed all the way down to the
> > lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
> > SocketStream --> Socket.  By the time it gets down to Socket which
> > does the actual work, it's operating on its own 30 second timeout.
>
> I would expect subsecond response times. 30 seconds is just unacceptably
> long.

Well, it depends on whether, for example, you're in the middle of
Antarctica with a slow internet connection or in an office with a fast
connection.  A 30-second timeout is just the maximum amount of time
the client will wait for the entire process before presenting a
debugger; that's all it can do.

> > It is a fixed amount of time, I *think* still between 30 and 45
> > seconds, that it takes the SqueakMap server to save its model after an
> > update (e.g., adding a Release, etc.).  It's so long because the
> > server is running on a very old 3.x image, interpreter VM.  It's
> > running a HttpView2 app which doesn't even compile in modern Squeak.
> > That's why it hasn't been brought forward yet, but I am working on a
> > new API service to replace it with the eventual goal of SqueakMap
> > being an "App Store" experience, and it will not suffer timeouts.
> >
> >> > but also:
> >> > - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
> >> >  - we could make alan not even ask ted when we know the answer already.
> >> >  - Attention: we need a lot of information on what is stable and what not to do this.
> >> >  - (its tempting to try, tho)
> >> >  - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
> >>
> >> If squeaksource/mc used ETags, then the squeaksource image could simply
> >> return 304 and let nginx serve the cached mczs while keeping the
> >> statistics updated.
> >
> > Tim's email was about SqueakMap, not SqueakSource.  SqueakSource
>
> That part of the thread changed direction. It happens sometimes.
>
> > serves the mcz's straight off the hard-drive platter.  We don't need
> > to trade away download statistics to save a few ms on a mcz request.
>
> Download statistics would stay the same despite being flawed (e.g.
> you'll download everything multiple times even if those files are sitting
> in your package cache).

Not if we fix the package-cache (more about this, below).

> You would save seconds, not milliseconds by not downloading files again.

IIUC, you're saying we would save one hop in the "download" --
instead of client <--> alan <--> andreas, it would just be client <-->
alan.  Is that right?

I don't know what the speed between alan <---> andreas is, but I doubt
it's much slower than client <---> alan in most cases, so the savings
would seem to be minimal..?

> >> That would also let us save bandwidth by not downloading files already
> >> sitting in the client's package cache.
> >
> > How so?  Isn't the package-cache checked before hitting the server at
> > all?  It certainly should be.
>
> No, it's not. Currently that's not possible, because different files can
> have the same name. And currently we have no way to tell them apart.

No.  No two MCZ's may have the same name, certainly not within the
same repository, because MCRepository cannot support that.  So maybe
we need project subdirectories under package-cache to properly
simulate each cached Repository.  I had no idea we were neutering 90%
of the benefits of our package-cache because of this, and just
sitting here, I can't help wondering whether this is why MCProxy doesn't
work properly either!

The primary purpose of a cache is to *check it first* to speed up
access to something, right?  What you say about package-cache sounds
really bad; we should fix that, not surrender to it.

 - Chris


Re: Server timeouts and 504 return codes

Chris Muller-3
> > >> That would also let us save bandwidth by not downloading files already
> > >> sitting in the client's package cache.
> > >
> > > How so?  Isn't the package-cache checked before hitting the server at
> > > all?  It certainly should be.
> >
> > No, it's not. Currently that's not possible, because different files can
> > have the same name. And currently we have no way to tell them apart.

Even still, we could check the package-cache first, open up the one
with that name, and see if it's the correct UUID...

> No.  No two MCZ's may have the same name, certainly not within the
> same repository, because MCRepository cannot support that.  So maybe
> we need project subdirectories under package-cache to properly
> simulate each cached Repository.  I had no idea we were neutering 90%
> of the benefits of our package-cache because of this too, and just
> sitting here, I can't help wonder whether this is why MCProxy doesn't
> work properly either!
>
> The primary purpose of a cache is to *check it first* to speed up
> access to something, right?  What you say about package-cache sounds
> really bad we should fix that, not surrender to it.
>
>  - Chris
>


Re: Server timeouts and 504 return codes

Levente Uzonyi
In reply to this post by Chris Muller-4
On Sun, 27 Jan 2019, Chris Muller wrote:

> Hi Levente,
>
>>> Yes, the SqueakMap server image is one part of the dynamic, but I
>>> think another is a bug in the trunk image.  I think the reason Tim is
>>> not seeing 45 seconds before error is because the timeout setting of
>>> the high-up client is not being passed all the way down to the
>>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
>>> SocketStream --> Socket.  By the time it gets down to Socket which
>>> does the actual work, it's operating on its own 30 second timeout.
>>
>> I would expect subsecond response times. 30 seconds is just unacceptably
>> long.
>
> Well, it depends on if, for example, you're in the middle of
> Antarctica with a slow internet connection in an office with a fast
> connection.  A 30 second timeout is just the maximum amount of time
> the client will wait for the entire process before presenting a
> debugger, that's all it can do.
We can be sure that Tim should get subsecond response times instead of
timeouts after 30 seconds.

>
>>> It is a fixed amount of time, I *think* still between 30 and 45
>>> seconds, that it takes the SqueakMap server to save its model after an
>>> update (e.g., adding a Release, etc.).  It's so long because the
>>> server is running on a very old 3.x image, interpreter VM.  It's
>>> running a HttpView2 app which doesn't even compile in modern Squeak.
>>> That's why it hasn't been brought forward yet, but I am working on a
>>> new API service to replace it with the eventual goal of SqueakMap
>>> being an "App Store" experience, and it will not suffer timeouts.
>>>
>>>>> but also:
>>>>> - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
>>>>>  - we could make alan not even ask ted when we know the answer already.
>>>>>  - Attention: we need a lot of information on what is stable and what not to do this.
>>>>>  - (its tempting to try, tho)
>>>>>  - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
>>>>
>>>> If squeaksource/mc used ETags, then the squeaksource image could simply
>>>> return 304 and let nginx serve the cached mczs while keeping the
>>>> statistics updated.
>>>
>>> Tim's email was about SqueakMap, not SqueakSource.  SqueakSource
>>
>> That part of the thread changed direction. It happens sometimes.
>>
>>> serves the mcz's straight off the hard-drive platter.  We don't need
>>> to trade away download statistics to save a few ms on a mcz request.
>>
>> Download statistics would stay the same despite being flawed (e.g.
>> you'll download everything multiple times even if those files are sitting
>> in your package cache).
>
> Not if we fix the package-cache (more about this, below).
>
>> You would save seconds, not milliseconds by not downloading files again.
>
>>> IIUC, you're saying we would save one hop in the "download" --
> instead of client <--> alan <--> andreas, it would just be client <-->
> alan.  Is that right?
No. If the client doesn't have the mcz in the package cache but nginx has
it in its cache, then we save the transfer of data between alan and
andreas. The file doesn't have to be read from the disk either.
If the client does have the mcz, then we save the complete file transfer.

>
> I don't know what the speed between alan <---> andreas is, but I doubt
> it's much slower than client <---> alan in most cases, so the savings
> would seem to be minimal..?

The image wouldn't have to open a file, read its content from the disk, and
send it through a socket. Nginx does that sort of thing magnitudes faster
than Squeak.

>
>>>> That would also let us save bandwidth by not downloading files already
>>>> sitting in the client's package cache.
>>>
>>> How so?  Isn't the package-cache checked before hitting the server at
>>> all?  It certainly should be.
>>
>> No, it's not. Currently that's not possible, because different files can
>> have the same name. And currently we have no way to tell them apart.
>
> No.  No two MCZ's may have the same name, certainly not within the
> same repository, because MCRepository cannot support that.  So maybe
Not at the same time, but it's possible, and it just happened recently
with Chronology-ul.21.
It is perfectly possible that a client has a version in its package cache
with the same name as a different version on the server.

> we need project subdirectories under package-cache to properly
> simulate each cached Repository.  I had no idea we were neutering 90%
> of the benefits of our package-cache because of this too, and just
> sitting here, I can't help wonder whether this is why MCProxy doesn't
> work properly either!
>
> The primary purpose of a cache is to *check it first* to speed up
> access to something, right?  What you say about package-cache sounds

I don't know. It wasn't me who designed it. :)

> really bad we should fix that, not surrender to it.

Yes, that should be fixed, but it needs changes on the server side.
What I always had in mind was to extend the repository listing with
hashes/uuids so that the client could figure out if it needs to download a
specific version. But care must be taken not to break the code for
non-ss repositories (e.g. simple directory listings).

Levente

>
> - Chris
>


Re: Server timeouts and 504 return codes

Levente Uzonyi
In reply to this post by Chris Muller-3
On Sun, 27 Jan 2019, Chris Muller wrote:

>>>>> That would also let us save bandwidth by not downloading files already
>>>>> sitting in the client's package cache.
>>>>
>>>> How so?  Isn't the package-cache checked before hitting the server at
>>>> all?  It certainly should be.
>>>
>>> No, it's not. Currently that's not possible, because different files can
>>> have the same name. And currently we have no way to tell them apart.
>
> Even still, we could check the package-cache first, open up the one
> with that name, and see if it's the correct UUID...

UUIDs may work, but hashes have the advantage that the tools don't have to
know about the internals of the packages.
Also, I think mcds and mcms don't have UUIDs, but hashes would work with
those too.
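
A sketch of what the client-side check could look like, assuming the repository listing were extended to advertise a hash per version (no such listing exists yet; SecureHashAlgorithm is Squeak's SHA-1):

| file bytes localHash |
"advertisedHash below stands for a value from the (hypothetical) extended listing"
file := FileStream readOnlyFileNamed: 'package-cache/Chronology-ul.21.mcz'.
file binary.
bytes := file contentsOfEntireFile.
localHash := (SecureHashAlgorithm new hashMessage: bytes) printString: 16.
localHash = advertisedHash
	ifTrue: ["identical version already cached; skip the download"]
	ifFalse: ["name collision with a different version; fetch from the server"]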

Levente

>
>> No.  No two MCZ's may have the same name, certainly not within the
>> same repository, because MCRepository cannot support that.  So maybe
>> we need project subdirectories under package-cache to properly
>> simulate each cached Repository.  I had no idea we were neutering 90%
>> of the benefits of our package-cache because of this too, and just
>> sitting here, I can't help wonder whether this is why MCProxy doesn't
>> work properly either!
>>
>> The primary purpose of a cache is to *check it first* to speed up
>> access to something, right?  What you say about package-cache sounds
>> really bad we should fix that, not surrender to it.
>>
>>  - Chris
>>
>


Re: Server timeouts and 504 return codes

Chris Muller-4
In reply to this post by Levente Uzonyi
Hi,

> >>> Yes, the SqueakMap server image is one part of the dynamic, but I
> >>> think another is a bug in the trunk image.  I think the reason Tim is
> >>> not seeing 45 seconds before error is because the timeout setting of
> >>> the high-up client is not being passed all the way down to the
> >>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
> >>> SocketStream --> Socket.  By the time it gets down to Socket which
> >>> does the actual work, it's operating on its own 30 second timeout.
> >>
> >> I would expect subsecond response times. 30 seconds is just unacceptably
> >> long.
> >
> > Well, it depends on if, for example, you're in the middle of
> > Antarctica with a slow internet connection in an office with a fast
> > connection.  A 30 second timeout is just the maximum amount of time
> > the client will wait for the entire process before presenting a
> > debugger, that's all it can do.
>
> We can be sure that Tim should get subsecond response times instead of
> timeouts after 30 seconds.

Right, but timeout settings are a necessary tool sometimes; my point
was that we should fix the client code in trunk to make timeouts work
properly.

Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to
map.squeak.org and click around and see.  For the remaining 1% that
aren't, the issue is known and we're working on a new server to fix
that.

> >>> It is a fixed amount of time, I *think* still between 30 and 45
> >>> seconds, that it takes the SqueakMap server to save its model after an

and so if, in the meantime, it can simply be made to wait 45s instead of
30s, then the current SqueakMap will only be that occasional delay at
worst, instead of the annoying debugger we currently get.

> >> You would save seconds, not milliseconds by not downloading files again.
> >
> > IIUC, you're saying we would save one hop in the "download" --
> > instead of client <--> alan <--> andreas, it would just be client <-->
> > alan.  Is that right?
>
> No. If the client doesn't have the mcz in the package cache but nginx has
> it in its cache, then we save the transfer of data between alan and
> andreas.

Are alan and andreas co-located?

> The file doesn't have to be read from the disk either.

I assume you mean "read from disk" on alan?  What about after it's
cached so many mcz's in RAM that it's paging out to a swap file?  To me,
wasting precious RAM (of any server) to cache old MCZ file contents
that no one will ever download (because they become old very quickly)
feels wasteful.  Dragster cars are wasteful too, but yes, they are
"faster"... on a dragstrip.  :)  I guess there'd have to be some kind
of application-specific smart management of the cache...

Levente, what about the trunk directory listing, can it cache that?
That is the _#1 thing_ source.squeak.org is accessing and sending back
over, and over, and over again -- every time you see that MC progress
box that says, "Updating [repository name]".

> If the client does have the mcz, then we save the complete file transfer.
>
> >
> > I don't know what the speed between alan <---> andreas is, but I doubt
> > it's much slower than client <---> alan in most cases, so the savings
> > would seem to be minimal..?
>
> The image wouldn't have to open a file, read its content from the disk and
> send that through a socket.

By "the image" I assume you mean the SqueakSource server image.  But
opening the file takes very little time.  Original web-sites were
.html files, remember how fast those were?  Plus, filesystems "cache"
file contents into their own internal caches anyway...

Yes, it still has to return back through alan, but I assume alan does
not wait for a "full download" from andreas before it's
already piping back to the Squeak client.  If true, then it seems
like it only amounts to saving one hop, which would hardly be
noticeable over what we have now.

> Nginx does that thing magnitudes faster than
> Squeak.

The UX would not be magnitudes faster though, right?

> >>>> That would also let us save bandwidth by not downloading files already
> >>>> sitting in the client's package cache.
> >>>
> >>> How so?  Isn't the package-cache checked before hitting the server at
> >>> all?  It certainly should be.
> >>
> >> No, it's not. Currently that's not possible, because different files can
> >> have the same name. And currently we have no way to tell them apart.
> >
> > No.  No two MCZ's may have the same name, certainly not within the
> > same repository, because MCRepository cannot support that.  So maybe
>
> Not at the same time, but it's possible, and it just happened recently
> with Chronology-ul.21.
> It is perfectly possible that a client has a version in its package cache
> with the same name as a different version on the server.

But we don't want to restrict what's possible in our software design
because of that.  That situation is already a headache anyway.  The same
name can theoretically come only from the same person (if we ensure
unique initials), and so this is avoidable / fixable by resaving one of
them under a different name...

> > we need project subdirectories under package-cache to properly
> > simulate each cached Repository.  I had no idea we were neutering 90%
> > of the benefits of our package-cache because of this too, and just
> > sitting here, I can't help wonder whether this is why MCProxy doesn't
> > work properly either!
> >
> > The primary purpose of a cache is to *check it first* to speed up
> > access to something, right?  What you say about package-cache sounds
>
> I don't know. It wasn't me who designed it. :)

I meant ANY "cache".

   https://en.wikipedia.org/wiki/Cache_(computing)

For Monticello, the package-cache's other use-case is when an
authentication issue occurs while trying to save to an HTTP repository.
At that point the Version object with the new ancestry has already been
constructed in memory, so rather than worry about trying to "undo" all
that, it was simpler and better to save it to the package-cache, persisting
it safely so the client can simply move forward from there (get access
to the HTTP repository and copy it, or whatever).

 - Chris

> > really bad we should fix that, not surrender to it.
>
> Yes, that should be fixed, but it needs changes on the server side.
> What I always had in mind was to extend the repository listing with
> hashes/uuids so that the client could figure out if it needs to download a
> specific version. But care must be taken not to break the code for
> non-ss repositories (e.g. simple directory listings).
>
> Levente
>
> >
> > - Chris
> >


Re: Server timeouts and 504 return codes

timrowledge
I'm really pleased some competent people are thinking about this; it means I can stop worrying about something outside my main thrust!

Generally I prefer things to time out very quickly if they are going to time out at all - I was startled to see that the default timeout appears to be 45 seconds. This is especially the case if the thing potentially timing out is blocking any other actions I might want to be getting on with; it used to be a *real* annoyance with some RISC OS applications blocking the entire OS through poor design. Some better user feedback about progress would help in a lot of cases. After all, if you have some indication that stuff is actually being done for you, it is less annoying. It's a pity there isn't a class of HTTP 'error' message that says "I'm working on it, busy right now, check again in X seconds" or "we're sorry, all our sockets are busy. Please stay online and we'll get to you soon" etc.

I am interested in what error responses we might sensibly handle, and how. Some examples that document helpful behaviour would be nice to add, so that future authors have some guidance in doing smart things.
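
(HTTP actually has something close to the above: a 503 with a Retry-After header. A client-side sketch, assuming WebClient's response API and that the server sends the seconds form of Retry-After:)

| url response wait |
url := 'http://map.squeak.org/sm/...'.
response := WebClient httpGet: url.
[response code = 503] whileTrue: [
	"Retry-After may also be an HTTP date; this sketch handles only the seconds form"
	wait := (response headerAt: 'Retry-After' ifAbsent: ['5']) asNumber.
	(Delay forSeconds: wait) wait.
	response := WebClient httpGet: url].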

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Sweet sorrow




Re: Server timeouts and 504 return codes

Levente Uzonyi
In reply to this post by Chris Muller-4
On Sun, 27 Jan 2019, Chris Muller wrote:

> Hi,
>
>>>>> Yes, the SqueakMap server image is one part of the dynamic, but I
>>>>> think another is a bug in the trunk image.  I think the reason Tim is
>>>>> not seeing 45 seconds before error is because the timeout setting of
>>>>> the high-up client is not being passed all the way down to the
>>>>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
>>>>> SocketStream --> Socket.  By the time it gets down to Socket which
>>>>> does the actual work, it's operating on its own 30 second timeout.
>>>>
>>>> I would expect subsecond response times. 30 seconds is just unacceptably
>>>> long.
>>>
>>> Well, it depends on if, for example, you're in the middle of
>>> Antarctica with a slow internet connection in an office with a fast
>>> connection.  A 30 second timeout is just the maximum amount of time
>>> the client will wait for the entire process before presenting a
>>> debugger, that's all it can do.
>>
>> We can be sure that Tim should get subsecond response times instead of
>> timeouts after 30 seconds.
>
> Right, but timeout settings are a necessary tool sometimes, my point
> was that we should fix client code in trunk to make timeouts work
> properly.
>
> Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to
> map.squeak.org and click around and see.  For the remaining 1% that
> aren't, the issue is known and we're working on a new server to fix
> that.

Great! That was my point: the image needs to be fixed.

>
>>>>> It is a fixed amount of time, I *think* still between 30 and 45
>>>>> seconds, that it takes the SqueakMap server to save its model after an
>
> and so if in the meantime it can simply be made to wait 45s instead of
> 30s, then current SqueakMap will only be that occasional delay at
> worst, instead of the annoying debugger we currently get.

I don't see why that would make a difference: the user would get a
debugger anyway, but only 15 seconds later.

>
>>>> You would save seconds, not milliseconds by not downloading files again.
>>>
>>> IIUC, you're saying we would save one hop in the "download" --
>>> instead of client <--> alan <--> andreas, it would just be client <-->
>>> alan.  Is that right?
>>
>> No. If the client doesn't have the mcz in the package cache but nginx has
>> it in its cache, then we save the transfer of data between alan and
>> andreas.
>
> Are alan and andreas co-located?

They are cloud servers in the same data center.

>
>> The file doesn't have to be read from the disk either.
>
> I assume you mean "read from disk" on alan?  What about after it's
> cached so many mcz's in RAM that it's paging out to a swap file?  To me,
> wasting precious RAM (of any server) to cache old MCZ file contents
> that no one will ever download (because they become old very quickly)
> feels wasteful.  Dragster cars are wasteful too, but yes, they are
> "faster"... on a dragstrip.  :)  I guess there'd have to be some kind
> of application-specific smart management of the cache...

Nginx's proxy_cache can handle all of that automatically. Also, we don't
need a large cache; a small, memory-only cache would do.

>
> Levente, what about the trunk directory listing, can it cache that?

Sure.
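
Roughly, on top of the cache zone sketched earlier (values assumed):

location = /trunk/ {
    proxy_cache squeak;
    proxy_cache_valid 200 30s;    # short TTL, so fresh commits show up quickly
    proxy_pass http://andreas.box.squeak.org;
}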

> That is the _#1 thing_ source.squeak.org is accessing and sending back
> over, and over, and over again -- every time that MC progress box that
> says, "Updating [repository name]".

Right, unless you update an older image.

>
>> If the client does have the mcz, then we save the complete file transfer.
>>
>>>
>>> I don't know what the speed between alan <---> andreas is, but I doubt
>>> it's much slower than client <---> alan in most cases, so the savings
>>> would seem to be minimal..?
>>
>> The image wouldn't have to open a file, read its content from the disk and
>> send that through a socket.
>
> By "the image" I assume you mean the SqueakSource server image.  But
> opening the file takes very little time.  Original web-sites were
> .html files, remember how fast those were?  Plus, filesystems "cache"
> file contents into their own internal caches anyway...

Each file uses one external semaphore; each socket uses three. In a
default image, there can be no more than 256 external semaphores, which
is ridiculous for a server, and it'll just grind to a halt when some load
arrives. Every time the external semaphore table is full, a GC is
triggered to try to clear it up via the finalization process.
Reading a file into memory is slow, and writing it to a socket is slow
(compared to nginx, which uses sendfile to let the kernel handle that).
And Squeak can only use a single process to handle everything.
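
(For comparison, letting nginx serve those files directly is just - a sketch, path assumed:)

location /mc/ {
    root /srv/squeaksource/files;
    sendfile on;    # the kernel copies file -> socket, no user-space buffering
}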

>
> Yes, it still has to return back through alan but I assume alan does
> not wait for a "full download" received from andreas before its
> already pipeing back to the Squeak client.  If true, then it seems
> like it only amounts to saving one hop, which would hardly be
> noticeable over what we have now.

The goal of caching here is not so much to save a hop as to avoid handling
files in Squeak at all.

>
>> Nginx does that thing magnitudes faster than
>> Squeak.
>
> The UX would not be magnitudes faster though, right?

Directly, by letting nginx serve the file: no. But the server image would
be less likely to get stalled (return 5xx responses).
And the caching scheme I described in this thread would make the UX a lot
quicker too, because data would not have to be transferred when you
already have it.

>
>>>>>> That would also let us save bandwidth by not downloading files already
>>>>>> sitting in the client's package cache.
>>>>>
>>>>> How so?  Isn't the package-cache checked before hitting the server at
>>>>> all?  It certainly should be.
>>>>
>>>> No, it's not. Currently that's not possible, because different files can
>>>> have the same name. And currently we have no way to tell them apart.
>>>
>>> No.  No two MCZ's may have the same name, certainly not within the
>>> same repository, because MCRepository cannot support that.  So maybe
>>
>> Not at the same time, but it's possible, and it just happened recently
>> with Chronology-ul.21.
>> It is perfectly possible that a client has a version in its package cache
>> with the same name as a different version on the server.
>
> But we don't want to restrict what's possible in our software design
> because of that.  That situation is already a headache anyway.  Same
> name theoretically can come only from the same person (if we ensure
> unique initials) and so this is avoidable / fixable by resaving one of
> them under a different name...

It wasn't me who created the duplicate. If your suggestion had been in
place, some images out there, including mine, would have been broken by
the update process.

>
>>> we need project subdirectories under package-cache to properly
>>> simulate each cached Repository.  I had no idea we were neutering 90%
>>> of the benefits of our package-cache because of this too, and just
>>> sitting here, I can't help wonder whether this is why MCProxy doesn't
>>> work properly either!
>>>
>>> The primary purpose of a cache is to *check it first* to speed up
>>> access to something, right?  What you say about package-cache sounds
>>
>> I don't know. It wasn't me who designed it. :)
>
> I meant ANY "cache".
>
>   https://en.wikipedia.org/wiki/Cache_(computing)

It still depends on the purpose of the cache. It's possible that
package-cache is just a misnomer or it was just a plan to use it as a
cache which hasn't happened yet.

>
> For Monticello, package-cache's other use-case is when an
> authentication issue occurs when trying to save to an HTTP repository.
> At that point the Version object with the new ancestry was already
> constructed in memory, so rather than worry about trying to "undo" all
> that, it was simpler and better to save it to a package-cache, persist
> it safely so the client can simply move forward from there (get access
> to the HTTP repository and copy it or whatever).

The package-cache is also handy as a default repository and as offline
storage.

Levente

>
> - Chris
>
>>> really bad we should fix that, not surrender to it.
>>
>> Yes, that should be fixed, but it needs changes on the server side.
>> What I always had in mind was to extend the repository listing with
>> hashes/uuids so that the client could figure out if it needs to download a
>> specific version. But care must be taken not to break the code for
>> non-ss repositories (e.g. simple directory listings).
>>
>> Levente
>>
>>>
>>> - Chris
>>>
>


Re: Server timeouts and 504 return codes

Eliot Miranda-2
Hi Levente,

> On Jan 27, 2019, at 5:40 PM, Levente Uzonyi <[hidden email]> wrote:
>
>> On Sun, 27 Jan 2019, Chris Muller wrote:
>>
>> Hi,
>>
>>>>>> Yes, the SqueakMap server image is one part of the dynamic, but I
>>>>>> think another is a bug in the trunk image.  I think the reason Tim is
>>>>>> not seeing 45 seconds before error is because the timeout setting of
>>>>>> the high-up client is not being passed all the way down to the
>>>>>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
>>>>>> SocketStream --> Socket.  By the time it gets down to Socket which
>>>>>> does the actual work, it's operating on its own 30 second timeout.
>>>>>
>>>>> I would expect subsecond response times. 30 seconds is just unacceptably
>>>>> long.
>>>>
>>>> Well, it depends on if, for example, you're in the middle of
>>>> Antarctica with a slow internet connection in an office with a fast
>>>> connection.  A 30 second timeout is just the maximum amount of time
>>>> the client will wait for the entire process before presenting a
>>>> debugger, that's all it can do.
>>>
>>> We can be sure that Tim should get subsecond response times instead of
>>> timeouts after 30 seconds.
>>
>> Right, but timeout settings are a necessary tool sometimes, my point
>> was that we should fix client code in trunk to make timeouts work
>> properly.
>>
>> Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to
>> map.squeak.org and click around and see.  For the remaining 1% that
>> aren't, the issue is known and we're working on a new server to fix
>> that.
>
> Great! That was my point: the image needs to be fixed.
>
>>
>>>>>> It is a fixed amount of time, I *think* still between 30 and 45
>>>>>> seconds, that it takes the SqueakMap server to save its model after an
>>
>> and so if in the meantime it can simply be made to wait 45s instead of
>> 30s, then current SqueakMap will only be that occasional delay at
>> worst, instead of the annoying debugger we currently get.
>
> I don't see why that would make a difference: the user would get a debugger anyway, but only 15 seconds later.
>
>>
>>>>> You would save seconds, not milliseconds by not downloading files again.
>>>>
>>>> IIUC, you're saying we would save one hop in the "download" --
>>>> instead of client <--> alan <--> andreas, it would just be client <-->
>>>> alan.  Is that right?
>>>
>>> No. If the client doesn't have the mcz in the package cache but nginx has
>>> it in its cache, then we save the transfer of data between alan and
>>> andreas.
>>
>> Are alan and andreas co-located?
>
> They are cloud servers in the same data center.
>
>>
>>> The file doesn't have to be read from the disk either.
>>
>> I assume you mean "read from disk" on alan?  What about after it's
>> cached so many mcz's in RAM that it's paging out to swap file?  To me,
>> wasting precious RAM (of any server) to cache old MCZ file contents
>> that no one will ever download (because they become old very quickly)
>> feels wasteful.  Dragster cars are wasteful too, but yes, they are
>> "faster"... on a dragstrip.  :)  I guess there'd have to be some kind
>> of application-specific smart management of the cache...
>
> Nginx's proxy_cache can handle that all automatically. Also, we don't need a large cache. A small, memory-only cache would do it.
>
>>
>> Levente, what about the trunk directory listing, can it cache that?
>
> Sure.
>
>> That is the _#1 thing_ source.squeak.org is accessing and sending back
>> over, and over, and over again -- every time the MC progress box
>> says, "Updating [repository name]".
>
> Right, unless you update an older image.
>
>>
>>> If the client does have the mcz, then we save the complete file transfer.
>>>
>>>>
>>>> I don't know what the speed between alan <---> andreas is, but I doubt
>>>> it's much slower than client <---> alan in most cases, so the savings
>>>> would seem to be minimal..?
>>>
>>> The image wouldn't have to open a file, read its content from the disk and
>>> send that through a socket.
>>
>> By "the image" I assume you mean the SqueakSource server image.  But
>> opening the file takes very little time.  Original web-sites were
>> .html files, remember how fast those were?  Plus, filesystems "cache"
>> file contents into their own internal caches anyway...
>
> Each file uses one external semaphore, each socket uses three. If you use a default image, there can be no more than 256 external semaphores, which is ridiculous for a server, and it'll just grind to a halt when some load arrives. Every time the external semaphore table is full, a GC is triggered to try to clear it up via the finalization process.
> Reading a file into memory is slow, writing it to a socket is slow.
> (Compared to nginx which uses sendfile to let the kernel handle that).
> And Squeak can only use a single process to handle everything.

That’s configurable.  Alas, because writing lock-free table growth is not easy, the external semaphore table doesn’t grow automatically.  But the VM does allow its size to be specified in a value cached in the image header and read at startup (IIRC).  So we could easily have a 4K-entry external semaphore table.
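
A minimal sketch of doing that from the image, assuming the SmalltalkImage
accessors wrapping VM parameter 49 (the external semaphore table size) are
present in your image and VM; check yours before relying on it:

    "Report the current limit, then ask the VM for a larger table; the
     new size is kept in the image header when the image is saved."
    Transcript show: 'max external semaphores: ',
        Smalltalk maxExternalSemaphores printString; cr.
    Smalltalk maxExternalSemaphores: 4096.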

>
>>
>> Yes, it still has to return back through alan but I assume alan does
> not wait for a "full download" received from andreas before it's
> already piping back to the Squeak client.  If true, then it seems
>> like it only amounts to saving one hop, which would hardly be
>> noticeable over what we have now.
>
> The goal of caching is not about saving a hop, but to avoid handling files in Squeak.
>
>>
>>> Nginx does that thing magnitudes faster than
>>> Squeak.
>>
>> The UX would not be magnitudes faster though, right?
>
> Directly, by letting nginx serve the file, no; but the server image would
> But the caching scheme I described in this thread would make the UX a lot quicker too, because data would not have to be transferred when you already have it.
>
>>
>>>>>>> That would also let us save bandwidth by not downloading files already
>>>>>>> sitting in the client's package cache.
>>>>>>
>>>>>> How so?  Isn't the package-cache checked before hitting the server at
>>>>>> all?  It certainly should be.
>>>>>
>>>>> No, it's not. Currently that's not possible, because different files can
>>>>> have the same name. And currently we have no way to tell them apart.
>>>>
>>>> No.  No two MCZ's may have the same name, certainly not within the
>>>> same repository, because MCRepository cannot support that.  So maybe
>>>
>>> Not at the same time, but it's possible, and it just happened recently
>>> with Chronology-ul.21.
>>> It is perfectly possible that a client has a version in its package cache
>>> with the same name as a different version on the server.
>>
>> But we don't want to restrict what's possible in our software design
>> because of that.  That situation is already a headache anyway.  Same
>> name theoretically can come only from the same person (if we ensure
>> unique initials) and so this is avoidable / fixable by resaving one of
> them under a different name...
>
> It wasn't me who created the duplicate. If your suggestion had been in place, some images out there, including mine, would have been broken by the update process.
>
>>
>>>> we need project subdirectories under package-cache to properly
>>>> simulate each cached Repository.  I had no idea we were neutering 90%
>>>> of the benefits of our package-cache because of this too, and just
>>>> sitting here, I can't help wonder whether this is why MCProxy doesn't
>>>> work properly either!
>>>>
>>>> The primary purpose of a cache is to *check it first* to speed up
>>>> access to something, right?  What you say about package-cache sounds
>>>
>>> I don't know. It wasn't me who designed it. :)
>>
>> I meant ANY "cache".
>>
>>  https://en.wikipedia.org/wiki/Cache_(computing)
>
> It still depends on the purpose of the cache. It's possible that package-cache is just a misnomer or it was just a plan to use it as a cache which hasn't happened yet.
>
>>
>> For Monticello, package-cache's other use-case is when an
>> authentication issue occurs when trying to save to an HTTP repository.
>> At that point the Version object with the new ancestry was already
>> constructed in memory, so rather than worry about trying to "undo" all
>> that, it was simpler and better to save it to a package-cache, persist
>> it safely so the client can simply move forward from there (get access
>> to the HTTP repository and copy it or whatever).
>
> The package-cache is also handy as a default repository and as offline storage.
>
> Levente
>
>>
>> - Chris
>>
>>>> really bad we should fix that, not surrender to it.
>>>
>>> Yes, that should be fixed, but it needs changes on the server side.
>>> What I always had in mind was to extend the repository listing with
>>> hashes/uuids so that the client could figure out if it needs to download a
>>> specific version. But care must be taken not to break the code for
>>> non-ss repositories (e.g. simple directory listings).
>>>
>>> Levente
>>>
>>>>
>>>> - Chris
>>>>
>>
>


Re: Server timeouts and 504 return codes

Squeak - Dev mailing list
I know it's mostly a cost and reconfiguration thing, but has there been any thought to maybe running multiple back-end servers, with the front end doing a round robin to distribute the load? I'm saying this without knowing what kind of load the server is experiencing, or whether there are log files that record the activity.
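
nginx supports that directly with an upstream group; a hypothetical sketch
(the port and the second host are invented, and the application images
would first have to be able to share state):

    # round-robin load balancing across two application servers
    upstream map_backend {
        server ted.box.squeak.org:8080;    # assumed port
        server ted2.box.squeak.org:8080;   # hypothetical second box
    }

    location / {
        proxy_pass http://map_backend;
    }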




Re: Server timeouts and 504 return codes

Chris Muller-4
In reply to this post by Levente Uzonyi
Whew!   :)

> >>>>> Yes, the SqueakMap server image is one part of the dynamic, but I
> >>>>> think another is a bug in the trunk image.  I think the reason Tim is
> >>>>> not seeing 45 seconds before error is because the timeout setting of
> >>>>> the high-up client is not being passed all the way down to the
> >>>>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
> >>>>> SocketStream --> Socket.  By the time it gets down to Socket which
> >>>>> does the actual work, it's operating on its own 30 second timeout.
> >>>>
> >>>> I would expect subsecond response times. 30 seconds is just unacceptably
> >>>> long.
> >>>
> >>> Well, it depends on if, for example, you're in the middle of
> >>> Antarctica with a slow internet connection in an office with a fast
> >>> connection.  A 30 second timeout is just the maximum amount of time
> >>> the client will wait for the entire process before presenting a
> >>> debugger, that's all it can do.
> >>
> >> We can be sure that Tim should get subsecond response times instead of
> >> timeouts after 30 seconds.
> >
> > Right, but timeout settings are a necessary tool sometimes, my point
> > was that we should fix client code in trunk to make timeouts work
> > properly.
> >
> > Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to
> > map.squeak.org and click around and see.  For the remaining 1% that
> > aren't, the issue is known and we're working on a new server to fix
> > that.
>
> Great! That was my point: the image needs to be fixed.

But you're referring to the server image when you say "the image needs
to be fixed", which I've already conceded, whereas I'm referring to the
client image -- our trunk image -- which also needs the suspected
bug(s) in WebClient (et al.) fixed.

> >>>>> It is a fixed amount of time, I *think* still between 30 and 45
> >>>>> seconds, that it takes the SqueakMap server to save its model after an
> >
> > and so if in the meantime it can simply be made to wait 45s instead of
> > 30s, then current SqueakMap will only be that occasional delay at
> > worst, instead of the annoying debugger we currently get.
>
> I don't see why that would make a difference: the user would get a
> debugger anyway, but only 15 seconds later.

No!  :)  As I said:

> >>>>> It is a fixed amount of time, I *think* still between 30 and 45
> >>>>> seconds, that it takes the SqueakMap server to save its model

So they would get a response < 15s later, not a debugger.

The server needs the same amount of time to save every time it
happens -- it's very predictable -- and right now, to avoid a
debugger, the Squeak trunk image simply needs to be fixed to honor the
45s timeout instead of ignoring it and always defaulting to 30.
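
A hedged sketch of the contract being asked for; WebClient's timeout:
accessor is cited from memory, and the whole point is that the value must
survive the trip down through SocketStream to Socket:

    | client response |
    client := WebClient new.
    client timeout: 45.   "seconds; should reach the Socket layer, not be dropped"
    response := client httpGet: 'http://map.squeak.org/'.
    response code = 504 ifTrue:
        ["the gateway gave up upstream; a single retry is reasonable"
         response := client httpGet: 'http://map.squeak.org/'].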

> > Are alan and andreas co-located?
>
> They are cloud servers in the same data center.
>
> >
> >> The file doesn't have to be read from the disk either.
> >
> > I assume you mean "read from disk" on alan?  What about after it's
> > cached so many mcz's in RAM that it's paging out to swap file?  To me,
> > wasting precious RAM (of any server) to cache old MCZ file contents
> > that no one will ever download (because they become old very quickly)
> > feels wasteful.  Dragster cars are wasteful too, but yes, they are
> > "faster"... on a dragstrip.  :)  I guess there'd have to be some kind
> > of application-specific smart management of the cache...
>
> Nginx's proxy_cache can handle that all automatically. Also, we don't need
> a large cache. A small, memory-only cache would do it.

How "small" could it be and still contain all the MCZ's you want to
use to update an "old" image?

> > Levente, what about the trunk directory listing, can it cache that?
>
> Sure.
>
> > That is the _#1 thing_ source.squeak.org is accessing and sending back
> > over, and over, and over again -- every time that MC progress box that
> > says, "Updating [repository name]".
>
> Right, unless you update an older image.

System resources should not be allocated to optimizing "build" and
"initialize" use-cases.  Those UC's are one offs run by developers,
typically even in the background.

System resources should be optimized around actual **end-users
interacting with UIs**...

> >> If the client does have the mcz, then we save the complete file transfer.
> >>
> >>>
> >>> I don't know what the speed between alan <---> andreas is, but I doubt
> >>> it's much slower than client <---> alan in most cases, so the savings
> >>> would seem to be minimal..?
> >>
> >> The image wouldn't have to open a file, read its content from the disk and
> >> send that through a socket.
> >
> > By "the image" I assume you mean the SqueakSource server image.  But
> > opening the file takes very little time.  Original web-sites were
> > .html files, remember how fast those were?  Plus, filesystems "cache"
> > file contents into their own internal caches anyway...
>
> Each file uses one external semaphore, each socket uses three. If you use
> a default image, there can be no more than 256 external semaphores which
> is ridiculous for a server,

So, that is (256 / 4 = 64) concurrent requests for an MCZ before
it is full?   Probably enough for our small community, but you also
said that's just a default we can increase?  Something I'd like to
know for Magma too -- where can I find this setting?

> and it'll just grind to a halt when some load
> arrives. Every time the external semaphore table is full, a GC is
> triggered to try to clear it up via the finalization process.
> Reading a file into memory is slow, writing it to a socket is slow.
> (Compared to nginx which uses sendfile to let the kernel handle that).
> And Squeak can only use a single process to handle everything.

To me, it comes back to UX.  If we ever get enough load for that to be
an issue, it might be worth looking into.

> > Yes, it still has to return back through alan but I assume alan does
> > not wait for a "full download" received from andreas before it's
> > already piping back to the Squeak client.  If true, then it seems
> > like it only amounts to saving one hop, which would hardly be
> > noticeable over what we have now.
>
> The goal of caching is not about saving a hop, but to avoid handling files
> in Squeak.
>
> >
> >> Nginx does that thing magnitudes faster than
> >> Squeak.
> >
> > The UX would not be magnitudes faster though, right?
>
> Directly, by letting nginx serve the file, no; but the server image would
> be less likely to get stalled (return 5xx responses).

SqueakMap and SqueakSource.com are still old, with plans for upgrading,
but are you still getting 5xx's on source.squeak.org?

> But the caching scheme I described in this thread would make the UX a lot
> quicker too, because data would not have to be transferred when you
> already have it.

I assume you mean "data would not have to be transferred" from andreas
to alan... from within the same data center..!   :)

> >>>>>> That would also let us save bandwidth by not downloading files already
> >>>>>> sitting in the client's package cache.
> >>>>>
> >>>>> How so?  Isn't the package-cache checked before hitting the server at
> >>>>> all?  It certainly should be.
> >>>>
> >>>> No, it's not. Currently that's not possible, because different files can
> >>>> have the same name. And currently we have no way to tell them apart.
> >>>
> >>> No.  No two MCZ's may have the same name, certainly not within the
> >>> same repository, because MCRepository cannot support that.  So maybe
> >>
> >> Not at the same time, but it's possible, and it just happened recently
> >> with Chronology-ul.21.
> >> It is perfectly possible that a client has a version in its package cache
> >> with the same name as a different version on the server.
> >
> > But we don't want to restrict what's possible in our software design
> > because of that.  That situation is already a headache anyway.  Same
> > name theoretically can come only from the same person (if we ensure
> > unique initials) and so this is avoidable / fixable by resaving one of
> > them under a different name...
>
> It wasn't me who created the duplicate. If your suggestion had been in
> place, some images out there, including mine, would have been broken by
> the update process.

I don't think so, since I said it would open up the .mcz in
package-cache and verify the UUID.

I guess I don't know what you mean -- I see only one Chronology-ul.21
in the ancestry currently anyway...
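
Spelled out as a hypothetical Monticello-flavored sketch
(versionWithInfo:ifAbsent: is existing MC protocol as far as I recall;
serverAdvertisedId and downloadVersionWithInfo: are invented):

    "Before fetching Name-xx.N.mcz, check whether the package-cache
     already holds the *same* version, not merely one with the same name."
    | cached |
    cached := MCCacheRepository default
        versionWithInfo: anInfo
        ifAbsent: [nil].
    (cached notNil and: [cached info id = serverAdvertisedId])
        ifTrue: [^cached]   "identical UUID: skip the download entirely"
        ifFalse: [^self downloadVersionWithInfo: anInfo].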

> >>> we need project subdirectories under package-cache to properly
> >>> simulate each cached Repository.  I had no idea we were neutering 90%
> >>> of the benefits of our package-cache because of this too, and just
> >>> sitting here, I can't help wonder whether this is why MCProxy doesn't
> >>> work properly either!
> >>>
> >>> The primary purpose of a cache is to *check it first* to speed up
> >>> access to something, right?  What you say about package-cache sounds
> >>
> >> I don't know. It wasn't me who designed it. :)
> >
> > I meant ANY "cache".
> >
> >   https://en.wikipedia.org/wiki/Cache_(computing)
>
> It still depends on the purpose of the cache. It's possible that
> package-cache is just a misnomer or it was just a plan to use it as a
> cache which hasn't happened yet.
>
> >
> > For Monticello, package-cache's other use-case is when an
> > authentication issue occurs when trying to save to an HTTP repository.
> > At that point the Version object with the new ancestry was already
> > constructed in memory, so rather than worry about trying to "undo" all
> > that, it was simpler and better to save it to a package-cache, persist
> > it safely so the client can simply move forward from there (get access
> to the HTTP repository and copy it or whatever).
>
> The package-cache is also handy as a default repository and as offline
> storage.

I'm sure you would agree it's better for client images to check their
local package-cache first before hitting nginx.


 - Chris


Re: Server timeouts and 504 return codes

Tobias Pape
In reply to this post by Chris Muller-4

> On 27.01.2019, at 23:18, Chris Muller <[hidden email]> wrote:
>
> Hi Levente,
>
>>> Yes, the SqueakMap server image is one part of the dynamic, but I
>>> think another is a bug in the trunk image.  I think the reason Tim is
>>> not seeing 45 seconds before error is because the timeout setting of
>>> the high-up client is not being passed all the way down to the
>>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
>>> SocketStream --> Socket.  By the time it gets down to Socket which
>>> does the actual work, it's operating on its own 30 second timeout.
>>
>> I would expect subsecond response times. 30 seconds is just unacceptably
>> long.
>
> Well, it depends on if, for example, you're in the middle of
> Antarctica with a slow internet connection in an office with a fast
> connection.  A 30 second timeout is just the maximum amount of time
> the client will wait for the entire process before presenting a
> debugger, that's all it can do.
>
>>> It is a fixed amount of time, I *think* still between 30 and 45
>>> seconds, that it takes the SqueakMap server to save its model after an
>>> update (e.g., adding a Release, etc.).  It's so long because the
>>> server is running on a very old 3.x image, interpreter VM.  It's
>>> running a HttpView2 app which doesn't even compile in modern Squeak.
>>> That's why it hasn't been brought forward yet, but I am working on a
>>> new API service to replace it with the eventual goal of SqueakMap
>>> being an "App Store" experience, and it will not suffer timeouts.
>>>
>>>>> but also:
>>>>> - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
>>>>> - we could make alan not even ask ted when we know the answer already.
>>>>> - Attention: we need a lot of information on what is stable and what not to do this.
>>>>> - (its tempting to try, tho)
>>>>> - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
>>>>
>>>> If squeaksource/mc used ETags, then the squeaksource image could simply
>>>> return 304 and let nginx serve the cached mczs while keeping the
>>>> statistics updated.
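
A sketch of what that quoted suggestion could look like inside the server
image; MCZs are immutable, so the version UUID makes a natural strong ETag
(every helper named below is invented, not existing SqueakSource code):

    "Count the download, then answer 304 whenever the caller (nginx or
     the client) already holds the bytes for this immutable version."
    | etag |
    etag := '"' , versionId asString , '"'.
    self countDownloadFor: request.
    (request headerAt: 'If-None-Match') = etag
        ifTrue: [self send304For: request]
        ifFalse: [self sendMczBytesFor: request].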
>>>
>>> Tim's email was about SqueakMap, not SqueakSource.  SqueakSource
>>
>> That part of the thread changed direction. It happens sometimes.
>>
>>> serves the mcz's straight off the hard-drive platter.  We don't need
>>> to trade away download statistics to save a few ms on a mcz request.
>>
>> Download statistics would stay the same despite being flawed (e.g.
>> you'll download everything multiple times even if those files are sitting
>> in your package cache).
>
> Not if we fix the package-cache (more about this, below).
>
>> You would save seconds, not milliseconds by not downloading files again.
>
> IIUC, you're saying we would save one hop in the "download" --
> instead of client <--> alan <--> andreas, it would just be client <-->
> alan.  Is that right?

Yes.

>
> I don't know what the speed between alan <---> andreas is, but I doubt
> it's much slower than client <---> alan in most cases, so the savings
> would seem to be minimal..?

No. It is not about bandwidth. Nginx is much faster at serving files than
(a) squeak/seaside/squeaksource is, and
(b) there is no network/bookkeeping/request-handling overhead when nginx just
    serves files. And even if there is (e.g., with X-Accel-Redirect), nginx is
    just plain faster.

>
>>>> That would also let us save bandwidth by not downloading files already
>>>> sitting in the client's package cache.
>>>
>>> How so?  Isn't the package-cache checked before hitting the server at
>>> all?  It certainly should be.
>>
>> No, it's not. Currently that's not possible, because different files can
>> have the same name. And currently we have no way to tell them apart.
>
> No.  No two MCZ's may have the same name, certainly not within the
> same repository, because MCRepository cannot support that.  So maybe
> we need project subdirectories under package-cache to properly
> simulate each cached Repository.  I had no idea we were neutering 90%
> of the benefits of our package-cache because of this too, and just
> sitting here, I can't help wonder whether this is why MCProxy doesn't
> work properly either!

That would only be true if we never rewrote history or moved packages from the inbox anywhere else…

>
> The primary purpose of a cache is to *check it first* to speed up
> access to something, right?  What you say about package-cache sounds
> really bad we should fix that, not surrender to it.
>
> - Chris
>



Re: Server timeouts and 504 return codes

Tobias Pape
In reply to this post by Levente Uzonyi

> On 27.01.2019, at 21:48, Levente Uzonyi <[hidden email]> wrote:
>
> On Sun, 27 Jan 2019, Chris Muller wrote:
>
>> Hi guys,
>>
>>>>> A couple of weeks ago I had a problem loading something via SqueakMap that resulted in a 504 error. Chris M quite rightly pointed out that responding to a timeout with an immediate retry might not be the best thing (referencing some code I published to try to handle this problem); looking at the error more closely I finally noticed that a 504 is a *gateway* timeout rather than anything that seems likely to be a problem at the SM or MC repository server. Indeed the error came back much quicker than the 45 seconds timeout that we seem to have set for our http connections.
>>>>>
>>>>> I'm a long way from being an expert in the area of connecting to servers via gateways and what their timeouts might be etc. so excuse stupid-question syndrome - I know this isn't Quora where stupid-question is the order of the day.
>>>>> Am I right in thinking that a 504 error means that some *intermediate* server timed out according to some setting in its internal config ?
>>>>> Am I right in imagining that we can't normally affect that timeout?
>>>>>
>>>>
>>>> Well, we can.
>>>>
>>>> What happens here:
>>>>
>>>> - All our websites, including all HTTP services, such as the Map, arrive together at squeak.org, aka alan.box.squeak.org
>>>> That is an nginx server. And also the server who eventually spits out the 504.
>>>> - alan then sees we want a connection to the Map, and does a HTTP request to ted.box.squeak.org (=> alan is a _reverse proxy_)
>>>> and upon response gets us that back.
>>
>> Thanks for the great explanation!  I want to learn more about
>> admin'ing, so its great to have this in-context example of a
>> reverse-proxy, thanks for setting that up!
>>
>>>> - if ted fails to respond in 60s, alan gives a 504.
>>
>> 60s seems like a ideally balanced timeout setting -- the longest any
>> possible request should be expected to wait ... and yet clients can
>> still shorten to 45s or 30 if they want a shorter timeout.
>>
>>>> Simple as that. This limits the possibility that we wait too long (ie >60s) on ted.
>>>>
>>>> Elephant in the room: why not directly ted? the nginx on alan is configured as hardened as I thought best, and actually can handle a multitude of requests much better than our squeak-based "application servers". This distinction between reverse proxy and application server is btw quite standard and enables some things. For example:
>>>>
>>>> We can tune a lot of things on alan with regards to how it should handle things. The simplest being:
>>>>
>>>> - we can tune the timeout: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
>>>> that's where the 60s come from, and we could simply crank it up.
>>>> - HOWEVER: this could mean we eventually run into other timeouts, for example on the server or even in TCP or so.
>>>> - so increasing this just like that _may_ help or _may_ make the Map useless altogether, so please be careful y'all :)
>>>
>>> Tim reported shorter than 45s timeouts, so it is very likely an issue with
>>> the SqueakMap image.
>>
>> Yes, the SqueakMap server image is one part of the dynamic, but I
>> think another is a bug in the trunk image.  I think the reason Tim is
>> not seeing 45 seconds before error is because the timeout setting of
>> the high-up client is not being passed all the way down to the
>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
>> SocketStream --> Socket.  By the time it gets down to Socket which
>> does the actual work, it's operating on its own 30 second timeout.
>
> I would expect subsecond response times. 30 seconds is just unacceptably long.
>
>>
>> It is a fixed amount of time, I *think* still between 30 and 45
>> seconds, that it takes the SqueakMap server to save its model after an
>> update (e.g., adding a Release, etc.).  It's so long because the
>> server is running on a very old 3.x image, interpreter VM.  It's
>> running a HttpView2 app which doesn't even compile in modern Squeak.
>> That's why it hasn't been brought forward yet, but I am working on a
>> new API service to replace it with the eventual goal of SqueakMap
>> being an "App Store" experience, and it will not suffer timeouts.
>>
>>>> but also:
>>>> - we can cache: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
>>>> - we could make alan not even ask ted when we know the answer already.
>>>> - Attention: we need a lot of information on what is stable and what not to do this.
>>>> - (its tempting to try, tho)
>>>> - (we probably want that for squeaksource/source.squeak for the MCZ requests. but we lose the download statistics then…)
>>>
>>> If squeaksource/mc used ETags, then the squeaksource image could simply
>>> return 304 and let nginx serve the cached mczs while keeping the
>>> statistics updated.
>>
>> Tim's email was about SqueakMap, not SqueakSource.  SqueakSource
>
> That part of the thread changed direction. It happens sometimes.
>
>> serves the mcz's straight off the hard-drive platter.  We don't need
>> to trade away download statistics to save a few ms on a mcz request.
>
> Download statistics would stay the same despite being flawed (e.g. you'll download everything multiple times even if those files are sitting in your package cache).
> You would save seconds, not milliseconds by not downloading files again.

I think we could trivially make that happen by using X-Sendfile (Apache) or X-Accel-Redirect (nginx).
(https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/)

The image gets the request, but instead of searching for and serving the file, it answers with such a header and the reverse proxy takes care of the rest.
Problem here: the reverse proxy must have access to the files, which it currently does not.
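
For illustration, the nginx half of that arrangement might look like this
(paths invented; as noted, alan cannot actually reach these files today):

    # the image answers "X-Accel-Redirect: /protected/Kernel-xx.1234.mcz"
    # and nginx then streams the file itself via sendfile
    location /protected/ {
        internal;                          # unreachable from the outside
        alias /srv/squeaksource/files/;    # invented path to the MCZ store
    }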

>
>>
>>> That would also let us save bandwidth by not downloading files already
>>> sitting in the client's package cache.
>>
>> How so?  Isn't the package-cache checked before hitting the server at
>> all?  It certainly should be.
>
> No, it's not. Currently that's not possible, because different files can have the same name. And currently we have no way to tell them apart.
>
> Levente
>
>>
>> Best,
>> Chris
>>
>>
>>> We could also use nginx to serve files instead of the image, but then the
>>> image would have to know that it's sitting behind nginx.
>>>
>>>> - Note: a lot of time is probably spent by ted generating HTTP and by alan parsing HTTP. Using FastCGI, for example, reduces that, and is supported by both nginx (https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html) and GemStone, but I don't know whether we already have one in squeak.
>>>
>>> I'm 99% sure http overhead is negligible.
>>>
>>> Levente
>>>
>>>>
>>>>> If I have any reasonable grasp on this then we should probably detect the 504 (in part by explicitly using a WebClient and its error handling rather than the slightly wonky HTTPSocket facade we have currently) and retry the connection?  Any other error or a timeout at *our* end would still be best handled as an error.
>>>>
>>>> All 500-ish codes essentially say "the server is to blame" and the client can do nothing about that.
>>>> I don't think that 504 is meaningfully better handled than 503 or 502 in the WebClient. I think it's ok to pass that through.
>>>>
>>>>
>>>>>
>>>>> Except of course a 418 which has well defined error handling...
>>>>>
>>>>
>>>> At least not 451…
>>>>
>>>> Best regards
>>>>      -Tobias
>>>>
>>>>> tim
>>>>> --
>>>>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>>>>> You forgot to do your backup 16 days ago.  Tomorrow you'll need that version.
>>>>>
>>>>>
>>>>>



Re: Server timeouts and 504 return codes

Tobias Pape
In reply to this post by Chris Muller-4

> On 28.01.2019, at 01:39, Chris Muller <[hidden email]> wrote:
>
> Hi,
>
>>>>> Yes, the SqueakMap server image is one part of the dynamic, but I
>>>>> think another is a bug in the trunk image.  I think the reason Tim is
>>>>> not seeing 45 seconds before error is because the timeout setting of
>>>>> the high-up client is not being passed all the way down to the
>>>>> lowest-level layers -- e.g., from HTTPSocket --> WebClient -->
>>>>> SocketStream --> Socket.  By the time it gets down to Socket which
>>>>> does the actual work, it's operating on its own 30 second timeout.
>>>>
>>>> I would expect subsecond response times. 30 seconds is just unacceptably
>>>> long.
>>>
>>> Well, it depends on if, for example, you're in the middle of
>>> Antarctica with a slow internet connection in an office with a fast
>>> connection.  A 30 second timeout is just the maximum amount of time
>>> the client will wait for the entire process before presenting a
>>> debugger, that's all it can do.
>>
>> We can be sure that Tim should get subsecond response times instead of
>> timeouts after 30 seconds.
>
> Right, but timeout settings are a necessary tool sometimes, my point
> was that we should fix client code in trunk to make timeouts work
> properly.
>
> Incidentally, 99% of SqueakMap requests ARE subsecond -- just go to
> map.squeak.org and click around and see.  For the remaining 1% that
> aren't, the issue is known and we're working on a new server to fix
> that.
>
>>>>> It is a fixed amount of time, I *think* still between 30 and 45
>>>>> seconds, that it takes the SqueakMap server to save its model after an
>
> and so if in the meantime it can simply be made to wait 45s instead of
> 30s, then current SqueakMap will only be that occasional delay at
> worst, instead of the annoying debugger we currently get.
>
>>>> You would save seconds, not milliseconds by not downloading files again.
>>>
>>> IIUC, you're saying we would save one hop in the "download" --
>>> instead of client <--> alan <--> andreas, it would just be client <-->
>>> alan.  Is that right?
>>
>> No. If the client doesn't have the mcz in the package cache but nginx has
>> it in its cache, then we save the transfer of data between alan and
>> andreas.
>
> Are alan and andreas co-located?

They're VMs on Rackspace. The slowest bandwidth Rackspace offers is 200 MBit/s, the fastest 2 GBit/s; I forgot which we have.
The network is not the limiting factor here, Squeak is.

>
>> The file doesn't have to be read from the disk either.
>
> I assume you mean "read from disk" on alan?  What about after it's
> cached so many mcz's in RAM that it's paging out to swap file?  To me,
> wasting precious RAM (of any server) to cache old MCZ file contents
> that no one will ever download (because they become old very quickly)
> feels wasteful.  Dragster cars are wasteful too, but yes, they are
> "faster"... on a dragstrip.  :)  I guess there'd have to be some kind
> of application-specific smart management of the cache...
>
> Levente, what about the trunk directory listing, can it cache that?
> That is the _#1 thing_ source.squeak.org is accessing and sending back
> over, and over, and over again -- every time the MC progress box
> says, "Updating [repository name]".
>
>> If the client does have the mcz, then we save the complete file transfer.
>>
>>>
>>> I don't know what the speed between alan <---> andreas is, but I doubt
>>> it's much slower than client <---> alan in most cases, so the savings
>>> would seem to be minimal..?
>>
>> The image wouldn't have to open a file, read its content from the disk and
>> send that through a socket.
>
> By "the image" I assume you mean the SqueakSource server image.  But
> opening the file takes very little time.  Original web-sites were
> .html files, remember how fast those were?  Plus, filesystems "cache"
> file contents into their own internal caches anyway...
>
> Yes, it still has to return back through alan but I assume alan does
> not wait for a "full download" received from andreas before it's
> already piping back to the Squeak client.  If true, then it seems
> like it only amounts to saving one hop, which would hardly be
> noticeable over what we have now.
>
>> Nginx does that thing magnitudes faster than
>> Squeak.
>
> The UX would not be magnitudes faster though, right?
>
>>>>>> That would also let us save bandwidth by not downloading files already
>>>>>> sitting in the client's package cache.
>>>>>
>>>>> How so?  Isn't the package-cache checked before hitting the server at
>>>>> all?  It certainly should be.
>>>>
>>>> No, it's not. Currently that's not possible, because different files can
>>>> have the same name. And currently we have no way to tell them apart.
>>>
>>> No.  No two MCZ's may have the same name, certainly not within the
>>> same repository, because MCRepository cannot support that.  So maybe
>>
>> Not at the same time, but it's possible, and it just happened recently
>> with Chronology-ul.21.
>> It is perfectly possible that a client has a version in its package cache
>> with the same name as a different version on the server.
>
> But we don't want to restrict what's possible in our software design
> because of that.  That situation is already a headache anyway.  Same
> name theoretically can come only from the same person (if we ensure
> unique initials) and so this is avoidable / fixable by resaving one of
> them under a different name...
>
>>> we need project subdirectories under package-cache to properly
>>> simulate each cached Repository.  I had no idea we were neutering 90%
>>> of the benefits of our package-cache because of this too, and just
>>> sitting here, I can't help wonder whether this is why MCProxy doesn't
>>> work properly either!
>>>
>>> The primary purpose of a cache is to *check it first* to speed up
>>> access to something, right?  What you say about package-cache sounds
>>
>> I don't know. It wasn't me who designed it. :)
>
> I meant ANY "cache".
>
>  https://en.wikipedia.org/wiki/Cache_(computing)
>
> For Monticello, package-cache's other use-case is when an
> authentication issue occurs when trying to save to an HTTP repository.
> At that point the Version object with the new ancestry was already
> constructed in memory, so rather than worry about trying to "undo" all
> that, it was simpler and better to save it to a package-cache, persist
> it safely so the client can simply move forward from there (get access
> to the HTTP repository and copy it or whatever).
>
> - Chris
>
>>> really bad we should fix that, not surrender to it.
>>
>> Yes, that should be fixed, but it needs changes on the server side.
>> What I always had in mind was to extend the repository listing with
>> hashes/uuids so that the client could figure out if it needs to download a
>> specific version. But care must be taken not to break the code for
>> non-ss repositories (e.g. simple directory listings).
>>
>> Levente
>>
>>>
>>> - Chris

