source.squeak.org --- Responsiveness

marcel.taeumel
Hi, there.

For several weeks/months now, I have not been able to update a single package without getting either a gateway error or a connection timeout. Luckily, a timeout means that the code update was at least completed, which I can confirm from my email inbox.

What's going on there?! That used to work fine. Timeouts were rare. Gateway errors were non-existent.

Best,
Marcel


Re: source.squeak.org --- Responsiveness

David T. Lewis
On Thu, Jan 18, 2018 at 08:10:31AM +0100, Marcel Taeumel wrote:
> Hi, there.
>
> Since several weeks/months now, I cannot update a single package without either getting a gateway error or a connection timeout. Luckily, the timeout means that the code update was at least completed, which I can observe in my email inbox.
>
> What's going on there?! That used to work fine. Timeouts were rare. Gateway errors non-existent.
>

I think that the problem goes back longer than that, although it does seem
to be getting worse in recent months.

My guess (and it is only a guess) is that there are two possible causes:

1) If I recall right, the VM that is installed with source.squeak.org (which
is quite old now) came from a time at which there were problems with the garbage
collector that led to noticeable delays. It is possible that updating the VM
to a more recent version would make this go away.

2) The image is backed by Magma, and it is possible that something there is
eating time when an update is made to a repository.

Dave


Re: source.squeak.org --- Responsiveness

Chris Muller-3
Hi,

I was able to VNC right into the server image.  It is responsive;
however, there are a ton of processes apparently stuck on a
Mutex>>#critical: block.  I think that explains the timeouts.

The service was last restarted 204 days ago.  I'll contact box-admins
and the board about restarting the service; that should clear it up.

 - Chris
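
A minimal sketch of the symptom described above, written in Python rather than Squeak and not taken from the server code. It assumes a single lock guards the repository; `repository_lock`, `handle_commit` and `simulate` are hypothetical names. If one holder never leaves the critical section, every later request waits on the lock until its own timeout expires, which is what a client would see as a connection timeout.

```python
import threading

# Hypothetical stand-in for the Mutex guarding the repository.
repository_lock = threading.Lock()


def handle_commit(name, timeout_seconds, results):
    """Try to enter the critical section; record a timeout if the lock never frees."""
    if repository_lock.acquire(timeout=timeout_seconds):
        try:
            results[name] = "committed"
        finally:
            repository_lock.release()
    else:
        results[name] = "timeout"


def simulate(stuck):
    """Run three commit requests, optionally while another holder never releases."""
    results = {}
    if stuck:
        repository_lock.acquire()  # a holder that stays inside the critical section
    try:
        workers = [
            threading.Thread(target=handle_commit, args=(f"commit-{i}", 0.05, results))
            for i in range(3)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
    finally:
        if stuck:
            repository_lock.release()
    return results
```

Calling `simulate(True)` models the stuck image and `simulate(False)` models the image after a restart.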



Attachment: squeaksource2.png (509K)
Re: source.squeak.org --- Responsiveness

Eliot Miranda-2


On Thu, Jan 18, 2018 at 5:42 PM, Chris Muller <[hidden email]> wrote:
> Hi,
>
> I was able to VNC right into the server image.  It is responsive,
> however, there are a ton of processes apparently stuck on a
> Mutex>>#critical: block.  I think that explains the timeouts.
>
> The service was last restarted 204 days ago.  I'll contact box-admins
> and board about restarting the service, that should clear it up.

and for my information what version of Squeak and what VM is it running?



--
_,,,^..^,,,_
best, Eliot


Re: source.squeak.org --- Responsiveness

KenCausey
Clearly I'm well out of the loop at this point, so I'm likely wrong. But any Squeak-hosted service that was set up and managed by the Box-Admins team in the past will automatically restart if it quits (using daemontools).

Re: source.squeak.org --- Responsiveness

David T. Lewis
Hi Ken,

On Thu, Jan 18, 2018 at 08:32:54PM -0600, Ken Causey wrote:
> Clearly I'm well out of the loop at this point so I'm likely wrong. But,
> the way any Squeak hosted service that was setup and managed by the
> Box-Admins team in the past it will automatically restart if it quits
> (using daemontools).

Yes, the daemontools setup is still in effect and works like a champ, thank you :-)

I think Chris is just being cautious in asking, since this is our main
source repository server.

Dave


Re: source.squeak.org --- Responsiveness

Chris Muller-3
In reply to this post by Eliot Miranda-2
>> I was able to VNC right into the server image.  It is responsive,
>> however, there are a ton of processes apparently stuck on a
>> Mutex>>#critical: block.  I think that explains the timeouts.
>>
>> The service was last restarted 204 days ago.  I'll contact box-admins
>> and board about restarting the service, that should clear it up.
>
> and for my information what version of Squeak and what VM is it running?

The production VM released with Squeak 5.1.

    5.0-201608171728  Sun Sep 25 16:02:24 UTC 2016 gcc 4.6.3 [Production Spur VM]

It's been a few months since I tried the most recent VM.  All the
newer ones I'd ever tried since the GC rewrite would crash more often
than I could bear.

I run this same code base and VM to support my own code repository as
a local daemontools service.  It doesn't have the volume
source.squeak.org has, but it has been stable for me.

Re: source.squeak.org --- Responsiveness

Chris Muller-3
In reply to this post by marcel.taeumel
It's restarted.  You should be able to use it normally.  However, the
last commit Magma got was on 6-Jan-2018 (every one since then got
stuck on the Mutex), so every commit since then will be recovered in
the background (its revision history indexed into the Magma DB).  You
may experience some sluggishness for the next few days.

Thanks for your patience, sorry for any inconvenience.

 - Chris

Re: source.squeak.org --- Responsiveness

David T. Lewis
In reply to this post by Chris Muller-3
On Thu, Jan 18, 2018 at 09:09:00PM -0600, Chris Muller wrote:

> The production VM released with Squeak 5.1.
>
>     5.0-201608171728  Sun Sep 25 16:02:24 UTC 2016 gcc 4.6.3
> [Production Spur VM]
>
> It's been a few months since I tried the most recent VM.  All the
> newer ones I'd ever tried since the GC rewrite would crash more often
> than I could bear.
>
> I run this same code base and VM to support my own code repository as
> a local daemontools service.  It doesn't have the volume
> source.squeak.org has, but it has been stable for me.
>

I think that my mention of garbage collection as a possible cause is
a red herring. Likewise my mention of Magma backing store. Those were
just the only two things I could think of that were obviously different
from the other squeaksource image that we are running.

In any case, 204 days of continuous service without a restart is nothing
to be unhappy about :-)

Dave
 

Re: source.squeak.org --- Responsiveness

Chris Muller-4
:)  Your memory was keener than mine, actually.  As I tail'd the log
when it came back up, I saw the message "Starting Garbage
Collection", and it reminded me of this issue from a couple of
years back.  It was a strange phenomenon with this application
(SqueakSource+Magma) and VM: after the initial loading of the root
SSRepository object completed, the first garbage collection that
followed, whenever it happened, would take about 2 minutes.
But after that, it was pretty much fine, pretty snappy.

So, rather than enduring that pain at a random time, I decided
that a known time was better: on startup.


Re: source.squeak.org --- Responsiveness

Eliot Miranda-2
In reply to this post by marcel.taeumel
Hi All,

On Wed, Jan 17, 2018 at 11:10 PM, Marcel Taeumel <[hidden email]> wrote:
> Hi, there.
>
> Since several weeks/months now, I cannot update a single package without either getting a gateway error or a connection timeout. Luckily, the timeout means that the code update was at least completed, which I can observe in my email inbox.
>
> What's going on there?! That used to work fine. Timeouts were rare. Gateway errors non-existent.

I think the main problem is that the server is unresponsive while it generates the diff email to send to the mailing lists.  I say this because committing VMMaker.oscog, a huge package, always times out, and the server can be unresponsive thereafter for many minutes, whereas committing the far smaller Cog package to the very same repository does not cause a timeout.  Of course it could be storing the package to the file system, but I doubt that very much.

So I think we need to rewrite the server to move the computation and mailing of the diff to a lower priority, so that answering and receiving versions gets priority over reporting changes to the mailing list.  In the case of VMMaker.oscog the diff often gets thrown away anyway because it is often very large.

I'm not familiar with the packages that implement the server, nor with what the development, testing and installation process is, but I'd love to pair with someone on fixing the responsiveness issue and learn.

_,,,^..^,,,_
best, Eliot
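
A rough sketch of the idea in this message, in Python rather than Squeak and not based on the actual SqueakSource code. Python threads have no priorities, so the sketch swaps in a different technique: the commit handler answers at once and hands the diff to a queue drained by a background thread. It only shows the ordering (answer first, diff and mail later), not real priority scheduling. All names here (`store_version`, `compute_diff`, `mail_worker`, `sent_mail`) are hypothetical, and the "diff" is a deliberately naive stand-in.

```python
import queue
import threading

diff_jobs = queue.Queue()  # pending (package, old, new) diff requests
sent_mail = []             # stands in for mail delivered to the list


def compute_diff(old, new):
    """Naive stand-in for the expensive diff: lines of `new` absent from `old`."""
    old_lines = set(old.splitlines())
    return [line for line in new.splitlines() if line not in old_lines]


def mail_worker():
    """Drain the queue forever, computing and 'mailing' one diff at a time."""
    while True:
        package, old, new = diff_jobs.get()
        try:
            sent_mail.append((package, compute_diff(old, new)))
        finally:
            diff_jobs.task_done()


def store_version(package, old, new):
    """Answer the client immediately; leave the diff and mail to the worker."""
    diff_jobs.put((package, old, new))
    return "stored"


# Daemon thread so that it never keeps the interpreter alive on exit.
worker = threading.Thread(target=mail_worker, daemon=True)
worker.start()
```

`store_version` returns before any diff is computed; `diff_jobs.join()` can be used to wait until the queued diffs have been processed.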


Re: source.squeak.org --- Responsiveness

David T. Lewis

On Tue, Jan 23, 2018 at 01:26:37PM -0800, Eliot Miranda wrote:

> I'm not familiar with the packages that implement the server, nor what the
> development, testing and installation process is, but I'd love to pair with
> someone on fixing the responsiveness issue and learn.
>

Chris, are you interested in working with Eliot on this? I don't think I
can help directly but I do have some experience with the older squeaksource.com
system, and I'm interested in getting that updated at some point so if I
can offer some help without getting in the way I am happy to do so.

Eliot, I suspect that Chris cleared up one problem when he recently restarted
the image, but that the diff processing that you mention is /also/ a problem
and is worth following up separately. The reason I say this is that I was getting
commit timeouts on even trivial updates, and that problem went away after the
server restart. But if commit timeouts still happen for a VMMaker commit, then
it is very likely due to the diff processing.

If in fact the diff processing for mailing list updates is the culprit, and
if this is something that could be relegated to a background process completely
separate from the user interactions, then I would be tempted to try putting
the mailing list processing into a #forkHeadlessSqueakAndDoThenQuit: block.
Any interest?

Dave
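
To make the suggestion above concrete, here is a sketch in Python rather than Squeak. The selector name suggests a forked headless copy of the image that runs a block and then quits; the sketch swaps in a freshly started interpreter, which does not share the parent's memory the way a forked image presumably would. `spawn_diff_process` and `CHILD_SCRIPT` are hypothetical names, and the diff is a naive stand-in.

```python
import subprocess
import sys

# Hypothetical child program: prints the lines of argv[2] that are absent from argv[1].
CHILD_SCRIPT = """
import sys
old_lines = set(sys.argv[1].splitlines())
for line in sys.argv[2].splitlines():
    if line not in old_lines:
        print(line)
"""


def spawn_diff_process(old, new):
    """Start a separate interpreter that computes the diff and then exits."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT, old, new],
        stdout=subprocess.PIPE,
        text=True,
    )
```

The caller gets a process handle back immediately and can keep serving requests; the diff runs in its own operating-system process and cannot block the server's own threads.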


Re: source.squeak.org --- Responsiveness

Eliot Miranda-2
Hi David,

On Tue, Jan 23, 2018 at 5:08 PM, David T. Lewis <[hidden email]> wrote:

> If in fact the diff processing for mailing list updates is the culprit, and
> if this is something that could be relegated to a background process completely
> separate from the user interactions, then I would be tempted to try putting
> the mailing list processing into a #forkHeadlessSqueakAndDoThenQuit: block.
> Any interest?

It's certainly worth looking at.  And that suggests that there could be two separate images running concurrently, one doing the serving, and one doing the diffs, possibly prompted by the server image.

_,,,^..^,,,_
best, Eliot


Re: source.squeak.org --- Responsiveness

David T. Lewis
On Tue, Jan 23, 2018 at 05:36:00PM -0800, Eliot Miranda wrote:

> Hi David,
>
> On Tue, Jan 23, 2018 at 5:08 PM, David T. Lewis <[hidden email]> wrote:
> >
> > On Tue, Jan 23, 2018 at 01:26:37PM -0800, Eliot Miranda wrote:
> >
> > If in fact the diff processing for mailing list updates is the culprit, and
> > if this is something that could be relegated to a background process completely
> > separate from the user interactions, then I would be tempted to try putting
> > the mailing list processing into a #forkHeadlessSqueakAndDoThenQuit: block.
> > Any interest?
> >
>
> It's certainly worth looking at.  And that suggests that there could be two
> separate images running concurrently, one doing the serving, and one doing
> the diffs, possibly prompted by the server image.
>
At the risk of embarrassing myself by posting untested code that probably will
not work, the attached change set shows what I had in mind.

Dave




Attachment: MC notification diff in background for SqueakSource.1.cs (1K)
Re: source.squeak.org --- Responsiveness

Chris Muller-4
In reply to this post by David T. Lewis
>> I'm not familiar with the packages that implement the server, nor what the
>> development, testing and installation process is, but I'd love to pair with
>> someone on fixing the responsiveness issue and learn.

It's all here:

   http://wiki.squeak.org/squeak/6365

This is what source.squeak.org is running.  It installs and runs clean
(in Linux).  It never saves the running image.

Every serious Squeak developer should do it on their laptop, so they
can have the revision history for their own proprietary code, not just
the source.squeak.org repositories.

Anyone wanting to learn about and work on our code repository should
do it on their laptops, as it's a great place to test fixes and
upgrades before putting them into the production source.squeak.org server.
Once you do the installation step, my guess is you'll be able to find
the diff'ing code in a short amount of time.  But it needs to
be tested.

> Chris, are you interested in working with Eliot on this?

Yes, but I am departing in less than 5 hours for a month-long
holiday, and I still need to sleep.  I just finished a full day of packing,
sat down to unwind briefly, and saw "URGENT"...   :)

I went through a lot of work to make the above process lucid and
smooth.  It's time to cash in.  :)  If you try it, you will go from 5%
to 95% knowledge about it in one evening.

I plan to check things online in the evenings during my holiday, so I
can offer limited assistance.

  - Chris

