Hi there,

For some weeks or months now, I cannot update a single package without getting either a gateway error or a connection timeout. Luckily, the timeout means that the code update was at least completed, which I can confirm in my email inbox.

What's going on there?! That used to work fine. Timeouts were rare, gateway errors non-existent.

Best,
Marcel
On Thu, Jan 18, 2018 at 08:10:31AM +0100, Marcel Taeumel wrote:
> Since several weeks/months now, I cannot update a single package without
> either getting a gateway error or a connection timeout.

I think that the problem goes back longer than that, although it does seem to be getting worse in recent months.

My guess (and it is only a guess) is that there are two possible causes:

1) If I recall right, the VM that is installed with source.squeak.org (which is quite old now) came from a time at which there were problems with the garbage collector that led to noticeable delays. It is possible that updating the VM to a more recent version would make this go away.

2) The image is backed by Magma, and it is possible that something there is eating time when an update is made to a repository.

Dave
Hi,
I was able to VNC right into the server image. It is responsive; however, there are a ton of processes apparently stuck on a Mutex>>#critical: block. I think that explains the timeouts.

The service was last restarted 204 days ago. I'll contact box-admins and the board about restarting the service; that should clear it up.

- Chris

[Attachment: squeaksource2.png (509K)]
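[As an aside for anyone wanting to reproduce this kind of diagnosis: a rough workspace sketch along the following lines can list processes whose suspended call chain passes through Mutex>>#critical:. This is illustrative only, not necessarily the exact diagnostic Chris used on the server.]

```smalltalk
"Hedged sketch: count processes currently inside Mutex>>#critical:.
 Walks each process's suspended context chain looking for that method."
| stuck |
stuck := Process allSubInstances select: [:proc |
	| ctx found |
	found := false.
	ctx := proc suspendedContext.
	[ctx notNil and: [found not]] whileTrue: [
		ctx method == (Mutex >> #critical:) ifTrue: [found := true].
		ctx := ctx sender].
	found].
Transcript show: stuck size printString, ' process(es) stuck in Mutex>>#critical:'; cr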
On Thu, Jan 18, 2018 at 5:42 PM, Chris Muller wrote:

Hi, and for my information, what version of Squeak and what VM is it running?

_,,,^..^,,,_
best, Eliot
Clearly I'm well out of the loop at this point, so I'm likely wrong. But any Squeak-hosted service that was set up and managed by the Box-Admins team in the past will automatically restart if it quits (using daemontools).
Hi Ken,
On Thu, Jan 18, 2018 at 08:32:54PM -0600, Ken Causey wrote:
> Clearly I'm well out of the loop at this point so I'm likely wrong. But,
> the way any Squeak hosted service that was setup and managed by the
> Box-Admins team in the past it will automatically restart if it quits
> (using daemontools).

Yes, the daemontools setup is still in effect and works a champ, thank you :-) I think Chris is just being cautious in asking, since this is our main source repository server.

Dave
> and for my information what version of Squeak and what VM is it running?

The production VM released with Squeak 5.1:

5.0-201608171728 Sun Sep 25 16:02:24 UTC 2016 gcc 4.6.3 [Production Spur VM]

It's been a few months since I tried the most recent VM. All the newer ones I'd tried since the GC rewrite would crash more often than I could bear.

I run this same code base and VM to support my own code repository as a local daemontools service. It doesn't have the volume source.squeak.org has, but it has been stable for me.
It's restarted. You should be able to use it normally; however, the last commit Magma got was on 6-Jan-2018 (every one since then got stuck on the Mutex), so every commit since then will be recovered (its revision history indexed into the Magma DB) in the background, and you may experience some sluggishness for the next few days.

Thanks for your patience, and sorry for any inconvenience.

- Chris
On Thu, Jan 18, 2018 at 09:09:00PM -0600, Chris Muller wrote:
> The production VM released with Squeak 5.1.
>
> 5.0-201608171728 Sun Sep 25 16:02:24 UTC 2016 gcc 4.6.3
> [Production Spur VM]

I think that my mention of garbage collection as a possible cause is a red herring. Likewise my mention of the Magma backing store. Those were just the only two things I could think of that were obviously different from the other squeaksource image that we are running.

In any case, 204 days of continuous service without a restart is nothing to be unhappy about :-)

Dave
:) Your memory was keener than mine, actually. As I tailed the log when it came back up, I saw the message "Starting Garbage Collection", and it reminded me of this issue from a couple of years back: a strange phenomenon with this application (SqueakSource+Magma) and VM in which, after the initial loading of the root SSRepository object, the first garbage collection at some later point would take about two minutes. After that, it was pretty much fine, pretty snappy.

So, rather than enduring that pain at a random time, I decided a known time was better: on startup.
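[The workaround Chris describes amounts to forcing the expensive first full collection at a predictable moment. A minimal sketch of the idea — the placement in a startup script is an assumption; only `Smalltalk garbageCollect` is standard Squeak:]

```smalltalk
"Sketch: after the root SSRepository object has been loaded at image
 startup, trigger the slow first full GC deliberately, so it does not
 strike at a random time while the server is under load."
Transcript show: 'Starting Garbage Collection'; cr.
Smalltalk garbageCollect.
Transcript show: 'Garbage Collection done'; cr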
Hi All,
On Wed, Jan 17, 2018 at 11:10 PM, Marcel Taeumel wrote:
I think the main problem is that the server is unresponsive while it generates the diff email to send to the mailing lists. I say this because committing VMMaker.oscog, a huge package, always times out, and the server can be unresponsive thereafter for many minutes, whereas committing the Cog package to the very same repository, which is far smaller, does not cause a timeout. Of course it could be storing the package to the file system, but I doubt that very much.

So I think we need to rewrite the server to move the computation and mailing of the diff to a lower priority, so that answering and receiving versions gets priority over reporting changes to the mailing list. In the case of VMMaker.oscog the diff often gets thrown away anyway because it is very large.

I'm not familiar with the packages that implement the server, nor with the development, testing, and installation process, but I'd love to pair with someone on fixing the responsiveness issue and learn.
_,,,^..^,,,_
best, Eliot
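[Eliot's proposal — answer the commit request first, compute and mail the diff later — could be sketched roughly as below. `forkAt:named:` and `Processor systemBackgroundPriority` are standard Squeak; the method and the `computeDiffFor:` / `mailDiff:about:` selectors are hypothetical placeholders, not the actual SqueakSource API.]

```smalltalk
"Hedged sketch: move diff generation and mailing to a background-priority
 process so the HTTP-serving processes keep running at normal priority."
notifyMailingListAbout: aVersion
	[[ | diff |
	   diff := self computeDiffFor: aVersion.
	   self mailDiff: diff about: aVersion]
		on: Error
		do: [:e | "never let a mailing failure disturb serving" e return]]
			forkAt: Processor systemBackgroundPriority
			named: 'SqueakSource diff mailer'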
On Tue, Jan 23, 2018 at 01:26:37PM -0800, Eliot Miranda wrote:
> So I think we need to rewrite the server to move the computation of and
> mailing of the diff to a lower priority, so that answering and receiving
> versions gets priority over reporting changes to the mailing list.

Chris, are you interested in working with Eliot on this? I don't think I can help directly, but I do have some experience with the older squeaksource.com system, and I'm interested in getting that updated at some point, so if I can offer some help without getting in the way I am happy to do so.

Eliot, I suspect that Chris cleared up one problem when he recently restarted the image, but that the diff processing you mention is /also/ a problem and is worth following up separately. The reason I say this is that I was getting commit timeouts on even trivial updates, and that problem went away after the server restart. But if commit timeouts still happen for a VMMaker commit, then it is very likely due to the diff processing.

If in fact the diff processing for mailing list updates is the culprit, and if it is something that could be relegated to a background process completely separate from the user interactions, then I would be tempted to try putting the mailing list processing into a #forkHeadlessSqueakAndDoThenQuit: block. Any interest?

Dave
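[For readers unfamiliar with the idiom Dave mentions: #forkHeadlessSqueakAndDoThenQuit: comes from the OSProcess package and forks a headless child Squeak image that runs a block and then quits, leaving the parent image free to keep serving. A rough sketch, assuming OSProcess is loaded; the receiver shown and the `mailDiffFor:` selector are assumptions, not the actual server code:]

```smalltalk
"Hedged sketch of Dave's suggestion: the child image does the slow diff
 mailing and exits, while the parent returns to serving immediately."
UnixProcess forkHeadlessSqueakAndDoThenQuit: [
	self mailDiffFor: aVersion]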
Hi David,
On Tue, Jan 23, 2018 at 5:08 PM, David T. Lewis wrote:

> I would be tempted to try putting the mailing list processing into a
> #forkHeadlessSqueakAndDoThenQuit: block. Any interest?

It's certainly worth looking at. And that suggests that there could be two separate images running concurrently, one doing the serving and one doing the diffs, possibly prompted by the server image.

_,,,^..^,,,_
best, Eliot
On Tue, Jan 23, 2018 at 05:36:00PM -0800, Eliot Miranda wrote:
> It's certainly worth looking at. And that suggests that there could be two
> separate images running concurrently, one doing the serving, and one doing
> the diffs, possibly prompted by the server image.

In case my explanation does not work, the attached change set shows what I had in mind.

Dave

[Attachment: MC notification diff in background for SqueakSource.1.cs (1K)]
> I'm not familiar with the packages that implement the server, nor what the
> development, testing and installation process is, but I'd love to pair with
> someone on fixing the responsiveness issue and learn.

It's all here: http://wiki.squeak.org/squeak/6365

This is what source.squeak.org is running. It installs and runs clean (on Linux). It never saves the running image. Every serious Squeak developer should do it on their laptop, so they can have revision history for their own proprietary code, not just the source.squeak.org repositories. Anyone wanting to learn about and work on our code repository should do it on their laptop, as it's a great place to test fixes and upgrades before putting them into the production source.squeak.org server.

Once you do the installation step, my guess is you'll be able to find the diff'ing in the code in a short amount of time. But it needs to be tested.

> Chris, are you interested in working with Eliot on this?

Yes, but I am leaving in less than 5 hours to depart for a month-long holiday, and I still need to sleep. I just finished all-day packing, sat down for a brief unwind, and saw "URGENT". :)

I went through a lot of work to make the above process lucid and smooth. It's time to cash in. :) If you try it, you will go from 5% to 95% knowledge about it in one evening. I plan to check on-line things in the evenings during my holiday, so I can assist in a limited way.

- Chris