SqueakSource is down again


SqueakSource is down again

Rob Withers
This seems to be happening a lot.  Is there something that can be done to
alleviate the problem?

thanks,
Rob



Re: SqueakSource is down again

Philippe Marschall
2007/12/22, Rob Withers <[hidden email]>:
> This seems to be happening a lot.  Is there something that can be done to
> alleviate the problem?

Yeah, fix the VM.

Cheers
Philippe


Re: SqueakSource is down again

Andreas.Raab
Philippe Marschall wrote:
> 2007/12/22, Rob Withers <[hidden email]>:
>> This seems to be happening a lot.  Is there something that can be done to
>> alleviate the problem?
>
> Yeah, fix the VM.

What do you think is broken to cause those problems?

Cheers,
   - Andreas


Re: SqueakSource is down again

Philippe Marschall
2007/12/23, Andreas Raab <[hidden email]>:
> Philippe Marschall wrote:
> > 2007/12/22, Rob Withers <[hidden email]>:
> >> This seems to be happening a lot.  Is there something that can be done to
> >> alleviate the problem?
> >
> > Yeah, fix the VM.
>
> What do you think is broken to cause those problems?

Basically the stuff that made you choose Gemstone over Squeak.
Semaphores for example (OK, not actually VM per se); other usual
suspects are the scheduler, Sockets and the GC. See the stack trace
Lukas sent earlier. Also it's not uncommon to have hundreds of
processes hanging on the same Semaphore >> #critical:. The block has
terminated but the semaphore doesn't get released. I mean it's not
as if we do something fancy like terminating a process. And sometimes
the image simply freezes and stops reacting. An easy way to make
use of the second CPU sure wouldn't hurt either.

Honestly I'm sick of searching for which exact combination of patches
I have to apply to which image, and which GC tweaks I have to apply
to which VM, with which patches collected from some posts or forks. Is
it asking too much that this stuff simply works?

Cheers
Philippe


Re: SqueakSource is down again

Adrian Lienhard
Most of those issues are not bugs of the VM but of code in the image.
Adrian

On Dec 23, 2007, at 12:14 , Philippe Marschall wrote:

> 2007/12/23, Andreas Raab <[hidden email]>:
>> Philippe Marschall wrote:
>>> 2007/12/22, Rob Withers <[hidden email]>:
>>>> This seems to be happening a lot.  Is there something that can be
>>>> done to alleviate the problem?
>>>
>>> Yeah, fix the VM.
>>
>> What do you think is broken to cause those problems?
>
> Basically the stuff that made you choose Gemstone over Squeak.
> Semaphores for example (OK, not actually VM per se); other usual
> suspects are the scheduler, Sockets and the GC. See the stack trace
> Lukas sent earlier. Also it's not uncommon to have hundreds of
> processes hanging on the same Semaphore >> #critical:. The block has
> terminated but the semaphore doesn't get released. I mean it's not
> as if we do something fancy like terminating a process. And sometimes
> the image simply freezes and stops reacting. An easy way to make
> use of the second CPU sure wouldn't hurt either.
>
> Honestly I'm sick of searching for which exact combination of patches
> I have to apply to which image, and which GC tweaks I have to apply
> to which VM, with which patches collected from some posts or forks. Is
> it asking too much that this stuff simply works?
>
> Cheers
> Philippe
>



Re: SqueakSource is down again

stephane ducasse
It would be a good case for a bounty. I'm sure ESUG would participate.

Stef

On 23 déc. 07, at 12:35, Adrian Lienhard wrote:

> Most of those issues are not bugs of the VM but of code in the image.
> Adrian
>
> On Dec 23, 2007, at 12:14 , Philippe Marschall wrote:
>
>> 2007/12/23, Andreas Raab <[hidden email]>:
>>> Philippe Marschall wrote:
>>>> 2007/12/22, Rob Withers <[hidden email]>:
>>>>> This seems to be happening a lot.  Is there something that can
>>>>> be done to alleviate the problem?
>>>>
>>>> Yeah, fix the VM.
>>>
>>> What do you think is broken to cause those problems?
>>
>> Basically the stuff that made you choose Gemstone over Squeak.
>> Semaphores for example (OK, not actually VM per se); other usual
>> suspects are the scheduler, Sockets and the GC. See the stack trace
>> Lukas sent earlier. Also it's not uncommon to have hundreds of
>> processes hanging on the same Semaphore >> #critical:. The block has
>> terminated but the semaphore doesn't get released. I mean it's not
>> as if we do something fancy like terminating a process. And sometimes
>> the image simply freezes and stops reacting. An easy way to make
>> use of the second CPU sure wouldn't hurt either.
>>
>> Honestly I'm sick of searching for which exact combination of patches
>> I have to apply to which image, and which GC tweaks I have to apply
>> to which VM, with which patches collected from some posts or forks. Is
>> it asking too much that this stuff simply works?
>>
>> Cheers
>> Philippe
>>
>
>
>



Re: SqueakSource is down again

Andreas.Raab
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> Basically the stuff that made you choose Gemstone over Squeak.

I don't think you understand what made me choose Gemstone. There were
really two reasons for it: The first one is that Squeaksource doesn't
have a viable database solution for our loads. Gemstone does and it
works great. But the second one is just as important: Gemstone is a
vendor, this is a company that if anything goes wrong I can turn to and
ask them to fix it in return for money. Given that we're ramping up on
people the latter is perhaps more important than the former since I
don't know exactly how well Gemstone does scale - but I *do* know that
if we'd be outgrowing the box we're using now I can ask them to help us
fix it.

Cheers,
   - Andreas


Re: SqueakSource is down again

Michael Rueger-4
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> 2007/12/22, Rob Withers <[hidden email]>:
>> This seems to be happening a lot.  Is there something that can be done to
>> alleviate the problem?
>
> Yeah, fix the VM.

How about stepping out of the reality distortion field and fixing
SqueakSource so that it actually scales and is production quality?

Michael



Re: SqueakSource is down again

Philippe Marschall
2007/12/23, Michael Rueger <[hidden email]>:
> Philippe Marschall wrote:
> > 2007/12/22, Rob Withers <[hidden email]>:
> >> This seems to be happening a lot.  Is there something that can be done to
> >> alleviate the problem?
> >
> > Yeah, fix the VM.
>
> How about stepping out of the reality distortion field and fixing
> SqueakSource so that it actually scales and is production quality?

So which parts do we need to fix to make the Semaphore, Socket and
image freezing problems go away?

As for scaling and production quality do you seriously expect me to do
this for free in my spare time?

We fixed the performance problems and now run seriously faster than
source.impara.de while being much bigger.

Philippe


Re: SqueakSource is down again

Andreas.Raab
Philippe Marschall wrote:
> So which parts do we need to fix to make the Semaphore, Socket and
> image freezing problems go away?

For semaphores I'd recommend the fixes that I've posted over the year.
For sockets I am not aware of any evidence that indicates a socket issue
(we had a few issues that at first looked like sockets were related but
turned out not) but I'd like to hear any evidence that points to sockets
as the cause of problems. As far as I can tell the socket implementation
is very robust right now. For image freezes -in particular in
Squeaksource- you probably need to fix the concurrency issues in
Squeaksource itself. The last time I checked the code was not robust
enough by far against concurrent modifications (parallel commits etc).

> As for scaling and production quality do you seriously expect me to do
> this for free in my spare time?

That depends on whether or not you seriously expect for example the VM
people to fix the VM problems in their spare time for free. If the
answer is yes, then the answer is yes.

> We fixed the performance problems and now run seriously faster than
> source.impara.de while being much bigger.

That's great to hear. I wish you would have told me a couple of months
ago how to achieve that when I was asking (repeatedly) the same questions.

Cheers,
   - Andreas


Re: SqueakSource is down again

johnmci
Well, in someone's spare time someone might review the pages listed below
and Rob Withers' comments
at
http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-July/021307.html
original code at
http://www.smalltalkconsulting.com/html/OTNotes4.html

Grab the socket test suite and rework it for the latest socket
implementation.

This suite was used by Ian and me a few years back to beat on the Unix
Socket implementation; if I recall, it also
uncovered a Socket issue in the beta version of NetBSD Ian was using.


On Dec 23, 2007, at 12:08 PM, Andreas Raab wrote:

> Philippe Marschall wrote:
>> So which parts do we need to fix to make the Semaphore, Socket and
>> image freezing problems go away?
>
> For semaphores I'd recommend the fixes that I've posted over the  
> year. For sockets I am not aware of any evidence that indicates a
> socket issue (we had a few issues that at first looked like sockets  
> were related but turned out not) but I'd like to hear any evidence  
> that points to sockets as the cause of problems. As far as I can  
> tell the socket implementation is very robust right now. For image  
> freezes -in particular in Squeaksource- you probably need to fix the  
> concurrency issues in Squeaksource itself. The last time I checked  
> the code was not robust enough by far against concurrent  
> modifications (parallel commits etc).
>
>> As for scaling and production quality do you seriously expect me to  
>> do
>> this for free in my spare time?
>
> That depends on whether or not you seriously expect for example the  
> VM people to fix the VM problems in their spare time for free. If  
> the answer is yes, then the answer is yes.
>
>> We fixed the performance problems and now run seriously faster than
>> source.impara.de while being much bigger.
>
> That's great to hear. I wish you would have told me a couple of  
> months ago how to achieve that when I was asking (repeatedly) the  
> same questions.
>
> Cheers,
>  - Andreas
>

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================




Re: SqueakSource is down again

Lukas Renggli
In reply to this post by Andreas.Raab
> > So which parts do we need to fix to make the Semaphore, Socket and
> > image freezing problems go away?
>
> For semaphores I'd recommend the fixes that I've posted over the year.

I loaded all your semaphore related patches a couple of months ago and
squeaksource.com ran quietly and happily up to a few weeks ago. Then
suddenly we got many processes hanging in Semaphore>>#critical:.

> For image freezes -in particular in
> Squeaksource- you probably need to fix the concurrency issues in
> Squeaksource itself.

What kind of concurrency issues in squeaksource.com itself could cause
these problems? I know that the code is far from perfect, but I must
also point out that we didn't lose a single one of the more than 71'000
versions during the past 4 years. We also never experienced a
corrupted data model.

I wonder how it can happen that semaphores are suddenly blocked? Might
this be related to image saving happening while being within a
critical section?

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Re: SqueakSource is down again

Andreas.Raab
Lukas Renggli wrote:
>>> So which parts do we need to fix to make the Semaphore, Socket and
>>> image freezing problems go away?
>> For semaphores I'd recommend the fixes that I've posted over the year.
>
> I loaded all your semaphore related patches a couple of months ago and
> squeaksource.com ran quietly and happily up to a few weeks ago. Then
> suddenly we got many processes hanging in Semaphore>>#critical:.

If you could send a couple of complete stack dumps from the affected
image it might be interesting. There is a possibility you were affected
by the problem of primitiveSuspend (which we discussed earlier) but
that's difficult to tell from a stack dump. Much much easier if you can
go into the image and check whether the doIt I sent comes up empty or not.

>> For image freezes -in particular in
>> Squeaksource- you probably need to fix the concurrency issues in
>> Squeaksource itself.
>
> What kind of concurrency issues in squeaksource.com itself could cause
> these problems? I know that the code is far from perfect, but I must
> also point out that we didn't lose a single one of the more than 71'000
> versions during the past 4 years. We also never experienced a
> corrupted data model.

What we've experienced was basically that after the first commit, when
our image went to saving the data model in a reference stream (via
SSFileSystem; takes about two minutes or so), a second commit would
wreak havoc on the system. You can probably simulate this by generating
enough load from different clients on the network with or without
SSFileSystem. And I don't like the idea of saving the image very much
because it's probably not feasible to save multiple versions of that
image which ultimately means that any data corruption kills the whole
data model.

> I wonder how it can happen that semaphores are suddenly blocked? Might
> this be related to image saving happening while being within a
> critical section?

Interesting thought. It may be possible for some strange things to
happen if Seaside doesn't take precautions of not accepting connections
while in the midst of a save. The problem is that the image save/startup
runs with whatever priority it's being issued at, so if there's another
process running at the same time there is a chance this process
interrupts the image save with the potential for strange things
happening. Here is one way in which I could see this happening: A
critical lock held by a process waiting for network traffic to occur
when the image is saved. When the image is restored later on, that
socket is no longer valid but the process could still wait on the
semaphore, blocking the critical section for all other uses.

Cheers,
   - Andreas



Re: SqueakSource is down again

Philippe Marschall
In reply to this post by Lukas Renggli
2007/12/23, Lukas Renggli <[hidden email]>:

> > > So which parts do we need to fix to make the Semaphore, Socket and
> > > image freezing problems go away?
> >
> > For semaphores I'd recommend the fixes that I've posted over the year.
>
> I loaded all your semaphore related patches a couple of months ago and
> squeaksource.com ran quietly and happily up to a few weeks ago. Then
> suddenly we got many processes hanging in Semaphore>>#critical:.
>
> > For image freezes -in particular in
> > Squeaksource- you probably need to fix the concurrency issues in
> > Squeaksource itself.
>
> What kind of concurrency issues in squeaksource.com itself could cause
> these problems?

We have concurrent, unsynchronized write access to shared data. Until
now we have been very lucky to get away with this without any
problems. It's certainly not the right way to do it.

Cheers
Philippe

> I know that the code is far from perfect, but I must
> also point out that we didn't lose a single one of the more than 71'000
> versions during the past 4 years. We also never experienced a
> corrupted data model.
>
> I wonder how it can happen that semaphores are suddenly blocked? Might
> this be related to image saving happening while being within a
> critical section?
>
> Cheers,
> Lukas
>
> --
> Lukas Renggli
> http://www.lukas-renggli.ch
>
>


Re: SqueakSource is down again

Philippe Marschall
In reply to this post by Andreas.Raab
2007/12/24, Andreas Raab <[hidden email]>:

> Lukas Renggli wrote:
> >>> So which parts do we need to fix to make the Semaphore, Socket and
> >>> image freezing problems go away?
> >> For semaphores I'd recommend the fixes that I've posted over the year.
> >
> > I loaded all your semaphore related patches a couple of months ago and
> > squeaksource.com ran quietly and happily up to a few weeks ago. Then
> > suddenly we got many processes hanging in Semaphore>>#critical:.
>
> If you could send a couple of complete stack dumps from the affected
> image it might be interesting. There is a possibility you were affected
> by the problem of primitiveSuspend (which we discussed earlier) but
> that's difficult to tell from a stack dump. Much much easier if you can
> go into the image and check whether the doIt I sent comes up empty or not.
>
> >> For image freezes -in particular in
> >> Squeaksource- you probably need to fix the concurrency issues in
> >> Squeaksource itself.
> >
> > What kind of concurrency issues in squeaksource.com itself could cause
> > these problems? I know that the code is far from perfect, but I must
> > also point out that we didn't lose a single one of the more than 71'000
> > versions during the past 4 years. We also never experienced a
> > corrupted data model.
>
> What we've experienced was basically that after the first commit, when
> our image went to saving the data model in a reference stream (via
> SSFileSystem; takes about two minutes or so), a second commit would
> wreak havoc on the system. You can probably simulate this by generating
> enough load from different clients on the network with or without
> SSFileSystem. And I don't like the idea of saving the image very much
> because it's probably not feasible to save multiple versions of that
> image which ultimately means that any data corruption kills the whole
> data model.

We don't use reference streams anymore. We are at the point where it
takes more than 30 minutes to write the model to disk. We only save
the image. We are aware how suboptimal this is but until now we have
been very lucky to get away with this.

Cheers
Philippe

> > I wonder how it can happen that semaphores are suddenly blocked? Might
> > this be related to image saving happening while being within a
> > critical section?
>
> Interesting thought. It may be possible for some strange things to
> happen if Seaside doesn't take precautions of not accepting connections
> while in the midst of a save. The problem is that the image save/startup
> runs with whatever priority it's being issued at, so if there's another
> process running at the same time there is a chance this process
> interrupts the image save with the potential for strange things
> happening. Here is one way in which I could see this happening: A
> critical lock held by a process waiting for network traffic to occur
> when the image is saved. When the image is restored later on, that
> socket is no longer valid but the process could still wait on the
> semaphore, blocking the critical section for all other uses.
>
> Cheers,
>    - Andreas
>
>
>


Re: SqueakSource is down again

Philippe Marschall
In reply to this post by Andreas.Raab
2007/12/23, Andreas Raab <[hidden email]>:

> Philippe Marschall wrote:
> > So which parts do we need to fix to make the Semaphore, Socket and
> > image freezing problems go away?
>
> For semaphores I'd recommend the fixes that I've posted over the year.
> For sockets I am not aware of any evidence that indicates a socket issue
> (we had a few issues that at first looked like sockets were related but
> turned out not) but I'd like to hear any evidence that points to sockets
> as the cause of problems. As far as I can tell the socket implementation
> is very robust right now. For image freezes -in particular in
> Squeaksource- you probably need to fix the concurrency issues in
> Squeaksource itself. The last time I checked the code was not robust
> enough by far against concurrent modifications (parallel commits etc).
>
> > As for scaling and production quality do you seriously expect me to do
> > this for free in my spare time?
>
> That depends on whether or not you seriously expect for example the VM
> people to fix the VM problems in their spare time for free. If the
> answer is yes, then the answer is yes.

Well I can honestly say that SqS is not production quality. It has no
serious persistence (the main installation on Squeak) and we make no
guarantees in this regard. The "storage" leaves several things to be
desired. We do write the .mcz to disk and back it up so there is a
limit to the damage a broken image can cause. If you are uneasy with
this, don't use it. It has several stability issues which we believe
are not due to bugs in our code but in the Squeak-Kernel/VM. But we
never pretended otherwise, we never said there are no issues. We never
said "rock stable, no known bugs for years". If you ask on this list
if Squeak is production ready, how many of the VM maintainers are that
frank and say no?

> > We fixed the performance problems and now run seriously faster than
> > source.impara.de while being much bigger.
>
> That's great to hear. I wish you would have told me a couple of months
> ago how to achieve that when I was asking (repeatedly) the same questions.

What I was talking about is pure rendering performance. You get this
by loading the latest version; this was true several months ago as it
is now. If you use the Impara fork, well, talk to the Impara guys. From
the description of your problems I got the impression that the issues
you faced had much more to do with "persistence" than with the issues
we face (general stability). As for persistence there is a Magma backend
which I pointed you at. AFAIK this has seen no action, which I also
mentioned.

Cheers
Philippe


Re: SqueakSource is down again

stephane ducasse
Maybe I should repeat it, since Andreas pointed to Gemstone as a way
to pay for service: maybe it is time to collect some money to get
someone working on these problems:
        making SqueakSource robust and fixing what should be fixed in
        the VM/Kernel.
ESUG is really ready to spend money on that.

Stef


On 24 déc. 07, at 00:36, Philippe Marschall wrote:

> 2007/12/23, Andreas Raab <[hidden email]>:
>> Philippe Marschall wrote:
>>> So which parts do we need to fix to make the Semaphore, Socket and
>>> image freezing problems go away?
>>
>> For semaphores I'd recommend the fixes that I've posted over the year.
>> For sockets I am not aware of any evidence that indicates a socket issue
>> (we had a few issues that at first looked like sockets were related but
>> turned out not) but I'd like to hear any evidence that points to sockets
>> as the cause of problems. As far as I can tell the socket implementation
>> is very robust right now. For image freezes -in particular in
>> Squeaksource- you probably need to fix the concurrency issues in
>> Squeaksource itself. The last time I checked the code was not robust
>> enough by far against concurrent modifications (parallel commits etc).
>>
>>> As for scaling and production quality do you seriously expect me to do
>>> this for free in my spare time?
>>
>> That depends on whether or not you seriously expect for example the VM
>> people to fix the VM problems in their spare time for free. If the
>> answer is yes, then the answer is yes.
>
> Well I can honestly say that SqS is not production quality. It has no
> serious persistence (the main installation on Squeak) and we make no
> guarantees in this regard. The "storage" leaves several things to be
> desired. We do write the .mcz to disk and back it up so there is a
> limit to the damage a broken image can cause. If you are uneasy with
> this, don't use it. It has several stability issues which we believe
> are not due to bugs in our code but in the Squeak-Kernel/VM. But we
> never pretended otherwise, we never said there are no issues. We never
> said "rock stable, no known bugs for years". If you ask on this list
> if Squeak is production ready, how many of the VM maintainers are that
> frank and say no?
>
>>> We fixed the performance problems and now run seriously faster than
>>> source.impara.de while being much bigger.
>>
>> That's great to hear. I wish you would have told me a couple of months
>> ago how to achieve that when I was asking (repeatedly) the same questions.
>
> What I was talking about is pure rendering performance. You get this
> by loading the latest version, this was true several months ago as it
> is now. If you use the Impara fork, well talk to the Impara guys. From
> the description of your problems I got the impression that the issues
> you faced had much more to do with "persistence" than with the issues
> we face (general stability). As for persistence there is a Magma backend
> which I pointed you at. AFAIK this has seen no action which I also
> mentioned.
>
> Cheers
> Philippe
>
>



Re: SqueakSource is down again

Lukas Renggli
In reply to this post by Andreas.Raab
> > I loaded all your semaphore related patches a couple of months ago and
> > squeaksource.com ran quietly and happily up to a few weeks ago. Then
> > suddenly we got many processes hanging in Semaphore>>#critical:.
>
> If you could send a couple of complete stack dumps from the affected
> image it might be interesting. There is a possibility you were affected
> by the problem of primitiveSuspend (which we discussed earlier) but
> that's difficult to tell from a stack dump. Much much easier if you can
> go into the image and check whether the doIt I sent comes up empty or not.

The doIt you sent comes out empty, I've never seen a case where it
actually returned a process. For the stack dumps I've got only the
attached screenshot from the process browser that I took December 5.,
roughly a month after loading your patches.

> What we've experienced was basically that after the first commit, when
> our image went to saving the data model in a reference stream (via
> SSFileSystem; takes about two minutes or so), a second commit would
> wreak havoc on the system. You can probably simulate this by generating
> enough load from different clients on the network with or without
> SSFileSystem. And I don't like the idea of saving the image very much
> because it's probably not feasible to save multiple versions of that
> image which ultimately means that any data corruption kills the whole
> data model.

We save the image every hour, which only takes a couple of seconds. We
also recently fixed some bugs that caused it to block for minutes
afterwards.

> Interesting thought. It may be possible for some strange things to
> happen if Seaside doesn't take precautions of not accepting connections
> while in the midst of a save. The problem is that the image save/startup
> runs with whatever priority it's being issued at, so if there's another
> process running at the same time there is a chance this process
> interrupts the image save with the potential for strange things
> happening. Here is one way in which I could see this happening: A
> critical lock held by a process waiting for network traffic to occur
> when the image is saved. When the image is restored later on, that
> socket is no longer valid but the process could still wait on the
> semaphore, blocking the critical section for all other uses.

Current versions of the Kom server adapter for Seaside stop listening
while saving the image, but I have to check if this is also the case
with the version of Seaside used in squeaksource.com.

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch



Attachment: Picture 1.png (17K)

Re: SqueakSource is down again

Andreas.Raab
Lukas Renggli wrote:
> The doIt you sent comes out empty, I've never seen a case where it
> actually returned a process. For the stack dumps I've got only the
> attached screenshot from the process browser that I took December 5.,
> roughly a month after loading your patches.

Unfortunately, the screenshot doesn't show much of interest - the best
thing to do is to cover *all* processes when the system is locked up, for
forensic analysis. What we have done in our server VMs is to hook up the
USR1 signal to the VM's printAllStacks() function so that we can simply
get a stack dump via kill -USR1 <pid_of_squeak>. I don't know if this is
in the standard Unix VMs (I have no idea how portable the code is) but
it's a must-have feature for running a Linux server with Squeak.
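
A minimal sketch of such a hook in C, under stated assumptions: the
post only names printAllStacks() and the USR1 signal, so everything
else here is hypothetical. print_all_stacks_stub() stands in for the
VM's real stack printer, and install_dump_hook() and
service_dump_request() are illustrative names, not VM functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler, consumed by the main loop. */
static volatile sig_atomic_t dump_requested = 0;

/* Hypothetical stand-in for the VM's stack printer; a real VM would
   walk every Smalltalk process here. */
static void print_all_stacks_stub(void)
{
    fputs("=== stack dump requested via SIGUSR1 ===\n", stderr);
}

/* Handler: only records the request, which is async-signal-safe. */
static void on_usr1(int signo)
{
    (void)signo;
    dump_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
int install_dump_hook(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Poll from the main loop; returns 1 if a dump was printed, else 0. */
int service_dump_request(void)
{
    if (!dump_requested)
        return 0;
    dump_requested = 0;
    print_all_stacks_stub();
    return 1;
}
```

With something like this installed, kill -USR1 <pid_of_squeak> sets the
flag and the next poll prints the dump. The sketch defers the printing
to the main loop because stdio calls are not async-signal-safe; the
post does not say whether the actual hook works this way, and a VM
that is stuck inside a primitive would never reach the poll.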

> Current versions of the Kom server adapter for Seaside stop listening
> while saving the image, but I have to check if this is also the case
> with the version of Seaside used in squeaksource.com.

That's a likely cause for problems. Also, you probably need to make sure
all requests are finished before saving the image - otherwise some of
them may rely on network activity to wake up.

Cheers,
   - Andreas



Re: SqueakSource is down again

Igor Stasenko
Sockets register semaphores in the external object table so that they
can be signaled by the socket plugin.
When an image starts up after the VM boots, the external object table
is cleared and replaced by a fresh, empty array.
So if any processes in the saved image are waiting for such semaphores
to be signaled by an external event (as with sockets), they will never
wake up, because nothing is left that can signal them.
To get around the problem, at startup we should do something like this:
   Socket allInstancesDo: [:s | s signalAndClearSemaphores ].
That way, any process that was waiting on those semaphores gets a
chance to get past the lock and die.
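
To make the failure mode concrete, here is a toy model in C of an
index-based semaphore table. It is not the VM's code: the toy_* names
and the fixed table size are assumptions made purely for illustration.
It only shows why a signal aimed at an index registered before the
table was cleared has nobody left to wake.

```c
#define TOY_TABLE_SIZE 8

/* One slot per registered semaphore. */
typedef struct {
    int in_use;    /* nonzero while a semaphore is registered here */
    int signals;   /* signals delivered to this slot so far */
} ToySlot;

static ToySlot toy_table[TOY_TABLE_SIZE];

/* Register a semaphore; returns its index, or -1 if the table is full. */
int toy_register(void)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].in_use = 1;
            toy_table[i].signals = 0;
            return i;
        }
    }
    return -1;
}

/* Models image startup: every registration is dropped. */
void toy_clear_on_startup(void)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++) {
        toy_table[i].in_use = 0;
        toy_table[i].signals = 0;
    }
}

/* Models the plugin signaling by index; returns 1 if a registered
   semaphore received the signal, 0 if the signal was lost. */
int toy_signal(int index)
{
    if (index < 0 || index >= TOY_TABLE_SIZE || !toy_table[index].in_use)
        return 0;
    toy_table[index].signals++;
    return 1;
}
```

In this model, a signal sent to an index obtained before
toy_clear_on_startup() returns 0 afterwards, which corresponds to the
waiting process never being woken.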

As for graceful shutdown/startup: to prevent the process that is saving
the image from being interrupted by a process handling network
requests, I think the best way is to suspend all active processes
except the one saving the image, and then resume them once the save is
done.
But when I recently tried to do this myself, I found that it's not
possible, due to a bug in #resume:  (see the other discussion about
suspend/resume).
So we need to wait until that is fixed, or find a way around it.

On 24/12/2007, Andreas Raab <[hidden email]> wrote:

> Lukas Renggli wrote:
> > The doIt you sent comes out empty, I've never seen a case where it
> > actually returned a process. For the stack dumps I've got only the
> > attached screenshot from the process browser that I took December 5.,
> > roughly a month after loading your patches.
>
> Unfortunately, the screenshot doesn't show much of interest - the best
> thing to do is to cover *all* processes when the system is locked up for
> forensic analysis. What we have done in our server VMs is to hook up the
> USR1 signal to the VM's printAllStacks() function so that we can simply
> get a stack dump via kill -USR1 <pid_of_squeak>. I don't know if this is
> in the standard Unix VMs (I have no idea how portable the code is) but
> it's a must-have feature for running a Linux server with Squeak.
>
> > Current versions of the Kom server adapter for Seaside stop listening
> > while saving the image, but I have to check if this is also the case
> > with the version of Seaside used in squeaksource.com.
>
> That's a likely cause for problems. Also, you probably need to make sure
> all requests are finished before saving the image - otherwise some of
> them may rely on network activity to wake up.
>
> Cheers,
>    - Andreas
>
>
>


--
Best regards,
Igor Stasenko AKA sig.
