lowspace signalling and handling issues


lowspace signalling and handling issues

timrowledge
After having problems trying to debug some TK4 code that blew up with lowspace
problems but never let me catch and debug, I spent some time adding the
lowspace-process stuff we recently discussed. I had to make a few alterations
to match it up with the latest 64bit clean code but no problems with that part.

After building a VM I started testing with some of the methods in
SystemDictionary 'memory space', in particular #useUpMemory. It is perhaps
fortunate that I did, since the other #useUp* methods pretty much work once the
lowspace process is caught by the VM and passed up to the image. After a _lot_
of head scratching by John and me, we found that a gazillion tiny objects
cause a catastrophic GC explosion. (Links are the smallest possible objects
that can exist on their own; plain Objects would have to be contained in a
collection and so would cost the same 3 words per object.) What happens is that
memory fills up until we get to signal lowspace, and then we are in danger.
Depending upon the exact size of object memory in use the 200kb used as the
lowSpaceThreshold can be gobbled up in one swallow by the
initializeMemoryFirstFree: method making sure there is a byte per object that
survived the markPhase. In using useUpMemory we can get to having 4 bytes of
free space when the next allocate is attempted.... Ka-Boom.
Expanding the lowSpaceThreshold (along with the VM changes to report the
process and avoid the accidental problem of interrupting eventTickler) to a
couple of Mb makes it ok on my machine, and the threshold can be a lot lower
with the other tests that create bigger (hence fewer) objects (hence smaller
fwdTable needs). In the worst case, we could have a very large OM filled with
very small objects all surviving markPhase; in such a case we would need an
additional 1/12 of the space those 3-word objects occupy for the fwdTable,
which is 1/13 of the OM once objects and table have to share it. So for a 30Mb
objectmemory we ought to set the lowSpaceThreshold to 30/13 => 2.31Mb + actual
space needed to run the notifier/debugger etc. for reasonable safety. Or hide
the 2.31Mb away so the image never even knows it is there. If you are using
virtual memory and a limit of 512Mb then you should perhaps secrete 40Mb
somewhere safe.
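
As a rough check of the arithmetic above, here is a minimal Python sketch. It is not VM code; it only assumes the figures given in the post (3-word objects of 12 bytes each, one byte of forwarding-table space per surviving object), and the function and constant names are made up for illustration.

```python
# Worst case described in the post: object memory is full of 3-word
# (12-byte) objects that all survive the mark phase, and the compactor
# wants one byte of forwarding-table space per surviving object.
BYTES_PER_SMALL_OBJECT = 12   # 3 words of 4 bytes, per the post
FWD_BYTES_PER_OBJECT = 1      # "a byte per object"

MB = 1024 * 1024


def worst_case_fwd_reserve(object_memory_bytes):
    """Forwarding-table bytes needed when objects and table share the memory."""
    per_object = BYTES_PER_SMALL_OBJECT + FWD_BYTES_PER_OBJECT  # 13 bytes
    return object_memory_bytes * FWD_BYTES_PER_OBJECT / per_object


reserve_30 = worst_case_fwd_reserve(30 * MB) / MB
reserve_512 = worst_case_fwd_reserve(512 * MB) / MB
print(f"{reserve_30:.2f} MB for 30 MB, {reserve_512:.1f} MB for 512 MB")
```

The 512Mb case comes out a little under 40Mb, so the "secrete 40Mb" figure is the same ratio rounded up. Space for the notifier/debugger still has to be added on top.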

This assumes that we really need to have one byte per object of course. The
original rationale was to keep the number of compact loops down to eight (see
Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The
nicest solution would be to come up with a way to do our GC & compacting
without needing any extra space. Commence headscratching now... John suggested
making sure the fwd gets less than the byte-per-object if things are tight, and
accepting the extra compaction loops.
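
To get a feel for the trade-off in John's suggestion, here is a simplified Python model of how the number of compaction passes grows as the forwarding table shrinks. It assumes a forwarding block of 8 bytes per relocated object; that size is only inferred from "one byte per object" corresponding to eight passes and is not stated in this thread. The model also ignores everything else the real compactor does, so treat the numbers as illustrative.

```python
import math

# Assumption (not from the thread): each relocated object needs one
# 8-byte forwarding block during a pass.
FWD_BLOCK_BYTES = 8


def compaction_passes(surviving_objects, fwd_table_bytes):
    """Passes needed if each pass can relocate one object per forwarding block."""
    blocks_per_pass = fwd_table_bytes // FWD_BLOCK_BYTES
    if blocks_per_pass == 0:
        raise ValueError("forwarding table too small to relocate any object")
    return math.ceil(surviving_objects / blocks_per_pass)


objects = 2_000_000
print(compaction_passes(objects, objects))      # one byte per object
print(compaction_passes(objects, 32 * 1024))    # table capped at 32 KB
```

Under these assumptions a byte-per-object table gives eight passes, and capping the table at 32 KB for two million survivors gives several hundred. Whether that is "thrashing a bit more" or "doomed" is something the model cannot answer.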

Good news- with the vm change and 2Mb lowSpaceThreshold I can probably go back
and find my TK4 problem(s).

Bad news- consider Tweak. With lots of processes whizzing away, merely stopping
the one that did the allocation and triggered the lowspace is not going to be
much good. Stopping everything except the utterly essential stuff to debug the
lowspace will be needed. Probably.

More bad news- somehow, going from VMM37b5 to b6 cost 40% of performance on my machine :-(  Bugger.

tim
--
Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
Every bug you find is the last one.


Re: lowspace signalling and handling issues

Andreas.Raab
Hi Tim -

> After having problems trying to debug some TK4 code that blew up with lowspace
> problems but never let me catch and debug, I spent some time adding the
> lowspace-process stuff we recently discussed. I had to make a few alterations
> to match it up with the latest 64bit clean code but no problems with that part.

What am I missing? I don't remember low-space stuff - I only remember
interrupt-related stuff.

> Depending upon the exact size of object memory in use the 200kb used as the
> lowSpaceThreshold can be gobbled up in one swallow by the
> initializeMemoryFirstFree: method making sure there is a byte per object that
> survived the markPhase. In using useUpMemory we can get to having 4 bytes of
> free space when the next allocate is attempted.... Ka-Boom.

Well, so don't eat up the memory. There is no reason why
initializeMemoryFirstFree: would have to reserve that much memory - like
the comment says the reserve "should" be chosen so that compactions can
be done in one pass but there is absolutely no such requirement.
Multi-pass compactions have happened in the past and there is nothing
wrong with them (in a low-space situation).

> This assumes that we really need to have one byte per object of course. The
> original rationale was to keep the number of compact loops down to eight (see
> Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The
> nicest solution would be to come up with a way to do our GC & compacting
> without needing any extra space. Commence headscratching now... John suggested
> making sure the fwd gets less than the byte-per-object if things are tight, and
> accepting the extra compaction loops.

Yes. That's the only reasonable way of dealing with it.

> Bad news- consider Tweak. With lots of processes whizzing away, merely stopping
> the one that did the allocation and triggered the lowspace is not going to be
> much good. Stopping everything except the utterly essential stuff to debug the
> lowspace will be needed. Probably.

Uh, oh. Are you telling me that the "low space stuff" you are referring
to above actually suspends the process that triggers the low-space
condition? Bad, bad, bad idea. Ever considered that this might be the
timer process? The finalization process? Low-space is *not* a
per-process condition; suspending the currently running process is
something that should be done with great care (if at all).

Please, don't suspend that process - put it away for the image to
examine but by all means do NOT suspend it. If you give me a nice clean
semaphore signal for Tweak to handle a low-space condition I know
perfectly well what to do but if you just suspend a random process which
may have absolutely nothing to do with the low space condition, then, yes, we
are in trouble (if this were a tweak scheduler process you'd be totally
hosed).
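
A minimal Python sketch of the behaviour requested here: remember which process crossed the threshold, signal a semaphore once, and suspend nothing. The class and method names are hypothetical and do not correspond to the Squeak VM or image API. `threading.Semaphore` only stands in for the lowspace semaphore.

```python
import threading


class LowSpaceMonitor:
    """Illustrative only: records the triggering process and signals once."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.low_space_semaphore = threading.Semaphore(0)
        self.triggering_process = None   # put away for the image to examine
        self.signalled = False

    def note_allocation(self, process_name, free_bytes_after):
        """Called after an allocation; never suspends the allocating process."""
        if free_bytes_after < self.threshold_bytes and not self.signalled:
            self.signalled = True
            self.triggering_process = process_name
            self.low_space_semaphore.release()   # wake whoever handles lowspace


monitor = LowSpaceMonitor(threshold_bytes=200_000)
monitor.note_allocation("timer", free_bytes_after=500_000)         # plenty left
monitor.note_allocation("finalization", free_bytes_after=150_000)  # crosses threshold
print(monitor.triggering_process)
```

A handler waiting on `low_space_semaphore` can then decide what to do with every process, including the recorded one. The policy stays in the image rather than in the VM.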

Cheers,
   - Andreas


Re: lowspace signalling and handling issues

johnmci

On Apr 30, 2005, at 8:00 PM, Andreas Raab wrote:

> Hi Tim -
>
>> After having problems trying to debug some TK4 code that blew up with
>> lowspace
>> problems but never let me catch and debug, I spent some time adding  
>> the
>> lowspace-process stuff we recently discussed. I had to make a few  
>> alterations
>> to match it up with the latest 64bit clean code but no problems with  
>> that part.
>
> What am I missing? I don't remember low-space stuff - I only remember  
> interrupt-related stuff.

There was a Mantis bug about low-space issues and some patches to record
which process caused the lowspace signal. Mind you, this in my opinion is
wrong.

>
>> Depending upon the exact size of object memory in use the 200kb used  
>> as the
>> lowSpaceThreshold can be gobbled up in one swallow by the
>> initializeMemoryFirstFree: method making sure there is a byte per  
>> object that
>> survived the markPhase. In using useUpMemory we can get to having 4  
>> bytes of
>> free space when the next allocate is attempted.... Ka-Boom.
>
> Well, so don't eat up the memory. There is no reason why  
> initializeMemoryFirstFree: would have to reserve that much memory -  
> like the comment says the reserve "should" be chosen so that  
> compactions can be done in one pass but there is absolutely no such  
> requirement. Multi-pass compactions have happened in the past and  
> there is nothing wrong with them (in a low-space situation).
>
>> This assumes that we really need to have one byte per object of  
>> course. The
>> original rationale was to keep the number of compact loops down to  
>> eight (see
>> Dan's comment in initializeMemoryFirstFree:) for Alan's large demo  
>> image. The
>> nicest solution would be to come up with a way to do our GC &  
>> compacting
>> without needing any extra space. Commence headscratching now... John  
>> suggested
>> making sure the fwd gets less than the byte-per-object if things are  
>> tight, and
>> accepting the extra compaction loops.
>
> Yes. That's the only reasonable way of dealing with it.

What happens is that the fwdblocks calculation grabs all the available free
memory when it's recalculated after the full GC. The check for this
condition actually backs it off to leave one object header free (4 or 6
bytes, I believe). Usually you die right away because someone attempts to
allocate a new context record and we don't have 98ish bytes free. I
gave Tim a change set that attempts to maximise freespace to 100K by
reducing fwdblocks down to 32k; once you hit the 32k limit, freespace
then heads towards zero of course.
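
The policy in that change set might look roughly like the Python sketch below. This is a guess at the logic from the description above, not the change set itself: the 100K and 32k figures come from the post, while the function name, the byte-per-object starting point and the exact clamping order are assumptions.

```python
KB = 1024
TARGET_FREE = 100 * KB   # free space the change set tries to preserve
MIN_FWD = 32 * KB        # floor for the forwarding-block area


def split_after_gc(available_bytes, surviving_objects):
    """Return (fwd_bytes, free_bytes) carved from the space left after a full GC."""
    wanted_fwd = surviving_objects   # one byte per surviving object
    # Shrink the table so TARGET_FREE stays free, but never below MIN_FWD.
    fwd = min(wanted_fwd, max(MIN_FWD, available_bytes - TARGET_FREE))
    # The table can never be larger than what is actually available.
    fwd = min(fwd, available_bytes)
    return fwd, available_bytes - fwd


for avail in (10 * 1024 * KB, 200 * KB, 120 * KB, 20 * KB):
    fwd, free = split_after_gc(avail, surviving_objects=1_000_000)
    print(f"available={avail // KB}K fwd={fwd // KB}K free={free // KB}K")
```

With 200K available the table is cut back so that 100K stays free. Below 132K the table sits at its 32k floor and the remaining free space shrinks towards zero, as described.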

Note that once freespace goes under 200,000 we do signal the lowspace  
semaphore btw.

These changes do require a VM change, but as Tim points out, we did notice
that if you increase the lowspace threshold (say to 1MB, as in my testing
the other night) we get the semaphore signalled with a current VM,
which did not happen before on an unaltered VM.

>
>> Bad news- consider Tweak. With lots of processes whizzing away,  
>> merely stopping
>> the one that did the allocation and triggered the lowspace is not  
>> going to be
>> much good. Stopping everything except the utterly essential stuff to  
>> debug the
>> lowspace will be needed. Probably.
>
> Uh, oh. Are you telling me that the "low space stuff" you are  
> referring to above actually suspends the process that triggers the  
> low-space condition? Bad, bad, bad idea. Ever considered that this  
> might be the timer process? The finalization process? Low-space is  
> *not* a per-process condition; suspending the currently running  
> process is something that should be done with great care (if at all).
>
> Please, don't suspend that process - put it away for the image to  
> examine but by all means do NOT suspend it. If you give me a nice  
> clean semaphore signal for Tweak to handle a low-space condition I  
> know perfectly well what to do but if you just suspend a random  
> process which may have absolutely nothing to do with the low space
> condition, then, yes, we are in trouble (if this were a tweak  
> scheduler process you'd be totally hosed).

Tim and I were considering suspending all user processes, and any others we
don't know to be untouchable; then I pointed out that Tweak
spawns all these processes, so what do we do about them? Certainly we can
call something to say lowspace, Mr Tweak, beware...

The Process Browser logic has a table identifying the processes of the VM;
we assume a process the user created is causing the problem. The
earlier fix suggested stopping the process that was running when the
lowspace condition occurred, but I doubt you can say with 100% certainty
that it is the process in question, and as you know it could be the
finalization process or another critical task. Still, this is not harmful
because the evil process in question is still running and will terminate
your image in short order.

>
> Cheers,
>   - Andreas
>
>
--
===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: lowspace signalling and handling issues

timrowledge
In reply to this post by Andreas.Raab
In message <[hidden email]>
          Andreas Raab <[hidden email]> wrote:

> Hi Tim -
>
> What am I missing? I don't remember low-space stuff - I only remember
> interrupt-related stuff.
Handling the interrupt caused by lowspace condition being signalled - Mantis
1041.

>
> > Depending upon the exact size of object memory in use the 200kb used as the
> > lowSpaceThreshold can be gobbled up in one swallow by the
> > initializeMemoryFirstFree: method making sure there is a byte per object that
> > survived the markPhase. In using useUpMemory we can get to having 4 bytes of
> > free space when the next allocate is attempted.... Ka-Boom.
>
> Well, so don't eat up the memory. There is no reason why
> initializeMemoryFirstFree: would have to reserve that much memory - like
> the comment says the reserve "should" be chosen so that compactions can
> be done in one pass but there is absolutely no such requirement.
> Multi-pass compactions have happened in the past and there is nothing
> wrong with them (in a low-space situation).

See my earlier point (below) about the original rationale for wanting one byte
per object to keep the multi-passness from exceeding eight.

>
> > This assumes that we really need to have one byte per object of course. The
> > original rationale was to keep the number of compact loops down to eight (see
> > Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The
> > nicest solution would be to come up with a way to do our GC & compacting
> > without needing any extra space. Commence headscratching now... John suggested
> > making sure the fwd gets less than the byte-per-object if things are tight, and
> > accepting the extra compaction loops.
>
> Yes. That's the only reasonable way of dealing with it.

It's a plausible way of dealing with the immediate-crash aspect but not the
only way. And it doesn't make it much better if there isn't enough memory to
allow the notifier/debugger to do any work once the signal is raised.

>
> > Bad news- consider Tweak. With lots of processes whizzing away, merely stopping
> > the one that did the allocation and triggered the lowspace is not going to be
> > much good. Stopping everything except the utterly essential stuff to debug the
> > lowspace will be needed. Probably.
>
> Uh, oh. Are you telling me that the "low space stuff" you are referring
> to above actually suspends the process that triggers the low-space
> condition?
No, it doesn't do anything like that. Take a look at the mantis 1041 commentary
to remind yourself what is going on here. (And yes, I knew about lowspace not
being a per-process issue about ten years before Squeak appeared....)

If you take another look at what I wrote I think you'll see that that is exactly
what I was saying; with many processes in process, simply interrupting the one
that happened to push the allocator over the limit isn't a sufficient response.
So we're in agreement about the problem, let's try to find a good solution.
Right now I think I'll find a good solution of aqueous caffeine compounds in
elevated enthalpy dihydrogen monoxide.



tim
--
Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
Useful random insult:- Looks for the "Any" key.


Re: lowspace signalling and handling issues

Andreas.Raab
> If you take another look at what I wrote I think you'll see that that is exactly
> what I was saying; with many processes in process, simply interrupting the one
> that happened to push the allocator over the limit isn't a sufficient response.

*Phew* Thanks, I'm relieved (I was trying to get to the server but I
can't get to it right now).

> So we're in agreement about the problem, let's try to find a good solution.

You know, sometimes I wish we'd have swap space to really utilize. One
of the nice things about swap space is that degradation is continuous so
it's not the sudden "boom - you're out of memory" situation but rather a
graceful "starting to get tight ... getting tighter ... now we're really
running into trouble" situation. And most times you're running out of
patience and interrupt whatever was going on long before you ran out of
swap space.

> Right now I think I'll find a good solution of aqueous caffeine compounds in
> elevated enthalpy dihydrogen monoxide.

*grin*

Cheers,
   - Andreas


Re: lowspace signalling and handling issues

johnmci

On May 1, 2005, at 11:07 AM, Andreas Raab wrote:

>> If you take another look at what I wrote I think you'll see that that
>> is exactly
>> what I was saying; with many processes in process, simply  
>> interrupting the one
>> that happened to push the allocator over the limit isn't a sufficient  
>> response.
>
> *Phew* Thanks, I'm relieved (I was trying to get to the server but I  
> can't get to it right now).
>
>> So we're in agreement about the problem, let's try to find a good  
>> solution.
>
> You know, sometimes I wish we'd have swap space to really utilize. One  
> of the nice things about swap space is that degradation is continuous  
> so it's not the sudden "boom - you're out of memory" situation but  
> rather a graceful "starting to get tight ... getting tighter ... now  
> we're really running into trouble" situation. And most times you're  
> running out of patience and interrupt whatever was going on long  
> before you ran out of swap space.
>

You could tag each process with an instance var that counts memory
allocations, or memory allocation rate; then in a low space condition
you slow down the fastest consumer. If you recall, in the past I had some
code to record dispatch time, since there is only one place in the VM where
the process switch occurs. The same thought applies here: in the image you
could have the lowspace logic consider the fastest memory allocation
consumers.

Perhaps tagging object allocation by process owner would be
interesting; after a full GC we could then know how much memory is
allocated per process...
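
A minimal Python sketch of the bookkeeping suggested above: count allocated bytes per process and let the lowspace logic look at the biggest consumers first. The class and process names are hypothetical, and a real version would live in the VM's allocator and the image's Process class rather than in a dictionary.

```python
from collections import Counter


class AllocationLedger:
    """Illustrative per-process allocation counter."""

    def __init__(self):
        self.bytes_by_process = Counter()

    def record(self, process_name, nbytes):
        """Charge an allocation to the process that requested it."""
        self.bytes_by_process[process_name] += nbytes

    def fastest_consumers(self, n=1):
        """Names of the n processes that have allocated the most bytes."""
        return [name for name, _ in self.bytes_by_process.most_common(n)]


ledger = AllocationLedger()
ledger.record("UI process", 4_000)
ledger.record("runaway loop", 12)        # many tiny allocations...
for _ in range(10_000):
    ledger.record("runaway loop", 12)    # ...add up quickly
ledger.record("finalization", 256)
print(ledger.fastest_consumers(2))
```

Counting bytes rather than allocation calls is a design choice here; a rate (bytes per unit of dispatch time) would need the dispatch-time recording mentioned above.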



>> Right now I think I'll find a good solution of aqueous caffeine  
>> compounds in
>> elevated enthalpy dihydrogen monoxide.
>
> *grin*
>
> Cheers,
>   - Andreas
>
>
--
===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: lowspace signalling and handling issues

timrowledge
In reply to this post by Andreas.Raab
So having the VM provide the oop of the Process that asked for the allocation
which passed the threshold is ok as a start.

I think we need to set the lowSpaceThreshold much higher for a realistic chance
of surviving the alert - several Mb seems to be much safer. Even without the GC
code occasionally stealing a large chunk from the memory we've just declared is
in short supply it doesn't seem smart to wait until memory is seriously tight
before telling the user.  Opening a debugger and doing anything meaningful in
morphic takes a good chunk of space. The current code in the VM pretty much
relies upon there being some more memory to add via sqExpandMemory but
a) not all platforms do that, so it crashes
b) every platform will run low eventually and then see a) above.
Having a threshold of 200kb when the VM may very well demand an extra 200kb as
part of trying to clean up to handle the fact that you don't have 200kb free is
not very safe.

Has anyone ever done any testing to see just how many passes of compacting can
be survived? If we checked the actual free space (perhaps the lowSpaceSignal
state is better) and refused to give any to the fwdBlock table would we simply
be thrashing a bit more, or doomed?  Or should I just give up because nobody
can be bothered to actually think about how to do things right for a change?


tim
--
Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
"Bollocks," said Pooh being more forthright than usual


Re: lowspace signalling and handling issues

johnmci
I've taken it down to 32K on a 512MB image via that code that allocates  
links...
Grinds away until freespace goes under 98 bytes (can't allocate a  
context record).

On May 3, 2005, at 6:54 PM, Tim Rowledge wrote:

> So having the VM provide the oop of the Process that asked for the  
> allocation
> which passed the threshold is ok as a start.
>
> I think we need to set the lowSpaceThreshold much higher for a  
> realistic chance
> of surviving the alert - several Mb seems to be much safer. Even  
> without the GC
> code occasionally stealing a large chunk from the memory we've just
> declared is
> in short supply it doesn't seem smart to wait until memory is  
> seriously tight
> before telling the user.  Opening a debugger and doing anything  
> meaningful in
> morphic takes a good chunk of space. The current code in the VM pretty  
> much
> relies upon there being some more memory to add via sqExpandMemory but
> a) not all platforms do that, so it crashes
> b) every platform will run low eventually and then see a) above.
> Having a threshold of 200kb when the VM may very well demand an extra  
> 200kb as
> part of trying to clean up to handle the fact that you don't have  
> 200kb free is
> not very safe.
>
> Has anyone ever done any testing to see just how many passes of  
> compacting can
> be survived? If we checked the actual free space (perhaps the  
> lowSpaceSignal
> state is better) and refused to give any to the fwdBlock table would  
> we simply
> be thrashing a bit more, or doomed?  Or should I just give up because  
> nobody
> can be bothered to actually think about how to do things right for a  
> change?
>
>
> tim
> --
> Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
> "Bollocks," said Pooh being more forthright than usual
>
>
--
===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================