After having problems trying to debug some TK4 code that blew up with lowspace problems but never let me catch and debug, I spent some time adding the lowspace-process stuff we recently discussed. I had to make a few alterations to match it up with the latest 64bit clean code but had no problems with that part.

After building a VM I started testing with some of the methods in SystemDictionary 'memory space' - in particular #useUpMemory. It is perhaps fortunate that I did, since the other #useUp* methods pretty much work once the lowspace process is caught by the vm and passed up to the image.

After a _lot_ of head scratching by John & me we found that a gazillion tiny objects (Links are the smallest possible objects that can exist on their own; plain Objects would have to be contained in a collection and so would cost the same 3 words per object) cause a catastrophic GC explosion. What happens is that memory fills up until we get to signal lowspace and then we are in danger. Depending upon the exact size of object memory in use, the 200kb used as the lowSpaceThreshold can be gobbled up in one swallow by the initializeMemoryFirstFree: method making sure there is a byte per object that survived the markPhase. In using useUpMemory we can get to having 4 bytes of free space when the next allocate is attempted.... Ka-Boom.

Expanding the lowSpaceThreshold (along with the VM changes to report the process and avoid the accidental problem of interrupting eventTickler) to a couple of Mb makes it ok on my machine, and the threshold can be a lot lower with the other tests that create bigger (hence fewer) objects (hence smaller fwdTable needs).

In the worst case, we could have a very large OM filled with very small objects all surviving markPhase; in such a case we would need an additional 1/12 of OM available for the fwdTable (one byte for each 3-word, 12-byte object), which works out to 1/13 of the total. So for a 30Mb objectmemory we ought to set the lowSpaceThreshold to 30/13 => 2.31Mb + actual space needed to run the notifier/debugger etc for reasonable safety. Or hide the 2.31Mb away so the image never even knows it is there. If you are using virtual memory and a limit of 512Mb then you should perhaps secrete 40Mb somewhere safe.

This assumes that we really need to have one byte per object, of course. The original rationale was to keep the number of compact loops down to eight (see Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The nicest solution would be to come up with a way to do our GC & compacting without needing any extra space. Commence headscratching now... John suggested making sure the fwd gets less than the byte-per-object if things are tight, and accepting the extra compaction loops.

Good news - with the vm change and 2Mb lowSpaceThreshold I can probably go back and find my TK4 problem(s).

Bad news - consider Tweak. With lots of processes whizzing away, merely stopping the one that did the allocation and triggered the lowspace is not going to be much good. Stopping everything except the utterly essential stuff to debug the lowspace will be needed. Probably.

More bad news - somehow, going from VMM37b5 to b6 cost 40% of performance on my machine :-( Bugger.

tim
--
Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
Every bug you find is the last one.

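The worst-case reserve arithmetic in the post above can be sketched as follows. This is only an illustration, assuming 12-byte minimum objects (3 words of 4 bytes) and one reserved byte per surviving object; the function name `worst_case_fwd_reserve` is made up for the example and is not part of the VM.

```python
MB = 1024 * 1024

def worst_case_fwd_reserve(om_bytes, min_object_bytes=12, reserve_per_object=1):
    """Bytes to hold back for the forwarding table when object memory is
    packed with minimum-size objects that all survive the mark phase."""
    # Every object costs its own size plus its share of the reserve, so the
    # reserve is reserve / (size + reserve) of the total: 1/13 by default.
    return om_bytes * reserve_per_object / (min_object_bytes + reserve_per_object)

for om_mb in (30, 512):
    reserve_mb = worst_case_fwd_reserve(om_mb * MB) / MB
    print(f"{om_mb}Mb object memory -> about {reserve_mb:.2f}Mb reserve")
```

Space for the notifier/debugger has to be added on top of this figure.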
Hi Tim -

> After having problems trying to debug some TK4 code that blew up with lowspace
> problems but never let me catch and debug, I spent some time adding the
> lowspace-process stuff we recently discussed. I had to make a few alterations
> to match it up with the latest 64bit clean code but no problems with that part.

What am I missing? I don't remember low-space stuff - I only remember interrupt-related stuff.

> Depending upon the exact size of object memory in use the 200kb used as the
> lowSpaceThreshold can be gobbled up in one swallow by the
> initializeMemoryFirstFree: method making sure there is a byte per object that
> survived the markPhase. In using useUpMemory we can get to having 4 bytes of
> free space when the next allocate is attempted.... Ka-Boom.

Well, so don't eat up the memory. There is no reason why initializeMemoryFirstFree: would have to reserve that much memory - like the comment says, the reserve "should" be chosen so that compactions can be done in one pass but there is absolutely no such requirement. Multi-pass compactions have happened in the past and there is nothing wrong with them (in a low-space situation).

> This assumes that we really need to have one byte per object of course. The
> original rationale was to keep the number of compact loops down to eight (see
> Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The
> nicest solution would be to come up with a way to do our GC & compacting
> without needing any extra space. Commence headscratching now... John suggested
> making sure the fwd gets less than the byte-per-object if things are tight, and
> accepting the extra compaction loops.

Yes. That's the only reasonable way of dealing with it.

> Bad news- consider Tweak. With lots of processes whizzing away, merely stopping
> the one that did the allocation and triggered the lowspace is not going to be
> much good. Stopping everything except the utterly essential stuff to debug the
> lowspace will be needed. Probably.

Uh, oh. Are you telling me that the "low space stuff" you are referring to above actually suspends the process that triggers the low-space condition? Bad, bad, bad idea. Ever considered that this might be the timer process? The finalization process? Low-space is *not* a per-process condition; suspending the currently running process is something that should be done with great care (if at all).

Please, don't suspend that process - put it away for the image to examine but by all means do NOT suspend it. If you give me a nice clean semaphore signal for Tweak to handle a low-space condition I know perfectly well what to do, but if you just suspend a random process which may have absolutely nothing to do with the low space condition, then, yes, we are in trouble (if this were a Tweak scheduler process you'd be totally hosed).

Cheers,
  - Andreas

On Apr 30, 2005, at 8:00 PM, Andreas Raab wrote:

> Hi Tim -
>
>> After having problems trying to debug some TK4 code that blew up with lowspace
>> problems but never let me catch and debug, I spent some time adding the
>> lowspace-process stuff we recently discussed. I had to make a few alterations
>> to match it up with the latest 64bit clean code but no problems with that part.
>
> What am I missing? I don't remember low-space stuff - I only remember
> interrupt-related stuff.

There was a mantis bug about low-space issues and some patches to record which process caused the lowspace signal. Mind you, this in my opinion is wrong.

>> Depending upon the exact size of object memory in use the 200kb used as the
>> lowSpaceThreshold can be gobbled up in one swallow by the
>> initializeMemoryFirstFree: method making sure there is a byte per object that
>> survived the markPhase. In using useUpMemory we can get to having 4 bytes of
>> free space when the next allocate is attempted.... Ka-Boom.
>
> Well, so don't eat up the memory. There is no reason why
> initializeMemoryFirstFree: would have to reserve that much memory -
> like the comment says the reserve "should" be chosen so that
> compactions can be done in one pass but there is absolutely no such
> requirement. Multi-pass compactions have happened in the past and
> there is nothing wrong with them (in a low-space situation).
>
>> This assumes that we really need to have one byte per object of course. The
>> original rationale was to keep the number of compact loops down to eight (see
>> Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The
>> nicest solution would be to come up with a way to do our GC & compacting
>> without needing any extra space. Commence headscratching now... John suggested
>> making sure the fwd gets less than the byte-per-object if things are tight, and
>> accepting the extra compaction loops.
>
> Yes. That's the only reasonable way of dealing with it.

What happens is that the fwdblocks calculation grabs all the available free memory when it's recalculated after the full GC. The check for this condition actually backs it off to allow one object header free (4 or 6 bytes, I believe). Usually you die right away because someone attempts to allocate a new context record and we don't have 98ish bytes free.

I gave Tim a change set that attempts to maximise freespace to 100K by reducing fwdblocks down to 32k; once you hit the 32k limit, freespace then heads towards zero of course. Note that once freespace goes under 200,000 we do signal the lowspace semaphore, btw.

These changes do require a VM change, but we did notice, as Tim points out, that if you increase the lowspace threshold (say to 1MB, in my testing the other night) we'll get the semaphore signaled with a current VM; before, this would not occur in an unaltered VM.

>> Bad news- consider Tweak. With lots of processes whizzing away, merely stopping
>> the one that did the allocation and triggered the lowspace is not going to be
>> much good. Stopping everything except the utterly essential stuff to debug the
>> lowspace will be needed. Probably.
>
> Uh, oh. Are you telling me that the "low space stuff" you are
> referring to above actually suspends the process that triggers the
> low-space condition? Bad, bad, bad idea. Ever considered that this
> might be the timer process? The finalization process? Low-space is
> *not* a per-process condition; suspending the currently running
> process is something that should be done with great care (if at all).
>
> Please, don't suspend that process - put it away for the image to
> examine but by all means do NOT suspend it. If you give me a nice
> clean semaphore signal for Tweak to handle a low-space condition I
> know perfectly well what to do but if you just suspend a random
> process which may have absolutely nothing with the low space
> condition, then, yes, we are in trouble (if this were a tweak
> scheduler process you'd be totally hosed).

Tim and I were considering suspending all user processes, plus any others we don't know to be untouchable; then I pointed out that Tweak spawns all these processes, so what do we do about them? Certainly we can call something to say "lowspace, Mr Tweak, beware"... The Process Browser logic has a table identifying processes of the VM; we assume a process the user created is causing the problem.

The earlier fix suggested stopping the process that was running when the lowspace condition occurred, but I doubt you can say with 100% certainty that it is the process in question; as you know, it could be the finalization process or some other critical task. Still this is not harmful because the evil process in question is still running and will terminate your image in short order.

> Cheers,
>   - Andreas

--
===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================

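One possible reading of the change set John describes (keep about 100K free by shrinking the forwarding-block area, but never below 32k) is sketched here. The constants come from the post; the function name `split_after_gc` and the exact clamping order are assumptions made for illustration, not the actual change set.

```python
FREE_TARGET = 100 * 1024   # free space the change set tries to preserve
FWD_FLOOR = 32 * 1024      # smallest forwarding-block area it shrinks to

def split_after_gc(available, survivors, bytes_per_object=1):
    """Split `available` bytes left after a full GC into
    (forwarding-block bytes, free bytes). Hypothetical helper."""
    wanted = survivors * bytes_per_object            # usual one byte per object
    # Give the fwd area what it wants unless that would eat into the free
    # target; when shrinking, stop at the floor.
    fwd = min(wanted, max(available - FREE_TARGET, FWD_FLOOR))
    fwd = min(fwd, available)                        # cannot exceed what exists
    return fwd, available - fwd

# Plenty of room, then tight, then below the floor (1,000,000 survivors each).
for avail in (10 * 1024 * 1024, 200 * 1024, 120 * 1024, 20 * 1024):
    print(avail, split_after_gc(avail, survivors=1_000_000))
```

Once the floor is reached, the free space shrinks with every further allocation, which matches the "heads towards zero" behaviour described above.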
In reply to this post by Andreas.Raab
In message <[hidden email]>
Andreas Raab <[hidden email]> wrote:

> Hi Tim -
>
> What am I missing? I don't remember low-space stuff - I only remember
> interrupt-related stuff.

Handling the interrupt caused by the lowspace condition being signalled - Mantis 1041.

> > Depending upon the exact size of object memory in use the 200kb used as the
> > lowSpaceThreshold can be gobbled up in one swallow by the
> > initializeMemoryFirstFree: method making sure there is a byte per object that
> > survived the markPhase. In using useUpMemory we can get to having 4 bytes of
> > free space when the next allocate is attempted.... Ka-Boom.
>
> Well, so don't eat up the memory. There is no reason why
> initializeMemoryFirstFree: would have to reserve that much memory - like
> the comment says the reserve "should" be chosen so that compactions can
> be done in one pass but there is absolutely no such requirement.
> Multi-pass compactions have happened in the past and there is nothing
> wrong with them (in a low-space situation).

See my earlier point (below) about the original rationale for wanting one byte per object to keep the multi-passness from exceeding eight.

> > This assumes that we really need to have one byte per object of course. The
> > original rationale was to keep the number of compact loops down to eight (see
> > Dan's comment in initializeMemoryFirstFree:) for Alan's large demo image. The
> > nicest solution would be to come up with a way to do our GC & compacting
> > without needing any extra space. Commence headscratching now... John suggested
> > making sure the fwd gets less than the byte-per-object if things are tight, and
> > accepting the extra compaction loops.
>
> Yes. That's the only reasonable way of dealing with it.

It's a plausible way of dealing with the immediate-crash aspect but not the only way. And it doesn't make it much better if there isn't enough memory to allow the notifier/debugger to do any work once the signal is raised.

> > Bad news- consider Tweak. With lots of processes whizzing away, merely stopping
> > the one that did the allocation and triggered the lowspace is not going to be
> > much good. Stopping everything except the utterly essential stuff to debug the
> > lowspace will be needed. Probably.
>
> Uh, oh. Are you telling me that the "low space stuff" you are referring
> to above actually suspends the process that triggers the low-space
> condition?

No, it doesn't do anything like that. Take a look at the mantis 1041 commentary to remind yourself what is going on here. (And yes, I knew about lowspace not being a per-process issue about ten years before Squeak appeared....)

If you take another look at what I wrote I think you'll see that that is exactly what I was saying; with many processes in process, simply interrupting the one that happened to push the allocator over the limit isn't a sufficient response.

So we're in agreement about the problem, let's try to find a good solution.

Right now I think I'll find a good solution of aqueous caffeine compounds in elevated enthalpy dihydrogen monoxide.

tim
--
Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
Useful random insult:- Looks for the "Any" key.

> If you take another look at what I wrote I think you'll see that that is exactly
> what I was saying; with many processes in process, simply interrupting the one
> that happened to push the allocator over the limit isn't a sufficient response.

*Phew* Thanks, I'm relieved (I was trying to get to the server but I can't get to it right now).

> So we're in agreement about the problem, let's try to find a good solution.

You know, sometimes I wish we had swap space to really utilize. One of the nice things about swap space is that degradation is continuous, so it's not the sudden "boom - you're out of memory" situation but rather a graceful "starting to get tight ... getting tighter ... now we're really running into trouble" situation. And most times you run out of patience and interrupt whatever was going on long before you run out of swap space.

> Right now I think I'll find a good solution of aqueous caffeine compounds in
> elevated enthalpy dihydrogen monoxide.

*grin*

Cheers,
  - Andreas

On May 1, 2005, at 11:07 AM, Andreas Raab wrote:

>> If you take another look at what I wrote I think you'll see that that is exactly
>> what I was saying; with many processes in process, simply interrupting the one
>> that happened to push the allocator over the limit isn't a sufficient response.
>
> *Phew* Thanks, I'm relieved (I was trying to get to the server but I
> can't get to it right now).
>
>> So we're in agreement about the problem, let's try to find a good solution.
>
> You know, sometimes I wish we'd have swap space to really utilize. One
> of the nice things about swap space is that degradation is continuous
> so it's not the sudden "boom - you're out of memory" situation but
> rather a graceful "starting to get tight ... getting tighter ... now
> we're really running into trouble" situation. And most times you're
> running out of patience and interrupt whatever was going on long
> before you ran out of swap space.

You could tag each process with an instance var that counts memory allocations, or the memory allocation rate; then in a low space condition you slow down the fastest consumer. If you recall, in the past I had some code to record dispatch time, since there is only one place in the VM where the process switch occurs. The same thought applies here: in the image you could have the lowspace logic consider the fastest memory allocation consumers.

Perhaps tagging object allocation by process owner would be interesting; after a full GC you could then know how much memory is allocated per process...

>> Right now I think I'll find a good solution of aqueous caffeine compounds in
>> elevated enthalpy dihydrogen monoxide.
>
> *grin*
>
> Cheers,
>   - Andreas

===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================

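The per-process accounting idea above might look roughly like this. It is a toy model only: the class `AllocationLedger`, its methods, and the process names are all invented for the example, and a real version would live in the VM's allocator and the image's Process class.

```python
from collections import defaultdict

class AllocationLedger:
    """Toy per-process allocation counter (hypothetical, not VM code)."""

    def __init__(self, essential=()):
        self.essential = set(essential)           # processes never to throttle
        self.bytes_by_process = defaultdict(int)

    def record(self, process, nbytes):
        """Charge an allocation of nbytes to the process that asked for it."""
        self.bytes_by_process[process] += nbytes

    def fastest_consumer(self):
        """Non-essential process with the most bytes since the last reset."""
        candidates = {p: n for p, n in self.bytes_by_process.items()
                      if p not in self.essential}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

    def reset(self):
        """Start a new sampling interval so totals approximate a rate."""
        self.bytes_by_process.clear()

ledger = AllocationLedger(essential={"timer", "finalization"})
ledger.record("timer", 5_000_000)      # essential, so never picked
ledger.record("ui", 40_000)
ledger.record("link-maker", 900_000)
print(ledger.fastest_consumer())
```

Excluding an explicit set of essential processes reflects the concern raised earlier in the thread that the process running at the moment of the signal may be the timer or finalization process.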
In reply to this post by Andreas.Raab
So having the VM provide the oop of the Process that asked for the allocation which passed the threshold is ok as a start.

I think we need to set the lowSpaceThreshold much higher for a realistic chance of surviving the alert - several Mb seems to be much safer. Even without the GC code occasionally stealing a large chunk from the memory we've just declared is in short supply, it doesn't seem smart to wait until memory is seriously tight before telling the user. Opening a debugger and doing anything meaningful in morphic takes a good chunk of space. The current code in the VM pretty much relies upon there being some more memory to add via sqExpandMemory but
a) not all platforms do that, so it crashes
b) every platform will run low eventually and then see a) above.
Having a threshold of 200kb when the VM may very well demand an extra 200kb as part of trying to clean up to handle the fact that you don't have 200kb free is not very safe.

Has anyone ever done any testing to see just how many passes of compacting can be survived? If we checked the actual free space (perhaps the lowSpaceSignal state is better) and refused to give any to the fwdBlock table would we simply be thrashing a bit more, or doomed? Or should I just give up because nobody can be bothered to actually think about how to do things right for a change?

tim
--
Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
"Bollocks," said Pooh being more forthright than usual

I've taken it down to 32K on a 512MB image via that code that allocates links... It grinds away until freespace goes under 98 bytes (can't allocate a context record).

On May 3, 2005, at 6:54 PM, Tim Rowledge wrote:

> So having the VM provide the oop of the Process that asked for the allocation
> which passed the threshold is ok as a start.
>
> I think we need to set the lowSpaceThreshold much higher for a realistic
> chance of surviving the alert - several Mb seems to be much safer. Even
> without the GC code occasionally stealing a large chunk from the memory we've
> just declared is in short supply it doesn't seem smart to wait until memory is
> seriously tight before telling the user. Opening a debugger and doing anything
> meaningful in morphic takes a good chunk of space. The current code in the VM
> pretty much relies upon there being some more memory to add via sqExpandMemory
> but
> a) not all platforms do that, so it crashes
> b) every platform will run low eventually and then see a) above.
> Having a threshold of 200kb when the VM may very well demand an extra 200kb as
> part of trying to clean up to handle the fact that you don't have 200kb free
> is not very safe.
>
> Has anyone ever done any testing to see just how many passes of compacting can
> be survived? If we checked the actual free space (perhaps the lowSpaceSignal
> state is better) and refused to give any to the fwdBlock table would we simply
> be thrashing a bit more, or doomed? Or should I just give up because nobody
> can be bothered to actually think about how to do things right for a change?
>
> tim
> --
> Tim Rowledge, [hidden email], http://sumeru.stanford.edu/tim
> "Bollocks," said Pooh being more forthright than usual

===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================

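To put a rough number on "how many passes", here is a back-of-envelope sketch. It assumes forwarding blocks of 8 bytes each, which is the size that makes one reserved byte per object correspond to the eight-pass worst case mentioned earlier in the thread, and it assumes each compaction pass can move at most one object per forwarding block. The function name `compaction_passes` is invented for the example.

```python
import math

FWD_BLOCK_BYTES = 8  # assumed size of one forwarding block

def compaction_passes(survivors, fwd_bytes):
    """Passes needed to move `survivors` objects when each pass can use
    at most fwd_bytes // FWD_BLOCK_BYTES forwarding blocks."""
    blocks = fwd_bytes // FWD_BLOCK_BYTES
    if blocks == 0:
        raise ValueError("no room for even one forwarding block")
    return math.ceil(survivors / blocks)

# One reserved byte per object gives the familiar worst case.
print(compaction_passes(survivors=1_000_000, fwd_bytes=1_000_000))

# A 512MB image filled with 12-byte Links, fwd area squeezed to 32K.
links = (512 * 1024 * 1024) // 12
print(compaction_passes(survivors=links, fwd_bytes=32 * 1024))
```

Under these assumptions the squeezed case needs on the order of ten thousand passes, which would be consistent with the "grinds away" observation, though the real figure depends on how many objects actually have to move.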