Hi Guys -
I am rather confused about the current behavior of Squeak in the case of memory allocation failure. In my use case I have incoming network requests which are handled at high I/O priority and need to allocate memory based on the size of the request. Given a malformed request, this can easily lead to an allocation failure, which really should raise an error, be caught, and be done with.

However, there doesn't seem to be a way for the client to handle low-space conditions. In the case of an allocation failure, all that appears to happen is that the low-space semaphore is signaled, with the obvious assumption that the low-space watcher will preempt the running process, make some space, and continue. But equally obviously this can't work if the running process is at a higher priority than the low-space process, and since the running process recurses directly into #basicNew: again, this will bring your system to a screeching halt.

Since I can't possibly be the first person to notice this (or at least I really hope I'm not), my question is: how do people deal with this situation in their production systems? I have never seen the issue discussed, but I would expect it to have come up on some Seaside or other network-related lists.

Right now I'm thinking of signaling an OutOfMemory error whose default action would signal the low-space condition, leaving the client with the option to handle the request differently if needed.

Cheers,
  - Andreas
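P.S. Very roughly what I have in mind; nothing below exists yet, and the class name, #signalLowSpace and #refuseRequest: are all placeholders:

    Error subclass: #OutOfMemory
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Exceptions-Kernel'

    OutOfMemory >> isResumable
        "Allow a handler to free some space and resume the failed allocation."
        ^true

    OutOfMemory >> defaultAction
        "Unhandled: fall back to today's behavior, i.e. poke the low-space
        machinery (placeholder selector), then act like any other unhandled Error."
        Smalltalk signalLowSpace.
        ^super defaultAction

A request handler could then guard its allocation:

    [buffer := ByteArray new: declaredLength]
        on: OutOfMemory
        do: [:ex | self refuseRequest: 'implausible size'. ex return: nil]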
If you set Smalltalk setGCBiasToGrow: 1 you may get different behavior, assuming your VM supports it and bearing in mind the issue I talked about last month with Slang dropping the code. The code was fixed (not Slang), although I don't think Tim has put the fix into VMMaker yet (hint hint).

Really, of course, the VM must signal low space at some point if it can't grow; the low-space process must then run instead of everything else and do whatever it needs to do, and only then should the process doing the memory allocation run again and retry.

I recall that in VW the VM would attempt the allocation; if it failed, it would tell the memory policy to allocate this much more memory plus a percentage of extra slack, then retry (with possible failure), the key point being that the process asking for the memory would be waiting while the VM adjusted its footprint. One failure case in the past was setting the percentage too low, so that other processes would chew up the newly allocated memory, leaving you without the memory you had just asked for. Of course, if the memory requested pushes you over the memoryLimit set for the VM, nothing will help.
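Concretely, a doit along these lines; whether a VM without the support fails silently or raises an error is something to check on your VM, the guard below just assumes the worst:

    "Bias the GC toward growing object memory rather than thrashing full
    collections when allocations get tight (1 = on, 0 = off)."
    [Smalltalk setGCBiasToGrow: 1]
        on: Error
        do: [:ex | Transcript show: 'this VM has no gcBiasToGrow support'; cr]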
On Feb 10, 2007, at 10:40 AM, Andreas Raab wrote:
> [Andreas' original message quoted in full; snipped]

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
On 10-Feb-07, at 9:05 PM, John M McIntosh wrote:
> If you set Smalltalk setGCBiasToGrow: 1 you may get different behavior, assuming your
> VM supports it and bearing in mind the issue with Slang dropping the code I talked
> about last month. The code was fixed, not Slang, although I don't think Tim has put
> the fix into VMMaker yet (hint hint).

Money for time, hint, hint.

> Really of course the VM must signal low space at some point if it can't grow, and the
> low-space process must run instead of everything else and do whatever, then allow the
> process that is doing the memory allocation to run and try again.

Exactly. I'm a bit surprised to read of any process (except possibly the tick etc.) running at a higher priority than the lowspace handler. That seems a bit daft; any process that could possibly allocate memory should be running at a lower priority, unless perhaps one provides some sort of process locking mechanism.

Aside from in-image policy issues of deciding whether to try to grow or not, there are VM complications in the limit set in some cases, as well as a particularly egregious situation where the VM will steal a substantial chunk of the memory that is thought to be available in order to do a bit of GC work. It can go so far as to leave just a few bytes for the allocator, which typically isn't a happy place to end up.

It *is* possible to make a memory-resilient system. The ancient ActiveBook system built from Eliot's BHH was routinely tested by running it down to a few hundred bytes of free memory and a dozen or so free oops (this was an OT system) and it always recovered cleanly. It takes time, which takes money.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: OI: Vey
Also see my notes from
    Subject: Re: lowspace signalling and handling issues
    Date: May 3, 2005 7:56:09 PM PDT (CA)

    > I've taken it down to 32K on a 512MB image via that code that allocates links...
    > Grinds away until freespace goes under 98 bytes (can't allocate a context record).

but there was no interest in sticking those changes into the VM. I would have to hunt for the bits. They removed some complication between the low-memory signal, the upper-boundary check, and doing a GC. However, I think it was simpler to turn on the bias-to-grow logic.

On Feb 10, 2007, at 10:40 AM, Andreas Raab wrote:
> [snip]

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
tim Rowledge wrote:
>> Really of course the VM must signal low space at some point if it can't grow, and the
>> low-space process must run instead of everything else and do whatever, then allow the
>> process that is doing the memory allocation to run and try again.
>
> Exactly. I'm a bit surprised to read of any process (except possibly the tick etc.)
> running at a higher priority than the lowspace handler. That seems a bit daft; any
> process that could possibly allocate memory should be running at a lower priority,
> unless perhaps one provides some sort of process locking mechanism.

There seem to be two misunderstandings here. For one thing, the lowspace watcher runs at lowIOPriority, and there are *plenty* of processes running at that priority or higher. Secondly, even when the low-space semaphore gets signaled (which can be seen as a *hint* to the system that we're in trouble with respect to memory), the outcome of an allocation ought to be either that the memory was allocated or that an error is raised.

What sense does it make for Behavior>>basicNew: to signal the lowspace semaphore? The result is that you can lock up the system as simply as here:

    [Array new: SmallInteger maxVal] forkAt: Processor lowIOPriority.

And what's the point of that? At the very least I would expect that if the allocation within basicNew: fails, we get a proper error condition. But side-effecting by signaling the lowspace semaphore? What good does that do? In particular considering that the lowspace semaphore can't really do anything, because it doesn't even know which process got interrupted! Sorry, but this seems Just Wrong(tm).

> Aside from in-image policy issues of deciding whether to try to grow or not, there are
> VM complications in the limit set in some cases, as well as a particularly egregious
> situation where the VM will steal a substantial chunk of the memory that is thought to
> be available in order to do a bit of GC work. It can go so far as to leave just a few
> bytes for the allocator, which typically isn't a happy place to end up.

Not really. Whether the VM is capable of allocating memory or not is a binary decision. There is nothing complicated about it. Whether it can recover from a failed allocation is of course a different question, but that's why we have the red zone which triggers a low-space condition when we enter it; the red zone is still sufficient to do a variety of things. But when allocation fails, it fails; there is no policy. It fails.

Cheers,
  - Andreas
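P.S. What I would expect, sketched very roughly; the error class is the hypothetical OutOfMemory from my first message, and this is not the current Behavior>>basicNew: source, just the shape I'd like it to have:

    basicNew: sizeRequested
        "Primitive. Answer an instance of the receiver with sizeRequested
        indexable fields. The primitive fails if there is not enough memory."
        <primitive: 71>
        (sizeRequested isInteger and: [sizeRequested >= 0])
            ifFalse: [^self primitiveFailed].
        "The argument was fine, so we ran out of space. Raise a real, resumable
        error instead of recursing behind a side-effecting semaphore signal;
        a handler that frees some space can resume, and we then retry."
        OutOfMemory signal: 'could not allocate ', sizeRequested printString, ' fields'.
        ^self basicNew: sizeRequested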
On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
> In particular considering that the lowspace semaphore can't really do anything,
> because it doesn't even know which process got interrupted!

Andreas,

Does your image have the fix from Mantis 1041?

    "Under certain conditions the low space watcher was unable to determine the
    correct process to suspend following a low space signal. These changes permit
    the VM to remember the identity of the process that caused the low space
    condition, and to report it to the image through a primitive."

Low space notification was badly broken for quite a while, including 3.8 images, but should be somewhat less broken after applying this change. This might affect Squeakland or OLPC images, I'm not sure.

Dave
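P.S. In image terms the intent is roughly this; LowSpaceSemaphore, #primLowSpaceProcessOrNil and #lowSpaceNotify: are placeholder names for illustration, not the ones in the change set:

    lowSpaceWatcher
        "Wait for the VM's low-space signal, then deal with whichever process
        the VM reports as having caused the condition."
        | culprit |
        Smalltalk garbageCollect.                "start from a clean slate"
        LowSpaceSemaphore wait.                  "signaled by the VM from the red zone"
        culprit := Smalltalk primLowSpaceProcessOrNil.
        culprit ifNotNil: [culprit suspend].
        self lowSpaceNotify: culprit             "hand off to UI or logging"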
On Feb 11, 2007, at 2:08 AM, Andreas Raab wrote:
> Not really. Whether the VM is capable of allocating memory or not is a binary decision.
> There is nothing complicated about it. Whether it can recover from a failed allocation
> is of course a different question, but that's why we have the red zone which triggers a
> low-space condition when we enter it; the red zone is still sufficient to do a variety
> of things. But when allocation fails, it fails; there is no policy. It fails.

Well, you get to invent new Policy. My comment about having basicNew tell the VM there is a problem and then retry, failing only *after* something has been done, seemed fairly reasonable. Your example of Array new: SmallInteger maxVal would of course fail, because the Policy (TM) would look at, say, currently used memory + (SmallInteger maxVal) > ceiling of the Mac Carbon VM (which by default is 512k), and thus you're toast.

Some complication exists because, once you are in failure mode, the question is whether that is due to an extraordinary request or because you are hitting the maximum ceiling. One could of course have a chunk of reserved memory that one could free (a couple of MB?). Still, in cases of a recursive runaway process it's difficult to provide enough time for the developer to do something. One could even determine which processes *must* run, versus user processes which could be halted at the time of allocation failure. (A rough sketch of the sort of policy check I mean is below my sig.)

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
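Something along these lines, say; Smalltalk bytesLeft and garbageCollect are real, but the selector itself, the 10% slack and #growMemoryBy: are invented for illustration:

    allocateBytes: byteCount
        "Policy (TM), very roughly: make sure the request plus some slack is
        available before allocating, collecting or growing first if need be."
        | slack |
        slack := byteCount + (byteCount // 10).      "10% extra so others don't eat it"
        Smalltalk bytesLeft < slack ifTrue:
            [Smalltalk garbageCollect.
            Smalltalk bytesLeft < slack ifTrue:
                [self growMemoryBy: slack]].         "ask the VM to grow; may still fail"
        ^ByteArray new: byteCount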
David T. Lewis wrote:
> On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
>> In particular considering that the lowspace semaphore can't really do anything,
>> because it doesn't even know which process got interrupted!
>
> Does your image have the fix from Mantis 1041?

No, but that doesn't really matter. My point was that a low-priority process has no chance to ever interrupt a higher-priority process. And I doubt your fix changes that.

Cheers,
  - Andreas
On 11-Feb-07, at 12:15 PM, Andreas Raab wrote:
> David T. Lewis wrote:
>> On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
>>> In particular considering that the lowspace semaphore can't really do anything,
>>> because it doesn't even know which process got interrupted!
>> Does your image have the fix from Mantis 1041?
>
> No, but that doesn't really matter. My point was that a low-priority process has no
> chance to ever interrupt a higher-priority process. And I doubt your fix changes that.

No, it simply does a somewhat better job of guessing which process might be the problem. We could, as I'm pretty sure we have discussed, find some way to include the oop of the process that caused the allocation problem in the semaphore more directly, which would improve things a touch more by avoiding the possibility of race conditions.

The real problem with identifying the *actually* problematic process is that the allocation request that triggers a lowspace may well not be part of the actual space hog. Suspending the wrong process and letting others run, including maybe the monster, simply leads to more trouble.

If the lowspace handler suspended all other processes, that would obviate some of the problems. If we wanted to interact with users as part of the handler, we might have to permit some other process to start or resume, perhaps under some restrictions. If simply doing a GC solved the space problem, then we could simply allow everything else to resume. And we should remove direct in-VM calls to GC wherever possible, so that in-image code can apply more flexible policies. Having a very high priority process to handle low-space conditions seems like a plausible idea to me; a back-of-envelope sketch is below my sig.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
A bug in the code is worth two in the documentation.
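The sketch; LowSpaceSemaphore is a placeholder, and a real version would want to spare the timer, finalization and friends rather than suspending absolutely everything:

    [[LowSpaceSemaphore wait.
        Process allInstances do: [:each |
            (each ~~ Processor activeProcess and: [each isTerminated not])
                ifTrue: [each suspend]].             "stop the world, monster included"
        Smalltalk garbageCollect.
        "then decide, under image control, what to resume or report"
        ] repeat]
        forkAt: Processor highestPriority.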
On 11-Feb-07, at 2:08 AM, Andreas Raab wrote:
> [snip] that's why we have the red zone which triggers a low space condition when we
> enter it - the red zone is still sufficient to do a variety of things.

This turns out to be incorrect.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Disclaimer: Any errors in spelling, tact, or fact are transmission errors.
tim Rowledge wrote:
> On 11-Feb-07, at 2:08 AM, Andreas Raab wrote:
>> [snip] that's why we have the red zone which triggers a low space condition when we
>> enter it - the red zone is still sufficient to do a variety of things.
>
> This turns out to be incorrect.

In which way? Is it too small? We used to exercise that path regularly in the old days, before the VM would grow memory dynamically, so there is a chance that it hasn't been executed in a while and needs some adjustment. Still, the basic underlying principle of giving advance warning and having the image react to *that*, as opposed to an actual allocation failure, seems sound to me.

Cheers,
  - Andreas
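P.S. For reference, the way the red zone gets armed from the image side, going from memory here, so check SystemDictionary before trusting the exact selectors:

    "Ask the VM to signal the low-space semaphore as soon as free space drops
    below the threshold, and (re)install the watcher process that waits on it."
    Smalltalk installLowSpaceWatcher.

    "which, as far as I recall, boils down to something like"
    Smalltalk primSignalAtBytesLeft: Smalltalk lowSpaceThreshold.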
On 11-Feb-07, at 12:56 PM, Andreas Raab wrote:
> tim Rowledge wrote:
>> This turns out to be incorrect.
>
> In which way? Is it too small? [snip]

http://lists.squeakfoundation.org/pipermail/vm-dev/2005-May/000213.html

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- A one-bit brain with a parity error.