Smalltalk › Squeak › Squeak - Dev

Low-space signals in production environments

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

12 messages Options

Andreas.Raab

Low-space signals in production environments

Hi Guys -

I am just being very confused about the current behavior of Squeak in
the case of memory allocation failure. In my use case I have incoming
network requests which are handled at high I/O priority and need to
allocate memory based on the size of the request. Given a malformed
request, this can easily lead to an allocation failure which really
should raise an error, be caught and be done with.

However, there doesn't seem to be a way of handling low-space conditions
by the client. In the case of an allocation failure, all that appears
to be happening is that the low-space semaphore is being signaled with
the obvious assumption that the low-space watcher will preempt the
running process, make some space and continue. But equally obviously
this just can't work if the running process is at a higher priority than
the low-space process and since the running process recurses directly
into #basicNew: again this will bring your system to a screeching halt.

Since I can't possibly be the first person who noticed that (or at least
I really don't hope I am) my question is, how do people deal with that
situation in their production systems? I have never seen the issue
discussed but I would expect that it has come up on some Seaside or
other network-related lists.

Right now I'm just thinking to do something like signaling an
OutOfMemory error which as its default action would signal the lowspace
condition, leaving the client with the option to handle the request
differently if needed.

Cheers,
- Andreas

johnmci

Re: Low-space signals in production environments

If you set Smalltalk setGCBiasToGrow: 1
you may get different behavior, assume your vm supports that and you
noted the issue with SLANG dropping the code I talked about last
moth. The code was fixed, not slang, although I don't think Tim has
put the fix in to VMMaker yet (hint hint).

Really of course the VM must signal low space at some point if it
can't grow, and the process must run instead of something else do
whatever, then allow the process that is doing the memory allocation
to run and try again.

I recall in VW it would attempt the allocation, if it failed it would
mutter things to the memory policy to allocate this much more memory,
with a % of extra slack, then retry with possible failure, the key
being the process asking for the memory would be waiting for the VM
to adjust the memory footprint. One failure case in the past was
setting the % too low so that other processes would chew up the newly
allocated memory leaving you without the memory you just asked for.
Of course if the memory requested will push you over the memoryLimit
set for the VM, nothing will help.

On Feb 10, 2007, at 10:40 AM, Andreas Raab wrote:

> Hi Guys -
>
> I am just being very confused about the current behavior of Squeak
> in the case of memory allocation failure. In my use case I have
> incoming network requests which are handled at high I/O priority
> and need to allocate memory based on the size of the request. Given
> a malformed request, this can easily lead to an allocation failure
> which really should raise an error, be caught and be done with.
>
> However, there doesn't seem to be a way of handling low-space
> conditions by the client. In the case of an allocation failure,
> all that appears to be happening is that the low-space semaphore is
> being signaled with the obvious assumption that the low-space
> watcher will preempt the running process, make some space and
> continue. But equally obviously this just can't work if the running
> process is at a higher priority than the low-space process and
> since the running process recurses directly into #basicNew: again
> this will bring your system to a screeching halt.
>
> Since I can't possibly be the first person who noticed that (or at
> least I really don't hope I am) my question is, how do people deal
> with that situation in their production systems? I have never seen
> the issue discussed but I would expect that it has come up on some
> Seaside or other network-related lists.
>
> Right now I'm just thinking to do something like signaling an
> OutOfMemory error which as its default action would signal the
> lowspace condition, leaving the client with the option to handle
> the request differently if needed.
>
> Cheers,
> - Andreas
>

--
========================================================================
===
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===

timrowledge

Re: Low-space signals in production environments

On 10-Feb-07, at 9:05 PM, John M McIntosh wrote:

> If you set Smalltalk setGCBiasToGrow: 1
> you may get different behavior, assume your vm supports that and
> you noted the issue with SLANG dropping the code I talked about
> last moth. The code was fixed, not slang, although I don't think
> Tim has put the fix in to VMMaker yet (hint hint).

Money for time, hint, hint.

>
> Really of course the VM must signal low space at some point if it
> can't grow, and the process must run instead of something else do
> whatever, then allow the process that is doing the memory
> allocation to run and try again.

Exactly. I'm a bit surprised to read of any process - except possibly
the tick etc- running at a higher priority that the lowspace handler.
That seems a bit daft; any process that could possibly allocate
memory should be running at a lower process, unless perhaps one
provides some sort of process locking mechanism.

Aside from in-image issues of policy to decide on whether to try to
grow or not, there are VM complications in the limit set in some
cases as well as a particularly egregious situation where the VM will
steal a substantial chunk of the memory that is thought to be
available in order to do a bit of gc work. It can go so far as to
leave just a few bytes for the allocator which typically isn't a
happy place to end up.

It *is* possible to make a memory resilient system. The ancient
ActiveBook system built from Eliot's BHH was routinely tested by
running it down to a few hundred bytes of free memory and a dozen or
so free oops (this was an OT system) and it always recovered cleanly.

It takes tim, which takes money.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: OI: Vey

johnmci

Re: Low-space signals in production environments

In reply to this post by Andreas.Raab

Also see my notes from
Subject: Re: lowspace signalling and handling issues
Date: May 3, 2005 7:56:09 PM PDT (CA)

>I've taken it down to 32K on a 512MB image via that code that
allocates links...
>Grinds away until freespace goes under 98 bytes (can't allocate a
context record).

but there was no interest in sticking those changes into the VM.
Would have to hunt for the bits. It removed some complication between
the low memory signal, the upperboundary check and doing a GC.
However I think it was simpler to turn on the bias to grow logic.

On Feb 10, 2007, at 10:40 AM, Andreas Raab wrote:

Andreas.Raab

Re: Low-space signals in production environments

In reply to this post by timrowledge

tim Rowledge wrote:

>> Really of course the VM must signal low space at some point if it
>> can't grow, and the process must run instead of something else do
>> whatever, then allow the process that is doing the memory allocation
>> to run and try again.
>
> Exactly. I'm a bit surprised to read of any process - except possibly
> the tick etc- running at a higher priority that the lowspace handler.
> That seems a bit daft; any process that could possibly allocate memory
> should be running at a lower process, unless perhaps one provides some
> sort of process locking mechanism.

There seem to be two misunderstandings here. For one thing, the lowspace
watcher runs at lowIOPriority and there are *plenty* of processes
running at that priority or higher.

Secondly, even if when signaling the low space semaphore (which can be
seen as a *hint* to the system that we're in trouble with respect to
memory) the outcome of an allocation ought to be either that the memory
was allocated, or that an error is raised. What sense does it make for
Behavior>>basicNew: to signal the lowspace semaphore? The result is that
you can lock up the system as simply as here:

[Array new: SmallInteger maxVal] forkAt: Processor lowIOPriority.

And what's the point of that? At least I would expect if the allocation
within basicNew: fails we get a proper error condition. But
side-effecting by signaling the lowspace semaphore? What good does that
do? In particular considering that the lowspace semaphore can't really
do anything because it doesn't even know which process got interrupted!
Sorry, but this seems Just Wrong(tm).

> Aside from in-image issues of policy to decide on whether to try to grow
> or not, there are VM complications in the limit set in some cases as
> well as a particularly egregious situation where the VM will steal a
> substantial chunk of the memory that is thought to be available in order
> to do a bit of gc work. It can go so far as to leave just a few bytes
> for the allocator which typically isn't a happy place to end up.

Not really. Whether the VM is capable of allocating memory or not is a
binary decision. There is nothing complicated about it. Whether it can
recover from a failed allocation is of course a different question but
that's why we have the red zone which triggers a low space condition
when we enter it - the red zone is still sufficient to do a variety of
things. But when allocation fails, it fails, there is no policy. It fails.

Cheers,
- Andreas

David T. Lewis

Re: Low-space signals in production environments

On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
> In particular considering that the lowspace semaphore can't really
> do anything because it doesn't even know which process got interrupted!

Andreas,

Does your image have the fix from Mantis 1041?

"Under certain conditions the low space watcher was unable to determine the
correct process to suspend following a low space signal. These changes permit
the VM to remember the identity of the process that caused the low space
condition, and to report it to the image through a primitive."

Low space notification was badly broken for quite a while, including 3.8
images, but should be somewhat less broken after applying this change.
This might affect Squeakland or OLPC images, I'm not sure.

Dave

johnmci

Re: Low-space signals in production environments

In reply to this post by Andreas.Raab

On Feb 11, 2007, at 2:08 AM, Andreas Raab wrote:

>
>
> Not really. Whether the VM is capable of allocating memory or not
> is a binary decision. There is nothing complicated about it.
> Whether it can recover from a failed allocation is of course a
> different question but that's why we have the red zone which
> triggers a low space condition when we enter it - the red zone is
> still sufficient to do a variety of things. But when allocation
> fails, it fails, there is no policy. It fails.

Well you get to invent new Policy.

My comment about having basicNew tell the VM there is a problem and
then retry with failure *after* something has been done, seemed
fairly reasonable.

Your example of Array new: SmallInteger maxVal would of course fail
because the Policy (TM) would look at say oh currently used memory +
(SmallInteger maxVal) > ceiling of Mac Carbon VM (which by default is
512k) thus your toast. Some complication exist because once you
are into failure mode is that because of an extra-ordinary request or
are you hitting the maximum celing . One could of course have a chunk
of reserved memory that one could free (couple of MB?) Still in
cases of recursive runaway process it's difficult to provide enough
time for the developer to do something.

One could even determine which processes *must* run, versus user
processes which could be halted at the time of allocation failure.

>
> Cheers,
> - Andreas
>

--
========================================================================
===
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===

Andreas.Raab

Re: Low-space signals in production environments

In reply to this post by David T. Lewis

David T. Lewis wrote:
> On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
>> In particular considering that the lowspace semaphore can't really
>> do anything because it doesn't even know which process got interrupted!
>
> Does your image have the fix from Mantis 1041?

No, but that doesn't really matter. My point was that a low-priority
process has no chance to ever interrupt a higher-priority process. And I
doubt your fix changes that.

Cheers,
- Andreas

> "Under certain conditions the low space watcher was unable to determine the
> correct process to suspend following a low space signal. These changes permit
> the VM to remember the identity of the process that caused the low space
> condition, and to report it to the image through a primitive."
>
> Low space notification was badly broken for quite a while, including 3.8
> images, but should be somewhat less broken after applying this change.
> This might affect Squeakland or OLPC images, I'm not sure.
>
> Dave
>
>
>
>

timrowledge

Re: Low-space signals in production environments

On 11-Feb-07, at 12:15 PM, Andreas Raab wrote:

> David T. Lewis wrote:
>> On Sun, Feb 11, 2007 at 02:08:29AM -0800, Andreas Raab wrote:
>>> In particular considering that the lowspace semaphore can't
>>> really do anything because it doesn't even know which process got
>>> interrupted!
>> Does your image have the fix from Mantis 1041?
>
> No, but that doesn't really matter. My point was that a low-
> priority process has no chance to ever interrupt a higher-priority
> process. And I doubt your fix changes that.

No, it simply does a somewhat better job of guessing which process
might be the problem.

We could, as I'm pretty sure we have discussed, find some way to
include the oop of the process that caused the allocation problem in
the semaphore more directly, which would improve things a touch more
by avoiding the possibility of race conditions. The real problem with
identifying the *actually* problematic process is that the allocation
request that triggers a lowspace may well not be part of the actual
space hog. Suspending the wrong process and letting others -
including maybe the monster - simply leads to more trouble.

If the lowspace handler suspended all other processes it would
obviate some of the problems. If we wanted to interact with users as
part of the handler we might have to permit some other process to
start or resume, perhaps under some restrictions. If simply doing a
gc solved the space problem then we could simply allow everything
else to resume. And we should remove direct in-vm calls to gc
wherever possible so that in-image code can apply more flexible
policies.

Having a very high priority process to handle low space conditions
seems like a plausible idea to me.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
A bug in the code is worth two in the documentation.

timrowledge

Re: Low-space signals in production environments

In reply to this post by Andreas.Raab

On 11-Feb-07, at 2:08 AM, Andreas Raab wrote:

>
> [snip] that's why we have the red zone which triggers a low space
> condition when we enter it - the red zone is still sufficient to do
> a variety of things.

This turns out to be incorrect.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Disclaimer: Any errors in spelling, tact, or fact are transmission
errors.

Andreas.Raab

Re: Low-space signals in production environments

tim Rowledge wrote:
> On 11-Feb-07, at 2:08 AM, Andreas Raab wrote:
>
>> [snip] that's why we have the red zone which triggers a low space
>> condition when we enter it - the red zone is still sufficient to do a
>> variety of things.
>
> This turns out to be incorrect.

In which way? Is it too small? We used to execute it regularly in the
old days before the VM would grow memory dynamically, so there is a
chance that this hasn't been executed in a while and needs some
adjustment. Still, the basic underlying principle of giving advanced
warning and have the image react to *that* as opposed to an actual
allocation failure seems sound to me.

Cheers,
- Andreas

timrowledge

Re: Low-space signals in production environments

On 11-Feb-07, at 12:56 PM, Andreas Raab wrote:

> tim Rowledge wrote:
>> On 11-Feb-07, at 2:08 AM, Andreas Raab wrote:
>>> [snip] that's why we have the red zone which triggers a low space
>>> condition when we enter it - the red zone is still sufficient to
>>> do a variety of things.
>> This turns out to be incorrect.
>
> In which way? Is it too small? We used to execute it regularly in
> the old days before the VM would grow memory dynamically, so there
> is a chance that this hasn't been executed in a while and needs
> some adjustment. Still, the basic underlying principle of giving
> advanced warning and have the image react to *that* as opposed to
> an actual allocation failure seems sound to me.
>
> Cheers,
> - Andreas
>

We collectively discussed this a while back:-
http://lists.squeakfoundation.org/pipermail/vm-dev/2005-May/000213.html

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- A one-bit brain with a parity error.