Efficient thread-local shared variables


Andreas.Raab

Folks -

For a variety of reasons I am in dire need of the ability to vector
shared variables (globals, class vars and pool vars) through an extra
indirection vector per process (really per island but binding per
process seems to be simpler for now). Since I need this for *each and
every shared variable* it needs to be *very* efficient.

The question is: What is the most efficient way to implement such a
scheme? There are a couple of ways I can think of:

1) Just use a dictionary. The main disadvantage is the lookup cost which
could be handled by making it a special kind of dictionary and
implementing the lookup in a primitive. This is a good fallback position
but probably just a little slow in general. It could be implemented by
something along the lines of:

ProtoObject>>lookup: sharedBinding
    "Look up the value of the given shared binding in the currently
    executing process."
    ^Processor activeProcess scope at: sharedBinding ifAbsent: [nil].

which is pretty straightforward.

2) Use message lookup, i.e., send a message. This is simple to describe
but not necessarily simple to implement correctly. Here is what the
simulation would look like:

ProtoObject>>lookup: sharedBinding
    "Look up the value of the given shared binding in the currently
    executing process."
    ^[Processor activeProcess scope perform: sharedBinding key]
        on: MessageNotUnderstood do: [:ex | ex return: nil].

One problem here is that the key needs to be unique among all possible
keys, which is a problem if there is a name conflict. This can be
resolved by implicitly prefixing names with the place where they are
defined, so it's not such a big deal conceptually, but practically the
impact of that change might be more visible.

The other problem is that the scope object needs to hold all the objects
which means quite a number of them. OTOH, one could argue that in many
ways "Smalltalk" is just an object with a few thousand iVars so having a
class representing the namespace defined by Smalltalk may be quite
reasonable.

3) Use "some" integer index caching scheme. The main idea here is in
realizing that really, option #2 doesn't quite work since classes can't
have more than 256 iVars so we'd need to have an indirection through an
array to be able to access these variables. If that is so, then why
can't we inline the entire access pattern and have the scope just be an
array that we index directly?

This is actually the most interesting approach to me because (as far as
I can tell) it would be by far the most efficient. The basic idea goes
like this: If all shared variables are assigned a "global index" then
only this index is required to use them. Any use of the shared variable
Foo would be inlined to "Processor activeProcess scope at: FooIndex"
which (given proper primitive support) would probably be by far the
fastest version (if offered a byte code it should rival the current
speed of accessing shared variables). [I'll admit that there are some
tricky issues with this approach as well, like the size needed for the
scope object and whether or not to use hash lookup instead of indexing]
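
A minimal workspace sketch of the indexing idea in option 3, under stated assumptions: `indexOf` is a hypothetical registry that hands out global indices, and plain Arrays stand in for the per-process scope objects (`Processor activeProcess scope` is the accessor proposed above, not an existing one).

```smalltalk
"Hypothetical sketch: Arrays stand in for per-process scope objects."
| indexOf nextIndex fooIndex scopeA scopeB |
indexOf := Dictionary new.   "shared variable name -> global index"
nextIndex := 0.
fooIndex := indexOf at: #Foo ifAbsentPut: [nextIndex := nextIndex + 1].

scopeA := Array new: 8.      "scope of one process"
scopeB := Array new: 8.      "scope of another process"
scopeA at: fooIndex put: 'Foo as seen by A'.
scopeB at: fooIndex put: 'Foo as seen by B'.

"A use of Foo reduces to one indexed read on the current scope."
scopeA at: fooIndex
```

Both scopes share the same index for Foo but hold different values, which is the per-process indirection being asked for. The fixed array size here sidesteps the sizing question raised in the bracketed note.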

In any case, I'm trying to gather options. If any of you have any new
ideas or have tried one or the other (successfully or not) or have any
other comments to make I'd love to hear about it.

Cheers,
- Andreas

Klaus D. Witzel

Re: Efficient thread-local shared variables

Hi Andreas,

on Tue, 24 Oct 2006 06:46:26 +0200, you wrote:

> Folks -
>
> For a variety of reasons I am in dire need of the ability to vector
> shared variables (globals, class vars and pool vars) through an extra
> indirection vector per process (really per island but binding per
> process seems to be simpler for now). Since I need this for *each and
> every shared variable* it needs to be *very* efficient.
>
> The question is: What is the most efficient way to implement such a
> scheme?

The fastest indirect access is through literal variables (limited only by
the # of literals allowed per method).

Since you are willing to spend a #symbol per variable, formally declare a
"descriptor" to be a class var (or use a pool). Take #PerProcessThing as
an example; initialize PerProcessThing to an instance of a subclass of
Association which holds a fast and fixed Array index.

Then all you need in the scope of activeProcess is a shared Array which is
indexed by the above machinery. Example use:

PerProcessThing localSharedValue
PerProcessThing localSharedValue: somethingElse

Not counting "Processor activeProcess scope", the above is the fastest
double-indirect access that I can think of.

/Klaus

Klaus D. Witzel

Re: Efficient thread-local shared variables

In reply to this post by Andreas.Raab

Hi Nicolas,

on Tue, 24 Oct 2006 09:39:31 +0200, you wrote:
> Hi Klaus and Andreas
> I find the local shared variable feature most useful.
> I'm trying to understand your suggestions,
...
> Klaus:
> - have a single SharedPool with values being an array, and an index per
> process?

No, one Array per process and one integer index per shared variable.

> pseudo code for PerProcessThing:
> PerProcessThing localSharedValue
> where localSharedValue is (^self at: Processor activeProcess
> processIndex)

No, Association subclass #LocalSharedVariable and then
LocalSharedVariable>>localSharedValue
    <primitive: 4711>
    "this is what the primitive does faster:"
    ^ Processor activeProcess scope localSharedArray at: value

LocalSharedVariable's key is the same as the key in the PerProcessThing
association (for convenience), and LocalSharedVariable's value is the
integer index into the localSharedArray.

If you spend a primitive on the implementation, the example would
compile to

pushLiteralVariable: (#PerProcessThing -> aLocalSharedVariable)
send: #localSharedValue "handled by primitive 4711"

So that is exactly two bytecodes (without a context switch) and, since you
at least need to name a "descriptor" and say what you want from it (get or
set a value), this is the least number of bytecodes necessary.
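
A minimal sketch of the descriptor class described above, under stated assumptions: the scope array is passed in explicitly rather than fetched from the active process, no primitive is involved, and the selectors `localSharedValueIn:` and `localSharedValue:in:` as well as the category name are hypothetical. Methods are shown in the Class>>selector notation used in this thread and would be entered in a browser; the last part is workspace code.

```smalltalk
"Hypothetical sketch, plain Smalltalk fallback only (no primitive)."
Association subclass: #LocalSharedVariable
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-LocalShared'.

LocalSharedVariable>>localSharedValueIn: aScopeArray
    "value holds the fixed integer index of this variable."
    ^ aScopeArray at: value

LocalSharedVariable>>localSharedValue: anObject in: aScopeArray
    "Store anObject in the slot reserved for this variable."
    ^ aScopeArray at: value put: anObject

"Workspace usage: one descriptor, one stand-in scope array."
| var scope |
var := LocalSharedVariable key: #PerProcessThing value: 1.
scope := Array new: 8.
var localSharedValue: 'hello' in: scope.
var localSharedValueIn: scope
```

Passing the array explicitly keeps the sketch independent of how the scope is attached to a process; the proposal above would instead reach it through the active process inside the primitive.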

> - have to reset all the shared value arrays each time a process is
> created or dies...

This is independent of any proposal: you always have to allocate the
shared value array per process, as in
[[self allocateSharedValueArray.
self doTheJob] ensure:
[self destroySharedValueArray]] fork

> I am not sure I understood Klaus's proposal well.
> Did I get it?

I think so :)

/Klaus

> Nicolas
>
> On Tuesday 24 October 2006 07:33, Klaus D. Witzel wrote:
>> ...