Efficient thread-local shared variables


Efficient thread-local shared variables

Andreas.Raab
Folks -

For a variety of reasons I am in dire need of the ability to vector
shared variables (globals, class vars and pool vars) through an extra
indirection vector per process (really per island but binding per
process seems to be simpler for now). Since I need this for *each and
every shared variable* it needs to be *very* efficient.

The question is: What is the most efficient way to implement such a
scheme? There are a couple of ways I can think about:

1) Just use a dictionary. The main disadvantage is the lookup cost, which
could be handled by making it a special kind of dictionary and
implementing the lookup in a primitive. This is a good fallback position
but probably just a little slow in general. It could be implemented by
something along the lines of:

ProtoObject>>lookup: sharedBinding
     "Look up the value of the given shared binding in the currently
executing process."
     ^Processor activeProcess scope at: sharedBinding ifAbsent:[nil].

which is pretty straightforward.
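
For concreteness, here is a minimal sketch of the pieces this option
presumes; the #scope accessor (and ivar) on Process and the #bind:to:
helper are assumptions for illustration, not existing Squeak API:

Process>>scope
     "Lazily create the per-process dictionary mapping shared bindings
     to their process-local values."
     ^scope ifNil: [scope := IdentityDictionary new]

ProtoObject>>bind: sharedBinding to: anObject
     "Give sharedBinding a process-local value in the active process."
     Processor activeProcess scope at: sharedBinding put: anObject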

2) Use message lookup, i.e., send a message. This is simple to describe
but not necessarily simple to implement correctly. Here is what the
simulation would look like:

ProtoObject>>lookup: sharedBinding
     "Look up the value of the given shared binding in the currently
executing process."
     ^[Processor activeProcess scope perform: sharedBinding key]
        on: MessageNotUnderstood do:[:ex| ex return: nil].

One problem here is that the key needs to be unique among all possible
keys, which is a problem if there is a name conflict. This can be
resolved by implicitly prefixing names with the place where they are
defined, so it's not such a big deal conceptually, but practically the
impact of that change might be more visible.

The other problem is that the scope object needs to hold all the objects,
which means quite a number of them. OTOH, one could argue that in many
ways "Smalltalk" is just an object with a few thousand iVars, so having a
class representing the namespace defined by Smalltalk may be quite
reasonable.

3) Use "some" integer index caching scheme. The main idea here is in
realizing that option #2 doesn't quite work anyway, since classes can't
have more than 256 iVars, so we'd need an indirection through an
array to be able to access these variables. If that is so, then why
can't we inline the entire access pattern and have the scope just be an
array that we index directly?

This is actually the most interesting approach to me because (as far as
I can tell) it would be by far the most efficient. The basic idea goes
like this: if all shared variables are assigned a "global index", then
only this index is required to use them. Any use of the shared variable
Foo would be inlined to "Processor activeProcess scope at: FooIndex",
which (given proper primitive support) would probably be by far the
fastest version (if given a bytecode, it should rival the current
speed of accessing shared variables). [I'll admit that there are some
tricky issues with this approach as well, like the size needed for the
scope object and whether or not to use hash lookup instead of indexing.]
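
To make option 3 concrete, a minimal sketch of the two halves, index
allocation and inlined access; SharedScope, its Registry class variable
and the #scope array accessor are illustrative names only:

SharedScope class>>indexFor: sharedBinding
     "Assign a stable small integer index to the given shared binding
     (Registry is assumed to be a class-side IdentityDictionary)."
     ^Registry at: sharedBinding ifAbsentPut: [Registry size + 1]

"What each use of the shared variable Foo would then be inlined to,
with FooIndex baked in by the compiler:"
Processor activeProcess scope at: FooIndex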

In any case, I'm trying to gather options. If any of you have any new
ideas or have tried one or the other (successfully or not) or have any
other comments to make I'd love to hear about it.

Cheers,
   - Andreas


Re: Efficient thread-local shared variables

Klaus D. Witzel
Hi Andreas,

on Tue, 24 Oct 2006 06:46:26 +0200, you wrote:

> Folks -
>
> For a variety of reasons I am in dire need of the ability to vector  
> shared variables (globals, class vars and pool vars) through an extra  
> indirection vector per process (really per island but binding per  
> process seems to be simpler for now). Since I need this for *each and  
> every shared variable* it needs to be *very* efficient.
>
> The question is: What is the most efficient way to implement such a  
> scheme?

The fastest indirect access is through literal variables (limited only by  
the # of literals allowed per method).

Since you are willing to spend a #symbol per variable, formally declare a  
"descriptor" to be a class var (or use a pool). Take #PerProcessThing as  
an example; initialize PerProcessThing to a subinstance of Association  
which holds a fast and fixed Array index.

Then all you need in the scope of activeProcess is a shared Array which is  
indexed by the above machinery. Example use:

        PerProcessThing localSharedValue
        PerProcessThing localSharedValue: somethingElse

Not counting "Processor activeProcess scope", the above is the fastest  
double-indirect access that I can think of.
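
A minimal sketch of how such a descriptor might be set up;
LocalSharedVariable (an Association subclass whose value slot holds the
fixed array index) and the literal index 1 are illustrative names only:

     "declare the descriptor once, e.g. in a class-side initializer"
     PerProcessThing := LocalSharedVariable key: #PerProcessThing value: 1.

     "all further access then goes through the descriptor"
     PerProcessThing localSharedValue.
     PerProcessThing localSharedValue: somethingElse.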

/Klaus



Re: Efficient thread-local shared variables

Michael van der Gulik
In reply to this post by Andreas.Raab
Andreas Raab wrote:

> Folks -
>
> For a variety of reasons I am in dire need of the ability to vector
> shared variables (globals, class vars and pool vars) through an extra
> indirection vector per process (really per island but binding per
> process seems to be simpler for now). Since I need this for *each and
> every shared variable* it needs to be *very* efficient.
>
> The question is: What is the most efficient way to implement such a
> scheme? There are a couple of ways I can think about:

<snip>

> In any case, I'm trying to gather options. If any of you have any new
> ideas or have tried one or the other (successfully or not) or have any
> other comments to make I'd love to hear about it.

What is the bigger problem that you're trying to solve?

How often will the value of the shared variable differ between the one
in the literal association and the per-process value? If they don't
differ much, perhaps you can do some clever hack in the Interpreter's
fetch-literal bytecode implementations to first check whether that literal
has an "override" for the current process. I hope this is understandable.
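
A rough image-level sketch of that check (the real change would live in
the Interpreter's push-literal-variable bytecode; the per-process
#overrides dictionary and its accessor are assumptions):

valueOfLiteral: aBinding
     "Answer the per-process override for aBinding if one exists,
     otherwise the ordinary value stored in the literal association."
     | overrides |
     overrides := Processor activeProcess overrides.
     ^(overrides notNil and: [overrides includesKey: aBinding])
          ifTrue: [overrides at: aBinding]
          ifFalse: [aBinding value]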

Michael.



Re: Efficient thread-local shared variables

Nicolas Cellier-3
In reply to this post by Andreas.Raab
Hi Klaus and Andreas
I find the local shared variable feature most useful.
I'm trying to understand your suggestions, sorry for being slow-brained...

Andreas:
- have a SharedPool per process; the compiler replaces symbolic names with integer indices
 pseudo-code for PerProcessThing:
 Processor activeProcess sharedPool at: PerProcessThingCompiledIndex
- have to recompile code when an entry is removed from the sharedPool
 (efficiency relies on this event being rare, as for instance variables)

Klaus:
- have a single SharedPool with values being an array, and an index per process?
  pseudo-code for PerProcessThing:
  PerProcessThing localSharedValue
  where localSharedValue is (^self at: Processor activeProcess processIndex)
- have to reset all the shared value arrays each time a process is created or dies...

I am not sure I understood Klaus's proposition well.
Did I get it?

Nicolas




Re: Efficient thread-local shared variables

Reinout Heeck
In reply to this post by Andreas.Raab
>
> 2) Use message lookup, e.g., send a message. This is simple to  
> describe but not necessarily simple to implement correctly. Here is  
> how the simulation would look like:
>


I have experimented with this in VisualWorks (to implement a new  
namespace system, not per-process pools) with great results regarding  
speed. The two concepts I needed (stolen from Forth) were Vocabulary  
(a Dictionary) and SearchOrder (a stack of those dictionaries). My  
breakthrough came when I realized that my search rules in a  
SearchOrder were the same as those the VM uses when doing method  
lookup.

So I implemented a Vocabulary 'instance' as a pair of an anonymous  
class and a singleton instance of that class: lookup is done by  
sending a message to the instance (fast), insertion/removal is done  
by adding/removing methods in the class (slow).
The methods are very simple: they return their first literal, where  
the literal slot holds the value to be returned from the dictionary  
for the key assigned as that method's selector.
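
A minimal sketch of building such a Vocabulary in Squeak terms; how the
anonymous class is obtained differs between dialects, so
#newAnonymousSubclass is an assumption here:

     | vocabClass vocab |
     vocabClass := Object newAnonymousSubclass.
     "one zero-argument method per key, answering its value as a literal"
     vocabClass compile: 'Foo ^ #theValueBoundToFoo'.
     vocab := vocabClass new.
     vocab perform: #Foo.   "lookup is an ordinary message send"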

> ProtoObject>>lookup: sharedBinding
>     "Look up the value of the given shared binding in the currently  
> executing process."
>     ^[Processor activeProcess scope perform: sharedBinding key]
>        on: MessageNotUnderstood do:[:ex| ex return: nil].


In my scheme you can get rid of the cost of this exception handler by  
reimplementing #doesNotUnderstand: on the anonymous class so it  
raises a KeyNotFound error (or returns nil as your snippet suggests).  
In VW #doesNotUnderstand: is fully supported by the VM optimizations  
(ICs, PICs, hoisting etc.), making lookup misses a fast path; I don't  
know whether Squeak does the same in the case of #doesNotUnderstand:,  
you may want to measure that.

ProtoObject>>lookup: sharedBinding
        ^Processor activeScope perform: sharedBinding key
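
For example, a sketch of the override on the anonymous class matching
the nil-returning behaviour of the snippet above:

doesNotUnderstand: aMessage
     "Lookup miss: answer nil, mirroring the at:...ifAbsent: [nil]
     variant; signal a KeyNotFound-style error here instead if misses
     should be loud."
     ^nil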



> One problem here is that the key needs to be unique within all  
> possible keys which is a problem if there is a name conflict. This  
> can be resolved by implicitly prefixing names with the place where  
> they are defined so it's not such big of a deal conceptually but  
> practically the impact of that change might be more visible.


This is where SearchOrder enters the picture (assuming your conflicts  
arise because you don't model scoping).
SearchOrder is implemented by chaining Vocabularies through the  
#superclass attribute of the anonymous class. The VM will run through  
this chain when a miss happens on the first Vocabulary and the nice  
thing is that it will probably optimize future lookup of this key (by  
hoisting).
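
Sketched in code, with processVocabClass and globalVocabClass standing
for the anonymous classes of two Vocabularies (Behavior>>superclass:
exists, but rewiring live classes like this is the delicate part):

     "put the per-process Vocabulary in front of the global one"
     processVocabClass superclass: globalVocabClass.
     "a miss in processVocabClass now falls through to globalVocabClass
     via ordinary method lookup, ending in #doesNotUnderstand: if
     nothing matches anywhere in the SearchOrder"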

Implementing scoping is obviously more involved since you need to  
know /where from/ the lookup is performed.

Naively:
        ^(Processor activeScopeFor: self class) perform: sharedBinding key

Perhaps you can make gains by caching the per-process SearchOrders  
belonging only to this calling scope in the sharedBinding?

        ^(sharedBinding scopeFor: Processor activeProcess) perform:  
sharedBinding key
        "we lost the 'self class' because the binding already 'knows' that"

That last snippet clearly wants to be implemented on sharedBinding  
instead of ProtoObject  ;-)

>
> The other problem is that the scope object needs to hold all the  
> objects which means quite a number of them. OTOH, one could argue  
> that in many ways "Smalltalk" is just an object with a few thousand  
> iVars so having a class representing the namespace defined by  
> Smalltalk may be quite reasonable.

In my scheme Smalltalk would be modeled by an anonymous class that  
implements thousands of methods, each with one literal; no ivars are  
involved.
So you are limited by the maximum number of methods a class may  
implement. I guess that is effectively infinite for our purposes, but if Squeak  
does have an upper limit on the number of methods per class you can  
work around that by splitting the oversized Vocabularies into  
multiple Vocabularies that are placed adjacent in the SearchOrder.





I don't actively follow this list, so please CC [hidden email] if you  
have any questions.

Furthermore I'm vain, so please credit me if you decide to run with  
this :-)

R
-






Re: Efficient thread-local shared variables

Klaus D. Witzel
In reply to this post by Nicolas Cellier-3
Hi Nicolas,

on Tue, 24 Oct 2006 09:39:31 +0200, you wrote:
> Hi Klaus and Andreas
> I find the local shared variable feature most useful.
> I'am trying to understand your suggestions,
...
> Klaus:
> - have a single SharedPool with values being an array, and an index per  
> process?

No, one Array per process and one integer index per shared variable.

>   pseudo code for PerProcessThing:
>   PerProcessThing localSharedValue
>   where localSharedValue is (^self at: Processor activeProcess  
> processIndex)

No, Association subclass #LocalSharedVariable and then
LocalSharedVariable>>localSharedValue
        <primitive: 4711>
        "this is what the primitive does faster:"
        ^ Processor activeProcess scope localSharedArray at: value

LocalSharedVariable's key is the same as the key in the PerProcessThing  
association (for convenience), and LocalSharedVariable's value is the  
integer index into the localSharedArray.

If you spend a primitive implementation, then the example compiles to

        pushLiteralVariable: (#PerProcessThing -> aLocalSharedVariable)
        send: #localSharedValue "handled by primitive 4711"

So that's exactly two bytecodes (without a context switch), and since you at
least need to name a "descriptor" and what you want from it (get or set a
value), this is the least number of bytecodes necessary.
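
The setter that the same scheme implies would look analogous; the
companion primitive number is of course hypothetical:

LocalSharedVariable>>localSharedValue: anObject
        <primitive: 4712>
        "this is what the primitive does faster:"
        ^ Processor activeProcess scope localSharedArray at: value put: anObject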

> - have to reset all the shared value arrays each time a process is  
> created or die...

This is independent of any proposal; you always have to allocate the  
shared value array per process, like in
        [[self allocateSharedValueArray.
         self doTheJob] ensure:
         [self destroySharedValueArray]] fork
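
A sketch of what the allocation helper could do; the #localSharedArray:
setter on the scope and the class-side #maxAllocatedIndex are assumptions:

allocateSharedValueArray
        "Give the active process a fresh value array sized for all
        currently allocated local shared variable indices."
        Processor activeProcess scope localSharedArray:
                (Array new: LocalSharedVariable maxAllocatedIndex)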

> I am not sure i understood well Klaus proposition
> Did i get it ?

I think so :)

/Klaus





Re: Efficient thread-local shared variables

Nicolas Cellier-3
In reply to this post by Andreas.Raab

Thanks Klaus, I get it better now.




Re: Efficient thread-local shared variables

Klaus D. Witzel
In reply to this post by Reinout Heeck
Hi Reinout,

I have almost the same implementation for two purposes:

- the slots of (clones of) a prototype
- the roles of a player

The setters/getters are made from template methods which share a new  
literal variable (another concept is used for slots).

When you say that insertion/removal is slow, what do you compare that to:  
add/remove to/from an OrderedCollection, or just compared to set/get?

Have you considered alternatives, any comparisons or ideas that you can  
share?

/Klaus





Re: Efficient thread-local shared variables

Nicolas Cellier-3
In reply to this post by Andreas.Raab
So you have to share the first literal between the read and write access methods?
Like this pseudo bytecode:
key
  ^(literalAt: 1) value

key: aValue
  ^(literalAt: 1) value: aValue





Re: Efficient thread-local shared variables

Diego Gomez Deck
In reply to this post by Andreas.Raab
Hi Andreas,

> For a variety of reasons I am in dire need of the ability to vector
> shared variables (globals, class vars and pool vars) through an extra
> indirection vector per process (really per island but binding per
> process seems to be simpler for now). Since I need this for *each and
> every shared variable* it needs to be *very* efficient.

Is it not possible to switch (hot-swap) the dictionaries containing the shared
variables whenever a process switch occurs?

Cheers,

-- Diego



Re: Efficient thread-local shared variables

Reinout Heeck
In reply to this post by Nicolas Cellier-3
On Oct 24, 2006, at 12:18 PM, [hidden email] wrote:

> So you have to share the first literal between read and write  
> access methods ?
> like pseudo byte code:
> key
>   ^(literalAt: 1) value
>
> key: aValue
>   ^(literalAt: 1) value: aValue

Your code confuses me, but you seem to be talking about implementing  
Association behavior. In my scheme CompiledMethods come closest to  
this role.

So getting to the 'association' could be implemented on the anonymous  
class as

associationAt: aSymbol
   ^self compiledMethodAt: aSymbol

then CompiledMethod could implement

key
   ^self selector

value
   ^self localAt: 1


Changing the value is something I haven't needed/tried yet because in  
VW matters are complicated by the JIT cache that would (might?) need  
to be flushed. This requires me getting to the class that holds this  
CompiledMethod. Conceptually:

value: anObject
   self localAt: 1 put: anObject.
   self classThatImplementsMe flushVMMethodCashesFor: self selector.


Overall I'm not too keen on modeling full dictionary behavior because  
my 'dictionary' is already different in that it is reified as two  
objects (the anonymous class and its singleton instance). The class  
can be regarded as a mirror-like accessor to the dictionary for doing  
'meta' queries like getting to an 'association' and for altering the  
dictionary contents. The singleton instance is used for lookup only  
(any support methods implemented on it would clash with keys you  
might want to use; currently only #doesNotUnderstand: is implemented  
on my anon class).




One of my pet peeves about St is that #doesNotUnderstand: should be  
implemented on the class side so it doesn't pollute the object's  
selector namespace...

R
-




Re: Efficient thread-local shared variables

Reinout Heeck
In reply to this post by Klaus D. Witzel

On Oct 24, 2006, at 12:13 PM, Klaus D. Witzel wrote:

>
> When you say that insertion/removal is slow, what do you compare  
> that to: add/remove to/from an OrderedCollection, or just compared  
> to set/get?

I was comparing to a Dictionary, which does add/remove in near  
constant time.
In VW a MethodDictionary is implemented as a SortedCollection with  
the CompiledMethods sorted by the #identityHash value of their  
selector. So inserting/removing involves moving (copying) half of the  
collection one slot up/down.

>
> Have you considered alternatives, any comparisions or ideas that  
> you can share?

No, this was a full-blown case of premature optimization: I now have  
a fast Vocabulary/SearchOrder implementation but haven't progressed  
to the real meat of implementing a new namespace system yet :-(


R
-



Re: Efficient thread-local shared variables

Klaus D. Witzel
Hi Reinout,

on Tue, 24 Oct 2006 16:30:39 +0200, you wrote:

> On Oct 24, 2006, at 12:13 PM, Klaus D. Witzel wrote:
>
>>
>> When you say that insertion/removal is slow, what do you compare that  
>> to: add/remove to/from an OrderedCollection, or just compared to  
>> set/get?
>
> I was comparing to a Dictionary which does add/remove in near constant  
> time.
> In VW a MethodDictionary is implemented as a SortedCollection with the  
> CompiledMethods sorted by the #identityHash value of their selector.

Ah, the price tag of GOF binary search :|

> So inserting/removing involves moving (copying) half of the collection  
> one slot up/down.
>
>>
>> Have you considered alternatives, any comparisions or ideas that you  
>> can share?
>
> No, this was a full blown case of premature optimization, I now have a  
> fast Vocabulary/SearchOrder implementation but haven't progressed to the  
> real meat of implementing a new namespace system yet :-(

I don't know why, but the discussion with you "inspired" three new  
operations for my implementation:

- specialization, which clones the anonymous class (shares the method dictionary)
- proliferation of name space, which locally copies the method dictionary  
("rapidly" doubles it, still shares methods, variables and state)
- proliferation of state, which locally clones ("un-shares") a variable (and so  
its state)

Smalltalk rules 8-)

/Klaus

> R
> -
>




Re: Efficient thread-local shared variables

Nicolas Cellier-3
In reply to this post by Andreas.Raab
Yes, despite the very confusing names I used, you understood me well: "key" was to be replaced with any variable name lying in the pool dictionary.

Thank you for these explanations.

I understand that having mutable literals would break the VW JIT.
It may be worse if we use a trick to share a literal reference between two methods (the getter and the setter), and that trick is not recognized at the JIT level...

In order to keep the traditional compiler optimizations, you could maybe use a class var for each pool dictionary entry in your anonymous class instead of a literal reference.
This way, there is no need to flush the VM cache whenever the value changes...
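
A sketch of that variant, with Foo/FooValue as purely illustrative names: the anonymous class gets one class variable per pool entry, and the generated accessors read and write that class var rather than a method literal:

Foo
     "getter compiled onto the anonymous class; FooValue is a class variable"
     ^FooValue

Foo: anObject
     "setter; assigning to a class var needs no literal mutation and
     therefore no method-cache flush"
     FooValue := anObject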

Compared to traditional global variable access that directly addresses the association (with value or value:), this costs a first method call to get the thread-dependent pool dictionary and an additional method lookup to get/set the value.

Could this kind of method lookup be sped up by an inline cache à la Strongtalk?


Nicolas





Re: Efficient thread-local shared variables

Bert Freudenberg
Just as a side note on efficiency - globals are much more often read  
than written. The original goal might even be fulfilled by read-only  
variables (not sure about that). So optimizing for read is essential,  
but I have a hunch that writing could be much slower without any  
noticeable effect on performance.

- Bert -




Re: Efficient thread-local shared variables

Andreas.Raab
Bert Freudenberg wrote:
> Just as a side note on efficiency - globals are much more often read than
> written. The original goal might even be fulfilled by read-only
> variables (not sure about that). So optimizing for read is essential,
> but I have a hunch that writing could be much slower without any
> noticeable effect on performance.

In general, I agree, but I have my own hunch, which is that if writing is
fast they'll be written to much more than if it's slow ;-) So basically,
I don't want to have to assume that writing is slow if it can be avoided.

BTW, thanks for all the pointers, ideas, and other comments. I really
appreciate the discussion and I am still in the midst of thinking about
a good implementation. I'll share my thoughts later.

Cheers,
   - Andreas


Re: Efficient thread-local shared variables

Reinout Heeck
In reply to this post by Nicolas Cellier-3

On Oct 24, 2006, at 6:07 PM, [hidden email] wrote:

> Yes, despite very confusing names i used, you understood me well,  
> "key" was to be replaced with any variable name lying in the pool  
> dictionary.


Ah, I needed that extra bit of explanation.

I don't implement setters on my Vocabulary object (I don't need them),  
only getters. Setting is done by manipulating the MethodDictionary  
through the anonymous class in my system. So the sharing of a  
reference between setter and getter is a non-issue for me.


>>> key
>>>   ^(literalAt: 1) value
>>>
>>> key: aValue
>>>   ^(literalAt: 1) value: aValue

Your code suggests that literal slots hold value holders; this is not  
the case (at least in VW).


Since I don't use setters it is possible to have an entry with a key  
that ends in a colon like #key: (I can use any arbitrary object as  
selector/key in VW).

Searching for that entry would look like

   ^aVocabulary perform: #key:

Note the missing #withArguments:; all my entries are looked up by  
#performing a zero-arg method, even if the key suggests otherwise.
Put differently: all the methods are zero-arg, just returning their  
first literal.
As selector I can use *any* object including confusing Symbols that  
suggest arguments...



R
-




Re: Efficient thread-local shared variables

Klaus D. Witzel
In reply to this post by Klaus D. Witzel
Hi Reinout,

on Tue, 24 Oct 2006 18:44:49 +0200, you wrote:

> Klaus wrote:
>>
>> I don't know why but the discussion with you "inspired" three new  
>> operations for my implementation:
>>
>> - specialization, clones the anonymous class (shares the method  
>> dictionary)
>> - proliferation of name space, locally copies the method dictionary  
>> ("rapidly" doubles it, still shares methods, variables and state)
>
> I presume you want to do above at Process creation time, and

Yes, e.g. at "schema creation" time.

>> - proliferation of state, locally clones "un-shares" a variable (and so  
>> its state)
>
> on global var assignment.

Yes, e.g. at "schema evolution" time.

>
> Another approach could be to (ab)use SearchOrder. Assuming you still  
> have something like a 'global or 'default' scope you can put that at the  
> bottom of your SearchOrder and have a per-thread Vocabulary (perhaps  
> with scoping support) at the top of the SearchOrder.
>
> So at process creation time you only need to create an *empty*  
> Vocabulary for each scope and chain it to the global SearchOrder,  
> reading a global var will now fall through to the global definitions  
> while assigning a new value to that var will put it in the topmost  
> Vocabulary masking the global state of that variable.

:)

> Thinking this through even further: you could defer the creation of the  
> thread-local Vocabularies in the SearchOrder until the first write is  
> done, if the code only does reads the Process creation overhead can be  
> pretty close to zero :-)
>

Thank you for sharing the ideas!

/Klaus

>
> R
> -
>