How about atomic value-swap bytecode?


Re: [Pharo-project] How about atomic value-swap bytecode?

Eliot Miranda-2


On Wed, Oct 13, 2010 at 9:41 AM, Tom Rushworth <[hidden email]> wrote:
Hi Igor, Eliot,

Eliot, I'm not sure I understood your comment:

On 2010-Oct-13, at 08:55, Eliot Miranda wrote:

> Right.  And because Smalltak can't reify variables and CAS is an operation on a variable CAS can't be implemented as a primitive on
> variables.  There's no way to express "pass a variable to a primitive", only "pass an expression (which may be the value of a variable)" to a
> primitive".  One could do it with a primitive on an object, e.g. thisContext at: tempIndex compareWith: match andSetTo: expr, or anObject
> instVarAt: varIndex compareWith: match andSetTo: expr but that's soooo ugly.  Hence I think it is better done using a special assignment
> operator.

I mean one can't write

    | t |
    self compare: t andSwap: expr

because in the above the rvalue of t is passed, not its lvalue, and cas needs to operate on an lvalue (rvalue == read value, or value of a variable; lvalue = location value, or address of a variable).  Clearly cas needs to apply to a variable, not the value in the variable.  The variable's value is compared and the variable is set to a new value if it matches.  Look at Igor's example and you'll see his primitive on Arrays passes an lvalue since the primitive takes the index of an array element.

e.g., if you implemented CAS in C using functions instead of macros you'd know that you couldn't do it with int cas(int var, int match, int new), but that you can do it with int cas(int *var, int match, int new) and ... int v; ... cas(&v, match, new).
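
Sketched against C11 atomics (with illustrative names such as cas and new_value, not taken from any existing VM source), the pointer-taking form reads roughly:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* The variable is passed by address (an lvalue); a value parameter could
       never update the caller's variable. */
    static bool cas(atomic_int *var, int match, int new_value)
    {
        /* Atomically: if (*var == match) { *var = new_value; return true; }
           otherwise return false.  'match' is a local copy, so having it
           overwritten on failure is harmless here. */
        return atomic_compare_exchange_strong(var, &match, new_value);
    }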



Doesn't this apply to Igor's proposed atomic swap as well?  The target in Igor's example was a variable "locked" which was clearly meant to be visible to other threads.

Yes.  The first time I read Igor's post I only saw the initial 

"| var1 var2 temp |

temp := var1.
var1 := var2.
var2 := temp."

which I rejected and didn't read further.  Now I see the :=:.  I don't like this because it doesn't map as nicely to CAS, which I like using.  I like the boolean result one gets from CAS that says the assignment either happened or it didn't.  Atomic swap is more low-level.  I then have to inspect the value the variable used to have and undo the swap if it wasn't what was expected.  So

        | var |
        var ?= expr : match

is equivalent to

        [| var scratch |
        scratch := expr.
        var :=: scratch.
        scratch == match
            ifTrue: [true]
            ifFalse: [var :=: scratch. false]] valueUninterruptibly

Hence I think :=: is too low-level.

 

If that's the case, then they both need to be implemented with a special assignment operator, and Igor and I can get back to arguing the merits of the atomic swap versus the atomic compare and swap :).  If not, then Igor's swap has a clear advantage and I guess you can both ignore the rest of this email...

Um, mine /does/ use a special assignment operator, "?= :" :)
 

On 2010-Oct-13, at 08:33, Igor Stasenko wrote:

> On 13 October 2010 18:07, Tom Rushworth <[hidden email]> wrote:
>> Hi all,
>>
>> Take a look through the literature for the "strength" of the various types of atomic operations with respect to doing various synchronization tasks.  CAS is "universal" in the sense that it can simulate all of the others.  (The simulation is wretched, O(N**3), so it isn't practical, but...)
>> I doubt that your atomic swap is as strong.
>
> Who said i gonna use it for synchronizations tasks?
> We got a plenty of other stuff in smalltalk for synchronization.

I'm using "synchronization" in the sense of safe access to data in a multi-threaded context.  There is no point at all to atomic ops unless you are doing that kind of "synchronization".  In particular, your example was for implementing a lock, which I took to be a mutex-style lock from looking at your code.  Your example is in fact implementing a CAS spinloop for a boolean value.  My counter example just used CAS explicitly :).  You can use an unconditional swap for boolean values because there are only two possible values, so that setting the 'locked' variable to true unconditionally doesn't hurt anything: either it was already true and you didn't change anything visible to another thread, or it was the desired previous value (false) and you did change things visible to another thread in a planned, predictable way.  If the variable in question was a pointer this would not be a safe operation, because you have no way to predict what the value of temp should be.
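
For what it's worth, the boolean case just described is the classic test-and-set lock; a minimal C11 sketch using only unconditional swap (the helper names are made up):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Spin until the unconditional swap returns false, i.e. we are the one
       who flipped 'locked' from false to true.  Safe only because a boolean
       has just two states, as discussed above. */
    static void spin_lock(atomic_bool *locked)
    {
        while (atomic_exchange(locked, true))
            ;                          /* someone else holds it; try again */
    }

    static void spin_unlock(atomic_bool *locked)
    {
        atomic_store(locked, false);
    }
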
>
>>
>> Pretty well every current HW architecture actually has the CAS instruction (or some minor variation) implemented as a single instruction, so making it a primitive should be relatively easy. It seems a terrible waste to pick a weak synchronization primitive when a much better one could be had for very little more work.
>
> I feel that you are mixing hardware and language-side atomicity.
> I don't care about hardware , i care that at language side i have a
> simple atomic operation, which can't be interrupted by scheduler.
> Hardware CAS is completely irrelevant to this, since VM can support
> atomicity for language side without any specific requirements from the
> hardware it runs on.
>
>
>>
>> On 2010-Oct-12, at 12:04, Igor Stasenko wrote:
>>
>>> CAS? Perhaps. If you wish to add it too, no problem.
>>>
>>> I want at minimum swap, because with it i can easily implement
>>> lock-free waiting:
>>>
>>> [ | temp |
>>>  [ locked ] whileTrue.
>>>    temp := true.
>>>    locked :=: temp.
>>>    temp
>>> ] whileTrue.
>>
>> I think this is simpler (using Eliot's notation):
>>
>>  [locked ?:= true : false] whileFalse.
>>
>>   ... locked here ...
>>
>>   locked := false.
>>
>> No temp needed, single block. well understood atomic op :).
>
> Yeah? Then try and implement a unconditional atomic swap using CAS.

I'll look up the paper in a short while, but for now, how about:

   [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.

Yes, this makes the retry loop explicit, but the retry loop exists in _any_ atomic op at some level.  It may be buried in the HW where the microcode does the retry if there is memory cache line contention or some such but it is always there.  "Atomic" implies blocking other threads, so that the other (blocked) threads have to retry.  In the case of the atomic op being a VM level primitive the retry loop is implicit in the scheduler, in the sense that the scheduler cannot schedule other threads for the duration of the primitive.
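
To make that concrete, here is the loop above as a rough C11 sketch (the names are illustrative): an unconditional swap built out of CAS plus an explicit retry.

    #include <stdatomic.h>

    /* Unconditional swap expressed with CAS: retry until the CAS succeeds
       against the value we just read.  Returns the value that was replaced.
       (C11 also provides this directly as atomic_exchange.) */
    static int swap_via_cas(atomic_int *var, int new_value)
    {
        int old_value = atomic_load(var);
        while (!atomic_compare_exchange_weak(var, &old_value, new_value))
            ;   /* on failure, old_value is refreshed with the current contents */
        return old_value;
    }
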
>
> Again, who said, that i going to use atomic swap for loops like above?

Um, your example said it? :)  Sorry if I misunderstood, but email doesn't give much context, and the loop you gave was the only context I had, so I assumed that was what you wanted to do.

> I simply need a guarantee from VM , that swap operation will be
> atomic. No less, no more.
> Compare-And-Swap and swap is different operations.

I guess that depends on how you look at it.  I see CAS as a simple generalization of atomic swap that is _much_ more useful for implementing many different kinds of safe multi-threaded data access.  Unconditional swap works safely for booleans, but if you use it for something more complicated, like numbers, you end up in all sorts of complications.  The unconditional swap ends up "leaking" information into the variable that is visible to other threads in a way that is very difficult to cope with for the other threads.  Try doing a thread safe numeric increment with unconditional swap....
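
For comparison, a rough C11 sketch of the CAS-based increment (C11 also offers atomic_fetch_add for exactly this); no intermediate value ever becomes visible to the other threads:

    #include <stdatomic.h>

    /* Thread-safe increment via CAS: re-read and retry if another thread
       updated the counter in between. */
    static void atomic_increment(atomic_int *counter)
    {
        int old_value = atomic_load(counter);
        while (!atomic_compare_exchange_weak(counter, &old_value, old_value + 1))
            ;
    }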

+1.  I've used CAS quite a bit in Cog (e.g. reimplementing signalExternalSemaphore) and it's concise and easy to use.
 
>
>
>> I've used this pattern (expressed in C code) in a very heavily used parallel data processing application, so I can even claim to have tested it extensively :).
>>
>
> Yes yes.. But C is not Smalltalk.

Very true, as Eliot pointed out one of the implications of the difference above.  However, the algorithms for safe access to data in a multi-threaded environment are completely language and hardware independent.  You simply choose the algorithm you use to match what your language and HW have available for you to implement the algorithm.  In this context, if your (our) language provides CAS, you end up with a much broader choice of algorithms.

The only reason I brought up the HW CAS instruction was to make it clear that the machine code or C code for the primitive would not be any more complicated for CAS no matter what the assumptions or guarantees are for the VM scheduler.
>
>>>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>


Regards,

--
Tom Rushworth








Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
On 13 October 2010 21:06, Eliot Miranda <[hidden email]> wrote:

>
>
> On Wed, Oct 13, 2010 at 9:41 AM, Tom Rushworth <[hidden email]>
> wrote:
>>
>> Hi Igor, Eliot,
>>
>> Eliot, I'm not sure I understood your comment:
>>
>> On 2010-Oct-13, at 08:55, Eliot Miranda wrote:
>>
>> > Right.  And because Smalltak can't reify variables and CAS is an
>> > operation on a variable CAS can't be implemented as a primitive on
>> > variables.  There's no way to express "pass a variable to a primitive",
>> > only "pass an expression (which may be the value of a variable)" to a
>> > primitive".  One could do it with a primitive on an object, e.g.
>> > thisContext at: tempIndex compareWith: match andSetTo: expr, or anObject
>> > instVarAt: varIndex compareWith: match andSetTo: expr but that's soooo
>> > ugly.  Hence I think it is better done using a special assignment
>> > operator.
>
> I mean one can't write
>     | t |
>     self compare: t andSwap: expr
> because in the above the rvalue of t is passed, not it's lvalue, and cas
> needs to operate on an lvalue (rvalue == read value, or value of a variable;
> lvalue = location value, or address of a variable).  Clearly cas needs to
> apply to a variable, not the value in the variable.  The variable's value is
> compared and the variable is set to a new value if it matches.  Look at
> Igor's example and you'll see his primitive on Arrays passes an lvalue since
> the primitive takes the index of an array element.
> e.g. if you implemented CAS in C using functions instead of macros you'd
> know that you couldn't do it with int cas(int var, int match, int new) but
> you'd know that you can do it with int cas(int *var, int match, int new) and
> ... int v; ... cas(&v, match, new)
>
>>
>> Doesn't this apply to Igor's proposed atomic swap as well?  The target in
>> Igor's example was a variable "locked" which was clearly meant to be visible
>> to other threads.
>
> Yes.  First time I read Igor's post I on;y saw the initial
> "| var1 var2 temp |
> temp := var1.
> var1 := var2.
> var2 := temp."
> which I rejected and didn't read further.  Now I see the :=:.  I don't like
> this because it doesn't map as nicely to CAS which I like using.  I like the
> boolean result one gets fro CAS that says the assignment either happened or
> it didn't.  Atomic swap is more low-level.  I then have to inspect the value
> the variable used to have and undo the swap if it wasn't what was expected.
>  So
>         | var |
>         var ?= expr : match
> is equivalent to
>         [| var scratch |
>         scratch := expr.
>         var :=: scratch.
>         scratch == match
>             ifTrue: [true]
>             ifFalse: [var :=: scratch. false]] valueUninterruptibly
> Hence I think :=: is too low-level.
>

well, once you start using valueUninterruptibly, there is not much
reason to use/introduce atomic swap or CAS :)

moreover, i see no trivial way to implement CAS using atomic swap,
while the reverse is trivial.
From this point of view, CAS is better.

>>
>> If that's the case, then they both need to be implemented with a special
>> assignment operator, and Igor and I can get back to arguing the merits of
>> the atomic swap versus the atomic compare and swap :).  If not, then Igor's
>> swap has a clear advantage and I guess you can both ignore the rest of this
>> email...
>
> Um, mine /does/ use a special assignment operator, "?= :" :)
>

var ?= match : expression

could mean a lot of hacking in the compiler. How about:

var ?= { match . expression }

so you just need to introduce a new binary operator ?= ("CAS
assignment" or "conditional assignment"),
which is much simpler to hack than a ternary one.

And you can also force the compiler to expect only '{' after ?=, so none of

var ?= foo
var ?= #( foo )
var ?= self bar

will be allowed.


>>
>> On 2010-Oct-13, at 08:33, Igor Stasenko wrote:
>>
>> > On 13 October 2010 18:07, Tom Rushworth <[hidden email]> wrote:
>> >> Hi all,
>> >>
>> >> Take a look through the literature for the "strength" of the various
>> >> types of atomic operations with respect to doing various synchronization
>> >> tasks.  CAS is "universal" in the sense that it can simulate all of the
>> >> others.  (The simulation is wretched, O(N**3), so it isn't practical,
>> >> but...)
>> >> I doubt that your atomic swap is as strong.
>> >
>> > Who said i gonna use it for synchronizations tasks?
>> > We got a plenty of other stuff in smalltalk for synchronization.
>>
>> I'm using "synchronization" in the sense of safe access to data in a
>> multi-threaded context.  There is no point at all to atomic ops unless you
>> are doing that kind of "synchronization".  In particular, your example was
>> for implementing a lock, which I took to be a mutex style lock from looking
>> at your code.  Your example is in fact implementing a CAS spinloop for a
>> boolean value.  My counter example just used CAS explicitly :).  You can use
>> an unconditional swap for boolean values because there are only two possible
>> values, so that setting the 'locked' variable to true unconditionally
>> doesn't hurt anything since either it was already true and you didn't change
>> anything visible to another thread, or it was the desired previous value
>> (false) and you did  change things visible to another thread is a planned,
>> predictable way.  If the variable in question was a pointer this would not
>> be a safe operation, because you have no way to predict what the value of
>> temp should be.
>> >
>> >>
>> >> Pretty well every current HW architecture actually has the CAS
>> >> instruction (or some minor variation) implemented as a single instruction,
>> >> so making it a primitive should be relatively easy. It seems a terrible
>> >> waste to pick a weak synchronization primitive when a much better one could
>> >> be had for very little more work.
>> >
>> > I feel that you are mixing hardware and language-side atomicity.
>> > I don't care about hardware , i care that at language side i have a
>> > simple atomic operation, which can't be interrupted by scheduler.
>> > Hardware CAS is completely irrelevant to this, since VM can support
>> > atomicity for language side without any specific requirements from the
>> > hardware it runs on.
>> >
>> >
>> >>
>> >> On 2010-Oct-12, at 12:04, Igor Stasenko wrote:
>> >>
>> >>> CAS? Perhaps. If you wish to add it too, no problem.
>> >>>
>> >>> I want at minimum swap, because with it i can easily implement
>> >>> lock-free waiting:
>> >>>
>> >>> [ | temp |
>> >>>  [ locked ] whileTrue.
>> >>>    temp := true.
>> >>>    locked :=: temp.
>> >>>    temp
>> >>> ] whileTrue.
>> >>
>> >> I think this is simpler (using Eliot's notation):
>> >>
>> >>  [locked ?:= true : false] whileFalse.
>> >>
>> >>   ... locked here ...
>> >>
>> >>   locked := false.
>> >>
>> >> No temp needed, single block. well understood atomic op :).
>> >
>> > Yeah? Then try and implement a unconditional atomic swap using CAS.
>>
>> I'll look up the paper in a short while, but for now, how about:
>>
>>    [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.
>>
>> Yes, this makes the retry loop explicit, but the retry loop exists in
>> _any_ atomic op at some level.  It may be buried in the HW where the
>> microcode does the retry if there is memory cache line contention or some
>> such but it is always there.  "Atomic" implies blocking other threads, so
>> that the other (blocked) threads have to retry.  In the case of the atomic
>> op being a VM level primitive the retry loop is implicit in the scheduler,
>> in the sense that the scheduler cannot schedule other threads for the
>> duration of the primitive.
>> >
>> > Again, who said, that i going to use atomic swap for loops like above?
>>
>> Um, your example said it? :)  Sorry if I misunderstood, but email doesn't
>> give much context, and the loop you gave was the only context I had, so I
>> assumed that was what you wanted to do.
>>
>> > I simply need a guarantee from VM , that swap operation will be
>> > atomic. No less, no more.
>> > Compare-And-Swap and swap is different operations.
>>
>> I guess that depends on how you look at it.  I see CAS as a simple
>> generalization of atomic swap that is _much_ more useful for implementing
>> many different kinds of safe multi-threaded data access.  Unconditional swap
>> works safely for booleans, but if you use it for something more complicated,
>> like numbers, you end up in all sorts of complications.  The unconditional
>> swap ends up "leaking" information into the variable that is visible to
>> other threads in a way that is very difficult to cope with for the other
>> threads.  Try doing a thread safe numeric increment with unconditional
>> swap....
>
> +1.  I've used CAS quite a bit in Cog (e.g. reimplementing
> signalExternalSemaphore) ad it's concise and easy-to-use.
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
In reply to this post by Igor Stasenko
Hi Igor,

I should mention as an aside before going on with the debate that there are several variations on CAS.  The one returning a bool is the one I like, but there is another very useful variant that returns the old value, regardless of whether or not it matched the old value you supplied.  In pseudo-code (since 'var' is supposed to be a variable, not a simple value):

cas: &var oldVal: oldValue newVal: newValue
   | prevValue |
   prevValue := var.
   (var == oldValue) ifTrue: [var := newValue].
   ^prevValue

If your interest is in the actual previous value, this is the better variant to use.  You can synthesize the boolean variant by comparing the returned prevValue to the oldValue you supplied once the operation has completed.
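
In C11 terms the two variants coincide: compare_exchange answers a boolean and, on failure, writes the observed previous value back through its 'expected' pointer.  A rough sketch of the value-returning form on top of it (names are illustrative):

    #include <stdatomic.h>

    /* Value-returning CAS: answer the previous contents of *var whether or
       not the store happened.  The boolean variant is then simply
       (previous == old_value). */
    static int cas_returning_old(atomic_int *var, int old_value, int new_value)
    {
        int previous = old_value;
        atomic_compare_exchange_strong(var, &previous, new_value);
        return previous;   /* on success this equals old_value; on failure it
                              holds whatever was actually found there */
    }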

On 2010-Oct-13, at 10:23, Igor Stasenko wrote:

> On 13 October 2010 19:41, Tom Rushworth <[hidden email]> wrote:
>>

[snip]
>>
>> I'm using "synchronization" in the sense of safe access to data in a multi-threaded context.  There is no point at all to atomic ops unless you are doing that kind of "synchronization".  In particular, your example was for implementing a lock, which I took to be a mutex style lock from looking at your code.  Your example is in fact implementing a CAS spinloop for a boolean value.  My counter example just used CAS explicitly :).  You can use an unconditional swap for boolean values because there are only two possible values, so that setting the 'locked' variable to true unconditionally doesn't hurt anything since either it was already true and you didn't change anything visible to another thread, or it was the desired previous value (false) and you did  change things visible to another thread is a planned, predictable way.  If the variable in question was a pointer this would not be a safe operation, because you have no way to predict what the value of temp should be.
>
> how many spin loops you seen where you need more than two values
> (true/false) i.e. locked/unlocked ? :)

Lots. How about a simple lock that uses 0 for unlocked and a thread ID > 0 for locked?  This is very useful when trying to find a deadlock, because the lock tells you who owns it. A semaphore count is another example (a numeric value).  I'm sure there are many more.
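
A rough C11 sketch of such an owner-id lock (the id scheme is hypothetical: 0 means unlocked, any positive id identifies the owner):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Try to take the lock by CASing it from 0 to our id; the stored id
       makes deadlocks much easier to diagnose. */
    static bool try_lock_owned(atomic_int *lock, int my_id)
    {
        int expected = 0;
        return atomic_compare_exchange_strong(lock, &expected, my_id);
    }

    static void unlock_owned(atomic_int *lock, int my_id)
    {
        int expected = my_id;
        atomic_compare_exchange_strong(lock, &expected, 0);   /* only the owner succeeds */
    }
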
>
>
[snip]

>>>>
>>>> No temp needed, single block. well understood atomic op :).
>>>
>>> Yeah? Then try and implement a unconditional atomic swap using CAS.
>>
>> I'll look up the paper in a short while, but for now, how about:
>>
>>    [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.
>>
>
> see.. now in own turn i can say.. hey.. i can do it w/o temp and
> without a loop.  :)

Heh - yes you can.  But I get to reply that unless you are doing it on a boolean, you are changing the value of the var in a way that makes the var very difficult to use safely for the other threads involved.  Let's look at a simple increment of a numeric value:

   [ | oldVal | oldVal := var. var ?:= (oldVal + 1) : oldVal ] whileFalse.

The unconditional swap can easily end up getting one increment "erased" by another.

[snip]
>
> Ok. Here is a simple example for you. A shared LIFO queue.

There's a problem here...

> Populating:
>
> | temp |
> temp := newItem := QueueItem new.
> queueHead :=: newItem.
What if somebody else processes the queue at this point and finishes with the QueueItem instance in temp? You've lost the whole old queue.

> temp next: newItem.
>
>
> Flushing and processing the queue:
>
> | oldHead |
> oldHead := nil.
> queueHead :=: oldHead.
>
> [ oldHead notNil ] whileTrue: [
>  "process the queue item here"
>
>  oldHead := oldHead next.
> ]

I'm not saying CAS can do the LIFO way better, but the naive "push" operation works perfectly:

    | oldHead newItem |
    newItem := QueueItem new.
    [ oldHead := queueHead. newItem next: oldHead. queueHead ?:= newItem : oldHead ] whileFalse.

Your flush of the entire queue looks ok, but could be done the same way by CAS.

The really difficult problem with a shared LIFO queue is extracting a _single_ item at a time (i.e. "pop").  There is almost an entire field of literature on this subject :).  Google for "Lock free" "ABA problem".  There are CAS based solutions, but they are a lot more elaborate than I want to try to type into an email.  I don't know of any atomic swap based solutions to the "pop" operation.
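
For reference, here are the push above and the flush-the-whole-queue step as a rough C11 sketch (the node layout is made up); the single-item pop is deliberately left out because of the ABA problem just mentioned:

    #include <stdatomic.h>
    #include <stddef.h>

    struct node { struct node *next; /* payload omitted */ };

    /* Lock-free LIFO push: link the new node to the current head, then CAS
       the head to it; retry if another thread got there first. */
    static void push(_Atomic(struct node *) *head, struct node *item)
    {
        struct node *old_head = atomic_load(head);
        do {
            item->next = old_head;
        } while (!atomic_compare_exchange_weak(head, &old_head, item));
    }

    /* Flushing the whole queue is a plain unconditional swap with NULL,
       which CAS-capable atomics also provide directly. */
    static struct node *flush(_Atomic(struct node *) *head)
    {
        return atomic_exchange(head, NULL);
    }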

>
>
>>
>>> I simply need a guarantee from VM , that swap operation will be
>>> atomic. No less, no more.
>>> Compare-And-Swap and swap is different operations.
>>
>> I guess that depends on how you look at it.  I see CAS as a simple generalization of atomic swap that is _much_ more useful for implementing many different kinds of safe multi-threaded data access.  Unconditional swap works safely for booleans, but if you use it for something more complicated, like numbers, you end up in all sorts of complications.  The unconditional swap ends up "leaking" information into the variable that is visible to other threads in a way that is very difficult to cope with for the other threads.  Try doing a thread safe numeric increment with unconditional swap....
>
> If you tell me how you can atomically evaluate:
> SmallInteger maxVal+1
>
> i'll try :)

Red herring :).  Atomic increment and decrement are used all the time for things like reference counting; worrying about overflow of whatever the native HW/VM word size is, is another issue.  Note, however, that the CAS version of increment has to pick up the old value and look at it in order to compute the new value, and can check for maxVal before computing the new value, which provides the opportunity to handle the issue however you want to.
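
Concretely, that check drops straight into the usual CAS retry loop; a rough C11 sketch (what to do at the ceiling is left to the caller):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Increment only while below a ceiling; answer whether we incremented.
       Because CAS hands us the old value, the bounds check costs nothing. */
    static bool increment_below(atomic_int *counter, int ceiling)
    {
        int old_value = atomic_load(counter);
        do {
            if (old_value >= ceiling)
                return false;              /* handle the overflow as you wish */
        } while (!atomic_compare_exchange_weak(counter, &old_value, old_value + 1));
        return true;
    }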

>
>>>
>>>
>>>> I've used this pattern (expressed in C code) in a very heavily used parallel data processing application, so I can even claim to have tested it extensively :).
>>>>
>>>
>>> Yes yes.. But C is not Smalltalk.
>>
>> Very true, as Eliot pointed out one of the implications of the difference above.  However, the algorithms for safe access to data in a multi-threaded environment are completely language and hardware independent.  You simply choose the algorithm you use to match what your language and HW have available for you to implement the algorithm.  In this context, if your (our) language provide CAS, you end up with a much broader choice of algorithms.
>>
>> The only reason I brought up the HW CAS instruction was to make it clear that the machine code or C code for the primitive would not be any more complicated for CAS no matter what the assumptions or guarantees are for the VM scheduler.
>
> I'm not arguing about userfulness of CAS.
> As i said, if you think its cool to have as well, lets think how to
> add it as well.
> But i'm against 'lets add CAS instead of atomic swap'.

Ah.  Well, I'm afraid I am arguing against atomic swap, because it is just too weak to be really useful.  Yes, it does booleans, but that's about all.  I've been working on multi-threaded C code for about 5 years now, and the single most important thing I've learned is that lock-free multi-threaded algorithms are not nearly as simple as they look at first.  The classic example is an actual published paper on a lock-free double-ended queue that contained a flaw where the last element could be popped from both ends at once by two different threads.  The paper was peer-reviewed, both in its original conference version and in its later journal version, and the flaw wasn't discovered until after the journal version was published.
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
Regards

--
Tom Rushworth





Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
In reply to this post by Levente Uzonyi-2
2010/10/13 Levente Uzonyi <[hidden email]>:

> On Tue, 12 Oct 2010, Igor Stasenko wrote:
>
>> On 12 October 2010 16:51, Levente Uzonyi <[hidden email]> wrote:
>>>
>>> On Tue, 12 Oct 2010, Igor Stasenko wrote:
>>>
>>>> Hello, i just thought, that it would be cool to have a special bytecode,
>>>> which guarantees atomicity for swapping values between two variables.
>>>>
>>>> To swap two values, you usually do:
>>>>
>>>> | var1 var2 temp |
>>>>
>>>> temp := var1.
>>>> var1 := var2.
>>>> var2 := temp.
>>>>
>>>> But since its non-atomic, a process can be interrupted and such
>>>> operation
>>>> is not thread-safe.
>>>>
>>>> In order to make it thread safe, you must add even more boilerplate:
>>>>
>>>> | var1 var2 temp |
>>>>
>>>> semaphore critical: [
>>>>  temp := var1.
>>>>  var1 := var2.
>>>>  var2 := temp.
>>>> ]
>>>
>>> An alternative solution:
>>>
>>> | a b |
>>> a := 1.
>>> b := 2.
>>> [
>>>        | tmp |
>>>        tmp := a.
>>>        a := b.
>>>        b := tmp ] valueUnpreemptively
>>>
>>
>> Yeah, another boilerplate under the hood, also highly dependent from
>> scheduling nuances :)
>
> I don't get the "dependency from scheduling nuances part", but here's my
> idea:

i don't like code which assumes that scheduling works in some
specific way, or that some code can't be interrupted in the middle.

See Semaphore>>critical: for an exercise.

| caught |
        caught := false.
        ^[
                caught := true.
                self wait.
                mutuallyExcludedBlock value
        ] ensure: [ caught ifTrue: [self signal] ]

this code assumes that between
                caught := true.
and
                self wait.

no interrupt is possible.
But if one is, then the above implementation is incorrect.


>
> Add the compiler changes to support :=: as atomic swap. We don't really need
> a bytecode for now, since a sequence of assignments is currently atomic. So
> the compiler could compile :=: as three assignments using a hidden temporary
> variable. On other systems, :=: can be compiled differently.
>

For the same reason, there is no guarantee from the VM side that three
assignments in a row will be atomic: storing a pointer could trigger a
root check, and if the roots table is full, it could trigger a GC, and
after the GC there is a finalization semaphore, which could switch the
active process immediately.

The VM is evolving and subject to change. Having a clear rule which
guarantees a specific behavior is far more beneficial than relying on
intimate knowledge of how the VM works (now).


>
> Levente
>
>>
>> valueUnpreemptively
>>        "Evaluate the receiver (block), without the possibility of
>> preemption
>> by higher priority processes. Use this facility VERY sparingly!"
>>        "Think about using Block>>valueUninterruptably first, and think
>> about
>> using Semaphore>>critical: before that, and think about redesigning
>> your application even before that!
>>        After you've done all that thinking, go right ahead and use it..."
>>        | activeProcess oldPriority result |
>>        activeProcess := Processor activeProcess.
>>        oldPriority := activeProcess priority.
>>        activeProcess priority: Processor highestPriority.
>>        result := self ensure: [activeProcess priority: oldPriority].
>>        "Yield after restoring priority to give the preempted processes a
>> chance to run"
>>        Processor yield.
>>        ^result
>>>
>>> Levente
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Eliot Miranda-2
In reply to this post by Igor Stasenko


On Wed, Oct 13, 2010 at 11:37 AM, Igor Stasenko <[hidden email]> wrote:
On 13 October 2010 21:06, Eliot Miranda <[hidden email]> wrote:
>
>
> On Wed, Oct 13, 2010 at 9:41 AM, Tom Rushworth <[hidden email]>
> wrote:
>>
>> Hi Igor, Eliot,
>>
>> Eliot, I'm not sure I understood your comment:
>>
>> On 2010-Oct-13, at 08:55, Eliot Miranda wrote:
>>
>> > Right.  And because Smalltak can't reify variables and CAS is an
>> > operation on a variable CAS can't be implemented as a primitive on
>> > variables.  There's no way to express "pass a variable to a primitive",
>> > only "pass an expression (which may be the value of a variable)" to a
>> > primitive".  One could do it with a primitive on an object, e.g.
>> > thisContext at: tempIndex compareWith: match andSetTo: expr, or anObject
>> > instVarAt: varIndex compareWith: match andSetTo: expr but that's soooo
>> > ugly.  Hence I think it is better done using a special assignment
>> > operator.
>
> I mean one can't write
>     | t |
>     self compare: t andSwap: expr
> because in the above the rvalue of t is passed, not it's lvalue, and cas
> needs to operate on an lvalue (rvalue == read value, or value of a variable;
> lvalue = location value, or address of a variable).  Clearly cas needs to
> apply to a variable, not the value in the variable.  The variable's value is
> compared and the variable is set to a new value if it matches.  Look at
> Igor's example and you'll see his primitive on Arrays passes an lvalue since
> the primitive takes the index of an array element.
> e.g. if you implemented CAS in C using functions instead of macros you'd
> know that you couldn't do it with int cas(int var, int match, int new) but
> you'd know that you can do it with int cas(int *var, int match, int new) and
> ... int v; ... cas(&v, match, new)
>
>>
>> Doesn't this apply to Igor's proposed atomic swap as well?  The target in
>> Igor's example was a variable "locked" which was clearly meant to be visible
>> to other threads.
>
> Yes.  First time I read Igor's post I on;y saw the initial
> "| var1 var2 temp |
> temp := var1.
> var1 := var2.
> var2 := temp."
> which I rejected and didn't read further.  Now I see the :=:.  I don't like
> this because it doesn't map as nicely to CAS which I like using.  I like the
> boolean result one gets fro CAS that says the assignment either happened or
> it didn't.  Atomic swap is more low-level.  I then have to inspect the value
> the variable used to have and undo the swap if it wasn't what was expected.
>  So
>         | var |
>         var ?= expr : match
> is equivalent to
>         [| var scratch |
>         scratch := expr.
>         var :=: scratch.
>         scratch == match
>             ifTrue: [true]
>             ifFalse: [var :=: scratch. false]] valueUninterruptibly
> Hence I think :=: is too low-level.
>

well, once you start using valueUninterruptibly, there is no much
reason to use/introduce atomic swap nor cas :)

Right :)
 

moreover, i see no trivial way to implement CAS using atomic swap,
while reverse is trivial.

Is it, or does it also rely on uninterruptibility?  Is this right?

    v1 :=: v2

is equivalent to

    | scratch |
    [scratch := v1.
     v1 ?= v2 : scratch] whileFalse.
    v2 := scratch.



From this point of view, CAS is better.

>>
>> If that's the case, then they both need to be implemented with a special
>> assignment operator, and Igor and I can get back to arguing the merits of
>> the atomic swap versus the atomic compare and swap :).  If not, then Igor's
>> swap has a clear advantage and I guess you can both ignore the rest of this
>> email...
>
> Um, mine /does/ use a special assignment operator, "?= :" :)
>

var ?= match : expression

could mean a lot of hacking in compiler. how about:

var ?= { match . expression }

so, you just need to introduce a new binary operator ?= "CAS
assignment" or "conditional assignment"
which is much simpler to hack comparing to trinary one.

That's a good suggestion, but let's see.  Personally I find var ?= { match . expression } ok but a little flowery :)


And you can also force compiler to expect only '{' after ?=, so no

var ?= foo
var ?= #( foo )
var ?= self bar

will be allowed.

But ?= : is no more difficult than parsing expressions within keywords.  I guess it'll look something like

    conditionalAssignment: varNode
        "var '?=' expression ':' match => ConditionalAssignmentNode."
        | loc start newValueNode matchNode |
        (loc := varNode assignmentCheck: encoder at: prevMark + requestorOffset) >= 0 ifTrue:
            [^self notify: 'Cannot store into' at: loc].
        start := self startOfNextToken.
        self advance.
        self expression ifFalse: [^self expected: 'Expression'].
        newValueNode := parseNode.
        (self match: #colon) ifFalse: [^self expected: 'colon'].
        self advance.
        self expression ifFalse: [^self expected: 'Expression'].
        matchNode := parseNode.
        parseNode := ConditionalAssignmentNode new
            variable: varNode
            value: newValueNode
            match: matchNode
            from: encoder
            sourceRange: (start to: self endOfLastToken).
        varNode nowHasDef.
        ^true



>>
>> On 2010-Oct-13, at 08:33, Igor Stasenko wrote:
>>
>> > On 13 October 2010 18:07, Tom Rushworth <[hidden email]> wrote:
>> >> Hi all,
>> >>
>> >> Take a look through the literature for the "strength" of the various
>> >> types of atomic operations with respect to doing various synchronization
>> >> tasks.  CAS is "universal" in the sense that it can simulate all of the
>> >> others.  (The simulation is wretched, O(N**3), so it isn't practical,
>> >> but...)
>> >> I doubt that your atomic swap is as strong.
>> >
>> > Who said i gonna use it for synchronizations tasks?
>> > We got a plenty of other stuff in smalltalk for synchronization.
>>
>> I'm using "synchronization" in the sense of safe access to data in a
>> multi-threaded context.  There is no point at all to atomic ops unless you
>> are doing that kind of "synchronization".  In particular, your example was
>> for implementing a lock, which I took to be a mutex style lock from looking
>> at your code.  Your example is in fact implementing a CAS spinloop for a
>> boolean value.  My counter example just used CAS explicitly :).  You can use
>> an unconditional swap for boolean values because there are only two possible
>> values, so that setting the 'locked' variable to true unconditionally
>> doesn't hurt anything since either it was already true and you didn't change
>> anything visible to another thread, or it was the desired previous value
>> (false) and you did  change things visible to another thread is a planned,
>> predictable way.  If the variable in question was a pointer this would not
>> be a safe operation, because you have no way to predict what the value of
>> temp should be.
>> >
>> >>
>> >> Pretty well every current HW architecture actually has the CAS
>> >> instruction (or some minor variation) implemented as a single instruction,
>> >> so making it a primitive should be relatively easy. It seems a terrible
>> >> waste to pick a weak synchronization primitive when a much better one could
>> >> be had for very little more work.
>> >
>> > I feel that you are mixing hardware and language-side atomicity.
>> > I don't care about hardware , i care that at language side i have a
>> > simple atomic operation, which can't be interrupted by scheduler.
>> > Hardware CAS is completely irrelevant to this, since VM can support
>> > atomicity for language side without any specific requirements from the
>> > hardware it runs on.
>> >
>> >
>> >>
>> >> On 2010-Oct-12, at 12:04, Igor Stasenko wrote:
>> >>
>> >>> CAS? Perhaps. If you wish to add it too, no problem.
>> >>>
>> >>> I want at minimum swap, because with it i can easily implement
>> >>> lock-free waiting:
>> >>>
>> >>> [ | temp |
>> >>>  [ locked ] whileTrue.
>> >>>    temp := true.
>> >>>    locked :=: temp.
>> >>>    temp
>> >>> ] whileTrue.
>> >>
>> >> I think this is simpler (using Eliot's notation):
>> >>
>> >>  [locked ?:= true : false] whileFalse.
>> >>
>> >>   ... locked here ...
>> >>
>> >>   locked := false.
>> >>
>> >> No temp needed, single block. well understood atomic op :).
>> >
>> > Yeah? Then try and implement a unconditional atomic swap using CAS.
>>
>> I'll look up the paper in a short while, but for now, how about:
>>
>>    [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.
>>
>> Yes, this makes the retry loop explicit, but the retry loop exists in
>> _any_ atomic op at some level.  It may be buried in the HW where the
>> microcode does the retry if there is memory cache line contention or some
>> such but it is always there.  "Atomic" implies blocking other threads, so
>> that the other (blocked) threads have to retry.  In the case of the atomic
>> op being a VM level primitive the retry loop is implicit in the scheduler,
>> in the sense that the scheduler cannot schedule other threads for the
>> duration of the primitive.
>> >
>> > Again, who said, that i going to use atomic swap for loops like above?
>>
>> Um, your example said it? :)  Sorry if I misunderstood, but email doesn't
>> give much context, and the loop you gave was the only context I had, so I
>> assumed that was what you wanted to do.
>>
>> > I simply need a guarantee from VM , that swap operation will be
>> > atomic. No less, no more.
>> > Compare-And-Swap and swap is different operations.
>>
>> I guess that depends on how you look at it.  I see CAS as a simple
>> generalization of atomic swap that is _much_ more useful for implementing
>> many different kinds of safe multi-threaded data access.  Unconditional swap
>> works safely for booleans, but if you use it for something more complicated,
>> like numbers, you end up in all sorts of complications.  The unconditional
>> swap ends up "leaking" information into the variable that is visible to
>> other threads in a way that is very difficult to cope with for the other
>> threads.  Try doing a thread safe numeric increment with unconditional
>> swap....
>
> +1.  I've used CAS quite a bit in Cog (e.g. reimplementing
> signalExternalSemaphore) ad it's concise and easy-to-use.
>



--
Best regards,
Igor Stasenko AKA sig.





Re: [Pharo-project] How about atomic value-swap bytecode?

Eliot Miranda-2
In reply to this post by Igor Stasenko


On Wed, Oct 13, 2010 at 11:53 AM, Igor Stasenko <[hidden email]> wrote:
2010/10/13 Levente Uzonyi <[hidden email]>:
> On Tue, 12 Oct 2010, Igor Stasenko wrote:
>
>> On 12 October 2010 16:51, Levente Uzonyi <[hidden email]> wrote:
>>>
>>> On Tue, 12 Oct 2010, Igor Stasenko wrote:
>>>
>>>> Hello, i just thought, that it would be cool to have a special bytecode,
>>>> which guarantees atomicity for swapping values between two variables.
>>>>
>>>> To swap two values, you usually do:
>>>>
>>>> | var1 var2 temp |
>>>>
>>>> temp := var1.
>>>> var1 := var2.
>>>> var2 := temp.
>>>>
>>>> But since its non-atomic, a process can be interrupted and such
>>>> operation
>>>> is not thread-safe.
>>>>
>>>> In order to make it thread safe, you must add even more boilerplate:
>>>>
>>>> | var1 var2 temp |
>>>>
>>>> semaphore critical: [
>>>>  temp := var1.
>>>>  var1 := var2.
>>>>  var2 := temp.
>>>> ]
>>>
>>> An alternative solution:
>>>
>>> | a b |
>>> a := 1.
>>> b := 2.
>>> [
>>>        | tmp |
>>>        tmp := a.
>>>        a := b.
>>>        b := tmp ] valueUnpreemptively
>>>
>>
>> Yeah, another boilerplate under the hood, also highly dependent from
>> scheduling nuances :)
>
> I don't get the "dependency from scheduling nuances part", but here's my
> idea:

i don't like the code, which assuming that scheduling works in some
specific way,
or some code can't be interrupted in the middle.

See Semaphore>>critical: for excersise.

| caught |
       caught := false.
       ^[
               caught := true.
               self wait.
               mutuallyExcludedBlock value
       ] ensure: [ caught ifTrue: [self signal] ]

this code assuming that between
               caught := true.
and
               self wait.

no interrupt possible.
But if it is, then the above implementation is incorrect.


>
> Add the compiler changes to support :=: as atomic swap. We don't really need
> a bytecode for now, since a sequence of assignments is currently atomic. So
> the compiler could compile :=: as three assignments using a hidden temporary
> variable. On other systems, :=: can be compiled differently.
>

For same reason, there is no any guarantees from VM side that three
assignments in a row will be
atomic: storing pointer could trigger a root check, and if roots table
is full, it could trigger GC,
and after GC, there is a finalization semaphore, which could switch an
active process immediately.

VM is evolving and subject to change. Having a clear rule which
guarantees a specific behavior is far more
beneficial comparing to intimate knowledge about how VM works (now).

+1.  I think there's general agreement on this.  All this uninterruptibility goes out of the window (as does Processor activeProcess) as soon as anyone does a truly concurrent implementation.

best
Eliot 


>
> Levente
>
>>
>> valueUnpreemptively
>>        "Evaluate the receiver (block), without the possibility of
>> preemption
>> by higher priority processes. Use this facility VERY sparingly!"
>>        "Think about using Block>>valueUninterruptably first, and think
>> about
>> using Semaphore>>critical: before that, and think about redesigning
>> your application even before that!
>>        After you've done all that thinking, go right ahead and use it..."
>>        | activeProcess oldPriority result |
>>        activeProcess := Processor activeProcess.
>>        oldPriority := activeProcess priority.
>>        activeProcess priority: Processor highestPriority.
>>        result := self ensure: [activeProcess priority: oldPriority].
>>        "Yield after restoring priority to give the preempted processes a
>> chance to run"
>>        Processor yield.
>>        ^result
>>>
>>> Levente
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.





Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
In reply to this post by Eliot Miranda-2
Hi Eliot,

On 2010-Oct-13, at 11:06, Eliot Miranda wrote:

>
>
> On Wed, Oct 13, 2010 at 9:41 AM, Tom Rushworth <[hidden email]> wrote:
> Hi Igor, Eliot,
>
> Eliot, I'm not sure I understood your comment:
>
> On 2010-Oct-13, at 08:55, Eliot Miranda wrote:
>
> > Right.  And because Smalltak can't reify variables and CAS is an operation on a variable CAS can't be implemented as a primitive on
> > variables.  There's no way to express "pass a variable to a primitive", only "pass an expression (which may be the value of a variable)" to a
> > primitive".  One could do it with a primitive on an object, e.g. thisContext at: tempIndex compareWith: match andSetTo: expr, or anObject
> > instVarAt: varIndex compareWith: match andSetTo: expr but that's soooo ugly.  Hence I think it is better done using a special assignment
> > operator.
>
> I mean one can't write
>
>     | t |
>     self compare: t andSwap: expr

Right.  I wasn't aware of the restriction on primitives, but I'm painfully aware of the "lvalue problem" in C code.  Fortunately for my sanity, C does let me declare cas(int *var, int match, int new).  It isn't pretty, but it gets the job done without a lot of extra fuss.  For Smalltalk we don't have that escape.
>
>
[snip]

> which I rejected and didn't read further.  Now I see the :=:.  I don't like this because it doesn't map as nicely to CAS which I like using.  I like the boolean result one gets fro CAS that says the assignment either happened or it didn't.  Atomic swap is more low-level.  I then have to inspect the value the variable used to have and undo the swap if it wasn't what was expected.  So

Yes, but "undo the swap" is a much thornier problem than you might at first think.  What if some other thread has done a swap of its own before you can check for and perform your counter-swap? Your code avoids this by wrapping the check and undo in a [...] valueUninterruptibly, but that sort of defeats the whole point of an atomic exchange.  If you have [...] valueUninterruptibly, why not just use it for atomic swap too?

>
>
>         | var |
>         var ?= expr : match
>
> is equivalent to
>
>         [| var scratch |
>         scratch := expr.
>         var :=: scratch.
>         scratch == match
>             ifTrue: [true]
>             ifFalse: [var :=: scratch. false]] valueUninterruptibly
>
> Hence I think :=: is too low-level.

Well, we agree to some extent - I think it is too weak to be worth the effort of doing it when a stronger, much more useful ?:= operation can be done for pretty close to the same amount of effort.
>
>  
>
> If that's the case, then they both need to be implemented with a special assignment operator, and Igor and I can get back to arguing the merits of the atomic swap versus the atomic compare and swap :).  If not, then Igor's swap has a clear advantage and I guess you can both ignore the rest of this email...
>
> Um, mine /does/ use a special assignment operator, "?= :" :)

Yes, I saw that, sorry I wasn't clearer.  My question was more along the lines of "Eliot's needs an op, doesn't Igor's too?".

[snip]

Regards,

--
Tom Rushworth





Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
In reply to this post by Eliot Miranda-2

On 2010-Oct-13, at 11:58, Eliot Miranda wrote:

>
>
> On Wed, Oct 13, 2010 at 11:53 AM, Igor Stasenko <[hidden email]> wrote:
> 2010/10/13 Levente Uzonyi <[hidden email]>:
> > On Tue, 12 Oct 2010, Igor Stasenko wrote:
> >
>
[snip]
>
> i don't like the code, which assuming that scheduling works in some
> specific way,
> or some code can't be interrupted in the middle.
>
[snip]

> For same reason, there is no any guarantees from VM side that three
> assignments in a row will be
> atomic: storing pointer could trigger a root check, and if roots table
> is full, it could trigger GC,
> and after GC, there is a finalization semaphore, which could switch an
> active process immediately.
>
> VM is evolving and subject to change. Having a clear rule which
> guarantees a specific behavior is far more
> beneficial comparing to intimate knowledge about how VM works (now).
>
> +1.  I think there's general agreement on this.  All this uninterruptibiity goes out of the window (as does Processor activeProcess) as soon as anyone does a truly concurrent implementation.

+1 here too.  This was sort of in the back of my mind when I mentioned that all current HW has a CAS instruction or equivalent.  If the VM makes use of the HW CAS instruction as the heart of the bytecode, you have future-proofed at least the one bytecode :).
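
As a rough sketch of what the core of such a primitive might look like in the VM's C (purely illustrative, not Cog's actual code), a word-sized CAS that the compiler lowers to the hardware instruction:

    #include <stdbool.h>
    #include <stdint.h>

    /* The kind of word-sized CAS a bytecode or primitive would bottom out in.
       GCC's __sync builtin compiles to the hardware CAS (LOCK CMPXCHG on x86,
       an LDREX/STREX loop on ARM). */
    static bool cas_slot(intptr_t *slot, intptr_t match, intptr_t new_value)
    {
        return __sync_bool_compare_and_swap(slot, match, new_value);
    }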

[snip]

Regards,

--
Tom Rushworth





Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
In reply to this post by Tom Rushworth-3
On 13 October 2010 21:47, Tom Rushworth <[hidden email]> wrote:

> Hi Igor,
>
> I should mention as an aside before going on with the debate that there are several variations on CAS.  The one returning a bool is the one I like, but there is another very useful variant that returns the old value, regardless of whether or not it matched the old value you supplied.  In pseudo-code (since 'var' is supposed to be a variable, not a simple value):
>
> cas: &var oldVal: oldValue newVal: newValue
>   | prevValue |
>   prevValue := var.
>   (var == oldValue) ifTrue: [var := newValue]
>   ^prevValue
>
> If your interest is in the actual previous value, this is the better variant to use.  You can synthesize the boolean variant by comparing the returned prevValue to the oldValue you suppliled one the operation has completed.
>

I think this is not really important. In one case it's more convenient
to receive a boolean, in other cases the old value. The VM can provide
just one variant; the other can be coded with ease.

> On 2010-Oct-13, at 10:23, Igor Stasenko wrote:
>
>> On 13 October 2010 19:41, Tom Rushworth <[hidden email]> wrote:
>>>
>
> [snip]
>>>
>>> I'm using "synchronization" in the sense of safe access to data in a multi-threaded context.  There is no point at all to atomic ops unless you are doing that kind of "synchronization".  In particular, your example was for implementing a lock, which I took to be a mutex style lock from looking at your code.  Your example is in fact implementing a CAS spinloop for a boolean value.  My counter example just used CAS explicitly :).  You can use an unconditional swap for boolean values because there are only two possible values, so that setting the 'locked' variable to true unconditionally doesn't hurt anything since either it was already true and you didn't change anything visible to another thread, or it was the desired previous value (false) and you did  change things visible to another thread is a planned, predictable way.  If the variable in question was a pointer this would not be a safe operation, because you have no way to predict what the value of temp should be.
>>
>> how many spin loops you seen where you need more than two values
>> (true/false) i.e. locked/unlocked ? :)
>
> Lots. How about a simple lock that uses 0 for unlocked and thread ID > 0 for locked?  This is very useful when trying to find a deadlock, because the lock tells you who owns it. A semaphore count is another (numeric value).  I'm sure there are many more.
>>
>>
> [snip]
>
>>>>>
>>>>> No temp needed, single block. well understood atomic op :).
>>>>
>>>> Yeah? Then try and implement a unconditional atomic swap using CAS.
>>>
>>> I'll look up the paper in a short while, but for now, how about:
>>>
>>>    [ | oldVal | oldVal := var. var ?:= newVal : oldVal ] whileFalse.
>>>
>>
>> see.. now in own turn i can say.. hey.. i can do it w/o temp and
>> without a loop.  :)
>
> Heh - yes you can.  But I get to reply that unless you are doing it on a boolean, you are changing the value of the var in a way that makes the var very difficult to use safely for the other threads involved.  Let's look at a simple increment of a numeric value:
>
>   [ | oldVal | oldVal := var. var ?:= (oldVal + 1) : oldVal ] whileFalse.
>
> The unconditional swap can easily end up getting one increment "erased" by another.
>
yes.

> [snip]
>>
>> Ok. Here is a simple example for you. A shared LIFO queue.
>
> There's a problem here...
>
>> Populating:
>>
>> | temp |
>> temp := newItem := QueueItem new.
>> queueHead :=: newItem.
> What if somebody else processes the queue at this point and finishes with the QueueItem instance in temp? You've lost the whole old queue.

yes. my bad. :)
It's always hard to get things right the first time.

>> temp next: newItem.
>>
>>
>> Flushing and processing the queue:
>>
>> | oldHead |
>> oldHead := nil.
>> queueHead :=: oldHead.
>>
>> [ oldHead notNil ] whileTrue: [
>>  "process the queue item here"
>>
>>  oldHead := oldHead next.
>> ]
>
> I'm not saying CAS can do the LIFO way better, but the naive "push" operation works perfectly:
>
>    | oldHead newItem |
>    newItem := QueueItem new.
>    [ oldHead := queueHead. newItem next: oldHead. queueHead ?:= newItem : oldHead ] whileFalse.
>
> Your flush of the entire queue looks ok, but could be done the same way by CAS.
>
> The really difficult problem with a shared LIFO queue is extracting a _single_ item at a time (i.e. "pop").  There is almost an entire field of literature on this subject :).  Google for "Lock free" "ABA problem".  There are CAS based solutions, but they are a lot more elaborate than I want to try to type into an email.  I don't know of any atomic swap based solutions to the "pop" operation.
>>
>>
>>>
>>>> I simply need a guarantee from VM , that swap operation will be
>>>> atomic. No less, no more.
>>>> Compare-And-Swap and swap is different operations.
>>>
>>> I guess that depends on how you look at it.  I see CAS as a simple generalization of atomic swap that is _much_ more useful for implementing many different kinds of safe multi-threaded data access.  Unconditional swap works safely for booleans, but if you use it for something more complicated, like numbers, you end up in all sorts of complications.  The unconditional swap ends up "leaking" information into the variable that is visible to other threads in a way that is very difficult to cope with for the other threads.  Try doing a thread safe numeric increment with unconditional swap....
>>
>> If you tell me how you can atomically evaluate:
>> SmallInteger maxVal+1
>>
>> i'll try :)
>
> Red herring :).  Atomic increment and decrement are used all the time for a things like reference counting, worrying about overflow of whatever the HW/VM/whatever native word size is another issue.  Note however, that the CAS version of increment has to pick up the old value and look at it in order to compute the new value, and can make a check for maxVal before computing the new value, which provides the opportunity to handle the issue however you want to.

yes. once i tried to implement a doubly-linked list using CAS and failed..  :)

>>
>>>>
>>>>
>>>>> I've used this pattern (expressed in C code) in a very heavily used parallel data processing application, so I can even claim to have tested it extensively :).
>>>>>
>>>>
>>>> Yes yes.. But C is not Smalltalk.
>>>
>>> Very true, as Eliot pointed out one of the implications of the difference above.  However, the algorithms for safe access to data in a multi-threaded environment are completely language and hardware independent.  You simply choose the algorithm you use to match what your language and HW have available for you to implement the algorithm.  In this context, if your (our) language provide CAS, you end up with a much broader choice of algorithms.
>>>
>>> The only reason I brought up the HW CAS instruction was to make it clear that the machine code or C code for the primitive would not be any more complicated for CAS no matter what the assumptions or guarantees are for the VM scheduler.
>>
>> I'm not arguing about userfulness of CAS.
>> As i said, if you think its cool to have as well, lets think how to
>> add it as well.
>> But i'm against 'lets add CAS instead of atomic swap'.
>
> Ah.  Well, I'm afraid I am arguing against atomic swap, because it is just too weak to be really useful.  Yes, it does booleans, but that's about all.  I've been working on multi-threaded C code for about 5 years now, and the single most important thing I've learned is that lock free multi-threaded algorithms are not nearly as simple as they look at first.  The classic example is an actual published paper on a lock-free double ended queue that contained a flaw where the last element could be popped from both ends at once by two different threads.  The paper was peer-reviewed, both in it's original conference version and in its later journal version, and the flaw wasn't discovered until after the journal version was published.

Okay, i must confess that atomic swap seems inferior to CAS.
So, let's stick with CAS.

Thanks, Tom, for providing arguments and proving your point.

--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
Hi Igor,

On 2010-Oct-13, at 12:17, Igor Stasenko wrote:

[snip]

>
> Yes. Once I tried to implement a double-linked list using CAS and failed...  :)

Me too.  For both trying and failing :).

[snip]
>>
>
> Okay, I must confess that atomic swap seems inferior to CAS.
> So, let's stick with CAS.
>
> Thanks, Tom, for providing arguments and proving your point.

No problem.  The whole lock-free data structure problem is interesting and I don't have any nearby colleagues I can discuss it with, so I enjoyed the discussion with you.  The only thing we were missing was a couple of beers and a whiteboard to draw pictures and write code on :).

If you and/or Eliot implement CAS, I'll try to find the time to update my Split-Order-List Hash Table code on Squeaksource to be the full multi-thread version.  I don't think anyone will have an immediate use for it, but the algorithm is a thing of beauty - I wish I could claim it was mine :).

>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
Regards,

--
Tom Rushworth





Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
In reply to this post by Eliot Miranda-2
On 13 October 2010 21:56, Eliot Miranda <[hidden email]> wrote:

>
>
> On Wed, Oct 13, 2010 at 11:37 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 13 October 2010 21:06, Eliot Miranda <[hidden email]> wrote:
>> >
>> >
>> > On Wed, Oct 13, 2010 at 9:41 AM, Tom Rushworth <[hidden email]>
>> > wrote:
>> >>
>> >> Hi Igor, Eliot,
>> >>
>> >> Eliot, I'm not sure I understood your comment:
>> >>
>> >> On 2010-Oct-13, at 08:55, Eliot Miranda wrote:
>> >>
>> >> > Right.  And because Smalltak can't reify variables and CAS is an
>> >> > operation on a variable CAS can't be implemented as a primitive on
>> >> > variables.  There's no way to express "pass a variable to a
>> >> > primitive",
>> >> > only "pass an expression (which may be the value of a variable)" to a
>> >> > primitive".  One could do it with a primitive on an object, e.g.
>> >> > thisContext at: tempIndex compareWith: match andSetTo: expr, or
>> >> > anObject
>> >> > instVarAt: varIndex compareWith: match andSetTo: expr but that's
>> >> > soooo
>> >> > ugly.  Hence I think it is better done using a special assignment
>> >> > operator.
>> >
>> > I mean one can't write
>> >     | t |
>> >     self compare: t andSwap: expr
>> > because in the above the rvalue of t is passed, not it's lvalue, and cas
>> > needs to operate on an lvalue (rvalue == read value, or value of a
>> > variable;
>> > lvalue = location value, or address of a variable).  Clearly cas needs
>> > to
>> > apply to a variable, not the value in the variable.  The variable's
>> > value is
>> > compared and the variable is set to a new value if it matches.  Look at
>> > Igor's example and you'll see his primitive on Arrays passes an lvalue
>> > since
>> > the primitive takes the index of an array element.
>> > e.g. if you implemented CAS in C using functions instead of macros you'd
>> > know that you couldn't do it with int cas(int var, int match, int new)
>> > but
>> > you'd know that you can do it with int cas(int *var, int match, int new)
>> > and
>> > ... int v; ... cas(&v, match, new)
>> >
>> >>
>> >> Doesn't this apply to Igor's proposed atomic swap as well?  The target
>> >> in
>> >> Igor's example was a variable "locked" which was clearly meant to be
>> >> visible
>> >> to other threads.
>> >
>> > Yes.  First time I read Igor's post I only saw the initial
>> > "| var1 var2 temp |
>> > temp := var1.
>> > var1 := var2.
>> > var2 := temp."
>> > which I rejected and didn't read further.  Now I see the :=:.  I don't
>> > like
>> > this because it doesn't map as nicely to CAS which I like using.  I like
>> > the
>> > boolean result one gets from CAS that says the assignment either happened
>> > or
>> > it didn't.  Atomic swap is more low-level.  I then have to inspect the
>> > value
>> > the variable used to have and undo the swap if it wasn't what was
>> > expected.
>> >  So
>> >         | var |
>> >         var ?= expr : match
>> > is equivalent to
>> >         [| var scratch |
>> >         scratch := expr.
>> >         var :=: scratch.
>> >         scratch == match
>> >             ifTrue: [true]
>> >             ifFalse: [var :=: scratch. false]] valueUninterruptibly
>> > Hence I think :=: is too low-level.
>> >
>>
>> Well, once you start using valueUninterruptibly, there is not much
>> reason to use/introduce atomic swap or CAS :)
>
> Right :)
>
>>
>> Moreover, I see no trivial way to implement CAS using atomic swap,
>> while the reverse is trivial.
>
> Is it, or does it also rely on uninterruptibility?  Is this right?
>     v1 :=: v2
> is equivalent to
>     [ | scratch |
>      scratch := v1.
>      v1 ?= v2 : scratch] whileFalse.
>      v2 := scratch.
>

Nope! Not in all cases.

If v1 and v2 have equal significance in the algorithm, not just v1,
then it won't work.
I mean, if the above is interrupted after leaving the loop but
before you assign to v2 (v2 := scratch),
and your code expects that not only v1 but also v2 was swapped correctly,
then you're in trouble.

Swap is symmetric, i.e. v1 :=: v2 should be semantically equal to v2 :=: v1.
But that's not achievable unless it's also atomic.

So, I must confess (again) that CAS cannot serve as a complete
replacement for atomic swap.
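
To make that window concrete, emulating a two-sided swap with CAS looks roughly like this in C (a sketch using GCC's __sync_bool_compare_and_swap builtin); the comment marks the point where another thread can still observe the old value of v2:

    /* Emulating v1 :=: v2 with CAS when BOTH variables are shared. */
    void pseudo_swap(int *v1, int *v2)
    {
        int scratch;
        do {
            scratch = *v1;
        } while (!__sync_bool_compare_and_swap(v1, scratch, *v2));
        /* <-- a thread running here sees the new *v1 but still the old *v2 */
        *v2 = scratch;
    }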


>
>> From this point of view, CAS is better.
>>
>> >>
>> >> If that's the case, then they both need to be implemented with a
>> >> special
>> >> assignment operator, and Igor and I can get back to arguing the merits
>> >> of
>> >> the atomic swap versus the atomic compare and swap :).  If not, then
>> >> Igor's
>> >> swap has a clear advantage and I guess you can both ignore the rest of
>> >> this
>> >> email...
>> >
>> > Um, mine /does/ use a special assignment operator, "?= :" :)
>> >
>>
>> var ?= match : expression
>>
>> could mean a lot of hacking in the compiler. How about:
>>
>> var ?= { match . expression }
>>
>> so, you just need to introduce a new binary operator ?= "CAS
>> assignment" or "conditional assignment"
>> which is much simpler to hack compared to a ternary one.
>
> That's a good suggestion, but let's see.  Personally I find  var ?= { match
> . expression } ok but a little flowery :)
>

Yes. But it conforms better to traditional ST syntax.  :)

>> And you can also force the compiler to expect only '{' after ?=, so no
>>
>> var ?= foo
>> var ?= #( foo )
>> var ?= self bar
>>
>> will be allowed.
>
> But ?= : is no more difficult than parsing expressions within keywords.  I
> guess it'll look something like
>     conditionalAssignment: varNode
>         " var '?=' expression ':' match => ConditionalAssignmentNode."
>         | loc start valueNode matchNode |
>         (loc := varNode assignmentCheck: encoder at: prevMark + requestorOffset) >= 0 ifTrue:
>             [^self notify: 'Cannot store into' at: loc].
>         start := self startOfNextToken.
>         self advance.
>         self expression ifFalse: [^self expected: 'Expression'].
>         valueNode := parseNode.
>         (self match: #colon) ifFalse: [^self expected: 'colon'].
>         self advance.
>         self expression ifFalse: [^self expected: 'Expression'].
>         matchNode := parseNode.
>         parseNode := ConditionalAssignmentNode new
>             variable: varNode
>             value: valueNode
>             match: matchNode
>             from: encoder
>             sourceRange: (start to: self endOfLastToken).
>         varNode nowHasDef.
>         ^true

Okay, if you think it's easy, no problem.
But I am a bit worried about parsing & clearly reading the following:

boo ?= #foo:bar: : #baz: .

A colon does not seem like a good choice here.

>>
>>
>> >>


--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
On 13 October 2010 22:58, Igor Stasenko <[hidden email]> wrote:

>>> Moreover, I see no trivial way to implement CAS using atomic swap,
>>> while the reverse is trivial.
>>
>> Is it, or does it also rely on uninterruptibility?  Is this right?
>>     v1 :=: v2
>> is equivalent to
>>     [ | scratch |
>>      scratch := v1.
>>      v1 ?= v2 : scratch] whileFalse.
>>      v2 := scratch.
>>
>
> Nope! Not in all cases.
>
For example, there may be algorithms which rely on
v1 never being equal to v2.

> If v1 and v2 have equal significance in the algorithm, not just v1,
> then it won't work.
> I mean, if the above is interrupted after leaving the loop but
> before you assign to v2 (v2 := scratch),
> and your code expects that not only v1 but also v2 was swapped correctly,
> then you're in trouble.
>
> Swap is symmetric, i.e. v1 :=: v2 should be semantically equal to v2 :=: v1.
> But that's not achievable unless it's also atomic.
>
> So, I must confess (again) that CAS cannot serve as a complete
> replacement for atomic swap.
>
--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Eliot Miranda-2
In reply to this post by Igor Stasenko


On Wed, Oct 13, 2010 at 12:58 PM, Igor Stasenko <[hidden email]> wrote:
[snip]

Okay, if you think it's easy, no problem.
But I am a bit worried about parsing & clearly reading the following:

boo ?= #foo:bar: : #baz: .

A colon does not seem like a good choice here.

OK, how about
    boo ?= #foo:bar: ? #baz
?


The main problem we have now is finding space in the bytecode.  At least for me, the remaining bytecodes have all been taken by Newspeak (and I'm eager to get Newspeak running on Cog).  So my plan is new object representation, new bytecode set, then CAS.


>>
>>
>> >>


--
Best regards,
Igor Stasenko AKA sig.





Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
In reply to this post by Igor Stasenko

On 2010-Oct-13, at 12:58, Igor Stasenko wrote:

>>
>> Is it, or does it also rely on uninterruptibility?  Is this right?
>>     v1 :=: v2
>> is equivalent to
>>     [ | scratch |
>>      scratch := v1.
>>      v1 ?= v2 : scratch] whileFalse.
>>      v2 := scratch.
>>
>
> Nope! Not in all cases.
>
> If v1 and v2 have equal significance in the algorithm, not just v1,
> then it won't work.
> I mean, if the above is interrupted after leaving the loop but
> before you assign to v2 (v2 := scratch),
> and your code expects that not only v1 but also v2 was swapped correctly,
> then you're in trouble.
>
> Swap is symmetric, i.e. v1 :=: v2 should be semantically equal to v2 :=: v1.
> But that's not achievable unless it's also atomic.
>
> So, I must confess (again) that CAS cannot serve as a complete
> replacement for atomic swap.

I hadn't thought about both v1 and v2 being "global" (in the sense that they are both visible to other threads).  For the sake of the discussion I'll use a little C syntax so the lvalue/rvalue distinction is clear.

CAS looks like: boolean CAS(int *var, int match, int newVal) // only one lvalue

ASWP looks like: void ASWP(int *var1, int *var2) // two lvalues !!

If the second lvalue is thread-local, you can simulate ASWP with CAS.  If it isn't you need to step up to DCAS (Double Compare And Swap - two simultaneous, non-adjacent CAS ops as one atomic op), which is a much rarer bird. (I'm not aware of any actual HW that supports it.)  Another option might be STM (software transactional memory (I think)), but that's not really common either.
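
For example, simulating ASWP with CAS when the second operand is thread-local might look like this in C (a sketch using GCC's __sync_bool_compare_and_swap builtin):

    /* Exchange a thread-local value with a shared variable.  Safe, because
       only *shared is visible to other threads. */
    int swap_with_local(int *shared, int localValue)
    {
        int observed;
        do {
            observed = *shared;
        } while (!__sync_bool_compare_and_swap(shared, observed, localValue));
        return observed;              /* the value the shared variable held */
    }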

So, given how rare DCAS is, I'd have to say it isn't something we should be doing in Squeak, which leads to the question of just how often you expect to have an algorithm using ASWP where both v1 and v2 are "global" ?  I've spent a couple of months over the last few years reading papers on lock free data structures and there are lots of papers using CAS, but none I remember for an atomic swap, so I still think CAS is a better thing to add to Squeak than ASWP.  If you can come up with good examples for algorithms that need both v1 and v2 global then you might have a good argument, but I know there are good examples for algorithms using CAS, so you sort of need to match them :).
>
>
[snip]
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>

--
Tom Rushworth





Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
In reply to this post by Eliot Miranda-2
On 13 October 2010 23:59, Eliot Miranda <[hidden email]> wrote:

>
>
>> [snip]
>> Okay, if you think it's easy, no problem.
>> But I am a bit worried about parsing & clearly reading the following:
>>
>> boo ?= #foo:bar: : #baz: .
>>
>> A colon does not seem like a good choice here.
>
> OK, how about
>     boo ?= #foo:bar: ? #baz
> ?
>

Err.. #? as well as #?= are currently parsed as valid binary selectors,
so
#foo:bar: ? #baz
can be treated as a valid binary message send expression.

In contrast, :=: leads to a syntax error, which is good,
as well as
:=?

var :=: { match . expression }
var :=? { match . expression }

var :=? match :=: expression .


> The main problem we have now is finding space in the bytecode.  At least for
> me, the remaining bytecodes have all been taken by Newspeak (and I'm eager
> to get Newspeak running on Cog).  So my plan is new object representation,
> new bytecode set, then CAS.

Always leave at least a single bytecode for future extensions.


--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
In reply to this post by Tom Rushworth-3
On 14 October 2010 00:07, Tom Rushworth <[hidden email]> wrote:

>
> On 2010-Oct-13, at 12:58, Igor Stasenko wrote:
>
>> [snip]
>
> I hadn't thought about both v1 and v2 being "global" (in the sense that they are both visible to other threads).  For the sake of the discussion I'll use a little C syntax so the lvalue/rvalue distinction is clear.
>
> CAS looks like: boolean CAS(int *var, int match, int newVal) // only one lvalue
>
> ASWP looks like: void ASWP(int *var1, int *var2) // two lvalues !!
>
> If the second lvalue is thread-local, you can simulate ASWP with CAS.  If it isn't you need to step up to DCAS (Double Compare And Swap - two simultaneous, non-adjacent CAS ops as one atomic op), which is a much rarer bird. (I'm not aware of any actual HW that supports it.)  Another option might be STM (software transactional memory (I think)), but that's not really common either.
>
> So, given how rare DCAS is, I'd have to say it isn't something we should be doing in Squeak, which leads to the question of just how often you expect to have an algorithm using ASWP where both v1 and v2 are "global" ?  I've spent a couple of months over the last few years reading papers on lock free data structures and there are lots of papers using CAS, but none I remember for an atomic swap, so I still think CAS is a better thing to add to Squeak than ASWP.  If you can come up with good examples for algorithms that need both v1 and v2 global then you might have a good argument, but I know there are good examples for algorithms using CAS, so you sort of need to match them :).

I think you won't find any papers about ASWP, because all researchers
think in this way:
 - okay, I have a CAS, now how do I implement X using it. :)

Lets get back to my LIFO queue, its seems i found how to make it right
using ASWP:

Populating:

| temp |
temp := newItem := QueueItem new.
newItem next: newItem. "note this"
queueHead :=: newItem.
temp next: newItem.


Flushing and processing the queue:

| oldHead |
oldHead := nil.
queueHead :=: oldHead.

[ oldHead notNil ] whileTrue: [
  oldHead next == oldHead ifFalse: [
    "process the queue item here"
  oldHead := oldHead next. ]
]

In this way, the processing thread will loop doing nothing
until the thread that is populating the new item finishes its job, i.e. attaches
the rest of the queue (or nil).
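
For comparison, the same populate and flush written with CAS instead of :=: might look roughly like this in C (a sketch assuming GCC's __sync_bool_compare_and_swap builtin; QueueItem and queueHead simply mirror the names above):

    typedef struct QueueItem {
        struct QueueItem *next;
        /* payload fields would go here */
    } QueueItem;

    static QueueItem *queueHead = 0;  /* shared LIFO head */

    /* Populating: retry until the head we read is still the head. */
    void lifo_push(QueueItem *newItem)
    {
        QueueItem *old;
        do {
            old = queueHead;
            newItem->next = old;
        } while (!__sync_bool_compare_and_swap(&queueHead, old, newItem));
    }

    /* Flushing: atomically detach the whole list.  No spinning on half-built
       items, because push only publishes a fully linked node. */
    QueueItem *lifo_flush(void)
    {
        QueueItem *oldHead;
        do {
            oldHead = queueHead;
        } while (oldHead != 0
                 && !__sync_bool_compare_and_swap(&queueHead, oldHead, 0));
        return oldHead;               /* caller walks the next chain */
    }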

>>
>
> --
> Tom Rushworth
>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
>
> Lets get back to my LIFO queue, its seems i found how to make it right
> using ASWP:
>
> Populating:
>
> | temp |
> temp := newItem := QueueItem new.
> newItem next: newItem. "note this"
> queueHead :=: newItem.
> temp next: newItem.
>
>
> Flushing and processing the queue:
>
> | oldHead |
> oldHead := nil.
> queueHead :=: oldHead.
>
> [ oldHead notNil ] whileTrue: [
>  oldHead next == oldHead ifFalse: [
>    "process the queue item here"
>  oldHead := oldHead next. ]
> ]
>

Popping just one item:

| head next rest |

 "DummyQueueItem same as QueueItem, but carrying no operation"
 rest := head := DummyQueueItem new.

 "same trick, point to itself "
 head next: head.

 head :=: queueHead.   "swap"

 head ifNil: [ rest next: nil. ^ self ]. "okay, queue was empty"

"spin, to make sure we're not in the middle of populating/extracting an item"
 [ head next == head ] whileTrue.

 "skip over dummies"
 [ head notNil and: [ head isDummy ]] whileTrue: [ head := head next. ].

 head ifNil: [ rest next: nil. ^ self ]. "okay, there was just dummies"

 rest next: head next.  "append queue tail back to head"

 self processSingleItem: head


So, in summary:
 - population requires no spin loops.
 - extraction requires 1 spin loop.

Of course it may be incorrect. Please check.

And I am really interested to compare it with an implementation which
uses CAS instead.
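
For comparison, the textbook CAS version of popping a single item is sketched below in C (same QueueItem/queueHead layout as the push/flush sketch earlier).  Note that it deliberately ignores the ABA problem, which a real implementation has to deal with, e.g. with a version counter:

    /* Pop one item from the shared LIFO, or return 0 if it is empty. */
    QueueItem *lifo_pop(void)
    {
        QueueItem *head, *next;
        do {
            head = queueHead;
            if (head == 0)
                return 0;             /* queue was empty */
            next = head->next;
        } while (!__sync_bool_compare_and_swap(&queueHead, head, next));
        return head;                  /* caller processes this single item */
    }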

--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
In reply to this post by Igor Stasenko

On 2010-Oct-13, at 14:28, Igor Stasenko wrote:

> On 14 October 2010 00:07, Tom Rushworth <[hidden email]> wrote:
>>
>> [snip]
>
> I think you won't find any papers about ASWP, because all researchers
> think in this way:
> - okay, I have a CAS, now how do I implement X using it. :)

So true.  There's a very strong "band wagon" effect :).

(I may be being too English-idiomatic here.  English has an idiom "jump on the band wagon" for joining the currently popular fad or movement, so the "band wagon effect" is what happens when all the researchers see someone with a good idea, or even just a really good paper :).)
>
> Lets get back to my LIFO queue, its seems i found how to make it right
> using ASWP:
>
> Populating:
>
> | temp |
> temp := newItem := QueueItem new.
> newItem next: newItem. "note this"
OK, at this point newItem is a single item circular list.
> queueHead :=: newItem.
Now queueHead is a single item circular list, and newItem is the old queue.  The single item circular list is the "guard" for flushing.
??? What happens if another thread (or threads) does a "Populate" right here, before the "Flush" grabs it, and before this Populate completes?  You end up with two circular lists, and two old queues in two different thread instances of temp - do they get re-assembled in the right way and does Flush handle it correctly?
> temp next: newItem.
This breaks the circular list by appending the old queue.
>
>
> Flushing and processing the queue:
>
> | oldHead |
> oldHead := nil.
> queueHead :=: oldHead.
Now oldHead is the queue and queueHead is nil.
>
> [ oldHead notNil ] whileTrue: [
"spin while the queue is a single item circular list - i.e wait for the last line of the "Populating" code.
>  oldHead next == oldHead ifFalse: [
>    "process the queue item here"
>  oldHead := oldHead next. ]
> ]
>
> In this way, the processing thread will loop doing nothing
> until the thread that is populating the new item finishes its job, i.e. attaches
> the rest of the queue (or nil).

This looks better, but I still have doubts; see the "???" above.  We need to work out the details of n>1 "Populates" happening at the same time.

>
>>>
>>
>> --
>> Tom Rushworth
>>
>>
>>
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>

--
Tom Rushworth





Re: [Pharo-project] How about atomic value-swap bytecode?

Igor Stasenko
On 14 October 2010 01:22, Tom Rushworth <[hidden email]> wrote:

>
> On 2010-Oct-13, at 14:28, Igor Stasenko wrote:
>
> [snip]
>>
>> Lets get back to my LIFO queue, its seems i found how to make it right
>> using ASWP:
>>
>> Populating:
>>
>> | temp |
>> temp := newItem := QueueItem new.
>> newItem next: newItem. "note this"
> OK, at this point newItem is a single item circular list.
>> queueHead :=: newItem.
> Now queueHead is a single item circular list, and newItem is the old queue.  The single item circular list is the "guard" for flushing.
> ??? What happens if another thread (or threads) does a "Populate" right here, before the "Flush" grabs it, and before this Populate completes?  You end up with two circular lists, and two old queues in two different thread instances of temp - do they get re-assembled in the right way and does Flush handle it correctly?

Okay, let's see what happens.
Suppose we have P1 and P2 - populating threads - and one F -
a flushing/processing thread.

P1 swaps first:

head <= item1 (circular)
queue tail is known by P1 but not attached yet

now P2 interrupts P1

head <= item2 (circular)
the queue tail, known by P2, points to item1


now F goes into play

head <= nil.
the queue tail points to item2 (circular)

- it spins until P2 finishes its work and breaks the circular
reference by setting item2 next: item1
- now it goes to item1
- again it spins until P1 finishes its work and breaks the circular
reference by setting item1 next: old head


If F interrupts P1 before P2, then P2 will see nil as the queue tail, and
F will be able to process item2
on the next flush iteration.

Looks like everything is fine.


--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] How about atomic value-swap bytecode?

Tom Rushworth-3
In reply to this post by Igor Stasenko
Hi Igor,

It looks like I owe you an apology for (incorrectly) claiming your atomic swap (ASWP) was weaker than atomic compare and swap (CAS).

I went back to Herlihy's original 1991 paper[A] where he analyzed the various operations, and there are two types of swap considered.  I'll use "global" to mean "visible to all threads", and "local" to mean "visible to only a single thread".  The first type of swap, where one var is global and one is local, is the weak one.  The second type of swap, where both vars are global, is as strong as CAS.

So, my bad - I hadn't remembered enough of the details of the paper, just the vague notion that "swap was bad and cas was good" :(.

However, I still think that the best operation to add to Squeak is CAS, for two reasons:

1) the heart of the bytecode can be the actual HW CAS instruction, so that the thread-safety of the bytecode does not depend on any other property of the VM such as scheduling conventions or block implementation strategy.  Most HW architectures today have CAS; I don't know of any with ASWP.  I guess ASWP has to have two main-memory locks while CAS only needs one, so the HW folks went with the one that looked easier :).
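
To illustrate point 1: the core of such a primitive could be little more than the compiler builtin wrapping the HW instruction.  A C sketch; primitiveCompareAndSwap, oop and the resolved slot pointer are illustrative assumptions here, not Cog's actual API:

    #include <stdint.h>

    typedef intptr_t oop;             /* assumed tagged object pointer */

    /* 'slot' stands for the address of the variable resolved by the VM. */
    int primitiveCompareAndSwap(oop *slot, oop match, oop newValue)
    {
        /* compiles down to the HW CAS (e.g. LOCK CMPXCHG on x86) */
        return __sync_bool_compare_and_swap(slot, match, newValue);
    }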

2) the "band wagon effect" I mentioned in another post has meant that most published algorithms are based on CAS, and wait-free or lock-free algorithms are very, very, very, difficult to get right[B], so if we jump on the band wagon as well, we are likely to save ourselves a lot of work and possibly some grief by being able to make use of the largest collection of known-good algorithms.

[A] "Wait-Free Synchronization" by Maurice Herlihy, in ACM Transactions on Programming Languages and Systems, Vol. 11, No. 1, January 1991, pages 124-149.

[B] This difficulty is the "real" conclusion to be drawn from the paper I mentioned earlier in this thread about the flaw in the double-ended queue algorithm.  The paper in question is:
"DCAS is not a Silver Bullet for Nonblocking Algorithm Design" by Simon Doherty, David L. Detlefs, Lindsay Groves, Christine H. Flood, Victor Luchangco, Paul A. Martin, Mark Moir, Nir Shavit, and Guy L. Steele Jr., in
SPAA 2004, June 27-30, 2004, Barcelona, Spain.
In this paper they actually resorted to using a program verification system called PVS to make CERTAIN that their corrected algorithm is good.

On 2010-Oct-13, at 12:17, Igor Stasenko wrote:

>>> [snip]

>> [snip]
>
> Okay, I must confess that atomic swap seems inferior to CAS.
> So, let's stick with CAS.
>
> Thanks, Tom, for providing arguments and proving your point.
>
> --
> Best regards,
> Igor Stasenko AKA sig.

--
Tom Rushworth



