Concurrent Futures


Re: Concurrent Futures

Joshua Gargus-2

On Oct 29, 2007, at 1:50 PM, Igor Stasenko wrote:

>
> Let's see what happens if we have only future sends.
> Then, given the code:
> a print.
> b print.
>
> will not guarantee that a will be printed before b. Now you must
> ensure that you preserve imperative semantics, which may be done as
> follows:
> futureA := a print.
> futureA whenComplete: [ b print ].

No, that's not true.  Croquet and E both ensure that the future  
messages are processed in the order that they are sent.  No  
#whenComplete: is required, since the second print expression does  
not depend on the first.

>
> Yes, we can make the 'futureA whenComplete:' check implicit (by
> modifying the VM), then we can preserve old code. But do we really
> need futures everywhere?

We don't need futures everywhere.  Croquet has chosen to make futures  
explicit; your example would be written:

a future print.
b future print.

At least in Croquet (I don't know what a "typical" E program looks  
like), future sends are used sparingly... the vast majority of  
messages are regular immediate sends.
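To make the ordering guarantee concrete, here is a toy sketch (Python rather than Smalltalk, purely illustrative; Island, future_send and print_ are invented names, not Croquet's API): each island drains a single FIFO queue on one thread, so two future sends are executed in the order they were scheduled, with no #whenComplete: needed.

```python
import queue
import threading

class Island:
    """Toy model of an island/vat: one FIFO queue drained by one thread.

    Because a single thread drains the queue in order, future sends are
    executed in exactly the order they were scheduled."""
    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg = self._queue.get()
            if msg is None:
                break
            receiver, selector, args = msg
            getattr(receiver, selector)(*args)

    def future_send(self, receiver, selector, *args):
        # "a future print." becomes island.future_send(a, 'print_')
        self._queue.put((receiver, selector, args))

    def shutdown(self):
        self._queue.put(None)
        self._thread.join()

log = []

class Printable:
    def __init__(self, name):
        self.name = name
    def print_(self):
        log.append(self.name)

island = Island()
a, b = Printable('a'), Printable('b')
island.future_send(a, 'print_')   # a future print.
island.future_send(b, 'print_')   # b future print.
island.shutdown()
assert log == ['a', 'b']          # order preserved, no whenComplete: needed
```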

Josh


Re: Concurrent Futures

Joshua Gargus-2
In reply to this post by Igor Stasenko

On Oct 29, 2007, at 2:16 PM, Igor Stasenko wrote:
>
> Yes we do, but what prevents others from implementing own locking
> semantics based on direct message sends (not futures)?

What prevents me from using FFI to allocate memory that isn't managed  
by the garbage collector?  Nothing, of course.  But if I create a  
memory leak in this way, it's silly to blame the garbage collector.

I think the analogy is clear, but I'll be explicit... if you  
circumvent a future-based concurrency mechanism by implementing  
locking mechanisms, then it is silly to blame the futures for the  
deadlock that you've created.

Cheers,
Josh


Re: Concurrent Futures

Giovanni Corriga
In reply to this post by Andreas.Raab
On Mon, 29/10/2007 at 13.34 -0800, Andreas Raab wrote:

> Not "all messages sends". Only messages between concurrent entities
> (islands). This is the main difference to the all-out actors model
> (where each object is its own unit of concurrency) and has the advantage
> that you can reuse all of todays single-threaded code.

Just out of curiosity, how much work do you think would be necessary
to port the Islands system to the standard Squeak image?

        Giovanni



Re: Concurrent Futures

Igor Stasenko
On 30/10/2007, Giovanni Corriga <[hidden email]> wrote:

> Il giorno lun, 29/10/2007 alle 13.34 -0800, Andreas Raab ha scritto:
>
> > Not "all messages sends". Only messages between concurrent entities
> > (islands). This is the main difference to the all-out actors model
> > (where each object is its own unit of concurrency) and has the advantage
> > that you can reuse all of todays single-threaded code.
>
> Just out of curiosity, how much work do you think it would be necessary
> to port the Islands system to the standard Squeak image?
>
>         Giovanni
>
This is what I mean. There is a BIG difference between concurrency
(parallel execution with shared memory) and distributed computing.
Islands fit distributed computing well, but do they fit concurrent
parallel execution? I doubt it.

--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Rob Withers

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Monday, October 29, 2007 3:52 PM
Subject: Re: Concurrent Futures


> This is what i mean. There is a BIG difference between concurrency
> (parallel execution with shared memory) and distributed computing.
> An 'islands' is fitting good for distributed computing, but does they
> fit for concurrent parallel execution? I doubt.

Igor, where would you place concurrency with disjoint memories?

Suppose we assigned objects to specific processes within a shared
memory, such that only those processes could mutate those objects, and
the processes were non-interruptible and non-waitable. Would that be
sufficient to make it disjoint?  Imagine that every object reference
(header) was assigned to a specific Vat, and only processes within that
Vat could interact with that object.  All message sends to that object
from other Vats would eventually be scheduled with that object's Vat.
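A minimal sketch of this ownership scheme (Python; Vat, owner, and the queue machinery are illustrative assumptions, not an existing implementation): each object's header names its owning Vat, a local send runs immediately, and a foreign send is queued for the owner's event loop.

```python
from collections import deque

class Vat:
    """Every object's header names its owning Vat; only that Vat's
    event loop may run methods that mutate the object."""
    def __init__(self, name):
        self.name = name
        self.inbox = deque()

    def send(self, receiver, selector, *args):
        if receiver.owner is self:
            # local send: run immediately on this Vat
            return getattr(receiver, selector)(*args)
        # foreign send: eventually scheduled with the owner's Vat
        receiver.owner.inbox.append((receiver, selector, args))

    def run_pending(self):
        while self.inbox:
            receiver, selector, args = self.inbox.popleft()
            getattr(receiver, selector)(*args)

class Counter:
    def __init__(self, owner):
        self.owner = owner    # the per-object "header" slot
        self.value = 0
    def increment(self):
        self.value += 1

vat_a, vat_b = Vat('A'), Vat('B')
counter = Counter(owner=vat_b)

vat_a.send(counter, 'increment')  # cross-Vat: queued, not run yet
assert counter.value == 0
vat_b.run_pending()               # the owner's loop executes it
assert counter.value == 1
```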

Rob



Re: Concurrent Futures

Igor Stasenko
In reply to this post by Andreas.Raab
On 29/10/2007, Andreas Raab <[hidden email]> wrote:

> Igor Stasenko wrote:
> >> but it is still deadlock-free since there is no wait involved (neither
> >> "classic" nor "busy-wait). In fact, we use recursions like the above in
> >> various places in Croquet.
> >
> > See, unless you make all message sends in language as futures, you
> > can't guarantee that  some code will not end up with locking
> > semantics.
>
> Not "all messages sends". Only messages between concurrent entities
> (islands). This is the main difference to the all-out actors model
> (where each object is its own unit of concurrency) and has the advantage
> that you can reuse all of todays single-threaded code.
>

How would you define the boundaries of these entities within the same
image? Could you illustrate with some simple examples, or a strategy
which can be used for employing them for concurrent execution within a
single VM? I'm very interested in practical usage of futures myself.
What would you do, or how would you avoid the situation where two
different islands holding a reference to the same object in the VM
send direct messages to it, causing a race condition?

The difference between distributed computing and parallel (or
concurrent) computing with shared memory is that with shared memory,
any change of state of any object is automatically visible across the
entire image, while in distributed computing it is not.

> > Lets see , what is happen if we have only a future sends.
> > Then, given code:
> > a print.
> > b print.
> >
> > will not guarantee that a will be printed before b.
>
> Actually it will, if and only if a and b are in the same unit of
> concurrency (island). Your example is a bit misleading because of having
> different receivers - if those were in different islands then indeed
> there will be no guarantee that a prints before b. So for simplicity
> let's change this to:
>
>    Transcript future print: a.
>    Transcript future print: b.
>
> Do we need a whenResolved: block to serialize execution? No we don't
> because messages between two islands are executed in the order in which
> they were scheduled. Everything else would be a straight ticket to
> insanity ;-)
>

Yes. But this example is a significant one. Sometimes I want these
messages to run in parallel, sometimes I don't, even for a single
'island'. Then, for a general solution, we need these islands to be
either very small (the smallest being a single object) or to contain a
big number of objects. The question is how to give control of their
sizes to the developer. How can a developer define the boundaries of an
island within a single image?
I will not accept any solution like 'multiple images', because this
drives us into the distributed computing domain, which is _NOT_
concurrent computing anymore, simply because it's not using shared
memory; in fact there is no sharing at all, only a glimpse of it.

> > Yes, we can make 'futureA whenComplete:'  check implicitly (by
> > modifying VM), then we can preserve old code. But do we really need a
> > futures everywhere?
>
> No, we don't. See above.
>
> > Or we give up with an imperative style and use something different
> > which fits better with futures, or we give up with futures.
>
> The nice thing about futures is that they can be put on top of
> everything else. We use them in Croquet today.
>
> > Or, we using both of them by mixing.. (which i think is most
> > appropriate).. But then, stating that such system can be really
> > lock-free, is wrong, because it depends on decision of concrete
> > developer and his code.
>
> This may be the outcome for an interim period. The good thing here is
> that you can *prove* that your program is deadlock-free simply by not
> using waits. And ain't that a nice property to have.
>
You mean waits like this (consider the following two lines of code run in parallel):

[ a isUnlocked ] whileFalse: [ ]. b unlock.

and

[ b isUnlocked] whileFalse: []. a unlock.

You can remove waits, but you can't remove bad usage patterns from the
brain of the developer :)
And how could you guarantee that any bit of code in the current ST
image does not contain such hidden locks, such as loops or recursive
loops which will never return until some external entity changes the
state of some object(s)?
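The two polling loops above form a circular busy-wait: each loop can only terminate after the other task's unlock runs, and each unlock runs only after its own loop terminates. A bounded simulation (Python, illustrative only) of two such tasks under a round-robin scheduler shows neither ever progresses:

```python
# Two cooperatively-scheduled tasks, each polling a flag that only the
# *other* task sets after its own loop finishes: a circular wait.
a_unlocked = False
b_unlocked = False

def task1():
    global b_unlocked
    while not a_unlocked:     # [ a isUnlocked ] whileFalse: [ ]
        yield                 # spin
    b_unlocked = True         # b unlock.

def task2():
    global a_unlocked
    while not b_unlocked:     # [ b isUnlocked ] whileFalse: [ ]
        yield
    a_unlocked = True         # a unlock.

t1, t2 = task1(), task2()
for _ in range(10_000):       # round-robin "scheduler", bounded so it halts
    next(t1)
    next(t2)

# After 10,000 interleaved steps neither task has escaped its loop:
assert not a_unlocked and not b_unlocked
```

There is no wait primitive anywhere in this code, yet the system is stuck, which is the point of the objection.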

> >>> As for GC - you have automatic memory management instead of manual.
> >>> But there's no automatic algorithm management and never will be ,
> >>> given any language :)
> >> And what's that supposed to mean?
> >>
> > I pointed that futures as an 'automatic lock-free' approach is not
> > quite parallel to 'automatic memory management by GC'.
>
> The similarity is striking. Both in terms of tradeoffs (trade low-level
> control for better productivity) as well as the style of arguments made
> against it ;-) Not that I mind by the way, I find these discussions
> necessary.
>
The striking thing is that introducing GC does good things: it removes
the need to care about memory, which helps a lot in development and
makes code clearer and smaller. But I can't see how futures do the
same. There are still a lot of things for the developer to consider
even when using futures.
Of course, I don't have practice in using futures and can't tell the
difference, but from what I can see, it's not as easy to use as
automatic memory management.
Yes, it's easy to pick a random piece of code and blindly type the word
'future'. But I don't think that every such piece will work as before
;)

> Cheers,
>    - Andreas
>
>


--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Igor Stasenko
In reply to this post by Rob Withers
On 30/10/2007, Rob Withers <[hidden email]> wrote:

> Igor, where would you place concurrency with disjoint memories?
>
> Would assigning objects to specific processes, within a shared memory, such
> that only those processes could mutate that object and the processes were
> non-interruptable and non-waitable, would that be sufficient to make it
> disjoint?  Imagine that every object reference (header) was assigned to a
> specific Vat, and only processes within that Vat could interact with that
> object.  All msg sends to that object, from other Vats, would be eventually
> scheduled with that object's Vat.
>

Simply because it doesn't scale well. Consider 1000 and 1 Vats (the
1001 tales come to mind :).
1000 Vats send a message to the same object, which is scheduled in a
single Vat. So there will be a HUGE difference in time between when the
first sender and the last sender receive an answer.
Yes, for some messages that's the only answer, but for other messages,
like simply asking for state (an accessor), there's no need for
synchronization, because there's no change of state and no real need
for scheduling.

The concept you give is really close to the one I have in mind, except
for the example above, which it seems will always be a bottleneck.

> Rob

--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Rob Withers

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>

> Simply because its not scales well. Consider a 1000 and 1 Vats ( 1000
> and 1 tales comes in mind :).
> A 1000 Vats sending a message to the same object, which is scheduled
> in single Vat. So, there will be a HUGE difference in time between
> first sender and last sender when they will receive an answer.
> Yes, for some messages its the only answer, but for other messages,
> like simply asking a state (like accessor), there's no need in
> synchronization, because there's no changes in state and no real need
> in scheduling.
>
> Concept you give is really close to one , which i have in mind. Except
> the example above, which seems will be always a bottleneck.

Ok, so we have coded a bottleneck situation.  It's not a surprise,
since we have 1000 objects asking for its services.  It's time to
refactor and remove that bottleneck with some service duplication and
load balancing.  Using promises or futures doesn't remove bad code, it
just limits the kinds of bad code you can have.  This would also be a
problem in a distributed application that was sending 1000 service
requests to a single object.

Regarding accessing some state for a read: if 2 Vats or Islands are on
the same processor with shared memory between them, and Vat A tries to
access the state of an object owned by Vat B, then as long as there is
no relocation occurring, we should be able to optimize access to this
state with a direct memory read operation.  Of course, in Smalltalk we
access state owned by an object by sending that object a read accessor
message, and I don't see how we could determine that a given message
send was there just for a memory read and do the optimization.  It
wouldn't surprise me if Exupery might know such things.
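If reads could be distinguished from mutations, the fast path might look like this sketch (Python; OwnedPoint and the method names are illustrative): reads touch memory directly with no scheduling, while mutations still go through the owning Vat's queue.

```python
from collections import deque

class OwnedPoint:
    """Writes must go through the owner's queue; plain reads can be a
    direct memory access, since they don't change state."""
    def __init__(self):
        self.x = 0
        self.pending = deque()     # the owning Vat's message queue

    # fast path: direct read, no scheduling
    def read_x(self):
        return self.x

    # slow path: the mutation is eventually executed by the owner
    def schedule_set_x(self, value):
        self.pending.append(lambda: setattr(self, 'x', value))

    def owner_run_pending(self):
        while self.pending:
            self.pending.popleft()()

p = OwnedPoint()
p.schedule_set_x(5)
assert p.read_x() == 0      # write not yet applied; read needed no lock
p.owner_run_pending()
assert p.read_x() == 5
```

The hard part, as noted above, is that in Smalltalk both paths look like ordinary message sends, so the VM cannot tell them apart without extra information.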

I'd like to hear some description of the concept you have in mind, if
you have the time.  What are you thinking about in this arena?

> --
> Best regards,
> Igor Stasenko AKA sig.

Cheers,
Rob



Re: Concurrent Futures

Igor Stasenko
On 30/10/2007, Rob Withers <[hidden email]> wrote:

> Ok, so we have coded a bottleneck situation.  It's not a surprise, since we
> have a 1000 objects asking for it's services.  It's time for a refactor and
> remove that bottleneck with some service duplication and load balancing.
> Using promises or futures doesn't remove bad code, just limits the kinds of
> bad code you can have.  This would also be a problem in a distributed
> application that was sending 1000 service requests to a single object.
>
> Regarding accessing some state for a read.  If 2 Vats or Islands are on the
> same processor with shared memory between them and Vat A tries to access the
> state of an object owned by Vat B, as long as there is no relocation
> occurring, we should be able to optimize access to this state with a direct
> memory read operation.  Of course, in Smalltalk we access state owned by an
> object by sending that object a read accessor msg, and I don't see how we
> could determine that a given msg sent was there just for a memory read and
> do the optimization.  It wouldn't surpise me if Exupery might know such
> things.
>
> I'd like to hear some description if the concept you have in mind, if you
> have the time.  What are you thinking about in this arena?
>
Well, in my concept I don't have Vats. Computing resources (native
threads) are managed by the VM and not visible at the language level.
So VM parallelism is not language parallelism.
You can't predict which native thread will serve a concrete message
for a concrete object, thus the load is distributed evenly.
This has its own limitations, mostly when you interact with external
libraries through primitives: some libraries cannot work (or work
differently) when you access them from different native threads.
Some libraries are designed with no multithreading in mind, so using
them concurrently may cause a system crash.
But I don't think that I'm alone with this problem. We need to find
some good ways to ensure that all calls to some external libraries are
invoked only from a single thread of execution, while for other
libraries it's ok to call them from anywhere.
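One plausible way to get that property, sketched in Python (SingleThreadGateway and fragile_library_call are invented names for illustration): funnel every call to a single-thread-only library through one dedicated worker thread, so code running on any thread queues a request and waits for the result.

```python
import queue
import threading

class SingleThreadGateway:
    """Serializes all calls to a library onto one dedicated native
    thread, for libraries that must only ever be entered from a single
    thread."""
    def __init__(self):
        self._calls = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func, args, result_box, done = self._calls.get()
            result_box.append(func(*args))  # runs on the gateway thread only
            done.set()

    def call(self, func, *args):
        result_box, done = [], threading.Event()
        self._calls.put((func, args, result_box, done))
        done.wait()
        return result_box[0]

# A stand-in for a non-thread-safe library entry point: records which
# thread it was entered from.
entered_from = set()
def fragile_library_call(x):
    entered_from.add(threading.current_thread().name)
    return x * 2

gateway = SingleThreadGateway()
results = []
def client():
    results.append(gateway.call(fragile_library_call, 21))

clients = [threading.Thread(target=client) for _ in range(8)]
for t in clients: t.start()
for t in clients: t.join()

assert results == [42] * 8
assert len(entered_from) == 1   # library only ever entered from one thread
```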

> > --
> > Best regards,
> > Igor Stasenko AKA sig.
>
> Cheers,
> Rob
>
>
>


--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Rob Withers

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Monday, October 29, 2007 5:59 PM
Subject: Re: Concurrent Futures


> On 30/10/2007, Rob Withers <[hidden email]> wrote:

>> I'd like to hear some description if the concept you have in mind, if you
>> have the time.  What are you thinking about in this arena?
>>
> Well, in my concept i don't have a vats. A computing resources(native
> threads) managed by VM and not visible at language level. So, a VM
> parallelism is not a language parallelism.
> You can't predict what native thread will serve concrete message for
> concrete object, thus a load is distributed evenly.
> This haves own limitations, but mostly when you interacting with
> external libraries through primitives - some of the libraries can not
> work (or can work differently) when you accessing them from different
> native threads.
> Some of the libraries designed with no multithreading in mind, so
> using them concurrently may cause a system crash.
> But i don't think that i'm alone with this problem. We need to find
> some good ways how to control that all calls to some external library
> must be invoked only from a single thread of execution, while for
> other libraries its ok to call them from anywhere.

How do you protect against simultaneous accesses to memory for writes/reads?

I agree that thread-safing the primitives is a key task.

Cheers,
Rob



Re: Concurrent Futures

Igor Stasenko
On 30/10/2007, Rob Withers <[hidden email]> wrote:

> How do you protect against simultaneous accesses to memory for writes/reads?
>
I have described this previously. The idea is simple. Since we have a
limited set of write operations in object memory, we could design a
system which prevents primitive operations (such as at:put:) from
running concurrently on the same object.
This means (like in your concept) that each object should have an
additional slot identifying the context where it is used as receiver.
The only write operations which can modify object memory within any
method body are writes to receiver memory (like setting an ivar or
indexed var). Other operations (like pushing on the stack) are safe,
because we'll have a separate stack for each execution thread.
The message evaluation sequence can then be done as follows:
- check the receiver's assigned context
- if it's nil, then assign it to the current one and start executing
the method
- if it's not nil, then schedule the method execution on the thread
which owns the given context.

When performing a message send within a method we deactivate the
context, which means that we can set the assigned object's context to
nil and set it again when the context is activated again. Or, if we
don't, then all message sends to the given object will be scheduled to
the same native thread. I don't know which is best; both approaches
have their own pros and cons.
Retaining a context can help us make all calls to some external
library be invoked from a single thread.

Either way, we should ensure that at any point of time there is a
single active context in the running system using the same receiver.
Also, we can leave most primitives unmodified and be careless about
concurrency in these primitives, because by design there can be only a
single primitive running in the system for a particular object at any
point of time.
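That evaluation sequence might be sketched as follows (Python standing in as pseudo-VM; Obj, ExecutionThread and the slot names are illustrative): a check-and-claim on the receiver's context slot decides between executing the method in place and handing the send to the owning thread's queue.

```python
import threading

class Obj:
    """Toy object with the extra header slot described above: the
    execution context (thread) that currently owns it as a receiver."""
    def __init__(self):
        self.assigned_context = None
        self.slot_lock = threading.Lock()  # guards only the slot itself

class ExecutionThread:
    def __init__(self, name):
        self.name = name
        self.mailbox = []   # sends scheduled here by other threads

    def send(self, receiver, method):
        # - check the receiver's assigned context
        with receiver.slot_lock:
            if receiver.assigned_context is None:
                receiver.assigned_context = self     # claim the receiver
                claimed = True
            else:
                owner = receiver.assigned_context
                claimed = False
        if claimed:
            # - if nil: assign it to the current one and execute
            try:
                return method(receiver)
            finally:
                with receiver.slot_lock:
                    receiver.assigned_context = None  # deactivate
        # - if not nil: schedule on the owning thread instead
        owner.mailbox.append((receiver, method))

t1, t2 = ExecutionThread('t1'), ExecutionThread('t2')
obj = Obj()

def method_running_on_t1(receiver):
    # while t1 owns the receiver, a send from t2 must not run here;
    # it is handed to t1's mailbox instead
    t2.send(receiver, lambda r: 'from t2')
    return 'from t1'

assert t1.send(obj, method_running_on_t1) == 'from t1'
assert len(t1.mailbox) == 1       # t2's send was scheduled, not executed
assert obj.assigned_context is None
```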

As you can see, such an approach does not give much in solving
concurrency issues on the language side. But that's where I think
developers should choose their own way. We can provide all that they
need (semaphores/promises/futures etc.) in libraries and provide some
additional primitives for controlling the new aspects of the VM.

> I agree that thread-safing the primitives is a key task.
>
> Cheers,
> Rob
>
>

--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Rob Withers

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Monday, October 29, 2007 7:59 PM
Subject: Re: Concurrent Futures


>>
>> How do you protect against simultaneous accesses to memory for
>> writes/reads?
>>
> I have described this previously.

I must have missed it.

> The idea is simple. Since we have
> limited set of write operations in object memory we could design a
> system which prevents a primitive operations (such as primat: put:)
> from running concurrently on the same object.
> This means (like in your concept), that each object should have an
> additional slot identifying a context where it used as receiver.
> The only write operations which could modify object memory within any
> method body is write to receiver memory (like setting ivar or indexed
> var). Other operations (like pushing on stack) is safe, because we'll
> have separate stack for each execution thread.
> The message evaluation sequence then can be done as following:
> - check receiver's assigned context
> - if it's nil, then assign it to current and start executing method
> - if its not nil, then schedule method execution to the thread, which
> owns given context.

It seems to me that you are building event-loops in the VM.  Consider
anObject that is sent a message on thread 1, so it is assigned to
thread 1 and you start processing it with thread 1.  In the meantime,
3 messages are sent to this object on threads 2, 3, and 4.  Each
message send is scheduled for processing by thread 1, when it becomes
available.  There is your event-loop.

The difference is that you allow objects to be reassigned based on
thread availability.  Since we have shared memory, that is as simple as
setting the new context.  This is similar to ideas I had for
redistributing objects to different Vats based upon message-send
density (which could be monitored) or user specification.


> when performing a message send in method we deactivate the context,
> which means that we can set assigned object's context to nil, and set
> it again when activate context again. Or if we not, then all message
> sends to given object will be scheduled to same native thread. I don't
> know what is best - both approaches having own pros and cos.
> Retaining a context can help us with making all calls to some external
> library be invoked from single thread.

Indeed.  That would be a Vat.

>
> Either way we should ensure that there's a single active context at
> some point of time in system running using same receiver. Also, we can
> leave most primitives unmodified and be careless about concurrency in
> these primitives, because by design, there can be only single
> primitive running in system for particular object at any point of
> time.

But the primitive may have internal state that can't be shared.

>
> If you can see, such approach gives not much in solving concurrency
> issues at language side. But that's where i think developers should
> choose the way. We can provide all what they need
> (semaphores/promises/futures e.t.c.) in libraries and provide some
> additional primitives for controlling new aspects of VM.

I could certainly build the language features I am interested in on top of
such a VM.

cheers,
Rob



Re: Concurrent Futures

Andreas.Raab
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> How would you define a boundaries of these entities in same image?

It is defined implicitly by the island in which a message executes. All
objects created by the execution of a message are part of the island the
computation occurs in.

To create an object in another island you need to artificially "move the
computation" there. That's why islands implement the #new: message, so
that you can create an object in another island by moving the
computation, for example:

   space := island future new: TSpace.

This will create an instance of TSpace in the target island. Once we
have created the "root object" further messages that create objects will
be inside that island, too. For example, take this method:

TSpace>>makeNewCube
   "Create a new cube in this space"
   cube := TCube new.
   self addChild: cube.
   ^cube

and then:

   cube := space future makeNewCube.

Both cube and space will be in the same island.
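The creation rule ("all objects created by the execution of a message are part of the island the computation occurs in") can be modeled with an ambient current island that is set while a computation is moved there (Python sketch; Island, IslandObject and the helper names are assumptions, not Croquet's actual classes):

```python
import threading

_current_island = threading.local()

class Island:
    def new(self, cls, *args):
        """island future new: TSpace -- run the constructor (and
        anything it creates) with this island as the ambient one."""
        return self.run(lambda: cls(*args))

    def run(self, thunk):
        previous = getattr(_current_island, 'value', None)
        _current_island.value = self      # "move the computation" here
        try:
            return thunk()
        finally:
            _current_island.value = previous

class IslandObject:
    def __init__(self):
        # every object remembers the island it was created in
        self.island = getattr(_current_island, 'value', None)

class TCube(IslandObject):
    pass

class TSpace(IslandObject):
    def make_new_cube(self):
        # TSpace>>makeNewCube -- a future send runs inside self's
        # island, so the new cube lands in the same island as the space
        return self.island.run(lambda: TCube())

island = Island()
space = island.new(TSpace)       # space := island future new: TSpace.
cube = space.make_new_cube()     # cube := space future makeNewCube.
assert space.island is island
assert cube.island is island     # both in the same island
```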

> Could you illustrate by some simple examples, or strategy which can be
> used for using them for concurrent execution within single VM?

I'm confused about your use of the term "concurrent". Earlier you wrote
"There is a BIG difference between concurrency (parallel execution with
shared memory) and distributed computing." which seems to imply that you
discount all means of concurrency that do not use shared memory. If that
is really what you mean (which is clearly different from the usual
meaning of the term concurrent) then indeed, there is no way for it to
be "concurrent" because there simply is no shared mutable state between
islands.

> I'm very interested in practical usage of futures myself.
> What will you do, or how you would avoid the situation , when
> sometimes a two different islands containing a reference to the same
> object in VM will send direct messages to it, causing racing
> condition?

The implementation of future message sending uses locks and mutexes. You
might say "aha! so it *is* using locks and mutexes" but just as with
automatic garbage collection (which uses pointers and pointer arithmetic
and explicit freeing) it is simply a means to implement the higher-level
semantics. And since no mutual/nested locks are required,
deadlock-freeness can again be proven.
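A minimal sketch of what such an implementation could look like (class and selector names here are hypothetical, not Croquet's actual code): the only lock is a short critical section around the event queue, which is never held while user code runs and never acquired in a nested fashion, which is why the deadlock-freeness argument goes through.

```smalltalk
"Hypothetical sketch: a future send just enqueues an event for the
 target island. The queue mutex is the only lock; it is never held
 while user code runs and never acquired in a nested fashion."
Island>>futureSend: aMessage
    queueMutex critical: [eventQueue addLast: aMessage].
    workAvailable signal

Island>>runLoop
    | msg |
    [true] whileTrue: [
        workAvailable wait.
        queueMutex critical: [msg := eventQueue removeFirst].
        self invokeMessage: msg]  "runs without holding any lock"
```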

> Yes. But this example is significant one. Sometimes i want these
> messages run in parallel, sometimes i don't. Even for single 'island'.

In the island model, this is not an option. The unit of concurrency is
an island, period. If you want to run computations in parallel that share
data, you either make the data immutable (which can enable sharing in
some limited cases) or you copy the needed data to "worker islands".
Basic load balancing.

> Then, for general solution we need these islands be a very small (a
> smaller one is a single object) or contain big number of objects. The
> question is, how to give control of their sizes to developer. How
> developer can define a boundaries of island within single image?

By sending messages. See above.

> I will not accept any solutions like 'multiple images' because this
> drives us into distributed computing domain, which is _NOT_ concurrent
> computing anymore, simply because it's not using shared memory, and in
> fact there is no sharing at all, only a glimpse of it.

Again, you have a strange definition of the term concurrency. It does
not (neither in general English nor in CS) require use of shared memory.
  There are two main classes of concurrent systems, namely those relying
on (mutable) shared memory and those relying on message passing
(sometimes utilizing immutable shared memory for optimization purposes
because it's indistinguishable from copying). Erlang and E (and Croquet
as long as you use it "correctly") all fall into the latter category.

>> This may be the outcome for an interim period. The good thing here is
>> that you can *prove* that your program is deadlock-free simply by not
>> using waits. And ain't that a nice property to have.
>>
> you mean waits like this (consider following two lines of code run in parallel):
>
> [ a isUnlocked ] whileFalse: [ ]. b unlock.
>
> and
>
> [ b isUnlocked] whileFalse: []. a unlock.

Just like in your previous example, this code is meaningless in Croquet.
You are assuming that a and b can be sent synchronous messages to and
that they resolve while being in the busy-loop. As I have pointed out
earlier this simply doesn't happen. Think of it that way: Results are
itself communicated using future messages, e.g.,

Island>>invokeMessage: aMessage
   "Invoke the message and post the result back to the sender island"
   result := aMessage value. "compute result of the message"
   aMessage promise future value: result. "resolve associated promise"

so you cannot possibly wait for the response to a message you just
scheduled. It is simply not possible, neither actively nor passively.
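From the sender's side the same protocol looks roughly like this (a sketch; `whenResolved:` and `TSphere` stand in for whatever resolution callback and classes the actual system offers): instead of waiting, you register a continuation, which is itself delivered as a future message.

```smalltalk
"Sketch: no waiting anywhere; the result arrives as another event
 in the sender's own island."
promise := space future makeNewCube.
promise whenResolved: [:cube |
    cube future addChild: TSphere new]
```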

> And how could you guarantee, that any bit of code in current ST image
> does not contain such hidden locks - like a loops or recursive loops
> which will never return until some external entity will change the
> state of some object(s)?

No more than I can or have to guarantee that any particular bit of the
Squeak library is free of infinite loops. All we need to guarantee is
that we don't introduce new dependencies, which thanks to future
messages and promises we can guarantee. So if the subsystem is deadlock
free before it will stay so in our usage of it. If it's not then, well,
broken code is broken code no matter how you look at it.

>>> I pointed that futures as an 'automatic lock-free' approach is not
>>> quite parallel to 'automatic memory management by GC'.
>> The similarity is striking. Both in terms of tradeoffs (trade low-level
>> control for better productivity) as well as the style of arguments made
>> against it ;-) Not that I mind by the way, I find these discussions
>> necessary.
>>
> The striking is, that introducing GC does good things - removing a
> necessity to care about memory, which helps a lot in developing and
> makes code more clear and smaller. But i can't see how futures does
> same. There are still lot things to consider for developer even by
> using futures.

The main advantages are increased robustness and productivity. We worry
a *lot* about deadlocks since some of our usage of Croquet shows exactly
the kind of "mixed usage" that you pointed out. But never, not once,
have we had a deadlock or even have had to worry about it, in places
where we used event-loop concurrency consistently. (interesting aside:
Just today we had a very complex deadlock on one of our servers and my
knee-jerk reaction was to try to convert it to event-loop concurrency
because although we got stack traces we may not be able to completely
figure out how the system ended up in that deadlock :-)

We've gradually continued to move to event-loop concurrency more and
more in many areas of our code because the knowledge that this code will
be deadlock-free allows us to concentrate on solving the problem at hand
instead of figuring out the most unlikely occurrences that can cause
deadlock - I suspect that I'll be faster rewriting the code from today
as event-loops than figuring out what caused and how to avoid that deadlock.

And that is in my understanding the most important part - how many hours
have you spent thinking about how exactly a highly concurrent system
could possibly deadlock? What if you could spend this time on improving
the system instead, knowing that deadlock *simply cannot happen*?

Cheers,
   - Andreas


Re: Concurrent Futures

Igor Stasenko
On 30/10/2007, Andreas Raab <[hidden email]> wrote:

> Igor Stasenko wrote:
> > How would you define a boundaries of these entities in same image?
>
> It is defined implicitly by the island in which a message executes. All
> objects created by the execution of a message are part of the island the
> computation occurs in.
>
> To create an object in another island you need to artificially "move the
> computation" there. That's why islands implement the #new: message, so
> that you can create an object in another island by moving the
> computation, for example:
>
>    space := island future new: TSpace.
>
> This will create an instance of TSpace in the target island. Once we
> have created the "root object" further messages that create objects will
> be inside that island, too. For example, take this method:
>
> TSpace>>makeNewCube
>    "Create a new cube in this space"
>    cube := TCube new.
>    self addChild: cube.
>    ^cube
>
> and then:
>
>    cube := space future makeNewCube.
>
> Both cube and space will be in the same island.
>
When I talk about boundaries, I mean: how could you prevent a
particular object from slipping through these boundaries and then being
used in multiple islands by sending direct messages to it?

An island provides an isolation layer for some computation. It's just a
tool with which we can obtain the results of some computation. Now
suppose that you returned a result (an OrderedCollection, for instance)
which is then used in other islands for further processing,
but at the same time a reference to this object is kept in the original
island. Now there is a possibility that we can send messages to a
mutable object from different islands, and there is no protection. Or
is there?
I'm talking about the current implementation. Is it possible for such a
situation to happen? And if not, what mechanisms do you provide to prevent it?


> > Could you illustrate by some simple examples, or strategy which can be
> > used for using them for concurrent execution within single VM?
>
> I'm confused about your use of the term "concurrent". Earlier you wrote
> "There is a BIG difference between concurrency (parallel execution with
> shared memory) and distributed computing." which seems to imply that you
> discount all means of concurrency that do not use shared memory. If that
> is really what you mean (which is clearly different from the usual
> meaning of the term concurrent) then indeed, there is no way for it to
> be "concurrent" because there simply is no shared mutable state between
> islands.
>

Sorry if my definition was misleading or incorrect. I just see a
difference between running two islands in different images and running
them in the same image in separate native threads.
Even if we suppose that we don't have shared state (to some extent)
between islands, we will definitely have shared state in the same image
(on the VM side at least), and we should deal with it carefully.

> > I'm very interested in practical usage of futures myself.
> > What will you do, or how you would avoid the situation , when
> > sometimes a two different islands containing a reference to the same
> > object in VM will send direct messages to it, causing racing
> > condition?
>
> The implementation of future message sending uses locks and mutexes. You
> might say "aha! so it *is* using locks and mutexes" but just as with
> automatic garbage collection (which uses pointers and pointer arithmetic
> and explicit freeing) it is simply a means to implement the higher-level
> semantics. And since no mutual/nested locks are required
> deadlock-freeness can again be proven.
>
No, I don't want to say 'a-ha'. :) The implementation details don't really
matter here, because they can be improved later. What is more important
is the proof of concept.
For instance, we could add an 'atomic update' primitive which could be
used for implementing non-locking queues and/or lists.

> > Yes. But this example is significant one. Sometimes i want these
> > messages run in parallel, sometimes i don't. Even for single 'island'.
>
> In the island model, this is not an option. The unit of concurrency is
> an island, period. If you want to run computations in parallel that share
> data, you either make the data immutable (which can enable sharing in
> some limited cases) or you copy the needed data to "worker islands".
> Basic load balancing.

Got it.

>
> > Then, for general solution we need these islands be a very small (a
> > smaller one is a single object) or contain big number of objects. The
> > question is, how to give control of their sizes to developer. How
> > developer can define a boundaries of island within single image?
>
> By sending messages. See above.
>
> > I will not accept any solutions like 'multiple images' because this
> > drives us into distributed computing domain, which is _NOT_ concurrent
> computing anymore, simply because it's not using shared memory, and in
> > fact there is no sharing at all, only a glimpse of it.
>
> Again, you have a strange definition of the term concurrency. It does
> not (neither in general English nor in CS) require use of shared memory.
>   There are two main classes of concurrent systems, namely those relying
> on (mutable) shared memory and those relying on message passing
> (sometimes utilizing immutable shared memory for optimization purposes
> because it's indistinguishable from copying). Erlang and E (and Croquet
> as long as you use it "correctly") all fall into the latter category.
>
Correct. Exactly!
But I'm interested in making 'use incorrectly' impossible.


> >> This may be the outcome for an interim period. The good thing here is
> >> that you can *prove* that your program is deadlock-free simply by not
> >> using waits. And ain't that a nice property to have.
> >>
> > you mean waits like this (consider following two lines of code run in parallel):
> >
> > [ a isUnlocked ] whileFalse: [ ]. b unlock.
> >
> > and
> >
> > [ b isUnlocked] whileFalse: []. a unlock.
>
> Just like in your previous example, this code is meaningless in Croquet.
> You are assuming that a and b can be sent synchronous messages to and
> that they resolve while being in the busy-loop. As I have pointed out
> earlier this simply doesn't happen. Think of it that way: Results are
> itself communicated using future messages, e.g.,
>
> Island>>invokeMessage: aMessage
>    "Invoke the message and post the result back to the sender island"
>    result := aMessage value. "compute result of the message"
>    aMessage promise future value: result. "resolve associated promise"
>
> so you cannot possibly wait for the response to a message you just
> scheduled. It is simply not possible, neither actively nor passively.
>
> > And how could you guarantee, that any bit of code in current ST image
> > does not contain such hidden locks - like a loops or recursive loops
> > which will never return until some external entity will change the
> > state of some object(s)?
>
> No more than I can or have to guarantee that any particular bit of the
> Squeak library is free of infinite loops. All we need to guarantee is
> that we don't introduce new dependencies, which thanks to future
> messages and promises we can guarantee. So if the subsystem is deadlock
> free before it will stay so in our usage of it. If it's not then, well,
> broken code is broken code no matter how you look at it.
>
> >>> I pointed that futures as an 'automatic lock-free' approach is not
> >>> quite parallel to 'automatic memory management by GC'.
> >> The similarity is striking. Both in terms of tradeoffs (trade low-level
> >> control for better productivity) as well as the style of arguments made
> >> against it ;-) Not that I mind by the way, I find these discussions
> >> necessary.
> >>
> > The striking is, that introducing GC does good things - removing a
> > necessity to care about memory, which helps a lot in developing and
> > makes code more clear and smaller. But i can't see how futures does
> > same. There are still lot things to consider for developer even by
> > using futures.
>
> The main advantages are increased robustness and productivity. We worry
> a *lot* about deadlocks since some of our usage of Croquet shows exactly
> the kind of "mixed usage" that you pointed out. But never, not once,
> have we had a deadlock or even have had to worry about it, in places
> where we used event-loop concurrency consistently. (interesting aside:
> Just today we had a very complex deadlock on one of our servers and my
> knee-jerk reaction was to try to convert it to event-loop concurrency
> because although we got stack traces we may not be able to completely
> figure out how the system ended up in that deadlock :-)
>
> We've gradually continued to move to event-loop concurrency more and
> more in many areas of our code because the knowledge that this code will
> be deadlock-free allows us to concentrate on solving the problem at hand
> instead of figuring out the most unlikely occurrences that can cause
> deadlock - I suspect that I'll be faster rewriting the code from today
> as event-loops than figuring out what caused and how to avoid that deadlock.
>
> And that is in my understanding the most important part - how many hours
> have you spent thinking about how exactly a highly concurrent system
> could possibly deadlock? What if you could spend this time on improving
> the system instead, knowing that deadlock *simply cannot happen*?
>
> Cheers,
>    - Andreas
>
>


--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Igor Stasenko
In reply to this post by Rob Withers
On 30/10/2007, Rob Withers <[hidden email]> wrote:

>
> > The idea is simple. Since we have
> > limited set of write operations in object memory we could design a
> > system which prevents a primitive operations (such as primat: put:)
> > from running concurrently on the same object.
> > This means (like in your concept), that each object should have an
> > additional slot identifying a context where it used as receiver.
> > The only write operations which could modify object memory within any
> > method body is write to receiver memory (like setting ivar or indexed
> > var). Other operations (like pushing on stack) is safe, because we'll
> > have separate stack for each execution thread.
> > The message evaluation sequence then can be done as following:
> > - check receiver's assigned context
> > - if it's nil, then assign it to current and start executing method
> > - if its not nil, then schedule method execution to the thread, which
> > owns given context.
>
> It seems to me that you are building event-loops in the VM.  Consider
> anObject that is sent a msg on thread 1, so he is assigned to thread 1 and
> you start processing it with thread 1.  In the meantime, 3 msgs are sent to
> this object on threads 2, 3, and 4.  Each message send is scheduled for
> processing by thread 1, when it becomes available.  There is your
> event-loop.
>

Yes, exactly. I'm still unsure whether it is really needed for all
objects in the system. For instance, the true/false/nil objects are just
singletons without any internal state, so scheduling them onto a single
thread of execution can simply be omitted.
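In VM pseudocode, the dispatch rule described above could look like this (all names are made up for illustration; this is a sketch of the scheme, not working VM code):

```smalltalk
"Sketch of the per-object scheduling rule: the first sender's thread
 claims the receiver; senders from other threads enqueue an event
 for the owning thread instead of executing directly."
Interpreter>>send: aMessage to: receiver
    | owner |
    owner := receiver owningThread.
    owner ifNil: [
        receiver owningThread: self currentThread.
        ^self execute: aMessage on: receiver].
    owner == self currentThread
        ifTrue: [self execute: aMessage on: receiver]
        ifFalse: [owner scheduleEvent: (aMessage for: receiver)]
```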

> The difference is that you allow objects to be reassigned based on thread
> availability.  Since we have a shared memory, that is as simple as setting
> the new context.  This is similar to ideas I had for redistributing objects
> to different Vats based upon msg send density, which could be monitored, or
> user specification.
>

The other thing which bothers me is reflection. Should we expose the
properties of native threads at the language side (for using/showing them
in the debugger)? And how well will the new contexts deal with
continuations, which are used in Seaside?
It's hard to give a brief answer to such things.

>
> > when performing a message send in method we deactivate the context,
> > which means that we can set assigned object's context to nil, and set
> > it again when activate context again. Or if we not, then all message
> > sends to given object will be scheduled to same native thread. I don't
> > know what is best - both approaches having own pros and cos.
> > Retaining a context can help us with making all calls to some external
> > library be invoked from single thread.
>
> Indeed.  That would be a Vat.
>
> >
> > Either way we should ensure that there's a single active context at
> > some point of time in system running using same receiver. Also, we can
> > leave most primitives unmodified and be careless about concurrency in
> > these primitives, because by design, there can be only single
> > primitive running in system for particular object at any point of
> > time.
>
> But the primitive may have internal state that can't be shared.
>

I know, but I'm talking about the basic primitives which are used most
frequently and operate only on object memory. Such primitives usually
don't have any internal state, or use only the interpreter state.

All other primitives which have internal state should be reviewed.
And in the first stages we could add support for scheduling all these
'old' primitives onto a single 'main' thread, so they will work as in the
current VM, not knowing that the VM is actually multithreaded.

> >
> > If you can see, such approach gives not much in solving concurrency
> > issues at language side. But that's where i think developers should
> > choose the way. We can provide all what they need
> > (semaphores/promises/futures e.t.c.) in libraries and provide some
> > additional primitives for controlling new aspects of VM.
>
> I could certainly build the language features I am interested in on top of
> such a VM.
>

I'd like to hear more critique of such a model :) If it proves to be
viable and more or less easily doable (compared to other models) then
I could start working on it :)

--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Rob Withers
In reply to this post by pwl
Igor, why don't you add your ideas to http://wiki.squeak.org/squeak/6012 ? . . .
 
----- Original Message ----
From: Igor Stasenko [hidden email]
> On 30/10/2007, Rob Withers <[hidden email]> wrote:

>
> > It seems to me that you are building event-loops in the VM.  Consider
> > anObject that is sent a msg on thread 1, so he is assigned to thread 1 and
> > you start processing it with thread 1.  In the meantime, 3 msgs are sent to
> > this object on threads 2, 3, and 4.  Each message send is scheduled for
> > processing by thread 1, when it becomes available.  There is your
> > event-loop.
> >
>
> Yes, exactly. I'm still unsure if it really needed for all objects in
> system. For instance, the true/false/nil object are just singletons
> without any internal state. Thus scheduling them into single thread of
> execution can be simply omitted.
The same goes for SmallIntegers, which, while not singletons, are not mutable.  Perhaps Characters are also candidates for this.  These are PassByConstruction objects, which means they are always local.  PassByCopy is another type of PassByConstruction, where a copy is marshalled to the other side.
 
 
> The other thing which bothers me is a reflection. Should we expose a
> properties of native threads at language side (for using/showing them
> in debugger)? Or how well new contexts will deal with continuations,
> which is used in seaside..
I think each process should know which thread id it is running in.  In my model, the process stays with the thread, while in yours I am not sure.  A message to a VatRef to an object in a different Vat would be sent asynchronously and scheduled in the other Vat for its thread to process.
 
BTW, in another email you asked how passing a ref between Vats (Islands) guarantees that all msgs to it go to the home Vat.  The answer is that a ref to a Vat-local object, being passed as an arg in a msg send to a VatRef to an object in a different Vat, would be translated into a VatRef back to the origin Vat when the MsgSend marshalled its arguments to the 2nd Vat.  No ref with direct memory access would ever be allowed to be passed to another Vat.
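As a sketch of that marshalling rule (class and selector names are hypothetical, loosely after E's terminology; `veryDeepCopy` is Squeak's deep-copy selector):

```smalltalk
"Sketch: arguments crossing a Vat boundary are either copied
 (PassByCopy) or wrapped in a VatRef pointing back to the home Vat,
 so no direct memory reference ever escapes."
MsgSend>>marshalArgument: anObject from: homeVat
    anObject isPassByCopy ifTrue: [^anObject veryDeepCopy].
    ^VatRef to: anObject in: homeVat
```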
 
> > But the primitive may have internal state that can't be shared.

>
>
> I know, but i'm talking about basic primitives which used most
> frequently and operating only with object memory. Such primitives
> usually don't have any internal state, or using an interpreter state.
 
Ok, so from the link above, the ProtectedPrimitives would not always be protected.

>
> All other primitives which have internal state should be reviewed.
> And at first stages we could add support for scheduling all these
> 'old' primitives into single 'main' thread. So they will work as in
> current VM not knowing that VM is actually multithreaded.
 
I was thinking we mutex protect them, but still allow different threads to use them.

> I'd like to hear more critics about such model :) If it proves to be
> viable and much or less easily doable (comparing to other models) then
> i could start working on it :)
Go for it!!
 
cheers,
Rob




Re: Concurrent Futures

Igor Stasenko
On 30/10/2007, Rob Withers <[hidden email]> wrote:
>
> Igor, why don't you add your ideas to
> http://wiki.squeak.org/squeak/6012.    . . .

I think I can help with:
# Understand CCodeGenerator and TParseNode hierarchy.

Lately I did a conversion of the parse tree into lambda message sends.
Then, by using transformers (substitution/replacement), it's easy to
convert them into bytecodes, C/C++ source, or the Exupery intermediate
representation, since lambdas are a uniform representation of an algorithm.
I don't know how much such an abstraction could ease development.
Actually, my original idea was to represent compiled methods not as
bytecode but as lambdas.
Yes, bytecodes can be viewed as a short representation of
lambda functions, but the difference is that they are not real
objects.
Lambdas are the best fit for method inlining, since substitution of free
variables can be done relatively easily at the low level (VM).

>
> ----- Original Message ----
> From: Igor Stasenko [hidden email]
>
> > On 30/10/2007, Rob Withers <[hidden email]> wrote:
> >
> > > It seems to me that you are building event-loops in the VM.  Consider
> > > anObject that is sent a msg on thread 1, so he is assigned to thread 1
> and
> > > you start processing it with thread 1.  In the meantime, 3 msgs are sent
> to
> > > this object on threads 2, 3, and 4.  Each message send is scheduled for
> > > processing by thread 1, when it becomes available.  There is your
> > > event-loop.
> > >
> >
> > Yes, exactly. I'm still unsure if it really needed for all objects in
> > system. For instance, the true/false/nil object are just singletons
> > without any internal state. Thus scheduling them into single thread of
> > execution can be simply omitted.
>
> The same goes for SmallIntegers, which while not singletons, are not
> mutable.  Perhaps Characters are also candidates for this.

I think this can be done simply by testing whether a given object is
non-indexable and has an ivar count of 0, or has its read-only flag set.
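For instance (a sketch; `isReadOnlyObject` assumes a per-object read-only bit that the VM would have to expose, while `isVariable` and `instSize` are existing Behavior selectors):

```smalltalk
"Sketch: objects that are non-indexable and have no named ivars
 (or are flagged read-only) carry no mutable state, so they can
 skip per-object thread scheduling entirely."
isSafeToShare: anObject
    ^(anObject class isVariable not and: [anObject class instSize = 0])
        or: [anObject isReadOnlyObject]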

>  These are
> PassByConstruction objects, which means they are always local.  PassByCopy
> is another type of PassByConstruction, where a copy is marshalled to the
> other side.
>
Hmm, I can't follow you here. Could you elaborate?

>
> > The other thing which bothers me is a reflection. Should we expose a
> > properties of native threads at language side (for using/showing them
> > in debugger)? Or how well new contexts will deal with continuations,
> > which is used in seaside..
>
> I think each process should know which thread id it is running in.  In my
> model, the process stays with the thread, while in yours I am not sure.  A
> message to a VatRef to an object in a different Vat, would be sent
> asynchronously and scheduled in the other Vat for it's thread to process.
>

Yes, I don't like binding the language-side Process to a particular native
thread, simply because of scheduling/preemption issues. Trying to do nice
scheduling on top of the different native-thread behaviors of different
platforms/OSes could be quite messy.
We really don't need more than a fixed number of threads
in the VM (one for each core, and maybe one more for the GC).

> BTW, in another email you asked about how passing a ref between Vats
> (Islands) guaranteed that all msgs to it would go to the home Vat.  The
> answer is that a ref to a Vat local object, being passed as an arg to a msg
> send to a VatRef to an object in a different Vat, would be translated into a
> VatRef back to the origin Vat when the MsgSend marshalled it's arguments to
> the 2nd Vat.  No ref with direct memory access would ever be allowed to be
> passed to another Vat.
>
>
> > > But the primitive may have internal state that can't be shared.
> >
> >
> > I know, but i'm talking about basic primitives which used most
> > frequently and operating only with object memory. Such primitives
> > usually don't have any internal state, or using an interpreter state.
>
> Ok, so from the link above, the ProtectedPrimitives would not always be
> protected.
>
> >
> > All other primitives which have internal state should be reviewed.
> > And at first stages we could add support for scheduling all these
> > 'old' primitives into single 'main' thread. So they will work as in
> > current VM not knowing that VM is actually multithreaded.
>
> I was thinking we mutex protect them, but still allow different threads to
> use them.
>
>
> > I'd like to hear more critics about such model :) If it proves to be
> > viable and much or less easily doable (comparing to other models) then
> > i could start working on it :)
>
> Go for it!!
>
> cheers,
> Rob
>


--
Best regards,
Igor Stasenko AKA sig.


Question about YAXO-XML and possible bug

Boris.Gaertner
In reply to this post by Igor Stasenko
I ran into a problem when I tried to process the
SVG example files that come with the SVG specification
from the W3C.

The following is valid xml and even a valid SVG
(Scalable Vector Graphics) file.

<?xml version="1.0" standalone="no"?>
<svg width="10cm" height="3cm" viewBox="0 0 1000 300"
     xmlns="http://www.w3.org/2000/svg">

    <text x="200" y="150" fill="blue"
            font-family="Verdana"
            font-size="45">
        Text with
        <tspan fill="red">red</tspan> and
        <tspan fill="green">green</tspan> text spans
    </text>
</svg>

(This code draws one line of blue text, the words 'red' and
'green' are displayed in red and in green. The tspan
elements are used to encode text adornment. Try a
newer  release of Mozilla Firefox to render this svg file.
Note also that some xml readers expect a DOCTYPE
specification. A suitable DOCTYPE specification for
svg is
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
  "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">

)

The text element is a mixed contents element (in XML
terminology, see section 3.2.2 of the XML reference
from Oct 2000), it contains three #PCDATA items and
two child elements (the tspan elements).

When we parse this piece of xml with XMLDOMParser,
the text element is translated into an instance of
XMLElement with the following values of its more important
instance variables:
 
  name = 'text'
  elements
      an OrderedCollection with two XMLElements, one
      for each tspan item
  contents
     an OrderedCollection with the three instances of
      XMLStringNode for the strings 'Text with ',
      'and', 'text spans'.
 attributes
    a Dictionary with 5 elements.

The problem here is that this XMLElement does not
contain information about the sequence of the strings and
the text spans. This is, however, very important information
when you want to render an svg text element. I really think
that it is an error to put #PCDATA and child elements into
separate collections.

For now, I changed XMLDOMParser>>characters
from

characters: aString
    | newElement |
    newElement _ XMLStringNode string: aString.
    self top addContent: newElement.

to

characters: aString
    | newElement |
    newElement _ XMLStringNode string: aString.
    self top addElement: newElement.

With that change, I put all substructures into the
'elements' collection; the 'contents' collection becomes
obsolete.
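With everything in one ordered collection, a renderer can then walk the mixed content in document order, for example (a sketch; the selector names on the nodes are assumptions, not YAXO's documented API):

```smalltalk
"Sketch: interleave string nodes and tspan children in their
 original document order when rendering an svg text element."
renderText: aTextElement
    aTextElement elements do: [:node |
        (node isKindOf: XMLStringNode)
            ifTrue: [self drawString: node string]
            ifFalse: [self renderTspan: node]]
```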

This solves my problem with decorated svg text
but my change will certainly break a lot of other
applications that use the XMLDOMParser.
 
My questions:
* what is your experience with xml elements
   that have mixed contents?
* what do you think should be done with svg
   text like the one of my example?

Any comments are welcome.

Greetings
Boris


Re: Concurrent Futures

Rob Withers
In reply to this post by pwl


----- Original Message ----
From: Igor Stasenko [hidden email]
> >  These are
> > PassByConstruction objects, which means they are always local.   PassByCopy
> > is another type of PassByConstruction, where a copy is marshalled to the
> > other side.
>
> Hmm i can't follow you here. Could you elaborate?
Sorry, this is terminology from E.  They have a whole terminology describing how objects get passed over the wire between Vats.
 
 
The restriction that PassByCopy objects be selfless, immutable and transparent certainly applies to SmallIntegers, true, false, and nil.  I don't know about Characters, but that is their terminology.  We can define our own.
 
Rob



Re: Question about YAXO-XML and possible bug

Philippe Marschall
In reply to this post by Boris.Gaertner
2007/10/30, Boris.Gaertner <[hidden email]>:

> I ran into a problem when I tried to process the
> svg example files that come with the SVG specification
> from W3C
>
> The following is valid xml and even a valid SVG
> (Scalable Vector Graphics) file.
>
> <?xml version="1.0" standalone="no"?>
> <svg width="10cm" height="3cm" viewBox="0 0 1000 300"
>      xmlns="http://www.w3.org/2000/svg">
>
>     <text x="200" y="150" fill="blue"
>             font-family="Verdana"
>             font-size="45">
>         Text with
>         <tspan fill="red">red</tspan> and
>         <tspan fill="green">green</tspan> text spans
>     </text>
> </svg>
>
> (This code draws one line of blue text, the words 'red' and
> 'green' are displayed in red and in green. The tspan
> elements are used to encode text adornment. Try a
> newer  release of Mozilla Firefox to render this svg file.
> Note also that some xml readers expect a DOCTYPE
> specification. A suitable DOCTYPE specification for
> svg is
> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
>   "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
>
> )
>
> The text element is a mixed contents element (in XML
> terminology, see section 3.2.2 of the XML reference
> from Oct 2000), it contains three #PCDATA items and
> two child elements (the tspan elements).
>
> When we parse this piece of xml with XMLDOMParser,
> the text element is translated into an instance of
> XMLElement with the following values of its more important
> instance variables:
>
>   name = 'text'
>   elements
>       an OrderedCollection with two XMLElements, one
>       for each tspan item
>   contents
>      an OrderedCollection with the three instances of
>       XMLStringNode for the strings 'Text with ',
>       'and', 'text spans'.
>  attributes
>     a Dictionary with 5 elements.
>
> The problem here is that this XMLElement does not
> contain information about the sequence of the strings and
> the text spans. This is however a very important information
> when you want to render a svg text element. I really think
> that it is an error to put #PCDATA and child elements into
> separate collections.
>
> For now, I changed XMLDOMParser>>characters
> from
>
> characters: aString
>      | newElement |
>    newElement _ XMLStringNode string: aString.
>    self top addContent: newElement.
>
> to
>
> characters: aString
>      | newElement |
>    newElement _ XMLStringNode string: aString.
>    self top addElement: newElement.
>
> With that change, I put all substructures into the
> 'elements' collection; the 'contents' collection becomes
> obsolete.
>
> This solves my problem with decorated svg text
> but my change will certainly break a lot of other
> applications that use the XMLDOMParser.
>
> My questions:
> * what is your experience with xml elements
>    that have mixed contents?

They're broken in Yaxo. The other Yaxo bug I know of is:
http://bugs.squeak.org/view.php?id=3082

Cheers
Philippe

> * what do you think should be done with svg
>    text like the one of my example?
>
> Any comments are welcome.
>
> Greetings
> Boris
>
>
