Multy-core CPUs


Re: Erlang a primitive language? (was Re: Multy-core CPUs)

Matej Kosik-2

Jason Johnson wrote:

> On 10/25/07, Matej Kosik <[hidden email]> wrote:
>> But there are strictly more interesting things that we want to describe whose behavior cannot be
>> described in the lambda-calculus.
>>
>> The thing that comes to my mind is
>> - - ClockMorph
>> - - the web-server
>> - - the programmable interrupt timer
>> - - Erlang concurrent, mutually interacting, processes.
>
> Why not?  You don't need to update variables to have updates.  And,
> you know that Erlang can not modify variables after creation right,
> although I don't know if this is the property of L-C you had in mind?
>

Sorry, what is L-C?

Some (infinitely many) processes expressed in the pi-calculus cannot be modeled in the
lambda-calculus because they cannot be regarded as algorithms.
http://www.amazon.com/s/ref=nb_ss_gw/102-3481753-9537767?initialSearch=1&url=search-alias%3Daps&field-keywords=milner+pi-calculus&Go.x=0&Go.y=0&Go=Go
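As a standard illustration of Matej's point (my addition, not from the original post): a minimal pi-calculus process exhibiting channel mobility, which has no direct lambda-calculus analogue because its meaning lies in interaction, not in computing a value from an input:

```latex
\bar{x}\langle z\rangle.\mathbf{0} \;\mid\; x(y).\bar{y}\langle w\rangle.\mathbf{0}
\;\;\xrightarrow{\tau}\;\;
\mathbf{0} \;\mid\; \bar{z}\langle w\rangle.\mathbf{0}
```

The left component sends the channel name z along x; the receiver then uses the name it received as a channel to send on. Which channels exist, and who talks to whom, changes during execution, which is the behavior a function from inputs to outputs cannot capture.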

Am I missing something?

Best regards
- --
Matej Kosik
ICQ: 300133844
skype: matej_kosik


Re: Multy-core CPUs

pwl
In reply to this post by Jason Johnson-5
Hi,

Jason Johnson wrote:
> On 10/25/07, Peter William Lount [hidden email] wrote:
>> What are they?
>
> Here.  Again.  http://sixtyk.blogspot.com/2007/05/threading.html

Thanks for the link.

>> Sure, one only has to search the internet for "concurrency" and one sees a
>> wide range of problems and potential solutions. Look at the Little Book of
>> Semaphores for a breathtaking look at a few of the many possible solutions
>> to various problems. Open your eyes to the wider horizon.
>
> Open my eyes to the wider horizon of yesterday?  I've seen it.  It's
> complicated.  I prefer to look at tomorrow.

Well, you might create a simpler tomorrow for SOME PROBLEMS, but not for many real-world problems.

>> I never said you stated that explicitly - I'd have to check all your
>> postings to find that out. It's implied by what you are saying in many of
>> your postings. At least that is the impression that I'm getting from your
>> writing. You've certainly not acknowledged the opposite.
>
> I really believe that over-all your intentions are good, but this
> seems downright dishonest.  Either that or you simply don't read what
> I write.  I have told you *every single time* you brought up this
> charge that I don't think it will solve all cases.


Well, I don't recall that, and it's hard enough keeping up with this thread and the other stuff going on. It just seems that you and some of the others are brushing aside some of the more complex real problems with simplistic solutions. As Einstein said: simple, but not simplistic. In terms of deep copy that means yes, by all means a full deep copy is needed, but to avoid being simplistic a partial deep copy, with or without references, is also required.
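The partial deep copy Peter describes can be sketched in Python (an illustrative sketch only; the `Order` and `catalog` names are hypothetical). `copy.deepcopy` accepts a memo table, and pre-seeding it keeps chosen sub-objects as shared references while everything else is duplicated:

```python
import copy

class Order:
    def __init__(self, items, catalog):
        self.items = items      # private state: should be copied
        self.catalog = catalog  # shared structure: should stay a reference

order = Order(items=[{"sku": 1, "qty": 2}], catalog={"1": "widget"})

# Full deep copy: everything is duplicated, including the shared catalog.
full = copy.deepcopy(order)

# Partial deep copy: pre-seed the memo so `catalog` survives as a reference.
memo = {id(order.catalog): order.catalog}
partial = copy.deepcopy(order, memo)

assert full.catalog is not order.catalog    # fully independent copy
assert partial.catalog is order.catalog     # reference preserved
assert partial.items is not order.items     # private state still copied
```

The memo trick works because `deepcopy` consults the memo by object id before copying anything, so any object already present is returned as-is.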


> Ok, we're getting nowhere with this.  I apologized to the list for
> what this thread has turned into, and I'll try to do a better job
> staying out of this sort of pointless "nu uh", "uh huh!", "nu uh!"
> discussion in the future.  (If I start it again just warn me!  It's a
> bit of a weakness of mine.)
  

I think this has been a very good discussion. It's uncovered some interesting ideas that are out there. It's also shown that the wider Smalltalk group is getting ready - maybe - to accept some of the transaction processing notions that I've been supporting for over fifteen years now.

Keep up the good work Jason.

All the best,

peter




Re: Erlang a primitive language? (was Re: Multy-core CPUs)

Jason Johnson-5
In reply to this post by Matej Kosik-2
Sorry, "L-C" was my abbreviation for the lambda-calculus.



Re: Multy-core CPUs

Rob Withers
In reply to this post by gruntfuttuck
Hi Peter,
 
My response here is not formatted as I would wish, but that is the limit of my current technology.

I work on SqueakElib, and though I am certainly no expert on E, I am somewhat familiar with its design and implementation.  That said, a comment you made caught my eye...

 
> ----- Original Message ----
> From: Peter William Lount [hidden email]
>
> If a number of messages are waiting in the input queue of a process
> that can only process one message at a time since it's not multi-threaded
> then those messages are BLOCKED while in the queue. Now imagine
> another process with messages in its queue that are also BLOCKED
> since they are waiting in the queue and only one message can be
> processed at a time. Now imagine that process A and process B each
> have messages that the other needs before it can proceed but those
> messages are BLOCKED waiting for processing in the queues.
>
> This is a real example of what can happen with message queues. The
> order isn't guaranteed. Simple concurrency solutions often have deadlock
> scenarios. This can occur when objects must synchronize events or
> information. As soon as you have multiple threads of execution you've got
> problems that need solving regardless of the concurrency model in place.

Messages don't get blocked when they are queued for the Vat (thread/process) to service them.  They do get queued when they are sent to a Promise, but that does not affect the Vat.  When the Promise resolves, the messages are forwarded to the Vat for processing, in order, so partial msg ordering is maintained.
 
Msg ordering is partial because of this case: object a in Vat A, object b in Vat B, and object c in Vat C are intercommunicating.  Both b and c have references to a.   At time t1, b sends msg m1 to a, at time t2, c sends msg m2 to a, but there is higher network latency for b, so m2 gets to a at t3 and m1 gets to a at t4 and they are out of order from the remote invocation order.
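The mechanism Rob describes can be sketched as a toy Python model (class and method names are my own illustration, not SqueakElib's API): messages to an unresolved promise are buffered rather than blocked, and on resolution they are forwarded, in send order, to the target vat's queue.

```python
from collections import deque

class Vat:
    """One thread of control; services queued messages one at a time, FIFO."""
    def __init__(self):
        self.queue = deque()
        self.log = []

    def enqueue(self, msg):
        self.queue.append(msg)

    def run(self):
        while self.queue:
            self.log.append(self.queue.popleft())  # process in arrival order

class Promise:
    """Messages sent to an unresolved promise are buffered, not blocked.
    On resolution they are forwarded, in send order, to the target vat."""
    def __init__(self):
        self.buffered = []
        self.vat = None

    def send(self, msg):
        if self.vat is None:
            self.buffered.append(msg)   # buffer while unresolved
        else:
            self.vat.enqueue(msg)       # resolved: deliver directly

    def resolve(self, vat):
        self.vat = vat
        for msg in self.buffered:       # forward in original send order
            vat.enqueue(msg)
        self.buffered = []

p = Promise()
p.send("m1"); p.send("m2")   # buffered; nothing is blocked
a = Vat()
p.resolve(a)                 # m1, m2 forwarded in order
p.send("m3")                 # goes straight to the vat now
a.run()
assert a.log == ["m1", "m2", "m3"]
```

Ordering is preserved per sender, which matches Rob's point: it is only the interleaving across senders (his b-versus-c latency example) that is not globally ordered.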
 
I hope this helps.  Read more about Elib.
 
cheers,
Rob




RE: Multy-core CPUs

Sebastian Sastre-2
In reply to this post by pwl

 

From: [hidden email] [mailto:[hidden email]] On behalf of Peter William Lount
Sent: Thursday, October 25, 2007 16:29
To: The general-purpose Squeak developers list
Subject: Re: Multy-core CPUs

Hi,

Sebastian Sastre wrote:
 
hi,

What? That just won't work. Think of the memory overhead. 
 
I don't give credit to unfounded apriorisms. I think it deserves to be proved that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012? Remember, on the first day of 2012, the attitude you had saying this now.

It's not an unfounded apriorism as you put it.

Current hardware and technology expected in the next ten years isn't optimized for N hundred thousand or N million threads of execution. Maybe in the future that will be the case.

The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.

I work with real hardware.

I am open and willing to be pleasantly surprised however. 

Peter.. Peter.. you have to fight a little harder against that demon. Look, I asked you to read my previous post with subject "One Process Per Instance" where I took the time (so money) to explain as didactically as I can how *your* million-object example could be managed in a system like the one I'm speculating about. So please, please Peter, I ask you not to make me repeat myself: go read it and make your statements there if you find problems. As I already said, I think the experiences you are sharing in this matter are precious, so the discussion just gets richer.
 
 
Tying an object instance to a particular process makes no sense. If you did that you'd likely end up with just as many deadlocks and other concurrency problems since you'd now have message sends to the object being queued up on the process's input queue. Since processes could only process one message at a time deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult to diagnose and comprehend concurrency problems.) It's a rat's maze that's best to avoid.
 
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!?
You didn't understand the model I'm talking about.

That is likely the case.  
 
So I ask you kindly if you can read my previous emails, where I have taken the job of expressing my exploratory thoughts until I reached this model and the speculation about the existence of this model (consequences).


There isn't such a thing as an object with multiple threads. That does not exist in this model.

Ok. I got that.


There exists one process per instance, no more, no less.

I did get that. Even if you only do that logically you've got serious problems. 
 
If you have read where I talk about how to manage N million objects with this model on limited hardware and you still find problems, please be my guest to inform me here, because I want to know that as soon as possible.


I think you're thinking about processes and threads the same way you know them today.

I can easily see such a scenario working and also breaking all over the place. 
 
Why? 


Let's see if this helps you to get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a VM light process, which we also call a thread.

Ok.


So I'm saying that in this model you have only one process per instance but that process is not a process that can have threads belonging to it.

ok.

That generates a hell of complexity.

You lost me there. What complexity?
 
It does not matter; that is another model, not the one I'm speculating about. (Probably one you imagined before my clarifying the 1:1 object-process thing.)

The process I'm saying is tied to an instance is closer to the word "process" as you know it from the dictionary, plus what you know an instance is, with the process implemented by a VM that can balance it across cores.

I didn't understand. Please restate.  
 
I restated that N times in my previous emails, at too great a length. To give you a clue: it's about the double nature I'm saying the object has. An amalgam between object and process. Their conceptual indissociability. More in those previous emails.


 
I'm not falling into the pitfall of trying to parallelize code automagically. This is far from it. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination in guaranteed order. Otherwise there will be chaos. And we want an ordered chaos like the one we have now in a Squeak reified image.
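A minimal Python sketch of the one-process-per-instance idea as described here (illustrative only; this says nothing about how a real VM would implement or schedule it): each instance owns exactly one mailbox serviced by exactly one worker, so its messages are handled one at a time, in arrival order.

```python
import queue
import threading

class ObjectProcess:
    """Toy model: each instance owns one mailbox and one worker 'process'.
    Messages (selectors) are handled strictly one at a time, FIFO."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.handled = []
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def send(self, selector):
        self.mailbox.put(selector)          # never blocks the sender

    def _run(self):
        while True:
            selector = self.mailbox.get()   # one message at a time
            if selector is None:            # sentinel: shut down
                break
            self.handled.append(selector)

    def stop(self):
        self.mailbox.put(None)
        self.worker.join()

counter = ObjectProcess()
for s in ("increment", "increment", "value"):
    counter.send(s)
counter.stop()
assert counter.handled == ["increment", "increment", "value"]
```

The per-instance mailbox gives exactly the guarantee claimed in the post: no two messages to the same instance ever run concurrently, and delivery order to that instance is preserved.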

Yes, squeak is ordered chaos. ;--).



 
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?


If a number of messages are waiting in the input queue of a process that can only process one message at a time since it's not multi-threaded then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages that the other needs before it can proceed but those messages are BLOCKED waiting for processing in the queues.

This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving regardless of the concurrency model in place. 

But that can happen right now if you make bad use of processes in a current Smalltalk. I don't want to solve deadlocks for anybody using parallelism badly. I just want a Smalltalk that works like today's but balancing CPU load across cores and scaling to an arbitrary number of them. This whole thread is about that.

Tying an object's life time to the lifetime of a process doesn't make sense since there could be references to the object all over the place. If the process quits the object should still be alive IF there are still references to it.
You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
 
Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation that can be reified again on demand.  Please refer to my previous post with subject "One Process Per Instance.." where I talk more about exactly this.

If all there is is one object per process and one process per object - a 1 to 1 mapping - then yes, GC would work that way, but the 1 to 1 mapping isn't likely to ever happen given current and future hardware prospects.

But Peter, don't lower your guard on that so easily! We know techniques for administering resources, like navigating 10 thousand instances at a time in a 10-gig image of 10 million objects! Don't shoot hope before it is born! I talk about some details I've imagined about this in my "One Process Per Instance" post.

Even if you considered an object as having its own "logical" process you'd get into the queuing problems hinted at above.
 
Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.

See the example above.



Besides, objects in Smalltalk are really fine grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources.
And what do you think was coming out of the mouths of critics of initiatives like the one the PARC team had in the 1970s, making a Smalltalk at the price of the CPUs and RAM of that time? That VMs are a smart, efficient use of resources?

That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
 
We all have to use cheap hardware. Please (re)think about what I said about administering hardware resources over this model. 

 
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway let's just assume that it may be too much for state-of-the-art hardware in common computers in year 2007. What about in 2009? What about in 2012?"

Well, just get out your calculator. There is an overhead to a thread or process in bytes. Say 512 bytes per thread plus its stack. There is the number of objects. Say 1 million for a medium to small image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh wait, we forgot that the stack is kind of weird, since as each message send that isn't to self must be an interprocess or interthread message send, you've got some weirdness going on, let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling.
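Peter's back-of-the-envelope numbers do check out arithmetically; note the 512-byte per-thread overhead and the x8 multiplier are his assumptions, not measurements:

```python
overhead_per_thread = 512           # bytes per thread, Peter's assumed cost
objects = 1_000_000                 # a medium-to-small image

base = overhead_per_thread * objects
assert base == 512_000_000          # ~1/2 gigabyte, as stated

stack_and_object_multiplier = 8     # Peter's rough factor for stacks + objects
total = base * stack_and_object_multiplier
assert total == 4_096_000_000       # ~4 gigabytes, as stated
```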
 
I just can't believe we really can't find clever ways of administering resources to the point at which this becomes acceptable.
 
Simply put, current CPU architectures are not designed for that approach. Heck, they are even highly incompatible with dynamic message passing since they favor static code in terms of optimizations.

Yes, that happens with machines based on mathematical models like the Boolean model. It injects an impedance mismatch between the conceptual modeling and the virtual modeling.

Again, one solution does not fit all problems - if it did programming would be easier.
 
But programming should be easier.
Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore the hard problems either.


Smalltalk made it easier in a lot of aspects.

Sure, I concur. That's why I am working here in this group, spending time (time is money) on these emails.

Listen.. I'm not a naive silver-bullet purchaser nor a faithful believer. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling himself with a weaker one.

I do get that about you.


Peter, please try to forget about how systems are made and think about how you want to make them.

I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.

Currently we have 2-core and 4-core processors as the mainstream with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia that have 128 processing units that can be programmed in a variant of C for some general purpose program tasks using a SIMD (single instruction multiple data) format - very useful for those number crunching applications like graphics, cryptology and numeric analysis to name just a few. We also have the general purpose networked Tile-64 coming - lots of general purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a prototype with 80 cores that is similar. Intel also has its awesomely impressive Itanium processor with instruction level parallelism as well as multiple cores - just wait till that's a 128-core beastie. Plus there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!

The bigger problem is that in order to build real systems I need to think about how they are constructed.

So yes, I want easy parallel computing but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy or possible to simplify as much as we try to want them to be. That is the nature of computers.

Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.

All the best,

Peter

Please take your time to think about what I've stated about administering resources: that it is possible to manage a load of millions of instances with a swarm of a few at a time. And don't be sorry about anything. I love criticism. Our culture needs tons of criticism to be stronger. It's the only way we can uninstall deprecated or obsolete ideas. You are helping here. If I'm really dreaming and this doesn't work, I want that dream to be killed now so I can spend my time on something better. That helps.
By now this model is just getting stronger. Please try to take it down !!!   :)))
 
    cheers,
 
Sebastian 



Re: Multy-core CPUs

pwl
In reply to this post by Rob Withers
Hi Rob,

SqueakElib sounds interesting. With all the links on this thread (pun intended) I have weeks of reading and absorbing to do.

My example below wasn't expressed as clearly as it could have been.

When a message is sitting in an inbound queue of a process, waiting for that process to get to it, it is in essence blocked until it gets processed. If the process is only processing one inbound message at a time, it is possible for a deadlock to occur when multiple processes are in a similar situation with each other: both processes waiting for the message that would let them continue, which is sitting blocked in the other's queue. That's all. A classic deadlock case.
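The scenario can be sketched in a few lines of Python (a toy model, not a claim about any particular VM or about SqueakElib): each single-threaded process stalls waiting for a reply that only the other, equally stalled, process could send, while its other queued messages sit blocked behind the wait.

```python
from collections import deque

class Proc:
    """Single-threaded process: handles one inbound message at a time.
    While it waits for a specific reply, everything else in its queue is stuck."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.waiting_for = None   # message this process must see to proceed

    def step(self):
        """Try to make progress; return True if a message was processed."""
        if self.waiting_for is not None:
            if self.waiting_for in self.queue:
                self.queue.remove(self.waiting_for)
                self.waiting_for = None
                return True
            return False          # blocked: the awaited reply never arrived
        if self.queue:
            self.queue.popleft()
            return True
        return False

a, b = Proc("A"), Proc("B")
# Each needs a reply the other would only send after being unblocked itself.
a.waiting_for = "reply-from-B"
b.waiting_for = "reply-from-A"
# Other messages sit in the queues, blocked behind the wait.
a.queue.append("work-1")
b.queue.append("work-2")

# Neither process can take a step: the classic deadlock case.
assert not a.step() and not b.step()
```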

I don't know if that is any clearer, hopefully it is.

All the best,

Peter









Re: Multy-core CPUs

Nicolas Cellier-3
In reply to this post by pwl
But Smalltalk methods are sequential procedures by nature, so having a
process per object would maybe address the mutual exclusion problem, but will
not introduce parallelism per se.

Someone has to decide to break the sequential execution path into parallel
paths.

As long as we have 2, 3, 4 processing units, maybe we can trust that programmers
can use the old concurrency model, working with already existing
Smalltalk::Process objects, and we can try hard to make them robust to
parallelism...

This obviously won't scale to 1000 or more processing units!

It seems to me that:
- focusing exclusively on solving the mutual exclusion problem, sharing
whole state or none (though I don't understand the value), or partial
state via some duplication and syncing mechanism (a-la-croquet or
a-la-spoon-sauce maybe), and
- not addressing the parallelization problem (except the early doInParallel:
proposal)
makes this thread about the old few-cores concurrency problem.
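The doInParallel: idea mentioned above was only a proposal on this thread; as a rough sketch of what it asks for (the helper name and signature are hypothetical, rendered here in Python), the programmer explicitly marks independent blocks and a pool fans them out, returning results in the original order:

```python
from concurrent.futures import ThreadPoolExecutor

def do_in_parallel(blocks, max_workers=4):
    """Hypothetical doInParallel:-style helper: evaluate independent
    zero-argument blocks concurrently, collecting results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(block) for block in blocks]  # fan out
        return [f.result() for f in futures]                # gather in order

results = do_in_parallel([lambda: 1 + 1, lambda: 2 * 3, lambda: 10 - 4])
assert results == [2, 6, 6]
```

This is exactly the "someone has to decide" point: the parallelism comes from the programmer naming the independent paths, not from the runtime discovering them.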

Maybe there is a future for declarative language revival...



PS: for fun, what happens to all-is-object paradigm if each and every
object has a MessageQueue object? What is the MessageQueue of the
MessageQueue of ...


Peter William Lount a écrit :

> Hi,
>
> Sebastian Sastre wrote:
>  
>> hi,
>>
>> What? That just won't work. Think of the memory overhead.
>>  
>> I don't give credit to unfounded apriorisms. I think it deserves to be
>> proved that does not work. Anyway let's just assume that may be too
>> much for state of the art hardware in common computers in year 2007.
>> What about in 2009? what about in 2012? Remember the attitude you had
>> saying this now the first day of 2012.
>
> It's not an unfounded apriorism as you put it.
>
> Current hardware and technology expected in the next ten years isn't
> optimized for N hundred thousand or N million threads of execution.
> Maybe in the future that will be the case.
>
> The Tile-64 processor is expected to grow to about 4096 processors by
> pushing the limits of technology beyond what they are today. To reach
> the levels you are talking about for a current Smalltalk image with
> millions of objects each having their own thread (or process) isn't
> going to happen anytime soon.
>
> I work with real hardware.
>
> I am open and willing to be pleasantly surprised however.
>
>
>> Tying an object instance to a particular process makes no sense. If
>> you did that you'd likely end up with just as many dead locks and
>> other concurrency problems since you'd now have message sends to the
>> object being queued up on the processes input queue. Since processes
>> could only process on message at a time deadlocks can occur - plus all
>> kinds of nasty problems resulting from the order of messages in the
>> queue. (There is a similar nasty problem with the GUI event processing
>> in VisualAge Smalltalk that leads to very difficult to diagnose and
>> comprehend concurrency problems). It's a rats maze that's best to avoid.
>>  
>> Besides, in some cases an object with multiple threads could respond
>> to many messages - literally - at the same time given multiple cores.
>> Why slow down the system by putting all the messages into a single
>> queue when you don't have to!?
>> You didn't understand the model I'm talking about.
>
> That is likely the case.
>
>
>> There isn't such a thing as an object with multiple trheads. That does
>> not exists in this model.
>
> Ok. I got that.
>
>
>> It does exists one process per instance no more no less.
>
> I did get that. Even if you only do that logically you've got serious
> problems.
>
>
>> I think you're thinking about processes and threads the same way you
>> know them today.
>
> I can easily see such a scenario working and also breaking all over the
> place.
>
>
>> Lets see if this helps you to get the idea: Desambiguation: for this
>> model I'm talking about process not as an OS process but as a VM light
>> process which we also use to call them threads.
>
> Ok.
>
>
>> So I'm saying that in this model you have only one process per
>> instance but that process is not a process that can have threads
>> belonging to it.
>
> ok.
>
>> That generates a hell of complexity.
>
> You lost me there. What complexity?
>
>
>> The process I'm saying it's tied to an instance it's more close to the
>> process word you know from dictionary plus what you know what an
>> instance is and with the process implemented by a VM that can balance
>> it across cores.
>
> I didn't understand. Please restate.
>
>
>>  
>> I'm not falling in the pitfall of start trying to parallelize code
>> automagically. This far from it. In fact I think this is better than
>> that illusion. Every message is guaranteed by VM to reach it's
>> destination in guaranteed order. Otherwise will be chaos. And we want
>> an ordered chaos like the one we have now in a Squeak reified image.
>
> Yes, squeak is ordered chaos. ;--).
>
>
>
>>  
>> Clarified that I ask why do you think could be deadlocks? and what
>> other kind of concurrency problems do you think that will this model
>> suffer?
>
>
> If a number of messages are waiting in the input queue of a process that
> can only process one message at a time since it's not multi-threaded
> then those messages are BLOCKED while in the thread. Now imagine another
> process with messages in it's queue that are also BLOCKED since they are
> waiting in the queue and only one message can be processed at a time.
> Now imagine that process A and process B each have messages that the
> other needs before it can proceed but those messages are BLOCKED waiting
> for processing in the queues.
>
> This is a real example of what can happen with message queues. The order
> isn't guaranteed. Simple concurrency solutions often have deadlock
> scenarios. This can occur when objects must synchronize events or
> information. As soon as you have multiple threads of execution you've
> got problems that need solving regardless of the concurrency model in
> place.
>
>
>>  
>> Tying an object's life time to the lifetime of a process doesn't make
>> sense since there could be references to the object all over the
>> place. If the process quits the object should still be alive IF there
>> are still references to it.
>> You'd need to pass around more than references to processes. For if a
>> process has more than one object you'd not get the resolution you'd
>> need. No, passing object references around is way better.
>>  
>> Yes of course there will be. In this system a process termination is
>> one of two things: A) that instance is being reclaimed in a garbage
>> collection or B) that instance has been written to disk in a kind of
>> hibernation that can be reified again on demand.  Please refer to my
>> previous post with subject "One Process Per Instance.." where I talk
>> more about exacly this.
>
> If all there is is a one object per process and one process per object -
> a 1 to 1 mapping then yes gc would work that way but the 1 to 1 mapping
> isn't likely to ever happen given current and future hardware prospects.
>
>
>>  
>> Even if you considered an object as having it's own "logical" process
>> you'd get into the queuing problems hinted at above.
>>  
>> Which I dont see and I ask your help to understand if you still find
>> them after the clarifications made about the model.
>
> See the example above.
>
>
>>  
>> Besides objects in Smalltalk are really fine grained. The notion that
>> each object would have its own thread would require so much thread
>> switching that no current processor could handle that. It would also
>> be a huge waste of resources.
>> And what do you think was coming out of the mouths of the critics of
>> initiatives like the one the Xerox PARC team undertook in the 1970s,
>> making a Smalltalk at the prices of CPUs and RAM at that time? That
>> VMs are a smart, efficient use of resources?
>
> That's not really relevant. If you want to build that please go ahead -
> please don't let me stop you, that's the last thing I'd want. I wish you
> luck. I get to play with current hardware and hardware that's coming
> down the pipe such as the Tile-64 or the newest GPUs when they are
> available to the wider market.
>
>
>>  
>> So I copy paste myself: "I don't give credit to unfounded apriorisms.
>> It deserves to be proven that it does not work. Anyway let's just assume
>> that may be too much for state of the art hardware in common computers
>> in year 2007. What about in 2009? what about in 2012?"
>
> Well, just get out your calculator. There is an overhead to a thread or
> process in bytes. Say 512 bytes per thread plus its stack. There is the
> number of objects. Say 1 million for a medium to small image. Now
> multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space
> and the memory for the objects themselves. Add a multiplier for that,
> say 8, and you get 4 gigabytes. Oh wait, we forgot that the stack is kind
> of weird, since as each message send that isn't to self must be an
> interprocess or interthread message send, you've got some weirdness going
> on, let alone all the thread context switching for each message send that
> occurs. Then you've got to add more for who knows what... the list could
> go on for quite a bit. It's just mind boggling.
>
> Simply put current cpu architectures are simply not designed for that
> approach. Heck they are even highly incompatible with dynamic message
> passing since they favor static code in terms of optimizations.
>
>>  
>> Again, one solution does not fit all problems - if it did programming
>> would be easier.
>>  
>> But programming should be easier.
> Yes, I concur, whenever it's possible to do so. But it also shouldn't
> ignore the hard problems either.
>
>
>> Smalltalk made it easier in a lot of aspects.
>
> Sure, I concur. That's why I am working here in this group, spending time
> (time is money) on these emails.
>
>> Listen.. I'm not a naive silver-bullet purchaser nor a faithful person.
>> I'm a critical Smalltalker who thinks he gets the point of OOP and
>> tries to find solutions to surpass the multicore crisis by getting an
>> empowered system, not consoling itself with a weaker one.
>
> I do get that about you.
>
>
>> Peter, please try to forget about how systems are made and think about
>> how you want to make them.
>
> I do think about how I want to make them. However to make them I have no
> choice but to consider how to actually build them using existing
> technologies and the coming future technologies.
>
> Currently we have 2-core and 4-core processors as the mainstream with
> 3-core and 8-core coming to a computer store near you. We have the
> current crop of GPUs from NVidia that have 128 processing units that can
> be programmed in a variant of C for some general purpose program tasks
> using a SIMD (single instruction multiple data) format - very useful for
> those number crunching applications like graphics, cryptology and
> numeric analysis to name just a few. We also have the general purpose
> networked Tile-64 coming - lots of general purpose compute power with an
> equal amount of scalable networked IO power - very impressive. Intel
> even has a prototype with 80 cores that is similar. Intel also has its
> awesomely impressive Itanium processor with instruction-level
> parallelism as well as multiple cores - just wait till that's a 128-core
> beastie. Plus there is hardware that we likely don't know about or
> that hasn't been invented yet. Please bring it on!!!
>
> The bigger problem is that in order to build real systems I need to
> think about how they are constructed.
>
> So yes, I want easy parallel computing but it's just a harsh reality
> that concurrency, synchronization, distributed processing, and other
> advanced topics are not always easy or possible to simplify as much as
> we might want them to be. That is the nature of computers.
>
> Sorry for being a visionary-realist. Sorry if I've sounded like the
> critic. I don't mean to be the critic that kills your dreams - if I've
> done that I apologize. I've simply meant to be the realist who informs
> the visionary that certain adjustments are needed.
>
> All the best,
>
> Peter
>
>
>



Re: Multy-core CPUs

Rob Withers
In reply to this post by gruntfuttuck
Peter,
 
Of all the links you have been given, put this one at the top of your list.  Commit to spending a half hour reading it - much less time than this thread!  :)
 
>    (concerning distributed programming, and event-loop concurrency)
>    http://www.erights.org/elib/index.html

I also wasn't as clear as I could have been.

> When a message is sitting in an inbound queue on a process waiting
> for that process to get to it, it is in essence blocked until it gets
> processed. If the process is only processing one inbound message at
> once, it is possible for a deadlock to occur when multiple processes
> are in a similar situation with each other. Both processes are waiting for
> the message that would let them continue, which is sitting blocked on
> the other's queue. That's all. Classic deadlock case.
Ok, with your terminology, and some of mine: a message in a queue feeding a Vat is blocked until the Vat gets to it.  Ditto with Vat 2.  The mistake in your classic deadlock case is to assume that the message could also get blocked because it is waiting on the message in Vat 2.  This does not happen.  If there was an unresolved reference waiting on resolution from a remote computation, it is represented as a promise.  So message 1 would have a promise for the outcome of message 2.  It would NOT be on the queue for Vat 1.  It would be sitting as the argument in a #whenMoreResolved message to the promise.  And when that promise resolves, it would then run.
 
Now, you are further saying that the promise will not resolve because both are mutually dependent on each other's results.  So you are saying that in a sense they are deadlocked.  Again, this cannot happen.  There is no way for you to kick off computation 1 with the promise to computation 2 unless computation 2 had already been kicked off.  Ditto for the other direction.  So it is impossible to construct this promise deadlock.
Cheers,
Rob
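Rob's promise mechanism is easy to mis-read as just another queue, so here is a minimal sketch in Python (illustrative names only, not E's or Croquet's actual API) of the key point: a computation that needs an unresolved result is parked on the promise itself, not in the Vat's queue, and nothing ever blocks:

```python
class Promise:
    def __init__(self):
        self._value = None
        self._resolved = False
        self._callbacks = []          # pending #whenMoreResolved blocks

    def when_resolved(self, callback):
        if self._resolved:
            callback(self._value)     # already resolved: run immediately
        else:
            self._callbacks.append(callback)

    def resolve(self, value):
        self._resolved = True
        self._value = value
        for cb in self._callbacks:    # run every deferred message
            cb(value)
        self._callbacks.clear()

# message 1 holds a promise for the outcome of message 2
p = Promise()
log = []
p.when_resolved(lambda v: log.append(f"message 1 ran with {v}"))

# nothing is blocked: the vat is free; later, computation 2 completes
p.resolve(42)
print(log)   # ['message 1 ran with 42']
```

The vat stays free the whole time; the deferred block runs only when `resolve` is called, which is why a queue-ordering deadlock of the kind described earlier in the thread cannot be assembled from these parts.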



Re: Multy-core CPUs

pwl
In reply to this post by Sebastian Sastre-2
Hi,


Sebastian Sastre wrote:
 
Hi,

Sebastian Sastre wrote:
 
hi,

What? That just won't work. Think of the memory overhead. 
 
I don't give credit to unfounded apriorisms. I think it deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012? Remember the attitude you had saying this now on the first day of 2012.

It's not an unfounded apriorism as you put it.

Current hardware and technology expected in the next ten years isn't optimized for N hundred thousand or N million threads of execution. Maybe in the future that will be the case.

The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.

I work with real hardware.

I am open and willing to be pleasantly surprised however. 

Peter.. Peter.. you have to fight a little harder against that demon. Look, I asked you to read my previous post with subject "One Process Per Instance", where I took the time (so money) to explain as didactically as I can how *your* million-object example could be managed in a system like the one I'm speculating about. So please, please Peter, I ask you not to make me repeat myself: go read it and make your statements there if you found problems. As I already said, I think the experiences you are sharing in this matter are precious, so the discussion just gets richer.

I recall a counter example of the million objects was to split the data objects into 10,000 chunks. However, that's a different problem not the one that I have to deal with.


Tying an object instance to a particular process makes no sense. If you did that you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since processes could only process one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult to diagnose and comprehend concurrency problems). It's a rat's maze that's best avoided.
 
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!?
You didn't understand the model I'm talking about.

That is likely the case.  
 
So I ask you kindly if you can read my previous emails, where I have taken the job of expressing my exploratory thoughts until I reached this model and the speculation about the consequences of its existence.
There are so many emails in this thread please link to the emails you'd like me to reread. Thanks very much.


There isn't such a thing as an object with multiple threads. That does not exist in this model.

Ok. I got that.


There exists one process per instance, no more, no less.

I did get that. Even if you only do that logically you've got serious problems. 
 
If you have read where I talk about how to manage N million objects with limited hardware under this model and you still find problems, please be my guest to inform me here, because I want to know that as soon as possible.
Yes, I know it's possible for systems like Erlang to have 100,000 virtual processes, aka lightweight threads, that can be in one real native operating system process or across many native processes.

Are you saying that you've figured out how to do that with millions of processes?


I think you're thinking about processes and threads the same way you know them today.

I can easily see such a scenario working and also breaking all over the place. 
 
Why? 


Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a VM lightweight process, which we also call a thread.

Ok.


So I'm saying that in this model you have only one process per instance but that process is not a process that can have threads belonging to it.

ok.

That generates a hell of complexity.

You lost me there. What complexity?
 
It does not matter; that is another model, not the one I'm speculating about (probably one you imagined before I clarified the 1:1 object-process thing).
Alan Kay's original work suggested that each object had a process. There is the logical view and the idealized view. Then there is the concrete and how to implement things. How one explains things to end users is often with the idealized view. How one implements is more often with something that isn't quite the ideal.


The process I'm saying is tied to an instance is closer to the word "process" as you know it from the dictionary, plus what you know an instance is, with the process implemented by a VM that can balance it across cores.

I didn't understand. Please restate.  
 
I restated that N times in my previous emails, at too much length. To give you a clue, it's about the double nature I'm saying the object has: an amalgam between object and process, conceptually indissociable. More in those previous emails.
Alright I'll have to reread the entire thread since no one wants to clearly state their pov in one email as I attempt to do out of courtesy. I don't have time to reread the entire thread today though. (That's why it is a courtesy to repost - it saves your readers time).



I'm not falling into the pitfall of trying to parallelize code automagically. This is far from that. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination in guaranteed order. Otherwise there would be chaos. And we want an ordered chaos like the one we have now in a Squeak reified image.

Yes, Squeak is ordered chaos. ;--).



 
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?


If a number of messages are waiting in the input queue of a process that can only process one message at a time, since it's not multi-threaded, then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED, since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages that the other needs before it can proceed, but those messages are BLOCKED waiting for processing in the queues.

This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving regardless of the concurrency model in place. 

 But that can happen right now if you make bad use of processes in a current Smalltalk. I don't want to solve deadlocks for anybody using parallelism badly. I just want a Smalltalk that works like today's but balances CPU load across cores and scales to an arbitrary number of them. That's what this whole thread is about.

Yes it can happen now. That's why it's important to actually learn concurrency control techniques. Books like the Little Book of Semaphores can help with that learning process.

The point is that a number of people in this thread are proposing solutions that seem to claim that these problems magically go away in some utopian manner with process-based concurrency. All I'm pointing out is that there isn't a silver bullet or concurrency utopia and now I'm getting flack for pointing out that non-ignorable reality. So be it. Those that push ahead are often the ones with many arrows in their back.
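For what it's worth, the scenario Peter describes can be made concrete with a tiny simulation (plain Python, no particular Smalltalk or Erlang API implied; all names are made up): two one-message-at-a-time processes, each stuck inside its current message waiting on a reply that has already arrived but sits unserviceable in its inbound queue:

```python
from collections import deque

class Proc:
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.blocked_on = None   # reply the current message is waiting for

    def stuck(self):
        # Busy inside one message, needing a reply it cannot dequeue:
        # a single-message-at-a-time process cannot service its queue
        # while the current message is still in progress.
        return self.blocked_on is not None

a, b = Proc("A"), Proc("B")

# A is mid-message waiting on B's reply, and vice versa
a.blocked_on = "reply-from-B"
b.blocked_on = "reply-from-A"

# both replies HAVE arrived -- they sit blocked in the inbound queues
a.mailbox.append("reply-from-B")
b.mailbox.append("reply-from-A")

deadlocked = a.stuck() and b.stuck() and bool(a.mailbox) and bool(b.mailbox)
print(deadlocked)   # True: the classic queue deadlock
```

This is exactly the shape the promise/event-loop camp claims to rule out: in their model a process never sits "inside" a message waiting, so `blocked_on` has no equivalent.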



Tying an object's lifetime to the lifetime of a process doesn't make sense, since there could be references to the object all over the place. If the process quits, the object should still be alive IF there are still references to it.
You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
 
Yes of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation that can be reified again on demand.  Please refer to my previous post with subject "One Process Per Instance.." where I talk more about exactly this.

If all there is is a one object per process and one process per object - a 1 to 1 mapping - then yes, gc would work that way, but the 1 to 1 mapping isn't likely to ever happen given current and future hardware prospects.

 But Peter, don't lower your guard on that so easily! We know techniques to administer resources, like navigating a 10-gigabyte image of 10 million objects 10 thousand instances at a time! Don't shoot hope before it is born! I give some details I've imagined about this in my "One Process Per Instance" post.

Well then you must have a radically different meaning of 1 to 1 object-to-process mapping than I have, or a radically different implementation than I've understood from your writings. If you can make it work, that is all the proof that you need, isn't it!?



Even if you considered an object as having it's own "logical" process you'd get into the queuing problems hinted at above.
 
Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.

See the example above.



Besides, objects in Smalltalk are really fine grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources.
And what do you think was coming out of the mouths of the critics of initiatives like the one the Xerox PARC team undertook in the 1970s, making a Smalltalk at the prices of CPUs and RAM at that time? That VMs are a smart, efficient use of resources?

That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
 
We all have to use cheap hardware. Please (re)think about what I said about administering hardware resources over this model.





 
 So I copy paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012?"

Well, just get out your calculator. There is an overhead to a thread or process in bytes. Say 512 bytes per thread plus its stack. There is the number of objects. Say 1 million for a medium to small image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh wait, we forgot that the stack is kind of weird, since as each message send that isn't to self must be an interprocess or interthread message send, you've got some weirdness going on, let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling.
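Peter's back-of-the-envelope figures do check out if you run them (the 512-byte per-thread overhead and the 8x multiplier are his assumptions, not measured values):

```python
# Checking the calculator math: one process per object, ~512 bytes of
# bookkeeping per thread, 1 million objects, then an 8x multiplier for
# stacks and the objects themselves (Peter's assumed figures).
per_thread = 512                  # bytes of thread/process overhead
objects = 1_000_000               # a medium-to-small image
base = per_thread * objects       # bare thread bookkeeping
total = base * 8                  # add stacks + object memory, 8x fudge

print(f"{base / 1e9:.2f} GB")     # 0.51 GB -- the "1/2 gigabyte"
print(f"{total / 1e9:.2f} GB")    # 4.10 GB -- the "4 gigabytes"
```

And this is before counting the per-send context switches, which is the part that scales with message traffic rather than with image size.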
 
 I just can't believe we really can't find clever ways of administering resources to the point where this becomes acceptable.

Each thread needs a stack and a stored set of registers. That's at least two four-kilobyte memory pages (one for the stack and one for the registers) with current hardware, assuming your thread is mapped to a real processor thread of execution at some point. The two pages are there so that the processor can detect if the stack grows beyond its four-kilobyte page. Now you could pack them into one page when it's not being executed, but that would increase your context-switch time to pack and unpack. If you avoid that and simply use the same page for both, then you're risking having your stack overwrite memory used for the process/thread, which would be unsafe multi-threading.

Maybe some hardware designers will figure it out.

However, there is still the worst pitfall of the 1 to 1 mapping of process to object: the overhead that each message send to another object would require a thread context switch! That is inescapably huge.



Simply put, current CPU architectures are simply not designed for that approach. Heck, they are even highly incompatible with dynamic message passing, since they favor static code in terms of optimizations.

 Yes, that happens with machines based on mathematical models like the Boolean model. It injects an impedance mismatch between the conceptual modeling and the virtual modeling.


Again, one solution does not fit all problems - if it did programming would be easier.
 
But programming should be easier.
Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore the hard problems either.


Smalltalk made it easier in a lot of aspects.

Sure, I concur. That's why I am working here in this group, spending time (time is money) on these emails.

Listen.. I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling itself with a weaker one.

I do get that about you.


Peter, please try to forget about how systems are made and think about how you want to make them.

I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.

Currently we have 2-core and 4-core processors as the mainstream, with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia that have 128 processing units that can be programmed in a variant of C for some general-purpose program tasks using a SIMD (single instruction multiple data) format - very useful for those number-crunching applications like graphics, cryptology and numeric analysis, to name just a few. We also have the general-purpose networked Tile-64 coming - lots of general-purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a prototype with 80 cores that is similar. Intel also has its awesomely impressive Itanium processor with instruction-level parallelism as well as multiple cores - just wait till that's a 128-core beastie. Plus there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!

The bigger problem is that in order to build real systems I need to think about how they are constructed.

So yes, I want easy parallel computing, but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy or possible to simplify as much as we might want them to be. That is the nature of computers.

Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.

All the best,

Peter

 Please take your time to think about what I've stated about administering resources: that it is possible to manage a load of millions of instances with a swarm of a few at a time. And don't be sorry about anything. I love criticism. Our culture needs tons of criticism to be stronger. It's the only way we can uninstall deprecated or obsolete ideas. You are helping here. If I'm really dreaming and this doesn't work, I want that
dream to be killed now so I can spend my time on something better. That helps.

Well, having loads of millions of instances managed by a swarm of them at once is what I was assuming. In fact Linux does this (well, for thousands, not millions anyway). It turns out that Intel's X86/IA32 architecture can only handle 4096 threads in hardware. What Linux did was virtualize them so that only one hardware thread was used for the active thread (per core, I would assume). This allowed Linux to avoid the glass ceiling of 4096 threads. However, there are limits due to the overhead of context-switching time and the overhead of space that each thread requires - even with the minimal stack that the model you are proposing might have. It's just too onerous for practical use.

Unless you are doing something radically different that I don't understand that is.

 By now this model is just getting stronger. Please try to take it down !!!   :)))

I thought I crushed it already!!! ;--)

Certainly until you can provide a means for it to handle the one million data objects across 10,000 processes with edits going to the 10,000 processes plus partial object graph seeding (and any object on demand) to them and end up with one and a half million output objects with the total number of interconnections increased by 70% I'll consider it crushed. ;--)

Forward to the future - to infinity and beyond with real hardware!

Cheers,

Peter










Re: Multy-core CPUs

Andreas.Raab
In reply to this post by pwl
Peter William Lount wrote:
> When a message is sitting in an inbound queue on a process waiting for
> that process to get to it, it is in essence blocked until it gets
> processed. If the process is only processing one inbound message at once,
> it is possible for a deadlock to occur when multiple processes are in a
> similar situation with each other. Both processes are waiting for the
> message that would let them continue, which is sitting blocked on the
> other's queue. That's all. Classic deadlock case.

Deadlock can only happen if one process waits for another. E *never*
waits; there is no wait instruction. Instead, it schedules a
message to be run when the desired computation completes. This leaves
the Vat free to execute other messages in the meantime. In Croquet, this
looks like this:

   "future is Croquet's variant for sending async messages"
   promise := rcvr future doSomething.
   promise whenResolved:[:value|
        "do something with the result of the computation
          this block will be executed once the concurrent computation
         is completed and the response message is being processed
         in this Vat/Island."
   ].

Because there is no wait, "classic deadlock" simply cannot happen. There
is an equivalent situation which is called "data lock" where circular
dependencies will cause a computation not to make progress (because a is
computed in response to the completion of b, b in response to the completion
of c, and c in response to the completion of a). But there are *major*
differences to deadlocks: First, datalock is deterministic, it only
depends on the sequence of messages which can be examined. Second,
because the Vat is *not* blocked, you are free to send further messages
to resolve any one of the dependencies and continue making progress.

In other words, the "control flow problem" of deadlock has been turned
around into a "data flow problem" (promises and completion messages)
with much less dramatic consequences when things go wrong.

Cheers,
   - Andreas
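Andreas's point that datalock is a data-flow problem can be illustrated directly: the completion dependencies are plain data, so a cycle can be found by inspection rather than by post-morteming a blocked thread. A sketch in Python (the dictionary encoding and names are illustrative, not any Croquet API):

```python
# x runs in response to the completion of deps[x]; a circular chain of
# such dependencies is Andreas's "data lock".
deps = {"a": "b", "b": "c", "c": "a"}

def datalocked(start, deps):
    # Walk the completion chain; revisiting a node means a cycle, i.e.
    # a computation that can never make progress -- detectable purely
    # by examining the dependency data, deterministically.
    seen, node = set(), start
    while node in deps:
        if node in seen:
            return True            # circular completion dependency
        seen.add(node)
        node = deps[node]
    return False

print(datalocked("a", deps))        # True: a datalock, found by inspection
print(datalocked("a", {"a": "b"}))  # False: b depends on nothing, so a runs
```

And because the Vat itself is never blocked, a cycle like this can even be broken at runtime by sending further messages to resolve one of the dependencies.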


Re: Multy-core CPUs

Igor Stasenko
In reply to this post by pwl
Too much to read and comment on; I wish I had time for all of this..

If we have different models for solving different concurrency problems
- then it obviously must be at the language side, not the VM side. Enforcing
a particular solution in the VM will render other solutions hard or even
impossible to implement.
And I agree: while Erlang provides a solution for Erlang, it's clearly
not the silver bullet for Smalltalk.

As for the VM: I think we have to deal with the parallelism and concurrency
problems of the VM alone (change memory management and the interpreter to
support a number of native threads running in parallel). The rest
should be considered at the language side.

Then we will stop arguing and implement libraries, each proposing its own
good/bad solutions to concurrency, and be happy :)


--
Best regards,
Igor Stasenko AKA sig.


Re: Multy-core CPUs

Hans-Martin Mosner
In reply to this post by pwl
Peter William Lount wrote:


When a message is sitting in an inbound queue on a process waiting for that process to get to it, it is in essence blocked until it gets processed. If the process is only processing one inbound message at once, it is possible for a deadlock to occur when multiple processes are in a similar situation with each other. Both processes are waiting for the message that would let them continue, which is sitting blocked on the other's queue. That's all. Classic deadlock case.
That's exactly my gut feeling about the claims of being deadlock free.
As far as I understood it, in E, processes can't "wait" for messages while they are processing one message. Each process (or Vat) has a loop which receives and processes one message at a time. There is no way for a process to wait for something else; it may only send messages to other processes.
Wherever you would have a remote message invocation in a traditional distributed system, E forces you to write an explicit continuation: put all your current state in an object and pass a handle to that object as part of a message to a remote object. When that remote object has computed a result, this result is sent to your state object, which performs the next steps.
I think this is completely homomorphic to traditional remote message invocation in its potential for deadlock, just more complicated to program in...

Cheers,
Hans-Martin



RE: Multy-core CPUs

Sebastian Sastre-2
In reply to this post by Nicolas Cellier-3

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On
> behalf of nicolas cellier
> Sent: Thursday, October 25, 2007 17:40
> To: [hidden email]
> Subject: Re: Multy-core CPUs
>
> But Smalltalk methods are sequential procedures by nature, so
> having a process per object maybe would address the mutual
> exclusion problem, but will not introduce parallelism per se.
>
But that's a black hole which I don't want to enter nor be near. I never
wanted to introduce parallelism per se. What I do want is *just* to get a
Smalltalk that can conveniently balance the CPU load across an arbitrary
number of cores.

> Someone has to decide to break execution sequential path into
....
>
> PS: for fun, what happens to all-is-object paradigm if each
> and every object has a MessageQueue object? What is the
> MessageQueue of the MessageQueue of ...
>
>
LOL.. Good question - kind of a Class/Metaclass Möbius thing

Cheers,

Sebastian



RE: Multy-core CPUs

Sebastian Sastre-2
In reply to this post by pwl

Hi Peter,

 

    here I wrote (conceptually) about how an image with One Process Per Instance should start, work, and be written to disk:

 

http://www.nabble.com/One-Process-Per-Instance-%28RE%3A-Multy-core-CPUs%29-p13408771.html

 

    Here I state some of the reasons to prioritize an idea like this, and call your attention to your 1M instances example:

 

http://www.nabble.com/RE%3A-Multy-core-CPUs-p13406112.html

 

    You may consider this option crushed, but I consider it more onerous than what we are used to, yet still valid. I'm convinced that clever VM techniques can make it usable enough for most typical Smalltalk needs. So maybe right now it is not suited to real-time sampling of high-quality sound or real-time ray tracing because of the hardware, but that does not make it an invalid option.

 

    For lots of applications it will still be usable. Remember that cheap or expensive is subjective. This model unloads people from having to pay for the impedance mismatch by loading machines with paying it, so the machines do the hard work for us at that price. It's a trade-off that you show you're not willing to take. But I'm very confident that, while not being the silver bullet, this has a wide space of solutions.

 

    In short: you don't use GemStone/S to sample audio. And by the way, with all the overhead you cited, how do you think an application like this will perform compared to one that uses a relational database for persistence? Maybe compared to GemStone/S itself?

 

    Anyway, I feel we've reached the limit of the theoretical discussion. Tests will be needed to go forward. With them, maybe the VM numbers will surprise us and it won't be that onerous, or we can apply lots of known optimizations to mitigate the initial implementation cost. Sadly I'm unable to invest in this now to find out where it leads.

 

    So I think I'll hibernate this for now. Maybe in 2 or 4 years it will turn out to be a convenient approach to something.

 

    With a still-valid model, but with its "wings clipped" by the lack of resources :)

 

    all the best!

 

Sebastian

 

 

 


From: [hidden email] [mailto:[hidden email]] On behalf of Peter William Lount
Sent: Thursday, October 25, 2007 6:08 PM
To: The general-purpose Squeak developers list
Subject: Re: Multy-core CPUs

Hi,


 ... 

I recall a counter example of the million objects was to split the data objects into 10,000 chunks. However, that's a different problem not the one that I have to deal with.


Tying an object instance to a particular process makes no sense. If you did that you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since processes could only process one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult to diagnose and comprehend concurrency problems.) It's a rats' maze that's best avoided.
 
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!?
You didn't understand the model I'm talking about.

That is likely the case.  
 
So I kindly ask you to read my previous emails, where I have taken on the job of expressing my exploratory thoughts until I reached this model, and the speculation about the consequences of this model's existence.
There are so many emails in this thread please link to the emails you'd like me to reread. Thanks very much.


There isn't such a thing as an object with multiple threads. That does not exist in this model.

Ok. I got that.


There exists one process per instance, no more, no less.

I did get that. Even if you only do that logically you've got serious problems. 
 
If you have read where I talk about how to manage N million objects with limited hardware under this model and you still find problems, please be my guest and tell me here, because I want to know that as soon as possible.
Yes, I know it's possible for systems like Erlang to have 100,000 virtual processes, aka lightweight threads, that can be in one real native operating system process or across many native processes.

Are you saying that you've figured out how to do that with millions of processes?


I think you're thinking about processes and threads the same way you know them today.

I can easily see such a scenario working and also breaking all over the place. 
 
Why? 


Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a VM lightweight process, which we also call threads.

Ok.


So I'm saying that in this model you have only one process per instance, but that process is not a process that can have threads belonging to it.

ok.

That would generate a hell of a lot of complexity.

You lost me there. What complexity?
 
It doesn't matter; that's another model, not the one I'm speculating about (probably one you had imagined before the 1:1 object-process thing was clarified).
Alan Kay's original work suggested that each object had a process. There is the logical view and the idealized view. Then there is the concrete and how to implement things. How one explains things to end users is often with the idealized view. How one implements is more often with something that isn't quite the ideal.


The process I'm saying is tied to an instance is closer to the word process as you know it from the dictionary, plus what you know an instance is, with the process implemented by a VM that can balance it across cores.

I didn't understand. Please restate.  
 
I restated that N times in my previous emails, at too great a length. To give you a clue, it's about the double nature I'm saying the object has: an amalgam between object and process. They are conceptually indissociable. More in those previous emails.
Alright, I'll have to reread the entire thread since no one wants to clearly state their point of view in one email, as I attempt to do out of courtesy. I don't have time to reread the entire thread today though. (That's why it is a courtesy to repost - it saves your readers time.)



I'm not falling into the pitfall of trying to parallelize code automagically. This is far from it. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination in guaranteed order. Otherwise there will be chaos. And we want an ordered chaos like the one we have now in a reified Squeak image.

Yes, squeak is ordered chaos. ;--).
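The one-mailbox-per-instance idea being discussed can be sketched in a few lines. This is a hypothetical illustration in Python (not Squeak VM code): each object owns a FIFO mailbox and one lightweight process that drains it, so messages to any given instance are handled one at a time, in send order.

```python
import queue
import threading

class ActorObject:
    """Sketch of one-process-per-instance: every object owns a mailbox
    and a single worker that processes one message at a time."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.log = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, selector, *args):
        self.mailbox.put((selector, args))   # asynchronous message send

    def _run(self):
        while True:
            selector, args = self.mailbox.get()   # FIFO: per-object order kept
            if selector == 'stop':
                return
            getattr(self, selector)(*args)        # dispatch, Smalltalk-send style

    def note(self, n):
        self.log.append(n)

a = ActorObject()
for i in range(5):
    a.send('note', i)
a.send('stop')
a._worker.join()
print(a.log)   # messages arrive in send order: [0, 1, 2, 3, 4]
```

The class and selector names here are invented for illustration; the point is only that a per-instance queue plus a single consumer gives the "guaranteed order" property without any automagic parallelization of the method bodies themselves.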



 
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?


If a number of messages are waiting in the input queue of a process that can only process one message at a time, since it's not multi-threaded, then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED, since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages that the other needs before it can proceed, but those messages are BLOCKED waiting for processing in the queues.

This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving regardless of the concurrency model in place. 
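The scenario can be simulated directly. In this hypothetical Python sketch, each "process" is a single thread with its own inbox that handles one message at a time; both send a request to the other and then wait for a reply, so each one's request sits unserved in the other's mailbox. A short timeout stands in for what would otherwise be a permanent hang.

```python
import queue
import threading

def worker(my_inbox, other_inbox, results, name):
    # A single-threaded "process": handles one message at a time.
    my_inbox.get()                   # handle the first message: 'start'
    other_inbox.put('need-reply')    # ask the peer for data it must supply
    try:
        while True:
            msg = my_inbox.get(timeout=0.5)   # timeout stands in for a real hang
            if msg == 'reply':
                results[name] = 'got reply'
                return
            # The peer's own 'need-reply' lands here, but this process is
            # committed to waiting for 'reply' and cannot serve it.
    except queue.Empty:
        results[name] = 'deadlocked'

a_in, b_in, results = queue.Queue(), queue.Queue(), {}
ta = threading.Thread(target=worker, args=(a_in, b_in, results, 'A'))
tb = threading.Thread(target=worker, args=(b_in, a_in, results, 'B'))
a_in.put('start')
b_in.put('start')
ta.start(); tb.start()
ta.join(); tb.join()
print(results)   # {'A': 'deadlocked', 'B': 'deadlocked'}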

 But that can happen right now if you make bad use of processes in a current Smalltalk. I don't want to solve deadlocks for anybody using parallelism badly. I just want a Smalltalk that works like today's but balances CPU load across cores and scales to an arbitrary number of them. This whole thread is about that.

Yes it can happen now. That's why it's important to actually learn concurrency control techniques. Books like the Little Book of Semaphores can help with that learning process.
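As a taste of the techniques that book teaches, here is the classic "rendezvous" pattern done with two semaphores (a hedged, self-contained Python sketch, not taken from the book's code): each thread signals its own arrival and waits for the other, so neither one's second step can run before the other's first step.

```python
import threading

# Rendezvous with two semaphores: guarantees a1 happens before b2,
# and b1 happens before a2, regardless of scheduling.
a_arrived = threading.Semaphore(0)
b_arrived = threading.Semaphore(0)
trace = []
trace_lock = threading.Lock()

def log(event):
    with trace_lock:
        trace.append(event)

def thread_a():
    log('a1')
    a_arrived.release()   # signal: A has reached the rendezvous
    b_arrived.acquire()   # wait until B has too
    log('a2')

def thread_b():
    log('b1')
    b_arrived.release()
    a_arrived.acquire()
    log('b2')

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start(); tb.start()
ta.join(); tb.join()
print(trace)   # ordering constraint holds in every interleaving
```

The same two-semaphore idiom, scaled up with a counter and mutex, is how the book builds barriers for N threads.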

The point is that a number of people in this thread are proposing solutions that seem to claim that these problems magically go away in some utopian manner with process-based concurrency. All I'm pointing out is that there isn't a silver bullet or concurrency utopia and now I'm getting flack for pointing out that non-ignorable reality. So be it. Those that push ahead are often the ones with many arrows in their back.



Tying an object's life time to the lifetime of a process doesn't make sense since there could be references to the object all over the place. If the process quits the object should still be alive IF there are still references to it.
You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
 
Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation that can be reified again on demand. Please refer to my previous post with subject "One Process Per Instance.." where I talk more about exactly this.

If all there is is one object per process and one process per object - a 1 to 1 mapping - then yes, gc would work that way, but the 1 to 1 mapping isn't likely to ever happen given current and future hardware prospects. 

 But Peter, don't lower your guard on that so easily! We know techniques for administering resources, like navigating 10 thousand instances at a time in a 10-gig image of 10 million objects! Don't shoot hope before it's born! I discuss some details I've imagined about this in my "One Process Per Instance" post.

Well then you must have a radically different meaning of 1 to 1 object-to-process mapping than I have, or a radically different implementation than I've understood from your writings. If you can make it work, that is all the proof you need, isn't it!?



Even if you considered an object as having it's own "logical" process you'd get into the queuing problems hinted at above.
 
Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.

See the example above.



Besides, objects in Smalltalk are really fine grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources.
And what do you think came out of the mouths of critics of initiatives like the one the Xerox PARC team had in the 1970s, making a Smalltalk at the prices of CPUs and RAM at that time? That VMs are a smart, efficient use of resources?

That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
 
We all have to use cheap hardware. Please (re)think about what I said about administering hardware resources over this model.





 
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012?"

Well, just get out your calculator. There is an overhead to a thread or process in bytes. Say 512 bytes per thread plus its stack. There is the number of objects. Say 1 million for a medium to small image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh, wait, we forgot that the stack is kind of weird, since each message send that isn't to self must be an interprocess or interthread message send, so you've got some weirdness going on, let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling. 
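The back-of-envelope arithmetic checks out; here it is spelled out (the 512-byte overhead and the 8x multiplier are the thread's own assumed figures, not measured values):

```python
# Peter's thread-per-object overhead estimate, made explicit.
objects = 1_000_000            # a medium-to-small image
per_thread_bytes = 512         # assumed control-block overhead per thread
base = objects * per_thread_bytes
print(base / 2**30, 'GiB')     # roughly "1/2 gigabyte"

stack_multiplier = 8           # assumed factor for stacks + the objects themselves
total = base * stack_multiplier
print(total / 1e9, 'GB')       # roughly "4 gigabytes"
```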
 
I just can't believe we really can't find clever ways of administering resources to the point where this becomes acceptable.

Each thread needs a stack and a stored set of registers. That's at least two four-kilobyte memory pages (one for the stack and one for the registers) with current hardware, assuming your thread is mapped to a real processor thread of execution at some point. The two pages are there so that the processor can detect if the stack grows beyond its four-kilobyte page. Now you could pack them into one page when the thread isn't being executed, but that would increase your context switch time to pack and unpack. If you avoid that and simply use the same page for both, then you're risking having your stack overwrite memory used for the process/thread, which would be unsafe multi-threading.

Maybe some hardware designers will figure it out.

However, there is still the worst pitfall of the 1 to 1 mapping of process to object: the overhead of each message send to another object would require a thread context switch! That is inescapably huge.



Simply put current cpu architectures are simply not designed for that approach. Heck they are even highly incompatible with dynamic message passing since they favor static code in terms of optimizations. 

 Yes, that happens with machines based on mathematical models like the boolean model. It injects an impedance mismatch between the conceptual modeling and the virtual modeling.


Again, one solution does not fit all problems - if it did programming would be easier.
 
But programming should be easier.
Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore the hard problems either.


Smalltalk made it easier in a lot of aspects.

Sure, I concur. That's why I am working here in this group, spending time (which is money) on these emails.

Listen.. I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling himself with a weaker one.

I do get that about you.


Peter please try to forget about how systems are made and think in how you want to make them.

I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.

Currently we have 2-core and 4-core processors as the mainstream with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia that have 128 processing units that can be programmed in a variant of C for some general purpose program tasks using a SIMD (single instruction multiple data) format - very useful for those number crunching applications like graphics, cryptology and numeric analysis to name just a few. We also have the general purpose networked Tile-64 coming - lots of general purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a prototype with 80 cores that is similar. Intel also has its awesomely impressive Itanium processor with instruction level parallelism as well as multiple cores - just wait till that's a 128 core beastie. Plus there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!

The bigger problem is that in order to build real systems I need to think about how they are constructed.

So yes, I want easy parallel computing but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy or possible to simplify as much as we try to want them to be. That is the nature of computers.

Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.

All the best,

Peter

 Please take your time to think about what I've said about administering resources: that it is possible to manage a load of millions of instances with a swarm of a few at a time. And don't be sorry about anything. I love criticism. Our culture needs tons of criticism to be stronger. It's the only way we can uninstall deprecated or obsolete ideas. You are helping here. If I'm really dreaming and this doesn't work, I want that dream to be killed now so I can spend my time on something better. That helps.

Well having loads of millions of instances managed by a swarm of them at once is what I was assuming. In fact Linux does this (well for thousands not millions anyway). It turns out that Intel's X86/IA32 architecture can only handle 4096 threads in hardware. What Linux did was virtualize them so that only one hardware thread was used for the active thread (per core I would assume). This allowed Linux to avoid the glass ceiling of 4096 threads. However, there are limits due to the overhead of context switching time and the overhead of space that each thread - even with a minimal stack as would be the case with the model you are proposing might have. It's just too onerous for practical use.

Unless you are doing something radically different that I don't understand that is.

 By now this model is just getting stronger. Please try to take it down!!!   :)))

I thought I crushed it already!!! ;--)

Certainly until you can provide a means for it to handle the one million data objects across 10,000 processes with edits going to the 10,000 processes plus partial object graph seeding (and any object on demand) to them and end up with one and a half million output objects with the total number of interconnections increased by 70% I'll consider it crushed. ;--)

Forward to the future - to infinity and beyond with real hardware!

Cheers,

Peter










Re: Multy-core CPUs

David T. Lewis
In reply to this post by pwl
On Wed, Oct 24, 2007 at 11:12:13PM -0700, Peter William Lount wrote:
> An excellent book for learning the ins and outs of concurrency control -
> and most importantly the common mistakes - is the free PDF book, "The
> Little Book of Semaphores",  by Allen B. Downey and his students:
> http://www.greenteapress.com/semaphores/downey05semaphores.pdf.

Peter,
Thanks for this reference.
Dave


pwl

Re: Multy-core CPUs

pwl
In reply to this post by David T. Lewis
David T. Lewis wrote:
On Wed, Oct 24, 2007 at 11:12:13PM -0700, Peter William Lount wrote:
  
An excellent book for learning the ins and outs of concurrency control - 
and most importantly the common mistakes - is the free PDF book, "The 
Little Book of Semaphores",  by Allen B. Downey and his students: 
http://www.greenteapress.com/semaphores/downey05semaphores.pdf.
    

Peter,
Thanks for this reference.
Dave



  
Hi,

You're welcome. It's an invaluable book for learning about and reviewing the issues with semaphores.

It sounds like the other links people submitted are also quite interesting. I've seen a few before, but they obviously need more in-depth study to understand all the points of view - some of which seem to be in the formation stage rather than fully developed, which wasn't apparent to this reader at the start of the thread.

Peter



Re: Multi-core CPUs

Marcel Weiher-3
In reply to this post by pwl

On Oct 25, 2007, at 12:28 PM, Peter William Lount wrote:

> The Tile-64 processor is expected to grow to about 4096 processors  
> by pushing the limits of technology beyond what they are today. To  
> reach the levels you are talking about for a current Smalltalk image  
> with millions of objects each having their own thread (or process)  
> isn't going to happen anytime soon.
>
> I work with real hardware.


A couple of numbers:

- Montecito, the new dual-core Itanic has 1.72 billion transistors.
- The ARM6 macrocell has around 35000 transistors
- divide the two, and you will find that you could get more ARM6 cores  
for the Montecito transistor budget than the ARM6 has transistors

So we could have a 35K object system with every object having its own  
CPU core and all message-passing being asynchronous.  This is likely  
to be highly inefficient, with most of the CPUs waiting/idle most of  
the time, say 99%.  With 1% efficiency, and say, a 200MHz clock, the  
effective throughput would still be 200M * 35000 / 100 =  70 billion  
instructions per second.  That's a lot of instructions.  And wait what  
happens if we have some really parallel algorithm that cranks  
efficiency up to 10%!
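The thought-experiment numbers are easy to check in code (all figures are the ones quoted above, taken at face value):

```python
# Marcel's transistor-budget thought experiment, checked.
montecito_transistors = 1_720_000_000   # dual-core Itanium "Montecito"
arm6_transistors = 35_000               # ARM6 macrocell, approx.
cores_in_budget = montecito_transistors // arm6_transistors
print(cores_in_budget)                  # more cores than the ARM6 has transistors

clock_hz = 200_000_000                  # assumed 200 MHz per core
efficiency = 0.01                       # 99% of the 35K cores idle
throughput = clock_hz * 35_000 * efficiency
print(throughput / 1e9, 'billion instructions per second')
```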

I am not saying any of these numbers are valid or that this is a  
realistic system, but I do find the numbers of that little thought  
experiment... interesting.  And of course, while Moore's law appears to  
have stopped for cycle times, it does seem to still be going for  
transistors per chip.

Marcel




Re: Multy-core CPUs

Jason Johnson-5
In reply to this post by pwl
On 10/25/07, Peter William Lount <[hidden email]> wrote:
>
>  When a message is sitting in an inbound queue on a process, waiting for that
> process to get to it, it is in essence blocked until it gets processed. If
> the process is only processing one inbound message at once, it is possible
> for a deadlock to occur when multiple processes are in a similar situation
> with each other: both processes waiting for the message that would let them
> continue, which is sitting blocked on the other's queue. That's all. Classic
> deadlock case.

But imo this is a design issue.  From my experience so far in Erlang
programming, processes have a lot of similarities with objects in how
you define them.  If you find yourself in a situation as you describe
here, then you probably have two or more processes hiding in a single
process.  So the solution, as in Smalltalk when you have a class that
should be 2 or more, is to refactor to the correct number of
processes.
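One way to see Jason's point in code: the request/reply cycle Peter describes only locks up when a process blocks mid-handler waiting for a reply. Refactored in the Erlang style, where a reply is just another message handled by the same receive loop, the same two peers complete cleanly. This is a hypothetical Python sketch, not Erlang itself:

```python
import queue
import threading

def actor(my_inbox, other_inbox, results, name):
    # Erlang-style loop: never block mid-handler waiting for a reply;
    # requests and replies are both just messages in the mailbox.
    other_inbox.put(('request', name))
    served = False
    done = False
    while not (served and done):
        kind, sender = my_inbox.get()
        if kind == 'request':
            other_inbox.put(('reply', name))   # serve the peer's request
            served = True
        elif kind == 'reply':
            results[name] = 'got reply from ' + sender
            done = True

a_in, b_in, results = queue.Queue(), queue.Queue(), {}
ta = threading.Thread(target=actor, args=(a_in, b_in, results, 'A'))
tb = threading.Thread(target=actor, args=(b_in, a_in, results, 'B'))
ta.start(); tb.start()
ta.join(); tb.join()
print(results)   # both peers complete: no cycle of blocked waits
```

The design choice that breaks the deadlock is that each actor keeps receiving (and serving) messages while its own request is outstanding, instead of committing its single thread to a synchronous wait.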
