[squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

pwl
Hi,

Intel among others such as Tilera and NVidia are telling us - yes us
smalltalkers - to prepare for tens, hundreds and thousands of cores on a
single chip. It's up to us to bring this power to our end users - and
ourselves too!

Intel Says to Prepare For "Thousands of Cores"
http://hardware.slashdot.org/hardware/08/07/02/1833221.shtml
http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-5

What would be nice for a new version of squeak/croquet:

HydraVM rewritten in Igor's new improved lambda+slang+exupery bypassing
C altogether yet interfacing and generating C or Java or Javascript or
Flash or ... as well for those deployment scenarios where that makes sense.

To take advantage of multi-core what is needed is real native
multi-threading per virtual machine + image not simply one native thread
per image. Both are good for various application scenarios. Remember
that a real multi-native threaded image can always just run one native
thread if you want it to, while a single native thread virtual machine +
image will not run multiple native threads in the same image space. Sure
multiple images in one program memory space is nice for some scenarios.
I like that too and desire the option to deploy that way with multiple
native threads per image space in one program memory space.

All the best,

Peter



Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Philippe Marschall
2008/7/5, Peter William Lount <[hidden email]>:
> Hi,
>
>  Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!

The same Intel that has told us that the future will be IA64 / EPIC?
The same Intel that has promised us 20 Gigahurtz today? The same Intel
that tried to sell us RAMBUST? The same Intel that builds SSE into
its processors to make the Internet faster?

Cheers
Philippe


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Igor Stasenko
In reply to this post by pwl
2008/7/5 Peter William Lount <[hidden email]>:

> Hi,
>
> Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!
>
> Intel Says to Prepare For "Thousands of Cores"
> http://hardware.slashdot.org/hardware/08/07/02/1833221.shtml
> http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-5
>
> What would be nice for a new version of squeak/croquet:
>
> HydraVM rewritten in Igor's new improved lambda+slang+exupery bypassing C
> altogether yet interfacing and generating C or Java or Javascript or Flash
> or ... as well for those deployment scenarios where that makes sense.
>

I think you misunderstand some points. Hydra is a separate project; it
uses VMMaker and generates C code for the VM.
Eliot is working on something else (Cog), which includes a JIT, with
the plan to then merge it with Hydra.

But of course, it is possible to rewrite everything from scratch using
native methods, which I described earlier. Such a system could be
really different in many ways, though. No question, it would be good to
build it in a way that lets it run most existing Squeak code without
heavy rewriting.

> To take advantage of multi-core what is needed is real native
> multi-threading per virtual machine + image not simply one native thread per
> image. Both are good for various application scenarios. Remember that a real
> multi-native threaded image can always just run one native thread if you
> want it too while a single native thread virtual machine + image will not
> run multiple native threads in the same image space. Sure multiple images in
> one program memory space is nice for some scenarios. I like that too and
> desire the option to deploy that way with multiple native threads per image
> space in one program memory space.
>

Hydra is an evolutionary step, which makes it possible to use multiple
cores at a relatively small development cost, because it is based on
the current Squeak VM.

> All the best,
>
> Peter
>
>
>



--
Best regards,
Igor Stasenko AKA sig.

|

Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

pwl
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> 2008/7/5, Peter William Lount [hidden email]:
>> Hi,
>>
>> Intel among others such as Tilera and NVidia are telling us - yes us
>> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
>> single chip. It's up to us to bring this power to our end users - and
>> ourselves too!
>
> The same Intel that has told us that the future will be IA64 / EPIC?
> The same Intel that has promised us 20 Gigahurtz today? The same Intel
> that tried to sell us RAMBUST? The same Intel that builds SSE into
> it's processors to make the Internet faster?
>
> Cheers
> Philippe

Hi,

Yeah, the Itanium. An awesome chip architecture with an incredible instruction set. Gotta love it and its doomed marketing.

Yes, that Intel who showed an 80 core research chip last year.
http://en.wikipedia.org/wiki/Teraflops_Research_Chip
http://www.theinquirer.net/en/inquirer/news/2007/06/21/intel-shows-off-2-tflops-processor
http://techresearch.intel.com/articles/Tera-Scale/1449.htm

Yes, that Intel that is bringing out this little wonder.
http://arstechnica.com/news.ars/post/20080205-small-wonder-inside-intels-silverthorne-ultramobile-cpu.html

Intel isn't the only vendor up to the N-Core game. In fact they are getting beaten down hard by all the other contenders for the throne. Hard.

Yes, Multi-core from the companies listed here:
http://en.wikipedia.org/wiki/Multi-core_(computing)

Yes, that Tilera. They have 20-core and 64-core (Tile-64) chips now, with 128 cores in the works. They have indicated that they plan up to 4096 with their technology.
http://www.Tilera.com

Yes, that NVidia who is already delivering tons of boards with massive numbers of GPGPUs. I have a couple of these boards already.
http://www.NVidia.com
http://en.wikipedia.org/wiki/GeForce_200_Series
http://en.wikipedia.org/wiki/Nvidia_Tesla
http://www.engadget.com/2008/07/03/nvidia-said-to-be-dropping-geforce-gtx-280-price-in-response-to/

Yes, AMD and Intel who announced 8 core mainstream chips for next year.

Yes, AMD (ATI) who announced this awesome speed beastie: 4870 X2 (RV770XT) cards. 800+ stream processing units!
http://www.engadget.com/2008/07/03/amd-radeon-hd-4870-x2-images-leaked-rumored-for-august-release/
http://ati.amd.com/products/Radeonhd4800/specs.html

Yes, IBM who makes the 9 core cell processor chip.
http://en.wikipedia.org/wiki/Cell_(microprocessor)

Yes, Apple who just bought P.A. Semi.
http://gizmodo.com/382929/apple-buys-itself-a-little-chip-company-known-for-super-efficient-processors


These are not your, or your father's, Transputer; most of these are real N-core processors available now.

Looking at the CPU power available to us and comparing it with the software that is available, one can't help but be either sad or highly motivated to make use of the powerful chips to make much better software. Heck, we're still programming with words! Whatever happened to visual programming? We still have macosx and windows and ick linux and unix as the best of the breed systems? Ick. Sure I use them all but come on... we can and must do better...

Cheers,

Peter



Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Igor Stasenko
2008/7/5 Peter William Lount <[hidden email]>:

> Philippe Marschall wrote:
>
> 2008/7/5, Peter William Lount <[hidden email]>:
>
>
> Hi,
>
>  Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!
>
>
> The same Intel that has told us that the future will be IA64 / EPIC?
> The same Intel that has promised us 20 Gigahurtz today? The same Intel
> that tried to sell us RAMBUST? The same Intel that builds SSE into
> it's processors to make the Internet faster?
>
> Cheers
> Philippe
>
>
>
>
> Hi,
>
> Yeah, the Itanium. An awesome chip architecture with an incredible
> instruction set. Gotta love it and it's doomed marketing.
>
> Yes, that Intel who showed an 80 core research chip last year.
> http://en.wikipedia.org/wiki/Teraflops_Research_Chip
> http://www.theinquirer.net/en/inquirer/news/2007/06/21/intel-shows-off-2-tflops-processor
> http://techresearch.intel.com/articles/Tera-Scale/1449.htm
>
> Yes, that Intel that is bringing out this little wonder.
> http://arstechnica.com/news.ars/post/20080205-small-wonder-inside-intels-silverthorne-ultramobile-cpu.html
>
> Intel isn't the only vendor up to the N-Core game. In fact they are getting
> beaten down hard by all the other contenders for the thrown. Hard.
>
> Yes, Multi-core from the companies listed here:
> http://en.wikipedia.org/wiki/Multi-core_(computing)
>
> Yes, that Tilera. They have 20 and Tile-64 core chips now, with 128 cores in
> the works. They have indicated that they plan up to 4096 with their
> technology.
> http://www.Tilera.com
>
> Yes, that NVidia who is already delivering tons of boards with massive
> numbers of GPGPUs. I have a couple of these boards already.
> http://www.NVidia.com
> http://en.wikipedia.org/wiki/GeForce_200_Series
> http://en.wikipedia.org/wiki/Nvidia_Tesla
> http://www.engadget.com/2008/07/03/nvidia-said-to-be-dropping-geforce-gtx-280-price-in-response-to/
>
> Yes, AMD and Intel who announced 8 core mainstream chips for next year.
>
> Yes, AMD (ATI) who announced this awesome speed beastie: 4870 X2 (RV770XT)
> cards. 800+ stream processing units!
> http://www.engadget.com/2008/07/03/amd-radeon-hd-4870-x2-images-leaked-rumored-for-august-release/
> http://ati.amd.com/products/Radeonhd4800/specs.html
>
> Yes, IBM who makes the 9 core cell processor chip.
> http://en.wikipedia.org/wiki/Cell_(microprocessor)
>
> Yes, Apple who just bought P.A. Semi.
> http://gizmodo.com/382929/apple-buys-itself-a-little-chip-company-known-for-super-efficient-processors
>
>
> These are not your or your fathers Transputer, most of these are real N-core
> processors available now.
>
> Looking at the cpu power available to us and comparing it with the software
> that is available one can't but be either sad or highly motivated to make
> use of the powerful chips to make much better software. Heck we're still
> programming with words! What ever happened to visual programming? We still
> have macosx and windows and ick linux and unix as the best of the breed
> systems? Ick. Sure I use them all but come on... we can and must do
> better...
>
> Cheers,
>
> Peter
>

I suppose Philippe meant that they don't always deliver what they promise.
But it looks like the paradigm shift from single core to multi-core
is inevitable. Building a chip with 20 processing units running at 1GHz
is now more effective than building a single-core chip that can run at
20GHz.
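
A rough way to see the arithmetic behind this claim is the textbook approximation for dynamic power, P ~ C * V^2 * f. The sketch below is only an illustration: it assumes (this is not stated in the thread) that supply voltage has to rise roughly in proportion to clock frequency, and the function name is made up for the example.

```python
# Toy comparison of dynamic CPU power using P ~ C * V^2 * f.
# ASSUMPTION (not from the thread): supply voltage scales linearly with
# clock frequency. Real chips deviate from this, so treat the numbers as
# an order-of-magnitude illustration only.

def relative_dynamic_power(freq_ratio: float, cores: int = 1) -> float:
    """Power relative to one baseline core running at freq_ratio == 1."""
    voltage_ratio = freq_ratio  # assumed linear voltage/frequency relation
    return cores * voltage_ratio ** 2 * freq_ratio


one_fast_core = relative_dynamic_power(freq_ratio=20.0, cores=1)
many_slow_cores = relative_dynamic_power(freq_ratio=1.0, cores=20)

print(one_fast_core)    # 8000.0
print(many_slow_cores)  # 20.0
```

Under that assumption, both designs offer the same nominal 20 "GHz-units" of work, but the single fast core would burn about 400 times more power, which is in line with the point made above.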

SMP architectures have been available on the desktop/server market for
more than 10 years. And with new manufacturing processes it became
possible to fit 2 or more cores on a single chip (using the same area).
CPUs are already very complex compared to those of the 80's: multi-level
caches, branch prediction, parallel instructions, etc. Adding even more
cache and tricky optimizations can't fill the whole chip area. Of course
you can put a 1Gb cache on a chip, but that would be a waste of chip
space, and the chip would have a very poor ratio of processing power to
power consumption.

--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Jecel Assumpcao Jr
In reply to this post by pwl
I am sorry to keep repeating myself, but we will be ready before they
are. See this old picture of a Smalltalk machine with 1024 processors:

http://cva.stanford.edu/projects/j-machine/

Intel and Microsoft may have forgotten these lessons, but some of us
haven't. And having them cough up $20M for a joint research project in
multi-core software doesn't impress me very much. I bet just the coffee
budget at either of these companies is a lot larger than that.

-- Jecel


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Igor Stasenko
In reply to this post by pwl
2008/7/5 Jecel Assumpcao Jr <[hidden email]>:

> I am sorry to keep repeating myself, but we will be ready before they
> are. See this old picture of a Smalltalk machine with 1024 processors:
>
> http://cva.stanford.edu/projects/j-machine/
>
> Intel and Microsoft may have forgotten these lessons, but some of us
> haven't. And having them cough up $20M for a joint research project in
> multi-core software doesn't impress me very much. I bet just the coffee
> budget at either of these companies is a lot larger than that.
>

I can't remember Microsoft doing any research that changed the
shape of the computing world.
The one thing they do best is buying out a successful
product/company, putting a Micro$oft flag on top of it, and making money
from selling it :)

> -- Jecel
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

tblanchard
In reply to this post by Igor Stasenko
This was pretty much the message from Apple at WWDC recently as well.
Their next OS version has several technologies based around this idea.
The shift is upon us.

On Jul 5, 2008, at 11:35 AM, Igor Stasenko wrote:

> I suppose Philippe meant that they don't always deliver what they  
> telling about.
> But it looks like that paradigm shift from single core to multi core
> is inevitable. Building chip running 1Ghz with 20 processing units is
> now more effective than building single-core chip which can run at
> 20GHz frequency.


|

Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

pwl
Todd Blanchard wrote:
> This was pretty much the messages from Apple at WWDC recently as well.
> Their next os version has several technologies based around this idea.
> The shift is upon us.
>

Yeah, Apple is talking about two different approaches - program
parallelism with multi-cores and data parallelism with GPGPUs from the
likes of NVidia and AMD-ATI or possibly P.A.Semi (just a wild guess on
P.A.Semi as their chips could be made with many many cores soon).

And NO, Smalltalk hasn't caught up yet. Just half a year ago in this very
forum people were arguing against generic fully multi-threaded
Smalltalk virtual machines. Cincom is against it. Instantiations has
been quiet and likely won't do much.

Only a few brave intrepid explorers get it and now we have experiments
like HydraVM for croquet/squeak.

Most Smalltalks and Smalltalkers are deeply stuck in the past of one
native thread. Most in fact are not good at multi-threading with
Smalltalk non-native threads!!! It's difficult to learn and get right,
which is one motivator behind those wanting to take the easy road - one
native thread per image. But that's the wrong route (in my view, and
obviously in others' view as well) because it isn't general purpose
enough. Real multi-threading involves hard work. No way around it.

Igor, how will we gain access to writing for chips like NVidia's when they
keep it all secret? Use C with CUDA? Or hijack OpenCL (to be part of
LLVM and the clang frontend, if I'm not mistaken) when Apple gets it working?

Cheers,

peter



Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Igor Stasenko
2008/7/6 Peter William Lount <[hidden email]>:

> Todd Blanchard wrote:
>>
>> This was pretty much the messages from Apple at WWDC recently as well.
>> Their next os version has several technologies based around this idea.
>> The shift is upon us.
>>
>
> Yeah, Apple is talking about two different approaches - program parallelism
> with multi-cores and data parallelism with GPGPUs from the likes of NVidia
> and AMD-ATI or possibly P.A.Semi (just a wild guess on P.A.Semi as their
> chips could be made with many many cores soon).
>
> And NO Smalltalk hasn't caught up yet. Just half a year ago in this very
> forum thread people were arguing against generic fully multi-threading of
> Smalltalk virtual machines. Cincom is against it. Instantiantions has been
> quite and likely won't do much.
>
> Only a few brave intrepid explorers get it and now we have experiments like
> HydraVM for croquet/squeak.
>
> Most smalltalks and smalltalkers are deeply stuck in the past of one native
> thread. Most in fact are not good at multi-threading with smalltalk
> non-native threads!!! It's difficult to learn and get right which is one
> motivator behind those wanting to take the easy road - one native thread per
> image, but that's the wrong route (in my view and obviously in others view
> as well) because it isn't general purpose enough. It involves hard work. No
> way around it.
>
> Igor, how will we gain access to writing for chips like NVidia when they
> keep it all secret? Use C with CUDA? Or hyjack OpenCL (to be part of LLVM
> and clang frontend if I'm not mistaken) when Apple gets it working?
>

You mean writing code for GPUs? Well, the same way the rest of the world
does: provide/generate source code and let the GPU vendor's API compile it.
If they don't open up their chip architecture/instructions, how else can
it be done?

> Cheers,
>
> peter
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

K. K. Subramaniam
In reply to this post by pwl
On Sunday 06 Jul 2008 1:30:28 am Jecel Assumpcao Jr wrote:
> Intel and Microsoft may have forgotten these lessons, but some of us
> haven't. And having them cough up $20M for a joint research project in
> multi-core software doesn't impress me very much. I bet just the coffee
> budget at either of these companies is a lot larger than that.
The engineering issues are different this time. Intel is talking about their
capability to bring 1K-core boards to the retail store level. This is very
different from a research project in a handful of locations accessible to a
few researchers. J-Machine builders did not have to deal with battery
drain :-).

You do have a point about Intel/MS ignoring past work in this area. Their
research director could have presented the capability as a revival of prior
research instead of making it appear like a new challenge. CST and Actalk are
over two decades old and cited in many ACM papers.  Their collective amnesia
is inexcusable.

If processing elements could be wrapped in an object, then 1024 looks like a
small number....

Subbu


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Rob Rothwell
On Sun, Jul 6, 2008 at 10:22 AM, K. K. Subramaniam <[hidden email]> wrote:
> You do have a point about Intel/MS ignoring past work in this area. Their
> research director could have presented the capability as a revival of prior
> research instead of making it appear like a new challenge. CST and Actalk are
> over two decades old and cited in many ACM papers.  Their collective amnesia
> is inexcusable.

Do you think it is intentional, or merely a sad commentary on the general state of our current scientific infrastructure?  I do process improvement (Six Sigma) work at a hospital, which *assumes* an underlying foundation of, say, deductive and inductive reasoning, logic, math, science, even philosophy to some extent.  I think industry isn't so different from medicine, where vast specialization has pushed the generalist to the side so that not too many people see "the big picture" anymore.

I'm not saying it's an excuse, just maybe a reason--and yet another reason why the world NEEDS people like you having this conversation.  I think of the Squeak community kind of like Science Fiction writers--invariably what they dream up comes to pass...
 
> If processing elements could be wrapped in an object, then 1024 looks like a
> small number....

I know I'm not thinking at the level you guys are right now, but this is interesting...can you explain what you mean by "wrapping processing elements in an object" so I could get a picture of what that might look like?

Rob



Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Joshua Gargus-2
In reply to this post by pwl

On Jul 5, 2008, at 6:40 PM, Peter William Lount wrote:

> Todd Blanchard wrote:
>> This was pretty much the messages from Apple at WWDC recently as  
>> well.
>> Their next os version has several technologies based around this  
>> idea.
>> The shift is upon us.
>>
>
> Yeah, Apple is talking about two different approaches - program  
> parallelism with multi-cores and data parallelism with GPGPUs from  
> the likes of NVidia and AMD-ATI or possibly P.A.Semi (just a wild  
> guess on P.A.Semi as their chips could be made with many many cores  
> soon).
>
> And NO Smalltalk hasn't caught up yet. Just half a year ago in this  
> very forum thread people were arguing against generic fully multi-
> threading of Smalltalk virtual machines. Cincom is against it.  
> Instantiantions has been quite and likely won't do much.

And in my opinion, the people who were arguing against it won the  
argument.  Concerns were raised about the cache-thrashing that could  
result, and relevant empirical research was linked to that seemed to  
validate these concerns.

> Only a few brave intrepid explorers get it and now we have  
> experiments like HydraVM for croquet/squeak.

Perhaps I misunderstood what you meant in the previous part of the  
paragraph.  Hydra is explicitly one-thread-per-image for 1) simplicity  
of implementation, 2) simplicity of use and 3) because many-threads-
per-image hasn't been shown to be even theoretically desirable.
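
For readers unfamiliar with the one-thread-per-image model, here is a minimal sketch of the idea in Python. It is not Hydra's actual design or API: the `Image` class, its `inbox`, and `run_images` are hypothetical names invented for this illustration. Each "image" owns private state that only its own thread touches, and other parties interact with it purely by sending messages. (Python threads do not give real CPU parallelism because of the interpreter lock; the point here is the ownership structure, not speed.)

```python
import queue
import threading


class Image:
    """Hypothetical stand-in for one image: private state, one thread, one inbox."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.total = 0  # mutated only by this image's own thread
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        # The image's single thread: process messages one at a time.
        while True:
            msg = self.inbox.get()
            if msg is None:  # sentinel value: shut down
                break
            self.total += msg

    def start(self):
        self.thread.start()

    def send(self, msg):
        self.inbox.put(msg)

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


def run_images(n_images, numbers):
    """Distribute numbers round-robin over n_images and return each image's sum."""
    images = [Image(f"image-{i}") for i in range(n_images)]
    for img in images:
        img.start()
    for i, n in enumerate(numbers):
        images[i % n_images].send(n)
    for img in images:
        img.stop()  # after join() it is safe to read img.total
    return [img.total for img in images]


totals = run_images(4, range(100))
print(totals)       # [1200, 1225, 1250, 1275]
print(sum(totals))  # 4950
```

No locks appear anywhere in the user-level code, because no two threads ever touch the same `total`. That is the "simplicity of use" argument in miniature.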

> Most smalltalks and smalltalkers are deeply stuck in the past of one  
> native thread. Most in fact are not good at multi-threading with  
> smalltalk non-native threads!!! It's difficult to learn and get  
> right which is one motivator behind those wanting to take the easy  
> road - one native thread per image,

Right, *one* motivator.

> but that's the wrong route (in my view and obviously in others view  
> as well) because it isn't general purpose enough. It involves hard  
> work. No way around it.

If you want to open up this discussion again, please bring some new  
facts.  Why would cache-thrashing not be an issue when running 64  
cores on a single image?  I'm willing to be convinced, but I haven't  
seen even a sketch of a design that would avoid this.
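
The cache-thrashing concern can be illustrated with a deliberately simplified model of a write-invalidate cache: a cache line has to move to whichever core writes it, so each change of writer costs a transfer. The sketch below is a toy model, not a measurement; the 64-byte line size, the round-robin write pattern, and the function names are assumptions made for the illustration.

```python
LINE_SIZE = 64  # bytes per cache line (assumed; typical but not universal)
CORES = 64
ROUNDS = 10


def line_of(addr):
    """Map a byte address to the id of the cache line containing it."""
    return addr // LINE_SIZE


def ownership_transfers(writes):
    """Count how often a cache line changes its writing core.

    writes is a sequence of (core, address) write events in time order.
    """
    owner = {}
    transfers = 0
    for core, addr in writes:
        line = line_of(addr)
        if line in owner and owner[line] != core:
            transfers += 1  # line must migrate to the new writer
        owner[line] = core
    return transfers


# Shared state: every core updates the same word in turn.
shared = [(core, 0) for _ in range(ROUNDS) for core in range(CORES)]

# Private state: each core updates its own word on its own cache line.
private = [(core, core * LINE_SIZE) for _ in range(ROUNDS) for core in range(CORES)]

print(ownership_transfers(shared))   # 639
print(ownership_transfers(private))  # 0
```

In this model every write but the first to the shared word forces the line to move between cores, while the private layout never does. A many-threads-per-image design would need some way to keep most object accesses in the second category.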

>
>
> Igor, how will we gain access to writing for chips like NVidia when  
> they keep it all secret?

Keep what secret?  Both AMD and NVIDIA have exposed low-level  
instruction sets for their processors.  AMD's is called CTM, and I  
can't remember the name of NVIDIA's.  These instruction sets are at  
approximately the level of x86 assembly (i.e. low-level, but still  
portable across different GPU models).

> Use C with CUDA?

One approach is to use CUDA just like Croquet uses OpenGL.  What's the  
difference?

Cheers,
Josh


> Or hyjack OpenCL (to be part of LLVM and clang frontend if I'm not  
> mistaken) when Apple gets it working?
>
> Cheers,
>
> peter
>
>



Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Igor Stasenko
2008/7/6 Joshua Gargus <[hidden email]>:

>
> On Jul 5, 2008, at 6:40 PM, Peter William Lount wrote:
>
>> Todd Blanchard wrote:
>>>
>>> This was pretty much the messages from Apple at WWDC recently as well.
>>> Their next os version has several technologies based around this idea.
>>> The shift is upon us.
>>>
>>
>> Yeah, Apple is talking about two different approaches - program
>> parallelism with multi-cores and data parallelism with GPGPUs from the likes
>> of NVidia and AMD-ATI or possibly P.A.Semi (just a wild guess on P.A.Semi as
>> their chips could be made with many many cores soon).
>>
>> And NO Smalltalk hasn't caught up yet. Just half a year ago in this very
>> forum thread people were arguing against generic fully multi-threading of
>> Smalltalk virtual machines. Cincom is against it. Instantiantions has been
>> quite and likely won't do much.
>
> And in my opinion, the people who were arguing against it won the argument.
>  Concerns were raised about the cache-thrashing that could result, and
> relevant empirical research was linked to that seemed to validate these
> concerns.
>
>> Only a few brave intrepid explorers get it and now we have experiments
>> like HydraVM for croquet/squeak.
>
> Perhaps I misunderstood what you meant in the previous part of the
> paragraph.  Hydra is explicitly one-thread-per-image for 1) simplicity of
> implementation, 2) simplicity of use and 3) because many-threads-per-image
> hasn't been shown to be even theoretically desirable.
>
>> Most smalltalks and smalltalkers are deeply stuck in the past of one
>> native thread. Most in fact are not good at multi-threading with smalltalk
>> non-native threads!!! It's difficult to learn and get right which is one
>> motivator behind those wanting to take the easy road - one native thread per
>> image,
>
> Right, *one* motivator.
>
>> but that's the wrong route (in my view and obviously in others view as
>> well) because it isn't general purpose enough. It involves hard work. No way
>> around it.
>
> If you want to open up this discussion again, please bring some new facts.
>  Why would cache-thrashing not be an issue when running 64 cores on a single
> image?  I'm willing to be convinced, but I haven't seen even a sketch of a
> design that would avoid this.
>
>>
>>
>> Igor, how will we gain access to writing for chips like NVidia when they
>> keep it all secret?
>
> Keep what secret?  Both AMD and NVIDIA have exposed low-level instructions
> sets for their processors.  AMD's is called CTM, and I can't remember the
> name of NVIDIA's.  These instruction sets are at approximately the level of
> x86 assembly (i.e. low-level, but still portable across different GPU
> models).
>

From:
http://en.wikipedia.org/wiki/CUDA
----
Threads must run in groups of at least 32 threads that execute
identical instructions simultaneously. Branches in the program code do
not impact performance significantly, provided that each of 32 threads
takes the same execution path; the SIMD execution model becomes a
significant limitation for any inherently divergent task (e.g.,
traversing a ray tracing acceleration data structure).
----

Even though we can program the GPU, we can't make it run different code
in each thread :(
Also, something seems utterly wrong with this statement.
Since it is a waste to run 32 threads on the same set of input data, the
input is obviously different for each thread. But if the input data is
different, how is it possible that all threads take the same path at
every branch?
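
One possible reading of the quoted passage: threads in a group can hold different data and still agree on every branch decision, as long as the condition evaluates the same way for all of them. The toy simulation below models a single if/else executed by a lock-step group of 32 lanes. It is a sketch of the general SIMD idea rather than of any real GPU; `warp_passes` and the sample inputs are made up for the illustration.

```python
WARP_SIZE = 32  # group size taken from the quoted CUDA description


def warp_passes(inputs, predicate):
    """Number of serialized passes a lock-step group needs for one if/else.

    If every lane agrees on the branch, one pass is enough. If the lanes
    disagree, the group runs the 'then' side and the 'else' side one after
    the other, with the non-participating lanes masked off.
    """
    assert len(inputs) == WARP_SIZE
    outcomes = {bool(predicate(x)) for x in inputs}
    return len(outcomes)


def is_positive(x):
    return x > 0


# 32 different inputs, but all positive: every lane takes the same branch.
coherent = list(range(1, 33))

# 32 different inputs with mixed signs: lanes disagree on the branch.
divergent = [x if x % 2 else -x for x in range(1, 33)]

print(warp_passes(coherent, is_positive))   # 1
print(warp_passes(divergent, is_positive))  # 2
```

So the statement is not contradictory: different inputs only hurt when they make lanes of the same group choose different paths, in which case both paths are executed back to back.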

--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

K. K. Subramaniam
In reply to this post by Rob Rothwell
On Sunday 06 Jul 2008 9:14:13 pm Rob Rothwell wrote:
> Do you think it is intentional, or merely a sad commentary on the general
> state of our current scientific infrastructure?  I do process improvement
> (Six Sigma) work at a hospital, which *assumes* an underlying foundation
> of, say, deductive and inductive reasoning, logic, math, science, even
> philosophy to some extent.  I think industry isn't so different from
> medicine, where vast specialization has pushed the generalist to the side
> so that not too many people see "the big picture" anymore.
It is not a generalist vs. specialist issue. Would we have posed such
questions about Archimedes, Leonardo da Vinci or Newton? The real issue is
the reluctance to continue with a thread of investigation from the past. I
believe the trend since the 80s to treat knowledge as "intellectual property"
and monetize it pushes people to work in isolation and define themselves
as "specialists". Reviving decades old research work does not look as good as
initiating "new" development of multiprogramming toolkits during quarterly
appraisals.

Expect a flood of new terms around parallel computing over the next couple of
years. As Aussies would put it - prepare to be blinded with science :-).

Subbu


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Jason Johnson-5
In reply to this post by pwl
On 7/5/08, Peter William Lount <[hidden email]> wrote:

> Hi,
>
> Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!
>
> Intel Says to Prepare For "Thousands of Cores"
> http://hardware.slashdot.org/hardware/08/07/02/1833221.shtml
> http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-5
>
> What would be nice for a new version of squeak/croquet:
>
> HydraVM rewritten in Igor's new improved lambda+slang+exupery bypassing C
> altogether yet interfacing and generating C or Java or Javascript or Flash
> or ... as well for those deployment scenarios where that makes sense.
>
> To take advantage of multi-core what is needed is real native
> multi-threading per virtual machine + image not simply one native thread per
> image. Both are good for various application scenarios. Remember that a real
> multi-native threaded image can always just run one native thread if you
> want it too while a single native thread virtual machine + image will not
> run multiple native threads in the same image space. Sure multiple images in
> one program memory space is nice for some scenarios. I like that too and
> desire the option to deploy that way with multiple native threads per image
> space in one program memory space.
>
> All the best,
>
> Peter

If you mean what I think you mean (fine-grained shared-state
multi-threading) then I find this a pretty ironic message: "the
multi-cores are upon us, we need to hurry up and adopt the method of
concurrent programming that absolutely won't work on 1k+ cores".

The whole reason the "mega-cores" are interesting is that we have to
*change how we program them*.  Interesting how you are pointing out
later in this thread about others "not getting it" while managing to
miss this skyscraper-sized billboard.


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Jason Johnson-5
In reply to this post by pwl
On 7/6/08, Peter William Lount <[hidden email]> wrote:
>
> And NO Smalltalk hasn't caught up yet. Just half a year ago in this very
> forum thread people were arguing against generic fully multi-threading of
> Smalltalk virtual machines. Cincom is against it. Instantiantions has been
> quite and likely won't do much.

People were against it because it's a lot of work to get into a
soon-to-be-obsolete way of concurrent programming.

> Only a few brave intrepid explorers get it and now we have experiments like
> HydraVM for croquet/squeak.

Which are also single-thread-per-VM systems.

> Most smalltalks and smalltalkers are deeply stuck in the past of one native
> thread. Most in fact are not good at multi-threading with smalltalk
> non-native threads!!!

I guess that's because they're just not very smart.  Just like all
those folks who couldn't understand malloc/free weren't very smart.
Oh, wait, actually it wasn't like that at all.  Actually malloc/free
was just an overly complicated model that made everything more complex
than it needed to be......

>It's difficult to learn and get right which is one
> motivator behind those wanting to take the easy road - one native thread per
> image, but that's the wrong route (in my view and obviously in others view
> as well) because it isn't general purpose enough. It involves hard work. No
> way around it.

I would say doing what Java, of all things, does is "taking the easy
road", i.e. "no thinking required".  The right road is to actually
look at the research being done and the discoveries being made and the
systems that scale easily now (Erlang being by far the best at the
moment in actual practice), and decide how to get extremely concurrent
from there.  Not a mindless "let's do it how C/C++/Java does it!"
response.
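
To make the contrast concrete, here is a minimal Python sketch of the
share-nothing, message-passing style that Erlang is known for.  It is an
illustration only: Python threads and queue.Queue stand in for Erlang
processes and mailboxes, and the names counter_actor and run_demo are made
up for this example.  The state (total) is owned by a single worker, and
other threads can only change it by sending messages, so no lock is needed
around it.

```python
import queue
import threading


def counter_actor(mailbox, replies):
    """Owns `total` privately; the only way to change it is to send a message."""
    total = 0
    while True:
        msg = mailbox.get()
        if msg == "stop":
            # Report the final state and terminate.
            replies.put(total)
            return
        total += msg


def run_demo(senders=8, messages_per_sender=1000):
    """Start one actor, let several sender threads message it, return its total."""
    mailbox = queue.Queue()
    replies = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox, replies))
    actor.start()

    def send_many():
        # Senders never touch the actor's state directly.
        for _ in range(messages_per_sender):
            mailbox.put(1)

    workers = [threading.Thread(target=send_many) for _ in range(senders)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # All senders have finished, so "stop" is the last message in the mailbox.
    mailbox.put("stop")
    actor.join()
    return replies.get()
```

Calling run_demo() with the defaults should return 8000 no matter how the
sender threads interleave, because every update goes through the one
mailbox.  Whether this style scales to a thousand cores depends on the
runtime underneath, which this sketch says nothing about.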


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

david54
Well said.  I think the principles embodied in Erlang may be the best way of dealing with lots of cores (and grids of lots of machines) for a large class of applications.  Perhaps I'm prejudiced - we've been using many of these principles in our VW based solution for the last 7 years.  We have deployments running hundreds of cores and are confident that it will continue to scale.  I'd love to see better support for Erlang architectural principles in Smalltalk.  I think that Newspeak + Hydra + Cog would make a very interesting foundation.

Has anybody else watched Joe Armstrong describe Erlang, go on to talk about how he doesn't "get" OO programming and then just chuckle?

-david


Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Joshua Gargus-2
In reply to this post by Igor Stasenko

On Jul 6, 2008, at 10:09 AM, Igor Stasenko wrote:

>
> From:
> http://en.wikipedia.org/wiki/CUDA
> ----
> Threads must run in groups of at least 32 threads that execute
> identical instructions simultaneously. Branches in the program code do
> not impact performance significantly, provided that each of 32 threads
> takes the same execution path; the SIMD execution model becomes a
> significant limitation for any inherently divergent task (e.g.,
> traversing a ray tracing acceleration data structure).
> ----
>
> Despite that we can program GPU, we can't make it to run different  
> code :(
> Also, its something utterly wrong with this statement.
> Since its waste to run 32 threads on same set of input data, it
> obvious that input is different. But since input data is different,
> how it possible that all branches taking same path for each thread?

They don't have to take the same branch, but performance can suffer if  
they take different branches.

As a real-world example of different inputs taking the same branch,
consider cel shading (http://en.wikipedia.org/wiki/Cel_shading).
Each pixel is processed by a separate thread.  You might have a
bit of code like 'if (diffuse_component < threshold) then color =
shadow_color; else color = lit_color'.  You only need an 8x4 block of
pixels to fill 32 threads, and the majority of 32-pixel blocks do
execute the same path through the code.

I don't get to work on this sort of thing as much as I'd like, so I
can't be completely certain about the following statement.  But, I
believe that the above code snippet wouldn't result in bad performance
even if some pixels within a block took one branch and some took the
other, since both branches are about equally short.  As I understand
it, all of the threads in a block have to finish at the same time, so
they can start on the next chunk of input at the same time.  The cost
shows up when the branches are unequal: if you have 31 threads that
take the fast path, and 1 thread that branches into a longer
computation, then the other 31 threads are held up for the one.
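
The lockstep behaviour described above can be sketched with a toy cost
model in Python.  This is a simplification built on assumptions, not
measured GPU behaviour: it assumes a group of 32 threads pays for every
branch that at least one of its threads takes.  The names cel_shade and
warp_cost, the colors, and the cost numbers are all hypothetical.

```python
WARP_SIZE = 32  # group size quoted from the CUDA description above


def cel_shade(diffuse, threshold=0.5,
              shadow_color=(40, 40, 80), lit_color=(230, 230, 255)):
    """Per-pixel branch from the post: dark pixels get the shadow color."""
    return shadow_color if diffuse < threshold else lit_color


def warp_cost(branch_taken, cost_if_true, cost_if_false):
    """Toy lockstep model: the group pays for each branch any thread takes.

    branch_taken holds one boolean per thread in the group.
    """
    assert len(branch_taken) == WARP_SIZE
    cost = 0
    if any(branch_taken):        # at least one thread takes the 'true' path
        cost += cost_if_true
    if not all(branch_taken):    # at least one thread takes the 'false' path
        cost += cost_if_false
    return cost


uniform = [True] * WARP_SIZE                 # every thread takes the same path
one_stray = [True] * (WARP_SIZE - 1) + [False]  # 31 fast threads, 1 slow one
```

Under this model, warp_cost(uniform, 1, 100) is 1 while
warp_cost(one_stray, 1, 100) is 101: the 31 fast threads wait for the one
slow thread.  With two cheap branches, as in the cel shading snippet,
warp_cost(one_stray, 1, 1) is only 2, which matches the guess that a mixed
block there would not hurt much.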

Was that clear?

Cheers,
Josh

>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



Re: [squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

pwl
In reply to this post by Jason Johnson-5
Hi,

Oh, clearly I misunderstood about Hydra. I thought it could handle more
than one native thread per image in memory. My mistake if it can't.
Unfortunately I've not yet had time to take a look at Hydra up close.

The one native thread per image model is an OK idea; however, it
fails to cover many situations for which multiple threads provide the
better or even simpler solution.

Besides, as soon as you have two threads of either kind you essentially
have many of the same issues, at the same level of complexity, with
regard to concurrency and data coherency of objects
within the image. As soon as you have two or more images you have data
coherency issues across those images. These problems don't magically go
away by limiting the native threads per image to one.

All systems regardless of design have scalability issues. It's the
nature of the beast.

Erlang certainly has a lot going for its model even though it's not
fully message based nor object based.

Thinking beyond simplistic solutions and doing the hard
work to get there are both important.

All the best,

Peter


