Hi,
Intel, among others such as Tilera and NVidia, is telling us - yes, us Smalltalkers - to prepare for tens, hundreds, and thousands of cores on a single chip. It's up to us to bring this power to our end users - and ourselves too!

Intel Says to Prepare For "Thousands of Cores"
http://hardware.slashdot.org/hardware/08/07/02/1833221.shtml
http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-5

What would be nice for a new version of Squeak/Croquet: HydraVM rewritten in Igor's new improved lambda+Slang+Exupery, bypassing C altogether yet interfacing with and generating C or Java or JavaScript or Flash or ... as well, for those deployment scenarios where that makes sense.

To take advantage of multi-core, what is needed is real native multi-threading per virtual machine + image, not simply one native thread per image. Both are good for various application scenarios. Remember that a real multi-native-threaded image can always just run one native thread if you want it to, while a single-native-thread virtual machine + image will not run multiple native threads in the same image space. Sure, multiple images in one program memory space is nice for some scenarios. I like that too, and desire the option to deploy that way with multiple native threads per image space in one program memory space.

All the best,

Peter
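The two models contrasted above - many native threads sharing one image versus one native thread per isolated image - can be sketched in a few lines. Python is used here purely as a stand-in (this is not the API of any Smalltalk VM); the function names and counts are invented for the illustration:

```python
# Sketch of the two concurrency models discussed above, using Python
# purely as an illustration (not any actual Smalltalk VM API).

import threading
import multiprocessing

def run_shared(n_threads, n_incr):
    """Many native threads in one 'image': shared heap, explicit locking."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_incr):
            with lock:                  # data coherency is the program's job
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

def _isolated_worker(n_incr, queue):
    local = 0                           # private heap: no locks needed
    for _ in range(n_incr):
        local += 1
    queue.put(local)                    # communicate by message, not memory

def run_isolated(n_procs, n_incr):
    """One native thread per 'image': separate processes, message passing."""
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=_isolated_worker, args=(n_incr, queue))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sum(queue.get() for _ in range(n_procs))

if __name__ == "__main__":
    print(run_shared(4, 1000), run_isolated(4, 1000))   # both 4000
```

Note the trade-off the sketch makes visible: the shared-heap version needs the lock on every touch of shared state, while the isolated version needs no locks but can only exchange data through explicit messages.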
2008/7/5, Peter William Lount <[hidden email]>:
> Hi,
>
> Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!

The same Intel that has told us that the future will be IA64 / EPIC? The same Intel that has promised us 20 Gigahurtz today? The same Intel that tried to sell us RAMBUST? The same Intel that builds SSE into its processors to make the Internet faster?

Cheers
Philippe
In reply to this post by pwl
2008/7/5 Peter William Lount <[hidden email]>:
> Hi,
>
> Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!
>
> Intel Says to Prepare For "Thousands of Cores"
> http://hardware.slashdot.org/hardware/08/07/02/1833221.shtml
> http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-5
>
> What would be nice for a new version of squeak/croquet:
>
> HydraVM rewritten in Igor's new improved lambda+slang+exupery bypassing C
> altogether yet interfacing and generating C or Java or Javascript or Flash
> or ... as well for those deployment scenarios where that makes sense.

I think you misunderstand some points. Hydra is a separate project; it uses VMMaker and generates C code for the VM. Eliot is doing another thing (Cog), which includes a JIT, and will then merge it with Hydra. But of course, it is possible to rewrite everything from scratch using native methods, which I described earlier. Such a system could be really different in many ways. No question, it would be good to make it in a way that it can run most existing Squeak code without heavy rewriting.

> To take advantage of multi-core what is needed is real native
> multi-threading per virtual machine + image not simply one native thread per
> image. Both are good for various application scenarios. Remember that a real
> multi-native threaded image can always just run one native thread if you
> want it to while a single native thread virtual machine + image will not
> run multiple native threads in the same image space. Sure multiple images in
> one program memory space is nice for some scenarios. I like that too and
> desire the option to deploy that way with multiple native threads per image
> space in one program memory space.

Hydra is an evolutionary step, which enables the use of multiple cores with substantially small development expense, because it's based on the current Squeak VM.

> All the best,
>
> Peter

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> 2008/7/5, Peter William Lount <[hidden email]>:
>> Hi,
>> Intel among others such as Tilera and NVidia are telling us - yes us
>> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
>> single chip. It's up to us to bring this power to our end users - and
>> ourselves too!
>
> The same Intel that has told us that the future will be IA64 / EPIC?
> The same Intel that has promised us 20 Gigahurtz today? The same Intel
> that tried to sell us RAMBUST? The same Intel that builds SSE into
> its processors to make the Internet faster?
>
> Cheers
> Philippe

Hi,

Yeah, the Itanium. An awesome chip architecture with an incredible instruction set. Gotta love it and its doomed marketing.

Yes, that Intel who showed an 80-core research chip last year.
http://en.wikipedia.org/wiki/Teraflops_Research_Chip
http://www.theinquirer.net/en/inquirer/news/2007/06/21/intel-shows-off-2-tflops-processor
http://techresearch.intel.com/articles/Tera-Scale/1449.htm

Yes, that Intel that is bringing out this little wonder.
http://arstechnica.com/news.ars/post/20080205-small-wonder-inside-intels-silverthorne-ultramobile-cpu.html

Intel isn't the only vendor in the N-core game. In fact they are getting beaten down hard by all the other contenders for the throne. Hard.

Yes, multi-core from the companies listed here:
http://en.wikipedia.org/wiki/Multi-core_(computing)

Yes, that Tilera. They have 20 and Tile-64 core chips now, with 128 cores in the works. They have indicated that they plan up to 4096 cores with their technology.
http://www.Tilera.com

Yes, that NVidia who is already delivering tons of boards with massive numbers of GPGPUs. I have a couple of these boards already.
http://www.NVidia.com
http://en.wikipedia.org/wiki/GeForce_200_Series
http://en.wikipedia.org/wiki/Nvidia_Tesla
http://www.engadget.com/2008/07/03/nvidia-said-to-be-dropping-geforce-gtx-280-price-in-response-to/

Yes, AMD and Intel who announced 8-core mainstream chips for next year.

Yes, AMD (ATI) who announced this awesome speed beastie: the 4870 X2 (RV770XT) cards. 800+ stream processing units!
http://www.engadget.com/2008/07/03/amd-radeon-hd-4870-x2-images-leaked-rumored-for-august-release/
http://ati.amd.com/products/Radeonhd4800/specs.html

Yes, IBM who makes the 9-core Cell processor chip.
http://en.wikipedia.org/wiki/Cell_(microprocessor)

Yes, Apple who just bought P.A. Semi.
http://gizmodo.com/382929/apple-buys-itself-a-little-chip-company-known-for-super-efficient-processors

These are not your or your father's Transputer; most of these are real N-core processors available now.

Looking at the CPU power available to us and comparing it with the software that is available, one can't help but be either sad or highly motivated to make use of these powerful chips to build much better software. Heck, we're still programming with words! Whatever happened to visual programming? We still have Mac OS X and Windows and (ick) Linux and Unix as the best-of-breed systems? Ick. Sure, I use them all, but come on... we can and must do better...

Cheers,

Peter
2008/7/5 Peter William Lount <[hidden email]>:
> Philippe Marschall wrote:
>> 2008/7/5, Peter William Lount <[hidden email]>:
>>> Hi,
>>> Intel among others such as Tilera and NVidia are telling us - yes us
>>> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
>>> single chip. It's up to us to bring this power to our end users - and
>>> ourselves too!
>>
>> The same Intel that has told us that the future will be IA64 / EPIC?
>> The same Intel that has promised us 20 Gigahurtz today? The same Intel
>> that tried to sell us RAMBUST? The same Intel that builds SSE into
>> its processors to make the Internet faster?
>>
>> Cheers
>> Philippe
>
> Hi,
>
> Yeah, the Itanium. An awesome chip architecture with an incredible
> instruction set. Gotta love it and its doomed marketing.
>
> Yes, that Intel who showed an 80 core research chip last year.
> http://en.wikipedia.org/wiki/Teraflops_Research_Chip
> http://www.theinquirer.net/en/inquirer/news/2007/06/21/intel-shows-off-2-tflops-processor
> http://techresearch.intel.com/articles/Tera-Scale/1449.htm
>
> Yes, that Intel that is bringing out this little wonder.
> http://arstechnica.com/news.ars/post/20080205-small-wonder-inside-intels-silverthorne-ultramobile-cpu.html
>
> Intel isn't the only vendor in the N-core game. In fact they are getting
> beaten down hard by all the other contenders for the throne. Hard.
>
> Yes, multi-core from the companies listed here:
> http://en.wikipedia.org/wiki/Multi-core_(computing)
>
> Yes, that Tilera. They have 20 and Tile-64 core chips now, with 128 cores in
> the works. They have indicated that they plan up to 4096 with their
> technology.
> http://www.Tilera.com
>
> Yes, that NVidia who is already delivering tons of boards with massive
> numbers of GPGPUs. I have a couple of these boards already.
> http://www.NVidia.com
> http://en.wikipedia.org/wiki/GeForce_200_Series
> http://en.wikipedia.org/wiki/Nvidia_Tesla
> http://www.engadget.com/2008/07/03/nvidia-said-to-be-dropping-geforce-gtx-280-price-in-response-to/
>
> Yes, AMD and Intel who announced 8 core mainstream chips for next year.
>
> Yes, AMD (ATI) who announced this awesome speed beastie: 4870 X2 (RV770XT)
> cards. 800+ stream processing units!
> http://www.engadget.com/2008/07/03/amd-radeon-hd-4870-x2-images-leaked-rumored-for-august-release/
> http://ati.amd.com/products/Radeonhd4800/specs.html
>
> Yes, IBM who makes the 9 core Cell processor chip.
> http://en.wikipedia.org/wiki/Cell_(microprocessor)
>
> Yes, Apple who just bought P.A. Semi.
> http://gizmodo.com/382929/apple-buys-itself-a-little-chip-company-known-for-super-efficient-processors
>
> These are not your or your father's Transputer, most of these are real N-core
> processors available now.
>
> Looking at the CPU power available to us and comparing it with the software
> that is available one can't help but be either sad or highly motivated to make
> use of these powerful chips to make much better software. Heck we're still
> programming with words! Whatever happened to visual programming? We still
> have Mac OS X and Windows and ick Linux and Unix as the best-of-breed
> systems? Ick. Sure I use them all but come on... we can and must do
> better...
>
> Cheers,
>
> Peter

I suppose Philippe meant that they don't always deliver what they are telling us about. But it looks like the paradigm shift from single core to multi core is inevitable. Building a chip running at 1GHz with 20 processing units is now more effective than building a single-core chip which can run at a 20GHz frequency. SMP architectures have been available on the desktop/server market for more than 10 years, and with new technological processes it became possible to fit 2 or more cores in a single chip (using the same area).

CPUs are already too complex compared to the old '80s ones - multi-level caches, branch prediction, parallel instructions, etc. Putting in even more cache and tricky optimizations can't fill the whole chip area. Of course you could put a 1GB cache on a chip, but that would be a waste of chip space, and such a chip would have a very bad processing power / power consumption ratio.

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by pwl
I am sorry to keep repeating myself, but we will be ready before they are. See this old picture of a Smalltalk machine with 1024 processors:

http://cva.stanford.edu/projects/j-machine/

Intel and Microsoft may have forgotten these lessons, but some of us haven't. And having them cough up $20M for a joint research project in multi-core software doesn't impress me very much. I bet just the coffee budget at either of these companies is a lot larger than that.

-- Jecel
In reply to this post by pwl
2008/7/5 Jecel Assumpcao Jr <[hidden email]>:
> I am sorry to keep repeating myself, but we will be ready before they
> are. See this old picture of a Smalltalk machine with 1024 processors:
>
> http://cva.stanford.edu/projects/j-machine/
>
> Intel and Microsoft may have forgotten these lessons, but some of us
> haven't. And having them cough up $20M for a joint research project in
> multi-core software doesn't impress me very much. I bet just the coffee
> budget at either of these companies is a lot larger than that.

I can't remember Microsoft ever doing any research which changed the shape of the computing world. The only thing they do best is buying out a successful product/company, putting the Micro$oft flag on top of it, and making money from selling it :)

> -- Jecel

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Igor Stasenko
This was pretty much the message from Apple at WWDC recently as well. Their next OS version has several technologies based around this idea. The shift is upon us.

On Jul 5, 2008, at 11:35 AM, Igor Stasenko wrote:

> I suppose Philippe meant that they don't always deliver what they
> telling about.
> But it looks like that paradigm shift from single core to multi core
> is inevitable. Building chip running 1Ghz with 20 processing units is
> now more effective than building single-core chip which can run at
> 20GHz frequency.
Todd Blanchard wrote:
> This was pretty much the messages from Apple at WWDC recently as well.
> Their next os version has several technologies based around this idea.
> The shift is upon us.

Yeah, Apple is talking about two different approaches - program parallelism with multi-cores and data parallelism with GPGPUs from the likes of NVidia and AMD-ATI, or possibly P.A. Semi (just a wild guess on P.A. Semi, as their chips could be made with many, many cores soon).

And NO, Smalltalk hasn't caught up yet. Just half a year ago in this very forum thread people were arguing against generic fully multi-threaded Smalltalk virtual machines. Cincom is against it. Instantiations has been quiet and likely won't do much.

Only a few brave, intrepid explorers get it, and now we have experiments like HydraVM for Croquet/Squeak.

Most Smalltalks and Smalltalkers are deeply stuck in the past of one native thread. Most in fact are not good at multi-threading with Smalltalk non-native threads!!! It's difficult to learn and get right, which is one motivator behind those wanting to take the easy road - one native thread per image - but that's the wrong route (in my view, and obviously in others' view as well) because it isn't general purpose enough. It involves hard work. No way around it.

Igor, how will we gain access to writing for chips like NVidia when they keep it all secret? Use C with CUDA? Or hijack OpenCL (to be part of LLVM with a Clang frontend, if I'm not mistaken) when Apple gets it working?

Cheers,

peter
2008/7/6 Peter William Lount <[hidden email]>:
> Todd Blanchard wrote:
>>
>> This was pretty much the messages from Apple at WWDC recently as well.
>> Their next os version has several technologies based around this idea.
>> The shift is upon us.
>>
>
> Yeah, Apple is talking about two different approaches - program parallelism
> with multi-cores and data parallelism with GPGPUs from the likes of NVidia
> and AMD-ATI or possibly P.A.Semi (just a wild guess on P.A.Semi as their
> chips could be made with many many cores soon).
>
> And NO Smalltalk hasn't caught up yet. Just half a year ago in this very
> forum thread people were arguing against generic fully multi-threading of
> Smalltalk virtual machines. Cincom is against it. Instantiations has been
> quiet and likely won't do much.
>
> Only a few brave intrepid explorers get it and now we have experiments like
> HydraVM for croquet/squeak.
>
> Most smalltalks and smalltalkers are deeply stuck in the past of one native
> thread. Most in fact are not good at multi-threading with smalltalk
> non-native threads!!! It's difficult to learn and get right which is one
> motivator behind those wanting to take the easy road - one native thread per
> image, but that's the wrong route (in my view and obviously in others view
> as well) because it isn't general purpose enough. It involves hard work. No
> way around it.
>
> Igor, how will we gain access to writing for chips like NVidia when they
> keep it all secret? Use C with CUDA? Or hijack OpenCL (to be part of LLVM
> and Clang frontend if I'm not mistaken) when Apple gets it working?

You mean writing code for GPUs? Well, do as the rest of the world does: provide/generate source code and let the GPU vendor's API compile it. If they don't open their chip architecture/instructions, how else can it be done?

> Cheers,
>
> peter

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by pwl
On Sunday 06 Jul 2008 1:30:28 am Jecel Assumpcao Jr wrote:
> Intel and Microsoft may have forgotten these lessons, but some of us
> haven't. And having them cough up $20M for a joint research project in
> multi-core software doesn't impress me very much. I bet just the coffee
> budget at either of these companies is a lot larger than that.

The engineering issues are different this time. Intel is talking about their capability to bring 1K-core boards to the retail store level. This is very different from a research project in a handful of locations accessible to a few researchers. J-Machine builders did not have to deal with battery drain :-).

You do have a point about Intel/MS ignoring past work in this area. Their research director could have presented the capability as a revival of prior research instead of making it appear like a new challenge. CST and Actalk are over two decades old and cited in many ACM papers. Their collective amnesia is inexcusable.

If processing elements could be wrapped in an object, then 1024 looks like a small number....

Subbu
On Sun, Jul 6, 2008 at 10:22 AM, K. K. Subramaniam <[hidden email]> wrote:
Do you think it is intentional, or merely a sad commentary on the general state of our current scientific infrastructure? I do process improvement (Six Sigma) work at a hospital, which *assumes* an underlying foundation of, say, deductive and inductive reasoning, logic, math, science, even philosophy to some extent. I think industry isn't so different from medicine, where vast specialization has pushed the generalist to the side so that not too many people see "the big picture" anymore. I'm not saying it's an excuse, just maybe a reason - and yet another reason why the world NEEDS people like you having this conversation. I think of the Squeak community kind of like science fiction writers - invariably what they dream up comes to pass...

> If processing elements could be wrapped in an object, then 1024 looks like a
> small number....

I know I'm not thinking at the level you guys are right now, but this is interesting... can you explain what you mean by "wrapping processing elements in an object" so I could get a picture of what that might look like?

Rob
In reply to this post by pwl
On Jul 5, 2008, at 6:40 PM, Peter William Lount wrote:

> Todd Blanchard wrote:
>> This was pretty much the messages from Apple at WWDC recently as well.
>> Their next os version has several technologies based around this idea.
>> The shift is upon us.
>
> Yeah, Apple is talking about two different approaches - program
> parallelism with multi-cores and data parallelism with GPGPUs from
> the likes of NVidia and AMD-ATI or possibly P.A.Semi (just a wild
> guess on P.A.Semi as their chips could be made with many many cores
> soon).
>
> And NO Smalltalk hasn't caught up yet. Just half a year ago in this
> very forum thread people were arguing against generic fully multi-
> threading of Smalltalk virtual machines. Cincom is against it.
> Instantiations has been quiet and likely won't do much.

And in my opinion, the people who were arguing against it won the argument. Concerns were raised about the cache-thrashing that could result, and relevant empirical research was linked to that seemed to validate these concerns.

> Only a few brave intrepid explorers get it and now we have
> experiments like HydraVM for croquet/squeak.

Perhaps I misunderstood what you meant in the previous part of the paragraph. Hydra is explicitly one-thread-per-image for 1) simplicity of implementation, 2) simplicity of use and 3) because many-threads-per-image hasn't been shown to be even theoretically desirable.

> Most smalltalks and smalltalkers are deeply stuck in the past of one
> native thread. Most in fact are not good at multi-threading with
> smalltalk non-native threads!!! It's difficult to learn and get
> right which is one motivator behind those wanting to take the easy
> road - one native thread per image,

Right, *one* motivator.

> but that's the wrong route (in my view and obviously in others view
> as well) because it isn't general purpose enough. It involves hard
> work. No way around it.

If you want to open up this discussion again, please bring some new facts. Why would cache-thrashing not be an issue when running 64 cores on a single image? I'm willing to be convinced, but I haven't seen even a sketch of a design that would avoid this.

> Igor, how will we gain access to writing for chips like NVidia when
> they keep it all secret?

Keep what secret? Both AMD and NVIDIA have exposed low-level instruction sets for their processors. AMD's is called CTM, and I can't remember the name of NVIDIA's. These instruction sets are at approximately the level of x86 assembly (i.e. low-level, but still portable across different GPU models).

> Use C with CUDA?

One approach is to use CUDA just like Croquet uses OpenGL. What's the difference?

Cheers,
Josh

> Or hijack OpenCL (to be part of LLVM and Clang frontend if I'm not
> mistaken) when Apple gets it working?
>
> Cheers,
>
> peter
2008/7/6 Joshua Gargus <[hidden email]>:
> On Jul 5, 2008, at 6:40 PM, Peter William Lount wrote:
>
>> Todd Blanchard wrote:
>>> This was pretty much the messages from Apple at WWDC recently as well.
>>> Their next os version has several technologies based around this idea.
>>> The shift is upon us.
>>
>> Yeah, Apple is talking about two different approaches - program
>> parallelism with multi-cores and data parallelism with GPGPUs from the likes
>> of NVidia and AMD-ATI or possibly P.A.Semi (just a wild guess on P.A.Semi as
>> their chips could be made with many many cores soon).
>>
>> And NO Smalltalk hasn't caught up yet. Just half a year ago in this very
>> forum thread people were arguing against generic fully multi-threading of
>> Smalltalk virtual machines. Cincom is against it. Instantiations has been
>> quiet and likely won't do much.
>
> And in my opinion, the people who were arguing against it won the argument.
> Concerns were raised about the cache-thrashing that could result, and
> relevant empirical research was linked to that seemed to validate these
> concerns.
>
>> Only a few brave intrepid explorers get it and now we have experiments
>> like HydraVM for croquet/squeak.
>
> Perhaps I misunderstood what you meant in the previous part of the
> paragraph. Hydra is explicitly one-thread-per-image for 1) simplicity of
> implementation, 2) simplicity of use and 3) because many-threads-per-image
> hasn't been shown to be even theoretically desirable.
>
>> Most smalltalks and smalltalkers are deeply stuck in the past of one
>> native thread. Most in fact are not good at multi-threading with smalltalk
>> non-native threads!!! It's difficult to learn and get right which is one
>> motivator behind those wanting to take the easy road - one native thread per
>> image,
>
> Right, *one* motivator.
>
>> but that's the wrong route (in my view and obviously in others view as
>> well) because it isn't general purpose enough. It involves hard work. No way
>> around it.
>
> If you want to open up this discussion again, please bring some new facts.
> Why would cache-thrashing not be an issue when running 64 cores on a single
> image? I'm willing to be convinced, but I haven't seen even a sketch of a
> design that would avoid this.
>
>> Igor, how will we gain access to writing for chips like NVidia when they
>> keep it all secret?
>
> Keep what secret? Both AMD and NVIDIA have exposed low-level instruction
> sets for their processors. AMD's is called CTM, and I can't remember the
> name of NVIDIA's. These instruction sets are at approximately the level of
> x86 assembly (i.e. low-level, but still portable across different GPU
> models).

From:
http://en.wikipedia.org/wiki/CUDA
----
Threads must run in groups of at least 32 threads that execute identical instructions simultaneously. Branches in the program code do not impact performance significantly, provided that each of 32 threads takes the same execution path; the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g., traversing a ray tracing acceleration data structure).
----

Despite that we can program the GPU, we can't make it run different code :(

Also, there is something utterly wrong with this statement. Since it's a waste to run 32 threads on the same set of input data, it is obvious that the input is different. But if the input data is different, how is it possible that all branches take the same path for each thread?

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Rob Rothwell
On Sunday 06 Jul 2008 9:14:13 pm Rob Rothwell wrote:
> Do you think it is intentional, or merely a sad commentary on the general
> state of our current scientific infrastructure? I do process improvement
> (Six Sigma) work at a hospital, which *assumes* an underlying foundation
> of, say, deductive and inductive reasoning, logic, math, science, even
> philosophy to some extent. I think industry isn't so different from
> medicine, where vast specialization has pushed the generalist to the side
> so that not too many people see "the big picture" anymore.

It is not a generalist vs. specialist issue. Would we have posed such questions about Archimedes, Leonardo da Vinci or Newton? The real issue is the reluctance to continue with a thread of investigation from the past. I believe the trend since the 80s to treat knowledge as "intellectual property" and monetize it pushes people to work in isolation and define themselves as "specialists". Reviving decades-old research work does not look as good as initiating "new" development of multiprogramming toolkits during quarterly appraisals.

Expect a flood of new terms around parallel computing over the next couple of years. As Aussies would put it - prepare to be blinded with science :-).

Subbu
In reply to this post by pwl
On 7/5/08, Peter William Lount <[hidden email]> wrote:
> Hi,
>
> Intel among others such as Tilera and NVidia are telling us - yes us
> smalltalkers - to prepare for tens, hundreds and thousands of cores on a
> single chip. It's up to us to bring this power to our end users - and
> ourselves too!
>
> Intel Says to Prepare For "Thousands of Cores"
> http://hardware.slashdot.org/hardware/08/07/02/1833221.shtml
> http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-5
>
> What would be nice for a new version of squeak/croquet:
>
> HydraVM rewritten in Igor's new improved lambda+slang+exupery bypassing C
> altogether yet interfacing and generating C or Java or Javascript or Flash
> or ... as well for those deployment scenarios where that makes sense.
>
> To take advantage of multi-core what is needed is real native
> multi-threading per virtual machine + image not simply one native thread per
> image. Both are good for various application scenarios. Remember that a real
> multi-native threaded image can always just run one native thread if you
> want it to while a single native thread virtual machine + image will not
> run multiple native threads in the same image space. Sure multiple images in
> one program memory space is nice for some scenarios. I like that too and
> desire the option to deploy that way with multiple native threads per image
> space in one program memory space.
>
> All the best,
>
> Peter

If you mean what I think you mean (fine-grained shared-state multi-threading), then I find this a pretty ironic message: "the multi-cores are upon us, we need to hurry up and adopt the method of concurrent programming that absolutely won't work on 1K+ cores". The whole reason the "mega-cores" are interesting is that we have to *change how we program them*. Interesting how you are pointing out later in this thread that others are "not getting it" while managing to miss this skyscraper-sized billboard.
In reply to this post by pwl
On 7/6/08, Peter William Lount <[hidden email]> wrote:
>
> And NO Smalltalk hasn't caught up yet. Just half a year ago in this very
> forum thread people were arguing against generic fully multi-threading of
> Smalltalk virtual machines. Cincom is against it. Instantiations has been
> quiet and likely won't do much.

People were against it because it's a lot of work to get into a soon-to-be-obsolete way of concurrent programming.

> Only a few brave intrepid explorers get it and now we have experiments like
> HydraVM for croquet/squeak.

Which are also single-thread-per-VM systems.

> Most smalltalks and smalltalkers are deeply stuck in the past of one native
> thread. Most in fact are not good at multi-threading with smalltalk
> non-native threads!!!

I guess that's because they're just not very smart. Just like all those folks who couldn't understand malloc/free weren't very smart. Oh, wait, actually it wasn't like that at all. Actually malloc/free was just an overly complicated model that made everything more complex than it needed to be......

> It's difficult to learn and get right which is one
> motivator behind those wanting to take the easy road - one native thread per
> image, but that's the wrong route (in my view and obviously in others view
> as well) because it isn't general purpose enough. It involves hard work. No
> way around it.

I would say doing what Java, of all things, does is "taking the easy road", i.e. "no thinking required". The right road is to actually look at the research being done, the discoveries being made, and the systems that scale easily now (Erlang being by far the best at the moment in actual practice), and decide how to get extremely concurrent from there. Not a mindless "let's do it how C/C++/Java does it!" response.
Well said. I think the principles embodied in Erlang may be the best way of dealing with lots of cores (and grids of lots of machines) for a large class of applications. Perhaps I'm prejudiced - we've been using many of these principles in our VW-based solution for the last 7 years. We have deployments running 100s of cores and are confident that it will continue to scale. I'd love to see better support for Erlang architectural principles in Smalltalk. I think that Newspeak + Hydra + Cog would make a very interesting foundation.

Has anybody else watched Joe Armstrong describe Erlang, go on to talk about how he doesn't "get" OO programming, and then just chuckle?

-david

On Sun, Jul 6, 2008 at 12:56 PM, Jason Johnson <[hidden email]> wrote:
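The Erlang principles mentioned here - isolated processes with private state that communicate only through asynchronous messages - can be sketched with ordinary queues. This is an illustrative toy in Python, not Erlang's or VisualWorks' actual API; the `Actor` class and its handler function are invented for the example:

```python
# Toy sketch of Erlang-style "share nothing, communicate by message"
# concurrency. Each actor owns its state; nothing touches it from outside.

import queue
import threading

class Actor:
    """An isolated worker: private state, a mailbox, no shared memory."""

    def __init__(self, handler, initial_state=0):
        self.mailbox = queue.Queue()
        self.handler = handler           # (state, message) -> new state
        self.state = initial_state
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        while True:
            msg = self.mailbox.get()     # block until a message arrives
            if msg is None:              # poison pill: stop the actor
                break
            self.state = self.handler(self.state, msg)

    def send(self, msg):
        self.mailbox.put(msg)            # asynchronous: never blocks on work

    def stop(self):
        self.mailbox.put(None)
        self.thread.join()               # after this, state is stable to read

# A tiny accumulator actor: messages are numbers, state is their sum.
adder = Actor(lambda total, n: total + n)
for n in range(1, 11):
    adder.send(n)
adder.stop()
print(adder.state)   # 55
```

Because all communication goes through the mailbox, no locks appear anywhere in user code; that property is what lets the model scale across cores and machines.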
In reply to this post by Igor Stasenko
On Jul 6, 2008, at 10:09 AM, Igor Stasenko wrote:

> From:
> http://en.wikipedia.org/wiki/CUDA
> ----
> Threads must run in groups of at least 32 threads that execute
> identical instructions simultaneously. Branches in the program code do
> not impact performance significantly, provided that each of 32 threads
> takes the same execution path; the SIMD execution model becomes a
> significant limitation for any inherently divergent task (e.g.,
> traversing a ray tracing acceleration data structure).
> ----
>
> Despite that we can program GPU, we can't make it to run different
> code :(
> Also, its something utterly wrong with this statement.
> Since its waste to run 32 threads on same set of input data, it
> obvious that input is different. But since input data is different,
> how it possible that all branches taking same path for each thread?

They don't have to take the same branch, but performance can suffer if they take different branches.

As a real-world example of different inputs taking the same branch, consider the example of cel shading (http://en.wikipedia.org/wiki/Cel_shading). Each pixel is processed by a separate thread. You might have a bit of code like 'if (diffuse_component < threshold) then color = shadow_color; else color = lit_color'. You only need an 8x4 block of pixels to fill 32 threads, and the majority of 32-pixel blocks do execute the same path through the code.

I don't get to work on this sort of thing as much as I'd like, so I can't be completely certain about the following statement. But, I believe that the above code snippet wouldn't result in bad performance even if some pixels within a block took one branch, and one took the other. As I understand it, all of the threads in a block have to finish at the same time, so they can start on the next chunk of input at the same time. So, if you have 31 threads that take the fast path, and 1 thread that branches into a longer computation, then the other 31 threads are held up for the one.

Was that clear?

Cheers,
Josh

> --
> Best regards,
> Igor Stasenko AKA sig.
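The lock-step behavior described above can be captured in a toy cost model: a warp pays for every execution path that at least one of its 32 threads takes, so a single divergent thread stalls the rest. The path costs below are made-up illustrative numbers, not measured GPU timings:

```python
# Toy cost model of SIMD warp divergence (illustrative numbers only).
# A warp of 32 threads runs in lock step: the warp pays the cost of
# every branch path that at least one of its threads takes.

WARP_SIZE = 32
FAST_PATH_COST = 1     # e.g. "color = shadow_color"
SLOW_PATH_COST = 10    # e.g. a longer lighting computation

def warp_cost(takes_slow_path):
    """takes_slow_path: one boolean per thread in the warp."""
    assert len(takes_slow_path) == WARP_SIZE
    cost = 0
    if not all(takes_slow_path):       # at least one thread on the fast path
        cost += FAST_PATH_COST
    if any(takes_slow_path):           # at least one thread on the slow path
        cost += SLOW_PATH_COST
    return cost                        # divergent warps pay for both paths

uniform_fast = warp_cost([False] * 32)            # 1: all agree, cheap
uniform_slow = warp_cost([True] * 32)             # 10: all agree, expensive
divergent    = warp_cost([True] + [False] * 31)   # 11: one straggler stalls all
print(uniform_fast, uniform_slow, divergent)
```

This matches the 31-plus-1 scenario above: the lone slow thread makes the whole warp pay the slow-path cost on top of the fast-path cost, which is why uniform blocks (like most cel-shading pixel blocks) run at full speed while divergent ones do not.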
In reply to this post by Jason Johnson-5
Hi,
Oh, clearly I misunderstood about Hydra. I thought it could handle more than one native thread per image in memory. My mistake if it can't. Unfortunately I've not yet had time to take a look at Hydra up close.

The one-native-thread-per-image model is an OK idea; however, it fails to cover many situations for which multiple threads provide the better or even simpler solution. Besides, as soon as you have two threads of either kind you essentially have many of the same issues, and the same level of complexity of issues, with regard to concurrency and data coherency of objects within the image. As soon as you have two or more images you have data coherency issues across those images. These problems don't magically go away by limiting the native threads per image to one.

All systems regardless of design have scalability issues. It's the nature of the beast. Erlang certainly has a lot going for its model, even though it's not fully message based nor object based.

Thinking to see further than simplistic solutions and doing the hard work to get there are both important.

All the best,

Peter
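The point that even non-native (green) threads already face coherency problems can be shown deterministically: two cooperative threads of control that interleave between a read and a write produce the classic lost update. Python generators stand in for a cooperative scheduler here; this is an illustration of the hazard in general, not any Smalltalk scheduler's actual behavior:

```python
# Two cooperative "green threads" each try to increment a shared
# counter, yielding control between the read and the write - the
# classic lost-update interleaving, shown deterministically.

counter = 0

def increment():
    global counter
    tmp = counter          # read the shared value
    yield                  # the "scheduler" switches threads here
    counter = tmp + 1      # write back a now-stale value

a, b = increment(), increment()
next(a)                    # a reads 0, then yields
next(b)                    # b also reads 0, then yields
for g in (a, b):
    try:
        next(g)            # both write back 0 + 1
    except StopIteration:
        pass

print(counter)   # 1, not 2: one increment was lost
```

No native threads are involved at all, yet the shared heap is already incoherent; only atomicity discipline (locks, transactions, or message passing) fixes it, which is the point being made above.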