Hi:

Since we recently published a little article on first experiments with self-optimizing interpreters, I thought I might share it here as well. It is a first look at RPython and Truffle as foundations for language implementations:

http://stefan-marr.de/2014/09/are-we-there-yet/?utm_source=ggroups&utm_medium=list&utm_campaign=self

(The paywalled version is available here: http://dx.doi.org/10.1109/MS.2014.98)

In the article, we briefly compare the AST-based approach of self-optimizing interpreters with the classic bytecode-based approach.

Since then, we made further progress. Some of it can be seen in the performance numbers here: http://som-st.github.io/#performance

Implementing a Smalltalk in barely 10k lines of code and being easily in the range of 1x-10x of Java looks pretty good, at least on paper :)

Hope you'll find it interesting,
Stefan

You received this message because you are subscribed to the Google Groups "Smalltalk Research" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. For more options, visit https://groups.google.com/d/optout.
Stefan,
Great stuff!

-- Jecel
Back in 1984 I came up with an AST-based VM for Smalltalk as an alternative to bytecodes: http://merlintec.com/lsi/st84.txt

In the mid 1990s I had some thoughts about how to implement that directly in hardware (with a focus on massive parallelism): http://merlintec.com/swiki/hardware/19.html

These are all just vague musings, but I still feel they were good ideas and intend to explore this direction more in the future.

-- Jecel
Hi Jecel:
On Tuesday, September 23, 2014 11:14:59 PM UTC+2, Jecel wrote:
> [1984 AST Smalltalk VM]

Skimming through the code, it looks a little incomplete, or is it theoretically working, being a meta-circular implementation?

For comparison, you might have a look at the SOM implementations. I am not sure what the best entry point for code reading is; perhaps the parser:

https://github.com/SOM-st/TruffleSOM/blob/master/src/som/compiler/Parser.java#L546
https://github.com/SOM-st/RTruffleSOM/blob/master/src/som/compiler/parser.py#L390

Either the Java or the RPython version; I guess it depends on taste. The RPython version is less complete when it comes to optimizations. (The older bytecode-based versions are here: https://github.com/SOM-st/RPySOM and here: https://github.com/SOM-st/som-java)
Hm, I wonder: is the idea of supporting objects in hardware still something that would be interesting? With the JIT compilation technology we have today, I feel that many optimizations are better done in software, and constraining hardware by making it object-aware could restrict the possible optimizations.

That is also with respect to the idea of using AST-based program representations. The reason the AST-based interpreters are fast is that the compiler either traces through them (RPython) and generates close to ideal native code, or a partial evaluator optimizes the interpreter code by taking concrete ASTs into account, which contain profiling data from actual program execution and determine specializations at run time. Thus, the compiler gets plenty of information to generate efficient code. The generated code, however, no longer represents ASTs, nor does it necessarily contain any 'objects'. All objects local to a compilation unit, i.e., non-escaping ones, are usually optimized out. Objects are usually only reified at the borders of compilation units.

After wrapping up the work on 'getting things reasonably fast', I also intend to get back to the concurrency part of the story. But that might only happen later this year. There is already a branch with thread support for TruffleSOM, but it still needs much more work. And in the end, I would like to experiment with various different models, for instance actors.

Best regards
Stefan
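The node-rewriting idea behind such self-optimizing interpreters can be sketched in a few lines of plain Python (an illustrative model only; the class names are made up and do not come from SOM or Truffle): a generic addition node observes its operand types on first execution, rewrites itself into an integer-specialized node, and deoptimizes back to the generic node if the types change later.

```python
class Literal:
    """Leaf node that simply yields a constant value."""
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


def generic_add(l, r):
    # Stand-in for a full polymorphic message send (e.g. '+' lookup).
    return l + r


class AddNode:
    """Generic addition node; specializes itself on first execution."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self, frame):
        l = self.left.execute(frame)
        r = self.right.execute(frame)
        if isinstance(l, int) and isinstance(r, int):
            # Rewrite in place: later executions take the fast path.
            self.__class__ = IntAddNode
            return l + r
        return generic_add(l, r)


class IntAddNode(AddNode):
    """Integer-specialized version; guards its assumption on each call."""
    def execute(self, frame):
        l = self.left.execute(frame)
        r = self.right.execute(frame)
        if isinstance(l, int) and isinstance(r, int):
            return l + r
        # Guard failed: deoptimize back to the generic node.
        self.__class__ = AddNode
        return generic_add(l, r)
```

In Truffle the same effect is achieved through its node replacement machinery, while the RPython version instead relies on the tracing JIT to peel the dispatch away; the sketch above only shows the shared principle of profiling-driven self-rewriting.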
Stefan,
> > [1984 AST Smalltalk VM]
> Skimming through the code, it looks a little incomplete or is it
> theoretically working, being a meta-circular implementation?

It is just a sketch on paper. To be an actual VM it would need an object memory system and primitives at the very least. At that time I was very interested in dataflow architectures and was wondering if a Smalltalk VM could be like that instead of a stack machine.

> [SOM implementations]

Thanks for the references. I have been keeping track of the SOM project but haven't read the code in detail yet.

> Hm, I wonder, is the idea of supporting objects in hardware still
> something that would be interesting?

There is the famous ECOOP 95 paper by Urs Hölzle and David Ungar that takes a look at what had changed since the SOAR project 10 years earlier, and their conclusion was "no":

https://www.cs.ucsb.edu/~urs/oocsb/papers/oo-hardware.html

I don't agree, though I think their arguments are very good. But there is a bit of circular reasoning: they tried to make the best possible compiler for a machine that was designed to run C, then they compared the output of their compiler with the output of the C compiler, and they looked the same. I would not have expected anything else.

> With the JIT compilation technology we got today, I feel that many
> optimizations are better done in software, and constraining hardware
> by making it object-aware potentially could restrict the possible
> optimizations?

This has been a real problem for many projects. Somebody designs a fancy MMU chip to sit between the processor and memory to do garbage collection faster, and somebody else comes up with a new algorithm that makes a normal processor even faster. But I don't think this is as likely to happen now as 20 years ago, since the state of the art has been stalled for so long. Software implementations aren't trying to make things even faster; instead the focus is on making things reasonably fast with a lot less effort.

> [Objects are usually only reified at the borders of compilation units.]

Indeed! I have learned the hard way that "the compiler can know the future but the hardware is limited to the past". Back in 1990 I was playing with some exotic architectures trying to get message passing down to one or two clock cycles. Then the Self group started publishing their papers, and they got message passing to be a negative number of clock cycles! Yes, this is an accounting trick, but in practice it is a fair comparison. So I changed my project to a slightly modified SPARC processor (which was, unfortunately, killed by my boss for not being academic enough).

After I had a lot more experience with adaptive compilation (the title of the PhD thesis I am working on is "Adaptive Compilation for an Object-Oriented, Parallel and Reconfigurable Architecture", for example) I decided that mixing that with special hardware would be a very interesting research path.

-- Jecel
Hi there,
On Wed, 2014-09-24 at 13:09 -0700, Jecel wrote:
> This has been a real problem for many projects. Somebody designs
> a fancy MMU chip to sit between the processor and memory to do
> garbage collection faster and somebody else comes up with a new
> algorithm that makes a normal processor even faster.

I don't think it has to be really fancy. Having HW support for tagged pointers and tagged arithmetic could help, no? I mean having an "add two tagged integers or jump if one is not a tagged integer" kind of instruction. Also having support for a write barrier could help. SPARC used to have it in silicon, IIRC...

Jan
See Alan Knight's hardware article here.
http://www.macqueen.us/smalltalkReport/ST/91_95/SMAL0502.PDF
In reply to this post by Jan Vrany
Jan,
> I don't think it has to be really fancy. Having HW support for tagged
> pointers and tagged arithmetic could help, no? I mean having an "add
> two tagged integers or jump if one is not a tagged integer" kind of
> instruction. Also having support for a write barrier could help.
> SPARC used to have it in silicon, IIRC...

SOAR had tagged versions of all instructions. When they measured their benchmarks, they found that only the tagged adds were making any difference. So SPARC kept the tagged add (and tagged subtract for symmetry), but when Urs measured their impact on his Self compiler he decided not to use these instructions. The problem was that when the tag check failed, an exception was raised, and it took over a thousand clock cycles before SunOS started running the actual exception handler code. Obviously a tagged add instruction that takes zero clock cycles if the check succeeds and only as much as a jump or call when it fails would have been far more useful.

Another simple hardware feature that can help quite a bit is virtually addressed caches (as opposed to the more common physically addressed ones), like in the Mushroom Smalltalk computer. Mario Wolczko tried to get Sun to include this in their processors but they never did. The idea is that you only have to do an object table lookup on cache misses and not on every memory access. So you get the performance of direct pointers with the flexibility of object tables.

-- Jecel
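The tagged-add behaviour under discussion can be modeled in a few lines of Python (a minimal sketch, not any particular VM's object layout; it assumes a SOAR-style encoding where the low bit is 0 for small integers, so adding two tagged integers is a single plain machine add):

```python
TAG_MASK = 0b1  # low bit: 0 = small integer, 1 = pointer (SOAR-style)

def tag_int(n):
    """Encode a small integer: shift left one bit, leaving tag bit 0."""
    return n << 1

def untag_int(w):
    """Decode a tagged small integer back to its value."""
    return w >> 1

def tagged_add(a, b, slow_path):
    """Model of a 'tagged add' instruction: if both operands carry the
    integer tag, the plain add is already the right tagged answer
    (0 + 0 tag bits stay 0); otherwise branch to the slow path, which
    in a Smalltalk VM would be a full '+' message send."""
    if (a | b) & TAG_MASK:
        return slow_path(a, b)   # at least one non-integer operand
    return a + b                 # fast path: one machine add

# Usage: 3 + 4 on tagged words takes the fast path.
result = tagged_add(tag_int(3), tag_int(4), slow_path=lambda a, b: None)
assert untag_int(result) == 7
```

The point Jecel makes maps directly onto the `slow_path` branch: the instruction is only a win if that branch costs about as much as a jump, not a thousand-cycle trip through the operating system's trap handler.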
In reply to this post by Andres Valloud-4
Andres,
> See Alan Knight's hardware article here.
>
> http://www.macqueen.us/smalltalkReport/ST/91_95/SMAL0502.PDF

Thanks for the reference! In the end he is mostly citing Urs' article that I mentioned, but the first part was the traditional economic argument against language-specific processors.

During the 1990s I think these ideas were very valid, and I stopped my own Smalltalk hardware projects and worked on software for PCs instead. Things had changed by the end of that decade, however, and we saw no lack of Forth and Java processors from then on.

-- Jecel
Hi Jecel--

> During the 1990s I think these ideas were very valid and I stopped
> my own Smalltalk hardware projects and worked on software for PCs
> instead. Things had changed by the end of that decade, however, and
> we saw no lack of Forth and Java processors from then on.

I'd also be interested to hear how the situation changed (or didn't) in the era of cheaper and more powerful FPGAs.

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+1 415 287 3547 (no SMS)
In reply to this post by Jecel Assumpcao Jr
Strange. Even "today", Azul went from their custom hardware with very
specialized needs to commodity x86.
Andres,
> Strange. Even "today", Azul went from their custom hardware with very
> specialized needs to commodity x86.

That had been my impression as well, but looking at their site it seems that for now they are offering both options. It is extremely unlikely that they will make any newer versions of their own chips, however, and this is the classic argument against language-specific processors: they don't have the volumes to ride Moore's Law like their competitors.

I am cautious about reading too much into a single data point. For example: while it is true that Lisp Machines died, so did every single other minicomputer architecture that has ever existed. Was it the lispyness or the mininess that killed them? In the same way, I see the Azul architecture competing more against the likes of Tilera than against Intel or ARM. They had a very neat 64-bit RISC with a nice memory model, but it was less of a Java processor than an ARM with Jazelle 1 (though slightly more than Jazelle 2), for example.

Sometimes these business decisions reflect more the result of executive shuffle than any keen insights into the future of the market. Like the new guy at ParcPlace-Digitalk feeling it would be cool to do Java like everyone else instead of this odd Smalltalk, or the new guy at IBM thinking it would be best to drop their own OS/2 to build WinTel PCs like everyone else. It is scary to be different, and if you bring in an MBA from the food industry to run your company he will probably convince you to be like everyone else.

-- Jecel
In reply to this post by ccrraaiigg
Craig,
> I'd also be interested to hear how the situation changed (or
> didn't) in the era of cheaper and more powerful FPGAs.

The FPGA market has a lot of variation. You can get small ones for just a few dollars that are only a few times more powerful than the very first ones introduced in the mid 1980s. You can get some extremely large ones that cost ten thousand dollars apiece (I wouldn't want to make a mistake when soldering one of these babies on a board!). I will ignore these extremes and just focus on the meat of the market between $10 and $100 per FPGA.

In the early 1990s FPGAs were too small to implement a whole processor, but the Mushroom Smalltalk computer used a bunch of them to make it easier for them to experiment with design changes. Only one machine was built, so the cost of the chips wasn't an important issue. By the late 1990s it was possible to put a complete simple RISC processor in an FPGA, and by 2003 or so it was possible to put a reasonably complex processor or a bunch of simpler ones in a single FPGA. A decade later we have even more resources to play with.

It is important to have realistic expectations, however. If you implement something like an ARM processor with a bunch of peripherals in a $10 FPGA, it will probably run at less than 100 MHz and take up half of the area in the chip. Considering that you can probably buy an ARM just like that for $0.50 or less, and you can buy a 700 MHz dual-core ARM for the $5 that your FPGA version is costing you, it doesn't seem like such a good idea. The main reason to do something like this is to fill the other half of your $10 FPGA with something that your application needs and that the ARMs you can buy off the shelf don't have. What this is depends on your application. An example from my master's project: I was able to get real-time, full VGA resolution optical flow calculations from a custom 16 MHz circuit running on an FPGA that doesn't even have a heat sink.
You would need a very high end PC to get even close to this result.

An interesting alternative is to play with your own processor designs. As Alan Kay likes to point out, people used to do this all the time. Commercial microprocessors killed this, but FPGAs give us the opportunity to restart research in this area. For this to be interesting as research, you only have to compare with other processors implemented in the same FPGAs, not with commercial ASIC processors.

One thing that FPGAs can do that ASICs can't is have the circuit they implement be changed at runtime. We still don't have many results in this area, but objects can be switched from software to hardware without any changes to the rest of the system, since they only interact via message passing. So it might be interesting to extend an adaptive compilation scheme so that a quick and dirty compiler would generate code the first time a method was called, a better compiler would be invoked when a hotspot was detected, and for the extreme hotspots you could compile the object to hardware and place it in a spare area in the FPGA. I plan to work on this in the future, but for now I have been doing an extremely limited version instead (using a fixed-design coprocessor instead of compilation to hardware).

-- Jecel
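The tiered scheme Jecel sketches (quick compiler, then optimizing compiler, then hardware synthesis) can be modeled with per-method invocation counters. This is only an illustrative Python sketch; the thresholds, tier names, and `Method` class are all invented for the example, not taken from any of the systems discussed:

```python
QUICK_THRESHOLD = 10       # invocations before the better compiler kicks in
HARDWARE_THRESHOLD = 1000  # invocations before synthesis to the FPGA

class Method:
    """Tracks how hot a method is and which execution tier it is on."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.tier = "interpreted"

    def invoke(self):
        self.count += 1
        if self.tier == "interpreted":
            # Quick and dirty compile the first time the method is called.
            self.tier = "quick-jit"
        elif self.tier == "quick-jit" and self.count >= QUICK_THRESHOLD:
            # Hotspot detected: invoke the better compiler.
            self.tier = "optimizing-jit"
        elif self.tier == "optimizing-jit" and self.count >= HARDWARE_THRESHOLD:
            # Extreme hotspot: compile the object to hardware and place
            # it in a spare area of the FPGA.
            self.tier = "fpga"
        return self.tier
```

Since objects interact only via message passing, the caller never needs to know which tier it is hitting; the dispatcher can move a method between tiers transparently, which is what makes the hardware tier a drop-in extension rather than a redesign.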