Hello,
For some reason, this page:
http://lists.canonical.org/pipermail/kragen-tol/2007-March/000850.html
and emails around:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html

got my attention. Jecel and Tim speculated in one email that the Dorado ran 200k-400k bytecodes/sec, and later something like 1.75M bc/s. I would imagine that a 20MHz microcoded, pipelined processor with supporting I/O processors would surely do better than 1 bytecode per 100 clock cycles. 10 cycles for 1 bytecode still sounds like too much (as it is like the current Squeak implementation, which seems to be more efficient than Apple Smalltalk), but it might have been like that.

Also, Tables 9.1 and 9.2 in the Green Book (around p. 169 of http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf) seem to indicate that the Dorado ran about 20-30 times faster than the assembly implementation on an MC68000 running at 5MHz. So the Dorado was probably 5-6 times more efficient when normalized to the same clock speed, and here one could say it took 4 cycles for 1 bytecode or so?

So, as a very rough assessment, a CISC at 100MHz would be comparable to the Dorado, and a 4GHz processor would be about 40 times faster than the Dorado, yet transistor-count-wise we have indeed lost a factor of hundreds?

I tried http://www.squeaksource.com/SystemBenchmarks.html on my computer and compared the numbers with the tables in the Green Book, but it appears that the repetition counts must be very different. Does anybody know the old numbers used in the book?

-- Yoshiki
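The cycles-per-bytecode estimates above are simple divisions, and a minimal sketch makes them easy to recheck. The inputs are only the figures quoted in this thread (a 20MHz Dorado clock, 200k to 1.75M bc/s, a 20-30x speedup over a 5MHz MC68000); the function names are made up for illustration.

```python
# Rough cycles-per-bytecode arithmetic for the figures quoted in this thread.
# All inputs are the thread's own estimates, not measurements.

def cycles_per_bytecode(clock_hz: float, bytecodes_per_sec: float) -> float:
    """Average clock cycles spent per bytecode at a given clock and bytecode rate."""
    return clock_hz / bytecodes_per_sec


def per_clock_advantage(speedup: float, fast_clock_hz: float, slow_clock_hz: float) -> float:
    """Speedup that remains once the clock-rate ratio has been divided out."""
    return speedup / (fast_clock_hz / slow_clock_hz)


DORADO_CLOCK_HZ = 20e6    # the "20MHz" figure assumed in the message above
MC68000_CLOCK_HZ = 5e6    # clock of the MC68000 system in the Green Book tables

if __name__ == "__main__":
    for rate in (200e3, 400e3, 1.75e6):
        cpb = cycles_per_bytecode(DORADO_CLOCK_HZ, rate)
        print(f"{rate:>12,.0f} bc/s -> {cpb:6.1f} cycles per bytecode")

    for speedup in (20, 30):
        adv = per_clock_advantage(speedup, DORADO_CLOCK_HZ, MC68000_CLOCK_HZ)
        print(f"{speedup}x raw speedup -> {adv:.1f}x per clock")
```

With these inputs, 200k bc/s works out to 100 cycles per bytecode and 1.75M bc/s to a little over 11, while the 20-30x speedup shrinks to roughly 5-7.5x per clock.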
Yoshiki Ohshima wrote:
> For some reason, this page:
> http://lists.canonical.org/pipermail/kragen-tol/2007-March/000850.html
> and emails around:
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html
>
> got my attention. Jecel and Tim speculated in one email that the Dorado
> ran 200k-400k bytecodes/sec, and later something like 1.75M bc/s.

Just last week I got to read some interesting papers from OOPSLA 86 (from the university's network I have access to the ACM and IEEE electronic libraries, but I rarely go there with a laptop these days), particularly "Swamp: a fast processor for Smalltalk-80":

http://portal.acm.org/citation.cfm?id=960112.28710

They have some interesting numbers for the Dorado's performance. It seems that a peak of over 1M bc/s is about right, but real-world performance was closer to 300K bc/s.

> I would imagine that a 20MHz microcoded, pipelined processor with
> supporting I/O processors would surely do better than 1 bytecode per
> 100 clock cycles.

This is something that was worrying me a lot: my initial numbers for my designs indicated that I would be 10 times faster than the Dorado at the same clock speed, which didn't seem right. I spent some time looking for a missing order of magnitude somewhere but gave up. The numbers in the Swamp paper match mine exactly, so it seems there was some problem with the Dorado after all.

One version of the Smalltalk-80 microcode for the Dorado is available online:

http://www.bitsavers.org/pdf/xerox/dorado/DoradoSmalltalkMicrocode.pdf

A quick look at this shows something that is very odd: there is a slow and complicated microcode sequence to decode the bytecode. Yet every hardware description pointed out that the Dorado had some fancy circuits to dispatch 256 different codes in a single clock while supplying a decoded constant to help speed things up even further.
And there are other features that are always stressed when presenting the machine and which seem to have been totally ignored by that particular microcode implementation. The Green Book chapter only mentions that the hardware stacks were ignored, as they were too hard to use efficiently.

> 10 cycles for 1 bytecode still sounds like too much (as it is like
> the current Squeak implementation, which seems to be more efficient
> than Apple Smalltalk), but it might have been like that.
> Also, Tables 9.1 and 9.2 in the Green Book (around p. 169 of
> http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf)
> seem to indicate that the Dorado ran about 20-30 times faster than the
> assembly implementation on an MC68000 running at 5MHz. So the Dorado
> was probably 5-6 times more efficient when normalized to the same
> clock speed, and here one could say it took 4 cycles for 1 bytecode
> or so?

The 68000 took four clock cycles for each memory access (at least; many of the development boards used in early implementations had lots of wait states), and a typical instruction made several such accesses. So we might say that a 5MHz 68000 machine with rather slow memory ran at 0.5 MIPS. The Dorado ran 14M microinstructions per second and had a cache that allowed memory to mostly match that. It is hard to compare the two directly, but I would say the efficiency of both implementations was about the same (and 10 times worse than current Squeak).

> So, as a very rough assessment, a CISC at 100MHz would be comparable
> to the Dorado, and a 4GHz processor would be about 40 times faster
> than the Dorado, yet transistor-count-wise we have indeed lost a
> factor of hundreds?

Well, high-end CISCs now execute an average of two instructions per clock. But memory hasn't kept up, though we now have large multilevel caches. Based on the instruction rate alone, I would expect a 4GHz CISC to be 560 times faster than the Dorado. Squeak actually beats that by a factor of 2 or so.
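The instruction-rate comparison above can be rechecked with a short sketch. It uses only the figures given in this message (4 clocks per 68000 memory access, 14M Dorado microinstructions per second, a 4GHz CISC averaging two instructions per clock). The one added assumption is `ACCESSES_PER_INSTRUCTION = 2.5`, a hypothetical value picked so that "several accesses" reproduces the 0.5 MIPS figure; it is not a measured number.

```python
# Back-of-the-envelope instruction rates, using the figures from this message.

def instructions_per_sec(clock_hz: float, instr_per_clock: float) -> float:
    """Raw instruction rate for a given clock and average instructions per clock."""
    return clock_hz * instr_per_clock


# MC68000: 4 clocks per memory access, several accesses per instruction.
MC68000_CLOCK_HZ = 5e6
CLOCKS_PER_ACCESS = 4
ACCESSES_PER_INSTRUCTION = 2.5   # ASSUMPTION: chosen to match the 0.5 MIPS estimate

mc68000_ips = MC68000_CLOCK_HZ / (CLOCKS_PER_ACCESS * ACCESSES_PER_INSTRUCTION)

# Dorado microinstruction rate versus a 4GHz CISC averaging 2 instructions per clock.
DORADO_MICROINSTR_PER_SEC = 14e6
modern_ips = instructions_per_sec(4e9, 2.0)
ratio = modern_ips / DORADO_MICROINSTR_PER_SEC

if __name__ == "__main__":
    print(f"68000 estimate: {mc68000_ips / 1e6:.2f} MIPS")
    print(f"4GHz CISC vs. Dorado, instruction rate only: {ratio:.0f}x")
```

The ratio comes out at about 570, the same ballpark as the 560 quoted above. It compares raw instruction issue only and ignores memory stalls.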
> I tried http://www.squeaksource.com/SystemBenchmarks.html on my
> computer and compared the numbers with the tables in the Green Book,
> but it appears that the repetition counts must be very different.
> Does anybody know the old numbers used in the book?

A quick search on the web didn't turn up anything. I can try to see if there is anything in the sources for the old Apple Smalltalk-80 tomorrow.

-- Jecel