[squeak-dev] Alto And Dorado performance

Yoshiki Ohshima

Hello,

For some reason, this page:
http://lists.canonical.org/pipermail/kragen-tol/2007-March/000850.html
and the emails around this one:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html

got my attention. Jecel and Tim speculated in one email that the Dorado
ran 200k-400k bytecodes/sec, and later something like 1.75M bc/s. I
would imagine that a 20MHz microcoded, pipelined processor with
supporting I/O processors would surely do better than 1 bytecode per
100 clock cycles. 10 cycles per bytecode still sounds like too much
(it is comparable to the current Squeak implementation, which seems to
be more efficient than Apple Smalltalk), but it might have been like that.
Also, Tables 9.1 and 9.2 in the Green Book (around p. 169 of
http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf)
seem to indicate that the Dorado ran about 20-30 times faster than the
assembly implementation on an MC68000 running at 5MHz. So the Dorado was
probably 5-6 times more efficient when normalized to the same clock
speed; could we say it took something like 4 cycles per bytecode?
So, by a very rough assessment, a CISC at 100MHz would be comparable
to the Dorado, and a 4GHz processor would be about 40 times faster than
the Dorado. Yet, transistor-count-wise, have we indeed lost by a factor
of hundreds?
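
The arithmetic behind these estimates can be checked in a few lines of Python. This is only a back-of-the-envelope sketch, assuming a 20MHz Dorado clock as in the paragraph above; the helper names are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the estimates above.
# Assumption: the Dorado is treated as a 20 MHz machine, as in the message.
DORADO_HZ = 20e6
M68K_HZ = 5e6  # MC68000 clock used in the Green Book comparison


def cycles_per_bytecode(clock_hz, bytecodes_per_sec):
    """Average number of clock cycles spent on each bytecode."""
    return clock_hz / bytecodes_per_sec


def per_clock_advantage(speedup, fast_hz, slow_hz):
    """Speedup left over after dividing out the clock-rate difference."""
    return speedup / (fast_hz / slow_hz)


# Bytecode rates speculated earlier in the thread.
for rate in (200e3, 400e3, 1.75e6):
    print(f"{rate:>9,.0f} bc/s -> {cycles_per_bytecode(DORADO_HZ, rate):5.1f} cycles/bc")

# Green Book tables: the Dorado ran about 20-30x faster than the 5 MHz 68000.
for speedup in (20, 30):
    adv = per_clock_advantage(speedup, DORADO_HZ, M68K_HZ)
    print(f"{speedup}x overall -> {adv:.1f}x per clock")

# Clock-rate ratio behind "40 times faster".
print(f"4 GHz / 100 MHz = {4e9 / 100e6:.0f}x")
```

At 20MHz the speculated rates work out to 100, 50 and about 11 cycles per bytecode, and a 20-30x speedup over a clock that is 4 times slower leaves roughly 5-7.5x per clock, the same ballpark as the 5-6x estimate.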

I tried http://www.squeaksource.com/SystemBenchmarks.html on my
computer and compared the numbers with the tables in the Green Book,
but it appears that the repetition counts must be very different.
Does anybody know the old numbers used in the book?

-- Yoshiki

Jecel Assumpcao Jr

Re: [squeak-dev] Alto And Dorado performance

Yoshiki Ohshima wrote:
> For some reason, this page:
> http://lists.canonical.org/pipermail/kragen-tol/2007-March/000850.html
> and emails around:
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html
>
> got my attention. Jecel and Tim speculated in one email that the Dorado
> ran 200k-400k bytecodes/sec, and later something like 1.75M bc/s.

Just last week I got to read some interesting papers from OOPSLA '86
(I have access to the ACM and IEEE electronic libraries from the
university's network, but I rarely go there with a laptop these days),
particularly "Swamp: a fast processor for Smalltalk-80":

http://portal.acm.org/citation.cfm?id=960112.28710

They have some interesting numbers for Dorado's performance. It seems
that a peak of over 1M bc/s is about right, but real world performance
was closer to 300K bc/s.
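
To put those two rates in microinstruction terms, here is a small sketch. It assumes the Dorado's 14M microinstructions per second, the figure given further down in this message, and uses 1M bc/s as a round stand-in for "over 1M"; the helper name is illustrative only.

```python
# Assumption: 14 million microinstructions per second, the Dorado figure
# quoted later in this message. 1M bc/s is a round stand-in for "over 1M".
DORADO_MICROINSTRUCTIONS_PER_SEC = 14e6


def microinstructions_per_bytecode(bytecodes_per_sec):
    """Average Dorado microinstructions executed per bytecode."""
    return DORADO_MICROINSTRUCTIONS_PER_SEC / bytecodes_per_sec


peak = microinstructions_per_bytecode(1e6)       # peak rate
typical = microinstructions_per_bytecode(300e3)  # real-world rate
print(f"peak: {peak:.0f} microinstructions/bc, typical: {typical:.0f} microinstructions/bc")
```

That is at most about 14 microinstructions per bytecode at the peak, but roughly 47 in real-world use.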

> I
> would imagine that some 20MHz microcoded pipelining processor with
> supporting I/O processors would surely do better than 1 bytecode per
> 100 clock cycles.

This is something that was worrying me a lot: my initial numbers for my
designs indicated that I would be 10 times faster than the Dorado at the
same clock speed, which didn't seem right. I spent some time looking for
a missing order of magnitude somewhere but gave up. The numbers in the
Swamp paper match mine exactly, so it seems there was some problem with
the Dorado after all.

One version of the Smalltalk-80 microcode for the Dorado is available
online:

http://www.bitsavers.org/pdf/xerox/dorado/DoradoSmalltalkMicrocode.pdf

A quick look at this shows something very odd: there is a slow and
complicated microcode sequence to decode the bytecode. Yet every
hardware description pointed out that the Dorado had some fancy circuits
to dispatch 256 different codes in a single clock while supplying a
decoded constant to speed things up even further. And there are other
features that are always stressed when presenting the machine which
seem to have been totally ignored by that particular microcode
implementation. The Green Book chapter only mentions that the hardware
stacks were ignored because they were too hard to use efficiently.
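
To make the contrast concrete, here is a rough Python analogy (not a model of the Dorado hardware or of the microcode in that PDF): decoding a bytecode through a chain of range tests versus a single lookup in a precomputed 256-entry table whose entries already carry the decoded operand, loosely mirroring the "decoded constant" described above. The handler names are hypothetical; the ranges follow the first part of the Smalltalk-80 bytecode set.

```python
# Hypothetical illustration only, not the actual Dorado microcode.
# Opcode ranges follow the first part of the Smalltalk-80 (Blue Book)
# bytecode set; every other bytecode is lumped together as "other".
RANGES = [  # (first bytecode, last bytecode, handler name)
    (0, 15, "pushReceiverVariable"),
    (16, 31, "pushTemporaryVariable"),
    (32, 63, "pushLiteralConstant"),
    (64, 95, "pushLiteralVariable"),
    (96, 103, "popIntoReceiverVariable"),
    (104, 111, "popIntoTemporaryVariable"),
]


def decode_sequential(bytecode):
    """Decode by testing one range after another, like a multi-step sequence."""
    for first, last, name in RANGES:
        if first <= bytecode <= last:
            # The operand (e.g. a variable index) is the offset within the range.
            return name, bytecode - first
    return "other", 0


# Build a 256-entry table once. Each entry already holds the handler and the
# extracted operand, so decoding at run time is a single indexed lookup.
DISPATCH_TABLE = [decode_sequential(b) for b in range(256)]


def decode_table(bytecode):
    """Decode with one lookup, loosely analogous to a 256-way hardware dispatch."""
    return DISPATCH_TABLE[bytecode]


for b in (5, 20, 40, 100):
    print(b, decode_table(b))
```

Both functions give the same answers; the difference is only in how many steps each decode takes, which is the kind of cost a single-clock dispatch was meant to remove.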

> 10 cycles for 1 bytecode sounds still too much (as
> it is like current Squeak implementation, which seems to be more
> efficient than Apple Smalltalk) but it might have been like that.
> Also, table 9.1 and 9.2 in the Green Book (around p. 169 of
> http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf)
> seem to indicate that Dorado ran about 20-30 times faster than the
> assembly implementation on MC68000 running at 5MHz. So Dorado was
> probably 5-6 times more efficient if normalized to the same clock
> speed, and here it could be said that 4 cycles for 1 bytecode or such?

The 68000 took four clock cycles (at least; many of the development
boards used in early implementations had lots of wait states) for each
memory access, and a typical instruction made several such accesses. So
we might say that a 5MHz 68000 machine with rather slow memory ran at
about 0.5 MIPS. The Dorado ran 14M microinstructions per second and had
a cache that allowed memory to mostly keep up with that. It is hard to
compare the two directly, but I would say the efficiency of both
implementations was about the same (and 10 times worse than current
Squeak).
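
The comparison can be sketched as a quick calculation. The value 2.5 for "several" memory accesses per 68000 instruction is an assumption, chosen because it reproduces the 0.5 MIPS estimate; the helper name is illustrative.

```python
# Figures from the paragraph above; ACCESSES_PER_INSTRUCTION is an assumption
# standing in for "several", picked so the result matches the 0.5 MIPS estimate.
M68K_HZ = 5e6
CYCLES_PER_ACCESS = 4            # minimum, ignoring wait states
ACCESSES_PER_INSTRUCTION = 2.5   # assumption
DORADO_MICROINSTRUCTIONS_PER_SEC = 14e6


def instructions_per_sec(clock_hz, cycles_per_access, accesses_per_instruction):
    """Instruction rate of a machine whose speed is bound by memory accesses."""
    return clock_hz / cycles_per_access / accesses_per_instruction


m68k_ips = instructions_per_sec(M68K_HZ, CYCLES_PER_ACCESS, ACCESSES_PER_INSTRUCTION)
ratio = DORADO_MICROINSTRUCTIONS_PER_SEC / m68k_ips
print(f"68000: {m68k_ips / 1e6:.1f} MIPS; Dorado issues {ratio:.0f}x as many (micro)instructions")
```

The resulting 28x falls inside the 20-30x speedup range from the Green Book tables, which is consistent with both implementations doing about the same amount of work per bytecode.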

> So with a very rough assessment, a CISC at 100MHz would be comparable
> to Dorado, and a 4GHz processor would be like 40 times faster than
> Dorado, yet the transistor-count-wise, we indeed lost hundreds?

Well, high-end CISCs now execute an average of two instructions per
clock. But memory hasn't kept up, even though we now have large
multilevel caches. Based on the instruction rate alone, I would expect a
4GHz CISC to be 560 times faster than the Dorado. Squeak actually beats
that by a factor of 2 or so.
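
A rough version of that instruction-rate estimate, assuming a 4GHz clock, two instructions per clock, and the Dorado's 14M microinstructions per second as stated above:

```python
# Assumptions: 4 GHz clock and two instructions per clock, as stated above,
# against the Dorado's 14 million microinstructions per second.
CPU_HZ = 4e9
INSTRUCTIONS_PER_CLOCK = 2
DORADO_MICROINSTRUCTIONS_PER_SEC = 14e6

expected_speedup = CPU_HZ * INSTRUCTIONS_PER_CLOCK / DORADO_MICROINSTRUCTIONS_PER_SEC
squeak_speedup = 2 * expected_speedup  # "beats that by a factor of 2 or so"
print(f"instruction-rate ratio: ~{expected_speedup:.0f}x, Squeak: ~{squeak_speedup:.0f}x")
```

This lands at roughly 570, in the same range as the 560 figure, with Squeak's factor of 2 putting it somewhere above 1000 times the Dorado.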

> I tried http://www.squeaksource.com/SystemBenchmarks.html on my
> computer and compare the numbers with the tables in the Green Book,
> but it appears that the repetition counts must be very different.
> Does anybody know the old numbers used in the book?

A quick search on the web didn't turn up anything. I can try to see if
there is anything in the sources for the old Apple Smalltalk-80
tomorrow.

-- Jecel