A Thought about Multiple Cores

Dan Ingalls

A Thought about Multiple Cores

In 1978 we built a Smalltalk to run on the NoteTaker, an experimental
somewhat portable (we called it *trans* portable ;-) computer with an
8-MHz (this was considered fast ;-) 8086 processor. In addition to
the ST processor board, there was an I/O board with a 4-MHz 8086, and
also an Ethernet board with an 8-MHz 8086. We got ST going on it,
and it actually ran pretty well.

But you know how it is -- faster is always better. We weren't doing
anything with the Ethernet board, and I started to get ideas...

This was the first Smalltalk that did all graphics with BitBlt and
Smalltalk, so, e.g., while displaying text, there was as much work
running the bytecodes to pick characters and set up BitBlt as there
was to actually do the BitBlt operations. So it occurred to me to
have the BitBlt primitive simply put BitBlt requests on a queue, and
then use the Ethernet processor to read the queue and do the actual
Blt. I can't believe that I actually did this, but the truth is that
it worked beautifully. We only used it a few times because there was
only one working Ethernet board (for that matter there was usually
only one working NoteTaker ;-) but the result was nearly a full
factor of two improvement in how the UI behaved. Obviously care had
to be taken to wait for completion when the destination was not the
Display, but it's easy to check and occurs relatively infrequently.
[I can't remember, but I think I simply ran these in the ST processor
rather than queuing them for the Ether processor, so in some cases
BitBlt could actually be running in *both* processors]

So this is just a suggestion that, when multiple cores are available,
Squeak might well run such things as screen updates almost twice as
fast if we put the BitBlt engine (ie everything after the tests for
failures) in a separate thread from the rest of Squeak. What's nice
is that it's a very local change with *no* impact to the rest of
Squeak except to make things run faster.
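
A minimal sketch of the queuing scheme described above, in Python rather than in the VM's own code. The names `Bitmap`, `BltEngine`, `copy_bits` and `flush` are hypothetical and not taken from Squeak; the worker thread stands in for the second processor. Display-bound requests are queued and return at once, while a request with any other destination first waits for the queue to drain and then runs in the caller's thread. Python threads will not show the speedup itself; the point is only the structure.

```python
import queue
import threading


class Bitmap:
    """A tiny 2-D pixel buffer standing in for a Form (hypothetical)."""

    def __init__(self, width, height, fill=0):
        self.width, self.height = width, height
        self.pixels = [[fill] * width for _ in range(height)]


def blt(src, dst, dx, dy):
    """Copy src into dst at (dx, dy), clipped to dst: the 'actual Blt'."""
    for y in range(src.height):
        for x in range(src.width):
            tx, ty = dx + x, dy + y
            if 0 <= tx < dst.width and 0 <= ty < dst.height:
                dst.pixels[ty][tx] = src.pixels[y][x]


class BltEngine:
    """Runs display-bound blits on a worker thread fed by a queue."""

    def __init__(self, display):
        self.display = display
        self.requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The worker plays the role of the spare processor.
        while True:
            src, dst, dx, dy = self.requests.get()
            try:
                blt(src, dst, dx, dy)
            finally:
                self.requests.task_done()

    def copy_bits(self, src, dst, dx, dy):
        if dst is self.display:
            # Queue the request and return to the caller immediately.
            self.requests.put((src, dst, dx, dy))
        else:
            # Destination is not the Display: wait for completion of all
            # queued blits, then do this one in the caller's thread.
            self.requests.join()
            blt(src, dst, dx, dy)

    def flush(self):
        """Block until every queued blit has been performed."""
        self.requests.join()


display = Bitmap(8, 4)
engine = BltEngine(display)
glyph = Bitmap(2, 2, fill=1)
for i in range(4):
    engine.copy_bits(glyph, display, i * 2, 0)   # queued, asynchronous

offscreen = Bitmap(8, 4)
engine.copy_bits(display, offscreen, 0, 0)       # waits, then runs inline
```

The sketch assumes that bitmaps are only modified through `copy_bits`; code that writes into a source bitmap directly while a request is still queued would need its own call to `flush` first.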

If someone has suggested this already I apologize -- I get behind on
my email sometimes.

- Dan

Jim Gettys-3

Re: A Thought about Multiple Cores

We've been doing this with X since 1984....

It always confounded some people that apps would work faster across the
network than locally.

So yes, MP's help performance this way, with a suitable implementation
with queuing.
- Jim

On Fri, 2007-01-26 at 22:41 -0800, Dan Ingalls wrote:

> [...]

--
Jim Gettys
One Laptop Per Child

timrowledge

Re: A Thought about Multiple Cores

In reply to this post by Dan Ingalls

The Heeg port of Objectworks to some dual cpu 68k workstation - back
in the day when 68k cpus were the dog's nadgers - did this core/blt
split. Mike Rueger can probably remember the details. It did the
bitblt and scanchars benchmarks *very* fast.

Then again the BrouHaHa/Archimedes port I did actually beat it on
both those with a single 8MHz ARM2 :-) No cache at all, not even
instruction prefetch. It's amazing what a clean instruction set and a
barrel shifter can do for you.

I guess we could claim that using things like the Rome/Cairo plugin
is doing a modern equivalent since it is supposed to pass off most
rendering chores to the GPU. If Squeak made effective use of such
capabilities for all the normal UI stuff it would certainly improve
things; certainly it is effective in Sophie. Unfortunately there is
still a huge amount of cruft in morphic/tweak that somehow soaks up
cycles like a nasty corner on the Tour de France.

And there's Croquet of course.

tim
--
tim Rowledge; http://www.rowledge.org/tim
Strange OpCodes: DST: Deadlock System Tables

Michael Rueger-6

Re: A Thought about Multiple Cores

tim Rowledge wrote:
> The Heeg port of Objectworks to some dual cpu 68k workstation - back in
> the day when 68k cpus were the dog's nadgers - did this core/blt split.
> Mike Rueger can probably remember the details. It did the bitblt and
> scanchars benchmarks *very* fast.

The PCS Cadmus had a second 68k processor integrated into the bitmap
display. I don't remember who did the coding (Hans Martin?), but we did
what Dan described, offloading bitblt and char scan to the display. Made
for an amazing Smalltalk experience back then :-) (250% Dorado)

> I guess we could claim that using things like the Rome/Cairo plugin is
> doing a modern equivalent since it is supposed to pass off most
> rendering chores to the GPU. If Squeak made effective use of such

The main bottleneck is that in most cases we still have at least one
extra bitblt as Squeak doesn't support a native surface as Display yet.
(at least to my knowledge and what I got from discussions with Bert, he
can explain that in more detail than I can).

Michael

Bert Freudenberg

Re: A Thought about Multiple Cores

On Jan 27, 2007, at 10:39 , Michael Rueger wrote:

> tim Rowledge wrote:
>> I guess we could claim that using things like the Rome/Cairo
>> plugin is doing a modern equivalent since it is supposed to pass
>> off most rendering chores to the GPU. If Squeak made effective use
>> of such
>
> The main bottleneck is that in most cases we still have at least
> one extra bitblt as Squeak doesn't support a native surface as
> Display yet. (at least to my knowledge and what I got from
> discussions with Bert, he can explain that in more detail than I can).

Indeed, you would have to use a Cairo backend that actually uses the
GPU (see http://cairographics.org/backends). Currently it uses the
image backend to an in-memory surface, bitblts that to Display, which
then is put onto the screen by the usual VM mechanics. No speedup
yet, this was about rendering quality foremost.

But if we could get a handle to a cairo surface that directly draws
onto an OS window (using the Win/Mac/Xlib backends) then we should
see a considerable speed-up. Areitha Ffenestri Romanus :)
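
To make the difference between the two paths concrete, here is a rough sketch in plain Python. It does not use the Cairo API at all; `Surface`, `draw`, `copy`, `current_path` and `direct_path` are hypothetical stand-ins. It only counts how many full-buffer copies happen between rendering and the screen in each arrangement.

```python
class Surface:
    """A named pixel buffer (hypothetical stand-in for a drawing surface)."""

    def __init__(self, name, size):
        self.name = name
        self.data = bytearray(size)


def draw(surface, value):
    """Pretend to render by filling the whole surface with one value."""
    surface.data[:] = bytes([value]) * len(surface.data)


def copy(src, dst, log):
    """Copy one surface onto another and record that a copy happened."""
    dst.data[:] = src.data
    log.append((src.name, dst.name))


def current_path(value, size=16):
    """Render to an in-memory image, blit to Display, then push to screen."""
    image = Surface("in-memory image surface", size)
    display = Surface("Display", size)
    screen = Surface("screen", size)
    log = []
    draw(image, value)
    copy(image, display, log)    # the extra bitblt
    copy(display, screen, log)   # the usual VM screen update
    return screen, log


def direct_path(value, size=16):
    """Render straight onto a surface backed by the OS window."""
    screen = Surface("screen", size)
    draw(screen, value)
    return screen, []


screen_a, copies_a = current_path(7)
screen_b, copies_b = direct_path(7)
```

Both paths end with the same screen contents; under these assumptions the first performs two intermediate copies and the second none, which is where the hoped-for speed-up would come from.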

- Bert -