A Thought about Multiple Cores

A Thought about Multiple Cores

Dan Ingalls
In 1978 we built a Smalltalk to run on the NoteTaker, an experimental
somewhat portable (we called it *trans* portable ;-) computer with an
8-MHz (this was considered fast ;-) 8086 processor.  In addition to
the ST processor board, there was an I/O board with a 4-MHz 8086, and
also an Ethernet board with an 8-MHz 8086.  We got ST going on it,
and it actually ran pretty well.

But you know how it is -- faster is always better.  We weren't doing
anything with the Ethernet board, and I started to get ideas...

This was the first Smalltalk that did all graphics with BitBlt and
Smalltalk so, eg, while displaying text, there was as much work
running the bytecodes to pick characters and set up BitBlt as there
was to actually do the BitBlt operations.  So it occurred to me to
have the BitBlt primitive simply put BitBlt requests on a queue, and
then use the Ethernet processor to read the queue and do the actual
Blt.  I can't believe that I actually did this, but the truth is that
it worked beautifully.  We only used it a few times because there was
only one working Ethernet board (for that matter there was usually
only one working NoteTaker ;-) but the result was nearly a full
factor of two improvement in how the UI behaved.  Obviously care had
to be taken to wait for completion when the destination was not the
Display, but it's easy to check and occurs relatively infrequently.
[I can't remember, but I think I simply ran these in the ST processor
rather than queuing them for the Ether processor, so in some cases
BitBlt could actually be running in *both* processors]

So this is just a suggestion that, when multiple cores are available,
Squeak might well run such things as screen updates almost twice as
fast if we put the BitBlt engine (ie everything after the tests for
failures) in a separate thread from the rest of Squeak.  What's nice
is that it's a very local change with *no* impact to the rest of
Squeak except to make things run faster.
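
[Illustration: a minimal sketch of the queuing scheme described above, written in Python rather than in the VM's own code. Every name here (Form, blt, BltEngine, copy, wait) is hypothetical and none of it comes from the NoteTaker or Squeak sources. It assumes a copy-only blt with no combination rules, and it assumes that blts to anything other than the display first wait for the queue to drain and then run in the calling thread, which is one reading of the recollection above. It shows the structure only; Python threads will not reproduce the speedup.]

```python
import queue
import threading


class Form:
    """Stand-in for a Smalltalk Form: a width x height grid of pixel values."""

    def __init__(self, width, height, fill=0):
        self.width = width
        self.height = height
        self.bits = [[fill] * width for _ in range(height)]


def blt(src, dest, dx, dy):
    """Copy all of src into dest at (dx, dy), clipped to dest (copy rule only)."""
    for y in range(src.height):
        for x in range(src.width):
            tx, ty = dx + x, dy + y
            if 0 <= tx < dest.width and 0 <= ty < dest.height:
                dest.bits[ty][tx] = src.bits[y][x]


class BltEngine:
    """Queues blts aimed at the display for a worker thread; runs the rest inline."""

    def __init__(self, display):
        self.display = display
        self.requests = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Plays the role of the second processor: read the queue, do the blt.
        while True:
            src, dest, dx, dy = self.requests.get()
            try:
                blt(src, dest, dx, dy)
            finally:
                self.requests.task_done()

    def copy(self, src, dest, dx, dy):
        if dest is self.display:
            # The caller returns at once and goes on setting up the next blt.
            self.requests.put((src, dest, dx, dy))
        else:
            # The result may be read immediately and src may be the display,
            # so wait for pending work, then run in the calling thread.
            self.wait()
            blt(src, dest, dx, dy)

    def wait(self):
        """Block until every queued blt has completed."""
        self.requests.join()


display = Form(8, 2)
glyph = Form(2, 2, fill=1)
engine = BltEngine(display)
for column in range(0, display.width, glyph.width):
    engine.copy(glyph, display, column, 0)   # queued, returns immediately
engine.wait()                                # display is complete after this
```

[The sketch does not guard against the caller changing a source form while a queued blt that reads it is still pending; a real implementation would need to check for that case as well.]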

If someone has suggested this already I apologize -- I get behind on
my email sometimes.

        - Dan

Re: A Thought about Multiple Cores

Jim Gettys
We've been doing this with X since 1984....

It always confounded some people that apps would work faster across the
network than locally.

So yes, MP's help performance this way, with a suitable implementation
with queuing.
                                     - Jim


On Fri, 2007-01-26 at 22:41 -0800, Dan Ingalls wrote:

> [snip]
--
Jim Gettys
One Laptop Per Child



Re: A Thought about Multiple Cores

timrowledge
In reply to this post by Dan Ingalls
The Heeg port of objectworks to some dual cpu 68k workstation - back  
in the day when 68k cpus were the dogs nadgers - did this core/blt  
split. Mike Rueger can probably remember the details. It did the
bitblt and scanchars benchmarks *very* fast.

Then again the BrouHaHa/Archimedes port I did actually beat it on  
both those with a single 8MHz ARM2 :-) No cache at all, not even  
instruction prefetch. It's amazing what a clean instruction set and a  
barrel shifter can do for you.

I guess we could claim that using things like the Rome/Cairo plugin  
is doing a modern equivalent since it is supposed to pass off most  
rendering chores to the GPU. If Squeak made effective use of such  
capabilities for all the normal UI stuff it would certainly improve  
things; certainly it is effective in Sophie. Unfortunately there is  
still a huge amount of cruft in morphic/tweak that somehow soaks up  
cycles like a nasty corner on the Tour de France.

And there's Croquet of course.

tim
--
tim Rowledge; http://www.rowledge.org/tim
Strange OpCodes: DST: Deadlock System Tables



Re: A Thought about Multiple Cores

Michael Rueger
tim Rowledge wrote:
> The Heeg port of objectworks to some dual cpu 68k workstation - back in
> the day when 68k cpus were the dogs nadgers - did this core/blt split.
> Mike Rueger can probably remember the details. It did the bitblt and
> scanchars benchmarks *very* fast.

The PCS Cadmus had a second 68k processor integrated into the bitmap
display. I don't remember who did the coding (Hans Martin?), but we did
what Dan described, offloading bitblt and char scan to the display. Made
for an amazing Smalltalk experience back then :-) (250% Dorado)

> I guess we could claim that using things like the Rome/Cairo plugin is
> doing a modern equivalent since it is supposed to pass off most
> rendering chores to the GPU. If Squeak made effective use of such

The main bottleneck is that in most cases we still have at least one
extra bitblt as Squeak doesn't support a native surface as Display yet.
(at least to my knowledge and what I got from discussions with Bert, he
can explain that in more detail than I can).

Michael

Re: A Thought about Multiple Cores

Bert Freudenberg
On Jan 27, 2007, at 10:39 , Michael Rueger wrote:

> tim Rowledge wrote:
>> I guess we could claim that using things like the Rome/Cairo  
>> plugin is doing a modern equivalent since it is supposed to pass  
>> off most rendering chores to the GPU. If Squeak made effective use  
>> of such
>
> The main bottleneck is that in most cases we still have at least  
> one extra bitblt as Squeak doesn't support a native surface as  
> Display yet. (at least to my knowledge and what I got from  
> discussions with Bert, he can explain that in more detail than I can).

Indeed, you would have to use a Cairo backend that actually uses the  
GPU (see http://cairographics.org/backends). Currently it uses the  
image backend to an in-memory surface, bitblts that to Display, which  
then is put onto the screen by the usual VM mechanics. No speedup  
yet, this was about rendering quality foremost.

But if we could get a handle to a cairo surface that directly draws  
onto an OS window (using the Win/Mac/Xlib backends) then we should  
see a considerable speed-up. Areitha Ffenestri Romanus :)

- Bert -