In 1978 we built a Smalltalk to run on the NoteTaker, an experimental somewhat portable (we called it *trans* portable ;-) computer with an 8-MHz (this was considered fast ;-) 8086 processor. In addition to the ST processor board, there was an I/O board with a 4-MHz 8086, and also an Ethernet board with an 8-MHz 8086. We got ST going on it, and it actually ran pretty well.

But you know how it is -- faster is always better. We weren't doing anything with the Ethernet board, and I started to get ideas...

This was the first Smalltalk that did all graphics with BitBlt and Smalltalk so, eg, while displaying text, there was as much work running the bytecodes to pick characters and set up BitBlt as there was to actually do the BitBlt operations. So it occurred to me to have the BitBlt primitive simply put BitBlt requests on a queue, and then use the Ethernet processor to read the queue and do the actual Blt. I can't believe that I actually did this, but the truth is that it worked beautifully. We only used it a few times because there was only one working Ethernet board (for that matter there was usually only one working NoteTaker ;-) but the result was nearly a full factor of two improvement in how the UI behaved. Obviously care had to be taken to wait for completion when the destination was not the Display, but it's easy to check and occurs relatively infrequently. [I can't remember, but I think I simply ran these in the ST processor rather than queuing them for the Ether processor, so in some cases BitBlt could actually be running in *both* processors]

So this is just a suggestion that, when multiple cores are available, Squeak might well run such things as screen updates almost twice as fast if we put the BitBlt engine (ie everything after the tests for failures) in a separate thread from the rest of Squeak. What's nice is that it's a very local change with *no* impact to the rest of Squeak except to make things run faster.

If someone has suggested this already I apologize -- I get behind on my email sometimes.

- Dan
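For readers who want to see the shape of the queuing scheme described above, here is a minimal sketch in Python. It is not the NoteTaker code or anything from the Squeak VM: the names `copy_rect` and `QueuedBlitter` are hypothetical, bitmaps are plain lists of rows, and a worker thread stands in for the second processor. For off-screen destinations the sketch takes the conservative reading of the description, draining the queue and then running the blit inline.

```python
import queue
import threading


def copy_rect(src, dest, sx, sy, dx, dy, w, h):
    """Copy a w-by-h rectangle from src to dest (stand-in for the BitBlt inner loop)."""
    for row in range(h):
        dest[dy + row][dx:dx + w] = src[sy + row][sx:sx + w]


class QueuedBlitter:
    """Hypothetical front end: display blits are queued, others run inline."""

    def __init__(self, display):
        self.display = display
        self._requests = queue.Queue()
        # The worker thread plays the role of the otherwise idle second processor.
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # Pull requests off the queue forever and perform the actual copy.
        while True:
            args = self._requests.get()
            try:
                copy_rect(*args)
            finally:
                self._requests.task_done()

    def blit(self, src, dest, sx, sy, dx, dy, w, h):
        args = (src, dest, sx, sy, dx, dy, w, h)
        if dest is self.display:
            # Screen update: enqueue and return to the caller immediately.
            self._requests.put(args)
        else:
            # Off-screen destination: the caller may read the result right
            # away, so wait for pending display blits, then copy inline.
            self.wait()
            copy_rect(*args)

    def wait(self):
        """Block until every queued blit has completed."""
        self._requests.join()


# Usage: draw a 2x2 "glyph" onto an 8x4 display, then wait for the worker.
display = [[0] * 8 for _ in range(4)]
glyph = [[1, 1], [1, 1]]
blitter = QueuedBlitter(display)
blitter.blit(glyph, display, 0, 0, 3, 1, 2, 2)
blitter.wait()
```

The sketch only shows the structure (a queue between the code that sets up a blit and the code that performs it). In standard CPython the two threads would not actually run the copy loops in parallel, so any speedup would need a VM-level implementation on real cores.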
We've been doing this with X since 1984....

It always confounded some people that apps would work faster across the network than locally.

So yes, MPs help performance this way, with a suitable implementation with queuing.

- Jim

On Fri, 2007-01-26 at 22:41 -0800, Dan Ingalls wrote:
> [...]
> So this is just a suggestion that, when multiple cores are available,
> Squeak might well run such things as screen updates almost twice as
> fast if we put the BitBlt engine (ie everything after the tests for
> failures) in a separate thread from the rest of Squeak. What's nice
> is that it's a very local change with *no* impact to the rest of
> Squeak except to make things run faster.
> [...]

Jim Gettys
One Laptop Per Child
In reply to this post by Dan Ingalls
The Heeg port of objectworks to some dual cpu 68k workstation - back in the day when 68k cpus were the dog's nadgers - did this core/blt split. Mike Rueger can probably remember the details. It did the bitblt and scanchars benchmarks *very* fast.

Then again the BrouHaHa/Archimedes port I did actually beat it on both those with a single 8MHz ARM2 :-) No cache at all, not even instruction prefetch. It's amazing what a clean instruction set and a barrel shifter can do for you.

I guess we could claim that using things like the Rome/Cairo plugin is doing a modern equivalent since it is supposed to pass off most rendering chores to the GPU. If Squeak made effective use of such capabilities for all the normal UI stuff it would certainly improve things; certainly it is effective in Sophie. Unfortunately there is still a huge amount of cruft in morphic/tweak that somehow soaks up cycles like a nasty corner on the Tour de France.

And there's Croquet of course.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: DST: Deadlock System Tables
tim Rowledge wrote:
> The Heeg port of objectworks to some dual cpu 68k workstation - back in
> the day when 68k cpus were the dogs nadgers - did this core/blt split.
> Mike Rueger can probably remember the details. It did the bitblt and
> scanchars benchmarks *very* fast.

The PCS Cadmus had a second 68k processor integrated into the bitmap display. I don't remember who did the coding (Hans Martin?), but we did what Dan described, offloading bitblt and char scan to the display. Made for an amazing Smalltalk experience back then :-) (250% Dorado)

> I guess we could claim that using things like the Rome/Cairo plugin is
> doing a modern equivalent since it is supposed to pass off most
> rendering chores to the GPU. If Squeak made effective use of such

The main bottleneck is that in most cases we still have at least one extra bitblt, as Squeak doesn't support a native surface as Display yet (at least to my knowledge and from what I got from discussions with Bert; he can explain that in more detail than I can).

Michael
On Jan 27, 2007, at 10:39 , Michael Rueger wrote:
> tim Rowledge wrote:
>> I guess we could claim that using things like the Rome/Cairo
>> plugin is doing a modern equivalent since it is supposed to pass
>> off most rendering chores to the GPU. If Squeak made effective use
>> of such
>
> The main bottleneck is that in most cases we still have at least
> one extra bitblt as Squeak doesn't support a native surface as
> Display yet. (at least to my knowledge and what I got from
> discussions with Bert, he can explain that in more detail than I can).

Indeed, you would have to use a Cairo backend that actually uses the GPU (see http://cairographics.org/backends). Currently it uses the image backend to render to an in-memory surface and bitblts that to Display, which then is put onto the screen by the usual VM mechanics. No speedup yet; this was about rendering quality foremost.

But if we could get a handle to a cairo surface that directly draws onto an OS window (using the Win/Mac/Xlib backends) then we should see a considerable speed-up.

Areitha Ffenestri Romanus :)

- Bert -