New, faster RISC OS Squeak

timrowledge
I know there are many of you that are just waiting with bated breath for a faster RISC OS Squeak; well now you can stop turning that unattractive shade of blue. I found a spot where a stupid amount of time was being wasted in the UI, fixed it and hey presto! A dramatic improvement in Morphic UI performance. No longer does typing drag horribly. No more do menus take a coffee break to appear.
http://www.rowledge.org/tim/squeak/Squeak3-9d-RISCOS.zip

As an aside - and talking of stupidly wasted time in the VM - it turns out that around 20% of the entire time is spent handling the insane nonsense of converting old-style Mac OS pixels into proper RISC OS pixels. Are x86 machines afflicted the same way? Maybe the fast graphics cards are able to mask the time taken, but I'm pretty sure they're spending some time on the job.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: HEM: Hide Evidence of Malfunction



Re: New, faster RISC OS Squeak

David T. Lewis
On Wed, Feb 06, 2013 at 08:09:48PM -0800, tim Rowledge wrote:
> I know there are many of you that are just waiting with bated breath for a faster RISC OS Squeak; well now you can stop turning that unattractive shade of blue. I found a spot where a stupid amount of time was being wasted in the UI, fixed it and hey presto! A dramatic improvement in Morphic UI performance. No longer does typing drag horribly. No more do menus take a coffee break to appear.
> http://www.rowledge.org/tim/squeak/Squeak3-9d-RISCOS.zip
>
> As an aside - and talking of stupidly wasted time in the VM- it turns out that around 20% of the entire time is spent handling the insane nonsense of converting old-style Mac OS pixels into proper RISC OS pixels. Are x86 machines afflicted the same way? Maybe the fast graphics cards are able to mask the time taken, but I'm pretty sure they're spending some time on the job.
>

Can you say which methods in Smalltalk (VMMaker) or functions in C are
consuming the 20% of processing? This sounds like something we should
profile on one or two other platforms but I'm not sure what to look for.

Dave


Re: New, faster RISC OS Squeak

timrowledge

On 06-02-2013, at 8:27 PM, "David T. Lewis" <[hidden email]> wrote:
>
> Can you say which methods in Smalltalk (VMMaker) or functions in C are
> consuming the 20% of processing? This sounds like something we should
> profile on one or two other platforms but I'm not sure what to look for.

It's very platform specific. Typically either ioShowDisplay or ioForceDisplayUpdate actually does the work of moving pixels from the object world to the glass. Squeak pixels are old Mac OS format - big endian, a specific R,G,B & Alpha layout - and have to be converted to the platform's needs. On RISC OS right now, each time the ioShowDisplay code is called, the relevant pixels are converted and copied to a shadow bitmap kept in RISC OS format; then when display update events come from the OS, the accumulated damage areas are copied from that to the glass. It takes roughly as long to convert each area as it takes to update the screen (on my Pi, that is. Some other RISC OS machines have faster memory busses or faster graphics subsystems) and for flashing a largish area of the screen that adds up fast.
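
For readers following along, the shadow-bitmap scheme described above can be sketched roughly as below. This is only an illustration, not the RISC OS VM source: the `Rect` type, `rect_union`, `convert_rect` and `passthrough_pixel` are hypothetical names, both bitmaps are assumed to be 32bpp with the same width, and the per-pixel conversion is left as a stand-in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rectangle type; right and bottom are exclusive. */
typedef struct { int left, top, right, bottom; } Rect;

/* Any function that turns one Squeak-format pixel into a platform pixel. */
typedef uint32_t (*PixelConverter)(uint32_t);

/* Stand-in converter; a real port would reorder the channels here. */
static uint32_t passthrough_pixel(uint32_t p) { return p; }

static int rect_is_empty(Rect r) {
    return r.left >= r.right || r.top >= r.bottom;
}

/* Grow the accumulated damage area so that it also covers rectangle b. */
static Rect rect_union(Rect a, Rect b) {
    Rect r;
    if (rect_is_empty(a)) return b;
    if (rect_is_empty(b)) return a;
    r.left   = a.left   < b.left   ? a.left   : b.left;
    r.top    = a.top    < b.top    ? a.top    : b.top;
    r.right  = a.right  > b.right  ? a.right  : b.right;
    r.bottom = a.bottom > b.bottom ? a.bottom : b.bottom;
    return r;
}

/* Convert the pixels of one changed rectangle from the Display bits
   into the platform-format shadow bitmap, leaving the rest untouched. */
static void convert_rect(const uint32_t *display, uint32_t *shadow,
                         int width_px, Rect r, PixelConverter convert) {
    for (int y = r.top; y < r.bottom; y++) {
        const uint32_t *src = display + (size_t)y * (size_t)width_px;
        uint32_t *dst = shadow + (size_t)y * (size_t)width_px;
        for (int x = r.left; x < r.right; x++) {
            dst[x] = convert(src[x]);
        }
    }
}
```

In this sketch each ioShowDisplay-style call would run `convert_rect` on the changed rectangle and fold it into the accumulated damage with `rect_union`; the later redraw event from the OS would then copy only that damage area from the shadow bitmap to the screen. The cost being discussed is the inner loop: every changed pixel is touched once for conversion and once again for the copy to the glass.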

It's a long time ago and I don't recall very precisely, but I think Andreas did some work on making bitblt to a platform bitmap work much better under certain circumstances. IIRC it sort-of did the right thing for RISC OS but failed horribly in some other way. I'll need to take another look at that.

Even longer ago I had code to make BitBLT work little-endian entirely, and horribly abused the RISC OS graphics system to make it think the Display bitmap object was the actual screen buffer.

Even longer ago than *that* I had a system that simply stole the display hardware entirely. Wouldn't work with any other application windows though. Not so popular… but bloody fast :-)

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: NNI: Neglect Next Instruction



Re: New, faster RISC OS Squeak

J. Vuletich (mail lists)
Hi Tim,

The Windows VM supports both big endian and little endian Display.  
This was done by Andreas and is supported in Cuis, see  
#setDisplayDepth. This essentially came from Squeak, so it's likely  
that Squeak can do the same.

On the Windows machines I tried, I could not see relevant performance  
improvements, and I guess this is because of Intel's instruction set  
and Andreas' performance tricks. But supporting this could be useful  
for RISC OS, right?

I remember in the old days of OS/2 Squeak, I added inline assembly to  
use the fast Intel asm op for byte reversing a 32 bit word...
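
A portable sketch of that kind of 32-bit byte reversal, for anyone who wants to experiment without inline assembly. The function name `byte_reverse32` is made up for illustration and this is not the OS/2 code; the instruction referred to is presumably x86 BSWAP, and GCC and Clang also offer `__builtin_bswap32` for the same job.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word: 0x11223344 becomes 0x44332211.
   Written with shifts and masks so it does not depend on host endianness. */
static uint32_t byte_reverse32(uint32_t w) {
    return (w >> 24)
         | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u)
         | (w << 24);
}
```

Applying it twice gives back the original word, which makes it easy to sanity-check.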

Cheers,
Juan Vuletich



Re: New, faster RISC OS Squeak

Hannes Hirzel
In reply to this post by timrowledge
On 2/7/13, tim Rowledge <[hidden email]> wrote:
> Even longer ago than *that* I had a system that simply stole the display
> hardware entirely. Wouldn't work with any other application windows though.
> Not so popular… but bloody fast :-)

This might be good for the R-Pi as well, if possible. I can see Squeak
as the only application running on RISC OS, full screen. If I want
something else I would go for another SD card with a completely
different environment.

--Hannes

Re: New, faster RISC OS Squeak

timrowledge
In reply to this post by J. Vuletich (mail lists)

On 07-02-2013, at 4:30 AM, "Juan Vuletich (mail lists)" <[hidden email]> wrote:
> The Windows VM supports both big endian and little endian Display. This was done by Andreas and is supported in Cuis, see #setDisplayDepth. This essentially came from Squeak, so it's likely that Squeak can do the same.

Yeah, that's pretty much what I remember. IIRC it sorta worked on RISC OS but something important didn't do the right thing… maybe display update areas were wrong? I'll have to try it out again sometime. To make life that *extra* bit fun, the RISC OS pixel format is quite different to that used by Windows, even when endian-ness is corrected for. So even when using 32bpp in Squeak and on screen I still have to swap bits around.
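
To illustrate the sort of bit swapping meant here, a minimal sketch that exchanges the red and blue channels of a 32bpp pixel. The layouts are assumptions made for the example, not taken from the VM source: the Squeak pixel is treated as 0xAARRGGBB and the target as 0xAABBGGRR, and `swap_red_blue` is a hypothetical name.

```c
#include <stdint.h>

/* Swap the red and blue bytes of a 32bpp pixel, keeping the top byte
   and green in place. Assumes 0xAARRGGBB in and 0xAABBGGRR out. */
static uint32_t swap_red_blue(uint32_t p) {
    return (p & 0xFF00FF00u)            /* top byte and green stay put */
         | ((p & 0x00FF0000u) >> 16)    /* red moves to the low byte */
         | ((p & 0x000000FFu) << 16);   /* blue moves up to bits 16-23 */
}
```

Unlike a plain endian correction, this leaves two of the four bytes where they are, so a byte-reversal instruction alone would not do the job under these assumed layouts.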

>
> On the Windows machines I tried, I could not see relevant performance improvements, and I guess this is because of Intel's instruction set and Andreas' performance tricks. But supporting this could be useful for RISC OS, right?

The good news for the future - such as it might be for RISC OS, hardly a major player these days - is that the Pi has some honkin' great big GPU stuff hidden in there. And the OS source is open, so potentially the assorted screen driving code could be mangled to accept Squeak-format pixels and tweak them in the GPU magic sauce.

>
> I remember in the old days of OS/2 Squeak, I added inline assembly to use the fast Intel asm op for byte reversing a 32 bit word…

Always fun to do devious stuff like that, isn't it :-)


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful Latin Phrases:- Canis meus id comedit = My dog ate it.