Pi performance fun


Pi performance fun

timrowledge
A Raspberry Pi is not a terribly fast machine; after all, it's only 700MHz with a fairly slow memory system. It's only $35 for ghu's sake.

Still, it turns out that with some work you can make it do things reasonably quickly. I've been working on getting the stack vm running as a precursor to a full Cogit and the results are quite pleasing so far; it's around 50% faster than the plain interpreter. That's not a huge change but one must remember that the stack vm is really just a way of introducing the new stack and object memory underpinnings - the interpreter is barely changed.

As part of the benchmarking for this work and some exciting changes to BitBLT for ARMs I resurrected the ancient PARC benchmark code we used to use in the days of striving for Dorado-equivalent performance. They're not really useful for much more than historical (and romantic) comparison but the cool thing is that
a) in a modern image they show that indeed the stack vm is 50% faster, agreeing with much bigger benchmarks
b) using a very old image that I used to benchmark changes to the 2.8 era VMs, we see that the Pi running the plain interpreter scores about the same as a 600MHz Pentium 3 did in '00. It's about the same as my old Iyonix ARM machine, too - that was a 600MHz Intel(!) XScale 80321 with a very fast (for the time) memory and a fast (for the time) graphics card. Either machine cost around 100 times as much as the Pi does now; add inflation. This old image cannot run on the stackvm due to the Great Image Shift.
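
For anyone comparing their own timings, here is a minimal sketch of how a "50% faster" figure might be derived from two elapsed times. It assumes "faster" means the ratio of run times minus one; the function name and the millisecond figures mentioned below are hypothetical and are not taken from the actual benchmark runs.

```c
/* Hypothetical helper: how much faster a candidate run is than a baseline,
   as a percentage, given the elapsed time of each run for the same workload.
   Assumes "N% faster" means (baseline time / candidate time - 1) * 100. */
double percent_faster(double baseline_ms, double candidate_ms)
{
    return (baseline_ms / candidate_ms - 1.0) * 100.0;
}
```

Under that reading, a benchmark taking 3000 ms on the plain interpreter and 2000 ms on the stack vm would count as 50% faster, even though the elapsed time only dropped by a third.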

The BitBLT work is quite interesting since it involves usurping the normal generated plugin to redirect to some very carefully written assembler code. You can do quite amazing things with the latest ARM graphics systems and knowledge of how to preload lines for the cache and interleave processing with the load queue and all without sacrificing too many virgins. Whether it will provide any useful lessons for other platforms is an open question; after all, we probably ought to be moving away from relying on bitblts and towards more modern vector libraries etc.
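
To give a rough idea of what "preloading lines for the cache" means, here is a small C sketch. It is not the hand-written ARM assembler described above, only an illustration of the general idea: hint the memory system about data a little way ahead so the load overlaps with work on the current words. The function name, the prefetch distance and the 32-byte line size are assumptions for illustration, and `__builtin_prefetch` is a GCC/Clang builtin rather than anything from the BitBLT plugin.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; real tuning depends on the CPU. */
#define WORDS_PER_LINE       8   /* 32-byte cache line / 4-byte pixels */
#define PREFETCH_AHEAD_WORDS 16  /* fetch two lines ahead of the copy  */

/* Copy `count` 32-bit pixels from src to dst. Once per (assumed) cache
   line, ask for a source line further ahead to be loaded, so that the
   load can proceed while the current words are being copied. */
void copy_pixels_prefetch(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((i % WORDS_PER_LINE) == 0 && i + PREFETCH_AHEAD_WORDS < count) {
            /* Arguments: address, 0 = read access, 0 = low temporal locality. */
            __builtin_prefetch(&src[i + PREFETCH_AHEAD_WORDS], 0, 0);
        }
        dst[i] = src[i];
    }
}
```

The prefetch is only a hint, so the copy gives the same result with or without it; whether it actually helps depends on the memory system and on how far ahead you fetch.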

If any of you have Pis and want to help with testing etc, let me know.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
If a train station is where the train stops, what is a work station?




Re: Pi performance fun

Brad Fuller-3
I'd be interested. I have a Pi-B that I just bought. Nothing is on it yet.
Will you make the code, and the procedure for putting it on the Pi, available
to all?

brad

On 5/24/2013 2:11 PM, tim Rowledge wrote:

> A Raspberry Pi is not a terribly fast machine; after all, it's only 700MHz with a fairly slow memory system. It's only $35 for ghu's sake.
> [...]



Re: Pi performance fun

timrowledge

On 24-05-2013, at 11:18 AM, Brad Fuller <[hidden email]> wrote:

> I'd be interested. I have a Pi-B that I just bought. Nothing is on it yet.
> Will you make the code, and the procedure for putting it on the Pi, available
> to all?

Of course - it's all open stuff. The stackvm code is already up there on source.squeak.org in 'VMMaker.oscog'. The bitblt stuff will be released when it isn't dangerous; amongst other things we've got to work out how to handle the mixing of generated code and very platform-specific parts a bit more neatly.

First thing to do is load Raspbian and get your Pi talking. The latest download apparently includes the latest improvements to Scratch that I have been working on too.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Moves his lips to pretend he's reading.




Re: Pi performance fun

J. Vuletich (mail lists)
In reply to this post by timrowledge
Hi Tim,

You used the Green Book benchmarks, right? Can you publish them?

Cool stuff!

Cheers,
Juan Vuletich




Re: Pi performance fun

timrowledge

On 24-05-2013, at 11:52 AM, "Juan Vuletich (mail lists)" <[hidden email]> wrote:

> Hi Tim,
>
> You used the Green Book benchmarks, right? Can you publish them?

Well of course; a slightly non-functional version has been on squeaksource for a long time now. There was one bug that I fixed, and one that I can't see a simple fix for; we'd need someone more in practice with what the compiler does to look at it. MacroBenchmarks>>macroBenchmarks1 seems to upset things a bit.

Assuming I haven't screwed up horribly, the latest version is at http://www.smalltalkhub.com/#!/~timrowledge/Benchmarking


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Advanced BASIC




Re: Pi performance fun

Casey Ransberger-2
In reply to this post by timrowledge
The stack VM is getting 50% better perf on your Pi? Wow. That's more than I expected.

I've neglected my Pi this month to chase a contract, but now that's out of the way, I'd like to participate. Let me know what I can do to help.

On May 24, 2013, at 11:11 AM, tim Rowledge <[hidden email]> wrote:

> A Raspberry Pi is not a terribly fast machine; after all, it's only 700MHz with a fairly slow memory system. It's only $35 for ghu's sake.
> [...]