hardware for eToys

hardware for eToys

Jecel Assumpcao Jr
I found the subheading for the following press release funny in the
context of Squeak:

http://www.lsi.com/news/product_news/2006_11_13.html

"ZEVIO 1020 processor provides best-in-class cost, power and performance
for the expanding digital consumer appliance market, including eToys and
portable navigation devices"

Though a 150MHz ARM9 would make for a rather slow Squeak machine (even
with all the neat graphics coprocessors it includes), at $8 (large
volumes) we can forgive this :-)

-- Jecel


Re: hardware for eToys

Klaus D. Witzel
Hi Jecel,

nice find, perhaps they learned the eToys name from the OLPC project ;-)

Question to the VM guru: are there any opcodes in the instruction set of  
ARM9 which allow a *fast* VM? A *small* VM is addressed by the thumb  
thing, IIRC. But what about speed. And what VM technology, (direct)  
threaded bytecode, or what does ARM9 support best.

Thank you for your time.

/Klaus





Re: hardware for eToys

Bert Freudenberg
The most severe drawback of ARM is lacking floating point support.  
That's one of the major reasons OLPC went with the Geode and not ARM.

- Bert -






Re: hardware for eToys

timrowledge
In reply to this post by Klaus D. Witzel

On 14-Nov-06, at 6:44 AM, Klaus D. Witzel wrote:

> Hi Jecel,
>
> nice find, perhaps they learned the eToys name from the OLPC  
> project ;-)
>
> Question to the VM guru: are there any opcodes in the instruction  
> set of ARM9 which allow a *fast* VM? A *small* VM is addressed by  
> the thumb thing, IIRC. But what about speed. And what VM  
> technology, (direct) threaded bytecode, or what does ARM9 support  
> best.
Well ARM is a generically nice little RISC cpu and the ARM9 is just  
one variation (actually ARM9 is a family designation for a range of  
actual chips) among many.
Things that are good for running a squeakish vm include good fast  
integer handling,  a barrel shifter, and perhaps most interesting the  
very fast call / return from subroutines that can reduce the use of  
inlining and save memory traffic. The fast interrupt handling can be  
useful too.
Bad things include the almost non-existent cache and the lack of an FPU.

I still think that a maxed-out ARM11 with the full 4MB cache, 4MB of
TCM and the FPU would be a very fast system. But nobody is banging
on my door to offer me one. :-(

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: MAW: Make Aggravating Whine




Re: hardware for eToys

Jecel Assumpcao Jr
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:
> nice find, perhaps they learned the eToys name from the OLPC project ;-)

This is a rather obvious name - I remember two companies fighting over
it back in the .com era.
 
> Question to the VM guru: are there any opcodes in the instruction set of  
> ARM9 which allow a *fast* VM? A *small* VM is addressed by the thumb  
> thing, IIRC. But what about speed. And what VM technology, (direct)  
> threaded bytecode, or what does ARM9 support best.

This chip has the Jazelle 1 technology, which is unfortunately hardwired
for Java. The most common bytecodes get translated on the fly into a
single ARM instruction (just like Thumb) while the more complex ones
trap into a software implementation. An equivalent circuit for handling
Squeak bytecodes would be fantastic.

Though it lacks the register windows I like so much, the ARM has been my
favorite RISC instruction set since I studied it 21 years ago. Only in
the last couple of months have I been able to come up with something I
feel is nicer (http://www.merlintec.com:8080/Hardware/RISC42). Two
things that make the ARM great for implementing stuff like Squeak are
the ability to use shifts and rotates with every instruction (allowing
up to four registers to be specified instead of just three) and the
ability to conditionally execute any instruction. The latter reduces the
cost of doing quick stuff like clearing tags.
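
A minimal sketch of why cheap tag clearing matters, written in Python because the arithmetic is the point rather than the ARM encoding. It assumes a Squeak-style SmallInteger layout where the low bit is the tag (the value n is stored as 2n+1). The function names are made up for illustration and overflow checks are left out.

```python
SMALLINT_TAG = 1  # assumed layout: low bit set marks a SmallInteger


def tag(n):
    """Encode the integer n as a tagged word (2n + 1)."""
    return (n << 1) | SMALLINT_TAG


def untag(oop):
    """Recover the integer from a tagged word with one arithmetic shift."""
    return oop >> 1


def is_small_int(oop):
    return (oop & SMALLINT_TAG) == SMALLINT_TAG


def add_small_ints(a, b):
    """Fast path for SmallInteger addition on tagged words.

    Clearing the tag of one operand and adding the other gives
    2x + (2y + 1) = 2(x + y) + 1, which is already the tagged sum.
    Returns None when either operand is not a SmallInteger, standing in
    for "fall back to a real message send".
    """
    if is_small_int(a) and is_small_int(b):
        return (a & ~SMALLINT_TAG) + b
    return None


print(untag(add_small_ints(tag(3), tag(4))))
```

On a classic ARM the tag test, the clear and the add can in principle each be a single, possibly conditional, instruction, so the fast path needs no branches. The Python version only models the bit manipulation.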

Bert Freudenberg wrote:
> The most severe drawback of ARM is lacking floating point support.  
> That's one of the major reasons OLPC went with the Geode and not ARM.

That is the official story, but I would guess that the fact that AMD was
funding the project was a more important reason. Consider the ARM7500FE
from the late 1990s, for example (click on "features"):

http://www.cirrus.com/en/products/pro/detail/P940.html

Its hardware floating point implementation was a good match for its
integer performance (both very weak by today's standards). The truth is
that most ARM customers don't care about floating point and so most off
the shelf variations don't include it. Though OLPC had a minimum
projected volume of tens of millions and is an extremely cost sensitive
project, they decided early on to limit themselves to chips that were
already in use in other high volume products. This meant they would
avoid any components just being launched and would not design any chips
of their own. With that limitation then floating point is indeed a
problem for the ARM. But in the end they had to create two custom chips
for the project (display controller for the special LCD and the camera
and flash controller).

Alex Perez pointed out this very cool variation of the ARM that will
soon be available in production quantities:

> http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=i.MX31&nodeId=01J4Fs2973ZrDR

The vector floating point unit seems like a great option which Squeak's
FloatArrays could easily be patched to take full advantage of. In
contrast the Geode's floating point performance turned out to be only
about half of what the OLPC people originally had thought it would be.

-- Jecel


Jazelle [was: hardware for eToys]

Klaus D. Witzel
In reply to this post by Klaus D. Witzel
Hi Jecel,

on Tue, 14 Nov 2006 20:49:09 +0100, you wrote:

> Klaus D. Witzel wrote:
>> nice find, perhaps they learned the eToys name from the OLPC project ;-)
>
> This is a rather obvious name - I remember two companies fighting over
> it back in the .com era.
>
>> Question to the VM guru: are there any opcodes in the instruction set of
>> ARM9 which allow a *fast* VM? A *small* VM is addressed by the thumb
>> thing, IIRC. But what about speed. And what VM technology, (direct)
>> threaded bytecode, or what does ARM9 support best.
>
> This chip has the Jazelle 1 technology,

Ah, I didn't associate Jazelle with ARM9. I thought that Jazelle was still  
for the lab guys.

> which is unfortunately hardwired
> for Java. The most common bytecodes get translated on the fly into a
> single ARM instruction (just like Thumb) while the more complex ones
> trap into a software implementation. An equivalent circuit for handling
> Squeak bytecodes would be fantastic.

Well, even the JVM is a universal Turing machine ;-)

I have a Squeak compiler which emits the JVM's bytecodes (into regular  
class files). The run-time is hand-crafted as a very thin layer (currently  
around SmallIntegers, Characters and Floats and other basic material like  
ByteArray, OrderedCollection and BytesWriteStream [a companion of  
aByteArray>>writeStream] and analog to that String and CharsWriteStream,  
plus FileDirectory and the bit of refactoring done with Magnitude).

Just sufficient support for the compiler compiling itself :)

Now every JVM does dynamic dispatch if the correct opcodes are used (O.K.  
the bytecode verifier has to be convinced but that is only pro-forma and  
does not affect the run-time; recently Adrian Kuhn had the same idea how  
to do this; even GCJ knows how to do this :) No "Invokedynamic" is needed  
(except if that would automagically do boxing/unboxing, then I'd employ  
"Invokedynamic"). And only field access (instVar, classVar) needs the CAST  
opcode ;-) Well, I think that especially Jazelle can live without the CAST  
opcode :)

I would like to see this all running on Jazelle but lack the expertise for  
choosing the right platform for *not* producing an expensive failure. Can  
you help me with your expertise to choose a Jazelle platform, *that* would  
be fantastic.

/Klaus



RISC42 [was: hardware for eToys]

Klaus D. Witzel
In reply to this post by Klaus D. Witzel
Hi Jecel,

on Tue, 14 Nov 2006 20:49:09 +0100, you wrote:
...
>> Question to the VM guru: are there any opcodes in the instruction set of
>> ARM9 which allow a *fast* VM? A *small* VM is addressed by the thumb
>> thing, IIRC. But what about speed. And what VM technology, (direct)
>> threaded bytecode, or what does ARM9 support best.
>
...
> Though it lacks the register windows I like so much, the ARM has been my
> favorite RISC instruction set since I studied it 21 years ago. Only in
> the last couple of months have I been able to come up with something I
> feel is nicer (http://www.merlintec.com:8080/Hardware/RISC42). Two
> things that make the ARM great for implementing stuff like Squeak are
> the ability to use shifts and rotates with every instruction (allowing
> up to four registers to be specified instead of just three) and the
> ability to conditionally execute any instruction. The latter reduces the
> cost of doing quick stuff like clearing tags.

I like the LOGIC/KLOGIC instruction, it looks like a friend of BitBlt :)

How would Smalltalk's method lookup routine look on RISC42?

/Klaus



Re: Jazelle [was: hardware for eToys]

Jecel Assumpcao Jr
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote on Wed, 15 Nov 2006 05:35:21 +0100
> > This chip has the Jazelle 1 technology,
>
> Ah, I didn't associate Jazelle with ARM9. I thought that Jazelle was still  
> for the lab guys.

We had a thread about this before. ARM calls two entirely different
technologies "Jazelle". I described the older one below and it is
present in many commercially available chips. The new one is not out of
the labs and is just a special variation of their Thumb instruction set
optimized as a target for JIT compilers (so it would be nice for
Exupery, for example).

> > which is unfortunately hardwired
> > for Java. The most common bytecodes get translated on the fly into a
> > single ARM instruction (just like Thumb) while the more complex ones
> > trap into a software implementation. An equivalent circuit for handling
> > Squeak bytecodes would be fantastic.
>
> Well, even the JVM is a universal Turing machine ;-)

Sure.

> I have a Squeak compiler which emits the JVM's bytecodes (into regular  
> class files). The run-time is hand-crafted as a very thin layer (currently  
> around SmallIntegers, Characters and Floats and other basic material like  
> ByteArray, OrderedCollection and BytesWriteStream [a companion of  
> aByteArray>>writeStream] and analog to that String and CharsWriteStream,  
> plus FileDirectory and the bit of refactoring done with Magnitude).
>
> Just sufficient support for the compiler compiling itself :)

Very nice. I think there have been other Smalltalks which ran on the
JVM. At least I seem to remember something called "Bistro".
 
> Now every JVM does dynamic dispatch if the correct opcodes are used (O.K.  
> the bytecode verifier has to be convinced but that is only pro-forma and  
> does not affect the run-time; recently Adrian Kuhn had the same idea how  
> to do this; even GCJ knows how to do this :)

In the case of someone just wanting to use the Jazelle technology for
Squeak instead of running it on a regular JVM there is no need to worry
about thinking like the bytecode verifier. You can bend the rules as
much as you like.

> No "Invokedynamic" is needed  
> (except if that would automagically do boxing/unboxing, then I'd employ  
> "Invokedynamic"). And only field access (instVar, classVar) needs the CAST  
> opcode ;-) Well, I think that especially Jazelle can live without the CAST  
> opcode :)

Probably.

About the Invokedynamic - given how Jazelle works (essentially a
bytecode->ARM translation ROM with all the complicated instructions
translated to "call xxx") it is probably no more and no less costly
than the other invoke bytecodes. So some optimizations you might need on
a normal JVM might not get the same results here.

> I would like to see this all running on Jazelle but lack the expertise for  
> choosing the right platform for *not* producing an expensive failure. Can  
> you help me with your expertise to choose a Jazelle platform, *that* would  
> be fantastic.

That depends on many details. Since this is very off topic we should
move it off squeak-dev.

-- Jecel


Re: RISC42 [was: hardware for eToys]

Jecel Assumpcao Jr
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote on Wed, 15 Nov 2006 05:40:06 +0100
> I like the LOGIC/KLOGIC instruction, it looks like a friend of BitBlt :)

That is the idea, though perhaps a KSHIFT/LOGIC combination (which the
ARM can also handle) would be even more important for the graphics
primitives.
 
> How would Smalltalk's method lookup routine look on RISC42?

This design was created for running C, not Smalltalk. So the answer is
that it would be just the same as on any other generic RISC. That said,
it would be nice to run SqueakNOS on this. A teacher at a local
university was trying to put together a project to create an open
source RISC core with his students, and that inspired me to come up with
this, but the idea is that they would do the actual work of implementing
it.

The designs that I *am* working on are optimized for Smalltalk. The
current 16 bit version (http://www.merlintec.com:8080/Hardware/Oliver)
essentially has a vast table of all class/selector combinations. Each
table entry is two words long and holds the actual instructions for the
corresponding method. So you just have to do "call
table[class(receiver),selector]" which, not counting cache misses, can
be executed in a single clock cycle (the "class(receiver)" function
doesn't involve a memory access since I use a variation of the class
encoding scheme from Smalltalk-74's OOZE). The bulk of the table entries
(over 80%) are for invalid class/selector combinations and their code is
just a jump to the MNU routine. Many other entries are for very short
methods that fit in the two words. For longer methods the table entry
has a jump to the rest of the method (since it is "long" anyway, the
overhead of the jump won't be too bad).
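
A minimal sketch of the full class/selector table described above, in Python. It is only a software model under stated assumptions: classes and selectors are small integers (the numbers below are hypothetical), receivers carry their class number directly, and table entries are Python callables rather than two words of machine instructions.

```python
class MessageNotUnderstood(Exception):
    """Raised by the default entry for invalid class/selector pairs."""


# Hypothetical class and selector numbers; a real system would assign these.
POINT, STRING = 0, 1
SEL_X, SEL_SIZE, SEL_REVERSED = 0, 1, 2


def mnu(receiver, *args):
    raise MessageNotUnderstood(receiver)


def build_table(n_classes, n_selectors, methods):
    """Fill every slot with the MNU routine, then install the real methods."""
    table = [[mnu] * n_selectors for _ in range(n_classes)]
    for (cls, sel), method in methods.items():
        table[cls][sel] = method
    return table


def class_of(receiver):
    # Receivers are (class_number, payload) pairs here, so finding the class
    # needs no lookup, loosely mirroring a class encoded in the pointer.
    return receiver[0]


def send(table, receiver, selector, *args):
    """One indexed call, with no search through a class hierarchy."""
    return table[class_of(receiver)][selector](receiver, *args)


def invalid_fraction(table):
    """Share of entries that only jump to the MNU routine."""
    slots = [entry for row in table for entry in row]
    return sum(entry is mnu for entry in slots) / len(slots)


methods = {
    (POINT, SEL_X): lambda rcv: rcv[1][0],
    (STRING, SEL_SIZE): lambda rcv: len(rcv[1]),
    (STRING, SEL_REVERSED): lambda rcv: (STRING, rcv[1][::-1]),
}
table = build_table(2, 3, methods)
print(send(table, (POINT, (3, 4)), SEL_X), invalid_fraction(table))
```

Even in this toy table half the slots are MNU entries, which hints at why the real table is mostly invalid combinations.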

This scheme wastes memory, but I happen to have 8MB on a machine that
would work just fine with 2MB or less. So I have 6MB that would be
useless otherwise and wasting them to save a few clocks per message send
is a great option.

For a 32 bit Smalltalk this wouldn't work. The Squeak 3.8 image I am
typing this in has 2339 Metaclasses and 40849 ByteSymbols, and so would
need a table with 95 million entries (764MB if each entry has two words
of 32 bits each). So for that case I have an alternative called "PIC
Mode". I should put a description of this on my swiki, but basically it
scales far better and allows type feedback for optimizing compilers but
sends take a few more clocks than in the 16 bit version (hopefully
inlining eliminates many sends and makes up for that).
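
The table size quoted above can be checked with a few lines of Python. The only assumption added here is that "MB" means 10^6 bytes, which is the reading that reproduces the 764MB figure.

```python
metaclasses = 2339        # from the Squeak 3.8 image mentioned above
byte_symbols = 40849      # potential selectors in that image
words_per_entry = 2
bytes_per_word = 4        # 32 bit words

entries = metaclasses * byte_symbols
table_bytes = entries * words_per_entry * bytes_per_word

print(f"{entries:,} entries")
print(f"{table_bytes / 10**6:.0f} MB (decimal)")
print(f"{table_bytes / 2**20:.0f} MiB (binary)")
```

Either way the table would be hundreds of times larger than the 8MB available on the 16 bit machine, which is why a different scheme is needed.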

-- Jecel


Re: RISC42 [was: hardware for eToys]

Alan Kay
In reply to this post by Klaus D. Witzel
Hi Jecel --

>The designs that I *am* working on are optimized for Smalltalk. The
>current 16 bit version (http://www.merlintec.com:8080/Hardware/Oliver)
>essentially has a vast table of all class/selector combinations.

We called this the "giant hash table scheme" at PARC and thought
about using some (future) VM hardware to look up this more useful
virtual address for methods.

I'd love to hear how this works out (and what kinds of HW would do it
better, like a specially programmed FPGA, etc.).

Cheers,

Alan




message dispatch (was: RISC42)

Jecel Assumpcao Jr
Alan Kay wrote on Fri, 17 Nov 2006 08:15:51 -0800
> >[vast table of all class/selector combinations]
>
> We called this the "giant hash table scheme" at PARC and thought
> about using some (future) VM hardware to look up this more useful
> virtual address for methods.

In this case I have the full table, not a hash one. The address is
virtual in the sense that I have an object table, but otherwise it is
pretty much a physical one. When David Ungar saw my talk about this at
OOPSLA 2003, where I mentioned I had only gone this route because the
smallest memory I could buy cheaply was an 8MB SDRAM chip for $2, he
wondered if some papers being presented at that conference wouldn't
become irrelevant in the future due to brute force advances in hardware.
But I think this particular machine is very atypical and all the great
message dispatch theory of the past still applies. In fact, I would
recommend that people interested in this read this comparison paper:

http://www.cs.ucsb.edu/~urs/oocsb/papers/dispatch.html

But for this machine I had 512KB of permanent storage (90KB of which holds
the bits for programming the FPGA) and 8MB of RAM. So though the system
is now Smalltalk, it has to be as tiny as the Forth it originally was in
order to fit in the Flash, yet it can waste megabytes of RAM on helper
tables to make things a little faster. This isn't the future, just a small
niche (a certain kind of embedded application).

Due to the 16 bit object pointers, this system has a very poor
reflective structure. Methods aren't objects, for example, but rather
all code for a class is lumped into a single vector. This starts out
with the corresponding row of the class/selector table and also contains
the rest of the methods that don't fit in two words. The selector times
2 directly gives you the initial value for the program counter (which is
an index into that vector) for the method.
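
A rough Python model of the per-class code vector described above. The vector starts with the class's row of two-word entries, longer methods are appended after the row, and the initial program counter is simply the selector number times 2. The tiny instruction set ("push", "add", "jump", "ret", "mnu", "nop") is invented for this sketch and is not the real machine's.

```python
WORDS_PER_ENTRY = 2


def build_class_vector(n_selectors, methods):
    """Lay out one class: table row first, overflow method bodies after it.

    methods maps a selector number to a list of instructions ending in ret.
    """
    vector = []
    for _ in range(n_selectors):
        vector += [("mnu",), ("nop",)]          # default: not understood
    for sel, body in methods.items():
        base = sel * WORDS_PER_ENTRY
        if len(body) <= WORDS_PER_ENTRY:
            # Short method: its code lives directly in the table entry.
            padding = [("nop",)] * (WORDS_PER_ENTRY - len(body))
            vector[base:base + WORDS_PER_ENTRY] = list(body) + padding
        else:
            # Long method: the entry jumps to code stored after the row.
            vector[base] = ("jump", len(vector))
            vector[base + 1] = ("nop",)
            vector.extend(body)
    return vector


def run(vector, selector):
    """Interpret a method, starting at pc = selector * 2."""
    pc = selector * WORDS_PER_ENTRY
    stack = []
    while True:
        op = vector[pc]
        pc += 1
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op[0] == "jump":
            pc = op[1]
        elif op[0] == "ret":
            return stack.pop()
        elif op[0] == "mnu":
            raise LookupError(f"selector {selector} not understood")
        # "nop" does nothing


vector = build_class_vector(3, {
    0: [("push", 7), ("ret",)],                              # fits in place
    2: [("push", 1), ("push", 2), ("add",), ("ret",)],       # needs a jump
})
print(run(vector, 0), run(vector, 2))
```

Adding a selector changes the length of the row, which moves every overflow body, so regenerating the whole vector is the simplest way to keep the layout consistent.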

Given how small the system is and how fast current machines are (even
this processor at only 54MHz) the implementation isn't very incremental.
When you add a new selector to the system all classes are regenerated to
expand their table part to make room for this. When the source for a
method is edited the whole class and subclasses (the system doesn't
actually have these, but these Smalltalk-80 terms give the rough idea)
are recompiled.

> I'd love to hear how this works out (and what kinds of HW would do it
> better, like a specially programmed FPGA, etc.).

The 32 bit version will be more interesting since brute force solutions
don't work. I have just created a new page to explain the "PIC Mode"
(http://www.merlintec.com:8080/hardware/33) but I probably won't be able
to put any real content there until next month. The idea is very simple,
however:

The processor has a normal execution mode and a PIC mode. During normal
execution instructions are fetched from the location in the code cache
pointed to by the program counter. It doesn't fetch instructions
directly from methods - you have to copy the bytecodes (or translate, if
the processor doesn't directly execute bytecodes) to the code cache and
jump there. Inside the processor there is normally an instruction cache.

Some instructions (like "send") can switch the processor to PIC mode. In
that case instructions are fetched from a PIC cache instead of the code
cache. The instruction doing the switch must supply a "type" parameter
and the PIC cache is accessed with a 64 bit <PC,Type> address. The PC
part is the address of the instruction which switched to PIC mode and
the instructions are streamed from the PIC cache without changing the
PC. If the whole cache line is used then execution goes back to the
normal mode at PC+1. PIC mode is also exited if any branch or call
instructions are executed.

So if you execute a send instruction then the next instruction is
determined by the receiver's class (type). Suppose that this send
bytecode is at PC=16r213412 in the code cache and that the PIC cache has
five entries tagged as <16r213412,X> with five different values of X. If
the receiver type doesn't match any of them we call the copier (or
translator) to create a new entry. If the receiver type is one of those
five, then several instructions (up to the size of a cache line) are
executed as if they were inlined between 16r213412 and 16r213413. A
very common case is for these instructions to simply call a given
method, but for simple and short methods they could be the body of the
method itself.

Some software has to manage all the related entries in the PIC cache and
this is the source of type information when a method is being recompiled
to generate more optimized code.
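
A small software model of the PIC cache behaviour described above, in Python. It is a sketch under assumptions: the cache is a dictionary keyed by (PC, type), the line size of four instructions is made up, and the `translate` function is a hypothetical stand-in for the copier/translator that fills a missing entry.

```python
LINE_WORDS = 4  # assumed PIC cache line size, in instructions


class PicCache:
    def __init__(self, translate):
        self.lines = {}            # (pc, receiver_type) -> instruction list
        self.translate = translate
        self.misses = 0

    def fetch(self, pc, receiver_type):
        """Return the instructions to stream for this send site and type."""
        key = (pc, receiver_type)
        if key not in self.lines:
            # No matching <PC,Type> entry: ask the translator for one.
            self.misses += 1
            line = self.translate(pc, receiver_type)
            assert len(line) <= LINE_WORDS
            self.lines[key] = line
        return self.lines[key]

    def types_seen_at(self, pc):
        """Type feedback: every receiver type recorded for one send site."""
        return sorted(t for (p, t) in self.lines if p == pc)


def translate(pc, receiver_type):
    # Hypothetical translator: the common case is a line that just calls
    # the method found for this receiver type.
    return [("call", f"{receiver_type}>>printOn:")]


SEND_PC = 0x213412  # the send site written as 16r213412 above
pic = PicCache(translate)
for receiver_type in ["Point", "String", "Point"]:
    pic.fetch(SEND_PC, receiver_type)
print(pic.misses, pic.types_seen_at(SEND_PC))
```

The `types_seen_at` query is the kind of information an optimizing compiler could read back when recompiling the method that contains the send.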

David Faught wrote:
> Wouldn't Content Addressable Memory (CAM) work for this?  I can
> remember reading about this years ago, and I know that some current
> network routers and switches use this to speed up their table lookups.

CAMs are the way to go at least for the caches inside the chip itself.
For larger second level caches that live in main memory you want to use
some kind of hashing solution. Unfortunately FPGAs are very bad at
implementing CAMs, so I am forced to use hashing for the first level
caches as well. On a custom chip version of the design I would do as you
suggested.
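
To make the trade-off concrete, here is a minimal Python sketch of a hashed, direct-mapped cache of the kind that is easy to build from ordinary RAM. The hash function (PC xor type, modulo the slot count) and the slot count are assumptions for illustration only. A CAM would instead compare the key against every stored tag at once, so two keys could never push each other out merely because they hash to the same slot.

```python
class DirectMappedCache:
    """Each key can live in exactly one slot, chosen by a hash."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots

    def _index(self, key):
        pc, type_id = key
        return (pc ^ type_id) % len(self.slots)   # assumed hash function

    def lookup(self, key):
        slot = self.slots[self._index(key)]
        if slot is not None and slot[0] == key:    # single tag comparison
            return slot[1]
        return None                                # miss

    def store(self, key, value):
        # Whatever was in the slot before is silently evicted.
        self.slots[self._index(key)] = (key, value)


cache = DirectMappedCache(8)
first, second = (0x10, 1), (0x18, 1)   # both hash to slot 1
cache.store(first, "line A")
hit_before = cache.lookup(first)
cache.store(second, "line B")
hit_after = cache.lookup(first)
print(hit_before, hit_after)
```

The second store evicts the first entry even though seven slots are still empty, which is the cost of hashing that a fully associative CAM avoids.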

-- Jecel


Re: Jazelle [was: hardware for eToys]

Klaus D. Witzel
In reply to this post by Klaus D. Witzel
Hi Jecel,

on Fri, 17 Nov 2006 00:22:08 +0100, you wrote:

> Klaus D. Witzel wrote on Wed, 15 Nov 2006 05:35:21 +0100
>> > This chip has the Jazelle 1 technology,
>>
>> Ah, I didn't associate Jazelle with ARM9. I thought that Jazelle was  
>> still
>> for the lab guys.
>
> We had a thread about this before. ARM calls two entirely different
> technologies "Jazelle". I described the older one below and it is
> present in many commercially available chips. The new one is not out of
> the labs and is just a special variation of their Thumb instruction set
> optimized as a target for JIT compilers (so it would be nice for
> Exupery, for example).

Ah, IC.

>> > which is unfortunately hardwired
>> > for Java. The most common bytecodes get translated on the fly into a
>> > single ARM instruction (just like Thumb) while the more complex ones
>> > trap into a software implementation. An equivalent circuit for  
>> handling
>> > Squeak bytecodes would be fantastic.
>>
>> Well, even the JVM is a universal Turing machine ;-)
>
> Sure.
>
>> I have a Squeak compiler which emits the JVM's bytecodes (into regular
>> class files).
...
>> Just sufficient support for the compiler compiling itself :)
>
> Very nice. I think there have been other Smalltalks which ran on the
> JVM. At least I seem to remember something called "Bistro".

Robert Tolksdorf collected an impressive list (200 different systems) of
systems which use the JVM as is:
- http://www.robert-tolksdorf.de/vmlanguages.html

>> Now every JVM does dynamic dispatch if the correct opcodes are used  
>> (O.K.
>> the bytecode verifier has to be convinced but that is only pro-forma and
>> does not affect the run-time; recently Adrian Kuhn had the same idea how
>> to do this; even GCJ knows how to do this :)
>
> In the case of someone just wanting to use the Jazelle technology for
> Squeak instead of running it on a regular JVM there is no need to worry
> about thinking like the bytecode verifier. You can bend the rules as
> much as you like.

Sure :) My point was just to keep the verifier convinced while still
debugging and testing on regular JVM platforms. That saves a lot of time
and headaches. The first utility I wrote in this direction replaces
CAST bytecodes with NOOP. Many CASTs are only required by stupid
static-minded regular javac / GCC, not by the bytecode verifier or the
JVM ;-)

>> No "Invokedynamic" is needed
>> (except if that would automagically do boxing/unboxing, then I'd employ
>> "Invokedynamic"). And only field access (instVar, classVar) needs the  
>> CAST
>> opcode ;-) Well, I think that especially Jazelle can live without the  
>> CAST
>> opcode :)
>
> Probably.
>
> About the Invokedynamic - given how Jazelle works (essentially a
> bytecode->ARM translation ROM with all the complicated instructions
> translated to "call xxx") it is probably no more and no less costly
> than the other invoke bytecodes. So some optimizations you might need on
> a normal JVM might not get the same results here.

Of course.

>> I would like to see this all running on Jazelle but lack the expertise  
>> for
>> choosing the right platform for *not* producing an expensive failure.  
>> Can
>> you help me with your expertise to choose a Jazelle platform, *that*  
>> would
>> be fantastic.
>
> That depends on many details. Since this is very off topic we should
> move it off squeak-dev.

O.K., can you please email items / topics / concerns for starting an
off-list discussion? Thank you. Even if it turns out to be infeasible, I'd
like to find out why. OTOH, if it's feasible and not expensive, I'd like
to create such a system on as common a HW+SW platform as possible :)

/Klaus

> -- Jecel
>
>