CorruptVM slang (Was: Re: [squeak-dev] Anyone know the following about Slang?)


Igor Stasenko
2008/7/5 tim Rowledge <[hidden email]>:

>
> On 4-Jul-08, at 6:35 AM, Eliot Miranda wrote:
>>
>> [snip]
>> Thanks Tim!  That's what I needed.  Being pointed to the right place.  It
>> has taken 20 minutes to understand the code and 20 minutes to fix it.
>>  Thanks so much!!
>
> Nice to have actually achieved something this week; it's been one of those
> weeks...
>
> Simulating simulating the VM to gather type data seems like a pretty complex
> project. I can't help feeling it would be simpler and faster to simply write
> the VM cleanly, with decent documentation and specs.
>
> Igor, if you can produce a Better Slang With Lambdas, do please share the
> code. There has to be some way of cleaning up the current mess.
>

It is already done. :)

Native methods, which fully replace primitives in CorruptVM, have a
special syntax.
Here is an example of a <native> method in CorruptVM:

lookup: selector
        <native>
        <variable: #delegate knownAs: #VTable>
        <variable: #vec knownAs: #Vector>
        <variable: #assoc knownAs: #Association>
        | vec delegate |
       
        delegate := self.
        delegate equal: nil whileFalse: [
                vec := delegate bindings.
                1 to: vec size do: [:i |  | assoc |
                        assoc := vec at: i.
                        assoc key equal: selector ifTrue: [
                                ^ assoc value
                                ]
                        ].
                delegate := delegate delegate.
        ].
        ^ nil
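
For readers more at home in C, here is a rough sketch of the same
delegation-chain walk. The struct layouts and names (VTable, Association,
vtable_lookup) are assumptions made for illustration only, not CorruptVM's
actual object formats. Selectors are compared by address, mirroring the
word-level #equal: comparison in the native method.

```c
#include <stddef.h>

/* Hypothetical layouts, loosely mirroring the Smalltalk classes above. */
typedef struct Association {
    const void *key;    /* selector, compared by identity (address) */
    void *value;        /* the method bound to that selector */
} Association;

typedef struct VTable {
    Association *bindings;     /* plays the role of 'vec' */
    size_t size;               /* number of bindings */
    struct VTable *delegate;   /* parent table, NULL at the end of the chain */
} VTable;

/* Walk the delegation chain; answer the bound value or NULL if not found. */
void *vtable_lookup(const VTable *table, const void *selector)
{
    for (const VTable *d = table; d != NULL; d = d->delegate) {
        for (size_t i = 0; i < d->size; i++) {
            if (d->bindings[i].key == selector)
                return d->bindings[i].value;
        }
    }
    return NULL;
}
```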

A <native> pragma tells the compiler to switch to a different logic.
The logic is simple:
all variables and values are machine words (32/64/arbitrary-bit ints);
messages like +, -, *, / and bit shifts are pointer arithmetic, and
they behave exactly like the corresponding CPU instructions.

Blocks are not allowed in the syntax, except with some special messages
like #equal:whileTrue:, #equal:ifTrue: and variants (replace 'equal'
with 'greater', 'less' etc.), which are simply translated to branching
instructions.

To read memory you simply write:
memValue := someAddressValue readWord.  "there is also a #readByte"

To write, you type:
someAddressValue writeWord: value
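
In C terms, these messages correspond roughly to loads and stores through
an address held in an integer. The sketch below is only an analogy: the
names word_t, read_word, write_word and read_byte are made up here, and
it assumes a platform where an address fits in uintptr_t.

```c
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;   /* one machine word, wide enough for an address */

/* Analogue of: memValue := someAddressValue readWord. */
word_t read_word(word_t address)
{
    word_t value;
    memcpy(&value, (const void *)address, sizeof value);
    return value;
}

/* Analogue of: someAddressValue writeWord: value */
void write_word(word_t address, word_t value)
{
    memcpy((void *)address, &value, sizeof value);
}

/* Analogue of: someAddressValue readByte */
uint8_t read_byte(word_t address)
{
    return *(const uint8_t *)address;
}
```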

But if you look at the example method, it looks quite similar to regular
Smalltalk. That's because it uses static inlining.
The principle is simple: whenever the compiler sees a message send that
is not a special message (its selector is not found in the
CVSpecialMessages class), it does a method lookup and inlines the
method it finds.

Pragmas, like:
<variable: #vec knownAs: #Vector>
help the compiler with hints about where it should look for the message
implementation. In the example above, I send #at: to the 'vec' variable.
As a result, the compiler inlines the #at: method found either in the
Vector class or in one of its parents.
If the class of a variable is not specified, it is assumed to be
ProtoObject by default. If no implementation is found, the compiler
raises a compile error.
You can't recursively inline the same method. There is a simple check
which throws an error when you try to.

Messages to thisContext are treated as messages to the compiler itself,
and can therefore be used as preprocessing directives. For instance, I
found the following directive (or call it a macro) quite useful:

thisContext ifInlined: [ ... ] ifNotInlined: [ .. ].

With this, I can determine whether the current method is being compiled
for inlining or for a call from Smalltalk, and provide a different
implementation for each case.
An example is tagging small integers:
the method #size returns the size of an array. When inlined, it returns
the number of array elements as a machine word; when called from
Smalltalk, it returns a small integer. This avoids excessive
tagging/detagging in many places.
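
To make the tagging point concrete, here is a small C sketch. It assumes
the usual Squeak encoding of a small integer (value shifted left by one,
low bit set) and an arithmetic right shift for negative values, which
common compilers provide. The Array struct and function names are
hypothetical, not CorruptVM code.

```c
#include <stdint.h>

/* Tag a raw machine integer as a Squeak-style small integer. */
intptr_t tag_small_int(intptr_t raw)
{
    return (intptr_t)(((uintptr_t)raw << 1) | 1);
}

/* Recover the raw machine integer from a tagged small integer. */
intptr_t untag_small_int(intptr_t oop)
{
    return oop >> 1;   /* assumes arithmetic shift for negative values */
}

/* Hypothetical array header holding its element count as a raw word. */
typedef struct Array {
    intptr_t size;
} Array;

/* The "ifInlined:" flavour: callers are native code, so no tagging. */
intptr_t array_size_inlined(const Array *a)
{
    return a->size;
}

/* The "ifNotInlined:" flavour: the caller is Smalltalk, so tag the result. */
intptr_t array_size_from_smalltalk(const Array *a)
{
    return tag_small_int(a->size);
}
```

Native code that loops over the array can use the raw count directly,
and only the Smalltalk-facing entry point pays for the tagging.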

So, what it comes down to in essence:
- with native methods you can provide an implementation of any low-level
basic behavior.
- things are quite simple and you write code very similar to regular
Smalltalk, except that you need to keep in mind that every message send
in a native method is either inlined or a special low-level message.

So, in a future system, you can write a package which contains both
Smalltalk code and native code. You don't need plugins or anything else
external to make the code work right after it is loaded into the image.
Imagine a BitBlt package which contains everything in one place. Once
you load it, you have BitBlt, and you don't need to care about
compiling/downloading plugins.

As for translation to C:
it's really easy to write such a translator. You basically need to write
a lambda transformation which turns the lambdas into C code. This is
quite straightforward once you have low-level lambdas.

Simulation: it took me about 2 hours to implement a basic
CVCPUSimulator. It is really dumb and you won't find any complex logic
in it.

--
Best regards,
Igor Stasenko AKA sig.


Re: CorruptVM slang (Was: Re: [squeak-dev] Anyone know the following about Slang?)

pwl
Hi Igor,

Very nice work Igor.

What about special assembly language instructions for the various
architectures? Such as test and set (for managing concurrency control)?
How do you control the mapping to native code?

By lambda, do you mean to say that you're doing this at the level of
block closures rather than only the level of methods? Block closures
being more general purpose so that the entire method need not be a
"primitive". Never liked primitive methods... always thought that blocks
were the better place to hang primitives - as well as a host of other
capabilities - from.

You mentioned you do byte code to native code conversion with Exupery.
How does this work and how can one fine tune the generated code for the
various architectures and calling conventions (for interfacing with ugly
things like DLLs or Shared Libraries that static <as in frozen in time>
systems generate)?

How much work would it take to put the squeak code base upon a hydra
rewritten in the new improved slang+exupery?

How many architectures does exupery currently support?

What are your plans on sharing your "lambda" code improvements?

All the best,

Peter





Re: CorruptVM slang (Was: Re: [squeak-dev] Anyone know the following about Slang?)

Igor Stasenko
2008/7/5 Peter William Lount <[hidden email]>:

> Hi Igor,
>
> Very nice work Igor.
>
> What about special assembly language instructions for the various
> architectures? Such as test and set (for managing concurrency control)? How
> do you control the mapping to native code?
>
> By lambda, do you mean to say that you're doing this at the level of block
> closures rather than only the level of methods? Block closures being more
> general purpose so that the entire method need not be a "primitive". Never
> liked primitive methods... always thought that blocks were the better place
> to hang primitives - as well as a host of other capabilities - from.
>

Native methods, Smalltalk methods, and block closures from ST
methods are compiled to a common form: CompiledMethod.
The difference lies only in the preprocessing style.
A compiled method always expects its arguments to be passed via the
stack, but the rest is up to the implementor.
The compiler inlines the code for the method prologue/epilogue (the code
sits in CVStackConext & friends), so if you don't like how it's currently
done, you can always invent your own stack layout & formats.
Polymorphic sends (Smalltalk method lookup & call) are implemented in
the corresponding classes, and the compiler just inlines this code,
making very few assumptions about what the code does.

> You mentioned you do byte code to native code conversion with Exupery. How
> does this work and how can one fine tune the generated code for the various
> architectures and calling conventions (for interfacing with ugly things like
> DLLs or Shared Libraries that static <as in frozen in time> system generate?
>

No, I'm not using bytecodes at all. Translation is performed from method
source to lambdas. Currently it uses the AST created by the default
Squeak Parser, but that will be replaced by our own parser, which
translates sources directly to lambdas. That parser, written by Klaus,
is already in the package, but I haven't wired it up yet.

Interfacing with DLLs & other FFI stuff is up to the implementor. It's as
easy as writing code which calls an address stored in memory.
You can write such code yourself as:
result := someAddress call.
This generates a call to the address held in someAddress.

Maybe later I'll add a foreign function callout code generator, which
will automatically generate the code for coercing & pushing arguments
and converting the returned value. But that's optional.

There are some nuances with returned values.
Currently the compiler assumes that the returned value is in eax, but
that's not true for C functions which return floats or 64-bit values on
a 32-bit platform.
This would require adding some special instructions later, like:

someAddress fcall: pointerForStoringFloat  "here you provide two
addresses: the function pointer and a pointer to the memory location
where the returned floating point value should be stored"
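
As a loose C analogy for #call and the proposed #fcall:, the sketch below
calls through an address held in an integer. All names are invented for
illustration. It assumes a platform where a function address survives a
round trip through uintptr_t, which ISO C does not guarantee but common
platforms provide, and it uses double for the floating point result.

```c
#include <stdint.h>

typedef uintptr_t (*word_fn)(void);   /* callee answering a machine word */
typedef double (*double_fn)(void);    /* callee answering a float value */

/* Analogue of: result := someAddress call. */
uintptr_t call_address(uintptr_t address)
{
    return ((word_fn)address)();
}

/* Analogue of: someAddress fcall: pointerForStoringFloat
   The float result is stored where 'result' points instead of being
   returned in an integer register. */
void fcall_address(uintptr_t address, double *result)
{
    *result = ((double_fn)address)();
}

/* Two sample callees, only here so the sketch can be exercised. */
uintptr_t answer_42(void) { return 42; }
double answer_half(void) { return 0.5; }
```

For example, call_address((uintptr_t)&answer_42) answers 42, while
fcall_address((uintptr_t)&answer_half, &d) stores 0.5 in d.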

> How much work would it take to put the squeak code base upon a hydra
> rewritten in the new improved slang+exupery?
>

This project is not related to Hydra.
The earlier you start writing it, the faster you'll get results :)
But seriously, currently it's a big playground. Freedom from C opens a
huge field of possibilities for how the system can be implemented from
scratch.
Currently there are only a few bits which make the basic things work:
the Context format, the CompiledMethod format and the VTable format,
which are needed to perform polymorphic sends.

Low-level lambdas (the result of compiling a method) are intentionally
platform independent. They assume a virtual CPU with an unlimited number
of registers, plus a special register holding the stack value, a special
register holding the context value, and a register for the return
value.

And I'm using Exupery to get results fast. It lacks some instructions
(like ones working with byte-sized memory operands) and float math
support, but it's not hard to add them later. Since 99% of the code can
already work using the current Exupery features, I don't think it's a
big issue.
It's not using Exupery at full scale, only the last few classes
responsible for register allocation, instruction selection and
assembly.

> How many architectures does exupery currently support?
>

Well, originally it supported only i386. But I heard there are ports to ARM.
I don't think I'm the one to answer this question. Bryce knows more
about it. :)

> What are your plans on sharing your "lambda" code improvements?
>

It's free for anyone who wants to use it (MIT license). There are a lot
of things which need to be done before it can become a working system,
and the amount of work needed to replicate an environment such as
Squeak, for example, is enormous for a single person.
So I have no illusions that I could do it alone. But a team of people
can.



--
Best regards,
Igor Stasenko AKA sig.


Re: CorruptVM slang (Was: Re: [squeak-dev] Anyone know the following about Slang?)

Igor Stasenko
As a follow-up, here is an illustration of how much easier it is to
implement low-level behavior compared to the old ways:

Consider the current SmallInteger>>+ implementation:

The method in SmallInteger:
--
+ aNumber
       "Primitive. Add the receiver to the argument and answer with the result
       if it is a SmallInteger. Fail if the argument or the result is not a
       SmallInteger  Essential  No Lookup. See Object documentation
       whatIsAPrimitive."

       <primitive: 1>
       ^ super + aNumber
---

And the method in VMMaker:
---
primitiveAdd
        self pop2AndPushIntegerIfOK:
                (self stackIntegerValue: 1) + (self stackIntegerValue: 0)

which is translated to the following C code:

sqInt primitiveAdd(void) {
    sqInt integerResult;
    sqInt sp;

        /* begin pop2AndPushIntegerIfOK: */
        integerResult = (stackIntegerValue(1)) + (stackIntegerValue(0));
        if (successFlag) {
                if ((integerResult ^ (integerResult << 1)) >= 0) {
                        /* begin pop:thenPush: */
                        longAtput(sp = stackPointer - ((2 - 1) * BytesPerWord),
                                ((integerResult << 1) | 1));
                        stackPointer = sp;
                } else {
                        successFlag = 0;
                }
        }
}

---

Now compare the two methods above with a single native method which does it all itself:
---
SmallInteger>>+ aNumber
<native>

(aNumber bitAnd: 1) equal: 1 ifTrue: [
  | result |
  result := self >> 1 + (aNumber >>1 ).
  (result bitXor: result << 1) greaterOrEqual: 0 ifTrue: [
    ^ result << 1 bitOr: 1 ]
].

^ super perform: #+ with: aNumber   "perform:[with:..] is a special
message for doing a polymorphic send inside a native method"
----

Note that the check that self is a SmallInteger is gone, because it's
impossible to call this method on anything other than a SmallInteger
instance.
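
The tag check and the overflow test in the native method can be
paraphrased in C as below. This is only a sketch under the same
assumptions as the method itself (low-bit tagging, the receiver already
known to be tagged, arithmetic right shift of negative values). The
function name and the bool/out-parameter convention are invented here; a
false result stands for the fallback to the polymorphic send.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add two tagged small integers. Answers true and stores the tagged sum
   in *sum_oop when the argument is tagged and the result still fits;
   answers false when the caller should fall back to a full send. */
bool small_int_add(intptr_t self_oop, intptr_t arg_oop, intptr_t *sum_oop)
{
    /* (aNumber bitAnd: 1) equal: 1 */
    if ((arg_oop & 1) != 1)
        return false;

    /* result := self >> 1 + (aNumber >> 1); cannot overflow a word,
       because each operand has lost one bit of magnitude. */
    intptr_t result = (self_oop >> 1) + (arg_oop >> 1);

    /* (result bitXor: result << 1) greaterOrEqual: 0
       The top two bits agree exactly when result fits in word-size - 1 bits. */
    intptr_t shifted = (intptr_t)((uintptr_t)result << 1);
    if ((result ^ shifted) < 0)
        return false;

    *sum_oop = shifted | 1;   /* result << 1 bitOr: 1 */
    return true;
}
```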

Think about how much simpler it would be to implement methods which use
complex structures, or primitives which need to instantiate some
object(s) and are forced to use the SpecialObjectsArray. Also, there is
no need for coercion: you can simply do a type check and then inline code
which reads the value(s) from the object's slots.

A lot of crappy things will be wiped away. Not to mention that you can
distribute any native code in your package, and don't need to
accompany it with a pre-built plugin .dll etc.

--
Best regards,
Igor Stasenko AKA sig.