Compiler


Compiler

Colin Putney-3
So I was poking around in Compiler today, and noticed that it's a
bit... messy. I had a few ideas for improvement, but before I go
monkeying with such an important part of the system, I thought I'd
bring it up here. Do we have a long- or medium-term plan for how the
compiler should evolve?

As I see it there are a few paths available to us:

1. Incremental improvement. The compiler we have now is tried and
true. Now that we have proper block closures and a high performance
VM, there's no real need to improve bytecode generation, so we
shouldn't put much effort into this part of the system. We'll just
make small improvements and minor refactorings as needed.

2. Adopt an existing project. There have been a few "new compiler"
projects over the years, and one or another of them might present an
opportunity for significant improvement over the status quo. I'm
thinking of ByteSurgeon, Opal, AOStA etc. It's not something we'll
rush into, but eventually, when the code is mature, we'll want to
replace the current compiler.

3. Something completely new. Now that we have closures and a fast VM,
existing projects aren't relevant anymore, but we have new
opportunities for improvement. VM-level changes, such as a new object
format or new bytecodes could drive this option, if they're big enough
that significant work on the compiler is required anyway. Maybe we can
only see the broad outlines of what the project might look like at the
moment, but we can see it on the horizon.

So, are there any pain points right now that we should think about
addressing? Is anybody planning or considering working on something
compiler-related?

Eliot, is there anything in the new object format that will have an
impact on image-side compilation? I seem to remember you mentioning
something about efficiently supporting alternate bytecode sets. Is
that meant for Newspeak, or do you have something in mind for
Smalltalk?

I don't think we have to come up with a definitive plan just now, I
just want to get a sense of what people are thinking.

Colin


Re: Compiler

Eliot Miranda-2


On Tue, Jun 19, 2012 at 12:19 AM, Colin Putney <[hidden email]> wrote:
So I was poking around in Compiler today, and noticed that it's a
bit... messy. I had a few ideas for improvement, but before I go
monkeying with such an important part of the system, I thought I'd
bring it up here. Do we have a long- or medium-term plan for how the
compiler should evolve?

As I see it there are a few paths available to us:

1. Incremental improvement. The compiler we have now is tried and
true. Now that we have proper block closures and a high performance
VM, there's no real need to improve bytecode generation, so we
shouldn't put much effort into this part of the system. We'll just
make small improvements and minor refactorings as needed.

I wouldn't say that the bytecode set is satisfactory.  Some things that are wrong are
- limited to 256 literals
- limited to 32 temporaries, 16 arguments (IIRC)
- limited room for expansion (and in Newspeak there is no room for expansion)

I'd like to see a set with
- a primitive bytecode to move the primitive number field out of the header and make more room for num args, num literals (64k would be a fine limit) etc.
- a design based around prefixes for extended indices à la VW, which lifts the limits on addressability, arity etc. while keeping the bytecode compact
- some mild experiments, such as a nop that can be used e.g. to express metadata that guides the decompiler ("this is a to:do:", etc.).

Even more interesting would be metadata that allowed the discovery of inlined blocks so that e.g. mustBeBoolean is instead handled by dynamically creating closures and the relevant ifTrue:ifFalse: message so that these can be inlined for true/false but reimplemented for other objects.
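A rough sketch of that idea (class and fallback policy invented purely for illustration): if the inlined branches could fall back to a real ifTrue:ifFalse: send, any object could implement the boolean protocol itself instead of triggering mustBeBoolean:

```smalltalk
"Hypothetical sketch: a three-valued logic object answering the conditional
protocol itself, which today's unconditionally inlined branches prevent."
Object subclass: #Unknown
	instanceVariableNames: ''
	classVariableNames: ''
	category: 'Sketch'.

Unknown >> ifTrue: trueBlock ifFalse: falseBlock
	"Neither true nor false: evaluate neither branch, answer nil
	(an illustrative policy, not a proposal)."
	^ nil
```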
 
2. Adopt an existing project. There have been a few "new compiler"
projects over the years, and one or another of them might present an
opportunity for significant improvement over the status quo. I'm
thinking of ByteSurgeon, Opal, AOStA etc. It's not something we'll
rush into, but eventually, when the code is mature, we'll want to
replace the current compiler.

3. Something completely new. Now that we have closures and a fast VM,
existing projects aren't relevant anymore, but we have new
opportunities for improvement. VM-level changes, such as a new object
format or new bytecodes could drive this option, if they're big enough
that significant work on the compiler is required anyway. Maybe we can
only see the broad outlines of what the project might look like at the
moment, but we can see it on the horizon.

Well, my refactoring of the compiler to move instruction encoding out of ParseNode general instances and into BytecodeEncoder takes the pressure off as far as changing the bytecode set.  There's still a need for refactoring in InstructionStream and CompiledMethod to handle bytecode set change.  It is really a BytecodeEncoder or InstructionStream that understands how a bytecode set works, and not CompiledMethod (in e.g. readsField etc).

So, are there any pain points right now that we should think about
addressing? Is anybody planning or considering working on something
compiler-related?

For me at least it has to take second place to the new object representation, because there's much more benefit to be derived from the object representation.

Eliot, is there anything in the new object format that will have an
impact on image-side compilation?

I don't think so.  It should be entirely orthogonal.
 
I seem to remember you mentioning
something about efficiently supporting alternate bytecode sets. Is
that meant for Newspeak, or do you have something in mind for
Smalltalk?

It is a convenient way of migrating the bytecode set.  Better than my EncoderForLongFormV3 approach.

I don't think we have to come up with a definitive plan just now, I
just want to get a sense of what people are thinking.

Colin

--
cheers,
Eliot




Re: [Vm-dev] Re: [squeak-dev] Compiler

Colin Putney-3
On Tue, Jun 19, 2012 at 6:36 PM, Eliot Miranda <[hidden email]> wrote:

> Even more interesting would be metadata that allowed the discovery of inlined blocks so that e.g. mustBeBoolean is instead handled by dynamically creating closures and the relevant ifTrue:ifFalse: message so that these can be inlined for true/false but reimplemented for other objects.

Ah, interesting. That would be even better than the [aBlock value]
pattern in VW, effectively a PIC in the bytecode.

> Well, my refactoring of the compiler to move instruction encoding out of ParseNode general instances and into BytecodeEncoder takes the pressure off as far as changing the bytecode set.  There's still a need for refactoring in InstructionStream and CompiledMethod to handle bytecode set change.  It is really a BytecodeEncoder or InstructionStream that understands how a bytecode set works, and not CompiledMethod (in e.g. readsField etc).

Ok, it sounds to me like we're somewhere between 1. and 3. There's no
need for any big architectural changes to the compiler, but some would
be good. Thanks.

Colin


Re: Compiler

Bert Freudenberg
In reply to this post by Eliot Miranda-2
On 2012-06-20, at 03:36, Eliot Miranda wrote:

> I wouldn't say that the bytecode set is satisfactory.  Some things that are wrong are
> - limited to 256 literals
> - limited to 32 temporaries, 16 arguments (IIRC)

In theory 64 temps, in practice a unary method can have at most 56 temps, but that's shared with the stack, so if you are sending a message with 9 arguments it's only 46 usable temps. To overcome this we would need larger context objects. Not sure how hard-coded that is in the VM.

> - limited room for expansion (and in Newspeak there is no room for expansion)

I would add:

- limited to 1k jump distance

> I'd like to see a set with
> - a primitive bytecode to move the primitive number field out of the header and make more room for num args, num literals (64k would be a fine limit) etc.
> - a design based around prefixes for extended indices a la VW.  lifts the limits on addressability, arity etc while keeping the bytecode compact
> - some mild experiments such as a nop which can be used e.g. to express metadata that guides the decompiler (this is a to:do:, etc).

I have run into these hard limits, too. My idea was to hack the compiler on the image side using the current bytecode set. E.g., if there are more than 255 literals, use the 256th as a literal array and generate code that pulls the literal out of that array (and, if needed, performs the send). Similarly for temps.
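In source-level terms the transformation might look like this (selector and variable names invented; the real change would happen in the code generator, not in the source):

```smalltalk
"Sketch only: overflowLiterals stands for the Array kept in literal slot 256."

"Before (won't compile: would need literal slot 300):"
foo
	^ #literal300

"After (what the hacked code generator would emit, expressed as source):"
foo
	^ overflowLiterals at: 45
```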

Extending the jump distance is harder. One could imagine "jump pads" sprinkled throughout the compiled method, so a single jump would take several hops. That seems hairy, though. Or possibly one could compile the blocks that exceed the max distance as actual blocks + message sends.

This would allow large methods to compile, but methods exceeding the limits would be slow. For the generated code where I needed it that would have been fine, and saved considerable effort (I had to change the code generation to break up large methods into smaller ones, with temps spilling over into instance variables, and thisContext magic to emulate local returns).

So if someone wants to have fun with the compiler, working around (some of) these limits would be great. And if we get a new bytecode set, the code would become faster, too.

- Bert -



Re: Compiler

Hannes Hirzel
I would vote for 1, and then 'easy things first'.

"1. Incremental improvement. The compiler we have now is tried and
true. Now that we have proper block closures and a high performance
VM, there's no real need to improve bytecode generation, so we
shouldn't put much effort into this part of the system. We'll just
make small improvements and minor refactorings as needed."

I don't know whether overcoming the limit of 256 literals is easy or not, but that is something I would like to be able to use...

--Hannes




Re: Compiler

Nicolas Cellier
In reply to this post by Bert Freudenberg
2012/6/20 Bert Freudenberg <[hidden email]>:

> I have run into these hard limits, too. My idea was to hack the compiler on the image side using the current byte code set. E.g., if there are more than 255 literals, use the 256th as literal array and generate code that pulls the literal out of that array (and if needed performs the send). Similarly for temps.

Yes, I did that, and even the index for accessing into the big
literal array can become too big and thus have to be a literal itself, so
you have to cheat and recursively build arrays of arrays of ... or
compute the index from smaller byte-encoded literals
(fortunately, * and + don't consume a literal; they are special) :)
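Concretely, the trick could look like this (overflowLiterals is a hypothetical name for the big literal array):

```smalltalk
"Sketch: reaching index 4660 (16r1234) without needing a literal slot for
4660 itself. The small factors live in low, reusable literal slots, and
* and + are special selectors, so those sends consume no literal slots."
| bigIndex |
bigIndex := 18 * 256 + 52.	"= 4660"
^ overflowLiterals at: bigIndex
```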

>
> Extending the jump distance is harder. One could imagine "jump pads" sprinkled throughout the compiled method so there would be several hops for one jump. Seems to be hairy though. Or possibly one could compile the blocks that exceed the max distance as actual blocks + message sends.
>

I did all that in VW too, because I wanted to generate bytecodes for
re-evaluating arbitrarily complex symbolic expressions. The jump
limit was easier there, because the block is just stored in the literals
rather than inlined into the compiled method, and you don't need a jump to
evaluate such a block; you just have to turn block optimization off
in the compiler with a flag.


Isn't there yet another limitation, the stack depth ?

Nicolas


Re: Compiler

Bert Freudenberg

On 2012-06-20, at 13:17, Nicolas Cellier wrote:

> 2012/6/20 Bert Freudenberg <[hidden email]>:
>> I have run into these hard limits, too. My idea was to hack the compiler on the image side using the current byte code set. E.g., if there are more than 255 literals, use the 256th as literal array and generate code that pulls the literal out of that array (and if needed performs the send). Similarly for temps.
>
> Yes, I did that,

Is it ready for adoption? Since this will only affect "unusual" methods there shouldn't be noticeable side effects in regular use.

> and even your index for accessing into the big
> literal array could become too big and thus be itself a literal, so
> you have to cheat and recursively build array of array of ... or
> calculate the index too from smaller byte-encoded literals
> (fortunately, * and + don't consume a literal, they are special):)

Eek. You're right.

>> Extending the jump distance is harder. One could imagine "jump pads" sprinkled throughout the compiled method so there would be several hops for one jump. Seems to be hairy though. Or possibly one could compile the blocks that exceed the max distance as actual blocks + message sends.
>>
>
> I did all that in VW too because I wanted to generate bytecodes for
> re-evaluating arbitrarily complex symbolic expressions. For the jump,
> it was easier because the block is just stored in literals rather than
> inlined into the compiled method, and you don't need to jump to
> evaluate such block, so you just have to turn block optimization off
> in compiler with a flag.

Hmm. Maybe for big blocks one could compile another method and store it in a literal ... Nah that's too ugly.

>> This would allow large methods to compile, but methods exceeding the limits would be slow. For the generated code where I needed it that would have been fine, and saved considerable effort (I had to change the code generation to break up large methods into smaller ones, with temps spilling over into instance variables, and thisContext magic to emulate local returns).
>>
>> So if someone wants to have fun with the compiler, working around (some of) these limits would be great. And if we get a new bytecode set, the code would become faster, too.
>>
>> - Bert -
>>
>
> Isn't there yet another limitation, the stack depth ?
>
> Nicolas

Yes, as I wrote above:

>> in practice a unary method can have at most 56 temps, but that's shared with the stack [...] To overcome this we would need larger context objects.

- Bert -




Re: Compiler

Nicolas Cellier
2012/6/20 Bert Freudenberg <[hidden email]>:

> Yes, as I wrote above:
>
>>> in practice a unary method can have at most 56 temps, but that's shared with the stack [...] To overcome this we would need larger context objects.
>
> - Bert -
>
>

Oh, but for the number of args I already have a Squeak-ready hack in
Smallapack: I just pass a single Array. And for the internal temps we
could use the same trick; maybe I did that too in VW.

I was thinking more of the depth of nested messages: a + (b + (c + (d +
... will be translated as
 push a; push b; push c; push d; ...; send +; send +; send +
Is there a limitation there?

Nicolas


Re: Compiler

Bert Freudenberg
On 2012-06-20, at 20:24, Nicolas Cellier wrote:

> 2012/6/20 Bert Freudenberg <[hidden email]>:
>>
>>>> in practice a unary method can have at most 56 temps, but that's shared with the stack [...] To overcome this we would need larger context objects.
>>
>> - Bert -
>
> Oh, but for the number of args, I already have a hack squeak-ready in
> Smallapack, I just pass a single Array, and for the internal temps, we
> could use the same trick, maybe I did that too in VW.
>
> I was more thinking of the depth of nested message a + (b + (c + (d +
> ... will be translated as
> push a; push b; push c; push d; ...; send +; send +; send +
> Is there a limitation there ?


No limit (except for method size). The stack of this context only needs to hold the arguments for a single one of these sends at a time. It's cleared when control returns to this context, so it can be re-used for the next cascaded message send.

That's why, for example, long brace arrays are compiled as

        (Array braceStream: N) nextPut: expr1; nextPut: expr2; nextPut: expr3; ... nextPut: exprN

whereas short ones used to be compiled as

        Array braceWith: expr1 with: expr2 with: expr3

Nowadays the short ones use Eliot's new "pop n into new array" byte code.

- Bert -




Re: Compiler

Nicolas Cellier
2012/6/21 Bert Freudenberg <[hidden email]>:

> No limit (except for method size). The stack of this context only needs to hold all the arguments for a single of these sends. It's cleared when the control returns to this context so it can be re-used for the next cascaded message send.
>
> - Bert -
>

OK, but with the LargeFrame = 56 limit, I cannot compile a method written as

^1+(1+(1+(...  64 times )))

without getting an error, triggered by:

CompiledMethod>>needsFrameSize: newFrameSize
        "Set the largeFrameBit to accommodate the newFrameSize"
        | largeFrameBit header |
        largeFrameBit := 16r20000.
        (self numTemps + newFrameSize) > LargeFrame ifTrue:
                [^ TooDeepStackError signal: 'Cannot compile -- stack including
temps is too deep'].
        header := self objectAt: 1.
        (header bitAnd: largeFrameBit) ~= 0
                ifTrue: [header := header - largeFrameBit].
        self objectAt: 1 put: header
                        + ( ((self numTemps + newFrameSize) > SmallFrame or: [ self
primitive = 84 "perform:withArguments:"])
                                        ifTrue: [largeFrameBit]
                                        ifFalse: [0])

To overcome the limitation, I would have to re-arrange the order of
pushes and sends. Since I can't easily prove the absence of side effects in
Smalltalk, I can't reorder.
So I would have to push into a temporary array rather than onto the stack
(simulating a stack), then push (array at: i) on demand just before
sending (I think the closure bytecodes also provide some short form).
Phew...
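A minimal sketch of that simulated stack for a + (b + (c + d)), with all names invented:

```smalltalk
"Hypothetical: operands go into an Array in source evaluation order, then
each send pulls its arguments back out, so the real context stack stays
shallow no matter how deeply the source expression nests."
| spill t |
spill := Array new: 4.
spill at: 1 put: a.
spill at: 2 put: b.
spill at: 3 put: c.
spill at: 4 put: d.
t := (spill at: 3) + (spill at: 4).	"innermost send first"
t := (spill at: 2) + t.
^ (spill at: 1) + t
```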

Nicolas


Re: Compiler

Bert Freudenberg
On 2012-06-21, at 17:55, Nicolas Cellier wrote:

> OK, but with the LargeFrame = 56 limit, I cannot compile a method written
>
> ^1+(1+(1+(...  64x times )))
>
> To overcome the limitation, I would have to re-arrange the order of
> push/sends. Since I can't easily guess absence of side effect in
> Smalltalk, I can't reorder.
> So I would have to push in a temporary array rather than in a stack
> (simulate a stack), then push (array at: i) on demand just before
> sending (I think the closure bytecodes also provide some short form).
> Phew...
>
> Nicolas

That sounds way harder than pulling out subexpressions into hidden temp vars. E.g.,

        ^ 1 + (2 + (3 + 4))

is exactly equivalent (same order of message sends) to

        | t1 t2 |
        t1 := 3 + 4.
        t2 := 2 + t1.
        ^ 1 + t2

If the temp number limit was lifted this would be the way to go, I think.

- Bert -




Re: Compiler

Nicolas Cellier
2012/6/21 Bert Freudenberg <[hidden email]>:

> That sounds way harder than pulling out subexpressions into hidden temp vars. E.g.,
>
>        ^ 1 + (2 + (3 + 4))
>
> is exactly equivalent (same order of message sends) to
>
>        | t1 t2 |
>        t1 := 3 + 4.
>        t2 := 2 + t1.
>        ^ 1 + t2
>
> If the temp number limit was lifted this would be the way to go, I think.
>
> - Bert -
>

Sure, it's rather theoretical without any concrete application in mind, but
if these are objects rather than literals (which we can expect to be
constant), say for example some instance variable, then can we
guarantee that the first message won't have any side effect...

(a at: 1) + ( (a at: 2) * ( (a at: 1 put: ( ... ) ) + ( ... ) ) )

Nicolas


Re: Compiler

Bert Freudenberg

On 2012-06-21, at 19:43, Nicolas Cellier wrote:

>
> Sure, it's rather theoretical without any particular application in
> mind, but if these are objects rather than literals (which we can
> expect to be constant), say for example some instance variable, then
> can we guarantee that the first message won't have any side effect...
>
> (a at: 1) + ( (a at:2) * ( (a at: 1 put: ( ... ) ) + (... ) ) )
>
> Nicolas

Did you mean to write we can *not* guarantee it?

I don't think we have to worry about side effects, as long as the evaluation order of all expressions remains the same:

        a := {3. 4}.
        (a at: 1) + ( (a at: 2) * (a at: 1 put: 5) ).

is equivalent to

        a := {3. 4}.
        t1 := a at: 1.
        t2 := a at: 2.
        t3 := a at: 1 put: 5.
        t4 := t2 * t3.
        t5 := t1 + t4.
        t5
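
A quick print-it check (just a sketch; the names a1, a2, r1, r2 are mine,
for illustration) shows both forms compute the same value and leave the
array in the same state:

        | a1 a2 r1 r2 t1 t2 t3 t4 t5 |
        a1 := {3. 4}.
        r1 := (a1 at: 1) + ((a1 at: 2) * (a1 at: 1 put: 5)).
        a2 := {3. 4}.
        t1 := a2 at: 1.
        t2 := a2 at: 2.
        t3 := a2 at: 1 put: 5.   "at:put: answers the value stored"
        t4 := t2 * t3.
        t5 := t1 + t4.
        r2 := t5.
        { r1. r2. a1 = a2 }   "=> #(23 23 true)"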


- Bert -




Re: Compiler

Nicolas Cellier
2012/6/21 Bert Freudenberg <[hidden email]>:

>
> On 2012-06-21, at 19:43, Nicolas Cellier wrote:
>
>> 2012/6/21 Bert Freudenberg <[hidden email]>:
>>>
>>>        ^ 1 + (2 + (3 + 4))
>>>
>>> is exactly equivalent (same order of message sends) to
>>>
>>>        | t1 t2 |
>>>        t1 := 3 + 4.
>>>        t2 := 2 + t1.
>>>        ^ 1 + t2
>>>
>>> If the temp number limit was lifted this would be the way to go, I think.
>>>
>>> - Bert -
>>>
>>
>> Sure, it's rather theoretical without any particular application in
>> mind, but if these are objects rather than literals (which we can
>> expect to be constant), say for example some instance variable, then
>> can we guarantee that the first message won't have any side effect...
>>
>> (a at: 1) + ( (a at:2) * ( (a at: 1 put: ( ... ) ) + (... ) ) )
>>
>> Nicolas
>
> Did you mean to write we can *not* guarantee it?
>
> I don't think we have to worry about side effects, as long as the evaluation order of all expressions remains the same:
>
>        a := {3. 4}.
>        (a at: 1) + ( (a at: 2) * (a at: 1 put: 5) ).
>
> is equivalent to
>
>        a := {3. 4}.
>        t1 := a at: 1.
>        t2 := a at: 2.
>        t3 := a at: 1 put: 5.
>        t4 := t2 * t3.
>        t5 := t1 + t4.
>        t5
>

Agreed, but then we're back to our initial problem: the above code
first pushes all parameters and then performs all the sends, and might
overflow the stack limit in between... We could hack and simulate the
stack in such a case as I proposed, but I feel it would be really
tedious to handle all of the current CompiledMethod limitations...

Nicolas

>
> - Bert -
>
>
>


Re: Compiler

Bert Freudenberg

On 2012-06-21, at 22:32, Nicolas Cellier wrote:

> 2012/6/21 Bert Freudenberg <[hidden email]>:
>>
>> On 2012-06-21, at 19:43, Nicolas Cellier wrote:
>>
>>> if these are objects rather than literals (which we can expect to be
>>> constant), say for example some instance variable, then can we
>>> guarantee that the first message won't have any side effect...
>>>
>>> (a at: 1) + ( (a at:2) * ( (a at: 1 put: ( ... ) ) + (... ) ) )
>>>
>>> Nicolas
>>
>> Did you mean to write we can *not* guarantee it?
>>
>> I don't think we have to worry about side effects, as long as the evaluation order of all expressions remains the same:
>>
>>        a := {3. 4}.
>>        (a at: 1) + ( (a at: 2) * (a at: 1 put: 5) ).
>>
>> is equivalent to
>>
>>        a := {3. 4}.
>>        t1 := a at: 1.
>>        t2 := a at: 2.
>>        t3 := a at: 1 put: 5.
>>        t4 := t2 * t3.
>>        t5 := t1 + t4.
>>        t5
>>
>
> Agreed, but then we're back to our initial problem: the above code
> first pushes all parameters and then performs all the sends, and might
> overflow the stack limit in between... We could hack and simulate the
> stack in such a case as I proposed, but I feel it would be really
> tedious to handle all of the current CompiledMethod limitations...
>
> Nicolas


No, we have worked around the stack limitation. The expanded form uses temps, not stack slots. And for more temps you had a solution already :)

The first example needs a stack of size 5, while the second one needs only 3.

The maximum stack depth the expanded form needs is the number of arguments of the message send with the most arguments, plus 1 for the receiver/result.
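
To make that concrete, here is the expanded form annotated with each
statement's stack need (a sketch; the slot counts count receiver plus
arguments per send):

        | a t1 t2 t3 t4 t5 |
        a := {3. 4}.            "push 3, push 4          -> 2 slots"
        t1 := a at: 1.          "push a, push 1          -> 2 slots"
        t2 := a at: 2.          "push a, push 2          -> 2 slots"
        t3 := a at: 1 put: 5.   "push a, push 1, push 5  -> 3 slots, the maximum"
        t4 := t2 * t3.          "push t2, push t3        -> 2 slots"
        t5 := t1 + t4.          "push t1, push t4        -> 2 slots"
        t5

Each statement pops everything it pushed before the next one begins, so
the deepest send (receiver plus the two arguments of at:put:) bounds the
whole method at 3 stack slots.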

- Bert -




Re: Compiler

Eliot Miranda-2


On Thu, Jun 21, 2012 at 2:34 PM, Bert Freudenberg <[hidden email]> wrote:

On 2012-06-21, at 22:32, Nicolas Cellier wrote:

> 2012/6/21 Bert Freudenberg <[hidden email]>:
>>
>> On 2012-06-21, at 19:43, Nicolas Cellier wrote:
>>
>>> if these are objects rather than literals (which we can expect to be
>>> constant), say for example some instance variable, then can we
>>> guarantee that the first message won't have any side effect...
>>>
>>> (a at: 1) + ( (a at:2) * ( (a at: 1 put: ( ... ) ) + (... ) ) )
>>>
>>> Nicolas
>>
>> Did you mean to write we can *not* guarantee it?
>>
>> I don't think we have to worry about side effects, as long as the evaluation order of all expressions remains the same:
>>
>>        a := {3. 4}.
>>        (a at: 1) + ( (a at: 2) * (a at: 1 put: 5) ).
>>
>> is equivalent to
>>
>>        a := {3. 4}.
>>        t1 := a at: 1.
>>        t2 := a at: 2.
>>        t3 := a at: 1 put: 5.
>>        t4 := t2 * t3.
>>        t5 := t1 + t4.
>>        t5
>>
>
> Agreed, but then we're back to our initial problem: the above code
> first pushes all parameters and then performs all the sends, and might
> overflow the stack limit in between... We could hack and simulate the
> stack in such a case as I proposed, but I feel it would be really
> tedious to handle all of the current CompiledMethod limitations...
>
> Nicolas


No, we have worked around the stack limitation. The expanded form uses temps, not stack slots. And for more temps you had a solution already :)

The first example needs a stack of size 5, while the second one needs only 3.

The maximum stack depth the expanded form needs is the number of arguments of the message send with the most arguments, plus 1 for the receiver/result.

This is fine.  But a 2-bit field could encode the logarithm of the context size, either 16, 32, 64 & 128 slots, or 16, 64, 256 & 1024 slots.  I think that would be more than adequate.

So with 31 bits in the header to play with you'd get e.g.
 0:    14 numLiterals (16k literals)
14:     6 numArgs (63 args)
20:     6 numTemps - numArgs (63 extra temps; the field stores numTemps - numArgs, not numTemps)
26:     2 context size code (16, 64, 256 & 1024 slots)
28:     3 flag bits
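
To make the layout concrete, here is a sketch of unpacking such a header
(the variable names and the sample values are made up; only the bit
positions follow the layout above):

        | header numLiterals numArgs extraTemps sizeCode contextSize flags |
        header := 5 + (2 bitShift: 14) + (3 bitShift: 20) + (1 bitShift: 26).
        numLiterals := header bitAnd: 16r3FFF.               "bits 0-13  -> 5"
        numArgs := (header bitShift: -14) bitAnd: 16r3F.     "bits 14-19 -> 2"
        extraTemps := (header bitShift: -20) bitAnd: 16r3F.  "bits 20-25 -> 3"
        sizeCode := (header bitShift: -26) bitAnd: 3.        "bits 26-27 -> 1"
        contextSize := 16 bitShift: sizeCode * 2.            "16, 64, 256 or 1024 -> 64"
        flags := (header bitShift: -28) bitAnd: 7.           "bits 28-30 -> 0"
        { numLiterals. numArgs. extraTemps. contextSize. flags }   "=> #(5 2 3 64 0)"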

But there are lots of other tricks one could play by moving some fields into the bytecode.  For example, a one-byte bytecode that specified 0 temps and a two-byte bytecode that specified how many temps to allocate (up to 255 or 256) would save a further 6 bits, and provide a greater range.  Why have two bytecodes?  Most methods have no temps, e.g.:

        | noTemps hasTemps |
        noTemps := hasTemps := 0.
        CompiledMethod allInstancesDo:
                [:cm |
                cm isQuick ifFalse:
                        [cm numTemps = cm numArgs
                                ifTrue: [noTemps := noTemps + 1]
                                ifFalse: [hasTemps := hasTemps + 1]]].
        {noTemps. hasTemps}   "=> #(65839 24230)"

But saving 64k by burning a bytecode might be silly.  Depends on how the bytecode set looks.
 

- Bert -






--
best,
Eliot