here's a curio for compiler mavens...

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

here's a curio for compiler mavens...

Eliot Miranda-2
Hi All,

    (whitewash alert)  I just had occasion to look at the bytecode generated by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:

goodPrimeAtLeast: lowerLimit
"Answer the next good prime >= lowerlimit.
If lowerLimit is larger than the largest known good prime,
just make it odd."
| primes low mid high prime |
primes := self goodPrimes.
low := 1.
high := primes size.
lowerLimit > (primes at: high) ifTrue: [
^lowerLimit bitOr: 1 ].
[ high - low <= 1 ] whileFalse: [
mid := high + low // 2.
prime := primes at: mid.
prime = lowerLimit ifTrue: [ ^prime ].
prime < lowerLimit
ifTrue: [ low := mid ]
ifFalse: [ high := mid ] ].
(primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
^primes at: high

The code for this sequence is
58 <15> pushTemp: 5
59 <10> pushTemp: 0
60 <B2> send: <
61 <9B> jumpFalse: 66
62 <13> pushTemp: 3
63 <81 42> storeIntoTemp: 2
65 <92> jumpTo: 69
66 <13> pushTemp: 3
67 <81 44> storeIntoTemp: 4
69 <87> pop

where-as the following would be better:

58 <15> pushTemp: 5
59 <10> pushTemp: 0
60 <B2> send: <
61 <9B> jumpFalse: 66
62 <13> pushTemp: 3
63 <82 42> popIntoTemp: 2
65 <92> jumpTo: 69
66 <13> pushTemp: 3
67 <82 44> popIntoTemp: 4

The reason is that the code generator favours using a single pop for both arms of the if:

MessageNode>>sizeCodeForIf: encoder value: forValue
| thenExpr elseExpr branchSize thenSize elseSize |
thenExpr := arguments at: 1.
elseExpr := arguments at: 2.
(forValue
or: [(thenExpr isJust: NodeNil)
or: [elseExpr isJust: NodeNil]]) not
"(...not ifTrue: avoids using ifFalse: alone during this compile)"
ifTrue:  "Two-armed IFs forEffect share a single pop"
[^super sizeCodeForEffect: encoder].
...

MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
| thenExpr thenSize elseExpr elseSize |
thenSize := sizes at: 1.
elseSize := sizes at: 2.
(forValue not and: [elseSize * thenSize > 0]) ifTrue:
"Two-armed IFs forEffect share a single pop"
[^super emitCodeForEffect: stack encoder: encoder].

It would be nice if this only happened if doing so actually reduced the size of the generated code.
--
best,
Eliot



Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Igor Stasenko
On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote:

> Hi All,
>
>     (whitewash alert)  I just had occasion to look at the bytecode generated
> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.
>  Here's the source, with the issue in bold:
>
> goodPrimeAtLeast: lowerLimit
> "Answer the next good prime >= lowerlimit.
> If lowerLimit is larger than the largest known good prime,
> just make it odd."
> | primes low mid high prime |
> primes := self goodPrimes.
> low := 1.
> high := primes size.
> lowerLimit > (primes at: high) ifTrue: [
> ^lowerLimit bitOr: 1 ].
> [ high - low <= 1 ] whileFalse: [
> mid := high + low // 2.
> prime := primes at: mid.
> prime = lowerLimit ifTrue: [ ^prime ].
> prime < lowerLimit
> ifTrue: [ low := mid ]
> ifFalse: [ high := mid ] ].
> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
> ^primes at: high
>
> The code for this sequence is
> 58 <15> pushTemp: 5
> 59 <10> pushTemp: 0
> 60 <B2> send: <
> 61 <9B> jumpFalse: 66
> 62 <13> pushTemp: 3
> 63 <81 42> storeIntoTemp: 2
> 65 <92> jumpTo: 69
> 66 <13> pushTemp: 3
> 67 <81 44> storeIntoTemp: 4
> 69 <87> pop
>
> where-as the following would be better:
>
> 58 <15> pushTemp: 5
> 59 <10> pushTemp: 0
> 60 <B2> send: <
> 61 <9B> jumpFalse: 66
> 62 <13> pushTemp: 3
> 63 <82 42> popIntoTemp: 2
> 65 <92> jumpTo: 69
> 66 <13> pushTemp: 3
> 67 <82 44> popIntoTemp: 4
>

[ (a) ifTrue:[ b] ifFalse: [c] ] value
should answer b or c (depending on a)
now if you use pop instead of store, it will break.

Of course in while loop, the [ loop-body closure ] value is always
discarded, so yes we can
eliminate extra pop.
But you can do even more:

1. loop head
...
....
 61 <9B> jumpFalse: 66
 62 <13> pushTemp: 3
 63 <82 42> popIntoTemp: 2
** 65 <92> jumpTo: 1
 66 <13> pushTemp: 3
 67 <82 44> popIntoTemp: 4
 69  ...  jumpTo: 1

so, we can have 1 less jump when code enters first branch.


> The reason is that the code generator favours using a single pop for both
> arms of the if:
>
> MessageNode>>sizeCodeForIf: encoder value: forValue
> | thenExpr elseExpr branchSize thenSize elseSize |
> thenExpr := arguments at: 1.
> elseExpr := arguments at: 2.
> (forValue
> or: [(thenExpr isJust: NodeNil)
> or: [elseExpr isJust: NodeNil]]) not
> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
> ifTrue:  "Two-armed IFs forEffect share a single pop"
> [^super sizeCodeForEffect: encoder].
> ...
>
> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
> | thenExpr thenSize elseExpr elseSize |
> thenSize := sizes at: 1.
> elseSize := sizes at: 2.
> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
> "Two-armed IFs forEffect share a single pop"
> [^super emitCodeForEffect: stack encoder: encoder].
>
> It would be nice if this only happened if doing so actually reduced the size
> of the generated code.
> --
> best,
> Eliot
>
>
>
>



--
Best regards,
Igor Stasenko.

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
In reply to this post by Eliot Miranda-2
Le 13 février 2012 20:58, Eliot Miranda <[hidden email]> a écrit :

> Hi All,
>
>     (whitewash alert)  I just had occasion to look at the bytecode generated
> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.
>  Here's the source, with the issue in bold:
>
> goodPrimeAtLeast: lowerLimit
> "Answer the next good prime >= lowerlimit.
> If lowerLimit is larger than the largest known good prime,
> just make it odd."
> | primes low mid high prime |
> primes := self goodPrimes.
> low := 1.
> high := primes size.
> lowerLimit > (primes at: high) ifTrue: [
> ^lowerLimit bitOr: 1 ].
> [ high - low <= 1 ] whileFalse: [
> mid := high + low // 2.
> prime := primes at: mid.
> prime = lowerLimit ifTrue: [ ^prime ].
> prime < lowerLimit
> ifTrue: [ low := mid ]
> ifFalse: [ high := mid ] ].
> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
> ^primes at: high
>
> The code for this sequence is
> 58 <15> pushTemp: 5
> 59 <10> pushTemp: 0
> 60 <B2> send: <
> 61 <9B> jumpFalse: 66
> 62 <13> pushTemp: 3
> 63 <81 42> storeIntoTemp: 2
> 65 <92> jumpTo: 69
> 66 <13> pushTemp: 3
> 67 <81 44> storeIntoTemp: 4
> 69 <87> pop
>
> where-as the following would be better:
>
> 58 <15> pushTemp: 5
> 59 <10> pushTemp: 0
> 60 <B2> send: <
> 61 <9B> jumpFalse: 66
> 62 <13> pushTemp: 3
> 63 <82 42> popIntoTemp: 2
> 65 <92> jumpTo: 69
> 66 <13> pushTemp: 3
> 67 <82 44> popIntoTemp: 4
>
> The reason is that the code generator favours using a single pop for both
> arms of the if:
>
> MessageNode>>sizeCodeForIf: encoder value: forValue
> | thenExpr elseExpr branchSize thenSize elseSize |
> thenExpr := arguments at: 1.
> elseExpr := arguments at: 2.
> (forValue
> or: [(thenExpr isJust: NodeNil)
> or: [elseExpr isJust: NodeNil]]) not
> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
> ifTrue:  "Two-armed IFs forEffect share a single pop"
> [^super sizeCodeForEffect: encoder].
> ...
>
> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
> | thenExpr thenSize elseExpr elseSize |
> thenSize := sizes at: 1.
> elseSize := sizes at: 2.
> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
> "Two-armed IFs forEffect share a single pop"
> [^super emitCodeForEffect: stack encoder: encoder].
>
> It would be nice if this only happened if doing so actually reduced the size
> of the generated code.
> --

Ah yes, for example ifTrue: [self doThis] ifFalse: [self doThat] will
factor a pop.
So, the room for optimisation is the case when both branch finish with
assignment? (storePop)
Will it break the Decompiler?

Nicolas

> best,
> Eliot
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
In reply to this post by Igor Stasenko
Le 13 février 2012 21:27, Igor Stasenko <[hidden email]> a écrit :

> On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote:
>> Hi All,
>>
>>     (whitewash alert)  I just had occasion to look at the bytecode generated
>> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.
>>  Here's the source, with the issue in bold:
>>
>> goodPrimeAtLeast: lowerLimit
>> "Answer the next good prime >= lowerlimit.
>> If lowerLimit is larger than the largest known good prime,
>> just make it odd."
>> | primes low mid high prime |
>> primes := self goodPrimes.
>> low := 1.
>> high := primes size.
>> lowerLimit > (primes at: high) ifTrue: [
>> ^lowerLimit bitOr: 1 ].
>> [ high - low <= 1 ] whileFalse: [
>> mid := high + low // 2.
>> prime := primes at: mid.
>> prime = lowerLimit ifTrue: [ ^prime ].
>> prime < lowerLimit
>> ifTrue: [ low := mid ]
>> ifFalse: [ high := mid ] ].
>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>> ^primes at: high
>>
>> The code for this sequence is
>> 58 <15> pushTemp: 5
>> 59 <10> pushTemp: 0
>> 60 <B2> send: <
>> 61 <9B> jumpFalse: 66
>> 62 <13> pushTemp: 3
>> 63 <81 42> storeIntoTemp: 2
>> 65 <92> jumpTo: 69
>> 66 <13> pushTemp: 3
>> 67 <81 44> storeIntoTemp: 4
>> 69 <87> pop
>>
>> where-as the following would be better:
>>
>> 58 <15> pushTemp: 5
>> 59 <10> pushTemp: 0
>> 60 <B2> send: <
>> 61 <9B> jumpFalse: 66
>> 62 <13> pushTemp: 3
>> 63 <82 42> popIntoTemp: 2
>> 65 <92> jumpTo: 69
>> 66 <13> pushTemp: 3
>> 67 <82 44> popIntoTemp: 4
>>
>
> [ (a) ifTrue:[ b] ifFalse: [c] ] value
> should answer b or c (depending on a)
> now if you use pop instead of store, it will break.
>

Yes but we always know whether we want to emit code forValue or just for effect.

> Of course in while loop, the [ loop-body closure ] value is always
> discarded, so yes we can
> eliminate extra pop.
> But you can do even more:
>
> 1. loop head
> ...
> ....
>  61 <9B> jumpFalse: 66
>  62 <13> pushTemp: 3
>  63 <82 42> popIntoTemp: 2
> ** 65 <92> jumpTo: 1
>  66 <13> pushTemp: 3
>  67 <82 44> popIntoTemp: 4
>  69  ...  jumpTo: 1
>
> so, we can have 1 less jump when code enters first branch.
>

Very good!
I think however that hacking the Decompiler will be a challenge!

>
>> The reason is that the code generator favours using a single pop for both
>> arms of the if:
>>
>> MessageNode>>sizeCodeForIf: encoder value: forValue
>> | thenExpr elseExpr branchSize thenSize elseSize |
>> thenExpr := arguments at: 1.
>> elseExpr := arguments at: 2.
>> (forValue
>> or: [(thenExpr isJust: NodeNil)
>> or: [elseExpr isJust: NodeNil]]) not
>> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>> ifTrue:  "Two-armed IFs forEffect share a single pop"
>> [^super sizeCodeForEffect: encoder].
>> ...
>>
>> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>> | thenExpr thenSize elseExpr elseSize |
>> thenSize := sizes at: 1.
>> elseSize := sizes at: 2.
>> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>> "Two-armed IFs forEffect share a single pop"
>> [^super emitCodeForEffect: stack encoder: encoder].
>>
>> It would be nice if this only happened if doing so actually reduced the size
>> of the generated code.
>> --
>> best,
>> Eliot
>>
>>
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko.
>

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Igor Stasenko
On 13 February 2012 22:38, Nicolas Cellier
<[hidden email]> wrote:

> Le 13 février 2012 21:27, Igor Stasenko <[hidden email]> a écrit :
>> On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote:
>>> Hi All,
>>>
>>>     (whitewash alert)  I just had occasion to look at the bytecode generated
>>> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.
>>>  Here's the source, with the issue in bold:
>>>
>>> goodPrimeAtLeast: lowerLimit
>>> "Answer the next good prime >= lowerlimit.
>>> If lowerLimit is larger than the largest known good prime,
>>> just make it odd."
>>> | primes low mid high prime |
>>> primes := self goodPrimes.
>>> low := 1.
>>> high := primes size.
>>> lowerLimit > (primes at: high) ifTrue: [
>>> ^lowerLimit bitOr: 1 ].
>>> [ high - low <= 1 ] whileFalse: [
>>> mid := high + low // 2.
>>> prime := primes at: mid.
>>> prime = lowerLimit ifTrue: [ ^prime ].
>>> prime < lowerLimit
>>> ifTrue: [ low := mid ]
>>> ifFalse: [ high := mid ] ].
>>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>>> ^primes at: high
>>>
>>> The code for this sequence is
>>> 58 <15> pushTemp: 5
>>> 59 <10> pushTemp: 0
>>> 60 <B2> send: <
>>> 61 <9B> jumpFalse: 66
>>> 62 <13> pushTemp: 3
>>> 63 <81 42> storeIntoTemp: 2
>>> 65 <92> jumpTo: 69
>>> 66 <13> pushTemp: 3
>>> 67 <81 44> storeIntoTemp: 4
>>> 69 <87> pop
>>>
>>> where-as the following would be better:
>>>
>>> 58 <15> pushTemp: 5
>>> 59 <10> pushTemp: 0
>>> 60 <B2> send: <
>>> 61 <9B> jumpFalse: 66
>>> 62 <13> pushTemp: 3
>>> 63 <82 42> popIntoTemp: 2
>>> 65 <92> jumpTo: 69
>>> 66 <13> pushTemp: 3
>>> 67 <82 44> popIntoTemp: 4
>>>
>>
>> [ (a) ifTrue:[ b] ifFalse: [c] ] value
>> should answer b or c (depending on a)
>> now if you use pop instead of store, it will break.
>>
>
> Yes but we always know whether we want to emit code forValue or just for effect.
>
>> Of course in while loop, the [ loop-body closure ] value is always
>> discarded, so yes we can
>> eliminate extra pop.
>> But you can do even more:
>>
>> 1. loop head
>> ...
>> ....
>>  61 <9B> jumpFalse: 66
>>  62 <13> pushTemp: 3
>>  63 <82 42> popIntoTemp: 2
>> ** 65 <92> jumpTo: 1
>>  66 <13> pushTemp: 3
>>  67 <82 44> popIntoTemp: 4
>>  69  ...  jumpTo: 1
>>
>> so, we can have 1 less jump when code enters first branch.
>>
>
> Very good!
> I think however that hacking the Decompiler will be a challenge!
>

Well, in this case it is easy to hide hints for decompiler directly in
a bytecode:

1. loop head
...
100. jumpTo: 106
101. push ...
102. pop
103. jumpTo: 1
104. hint
105. more hints
106. ... section start for false branch

n+1   ...

----
same actually for loops. instead of writing sophisticated detection
code , we can simply read hints:

1. loop head
...

10. jumpIfFalse: 101 (leaving the loop )

... loop body

99. jumpTo: 1  (loop head)
100. hint for loop
101. ... rest of method's bytecode

As you can see , it doesn't affects the performance, since its
unreachable code, just the size of bytecode.

just don't put me on fire for suggesting that :)


--
Best regards,
Igor Stasenko.

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
Le 13 février 2012 22:08, Igor Stasenko <[hidden email]> a écrit :

> On 13 February 2012 22:38, Nicolas Cellier
> <[hidden email]> wrote:
>> Le 13 février 2012 21:27, Igor Stasenko <[hidden email]> a écrit :
>>> On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote:
>>>> Hi All,
>>>>
>>>>     (whitewash alert)  I just had occasion to look at the bytecode generated
>>>> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.
>>>>  Here's the source, with the issue in bold:
>>>>
>>>> goodPrimeAtLeast: lowerLimit
>>>> "Answer the next good prime >= lowerlimit.
>>>> If lowerLimit is larger than the largest known good prime,
>>>> just make it odd."
>>>> | primes low mid high prime |
>>>> primes := self goodPrimes.
>>>> low := 1.
>>>> high := primes size.
>>>> lowerLimit > (primes at: high) ifTrue: [
>>>> ^lowerLimit bitOr: 1 ].
>>>> [ high - low <= 1 ] whileFalse: [
>>>> mid := high + low // 2.
>>>> prime := primes at: mid.
>>>> prime = lowerLimit ifTrue: [ ^prime ].
>>>> prime < lowerLimit
>>>> ifTrue: [ low := mid ]
>>>> ifFalse: [ high := mid ] ].
>>>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>>>> ^primes at: high
>>>>
>>>> The code for this sequence is
>>>> 58 <15> pushTemp: 5
>>>> 59 <10> pushTemp: 0
>>>> 60 <B2> send: <
>>>> 61 <9B> jumpFalse: 66
>>>> 62 <13> pushTemp: 3
>>>> 63 <81 42> storeIntoTemp: 2
>>>> 65 <92> jumpTo: 69
>>>> 66 <13> pushTemp: 3
>>>> 67 <81 44> storeIntoTemp: 4
>>>> 69 <87> pop
>>>>
>>>> where-as the following would be better:
>>>>
>>>> 58 <15> pushTemp: 5
>>>> 59 <10> pushTemp: 0
>>>> 60 <B2> send: <
>>>> 61 <9B> jumpFalse: 66
>>>> 62 <13> pushTemp: 3
>>>> 63 <82 42> popIntoTemp: 2
>>>> 65 <92> jumpTo: 69
>>>> 66 <13> pushTemp: 3
>>>> 67 <82 44> popIntoTemp: 4
>>>>
>>>
>>> [ (a) ifTrue:[ b] ifFalse: [c] ] value
>>> should answer b or c (depending on a)
>>> now if you use pop instead of store, it will break.
>>>
>>
>> Yes but we always know whether we want to emit code forValue or just for effect.
>>
>>> Of course in while loop, the [ loop-body closure ] value is always
>>> discarded, so yes we can
>>> eliminate extra pop.
>>> But you can do even more:
>>>
>>> 1. loop head
>>> ...
>>> ....
>>>  61 <9B> jumpFalse: 66
>>>  62 <13> pushTemp: 3
>>>  63 <82 42> popIntoTemp: 2
>>> ** 65 <92> jumpTo: 1
>>>  66 <13> pushTemp: 3
>>>  67 <82 44> popIntoTemp: 4
>>>  69  ...  jumpTo: 1
>>>
>>> so, we can have 1 less jump when code enters first branch.
>>>
>>
>> Very good!
>> I think however that hacking the Decompiler will be a challenge!
>>
>
> Well, in this case it is easy to hide hints for decompiler directly in
> a bytecode:
>
> 1. loop head
> ...
> 100. jumpTo: 106
> 101. push ...
> 102. pop
> 103. jumpTo: 1
> 104. hint
> 105. more hints
> 106. ... section start for false branch
>
> n+1   ...
>
> ----
> same actually for loops. instead of writing sophisticated detection
> code , we can simply read hints:
>
> 1. loop head
> ...
>
> 10. jumpIfFalse: 101 (leaving the loop )
>
> ... loop body
>
> 99. jumpTo: 1  (loop head)
> 100. hint for loop
> 101. ... rest of method's bytecode
>
> As you can see , it doesn't affects the performance, since its
> unreachable code, just the size of bytecode.
>

Huh, reserving plenty byte codes for nop or intermixing instructions and text,
whatever you choose might lead you directly to the stake ;)

You invented an old new idea: literate bytecode programming

Nicolas

> just don't put me on fire for suggesting that :)
>
>
> --
> Best regards,
> Igor Stasenko.
>

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Eliot Miranda-2
In reply to this post by Eliot Miranda-2
OK, I fixed this (a pleasant Monday afternoon distraction) (at least I *think* I've fixed it; I still have to check the decompiler).  But... in doing so I realised I had to fix CompiledMethod>>#= since I needed to see exactly which methods the compiler change affected (just committed as Kernel-eem.669).  In doing this I noticed that recent compiler changes have not been applied by recompiling affected methods in the system.  If you look at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to check that the compiler's output remains unchanged.  In the 4.3 trunk image with CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that some 19654 methods need recompiling:

(SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: m selector) not]) size 19654

Being selective in recompiling, via e.g.

SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: m selector) ifFalse: [m methodClass recompile: m selector]. false]

one gets to the desired

(SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: m selector) not]) size 0

So the question is should we add an update that does the selective recompile?  I think we should, e.g. by committing a version of Compiler that contains the selective recompile as a post-script.  Agreed?

On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda <[hidden email]> wrote:
Hi All,

    (whitewash alert)  I just had occasion to look at the bytecode generated by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:

goodPrimeAtLeast: lowerLimit
"Answer the next good prime >= lowerlimit.
If lowerLimit is larger than the largest known good prime,
just make it odd."
| primes low mid high prime |
primes := self goodPrimes.
low := 1.
high := primes size.
lowerLimit > (primes at: high) ifTrue: [
^lowerLimit bitOr: 1 ].
[ high - low <= 1 ] whileFalse: [
mid := high + low // 2.
prime := primes at: mid.
prime = lowerLimit ifTrue: [ ^prime ].
prime < lowerLimit
ifTrue: [ low := mid ]
ifFalse: [ high := mid ] ].
(primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
^primes at: high

The code for this sequence is
58 <15> pushTemp: 5
59 <10> pushTemp: 0
60 <B2> send: <
61 <9B> jumpFalse: 66
62 <13> pushTemp: 3
63 <81 42> storeIntoTemp: 2
65 <92> jumpTo: 69
66 <13> pushTemp: 3
67 <81 44> storeIntoTemp: 4
69 <87> pop

where-as the following would be better:

58 <15> pushTemp: 5
59 <10> pushTemp: 0
60 <B2> send: <
61 <9B> jumpFalse: 66
62 <13> pushTemp: 3
63 <82 42> popIntoTemp: 2
65 <92> jumpTo: 69
66 <13> pushTemp: 3
67 <82 44> popIntoTemp: 4

The reason is that the code generator favours using a single pop for both arms of the if:

MessageNode>>sizeCodeForIf: encoder value: forValue
| thenExpr elseExpr branchSize thenSize elseSize |
thenExpr := arguments at: 1.
elseExpr := arguments at: 2.
(forValue
or: [(thenExpr isJust: NodeNil)
or: [elseExpr isJust: NodeNil]]) not
"(...not ifTrue: avoids using ifFalse: alone during this compile)"
ifTrue:  "Two-armed IFs forEffect share a single pop"
[^super sizeCodeForEffect: encoder].
...

MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
| thenExpr thenSize elseExpr elseSize |
thenSize := sizes at: 1.
elseSize := sizes at: 2.
(forValue not and: [elseSize * thenSize > 0]) ifTrue:
"Two-armed IFs forEffect share a single pop"
[^super emitCodeForEffect: stack encoder: encoder].

It would be nice if this only happened if doing so actually reduced the size of the generated code.
--
best,
Eliot




--
best,
Eliot



Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
Ah, I came up with a relatively explicit solution, but also quite
heavy in term of number of messages required to implement the
optimisation.
I'm curious to see your implementation :)
Also, I'm sure that multiple-jump-elimination proposed by Igor and
other kind of optimization would be a perfect subject for proving
concepts of NewCompiler IR infrastructure.

MessageNode>>doesEffectRequiresAPop: encoder
        special > 0
                ifTrue:
                        [^self perform: (MacroTesters at: special) with: encoder].
        ^super doesEffectRequiresAPop: encoder

MessageNode>>doesEffectForIfRequiresAPop: encoder
        "If both branch require a pop instruction, then the pop is factored
after the jump.
        In this case, the translation of this node also require a final pop
that can further be eliminated eventually."
        | thenExpr elseExpr |
        thenExpr := arguments at: 1.
        elseExpr := arguments at: 2.
        ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not
                and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder) and:
[elseExpr doesEvaluatingEffectRequiresAPop: encoder]].

BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
        "Test whether a pop instruction is required to skip the value on the
stack when this block is inlined."
        ^statements size > 0 and: [statements last doesEffectRequiresAPop: anEncoder]

AssigmentNode>>doesEffectRequiresAPop: anEncoder
        "Test whether a pop instruction is required to skip the value on the stack"
        ^(variable canStorePopInASingleInstruction: anEncoder) not

InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
        ^true

etc...

Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit :

> OK, I fixed this (a pleasant Monday afternoon distraction) (at least I
> *think* I've fixed it; I still have to check the decompiler).  But... in
> doing so I realised I had to fix CompiledMethod>>#= since I needed to see
> exactly which methods the compiler change affected (just committed as
> Kernel-eem.669).  In doing this I noticed that recent compiler changes have
> not been applied by recompiling affected methods in the system.  If you look
> at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to check
> that the compiler's output remains unchanged.  In the 4.3 trunk image with
> CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that some 19654
> methods need recompiling:
>
> (SystemNavigation new allSelect: [:m| (m methodClass
> sourceMatchesBytecodeAt: m selector) not]) size 19654
>
> Being selective in recompiling, via e.g.
>
> SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt:
> m selector) ifFalse: [m methodClass recompile: m selector]. false]
>
> one gets to the desired
>
> (SystemNavigation new allSelect: [:m| (m methodClass
> sourceMatchesBytecodeAt: m selector) not]) size 0
>
> So the question is should we add an update that does the selective
> recompile?  I think we should, e.g. by committing a version of Compiler that
> contains the selective recompile as a post-script.  Agreed?
>
> On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda <[hidden email]>
> wrote:
>>
>> Hi All,
>>
>>     (whitewash alert)  I just had occasion to look at the bytecode
>> generated by the standard compiler for HashedCollection
>> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:
>>
>> goodPrimeAtLeast: lowerLimit
>> "Answer the next good prime >= lowerlimit.
>> If lowerLimit is larger than the largest known good prime,
>> just make it odd."
>> | primes low mid high prime |
>> primes := self goodPrimes.
>> low := 1.
>> high := primes size.
>> lowerLimit > (primes at: high) ifTrue: [
>> ^lowerLimit bitOr: 1 ].
>> [ high - low <= 1 ] whileFalse: [
>> mid := high + low // 2.
>> prime := primes at: mid.
>> prime = lowerLimit ifTrue: [ ^prime ].
>> prime < lowerLimit
>> ifTrue: [ low := mid ]
>> ifFalse: [ high := mid ] ].
>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>> ^primes at: high
>>
>> The code for this sequence is
>> 58 <15> pushTemp: 5
>> 59 <10> pushTemp: 0
>> 60 <B2> send: <
>> 61 <9B> jumpFalse: 66
>> 62 <13> pushTemp: 3
>> 63 <81 42> storeIntoTemp: 2
>> 65 <92> jumpTo: 69
>> 66 <13> pushTemp: 3
>> 67 <81 44> storeIntoTemp: 4
>> 69 <87> pop
>>
>> where-as the following would be better:
>>
>> 58 <15> pushTemp: 5
>> 59 <10> pushTemp: 0
>> 60 <B2> send: <
>> 61 <9B> jumpFalse: 66
>> 62 <13> pushTemp: 3
>> 63 <82 42> popIntoTemp: 2
>> 65 <92> jumpTo: 69
>> 66 <13> pushTemp: 3
>> 67 <82 44> popIntoTemp: 4
>>
>> The reason is that the code generator favours using a single pop for both
>> arms of the if:
>>
>> MessageNode>>sizeCodeForIf: encoder value: forValue
>> | thenExpr elseExpr branchSize thenSize elseSize |
>> thenExpr := arguments at: 1.
>> elseExpr := arguments at: 2.
>> (forValue
>> or: [(thenExpr isJust: NodeNil)
>> or: [elseExpr isJust: NodeNil]]) not
>> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>> ifTrue:  "Two-armed IFs forEffect share a single pop"
>> [^super sizeCodeForEffect: encoder].
>> ...
>>
>> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>> | thenExpr thenSize elseExpr elseSize |
>> thenSize := sizes at: 1.
>> elseSize := sizes at: 2.
>> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>> "Two-armed IFs forEffect share a single pop"
>> [^super emitCodeForEffect: stack encoder: encoder].
>>
>> It would be nice if this only happened if doing so actually reduced the
>> size of the generated code.
>> --
>> best,
>> Eliot
>>
>
>
>
> --
> best,
> Eliot
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Eliot Miranda-2
On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier <[hidden email]> wrote:
Ah, I came up with a relatively explicit solution, but also quite
heavy in term of number of messages required to implement the
optimisation.
I'm curious to see your implementation :)

similar, but should I commit it? I think mine is fine since it is only assignments that have storepop forms, so your generality doesn't win anything.  but i could be wrong.

 !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'!
ifThenElseForEffectShouldSharePop
        | thenExpr elseExpr |
        thenExpr := arguments at: 1.
        elseExpr := arguments at: 2.
        ^((thenExpr isJust: NodeNil)
           or: [(elseExpr isJust: NodeNil)
           or: [thenExpr statements last isAssignmentNode
                  and: [elseExpr statements last isAssignmentNode]]]) not

        "^(thenExpr isJust: NodeNil) not
           and: [(elseExpr isJust: NodeNil) not
           and: [thenExpr statements last isAssignmentNode not
                  or: [elseExpr statements last isAssignmentNode not]]]"! !

Also, I'm sure that multiple-jump-elimination proposed by Igor and
other kind of optimization would be a perfect subject for proving
concepts of NewCompiler IR infrastructure.

MessageNode>>doesEffectRequiresAPop: encoder
       special > 0
               ifTrue:
                       [^self perform: (MacroTesters at: special) with: encoder].
       ^super doesEffectRequiresAPop: encoder

MessageNode>>doesEffectForIfRequiresAPop: encoder
       "If both branch require a pop instruction, then the pop is factored
after the jump.
       In this case, the translation of this node also require a final pop
that can further be eliminated eventually."
       | thenExpr elseExpr |
       thenExpr := arguments at: 1.
       elseExpr := arguments at: 2.
       ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not
               and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder) and:
[elseExpr doesEvaluatingEffectRequiresAPop: encoder]].

BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
       "Test whether a pop instruction is required to skip the value on the
stack when this block is inlined."
       ^statements size > 0 and: [statements last doesEffectRequiresAPop: anEncoder]

AssigmentNode>>doesEffectRequiresAPop: anEncoder
       "Test whether a pop instruction is required to skip the value on the stack"
       ^(variable canStorePopInASingleInstruction: anEncoder) not

InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
       ^true

etc...

Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit :
> OK, I fixed this (a pleasant Monday afternoon distraction) (at least I
> *think* I've fixed it; I still have to check the decompiler).  But... in
> doing so I realised I had to fix CompiledMethod>>#= since I needed to see
> exactly which methods the compiler change affected (just committed as
> Kernel-eem.669).  In doing this I noticed that recent compiler changes have
> not been applied by recompiling affected methods in the system.  If you look
> at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to check
> that the compiler's output remains unchanged.  In the 4.3 trunk image with
> CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that some 19654
> methods need recompiling:
>
> (SystemNavigation new allSelect: [:m| (m methodClass
> sourceMatchesBytecodeAt: m selector) not]) size 19654
>
> Being selective in recompiling, via e.g.
>
> SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt:
> m selector) ifFalse: [m methodClass recompile: m selector]. false]
>
> one gets to the desired
>
> (SystemNavigation new allSelect: [:m| (m methodClass
> sourceMatchesBytecodeAt: m selector) not]) size 0
>
> So the question is should we add an update that does the selective
> recompile?  I think we should, e.g. by committing a version of Compiler that
> contains the selective recompile as a post-script.  Agreed?
>
> On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda <[hidden email]>
> wrote:
>>
>> Hi All,
>>
>>     (whitewash alert)  I just had occasion to look at the bytecode
>> generated by the standard compiler for HashedCollection
>> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:
>>
>> goodPrimeAtLeast: lowerLimit
>> "Answer the next good prime >= lowerlimit.
>> If lowerLimit is larger than the largest known good prime,
>> just make it odd."
>> | primes low mid high prime |
>> primes := self goodPrimes.
>> low := 1.
>> high := primes size.
>> lowerLimit > (primes at: high) ifTrue: [
>> ^lowerLimit bitOr: 1 ].
>> [ high - low <= 1 ] whileFalse: [
>> mid := high + low // 2.
>> prime := primes at: mid.
>> prime = lowerLimit ifTrue: [ ^prime ].
>> prime < lowerLimit
>> ifTrue: [ low := mid ]
>> ifFalse: [ high := mid ] ].
>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>> ^primes at: high
>>
>> The code for this sequence is
>> 58 <15> pushTemp: 5
>> 59 <10> pushTemp: 0
>> 60 <B2> send: <
>> 61 <9B> jumpFalse: 66
>> 62 <13> pushTemp: 3
>> 63 <81 42> storeIntoTemp: 2
>> 65 <92> jumpTo: 69
>> 66 <13> pushTemp: 3
>> 67 <81 44> storeIntoTemp: 4
>> 69 <87> pop
>>
>> where-as the following would be better:
>>
>> 58 <15> pushTemp: 5
>> 59 <10> pushTemp: 0
>> 60 <B2> send: <
>> 61 <9B> jumpFalse: 66
>> 62 <13> pushTemp: 3
>> 63 <82 42> popIntoTemp: 2
>> 65 <92> jumpTo: 69
>> 66 <13> pushTemp: 3
>> 67 <82 44> popIntoTemp: 4
>>
>> The reason is that the code generator favours using a single pop for both
>> arms of the if:
>>
>> MessageNode>>sizeCodeForIf: encoder value: forValue
>> | thenExpr elseExpr branchSize thenSize elseSize |
>> thenExpr := arguments at: 1.
>> elseExpr := arguments at: 2.
>> (forValue
>> or: [(thenExpr isJust: NodeNil)
>> or: [elseExpr isJust: NodeNil]]) not
>> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>> ifTrue:  "Two-armed IFs forEffect share a single pop"
>> [^super sizeCodeForEffect: encoder].
>> ...
>>
>> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>> | thenExpr thenSize elseExpr elseSize |
>> thenSize := sizes at: 1.
>> elseSize := sizes at: 2.
>> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>> "Two-armed IFs forEffect share a single pop"
>> [^super emitCodeForEffect: stack encoder: encoder].
>>
>> It would be nice if this only happened if doing so actually reduced the
>> size of the generated code.
>> --
>> best,
>> Eliot
>>
>
>
>
> --
> best,
> Eliot
>
>
>
>




--
best,
Eliot




ifThenElseForEffectShouldSharePop.st (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a écrit :

> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier
> <[hidden email]> wrote:
>>
>> Ah, I came up with a relatively explicit solution, but also quite
>> heavy in term of number of messages required to implement the
>> optimisation.
>> I'm curious to see your implementation :)
>
>
> similar, but should I commit it? I think mine is fine since it is only
> assignments that have storepop forms, so your generality doesn't win
> anything.  but i could be wrong.
>

I was thinking of

1) nested ifTrue: ifFalse:
cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 := y]]

elseExpr statements last is a message, not an assignment, though we
can factor the pop out...

2) inlined loop messages don't require a final pop...

x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y
to: x do: [:i | (self at: i) doSome]].

An encoder pushing a value in both branches just to pop it after
wouldn't be that well behaved...

I won't give any version publicly because I crashed several images in
the bootstrap process, and don't know if the bug is in the bootstrap
or in the code.

Nicolas

>  !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'!
> ifThenElseForEffectShouldSharePop
>         | thenExpr elseExpr |
>         thenExpr := arguments at: 1.
>         elseExpr := arguments at: 2.
>         ^((thenExpr isJust: NodeNil)
>            or: [(elseExpr isJust: NodeNil)
>            or: [thenExpr statements last isAssignmentNode
>                   and: [elseExpr statements last isAssignmentNode]]]) not
>
>         "^(thenExpr isJust: NodeNil) not
>            and: [(elseExpr isJust: NodeNil) not
>            and: [thenExpr statements last isAssignmentNode not
>                   or: [elseExpr statements last isAssignmentNode not]]]"! !
>
>> Also, I'm sure that multiple-jump-elimination proposed by Igor and
>> other kind of optimization would be a perfect subject for proving
>> concepts of NewCompiler IR infrastructure.
>>
>> MessageNode>>doesEffectRequiresAPop: encoder
>>        special > 0
>>                ifTrue:
>>                        [^self perform: (MacroTesters at: special) with:
>> encoder].
>>        ^super doesEffectRequiresAPop: encoder
>>
>> MessageNode>>doesEffectForIfRequiresAPop: encoder
>>        "If both branch require a pop instruction, then the pop is factored
>> after the jump.
>>        In this case, the translation of this node also require a final pop
>> that can further be eliminated eventually."
>>        | thenExpr elseExpr |
>>        thenExpr := arguments at: 1.
>>        elseExpr := arguments at: 2.
>>        ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not
>>                and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder)
>> and:
>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]].
>>
>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
>>        "Test whether a pop instruction is required to skip the value on
>> the
>> stack when this block is inlined."
>>        ^statements size > 0 and: [statements last doesEffectRequiresAPop:
>> anEncoder]
>>
>> AssigmentNode>>doesEffectRequiresAPop: anEncoder
>>        "Test whether a pop instruction is required to skip the value on
>> the stack"
>>        ^(variable canStorePopInASingleInstruction: anEncoder) not
>>
>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
>>        ^true
>>
>> etc...
>>
>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit
>> :
>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least I
>> > *think* I've fixed it; I still have to check the decompiler).  But... in
>> > doing so I realised I had to fix CompiledMethod>>#= since I needed to
>> > see
>> > exactly which methods the compiler change affected (just committed as
>> > Kernel-eem.669).  In doing this I noticed that recent compiler changes
>> > have
>> > not been applied by recompiling affected methods in the system.  If you
>> > look
>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to
>> > check
>> > that the compiler's output remains unchanged.  In the 4.3 trunk image
>> > with
>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that
>> > some 19654
>> > methods need recompiling:
>> >
>> > (SystemNavigation new allSelect: [:m| (m methodClass
>> > sourceMatchesBytecodeAt: m selector) not]) size 19654
>> >
>> > Being selective in recompiling, via e.g.
>> >
>> > SystemNavigation new allSelect: [:m| (m methodClass
>> > sourceMatchesBytecodeAt:
>> > m selector) ifFalse: [m methodClass recompile: m selector]. false]
>> >
>> > one gets to the desired
>> >
>> > (SystemNavigation new allSelect: [:m| (m methodClass
>> > sourceMatchesBytecodeAt: m selector) not]) size 0
>> >
>> > So the question is should we add an update that does the selective
>> > recompile?  I think we should, e.g. by committing a version of Compiler
>> > that
>> > contains the selective recompile as a post-script.  Agreed?
>> >
>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda
>> > <[hidden email]>
>> > wrote:
>> >>
>> >> Hi All,
>> >>
>> >>     (whitewash alert)  I just had occasion to look at the bytecode
>> >> generated by the standard compiler for HashedCollection
>> >> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:
>> >>
>> >> goodPrimeAtLeast: lowerLimit
>> >> "Answer the next good prime >= lowerlimit.
>> >> If lowerLimit is larger than the largest known good prime,
>> >> just make it odd."
>> >> | primes low mid high prime |
>> >> primes := self goodPrimes.
>> >> low := 1.
>> >> high := primes size.
>> >> lowerLimit > (primes at: high) ifTrue: [
>> >> ^lowerLimit bitOr: 1 ].
>> >> [ high - low <= 1 ] whileFalse: [
>> >> mid := high + low // 2.
>> >> prime := primes at: mid.
>> >> prime = lowerLimit ifTrue: [ ^prime ].
>> >> prime < lowerLimit
>> >> ifTrue: [ low := mid ]
>> >> ifFalse: [ high := mid ] ].
>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>> >> ^primes at: high
>> >>
>> >> The code for this sequence is
>> >> 58 <15> pushTemp: 5
>> >> 59 <10> pushTemp: 0
>> >> 60 <B2> send: <
>> >> 61 <9B> jumpFalse: 66
>> >> 62 <13> pushTemp: 3
>> >> 63 <81 42> storeIntoTemp: 2
>> >> 65 <92> jumpTo: 69
>> >> 66 <13> pushTemp: 3
>> >> 67 <81 44> storeIntoTemp: 4
>> >> 69 <87> pop
>> >>
>> >> where-as the following would be better:
>> >>
>> >> 58 <15> pushTemp: 5
>> >> 59 <10> pushTemp: 0
>> >> 60 <B2> send: <
>> >> 61 <9B> jumpFalse: 66
>> >> 62 <13> pushTemp: 3
>> >> 63 <82 42> popIntoTemp: 2
>> >> 65 <92> jumpTo: 69
>> >> 66 <13> pushTemp: 3
>> >> 67 <82 44> popIntoTemp: 4
>> >>
>> >> The reason is that the code generator favours using a single pop for
>> >> both
>> >> arms of the if:
>> >>
>> >> MessageNode>>sizeCodeForIf: encoder value: forValue
>> >> | thenExpr elseExpr branchSize thenSize elseSize |
>> >> thenExpr := arguments at: 1.
>> >> elseExpr := arguments at: 2.
>> >> (forValue
>> >> or: [(thenExpr isJust: NodeNil)
>> >> or: [elseExpr isJust: NodeNil]]) not
>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>> >> ifTrue:  "Two-armed IFs forEffect share a single pop"
>> >> [^super sizeCodeForEffect: encoder].
>> >> ...
>> >>
>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>> >> | thenExpr thenSize elseExpr elseSize |
>> >> thenSize := sizes at: 1.
>> >> elseSize := sizes at: 2.
>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>> >> "Two-armed IFs forEffect share a single pop"
>> >> [^super emitCodeForEffect: stack encoder: encoder].
>> >>
>> >> It would be nice if this only happened if doing so actually reduced the
>> >> size of the generated code.
>> >> --
>> >> best,
>> >> Eliot
>> >>
>> >
>> >
>> >
>> > --
>> > best,
>> > Eliot
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> best,
> Eliot
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Igor Stasenko
On 15 February 2012 00:50, Nicolas Cellier
<[hidden email]> wrote:

> Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a écrit :
>> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier
>> <[hidden email]> wrote:
>>>
>>> Ah, I came up with a relatively explicit solution, but also quite
>>> heavy in term of number of messages required to implement the
>>> optimisation.
>>> I'm curious to see your implementation :)
>>
>>
>> similar, but should I commit it? I think mine is fine since it is only
>> assignments that have storepop forms, so your generality doesn't win
>> anything.  but i could be wrong.
>>
>
> I was thinking of
>
> 1) nested ifTrue: ifFalse:
> cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 := y]]
>
> elseExpr statements last is a message, not an assignment, though we
> can factor the pop out...
>
> 2) inlined loop messages don't require a final pop...
>
> x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y
> to: x do: [:i | (self at: i) doSome]].
>
> An encoder pushing a value in both branches just to pop it after
> wouldn't be that well behaved...
>
> I won't give any version publicly because I crashed several images in
> the bootstrap process, and don't know if the bug is in the bootstrap
> or in the code.
>
IMO, too much effort for such diminishing returns. :)

> Nicolas
>
>>  !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'!
>> ifThenElseForEffectShouldSharePop
>>         | thenExpr elseExpr |
>>         thenExpr := arguments at: 1.
>>         elseExpr := arguments at: 2.
>>         ^((thenExpr isJust: NodeNil)
>>            or: [(elseExpr isJust: NodeNil)
>>            or: [thenExpr statements last isAssignmentNode
>>                   and: [elseExpr statements last isAssignmentNode]]]) not
>>
>>         "^(thenExpr isJust: NodeNil) not
>>            and: [(elseExpr isJust: NodeNil) not
>>            and: [thenExpr statements last isAssignmentNode not
>>                   or: [elseExpr statements last isAssignmentNode not]]]"! !
>>
>>> Also, I'm sure that multiple-jump-elimination proposed by Igor and
>>> other kind of optimization would be a perfect subject for proving
>>> concepts of NewCompiler IR infrastructure.
>>>
>>> MessageNode>>doesEffectRequiresAPop: encoder
>>>        special > 0
>>>                ifTrue:
>>>                        [^self perform: (MacroTesters at: special) with:
>>> encoder].
>>>        ^super doesEffectRequiresAPop: encoder
>>>
>>> MessageNode>>doesEffectForIfRequiresAPop: encoder
>>>        "If both branch require a pop instruction, then the pop is factored
>>> after the jump.
>>>        In this case, the translation of this node also require a final pop
>>> that can further be eliminated eventually."
>>>        | thenExpr elseExpr |
>>>        thenExpr := arguments at: 1.
>>>        elseExpr := arguments at: 2.
>>>        ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not
>>>                and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder)
>>> and:
>>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]].
>>>
>>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
>>>        "Test whether a pop instruction is required to skip the value on
>>> the
>>> stack when this block is inlined."
>>>        ^statements size > 0 and: [statements last doesEffectRequiresAPop:
>>> anEncoder]
>>>
>>> AssigmentNode>>doesEffectRequiresAPop: anEncoder
>>>        "Test whether a pop instruction is required to skip the value on
>>> the stack"
>>>        ^(variable canStorePopInASingleInstruction: anEncoder) not
>>>
>>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
>>>        ^true
>>>
>>> etc...
>>>
>>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit
>>> :
>>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least I
>>> > *think* I've fixed it; I still have to check the decompiler).  But... in
>>> > doing so I realised I had to fix CompiledMethod>>#= since I needed to
>>> > see
>>> > exactly which methods the compiler change affected (just committed as
>>> > Kernel-eem.669).  In doing this I noticed that recent compiler changes
>>> > have
>>> > not been applied by recompiling affected methods in the system.  If you
>>> > look
>>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to
>>> > check
>>> > that the compiler's output remains unchanged.  In the 4.3 trunk image
>>> > with
>>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that
>>> > some 19654
>>> > methods need recompiling:
>>> >
>>> > (SystemNavigation new allSelect: [:m| (m methodClass
>>> > sourceMatchesBytecodeAt: m selector) not]) size 19654
>>> >
>>> > Being selective in recompiling, via e.g.
>>> >
>>> > SystemNavigation new allSelect: [:m| (m methodClass
>>> > sourceMatchesBytecodeAt:
>>> > m selector) ifFalse: [m methodClass recompile: m selector]. false]
>>> >
>>> > one gets to the desired
>>> >
>>> > (SystemNavigation new allSelect: [:m| (m methodClass
>>> > sourceMatchesBytecodeAt: m selector) not]) size 0
>>> >
>>> > So the question is should we add an update that does the selective
>>> > recompile?  I think we should, e.g. by committing a version of Compiler
>>> > that
>>> > contains the selective recompile as a post-script.  Agreed?
>>> >
>>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda
>>> > <[hidden email]>
>>> > wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >>     (whitewash alert)  I just had occasion to look at the bytecode
>>> >> generated by the standard compiler for HashedCollection
>>> >> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:
>>> >>
>>> >> goodPrimeAtLeast: lowerLimit
>>> >> "Answer the next good prime >= lowerlimit.
>>> >> If lowerLimit is larger than the largest known good prime,
>>> >> just make it odd."
>>> >> | primes low mid high prime |
>>> >> primes := self goodPrimes.
>>> >> low := 1.
>>> >> high := primes size.
>>> >> lowerLimit > (primes at: high) ifTrue: [
>>> >> ^lowerLimit bitOr: 1 ].
>>> >> [ high - low <= 1 ] whileFalse: [
>>> >> mid := high + low // 2.
>>> >> prime := primes at: mid.
>>> >> prime = lowerLimit ifTrue: [ ^prime ].
>>> >> prime < lowerLimit
>>> >> ifTrue: [ low := mid ]
>>> >> ifFalse: [ high := mid ] ].
>>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>>> >> ^primes at: high
>>> >>
>>> >> The code for this sequence is
>>> >> 58 <15> pushTemp: 5
>>> >> 59 <10> pushTemp: 0
>>> >> 60 <B2> send: <
>>> >> 61 <9B> jumpFalse: 66
>>> >> 62 <13> pushTemp: 3
>>> >> 63 <81 42> storeIntoTemp: 2
>>> >> 65 <92> jumpTo: 69
>>> >> 66 <13> pushTemp: 3
>>> >> 67 <81 44> storeIntoTemp: 4
>>> >> 69 <87> pop
>>> >>
>>> >> where-as the following would be better:
>>> >>
>>> >> 58 <15> pushTemp: 5
>>> >> 59 <10> pushTemp: 0
>>> >> 60 <B2> send: <
>>> >> 61 <9B> jumpFalse: 66
>>> >> 62 <13> pushTemp: 3
>>> >> 63 <82 42> popIntoTemp: 2
>>> >> 65 <92> jumpTo: 69
>>> >> 66 <13> pushTemp: 3
>>> >> 67 <82 44> popIntoTemp: 4
>>> >>
>>> >> The reason is that the code generator favours using a single pop for
>>> >> both
>>> >> arms of the if:
>>> >>
>>> >> MessageNode>>sizeCodeForIf: encoder value: forValue
>>> >> | thenExpr elseExpr branchSize thenSize elseSize |
>>> >> thenExpr := arguments at: 1.
>>> >> elseExpr := arguments at: 2.
>>> >> (forValue
>>> >> or: [(thenExpr isJust: NodeNil)
>>> >> or: [elseExpr isJust: NodeNil]]) not
>>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>>> >> ifTrue:  "Two-armed IFs forEffect share a single pop"
>>> >> [^super sizeCodeForEffect: encoder].
>>> >> ...
>>> >>
>>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>>> >> | thenExpr thenSize elseExpr elseSize |
>>> >> thenSize := sizes at: 1.
>>> >> elseSize := sizes at: 2.
>>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>>> >> "Two-armed IFs forEffect share a single pop"
>>> >> [^super emitCodeForEffect: stack encoder: encoder].
>>> >>
>>> >> It would be nice if this only happened if doing so actually reduced the
>>> >> size of the generated code.
>>> >> --
>>> >> best,
>>> >> Eliot
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > best,
>>> > Eliot
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> best,
>> Eliot
>>
>>
>>
>>
>



--
Best regards,
Igor Stasenko.

Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Eliot Miranda-2


On Tue, Feb 14, 2012 at 2:55 PM, Igor Stasenko <[hidden email]> wrote:
On 15 February 2012 00:50, Nicolas Cellier
<[hidden email]> wrote:
> Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a écrit :
>> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier
>> <[hidden email]> wrote:
>>>
>>> Ah, I came up with a relatively explicit solution, but also quite
>>> heavy in term of number of messages required to implement the
>>> optimisation.
>>> I'm curious to see your implementation :)
>>
>>
>> similar, but should I commit it? I think mine is fine since it is only
>> assignments that have storepop forms, so your generality doesn't win
>> anything.  but i could be wrong.
>>
>
> I was thinking of
>
> 1) nested ifTrue: ifFalse:
> cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 := y]]
>
> elseExpr statements last is a message, not an assignment, though we
> can factor the pop out...
>
> 2) inlined loop messages don't require a final pop...
>
> x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y
> to: x do: [:i | (self at: i) doSome]].
>
> An encoder pushing a value in both branches just to pop it after
> wouldn't be that well behaved...
>
> I won't give any version publicly because I crashed several images in
> the bootstrap process, and don't know if the bug is in the bootstrap
> or in the code.
>
IMO, too much effort for such diminishing returns. :)

That was my thought too.  It would be interesting to quantify the benefits.  I noticed some 451 methods that were improved by my simple assignment-only mod.  How many additional methods get caught by yours Nicolas?
 

> Nicolas
>
>>  !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'!
>> ifThenElseForEffectShouldSharePop
>>         | thenExpr elseExpr |
>>         thenExpr := arguments at: 1.
>>         elseExpr := arguments at: 2.
>>         ^((thenExpr isJust: NodeNil)
>>            or: [(elseExpr isJust: NodeNil)
>>            or: [thenExpr statements last isAssignmentNode
>>                   and: [elseExpr statements last isAssignmentNode]]]) not
>>
>>         "^(thenExpr isJust: NodeNil) not
>>            and: [(elseExpr isJust: NodeNil) not
>>            and: [thenExpr statements last isAssignmentNode not
>>                   or: [elseExpr statements last isAssignmentNode not]]]"! !
>>
>>> Also, I'm sure that multiple-jump-elimination proposed by Igor and
>>> other kind of optimization would be a perfect subject for proving
>>> concepts of NewCompiler IR infrastructure.
>>>
>>> MessageNode>>doesEffectRequiresAPop: encoder
>>>        special > 0
>>>                ifTrue:
>>>                        [^self perform: (MacroTesters at: special) with:
>>> encoder].
>>>        ^super doesEffectRequiresAPop: encoder
>>>
>>> MessageNode>>doesEffectForIfRequiresAPop: encoder
>>>        "If both branch require a pop instruction, then the pop is factored
>>> after the jump.
>>>        In this case, the translation of this node also require a final pop
>>> that can further be eliminated eventually."
>>>        | thenExpr elseExpr |
>>>        thenExpr := arguments at: 1.
>>>        elseExpr := arguments at: 2.
>>>        ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not
>>>                and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder)
>>> and:
>>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]].
>>>
>>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
>>>        "Test whether a pop instruction is required to skip the value on
>>> the
>>> stack when this block is inlined."
>>>        ^statements size > 0 and: [statements last doesEffectRequiresAPop:
>>> anEncoder]
>>>
>>> AssigmentNode>>doesEffectRequiresAPop: anEncoder
>>>        "Test whether a pop instruction is required to skip the value on
>>> the stack"
>>>        ^(variable canStorePopInASingleInstruction: anEncoder) not
>>>
>>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
>>>        ^true
>>>
>>> etc...
>>>
>>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit
>>> :
>>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least I
>>> > *think* I've fixed it; I still have to check the decompiler).  But... in
>>> > doing so I realised I had to fix CompiledMethod>>#= since I needed to
>>> > see
>>> > exactly which methods the compiler change affected (just committed as
>>> > Kernel-eem.669).  In doing this I noticed that recent compiler changes
>>> > have
>>> > not been applied by recompiling affected methods in the system.  If you
>>> > look
>>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to
>>> > check
>>> > that the compiler's output remains unchanged.  In the 4.3 trunk image
>>> > with
>>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that
>>> > some 19654
>>> > methods need recompiling:
>>> >
>>> > (SystemNavigation new allSelect: [:m| (m methodClass
>>> > sourceMatchesBytecodeAt: m selector) not]) size 19654
>>> >
>>> > Being selective in recompiling, via e.g.
>>> >
>>> > SystemNavigation new allSelect: [:m| (m methodClass
>>> > sourceMatchesBytecodeAt:
>>> > m selector) ifFalse: [m methodClass recompile: m selector]. false]
>>> >
>>> > one gets to the desired
>>> >
>>> > (SystemNavigation new allSelect: [:m| (m methodClass
>>> > sourceMatchesBytecodeAt: m selector) not]) size 0
>>> >
>>> > So the question is should we add an update that does the selective
>>> > recompile?  I think we should, e.g. by committing a version of Compiler
>>> > that
>>> > contains the selective recompile as a post-script.  Agreed?
>>> >
>>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda
>>> > <[hidden email]>
>>> > wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >>     (whitewash alert)  I just had occasion to look at the bytecode
>>> >> generated by the standard compiler for HashedCollection
>>> >> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in bold:
>>> >>
>>> >> goodPrimeAtLeast: lowerLimit
>>> >> "Answer the next good prime >= lowerlimit.
>>> >> If lowerLimit is larger than the largest known good prime,
>>> >> just make it odd."
>>> >> | primes low mid high prime |
>>> >> primes := self goodPrimes.
>>> >> low := 1.
>>> >> high := primes size.
>>> >> lowerLimit > (primes at: high) ifTrue: [
>>> >> ^lowerLimit bitOr: 1 ].
>>> >> [ high - low <= 1 ] whileFalse: [
>>> >> mid := high + low // 2.
>>> >> prime := primes at: mid.
>>> >> prime = lowerLimit ifTrue: [ ^prime ].
>>> >> prime < lowerLimit
>>> >> ifTrue: [ low := mid ]
>>> >> ifFalse: [ high := mid ] ].
>>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>>> >> ^primes at: high
>>> >>
>>> >> The code for this sequence is
>>> >> 58 <15> pushTemp: 5
>>> >> 59 <10> pushTemp: 0
>>> >> 60 <B2> send: <
>>> >> 61 <9B> jumpFalse: 66
>>> >> 62 <13> pushTemp: 3
>>> >> 63 <81 42> storeIntoTemp: 2
>>> >> 65 <92> jumpTo: 69
>>> >> 66 <13> pushTemp: 3
>>> >> 67 <81 44> storeIntoTemp: 4
>>> >> 69 <87> pop
>>> >>
>>> >> where-as the following would be better:
>>> >>
>>> >> 58 <15> pushTemp: 5
>>> >> 59 <10> pushTemp: 0
>>> >> 60 <B2> send: <
>>> >> 61 <9B> jumpFalse: 66
>>> >> 62 <13> pushTemp: 3
>>> >> 63 <82 42> popIntoTemp: 2
>>> >> 65 <92> jumpTo: 69
>>> >> 66 <13> pushTemp: 3
>>> >> 67 <82 44> popIntoTemp: 4
>>> >>
>>> >> The reason is that the code generator favours using a single pop for
>>> >> both
>>> >> arms of the if:
>>> >>
>>> >> MessageNode>>sizeCodeForIf: encoder value: forValue
>>> >> | thenExpr elseExpr branchSize thenSize elseSize |
>>> >> thenExpr := arguments at: 1.
>>> >> elseExpr := arguments at: 2.
>>> >> (forValue
>>> >> or: [(thenExpr isJust: NodeNil)
>>> >> or: [elseExpr isJust: NodeNil]]) not
>>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>>> >> ifTrue:  "Two-armed IFs forEffect share a single pop"
>>> >> [^super sizeCodeForEffect: encoder].
>>> >> ...
>>> >>
>>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>>> >> | thenExpr thenSize elseExpr elseSize |
>>> >> thenSize := sizes at: 1.
>>> >> elseSize := sizes at: 2.
>>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>>> >> "Two-armed IFs forEffect share a single pop"
>>> >> [^super emitCodeForEffect: stack encoder: encoder].
>>> >>
>>> >> It would be nice if this only happened if doing so actually reduced the
>>> >> size of the generated code.
>>> >> --
>>> >> best,
>>> >> Eliot
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > best,
>>> > Eliot
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> best,
>> Eliot
>>
>>
>>
>>
>



--
Best regards,
Igor Stasenko.




--
best,
Eliot



Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
It's not a clean image, but here are the results:

Eliot code
-> 503 methods

Modified Eliot code
doesEffectForIfRequiresAPop: encoder
        "If both branch require a pop instruction, then the pop is factored
after the jump.
        If a branch is empty or ends with assignment, it don't ends with a
pop, so we don't need to factor a pop"
        ^arguments noneSatisfy: [:branch | (branch isJust: NodeNil) or:
[branch statements last isAssignmentNode]].
-> 880 methods

Mine
doesEffectForIfRequiresAPop: encoder
        "If both branch require a pop instruction, then the pop is factored
after the jump.
        In this case, the translation of this node also require a final pop
that can further be eliminated eventually."
        ^arguments allSatisfy: [:branch | (branch isJust: NodeNil) not and:
[branch doesEvaluatingEffectRequiresAPop: encoder]].
-> 1202 methods (code attached)

And I think I forgot the case of ReturnNode which don't require a pop...

Nicolas

Le 15 février 2012 00:57, Eliot Miranda <[hidden email]> a écrit :

>
>
> On Tue, Feb 14, 2012 at 2:55 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 15 February 2012 00:50, Nicolas Cellier
>> <[hidden email]> wrote:
>> > Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a
>> > écrit :
>> >> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier
>> >> <[hidden email]> wrote:
>> >>>
>> >>> Ah, I came up with a relatively explicit solution, but also quite
>> >>> heavy in term of number of messages required to implement the
>> >>> optimisation.
>> >>> I'm curious to see your implementation :)
>> >>
>> >>
>> >> similar, but should I commit it? I think mine is fine since it is only
>> >> assignments that have storepop forms, so your generality doesn't win
>> >> anything.  but i could be wrong.
>> >>
>> >
>> > I was thinking of
>> >
>> > 1) nested ifTrue: ifFalse:
>> > cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3
>> > := y]]
>> >
>> > elseExpr statements last is a message, not an assignment, though we
>> > can factor the pop out...
>> >
>> > 2) inlined loop messages don't require a final pop...
>> >
>> > x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y
>> > to: x do: [:i | (self at: i) doSome]].
>> >
>> > An encoder pushing a value in both branches just to pop it after
>> > wouldn't be that well behaved...
>> >
>> > I won't give any version publicly because I crashed several images in
>> > the bootstrap process, and don't know if the bug is in the bootstrap
>> > or in the code.
>> >
>> IMO, too much effort for such diminishing returns. :)
>
>
> That was my thought too.  It would be interesting to quantify the benefits.
>  I noticed some 451 methods that were improved by my simple assignment-only
> mod.  How many additional methods get caught by yours Nicolas?
>
>>
>>
>> > Nicolas
>> >
>> >>  !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012
>> >> 15:25'!
>> >> ifThenElseForEffectShouldSharePop
>> >>         | thenExpr elseExpr |
>> >>         thenExpr := arguments at: 1.
>> >>         elseExpr := arguments at: 2.
>> >>         ^((thenExpr isJust: NodeNil)
>> >>            or: [(elseExpr isJust: NodeNil)
>> >>            or: [thenExpr statements last isAssignmentNode
>> >>                   and: [elseExpr statements last isAssignmentNode]]])
>> >> not
>> >>
>> >>         "^(thenExpr isJust: NodeNil) not
>> >>            and: [(elseExpr isJust: NodeNil) not
>> >>            and: [thenExpr statements last isAssignmentNode not
>> >>                   or: [elseExpr statements last isAssignmentNode
>> >> not]]]"! !
>> >>
>> >>> Also, I'm sure that multiple-jump-elimination proposed by Igor and
>> >>> other kind of optimization would be a perfect subject for proving
>> >>> concepts of NewCompiler IR infrastructure.
>> >>>
>> >>> MessageNode>>doesEffectRequiresAPop: encoder
>> >>>        special > 0
>> >>>                ifTrue:
>> >>>                        [^self perform: (MacroTesters at: special)
>> >>> with:
>> >>> encoder].
>> >>>        ^super doesEffectRequiresAPop: encoder
>> >>>
>> >>> MessageNode>>doesEffectForIfRequiresAPop: encoder
>> >>>        "If both branch require a pop instruction, then the pop is
>> >>> factored
>> >>> after the jump.
>> >>>        In this case, the translation of this node also require a final
>> >>> pop
>> >>> that can further be eliminated eventually."
>> >>>        | thenExpr elseExpr |
>> >>>        thenExpr := arguments at: 1.
>> >>>        elseExpr := arguments at: 2.
>> >>>        ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil])
>> >>> not
>> >>>                and: [(thenExpr doesEvaluatingEffectRequiresAPop:
>> >>> encoder)
>> >>> and:
>> >>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]].
>> >>>
>> >>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
>> >>>        "Test whether a pop instruction is required to skip the value
>> >>> on
>> >>> the
>> >>> stack when this block is inlined."
>> >>>        ^statements size > 0 and: [statements last
>> >>> doesEffectRequiresAPop:
>> >>> anEncoder]
>> >>>
>> >>> AssigmentNode>>doesEffectRequiresAPop: anEncoder
>> >>>        "Test whether a pop instruction is required to skip the value
>> >>> on
>> >>> the stack"
>> >>>        ^(variable canStorePopInASingleInstruction: anEncoder) not
>> >>>
>> >>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
>> >>>        ^true
>> >>>
>> >>> etc...
>> >>>
>> >>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a
>> >>> écrit
>> >>> :
>> >>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least
>> >>> > I
>> >>> > *think* I've fixed it; I still have to check the decompiler).
>> >>> >  But... in
>> >>> > doing so I realised I had to fix CompiledMethod>>#= since I needed
>> >>> > to
>> >>> > see
>> >>> > exactly which methods the compiler change affected (just committed
>> >>> > as
>> >>> > Kernel-eem.669).  In doing this I noticed that recent compiler
>> >>> > changes
>> >>> > have
>> >>> > not been applied by recompiling affected methods in the system.  If
>> >>> > you
>> >>> > look
>> >>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to
>> >>> > check
>> >>> > that the compiler's output remains unchanged.  In the 4.3 trunk
>> >>> > image
>> >>> > with
>> >>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that
>> >>> > some 19654
>> >>> > methods need recompiling:
>> >>> >
>> >>> > (SystemNavigation new allSelect: [:m| (m methodClass
>> >>> > sourceMatchesBytecodeAt: m selector) not]) size 19654
>> >>> >
>> >>> > Being selective in recompiling, via e.g.
>> >>> >
>> >>> > SystemNavigation new allSelect: [:m| (m methodClass
>> >>> > sourceMatchesBytecodeAt:
>> >>> > m selector) ifFalse: [m methodClass recompile: m selector]. false]
>> >>> >
>> >>> > one gets to the desired
>> >>> >
>> >>> > (SystemNavigation new allSelect: [:m| (m methodClass
>> >>> > sourceMatchesBytecodeAt: m selector) not]) size 0
>> >>> >
>> >>> > So the question is should we add an update that does the selective
>> >>> > recompile?  I think we should, e.g. by committing a version of
>> >>> > Compiler
>> >>> > that
>> >>> > contains the selective recompile as a post-script.  Agreed?
>> >>> >
>> >>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda
>> >>> > <[hidden email]>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi All,
>> >>> >>
>> >>> >>     (whitewash alert)  I just had occasion to look at the bytecode
>> >>> >> generated by the standard compiler for HashedCollection
>> >>> >> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in
>> >>> >> bold:
>> >>> >>
>> >>> >> goodPrimeAtLeast: lowerLimit
>> >>> >> "Answer the next good prime >= lowerlimit.
>> >>> >> If lowerLimit is larger than the largest known good prime,
>> >>> >> just make it odd."
>> >>> >> | primes low mid high prime |
>> >>> >> primes := self goodPrimes.
>> >>> >> low := 1.
>> >>> >> high := primes size.
>> >>> >> lowerLimit > (primes at: high) ifTrue: [
>> >>> >> ^lowerLimit bitOr: 1 ].
>> >>> >> [ high - low <= 1 ] whileFalse: [
>> >>> >> mid := high + low // 2.
>> >>> >> prime := primes at: mid.
>> >>> >> prime = lowerLimit ifTrue: [ ^prime ].
>> >>> >> prime < lowerLimit
>> >>> >> ifTrue: [ low := mid ]
>> >>> >> ifFalse: [ high := mid ] ].
>> >>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>> >>> >> ^primes at: high
>> >>> >>
>> >>> >> The code for this sequence is
>> >>> >> 58 <15> pushTemp: 5
>> >>> >> 59 <10> pushTemp: 0
>> >>> >> 60 <B2> send: <
>> >>> >> 61 <9B> jumpFalse: 66
>> >>> >> 62 <13> pushTemp: 3
>> >>> >> 63 <81 42> storeIntoTemp: 2
>> >>> >> 65 <92> jumpTo: 69
>> >>> >> 66 <13> pushTemp: 3
>> >>> >> 67 <81 44> storeIntoTemp: 4
>> >>> >> 69 <87> pop
>> >>> >>
>> >>> >> where-as the following would be better:
>> >>> >>
>> >>> >> 58 <15> pushTemp: 5
>> >>> >> 59 <10> pushTemp: 0
>> >>> >> 60 <B2> send: <
>> >>> >> 61 <9B> jumpFalse: 66
>> >>> >> 62 <13> pushTemp: 3
>> >>> >> 63 <82 42> popIntoTemp: 2
>> >>> >> 65 <92> jumpTo: 69
>> >>> >> 66 <13> pushTemp: 3
>> >>> >> 67 <82 44> popIntoTemp: 4
>> >>> >>
>> >>> >> The reason is that the code generator favours using a single pop
>> >>> >> for
>> >>> >> both
>> >>> >> arms of the if:
>> >>> >>
>> >>> >> MessageNode>>sizeCodeForIf: encoder value: forValue
>> >>> >> | thenExpr elseExpr branchSize thenSize elseSize |
>> >>> >> thenExpr := arguments at: 1.
>> >>> >> elseExpr := arguments at: 2.
>> >>> >> (forValue
>> >>> >> or: [(thenExpr isJust: NodeNil)
>> >>> >> or: [elseExpr isJust: NodeNil]]) not
>> >>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>> >>> >> ifTrue:  "Two-armed IFs forEffect share a single pop"
>> >>> >> [^super sizeCodeForEffect: encoder].
>> >>> >> ...
>> >>> >>
>> >>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>> >>> >> | thenExpr thenSize elseExpr elseSize |
>> >>> >> thenSize := sizes at: 1.
>> >>> >> elseSize := sizes at: 2.
>> >>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>> >>> >> "Two-armed IFs forEffect share a single pop"
>> >>> >> [^super emitCodeForEffect: stack encoder: encoder].
>> >>> >>
>> >>> >> It would be nice if this only happened if doing so actually reduced
>> >>> >> the
>> >>> >> size of the generated code.
>> >>> >> --
>> >>> >> best,
>> >>> >> Eliot
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > best,
>> >>> > Eliot
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> best,
>> >> Eliot
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko.
>>
>
>
>
> --
> best,
> Eliot
>
>
>
>



sharePop2.st (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: here's a curio for compiler mavens...

Nicolas Cellier
1238 methods don't really need to factor pop if I also handle the case of return
Note that this does not always make a difference neither in method
length nor in execution time...
It does for example if you end with an assignment in one branch and a
return in the other, like:
    cond ifTrue: [x := 0] ifFalse: [x := -1. ^self]. self doFurtherProcessing

So the above count does not really tell which methods are worth changing...

The case of assignment in single branch is interesting because we
replace a store and a pop with a single storePop.
    cond ifTrue: [x := 0] ifFalse: [self doSome]. self doFurtherProcessing
Don't know if it gains anything when jitted though...

Nicolas

Le 15 février 2012 01:24, Nicolas Cellier
<[hidden email]> a écrit :

> It's not a clean image, but here are the results:
>
> Eliot code
> -> 503 methods
>
> Modified Eliot code
> doesEffectForIfRequiresAPop: encoder
>        "If both branch require a pop instruction, then the pop is factored
> after the jump.
>        If a branch is empty or ends with assignment, it don't ends with a
> pop, so we don't need to factor a pop"
>        ^arguments noneSatisfy: [:branch | (branch isJust: NodeNil) or:
> [branch statements last isAssignmentNode]].
> -> 880 methods
>
> Mine
> doesEffectForIfRequiresAPop: encoder
>        "If both branch require a pop instruction, then the pop is factored
> after the jump.
>        In this case, the translation of this node also require a final pop
> that can further be eliminated eventually."
>        ^arguments allSatisfy: [:branch | (branch isJust: NodeNil) not and:
> [branch doesEvaluatingEffectRequiresAPop: encoder]].
> -> 1202 methods (code attached)
>
> And I think I forgot the case of ReturnNode which don't require a pop...
>
> Nicolas
>
> Le 15 février 2012 00:57, Eliot Miranda <[hidden email]> a écrit :
>>
>>
>> On Tue, Feb 14, 2012 at 2:55 PM, Igor Stasenko <[hidden email]> wrote:
>>>
>>> On 15 February 2012 00:50, Nicolas Cellier
>>> <[hidden email]> wrote:
>>> > Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a
>>> > écrit :
>>> >> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier
>>> >> <[hidden email]> wrote:
>>> >>>
>>> >>> Ah, I came up with a relatively explicit solution, but also quite
>>> >>> heavy in term of number of messages required to implement the
>>> >>> optimisation.
>>> >>> I'm curious to see your implementation :)
>>> >>
>>> >>
>>> >> similar, but should I commit it? I think mine is fine since it is only
>>> >> assignments that have storepop forms, so your generality doesn't win
>>> >> anything.  but i could be wrong.
>>> >>
>>> >
>>> > I was thinking of
>>> >
>>> > 1) nested ifTrue: ifFalse:
>>> > cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3
>>> > := y]]
>>> >
>>> > elseExpr statements last is a message, not an assignment, though we
>>> > can factor the pop out...
>>> >
>>> > 2) inlined loop messages don't require a final pop...
>>> >
>>> > x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y
>>> > to: x do: [:i | (self at: i) doSome]].
>>> >
>>> > An encoder pushing a value in both branches just to pop it after
>>> > wouldn't be that well behaved...
>>> >
>>> > I won't give any version publicly because I crashed several images in
>>> > the bootstrap process, and don't know if the bug is in the bootstrap
>>> > or in the code.
>>> >
>>> IMO, too much effort for such diminishing returns. :)
>>
>>
>> That was my thought too.  It would be interesting to quantify the benefits.
>>  I noticed some 451 methods that were improved by my simple assignment-only
>> mod.  How many additional methods get caught by yours Nicolas?
>>
>>>
>>>
>>> > Nicolas
>>> >
>>> >>  !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012
>>> >> 15:25'!
>>> >> ifThenElseForEffectShouldSharePop
>>> >>         | thenExpr elseExpr |
>>> >>         thenExpr := arguments at: 1.
>>> >>         elseExpr := arguments at: 2.
>>> >>         ^((thenExpr isJust: NodeNil)
>>> >>            or: [(elseExpr isJust: NodeNil)
>>> >>            or: [thenExpr statements last isAssignmentNode
>>> >>                   and: [elseExpr statements last isAssignmentNode]]])
>>> >> not
>>> >>
>>> >>         "^(thenExpr isJust: NodeNil) not
>>> >>            and: [(elseExpr isJust: NodeNil) not
>>> >>            and: [thenExpr statements last isAssignmentNode not
>>> >>                   or: [elseExpr statements last isAssignmentNode
>>> >> not]]]"! !
>>> >>
>>> >>> Also, I'm sure that multiple-jump-elimination proposed by Igor and
>>> >>> other kind of optimization would be a perfect subject for proving
>>> >>> concepts of NewCompiler IR infrastructure.
>>> >>>
>>> >>> MessageNode>>doesEffectRequiresAPop: encoder
>>> >>>        special > 0
>>> >>>                ifTrue:
>>> >>>                        [^self perform: (MacroTesters at: special)
>>> >>> with:
>>> >>> encoder].
>>> >>>        ^super doesEffectRequiresAPop: encoder
>>> >>>
>>> >>> MessageNode>>doesEffectForIfRequiresAPop: encoder
>>> >>>        "If both branch require a pop instruction, then the pop is
>>> >>> factored
>>> >>> after the jump.
>>> >>>        In this case, the translation of this node also require a final
>>> >>> pop
>>> >>> that can further be eliminated eventually."
>>> >>>        | thenExpr elseExpr |
>>> >>>        thenExpr := arguments at: 1.
>>> >>>        elseExpr := arguments at: 2.
>>> >>>        ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil])
>>> >>> not
>>> >>>                and: [(thenExpr doesEvaluatingEffectRequiresAPop:
>>> >>> encoder)
>>> >>> and:
>>> >>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]].
>>> >>>
>>> >>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder
>>> >>>        "Test whether a pop instruction is required to skip the value
>>> >>> on
>>> >>> the
>>> >>> stack when this block is inlined."
>>> >>>        ^statements size > 0 and: [statements last
>>> >>> doesEffectRequiresAPop:
>>> >>> anEncoder]
>>> >>>
>>> >>> AssigmentNode>>doesEffectRequiresAPop: anEncoder
>>> >>>        "Test whether a pop instruction is required to skip the value
>>> >>> on
>>> >>> the stack"
>>> >>>        ^(variable canStorePopInASingleInstruction: anEncoder) not
>>> >>>
>>> >>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder
>>> >>>        ^true
>>> >>>
>>> >>> etc...
>>> >>>
>>> >>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a
>>> >>> écrit
>>> >>> :
>>> >>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least
>>> >>> > I
>>> >>> > *think* I've fixed it; I still have to check the decompiler).
>>> >>> >  But... in
>>> >>> > doing so I realised I had to fix CompiledMethod>>#= since I needed
>>> >>> > to
>>> >>> > see
>>> >>> > exactly which methods the compiler change affected (just committed
>>> >>> > as
>>> >>> > Kernel-eem.669).  In doing this I noticed that recent compiler
>>> >>> > changes
>>> >>> > have
>>> >>> > not been applied by recompiling affected methods in the system.  If
>>> >>> > you
>>> >>> > look
>>> >>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to
>>> >>> > check
>>> >>> > that the compiler's output remains unchanged.  In the 4.3 trunk
>>> >>> > image
>>> >>> > with
>>> >>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that
>>> >>> > some 19654
>>> >>> > methods need recompiling:
>>> >>> >
>>> >>> > (SystemNavigation new allSelect: [:m| (m methodClass
>>> >>> > sourceMatchesBytecodeAt: m selector) not]) size 19654
>>> >>> >
>>> >>> > Being selective in recompiling, via e.g.
>>> >>> >
>>> >>> > SystemNavigation new allSelect: [:m| (m methodClass
>>> >>> > sourceMatchesBytecodeAt:
>>> >>> > m selector) ifFalse: [m methodClass recompile: m selector]. false]
>>> >>> >
>>> >>> > one gets to the desired
>>> >>> >
>>> >>> > (SystemNavigation new allSelect: [:m| (m methodClass
>>> >>> > sourceMatchesBytecodeAt: m selector) not]) size 0
>>> >>> >
>>> >>> > So the question is should we add an update that does the selective
>>> >>> > recompile?  I think we should, e.g. by committing a version of
>>> >>> > Compiler
>>> >>> > that
>>> >>> > contains the selective recompile as a post-script.  Agreed?
>>> >>> >
>>> >>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda
>>> >>> > <[hidden email]>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Hi All,
>>> >>> >>
>>> >>> >>     (whitewash alert)  I just had occasion to look at the bytecode
>>> >>> >> generated by the standard compiler for HashedCollection
>>> >>> >> class>>#goodPrimeAtLeast:.  Here's the source, with the issue in
>>> >>> >> bold:
>>> >>> >>
>>> >>> >> goodPrimeAtLeast: lowerLimit
>>> >>> >> "Answer the next good prime >= lowerlimit.
>>> >>> >> If lowerLimit is larger than the largest known good prime,
>>> >>> >> just make it odd."
>>> >>> >> | primes low mid high prime |
>>> >>> >> primes := self goodPrimes.
>>> >>> >> low := 1.
>>> >>> >> high := primes size.
>>> >>> >> lowerLimit > (primes at: high) ifTrue: [
>>> >>> >> ^lowerLimit bitOr: 1 ].
>>> >>> >> [ high - low <= 1 ] whileFalse: [
>>> >>> >> mid := high + low // 2.
>>> >>> >> prime := primes at: mid.
>>> >>> >> prime = lowerLimit ifTrue: [ ^prime ].
>>> >>> >> prime < lowerLimit
>>> >>> >> ifTrue: [ low := mid ]
>>> >>> >> ifFalse: [ high := mid ] ].
>>> >>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>>> >>> >> ^primes at: high
>>> >>> >>
>>> >>> >> The code for this sequence is
>>> >>> >> 58 <15> pushTemp: 5
>>> >>> >> 59 <10> pushTemp: 0
>>> >>> >> 60 <B2> send: <
>>> >>> >> 61 <9B> jumpFalse: 66
>>> >>> >> 62 <13> pushTemp: 3
>>> >>> >> 63 <81 42> storeIntoTemp: 2
>>> >>> >> 65 <92> jumpTo: 69
>>> >>> >> 66 <13> pushTemp: 3
>>> >>> >> 67 <81 44> storeIntoTemp: 4
>>> >>> >> 69 <87> pop
>>> >>> >>
>>> >>> >> where-as the following would be better:
>>> >>> >>
>>> >>> >> 58 <15> pushTemp: 5
>>> >>> >> 59 <10> pushTemp: 0
>>> >>> >> 60 <B2> send: <
>>> >>> >> 61 <9B> jumpFalse: 66
>>> >>> >> 62 <13> pushTemp: 3
>>> >>> >> 63 <82 42> popIntoTemp: 2
>>> >>> >> 65 <92> jumpTo: 69
>>> >>> >> 66 <13> pushTemp: 3
>>> >>> >> 67 <82 44> popIntoTemp: 4
>>> >>> >>
>>> >>> >> The reason is that the code generator favours using a single pop
>>> >>> >> for
>>> >>> >> both
>>> >>> >> arms of the if:
>>> >>> >>
>>> >>> >> MessageNode>>sizeCodeForIf: encoder value: forValue
>>> >>> >> | thenExpr elseExpr branchSize thenSize elseSize |
>>> >>> >> thenExpr := arguments at: 1.
>>> >>> >> elseExpr := arguments at: 2.
>>> >>> >> (forValue
>>> >>> >> or: [(thenExpr isJust: NodeNil)
>>> >>> >> or: [elseExpr isJust: NodeNil]]) not
>>> >>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)"
>>> >>> >> ifTrue:  "Two-armed IFs forEffect share a single pop"
>>> >>> >> [^super sizeCodeForEffect: encoder].
>>> >>> >> ...
>>> >>> >>
>>> >>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue
>>> >>> >> | thenExpr thenSize elseExpr elseSize |
>>> >>> >> thenSize := sizes at: 1.
>>> >>> >> elseSize := sizes at: 2.
>>> >>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue:
>>> >>> >> "Two-armed IFs forEffect share a single pop"
>>> >>> >> [^super emitCodeForEffect: stack encoder: encoder].
>>> >>> >>
>>> >>> >> It would be nice if this only happened if doing so actually reduced
>>> >>> >> the
>>> >>> >> size of the generated code.
>>> >>> >> --
>>> >>> >> best,
>>> >>> >> Eliot
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > best,
>>> >>> > Eliot
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> best,
>>> >> Eliot
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko.
>>>
>>
>>
>>
>> --
>> best,
>> Eliot
>>
>>
>>
>>



sharePop3.st (14K) Download Attachment