Hi All,
(whitewash alert) I just had occasion to look at the bytecode generated by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:. Here's the source, with the issue in bold:
goodPrimeAtLeast: lowerLimit "Answer the next good prime >= lowerlimit. If lowerLimit is larger than the largest known good prime,
just make it odd." | primes low mid high prime |
primes := self goodPrimes. low := 1. high := primes size.
lowerLimit > (primes at: high) ifTrue: [ ^lowerLimit bitOr: 1 ]. [ high - low <= 1 ] whileFalse: [
mid := high + low // 2. prime := primes at: mid. prime = lowerLimit ifTrue: [ ^prime ].
prime < lowerLimit ifTrue: [ low := mid ] ifFalse: [ high := mid ] ].
(primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. ^primes at: high
The code for this sequence is 58 <15> pushTemp: 5 59 <10> pushTemp: 0
60 <B2> send: < 61 <9B> jumpFalse: 66 62 <13> pushTemp: 3
63 <81 42> storeIntoTemp: 2 65 <92> jumpTo: 69 66 <13> pushTemp: 3
67 <81 44> storeIntoTemp: 4 69 <87> pop where-as the following would be better: 58 <15> pushTemp: 5 59 <10> pushTemp: 0
60 <B2> send: < 61 <9B> jumpFalse: 66 62 <13> pushTemp: 3
63 <82 42> popIntoTemp: 2 65 <92> jumpTo: 69 66 <13> pushTemp: 3
67 <82 44> popIntoTemp: 4 The reason is that the code generator favours using a single pop for both arms of the if:
MessageNode>>sizeCodeForIf: encoder value: forValue | thenExpr elseExpr branchSize thenSize elseSize | thenExpr := arguments at: 1.
elseExpr := arguments at: 2. (forValue or: [(thenExpr isJust: NodeNil)
or: [elseExpr isJust: NodeNil]]) not "(...not ifTrue: avoids using ifFalse: alone during this compile)"
ifTrue: "Two-armed IFs forEffect share a single pop" [^super sizeCodeForEffect: encoder].
... MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue | thenExpr thenSize elseExpr elseSize |
thenSize := sizes at: 1. elseSize := sizes at: 2. (forValue not and: [elseSize * thenSize > 0]) ifTrue:
"Two-armed IFs forEffect share a single pop" [^super emitCodeForEffect: stack encoder: encoder].
It would be nice if this only happened if doing so actually reduced the size of the generated code. -- best, Eliot |
On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote:
> Hi All, > > (whitewash alert) I just had occasion to look at the bytecode generated > by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:. > Here's the source, with the issue in bold: > > goodPrimeAtLeast: lowerLimit > "Answer the next good prime >= lowerlimit. > If lowerLimit is larger than the largest known good prime, > just make it odd." > | primes low mid high prime | > primes := self goodPrimes. > low := 1. > high := primes size. > lowerLimit > (primes at: high) ifTrue: [ > ^lowerLimit bitOr: 1 ]. > [ high - low <= 1 ] whileFalse: [ > mid := high + low // 2. > prime := primes at: mid. > prime = lowerLimit ifTrue: [ ^prime ]. > prime < lowerLimit > ifTrue: [ low := mid ] > ifFalse: [ high := mid ] ]. > (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. > ^primes at: high > > The code for this sequence is > 58 <15> pushTemp: 5 > 59 <10> pushTemp: 0 > 60 <B2> send: < > 61 <9B> jumpFalse: 66 > 62 <13> pushTemp: 3 > 63 <81 42> storeIntoTemp: 2 > 65 <92> jumpTo: 69 > 66 <13> pushTemp: 3 > 67 <81 44> storeIntoTemp: 4 > 69 <87> pop > > where-as the following would be better: > > 58 <15> pushTemp: 5 > 59 <10> pushTemp: 0 > 60 <B2> send: < > 61 <9B> jumpFalse: 66 > 62 <13> pushTemp: 3 > 63 <82 42> popIntoTemp: 2 > 65 <92> jumpTo: 69 > 66 <13> pushTemp: 3 > 67 <82 44> popIntoTemp: 4 > [ (a) ifTrue:[ b] ifFalse: [c] ] value should answer b or c (depending on a) now if you use pop instead of store, it will break. Of course in while loop, the [ loop-body closure ] value is always discarded, so yes we can eliminate extra pop. But you can do even more: 1. loop head ... .... 61 <9B> jumpFalse: 66 62 <13> pushTemp: 3 63 <82 42> popIntoTemp: 2 ** 65 <92> jumpTo: 1 66 <13> pushTemp: 3 67 <82 44> popIntoTemp: 4 69 ... jumpTo: 1 so, we can have 1 less jump when code enters first branch. > The reason is that the code generator favours using a single pop for both > arms of the if: > > MessageNode>>sizeCodeForIf: encoder value: forValue > | thenExpr elseExpr branchSize thenSize elseSize | > thenExpr := arguments at: 1. > elseExpr := arguments at: 2. > (forValue > or: [(thenExpr isJust: NodeNil) > or: [elseExpr isJust: NodeNil]]) not > "(...not ifTrue: avoids using ifFalse: alone during this compile)" > ifTrue: "Two-armed IFs forEffect share a single pop" > [^super sizeCodeForEffect: encoder]. > ... > > MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue > | thenExpr thenSize elseExpr elseSize | > thenSize := sizes at: 1. > elseSize := sizes at: 2. > (forValue not and: [elseSize * thenSize > 0]) ifTrue: > "Two-armed IFs forEffect share a single pop" > [^super emitCodeForEffect: stack encoder: encoder]. > > It would be nice if this only happened if doing so actually reduced the size > of the generated code. > -- > best, > Eliot > > > > -- Best regards, Igor Stasenko. |
In reply to this post by Eliot Miranda-2
Le 13 février 2012 20:58, Eliot Miranda <[hidden email]> a écrit :
> Hi All, > > (whitewash alert) I just had occasion to look at the bytecode generated > by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:. > Here's the source, with the issue in bold: > > goodPrimeAtLeast: lowerLimit > "Answer the next good prime >= lowerlimit. > If lowerLimit is larger than the largest known good prime, > just make it odd." > | primes low mid high prime | > primes := self goodPrimes. > low := 1. > high := primes size. > lowerLimit > (primes at: high) ifTrue: [ > ^lowerLimit bitOr: 1 ]. > [ high - low <= 1 ] whileFalse: [ > mid := high + low // 2. > prime := primes at: mid. > prime = lowerLimit ifTrue: [ ^prime ]. > prime < lowerLimit > ifTrue: [ low := mid ] > ifFalse: [ high := mid ] ]. > (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. > ^primes at: high > > The code for this sequence is > 58 <15> pushTemp: 5 > 59 <10> pushTemp: 0 > 60 <B2> send: < > 61 <9B> jumpFalse: 66 > 62 <13> pushTemp: 3 > 63 <81 42> storeIntoTemp: 2 > 65 <92> jumpTo: 69 > 66 <13> pushTemp: 3 > 67 <81 44> storeIntoTemp: 4 > 69 <87> pop > > where-as the following would be better: > > 58 <15> pushTemp: 5 > 59 <10> pushTemp: 0 > 60 <B2> send: < > 61 <9B> jumpFalse: 66 > 62 <13> pushTemp: 3 > 63 <82 42> popIntoTemp: 2 > 65 <92> jumpTo: 69 > 66 <13> pushTemp: 3 > 67 <82 44> popIntoTemp: 4 > > The reason is that the code generator favours using a single pop for both > arms of the if: > > MessageNode>>sizeCodeForIf: encoder value: forValue > | thenExpr elseExpr branchSize thenSize elseSize | > thenExpr := arguments at: 1. > elseExpr := arguments at: 2. > (forValue > or: [(thenExpr isJust: NodeNil) > or: [elseExpr isJust: NodeNil]]) not > "(...not ifTrue: avoids using ifFalse: alone during this compile)" > ifTrue: "Two-armed IFs forEffect share a single pop" > [^super sizeCodeForEffect: encoder]. > ... > > MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue > | thenExpr thenSize elseExpr elseSize | > thenSize := sizes at: 1. > elseSize := sizes at: 2. > (forValue not and: [elseSize * thenSize > 0]) ifTrue: > "Two-armed IFs forEffect share a single pop" > [^super emitCodeForEffect: stack encoder: encoder]. > > It would be nice if this only happened if doing so actually reduced the size > of the generated code. > -- Ah yes, for example ifTrue: [self doThis] ifFalse: [self doThat] will factor a pop. So, the room for optimisation is the case when both branch finish with assignment? (storePop) Will it break the Decompiler? Nicolas > best, > Eliot > > > > |
In reply to this post by Igor Stasenko
Le 13 février 2012 21:27, Igor Stasenko <[hidden email]> a écrit :
> On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote: >> Hi All, >> >> (whitewash alert) I just had occasion to look at the bytecode generated >> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:. >> Here's the source, with the issue in bold: >> >> goodPrimeAtLeast: lowerLimit >> "Answer the next good prime >= lowerlimit. >> If lowerLimit is larger than the largest known good prime, >> just make it odd." >> | primes low mid high prime | >> primes := self goodPrimes. >> low := 1. >> high := primes size. >> lowerLimit > (primes at: high) ifTrue: [ >> ^lowerLimit bitOr: 1 ]. >> [ high - low <= 1 ] whileFalse: [ >> mid := high + low // 2. >> prime := primes at: mid. >> prime = lowerLimit ifTrue: [ ^prime ]. >> prime < lowerLimit >> ifTrue: [ low := mid ] >> ifFalse: [ high := mid ] ]. >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >> ^primes at: high >> >> The code for this sequence is >> 58 <15> pushTemp: 5 >> 59 <10> pushTemp: 0 >> 60 <B2> send: < >> 61 <9B> jumpFalse: 66 >> 62 <13> pushTemp: 3 >> 63 <81 42> storeIntoTemp: 2 >> 65 <92> jumpTo: 69 >> 66 <13> pushTemp: 3 >> 67 <81 44> storeIntoTemp: 4 >> 69 <87> pop >> >> where-as the following would be better: >> >> 58 <15> pushTemp: 5 >> 59 <10> pushTemp: 0 >> 60 <B2> send: < >> 61 <9B> jumpFalse: 66 >> 62 <13> pushTemp: 3 >> 63 <82 42> popIntoTemp: 2 >> 65 <92> jumpTo: 69 >> 66 <13> pushTemp: 3 >> 67 <82 44> popIntoTemp: 4 >> > > [ (a) ifTrue:[ b] ifFalse: [c] ] value > should answer b or c (depending on a) > now if you use pop instead of store, it will break. > Yes but we always know whether we want to emit code forValue or just for effect. > Of course in while loop, the [ loop-body closure ] value is always > discarded, so yes we can > eliminate extra pop. > But you can do even more: > > 1. loop head > ... > .... > 61 <9B> jumpFalse: 66 > 62 <13> pushTemp: 3 > 63 <82 42> popIntoTemp: 2 > ** 65 <92> jumpTo: 1 > 66 <13> pushTemp: 3 > 67 <82 44> popIntoTemp: 4 > 69 ... jumpTo: 1 > > so, we can have 1 less jump when code enters first branch. > Very good! I think however that hacking the Decompiler will be a challenge! > >> The reason is that the code generator favours using a single pop for both >> arms of the if: >> >> MessageNode>>sizeCodeForIf: encoder value: forValue >> | thenExpr elseExpr branchSize thenSize elseSize | >> thenExpr := arguments at: 1. >> elseExpr := arguments at: 2. >> (forValue >> or: [(thenExpr isJust: NodeNil) >> or: [elseExpr isJust: NodeNil]]) not >> "(...not ifTrue: avoids using ifFalse: alone during this compile)" >> ifTrue: "Two-armed IFs forEffect share a single pop" >> [^super sizeCodeForEffect: encoder]. >> ... >> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue >> | thenExpr thenSize elseExpr elseSize | >> thenSize := sizes at: 1. >> elseSize := sizes at: 2. >> (forValue not and: [elseSize * thenSize > 0]) ifTrue: >> "Two-armed IFs forEffect share a single pop" >> [^super emitCodeForEffect: stack encoder: encoder]. >> >> It would be nice if this only happened if doing so actually reduced the size >> of the generated code. >> -- >> best, >> Eliot >> >> >> >> > > > > -- > Best regards, > Igor Stasenko. > |
On 13 February 2012 22:38, Nicolas Cellier
<[hidden email]> wrote: > Le 13 février 2012 21:27, Igor Stasenko <[hidden email]> a écrit : >> On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote: >>> Hi All, >>> >>> (whitewash alert) I just had occasion to look at the bytecode generated >>> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:. >>> Here's the source, with the issue in bold: >>> >>> goodPrimeAtLeast: lowerLimit >>> "Answer the next good prime >= lowerlimit. >>> If lowerLimit is larger than the largest known good prime, >>> just make it odd." >>> | primes low mid high prime | >>> primes := self goodPrimes. >>> low := 1. >>> high := primes size. >>> lowerLimit > (primes at: high) ifTrue: [ >>> ^lowerLimit bitOr: 1 ]. >>> [ high - low <= 1 ] whileFalse: [ >>> mid := high + low // 2. >>> prime := primes at: mid. >>> prime = lowerLimit ifTrue: [ ^prime ]. >>> prime < lowerLimit >>> ifTrue: [ low := mid ] >>> ifFalse: [ high := mid ] ]. >>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >>> ^primes at: high >>> >>> The code for this sequence is >>> 58 <15> pushTemp: 5 >>> 59 <10> pushTemp: 0 >>> 60 <B2> send: < >>> 61 <9B> jumpFalse: 66 >>> 62 <13> pushTemp: 3 >>> 63 <81 42> storeIntoTemp: 2 >>> 65 <92> jumpTo: 69 >>> 66 <13> pushTemp: 3 >>> 67 <81 44> storeIntoTemp: 4 >>> 69 <87> pop >>> >>> where-as the following would be better: >>> >>> 58 <15> pushTemp: 5 >>> 59 <10> pushTemp: 0 >>> 60 <B2> send: < >>> 61 <9B> jumpFalse: 66 >>> 62 <13> pushTemp: 3 >>> 63 <82 42> popIntoTemp: 2 >>> 65 <92> jumpTo: 69 >>> 66 <13> pushTemp: 3 >>> 67 <82 44> popIntoTemp: 4 >>> >> >> [ (a) ifTrue:[ b] ifFalse: [c] ] value >> should answer b or c (depending on a) >> now if you use pop instead of store, it will break. >> > > Yes but we always know whether we want to emit code forValue or just for effect. > >> Of course in while loop, the [ loop-body closure ] value is always >> discarded, so yes we can >> eliminate extra pop. >> But you can do even more: >> >> 1. loop head >> ... >> .... >> 61 <9B> jumpFalse: 66 >> 62 <13> pushTemp: 3 >> 63 <82 42> popIntoTemp: 2 >> ** 65 <92> jumpTo: 1 >> 66 <13> pushTemp: 3 >> 67 <82 44> popIntoTemp: 4 >> 69 ... jumpTo: 1 >> >> so, we can have 1 less jump when code enters first branch. >> > > Very good! > I think however that hacking the Decompiler will be a challenge! > Well, in this case it is easy to hide hints for decompiler directly in a bytecode: 1. loop head ... 100. jumpTo: 106 101. push ... 102. pop 103. jumpTo: 1 104. hint 105. more hints 106. ... section start for false branch n+1 ... ---- same actually for loops. instead of writing sophisticated detection code , we can simply read hints: 1. loop head ... 10. jumpIfFalse: 101 (leaving the loop ) ... loop body 99. jumpTo: 1 (loop head) 100. hint for loop 101. ... rest of method's bytecode As you can see , it doesn't affects the performance, since its unreachable code, just the size of bytecode. just don't put me on fire for suggesting that :) -- Best regards, Igor Stasenko. |
Le 13 février 2012 22:08, Igor Stasenko <[hidden email]> a écrit :
> On 13 February 2012 22:38, Nicolas Cellier > <[hidden email]> wrote: >> Le 13 février 2012 21:27, Igor Stasenko <[hidden email]> a écrit : >>> On 13 February 2012 20:58, Eliot Miranda <[hidden email]> wrote: >>>> Hi All, >>>> >>>> (whitewash alert) I just had occasion to look at the bytecode generated >>>> by the standard compiler for HashedCollection class>>#goodPrimeAtLeast:. >>>> Here's the source, with the issue in bold: >>>> >>>> goodPrimeAtLeast: lowerLimit >>>> "Answer the next good prime >= lowerlimit. >>>> If lowerLimit is larger than the largest known good prime, >>>> just make it odd." >>>> | primes low mid high prime | >>>> primes := self goodPrimes. >>>> low := 1. >>>> high := primes size. >>>> lowerLimit > (primes at: high) ifTrue: [ >>>> ^lowerLimit bitOr: 1 ]. >>>> [ high - low <= 1 ] whileFalse: [ >>>> mid := high + low // 2. >>>> prime := primes at: mid. >>>> prime = lowerLimit ifTrue: [ ^prime ]. >>>> prime < lowerLimit >>>> ifTrue: [ low := mid ] >>>> ifFalse: [ high := mid ] ]. >>>> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >>>> ^primes at: high >>>> >>>> The code for this sequence is >>>> 58 <15> pushTemp: 5 >>>> 59 <10> pushTemp: 0 >>>> 60 <B2> send: < >>>> 61 <9B> jumpFalse: 66 >>>> 62 <13> pushTemp: 3 >>>> 63 <81 42> storeIntoTemp: 2 >>>> 65 <92> jumpTo: 69 >>>> 66 <13> pushTemp: 3 >>>> 67 <81 44> storeIntoTemp: 4 >>>> 69 <87> pop >>>> >>>> where-as the following would be better: >>>> >>>> 58 <15> pushTemp: 5 >>>> 59 <10> pushTemp: 0 >>>> 60 <B2> send: < >>>> 61 <9B> jumpFalse: 66 >>>> 62 <13> pushTemp: 3 >>>> 63 <82 42> popIntoTemp: 2 >>>> 65 <92> jumpTo: 69 >>>> 66 <13> pushTemp: 3 >>>> 67 <82 44> popIntoTemp: 4 >>>> >>> >>> [ (a) ifTrue:[ b] ifFalse: [c] ] value >>> should answer b or c (depending on a) >>> now if you use pop instead of store, it will break. >>> >> >> Yes but we always know whether we want to emit code forValue or just for effect. >> >>> Of course in while loop, the [ loop-body closure ] value is always >>> discarded, so yes we can >>> eliminate extra pop. >>> But you can do even more: >>> >>> 1. loop head >>> ... >>> .... >>> 61 <9B> jumpFalse: 66 >>> 62 <13> pushTemp: 3 >>> 63 <82 42> popIntoTemp: 2 >>> ** 65 <92> jumpTo: 1 >>> 66 <13> pushTemp: 3 >>> 67 <82 44> popIntoTemp: 4 >>> 69 ... jumpTo: 1 >>> >>> so, we can have 1 less jump when code enters first branch. >>> >> >> Very good! >> I think however that hacking the Decompiler will be a challenge! >> > > Well, in this case it is easy to hide hints for decompiler directly in > a bytecode: > > 1. loop head > ... > 100. jumpTo: 106 > 101. push ... > 102. pop > 103. jumpTo: 1 > 104. hint > 105. more hints > 106. ... section start for false branch > > n+1 ... > > ---- > same actually for loops. instead of writing sophisticated detection > code , we can simply read hints: > > 1. loop head > ... > > 10. jumpIfFalse: 101 (leaving the loop ) > > ... loop body > > 99. jumpTo: 1 (loop head) > 100. hint for loop > 101. ... rest of method's bytecode > > As you can see , it doesn't affects the performance, since its > unreachable code, just the size of bytecode. > Huh, reserving plenty byte codes for nop or intermixing instructions and text, whatever you choose might lead you directly to the stake ;) You invented an old new idea: literate bytecode programming Nicolas > just don't put me on fire for suggesting that :) > > > -- > Best regards, > Igor Stasenko. > |
In reply to this post by Eliot Miranda-2
OK, I fixed this (a pleasant Monday afternoon distraction) (at least I *think* I've fixed it; I still have to check the decompiler). But... in doing so I realised I had to fix CompiledMethod>>#= since I needed to see exactly which methods the compiler change affected (just committed as Kernel-eem.669). In doing this I noticed that recent compiler changes have not been applied by recompiling affected methods in the system. If you look at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to check that the compiler's output remains unchanged. In the 4.3 trunk image with CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that some 19654 methods need recompiling:
(SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: m selector) not]) size 19654 Being selective in recompiling, via e.g.
SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: m selector) ifFalse: [m methodClass recompile: m selector]. false]
one gets to the desired (SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: m selector) not]) size 0
So the question is should we add an update that does the selective recompile? I think we should, e.g. by committing a version of Compiler that contains the selective recompile as a post-script. Agreed?
On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda <[hidden email]> wrote: Hi All, best, Eliot |
Ah, I came up with a relatively explicit solution, but also quite
heavy in term of number of messages required to implement the optimisation. I'm curious to see your implementation :) Also, I'm sure that multiple-jump-elimination proposed by Igor and other kind of optimization would be a perfect subject for proving concepts of NewCompiler IR infrastructure. MessageNode>>doesEffectRequiresAPop: encoder special > 0 ifTrue: [^self perform: (MacroTesters at: special) with: encoder]. ^super doesEffectRequiresAPop: encoder MessageNode>>doesEffectForIfRequiresAPop: encoder "If both branch require a pop instruction, then the pop is factored after the jump. In this case, the translation of this node also require a final pop that can further be eliminated eventually." | thenExpr elseExpr | thenExpr := arguments at: 1. elseExpr := arguments at: 2. ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder) and: [elseExpr doesEvaluatingEffectRequiresAPop: encoder]]. BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder "Test whether a pop instruction is required to skip the value on the stack when this block is inlined." ^statements size > 0 and: [statements last doesEffectRequiresAPop: anEncoder] AssigmentNode>>doesEffectRequiresAPop: anEncoder "Test whether a pop instruction is required to skip the value on the stack" ^(variable canStorePopInASingleInstruction: anEncoder) not InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder ^true etc... Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit : > OK, I fixed this (a pleasant Monday afternoon distraction) (at least I > *think* I've fixed it; I still have to check the decompiler). But... in > doing so I realised I had to fix CompiledMethod>>#= since I needed to see > exactly which methods the compiler change affected (just committed as > Kernel-eem.669). In doing this I noticed that recent compiler changes have > not been applied by recompiling affected methods in the system. If you look > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to check > that the compiler's output remains unchanged. In the 4.3 trunk image with > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that some 19654 > methods need recompiling: > > (SystemNavigation new allSelect: [:m| (m methodClass > sourceMatchesBytecodeAt: m selector) not]) size 19654 > > Being selective in recompiling, via e.g. > > SystemNavigation new allSelect: [:m| (m methodClass sourceMatchesBytecodeAt: > m selector) ifFalse: [m methodClass recompile: m selector]. false] > > one gets to the desired > > (SystemNavigation new allSelect: [:m| (m methodClass > sourceMatchesBytecodeAt: m selector) not]) size 0 > > So the question is should we add an update that does the selective > recompile? I think we should, e.g. by committing a version of Compiler that > contains the selective recompile as a post-script. Agreed? > > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda <[hidden email]> > wrote: >> >> Hi All, >> >> (whitewash alert) I just had occasion to look at the bytecode >> generated by the standard compiler for HashedCollection >> class>>#goodPrimeAtLeast:. Here's the source, with the issue in bold: >> >> goodPrimeAtLeast: lowerLimit >> "Answer the next good prime >= lowerlimit. >> If lowerLimit is larger than the largest known good prime, >> just make it odd." >> | primes low mid high prime | >> primes := self goodPrimes. >> low := 1. >> high := primes size. >> lowerLimit > (primes at: high) ifTrue: [ >> ^lowerLimit bitOr: 1 ]. >> [ high - low <= 1 ] whileFalse: [ >> mid := high + low // 2. >> prime := primes at: mid. >> prime = lowerLimit ifTrue: [ ^prime ]. >> prime < lowerLimit >> ifTrue: [ low := mid ] >> ifFalse: [ high := mid ] ]. >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >> ^primes at: high >> >> The code for this sequence is >> 58 <15> pushTemp: 5 >> 59 <10> pushTemp: 0 >> 60 <B2> send: < >> 61 <9B> jumpFalse: 66 >> 62 <13> pushTemp: 3 >> 63 <81 42> storeIntoTemp: 2 >> 65 <92> jumpTo: 69 >> 66 <13> pushTemp: 3 >> 67 <81 44> storeIntoTemp: 4 >> 69 <87> pop >> >> where-as the following would be better: >> >> 58 <15> pushTemp: 5 >> 59 <10> pushTemp: 0 >> 60 <B2> send: < >> 61 <9B> jumpFalse: 66 >> 62 <13> pushTemp: 3 >> 63 <82 42> popIntoTemp: 2 >> 65 <92> jumpTo: 69 >> 66 <13> pushTemp: 3 >> 67 <82 44> popIntoTemp: 4 >> >> The reason is that the code generator favours using a single pop for both >> arms of the if: >> >> MessageNode>>sizeCodeForIf: encoder value: forValue >> | thenExpr elseExpr branchSize thenSize elseSize | >> thenExpr := arguments at: 1. >> elseExpr := arguments at: 2. >> (forValue >> or: [(thenExpr isJust: NodeNil) >> or: [elseExpr isJust: NodeNil]]) not >> "(...not ifTrue: avoids using ifFalse: alone during this compile)" >> ifTrue: "Two-armed IFs forEffect share a single pop" >> [^super sizeCodeForEffect: encoder]. >> ... >> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue >> | thenExpr thenSize elseExpr elseSize | >> thenSize := sizes at: 1. >> elseSize := sizes at: 2. >> (forValue not and: [elseSize * thenSize > 0]) ifTrue: >> "Two-armed IFs forEffect share a single pop" >> [^super emitCodeForEffect: stack encoder: encoder]. >> >> It would be nice if this only happened if doing so actually reduced the >> size of the generated code. >> -- >> best, >> Eliot >> > > > > -- > best, > Eliot > > > > |
On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier <[hidden email]> wrote: Ah, I came up with a relatively explicit solution, but also quite !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'! ifThenElseForEffectShouldSharePop | thenExpr elseExpr | thenExpr := arguments at: 1.
elseExpr := arguments at: 2. ^((thenExpr isJust: NodeNil) or: [(elseExpr isJust: NodeNil) or: [thenExpr statements last isAssignmentNode and: [elseExpr statements last isAssignmentNode]]]) not
"^(thenExpr isJust: NodeNil) not and: [(elseExpr isJust: NodeNil) not and: [thenExpr statements last isAssignmentNode not or: [elseExpr statements last isAssignmentNode not]]]"! !
Also, I'm sure that multiple-jump-elimination proposed by Igor and best, Eliot ifThenElseForEffectShouldSharePop.st (6K) Download Attachment |
Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a écrit :
> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier > <[hidden email]> wrote: >> >> Ah, I came up with a relatively explicit solution, but also quite >> heavy in term of number of messages required to implement the >> optimisation. >> I'm curious to see your implementation :) > > > similar, but should I commit it? I think mine is fine since it is only > assignments that have storepop forms, so your generality doesn't win > anything. but i could be wrong. > I was thinking of 1) nested ifTrue: ifFalse: cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 := y]] elseExpr statements last is a message, not an assignment, though we can factor the pop out... 2) inlined loop messages don't require a final pop... x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y to: x do: [:i | (self at: i) doSome]]. An encoder pushing a value in both branches just to pop it after wouldn't be that well behaved... I won't give any version publicly because I crashed several images in the bootstrap process, and don't know if the bug is in the bootstrap or in the code. Nicolas > !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'! > ifThenElseForEffectShouldSharePop > | thenExpr elseExpr | > thenExpr := arguments at: 1. > elseExpr := arguments at: 2. > ^((thenExpr isJust: NodeNil) > or: [(elseExpr isJust: NodeNil) > or: [thenExpr statements last isAssignmentNode > and: [elseExpr statements last isAssignmentNode]]]) not > > "^(thenExpr isJust: NodeNil) not > and: [(elseExpr isJust: NodeNil) not > and: [thenExpr statements last isAssignmentNode not > or: [elseExpr statements last isAssignmentNode not]]]"! ! > >> Also, I'm sure that multiple-jump-elimination proposed by Igor and >> other kind of optimization would be a perfect subject for proving >> concepts of NewCompiler IR infrastructure. >> >> MessageNode>>doesEffectRequiresAPop: encoder >> special > 0 >> ifTrue: >> [^self perform: (MacroTesters at: special) with: >> encoder]. >> ^super doesEffectRequiresAPop: encoder >> >> MessageNode>>doesEffectForIfRequiresAPop: encoder >> "If both branch require a pop instruction, then the pop is factored >> after the jump. >> In this case, the translation of this node also require a final pop >> that can further be eliminated eventually." >> | thenExpr elseExpr | >> thenExpr := arguments at: 1. >> elseExpr := arguments at: 2. >> ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not >> and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder) >> and: >> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]]. >> >> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder >> "Test whether a pop instruction is required to skip the value on >> the >> stack when this block is inlined." >> ^statements size > 0 and: [statements last doesEffectRequiresAPop: >> anEncoder] >> >> AssigmentNode>>doesEffectRequiresAPop: anEncoder >> "Test whether a pop instruction is required to skip the value on >> the stack" >> ^(variable canStorePopInASingleInstruction: anEncoder) not >> >> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder >> ^true >> >> etc... >> >> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit >> : >> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least I >> > *think* I've fixed it; I still have to check the decompiler). But... in >> > doing so I realised I had to fix CompiledMethod>>#= since I needed to >> > see >> > exactly which methods the compiler change affected (just committed as >> > Kernel-eem.669). In doing this I noticed that recent compiler changes >> > have >> > not been applied by recompiling affected methods in the system. If you >> > look >> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to >> > check >> > that the compiler's output remains unchanged. In the 4.3 trunk image >> > with >> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that >> > some 19654 >> > methods need recompiling: >> > >> > (SystemNavigation new allSelect: [:m| (m methodClass >> > sourceMatchesBytecodeAt: m selector) not]) size 19654 >> > >> > Being selective in recompiling, via e.g. >> > >> > SystemNavigation new allSelect: [:m| (m methodClass >> > sourceMatchesBytecodeAt: >> > m selector) ifFalse: [m methodClass recompile: m selector]. false] >> > >> > one gets to the desired >> > >> > (SystemNavigation new allSelect: [:m| (m methodClass >> > sourceMatchesBytecodeAt: m selector) not]) size 0 >> > >> > So the question is should we add an update that does the selective >> > recompile? I think we should, e.g. by committing a version of Compiler >> > that >> > contains the selective recompile as a post-script. Agreed? >> > >> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda >> > <[hidden email]> >> > wrote: >> >> >> >> Hi All, >> >> >> >> (whitewash alert) I just had occasion to look at the bytecode >> >> generated by the standard compiler for HashedCollection >> >> class>>#goodPrimeAtLeast:. Here's the source, with the issue in bold: >> >> >> >> goodPrimeAtLeast: lowerLimit >> >> "Answer the next good prime >= lowerlimit. >> >> If lowerLimit is larger than the largest known good prime, >> >> just make it odd." >> >> | primes low mid high prime | >> >> primes := self goodPrimes. >> >> low := 1. >> >> high := primes size. >> >> lowerLimit > (primes at: high) ifTrue: [ >> >> ^lowerLimit bitOr: 1 ]. >> >> [ high - low <= 1 ] whileFalse: [ >> >> mid := high + low // 2. >> >> prime := primes at: mid. >> >> prime = lowerLimit ifTrue: [ ^prime ]. >> >> prime < lowerLimit >> >> ifTrue: [ low := mid ] >> >> ifFalse: [ high := mid ] ]. >> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >> >> ^primes at: high >> >> >> >> The code for this sequence is >> >> 58 <15> pushTemp: 5 >> >> 59 <10> pushTemp: 0 >> >> 60 <B2> send: < >> >> 61 <9B> jumpFalse: 66 >> >> 62 <13> pushTemp: 3 >> >> 63 <81 42> storeIntoTemp: 2 >> >> 65 <92> jumpTo: 69 >> >> 66 <13> pushTemp: 3 >> >> 67 <81 44> storeIntoTemp: 4 >> >> 69 <87> pop >> >> >> >> where-as the following would be better: >> >> >> >> 58 <15> pushTemp: 5 >> >> 59 <10> pushTemp: 0 >> >> 60 <B2> send: < >> >> 61 <9B> jumpFalse: 66 >> >> 62 <13> pushTemp: 3 >> >> 63 <82 42> popIntoTemp: 2 >> >> 65 <92> jumpTo: 69 >> >> 66 <13> pushTemp: 3 >> >> 67 <82 44> popIntoTemp: 4 >> >> >> >> The reason is that the code generator favours using a single pop for >> >> both >> >> arms of the if: >> >> >> >> MessageNode>>sizeCodeForIf: encoder value: forValue >> >> | thenExpr elseExpr branchSize thenSize elseSize | >> >> thenExpr := arguments at: 1. >> >> elseExpr := arguments at: 2. >> >> (forValue >> >> or: [(thenExpr isJust: NodeNil) >> >> or: [elseExpr isJust: NodeNil]]) not >> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)" >> >> ifTrue: "Two-armed IFs forEffect share a single pop" >> >> [^super sizeCodeForEffect: encoder]. >> >> ... >> >> >> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue >> >> | thenExpr thenSize elseExpr elseSize | >> >> thenSize := sizes at: 1. >> >> elseSize := sizes at: 2. >> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue: >> >> "Two-armed IFs forEffect share a single pop" >> >> [^super emitCodeForEffect: stack encoder: encoder]. >> >> >> >> It would be nice if this only happened if doing so actually reduced the >> >> size of the generated code. >> >> -- >> >> best, >> >> Eliot >> >> >> > >> > >> > >> > -- >> > best, >> > Eliot >> > >> > >> > >> > >> > > > > -- > best, > Eliot > > > > |
On 15 February 2012 00:50, Nicolas Cellier
<[hidden email]> wrote: > Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a écrit : >> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier >> <[hidden email]> wrote: >>> >>> Ah, I came up with a relatively explicit solution, but also quite >>> heavy in term of number of messages required to implement the >>> optimisation. >>> I'm curious to see your implementation :) >> >> >> similar, but should I commit it? I think mine is fine since it is only >> assignments that have storepop forms, so your generality doesn't win >> anything. but i could be wrong. >> > > I was thinking of > > 1) nested ifTrue: ifFalse: > cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 := y]] > > elseExpr statements last is a message, not an assignment, though we > can factor the pop out... > > 2) inlined loop messages don't require a final pop... > > x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y > to: x do: [:i | (self at: i) doSome]]. > > An encoder pushing a value in both branches just to pop it after > wouldn't be that well behaved... > > I won't give any version publicly because I crashed several images in > the bootstrap process, and don't know if the bug is in the bootstrap > or in the code. > > Nicolas > >> !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 15:25'! >> ifThenElseForEffectShouldSharePop >> | thenExpr elseExpr | >> thenExpr := arguments at: 1. >> elseExpr := arguments at: 2. >> ^((thenExpr isJust: NodeNil) >> or: [(elseExpr isJust: NodeNil) >> or: [thenExpr statements last isAssignmentNode >> and: [elseExpr statements last isAssignmentNode]]]) not >> >> "^(thenExpr isJust: NodeNil) not >> and: [(elseExpr isJust: NodeNil) not >> and: [thenExpr statements last isAssignmentNode not >> or: [elseExpr statements last isAssignmentNode not]]]"! ! >> >>> Also, I'm sure that multiple-jump-elimination proposed by Igor and >>> other kind of optimization would be a perfect subject for proving >>> concepts of NewCompiler IR infrastructure. >>> >>> MessageNode>>doesEffectRequiresAPop: encoder >>> special > 0 >>> ifTrue: >>> [^self perform: (MacroTesters at: special) with: >>> encoder]. >>> ^super doesEffectRequiresAPop: encoder >>> >>> MessageNode>>doesEffectForIfRequiresAPop: encoder >>> "If both branch require a pop instruction, then the pop is factored >>> after the jump. >>> In this case, the translation of this node also require a final pop >>> that can further be eliminated eventually." >>> | thenExpr elseExpr | >>> thenExpr := arguments at: 1. >>> elseExpr := arguments at: 2. >>> ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) not >>> and: [(thenExpr doesEvaluatingEffectRequiresAPop: encoder) >>> and: >>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]]. >>> >>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder >>> "Test whether a pop instruction is required to skip the value on >>> the >>> stack when this block is inlined." >>> ^statements size > 0 and: [statements last doesEffectRequiresAPop: >>> anEncoder] >>> >>> AssigmentNode>>doesEffectRequiresAPop: anEncoder >>> "Test whether a pop instruction is required to skip the value on >>> the stack" >>> ^(variable canStorePopInASingleInstruction: anEncoder) not >>> >>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder >>> ^true >>> >>> etc... >>> >>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a écrit >>> : >>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least I >>> > *think* I've fixed it; I still have to check the decompiler). But... in >>> > doing so I realised I had to fix CompiledMethod>>#= since I needed to >>> > see >>> > exactly which methods the compiler change affected (just committed as >>> > Kernel-eem.669). In doing this I noticed that recent compiler changes >>> > have >>> > not been applied by recompiling affected methods in the system. If you >>> > look >>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to >>> > check >>> > that the compiler's output remains unchanged. In the 4.3 trunk image >>> > with >>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that >>> > some 19654 >>> > methods need recompiling: >>> > >>> > (SystemNavigation new allSelect: [:m| (m methodClass >>> > sourceMatchesBytecodeAt: m selector) not]) size 19654 >>> > >>> > Being selective in recompiling, via e.g. >>> > >>> > SystemNavigation new allSelect: [:m| (m methodClass >>> > sourceMatchesBytecodeAt: >>> > m selector) ifFalse: [m methodClass recompile: m selector]. false] >>> > >>> > one gets to the desired >>> > >>> > (SystemNavigation new allSelect: [:m| (m methodClass >>> > sourceMatchesBytecodeAt: m selector) not]) size 0 >>> > >>> > So the question is should we add an update that does the selective >>> > recompile? I think we should, e.g. by committing a version of Compiler >>> > that >>> > contains the selective recompile as a post-script. Agreed? >>> > >>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda >>> > <[hidden email]> >>> > wrote: >>> >> >>> >> Hi All, >>> >> >>> >> (whitewash alert) I just had occasion to look at the bytecode >>> >> generated by the standard compiler for HashedCollection >>> >> class>>#goodPrimeAtLeast:. Here's the source, with the issue in bold: >>> >> >>> >> goodPrimeAtLeast: lowerLimit >>> >> "Answer the next good prime >= lowerlimit. >>> >> If lowerLimit is larger than the largest known good prime, >>> >> just make it odd." >>> >> | primes low mid high prime | >>> >> primes := self goodPrimes. >>> >> low := 1. >>> >> high := primes size. >>> >> lowerLimit > (primes at: high) ifTrue: [ >>> >> ^lowerLimit bitOr: 1 ]. >>> >> [ high - low <= 1 ] whileFalse: [ >>> >> mid := high + low // 2. >>> >> prime := primes at: mid. >>> >> prime = lowerLimit ifTrue: [ ^prime ]. >>> >> prime < lowerLimit >>> >> ifTrue: [ low := mid ] >>> >> ifFalse: [ high := mid ] ]. >>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >>> >> ^primes at: high >>> >> >>> >> The code for this sequence is >>> >> 58 <15> pushTemp: 5 >>> >> 59 <10> pushTemp: 0 >>> >> 60 <B2> send: < >>> >> 61 <9B> jumpFalse: 66 >>> >> 62 <13> pushTemp: 3 >>> >> 63 <81 42> storeIntoTemp: 2 >>> >> 65 <92> jumpTo: 69 >>> >> 66 <13> pushTemp: 3 >>> >> 67 <81 44> storeIntoTemp: 4 >>> >> 69 <87> pop >>> >> >>> >> where-as the following would be better: >>> >> >>> >> 58 <15> pushTemp: 5 >>> >> 59 <10> pushTemp: 0 >>> >> 60 <B2> send: < >>> >> 61 <9B> jumpFalse: 66 >>> >> 62 <13> pushTemp: 3 >>> >> 63 <82 42> popIntoTemp: 2 >>> >> 65 <92> jumpTo: 69 >>> >> 66 <13> pushTemp: 3 >>> >> 67 <82 44> popIntoTemp: 4 >>> >> >>> >> The reason is that the code generator favours using a single pop for >>> >> both >>> >> arms of the if: >>> >> >>> >> MessageNode>>sizeCodeForIf: encoder value: forValue >>> >> | thenExpr elseExpr branchSize thenSize elseSize | >>> >> thenExpr := arguments at: 1. >>> >> elseExpr := arguments at: 2. >>> >> (forValue >>> >> or: [(thenExpr isJust: NodeNil) >>> >> or: [elseExpr isJust: NodeNil]]) not >>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)" >>> >> ifTrue: "Two-armed IFs forEffect share a single pop" >>> >> [^super sizeCodeForEffect: encoder]. >>> >> ... >>> >> >>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue >>> >> | thenExpr thenSize elseExpr elseSize | >>> >> thenSize := sizes at: 1. >>> >> elseSize := sizes at: 2. >>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue: >>> >> "Two-armed IFs forEffect share a single pop" >>> >> [^super emitCodeForEffect: stack encoder: encoder]. >>> >> >>> >> It would be nice if this only happened if doing so actually reduced the >>> >> size of the generated code. >>> >> -- >>> >> best, >>> >> Eliot >>> >> >>> > >>> > >>> > >>> > -- >>> > best, >>> > Eliot >>> > >>> > >>> > >>> > >>> >> >> >> >> -- >> best, >> Eliot >> >> >> >> > -- Best regards, Igor Stasenko. |
On Tue, Feb 14, 2012 at 2:55 PM, Igor Stasenko <[hidden email]> wrote: On 15 February 2012 00:50, Nicolas Cellier That was my thought too. It would be interesting to quantify the benefits. I noticed some 451 methods that were improved by my simple assignment-only mod. How many additional methods get caught by yours Nicolas?
best, Eliot |
It's not a clean image, but here are the results:
Eliot code -> 503 methods Modified Eliot code doesEffectForIfRequiresAPop: encoder "If both branch require a pop instruction, then the pop is factored after the jump. If a branch is empty or ends with assignment, it don't ends with a pop, so we don't need to factor a pop" ^arguments noneSatisfy: [:branch | (branch isJust: NodeNil) or: [branch statements last isAssignmentNode]]. -> 880 methods Mine doesEffectForIfRequiresAPop: encoder "If both branch require a pop instruction, then the pop is factored after the jump. In this case, the translation of this node also require a final pop that can further be eliminated eventually." ^arguments allSatisfy: [:branch | (branch isJust: NodeNil) not and: [branch doesEvaluatingEffectRequiresAPop: encoder]]. -> 1202 methods (code attached) And I think I forgot the case of ReturnNode which don't require a pop... Nicolas Le 15 février 2012 00:57, Eliot Miranda <[hidden email]> a écrit : > > > On Tue, Feb 14, 2012 at 2:55 PM, Igor Stasenko <[hidden email]> wrote: >> >> On 15 February 2012 00:50, Nicolas Cellier >> <[hidden email]> wrote: >> > Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a >> > écrit : >> >> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier >> >> <[hidden email]> wrote: >> >>> >> >>> Ah, I came up with a relatively explicit solution, but also quite >> >>> heavy in term of number of messages required to implement the >> >>> optimisation. >> >>> I'm curious to see your implementation :) >> >> >> >> >> >> similar, but should I commit it? I think mine is fine since it is only >> >> assignments that have storepop forms, so your generality doesn't win >> >> anything. but i could be wrong. >> >> >> > >> > I was thinking of >> > >> > 1) nested ifTrue: ifFalse: >> > cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 >> > := y]] >> > >> > elseExpr statements last is a message, not an assignment, though we >> > can factor the pop out... >> > >> > 2) inlined loop messages don't require a final pop... >> > >> > x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y >> > to: x do: [:i | (self at: i) doSome]]. >> > >> > An encoder pushing a value in both branches just to pop it after >> > wouldn't be that well behaved... >> > >> > I won't give any version publicly because I crashed several images in >> > the bootstrap process, and don't know if the bug is in the bootstrap >> > or in the code. >> > >> IMO, too much effort for such diminishing returns. :) > > > That was my thought too. It would be interesting to quantify the benefits. > I noticed some 451 methods that were improved by my simple assignment-only > mod. How many additional methods get caught by yours Nicolas? > >> >> >> > Nicolas >> > >> >> !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 >> >> 15:25'! >> >> ifThenElseForEffectShouldSharePop >> >> | thenExpr elseExpr | >> >> thenExpr := arguments at: 1. >> >> elseExpr := arguments at: 2. >> >> ^((thenExpr isJust: NodeNil) >> >> or: [(elseExpr isJust: NodeNil) >> >> or: [thenExpr statements last isAssignmentNode >> >> and: [elseExpr statements last isAssignmentNode]]]) >> >> not >> >> >> >> "^(thenExpr isJust: NodeNil) not >> >> and: [(elseExpr isJust: NodeNil) not >> >> and: [thenExpr statements last isAssignmentNode not >> >> or: [elseExpr statements last isAssignmentNode >> >> not]]]"! ! >> >> >> >>> Also, I'm sure that multiple-jump-elimination proposed by Igor and >> >>> other kind of optimization would be a perfect subject for proving >> >>> concepts of NewCompiler IR infrastructure. >> >>> >> >>> MessageNode>>doesEffectRequiresAPop: encoder >> >>> special > 0 >> >>> ifTrue: >> >>> [^self perform: (MacroTesters at: special) >> >>> with: >> >>> encoder]. >> >>> ^super doesEffectRequiresAPop: encoder >> >>> >> >>> MessageNode>>doesEffectForIfRequiresAPop: encoder >> >>> "If both branch require a pop instruction, then the pop is >> >>> factored >> >>> after the jump. >> >>> In this case, the translation of this node also require a final >> >>> pop >> >>> that can further be eliminated eventually." >> >>> | thenExpr elseExpr | >> >>> thenExpr := arguments at: 1. >> >>> elseExpr := arguments at: 2. >> >>> ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) >> >>> not >> >>> and: [(thenExpr doesEvaluatingEffectRequiresAPop: >> >>> encoder) >> >>> and: >> >>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]]. >> >>> >> >>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder >> >>> "Test whether a pop instruction is required to skip the value >> >>> on >> >>> the >> >>> stack when this block is inlined." >> >>> ^statements size > 0 and: [statements last >> >>> doesEffectRequiresAPop: >> >>> anEncoder] >> >>> >> >>> AssigmentNode>>doesEffectRequiresAPop: anEncoder >> >>> "Test whether a pop instruction is required to skip the value >> >>> on >> >>> the stack" >> >>> ^(variable canStorePopInASingleInstruction: anEncoder) not >> >>> >> >>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder >> >>> ^true >> >>> >> >>> etc... >> >>> >> >>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a >> >>> écrit >> >>> : >> >>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least >> >>> > I >> >>> > *think* I've fixed it; I still have to check the decompiler). >> >>> > But... in >> >>> > doing so I realised I had to fix CompiledMethod>>#= since I needed >> >>> > to >> >>> > see >> >>> > exactly which methods the compiler change affected (just committed >> >>> > as >> >>> > Kernel-eem.669). In doing this I noticed that recent compiler >> >>> > changes >> >>> > have >> >>> > not been applied by recompiling affected methods in the system. If >> >>> > you >> >>> > look >> >>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to >> >>> > check >> >>> > that the compiler's output remains unchanged. In the 4.3 trunk >> >>> > image >> >>> > with >> >>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that >> >>> > some 19654 >> >>> > methods need recompiling: >> >>> > >> >>> > (SystemNavigation new allSelect: [:m| (m methodClass >> >>> > sourceMatchesBytecodeAt: m selector) not]) size 19654 >> >>> > >> >>> > Being selective in recompiling, via e.g. >> >>> > >> >>> > SystemNavigation new allSelect: [:m| (m methodClass >> >>> > sourceMatchesBytecodeAt: >> >>> > m selector) ifFalse: [m methodClass recompile: m selector]. false] >> >>> > >> >>> > one gets to the desired >> >>> > >> >>> > (SystemNavigation new allSelect: [:m| (m methodClass >> >>> > sourceMatchesBytecodeAt: m selector) not]) size 0 >> >>> > >> >>> > So the question is should we add an update that does the selective >> >>> > recompile? I think we should, e.g. by committing a version of >> >>> > Compiler >> >>> > that >> >>> > contains the selective recompile as a post-script. Agreed? >> >>> > >> >>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda >> >>> > <[hidden email]> >> >>> > wrote: >> >>> >> >> >>> >> Hi All, >> >>> >> >> >>> >> (whitewash alert) I just had occasion to look at the bytecode >> >>> >> generated by the standard compiler for HashedCollection >> >>> >> class>>#goodPrimeAtLeast:. Here's the source, with the issue in >> >>> >> bold: >> >>> >> >> >>> >> goodPrimeAtLeast: lowerLimit >> >>> >> "Answer the next good prime >= lowerlimit. >> >>> >> If lowerLimit is larger than the largest known good prime, >> >>> >> just make it odd." >> >>> >> | primes low mid high prime | >> >>> >> primes := self goodPrimes. >> >>> >> low := 1. >> >>> >> high := primes size. >> >>> >> lowerLimit > (primes at: high) ifTrue: [ >> >>> >> ^lowerLimit bitOr: 1 ]. >> >>> >> [ high - low <= 1 ] whileFalse: [ >> >>> >> mid := high + low // 2. >> >>> >> prime := primes at: mid. >> >>> >> prime = lowerLimit ifTrue: [ ^prime ]. >> >>> >> prime < lowerLimit >> >>> >> ifTrue: [ low := mid ] >> >>> >> ifFalse: [ high := mid ] ]. >> >>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >> >>> >> ^primes at: high >> >>> >> >> >>> >> The code for this sequence is >> >>> >> 58 <15> pushTemp: 5 >> >>> >> 59 <10> pushTemp: 0 >> >>> >> 60 <B2> send: < >> >>> >> 61 <9B> jumpFalse: 66 >> >>> >> 62 <13> pushTemp: 3 >> >>> >> 63 <81 42> storeIntoTemp: 2 >> >>> >> 65 <92> jumpTo: 69 >> >>> >> 66 <13> pushTemp: 3 >> >>> >> 67 <81 44> storeIntoTemp: 4 >> >>> >> 69 <87> pop >> >>> >> >> >>> >> where-as the following would be better: >> >>> >> >> >>> >> 58 <15> pushTemp: 5 >> >>> >> 59 <10> pushTemp: 0 >> >>> >> 60 <B2> send: < >> >>> >> 61 <9B> jumpFalse: 66 >> >>> >> 62 <13> pushTemp: 3 >> >>> >> 63 <82 42> popIntoTemp: 2 >> >>> >> 65 <92> jumpTo: 69 >> >>> >> 66 <13> pushTemp: 3 >> >>> >> 67 <82 44> popIntoTemp: 4 >> >>> >> >> >>> >> The reason is that the code generator favours using a single pop >> >>> >> for >> >>> >> both >> >>> >> arms of the if: >> >>> >> >> >>> >> MessageNode>>sizeCodeForIf: encoder value: forValue >> >>> >> | thenExpr elseExpr branchSize thenSize elseSize | >> >>> >> thenExpr := arguments at: 1. >> >>> >> elseExpr := arguments at: 2. >> >>> >> (forValue >> >>> >> or: [(thenExpr isJust: NodeNil) >> >>> >> or: [elseExpr isJust: NodeNil]]) not >> >>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)" >> >>> >> ifTrue: "Two-armed IFs forEffect share a single pop" >> >>> >> [^super sizeCodeForEffect: encoder]. >> >>> >> ... >> >>> >> >> >>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue >> >>> >> | thenExpr thenSize elseExpr elseSize | >> >>> >> thenSize := sizes at: 1. >> >>> >> elseSize := sizes at: 2. >> >>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue: >> >>> >> "Two-armed IFs forEffect share a single pop" >> >>> >> [^super emitCodeForEffect: stack encoder: encoder]. >> >>> >> >> >>> >> It would be nice if this only happened if doing so actually reduced >> >>> >> the >> >>> >> size of the generated code. >> >>> >> -- >> >>> >> best, >> >>> >> Eliot >> >>> >> >> >>> > >> >>> > >> >>> > >> >>> > -- >> >>> > best, >> >>> > Eliot >> >>> > >> >>> > >> >>> > >> >>> > >> >>> >> >> >> >> >> >> >> >> -- >> >> best, >> >> Eliot >> >> >> >> >> >> >> >> >> > >> >> >> >> -- >> Best regards, >> Igor Stasenko. >> > > > > -- > best, > Eliot > > > > sharePop2.st (13K) Download Attachment |
1238 methods don't really need to factor pop if I also handle the case of return
Note that this does not always make a difference neither in method length nor in execution time... It does for example if you end with an assignment in one branch and a return in the other, like: cond ifTrue: [x := 0] ifFalse: [x := -1. ^self]. self doFurtherProcessing So the above count does not really tell which methods are worth changing... The case of assignment in single branch is interesting because we replace a store and a pop with a single storePop. cond ifTrue: [x := 0] ifFalse: [self doSome]. self doFurtherProcessing Don't know if it gains anything when jitted though... Nicolas Le 15 février 2012 01:24, Nicolas Cellier <[hidden email]> a écrit : > It's not a clean image, but here are the results: > > Eliot code > -> 503 methods > > Modified Eliot code > doesEffectForIfRequiresAPop: encoder > "If both branch require a pop instruction, then the pop is factored > after the jump. > If a branch is empty or ends with assignment, it don't ends with a > pop, so we don't need to factor a pop" > ^arguments noneSatisfy: [:branch | (branch isJust: NodeNil) or: > [branch statements last isAssignmentNode]]. > -> 880 methods > > Mine > doesEffectForIfRequiresAPop: encoder > "If both branch require a pop instruction, then the pop is factored > after the jump. > In this case, the translation of this node also require a final pop > that can further be eliminated eventually." > ^arguments allSatisfy: [:branch | (branch isJust: NodeNil) not and: > [branch doesEvaluatingEffectRequiresAPop: encoder]]. > -> 1202 methods (code attached) > > And I think I forgot the case of ReturnNode which don't require a pop... > > Nicolas > > Le 15 février 2012 00:57, Eliot Miranda <[hidden email]> a écrit : >> >> >> On Tue, Feb 14, 2012 at 2:55 PM, Igor Stasenko <[hidden email]> wrote: >>> >>> On 15 February 2012 00:50, Nicolas Cellier >>> <[hidden email]> wrote: >>> > Le 14 février 2012 23:10, Eliot Miranda <[hidden email]> a >>> > écrit : >>> >> On Tue, Feb 14, 2012 at 5:30 AM, Nicolas Cellier >>> >> <[hidden email]> wrote: >>> >>> >>> >>> Ah, I came up with a relatively explicit solution, but also quite >>> >>> heavy in term of number of messages required to implement the >>> >>> optimisation. >>> >>> I'm curious to see your implementation :) >>> >> >>> >> >>> >> similar, but should I commit it? I think mine is fine since it is only >>> >> assignments that have storepop forms, so your generality doesn't win >>> >> anything. but i could be wrong. >>> >> >>> > >>> > I was thinking of >>> > >>> > 1) nested ifTrue: ifFalse: >>> > cond1 ifTrue: [x1 := y] ifFalse: [cond2 ifTrue: [x2 := y] ifFalse: [x3 >>> > := y]] >>> > >>> > elseExpr statements last is a message, not an assignment, though we >>> > can factor the pop out... >>> > >>> > 2) inlined loop messages don't require a final pop... >>> > >>> > x <= y ifTrue: [x to: y do: [:i | (self at: i) doSome]] ifFalse: [y >>> > to: x do: [:i | (self at: i) doSome]]. >>> > >>> > An encoder pushing a value in both branches just to pop it after >>> > wouldn't be that well behaved... >>> > >>> > I won't give any version publicly because I crashed several images in >>> > the bootstrap process, and don't know if the bug is in the bootstrap >>> > or in the code. >>> > >>> IMO, too much effort for such diminishing returns. :) >> >> >> That was my thought too. It would be interesting to quantify the benefits. >> I noticed some 451 methods that were improved by my simple assignment-only >> mod. How many additional methods get caught by yours Nicolas? >> >>> >>> >>> > Nicolas >>> > >>> >> !MessageNode methodsFor: 'code generation' stamp: 'eem 2/13/2012 >>> >> 15:25'! >>> >> ifThenElseForEffectShouldSharePop >>> >> | thenExpr elseExpr | >>> >> thenExpr := arguments at: 1. >>> >> elseExpr := arguments at: 2. >>> >> ^((thenExpr isJust: NodeNil) >>> >> or: [(elseExpr isJust: NodeNil) >>> >> or: [thenExpr statements last isAssignmentNode >>> >> and: [elseExpr statements last isAssignmentNode]]]) >>> >> not >>> >> >>> >> "^(thenExpr isJust: NodeNil) not >>> >> and: [(elseExpr isJust: NodeNil) not >>> >> and: [thenExpr statements last isAssignmentNode not >>> >> or: [elseExpr statements last isAssignmentNode >>> >> not]]]"! ! >>> >> >>> >>> Also, I'm sure that multiple-jump-elimination proposed by Igor and >>> >>> other kind of optimization would be a perfect subject for proving >>> >>> concepts of NewCompiler IR infrastructure. >>> >>> >>> >>> MessageNode>>doesEffectRequiresAPop: encoder >>> >>> special > 0 >>> >>> ifTrue: >>> >>> [^self perform: (MacroTesters at: special) >>> >>> with: >>> >>> encoder]. >>> >>> ^super doesEffectRequiresAPop: encoder >>> >>> >>> >>> MessageNode>>doesEffectForIfRequiresAPop: encoder >>> >>> "If both branch require a pop instruction, then the pop is >>> >>> factored >>> >>> after the jump. >>> >>> In this case, the translation of this node also require a final >>> >>> pop >>> >>> that can further be eliminated eventually." >>> >>> | thenExpr elseExpr | >>> >>> thenExpr := arguments at: 1. >>> >>> elseExpr := arguments at: 2. >>> >>> ^((thenExpr isJust: NodeNil) or: [elseExpr isJust: NodeNil]) >>> >>> not >>> >>> and: [(thenExpr doesEvaluatingEffectRequiresAPop: >>> >>> encoder) >>> >>> and: >>> >>> [elseExpr doesEvaluatingEffectRequiresAPop: encoder]]. >>> >>> >>> >>> BlockNode>>doesEvaluatingEffectRequiresAPop: anEncoder >>> >>> "Test whether a pop instruction is required to skip the value >>> >>> on >>> >>> the >>> >>> stack when this block is inlined." >>> >>> ^statements size > 0 and: [statements last >>> >>> doesEffectRequiresAPop: >>> >>> anEncoder] >>> >>> >>> >>> AssigmentNode>>doesEffectRequiresAPop: anEncoder >>> >>> "Test whether a pop instruction is required to skip the value >>> >>> on >>> >>> the stack" >>> >>> ^(variable canStorePopInASingleInstruction: anEncoder) not >>> >>> >>> >>> InstanceVariableNode>>canStorePopInASingleInstruction: anEncoder >>> >>> ^true >>> >>> >>> >>> etc... >>> >>> >>> >>> Le 14 février 2012 01:27, Eliot Miranda <[hidden email]> a >>> >>> écrit >>> >>> : >>> >>> > OK, I fixed this (a pleasant Monday afternoon distraction) (at least >>> >>> > I >>> >>> > *think* I've fixed it; I still have to check the decompiler). >>> >>> > But... in >>> >>> > doing so I realised I had to fix CompiledMethod>>#= since I needed >>> >>> > to >>> >>> > see >>> >>> > exactly which methods the compiler change affected (just committed >>> >>> > as >>> >>> > Kernel-eem.669). In doing this I noticed that recent compiler >>> >>> > changes >>> >>> > have >>> >>> > not been applied by recompiling affected methods in the system. If >>> >>> > you >>> >>> > look >>> >>> > at Behavior>>#sourceMatchesBytecodeAt: it uses CompiledMethod>>#= to >>> >>> > check >>> >>> > that the compiler's output remains unchanged. In the 4.3 trunk >>> >>> > image >>> >>> > with >>> >>> > CompiledMethod>>#= fixed, sourceMatchesBytecodeAt: reports that >>> >>> > some 19654 >>> >>> > methods need recompiling: >>> >>> > >>> >>> > (SystemNavigation new allSelect: [:m| (m methodClass >>> >>> > sourceMatchesBytecodeAt: m selector) not]) size 19654 >>> >>> > >>> >>> > Being selective in recompiling, via e.g. >>> >>> > >>> >>> > SystemNavigation new allSelect: [:m| (m methodClass >>> >>> > sourceMatchesBytecodeAt: >>> >>> > m selector) ifFalse: [m methodClass recompile: m selector]. false] >>> >>> > >>> >>> > one gets to the desired >>> >>> > >>> >>> > (SystemNavigation new allSelect: [:m| (m methodClass >>> >>> > sourceMatchesBytecodeAt: m selector) not]) size 0 >>> >>> > >>> >>> > So the question is should we add an update that does the selective >>> >>> > recompile? I think we should, e.g. by committing a version of >>> >>> > Compiler >>> >>> > that >>> >>> > contains the selective recompile as a post-script. Agreed? >>> >>> > >>> >>> > On Mon, Feb 13, 2012 at 11:58 AM, Eliot Miranda >>> >>> > <[hidden email]> >>> >>> > wrote: >>> >>> >> >>> >>> >> Hi All, >>> >>> >> >>> >>> >> (whitewash alert) I just had occasion to look at the bytecode >>> >>> >> generated by the standard compiler for HashedCollection >>> >>> >> class>>#goodPrimeAtLeast:. Here's the source, with the issue in >>> >>> >> bold: >>> >>> >> >>> >>> >> goodPrimeAtLeast: lowerLimit >>> >>> >> "Answer the next good prime >= lowerlimit. >>> >>> >> If lowerLimit is larger than the largest known good prime, >>> >>> >> just make it odd." >>> >>> >> | primes low mid high prime | >>> >>> >> primes := self goodPrimes. >>> >>> >> low := 1. >>> >>> >> high := primes size. >>> >>> >> lowerLimit > (primes at: high) ifTrue: [ >>> >>> >> ^lowerLimit bitOr: 1 ]. >>> >>> >> [ high - low <= 1 ] whileFalse: [ >>> >>> >> mid := high + low // 2. >>> >>> >> prime := primes at: mid. >>> >>> >> prime = lowerLimit ifTrue: [ ^prime ]. >>> >>> >> prime < lowerLimit >>> >>> >> ifTrue: [ low := mid ] >>> >>> >> ifFalse: [ high := mid ] ]. >>> >>> >> (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ]. >>> >>> >> ^primes at: high >>> >>> >> >>> >>> >> The code for this sequence is >>> >>> >> 58 <15> pushTemp: 5 >>> >>> >> 59 <10> pushTemp: 0 >>> >>> >> 60 <B2> send: < >>> >>> >> 61 <9B> jumpFalse: 66 >>> >>> >> 62 <13> pushTemp: 3 >>> >>> >> 63 <81 42> storeIntoTemp: 2 >>> >>> >> 65 <92> jumpTo: 69 >>> >>> >> 66 <13> pushTemp: 3 >>> >>> >> 67 <81 44> storeIntoTemp: 4 >>> >>> >> 69 <87> pop >>> >>> >> >>> >>> >> where-as the following would be better: >>> >>> >> >>> >>> >> 58 <15> pushTemp: 5 >>> >>> >> 59 <10> pushTemp: 0 >>> >>> >> 60 <B2> send: < >>> >>> >> 61 <9B> jumpFalse: 66 >>> >>> >> 62 <13> pushTemp: 3 >>> >>> >> 63 <82 42> popIntoTemp: 2 >>> >>> >> 65 <92> jumpTo: 69 >>> >>> >> 66 <13> pushTemp: 3 >>> >>> >> 67 <82 44> popIntoTemp: 4 >>> >>> >> >>> >>> >> The reason is that the code generator favours using a single pop >>> >>> >> for >>> >>> >> both >>> >>> >> arms of the if: >>> >>> >> >>> >>> >> MessageNode>>sizeCodeForIf: encoder value: forValue >>> >>> >> | thenExpr elseExpr branchSize thenSize elseSize | >>> >>> >> thenExpr := arguments at: 1. >>> >>> >> elseExpr := arguments at: 2. >>> >>> >> (forValue >>> >>> >> or: [(thenExpr isJust: NodeNil) >>> >>> >> or: [elseExpr isJust: NodeNil]]) not >>> >>> >> "(...not ifTrue: avoids using ifFalse: alone during this compile)" >>> >>> >> ifTrue: "Two-armed IFs forEffect share a single pop" >>> >>> >> [^super sizeCodeForEffect: encoder]. >>> >>> >> ... >>> >>> >> >>> >>> >> MessageNode>>emitCodeForIf: stack encoder: encoder value: forValue >>> >>> >> | thenExpr thenSize elseExpr elseSize | >>> >>> >> thenSize := sizes at: 1. >>> >>> >> elseSize := sizes at: 2. >>> >>> >> (forValue not and: [elseSize * thenSize > 0]) ifTrue: >>> >>> >> "Two-armed IFs forEffect share a single pop" >>> >>> >> [^super emitCodeForEffect: stack encoder: encoder]. >>> >>> >> >>> >>> >> It would be nice if this only happened if doing so actually reduced >>> >>> >> the >>> >>> >> size of the generated code. >>> >>> >> -- >>> >>> >> best, >>> >>> >> Eliot >>> >>> >> >>> >>> > >>> >>> > >>> >>> > >>> >>> > -- >>> >>> > best, >>> >>> > Eliot >>> >>> > >>> >>> > >>> >>> > >>> >>> > >>> >>> >>> >> >>> >> >>> >> >>> >> -- >>> >> best, >>> >> Eliot >>> >> >>> >> >>> >> >>> >> >>> > >>> >>> >>> >>> -- >>> Best regards, >>> Igor Stasenko. >>> >> >> >> >> -- >> best, >> Eliot >> >> >> >> sharePop3.st (14K) Download Attachment |
Free forum by Nabble | Edit this page |