A minor issue with register allocator

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

A minor issue with register allocator

Igor Stasenko
I'm currently testing my asm-parser. And found it generates unneeded
instruction (mov eax,eax):
code of asm method:

message4: arg1 with: arg2
        | b c |
        self pragma: #cdecl.
        c := arg1 + arg2.
        b := arg1 / arg2.
        ^ c + b

produced intermediate by parser:
#(#(#block 1
#(#mov #(#add #(#argument #arg1) #(#argument #arg2)) #c)
#(#mov #(#div #(#argument #arg1) #(#argument #arg2) 'temp7D6') #b)
#(#return #(#add #c #b))))

then with transformed arguments and cdecl prologue/epilogue:
#(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3
 #(#mov #(#add #(#mem #(#add #ebp -4)) #(#mem #(#add #ebp -8))) 't1')
#(#mov #(#div #(#mem #(#add #ebp -4)) #(#mem #(#add #ebp -8)) 't3') 't2')
#(#mov #(#add 't1' 't2') #eax) #(#jmp #block2)) #(#block 2 #(#mov #ebp
#esp) #(#pop #ebp) #(#ret)))

and then i using
LowLevelOptimiser ->InstructionSelector -> ColouringRegisterAllocator
-> JumpRemover :

#(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3 #(#mov #(-8
#ebp) #ebx) #(#add #(-4 #ebp) #ebx) #(#mov #(-4 #ebp) #eax) #(#mov
#(-8 #ebp) #ecx) #(#cdq) #(#div #ecx)

#(#mov #eax #eax)

#(#add #ebx #eax)) #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))

I suppose fix could be simple by testing in MedMov>>#visitWith: that
if reg1 = reg2 then return nil , which will be interpreted as no-op by
block.

version used: Exupery-wbk.261.
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

A minor issue with register allocator

Bryce Kampjes
sig writes:
 > I'm currently testing my asm-parser. And found it generates unneeded
 > instruction (mov eax,eax):
 > code of asm method:
 >
 > message4: arg1 with: arg2
 > | b c |
 > self pragma: #cdecl.
 > c := arg1 + arg2.
 > b := arg1 / arg2.
 > ^ c + b
 >
 > produced intermediate by parser:
 > #(#(#block 1
 > #(#mov #(#add #(#argument #arg1) #(#argument #arg2)) #c)
 > #(#mov #(#div #(#argument #arg1) #(#argument #arg2) 'temp7D6') #b)
 > #(#return #(#add #c #b))))
 >
 > then with transformed arguments and cdecl prologue/epilogue:
 > #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3
 >  #(#mov #(#add #(#mem #(#add #ebp -4)) #(#mem #(#add #ebp -8))) 't1')
 > #(#mov #(#div #(#mem #(#add #ebp -4)) #(#mem #(#add #ebp -8)) 't3') 't2')
 > #(#mov #(#add 't1' 't2') #eax) #(#jmp #block2)) #(#block 2 #(#mov #ebp
 > #esp) #(#pop #ebp) #(#ret)))
 >
 > and then i using
 > LowLevelOptimiser ->InstructionSelector -> ColouringRegisterAllocator
 > -> JumpRemover :
 >
 > #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3 #(#mov #(-8
 > #ebp) #ebx) #(#add #(-4 #ebp) #ebx) #(#mov #(-4 #ebp) #eax) #(#mov
 > #(-8 #ebp) #ecx) #(#cdq) #(#div #ecx)
 >
 > #(#mov #eax #eax)
 >
 > #(#add #ebx #eax)) #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))
 >
 > I suppose fix could be simple by testing in MedMov>>#visitWith: that
 > if reg1 = reg2 then return nil , which will be interpreted as no-op by
 > block.

I'm not sure why Exupery is doing that. The register uses a decent
move coalescer which will first try to create moves like (mov eax eax)
then remove them. The register allocator goes to some lengths to do
this.

It's possible that you've found a case where the no-op move removal
code isn't getting used, though I'm not sure where it is. It will be
interesting to here if the problem is still there after you change how
you're creating intermediate to make sure that there's only one #eax
register object. 1)

Exupery relies on object identity inside it's intermediate, this is
a design decision that's probabably in many places by now.

Bryce
 
1) Sig and I've spoken on irc about this.
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Igor Stasenko
I'm already found a bug. It was in my code, which introduced problems
with registers.
Sorry for disturbing :)

On 28/06/07, [hidden email] <[hidden email]> wrote:

> sig writes:
>  > I'm currently testing my asm-parser. And found it generates unneeded
>  > instruction (mov eax,eax):
>  > code of asm method:
>  >
>  > message4: arg1 with: arg2
>  >      | b c |
>  >      self pragma: #cdecl.
>  >      c := arg1 + arg2.
>  >      b := arg1 / arg2.
>  >      ^ c + b
>  >
>  > produced intermediate by parser:
>  > #(#(#block 1
>  > #(#mov #(#add #(#argument #arg1) #(#argument #arg2)) #c)
>  > #(#mov #(#div #(#argument #arg1) #(#argument #arg2) 'temp7D6') #b)
>  > #(#return #(#add #c #b))))
>  >
>  > then with transformed arguments and cdecl prologue/epilogue:
>  > #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3
>  >  #(#mov #(#add #(#mem #(#add #ebp -4)) #(#mem #(#add #ebp -8))) 't1')
>  > #(#mov #(#div #(#mem #(#add #ebp -4)) #(#mem #(#add #ebp -8)) 't3') 't2')
>  > #(#mov #(#add 't1' 't2') #eax) #(#jmp #block2)) #(#block 2 #(#mov #ebp
>  > #esp) #(#pop #ebp) #(#ret)))
>  >
>  > and then i using
>  > LowLevelOptimiser ->InstructionSelector -> ColouringRegisterAllocator
>  > -> JumpRemover :
>  >
>  > #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3 #(#mov #(-8
>  > #ebp) #ebx) #(#add #(-4 #ebp) #ebx) #(#mov #(-4 #ebp) #eax) #(#mov
>  > #(-8 #ebp) #ecx) #(#cdq) #(#div #ecx)
>  >
>  > #(#mov #eax #eax)
>  >
>  > #(#add #ebx #eax)) #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))
>  >
>  > I suppose fix could be simple by testing in MedMov>>#visitWith: that
>  > if reg1 = reg2 then return nil , which will be interpreted as no-op by
>  > block.
>
> I'm not sure why Exupery is doing that. The register uses a decent
> move coalescer which will first try to create moves like (mov eax eax)
> then remove them. The register allocator goes to some lengths to do
> this.
>
> It's possible that you've found a case where the no-op move removal
> code isn't getting used, though I'm not sure where it is. It will be
> interesting to here if the problem is still there after you change how
> you're creating intermediate to make sure that there's only one #eax
> register object. 1)
>
> Exupery relies on object identity inside it's intermediate, this is
> a design decision that's probabably in many places by now.
>
> Bryce
>
> 1) Sig and I've spoken on irc about this.
> _______________________________________________
> Exupery mailing list
> [hidden email]
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
>
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Igor Stasenko
Ohh, another issue.

Having an input intermediate:
#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
#(#block 3 #(#mov #(#mem #(#add #ebp -4)) 't1') #(#mov #(#mem #(#add
#ebp -8)) 't2'))
#(#block 5 #(#mov #(#add 't1' 't2') 't3') #(#jmp #block4))
#(#block 4)
#(#block 6 #(#mov 't3' #eax) #(#jmp #block2))
#(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret))

after running LowLevelOptimizer and InstructionSelector i got:

#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
#(#block 3 #(#mov #(-4 #ebp) 't1') #(#mov #(-8 #ebp) 't2'))
#(#block 5 #(#mov 't2' 't4') #(#add 't1' 't4') #(#mov 't4' 't3')
#(#jmp #block4))
#(#block 4)
#(#block 6 #(#mov 't3' #eax) #(#jmp #block2))
#(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret))

and then after coloring register allocator:
#(#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
#(#block 3 #(#mov #(-4 #ebp) #eax) #(#mov #(-8 #ebp) #eax))
#(#block 5 #(#add #ebx #eax) #(#jmp #block4))
#(#block 4)
#(#block 6 #(#jmp #block2))
#(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))

take a look at block3 , as you see it chooses #eax for replacing
temporary registers in both instructions (which is wrong of course)
and then in block5 , it seems thinks that there was #ebx used in one of them.
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Igor Stasenko
Yet again, issue was in using registers with same name but for
different instances.
Confused me second time, since intermediate of registers with same
name looks perfectly same.

Maybe its better to change #intermediateForm/#printOn: of
MedMachineRegister to put name with following object's hash? Just to
avoid misinterpretation in future..
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Igor Stasenko
Another observation:

Something prevents to make full optimisation of following intermediate

#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
#(#block 3 #(#mov #(#mem #(#add #ebp -4)) 't1') #(#mov #(#mem #(#add
#ebp -8)) 't2'))
#(#block 5 #(#mov #(#add 't1' 't2') 't3') #(#jmp #block4))
#(#block 4)
#(#block 6 #(#mov 't3' #eax) #(#jmp #block2))
#(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret))

into just:
#(#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
#(#block 3 #(#mov #(-8 #ebp) #eax) #(#add #(-4 #ebp) #eax))
#(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))

it generates following instead:
#(#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
#(#block 3 #(#mov #(-4 #ebp) #ebx) #(#mov #(-8 #ebp) #eax))
#(#block 5 #(#add #ebx #eax))
#(#block 4)
#(#block 6)
#(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))

which uses 1 extra register and 1 more instruction comparing to
previous output. Its not very important, i'm just curious what can
prevent it from optimizing ? is this because movs and add placed in
separate blocks, or because there is jumps (which is removed later,
but prevent to fully optimize the code)? Maybe its better to put jumps
remover before register allocation?
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Bryce Kampjes
In reply to this post by Igor Stasenko
sig writes:
 > Yet again, issue was in using registers with same name but for
 > different instances.
 > Confused me second time, since intermediate of registers with same
 > name looks perfectly same.
 >
 > Maybe its better to change #intermediateForm/#printOn: of
 > MedMachineRegister to put name with following object's hash? Just to
 > avoid misinterpretation in future..

I'd probably write a validate method that validates the intermediate
and checks that there's only one register object for each register
name.

Having multiple registers for the same name is not a problem I've
personally had. However a similar problem is forgetting to convert
a sub-expression and just using the one in the input intermediate.

There are verify methods scattered around the unit testing framework
designed to catch these hard to spot errors. Unit testing is a good
place to put them as they'll identify the stage where the error is
introduced.

It's common for Exupery's unit test suites to end with a call to "self
verify" which will then call the verification methods required.

Bryce
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Bryce Kampjes
In reply to this post by Igor Stasenko
sig writes:
 > Another observation:
 >
 > Something prevents to make full optimisation of following intermediate
 >
 > #(#block 1 #(#push #ebp) #(#mov #esp #ebp))
 > #(#block 3 #(#mov #(#mem #(#add #ebp -4)) 't1') #(#mov #(#mem #(#add
 > #ebp -8)) 't2'))
 > #(#block 5 #(#mov #(#add 't1' 't2') 't3') #(#jmp #block4))
 > #(#block 4)
 > #(#block 6 #(#mov 't3' #eax) #(#jmp #block2))
 > #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret))
 >
 > into just:
 > #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
 > #(#block 3 #(#mov #(-8 #ebp) #eax) #(#add #(-4 #ebp) #eax))
 > #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))
 >
 > it generates following instead:
 > #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp))
 > #(#block 3 #(#mov #(-4 #ebp) #ebx) #(#mov #(-8 #ebp) #eax))
 > #(#block 5 #(#add #ebx #eax))
 > #(#block 4)
 > #(#block 6)
 > #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))
 >
 > which uses 1 extra register and 1 more instruction comparing to
 > previous output. Its not very important, i'm just curious what can
 > prevent it from optimizing ? is this because movs and add placed in
 > separate blocks, or because there is jumps (which is removed later,
 > but prevent to fully optimize the code)? Maybe its better to put jumps
 > remover before register allocation?

I'm guessing the question is why's (add ebx eax) a simple instruction
rather (add (-4 ebp) eax) thus potentually saving the instruction
and the register used to load (-4 eax).

The answer is the instruction selector only looks at the expression
trees, it doesn't do whole method analysis so because (-4 eax) is
loaded into a register it doesn't recognise that it's only used in
one place.

Bryce
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Reply | Threaded
Open this post in threaded view
|

Re: A minor issue with register allocator

Igor Stasenko
> The answer is the instruction selector only looks at the expression
> trees, it doesn't do whole method analysis so because (-4 eax) is
> loaded into a register it doesn't recognise that it's only used in
> one place.
>
I understand. I just wanted to use this for testing, to see how is my
method inlining works.

in sample above i compiling two methods:
--
message4: arg1 with: arg2
        self pragma: #cdecl.
        ^ arg1 + arg2.
--
and:

inlineMessage4: a with: b
        self pragma: #cdecl.
        ^ self inlineCall: (self message4: a with:b)
--

the intent was to get same code for both methods compiled with Exupery.
I thought that any extra registers/movs/jumps produced as side effect
of inlining can be simply optimized with Exupery low-level stuff..
But instead i have different code, which, of course doing same.
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery