Some questions


Some questions

Guillermo Adrián Molina
Hi list, I've been playing around with Exupery, and now I have a few questions:

1) I can't get tinyBenchmarks working, either on Linux or on Windows.

I downloaded everything from:
http://wiki.squeak.org/squeak/Installing+Exupery

I used: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz on Linux
and: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip on Windows

with the prebuilt image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz

The examples run OK, but when I try to run tinyBenchmarks I get segmentation
faults.

2) I tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) on my
machine and got:
'652,229,299 bytecodes/sec; 89,016,165 sends/sec'

Does anyone know why I get almost 90 million sends/sec?
I think it's quite a big difference from previous versions of VW.

3) I saw that the primitives for #at: and #at:put: are getting inlined, but I
think they are only implemented for variable objects (not for bytes,
Characters, or anything else).
Is that true?

4) In my experiments with Exupery, I get an error if I inline too many
methods. I think I am running out of machine registers, for example when
I try to compile Integer-#digitDiv:reg:.
I get this error in the ColouringRegisterAllocator phase, but it is not a
"You don't have any more registers, dude" kind of error.
Is the "no more registers" situation taken into consideration?

5) Is there a way to implement indirect jump tables in exupery?

Thanks a lot.
Cheers
Guille

_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery

Some questions

Bryce Kampjes
Guillermo Adrián Molina writes:
 > Hi list, I been playing around with exupery. And now I have a few questions:
 >
 > 1) I cant get tinyBenchmarks working, neither in linux, nor in windows,
 >
 > Downloaded all the staff from:
 > http://wiki.squeak.org/squeak/Installing+Exupery
 >
 > used: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz in linux
 > and: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip in windows
 >
 > with prebuild image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz
 >
 > Examples run ok, but when I try to run tinyBenchmarks I get segmentation
 > faults

Try using the 0.11 Exupery VM with Exupery 0.11. Exupery VMs must
match the Exupery version. The interface between Exupery and the VM is
still evolving.

 > 2) Tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) in my
 > machine, I got:
 > '652,229,299 bytecodes/sec; 89,016,165 sends/sec'
 >
 > Does anyone know Why I get almost 90 million sends/sec?
 > I think It's quite a big difference from previous versions of vw
 >
 > 3) I saw that primitives for #at: and #at:put: are getting inlined, but I
 > think they are only implemented for Variable Objects (not for bytes nor
 > Characters nor anything else)
 > Is that true?

It's true. #at: and #at:put: are only implemented for variable
objects. I should write primitives for other types. Good benchmarks
that demonstrate the need for such primitives would be nice.

 > 4) In my experiments with exupery, I get an error if I inline too many
 > methods. I think I am getting out of machine registers, for example, when
 > I try to compile Integer-#digitDiv:reg:.
 > I get this error In the ColouringRegisterAllocator phase, but it is not a
 > "You dont have more registers, dude" kind of error.
 > Is the "no more registers" situation taken into consideration?

I'd guess that it was because a variable was live at an entry point.
There's a stack tracing bug which I'm just fixing that could have
caused that.

I use the liveness analyser in the register allocator to catch
compiler bugs. It's much nicer to catch them there than with crashes.
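
As a rough illustration of the kind of check described here (not Exupery's actual analyser), backward liveness over straight-line code can be sketched in Python. The (defs, uses) instruction format and the function name are assumptions made for the example. Any variable still live when the walk reaches the entry point was read before it was written, which is what an uninitialised temp looks like.

```python
def live_at_entry(instructions):
    """Backward liveness over straight-line code.

    Each instruction is a (defs, uses) pair of sets of variable names.
    The result is the set of variables that are read before being written.
    """
    live = set()
    for defs, uses in reversed(instructions):
        # A definition kills liveness; a use (re)creates it.
        live = (live - set(defs)) | set(uses)
    return live


# Hypothetical method body: t1 is read before anything assigns it.
buggy = [({"t2"}, {"t1"}), (set(), {"t2"})]
# Same body with an initial assignment to t1 (e.g. t1 := nil).
fixed = [({"t1"}, set())] + buggy

if __name__ == "__main__":
    print(live_at_entry(buggy))
    print(live_at_entry(fixed))
```

An allocator that refuses to continue when this set is non-empty turns a potential crash into a compile-time error.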

 > 5) Is there a way to implement indirect jump tables in exupery?

It would be possible. I do use indirect jumps for returns to compiled
methods. If you look at any method you should see at least one
indirect jump in the return code. Just jump to a register.

Bryce

Re: Some questions

Guillermo Adrián Molina
Hi there!
Thanks for the answers, I found them very useful.
I have a few more questions.

> Guillermo Adrián Molina writes:
>  > Hi list, I been playing around with exupery. And now I have a few
> questions:
>  >
>  > 1) I cant get tinyBenchmarks working, neither in linux, nor in windows,
>  >
>  > Downloaded all the staff from:
>  > http://wiki.squeak.org/squeak/Installing+Exupery
>  >
>  > used: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz in
> linux
>  > and: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip in
> windows
>  >
>  > with prebuild image:
> http://ftp.squeak.org/Exupery/images/exupery-0.10.tz
>  >
>  > Examples run ok, but when I try to run tinyBenchmarks I get
> segmentation
>  > faults
>
> Try using the 0.11 Exupery VM with Exupery 0.11. Exupery VMs must
> match the Exupery version. The interface between Exupery and the VM is
> still evolving.
>

OK, I tried that and it worked:
668407310 bytecodes/sec; 13559830 sends/sec
760772659 bytecodes/sec; 13803237 sends/sec
777524677 bytecodes/sec; 12762744 sends/sec
760772659 bytecodes/sec; 13834279 sends/sec
775757575 bytecodes/sec; 13569800 sends/sec
I read something about Intel being faster than AMD for Exupery. Do you
know why that is?


>  > 2) Tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) in my
>  > machine, I got:
>  > '652,229,299 bytecodes/sec; 89,016,165 sends/sec'
>  >
>  > Does anyone know Why I get almost 90 million sends/sec?
>  > I think It's quite a big difference from previous versions of vw
>  >
>  > 3) I saw that primitives for #at: and #at:put: are getting inlined, but
> I
>  > think they are only implemented for Variable Objects (not for bytes nor
>  > Characters nor anything else)
>  > Is that true?
>
> It's true. #at: and #at:put: are only implemented for variable
> objects. I should write primitives for other types. Good benchmarks
> that demonstrate the need for such primitives would be nice.
>
I'll try to check that, thanks.

>  > 4) In my experiments with exupery, I get an error if I inline too many
>  > methods. I think I am getting out of machine registers, for example,
> when
>  > I try to compile Integer-#digitDiv:reg:.
>  > I get this error In the ColouringRegisterAllocator phase, but it is not
> a
>  > "You dont have more registers, dude" kind of error.
>  > Is the "no more registers" situation taken into consideration?
>
> I'd guess that it was because a variable was live at an entry point.
> There's a stack tracing bug which I'm just fixing that could have
> caused that.
>
> I use the liveness analyser in the register allocator to catch
> compiler bugs. It's much nicer to catch them there than with crashes.
>

Yes, I've seen that kind of error (variable live at entry point) and
corrected it by initializing temps with nil.
I think this is something different. In this method of the
ColouringRegisterAllocator:

findNodeToSpill
        | spillable |
        "This is just a basic heuristic, spill the register that interferes with the most
        other registers. It is possible to do a lot better.
        The heuristic should consider how much each register is used while it is alive"
        spillable := spillWorklist select:
                [:each | ((self hasSpill: each register) not) and: [each register isMachineRegister not]].
        spillable := spillable asSortedCollection: [:a :b| a spillWeight > b spillWeight].
        ^ spillable first

After compiling lots of methods with Exupery, it fails on very big
methods because spillable is nil, and spillable first throws an error. If
I do less inlining (for example, not inlining divisions and
multiplications), it compiles OK!
Any ideas?

>  > 5) Is there a way to implement indirect jump tables in exupery?
>
> It would be possible. I do use indirect jumps for returns to compiled
> methods. If you look at any method you should see at least one
> indirect jump in the return code. Just jump to a register.
>
Yes, I checked that, but I still need to initialize that register with the
appropriate block, and I need to do that without using Jcc (conditional
jumps) to choose the right one. Any suggestions?


> Bryce

Thanks a lot
cheers, Guille


Re: Some questions

Bryce Kampjes
Guillermo Adrián Molina writes:

 > Ok!, tried that, it worked:
 > 668407310 bytecodes/sec; 13559830 sends/sec
 > 760772659 bytecodes/sec; 13803237 sends/sec
 > 777524677 bytecodes/sec; 12762744 sends/sec
 > 760772659 bytecodes/sec; 13834279 sends/sec
 > 775757575 bytecodes/sec; 13569800 sends/sec
 > I read something about intel being faster than AMD for exupery, Do you
 > know why is that?
 >

Exupery was much faster than the interpreter on Pentium 4s. That's
because the Pentium 4 is an inefficient chip to run the interpreter on.

Those comparisons are rather old now. Hardware has moved on and so
has Exupery. Benchmarking now with bigger suites may show different
numbers.

 > >  > 4) In my experiments with exupery, I get an error if I inline too many
 > >  > methods. I think I am getting out of machine registers, for example,
 > > when
 > >  > I try to compile Integer-#digitDiv:reg:.
 > >  > I get this error In the ColouringRegisterAllocator phase, but it is not
 > > a
 > >  > "You dont have more registers, dude" kind of error.
 > >  > Is the "no more registers" situation taken into consideration?
 > >
 > > I'd guess that it was because a variable was live at an entry point.
 > > There's a stack tracing bug which I'm just fixing that could have
 > > caused that.
 > >
 > > I use the liveness analyser in the register allocator to catch
 > > compiler bugs. It's much nicer to catch them there than with crashes.
 > >
 >
 > Yes I've seen those kind of errors (variable live at entry point),
 > corrected them initializing temps with nil.
 > I think this is something different. In this method of the
 > ColouringRegisterAllocator:
 >
 > findNodeToSpill
 > | spillable |
 > "This is just a basic heuristic, spill the register that interferes with
 > the most
 > other registers. It is possible to do a lot better.
 > The heuristic should concider how much each register is used while it is
 > alive"
 > spillable := spillWorklist select:
 > [:each | ((self hasSpill: each register) not) and: [each register
 > isMachineRegister not]].
 > spillable := spillable asSortedCollection: [:a :b| a spillWeight > b
 > spillWeight].
 > ^ spillable first
 >
 > After compiling lots of methods using exupery, it fails with very big
 > methods because spillable is nil, and spillable first throws an error. If
 > I make less inlining (for example, not inlining divisions and
 > multiplications), it compiles ok!
 > Any ideas?

I'd guess it's a limit with the register allocator. It is possible
that it can fail to find a register to spill when it needs to spill
something. Given this bug will not cause crashes or incorrect
execution it's not high priority.
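
A hypothetical Python model of the selection step in findNodeToSpill shows where such a failure can come from: once every node on the worklist has been filtered out, there is nothing left to take the first element of. The Node class, the already_spilled set and the NoSpillCandidate error are invented for this sketch and are not part of Exupery. The only change from the quoted heuristic is that the empty case is reported explicitly.

```python
from dataclasses import dataclass


@dataclass
class Node:
    register: str
    spill_weight: int
    is_machine_register: bool = False


class NoSpillCandidate(Exception):
    """Raised when the allocator must spill but nothing is spillable."""


def find_node_to_spill(spill_worklist, already_spilled):
    # Same filter as the quoted heuristic: skip registers that already
    # have a spill slot and skip real machine registers.
    spillable = [
        node for node in spill_worklist
        if node.register not in already_spilled
        and not node.is_machine_register
    ]
    if not spillable:
        raise NoSpillCandidate("spill needed, but no node can be spilled")
    # Highest spill weight first, as in the sorted collection.
    return max(spillable, key=lambda node: node.spill_weight)


worklist = [
    Node("t1", 3),
    Node("t2", 7),
    Node("eax", 9, is_machine_register=True),
]

if __name__ == "__main__":
    print(find_node_to_spill(worklist, already_spilled=set()).register)
```

With a named error, a caller could fall back to compiling the method with less inlining instead of stopping in the debugger.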

 > >  > 5) Is there a way to implement indirect jump tables in exupery?
 > >
 > > It would be possible. I do use indirect jumps for returns to compiled
 > > methods. If you look at any method you should see at least one
 > > indirect jump in the return code. Just jump to a register.
 > >
 > Yes, I checked that, but I still need to initialize that register with the
 > convenient block, but I need to do that without using Jcc (conditional
 > jumps) to choose from the right one, Any suggestions?

Exupery also can get the address of a block. That's also done in the
send code to save the compiled program counter. The compiled program
counter is the address of the machine code block to return to encoded
as a SmallInteger. Return blocks are aligned to 2 byte boundaries to
allow for tagging. That's enough to build an indirect jump table if
you wanted to do that.
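
To make the tagging remark concrete, here is a small Python sketch. It assumes the usual Squeak convention that a word with the low bit set is a SmallInteger; how Exupery actually encodes the compiled program counter is not shown in this thread, so the function names and the exact encoding are assumptions. The point is only that a 2-byte-aligned address has a free low bit, so it can be disguised as an integer and recovered exactly.

```python
TAG_BIT = 1  # assumed: low bit set marks a SmallInteger


def encode_return_address(address):
    """Disguise an aligned code address as a tagged integer."""
    if address & TAG_BIT:
        raise ValueError("address must be aligned to a 2-byte boundary")
    return address | TAG_BIT


def looks_like_small_integer(word):
    return (word & TAG_BIT) == 1


def decode_return_address(word):
    """Recover the original address by clearing the tag bit."""
    return word & ~TAG_BIT


if __name__ == "__main__":
    pc = encode_return_address(0x08049F10)  # made-up address
    print(hex(pc), looks_like_small_integer(pc))
    print(hex(decode_return_address(pc)))
```

Because the encoded value passes as an integer, code that scans the stack for object pointers would skip over it.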

Why do you need to build an indirect jump table? What are you trying
to do?

Bryce

Re: Some questions

Guillermo Adrián Molina

> Guillermo Adrián Molina writes:
>
>  > Ok!, tried that, it worked:
>  > 668407310 bytecodes/sec; 13559830 sends/sec
>  > 760772659 bytecodes/sec; 13803237 sends/sec
>  > 777524677 bytecodes/sec; 12762744 sends/sec
>  > 760772659 bytecodes/sec; 13834279 sends/sec
>  > 775757575 bytecodes/sec; 13569800 sends/sec
>  > I read something about intel being faster than AMD for exupery, Do you
>  > know why is that?
>  >
>
> Exupery was much faster than the interpreter on Pentium 4s. That's
> because the Pentium 4 is an inefficient chip to run the interprter on.
>
> Those comparisions are rather old now. Hardware has moved on and so
> has Exupery. Benchmarking now with bigger suites may show different
> numbers.
>
>  > >  > 4) In my experiments with exupery, I get an error if I inline too
> many
>  > >  > methods. I think I am getting out of machine registers, for
> example,
>  > > when
>  > >  > I try to compile Integer-#digitDiv:reg:.
>  > >  > I get this error In the ColouringRegisterAllocator phase, but it
> is not
>  > > a
>  > >  > "You dont have more registers, dude" kind of error.
>  > >  > Is the "no more registers" situation taken into consideration?
>  > >
>  > > I'd guess that it was because a variable was live at an entry point.
>  > > There's a stack tracing bug which I'm just fixing that could have
>  > > caused that.
>  > >
>  > > I use the liveness analyser in the register allocator to catch
>  > > compiler bugs. It's much nicer to catch them there than with crashes.
>  > >
>  >
>  > Yes I've seen those kind of errors (variable live at entry point),
>  > corrected them initializing temps with nil.
>  > I think this is something different. In this method of the
>  > ColouringRegisterAllocator:
>  >
>  > findNodeToSpill
>  > | spillable |
>  > "This is just a basic heuristic, spill the register that interferes
> with
>  > the most
>  > other registers. It is possible to do a lot better.
>  > The heuristic should concider how much each register is used while it
> is
>  > alive"
>  > spillable := spillWorklist select:
>  > [:each | ((self hasSpill: each register) not) and: [each register
>  > isMachineRegister not]].
>  > spillable := spillable asSortedCollection: [:a :b| a spillWeight > b
>  > spillWeight].
>  > ^ spillable first
>  >
>  > After compiling lots of methods using exupery, it fails with very big
>  > methods because spillable is nil, and spillable first throws an error.
> If
>  > I make less inlining (for example, not inlining divisions and
>  > multiplications), it compiles ok!
>  > Any ideas?
>
> I'd guess it's a limit with the register allocator. It is possible
> that it can fail to find a register to spill when it needs to spill
> something. Given this bug will not cause crashes or incorrect
> execution it's not high priority.
>
>  > >  > 5) Is there a way to implement indirect jump tables in exupery?
>  > >
>  > > It would be possible. I do use indirect jumps for returns to compiled
>  > > methods. If you look at any method you should see at least one
>  > > indirect jump in the return code. Just jump to a register.
>  > >
>  > Yes, I checked that, but I still need to initialize that register with
> the
>  > convenient block, but I need to do that without using Jcc (conditional
>  > jumps) to choose from the right one, Any suggestions?
>
> Exupery also can get the address of a block. That's also done in the
> send code to save the compiled program counter. The compiled program
> counter is the address of the machine code block to return to encoded
> as a SmallInteger. Return blocks are aligned to 2 byte boundaries to
> allow for tagging. That's enough to build an indirect jump table if
> you wanted to do that.
>
Yes, I also noticed that, using MedAddress, right?
Forgive me, but I still don't get the point.
For example:

MedMov
        from: (MedAddress addressOf: blockN)
        to: aMedReg
MedJump
        type: #jmp
        target: aMedReg
block1:
do something1
jmp end
block2:
do something2
jmp end
block3:
do something3
end:

This could be a jump table,
but I still need to select which block to jump to.
The only way of selecting the block that I can imagine is nesting compares,
something with jumps like:
MedJump
        type: #jc
        target: aLabel
        instruction: (MedComparision
                operator: #bitTest
                arg1: aMed
                arg2: (MedLiteral literal: 0)).
But I want to implement a jump table to avoid conditional branching

> Why do you need to build an indirect jump table? What are you trying
> to do?
>
I am implementing a Smalltalk. It compiles directly to machine code with
Exupery. The last time I asked the list something, I was just starting to use
Exupery. Now I am almost done with that (without many optimizations). I am
doing unit testing right now.
My first mail to the list asked what would be the best way to implement a new
ST, so in my implementation I use:
0 tagged ints.
A simple (and a little fat) object memory.
A very straightforward send mechanism (with the C calling convention for
calling methods).
No contexts, but BlockClosures (frames are the same as in C; the C
compiler does not differentiate C code from ST code).
I compile the ST code from .st files to .s (assembler) using SmaCC,
RefactoryBrowser, and then Exupery. I still need Squeak in order to run
all that.
I only use the bottom layer of Exupery (I do not use the IntermediateXXXXXX
classes).
I implemented the cmovxx instruction in Exupery, because it is very useful.
But I need jump tables to implement, for example, faster versions of
ifTrue:ifFalse:, and a lot of other things. This could lead to faster
results.
Right now I am getting these tinyBenchmarks results (on the same machine):
Squeak: 172043010 bytecodes/sec; 5468700 sends/sec
Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec.
myST/Exupery: 1072251308 bytecodes/sec; 36056442 sends/sec
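
For comparison, the three results can be normalised against plain Squeak. The short Python calculation below uses only the figures quoted above; the dictionary layout and function name are just for illustration. It gives roughly 4.5x (bytecodes) and 2.5x (sends) for Squeak/Exupery, and roughly 6.2x and 6.6x for myST/Exupery.

```python
# (bytecodes/sec, sends/sec) as reported in the message
results = {
    "Squeak": (172043010, 5468700),
    "Squeak/Exupery": (775757575, 13569800),
    "myST/Exupery": (1072251308, 36056442),
}


def speedups(results, baseline="Squeak"):
    """Return (bytecode ratio, send ratio) of each entry over the baseline."""
    base_bytecodes, base_sends = results[baseline]
    return {
        name: (bytecodes / base_bytecodes, sends / base_sends)
        for name, (bytecodes, sends) in results.items()
    }


if __name__ == "__main__":
    for name, (bc, sends) in speedups(results).items():
        print(f"{name}: {bc:.1f}x bytecodes, {sends:.1f}x sends")
```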

> Bryce
Cheers
Guille


RE: Some questions

Sebastian Sastre-2
> > Why do you need to build an indirect jump table? What are
> you trying
> > to do?
> >
> I am implementing a smalltalk. It compiles directly to
> machine code, with exupery. The last time I asked something
> to the list I was starting to use exupery. Now I am almost
> done with that (without many optimizations). I am doing unit
> testing right now.
> My first mail to the list asked what would be the best to
> implement a new st, so, in my implementation I use:
> 0 tagged ints.
> A simple (and a little fat) object memory.
> A very straightforward send mechanism (with C calling
> convention for calling methods).
> No contexts, but using BlockClosures (frames are the same as
> in C, the C compiler does not differentiate C code from ST code).

Hi Guille, I don't get something here. If you are using Exupery to generate
asm code why are you talking about a C compiler?

> I compile the ST code from .st files to .s (assembler) using
> SmaCC, RefactoryBrowser, and then exupery, I still need
> squeak in order to run all that.
> I only use the bottom layer of exupery, (does not use
> IntermediateXXXXXX
> classes)
> I implemented the cmovxx instruction in exupery, because it
> is very useful.
> But I need jump tables to implement for example, faster
> versions of ifTrue:ifFalse:, and a lot of other things. This
> could lead to faster results.
> Right Now I am getting (with the same machine), tinyBenchmarks:
> Squeak: 172043010 bytecodes/sec; 5468700 sends/sec
> Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec.
> myST/Exupery: 1072251308 bytecodes/sec; 36056442 sends/sec
>
Those are numbers!

Cheers,

Sebastian


> > Bryce
> Cheers
> Guille
>


RE: Some questions

Guillermo Adrián Molina
>> > Why do you need to build an indirect jump table? What are
>> you trying
>> > to do?
>> >
>> I am implementing a smalltalk. It compiles directly to
>> machine code, with exupery. The last time I asked something
>> to the list I was starting to use exupery. Now I am almost
>> done with that (without many optimizations). I am doing unit
>> testing right now.
>> My first mail to the list asked what would be the best to
>> implement a new st, so, in my implementation I use:
>> 0 tagged ints.
>> A simple (and a little fat) object memory.
>> A very straightforward send mechanism (with C calling
>> convention for calling methods).
>> No contexts, but using BlockClosures (frames are the same as
>> in C, the C compiler does not differentiate C code from ST code).
>
> Hi Guille, I don't get something here. If you are using Exupery to
> generate
> asm code why are you talking about a C compiler?
>

Ok, short answer:

The ST VM is responsible for a lot of things, one of which is interpreting
bytecodes. In my ST every method is stored as x86 machine code, so I
don't need an interpreter to run methods (the CPU does all that).
But VMs have to deal with a lot of other things, like primitives and
method lookups. That part is done in C.

Long answer:
Right now building my VM is a little messy; this is more or less what I do:

• File out the classes I need from Squeak.

Right now I use only ~50 basic classes, and ~40 test classes. The file out
mechanism generates one file per class, called “ClassName.st”

• Compile methods in squeak

I load every *.st file from Squeak (I said load, not file in!). While I
read the classes I compile the methods with SmaCC, Refactory Browser and
Exupery. This generates assembler as an intermediate step, but the final
step produces x86 machine code, which is stored in every method.

• Generate assembler files from squeak

Once everything is compiled I generate an assembler file for every class,
for example “ClassName.s”. This could be a little confusing. I already
compiled everything, why would I need to generate assembler files? Because
assembler files are very handy to represent the image, take a look into a
real method:

/* Test>>test Method bytecodes */
.global Test_Class_test_bytecodes
Test_Class_test_bytecodes:
        .int ByteArray + 1
        .int 154 /* Number: 77 */
        .int 17888 /* Number: 8944 */
.global _Test_Class_test_bytecodes
_Test_Class_test_bytecodes:
        .byte 85, 137, 229, 139, 69, 8, 80, 184
        .int Test_Class_test_literals + 1
        .byte 139, 64, 11, 232
        .int getMethodIP - 4 - .
        .byte 255, 208, 129, 196, 4, 0, 0, 0, 139, 69, 8, 80, 184
        .int Test_Class_test_literals + 1
        .byte 139, 64, 15, 232
        .int getMethodIP - 4 - .
        .byte 255, 208, 129, 196, 4, 0, 0, 0, 80, 184
        .int Test_Class_test_literals + 1
        .byte 139, 64, 19, 232
        .int getMethodIP - 4 - .
        .byte 255, 208, 129, 196, 4, 0, 0, 0, 201, 195
        .align 2


As you can see, that is not assembler: those bytes are generated with
Exupery. Notice the references to other objects. It is very easy to
represent the image this way. For example, look at how an array would
be represented:

/* Array */
.global Test_Class_test_literals
Test_Class_test_literals:
        .int Array + 1
        .int 24 /* Number: 12 */
        .int 9784 /* Number: 4892 */
.global _Test_Class_test_literals
_Test_Class_test_literals:
        .int symbol_initialize + 1
        .int symbol_selfTest + 1
        .int symbol_printString + 1

Those + 1 are there because of the 0 tagged integers.
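
The tagging visible in these listings can be sketched in Python. This is an interpretation inferred from the comments in the listings (154 standing for 77, 24 for 12, and object references written as label + 1), not a description of the actual runtime; the function names are made up. Integers are shifted left by one so the low bit is 0, and object pointers carry a low bit of 1.

```python
def tag_int(value):
    """0 tagged integer: shift left, low bit stays 0."""
    return value << 1


def untag_int(word):
    if word & 1:
        raise ValueError("word is a tagged pointer, not an integer")
    return word >> 1


def tag_pointer(address):
    """Object reference: aligned address plus 1, as in 'Array + 1'."""
    if address & 1:
        raise ValueError("object addresses are assumed to be even")
    return address + 1


def is_int(word):
    return (word & 1) == 0


if __name__ == "__main__":
    # Values taken from the two listings above.
    for number in (77, 8944, 12, 4892):
        print(number, "->", tag_int(number))
```

One consequence of a 0 tag is that adding two tagged integers gives the correctly tagged sum without any untagging.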

• Generate a library with the code

With all the .s files I generate a library

• Compile everything into a static executable

I compile the library and the other C files into a static executable. I do
that because I haven't implemented the ST compiler yet, and that's
why I still need Squeak. When I implement the compiler (SmaCC-RB-Exupery),
I will have to add some kind of dynamic loading of the ST part.

Cheers
Guille




>> I compile the ST code from .st files to .s (assembler) using
>> SmaCC, RefactoryBrowser, and then exupery, I still need
>> squeak in order to run all that.
>> I only use the bottom layer of exupery, (does not use
>> IntermediateXXXXXX
>> classes)
>> I implemented the cmovxx instruction in exupery, because it
>> is very useful.
>> But I need jump tables to implement for example, faster
>> versions of ifTrue:ifFalse:, and a lot of other things. This
>> could lead to faster results.
>> Right Now I am getting (with the same machine), tinyBenchmarks:
>> Squeak: 172043010 bytecodes/sec; 5468700 sends/sec
>> Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec.
>> myST/Exupery: 1072251308 bytecodes/sec; 36056442 sends/sec
>>
> That are numbers!
>
> Cheers,
>
> Sebastian
>
>



Re: Some questions

Bryce Kampjes
In reply to this post by Guillermo Adrián Molina
Guillermo Adrián Molina writes:
 > > Exupery also can get the address of a block. That's also done in the
 > > send code to save the compiled program counter. The compiled program
 > > counter is the address of the machine code block to return to encoded
 > > as a SmallInteger. Return blocks are aligned to 2 byte boundaries to
 > > allow for tagging. That's enough to build an indirect jump table if
 > > you wanted to do that.
 > >
 > Yes I also notice that, using MedAddress, right?
 > Forgive me, but I still can't get the point:
 > For example:

MedAddress is a literal that represents the address of a block. In
Exupery it gets relocated to be the block's actual address.

You could now write:
    (jmp (mem (add (MedAddress blockWithTable) (sar anIndex 2))))

The only thing missing is a way to produce a block that just contains
literals. In your case, a block that contains MedAddresses.

The MedAddress should be translated into a label referring to the
block.

Exupery currently does not have blocks that contain literals but
it shouldn't be too hard to add.
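
The shape of such a table can be modelled in Python, where a tuple of functions stands in for a block of code addresses. This is only an analogy for the machine-level idea: the handler names mirror block1..block3 from the earlier pseudocode and everything else is invented for the sketch. Selecting the target is a single indexed load followed by an indirect call, with no chain of compares.

```python
def do_something1():
    return "something1"


def do_something2():
    return "something2"


def do_something3():
    return "something3"


# The "block that only contains literals": one entry per target block.
JUMP_TABLE = (do_something1, do_something2, do_something3)


def dispatch(index):
    # One load from the table, then an indirect jump to whatever was loaded.
    return JUMP_TABLE[index]()


if __name__ == "__main__":
    print([dispatch(i) for i in range(3)])
```

The index has to be known to be in range before the load; the table itself does not check it.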

 > I am implementing a smalltalk. It compiles directly to machine code, with
 > exupery. The last time I asked something to the list I was starting to use
 > exupery. Now I am almost done with that (without many optimizations). I am
 > doing unit testing right now.

Interesting, what is the goal of your new Smalltalk? What are you
trying to do better than the other dialects or is this purely for
enjoyment?

Bryce

RE: Some questions

Bryce Kampjes
In reply to this post by Guillermo Adrián Molina
Guillermo Adrián Molina writes:
 > Once everything is compiled I generate an assembler file for every class,
 > for example “ClassName.s”. This could be a little confusing. I already
 > compiled everything, why would I need to generate assembler files? Because
 > assembler files are very handy to represent the image, take a look into a
 > real method:
 >
 > /* Test>>test Method bytecodes */
 > .global Test_Class_test_bytecodes
 > Test_Class_test_bytecodes:
 > .int ByteArray + 1
 > .int 154 /* Number: 77 */
 > .int 17888 /* Number: 8944 */
 > .global _Test_Class_test_bytecodes
 > _Test_Class_test_bytecodes:
 > .byte 85, 137, 229, 139, 69, 8, 80, 184
 > .int Test_Class_test_literals + 1
 > .byte 139, 64, 11, 232
 > .int getMethodIP - 4 - .
 > .byte 255, 208, 129, 196, 4, 0, 0, 0, 139, 69, 8, 80, 184
 > .int Test_Class_test_literals + 1
 > .byte 139, 64, 15, 232
 > .int getMethodIP - 4 - .
 > .byte 255, 208, 129, 196, 4, 0, 0, 0, 80, 184
 > .int Test_Class_test_literals + 1
 > .byte 139, 64, 19, 232
 > .int getMethodIP - 4 - .
 > .byte 255, 208, 129, 196, 4, 0, 0, 0, 201, 195
 > .align 2

The first versions of Exupery generated gas assembly which
I compiled then linked against C support code. Even after
Exupery could compile inline I kept the code around to generate
assembly instructions for several releases. It eventually got
deleted as it wasn't adding any value.

If you're planning on continuing generating assembly then it
might be worthwhile to try and find the code to produce assembly and
update it to deal with the current instruction selector and the
instructions that have been added since I stopped maintaining it.

Bryce

RE: Some questions

Guillermo Adrián Molina

> Guillermo Adrián Molina writes:
>  > Once everything is compiled I generate an assembler file for every
> class,
>  > for example “ClassName.s”. This could be a little confusing. I already
>  > compiled everything, why would I need to generate assembler files?
> Because
>  > assembler files are very handy to represent the image, take a look into
> a
>  > real method:
>  >
>  > /* Test>>test Method bytecodes */
>  > .global Test_Class_test_bytecodes
>  > Test_Class_test_bytecodes:
>  > .int ByteArray + 1
>  > .int 154 /* Number: 77 */
>  > .int 17888 /* Number: 8944 */
>  > .global _Test_Class_test_bytecodes
>  > _Test_Class_test_bytecodes:
>  > .byte 85, 137, 229, 139, 69, 8, 80, 184
>  > .int Test_Class_test_literals + 1
>  > .byte 139, 64, 11, 232
>  > .int getMethodIP - 4 - .
>  > .byte 255, 208, 129, 196, 4, 0, 0, 0, 139, 69, 8, 80, 184
>  > .int Test_Class_test_literals + 1
>  > .byte 139, 64, 15, 232
>  > .int getMethodIP - 4 - .
>  > .byte 255, 208, 129, 196, 4, 0, 0, 0, 80, 184
>  > .int Test_Class_test_literals + 1
>  > .byte 139, 64, 19, 232
>  > .int getMethodIP - 4 - .
>  > .byte 255, 208, 129, 196, 4, 0, 0, 0, 201, 195
>  > .align 2
>
> The first versions of Exupery generated gas assembly which
> I compiled then linked against C support code. Even after
> Exupery could compile inline I kept the code around to generate
> assembly instructions for several releases. It eventually got
> deleted as it wasn't adding any value.
>
> If you're planning on continuing generating assembly then it
> might be worthwhile to try and find the code to produce assembly and
> update it to deal with the current instruction selector and the
> instructions that have been added since I stopped maintaining it.
>
Well, it is good to know that, but I need to generate machine code, not
assembler code. I am generating assembler files just to make it easier to
maintain the relationships between objects. As you can see from the code,
the C compiler doesn't know (at compile time) what those bytes are. Before
I used exupery, I was generating assembler, but thanks to exupery, that
step isn't necessary.
In the future I am planning to generate some kind of relocatable objects
(instead of assembler files) that could be loaded on demand at run time.

Cheers, Guille

Re: Some questions

Bryce Kampjes
In reply to this post by Guillermo Adrián Molina
Guillermo Adrián Molina writes:
 > >  > After compiling lots of methods using exupery, it fails with very big
 > >  > methods because spillable is nil, and spillable first throws an error. If
 > >  > I make less inlining (for example, not inlining divisions and
 > >  > multiplications), it compiles ok!
 > >  > Any ideas?
 > >
 > > I'd guess it's a limit with the register allocator. It is possible
 > > that it can fail to find a register to spill when it needs to spill
 > > something. Given this bug will not cause crashes or incorrect
 > > execution it's not high priority.

If you want to fix that limit in the register allocator I could give
you some pointers. The failure comes from how the allocation problem is
broken down into stages. I'd need to dig through code to remember the
details though.

I'm planning on working on the register allocator in the next release.
The goal will be making it faster, as it has a few serious performance
problems.

Bryce

Re: Some questions

Guillermo Adrián Molina
> Guillermo Adrián Molina writes:
>  > >  > After compiling lots of methods using exupery, it fails with very big
>  > >  > methods because spillable is nil, and spillable first throws an error. If
>  > >  > I make less inlining (for example, not inlining divisions and
>  > >  > multiplications), it compiles ok!
>  > >  > Any ideas?
>  > >
>  > > I'd guess it's a limit with the register allocator. It is possible
>  > > that it can fail to find a register to spill when it needs to spill
>  > > something. Given this bug will not cause crashes or incorrect
>  > > execution it's not high priority.
>
> If you want to fix that limit in the register allocator I could give
> you some pointers. The problem is due to how the problem is broken
> down into stages. I'd need to dig through code to remember the details
> though.
>
Yes, I do. Please let me know where to start.

> I'm planning on working on the register allocator in the next release.
> The goal will be making it faster, it has a few serious performance
> problems.
>
Exupery's compile time is not a problem for me. But maybe I have to wait
for you to finish with the register allocator before I try to fix the
limit.
Please let me know what you want me to do.
Right now, I have already finished with unit testing. The next thing I
will do is to include all the compiler classes in my project (remember that
right now, that is done in Squeak). Maybe it would be convenient for me
to wait for 0.12 before I do that.

Another thing: do you want the code I made for cmovxx?

Cheers Guille.



Re: Some questions

Bryce Kampjes
Guillermo Adrián Molina writes:
 > > If you want to fix that limit in the register allocator I could give
 > > you some pointers. The problem is due to how the problem is broken
 > > down into stages. I'd need to dig through code to remember the details
 > > though.
 > >
 > Yes I do want. Please let me know where to start.

If it's not an urgent problem then it may be better to wait
until after 0.13. Or to look at the register allocator during
0.13 development.

Have a look at the stages of simplification. The order of the steps
is set in:

ColouringRegisterAllocator>>processWorkLists
        simplifyWorklist isEmpty ifFalse: [^ self simplify].
        self coalesce ifTrue: [^ self].
        self freeze ifTrue: [^ self].
        spillWorklist isEmpty ifFalse: [^ self spillRegister].
        self spillMove

However, the spill worklist has some registers on it that shouldn't
be spilled. When the allocator tries to select a register to spill,
it discards all of them and then fails.

I'd see if there are any moves that might be spilled afterwards.
If so, then all you'd need to do is allow spillRegister to fail
gracefully.
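
To make "fail gracefully" concrete, here is a minimal C sketch of a spill selection that reports "nothing to spill" instead of raising an error. It is not Exupery's code: the SpillCandidate type, the select_spill function and the highest-degree heuristic are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the spill worklist. */
typedef struct {
    int  id;         /* virtual register number */
    int  degree;     /* neighbours in the interference graph */
    bool spillable;  /* false for registers that must not be spilled */
} SpillCandidate;

/* Returns the index of the spillable candidate with the highest degree
   (an assumed heuristic), or -1 when every candidate is discarded.
   Returning -1 lets the caller fall through to its next step, for
   example spilling a move, instead of failing on an empty selection. */
int select_spill(const SpillCandidate *list, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!list[i].spillable)
            continue;
        if (best < 0 || list[i].degree > list[(size_t)best].degree)
            best = (int)i;
    }
    return best;
}
```

A caller would test the result for -1 before touching the candidate, which is the graceful path that the "spillable first" error does not have.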

 > > I'm planning on working on the register allocator in the next release.
 > > The goal will be making it faster, it has a few serious performance
 > > problems.
 > >
 > Exupery's compile time is not a problem for me. But maybe I have to wait
 > for you to finish with the register allocator before I try to fix the
 > limit.
 > Please let me know what you want me to do.
 > Right now, I have already finished with unit testing. The next thing I
 > will do is to include all the compiler classes in my project (remember that
 > right now, that is done in Squeak). Maybe it would be convenient for me
 > to wait for 0.12 before I do that.
 >
 > Another thing: do you want the code I made for cmovxx?

I'm interested.

Does it have unit test coverage? Exupery development relies on
testing so that's required.

When was cmov introduced? I know it was a long time ago but can't
remember precisely when. What I'm concerned with is making Exupery
incompatible with some chips that might still be in use.

Given adequate test coverage I'll add it.

Bryce

Re: Some questions

Guillermo Adrián Molina

> Guillermo Adrián Molina writes:
>  > > If you want to fix that limit in the register allocator I could give
>  > > you some pointers. The problem is due to how the problem is broken
>  > > down into stages. I'd need to dig through code to remember the details
>  > > though.
>  > >
>  > Yes I do want. Please let me know where to start.
>
> If it's not an urgent problem then it may be better to wait
> until after 0.13. Or to look at the register allocator during
> 0.13 development.
>
> Have a look at the stages of simplification. They're done
>
> ColouringRegisterAllocator>>processWorkLists
> simplifyWorklist isEmpty ifFalse: [^ self simplify].
> self coalesce ifTrue: [^ self].
> self freeze ifTrue: [^ self].
> spillWorklist isEmpty ifFalse: [^ self spillRegister].
> self spillMove
>
> Sets the steps for processing. However the spill worklist has some
> registers on it that shouldn't be spilled, so it tries to select a
> register to spill. It discards all registers then fails.
>
> I'd see if there are any moves that might be spilled afterwards,
> if so, then all you'd need to do is allow spillRegister to fail
> gracefully.
>

Ok, I will try to see what is happening. Is there any hard limit (besides
the number of available registers in x86 arch)?

>  > > I'm planning on working on the register allocator in the next release.
>  > > The goal will be making it faster, it has a few serious performance
>  > > problems.
>  > >
>  > Exupery's compile time is not a problem for me. But maybe I have to wait
>  > for you to finish with the register allocator before I try to fix the
>  > limit.
>  > Please let me know what you want me to do.
>  > Right now, I have already finished with unit testing. The next thing I
>  > will do is to include all the compiler classes in my project (remember that
>  > right now, that is done in Squeak). Maybe it would be convenient for me
>  > to wait for 0.12 before I do that.
>  >
>  > Another thing, Do you want the code I made for cmovxx?
>
> I'm interested.
>
> Does it have unit test coverage? Exupery development relies on
> testing so that's required.
>
Not right now. I will work on that later; when I have it I will send it to
you.

> When was cmov introduced? I know it was a long time ago but can't
> remember precisely when. What I'm concerned with is making Exupery
> incompatable with some chips that might still be being used.
>

Intel's optimization manual says that cmov was introduced with the
Pentium Pro, and AMD's optimization manual says that cmov is available
from the Athlon on. I actually didn't investigate that thoroughly. The fact
is that any modern computer should have it. I know that in earlier
implementations of cmov (Pentium Pro) using the instruction wasn't really
an advantage. But now it is really faster. My tinyBenchmarks showed a
speedup of 10% when I implemented cmov for SmallInteger additions.
But if you are really concerned about compatibility, I think you would be
better off not using it.


> Given adequate test coverage I'll add it.

I also implemented enter and leave instructions. Not because they are
better (they aren't), but because I use them to signal the inclusion of
additional prologue and epilogue code in a final phase added just after
the allocator. I do it that way because I don't know until then which
registers are used, or the number of additional temps needed. I know that
exupery always pushes and pops all the registers (other than eax, edx and
ecx), and that it makes room for a big context as temp space on the stack. I
don't do that. I only push the used regs, and if that is not enough, I
enter additional stack space. That breaks compatibility with the original
exupery, but I wanted to implement it that way. For small methods, that is
really better.
So, given that, I am not offering any of this to you. I think you'll
understand.

Cheers, Guille

Re: Some questions

Bryce Kampjes
Guillermo Adrián Molina writes:
 > > Sets the steps for processing. However the spill worklist has some
 > > registers on it that shouldn't be spilled, so it tries to select a
 > > register to spill. It discards all registers then fails.
 > >
 > > I'd see if there are any moves that might be spilled afterwards,
 > > if so, then all you'd need to do is allow spillRegister to fail
 > > gracefully.
 > >
 >
 > Ok, I will try to see what is happening. Is there any hard limit (besides
 > the number of available registers in x86 arch)?

There should be no limit on the number of registers you can use. The
worst that should happen is you end up with a lot of spill code.

 > >  > Another thing, Do you want the code I made for cmovxx?
 > >
 > > I'm interested.
 > >
 > > Does it have unit test coverage? Exupery development relies on
 > > testing so that's required.
 > >
 > Not right now, I will work on that later, When I have it I will send it to
 > you.

OK

 > > When was cmov introduced? I know it was a long time ago but can't
 > > remember precisely when. What I'm concerned with is making Exupery
 > > incompatable with some chips that might still be being used.
 > >
 >
 > Intel's optimization manual says that cmov was introduced in Pentium, and
 > in AMD's optimization manual says that cmov is available from athlon. I
 > actually didn't investigate that thoroughly. The fact is that any modern
 > computer should have it. I know that in earlier implementations of cmov
 > (Pentium Pro) using the instruction wasn't really an advantage. But now,
 > it is really faster. My tinyBenchamrks showed a speed up of 10% when I
 > implemented cmov for smallinteger additions.
 > But, If you are really concerned about compatibility I think you should be
 > better considering not to use it.

I'm surprised that your SmallInteger addition code was helped.

In Exupery the SmallInteger addition sequence is
   bitTest arg1
   jumpIfSet failureBlock
   bitTest arg2
   jumpIfSet failureBlock
   clearTagBit arg1
   add arg1 arg2
   jumpOverflow failureBlock

The failure case is a full message send.
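
For readers who want to see the same three checks in C, here is a rough sketch. It assumes the Squeak convention that a SmallInteger is the value shifted left once with the low tag bit set; the function names are hypothetical, and the exact sense of each test in Exupery's intermediate code is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_BIT 1

/* Assumed Squeak-style tagging: value shifted left once, low bit set. */
int32_t tag_small(int32_t value) { return (int32_t)(value * 2 + 1); }

/* Stores the tagged sum and returns true, or returns false when the
   caller would have to fall back to a full message send. */
bool small_integer_add(int32_t arg1, int32_t arg2, int32_t *sum)
{
    if ((arg1 & TAG_BIT) == 0) return false;   /* branch 1: arg1 is not tagged */
    if ((arg2 & TAG_BIT) == 0) return false;   /* branch 2: arg2 is not tagged */

    /* Clear the tag of arg1 so the sum carries exactly one tag bit,
       and widen to 64 bits so overflow can be detected portably. */
    int64_t wide = (int64_t)(arg1 - TAG_BIT) + (int64_t)arg2;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;                          /* branch 3: overflow */

    *sum = (int32_t)wide;
    return true;
}
```

All three branches lead to the same failure path, which is why they are cheap when arguments are almost always SmallIntegers.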

There are code fragments where cmov would be helpful. Converting
to a boolean comes to mind: the part of "a > b" where you're loading
either true or false into the result register.

 > > Given adequate test coverage I'll add it.
 >
 > I also implemented enter and leave instructions. Not because they were
 > better (they aren't), but, beacuse I use it to signal the inclusion of
 > additional prologue and epilogue code in a final phase added just after
 > the allocator. I do it that way because I dont know until then, which
 > registrs are used, and the number of additional temps needed. I know that
 > exupery allways push and pop all the registers (which aren't eax, edx and
 > ecx). And that it make place for a big context as temp space in stack. I
 > don't do that. I only push the used regs, and if that is not enough, I
 > enter additional stack space. That brakes compatibility with original
 > exupery, but I wanted to implement it that way. For small methods, that is
 > really better.
 > So, given that, I don't offer anything of this for you. I think you'll
 > understand.

Exupery's prologue and epilogue sequences could be improved. I've been
thinking about overhauling that area for a few years now. I'd like
to have variables spill into their actual locations. So if a stack
variable was stored, it would always be fetched from the context.
Then spilled registers wouldn't need to be loaded and stored on
context switches.

One thing that I might do in 0.13 is colour the isolated parts of a
method separately. That should improve register allocation as the
interference graph will not be polluted by other isolated sections of
code. A compiled method is often made up of completely isolated
sections of code. Colouring the sections separately should also speed
up register allocation.
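
One way to find such isolated sections, assuming "isolated" means that no interference or move edge connects them, is to compute the connected components of the graph. The C sketch below uses a small union-find structure; the Sections type and its functions are hypothetical and are not taken from Exupery.

```c
#define MAX_NODES 64

/* Union-find over virtual registers; the caller keeps count <= MAX_NODES. */
typedef struct {
    int parent[MAX_NODES];
    int count;
} Sections;

void sections_init(Sections *s, int count)
{
    s->count = count;
    for (int i = 0; i < count; i++)
        s->parent[i] = i;               /* every register starts alone */
}

int sections_find(Sections *s, int node)
{
    while (s->parent[node] != node) {
        s->parent[node] = s->parent[s->parent[node]];  /* path halving */
        node = s->parent[node];
    }
    return node;
}

/* Call once per interference or move edge between registers a and b. */
void sections_join(Sections *s, int a, int b)
{
    int ra = sections_find(s, a);
    int rb = sections_find(s, b);
    if (ra != rb)
        s->parent[ra] = rb;
}

/* Number of sections that could be coloured independently. */
int sections_count(Sections *s)
{
    int roots = 0;
    for (int i = 0; i < s->count; i++)
        if (sections_find(s, i) == i)
            roots++;
    return roots;
}
```

Registers that end up with the same root belong to one section, so each section can be handed to the allocator with its own, smaller graph.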

Bryce

Re: Some questions

Guillermo Adrián Molina
> Guillermo Adrián Molina writes:
>  > > Sets the steps for processing. However the spill worklist has some
>  > > registers on it that shouldn't be spilled, so it tries to select a
>  > > register to spill. It discards all registers then fails.
>  > >
>  > > I'd see if there are any moves that might be spilled afterwards,
>  > > if so, then all you'd need to do is allow spillRegister to fail
>  > > gracefully.
>  > >
>  >
>  > Ok, I will try to see what is happening. Is there any hard limit (besides
>  > the number of available registers in x86 arch)?
>
> There should be no limit on the number of registers you can use. The
> worst that should happen is you end up with a lot of spill code.
>
>  > >  > Another thing, Do you want the code I made for cmovxx?
>  > >
>  > > I'm interested.
>  > >
>  > > Does it have unit test coverage? Exupery development relies on
>  > > testing so that's required.
>  > >
>  > Not right now, I will work on that later, When I have it I will send it to
>  > you.
>
> OK
>
>  > > When was cmov introduced? I know it was a long time ago but can't
>  > > remember precisely when. What I'm concerned with is making Exupery
>  > > incompatable with some chips that might still be being used.
>  > >
>  >
>  > Intel's optimization manual says that cmov was introduced in Pentium, and
>  > in AMD's optimization manual says that cmov is available from athlon. I
>  > actually didn't investigate that thoroughly. The fact is that any modern
>  > computer should have it. I know that in earlier implementations of cmov
>  > (Pentium Pro) using the instruction wasn't really an advantage. But now,
>  > it is really faster. My tinyBenchamrks showed a speed up of 10% when I
>  > implemented cmov for smallinteger additions.
>  > But, If you are really concerned about compatibility I think you should be
>  > better considering not to use it.
>
> I'm surprised that your SmallInteger addition code was helped.
>
> In Exupery the SmallInteger addtion sequence is
>    bitTest arg1
>    jumpIfSet failureBlock
>    bitTest arg2
>    jumpIfSet failureBlock
>    clearTagBit arg1
>    add arg1 arg2
>    jumpOverflow failureBlock
>
> The failure case is a full message send.
>
The problem with the above code is that you have 3 branches.
That is why I need jump tables; there are cases where cmov really doesn't help.

Before I started using exupery, I called special methods in C that
implemented faster code. Every special method (and primitive) returned 1
in case of an error and, on success, returned the result object.
One of these special methods was +. This is part of the code:

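if(areIntegers(rcvr,arg)) {
        int result;
        /* result = rcvr + arg; on signed overflow, result = 1 (failure) */
        asm( "movl $1,%%edx\n\t"                /* edx = failure value */
                "movl %[rcvr],%[result]\n\t"
                "addl %[arg],%[result]\n\t"     /* sets OF on overflow */
                "cmovol %%edx,%[result]"        /* if OF: result = edx */
                /* "=&r": result is written before arg is read */
                : [result] "=&r" (result)
                : [rcvr] "r" (rcvr), [arg] "r" (arg)
                : "edx", "cc" );
        return result;
}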

With this code, I got up to 10% faster code in +-intensive tests.
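
The same idea can be written in portable C, shown here only as a sketch. It assumes that tagged integers are always even (the `.int 154 /* Number: 77 */` lines earlier in the thread suggest value times two), so the odd value 1 is free to mean failure. PRIMITIVE_FAILED and add_or_fail are hypothetical names, and whether the compiler emits cmov for the final select is up to the compiler.

```c
#include <stdint.h>

#define PRIMITIVE_FAILED 1   /* assumed failure marker: never a tagged integer */

/* Adds two tagged integers whose tag bit is 0. No detagging is needed
   because (2x) + (2y) is already the tagged form of x + y. */
int32_t add_or_fail(int32_t rcvr, int32_t arg)
{
    /* Widen so that signed overflow can be detected without
       undefined behaviour. */
    int64_t wide = (int64_t)rcvr + (int64_t)arg;
    int overflow = (wide < INT32_MIN) | (wide > INT32_MAX);

    /* A side-effect-free select; a compiler may lower it to cmov,
       but that is not guaranteed. */
    return overflow ? PRIMITIVE_FAILED : (int32_t)wide;
}
```

The caller only has to compare the result with 1, which mirrors the "returned 1 in case of an error" convention described above.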


> There are code fragments where cmov whould be helpful. Converting
> to a boolean comes to mind. The part of "a > b" where you're loading
> either true or false into the result register.
>

Yes, I implemented that with exupery (code for less "<"):

self addExpression:  (MedMov
        from: (self literal: false)
        to: answer ).
trueReg := machine createTemporaryRegister.
self addExpression:  (MedMov
        from: (self literal: true)
        to: trueReg ).
self addExpression:  (MedComparision
        operator: #cmp
        arg1: arg1
        arg2: arg2).
self addExpression:  (MedCMov
        type: #cmovl
        from: trueReg
        to: answer).

This gave me an impressive improvement (up to 40-50%) when I implemented
all the SmallInteger comparisons in this way, because, as you know, we don't
need to detag before comparing.
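
A small C sketch of why no detagging is needed: tagging is a monotonic mapping, so comparing the tagged words gives the same answer as comparing the values. The tagging scheme (value times two plus one) and all names are assumptions for illustration, and the false and true objects are passed in as plain words.

```c
#include <stdint.h>

/* Assumed tagging: value shifted left once, low bit set. */
int32_t tag_small(int32_t value) { return (int32_t)(value * 2 + 1); }

/* Answers true_obj when a < b, otherwise false_obj. The tagged words
   are compared directly because x < y exactly when 2x+1 < 2y+1. */
int32_t tagged_less_than(int32_t a, int32_t b,
                         int32_t false_obj, int32_t true_obj)
{
    int32_t answer   = false_obj;        /* mov false -> answer  */
    int32_t true_reg = true_obj;         /* mov true  -> trueReg */
    /* cmp followed by a select; a compiler may turn this into cmovl. */
    return (a < b) ? true_reg : answer;
}
```

The structure follows the MedMov, MedComparision and MedCMov steps above one for one.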


>  > > Given adequate test coverage I'll add it.
>  >
>  > I also implemented enter and leave instructions. Not because they were
>  > better (they aren't), but, beacuse I use it to signal the inclusion of
>  > additional prologue and epilogue code in a final phase added just after
>  > the allocator. I do it that way because I dont know until then, which
>  > registers are used, and the number of additional temps needed. I know that
>  > exupery always push and pop all the registers (which aren't eax, edx and
>  > ecx). And that it make place for a big context as temp space in stack. I
>  > don't do that. I only push the used regs, and if that is not enough, I
>  > enter additional stack space. That breaks compatibility with original
>  > exupery, but I wanted to implement it that way. For small methods, that is
>  > really better.
>  > So, given that, I don't offer anything of this for you. I think you'll
>  > understand.
>
> Exupery's prolog and epilogue sequences could be improved. I've been
> thinking about overhauling that area for a few years now. I'd like
> to have variables spill into their actual locations. So if a stack
> variable was stored, it would always be fetched from the context.
> Then spilled registers wouldn't need to be loaded and stored on
> context switches.
>
> On thing that I might do in 0.13 is colour the isolated parts of a
> method separately. That should improve register allocation as the
> inteference graph will not be polluted by other isolated sections of
> code. A compiled method is often made up of completely isolated
> sections of code. Colouring the sections separately should also speed
> up register allocation.
>

Every improvement you make will help me.
Cheers, Guille



Re: Some questions

Bryce Kampjes
Guillermo Adrián Molina writes:
 > > In Exupery the SmallInteger addtion sequence is
 > >    bitTest arg1
 > >    jumpIfSet failureBlock
 > >    bitTest arg2
 > >    jumpIfSet failureBlock
 > >    clearTagBit arg1
 > >    add arg1 arg2
 > >    jumpOverflow failureBlock
 > >
 > > The failure case is a full message send.
 > >
 > The problem with the above code is that you have 3 branches.
 > That is why I need jump tables, there are cases where cmov really dosn't help

There are only 3 branches and I'm hoping that they will never be
taken, so they should be easy to predict. That said, the branches do
use branch predictor resources, which could cause other branches to
be predicted less well.

 > Before I started using exupery, I called special methods in C that
 > implemented faster code. Every special method (and primitives) returned 1
 > in case of an error, and if success, returned the result object.
 > One of this special methods was +. This is part of the code:
 >
 > if(areIntegers(rcvr,arg)) {
 > int result;
 > asm( "movl $1,%%edx\n\t"
 > "movl %[rcvr],%[result]\n\t"
 > "addl %[arg],%[result]\n\t"
 > "cmovol %%edx,%[result]"
 > : [result] "=r" (result)
 > : [rcvr] "r" (rcvr), [arg] "r" (arg)
 > : "edx" );
 > return result;
 > }
 >
 > with this code, I've got up to 10% faster code in + intensive tests.

Do you have conditionals inside areIntegers, and another to check whether
the result is 1, indicating an error?

 > > There are code fragments where cmov whould be helpful. Converting
 > > to a boolean comes to mind. The part of "a > b" where you're loading
 > > either true or false into the result register.
 > >
 >
 > Yes, I implemented that with exupery (code for less "<"):
 >
 > self addExpression:  (MedMov
 > from: (self literal: false)
 > to: answer ).
 > trueReg := machine createTemporaryRegister.
 > self addExpression:  (MedMov
 > from: (self literal: true)
 > to: trueReg ).
 > self addExpression:  (MedComparision
 > operator: #cmp
 > arg1: arg1
 > arg2: arg2).
 > self addExpression:  (MedCMov
 > type: #cmovl
 > from: trueReg
 > to: answer).
 >
 > This gave me an impressive improvement (up to 40-50%), when I implemented
 > all the smallint comparissons in this way. Because, as you know, we dont
 > need to detag before compare.

Exupery removes many of the boolean conversion sequences.

        "a < b ifTrue: [x]"

First gets translated into:

      (booleanToControlFlow (controlFlowToBoolean (a < b)))

Then Exupery removes the booleanToControlFlow controlFlowToBoolean
sequence. The booleanToControlFlow sequence is moved to the failure
case where either a or b are not SmallIntegers.

So I'm not sure if speeding up the general case will help Exupery
as I'm not sure how often it's called.
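
The cancellation step can be pictured as a tiny tree rewrite. The C sketch below is only an illustration under assumed names (Node, NodeKind, simplify); it shows the pair being removed and does not model moving the conversion into the failure case.

```c
#include <stddef.h>

typedef enum {
    NODE_COMPARE,                  /* e.g. a < b */
    NODE_CONTROL_FLOW_TO_BOOLEAN,
    NODE_BOOLEAN_TO_CONTROL_FLOW
} NodeKind;

typedef struct Node {
    NodeKind     kind;
    struct Node *child;            /* NULL for leaves */
} Node;

/* Rewrites booleanToControlFlow(controlFlowToBoolean(x)) to x,
   bottom-up. Nodes are not freed; ownership stays with the caller. */
Node *simplify(Node *node)
{
    if (node == NULL)
        return NULL;
    node->child = simplify(node->child);
    if (node->kind == NODE_BOOLEAN_TO_CONTROL_FLOW
            && node->child != NULL
            && node->child->kind == NODE_CONTROL_FLOW_TO_BOOLEAN)
        return node->child->child; /* the two conversions cancel out */
    return node;
}
```

After the rewrite, the comparison feeds the conditional jump directly, so no true or false object is ever loaded on that path.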

Bryce

Re: Some questions

Guillermo Adrián Molina
> Guillermo Adrián Molina writes:
>  > > In Exupery the SmallInteger addtion sequence is
>  > >    bitTest arg1
>  > >    jumpIfSet failureBlock
>  > >    bitTest arg2
>  > >    jumpIfSet failureBlock
>  > >    clearTagBit arg1
>  > >    add arg1 arg2
>  > >    jumpOverflow failureBlock
>  > >
>  > > The failure case is a full message send.
>  > >
>  > The problem with the above code is that you have 3 branches.
>  > That is why I need jump tables, there are cases where cmov really doesn't help
>
> There is only 3 branches and I'm hoping that they will never be
> taken so they should be easy to predict. That said the branches do
> use branch predictor resources which could cause other branches not
> to be predicted as well.
>
Yes, I agree. I am really not an expert in these matters, but I think it
is not so uncommon to send #+ to objects other than SmallIntegers. In that
case, maybe one of the first 2 branches would be mispredicted. Maybe
you could test that both of them are SmallIntegers with just one branch (I am
doing that right now). But maybe I will try to do it without branching at
all.
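
As a sketch of the single-branch test, the two tag checks can be folded into one bitwise expression. Both functions below are hypothetical; the first assumes SmallIntegers carry a low tag bit of 1 (the Squeak convention), the second assumes a low tag bit of 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tag bit 1 marks a SmallInteger: both are integers only if the
   low bit survives an AND of the two words. */
bool both_small_tag_one(int32_t a, int32_t b)
{
    return (a & b & 1) != 0;
}

/* Tag bit 0 marks a SmallInteger: both are integers only if the
   low bit is clear in the OR of the two words. */
bool both_small_tag_zero(int32_t a, int32_t b)
{
    return ((a | b) & 1) == 0;
}
```

Either way the caller needs one conditional jump on the combined result instead of one per argument.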

>  > Before I started using exupery, I called special methods in C that
>  > implemented faster code. Every special method (and primitives) returned 1
>  > in case of an error, and if success, returned the result object.
>  > One of this special methods was +. This is part of the code:
>  >
>  > if(areIntegers(rcvr,arg)) {
>  > int result;
>  > asm( "movl $1,%%edx\n\t"
>  > "movl %[rcvr],%[result]\n\t"
>  > "addl %[arg],%[result]\n\t"
>  > "cmovol %%edx,%[result]"
>  > : [result] "=r" (result)
>  > : [rcvr] "r" (rcvr), [arg] "r" (arg)
>  > : "edx" );
>  > return result;
>  > }
>  >
>  > with this code, I've got up to 10% faster code in + intensive tests.
>
> Do you have conditionals inside areIntegers and to check if the result
> is 1 indicating an error?
>
As I don't use this code as often as before (because I inline that with
exupery at compile time), I don't worry about it any more. But
areIntegers() is just an "or" and an "and"; the branch is represented by
the C "if" statement. I wrote the addition that way because I wanted to
test if cmov was really that fast. It was better, but not THAT much better.

Guille
