Hi list,

I've been playing around with Exupery, and now I have a few questions:

1) I can't get tinyBenchmarks working, on either Linux or Windows.

I downloaded everything from:
http://wiki.squeak.org/squeak/Installing+Exupery

I used: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz on Linux
and: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip on Windows

with the prebuilt image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz

The examples run OK, but when I try to run tinyBenchmarks I get
segmentation faults.

2) I tried tinyBenchmarks in VisualWorks (NonCommercial 7.4.1) on my
machine and got:
'652,229,299 bytecodes/sec; 89,016,165 sends/sec'

Does anyone know why I get almost 90 million sends/sec?
I think it's quite a big difference from previous versions of VW.

3) I saw that the primitives for #at: and #at:put: are getting inlined,
but I think they are only implemented for variable objects (not for
bytes, Characters, or anything else).
Is that true?

4) In my experiments with Exupery, I get an error if I inline too many
methods. I think I am running out of machine registers, for example when
I try to compile Integer-#digitDiv:reg:.
I get this error in the ColouringRegisterAllocator phase, but it is not a
"You don't have any more registers, dude" kind of error.
Is the "no more registers" situation taken into consideration?

5) Is there a way to implement indirect jump tables in Exupery?

Thanks a lot.

Cheers,
Guille
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Guillermo Adrián Molina writes:
> 1) I can't get tinyBenchmarks working, on either Linux or Windows.
>
> I downloaded everything from:
> http://wiki.squeak.org/squeak/Installing+Exupery
>
> I used: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-linux.tz on Linux
> and: http://ftp.squeak.org/Exupery/vms/exupery-vm-0.11-win32.zip on Windows
>
> with the prebuilt image: http://ftp.squeak.org/Exupery/images/exupery-0.10.tz
>
> The examples run OK, but when I try to run tinyBenchmarks I get
> segmentation faults.

Try using the 0.11 Exupery VM with Exupery 0.11. Exupery VMs must
match the Exupery version. The interface between Exupery and the VM is
still evolving.

> 3) I saw that the primitives for #at: and #at:put: are getting inlined,
> but I think they are only implemented for variable objects (not for
> bytes, Characters, or anything else).
> Is that true?

It's true. #at: and #at:put: are only implemented for variable
objects. I should write primitives for other types. Good benchmarks
that demonstrate the need for such primitives would be nice.

> 4) In my experiments with Exupery, I get an error if I inline too many
> methods. I think I am running out of machine registers, for example when
> I try to compile Integer-#digitDiv:reg:.
> I get this error in the ColouringRegisterAllocator phase, but it is not a
> "You don't have any more registers, dude" kind of error.
> Is the "no more registers" situation taken into consideration?

I'd guess that it was because a variable was live at an entry point.
There's a stack tracing bug which I'm just fixing that could have
caused that.

I use the liveness analyser in the register allocator to catch
compiler bugs. It's much nicer to catch them there than with crashes.

> 5) Is there a way to implement indirect jump tables in Exupery?

It would be possible. I do use indirect jumps for returns to compiled
methods. If you look at any method you should see at least one
indirect jump in the return code. Just jump to a register.

Bryce
Hi there!
Thanks for the answers, I found them very useful. I have a few more
questions.

> Try using the 0.11 Exupery VM with Exupery 0.11. Exupery VMs must
> match the Exupery version. The interface between Exupery and the VM is
> still evolving.

OK, I tried that and it worked:
668407310 bytecodes/sec; 13559830 sends/sec
760772659 bytecodes/sec; 13803237 sends/sec
777524677 bytecodes/sec; 12762744 sends/sec
760772659 bytecodes/sec; 13834279 sends/sec
775757575 bytecodes/sec; 13569800 sends/sec

I read something about Intel being faster than AMD for Exupery. Do you
know why that is?

> I'd guess that it was because a variable was live at an entry point.
> There's a stack tracing bug which I'm just fixing that could have
> caused that.
>
> I use the liveness analyser in the register allocator to catch
> compiler bugs. It's much nicer to catch them there than with crashes.

Yes, I've seen that kind of error (variable live at entry point) and
corrected them by initializing temps with nil.
I think this is something different. In this method of the
ColouringRegisterAllocator:

findNodeToSpill
    | spillable |
    "This is just a basic heuristic, spill the register that interferes with the most
    other registers. It is possible to do a lot better.
    The heuristic should consider how much each register is used while it is alive"
    spillable := spillWorklist select:
        [:each | ((self hasSpill: each register) not) and: [each register isMachineRegister not]].
    spillable := spillable asSortedCollection: [:a :b | a spillWeight > b spillWeight].
    ^ spillable first

After compiling lots of methods using Exupery, it fails with very big
methods because spillable is empty, and spillable first throws an error.
If I do less inlining (for example, not inlining divisions and
multiplications), it compiles OK!
Any ideas?

> It would be possible. I do use indirect jumps for returns to compiled
> methods. If you look at any method you should see at least one
> indirect jump in the return code. Just jump to a register.

Yes, I checked that, but I still need to initialize that register with
the right block, and I need to do that without using Jcc (conditional
jumps) to choose the right one. Any suggestions?

Thanks a lot,
cheers,
Guille
Guillermo Adrián Molina writes:
> OK, I tried that and it worked:
> 668407310 bytecodes/sec; 13559830 sends/sec
> 760772659 bytecodes/sec; 13803237 sends/sec
> 777524677 bytecodes/sec; 12762744 sends/sec
> 760772659 bytecodes/sec; 13834279 sends/sec
> 775757575 bytecodes/sec; 13569800 sends/sec
> I read something about Intel being faster than AMD for Exupery. Do you
> know why that is?

Exupery was much faster than the interpreter on Pentium 4s. That's
because the Pentium 4 is an inefficient chip to run the interpreter on.

Those comparisons are rather old now. Hardware has moved on and so
has Exupery. Benchmarking now with bigger suites may show different
numbers.

> After compiling lots of methods using Exupery, it fails with very big
> methods because spillable is empty, and spillable first throws an error.
> If I do less inlining (for example, not inlining divisions and
> multiplications), it compiles OK!
> Any ideas?

I'd guess it's a limit with the register allocator. It is possible
that it can fail to find a register to spill when it needs to spill
something. Given this bug will not cause crashes or incorrect
execution it's not high priority.

> Yes, I checked that, but I still need to initialize that register with
> the right block, and I need to do that without using Jcc (conditional
> jumps) to choose the right one. Any suggestions?

Exupery also can get the address of a block. That's also done in the
send code to save the compiled program counter. The compiled program
counter is the address of the machine code block to return to encoded
as a SmallInteger. Return blocks are aligned to 2 byte boundaries to
allow for tagging. That's enough to build an indirect jump table if
you wanted to do that.

Why do you need to build an indirect jump table? What are you trying
to do?

Bryce
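The compiled program counter encoding described in this message can be sketched in a few lines. This is only an illustration, not Exupery's code: it assumes the SmallInteger tag is a low bit set to 1 (as in Squeak's 32-bit object format), and the function names are hypothetical.

```python
TAG_BIT = 1  # assumption: a set low bit marks a SmallInteger


def encode_return_address(address: int) -> int:
    """Tag a 2-byte-aligned machine-code address so it looks like a SmallInteger."""
    if address & TAG_BIT:
        raise ValueError("return blocks must be aligned to a 2-byte boundary")
    return address | TAG_BIT


def decode_return_address(tagged: int) -> int:
    """Strip the tag bit to recover the address to jump to."""
    return tagged & ~TAG_BIT
```

The alignment requirement is what makes this work: an aligned address always has a free low bit, so tagging loses no information.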
> Exupery also can get the address of a block. That's also done in the
> send code to save the compiled program counter. The compiled program
> counter is the address of the machine code block to return to encoded
> as a SmallInteger. Return blocks are aligned to 2 byte boundaries to
> allow for tagging. That's enough to build an indirect jump table if
> you wanted to do that.

Yes, I also noticed that, using MedAddress, right?
Forgive me, but I still can't get the point. For example:

    MedMov from: (MedAddress addressOf: blockN) to: aMedReg
    MedJump type: #jmp target: aMedReg
    block1:
        do something1
        jmp end
    block2:
        do something2
        jmp end
    block3:
        do something3
    end:

This could be a jump table, but I still need to select which block to
jump to. The only way of selecting the block that I can imagine is
nesting compares, something with jumps like:

    MedJump type: #jc target: aLabel instruction: (MedComparision operator: #bitTest arg1: aMed arg2: (MedLiteral literal: 0)).

But I want to implement a jump table to avoid conditional branching.

> Why do you need to build an indirect jump table? What are you trying
> to do?

I am implementing a Smalltalk. It compiles directly to machine code, with
Exupery. The last time I asked the list something I was starting to use
Exupery. Now I am almost done with that (without many optimizations). I am
doing unit testing right now.
My first mail to the list asked what would be the best way to implement a
new ST, so, in my implementation I use:
0 tagged ints.
A simple (and a little fat) object memory.
A very straightforward send mechanism (with the C calling convention for
calling methods).
No contexts, but using BlockClosures (frames are the same as in C; the C
compiler does not differentiate C code from ST code).

I compile the ST code from .st files to .s (assembler) using SmaCC,
RefactoryBrowser, and then Exupery. I still need Squeak in order to run
all that.
I only use the bottom layer of Exupery (I do not use the
IntermediateXXXXXX classes).
I implemented the cmovxx instruction in Exupery, because it is very useful.
But I need jump tables to implement, for example, faster versions of
ifTrue:ifFalse:, and a lot of other things. This could lead to faster
results.
Right now I am getting (on the same machine), tinyBenchmarks:
Squeak:         172043010 bytecodes/sec; 5468700 sends/sec
Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec
myST/Exupery:   1072251308 bytecodes/sec; 36056442 sends/sec

Cheers,
Guille
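The difference between nested compares and a jump table can be illustrated outside Exupery. The sketch below is Python rather than machine code, and every name in it is hypothetical: `select_with_compares` needs one conditional test per candidate block, while `select_with_table` uses the index to pick the target directly, which is the role an indirect `jmp` through a table would play.

```python
def something1() -> str:
    return "something1"


def something2() -> str:
    return "something2"


def something3() -> str:
    return "something3"


def select_with_compares(index: int) -> str:
    # Nested compares: one conditional branch per candidate block.
    if index == 0:
        return something1()
    if index == 1:
        return something2()
    return something3()


# Jump table: the index selects the target with no compare chain.
BLOCKS = (something1, something2, something3)


def select_with_table(index: int) -> str:
    # An out-of-range index raises IndexError here; real machine code would
    # need its own bounds check or a guarantee from the caller.
    return BLOCKS[index]()
```

Both functions agree for indexes 0 to 2; the table version keeps the same cost however many blocks are added.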
> My first mail to the list asked what would be the best way to
> implement a new ST, so, in my implementation I use:
> 0 tagged ints.
> A simple (and a little fat) object memory.
> A very straightforward send mechanism (with the C calling
> convention for calling methods).
> No contexts, but using BlockClosures (frames are the same as
> in C; the C compiler does not differentiate C code from ST code).

Hi Guille, I don't get something here. If you are using Exupery to
generate asm code, why are you talking about a C compiler?

> Right now I am getting (on the same machine), tinyBenchmarks:
> Squeak:         172043010 bytecodes/sec; 5468700 sends/sec
> Squeak/Exupery: 775757575 bytecodes/sec; 13569800 sends/sec
> myST/Exupery:   1072251308 bytecodes/sec; 36056442 sends/sec

Those are numbers!

Cheers,

Sebastian
> Hi Guille, I don't get something here. If you are using Exupery to
> generate asm code, why are you talking about a C compiler?

OK, short answer:
The ST VM is responsible for a lot of things; one of them is to interpret
bytecodes. In my ST every method is stored as x86 machine code, so I
don't need any interpreter to interpret methods (the CPU does all that).
But VMs have to deal with a lot of other things, like primitives and
method lookups. That part is done in C.

Long answer:
Right now building my VM is a little messy. This is more or less what I
do:

File out the classes I need from Squeak
Right now I use only ~50 basic classes and ~40 test classes. The file-out
mechanism generates one file per class, called ClassName.st.

Compile methods in Squeak
I load every *.st file from Squeak (I said load, not file in!). While I
read the classes I compile the methods with SmaCC - Refactory Browser -
Exupery. This generates assembler as an intermediate step, but the final
step produces x86 machine code. This is stored in every method.

Generate assembler files from Squeak
Once everything is compiled I generate an assembler file for every class,
for example ClassName.s. This could be a little confusing. I already
compiled everything, so why would I need to generate assembler files?
Because assembler files are very handy for representing the image. Take a
look at a real method:

    /* Test>>test Method bytecodes */
    .global Test_Class_test_bytecodes
    Test_Class_test_bytecodes:
        .int ByteArray + 1
        .int 154 /* Number: 77 */
        .int 17888 /* Number: 8944 */
    .global _Test_Class_test_bytecodes
    _Test_Class_test_bytecodes:
        .byte 85, 137, 229, 139, 69, 8, 80, 184
        .int Test_Class_test_literals + 1
        .byte 139, 64, 11, 232
        .int getMethodIP - 4 - .
        .byte 255, 208, 129, 196, 4, 0, 0, 0, 139, 69, 8, 80, 184
        .int Test_Class_test_literals + 1
        .byte 139, 64, 15, 232
        .int getMethodIP - 4 - .
        .byte 255, 208, 129, 196, 4, 0, 0, 0, 80, 184
        .int Test_Class_test_literals + 1
        .byte 139, 64, 19, 232
        .int getMethodIP - 4 - .
        .byte 255, 208, 129, 196, 4, 0, 0, 0, 201, 195
        .align 2

As you can see, that is not really assembler; those bytes are generated
by Exupery. Notice the references to other objects. It is very easy to
represent the image with this method. For example, look how an array
would be represented in this way:

    /* Array */
    .global Test_Class_test_literals
    Test_Class_test_literals:
        .int Array + 1
        .int 24 /* Number: 12 */
        .int 9784 /* Number: 4892 */
    .global _Test_Class_test_literals
    _Test_Class_test_literals:
        .int symbol_initialize + 1
        .int symbol_selfTest + 1
        .int symbol_printString + 1

Those "+ 1" are there because of the 0 tagged integers.

Generate a library with the code
With all the .s files I generate a library.

Compile everything into a static executable
I compile the library and the other C files into a static executable.
I do that because right now I haven't implemented the ST compiler, and
that's why I still need Squeak. When I implement the compiler
(SmaCC-RB-Exupery), I will have to provide some kind of dynamic loading
of the ST part.

Cheers
Guille
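The "0 tagged integers" convention visible in the listings (`.int 154 /* Number: 77 */`, `Array + 1`) can be sketched as follows. This is an illustration under assumptions, not Guille's implementation: the function names are hypothetical, and it assumes objects sit at even addresses so that adding 1 sets the low bit.

```python
def encode_small_integer(value: int) -> int:
    """Integers carry a 0 tag in the low bit, so the word is the value times two."""
    return value << 1


def decode_small_integer(word: int) -> int:
    return word >> 1


def encode_pointer(address: int) -> int:
    """Object references are written as address + 1, setting the low bit."""
    return address + 1


def is_small_integer(word: int) -> bool:
    return word & 1 == 0
```

With this encoding the values in the listings line up: 77 becomes 154, 8944 becomes 17888, 12 becomes 24 and 4892 becomes 9784.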
In reply to this post by Guillermo Adrián Molina
Guillermo Adrián Molina writes:
> Yes, I also noticed that, using MedAddress, right?
> Forgive me, but I still can't get the point.

MedAddress is a literal that represents the address of a block. In
Exupery it gets relocated to be the block's actual address. You could
now write:

    (jmp (mem (add (MedAddress blockWithTable) (sar anIndex 2))))

The only thing missing is a way to produce a block that contains only
literals; in your case, a block that contains MedAddresses. The
MedAddress should be translated into a label referring to the
block. Exupery currently does not have blocks that contain literals
but it shouldn't be too hard to add.

> I am implementing a Smalltalk. It compiles directly to machine code, with
> Exupery. The last time I asked the list something I was starting to use
> Exupery. Now I am almost done with that (without many optimizations). I am
> doing unit testing right now.

Interesting, what is the goal of your new Smalltalk? What are you
trying to do better than the other dialects, or is this purely for
enjoyment?

Bryce
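The table lookup behind that `jmp` expression can be simulated to show the address arithmetic. This is a sketch under stated assumptions, not Exupery output: memory is modelled as a dictionary, table entries are assumed to be 4 bytes wide (32-bit x86), the index is assumed to be a plain untagged integer, and all names are hypothetical. How a real index must be shifted depends on its tagging, which this sketch ignores.

```python
WORD_SIZE = 4  # assumption: 32-bit x86, one block address per table entry


def build_table(base, targets):
    """Lay out block addresses as consecutive words starting at base.

    This plays the role of the literal-only block described in the message.
    """
    return {base + i * WORD_SIZE: target for i, target in enumerate(targets)}


def jump_target(memory, base, index):
    """Load the address stored at base + index * WORD_SIZE.

    An indirect jump through memory would transfer control to this address.
    """
    return memory[base + index * WORD_SIZE]
```

For example, with `memory = build_table(0x5000, [0x100, 0x180, 0x200])`, the call `jump_target(memory, 0x5000, 1)` reads the second entry of the table.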
In reply to this post by Guillermo Adrián Molina
Guillermo Adrián Molina writes:
> Once everything is compiled I generate an assembler file for every class,
> for example ClassName.s. This could be a little confusing. I already
> compiled everything, why would I need to generate assembler files? Because
> assembler files are very handy to represent the image.

The first versions of Exupery generated gas assembly which
I compiled then linked against C support code. Even after
Exupery could compile inline I kept the code around to generate
assembly instructions for several releases. It eventually got
deleted as it wasn't adding any value.

If you're planning on continuing generating assembly then it
might be worthwhile to try and find the code to produce assembly and
update it to deal with the current instruction selector and the
instructions that have been added since I stopped maintaining it.

Bryce
> The first versions of Exupery generated gas assembly which
> I compiled then linked against C support code. Even after
> Exupery could compile inline I kept the code around to generate
> assembly instructions for several releases. It eventually got
> deleted as it wasn't adding any value.
>
> If you're planning on continuing generating assembly then it
> might be worthwhile to try and find the code to produce assembly and
> update it to deal with the current instruction selector and the
> instructions that have been added since I stopped maintaining it.

[...] assembler code. I am generating assembler files just to make it
easier to maintain the relationships between objects. As you can see from
the code, the C compiler doesn't know (at compile time) what those bytes
are. Before I used Exupery, I was generating assembler, but thanks to
Exupery, that step isn't necessary. In the future I am planning to
generate some kind of relocatable objects (instead of assembler files)
that could be loaded on demand at run time.

Cheers,
Guille
In reply to this post by Guillermo Adrián Molina
Guillermo Adrián Molina writes:
> > > After compiling lots of methods using Exupery, it fails with very big
> > > methods because spillable is empty, and spillable first throws an
> > > error. If I do less inlining (for example, not inlining divisions and
> > > multiplications), it compiles OK!
> > > Any ideas?
> >
> > I'd guess it's a limit with the register allocator. It is possible
> > that it can fail to find a register to spill when it needs to spill
> > something. Given this bug will not cause crashes or incorrect
> > execution it's not high priority.

If you want to fix that limit in the register allocator I could give
you some pointers. The problem is due to how the problem is broken
down into stages. I'd need to dig through code to remember the details
though.

I'm planning on working on the register allocator in the next release.
The goal will be making it faster; it has a few serious performance
problems.

Bryce
> Guillermo Adrián Molina writes:
> > > > After compiling lots of methods using exupery, it fails with very big
> > > > methods because spillable is nil, and spillable first throws an error.
> > > > If I make less inlining (for example, not inlining divisions and
> > > > multiplications), it compiles ok!
> > > > Any ideas?
> > >
> > > I'd guess it's a limit with the register allocator. It is possible
> > > that it can fail to find a register to spill when it needs to spill
> > > something. Given this bug will not cause crashes or incorrect
> > > execution it's not high priority.
>
> If you want to fix that limit in the register allocator I could give
> you some pointers. The problem is due to how the problem is broken
> down into stages. I'd need to dig through code to remember the details
> though.

Yes, I do want to. Please let me know where to start.

> I'm planning on working on the register allocator in the next release.
> The goal will be making it faster, it has a few serious performance
> problems.

Exupery's compile time is not a problem for me. But maybe I have to wait
for you to finish with the register allocator before I try to fix the
limit. Please let me know what you want me to do.

Right now, I have already finished with unit testing. The next thing I
will do is include all the compiler classes in my project (remember that
right now, that is done in Squeak); maybe it would be convenient for me
to wait for 0.12 before I do that.

Another thing: do you want the code I made for cmovxx?

Cheers
Guille.
Guillermo Adrián Molina writes:
> > If you want to fix that limit in the register allocator I could give
> > you some pointers. The problem is due to how the problem is broken
> > down into stages. I'd need to dig through code to remember the details
> > though.
>
> Yes I do want. Please let me know where to start.

If it's not an urgent problem then it may be better to wait
until after 0.13, or to look at the register allocator during
0.13 development.

Have a look at the stages of simplification. They're done in this
method, which sets the steps for processing:

ColouringRegisterAllocator>>processWorkLists
    simplifyWorklist isEmpty ifFalse: [^ self simplify].
    self coalesce ifTrue: [^ self].
    self freeze ifTrue: [^ self].
    spillWorklist isEmpty ifFalse: [^ self spillRegister].
    self spillMove

However, the spill worklist has some registers on it that shouldn't be
spilled, so it tries to select a register to spill. It discards all
registers and then fails.

I'd see if there are any moves that might be spilled afterwards;
if so, then all you'd need to do is allow spillRegister to fail
gracefully.

> > I'm planning on working on the register allocator in the next release.
> > The goal will be making it faster, it has a few serious performance
> > problems.
>
> Exupery's compile time is not a problem for me. But maybe I have to wait
> for you to finish with the register allocator, in order to try to fix the
> limit.
> Please let me know what you want me to do.
> Right now, I have already finished with unit testing. The next thing I
> will do is to include all the compiler classes in my project (remember
> that right now, that is done in Squeak), maybe it would be convenient
> for me to wait for 0.12 before I do that.
>
> Another thing, do you want the code I made for cmovxx?

I'm interested.

Does it have unit test coverage? Exupery development relies on
testing so that's required.

When was cmov introduced? I know it was a long time ago but can't
remember precisely when. What I'm concerned with is making Exupery
incompatible with some chips that might still be in use.

Given adequate test coverage I'll add it.

Bryce
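The "fail gracefully" suggestion above can be sketched in C. This is only an illustration under stated assumptions, not Exupery's code: the Node struct, select_spill and spill_register are hypothetical names, and picking the spillable node with the highest degree is an assumed heuristic. The point is that the selection returns "nothing to spill" instead of raising an error, so the caller can fall through to the next stage (spilling a move).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry on the spill worklist. */
typedef struct {
    int  id;
    bool spillable;  /* false for registers that must never be spilled */
    int  degree;     /* number of interfering neighbours */
} Node;

/* Pick a spill candidate: the spillable node with the highest degree
 * (an assumed heuristic). Returns -1 instead of failing when every
 * node on the worklist has been discarded. */
int select_spill(const Node *worklist, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!worklist[i].spillable)
            continue;
        if (best < 0 || worklist[i].degree > worklist[best].degree)
            best = (int)i;
    }
    return best;
}

/* One step of the spill stage. Returns true if a register was chosen,
 * false if the caller should try the next stage instead. */
bool spill_register(Node *worklist, size_t count, int *spilled_id)
{
    int index = select_spill(worklist, count);
    if (index < 0)
        return false;                   /* graceful failure, no error raised */
    *spilled_id = worklist[index].id;
    worklist[index].spillable = false;  /* do not pick the same node again */
    return true;
}
```

With this shape, the equivalent of processWorkLists would call the spill-move stage whenever spill_register reports false.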
> Guillermo Adrián Molina writes:
> > > If you want to fix that limit in the register allocator I could give
> > > you some pointers. The problem is due to how the problem is broken
> > > down into stages. I'd need to dig through code to remember the details
> > > though.
> >
> > Yes I do want. Please let me know where to start.
>
> If it's not an urgent problem then it may be better to wait
> until after 0.13. Or to look at the register allocator during
> 0.13 development.
>
> Have a look at the stages of simplification. They're done
>
> ColouringRegisterAllocator>>processWorkLists
>     simplifyWorklist isEmpty ifFalse: [^ self simplify].
>     self coalesce ifTrue: [^ self].
>     self freeze ifTrue: [^ self].
>     spillWorklist isEmpty ifFalse: [^ self spillRegister].
>     self spillMove
>
> Sets the steps for processing. However the spill worklist has some
> registers on it that shouldn't be spilled, so it tries to select a
> register to spill. It discards all registers then fails.
>
> I'd see if there are any moves that might be spilled afterwards,
> if so, then all you'd need to do is allow spillRegister to fail
> gracefully.

Ok, I will try to see what is happening. Is there any hard limit (besides
the number of available registers in the x86 arch)?

> > > I'm planning on working on the register allocator in the next release.
> > > The goal will be making it faster, it has a few serious performance
> > > problems.
> >
> > Exupery's compile time is not a problem for me. But maybe I have to wait
> > for you to finish with the register allocator, in order to try to fix
> > the limit.
> > Please let me know what you want me to do.
> > Right now, I have already finished with unit testing. The next thing I
> > will do is to include all the compiler classes in my project (remember
> > that right now, that is done in Squeak), maybe it would be convenient
> > for me to wait for 0.12 before I do that.
> >
> > Another thing, do you want the code I made for cmovxx?
>
> I'm interested.
>
> Does it have unit test coverage? Exupery development relies on
> testing so that's required.

Not right now, I will work on that later. When I have it I will send it
to you.

> When was cmov introduced? I know it was a long time ago but can't
> remember precisely when. What I'm concerned with is making Exupery
> incompatible with some chips that might still be being used.

Intel's optimization manual says that cmov was introduced in Pentium, and
AMD's optimization manual says that cmov is available from Athlon on. I
actually didn't investigate that thoroughly. The fact is that any modern
computer should have it. I know that in earlier implementations of cmov
(Pentium Pro) using the instruction wasn't really an advantage. But now,
it is really faster. My tinyBenchmarks showed a speed up of 10% when I
implemented cmov for SmallInteger additions.
But if you are really concerned about compatibility, I think you would be
better off not using it.

> Given adequate test coverage I'll add it.

I also implemented the enter and leave instructions. Not because they were
better (they aren't), but because I use them to signal the inclusion of
additional prologue and epilogue code in a final phase added just after
the allocator. I do it that way because I don't know until then which
registers are used and how many additional temps are needed. I know that
exupery always pushes and pops all the registers (other than eax, edx and
ecx), and that it makes room for a big context as temp space on the stack.
I don't do that. I only push the used regs, and if that is not enough, I
enter additional stack space. That breaks compatibility with the original
exupery, but I wanted to implement it that way. For small methods, that is
really better.
So, given that, I am not offering any of this to you. I think you'll
understand.

Cheers,
Guille
Guillermo Adrián Molina writes:
> > Sets the steps for processing. However the spill worklist has some
> > registers on it that shouldn't be spilled, so it tries to select a
> > register to spill. It discards all registers then fails.
> >
> > I'd see if there are any moves that might be spilled afterwards,
> > if so, then all you'd need to do is allow spillRegister to fail
> > gracefully.
>
> Ok, I will try to see what is happening. Is there any hard limit (besides
> the number of available registers in x86 arch)?

There should be no limit on the number of registers you can use. The
worst that should happen is you end up with a lot of spill code.

> > > Another thing, do you want the code I made for cmovxx?
> >
> > I'm interested.
> >
> > Does it have unit test coverage? Exupery development relies on
> > testing so that's required.
>
> Not right now, I will work on that later. When I have it I will send it
> to you.

OK

> > When was cmov introduced? I know it was a long time ago but can't
> > remember precisely when. What I'm concerned with is making Exupery
> > incompatible with some chips that might still be being used.
>
> Intel's optimization manual says that cmov was introduced in Pentium, and
> AMD's optimization manual says that cmov is available from Athlon. I
> actually didn't investigate that thoroughly. The fact is that any modern
> computer should have it. I know that in earlier implementations of cmov
> (Pentium Pro) using the instruction wasn't really an advantage. But now,
> it is really faster. My tinyBenchmarks showed a speed up of 10% when I
> implemented cmov for SmallInteger additions.
> But if you are really concerned about compatibility, I think you would
> be better off not using it.

I'm surprised that your SmallInteger addition code was helped.

In Exupery the SmallInteger addition sequence is

    bitTest arg1
    jumpIfSet failureBlock
    bitTest arg2
    jumpIfSet failureBlock
    clearTagBit arg1
    add arg1 arg2
    jumpOverflow failureBlock

The failure case is a full message send.

There are code fragments where cmov would be helpful. Converting
to a boolean comes to mind: the part of "a > b" where you're loading
either true or false into the result register.

> > Given adequate test coverage I'll add it.
>
> I also implemented enter and leave instructions. Not because they were
> better (they aren't), but because I use them to signal the inclusion of
> additional prologue and epilogue code in a final phase added just after
> the allocator. I do it that way because I don't know until then which
> registers are used, and the number of additional temps needed. I know
> that exupery always pushes and pops all the registers (which aren't eax,
> edx and ecx), and that it makes room for a big context as temp space on
> the stack. I don't do that. I only push the used regs, and if that is not
> enough, I enter additional stack space. That breaks compatibility with
> the original exupery, but I wanted to implement it that way. For small
> methods, that is really better.
> So, given that, I don't offer anything of this for you. I think you'll
> understand.

Exupery's prologue and epilogue sequences could be improved. I've been
thinking about overhauling that area for a few years now. I'd like
to have variables spill into their actual locations. So if a stack
variable was stored, it would always be fetched from the context.
Then spilled registers wouldn't need to be loaded and stored on
context switches.

One thing that I might do in 0.13 is colour the isolated parts of a
method separately. That should improve register allocation as the
interference graph will not be polluted by other isolated sections of
code. A compiled method is often made up of completely isolated
sections of code. Colouring the sections separately should also speed
up register allocation.

Bryce
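The idea of colouring isolated sections separately amounts to splitting the interference graph into connected components first. The following C sketch shows one way to find those components with a union-find table. It is an assumption-laden illustration, not Exupery's design: the node numbering, MAX_NODES and all function names are hypothetical.

```c
#define MAX_NODES 64

/* Union-find parent table; parent[i] == i marks the root of a section. */
static int parent[MAX_NODES];

/* Start with every virtual register in its own section. */
void components_init(int n)
{
    for (int i = 0; i < n; i++)
        parent[i] = i;
}

/* Find the representative of x, shortening the path as we go. */
int find_root(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  /* path halving */
        x = parent[x];
    }
    return x;
}

/* Record an interference edge: a and b must be coloured together. */
void add_interference(int a, int b)
{
    int ra = find_root(a);
    int rb = find_root(b);
    if (ra != rb)
        parent[ra] = rb;
}

/* True when a and b belong to the same isolated section. */
int same_section(int a, int b)
{
    return find_root(a) == find_root(b);
}

/* Number of isolated sections among the first n nodes. */
int count_sections(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (parent[i] == i)
            count++;
    return count;
}
```

Each section could then be handed to the colouring allocator on its own, so the work lists for one section never contain nodes from another.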
> Guillermo Adrián Molina writes:
> > > Sets the steps for processing. However the spill worklist has some
> > > registers on it that shouldn't be spilled, so it tries to select a
> > > register to spill. It discards all registers then fails.
> > >
> > > I'd see if there are any moves that might be spilled afterwards,
> > > if so, then all you'd need to do is allow spillRegister to fail
> > > gracefully.
> >
> > Ok, I will try to see what is happening. Is there any hard limit
> > (besides the number of available registers in x86 arch)?
>
> There should be no limit on the number of registers you can use. The
> worst that should happen is you end up with a lot of spill code.
>
> > > > Another thing, do you want the code I made for cmovxx?
> > >
> > > I'm interested.
> > >
> > > Does it have unit test coverage? Exupery development relies on
> > > testing so that's required.
> >
> > Not right now, I will work on that later. When I have it I will send
> > it to you.
>
> OK
>
> > > When was cmov introduced? I know it was a long time ago but can't
> > > remember precisely when. What I'm concerned with is making Exupery
> > > incompatible with some chips that might still be being used.
> >
> > Intel's optimization manual says that cmov was introduced in Pentium,
> > and AMD's optimization manual says that cmov is available from Athlon.
> > I actually didn't investigate that thoroughly. The fact is that any
> > modern computer should have it. I know that in earlier implementations
> > of cmov (Pentium Pro) using the instruction wasn't really an advantage.
> > But now, it is really faster. My tinyBenchmarks showed a speed up of
> > 10% when I implemented cmov for SmallInteger additions.
> > But if you are really concerned about compatibility, I think you would
> > be better off not using it.
>
> I'm surprised that your SmallInteger addition code was helped.
>
> In Exupery the SmallInteger addition sequence is
>     bitTest arg1
>     jumpIfSet failureBlock
>     bitTest arg2
>     jumpIfSet failureBlock
>     clearTagBit arg1
>     add arg1 arg2
>     jumpOverflow failureBlock
>
> The failure case is a full message send.

The problem with the above code is that you have 3 branches.
That is why I need jump tables; there are cases where cmov really doesn't
help.

Before I started using exupery, I called special methods in C that
implemented faster code. Every special method (and primitive) returned 1
in case of an error and, on success, returned the result object.
One of these special methods was +. This is part of the code:

if(areIntegers(rcvr,arg)) {
    int result;
    asm( "movl $1,%%edx\n\t"
         "movl %[rcvr],%[result]\n\t"
         "addl %[arg],%[result]\n\t"
         "cmovol %%edx,%[result]"
         : [result] "=r" (result)
         : [rcvr] "r" (rcvr), [arg] "r" (arg)
         : "edx" );
    return result;
}

With this code, I got up to 10% faster code in + intensive tests.

> There are code fragments where cmov would be helpful. Converting
> to a boolean comes to mind. The part of "a > b" where you're loading
> either true or false into the result register.

Yes, I implemented that with exupery (code for less "<"):

self addExpression: (MedMov
    from: (self literal: false)
    to: answer ).
trueReg := machine createTemporaryRegister.
self addExpression: (MedMov
    from: (self literal: true)
    to: trueReg ).
self addExpression: (MedComparision
    operator: #cmp
    arg1: arg1
    arg2: arg2).
self addExpression: (MedCMov
    type: #cmovl
    from: trueReg
    to: answer).

This gave me an impressive improvement (up to 40-50%) when I implemented
all the SmallInteger comparisons in this way. Because, as you know, we
don't need to detag before comparing.

> > > Given adequate test coverage I'll add it.
> >
> > I also implemented enter and leave instructions. Not because they were
> > better (they aren't), but because I use them to signal the inclusion
> > of additional prologue and epilogue code in a final phase added just
> > after the allocator. I do it that way because I don't know until then
> > which registers are used, and the number of additional temps needed. I
> > know that exupery always pushes and pops all the registers (which
> > aren't eax, edx and ecx), and that it makes room for a big context as
> > temp space on the stack. I don't do that. I only push the used regs,
> > and if that is not enough, I enter additional stack space. That breaks
> > compatibility with the original exupery, but I wanted to implement it
> > that way. For small methods, that is really better.
> > So, given that, I don't offer anything of this for you. I think you'll
> > understand.
>
> Exupery's prologue and epilogue sequences could be improved. I've been
> thinking about overhauling that area for a few years now. I'd like
> to have variables spill into their actual locations. So if a stack
> variable was stored, it would always be fetched from the context.
> Then spilled registers wouldn't need to be loaded and stored on
> context switches.
>
> One thing that I might do in 0.13 is colour the isolated parts of a
> method separately. That should improve register allocation as the
> interference graph will not be polluted by other isolated sections of
> code. A compiled method is often made up of completely isolated
> sections of code. Colouring the sections separately should also speed
> up register allocation.

Every improvement you make will help me.

Cheers,
Guille
Guillermo Adrián Molina writes:
> > In Exupery the SmallInteger addition sequence is
> >     bitTest arg1
> >     jumpIfSet failureBlock
> >     bitTest arg2
> >     jumpIfSet failureBlock
> >     clearTagBit arg1
> >     add arg1 arg2
> >     jumpOverflow failureBlock
> >
> > The failure case is a full message send.
>
> The problem with the above code is that you have 3 branches.
> That is why I need jump tables, there are cases where cmov really
> doesn't help

There are only 3 branches and I'm hoping that they will never be
taken, so they should be easy to predict. That said, the branches do
use branch predictor resources, which could cause other branches not
to be predicted as well.

> Before I started using exupery, I called special methods in C that
> implemented faster code. Every special method (and primitive) returned 1
> in case of an error, and if success, returned the result object.
> One of these special methods was +. This is part of the code:
>
> if(areIntegers(rcvr,arg)) {
>     int result;
>     asm( "movl $1,%%edx\n\t"
>          "movl %[rcvr],%[result]\n\t"
>          "addl %[arg],%[result]\n\t"
>          "cmovol %%edx,%[result]"
>          : [result] "=r" (result)
>          : [rcvr] "r" (rcvr), [arg] "r" (arg)
>          : "edx" );
>     return result;
> }
>
> with this code, I've got up to 10% faster code in + intensive tests.

Do you have conditionals inside areIntegers, and another to check whether
the result is 1, indicating an error?

> > There are code fragments where cmov would be helpful. Converting
> > to a boolean comes to mind. The part of "a > b" where you're loading
> > either true or false into the result register.
>
> Yes, I implemented that with exupery (code for less "<"):
>
> self addExpression: (MedMov
>     from: (self literal: false)
>     to: answer ).
> trueReg := machine createTemporaryRegister.
> self addExpression: (MedMov
>     from: (self literal: true)
>     to: trueReg ).
> self addExpression: (MedComparision
>     operator: #cmp
>     arg1: arg1
>     arg2: arg2).
> self addExpression: (MedCMov
>     type: #cmovl
>     from: trueReg
>     to: answer).
>
> This gave me an impressive improvement (up to 40-50%), when I implemented
> all the SmallInteger comparisons in this way. Because, as you know, we
> don't need to detag before compare.

Exupery removes many of the boolean conversion sequences.

    "a < b ifTrue: [x]"

first gets translated into:

    (booleanToControlFlow (controlFlowToBoolean (a < b)))

Then Exupery removes the booleanToControlFlow controlFlowToBoolean
sequence. The booleanToControlFlow sequence is moved to the failure
case where either a or b is not a SmallInteger.

So I'm not sure if speeding up the general case will help Exupery, as
I'm not sure how often it's called.

Bryce
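The point that tagged SmallIntegers can be compared without detagging, and that the boolean can be selected without a jump, can be sketched in C. This is a minimal illustration under assumptions: the tagging scheme (value * 2 + 1, low bit set, as in Squeak), the oop type, tag_int, and the TRUE_OOP / FALSE_OOP stand-ins are all hypothetical and are not taken from either implementation in the thread. Whether the compiler really emits cmov for the select depends on the compiler and its flags.

```c
#include <stdint.h>

typedef intptr_t oop;  /* hypothetical object pointer / tagged word */

/* Assumed Squeak-style tagging: the integer shifted left one bit,
 * with the low bit set. Only meant for small example values. */
static inline oop tag_int(intptr_t value)
{
    return value * 2 + 1;
}

/* Stand-ins for the true and false objects (hypothetical). */
static char true_object, false_object;
#define TRUE_OOP  ((oop)(intptr_t)&true_object)
#define FALSE_OOP ((oop)(intptr_t)&false_object)

/* "a < b" for two words already known to be tagged SmallIntegers.
 * Tagging is order preserving (x < y implies 2x+1 < 2y+1), so the
 * tagged words are compared directly. The select below is the kind
 * of code a compiler may turn into cmp + cmovl instead of a jump. */
oop less_than(oop a, oop b)
{
    oop answer = FALSE_OOP;
    oop if_true = TRUE_OOP;
    if (a < b)
        answer = if_true;
    return answer;
}
```

The structure mirrors the MedMov / MedComparision / MedCMov sequence quoted above: load false, load true into a temporary, compare, then conditionally move.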
> Guillermo Adrián Molina writes:
> > > In Exupery the SmallInteger addition sequence is
> > >     bitTest arg1
> > >     jumpIfSet failureBlock
> > >     bitTest arg2
> > >     jumpIfSet failureBlock
> > >     clearTagBit arg1
> > >     add arg1 arg2
> > >     jumpOverflow failureBlock
> > >
> > > The failure case is a full message send.
> >
> > The problem with the above code is that you have 3 branches.
> > That is why I need jump tables, there are cases where cmov really
> > doesn't help
>
> There is only 3 branches and I'm hoping that they will never be
> taken so they should be easy to predict. That said the branches do
> use branch predictor resources which could cause other branches not
> to be predicted as well.

Yes, I agree. I am really not an expert in these matters, but I think it
is not so uncommon to send #+ to objects other than SmallIntegers; in
that case, maybe one of the first 2 branches would be mispredicted. Maybe
you could test that both of them are SmallIntegers with just one branch
(I am doing that right now). But maybe I will try to do it without
branching at all.

> > Before I started using exupery, I called special methods in C that
> > implemented faster code. Every special method (and primitive) returned
> > 1 in case of an error, and if success, returned the result object.
> > One of these special methods was +. This is part of the code:
> >
> > if(areIntegers(rcvr,arg)) {
> >     int result;
> >     asm( "movl $1,%%edx\n\t"
> >          "movl %[rcvr],%[result]\n\t"
> >          "addl %[arg],%[result]\n\t"
> >          "cmovol %%edx,%[result]"
> >          : [result] "=r" (result)
> >          : [rcvr] "r" (rcvr), [arg] "r" (arg)
> >          : "edx" );
> >     return result;
> > }
> >
> > with this code, I've got up to 10% faster code in + intensive tests.
>
> Do you have conditionals inside areIntegers and to check if the result
> is 1 indicating an error?

[...] exupery at compile time) I don't worry about it any more. But
areIntegers() is just an "or" and an "and"; the branch is represented in
the C "if" statement. I wrote the addition that way because I wanted to
test if cmov was really that fast. It was better, but not THAT much
better.

Guille
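The "test both operands with just one branch" idea from the message above can be sketched in C as follows. This is an illustration under assumptions, not code from either project: it assumes Squeak-style tagging (low bit set for SmallIntegers), so the combined check is an "and" of both words; with a zero tag it would be an "or" instead, which is closer to the areIntegers() described above. The names oop, both_small_ints and small_int_add are hypothetical. __builtin_add_overflow is a GCC/Clang builtin, used here in place of the inline asm overflow test.

```c
#include <stdbool.h>
#include <stdint.h>

typedef intptr_t oop;  /* hypothetical tagged word */

/* One test for both operands: the low bit survives the "and" only if
 * it is set in both words (assumed tag: low bit 1 = SmallInteger). */
static inline bool both_small_ints(oop a, oop b)
{
    return (a & b & 1) != 0;
}

/* Tagged addition. Returns false when the fast path does not apply
 * (an operand is not a SmallInteger, or the sum overflows), in which
 * case the caller would fall back to a full message send.
 * Note: on overflow *result holds the wrapped value and must be ignored. */
bool small_int_add(oop rcvr, oop arg, oop *result)
{
    if (!both_small_ints(rcvr, arg))
        return false;
    /* Clearing one tag bit first leaves exactly one tag on the sum:
     * (2a + 1 - 1) + (2b + 1) == 2(a + b) + 1. Overflow of the machine
     * word then coincides with overflow of the SmallInteger range. */
    return !__builtin_add_overflow(rcvr - 1, arg, result);
}
```

This still has two conditionals (the tag check and the overflow check), but the two separate tag tests of the original sequence are folded into one.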