testing with new 0.1

Chris Muller
Hi Bryce, thanks for the new release.  I have been playing around with it and have some questions.  

First, I was surprised to see the suggestion to compile Array>>at: and Array>>at:put:.  These methods are already inlined, right?  They inherit the implementations in Object, both of which are already primitives.  Primitives are already "compiled", right?  So how does Exupery speed them up?

Overall, does it ever make sense to compile a method with just a single send in it?  Does Exupery dive into the called method and compile that too?  If not, I don't see how it could help unless some inlining were done.

This also led to my other question.  These Swiki instructions tell Exupery to compile an inherited method, not an actual method that exists on Array.  Since we pass a selector and a class rather than the actual CompiledMethod object (e.g., Exupery compile: Array>>#at:), how does Exupery interpret them with respect to inheritance?

Forging onward, I decided to piggyback on your easy approach provided by ExuperyBenchmarks to experiment so I can easily see results of compiling some of my own methods.

I have these two methods which set or get an unsigned integer in a ByteArray.  Because the results I get are worse after compiling, I'm posting their implementations right in the email so you can get an idea.

ByteArray>>maUint: bits at: anInteger
    | answer bytes |
    bits == 64 ifTrue: [ ^ self maUnsigned64At: anInteger + 1 ].
    bits == 56 ifTrue: [ ^ self maUnsigned56At: anInteger + 1 ].
    bits == 48 ifTrue: [ ^ self maUnsigned48At: anInteger + 1 ].
    bits == 40 ifTrue: [ ^ self maUnsigned40At: anInteger + 1 ].
    bits == 32
        ifTrue:
            [ ^self
                unsignedLongAt: anInteger + 1
                bigEndian: false ].
    bits == 16
        ifTrue:
            [ ^self
                unsignedShortAt: anInteger + 1
                bigEndian: false ].
    bits == 8
        ifTrue:
            [ ^self byteAt: anInteger + 1 ].
    bytes _ bits // 8.
    answer _ LargePositiveInteger new: bytes.
    1 to: bytes do:
        [ :digitPosition |
        answer
            digitAt: digitPosition
            put: (self at: digitPosition + anInteger) ].
    ^answer normalize

and

ByteArray>>#maUint: bits at: position put: anInteger
    position + 1
        to: position + (bits // 8)
        do:
            [ :pos |
            self
                at: pos
                put: (anInteger digitAt: pos-position) ].
    ^anInteger

Now, when I just compiled these two methods alone, I get:

    maUintAtPutBenchmark 5625 compiled 7912 ratio: 0.711
    maUintAtBenchmark 2610 compiled 3474 ratio: 0.751
    Cumulative Time 27.124 compiled 30.748 ratio 0.882

So, this tells me that compiling *can* actually make things worse.  Is there any way for Exupery to detect and prevent this, or is it strictly the user's/developer's responsibility to profile everything for comparison?

Next I tried adding compilation of some of the lower-level methods called by these methods.  I first just added

    Exupery compileMethod: #at: class: ByteArray

Normally I wouldn't do this but since the Swiki suggested compiling Array>>#at: I thought it worth a try.  But it caused the stack-trace at the end of this e-mail.  I tried just a few more experiments such as SmallInteger>>#digitAt: and Integer>>#bitShift: but generally couldn't get results above 1.0.

BTW, in a completely separate experiment, one method I tried to compile failed with "Unknow bytecode" (it was bytecode 136).

Ok, so obviously I'm a novice at this!  This project is exciting, and at least I know Exupery is doing something (compared to my last attempt, where I didn't even know to compile methods), but I hope that I can figure out how to actually make it go *faster*.  :)

Can you help me with any advice / guidelines for *what* to compile?

Thanks,
  Chris

15 November 2006 10:53:18 pm

VM: Win32 - a SmalltalkImage
Image: Squeak3.8 [latest update: #6665]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir C:\Development\Chris\Development\Squeak
Trusted Dir C:\Development\Chris\Development\Squeak\Chris
Untrusted Dir C:\My Squeak\Chris

IntermediateSimplifier>>primitiveAt:
    Receiver: an IntermediateSimplifier
    Arguments and temporary variables:
        aMedPrimitive:     #(#primitive 60 'block6' #(#mem #(#add #(#mem #activeContext) #(...etc...
        methodBytecodes:     nil
        index:     nil
        resultAddress:     nil
        receiver:     nil
    Receiver's instance variables:
        source:     a MedMethod
        result:     a MedMethod
        emitter:     an IntermediateEmitter
        currentBlock:     (block1
 (primitiveReturn #(#primitive 60 'block6' #(#mem #(#add ...etc...
        stack:     an OrderedCollection()
        simplifier:     an IntermediateSimplifier
        stacksForBlocks:     a Dictionary()

IntermediateSimplifier>>visitPrimitive:
    Receiver: an IntermediateSimplifier
    Arguments and temporary variables:
        aMedPrimitive:     #(#primitive 60 'block6' #(#mem #(#add #(#mem #activeContext) #(...etc...
    Receiver's instance variables:
        source:     a MedMethod
        result:     a MedMethod
        emitter:     an IntermediateEmitter
        currentBlock:     (block1
 (primitiveReturn #(#primitive 60 'block6' #(#mem #(#add ...etc...
        stack:     an OrderedCollection()
        simplifier:     an IntermediateSimplifier
        stacksForBlocks:     a Dictionary()

MedPrimitive>>visitWith:
    Receiver: #(#primitive 60 'block6' #(#mem #(#add #(#mem #activeContext) #(#add #(#sal #(#mem #(#add ...etc...
    Arguments and temporary variables:
        aTreeOptimiser:     an IntermediateSimplifier
    Receiver's instance variables:
        in:     nil
        out:     nil
        primitiveName:     primitive 60
        arguments:     #(#(#mem #(#add #(#mem #activeContext) #(#add #(#sal #(#mem #(#add #...etc...
        failureBlock:     (block6
 (createContext)
 (deconvertBoolean falseObj #(#send #(#m...etc...
        receiver:     ByteArray

IntermediateSimplifier>>visitPrimitiveReturn:
    Receiver: an IntermediateSimplifier
    Arguments and temporary variables:
        aMedPrimitiveReturn:     #(#primitiveReturn 2 #(#primitive 60 'block6' #(#mem #(#ad...etc...
    Receiver's instance variables:
        source:     a MedMethod
        result:     a MedMethod
        emitter:     an IntermediateEmitter
        currentBlock:     (block1
 (primitiveReturn #(#primitive 60 'block6' #(#mem #(#add ...etc...
        stack:     an OrderedCollection()
        simplifier:     an IntermediateSimplifier
        stacksForBlocks:     a Dictionary()


--- The full stack ---
IntermediateSimplifier>>primitiveAt:
IntermediateSimplifier>>visitPrimitive:
MedPrimitive>>visitWith:
IntermediateSimplifier>>visitPrimitiveReturn:
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
MedPrimitiveReturn>>visitWith:
[] in IntermediateSimplifier(IntermediateCopier)>>simplifyBlock: {[:each | emitter   addExpression: (each visitWith: self)]}
OrderedCollection>>do:
MedBlock>>instructionsDo:
IntermediateSimplifier(IntermediateCopier)>>simplifyBlock:
IntermediateSimplifier>>simplifyBlock:
[] in IntermediateSimplifier>>visitMethod: {[:each |  self prepareBlockFrom: each.  self simplifyBlock: each]}
OrderedCollection>>do:
IntermediateSimplifier>>visitMethod:
MedMethod>>visitWith:
IntermediateSimplifier(IntermediateCopier)>>run
Exupery>>convertIntermediate
Exupery>>run
Exupery class>>compileMethod:inlining:receiver:
Exupery class>>compileMethod:into:forClass:inlining:
Exupery class>>compileMethod:class:inlining:
Exupery class>>compileMethod:class:
ExuperyBenchmarks>>compilemaUintAt
ExuperyBenchmarks>>runBenchmark:compilingWith:
[] in ExuperyBenchmarks>>run {[:each | self   runBenchmark: (each at: 1)   compilingWith: (each at: 2)]}
Array(SequenceableCollection)>>do:
ExuperyBenchmarks>>run
UndefinedObject>>DoIt
Compiler>>evaluate:in:to:notifying:ifFail:logged:
[] in TextMorphEditor(ParagraphEditor)>>evaluateSelection {[rcvr class evaluatorClass new   evaluate: self selectionAsStream   in: ctxt...]}
BlockContext>>on:do:
TextMorphEditor(ParagraphEditor)>>evaluateSelection
TextMorphEditor(ParagraphEditor)>>doIt
[] in TextMorphEditor(ParagraphEditor)>>doIt: {[self doIt]}
TextMorphEditor(Controller)>>terminateAndInitializeAround:
TextMorphEditor(ParagraphEditor)>>doIt:
TextMorphEditor(ParagraphEditor)>>dispatchOnCharacter:with:
TextMorphEditor>>dispatchOnCharacter:with:
TextMorphEditor(ParagraphEditor)>>readKeyboard
...etc...



_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
Bryce Kampjes

Hi Chris,

First, there are a few bytecodes that don't compile. The major
language feature missing now is cascades. Bytecode 136 is duplicate
top of stack which is used for cascades. There's also only a handful
of primitives implemented. If you're doing something that the
interpreter optimises more than Exupery does now then compiling will
slow down execution. That said, 70% of the time is spent inside
interpret(), the big interpreter function produced by inlining the
interpreter's main loop. For now I'm targeting that 70%.

The easiest way to try to optimise something is to use the following
sequence:

    ExuperyProfiler optimise: [your code].

#optimise: runs the code in the block and profiles it. Based on that
profile it will try to compile methods that will benefit.

   your code.

Execute your code again to populate the polymorphic inline
caches. Exupery uses them to dynamically inline primitives.

   Exupery dynamicallyInline.

#dynamicallyInline runs over all the natively compiled methods in
the system, then dynamically inlines any primitives.
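
Put together, the sequence above might look like the following workspace
sketch. The buffer, the loop count and the timing code are illustrative
assumptions; only #optimise: and #dynamicallyInline come from the steps
above, and the workload reuses the maUint:at:put: method from the
original post.

```smalltalk
"Workspace sketch; buffer size and loop count are arbitrary assumptions."
| buffer workload before after |
buffer := ByteArray new: 8.
workload := [100000 timesRepeat: [buffer maUint: 32 at: 0 put: 16r12345678]].

before := Time millisecondsToRun: workload.   "interpreted baseline"
ExuperyProfiler optimise: workload.           "profile, then compile what should benefit"
workload value.                               "run again to populate the inline caches"
Exupery dynamicallyInline.                    "inline primitives into compiled methods"
after := Time millisecondsToRun: workload.

Transcript show: 'before: ', before printString, ' ms, after: ', after printString, ' ms'; cr.
```

Timing the same block before and after makes a slowdown visible
immediately, which matters because compilation is not guaranteed to help.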


Exupery's send optimisations only provide a speed improvement if both
sides are compiled. Performance seems identical to the interpreter
when calling interpreted code.

The interpreter's main loop includes implementations of a handful of
primitives including #at: and #at:put: that have their own bytecodes.
Exupery optimises these by using dynamic primitive inlining; however,
that requires a second compile or explicit inlining instructions. Also
I haven't yet re-implemented all of the primitives that the interpreter
optimises. SmallInteger operations are automatically inlined.

Exupery also needs to compile a method once for each receiver. I do
this so that I can specialise the method for its receiver. At the
moment only #at: and #at:put: are specialised. The advantage is that
the code executed is customised to the receiver's shape. I may allow
some methods to be shared between multiple receivers in the future but
for now compiling everything the same way is simpler.
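
As a rough illustration of per-receiver compilation, the Swiki suggestion
mentioned in the original post could be written with the
compileMethod:class: form used earlier in the thread. This is a sketch
only: the class argument names the receiver class to specialise for, and
how Exupery locates an inherited method is not spelled out here.

```smalltalk
"Sketch: compile #at: and #at:put: specialised for Array receivers."
"Each further receiver class would need its own compile."
Exupery compileMethod: #at: class: Array.
Exupery compileMethod: #at:put: class: Array.
```

The same call with ByteArray as the class is what produced the stack
trace in the original post, so not every receiver is supported yet.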


In your example:

    ByteArray>>#maUint: bits at: position put: anInteger
        position + 1
            to: position + (bits // 8)
            do:
                [ :pos |
                self
                    at: pos
                    put: (anInteger digitAt: pos-position) ].
        ^anInteger

First I'd try compiling it using the profiler as above. If I was
manually trying to compile it, I would also compile
SmallInteger>>digitAt:. ByteArray>>#at:put: cannot be compiled yet
but the interpreter optimises it into the #at:put: bytecode. When
optimising a method, try to compile all the methods it will call while
measuring any benefits.
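
A minimal sketch of the manual route for this example, assuming the
compileMethod:class: form from the original post is the right entry
point:

```smalltalk
"Sketch: compile the method and the callee it sends to."
Exupery compileMethod: #digitAt: class: SmallInteger.
Exupery compileMethod: #maUint:at:put: class: ByteArray.
"ByteArray>>#at:put: is deliberately left out: it cannot be compiled yet."
```

Re-run the benchmark after each added method so that any benefit, or
slowdown, can be attributed to that method.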

I haven't yet tried to optimise code that uses LargeIntegers heavily.
I don't know how such code will perform. There are several options
available to optimise them including compiling calls to primitives
into compiled code. Compiling a call to a primitive would let it
benefit from Exupery's faster sends between compiled code.

How heavily are LargeIntegers used in Magma?

Bryce