Interpreter>>isContextHeader: optimization


Interpreter>>isContextHeader: optimization

Igor Stasenko
 
Here is the method:

isContextHeader: aHeader
        self inline: true.
        ^ ((aHeader >> 12) bitAnd: 16r1F) = 13 "MethodContext"
                or: [((aHeader >> 12) bitAnd: 16r1F) = 14 "BlockContext"
                or: [((aHeader >> 12) bitAnd: 16r1F) = 4]] "PseudoContext"

I think it wouldn't hurt to rewrite it as:

isContextHeader: aHeader
        self inline: true.
 | hdr |
  hdr := aHeader bitAnd: (16r1F << 12).
        ^ hdr = (13 << 12) "MethodContext"
                or: [ hdr = (14 << 12) "BlockContext"
                or: [ hdr = (4 << 12)]] "PseudoContext"

which should allow GCC to optimize it more easily; I'm not sure it can
optimize it in its current form. This could yield a small speedup in copy
operations and in any other operations that need to determine the number
of pointer fields in an object (users of #lastPointerOf:).

--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Eliot Miranda-2
 
Hi Igor,

On Fri, Feb 20, 2009 at 11:37 PM, Igor Stasenko <[hidden email]> wrote:

First you should look at the assembly that gcc generates to be sure anything is needed.  e.g.
cat >t.c <<END
long isContext(long aHeader) {
    return ((aHeader >> 12) & 0x1F) == 13
        || ((aHeader >> 12) & 0x1F) == 14
        || ((aHeader >> 12) & 0x1F) == 4;
}
END
gcc -O3 -S -fomit-frame-pointer t.c; cat t.s
    .text
.globl _isContext
_isContext:
    movl    4(%esp), %edx
    sarl    $12, %edx
    andl    $31, %edx
    leal    -13(%edx), %eax
    cmpl    $1, %eax
    jbe L2
    cmpl    $4, %edx
    je  L2
    xorl    %eax, %eax
    ret
L2:
    movl    $1, %eax
    ret
    .subsections_via_symbols


So you don't need to do anything; it has done everything for you.

However, one point is important.  Using 16r1F << 12 et al as your masks and constants to compare against is much worse on many systems, most importantly x86, than shifting down by 12 and comparing against small constants, because the instruction set can encode small constants far more compactly, and that means better code density in the icache which is significant for performance.  e.g. on x86 a constant in the range -128 to 127 typically takes a byte whereas anything else will take 4.

But what I really think is that this is too low a level to worry about.  Much more important to focus on
- context to stack mapping
- in-line caching via a JIT
- exploiting multicore via Hydra
and beyond (e.g. speculative inlining)
than worrying about tiny micro-optimizations like this :)

Best
Eliot





Re: Interpreter>>isContextHeader: optimization

Igor Stasenko

2009/2/21 Eliot Miranda <[hidden email]>:

> But what I really think is that this is too low a level to worry about.  Much more important to focus on
> - context to stack mapping
> - in-line cacheing via a JIT
> - exploiting multicore via Hydra
> and beyond (e.g. speculative inlining)
> than worrying about tiny micro-optimizations like this :)

Thanks Eliot.
In fact, this method drew my attention because of its number of checks.
Typically, all code dealing with object formats contains many branches,
and the places where this method is called surround it with additional
checks. So the real problem is the overwhelming number of checks needed
to do anything, and I think this has its own impact on performance.
I hope that the new object format with a 64-bit header, which you plan
to use, will let us avoid so many branches in frequently executed code.



--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Eliot Miranda-2
 


On Sat, Feb 21, 2009 at 3:36 PM, Igor Stasenko <[hidden email]> wrote:


In fact the StackVM makes a big improvement to this very method because in the StackVM there are only MethodContexts and so the method reads

isContextHeader: aHeader
<inline: true>
"c.f. {BlockContext. MethodContext. PseudoContext} collect: [:class| class -> class indexIfCompact]"
^(self compactClassIndexOfHeader: aHeader) == ClassMethodContextCompactIndex

which is of course equivalent to

isContextHeader: aHeader
^((aHeader >> 12) bitAnd: 16r1F) = 13

:)
 


Re: Interpreter>>isContextHeader: optimization

Igor Stasenko

2009/2/22 Eliot Miranda <[hidden email]>:

> In fact the StackVM makes a big improvement to this very method because in the StackVM there are only MethodContexts and so the method reads
> isContextHeader: aHeader
> <inline: true>
> "c.f. {BlockContext. MethodContext. PseudoContext} collect: [:class| class -> class indexIfCompact]"
> ^(self compactClassIndexOfHeader: aHeader) == ClassMethodContextCompactIndex
> which is f course equivalent to
> isContextHeader: aHeader
> ^((aHeader >> 12) bitAnd: 16r1F) = 13
> :)
>

Yeah, much more concise and understandable.

I'm currently thinking about whether there are simple ways to decompose
the huge ObjectMemory/Interpreter into multiple smaller classes.
To illustrate, applied to #isContextHeader: we could write it as follows:

isContextHeader: aHeader
<inline: true>
<var: #aHeader class: #OopBaseHeader>
^ aHeader compactClassIndex == ClassMethodContextCompactIndex

and, of course, then we don't really need #isContextHeader: at all,
because we can simply write a direct message send in the methods where
we need such checks:

<var: #header class: #OopBaseHeader>
<var: #oop class: #Oop>
header := oop basicHeader.
header isContextHeader ifTrue: [ ... ]

The idea is to use type information in the code generator to determine
where it should look for code when translating message sends to C.
With a little more heuristics, we wouldn't even need to declare types so often:
Oop>>basicHeader
 <returnType: #OopBaseHeader>
 ^ self longAt: 0
or even:

Oop>>basicHeader
 ^ (self longAt: 0) as: OopBaseHeader


So then you could simply write:
oop basicHeader isContextHeader ifTrue: [... ]

Looks like plain Smalltalk code, doesn't it? :)

I'm using a similar technique for static inlining in Moebius/CorruptVM.




--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Eliot Miranda-2
 
Hi Igor,

On Sat, Feb 21, 2009 at 7:10 PM, Igor Stasenko <[hidden email]> wrote:


I'm also using something like this in Cog, but only for simple struct types, a machine code method CogMethod, a stack page, various structs in the compiler such as an instruction, a block start, etc.  e.g.
generateInstructionsAt: eventualAbsoluteAddress
	"Size pc-dependent instructions and assign eventual addresses to all instructions.
	 Answer the size of the code.
	 Compute forward branches based on virtual address (abstract code starts at 0),
	 assuming that any branches branched over are long.
	 Compute backward branches based on actual address.
	 Reuse the fixups array to record the pc-dependent instructions that need to have
	 their code generation postponed until after the others."
	| absoluteAddress pcDependentIndex abstractInstruction fixup |
	<var: #abstractInstruction type: #'AbstractInstruction *'>
	<var: #fixup type: #'BytecodeFixup *'>
	absoluteAddress := eventualAbsoluteAddress.
	pcDependentIndex := 0.
	0 to: opcodeIndex - 1 do:
		[:i|
		breakPC = absoluteAddress ifTrue:
			[self halt: 'breakPC reached in generateInstructionsAt:'].
		abstractInstruction := self abstractInstructionAt: i.
		abstractInstruction isPCDependent
			ifTrue:
				[abstractInstruction sizePCDependentInstructionAt: absoluteAddress.
				 fixup := self fixupAt: pcDependentIndex.
				 pcDependentIndex := pcDependentIndex + 1.
				 fixup instructionIndex: i.
				 absoluteAddress := absoluteAddress + abstractInstruction machineCodeSize]
			ifFalse:
				[abstractInstruction address: absoluteAddress.
				 absoluteAddress := abstractInstruction concretizeAt: absoluteAddress]].
	0 to: pcDependentIndex - 1 do:
		[:i|
		fixup := self fixupAt: i.
		abstractInstruction := self abstractInstructionAt: fixup instructionIndex.
		breakPC = absoluteAddress ifTrue:
			[self halt: 'breakPC reached in generateInstructionsAt:'].
		abstractInstruction concretizeAt: abstractInstruction address].
	^absoluteAddress - eventualAbsoluteAddress


You'll notice the lack of type inference means I have to assign typed results to typed local variables for the code generator to be able to find the right code.

My problem with doing it for oops has been not wanting to add methods to Integer.  Do you use a special type for oop or do you add methods to Integer?








Re: Interpreter>>isContextHeader: optimization

Bryce Kampjes
In reply to this post by Eliot Miranda-2
 
Eliot Miranda writes:
 >
 > But what I really think is that this is too low a level to worry about.
 >  Much more important to focus on
 > - context to stack mapping
 > - in-line cacheing via a JIT
 > - exploiting multicore via Hydra
 > and beyond (e.g. speculative inlining)
 > than worrying about tiny micro-optimizations like this :)

If you're planning on adding speculative (I assume Self-style dynamic)
inlining, won't that reduce the value of context-to-stack mapping?

My view with Exupery is context caches should be left until after
dynamic inlining as their value will depend on how well dynamic
inlining reduces the number of sends.

Bryce

Re: Interpreter>>isContextHeader: optimization

Eliot Miranda-2
 


On Sun, Feb 22, 2009 at 10:37 AM, <[hidden email]> wrote:

Eliot Miranda writes:
 >
 > But what I really think is that this is too low a level to worry about.
 >  Much more important to focus on
 > - context to stack mapping
 > - in-line cacheing via a JIT
 > - exploiting multicore via Hydra
 > and beyond (e.g. speculative inlining)
 > than worrying about tiny micro-optimizations like this :)

If you're planning on adding speculative, I assume Self style dynamic,
inlining won't that reduce the value of context to stack mapping?

Not at all; in fact quite the reverse.  Context to stack mapping allows one to retain contexts while having the VM execute efficient, stack-based code (i.e. using hardware call instructions).  This in turn enables the entire adaptive optimizer, including the stack analyser and the bytecode-to-bytecode compiler/method inliner to be written in Smalltalk.  The image level code can examine the run-time stack using contexts as their interface without having to understand native stack formats or different ISAs.  The optimizer is therefore completely portable with all machine specificities confined to the underlying VM which is much simpler by virtue of not containing a sophisticated optimizer (which one would have to squeeze through Slang etc).

So for me, context-to-stack mapping is fundamental to implementing speculative inlining in Smalltalk.


My view with Exupery is context caches should be left until after
dynamic inlining as their value will depend on how well dynamic
inlining reduces the number of sends.

I know and I disagree.  Dynamic inlining depends on collecting good type information, something that inline caches do well.  In-line caches are efficiently implemented with native call instructions, either to method entry-points or PIC jump tables.  Native call instructions mesh well with stacks.  So context-to-stack mapping, for me, is a sensible enabling optimization for speculative inlining because it meshes well with inline caches.

Further, context-to-stack mapping is such a huge win that it'll be of benefit even if the VM is spending 90% of its time in inlined call-less code.  We see a speedup of very nearly 2x (48% sticks in my head) for one non-micro tree walking benchmark from the computer language shootout.  And this is in a very slow VM.  In a faster VM context-to-stack mapping would be even more valuable, because it would save an even greater percentage of overall execution time.

Further still using call & return instructions as conventionally as possible meshes extremely well with current processor implementations which, because of the extensive use thereon of conventional stack-oriented language implementations, have done a great job optimizing call/return.

Further still, the current performance of call/return on contemporary processors, specifically prefetch across call & return (prefetch across return only possible if one sticks to the processor's expected stack organization of return addresses) renders call/return performance the same as jumps.  So the benefits of inlining are no longer in eliminating call/return, but rather in eliminating dispatch, argument copying, etc.  So inlining per se isn't of benefit.  It can actually worsen instruction cache density.  Analysis and elimination of dispatch is.  So again context-to-stack mapping makes sense because it means the speculative inliner/adaptive optimizer doesn't have to focus on creating humongous methods or inlining accessors etc etc, and can focus on higher level optimizations like block removal (lambda lifting?), common subexpression elimination, and so on.

best
Eliot
 


Re: Interpreter>>isContextHeader: optimization

Bryce Kampjes
 
Eliot Miranda writes:
 >  On Sun, Feb 22, 2009 at 10:37 AM, <[hidden email]> wrote:
 > Not at all; in fact quite the reverse.  Context to stack mapping allows one
 > to retain contexts while having the VM execute efficient, stack-based code
 > (i.e. using hardware call instructions).  This in turn enables the entire
 > adaptive optimizer, including the stack analyser and the
 > bytecode-to-bytecode compiler/method inliner to be written in Smalltalk.
 >  The image level code can examine the run-time stack using contexts as their
 > interface without having to understand native stack formats or different
 > ISAs.  The optimizer is therefore completely portable with all machine
 > specificities confined to the underlying VM which is much simpler by virtue
 > of not containing a sophisticated optimizer (which one would have to squeeze
 > through Slang etc).

All you need is the optimiser to run early in compilation for it to be
portable.

And we definitely agree on trying to keep complex logic out of the
VM. Sounds like you're thinking of AoSTa.

 > So for me, context-to-stack mapping is fundamental to implementing
 > speculative inlining in Smalltalk.
 >
 >
 > My view with Exupery is context caches should be left until after
 > > dynamic inlining as their value will depend on how well dynamic
 > > inlining reduces the number of sends.
 > >
 >
 > I know and I disagree.  Dynamic inlining depends on collecting good type
 > information, something that inline caches do well.  In-line caches are
 > efficiently implemented with native call instructions, either to method
 > entry-points or PIC jump tables.  Native call instructions mesh well with
 > stacks.  So context-to-stack mapping, for me, is a sensible enabling
 > optimization for speculative inlining because it meshes well with inline
 > caches.

PICs are a separate issue. Exupery has PICs, and has had them for
years now. PICs are just as easily implemented as jumps.

 > Further, context-to-stack mapping is such a huge win that it'll be of
 > benefit even if the VM is spending 90% of its time in inlined call-less
 > code.  We see a speedup of very nearly 2x (48% sticks in my head) for one
 > non-micro tree walking benchmark from the computer language shootout.  And
 > this is in a very slow VM.  In a faster VM context-to-stack mapping would be
 > even more valuable, because it would save an even greater percentage of
 > overall execution time.

I see only one sixth of the time going into context creation for the
send benchmark which is about as send heavy as you can get. That's
running native code at about twice Squeak's speed. Also there's still
plenty of inefficiency in Exupery's call return sequences.

 > Further still using call & return instructions as conventionally as possible
 > meshes extremely well with current processor implementations which, because
 > of the extensive use thereon of conventional stack-oriented language
 > implementations, have done a great job optimizing call/return.

Unconditional jumps for sends also benefit from hardware
optimisation. Returns turn into indirect jumps which are less
efficient, but getting better with Core 2.

Cheers
Bryce

Re: Interpreter>>isContextHeader: optimization

Eliot Miranda-2
 


On Sun, Feb 22, 2009 at 12:54 PM, <[hidden email]> wrote:

Eliot Miranda writes:
 >  On Sun, Feb 22, 2009 at 10:37 AM, <[hidden email]> wrote:
 >
 > >
 > > Eliot Miranda writes:
 > >  >
 > >  > But what I really think is that this is too low a level to worry about.
 > >  >  Much more important to focus on
 > >  > - context to stack mapping
 > >  > - in-line cacheing via a JIT
 > >  > - exploiting multicore via Hydra
 > >  > and beyond (e.g. speculative inlining)
 > >  > than worrying about tiny micro-optimizations like this :)
 > >
 > > If you're planning on adding speculative, I assume Self style dynamic,
 > > inlining won't that reduce the value of context to stack mapping?
 >
 >
 > Not at all; in fact quite the reverse.  Context to stack mapping allows one
 > to retain contexts while having the VM execute efficient, stack-based code
 > (i.e. using hardware call instructions).  This in turn enables the entire
 > adaptive optimizer, including the stack analyser and the
 > bytecode-to-bytecode compiler/method inliner to be written in Smalltalk.
 >  The image level code can examine the run-time stack using contexts as their
 > interface without having to understand native stack formats or different
 > ISAs.  The optimizer is therefore completely portable with all machine
 > specificities confined to the underlying VM which is much simpler by virtue
 > of not containing a sophisticated optimizer (which one would have to squeeze
 > through Slang etc).

All you need is the optimiser to run early in compilation for it to be
portable.

...and for it to be untimely.  An adaptive optimizer by definition needs to be running intermittently all the time.  It optimizes what is happening now, not what happened at start-up.

And we definitely agree on trying to keep complex logic out of the
VM. Sounds like you're thinking of AoSTa.

yes (AOStA).
 
 > So for me, context-to-stack mapping is fundamental to implementing
 > speculative inlining in Smalltalk.
 >
 >
 > My view with Exupery is context caches should be left until after
 > > dynamic inlining as their value will depend on how well dynamic
 > > inlining reduces the number of sends.
 > >
 >
 > I know and I disagree.  Dynamic inlining depends on collecting good type
 > information, something that inline caches do well.  In-line caches are
 > efficiently implemented with native call instructions, either to method
 > entry-points or PIC jump tables.  Native call instructions mesh well with
 > stacks.  So context-to-stack mapping, for me, is a sensible enabling
 > optimization for speculative inlining because it meshes well with inline
 > caches.

PICs are a separate issue. Exupery has PICs, and has had them for
years now. PICs are just as easily implemented as jumps.

Yes, PICs are jump tables.  But, at least in my implementation and in others I know of, they get called.  They are composed of a jump table that then jumps into methods at a point past any entry-point dynamic-binding/type checking.
 
 > Further, context-to-stack mapping is such a huge win that it'll be of
 > benefit even if the VM is spending 90% of its time in inlined call-less
 > code.  We see a speedup of very nearly 2x (48% sticks in my head) for one
 > non-micro tree walking benchmark from the computer language shootout.  And
 > this is in a very slow VM.  In a faster VM context-to-stack mapping would be
 > even more valuable, because it would save an even greater percentage of
 > overall execution time.

I see only one sixth of the time going into context creation for the
send benchmark which is about as send heavy as you can get. That's
running native code at about twice Squeak's speed. Also there's still
plenty of inefficiency in Exupery's call return sequences.

So you could get a 17% speedup if you could remove the context overhead.  That's quite a tidy gain.  I see a 26% increase in benchFib performance between base Squeak and the StackVM with no native code at all.

What are the inefficiences in Exupery's call return sequences?

 > Further still using call & return instructions as conventionally as possible
 > meshes extremely well with current processor implementations which, because
 > of the extensive use thereon of conventional stack-oriented language
 > implementations, have done a great job optimizing call/return.

Unconditional jumps for sends also benefit from hardware
optimisation. Returns turn into indirect jumps which are less
efficient, but getting better with Core 2.

and Power 


Cheers
Bryce


Re: Interpreter>>isContextHeader: optimization

Igor Stasenko
In reply to this post by Eliot Miranda-2

2009/2/22 Eliot Miranda <[hidden email]>:

>
> Hi Igor,
>
> On Sat, Feb 21, 2009 at 7:10 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/2/22 Eliot Miranda <[hidden email]>:
>> >
>> >
>> >
>> > On Sat, Feb 21, 2009 at 3:36 PM, Igor Stasenko <[hidden email]> wrote:
>> >>
>> >> 2009/2/21 Eliot Miranda <[hidden email]>:
>> >> >
>> >> > Hi Igor,
>> >> >
>> >> > On Fri, Feb 20, 2009 at 11:37 PM, Igor Stasenko <[hidden email]> wrote:
>> >> >>
>> >> >> Here the method:
>> >> >>
>> >> >> isContextHeader: aHeader
>> >> >>        self inline: true.
>> >> >>        ^ ((aHeader >> 12) bitAnd: 16r1F) = 13                  "MethodContext"
>> >> >>                or: [((aHeader >> 12) bitAnd: 16r1F) = 14               "BlockContext"
>> >> >>                or: [((aHeader >> 12) bitAnd: 16r1F) = 4]]      "PseudoContext"
>> >> >>
>> >> >> i think it wouldn't hurt to rewrite it as:
>> >> >>
>> >> >> isContextHeader: aHeader
>> >> >>        self inline: true.
>> >> >>  | hdr |
>> >> >>  hdr := aHeader bitAnd: (16r1F << 12).
>> >> >>        ^ hdr = (13 << 12)                      "MethodContext"
>> >> >>                or: [ hdr = (14 << 12)          "BlockContext"
>> >> >>                or: [ hdr = (4 << 12)]]  "PseudoContext"
>> >> >>
>> >> >> which will allow GCC to optimize it more easily.
>> >> >> I'm not sure if it can optimize it in its current state.
>> >> >> This may impact a small speedup of copy operations and any other
>> >> >> operations which need to determine a number of pointer fields in
>> >> >> object (users of #lastPointerOf:)
>> >> >
>> >> >
>> >> > First you should look at the assembly that gcc generates to be sure anything is needed.  e.g.
>> >> > cat >t.c <<END
>> >> > long isContext(long aHeader) {
>> >> >     return ((aHeader >> 12) & 0x1F) == 13
>> >> >         || ((aHeader >> 12) & 0x1F) == 14
>> >> >         || ((aHeader >> 12) & 0x1F) == 4;
>> >> > }
>> >> > END
>> >> > gcc -O3 -S -fomit-frame-pointer t.c; cat t.s
>> >> >     .text
>> >> > .globl _isContext
>> >> > _isContext:
>> >> >     movl    4(%esp), %edx
>> >> >     sarl    $12, %edx
>> >> >     andl    $31, %edx
>> >> >     leal    -13(%edx), %eax
>> >> >     cmpl    $1, %eax
>> >> >     jbe L2
>> >> >     cmpl    $4, %edx
>> >> >     je  L2
>> >> >     xorl    %eax, %eax
>> >> >     ret
>> >> > L2:
>> >> >     movl    $1, %eax
>> >> >     ret
>> >> >     .subsections_via_symbols
>> >> >
>> >> > So you don't need to do anything; it has done everything for you.
>> >> > However, one point is important.  Using 16r1F << 12 et al as your masks and constants to compare against is much worse on many systems, most importantly x86, than shifting down by 12 and comparing against small constants, because the instruction set can encode small constants far more compactly, and that means better code density in the icache which is significant for performance.  e.g. on x86 a constant in the range -128 to 127 typically takes a byte whereas anything else will take 4.
>> >> > But what I really think is that this is too low a level to worry about.  Much more important to focus on
>> >> > - context to stack mapping
>> >> > - in-line cacheing via a JIT
>> >> > - exploiting multicore via Hydra
>> >> > and beyond (e.g. speculative inlining)
>> >> > than worrying about tiny micro-optimizations like this :)
>> >>
>> >> Thanks Eliot.
>> >> In fact, this method drawn my attention because of its number of
>> >> checks. Typically, all code which dealing with object formats contain
>> >> many branches. And from places where this method is called, there are
>> >> additional checks surrounding it.
>> >> So, the real problem is the overwhelming number of checks to do
>> >> something, and i think this having own impacts on performance.
>> >> I hope that a new object format with 64 bit header, which you plan to
>> >> use, will allow us to avoid so many branches in code which having high
>> >> usage frequency.
>> >
>> > In fact the StackVM makes a big improvement to this very method because in the StackVM there are only MethodContexts and so the method reads
>> > isContextHeader: aHeader
>> > <inline: true>
>> > "c.f. {BlockContext. MethodContext. PseudoContext} collect: [:class| class -> class indexIfCompact]"
>> > ^(self compactClassIndexOfHeader: aHeader) == ClassMethodContextCompactIndex
>> > which is f course equivalent to
>> > isContextHeader: aHeader
>> > ^((aHeader >> 12) bitAnd: 16r1F) = 13
>> > :)
>> >
>>
>> yeah much more concise & understandable.
>>
>> I currently thinking is there are simple ways to decompose huge
>> ObjectMemory/Interpreter on multiple smaller classes.
>> To illustrate it , applied to #isContextHeader: we could write it as following:
>>
>> isContextHeader: aHeader
>> <inline: true>
>> <var: #aHeader class: #OopBaseHeader>
>> ^ aHeader compactClassIndex == ClassMethodContextCompactIndex
>>
>> and, of course, then we really don't need #isContextHeader: at all
>> because we can simply write a direct message in methods where we need
>> such checks:
>>
>> <var: #header class: #OopBaseHeader>
>> <var: #oop class: #Oop>
>> header := oop basicHeader.
>> header isContextHeader ifTrue: [ ... ]
>>
>> The idea is to use type information in code generator to determine
>> where it should look for a code when translating a message sends to C
>> code.
>> With little more heuristics, we don't even need to declare types so often:
>> Oop>>basicHeader
>>  <returnType: #OopBaseHeader>
>>  ^ self longAt: 0
>> or even:
>>
>> Oop>>basicHeader
>>  ^ (self longAt: 0) as:OopBaseHeader
>>
>>
>> so, then you could simply write:
>> oop basicHeader isContextHeader ifTrue: [... ]
>>
>> looks like a plain smalltalk code, isnt? :)
>>
>> i'm using similar technique for static inlining in Moebius/CorruptVM.
>
> I'm also using something like this in Cog, but only for simple struct types, a machine code method CogMethod, a stack page, various structs in the compiler such as an instruction, a block start, etc.  e.g.
> generateInstructionsAt: eventualAbsoluteAddress
>         "Size pc-dependent instructions and assign eventual addresses to all instructions.
>          Answer the size of the code.
>          Compute forward branches based on virtual address (abstract code starts at 0),
>          assuming that any branches branched over are long.
>          Compute backward branches based on actual address.
>          Reuse the fixups array to record the pc-dependent instructions that need to have
>          their code generation postponed until after the others."
>         | absoluteAddress pcDependentIndex abstractInstruction fixup |
>         <var: #abstractInstruction type: #'AbstractInstruction *'>
>         <var: #fixup type: #'BytecodeFixup *'>
>         absoluteAddress := eventualAbsoluteAddress.
>         pcDependentIndex := 0.
>         0 to: opcodeIndex - 1 do:
>                  [:i|
>                  breakPC = absoluteAddress ifTrue:
>                           [self halt: 'breakPC reached in generateInstructionsAt:'].
>                  abstractInstruction := self abstractInstructionAt: i.
>                  abstractInstruction isPCDependent
>                           ifTrue:
>                                    [abstractInstruction sizePCDependentInstructionAt: absoluteAddress.
>                                     fixup := self fixupAt: pcDependentIndex.
>                                     pcDependentIndex := pcDependentIndex + 1.
>                                     fixup instructionIndex: i.
>                                     absoluteAddress := absoluteAddress + abstractInstruction machineCodeSize]
>                           ifFalse:
>                                    [abstractInstruction address: absoluteAddress.
>                                     absoluteAddress := abstractInstruction concretizeAt: absoluteAddress]].
>         0 to: pcDependentIndex - 1 do:
>                  [:i|
>                  fixup := self fixupAt: i.
>                  abstractInstruction := self abstractInstructionAt: fixup instructionIndex.
>                  breakPC = absoluteAddress ifTrue:
>                           [self halt: 'breakPC reached in generateInstructionsAt:'].
>                  abstractInstruction concretizeAt: abstractInstruction address].
>         ^absoluteAddress - eventualAbsoluteAddress
>
> You'll notice the lack of type inferrence means I have to assign typed results to typed local variables to have the code generator be able to find the right code.

Yeah, the simplest possible solution :)

> My problem with doing it for oops has been not wanting to add methods to Integer.  Do you use a special type for oop or do you add methods to Integer?

Well, in Moebius, the simulation does not run these methods as
Smalltalk methods. It simulates the low-level instructions which the
compiler generates for the method, so I don't have a problem with
types.

And in the case of VMMaker, I thought about it just yesterday.
It is not a problem for the code translator to pick the right
method/class for a message, knowing its receiver type. But
for simulation it's a problem.
I thought it could be solved by adding a subclass of Integer, say

Integer subclass: #Oop
instanceVariables: 'value'

and implement all methods in it, like

+ object
 ^ Oop value: (value + object)

as well as coercion routines for it, so you can mix smi + oop or
oop + smi in arithmetic.
I hope it will not require monkey-patching the Integer class.

>>
>>
>> >>
>> >> --
>> >> Best regards,
>> >> Igor Stasenko AKA sig.
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Igor Stasenko
 
2009/2/23 Igor Stasenko <[hidden email]>:
> 2009/2/22 Eliot Miranda <[hidden email]>:
>>

[snip]

Another idea for making Slang code cleaner was to introduce a
special 'C' shared variable.
So, instead of writing something like:

self ioRelinquishProcessorForMicroseconds: xxx.

or even worse:

self cCode:' ((sqInt (*) (sqInt, sqInt*, sqInt*, sqInt*, sqInt*))querySurfaceFn)
                (handle, &sourceWidth, &sourceHeight, &sourceDepth, &sourceMSB)'
                        inSmalltalk:[false]

you merely write:

C ioRelinquishProcessorForMicroseconds: xxx.
C querySurfaceFn: handle with: sourceWidth cReference with:
sourceHeight cReference ....

First, it lets the code generator know that the given message send is
a raw C function call (by taking the first keyword as the function name).
Second, it can be simulated appropriately by a simulator, since you
can assign to the 'C' pool var an object which has best-match
implementations for all such calls. And of course it helps greatly in
finding errors and typos!

Then patterns like 'self foo' would be treated by the code generator
exclusively as methods belonging to the class of the method containing
the code, without any exceptions.

So, if you write
'self signalSemaphore: xx' in an Interpreter method,
the code generator should look up #signalSemaphore: in the Interpreter class.
And if you write 'self header' in the Oop class, it will look up the
#header method in the Oop class, but nowhere else!

--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Eliot Miranda-2
 


On Sun, Feb 22, 2009 at 8:08 PM, Igor Stasenko <[hidden email]> wrote:

2009/2/23 Igor Stasenko <[hidden email]>:
> 2009/2/22 Eliot Miranda <[hidden email]>:
>>

[snip]

another idea how to make a cleaner slang code , was to introduce a
special 'C' shared variable.
So, then instead of writing something like:

self ioRelinquishProcessorForMicroseconds: xxx.

or even worse:

self cCode:' ((sqInt (*) (sqInt, sqInt*, sqInt*, sqInt*, sqInt*))querySurfaceFn)
               (handle, &sourceWidth, &sourceHeight, &sourceDepth, &sourceMSB)'
                       inSmalltalk:[false]

you merely write:

C ioRelinquishProcessorForMicroseconds: xxx.
C querySurfaceFn: handle with: sourceWidth cReference with:
sourceHeight cReference ....

this is scarily similar to Alien style FFI's, e.g. Vassili's Newspeak Windows GUI interface :)
 
First , it lets a code generator know, that given message send is raw
C function call (by taking a first keyword as a function name).
Second, it can be simulated appropriately by a simulator, since you
can assign an object to 'C' pool var which having best-match
implementations for all such calls. And of course it helps greatly in
finding errors or mistypes!

Then patterns like 'self foo' be exclusively treated by code generator
as method which belongs to an instance of class where  method which
containing such code residing, without any exceptions.

So, if you write
'self signalSemaphore: xx' in Interpreter's method
a code generator should lookup for #signalSemaphore: in Interpreter class.
And if you write 'self header'   in Oop class -- it will lookup
#header method in Oop class, but nowhere else!

I also think we should do the following:

a) mangle names of selectors so each is prefixed by e.g. the capitals in the class name, so that e.g. StackInterpreter>>popStack: gets mangled to SI_popStack, and Cogit>>cog:selector: gets mangled to C_cogselector etc so one can use super in Slang.  

b) handle variables thusly:
Slang should provide unique names for all local variables as it creates TMethods.  These variable names can simply be integers (actually their key is an integer and their value is their original name).  Since they are all unique there can be no clashes.  Variables can safely be substituted by other variables since when one replaces one variable with another it cannot possibly accidentally clash with some other variable.


Later, when a TMethod is output, variables are output not as their integer key but as their value (their original name) provided it doesn't clash.  The same variable renumbering scheme can be used to resolve clashes.  i.e. the renaming is deferred until a TMethod is output, and done once, not every time one tries to inline a method.  Renaming clashes is simple.  A dictionary maps original names to sequences of integer variable keys.  The renamed variable is the original name concatenated with the index of its integer key in the sequence of keys for the original name.


When inlining a method into another one unifies the formals and actuals assigning new variable numbers for all new variable bindings.  That should simplify the inline considerably because the horrible variable renaming code will reduce to mapping old variable keys to new variable keys.




--
Best regards,
Igor Stasenko AKA sig.


Re: Interpreter>>isContextHeader: optimization

Igor Stasenko

2009/2/23 Eliot Miranda <[hidden email]>:

>
>
>
> On Sun, Feb 22, 2009 at 8:08 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/2/23 Igor Stasenko <[hidden email]>:
>> > 2009/2/22 Eliot Miranda <[hidden email]>:
>> >>
>>
>> [snip]
>>
>> another idea how to make a cleaner slang code , was to introduce a
>> special 'C' shared variable.
>> So, then instead of writing something like:
>>
>> self ioRelinquishProcessorForMicroseconds: xxx.
>>
>> or even worse:
>>
>> self cCode:' ((sqInt (*) (sqInt, sqInt*, sqInt*, sqInt*, sqInt*))querySurfaceFn)
>>                (handle, &sourceWidth, &sourceHeight, &sourceDepth, &sourceMSB)'
>>                        inSmalltalk:[false]
>>
>> you merely write:
>>
>> C ioRelinquishProcessorForMicroseconds: xxx.
>> C querySurfaceFn: handle with: sourceWidth cReference with:
>> sourceHeight cReference ....
>
> this is scarily similar to Alien style FFI's, e.g. Vassili's Newspeak Windows GUI interface :)
>
>>
>> First , it lets a code generator know, that given message send is raw
>> C function call (by taking a first keyword as a function name).
>> Second, it can be simulated appropriately by a simulator, since you
>> can assign an object to 'C' pool var which having best-match
>> implementations for all such calls. And of course it helps greatly in
>> finding errors or mistypes!
>>
>> Then patterns like 'self foo' be exclusively treated by code generator
>> as method which belongs to an instance of class where  method which
>> containing such code residing, without any exceptions.
>>
>> So, if you write
>> 'self signalSemaphore: xx' in Interpreter's method
>> a code generator should lookup for #signalSemaphore: in Interpreter class.
>> And if you write 'self header'   in Oop class -- it will lookup
>> #header method in Oop class, but nowhere else!
>
> I also think we should do the following:
> a) mangle names of selectors so each is prefixed by e.g. the capitals in the class name, so that e.g. StackInterpreter>>popStack: gets mangled to SI_popStack, and Cogit>>cog:selector: gets mangled to C_cogselector etc so one can use super in Slang.
+1

> b) handle variables thusly:
> Slang should provide unique names for all local variables as it creates TMethods.  These variable names can simply be integers (actually their key is an integer and their value is their original name).  Since they are all unique there can be no clashes.  Variables can safely be substituted by other variables since when one replaces one variable with another it cannot posibly accidentally clash with some other variable.
>
> Later, when a TMethod is output, variables are output not as their integer key but as their value (their original name) provided it doesn't clash.  The same variable renumbering scheme can be used to resolve clashes.  i.e. the renaming is deferred until a TMethod is output, and done once, not every time one tries to inline a method.  Renaming clashes is simple.  A dictionary maps original names to sequences of integer variable keys.  The renamed variable is the original name concatenated with the index of its integer key in the sequence of keys for the original name.
>
> When inlining a method into another one unifies the formals and actuals assigning new variable numbers for all new variable bindings.  That should simplify the inline considerably because the horrible variable renaming code will reduce to mapping old variable keys to new variable keys.

Agree.
I never took a look at how the method inliner works in the code
generator, but some of its limitations are a pain, like being unable
to inline a method that has cCode: or C variable
declarations/definitions.

There is also some more syntactic sugar, which I forgot to mention:

methodFoo
 | x |
C initializer: (x := 5).
^ x

can produce the following:

int methodFoo()
{
  int x=5;
  return x;
}

and even if you inline such a method:

int result;
...
   { int x = 5;
     result = x;
     goto l10;
   }
l10:


What is interesting about inlining: I discovered that GCC 2.95 deals
fine with the following code:

#include <stdio.h>

int main(int argc , char** argv)
{
  int i;

  i = ({ printf("foo"); 10; });
  printf("%d", i);

}

I think we can use this for inlining (if we're using GCC everywhere, I
don't see why we can't use it); then we don't need to define any vars
in the outer scope, like the current inliner does. And we can avoid
naming clashes, except those where arguments to the inlined method
clash with temps declared in it, i.e.:

int param;
  param = computeParam();
  return computeSomethingElse(param);

and
int  computeSomethingElse( int x)
{
 int param=10;
   return x + param;
}
So, if we try to inline computeSomethingElse, we will have a name
clash 'x' -> 'param', and if naively implemented, it will produce

({int param=10; param+param;})

inlined code.

>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.


--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Igor Stasenko

Also, note that using ({ ... })
makes it very easy to inline blocks:

a := 5>7 ifTrue: [ self foo. b+8 ] ifFalse: [ self bar. b-8]

=>>

a = (5>7) ? ({foo(); b+8;}) : ({bar(); b-8;}) ;


2009/2/23 Igor Stasenko <[hidden email]>:

> 2009/2/23 Eliot Miranda <[hidden email]>:
>>
>>
>>
>> On Sun, Feb 22, 2009 at 8:08 PM, Igor Stasenko <[hidden email]> wrote:
>>>
>>> 2009/2/23 Igor Stasenko <[hidden email]>:
>>> > 2009/2/22 Eliot Miranda <[hidden email]>:
>>> >>
>>>
>>> [snip]
>>>
>>> another idea how to make a cleaner slang code , was to introduce a
>>> special 'C' shared variable.
>>> So, then instead of writing something like:
>>>
>>> self ioRelinquishProcessorForMicroseconds: xxx.
>>>
>>> or even worse:
>>>
>>> self cCode:' ((sqInt (*) (sqInt, sqInt*, sqInt*, sqInt*, sqInt*))querySurfaceFn)
>>>                (handle, &sourceWidth, &sourceHeight, &sourceDepth, &sourceMSB)'
>>>                        inSmalltalk:[false]
>>>
>>> you merely write:
>>>
>>> C ioRelinquishProcessorForMicroseconds: xxx.
>>> C querySurfaceFn: handle with: sourceWidth cReference with:
>>> sourceHeight cReference ....
>>
>> this is scarily similar to Alien style FFI's, e.g. Vassili's Newspeak Windows GUI interface :)
>>
>>>
>>> First , it lets a code generator know, that given message send is raw
>>> C function call (by taking a first keyword as a function name).
>>> Second, it can be simulated appropriately by a simulator, since you
>>> can assign an object to 'C' pool var which having best-match
>>> implementations for all such calls. And of course it helps greatly in
>>> finding errors or mistypes!
>>>
>>> Then patterns like 'self foo' be exclusively treated by code generator
>>> as method which belongs to an instance of class where  method which
>>> containing such code residing, without any exceptions.
>>>
>>> So, if you write
>>> 'self signalSemaphore: xx' in Interpreter's method
>>> a code generator should lookup for #signalSemaphore: in Interpreter class.
>>> And if you write 'self header'   in Oop class -- it will lookup
>>> #header method in Oop class, but nowhere else!
>>
>> I also think we should do the following:
>> a) mangle names of selectors so each is prefixed by e.g. the capitals in the class name, so that e.g. StackInterpreter>>popStack: gets mangled to SI_popStack, and Cogit>>cog:selector: gets mangled to C_cogselector etc so one can use super in Slang.
> +1
>
>> b) handle variables thusly:
>> Slang should provide unique names for all local variables as it creates TMethods.  These variable names can simply be integers (actually their key is an integer and their value is their original name).  Since they are all unique there can be no clashes.  Variables can safely be substituted by other variables since when one replaces one variable with another it cannot posibly accidentally clash with some other variable.
>>
>> Later, when a TMethod is output, variables are output not as their integer key but as their value (their original name) provided it doesn't clash.  The same variable renumbering scheme can be used to resolve clashes.  i.e. the renaming is deferred until a TMethod is output, and done once, not every time one tries to inline a method.  Renaming clashes is simple.  A dictionary maps original names to sequences of integer variable keys.  The renamed variable is the original name concatenated with the index of its integer key in the sequence of keys for the original name.
>>
>> When inlining a method into another one unifies the formals and actuals assigning new variable numbers for all new variable bindings.  That should simplify the inline considerably because the horrible variable renaming code will reduce to mapping old variable keys to new variable keys.
>
> Agree.
> i never took a look how method inliner works in code generator. But
> some of its discrepancies is a pain.
> Like unable to inline a method which having cCode, or c variables
> declarations/definitions.
>
> There are also some more syntax sugar, which i forgot to mention:
>
> methodFoo
>  | x |
> C initializer: (x := 5).
> ^ x
>
> can produce a following:
>
> int methodFoo()
> {
>  int x=5;
>  return x;
> }
>
> and even if you inline such method:
>
> int result;
> ...
>   { int x = 5;
>     result = x;
>     goto l10;
>   }
> l10:
>
>
> what is interesting about inlining, i discovered that GCC 2.95 deals
> fine with following code:
>
> #include <stdio.h>
>
> int main(int argc , char** argv)
> {
>  int i;
>
>  i = ({ printf("foo"); 10; });
>  printf("%d", i);
>
> }
>
> i think we can use this for inlining (if we using GCC everywhere, i
> don't see why we can't use it), then we don't need to define any vars
> in outer scope, like current inliner does. And we can avoid naming
> clashes, except those, where arguments to inlined method clashing with
> temps declared in it i.e..
>
> int param;
>  param = computeParam();
>  return computeSomethingElse(param);
>
> and
> int  computeSomethingElse( int x)
> {
>  int param=10;
>   return x + param;
> }
> so, if we try to inline computeSomethingElse, we will have a name
> clashing 'x' -> 'param'
> so, if naively implemented, it will produce
>
> ({int param=10; param+param;})
>
> inlined code.
>
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Igor Stasenko
In reply to this post by Igor Stasenko

2009/2/23 Igor Stasenko <[hidden email]>:

> 2009/2/23 Eliot Miranda <[hidden email]>:
>>
>>
>>
>> On Sun, Feb 22, 2009 at 8:08 PM, Igor Stasenko <[hidden email]> wrote:
>>>
>>> 2009/2/23 Igor Stasenko <[hidden email]>:
>>> > 2009/2/22 Eliot Miranda <[hidden email]>:
>>> >>
>>>
>>> [snip]
>>>
>>> another idea how to make a cleaner slang code , was to introduce a
>>> special 'C' shared variable.
>>> So, then instead of writing something like:
>>>
>>> self ioRelinquishProcessorForMicroseconds: xxx.
>>>
>>> or even worse:
>>>
>>> self cCode:' ((sqInt (*) (sqInt, sqInt*, sqInt*, sqInt*, sqInt*))querySurfaceFn)
>>>                (handle, &sourceWidth, &sourceHeight, &sourceDepth, &sourceMSB)'
>>>                        inSmalltalk:[false]
>>>
>>> you merely write:
>>>
>>> C ioRelinquishProcessorForMicroseconds: xxx.
>>> C querySurfaceFn: handle with: sourceWidth cReference with:
>>> sourceHeight cReference ....
>>
>> this is scarily similar to Alien style FFI's, e.g. Vassili's Newspeak Windows GUI interface :)
>>
>>>
>>> First, it lets the code generator know that a given message send is a raw
>>> C function call (taking the first keyword as the function name).
>>> Second, it can be simulated appropriately by a simulator, since you
>>> can assign to the 'C' pool var an object which has best-match
>>> implementations for all such calls. And of course it helps greatly in
>>> finding errors and mistypes!
>>>
>>> Then patterns like 'self foo' would be treated by the code generator
>>> exclusively as sends to methods of the class in which the
>>> containing method resides, without any exceptions.
>>>
>>> So, if you write
>>> 'self signalSemaphore: xx' in Interpreter's method
>>> a code generator should lookup for #signalSemaphore: in Interpreter class.
>>> And if you write 'self header'   in Oop class -- it will lookup
>>> #header method in Oop class, but nowhere else!
>>
>> I also think we should do the following:
>> a) mangle names of selectors so each is prefixed by e.g. the capitals in the class name, so that e.g. StackInterpreter>>popStack: gets mangled to SI_popStack, and Cogit>>cog:selector: gets mangled to C_cogselector etc so one can use super in Slang.
> +1
>
>> b) handle variables thusly:
>> Slang should provide unique names for all local variables as it creates TMethods.  These variable names can simply be integers (actually their key is an integer and their value is their original name).  Since they are all unique there can be no clashes.  Variables can safely be substituted by other variables since when one replaces one variable with another it cannot possibly accidentally clash with some other variable.
>>
>> Later, when a TMethod is output, variables are output not as their integer key but as their value (their original name) provided it doesn't clash.  The same variable renumbering scheme can be used to resolve clashes.  i.e. the renaming is deferred until a TMethod is output, and done once, not every time one tries to inline a method.  Renaming clashes is simple.  A dictionary maps original names to sequences of integer variable keys.  The renamed variable is the original name concatenated with the index of its integer key in the sequence of keys for the original name.
>>
>> When inlining a method into another one unifies the formals and actuals assigning new variable numbers for all new variable bindings.  That should simplify the inline considerably because the horrible variable renaming code will reduce to mapping old variable keys to new variable keys.
>
> Agree.
> I never took a look at how the method inliner works in the code generator.
> But some of its limitations are a pain,
> like being unable to inline a method which has cCode or C variable
> declarations/definitions.
>
> There is also some more syntax sugar, which I forgot to mention:
>
> methodFoo
>  | x |
> C initializer: (x := 5).
> ^ x
>
> can produce the following:
>
> int methodFoo()
> {
>  int x=5;
>  return x;
> }
>
> and even if you inline such a method:
>
> [snip - same quoted text as in the previous message]
>

Ha, even better.

Suppose some code calls a method which has to be inlined, in the following way:

foo := self method: 5+i with: self bar.

and the method is declared as follows:

method: arg1 with: arg2
 ^ arg1 + arg2

now, to inline it we can generate:

({
  int arg1 = 5+i;
  int arg2 = bar();
  arg1+arg2;
})

The only difference here is the order of evaluation: arg1, then arg2,
while in C calls the argument evaluation order is unspecified - with many
compilers it is the reverse (arg2, then arg1), because the calling
convention pushes the last argument onto the stack first.
But I think it is more correct to evaluate them in Smalltalk
order, since we are coding in Smalltalk;
then the simulated code will behave the same as the compiled VM in all such cases.
--
Best regards,
Igor Stasenko AKA sig.
Reply | Threaded
Open this post in threaded view
|

Re: Interpreter>>isContextHeader: optimization

Igor Stasenko

Some more observations:

#include <stdio.h>

int main(int argc , char** argv)
{
  int i;

  i = 1;
  i = ({ int y=i; int i=6; printf("foo"); 10+y+i; });
  printf("%d", i);

}

prints:
foo17

So, to generate proper inlined code, we only have to care about name clashes
for method arguments.
In the above code, suppose 'y' is an argument var,
and 'i', declared in the inner scope, is a temporary var.

2009/2/23 Igor Stasenko <[hidden email]>:

> [snip - full quote of the two preceding messages]



--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

Yoshiki Ohshima-2
In reply to this post by Igor Stasenko
 
At Sat, 21 Feb 2009 09:37:29 +0200,
Igor Stasenko wrote:

>
>  
> i think it wouldn't hurt to rewrite it as:
>
> isContextHeader: aHeader
> self inline: true.
>  | hdr |
>   hdr := aHeader bitAnd: (16r1F << 12).
> ^ hdr = (13 << 12) "MethodContext"
> or: [ hdr = (14 << 12) "BlockContext"
> or: [ hdr = (4 << 12)]] "PseudoContext"

  This is totally tangential, but I was supposed to relay a message from
somebody (IIRC it was originally from Dave Ungar, but with a few hops in
between) pointing out that "13" is BlockContext and "14" is
MethodContext in the image.

  (This level of nitpicking may not have anything to do with average
Squeakers but for VM hackers I understand it could be crucial^^;)

-- Yoshiki


Re: Interpreter>>isContextHeader: optimization

Igor Stasenko
 
2009/2/23 Yoshiki Ohshima <[hidden email]>:

>
> At Sat, 21 Feb 2009 09:37:29 +0200,
> Igor Stasenko wrote:
>>
>>
>> i think it wouldn't hurt to rewrite it as:
>>
>> isContextHeader: aHeader
>>       self inline: true.
>>  | hdr |
>>   hdr := aHeader bitAnd: (16r1F << 12).
>>       ^ hdr = (13 << 12)                      "MethodContext"
>>               or: [ hdr = (14 << 12)          "BlockContext"
>>               or: [ hdr = (4 << 12)]]  "PseudoContext"
>
>  This is totally tangential, but I was supposed to relay a message from
> somebody (IIRC it was originally from Dave Ungar, but with a few hops in
> between) pointing out that "13" is BlockContext and "14" is
> MethodContext in the image.
>
I just copied the source, didn't really change anything in it... it has
sat there like that for a while :)

>  (This level of nitpicking may not have anything to do with average
> Squeakers but for VM hackers I understand it could be crucial^^;)
>
Yup, this is another argument for having:

OopHeader>>isContextHeader
  ^ self isMethodContext or: [self isBlockContext or: [self isPseudoContext ]]

instead of numerous #bitAnd: and #<< in many different places :)

> -- Yoshiki
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Interpreter>>isContextHeader: optimization

David T. Lewis
In reply to this post by Igor Stasenko
 
On Mon, Feb 23, 2009 at 07:05:39AM +0200, Igor Stasenko wrote:
>
> I never took a look at how the method inliner works in the code generator.
> But some of its limitations are a pain,
> like being unable to inline a method which has cCode or C variable
> declarations/definitions.

I think I forgot to put this on Mantis (sorry), but look at VMMaker-dtl.103
on SqueakSource for the changes that allow inlining methods with embedded C
code:

   Name: VMMaker-dtl.103
   Author: dtl
   Time: 24 September 2008, 11:23:59 pm
   UUID: b4f365aa-e032-49eb-83ca-11526fd5101e
   Ancestors: VMMaker-dtl.97
   
   VMMaker 3.9.1
   Supports #inline: directive for methods containing embedded C code (#cCode:).
   This includes fixes for Slang generation, case statement translation to C in
   the interp() loop, and various type declaration cleanups.
   
   Low level methods with embedded C may now be inlined and translated, permitting
   functions that previously required external support code and C macros to be
   directly implemented in Slang.
   
   A remaining caveat is that methods with #cCode: strings with references to
   method parameters must not request inlining, as the inliner may rename method
   parameters such that they no longer match variable names in the embedded C.

Dave
