[VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?


Denis Kudriashov
 
Hello.

What happens when message lookup finds no method for a given message? Will #doesNotUnderstand: or #cannotInterpret: be added to the PIC?

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Clément Béra
 
Hi.

#doesNotUnderstand: is added to the PIC. It's a special case handled for performance. DNUs are slower than normal sends but the overhead is acceptable.

Other VM call-backs are not added to the PIC, so the VM is slower in those cases. But they are usually not present in production applications.


Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Denis Kudriashov
 

2016-11-21 10:05 GMT+01:00 Clément Bera <[hidden email]>:
> Hi.
>
> #doesNotUnderstand: is added to the PIC. It's a special case handled for performance. DNUs are slower than normal sends but the overhead is acceptable.

But once it is in the PIC, is it no longer slower?

> Other VM call-backs are not added to the PIC, so the VM is slower in those cases. But they are usually not present in production applications.

Proxies could be based on it and used in production.

What about the call-back message sent when a read-only object is about to be modified? Is it jitted?


Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Clément Béra
 


On Mon, Nov 21, 2016 at 10:10 AM, Denis Kudriashov <[hidden email]> wrote:
> 2016-11-21 10:05 GMT+01:00 Clément Bera <[hidden email]>:
>> Hi.
>>
>> #doesNotUnderstand: is added to the PIC. It's a special case handled for performance. DNUs are slower than normal sends but the overhead is acceptable.
>
> But once it is in the PIC, is it no longer slower?

Well, the DNU requires creating the Message argument, so it's always slower than a normal call.

>> Other VM call-backs are not added to the PIC, so the VM is slower in those cases. But they are usually not present in production applications.
>
> Proxies could be based on it and used in production.

Right. Well, then it's slower.

> What about the call-back message sent when a read-only object is about to be modified? Is it jitted?

 This is done in a similar way to mustBeBoolean. The detection of read-only object instance variable mutation is jitted as part of the instance variable store, but if the call-back is actually triggered it's a bit slow. The call-back itself can be jitted too.
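
A minimal Python sketch of the read-only store scheme described above. It is not VM code: the class `Obj` and the functions `attempt_to_assign` and `store_inst_var` are hypothetical names chosen for illustration, and the call-back is modelled as an ordinary function standing in for #attemptToAssign:withIndex:. The point is only that the check sits inline with the store, so a store to a writable object never reaches the slow call-back path.

```python
class Obj:
    """Toy object: a list of slots plus a read-only flag (hypothetical model)."""

    def __init__(self, n_slots, read_only=False):
        self.slots = [None] * n_slots
        self.read_only = read_only


def attempt_to_assign(obj, value, index, log):
    # Stands in for the #attemptToAssign:withIndex: call-back.
    # Only reached on the slow path; here it just records the attempt.
    log.append((index, value))


def store_inst_var(obj, index, value, log):
    # The read-only test is part of the store itself (the "jitted" check).
    if obj.read_only:
        # Slow path: leave the fast code and run the call-back instead.
        attempt_to_assign(obj, value, index, log)
        return False
    # Fast path: plain store, no call-back involved.
    obj.slots[index] = value
    return True
```

With this model, `store_inst_var` on a writable object only touches the slot, while the same call on a read-only object leaves the slot unchanged and records one call-back.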


Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Denis Kudriashov
 

2016-11-21 10:21 GMT+01:00 Clément Bera <[hidden email]>:
>> 2016-11-21 10:05 GMT+01:00 Clément Bera <[hidden email]>:
>>> Hi.
>>>
>>> #doesNotUnderstand: is added to the PIC. It's a special case handled for performance. DNUs are slower than normal sends but the overhead is acceptable.
>>
>> But once it is in the PIC, is it no longer slower?
>
> Well, the DNU requires creating the Message argument, so it's always slower than a normal call.

Ah yes, and that means a DNU always generates garbage.

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Clément Béra
 
Hi again,

I was looking at all the VM call-backs, and it's interesting to see how they're handled in the VM.

The following call-backs are called only from C code or interpreter code, hence they're not really optimised: #cannotReturn:, #cannotInterpret:, #aboutToReturn:through:, #run:with:in:, #unusedBytecode.

The following call-backs are called from machine code through a trampoline switching to C code, i.e., machine code detects if the trampoline will be called, going at full performance if it's not called, and slower if the trampoline is called: #mustBeBoolean, #attemptToAssign:withIndex:, #conditionalBranchCounterTrippedOn:, #trapTripped.

#doesNotUnderstand: is optimised by the JIT as a PIC case, so it's the most optimised call-back and likely the most frequent.

I am not sure about #invokeCallbackContext:. Is it an FFI thing? I believe it's also called only from C code.



Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Eliot Miranda-2
In reply to this post by Clément Béra
 
Hi Denis,



_,,,^..^,,,_ (phone)
On Nov 21, 2016, at 1:21 AM, Clément Bera <[hidden email]> wrote:
> On Mon, Nov 21, 2016 at 10:10 AM, Denis Kudriashov <[hidden email]> wrote:
>> 2016-11-21 10:05 GMT+01:00 Clément Bera <[hidden email]>:
>>> Hi.
>>>
>>> #doesNotUnderstand: is added to the PIC. It's a special case handled for performance. DNUs are slower than normal sends but the overhead is acceptable.
>>
>> But once it is in the PIC, is it no longer slower?
>
> Well, the DNU requires creating the Message argument, so it's always slower than a normal call.

Right, but it's very much faster than an uncached MNU. In a VM that doesn't have the PUC optimization for MNU, the following happens on every such send:
- the message being sent is looked up in the method lookup cache, and the lookup misses
- the message is searched for up the superclass chain all the way to Object and not found
- the Message is created
- doesNotUnderstand: is looked up in the method lookup cache, and the lookup succeeds

With the PIC optimization the above happens only on the first MNU send. Subsequently 
- the Message is created
- the relevant MNU method is invoked
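
The two step lists above can be compared with a small Python model. This is only a sketch under stated assumptions, not how the VM is written: `Klass`, `full_lookup`, `cached_lookup`, `send_without_pic` and `send_with_pic` are hypothetical names, the method lookup cache is a plain dictionary that (as an assumption of this sketch) does not remember failed lookups, and the PIC is reduced to a per-send-site dictionary keyed by class name.

```python
class Klass:
    """Toy class with a superclass link and a method dictionary."""

    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = methods or {}


def new_stats():
    return {"cache_probes": 0, "classes_searched": 0, "messages_created": 0}


def full_lookup(klass, selector, stats):
    # Walk the superclass chain; return the method or None if not found.
    while klass is not None:
        stats["classes_searched"] += 1
        if selector in klass.methods:
            return klass.methods[selector]
        klass = klass.superclass
    return None


def cached_lookup(klass, selector, cache, stats):
    # Probe the lookup cache first; fall back to the full lookup on a miss.
    stats["cache_probes"] += 1
    key = (klass.name, selector)
    if key not in cache:
        method = full_lookup(klass, selector, stats)
        if method is None:
            return None  # assumption: failed lookups are not cached
        cache[key] = method
    return cache[key]


def send_without_pic(klass, selector, args, cache, stats):
    method = cached_lookup(klass, selector, cache, stats)
    if method is not None:
        return method(*args)
    stats["messages_created"] += 1          # the Message is created
    message = (selector, args)
    dnu = cached_lookup(klass, "doesNotUnderstand:", cache, stats)
    return dnu(message)


def send_with_pic(klass, selector, args, pic, cache, stats):
    entry = pic.get(klass.name)
    if entry is None:
        # First send from this site for this class: do the slow lookup once.
        method = cached_lookup(klass, selector, cache, stats)
        if method is not None:
            entry = ("method", method)
        else:
            dnu = cached_lookup(klass, "doesNotUnderstand:", cache, stats)
            entry = ("mnu", dnu)
        pic[klass.name] = entry
    kind, method = entry
    if kind == "mnu":
        stats["messages_created"] += 1      # still needed on every MNU send
        return method((selector, args))
    return method(*args)
```

Repeating an unknown send through `send_without_pic` searches the class chain on every send, whereas `send_with_pic` searches it only on the first one; both still create one Message per send, which is the remaining cost mentioned earlier in the thread.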


You can measure the cost of a normal MNU in the Stack VM, which doesn't have PUCs and so can't optimise MNU.


>>> Other VM call-backs are not added to the PIC, so the VM is slower in those cases. But they are usually not present in production applications.
>>
>> Proxies could be based on it and used in production.
>
> Right. Well, then it's slower.

But it's a lot faster than no PICs, or PUCs that don't record MNUs. :-)

>> What about the call-back message sent when a read-only object is about to be modified? Is it jitted?
>
> This is done in a similar way to mustBeBoolean. The detection of read-only object instance variable mutation is jitted as part of the instance variable store, but if the call-back is actually triggered it's a bit slow. The call-back itself can be jitted too.

We could perhaps optimise this too; IIRC the receiver is always thisContext so the VM could cache that one lookup somewhere statically.

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Denis Kudriashov
 
Hi Eliot.

2016-11-21 18:49 GMT+01:00 Eliot Miranda <[hidden email]>:
> You can measure the cost of a normal MNU in the Stack VM, which doesn't have PUCs and so can't optimise MNU.

Thanks for the details. I understand now. But I can't work out what PUC stands for. Could you explain?

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Eliot Miranda-2
 
Hi Denis,

On Tue, Nov 22, 2016 at 11:59 AM, Denis Kudriashov <[hidden email]> wrote:
> Hi Eliot.
>
> 2016-11-21 18:49 GMT+01:00 Eliot Miranda <[hidden email]>:
>> You can measure the cost of a normal MNU in the Stack VM, which doesn't have PUCs and so can't optimise MNU.
>
> Thanks for the details. I understand now. But I can't work out what PUC stands for. Could you explain?

A PIC is just a table of up to 6 register load, class (index) comparison pairs.  The entry code gets the class of the receiver into a temp reg and then jumps to the sequence of load, class (index) comparisons.  Whenever there's a match the comparison jumps to the entry point of the method.  For an MNU case, instead of jumping to a method entry point, it jumps to an abort call at the start of the PIC, before the PIC's entry code.  The abort call creates the method and tests the value loaded into the register.  If the value loaded is that of a method it jumps to the entry point of that method.  The method loaded is the MNU method for the class.
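
As a rough illustration of that description, here is a Python model of a closed PIC. It is a sketch, not the generated machine code: `ClosedPIC`, `add_case` and `dispatch` are hypothetical names, the abort call is folded into `dispatch`, and a Message is modelled as a `(selector, args)` tuple. Each case pairs a class index with either a method entry point or, for an MNU case, the class's MNU method.

```python
MAX_CASES = 6  # "a table of up to 6" cases, per the description above


class ClosedPIC:
    """Toy closed PIC: a short list of (class index, target, MNU method)."""

    def __init__(self):
        self.cases = []

    def add_case(self, class_index, target=None, mnu_method=None):
        if len(self.cases) >= MAX_CASES:
            # A real VM would relink the send site instead of failing.
            raise OverflowError("PIC is full")
        self.cases.append((class_index, target, mnu_method))

    def dispatch(self, class_index, selector, args):
        # Compare the receiver's class index against each case in turn.
        for case_index, target, mnu_method in self.cases:
            if case_index != class_index:
                continue
            if mnu_method is not None:
                # MNU case: build the Message, then run the MNU method.
                message = (selector, args)
                return mnu_method(message)
            # Normal case: jump straight to the method.
            return target(*args)
        # No case matched: the real PIC would call its miss routine here.
        raise LookupError("PIC miss")
```

A send site that has seen one class that understands the selector and one that does not would hold two cases, and later sends to either class are resolved by the comparison loop alone.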

Here are the gory details for Spur on the x86_64:

Constants beginning with $0xbada55 are the MNU method loads into %r9.
Constants beginning with $0xbabe1f16 are the class indexes

nArgs: ?? type: 4
blksiz: 16rD0
selctr: ??
cPICNumCases: ?? cpicHasMNUCase: ??
28 xorq %rcx, %rcx : 48 31 C9 
2B call .-0xFF0 (0xa50=cePICAbort0Args)
entry:
30 movq %rdx, %rax # the receiver is passed in %rdx; this sequence loads either the tags (for an immediate) or the class index (for a non-immediate) into %rax
33 andq $0x7, %rax
37 jnz .+0x9 (.+0042)
39 movq (%rdx), %rax
3C andq $0x3fffff, %rax
42 cmpq %rcx, %rax # the value for the first entry is the class (index) at the send site, which is stored in %rcx
45 jnz .+0x68 (.+00AF) # this comparison jumps to case 4 if it misses. It is adjusted to jump to case 3 on extension, case 2 on the next extension, etc
47 movq $0x0, %r9
51 nop (*)
52 jmp .+0xF2A1 (0x10d08) # this is the jump to the target for the first case
ClosedPICCase0:
57 movq $0xbada551, %r9 # this is a load of an MNU method if there is an MNU case
61 nop
62 cmpl $0xbabe1f16, %eax # since class indices are 22 bits we use a 32-bit comparison to save space
67 jz .+0xF29B (0x10d18)
ClosedPICCase1:
6D movq $0xbada552, %r9
77 nop
78 cmpl $0xbabe1f17, %eax
7D jz .+0xF295 (0x10d28)
ClosedPICCase2:
83 movq $0xbada553, %r9
8D nop
8E cmpl $0xbabe1f18, %eax
93 jz .+0xF28F (0x10d38)
ClosedPICCase3:
99 movq $0xbada554, %r9
A3 nop
A4 cmpl $0xbabe1f19, %eax
A9 jz .+0xF289 (0x10d48)
ClosedPICCase4:
AF movq $0xbada555, %r9
B9 nop  : 90 
BA cmpl $0xbabe1f1a, %eax
BF jz .+0xF283 (0x10d58)
ClosedPICCase5:
C5 leaq 0xffffffffffffff34(%rip), %rcx # this loads the address of the PIC into %rcx so that cePICMiss??Args can find the PIC and either extend it or relink the send site to an open PIC when the PIC is full
CC jmp .-0xDD1 (0xd10=cePICMiss??Args)

(*) The nops are details of the x86_64 code generator, allowing the system to distinguish move constant from push constant, needed when scanning machine code looking for object references to update, e.g. on garbage collection.
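
The entry sequence at offsets 30 to 3C of the listing can be restated in Python as follows. This is only a sketch: the function name `class_tag` is hypothetical, and memory is modelled as a dictionary from address to 64-bit header word. The masks 0x7 and 0x3fffff are taken directly from the `andq` instructions in the listing.

```python
TAG_MASK = 0x7               # andq $0x7, %rax
CLASS_INDEX_MASK = 0x3FFFFF  # andq $0x3fffff, %rax (22-bit class index)


def class_tag(receiver, memory):
    """Return the value the PIC compares: tags for an immediate,
    otherwise the class index from the object's header word."""
    tags = receiver & TAG_MASK
    if tags != 0:
        # Immediate receiver: the tag bits identify its class.
        return tags
    # Non-immediate: load the header word and keep the class index bits.
    return memory[receiver] & CLASS_INDEX_MASK
```

For a tagged value the header is never read, which matches the `jnz` at offset 37 skipping the load at offset 39.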

_,,,^..^,,,_
best, Eliot

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Denis Kudriashov
 

2016-11-23 0:47 GMT+01:00 Eliot Miranda <[hidden email]>:
> A PIC is just a table of up to 6 register load, class (index) comparison pairs.  The entry code gets the class of the receiver into a temp reg and then jumps to the sequence of load, class (index) comparisons.  Whenever there's a match the comparison jumps to the entry point of the method.  For an MNU case, instead of jumping to a method entry point, it jumps to an abort call at the start of the PIC, before the PIC's entry code.  The abort call creates the method and tests the value loaded into the register.  If the value loaded is that of a method it jumps to the entry point of that method.  The method loaded is the MNU method for the class.

Thanks, Eliot. The logic is clear to me now. And it is named PUC, right? What does the abbreviation stand for?

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Clément Béra
 
PIC = Polymorphic Inline Cache

PUC I don't know.


Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Levente Uzonyi
 
PUC = PIC with a typo

Levente

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

Denis Kudriashov
 

2016-11-23 10:43 GMT+01:00 Levente Uzonyi <[hidden email]>:
> PUC = PIC with a typo

:)

Re: [VM-dev] Are doesNotUnderstand:/cannotInterpret: jitted?

timrowledge
In reply to this post by Denis Kudriashov
 

> On 23-11-2016, at 1:11 AM, Denis Kudriashov <[hidden email]> wrote:
>
>
> 2016-11-23 0:47 GMT+01:00 Eliot Miranda <[hidden email]>:
>
> A PIC is just a table of up to 6 register load, class (index) comparison pairs.  The entry code gets the class of the receiver into a temp reg and then jumps to the sequence of load, class (index) comparisons.  Whenever there's a match the comparison jumps to the entry point of the method.  For an MNU case, instead of jumping to a method entry point, it jumps to an abort call at the start of the PIC, before the PIC's entry code.  The abort call creates the method and tests the value loaded into the register.  If the value loaded is that of a method it jumps to the entry point of that method.  The method loaded is the MNU method for the class.
>
> Thanks, Eliot. The logic is clear to me now.

More explanation of the PIC structure at http://wiki.squeak.org/squeak/6205


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- IQ = dx / (1 + dx), where x = age.