About uncacheable singly implemented messages


About uncacheable singly implemented messages

Andres Valloud-4
Hello,

Typically, with ICs and PICs, messages like isKindOf: are uncacheable.  
This is because if you do something like

LargePositiveInteger isKindOf: Object

then isKindOf: does, essentially:

1.  LargePositiveInteger == Object?  No, so send isKindOf: to
LargePositiveInteger's superclass.

2.  LargeInteger == Object?  No, so send isKindOf: to LargeInteger's
superclass.

3.  Integer == Object?  No, so send isKindOf: to Integer's superclass.

4.  Magnitude == Object?  No, so send isKindOf: to Magnitude's superclass.

5.  Object == Object?  Yes, done.

Note how, in every step, there are multiple forced lookups the VM must
perform.  Each isKindOf: send requires a lookup because the receiver
(and therefore the receiver's class) changes at each step.  Each send of
superclass requires a lookup for the same reason.  Fortunately, == can
be implemented as a special bytecode that does not require an actual
message send.  However, the problem is that the code runs horribly
slowly due to the numerous lookups.  I know this because replacing VW's
sorted collection method dictionary approach with an actual hashed
collection results in speedups close to 2x in code such as

| receivers |
receivers := Object withAllSubclasses reverse.
Time millisecondsToRun: [receivers do: [:each | each yourself]]

Are there any bits of interesting information on how to cache or
otherwise speed up these lookup-rich cases?

Andres.



Re: About uncacheable singly implemented messages

Paolo Bonzini-2
On 04/15/2010 10:27 PM, Andres Valloud wrote:
> Are there any bits of interesting information on how to cache or
> otherwise speed up these lookup-rich cases?

I suppose sends of the same selector that is executing (including most
sends to super) could be special cased.

For each selector you create a small Bloom filter that, given a class,
says whether the method is overridden in that class.  It can even be a
single word; it will be almost completely zero whenever there is an
opportunity for optimization.  After a number of misses you throw it
away, just as you throw away the IC/PIC.

Before looking up the method (but after the inline cache fails), you
check with the Bloom filter whether a full lookup needs to be performed.
If the Bloom filter indicates the method is not overridden, you just
reinvoke the same translation that is currently executing.

It might be an alternative to PICs for this kind of send.  Maybe
instead of having max. 8 PIC slots, you have max. 4 PIC slots + the
Bloom filter thing before declaring the call site megamorphic.

I hope you can make sense of this quick explanation of a quick idea, and
I hope it actually speeds something up.  I just threw it together now.

Have fun!



Re: About uncacheable singly implemented messages

Paolo Bonzini-2
In reply to this post by Andres Valloud-4
On 04/15/2010 10:27 PM, Andres Valloud wrote:
> Are there any bits of interesting information on how to cache or
> otherwise speed up these lookup-rich cases?

What I said before makes no sense.  Clicked send before showering,
should have done so after.

But these two might:

1) Hashed inline cache.  For sends of the same selector, make a wide PIC
but only look up one element, chosen by the identityHash of the
receiver's class (a small sketch follows after 2).

2) Call stack memoization inline cache.  For sends of the same selector,
reserve space for a small array of inline cache entries instead of the
PIC.  Whenever a method that includes a same-selector send is called
from outside, reset an index j to 0.  When instead it is called from
another implementation of the same selector, increment j at the top of
the callee.  When translating a same-selector send, treat the j-th
element of the array as a monomorphic inline cache or at most a very
small PIC (2-3 elements).  If I got this right, as long as the
call stack isn't too deep it should always hit.  Also, in the #isKindOf:
case, if the various receivers have the same distance from Object and a
common superclass, the cache should hit at least partially.
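As a rough, image-level sketch of the hashed inline cache from (1) above
(an illustration, not part of the original message): assume a
hypothetical HashedInlineCache object holding a fixed-size Array of
class->method Associations in an entries instance variable; the send
site probes exactly one slot, chosen by the receiver class's identity
hash.

    HashedInlineCache>>methodFor: receiverClass
        "Probe the single slot selected by the receiver class's identity
         hash; answer the cached method on a hit, or nil on a miss
         (a miss would fall back to a full lookup and then fill the slot)."
        | entry |
        entry := entries at: receiverClass identityHash \\ entries size + 1.
        ^(entry notNil and: [entry key == receiverClass])
            ifTrue: [entry value]
            ifFalse: [nil]

    HashedInlineCache>>at: receiverClass putMethod: aCompiledMethod
        "Fill (or overwrite) the slot for receiverClass after a full lookup."
        entries
            at: receiverClass identityHash \\ entries size + 1
            put: receiverClass -> aCompiledMethod

A real implementation would of course live in the JIT's machine code
rather than in the image; the sketch is only meant to show the
single-probe, selector-specific nature of the cache.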

Paolo



Re: About uncacheable singly implemented messages

Diego-4
Maybe you could increase the expressive power of the PICs.  Currently
you have entries of the form: self class = Object ifTrue: ["use some
method implementation"].
How about adding a new type of entry, like: (self class
isOrInheritsFrom: Object) ifTrue: ["use some method implementation"].
Then you could detect that the same method implementation is being
used for a lot of different classes, so you realize it must be defined
in a common superclass, and you add that kind of entry to the PIC...
You should check for reimplementations of that message for it to work
correctly...
So for example in the isKindOf: example, you detect that you always
end up using Object >> #isKindOf: so you add
(self class isOrInheritsFrom: Object) ifTrue: [Object >> #isKindOf:].
to the PIC.  Then you check for reimplementations, and since Proxy
subclasses Object and has its own implementation of #isKindOf:, the
PIC ends up being:
(self class isOrInheritsFrom: Proxy) ifTrue: [Proxy >> #isKindOf:].
(self class isOrInheritsFrom: Object) ifTrue: [Object >> #isKindOf:].

You need a fast way of testing (self class isOrInheritsFrom: Object)
to make this work...  If iterating over the class hierarchy until
reaching nil isn't efficient, you might create a cache on each class,
with a unique number for each of its superclasses (the hash, or oop, or
some other number), keep them in a sorted array, and perform a binary
search to test whether one class is a subclass of another...  As classes
usually don't have that many superclasses, that might work...
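A minimal sketch of that per-class ancestor cache (again, an
illustration rather than anything from the original message):
identityHash stands in for the "unique number", the ancestorIds
instance variable is hypothetical, and the cache is built lazily the
first time it is needed.

    Behavior>>ancestorIds
        "Sorted identity hashes of self and all superclasses, built lazily."
        ^ancestorIds ifNil:
            [ancestorIds := (self withAllSuperclasses
                collect: [:each | each identityHash]) asSortedCollection asArray]

    Behavior>>isOrInheritsFrom: aClass
        "Binary search over the sorted ancestor ids."
        | ids target low high mid |
        ids := self ancestorIds.
        target := aClass identityHash.
        low := 1.  high := ids size.
        [low <= high] whileTrue:
            [mid := (low + high) // 2.
            (ids at: mid) = target ifTrue: [^true].
            (ids at: mid) < target
                ifTrue: [low := mid + 1]
                ifFalse: [high := mid - 1]].
        ^false

Since identity hashes are not unique, a real version would use a truly
unique per-class number (as Diego suggests) or confirm a hit by walking
the chain, and the cache would have to be flushed when the hierarchy
changes.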

Then, for any selector with N implementations, you only need N entries
in a PIC, to have a 100% hit rate...

Diego


Re: About uncacheable singly implemented messages

jas
In reply to this post by Andres Valloud-4
Hi Andres,

Given a typical implementation

        Object>>isKindOf: targetClass
                ^self class includesBehavior: targetClass

        Behavior>>includesBehavior: targetClass
                ^self == targetClass
                        or:     [superclass includesBehavior: targetClass]
       
the lookups are a bit expensive.

(Maybe that's a good thing, in this particular case,
 as it may just discourage the use of #isKindOf:.)

The image I'm running solves the problem by
implementing the method as a primitive.

Short of that, another approach would be to go the #basic route,
and implement a far less general version, for use in the typical
case - which I suspect is virtually indistinguishable from 100%
of all uses:

Object>>isKindOf: targetClass
    ^self class isKindOf: targetClass

class>>isKindOf: targetClass
    ^self basicIsKindOf: targetClass

class>>basicIsKindOf: targetClass
    | doh |
    doh := self.
    [ doh == targetClass or: [doh == UndefinedObject]
    ] whileFalse:
        [ doh := doh superclass ].
    ^doh == targetClass

The implementation doesn't generalize, per se,
to whatever other methods you have in mind,
but the idea itself might:

    If the PIC is adding useless overhead,
    and said overhead is in fact compounding,
    then the "pure" semantics of method sends
    and extreme late binding isn't providing
    a realizable value in the case at hand,
    -- so we can safely drop the pretense,
    and expense, of maintaining the illusion,
    if there is reason to do so.

    And there may be reason to do so,
    whenever such a case is of relatively
    high occurrence.

Regards,

-cstb

 

Re: About uncacheable singly implemented messages

Mariano Martinez Peck
Hi Andres.  I came late to this conversation.  I am not sure if it may help, but I once read this paper:

http://scg.unibe.ch/archive/papers/Haup07aPIC.pdf

and it seemed very cool to me.  I don't think they implemented it, but at least you can get an idea.

Cheers

Mariano


Re: About uncacheable singly implemented messages

Marcus Denker-4

On Apr 28, 2010, at 8:15 PM, Mariano Martinez Peck wrote:

> Hi Andres. I came late to this conversation. I am not sure if may help, but I once read this paper:
>
> http://scg.unibe.ch/archive/papers/Haup07aPIC.pdf
>

No... this is just a paper about standard PICs (but in an interpreter and reflectively accessible).
Andres's question was about the cases where PICs do not help.

        Marcus

--
Marcus Denker  -- http://www.marcusdenker.de
INRIA Lille -- Nord Europe. Team RMoD.


Re: About uncacheable singly implemented messages

Michael Haupt-3
Hi,

On Thu, Apr 29, 2010 at 8:58 AM, Marcus Denker <[hidden email]> wrote:
>> http://scg.unibe.ch/archive/papers/Haup07aPIC.pdf
>
> No... this is just a paper about standard - PICs (but in an interpreter and reflectively accessible).

... and unfortunately, we never got around to completing the
rudimentary implementation.

Best,

Michael

Re: About uncacheable singly implemented messages

Andres Valloud-4
In reply to this post by Diego-4
Generally speaking, the class inheritance check is expensive and it
would be much better to avoid it.


Re: About uncacheable singly implemented messages

Andres Valloud-4
In reply to this post by jas
It could be isKindOf:, or printString, or yourself... generally
speaking, all these messages can't / shouldn't be implemented as primitives.


Re: About uncacheable singly implemented messages

Diego-4
In reply to this post by Andres Valloud-4
Yes, but if you only use it in this kind of case, where it probably
has a 99% hit rate because there is a single implementation in
Object, and the alternative is even more expensive (multiple lookups),
it would surely be worth it...

On Mon, May 3, 2010 at 8:45 PM, Andres Valloud
<[hidden email]> wrote:

> Generally speaking, the class inheritance check is expensive and it would be
> much better to avoid it.

Re: About uncacheable singly implemented messages

jas
In reply to this post by Andres Valloud-4
At 04:46 PM 5/3/2010, Andres Valloud wrote:
>It could be isKindOf:, or printString, or yourself... generally speaking, all these messages can't / shouldn't be implemented as primitives.


Hi Andres,

My point was that the following method *should* inline,
yet produce the same result as the typical implementation
as long as the constraint holds (i.e. singly implemented).

And therefore, one either assumes or verifies, at compile time,
that the constraint (singly implemented) holds,
and trades away late binding for performance.

I.e., compile the "#basic" version
*without* compiling a prefix that verifies (at runtime)
that it was safe to have done so.



Object class>>basicIsKindOf: targetClass
     | doh |
     doh := self.
     [ doh == targetClass or: [doh == UndefinedObject]
     ] whileFalse:
         [ doh := doh superclass ].
     ^doh == targetClass
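For what it's worth, here is a hedged sketch of the compile-time
verification being hinted at (my illustration, not part of the original
message): a hypothetical compiler-side helper that only allows the
inlined, early-bound version when the selector really is singly
implemented in the image.

    canInlineSinglyImplemented: selector
        "Answer true when exactly one class in the image implements
         selector, so a send of it can be compiled as a direct,
         early-bound call.  Class-side (metaclass) implementors are
         ignored here for brevity."
        | implementors |
        implementors := Smalltalk allClasses
            select: [:each | each includesSelector: selector].
        ^implementors size = 1

The compiler would also have to recheck (or be notified) whenever a new
implementor of the selector is added, since the inlining is only valid
while the single-implementation constraint holds.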



Regards,

-cstb




Re: About uncacheable singly implemented messages

Paolo Bonzini-3
On Tue, May 4, 2010 at 23:50, cstb <[hidden email]> wrote:
> At 04:46 PM 5/3/2010, Andres Valloud wrote:
>> It could be isKindOf:, or printString, or yourself... generally
>> speaking, all these messages can't / shouldn't be implemented
>> as primitives.

#printString and especially #yourself are rarely megamorphic call sites though.

> My point was that the following method *should* inline,
> yet produce the same result as the typical implementation
> as long as the constraint holds (i.e. singly implemented).

True, however the receivers' possible classes will still be many,
precluding the use of PICs.  For the particular case of super sends,
you can convert the tail recursion like you did, or implement inline
caching tricks, but that won't help for other megamorphic sends that
have a single implementation.  In fact, even #isKindOf: will be one of
them.  For all these sends, you still have to do useless lookups
(though only one per call site) and you still cannot use PICs.

So, while you and I provided a solution to this particular problem,
neither of us provided a solution to the general problem of
singly-implemented methods with megamorphic call sites.

> Object class>>basicIsKindOf: targetClass
>     | doh |
>     doh := self.
>     [ doh == targetClass or: [doh == UndefinedObject]
>     ] whileFalse:
>         [ doh := doh superclass ].
>     ^doh == targetClass

There is already a similar method (#includesBehavior:).  Note however that

 Object>>isKindOf: targetClass
     | doh |
     doh := self class.
     [ doh == targetClass or: [doh == UndefinedObject]
     ] whileFalse:
         [ doh := doh superclass ].
     ^doh == targetClass

 Behavior>>includesBehavior: targetClass
     | doh |
     doh := self.
     [ doh == targetClass or: [doh == UndefinedObject]
     ] whileFalse:
         [ doh := doh superclass ].
     ^doh == targetClass

are perfectly safe _and_ equivalent to the corresponding recursive
implementations. You know that the superclass of Object is nil, thus
once you reach this implementation no superclass will be able to
change to another one.

Paolo

Re: About uncacheable singly implemented messages

Eliot Miranda-2


On Tue, May 4, 2010 at 11:16 PM, Paolo Bonzini <[hidden email]> wrote:
> True, however the receivers' possible classes will still be many,
> precluding the use of PICs.  For the particular case of super sends,
> you can convert the tail recursion like you did, or implement inline
> caching tricks, but that won't help for other megamorphic sends that
> have a single implementation.  In fact, even #isKindOf: will be one of
> them.  For all these sends, you still have to do useless lookups
> (though only one per call site) and you still cannot use PICs.

You can't use ordinary jump table ICs but you can use what I call "Open PICs" (I call dispatch tables, having a maximum number of cases, "Closed").  An open PIC is a hash table lookup specific to the call site's selector, so within the hash table lookup, which is created on the fly, the selector and its hash are constants.  This makes the hash table lookup faster and reduces register pressure.  This technique is in both the VW and the Cog VMs.  The life cycle of the lesser-spotted in-line cache is as follows:

1. send site unlinked (calls a lookup-and-link routine), which, once executed, morphs into
2. send site linked to a target method whose prologue does an inline cache check, which, when that check fails, morphs into
3. send site linked to a two-entry Closed PIC jumping to the original or the second method depending on receiver class; then, on no match,
4. the CPIC grows as new receiver classes are encountered, and once it exceeds the PIC maximum size (e.g. 6 or 8 cases) it morphs into
5. send site linked to an Open PIC for the send site's selector.

In Cog, however, all the open PICs are stored on a linked list and so megamorphic sites with the same selector share open PICs.  Further, step 2 will short-circuit directly to step 5 if a failing monomorphic send site's selector matches an Open PIC on the Open PIC list.  This saves a lot of cache flushes as a closed PIC for a likely megamorphic selector grows.

There's an interesting variation on lookup from Open PICs one can use profitably if class hierarchy lookups are high.  Open PIC sends are likely to bind to methods found high in the hierarchy, since megamorphic methods are usually implemented by abstract classes which tend to be towards the root of the class hierarchy, and the methods found are very likely to be applicable to other classes encountered in the same open PIC.  So one can hash the result of an open PIC method lookup into the first-level lookup cache for each class we search during the lookup.  If we then also probe the cache for each class we encounter in the hierarchy, then subsequent sends of a selector will find a match in the first common ancestor class.  E.g. in some Open PIC sending isNil to true and false, first we send the message to true and search True's hierarchy, not finding the method until we reach Object.  But we put entries for the method in the cache for True x isNil -> Object>>#isNil, Boolean x isNil -> Object>>#isNil and Object x isNil -> Object>>#isNil.  Note that the existing algorithm would only cache True x isNil -> Object>>#isNil.  Now on the subsequent send of #isNil to false we search the method dictionary of False, and continue.  But then we immediately find the Boolean x isNil -> Object>>#isNil entry, saving two method dictionary searches at the cost of more cache probes and stores.

Given that, in the steady state, the only method lookups in a system with PICs come from Open PICs, this works well.  Note that if we use this variation on lookup for every send we reduce the effectiveness of the first-level lookup cache and the system runs slower overall.  This technique is in the VW VM but not yet in Cog.
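As an illustration of that lookup variation (my sketch, not Eliot's
code; the real thing lives in the VM's runtime, not in the image):
assume a hypothetical OpenPic object with a selector instance variable
and a table Dictionary playing the role of the first-level lookup cache.
On a miss it walks the hierarchy and seeds the table for every class it
visited, so a later send from a sibling class stops at the first common
ancestor.

    OpenPic>>methodForClass: aClass
        "Selector-specific hash lookup.  On a miss, walk the superclass
         chain; when the method is found, cache it under every class
         visited along the way, not just the original receiver class."
        | visited walk method |
        (table at: aClass ifAbsent: [nil]) ifNotNil: [:hit | ^hit].
        visited := OrderedCollection new.
        walk := aClass.
        [walk notNil] whileTrue:
            [(table at: walk ifAbsent: [nil]) ifNotNil:
                [:hit |
                visited do: [:each | table at: each put: hit].
                ^hit].
            method := walk compiledMethodAt: selector ifAbsent: [nil].
            method ifNotNil:
                [:found |
                visited add: walk.
                visited do: [:each | table at: each put: found].
                ^found].
            visited add: walk.
            walk := walk superclass].
        ^nil

With this in place, a send of #isNil to true caches entries for True,
Boolean and Object, and a subsequent send to false only has to search
False's method dictionary before hitting the Boolean entry.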


best
Eliot



Re: About uncacheable singly implemented messages

jas
In reply to this post by Paolo Bonzini-3

Hi Paolo,

You're right - the {#basic} prefix thing was overkill.

We're still touching on different things,
but getting closer, I think.  

Continuing below...

At 11:16 PM 5/4/2010, Paolo Bonzini wrote:
>On Tue, May 4, 2010 at 23:50, cstb <[hidden email]> wrote:
>> At 04:46 PM 5/3/2010, Andres Valloud wrote:

...

>There is already a similar method (#includesBehavior:).


Yes, but it correctly checks for nil,
which I knew, yet I sloppily checked
for UndefinedObject.  Arrrgh.

Having read the other posts, and yours,
and trying to understand where we're talking
past one another, I noticed a compiler bug:

The compiler in my head is treating

        doh := doh superclass

as a load/store, not a message send.
I believe it would be safe to do so, but
it needs to be called out, at minimum,
because the whole point of this implementation
is to compile such that no actual message sends
and no actual lookups occur.
 
Also, the implementation occurring class side
(with a forward from instance side) was intentional,
so that #isKindOf: could be sent from either.

Regards,

-cstb



Re: About uncacheable singly implemented messages

Andres Valloud-4
In reply to this post by Diego-4
It may have a 99% hit rate, but the hit check is expensive.

On 5/4/10 5:44 , Diego wrote:

> Yes, but if you only use it on this kind of cases, where it probably
> has a 99% hit rate, because there is a single implementation in
> object, and the alternative is even more expensive (multiple lookups),
> it surely would be worth it...
>
> On Mon, May 3, 2010 at 8:45 PM, Andres Valloud
> <[hidden email]>  wrote:
>    
>> Generally speaking, the class inheritance check is expensive and it would be
>> much better to avoid it.
>>
>> On 4/15/10 20:40 , Diego wrote:
>>      
>>> May be you could increase the expressive power of the PICs. Currently
>>> you have entries of type: self class = Object ifTrue: ["use some
>>> method implementation"].
>>> How about adding a new type of entry like: (self class
>>> isOrInheritsFrom: Object) ifTrue: ["use some method implementation"].
>>> Then you could detect that the same method implementation is being
>>> used for a lot of different classes, so you realize it must be in one
>>> superclass, so you add that kind of entry to the PIC... You should
>>> check for reimplementations of that message for it to work
>>> correctly...
>>> So for example in the isKindOf: example, you detect that you always
>>> end up using Object>>    #isKindOf: so you add
>>> (self class isOrInheritsFrom: Object) ifTrue: [Object>>    #isKindOf:].
>>> to the PIC. Then you check for reimplementation, and as Proxy
>>> subclasses Object and it has it's own implementation of isKindOf the
>>> PIC ends being:
>>> (self class isOrInheritsFrom: Proxy) ifTrue: [Proxy>>    #isKindOf:].
>>> (self class isOrInheritsFrom: Object) ifTrue: [Object>>    #isKindOf:].
>>>
>>> You need a fast way of testing (self class isOrInheritsFrom: Object)
>>> to make this work... If iterating by self class hierarchiy until nil
>>> isn't efficient, you might create a cache on each class, with a unique
>>> number for each of it's superclasses (the hash, or oop or some
>>> number), and have a sorted array, and perform a binary search to test
>>> if something is a subclass of another something... as classes usually
>>> don't have that many superclasses, that might work...
>>>
>>> Then, for any selector with N implementations, you only need N entries
>>> in a PIC, to have a 100% hit rate...
>>>
>>> Diego
>>>
>>> On Thu, Apr 15, 2010 at 8:37 PM, Paolo Bonzini<[hidden email]>    wrote:
>>>
>>>        
>>>> On 04/15/2010 10:27 PM, Andres Valloud wrote:
>>>>
>>>>          
>>>>> Are there any bits of interesting information on how to cache or
>>>>> otherwise speed up these lookup-rich cases?
>>>>>
>>>>>            
>>>> What I said before makes no sense.  Clicked send before showering, should
>>>> have done so after.
>>>>
>>>> But these two might:
>>>>
>>>> 1) Hashed inline cache.  For sends of the same selector, make a wide PIC
>>>> but
>>>> only look up one element based on the identityHash of the receiver's
>>>> class.
>>>>
>>>> 2) Call stack memoization inline cache.  For sends of the same selector,
>>>> reserve space for a small array of inline cache entries instead of the
>>>> PIC.
>>>>   Whenever a method that includes a same-selector send is called from
>>>> outside, reset an index j to 0.  Instead when it is called from another
>>>> implementation of the same selector, increment j at the top of the
>>>> callee.
>>>>   When translating a same-selector send, treat the j-th element of the
>>>> array
>>>> as a monomorphic inline cache or at most a very small PIC (2-3 elements).
>>>>   If I got this right, as long as the call-stack isn't too deep it should
>>>> always hit.  Also, in the #isKindOf: case if the various receivers have
>>>> the
>>>> same distance from Object and a common superclass, the cache should hit
>>>> at
>>>> least partially.
>>>>
>>>> Paolo
>>>>
>>>>
>>>> --
>>>> Subscription settings:
>>>> http://groups.google.com/group/smalltalk-research/subscribe?hl=en
>>>>
>>>>
>>>>          
>>>
>>>
>>>        
>>      
>
>
>    
Reply | Threaded
Open this post in threaded view
|

Re: About uncacheable singly implemented messages

Andres Valloud-4
In reply to this post by Paolo Bonzini-3
Paolo,

On 5/4/10 23:16 , Paolo Bonzini wrote:

> On Tue, May 4, 2010 at 23:50, cstb <[hidden email]> wrote:
>> At 04:46 PM 5/3/2010, Andres Valloud wrote:
>>> It could be isKindOf:, or printString, or yourself... generally
>>> speaking, all these messages can't / shouldn't be implemented
>>> as primitives.
>
> #printString and especially #yourself are rarely megamorphic call sites though.

Right, those are the cases I was asking about.

>
> So, while you and I provided a solution to this particular problem,
> neither of us provided a solution to the general problem of
> singly-implemented methods with megamorphic call sites.
>
>    

In general it's not solved because the loop-based method is still sending
#superclass, which will become a megamorphic message send.

Andres.

Re: About uncacheable singly implemented messages

Andres Valloud-4
In reply to this post by Eliot Miranda-2
Eliot,

On 5/5/10 9:27 , Eliot Miranda wrote:
> In Cog, however, all the open PICs are stored on a linked list and so
> megamorphic sites with the same selector share open PICs.  Further,
> step 2 will short-circuit directly to step 5 if a failing monomorphic
> send site's selector matches an Open PIC on the Open PIC list.  This
> saves a lot of cache flushes as a closed PIC for a likely megamorphic
> selector grows.

Don't the savings depend on the send site?  Wouldn't it be better to use
the open PICs as lookup caches for IC / PIC misses instead?  At least
you can do it because you share open PICs on a selector basis.  Also,
why do you need to do any cache flushing?  Assuming you're flushing
intentionally, a mere jmp instruction is enough on x86, and it's very
likely you are already executing one of those.  You must have meant
something else, what is it?

Andres.

Re: About uncacheable singly implemented messages

Andres Valloud-4
In reply to this post by Andres Valloud-4
Maybe that's not phrased right... I'd like to think there will be
basically one implementor for #superclass.  However, the classes of the
receivers will change with each invocation.

On 5/8/10 12:16 , Andres Valloud wrote:
> In general it's not solved because the below method is still sending
> #superclass, which will become a megamorphic message send.
>    

Re: About uncacheable singly implemented messages

Eliot Miranda-2
In reply to this post by Andres Valloud-4


On Sat, May 8, 2010 at 3:44 PM, Andres Valloud <[hidden email]> wrote:
> Eliot,
>
> On 5/5/10 9:27 , Eliot Miranda wrote:
>> In Cog, however, all the open PICs are stored on a linked list and so megamorphic sites with the same selector share open PICs.  Further, step 2 will short-circuit directly to step 5 if a failing monomorphic send site's selector matches an Open PIC on the Open PIC list.  This saves a lot of cache flushes as a closed PIC for a likely megamorphic selector grows.
>
> Don't the savings depend on the send site?

Yes, but in practice this is a good heuristic.

> Wouldn't it be better to use the open PICs as lookup caches for IC / PIC misses instead?

Depends on the send site.  In truly megamorphic sites, going straight to the hash lookup is better, right?

> At least you can do it because you share open PICs on a selector basis.  Also, why do you need to do any cache flushing?  Assuming you're flushing intentionally, a mere jmp instruction is enough on x86, and it's very likely you are already executing one of those.  You must have meant something else, what is it?

Yes, I do mean cache flushing.  On several processors/ISAs a cache flush is typically very expensive.  On Transmeta (an x86-to-VLIW JIT) any instruction-space modification is much slower than on conventional x86 implementations.  ARM & PPC are also very expensive.  So if sites that fail a monomorphic test are relinked against a new 2-element PIC, then there will be N - 1 send-site rewrites or PIC table rewrites (as additional entries in the PIC are filled in), and so N - 1 cache flushes, before the send site gets rewritten as an open PIC; and if, based on the selector, it's much more likely for a polymorphic selector to result in a megamorphic call site, then the above heuristic will win.