# Cog Primitive Performance

28 messages
12
Open this post in threaded view
|
Report Content as Inappropriate

## Cog Primitive Performance

Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 Hi Eliot, This all sounds pretty cool. I wonder why you kept the multiplication by 16384, when you could have just used #bitShift: 14 instead. If I read it correctly, this is like that in the jitted code too. I suppose the multiplication is a leftover from the time it was faster than bit shift. Levente
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 To be more precise, in SimpleStackBasedCogit >> #genPrimitiveHashMultiply                 MoveCq: 16384 R: TempReg;                 MulR: TempReg R: highReg;                                               "highReg := (16r260D * (receiver bitShift: -14)) + (16r0065 * low)" could read:                 LogicalShiftLeftCq: 14 R: highReg;                                               "highReg := (16r260D * (receiver bitShift: -14)) + (16r0065 * low) bitShift: 14" (Note that I changed the comment too; the shift was missing.) Levente On Tue, 18 Apr 2017, Levente Uzonyi wrote: > > Hi Eliot, > > This all sounds pretty cool. > I wonder why you kept the multiplication by 16384, when you could have > just used #bitShift: 14 instead. If I read it correctly, this is like that > in the jitted code too. > I suppose the multiplication is a leftover from the time it was faster > than bit shift. > > Levente >
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Eliot Miranda-2   On 4/17/17 16:45 , Eliot Miranda wrote: > FYI, hash multiply is > hashMultiply > | low | > low := self bitAnd: 16383. > ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) > bitAnd: 16383) * 16384)) > bitAnd: 16r0FFFFFFF > > and when written as a primitive is > hashMultiply > | low | > low := self bitAnd: 16383. > ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * > 16384)) > bitAnd: 16r0FFFFFFF > because we don't care about the multiply by 16384 overflowing; when > written as a primitive it won't overflow into a LargePositiveInteger. Hopefully this doesn't mean the primitive got implemented by actually doing these operations verbatim.  As you have probably seen, that convoluted arithmetic is done this way in Smalltalk only to simulate a 32x32 multiplication bit-anded down to 28 bits without overflowing into large integers (the original code from August 2000 had my initials). At a sufficiently low level such as C, all that complexity is just an unsigned multiplication by 1664525.  The image code should still have a comment to that effect, did it get lost? Andres.
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Eliot Miranda-2  On Tue, Apr 18, 2017 at 7:45 AM, Eliot Miranda wrote: Hi All,    Clément and I have been looking at primitive performance in the context of string hash.  We'll say more about string hash in a later message (and there is nothing to be alarmed about here).  But it is worth describing the performance measurements of the various kinds of primitive dispatch in Cog.  The following measurements are collected from a 64-bit Squeak trunk image with a 64-bit Cog Spur VM with four different implementations of hashMultiply, three of them primitive.  The times are collected on a 2.3 GHz Intel Core i7 MacMini.The doit below compares the time taken to go round a loop containing 10 unrolled copies of an operation 10 million times, for a total of 100 million invocations of the operation.  The operations are- doing nothing- sending yourself- invoking the SmallInteger>>hashMultiply method- invoking a version of hashMultiply written as a normal interpreter primitive (more on this below)- invoking a version of hashMultiply written as a C function called from machine code- invoking a version of hashMultiply written as a machine code primitiveWhere does lowcode fit into this picture?cheers -ben
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Andres Valloud-4  2017-04-18 4:02 GMT+02:00 Andres Valloud : On 4/17/17 16:45 , Eliot Miranda wrote: FYI, hash multiply is hashMultiply | low | low := self bitAnd: 16383. ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF and when written as a primitive is hashMultiply | low | low := self bitAnd: 16383. ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * 16384)) bitAnd: 16r0FFFFFFF because we don't care about the multiply by 16384 overflowing; when written as a primitive it won't overflow into a LargePositiveInteger. Hopefully this doesn't mean the primitive got implemented by actually doing these operations verbatim.  As you have probably seen, that convoluted arithmetic is done this way in Smalltalk only to simulate a 32x32 multiplication bit-anded down to 28 bits without overflowing into large integers (the original code from August 2000 had my initials). At a sufficiently low level such as C, all that complexity is just an unsigned multiplication by 1664525.  The image code should still have a comment to that effect, did it get lost? Andres. More: if it's a 64 bit image, then we have 60 bit long unsigned small intsince 1664525 highBit = 21, and self is a hash result not exceeding 30 bits, we can implement hashMultiply as    ^self * 1664525 bitAnd: 16r0FFFFFFFMaybe the JIT can be given a second chance too.
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Andres Valloud-4   The methods in SmallInteger and LargePositiveInteger have the initials SqR (that's yours IIRC), but they have no comment. The variant in String is originally from 2001 and has never had that comment. Levente On Mon, 17 Apr 2017, Andres Valloud wrote: > > On 4/17/17 16:45 , Eliot Miranda wrote: >> FYI, hash multiply is >> hashMultiply >> | low | >> low := self bitAnd: 16383. >> ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) >> bitAnd: 16383) * 16384)) >> bitAnd: 16r0FFFFFFF >> >> and when written as a primitive is >> hashMultiply >> | low | >> low := self bitAnd: 16383. >> ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * >> 16384)) >> bitAnd: 16r0FFFFFFF >> because we don't care about the multiply by 16384 overflowing; when >> written as a primitive it won't overflow into a LargePositiveInteger. > > Hopefully this doesn't mean the primitive got implemented by actually > doing these operations verbatim.  As you have probably seen, that > convoluted arithmetic is done this way in Smalltalk only to simulate a > 32x32 multiplication bit-anded down to 28 bits without overflowing into > large integers (the original code from August 2000 had my initials). > > At a sufficiently low level such as C, all that complexity is just an > unsigned multiplication by 1664525.  The image code should still have a > comment to that effect, did it get lost? > > Andres. >
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 Yes, the SqR initials are mine.  That's too bad about the comments, at the very least there should be something like this: "This code performs a multiplication by 1664525 mod 2^28 without overflowing into large integers." On 4/18/17 1:48 , Levente Uzonyi wrote: > The methods in SmallInteger and LargePositiveInteger have the initials > SqR (that's yours IIRC), but they have no comment. > The variant in String is originally from 2001 and has never had that > comment. > > Levente > > On Mon, 17 Apr 2017, Andres Valloud wrote: > >> >> On 4/17/17 16:45 , Eliot Miranda wrote: >>> FYI, hash multiply is >>> hashMultiply >>> | low | >>> low := self bitAnd: 16383. >>> ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) >>> bitAnd: 16383) * 16384)) >>> bitAnd: 16r0FFFFFFF >>> >>> and when written as a primitive is >>> hashMultiply >>> | low | >>> low := self bitAnd: 16383. >>> ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * >>> 16384)) >>> bitAnd: 16r0FFFFFFF >>> because we don't care about the multiply by 16384 overflowing; when >>> written as a primitive it won't overflow into a LargePositiveInteger. >> >> Hopefully this doesn't mean the primitive got implemented by actually >> doing these operations verbatim.  As you have probably seen, that >> convoluted arithmetic is done this way in Smalltalk only to simulate a >> 32x32 multiplication bit-anded down to 28 bits without overflowing >> into large integers (the original code from August 2000 had my initials). >> >> At a sufficiently low level such as C, all that complexity is just an >> unsigned multiplication by 1664525.  The image code should still have >> a comment to that effect, did it get lost? >> >> Andres. >> >
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Nicolas Cellier   On Tue, Apr 18, 2017 at 10:27:41AM +0200, Nicolas Cellier wrote: >   > 2017-04-18 4:02 GMT+02:00 Andres Valloud <[hidden email]> > : > > > > > On 4/17/17 16:45 , Eliot Miranda wrote: > > > >> FYI, hash multiply is > >> hashMultiply > >> | low | > >> low := self bitAnd: 16383. > >> ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) > >> bitAnd: 16383) * 16384)) > >> bitAnd: 16r0FFFFFFF > >> > >> and when written as a primitive is > >> hashMultiply > >> | low | > >> low := self bitAnd: 16383. > >> ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * > >> 16384)) > >> bitAnd: 16r0FFFFFFF > >> because we don't care about the multiply by 16384 overflowing; when > >> written as a primitive it won't overflow into a LargePositiveInteger. > >> > > > > Hopefully this doesn't mean the primitive got implemented by actually > > doing these operations verbatim.  As you have probably seen, that > > convoluted arithmetic is done this way in Smalltalk only to simulate a > > 32x32 multiplication bit-anded down to 28 bits without overflowing into > > large integers (the original code from August 2000 had my initials). > > > > At a sufficiently low level such as C, all that complexity is just an > > unsigned multiplication by 1664525.  The image code should still have a > > comment to that effect, did it get lost? > > > > Andres. > > > > More: if it's a 64 bit image, then we have 60 bit long unsigned small int > since 1664525 highBit = 21, and self is a hash result not exceeding 30 > bits, we can implement hashMultiply as >     ^self * 1664525 bitAnd: 16r0FFFFFFF > > Maybe the JIT can be given a second chance too. AFAIK, primitiveStringHash has always been translated directly from ByteArray class>>hashBytes:startingWith: and the hashMultiply logic that we are discussing here currently is repeated in four different methods in Squeak trunk (String class>>stringHash:initialHash:, TypeString class>>stringHash:initialHash:, SmallInteger hashMultiply, and ByteArray class hashBytes:startingWith:). At this point, and particularly given the differences in various VMs, it would make sense to reimplement primitiveStringHash directly in MiscPrimitivesPlugin with whatever optimizations are needed (maybe differently based on size of a small integers), rather than translating it from a default implementation in the image. But I think we may be missing Eliot's main point. There is overhead associated with accessing the stack (at least on a stack interpreter) and in returning values to image, and this will be the case regardless of what calculations are being done in the primitive itself. So at least for some VMs we may be reaching the point where that overhead becomes a significant part of the performance profile. Whether that overhead would justify creating a new primitive callout mechanism may be another question, but it is certainly worth understanding where the overhead occurs, and how that performance profile changes as the Cog jit and Sista progress. Dave
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 2017-04-18 16:03 GMT+02:00 David T. Lewis : On Tue, Apr 18, 2017 at 10:27:41AM +0200, Nicolas Cellier wrote: > > 2017-04-18 4:02 GMT+02:00 Andres Valloud <[hidden email]> > : > > > > > On 4/17/17 16:45 , Eliot Miranda wrote: > > > >> FYI, hash multiply is > >> hashMultiply > >> | low | > >> low := self bitAnd: 16383. > >> ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) > >> bitAnd: 16383) * 16384)) > >> bitAnd: 16r0FFFFFFF > >> > >> and when written as a primitive is > >> hashMultiply > >> | low | > >> low := self bitAnd: 16383. > >> ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * > >> 16384)) > >> bitAnd: 16r0FFFFFFF > >> because we don't care about the multiply by 16384 overflowing; when > >> written as a primitive it won't overflow into a LargePositiveInteger. > >> > > > > Hopefully this doesn't mean the primitive got implemented by actually > > doing these operations verbatim.  As you have probably seen, that > > convoluted arithmetic is done this way in Smalltalk only to simulate a > > 32x32 multiplication bit-anded down to 28 bits without overflowing into > > large integers (the original code from August 2000 had my initials). > > > > At a sufficiently low level such as C, all that complexity is just an > > unsigned multiplication by 1664525.  The image code should still have a > > comment to that effect, did it get lost? > > > > Andres. > > > > More: if it's a 64 bit image, then we have 60 bit long unsigned small int > since 1664525 highBit = 21, and self is a hash result not exceeding 30 > bits, we can implement hashMultiply as >     ^self * 1664525 bitAnd: 16r0FFFFFFF > > Maybe the JIT can be given a second chance too. AFAIK, primitiveStringHash has always been translated directly from ByteArray class>>hashBytes:startingWith: and the hashMultiply logic that we are discussing here currently is repeated in four different methods in Squeak trunk (String class>>stringHash:initialHash:, TypeString class>>stringHash:initialHash:, SmallInteger hashMultiply, and ByteArray class hashBytes:startingWith:). At this point, and particularly given the differences in various VMs, it would make sense to reimplement primitiveStringHash directly in MiscPrimitivesPlugin with whatever optimizations are needed (maybe differently based on size of a small integers), rather than translating it from a default implementation in the image. Hi David,My opinion is rather that we should get rid of MiscPrimitivesPlugin.1) If String hash is performance critical (and it is), it's not really Misc.2) Eliot's post shows that regular primitive invocation is slow vs modern spur/jit, so it might not pay3) other than that, we don't properly version control MiscPrimitivePlugin source code   it's not handled in VMMaker repository nor in one of the well identified external plugin repositories.   but it's spreaded over Squeak and/or Pharo packages, and thus subject to uncontrolled changes and divergences Maybe it's time to do it. But I think we may be missing Eliot's main point. There is overhead associated with accessing the stack (at least on a stack interpreter) and in returning values to image, and this will be the case regardless of what calculations are being done in the primitive itself. So at least for some VMs we may be reaching the point where that overhead becomes a significant part of the performance profile. Whether that overhead would justify creating a new primitive callout mechanism may be another question, but it is certainly worth understanding where the overhead occurs, and how that performance profile changes as the Cog jit and Sista progress. Dave Yes, exactly, but the new callout mechanism has severe stack limitations preventing general purpose usage.Concerning the specific case of hashMultiply, since it's just an imul and a bitAnd, the primitive should be inlined by the VM just like other basic arithmetic operations IMO.So the remarks of Andres makes sense, even if we can't generalize to other Misc primitives.
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 On Tue, Apr 18, 2017 at 9:09 AM, Nicolas Cellier wrote: 2017-04-18 16:03 GMT+02:00 David T. Lewis : On Tue, Apr 18, 2017 at 10:27:41AM +0200, Nicolas Cellier wrote: > > 2017-04-18 4:02 GMT+02:00 Andres Valloud <[hidden email]> > : > > > > > On 4/17/17 16:45 , Eliot Miranda wrote: > > > >> FYI, hash multiply is > >> hashMultiply > >> | low | > >> low := self bitAnd: 16383. > >> ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) > >> bitAnd: 16383) * 16384)) > >> bitAnd: 16r0FFFFFFF > >> > >> and when written as a primitive is > >> hashMultiply > >> | low | > >> low := self bitAnd: 16383. > >> ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * > >> 16384)) > >> bitAnd: 16r0FFFFFFF > >> because we don't care about the multiply by 16384 overflowing; when > >> written as a primitive it won't overflow into a LargePositiveInteger. > >> > > > > Hopefully this doesn't mean the primitive got implemented by actually > > doing these operations verbatim.  As you have probably seen, that > > convoluted arithmetic is done this way in Smalltalk only to simulate a > > 32x32 multiplication bit-anded down to 28 bits without overflowing into > > large integers (the original code from August 2000 had my initials). > > > > At a sufficiently low level such as C, all that complexity is just an > > unsigned multiplication by 1664525.  The image code should still have a > > comment to that effect, did it get lost? > > > > Andres. > > > > More: if it's a 64 bit image, then we have 60 bit long unsigned small int > since 1664525 highBit = 21, and self is a hash result not exceeding 30 > bits, we can implement hashMultiply as >     ^self * 1664525 bitAnd: 16r0FFFFFFF > > Maybe the JIT can be given a second chance too. AFAIK, primitiveStringHash has always been translated directly from ByteArray class>>hashBytes:startingWith: and the hashMultiply logic that we are discussing here currently is repeated in four different methods in Squeak trunk (String class>>stringHash:initialHash:, TypeString class>>stringHash:initialHash:, SmallInteger hashMultiply, and ByteArray class hashBytes:startingWith:). At this point, and particularly given the differences in various VMs, it would make sense to reimplement primitiveStringHash directly in MiscPrimitivesPlugin with whatever optimizations are needed (maybe differently based on size of a small integers), rather than translating it from a default implementation in the image. Hi David,My opinion is rather that we should get rid of MiscPrimitivesPlugin.1) If String hash is performance critical (and it is), it's not really Misc.2) Eliot's post shows that regular primitive invocation is slow vs modern spur/jit, so it might not pay3) other than that, we don't properly version control MiscPrimitivePlugin source code   it's not handled in VMMaker repository nor in one of the well identified external plugin repositories.   but it's spreaded over Squeak and/or Pharo packages, and thus subject to uncontrolled changes and divergences Maybe it's time to do it.I agree. I am ready to collaborate on this and update the Pharo packages. For each primitive, we need to evaluate the performance with and without the primitive. In each case, we either move the primitive to numbered primitive or we just use normal Smalltalk code. We looked into string hashing with Eliot and it seems having only hashMultiply as a primitive is enough.  But I think we may be missing Eliot's main point. There is overhead associated with accessing the stack (at least on a stack interpreter) and in returning values to image, and this will be the case regardless of what calculations are being done in the primitive itself. So at least for some VMs we may be reaching the point where that overhead becomes a significant part of the performance profile. Whether that overhead would justify creating a new primitive callout mechanism may be another question, but it is certainly worth understanding where the overhead occurs, and how that performance profile changes as the Cog jit and Sista progress. Dave Yes, exactly, but the new callout mechanism has severe stack limitations preventing general purpose usage.The idea there is that each primitive can be analysed to determine if it can be called that way or not. I believe large integers primitives may be able to be called that way for example. Concerning the specific case of hashMultiply, since it's just an imul and a bitAnd, the primitive should be inlined by the VM just like other basic arithmetic operations IMO.So the remarks of Andres makes sense, even if we can't generalize to other Misc primitives.
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Andres Valloud-4  Hi Andres,On Tue, Apr 18, 2017 at 1:52 AM, Andres Valloud wrote: Yes, the SqR initials are mine.  That's too bad about the comments, at the very least there should be something like this: "This code performs a multiplication by 1664525 mod 2^28 without overflowing into large integers."Thanks.  I'll update the primitives appropriately.  Could you supply a reference both for the technique and the choice of  1664525 in particular?  I'd like to include that info in the comments too.On 4/18/17 1:48 , Levente Uzonyi wrote: The methods in SmallInteger and LargePositiveInteger have the initials SqR (that's yours IIRC), but they have no comment. The variant in String is originally from 2001 and has never had that comment. Levente On Mon, 17 Apr 2017, Andres Valloud wrote: On 4/17/17 16:45 , Eliot Miranda wrote: FYI, hash multiply is hashMultiply | low | low := self bitAnd: 16383. ^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF and when written as a primitive is hashMultiply | low | low := self bitAnd: 16383. ^(16r260D * low + (16r260D * (self bitShift: -14) + (16r0065 * low) * 16384)) bitAnd: 16r0FFFFFFF because we don't care about the multiply by 16384 overflowing; when written as a primitive it won't overflow into a LargePositiveInteger. Hopefully this doesn't mean the primitive got implemented by actually doing these operations verbatim.  As you have probably seen, that convoluted arithmetic is done this way in Smalltalk only to simulate a 32x32 multiplication bit-anded down to 28 bits without overflowing into large integers (the original code from August 2000 had my initials). At a sufficiently low level such as C, all that complexity is just an unsigned multiplication by 1664525.  The image code should still have a comment to that effect, did it get lost? Andres. -- _,,,^..^,,,_best, Eliot
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Andres Valloud-4   > On 18-04-2017, at 1:52 AM, Andres Valloud <[hidden email]> wrote: > > Yes, the SqR initials are mine.  That's too bad about the comments, at the very least there should be something like this: > > "This code performs a multiplication by 1664525 mod 2^28 without overflowing into large integers.” But.. .but.. I thought comments weren’t useful ? ;-) tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/timFractured Idiom:- HARLEZ-VOUS FRANCAIS? - Can you drive a French motorcycle?
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Nicolas Cellier   > On 18-04-2017, at 9:09 AM, Nicolas Cellier <[hidden email]> wrote: > > My opinion is rather that we should get rid of MiscPrimitivesPlugin. It’s been on my list of “things we ought to do” for, oh, 15 years? tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/timStrange OpCodes: PWB: Put to Waste Basket
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Nicolas Cellier  On Tue, Apr 18, 2017 at 6:09 PM, Nicolas Cellier wrote:Concerning the specific case of hashMultiply, since it's just an imul and a bitAnd, the primitive should be inlined by the VM just like other basic arithmetic operations IMO.So the remarks of Andres makes sense, even if we can't generalize to other Misc primitives.Yes, code optimized by Sista should be pretty much as fast as a primitive, so we could hopefully get rid of many primitives that are just there to improve performance.- Bert -
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Eliot Miranda-2   The hashing technique is just a multiplicative hash function.  The multiplication technique is equivalent to splitting AX into AH and AL, and using the distributive rule to (AH * 2^8 + AL) * (BH * 2^8 + BL). The choice of 1664525 is due to Mark van der Gulik, he took the constant out of a Knuth table of good linear congruency pseudo-RNG factors.  It's been 17 years since that, I do not currently think the pseudo-RNG relationship is that relevant.  In the mean time, as hashing goes, it's good enough. On 4/18/17 10:01 , Eliot Miranda wrote: > Hi Andres, > > On Tue, Apr 18, 2017 at 1:52 AM, Andres Valloud > <[hidden email] > > wrote: > > >     Yes, the SqR initials are mine.  That's too bad about the comments, >     at the very least there should be something like this: > >     "This code performs a multiplication by 1664525 mod 2^28 without >     overflowing into large integers." > > > Thanks.  I'll update the primitives appropriately.  Could you supply a > reference both for the technique and the choice of  1664525 in > particular?  I'd like to include that info in the comments too. > > >     On 4/18/17 1:48 , Levente Uzonyi wrote: > >         The methods in SmallInteger and LargePositiveInteger have the >         initials >         SqR (that's yours IIRC), but they have no comment. >         The variant in String is originally from 2001 and has never had that >         comment. > >         Levente > >         On Mon, 17 Apr 2017, Andres Valloud wrote: > > >             On 4/17/17 16:45 , Eliot Miranda wrote: > >                 FYI, hash multiply is >                 hashMultiply >                 | low | >                 low := self bitAnd: 16383. >                 ^(16r260D * low + ((16r260D * (self bitShift: -14) + >                 (16r0065 * low) >                 bitAnd: 16383) * 16384)) >                 bitAnd: 16r0FFFFFFF > >                 and when written as a primitive is >                 hashMultiply >                 | low | >                 low := self bitAnd: 16383. >                 ^(16r260D * low + (16r260D * (self bitShift: -14) + >                 (16r0065 * low) * >                 16384)) >                 bitAnd: 16r0FFFFFFF >                 because we don't care about the multiply by 16384 >                 overflowing; when >                 written as a primitive it won't overflow into a >                 LargePositiveInteger. > > >             Hopefully this doesn't mean the primitive got implemented by >             actually >             doing these operations verbatim.  As you have probably seen, >             that >             convoluted arithmetic is done this way in Smalltalk only to >             simulate a >             32x32 multiplication bit-anded down to 28 bits without >             overflowing >             into large integers (the original code from August 2000 had >             my initials). > >             At a sufficiently low level such as C, all that complexity >             is just an >             unsigned multiplication by 1664525.  The image code should >             still have >             a comment to that effect, did it get lost? > >             Andres. > > > > > > -- > _,,,^..^,,,_ > best, Eliot
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Bert Freudenberg  Hi Bert, Hi All,On Tue, Apr 18, 2017 at 10:55 AM, Bert Freudenberg wrote: On Tue, Apr 18, 2017 at 6:09 PM, Nicolas Cellier wrote:Concerning the specific case of hashMultiply, since it's just an imul and a bitAnd, the primitive should be inlined by the VM just like other basic arithmetic operations IMO.So the remarks of Andres makes sense, even if we can't generalize to other Misc primitives.Yes, code optimized by Sista should be pretty much as fast as a primitive, so we could hopefully get rid of many primitives that are just there to improve performance.The problem to think through carefully is the impact on the ContextInterpreter, StackInterpreter and SqueakJS VMs.  I have a gut feeling that the right thing to do is to keep the primitives for the benefit of these VMs and to have Cog ignore them.  That implies that the primitive failure code can be what one would like to execute and have Sista optimize.  That means that translated primitives are an obstacle.Rewriting the translated primitives as conventional primitives in MiscPrimitivePlugin and not including the MiscPrimitivePlugin in Cog (JIT) VMs (o including one which provides no primitives) would allow the primitive failure code in the various Collection methods to be written optimally, allow the interpreter VMs to keep using the primitives, allow Cog VMs to ignore the primitives easily, and would allow putting the primitives closer to the receivers.  Redirecting through to the class side wastes cycles._,,,^..^,,,_best, Eliot
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 In reply to this post by Andres Valloud-4  Hi Andres,On Tue, Apr 18, 2017 at 11:06 AM, Andres Valloud wrote: The hashing technique is just a multiplicative hash function.  The multiplication technique is equivalent to splitting AX into AH and AL, and using the distributive rule to (AH * 2^8 + AL) * (BH * 2^8 + BL). The choice of 1664525 is due to Mark van der Gulik, he took the constant out of a Knuth table of good linear congruency pseudo-RNG factors.  It's been 17 years since that, I do not currently think the pseudo-RNG relationship is that relevant.  In the mean time, as hashing goes, it's good enough.I'm looking for more of a reference, say to a paper which describes the technique, or a reference implementation on the web.  Remember that one important proper of 1664525 is that it belongs to a set of integers for which n bitCount * 2 between: n highBit and: n highBit + 2, i.e. that about half the bits are set.  Further, the bits are reasonably well distributed throughout, unlike for example 16rF0F0F0F0F.I'll look up the table in my Knuth vol 3. On 4/18/17 10:01 , Eliot Miranda wrote: Hi Andres, On Tue, Apr 18, 2017 at 1:52 AM, Andres Valloud <[hidden email] > wrote:     Yes, the SqR initials are mine.  That's too bad about the comments,     at the very least there should be something like this:     "This code performs a multiplication by 1664525 mod 2^28 without     overflowing into large integers." Thanks.  I'll update the primitives appropriately.  Could you supply a reference both for the technique and the choice of  1664525 in particular?  I'd like to include that info in the comments too.     On 4/18/17 1:48 , Levente Uzonyi wrote:         The methods in SmallInteger and LargePositiveInteger have the         initials         SqR (that's yours IIRC), but they have no comment.         The variant in String is originally from 2001 and has never had that         comment.         Levente         On Mon, 17 Apr 2017, Andres Valloud wrote:             On 4/17/17 16:45 , Eliot Miranda wrote:                 FYI, hash multiply is                 hashMultiply                 | low |                 low := self bitAnd: 16383.                 ^(16r260D * low + ((16r260D * (self bitShift: -14) +                 (16r0065 * low)                 bitAnd: 16383) * 16384))                 bitAnd: 16r0FFFFFFF                 and when written as a primitive is                 hashMultiply                 | low |                 low := self bitAnd: 16383.                 ^(16r260D * low + (16r260D * (self bitShift: -14) +                 (16r0065 * low) *                 16384))                 bitAnd: 16r0FFFFFFF                 because we don't care about the multiply by 16384                 overflowing; when                 written as a primitive it won't overflow into a                 LargePositiveInteger.             Hopefully this doesn't mean the primitive got implemented by             actually             doing these operations verbatim.  As you have probably seen,             that             convoluted arithmetic is done this way in Smalltalk only to             simulate a             32x32 multiplication bit-anded down to 28 bits without             overflowing             into large integers (the original code from August 2000 had             my initials).             At a sufficiently low level such as C, all that complexity             is just an             unsigned multiplication by 1664525.  The image code should             still have             a comment to that effect, did it get lost?             Andres. -- _,,,^..^,,,_ best, Eliot -- _,,,^..^,,,_best, Eliot
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Cog Primitive Performance

 The "technique" of multiplicative hash functions is well publicized, e.g. https://en.wikipedia.org/wiki/Universal_hashing (under hashing strings) https://en.wikipedia.org/wiki/Hash_function#Multiplicative_hashingand IIRC Knuth under hashing.  Part of the problem with multiplicative hash functions is that every other person chooses a different number and attaches their name to it.  So if you choose 13, 131, 1313 etc you have the K&R "digital method" hash, or if you choose an FNV factor then you have the FNV hashes, and so on.  But really they are all the same algorithm: namely, the evaluation of a polynomial with coefficients implied by the data at the factor chosen using Horner's rule. Note I did measure, and even Knuth's suggestions for factors did not substantially improve the outcome.  The entire class of multiplicative hash functions, as long as the factor is not really bad like 31 or 255, are "good enough".  In particular, the size of the factor doesn't matter as much provided it's sufficiently large compared to the quanta being hashed (e.g. bytes => factor [significantly] greater than 256). Overall, I checked 300+ hash functions against 100+ datasets and documented the results in the hashing book.  Many of the hash functions are multiplicative hash functions, either explicitly or in disguise. At one point I did find the RNG factor 1664525 on line 26 from Knuth's TAOCP vol 2 table of good linear congruence RNG multipliers.  However, that the factor is good for an RNG does not necessarily show it has to be a good factor for a multiplicative hash function.  In retrospect, that implication seems like an argument by osmosis. Andres. On 4/18/17 12:08 , Eliot Miranda wrote: > Hi Andres, > > On Tue, Apr 18, 2017 at 11:06 AM, Andres Valloud > <[hidden email] > > wrote: > > >     The hashing technique is just a multiplicative hash function.  The >     multiplication technique is equivalent to splitting AX into AH and >     AL, and using the distributive rule to (AH * 2^8 + AL) * (BH * 2^8 + >     BL). The choice of 1664525 is due to Mark van der Gulik, he took the >     constant out of a Knuth table of good linear congruency pseudo-RNG >     factors.  It's been 17 years since that, I do not currently think >     the pseudo-RNG relationship is that relevant.  In the mean time, as >     hashing goes, it's good enough. > > > I'm looking for more of a reference, say to a paper which describes the > technique, or a reference implementation on the web.  Remember that one > important proper of 1664525 is that it belongs to a set of integers for > which n bitCount * 2 between: n highBit and: n highBit + 2, i.e. that > about half the bits are set.  Further, the bits are reasonably well > distributed throughout, unlike for example 16rF0F0F0F0F. > > I'll look up the table in my Knuth vol 3. > > > >     On 4/18/17 10:01 , Eliot Miranda wrote: > >         Hi Andres, > >         On Tue, Apr 18, 2017 at 1:52 AM, Andres Valloud >         <[hidden email] >         >                 >> wrote: > > >             Yes, the SqR initials are mine.  That's too bad about the >         comments, >             at the very least there should be something like this: > >             "This code performs a multiplication by 1664525 mod 2^28 without >             overflowing into large integers." > > >         Thanks.  I'll update the primitives appropriately.  Could you >         supply a >         reference both for the technique and the choice of  1664525 in >         particular?  I'd like to include that info in the comments too. > > >             On 4/18/17 1:48 , Levente Uzonyi wrote: > >                 The methods in SmallInteger and LargePositiveInteger >         have the >                 initials >                 SqR (that's yours IIRC), but they have no comment. >                 The variant in String is originally from 2001 and has >         never had that >                 comment. > >                 Levente > >                 On Mon, 17 Apr 2017, Andres Valloud wrote: > > >                     On 4/17/17 16:45 , Eliot Miranda wrote: > >                         FYI, hash multiply is >                         hashMultiply >                         | low | >                         low := self bitAnd: 16383. >                         ^(16r260D * low + ((16r260D * (self bitShift: -14) + >                         (16r0065 * low) >                         bitAnd: 16383) * 16384)) >                         bitAnd: 16r0FFFFFFF > >                         and when written as a primitive is >                         hashMultiply >                         | low | >                         low := self bitAnd: 16383. >                         ^(16r260D * low + (16r260D * (self bitShift: -14) + >                         (16r0065 * low) * >                         16384)) >                         bitAnd: 16r0FFFFFFF >                         because we don't care about the multiply by 16384 >                         overflowing; when >                         written as a primitive it won't overflow into a >                         LargePositiveInteger. > > >                     Hopefully this doesn't mean the primitive got >         implemented by >                     actually >                     doing these operations verbatim.  As you have >         probably seen, >                     that >                     convoluted arithmetic is done this way in Smalltalk >         only to >                     simulate a >                     32x32 multiplication bit-anded down to 28 bits without >                     overflowing >                     into large integers (the original code from August >         2000 had >                     my initials). > >                     At a sufficiently low level such as C, all that >         complexity >                     is just an >                     unsigned multiplication by 1664525.  The image code >         should >                     still have >                     a comment to that effect, did it get lost? > >                     Andres. > > > > > >         -- >         _,,,^..^,,,_ >         best, Eliot > > > > > -- > _,,,^..^,,,_ > best, Eliot
12