Questions about hash

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Questions about hash

Mariano Martinez Peck
 
Hi folks. I am trying to understand a little how hash is assigned and how it works in the SqueakVM. I have a couple of questions:

1) There are 12 bits in the ObjectHeader for the hash. I saw that the value of this hash comes from #newObjectHash. But....this method answers a hash of 16 bit:
newObjectHash
    "Answer a new 16-bit pseudo-random number for use as an identity hash."

    lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
    ^ lastHash

Since in the OH we have 12 bits, each place where this method is used, then we have to do something like this:

header1 := (classFormat bitAnd: 16r1FF00) bitOr: (hash << HashBitsOffset bitAnd: HashBits).

or

    hash := self newObjectHash.
    header := (self longAt: newOop) bitAnd: 16r1FFFF.
    "use old ccIndex, format, size, and header-type fields"
    header := header bitOr: ((hash << 17) bitAnd: 16r1FFE0000).

....
so, we always need to trim it to 12 bits using (hash << HashBitsOffset bitAnd: HashBits).

Now, I wonder, why #newObjectHash  does not answer a 12bits long hash directly ?  is it because of the hash function performance?


2)  It looks to me that not, but can it change the hash of an object? if true, why and where?

3) If #basicIdentityHash is ONLY used for Sets and friends, why not letting the hash assignment as late/lazy as possible? i mean, instead of creating and assigning a hash at object creation, what if we do this in #hashBitsOf: oop 
where we can check in the hash is empty and if true assign one.
Would this make sense or it would be worst?

Thanks

Mariano
Reply | Threaded
Open this post in threaded view
|

Re: Questions about hash

Eliot Miranda-2
 


On Thu, Jan 6, 2011 at 6:23 AM, Mariano Martinez Peck <[hidden email]> wrote:
 
Hi folks. I am trying to understand a little how hash is assigned and how it works in the SqueakVM. I have a couple of questions:

1) There are 12 bits in the ObjectHeader for the hash. I saw that the value of this hash comes from #newObjectHash. But....this method answers a hash of 16 bit:
newObjectHash
    "Answer a new 16-bit pseudo-random number for use as an identity hash."

    lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
    ^ lastHash

Since in the OH we have 12 bits, each place where this method is used, then we have to do something like this:

header1 := (classFormat bitAnd: 16r1FF00) bitOr: (hash << HashBitsOffset bitAnd: HashBits).

or

    hash := self newObjectHash.
    header := (self longAt: newOop) bitAnd: 16r1FFFF.
    "use old ccIndex, format, size, and header-type fields"
    header := header bitOr: ((hash << 17) bitAnd: 16r1FFE0000).

....
so, we always need to trim it to 12 bits using (hash << HashBitsOffset bitAnd: HashBits).

Now, I wonder, why #newObjectHash  does not answer a 12bits long hash directly ?  is it because of the hash function performance?


2)  It looks to me that not, but can it change the hash of an object? if true, why and where?

3) If #basicIdentityHash is ONLY used for Sets and friends, why not letting the hash assignment as late/lazy as possible? i mean, instead of creating and assigning a hash at object creation, what if we do this in #hashBitsOf: oop 
where we can check in the hash is empty and if true assign one.
Would this make sense or it would be worst?

IMO it is far better and I will implement this in the new object representation.  VW has always had lazy hash creation.

Note in Cog the hash is derived from the allocation pointer not from the pseudo-random hash generator, avoiding a read-modify-write cycle in object allocation.

best
Eliot


Thanks

Mariano


Reply | Threaded
Open this post in threaded view
|

Re: Questions about hash

Mariano Martinez Peck
 


On Thu, Jan 6, 2011 at 6:33 PM, Eliot Miranda <[hidden email]> wrote:
 


On Thu, Jan 6, 2011 at 6:23 AM, Mariano Martinez Peck <[hidden email]> wrote:
 
Hi folks. I am trying to understand a little how hash is assigned and how it works in the SqueakVM. I have a couple of questions:

1) There are 12 bits in the ObjectHeader for the hash. I saw that the value of this hash comes from #newObjectHash. But....this method answers a hash of 16 bit:
newObjectHash
    "Answer a new 16-bit pseudo-random number for use as an identity hash."

    lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
    ^ lastHash

Since in the OH we have 12 bits, each place where this method is used, then we have to do something like this:

header1 := (classFormat bitAnd: 16r1FF00) bitOr: (hash << HashBitsOffset bitAnd: HashBits).

or

    hash := self newObjectHash.
    header := (self longAt: newOop) bitAnd: 16r1FFFF.
    "use old ccIndex, format, size, and header-type fields"
    header := header bitOr: ((hash << 17) bitAnd: 16r1FFE0000).

....
so, we always need to trim it to 12 bits using (hash << HashBitsOffset bitAnd: HashBits).

Now, I wonder, why #newObjectHash  does not answer a 12bits long hash directly ?  is it because of the hash function performance?


2)  It looks to me that not, but can it change the hash of an object? if true, why and where?

3) If #basicIdentityHash is ONLY used for Sets and friends, why not letting the hash assignment as late/lazy as possible? i mean, instead of creating and assigning a hash at object creation, what if we do this in #hashBitsOf: oop 
where we can check in the hash is empty and if true assign one.
Would this make sense or it would be worst?

IMO it is far better and I will implement this in the new object representation.  VW has always had lazy hash creation.


Excellent. Marcus told me about this :)
 
Note in Cog the hash is derived from the allocation pointer not from the pseudo-random hash generator, avoiding a read-modify-write cycle in object allocation.


Yeah, I saw it:

newObjectHash
    "Derive the new object hash from the allocation pointer.  This is less costly than
     using lastHash because it avoids the read-modify-write cycle to update lastHash.
     Since the size of eden is a power of two and larger than the hash range this provides
     a well-distributed and fairly random set of values."
    <inline: true>
    ^freeStart >> BytesPerWord
 
best
Eliot


Thanks

Mariano




Reply | Threaded
Open this post in threaded view
|

Re: Questions about hash

johnmci
In reply to this post by Eliot Miranda-2
 
Well 10 years back I built a VM that precomputed all the random numbers since it's a *small* amount, you'll note the code does store the last 
seed so storing the index wasn't hard so it was lookup in static array, increment index. 

There was a few percent performance gain if you did Object new 100 million times. But I'm afraid nothing you could see in real life. This was on hardware
were executing instructions cost computable microseconds.  

The other adventure was tuning the New logic, not a complete rewrite but refining the IF tree logic, that resulted in a few percent, but again in real life nothing that would justify screwing with the code. Still that was long ago, maybe things have changed. I still have the changes sets of course.

Let me attach the email from July 19, 2003 12:52:37 PM PDT  [VM] HashBits, a lazy way  

On 2011-01-06, at 9:33 AM, Eliot Miranda wrote:



On Thu, Jan 6, 2011 at 6:23 AM, Mariano Martinez Peck <[hidden email]> wrote:
 
Hi folks. I am trying to understand a little how hash is assigned and how it works in the SqueakVM. I have a couple of questions:



--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================


Things are a bit quiet this morning so I wandered over to my freebsd intel box and downloaded
the unix 3.5-development tar file and build an intel squeak vm.

I used a interp.c  I had (from june 30th)  which was a bit newer, but followed the same pattern as the one that Ian is shipping, gnuified, use of global structure foo, use of AHC changeset to localize variables in the interpreter

For the 3.6.1bx interp.c that used
BitOfGCTuning-JMM
which removes the +1 in the headerTypeBytes lookup, plus some localization for the mark/sweep
InterpreterFixes-3Point6 from Andreas
CodeGenEnh-3Point6 from Andreas
MakePrimPointXInternal-JMM which makes point x/y internal in the interp.c
allocateNoFillHashTable-JMM which has improved allocation/nofill & hashtable lookup.

Not quite the same, however in running tinybenchmarks I noted

june 30thVM 40,738,383 bytecode/sec 972,076 sends/sec
3.6.1bxVM     40,429,564 bytecodes/sec 984,169 sends/sec

Similar performance, but the 3.6.1bxVM has better sends/second. Due to the CodeGenEnh I think...
Certainly one is not extraordinary faster than the other.

However  when I looked at allocation rates for 1 asFloat,  2@3, Array new: 5 I saw

june30thVM 37,654, 40,198, 31,037

3.6.1bxVM    75,952,  82,524, 39,325

Ah, much nicer, also quite remarkable as compared to the powerpc benchmarks.

These changes did translate into a savings in the macrobenchmarks.
371,654 versus 376,568
It's hard to say if the better performance is due to allocation, or to faster sends/second.
however cutting small object allocation time by 50%should show up somewhere.

Someone will need to do a better degree of testing, to better reflect just the hashtable/ old hash differences? Versus  my wholesale before/after testing. And of course try this under windows.

PS The intel box is a dual pentium 350Mhz, which ran 49.5% busy during the testing.
gcc version 2.95.4 was used, gcc 3.3.x might do a better job?