Hi folks. I am trying to understand a little how the identity hash is assigned and how it works in the SqueakVM. I have a couple of questions:

1) There are 12 bits in the object header for the hash. I saw that the value of this hash comes from #newObjectHash. But this method answers a 16-bit hash:

newObjectHash
	"Answer a new 16-bit pseudo-random number for use as an identity hash."
	lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
	^lastHash

Since the object header only has 12 bits, every place this method is used has to trim the result, e.g.:

	header1 := (classFormat bitAnd: 16r1FF00) bitOr: (hash << HashBitsOffset bitAnd: HashBits).

or

	hash := self newObjectHash.
	header := (self longAt: newOop) bitAnd: 16r1FFFF. "use old ccIndex, format, size, and header-type fields"
	header := header bitOr: ((hash << 17) bitAnd: 16r1FFE0000).

So we always need to trim it to 12 bits using (hash << HashBitsOffset bitAnd: HashBits). Now, I wonder: why doesn't #newObjectHash answer a 12-bit hash directly? Is it because of the performance of the hash function?

2) It looks to me like the answer is no, but can the hash of an object change? If so, why and where?

3) If #basicIdentityHash is ONLY used for Sets and friends, why not make the hash assignment as late/lazy as possible? I mean, instead of creating and assigning a hash at object creation, what if we did it in #hashBitsOf: oop, where we could check whether the hash is empty and, if so, assign one? Would this make sense, or would it be worse?

Thanks

Mariano
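To make the trimming concrete, here is a minimal C sketch of the 16-bit LCG from #newObjectHash together with the 12-bit header trim, using the shift (17) and mask (16r1FFE0000) from the second snippet quoted above. This is an illustration, not actual VM code; the seed value is an assumption.

```c
/* Sketch of the 16-bit pseudo-random generator from #newObjectHash,
   plus the 12-bit trim applied when the hash is packed into the
   object header. Constants come from the snippets in the question;
   the initial seed is an assumption. */

static unsigned lastHash = 1; /* assumed seed */

static unsigned newObjectHash(void) {
    /* 13849 + (27181 * lastHash) bitAnd: 65535 -> 16-bit result */
    lastHash = (13849u + 27181u * lastHash) & 65535u;
    return lastHash;
}

static unsigned headerHashBits(unsigned hash) {
    /* (hash << 17) bitAnd: 16r1FFE0000: shift into place, then mask.
       Only the low 12 bits of the 16-bit hash survive in the header. */
    return (hash << 17) & 0x1FFE0000u;
}
```

Note that the mask is what actually discards the top 4 bits of the 16-bit value: `headerHashBits(h) >> 17` always equals `h & 0xFFF`.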
On Thu, Jan 6, 2011 at 6:23 AM, Mariano Martinez Peck <[hidden email]> wrote:
IMO lazy hash assignment is far better, and I will implement it in the new object representation. VW has always had lazy hash creation.
Note that in Cog the hash is derived from the allocation pointer, not from the pseudo-random hash generator, avoiding a read-modify-write cycle in object allocation.

best
Eliot
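The lazy scheme discussed here can be sketched as follows. This is a hypothetical illustration of the idea from question 3, not actual VM code: all names (HASH_SHIFT, HASH_MASK, headerOf, nextHash, the toy object table) are invented for the sketch, and it assumes hash value 0 is reserved to mean "not yet hashed".

```c
/* Sketch of lazy identity-hash assignment: leave the header hash
   field zero at allocation, and only assign a hash the first time
   hashBitsOf() is called. All names here are hypothetical. */

#define HASH_SHIFT 17
#define HASH_MASK  0x1FFE0000u   /* 12 hash bits in the header */

static unsigned headers[8];       /* toy object table: header words */
static unsigned headerOf(int oop)              { return headers[oop]; }
static void setHeaderOf(int oop, unsigned h)   { headers[oop] = h; }

static unsigned hashSeed = 1;     /* assumed seed */
static unsigned nextHash(void) {  /* any generator works; LCG here */
    hashSeed = (13849u + 27181u * hashSeed) & 0xFFFu;
    return hashSeed ? hashSeed : 1u; /* reserve 0 for "no hash yet" */
}

unsigned hashBitsOf(int oop) {
    unsigned bits = (headerOf(oop) & HASH_MASK) >> HASH_SHIFT;
    if (bits == 0) {              /* never hashed: assign lazily */
        bits = nextHash();
        setHeaderOf(oop, headerOf(oop) | (bits << HASH_SHIFT));
    }
    return bits;
}
```

The cost moves from every allocation to the first identity-hash request, so objects that never go into an identity collection never pay it. One trade-off: reserving 0 as "unassigned" shrinks the usable hash range by one value.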
On Thu, Jan 6, 2011 at 6:33 PM, Eliot Miranda <[hidden email]> wrote:
Excellent. Marcus told me about this :)
Yeah, I saw it:

newObjectHash
	"Derive the new object hash from the allocation pointer. This is less costly than using lastHash because it avoids the read-modify-write cycle to update lastHash. Since the size of eden is a power of two and larger than the hash range this provides a well-distributed and fairly random set of values."
	<inline: true>
	^freeStart >> BytesPerWord
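A C sketch of the same idea: the hash is just the current allocation pointer shifted right, so allocation never touches a lastHash variable. Note the Slang `freeStart >> BytesPerWord` shifts by BytesPerWord *bits* (4 on a 32-bit VM), and that is what the sketch mirrors. The allocate() driver and the eden base address are invented for illustration; the 12-bit trim would still happen when the hash is stored in the header.

```c
/* Sketch of Cog's allocation-pointer-derived hash from the quoted
   #newObjectHash. freeStart models the eden allocation pointer;
   its initial value here is arbitrary. */

#define BYTES_PER_WORD 4

static unsigned long freeStart = 0x100000; /* toy eden pointer */

static unsigned long newObjectHash(void) {
    /* freeStart >> BytesPerWord: shift by BytesPerWord bits,
       exactly as the Slang expression does. No read-modify-write. */
    return freeStart >> BYTES_PER_WORD;
}

static unsigned long allocate(unsigned long nBytes) {
    unsigned long newOop = freeStart;
    unsigned long hash = newObjectHash(); /* hash of the new object */
    (void)hash; /* ...would be masked to 12 bits, stored in header */
    freeStart += nBytes;                  /* bump allocation pointer */
    return newOop;
}
```

Because eden's size is a power of two larger than the hash range, successive allocation pointers walk through the hash range and give well-distributed values, as the method comment says.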
In reply to this post by Eliot Miranda-2
Well, 10 years back I built a VM that precomputed all the random numbers, since it's a *small* amount. You'll note the code already stores the last seed, so storing an index wasn't hard; hash assignment then became a lookup in a static array plus an index increment. There was a few percent performance gain if you did Object new 100 million times, but I'm afraid nothing you could see in real life. This was on hardware where executing instructions cost computable microseconds.

The other adventure was tuning the allocation ("new") logic: not a complete rewrite, but refining the if-tree logic. That also gave a few percent, but again nothing in real life that would justify screwing with the code. Still, that was long ago; maybe things have changed. I still have the change sets, of course. Let me attach the email from July 19, 2003 12:52:37 PM PDT, "[VM] HashBits, a lazy way".

On 2011-01-06, at 9:33 AM, Eliot Miranda wrote:
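The precomputed-table idea John describes can be sketched like this. It is a hypothetical reconstruction, not his actual change set: the table size, seed, and pre-trimming to 12 bits are all assumptions (a 12-bit hash has at most 4096 distinct values, so the whole sequence is indeed small).

```c
/* Sketch of precomputing the pseudo-random hash sequence once at
   startup, so each allocation is just table[index++]. TABLE_SIZE,
   the seed, and the 12-bit pre-trim are assumptions. */

#define TABLE_SIZE 4096   /* power of two, so wrap is a cheap mask */

static unsigned hashTable[TABLE_SIZE];
static unsigned hashIndex;

void initHashTable(void) {
    unsigned lastHash = 1; /* assumed seed */
    for (int i = 0; i < TABLE_SIZE; i++) {
        lastHash = (13849u + 27181u * lastHash) & 65535u;
        hashTable[i] = lastHash & 0xFFFu; /* pre-trimmed to 12 bits */
    }
    hashIndex = 0;
}

unsigned newObjectHash(void) {
    unsigned h = hashTable[hashIndex];
    hashIndex = (hashIndex + 1) & (TABLE_SIZE - 1); /* wrap around */
    return h;
}
```

The allocation-time work drops to one load and one masked increment, which matches John's observation: measurable when you do nothing but allocate, invisible in real workloads.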
--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.   http://www.smalltalkconsulting.com
===========================================================================

Things are a bit quiet this morning, so I wandered over to my FreeBSD Intel box, downloaded the unix 3.5-development tar file, and built an Intel Squeak VM. I used an interp.c I had (from June 30th) which was a bit newer, but followed the same pattern as the one Ian is shipping: gnuified, use of global structure foo, use of the AHC changeset to localize variables in the interpreter.

For the 3.6.1bx interp.c I used:
- BitOfGCTuning-JMM, which removes the +1 in the headerTypeBytes lookup, plus some localization for the mark/sweep
- InterpreterFixes-3Point6 from Andreas
- CodeGenEnh-3Point6 from Andreas
- MakePrimPointXInternal-JMM, which makes point x/y internal in the interp.c
- allocateNoFillHashTable-JMM, which has improved allocation/no-fill & hash table lookup

Not quite the same; however, running tinyBenchmarks I noted:

	june30thVM  40,738,383 bytecodes/sec    972,076 sends/sec
	3.6.1bxVM   40,429,564 bytecodes/sec    984,169 sends/sec

Similar performance, but the 3.6.1bxVM has better sends/second, due to CodeGenEnh I think. Certainly one is not extraordinarily faster than the other. However, when I looked at allocation rates for 1 asFloat, 2@3, and Array new: 5, I saw:

	june30thVM  37,654   40,198   31,037
	3.6.1bxVM   75,952   82,524   39,325

Ah, much nicer, and quite remarkable compared to the PowerPC benchmarks. These changes did translate into a savings in the macroBenchmarks: 371,654 versus 376,568. It's hard to say whether the better performance is due to allocation or to faster sends/second; however, cutting small-object allocation time by 50% should show up somewhere. Someone will need to do a better degree of testing, to better isolate just the hash-table / old-hash differences?
Versus my wholesale before/after testing. And of course try this under Windows.

PS The Intel box is a dual Pentium 350MHz, which ran 49.5% busy during the testing. gcc version 2.95.4 was used; gcc 3.3.x might do a better job?