Smalltalk › Squeak › Squeak VM

identityHash bits not incremented for new objects?

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

8 messages Options

Igor Stasenko

identityHash bits not incremented for new objects?

I found that #testBecomeIdentityHash sometimes failing, sometimes not.

It seems like VM 'forgets' to produce different identityHash bits for
two consequently allocated objects,
while test assumes that they are always different.

If i insert a statement (see the code), test no longer fails.

testBecomeIdentityHash
"Note. The identity hash of both objects seems to change after the become:"

| a b c d |

a := 'ab' copy.
b := 'cd' copy.

>>> [ b identityHash = a identityHash ] whileTrue: [ b := b copy ].

c := a.
d := b.

a become: b.

self
assert: a identityHash = c identityHash;
assert: b identityHash = d identityHash;
deny: a identityHash = b identityHash.

A simple piece of code reveals the problem:

(1 to: 20) collect: [:i |
'ab' copy basicIdentityHash ] #(954 954 955 955 956 956 957 957 958
958 959 959 960 960 961 961 962 962 963 963)

--
Best regards,
Igor Stasenko AKA sig.

Mariano Martinez Peck

Re: [squeak-dev] identityHash bits not incremented for new objects?

Hi Igor. Does the same test work correctlly in a interpreter VM? because I can see:

ObjectMemory >> newObjectHash
    "Answer a new 16-bit pseudo-random number for use as an identity hash."

    lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
    ^ lastHash

In which case, it seems that every new hash will be different than the previous one.

But NewObjectMemory >> newObjectHash
    "Derive the new object hash from the allocation pointer. This is less costly than
    using lastHash because it avoids the read-modify-write cycle to update lastHash.
    Since the size of eden is a power of two and larger than the hash range this provides
    a well-distributed and fairly random set of values."
    <inline: true>
    ^freeStart >> BytesPerWord

so...I have no idea, but maybe for two consequently allocated objects, the instVar freeStart can remain with the same value ?

On Mon, Aug 1, 2011 at 8:43 AM, Igor Stasenko <[hidden email]> wrote:

I found that #testBecomeIdentityHash sometimes failing, sometimes not.

It seems like VM 'forgets' to produce different identityHash bits for
two consequently allocated objects,
while test assumes that they are always different.

If i insert a statement (see the code), test no longer fails.

testBecomeIdentityHash
"Note. The identity hash of both objects seems to change after the become:"

| a b c d |

a := 'ab' copy.
b := 'cd' copy.

>>> [ b identityHash = a identityHash ] whileTrue: [ b := b copy ].

c := a.
d := b.

a become: b.

self
assert: a identityHash = c identityHash;
assert: b identityHash = d identityHash;
deny: a identityHash = b identityHash.

A simple piece of code reveals the problem:

(1 to: 20) collect: [:i |
'ab' copy basicIdentityHash ] #(954 954 955 955 956 956 957 957 958
958 959 959 960 960 961 961 962 962 963 963)

--
Best regards,
Igor Stasenko AKA sig.

--
Mariano
http://marianopeck.wordpress.com

Igor Stasenko

Re: [squeak-dev] identityHash bits not incremented for new objects?

On 1 August 2011 09:47, Mariano Martinez Peck <[hidden email]> wrote:

>
> Hi Igor. Does the same test work correctlly in a interpreter VM? because I can see:
>
> ObjectMemory >> newObjectHash
>     "Answer a new 16-bit pseudo-random number for use as an identity hash."
>
>     lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
>     ^ lastHash
>
>
> In which case, it seems that every new hash will be different than the previous one.
>
> But NewObjectMemory >> newObjectHash
>     "Derive the new object hash from the allocation pointer. This is less costly than
>     using lastHash because it avoids the read-modify-write cycle to update lastHash.
>     Since the size of eden is a power of two and larger than the hash range this provides
>     a well-distributed and fairly random set of values."
>     <inline: true>
>     ^freeStart >> BytesPerWord

aha.. so this is the reason why hash is not incrementing!
Since BytesPerWord == 4
it means that hash will change only if old freeStart - new freeStart value >= 16

and since two short strings 'ab' fit in 16 bytes, they receive same hash value.

So, instead it should be:

^freeStart // BytesPerWord

since freeStart is aligned on BytesPerWord, but not (2^BytesPerWord)

>
> so...I have no idea, but maybe for two consequently allocated objects, the instVar freeStart can remain with the same value ?
>
>
>
> On Mon, Aug 1, 2011 at 8:43 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> I found that #testBecomeIdentityHash sometimes failing, sometimes not.
>>
>> It seems like VM 'forgets' to produce different identityHash bits for
>> two consequently allocated objects,
>> while test assumes that they are always different.
>>
>> If i insert a statement (see the code), test no longer fails.
>>
>> testBecomeIdentityHash
>> "Note. The identity hash of both objects seems to change after the become:"
>>
>> | a b c d |
>>
>> a := 'ab' copy.
>> b := 'cd' copy.
>>
>> >>> [ b identityHash = a identityHash ] whileTrue: [ b := b copy ].
>>
>> c := a.
>> d := b.
>>
>> a become: b.
>>
>> self
>> assert: a identityHash = c identityHash;
>> assert: b identityHash = d identityHash;
>> deny: a identityHash = b identityHash.
>>
>> A simple piece of code reveals the problem:
>>
>> (1 to: 20) collect: [:i |
>> 'ab' copy basicIdentityHash ] #(954 954 955 955 956 956 957 957 958
>> 958 959 959 960 960 961 961 962 962 963 963)
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>
>

--
Best regards,
Igor Stasenko AKA sig.

David T. Lewis

Re: [squeak-dev] identityHash bits not incremented for new objects?

On Mon, Aug 01, 2011 at 10:03:04AM +0200, Igor Stasenko wrote:

>
> On 1 August 2011 09:47, Mariano Martinez Peck <[hidden email]> wrote:
> >
> > Hi Igor. Does the same test work correctlly in a interpreter VM??? because I can see:
> >
> > ObjectMemory >> newObjectHash
> > ?????? "Answer a new 16-bit pseudo-random number for use as an identity hash."
> >
> > ?????? lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
> > ?????? ^ lastHash
> >
> >
> > In which case, it seems that every new hash will be different than the previous one.
> >
> > But NewObjectMemory >> newObjectHash
> > ?????? "Derive the new object hash from the allocation pointer.?? This is less costly than
> > ?????? ??using lastHash because it avoids the read-modify-write cycle to update lastHash.
> > ?????? ??Since the size of eden is a power of two and larger than the hash range this provides
> > ?????? ??a well-distributed and fairly random set of values."
> > ?????? <inline: true>
> > ?????? ^freeStart >> BytesPerWord
>
> aha.. so this is the reason why hash is not incrementing!
> Since BytesPerWord == 4
> it means that hash will change only if old freeStart - new freeStart value >= 16
>
> and since two short strings 'ab' fit in 16 bytes, they receive same hash value.
>
> So, instead it should be:
>
> ^freeStart // BytesPerWord
>
> since freeStart is aligned on BytesPerWord, but not (2^BytesPerWord)
>

Nice catch Igor!

I think the intented implementation was probably this:

^freeStart >> ShiftForWord

Nasty sort of bug, it's the sort of typo that looks fine until you
stop to think about it ;)

Dave

Nicolas Cellier

Re: [squeak-dev] identityHash bits not incremented for new objects?

Well distributed ?
In examples like Igor's one you'll likely get consecutive hashes...

Nicolas

2011/8/1 David T. Lewis <[hidden email]>:

>
> On Mon, Aug 01, 2011 at 10:03:04AM +0200, Igor Stasenko wrote:
>>
>> On 1 August 2011 09:47, Mariano Martinez Peck <[hidden email]> wrote:
>> >
>> > Hi Igor. Does the same test work correctlly in a interpreter VM??? because I can see:
>> >
>> > ObjectMemory >> newObjectHash
>> > ?????? "Answer a new 16-bit pseudo-random number for use as an identity hash."
>> >
>> > ?????? lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
>> > ?????? ^ lastHash
>> >
>> >
>> > In which case, it seems that every new hash will be different than the previous one.
>> >
>> > But NewObjectMemory >> newObjectHash
>> > ?????? "Derive the new object hash from the allocation pointer.?? This is less costly than
>> > ?????? ??using lastHash because it avoids the read-modify-write cycle to update lastHash.
>> > ?????? ??Since the size of eden is a power of two and larger than the hash range this provides
>> > ?????? ??a well-distributed and fairly random set of values."
>> > ?????? <inline: true>
>> > ?????? ^freeStart >> BytesPerWord
>>
>> aha.. so this is the reason why hash is not incrementing!
>> Since BytesPerWord == 4
>> it means that hash will change only if old freeStart - new freeStart value >= 16
>>
>> and since two short strings 'ab' fit in 16 bytes, they receive same hash value.
>>
>> So, instead it should be:
>>
>> ^freeStart // BytesPerWord
>>
>> since freeStart is aligned on BytesPerWord, but not (2^BytesPerWord)
>>
>
> Nice catch Igor!
>
> I think the intented implementation was probably this:
>
> ^freeStart >> ShiftForWord
>
> Nasty sort of bug, it's the sort of typo that looks fine until you
> stop to think about it ;)
>
> Dave
>
>

David T. Lewis

Re: [squeak-dev] identityHash bits not incremented for new objects?

On Mon, Aug 01, 2011 at 01:55:16PM +0200, Nicolas Cellier wrote:
>
> Well distributed ?
> In examples like Igor's one you'll likely get consecutive hashes...
>
> Nicolas

I don't know, but I would say that if you have a suitable hash test, please
try comparing the results of that test on Cog versus a interpreter VM (and
check it again after Igor's patch becomes available) to see if it makes
a significant difference.

Dave

>
> 2011/8/1 David T. Lewis <[hidden email]>:
> >
> > On Mon, Aug 01, 2011 at 10:03:04AM +0200, Igor Stasenko wrote:
> >>
> >> On 1 August 2011 09:47, Mariano Martinez Peck <[hidden email]> wrote:
> >> >
> >> > Hi Igor. Does the same test work correctlly in a interpreter VM??? because I can see:
> >> >
> >> > ObjectMemory >> newObjectHash
> >> > ?????? "Answer a new 16-bit pseudo-random number for use as an identity hash."
> >> >
> >> > ?????? lastHash := 13849 + (27181 * lastHash) bitAnd: 65535.
> >> > ?????? ^ lastHash
> >> >
> >> >
> >> > In which case, it seems that every new hash will be different than the previous one.
> >> >
> >> > But NewObjectMemory >> newObjectHash
> >> > ?????? "Derive the new object hash from the allocation pointer.?? This is less costly than
> >> > ?????? ??using lastHash because it avoids the read-modify-write cycle to update lastHash.
> >> > ?????? ??Since the size of eden is a power of two and larger than the hash range this provides
> >> > ?????? ??a well-distributed and fairly random set of values."
> >> > ?????? <inline: true>
> >> > ?????? ^freeStart >> BytesPerWord
> >>
> >> aha.. so this is the reason why hash is not incrementing!
> >> Since BytesPerWord == 4
> >> it means that hash will change only if old freeStart - new freeStart value >= 16
> >>
> >> and since two short strings 'ab' fit in 16 bytes, they receive same hash value.
> >>
> >> So, instead it should be:
> >>
> >> ? ? ^freeStart // BytesPerWord
> >>
> >> since freeStart is aligned on BytesPerWord, but not (2^BytesPerWord)
> >>
> >
> > Nice catch Igor!
> >
> > I think the intented implementation was probably this:
> >
> > ? ? ? ?^freeStart >> ShiftForWord
> >
> > Nasty sort of bug, it's the sort of typo that looks fine until you
> > stop to think about it ;)
> >
> > Dave
> >
> >

Henrik Sperre Johansen

Re: [Pharo-project] identityHash bits not incremented for new objects?

In reply to this post by Igor Stasenko

On Aug 2, 2011, at 5:09 51PM, David T. Lewis wrote:

> On Tue, Aug 02, 2011 at 03:31:30PM +0200, Henrik Johansen wrote:
>>
>> On Aug 2, 2011, at 3:14 15PM, Igor Stasenko wrote:
>>
>>> On 2 August 2011 14:18, Henrik Johansen <[hidden email]> wrote:
>>>>
>>>> On Aug 1, 2011, at 8:43 53AM, Igor Stasenko wrote:
>>>>
>>>>> I found that #testBecomeIdentityHash sometimes failing, sometimes not.
>>>>>
>>>>> It seems like VM 'forgets' to produce different identityHash bits for
>>>>> two consequently allocated objects,
>>>>> while test assumes that they are always different.
>>>>>
>>>>> If i insert a statement (see the code), test no longer fails.
>>>>>
>>>>> testBecomeIdentityHash
>>>>> "Note. The identity hash of both objects seems to change after the become:"
>>>>>
>>>>> | a b c d |
>>>>>
>>>>> a := 'ab' copy.
>>>>> b := 'cd' copy.
>>>>>
>>>>>>>> [ b identityHash = a identityHash ] whileTrue: [ b := b copy ].
>>>>>
>>>>> c := a.
>>>>> d := b.
>>>>>
>>>>> a become: b.
>>>>>
>>>>> self
>>>>> assert: a identityHash = c identityHash;
>>>>> assert: b identityHash = d identityHash;
>>>>> deny: a identityHash = b identityHash.
>>>>>
>>>>> A simple piece of code reveals the problem:
>>>>>
>>>>> (1 to: 20) collect: [:i |
>>>>> 'ab' copy basicIdentityHash ] #(954 954 955 955 956 956 957 957 958
>>>>> 958 959 959 960 960 961 961 962 962 963 963)
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Igor Stasenko AKA sig.
>>>>
>>>> What is the point of this test in the first place?
>>>> To me it seems to test whether a VM's implementation of become: adheres to some properties.
>>>> At least, VM bugs in traversal of references during become is the only way I could see the two first asserts EVER being false.
>>>>
>>>> As for the last assert (well, deny actually), in the context of what it's supposed to test as established above, it seems to me to say a VM should never give the same identityHash to subsequently copied objects. Which means
>>>> a) It has nothing to do with become and identityHash, thus is in the test.
>>>> b) Is this a property any code actually rely on?
>>>>
>>>> If not, why would we assert such a restriction on the VM implementation, or is it harmless?
>>>>
>>>> Either way, modifying the test to iterate till they ARE different, would make it meaningless in either case (and possible infinite loop in case of a really bad VM).
>>>>
>>>
>>> There is strong reason to test identityHashes, because VM should
>>> preserve an identity hash for oops.
>>>
>>> But of course test is wrong , because to test that , it should be like
>>> following:
>>>
>>> | a b hash1 hash2 |
>>>
>>> a := 'ab' copy.
>>> b := 'cd' copy.
>>> hash1 := a identityHash.
>>> hash2 := b identityHash.
>>>
>>> a become: b.
>>>
>>> self
>>> assert: a identityHash = hash1;
>>> assert: b identityHash = hash2.
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>
>> Please include a comment with that. :)
>> Like this has shown, it's rather hard to know wtf to do when you can't unambiguously read what its actually intended to test from the code itself.
>
> Henry, FYI Igor has found a small but important bug in the VM. There is
> some additional follow up discussion on the vm-dev list that you can read
> starting here:
>
> <http://lists.squeakfoundation.org/pipermail/vm-dev/2011-August/009006.html>
>
> Dave

I saw, and it does eliminate the common case where two small objects get the same identityHash.
Though, it doesn't really _ensure_ that two subsequently allocated objects will have different identityHashes, does it?
IE. if an object is allocated with size which increases (freeStart >> someInt) by 4096, won't you still see the next object getting the same identityHash?

Not to mention power-of-two sized objects will have a _really_ low number of different hashes. (though none subsequently equal)
Hence my question if this is really an important property to require of the VM.
Considering the amount of times I've seen complaints of Set performance due to bad hashing under Cog (in other words, 0), I don't think so.

Like Nicolas, I am rather skeptic to the comment saying the values are "well-distributed and fairly random" (which is both true and false, depending on how you interpret "distributed" and "random") though, and would like it if it additionally pointed out edge-cases like above.

Cheers,
Henry

PS.
What's the state of NewObjectMemory usage in VMMaker?
When I loaded it, I couldn't find definition/initialization of HashMaskUnshifted, in VMMaker.oscog those were in VMSqueakV3ObjectRepresentationConstants/ObjectMemory respectively.

David T. Lewis

Re: [Pharo-project] identityHash bits not incremented for new objects?

On Wed, Aug 03, 2011 at 04:54:09PM +0200, Henrik Johansen wrote:

>
> On Aug 2, 2011, at 5:09 51PM, David T. Lewis wrote:
>
> > On Tue, Aug 02, 2011 at 03:31:30PM +0200, Henrik Johansen wrote:
> >>
> >> Please include a comment with that. :)
> >> Like this has shown, it's rather hard to know wtf to do when you can't unambiguously read what its actually intended to test from the code itself.
> >
> > Henry, FYI Igor has found a small but important bug in the VM. There is
> > some additional follow up discussion on the vm-dev list that you can read
> > starting here:
> >
> > <http://lists.squeakfoundation.org/pipermail/vm-dev/2011-August/009006.html>
> >
> > Dave
>
> I saw, and it does eliminate the common case where two small objects get the same identityHash.
> Though, it doesn't really _ensure_ that two subsequently allocated objects will have different identityHashes, does it?
> IE. if an object is allocated with size which increases (freeStart >> someInt) by 4096, won't you still see the next object getting the same identityHash?
>
> Not to mention power-of-two sized objects will have a _really_ low number of different hashes. (though none subsequently equal)
> Hence my question if this is really an important property to require of the VM.
> Considering the amount of times I've seen complaints of Set performance due to bad hashing under Cog (in other words, 0), I don't think so.
>
> Like Nicolas, I am rather skeptic to the comment saying the values are "well-distributed and fairly random" (which is both true and false, depending on how you interpret "distributed" and "random") though, and would like it if it additionally pointed out edge-cases like above.
>

I can't comment on that, I just wanted to give credit to Igor for
a good bug catch :)

> Cheers,
> Henry
>
> PS.
> What's the state of NewObjectMemory usage in VMMaker?
> When I loaded it, I couldn't find definition/initialization of HashMaskUnshifted, in
> VMMaker.oscog those were in VMSqueakV3ObjectRepresentationConstants/ObjectMemory respectively.

For NewObjectMemory, StackInterpreter, etc you should use only the oscog
VMMaker branch blessed by Eliot (or the related development updates
from Igor or others). I have made a preliminary effort to bring some
of this into VMMaker trunk, primarily updating it to work with the
64bit/32bit object memory support in VMMaker trunk (e.g. using
'self bytesPerWord' rather than BytesPerWord). But this is not suitable
for general use and you definitely cannot yet produce a working
interpreter with it, so please use Eliot's oscog branch if you are
building a StackInterpreter or Cog.

As always, VMMaker trunk uses the Interpreter and ObjectMemory classes
to produce a standard interpreter VM compatible with the Subversion
trunk platform sources, and with generated code suitable for 32bit
and 64bit images.

Dave