Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

7 messages

Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Eliot Miranda-2
Hi Chris,

    interesting!

On Wed, Nov 12, 2014 at 1:40 PM, Chris Muller <[hidden email]> wrote:
I finally tracked down why the keys of the #knownEnvironments
Dictionary were changing when trying to build an image in Spur.

It's because, in my core-extensions package, I override
UndefinedObject>>#hash to return a hard-coded value independent of its
#identityHash, to be safer with distributed systems that may be using
nil in a hash calculation, in case the hashes would differ when
accessed from a Spur image, for example.

Such a hash calculation is made for the keys of the #knownEnvironments
Dictionary when the countries are nil.  Because of the different hash value in
Spur, the image would lock up when trying to load my core-extensions
package, because it tried to access knownEnvironments as part of the
load-operation itself.
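A minimal Python sketch of this failure mode follows. A key that mixes in nil's hash is stored under one hash value. Once nil hashes differently, the lookup starts probing in the wrong slot and reports the key as absent until the table is rehashed, which is what the `LanguageEnvironment knownEnvironments rehash` postscript further down does. `Image`, `LocaleTable`, `string_hash` and `locale_hash` are hypothetical stand-ins. The hashing is illustrative only and is not Squeak's Dictionary or LocaleID code.

```python
class Image:
    """Hypothetical stand-in for a running image: nil_hash models nil's hash there."""

    def __init__(self, nil_hash: int):
        self.nil_hash = nil_hash


def string_hash(s: str) -> int:
    """Simple 32-bit string hash (illustrative, not Squeak's String>>hash)."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def locale_hash(image: Image, language: str, country) -> int:
    """Hash of a (language, country) key; a nil country contributes nil's hash."""
    country_hash = image.nil_hash if country is None else string_hash(country)
    return string_hash(language) ^ country_hash


class LocaleTable:
    """Linear-probing hash set of (language, country) keys (no growth in this sketch)."""

    def __init__(self, image: Image, size: int = 8):
        self.image = image
        self.slots = [None] * size

    def _start(self, key) -> int:
        return locale_hash(self.image, *key) % len(self.slots)

    def add(self, key) -> None:
        i = self._start(key)
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def __contains__(self, key) -> bool:
        i = self._start(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:
                return False  # reached an empty slot: the key looks absent
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False

    def rehash(self) -> None:
        """Re-insert every key using the *current* hash function."""
        keys = [k for k in self.slots if k is not None]
        self.slots = [None] * len(self.slots)
        for k in keys:
            self.add(k)


image = Image(nil_hash=1)
known = LocaleTable(image)
known.add(("en", None))
print(("en", None) in known)  # True
image.nil_hash = 2            # nil now hashes differently
print(("en", None) in known)  # False: the probe starts at the wrong slot
known.rehash()
print(("en", None) in known)  # True
```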

The least-lazy way to "fix" it was to ensure that every LocaleID's
'country' is populated.  Whew!

Anyway, something to be aware of -- anywhere we have true, false or
nil used in a hash calculation, now has a different hash in Spur vs.
Cog.  Maybe we should think about separating those objects' logical
"value" hash from their identityHash in trunk..?  That could be useful
when we move to 64-bit someday..

 Remember that Spur has a common header format for both 32-bit and 64-bit versions, so in both there is a 22-bit identityHash and hence the identityHashes of all objects in a 64-bit Spur image bootstrapped from a 32-bit Spur image will be _unchanged_.  Convenient. So no need to worry.  And it should be the case that a freshly bootstrapped 64-bit Spur image does not need to be rehashed to function properly.

But while we're on the subject, one thing we could do is arrange that Symbols have an identityHash based on their value.  So when interning a string we'd compute its string hash and derive and assign the identityHash of the Symbol from the string hash.  That would mean that when unpickling classes in e.g. Fuel we would not have to rehash method dictionaries, which would be very nice indeed.
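Eliot's suggestion can be sketched in Python under two assumptions: a 22-bit identityHash field, and 32-bit FNV-1a as a stand-in string hash. Squeak's own String hash is a different function. `SymbolTable` and `intern` are hypothetical names. The only point shown is that a hash derived from the characters comes out the same in every image, so the image that pickled a class and the image loading it agree on each Symbol's identityHash.

```python
HASH_BITS = 22                      # identityHash width discussed in this thread
HASH_MASK = (1 << HASH_BITS) - 1


def fnv1a_32(s: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of s (a stand-in string hash)."""
    h = 0x811C9DC5
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


class SymbolTable:
    """Interns names, giving each Symbol an identityHash derived from its characters."""

    def __init__(self):
        self._hashes = {}  # name -> identityHash of the interned Symbol

    def intern(self, name: str) -> int:
        if name not in self._hashes:
            # Derive the hash from the string rather than taking an arbitrary value,
            # so every image computes the same identityHash for the same Symbol.
            self._hashes[name] = fnv1a_32(name) & HASH_MASK
        return self._hashes[name]


# Two independent "images": the one that pickled a class and the one loading it.
writer, reader = SymbolTable(), SymbolTable()
print(writer.intern("printOn:") == reader.intern("printOn:"))  # True
```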

On Wed, Nov 12, 2014 at 3:26 PM,  <[hidden email]> wrote:
> Chris Muller uploaded a new version of System to project The Trunk:
> http://source.squeak.org/trunk/System-cmm.689.mcz
>
> ==================== Summary ====================
>
> Name: System-cmm.689
> Author: cmm
> Time: 12 November 2014, 3:19:56.156 pm
> UUID: a1ffba24-42ff-4391-9387-4e8ee20e6b2a
> Ancestors: System-ul.688
>
> Populate all LocaleID's 'country's.
>
> =============== Diff against System-ul.688 ===============
>
> Item was added:
> + ----- Method: LocaleID class>>countryFor: (in category 'accessing') -----
> + countryFor: iso6391Code
> +       "http://www.loc.gov/standards/iso639-2/php/code_list.php"
> +       ^ iso6391Code
> +               caseOf:
> +                       { ['af'] -> ['Afrikaans'].
> +                       ['ca'] -> ['Catalan'].
> +                       ['cs'] -> [ 'Czech'].
> +                       ['da'] -> [ 'Danish'].
> +                       ['de'] -> [ 'German'].
> +                       ['el'] -> [ 'Greek Modern'].
> +                       ['en'] -> [ 'English'].
> +                       ['es'] -> [ 'Spanish'].
> +                       ['eu'] -> [ 'Basque'].
> +                       ['fi'] -> [ 'Finnish'].
> +                       ['fo'] -> [ 'Faroese'].
> +                       ['fr'] -> [ 'French'].
> +                       ['ga'] -> [ 'Irish'].
> +                       ['gd'] -> [ 'Gaelic'].
> +                       ['hr'] -> [ 'Croatian'].
> +                       ['hu'] -> [ 'Hungarian'].
> +                       ['is'] -> [ 'Icelandic'].
> +                       ['it'] -> [ 'Italian'].
> +                       ['ja'] -> [ 'Japanese'].
> +                       ['ja-etoys'] -> [ 'Japanese'].
> +                       ['ko'] -> [ 'Korean'].
> +                       ['nl'] -> [ 'Dutch'].
> +                       ['no'] -> [ 'Norwegian'].
> +                       ['pt'] -> [ 'Portuguese'].
> +                       ['rm'] -> [ 'Romansh'].
> +                       ['ro'] -> [ 'Romanian'].
> +                       ['sk'] -> [ 'Slovak'].
> +                       ['sl'] -> [ 'Slovenian'].
> +                       ['sq'] -> [ 'Albanian'].
> +                       ['sv'] -> [ 'Swedish'].
> +                       ['sw'] -> [ 'Swahili'].
> +                       ['zh'] -> [ 'Chinese'] }
> +               otherwise:
> +                       [ 'other' ]!
>
> Item was changed:
>   ----- Method: LocaleID>>isoLanguage:isoCountry: (in category 'initialize') -----
> + isoLanguage: langString isoCountry: countryStringOrNil
> - isoLanguage: langString isoCountry: countryStringOrNil
>         isoLanguage := langString.
> +       isoCountry := countryStringOrNil ifNil: [self class countryFor: langString]!
> -       isoCountry := countryStringOrNil!
>
> Item was changed:
>   (PackageInfo named: 'System') postscript: '"Preferences already removed by hand, but whose state still lingers:"
> + LocaleID allInstances do:
> +       [ : each | each
> +               isoLanguage: each isoLanguage
> +               isoCountry: (each isoCountry ifNil: [ each isoCountry ]) ].
> + LanguageEnvironment knownEnvironments rehash'!
> - Preferences removePreference: #upgradeIsMerge.
> - Preferences removePreference: #colorWhenPrettyPrinting.
> - Preferences removePreference: #promptForUpdateServer.
> - Preferences removePreference: #updateSavesFile.
> - Preferences removePreference: #updateFromServerAtStartup.'!
>
>




--
best,
Eliot



Re: Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Chris Muller-3
>> Anyway, something to be aware of -- anywhere we have true, false or
>> nil used in a hash calculation, now has a different hash in Spur vs.
>> Cog.  Maybe we should think about separating those objects' logical
>> "value" hash from their identityHash in trunk..?  That could be useful
>> when we move to 64-bit someday..
>
>  Remember that Spur has a common header format for both 32-bit and 64-bit
> versions, so in both there is a 22-bit identityHash and hence the
> identityHashes of all objects in a 64-bit Spur image bootstrapped from a
> 32-bit Spur image will be _unchanged_.  Convenient. So no need to worry.
> And it should be the case that a freshly bootstrapped 64-bit Spur image does
> not need to be rehashed to function properly.

Surprising that their identityHash needs to change for Spur but not to
go to 64-bit..

Wait, I thought one of the benefits of 64-bit was to finally increase
that small identityHash?


Re: Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Bert Freudenberg

On 14.11.2014, at 17:56, Chris Muller <[hidden email]> wrote:

>>> Anyway, something to be aware of -- anywhere we have true, false or
>>> nil used in a hash calculation, now has a different hash in Spur vs.
>>> Cog.  Maybe we should think about separating those objects' logical
>>> "value" hash from their identityHash in trunk..?  That could be useful
>>> when we move to 64-bit someday..
>>
>> Remember that Spur has a common header format for both 32-bit and 64-bit
>> versions, so in both there is a 22-bit identityHash and hence the
>> identityHashes of all objects in a 64-bit Spur image bootstrapped from a
>> 32-bit Spur image will be _unchanged_.  Convenient. So no need to worry.
>> And it should be the case that a freshly bootstrapped 64-bit Spur image does
>> not need to be rehashed to function properly.
>
> Surprising that their identityHash needs to change for Spur
It didn't *have* to change. Eliot could have just re-used the old identity hash of nil, true, and false. He probably just didn't think to do that.

> but not to go to 64-bit..

Spur already increases the number of bits to 22. It does not increase it again for 64 bits. 4 M different hashes should be enough, just like 4 M possible classes should be enough ;)

> Wait, I thought one of the benefits of 64-bit was to finally increase
> that small identityHash?

22 > 10

- Bert -







Re: [Pharo-dev] [squeak-dev] Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Levente Uzonyi-2
In reply to this post by Chris Muller-3
On Fri, 14 Nov 2014, Chris Muller wrote:

>>> Anyway, something to be aware of -- anywhere we have true, false or
>>> nil used in a hash calculation, now has a different hash in Spur vs.
>>> Cog.  Maybe we should think about separating those objects' logical
>>> "value" hash from their identityHash in trunk..?  That could be useful
>>> when we move to 64-bit someday..
>>
>>  Remember that Spur has a common header format for both 32-bit and 64-bit
>> versions, so in both there is a 22-bit identityHash and hence the
>> identityHashes of all objects in a 64-bit Spur image bootstrapped from a
>> 32-bit Spur image will be _unchanged_.  Convenient. So no need to worry.
>> And it should be the case that a freshly bootstrapped 64-bit Spur image does
>> not need to be rehashed to function properly.
>
> Surprising that their identityHash needs to change for Spur but not to
> go to 64-bit..
>
> Wait, I thought one of the benefits of 64-bit was to finally increase
> that small identityHash?

22 is already a lot more than the current 12. Current hashed collections
should give excellent performance up to 4 million elements with a 22-bit
identity hash. Insertion and lookup performance should be good up to 60
million elements, and removal performance should be good up to 20 million.
And you'll still be able to use hashed collections optimized for large
sizes[1][2] if you want to store more objects.
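Here is some rough arithmetic behind these figures. It assumes identityHashes are spread uniformly, so the average number of objects sharing one identityHash value is the object count divided by the number of distinct hashes. The 12-bit case follows Levente's figure for the pre-Spur header, and `objects_per_hash` is a hypothetical helper.

```python
def objects_per_hash(n_objects: int, hash_bits: int) -> float:
    """Average number of objects sharing one identityHash value, assuming a uniform spread."""
    return n_objects / (1 << hash_bits)


for bits in (12, 22):
    print(bits, 1 << bits, objects_per_hash(1_000_000, bits))
# 12 4096 244.140625
# 22 4194304 0.2384185791015625
```

With 22 bits, a collection has to reach roughly 4 million identity-hashed elements before each hash value is shared by one object on average.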

Levente

[1] http://leves.web.elte.hu/LargeIdentityDictionary/
[2] http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png

P.S.: With a primitive I suggested long ago, the blue line on the picture
(#at:) could be as flat as the red (#includesKey:).


Re: Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Eliot Miranda-2
In reply to this post by Bert Freudenberg


On Fri, Nov 14, 2014 at 9:20 AM, Bert Freudenberg <[hidden email]> wrote:

On 14.11.2014, at 17:56, Chris Muller <[hidden email]> wrote:

>>> Anyway, something to be aware of -- anywhere we have true, false or
>>> nil used in a hash calculation, now has a different hash in Spur vs.
>>> Cog.  Maybe we should think about separating those objects' logical
>>> "value" hash from their identityHash in trunk..?  That could be useful
>>> when we move to 64-bit someday..
>>
>> Remember that Spur has a common header format for both 32-bit and 64-bit
>> versions, so in both there is a 22-bit identityHash and hence the
>> identityHashes of all objects in a 64-bit Spur image bootstrapped from a
>> 32-bit Spur image will be _unchanged_.  Convenient. So no need to worry.
>> And it should be the case that a freshly bootstrapped 64-bit Spur image does
>> not need to be rehashed to function properly.
>
> Surprising that their identityHash needs to change for Spur

It didn't *have* to change. Eliot could have just re-used the old identity hash of nil, true, and false. He probably just didn't think to do that.

Right. I thought since identityHashes would be changing (class identityHashes must change for Spur's class table, and there are more than 2^10 objects in an image) I would assign new hashes to all objects in the image that needed them and start with 1, 2 & 3 as the hashes for the first objects, nil, false & true.  Any system which relies on identityHashes not changing from V3 to Spur will be broken anyway, so why keep the hashes for those objects?

> but not to go to 64-bit..

Spur already increases the number of bits to 22. It does not increase it again for 64 bits. 4 M different hashes should be enough, just like 4 M possible classes should be enough ;)

> Wait, I thought one of the benefits of 64-bit was to finally increase
> that small identityHash?

22 > 10

Right (see both Bert's & Levente's responses).  Spur lifts the number of identityHashes from 2^10 to 2^22.  There's no room for more in a 64-bit system.  Spur is designed to go beyond 32 bits, but it isn't designed for terabyte heaps.  One step at a time ;-)

- Bert -

--
best,
Eliot



Re: Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Eliot Miranda-2


On Fri, Nov 14, 2014 at 10:17 AM, Eliot Miranda <[hidden email]> wrote:


Right (see both Bert's & Levente's responses).  Spur lifts the number of identityHashes from 2^10 to 2^22.  There's no room for more in a 64-bit system.  Spur is designed to go beyond 32 bits, but it isn't designed for terabyte heaps.  One step at a time ;-)

I should say no room for more hash bits in *this* 64-bit system.  Here's the Spur object header:

headerForSlots: numSlots format: formatField classIndex: classIndex
    <api>
    "The header format in LSB is
     MSB: | 8: numSlots       | (on a byte boundary)
          | 2 bits            | (msb,lsb = {isMarked,?})
          | 22: identityHash  | (on a word boundary)
          | 3 bits            | (msb <-> lsb = {isGrey,isPinned,isRemembered})
          | 5: format         | (on a byte boundary)
          | 2 bits            | (msb,lsb = {isImmutable,?})
          | 22: classIndex    | (on a word boundary) : LSB
     The remaining bits (7) are used for
        isImmutable   (bit 23)
        isRemembered  (bit 29)
        isPinned      (bit 30)
        isGrey        (bit 31)
        isMarked      (bit 55)
     leaving 2 unused bits, each next to a 22-bit field, allowing those fields to be
     expanded to 23 bits.  The three-bit field { isGrey, isPinned, isRemembered }
     is for bits that are never set in young objects.  This allows the remembered
     table to be pruned when full by using these bits as a reference count of
     newSpace objects from the remembered table.  Objects with a high count
     should be tenured to prune the remembered table."
    <returnTypeC: #usqLong>
    <inline: true>
    ^((self cCoerceSimple: numSlots to: #usqLong) << self numSlotsFullShift)
     + (formatField << self formatShift)
     + classIndex

I hope this makes sense...
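The bit positions in that comment can be checked with a small Python sketch that treats the header as one 64-bit integer. The `FIELDS` table is transcribed from the comment. `make_header` and `get_field` are hypothetical helpers, and unlike the Slang method, `make_header` also accepts an identityHash so that field can be inspected. The sketch does not model how a 32-bit VM splits the header into two words.

```python
# Field positions transcribed from the header comment (64-bit header, bit 0 = LSB).
FIELDS = {
    # name: (shift, width)
    "classIndex": (0, 22),
    "isImmutable": (23, 1),
    "format": (24, 5),
    "isRemembered": (29, 1),
    "isPinned": (30, 1),
    "isGrey": (31, 1),
    "identityHash": (32, 22),
    "isMarked": (55, 1),
    "numSlots": (56, 8),
}


def get_field(header: int, name: str) -> int:
    """Extract one named field from a packed header."""
    shift, width = FIELDS[name]
    return (header >> shift) & ((1 << width) - 1)


def make_header(num_slots: int, fmt: int, class_index: int, identity_hash: int = 0) -> int:
    """Pack the fields headerForSlots:format:classIndex: sets, plus an optional hash."""
    header = 0
    for name, value in (("numSlots", num_slots), ("format", fmt),
                        ("classIndex", class_index), ("identityHash", identity_hash)):
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        header |= value << shift
    return header


h = make_header(num_slots=3, fmt=1, class_index=35, identity_hash=0x2AAAAA)
print({name: get_field(h, name) for name in FIELDS})
```

Bits 22 and 54 are the two unused bits next to the 22-bit fields, and they stay clear in every header this sketch produces.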
--
best,
Eliot



Re: [Pharo-dev] [squeak-dev] Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Chris Muller-4
In reply to this post by Levente Uzonyi-2
>> Surprising that their identityHash needs to change for Spur but not to
>> go to 64-bit..
>>
>> Wait, I thought one of the benefits of 64-bit was to finally increase
>> that small identityHash?
>
> 22 is already a lot more than the current 12.

Ah, indeed, I forgot how utterly small the current identityHash is!  I
would of course like to go even bigger than 22.  I've been using the
MaIdentityDictionaries developed by Igor, which provide linked lists at
each of the 4096 slots so that collisions are less onerous.  That same
strategy under 22 bits will scale a LOT further.  Ohh, I should put a
#isRunningSpur check in there..
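The bucket-per-hash strategy described here can be sketched as follows. This is not Igor's MaIdentityDictionary. `Obj`, `ChainedIdentityDictionary` and `fill` are hypothetical, and identityHashes are drawn from a seeded random generator to stand in for header hashes. With 100,000 keys, 4096 chains (12 bits) average about 24 entries each, while 2^22 chains leave almost every chain with a single entry.

```python
import random


class Obj:
    """Stand-in for a Smalltalk object carrying a (hypothetical) header identityHash."""

    __slots__ = ("identity_hash",)

    def __init__(self, identity_hash: int):
        self.identity_hash = identity_hash


class ChainedIdentityDictionary:
    """One chain per identityHash value; lookups compare keys by identity."""

    def __init__(self, hash_bits: int):
        self.mask = (1 << hash_bits) - 1
        self.chains = {}  # hash value -> list of (key, value); sparse, not 2**bits lists

    def __setitem__(self, key: Obj, value) -> None:
        chain = self.chains.setdefault(key.identity_hash & self.mask, [])
        for i, (k, _) in enumerate(chain):
            if k is key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def __getitem__(self, key: Obj):
        for k, v in self.chains.get(key.identity_hash & self.mask, []):
            if k is key:
                return v
        raise KeyError(key)

    def longest_chain(self) -> int:
        return max((len(c) for c in self.chains.values()), default=0)


def fill(hash_bits: int, n: int, seed: int = 42) -> ChainedIdentityDictionary:
    """Insert n fresh objects whose identityHashes are random hash_bits-bit values."""
    rng = random.Random(seed)
    d = ChainedIdentityDictionary(hash_bits)
    for i in range(n):
        d[Obj(rng.getrandbits(hash_bits))] = i
    return d


# 12-bit hashes force at least one chain of 25+ entries (100,000 / 4096 > 24);
# 22-bit hashes keep chains very short.
print(fill(12, 100_000).longest_chain(), fill(22, 100_000).longest_chain())
```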

> Current hashed collections
> should give excellent performance up to 4 million elements with a 22-bit
> identity hash. Insertion and lookup performance should be good up to 60
> million elements, and removal performance should be good up to 20 million.
> And you'll still be able to use hashed collections optimized for large
> sizes[1][2] if you want to store more objects.

I did try your LargeIdentityDictionary a few years ago but for some
reason Magma test suite couldn't pass with it, and since I had Igor's
I didn't have quite enough urgency to debug why.

> Levente
>
> [1] http://leves.web.elte.hu/LargeIdentityDictionary/
> [2]
> http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png
>
> P.S.: With a primitive I suggested long ago, the blue line on the picture
> (#at:) could be as flat as the red (#includesKey:).