Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Eliot Miranda-2

Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

Hi Chris,

interesting!

On Wed, Nov 12, 2014 at 1:40 PM, Chris Muller <[hidden email]> wrote:

> I finally tracked down why the keys of the #knownEnvironments
> Dictionary were changing when trying to build an image in Spur.
>
> It's because, in my core-extensions package, I override
> UndefinedObject>>#hash to have a hard-coded value independent of its
> #identityHash, to be safer with distributed systems which may be using
> nil in a hash calculation -- in case they would be different due to
> accessing with a Spur image, for example.
>
> Such a hash calculation is made for the keys of the #knownEnvironments
> Dictionary when the countries are nil. Because of the different hash
> value in Spur, the image would lock up when trying to load my
> core-extensions package, because it tried to access knownEnvironments
> as part of the load operation itself.
>
> The least-lazy way to "fix" it was to ensure something is populated in
> the LocaleID's 'country's. Whew!
>
> Anyway, something to be aware of -- anywhere we have true, false or
> nil used in a hash calculation, now has a different hash in Spur vs.
> Cog. Maybe we should think about separating those objects' logical
> "value" hash from their identityHash in trunk..? That could be useful
> when we move to 64-bit someday..
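
The failure mode described above (a key's hash changing after it was stored, so that lookups miss until the collection is rehashed) can be sketched with a toy open-addressing table. This is a rough model in Python, not Squeak code: `nil_hash`, `string_hash`, `locale_hash` and `ProbingTable` are all hypothetical names invented for the illustration, and the real Dictionary implementation differs in detail.

```python
# Toy model only: none of these names exist in Squeak.
nil_hash = {"value": 3}  # stand-in for whatever `nil hash` currently answers


def string_hash(text):
    """Deterministic toy string hash (Python's own str hash is salted per run)."""
    h = 0
    for ch in text:
        h = (h * 31 + ord(ch)) & 0x3FFFFF
    return h


def locale_hash(key):
    """Hash a (language, country) pair; a missing country contributes nil's hash."""
    language, country = key
    country_hash = nil_hash["value"] if country is None else string_hash(country)
    return string_hash(language) ^ country_hash


class ProbingTable:
    """Minimal linear-probing table keyed by (language, country) pairs."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity

    def _scan(self, key):
        # Start at the slot chosen by the hash and walk forward until we
        # find the key or an empty slot.
        n = len(self.slots)
        start = locale_hash(key) % n
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is None or self.slots[i][0] == key:
                return i
        raise RuntimeError("table full")

    def put(self, key, value):
        self.slots[self._scan(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._scan(key)]
        return None if entry is None else entry[1]

    def rehash(self):
        # Re-insert every entry so each lands where the *current* hash points.
        entries = [e for e in self.slots if e is not None]
        self.slots = [None] * len(self.slots)
        for key, value in entries:
            self.put(key, value)


def demo():
    nil_hash["value"] = 3
    table = ProbingTable()
    key = ("en", None)
    table.put(key, "English")
    before = table.get(key)
    nil_hash["value"] = 4  # nil's hash changes after the key was stored
    stale = table.get(key)  # probing now starts at a different slot
    table.rehash()
    after = table.get(key)
    return before, stale, after


print(demo())  # ('English', None, 'English')
```

In this model the entry is still physically in the table after the hash changes, but probing starts from a different slot and never reaches it. Rehashing restores the lookup, and giving the key a non-nil country removes the dependency on nil's hash altogether.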

Remember that Spur has a common header format for both 32-bit and 64-bit versions, so in both there is a 22-bit identityHash and hence the identityHashes of all objects in a 64-bit Spur image bootstrapped from a 32-bit Spur image will be _unchanged_. Convenient. So no need to worry. And it should be the case that a freshly bootstrapped 64-bit Spur image does not need to be rehashed to function properly.

But while we're on the subject, one thing we could do is arrange that Symbols have an identityHash based on their value. So when interning a string we'd compute its string hash and derive and assign the identityHash of the Symbol from the string hash. That would mean that when unpickling classes in e.g. Fuel we would not have to rehash method dictionaries, which would be very nice indeed.
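
As a rough illustration of that idea (again a Python model, not Squeak code), interning could derive the 22-bit identity hash from the string's contents, so that two independent images assign the same value to the same Symbol. The names `SymbolTable`, `Symbol` and `string_hash` are hypothetical, and the toy hash function is not Squeak's String hash.

```python
IDENTITY_HASH_BITS = 22  # width of the identityHash field mentioned above
IDENTITY_HASH_MASK = (1 << IDENTITY_HASH_BITS) - 1


def string_hash(text):
    """Toy content-based hash; an assumption, not Squeak's algorithm."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


class Symbol:
    def __init__(self, name, identity_hash):
        self.name = name
        self.identity_hash = identity_hash


class SymbolTable:
    """Stand-in for one image's symbol table."""

    def __init__(self):
        self._symbols = {}

    def intern(self, name):
        symbol = self._symbols.get(name)
        if symbol is None:
            # Derive the identity hash from the characters rather than
            # from an allocation-time counter or random value.
            symbol = Symbol(name, string_hash(name) & IDENTITY_HASH_MASK)
            self._symbols[name] = symbol
        return symbol


# Two separate "images" agree on the hash of the same selector.
image_a = SymbolTable()
image_b = SymbolTable()
a = image_a.intern("printOn:")
b = image_b.intern("printOn:")
print(a is b, a.identity_hash == b.identity_hash)
```

Because the slot a selector occupies in an identity-hashed method dictionary depends only on that hash and the table size, a dictionary laid out in one image would already be correctly ordered when materialized in another, which is why no rehash would be needed under this scheme.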

On Wed, Nov 12, 2014 at 3:26 PM, <[hidden email]> wrote:
> Chris Muller uploaded a new version of System to project The Trunk:
> http://source.squeak.org/trunk/System-cmm.689.mcz
>
> ==================== Summary ====================
>
> Name: System-cmm.689
> Author: cmm
> Time: 12 November 2014, 3:19:56.156 pm
> UUID: a1ffba24-42ff-4391-9387-4e8ee20e6b2a
> Ancestors: System-ul.688
>
> Populate all LocaleID's 'country's.
>
> =============== Diff against System-ul.688 ===============
>
> Item was added:
> + ----- Method: LocaleID class>>countryFor: (in category 'accessing') -----
> + countryFor: iso6391Code
> + "http://www.loc.gov/standards/iso639-2/php/code_list.php"
> + ^ iso6391Code
> + caseOf:
> + { ['af'] -> ['Afrikaans'].
> + ['ca'] -> ['Catalan'].
> + ['cs'] -> [ 'Czech'].
> + ['da'] -> [ 'Danish'].
> + ['de'] -> [ 'German'].
> + ['el'] -> [ 'Greek Modern'].
> + ['en'] -> [ 'English'].
> + ['es'] -> [ 'Spanish'].
> + ['eu'] -> [ 'Basque'].
> + ['fi'] -> [ 'Finnish'].
> + ['fo'] -> [ 'Faroese'].
> + ['fr'] -> [ 'French'].
> + ['ga'] -> [ 'Irish'].
> + ['gd'] -> [ 'Gaelic'].
> + ['hr'] -> [ 'Croatian'].
> + ['hu'] -> [ 'Hungarian'].
> + ['is'] -> [ 'Icelandic'].
> + ['it'] -> [ 'Italian'].
> + ['ja'] -> [ 'Japanese'].
> + ['ja-etoys'] -> [ 'Japanese'].
> + ['ko'] -> [ 'Korean'].
> + ['nl'] -> [ 'Dutch'].
> + ['no'] -> [ 'Norwegian'].
> + ['pt'] -> [ 'Portuguese'].
> + ['rm'] -> [ 'Romansh'].
> + ['ro'] -> [ 'Romainian'].
> + ['sk'] -> [ 'Slovak'].
> + ['sl'] -> [ 'Slovenian'].
> + ['sq'] -> [ 'Albanian'].
> + ['sv'] -> [ 'Swedish'].
> + ['sw'] -> [ 'Swahili'].
> + ['zh'] -> [ 'Chinese'] }
> + otherwise:
> + [ 'other' ]!
>
> Item was changed:
> ----- Method: LocaleID>>isoLanguage:isoCountry: (in category 'initialize') -----
> + isoLanguage: langString isoCountry: countryStringOrNil
> - isoLanguage: langString isoCountry: countryStringOrNil
> isoLanguage := langString.
> + isoCountry := countryStringOrNil ifNil: (self class countryFor: langString)!
> - isoCountry := countryStringOrNil!
>
> Item was changed:
> (PackageInfo named: 'System') postscript: '"Preferences already removed by hand, but whose state still lingers:"
> + LocaleID allInstances do:
> + [ : each | each
> + isoLanguage: each isoLanguage
> + isoCountry: (each isoCountry ifNil: [ each isoCountry ]) ].
> + LanguageEnvironment knownEnvironments rehash'!
> - Preferences removePreference: #upgradeIsMerge.
> - Preferences removePreference: #colorWhenPrettyPrinting.
> - Preferences removePreference: #promptForUpdateServer.
> - Preferences removePreference: #updateSavesFile.
> - Preferences removePreference: #updateFromServerAtStartup.'!
>
>

best,

Eliot

Chris Muller-3

Re: [squeak-dev] Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

>> Anyway, something to be aware of -- anywhere we have true, false or
>> nil used in a hash calculation, now has a different hash in Spur vs.
>> Cog. Maybe we should think about separating those objects' logical
>> "value" hash from their identityHash in trunk..? That could be useful
>> when we move to 64-bit someday..
>
> Remember that Spur has a common header format for both 32-bit and 64-bit
> versions, so in both there is a 22-bit identityHash and hence the
> identityHashes of all objects in a 64-bit Spur image bootstrapped from a
> 32-bit Spur image will be _unchanged_. Convenient. So no need to worry.
> And it should be the case that a freshly bootstrapped 64-bit Spur image does
> not need to be rehashed to function properly.

Surprising that their identityHash needs to change for Spur but not to
go to 64-bit..

Wait, I thought one of the benefits of 64-bit was to finally increase
that small identityHash?

Levente Uzonyi-2

Re: [Pharo-dev] [squeak-dev] Spur Identity Hash [was: The Trunk: System-cmm.689.mcz]

On Fri, 14 Nov 2014, Chris Muller wrote:

>>> Anyway, something to be aware of -- anywhere we have true, false or
>>> nil used in a hash calculation, now has a different hash in Spur vs.
>>> Cog. Maybe we should think about separating those objects' logical
>>> "value" hash from their identityHash in trunk..? That could be useful
>>> when we move to 64-bit someday..
>>
>> Remember that Spur has a common header format for both 32-bit and 64-bit
>> versions, so in both there is a 22-bit identityHash and hence the
>> identityHashes of all objects in a 64-bit Spur image bootstrapped from a
>> 32-bit Spur image will be _unchanged_. Convenient. So no need to worry.
>> And it should be the case that a freshly bootstrapped 64-bit Spur image does
>> not need to be rehashed to function properly.
>
> Surprising that their identityHash needs to change for Spur but not to
> go to 64-bit..
>
> Wait, I thought one of the benefits of 64-bit was to finally increase
> that small identityHash?

22 is already a lot more than the current 12. Current hashed collections
should give excellent performance up to 4 million elements with a 22-bit
identity hash. Insertion and lookup performance should be good up to 60
million elements, and removal performance should be good up to 20 million.
And you'll still be able to use hashed collections optimized for large
sizes[1][2] if you want to store more objects.
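
The 4 million figure lines up with the number of distinct values a 22-bit field can hold. A small Python calculation shows the arithmetic. It only counts distinct hash values and the average number of objects sharing one; the 20 and 60 million thresholds above come from measurement and are not derived here. The function names are made up for the illustration.

```python
def distinct_hashes(bits):
    """Number of different values an identity hash of the given width can take."""
    return 1 << bits


def average_objects_per_hash(object_count, bits):
    """Average number of objects sharing one hash value, assuming an even spread."""
    return object_count / distinct_hashes(bits)


for bits in (12, 22):
    print(bits, "bits:", distinct_hashes(bits), "distinct values")

for count in (1_000_000, 4_000_000, 60_000_000):
    print(
        count,
        "objects:",
        average_objects_per_hash(count, 12),
        "per value at 12 bits,",
        average_objects_per_hash(count, 22),
        "per value at 22 bits",
    )
```

With 12 bits there are only 4096 distinct values, so a million objects average roughly 244 per value. With 22 bits there are 4,194,304 values, so collisions only start to become unavoidable at around four million objects.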

Levente

[1] http://leves.web.elte.hu/LargeIdentityDictionary/
[2] http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png

P.S.: With a primitive I suggested long ago, the blue line on the picture
(#at:) could be as flat as the red (#includesKey:).