Smalltalk - Re: What is slowing Glorp down north of 10,000 objects in the Transaction's undoMap?

Posted by Tom Robinson on Nov 06, 2020; 12:12pm
URL: https://forum.world.st/What-is-slowing-Glorp-down-north-of-10-000-objects-in-the-Transaction-s-undoMap-tp5124433p5124472.html

Hi Joachim,

Is VAST available with a 64-bit VM? If so, it might be interesting to see what happens to the performance there. The problem with Smalltalk hashes in a 32-bit implementation with lots of memory (and objects) is that you get lots of duplicate hash values, and lookups can start to slow down because colliding entries have to be searched sequentially. Or are you already using a 64-bit version?
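A minimal workspace sketch of how one might check for duplicate identity hashes, assuming #basicHash is the identity-hash selector (as it is described for VAST later in this thread; other dialects typically use #identityHash instead). The object count of 100000 is an arbitrary choice for illustration.

```smalltalk
| objects distinct |
"Keep all objects referenced so none is garbage collected during the check."
objects := (1 to: 100000) collect: [:i | Object new].

"Collect the distinct identity hash values that actually occur."
distinct := Set new.
objects do: [:each | distinct add: each basicHash].

Transcript show: 'objects: ', objects size printString; cr.
Transcript show: 'distinct basicHash values: ', distinct size printString; cr.
```

If the second number is much smaller than the first, many objects share a hash value, and identity-based collections have to resolve those collisions by probing.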

On 11/6/2020 3:10 AM, jtuchel wrote:

I can already answer parts of my questions ;-)

I can easily make this perform a whole lot slower by overriding #basicHash in my persistent classes. Thus I can easily move IdentitySet>>#includes: and IdentityDictionary>>#at:ifAbsent: to the top of the list of worst performers ;-)

So basicHash clearly has an influence on the overall performance. There is only this little remaining riddle: can I use this knowledge to achieve the opposite effect ;-))

I am a bit sceptical. In my attempts to play with #basicHash, it always showed up in the list, but it had obviously never been there with the default implementation (because sampling won't measure VM primitives, I guess). So I either chose very slow hashing algorithms, or the hashing algorithms I chose were bad. I suspect a combination of both ;-)

I chose to include the class in order to make the hash of an instance of ClassA with id 17 distinguishable from an instance of ClassB with the same id (17). My observation with the hashes of all classes in the image is that they are all in the range between 1 and 32767. So I went to Andres' book and found his chapter on VisualWorks' #hash implementation in Date. I thought the class' hash is somewhat similar to a Date's year, just that the class hashes take 15 bits instead of 9.

So I tried

self class hash * 32768 + id hash

(self class hash bitShift: 15) bitXor: id hash

And a few even lower performing and less clever ideas. But they all just made things worse.
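A small workspace check on the two formulas above, as a sketch: it assumes that an Integer id answers itself as its #hash and that ids stay below 32768. The value 12345 is a hypothetical class hash picked from the observed 1..32767 range.

```smalltalk
| classHash mismatches |
classHash := 12345.  "hypothetical class hash, assumed to be in 1..32767"
mismatches := 0.

"Compare the multiply-and-add variant with the shift-and-xor variant."
0 to: 32767 do: [:id |
    (classHash * 32768 + id) = ((classHash bitShift: 15) bitXor: id)
        ifFalse: [mismatches := mismatches + 1]].

mismatches  "0"
```

Shifting left by 15 bits is the same as multiplying by 32768, and the low 15 bits of the shifted value are all zero, so for ids in this range the xor and the addition produce the same number. Under these assumptions the two variants are one and the same hash function, which would fit the observation that neither behaved better than the other.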

So, what do I do with this new knowledge? I don't know, tbh.

jtuchel wrote on Friday, November 6, 2020 at 08:21:12 UTC+1:

little correction:

>If I wanted to implement another hashing algorithm, should the class be part of the hash? Most of our persistent objects have a sequence number as id, each of them created by the database for each table individually *starting* with 1,

jtuchel wrote on Friday, November 6, 2020 at 07:51:46 UTC+1:

Hi again,

I gave up waiting for the results Browser on my tracing results yesterday. The image grew above 4 GB in size and the Browser still didn't open after ~3 hrs. So I tried sampling at a rate of 5ms.

The results are a bit surprising. If the main problem were inefficient hash algorithms or an inefficient IdentityDictionary, I would expect IdentityDictionary methods like at:ifAbsentPut: and such at the top of the list of methods sorted by time spent in them. That is not the case in a sample of 12 runs. The methods in which most time is spent are isRegistered: and registeredObjectsDo: as well as Collection>>#includes: . IdentityDictionary and IdentitySet are on the list, but with low percentages of the overall execution time.

The top of the list in my Workbench looks like this:

(50,4%) UnitOfWork>>#registeredObjectsDo:

(16,6%) Collection>>#includes:

(2,2%) IdentityDictionary>>#includesKey:

(2,0%) IdentityDictionary>>#at:ifAbsentPut:

(1,9%) IdentitySet>>#includes:

....

These methods do use #= extensively, of course, but I am not so sure this is related to hashing, right? The main job of these methods is to iterate over #registeredObjects, which, iiuc, also does not rely on hashing, because all they do is walk through a long list of pointers, visiting each object. So I am almost sure this is not a hashing issue, but just a simple case of too much work due to too many objects in the #registeredObjects collection.

@Alan: would you agree on this thesis?
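To illustrate the difference between walking a collection and a hashed lookup, here is a rough workspace sketch. It is not Glorp's code: the collections and the count of 10000 are made up for illustration, and it assumes `Time millisecondsToRun:` is available in the dialect at hand.

```smalltalk
| n list set each linearMs hashedMs |
n := 10000.
list := OrderedCollection new.
set := IdentitySet new.
1 to: n do: [:i |
    each := Object new.
    list add: each.
    set add: each].

"Membership test by sequential search: every lookup walks the list."
linearMs := Time millisecondsToRun: [
    list do: [:element | list includes: element]].

"Membership test by identity hash: every lookup probes the set."
hashedMs := Time millisecondsToRun: [
    list do: [:element | set includes: element]].

Transcript show: 'linear: ', linearMs printString, ' ms'; cr.
Transcript show: 'hashed: ', hashedMs printString, ' ms'; cr.
```

The sequential variant does work proportional to the square of the number of objects, independent of how good the hash function is, which is the kind of cost that grows quickly past 10,000 registered objects.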

Just to see if I can improve things with another hashing function, I tried implementing hash functions on the two classes that are the majority on the list of registered objects as

hash

^id hash "the send of #hash is probably not necessary, since id is an Integer anyway, but might one day in a century or so be a LargeInteger..."

The performance wasn't affected at all, it neither improved nor got worse. There are a few questions about hashing in this context, which may be very important for the purpose of changing hashing for persistent objects, like

since registeredObjects and undoMaps are IdentityDictionaries, I guess they're not relying on hash at all, but basicHash instead. #basicHash is a VM primitive in VAST, so maybe there is not much point in overriding this. I am most likely not more clever than what the VM guys do for hashing...

If I wanted to implement another hashing algorithm, should the class be part of the hash? Most of our persistent objects have a sequence number as id, each of them created by the database for each table individually with 1, so if all persistent objects just return the id as their hash value, and if Glorp manages instances of different classes in Dictionaries, there are probably lots of collisions. So maybe the class's hash should be part of an object's hash? Something like
self class hash * id hash
maybe?
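One thing to watch with a plain product, shown as a tiny sketch: different (class hash, id) pairs can multiply to the same value. The numbers below are made up, and the block assumes an Integer id answers itself as its #hash.

```smalltalk
| productHash |
"Stand-in for 'self class hash * id hash' with explicit arguments."
productHash := [:classHash :id | classHash * id].

"Two different class/id pairs, same product."
(productHash value: 2 value: 6) = (productHash value: 3 value: 4)  "true, both are 12"
```

Because multiplication is commutative, a class hash of 17 with id 5 also collides with a class hash of 5 with id 17, so a product mixes the two parts less well than keeping them in separate bit ranges.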

But I am not so sure hashing is relevant in my case. My gut feeling is that I simply have a problem of too many registered objects in the session. This is most likely a consequence of the way we handle our Transaction (see my other question about best practices on this group).

So the next thing I'll try is to change the Transaction handling for this specific dialog first and see if this has an effect.

Thanks for reading, and also lots of thanks for any comments on my thinking out loud here...

Joachim

--
You received this message because you are subscribed to the Google Groups "glorp-group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/glorp-group/27d40c12-b71e-4c92-b96e-724e4c2e2de6n%40googlegroups.com.