Cost of includesKey in RcKeyValueDictionary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Cost of includesKey in RcKeyValueDictionary

GLASS mailing list
Hi,

A domain object (A) has a RcKeyValueDictionary to store large quantity of
(B) objects.

B objects has a random id generated with:
id := Lag1MwcRandom new integer. "i think the max number is 4294967295"

The problem here is that with 100.000 (B) objects the probability of id
clash is pretty high.

It can be fixed with #includesKey: (inside a loop :( ) but which is the cost
of sending #includesKey: in RcKeyValueDictionary ?

It all reduce to #_at: but is a primitive. As far as i remember in a Martin
McClure presentation about Hash functions the cost should NOT be to much.
But just want to confirm this :)

Maybe a better solution it will be to use persistent counters ?
Or maybe a UUID  ? (which has the same problem but a lower clash
probability)

regards,
bruno

PS:
At first it seems that the loop is NOT a good idea but the collections of
ids will be already generated and the #includesKey: does not take to long.

| numbers instances r int milli clashes|
numbers := Dictionary new.
r := Lag1MwcRandom new.
instances := 900000.
1 to: instances do: [:each | int := r integer.
                                numbers at: int put: int]. "already generated collection"

clashes := 0.
milli := Time millisecondsToRun: [ "new ids"
1 to: 10000 do: [:each | int := r integer.
                                [numbers includesKey: int] whileTrue: [clashes := clashes + 1. int :=
int +1].
                                numbers at: int put: int].
].
Array with: clashes with: milli.




--
Sent from: http://forum.world.st/GLASS-f1460844.html
_______________________________________________
Glass mailing list
[hidden email]
https://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Cost of includesKey in RcKeyValueDictionary

GLASS mailing list
Hi Bruno,

What does the ID of a B object get used for? Is is needed to look up a B
object in the A object?

If so, I think using a random SmallInteger key is a reasonable approach,
likely less trouble than other possibilities. A UUID does not fit in a
SmallInteger and would take more space, and though a UUID could reduce
the probability of clashes to the point that you didn't need to test for
them, it might well be slower than using a SmallInteger and testing for
clashes.

Lookups with #includesKey: are fairly fast. I don't know what kind of
speed you need, but using a variant of your test code (included below) I
see the new keys being added at a rate of about 60,000 per second on an
8-year-old not very fast desktop machine I have at home. I'm using more
of the SmallInteger range by using a 56-bit random ID. The increase from
a 32-bit ID has very little impact on speed, in my quick testing.

Regards,
-Martin

| numbers instances r int milli1 milli2 clashes|
numbers := KeyValueDictionary new.
r := Lag1MwcRandom new.
milli1 := System millisecondsToRun: [
instances := 900000.
1 to: instances do: [:each | int := r integer + ((r integer bitAnd:
16rFFFFFF) bitShift: 32).
                 numbers at: int put: int]. "already generated collection"].

clashes := 0.
milli2 := System millisecondsToRun: [ "new ids"
1 to: 10000 do: [:each | int := ((r integer bitAnd: 16rFFFFFF) bitShift:
32).
                 [numbers includesKey: int] whileTrue: [clashes :=
clashes + 1. int :=
int +1].
                 numbers at: int put: int].
].
^ Array with: clashes with: milli1 with: milli2



On 2/26/20 1:50 PM, BrunoBB via Glass wrote:

> Hi,
>
> A domain object (A) has a RcKeyValueDictionary to store large quantity of
> (B) objects.
>
> B objects has a random id generated with:
> id := Lag1MwcRandom new integer. "i think the max number is 4294967295"
>
> The problem here is that with 100.000 (B) objects the probability of id
> clash is pretty high.
>
> It can be fixed with #includesKey: (inside a loop :( ) but which is the cost
> of sending #includesKey: in RcKeyValueDictionary ?
>
> It all reduce to #_at: but is a primitive. As far as i remember in a Martin
> McClure presentation about Hash functions the cost should NOT be to much.
> But just want to confirm this :)
>
> Maybe a better solution it will be to use persistent counters ?
> Or maybe a UUID  ? (which has the same problem but a lower clash
> probability)
>
> regards,
> bruno
>
> PS:
> At first it seems that the loop is NOT a good idea but the collections of
> ids will be already generated and the #includesKey: does not take to long.
>
> | numbers instances r int milli clashes|
> numbers := Dictionary new.
> r := Lag1MwcRandom new.
> instances := 900000.
> 1 to: instances do: [:each | int := r integer.
> numbers at: int put: int]. "already generated collection"
>
> clashes := 0.
> milli := Time millisecondsToRun: [ "new ids"
> 1 to: 10000 do: [:each | int := r integer.
> [numbers includesKey: int] whileTrue: [clashes := clashes + 1. int :=
> int +1].
> numbers at: int put: int].
> ].
> Array with: clashes with: milli.
>
>
>
>
> --
> Sent from: http://forum.world.st/GLASS-f1460844.html
> _______________________________________________
> Glass mailing list
> [hidden email]
> https://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
https://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Cost of includesKey in RcKeyValueDictionary

GLASS mailing list
Martin,

Thanks for your answer.

Yes, a look up is needed to get an object B from object A.

After some testing the speed is good enough thanks for your example.

I will follow your recommendation and keep the current implementation but
adding a testing for possible id clashes.

regards,
bruno



--
Sent from: http://forum.world.st/GLASS-f1460844.html
_______________________________________________
Glass mailing list
[hidden email]
https://lists.gemtalksystems.com/mailman/listinfo/glass