Hi,
A domain object (A) has a RcKeyValueDictionary to store large quantity of (B) objects. B objects has a random id generated with: id := Lag1MwcRandom new integer. "i think the max number is 4294967295" The problem here is that with 100.000 (B) objects the probability of id clash is pretty high. It can be fixed with #includesKey: (inside a loop :( ) but which is the cost of sending #includesKey: in RcKeyValueDictionary ? It all reduce to #_at: but is a primitive. As far as i remember in a Martin McClure presentation about Hash functions the cost should NOT be to much. But just want to confirm this :) Maybe a better solution it will be to use persistent counters ? Or maybe a UUID ? (which has the same problem but a lower clash probability) regards, bruno PS: At first it seems that the loop is NOT a good idea but the collections of ids will be already generated and the #includesKey: does not take to long. | numbers instances r int milli clashes| numbers := Dictionary new. r := Lag1MwcRandom new. instances := 900000. 1 to: instances do: [:each | int := r integer. numbers at: int put: int]. "already generated collection" clashes := 0. milli := Time millisecondsToRun: [ "new ids" 1 to: 10000 do: [:each | int := r integer. [numbers includesKey: int] whileTrue: [clashes := clashes + 1. int := int +1]. numbers at: int put: int]. ]. Array with: clashes with: milli. -- Sent from: http://forum.world.st/GLASS-f1460844.html _______________________________________________ Glass mailing list [hidden email] https://lists.gemtalksystems.com/mailman/listinfo/glass |
Hi Bruno,
What does the ID of a B object get used for? Is is needed to look up a B object in the A object? If so, I think using a random SmallInteger key is a reasonable approach, likely less trouble than other possibilities. A UUID does not fit in a SmallInteger and would take more space, and though a UUID could reduce the probability of clashes to the point that you didn't need to test for them, it might well be slower than using a SmallInteger and testing for clashes. Lookups with #includesKey: are fairly fast. I don't know what kind of speed you need, but using a variant of your test code (included below) I see the new keys being added at a rate of about 60,000 per second on an 8-year-old not very fast desktop machine I have at home. I'm using more of the SmallInteger range by using a 56-bit random ID. The increase from a 32-bit ID has very little impact on speed, in my quick testing. Regards, -Martin | numbers instances r int milli1 milli2 clashes| numbers := KeyValueDictionary new. r := Lag1MwcRandom new. milli1 := System millisecondsToRun: [ instances := 900000. 1 to: instances do: [:each | int := r integer + ((r integer bitAnd: 16rFFFFFF) bitShift: 32). numbers at: int put: int]. "already generated collection"]. clashes := 0. milli2 := System millisecondsToRun: [ "new ids" 1 to: 10000 do: [:each | int := ((r integer bitAnd: 16rFFFFFF) bitShift: 32). [numbers includesKey: int] whileTrue: [clashes := clashes + 1. int := int +1]. numbers at: int put: int]. ]. ^ Array with: clashes with: milli1 with: milli2 On 2/26/20 1:50 PM, BrunoBB via Glass wrote: > Hi, > > A domain object (A) has a RcKeyValueDictionary to store large quantity of > (B) objects. > > B objects has a random id generated with: > id := Lag1MwcRandom new integer. "i think the max number is 4294967295" > > The problem here is that with 100.000 (B) objects the probability of id > clash is pretty high. > > It can be fixed with #includesKey: (inside a loop :( ) but which is the cost > of sending #includesKey: in RcKeyValueDictionary ? > > It all reduce to #_at: but is a primitive. As far as i remember in a Martin > McClure presentation about Hash functions the cost should NOT be to much. > But just want to confirm this :) > > Maybe a better solution it will be to use persistent counters ? > Or maybe a UUID ? (which has the same problem but a lower clash > probability) > > regards, > bruno > > PS: > At first it seems that the loop is NOT a good idea but the collections of > ids will be already generated and the #includesKey: does not take to long. > > | numbers instances r int milli clashes| > numbers := Dictionary new. > r := Lag1MwcRandom new. > instances := 900000. > 1 to: instances do: [:each | int := r integer. > numbers at: int put: int]. "already generated collection" > > clashes := 0. > milli := Time millisecondsToRun: [ "new ids" > 1 to: 10000 do: [:each | int := r integer. > [numbers includesKey: int] whileTrue: [clashes := clashes + 1. int := > int +1]. > numbers at: int put: int]. > ]. > Array with: clashes with: milli. > > > > > -- > Sent from: http://forum.world.st/GLASS-f1460844.html > _______________________________________________ > Glass mailing list > [hidden email] > https://lists.gemtalksystems.com/mailman/listinfo/glass _______________________________________________ Glass mailing list [hidden email] https://lists.gemtalksystems.com/mailman/listinfo/glass |
Martin,
Thanks for your answer. Yes, a look up is needed to get an object B from object A. After some testing the speed is good enough thanks for your example. I will follow your recommendation and keep the current implementation but adding a testing for possible id clashes. regards, bruno -- Sent from: http://forum.world.st/GLASS-f1460844.html _______________________________________________ Glass mailing list [hidden email] https://lists.gemtalksystems.com/mailman/listinfo/glass |
Free forum by Nabble | Edit this page |