voyage/mongo randomly wrong OIDs

Henrik Sperre Johansen

Sep 02, 2013; 8:34am

Re: voyage/mongo randomly wrong OIDs

On Aug 31, 2013, at 3:30 , Sven Van Caekenberghe <[hidden email]> wrote:

>
> On 31 Aug 2013, at 13:47, Stéphane Ducasse <[hidden email]> wrote:
>
>> Sabine
>> what we could do is to propose a "subclass of UUID" and to group several UUID generators.
>> Like that with a couple of classes, we could get a better eco system where people can pick the one they want.
>>
>> stef
>
> I just made my own, called NeoUUIDGenerator, http://www.smalltalkhub.com/#!/~SvenVanCaekenberghe/Neo/packages/Neo-UUID
>
> @Sabine
>
> IMHO, what a local counter does not give you is uniqueness across different machines, images and instances; that is why there is also the concept of node identification.
>
> In my implementation I combine the millisecond clock, a small random number, a counter and a node id. The node id is based on several elements; it should be different when running multiple images.
>
> This is a hack, not something that I can prove mathematically. But it can't be worse than pure random. I think the speed is also acceptable:
>
> | generator |
> generator := NeoUUIDGenerator new.
> [ generator next ] bench. '408,000 per second.'
>
> | generator |
> generator := UUIDGenerator new.
> [ generator generateBytes: UUID nilUUID forVersion: 4 ] bench. '13,300 per second.'
>
> Sven

So sorta like UUID type 3/5, but with a custom object identifier scheme, and no hashing?
Not sure it'd be fair to call that any kind of UUIDGenerator anymore, as the UUID standard and the types it encompasses are pretty well-defined ;)

IIRC, the reason those fell out of favor compared to type 4 is that they can potentially identify the computer on which they were created, so a purely random approach was considered better (as long as it really is purely random, which, as this thread illustrates, is another matter).

Cheers,
Henry

Sven Van Caekenberghe-2

Sep 02, 2013; 8:51am

Re: voyage/mongo randomly wrong OIDs

On 02 Sep 2013, at 10:34, Henrik Johansen <[hidden email]> wrote:

>
> On Aug 31, 2013, at 3:30 , Sven Van Caekenberghe <[hidden email]> wrote:
>
>>
>> On 31 Aug 2013, at 13:47, Stéphane Ducasse <[hidden email]> wrote:
>>
>>> Sabine
>>> what we could do is to propose a "subclass of UUID" and to group several UUID generators.
>>> Like that with a couple of classes, we could get a better eco system where people can pick the one they want.
>>>
>>> stef
>>
>> I just made my own, called NeoUUIDGenerator, http://www.smalltalkhub.com/#!/~SvenVanCaekenberghe/Neo/packages/Neo-UUID
>>
>> @Sabine
>>
>> IMHO, what a local counter does not give you is uniqueness across different machines, images and instances; that is why there is also the concept of node identification.
>>
>> In my implementation I combine the millisecond clock, a small random number, a counter and a node id. The node id is based on several elements; it should be different when running multiple images.
>>
>> This is a hack, not something that I can prove mathematically. But it can't be worse than pure random. I think the speed is also acceptable:
>>
>> | generator |
>> generator := NeoUUIDGenerator new.
>> [ generator next ] bench. '408,000 per second.'
>>
>> | generator |
>> generator := UUIDGenerator new.
>> [ generator generateBytes: UUID nilUUID forVersion: 4 ] bench. '13,300 per second.'
>>
>> Sven
>
> So sorta like UUID type 3/5, but with a custom object identifier scheme, and no hashing?
> Not sure it'd be fair to call that any kind of UUIDGenerator anymore, as the UUID standard and the types it encompasses are pretty well-defined ;)

Yes, it is a hack, mixing type 3/5 elements while pretending to be type 4 ;-)
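
For readers who want to see what "pretending to be type 4" could look like in practice, here is a rough sketch that packs a millisecond clock, a counter, a small random number and a node id into 16 bytes and then stamps the version 4 and RFC 4122 variant bits on top. It is not the NeoUUIDGenerator source: the field layout, the field widths and the fixed nodeId value are all assumptions made for illustration, and a real node id would have to be derived per image.

```smalltalk
"Illustrative sketch only; NOT the NeoUUIDGenerator source.
 Layout, field widths and nodeId are assumptions."
| counter nodeId next |
counter := 0.
nodeId := 16r0123456789AB.	"hypothetical 48-bit node id"
next := [ | bytes clock random |
	bytes := ByteArray new: 16.
	clock := Time millisecondClockValue.
	counter := (counter + 1) bitAnd: 16r0FFF.	"12-bit counter, wraps around"
	random := 16r4000 atRandom - 1.	"14 random bits, 0..16r3FFF"
	"bytes 1-6: low 48 bits of the millisecond clock, most significant byte first"
	1 to: 6 do: [ :i |
		bytes at: i put: ((clock bitShift: (i - 6) * 8) bitAnd: 16rFF) ].
	"byte 7: version nibble 4 plus the top 4 counter bits; byte 8: low counter byte"
	bytes at: 7 put: (16r40 bitOr: (counter bitShift: -8)).
	bytes at: 8 put: (counter bitAnd: 16rFF).
	"byte 9: variant bits (binary 10) plus 6 random bits; byte 10: 8 more random bits"
	bytes at: 9 put: (16r80 bitOr: (random bitShift: -8)).
	bytes at: 10 put: (random bitAnd: 16rFF).
	"bytes 11-16: node id, most significant byte first"
	1 to: 6 do: [ :i |
		bytes at: 10 + i put: ((nodeId bitShift: (i - 6) * 8) bitAnd: 16rFF) ].
	bytes ].
next value
```

Evaluating `next value` repeatedly in the same workspace yields byte arrays whose 7th byte always has a high nibble of 4, so a parser will report them as version 4 even though most of the bits are not random.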

> IIRC, the reason those fell out of favor compared to type 4 is that they can potentially identify the computer on which they were created, so a purely random approach was considered better (as long as it really is purely random, which, as this thread illustrates, is another matter).

Somehow, I don't feel like just random data would do. Maybe the chance for repetition is low, but it is not zero across instances, images and machines, and it depends on the quality of a random generator that is hard to control. I have this feeling that adding a counter, the time and a node identification is better. But this is totally unscientific.
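
To put a rough number on "low, but not zero": the usual birthday approximation applies if one assumes an ideal, uniform random source, which is exactly the assumption being doubted here. A version 4 UUID fixes 6 of its 128 bits for version and variant, leaving 122 random bits. For n generated ids (n is just an illustrative symbol):

```latex
p_{\text{collision}}(n) \;\approx\; \frac{n^{2}}{2 \cdot 2^{122}} \;=\; \frac{n^{2}}{2^{123}},
\qquad
p_{\text{collision}}(10^{9}) \;\approx\; \frac{10^{18}}{1.06 \times 10^{37}} \;\approx\; 9.4 \times 10^{-20}.
```

So with a good generator the risk is negligible; the estimate says nothing about a generator that is poorly seeded or repeats its state across images, where collisions can be far more likely.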

> Cheers,
> Henry