voyage/mongo randomly wrong OIDs

Re: voyage/mongo randomly wrong OIDs

Henrik Sperre Johansen

On Aug 31, 2013, at 3:30 , Sven Van Caekenberghe <[hidden email]> wrote:

>
> On 31 Aug 2013, at 13:47, Stéphane Ducasse <[hidden email]> wrote:
>
>> Sabine
>> what we could do is propose a "subclass of UUID" and group several UUID generators.
>> That way, with a couple of classes, we could get a better ecosystem where people can pick the one they want.
>>
>> stef
>
> I just made my own, called NeoUUIDGenerator, http://www.smalltalkhub.com/#!/~SvenVanCaekenberghe/Neo/packages/Neo-UUID
>
> @Sabine
>
> IMHO, what a local counter does not give you is uniqueness across different machines, images and instances; that is why there is also the concept of node identification.
>
> In my implementation I combine the millisecond clock, a small random number, a counter and a node id. The node id is based on several elements; it should be different when running multiple images.
>
> This is a hack, not something that I can prove mathematically. But it can't be worse than pure random. I think the speed is also acceptable:
>
> | generator |
> generator := NeoUUIDGenerator new.
> [ generator next ] bench. '408,000 per second.'
>
> | generator |
> generator := UUIDGenerator new.
> [ generator generateBytes: UUID nilUUID forVersion: 4 ] bench. '13,300 per second.'
>
> Sven
So sorta like UUID type 3/5, but with a custom object identifier scheme, and no hashing?
Not sure it'd be fair to call that any kind of UUIDGenerator anymore, as the UUID standard and the types it encompasses are pretty well-defined ;)

IIRC, the reason those fell out of favor compared to type 4 is that they can potentially identify the computer on which they were created, so a purely random approach was considered better (as long as it really is purely random, which, as this thread illustrates, is another matter).
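
To make the privacy point concrete, here is a small sketch in Python (not Smalltalk, and not code from this thread) using the standard `uuid` module. It assumes the source-revealing variants are the time-and-node based ones, which RFC 4122 calls version 1. `EXAMPLE_NODE` is a made-up value used only for illustration.

```python
import uuid

# Hypothetical node value, chosen for illustration only (not a real MAC address).
EXAMPLE_NODE = 0x0123456789AB

# Version 1: built from a timestamp, a clock sequence and the node id.
time_based = uuid.uuid1(node=EXAMPLE_NODE)

# Version 4: apart from the version/variant bits, everything is random.
random_based = uuid.uuid4()

# The node id can be read straight back out of a version 1 UUID,
# which is the "identifies the source computer" concern.
leaked_node = time_based.node

print(time_based.version, hex(leaked_node))
print(random_based.version)
```

A version 4 UUID carries no such field, so nothing about the generating machine can be recovered from it; the price is that uniqueness rests entirely on the quality of the random source.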

Cheers,
Henry


Re: voyage/mongo randomly wrong OIDs

Sven Van Caekenberghe-2

On 02 Sep 2013, at 10:34, Henrik Johansen <[hidden email]> wrote:

> So sorta like UUID type 3/5, but with a custom object identifier scheme, and no hashing?
> Not sure it'd be fair to call that any kind of UUIDGenerator anymore, as the UUID standard and the types it encompasses are pretty well-defined ;)

Yes, it is a hack, mixing type 3/5 elements while pretending to be type 4 ;-)

> IIRC, the reason those fell out of favor compared to type 4 is that they can potentially identify the computer on which they were created, so a purely random approach was considered better (as long as it really is purely random, which, as this thread illustrates, is another matter).

Somehow, I don't feel like just random data would do. Maybe the chance for repetition is low, but it is not zero across instances, images and machines, and it depends on the quality of a random generator that is hard to control. I have this feeling that adding a counter, the time and a node identification is better. But this is totally unscientific.
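
The idea of combining a clock, a counter, random bits and a node id can be sketched roughly as follows. This is a Python illustration only, not the NeoUUIDGenerator source: the class name `CompositeIdGenerator`, the method `next_id`, the field widths (6 + 2 + 4 + 4 bytes) and the randomly chosen node id are all assumptions made for the example.

```python
import itertools
import secrets
import time


class CompositeIdGenerator:
    """Hypothetical sketch: 16-byte ids from clock, counter, random bits and node id."""

    def __init__(self, node_id=None):
        # 4-byte node id. A random value per generator is used here as a
        # stand-in for a real node identification (an assumption).
        self.node_id = node_id if node_id is not None else secrets.token_bytes(4)
        self._counter = itertools.count()

    def next_id(self):
        millis = time.time_ns() // 1_000_000       # millisecond clock
        count = next(self._counter) & 0xFFFF       # 16-bit counter, wraps around
        return (
            (millis & 0xFFFFFFFFFFFF).to_bytes(6, "big")  # 6 bytes of clock
            + count.to_bytes(2, "big")                     # 2 bytes of counter
            + secrets.token_bytes(4)                       # 4 random bytes
            + self.node_id                                 # 4 bytes of node id
        )


generator = CompositeIdGenerator(node_id=b"\x00\x01\x02\x03")
ids = [generator.next_id() for _ in range(1000)]
print(len(ids[0]), len(set(ids)))
```

Within one generator the counter alone keeps any 65,536 consecutive ids distinct; the clock and node id are what is meant to separate ids produced by different images or machines, and the random bytes only reduce the remaining risk rather than eliminate it.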

> Cheers,
> Henry

