Hi everyone,
I’m using the session id (Smalltalk session id) for my data recording, so that I can tell whether the recorded events came from the same session. The idea is that each time an image is started, a new session is created and assigned a new UUID. When I started to look at the data, I noticed some cases where the same session ID appears with different session creation times (yes, a new session is initialized with the current timestamp). The time difference between sessions with the same UUID and different timestamps is within 2 hours. I also grouped the data by timestamp, and there are no cases where the same timestamp has different IDs, which suggests that the timestamp is a more reliable ID.

I can deal with my data just fine, but maybe we need to look into the implementation to see why we get sessions with the same IDs?

Cheers.
Uko
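One way to run the check described above is to group the recorded sessions by ID and list every ID that occurs with more than one creation time. The following is a minimal Pharo sketch; the `records` collection and its id/timestamp pairs are made-up placeholders, not the actual recorded data.

```smalltalk
"Hypothetical sample data: each record is an id -> creation time association."
| records timesById duplicates |
records := {
	'id-1' -> '2017-02-06T10:00:00'.
	'id-1' -> '2017-02-06T11:30:00'.
	'id-2' -> '2017-02-06T12:00:00' }.

"Collect the set of distinct creation times seen for each session id."
timesById := Dictionary new.
records do: [ :each |
	(timesById at: each key ifAbsentPut: [ Set new ]) add: each value ].

"Keep only the ids that were created more than once."
duplicates := timesById keys select: [ :id | (timesById at: id) size > 1 ].
duplicates
```

With this sample data only 'id-1' is selected, because it is the only ID recorded with two different creation times.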
> On 6 Feb 2017, at 14:17, Yuriy Tymchuk <[hidden email]> wrote:
>
> [...] maybe we need to look in the implementation why do we get sessions with the same IDs?

I would be very surprised if that happened with NeoUUIDGenerator (NeoUUIDGenerator next). The idea was to replace UUIDGenerator and the VM plugin with it. That got stalled when there was an unforeseen interaction with WorkingSession. I believe that should be solved by now.

Still, UUIDs are not 100% guaranteed to be unique, they are a (very good) best effort.

But I agree that if they repeat in such a short time frame, that should be considered a bug.

Sven
I had issues with Pharo not being random during the first startup: http://forum.world.st/Random-is-not-random-at-startup-td4895905.html
Even though the seed is different during startup, the mask trims the randomness away.

Looking at NeoUUID, it actually makes it worse on Linux (and presumably Unix/Mac too), because it uses Pharo's broken seeding instead of reading from /dev/urandom.

(I can test it on a Linux machine later today just to be sure.)

> Still, UUIDs are not 100% guaranteed to be unique, they are a (very good) best effort.

For all intents and purposes they are considered 100% unique.
If you generate two identical V4 UUIDs, then either the PRNG or the seeding is broken (the seeding in Pharo's case).

Peter
> On 6 Feb 2017, at 14:51, Peter Uhnak <[hidden email]> wrote:
>
> I had issues with Pharo not being random during the first startup: http://forum.world.st/Random-is-not-random-at-startup-td4895905.html
>
> Even though the seed is different during startup, the mask trims the randomness away.

The Random generator is what it is, it can certainly be improved, but that is tricky too. In 64 bits we could probably do with less masking/trimming.

> Looking at NeoUUID, it actually makes it worse on Linux (and presumably Unix/Mac too), because it uses Pharo's broken seeding and not pooling /dev/urandom.
>
> (I can test it on linux machine later today just to be sure.)

Why would it be worse? Of course it is not.

Reading from /dev/random is not portable to Windows and tricky too (because it sometimes hangs until there is enough entropy).

>> Still, UUIDs are not 100% guaranteed to be unique, they are a (very good) best effort.
>
> For all intents and purposes they are considered 100% to be unique.
> If you generate two identical V4 UUIDs then either PRNG or seeding is broken (seeding in Pharo's case).
>
> Peter

According to https://en.wikipedia.org/wiki/Universally_unique_identifier

<<
When generated according to the standard methods, UUIDs are for practical purposes unique, without depending for their uniqueness on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is so close to zero as to be negligible.
>>

Read the last sentence.

So IMO it is certainly not 'broken'.

Note also that NeoUUID uses different elements, the random part is only one of them.
In reply to this post by Sven Van Caekenberghe-2
Of course UUID is not guaranteed to be absolutely random. But it is suspicious that these similarities happen in a small timeframe, so I think that the image got restarted a few times, but generated the same ID.

Uko
> On 6 Feb 2017, at 15:21, Yuriy Tymchuk <[hidden email]> wrote:
>
> Of course UUID is not guaranteed to be absolutely random. But it is suspicious that these similarities happen in a small timeframe, so I think that the image got restarted a few times, but generated the same ID.

If you say a timestamp is more reliable in your use case, note that NeoUUID contains a ms clock value, so it should not be possible that you get 2 similar values for 2 distinct runs.

If you are limited to macOS/Linux one can always do

    '/dev/random' asFileReference binaryReadStreamDo: [ :in | in next: 8 ].

to get random bytes.
In reply to this post by Sven Van Caekenberghe-2
1. Regarding WorkingSession
The WS' comment claims "On each image startup the current session is invalidated and a new session is created.", but in reality WS is reset only on save&quit, and not on startup... isn't that odd?
So if the image crashes, or I am running it headlessly without saving, I am actually still in the same session.

2. Regarding NeoUUID

My apologies for stating that it was made worse than the /dev/urandom one; now I know why it doesn't really matter, due to the other rolling factors.
However I found some things that you may or may not find interesting (especially on Windows):

On Windows (10) (with latest 6 image and VM) `Time microsecondClockValue` returns microseconds, but (presumably the system) cannot give precision beyond 1 second - this will imho need a VM fix;
I can also generate about 1.2M UUIDs per second (limited by single core I guess), which means that about every 600-1200 consecutive UUIDs will have the same clock value.

On Linux the microseconds are fine, also I generate only 0.8M UUIDs per second (it is an older machine, so with >1M UUID/sec you will still have a time clash),
now there's about 1-2% probability that the immediately next UUID will have the same clock, but this is countered by the counter. :)

You are also taking 8 bytes of microsecondClockValue, but the value has only 7 bytes... so the 8th byte is fixed to 0.

9 & 10 bytes are counter, but 9th bit is rewritten with variant, so the counter is actually 0-255 and not 0-65536.

And finally 11 & 12 are random bits (assuming the seeding isn't broken).

So on Windows, the conditional probability that the nth and n+256th UUIDs will be identical is imho 1/65536 (assuming they are in the same second, which is easy).

On Linux my understanding is that a clash can only happen if NTP adjusts my clock during UUID generation (at which point it is the same as Windows).

Can a UUID clash be achieved on Linux if you deploy copies of the same image and let them all generate UUIDs? (It should be again 1/65536).

Regarding the poor seed at startup:

1k outside runs of 'NeoUUIDGenerator new nextRandom16' (on a fresh image) give me only 116 unique values, compared to the expected 990-1000.
In the above it's already the second run of the generator; for the (random initial) counter, there were only 69 unique values out of 1000.

3. Chicken and egg question

How would one bootstrap the session's id initialized with just NeoUUID? :) (WorkingSession wants UUID new in initialize, but UUID needs WorkingSession to generate a UUID)

Peter
First, thanks for the discussion, I think this is good.
The idea/reason behind NeoUUIDGenerator was to get an algorithm that is fully documented and clearly implemented with proper unit tests, all at the Smalltalk level, so that we can get rid of the plugin.

> On 6 Feb 2017, at 18:54, Peter Uhnak <[hidden email]> wrote:
>
> 1. Regarding WorkingSession
>
> The WS' comment claims "On each image startup the current session is invalidated and a new session is created.",
> but in reality WS is reset only save&quit, and not on startup... isn't that odd?
> So if image crashes or I am running it headlessly without saving I am actually still on the same session.

That is a bug. Either the documentation has to be changed or, more likely, the implementation.

The idea behind the old, simple Session object was crystal clear: create a new, unique (in the sense of #==) Session object on each run of an image (and not when saving).

WorkingSession got more complicated and less clear. People seem to have different expectations of it, and we keep on having trouble with it, which is not a good sign.

Your point (3) also indicates an important issue in the division of responsibility.

> 2. Regarding NeoUUID
>
> My apologies for stating that it was made worse than the /dev/urandom one, now I know why it doesn't really matter due to the other rolling factors.
> However I found some things that you may or may not find interesting (especially on Windows):
>
> On Windows (10) (with latest 6 image and VM) `Time microsecondClockValue` returns microseconds, but (presumably the system) cannot give precision beyond 1 second - this will imho need a VM fix;

I find that quite hard to believe and I did not know that. Are you really sure? That would be terrible. A solution might be to use another clock primitive.

> I can also generate about 1.2M UUIDs per second (limited by single core I guess), which means that about every 600-1200 consecutive UUIDs will have the same clock value.

I can't follow your reasoning here: if the clock precision were 1 second, you would get 1.2M consecutive UUIDs with the same clock value, right?

In the unit tests this is verified (that the time value goes forward on consecutive calls).

> On Linux the microseconds are fine, also I generate only 0.8M UUIDs (it is older machine, so with >1M UUID/sec you will still have time clash),
> now there's about 1-2% probability that the immediately next UUID will have the same clock, but this is countered by the counter. :)

Yes, the clock + the counter + the machine id + random (this is described in detail in the class comment of NeoUUIDGenerator).

> You are also taking 8 bytes of microsecondClockValue, but the value has only 7 bytes... so 8th byte is fixed to 0.

True.

> 9 & 10 bytes are counter, but 9th bit is rewritten with variant, so the counter is actually 0-255 and not 0-65536.

Yes, but more correctly, the top 2 bits are set to 10, making the range 1 to 2^14 (16384) (instead of 2^16).

> And finally 11 & 12 are random bits (assuming the seeding isn't broken).

The seeding is one aspect (the quality of the seed), the algorithm is another.
Note that in the current Random class, the seed is initialised using the clock as well.

> So on Windows, the conditional probability that nth and n+256th UUIDs will be identical is imho 1/65536 (assuming they are in the same second, which is easy).

I am pretty/quite sure this is not correct but it would take me much more time to come up with a more correct calculation.

BTW, you also have to define the context of being the same: the same instance of one NeoUUIDGenerator generating the same UUID, or several different generator instances, in the same image, in different images, on different runs, the same or different machines/networks, ...

> On Linux my understanding is that clash can only happen if NTP adjusts my clock during UUID generation (at which point it is same as Windows).
>
> Can UUID clash be achieved on Linux if you deploy copies of the same image and let them all generate UUIDs? (It should be again 1/65536).

Same remark as above.

> Regarding the poor seed at startup:
>
> 1k outside runs of 'NeoUUIDGenerator new nextRandom16' (on a fresh image) gives me only 116 unique values, compared to the expected 990-1000
> In the above it's already second run of the generator, for (random initial) counter, there was only 69 unique values out of 1000.

But that is not proper use of a random generator, you create a new instance every time. You basically test seeding, which is different from the random number generator proper.

It seems that is a chicken-egg problem too ;-)

BTW, you can currently provide your own seed too, for example:

    Random seed: ('/dev/random' asFileReference binaryReadStreamDo: [ :in | in next: 4 ]) asInteger.

> 3. Chicken and egg question
>
> How would one bootstrap session's id initialized with just NeoUUID? :) (WorkingSession wants UUID new in initialize, but UUID needs WorkingSession to generate a UUID)

That is correct. I don't understand why WorkingSession needs a UUID, it was not like that before. It also does not seem to be really used.

One solution I can think of would be to make id lazy initialized.

In any case, I am going to try integrating/activating NeoUUIDGenerator again in the latest Pharo 6.
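The lazy initialization suggested in this reply could look roughly like the following. This is only a sketch: it assumes that WorkingSession keeps its identifier in an instance variable named `id` and that `initialize` no longer assigns it; neither detail is confirmed in this thread.

```smalltalk
"Hypothetical accessor, assuming an instance variable named id
that is left nil by initialize."
WorkingSession >> id
	"Create the UUID on first access instead of at session creation,
	so that creating a session does not need a working UUID generator."
	^ id ifNil: [ id := UUID new ]
```

The `WorkingSession >> id` header is the usual notation for a method definition; it is not meant to be evaluated in a playground. The design point is simply that the session can exist before any UUID machinery is ready, which would sidestep the chicken-and-egg question.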
> > On Windows (10) (with latest 6 image and VM) `Time microsecondClockValue` returns microseconds, but (presumably the system) cannot give precision beyond 1 second - this will imho need a VM fix;
>
> I find that quite hard to believe and I did not know that. Are you really sure ? That would be terrible. A solution might be to use another clock primitive.
>
> > I can also generate about 1.2M UUIDs per second (limited by single core I guess), which means that about every 600-1200 consecutive UUIDs will have the same clock value.
>
> I can't follow your reasoning here: if the clock precision would be 1 second, you would get 1.2M consecutive UUIDs with the same clock value, right ?

Whoops, I meant millisecond precision, not second... thus the ~1k consecutive UUIDs with the same clock value.
The returned value is still in microseconds, but the last 3 digits are fixed (they are different on startup, but the same during execution).

I faintly remember having the same issue in a different language years ago, because they had different APIs or some bullshit like that.

So when I generate e.g. 10 random UUIDs I get:

(1 to: 10) collect: [ :i | NeoUUIDGenerator next ]. "an Array(
an UUID('349b2ccc-4404-0d00-a8e7-5c9c0ef2da37')
an UUID('349b2ccc-4404-0d00-a8e8-2f3a0ef2da37')
an UUID('349b2ccc-4404-0d00-a8e9-9e0b0ef2da37')
an UUID('349b2ccc-4404-0d00-a8ea-15520ef2da37')
an UUID('349b2ccc-4404-0d00-a8eb-fa5c0ef2da37')
an UUID('349b2ccc-4404-0d00-a8ec-d25f0ef2da37')
an UUID('349b2ccc-4404-0d00-a8ed-6a590ef2da37')
an UUID('349b2ccc-4404-0d00-a8ee-3a8d0ef2da37')
an UUID('349b2ccc-4404-0d00-a8ef-0ad20ef2da37')
an UUID('349b2ccc-4404-0d00-a8f0-7bda0ef2da37'))"

> In the unit tests this is verified (that the time value goes forward on consecutive calls).

testTwo*Generator only tests if they are close, not that one is higher than the other,

if I add

    self assert: time2 equals: time1.

the test will still pass (but that is already obvious from the list above).

> > 9 & 10 bytes are counter, but 9th bit is rewritten with variant, so the counter is actually 0-255 and not 0-65536.
>
> Yes, but more correctly, the top 2 bits are set to 10, making the range 1 to 2^14 (16384) (instead of 2^16).

Welp, I can't do bit math.

> > And finally 11 & 12 are random bits (assuming the seeding isn't broken).
>
> The seeding is one aspect (the quality of the seed), the algorithm is another.
> Note that in the current Random class, the seed is initialised using the clock as well.

The problem with Random is not the time-based initialization, but the masking of the initial value (which is mentioned in the originally linked thread on randomness, and which fixes itself upon further generations).

> > So on Windows, the conditional probability that nth and n+256th UUIDs will be identical is imho 1/65536 (assuming they are in the same second, which is easy).
>
> I am pretty/quite sure this is not correct but it would take me much more time to come up with a more correct calculation.

With the actual counter I would have to generate 16384 UUIDs/millisecond, which unlike 256/ms I cannot do, so it is safe again. :) (well... until Pharo goes multicore)

> BTW, you also have to define the context of being the same: the same instance of one NeoUUIDGenerator generating the same UUID, or several different generator instances, in the same image, in different images, on different runs, the same or different machines/networks, ...
>
> > On Linux my understanding is that clash can only happen if NTP adjusts my clock during UUID generation (at which point it is same as Windows).
> >
> > Can UUID clash be achieved on Linux if you deploy copies of the same image and let them all generate UUIDs? (It should be again 1/65536).

Yeah, across machines the last four bytes behave randomly too.

Maybe I can get a clash if I run two random generators in forks (to fix the 14 bits of counter randomness :) ... but no time for that now. :)

Peter
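The counter arithmetic in this exchange can be checked with a little bit masking. The sketch below assumes, as described in this thread, that bytes 9 and 10 form a 16-bit field whose top two bits are overwritten with the variant bits 10; the variable names are illustrative only.

```smalltalk
| field variantBits counterBits |
"Bytes 9 and 10 of the first sample UUID quoted in this thread: a8e7."
field := 16rA8E7.

"The top two bits hold the variant."
variantBits := field bitShift: -14.

"The remaining 14 bits are available to the counter."
counterBits := field bitAnd: 16r3FFF.

{ variantBits printStringBase: 2.
  counterBits.
  1 bitShift: 14 }
```

The resulting array holds '10' (the variant bits), 10471 (the 14-bit counter value inside a8e7) and 16384 (the number of distinct counter values), which matches the corrected range of 2^14 rather than 256 or 65536.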
In reply to this post by Sven Van Caekenberghe-2
On 02/06/2017 09:00 PM, Sven Van Caekenberghe wrote:
>> Regarding the poor seed at startup:
>> 1k outside runs of 'NeoUUIDGenerator new nextRandom16' (on a fresh image) gives me only 116 unique values, compared to the expected 990-1000
>> In the above it's already second run of the generator, for (random initial) counter, there was only 69 unique values out of 1000.
> But that is not proper use of a random generator, you create a new instance every time. You basically test seeding, which is different from random number generator proper.

Hi,
isn't that what the thread was originally about? As far as I understand it, it started with Yuriy saying:

The idea is that each time an image is started a new session is created and assigned a new UUID...

werner
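The distinction between seeding and generation can be made concrete: a pseudo-random generator is deterministic, so two generators that start from the same seed produce the same values, and one fresh generator per image start is only as varied as the seeds it receives. A minimal sketch, using an arbitrary seed of 42 chosen purely for illustration:

```smalltalk
| first second |
"Two independent generators started from the same (arbitrary) seed."
first := Random seed: 42.
second := Random seed: 42.

"Both produce the same sequence, so this evaluates to true."
((1 to: 5) collect: [ :i | first next ])
	= ((1 to: 5) collect: [ :i | second next ])
```

This is why the quality of the startup seed matters for the session use case, even if the generator itself behaves well once it is running.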
In reply to this post by Peter Uhnak
> On 6 Feb 2017, at 22:59, Peter Uhnak <[hidden email]> wrote:
>
> Whoops, I meant the precision on a millisecond, not second... thus the ~1000k consecutive clock values.
> The returned value is still in microseconds, but the last 3 digits are fixed (they are different on startup, but same during execution).
>
> I faintly remember having the same issue in a different language years ago, because they had different APIs or some bullshit like that.
>
> So when I generate e.g. 10 random UUIDs I get:
>
> (1 to: 10) collect: [ :i | NeoUUIDGenerator next ]. "an Array(
> an UUID('349b2ccc-4404-0d00-a8e7-5c9c0ef2da37')
> an UUID('349b2ccc-4404-0d00-a8e8-2f3a0ef2da37')
> an UUID('349b2ccc-4404-0d00-a8e9-9e0b0ef2da37')
> an UUID('349b2ccc-4404-0d00-a8ea-15520ef2da37')
> an UUID('349b2ccc-4404-0d00-a8eb-fa5c0ef2da37')
> an UUID('349b2ccc-4404-0d00-a8ec-d25f0ef2da37')
> an UUID('349b2ccc-4404-0d00-a8ed-6a590ef2da37')
> an UUID('349b2ccc-4404-0d00-a8ee-3a8d0ef2da37')
> an UUID('349b2ccc-4404-0d00-a8ef-0ad20ef2da37')
> an UUID('349b2ccc-4404-0d00-a8f0-7bda0ef2da37'))"

I don't see any problem here, they are all different. Of course the clock part is the same during the same millisecond.
>> > > testTwo*Generator only tests if they are close, not that one is higher than the other, > > if I add > > self assert: time2 equals: time1. > > the test will still pass (but that is already obvious from the list above). > > > >>> 9 & 10 bytes are counter, but 9th bit is rewritten with variant, so the counter is actually 0-255 and not 0-65536. >> >> Yes, but more correctly, the top 2 bits are set to 10, making the range 1 to 2^14 (16384) (instead of 2^16). > > Welp, I can't do bit math. > >> >>> And finally 11 & 12 are random bits (assuming the seeding isn't broken). >> >> The seeding is one aspect (the quality of the seed), the algorithm is another. >> Note that in the current Random class, the seed is initialised using the clock as well. > > The problem with Random is not the time-based initialization, but masking of the initial value (which is mentioned in the originally linked thread on randomness, which fixes itself upon further generations). > >> >>> So on Windows, the conditional probability that nth and n+256th UUIDs will be identical is imho 1/65536 (assuming they are in the same second, which is easy). >> >> I am pretty/quite sure this is not correct but it would take me much more time to come up with a more correct calculation. > > With the actual counter I would have to generate 16384 UUID/milisecond, which unlike 256/ms I cannot do, so it is safe again. :) (well... until Pharo goes multicore) > >> >> BTW, you also have to define the context of being the same: the same instance of one NeoUUIDGenerator generating the same UUID, or several different generator instances, in the same image, in different images, on different runs, the same or different machines/networks, ... >> >>> On Linux my understanding is that clash can only happen if NTP adjusts my clock during UUID generation (at which point it is same as Windows). >>> >>> Can UUID clash be achieved on Linux if you deploy copies of the same image and let them all generate UUIDs? 
(It should be again 1/65536). > > Yeah, across machines the last four bytes behave randomly too. > > Maybe I can get clash if I run two random generators in forks (to fix 14 bits of counter randomness :) ... but no time for that now. :) > > Peter > |
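The counter arithmetic in the exchange above can be checked with a small model. This is only a back-of-envelope sketch in Python, not the NeoUUIDGenerator code: it assumes, as stated in the thread, a 16-bit counter field whose top 2 bits are overwritten with the variant pattern 10, a clock that only advances once per millisecond, and a rate of about 1.2M UUIDs per second. The function names are made up for illustration.

```python
# Model of the counter field as described in the thread (assumption, not
# taken from the NeoUUIDGenerator source): 16 bits, top 2 bits forced to 10.

COUNTER_BITS = 16


def apply_variant(counter: int) -> int:
    """Keep the low 14 bits of the counter and force the top two bits to 10."""
    return (counter & 0x3FFF) | 0x8000


def distinct_counter_values() -> int:
    """Count how many different field values survive the variant overwrite."""
    return len({apply_variant(c) for c in range(1 << COUNTER_BITS)})


def uuids_per_clock_tick(uuids_per_second: int, ticks_per_second: int) -> float:
    """How many UUIDs are generated while the clock value stays the same."""
    return uuids_per_second / ticks_per_second


print(distinct_counter_values())              # 16384
print(uuids_per_clock_tick(1_200_000, 1000))  # 1200.0
```

Under these assumptions roughly 1200 UUIDs share one clock value, well below the 16384 counter values available, which is consistent with the conclusion that a single generator cannot wrap the counter within one millisecond at that rate.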
In reply to this post by wernerk
> On 6 Feb 2017, at 23:01, werner kassens <[hidden email]> wrote:
>
> On 02/06/2017 09:00 PM, Sven Van Caekenberghe wrote:
>> Regarding the poor seed at startup:
>>> 1k outside runs of 'NeoUUIDGenerator new nextRandom16' (on a fresh image) gives me only 116 unique values, compared to the expected 990-1000
>>> In the above it's already the second run of the generator; for the (random initial) counter, there were only 69 unique values out of 1000.
>> But that is not proper use of a random generator, you create a new instance every time. You basically test seeding, which is different from the random number generator proper.
>>
> Hi, isn't that what the thread was originally about? As far as I understand it, it started with Yuriy saying:
>
> The idea is that each time an image is started a new session is created and assigned a new UUID...

Certainly for NeoUUIDGenerator, that will be the case, AFAIK.

> werner
|
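The difference between testing the seeding and testing the generator proper can be illustrated with a small sketch. It uses Python's `random.Random` purely as a stand-in, not Pharo's Random class, and the seed values are hypothetical: they model 1000 start-ups that only ever see 100 distinct seeds, for example because the seed comes from a coarse or masked clock reading.

```python
import random


def first_values_from_fresh_generators(seeds):
    """One 16-bit draw from a brand-new generator per seed (exercises seeding)."""
    return [random.Random(seed).getrandbits(16) for seed in seeds]


def values_from_one_generator(seed, count):
    """Many 16-bit draws from a single generator (exercises the generator itself)."""
    rng = random.Random(seed)
    return [rng.getrandbits(16) for _ in range(count)]


# Hypothetical seeds: 1000 start-ups, but only 100 distinct seed values.
coarse_seeds = [1_000_000 + (i // 10) for i in range(1000)]

fresh = first_values_from_fresh_generators(coarse_seeds)
reused = values_from_one_generator(42, 1000)

print(len(set(fresh)))   # at most 100, however good the generator is
print(len(set(reused)))  # close to 1000
```

With identical seeds producing identical first values, the number of unique results from fresh generators is capped by the number of distinct seeds. That is the situation at image start-up, where each session gets its UUID from a freshly seeded generator.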
In reply to this post by Peter Uhnak
On Tue, Feb 7, 2017 at 5:59 AM, Peter Uhnak <[hidden email]> wrote:
>
>> > On Windows (10) (with latest 6 image and VM) `Time microsecondClockValue` returns microseconds, but (presumably the system) cannot give precision beyond 1 second - this will imho need a VM fix;
>>
>> I find that quite hard to believe and I did not know that. Are you really sure? That would be terrible. A solution might be to use another clock primitive.
>>
>> > I can also generate about 1.2M UUIDs per second (limited by single core I guess), which means that about every 600-1200 consecutive UUIDs will have the same clock value.
>>
>> I can't follow your reasoning here: if the clock precision would be 1 second, you would get 1.2M consecutive UUIDs with the same clock value, right?
>
> Whoops, I meant a precision of a millisecond, not a second... thus the ~1000 consecutive UUIDs with the same clock value.
> The returned value is still in microseconds, but the last 3 digits are fixed (they are different on startup, but the same during execution).

This is a limitation of the Windows VM. I opened an issue to work on the VM clock, but a few other side-tasks jumped above it while I was learning more about the VM code.

https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/36

cheers -ben
|
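The symptom described above (values reported in microseconds whose last three digits never change) can be detected by looking at the sub-millisecond part of a batch of clock readings. The sketch below is in Python and runs on synthetic readings with made-up values so that the result is deterministic; in Pharo the readings would come from repeated calls to `Time microsecondClockValue`. The function name is hypothetical.

```python
def distinct_sub_millisecond_parts(samples_us):
    """Return the set of values taken by the last three decimal digits."""
    return {t % 1000 for t in samples_us}


# Synthetic microsecond readings (hypothetical values).
# "coarse": the clock advances in whole milliseconds from a fixed offset of 437 us.
# "fine":   the clock advances in 37 us steps, so the last digits keep changing.
coarse = [1_486_400_000_000_437 + 1000 * i for i in range(50)]
fine = [1_486_400_000_000_437 + 37 * i for i in range(50)]

print(len(distinct_sub_millisecond_parts(coarse)))  # 1
print(len(distinct_sub_millisecond_parts(fine)))
```

A single distinct value over many readings suggests that the clock's effective resolution is one millisecond, whatever unit the API reports.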
In reply to this post by Peter Uhnak
On Mon, Feb 6, 2017 at 6:54 PM, Peter Uhnak <[hidden email]> wrote:
> 1. Regarding WorkingSession

Peter, how can we make this an actionable point?

Guille
|
Yes, that doesn't sound good!
Cheers,
Sean |
I think that Peter’s main point was that a session should be reinitialized on boot and not reset during the closing event.
Uko

> On 20 Feb 2017, at 15:51, Sean P. DeNigris <[hidden email]> wrote:
>
> Guillermo Polito wrote
>>> So if image crashes or I am running it headlessly without saving I am
>>> actually still on the same session.
>>>
>> Peter, how can we make this an actionable point?
>
> Yes, that doesn't sound good!
>
> -----
> Cheers,
> Sean
> --
> View this message in context: http://forum.world.st/WorkingSession-UUID-looks-sketchy-tp4933140p4935079.html
> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>
|