Hi All
This is a request for information which I think may have appeared in this group before, but I can't locate it at the moment. I am trying to re-implement in Dolphin an algorithm for pseudo-random number generation for which I have the original code in C. The method operates on an array of 624 unsigned 32-bit integers. If I just reproduce the code in Dolphin, I think the result will be a mixture of large and small integers, some of them changing from one type to the other at each update. This looks like a recipe for great inefficiency. Is there any way of operating on unsigned 32-bit integers and guaranteeing to remain within 4 bytes? (The operations are addition, bit shift left and right, bitAnd:, bitOr:) -- Thanks in advance Peter Kenny |
Peter Kenny wrote:
> I am trying to > re-implement in Dolphin an algorithm for pseudo-random number generation > for which I have the original code in C. The method operates on an array > of 624 unsigned 32-bit integers. Mersenne Twister ? I was just looking at that myself ;-) > If I just reproduce the code in Dolphin, > I think the result will be a mixture of large and small integers, some of > them changing from one type to the other at each update. This looks like > a recipe for great inefficiency. That could be a problem, but Dolphin's LargeIntegers are pretty quick, so I wouldn't worry about it until you've tried it. (And if it /is/ too small it'd make sense to pull the Twister code out into a DLL. I'd have it fill a DOUBLEArray buffer with data rather than involving an external call on each invocation of #next.) > Is there any way of operating on > unsigned 32-bit integers and guaranteeing to remain within 4 bytes? (The > operations are addition, bit shift left and right, bitAnd:, bitOr:) You could /store/ the numbers in a DWORDArray. I don't think it would help much in this case -- the gain is in storage overhead rather than runtime efficiency. Also the actual arithmetic operations can only happen if you have actual Integer objects (Large or Small), and will necessarily answer new ones, so I don't think there's any way to avoid creating them. -- chris |
Chris
Thanks. Yes it is Mersenne Twister. Following the discussion in another thread, I started looking for more background on random numbers. I began with Wikipedia, and I found things have moved on since I first used them in the 1950s. One article described MT as ' the method of choice for simulation', so I started to look at how to do it. There is code on the internet for C/C++, Fortran, Pascal, Java and Heaven knows what else, so I tried to do a direct implementation of the C code put up by the original authors. I shall now do a simple-minded reproduction of the C code in Smalltalk, which will almost certainly use large integers, and report back here on the results. Peter |
Hello all
My attempt to do a simple-minded reproduction in Smalltalk of the C code for the Mersenne Twister pseudo-random number generator seems to have ground to a halt. I have what seems to be a reproduction of the code logic, and it runs and produces numbers in the right range. However, the authors of the original give a listing of the first 1000 numbers produced with their initialization, and I cannot get anything like their numbers. I think there is a problem in reproducing the original logic fully, because Dolphin does not have a logical bit-shift operation. The effect of the rightward numerical bit-shift, which is to shift in copies of the sign bit at the left, should be countered by bit-anding an appropriate mask, but this does not seem to help. I think a leftward numerical bit-shift should be the same as a logical shift, but I may have missed something here. My worries about running speed were misplaced. The code I have takes about 50% longer than the integer Park & Miller algorithm in D5, so presumably about twice as long as the all-floating one in D6, which would be acceptable given the better quality of the output, but that is no comfort if the logic is not correct. It may be necessary, as Chris suggests, to use the nucleus of the C code as a DLL - which is beyond my knowledge. Oh dear - after midnight again - I must be mad! Peter |
Peter,
> My attempt to do a simple-minded reproduction in Smalltalk of the C code > for the Mersenne Twister pseudo-random number generator seems to have > ground to a halt. I have what seems to be a reproduction of the code > logic, and it runs and produces numbers in the right range. However, the > authors of the original give a listing of the first 1000 numbers produced > with their initialization, and I cannot get anything like their numbers. I think that might either be because there are several implementations around which don't necessarily produce the same results. There were issues with the seeding (which I don't understand), so that might break things too if you're using test data from before the fix and template code from after. Another thing is that it's bloody difficult to translate array handling code into Smalltalk from C. I had great difficulty getting the seeding code to work correctly. I dislike 1-based indexing intensely... Anyway, and for what it's worth, I have a (apparently) working implementation based on the code and tests at: http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html which produces the same numbers when used in the same way. > My worries about running speed were misplaced. The code I have takes about > 50% longer than the integer Park & Miller algorithm in D5, so presumably > about twice as long as the all-floating one in D6, which would be > acceptable given the better quality of the output, It's a bit difficult to compare like with like, when the Random default is only generating 31 bits of data with each #next, wheras my implementation/transcription of MT is using 48 bits a go. But with some hacking to add the missing stuff to each class, I find that: Random 31 bits: 2.4 usecs Random 48 bits: 4.7 usecs That's the D5 Random, the D6 may be faster but I don't know how to pack the extra bits in. The full MT: MercenneTwister 32 bits: 4.37 usecs MercenneTwister 48 bits: 9.7 usecs The MT algoritm passes each output integer through an additional "tempering" stage, which doesn't actually lengthen the cycle, but does scramble the bits a bit more. Unfortunately, under Dolphin, it is taking up almost half the total execution time. If I remove the tempering, then: MercenneTwister2 32 bits: 2.28 usecs MercenneTwister2 48 bits: 5.5 usecs which might be a worthwhile speedup in some circumstances. On the whole, I'm not tempted to push the code into a DLL. It would be quite a lot faster, but I'm not sure that's worth the effort -- yet... For comparison, I measured a similar random stream,, where the "randomness" is generated by RC4 (aka PC1 as in PC1Cipher in the image). I don't know what kind of cycle time that has, but it is supposed to be of cryptgraphic quality (not surprising since its a cryptographic quality cypher). For that: RandomRC4 32 bits: 3.5 usecs RandomRC4 48 bits: 6.0 usecs So that's a useful, and reasonably quick, option too. -- chris |
Chris
> Anyway, and for what it's worth, I have a (apparently) working > implementation > based on the code and tests at: > http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html > which produces the same numbers when used in the same way. > My version is based on the original authors' C code, which is included in Wagner's zip as mt19937ar.c, and I compared with their results in mt19937ar.out. I don't understand C++ too well, so I can't see the differences between the original and Wagner's version. Clearly you have done better than I did, if you can match the given results. > Random 31 bits: 2.4 usecs > Random 48 bits: 4.7 usecs > > That's the D5 Random, the D6 may be faster but I don't know how to pack > the > extra bits in. The full MT: > > MercenneTwister 32 bits: 4.37 usecs > MercenneTwister 48 bits: 9.7 usecs I was getting about 3.5 microsecs for my 32 bit version, which is clearly irrelevant since it was not getting the right answers! > > The MT algorithm passes each output integer through an additional > "tempering" > stage, which doesn't actually lengthen the cycle, but does scramble the > bits a > bit more. Unfortunately, under Dolphin, it is taking up almost half the > total > execution time. If I remove the tempering, then: > > MercenneTwister2 32 bits: 2.28 usecs > MercenneTwister2 48 bits: 5.5 usecs > > which might be a worthwhile speedup in some circumstances. I have looked at some literature on testing of random numbers, and some of the non-randomness can be quite subtle and yet quite devastating in some circumstances. I would be inclined not to tamper with the tempering - I doubt whether the extra running time would matter too much, certainly for my application. > > On the whole, I'm not tempted to push the code into a DLL. It would be > quite a > lot faster, but I'm not sure that's worth the effort -- yet... my suggestion of a DLL was not for speed, but just to be sure of reproducing the results of the original C code - by actually using it! > > For comparison, I measured a similar random stream,, where the > "randomness" is > generated by RC4 (aka PC1 as in PC1Cipher in the image). I don't know > what > kind of cycle time that has, but it is supposed to be of cryptographic > quality > (not surprising since its a cryptographic quality cypher). For that: > > RandomRC4 32 bits: 3.5 usecs > RandomRC4 48 bits: 6.0 usecs > > So that's a useful, and reasonably quick, option too. That looks very interesting. Do you have a source for that generator? It's presumably not PC1Cipher class>>generateRandomWithBits:, because I timed that at about 50 microsecs for the 32 bit case. I am not particularly hung up on the MT version, I would just like to get something with more guarantee of randomness than the Park & Miller version seems to have. Would you be willing to let me have either the 32 bit MT or the RC4 generator to play with, please? I realise I am not very good at this sort of development, and I would like to get on with my statistics research. Many thanks Peter |
Peter,
> Would you be willing to let me have either the 32 bit MT or > the RC4 generator to play with, please? Sure. I'll have to untangle them from some other stuff first, but I'll send them later on. -- chris |
Free forum by Nabble | Edit this page |