Unsigned small integers in Dolphin?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Unsigned small integers in Dolphin?

Peter Kenny-2
Hi All

This is a request for information which I think may have appeared in this
group before, but I can't locate it at the moment. I am trying to
re-implement in Dolphin an algorithm for pseudo-random number generation for
which I have the original code in C. The method operates on an array of 624
unsigned 32-bit integers. If I just reproduce the code in Dolphin, I think
the result will be a mixture of large and small integers, some of them
changing from one type to the other at each update. This looks like a recipe
for great inefficiency. Is there any way of operating on unsigned 32-bit
integers and guaranteeing to remain within 4 bytes? (The operations are
addition, bit shift left and right, bitAnd:, bitOr:)

--
Thanks in advance

Peter Kenny


Reply | Threaded
Open this post in threaded view
|

Re: Unsigned small integers in Dolphin?

Chris Uppal-3
Peter Kenny wrote:

> I am trying to
> re-implement in Dolphin an algorithm for pseudo-random number generation
> for which I have the original code in C. The method operates on an array
> of 624 unsigned 32-bit integers.

Mersenne Twister ?  I was just looking at that myself ;-)


> If I just reproduce the code in Dolphin,
> I think the result will be a mixture of large and small integers, some of
> them changing from one type to the other at each update. This looks like
> a recipe for great inefficiency.

That could be a problem, but Dolphin's LargeIntegers are pretty quick, so I
wouldn't worry about it until you've tried it.  (And if it /is/ too small it'd
make sense to pull the Twister code out into a DLL.  I'd have it fill a
DOUBLEArray buffer with data rather than involving an external call on each
invocation of #next.)


> Is there any way of operating on
> unsigned 32-bit integers and guaranteeing to remain within 4 bytes? (The
> operations are addition, bit shift left and right, bitAnd:, bitOr:)

You could /store/ the numbers in a DWORDArray.  I don't think it would help
much in this case -- the gain is in storage overhead rather than runtime
efficiency.  Also the actual arithmetic operations can only happen if you have
actual Integer objects (Large or Small), and will necessarily answer new ones,
so I don't think there's any way to avoid creating them.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Unsigned small integers in Dolphin?

Peter Kenny-2
Chris

Thanks. Yes it is Mersenne Twister. Following the discussion in another
thread, I started looking for more background on random numbers. I began
with Wikipedia, and I found things have moved on since I first used them in
the 1950s. One article described MT as ' the method of choice for
simulation', so I started to look at how to do it. There is code on the
internet for C/C++, Fortran, Pascal, Java and Heaven knows what else, so I
tried to do a direct implementation of the C code put up by the original
authors. I shall now do a simple-minded reproduction of the C code in
Smalltalk, which will almost certainly use large integers, and report back
here on the results.

Peter


Reply | Threaded
Open this post in threaded view
|

Mersenne Twister [was: Unsigned small integers in Dolphin?]

Peter Kenny-2
Hello all

My attempt to do a simple-minded reproduction in Smalltalk of the C code for
the Mersenne Twister pseudo-random number generator seems to have ground to
a halt. I have what seems to be a reproduction of the code logic, and it
runs and produces numbers in the right range. However, the authors of the
original give a listing of the first 1000 numbers produced with their
initialization, and I cannot get anything like their numbers. I think there
is a problem in reproducing the original logic fully, because Dolphin does
not have a logical bit-shift operation. The effect of the rightward
numerical bit-shift, which is to shift in copies of the sign bit at the
left, should be countered by bit-anding an appropriate mask, but this does
not seem to help. I think a leftward numerical bit-shift should be the same
as a logical shift, but I may have missed something here.

My worries about running speed were misplaced. The code I have takes about
50% longer than the integer Park & Miller algorithm in D5, so presumably
about twice as long as the all-floating one in D6, which would be acceptable
given the better quality of the output, but that is no comfort if the logic
is not correct. It may be necessary, as Chris suggests, to use the nucleus
of the C code as a DLL - which is beyond my knowledge.

Oh dear - after midnight again - I must be mad!

Peter


Reply | Threaded
Open this post in threaded view
|

Re: Mersenne Twister [was: Unsigned small integers in Dolphin?]

Chris Uppal-3
Peter,

> My attempt to do a simple-minded reproduction in Smalltalk of the C code
> for the Mersenne Twister pseudo-random number generator seems to have
> ground to a halt. I have what seems to be a reproduction of the code
> logic, and it runs and produces numbers in the right range. However, the
> authors of the original give a listing of the first 1000 numbers produced
> with their initialization, and I cannot get anything like their numbers.

I think that might either be because there are several implementations around
which don't necessarily produce the same results.  There were issues with the
seeding (which I don't understand), so that might break things too if you're
using test data from before the fix and template code from after.  Another
thing is that it's bloody difficult to translate array handling code into
Smalltalk from C.  I had great difficulty getting the seeding code to work
correctly.  I dislike 1-based indexing intensely...

Anyway, and for what it's worth, I have a (apparently) working implementation
based on the code and tests at:
    http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html
which produces the same numbers when used in the same way.


> My worries about running speed were misplaced. The code I have takes about
> 50% longer than the integer Park & Miller algorithm in D5, so presumably
> about twice as long as the all-floating one in D6, which would be
> acceptable given the better quality of the output,

It's a bit difficult to compare like with like, when the Random default is only
generating 31 bits of data with each #next, wheras my
implementation/transcription of MT is using 48 bits a go.  But with some
hacking to add the missing stuff to each class, I find that:

    Random 31 bits:    2.4 usecs
    Random 48 bits:    4.7 usecs

That's the D5 Random, the D6 may be faster but I don't know how to pack the
extra bits in.  The full MT:

    MercenneTwister 32 bits:    4.37 usecs
    MercenneTwister 48 bits:    9.7 usecs

The MT algoritm passes each output integer through an additional "tempering"
stage, which doesn't actually lengthen the cycle, but does scramble the bits a
bit more.  Unfortunately, under Dolphin, it is taking up almost half the total
execution time.  If I remove the tempering, then:

    MercenneTwister2 32 bits:    2.28 usecs
    MercenneTwister2 48 bits:    5.5 usecs

which might be a worthwhile speedup in some circumstances.

On the whole, I'm not tempted to push the code into a DLL.  It would be quite a
lot faster, but I'm not sure that's worth the effort -- yet...

For comparison, I measured a similar random stream,, where the "randomness" is
generated by RC4 (aka PC1 as in PC1Cipher in the image).  I don't know what
kind of cycle time that has, but it is supposed to be of cryptgraphic quality
(not surprising since its a cryptographic quality cypher).  For that:

    RandomRC4 32 bits:    3.5 usecs
    RandomRC4 48 bits:    6.0 usecs

So that's a useful, and reasonably quick, option too.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Mersenne Twister [was: Unsigned small integers in Dolphin?]

Peter Kenny-2
Chris

> Anyway, and for what it's worth, I have a (apparently) working
> implementation
> based on the code and tests at:
>    http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html
> which produces the same numbers when used in the same way.
>
My version is based on the original authors' C code, which is included in
Wagner's zip as mt19937ar.c, and I compared with their results in
mt19937ar.out. I don't understand C++ too well, so I can't see the
differences between the original and Wagner's version. Clearly you have done
better than I did, if you can match the given results.

>    Random 31 bits:    2.4 usecs
>    Random 48 bits:    4.7 usecs
>
> That's the D5 Random, the D6 may be faster but I don't know how to pack
> the
> extra bits in.  The full MT:
>
>    MercenneTwister 32 bits:    4.37 usecs
>    MercenneTwister 48 bits:    9.7 usecs

I was getting about 3.5 microsecs for my 32 bit version, which is clearly
irrelevant since it was not getting the right answers!

>
> The MT algorithm passes each output integer through an additional
> "tempering"
> stage, which doesn't actually lengthen the cycle, but does scramble the
> bits a
> bit more.  Unfortunately, under Dolphin, it is taking up almost half the
> total
> execution time.  If I remove the tempering, then:
>
>    MercenneTwister2 32 bits:    2.28 usecs
>    MercenneTwister2 48 bits:    5.5 usecs
>
> which might be a worthwhile speedup in some circumstances.

I have looked at some literature on testing of random numbers, and some of
the non-randomness can be quite subtle and yet quite devastating in some
circumstances. I would be inclined not to tamper with the tempering - I
doubt whether the extra running time would matter too much, certainly for my
application.

>
> On the whole, I'm not tempted to push the code into a DLL.  It would be
> quite a
> lot faster, but I'm not sure that's worth the effort -- yet...

my suggestion of a DLL was not for speed, but just to be sure of reproducing
the results of the original C code - by actually using it!

>
> For comparison, I measured a similar random stream,, where the
> "randomness" is
> generated by RC4 (aka PC1 as in PC1Cipher in the image).  I don't know
> what
> kind of cycle time that has, but it is supposed to be of cryptographic
> quality
> (not surprising since its a cryptographic quality cypher).  For that:
>
>    RandomRC4 32 bits:    3.5 usecs
>    RandomRC4 48 bits:    6.0 usecs
>
> So that's a useful, and reasonably quick, option too.

That looks very interesting. Do you have a source for that generator? It's
presumably not PC1Cipher class>>generateRandomWithBits:, because I timed
that at about 50 microsecs for the 32 bit case.

I am not particularly hung up on the MT version, I would just like to get
something with more guarantee of randomness than the Park & Miller version
seems to have. Would you be willing to let me have either the 32 bit MT or
the RC4 generator to play with, please? I realise I am not very good at this
sort of development, and I would like to get on with my statistics research.

Many thanks

Peter


Reply | Threaded
Open this post in threaded view
|

Re: Mersenne Twister [was: Unsigned small integers in Dolphin?]

Chris Uppal-3
Peter,

> Would you be willing to let me have either the 32 bit MT or
> the RC4 generator to play with, please?

Sure. I'll have to untangle them from some other stuff first, but I'll send
them later on.

    -- chris