Is this a buglet? (SmallInteger>>printString related)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Is this a buglet? (SmallInteger>>printString related)

Göran Krampe
Hi!

In playing around with SmallInteger>>printString (see
bugs.squeak.org/view.php?id=6512) I tested this code for calculating the
length of the String:

    self log truncated + 1

...but that doesn't work. The current code in my ChangeSet uses an
unrolled binary search (nested ifs) which is 7x faster
(#decimalDigitLength) so the above code is not something we would like to
use, but I still wonder why it fails and what I would have needed to do
about it to get it to work:

999 log truncated  ===> 2         As expected.

1000 log           ===> 3.0       As expected.

1000 log truncated ===> 2         Eh, what? Not expected!

...ok, yes, I know - Floats are the "tar pits of hell" when it comes to
expected results - but... can someone answer if this is a bug or if I
simply can't expect this code to work?

regards, Göran

PS. Strongtalk does in fact do SmallInteger>>printString almost exactly
like my enhancement does (didn't peek, promise!) but does it around 5x
faster than Squeak. :) I fired it up to look at how it does #log but log
wasn't implemented. On the other hand Bryce tested Exupery on my code and
it made a real difference - at least 2x better (not sure if he only tested
printString). And then he mentioned an obvious improvement in Exupery that
would make it even faster. Interesting method for comparing!


Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Bert Freudenberg

On May 30, 2007, at 13:01 , Göran Krampe wrote:

> Hi!
>
> In playing around with SmallInteger>>printString (see
> bugs.squeak.org/view.php?id=6512) I tested this code for  
> calculating the
> length of the String:
>
>     self log truncated + 1
>
> ...but that doesn't work. The current code in my ChangeSet uses an
> unrolled binary search (nested ifs) which is 7x faster
> (#decimalDigitLength) so the above code is not something we would  
> like to
> use, but I still wonder why it fails and what I would have needed  
> to do
> about it to get it to work:
>
> 999 log truncated  ===> 2         As expected.
>
> 1000 log           ===> 3.0       As expected.
>
> 1000 log truncated ===> 2         Eh, what? Not expected!
>
> ...ok, yes, I know - Floats are the "tar pits of hell" when it  
> comes to
> expected results - but... can someone answer if this is a bug or if I
> simply can't expect this code to work?


1000 log - 3.0 "-4.44089209850063e-16"

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Alexander Lazarevic'
In reply to this post by Göran Krampe
Hi Göran,

when I wrote printStringBase: I remember I tried to use x ln / base ln
to get the string length with the same results as you are. Depending on
floats it is just not robust. I went with Stream to keep it simple. It
sure can help to have an optimized version for the major base rather
then using printStringBase: 10.

Alex

Göran Krampe schrieb:

> Hi!
>
> In playing around with SmallInteger>>printString (see
> bugs.squeak.org/view.php?id=6512) I tested this code for calculating the
> length of the String:
>
>     self log truncated + 1
>
> ...but that doesn't work. The current code in my ChangeSet uses an
> unrolled binary search (nested ifs) which is 7x faster
> (#decimalDigitLength) so the above code is not something we would like to
> use, but I still wonder why it fails and what I would have needed to do
> about it to get it to work:
>
> 999 log truncated  ===> 2         As expected.
>
> 1000 log           ===> 3.0       As expected.
>
> 1000 log truncated ===> 2         Eh, what? Not expected!
>
> ...ok, yes, I know - Floats are the "tar pits of hell" when it comes to
> expected results - but... can someone answer if this is a bug or if I
> simply can't expect this code to work?
>
> regards, Göran
>
> PS. Strongtalk does in fact do SmallInteger>>printString almost exactly
> like my enhancement does (didn't peek, promise!) but does it around 5x
> faster than Squeak. :) I fired it up to look at how it does #log but log
> wasn't implemented. On the other hand Bryce tested Exupery on my code and
> it made a real difference - at least 2x better (not sure if he only tested
> printString). And then he mentioned an obvious improvement in Exupery that
> would make it even faster. Interesting method for comparing!
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Bert Freudenberg
Well,

        (self asFloat + 0.9) log truncated + 1

would work reliably for SmallIntegers.

The nested conditionals might still win ... in particular if it was  
not binary search but based on frequency of numbers.

- Bert -

On May 30, 2007, at 15:02 , Alexander Lazarevic' wrote:

> Hi Göran,
>
> when I wrote printStringBase: I remember I tried to use x ln / base ln
> to get the string length with the same results as you are.  
> Depending on
> floats it is just not robust. I went with Stream to keep it simple. It
> sure can help to have an optimized version for the major base rather
> then using printStringBase: 10.
>
> Alex
>
> Göran Krampe schrieb:
>> Hi!
>>
>> In playing around with SmallInteger>>printString (see
>> bugs.squeak.org/view.php?id=6512) I tested this code for  
>> calculating the
>> length of the String:
>>
>>     self log truncated + 1
>>
>> ...but that doesn't work. The current code in my ChangeSet uses an
>> unrolled binary search (nested ifs) which is 7x faster
>> (#decimalDigitLength) so the above code is not something we would  
>> like to
>> use, but I still wonder why it fails and what I would have needed  
>> to do
>> about it to get it to work:
>>
>> 999 log truncated  ===> 2         As expected.
>>
>> 1000 log           ===> 3.0       As expected.
>>
>> 1000 log truncated ===> 2         Eh, what? Not expected!
>>
>> ...ok, yes, I know - Floats are the "tar pits of hell" when it  
>> comes to
>> expected results - but... can someone answer if this is a bug or if I
>> simply can't expect this code to work?
>>
>> regards, Göran
>>
>> PS. Strongtalk does in fact do SmallInteger>>printString almost  
>> exactly
>> like my enhancement does (didn't peek, promise!) but does it  
>> around 5x
>> faster than Squeak. :) I fired it up to look at how it does #log  
>> but log
>> wasn't implemented. On the other hand Bryce tested Exupery on my  
>> code and
>> it made a real difference - at least 2x better (not sure if he  
>> only tested
>> printString). And then he mentioned an obvious improvement in  
>> Exupery that
>> would make it even faster. Interesting method for comparing!
>>
>>
>





Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Nicolas Cellier-3
I used this in VW for any arbitrary large integer ,
Works even if integer cannot convert to Float because > Float maxVal.
(available at public store SYSEXT-floorLog)

Integer>>floorLog10
        "Initial guess is based on digitLength and following approximation :
        (2 raisedTo: 10) > (10 raisedTo: 3)"

        | i j step guess n |
        (self negative or: [self isZero])
                ifTrue:
                        [^self error: 'Can''t take log of number <= 0'].
        guess := (self digitLength - 1) * 8 * 3 // 10.
        i := self // (10 raisedTo: guess).
        step := 2.
        [step > 0] whileTrue:
                        [n := 10 raisedTo: step.
                       
                        [j := i // n.
                        j > 0] whileTrue:
                                                [guess := guess + step.
                                                i := j].
                        step := step // 2].
        ^guess

Of course, for SmallInteger, Bert's solution might be faster.

As for Float, i guess a native log10 FFI call should behave better than
self ln / 10 ln, since the latter implies 3 rounding operations (and an
inexact operation). I did not check the former however.

Nicolas


Bert Freudenberg a écrit :

> Well,
>
>     (self asFloat + 0.9) log truncated + 1
>
> would work reliably for SmallIntegers.
>
> The nested conditionals might still win ... in particular if it was not
> binary search but based on frequency of numbers.
>
> - Bert -
>
> On May 30, 2007, at 15:02 , Alexander Lazarevic' wrote:
>
>> Hi Göran,
>>
>> when I wrote printStringBase: I remember I tried to use x ln / base ln
>> to get the string length with the same results as you are. Depending on
>> floats it is just not robust. I went with Stream to keep it simple. It
>> sure can help to have an optimized version for the major base rather
>> then using printStringBase: 10.
>>
>> Alex
>>
>> Göran Krampe schrieb:
>>> Hi!
>>>
>>> In playing around with SmallInteger>>printString (see
>>> bugs.squeak.org/view.php?id=6512) I tested this code for calculating the
>>> length of the String:
>>>
>>>     self log truncated + 1
>>>
>>> ...but that doesn't work. The current code in my ChangeSet uses an
>>> unrolled binary search (nested ifs) which is 7x faster
>>> (#decimalDigitLength) so the above code is not something we would
>>> like to
>>> use, but I still wonder why it fails and what I would have needed to do
>>> about it to get it to work:
>>>
>>> 999 log truncated  ===> 2         As expected.
>>>
>>> 1000 log           ===> 3.0       As expected.
>>>
>>> 1000 log truncated ===> 2         Eh, what? Not expected!
>>>
>>> ...ok, yes, I know - Floats are the "tar pits of hell" when it comes to
>>> expected results - but... can someone answer if this is a bug or if I
>>> simply can't expect this code to work?
>>>
>>> regards, Göran
>>>
>>> PS. Strongtalk does in fact do SmallInteger>>printString almost exactly
>>> like my enhancement does (didn't peek, promise!) but does it around 5x
>>> faster than Squeak. :) I fired it up to look at how it does #log but log
>>> wasn't implemented. On the other hand Bryce tested Exupery on my code
>>> and
>>> it made a real difference - at least 2x better (not sure if he only
>>> tested
>>> printString). And then he mentioned an obvious improvement in Exupery
>>> that
>>> would make it even faster. Interesting method for comparing!
>>>
>>>
>>
>
>
>
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Andreas.Raab
nicolas cellier wrote:
> As for Float, i guess a native log10 FFI call should behave better than
> self ln / 10 ln, since the latter implies 3 rounding operations (and an
> inexact operation).

Indeed. Croquet gets it right (it uses a primitive from fdlibm).

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Nicolas Cellier-3
In reply to this post by Bert Freudenberg
Just tried this very simple one:
(I guess the smaller the merrier)

floorLog10bert
     "from Bert Freundenberg's idea"

     (self negative or: [self isZero])
         ifTrue:
             [^self error: 'Can''t take log of number <= 0'].
        self < 10 ifTrue: [^0].
        self < 100 ifTrue: [^1].
        self < 1000 ifTrue: [^2].
        self < 10000 ifTrue: [^3].
        self < 100000 ifTrue: [^4].
        self < 1000000 ifTrue: [^5].
        self < 10000000 ifTrue: [^6].
        self < 100000000 ifTrue: [^7].
        self < 1000000000 ifTrue: [^8].
        ^9 "Based on the fact that (SmallInteger maxVal log: 10) floor = 9"
       

Comparing universal Integer version provided in previous mail and Bert's
gives a factor 5 at least

Time millisecondsToRun: [1000000 timesRepeat: [SmallInteger maxVal
floorLog10 ]]. 19846
Time millisecondsToRun: [1000000 timesRepeat: [SmallInteger maxVal
floorLog10bert ]]. 3790 3777

Ain't that much to gain when reducing number of tests anyway:
Time millisecondsToRun: [1000000 timesRepeat: [1 floorLog10bert ]]. 2563

Seems fast enough not to bother.

Nicolas

Bert Freudenberg a écrit :
>
> The nested conditionals might still win ... in particular if it was not
> binary search but based on frequency of numbers.
>
> - Bert -
>


Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Göran Krampe
Hi!

nicolas cellier <[hidden email]> wrote:

> Just tried this very simple one:
> (I guess the smaller the merrier)
>
> floorLog10bert
>      "from Bert Freundenberg's idea"
>
>      (self negative or: [self isZero])
>          ifTrue:
>              [^self error: 'Can''t take log of number <= 0'].
> self < 10 ifTrue: [^0].
> self < 100 ifTrue: [^1].
> self < 1000 ifTrue: [^2].
> self < 10000 ifTrue: [^3].
> self < 100000 ifTrue: [^4].
> self < 1000000 ifTrue: [^5].
> self < 10000000 ifTrue: [^6].
> self < 100000000 ifTrue: [^7].
> self < 1000000000 ifTrue: [^8].
> ^9 "Based on the fact that (SmallInteger maxVal log: 10) floor = 9"
>
>
> Comparing universal Integer version provided in previous mail and Bert's
> gives a factor 5 at least

I also noted this and I also used the above approach - though with a
binary search. My code is in the mantis issue I posted in the beginning.
:)

Anyway, since smaller numbers are likely to be much more common I would
guess the ordering above is actually "smarter". And yes, it was fun to
see that Strongtalk does the above and its SmallInteger>>printString is
very similar to the one I ended up writing in the mantis issue.

regards, Göran

Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Bert Freudenberg
On May 30, 2007, at 23:36 , [hidden email] wrote:

> Hi!
>
> nicolas cellier <[hidden email]> wrote:
>> Just tried this very simple one:
>> (I guess the smaller the merrier)
>>
>> floorLog10bert
>>      "from Bert Freundenberg's idea"

(My name occurrencesOf: $n) = 1 ;-)

>>      (self negative or: [self isZero])
>>          ifTrue:
>>              [^self error: 'Can''t take log of number <= 0'].

I'd use "self < 1" for the first test to avoid unnecessary message  
sends (and it's prettier, too).

>> self < 10 ifTrue: [^0].
>> self < 100 ifTrue: [^1].
>> self < 1000 ifTrue: [^2].
>> self < 10000 ifTrue: [^3].
>> self < 100000 ifTrue: [^4].
>> self < 1000000 ifTrue: [^5].
>> self < 10000000 ifTrue: [^6].
>> self < 100000000 ifTrue: [^7].
>> self < 1000000000 ifTrue: [^8].
>> ^9 "Based on the fact that (SmallInteger maxVal log: 10) floor = 9"
>>
>>
>> Comparing universal Integer version provided in previous mail and  
>> Bert's
>> gives a factor 5 at least


> I also noted this and I also used the above approach - though with a
> binary search. My code is in the mantis issue I posted in the  
> beginning.
> :)
>
> Anyway, since smaller numbers are likely to be much more common I  
> would
> guess the ordering above is actually "smarter". And yes, it was fun to
> see that Strongtalk does the above and its  
> SmallInteger>>printString is
> very similar to the one I ended up writing in the mantis issue.

I still wonder how common #printString calls are in performance-
sensitive code ... I usually use printOn: if performance matters, so  
maybe that should be optimized, too?

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Nicolas Cellier-3
Bert Freudenberg a écrit :
> On May 30, 2007, at 23:36 , [hidden email] wrote:
>
>>> floorLog10bert
>>>      "from Bert Freundenberg's idea"
>
> (My name occurrencesOf: $n) = 1 ;-)
>

Apologies, take it as a friendly error.

>>>      (self negative or: [self isZero])
>>>          ifTrue:
>>>              [^self error: 'Can''t take log of number <= 0'].
>
> I'd use "self < 1" for the first test to avoid unnecessary message sends
> (and it's prettier, too).
>

For sure, (self negative or: [self isZero]) is a stupid copy paste.
It was tailored for LargeInteger implementation for which both
operations are trivial.
But when i think of it, LargeInteger < SmallInteger should be trivial too.
I wonder which dialect might engage a costly useless arithmetic
operation when asked for it...

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Alexander Lazarevic'
In reply to this post by Bert Freudenberg
Bert Freudenberg schrieb:
> Well,
>
>     (self asFloat + 0.9) log truncated + 1
>
> would work reliably for SmallIntegers.

Ah. Well yes :)

Alex

Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Nicolas Cellier-3
In reply to this post by Nicolas Cellier-3
nicolas cellier a écrit :

> For sure, (self negative or: [self isZero]) is a stupid copy paste.
> It was tailored for LargeInteger implementation for which both
> operations are trivial.
> But when i think of it, LargeInteger < SmallInteger should be trivial too.
> I wonder which dialect might engage a costly useless arithmetic
> operation when asked for it...
>
> Nicolas
>

For my defense, just gave it a try:
after defining LargePositiveInteger>>isZero ^false, i test:

| val |
val := (SmallInteger maxVal raisedTo: 4) + 1.
{Time millisecondsToRun: [100000 timesRepeat: [val < 1]].
Time millisecondsToRun: [100000 timesRepeat: [val negative or: [val
isZero]]].
Time millisecondsToRun: [100000 timesRepeat: [val strictlyPositive not]].}
        #(457 118 106)

Hmm test < 1 is not the fastest.
Since primitive 23 seems to fail, i remove LargePositiveInteger>>#<
and i get #(399 121 113).
Then not much to gain, #digitCompare: must be half the cost, one trivial
negative and two #< tests on small integers...

So i tried the double dispatching way:
LargePositiveInteger>>#< aNumber
        ^aNumber isInteger
                ifTrue: [aNumber greaterThanLargePositiveInteger: self]
                ifFalse: [aNumber adaptToInteger: self andSend: #<]
LargePositiveInteger>>#greaterThanLargePositiveInteger: aNumber
         ^(self digitCompare: aNumber) > 0
LargeNegativeInteger>>#greaterThanLargePositiveInteger: aNumber
         ^false
SmallInteger>>#greaterThanLargePositiveInteger: aNumber
         ^false

I now get    #(217 111 108).
Like for isZero and isNegative, I call two trivial methods but pay a
penalty for intermediate #< method invocation.
(would need Exupery inlining if able to handle such polymorphic case).

I could also suppress the isInteger test and implement:
Object>>#greaterThanLargePositiveInteger: aNumber
         ^self adaptToInteger: aNumber andSend: #<
Well  #(190 112 108) maybe does not deserve polluting Object protocol...

If you don't care, or think it's too much for such a ridiculous gain,
then you're probably right.
Don't you optimize but in tight loops! And maybe arithmetics too...
Anyway it's fun when it can be done in Smalltalk without any additionnal
primitive or plugin...

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Göran Krampe
In reply to this post by Bert Freudenberg
Hi folks!

Messing with small scale optimization can be great fun. Preamble:

"Change Set: FloorLogSmallInteger
Date: 31 May 2007
Author: Göran Krampe

After messing with SmallInteger>>decimalDigitLength and noticing that
'1000 log truncated' returns the wrong answer I ended up chasing down
base-10 log algorithms. Found this nice source of tricks:

http://www.hackersdelight.org/HDcode/ilog.c

And based on code in there and the fact that nlz (number of leading zeros)
can be computed using 'self asFloat exponent' I came up with this
implementation of #floorLog for SmallIntegers. It still is not as fast as
simpleminded comparisons as in decimalDigitLength though so it is not
worth using. But it was fun to experiment."


So the attachment is merely for amusement and/or info on the subject! :) :)

regards, Göran


FloorLogSmallInteger.1.cs.gz (898 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Is this a buglet? (SmallInteger>>printString related)

Nicolas Cellier-3
Göran Krampe <goran <at> krampe.se> writes:

>
> After messing with SmallInteger>>decimalDigitLength and noticing that
> '1000 log truncated' returns the wrong answer I ended up chasing down
> base-10 log algorithms. Found this nice source of tricks:
>
> http://www.hackersdelight.org/HDcode/ilog.c
>
> And based on code in there and the fact that nlz (number of leading zeros)
> can be computed using 'self asFloat exponent' I came up with this
> implementation of #floorLog for SmallIntegers. It still is not as fast as
> simpleminded comparisons as in decimalDigitLength though so it is not
> worth using. But it was fun to experiment."
>

Yes, this link is fun.
Also don't forget Smalltalk has #highBit,
a complement of nlz not involving Float.

Nicolas