Smalltalk › Squeak › Squeak - Dev

Is this a buglet? (SmallInteger>>printString related)

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

14 messages Options

Göran Krampe

Is this a buglet? (SmallInteger>>printString related)

Hi!

In playing around with SmallInteger>>printString (see
bugs.squeak.org/view.php?id=6512) I tested this code for calculating the
length of the String:

self log truncated + 1

...but that doesn't work. The current code in my ChangeSet uses an
unrolled binary search (nested ifs) which is 7x faster
(#decimalDigitLength) so the above code is not something we would like to
use, but I still wonder why it fails and what I would have needed to do
about it to get it to work:

999 log truncated ===> 2 As expected.

1000 log ===> 3.0 As expected.

1000 log truncated ===> 2 Eh, what? Not expected!

...ok, yes, I know - Floats are the "tar pits of hell" when it comes to
expected results - but... can someone answer if this is a bug or if I
simply can't expect this code to work?

regards, Göran

PS. Strongtalk does in fact do SmallInteger>>printString almost exactly
like my enhancement does (didn't peek, promise!) but does it around 5x
faster than Squeak. :) I fired it up to look at how it does #log but log
wasn't implemented. On the other hand Bryce tested Exupery on my code and
it made a real difference - at least 2x better (not sure if he only tested
printString). And then he mentioned an obvious improvement in Exupery that
would make it even faster. Interesting method for comparing!

Bert Freudenberg

Re: Is this a buglet? (SmallInteger>>printString related)

On May 30, 2007, at 13:01 , Göran Krampe wrote:

> Hi!
>
> In playing around with SmallInteger>>printString (see
> bugs.squeak.org/view.php?id=6512) I tested this code for
> calculating the
> length of the String:
>
> self log truncated + 1
>
> ...but that doesn't work. The current code in my ChangeSet uses an
> unrolled binary search (nested ifs) which is 7x faster
> (#decimalDigitLength) so the above code is not something we would
> like to
> use, but I still wonder why it fails and what I would have needed
> to do
> about it to get it to work:
>
> 999 log truncated ===> 2 As expected.
>
> 1000 log ===> 3.0 As expected.
>
> 1000 log truncated ===> 2 Eh, what? Not expected!
>
> ...ok, yes, I know - Floats are the "tar pits of hell" when it
> comes to
> expected results - but... can someone answer if this is a bug or if I
> simply can't expect this code to work?

1000 log - 3.0 "-4.44089209850063e-16"

- Bert -

Alexander Lazarevic'

Re: Is this a buglet? (SmallInteger>>printString related)

In reply to this post by Göran Krampe

Hi Göran,

when I wrote printStringBase: I remember I tried to use x ln / base ln
to get the string length with the same results as you are. Depending on
floats it is just not robust. I went with Stream to keep it simple. It
sure can help to have an optimized version for the major base rather
then using printStringBase: 10.

Alex

Göran Krampe schrieb:

> Hi!
>
> In playing around with SmallInteger>>printString (see
> bugs.squeak.org/view.php?id=6512) I tested this code for calculating the
> length of the String:
>
> self log truncated + 1
>
> ...but that doesn't work. The current code in my ChangeSet uses an
> unrolled binary search (nested ifs) which is 7x faster
> (#decimalDigitLength) so the above code is not something we would like to
> use, but I still wonder why it fails and what I would have needed to do
> about it to get it to work:
>
> 999 log truncated ===> 2 As expected.
>
> 1000 log ===> 3.0 As expected.
>
> 1000 log truncated ===> 2 Eh, what? Not expected!
>
> ...ok, yes, I know - Floats are the "tar pits of hell" when it comes to
> expected results - but... can someone answer if this is a bug or if I
> simply can't expect this code to work?
>
> regards, Göran
>
> PS. Strongtalk does in fact do SmallInteger>>printString almost exactly
> like my enhancement does (didn't peek, promise!) but does it around 5x
> faster than Squeak. :) I fired it up to look at how it does #log but log
> wasn't implemented. On the other hand Bryce tested Exupery on my code and
> it made a real difference - at least 2x better (not sure if he only tested
> printString). And then he mentioned an obvious improvement in Exupery that
> would make it even faster. Interesting method for comparing!
>
>

Bert Freudenberg

Re: Is this a buglet? (SmallInteger>>printString related)

Well,

(self asFloat + 0.9) log truncated + 1

would work reliably for SmallIntegers.

The nested conditionals might still win ... in particular if it was
not binary search but based on frequency of numbers.

- Bert -

On May 30, 2007, at 15:02 , Alexander Lazarevic' wrote:

> Hi Göran,
>
> when I wrote printStringBase: I remember I tried to use x ln / base ln
> to get the string length with the same results as you are.
> Depending on
> floats it is just not robust. I went with Stream to keep it simple. It
> sure can help to have an optimized version for the major base rather
> then using printStringBase: 10.
>
> Alex
>
> Göran Krampe schrieb:
>> Hi!
>>
>> In playing around with SmallInteger>>printString (see
>> bugs.squeak.org/view.php?id=6512) I tested this code for
>> calculating the
>> length of the String:
>>
>> self log truncated + 1
>>
>> ...but that doesn't work. The current code in my ChangeSet uses an
>> unrolled binary search (nested ifs) which is 7x faster
>> (#decimalDigitLength) so the above code is not something we would
>> like to
>> use, but I still wonder why it fails and what I would have needed
>> to do
>> about it to get it to work:
>>
>> 999 log truncated ===> 2 As expected.
>>
>> 1000 log ===> 3.0 As expected.
>>
>> 1000 log truncated ===> 2 Eh, what? Not expected!
>>
>> ...ok, yes, I know - Floats are the "tar pits of hell" when it
>> comes to
>> expected results - but... can someone answer if this is a bug or if I
>> simply can't expect this code to work?
>>
>> regards, Göran
>>
>> PS. Strongtalk does in fact do SmallInteger>>printString almost
>> exactly
>> like my enhancement does (didn't peek, promise!) but does it
>> around 5x
>> faster than Squeak. :) I fired it up to look at how it does #log
>> but log
>> wasn't implemented. On the other hand Bryce tested Exupery on my
>> code and
>> it made a real difference - at least 2x better (not sure if he
>> only tested
>> printString). And then he mentioned an obvious improvement in
>> Exupery that
>> would make it even faster. Interesting method for comparing!
>>
>>
>

Nicolas Cellier-3

Re: Is this a buglet? (SmallInteger>>printString related)

I used this in VW for any arbitrary large integer ,
Works even if integer cannot convert to Float because > Float maxVal.
(available at public store SYSEXT-floorLog)

Integer>>floorLog10
"Initial guess is based on digitLength and following approximation :
(2 raisedTo: 10) > (10 raisedTo: 3)"

| i j step guess n |
(self negative or: [self isZero])
ifTrue:
[^self error: 'Can''t take log of number <= 0'].
guess := (self digitLength - 1) * 8 * 3 // 10.
i := self // (10 raisedTo: guess).
step := 2.
[step > 0] whileTrue:
[n := 10 raisedTo: step.

[j := i // n.
j > 0] whileTrue:
[guess := guess + step.
i := j].
step := step // 2].
^guess

Of course, for SmallInteger, Bert's solution might be faster.

As for Float, i guess a native log10 FFI call should behave better than
self ln / 10 ln, since the latter implies 3 rounding operations (and an
inexact operation). I did not check the former however.

Nicolas

Bert Freudenberg a écrit :

> Well,
>
> (self asFloat + 0.9) log truncated + 1
>
> would work reliably for SmallIntegers.
>
> The nested conditionals might still win ... in particular if it was not
> binary search but based on frequency of numbers.
>
> - Bert -
>
> On May 30, 2007, at 15:02 , Alexander Lazarevic' wrote:
>
>> Hi Göran,
>>
>> when I wrote printStringBase: I remember I tried to use x ln / base ln
>> to get the string length with the same results as you are. Depending on
>> floats it is just not robust. I went with Stream to keep it simple. It
>> sure can help to have an optimized version for the major base rather
>> then using printStringBase: 10.
>>
>> Alex
>>
>> Göran Krampe schrieb:
>>> Hi!
>>>
>>> In playing around with SmallInteger>>printString (see
>>> bugs.squeak.org/view.php?id=6512) I tested this code for calculating the
>>> length of the String:
>>>
>>> self log truncated + 1
>>>
>>> ...but that doesn't work. The current code in my ChangeSet uses an
>>> unrolled binary search (nested ifs) which is 7x faster
>>> (#decimalDigitLength) so the above code is not something we would
>>> like to
>>> use, but I still wonder why it fails and what I would have needed to do
>>> about it to get it to work:
>>>
>>> 999 log truncated ===> 2 As expected.
>>>
>>> 1000 log ===> 3.0 As expected.
>>>
>>> 1000 log truncated ===> 2 Eh, what? Not expected!
>>>
>>> ...ok, yes, I know - Floats are the "tar pits of hell" when it comes to
>>> expected results - but... can someone answer if this is a bug or if I
>>> simply can't expect this code to work?
>>>
>>> regards, Göran
>>>
>>> PS. Strongtalk does in fact do SmallInteger>>printString almost exactly
>>> like my enhancement does (didn't peek, promise!) but does it around 5x
>>> faster than Squeak. :) I fired it up to look at how it does #log but log
>>> wasn't implemented. On the other hand Bryce tested Exupery on my code
>>> and
>>> it made a real difference - at least 2x better (not sure if he only
>>> tested
>>> printString). And then he mentioned an obvious improvement in Exupery
>>> that
>>> would make it even faster. Interesting method for comparing!
>>>
>>>
>>
>
>
>
>
>
>

Andreas.Raab

Re: Is this a buglet? (SmallInteger>>printString related)

nicolas cellier wrote:
> As for Float, i guess a native log10 FFI call should behave better than
> self ln / 10 ln, since the latter implies 3 rounding operations (and an
> inexact operation).

Indeed. Croquet gets it right (it uses a primitive from fdlibm).

Cheers,
- Andreas

Nicolas Cellier-3

Re: Is this a buglet? (SmallInteger>>printString related)

In reply to this post by Bert Freudenberg

Just tried this very simple one:
(I guess the smaller the merrier)

floorLog10bert
"from Bert Freundenberg's idea"

(self negative or: [self isZero])
ifTrue:
[^self error: 'Can''t take log of number <= 0'].
self < 10 ifTrue: [^0].
self < 100 ifTrue: [^1].
self < 1000 ifTrue: [^2].
self < 10000 ifTrue: [^3].
self < 100000 ifTrue: [^4].
self < 1000000 ifTrue: [^5].
self < 10000000 ifTrue: [^6].
self < 100000000 ifTrue: [^7].
self < 1000000000 ifTrue: [^8].
^9 "Based on the fact that (SmallInteger maxVal log: 10) floor = 9"

Comparing universal Integer version provided in previous mail and Bert's
gives a factor 5 at least

Time millisecondsToRun: [1000000 timesRepeat: [SmallInteger maxVal
floorLog10 ]]. 19846
Time millisecondsToRun: [1000000 timesRepeat: [SmallInteger maxVal
floorLog10bert ]]. 3790 3777

Ain't that much to gain when reducing number of tests anyway:
Time millisecondsToRun: [1000000 timesRepeat: [1 floorLog10bert ]]. 2563

Seems fast enough not to bother.

Nicolas

Bert Freudenberg a écrit :
>
> The nested conditionals might still win ... in particular if it was not
> binary search but based on frequency of numbers.
>
> - Bert -
>

Göran Krampe

Re: Is this a buglet? (SmallInteger>>printString related)

Hi!

nicolas cellier <[hidden email]> wrote:

> Just tried this very simple one:
> (I guess the smaller the merrier)
>
> floorLog10bert
> "from Bert Freundenberg's idea"
>
> (self negative or: [self isZero])
> ifTrue:
> [^self error: 'Can''t take log of number <= 0'].
> self < 10 ifTrue: [^0].
> self < 100 ifTrue: [^1].
> self < 1000 ifTrue: [^2].
> self < 10000 ifTrue: [^3].
> self < 100000 ifTrue: [^4].
> self < 1000000 ifTrue: [^5].
> self < 10000000 ifTrue: [^6].
> self < 100000000 ifTrue: [^7].
> self < 1000000000 ifTrue: [^8].
> ^9 "Based on the fact that (SmallInteger maxVal log: 10) floor = 9"
>
>
> Comparing universal Integer version provided in previous mail and Bert's
> gives a factor 5 at least

I also noted this and I also used the above approach - though with a
binary search. My code is in the mantis issue I posted in the beginning.
:)

Anyway, since smaller numbers are likely to be much more common I would
guess the ordering above is actually "smarter". And yes, it was fun to
see that Strongtalk does the above and its SmallInteger>>printString is
very similar to the one I ended up writing in the mantis issue.

regards, Göran

Bert Freudenberg

Re: Is this a buglet? (SmallInteger>>printString related)

On May 30, 2007, at 23:36 , [hidden email] wrote:

> Hi!
>
> nicolas cellier <[hidden email]> wrote:
>> Just tried this very simple one:
>> (I guess the smaller the merrier)
>>
>> floorLog10bert
>> "from Bert Freundenberg's idea"

(My name occurrencesOf: $n) = 1 ;-)

>> (self negative or: [self isZero])
>> ifTrue:
>> [^self error: 'Can''t take log of number <= 0'].

I'd use "self < 1" for the first test to avoid unnecessary message
sends (and it's prettier, too).

>> self < 10 ifTrue: [^0].
>> self < 100 ifTrue: [^1].
>> self < 1000 ifTrue: [^2].
>> self < 10000 ifTrue: [^3].
>> self < 100000 ifTrue: [^4].
>> self < 1000000 ifTrue: [^5].
>> self < 10000000 ifTrue: [^6].
>> self < 100000000 ifTrue: [^7].
>> self < 1000000000 ifTrue: [^8].
>> ^9 "Based on the fact that (SmallInteger maxVal log: 10) floor = 9"
>>
>>
>> Comparing universal Integer version provided in previous mail and
>> Bert's
>> gives a factor 5 at least

> I also noted this and I also used the above approach - though with a
> binary search. My code is in the mantis issue I posted in the
> beginning.
> :)
>
> Anyway, since smaller numbers are likely to be much more common I
> would
> guess the ordering above is actually "smarter". And yes, it was fun to
> see that Strongtalk does the above and its
> SmallInteger>>printString is
> very similar to the one I ended up writing in the mantis issue.

I still wonder how common #printString calls are in performance-
sensitive code ... I usually use printOn: if performance matters, so
maybe that should be optimized, too?

- Bert -

Nicolas Cellier-3

Re: Is this a buglet? (SmallInteger>>printString related)

Bert Freudenberg a écrit :
> On May 30, 2007, at 23:36 , [hidden email] wrote:
>
>>> floorLog10bert
>>> "from Bert Freundenberg's idea"
>
> (My name occurrencesOf: $n) = 1 ;-)
>

Apologies, take it as a friendly error.

>>> (self negative or: [self isZero])
>>> ifTrue:
>>> [^self error: 'Can''t take log of number <= 0'].
>
> I'd use "self < 1" for the first test to avoid unnecessary message sends
> (and it's prettier, too).
>

For sure, (self negative or: [self isZero]) is a stupid copy paste.
It was tailored for LargeInteger implementation for which both
operations are trivial.
But when i think of it, LargeInteger < SmallInteger should be trivial too.
I wonder which dialect might engage a costly useless arithmetic
operation when asked for it...

Nicolas

Alexander Lazarevic'

Re: Is this a buglet? (SmallInteger>>printString related)

In reply to this post by Bert Freudenberg

Bert Freudenberg schrieb:
> Well,
>
> (self asFloat + 0.9) log truncated + 1
>
> would work reliably for SmallIntegers.

Ah. Well yes :)

Alex

Nicolas Cellier-3

Re: Is this a buglet? (SmallInteger>>printString related)

In reply to this post by Nicolas Cellier-3

nicolas cellier a écrit :

> For sure, (self negative or: [self isZero]) is a stupid copy paste.
> It was tailored for LargeInteger implementation for which both
> operations are trivial.
> But when i think of it, LargeInteger < SmallInteger should be trivial too.
> I wonder which dialect might engage a costly useless arithmetic
> operation when asked for it...
>
> Nicolas
>

For my defense, just gave it a try:
after defining LargePositiveInteger>>isZero ^false, i test:

| val |
val := (SmallInteger maxVal raisedTo: 4) + 1.
{Time millisecondsToRun: [100000 timesRepeat: [val < 1]].
Time millisecondsToRun: [100000 timesRepeat: [val negative or: [val
isZero]]].
Time millisecondsToRun: [100000 timesRepeat: [val strictlyPositive not]].}
#(457 118 106)

Hmm test < 1 is not the fastest.
Since primitive 23 seems to fail, i remove LargePositiveInteger>>#<
and i get #(399 121 113).
Then not much to gain, #digitCompare: must be half the cost, one trivial
negative and two #< tests on small integers...

So i tried the double dispatching way:
LargePositiveInteger>>#< aNumber
^aNumber isInteger
ifTrue: [aNumber greaterThanLargePositiveInteger: self]
ifFalse: [aNumber adaptToInteger: self andSend: #<]
LargePositiveInteger>>#greaterThanLargePositiveInteger: aNumber
^(self digitCompare: aNumber) > 0
LargeNegativeInteger>>#greaterThanLargePositiveInteger: aNumber
^false
SmallInteger>>#greaterThanLargePositiveInteger: aNumber
^false

I now get #(217 111 108).
Like for isZero and isNegative, I call two trivial methods but pay a
penalty for intermediate #< method invocation.
(would need Exupery inlining if able to handle such polymorphic case).

I could also suppress the isInteger test and implement:
Object>>#greaterThanLargePositiveInteger: aNumber
^self adaptToInteger: aNumber andSend: #<
Well #(190 112 108) maybe does not deserve polluting Object protocol...

If you don't care, or think it's too much for such a ridiculous gain,
then you're probably right.
Don't you optimize but in tight loops! And maybe arithmetics too...
Anyway it's fun when it can be done in Smalltalk without any additionnal
primitive or plugin...

Nicolas

Göran Krampe

Re: Is this a buglet? (SmallInteger>>printString related)

In reply to this post by Bert Freudenberg

Hi folks!

Messing with small scale optimization can be great fun. Preamble:

"Change Set: FloorLogSmallInteger
Date: 31 May 2007
Author: Göran Krampe

After messing with SmallInteger>>decimalDigitLength and noticing that
'1000 log truncated' returns the wrong answer I ended up chasing down
base-10 log algorithms. Found this nice source of tricks:

http://www.hackersdelight.org/HDcode/ilog.c

And based on code in there and the fact that nlz (number of leading zeros)
can be computed using 'self asFloat exponent' I came up with this
implementation of #floorLog for SmallIntegers. It still is not as fast as
simpleminded comparisons as in decimalDigitLength though so it is not
worth using. But it was fun to experiment."

So the attachment is merely for amusement and/or info on the subject! :) :)

regards, Göran

FloorLogSmallInteger.1.cs.gz (898 bytes) Download Attachment

Nicolas Cellier-3

Re: Is this a buglet? (SmallInteger>>printString related)

Göran Krampe <goran <at> krampe.se> writes:

>
> After messing with SmallInteger>>decimalDigitLength and noticing that
> '1000 log truncated' returns the wrong answer I ended up chasing down
> base-10 log algorithms. Found this nice source of tricks:
>
> http://www.hackersdelight.org/HDcode/ilog.c
>
> And based on code in there and the fact that nlz (number of leading zeros)
> can be computed using 'self asFloat exponent' I came up with this
> implementation of #floorLog for SmallIntegers. It still is not as fast as
> simpleminded comparisons as in decimalDigitLength though so it is not
> worth using. But it was fun to experiment."
>

Yes, this link is fun.
Also don't forget Smalltalk has #highBit,
a complement of nlz not involving Float.

Nicolas