Smalltalk › Squeak › Squeak - Dev

The Trunk: Collections-ul.607.mcz

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

3 messages Options

commits-2

The Trunk: Collections-ul.607.mcz

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.607.mcz

==================== Summary ====================

Name: Collections-ul.607
Author: ul
Time: 2 April 2015, 12:07:14.622 am
UUID: 607c7a3f-c462-454c-8e18-02c49e4ae91a
Ancestors: Collections-ul.606

Simplified CharacterSet class>>separators.
Fixed ReadStream >> #nextFloat, when collection is not a ByteString.

=============== Diff against Collections-ul.606 ===============

Item was changed:
----- Method: CharacterSet class>>separators (in category 'accessing') -----
separators
"return a set containing just the whitespace characters"

+ ^Separators ifNil: [ Separators := self newFrom: Character separators ]!
- ^Separators ifNil: [
- Separators := self new
- addAll: Character separators;
- yourself ]!

Item was changed:
----- Method: ReadStream>>nextFloat (in category 'accessing') -----
nextFloat
"Read a floating point value from the receiver. This method is highly optimized for cases
where many floating point values need to be read subsequently. And if this needs to go
even faster, look at the inner loops fetching the characters - moving those into a plugin
would speed things up even more."
| buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded |
buffer := collection.
count := readLimit.
index := position+1.

"Skip separators"
+ index := ByteString findFirstInString: buffer inSet: CharacterSet nonSeparators byteArrayMap startingAt: index.
- index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index.
index = 0 ifTrue:[self setToEnd. ^nil].

"check for sign"
digitNeeded := false.
sign := 1. cc := buffer byteAt: index.
cc = 45 "$- asciiValue"
ifTrue:[sign := -1. index := index+1. digitNeeded := true]
ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]].

"Read integer part"
startIndex := index.
value := 0.
[index <= count and:[
digit := (buffer byteAt: index) - 48. "$0 asciiValue"
digit >= 0 and:[digit <= 9]]] whileTrue:[
value := value * 10 + digit.
index := index + 1.
].
anyDigit := index > startIndex.
index > count ifTrue:[
(digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected'].
self setToEnd. ^value asFloat * sign].

(buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>"
index := index+1.
startIndex := index.
"NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom: and it is not clear if that is acceptable here."
fraction := 0. exp := 1.
[index <= count and:[
digit := (buffer byteAt: index) - 48. "$0 asciiValue"
digit >= 0 and:[digit <= 9]]] whileTrue:[
fraction := fraction * 10 + digit.
exp := exp * 10.
index := index + 1.
].
value := value + (fraction asFloat / exp asFloat).
anyDigit := anyDigit or:[index > startIndex].
].
value := value asFloat * sign.

"At this point we require at least one digit to avoid allowing:
- . ('0.0' without leading digits)
- e32 ('0e32' without leading digits)
- .e32 ('0.0e32' without leading digits)
but these are currently allowed:
- .5 (0.5)
- 1. ('1.0')
- 1e32 ('1.0e32')
- 1.e32 ('1.0e32')
- .5e32 ('0.5e32')
"
anyDigit ifFalse:["Check for NaN/Infinity first"
(count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN'])
ifTrue:[position := index+2. ^Float nan * sign].
(count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity'])
ifTrue:[position := index+7. ^Float infinity * sign].
^self error: 'At least one digit expected'
].

index > count ifTrue:[self setToEnd. ^value asFloat].

(buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>"
index := index+1. "skip e"
sign := 1. cc := buffer byteAt: index.
cc = 45 "$- asciiValue"
ifTrue:[sign := -1. index := index+1]
ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]].
startIndex := index.
exp := 0. anyDigit := false.
[index <= count and:[
digit := (buffer byteAt: index) - 48. "$0 asciiValue"
digit >= 0 and:[digit <= 9]]] whileTrue:[
exp := exp * 10 + digit.
index := index + 1.
].
index> startIndex ifFalse:[^self error: 'Exponent expected'].
value := value * (10.0 raisedToInteger: exp * sign).
].

position := index-1.
^value!

Nicolas Cellier

Re: The Trunk: Collections-ul.607.mcz

Just a side note: nextFloat is broken in the sense that it will fail to answer the nearest floating point value to the specified decimal value.

OK, it's reasonably fast, but the comments should mention the trade offs.

Also in 32 bits VM, nextFloat might be slower than NumberParser when numbers are specified with many decimal digits.

Nicolas

2015-04-02 0:08 GMT+02:00 <[hidden email]>:

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.607.mcz

==================== Summary ====================

Name: Collections-ul.607
Author: ul
Time: 2 April 2015, 12:07:14.622 am
UUID: 607c7a3f-c462-454c-8e18-02c49e4ae91a
Ancestors: Collections-ul.606

Simplified CharacterSet class>>separators.
Fixed ReadStream >> #nextFloat, when collection is not a ByteString.

=============== Diff against Collections-ul.606 ===============

Item was changed:
----- Method: CharacterSet class>>separators (in category 'accessing') -----
separators
"return a set containing just the whitespace characters"

+ ^Separators ifNil: [ Separators := self newFrom: Character separators ]!
- ^Separators ifNil: [
- Separators := self new
- addAll: Character separators;
- yourself ]!

Item was changed:
----- Method: ReadStream>>nextFloat (in category 'accessing') -----
nextFloat
"Read a floating point value from the receiver. This method is highly optimized for cases
where many floating point values need to be read subsequently. And if this needs to go
even faster, look at the inner loops fetching the characters - moving those into a plugin
would speed things up even more."
| buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded |
buffer := collection.
count := readLimit.
index := position+1.

"Skip separators"
+ index := ByteString findFirstInString: buffer inSet: CharacterSet nonSeparators byteArrayMap startingAt: index.
- index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index.
index = 0 ifTrue:[self setToEnd. ^nil].

"check for sign"
digitNeeded := false.
sign := 1. cc := buffer byteAt: index.
cc = 45 "$- asciiValue"
ifTrue:[sign := -1. index := index+1. digitNeeded := true]
ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]].

"Read integer part"
startIndex := index.
value := 0.
[index <= count and:[
digit := (buffer byteAt: index) - 48. "$0 asciiValue"
digit >= 0 and:[digit <= 9]]] whileTrue:[
value := value * 10 + digit.
index := index + 1.
].
anyDigit := index > startIndex.
index > count ifTrue:[
(digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected'].
self setToEnd. ^value asFloat * sign].

(buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>"
index := index+1.
startIndex := index.
"NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom: and it is not clear if that is acceptable here."
fraction := 0. exp := 1.
[index <= count and:[
digit := (buffer byteAt: index) - 48. "$0 asciiValue"
digit >= 0 and:[digit <= 9]]] whileTrue:[
fraction := fraction * 10 + digit.
exp := exp * 10.
index := index + 1.
].
value := value + (fraction asFloat / exp asFloat).
anyDigit := anyDigit or:[index > startIndex].
].
value := value asFloat * sign.

"At this point we require at least one digit to avoid allowing:
- . ('0.0' without leading digits)
- e32 ('0e32' without leading digits)
- .e32 ('0.0e32' without leading digits)
but these are currently allowed:
- .5 (0.5)
- 1. ('1.0')
- 1e32 ('1.0e32')
- 1.e32 ('1.0e32')
- .5e32 ('0.5e32')
"
anyDigit ifFalse:["Check for NaN/Infinity first"
(count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN'])
ifTrue:[position := index+2. ^Float nan * sign].
(count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity'])
ifTrue:[position := index+7. ^Float infinity * sign].
^self error: 'At least one digit expected'
].

index > count ifTrue:[self setToEnd. ^value asFloat].

(buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>"
index := index+1. "skip e"
sign := 1. cc := buffer byteAt: index.
cc = 45 "$- asciiValue"
ifTrue:[sign := -1. index := index+1]
ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]].
startIndex := index.
exp := 0. anyDigit := false.
[index <= count and:[
digit := (buffer byteAt: index) - 48. "$0 asciiValue"
digit >= 0 and:[digit <= 9]]] whileTrue:[
exp := exp * 10 + digit.
index := index + 1.
].
index> startIndex ifFalse:[^self error: 'Exponent expected'].
value := value * (10.0 raisedToInteger: exp * sign).
].

position := index-1.
^value!

Levente Uzonyi-2

Re: The Trunk: Collections-ul.607.mcz

I don't see any senders of #nextFloat in my image, so I think we can
safely trade performance for quality here.

Levente

On Thu, 2 Apr 2015, Nicolas Cellier wrote:

> Just a side note: nextFloat is broken in the sense that it will fail to answer the nearest floating point value to the specified decimal value.
>
> OK, it's reasonably fast, but the comments should mention the trade offs.
>
> Also in 32 bits VM, nextFloat might be slower than NumberParser when numbers are specified with many decimal digits.
>
> Nicolas
>
> 2015-04-02 0:08 GMT+02:00 <[hidden email]>:
> Levente Uzonyi uploaded a new version of Collections to project The Trunk:
> http://source.squeak.org/trunk/Collections-ul.607.mcz
>
> ==================== Summary ====================
>
> Name: Collections-ul.607
> Author: ul
> Time: 2 April 2015, 12:07:14.622 am
> UUID: 607c7a3f-c462-454c-8e18-02c49e4ae91a
> Ancestors: Collections-ul.606
>
> Simplified CharacterSet class>>separators.
> Fixed ReadStream >> #nextFloat, when collection is not a ByteString.
>
> =============== Diff against Collections-ul.606 ===============
>
> Item was changed:
> ----- Method: CharacterSet class>>separators (in category 'accessing') -----
> separators
> "return a set containing just the whitespace characters"
>
> + ^Separators ifNil: [ Separators := self newFrom: Character separators ]!
> - ^Separators ifNil: [
> - Separators := self new
> - addAll: Character separators;
> - yourself ]!
>
> Item was changed:
> ----- Method: ReadStream>>nextFloat (in category 'accessing') -----
> nextFloat
> "Read a floating point value from the receiver. This method is highly optimized for cases
> where many floating point values need to be read subsequently. And if this needs to go
> even faster, look at the inner loops fetching the characters - moving those into a plugin
> would speed things up even more."
> | buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded |
> buffer := collection.
> count := readLimit.
> index := position+1.
>
> "Skip separators"
> + index := ByteString findFirstInString: buffer inSet: CharacterSet nonSeparators byteArrayMap startingAt: index.
> - index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index.
> index = 0 ifTrue:[self setToEnd. ^nil].
>
> "check for sign"
> digitNeeded := false.
> sign := 1. cc := buffer byteAt: index.
> cc = 45 "$- asciiValue"
> ifTrue:[sign := -1. index := index+1. digitNeeded := true]
> ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]].
>
> "Read integer part"
> startIndex := index.
> value := 0.
> [index <= count and:[
> digit := (buffer byteAt: index) - 48. "$0 asciiValue"
> digit >= 0 and:[digit <= 9]]] whileTrue:[
> value := value * 10 + digit.
> index := index + 1.
> ].
> anyDigit := index > startIndex.
> index > count ifTrue:[
> (digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected'].
> self setToEnd. ^value asFloat * sign].
>
> (buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>"
> index := index+1.
> startIndex := index.
> "NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float
> conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom:
> and it is not clear if that is acceptable here."
> fraction := 0. exp := 1.
> [index <= count and:[
> digit := (buffer byteAt: index) - 48. "$0 asciiValue"
> digit >= 0 and:[digit <= 9]]] whileTrue:[
> fraction := fraction * 10 + digit.
> exp := exp * 10.
> index := index + 1.
> ].
> value := value + (fraction asFloat / exp asFloat).
> anyDigit := anyDigit or:[index > startIndex].
> ].
> value := value asFloat * sign.
>
> "At this point we require at least one digit to avoid allowing:
> - . ('0.0' without leading digits)
> - e32 ('0e32' without leading digits)
> - .e32 ('0.0e32' without leading digits)
> but these are currently allowed:
> - .5 (0.5)
> - 1. ('1.0')
> - 1e32 ('1.0e32')
> - 1.e32 ('1.0e32')
> - .5e32 ('0.5e32')
> "
> anyDigit ifFalse:["Check for NaN/Infinity first"
> (count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN'])
> ifTrue:[position := index+2. ^Float nan * sign].
> (count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity'])
> ifTrue:[position := index+7. ^Float infinity * sign].
> ^self error: 'At least one digit expected'
> ].
>
> index > count ifTrue:[self setToEnd. ^value asFloat].
>
> (buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>"
> index := index+1. "skip e"
> sign := 1. cc := buffer byteAt: index.
> cc = 45 "$- asciiValue"
> ifTrue:[sign := -1. index := index+1]
> ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]].
> startIndex := index.
> exp := 0. anyDigit := false.
> [index <= count and:[
> digit := (buffer byteAt: index) - 48. "$0 asciiValue"
> digit >= 0 and:[digit <= 9]]] whileTrue:[
> exp := exp * 10 + digit.
> index := index + 1.
> ].
> index> startIndex ifFalse:[^self error: 'Exponent expected'].
> value := value * (10.0 raisedToInteger: exp * sign).
> ].
>
> position := index-1.
> ^value!
>
>
>
>
>