The Trunk: Collections-ul.607.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-ul.607.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.607.mcz

==================== Summary ====================

Name: Collections-ul.607
Author: ul
Time: 2 April 2015, 12:07:14.622 am
UUID: 607c7a3f-c462-454c-8e18-02c49e4ae91a
Ancestors: Collections-ul.606

Simplified CharacterSet class>>separators.
Fixed ReadStream >> #nextFloat, when collection is not a ByteString.

=============== Diff against Collections-ul.606 ===============

Item was changed:
  ----- Method: CharacterSet class>>separators (in category 'accessing') -----
  separators
  "return a set containing just the whitespace characters"
 
+ ^Separators ifNil: [ Separators := self newFrom: Character separators ]!
- ^Separators ifNil: [
- Separators := self new
- addAll: Character separators;
- yourself ]!

Item was changed:
  ----- Method: ReadStream>>nextFloat (in category 'accessing') -----
  nextFloat
  "Read a floating point value from the receiver. This method is highly optimized for cases
  where many floating point values need to be read subsequently. And if this needs to go
  even faster, look at the inner loops fetching the characters - moving those into a plugin
  would speed things up even more."
  | buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded |
  buffer := collection.
  count := readLimit.
  index := position+1.
 
  "Skip separators"
+ index := ByteString findFirstInString: buffer inSet: CharacterSet nonSeparators byteArrayMap startingAt: index.
- index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index.
  index = 0 ifTrue:[self setToEnd. ^nil].
 
  "check for sign"
  digitNeeded := false.
  sign := 1. cc := buffer byteAt: index.
  cc = 45 "$- asciiValue"
  ifTrue:[sign := -1. index := index+1. digitNeeded := true]
  ifFalse:[cc =  43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]].
 
  "Read integer part"
  startIndex := index.
  value := 0.
  [index <= count and:[
  digit := (buffer byteAt: index) - 48. "$0 asciiValue"
  digit >= 0 and:[digit <= 9]]] whileTrue:[
  value := value * 10 + digit.
  index := index + 1.
  ].
  anyDigit := index > startIndex.
  index > count ifTrue:[
  (digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected'].
  self setToEnd. ^value asFloat * sign].
 
  (buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>"
  index := index+1.
  startIndex := index.
  "NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom: and it is not clear if that is acceptable here."
  fraction := 0. exp := 1.
  [index <= count and:[
  digit := (buffer byteAt: index) - 48. "$0 asciiValue"
  digit >= 0 and:[digit <= 9]]] whileTrue:[
  fraction := fraction * 10 + digit.
  exp := exp * 10.
  index := index + 1.
  ].
  value := value + (fraction asFloat / exp asFloat).
  anyDigit := anyDigit or:[index > startIndex].
  ].
  value := value asFloat * sign.
 
  "At this point we require at least one digit to avoid allowing:
  - . ('0.0' without leading digits)
  - e32 ('0e32' without leading digits)
  - .e32 ('0.0e32' without leading digits)
  but these are currently allowed:
  - .5 (0.5)
  - 1. ('1.0')
  - 1e32 ('1.0e32')
  - 1.e32 ('1.0e32')
  - .5e32 ('0.5e32')
  "
  anyDigit ifFalse:["Check for NaN/Infinity first"
  (count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN'])
  ifTrue:[position := index+2. ^Float nan * sign].
  (count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity'])
  ifTrue:[position := index+7. ^Float infinity * sign].
  ^self error: 'At least one digit expected'
  ].
 
  index > count ifTrue:[self setToEnd. ^value asFloat].
 
  (buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>"
  index := index+1. "skip e"
  sign := 1. cc := buffer byteAt: index.
  cc = 45 "$- asciiValue"
  ifTrue:[sign := -1. index := index+1]
  ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]].
  startIndex := index.
  exp := 0. anyDigit := false.
  [index <= count and:[
  digit := (buffer byteAt: index) - 48. "$0 asciiValue"
  digit >= 0 and:[digit <= 9]]] whileTrue:[
  exp := exp * 10 + digit.
  index := index + 1.
  ].
  index> startIndex ifFalse:[^self error: 'Exponent expected'].
  value := value * (10.0 raisedToInteger: exp * sign).
  ].
 
  position := index-1.
  ^value!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.607.mcz

Nicolas Cellier
Just a side note: nextFloat is broken in the sense that it will fail to answer the nearest floating point value to the specified decimal value.

OK, it's reasonably fast, but the comments should mention the trade offs.

Also in 32 bits VM, nextFloat might be slower than NumberParser when numbers are specified with many decimal digits.

Nicolas

2015-04-02 0:08 GMT+02:00 <[hidden email]>:
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.607.mcz

==================== Summary ====================

Name: Collections-ul.607
Author: ul
Time: 2 April 2015, 12:07:14.622 am
UUID: 607c7a3f-c462-454c-8e18-02c49e4ae91a
Ancestors: Collections-ul.606

Simplified CharacterSet class>>separators.
Fixed ReadStream >> #nextFloat, when collection is not a ByteString.

=============== Diff against Collections-ul.606 ===============

Item was changed:
  ----- Method: CharacterSet class>>separators (in category 'accessing') -----
  separators
        "return a set containing just the whitespace characters"

+       ^Separators ifNil: [ Separators := self newFrom: Character separators ]!
-       ^Separators ifNil: [
-               Separators := self new
-                       addAll: Character separators;
-                       yourself ]!

Item was changed:
  ----- Method: ReadStream>>nextFloat (in category 'accessing') -----
  nextFloat
        "Read a floating point value from the receiver. This method is highly optimized for cases
        where many floating point values need to be read subsequently. And if this needs to go
        even faster, look at the inner loops fetching the characters - moving those into a plugin
        would speed things up even more."
        | buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded |
        buffer := collection.
        count := readLimit.
        index := position+1.

        "Skip separators"
+       index := ByteString findFirstInString: buffer inSet: CharacterSet nonSeparators byteArrayMap startingAt: index.
-       index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index.
        index = 0 ifTrue:[self setToEnd. ^nil].

        "check for sign"
        digitNeeded := false.
        sign := 1. cc := buffer byteAt: index.
        cc = 45 "$- asciiValue"
                ifTrue:[sign := -1. index := index+1. digitNeeded := true]
                ifFalse:[cc =  43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]].

        "Read integer part"
        startIndex := index.
        value := 0.
        [index <= count and:[
                digit := (buffer byteAt: index) - 48. "$0 asciiValue"
                digit >= 0 and:[digit <= 9]]] whileTrue:[
                        value := value * 10 + digit.
                        index := index + 1.
        ].
        anyDigit := index > startIndex.
        index > count ifTrue:[
                (digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected'].
                self setToEnd. ^value asFloat * sign].

        (buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>"
                index := index+1.
                startIndex := index.
                "NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom: and it is not clear if that is acceptable here."
                fraction := 0. exp := 1.
                [index <= count and:[
                        digit := (buffer byteAt: index) - 48. "$0 asciiValue"
                        digit >= 0 and:[digit <= 9]]] whileTrue:[
                                fraction := fraction * 10 + digit.
                                exp := exp * 10.
                                index := index + 1.
                ].
                value := value + (fraction asFloat / exp asFloat).
                anyDigit := anyDigit or:[index > startIndex].
        ].
        value := value asFloat * sign.

        "At this point we require at least one digit to avoid allowing:
                - . ('0.0' without leading digits)
                - e32 ('0e32' without leading digits)
                - .e32 ('0.0e32' without leading digits)
        but these are currently allowed:
                - .5 (0.5)
                - 1. ('1.0')
                - 1e32 ('1.0e32')
                - 1.e32 ('1.0e32')
                - .5e32 ('0.5e32')
        "
        anyDigit ifFalse:["Check for NaN/Infinity first"
                (count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN'])
                        ifTrue:[position := index+2. ^Float nan * sign].
                (count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity'])
                        ifTrue:[position := index+7. ^Float infinity * sign].
                ^self error: 'At least one digit expected'
        ].

        index > count ifTrue:[self setToEnd. ^value asFloat].

        (buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>"
                index := index+1. "skip e"
                sign := 1. cc := buffer byteAt: index.
                cc = 45 "$- asciiValue"
                        ifTrue:[sign := -1. index := index+1]
                        ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]].
                startIndex := index.
                exp := 0. anyDigit := false.
                [index <= count and:[
                        digit := (buffer byteAt: index) - 48. "$0 asciiValue"
                        digit >= 0 and:[digit <= 9]]] whileTrue:[
                                exp := exp * 10 + digit.
                                index := index + 1.
                ].
                index> startIndex ifFalse:[^self error: 'Exponent expected'].
                value := value * (10.0 raisedToInteger: exp * sign).
        ].

        position := index-1.
        ^value!





Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.607.mcz

Levente Uzonyi-2
I don't see any senders of #nextFloat in my image, so I think we can
safely trade performance for quality here.

Levente

On Thu, 2 Apr 2015, Nicolas Cellier wrote:

> Just a side note: nextFloat is broken in the sense that it will fail to answer the nearest floating point value to the specified decimal value.
>
> OK, it's reasonably fast, but the comments should mention the trade offs.
>
> Also in 32 bits VM, nextFloat might be slower than NumberParser when numbers are specified with many decimal digits.
>
> Nicolas
>
> 2015-04-02 0:08 GMT+02:00 <[hidden email]>:
>       Levente Uzonyi uploaded a new version of Collections to project The Trunk:
>       http://source.squeak.org/trunk/Collections-ul.607.mcz
>
>       ==================== Summary ====================
>
>       Name: Collections-ul.607
>       Author: ul
>       Time: 2 April 2015, 12:07:14.622 am
>       UUID: 607c7a3f-c462-454c-8e18-02c49e4ae91a
>       Ancestors: Collections-ul.606
>
>       Simplified CharacterSet class>>separators.
>       Fixed ReadStream >> #nextFloat, when collection is not a ByteString.
>
>       =============== Diff against Collections-ul.606 ===============
>
>       Item was changed:
>         ----- Method: CharacterSet class>>separators (in category 'accessing') -----
>         separators
>               "return a set containing just the whitespace characters"
>
>       +       ^Separators ifNil: [ Separators := self newFrom: Character separators ]!
>       -       ^Separators ifNil: [
>       -               Separators := self new
>       -                       addAll: Character separators;
>       -                       yourself ]!
>
>       Item was changed:
>         ----- Method: ReadStream>>nextFloat (in category 'accessing') -----
>         nextFloat
>               "Read a floating point value from the receiver. This method is highly optimized for cases
>               where many floating point values need to be read subsequently. And if this needs to go
>               even faster, look at the inner loops fetching the characters - moving those into a plugin
>               would speed things up even more."
>               | buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded |
>               buffer := collection.
>               count := readLimit.
>               index := position+1.
>
>               "Skip separators"
>       +       index := ByteString findFirstInString: buffer inSet: CharacterSet nonSeparators byteArrayMap startingAt: index.
>       -       index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index.
>               index = 0 ifTrue:[self setToEnd. ^nil].
>
>               "check for sign"
>               digitNeeded := false.
>               sign := 1. cc := buffer byteAt: index.
>               cc = 45 "$- asciiValue"
>                       ifTrue:[sign := -1. index := index+1. digitNeeded := true]
>                       ifFalse:[cc =  43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]].
>
>               "Read integer part"
>               startIndex := index.
>               value := 0.
>               [index <= count and:[
>                       digit := (buffer byteAt: index) - 48. "$0 asciiValue"
>                       digit >= 0 and:[digit <= 9]]] whileTrue:[
>                               value := value * 10 + digit.
>                               index := index + 1.
>               ].
>               anyDigit := index > startIndex.
>               index > count ifTrue:[
>                       (digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected'].
>                       self setToEnd. ^value asFloat * sign].
>
>               (buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>"
>                       index := index+1.
>                       startIndex := index.
>                       "NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float
>       conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom:
>       and it is not clear if that is acceptable here."
>                       fraction := 0. exp := 1.
>                       [index <= count and:[
>                               digit := (buffer byteAt: index) - 48. "$0 asciiValue"
>                               digit >= 0 and:[digit <= 9]]] whileTrue:[
>                                       fraction := fraction * 10 + digit.
>                                       exp := exp * 10.
>                                       index := index + 1.
>                       ].
>                       value := value + (fraction asFloat / exp asFloat).
>                       anyDigit := anyDigit or:[index > startIndex].
>               ].
>               value := value asFloat * sign.
>
>               "At this point we require at least one digit to avoid allowing:
>                       - . ('0.0' without leading digits)
>                       - e32 ('0e32' without leading digits)
>                       - .e32 ('0.0e32' without leading digits)
>               but these are currently allowed:
>                       - .5 (0.5)
>                       - 1. ('1.0')
>                       - 1e32 ('1.0e32')
>                       - 1.e32 ('1.0e32')
>                       - .5e32 ('0.5e32')
>               "
>               anyDigit ifFalse:["Check for NaN/Infinity first"
>                       (count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN'])
>                               ifTrue:[position := index+2. ^Float nan * sign].
>                       (count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity'])
>                               ifTrue:[position := index+7. ^Float infinity * sign].
>                       ^self error: 'At least one digit expected'
>               ].
>
>               index > count ifTrue:[self setToEnd. ^value asFloat].
>
>               (buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>"
>                       index := index+1. "skip e"
>                       sign := 1. cc := buffer byteAt: index.
>                       cc = 45 "$- asciiValue"
>                               ifTrue:[sign := -1. index := index+1]
>                               ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]].
>                       startIndex := index.
>                       exp := 0. anyDigit := false.
>                       [index <= count and:[
>                               digit := (buffer byteAt: index) - 48. "$0 asciiValue"
>                               digit >= 0 and:[digit <= 9]]] whileTrue:[
>                                       exp := exp * 10 + digit.
>                                       index := index + 1.
>                       ].
>                       index> startIndex ifFalse:[^self error: 'Exponent expected'].
>                       value := value * (10.0 raisedToInteger: exp * sign).
>               ].
>
>               position := index-1.
>               ^value!
>
>
>
>
>