The Trunk: Collections-eem.907.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-eem.907.mcz

commits-2
Eliot Miranda uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-eem.907.mcz

==================== Summary ====================

Name: Collections-eem.907
Author: eem
Time: 6 August 2020, 12:51:46.481296 pm
UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70
Ancestors: Collections-mt.906

2.7x faster ByteArray>>readHexFrom:

=============== Diff against Collections-mt.906 ===============

Item was changed:
  ----- Method: ByteArray>>readHexFrom: (in category 'initialize') -----
  readHexFrom: aStream
  "Initialize the receiver from a hexadecimal string representation"
+
+ 1 to: self size do:
+ [:i| | n v1 v2 |
+ n := aStream next asInteger.
+ v1 := n > 57 "$9 asInteger = 57"
+ ifTrue:
+ [n > 96 "$a asInteger 97"
+ ifTrue: [n - 87]
+ ifFalse: [n > 64 "$A asInteger = 65"
+ ifTrue: [n - 55]
+ ifFalse: [-1]]]
+ ifFalse: [n - 48]. "$0 asInteger = 48"
+ (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
+ n := aStream next asInteger.
+ v2 := n > 57 "$9 asInteger = 57"
+ ifTrue:
+ [n > 96 "$a asInteger 97"
+ ifTrue: [n - 87]
+ ifFalse: [n > 64 "$A asInteger = 65"
+ ifTrue: [n - 55]
+ ifFalse: [-1]]]
+ ifFalse: [n - 48]. "$0 asInteger = 48"
+ (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
+ self at: i put: (v1 bitShift: 4) + v2]
+
+ "Proof that our filter selects only hexadecimal characters:
+ (0 to: 255)
+ select:
+ [:n|
+ (n > 57
+ ifTrue:
+ [n > 96
+ ifTrue: [n - 87]
+ ifFalse: [n > 64
+ ifTrue: [n - 55]
+ ifFalse: [-1]]]
+ ifFalse: [n - 48]) between: 0 and: 15]
+ thenCollect:
+ [:n| Character value: n]"!
- | map v ch value |
- map := '0123456789abcdefABCDEF'.
- 1 to: self size do:[:i|
- ch := aStream next.
- v := (map indexOf: ch) - 1.
- ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
- value := v bitShift: 4.
- ch := aStream next.
- v := (map indexOf: ch) - 1.
- ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
- value := value + v.
- self at: i put: value.
- ].
- !


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-eem.907.mcz

Levente Uzonyi
Hi Eliot,

If performance matters, I've got two alternative implementations.
Both are about 1.2x faster than yours and are based on lookup tables.
The first one uses #digitValue which uses Character's "built-in"
DigitValues table:

readHexFrom: aStream
  "Initialize the receiver from a hexadecimal string representation"

  1 to: self size do: [ :i |
  | c v1 v2 |
  ((c := aStream next) asInteger > 102 "$f asInteger"
  or: [ (v1 := c digitValue) < 0
  or: [ v1 > 15
  or: [ (c := aStream next) asInteger > 102 "$f asInteger"
  or: [ (v2 := c digitValue) < 0
  or: [ v2 > 15 ] ] ] ] ])
  ifTrue: [ ^self error: 'Hex digit expected' ].
  self at: i put: (v1 bitShift: 4) + v2 ]

The second one uses a non-minimal perfect hash table with an extremely
simple hash function:

readHexFrom: aStream
  "Initialize the receiver from a hexadecimal string representation"

  | keys values |
  keys :=  #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0].
  values :=  #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0].
  1 to: self size do: [ :i |
  | c index v1 v2 |
  c := aStream next asInteger.
  index := (c bitAnd: 63) + 1.
  (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
  v1 := values at: index.
  c := aStream next asInteger.
  index := (c bitAnd: 63) + 1.
  (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
  v2 := values at: index.
  self at: i put: (v1 bitShift: 4) + v2 ]


Levente

On Thu, 6 Aug 2020, [hidden email] wrote:

> Eliot Miranda uploaded a new version of Collections to project The Trunk:
> http://source.squeak.org/trunk/Collections-eem.907.mcz
>
> ==================== Summary ====================
>
> Name: Collections-eem.907
> Author: eem
> Time: 6 August 2020, 12:51:46.481296 pm
> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70
> Ancestors: Collections-mt.906
>
> 2.7x faster ByteArray>>readHexFrom:
>
> =============== Diff against Collections-mt.906 ===============
>
> Item was changed:
>  ----- Method: ByteArray>>readHexFrom: (in category 'initialize') -----
>  readHexFrom: aStream
>   "Initialize the receiver from a hexadecimal string representation"
> +
> + 1 to: self size do:
> + [:i| | n v1 v2 |
> + n := aStream next asInteger.
> + v1 := n > 57 "$9 asInteger = 57"
> + ifTrue:
> + [n > 96 "$a asInteger 97"
> + ifTrue: [n - 87]
> + ifFalse: [n > 64 "$A asInteger = 65"
> + ifTrue: [n - 55]
> + ifFalse: [-1]]]
> + ifFalse: [n - 48]. "$0 asInteger = 48"
> + (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
> + n := aStream next asInteger.
> + v2 := n > 57 "$9 asInteger = 57"
> + ifTrue:
> + [n > 96 "$a asInteger 97"
> + ifTrue: [n - 87]
> + ifFalse: [n > 64 "$A asInteger = 65"
> + ifTrue: [n - 55]
> + ifFalse: [-1]]]
> + ifFalse: [n - 48]. "$0 asInteger = 48"
> + (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
> + self at: i put: (v1 bitShift: 4) + v2]
> +
> + "Proof that our filter selects only hexadecimal characters:
> + (0 to: 255)
> + select:
> + [:n|
> + (n > 57
> + ifTrue:
> + [n > 96
> + ifTrue: [n - 87]
> + ifFalse: [n > 64
> + ifTrue: [n - 55]
> + ifFalse: [-1]]]
> + ifFalse: [n - 48]) between: 0 and: 15]
> + thenCollect:
> + [:n| Character value: n]"!
> - | map v ch value |
> - map := '0123456789abcdefABCDEF'.
> - 1 to: self size do:[:i|
> - ch := aStream next.
> - v := (map indexOf: ch) - 1.
> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
> - value := v bitShift: 4.
> - ch := aStream next.
> - v := (map indexOf: ch) - 1.
> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
> - value := value + v.
> - self at: i put: value.
> - ].
> - !

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-eem.907.mcz

Eliot Miranda-2
Hi Levente,


> On Aug 7, 2020, at 5:07 PM, Levente Uzonyi <[hidden email]> wrote:
>
> Hi Eliot,
>
> If performance matters, I've got two alternative implementations.

Performance isn’t that big a deal.  But something told me that the indexOf: approach sux :-)

> Both are about 1.2x faster than yours and are based on lookup tables.
> The first one uses #digitValue which uses Character's "built-in" DigitValues table:
>
> readHexFrom: aStream
>    "Initialize the receiver from a hexadecimal string representation"
>
>    1 to: self size do: [ :i |
>        | c v1 v2 |
>        ((c := aStream next) asInteger > 102 "$f asInteger"
>            or: [ (v1 := c digitValue) < 0
>            or: [ v1 > 15
>            or: [ (c := aStream next) asInteger > 102 "$f asInteger"
>            or: [ (v2 := c digitValue) < 0
>            or: [ v2 > 15 ] ] ] ] ])
>            ifTrue: [ ^self error: 'Hex digit expected' ].
>        self at: i put: (v1 bitShift: 4) + v2 ]

When I tested this it was slower.  I tested it on ByteArrays >= 256 bytes in length that were less than 50% zeros.  So let’s not go this way.

>
> The second one uses a non-minimal perfect hash table with an extremely simple hash function:
>
> readHexFrom: aStream
>    "Initialize the receiver from a hexadecimal string representation"
>
>    | keys values |
>    keys :=  #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0].
>    values :=  #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0].
>    1 to: self size do: [ :i |
>        | c index v1 v2 |
>        c := aStream next asInteger.
>        index := (c bitAnd: 63) + 1.
>        (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>        v1 := values at: index.
>        c := aStream next asInteger.
>        index := (c bitAnd: 63) + 1.
>        (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>        v2 := values at: index.
>        self at: i put: (v1 bitShift: 4) + v2 ]

That’s neat, I’ll remember the technique.  But what we have is more straight forward.

> Levente
>
>> On Thu, 6 Aug 2020, [hidden email] wrote:
>>
>> Eliot Miranda uploaded a new version of Collections to project The Trunk:
>> http://source.squeak.org/trunk/Collections-eem.907.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Collections-eem.907
>> Author: eem
>> Time: 6 August 2020, 12:51:46.481296 pm
>> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70
>> Ancestors: Collections-mt.906
>>
>> 2.7x faster ByteArray>>readHexFrom:
>>
>> =============== Diff against Collections-mt.906 ===============
>>
>> Item was changed:
>> ----- Method: ByteArray>>readHexFrom: (in category 'initialize') -----
>> readHexFrom: aStream
>>    "Initialize the receiver from a hexadecimal string representation"
>> + +    1 to: self size do:
>> +        [:i| | n v1 v2 |
>> +         n := aStream next asInteger.
>> +         v1 := n > 57                        "$9 asInteger = 57"
>> +                ifTrue:
>> +                    [n > 96                    "$a asInteger 97"
>> +                        ifTrue: [n - 87]
>> +                        ifFalse: [n > 64        "$A asInteger = 65"
>> +                                ifTrue: [n - 55]
>> +                                ifFalse: [-1]]] +                ifFalse: [n - 48].                "$0 asInteger = 48"
>> +         (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>> +         n := aStream next asInteger.
>> +         v2 := n > 57                        "$9 asInteger = 57"
>> +                ifTrue:
>> +                    [n > 96                    "$a asInteger 97"
>> +                        ifTrue: [n - 87]
>> +                        ifFalse: [n > 64        "$A asInteger = 65"
>> +                                ifTrue: [n - 55]
>> +                                ifFalse: [-1]]]
>> +                ifFalse: [n - 48].                "$0 asInteger = 48"
>> +        (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>> +        self at: i put: (v1 bitShift: 4) + v2]
>> + +    "Proof that our filter selects only hexadecimal characters:
>> +    (0 to: 255)
>> +        select:
>> +            [:n|
>> +            (n > 57
>> +                ifTrue:
>> +                    [n > 96 +                        ifTrue: [n - 87]
>> +                        ifFalse: [n > 64
>> +                                ifTrue: [n - 55]
>> +                                ifFalse: [-1]]]
>> +                ifFalse: [n - 48]) between: 0 and: 15]
>> +        thenCollect:
>> +            [:n| Character value: n]"!
>> -    | map v ch value |
>> -    map := '0123456789abcdefABCDEF'.
>> -    1 to: self size do:[:i|
>> -        ch := aStream next.
>> -        v := (map indexOf: ch) - 1.
>> -        ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>> -        value := v bitShift: 4.
>> -        ch := aStream next.
>> -        v := (map indexOf: ch) - 1.
>> -        ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>> -        value := value + v.
>> -        self at: i put: value.
>> -    ].
>> - !
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-eem.907.mcz

Levente Uzonyi
Hi Eliot,

On Sat, 8 Aug 2020, Eliot Miranda wrote:

> Hi Levente,
>
>
>> On Aug 7, 2020, at 5:07 PM, Levente Uzonyi <[hidden email]> wrote:
>>
>> Hi Eliot,
>>
>> If performance matters, I've got two alternative implementations.
>
> Performance isn’t that big a deal.  But something told me that the indexOf: approach sux :-)
>
>> Both are about 1.2x faster than yours and are based on lookup tables.
>> The first one uses #digitValue which uses Character's "built-in" DigitValues table:
>>
>> readHexFrom: aStream
>>    "Initialize the receiver from a hexadecimal string representation"
>>
>>    1 to: self size do: [ :i |
>>        | c v1 v2 |
>>        ((c := aStream next) asInteger > 102 "$f asInteger"
>>            or: [ (v1 := c digitValue) < 0
>>            or: [ v1 > 15
>>            or: [ (c := aStream next) asInteger > 102 "$f asInteger"
>>            or: [ (v2 := c digitValue) < 0>>            or: [ v2 > 15 ] ] ] ] ])
>>            ifTrue: [ ^self error: 'Hex digit expected' ].
>>        self at: i put: (v1 bitShift: 4) + v2 ]
>
> When I tested this it was slower.  I tested it on ByteArrays >= 256 bytes in length that were less than 50% zeros.  So let’s not go this way.
That seemed really surprising first, but the "less than 50% zeroes" gave
it away. Your code takes different execution paths based on the input,
and that has a surpisingly significant effect on performance.
Here's the benchmark I used which generates uniformly distributed values
with mixed lower and upper case characters. It shows the performance
difference I wrote:

| b r letters s stream |
b := ByteArray new: 1000.
r := Random seed: 36rSqueak.
letters := ($0 to: $9), ($0 to: $9), ($a to: $f), ($A to: $F).
s := (1 to: b size * 2) collect: [ :e | letters atRandom: r ] as: String.
stream := s readStream.
{
  [ stream reset. b readHexFrom: stream ] benchFor: 1 seconds.
  [ stream reset. b readHexFromVariant1: stream ] benchFor: 1 seconds.
  [ stream reset. b readHexFromVariant2: stream ] benchFor: 1 seconds }

#(
  '16,700 per second. 59.7 microseconds per run. 0 % GC time.'
  '20,200 per second. 49.4 microseconds per run. 0 % GC time.'
  '20,100 per second. 49.8 microseconds per run. 0 % GC time.'
)

However, if the input is restricted to 0-9 characters by using

letters := $0 to: $9.

in the above benchmark, the results become

#(
  '22,700 per second. 44.2 microseconds per run. 0 % GC time.'
  '19,400 per second. 51.5 microseconds per run. 0 % GC time.'
  '19,500 per second. 51.3 microseconds per run. 0 % GC time.'
)

with

letters := ($0 to: $9), ($a to: $f).

the results are

#(
  '18,800 per second. 53.1 microseconds per run. 0 % GC time.'
  '21,200 per second. 47.2 microseconds per run. 0 % GC time.'
  '20,200 per second. 49.4 microseconds per run. 0 % GC time.'
)

and with

letters := $a to: $f.

#(
  '22,600 per second. 44.3 microseconds per run. 0 % GC time.'
  '20,500 per second. 48.9 microseconds per run. 0 % GC time.'
  '20,000 per second. 50 microseconds per run. 0 % GC time.'
)

So, the more characters are from a single range (0-9 or a-f or A-F), as it
was probably the case with the data you benchmarked with, the faster your
variant becomes.


Levente

>
>>
>> The second one uses a non-minimal perfect hash table with an extremely simple hash function:
>>
>> readHexFrom: aStream
>>    "Initialize the receiver from a hexadecimal string representation"
>>
>>    | keys values |
>>    keys :=  #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0].
>>    values :=  #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0].
>>    1 to: self size do: [ :i |
>>        | c index v1 v2 |
>>        c := aStream next asInteger.
>>        index := (c bitAnd: 63) + 1.
>>        (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>>        v1 := values at: index.
>>        c := aStream next asInteger.
>>        index := (c bitAnd: 63) + 1.
>>        (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ].
>>        v2 := values at: index.
>>        self at: i put: (v1 bitShift: 4) + v2 ]
>
> That’s neat, I’ll remember the technique.  But what we have is more straight forward.
>
>> Levente
>>
>>> On Thu, 6 Aug 2020, [hidden email] wrote:
>>>
>>> Eliot Miranda uploaded a new version of Collections to project The Trunk:
>>> http://source.squeak.org/trunk/Collections-eem.907.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Collections-eem.907
>>> Author: eem
>>> Time: 6 August 2020, 12:51:46.481296 pm
>>> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70
>>> Ancestors: Collections-mt.906
>>>
>>> 2.7x faster ByteArray>>readHexFrom:
>>>
>>> =============== Diff against Collections-mt.906 ===============
>>>
>>> Item was changed:
>>> ----- Method: ByteArray>>readHexFrom: (in category 'initialize') -----
>>> readHexFrom: aStream
>>>    "Initialize the receiver from a hexadecimal string representation"
>>> + +    1 to: self size do:
>>> +        [:i| | n v1 v2 |
>>> +         n := aStream next asInteger.
>>> +         v1 := n > 57                        "$9 asInteger = 57"
>>> +                ifTrue:
>>> +                    [n > 96                    "$a asInteger 97"
>>> +                        ifTrue: [n - 87]
>>> +                        ifFalse: [n > 64        "$A asInteger = 65"
>>> +                                ifTrue: [n - 55]
>>> +                                ifFalse: [-1]]] +                ifFalse: [n - 48].                "$0 asInteger = 48"
>>> +         (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>>> +         n := aStream next asInteger.
>>> +         v2 := n > 57                        "$9 asInteger = 57"
>>> +                ifTrue:
>>> +                    [n > 96                    "$a asInteger 97"
>>> +                        ifTrue: [n - 87]
>>> +                        ifFalse: [n > 64        "$A asInteger = 65"
>>> +                                ifTrue: [n - 55]
>>> +                                ifFalse: [-1]]]
>>> +                ifFalse: [n - 48].                "$0 asInteger = 48"
>>> +        (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected'].
>>> +        self at: i put: (v1 bitShift: 4) + v2]
>>> + +    "Proof that our filter selects only hexadecimal characters:
>>> +    (0 to: 255)
>>> +        select:
>>> +            [:n|
>>> +            (n > 57
>>> +                ifTrue:
>>> +                    [n > 96 +                        ifTrue: [n - 87]
>>> +                        ifFalse: [n > 64
>>> +                                ifTrue: [n - 55]
>>> +                                ifFalse: [-1]]]
>>> +                ifFalse: [n - 48]) between: 0 and: 15]
>>> +        thenCollect:
>>> +            [:n| Character value: n]"!
>>> -    | map v ch value |
>>> -    map := '0123456789abcdefABCDEF'.
>>> -    1 to: self size do:[:i|
>>> -        ch := aStream next.
>>> -        v := (map indexOf: ch) - 1.
>>> -        ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>>> -        value := v bitShift: 4.
>>> -        ch := aStream next.
>>> -        v := (map indexOf: ch) - 1.
>>> -        ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected'].
>>> -        value := value + v.
>>> -        self at: i put: value.
>>> -    ].
>>> - !
>>