Eliot Miranda uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-eem.907.mcz ==================== Summary ==================== Name: Collections-eem.907 Author: eem Time: 6 August 2020, 12:51:46.481296 pm UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70 Ancestors: Collections-mt.906 2.7x faster ByteArray>>readHexFrom: =============== Diff against Collections-mt.906 =============== Item was changed: ----- Method: ByteArray>>readHexFrom: (in category 'initialize') ----- readHexFrom: aStream "Initialize the receiver from a hexadecimal string representation" + + 1 to: self size do: + [:i| | n v1 v2 | + n := aStream next asInteger. + v1 := n > 57 "$9 asInteger = 57" + ifTrue: + [n > 96 "$a asInteger 97" + ifTrue: [n - 87] + ifFalse: [n > 64 "$A asInteger = 65" + ifTrue: [n - 55] + ifFalse: [-1]]] + ifFalse: [n - 48]. "$0 asInteger = 48" + (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. + n := aStream next asInteger. + v2 := n > 57 "$9 asInteger = 57" + ifTrue: + [n > 96 "$a asInteger 97" + ifTrue: [n - 87] + ifFalse: [n > 64 "$A asInteger = 65" + ifTrue: [n - 55] + ifFalse: [-1]]] + ifFalse: [n - 48]. "$0 asInteger = 48" + (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. + self at: i put: (v1 bitShift: 4) + v2] + + "Proof that our filter selects only hexadecimal characters: + (0 to: 255) + select: + [:n| + (n > 57 + ifTrue: + [n > 96 + ifTrue: [n - 87] + ifFalse: [n > 64 + ifTrue: [n - 55] + ifFalse: [-1]]] + ifFalse: [n - 48]) between: 0 and: 15] + thenCollect: + [:n| Character value: n]"! - | map v ch value | - map := '0123456789abcdefABCDEF'. - 1 to: self size do:[:i| - ch := aStream next. - v := (map indexOf: ch) - 1. - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. - value := v bitShift: 4. - ch := aStream next. - v := (map indexOf: ch) - 1. - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. - value := value + v. - self at: i put: value. - ]. - ! |
Hi Eliot,
If performance matters, I've got two alternative implementations. Both are about 1.2x faster than yours and are based on lookup tables. The first one uses #digitValue which uses Character's "built-in" DigitValues table: readHexFrom: aStream "Initialize the receiver from a hexadecimal string representation" 1 to: self size do: [ :i | | c v1 v2 | ((c := aStream next) asInteger > 102 "$f asInteger" or: [ (v1 := c digitValue) < 0 or: [ v1 > 15 or: [ (c := aStream next) asInteger > 102 "$f asInteger" or: [ (v2 := c digitValue) < 0 or: [ v2 > 15 ] ] ] ] ]) ifTrue: [ ^self error: 'Hex digit expected' ]. self at: i put: (v1 bitShift: 4) + v2 ] The second one uses a non-minimal perfect hash table with an extremely simple hash function: readHexFrom: aStream "Initialize the receiver from a hexadecimal string representation" | keys values | keys := #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0]. values := #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0]. 1 to: self size do: [ :i | | c index v1 v2 | c := aStream next asInteger. index := (c bitAnd: 63) + 1. (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ]. v1 := values at: index. c := aStream next asInteger. index := (c bitAnd: 63) + 1. (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ]. v2 := values at: index. self at: i put: (v1 bitShift: 4) + v2 ] Levente On Thu, 6 Aug 2020, [hidden email] wrote: > Eliot Miranda uploaded a new version of Collections to project The Trunk: > http://source.squeak.org/trunk/Collections-eem.907.mcz > > ==================== Summary ==================== > > Name: Collections-eem.907 > Author: eem > Time: 6 August 2020, 12:51:46.481296 pm > UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70 > Ancestors: Collections-mt.906 > > 2.7x faster ByteArray>>readHexFrom: > > =============== Diff against Collections-mt.906 =============== > > Item was changed: > ----- Method: ByteArray>>readHexFrom: (in category 'initialize') ----- > readHexFrom: aStream > "Initialize the receiver from a hexadecimal string representation" > + > + 1 to: self size do: > + [:i| | n v1 v2 | > + n := aStream next asInteger. > + v1 := n > 57 "$9 asInteger = 57" > + ifTrue: > + [n > 96 "$a asInteger 97" > + ifTrue: [n - 87] > + ifFalse: [n > 64 "$A asInteger = 65" > + ifTrue: [n - 55] > + ifFalse: [-1]]] > + ifFalse: [n - 48]. "$0 asInteger = 48" > + (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. > + n := aStream next asInteger. > + v2 := n > 57 "$9 asInteger = 57" > + ifTrue: > + [n > 96 "$a asInteger 97" > + ifTrue: [n - 87] > + ifFalse: [n > 64 "$A asInteger = 65" > + ifTrue: [n - 55] > + ifFalse: [-1]]] > + ifFalse: [n - 48]. "$0 asInteger = 48" > + (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. > + self at: i put: (v1 bitShift: 4) + v2] > + > + "Proof that our filter selects only hexadecimal characters: > + (0 to: 255) > + select: > + [:n| > + (n > 57 > + ifTrue: > + [n > 96 > + ifTrue: [n - 87] > + ifFalse: [n > 64 > + ifTrue: [n - 55] > + ifFalse: [-1]]] > + ifFalse: [n - 48]) between: 0 and: 15] > + thenCollect: > + [:n| Character value: n]"! > - | map v ch value | > - map := '0123456789abcdefABCDEF'. > - 1 to: self size do:[:i| > - ch := aStream next. > - v := (map indexOf: ch) - 1. > - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. > - value := v bitShift: 4. > - ch := aStream next. > - v := (map indexOf: ch) - 1. > - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. > - value := value + v. > - self at: i put: value. > - ]. > - ! |
Hi Levente,
> On Aug 7, 2020, at 5:07 PM, Levente Uzonyi <[hidden email]> wrote: > > Hi Eliot, > > If performance matters, I've got two alternative implementations. Performance isn’t that big a deal. But something told me that the indexOf: approach sux :-) > Both are about 1.2x faster than yours and are based on lookup tables. > The first one uses #digitValue which uses Character's "built-in" DigitValues table: > > readHexFrom: aStream > "Initialize the receiver from a hexadecimal string representation" > > 1 to: self size do: [ :i | > | c v1 v2 | > ((c := aStream next) asInteger > 102 "$f asInteger" > or: [ (v1 := c digitValue) < 0 > or: [ v1 > 15 > or: [ (c := aStream next) asInteger > 102 "$f asInteger" > or: [ (v2 := c digitValue) < 0 > or: [ v2 > 15 ] ] ] ] ]) > ifTrue: [ ^self error: 'Hex digit expected' ]. > self at: i put: (v1 bitShift: 4) + v2 ] When I tested this it was slower. I tested it on ByteArrays >= 256 bytes in length that were less than 50% zeros. So let’s not go this way. > > The second one uses a non-minimal perfect hash table with an extremely simple hash function: > > readHexFrom: aStream > "Initialize the receiver from a hexadecimal string representation" > > | keys values | > keys := #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0]. > values := #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0]. > 1 to: self size do: [ :i | > | c index v1 v2 | > c := aStream next asInteger. > index := (c bitAnd: 63) + 1. > (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ]. > v1 := values at: index. > c := aStream next asInteger. > index := (c bitAnd: 63) + 1. > (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ]. > v2 := values at: index. > self at: i put: (v1 bitShift: 4) + v2 ] That’s neat, I’ll remember the technique. But what we have is more straight forward. > Levente > >> On Thu, 6 Aug 2020, [hidden email] wrote: >> >> Eliot Miranda uploaded a new version of Collections to project The Trunk: >> http://source.squeak.org/trunk/Collections-eem.907.mcz >> >> ==================== Summary ==================== >> >> Name: Collections-eem.907 >> Author: eem >> Time: 6 August 2020, 12:51:46.481296 pm >> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70 >> Ancestors: Collections-mt.906 >> >> 2.7x faster ByteArray>>readHexFrom: >> >> =============== Diff against Collections-mt.906 =============== >> >> Item was changed: >> ----- Method: ByteArray>>readHexFrom: (in category 'initialize') ----- >> readHexFrom: aStream >> "Initialize the receiver from a hexadecimal string representation" >> + + 1 to: self size do: >> + [:i| | n v1 v2 | >> + n := aStream next asInteger. >> + v1 := n > 57 "$9 asInteger = 57" >> + ifTrue: >> + [n > 96 "$a asInteger 97" >> + ifTrue: [n - 87] >> + ifFalse: [n > 64 "$A asInteger = 65" >> + ifTrue: [n - 55] >> + ifFalse: [-1]]] + ifFalse: [n - 48]. "$0 asInteger = 48" >> + (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. >> + n := aStream next asInteger. >> + v2 := n > 57 "$9 asInteger = 57" >> + ifTrue: >> + [n > 96 "$a asInteger 97" >> + ifTrue: [n - 87] >> + ifFalse: [n > 64 "$A asInteger = 65" >> + ifTrue: [n - 55] >> + ifFalse: [-1]]] >> + ifFalse: [n - 48]. "$0 asInteger = 48" >> + (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. >> + self at: i put: (v1 bitShift: 4) + v2] >> + + "Proof that our filter selects only hexadecimal characters: >> + (0 to: 255) >> + select: >> + [:n| >> + (n > 57 >> + ifTrue: >> + [n > 96 + ifTrue: [n - 87] >> + ifFalse: [n > 64 >> + ifTrue: [n - 55] >> + ifFalse: [-1]]] >> + ifFalse: [n - 48]) between: 0 and: 15] >> + thenCollect: >> + [:n| Character value: n]"! >> - | map v ch value | >> - map := '0123456789abcdefABCDEF'. >> - 1 to: self size do:[:i| >> - ch := aStream next. >> - v := (map indexOf: ch) - 1. >> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. >> - value := v bitShift: 4. >> - ch := aStream next. >> - v := (map indexOf: ch) - 1. >> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. >> - value := value + v. >> - self at: i put: value. >> - ]. >> - ! > |
Hi Eliot,
On Sat, 8 Aug 2020, Eliot Miranda wrote: > Hi Levente, > > >> On Aug 7, 2020, at 5:07 PM, Levente Uzonyi <[hidden email]> wrote: >> >> Hi Eliot, >> >> If performance matters, I've got two alternative implementations. > > Performance isn’t that big a deal. But something told me that the indexOf: approach sux :-) > >> Both are about 1.2x faster than yours and are based on lookup tables. >> The first one uses #digitValue which uses Character's "built-in" DigitValues table: >> >> readHexFrom: aStream >> "Initialize the receiver from a hexadecimal string representation" >> >> 1 to: self size do: [ :i | >> | c v1 v2 | >> ((c := aStream next) asInteger > 102 "$f asInteger" >> or: [ (v1 := c digitValue) < 0 >> or: [ v1 > 15 >> or: [ (c := aStream next) asInteger > 102 "$f asInteger" >> or: [ (v2 := c digitValue) < 0>> or: [ v2 > 15 ] ] ] ] ]) >> ifTrue: [ ^self error: 'Hex digit expected' ]. >> self at: i put: (v1 bitShift: 4) + v2 ] > > When I tested this it was slower. I tested it on ByteArrays >= 256 bytes in length that were less than 50% zeros. So let’s not go this way. it away. Your code takes different execution paths based on the input, and that has a surpisingly significant effect on performance. Here's the benchmark I used which generates uniformly distributed values with mixed lower and upper case characters. It shows the performance difference I wrote: | b r letters s stream | b := ByteArray new: 1000. r := Random seed: 36rSqueak. letters := ($0 to: $9), ($0 to: $9), ($a to: $f), ($A to: $F). s := (1 to: b size * 2) collect: [ :e | letters atRandom: r ] as: String. stream := s readStream. { [ stream reset. b readHexFrom: stream ] benchFor: 1 seconds. [ stream reset. b readHexFromVariant1: stream ] benchFor: 1 seconds. [ stream reset. b readHexFromVariant2: stream ] benchFor: 1 seconds } #( '16,700 per second. 59.7 microseconds per run. 0 % GC time.' '20,200 per second. 49.4 microseconds per run. 0 % GC time.' '20,100 per second. 49.8 microseconds per run. 0 % GC time.' ) However, if the input is restricted to 0-9 characters by using letters := $0 to: $9. in the above benchmark, the results become #( '22,700 per second. 44.2 microseconds per run. 0 % GC time.' '19,400 per second. 51.5 microseconds per run. 0 % GC time.' '19,500 per second. 51.3 microseconds per run. 0 % GC time.' ) with letters := ($0 to: $9), ($a to: $f). the results are #( '18,800 per second. 53.1 microseconds per run. 0 % GC time.' '21,200 per second. 47.2 microseconds per run. 0 % GC time.' '20,200 per second. 49.4 microseconds per run. 0 % GC time.' ) and with letters := $a to: $f. #( '22,600 per second. 44.3 microseconds per run. 0 % GC time.' '20,500 per second. 48.9 microseconds per run. 0 % GC time.' '20,000 per second. 50 microseconds per run. 0 % GC time.' ) So, the more characters are from a single range (0-9 or a-f or A-F), as it was probably the case with the data you benchmarked with, the faster your variant becomes. Levente > >> >> The second one uses a non-minimal perfect hash table with an extremely simple hash function: >> >> readHexFrom: aStream >> "Initialize the receiver from a hexadecimal string representation" >> >> | keys values | >> keys := #[0 65 66 67 68 69 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 98 99 100 101 102 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 0 0 0 0 0 0]. >> values := #[0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0]. >> 1 to: self size do: [ :i | >> | c index v1 v2 | >> c := aStream next asInteger. >> index := (c bitAnd: 63) + 1. >> (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ]. >> v1 := values at: index. >> c := aStream next asInteger. >> index := (c bitAnd: 63) + 1. >> (keys at: index) = c ifFalse: [ ^self error: 'Hex digit expected' ]. >> v2 := values at: index. >> self at: i put: (v1 bitShift: 4) + v2 ] > > That’s neat, I’ll remember the technique. But what we have is more straight forward. > >> Levente >> >>> On Thu, 6 Aug 2020, [hidden email] wrote: >>> >>> Eliot Miranda uploaded a new version of Collections to project The Trunk: >>> http://source.squeak.org/trunk/Collections-eem.907.mcz >>> >>> ==================== Summary ==================== >>> >>> Name: Collections-eem.907 >>> Author: eem >>> Time: 6 August 2020, 12:51:46.481296 pm >>> UUID: 4ea594f7-b7bf-4d48-9e7e-2e7cb3af5e70 >>> Ancestors: Collections-mt.906 >>> >>> 2.7x faster ByteArray>>readHexFrom: >>> >>> =============== Diff against Collections-mt.906 =============== >>> >>> Item was changed: >>> ----- Method: ByteArray>>readHexFrom: (in category 'initialize') ----- >>> readHexFrom: aStream >>> "Initialize the receiver from a hexadecimal string representation" >>> + + 1 to: self size do: >>> + [:i| | n v1 v2 | >>> + n := aStream next asInteger. >>> + v1 := n > 57 "$9 asInteger = 57" >>> + ifTrue: >>> + [n > 96 "$a asInteger 97" >>> + ifTrue: [n - 87] >>> + ifFalse: [n > 64 "$A asInteger = 65" >>> + ifTrue: [n - 55] >>> + ifFalse: [-1]]] + ifFalse: [n - 48]. "$0 asInteger = 48" >>> + (v1 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. >>> + n := aStream next asInteger. >>> + v2 := n > 57 "$9 asInteger = 57" >>> + ifTrue: >>> + [n > 96 "$a asInteger 97" >>> + ifTrue: [n - 87] >>> + ifFalse: [n > 64 "$A asInteger = 65" >>> + ifTrue: [n - 55] >>> + ifFalse: [-1]]] >>> + ifFalse: [n - 48]. "$0 asInteger = 48" >>> + (v2 between: 0 and: 15) ifFalse: [^self error: 'Hex digit expected']. >>> + self at: i put: (v1 bitShift: 4) + v2] >>> + + "Proof that our filter selects only hexadecimal characters: >>> + (0 to: 255) >>> + select: >>> + [:n| >>> + (n > 57 >>> + ifTrue: >>> + [n > 96 + ifTrue: [n - 87] >>> + ifFalse: [n > 64 >>> + ifTrue: [n - 55] >>> + ifFalse: [-1]]] >>> + ifFalse: [n - 48]) between: 0 and: 15] >>> + thenCollect: >>> + [:n| Character value: n]"! >>> - | map v ch value | >>> - map := '0123456789abcdefABCDEF'. >>> - 1 to: self size do:[:i| >>> - ch := aStream next. >>> - v := (map indexOf: ch) - 1. >>> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. >>> - value := v bitShift: 4. >>> - ch := aStream next. >>> - v := (map indexOf: ch) - 1. >>> - ((v between: 0 and: 15) or: [((v:= v - 6) between: 0 and: 15)]) ifFalse:[^self error: 'Hex digit expected']. >>> - value := value + v. >>> - self at: i put: value. >>> - ]. >>> - ! >> |
Free forum by Nabble | Edit this page |