The Trunk: Collections-ul.645.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-ul.645.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.645.mcz

==================== Summary ====================

Name: Collections-ul.645
Author: ul
Time: 22 August 2015, 1:25:51.922 pm
UUID: d0e515b8-18ad-43f0-b96a-c13fa4834fc5
Ancestors: Collections-ul.644

Use the extracted Bitset in WideCharacterSet:
- decreased chunk size from 65536 to 256. This should give a purpose to use a dictionary instread of an array for map, because its size will have a chance to be larger than 64.
- byteArrayMap is always initialized, so there's no need to use the method to reference it, nor to check if it's initialized
- moved #remove: behavior to #remove:ifAbsent:, just like it's done by most collections. Kept the existing behavior of #remove: to not signal an error when the character is not in the collection.
- use #atAllPut: to clear the byteArrayMap in #removeAll, because it should be faster than allocating a new instance
- improved #size's performance
- migrate instances on-the-fly
- migrate any unmigrated instaces during the postscript
- removed the unused methods

=============== Diff against Collections-ul.644 ===============

Item was changed:
  Collection subclass: #WideCharacterSet
+ instanceVariableNames: 'map byteArrayMap bitsetCapacity highBitsShift lowBitsMask'
- instanceVariableNames: 'map byteArrayMap'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Support'!
 
  !WideCharacterSet commentStamp: 'nice 12/10/2009 19:17' prior: 0!
  WideCharacterSet is used to store a Set of WideCharacter with fast access and inclusion test.
 
  Implementation should be efficient in memory if sets are sufficently sparse.
 
  Wide Characters are at most 32bits.
  We split them into 16 highBits and 16 lowBits.
 
  map is a dictionary key: 16 highBits value: map of 16 lowBits.
 
  Maps of lowBits  are stored as arrays of bits in a ByteArray.
  If a bit is set to 1, this indicate that corresponding character is present.
  8192 bytes are necessary in each lowmap.
  Empty lowmap are removed from the map Dictionary.
 
  A byteArrayMap is maintained in parallel with map for fast handling of ByteString.
  (byteArrayMap at: i+1) = 0 means that character of asciiValue i is absent, = 1 means present.!

Item was changed:
  ----- Method: WideCharacterSet>>add: (in category 'collection ops') -----
+ add: aCharacter
+
+ | value highBits lowBits |
+ self migrate.
+ (value := aCharacter asInteger) < 256 ifTrue: [
+ byteArrayMap at: value + 1 put: 1 ].
+ highBits := value bitShift: highBitsShift.
+ lowBits := value bitAnd: lowBitsMask.
+ (map at: highBits ifAbsentPut: [ Bitset new: bitsetCapacity ])
+ setBitAt: lowBits.
+ ^aCharacter!
- add: aCharacter
- | val high low lowmap |
- val := aCharacter asciiValue.
- val < 256 ifTrue: [self byteArrayMap at: val + 1 put: 1].
- high := val bitShift: -16.
- low := val bitAnd: 16rFFFF.
- lowmap := map at: high ifAbsentPut: ["create a chunk of 65536=8192*8 bits"
- ByteArray new: 8192].
- self setBitmap: lowmap at: low.
- ^ aCharacter!

Item was removed:
- ----- Method: WideCharacterSet>>bitmap:at: (in category 'private') -----
- bitmap: aMap at: shortInteger
- "access a single bit in aMap.
- shortInteger should be between: 0 and: 16rFFFF"
-
- | collecIndex bitIndex |
- collecIndex := shortInteger bitShift: -3.
- bitIndex := shortInteger bitAnd: 7.
- ^(aMap at: collecIndex + 1) bitAnd: (1 bitShift: bitIndex)!

Item was changed:
  ----- Method: WideCharacterSet>>bitmap:do: (in category 'private') -----
  bitmap: aMap do: aBlock
  "Execute a block with each value (0 based) corresponding to set bits.
  Implementation notes: this version works best for sparse maps.
  It has (byte lowBit) inlined for speed."
 
  | byte byteOffset lowBits |
+ lowBits := Integer lowBitPerByteTable. "The lowBits table gives a 1-based bitOffset"
- lowBits := #[1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 6 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 7 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 6 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 8 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 6 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 7 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 6 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1]. "The lowBits table gives a 1-based bitOffset"
  1 to: aMap size do: [:i |
  (byte := aMap at: i) = 0 ifFalse: [
  byteOffset := (i bitShift: 3) - 9. "This byteOffset is -1 based"
  ["Evaluate the block with 0-based (byteOffset + bitOffset)"
  aBlock value: (byteOffset + (lowBits at: byte)).
  "Eliminate the low bit and loop if some bit remain"
  (byte := byte bitAnd: byte - 1) = 0] whileFalse]]!

Item was changed:
  ----- Method: WideCharacterSet>>byteArrayMap (in category 'comparing') -----
  byteArrayMap
  "return a ByteArray mapping each ascii value to a 1 if that ascii value is in the set, and a 0 if it isn't.
  Intended for use by primitives only. (and comparison)
  This version will answer a subset with only byte characters"
 
- | lowmap |
- byteArrayMap ifNil: [
- byteArrayMap := ByteArray new: 256.
- lowmap := map at: 0 ifAbsent: [^byteArrayMap].
- lowmap := lowmap copyFrom: 1 to: 32. "Keep first 8*32=256 bits..."
- self bitmap: lowmap do: [:code | byteArrayMap at: code + 1 put: 1]].
  ^byteArrayMap!

Item was removed:
- ----- Method: WideCharacterSet>>clearBitmap:at: (in category 'private') -----
- clearBitmap: aMap at: shortInteger
- "clear a single bit in aMap.
- shortInteger should be between: 0 and: 16rFFFF"
-
- | collecIndex bitIndex |
- collecIndex := shortInteger bitShift: -3.
- bitIndex := shortInteger bitAnd: 7.
- ^aMap at: collecIndex + 1 put: ((aMap at: collecIndex + 1) bitClear: (1 bitShift: bitIndex))!

Item was changed:
  ----- Method: WideCharacterSet>>do: (in category 'collection ops') -----
+ do: aBlock
+  
+ self migrate.
+ map keysAndValuesDo: [ :index :bitset |
+ | highBits |
+ highBits := index * bitsetCapacity.
+ bitset do: [ :lowBits |
+ aBlock value: (Character value: highBits + lowBits) ] ]!
- do: aBlock
- map
- keysAndValuesDo: [:index :lowmap |
- | high16Bits |
- high16Bits := index bitShift: 16.
- self
- bitmap: lowmap
- do: [:low16Bits | aBlock value: (Character value: high16Bits + low16Bits)]]!

Item was changed:
  ----- Method: WideCharacterSet>>findFirstInByteString:startingAt: (in category 'collection ops') -----
  findFirstInByteString: aByteString startingAt: startIndex
  "Double dispatching: since we know this is a ByteString, we can use a superfast primitive using a ByteArray map with 0 slots for byte characters not included and 1 for byte characters included in the receiver."
+
  ^ByteString
  findFirstInString: aByteString
+ inSet: byteArrayMap
- inSet: self byteArrayMap
  startingAt: startIndex!

Item was changed:
  ----- Method: WideCharacterSet>>hash (in category 'comparing') -----
  hash
  "Answer a hash code aimed at storing and retrieving the receiver in a Set or Dictionary.
  Two equal objects should have equal hash.
  Note: as the receiver can be equal to an ordinary CharacterSet,
  the hash code must reflect this"
 
+ self hasWideCharacters ifTrue: [ ^map hash ].
+ ^byteArrayMap hash!
- ^self hasWideCharacters
- ifTrue: [map hash]
- ifFalse: [self asCharacterSet hash]!

Item was changed:
  ----- Method: WideCharacterSet>>includes: (in category 'collection ops') -----
  includes: aCharacter
 
+ | value |
- | value bitmap |
  (value := aCharacter asInteger) < 256 ifTrue: [
  ^(byteArrayMap at: value + 1) ~= 0 ].
+ self migrate.
+ ^((map at: (value bitShift: highBitsShift) ifAbsent: nil) ifNil: [ ^false ])
+ includes: (value bitAnd: lowBitsMask)!
- bitmap := (map at: (value bitShift: -16) ifAbsent: nil) ifNil: [ ^false ].
- ^(self
- bitmap: bitmap
- at: (value bitAnd: 16rFFFF)) ~= 0!

Item was changed:
  ----- Method: WideCharacterSet>>initialize (in category 'initialize-release') -----
  initialize
 
  map := PluggableDictionary integerDictionary.
+ byteArrayMap := ByteArray new: 256.
+ self initializeWithLowBits: 8!
- byteArrayMap := ByteArray new: 256!

Item was added:
+ ----- Method: WideCharacterSet>>initializeWithLowBits: (in category 'initialize-release') -----
+ initializeWithLowBits: lowBits
+
+ bitsetCapacity := 1 bitShift: lowBits.
+ highBitsShift := 0 - lowBits.
+ lowBitsMask := bitsetCapacity - 1.
+ !

Item was added:
+ ----- Method: WideCharacterSet>>migrate (in category 'private') -----
+ migrate
+
+ | newMap |
+ bitsetCapacity ifNotNil: [ ^self "already migrated" ].
+ self initializeWithLowBits: 8.
+ newMap := PluggableDictionary integerDictionary.
+ map keysAndValuesDo: [ :index :lowmap |
+ | high16Bits |
+ high16Bits := index bitShift: 16.
+ self
+ bitmap: lowmap
+ do: [ :low16Bits |
+ | value highBits lowBits |
+ value := high16Bits + low16Bits.
+ highBits := value bitShift: highBitsShift.
+ lowBits := value bitAnd: lowBitsMask.
+ (newMap at: highBits ifAbsentPut: [ Bitset new: bitsetCapacity ])
+ setBitAt: lowBits ] ].
+ map := newMap!

Item was changed:
  ----- Method: WideCharacterSet>>remove: (in category 'collection ops') -----
+ remove: aCharacter
+ "Don't signal an error when aCharacter is not present."
+
+ ^self remove: aCharacter ifAbsent: aCharacter!
- remove: aCharacter
- | val high low lowmap |
- val := aCharacter asciiValue.
- val < 256 ifTrue: [self byteArrayMap at: val + 1 put: 0].
- high := val bitShift: -16.
- low := val bitAnd: 16rFFFF.
- lowmap := map
- at: high
- ifAbsent: [^ aCharacter].
- self clearBitmap: lowmap at: low.
- (lowmap allSatisfy: [:e | e = 0])
- ifTrue: [map removeKey: high].
- ^ aCharacter!

Item was changed:
  ----- Method: WideCharacterSet>>remove:ifAbsent: (in category 'collection ops') -----
  remove: aCharacter ifAbsent: aBlock
+
+ | value highBits lowBits bitset |
+ (value := aCharacter asInteger) < 256 ifTrue: [
+ (byteArrayMap at: value + 1) = 0 ifTrue: [ ^aBlock value ].
+ byteArrayMap at: value + 1 put: 0 ].
+ self migrate.
+ highBits := value bitShift: highBitsShift.
+ lowBits := value bitAnd: lowBitsMask.
+ bitset := (map at: highBits ifAbsent: nil) ifNil: [ ^aBlock value ].
+ ((bitset clearBitAt: lowBits) and: [ bitset size = 0 ]) ifTrue: [
+ map removeKey: highBits ].
+ ^aCharacter!
- (self includes: aCharacter) ifFalse: [^aBlock value].
- ^self remove: aCharacter!

Item was changed:
  ----- Method: WideCharacterSet>>removeAll (in category 'collection ops') -----
  removeAll
+
+ map isEmpty ifTrue: [ ^self ].
  map removeAll.
+ byteArrayMap atAllPut: 0!
- byteArrayMap := ByteArray new: 256!

Item was removed:
- ----- Method: WideCharacterSet>>setBitmap:at: (in category 'private') -----
- setBitmap: aMap at: shortInteger
- "set a single bit in aMap.
- shortInteger should be between: 0 and: 16rFFFF"
-
- | collecIndex bitIndex |
- collecIndex := shortInteger bitShift: -3.
- bitIndex := shortInteger bitAnd: 7.
- ^aMap at: collecIndex + 1 put: ((aMap at: collecIndex + 1) bitOr: (1 bitShift: bitIndex))!

Item was changed:
  ----- Method: WideCharacterSet>>size (in category 'collection ops') -----
  size
+
+ ^map detectSum: [ :each | each size ]!
- | size |
- size := 0.
- map
- keysAndValuesDo: [:high :lowmap | self
- bitmap: lowmap
- do: [:low | size := size + 1]].
- ^ size!

Item was changed:
+ (PackageInfo named: 'Collections') postscript: 'WideCharacterSet allInstancesDo: #migrate'!
- (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable.
- String initialize'!