The Trunk: Collections-nice.776.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-nice.776.mcz

commits-2
Nicolas Cellier uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-nice.776.mcz

==================== Summary ====================

Name: Collections-nice.776
Author: nice
Time: 1 December 2017, 2:42:56.081887 am
UUID: 0682d8f2-1e46-4d55-be53-ed1efc8873cc
Ancestors: Collections-nice.775

Rename AbstractCharacterSet->CharacterSet

This is Part 2.
- remove AbstractCharacterSet
- recycle existing CharacterSet as abstract
  (it previously was byte-CharacterSet but instances have been migrated to ByteCharacterSet in step 1)
- keep byteArrayMap instance variable in abstract class, because all CharacterSet can have a map usable for accelerated primitive.
- make CharacterSet a factory
- clean-up ByteCharacterSet now that it is hooked back as a CharacterSet subclass

Note: now CharacterSet allCharacters will return a Set with all characters, including the wide Characters.
If you want only the byte characters, then use ByteCharacterSet allCharacters.

=============== Diff against Collections-nice.775 ===============

Item was removed:
- Collection subclass: #AbstractCharacterSet
- instanceVariableNames: ''
- classVariableNames: ''
- poolDictionaries: ''
- category: 'Collections-Support'!

Item was removed:
- ----- Method: AbstractCharacterSet class>>isAbstract (in category 'testing') -----
- isAbstract
- ^self = AbstractCharacterSet!

Item was removed:
- ----- Method: AbstractCharacterSet>>byteArrayMap (in category 'accessing') -----
- byteArrayMap
- ^self subclassReponsibility!

Item was removed:
- ----- Method: AbstractCharacterSet>>findFirstInByteString:startingAt: (in category 'enumerating') -----
- findFirstInByteString: aByteString startingAt: startIndex
- "Double dispatching: since we know this is a ByteString, we can use a superfast primitive using a ByteArray map with 0 slots for byte characters not included and 1 for byte characters included in the receiver."
- ^ByteString
- findFirstInString: aByteString
- inSet: self byteArrayMap
- startingAt: startIndex!

Item was removed:
- ----- Method: AbstractCharacterSet>>occurrencesOf: (in category 'enumerating') -----
- occurrencesOf: anObject
- "Answer how many of the receiver's elements are equal to anObject. Optimized version."
-
- (self includes: anObject) ifTrue: [ ^1 ].
- ^0!

Item was removed:
- ----- Method: AbstractCharacterSet>>removeAll (in category 'removing') -----
- removeAll
- self becomeForward: CharacterSet new!

Item was removed:
- ----- Method: AbstractCharacterSet>>species (in category 'private') -----
- species
- ^CharacterSet!

Item was changed:
+ CharacterSet subclass: #ByteCharacterSet
+ instanceVariableNames: 'tally'
+ classVariableNames: ''
- Collection subclass: #ByteCharacterSet
- instanceVariableNames: 'byteArrayMap tally'
- classVariableNames: 'CrLf NonSeparators Separators'
  poolDictionaries: ''
  category: 'Collections-Support'!
 
  !ByteCharacterSet commentStamp: '<historical>' prior: 0!
  A set of characters.  Lookups for inclusion are very fast.!

Item was changed:
  ----- Method: ByteCharacterSet class>>allCharacters (in category 'instance creation') -----
  allCharacters
+ "return a set containing all byte characters"
- "return a set containing all characters"
 
+ ^ self fromMap: (ByteArray new: 256 withAll: 1)!
- | set |
- set := self empty.
- 0 to: 255 do: [ :ascii | set add: (Character value: ascii) ].
- ^set!

Item was removed:
- ----- Method: ByteCharacterSet class>>cleanUp: (in category 'initialize-release') -----
- cleanUp: aggressive
-
- CrLf := NonSeparators := Separators := nil!

Item was removed:
- ----- Method: ByteCharacterSet class>>crlf (in category 'accessing') -----
- crlf
-
- ^CrLf ifNil: [ CrLf := self with: Character cr with: Character lf ]!

Item was removed:
- ----- Method: ByteCharacterSet class>>empty (in category 'instance creation') -----
- empty
-   "return an empty set of characters"
- ^self new!

Item was added:
+ ----- Method: ByteCharacterSet class>>fromMap: (in category 'instance creation') -----
+ fromMap: aByteArray
+ self basicNew fromMap: aByteArray!

Item was removed:
- ----- Method: ByteCharacterSet class>>newFrom: (in category 'instance creation') -----
- newFrom: aCollection
- | newCollection |
- newCollection := self new.
- newCollection addAll: aCollection.
- ^newCollection!

Item was removed:
- ----- Method: ByteCharacterSet class>>nonSeparators (in category 'accessing') -----
- nonSeparators
- "return a set containing everything but the whitespace characters"
-
- ^NonSeparators ifNil: [
- NonSeparators := self separators complement ]!

Item was removed:
- ----- Method: ByteCharacterSet class>>separators (in category 'accessing') -----
- separators
- "return a set containing just the whitespace characters"
-
- ^Separators ifNil: [ Separators := self newFrom: Character separators ]!

Item was removed:
- ----- Method: ByteCharacterSet class>>withAll: (in category 'instance creation') -----
- withAll: aCollection
- "Create a new ByteCharacterSet containing all the characters from aCollection."
-
- ^self newFrom: aCollection!

Item was removed:
- ----- Method: ByteCharacterSet>>byteComplement (in category 'conversion') -----
- byteComplement
- "return a character set containing precisely the single byte characters the receiver does not"
-
- | set |
- set := ByteCharacterSet allCharacters.
- self do: [ :c | set remove: c ].
- ^set!

Item was removed:
- ----- Method: ByteCharacterSet>>complement (in category 'conversion') -----
- complement
- "return a character set containing precisely the characters the receiver does not"
-
- ^ByteCharacterSetComplement of: self copy!

Item was removed:
- ----- Method: ByteCharacterSet>>findFirstInByteString:startingAt: (in category 'zap me later') -----
- findFirstInByteString: aByteString startingAt: startIndex
- "Double dispatching: since we know this is a ByteString, we can use a superfast primitive using a ByteArray map with 0 slots for byte characters not included and 1 for byte characters included in the receiver."
- ^ByteString
- findFirstInString: aByteString
- inSet: self byteArrayMap
- startingAt: startIndex!

Item was added:
+ ----- Method: ByteCharacterSet>>fromMap: (in category 'initialize-release') -----
+ fromMap: aByteArray
+ byteArrayMap := aByteArray.
+ tally := aByteArray count: [:e | e = 1]!

Item was changed:
+ ----- Method: ByteCharacterSet>>initialize (in category 'initialize-release') -----
- ----- Method: ByteCharacterSet>>initialize (in category 'private') -----
  initialize
-
  byteArrayMap := ByteArray new: 256.
  tally := 0!

Item was removed:
- ----- Method: ByteCharacterSet>>occurrencesOf: (in category 'zap me later') -----
- occurrencesOf: anObject
- "Answer how many of the receiver's elements are equal to anObject. Optimized version."
-
- (self includes: anObject) ifTrue: [ ^1 ].
- ^0!

Item was removed:
- ----- Method: ByteCharacterSet>>species (in category 'zap me later') -----
- species
- ^CharacterSet!

Item was changed:
  ----- Method: ByteCharacterSet>>union: (in category 'enumerating') -----
  union: aCollection
+ (self isCharacters: aCollection) ifFalse: [^super union: aCollection].
- (self species = aCollection species or: [aCollection isString or: [aCollection allSatisfy: [:e | e isCharacter]]]) ifFalse: [^super union: aCollection].
  (self species = aCollection species and: [self class ~= aCollection class]) ifTrue: [^aCollection union: self].
  ^self copy addAll: aCollection; yourself!

Item was changed:
  ----- Method: ByteCharacterSet>>wideCharacterMap (in category 'private') -----
  wideCharacterMap
  "used for comparing with WideByteCharacterSet"
 
  | wide |
+ wide := WideCharacterSet new.
- wide := WideByteCharacterSet new.
  wide addAll: self.
  ^wide wideCharacterMap!

Item was changed:
+ Collection subclass: #CharacterSet
+ instanceVariableNames: 'byteArrayMap'
- AbstractCharacterSet subclass: #CharacterSet
- instanceVariableNames: 'map tally'
  classVariableNames: 'CrLf NonSeparators Separators'
  poolDictionaries: ''
  category: 'Collections-Support'!
 
  !CharacterSet commentStamp: '<historical>' prior: 0!
  A set of characters.  Lookups for inclusion are very fast.!

Item was changed:
  ----- Method: CharacterSet class>>allCharacters (in category 'instance creation') -----
  allCharacters
  "return a set containing all characters"
 
+ ^ self empty complement!
- | set |
- set := self empty.
- 0 to: 255 do: [ :ascii | set add: (Character value: ascii) ].
- ^set!

Item was changed:
  ----- Method: CharacterSet class>>crlf (in category 'accessing') -----
  crlf
 
+ ^CrLf ifNil: [ CrLf := self with: Character cr with: Character lf ]!
- ^CrLf ifNil: [ CrLf := ByteCharacterSet with: Character cr with: Character lf ]!

Item was changed:
  ----- Method: CharacterSet class>>empty (in category 'instance creation') -----
  empty
    "return an empty set of characters"
+ ^self new!
- ^ByteCharacterSet new!

Item was added:
+ ----- Method: CharacterSet class>>isAbstract (in category 'testing') -----
+ isAbstract
+ ^self = CharacterSet!

Item was added:
+ ----- Method: CharacterSet class>>new (in category 'instance creation') -----
+ new
+ self = CharacterSet ifTrue: [^ByteCharacterSet new].
+ ^super new!

Item was changed:
  ----- Method: CharacterSet class>>newFrom: (in category 'instance creation') -----
  newFrom: aCollection
  | newCollection |
+ newCollection := self new.
- newCollection := ByteCharacterSet new.
  newCollection addAll: aCollection.
  ^newCollection!

Item was removed:
- ----- Method: CharacterSet>>= (in category 'comparing') -----
- = anObject
-
- self species == anObject species ifFalse: [ ^false ].
- anObject size = tally ifFalse: [ ^false ].
- ^self byteArrayMap = anObject byteArrayMap!

Item was removed:
- ----- Method: CharacterSet>>add: (in category 'adding') -----
- add: aCharacter
- "I automatically become a WideCharacterSet if you add a wide character to myself"
-
- | index |
- (index := aCharacter asInteger + 1) <= 256 ifFalse: [
- | wide |
- wide := WideCharacterSet new.
- wide addAll: self.
- wide add: aCharacter.
- self becomeForward: wide.
- ^aCharacter ].
- (map at: index) = 1 ifFalse: [
- map at: index put: 1.
- tally := tally + 1 ].
- ^aCharacter!

Item was removed:
- ----- Method: CharacterSet>>asString (in category 'conversion') -----
- asString
- "Convert the receiver into a String"
-
- ^String new: self size streamContents:[:s|
- self do:[:ch| s nextPut: ch].
- ].!

Item was changed:
+ ----- Method: CharacterSet>>byteArrayMap (in category 'accessing') -----
- ----- Method: CharacterSet>>byteArrayMap (in category 'private') -----
  byteArrayMap
  "return a ByteArray mapping each ascii value to a 1 if that ascii value is in the set, and a 0 if it isn't.  Intended for use by primitives only"
+ ^byteArrayMap ifNil: [self initializeByteArrayMap]!
- ^map!

Item was changed:
+ ----- Method: CharacterSet>>byteComplement (in category 'converting') -----
- ----- Method: CharacterSet>>byteComplement (in category 'conversion') -----
  byteComplement
  "return a character set containing precisely the single byte characters the receiver does not"
 
+ ^ ByteCharacterSet fromMap: (self byteArrayMap collect: [:i | 1 - i])!
- | set |
- set := CharacterSet allCharacters.
- self do: [ :c | set remove: c ].
- ^set!

Item was added:
+ ----- Method: CharacterSet>>canBeEnumerated (in category 'testing') -----
+ canBeEnumerated
+ ^true!

Item was changed:
+ ----- Method: CharacterSet>>complement (in category 'converting') -----
- ----- Method: CharacterSet>>complement (in category 'conversion') -----
  complement
  "return a character set containing precisely the characters the receiver does not"
 
  ^CharacterSetComplement of: self copy!

Item was removed:
- ----- Method: CharacterSet>>do: (in category 'enumerating') -----
- do: aBlock
- "evaluate aBlock with each character in the set"
-
- | index |
- tally >= 128 ifTrue: [ "dense"
- index := 0.
- [ (index := index + 1) <= 256 ] whileTrue: [
- (map at: index) = 1 ifTrue: [
- aBlock value: (Character value: index - 1) ] ].
- ^self ].
- "sparse"
- index := 0.
- [ (index := map indexOf: 1 startingAt: index + 1) = 0 ] whileFalse: [
- aBlock value: (Character value: index - 1) ].
- !

Item was removed:
- ----- Method: CharacterSet>>hasWideCharacters (in category 'testing') -----
- hasWideCharacters
- ^false!

Item was removed:
- ----- Method: CharacterSet>>hash (in category 'comparing') -----
- hash
- ^self byteArrayMap hash!

Item was removed:
- ----- Method: CharacterSet>>includes: (in category 'testing') -----
- includes: anObject
-
- | index |
- anObject isCharacter ifFalse: [ ^false ].
- (index := anObject asInteger + 1) > 256 ifTrue: [ ^false ].
- ^(map at: index) > 0!

Item was removed:
- ----- Method: CharacterSet>>initialize (in category 'private') -----
- initialize
-
- map := ByteArray new: 256.
- tally := 0!

Item was added:
+ ----- Method: CharacterSet>>initializeByteArrayMap (in category 'private') -----
+ initializeByteArrayMap
+ byteArrayMap := (0 to: 255) asByteArray collect:
+ [:i | (self includes: (Character value: i)) ifTrue: [1] ifFalse: [0]]!

Item was added:
+ ----- Method: CharacterSet>>isCharacters: (in category 'testing') -----
+ isCharacters: aCollection
+ "Answer whether this collection contains characters"
+ ^self species = aCollection species or: [aCollection isString or: [aCollection allSatisfy: [:e | e isCharacter]]]!

Item was removed:
- ----- Method: CharacterSet>>isEmpty (in category 'testing') -----
- isEmpty
- ^tally = 0!

Item was added:
+ ----- Method: CharacterSet>>occurrencesOf: (in category 'enumerating') -----
+ occurrencesOf: anObject
+ "Answer how many of the receiver's elements are equal to anObject. Optimized version."
+
+ (self includes: anObject) ifTrue: [ ^1 ].
+ ^0!

Item was changed:
  ----- Method: CharacterSet>>postCopy (in category 'copying') -----
  postCopy
  super postCopy.
+ byteArrayMap := byteArrayMap copy!
- map := map copy!

Item was removed:
- ----- Method: CharacterSet>>remove: (in category 'removing') -----
- remove: aCharacter
-
- ^self remove: aCharacter ifAbsent: aCharacter!

Item was removed:
- ----- Method: CharacterSet>>remove:ifAbsent: (in category 'removing') -----
- remove: aCharacter ifAbsent: aBlock
-
- | index |
- (index := aCharacter asciiValue + 1) <= 256 ifFalse: [ ^aBlock value ].
- (map at: index) = 0 ifTrue: [ ^aBlock value ].
- map at: index put: 0.
- tally := tally - 1.
- ^aCharacter!

Item was changed:
  ----- Method: CharacterSet>>removeAll (in category 'removing') -----
  removeAll
+ self becomeForward: ByteCharacterSet new!
-
- map atAllPut: 0.
- tally := 0!

Item was changed:
+ ----- Method: CharacterSet>>species (in category 'private') -----
- ----- Method: CharacterSet>>species (in category 'comparing') -----
  species
  ^CharacterSet!

Item was removed:
- ----- Method: CharacterSet>>union: (in category 'enumerating') -----
- union: aCollection
- aCollection class = CharacterSetComplement ifTrue: [^aCollection union: self].
- (self species = aCollection species or: [aCollection isString]) ifFalse: [^super union: aCollection].
- ^self copy addAll: aCollection; yourself!

Item was removed:
- ----- Method: CharacterSet>>wideCharacterMap (in category 'private') -----
- wideCharacterMap
- "used for comparing with WideCharacterSet"
-
- | wide |
- wide := WideCharacterSet new.
- wide addAll: self.
- ^wide wideCharacterMap!

Item was changed:
+ CharacterSet subclass: #CharacterSetComplement
+ instanceVariableNames: 'absent'
- AbstractCharacterSet subclass: #CharacterSetComplement
- instanceVariableNames: 'absent byteArrayMapCache'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Support'!
 
  !CharacterSetComplement commentStamp: 'nice 8/31/2008 14:53' prior: 0!
  CharacterSetComplement is a space efficient implementation of (CharacterSet complement) taking care of WideCharacter (code > 255)
 
  However, it will maintain a byteArrayMap for character <= 255 in a cache keeping
 
  instance variables:
  absent <CharacterSet> contains character that are not in the set (i.e. my complement)
  byteArrayMapCache <ByteArray | nil> cache this information because it has to be used in tight loops where efficiency matters!

Item was changed:
  ----- Method: CharacterSetComplement>>add: (in category 'adding') -----
  add: aCharacter
  "a character is present if not absent, so adding a character is removing it from the absent"
 
  (absent includes: aCharacter)
  ifTrue:
+ [byteArrayMap := nil.
- [byteArrayMapCache := nil.
  absent remove: aCharacter].
  ^ aCharacter!

Item was removed:
- ----- Method: CharacterSetComplement>>byteArrayMap (in category 'private') -----
- byteArrayMap
- "return a ByteArray mapping each ascii value to a 1 if that ascii value is in the set, and a 0 if it isn't.  Intended for use by primitives only"
-
- ^byteArrayMapCache ifNil: [byteArrayMapCache := absent byteArrayMap collect: [:i | 1 - i]]!

Item was added:
+ ----- Method: CharacterSetComplement>>canBeEnumerated (in category 'testing') -----
+ canBeEnumerated
+ ^false!

Item was changed:
  ----- Method: CharacterSetComplement>>complement: (in category 'initialize-release') -----
  complement: aCharacterSet
  "initialize with the complement"
 
+ byteArrayMap := nil.
- byteArrayMapCache := nil.
  absent := aCharacterSet.
  !

Item was changed:
  ----- Method: CharacterSetComplement>>remove: (in category 'removing') -----
  remove: aCharacter
  "This means aCharacter is now absent from myself.
  It must be added to my absent."
 
+ byteArrayMap := nil.
- byteArrayMapCache := nil.
  ^absent add: aCharacter!

Item was changed:
  ----- Method: CharacterSetComplement>>union: (in category 'enumerating') -----
  union: aCollection
  aCollection class = self class ifTrue: [^(self complement intersection: aCollection complement) complement].
+ (self isCharacters: aCollection) ifFalse: [^super union: aCollection].
- (aCollection species = CharacterSet or: [aCollection isString]) ifFalse: [^self error: 'no way to compute the union, this collection is too big'].
  ^(absent reject: [:e | aCollection includes: e]) complement!

Item was changed:
+ CharacterSet subclass: #LazyCharacterSet
+ instanceVariableNames: 'block'
- AbstractCharacterSet subclass: #LazyCharacterSet
- instanceVariableNames: 'block byteArrayMapCache'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Support'!
 
  !LazyCharacterSet commentStamp: 'nice 11/30/2017 21:40' prior: 0!
  A LazyCharacterSet is a kind of CharacterSet which does not know in advance which Character it contains or not.
  If will lazily evaluate a block on demand if ever one ask whether it includes: a character.
  It is not feasible to enumerate a LazyCharacterSet, because there are way too many characters.
 
  Instance Variables
  block: <BlockContext | Symbol>
  byteArrayMapCache: <ByteArray | nil>
 
  block
  - a valuable, answering either true or false when sent the message value: - true means that this set includes the character passed as value: argument.
 
  byteArrayMapCache
  - a cache holding 0 or 1 for the first 256 character codes - 0 meaning not included, 1 included. This is used in some priitives
  !

Item was changed:
  ----- Method: LazyCharacterSet>>block: (in category 'accessing') -----
  block: aValuable
  "Set the block used to determine if I include a Character or not.
  aValuable is an object that shoud answer true or false when sent value:"
 
+ byteArrayMap := nil.
- byteArrayMapCache := nil.
  block := aValuable!

Item was removed:
- ----- Method: LazyCharacterSet>>byteArrayMap (in category 'accessing') -----
- byteArrayMap
- "return a ByteArray mapping each ascii value to a 1 if that ascii value is in the set, and a 0 if it isn't.  Intended for use by primitives only"
-
- ^byteArrayMapCache ifNil: [byteArrayMapCache := (0 to: 255) collect: [:i | self includes: (Character value: i)]]!

Item was added:
+ ----- Method: LazyCharacterSet>>canBeEnumerated (in category 'testing') -----
+ canBeEnumerated
+ ^false!

Item was added:
+ ----- Method: LazyCharacterSet>>intersection: (in category 'enumerating') -----
+ intersection: aCollection
+ (self species = aCollection species or: [aCollection isString or: [aCollection allSatisfy: [:e | e isCharacter]]]) ifFalse: [^super intersection: aCollection].
+ ^self class including: [:c | (aCollection includes: c) and: [block value: c]]!

Item was added:
+ ----- Method: LazyCharacterSet>>union: (in category 'enumerating') -----
+ union: aCollection
+ (self isCharacters: aCollection) ifFalse: [^super union: aCollection].
+ ^self class including: [:c | (aCollection includes: c) or: [block value: c]]!

Item was changed:
+ CharacterSet subclass: #WideCharacterSet
+ instanceVariableNames: 'map bitsetCapacity highBitsShift lowBitsMask'
- AbstractCharacterSet subclass: #WideCharacterSet
- instanceVariableNames: 'map byteArrayMap bitsetCapacity highBitsShift lowBitsMask'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Support'!
 
  !WideCharacterSet commentStamp: 'nice 12/10/2009 19:17' prior: 0!
  WideCharacterSet is used to store a Set of WideCharacter with fast access and inclusion test.
 
  Implementation should be efficient in memory if sets are sufficently sparse.
 
  Wide Characters are at most 32bits.
  We split them into 16 highBits and 16 lowBits.
 
  map is a dictionary key: 16 highBits value: map of 16 lowBits.
 
  Maps of lowBits  are stored as arrays of bits in a ByteArray.
  If a bit is set to 1, this indicate that corresponding character is present.
  8192 bytes are necessary in each lowmap.
  Empty lowmap are removed from the map Dictionary.
 
  A byteArrayMap is maintained in parallel with map for fast handling of ByteString.
  (byteArrayMap at: i+1) = 0 means that character of asciiValue i is absent, = 1 means present.!

Item was changed:
  ----- Method: WideCharacterSet>>= (in category 'comparing') -----
  = anObject
+ ^self species == anObject species
+ and: [ anObject canBeEnumerated
+ and: [ self wideCharacterMap = anObject wideCharacterMap ] ]!
- ^self species == anObject species and: [
- self wideCharacterMap = anObject wideCharacterMap ]!

Item was removed:
- ----- Method: WideCharacterSet>>complement (in category 'converting') -----
- complement
- "return a character set containing precisely the characters the receiver does not"
-
- ^CharacterSetComplement of: self copy!

Item was changed:
  ----- Method: WideCharacterSet>>union: (in category 'enumerating') -----
  union: aCollection
+ (self isCharacters: aCollection) ifFalse: [^super union: aCollection].
+ (self species = aCollection species and: [self class ~= aCollection class]) ifTrue: [^aCollection union: self].
- (self species = aCollection species or: [aCollection isString]) ifFalse: [^super union: aCollection].
  ^self copy addAll: aCollection; yourself!