The Inbox: Multilingual-ul.208.mcz

commits-2
A new version of Multilingual was added to project The Inbox:
http://source.squeak.org/inbox/Multilingual-ul.208.mcz

==================== Summary ====================

Name: Multilingual-ul.208
Author: ul
Time: 1 May 2015, 3:25:18.828 pm
UUID: 82d19dac-c602-4c0d-bc9a-7858e3a3c283
Ancestors: Multilingual-ul.206

Improved Unicode caseMappings:
- Don't overwrite an existing mapping, because that leads to problems such as (Unicode toUppercaseCode: $k asciiValue) = 8490 (the Kelvin sign).
- Use PluggableDictionary class >> #integerDictionary for better lookup performance (~+16%) and for resistance to compaction (which is done at every release).
- Compact the dictionaries before saving.
- Save the new dictionaries atomically.
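
The overwrite problem from the first point can be reproduced in a workspace with plain dictionaries. This is only a sketch of the principle: the two pairs below are a hand-picked miniature of the folding data, the variable names are made up for illustration, and the iteration order of the real method (which depends on the dictionary) is not modeled.

```smalltalk
| pairs overwritten preserved |
"Hypothetical miniature of the folding data: both K (16r004B) and
 KELVIN SIGN (16r212A = 8490) fold to k (16r006B)."
pairs := #( #(16r004B 16r006B) #(16r212A 16r006B) ).

"Unconditional at:put: lets the last pair processed win."
overwritten := Dictionary new.
pairs do: [ :pair | overwritten at: pair second put: pair first ].
overwritten at: 16r006B.  "8490, the Kelvin sign"

"at:ifAbsentPut: keeps the first pair processed."
preserved := Dictionary new.
pairs do: [ :pair | preserved at: pair second ifAbsentPut: [ pair first ] ].
preserved at: 16r006B.  "75, the code of $K"
```

Evaluate the two lookups with "print it" to compare the results.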

=============== Diff against Multilingual-ul.206 ===============

Item was changed:
  ----- Method: Unicode class>>initializeCaseMappings (in category 'casing') -----
  initializeCaseMappings
  "Unicode initializeCaseMappings"
+
+ UIManager default informUserDuring: [ :bar |
- ToCasefold := IdentityDictionary new.
- ToUpper := IdentityDictionary new.
- ToLower := IdentityDictionary new.
- UIManager default informUserDuring: [:bar|
  | stream |
  bar value: 'Downloading Unicode data'.
  stream := HTTPClient httpGet: 'http://www.unicode.org/Public/UNIDATA/CaseFolding.txt'.
  (stream isKindOf: RWBinaryOrTextStream) ifFalse:[^self error: 'Download failed'].
  stream reset.
  bar value: 'Updating Case Mappings'.
+ self parseCaseMappingFrom: stream ].!
- self parseCaseMappingFrom: stream.
- ].!

Item was changed:
  ----- Method: Unicode class>>parseCaseMappingFrom: (in category 'casing') -----
  parseCaseMappingFrom: stream
  "Parse the Unicode casing mappings from the given stream.
  Handle only the simple mappings"
  "
  Unicode initializeCaseMappings.
  "
 
+ | newToCasefold newToUpper newToLower casefoldKeys |
+ newToCasefold := PluggableDictionary integerDictionary.
+ newToUpper := PluggableDictionary integerDictionary.
+ newToLower := PluggableDictionary integerDictionary.
- ToCasefold := IdentityDictionary new: 2048.
- ToUpper := IdentityDictionary new: 2048.
- ToLower := IdentityDictionary new: 2048.
 
+ "Filter the mappings (Simple and Common) to newToCasefold."
+ stream contents linesDo: [ :line |
+ | data fields sourceCode destinationCode |
+ data := line copyUpTo: $#.
+ fields := data findTokens: '; '.
+ (fields size > 2 and: [ #('C' 'S') includes: (fields at: 2) ]) ifTrue:[
+ sourceCode := Integer readFrom: (fields at: 1) base: 16.
+ destinationCode := Integer readFrom: (fields at: 3) base: 16.
+ newToCasefold at: sourceCode put: destinationCode ] ].
- [stream atEnd] whileFalse:[
- | fields line srcCode dstCode |
- line := stream nextLine copyUpTo: $#.
- fields := line withBlanksTrimmed findTokens: $;.
- (fields size > 2 and: [#('C' 'S') includes: (fields at: 2) withBlanksTrimmed]) ifTrue:[
- srcCode := Integer readFrom: (fields at: 1) withBlanksTrimmed base: 16.
- dstCode := Integer readFrom: (fields at: 3) withBlanksTrimmed base: 16.
- ToCasefold at: srcCode put: dstCode.
- ].
- ].
 
+ casefoldKeys := newToCasefold keys.
+ newToCasefold keysAndValuesDo: [ :sourceCode :destinationCode |
+ (self isUppercaseCode: sourceCode) ifTrue: [
+ "In most cases, uppercase letter are folded to lower case"
+ newToUpper at: destinationCode put: sourceCode.
+ newToLower at: sourceCode ifAbsentPut: destinationCode "Don't overwrite existing pairs. To avoid $k asUppercase to return the Kelvin character (8490)." ].
+ (self isLowercaseCode: sourceCode) ifTrue: [
+ "In a few cases, two upper case letters are folded to the same lower case.
+ We must find an upper case letter folded to the same letter"
+ casefoldKeys
+ detect: [ :each |
+ (self isUppercaseCode: each) and: [
+ (newToCasefold at: each) = destinationCode ] ]
+ ifFound: [ :uppercaseCode |
+ newToUpper at: sourceCode put: uppercaseCode ]
+ ifNone: [ ] ] ].
+
+ "Compact the dictionaries."
+ newToCasefold compact.
+ newToUpper compact.
+ newToLower compact.
+ "Save in an atomic operation."
+ ToCasefold := newToCasefold.
+ ToUpper := newToUpper.
+ ToLower := newToLower
+ !
- ToCasefold keysAndValuesDo:
- [:k :v |
- (self isUppercaseCode: k)
- ifTrue:
- ["In most cases, uppercase letter are folded to lower case"
- ToUpper at: v put: k.
- ToLower at: k put: v].
- (self isLowercaseCode: k)
- ifTrue:
- ["In a few cases, two upper case letters are folded to the same lower case.
- We must find an upper case letter folded to the same letter"
- | up |
- up := ToCasefold keys detect: [:e | (self isUppercaseCode: e) and: [(ToCasefold at: e) = v]] ifNone: [nil].
- up ifNotNil: [ToUpper at: k put: up]]].!
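
The line filter used in the new parseCaseMappingFrom: can be tried on individual lines in a workspace. A minimal sketch, assuming the two sample lines below follow the CaseFolding.txt layout (code; status; mapping; # name); the variable names lines and mappings are made up for illustration and a plain Dictionary stands in for the integer dictionary.

```smalltalk
| lines mappings |
"Two sample lines: one Common (C) mapping and one Full (F) mapping."
lines := #(
	'004B; C; 006B; # LATIN CAPITAL LETTER K'
	'00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S' ).
mappings := Dictionary new.
lines do: [ :line |
	| fields |
	"Drop the trailing comment, then split on semicolons and spaces."
	fields := (line copyUpTo: $#) findTokens: '; '.
	"Keep only Common (C) and Simple (S) mappings; the F line is skipped."
	(fields size > 2 and: [ #('C' 'S') includes: (fields at: 2) ]) ifTrue: [
		mappings
			at: (Integer readFrom: (fields at: 1) base: 16)
			put: (Integer readFrom: (fields at: 3) base: 16) ] ].
mappings  "holds the single pair 75 -> 107"
```

Because findTokens: is given both delimiters at once, the fields come back already trimmed, which is why the withBlanksTrimmed calls of the old version are no longer needed.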