The Trunk: Multilingual-pre.228.mcz

commits-2
Patrick Rein uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-pre.228.mcz

==================== Summary ====================

Name: Multilingual-pre.228
Author: pre
Time: 8 June 2017, 9:40:08.341697 am
UUID: cb64d235-8b3f-a140-b34f-4695b78dd94e
Ancestors: Multilingual-pre.227

Adds a UTF-32 TextConverter. Updates the class comments of several TextConverter subclasses. Adds 'utf-16be' and 'utf-16le' to the UTF-16 encoding names.

=============== Diff against Multilingual-pre.227 ===============

Item was changed:
  ISO8859TextConverter subclass: #ISO88592TextConverter
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Multilingual-TextConversion'!
+
+ !ISO88592TextConverter commentStamp: '<historical>' prior: 0!
+ Text converter for ISO 8859-2.  An international encoding used in Eastern Europe.!

Item was changed:
  ISO8859TextConverter subclass: #ISO88597TextConverter
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Multilingual-TextConversion'!
+
+ !ISO88597TextConverter commentStamp: '<historical>' prior: 0!
+ Text converter for ISO 8859-7.  An international encoding used for Greek.!

Item was changed:
  ISO88591TextConverter subclass: #Latin1TextConverter
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Multilingual-TextConversion'!
+
+ !Latin1TextConverter commentStamp: '<historical>' prior: 0!
+ Text converter for ISO 8859-1.  An international encoding used in Western Europe.!

Item was changed:
  ISO885915TextConverter subclass: #Latin9TextConverter
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Multilingual-TextConversion'!
+
+ !Latin9TextConverter commentStamp: 'pre 4/21/2017 11:40' prior: 0!
+ Text converter for ISO 8859-15.  An international encoding also used in Western Europe.!

Item was changed:
  ----- Method: UTF16TextConverter class>>encodingNames (in category 'utilities') -----
  encodingNames
 
+ ^ #('utf-16' 'utf16' 'utf-16-le' 'utf-16-be' 'utf-16be' 'utf-16le') copy.
- ^ #('utf-16' 'utf16' 'utf-16-le' 'utf-16-be') copy.
  !

Item was added:
+ TextConverter subclass: #UTF32TextConverter
+ instanceVariableNames: 'useLittleEndian useByteOrderMark byteOrderMarkDone'
+ classVariableNames: ''
+ poolDictionaries: ''
+ category: 'Multilingual-TextConversion'!
+
+ !UTF32TextConverter commentStamp: 'pre 6/7/2017 17:55' prior: 0!
+ Text converter for UTF-32.  It supports the endianness and byte order mark.!

Item was added:
+ ----- Method: UTF32TextConverter class>>encodingNames (in category 'utilities') -----
+ encodingNames
+
+ ^ #( 'utf32' 'utf32be' 'utf32le' 'utf-32' 'utf-32be' 'utf-32le' 'ucs4' 'ucs4be' 'ucs4le') copy
+ !

Item was added:
+ ----- Method: UTF32TextConverter class>>initializeLatin1MapAndEncodings (in category 'utilities') -----
+ initializeLatin1MapAndEncodings
+ "Initialize the latin1Map and latin1Encodings.
+ These variables ensure that conversions from latin1 ByteString is reasonably fast"
+
+ latin1Map := (ByteArray new: 256) atAllPut: 1.
+ latin1Encodings := (0 to: 255) collect: [:i | (ByteArray newFrom: {0 . 0 . 0 . i}) asString]!
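
initializeLatin1MapAndEncodings precomputes the big-endian four-byte encoding of each of the 256 Latin-1 code points. This lets nextPut:toStream: emit those characters with a single table lookup. The following minimal Python sketch shows the same table. It is not Squeak code, and the names latin1_encodings and encode_latin1_fast are hypothetical.

```python
# Illustrative Python sketch (not Squeak code): precompute the 4-byte
# big-endian UTF-32 encoding of every Latin-1 code point, mirroring
# (ByteArray newFrom: {0 . 0 . 0 . i}) in initializeLatin1MapAndEncodings.
latin1_encodings = [bytes([0, 0, 0, i]) for i in range(256)]


def encode_latin1_fast(code: int) -> bytes:
    """Look up the precomputed encoding instead of splitting the value into bytes."""
    return latin1_encodings[code]


encode_latin1_fast(0x41)  # returns b'\x00\x00\x00A'
```

For code points below 256, each table entry equals the direct big-endian conversion of the value, so the table only trades memory for speed.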

Item was added:
+ ----- Method: UTF32TextConverter>>initialize (in category 'initialize-release') -----
+ initialize
+
+ super initialize.
+ useLittleEndian := useByteOrderMark := byteOrderMarkDone := false!

Item was added:
+ ----- Method: UTF32TextConverter>>next16BitValue:toStream: (in category 'private') -----
+ next16BitValue: value toStream: aStream
+
+ | v1 v2 |
+ v1 := (value bitShift: -8) bitAnd: 16rFF.
+ v2 := value bitAnd: 16rFF.
+ useLittleEndian
+ ifTrue: [
+ aStream
+ basicNextPut: (Character value: v2);
+ basicNextPut: (Character value: v1) ]
+ ifFalse: [
+ aStream
+ basicNextPut: (Character value: v1);
+ basicNextPut: (Character value: v2) ].
+ !
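
This helper writes a single 16-bit value in the selected byte order, which is the unit size UTF-16 uses. None of the other methods added in this diff call it, because UTF-32 units are written with next32BitValue:toStream:. For reference, here is a Python sketch of the same byte split. The name split_16bit is hypothetical.

```python
def split_16bit(value: int, little_endian: bool) -> bytes:
    """Python analogue of next16BitValue:toStream: (illustrative only)."""
    v1 = (value >> 8) & 0xFF  # high byte
    v2 = value & 0xFF         # low byte
    # Little-endian puts the low byte first; big-endian puts the high byte first.
    return bytes([v2, v1]) if little_endian else bytes([v1, v2])


split_16bit(0xFEFF, little_endian=False)  # returns b'\xfe\xff'
split_16bit(0xFEFF, little_endian=True)   # returns b'\xff\xfe'
```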

Item was added:
+ ----- Method: UTF32TextConverter>>next32BitValue:toStream: (in category 'private') -----
+ next32BitValue: value toStream: aStream
+
+ | v1 v2 v3 v4 |
+ v1 := (value bitShift: -24) bitAnd: 16rFF.
+ v2 := (value bitShift: -16) bitAnd: 16rFF.
+ v3 := (value bitShift: -8) bitAnd: 16rFF.
+ v4 := (value bitShift: 0) bitAnd: 16rFF.
+ useLittleEndian
+ ifTrue: [
+ aStream
+ basicNextPut: (Character value: v4);
+ basicNextPut: (Character value: v3);
+ basicNextPut: (Character value: v2);
+ basicNextPut: (Character value: v1) ]
+ ifFalse: [
+ aStream
+ basicNextPut: (Character value: v1);
+ basicNextPut: (Character value: v2);
+ basicNextPut: (Character value: v3);
+ basicNextPut: (Character value: v4) ].
+ !
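
next32BitValue:toStream: splits a code point into four bytes, v1 (most significant) to v4 (least significant), and writes them in either order depending on useLittleEndian. Here is a minimal Python sketch of that split. The name split_32bit is hypothetical, and this is an illustration rather than the Squeak implementation.

```python
def split_32bit(value: int, little_endian: bool) -> bytes:
    """Python analogue of next32BitValue:toStream: (illustrative only)."""
    # v1..v4 from most to least significant byte, as in the Smalltalk method
    v1, v2, v3, v4 = ((value >> shift) & 0xFF for shift in (24, 16, 8, 0))
    return bytes([v4, v3, v2, v1]) if little_endian else bytes([v1, v2, v3, v4])


split_32bit(0xFEFF, little_endian=False)  # returns b'\x00\x00\xfe\xff' (big-endian BOM)
split_32bit(0xFEFF, little_endian=True)   # returns b'\xff\xfe\x00\x00' (little-endian BOM)
```

The result matches Python's built-in int.to_bytes(4, "big") or int.to_bytes(4, "little"), which is a quick way to check the byte order.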

Item was added:
+ ----- Method: UTF32TextConverter>>nextFromStream: (in category 'conversion') -----
+ nextFromStream: aStream
+
+ | character1 character2 readBOM charValue character3 character4 |
+ aStream isBinary ifTrue: [ ^aStream basicNext ].
+ character1 := aStream basicNext ifNil: [ ^nil ].
+ character2 := aStream basicNext ifNil: [ ^nil ].
+ character3 := aStream basicNext ifNil: [ ^nil ].
+ character4 := aStream basicNext ifNil: [ ^nil ].
+
+ readBOM := false.
+ (character1 asciiValue = 16rFF and: [character2 asciiValue = 16rFE]) ifTrue: [
+ self
+ useByteOrderMark: true;
+ useLittleEndian: true.
+ readBOM := true ].
+
+ ((character1 asciiValue = 0 and: [character2 asciiValue = 0])
+ and: [character3 asciiValue = 16rFE and: [character4 asciiValue = 16rFF]]) ifTrue: [
+ self
+ useByteOrderMark: true;
+ useLittleEndian: false.
+ readBOM := true ].
+
+ readBOM ifTrue: [
+ "Re-initialize character variables if they contain BOM"
+ character1 := aStream basicNext ifNil: [ ^nil ].
+ character2 := aStream basicNext ifNil: [ ^nil ].
+ character3 := aStream basicNext ifNil: [ ^nil ].
+ character4 := aStream basicNext ifNil: [ ^nil ]. ].
+
+ useLittleEndian
+ ifTrue: [ charValue := (character4 charCode bitShift: 24) + (character3 charCode bitShift: 16) + (character2 charCode bitShift: 8) + character1 charCode ]
+ ifFalse: [ charValue := (character1 charCode bitShift: 24) + (character2 charCode bitShift: 16) + (character3 charCode bitShift: 8) + character4 charCode ].
+
+ ^ Unicode value: charValue!
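
nextFromStream: reads four bytes at a time, switches byte order when it sees a byte order mark, and then assembles the code point. The sketch below shows one way to decode a whole UTF-32 byte string in Python. It is an assumption-labeled illustration, not the Squeak method, and decode_utf32 is a hypothetical name. It differs from the Smalltalk code in two places. First, it checks all four bytes of the little-endian BOM (FF FE 00 00), whereas the method above tests only the first two. Second, it looks for a BOM only at the start of the input, whereas the method above performs the check on every call.

```python
import io


def decode_utf32(data: bytes, little_endian: bool = False) -> str:
    """Decode UTF-32 bytes, honoring a leading byte order mark (illustrative sketch)."""
    stream = io.BytesIO(data)
    first = stream.read(4)
    if first == b"\xff\xfe\x00\x00":    # little-endian BOM
        little_endian = True
    elif first == b"\x00\x00\xfe\xff":  # big-endian BOM
        little_endian = False
    else:
        stream.seek(0)  # no BOM: rewind and keep the caller's byte order
    order = "little" if little_endian else "big"
    chars = []
    while True:
        unit = stream.read(4)
        if len(unit) < 4:  # end of input; an incomplete unit is dropped
            break
        chars.append(chr(int.from_bytes(unit, order)))
    return "".join(chars)


decode_utf32(b"\xff\xfe\x00\x00A\x00\x00\x00")  # returns 'A'
```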

Item was added:
+ ----- Method: UTF32TextConverter>>nextPut:toStream: (in category 'conversion') -----
+ nextPut: aCharacter toStream: aStream
+
+ | charCode |
+ aStream isBinary ifTrue: [ ^aCharacter storeBinaryOn: aStream ].
+ (useByteOrderMark and: [ byteOrderMarkDone not ]) ifTrue: [
+ self next32BitValue: 16r0000FEFF toStream: aStream.
+ byteOrderMarkDone := true ].
+ (charCode := aCharacter charCode) < 256
+ ifTrue: [
+ (latin1Encodings at: charCode + 1)
+ ifNil: [ self next32BitValue: charCode toStream: aStream ]
+ ifNotNil: [ :encodedString | aStream basicNextPutAll: encodedString ] ]
+ ifFalse: [
+ self next32BitValue: charCode toStream: aStream ].
+ ^aCharacter!
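
nextPut:toStream: writes the byte order mark at most once, guarded by byteOrderMarkDone. It then takes the precomputed latin1Encodings path for code points below 256 and splits larger code points into four bytes. Here is a hypothetical Python analogue of that flow. The class Utf32Writer and its method put are illustrative names, not part of Squeak.

```python
class Utf32Writer:
    """Hypothetical Python analogue of UTF32TextConverter>>nextPut:toStream:."""

    def __init__(self, little_endian: bool = False, use_bom: bool = False):
        self.order = "little" if little_endian else "big"
        self.use_bom = use_bom
        self.bom_done = False
        self.out = bytearray()
        # Precomputed encodings for code points 0-255 (the latin1Encodings fast path)
        self.latin1 = [i.to_bytes(4, self.order) for i in range(256)]

    def put(self, ch: str) -> str:
        if self.use_bom and not self.bom_done:
            self.out += (0xFEFF).to_bytes(4, self.order)  # BOM, written only once
            self.bom_done = True
        code = ord(ch)
        if code < 256:
            self.out += self.latin1[code]
        else:
            self.out += code.to_bytes(4, self.order)
        return ch


writer = Utf32Writer(use_bom=True)
for ch in "A\u20ac":
    writer.put(ch)
bytes(writer.out)  # same bytes as "\ufeffA\u20ac".encode("utf-32-be")
```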

Item was added:
+ ----- Method: UTF32TextConverter>>swapLatin1EncodingByteOrder (in category 'private') -----
+ swapLatin1EncodingByteOrder
+ latin1Encodings := latin1Encodings collect: [:each |
+ each ifNotNil: [each reverse]]!

Item was added:
+ ----- Method: UTF32TextConverter>>useByteOrderMark (in category 'accessing') -----
+ useByteOrderMark
+
+ ^useByteOrderMark
+ !

Item was added:
+ ----- Method: UTF32TextConverter>>useByteOrderMark: (in category 'accessing') -----
+ useByteOrderMark: aBoolean
+
+ useByteOrderMark := aBoolean.
+ !

Item was added:
+ ----- Method: UTF32TextConverter>>useLittleEndian (in category 'accessing') -----
+ useLittleEndian
+
+ ^useLittleEndian
+ !

Item was added:
+ ----- Method: UTF32TextConverter>>useLittleEndian: (in category 'accessing') -----
+ useLittleEndian: aBoolean
+
+ aBoolean = useLittleEndian ifFalse: [ self swapLatin1EncodingByteOrder ].
+ useLittleEndian := aBoolean.
+ !
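
The guard in useLittleEndian: matters. swapLatin1EncodingByteOrder reverses every precomputed entry, so setting the same endianness twice must not swap twice. The sketch below is a hypothetical Python illustration of that interplay. Latin1Table and set_little_endian are made-up names.

```python
class Latin1Table:
    """Hypothetical sketch of useLittleEndian: plus swapLatin1EncodingByteOrder."""

    def __init__(self):
        self.little_endian = False
        # Big-endian by default, like initializeLatin1MapAndEncodings
        self.encodings = [bytes([0, 0, 0, i]) for i in range(256)]

    def set_little_endian(self, flag: bool) -> None:
        # Swap only when the setting actually changes; reversing each
        # 4-byte entry converts between big- and little-endian.
        if flag != self.little_endian:
            self.encodings = [e[::-1] for e in self.encodings]
        self.little_endian = flag


table = Latin1Table()
table.set_little_endian(True)
table.set_little_endian(True)  # no second swap
table.encodings[0x41]  # returns b'A\x00\x00\x00'
```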