The Trunk: Multilingual-nice.249.mcz

commits-2
Nicolas Cellier uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-nice.249.mcz

==================== Summary ====================

Name: Multilingual-nice.249
Author: nice
Time: 9 December 2019, 6:17:46.99458 pm
UUID: 1a02af2f-d014-2a4e-9023-ef0cec93897b
Ancestors: Multilingual-nice.248

1) Nuke code specific to the Windows CE OS. We do not support such a legacy OS in the OpenSmalltalk VM. Multilingual is complex enough without this useless drag.

2) Fix the LanguageEnvironment comment, which was a mixture of ISO 8859-1 (ByteString) and UTF-32 (WideString) re-interpreted as ByteString. Note: that might have happened before the UTF-8 fixes to Monticello (the stream started as a ByteString but continued as a WideString at the first non-byte code; the encoding change only took effect at a buffer boundary, leading to such strange side effects). Fortunately (sic!) most Multilingual classes are un-commented.

3) Nuke the useless copy of encoding-name literals (e.g. 'euc-kr' copy)

4) Comment the possibly incorrect usage of (evtBuf at: 3) in UTF32InputInterpreter. This is the charCode field of the Squeak event, which is no longer MacRoman-encoded on recent Windows VMs. More work is required on the VM side for Unix before we can fully clean up...

5) Classify UTF32RussianInputInterpreter in the same category as its siblings
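
A quick way to see which branches an image takes once this version is loaded is to evaluate the class-side selectors touched by the diff below in a Workspace. This is only a sketch: it uses the selectors that appear in this diff plus Transcript, and what gets printed depends on the platform, the VM and (on unix) the X11 encoding.

```smalltalk
"Workspace sketch: print what each touched CJK environment resolves to on this image.
 All selectors are the class-side ones shown in the diff; results vary per platform."
Transcript
	cr; show: 'platform: '; show: Smalltalk platformName;
	show: ' osVersion: '; show: Smalltalk osVersion.

{JapaneseEnvironment. KoreanEnvironment. SimplifiedChineseEnvironment}
	do: [:env |
		Transcript
			cr; show: env name;
			show: ' encoding: '; show: env defaultEncodingName;
			show: ' input: '; show: env inputInterpreterClass name;
			show: ' clipboard: '; show: env clipboardInterpreterClass name].
```

With the Windows CE branches gone, a 'Win32' image should now always fall through to the regular 'Win32' tests, whatever Smalltalk osVersion answers.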

=============== Diff against Multilingual-nice.248 ===============

Item was changed:
  ----- Method: JapaneseEnvironment class>>clipboardInterpreterClass (in category 'subclass responsibilities') -----
  clipboardInterpreterClass
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE'])
- ifTrue: [^NoConversionClipboardInterpreter].
  platformName = 'Win32' ifTrue: [^UTF8ClipboardInterpreter].
  platformName = 'Mac OS' ifTrue: [^MacShiftJISClipboardInterpreter].
  ^platformName = 'unix'
  ifTrue:
  [(ShiftJISTextConverter encodingNames includes: X11Encoding getEncoding)
  ifTrue: [MacShiftJISClipboardInterpreter]
  ifFalse: [UnixJPClipboardInterpreter]]
  ifFalse: [ NoConversionClipboardInterpreter ]!

Item was changed:
  ----- Method: JapaneseEnvironment class>>defaultEncodingName (in category 'public query') -----
  defaultEncodingName
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE']) ifTrue: [^'utf-8'].
  (#('Win32' 'ZaurusOS') includes: platformName) ifTrue: [^'shift-jis'].
  platformName = 'Mac OS'
  ifTrue:
  [^('10*' match: osVersion)
  ifTrue: ['utf-8']
  ifFalse: ['shift-jis']].
  ^'unix' = platformName ifTrue: ['euc-jp'] ifFalse: ['mac-roman']!

Item was changed:
  ----- Method: JapaneseEnvironment class>>inputInterpreterClass (in category 'subclass responsibilities') -----
  inputInterpreterClass
  | platformName osVersion encoding |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32'
- and: [osVersion = 'CE'])
- ifTrue: [^ MacRomanInputInterpreter].
  platformName = 'Win32'
  ifTrue: [^ (self win32VMUsesUnicode) ifTrue: [UTF32JPInputInterpreter] ifFalse: [WinShiftJISInputInterpreter]].
  platformName = 'Mac OS'
  ifTrue: [^ (('10*' match: osVersion)
  and: [(Smalltalk getSystemAttribute: 3) isNil])
  ifTrue: [MacUnicodeInputInterpreter]
  ifFalse: [MacShiftJISInputInterpreter]].
  platformName = 'unix'
  ifTrue: [encoding := X11Encoding encoding.
  (EUCJPTextConverter encodingNames includes: encoding)
  ifTrue: [^ UnixEUCJPInputInterpreter].
  (UTF8TextConverter encodingNames includes: encoding)
  ifTrue: [^ UnixUTF8JPInputInterpreter].
  (ShiftJISTextConverter encodingNames includes: encoding)
  ifTrue: [^ MacShiftJISInputInterpreter]].
  ^ MacRomanInputInterpreter!

Item was changed:
  ----- Method: KoreanEnvironment class>>clipboardInterpreterClass (in category 'subclass responsibilities') -----
  clipboardInterpreterClass
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE'])
- ifTrue: [^NoConversionClipboardInterpreter].
  platformName = 'Win32' ifTrue: [^WinKSX1001ClipboardInterpreter].
  platformName = 'Mac OS'
  ifTrue:
  [('10*' match: osVersion)
  ifTrue: [^NoConversionClipboardInterpreter]
  ifFalse: [^WinKSX1001ClipboardInterpreter]].
  platformName = 'unix'
  ifTrue:
  [(ShiftJISTextConverter encodingNames includes: X11Encoding getEncoding)
  ifTrue: [^WinKSX1001ClipboardInterpreter]
  ifFalse: [^NoConversionClipboardInterpreter]].
  ^NoConversionClipboardInterpreter!

Item was changed:
  ----- Method: KoreanEnvironment class>>defaultEncodingName (in category 'public query') -----
  defaultEncodingName
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE']) ifTrue: [^'utf-8' copy].
  (#('Win32' 'Mac OS' 'ZaurusOS') includes: platformName)
+ ifTrue: [^'euc-kr'].
+ (#('unix') includes: platformName) ifTrue: [^'euc-kr'].
- ifTrue: [^'euc-kr' copy].
- (#('unix') includes: platformName) ifTrue: [^'euc-kr' copy].
  ^'mac-roman'!

Item was changed:
  ----- Method: KoreanEnvironment class>>inputInterpreterClass (in category 'subclass responsibilities') -----
  inputInterpreterClass
  | platformName osVersion encoding |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE'])
- ifTrue: [^MacRomanInputInterpreter].
  platformName = 'Win32' ifTrue: [^WinKSX1001InputInterpreter].
  platformName = 'Mac OS'
  ifTrue:
  [('10*' match: osVersion)
  ifTrue: [^MacUnicodeInputInterpreter]
  ifFalse: [^WinKSX1001InputInterpreter]].
  platformName = 'unix'
  ifTrue:
  [encoding := X11Encoding encoding.
  (EUCJPTextConverter encodingNames includes: encoding)
  ifTrue: [^MacRomanInputInterpreter].
  (UTF8TextConverter encodingNames includes: encoding)
  ifTrue: [^MacRomanInputInterpreter].
  (ShiftJISTextConverter encodingNames includes: encoding)
  ifTrue: [^MacRomanInputInterpreter]].
  ^MacRomanInputInterpreter!

Item was changed:
  Object subclass: #LanguageEnvironment
  instanceVariableNames: 'id'
  classVariableNames: 'ClipboardInterpreterClass FileNameConverterClass InputInterpreterClass KnownEnvironments SystemConverterClass'
  poolDictionaries: ''
  category: 'Multilingual-Languages'!
 
+ !LanguageEnvironment commentStamp: 'nice 12/9/2019 15:34' prior: 0!
- !LanguageEnvironment commentStamp: 'bf 8/16/2009 16:52' prior: 0!
  The name multilingualized Squeak suggests that you can use multiple language at one time.  This is true, of course, but the system still how to manage the primary language; that provides the interpretation of data going out or coming in from outside world. It also provides how to render strings, as there rendering rule could be different in one language to another, even if the code points in a string is the same.
 
    Originally, LanguageEnvironment and its subclasses only has class side methods.  After merged with Diego's Babel work, it now has instance side methods.  Since this historical reason, the class side and instance side are not related well.
 
+   When we talk about the interface with the outside of the Squeak world, there are three different "channels"; the keyboard input, clipboard output and input, and filename.  On a not-to-uncommon system such as a Unix system localized to Japan, all of these three can have (and does have) different encodings.  So we need to manage them separately.  Note that the encoding in a file can be anything.  While it is nice to provide a suggested guess for this 'default system file content encoding', it is not critical.
+
+   Rendering support is limited basic L-to-R rendering so far.  But you can provide different line-wrap rule, at least.
+ !
-   When we talk about the interface with the outside of the Squeak world, there are three different "channels"; the keyboard input, clipboard output and input, and filename. On a not-to-uncommon system such as a Unix system localized to Japan, all of these three can have (and does have) different encodings. So we need to manage them separately. Note that the encoding in a file can be anything. While it is nice to provide a suggested guess for this 'default system file content encoding', it is not critical.
-
- Rendering support is limited basic L-to-R rendering so far. But you can provide different line-wrap rule, at least.
- !

Item was changed:
  ----- Method: Latin1Environment class>>defaultEncodingName (in category 'subclass responsibilities') -----
  defaultEncodingName
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE']) ifTrue: [^'utf-8' copy].
  (#('Win32' 'Mac OS' 'ZaurusOS') includes: platformName)
+ ifTrue: [^'iso8859-1'].
+ (#('unix') includes: platformName) ifTrue: [^'iso8859-1'].
- ifTrue: [^'iso8859-1' copy].
- (#('unix') includes: platformName) ifTrue: [^'iso8859-1' copy].
  ^'mac-roman'!

Item was changed:
  ----- Method: Latin1Environment class>>inputInterpreterClass (in category 'subclass responsibilities') -----
  inputInterpreterClass
+ | platformName |
- | platformName osVersion |
  platformName := Smalltalk platformName.
+ (platformName = 'Win32')
- osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion ~= 'CE'])
  ifTrue: [^ (self win32VMUsesUnicode) ifTrue: [UTF32InputInterpreter] ifFalse: [MacRomanInputInterpreter]].
  platformName = 'Mac OS'
  ifTrue: [^ MacUnicodeInputInterpreter].
  platformName = 'unix'
  ifTrue: [^ UTF32InputInterpreter].
  ^ MacUnicodeInputInterpreter!

Item was changed:
  ----- Method: Latin1Environment class>>systemConverterClass (in category 'subclass responsibilities') -----
  systemConverterClass
 
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk getSystemAttribute: 1002.
- (platformName = 'Win32'
- and: [osVersion = 'CE'])
- ifTrue: [^ MacRomanTextConverter].
  platformName = 'Win32'
  ifTrue: [^ (self win32VMUsesUnicode) ifTrue: [UTF8TextConverter] ifFalse: [ISO88591TextConverter]].
  platformName = 'Mac OS'
  ifTrue: [^ ('10*' match: Smalltalk osVersion)
  ifTrue: [UTF8TextConverter]
  ifFalse: [MacRomanTextConverter]].
  platformName = 'unix'
  ifTrue: [^ UTF8TextConverter].
  ^ MacRomanTextConverter!

Item was changed:
  ----- Method: Latin2Environment class>>defaultEncodingName (in category 'subclass responsibilities') -----
  defaultEncodingName
+ | platformName |
- | platformName osVersion |
  platformName := Smalltalk platformName.
- osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE']) ifTrue: [^'utf-8' copy].
  (#('Win32') includes: platformName)
+ ifTrue: [^'cp-1250'].
+ (#('unix') includes: platformName) ifTrue: [^'iso8859-2'].
- ifTrue: [^'cp-1250' copy].
- (#('unix') includes: platformName) ifTrue: [^'iso8859-2' copy].
  ^'mac-roman'!

Item was changed:
  ----- Method: SimplifiedChineseEnvironment class>>clipboardInterpreterClass (in category 'subclass responsibilities') -----
  clipboardInterpreterClass
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE'])
- ifTrue: [^NoConversionClipboardInterpreter].
  platformName = 'Win32' ifTrue: [^WinGB2312ClipboardInterpreter].
  platformName = 'Mac OS'
  ifTrue:
  [('10*' match: osVersion)
  ifTrue: [^NoConversionClipboardInterpreter]
  ifFalse: [^WinGB2312ClipboardInterpreter]].
  platformName = 'unix'
  ifTrue:
  [(ShiftJISTextConverter encodingNames includes: X11Encoding getEncoding)
  ifTrue: [^MacShiftJISClipboardInterpreter]
  ifFalse: [^NoConversionClipboardInterpreter]].
  ^NoConversionClipboardInterpreter!

Item was changed:
  ----- Method: SimplifiedChineseEnvironment class>>defaultEncodingName (in category 'public query') -----
  defaultEncodingName
  | platformName osVersion |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE']) ifTrue: [^'utf-8' copy].
  (#('Win32' 'Mac OS' 'ZaurusOS') includes: platformName)
+ ifTrue: [^'gb2312'].
+ (#('unix') includes: platformName) ifTrue: [^'euc-cn'].
- ifTrue: [^'gb2312' copy].
- (#('unix') includes: platformName) ifTrue: [^'euc-cn' copy].
  ^'mac-roman'!

Item was changed:
  ----- Method: SimplifiedChineseEnvironment class>>inputInterpreterClass (in category 'subclass responsibilities') -----
  inputInterpreterClass
  | platformName osVersion encoding |
  platformName := Smalltalk platformName.
  osVersion := Smalltalk osVersion.
- (platformName = 'Win32' and: [osVersion = 'CE'])
- ifTrue: [^MacRomanInputInterpreter].
  platformName = 'Win32' ifTrue: [^WinGB2312InputInterpreter].
  platformName = 'Mac OS'
  ifTrue:
  [('10*' match: osVersion)
  ifTrue: [^MacUnicodeInputInterpreter]
  ifFalse: [^WinGB2312InputInterpreter]].
  platformName = 'unix'
  ifTrue:
  [encoding := X11Encoding encoding.
  (EUCJPTextConverter encodingNames includes: encoding)
  ifTrue: [^MacRomanInputInterpreter].
  (UTF8TextConverter encodingNames includes: encoding)
  ifTrue: [^MacRomanInputInterpreter].
  (ShiftJISTextConverter encodingNames includes: encoding)
  ifTrue: [^MacRomanInputInterpreter]].
  ^MacRomanInputInterpreter!

Item was changed:
  ----- Method: UTF32InputInterpreter>>nextCharFrom:firstEvt: (in category 'keyboard') -----
  nextCharFrom: sensor firstEvt: evtBuf
+ "Fall back to internal char-code if char is 0"
- "Fall back on MacRoman if char is 0"
  ^(evtBuf at: 6) > 0
  ifTrue: [(evtBuf at: 6) asCharacter]
+ ifFalse:
+ [#fixme.
+ "The windows VM does not require macToSqueak for the fallback char-code
+ since https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/403
+ But unix VM still uses sqTextEncoding which still defaults to MacRoman...
+ We should fix the Unix VM too, or create two different KeyboardInterpreter.
+ Hardcoding macToSqueak is a bad idea anyway, because Unix behavior is a
+ parameter that can be changed thru either --encoding VM option
+ or SQUEAK_ENCODING environment variable"
+ (evtBuf at: 3) asCharacter macToSqueak].
- ifFalse: [(evtBuf at: 3) asCharacter macToSqueak].
  !

Item was changed:
  UTF32InputInterpreter subclass: #UTF32RussianInputInterpreter
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''
+ category: 'Multilingual-TextConversion'!
- category: 'Multilingual-Languages'!
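
The fallback in UTF32InputInterpreter>>nextCharFrom:firstEvt: above can be tried in isolation with a hand-built event buffer. A minimal sketch, assuming only what the diff shows (field 6 holds the UTF-32 code, field 3 the charCode); the eight-slot array and the sample values are made up for illustration and do not come from a real VM event.

```smalltalk
| decode evtBuf |
"Same decision as the method in the diff, wrapped in a block."
decode := [:buf |
	(buf at: 6) > 0
		ifTrue: [(buf at: 6) asCharacter]
		ifFalse: [(buf at: 3) asCharacter macToSqueak]].

"Hypothetical buffer: UTF-32 field is set, so field 3 is ignored."
evtBuf := Array new: 8 withAll: 0.
evtBuf at: 3 put: 16r8E; at: 6 put: 16rE9.
decode value: evtBuf.

"Hypothetical buffer: UTF-32 field is 0, so field 3 goes through macToSqueak."
evtBuf at: 6 put: 0.
decode value: evtBuf.
```

Evaluating each decode value: line with print-it shows which path was taken. Whether macToSqueak is the right conversion for field 3 depends on the VM and its encoding settings, which is exactly what the #fixme comment flags.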