Hi again,
Does anyone know the rationale behind this?

declareEncodedCharSet: anEncodedCharSetOrLanguageEnvironmentClass atIndex: aNumber

    EncodedCharSets at: aNumber put: anEncodedCharSetOrLanguageEnvironmentClass

    "this method is used to modularize the old initialize method:
    EncodedCharSets at: 0+1 put: Unicode.
    EncodedCharSets at: 1+1 put: JISX0208.
    EncodedCharSets at: 2+1 put: GB2312.
    EncodedCharSets at: 3+1 put: KSX1001.
    EncodedCharSets at: 4+1 put: JISX0208.
    EncodedCharSets at: 5+1 put: JapaneseEnvironment.
    EncodedCharSets at: 6+1 put: SimplifiedChineseEnvironment.
    EncodedCharSets at: 7+1 put: KoreanEnvironment.
    EncodedCharSets at: 8+1 put: GB2312.
    EncodedCharSets at: 12+1 put: KSX1001.
    EncodedCharSets at: 13+1 put: GreekEnvironment.
    EncodedCharSets at: 14+1 put: Latin2Environment.
    EncodedCharSets at: 15+1 put: RussianEnvironment.
    EncodedCharSets at: 17+1 put: Latin9Environment.
    EncodedCharSets at: 256 put: Unicode.
    "
> On 17 Sep 2015, at 4:09, Christophe Demarey <[hidden email]> wrote:
>
> Does anyone know the rationale behind this?
>
> declareEncodedCharSet: anEncodedCharSetOrLanguageEnvironmentClass atIndex: aNumber
> [...]

That made little sense, since each WideChar takes 32 bits anyway (other than a possibly simpler codePoint -> glyph index translation, described below), so most or all of the places where this happened have been removed, and we now assume WideString code points are equal to Unicode code points.

The goal of the old code was to supply parameters to the old StrikeFont string display primitive, which is limited to a table of 256 glyphs for any string it renders. The Scanner's job was to introduce a stop whenever it encountered a change that made the string it was scanning impossible to display with a single call to the primitive. The presence of a leadingChar induced (or induces) such a stop in the Scanner, which explains the mix of encodings and languages (whose code points are in Unicode) in the EncodedCharSets table. This stop let a properly constructed StrikeFontSet swap in a glyph table suitable for displaying the leading char, using a custom codePoint -> glyph index conversion.

TL;DR: It's a relic of a more complex past, used in a mechanism that let StrikeFonts display characters other than MacRoman.

Cheers,
Henry

P.S. Funnily enough, the mechanism is somewhat similar to what would be needed to efficiently swap in fallback fonts for code points not covered by the default font (although the stop would be on a missing glyph rather than a leading char), instead of the current, doomed-to-fail approach of using a FallbackFont in each font.
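To make the "stop" idea concrete, here is a rough playground sketch of how a sequence of raw character values could be split into runs that share one leading char, so that each run could be handed to a 256-glyph primitive with a single glyph table. This is not the actual Scanner code: the variable names, the sample values, and the 16r3FFFFF code point mask are all assumptions made for illustration.

```smalltalk
"Hypothetical sketch, not the real Scanner: split raw character values
 into runs that share the same leading char (the bits above bit 22)."
| values runs currentRun currentLeading |
"Two plain values, two values tagged with leading char 5, one plain value."
values := { 16r41. 16r42. (5 bitShift: 22) + 16r3042. (5 bitShift: 22) + 16r3044. 16r43 }.
runs := OrderedCollection new.
currentRun := nil.
currentLeading := nil.
values do: [ :each |
    | leading |
    leading := each bitShift: -22.
    leading = currentLeading ifFalse: [
        "A change of leading char is a stop: start a new run."
        currentRun := OrderedCollection new.
        runs add: leading -> currentRun.
        currentLeading := leading ].
    "Assumed layout: the low 22 bits carry the code point."
    currentRun add: (each bitAnd: 16r3FFFFF) ].
runs
```

Each element of runs is an association from a leading char to the code points of that run; a font set in the style described above would pick a glyph table per association key.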
Hi Christophe,
On Thu, Sep 17, 2015 at 7:09 AM, Christophe Demarey <[hidden email]> wrote:
> Hi again,

what Henrik says is correct. Here's the relevant definition in Character:

Character>>leadingChar
    "Answer the value of the 8 highest bits which is used to identify the language.
    This is mostly used for east asian languages CJKV as a workaround against unicode han-unification."
    ^ self asInteger bitShift: -22

i.e. the top 8 bits of the leading character in a string are (were?) used to index EncodedCharSets to determine what language the string is in.

_,,,^..^,,,_
best, Eliot
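For anyone who wants to try the arithmetic in a playground, here is a minimal sketch. It assumes a 30-bit character value whose low 22 bits hold the code point; the 16r3FFFFF mask, the sample value, and the variable names are assumptions for illustration, not code quoted from the image.

```smalltalk
"Minimal sketch: build a value tagged with leading char 5 and take it apart again."
| value leadingChar codePoint |
"Leading char 5 placed above a 22-bit code point (assumed layout)."
value := (5 bitShift: 22) + 16r3042.
"Same expression as Character>>leadingChar above."
leadingChar := value bitShift: -22.
"Assumed mask for the remaining low 22 bits."
codePoint := value bitAnd: 16r3FFFFF.
{ leadingChar. codePoint }
```

With the table from the original question, a leading char of 5 would select the entry stored at index 5+1, which is JapaneseEnvironment.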
> On 17 Sep 2015, at 19:26, Eliot Miranda <[hidden email]> wrote:
>
> [...]
>
> i.e. the top 8 bits of the leading character in a string are (were?) used to index EncodedCharSets to determine what language the string is in.

Past tense indeed, until someone can explain why we would need this when it cannot be found anywhere else.
Thanks all for your explanations.
It looks like there are still a lot of things to clean up.

Christophe

On 17 Sep 2015, at 19:26, Eliot Miranda wrote:
Yes, but the first step is to read http://www.ipa.go.jp/files/000005751.pdf to understand exactly which feature you're going to lose or, better, how you're going to support it differently.

Nicolas

2015-09-18 10:10 GMT+02:00 Christophe Demarey <[hidden email]>: