tim Rowledge uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-tpr.185.mcz

==================== Summary ====================

Name: Multilingual-tpr.185
Author: tpr
Time: 8 October 2013, 2:50:18.117 pm
UUID: 4417f293-d927-4f27-a55e-140178ab2eee
Ancestors: Multilingual-nice.184

Make the character encoders and language environments understand how to delegate the next step of character scanning

=============== Diff against Multilingual-nice.184 ===============

Item was changed:
  ----- Method: EncodedCharSet class>>charsetAt: (in category 'class methods') -----
  charsetAt: encoding
+     "Find the char set encoding that matches 'encoding'; return a decent default rather than nil"
+     ^ (EncodedCharSets at: encoding + 1) ifNil: [EncodedCharSets at: 1].
-
-     ^ EncodedCharSets at: encoding + 1 ifAbsent: [EncodedCharSets at: 1].
  !

Item was added:
+ ----- Method: EncodedCharSet class>>scanMultibyteCharactersFrom:to:in:with:rightX:font: (in category 'accessing - displaying') -----
+ scanMultibyteCharactersFrom: startIndex to: stopIndex in: aWideString with: aCharacterScanner rightX: rightX font: aFont
+     "the default for scanning multibyte characters- other more specific encodings may do something else"
+     ^aFont scanMultibyteCharactersFrom: startIndex to: stopIndex in: aWideString with: aCharacterScanner rightX: rightX!

Item was added:
+ ----- Method: JapaneseEnvironment class>>scanMultibyteCharactersFrom:to:in:with:rightX:font: (in category 'language methods') -----
+ scanMultibyteCharactersFrom: startIndex to: stopIndex in: aWideString with: aCharacterScanner rightX: rightX font: aFont
+     "scanning multibyte Japanese strings"
+     ^aFont scanMultibyteJapaneseCharactersFrom: startIndex to: stopIndex in: aWideString with: aCharacterScanner rightX: rightX!

Item was added:
+ ----- Method: LanguageEnvironment class>>scanMultibyteCharactersFrom:to:in:with:rightX:font: (in category 'language methods') -----
+ scanMultibyteCharactersFrom: startIndex to: stopIndex in: aWideString with: aCharacterScanner rightX: rightX font: aFont
+     "the default for scanning multibyte characters- other more specific encodings may do something else"
+     ^aFont scanMultibyteCharactersFrom: startIndex to: stopIndex in: aWideString with: aCharacterScanner rightX: rightX!

Item was added:
+ ----- Method: String>>encodedCharSetAt: (in category '*Multilingual') -----
+ encodedCharSetAt: index
+     "return the character encoding in place at index; the actual EncodedCharSet, not just a number. A bad index is an Error"
+     ^EncodedCharSet charsetAt: (self at: index) leadingChar!

I would prefer decent default being ^Unicode, if ever (EncodedCharSets at:1) isNil for some (bad) reason.

2013/10/8 <[hidden email]>
> tim Rowledge uploaded a new version of Multilingual to project The Trunk:

On 08-10-2013, at 2:55 PM, Nicolas Cellier <[hidden email]> wrote:

> I would prefer decent default being ^Unicode, if ever (EncodedCharSets at:1) isNil for some (bad) reason.

That would be fine by me, but I've been trying to change as little as possible at each step, especially when I have no information on why a particular choice was made. If you understand enough to feel it is a good change to make, I say go for it.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful Latin Phrases:- Furnulum pani nolo = I don't want a toaster.

On Tue, 8 Oct 2013, Nicolas Cellier wrote:
> I would prefer decent default being ^Unicode, if ever (EncodedCharSets at:1) isNil for some (bad) reason.

Wouldn't it be better to fill the EncodedCharSets array with Unicode by default in EncodedCharSet class >> #initialize? (replace the line

    EncodedCharSets := Array new: 256.

with:

    EncodedCharSets := Array new: 256 withAll: Unicode
)

That way #charsetAt: could be simply

    ^EncodedCharSets at: encoding + 1

Levente

On 09.10.2013, at 00:52, Levente Uzonyi <[hidden email]> wrote:

> On Tue, 8 Oct 2013, Nicolas Cellier wrote:
>
>> I would prefer decent default being ^Unicode, if ever (EncodedCharSets at:1) isNil for some (bad) reason.
>
> Wouldn't it be better to fill the EncodedCharSets array with Unicode by default in EncodedCharSet class >> #initialize? (replace the line
>
>     EncodedCharSets := Array new: 256.
>
> with:
>
>     EncodedCharSets := Array new: 256 withAll: Unicode
> )
>
> That way #charsetAt: could be simply
>
>     ^EncodedCharSets at: encoding + 1
>
> Levente

IMHO that would obscure the intention. It is technically equivalent, yes, but I'd like to see the explicit default. Most readable might be this:

    ^ (EncodedCharSets at: encoding + 1) ifNil: [Unicode]

We could even skip the "+ 1" part and only store the encoded charsets in EncodedCharSets. Unicode is not encoded, which is well-expressed by the code 0.

    ^ (EncodedCharSets at: encoding ifAbsent: [nil]) ifNil: [Unicode]

- Bert -

>> charsetAt: encoding
>> +     "Find the char set encoding that matches 'encoding'; return a decent default rather than nil"
>> +     ^ (EncodedCharSets at: encoding + 1) ifNil: [EncodedCharSets at: 1].
>> -
>> -     ^ EncodedCharSets at: encoding + 1 ifAbsent: [EncodedCharSets at: 1].
>> !

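The lookup shapes proposed in the thread so far differ in which failure they absorb: a nil slot, an out-of-range index, or both. A minimal workspace sketch, assuming a small stand-in table of symbols rather than the real EncodedCharSets class variable (`table`, `prefilled`, `#Unicode` and `#JISX0208` are placeholders chosen for illustration, not the actual contents of the image):

```smalltalk
| table prefilled strict nilGuard bothGuards |
"Stand-in for EncodedCharSets: 256 slots, only one of them filled."
table := Array new: 256.
table at: 2 put: #JISX0208.

"Stand-in for the #initialize proposal: every slot starts out as the default."
prefilled := Array new: 256 withAll: #Unicode.

"Plain lookup: a nil slot comes back as nil, a bad index signals an error."
strict := [:encoding | table at: encoding + 1].

"ifNil: only: a nil slot gets the default, a bad index still signals an error."
nilGuard := [:encoding | (table at: encoding + 1) ifNil: [#Unicode]].

"at:ifAbsent: plus ifNil: absorbs both a nil slot and a bad index."
bothGuards := [:encoding | (table at: encoding + 1 ifAbsent: [nil]) ifNil: [#Unicode]].

strict value: 1.          "the stored entry"
strict value: 0.          "nil, because slot 1 was never filled"
nilGuard value: 0.        "the default"
bothGuards value: 1000.   "the default; strict and nilGuard would signal an error here"
prefilled at: 1.          "the default, with no ifNil: at lookup time"
```

Evaluating the last five lines one at a time in a workspace shows the difference. The prefilled table removes the nil case but, like the plain lookup, does nothing about an index past the end of the array.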
On 10/9/13, Bert Freudenberg <[hidden email]> wrote:
> IMHO that would obscure the intention. It is technically equivalent, yes,
> but I'd like to see the explicit default. Most readable might be this:
>
>     ^ (EncodedCharSets at: encoding + 1) ifNil: [Unicode]
>
> We could even skip the "+ 1" part and only store the encoded charsets in
> EncodedCharSets. Unicode is not encoded, which is well-expressed by the code
> 0.
>
>     ^ (EncodedCharSets at: encoding ifAbsent: [nil]) ifNil: [Unicode]

+1 for this as intention-revealing. Tells us that Unicode is the default.

> - Bert -

On Wed, 9 Oct 2013, Bert Freudenberg wrote:
> IMHO that would obscure the intention. It is technically equivalent, yes, but I'd like to see the explicit default. Most readable might be this:

I think it's better, because the intention is expressed in a single method, instead of two. The explicit default is there, but in #initialize.

>     ^ (EncodedCharSets at: encoding + 1) ifNil: [Unicode]
>
> We could even skip the "+ 1" part and only store the encoded charsets in EncodedCharSets. Unicode is not encoded, which is well-expressed by the code 0.
>
>     ^ (EncodedCharSets at: encoding ifAbsent: [nil]) ifNil: [Unicode]

Performance-wise it's better to keep the "+ 1", and even better to save the #ifNil: too. :)

Levente

I don't know if this micro-benchmark is relevant, since charsetAt: should only be queried when the leadingChar changes (the send should be moved out of the scanJapaneseCharactersFrom: loop). I should also run it more than once, but here it is:

    | tmp |
    tmp := {Unicode. nil}.
    {
        [tmp at: 1] bench.
        [(tmp at: 1) ifNil: [Unicode]] bench.
        [(tmp at: 2) ifNil: [Unicode]] bench.
        [tmp at: 1 ifAbsent: [Unicode]] bench.
        [tmp at: 0 ifAbsent: [Unicode]] bench.
        [(tmp at: 0 ifAbsent: [nil]) ifNil: [Unicode]] bench.
        [(tmp at: 0 ifAbsent: nil) ifNil: [Unicode]] bench.
    }

    #('22,900,000 per second.'
      '22,700,000 per second.'
      '18,500,000 per second.'
      '5,570,000 per second.'
      '5,200,000 per second.'
      '5,160,000 per second.'
      '14,600,000 per second.')

Cheating with this property:

    nil value -> nil

makes a difference.

Shall we make provisions for leadingChar > 256 in the next 64-bit Spur image, or will immediate characters be restricted to 32 bits?
Note that leadingChar could already reach 1023 (10 bits), because there is no reason to restrict a WordArray content (32 bits) to small positive integers (30 bits), except a convention for not slowing things down too much with LargeIntegers...
The ifAbsent: is protecting us from such a crafted MalCharacter.

2013/10/10 Levente Uzonyi <[hidden email]>
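
The bit counts in the message above can be checked with a little arithmetic. This sketch assumes the character code occupies the low 22 bits of the stored value, with leadingChar in the bits above it; that width is an assumption not stated in the thread, although it is the one consistent with the 10-bit figure quoted (`codeBits` is a name made up for the example):

```smalltalk
| codeBits |
codeBits := 22.   "assumed width of the character code field"

"Largest leadingChar that fits when all 32 bits of a word are used."
(1 bitShift: 32 - codeBits) - 1.   "1023"

"Largest leadingChar when values stay within 30 bits."
(1 bitShift: 30 - codeBits) - 1.   "255"
```

Under that assumption a 256-entry table covers every leadingChar only while the 30-bit convention holds; a crafted value above it would index past the end of the array, which is the case at:ifAbsent: guards against.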