fonts, characterscanners and dead primitive 103

Re: fonts, characterscanners and dead primitive 103

timrowledge

On 25-09-2013, at 6:02 PM, tim Rowledge <[hidden email]> wrote:

> I note that there seems to be no attempt in #scanJapaneseCharactersFrom… to handle kerning. Is this because Japanese character glyphs don't get kerned, or is it a bug that should be addressed at some point?


And similarly it seems that only in scanJapaneseCharacters… do we actually need to send isBreakable:in & registerBreakableIndex. That would speed things up a bit if I'm correct.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: RLBMI: Ruin Logic Board Multiple Indexed




Re: fonts, characterscanners and dead primitive 103

timrowledge
Right now it appears that DisplayScanner is no longer needed. Another one bites the dust...

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Living proof that nature does not abhor a vacuum.




Re: fonts, characterscanners and dead primitive 103

Nicolas Cellier
I suggest that we move the last bits of difference from MultiCharacter* -> CharacterScanner, properly classify what is *Multilingual related, and then remove the Multi* classes. Sounds right?


2013/9/28 tim Rowledge <[hidden email]> wrote:

> Right now it appears that DisplayScanner is no longer needed. Another one bites the dust...
> Useful random insult:- Living proof that nature does not abhor a vacuum.







Re: fonts, characterscanners and dead primitive 103

timrowledge

On 28-09-2013, at 12:34 AM, Nicolas Cellier <[hidden email]> wrote:

> I suggest that we move the last bits of difference from MultiCharacter* -> CharacterScanner, properly classify what is *Multilingual related, and then remove the Multi* classes. Sounds right?

In general, yes. The smart thing is to end up with CharacterScanner instead of MultiCharacterScanner, just because the name is simpler.

We can probably fudge things a little so that the Multi* classes are acceptable in all normal running, then switch code so it no longer refers to the non-Multi classes. The non-Multi classes can then be cleaned up without fear of breaking anything live. After that we switch everything back to the non-Multi names and delete all the Multi* classes.

Then maybe we can do the same to Paragraph & NewParagraph and any other confused classes.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Never write software that patronizes the user.




Re: fonts, characterscanners and dead primitive 103

timrowledge
Now that Nicolas & I have pretty much finished this stage of cleaning up the scanners etc, we have at least achieved the major aim I had in mind; getting back to a single class tree for scanning text. So far as I can tell everything is working ok and I haven't managed to cause any errors.

The magic keys (cmd - & cmd +) that are supposed to kern the selection do not work, but then they don't in a vanilla 4.5-12461 image from before this work was done. So we didn't break it…

The next thing to do is try to simplify the choices between byte and wide strings, fonts and encodings and language environments. I hate seeing #isKindOf: or #isMemberOf: type tests in running code (you can excuse it in prototypes, for a few minutes at least) and #isByteString is not much better. We have classes and inheritance for a reason; nobody should be writing C code in Smalltalk.

Trying to list the factors involved in working out how to scan a text (and *please* correct whatever I get wrong):-

the String -
byteString; so far as I can see ByteStrings are single-byte characters (duh) with an assumed encoding. That appears to be 'mac roman' which is almost but not quite latin1 or iso-something-or-other.
wideString; 32 bit characters where the top (ish) 8 bits are used as a leading character (not to be confused with leading in the typographic sense of affecting line spacing - isn't English wonderfully clear…) that defines an EncodedCharSet (or LanguageEnvironment, sigh) which provides for a specific scanning message to use. To complicate life further, a later character in a WideString can change the encoding to use, which may well change the font, oh frabjous day.
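
To make the wide-character layout above a bit more concrete, here is a rough Python sketch. The exact bit split (low 22 bits for the character code, the next 8 bits for the leading character) is an assumption, not something stated in this thread, and every function name is hypothetical. The last function shows how a later character can switch the encoding part-way through a string.

```python
# Assumed layout (NOT confirmed in this thread): the low 22 bits hold the
# character code and the next 8 bits hold the "leading character" that
# selects the encoding / language environment.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1   # 0x3FFFFF
LEADING_MASK = 0xFF


def make_wide(leading, code):
    """Pack a leading character and a character code into one value."""
    return ((leading & LEADING_MASK) << CODE_BITS) | (code & CODE_MASK)


def leading_char(value):
    """Extract the encoding selector from a packed value."""
    return (value >> CODE_BITS) & LEADING_MASK


def char_code(value):
    """Extract the character code from a packed value."""
    return value & CODE_MASK


def encoding_runs(values):
    """Group packed values into consecutive runs that share a leading char.

    Each run is (leading, [codes]); a new run starts whenever a later
    character changes the encoding.
    """
    runs = []
    for value in values:
        lead = leading_char(value)
        if runs and runs[-1][0] == lead:
            runs[-1][1].append(char_code(value))
        else:
            runs.append((lead, [char_code(value)]))
    return runs
```

A scanner built on this idea would pick its scanning method (and possibly its font) once per run rather than once per character.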

the Font -
we have several classes of fonts, not all in the base image right now.
I think I'd divide them into two phyla at the moment;
a) StrikeFonts and other simple bitmap glyphs. This would include StrikeFont itself, HostFont and TTCFont (since it generates bitmaps that are simply bitblt'd to use)
b) ComplicatedPluginFonts where an interface to a more complex and sophisticated renderer is used to leverage a library such as TrueType, Cairo, Pango, Wayland or whatever. These may well need to completely usurp the actual scanner to do the work.

There's another font aspect that is important too, but for now at least it is tied to a & b above - whether pair-kerning is supported. I'm sure we could make a variant of StrikeFonts that does it if we wanted but let's keep things tolerably intelligible for now, eh?
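
For anyone unsure what pair-kerning changes when measuring a run of text, here is a minimal Python sketch. It is not how any Squeak font class does it; the function name, the advance widths and the kerning table are all made up for illustration.

```python
def run_width(text, advances, kern_pairs=None):
    """Sum glyph advance widths for text.

    If a kerning table is supplied, add the adjustment for each adjacent
    pair of characters; pairs missing from the table contribute 0.
    """
    kern_pairs = kern_pairs or {}
    width = 0
    previous = None
    for ch in text:
        if previous is not None:
            width += kern_pairs.get((previous, ch), 0)
        width += advances[ch]
        previous = ch
    return width


# Hypothetical metrics, purely for illustration.
advances = {"A": 10, "V": 10}
kern_pairs = {("A", "V"): -2}
```

A font without pair-kerning is simply the case where the table is empty, which is why the two kinds of font could in principle share one measuring loop.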

I'm going to take a quick swing at changing the scanning to delegate to
1) the string, which will then delegate to
2) the font, which for all the classes in the image right now will then delegate back to
3) the scanner, but having already worked out which form of scanning is required.
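
The three-step delegation above can be sketched roughly as follows. This is Python rather than Smalltalk, and all class and method names are hypothetical stand-ins, not the selectors in the actual changesets. The point is only that each object answers for its own kind, so no #isKindOf: style test is needed anywhere.

```python
class Scanner:
    """Step 3: the scanner ends up running a scan that was already chosen."""

    def scan(self, string, font):
        # Step 1: hand the decision to the string.
        return string.scan_with(self, font)

    def scan_bytes_as_bitmaps(self, string, font):
        return ("byte-bitmap", string.text)

    def scan_wide_as_bitmaps(self, string, font):
        return ("wide-bitmap", string.text)


class ByteString:
    def __init__(self, text):
        self.text = text

    def scan_with(self, scanner, font):
        # Step 2: the string knows its own kind and tells the font.
        return font.scan_byte_string(self, scanner)


class WideString:
    def __init__(self, text):
        self.text = text

    def scan_with(self, scanner, font):
        return font.scan_wide_string(self, scanner)


class BitmapFont:
    """A simple bitmap-glyph font: delegates back to the scanner."""

    def scan_byte_string(self, string, scanner):
        return scanner.scan_bytes_as_bitmaps(string, self)

    def scan_wide_string(self, string, scanner):
        return scanner.scan_wide_as_bitmaps(string, self)


class PluginFont:
    """A font backed by an external renderer: it may bypass the scanner."""

    def scan_byte_string(self, string, scanner):
        return ("plugin", string.text)

    def scan_wide_string(self, string, scanner):
        return ("plugin", string.text)
```

For example, `Scanner().scan(ByteString("abc"), BitmapFont())` ends up in `scan_bytes_as_bitmaps`, while the same call with a `PluginFont` never returns to the scanner at all.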

OK; I'm going in! Cover me!

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: CSF: Charge to NSF




Re: fonts, characterscanners and dead primitive 103

timrowledge
In reply to this post by timrowledge
I've completed some changes to clean up the dispatching of scanning so that we use multi-dispatching instead of nasty tests.

It's running quite happily in my development image, seems to handle plain ascii stuff and widestrings with all them furrin' accents ok. I'm *not* going to just drop it into the trunk right now though since it has plenty of scope for totally messing up things and really ought to be tested a little first.

The explanation and code are at http://bugs.squeak.org/view.php?id=7789
Two small changesets - one with all the new code and one with a single scary flip-over to use it. I suggest browsing the new code first….

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BDC: Break Down and Cry


