
Re: The Trunk: Collections-topa.806.mcz

Posted by timrowledge on Sep 13, 2018; 10:52pm
URL: https://forum.world.st/The-Trunk-Collections-topa-806-mcz-tp5084658p5084725.html

>> The question should, IMO at least, be "what character set should Squeak use" and, again IMO, that should be Unicode and, in particular, the UTF-8 encoding. (http://utf8everywhere.org/)

We should probably have a proper UTF8String class, so that at least we know a string is encoded and needs conversion to a 'real' String. During the NuScratch work I toiled mightily with string stuff and really ought to have done it then. The current WideString/ByteString scheme works quite well for most internal cases, though the cost of converting an entire string any time a wide character (one with a code point above 255) is inserted could get annoying.
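To make the conversion cost concrete, here is a minimal Python sketch of a string that, in the spirit of the ByteString/WideString split, stores one byte per character until a character above 255 arrives, then copies the whole buffer into 32-bit storage. The class name `PromotingString` and its `copied` counter are hypothetical illustrations, not Squeak's actual implementation.

```python
from array import array


class PromotingString:
    """Hypothetical byte/wide string: one byte per character (code points
    0-255) until a wider character is stored, at which point the entire
    buffer is copied into 32-bit storage, so that one insert costs O(n)."""

    def __init__(self, text=""):
        self.wide = any(ord(c) > 255 for c in text)
        # "B" = unsigned byte; "L" = unsigned long (at least 32 bits).
        self.buf = array("L" if self.wide else "B", (ord(c) for c in text))
        self.copied = 0  # elements copied by promotions, for illustration only

    def _promote(self):
        # Whole-string conversion: every existing element is copied.
        self.buf = array("L", self.buf.tolist())
        self.copied += len(self.buf)
        self.wide = True

    def insert(self, index, ch):
        code = ord(ch)
        if code > 255 and not self.wide:
            self._promote()
        self.buf.insert(index, code)

    def __str__(self):
        return "".join(map(chr, self.buf))
```

Inserting `"é"` (U+00E9) leaves the byte storage alone, while inserting `"€"` (U+20AC) into `PromotingString("hello")` copies all five existing characters before the insert happens. For short method source that is negligible; for a large document it is the annoyance mentioned above.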

If one were making a word processor for large amounts of text, rather than a text editor with some prettiness tweaks for code editing etc., it might pay to have a form of text that allows for mixed byte and wide sub-parts. Perhaps it is even possible to use text attributes in yet another twisted and sneaky way? As we discovered in the Sophie Project, handling formatted text is decidedly non-trivial, especially when the customer can't even define a paragraph for you...
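One way to picture "mixed byte and wide sub-parts" is a text held as a list of runs, each run either one byte per character or 32 bits per character. Inserting a wide character then splits only the run it lands in, instead of converting everything. This is a hypothetical Python sketch (`SegmentedText` is an invented name); it does not attempt the text-attributes trick and does not merge adjacent runs of the same kind.

```python
from array import array


class SegmentedText:
    """Hypothetical text stored as runs: "B" runs hold code points 0-255,
    "L" runs hold wider code points. A wide insert splits one run only."""

    def __init__(self, text=""):
        self.runs = []
        for ch in text:
            self._append(ord(ch))

    @staticmethod
    def _kind(code):
        return "B" if code <= 255 else "L"

    def _append(self, code):
        kind = self._kind(code)
        if self.runs and self.runs[-1].typecode == kind:
            self.runs[-1].append(code)
        else:
            self.runs.append(array(kind, [code]))

    def insert(self, index, ch):
        code = ord(ch)
        kind = self._kind(code)
        pos = 0
        for i, run in enumerate(self.runs):
            if index <= pos + len(run):
                offset = index - pos
                if run.typecode == kind:
                    run.insert(offset, code)
                else:
                    # Split this run around the new character; other runs
                    # keep their existing storage untouched.
                    head, tail = run[:offset], run[offset:]
                    parts = (head, array(kind, [code]), tail)
                    self.runs[i:i + 1] = [r for r in parts if len(r)]
                return
            pos += len(run)
        # Empty text or index past the end: append a character.
        self._append(code)

    def __len__(self):
        return sum(len(r) for r in self.runs)

    def __str__(self):
        return "".join(chr(c) for run in self.runs for c in run)
```

For example, inserting `"€"` at index 5 of `SegmentedText("hello world")` leaves three runs (byte, wide, byte), and only the single new character occupies 32-bit storage. A real word processor would also need run merging and a faster index lookup than this linear scan.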


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: PSM: Print and SMear