I’m currently looking at a paper that uses non-ASCII French letters
(namely “é”) in Smalltalk keywords. I don’t know whether their
proprietary software supports it (maybe not even), but it seems GNU
Smalltalk does not.
Most Lisps allow Unicode identifiers and symbols, and even C11 allows
Unicode identifiers (though almost nobody implements it and gcc still
doesn’t support including these directly in source files), and GNU
Smalltalk appears to support Unicode strings ('é' works out of the box,
even without loading I18N), Unicode symbols (#'é' works the same way,
but #é doesn’t), and Unicode characters, but only through a
non-human-readable notation (such as $<16r00C3> instead of $é).
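As a side note on that notation: U+00E9 is the code point of é, while 0xC3 is the first byte of its UTF-8 encoding (C3 A9). Here is a minimal Python sketch to check this. Python is used only for illustration, and whether GNU Smalltalk is reading the literal byte by byte is an assumption on my part, not something verified here.

```python
# Illustration only: compare the code point of "é" with its UTF-8 bytes.
ch = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)             # Unicode code point as an integer
utf8_bytes = ch.encode("utf-8")  # byte sequence as stored in a UTF-8 file

print(f"code point: U+{code_point:04X}")
print("UTF-8 bytes:", " ".join(f"{b:02X}" for b in utf8_bytes))
```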
I wanted to know: why doesn’t it support this feature? Is it a choice?
Does the original standard require it? Does it support Unicode
variables internally (is there anything such as “intern #symbol”, “eval
#symbol”, or “symbol-to-string symbol”)? At least it doesn’t seem to
support reading them… Is such support planned? Would it be
long or difficult to do?
begin quoting Garreau, Alexandre as of Tue, Oct 16, 2018 at 04:09:46PM +0200:
> On 2018-10-16 at 15:27, Garreau, Alexandre wrote:
> > So I don’t know if their proprietary software supports it (maybe even
> > not),
> So I just asked someone to check and it does, so wouldn’t it be nice to
> have it in GNU Smalltalk too? would it be difficult?
On 2018-10-16 at 17:20, SJS wrote:
> begin quoting Garreau, Alexandre as of Tue, Oct 16, 2018 at 04:09:46PM +0200:
>> On 2018-10-16 at 15:27, Garreau, Alexandre wrote:
>> > So I don’t know if their proprietary software supports it (maybe even
>> > not),
>> So I just asked someone to check and it does, so wouldn’t it be nice to
>> have it in GNU Smalltalk too? would it be difficult?
> Don't forget the problem of homoglyphs.
I know, there even was a discussion about this recently on emacs-devel.
This is an interesting issue, but it arises far more relevantly for
domain names, usernames, web searches, or spam than for source code: if
you use free software from trusted sources, a homoglyph trick is no more
likely than a deliberately over-complex obfuscated program, or a program
obfuscated with misleading names.
Those two risks are no reason to ban other languages from texts such as
domain names, nicknames, etc., nor, for instance, to put restrictions in
a language on the complexity of anything, such as function length.
Moreover, homoglyphs are not a reason to ban other languages from
identifiers, as C11, most Lisps, and one Smalltalk have shown.
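To make the homoglyph concern concrete, here is a small Python sketch (the helper name `scripts_used` is made up for this example). It takes the first word of each character's Unicode name as a rough stand-in for its script, which is an approximation and not a real script-property lookup, and it shows how an identifier mixing scripts could be flagged.

```python
import unicodedata

def scripts_used(identifier):
    """Rough guess at the scripts appearing in an identifier.

    Uses the first word of each letter's Unicode name (e.g. LATIN,
    CYRILLIC) as an approximation of its script. Digits, underscores
    and other non-letters are ignored.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

latin = "paypal"
mixed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(latin == mixed)       # the strings differ although they look alike
print(scripts_used(latin))
print(scripts_used(mixed))  # more than one script: worth a warning
```

A tool (editor, linter, or compiler warning) could report identifiers whose set contains more than one script, rather than forbidding non-ASCII identifiers altogether.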
Unicode, or intermixing several languages, is in itself a modern issue:
as more and more languages begin mixing writing systems, it becomes
mandatory. Admittedly it is badly done, centralized, and so on, but it
is standard and it exists.
> And don't forget about emojis.
They indeed pushed it pretty far, but hey, that’s their fault… And if
you implement Unicode, you get functions that tell you which category a
character belongs to, so you may as well exhaustively ban emojis, or
only allow letters, or letters and digits, and so on.
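As a sketch of that idea in Python, using the standard `unicodedata` module: the policy below (letters and decimal digits from any script, plus underscore) and the function names are assumptions made for this example, not the rule of any Smalltalk or C standard.

```python
import unicodedata

# Hypothetical policy: identifiers may contain letters (categories L*)
# and decimal digits (category Nd) from any script, plus underscore.
def is_allowed_char(ch):
    category = unicodedata.category(ch)
    return ch == "_" or category.startswith("L") or category == "Nd"

def is_allowed_identifier(name):
    # Must be non-empty, must not start with a digit, and every
    # character must pass the category check above.
    if not name or unicodedata.category(name[0]) == "Nd":
        return False
    return all(is_allowed_char(ch) for ch in name)

samples = ["\u00e9", "\u53d8\u91cf", "\u0441\u0447\u0451\u04422",
           "x\u0663", "smile\U0001F600", "2x"]
for candidate in samples:
    print(candidate, is_allowed_identifier(candidate))
```

Emojis fall in category So (Symbol, other), so they are rejected without being listed one by one. Note that this simple policy also rejects combining marks (category Mn), so an “é” written as “e” plus a combining accent would fail unless the input is normalized to NFC first or marks are allowed too.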
I don’t know what the norm is for Smalltalk, but C11 for instance, afair,
allows *at least* any letter or digit from any alphabet. So that should
include Arabic as well as Maya digits, as well as Arabic, Chinese,
Hangul, Cyrillic letters, etc.