Squeak-dev/Squeak-web image v95-2


J J-6

Re: Squeak-dev/Squeak-web image v95-2

I think I usually agree with Lex, but this time I must say I do not. C was
breathtakingly good at interop as well, but at some point we saw that it was
holding us back and moved on. I feel it's the same with the ASCII set.

I haven't seen much pain in Smalltalk because of being tied to ASCII (though
that doesn't mean it isn't there, just that we are good at working around
it). But one place that is showing it in a big way is Haskell [1].

I'm not sure what to do here, but there must be something. Maybe do like
laptops already do and have a "function" key that causes normal keys to do
an alternative function (normally written in blue), e.g. holding down F2 and
the L key could make a lambda symbol. When a user presses the "function"
key, we could have a little keyboard pop up to show what the symbols are
since they won't be written on the keyboard.

[1] In Haskell one becomes aware of the limitations of using ASCII right
away. The compose symbol (normally a circle) is the period. They decided
to use the \ symbol for lambdas (!!!) since that was the closest graphic
available.

>From: Bert Freudenberg <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Squeak-dev/Squeak-web image v95-2
>Date: Sat, 7 Apr 2007 18:29:25 +0200
>
>
>On Apr 7, 2007, at 15:45 , Lex Spoon wrote:
>
>>tim Rowledge <[hidden email]> writes:
>>>Make the parser accept the unicode leftarrow as assign; leave the
>>>':=' for backcompat.
>>>Make fileout to text convert leftarrow to := for plausible human
>>>readability and ascii compat.
>>>Make filein convert := to leftarrow for aesthetic compat.
>>>Make a hotkey to insert leftarrow. Remember, the $= key is just a
>>>hotkey that is already set for you.
>>>Let people choose a suitable assign hotkey. Perhaps cmd-=? Perhaps alt-\?
>>>I don't really care at this stage.
>>>Profit.
>>
>>
>>This approach will work. However, we could instead not do all that,
>>and use pure ASCII. ASCII is simple, sufficient, and breathtakingly
>>good at interop.
>
>The American SCII also breathtakingly sucks at expressiveness, even for
>many Americans. If you find its expressiveness sufficient, good for you.
>In most parts of the world it is not.
>
>Typewriters with their limited character set fortunately were only a small
>interlude in the history of typography. I'm glad in Smalltalk we at least
>use proportional fonts. Arbitrary punctuation like := really disturbs in
>reading IMHO. I actually hope we're moving towards higher typographical
>standards (we at least should be able to do what Fortress does for
>example) and not falling back into the stone age of computing.
>
>- Bert -
>
>
>


timrowledge

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by Bert Freudenberg

On 7-Apr-07, at 9:29 AM, Bert Freudenberg wrote:
[snip]

>
> The American SCII also breathtakingly sucks at expressiveness, even
> for many Americans. If you find its expressiveness sufficient, good
> for you. In most parts of the world it is not.
Hear-hear. How am I supposed to put a diaeresis on a 'y' to indicate
it is to be pronounced as a separate vowel?

There are lots of important lexical(sic) symbols that Lex's wish for
'pure ASCII' would prevent us using. Smalltalk-return and
Smalltalk-assign are just two that happen to concern us here and now.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Sweet sorrow

Andrew Tween

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-web imagev95-2)

In reply to this post by Bert Freudenberg

Hi Bert,
> The freetype importer needs to map those characters to a blank glyph.

freetype? I think you meant truetype?
In which case - yes, but how?

Lots of problems with WideString/Unicode/MultiTTCFont will be exposed if people
start using my hacked Shout.
Which reinforces the (very valid) observation you made earlier about exercising
WideString code paths.

Cheers,
Andy

Simon Michael

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by J J-6

J J wrote:
> [1] In Haskell one becomes aware of the limitations of using ASCII right
> away. The compose symbol (normally a circle) is the period. They
> decided to use the \ symbol for lambdas (!!!) since that was the closest
> graphic available.

For what it's worth, I think both of those choices work well for haskell,
and in a modern emacs and with a lucky font choice :), (setq
haskell-font-lock-symbols 'unicode) makes them look great!

Lex Spoon-3

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by timrowledge

tim Rowledge <[hidden email]> writes:

> On 7-Apr-07, at 9:29 AM, Bert Freudenberg wrote:
> > The American SCII also breathtakingly sucks at expressiveness, even
> > for many Americans. If you find its expressiveness sufficient, good
> > for you. In most parts of the world it is not.
> Hear-hear. How am I supposed to put a diaeresis on a 'y' to indicate
> it is to be pronounced as a separate vowel?
>
> There are lots of important lexical(sic) symbols that Lex's wish for
> 'pure ASCII' would prevent us using. Smalltalk-return and
> Smalltalk-assign are just two that happen to concern us here and now.

I get the argument for going to Unicode, though that is not without
its problems. What bugs me is redefining _ and ^ in non-standard
ways. It's awkward and obscure.

Lex

Bert Freudenberg

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-web imagev95-2)

In reply to this post by Andrew Tween

On Apr 7, 2007, at 21:12 , Andrew Tween wrote:

> Hi Bert,
>> The freetype importer needs to map those characters to a blank glyph.
>
> freetype? I think you meant truetype?

Sorry, yes, that's what I meant.

> In which case - yes, but how?

IIRC truetype requires glyph number 0 to be a zero-width blank glyph.
So if you assign that in the character-to-glyph map it will not be
rendered.

I seem to recall there even was a method for doing this remapping,
but I haven't looked for it.

- Bert -

Andrew Tween

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

Hi,
----- Original Message -----
From: "Bert Freudenberg" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Saturday, April 07, 2007 10:34 PM
Subject: Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

>
> On Apr 7, 2007, at 21:12 , Andrew Tween wrote:
>
> > Hi Bert,
> >> The freetype importer needs to map those characters to a blank glyph.
> >
> > freetype? I think you meant truetype?
>
> Sorry, yes, that's what I meant.
>
> > In which case - yes, but how?
>
> IIRC truetype requires glyph number 0 to be a zero-width blank glyph.
> So if you assign that in the character-to-glyph map it will not be
> rendered.

I think glyph 0 is the "unknown" glyph - usually a rectangle.
glyph 1 is the "null" glyph.

Having said that, the TTCFontReader is making its own blank glyph for
separators.

Which is ok. But...

TTCFontReader>>processCharMap: creates two encodings; the first has 256 entries,
the second 65536.
In the first encoding, it sets the entry for each separator character to the
blank glyph.
But, in the second encoding, it doesn't.

Modifying it so that separators are set to a blank glyph in both encodings fixes
the problem. Fileout is attached.

Cheers,
Andy

TTCFontReader-processCharMap.st (1K) Download Attachment
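
For readers without the attachment, the shape of the fix described above can be sketched roughly as follows. This is a Python illustration, not the Squeak code. The two table sizes (256 and 65536) come from the message; `BLANK_GLYPH`, `SEPARATORS`, and `build_char_maps` are hypothetical names invented for the example, and the separator set is an assumption.

```python
# Hypothetical sketch: map separators to a blank glyph in BOTH tables.
BLANK_GLYPH = "<blank>"           # stand-in for the reader's own blank glyph
SEPARATORS = (9, 10, 12, 13, 32)  # tab, LF, FF, CR, space (assumed set)


def build_char_maps(glyph_for_code):
    """Return (narrow, wide) character-to-glyph tables.

    glyph_for_code is a hypothetical callable that gives the font's
    glyph for a code point.
    """
    narrow = [glyph_for_code(code) for code in range(256)]
    wide = [glyph_for_code(code) for code in range(65536)]
    for code in SEPARATORS:
        narrow[code] = BLANK_GLYPH
        # The step reported as missing for the second encoding.
        wide[code] = BLANK_GLYPH
    return narrow, wide


narrow, wide = build_char_maps(lambda code: f"glyph-{code}")
print(narrow[32], wide[32], wide[0x2190])
```

With both tables patched, a space renders as the blank glyph whether the text is looked up through the 256-entry or the 65536-entry encoding, while the arrow at U+2190 keeps its own glyph.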

"Martin v. Löwis"

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

> Modifying it so that separators are set to a blank glyph in both encodings fixes
> the problem. Fileout is attached.

Thanks, works fine. Not sure whose fault it is, but when browsing
the senders of setDemoFonts, looking at Preferences
class>>fontConfigurationMenu, I get a ByteArray>>errorSubscriptBounds:,
index=8593. The byte array in question is CaseInsensitiveMatchOrder.

The call originates from TextMorphForShoutEditor>>againOnce:,
where it says

where ← paragraph text findString: FindText startingAt:
self stopIndex caseSensitive:
((ChangeText ~~ FindText) or: [Preferences caseSensitiveFinds]).

The actual problem seems to be that CaseInsensitiveMatchOrder is
only 256 bytes in size.

Regards,
Martin
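
The failure mode reported here can be reproduced in miniature. The sketch below is Python rather than Smalltalk and assumes the table is indexed directly by character code; `MATCH_ORDER`, `fold_unsafe`, and `fold_safe` are made-up names. The reported index 8593 is consistent with the left arrow U+2190 (decimal 8592) plus one for Smalltalk's 1-based indexing. The guard shown is only one possible workaround, not necessarily the right fix for Squeak.

```python
# A 256-entry case-folding table, standing in for CaseInsensitiveMatchOrder.
MATCH_ORDER = bytes(
    ord(chr(code).lower()) if code < 128 else code for code in range(256)
)


def fold_unsafe(ch):
    """Index the table directly; fails for code points above 255."""
    return MATCH_ORDER[ord(ch)]


def fold_safe(ch):
    """Use the table for byte-sized characters, the raw code point otherwise."""
    code = ord(ch)
    return MATCH_ORDER[code] if code < len(MATCH_ORDER) else code


print(fold_safe("A") == fold_safe("a"))
print(fold_safe("\u2190"))
try:
    fold_unsafe("\u2190")
except IndexError as error:
    print("out of bounds:", error)
```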

"Martin v. Löwis"

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

In reply to this post by Andrew Tween

As another follow-up: Cut-n-paste of source text does not work anymore,
on Linux. The arrows are stripped when pasting (both between two Squeak
windows, and when copying from Squeak to, say, this IceDove window
I'm typing the mail into), whereas copying the arrow from gucharmap
works fine.

Regards,
Martin

Bert Freudenberg

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

In reply to this post by "Martin v. Löwis"

On Apr 8, 2007, at 10:49 , Martin v. Löwis wrote:

>> Modifying it so that separators are set to a blank glyph in both
>> encodings fixes
>> the problem. Fileout is attached.
>
> Thanks, works fine. Not sure whose fault it is, but when browsing
> the senders of setDemoFonts, looking at Preferences
> class>>fontConfigurationMenu, I get a
> ByteArray>>errorSubscriptBounds:,
> index=8593. The byte array in question is CaseInsensitiveMatchOrder.
>
> The call originates from TextMorphForShoutEditor>>againOnce:,
> where it says
>
> where ← paragraph text findString: FindText startingAt:
> self stopIndex caseSensitive:
> ((ChangeText ~~ FindText) or: [Preferences caseSensitiveFinds]).
>
> The actual problem seems to be that CaseInsensitiveMatchOrder is
> only 256 bytes in size.

Yep. Sounds like my suggestion to get the wide char paths more
exercised is valid ;)

- Bert -

Andrew Tween

Re: Unicode Arrows in Shout ( wasRe: Squeak-dev/Squeak-webimagev95-2)

In reply to this post by "Martin v. Löwis"

Hi,
----- Original Message -----
From: "Martin v. Löwis" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Sunday, April 08, 2007 9:53 AM
Subject: Re: Unicode Arrows in Shout ( wasRe: Squeak-dev/Squeak-webimagev95-2)

> As another follow-up: Cut-n-paste of source text does not work anymore,
> on Linux. The arrows are stripped when pasting (both between two Squeak
> windows, and when copying from Squeak to, say, this IceDove window
> I'm typing the mail into), whereas copying the arrow from gucharmap
> works fine.

I get the same problem on Windows.
Interestingly, cut-paste of a string containing only up-arrows and left-arrows
works ok.
But not with a longer string containing both arrows and ascii chars.

Cheers,
Andy

Nicolas Cellier-3

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

In reply to this post by Bert Freudenberg

Bert Freudenberg a écrit :

> On Apr 8, 2007, at 10:49 , Martin v. Löwis wrote:
>
>>> Modifying it so that separators are set to a blank glyph in both
>>> encodings fixes
>>> the problem. Fileout is attached.
>>
>> Thanks, works fine. Not sure whose fault it is, but when browsing
>> the senders of setDemoFonts, looking at Preferences
>> class>>fontConfigurationMenu, I get a ByteArray>>errorSubscriptBounds:,
>> index=8593. The byte array in question is CaseInsensitiveMatchOrder.
>>
>> The call originates from TextMorphForShoutEditor>>againOnce:,
>> where it says
>>
>> where ← paragraph text findString: FindText startingAt:
>> self stopIndex caseSensitive:
>> ((ChangeText ~~ FindText) or: [Preferences caseSensitiveFinds]).
>>
>> The actual problem seems to be that CaseInsensitiveMatchOrder is
>> only 256 bytes in size.
>
>
> Yep. Sounds like my suggestion to get the wide char paths more exercised
> is valid ;)
>
> - Bert -
>
>
>
>

One way to correct bugs indeed is to release the unicode arrow change,
let bugs express themselves randomly and correct them as they appear.
Because this kind of bug is so central, harvesting should come fast.

The longer way is to first write tests for each method (preferably tests
with wide characters in a WideString, not just asciiString asWideString).

The former solution will leave uncorrected bugs present, and we should then
be prepared for heavy traffic of complaints and angry mails on the
lists from users losing their changes.

But don't forget to use Mantis, you certainly experienced one of these
http://bugs.squeak.org/view.php?id=6367
http://bugs.squeak.org/view.php?id=6366
http://bugs.squeak.org/view.php?id=5331
http://bugs.squeak.org/view.php?id=3574

Some corrections are underway in 3.10.
But not all known bugs are corrected yet.
And we can quite surely find more problems, because a lot of code was
optimized for single-byte strings and is wrongly inherited by WideString.

Quite sure some bugs are also encountered and maybe cured in other forks
(Squeakland, Sophie, OLPC?). That's a weak point of forks.

We also have to rewrite some plugins or be prepared for some loss of
efficiency.

That said, I would encourage going unicode.
And Smalltalk is a framework in which we can do this well and easily.
Try to program such a shift in C++ for example...
Problems will appear when interfacing with the outside world, but with clever
conversions and wrappers we can cope.

Nicolas

"Martin v. Löwis"

Re: Unicode Arrows in Shout ( was Re: Squeak-dev/Squeak-webimagev95-2)

> But don't forget to use Mantis, you certainly experienced one of these
> http://bugs.squeak.org/view.php?id=3574

Indeed, this is the one I found. The other one (cut-n-paste does not
work) apparently hasn't been reported.

> And Smalltalk is a framework in which we can do this well and easily.
> Try to program such a shift in C++ for example...

Well, the Win32 API (pure C) has handled Unicode just fine for more than 10
years now. Many applications have been ported during that time,
although you still find applications that break when confronted with
characters outside the current locale.

Many C++ APIs have been moved to Unicode also, e.g. the Mozilla XPCOM
infrastructure started out in single-byte mode, but is now Unicode
throughout.

This is a case where static typing helps: if you change the APIs (which
is essentially what has been done in both cases), you will get a
compiler error (so people have to change a lot of code), but once
you are through with these changes, there will be fewer surprises at
run-time.

Regards,
Martin

J J-6

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by Simon Michael

Well, they may be workable. But it is 2007, sooner or later one should be
able to write mathematical code in mathematical symbols. How much longer is
paper going to be much more flexible than computers for writing things down?
:)

>From: Simon Michael <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: Re: Squeak-dev/Squeak-web image v95-2
>Date: Sat, 07 Apr 2007 13:05:49 -0700
>
>J J wrote:
>>[1] In Haskell one becomes aware of the limitations of using ASCII right
>>away. The compose symbol (normally a circle) is the period. They decided
>>to use the \ symbol for lambdas (!!!) since that was the closest graphic
>>available.
>
>For what it's worth, I think both of those choices work well for haskell,
>and in a modern emacs and with a lucky font choice :), (setq
>haskell-font-lock-symbols 'unicode) makes them look great!
>
>


J J-6

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by Lex Spoon-3

>From: Lex Spoon <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: Re: Squeak-dev/Squeak-web image v95-2
>Date: 07 Apr 2007 16:13:43 -0400
>
>I get the argument for going to Unicode, though that is not without
>its problems. What bugs me is redefining _ and ^ in non-standard
>ways. It's awkward and obscure.

Who wants to do that? I think _ should mean _. If we want an arrow for
assignment and there is one in Unicode then we should provide a way to input
it and use that. And we could provide a preference to use only ASCII for
people who want to, but the _ assignments should go away imho.


timrowledge

Re: Squeak-dev/Squeak-web image v95-2

On 9-Apr-07, at 2:51 AM, J J wrote:

>> From: Lex Spoon <[hidden email]>
>> Reply-To: The general-purpose Squeak developers list<squeak-
>> [hidden email]>
>> To: [hidden email]
>> Subject: Re: Squeak-dev/Squeak-web image v95-2
>> Date: 07 Apr 2007 16:13:43 -0400
>>
>> I get the argument for going to Unicode, though that is not without
>> its problems. What bugs me is redefining _ and ^ in non-standard
>> ways. It's awkward and obscure.
>
> Who wants to do that? I think _ should mean _. If we want an
> arrow for assignment and there is one in Unicode then we should
> provide a way to input it and use that. And we could provide a
> preference to use only ASCII for people who want to, but the _
> assignments should go away imho.

Exactly; I certainly didn't intend to suggest "redefining _ and ^ in
non-standard
ways" and I don't think I saw anyone else do so. We have the option
of using proper characters that are outside the narrow scope of mere
ASCII to improve our system. I posit that we should so do.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Ought to have a warning label on his forehead.

Prof. Andrew P. Black

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by J J-6

Fortress is using Unicode, so that it can look more like mathematics. IIRC, they have an ASCII encoding that can be used for input or portability, but the language is designed to be typeset with multiple fonts and a wide range of glyphs.

On 9 Apr 2007, at 2:49, J J wrote:

But it is 2007, sooner or later one should be able to write mathematical code in mathematical symbols. How much longer is paper going to be much more flexible than computers for writing things down? :)

Andrew P. Black

Department of Computer Science

Portland State University

+1 503 725 2411

Lex Spoon-3

Re: Squeak-dev/Squeak-web image v95-2

In reply to this post by timrowledge

tim Rowledge <[hidden email]> writes:
> Exactly; I certainly didn't intend to suggest "redefining _ and ^ in
> non-standard
> ways" and I don't think I saw anyone else do so. We have the option
> of using proper characters that are outside the narrow scope of mere
> ASCII to improve our system. I posit that we should so do.

As you know, this is the status quo in Squeak. _ and ^ are left-arrow
and up-arrow in many Squeak fonts. Changing these back to the normal
characters is IMHO an improvement for Squeak interop, and an
improvement in general.

You did propose a collection of rewrites that should happen as code
comes into and out of the system. This is part of what I meant by
treating characters in non-standard ways. Whether or not we switch to
Unicode, translating on file-in/file-out is problematic. Just
consider cut and paste with Squeak workspaces. Sometimes workspaces
hold code and sometimes they do not, so you cannot know whether or not
to apply the rewrite. It is far nicer if you can treat the code as
normal text, and thus copy/paste it as is.

Even if we do switch to Unicode, there is a case for sticking with :=
and ^ instead of using Unicode arrow characters. := and ^ are easy to
type, and they will work fine in every tool that processes text. You
don't have to get into rewrites and pretty printers; you can just use
the text as it is. That degree of interop is powerful for our users.

Lex
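
The workspace concern can be made concrete with a small sketch. It is Python, not Squeak code, and `file_in` and `file_out` are hypothetical names for the proposed conversions. A plain textual replace is assumed here, which is the simplest version of the rewrite and not necessarily what anyone in the thread would implement.

```python
LEFT_ARROW = "\u2190"


def file_in(text):
    """Hypothetical file-in rewrite: ':=' becomes the left arrow."""
    return text.replace(":=", LEFT_ARROW)


def file_out(text):
    """Hypothetical file-out rewrite: the left arrow becomes ':='."""
    return text.replace(LEFT_ARROW, ":=")


code = "x := 3 + 4"
note = "Pascal also writes assignment as :="

print(file_in(code))                    # the intended rewrite of code
print(file_in(note))                    # prose gets rewritten as well
print(file_out(file_in(code)) == code)  # code survives the round trip
```

The rewrite round-trips cleanly for code, but it cannot tell code from ordinary text: the second string is a note, not Smalltalk, and it is altered all the same.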
