MacRoman, Latin1, squeak fonts, and non breaking spaces.

MacRoman, Latin1, squeak fonts, and non breaking spaces.

Jerome Peace
Hi Bert and other concerned folk.

In reading Bert's post about fixing fonts to show the
invisible characters I was reminded of tripping over
the nonbreaking space (nbsp).

See mantis report:

http://bugs.impara.de/view.php?id=2446
 

I use a Mac, and MacRoman defines nbsp as char 202, which
is also what Character nbsp returns.

In the default font in 7021 this appears as the
British pound sign. There are some squeak fonts
(atlantis for example) that will show a blank space
for that character.

Now Bert's fix uses char 160, which browsers use as
nbsp, but the Latin1 standard I was pointed to has 160
in a range of undefined character values.


So there is (at least one) bug in this. The question
is: what is the bug?

1) Should nbsp be defined as the Latin1 value?
2) Should squeak fonts have a way of saying what set
of characters they represent?
3) Should the available fonts in squeak be consistent
in choice of encoding?
4) Should Character class be refactored to reflect the
 ability to choose different encodings?
5) Should Character class be debugged to reflect
Latin1 rather than MacRoman encodings? If so, what do
you do about MacRoman?

I have enough knowledge to know these questions are
significant to the well-being and maintenance of
squeak. I am out of my depth in trying to suggest
answers.

It would be good if someone who understands the issue
more deeply would formulate a mantis issue around it.

Yours in service, -- Jerome Peace


Re: MacRoman, Latin1, squeak fonts, and non breaking spaces.

Jerome Peace
Hi all,

Ah, the eyes are going and the glasses are not up to
the task.

Correction: the default font shows 202 as an E with a
circumflex (the Latin1 encoding), not the pound sign.




Re: MacRoman, Latin1, squeak fonts, and non breaking spaces.

Bert Freudenberg-3
On 10.04.2006 at 02:19, Peace Jerome wrote:

> Hi Bert and other concerned folk.
>
> In reading Bert's post about fixing fonts to show the
> invisible characters I was reminded of tripping over
> the nonbreaking space (nbsp).
>
> See mantis report:
>
> http://bugs.impara.de/view.php?id=2446
>
>
> I use a Mac and MacRoman defines nbsp as char 202. And
> this can be gotten from Character nbsp.

That doesn't have anything to do with the host operating system. We
switched to Unicode, of which Latin-1 (ISO-8859-1) is the 8-bit
subset (nitpicking aside).
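
To make the two readings of byte 202 concrete, here is a minimal sketch in Python (not Squeak code; the variable names are made up for illustration). It relies only on Python's built-in "mac_roman" and "latin-1" codecs and also checks the claim that Latin-1 coincides with the first 256 Unicode code points.

```python
# Byte value 202 (0xCA), interpreted under the two encodings discussed above.
NBSP_MACROMAN_BYTE = 202  # illustrative name, not a Squeak identifier

# In MacRoman, byte 202 is the non-breaking space.
as_macroman = bytes([NBSP_MACROMAN_BYTE]).decode("mac_roman")

# In Latin-1, the same byte is E with circumflex.
as_latin1 = bytes([NBSP_MACROMAN_BYTE]).decode("latin-1")

# Latin-1 bytes map one-to-one onto Unicode code points 0..255.
latin1_is_unicode_prefix = all(
    bytes([b]).decode("latin-1") == chr(b) for b in range(256)
)

print(hex(ord(as_macroman)), hex(ord(as_latin1)), latin1_is_unicode_prefix)
```

So a value of 202 is only a non-breaking space if the text is known to be MacRoman; read as Latin-1 or Unicode it is a different character.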

> In the default font in 7021 this appears as the
> British pound sign.

It should be Ê (E circumflex).

> There are some squeak fonts
> (atlantis for example) that will show a blank space
> for that character.

Only because Atlantis never had a glyph for "E circumflex". That's  
why it was blank. That's why it's replaced with a rectangle with my  
fix now.

> Now Bert's fix uses char 160. Which is used by
> browsers as nbsp but the Latin1 standard I was pointed
> to has 160 in a range of undefined character values.

Codepoints 128-159 do have a meaning in Unicode (they are control
codes) but no glyphs. 160 is indeed the non-breaking space. It's
"reserved" in the sense that no actual glyph is associated with it;
in that respect it's more like a control character. However, for our
particular implementation of bitmap fonts it's convenient to just use
a blank glyph.

See http://www.unicode.org/charts/PDF/U0080.pdf
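
The same facts can be checked against the Unicode character database. This is a small Python sketch using the standard unicodedata module; it is only an illustration and says nothing about how Squeak fonts store their glyphs.

```python
import unicodedata

# General categories of code points 128..159 (the C1 range).
c1_categories = {unicodedata.category(chr(cp)) for cp in range(128, 160)}

# Code point 160 and its Unicode name and category.
nbsp = chr(160)
nbsp_name = unicodedata.name(nbsp)
nbsp_category = unicodedata.category(nbsp)

print(c1_categories, nbsp_name, nbsp_category)
```

Here "Cc" is the category for control characters and "Zs" the category for space separators, which matches the description above: 128-159 are controls, and 160 is a space that needs no visible glyph.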

> So the question is there is (at least one) bug in
> this. What is the bug?
>
> 1) Should nbsp be defined as the Latin1 value?

Yes.

> 2) Should squeak fonts have a way of saying what set
> of characters they represent?

I guess so.

> 3) Should the available fonts in squeak be consistent
> in choice of encoding?

In an ideal world, yes. For practical reasons I think we have to deal  
with whatever we get.

> 4) Should Character class be refactored to reflect the
>  ability to choose different encodings?

No. Characters are not encoded, they represent Unicode values.

Or at least by default they are. We support some non-Unicode 16-bit
encodings for Asian languages, too, IIRC. Yoshiki would know best.

> 5) Should Character class be debugged to reflect
> Latin1 rather than MacRoman encodings?

Yes.

> If so what do you do about MacRoman?

Use the appropriate converter class.
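
As a rough analogue of what such a converter does, here is a sketch in Python rather than Squeak. The function name macroman_to_latin1 is hypothetical, and replacing unmappable characters with "?" is an assumption made for this example, not something the thread specifies.

```python
def macroman_to_latin1(data: bytes) -> bytes:
    """Transcode MacRoman bytes to Latin-1 bytes by way of Unicode."""
    # Step 1: interpret the bytes as MacRoman to get Unicode text.
    text = data.decode("mac_roman")
    # Step 2: encode as Latin-1; characters that Latin-1 lacks become "?".
    return text.encode("latin-1", errors="replace")


# MacRoman nbsp (byte 202) becomes Latin-1 nbsp (byte 160).
converted = macroman_to_latin1(b"a\xcab")
print(converted)
```

The point is that MacRoman is handled at the boundary, when bytes come in or go out, while characters inside the image keep their Unicode values.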

> I have enough knowledge to know these questions are
> significant to the well-being and maintenance of
> squeak. I am out of my depth in trying to suggest
> answers.
>
> It would be good if someone who understands the issue
> more deeply would formulate a mantis issue around it.

Sure. There's a whole lot still to do in that area.

- Bert -