Iliad: problem with UTF-8 in text: what the heck?????


Iliad: problem with UTF-8 in text: what the heck?????

Bèrto ëd Sèra
Hi!

I'm positive this is just MY problem, since I see

  ot aufgabe
     text: 'Was zählt zur Hardware eines Computers?';

in Stephan's blog. Yet, when served by my box it renders as:

Was zählt zur...

What can this beast be?

Berto

--
==============================
Constitution du 24 juin 1793 - Article 35. - Quand le gouvernement
viole les droits du peuple, l'insurrection est, pour le peuple et pour
chaque portion du peuple, le plus sacré des droits et le plus
indispensable des devoirs.


_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Re: Iliad: problem with UTF-8 in text: what the heck?????

Paolo Bonzini-2
On 08/08/2009 09:44 AM, Bèrto ëd Sèra wrote:
> in Stephan's blog. Yet, when served by my box it renders as:
>
> Was zählt zur...

It's being parsed as ISO-8859-something.  Check the headers with
wireshark or something.

Paolo



Re: Iliad: problem with UTF-8 in text: what the heck?????

Stefan Schmiedl
In reply to this post by Bèrto ëd Sèra
On Sat, 8 Aug 2009 10:44:33 +0300
Bèrto ëd Sèra <[hidden email]> wrote:

> Hi!
>
> I'm positive this is just MY problem, since I see
>
>   ot aufgabe
>      text: 'Was zählt zur Hardware eines Computers?';
>
> in Stephan's blog. Yet, when served by my box it renders as:
>
> Was zählt zur...
>
> What can this beast be?

The input file you're quoting above is using ISO-8859-1 (in fact
windows-1252), as it's coming from a windoze box.

hmm... just now Paolo's reply popped up. Might be an opportunity
to ask what gst input should be encoded as? Are strings "just byte
arrays" or do we have encoders and a canonical internal representation?

I remember wondering about this back when I had this effect, too,
but then forgot about it as it disappeared with the "right" encoding
of the input file.

Anyways, the à is a dead giveaway, as it's the ISO-8859-1 representation
of one of the multibyte markers in UTF-8. So it could be two things:
- your browser uses ISO-8859 encoding when it should be using UTF-8
- your input was UTF-8 encoded but got parsed as ISO-8859
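
To make the "dead giveaway" concrete, here is a minimal sketch in Python (not gst code, just an illustration; the helper name `misread_as_latin1` is made up for this example). It encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1, which is what effectively happens in either of the two cases above:

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# 'ä' is U+00E4; written with an escape so the example does not depend
# on the encoding of this source file.
original = "Was z\u00e4hlt zur Hardware"
garbled = misread_as_latin1(original)

# In UTF-8, 'ä' is the two bytes C3 A4 ...
print("\u00e4".encode("utf-8").hex())               # c3a4
# ... and ISO-8859-1 shows C3 as 'Ã' (U+00C3) and A4 as '¤' (U+00A4).
print(garbled == "Was z\u00c3\u00a4hlt zur Hardware")  # True
```

The sketch cannot tell which side did the misreading (browser or parser); it only shows why the symptom always starts with Ã for characters in this range.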

s.




Re: Iliad: problem with UTF-8 in text: what the heck?????

Paolo Bonzini-2

> hmm... just now Paolo's reply popped up. Might be an opportunity
> to ask what gst input should be encoded as? Are strings "just byte
> arrays" or do we have encoders and a canonical internal representation?

Strings should match whatever the LC_* environment variables say.  If
you manually use EncodedStream and methods such as
#asString:/#asUnicodeString: you can use strings in whatever encoding
you want.
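
As a rough way to check what the LC_* variables actually ask for, here is a small Python sketch (an illustration only; it does not claim to reproduce how gst itself reads the locale, and `locale_codeset` is a hypothetical helper). It applies the POSIX precedence LC_ALL, then LC_CTYPE, then LANG, and pulls the codeset out of a name such as `de_DE.UTF-8`:

```python
import os
from typing import Mapping, Optional


def locale_codeset(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the codeset named by the locale variables, or None if none is named.

    Precedence for character handling is LC_ALL, then LC_CTYPE, then LANG;
    a variable that is set but empty counts as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            break
    else:
        return None  # nothing set: the default "C"/"POSIX" locale applies

    # A locale name looks like language[_territory][.codeset][@modifier].
    value = value.split("@", 1)[0]
    if "." not in value:
        return None  # e.g. "C", "POSIX" or "de_DE": no explicit codeset
    return value.split(".", 1)[1]


print(locale_codeset({"LANG": "de_DE.UTF-8"}))  # UTF-8
print(locale_codeset({"LC_CTYPE": "POSIX"}))    # None
```

A result of `None` means the environment names no codeset at all, so whatever reads the strings has no reason to assume UTF-8.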

> Anyways, the à is a dead giveaway, as it's the ISO-8859-1 representation
> of one of the multibyte markers in UTF-8. So it could be two things:
> - your browser uses ISO-8859 encoding when it should be using UTF-8
> - your input was UTF-8 encoded but got parsed as ISO-8859

Indeed.

Paolo



Re: Iliad: problem with UTF-8 in text: what the heck?????

Bèrto ëd Sèra
Hi!

Headers seem okay:
Hypertext Transfer Protocol
    GET /ambaradan HTTP/1.1\r\n
    Host: localhost:8080\r\n
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko/2009080620 Gentoo Firefox/3.0.11\r\n
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
    Accept-Language: en-gb,en;q=0.8,ru;q=0.7,it;q=0.5,fr;q=0.3,es;q=0.2\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: UTF-8,*\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Cookie: _iliad685744=viedbqwkee8t8plm00q451yqm1s7ae-u\r\n
    Cache-Control: max-age=0\r\n
    \r\n

It looks like there's something weird on my box... so I tried the manual version

((I18N.EncodedStream new nextPutAll: ' Feòrag NicBhrìde') asString)

This won't work because #nextPut: is subclassResponsibility...

((I18N.EncodedString fromString: ' Feòrag NicBhrìde') asUnicodeString)

will return ' Feòrag NicBhrìde'

ANYWAY....

just copying and pasting the correct text from BLOX will yield the very
same result. So if I copy the example from the Transcript and paste it
here I get:
((I18N.EncodedString fromString: ' Feòrag NicBhrìde') asUnicodeString)

Which leads me to think that I might have got something wrong with
encoding at compilation time. Or that I should not use experimental
locales on my development box (which is much more likely to be the
true case). I'll make a check on Fedora, where locales are absolutely
standard.

Berto




Re: Iliad: problem with UTF-8 in text: what the heck?????

Paolo Bonzini-2
On 08/08/2009 08:25 PM, Bèrto ëd Sèra wrote:
> Headers seem okay:
> Hypertext Transfer Protocol
>      GET /ambaradan HTTP/1.1\r\n
>      Host: localhost:8080\r\n
>      Accept-Charset: UTF-8,*\r\n

What's important is the reply headers.

> ((I18N.EncodedString fromString: ' Feòrag NicBhrìde') asUnicodeString)
>
> will return ' Feòrag NicBhrìde'

This means your locale is UTF-8 but your terminal is not, or something
like that.
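
One way to test that kind of guess is to reverse the suspected mix-up and see whether the text comes out clean. A minimal Python sketch (illustrative only, not gst code; `undo_latin1_misread` is a made-up name):

```python
def undo_latin1_misread(garbled: str) -> str:
    """Reverse a 'UTF-8 bytes shown as ISO-8859-1' mix-up.

    Raises UnicodeError when the input cannot be such a misread,
    e.g. already-correct text that contains 'ò'.
    """
    return garbled.encode("iso-8859-1").decode("utf-8")


# The garbled name from the thread, written with escapes so the example
# does not depend on this file's encoding:
# U+00C3 U+00B2 stands where 'ò' should be, U+00C3 U+00AC where 'ì' should be.
garbled = "Fe\u00c3\u00b2rag NicBhr\u00c3\u00acde"
repaired = undo_latin1_misread(garbled)
print(repaired == "Fe\u00f2rag NicBhr\u00ecde")  # True
```

If the round trip succeeds, the bytes underneath were valid UTF-8 all along and only the layer that displayed or interpreted them assumed ISO-8859-1. It says nothing about which layer that was.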

Paolo



Re: Iliad: problem with UTF-8 in text: what the heck?????

Bèrto ëd Sèra
Hi all,

The problem persists, and I'm running out of ideas. I verified it on 3
boxes with different locale settings by now, so I'm progressively more
certain that it's me doing something stupid. Only, I have no clue at
what.

A public install of the UI prototype is here:
http://ktf1-itfl.urz.uni-bamberg.de/ambaradan

I do NOT advise looking at it with anything other than Mozilla 3 or
later (3.5.2 is what it's being built for), as I'm investing no time
in checking browser compatibility; there is loads of CSS involved
and chances are it's gonna look very funny on any other browser. It's
just a quick mockup at this point. The red buttons you see are to be
used to open (not just to toggle, as happens now) the string
localizer, so that you can localize the GUI while using it. Red
means there is no string, Yellow will mark fuzzy translations (msg
changed upstream). Normally there will be no small ball when the
string is present, although authorized users will be able to set a
flag that will show a green one, to edit msgs that need adjustment,
even when they are marked as present and verified.

What will sadly remain on just any browser is the string at the
bottom, with the broken chars in the guy's name.

I have about 3 weeks to fix it, because the prototype must be
usable at least for LTR languages in mid-September, at a public
conference in Ireland... and I'm really running out of ideas.

Berto

2009/8/8 Bèrto ëd Sèra <[hidden email]>:

> Reply headers are a bit mixed:
>
> HTTP/1.1 200 OK
> Server: Swazoo 2.2 Smalltalk Web Server
> Connection: keep-alive
> expires: Sat, 08 Aug 2009 23:16:44 GMT
> Cache-Control: no-store, no-cache, must-revalidate
> Allow: OPTIONS,GET,HEAD,POST,DELETE,TRACE,PROPFIND,PROPPATCH,MKCOL,PUT,COPY,MOVE,LOCK,UNLOCK
> Content-Type: text/html
> Set-Cookie: _iliad685744=w0nu5v9semja4-e5f6d9gdp-_s0whs5e; path=/; expires=Thu, 31 Mar 2011 0:0:0 GMT
> Date: Sat, 08 Aug 2009 20:16:44 GMT
> Content-Length: 5888
>
> <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html....... etc
>
> This is Swazoo output for the page, and I kind of see no UTF-8 mention
> in it. The text opens with an XML declaration for UTF-8, but that may
> be too late...
>
> YET...
>
> Hypertext Transfer Protocol
>    HTTP/1.1 404 Not Found\r\n
>    Server: Swazoo 2.2 Smalltalk Web Server\r\n
>    Connection: keep-alive\r\n
>    Content-Type: text/html; charset=utf-8\r\n
>    Date: Sat, 08 Aug 2009 17:32:31 GMT\r\n
>    Content-Length: 46\r\n
>    \r\n
> Line-based text data: text/html
>
> Which should be okay (it's the message sent for not finding the
> gnu-smalltalk icon, that really is missing).
>
>> This means your locale is UTF-8 but your terminal is not, or something like
>> that.
> I get the same effect when opening the page on the laptop through
> wi-fi, the laptop runs Fedora and has no weird locales on. It seems to
> happen server-side.
>
> Berto
>
Fwd: Iliad: problem with UTF-8 in text: what the heck?????

Bèrto ëd Sèra
In reply to this post by Paolo Bonzini-2
I had hit the wrong reply button for this message, as I always seem to
do... I hate gmail :(


---------- Forwarded message ----------
From: Bèrto ëd Sèra <[hidden email]>
Date: 2009/8/8
Subject: Re: [Help-smalltalk] Iliad: problem with UTF-8 in text: what
the heck?????
To: Paolo Bonzini <[hidden email]>


Reply headers are a bit mixed:

HTTP/1.1 200 OK
Server: Swazoo 2.2 Smalltalk Web Server
Connection: keep-alive
expires: Sat, 08 Aug 2009 23:16:44 GMT
Cache-Control: no-store, no-cache, must-revalidate
Allow: OPTIONS,GET,HEAD,POST,DELETE,TRACE,PROPFIND,PROPPATCH,MKCOL,PUT,COPY,MOVE,LOCK,UNLOCK
Content-Type: text/html
Set-Cookie: _iliad685744=w0nu5v9semja4-e5f6d9gdp-_s0whs5e; path=/; expires=Thu, 31 Mar 2011 0:0:0 GMT
Date: Sat, 08 Aug 2009 20:16:44 GMT
Content-Length: 5888

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html....... etc

This is Swazoo output for the page, and I kind of see no UTF-8 mention
in it. The text opens with an XML declaration for UTF-8, but that may
be too late...
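
The difference between the two replies can be checked mechanically. The following Python sketch (an illustration using only the standard library; `declared_charset` is a hypothetical helper, and nothing here shows how Swazoo or Iliad set the header) extracts the charset parameter from a Content-Type value:

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset()


print(declared_charset("text/html"))                 # None
print(declared_charset("text/html; charset=utf-8"))  # utf-8
```

When the header declares no charset, the browser has to pick an encoding some other way, and a fallback to ISO-8859-1 or windows-1252 would produce exactly the Ã garbage seen on the page.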

YET...

Hypertext Transfer Protocol
   HTTP/1.1 404 Not Found\r\n
   Server: Swazoo 2.2 Smalltalk Web Server\r\n
   Connection: keep-alive\r\n
   Content-Type: text/html; charset=utf-8\r\n
   Date: Sat, 08 Aug 2009 17:32:31 GMT\r\n
   Content-Length: 46\r\n
   \r\n
Line-based text data: text/html

Which should be okay (it's the message sent for not finding the
gnu-smalltalk icon, that really is missing).

> This means your locale is UTF-8 but your terminal is not, or something like
> that.
I get the same effect when opening the page on the laptop through
wi-fi, the laptop runs Fedora and has no weird locales on. It seems to
happen server-side.

Berto


Re: Iliad: problem with UTF-8 in text: what the heck?????

Bèrto ëd Sèra
In reply to this post by Bèrto ëd Sèra
Forgot to add, the public box you see has the following locale (all
the others I tried have different UTF-8 locales set)

root@ktf1:~# locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

hmmm, I just realized that I'm running gst as root... okay it's an
empty testbox, but still...
Bèrto



Re: Iliad: problem with UTF-8 in text: what the heck?????

Paolo Bonzini-2
In reply to this post by Bèrto ëd Sèra
On 08/20/2009 01:22 PM, Bèrto ëd Sèra wrote:

> Hi all,
>
> The problem persists, and I'm running out of ideas. I verified it on 3
> boxes with different locale settings by now, so I'm progressively more
> certain that it's me doing something stupid. Only, I have no clue at
> what.
>
> A public install of the UI prototype is here:
> http://ktf1-itfl.urz.uni-bamberg.de/ambaradan
>
> I do NOT advise looking at it with anything other than Mozilla 3 or
> later (3.5.2 is what it's being built for), as I'm investing no time
> in checking browser compatibility; there is loads of CSS involved
> and chances are it's gonna look very funny on any other browser. It's
> just a quick mockup at this point. The red buttons you see are to be
> used to open (not just to toggle, as happens now) the string
> localizer, so that you can localize the GUI while using it. Red
> means there is no string, Yellow will mark fuzzy translations (msg
> changed upstream). Normally there will be no small ball when the
> string is present, although authorized users will be able to set a
> flag that will show a green one, to edit msgs that need adjustment,
> even when they are marked as present and verified.
>
> What will sadly remain on just any browser is the string at the
> bottom, with the broken chars in the guy's name.

And how am I supposed to use it to see the bug?

Step by step please.  Click here, type this, click there.

Paolo



Re: Iliad: problem with UTF-8 in text: what the heck?????

Bèrto ëd Sèra
LOL, sorry, I supposed it was self-explanatory :)

just look at the foot of the document. The following message:
    footerContent [
        ^[ :e |
            | divFooter divFooterText |
            divFooter := e div id: 'footer'.
            divFooterText := divFooter div id: 'footer-text'.
            divFooterText build: (self localize: #credits).
            (divFooterText anchor)
              href: 'http://voxhumanitatis.org';
              text: ' Vox Humanitatis'.
            divFooterText break.
            divFooterText build: (self localize: #fontsCredits).
            (divFooterText anchor)
              href: 'http://www.antipope.org/feorag/freestuff/strangenewes.html';
              text: ' Feòrag NicBhrìde'.
            divFooterText break.
            divFooterText image source: '/images/vh.jpeg' alternativeText: 'Vox Humanitatis'.
            divFooterText image source: '/images/gst_medium.png' alternativeText: 'GNU Smalltalk'.
            divFooterText image source: '/images/iliad_medium.png' alternativeText: 'Iliad Framework'.
            divFooterText image source: '/images/mercurial.jpeg' alternativeText: 'Mercurial'.
            divFooterText image source: '/images/twisted.jpeg' alternativeText: 'Twisted'.
            divFooterText image source: '/images/postgresql.jpeg' alternativeText: 'PostgreSql'.
            divFooterText image source: '/images/freenet.jpeg' alternativeText: 'Freenet'.
            divFooterText image source: '/images/gentoo.jpeg' alternativeText: 'Gentoo Linux'.  ]
    ]


renders as:
The free fonts used on this site have been designed by
Feòrag NicBhrìde

Bèrto


Re: Iliad: problem with UTF-8 in text: what the heck?????

Paolo Bonzini-2
On 08/20/2009 01:45 PM, Bèrto ëd Sèra wrote:
> LOL, sorry, I supposed it was self-explanatory :)

Oops, yes, I was overwhelmed by the UI.

Paolo



Re: Iliad: problem with UTF-8 in text: what the heck?????

Stefan Schmiedl
In reply to this post by Bèrto ëd Sèra
On Thu, 20 Aug 2009 14:45:30 +0300
Bèrto ëd Sèra <[hidden email]> wrote:

>             (divFooterText anchor)
>               href:
> 'http://www.antipope.org/feorag/freestuff/strangenewes.html';
>               text: ' Feòrag NicBhrìde'.
>
>
> renders as:
> The free fonts used on this site have been designed by
> Feòrag NicBhrìde

Please try converting your .st file from UTF-8 to ISO8859-1
and reload the app. I'm curious what will happen.

iconv -f utf-8 -t iso8859-1 < test.st > test2.st
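
Where iconv is not at hand, roughly the same conversion can be sketched in Python (an assumption-laden stand-in, not part of gst; `reencode_file` is a made-up name, and the file names mirror the command above):

```python
from pathlib import Path


def reencode_file(src: str, dst: str,
                  from_enc: str = "utf-8", to_enc: str = "iso-8859-1") -> None:
    """Write dst as a copy of src converted from from_enc to to_enc."""
    data = Path(src).read_bytes()  # bytes, so line endings are left untouched
    # decode() fails if src is not valid from_enc; encode() fails if a
    # character has no representation in to_enc (e.g. '€' in ISO-8859-1).
    Path(dst).write_bytes(data.decode(from_enc).encode(to_enc))
```

Calling `reencode_file("test.st", "test2.st")` corresponds to the iconv line. In the result, 'ò' is the single byte F2 instead of the pair C3 B2.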

s.

