character encoding again

Sten Kvamme
Hello,

I'm new here and would like to introduce myself; skip to the second
paragraph for the question. I have been programming for 25 years but
never in Smalltalk. It was Seaside that made me sit down and give it a
try. However, the whole idea of programming in a running system takes
some time to grasp. My current programming languages are Java, C and
Prolog. I live in the south-west of Sweden.

I have browsed through the archives but couldn't find anything about
converting accented characters to "HTML encoding". I would like the
Swedish Å (an A with a ring) to be converted to &Aring; in the HTML
source, just as & is converted to &amp;.

This is done in the WAAbstractHtmlBuilder class method initialize, if
I am not mistaken. If I add my own accented character there, how can
I get the change to take effect? (Sorry for this newbie kind of
question.)

Code snippet:
        #($" 'quot' $< 'lt' $& 'amp' $> 'gt' $Å 'Aring') pairsDo:
                [:c :s | HtmlCharacters at: (c asInteger + 1) put: ('&',s,';') ]

Thanks,
Stenis
_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: character encoding again

Jason Johnson-3
Are you running WAKomEncoded39 instead of the default that Seaside
picks (WAKom, I think)? If not, your letters will show up as a ?.

Re: character encoding again

Sten Kvamme

On Mar 5, 2007, at 6:55 , Jason Johnson wrote:

> Are you running WAEncoded39 (or something like that) instead of the  
> default Seaside picks (WACom I think)?  If you don't then your  
> letters will show up as a ?.

Yes, WAKomEncoded39. The character is there in the output, not a
question mark. But I want it to be encoded, to be sure it will show up
correctly in all kinds of browsers. I want Å to be &Aring;, and if I
could only find out how to "cold start" Seaside so that the
WAAbstractHtmlBuilder class method initialize gets invoked, I should
be fine. I am on very thin ice here; I don't really know how the
system works, so "cold start" is probably not in your vocabulary.




Re: character encoding again

Lukas Renggli
> Yes, WAEncoded39. The character is there in the output, not a
> questionmark. But I want it to be encoded to be sure it will show up
> correctly in all kind of browsers. I want Å to be &Aring; and If I
> only could find out how to "cold start" Seaside in order to have the
> WAAbstractHtmlBuilder class method initialize to be invoked I should
> be fine. I am on very thin ice here, I don't really know how the
> system works so "cold start" is probably not in your vocabulary.

Yes, the last "cold start" happened more than 25 years ago. I am
afraid that I wasn't even born by then ...

To answer your question: evaluate

     WAAbstractHtmlBuilder initialize

which will rebuild the encoding table with your changes.
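
For anyone else new to working in a live image: saving an edited
class-side initialize method in the browser does not run it, so a class
variable such as HtmlCharacters keeps its old contents until the method
is evaluated again. The standalone Workspace sketch below mimics the
kind of lookup table discussed here, so the effect can be tried without
touching Seaside. The names table and encode are made up for
illustration, and storing one-character strings for the unescaped
entries is a simplification, not necessarily what Seaside does.

```smalltalk
"Workspace sketch (hypothetical names, not Seaside code):
 a 256-entry table indexed by character code + 1."
| table encode |
table := Array new: 256.

"Default: every character maps to itself (stored as a one-character string)."
0 to: 255 do: [:code |
    table at: code + 1 put: (String with: (Character value: code))].

"Override the characters that need escaping."
#($" 'quot' $< 'lt' $& 'amp' $> 'gt') pairsDo: [:char :name |
    table at: char asInteger + 1 put: '&', name, ';'].

"Encode a string by looking up each character in the table."
encode := [:aString |
    String streamContents: [:out |
        aString do: [:char | out nextPutAll: (table at: char asInteger + 1)]]].

encode value: 'a < b & c'   "print it: 'a &lt; b &amp; c'"
```

Changing the literal array and running the snippet again rebuilds the
table, which is the Workspace analogue of evaluating the class-side
initialize a second time.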

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

Re: character encoding again

Sten Kvamme
> To answer your question: Evaluate
>
>     WAAbstractHtmlBuilder initialize
>
> what will rebuild the updated encoding table.

Thanks Lucas,

It works great! I'm blushing; the answer was right under my nose
(as usual). Anyway, here is the HTML encoding I am using for the
upper half of the Squeak character set (only a couple of characters
are missing). You have probably done this already.


initialize
        "WAHtmlBuilder initialize"
        HtmlCharacters _ Array new: 256.
       
        0 to: 255 do: [:ea | HtmlCharacters at: ea + 1 put: ea asCharacter].
       
        #($" 'quot'
        $< 'lt'
        $& 'amp'
        $> 'gt'
        $¡ 'iexcl'
        $¢ 'cent'
        $£ 'pound'
        $€ 'euro'
        $¥ 'yen'
        $§ 'sect'
        $© 'copy'
        $ª 'ordf'
        $« 'laquo'
        $¬ 'not'
        $® 'reg'
        $¯ 'macr'
        $° 'deg'
        $± 'plusmn'
        $µ 'micro'
        $¶ 'para'
        $· 'middot'
        $º 'ordm'
        $» 'raquo'
        $¿ 'iquest'
        $À 'Agrave'
        $Á 'Aacute'
        $Â 'Acirc'
        $Ã 'Atilde'
        $Ä 'Auml'
        $Å 'Aring'
        $Æ 'AElig'
        $Ç 'Ccedil'
        $È 'Egrave'
        $É 'Eacute'
        $Ê 'Ecirc'
        $Ë 'Euml'
        $Ì 'Igrave'
        $Í 'Iacute'
        $Î 'Icirc'
        $Ï 'Iuml'
        $Ñ 'Ntilde'
        $Ò 'Ograve'
        $Ó 'Oacute'
        $Ô 'Ocirc'
        $Õ 'Otilde'
        $Ö 'Ouml'
        $Ø 'Oslash'
        $Ù 'Ugrave'
        $Ú 'Uacute'
        $Û 'Ucirc'
        $Ü 'Uuml'
        $ß 'szlig'
        $à 'agrave'
        $á 'aacute'
        $â 'acirc'
        $ã 'atilde'
        $ä 'auml'
        $å 'aring'
        $æ 'aelig'
        $ç 'ccedil'
        $è 'egrave'
        $é 'eacute'
        $ê 'ecirc'
        $ë 'euml'
        $ì 'igrave'
        $í 'iacute'
        $î 'icirc'
        $ï 'iuml'
        $ñ 'ntilde'
        $ò 'ograve'
        $ó 'oacute'
        $ô 'ocirc'
        $õ 'otilde'
        $ö 'ouml'
        $÷ 'divide'
        $ø 'oslash'
        $ù 'ugrave'
        $ú 'uacute'
        $û 'ucirc'
        $ü 'uuml'
        $ÿ 'yuml') pairsDo:
                [:c :s | HtmlCharacters at: (c asInteger + 1) put: ('&',s,';') ]

Re: character encoding again

Sten Kvamme

On Mar 5, 2007, at 10:22 , Sten Kvamme wrote:

>> To answer your question: Evaluate
>>
>>     WAAbstractHtmlBuilder initialize
>>
>> what will rebuild the updated encoding table.
>>
>> Cheers,
>> Lukas
>>

Thanks again, and sorry for misspelling your name, Lukas.

I got some friendly advice on how to publish code (thanks). Here's an
.st file with the extended character encoding:


Attachment: WAAbstractHtmlBuilder class-initialize.st (1K)

Re: character encoding again

Lukas Renggli
In reply to this post by Sten Kvamme
> It works great! I'm blushing, the answer was even right under my nose
> (as usual). Anyway, Here is the HTML encoding I am using for the
> upper part of the Squeak character set (only a couple of characters
> are missing). You have probably done this already.

I wouldn't do that; I am sure you will sooner or later run into trouble.

In my experience the easiest (but certainly not the only) way to make
things work is the following:

- leave WAHtmlBuilder class>>#initialize as it is
- use WAKom instead of anything that does a special encoding
- use #isoToUtf8 on all strings you define in your image
- strings that come from the outside (requests) are already utf-8
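
As a rough illustration of the third point, the Workspace sketch below
converts an image-side string with #isoToUtf8 and looks at the
resulting byte values. It assumes a Squeak 3.8/3.9 image in which
String>>#isoToUtf8 is available (the selector named above). The
variable names are illustrative, and Å is built from its code 197 so
the snippet does not depend on how the source text itself is encoded.

```smalltalk
"Workspace sketch, assuming String>>#isoToUtf8 exists in the image."
| latin1 utf8 |
latin1 := String with: (Character value: 197).   "Å in ISO 8859-1"
utf8 := latin1 isoToUtf8.

"A Latin-1 character above 127 becomes two bytes in UTF-8."
utf8 size.                                       "print it: 2"
utf8 asArray collect: [:each | each asInteger]   "print it: #(195 133)"
```

Plain ASCII strings pass through unchanged, so only strings containing
characters above 127 actually need the conversion.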

Cheers,
Lukas


Re: character encoding again

Philippe Marschall
2007/3/5, Lukas Renggli <[hidden email]>:

> > It works great! I'm blushing, the answer was even right under my nose
> > (as usual). Anyway, Here is the HTML encoding I am using for the
> > upper part of the Squeak character set (only a couple of characters
> > are missing). You have probably done this already.
>
> I wouldn't do that and I am sure you will sooner or later run into troubles.
>
> In my experience the easiest (but certainly not the only) way to make
> things work is the following:
>
> - leave WAHtmlBuilder class>>#initialize as it is
> - use WAKom instead of anything that does a special encoding

- use Squeak 3.8, you can't use WAKom on Squeak 3.9

> - use #isoToUtf8 on all strings you define in your image
> - strings that come from the outside (requests) are already utf-8

Re: character encoding again

Sten Kvamme
In reply to this post by Lukas Renggli

On Mar 5, 2007, at 11:42 , Lukas Renggli wrote:

>
> I wouldn't do that and I am sure you will sooner or later run into  
> troubles.

I get it. Accented characters typed by the user into a browser
text field will be encoded in the wrong way.

Thanks.
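
A small Workspace sketch of that failure mode, assuming the server
hands request data to the application as raw UTF-8 bytes (as described
for WAKom in this thread). Å then arrives as the two bytes 195 and 133,
and a per-character entity table treats each byte as a separate
Latin-1 character. The table here is a hypothetical two-entry stand-in
for the full one posted earlier, and the variable names are made up.

```smalltalk
"Workspace sketch (hypothetical names): an entity table applied to UTF-8 input."
| table utf8 out |
table := Array new: 256.
0 to: 255 do: [:code |
    table at: code + 1 put: (String with: (Character value: code))].
table at: 197 + 1 put: '&Aring;'.    "Å in Latin-1"
table at: 195 + 1 put: '&Atilde;'.   "Ã in Latin-1"

"The UTF-8 encoding of Å is the byte pair 195, 133."
utf8 := String with: (Character value: 195) with: (Character value: 133).

out := String streamContents: [:stream |
    utf8 do: [:char | stream nextPutAll: (table at: char asInteger + 1)]].

out first: 8   "print it: '&Atilde;'"
```

The first byte is replaced by the entity for Ã and the second byte is
left dangling, so the browser no longer sees a valid UTF-8 sequence
for Å.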


Re: character encoding again

cbeler
In reply to this post by Lukas Renggli
Hi

I think I need some clarification...

>
> I wouldn't do that and I am sure you will sooner or later run into
> troubles.
>
> In my experience the easiest (but certainly not the only) way to make
> things work is the following:
>
> - leave WAHtmlBuilder class>>#initialize as it is
> - use WAKom instead of anything that does a special encoding
> - use #isoToUtf8 on all strings you define in your image
> - strings that come from the outside (requests) are already utf-8
>
So instead of using WAKomEncoded39, we can use WAKom and convert
image strings to UTF-8.
Is that correct?
What about performance? Is it better with WAKom/#isoToUtf8 or with
WAKomEncoded39 (or equal :) )?

Thanks

Cédrick

Re: character encoding again

Philippe Marschall
2007/3/5, Cédrick Béler <[hidden email]>:

> So instead of using WAKomEncoded39.... we can use WAKom and converts
> image strings in UTF8.
> Is it correct ?
No, WAKom does no conversion whatsoever.

> What about the performance ? better with WAKom/#isoToUtf8 or with
> WAKomEncoded39 (or equal :) ) ?

Premature optimization. Performance should play no role in your
decision between WAKom and WAKomEncoded.

You don't have this choice in Squeak 3.9. In Squeak 3.9 WAKomEncoded39
is the only thing that works.

Philippe


Re: character encoding again

Sten Kvamme
In reply to this post by Philippe Marschall

On Mar 5, 2007, at 11:48 , Philippe Marschall wrote:

>
> - use Squeak 3.8, you can't use WAKom on Squeak 3.9
>
>> - use #isoToUtf8 on all strings you define in your image
>> - strings that come from the outside (requests) are already utf-8
>>
>> Cheers,
>> Lukas


I am now using Squeak 3.9 with WAKom, and isoToUtf8 on strings defined
in the image. Accented characters are displayed correctly both from
strings defined in the image and from strings entered in browser text
fields.

What problem have you seen with Squeak 3.9 and WAKom?

Thanks,
Sten

Re: character encoding again

Philippe Marschall
2007/3/5, Sten Kvamme <[hidden email]>:

> I am now using Squeak 3.9 and WAKom and isoToUtf8 on strings defined
> in the image. Accented characters are displayed correctly both from
> in_the_image defined strings and strings from browser input textfields.
>
> What problem have you seen with Squeak 3.9 and WAKom?

- Put some Korean text into a text field.
- Submit the form.

Philippe


Re: character encoding again

Lukas Renggli
> > I am now using Squeak 3.9 and WAKom and isoToUtf8 on strings defined
> > in the image. Accented characters are displayed correctly both from
> > in_the_image defined strings and strings from browser input textfields.
> >
> > What problem have you seen with Squeak 3.9 and WAKom?
>
> - Put some Korean text into a text field.
> - Submit the form

Why don't you just commit the version of KomHttpServer we fixed for
3.9? Then nobody would need those ugly back-and-forth encoding
transformations anymore.

Cheers,
Lukas


Re: character encoding again

Philippe Marschall
2007/3/5, Lukas Renggli <[hidden email]>:

> Why don't you just commit the version of KomHttpServer we fixed for
> 3.9? Then nobody would need those ugly back and forth encoding
> transformations anymore.

Because I'm not the maintainer of Kom.

Philippe


Re: character encoding again

Lukas Renggli
> > Why don't you just commit the version of KomHttpServer we fixed for
> > 3.9? Then nobody would need those ugly back and forth encoding
> > transformations anymore.
>
> Because I'm not the maintainer of Kom.

I know. I suggest we just fork. Hacking around Kom has to stop.


Re: character encoding again

Philippe Marschall
2007/3/5, Lukas Renggli <[hidden email]>:
> I know. I suggest just to fork. Hacking around Kom has to stop.

I don't feel like maintaining a fork of Kom either.

Philippe
