Accented characters

Re: Accented characters

Koji Yokokawa
I struggled to use Japanese on Seaside recently.

The problem is not only about accented characters (Unicode). The root
cause is the lack of a fundamental facility, 'charset', in Seaside.
Charset is very important, especially in Asia. In reality, many Asian
sites use various local charsets, not Unicode.
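
To illustrate the point about local charsets, here is a minimal Python sketch. It is not part of the patch and does not use any Seaside or Squeak API; the sample text and the helper name decode_or_none are assumptions made for the example. It shows that the same Japanese text produces different bytes in UTF-8, Shift_JIS and EUC-JP, so a server has to know which charset is in use.

```python
# Illustrative only: the same Japanese text in three common charsets.
text = "日本語"  # "Japanese language"

# Encode the same characters with each charset.
encoded = {name: text.encode(name) for name in ("utf-8", "shift_jis", "euc_jp")}

for name, raw in encoded.items():
    # The byte length and the byte values differ per charset.
    print(name, len(raw), raw.hex())


def decode_or_none(raw: bytes, charset: str):
    """Hypothetical helper: return the decoded text, or None if the bytes
    are not valid in the given charset."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# Shift_JIS bytes are not valid UTF-8, so guessing the wrong charset fails.
wrong_guess = decode_or_none(encoded["shift_jis"], "utf-8")
right_guess = decode_or_none(encoded["shift_jis"], "shift_jis")
```

A wrong guess does not always fail this visibly; some charset pairs decode without error and simply produce garbled text, which is why the charset has to be declared rather than inferred.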

Umezawa-san and I made a patch for internationalization of
Seaside2/Squeak. This patch fixes the problems caused by charset
encodings (including Unicode) on Seaside2.6b1. It makes WAKom (NOT
WAKomEncoded) handle Charset, XHTML lang, encoded filenames and encoded URLs.

http://squeaksource.blueplane.jp/Seaside2I18N/
(This is a project on the SqueakSource in *Japanese*, however you can
load it into your image without any changes.)

I tested it by hand on SqueakLand3.8-05 and Squeak3.9b-7048 with UTF-8 and
several Japanese charsets. It worked well in every encoding I tried.

Character encoding is a basic facility of a web server, so I think
the patch should be merged into the main package of Seaside2.
(Seaside2I18N is branched from Seaside2.6b1-lr.52.)
Please take a look.


Koji

On Mon, 24 Jul 2006 10:48:04 -0700
Avi Bryant <[hidden email]> wrote:

>
> On Jul 24, 2006, at 7:45 AM, Damien Cassou wrote:
> >
> >
> > How is it possible that nobody cares about accented characters  
> > within Seaside ?
>
> Speaking for myself: I certainly care about them, but I use Squeak  
> 3.7.  The UTF-8 support from 3.8 mostly tends to complicate the issue  
> (of course, accented characters don't show up correctly in inspectors  
> etc, but that's a price I'm willing to pay).
>
> I'm surprised things are now broken in 3.8/3.9, however -  
> WAKomEncoded *used* to work, didn't it?
>
> Avi
> _______________________________________________
> Seaside mailing list
> [hidden email]
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

-- !
Koji Yokokawa <[hidden email]>
    http://yengawa.com/
    ^self new!

Re: Accented characters

Damien Cassou-3
Koji Yokokawa wrote:

> I struggled to use Japanese on Seaside recently.
>
> The problem is not only about accented characters (Unicode). The root
> cause is the lack of a fundamental facility, 'charset', in Seaside.
> Charset is very important, especially in Asia. In reality, many Asian
> sites use various local charsets, not Unicode.
>
> Umezawa-san and I made a patch for internationalization of
> Seaside2/Squeak. This patch fixes the problems caused by charset
> encodings (including Unicode) on Seaside2.6b1. It makes WAKom (NOT
> WAKomEncoded) handle Charset, XHTML lang, encoded filenames and encoded URLs.
>
> http://squeaksource.blueplane.jp/Seaside2I18N/
> (This is a project on the SqueakSource in *Japanese*, however you can
> load it into your image without any changes.)
>
> I tested it by hand on SqueakLand3.8-05 and Squeak3.9b-7048 with UTF-8 and
> several Japanese charsets. It worked well in every encoding I tried.
>
> Character encoding is a basic facility of a web server, so I think
> the patch should be merged into the main package of Seaside2.
> (Seaside2I18N is branched from Seaside2.6b1-lr.52.)
> Please take a look.


Hi,

Thank you for this. Why don't you put it on squeaksource.com? People
will be able to review it.


Thank you very much


Bye
Re: Accented characters

Philippe Marschall
In reply to this post by Koji Yokokawa
2006/7/27, Koji Yokokawa <[hidden email]>:

> I struggled to use Japanese on Seaside recently.
>
> The problem is not only about accented characters (Unicode). The root
> cause is the lack of a fundamental facility, 'charset', in Seaside.
> Charset is very important, especially in Asia. In reality, many Asian
> sites use various local charsets, not Unicode.
>
> Umezawa-san and I made a patch for internationalization of
> Seaside2/Squeak. This patch fixes the problems caused by charset
> encodings (including Unicode) on Seaside2.6b1. It makes WAKom (NOT
> WAKomEncoded) handle Charset, XHTML lang, encoded filenames and encoded URLs.

I think this should be done by WAKomEncoded instead of WAKom. WAKom is
supposed to do no conversion at all and thus effectively deals with
byte arrays rather than strings.

As I said before, for some people it's perfectly OK to have raw UTF-8
(or whatever encoding) strings in the image. Others even want it that
way.

> http://squeaksource.blueplane.jp/Seaside2I18N/
> (This is a project on the SqueakSource in *Japanese*, however you can
> load it into your image without any changes.)

The problem is that this is not at all portable. It will only work on
Squeak 3.9 with Kom. No other Squeak, no other Smalltalk, no other
HTTP server.

Philippe
Re: Accented characters

Koji Yokokawa
Hi,

On Thu, 27 Jul 2006 16:58:34 +0200
"Philippe Marschall" <[hidden email]> wrote:

> 2006/7/27, Koji Yokokawa <[hidden email]>:
> > I struggled to use Japanese on Seaside recently.
> >
> > The problem is not only about accented characters (Unicode). The root
> > cause is the lack of a fundamental facility, 'charset', in Seaside.
> > Charset is very important, especially in Asia. In reality, many Asian
> > sites use various local charsets, not Unicode.
> >
> > Umezawa-san and I made a patch for internationalization of
> > Seaside2/Squeak. This patch fixes the problems caused by charset
> > encodings (including Unicode) on Seaside2.6b1. It makes WAKom (NOT
> > WAKomEncoded) handle Charset, XHTML lang, encoded filenames and encoded URLs.
>
> I think this should be done by WAKomEncoded instead of WAKom. WAKom is
> supposed to do no conversion at all and thus effectively deals with
> byte arrays rather than strings.
>
> As I said before, for some people it's perfectly OK to have raw UTF-8
> (or whatever encoding) strings in the image. Others even want it that
> way.

I don't think so.
The encoding depends on the application (the session, to be exact), not
on the server. Therefore I added the 'charset' value as a property of an
application. As a result, the changes are scattered across the system. (Check
the changed methods with Monticello's 'Merge' button in your Seaside image.)
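
As a rough illustration of a per-application charset, here is a minimal Python sketch. It is not the patch itself and does not reflect Seaside's classes; Application, build_response and read_form_field are hypothetical names invented for the example. The idea shown is that two applications on the same server can each declare their own charset, which is then used both for the Content-Type header and for encoding and decoding.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical stand-in for a web application; not a Seaside class."""
    name: str
    charset: str = "UTF-8"  # per-application setting, not per-server


def build_response(app: Application, html: str):
    """Encode the page with the application's charset and advertise it."""
    body = html.encode(app.charset)
    headers = {
        "Content-Type": f"text/html; charset={app.charset}",
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body


def read_form_field(app: Application, raw: bytes) -> str:
    """Decode submitted bytes with the same charset the page was served in."""
    return raw.decode(app.charset)


# Two applications on one server, each with its own charset.
jp_app = Application("shop-jp", charset="Shift_JIS")
intl_app = Application("shop-intl")

jp_headers, jp_body = build_response(jp_app, "日本語")
intl_headers, intl_body = build_response(intl_app, "日本語")
```

Keeping the charset on the application object is what makes a single server-wide setting insufficient in this sketch: the same text yields different bytes and headers per application.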


>
> > http://squeaksource.blueplane.jp/Seaside2I18N/
> > (This is a project on the SqueakSource in *Japanese*, however you can
> > load it into your image without any changes.)
>
> The problem is that this is not at all portable. It will only work on
> Squeak 3.9 with Kom. No other Squeak, no other Smalltalk, no other
> HTTP server.

You're right.
I have no experience porting Seaside to other environments. Could
someone teach me the rules or idioms for making code portable in Seaside?


Koji

-- !
Koji Yokokawa <[hidden email]>
    http://yengawa.com/
    ^self new!

Re: Accented characters

Koji Yokokawa
In reply to this post by Damien Cassou-3
Hi,

On Thu, 27 Jul 2006 16:45:34 +0200
Damien Cassou <[hidden email]> wrote:


> Thank you for this. Why don't you put it on squeaksource.com? People
> will be able to review it.

I started it only for the Japanese community, but I agree with you now.
I'll put it on squeaksource.com when I have time.


Koji

-- !
Koji Yokokawa <[hidden email]>
    http://yengawa.com/
    ^self new!

Re: Accented characters

Philippe Marschall
In reply to this post by Koji Yokokawa
2006/7/27, Koji Yokokawa <[hidden email]>:

> Hi,
>
> On Thu, 27 Jul 2006 16:58:34 +0200
> "Philippe Marschall" <[hidden email]> wrote:
>
> > 2006/7/27, Koji Yokokawa <[hidden email]>:
> > > I struggled to use Japanese on Seaside recently.
> > >
> > > The problem is not only about accented characters (Unicode). The root
> > > cause is the lack of a fundamental facility, 'charset', in Seaside.
> > > Charset is very important, especially in Asia. In reality, many Asian
> > > sites use various local charsets, not Unicode.
> > >
> > > Umezawa-san and I made a patch for internationalization of
> > > Seaside2/Squeak. This patch fixes the problems caused by charset
> > > encodings (including Unicode) on Seaside2.6b1. It makes WAKom (NOT
> > > WAKomEncoded) handle Charset, XHTML lang, encoded filenames and encoded URLs.
> >
> > I think this should be done by WAKomEncoded instead of WAKom. WAKom is
> > supposed to do no conversion at all and thus effectively deals with
> > byte arrays rather than strings.
> >
> > As I said before, for some people it's perfectly OK to have raw UTF-8
> > (or whatever encoding) strings in the image. Others even want it that
> > way.
>
> I don't think so.
> The encoding depends on the application (the session, to be exact), not
> on the server. Therefore I added the 'charset' value as a property of an
> application. As a result, the changes are scattered across the system. (Check
> the changed methods with Monticello's 'Merge' button in your Seaside image.)

I think we are talking about different things. What I meant is the
following. Suppose you have an application that uses UTF-8 (or
whatever encoding) both externally and in the backend for the
database. The application never needs to query the size of strings in
number of characters and never directly indexes into the strings.

You now have two options. Either convert the strings that come into the
image (from database or web) to WideStrings, only to convert them back
to the original encoding when they go out of the image (to database or
web), or do no conversion at all. Sometimes the latter really is a valid
option.
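
The two options can be sketched in Python as follows. This is only an analogy under stated assumptions: Python's bytes and str stand in for raw encoded strings and WideStrings, and the function names round_trip and pass_through are made up for the example.

```python
# Bytes as they arrive from the web or the database ("café" in UTF-8).
raw_in = "caf\u00e9".encode("utf-8")


def round_trip(raw: bytes, charset: str = "utf-8") -> bytes:
    """Option 1: decode on the way in, encode again on the way out."""
    text = raw.decode(charset)   # real characters inside the program
    return text.encode(charset)  # back to the original encoding


def pass_through(raw: bytes) -> bytes:
    """Option 2: no conversion; the bytes are treated as opaque data."""
    return raw


# Both options send the same bytes back out.
same_output = round_trip(raw_in) == pass_through(raw_in)

# The difference only shows when the program counts or indexes characters.
char_count = len(raw_in.decode("utf-8"))  # number of characters
byte_count = len(raw_in)                  # number of bytes
```

Here the character count and the byte count differ, and slicing the raw bytes can cut a multi-byte character in half. That is why the pass-through option is only valid for an application that never measures or indexes its strings.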

> >
> > > http://squeaksource.blueplane.jp/Seaside2I18N/
> > > (This is a project on the SqueakSource in *Japanese*, however you can
> > > load it into your image without any changes.)
> >
> > The problem is that this is not at all portable. It will only work on
> > Squeak 3.9 with Kom. No other Squeak, no other Smalltalk, no other
> > HTTP server.
>
> You're right.
> I have no experience porting Seaside to other environments. Could
> someone teach me the rules or idioms for making code portable in Seaside?

Michel was our expert here, but it looks like Boris has taken over. So
they are probably better qualified. Some rules I learned:

1. Don't send #asString, send #displayString instead (exception: WAUrl).
2. Move platform-specific stuff to SeasidePlatformSupport.

Now, in your special case, I suggest the following contract:
We don't do any conversion (character encoding or decoding) in
Seaside. We do it in the server adapters. This should make porting
easier since they are platform specific anyway. This way we get rid
of all the TextConverters in Seaside (I don't think they are anywhere
near portable). In cases where we absolutely have to (probably WAUrl),
we move it to SeasidePlatformSupport.

Move the Kom specific stuff to Kom. We probably have to do a Kom for 3.9.

Let's keep it the way that WAKom does not do any de/encoding and do it
instead in WAKomEncoded.
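
A minimal Python sketch of this contract, assuming nothing about the real WAKom or WAKomEncoded code: the names plain_adapter, encoded_adapter and shout are hypothetical. The framework-side handler does no conversion; only the adapter that wraps it decides whether to decode and encode.

```python
def plain_adapter(handler):
    """Rough analogue of an adapter that does no conversion: bytes in, bytes out."""
    def serve(request_body: bytes) -> bytes:
        return handler(request_body)
    return serve


def encoded_adapter(handler, charset: str = "utf-8"):
    """Rough analogue of an encoding adapter: decode the request,
    encode the response; the handler only ever sees characters."""
    def serve(request_body: bytes) -> bytes:
        return handler(request_body.decode(charset)).encode(charset)
    return serve


def shout(data):
    """Framework-side code with no encoding logic; works on bytes or str."""
    return data.upper()


request = "caf\u00e9".encode("utf-8")

raw_reply = plain_adapter(shout)(request)        # non-ASCII bytes left untouched
decoded_reply = encoded_adapter(shout)(request)  # é is uppercased as a character
```

The handler is identical in both cases. Whether it receives raw bytes or decoded characters is decided entirely by the adapter, which is the platform-specific piece in this proposal.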

And ask Michel, Boris and Avi what they think about it.

Philippe