VAST and UTF-8 / bug in AbtCodePageConverter?


VAST and UTF-8 / bug in AbtCodePageConverter?

jtuchel
I know, this is an issue that has been discussed so many times that no one can stand it any more.

But I have to: I am using VAST with Seaside and Ajax, so I have to handle UTF-8 form data that is serialized by Ajax calls. There seems to be no *working* way to convince jQuery or the browser to send an Ajax request in any charset other than UTF-8.

So while I have found a way to make hand-written JavaScript code encode form data in iso-8859-1 before it is sent to the server, there is one serious drawback: I must bypass all the server-side JavaScript helpers such as JQAjax>>serializeForm and friends. These methods are all useless in the VAST context if you need to be prepared for non-US-ASCII input.

For a little search field I decided to do the simplest thing that could possibly work (STTCPW), and therefore do a server-side conversion using this snippet:

AbtCodePageConverter current
            convert: anObject
            fromCodePage: (AbtAbstractCodePageConverter codePageFromCharacterSet: 'UTF-8')
            toCodePage: (AbtCodePageConverter codePageFromCharacterSet:  'iso-8859-15')


This does, however, yield a String with trailing zero bytes, because it internally uses a buffer size of 4 * the length of the String to be converted.

I think that is wrong. The result of a conversion is not a String with trailing null bytes!

It is not hard to get around this, like so:

        (AbtCodePageConverter current
            convert: anObject
            fromCodePage: (AbtAbstractCodePageConverter codePageFromCharacterSet: 'UTF-8')
            toCodePage: (AbtCodePageConverter codePageFromCharacterSet:  'iso-8859-15')) reject: [:ch| ch codePoint = 0]

but that is a dirty hack and should be hidden inside convert:fromCodePage:toCodePage:.
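A slightly narrower variant of the workaround, as a sketch only: instead of rejecting every NUL character, it cuts off just the padding at the end of the result. It uses the same converter calls as the snippets above plus plain collection messages; the temporaries `converted` and `end` are illustrative names, and `anObject` is assumed to hold the UTF-8 input as before.

```smalltalk
"Sketch, not part of VAST: convert as above, then drop only the
 trailing NUL padding instead of every NUL in the String."
| converted end |
converted := AbtCodePageConverter current
    convert: anObject
    fromCodePage: (AbtAbstractCodePageConverter codePageFromCharacterSet: 'UTF-8')
    toCodePage: (AbtCodePageConverter codePageFromCharacterSet: 'iso-8859-15').
end := converted size.
"Walk back from the end while the last character is a NUL."
[end > 0 and: [(converted at: end) codePoint = 0]]
    whileTrue: [end := end - 1].
converted copyFrom: 1 to: end
```

For ISO-8859 text the result should be the same as with reject:, since such text does not normally contain NUL characters; the difference only matters if a NUL inside the String is meaningful.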

Joachim

--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at http://groups.google.com/group/va-smalltalk.
For more options, visit https://groups.google.com/d/optout.

Re: VAST and UTF-8 / bug in AbtCodePageConverter?

jtuchel
This whole thing makes me really sick!

Now I've deployed the image on our Linux server and the conversion doesn't work there. Neither to iso-8859-1 nor to currentCodePage. 

This is so frustrating!
So I'll have to find a way to bypass jQuery's serialize() once more for this component. It's really time VAST learns to handle UTF-8...

Joachim


(In reply to Joachim Tuchel's post of Tuesday, 27 May 2014, 08:48:18 UTC+2.)

Re: VAST and UTF-8 / bug in AbtCodePageConverter?

Marten Feldtmann-2
In reply to this post by jtuchel
Well, as a first answer, here is how to use the conversion and get rid of the zero characters:

mskFromUTF8AsCCPString
    "^<String>"
   
     ^( AbtCodePageConverter current
            convert: self asString
            fromCodePage: AbtCodePageConverter utf8CodePage
            toCodePage: AbtCodePageConverter currentCodePage
        ) trimNull
 
What I found useful: either keep your application data entirely in the current code page or stay with Unicode16. UTF-8 should only be used for importing data into, or exporting data from, your application.


Re: VAST and UTF-8 / bug in AbtCodePageConverter?

jtuchel
Marten,

trimNull looks good and works nicely.

As for your advice: that is exactly what I do. I force both DB2 and the web browser to use ISO-8859-1 (or -15, to be exact). The problem is that if I use jQuery to serialize form data, it will always be UTF-8, no matter what I tell the ajax or ajaxSetup object. So there is this one weak link in the chain that throws the whole thing off.

It is very frustrating, especially given that the code page conversion you suggest breaks with OS Error 1 on our production system (see the separate thread in this group).

Joachim



(In reply to Marten Feldtmann's post of Wednesday, 28 May 2014, 15:28:44 UTC+2.)