Hey,
it seems to me that Zinc, out of the box, converts from/to UTF-8. How can I tell Zinc to do NO conversion, so that ZnStringEntity leaves its strings as they are? I am fighting with German umlauts here.

Marten

--
Marten Feldtmann
Hi Marten,
I will need (much) more detail: what are you trying to do that, according to you, is not working?

As far as I know, Zinc HTTP Components does the right thing and can be configured to do almost anything you want. It mostly depends on your mime types and their charset options.

Sven

On 09 Apr 2014, at 15:20, [hidden email] wrote:
> [...] How can I tell Zinc to do NO conversion and that ZnStringEntity
> should leave their strings as they are ..
On 09.04.2014 at 15:29, Sven Van Caekenberghe wrote:
> As far as I know Zinc HTTP Components does the right thing and can be
> used (configured) to do almost anything you want. It mostly depends on
> your mime types and their charset options.

The browser sends UTF-8 data, and in my application code I get instances of ZnStringEntity whose contained string has been converted to (?) ISO-8859-1 (?) or CP-1252 (?). This seems to be because the entity instance always has a ZnUTF8Encoder to do the conversion.

I would like to have UTF-8 everywhere, without all these conversions.

Marten
On 09.04.2014 at 17:29, [hidden email] wrote:
> The browser sends UTF8 data and in my application code I get instances
> of ZnStringEntity and the contained string is converted to (?) ISO8859-1
> (?) or CP-1252 (?). This seems to be due to the fact, that the entity
> instance always has a ZnUTF8Encoder to do the conversion.
>
> I would like to have UTF8 everywhere ... without all these conversions …

Norbert
On 09 Apr 2014, at 17:29, [hidden email] wrote:
> The browser sends UTF8 data and in my application code I get instances
> of ZnStringEntity and the contained string is converted to (?) ISO8859-1
> (?) or CP-1252 (?). This seems to be due to the fact, that the entity
> instance always has a ZnUTF8Encoder to do the conversion.

I still don't understand the problem, but consider this:

    ZnServer startDefaultOn: 1701.

    ZnClient new
        url: 'http://localhost:1701/echo';
        entity: (ZnEntity with: 'An der schönen blauen Donau');
        post.

    ZnClient new
        url: 'http://localhost:1701/echo';
        entity: (ZnEntity
            with: 'An der schönen blauen Donau'
            type: (ZnMimeType textPlain charSet: #'iso-8859-1'; yourself));
        post;
        yourself.

In the first case, a UTF-8 encoded string is POST-ed and correctly returned (in a UTF-8 encoded response).

In the second case, an ISO-8859-1 encoded string is POST-ed and correctly returned (in a UTF-8 encoded response).

In both cases the decoding was done correctly, using the specified charset (if that is missing, the ZnNullEncoder is used).

Now, ö is not a perfect test example because its encoding value in Unicode, 246 decimal, U+00F6 hex, still fits in 1 byte and hence survives null encoding/decoding. That is why the following still works, although it is wrong to drop the charset.

    ZnClient new
        url: 'http://localhost:1701/echo';
        entity: (ZnEntity
            with: 'An der schönen blauen Donau'
            type: (ZnMimeType textPlain clearCharSet; yourself));
        post;
        yourself.

HTH,

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill
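The point about ö can be made concrete with a few expressions in a Pharo workspace. This is only a minimal sketch, assuming a reasonably recent Pharo image in which ZnUTF8Encoder understands #encodeString: and #decodeBytes:; the variable names are made up for illustration, and the GemStone port of Zinc may behave differently.

```smalltalk
| text utf8Bytes |
text := 'schön'.

"Inside the image a String holds Characters identified by Unicode code points."
text size.                 "5 characters"
(text at: 4) codePoint.    "246, i.e. U+00F6, which still fits in one byte"

"Encoding to UTF-8 turns ö into the two bytes 16rC3 16rB6."
utf8Bytes := ZnUTF8Encoder new encodeString: text.
utf8Bytes size.            "6 bytes"

"Decoding the bytes with the same encoder is the reverse step (round trip)."
(ZnUTF8Encoder new decodeBytes: utf8Bytes) = text.
```

So a decoded String that shows ö as the single value 246 has not been converted to ISO-8859-1 or CP-1252. It only looks that way because the first 256 Unicode code points coincide with ISO-8859-1.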
Ok, if the browser sends a POST/PUT request with a JSON structure, it also sends charset=utf8 (in my case). That's OK, because for JSON this is more or less the default charset.

Zinc now seems to notice that the UTF-8 charset is needed and creates a ZnStringEntity with a UTF8Encoder.

Now my application tries to get the JSON string of that ZnStringEntity and builds the structure out of that string, and the strings are NOT UTF-8, but converted to (?) ISO-8859 (?).

Marten
Marten,
On 09 Apr 2014, at 18:25, [hidden email] wrote:
> Now when my application tries to get the JSON string of that
> ZnStringEntity and builds the structure out of that string - and the
> strings are NOT UTF8, but converted to (?) ISO8859 ?

    (NeoJSONReader fromString:
        (ZnEntity with: (NeoJSONWriter toString: { #message -> 'An der schönen blauen Donau' } asDictionary)))
        at: #message.

You may be doing something wrong when you <<get the JSON string of that ZnStringEntity and builds the structure out of that string>> (how do you do that, BTW?), so please write some code that demonstrates what, according to you, is not right.

Sven
Ok, forget the JSON stuff - it has nothing to do with the "problem".

Other way round:

My whole database and internal processing is done in UTF-8. This is the most important point to mention here.

Now the request comes into Zinc as described below (the content of the request is a JSON string only):

    HTML-Request (charset=UTF-8) =(sends)=> ZINC HTTP

Now Zinc sees the content of the body, knows that it is encoded in UTF-8 and creates a ZnStringEntity with a UTF8Encoder.

    Zinc HTTP =(builds)=> ZnStringEntity (with UTF8Encoder)

The entity of the ZnRequest instance is an instance of ZnStringEntity (with its encoder attribute set to an instance of ZnUTF8Encoder).

I checked the content of the string attribute of the ZnStringEntity, and this string is NOT encoded in UTF-8 any more, but in either ISO-8859-? or WIN1252.

I think that this is OK for almost all people, because they work with some code pages, but my internal processing assumes UTF-8.

I just fixed this for myself by changing ZnStringEntity>>initializeEncoder to ALWAYS set the encoder attribute to ZnNullEncoder, and now everything is OK again. This means, of course, that all applications running with that source code work in UTF-8 only ...

Marten

On 09.04.2014 at 18:42, Sven Van Caekenberghe wrote:
> [...] so please write some code that demonstrates what is not right
> according to you.
On 09 Apr 2014, at 19:35, [hidden email] wrote:
> Ok, forget the JSON stuff - it has nothing to do with the "problem".
>
> Other way round:
>
> My whole database and internal processing is done in UTF8. This is the
> most important point here to mention.

Why? This means you forgo almost all String functionality, since UTF-8 is a variable-length encoding not really suitable for character-by-character processing.

> Now the request comes into Zinc as mentioned below (the content of the
> request is a JSON string only):
>
> HTML-Request (charset=UTF-8) =(sends)=> ZINC HTTP
>
> Now Zinc sees the content of the body, knows that it is coded in UTF8
> and creates a ZnStringEntity with UTF8Encoder.
>
> Zinc HTTP =(builds)=> ZnStringEntity (with UTF8Encoder)
>
> The instance of ZnRequest and its entity value is an instance of
> ZnStringEntity (with its encoder attribute is set to an instance to
> ZnUTF8Encoder).

Yes, of course, UTF-8 (a variable-length binary encoding) is converted into native Pharo Strings (possibly WideStrings) containing Characters, each of which is encoded using a Unicode code point value.

> I checked the content of the string attribute of the ZnStringEntity and
> this string is NOT encoded in UTF8 any more, but in either ISO8859-?
> or WIN1252.

Here you lose me (again) ;-)

> I think, that this is ok for almost all people, because they work with
> some CodePages - but my internal processing assumes UTF8.

No, nobody works with code pages or any encoding, just native [Wide]Strings in pure Unicode.

> I just fixed this for me by changing ZnStringEntity>>initializeEncoder
> to ALWAYS set the encoder attribute to ZnNullEncoder and now everything
> is ok again. This means of course, that all applications running with
> that source code work in UTF 8 only ...

OK, I think I understand: you want UTF-8 to remain UTF-8.

What you did is one solution, but I think it is wrong to use a String to represent bytes.

This case is actually already implemented server side for Seaside:

    ZnZincServerAdaptor>>#configureServerForBinaryReading
        "Seaside wants to do its own text conversions"
        server reader: [ :stream | ZnRequest readBinaryFrom: stream ]

The #reader: option is used here to read everything binary, without decoding to Strings. You will get ZnByteArrayEntity objects back, containing the original binary representation.

BTW, I think this is an interesting discussion.

Regards,

Sven
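Two ways of ending up with UTF-8 bytes rather than a decoded String can be sketched roughly as follows for a Pharo workspace. The first is not proposed in the thread: it keeps Zinc's decoding and re-encodes the String explicitly at the point where bytes are needed. The second applies the #reader: configuration quoted above to a plain ZnServer. The port number and variable names are made up for illustration, and the GemStone port of Zinc may not offer the same protocol.

```smalltalk
| entity utf8Bytes server |

"Option 1 (an assumption, not from the thread): let Zinc decode to a String,
 then encode explicitly wherever UTF-8 bytes are required."
entity := ZnEntity with: 'An der schönen blauen Donau'.
utf8Bytes := ZnUTF8Encoder new encodeString: entity string.

"Option 2: read requests as binary, as the Seaside adaptor does, so that
 request entities keep the bytes exactly as the client sent them."
server := ZnServer startDefaultOn: 1701.
server reader: [ :stream | ZnRequest readBinaryFrom: stream ].
```

With the first option, String operations such as #size and #indexOf: keep working per character inside the image. With the second, the application takes over all text conversion itself.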
And now the additional information: I'm working under GemStone, and I noticed quite some differences between Pharo and the GemStone port of Zinc in this area ... I have to take a closer look here.

Marten
On 09 Apr 2014, at 20:54, [hidden email] wrote:
> And now the additional information: I'm working under Gemstone and I
> noticed quite some differences between Pharo and its Gemstone port
> of Zinc in this area ... I have to take a closer look here.

I already expected as much.

Yes, Zinc on GemStone is seriously behind the original. Furthermore, I have never seen or worked with it.

And although I have some sympathy for other Smalltalk implementations out there, I find it difficult to give free support for an expensive, closed source, commercial product.

Sven
+1 to Sven's comment and Marten - you should post this on the GemStone list as one of their guys will be able to help you with the encoding issues.
Paul
On 10.04.2014 at 03:09, Paul DeBruicker wrote:
> +1 to Sven's comment and Marten - you should post this on the GemStone list
> as one of their guys will be able to help you with the encoding issues.

That's OK with me. It was because of Sven's software that I actually started using GemStone, just to give him the credit.

Marten
On 10 Apr 2014, at 07:03, [hidden email] wrote:
> Its ok for me. Due to Sven's software I actually start using Gemstone -
> just to give the honor back to him.

Thanks, I hope you're back on your way now, but I also hope you understand my point.

In any case, I appreciate the feedback.

Sven