Zinc HTTP server seems to convert always ...

marten
Hey,

it seems to me that Zinc, out of the box, converts from/to
UTF8. How can I tell Zinc to do NO conversion, so that ZnStringEntity
leaves its strings as they are? I am fighting here with German
umlauts.

Marten

--
Marten Feldtmann


Re: Zinc HTTP server seems to convert always ...

Sven Van Caekenberghe-2
Hi Marten,

I will need (much) more detail, what are you trying to do that is not working according to you ?

As far as I know Zinc HTTP Components does the right thing and can be used (configured) to do almost anything you want. It mostly depends on your mime types and their charset options.

Sven

On 09 Apr 2014, at 15:20, [hidden email] wrote:

> Hey,
>
> it seems to me that Zinc, out of the box, converts from/to
> UTF8. How can I tell Zinc to do NO conversion, so that ZnStringEntity
> leaves its strings as they are? I am fighting here with German
> umlauts.
>
> Marten
>
> --
> Marten Feldtmann
>



Re: Zinc HTTP server seems to convert always ...

marten
On 09.04.2014 15:29, Sven Van Caekenberghe wrote:
> Hi Marten,
>
> I will need (much) more detail, what are you trying to do that is not working according to you ?
>
> As far as I know Zinc HTTP Components does the right thing and can be used (configured) to do almost anything you want. It mostly depends on your mime types and their charset options.
>

The browser sends UTF8 data, and in my application code I get instances
of ZnStringEntity whose contained string has been converted to (?) ISO8859-1
(?) or CP-1252 (?). This seems to be because the entity
instance always has a ZnUTF8Encoder to do the conversion.

I would like to have UTF8 everywhere ... without all these conversions ...

Marten




Re: Zinc HTTP server seems to convert always ...

NorbertHartl

On 09.04.2014 at 17:29, [hidden email] wrote:

> On 09.04.2014 15:29, Sven Van Caekenberghe wrote:
>> Hi Marten,
>>
>> I will need (much) more detail, what are you trying to do that is not working according to you ?
>>
>> As far as I know Zinc HTTP Components does the right thing and can be used (configured) to do almost anything you want. It mostly depends on your mime types and their charset options.
>>
>
> The browser sends UTF8 data, and in my application code I get instances
> of ZnStringEntity whose contained string has been converted to (?) ISO8859-1
> (?) or CP-1252 (?). This seems to be because the entity
> instance always has a ZnUTF8Encoder to do the conversion.
>
> I would like to have UTF8 everywhere ... without all these conversions ...
>
There are no automatic conversions in Zinc. So Zinc is one of the pieces of software I know that do not assume stupid defaults :) Did you specify a proper Content-Type header including the charset information? Otherwise Zinc has no chance of knowing what to use and the default NullEncoder will make your string as wrong as your example above.

Norbert





Re: Zinc HTTP server seems to convert always ...

Sven Van Caekenberghe-2

On 09 Apr 2014, at 17:29, [hidden email] wrote:

> The browser sends UTF8 data, and in my application code I get instances
> of ZnStringEntity whose contained string has been converted to (?) ISO8859-1
> (?) or CP-1252 (?). This seems to be because the entity
> instance always has a ZnUTF8Encoder to do the conversion.

I still don't understand the problem, but consider this:

ZnServer startDefaultOn: 1701.

ZnClient new
  url: 'http://localhost:1701/echo';
  entity: (ZnEntity with: 'An der schönen blauen Donau');
  post.
       
ZnClient new
  url: 'http://localhost:1701/echo';
  entity: (ZnEntity
            with: 'An der schönen blauen Donau'
            type: (ZnMimeType textPlain charSet: #'iso-8859-1'; yourself));
  post;
  yourself.

In the first case, a UTF-8 encoded string is POST-ed and correctly returned (in a UTF-8 encoded response).

In the second case, an ISO-8859-1 encoded string is POST-ed and correctly returned (in a UTF-8 encoded response).

In both cases the decoding was done correctly, using the specified charset (if that is missing, the ZnNullEncoder is used). Now, ö is not a perfect test example because its Unicode code point, 246 decimal (U+00F6 hex), still fits in 1 byte and hence survives null encoding/decoding. That is why the following still works, although it is wrong to drop the charset.

ZnClient new
  url: 'http://localhost:1701/echo';
  entity: (ZnEntity
            with: 'An der schönen blauen Donau'
            type: (ZnMimeType textPlain clearCharSet; yourself));
  post;
  yourself.
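The point about ö can be checked by comparing a character's code point with the length of its UTF-8 encoding. A minimal workspace sketch, assuming a Pharo image with Zinc loaded and ZnUTF8Encoder's #encodeString: method; the euro sign is an extra example, not taken from the thread:

```smalltalk
"Compare Unicode code points with UTF-8 byte sequences (assumes Pharo with Zinc loaded)"
| encoder euro |
encoder := ZnUTF8Encoder new.
$ö asInteger.                           "246 (U+00F6): fits in a single byte"
(encoder encodeString: 'ö') asArray.    "#(195 182): two bytes in UTF-8"
euro := String with: (Character value: 16r20AC).
(euro at: 1) asInteger.                 "8364 (U+20AC): does not fit in a single byte"
(encoder encodeString: euro) size.      "3: three bytes in UTF-8"
```

A character like the euro sign would be a better test for charset handling, since its code point is above 255.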

HTH,

Sven



--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill



Re: Zinc HTTP server seems to convert always ...

marten
OK, when the browser sends a POST/PUT request with a JSON structure, it also
sends charset=utf8 (in my case). That's fine, because for JSON this is
more or less the default charset.

Zinc now seems to notice that the UTF8 charset is needed and creates a
ZnStringEntity with a UTF8Encoder.

Now, when my application tries to get the JSON string of that
ZnStringEntity and build the structure out of that string, the
strings are NOT UTF8 but converted to (?) ISO8859 (?).


Marten



Re: Zinc HTTP server seems to convert always ...

Sven Van Caekenberghe-2
Marten,

On 09 Apr 2014, at 18:25, [hidden email] wrote:

> OK, when the browser sends a POST/PUT request with a JSON structure, it also
> sends charset=utf8 (in my case). That's fine, because for JSON this is
> more or less the default charset.
>
> Zinc now seems to notice that the UTF8 charset is needed and creates a
> ZnStringEntity with a UTF8Encoder.
>
> Now, when my application tries to get the JSON string of that
> ZnStringEntity and build the structure out of that string, the
> strings are NOT UTF8 but converted to (?) ISO8859 (?).

(NeoJSONReader fromString:
  (ZnEntity with: (NeoJSONWriter toString: { #message -> 'An der schönen blauen Donau' } asDictionary)))
    at: #message.

You are possibly doing something wrong when you <<get the JSON string of that ZnStringEntity and builds the structure out of that string>> (how do you do that, BTW?), so please write some code that demonstrates what is not right according to you.

Sven

Re: Zinc HTTP server seems to convert always ...

marten
Ok, forget the JSON stuff - it has nothing to do with the "problem".

Let me put it the other way round:

My whole database and internal processing is done in UTF8. This is the
most important point to mention here.


Now the request comes into Zinc as shown below (the content of the
request is only a JSON string):

  HTTP request (charset=UTF-8) =(sends)=> Zinc HTTP

Zinc sees the content of the body, knows that it is encoded in UTF8
and creates a ZnStringEntity with a UTF8Encoder.

  Zinc HTTP =(builds)=> ZnStringEntity (with UTF8Encoder)

The entity of the ZnRequest instance is an instance of
ZnStringEntity (with its encoder attribute set to an instance of
ZnUTF8Encoder).

I checked the content of the string attribute of the ZnStringEntity, and
this string is NOT encoded in UTF8 any more, but in either ISO8859-?
or WIN1252.

I think that this is OK for almost everyone, because they work with
some code page, but my internal processing assumes UTF8.

I just fixed this for myself by changing ZnStringEntity>>initializeEncoder
to ALWAYS set the encoder attribute to a ZnNullEncoder, and now everything
is OK again. Of course, this means that all applications running with
that source code work in UTF8 only ...

Marten

On 09.04.2014 18:42, Sven Van Caekenberghe wrote:

> Marten,
>
> On 09 Apr 2014, at 18:25, [hidden email] wrote:
>
>> OK, when the browser sends a POST/PUT request with a JSON structure, it also
>> sends charset=utf8 (in my case). That's fine, because for JSON this is
>> more or less the default charset.
>>
>> Zinc now seems to notice that the UTF8 charset is needed and creates a
>> ZnStringEntity with a UTF8Encoder.
>>
>> Now, when my application tries to get the JSON string of that
>> ZnStringEntity and build the structure out of that string, the
>> strings are NOT UTF8 but converted to (?) ISO8859 (?).
>
> (NeoJSONReader fromString:
>   (ZnEntity with: (NeoJSONWriter toString: { #message -> 'An der schönen blauen Donau' } asDictionary)))
>     at: #message.
>
> You are possibly doing something wrong when you <<get the JSON string of that ZnStringEntity and builds the structure out of that string>> (how do you do that, BTW?), so please write some code that demonstrates what is not right according to you.
>
> Sven
>



Re: Zinc HTTP server seems to convert always ...

Sven Van Caekenberghe-2

On 09 Apr 2014, at 19:35, [hidden email] wrote:

> Ok, forget the JSON stuff - it has nothing to do with the "problem".
>
> Let me put it the other way round:
>
> My whole database and internal processing is done in UTF8. This is the
> most important point to mention here.

Why? This means you forgo almost all String functionality, since UTF8 is a variable-length encoding that is not really suitable for character-by-character processing.

> Now the request comes into Zinc as shown below (the content of the
> request is only a JSON string):
>
>  HTTP request (charset=UTF-8) =(sends)=> Zinc HTTP
>
> Zinc sees the content of the body, knows that it is encoded in UTF8
> and creates a ZnStringEntity with a UTF8Encoder.
>
>  Zinc HTTP =(builds)=> ZnStringEntity (with UTF8Encoder)
>
> The entity of the ZnRequest instance is an instance of
> ZnStringEntity (with its encoder attribute set to an instance of
> ZnUTF8Encoder).

Yes, of course, UTF-8 (a variable length binary encoding) is converted into native Pharo Strings (possibly WideStrings) containing Characters, each of which is encoded using a Unicode code point value.
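A small workspace sketch of this decoding step, assuming a Pharo image with Zinc loaded and ZnUTF8Encoder's #decodeBytes: method; the byte values are the UTF-8 encoding of 'schön', chosen for illustration:

```smalltalk
"Decode UTF-8 bytes into a native String of Characters (assumes Pharo with Zinc loaded)"
| bytes string |
bytes := #[115 99 104 195 182 110].          "'schön' in UTF-8: 6 bytes, ö takes two"
string := ZnUTF8Encoder new decodeBytes: bytes.
string size.                 "5: one Character per code point"
(string at: 4) asInteger.    "246: the Unicode code point of ö, no encoding left"
string = 'schön'.            "true"
```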

> I checked the content of the string attribute of the ZnStringEntity, and
> this string is NOT encoded in UTF8 any more, but in either ISO8859-?
> or WIN1252.

Here you lose me (again) ;-)

> I think that this is OK for almost everyone, because they work with
> some code page, but my internal processing assumes UTF8.

No, nobody works with code pages or any encoding, just native [Wide]Strings in pure Unicode.

> I just fixed this for myself by changing ZnStringEntity>>initializeEncoder
> to ALWAYS set the encoder attribute to a ZnNullEncoder, and now everything
> is OK again. Of course, this means that all applications running with
> that source code work in UTF8 only ...

OK, I think I understand, you want UTF-8 to remain UTF-8. What you did is one solution, but I think it is wrong to use a String to represent bytes.
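To make the concern concrete: if UTF-8 bytes are kept in a String, one Character per byte (roughly what null decoding of a UTF-8 body yields), String operations see bytes rather than characters. A sketch assuming a Pharo image with Zinc loaded; the String is built by hand here rather than through ZnNullEncoder:

```smalltalk
"Keep UTF-8 bytes in a String, one Character per byte (built by hand, for illustration)"
| bytes byteString |
bytes := ZnUTF8Encoder new encodeString: 'schön'.
byteString := String withAll: (bytes asArray collect: [ :each | Character value: each ]).
byteString size.                "6, although 'schön' has 5 characters"
(byteString at: 4) asInteger.   "195: the first UTF-8 byte of ö, not a real character"
byteString = 'schön'.           "false: comparisons and indexing now work on bytes"
```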

This case is actually already implemented server side for Seaside:

ZnZincServerAdaptor>>#configureServerForBinaryReading
  "Seaside wants to do its own text conversions"

  server reader: [ :stream | ZnRequest readBinaryFrom: stream ]

The #reader: option is used here to read everything as binary, without decoding to Strings. You will get ZnByteArrayEntity objects back, containing the original binary representation.
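A minimal sketch of applying the same #reader: option to a plain ZnServer, assuming a Pharo image with Zinc where ZnServer>>#onRequestRespond: is available; the port number and the byte-count response are illustrative choices, not part of the Seaside adaptor:

```smalltalk
"Sketch: a server that hands request bodies to the application as raw bytes"
| server |
server := ZnServer on: 1701.
"Read requests binary, so bodies arrive as ZnByteArrayEntity instead of decoded Strings"
server reader: [ :stream | ZnRequest readBinaryFrom: stream ].
server onRequestRespond: [ :request |
	"The application decides if and how to decode the entity bytes"
	ZnResponse ok: (ZnEntity text:
		(request hasEntity
			ifTrue: [ request entity bytes size printString , ' bytes received' ]
			ifFalse: [ 'no entity' ])) ].
server start.
```

Posting 'schön' with the default UTF-8 text entity should then report 6 bytes; stop the server again with `server stop`.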

BTW, I think this is an interesting discussion.

Regards,

Sven


Re: Zinc HTTP server seems to convert always ...

marten
And now the additional information: I'm working under Gemstone, and I
noticed quite a few differences between Pharo and the Gemstone port
of Zinc in this area ... I have to take a closer look here.

Marten



Re: Zinc HTTP server seems to convert always ...

Sven Van Caekenberghe-2

On 09 Apr 2014, at 20:54, [hidden email] wrote:

> And now the additional information: I'm working under Gemstone, and I
> noticed quite a few differences between Pharo and the Gemstone port
> of Zinc in this area ... I have to take a closer look here.
>
> Marten

I already expected that much. Yes, Zinc on Gemstone is seriously behind the original. Furthermore, I have never seen or worked with it. And although I have some sympathy for other Smalltalk implementations out there, I find it difficult to give free support for an expensive, closed source, commercial product.

Sven
 

Re: Zinc HTTP server seems to convert always ...

Paul DeBruicker
+1 to Sven's comment. Marten, you should post this on the GemStone list, as one of their people will be able to help you with the encoding issues.



Paul


Re: Zinc HTTP server seems to convert always ...

marten
On 10.04.2014 03:09, Paul DeBruicker wrote:
> +1 to Sven's comment. Marten, you should post this on the GemStone list,
> as one of their people will be able to help you with the encoding issues.

That's fine with me. It is because of Sven's software that I am actually starting to use Gemstone at all;
I just wanted to give the credit back to him.

Marten



Re: Zinc HTTP server seems to convert always ...

Sven Van Caekenberghe-2

On 10 Apr 2014, at 07:03, [hidden email] wrote:

> On 10.04.2014 03:09, Paul DeBruicker wrote:
>> +1 to Sven's comment. Marten, you should post this on the GemStone list,
>> as one of their people will be able to help you with the encoding issues.
>
> That's fine with me. It is because of Sven's software that I am actually starting to use Gemstone at all;
> I just wanted to give the credit back to him.

Thanks, I hope you're back on your way now, but I also hope you understand my point. In any case, I appreciate the feedback.

Sven