[ANN] WebClient and WebServer 1.0 for Squeak


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Hannes Hirzel
Igor, your argument convinces me.
Thank you for the quick feedback.

see updates below

--Hannes

On 5/11/10, Igor Stasenko <[hidden email]> wrote:

> On 12 May 2010 00:09, Hannes Hirzel <[hidden email]> wrote:
>> 1) UTF8 conversion
>> 2) Change to JSON package of Tony Garnock-Jones
>> 3) My updated Test case
>> 4) Conclusion
>>
>>
>> 1) UTF8 conversion
>>
>> My question was:
>>    How do I convert a WideString to UTF8?
>>
>>
>> Levente answered:
>>
>> There are various possibilities:
>> 'äbc' squeakToUtf8.
>> 'äbc' convertToEncoding: 'utf-8'.
>> 'äbc' convertToWithConverter: UTF8TextConverter new.
>> UTF8TextConverter new encodeString: 'äbc'.
>>
>>
>>
>> 2) Change to JSON package of Tony Garnock-Jones
>>
>> As CouchDB stores UTF8 values I did not want to escape them with
>> \uNNNN as the forked JSON package in SCouchDB does.
>
> I know. But JSON could be used for something else, and also it's a part
> of the syntax,
> so it should be supported there.
>
>> But instead I
>> wanted to keep UTF8 in the db. As Rado pointed out the UTF8 conversion
>> is not correct in the original JSON package.
>>
> Yeah... SCouchDB has no UTF-8 support for output. Yet.
>
>> So I did the following correction.
>>
>> In the class
>>  String  - category *JSON-writing
>>  (from package http://www.squeaksource.com/JSON)
>> I replaced
>>
>>  jsonWriteOn: aStream
>>        | replacement |
>>        aStream nextPut: $".
>>        self do: [ :ch |
>>                (replacement := Json escapeForCharacter: ch)    "***"
>>                        ifNil: [ aStream nextPut: ch ]
>>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>>        aStream nextPut: $".
>>
>>
>> WITH
>>
>>  jsonWriteOn: aStream
>>        aStream nextPut: $".
>>        aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>>        aStream nextPut: $".
>>
>
> No, this is WRONG!
>
> Json writer methods should output Unicode text and not deal with
> any encoding!
> Then the layer which is responsible for transferring the data is free
> to decide how to encode the
> JSON output, using either UTF-8 or any other appropriate UTF
> encoding.
>
> By putting UTF-8 conversions in the JSON library routines you are limiting
> the JSON library to being used only with UTF-8 encoding.
>
> I repeat: the JSON library is the wrong place for dealing with encodings. It
> should take Unicode text/stream as input
> and produce Unicode text/stream as output. Any encoding should be up to the
> outer layers, which are responsible for data transmission!

So String>> jsonWriteOn:aStream

is now just

jsonWriteOn: aStream
       aStream nextPut: $".
       aStream nextPutAll:  self.
       aStream nextPut: $".



>>
>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>> http://www.squeaksource.com/JSON
>> and is corrected by Rado in the SCouchDB fork of the package JSON
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>
>
>
>>
>>
>> 3) My updated Test case
>>

myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient
       httpPut: host, '/notes/test25'
       content: (UTF8TextConverter new encodeString: r contents)
       type: 'text/plain'.


RESULT: OK.


>> 4) Conclusion
>>
>> With the change to the JSON package I am now fine in using WebClient
>> for storing objects in a couchdB.
>>

However, I did not commit my change to
http://www.squeaksource.com/JSON

even though
   Json escapeForCharacter: ch
is wrong.

It probably should not produce \uNNNN escapes for non-ASCII characters
at all. At least the current CouchDB deals properly with UTF8-encoded
strings.


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Igor Stasenko
On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:

> However I did not commit my change to
> http://www.squeaksource.com/JSON
>
> though
>   Json escapeForCharacter: ch
> is wrong.
>
Yeah, thanks for noting that. This should probably simply be wiped out.
Or maybe we could be more clever and add an option for whether we want
to escape non-ASCII characters or not.
This could be done by adding a single method to the stream, which tells
whether it can deal with Unicode or
only with ASCII characters.
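
A rough sketch of how such an option might look, purely as an illustration. The selector #jsonAcceptsUnicode is invented for this example and exists in neither Squeak nor the JSON package; the sketch also assumes that Json escapeForCharacter: answers nil for every character that needs no mandatory escape, including non-ASCII ones.

```smalltalk
"Hypothetical sketch. #jsonAcceptsUnicode is an invented selector."

"Stream >> jsonAcceptsUnicode   (default: the stream can hold any character)"
jsonAcceptsUnicode
        ^ true

"String >> jsonWriteOn:   (escapes non-ASCII only when the stream asks for it)"
jsonWriteOn: aStream
        | replacement hex |
        aStream nextPut: $".
        self do: [ :ch |
                (replacement := Json escapeForCharacter: ch)
                        ifNil: [
                                (ch asciiValue > 127 and: [ aStream jsonAcceptsUnicode not ])
                                        ifTrue: [
                                                "Pad to four hex digits. Code points above 16rFFFF
                                                 would need a surrogate pair, which is not handled here."
                                                hex := ch asciiValue printStringBase: 16.
                                                aStream nextPutAll: '\u'.
                                                (4 - hex size) timesRepeat: [ aStream nextPut: $0 ].
                                                aStream nextPutAll: hex ]
                                        ifFalse: [ aStream nextPut: ch ] ]
                        ifNotNil: [ aStream nextPutAll: replacement ] ].
        aStream nextPut: $".
```

A stream that can only carry ASCII would override jsonAcceptsUnicode to answer false; every other stream would receive the characters unchanged.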

> And probably it should not do it. At least the current couchDB deals
> properly with UTF8 encoded strings.
>
>
In SCouchDB I will put an encoding layer right before sending the JSON (in
a similar way to what you did in the example above).
It's easy to do, given the assumption that JSON output is _always_
Unicode text:
then I can simply use an appropriate UTF-8 encoder, which will encode
it while sending to the server.
And thus no extra effort is required in JSON itself.


--
Best regards,
Igor Stasenko AKA sig.


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Igor Stasenko
In reply to this post by Hannes Hirzel
On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:

> So String>> jsonWriteOn:aStream
>
> is now just
>
> jsonWriteOn: aStream
>       aStream nextPut: $".
>       aStream nextPutAll:  self.
>       aStream nextPut: $".
>
>
This is also wrong, because if your string contains a $" character
(or control characters), they must be properly escaped:

'"' asJsonString  =>  '"\""'

String crlf asJsonString  =>  '"\r\n"'







--
Best regards,
Igor Stasenko AKA sig.


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Hannes Hirzel
On 5/11/10, Igor Stasenko <[hidden email]> wrote:

> On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:
>> So String>> jsonWriteOn:aStream
>>
>> is now just
>>
>> jsonWriteOn: aStream
>>       aStream nextPut: $".
>>       aStream nextPutAll:  self.
>>       aStream nextPut: $".
>>
>>
> this is also wrong, because if your string contains a $" character
> (and other control characters), it must be properly escaped:
>
> '"' asJsonString  '"\""'
>
> String crlf asJsonString  '"\r\n"'
>

Yes, you are right. I just realised it as well. However there is no method
   asJsonString
in the http://www.squeaksource.com/JSON package by Tony Garnock-Jones
and others.


At least " and \ have to be escaped (cf. for example
http://awwx.ws/combinator/13)

So I went for the following

Instead of

Json class

  escapeForCharacter: c
       
        | index |
        ^ (index := c asciiValue + 1) <= escapeArray size
                ifTrue: [ ^ escapeArray at: index ]
                ifFalse: [ ^ '\u', ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) ]


I do


escapeForCharacter: c
        "Answer the escape sequence for c, or nil if c can be written as is."
        | index |
        ^ (index := c asciiValue + 1) <= escapeArray size
                ifTrue: [ escapeArray at: index ]
                ifFalse: [ nil ]


And I go back from

 String
  jsonWriteOn: aStream
        aStream nextPut: $".
        aStream nextPutAll:  self.
        aStream nextPut: $".

to what it was before


 String
  jsonWriteOn: aStream

        | replacement |
        aStream nextPut: $".
        self do: [ :ch |
                (replacement := Json escapeForCharacter: ch)
                        ifNil: [ aStream nextPut: ch ]
                        ifNotNil: [ aStream nextPutAll: replacement ] ].
        aStream nextPut: $".


And in fact the test case has to be extended to include a backslash u
in the example string (myWideString).

myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b\user').
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient
       httpPut: host, '/notes/test30'
       content: (UTF8TextConverter new encodeString: r contents)
       type: 'text/plain'.



--Hannes


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Igor Stasenko
On 12 May 2010 02:17, Hannes Hirzel <[hidden email]> wrote:

> On 5/11/10, Igor Stasenko <[hidden email]> wrote:
>> On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:
>>> So String>> jsonWriteOn:aStream
>>>
>>> is now just
>>>
>>> jsonWriteOn: aStream
>>>       aStream nextPut: $".
>>>       aStream nextPutAll:  self.
>>>       aStream nextPut: $".
>>>
>>>
>> this is also wrong, because if your string contains a $" character
>> (and other control characters), it must be properly escaped:
>>
>> '"' asJsonString  '"\""'
>>
>> String crlf asJsonString  '"\r\n"'
>>
>
> Yes, you are right. I just realised it as well. However there is no method
>   asJsonString
> in the http://www.squeaksource.com/JSON package by Tony Garnock-Jones
> and others.
>
Ah, sorry, I added this in my own JSON fork. I found it quite
convenient :)

>
> At least " and \ have to be escaped (cf. for example
> http://awwx.ws/combinator/13)
>
> So I went for the following
>
> Instead of
>
> Json class
>
>  escapeForCharacter: c
>
>        | index |
>        ^ (index := c asciiValue + 1) <= escapeArray size
>                ifTrue: [ ^ escapeArray at: index ]
>                ifFalse: [ ^ '\u', ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) ]
>
>
> I do
>
>
> escapeForCharacter: c
>
>        | index |
>        ^ (index := c asciiValue + 1) <= escapeArray size
>                ifTrue: [ ^ escapeArray at: index ]
>                ifFalse: [ ^nil]
>
>
> And I go back from
>
>  String
>   jsonWriteOn: aStream
>        aStream nextPut: $".
>        aStream nextPutAll:  self.
>        aStream nextPut: $".
>
> to what it was before
>
>
>  String
>  jsonWriteOn: aStream
>
>        | replacement |
>        aStream nextPut: $".
>        self do: [ :ch |
>                (replacement := Json escapeForCharacter: ch)
>                        ifNil: [ aStream nextPut: ch ]
>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>        aStream nextPut: $".
>
>
> And in fact the test case has to be extended to include a backslash u
> in the example string (myWideString).
>
> myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b\user').
> d := Dictionary new. d at: 'title' put:   'aTitle'. d at: 'body' put:
> myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test30' content: (UTF8TextConverter
> new encodeString: r contents) type: 'text/plain'.
>
>
Yeah, this seems closer to the right implementation.
Thank you, Hannes, for being scrupulous. :)

>
> --Hannes
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Levente Uzonyi-2
In reply to this post by Hannes Hirzel
On Tue, 11 May 2010, Hannes Hirzel wrote:

> 1) UTF8 conversion
> 2) Change to JSON package of Tony Garnock-Jones
> 3) My updated Test case
> 4) Conclusion
>
>
> 1) UTF8 conversion
>
> My question was:
>    How do I convert a WideString to UTF8?
>
>
> Levente answered:
>
> There are various possibilities:
> 'äbc' squeakToUtf8.
> 'äbc' convertToEncoding: 'utf-8'.
> 'äbc' convertToWithConverter: UTF8TextConverter new.
> UTF8TextConverter new encodeString: 'äbc'.
>
>
>
> 2) Change to JSON package of Tony Garnock-Jones
>
> As CouchDB stores UTF8 values I did not want to escape them with
> \uNNNN as the forked JSON package in SCouchDB does. But instead I
> wanted to keep UTF8 in the db. As Rado pointed out the UTF8 conversion
> is not correct in the original JSON package.
>
> So I did the following correction.
>
> In the class
>  String  - category *JSON-writing
>  (from package http://www.squeaksource.com/JSON)
> I replaced
>
>  jsonWriteOn: aStream
>        | replacement |
>        aStream nextPut: $".
>        self do: [ :ch |
>                (replacement := Json escapeForCharacter: ch)    "***"
>                        ifNil: [ aStream nextPut: ch ]
>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>        aStream nextPut: $".
>
>
> WITH
>
>  jsonWriteOn: aStream
>        aStream nextPut: $".
>        aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>        aStream nextPut: $".

This is just wrong. According to http://json.org a string can contain any
Unicode character except for \, " and control characters. So there should be
no UTF-8 conversion here.

You only need to convert the characters to UTF-8 because you're sending
them over the network to a server, and Unicode characters have to be
converted to bytes somehow. So the JSON printer shouldn't do any
conversion by default except for escaping. The only problem is that
escaping is not done as the spec requires, but that's easy to fix.


Levente

>
>
> "*** NOTE: escapeForCharacter is incorrectly implemented in
> http://www.squeaksource.com/JSON
> and is corrected by Rado in the SCouchDB fork of the package JSON
> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>
>
>
> 3) My updated Test case
>
> myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
> d := Dictionary new.
> d at: 'title' put: 'aTitle'.
> d at: 'body' put: myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.
>
> RESULT: OK.
>
>
>
> 4) Conclusion
>
> With the change to the JSON package I am now fine in using WebClient
> for storing objects in a couchdB.
>
> However I did not commit my change to
>  http://www.squeaksource.com/JSON
> as I do not (yet) understand the full impact of it.
>
>
> Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help
>
> --Hannes
>
> On 5/11/10, Igor Stasenko <[hidden email]> wrote:
>> On 11 May 2010 17:44, Hannes Hirzel <[hidden email]> wrote:
>>> On 5/10/10, radoslav hodnicak <[hidden email]> wrote:
>>>>
>>>> Which JSON package/version are you using? I fixed a bug in the one
>>>> distributed with SCouchDB few weeks ago, where it didn't encode utf8
>>>> characters properly - the correct escaped form is \uNNNN - always padded
>>>> to 4 Ns. that's why you get that warning, yours is only 2-3
>>>>
>>>> rado
>>>
>>> I have been using
>>> http://www.squeaksource.com/JSON (over 7000 downloads)
>>> in combination with WebClient.
>>>
>>> Thank you Rado, I found
>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>>> and will have a look at it.
>>> (Your comment: added handling of utf8 encoded input data - this is
>>> necessary for couchdb-lucene which sends results directly in utf8 and
>>> not \uNNNN encoded)
>>>
>> SCouchDB uses a forked version of the JSON package, which you can find in
>> the SCouchDB repository:
>> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>>
>> If you are looking for that method, it can be found in Json>>unescapeUnicode
>>
>>
>>> --Hannes
>>>
>>>
>>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>>
>>>>> The test case made simpler
>>>>>
>>>>> WebClient httpPut: host, '/notes/test7' content:
>>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>>
>>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8
>>>>> JSON"}
>>>>> '
>>>>>
>>>>> whereas
>>>>>
>>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>>> type: 'text/plain'.
>>>>>
>>>>> gives back
>>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>>> '
>>>>>
>>>>> --Hannes
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>>
>
>


Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Hannes Hirzel
Levente, your answer covers an earlier state of the exchange. See here
for the latest account
http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Basically, the need for UTF8 conversion in my case stems from the fact
that I use WebClient to post the JSON object, and it accepts only
bytes. And I want to post to a CouchDB, which deals nicely with UTF8.

The JSON package as such needs no UTF8 conversion. Only escaping of
backslash \, double quote " and control characters.
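
As an illustration only, an escape table covering exactly those cases could be built roughly like this. The selector initializeEscapeArray is hypothetical, and the real Json class may fill its escapeArray differently; the only assumption taken from the thread is that escapeForCharacter: looks up escapeArray at: (asciiValue + 1) and that a nil entry means the character is written unchanged.

```smalltalk
"Hypothetical class-side initializer for the escapeArray used by
 Json class >> escapeForCharacter:. Not taken from the actual package."
initializeEscapeArray
        | hex |
        "Entries default to nil, meaning: no escape needed."
        escapeArray := Array new: 128.
        "Control characters 0..31 get a \uNNNN escape padded to four hex digits."
        0 to: 31 do: [ :code |
                hex := code printStringBase: 16.
                escapeArray
                        at: code + 1
                        put: '\u' , (String new: 4 - hex size withAll: $0) , hex ].
        "Short forms for the common control characters."
        escapeArray at: Character tab asciiValue + 1 put: '\t'.
        escapeArray at: Character lf asciiValue + 1 put: '\n'.
        escapeArray at: Character cr asciiValue + 1 put: '\r'.
        "The two printable characters that must always be escaped."
        escapeArray at: $" asciiValue + 1 put: '\"'.
        escapeArray at: $\ asciiValue + 1 put: '\\'
```

With a table like this, every character above 127 falls outside the array and is passed through as is, which leaves the choice of byte encoding to the layer that sends the data.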


The method
String >>jsonWriteOn: aStream

should stay at

 String
  jsonWriteOn: aStream

        | replacement |
        aStream nextPut: $".
        self do: [ :ch |
                (replacement := Json escapeForCharacter: ch)
                        ifNil: [ aStream nextPut: ch ]
                        ifNotNil: [ aStream nextPutAll: replacement ] ].
        aStream nextPut: $".

but the method
  Json escapeForCharacter: ch
does not need to go for \uNNNN for non-ASCII characters.

So I do the UTF8 conversion just before the HTTP post.

I hope this clarifies the situation and that we can soon move to an
update of the JSON package.

--Hannes



Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Levente Uzonyi-2
On Wed, 12 May 2010, Hannes Hirzel wrote:

> Levente, your answer covers an earlier state of the exchange. See here
> for the latest account
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Sorry, I didn't read all the mails before I replied.


Levente



Re: [ANN] WebClient and WebServer 1.0 for Squeak

Andreas.Raab
In reply to this post by Igor Stasenko
On 5/11/2010 2:36 PM, Igor Stasenko wrote:
> On 11 May 2010 19:22, Andreas Raab<[hidden email]>  wrote:
>> What other methods do you need? There should be no problem adding any; I just
>> had no need for them initially.
>>
> As far as i can tell,
>
> CouchDB API using PUT, POST, GET, DELETE methods.
>
> (http://wiki.apache.org/couchdb/API_Cheatsheet)

All of those are now covered by the latest WebClient. In addition to
GET, POST, PUT and DELETE, WebClient also supports HEAD, TRACE, and
OPTIONS.

Plus, this forced me to deal with HTTP methods in WebServer correctly
which is great since it means you can now specify what methods should
apply to a given resource, e.g.,

        server := WebServer reset default.
        server listenOn: 8080.
        "GET/PUT/DELETE are allowed but no POST"
        server addService: '/foo' action:[:req|
                req send200Response: 'OK'
        ] methods: {'GET'. 'PUT'. 'DELETE'}.

Now, you can do, e.g.,

        WebClient httpGet: 'http://localhost:8080/foo'.
          "=> 200 OK"

but POSTs are not allowed:

        WebClient httpPost: 'http://localhost:8080/foo' content: '' type:''.
          "=> 405 Method Not Allowed"

However, since OPTIONS is supported you can ask for the supported
operations of the resource:

        WebClient httpOptions: 'http://localhost:8080/foo'.
          "=> allow: HEAD,TRACE,OPTIONS,GET,PUT,DELETE"

and for the default server options:

        WebClient httpOptions: 'http://localhost:8080/*'.
          "=> allow: HEAD,TRACE,OPTIONS,GET,POST"

Nice forcing function, thanks!

Cheers,
   - Andreas


Re: [ANN] WebClient and WebServer 1.0 for Squeak

laza
In reply to this post by Andreas.Raab
Works like a charm using squid as a proxy with WebClient-Core-ar.22. Thanks!

Alex
