Igor, your argument convinces me.
Thank you for the quick feedback.

See updates below.

--Hannes

On 5/11/10, Igor Stasenko <[hidden email]> wrote:
> On 12 May 2010 00:09, Hannes Hirzel <[hidden email]> wrote:
>> 1) UTF8 conversion
>> 2) Change to JSON package of Tony Garnock-Jones
>> 3) My updated Test case
>> 4) Conclusion
>>
>>
>> 1) UTF8 conversion
>>
>> My question was:
>> How do I convert a WideString to UTF8?
>>
>> Levente answered:
>>
>> There are various possibilities:
>>     'äbc' squeakToUtf8.
>>     'äbc' convertToEncoding: 'utf-8'.
>>     'äbc' convertToWithConverter: UTF8TextConverter new.
>>     UTF8TextConverter new encodeString: 'äbc'.
>>
>>
>> 2) Change to JSON package of Tony Garnock-Jones
>>
>> As CouchDB stores UTF8 values I did not want to escape them with
>> \uNNNN as the forked JSON package in SCouchDB does.
>
> i know. But JSON could be used for something else, and also its a part
> of syntax,
> so it should be supported there.
>
>> But instead I
>> wanted to keep UTF8 in the db. As Rado pointed out the UTF8 conversion
>> is not correct in the original JSON package.
>>
> Yeah.. SCouchDB having no utf-8 support for output. Yet.
>
>> So I did the following correction.
>>
>> In the class
>> String - category *JSON-writing
>> (from package http://www.squeaksource.com/JSON)
>> I replaced
>>
>> jsonWriteOn: aStream
>>     | replacement |
>>     aStream nextPut: $".
>>     self do: [ :ch |
>>         (replacement := Json escapeForCharacter: ch) "***"
>>             ifNil: [ aStream nextPut: ch ]
>>             ifNotNil: [ aStream nextPutAll: replacement ] ].
>>     aStream nextPut: $".
>>
>>
>> WITH
>>
>> jsonWriteOn: aStream
>>     aStream nextPut: $".
>>     aStream nextPutAll: (UTF8TextConverter new encodeString: self).
>>     aStream nextPut: $".
>>
>
> No, this is WRONG!
>
> Json writer methods should output a unicode text, and do not deal with
> any encoding!
> Then, a layer which responsible for transferring the data will be free
> decide how to encode the
> json output, either using utf-8 encoding or any other appropriate UTF
> encoding.
>
> By putting utf-8 conversions in JSON library routines you limiting
> JSON library to be used only with utf-8 encoding.
>
> I repeat: JSON library is wrong place for dealing with encodings. It
> should take a unicode text/stream as input
> and unicode text/stream as output. Any encodings should be up to the
> outer layers, which responsible for data transmission!

So String>> jsonWriteOn:aStream

is now just

jsonWriteOn: aStream
    aStream nextPut: $".
    aStream nextPutAll: self.
    aStream nextPut: $".

>>
>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>> http://www.squeaksource.com/JSON
>> and is corrected by Rado in the SCouchDB fork of the package JSON
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>
>>
>> 3) My updated Test case
>>

myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
d := Dictionary new. d at: 'title' put: 'aTitle'. d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient httpPut: host, '/notes/test25' content: (UTF8TextConverter new encodeString: r contents) type: 'text/plain'.

RESULT: OK.

>> 4) Conclusion
>>
>> With the change to the JSON package I am now fine in using WebClient
>> for storing objects in a couchdB.
>>

However I did not commit my change to
http://www.squeaksource.com/JSON

though
Json escapeForCharacter: ch
is wrong.

And probably it should not do it. At least the current couchDB deals
properly with UTF8 encoded strings.
On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:
> However I did not commit my change to
> http://www.squeaksource.com/JSON
>
> though
> Json escapeForCharacter: ch
> is wrong.
>

Or, maybe we could be more clever and add an option, whether we want
to escape non-ascii characters or not.
This can be done by adding a single method to stream, which could tell
if it can deal with unicode, or only with ascii characters.

> And probably it should not do it. At least the current couchDB deals
> properly with UTF8 encoded strings.
>

In SCouchDB i will put an encoding layer right before sending json (in
a similar way as you used in the example above).
It's easy to do: given the assumption that JSON output is _always_ a
unicode text, i can simply use an appropriate utf-8 encoder, which will
encode it while sending to the server.
And thus, no extra effort is required in JSON itself.

--
Best regards,
Igor Stasenko AKA sig.
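A minimal sketch of the layering described here, using only the selectors that already appear in this thread. It assumes a JSON package whose jsonWriteOn: leaves non-ASCII characters unescaped, as proposed in the discussion; the variable names (doc, jsonText, utf8Payload) are illustrative and not part of any package.

```smalltalk
"Sketch: the JSON writer yields Unicode text; byte encoding is a separate, later step.
 Variable names are illustrative only."
| doc jsonText utf8Payload |
doc := Dictionary new.
doc at: 'title' put: 'aTitle'.
doc at: 'body' put: 'ä', 8220 asCharacter asString.

"Step 1: serialize to Unicode JSON text. No byte encoding happens here."
jsonText := WriteStream on: String new.
(JsonObject newFrom: doc) jsonWriteOn: jsonText.

"Step 2: the transport layer picks the encoding, here UTF-8."
utf8Payload := UTF8TextConverter new encodeString: jsonText contents.

"utf8Payload is what would be handed to WebClient httpPut:content:type:"
utf8Payload
```

Keeping step 2 outside the writer means the same JSON text could be sent with a different encoding without touching the JSON package.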
In reply to this post by Hannes Hirzel
On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:
> So String>> jsonWriteOn:aStream
>
> is now just
>
> jsonWriteOn: aStream
>     aStream nextPut: $".
>     aStream nextPutAll: self.
>     aStream nextPut: $".
>

this is also wrong, because if your string contains a $" character
(and other control characters), it must be properly escaped:

    '"' asJsonString          '"\""'

    String crlf asJsonString  '"\r\n"'

--
Best regards,
Igor Stasenko AKA sig.
On 5/11/10, Igor Stasenko <[hidden email]> wrote:
> On 12 May 2010 01:12, Hannes Hirzel <[hidden email]> wrote:
>> So String>> jsonWriteOn:aStream
>>
>> is now just
>>
>> jsonWriteOn: aStream
>>     aStream nextPut: $".
>>     aStream nextPutAll: self.
>>     aStream nextPut: $".
>>
>
> this is also wrong, because if your string contains a $" character
> (and other control characters), it must be properly escaped:
>
> '"' asJsonString '"\""'
>
> String crlf asJsonString '"\r\n"'
>

Yes, you are right. I just realised it as well. However there is no method
asJsonString
in the http://www.squeaksource.com/JSON package by Tony Garnock-Jones
and others.

At least " and \ have to be escaped (cf. for example
http://awwx.ws/combinator/13)

So I went for the following

Instead of

Json class

escapeForCharacter: c

    | index |
    ^ (index := c asciiValue + 1) <= escapeArray size
        ifTrue: [ ^ escapeArray at: index ]
        ifFalse: [ ^ '\u', ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) ]


I do


escapeForCharacter: c

    | index |
    ^ (index := c asciiValue + 1) <= escapeArray size
        ifTrue: [ ^ escapeArray at: index ]
        ifFalse: [ ^nil]


And I go back from

String
jsonWriteOn: aStream
    aStream nextPut: $".
    aStream nextPutAll: self.
    aStream nextPut: $".

to what it was before


String
jsonWriteOn: aStream

    | replacement |
    aStream nextPut: $".
    self do: [ :ch |
        (replacement := Json escapeForCharacter: ch)
            ifNil: [ aStream nextPut: ch ]
            ifNotNil: [ aStream nextPutAll: replacement ] ].
    aStream nextPut: $".


And in fact the test case has to be extended to include a backslash u
in the example string (myWideString).

myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b\user').
d := Dictionary new. d at: 'title' put: 'aTitle'. d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient httpPut: host, '/notes/test30' content: (UTF8TextConverter new encodeString: r contents) type: 'text/plain'.

--Hannes
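The escaping rule arrived at in this message can be sketched without touching the JSON package at all. In the workspace snippet below, escapeFor is a hypothetical block standing in for Json class>>escapeForCharacter:, and it does not reproduce the package's escapeArray. It assumes that only the double quote, the backslash and control characters (code below 32) need a replacement, and that everything else, including non-ASCII characters, is written unchanged.

```smalltalk
"Sketch only: escapeFor is a hypothetical stand-in for Json class>>escapeForCharacter:."
| escapeFor input out replacement |
escapeFor := [:ch | | hex |
	(ch = $" or: [ch = $\])
		ifTrue: [String with: $\ with: ch]
		ifFalse: [
			ch asInteger < 32
				ifTrue: [
					"Control characters become \uNNNN, zero-padded to four hex digits."
					hex := ch asInteger printStringBase: 16.
					'\u', (String new: 4 - hex size withAll: $0), hex]
				ifFalse: [nil "no replacement: the character is written as is"]]].

input := 'say "hi"', String cr, 'b\user'.
out := WriteStream on: String new.
out nextPut: $".
input do: [:ch |
	(replacement := escapeFor value: ch)
		ifNil: [out nextPut: ch]
		ifNotNil: [out nextPutAll: replacement]].
out nextPut: $".
out contents
```

The loop is the same shape as the restored jsonWriteOn: above; only the source of the replacement string differs. The backslash in 'b\user' is doubled on output, so a reader will not mistake it for the start of a \u escape.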
On 12 May 2010 02:17, Hannes Hirzel <[hidden email]> wrote:
> Yes, you are right. I just realised it as well. However there is no method
> asJsonString
> in the http://www.squeaksource.com/JSON package by Tony Garnock-Jones
> and others.
>

convenient :)

> And in fact the test case has to be extended to include a backslash u
> in the example string (myWideString).
>
> myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b\user').
> d := Dictionary new. d at: 'title' put: 'aTitle'. d at: 'body' put:
> myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test30' content: (UTF8TextConverter
> new encodeString: r contents) type: 'text/plain'.
>

Thank you, Hannes for being scrupulous. :)

> --Hannes

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Hannes Hirzel
On Tue, 11 May 2010, Hannes Hirzel wrote:
> 1) UTF8 conversion
> 2) Change to JSON package of Tony Garnock-Jones
> 3) My updated Test case
> 4) Conclusion
>
>
> 1) UTF8 conversion
>
> My question was:
> How do I convert a WideString to UTF8?
>
>
> Levente answered:
>
> There are various possibilities:
>     'äbc' squeakToUtf8.
>     'äbc' convertToEncoding: 'utf-8'.
>     'äbc' convertToWithConverter: UTF8TextConverter new.
>     UTF8TextConverter new encodeString: 'äbc'.
>
>
> 2) Change to JSON package of Tony Garnock-Jones
>
> As CouchDB stores UTF8 values I did not want to escape them with
> \uNNNN as the forked JSON package in SCouchDB does. But instead I
> wanted to keep UTF8 in the db. As Rado pointed out the UTF8 conversion
> is not correct in the original JSON package.
>
> So I did the following correction.
>
> In the class
> String - category *JSON-writing
> (from package http://www.squeaksource.com/JSON)
> I replaced
>
> jsonWriteOn: aStream
>     | replacement |
>     aStream nextPut: $".
>     self do: [ :ch |
>         (replacement := Json escapeForCharacter: ch) "***"
>             ifNil: [ aStream nextPut: ch ]
>             ifNotNil: [ aStream nextPutAll: replacement ] ].
>     aStream nextPut: $".
>
>
> WITH
>
> jsonWriteOn: aStream
>     aStream nextPut: $".
>     aStream nextPutAll: (UTF8TextConverter new encodeString: self).
>     aStream nextPut: $".

This is just wrong. According to http://json.org a string can contain any
unicode character except for \ " and control characters. So here should be
no UTF-8 conversion.

You only need to convert the characters to UTF-8, because you're sending
them over the network to a server, and unicode characters have to be
converted to bytes someway. So the JSON printer shouldn't do any
conversion by default except for escaping. The only problem is that
escaping is not done as the spec requires it, but that's easy to fix.


Levente

>
>
> "*** NOTE: escapeForCharacter is incorrectly implemented in
> http://www.squeaksource.com/JSON
> and is corrected by Rado in the SCouchDB fork of the package JSON
> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>
>
>
> 3) My updated Test case
>
> myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
> d := Dictionary new. d at: 'title' put: 'aTitle'. d at: 'body' put:
> myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.
>
> RESULT: OK.
>
>
>
> 4) Conclusion
>
> With the change to the JSON package I am now fine in using WebClient
> for storing objects in a couchdB.
>
> However I did not commit my change to
> http://www.squeaksource.com/JSON
> as I do not (yet) understand the full impact of it.
>
>
> Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help
>
> --Hannes
>
> On 5/11/10, Igor Stasenko <[hidden email]> wrote:
>> On 11 May 2010 17:44, Hannes Hirzel <[hidden email]> wrote:
>>> On 5/10/10, radoslav hodnicak <[hidden email]> wrote:
>>>>
>>>> Which JSON package/version are you using? I fixed a bug in the one
>>>> distributed with SCouchDB few weeks ago, where it didn't encode utf8
>>>> characters properly - the correct escaped form is \uNNNN - always padded
>>>> to 4 Ns. that's why you get that warning, yours is only 2-3
>>>>
>>>> rado
>>>
>>> I have been using
>>> http://www.squeaksource.com/JSON (over 7000 downloads)
>>> in combination with WebClient.
>>>
>>> Thank you Rado, I found
>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>>> and will have a look at it.
>>> (Your comment: added handling of utf8 encoded input data - this is
>>> necessary for couchdb-lucene which sends results directly in utf8 and
>>> not \uNNNN encoded)
>>>
>> SCouchDB using a forked version of JSON package, which you can find in
>> SCouchDB repository
>> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>>
>> If you looking for that method, it can be found in Json>>unescapeUnicode
>>
>>
>>> --Hannes
>>>
>>>
>>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>>
>>>>> The test case made simpler
>>>>>
>>>>> WebClient httpPut: host, '/notes/test7' content:
>>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>>
>>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8
>>>>> JSON"}
>>>>> '
>>>>>
>>>>> whereas
>>>>>
>>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>>> type: 'text/plain'.
>>>>>
>>>>> gives back
>>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>>> '
>>>>>
>>>>> --Hannes
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
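The padding problem rado describes in the quoted messages (\uC3 instead of a four-digit form) can be sketched as follows. unicodeEscape is a hypothetical workspace block, not a method of the JSON package or of the SCouchDB fork; it only assumes Integer>>printStringBase: and String class>>new:withAll:, and like the original escapeForCharacter: it masks the code to 16 bits.

```smalltalk
"Sketch only: unicodeEscape is a hypothetical helper, not part of the JSON package."
| unicodeEscape |
unicodeEscape := [:code | | hex |
	hex := (code bitAnd: 16rFFFF) printStringBase: 16.
	"Left-pad with zeros so the escape always carries four hex digits."
	'\u', (String new: (4 - hex size max: 0) withAll: $0), hex].

unicodeEscape value: 16rE4.	"ä; the unpadded form would be \uE4"
unicodeEscape value: 8220	"the character used in the test case, U+201C"
```

Note that the failing test above escapes the two UTF-8 bytes of ä (C3 A4) rather than its code point (E4), so padding alone would not have produced the intended character there.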
Levente, your answer covers an earlier state of the exchange. See here
for the latest account
http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Basically the need for UTF8 conversion in my case stems from the fact
that I use the WebClient to post the JSON object and it accepts only
bytes. And I want to post to a couchDB which deals nicely with UTF8.

The JSON package as such needs no UTF8 conversion. Only escaping of
backslash \, double quote " and control characters.


The method
String >>jsonWriteOn: aStream

should stay at

String
> jsonWriteOn: aStream
>
>     | replacement |
>     aStream nextPut: $".
>     self do: [ :ch |
>         (replacement := Json escapeForCharacter: ch)
>             ifNil: [ aStream nextPut: ch ]
>             ifNotNil: [ aStream nextPutAll: replacement ] ].
>     aStream nextPut: $".

but the method
Json escapeForCharacter: ch
does not need to go for \uNNNN for non-ASCII characters.

So I do the UTF8 conversion just before Http posting.

I hope this clarified the situation and we might move soon to an
update of the JSON package.

--Hannes
On Wed, 12 May 2010, Hannes Hirzel wrote:
> Levente, your answer covers an earlier state of the exchange. See here
> for the latest account
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Sorry, I didn't read all the mails before I replied.


Levente
In reply to this post by Igor Stasenko
On 5/11/2010 2:36 PM, Igor Stasenko wrote:
> On 11 May 2010 19:22, Andreas Raab<[hidden email]> wrote:
>> What other methods do you need? There should be no problem adding any I just
>> had no need for them initially.
>>
> As far as i can tell,
>
> CouchDB API using PUT, POST, GET, DELETE methods.
>
> (http://wiki.apache.org/couchdb/API_Cheatsheet)

All of those are now covered by the latest WebClient. In addition to GET,
POST, PUT and DELETE, WebClient also supports HEAD, TRACE, and OPTIONS.

Plus, this forced me to deal with HTTP methods in WebServer correctly which
is great since it means you can now specify what methods should apply to a
given resource, e.g.,

    server := WebServer reset default.
    server listenOn: 8080.
    server addService: '/foo' action:[:req|
        req send200Response: 'OK'
    ] methods: {'GET'. 'PUT'. 'DELETE'}. "GET/PUT/DELETE are allowed but no POST"

Now, you can do, e.g.,

    WebClient httpGet: 'http://localhost:8080/foo'. "=> 200 OK"

but POSTs are not allowed:

    WebClient httpPost: 'http://localhost:8080/foo' content: '' type:''.
    "=> 405 Method Not Allowed"

However, since OPTIONS is supported you can ask for the supported
operations of the resource:

    WebClient httpOptions: 'http://localhost:8080/foo'.
    "=> allow: HEAD,TRACE,OPTIONS,GET,PUT,DELETE"

and for the default server options:

    WebClient httpOptions: 'http://localhost:8080/*'.
    "=> allow: HEAD,TRACE,OPTIONS,GET,POST"

Nice forcing function, thanks!

Cheers,
  - Andreas
In reply to this post by Andreas.Raab
Works like a charm using squid as a proxy with WebClient-Core-ar.22. Thanks!
Alex