WebClient, Json and CouchDB


WebClient, Json and CouchDB

Hannes Hirzel
Hello

This is a note on how I upload JSON documents to CouchDB using

Webclient   http://www.squeaksource.com/WebClient
and a modified version of http://www.squeaksource.com/JSON   JSON-jrd.28

The test case is

  |d r|
  d := Dictionary new. d at: 'title' put: 'The title of this card'.
  d at: 'body' put: (8820 asCharacter asString, 'aäbc', Character cr).
  r := WriteStream on: String new.
  (JsonObject newFrom: d) jsonWriteOn: r.
  "r contents"

  WebClient httpPut: 'http://192.168.0.121:5984/test/myDoc6'
            content: r contents
            type: ''.

WebClient gives back as contents
  '{"ok":true,"id":"myDoc6","rev":"1-cef58e13534fc0fcf7f38262bc086d12"}
'


To make this work I had to patch the method

Json escapeForCharacter: aCharacter


This was necessary because for example

    Json escapeForCharacter: 228 asCharacter

gave back
    '\uE4'

instead of
     '\u00E4'


I changed the method Json escapeForCharacter to

escapeForCharacter: c
       
        | index nnnn |
        ^ (index := c asciiValue + 1) <= escapeArray size
                ifTrue: [ ^ escapeArray at: index ]
                ifFalse: [nnnn := ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) .
                                                [nnnn size < 4] whileTrue: [nnnn := '0', nnnn].
                          ^ '\u', nnnn]


My question:

Is there a nicer way of doing

nnnn := ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) .
                                                [nnnn size < 4] whileTrue: [nnnn := '0', nnnn].
                          ^ '\u', nnnn

Basically I have to patch zeros in front of a ByteString which is too short.


Regards
Hannes



P.S. This note does not address the issue that it would be nice to NOT
escape characters which have a code >127 at all but rather keep them
as Unicode characters as the Json spec allows for this.
http://www.json.org/

Any Unicode character except " and \ is allowed.

This is about fixing an error which comes up when posting Json objects
to CouchDB.
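
One possible shortcut for the padding question above, as a sketch only. Variant 1 assumes the image provides Integer>>printStringBase:length:padded: (this selector is not mentioned in the thread, so check your image first). Variant 2 prepends the missing zeros by hand. The variable names are illustrative.

```smalltalk
| code digits padded manual |
code := 228.    "code point of ä"

"Variant 1: let Integer do the zero padding (assumes this selector exists in your image)."
padded := '\u', (code printStringBase: 16 length: 4 padded: true).

"Variant 2: prepend the missing zeros by hand."
digits := code printStringBase: 16.
manual := '\u', (String new: (4 - digits size max: 0) withAll: $0), digits.
```

Both variants are meant to produce the four-digit form '\u00E4' instead of '\uE4'.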


Re: WebClient, Json and CouchDB

Chris Cunnington
http://wiki.squeak.org/squeak/512


The first link is to the Databases page, which now lists CouchDB. The second page is Hannes's post made into a swiki page. This is useful stuff, so I thought I'd put it here for reference, where I and others can find it later. If Hannes finds this an appropriation of his post, I can take it down. 

Chris 



Re: WebClient, Json and CouchDB

radoslav hodnicak
In reply to this post by Hannes Hirzel


On Wed, 12 May 2010, Hannes Hirzel wrote:

> My question:
>
> Is there a nicer way of doing
>
> nnnn := ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) .
> [nnnn size < 4] whileTrue: [nnnn := '0', nnnn].
> ^ '\u', nnnn
>

Yes there is. As I said before, check the JSON package in the SCouchDB
repository (Igor's link from a few days ago), where I fixed this bug. I'm
kind of surprised at your insistence on using buggy/unmaintained JSON code
when you have been told several times there's one that's tested to work
with CouchDB (I use it in production).

rado


Re: WebClient, Json and CouchDB

Hannes Hirzel
Hello Rado

Yes, your version of the method is nicer:

escapeForCharacter: c

        | index |
        ^ (index := c asciiValue + 1) <= escapeArray size
                ifTrue: [ ^ escapeArray at: index ]
                "THIS IS WROOONG!!! unicode is not 16bit wide!"
                ifFalse: [ ^ '\u', (((c asciiValue bitAnd: 16rFFFF) printStringBase: 16)
                                        padded: #left to: 4 with: $0) ]

However your comment leads me to the non-urgent question: How would we
deal with a code point >65536?
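
For reference, JSON (RFC 4627) allows a character outside the 16-bit range to be escaped as two \uNNNN sequences, a UTF-16 surrogate pair. A minimal sketch of that arithmetic, assuming one wants to escape such a character at all; the helper block hex4 and the variable names are made up for illustration and are not part of the JSON package:

```smalltalk
| codePoint v high low hex4 escaped |
codePoint := 16r1D11E.    "MUSICAL SYMBOL G CLEF, outside the 16-bit range"

"Split the 20 bits above 16r10000 into two 10-bit halves."
v := codePoint - 16r10000.
high := 16rD800 + (v bitShift: -10).    "high surrogate carries the upper 10 bits"
low := 16rDC00 + (v bitAnd: 16r3FF).    "low surrogate carries the lower 10 bits"

"Hypothetical helper: hex digits, left-padded with zeros to 4 characters."
hex4 := [:n | | s |
	s := n printStringBase: 16.
	(String new: (4 - s size max: 0) withAll: $0), s].

escaped := '\u', (hex4 value: high), '\u', (hex4 value: low).
```

For U+1D11E this gives the pair D834 / DD1E, which matches the example in RFC 4627.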

Thank you for insisting that I check out your copy of the JSON package
which you maintain in the SCouchDB project. The surprise on my side is
that you went for creating a copy instead of putting your changes into
the JSON project as it is open for everybody to write. Your copy is
actually pretty hidden whereas the general JSON package is easy to
find.

I went through all the changes you and Igor did in the SCouchDB
project and decided to fold part of them back to the JSON package
http://www.squeaksource.org/JSON .

I documented it on the wiki page which goes along with the JSON
project. I copy it in below.***


So the updated test case for working with WebClient, JSON and the
couchDB is the following.

|json couchDBurl |
  json := JsonObject new.
  json title: 'The title of my note card'.
  json body: 'The body test text of my note card with some Unicode test characters ',
                   (8450 asCharacter asString, 'ä.', Character cr).

"Note: JsonObject behaves like a JavaScript object insofar that you
can add properties to instances without the necessity that they have
been declared as instance variables. But you might just as well use
JsonObject like a Dictionary instead as it is a subclass of
Dictionary."

"create couchDB instance"
couchDBurl := 'http://localhost:5984/notes'.

WebClient httpPut: couchDBurl
                 content: ''
                 type: 'text/plain'.

"Store first document"
WebClient httpPut: couchDBurl, '/myNote1'
                 content: json asJsonString
                 type: 'text/plain'.

"You get the document back with"

WebClient httpGet: couchDBurl, '/myNote1' .

So far so good. This solution however still escapes code points > 127.
See a note on this below and more on this in an upcoming post.
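
To check the JSON text without a running CouchDB, the document can be serialized and parsed back locally. A minimal sketch, using only selectors that appear in this thread (asJsonString and Json readFrom:) and assuming that the parser answers a Dictionary-like object whose keys are the property names:

```smalltalk
| json text parsed |
json := JsonObject new.
json title: 'The title of my note card'.

"Serialize, then parse the resulting string again."
text := json asJsonString.
parsed := Json readFrom: text readStream.

"Assumption: the parsed object understands #at: with the property name as key."
parsed at: 'title'
```

The same parsing step should apply to the text that CouchDB sends back for a GET, once it has been extracted from the response.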

Regards

Hannes


----------------------------------------------------------------------------------------------------------

***
JSON-hjh.32

Author: Hannes Hirzel

Ancestors: JSON-rh.31

In the project SCouchDB a copy of JSON is maintained by Igor Stasenko
and Radoslav Hodnicak.

This merges part of the changes back, in particular

SCouchDB project

    * JSON-Igor.Stasenko.28
    * JSON-Igor.Stasenko.29
    * JSON-rh.30
    * JSON-rh.31

Main changes

   1. JsonObject is now a subclass of Dictionary instead of Object. So
there is no need to implement the Dictionary interface.
   2. Fix for converting Unicode characters to \uNNNN format (missing
padding to 4 characters)

No further changes

The SCouchDB project contains more changes in the copy of the JSON package.

I did not go further in merging because in SCouchDB / JSON-rh.32
Radoslav Hodnicak introduces an instance variable 'converter'

which is initialized to

 converter := UTF8TextConverter new

Igor Stasenko, Levente Uzonyi and Hannes Hirzel agreed that the UTF8
conversion does not belong in the JSON package

http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Levente Uzonyi:

You only need to convert the characters to UTF-8, because you're
sending them over the network to a server, and Unicode characters have
to be converted to bytes someway. So the JSON printer shouldn't do any
conversion by default except for escaping. The only problem is that
escaping is not done as the spec requires it, but that's easy to fix.

http://www.json.org/

A string is a collection of zero or more Unicode characters, wrapped
in double quotes, using backslash escapes. A character is represented
as a single character string. A string is very much like a C or Java
string.

About escaping Unicode characters

Actually escaping Unicode characters to

\uNNNN

is not necessary for characters with codes >127 in case of an upload
to a CouchDB. But this version does it.

If you want to change this, patch the method

 Json class escapeForCharacter: c


Re: WebClient, Json and CouchDB

Igor Stasenko
Hannes, if you would be so kind, please merge your changes
with SCouchDB version of JSON
and save them in SCouchDB repository.

I will take the time to review them and fully integrate & fix all of the issues you
mentioned, including proper UTF-8 output encoding.

--
Best regards,
Igor Stasenko AKA sig.


Re: WebClient, Json and CouchDB

Igor Stasenko
In reply to this post by Hannes Hirzel
My comment on
SqS/JSON/JSON-hjh.32

Guys, can you give me any idea why you changed

stream peek / stream next

back to

self peek / self next

removed all uses of #peekFor:
and added:

next
        ^ self stream next

peek
        ^ self stream peek


You seem little concerned with speed?


Re: WebClient, Json and CouchDB

Hannes Hirzel
In reply to this post by Igor Stasenko
On 5/13/10, Igor Stasenko <[hidden email]> wrote:
> Hannes, if you would be so kind, please merge your changes
> with SCouchDB version of JSON
> and save them in SCouchDB repository.
>
> I will take a time to review them an fully integrate & fix all of the issues
> you
> mentioned, including a proper utf-8 output encoding.

Igor, I think you misunderstood me

I took your changes

  SCouchDB project

     * JSON-Igor.Stasenko.28
     * JSON-Igor.Stasenko.29
     * JSON-rh.30
     * JSON-rh.31

and folded them back into SqueakSource/JSON project

--Hannes


Re: WebClient, Json and CouchDB

Igor Stasenko
On 13 May 2010 03:49, Hannes Hirzel <[hidden email]> wrote:

> On 5/13/10, Igor Stasenko <[hidden email]> wrote:
>> Hannes, if you would be so kind, please merge your changes
>> with SCouchDB version of JSON
>> and save them in SCouchDB repository.
>>
>> I will take a time to review them an fully integrate & fix all of the issues
>> you
>> mentioned, including a proper utf-8 output encoding.
>
> Igor, I think you misunderstood me
>
> I took your changes
>
>  SCouchDB project
>
>     * JSON-Igor.Stasenko.28
>     * JSON-Igor.Stasenko.29
>     * JSON-rh.30
>     * JSON-rh.31
>
> and folded them back into SqueakSource/JSON project
>
Yeah, I found that out just after I sent the message :)

I will merge your fixes into my package then, but I won't switch to the
JSON repository, since I think that reverting 'stream peek' to 'self peek'
is a bad idea.


> --Hannes
>

--
Best regards,
Igor Stasenko AKA sig.


Re: WebClient, Json and CouchDB

Hannes Hirzel
In reply to this post by Igor Stasenko
On 5/13/10, Igor Stasenko <[hidden email]> wrote:

> My comment on
> SqS/JSON/JSON-hjh.32
>
> guys, can you give me any idea, why you replaced back the
>
> stream peek / stream next
>
> by
>
> self peek / self next
>
> removed all uses of #peekFor:
> and added:
>
> next
> ^ self stream next
>
> peek
> ^ self stream peek
>
>
> you're seem little concerned with speed?


The reason is that this change is in the following version

Name: JSON-Igor.Stasenko.34
Author: Igor.Stasenko
Time: 7 April 2010, 1:58:24.739 am
UUID: 4a92f912-177d-5941-9a4f-a773cb11f659
Ancestors: JSON-Igor.Stasenko.33

And I did not include this version 34 yet in http://www.squeaksource.com/JSON.

Thank you for pointing this out. Yes, I realized that a local upload
of 10000 records resulting in a 7MB compacted CouchDB was a bit slow.
I did not measure it though. My estimate is that it took 50 seconds.
And maybe this is not slow - I don't know.

Hannes


P.S. Regarding UTF8, please read my post carefully. It should not be
in the JSON package. Currently all the code values > 128 are escaped
so there is no need for it in case of storing documents. However I do
not know yet what an elegant interface for getting it back properly
should look like.


Re: WebClient, Json and CouchDB

Levente Uzonyi-2
In reply to this post by Hannes Hirzel
On Thu, 13 May 2010, Hannes Hirzel wrote:

> Hello Rado
>
> Yes, your version of the method is nicer
> escapeForCharacter: c
>
> | index |
> ^ (index := c asciiValue + 1) <= escapeArray size
> ifTrue: [ ^ escapeArray at: index ]
>
>
> "THIS IS WROOONG!!! unicode is not 16bit wide!"
> ifFalse: [ ^ '\u', (((c asciiValue bitAnd: 16rFFFF) printStringBase:
> 16) padded: #left to: 4 with: $0) ]
>
> However your comment leads me to the non-urgent question: How would we
> deal with a code point >65536?
No one has to deal with those, since all characters that must be
escaped fit into 16 bits (you can find the escaping rule in RFC 4627 if
you're interested). So this implementation is wrong, because it tries
to escape everything whose asciiValue is greater than 127, and it
will fail for values greater than 65535. This escaping is totally
unnecessary; it just gives a (not so) nice slowdown.

From RFC 4627:
"
    ... All Unicode characters may be placed within the
    quotation marks except for the characters that must be escaped:
    quotation mark, reverse solidus, and the control characters (U+0000
    through U+001F).

    Any character may be escaped. ...
"

So the best thing to do is: escape only $\ $" and the characters from 0 to 31.


Levente
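
The rule quoted above from RFC 4627 can be sketched as a workspace block. This is only an illustration under stated assumptions, not the code of the JSON package: the block name escape is made up, and control characters are written in the generic \u00NN form rather than the short forms such as \n.

```smalltalk
| escape |
"Hypothetical helper block, not part of the JSON package."
escape := [:aString |
	String streamContents: [:out |
		aString do: [:c | | v digits |
			v := c asInteger.
			(c = $" or: [c = $\])
				ifTrue: [out nextPut: $\; nextPut: c]    "quotation mark and reverse solidus"
				ifFalse: [
					v < 32
						ifTrue: [
							"control character: \u plus four hex digits"
							digits := v printStringBase: 16.
							out nextPutAll: '\u'; next: 4 - digits size put: $0; nextPutAll: digits]
						ifFalse: [out nextPut: c]]]]].    "everything else passes through unchanged"

escape value: 'say "hi"'
```

Characters above 127 are copied as they are, so turning them into bytes (for example UTF-8) is left to whatever sends the text over the network.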


Re: WebClient, Json and CouchDB

Igor Stasenko
In reply to this post by Hannes Hirzel
On 13 May 2010 04:02, Hannes Hirzel <[hidden email]> wrote:

> On 5/13/10, Igor Stasenko <[hidden email]> wrote:
>> My comment on
>> SqS/JSON/JSON-hjh.32
>>
>> guys, can you give me any idea, why you replaced back the
>>
>> stream peek / stream next
>>
>> by
>>
>> self peek / self next
>>
>> removed all uses of #peekFor:
>> and added:
>>
>> next
>>       ^ self stream next
>>
>> peek
>>       ^ self stream peek
>>
>>
>> you're seem little concerned with speed?
>
>
> The reason is that this change is in the following version
>
> Name: JSON-Igor.Stasenko.34
> Author: Igor.Stasenko
> Time: 7 April 2010, 1:58:24.739 am
> UUID: 4a92f912-177d-5941-9a4f-a773cb11f659
> Ancestors: JSON-Igor.Stasenko.33
>
> And I did not include this version 34 yet in http://www.squeaksource.com/JSON.
>
> Thank you for pointing this out. Yes, I realized that a local upload
> for 10000 records resulting in a 7MB compacted couchDB was a bit slow.
> I did not measure it though. My estimate is that it took 50 seconds.
> And maybe this it not slow - I don't know.
>
For what it's worth, I measured it: on a given JSON string, the stream peek version
gives about 5% higher parsing speed:

|json |
 json := JsonObject new
        title: 'The title of my note card';
  body: 'The body test text of my note card with some Unicode test characters ';
        foo: 10;
        bar: #( 10 'twenty' #thirty );
        bar1: #( 10 'twenty' #thirty );
        bar2: #( 10 'twenty' #thirty );
        bar3: #( 10 'twenty' #thirty );
        bar4: #( 10 'twenty' #thirty );
        bar5: #( 10 20 30 22 23 24 56 34 36 34 3 634 346 'twenty' #thirty );
      asJsonString.

[ 1000 timesRepeat: [ Json readFrom: json readStream ] ] timeToRun

with self peek/next:
1500
1485

with stream peek/next:

1431
1415

Also, I found that the more data you put in, the bigger the difference gets.
|json |
 json := JsonObject new
        bar: ((1 to: 1000) collect: [:i | i odd ifTrue: [i] ifFalse: [ 'x' ,
i asString ] ])
      asJsonString.

[ 100 timesRepeat: [ Json readFrom: json readStream ] ] timeToRun

self peek/next
3538
stream peek/next
3294

so now it's 7%
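
Since single timings vary from run to run, one way to make such comparisons a bit more robust is to repeat the measurement and keep the best time. A minimal sketch, reusing the selectors from the doits above; the variable names and the choice of 5 runs are arbitrary:

```smalltalk
| obj text samples best |
obj := JsonObject new.
obj bar: ((1 to: 1000) collect: [:i | i odd ifTrue: [i] ifFalse: ['x', i asString]]).
text := obj asJsonString.

"Run the same parsing workload several times and collect the timings."
samples := (1 to: 5) collect: [:run |
	[100 timesRepeat: [Json readFrom: text readStream]] timeToRun].

"The smallest sample is the one least disturbed by other activity."
best := samples inject: samples first into: [:a :b | a min: b].
```

Running this once with each parser variant loaded gives two best times that can be compared directly.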





--
Best regards,
Igor Stasenko AKA sig.


Re: WebClient, Json and CouchDB

Igor Stasenko
In reply to this post by Levente Uzonyi-2
2010/5/13 Levente Uzonyi <[hidden email]>:

>>
>> Yes, your version of the method is nicer
>> escapeForCharacter: c
>>
>>        | index |
>>        ^ (index := c asciiValue + 1) <= escapeArray size
>>                ifTrue: [ ^ escapeArray at: index ]
>>
>>
>>                "THIS IS WROOONG!!! unicode is not 16bit wide!"
>>                ifFalse: [ ^ '\u', (((c asciiValue bitAnd: 16rFFFF)
>> printStringBase:
>> 16) padded: #left to: 4 with: $0) ]
>>
>> However your comment leads me to the non-urgent question: How would we
>> deal with a code point >65536?
>
> Noone has to deal with those, since all characters that must be escaped fit
> into 16 bits (you can find the escaping rule in RFC 4627 if you're
> interested). So this implementation is wrong, because it's trying to escape
> everything which asciiValue is greater than 127 and will fail for values
> greater than 65535. This escaping is totally unnecessary, it just gives a
> (not so) nice slowdown.
>
> From RFC 4627:
> "
>   ... All Unicode characters may be placed within the
>   quotation marks except for the characters that must be escaped:
>   quotation mark, reverse solidus, and the control characters (U+0000
>   through U+001F).
>
>   Any character may be escaped. ...
> "
>
> So the best to do is: escape only $\ $" and the characters from 0 to 31.
>
so, how about just this:

escapeForCharacter: c
       
        | index |
        ^ (index := c asciiValue + 1) <= escapeArray size
                ifTrue: [ ^ escapeArray at: index ]
                ifFalse: [ c ]



>
> Levente
>


--
Best regards,
Igor Stasenko AKA sig.


Re: WebClient, Json and CouchDB

Hannes Hirzel
In reply to this post by Igor Stasenko
Igor,

I measured it as well, see below

I ran a test that actually uploads data to CouchDB:

|json couchDBurl |
  json := JsonObject new.
  json title: 'The title of my note card'.
  json body: 'The body test text of my note card with some Unicode test characters ',
                   (8450 asCharacter asString, 'ä.', Character cr).
  json myTestArray: ((1 to: 1000) collect: [:i |
                   i odd ifTrue: [i] ifFalse: [ 'x' , i asString ] ]).

"Note: JsonObject behaves like a JavaScript object insofar that you
can add properties to instances without the necessity that they have
been declared as instance variables. But you might just as well use
JsonObject like a Dictionary instead as it is a subclass of
Dictionary."

"create couchDB instance"
couchDBurl := 'http://localhost:5984/notes'.

WebClient httpPut: couchDBurl
                 content: ''
                 type: 'text/plain'.

"Store first document"
[1 to: 1000 do: [ :i |
WebClient httpPut: couchDBurl, '/myNote', i printString
                 content: json asJsonString
                 type: 'text/plain'.]] timeToRun printString.

With the speedup
8979
10854

(I measured it two times)

Without the speedup
13752

So it is worth going for it.

I committed http://www.squeaksource.com/JSON/JSON-hjh.34.mcz
It contains
   stream peek
instead of
   self peek

Regards
Hannes


Re: WebClient, Json and CouchDB

Igor Stasenko
I tried to compare the speed of the two backends (SCouchDB and WebClient)
to find a winner, but unfortunately WebClient stops with an error, while
mine works OK.

Here is a doit (maybe Andreas could say something about it):

--------------

|json couchDBurl |
 json := JsonObject new.
 json title: 'The title of my note card'.
 json body: 'The body test text of my note card with some Unicode test characters ',
                  (8450 asCharacter asString, 'ä.', Character cr).
 json myTestArray: ((1 to: 1000) collect: [:i |
                  i odd ifTrue: [i] ifFalse: [ 'x' , i asString ] ]).

"create couchDB instance"
couchDBurl := 'http://192.168.0.11:5984/foo'.

WebClient httpPut: couchDBurl
                content: ''
                type: 'text/plain'.

"Store first document"
[1 to: 1000 do: [ :i |
WebClient httpPut: couchDBurl, '/myNote', i printString
                content: json asJsonString
                type: 'text/plain'.]] timeToRun printString.

-------------------

I thought it might be caused by the recent SocketStream updates that
Andreas mentioned, so I updated my image to the recent trunk (10143 now),
but I still get a walkback when WebResponse tries to read the response
header in #readFrom: aStream

status := stream upToAll: String crlf.

and gets status = ''.

I'm running it on Windows.

And here is the doit which works:
------------
|json db time |
 json := JsonObject new.
 json title: 'The title of my note card'.
 json body: 'The body test text of my note card with some Unicode test characters ',
                  (8450 asCharacter asString, 'ä.', Character cr).
 json myTestArray: ((1 to: 1000) collect: [:i |
                  i odd ifTrue: [i] ifFalse: [ 'x' , i asString ] ]).


db := SCouchDBAdaptor new host: '192.168.0.11'; ensureDatabase: 'foo'.

time := [1 to: 1000 do: [ :i |
                db documentAt: 'myNote', i printString
                put: json ]] timeToRun printString.

db adaptor deleteDatabase: 'foo'.
time
---------------

--
Best regards,
Igor Stasenko AKA sig.