performance of the json parser


performance of the json parser

Robin Redeker-2
Hi!

I'm having a little performance problem with the JSON example.
The following program:

   http://www.ta-sa.org/files/data/jstest.st

produces this output here:

   Best : 2 ms
   Worst: 21 ms
   Avg  : 9 ms

That means: the best parsing time was 2 ms, the worst was 21 ms, and on
average it takes 9 ms to parse the little JSON string in that file.
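
For reference, a timing harness along these lines reproduces that kind of
output. This is only a sketch, not a quote of jstest.st: the parser entry
point JSONReader toJSON: and the test string are assumptions.

"Hedged sketch of a jstest.st-style harness; the parse selector
 and input string are assumed, not taken from the actual file."
| source times best worst avg |
source := '{"name": "test", "values": [1, 2, 3]}'.
times := (1 to: 100) collect: [:i |
    Time millisecondsToRun: [JSONReader toJSON: source]].
best := times inject: times first into: [:a :b | a min: b].
worst := times inject: times first into: [:a :b | a max: b].
avg := (times inject: 0 into: [:a :b | a + b]) // times size.
Transcript showCr: 'Best : ', best printString, ' ms'.
Transcript showCr: 'Worst: ', worst printString, ' ms'.
Transcript showCr: 'Avg  : ', avg printString, ' ms'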

I don't know where the variance comes from; it looks weird. And it seems
even weirder that such a small string takes up to 21 ms to parse.

The same string takes, in the fastest C JSON parser I know, only 0.0165 ms.
And the slowest C JSON parser I know takes 0.147 ms.  I don't know how
fast Smalltalk is when it comes down to string processing, but I
certainly would not expect an avg of 9 ms to parse that string, at least
not with that variance. There must be something else that slows this
down so much.

Any suggestions?


Robin


_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Re: performance of the json parser

S11001001
On Sat, 2007-11-03 at 18:57 +0100, Robin Redeker wrote:
> The same string takes, in the fastest C JSON parser I know, only 0.0165 ms.
> And the slowest C JSON parser I know takes 0.147 ms.  I don't know how
> fast Smalltalk is when it comes down to string processing, but I
> certainly would not expect an avg of 9 ms to parse that string, at least
> not with that variance. There must be something else that slows this
> down so much.

How about GC?

Try rerunning the test showing every time result.  If it seems that GC
is running pretty often, remember that every Iconv makes its own
1000-byte buffer, in addition to whatever bytes you use in parsing.
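
The per-iteration numbers can be printed with a loop like this (again a
sketch; the parse selector and source string are assumptions):

"Print each individual parse time to spot GC-induced outliers."
(1 to: 20) do: [:i |
    Transcript showCr:
        (Time millisecondsToRun: [JSONReader toJSON: source]) printString]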

And of course encoding support has its own overhead.  How are those C
parsers in that field?

--
Our last-ditch plan is to change the forums into a podcast, then send
RSS feeds into the blogosphere so our users can further debate the
legality of mashups amongst this month's 20 'sexiest' gadgets.
        --Richard "Lowtax" Kyanka


Re: performance of the json parser

Robin Redeker-2
On Sat, Nov 03, 2007 at 06:06:05PM -0500, Stephen Compall wrote:

> On Sat, 2007-11-03 at 18:57 +0100, Robin Redeker wrote:
> > The same string takes, in the fastest C JSON parser I know, only 0.0165 ms.
> > And the slowest C JSON parser I know takes 0.147 ms.  I don't know how
> > fast Smalltalk is when it comes down to string processing, but I
> > certainly would not expect an avg of 9 ms to parse that string, at least
> > not with that variance. There must be something else that slows this
> > down so much.
>
> How about GC?
>
> Try rerunning the test showing every time result.  If it seems that GC
> is running pretty often, remember that every Iconv makes its own
> 1000-byte buffer, in addition to whatever bytes you use in parsing.
>
> And of course encoding support has its own overhead.  How are those C
> parsers in that field?

Well, those only operate on Unicode characters, as the JSON
implementation does. But they indeed don't do any encoding stuff.
So that might explain a bit of the overhead, and the GC might also
explain the bad performance. Is GC known to cause pauses several
milliseconds long?

If I make a list of the per-iteration times I get:

   27
   5
   22
   5
   7
   24
   6
   24
   5
   21
   ...

Does the GC run that often? And still, I find 17 ms _quite_ a long
pause for a GC run, but then I'm of course not used to GC systems, as
I've been using refcounting systems for most of the time now.


Robin



Re: performance of the json parser

S11001001
On Sun, 2007-11-04 at 01:07 +0100, Robin Redeker wrote:
> Well, those only operate on Unicode characters, as the JSON
> implementation does. But they indeed don't do any encoding stuff.
> So that might explain a bit of the overhead, and the GC might also
> explain the bad performance. Is GC known to cause pauses several
> milliseconds long?

It explains a *lot* of the overhead.  In 604 (615 was "have JSON work on
Unicode"), I got 0/14/1, with only a few outliers (presumably due to GC)
and 0/1 dominating the set.  In 619 I get 5/34/14, though of course
using I18N also loads the GC so it can't account for all of that.

I should add that I would find a JSON parser/formatter that supports
I18N to be far more useful than one without, regardless of the load.
Furthermore, I would rather have UnicodeStrings than Strings, which by
my count would eliminate 17 of the 18 Iconvs that your test produces on
each iteration, though this doesn't appear to be an option.

--
Our last-ditch plan is to change the forums into a podcast, then send
RSS feeds into the blogosphere so our users can further debate the
legality of mashups amongst this month's 20 'sexiest' gadgets.
        --Richard "Lowtax" Kyanka


Re: performance of the json parser

Robin Redeker-2
On Sat, Nov 03, 2007 at 09:05:20PM -0500, Stephen Compall wrote:

> On Sun, 2007-11-04 at 01:07 +0100, Robin Redeker wrote:
> > Well, those only operate on Unicode characters, as the JSON
> > implementation does. But they indeed don't do any encoding stuff.
> > So that might explain a bit of the overhead, and the GC might also
> > explain the bad performance. Is GC known to cause pauses several
> > milliseconds long?
>
> It explains a *lot* of the overhead.  In 604 (615 was "have JSON work on
> Unicode"), I got 0/14/1, with only a few outliers (presumably due to GC)
> and 0/1 dominating the set.  In 619 I get 5/34/14, though of course
> using I18N also loads the GC so it can't account for all of that.

That's very interesting, thanks for testing that. It surprises me a bit
that GC is indeed such a huge overhead.

I guess the GC doesn't leave much room for optimization?

> I should add that I would find a JSON parser/formatter that supports
> I18N to be far more useful than one without, regardless of the load.
> Furthermore, I would rather have UnicodeStrings than Strings, which by
> my count would eliminate 17 of the 18 Iconvs that your test produces on
> each iteration, though this doesn't appear to be an option.

A JSON parser _MUST_ support Unicode or else it's not a JSON parser :-)
Maybe I can change the parser to parse UTF-8 encoded byte buffers
instead of real Unicode strings. But I don't think that there is much
to be saved here anyway. (The best way to gain performance is probably
to implement a C parser for UTF-8 encoded byte buffers with the same
ST API.)

And indeed, skipping the iconv step would just test the JSON parser
itself. In my application that uses the JSON parser I actually can't
skip that step, as I get the data from the network and have to run it
through iconv anyway.

Maybe I could move GC into some process that only runs when the system
is idle, and have a timer that ensures that the GC runs once in a while.
That would probably improve the latency of my server application a bit.
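
In GNU Smalltalk that idea could be sketched as a low-priority background
process. The ObjectMemory and process selectors used here exist in GNU
Smalltalk, but the fixed 5-second policy is invented for illustration;
real idle detection would need more care.

"Force a full GC periodically from a background process, so that
 collections tend to happen off the latency-critical path."
[[true] whileTrue: [
    (Delay forSeconds: 5) wait.
    ObjectMemory globalGarbageCollect]]
        forkAt: Processor userBackgroundPriority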


Robin



Re: performance of the json parser

S11001001
On Sun, 2007-11-04 at 14:00 +0100, Robin Redeker wrote:
> And indeed, skipping the iconv step would just test the JSON parser
> itself. In my application that uses the JSON parser I actually can't
> skip that step, as I get the data from the network and have to run it
> through iconv anyway.

I don't mean the input; of course that should be iconved.  I'm referring
to the data structures produced by the program.

A UnicodeString, if written out, may do the conversion anyway in a way
similar to how it's done now; however, it will also first ask whether
the stream supports UnicodeString handling, in which case it'll rely on
the stream to recode it properly.

With the attached patch and removing use of "outputEncoding" from the
example, I got 2/22/3 with a great reduction in outliers (as opposed to
5/34/14 before).

Let me know if you like it and I'll add it to my archive.

--
Our last-ditch plan is to change the forums into a podcast, then send
RSS feeds into the blogosphere so our users can further debate the
legality of mashups amongst this month's 20 'sexiest' gadgets.
        --Richard "Lowtax" Kyanka

[Attachment: json-optional-outputenc.patch (1K)]

Re: performance of the json parser

Paolo Bonzini

> With the attached patch and removing use of "outputEncoding" from the
> example, I got 2/22/3 with a great reduction in outliers (as opposed to
> 5/34/14 before).

Fine by me.

The best thing would be to avoid allocating the input buffer for the
Iconv objects entirely when the streams are backed by a String.  However,
I fear that's post-3.0.

Paolo




Re: performance of the json parser

Robin Redeker-2
On Sun, Nov 04, 2007 at 07:21:55PM +0000, Stephen Compall wrote:

> On Sun, 2007-11-04 at 14:00 +0100, Robin Redeker wrote:
> > And indeed, skipping the iconv step would just test the JSON parser
> > itself. In my application that uses the JSON parser I actually can't
> > skip that step, as I get the data from the network and have to run it
> > through iconv anyway.
>
> I don't mean the input; of course that should be iconved.  I'm referring
> to the data structures produced by the program.
>
> A UnicodeString, if written out, may do the conversion anyway in a way
> similar to how it's done now; however, it will also first ask whether
> the stream supports UnicodeString handling, in which case it'll rely on
> the stream to recode it properly.
>
> With the attached patch and removing use of "outputEncoding" from the
> example, I got 2/22/3 with a great reduction in outliers (as opposed to
> 5/34/14 before).
>
> Let me know if you like it and I'll add it to my archive.
>

I like it! Especially the improvement from 5/34/14 to 2/22/3 is incredible!

> -   str := ReadWriteStream on: UnicodeString new.
> +   str := WriteStream on: (UnicodeString new: 8).

What's the magic behind 8?



Robin



Re: performance of the json parser

Paolo Bonzini

> I like it! Especially the 5/34/14 -> 2/22/3 is incredible!
>
>> -   str := ReadWriteStream on: UnicodeString new.
>> +   str := WriteStream on: (UnicodeString new: 8).
>
> What's the magic behind 8?

Greater than 0. :-)

If I have to choose a magic number, I would choose 6 (8 for the header +
6*4 = a cache line) or 14.  Not that I think it makes any difference in
practice.

Paolo



Re: performance of the json parser

S11001001
As discussed,

smalltalk--backstage--2.2--patch-70
    optionalize JSON outputEncoding

On Mon, 2007-11-05 at 11:26 +0100, Paolo Bonzini wrote:
> > What's the magic behind 8?
>
> Greater than 0. :-)

I picked the growCollection size for a WriteStream collection of size 0,
which is what was originally produced.

Speaking of which:

WriteStream class extend [
    onSpecies: aSpecies [
        ^self on: (aSpecies new: 8)
    ]
]

?  Since collection species needs to support #new: anyway....
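
Since #onSpecies: is only a proposal at this point, usage would be
hypothetical, but it would look something like:

"Hypothetical use of the proposed WriteStream class>>onSpecies:."
| ws |
ws := WriteStream onSpecies: UnicodeString.
ws nextPutAll: 'hello'.
ws contents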

--
Our last-ditch plan is to change the forums into a podcast, then send
RSS feeds into the blogosphere so our users can further debate the
legality of mashups amongst this month's 20 'sexiest' gadgets.
        --Richard "Lowtax" Kyanka


Re: performance of the json parser

Paolo Bonzini

> WriteStream class extend [
>     onSpecies: aSpecies [
>         ^self on: (aSpecies new: 8)
>     ]
> ]
>
> ?  Since collection species needs to support #new: anyway....

What about instead

ArrayedCollection class extend [
     writeStream [
         ^WriteStream on: (self new: 8)
     ]
]

As in "Array writeStream"?  Actually, there is SequenceableCollection
class>>#streamContents: too, which is what you were looking for.  I am
moving it to ArrayedCollection, since those are the collections for which
the current implementation works.
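
For comparison, #streamContents: (mentioned above) covers the common
case in one expression, along these lines:

"Build a String through a WriteStream and answer its contents."
String streamContents: [:s |
    s nextPutAll: 'x = '; print: 42]
"answers the String 'x = 42'"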

Paolo

