Hi!
I'm having a little performance problem with the JSON example. Given the following program:

http://www.ta-sa.org/files/data/jstest.st

I get this output here:

Best : 2 ms
Worst: 21 ms
Avg  : 9 ms

That means the best parsing time is 2 ms, the worst is 21 ms, and on average it takes 9 ms to parse the small JSON string in that file. I don't know where the variance comes from; it looks weird. It seems even weirder that such a small string takes up to 21 ms to parse.

The same string takes, in the fastest C JSON parser I know, only 0.0165 ms. And the slowest C JSON parser I know takes 0.147 ms. I don't know how fast Smalltalk is when it comes down to string processing, but I certainly would not expect an average of 9 ms to parse that string, at least not with that variance. There must be something else that slows this down so much.

Any suggestions?

Robin

_______________________________________________
help-smalltalk mailing list
http://lists.gnu.org/mailman/listinfo/help-smalltalk
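For scale, here is the 9 ms average divided by each of the two C timings quoted above:

```latex
\frac{9\ \text{ms}}{0.0165\ \text{ms}} \approx 545, \qquad \frac{9\ \text{ms}}{0.147\ \text{ms}} \approx 61
```

On this input, the Smalltalk average is therefore roughly 60 to 550 times slower than those two C parsers.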
On Sat, 2007-11-03 at 18:57 +0100, Robin Redeker wrote:
> The same string takes, in the fastest C JSON parser I know, only 0.0165 ms. And the slowest C JSON parser I know takes 0.147 ms. I don't know how fast Smalltalk is when it comes down to string processing, but I certainly would not expect an average of 9 ms to parse that string, at least not with that variance. There must be something else that slows this down so much.

How about GC?

Try rerunning the test, showing every individual timing. If it seems that GC is running pretty often, remember that every Iconv makes its own 1000-byte buffer, in addition to whatever bytes you use in parsing.

And of course encoding support has its own overhead. How do those C parsers fare in that area?

-- 
Our last-ditch plan is to change the forums into a podcast, then send RSS feeds into the blogosphere so our users can further debate the legality of mashups amongst this month's 20 'sexiest' gadgets.
--Richard "Lowtax" Kyanka
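The suggestion to look at every run, not only best/worst/average, can be sketched in GNU Smalltalk as follows. jstest.st is not reproduced in this thread, so `parse` is a hypothetical stand-in block and the JSON text is made up. The block only uppercases the string, so it is not a JSON parser. Substitute the real parser call before drawing any conclusions.

```smalltalk
"Per-iteration timing loop. parse is a hypothetical placeholder workload,
 NOT a JSON parser: replace its body with the call jstest.st actually uses."
| json parse times best worst avg |
json := '{"name": "test", "values": [1, 2, 3]}'.
parse := [:text | text asUppercase].
times := OrderedCollection new.
50 timesRepeat: [
    times add: (Time millisecondsToRun: [parse value: json])].

"Print every run, so isolated spikes (e.g. GC pauses) stand out."
times do: [:ms | Transcript show: ms printString; show: ' '].
Transcript cr.

"Summarize in the same Best/Worst/Avg shape as the original output."
best := times inject: times first into: [:a :b | a min: b].
worst := times inject: times first into: [:a :b | a max: b].
avg := (times inject: 0 into: [:a :b | a + b]) // times size.
Transcript showCr: 'Best : ', best printString, ' ms';
    showCr: 'Worst: ', worst printString, ' ms';
    showCr: 'Avg  : ', avg printString, ' ms'.
```

Suppose the per-run list shows mostly fast runs broken up by occasional slow ones, as in the list reported later in this thread. That pattern points at periodic pauses rather than uniformly slow parsing.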
On Sat, Nov 03, 2007 at 06:06:05PM -0500, Stephen Compall wrote:
> On Sat, 2007-11-03 at 18:57 +0100, Robin Redeker wrote:
> > The same string takes, in the fastest C JSON parser I know, only 0.0165 ms. And the slowest C JSON parser I know takes 0.147 ms. I don't know how fast Smalltalk is when it comes down to string processing, but I certainly would not expect an average of 9 ms to parse that string, at least not with that variance. There must be something else that slows this down so much.
>
> How about GC?
>
> Try rerunning the test, showing every individual timing. If it seems that GC is running pretty often, remember that every Iconv makes its own 1000-byte buffer, in addition to whatever bytes you use in parsing.
>
> And of course encoding support has its own overhead. How do those C parsers fare in that area?

Well, those only operate on Unicode characters, as the JSON implementation does. But they indeed don't do any encoding work. So that might explain a bit of the overhead, and the GC might also explain the bad performance. Is GC known to cause pauses several milliseconds long?

If I list the individual iterations, I get:

27 5 22 5 7 24 6 24 5 21 ...

Does the GC run that often? And still, I find 17 ms a _quite_ long pause for a GC run, but of course I'm not used to GC systems, as I've been using refcounting systems most of the time.

Robin
On Sun, 2007-11-04 at 01:07 +0100, Robin Redeker wrote:
> Well, those only operate on Unicode characters, as the JSON implementation does. But they indeed don't do any encoding work. So that might explain a bit of the overhead, and the GC might also explain the bad performance. Is GC known to cause pauses several milliseconds long?

It explains a *lot* of the overhead. In 604 (615 was "have JSON work on Unicode"), I got 0/14/1 (best/worst/avg), with only a few outliers (presumably due to GC) and runs of 0 or 1 ms dominating the set. In 619 I get 5/34/14, though of course using I18N also loads the GC, so it can't account for all of that.

I should add that I would find a JSON parser/formatter that supports I18N far more useful than one without, regardless of the load. Furthermore, I would rather have UnicodeStrings than Strings, which by my count would eliminate 17 of the 18 Iconvs that your test produces on each iteration, though this doesn't appear to be an option.
On Sat, Nov 03, 2007 at 09:05:20PM -0500, Stephen Compall wrote:
> On Sun, 2007-11-04 at 01:07 +0100, Robin Redeker wrote:
> > Well, those only operate on Unicode characters, as the JSON implementation does. But they indeed don't do any encoding work. So that might explain a bit of the overhead, and the GC might also explain the bad performance. Is GC known to cause pauses several milliseconds long?
>
> It explains a *lot* of the overhead. In 604 (615 was "have JSON work on Unicode"), I got 0/14/1 (best/worst/avg), with only a few outliers (presumably due to GC) and runs of 0 or 1 ms dominating the set. In 619 I get 5/34/14, though of course using I18N also loads the GC, so it can't account for all of that.

That's very interesting, thanks for testing it. It surprises me a bit that GC really is such a huge overhead. I guess the GC doesn't leave much room for optimization?

> I should add that I would find a JSON parser/formatter that supports I18N far more useful than one without, regardless of the load. Furthermore, I would rather have UnicodeStrings than Strings, which by my count would eliminate 17 of the 18 Iconvs that your test produces on each iteration, though this doesn't appear to be an option.

A JSON parser _MUST_ support Unicode, or else it's not a JSON parser :-)

Maybe I can change the parser to parse UTF-8 encoded byte buffers instead of real Unicode strings. But I don't think there is much to be saved there anyway. (The best way to gain performance is probably to implement a C parser for UTF-8 encoded byte buffers with the same Smalltalk API.)

And indeed, skipping the iconv step would just test the JSON parser itself. In my application that uses the JSON parser, I actually can't skip that step, as I get the data from the network and have to run it through iconv anyway.

Maybe I could move GC into a process that only runs when the system is idle, and have a timer that ensures the GC runs once in a while. That would probably improve the latency of my server application a bit.
Robin
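The idea of collecting garbage while the server is idle could be sketched roughly as follows. The code is not from the thread, and it rests on assumptions. It assumes `ObjectMemory globalGarbageCollect` and `Processor userBackgroundPriority` are available as in GNU Smalltalk's kernel. The 5-second interval is arbitrary. It only adds forced collections during quiet periods and does not stop the VM from collecting whenever allocation demands it, so by itself it cannot remove the pauses.

```smalltalk
"Hypothetical idle-time collector: wakes up every 5 seconds, but runs at
 background priority, so it only gets the CPU when no user-priority
 process (e.g. a request handler) is runnable."
| idleGC |
idleGC := [
    [(Delay forSeconds: 5) wait.
     ObjectMemory globalGarbageCollect] repeat
] forkAt: Processor userBackgroundPriority.

"... serve requests here ..."

"Stop the collector process when the server shuts down."
idleGC terminate.
```

Because of the low priority, the Delay only approximates the "timer" half of the idea. If the server never goes idle, the forced collection keeps being postponed.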
On Sun, 2007-11-04 at 14:00 +0100, Robin Redeker wrote:
> And indeed, skipping the iconv step would just test the JSON parser itself. In my application that uses the JSON parser, I actually can't skip that step, as I get the data from the network and have to run it through iconv anyway.

I don't mean the input; of course that should be iconved. I'm referring to the data structures produced by the program.

A UnicodeString, if written out, may do the conversion anyway, in a way similar to how it's done now; however, it will also first ask whether the stream supports UnicodeString handling, in which case it'll rely on the stream to recode it properly.

With the attached patch, and with the use of "outputEncoding" removed from the example, I got 2/22/3 with a great reduction in outliers (as opposed to 5/34/14 before).

Let me know if you like it and I'll add it to my archive.

json-optional-outputenc.patch (1K)
> With the attached patch, and with the use of "outputEncoding" removed from the example, I got 2/22/3 with a great reduction in outliers (as opposed to 5/34/14 before).

Fine by me. The best thing would be to avoid allocating the input buffer for the Iconv objects entirely when the streams are backed by a String. However, I fear that's post-3.0.

Paolo
On Sun, Nov 04, 2007 at 07:21:55PM +0000, Stephen Compall wrote:
> On Sun, 2007-11-04 at 14:00 +0100, Robin Redeker wrote:
> > And indeed, skipping the iconv step would just test the JSON parser itself. In my application that uses the JSON parser, I actually can't skip that step, as I get the data from the network and have to run it through iconv anyway.
>
> I don't mean the input; of course that should be iconved. I'm referring to the data structures produced by the program.
>
> A UnicodeString, if written out, may do the conversion anyway, in a way similar to how it's done now; however, it will also first ask whether the stream supports UnicodeString handling, in which case it'll rely on the stream to recode it properly.
>
> With the attached patch, and with the use of "outputEncoding" removed from the example, I got 2/22/3 with a great reduction in outliers (as opposed to 5/34/14 before).
>
> Let me know if you like it and I'll add it to my archive.

I like it! Especially the 5/34/14 -> 2/22/3 improvement is incredible!

> - str := ReadWriteStream on: UnicodeString new.
> + str := WriteStream on: (UnicodeString new: 8).

What's the magic behind 8?

Robin
> I like it! Especially the 5/34/14 -> 2/22/3 improvement is incredible!
>
>> - str := ReadWriteStream on: UnicodeString new.
>> + str := WriteStream on: (UnicodeString new: 8).
>
> What's the magic behind 8?

Greater than 0. :-)

If I had to choose a magic number, I would choose 6 (8 bytes for the header + 6*4 = a cache line) or 14. Not that I think it makes any difference in practice.

Paolo
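Here is the cache-line arithmetic, spelled out. It takes the 8-byte object header and the 4 bytes per UnicodeString element (the 6*4 above) as given. An initial capacity of n then occupies:

```latex
\text{bytes}(n) = 8 + 4n, \qquad \text{bytes}(6) = 8 + 24 = 32, \qquad \text{bytes}(14) = 8 + 56 = 64
```

So 6 fills a 32-byte cache line, and 14 presumably targets a 64-byte one.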
As discussed,
smalltalk--backstage--2.2--patch-70
    optionalize JSON outputEncoding

On Mon, 2007-11-05 at 11:26 +0100, Paolo Bonzini wrote:
> > What's the magic behind 8?
>
> Greater than 0. :-)

I picked the growCollection size for a WriteStream collection of size 0, which is what was originally produced.

Speaking of which:

WriteStream class extend [
    onSpecies: aSpecies [
        ^self on: (aSpecies new: 8)
    ]
]

How about it? The collection species needs to support #new: anyway....
> WriteStream class extend [
>     onSpecies: aSpecies [
>         ^self on: (aSpecies new: 8)
>     ]
> ]
>
> How about it? The collection species needs to support #new: anyway....

What about instead:

ArrayedCollection class extend [
    writeStream [
        ^WriteStream on: (self new: 8)
    ]
]

As in "Array writeStream"?

Actually, there is also SequenceableCollection class>>#streamContents:, which is what you were looking for. I am moving it to ArrayedCollection, since those are the collections for which the current implementation works.

Paolo
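The following small GNU Smalltalk sketch compares two idioms. The first is an explicit WriteStream on a pre-sized collection, as in the patch. The second is the existing #streamContents: class method mentioned above. String stands in for UnicodeString to keep the example independent of I18N. The initial size of 8 is just the value from the patch, and the written text is made up.

```smalltalk
| explicit viaBlock |
"Explicit form, as in the patch: pre-size the buffer, then take #contents."
explicit := WriteStream on: (String new: 8).
explicit nextPutAll: 'key'; nextPut: $:; print: 42.

"Block form: the collection class creates the stream and answers its contents."
viaBlock := String streamContents: [:ws |
    ws nextPutAll: 'key'; nextPut: $:; print: 42].

explicit contents displayNl.
viaBlock displayNl.
```

The block form avoids repeating the collection class and the initial size at each call site, which is roughly what the proposed onSpecies: helper was after.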