Hi guys
tristan is a new student here and he would like to work on collection optimization and implementation. I would like the get some ideas from you. - are there some collections that would be cool to improve? - are there some collections that would be cool to have and that we do not have yet? If you have any idea related to optimizations let me know. Stef _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
stephane ducasse wrote:
> Hi guys > > tristan is a new student here and he would like to work on collection optimization and implementation. > I would like the get some ideas from you. > - are there some collections that would be cool to improve? > - are there some collections that would be cool to have and that we do not have yet? > If you have any idea related to optimizations let me know. I will get bashed but: [1] What would be cool for Seaside IMHO: - Multi Value Dictionaries, e.g. for request parameters - Some kind of thread safe map with atomic operations [2], e.g. for session store [1] http://code.google.com/p/google-collections/ [2] http://java.sun.com/javase/6/docs/api/java/util/concurrent/ConcurrentMap.html Cheers Philippe _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
> What would be cool for Seaside IMHO:
> - Multi Value Dictionaries, e.g. for request parameters > - Some kind of thread safe map with atomic operations [2], e.g. for > session store Why would that be cool? We already have a WAMultiValueDictionary for request parameters. Also we have some ad-hock solutions for the thread safe dictionaries. I would be more interested to see ... - if it would be beneficial for a collection to be able to swap out its internal representation automatically depending on its load and use ... - what we can do by externalizing the enumeration protocol (no, this is not the same as streams or the iterable refactoring that was recently proposed) ... - how to make collections more composable, e.g. make them thread safe, make them read-only, make them ordered (sets and dictionaries), ... Lukas -- Lukas Renggli http://www.lukas-renggli.ch _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
Lukas Renggli wrote:
>> What would be cool for Seaside IMHO: >> - Multi Value Dictionaries, e.g. for request parameters >> - Some kind of thread safe map with atomic operations [2], e.g. for >> session store > > Why would that be cool? We already have a WAMultiValueDictionary for > request parameters. We could (theoretically, if other dialects follow) delete our code. Everybody who needs something similar doesn't need to write his own. > Also we have some ad-hock solutions for the thread > safe dictionaries. Yeah, the same comment as above applies. Cheers Philippe _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by stephane ducasse
On 11 March 2010 15:24, stephane ducasse <[hidden email]> wrote:
> Hi guys > > tristan is a new student here and he would like to work on collection optimization and implementation. > I would like the get some ideas from you. > - are there some collections that would be cool to improve? > - are there some collections that would be cool to have and that we do not have yet? > If you have any idea related to optimizations let me know. > I'd say that it is strange choice, because collections , apart of having big protocols don't really interesting piece of code from an optimization side, because most of things there already implemented quite good. Reimplementing them using traits - that's would be cool. > Stef > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > -- Best regards, Igor Stasenko AKA sig. _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Philippe Marschall-2
Thanks for the suggestions :)
> >> Hi guys >> >> tristan is a new student here and he would like to work on collection optimization and implementation. >> I would like the get some ideas from you. >> - are there some collections that would be cool to improve? >> - are there some collections that would be cool to have and that we do not have yet? >> If you have any idea related to optimizations let me know. > > I will get bashed but: [1] no not us :) You remember I forced you to give a talk on what we could learn from Java :) > What would be cool for Seaside IMHO: > - Multi Value Dictionaries, e.g. for request parameters > - Some kind of thread safe map with atomic operations [2], e.g. for > session store > > [1] http://code.google.com/p/google-collections/ > [2] > http://java.sun.com/javase/6/docs/api/java/util/concurrent/ConcurrentMap.html > > Cheers > Philippe > > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by stephane ducasse
On Thu, 11 Mar 2010, stephane ducasse wrote:
> Hi guys > > tristan is a new student here and he would like to work on collection optimization and implementation. > I would like the get some ideas from you. > - are there some collections that would be cool to improve? Here's the list I plan to improve is Squeak: - OrderedCollection: fix the growing behavior, adding n elements takes O(n^2) time in some cases - SortedCollection: just deprecate it (or replace it's crappy quicksort implementation if you really want to improve it. But I think it's useless) - TextStream: adding n characters with different attributes has O(n^2) runtime - StandardFileStream >> #upTo: (the recursive call + concatenation requires O(n^2) runtime and O(n^2) memory for large chunks of data) Levente > - are there some collections that would be cool to have and that we do not have yet? > If you have any idea related to optimizations let me know. > > Stef > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
On 13.03.2010 01:23, Levente Uzonyi wrote:
> On Thu, 11 Mar 2010, stephane ducasse wrote: > >> Hi guys >> >> tristan is a new student here and he would like to work on collection >> optimization and implementation. >> I would like the get some ideas from you. >> - are there some collections that would be cool to improve? > > Here's the list I plan to improve is Squeak: > - OrderedCollection: fix the growing behavior, adding n elements takes > O(n^2) time in some cases really a usecase worth optimizing for? makeRoomAtFirst:/Last: can definately be sped up using replaceFrom:to:with:startingAt: though. > - SortedCollection: just deprecate it (or replace it's crappy > quicksort implementation if you really want to improve it. But I think > it's useless) +1 > > - StandardFileStream >> #upTo: (the recursive call + concatenation > requires O(n^2) runtime and O(n^2) memory for large chunks of data) > I did a different implementation some time ago, didn't notice enough of a speedup to consider posting it though. Hope you have better luck :) Cheers, Henry _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Levente Uzonyi-2
On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote:
> - SortedCollection: just deprecate it (or replace it's crappy quicksort > implementation if you really want to improve it. But I think it's useless) You shouldn't deprecate it... it's ANSI. :) But it's implementation could certainly be improved... Julian _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Henrik Sperre Johansen
On Sat, 13 Mar 2010, Henrik Sperre Johansen wrote:
> On 13.03.2010 01:23, Levente Uzonyi wrote: >> On Thu, 11 Mar 2010, stephane ducasse wrote: >> >>> Hi guys >>> >>> tristan is a new student here and he would like to work on collection >>> optimization and implementation. >>> I would like the get some ideas from you. >>> - are there some collections that would be cool to improve? >> >> Here's the list I plan to improve is Squeak: >> - OrderedCollection: fix the growing behavior, adding n elements takes >> O(n^2) time in some cases > If you're talking about alternate addFirst: addLast: adds, is that really a > usecase worth optimizing for? > makeRoomAtFirst:/Last: can definately be sped up using > replaceFrom:to:with:startingAt: though. My ideas would result in a general minor speedup, while mixed use of #addFirst: and #addLast: wouldn't suffer the performance penalty at all. So we can win something without losing anything. #replaceFrom:to:with:startingAt: can only be used to move elements to a position with lower index, otherwise overlapping areas will be overwritten. >> - SortedCollection: just deprecate it (or replace it's crappy quicksort >> implementation if you really want to improve it. But I think it's useless) > +1 >> >> - StandardFileStream >> #upTo: (the recursive call + concatenation requires >> O(n^2) runtime and O(n^2) memory for large chunks of data) >> > I did a different implementation some time ago, didn't notice enough of a > speedup to consider posting it though. Hope you have better luck :) > I don't have an implementation yet, but I expect ~3-5% speedup if reading all the sources. This may seem insignificant, but if other bottlenecks are out of the way, this may be 15-25%. And my opinion is still that "library functions" should be optimized for most usecases, so this should be fixed. Levente > Cheers, > Henry > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Julian Fitzell-2
On Sat, 13 Mar 2010, Julian Fitzell wrote:
> On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote: >> - SortedCollection: just deprecate it (or replace it's crappy quicksort >> implementation if you really want to improve it. But I think it's useless) > > You shouldn't deprecate it... it's ANSI. :) That's why I suggested deprecating instead of removing. :) > > But it's implementation could certainly be improved... > IMHO it shouldn't be used at all. Levente > Julian > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Julian Fitzell-2
10 or 15 years back I look a look at the SortedCollection logic with an eye to improve it.
Oddly I found the folks who wrote it had a good understanding of how the code becomes bytecodes and how that impacts performance. My attempts made things worst, so good luck. Personally I'd choose the font rendering path, since there is lots of *stuff* required to run to splash a character glyph to Display. On 2010-03-12, at 4:46 PM, Julian Fitzell wrote: > On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote: >> - SortedCollection: just deprecate it (or replace it's crappy quicksort >> implementation if you really want to improve it. But I think it's useless) > > You shouldn't deprecate it... it's ANSI. :) > > But it's implementation could certainly be improved... > > Julian > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project -- =========================================================================== John M. McIntosh <[hidden email]> Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com =========================================================================== _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
On Fri, 12 Mar 2010, John M McIntosh wrote:
> 10 or 15 years back I look a look at the SortedCollection logic with an eye to improve it. > Oddly I found the folks who wrote it had a good understanding of how the code becomes bytecodes and > how that impacts performance. My attempts made things worst, so good luck. Quicksort has worse performance than merge sort in dynamic languages (and in general when comparison costs more than minimal), because it makes more comparisons. The simple quicksort performs really bad (O(n^2)) if there are only a few different elements. A 3-way partitioning algorithm could perform much better in general, but I think it'd be still slower than merge sort. Here is a bug report which is pretty naive, but shows the effect of the few different elements case: http://code.google.com/p/pharo/issues/detail?id=1718&q=sort&colspec=ID%20Type%20Status%20Summary%20Milestone%20Difficulty Squeak/Pharo's merge sort implementation is highly optimized. The merge algorithm compiles to pure bytecodes and primitive sends. Levente > > Personally I'd choose the font rendering path, since there is lots of *stuff* required to run to splash a > character glyph to Display. > > On 2010-03-12, at 4:46 PM, Julian Fitzell wrote: > >> On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote: >>> - SortedCollection: just deprecate it (or replace it's crappy quicksort >>> implementation if you really want to improve it. But I think it's useless) >> >> You shouldn't deprecate it... it's ANSI. :) >> >> But it's implementation could certainly be improved... >> >> Julian >> >> _______________________________________________ >> Pharo-project mailing list >> [hidden email] >> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > > -- > =========================================================================== > John M. McIntosh <[hidden email]> Twitter: squeaker68882 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > =========================================================================== > > > > > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Levente Uzonyi-2
I think that this is great.
We need to optimize as much as it makes sense without breaking our nice apis. Stef On Mar 13, 2010, at 1:23 AM, Levente Uzonyi wrote: > On Thu, 11 Mar 2010, stephane ducasse wrote: > >> Hi guys >> >> tristan is a new student here and he would like to work on collection optimization and implementation. >> I would like the get some ideas from you. >> - are there some collections that would be cool to improve? > > Here's the list I plan to improve is Squeak: > - OrderedCollection: fix the growing behavior, adding n elements takes O(n^2) time in some cases > - SortedCollection: just deprecate it (or replace it's crappy quicksort implementation if you really want to improve it. But I think it's useless) > - TextStream: adding n characters with different attributes has O(n^2) runtime > - StandardFileStream >> #upTo: (the recursive call + concatenation requires O(n^2) runtime and O(n^2) memory for large chunks of data) > > > Levente > >> - are there some collections that would be cool to have and that we do not have yet? >> If you have any idea related to optimizations let me know. >> >> Stef >> _______________________________________________ >> Pharo-project mailing list >> [hidden email] >> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >> > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by stephane ducasse
Igor wrote:
>Yes. Use right tool for to do job. >An OrderedCollection can't satisfy every possible combination of >tasks, which developer facing. So, these collections are currently missing. And as all current code is bound to the current implementations, switching implementation on size is the prefered approach. Stephan _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
On 13 March 2010 12:56, Stephan Eggermont <[hidden email]> wrote:
> Igor wrote: >>Yes. Use right tool for to do job. >>An OrderedCollection can't satisfy every possible combination of >>tasks, which developer facing. > > So, these collections are currently missing. And as all current > code is bound to the current implementations, switching > implementation on size is the prefered approach. > If such kind of collections didn't existed over more than 30 years of smalltalk existance, what makes you think that there a big interest in having them today? Of course, they could occupy some niche, but i doubt that they will replace an existing classes in a day-to-day use. Besides, many projects implementing own kinds of collections, specific to their needs. > Stephan > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > -- Best regards, Igor Stasenko AKA sig. _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by stephane ducasse
Igor wrote:
>If such kind of collections didn't existed over more than >30 years of smalltalk existance, what makes you think >that there a big interest in having them today? Memory. Without the memory to use them, the need was not there. >Of course, they could occupy some niche, but i doubt >that they will replace an existing classes in a day-to-day use. In a 64 bit system, it is not a niche. You simply need a set of collection classes that work well with millions an tens of millions of elements, not just 100.000s. >Besides, many projects implementing own kinds of collections, >specific to their needs. That is exactly what you want to avoid. I've had to implement these collections too often. Stephan _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
On 13 March 2010 13:24, Stephan Eggermont <[hidden email]> wrote:
> Igor wrote: >>If such kind of collections didn't existed over more than >>30 years of smalltalk existance, what makes you think >>that there a big interest in having them today? > Memory. Without the memory to use them, the need was > not there. >>Of course, they could occupy some niche, but i doubt >>that they will replace an existing classes in a day-to-day use. > In a 64 bit system, it is not a niche. You simply need a set of > collection classes that work well with millions an tens of millions > of elements, not just 100.000s. What you talking about is an edge case in currently existing systems. And its easy to prove: (OrderedCollection allInstances collect: [:each | each size ]) asBag sortedCounts {2956->0 . 1054->1 . 570->4 . 457->5 . 387->6 . 288->2 . 247->3 . 222->7 . 162->8 . 145->9 . 83->10 . 66->11 . 41->12 . 27->14 . 18->13 . 14->15 . 11->16 . 8->30 . 6->17 . 6->25 . 5->36 . 4->21 . 4->27 . 4->42 . 3->26 . 2->18 . 2->23 . 2->24 . 2->35 . 2->44 . 2->52 . 2->222 . 2->336 . 2->1352 . 1->20 . 1->22 . 1->32 . 1->33 . 1->34 . 1->38 . 1->43 . 1->48 . 1->53 . 1->69 . 1->73 . 1->74 . 1->89 . 1->90 . 1->100 . 1->120 . 1->121 . 1->129 . 1->131 . 1->133 . 1->137 . 1->158 . 1->167 . 1->177 . 1->190 . 1->197 . 1->198 . 1->206 . 1->219 . 1->231 . 1->232 . 1->278 . 1->293 . 1->312 . 1->317 . 1->354 . 1->359 . 1->372 . 1->394 . 1->412 . 1->482 . 1->565 . 1->605 . 1->640 . 1->670 . 1->680 . 1->694 . 1->751 . 1->872} and until this list will show a 5% or more presence of collections which having 100000 and more elements, i don't see a pressing need in doing something, because obviously, a current implementation which works fine for general cases. Or do you think that on 64 bit systems the above distribution will be different? >>Besides, many projects implementing own kinds of collections, >>specific to their needs. > That is exactly what you want to avoid. I've had to implement > these collections too often. > So, what stopping you from creating a separate package, say CollectionsXL and add there a generalized versions of collections which optimized for working with extremely large datasets? > Stephan > > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > -- Best regards, Igor Stasenko AKA sig. _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Stephan Eggermont-3
On Sat, 13 Mar 2010, Stephan Eggermont wrote:
> Igor wrote: >> Yes. Use right tool for to do job. >> An OrderedCollection can't satisfy every possible combination of >> tasks, which developer facing. > > So, these collections are currently missing. And as all current > code is bound to the current implementations, switching > implementation on size is the prefered approach. Trees are rarely useful in Smalltalk, so there's no default tree implementation. Note that trees consume a lot more memory (>=5x) than a single array. Levente > > Stephan > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by stephane ducasse
Levente wrote:
>Trees are rarely useful in Smalltalk, so there's no >default tree implementation. Note that trees consume >a lot more memory (>=5x) than a single array. Huh? You mean binary trees? Or Red-Black or so? On current procesors they are not very useful, and not usable at all for large data structures because of the large overhead. But n-ary trees or two-level stuctures are. Stephan _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
Free forum by Nabble | Edit this page |