Collection

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Collection

stephane ducasse
Hi guys

tristan is a new student here and he would like to work on collection optimization and implementation.
I would like the get some ideas from you.
        - are there some collections that would be cool to improve?
        - are there some collections that would be cool to have and that we do not have yet?
If you have any idea related to optimizations let me know.

Stef
_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Philippe Marschall-2
stephane ducasse wrote:
> Hi guys
>
> tristan is a new student here and he would like to work on collection optimization and implementation.
> I would like the get some ideas from you.
> - are there some collections that would be cool to improve?
> - are there some collections that would be cool to have and that we do not have yet?
> If you have any idea related to optimizations let me know.

I will get bashed but: [1]

What would be cool for Seaside IMHO:
- Multi Value Dictionaries, e.g. for request parameters
- Some kind of thread safe map with atomic operations [2], e.g. for
session store

 [1] http://code.google.com/p/google-collections/
 [2]
http://java.sun.com/javase/6/docs/api/java/util/concurrent/ConcurrentMap.html

Cheers
Philippe


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Lukas Renggli
> What would be cool for Seaside IMHO:
> - Multi Value Dictionaries, e.g. for request parameters
> - Some kind of thread safe map with atomic operations [2], e.g. for
> session store

Why would that be cool? We already have a WAMultiValueDictionary for
request parameters. Also we have some ad-hock solutions for the thread
safe dictionaries.

I would be more interested to see ...

- if it would be beneficial for a collection to be able to swap out
its internal representation automatically depending on its load and
use ...

- what we can do by externalizing the enumeration protocol (no, this
is not the same as streams or the iterable refactoring that was
recently proposed) ...

- how to make collections more composable, e.g. make them thread safe,
make them read-only, make them ordered (sets and dictionaries), ...

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Philippe Marschall-2
Lukas Renggli wrote:
>> What would be cool for Seaside IMHO:
>> - Multi Value Dictionaries, e.g. for request parameters
>> - Some kind of thread safe map with atomic operations [2], e.g. for
>> session store
>
> Why would that be cool? We already have a WAMultiValueDictionary for
> request parameters.

We could (theoretically, if other dialects follow) delete our code.
Everybody who needs something similar doesn't need to write his own.

> Also we have some ad-hock solutions for the thread
> safe dictionaries.

Yeah, the same comment as above applies.

Cheers
Philippe


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Igor Stasenko
In reply to this post by stephane ducasse
On 11 March 2010 15:24, stephane ducasse <[hidden email]> wrote:
> Hi guys
>
> tristan is a new student here and he would like to work on collection optimization and implementation.
> I would like the get some ideas from you.
>        - are there some collections that would be cool to improve?
>        - are there some collections that would be cool to have and that we do not have yet?
> If you have any idea related to optimizations let me know.
>

I'd say that it is strange choice, because collections , apart of
having big protocols don't really interesting piece
of code from an optimization side, because most of things there
already implemented quite good.
Reimplementing them using traits - that's would be cool.

> Stef
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Stéphane Ducasse
In reply to this post by Philippe Marschall-2
Thanks for the suggestions :)

>
>> Hi guys
>>
>> tristan is a new student here and he would like to work on collection optimization and implementation.
>> I would like the get some ideas from you.
>> - are there some collections that would be cool to improve?
>> - are there some collections that would be cool to have and that we do not have yet?
>> If you have any idea related to optimizations let me know.
>
> I will get bashed but: [1]

no not us :)
You remember I forced you to give a talk on what we could learn from Java :)

> What would be cool for Seaside IMHO:
> - Multi Value Dictionaries, e.g. for request parameters
> - Some kind of thread safe map with atomic operations [2], e.g. for
> session store
>
> [1] http://code.google.com/p/google-collections/
> [2]
> http://java.sun.com/javase/6/docs/api/java/util/concurrent/ConcurrentMap.html
>
> Cheers
> Philippe
>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Levente Uzonyi-2
In reply to this post by stephane ducasse
On Thu, 11 Mar 2010, stephane ducasse wrote:

> Hi guys
>
> tristan is a new student here and he would like to work on collection optimization and implementation.
> I would like the get some ideas from you.
> - are there some collections that would be cool to improve?

Here's the list I plan to improve is Squeak:
- OrderedCollection: fix the growing behavior, adding n elements takes
O(n^2) time in some cases
- SortedCollection: just deprecate it (or replace it's crappy
quicksort implementation if you really want to improve it. But I think
it's useless)
- TextStream: adding n characters with different attributes has O(n^2)
runtime
- StandardFileStream >> #upTo: (the recursive call + concatenation
requires O(n^2) runtime and O(n^2) memory for large chunks of data)


Levente

> - are there some collections that would be cool to have and that we do not have yet?
> If you have any idea related to optimizations let me know.
>
> Stef
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Henrik Sperre Johansen
  On 13.03.2010 01:23, Levente Uzonyi wrote:

> On Thu, 11 Mar 2010, stephane ducasse wrote:
>
>> Hi guys
>>
>> tristan is a new student here and he would like to work on collection
>> optimization and implementation.
>> I would like the get some ideas from you.
>>     - are there some collections that would be cool to improve?
>
> Here's the list I plan to improve is Squeak:
> - OrderedCollection: fix the growing behavior, adding n elements takes
> O(n^2) time in some cases
If you're talking about alternate addFirst: addLast: adds, is that
really a usecase worth optimizing for?
makeRoomAtFirst:/Last: can definately be sped up using
replaceFrom:to:with:startingAt: though.
> - SortedCollection: just deprecate it (or replace it's crappy
> quicksort implementation if you really want to improve it. But I think
> it's useless)
+1
>
> - StandardFileStream >> #upTo: (the recursive call + concatenation
> requires O(n^2) runtime and O(n^2) memory for large chunks of data)
>
I did a different implementation some time ago, didn't notice enough of
a speedup to consider posting it though. Hope you have better luck :)

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Julian Fitzell-2
In reply to this post by Levente Uzonyi-2
On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote:
> - SortedCollection: just deprecate it (or replace it's crappy quicksort
> implementation if you really want to improve it. But I think it's useless)

You shouldn't deprecate it... it's ANSI. :)

But it's implementation could certainly be improved...

Julian

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Levente Uzonyi-2
In reply to this post by Henrik Sperre Johansen
On Sat, 13 Mar 2010, Henrik Sperre Johansen wrote:

> On 13.03.2010 01:23, Levente Uzonyi wrote:
>> On Thu, 11 Mar 2010, stephane ducasse wrote:
>>
>>> Hi guys
>>>
>>> tristan is a new student here and he would like to work on collection
>>> optimization and implementation.
>>> I would like the get some ideas from you.
>>>     - are there some collections that would be cool to improve?
>>
>> Here's the list I plan to improve is Squeak:
>> - OrderedCollection: fix the growing behavior, adding n elements takes
>> O(n^2) time in some cases
> If you're talking about alternate addFirst: addLast: adds, is that really a
> usecase worth optimizing for?
> makeRoomAtFirst:/Last: can definately be sped up using
> replaceFrom:to:with:startingAt: though.

My ideas would result in a general minor speedup, while mixed use of
#addFirst: and #addLast: wouldn't suffer the performance penalty at all.
So we can win something without losing anything.
#replaceFrom:to:with:startingAt: can only be used to move elements to a
position with lower index, otherwise overlapping areas will be
overwritten.

>> - SortedCollection: just deprecate it (or replace it's crappy quicksort
>> implementation if you really want to improve it. But I think it's useless)
> +1
>>
>> - StandardFileStream >> #upTo: (the recursive call + concatenation requires
>> O(n^2) runtime and O(n^2) memory for large chunks of data)
>>
> I did a different implementation some time ago, didn't notice enough of a
> speedup to consider posting it though. Hope you have better luck :)
>

I don't have an implementation yet, but I expect ~3-5% speedup if reading
all the sources. This may seem insignificant, but if other bottlenecks are
out of the way, this may be 15-25%. And my opinion is still that "library
functions" should be optimized for most usecases, so this should be fixed.


Levente

> Cheers,
> Henry
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Levente Uzonyi-2
In reply to this post by Julian Fitzell-2
On Sat, 13 Mar 2010, Julian Fitzell wrote:

> On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote:
>> - SortedCollection: just deprecate it (or replace it's crappy quicksort
>> implementation if you really want to improve it. But I think it's useless)
>
> You shouldn't deprecate it... it's ANSI. :)

That's why I suggested deprecating instead of removing. :)

>
> But it's implementation could certainly be improved...
>

IMHO it shouldn't be used at all.


Levente

> Julian
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

johnmci
In reply to this post by Julian Fitzell-2
10 or 15 years back I look a look at the SortedCollection logic with an eye to improve it.
Oddly I found the folks who wrote it had a good understanding of how the code becomes bytecodes and
how that impacts performance.  My attempts made things worst, so good luck.

Personally I'd choose the font rendering path, since there is lots of *stuff* required to run to splash a
character glyph to Display.

On 2010-03-12, at 4:46 PM, Julian Fitzell wrote:

> On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote:
>> - SortedCollection: just deprecate it (or replace it's crappy quicksort
>> implementation if you really want to improve it. But I think it's useless)
>
> You shouldn't deprecate it... it's ANSI. :)
>
> But it's implementation could certainly be improved...
>
> Julian
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Levente Uzonyi-2
On Fri, 12 Mar 2010, John M McIntosh wrote:

> 10 or 15 years back I look a look at the SortedCollection logic with an eye to improve it.
> Oddly I found the folks who wrote it had a good understanding of how the code becomes bytecodes and
> how that impacts performance.  My attempts made things worst, so good luck.

Quicksort has worse performance than merge sort in dynamic languages
(and in general when comparison costs more than minimal), because it makes
more comparisons.
The simple quicksort performs really bad (O(n^2)) if there are only a
few different elements. A 3-way partitioning algorithm could perform much
better in general, but I think it'd be still slower than merge sort.
Here is a bug report which is pretty naive, but shows the effect of the
few different elements case:
http://code.google.com/p/pharo/issues/detail?id=1718&q=sort&colspec=ID%20Type%20Status%20Summary%20Milestone%20Difficulty

Squeak/Pharo's merge sort implementation is highly optimized. The merge
algorithm compiles to pure bytecodes and primitive sends.


Levente

>
> Personally I'd choose the font rendering path, since there is lots of *stuff* required to run to splash a
> character glyph to Display.
>
> On 2010-03-12, at 4:46 PM, Julian Fitzell wrote:
>
>> On Sat, Mar 13, 2010 at 12:23 AM, Levente Uzonyi <[hidden email]> wrote:
>>> - SortedCollection: just deprecate it (or replace it's crappy quicksort
>>> implementation if you really want to improve it. But I think it's useless)
>>
>> You shouldn't deprecate it... it's ANSI. :)
>>
>> But it's implementation could certainly be improved...
>>
>> Julian
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
> --
> ===========================================================================
> John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
> Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
> ===========================================================================
>
>
>
>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Stéphane Ducasse
In reply to this post by Levente Uzonyi-2
I think that this is great.
We need to optimize as much as it makes sense without breaking our nice apis.

Stef

On Mar 13, 2010, at 1:23 AM, Levente Uzonyi wrote:

> On Thu, 11 Mar 2010, stephane ducasse wrote:
>
>> Hi guys
>>
>> tristan is a new student here and he would like to work on collection optimization and implementation.
>> I would like the get some ideas from you.
>> - are there some collections that would be cool to improve?
>
> Here's the list I plan to improve is Squeak:
> - OrderedCollection: fix the growing behavior, adding n elements takes O(n^2) time in some cases
> - SortedCollection: just deprecate it (or replace it's crappy quicksort implementation if you really want to improve it. But I think it's useless)
> - TextStream: adding n characters with different attributes has O(n^2) runtime
> - StandardFileStream >> #upTo: (the recursive call + concatenation requires O(n^2) runtime and O(n^2) memory for large chunks of data)
>
>
> Levente
>
>> - are there some collections that would be cool to have and that we do not have yet?
>> If you have any idea related to optimizations let me know.
>>
>> Stef
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Stephan Eggermont-3
In reply to this post by stephane ducasse
Igor wrote:
 >Yes. Use right tool for to do job.
 >An OrderedCollection can't satisfy every possible combination of
 >tasks, which developer facing.

So, these collections are currently missing. And as all current
code is bound to the current implementations, switching
implementation on size is the prefered approach.

Stephan

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Igor Stasenko
On 13 March 2010 12:56, Stephan Eggermont <[hidden email]> wrote:
> Igor wrote:
>>Yes. Use right tool for to do job.
>>An OrderedCollection can't satisfy every possible combination of
>>tasks, which developer facing.
>
> So, these collections are currently missing. And as all current
> code is bound to the current implementations, switching
> implementation on size is the prefered approach.
>
If such kind of collections didn't existed over more than 30 years of
smalltalk existance,
what makes you think that there a big interest in having them today?
Of course, they could occupy some niche, but i doubt that they will
replace an existing classes
in a day-to-day use.
Besides, many projects implementing own kinds of collections, specific
to their needs.

> Stephan
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Stephan Eggermont-3
In reply to this post by stephane ducasse
Igor wrote:
 >If such kind of collections didn't existed over more than
 >30 years of smalltalk existance, what makes you think
 >that there a big interest in having them today?
Memory. Without the memory to use them, the need was
not there.
 >Of course, they could occupy some niche, but i doubt
 >that they will replace an existing classes in a day-to-day use.
In a 64 bit system, it is not a niche. You simply need a set of
collection classes that work well with millions an tens of millions
of elements, not just 100.000s.
 >Besides, many projects implementing own kinds of collections,
 >specific to their needs.
That is exactly what you want to avoid. I've had to implement
these collections too often.

Stephan


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Igor Stasenko
On 13 March 2010 13:24, Stephan Eggermont <[hidden email]> wrote:

> Igor wrote:
>>If such kind of collections didn't existed over more than
>>30 years of smalltalk existance, what makes you think
>>that there a big interest in having them today?
> Memory. Without the memory to use them, the need was
> not there.
>>Of course, they could occupy some niche, but i doubt
>>that they will replace an existing classes in a day-to-day use.
> In a 64 bit system, it is not a niche. You simply need a set of
> collection classes that work well with millions an tens of millions
> of elements, not just 100.000s.

What you talking about is an edge case in currently existing systems.
And its easy to prove:

(OrderedCollection allInstances collect: [:each | each size ]) asBag
sortedCounts
 {2956->0 . 1054->1 . 570->4 . 457->5 . 387->6 . 288->2 . 247->3 .
222->7 . 162->8 . 145->9 . 83->10 . 66->11 . 41->12 . 27->14 . 18->13
. 14->15 . 11->16 . 8->30 . 6->17 . 6->25 . 5->36 . 4->21 . 4->27 .
4->42 . 3->26 . 2->18 . 2->23 . 2->24 . 2->35 . 2->44 . 2->52 . 2->222
. 2->336 . 2->1352 . 1->20 . 1->22 . 1->32 . 1->33 . 1->34 . 1->38 .
1->43 . 1->48 . 1->53 . 1->69 . 1->73 . 1->74 . 1->89 . 1->90 . 1->100
. 1->120 . 1->121 . 1->129 . 1->131 . 1->133 . 1->137 . 1->158 .
1->167 . 1->177 . 1->190 . 1->197 . 1->198 . 1->206 . 1->219 . 1->231
. 1->232 . 1->278 . 1->293 . 1->312 . 1->317 . 1->354 . 1->359 .
1->372 . 1->394 . 1->412 . 1->482 . 1->565 . 1->605 . 1->640 . 1->670
. 1->680 . 1->694 . 1->751 . 1->872}

and until this list will show a 5% or more presence of collections
which having 100000 and more elements,
i don't see a pressing need in doing something, because obviously, a
current implementation which works fine for general cases.
Or do you think that on 64 bit systems the above distribution will be different?

>>Besides, many projects implementing own kinds of collections,
>>specific to their needs.
> That is exactly what you want to avoid. I've had to implement
> these collections too often.
>
So, what stopping you from creating a separate package, say
CollectionsXL
and add there a generalized versions of collections which optimized
for working with extremely large datasets?

> Stephan
>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Levente Uzonyi-2
In reply to this post by Stephan Eggermont-3
On Sat, 13 Mar 2010, Stephan Eggermont wrote:

> Igor wrote:
>> Yes. Use right tool for to do job.
>> An OrderedCollection can't satisfy every possible combination of
>> tasks, which developer facing.
>
> So, these collections are currently missing. And as all current
> code is bound to the current implementations, switching
> implementation on size is the prefered approach.

Trees are rarely useful in Smalltalk, so there's no default tree
implementation. Note that trees consume a lot more memory (>=5x) than a
single array.


Levente

>
> Stephan
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection

Stephan Eggermont-3
In reply to this post by stephane ducasse
Levente wrote:
 >Trees are rarely useful in Smalltalk, so there's no
 >default tree implementation. Note that trees consume
 >a lot more memory (>=5x) than a single array.
Huh? You mean binary trees? Or Red-Black or so?
On current procesors they are not very useful, and
not usable at all for large data structures because
of the large overhead. But n-ary trees or two-level
stuctures are.

Stephan

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
12