Collection >> collectDistinct: ??

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Collection >> collectDistinct: ??

Mariano Martinez Peck
Hi. I was needing something like the SQL select distinct, that doesn't take into account repeated objects. I didn't found anything useful in Collection, and thus, I have implemented this:

Collection >> collectDistinct: aBlock


collectDistinct: aBlock
    "Evaluate aBlock with each of the receiver's elements as the argument. 
    Collect the resulting values into a Set, thus repeated objects will not be present.
    Answer the new collection."

    | newSet |
    newSet := self species new asSet.
    self do: [:each | newSet add: (aBlock value: each)].
    ^ newSet


Is there a better way ?   Do you think it make sense to put this in Pharo ?

Cheers

Mariano

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Nicolas Cellier
self collect: aBlock as: Set

2010/3/10 Mariano Martinez Peck <[hidden email]>:

> Hi. I was needing something like the SQL select distinct, that doesn't take
> into account repeated objects. I didn't found anything useful in Collection,
> and thus, I have implemented this:
>
> Collection >> collectDistinct: aBlock
>
>
> collectDistinct: aBlock
>     "Evaluate aBlock with each of the receiver's elements as the argument.
>     Collect the resulting values into a Set, thus repeated objects will not
> be present.
>     Answer the new collection."
>
>     | newSet |
>     newSet := self species new asSet.
>     self do: [:each | newSet add: (aBlock value: each)].
>     ^ newSet
>
>
> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>
> Cheers
>
> Mariano
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Mariano Martinez Peck
hahhha thanks Nicolas..I was sure somewhere it should be...I was blind...weird I check there!!!

thanks a lot.

mariano

On Wed, Mar 10, 2010 at 2:07 PM, Nicolas Cellier <[hidden email]> wrote:
self collect: aBlock as: Set

2010/3/10 Mariano Martinez Peck <[hidden email]>:
> Hi. I was needing something like the SQL select distinct, that doesn't take
> into account repeated objects. I didn't found anything useful in Collection,
> and thus, I have implemented this:
>
> Collection >> collectDistinct: aBlock
>
>
> collectDistinct: aBlock
>     "Evaluate aBlock with each of the receiver's elements as the argument.
>     Collect the resulting values into a Set, thus repeated objects will not
> be present.
>     Answer the new collection."
>
>     | newSet |
>     newSet := self species new asSet.
>     self do: [:each | newSet add: (aBlock value: each)].
>     ^ newSet
>
>
> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>
> Cheers
>
> Mariano
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Igor Stasenko
In reply to this post by Nicolas Cellier
On 10 March 2010 15:07, Nicolas Cellier
<[hidden email]> wrote:
> self collect: aBlock as: Set
>

One more time, i found this very beatiful and useful extension to
Collection protocol.
And i am proud being a witness when this thing is born! :)

> 2010/3/10 Mariano Martinez Peck <[hidden email]>:
>> Hi. I was needing something like the SQL select distinct, that doesn't take
>> into account repeated objects. I didn't found anything useful in Collection,
>> and thus, I have implemented this:
>>
>> Collection >> collectDistinct: aBlock
>>
>>
>> collectDistinct: aBlock
>>     "Evaluate aBlock with each of the receiver's elements as the argument.
>>     Collect the resulting values into a Set, thus repeated objects will not
>> be present.
>>     Answer the new collection."
>>
>>     | newSet |
>>     newSet := self species new asSet.
>>     self do: [:each | newSet add: (aBlock value: each)].
>>     ^ newSet
>>
>>
>> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>>
>> Cheers
>>
>> Mariano
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Nicolas Cellier
2010/3/11 Igor Stasenko <[hidden email]>:
> On 10 March 2010 15:07, Nicolas Cellier
> <[hidden email]> wrote:
>> self collect: aBlock as: Set
>>
>
> One more time, i found this very beatiful and useful extension to
> Collection protocol.
> And i am proud being a witness when this thing is born! :)
>

Yes,
I remember saying I was not convinced some times ago...
http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-June/129457.html
Being able to change own opinion is a good thing !

Nicolas

>> 2010/3/10 Mariano Martinez Peck <[hidden email]>:
>>> Hi. I was needing something like the SQL select distinct, that doesn't take
>>> into account repeated objects. I didn't found anything useful in Collection,
>>> and thus, I have implemented this:
>>>
>>> Collection >> collectDistinct: aBlock
>>>
>>>
>>> collectDistinct: aBlock
>>>     "Evaluate aBlock with each of the receiver's elements as the argument.
>>>     Collect the resulting values into a Set, thus repeated objects will not
>>> be present.
>>>     Answer the new collection."
>>>
>>>     | newSet |
>>>     newSet := self species new asSet.
>>>     self do: [:each | newSet add: (aBlock value: each)].
>>>     ^ newSet
>>>
>>>
>>> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>>>
>>> Cheers
>>>
>>> Mariano
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Randal L. Schwartz
In reply to this post by Mariano Martinez Peck
>>>>> "Mariano" == Mariano Martinez Peck <[hidden email]> writes:


Mariano> Is there a better way ?   Do you think it make sense to put this in Pharo ?

I can envision collection types where collect: can create something that asSet
can more easily turn into a set in bulk than adding each element individually
like this.

Have you already benchmarked your verbose code against the "simple" code
under varying sizes?

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Randal L. Schwartz
>>>>> "Randal" == Randal L Schwartz <[hidden email]> writes:

Randal> I can envision collection types where collect: can create something
Randal> that asSet can more easily turn into a set in bulk than adding each
Randal> element individually like this.

In particular, turning an array of 10,000 elements into a set
can start with a particular hashing size, whereas starting with
an empty set and adding 10,000 elements will likely involve rehasing
a number of times.  If 10,000 isn't troubling enough, make that
a million. :)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Levente Uzonyi-2
On Wed, 10 Mar 2010, Randal L. Schwartz wrote:

>>>>>> "Randal" == Randal L Schwartz <[hidden email]> writes:
>
> Randal> I can envision collection types where collect: can create something
> Randal> that asSet can more easily turn into a set in bulk than adding each
> Randal> element individually like this.

Keys of dictionaries, but it's hackish.

>
> In particular, turning an array of 10,000 elements into a set
> can start with a particular hashing size, whereas starting with
> an empty set and adding 10,000 elements will likely involve rehasing
> a number of times.  If 10,000 isn't troubling enough, make that
> a million. :)
>

Lets see what we've got (in Squeak):

Smalltalk garbageCollect.
[ (1 to: 1000000) collect: #yourself as: Set ] timeToRun. "==> 2651."

Smalltalk garbageCollect.
[ (1 to: 1000000) asSet ] timeToRun. "==>  2522".

| s |
s := Set new.
Smalltalk garbageCollect.
[ 1 to: 1000000 do: [ :each | s add: each ] ] timeToRun. "==> 1192"

| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ 1 to: 1000000 do: [ :each | s add: each ] ] timeToRun. "==> 472"

Note that adding numbers from 1 to n has the lowest cost for addition,
since there will be no hash collisions at all.


So try it with randomized data:

a := (1 to: 1000000) asArray shuffled.

Smalltalk garbageCollect.
[ a collect: #yourself as: Set ] timeToRun. "==> 4143"

Smalltalk garbageCollect.
[ a asSet ] timeToRun. "==> 3995".

| s |
s := Set new.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 157304 oops weak hashes".

| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 636".


With better hashes:

a := Set new: 1000000.
[ a size < 1000000 ] whileTrue: [
  a add: SmallInteger maxVal atRandom ].
a := a asArray.

Smalltalk garbageCollect.
[ a collect: #yourself as: Set ] timeToRun. "==>  3895"

Smalltalk garbageCollect.
[ a asSet ] timeToRun. "==>  3777".

| s |
s := Set new.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 4729 much better".

| s |
s := Set new: 1000000.
Smalltalk garbageCollect.
[ s addAll: a ] timeToRun. "==> 774".


Conclusions:
- #collect:as: is (almost) as fast as #asSet
- better hashes can make a huge difference
- message sends cost a lot compared to bytecodes
- if you can guess the size of your collection, do it (applies for all
growing collections and collection streaming)
- only optimize if you have to in your own code


Levente

> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
> See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Stéphane Ducasse
In reply to this post by Nicolas Cellier
Yes
I remember when I was against {  .  } :)
On Mar 11, 2010, at 1:54 AM, Nicolas Cellier wrote:

> 2010/3/11 Igor Stasenko <[hidden email]>:
>> On 10 March 2010 15:07, Nicolas Cellier
>> <[hidden email]> wrote:
>>> self collect: aBlock as: Set
>>>
>>
>> One more time, i found this very beatiful and useful extension to
>> Collection protocol.
>> And i am proud being a witness when this thing is born! :)
>>
>
> Yes,
> I remember saying I was not convinced some times ago...
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-June/129457.html
> Being able to change own opinion is a good thing !
>
> Nicolas
>
>>> 2010/3/10 Mariano Martinez Peck <[hidden email]>:
>>>> Hi. I was needing something like the SQL select distinct, that doesn't take
>>>> into account repeated objects. I didn't found anything useful in Collection,
>>>> and thus, I have implemented this:
>>>>
>>>> Collection >> collectDistinct: aBlock
>>>>
>>>>
>>>> collectDistinct: aBlock
>>>>     "Evaluate aBlock with each of the receiver's elements as the argument.
>>>>     Collect the resulting values into a Set, thus repeated objects will not
>>>> be present.
>>>>     Answer the new collection."
>>>>
>>>>     | newSet |
>>>>     newSet := self species new asSet.
>>>>     self do: [:each | newSet add: (aBlock value: each)].
>>>>     ^ newSet
>>>>
>>>>
>>>> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>>>>
>>>> Cheers
>>>>
>>>> Mariano
>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [hidden email]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Stéphane Ducasse
In reply to this post by Levente Uzonyi-2

>
> So try it with randomized data:
>
> a := (1 to: 1000000) asArray shuffled.
>
> Smalltalk garbageCollect.
> [ a collect: #yourself as: Set ] timeToRun. "==> 4143"
>
> Smalltalk garbageCollect.
> [ a asSet ] timeToRun. "==> 3995".
>
> | s |
> s := Set new.
> Smalltalk garbageCollect.
> [ s addAll: a ] timeToRun. "==> 157304 oops weak hashes".
>
> | s |
> s := Set new: 1000000.
> Smalltalk garbageCollect.
> [ s addAll: a ] timeToRun. "==> 636".
>
>
> With better hashes:

levente how do you determine that you use better hashes below


> a := Set new: 1000000.
> [ a size < 1000000 ] whileTrue: [
> a add: SmallInteger maxVal atRandom ].
> a := a asArray.
>
> Smalltalk garbageCollect.
> [ a collect: #yourself as: Set ] timeToRun. "==>  3895"
>
> Smalltalk garbageCollect.
> [ a asSet ] timeToRun. "==>  3777".
>
> | s |
> s := Set new.
> Smalltalk garbageCollect.
> [ s addAll: a ] timeToRun. "==> 4729 much better".
>
> | s |
> s := Set new: 1000000.
> Smalltalk garbageCollect.
> [ s addAll: a ] timeToRun. "==> 774".
>
>
> Conclusions:
> - #collect:as: is (almost) as fast as #asSet
> - better hashes can make a huge difference
> - message sends cost a lot compared to bytecodes
> - if you can guess the size of your collection, do it (applies for all growing collections and collection streaming)
yes!!

> - only optimize if you have to in your own code
>
>
> Levente
>
>> --
>> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
>> <[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
>> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
>> See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Nicolas Cellier
2010/3/11 Stéphane Ducasse <[hidden email]>:

>
>>
>> So try it with randomized data:
>>
>> a := (1 to: 1000000) asArray shuffled.
>>
>> Smalltalk garbageCollect.
>> [ a collect: #yourself as: Set ] timeToRun. "==> 4143"
>>
>> Smalltalk garbageCollect.
>> [ a asSet ] timeToRun. "==> 3995".
>>
>> | s |
>> s := Set new.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 157304 oops weak hashes".
>>
>> | s |
>> s := Set new: 1000000.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 636".
>>
>>
>> With better hashes:
>
> levente how do you determine that you use better hashes below
>

SmallInteger are their own hash.
Code below produces random hashes with uniform density.

>
>> a := Set new: 1000000.
>> [ a size < 1000000 ] whileTrue: [
>>       a add: SmallInteger maxVal atRandom ].
>> a := a asArray.
>>
>> Smalltalk garbageCollect.
>> [ a collect: #yourself as: Set ] timeToRun. "==>  3895"
>>
>> Smalltalk garbageCollect.
>> [ a asSet ] timeToRun. "==>  3777".
>>
>> | s |
>> s := Set new.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 4729 much better".
>>
>> | s |
>> s := Set new: 1000000.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 774".
>>
>>
>> Conclusions:
>> - #collect:as: is (almost) as fast as #asSet
>> - better hashes can make a huge difference
>> - message sends cost a lot compared to bytecodes
>> - if you can guess the size of your collection, do it (applies for all growing collections and collection streaming)
> yes!!
>> - only optimize if you have to in your own code
>>
>>
>> Levente
>>
>>> --
>>> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
>>> <[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
>>> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
>>> See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Henrik Sperre Johansen
In reply to this post by Nicolas Cellier


Den 11.03.2010 01:54, skrev Nicolas Cellier:

> 2010/3/11 Igor Stasenko <[hidden email]>:
>  
>> On 10 March 2010 15:07, Nicolas Cellier
>> <[hidden email]> wrote:
>>    
>>> self collect: aBlock as: Set
>>>
>>>      
>> One more time, i found this very beatiful and useful extension to
>> Collection protocol.
>> And i am proud being a witness when this thing is born! :)
>>
>>    
> Yes,
> I remember saying I was not convinced some times ago...
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-June/129457.html
> Being able to change own opinion is a good thing !
>
> Nicolas
>  
I have to admit, I still don't quite see what's wrong with
1.
(self collect: aBlock) asSet
or
2.
mySet := Set new: self size.
self do: aBlockAddingToMySetAtEnd.

or (well, this one is IMO indeed a bit ugly)
3.
self inject: (Set new:self size) into:
    [ :sub :next |
     sub add: next collectValue; yourself ] ...

Sure, they're a bit verbose, and when you'd like to use 1 vs 2 or 3
varies abit on the collection contents, but they all work in all
dialects, and aren't THAT much longer, unless your collect block is
really short.

Cheers,
Henry

>  
>>> 2010/3/10 Mariano Martinez Peck <[hidden email]>:
>>>      
>>>> Hi. I was needing something like the SQL select distinct, that doesn't take
>>>> into account repeated objects. I didn't found anything useful in Collection,
>>>> and thus, I have implemented this:
>>>>
>>>> Collection >> collectDistinct: aBlock
>>>>
>>>>
>>>> collectDistinct: aBlock
>>>>     "Evaluate aBlock with each of the receiver's elements as the argument.
>>>>     Collect the resulting values into a Set, thus repeated objects will not
>>>> be present.
>>>>     Answer the new collection."
>>>>
>>>>     | newSet |
>>>>     newSet := self species new asSet.
>>>>     self do: [:each | newSet add: (aBlock value: each)].
>>>>     ^ newSet
>>>>
>>>>
>>>> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>>>>
>>>> Cheers
>>>>
>>>> Mariano
>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [hidden email]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>>        
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>>      
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>    
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
>
>  

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Levente Uzonyi-2
In reply to this post by Stéphane Ducasse
On Thu, 11 Mar 2010, Stéphane Ducasse wrote:

>
>>
>> So try it with randomized data:
>>
>> a := (1 to: 1000000) asArray shuffled.
>>
>> Smalltalk garbageCollect.
>> [ a collect: #yourself as: Set ] timeToRun. "==> 4143"
>>
>> Smalltalk garbageCollect.
>> [ a asSet ] timeToRun. "==> 3995".
>>
>> | s |
>> s := Set new.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 157304 oops weak hashes".
>>
>> | s |
>> s := Set new: 1000000.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 636".
>>
>>
>> With better hashes:
>
> levente how do you determine that you use better hashes below
The hash of a SmallInteger is itself. The larger the interval where the
hashes come from the better, because the distribution of (hash \\ array
size) values is more uniform, therefore the chance of clustering and
collisions is lower.


Levente

>
>
>> a := Set new: 1000000.
>> [ a size < 1000000 ] whileTrue: [
>> a add: SmallInteger maxVal atRandom ].
>> a := a asArray.
>>
>> Smalltalk garbageCollect.
>> [ a collect: #yourself as: Set ] timeToRun. "==>  3895"
>>
>> Smalltalk garbageCollect.
>> [ a asSet ] timeToRun. "==>  3777".
>>
>> | s |
>> s := Set new.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 4729 much better".
>>
>> | s |
>> s := Set new: 1000000.
>> Smalltalk garbageCollect.
>> [ s addAll: a ] timeToRun. "==> 774".
>>
>>
>> Conclusions:
>> - #collect:as: is (almost) as fast as #asSet
>> - better hashes can make a huge difference
>> - message sends cost a lot compared to bytecodes
>> - if you can guess the size of your collection, do it (applies for all growing collections and collection streaming)
> yes!!
>> - only optimize if you have to in your own code
>>
>>
>> Levente
>>
>>> --
>>> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
>>> <[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
>>> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
>>> See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Levente Uzonyi-2
In reply to this post by Henrik Sperre Johansen
On Thu, 11 Mar 2010, Henrik Johansen wrote:

>
>
> Den 11.03.2010 01:54, skrev Nicolas Cellier:
>> 2010/3/11 Igor Stasenko <[hidden email]>:
>>
>>> On 10 March 2010 15:07, Nicolas Cellier
>>> <[hidden email]> wrote:
>>>
>>>> self collect: aBlock as: Set
>>>>
>>>>
>>> One more time, i found this very beatiful and useful extension to
>>> Collection protocol.
>>> And i am proud being a witness when this thing is born! :)
>>>
>>>
>> Yes,
>> I remember saying I was not convinced some times ago...
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-June/129457.html
>> Being able to change own opinion is a good thing !
>>
>> Nicolas
>>
> I have to admit, I still don't quite see what's wrong with
> 1.
> (self collect: aBlock) asSet

('foo' collect: [ :each | each asciiValue ]) asSet "===> boom :)"

> or
> 2.
> mySet := Set new: self size.
> self do: aBlockAddingToMySetAtEnd.

That's ok, but it's two lines + a temporary declaration.

>
> or (well, this one is IMO indeed a bit ugly)
> 3.
> self inject: (Set new:self size) into:
>    [ :sub :next |
>     sub add: next collectValue; yourself ] ...
>
> Sure, they're a bit verbose, and when you'd like to use 1 vs 2 or 3
> varies abit on the collection contents, but they all work in all
> dialects, and aren't THAT much longer, unless your collect block is
> really short.

I did a quick sampling (not representative) and found that 1 out of 20
collect blocks is longer than a single line.

#collect:as: is a solution to the problem about #species which limits the
use of #collect: in some cases. You can find the proposal about
#collect:as: here if you're interested:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-November/141245.html
And who knows, maybe other dialects will implement these methods too,
until then, don't use #collect:as: in code which should be portable across
dialects.


Levente

>
> Cheers,
> Henry
>>
>>>> 2010/3/10 Mariano Martinez Peck <[hidden email]>:
>>>>
>>>>> Hi. I was needing something like the SQL select distinct, that doesn't take
>>>>> into account repeated objects. I didn't found anything useful in Collection,
>>>>> and thus, I have implemented this:
>>>>>
>>>>> Collection >> collectDistinct: aBlock
>>>>>
>>>>>
>>>>> collectDistinct: aBlock
>>>>>     "Evaluate aBlock with each of the receiver's elements as the argument.
>>>>>     Collect the resulting values into a Set, thus repeated objects will not
>>>>> be present.
>>>>>     Answer the new collection."
>>>>>
>>>>>     | newSet |
>>>>>     newSet := self species new asSet.
>>>>>     self do: [:each | newSet add: (aBlock value: each)].
>>>>>     ^ newSet
>>>>>
>>>>>
>>>>> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Mariano
>>>>>
>>>>> _______________________________________________
>>>>> Pharo-project mailing list
>>>>> [hidden email]
>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [hidden email]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>>
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

csrabak
In reply to this post by Mariano Martinez Peck
Em 10/03/2010 09:50, Mariano Martinez Peck < [hidden email] > escreveu:

> Hi.  I was  needing something  like  the SQL  select distinct,  that
> doesn't take into account  repeated objects. I didn't found anything
> useful in Collection, and thus, I have implemented this:
>  Collection >> collectDistinct: aBlock
>
>  collectDistinct: aBlock
>  "Evaluate  aBlock  with each  of  the  receiver's  elements as  the
>  argument.  Collect  the resulting values into a  Set, thus repeated
>  objects will not be present.  Answer the new collection."
> | newSet  |
> newSet := self  species new asSet.  
> self  do: [:each |
>  newSet add: (aBlock value: each)].  
> ^newSet
>
>  Is there a better way ?  Do  you think it make sense to put this in
> Pharo ?
>   Cheers
Mariano,

IIUC, your intent with this method is/was to get non repeated elements in a generic collection.  However, the way is coded the result is Set which is returned instead of the collection from which the method had been called.

I didn't miss something, the result of your proposed method is akin to calling Collection>>collect: and sending asSet to it.

ITYM:

| newSet  |
 newSet := Set new .  
 self  do: [:each |
  newSet add: (aBlock value: each)].  
 ^newSet as: (self class)

HTH

--
Cesar Rabak

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

csrabak
In reply to this post by Nicolas Cellier
Although the end result should be the same, I wonder if for a collection of items where there is a lot of duplicates the sequential nature of collecting and then converting to a Set wouldn't be less performant than Mariano's proposed method?

--
Cesar Rabak



Em 10/03/2010 10:07, Nicolas Cellier < [hidden email] > escreveu:
self collect: aBlock as: Set

2010/3/10 Mariano Martinez Peck :

> Hi. I was needing something like the SQL select distinct, that doesn't take
> into account repeated objects. I didn't found anything useful in Collection,
> and thus, I have implemented this:
>
> Collection >> collectDistinct: aBlock
>
>
> collectDistinct: aBlock
>     "Evaluate aBlock with each of the receiver's elements as the argument.
>     Collect the resulting values into a Set, thus repeated objects will not
> be present.
>     Answer the new collection."
>
>     | newSet |
>     newSet := self species new asSet.
>     self do: [:each | newSet add: (aBlock value: each)].
>     ^ newSet
>
>
> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>
> Cheers
>
> Mariano
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: Collection >> collectDistinct: ??

Henrik Sperre Johansen

On 13.03.2010 00:14, [hidden email] wrote:
> Although the end result should be the same, I wonder if for a collection of items where there is a lot of duplicates the sequential nature of collecting and then converting to a Set wouldn't be less performant than Mariano's proposed method?
>
> --
> Cesar Rabak
In my experience, yes, it would.
The difference in practice is usually so small you wouldn't prefer one
over the other unless it show up in a profile of a time-critical method
though.

Cheers,
Henry

Few distinct:
[(1 to: 500000) collect: [:each | each odd] as: Set] timeToRun 1347

[|aSet |
aSet := Set new.
(1 to: 500000) do: [:each | aSet add: each odd] ] timeToRun 375

All distinct:
[(1 to: 500000) collect: [:each | each +1] as: Set] timeToRun  1350

[|aSet |
aSet := Set new: 500000.
(1 to: 500000) do: [:each | aSet add: each +1] ] timeToRun  1740


>
>
> Em 10/03/2010 10:07, Nicolas Cellier<  [hidden email]>  escreveu:
> self collect: aBlock as: Set
>
> 2010/3/10 Mariano Martinez Peck :
>> Hi. I was needing something like the SQL select distinct, that doesn't take
>> into account repeated objects. I didn't found anything useful in Collection,
>> and thus, I have implemented this:
>>
>> Collection>>  collectDistinct: aBlock
>>
>>
>> collectDistinct: aBlock
>>      "Evaluate aBlock with each of the receiver's elements as the argument.
>>      Collect the resulting values into a Set, thus repeated objects will not
>> be present.
>>      Answer the new collection."
>>
>>      | newSet |
>>      newSet := self species new asSet.
>>      self do: [:each | newSet add: (aBlock value: each)].
>>      ^ newSet
>>
>>
>> Is there a better way ?   Do you think it make sense to put this in Pharo ?
>>
>> Cheers
>>
>> Mariano
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project