Another extension proposal -> subsets

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Another extension proposal -> subsets

cedreek
When we don't find method, we reinvent the wheel. Ok, so this is what
happened with my quest on rounding float. So here is another example.
I needed a method subsets ... to have all subsets of a Set...

#(1 2 ) asSet subsets " -> a Set(a Set(1 2) a Set(1) a Set(2) a Set()) "

Is there already something ? I couldn't find so here is what I did ?
probably hackish (because of the binary mask) so feel free to comment
;)

Set>>subsets
| subsetsSize subsets workArray |
        workArray := self asArray.
  subsetsSize := 2 raisedTo: self size.
        subsets := OrderedCollection new.
        1 to: subsetsSize do: [:ea |
                subsets add: ((workArray masquedBy: (ea printStringBase: 2))
asSet)]. "masque par une conversion binaire"
        ^subsets asSet  "could be an array of sets"

ArrayedCollection>>masquedBy: aBitString
| result entry bitString |
        entry := self reverse.
        bitString := aBitString reverse.
        result := OrderedCollection new.
        1 to: (self size) do: [:ea |
                 ((bitString  at: ea ifAbsent: []) = $1) ifTrue: [result add: (entry
at: ea)]].
        ^result reverse

Cédrick

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Marcin Tustin
Googling "smalltalk powerset" turns this up: http://www.foldr.org/~michaelw/log/2005/09/

On Wed, Jul 23, 2008 at 11:51 AM, cdrick <[hidden email]> wrote:
When we don't find method, we reinvent the wheel. Ok, so this is what
happened with my quest on rounding float. So here is another example.
I needed a method subsets ... to have all subsets of a Set...

#(1 2 ) asSet subsets " -> a Set(a Set(1 2) a Set(1) a Set(2) a Set()) "

Is there already something ? I couldn't find so here is what I did ?
probably hackish (because of the binary mask) so feel free to comment
;)

Set>>subsets
| subsetsSize subsets workArray |
       workArray := self asArray.
       subsetsSize := 2 raisedTo: self size.
       subsets := OrderedCollection new.
       1 to: subsetsSize do: [:ea |
               subsets add: ((workArray masquedBy: (ea printStringBase: 2))
asSet)]. "masque par une conversion binaire"
       ^subsets asSet  "could be an array of sets"

ArrayedCollection>>masquedBy: aBitString
| result entry bitString |
       entry := self reverse.
       bitString := aBitString reverse.
       result := OrderedCollection new.
       1 to: (self size) do: [:ea |
                ((bitString  at: ea ifAbsent: []) = $1) ifTrue: [result add: (entry
at: ea)]].
       ^result reverse

Cédrick

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners



_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

cedreek
> Googling "smalltalk powerset" turns this up:
> http://www.foldr.org/~michaelw/log/2005/09/
>

Thanks Marcin, interesting read :)  This time, I had the wrong keyword :(
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Zulq Alam-2
In reply to this post by cedreek
Couldn't find anything obvious but remembered #permutationsDo: which may
help. Not sure if these does what you want, but maybe it can be adapted?

Set>>subsets
   | subsets |
   subsets := Set with: self.
   self asArray permutationsDo: [:e | subsets add: e asSet].
   self do: [:e | subsets addAll: (self copyWithout: e) subsets].
   ^ subsets

#(1 2) asSet subsets.
"a Set(a Set(1 2) a Set() a Set(2) a Set(1))"

#(1 2 3) asSet subsets.
"a Set(a Set(1 2) a Set() a Set(2) a Set(1 2 3) a Set(2 3) a Set(3) a
Set(1 3) a Set(1))"

Z.

cdrick wrote:

> When we don't find method, we reinvent the wheel. Ok, so this is what
> happened with my quest on rounding float. So here is another example.
> I needed a method subsets ... to have all subsets of a Set...
>
> #(1 2 ) asSet subsets " -> a Set(a Set(1 2) a Set(1) a Set(2) a Set()) "
>
> Is there already something ? I couldn't find so here is what I did ?
> probably hackish (because of the binary mask) so feel free to comment
> ;)
>
> Set>>subsets
> | subsetsSize subsets workArray |
> workArray := self asArray.
>   subsetsSize := 2 raisedTo: self size.
> subsets := OrderedCollection new.
> 1 to: subsetsSize do: [:ea |
> subsets add: ((workArray masquedBy: (ea printStringBase: 2))
> asSet)]. "masque par une conversion binaire"
> ^subsets asSet  "could be an array of sets"
>
> ArrayedCollection>>masquedBy: aBitString
> | result entry bitString |
> entry := self reverse.
> bitString := aBitString reverse.
> result := OrderedCollection new.
> 1 to: (self size) do: [:ea |
> ((bitString  at: ea ifAbsent: []) = $1) ifTrue: [result add: (entry
> at: ea)]].
> ^result reverse
>
> Cédrick
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/beginners

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek
>
> Set>>subsets
>  | subsets |
>  subsets := Set with: self.
>  self asArray permutationsDo: [:e | subsets add: e asSet].
>  self do: [:e | subsets addAll: (self copyWithout: e) subsets].
>  ^ subsets

nice too and frustrating how you got it quick :)

I tried a recursive method too first but found the byte ressemblance so...
My only consolation is that the recursive solution (subsets2) is
slower and hardly work for Set with more than 10 elements.

set := #(1 2 3 4 5) asSet.
[ set subsets  ] timeToRun ." 1"
[ set subsets2  ] timeToRun. " 8"

set := #(1 2 3 4 5 6 7) asSet.
[ set subsets  ] timeToRun ." 5"
[ set subsets2  ] timeToRun. " 233"

set := #(1 2 3 4 5 6 7 8 ) asSet.
[ set subsets  ] timeToRun . " 11"
[ set subsets2  ] timeToRun. " 1683"

set := (1 to: 10) asSet.
[ set subsets  ] timeToRun . " 46"

set := (1 to: 15) asSet.
[ set subsets  ] timeToRun ." 2484"

set := (1 to: 20) asSet.
[ set subsets  ] timeToRun .  "559953"  "but here the result has (2
raisedTo: 20)  1 048 576 Sets :)"

set := (1 to: 50) asSet.
[ set subsets  ] timeToRun .    "I got a "space is low   wow" :)

I have to go,
That was fun :)

See you

Cédrick

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Nicolas Cellier-3
You have one messsage #combinations:atATimeDo: to enumerate partitions,
but in SequenceableCollection, not in Set:

| collec subset |
collec := 1 to: 4.
subset := OrderedCollection new: (2 raisedTo: collec size).
subset add: #().
1 to: collec size do: [:subSize |
  collec combinations: subSize atATimeDo: [:subArray | subset add:
subArray copy]].
subset

Mind the copy, because the iteration recycle same subArray for economy.

cdrick a écrit :

>> Set>>subsets
>>  | subsets |
>>  subsets := Set with: self.
>>  self asArray permutationsDo: [:e | subsets add: e asSet].
>>  self do: [:e | subsets addAll: (self copyWithout: e) subsets].
>>  ^ subsets
>
> nice too and frustrating how you got it quick :)
>
> I tried a recursive method too first but found the byte ressemblance so...
> My only consolation is that the recursive solution (subsets2) is
> slower and hardly work for Set with more than 10 elements.
>
> set := #(1 2 3 4 5) asSet.
> [ set subsets  ] timeToRun ." 1"
> [ set subsets2  ] timeToRun. " 8"
>
> set := #(1 2 3 4 5 6 7) asSet.
> [ set subsets  ] timeToRun ." 5"
> [ set subsets2  ] timeToRun. " 233"
>
> set := #(1 2 3 4 5 6 7 8 ) asSet.
> [ set subsets  ] timeToRun . " 11"
> [ set subsets2  ] timeToRun. " 1683"
>
> set := (1 to: 10) asSet.
> [ set subsets  ] timeToRun . " 46"
>
> set := (1 to: 15) asSet.
> [ set subsets  ] timeToRun ." 2484"
>
> set := (1 to: 20) asSet.
> [ set subsets  ] timeToRun .  "559953"  "but here the result has (2
> raisedTo: 20)  1 048 576 Sets :)"
>
> set := (1 to: 50) asSet.
> [ set subsets  ] timeToRun .    "I got a "space is low   wow" :)
>
> I have to go,
> That was fun :)
>
> See you
>
> Cédrick
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/beginners

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Zulq Alam-2
In reply to this post by cedreek
Actually, we don't need #permutationsDo: as we're only looking for
combinations. With a few optimisations:

combinations
   ^ self combinationsInto: (Set new: (2 raisedTo: self size))

combinationsInto: aSet
   (aSet includes: self) ifTrue: [^ self]. "optimisation"
   aSet add: self.
   self do:
     [:each |
     (self copyWithout: each) combinationsInto: aSet].
   ^ aSet

[(1 to: 8) asSet combinations] timeToRun "215"
[(1 to: 9) asSet combinations] timeToRun "918"
[(1 to: 10) asSet combinations] timeToRun "3989"
[(1 to: 11) asSet combinations] timeToRun "16349"
[(1 to: 12) asSet combinations] timeToRun  "68780"

So a little better, but I was expecting much more. What's worrying is this:

(1 to: 10) asSet combinations size  "1024"
((1 to: 10) asSet combinations collect: [:e | e hash]) asSet size "16"

So, 1024 distinct Sets have only 16 distinct hashes between them? That
seems pretty bad. It would probably be possible to get a little bit more
out of the routine with a better (more appropriate) hash function. Even
so, I don't think it will ever be as fast as your method but will
happily be proved wrong! :)

Z.

cdrick wrote:

>> Set>>subsets
>>  | subsets |
>>  subsets := Set with: self.
>>  self asArray permutationsDo: [:e | subsets add: e asSet].
>>  self do: [:e | subsets addAll: (self copyWithout: e) subsets].
>>  ^ subsets
>
> nice too and frustrating how you got it quick :)
>
> I tried a recursive method too first but found the byte ressemblance so...
> My only consolation is that the recursive solution (subsets2) is
> slower and hardly work for Set with more than 10 elements.
>
> set := #(1 2 3 4 5) asSet.
> [ set subsets  ] timeToRun ." 1"
> [ set subsets2  ] timeToRun. " 8"
>
> set := #(1 2 3 4 5 6 7) asSet.
> [ set subsets  ] timeToRun ." 5"
> [ set subsets2  ] timeToRun. " 233"
>
> set := #(1 2 3 4 5 6 7 8 ) asSet.
> [ set subsets  ] timeToRun . " 11"
> [ set subsets2  ] timeToRun. " 1683"
>
> set := (1 to: 10) asSet.
> [ set subsets  ] timeToRun . " 46"
>
> set := (1 to: 15) asSet.
> [ set subsets  ] timeToRun ." 2484"
>
> set := (1 to: 20) asSet.
> [ set subsets  ] timeToRun .  "559953"  "but here the result has (2
> raisedTo: 20)  1 048 576 Sets :)"
>
> set := (1 to: 50) asSet.
> [ set subsets  ] timeToRun .    "I got a "space is low   wow" :)
>
> I have to go,
> That was fun :)
>
> See you
>
> Cédrick
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/beginners

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek
In reply to this post by Nicolas Cellier-3
Nicolas, your initials fits you ;)

nice and twice quicker as the bit-sift one.

> You have one messsage #combinations:atATimeDo: to enumerate partitions, but
> in SequenceableCollection, not in Set:
>
> | collec subset |
> collec := 1 to: 4.
> subset := OrderedCollection new: (2 raisedTo: collec size).
> subset add: #().
> 1 to: collec size do: [:subSize |
>  collec combinations: subSize atATimeDo: [:subArray | subset add: subArray
> copy]].
> subset
>
> Mind the copy, because the iteration recycle same subArray for economy.
>
I'dnevertheless prefer a name like:  #combinationsSize:do:


Cédrick

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Klaus D. Witzel
In reply to this post by Zulq Alam-2
On Thu, 24 Jul 2008 15:13:22 +0200, Zulq Alam wrote:

> Actually, we don't need #permutationsDo: as we're only looking for  
> combinations.

Permutations? (n factorial)? v.s. (2 raisedTo: n)?

> With a few optimisations:
>
> combinations
>    ^ self combinationsInto: (Set new: (2 raisedTo: self size))
>
> combinationsInto: aSet
>    (aSet includes: self) ifTrue: [^ self]. "optimisation"
>    aSet add: self.
>    self do:
>      [:each |
>      (self copyWithout: each) combinationsInto: aSet].
>    ^ aSet

Yeah, that's a good one :)

> [(1 to: 8) asSet combinations] timeToRun "215"
> [(1 to: 9) asSet combinations] timeToRun "918"
> [(1 to: 10) asSet combinations] timeToRun "3989"
> [(1 to: 11) asSet combinations] timeToRun "16349"
> [(1 to: 12) asSet combinations] timeToRun  "68780"
>
> So a little better, but I was expecting much more.

I think that the amount of work for (n + 1) is about (timeToRun of: n) *  
3; should be observable better with larger n's ;)

Squeak looks to be already fast in this case, your routine is almost  
optimal :) Implementing it in Set so that Set's internal can be of benefit  
won't bring these figures down much -- what remains is (aSet includes:  
another), times #copyWithout: loop, and that seems to be unavoidable.

How about naming

Collection>>#asPowerset
        ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))

with Set>>#powersetInto: and putting that into the next release.

/Klaus

> What's worrying is this:
>
> (1 to: 10) asSet combinations size  "1024"
> ((1 to: 10) asSet combinations collect: [:e | e hash]) asSet size "16"
>
> So, 1024 distinct Sets have only 16 distinct hashes between them? That  
> seems pretty bad. It would probably be possible to get a little bit more  
> out of the routine with a better (more appropriate) hash function. Even  
> so, I don't think it will ever be as fast as your method but will  
> happily be proved wrong! :)
>
> Z.
>
> cdrick wrote:
>>> Set>>subsets
>>>  | subsets |
>>>  subsets := Set with: self.
>>>  self asArray permutationsDo: [:e | subsets add: e asSet].
>>>  self do: [:e | subsets addAll: (self copyWithout: e) subsets].
>>>  ^ subsets
>>  nice too and frustrating how you got it quick :)
>>  I tried a recursive method too first but found the byte ressemblance  
>> so...
>> My only consolation is that the recursive solution (subsets2) is
>> slower and hardly work for Set with more than 10 elements.
>>  set := #(1 2 3 4 5) asSet.
>> [ set subsets  ] timeToRun ." 1"
>> [ set subsets2  ] timeToRun. " 8"
>>  set := #(1 2 3 4 5 6 7) asSet.
>> [ set subsets  ] timeToRun ." 5"
>> [ set subsets2  ] timeToRun. " 233"
>>  set := #(1 2 3 4 5 6 7 8 ) asSet.
>> [ set subsets  ] timeToRun . " 11"
>> [ set subsets2  ] timeToRun. " 1683"
>>  set := (1 to: 10) asSet.
>> [ set subsets  ] timeToRun . " 46"
>>  set := (1 to: 15) asSet.
>> [ set subsets  ] timeToRun ." 2484"
>>  set := (1 to: 20) asSet.
>> [ set subsets  ] timeToRun .  "559953"  "but here the result has (2
>> raisedTo: 20)  1 048 576 Sets :)"
>>  set := (1 to: 50) asSet.
>> [ set subsets  ] timeToRun .    "I got a "space is low   wow" :)
>>  I have to go,
>> That was fun :)
>>  See you
>>  Cédrick
>>    
>> ------------------------------------------------------------------------
>>  _______________________________________________
>> Beginners mailing list
>> [hidden email]
>> http://lists.squeakfoundation.org/mailman/listinfo/beginners


_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek
or withNicolas suggestion:

Set>>asPowerset
        | subset |
        subset := (OrderedCollection new: (2 raisedTo: self size))
                add: Set new;
                yourself.
        1 to: self size do: [:subSize | "don't copy anymore as there is the
Set conversion"
  self asArray combinations: subSize atATimeDo: [:subArray | subset
add: subArray asSet]].
        ^ subset asSet

I'd still prefer #combinationsSize:do:   instead of    #combinations:atATimeDo:

>
> How about naming
>
> Collection>>#asPowerset
>        ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))
>
> with Set>>#powersetInto: and putting that into the next release.
>
> /Klaus
>
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Klaus D. Witzel
On Thu, 24 Jul 2008 18:27:48 +0200, Cedrick wrote:

> or withNicolas suggestion:
>
> Set>>asPowerset
> | subset |
> subset := (OrderedCollection new: (2 raisedTo: self size))
> add: Set new;
> yourself.
> 1 to: self size do: [:subSize | "don't copy anymore as there is the
> Set conversion"
>   self asArray combinations: subSize atATimeDo: [:subArray | subset
> add: subArray asSet]].
> ^ subset asSet

Ah, you francophones always want to see Blaise Pascal's triangle at work ;)

Didn't check #combinations:* implementation, does it avoid #includes: ?

> I'd still prefer #combinationsSize:do:   instead of    
> #combinations:atATimeDo:

+1 but it looks alt-w friendly already (like #detectSum: looks ;)

>>
>> How about naming
>>
>> Collection>>#asPowerset
>>        ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))
>>
>> with Set>>#powersetInto: and putting that into the next release.
>>
>> /Klaus
>>


_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek
>
> Didn't check #combinations:* implementation, does it avoid #includes: ?

I think so. It calls #combinationsAt:in: after: do:  wich is recursive.

>
>> I'd still prefer #combinationsSize:do:   instead of
>>  #combinations:atATimeDo:
>
> +1 but it looks alt-w friendly already (like #detectSum: looks ;)
>

Funny to see it was probably the original name. See the comment of
#combinationsAt:in: after: (last line)

"Choose k of N items and put in aCollection.  jj-1 already chosen.
Indexes of items are in numerical order, to avoid the same combo being
used twice.  In this slot, we are allowed to use items in self indexed
by nn+1 to self size.  nn is the index used for position jj-1."
"(1 to: 6) combinationsSize: 3 do: [:each | Transcript cr; show: each
printString]"


Cédrick

>>>
>>> How about naming
>>>
>>> Collection>>#asPowerset
>>>       ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))
>>>
>>> with Set>>#powersetInto: and putting that into the next release.
>>>
>>> /Klaus
>>>
>
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/beginners
>

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Zulq Alam-2
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:

> Squeak looks to be already fast in this case, your routine is almost
> optimal :) Implementing it in Set so that Set's internal can be of
> benefit won't bring these figures down much -- what remains is (aSet
> includes: another), times #copyWithout: loop, and that seems to be
> unavoidable.

It's significantly faster with a more appropriate hashing function:

{2->0 . 3->0 . 4->1 . 5->1 . 6->3 . 7->9 . 8->22 . 9->59 . 10->138 .
11->305 . 12->686 . 13->1640 . 14->4366 . 15->10550 . 16->28104 .
17->93373 . 18->303425}

combinations
   | combinations |
   combinations := (PluggableSet new: (2 raisedTo: self size))
     hashBlock:
       [:aSet |
       aSet
         inject: aSet size
         into: [:hash :each | hash * each hash + each]];
     yourself.
   self combinationsInto: combinations.
   ^ combinations

combinationsInto: aSet
   (aSet includes: self) ifTrue: [^ self].
   aSet add: self.
   self do:
     [:each |
     (self copyWithout: each) combinationsInto: aSet].

I think the problem is that most of the sets contain similar data so the
hashes (calculated with bitXor) tend towards to a small set of values.
This means there are a lot of collisions and each #add: and #includes:
is very expensive.

>
> How about naming
>
> Collection>>#asPowerset
>     ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))
>
> with Set>>#powersetInto: and putting that into the next release.
>

Even now, I'm still not clear on how one takes a change like this and
drives it into a squeak release - Tests, Mantis, then what?

Z.

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Klaus D. Witzel
On Thu, 24 Jul 2008 20:09:53 +0200, Zulq Alam wrote:

> Klaus D. Witzel wrote:
>
>> Squeak looks to be already fast in this case, your routine is almost  
>> optimal :) Implementing it in Set so that Set's internal can be of  
>> benefit won't bring these figures down much -- what remains is (aSet  
>> includes: another), times #copyWithout: loop, and that seems to be  
>> unavoidable.
>
> It's significantly faster with a more appropriate hashing function:
>
> {2->0 . 3->0 . 4->1 . 5->1 . 6->3 . 7->9 . 8->22 . 9->59 . 10->138 .  
> 11->305 . 12->686 . 13->1640 . 14->4366 . 15->10550 . 16->28104 .  
> 17->93373 . 18->303425}

This is very good for small n :)

> combinations
>    | combinations |
>    combinations := (PluggableSet new: (2 raisedTo: self size))
>      hashBlock:
>        [:aSet |
>        aSet
>          inject: aSet size
>          into: [:hash :each | hash * each hash + each]];
>      yourself.
>    self combinationsInto: combinations.
>    ^ combinations
>
> combinationsInto: aSet
>    (aSet includes: self) ifTrue: [^ self].
>    aSet add: self.
>    self do:
>      [:each |
>      (self copyWithout: each) combinationsInto: aSet].
>
> I think the problem is that most of the sets contain similar data so the  
> hashes (calculated with bitXor) tend towards to a small set of values.  
> This means there are a lot of collisions and each #add: and #includes:  
> is very expensive.
>
>>  How about naming
>>  Collection>>#asPowerset
>>     ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))
>>  with Set>>#powersetInto: and putting that into the next release.
>>
>
> Even now, I'm still not clear on how one takes a change like this and  
> drives it into a squeak release - Tests, Mantis, then what?

Put it on Mantis as enhancement into the collection category. The rest  
follows automatically ;) I suggest to put something like "fast powerset  
method for collections" into the summary line.

> Z.

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Nicolas Cellier-3
Klaus D. Witzel <klaus.witzel <at> cobss.com> writes:

>
> On Thu, 24 Jul 2008 20:09:53 +0200, Zulq Alam wrote:
>
> This is very good for small n :)
>
> > combinations
> >    | combinations |
> >    combinations := (PluggableSet new: (2 raisedTo: self size))
> >      hashBlock:
> >        [:aSet |
> >        aSet
> >          inject: aSet size
> >          into: [:hash :each | hash * each hash + each]];
> >      yourself.
> >    self combinationsInto: combinations.
> >    ^ combinations
> >
> > combinationsInto: aSet
> >    (aSet includes: self) ifTrue: [^ self].
> >    aSet add: self.
> >    self do:
> >      [:each |
> >      (self copyWithout: each) combinationsInto: aSet].
> >
> > Even now, I'm still not clear on how one takes a change like this and  
> > drives it into a squeak release - Tests, Mantis, then what?
>
> Put it on Mantis as enhancement into the collection category. The rest  
> follows automatically ;) I suggest to put something like "fast powerset  
> method for collections" into the summary line.
>
> > Z.
>

Zulq,
the algorithm you are proposing is very simple but has major problems:

1) it is not efficient for large size n: it will do (n factorial) loops when
only (2 raisedTo: n) are necessary
        (2 raisedTo: 10) -> 1024
        (10 factorial) -> 3628800

2) each loop will result in a costly hash lookup.
Your hashBlock involve LargeInteger of size (B raisedTo: p) where
- B is the size of an elementary hash (presumably 4 bytes),
- p the size of the partition (ranging from 0 to n)
The cost of a single hash of size p involves p multiplications of growing size.
Each multiplication of k digits has a cost k^2
(this is the naive algorithm currently programmed in LargeInteger plugin).
Thus a single hash costs (B^2+(2*B)^2+...+(p*B)^2)
Combine this with the (n factorial) loops and you will understand the problem.

3) the algorithm must store all partitions even if we don't want to collect but
just to iterate on partitions.

No offense, but you'd better not bother opening a mantis issue for this time.

Cheers

Nicolas






_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek
Yes.. Nico's solution is lot faster...

set := #(1 2 3 4 5) asSet.
[ set subsets  "mine"] timeToRun ." 1"
[ set subsets2  "Zulk first"  ] timeToRun. " 8"
[ set subsets4  "Nicolas"] timeToRun. " 1"
[ set subsets5  "Zulk"] timeToRun. " 2"

set := (1 to: 10) asSet.
[ set subsets  ] timeToRun . " 46"
[ set subsets4  ] timeToRun. " 18"
[ set subsets5  ] timeToRun." 114"

set := (1 to: 15) asSet.
[ set subsets  ] timeToRun ." 2484"
[ set subsets4  ] timeToRun. " 1598"
[ set subsets5  ] timeToRun." 12493"

Nicolas won ;) So maybe we can add this one and rename the
combinations:atATime: ?

Cédrick



2008/7/25 Nicolas Cellier <[hidden email]>:

> Klaus D. Witzel <klaus.witzel <at> cobss.com> writes:
>
>>
>> On Thu, 24 Jul 2008 20:09:53 +0200, Zulq Alam wrote:
>>
>> This is very good for small n :)
>>
>> > combinations
>> >    | combinations |
>> >    combinations := (PluggableSet new: (2 raisedTo: self size))
>> >      hashBlock:
>> >        [:aSet |
>> >        aSet
>> >          inject: aSet size
>> >          into: [:hash :each | hash * each hash + each]];
>> >      yourself.
>> >    self combinationsInto: combinations.
>> >    ^ combinations
>> >
>> > combinationsInto: aSet
>> >    (aSet includes: self) ifTrue: [^ self].
>> >    aSet add: self.
>> >    self do:
>> >      [:each |
>> >      (self copyWithout: each) combinationsInto: aSet].
>> >
>> > Even now, I'm still not clear on how one takes a change like this and
>> > drives it into a squeak release - Tests, Mantis, then what?
>>
>> Put it on Mantis as enhancement into the collection category. The rest
>> follows automatically ;) I suggest to put something like "fast powerset
>> method for collections" into the summary line.
>>
>> > Z.
>>
>
> Zulq,
> the algorithm you are proposing is very simple but has major problems:
>
> 1) it is not efficient for large size n: it will do (n factorial) loops when
> only (2 raisedTo: n) are necessary
>        (2 raisedTo: 10) -> 1024
>        (10 factorial) -> 3628800
>
> 2) each loop will result in a costly hash lookup.
> Your hashBlock involve LargeInteger of size (B raisedTo: p) where
> - B is the size of an elementary hash (presumably 4 bytes),
> - p the size of the partition (ranging from 0 to n)
> The cost of a single hash of size p involves p multiplications of growing size.
> Each multiplication of k digits has a cost k^2
> (this is the naive algorithm currently programmed in LargeInteger plugin).
> Thus a single hash costs (B^2+(2*B)^2+...+(p*B)^2)
> Combine this with the (n factorial) loops and you will understand the problem.
>
> 3) the algorithm must store all partitions even if we don't want to collect but
> just to iterate on partitions.
>
> No offense, but you'd better not bother opening a mantis issue for this time.
>
> Cheers
>
> Nicolas
>
>
>
>
>
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/beginners
>

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Zulq Alam-2
In reply to this post by Nicolas Cellier-3
Nicolas Cellier wrote:

> Zulq,
> the algorithm you are proposing is very simple but has major problems:
>
> 1) it is not efficient for large size n: it will do (n factorial) loops when
> only (2 raisedTo: n) are necessary

It's better than N! because it will not loop over a set already
processed. For instance, for a set of 5 elements it will try 81 sets but
only process 32 of these. Not 120 in either case (5 factorial).

>
> 2) each loop will result in a costly hash lookup.
> Your hashBlock involve LargeInteger of size (B raisedTo: p) where

It doesn't need to. That was just a very rough attempt at producing a
hash that didn't evaluate to only 16 values. It should be possible to
create one that produces SmallIntegers but with a higher cardinality.

> 3) the algorithm must store all partitions even if we don't want to collect but
> just to iterate on partitions.

Yes.

>
> No offense, but you'd better not bother opening a mantis issue for this time.
>

Agreed. I was just curious about why the naive algorithm was so slow and
then as a seperate question how one gets such changes in.

Z.
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Nicolas Cellier-3
Zulq Alam <me <at> zulq.net> writes:

>
> Nicolas Cellier wrote:
>
> > Zulq,
> > the algorithm you are proposing is very simple but has major problems:
> >
> > 1) it is not efficient for large size n: it will do (n factorial) loops when
> > only (2 raisedTo: n) are necessary
>
> It's better than N! because it will not loop over a set already
> processed. For instance, for a set of 5 elements it will try 81 sets but
> only process 32 of these. Not 120 in either case (5 factorial).
>

Apologies for this case of blindness!
I forgot the includes: test was cutting branches.

> >
> > 2) each loop will result in a costly hash lookup.
> > Your hashBlock involve LargeInteger of size (B raisedTo: p) where
>
> It doesn't need to. That was just a very rough attempt at producing a
> hash that didn't evaluate to only 16 values. It should be possible to
> create one that produces SmallIntegers but with a higher cardinality.
>

Yes, modular arithmetic would be a way to go.
Beware, "small" LargeIntegers intermediate might still spoil the game...
Crafting a good hash is an art (see work from Andres Valloud).

> > 3) the algorithm must store all partitions even if we don't want to collect
 but
> > just to iterate on partitions.
>
> Yes.
>
> >
> > No offense, but you'd better not bother opening a mantis issue for this
 time.
> >
>
> Agreed. I was just curious about why the naive algorithm was so slow and
> then as a seperate question how one gets such changes in.
>
> Z.
>

Right, you did a good job exercizing your (and our) curiousity.
This is very enlightening.
Beginners should also learn using MessageTally spyOn: [],
to analyze were the real costs go.

Cheers

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Nicolas Cellier-3
In reply to this post by cedreek
cdrick <cdrick65 <at> gmail.com> writes:

>
> Yes.. Nico's solution is lot faster...
>
> set := #(1 2 3 4 5) asSet.
> [ set subsets  "mine"] timeToRun ." 1"
> [ set subsets2  "Zulk first"  ] timeToRun. " 8"
> [ set subsets4  "Nicolas"] timeToRun. " 1"
> [ set subsets5  "Zulk"] timeToRun. " 2"
>
> set := (1 to: 10) asSet.
> [ set subsets  ] timeToRun . " 46"
> [ set subsets4  ] timeToRun. " 18"
> [ set subsets5  ] timeToRun." 114"
>
> set := (1 to: 15) asSet.
> [ set subsets  ] timeToRun ." 2484"
> [ set subsets4  ] timeToRun. " 1598"
> [ set subsets5  ] timeToRun." 12493"
>
> Nicolas won ;) So maybe we can add this one and rename the
> combinations:atATime: ?
>
> Cédrick
>

He, press ALT-v to get versions of #combinations:atATimeDo: and thanks tk!

I like bit sift solution too for it's simplicity.
The problem is that it will iterate p times for creating each subset of size p.
#combinations:atATimeDo: does not. It is building subsets in parallel.
The required copy does an iteration though, but in a primitive!
Hence the difference...

Nicolas

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek


He, press ALT-v to get versions of #combinations:atATimeDo: and thanks tk!

:)
 


I like bit sift solution too for it's simplicity.

me too even if we could argue this is not self explaining...
That's what Andres Valoud said here (thanks Marcin): http://blogten.blogspot.com/2005/09/very-nice-methods.html
 

The problem is that it will iterate p times for creating each subset of size p.
#combinations:atATimeDo: does not. It is building subsets in parallel.

uhm, ok...
the diferrence is an order or 2 more or less...
 

The required copy does an iteration though, but in a primitive!
Hence the difference...

Nicolas
 
Thanks all, this was fun and interesting ;)...

Cédrick


_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
12