Another extension proposal -> subsets

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Nicolas Cellier-3
nicolas cellier <ncellier <at> ifrance.com> writes:

>
> He, press ALT-v to get versions of #combinations:atATimeDo: and thanks tk!
>
> I like bit sift solution too for it's simplicity.
> The problem is that it will iterate p times for creating subset of size p.
> #combinations:atATimeDo: does not. It is building subsets in parallel.
> The required copy does an iteration though, but in a primitive!
> Hence the difference...
>
> Nicolas
>

What i wrote is true only if you collect: Arrays.
It is not true if you use #asSet instead of #copy. #asSet is not primitive.

The cost of bit sift can be greatly reduced: factor the reverse operations:

Set>>subsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
      subsets add:
        (workArray maskedBy: (sift printStringBase: 2) reverse) asSet].
   ^subsets

ArrayedCollection>>maskedBy: aBitString
   | maxSize result |
   maxSize := self size min: aBitString size.
   result := (self species new: maxSize) writeStream.
   1 to: maxSize do: [:index |
      (aBitString  at: index) = $1 ifTrue: [result nextPut: (self at: index)]].
   ^result contents

While at it, anyway, you do not need to reverse at all...
Ordering does not matter.
This eliminates one loop.
It still costs 3 loops per partition:
 (1 to: maxSize) and asSet plus printStringBase:...
Thus 2P+Q loops for a subset of size P out of N with P<=Q<=N

You can eliminate the printString and send directly #bitAt: to integers.
See http://bugs.squeak.org/view.php?id=6985 and load as pre-requisite:
    Installer mantis ensureFix: 6985.

Set>>subsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
      subsets add:
        (workArray maskedByIntegerBits: sift) asSet].
   ^subsets

ArrayedCollection>>maskedByIntegerBits: anInteger
   | maxSize result |
   maxSize := self size min: anInteger highBit.
   result := (self species new: maxSize) writeStream.
   1 to: maxSize do: [:index |
      (anInteger bitAt: index) = 1 ifTrue: [result nextPut: (self at: index)]].
   ^result contents

You still have P+Q loops, P<= Q<=N
You can now eliminate the asSet loop using collect:into: / select:into:
(I think we can call this Klaus trick).

Set>>subsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
      subsets add:
        (workArray selectIntegerMask: sift into: (Set new: sift highBit)].
   ^subsets

ArrayedCollection>>selectIntegerMask: anInteger into: aCollection
   1 to: (self size min: anInteger highBit) do: [:index |
      (anInteger bitAt: index) = 1 ifTrue: [aCollection add: (self at: index)]].
   ^aCollection

Only costs Q loops per partition...
Maybe you can now approach #combinations:atAtTimeDo:
I let you try.

Nicolas

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Another extension proposal -> subsets

Nicolas Cellier-3
In reply to this post by cedreek
cdrick <cdrick65 <at> gmail.com> writes:

>
> I like bit sift solution too for it's simplicity.
>
> me too even if we could argue this is not self explaining...That's what Andres
Valoud said here (thanks Marcin):
http://blogten.blogspot.com/2005/09/very-nice-methods.html
>  
>

which is the exact transposition of #combinations:atATimeDo: ...

Nicolas




_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

Re: Re: Another extension proposal -> subsets

cedreek
In reply to this post by Nicolas Cellier-3
really interesting.
but it reveals slower but it was because of Set operations. To be consistent with other method, I used an array.

subsets6
  | subsetsSize subsets workArray |
  workArray := self asArray.
  subsetsSize := 2 raisedTo: self size.
  subsets := Array new: subsetsSize.
  1 to: subsetsSize do: [:sift |
     subsets at: sift put:
       (workArray selectIntegerMask: sift into: (Set new: sift highBit) )].
  ^subsets

Putting asSet as the end is very expensive when the collection is big


results:

set := (1 to: 15) asSet.
[ set subsets "first sift" ] timeToRun ." 2484"
[ set subsets4  "combinationsSize:do:"] timeToRun. " 1598"
[ set subsets5  "Zulk"] timeToRun." 12493"    "I think he uses Set for the subset"
[ set subsets6  "optimized sift] timeToRun. " 1436"   "Winner :) "

Thats was instructive, especially the way you deal with bit. Nice also how you removed the reverse(s).

Cédrick





_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
12