large dense sets of 1..n

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

large dense sets of 1..n

Ralph Boland
I need to create some dense sets which will contain
subsets of the integers  1..n where n is very large.
I don't want to use regular sets because they would use to much space.
Instead I want to map each set to an array of approximately size
n/8 bytes which, because my sets will be dense, will save space.
Containment of a number k  in a set is to be determined by testing
whether bit  k  is set or cleared.  Pretty standard stuff I think.

Is there code already that does this?

If not, I will create a class LargeDenseSet containing some instance
bitMap that holds the necessary space.
For  bitMap,  is it better for me to use an array of integers or a
LargePositiveInteger.
How about using a BltBit?

Is  LargeDenseSet a sufficiently used class to be part of Squeak or at
least part of
some standard package?

Regards,

Ralph Boland

Reply | Threaded
Open this post in threaded view
|

Re: large dense sets of 1..n

Levente Uzonyi-2
On Sun, 22 Aug 2010, Ralph Boland wrote:

> I need to create some dense sets which will contain
> subsets of the integers  1..n where n is very large.
> I don't want to use regular sets because they would use to much space.
> Instead I want to map each set to an array of approximately size
> n/8 bytes which, because my sets will be dense, will save space.
> Containment of a number k  in a set is to be determined by testing
> whether bit  k  is set or cleared.  Pretty standard stuff I think.
>
> Is there code already that does this?

Yes and no. For example Integer class >> #largePrimesUpTo:do: does this,
but it's specialized for the problem, so it's not reusable.

>
> If not, I will create a class LargeDenseSet containing some instance
> bitMap that holds the necessary space.
> For  bitMap,  is it better for me to use an array of integers or a
> LargePositiveInteger.
> How about using a BltBit?

I would just roll my own variableByteSubclass.

>
> Is  LargeDenseSet a sufficiently used class to be part of Squeak or at
> least part of
> some standard package?

I guess not. People rarely face problems which require such a special
collection. SkipLists were removed, and I think we are fine without them
99.9% of the time.


Levente

>
> Regards,
>
> Ralph Boland
>
>

Reply | Threaded
Open this post in threaded view
|

Re: large dense sets of 1..n

Nicolas Cellier
In reply to this post by Ralph Boland
You can see example in WideCharacterSet in image, or also load
BitArray from SqueakMap.

See also these messages for playing with bits
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141893.html

And other tricks like how to
enumerate bits in a byte...
My own answsers:

| x |
x := 16rE4.
{
[ | y |
y := 1. 0 to: 7 do: [:iBit | (x bitAnd: y) = 0 ifTrue: [iBit + 1]. y
:= y *2]] bench.

[| y pow2 iBit |
y := x.
pow2 := #[1 2 4 8 16 32 64 128].
[y == 0] whileFalse: [y := y - (pow2 at: (iBit := y
highBitOfPositiveReceiver)). iBit]] bench.

[| y h iBit |
y := x.
iBit := 8.
h := 128.
[y == 0] whileFalse: [[y < h] whileTrue: [h := h bitShift: -1. iBit :=
iBit - 1]. y := y - h. h := h bitShift: -1. iBit := iBit - 1]] bench.

[| y h iBit |
y := x.
iBit := 8.
pow2 := #[1 2 4 8 16 32 64 128].
[y == 0] whileFalse: [[y < (h := pow2 at: iBit)] whileTrue: [iBit :=
iBit - 1]. y := y - h. iBit]] bench.

[| y pow2 hb iBit |
y := x.
pow2 := #[1 2 4 8 16 32 64 128].
hb := #[1 2 2 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8].
[y == 0] whileFalse: [y := y - (pow2 at: (iBit := hb at: y)). iBit]] bench.

Nicolas

2010/8/23 Ralph Boland <[hidden email]>:

> I need to create some dense sets which will contain
> subsets of the integers  1..n where n is very large.
> I don't want to use regular sets because they would use to much space.
> Instead I want to map each set to an array of approximately size
> n/8 bytes which, because my sets will be dense, will save space.
> Containment of a number k  in a set is to be determined by testing
> whether bit  k  is set or cleared.  Pretty standard stuff I think.
>
> Is there code already that does this?
>
> If not, I will create a class LargeDenseSet containing some instance
> bitMap that holds the necessary space.
> For  bitMap,  is it better for me to use an array of integers or a
> LargePositiveInteger.
> How about using a BltBit?
>
> Is  LargeDenseSet a sufficiently used class to be part of Squeak or at
> least part of
> some standard package?
>
> Regards,
>
> Ralph Boland
>
>

Reply | Threaded
Open this post in threaded view
|

Re: large dense sets of 1..n

Ralph Boland
In reply to this post by Ralph Boland
>> I need to create some dense sets which will contain
>> subsets of the integers á1..n where n is very large.
>> I don't want to use regular sets because they would use to much space.
...
>> Is there code already that does this?
...
>> Ralph Boland
>
> You can see example in WideCharacterSet in image, or also load
> BitArray from SqueakMap.

> See also these messages for playing with bits
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141893.html
...

I downloaded BitArray and discovered that it used Form.
This seemed backwards to me so I looked at Form and discovered
that Form uses Bitmap.  Bitmap looked like what I was looking for
so I looked at Bitmap.
To use Bitmap I would need to add methods  at:  and  at:put:  based on code
from class BitArray. However, the code in BitArray use instance variables
wordCache cacheIndex which make accessing consecutive bits more efficient.

So my plan was to create class BitSet which will be a subclass of Bitmap and
have instance variables wordCache and casheIndex.  Of course this didn't work
because I am not allowed to add instance variables to BitSet.

So my plan now is to make BitSet a subclass of ArrayedCollection with
an instance variable bitmap holding objects of class Bitmap.
I will implement some but not all of the methods in BitArray.
I will also implement methods booleanAt:  and booleanAt:put:.

BitSet will be a class in a package that I will eventually release to SqueakMap
so I thought it was worth allowing comments before I implement BitSet.

So comments welcome.

Regards,

Ralph Boland

Reply | Threaded
Open this post in threaded view
|

Re: large dense sets of 1..n

Yoshiki Ohshima-2
At Mon, 23 Aug 2010 19:37:28 -0600,
Ralph Boland wrote:

>
> >> I need to create some dense sets which will contain
> >> subsets of the integers á1..n where n is very large.
> >> I don't want to use regular sets because they would use to much space.
> ...
> >> Is there code already that does this?
> ...
> >> Ralph Boland
> >
> > You can see example in WideCharacterSet in image, or also load
> > BitArray from SqueakMap.
>
> > See also these messages for playing with bits
> > http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141893.html
> ...
>
> I downloaded BitArray and discovered that it used Form.
> This seemed backwards to me so I looked at Form and discovered
> that Form uses Bitmap.  Bitmap looked like what I was looking for
> so I looked at Bitmap.

  What is backwards?  A collection of bits should represent a bitmap but a
bitmap should not represent a collection of bits?

> To use Bitmap I would need to add methods  at:  and  at:put:  based on code
> from class BitArray.

  And, these accessors are pretty much provided via Form.  Yes, Form
is a facade to a collection of bits so to me, it is pretty good way to
represent bits.

>  However, the code in BitArray use instance variables
> wordCache cacheIndex which make accessing consecutive bits more efficient.

  You should measure the effects of the cache for your use case.  It
may not be needed.

> So my plan was to create class BitSet which will be a subclass of Bitmap and
> have instance variables wordCache and casheIndex.  Of course this didn't work
> because I am not allowed to add instance variables to BitSet.
> So my plan now is to make BitSet a subclass of ArrayedCollection with
> an instance variable bitmap holding objects of class Bitmap.
> I will implement some but not all of the methods in BitArray.
> I will also implement methods booleanAt:  and booleanAt:put:.

  In that way, you cannot take advantage of existing primitives for
Forms.  These are pretty efficient.  Why don't you add a few methods
to BitArray (like add:, includes: and etc.)?  BitArray would suddenly
behave as if a set of integers.

> BitSet will be a class in a package that I will eventually release to SqueakMap
> so I thought it was worth allowing comments before I implement
> BitSet.

  BitArray is also on SqueakSource.  If you are happy with the idea of
just adding some features to it, I don't mind adding you as a commitor.

-- Yoshiki

Reply | Threaded
Open this post in threaded view
|

Re: large dense sets of 1..n

Chris Muller-3
In reply to this post by Ralph Boland
Hi Ralph,

On Sun, Aug 22, 2010 at 6:11 PM, Ralph Boland <[hidden email]> wrote:
> I need to create some dense sets which will contain
> subsets of the integers  1..n where n is very large.
> I don't want to use regular sets because they would use to much space.
> Instead I want to map each set to an array of approximately size
> n/8 bytes which, because my sets will be dense, will save space.
> Containment of a number k  in a set is to be determined by testing
> whether bit  k  is set or cleared.  Pretty standard stuff I think.
>
> Is there code already that does this?

I'm not positive what your requirements are, but a number of
special-collections related to storing Integers are available when you
load Magma.

"MaLargeArrayOfNumbers" is an auto-growing, never-shrinking array of
numbers.  All numbers have the same number of bits available to
represent them, which must be set at creation time.  It supports the
#add:, #at:, #at:put:, and #size API, and uses a file to allow an
array-of-numbers that is larger than available RAM.

"MaIntervalCollection" keeps track of a collection of intervals.  As
intervals are added that intersect or are within the
proximityThreshold of existing intervals, the existing interval is
enlarged to include for the new interval.  If an interval is removed
in the middle of a large interval such that a "hole" is created, a new
interval will be created and the adjacent one is adjusted so that they
will represent only the valid ranges.

Finally, MaHashIndex is a Integer key / Integer value Dictionary that,
like MaLargeArrayOfNumbers, can employ the file-system for sizes
larger than RAM, as well as access by index position.  However, it
also allows access by key.

All of these are part of a subordinate package ("Ma special
collections") that can be used without Magma, I just don't have
updated separate configurations for all sub-packages...

HTH,
  Chris