I need to create some dense sets which will contain
subsets of the integers 1..n where n is very large. I don't want to use regular sets because they would use to much space. Instead I want to map each set to an array of approximately size n/8 bytes which, because my sets will be dense, will save space. Containment of a number k in a set is to be determined by testing whether bit k is set or cleared. Pretty standard stuff I think. Is there code already that does this? If not, I will create a class LargeDenseSet containing some instance bitMap that holds the necessary space. For bitMap, is it better for me to use an array of integers or a LargePositiveInteger. How about using a BltBit? Is LargeDenseSet a sufficiently used class to be part of Squeak or at least part of some standard package? Regards, Ralph Boland |
On Sun, 22 Aug 2010, Ralph Boland wrote:
> I need to create some dense sets which will contain > subsets of the integers 1..n where n is very large. > I don't want to use regular sets because they would use to much space. > Instead I want to map each set to an array of approximately size > n/8 bytes which, because my sets will be dense, will save space. > Containment of a number k in a set is to be determined by testing > whether bit k is set or cleared. Pretty standard stuff I think. > > Is there code already that does this? Yes and no. For example Integer class >> #largePrimesUpTo:do: does this, but it's specialized for the problem, so it's not reusable. > > If not, I will create a class LargeDenseSet containing some instance > bitMap that holds the necessary space. > For bitMap, is it better for me to use an array of integers or a > LargePositiveInteger. > How about using a BltBit? I would just roll my own variableByteSubclass. > > Is LargeDenseSet a sufficiently used class to be part of Squeak or at > least part of > some standard package? I guess not. People rarely face problems which require such a special collection. SkipLists were removed, and I think we are fine without them 99.9% of the time. Levente > > Regards, > > Ralph Boland > > |
In reply to this post by Ralph Boland
You can see example in WideCharacterSet in image, or also load
BitArray from SqueakMap. See also these messages for playing with bits http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141893.html And other tricks like how to enumerate bits in a byte... My own answsers: | x | x := 16rE4. { [ | y | y := 1. 0 to: 7 do: [:iBit | (x bitAnd: y) = 0 ifTrue: [iBit + 1]. y := y *2]] bench. [| y pow2 iBit | y := x. pow2 := #[1 2 4 8 16 32 64 128]. [y == 0] whileFalse: [y := y - (pow2 at: (iBit := y highBitOfPositiveReceiver)). iBit]] bench. [| y h iBit | y := x. iBit := 8. h := 128. [y == 0] whileFalse: [[y < h] whileTrue: [h := h bitShift: -1. iBit := iBit - 1]. y := y - h. h := h bitShift: -1. iBit := iBit - 1]] bench. [| y h iBit | y := x. iBit := 8. pow2 := #[1 2 4 8 16 32 64 128]. [y == 0] whileFalse: [[y < (h := pow2 at: iBit)] whileTrue: [iBit := iBit - 1]. y := y - h. iBit]] bench. [| y pow2 hb iBit | y := x. pow2 := #[1 2 4 8 16 32 64 128]. hb := #[1 2 2 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8]. [y == 0] whileFalse: [y := y - (pow2 at: (iBit := hb at: y)). iBit]] bench. Nicolas 2010/8/23 Ralph Boland <[hidden email]>: > I need to create some dense sets which will contain > subsets of the integers 1..n where n is very large. > I don't want to use regular sets because they would use to much space. > Instead I want to map each set to an array of approximately size > n/8 bytes which, because my sets will be dense, will save space. > Containment of a number k in a set is to be determined by testing > whether bit k is set or cleared. Pretty standard stuff I think. > > Is there code already that does this? > > If not, I will create a class LargeDenseSet containing some instance > bitMap that holds the necessary space. > For bitMap, is it better for me to use an array of integers or a > LargePositiveInteger. > How about using a BltBit? > > Is LargeDenseSet a sufficiently used class to be part of Squeak or at > least part of > some standard package? > > Regards, > > Ralph Boland > > |
In reply to this post by Ralph Boland
>> I need to create some dense sets which will contain
>> subsets of the integers á1..n where n is very large. >> I don't want to use regular sets because they would use to much space. ... >> Is there code already that does this? ... >> Ralph Boland > > You can see example in WideCharacterSet in image, or also load > BitArray from SqueakMap. > See also these messages for playing with bits > http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141893.html ... I downloaded BitArray and discovered that it used Form. This seemed backwards to me so I looked at Form and discovered that Form uses Bitmap. Bitmap looked like what I was looking for so I looked at Bitmap. To use Bitmap I would need to add methods at: and at:put: based on code from class BitArray. However, the code in BitArray use instance variables wordCache cacheIndex which make accessing consecutive bits more efficient. So my plan was to create class BitSet which will be a subclass of Bitmap and have instance variables wordCache and casheIndex. Of course this didn't work because I am not allowed to add instance variables to BitSet. So my plan now is to make BitSet a subclass of ArrayedCollection with an instance variable bitmap holding objects of class Bitmap. I will implement some but not all of the methods in BitArray. I will also implement methods booleanAt: and booleanAt:put:. BitSet will be a class in a package that I will eventually release to SqueakMap so I thought it was worth allowing comments before I implement BitSet. So comments welcome. Regards, Ralph Boland |
At Mon, 23 Aug 2010 19:37:28 -0600,
Ralph Boland wrote: > > >> I need to create some dense sets which will contain > >> subsets of the integers á1..n where n is very large. > >> I don't want to use regular sets because they would use to much space. > ... > >> Is there code already that does this? > ... > >> Ralph Boland > > > > You can see example in WideCharacterSet in image, or also load > > BitArray from SqueakMap. > > > See also these messages for playing with bits > > http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141893.html > ... > > I downloaded BitArray and discovered that it used Form. > This seemed backwards to me so I looked at Form and discovered > that Form uses Bitmap. Bitmap looked like what I was looking for > so I looked at Bitmap. What is backwards? A collection of bits should represent a bitmap but a bitmap should not represent a collection of bits? > To use Bitmap I would need to add methods at: and at:put: based on code > from class BitArray. And, these accessors are pretty much provided via Form. Yes, Form is a facade to a collection of bits so to me, it is pretty good way to represent bits. > However, the code in BitArray use instance variables > wordCache cacheIndex which make accessing consecutive bits more efficient. You should measure the effects of the cache for your use case. It may not be needed. > So my plan was to create class BitSet which will be a subclass of Bitmap and > have instance variables wordCache and casheIndex. Of course this didn't work > because I am not allowed to add instance variables to BitSet. > So my plan now is to make BitSet a subclass of ArrayedCollection with > an instance variable bitmap holding objects of class Bitmap. > I will implement some but not all of the methods in BitArray. > I will also implement methods booleanAt: and booleanAt:put:. In that way, you cannot take advantage of existing primitives for Forms. These are pretty efficient. Why don't you add a few methods to BitArray (like add:, includes: and etc.)? BitArray would suddenly behave as if a set of integers. > BitSet will be a class in a package that I will eventually release to SqueakMap > so I thought it was worth allowing comments before I implement > BitSet. BitArray is also on SqueakSource. If you are happy with the idea of just adding some features to it, I don't mind adding you as a commitor. -- Yoshiki |
In reply to this post by Ralph Boland
Hi Ralph,
On Sun, Aug 22, 2010 at 6:11 PM, Ralph Boland <[hidden email]> wrote: > I need to create some dense sets which will contain > subsets of the integers 1..n where n is very large. > I don't want to use regular sets because they would use to much space. > Instead I want to map each set to an array of approximately size > n/8 bytes which, because my sets will be dense, will save space. > Containment of a number k in a set is to be determined by testing > whether bit k is set or cleared. Pretty standard stuff I think. > > Is there code already that does this? I'm not positive what your requirements are, but a number of special-collections related to storing Integers are available when you load Magma. "MaLargeArrayOfNumbers" is an auto-growing, never-shrinking array of numbers. All numbers have the same number of bits available to represent them, which must be set at creation time. It supports the #add:, #at:, #at:put:, and #size API, and uses a file to allow an array-of-numbers that is larger than available RAM. "MaIntervalCollection" keeps track of a collection of intervals. As intervals are added that intersect or are within the proximityThreshold of existing intervals, the existing interval is enlarged to include for the new interval. If an interval is removed in the middle of a large interval such that a "hole" is created, a new interval will be created and the adjacent one is adjusted so that they will represent only the valid ranges. Finally, MaHashIndex is a Integer key / Integer value Dictionary that, like MaLargeArrayOfNumbers, can employ the file-system for sizes larger than RAM, as well as access by index position. However, it also allows access by key. All of these are part of a subordinate package ("Ma special collections") that can be used without Magma, I just don't have updated separate configurations for all sub-packages... HTH, Chris |
Free forum by Nabble | Edit this page |