Hi all, assuming I have a special-kind of array that can grow without
enumerating the entire old array, is there any way to implement a Dictionary that can grow without enumerating its internal array? I'll elaborate. In our own Dictionary, when we grow there are two reasons why we must enumerate the entire old array: 1) because Array's in Squeak can't grow; for that we have OrderedCollection. So we must allocate a new Array and copy the existing elements to the new-sized array. 2) during this time, the elements are redistributed based on modulo the new array's size. I have a special kind of Array (let's call it SpecialArray) that can grow without having to re-allocate (because it's on disk and files can grow). I want to use this as the basis for a new kind of Dictionary, but I want to avoid the need to _redistribute_ the elements. One idea is to pre-allocate as large a SpecialArray as I think I'll ever need (say, 30 million) and simply don't grow. At 6-bytes per pointer, that's a 90MB file, but at least it won't ever get any bigger. "Disk space is cheap," and, in most cases, this actually is fine for my needs. Still, it prevents one from having a lot of instances of SpecialArray if they wanted, and one doesn't always know what will be "more than enough". Is there a more elegant solution that could make use of SpecialArray's ability to grow, but not redistribute? It seems like not, since not only we need to use the new "upper half" of the Array for new elements, we still need to find elements in the "lower-half" (the first half). The size of the internal array is used in the hash key calculation, so that would be a problem for finding the old elements.. - Chris PS - I thought I remember someone posting a book about hashing algorithms, but I cannot find it.. |
On Wed, 3 Nov 2010, Chris Muller wrote:
> Hi all, assuming I have a special-kind of array that can grow without > enumerating the entire old array, is there any way to implement a > Dictionary that can grow without enumerating its internal array? I'll > elaborate. I guess you are looking for something like this: http://en.wikipedia.org/wiki/Linear_hashing http://en.wikipedia.org/wiki/Consistent_hashing Levente > > In our own Dictionary, when we grow there are two reasons why we must > enumerate the entire old array: > > 1) because Array's in Squeak can't grow; for that we have > OrderedCollection. So we must allocate a new Array and copy the > existing elements to the new-sized array. > 2) during this time, the elements are redistributed based on modulo > the new array's size. > > I have a special kind of Array (let's call it SpecialArray) that can > grow without having to re-allocate (because it's on disk and files can > grow). I want to use this as the basis for a new kind of Dictionary, > but I want to avoid the need to _redistribute_ the elements. > > One idea is to pre-allocate as large a SpecialArray as I think I'll > ever need (say, 30 million) and simply don't grow. At 6-bytes per > pointer, that's a 90MB file, but at least it won't ever get any > bigger. "Disk space is cheap," and, in most cases, this actually is > fine for my needs. > > Still, it prevents one from having a lot of instances of SpecialArray > if they wanted, and one doesn't always know what will be "more than > enough". > > Is there a more elegant solution that could make use of SpecialArray's > ability to grow, but not redistribute? It seems like not, since not > only we need to use the new "upper half" of the Array for new > elements, we still need to find elements in the "lower-half" (the > first half). The size of the internal array is used in the hash key > calculation, so that would be a problem for finding the old elements.. > > - Chris > > PS - I thought I remember someone posting a book about hashing > algorithms, but I cannot find it.. > > |
Depending on the application, you may also want to use hash buckets.
On 11/3/10 11:55 , Levente Uzonyi wrote: > On Wed, 3 Nov 2010, Chris Muller wrote: > >> Hi all, assuming I have a special-kind of array that can grow without >> enumerating the entire old array, is there any way to implement a >> Dictionary that can grow without enumerating its internal array? I'll >> elaborate. > > I guess you are looking for something like this: > http://en.wikipedia.org/wiki/Linear_hashing > http://en.wikipedia.org/wiki/Consistent_hashing > > > Levente > >> >> In our own Dictionary, when we grow there are two reasons why we must >> enumerate the entire old array: >> >> 1) because Array's in Squeak can't grow; for that we have >> OrderedCollection. So we must allocate a new Array and copy the >> existing elements to the new-sized array. >> 2) during this time, the elements are redistributed based on modulo >> the new array's size. >> >> I have a special kind of Array (let's call it SpecialArray) that can >> grow without having to re-allocate (because it's on disk and files can >> grow). I want to use this as the basis for a new kind of Dictionary, >> but I want to avoid the need to _redistribute_ the elements. >> >> One idea is to pre-allocate as large a SpecialArray as I think I'll >> ever need (say, 30 million) and simply don't grow. At 6-bytes per >> pointer, that's a 90MB file, but at least it won't ever get any >> bigger. "Disk space is cheap," and, in most cases, this actually is >> fine for my needs. >> >> Still, it prevents one from having a lot of instances of SpecialArray >> if they wanted, and one doesn't always know what will be "more than >> enough". >> >> Is there a more elegant solution that could make use of SpecialArray's >> ability to grow, but not redistribute? It seems like not, since not >> only we need to use the new "upper half" of the Array for new >> elements, we still need to find elements in the "lower-half" (the >> first half). The size of the internal array is used in the hash key >> calculation, so that would be a problem for finding the old elements.. >> >> - Chris >> >> PS - I thought I remember someone posting a book about hashing >> algorithms, but I cannot find it.. >> >> > > . > |
In reply to this post by Chris Muller-3
Hi Chris,
Look at the ideas in a Split-Order-List hash table, which allows expansion without re-hashing. I wrote a sample implementation in Squeak a while back: -------- begin quote (of my own outgoing mail, a few months ago) I've created a project on squeaksouce for it, and saved one further round of refactoring/development, but then I ran out of time to take it much further. Look for the project: http://www.squeaksource.com/SOLHashTable.html The class comments contain a reference to the main paper I used to implement the algorithm. There's a reasonable amount of testing code somewhere in there too, but my not knowing the ins and outs of the Squeak test framework means the test code is organized as "just a bunch of code" :). ------------- end quote You'll want to look at the main paper (see the class comment) and then follow the the references back as far as you need to in order to understand the SOL. Or just play with the squeak implementation :). The main paper is all about using SOL for lock-free hashing, but you don't need the lock free stuff, just the non-enumeration for expansion. On 2010-Nov-3, at 11:16, Chris Muller wrote: > Hi all, assuming I have a special-kind of array that can grow without > enumerating the entire old array, is there any way to implement a > Dictionary that can grow without enumerating its internal array? I'll > elaborate. > > In our own Dictionary, when we grow there are two reasons why we must > enumerate the entire old array: > > 1) because Array's in Squeak can't grow; for that we have > OrderedCollection. So we must allocate a new Array and copy the > existing elements to the new-sized array. > 2) during this time, the elements are redistributed based on modulo > the new array's size. > > I have a special kind of Array (let's call it SpecialArray) that can > grow without having to re-allocate (because it's on disk and files can > grow). I want to use this as the basis for a new kind of Dictionary, > but I want to avoid the need to _redistribute_ the elements. > > One idea is to pre-allocate as large a SpecialArray as I think I'll > ever need (say, 30 million) and simply don't grow. At 6-bytes per > pointer, that's a 90MB file, but at least it won't ever get any > bigger. "Disk space is cheap," and, in most cases, this actually is > fine for my needs. > > Still, it prevents one from having a lot of instances of SpecialArray > if they wanted, and one doesn't always know what will be "more than > enough". > > Is there a more elegant solution that could make use of SpecialArray's > ability to grow, but not redistribute? It seems like not, since not > only we need to use the new "upper half" of the Array for new > elements, we still need to find elements in the "lower-half" (the > first half). The size of the internal array is used in the hash key > calculation, so that would be a problem for finding the old elements.. > > - Chris > > PS - I thought I remember someone posting a book about hashing > algorithms, but I cannot find it.. > -- Tom Rushworth |
Oh, great responses, thank you all.
|
Free forum by Nabble | Edit this page |