Faster dictionaries?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Faster dictionaries?

Igor Stasenko
its a scratch implementation of dictionaries using linked list of associations.
dont care about names, care about numbers:
(code to run taken from MaDictionary class comment)

magma vs mine:

time to add to sd: 161
time to add to md: 1865
time to access to sd: 124
time to access to md: 597
time to replace to sd: 114
time to replace to md: 676
time to remove 300 from sd: 2
time to remove 300 from md: 3568
--
sd - my dictionary
md - MaWeakIdentityKeyDictionary

smalltalk vs mine:
--
time to add to sd: 159
time to add to md: 63178
time to access to sd: 147
time to access to md: 49052
time to replace to sd: 132
time to replace to md: 50736
time to remove 300 from sd: 2
--
sd - my dict
md - smalltalk Dictionary. removing keys takes forever, so i stopped
it before it printed a result..



BucketDictionary.st (6K) Download Attachment
DictBucket.st (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Damien Cassou-3
Hi,

your work sounds interesting to me. Is your dictionary passing the
tests given in DictionaryTest? If yes, why not opening a bug report on
mantis and try to have it integrated into a future squeak release?

Bye

2007/7/19, sig <[hidden email]>:

> its a scratch implementation of dictionaries using linked list of associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)
>
> magma vs mine:
>
> time to add to sd: 161
> time to add to md: 1865
> time to access to sd: 124
> time to access to md: 597
> time to replace to sd: 114
> time to replace to md: 676
> time to remove 300 from sd: 2
> time to remove 300 from md: 3568
> --
> sd - my dictionary
> md - MaWeakIdentityKeyDictionary
>
> smalltalk vs mine:
> --
> time to add to sd: 159
> time to add to md: 63178
> time to access to sd: 147
> time to access to md: 49052
> time to replace to sd: 132
> time to replace to md: 50736
> time to remove 300 from sd: 2
> --
> sd - my dict
> md - smalltalk Dictionary. removing keys takes forever, so i stopped
> it before it printed a result..
>
>
>
>
>


--
Damien Cassou

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
as i noted its a scratch.. didn't even tested it yet. :)
i want to adopt it for magma at first place.

On 19/07/07, Damien Cassou <[hidden email]> wrote:

> Hi,
>
> your work sounds interesting to me. Is your dictionary passing the
> tests given in DictionaryTest? If yes, why not opening a bug report on
> mantis and try to have it integrated into a future squeak release?
>
> Bye
>
> 2007/7/19, sig <[hidden email]>:
> > its a scratch implementation of dictionaries using linked list of associations.
> > dont care about names, care about numbers:
> > (code to run taken from MaDictionary class comment)
> >
> > magma vs mine:
> >
> > time to add to sd: 161
> > time to add to md: 1865
> > time to access to sd: 124
> > time to access to md: 597
> > time to replace to sd: 114
> > time to replace to md: 676
> > time to remove 300 from sd: 2
> > time to remove 300 from md: 3568
> > --
> > sd - my dictionary
> > md - MaWeakIdentityKeyDictionary
> >
> > smalltalk vs mine:
> > --
> > time to add to sd: 159
> > time to add to md: 63178
> > time to access to sd: 147
> > time to access to md: 49052
> > time to replace to sd: 132
> > time to replace to md: 50736
> > time to remove 300 from sd: 2
> > --
> > sd - my dict
> > md - smalltalk Dictionary. removing keys takes forever, so i stopped
> > it before it printed a result..
> >
> >
> >
> >
> >
>
>
> --
> Damien Cassou
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Colin Putney
In reply to this post by Igor Stasenko

On Jul 19, 2007, at 5:10 AM, sig wrote:

> its a scratch implementation of dictionaries using linked list of  
> associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)

Interesting... but another set of numbers that matter are memory  
usage stats. Can you provide a comparison of the memory used by each  
implementation after the tests are run?

Also, you might want to look at the HashTable package on  
SqueakSource, which I suspect does something similar.

Colin

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
Im not really care about memory footprint.
Dictionaries using array of Association-s
Association having 2 ivars.
My Association add 3rd.
So, comparing raw memory footprint of association objects it less than 1.5.

But as it shows, that speed increased rougly by 1-2 factors (comparing
to magma dicts)
and by 3-4 factors comparing squeak current dicts.

I think this is the case when speed over size is preferable :)

On 19/07/07, Colin Putney <[hidden email]> wrote:

>
> On Jul 19, 2007, at 5:10 AM, sig wrote:
>
> > its a scratch implementation of dictionaries using linked list of
> > associations.
> > dont care about names, care about numbers:
> > (code to run taken from MaDictionary class comment)
>
> Interesting... but another set of numbers that matter are memory
> usage stats. Can you provide a comparison of the memory used by each
> implementation after the tests are run?
>
> Also, you might want to look at the HashTable package on
> SqueakSource, which I suspect does something similar.
>
> Colin
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Andreas.Raab
In reply to this post by Damien Cassou-3
Damien Cassou wrote:
> your work sounds interesting to me. Is your dictionary passing the
> tests given in DictionaryTest? If yes, why not opening a bug report on
> mantis and try to have it integrated into a future squeak release?

I'd recommend running your own benchmarks first. Claims like here:

>> smalltalk vs mine:
>> --
>> time to add to sd: 159
>> time to add to md: 63178
>> time to access to sd: 147
>> time to access to md: 49052
>> time to replace to sd: 132
>> time to replace to md: 50736
>> time to remove 300 from sd: 2
>> --
>> sd - my dict
>> md - smalltalk Dictionary. removing keys takes forever, so i stopped
>> it before it printed a result..

... where BucketDictionary is supposedly 500 times faster in every
regard (adding, accessing, replacing) sound to good to be true. I love
500x speed improvements just like the next guy but in my experience
these don't just happen.

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
Wait a bit more. I done with basic BDictionary class, now
adding Identity/Weak subclasses.
All 22 tests which i found for Dictionary in my 3.10 image passed.

On 19/07/07, Andreas Raab <[hidden email]> wrote:

> Damien Cassou wrote:
> > your work sounds interesting to me. Is your dictionary passing the
> > tests given in DictionaryTest? If yes, why not opening a bug report on
> > mantis and try to have it integrated into a future squeak release?
>
> I'd recommend running your own benchmarks first. Claims like here:
>
> >> smalltalk vs mine:
> >> --
> >> time to add to sd: 159
> >> time to add to md: 63178
> >> time to access to sd: 147
> >> time to access to md: 49052
> >> time to replace to sd: 132
> >> time to replace to md: 50736
> >> time to remove 300 from sd: 2
> >> --
> >> sd - my dict
> >> md - smalltalk Dictionary. removing keys takes forever, so i stopped
> >> it before it printed a result..
>
> ... where BucketDictionary is supposedly 500 times faster in every
> regard (adding, accessing, replacing) sound to good to be true. I love
> 500x speed improvements just like the next guy but in my experience
> these don't just happen.
>
> Cheers,
>    - Andreas
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Andres Valloud-3
In reply to this post by Igor Stasenko
sig wrote:
> its a scratch implementation of dictionaries using linked list of
> associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)
What values were stored in the dictionaries when you run the benchmarks?

Thanks,
Andres.

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
On 19/07/07, Andres Valloud <[hidden email]> wrote:

> sig wrote:
> > its a scratch implementation of dictionaries using linked list of
> > associations.
> > dont care about names, care about numbers:
> > (code to run taken from MaDictionary class comment)
> What values were stored in the dictionaries when you run the benchmarks?
>
> Thanks,
> Andres.
>
>
look at MaDictionary class comment.

both dictionaries used in comparison tests populated simply using just:

obj := Object new.
dict at: obj put: obj.

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Andres Valloud-3
sig wrote:
> look at MaDictionary class comment.
>
> both dictionaries used in comparison tests populated simply using just:
>
> obj := Object new.
> dict at: obj put: obj.
I don't have a Squeak image handy right now, but I'd suggest checking
these cases out...

Time it takes to add all strings in the image to a dictionary.

Time it takes to add all symbols in the image to a dictionary.

Time it takes to add 100k consecutive integers as keys of a dictionary.

Thanks,
Andres.


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Ian Piumarta-2
In reply to this post by Igor Stasenko
On Jul 19, 2007, at 5:10 AM, sig wrote:

> its a scratch implementation of dictionaries using linked list of  
> associations.

I think the two 'at:ifAbsent:' methods neglect to send #value to the  
errorBlock?

Otherwise the lookup performance ain't too shabby for relatively  
small tallies.  (For anyone who can decode this sentence: BucketDict  
is almost as fast as Pepsi's SlotDictionary when used as the basis  
for lexical environments in the Jolt compiler.  This is my usual  
'quasi-realistic' benchmark for a Dictionary.)

One classic optimisation would be to use an ArrayedCollection of some  
kind (instead of a linked list) for the buckets.  When #findKeyOrNil:  
hits and the association is not the first element in the bucket then  
exchange it with the element immediately preceding it.  Associations  
related to popular keys will gradually migrate to the front of their  
buckets.

Cheers,
Ian


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Nicolas Cellier-3
In reply to this post by Andres Valloud-3
Andres Valloud a écrit :

> sig wrote:
>> look at MaDictionary class comment.
>>
>> both dictionaries used in comparison tests populated simply using just:
>>
>> obj := Object new.
>> dict at: obj put: obj.
> I don't have a Squeak image handy right now, but I'd suggest checking
> these cases out...
>
> Time it takes to add all strings in the image to a dictionary.
>
> Time it takes to add all symbols in the image to a dictionary.
>
> Time it takes to add 100k consecutive integers as keys of a dictionary.
>
> Thanks,
> Andres.
>

sig's BucketDictionary is optimized in the case of Poorly hashed keys.

Strings, Symbols and integers have rather good hash in Squeak, and in
this case sig's BucketDictionary is slower than regular Dictionary.

Ian Piumarta a écrit :
 >
 > One classic optimisation would be to use an ArrayedCollection of some
 > kind (instead of a linked list) for the buckets.  When #findKeyOrNil:
 > hits and the association is not the first element in the bucket then
 > exchange it with the element immediately preceding it.  Associations
 > related to popular keys will gradually migrate to the front of their
 > buckets.
 >
 > Cheers,
 > Ian
 >

Ian, very bright to swap popular toward head.
But the Bucket efficiency in adding/removing comes from linked list.
How a copy of a bucket array would do better?
Maybe you mean avoiding writing something ugly like:

BucketDictionary>>findKeyOrNil: key
| index bucket prevprev prev next |
index := (key hash \\ array size) + 1.
bucket := array at: index.
prev := prevprev := nil.
next := bucket.
[next key = key ifTrue: [
        prev ifNotNil:
                [prev next: next next.
                next next: prev.
                prevprev
                        ifNil [array at: index put: next]
                        ifNotNil: [prevprev next: next]].
        ^next].
prevprev := prev.
prev := next.
next := next next.
next isNil] whileFalse.
^nil

Argh! better start learning lisp next time...
I didn't test it. If it works, someone pay me a beer!

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

stephane ducasse
In reply to this post by Igor Stasenko
did you check the hashtable implementation on squeaksource.
t is better than Squeak dictionary since it does lineary degrades.
in your tests did you try to have large dictionaries? Because this  
has also an
impact on the performance.

Stef

On 19 juil. 07, at 14:10, sig wrote:

> its a scratch implementation of dictionaries using linked list of  
> associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)
>
> magma vs mine:
>
> time to add to sd: 161
> time to add to md: 1865
> time to access to sd: 124
> time to access to md: 597
> time to replace to sd: 114
> time to replace to md: 676
> time to remove 300 from sd: 2
> time to remove 300 from md: 3568
> --
> sd - my dictionary
> md - MaWeakIdentityKeyDictionary
>
> smalltalk vs mine:
> --
> time to add to sd: 159
> time to add to md: 63178
> time to access to sd: 147
> time to access to md: 49052
> time to replace to sd: 132
> time to replace to md: 50736
> time to remove 300 from sd: 2
> --
> sd - my dict
> md - smalltalk Dictionary. removing keys takes forever, so i stopped
> it before it printed a result..
> <BucketDictionary.st>
> <DictBucket.st>
>


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
In reply to this post by Ian Piumarta-2
On 19/07/07, Ian Piumarta <[hidden email]> wrote:
> On Jul 19, 2007, at 5:10 AM, sig wrote:
>
> > its a scratch implementation of dictionaries using linked list of
> > associations.
>
> I think the two 'at:ifAbsent:' methods neglect to send #value to the
> errorBlock?
>
no they dont :)

its just a shortcut. instead of putting:
^ bucket ifNil: [ aBlock value ]

i can put:

^ bucket ifNil: aBlock

since IfNil expects block as argument, #value is sent to aBlock
anyways. No reason to encapsulate it into another block.


> Otherwise the lookup performance ain't too shabby for relatively
> small tallies.  (For anyone who can decode this sentence: BucketDict
> is almost as fast as Pepsi's SlotDictionary when used as the basis
> for lexical environments in the Jolt compiler.  This is my usual
> 'quasi-realistic' benchmark for a Dictionary.)
>
> One classic optimisation would be to use an ArrayedCollection of some
> kind (instead of a linked list) for the buckets.  When #findKeyOrNil:
> hits and the association is not the first element in the bucket then
> exchange it with the element immediately preceding it.  Associations
> related to popular keys will gradually migrate to the front of their
> buckets.
i tempted to implement something like:
#findKeyOrNil: key withPrevious: aBlock

so, when it finds association with corresponding key, it sends #value:
to aBlock with previous association in linked list just before
returning a result.
Then it can be easily reordered. But i'm not sure if this needed at all.

Dicts are very sensitive to rehash frequencies.

in given sample:

| sd md r |
Transcript clear.
r := OrderedCollection new.
ByteString allInstancesDo: [:each | r add: each ].
sd := BDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := Dictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.

i got following results:
Making  'tally >= (sz *4)' in #checkForOverflow, i got:

time to add all strings to bd: 3184
time to add all strings to sd: 3010

Making  'tally >= (sz )' in #checkForOverflow, i got:

time to add all strings to bd: 6286
time to add all strings to sd: 2991

Making tally >= sz * 2

time to add all strings to bd: 3162
time to add all strings to sd: 2985

And please , note i made these dicts mainly for replacing Magma ones,
which working with huge sets and poorly hashed keys  taken
IdentityDictionaries:

time to add all strings to bd: 4146
time to add all strings to sd: 7151


PS. In attachment new version.. I will put it into mantis too.



Collections-Unordered-FastDicts.st (38K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Ian Piumarta-2
In reply to this post by Nicolas Cellier-3
On Jul 19, 2007, at 1:14 PM, nicolas cellier wrote:

> > exchange it with the element immediately preceding it.  Associations
> > related to popular keys will gradually migrate to the front of their
> > buckets.
> But the Bucket efficiency in adding/removing comes from linked list.
> How a copy of a bucket array would do better?

You're right, of course.  An ArrayedCollection only makes sense if  
the ratio of removal to other operations is small.

A doubly-linked list perhaps?  ;-)

Cheers,
Ian


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
In reply to this post by Andres Valloud-3
On 19/07/07, Andres Valloud <[hidden email]> wrote:

> sig wrote:
> > look at MaDictionary class comment.
> >
> > both dictionaries used in comparison tests populated simply using just:
> >
> > obj := Object new.
> > dict at: obj put: obj.
> I don't have a Squeak image handy right now, but I'd suggest checking
> these cases out...
>
> Time it takes to add all strings in the image to a dictionary.
>
> Time it takes to add all symbols in the image to a dictionary.

for symbols:

time to add all strings to bd: 467
time to add all strings to sd: 998

>
> Time it takes to add 100k consecutive integers as keys of a dictionary.
>
| sd md r |
Transcript clear.
r := OrderedCollection new.
1 to: 100000 do: [:each | r add: each ].
sd := BDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := Dictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.


time to add all strings to bd: 719
time to add all strings to sd: 870


| sd md r |
Transcript clear.
r := OrderedCollection new.
1 to: 100000 do: [:each | r add: each ].
sd := IdentityBDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := IdentityDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.

time to add all strings to bd: 719
time to add all strings to sd: 1580


> Thanks,
> Andres.
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Nicolas Cellier-3
In reply to this post by stephane ducasse
stephane ducasse a écrit :
> did you check the hashtable implementation on squeaksource.
> t is better than Squeak dictionary since it does lineary degrades.
> in your tests did you try to have large dictionaries? Because this has
> also an
> impact on the performance.
>
> Stef
>

sig's work is essentially same as HashTable...

Except some details:
HashTable has lot of message indirections to ease subclassing (Weak,
Identity...)
HashTable stores hash code instead of re-computing (memory/speed compromise)
HashTable optimize link list enumeration with [...
(item := item next) isNil] whileFalse loop rather than using recursive
message send

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
On 20/07/07, nicolas cellier <[hidden email]> wrote:

> stephane ducasse a écrit :
> > did you check the hashtable implementation on squeaksource.
> > t is better than Squeak dictionary since it does lineary degrades.
> > in your tests did you try to have large dictionaries? Because this has
> > also an
> > impact on the performance.
> >
> > Stef
> >
>
> sig's work is essentially same as HashTable...
>
> Except some details:
> HashTable has lot of message indirections to ease subclassing (Weak,
> Identity...)
> HashTable stores hash code instead of re-computing (memory/speed compromise)
> HashTable optimize link list enumeration with [...
> (item := item next) isNil] whileFalse loop rather than using recursive
> message send
>

Interesting, why its not included to kernel classes?

> Nicolas
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

stephane ducasse
>> sig's work is essentially same as HashTable...
>>
>> Except some details:
>> HashTable has lot of message indirections to ease subclassing (Weak,
>> Identity...)
>> HashTable stores hash code instead of re-computing (memory/speed  
>> compromise)
>> HashTable optimize link list enumeration with [...
>> (item := item next) isNil] whileFalse loop rather than using  
>> recursive
>> message send
>>
>
> Interesting, why its not included to kernel classes?

it would be good to have a nice package with the different  
implementation.
I think that speeding up Dictionary would also have an impact on the  
overall
performance of squeak (to be verified). Now we never got the energy  
to push that.
But if someone want to have a look this would be great.

Stref



Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Colin Putney

On Jul 20, 2007, at 2:10 AM, stephane ducasse wrote:

>>> sig's work is essentially same as HashTable...
>>
>> Interesting, why its not included to kernel classes?
>
> it would be good to have a nice package with the different  
> implementation.
> I think that speeding up Dictionary would also have an impact on  
> the overall
> performance of squeak (to be verified). Now we never got the energy  
> to push that.
> But if someone want to have a look this would be great.

That someone should look at memory usage as well as speed. HashTable  
uses significantly more memory than a standard Dictionary. Sometimes  
it's a good idea to trade space for speed, but not always.

Colin

12