Smalltalk › Squeak › Squeak - Dev

Faster dictionaries?

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

26 messages Options

Igor Stasenko

Faster dictionaries?

its a scratch implementation of dictionaries using linked list of associations.
dont care about names, care about numbers:
(code to run taken from MaDictionary class comment)

magma vs mine:

time to add to sd: 161
time to add to md: 1865
time to access to sd: 124
time to access to md: 597
time to replace to sd: 114
time to replace to md: 676
time to remove 300 from sd: 2
time to remove 300 from md: 3568
--
sd - my dictionary
md - MaWeakIdentityKeyDictionary

smalltalk vs mine:
--
time to add to sd: 159
time to add to md: 63178
time to access to sd: 147
time to access to md: 49052
time to replace to sd: 132
time to replace to md: 50736
time to remove 300 from sd: 2
--
sd - my dict
md - smalltalk Dictionary. removing keys takes forever, so i stopped
it before it printed a result..

BucketDictionary.st (6K) Download Attachment

DictBucket.st (1K) Download Attachment

Damien Cassou-3

Re: Faster dictionaries?

Hi,

your work sounds interesting to me. Is your dictionary passing the
tests given in DictionaryTest? If yes, why not opening a bug report on
mantis and try to have it integrated into a future squeak release?

Bye

2007/7/19, sig <[hidden email]>:

> its a scratch implementation of dictionaries using linked list of associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)
>
> magma vs mine:
>
> time to add to sd: 161
> time to add to md: 1865
> time to access to sd: 124
> time to access to md: 597
> time to replace to sd: 114
> time to replace to md: 676
> time to remove 300 from sd: 2
> time to remove 300 from md: 3568
> --
> sd - my dictionary
> md - MaWeakIdentityKeyDictionary
>
> smalltalk vs mine:
> --
> time to add to sd: 159
> time to add to md: 63178
> time to access to sd: 147
> time to access to md: 49052
> time to replace to sd: 132
> time to replace to md: 50736
> time to remove 300 from sd: 2
> --
> sd - my dict
> md - smalltalk Dictionary. removing keys takes forever, so i stopped
> it before it printed a result..
>
>
>
>
>

--
Damien Cassou

Igor Stasenko

Re: Faster dictionaries?

as i noted its a scratch.. didn't even tested it yet. :)
i want to adopt it for magma at first place.

On 19/07/07, Damien Cassou <[hidden email]> wrote:

> Hi,
>
> your work sounds interesting to me. Is your dictionary passing the
> tests given in DictionaryTest? If yes, why not opening a bug report on
> mantis and try to have it integrated into a future squeak release?
>
> Bye
>
> 2007/7/19, sig <[hidden email]>:
> > its a scratch implementation of dictionaries using linked list of associations.
> > dont care about names, care about numbers:
> > (code to run taken from MaDictionary class comment)
> >
> > magma vs mine:
> >
> > time to add to sd: 161
> > time to add to md: 1865
> > time to access to sd: 124
> > time to access to md: 597
> > time to replace to sd: 114
> > time to replace to md: 676
> > time to remove 300 from sd: 2
> > time to remove 300 from md: 3568
> > --
> > sd - my dictionary
> > md - MaWeakIdentityKeyDictionary
> >
> > smalltalk vs mine:
> > --
> > time to add to sd: 159
> > time to add to md: 63178
> > time to access to sd: 147
> > time to access to md: 49052
> > time to replace to sd: 132
> > time to replace to md: 50736
> > time to remove 300 from sd: 2
> > --
> > sd - my dict
> > md - smalltalk Dictionary. removing keys takes forever, so i stopped
> > it before it printed a result..
> >
> >
> >
> >
> >
>
>
> --
> Damien Cassou
>
>

Colin Putney

Re: Faster dictionaries?

In reply to this post by Igor Stasenko

On Jul 19, 2007, at 5:10 AM, sig wrote:

> its a scratch implementation of dictionaries using linked list of
> associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)

Interesting... but another set of numbers that matter are memory
usage stats. Can you provide a comparison of the memory used by each
implementation after the tests are run?

Also, you might want to look at the HashTable package on
SqueakSource, which I suspect does something similar.

Colin

Igor Stasenko

Re: Faster dictionaries?

Im not really care about memory footprint.
Dictionaries using array of Association-s
Association having 2 ivars.
My Association add 3rd.
So, comparing raw memory footprint of association objects it less than 1.5.

But as it shows, that speed increased rougly by 1-2 factors (comparing
to magma dicts)
and by 3-4 factors comparing squeak current dicts.

I think this is the case when speed over size is preferable :)

On 19/07/07, Colin Putney <[hidden email]> wrote:

>
> On Jul 19, 2007, at 5:10 AM, sig wrote:
>
> > its a scratch implementation of dictionaries using linked list of
> > associations.
> > dont care about names, care about numbers:
> > (code to run taken from MaDictionary class comment)
>
> Interesting... but another set of numbers that matter are memory
> usage stats. Can you provide a comparison of the memory used by each
> implementation after the tests are run?
>
> Also, you might want to look at the HashTable package on
> SqueakSource, which I suspect does something similar.
>
> Colin
>
>

Andreas.Raab

Re: Faster dictionaries?

In reply to this post by Damien Cassou-3

Damien Cassou wrote:
> your work sounds interesting to me. Is your dictionary passing the
> tests given in DictionaryTest? If yes, why not opening a bug report on
> mantis and try to have it integrated into a future squeak release?

I'd recommend running your own benchmarks first. Claims like here:

>> smalltalk vs mine:
>> --
>> time to add to sd: 159
>> time to add to md: 63178
>> time to access to sd: 147
>> time to access to md: 49052
>> time to replace to sd: 132
>> time to replace to md: 50736
>> time to remove 300 from sd: 2
>> --
>> sd - my dict
>> md - smalltalk Dictionary. removing keys takes forever, so i stopped
>> it before it printed a result..

... where BucketDictionary is supposedly 500 times faster in every
regard (adding, accessing, replacing) sound to good to be true. I love
500x speed improvements just like the next guy but in my experience
these don't just happen.

Cheers,
- Andreas

Igor Stasenko

Re: Faster dictionaries?

Wait a bit more. I done with basic BDictionary class, now
adding Identity/Weak subclasses.
All 22 tests which i found for Dictionary in my 3.10 image passed.

On 19/07/07, Andreas Raab <[hidden email]> wrote:

> Damien Cassou wrote:
> > your work sounds interesting to me. Is your dictionary passing the
> > tests given in DictionaryTest? If yes, why not opening a bug report on
> > mantis and try to have it integrated into a future squeak release?
>
> I'd recommend running your own benchmarks first. Claims like here:
>
> >> smalltalk vs mine:
> >> --
> >> time to add to sd: 159
> >> time to add to md: 63178
> >> time to access to sd: 147
> >> time to access to md: 49052
> >> time to replace to sd: 132
> >> time to replace to md: 50736
> >> time to remove 300 from sd: 2
> >> --
> >> sd - my dict
> >> md - smalltalk Dictionary. removing keys takes forever, so i stopped
> >> it before it printed a result..
>
> ... where BucketDictionary is supposedly 500 times faster in every
> regard (adding, accessing, replacing) sound to good to be true. I love
> 500x speed improvements just like the next guy but in my experience
> these don't just happen.
>
> Cheers,
> - Andreas
>
>

Andres Valloud-3

Re: Faster dictionaries?

In reply to this post by Igor Stasenko

sig wrote:
> its a scratch implementation of dictionaries using linked list of
> associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)
What values were stored in the dictionaries when you run the benchmarks?

Thanks,
Andres.

Igor Stasenko

Re: Faster dictionaries?

On 19/07/07, Andres Valloud <[hidden email]> wrote:

> sig wrote:
> > its a scratch implementation of dictionaries using linked list of
> > associations.
> > dont care about names, care about numbers:
> > (code to run taken from MaDictionary class comment)
> What values were stored in the dictionaries when you run the benchmarks?
>
> Thanks,
> Andres.
>
>

look at MaDictionary class comment.

both dictionaries used in comparison tests populated simply using just:

obj := Object new.
dict at: obj put: obj.

Andres Valloud-3

Re: Faster dictionaries?

sig wrote:
> look at MaDictionary class comment.
>
> both dictionaries used in comparison tests populated simply using just:
>
> obj := Object new.
> dict at: obj put: obj.
I don't have a Squeak image handy right now, but I'd suggest checking
these cases out...

Time it takes to add all strings in the image to a dictionary.

Time it takes to add all symbols in the image to a dictionary.

Time it takes to add 100k consecutive integers as keys of a dictionary.

Thanks,
Andres.

Ian Piumarta-2

Re: Faster dictionaries?

In reply to this post by Igor Stasenko

On Jul 19, 2007, at 5:10 AM, sig wrote:

> its a scratch implementation of dictionaries using linked list of
> associations.

I think the two 'at:ifAbsent:' methods neglect to send #value to the
errorBlock?

Otherwise the lookup performance ain't too shabby for relatively
small tallies. (For anyone who can decode this sentence: BucketDict
is almost as fast as Pepsi's SlotDictionary when used as the basis
for lexical environments in the Jolt compiler. This is my usual
'quasi-realistic' benchmark for a Dictionary.)

One classic optimisation would be to use an ArrayedCollection of some
kind (instead of a linked list) for the buckets. When #findKeyOrNil:
hits and the association is not the first element in the bucket then
exchange it with the element immediately preceding it. Associations
related to popular keys will gradually migrate to the front of their
buckets.

Cheers,
Ian

Nicolas Cellier-3

Re: Faster dictionaries?

In reply to this post by Andres Valloud-3

Andres Valloud a écrit :

> sig wrote:
>> look at MaDictionary class comment.
>>
>> both dictionaries used in comparison tests populated simply using just:
>>
>> obj := Object new.
>> dict at: obj put: obj.
> I don't have a Squeak image handy right now, but I'd suggest checking
> these cases out...
>
> Time it takes to add all strings in the image to a dictionary.
>
> Time it takes to add all symbols in the image to a dictionary.
>
> Time it takes to add 100k consecutive integers as keys of a dictionary.
>
> Thanks,
> Andres.
>

sig's BucketDictionary is optimized in the case of Poorly hashed keys.

Strings, Symbols and integers have rather good hash in Squeak, and in
this case sig's BucketDictionary is slower than regular Dictionary.

Ian Piumarta a écrit :
>
> One classic optimisation would be to use an ArrayedCollection of some
> kind (instead of a linked list) for the buckets. When #findKeyOrNil:
> hits and the association is not the first element in the bucket then
> exchange it with the element immediately preceding it. Associations
> related to popular keys will gradually migrate to the front of their
> buckets.
>
> Cheers,
> Ian
>

Ian, very bright to swap popular toward head.
But the Bucket efficiency in adding/removing comes from linked list.
How a copy of a bucket array would do better?
Maybe you mean avoiding writing something ugly like:

BucketDictionary>>findKeyOrNil: key
| index bucket prevprev prev next |
index := (key hash \\ array size) + 1.
bucket := array at: index.
prev := prevprev := nil.
next := bucket.
[next key = key ifTrue: [
prev ifNotNil:
[prev next: next next.
next next: prev.
prevprev
ifNil [array at: index put: next]
ifNotNil: [prevprev next: next]].
^next].
prevprev := prev.
prev := next.
next := next next.
next isNil] whileFalse.
^nil

Argh! better start learning lisp next time...
I didn't test it. If it works, someone pay me a beer!

Nicolas

stephane ducasse

Re: Faster dictionaries?

In reply to this post by Igor Stasenko

did you check the hashtable implementation on squeaksource.
t is better than Squeak dictionary since it does lineary degrades.
in your tests did you try to have large dictionaries? Because this
has also an
impact on the performance.

Stef

On 19 juil. 07, at 14:10, sig wrote:

> its a scratch implementation of dictionaries using linked list of
> associations.
> dont care about names, care about numbers:
> (code to run taken from MaDictionary class comment)
>
> magma vs mine:
>
> time to add to sd: 161
> time to add to md: 1865
> time to access to sd: 124
> time to access to md: 597
> time to replace to sd: 114
> time to replace to md: 676
> time to remove 300 from sd: 2
> time to remove 300 from md: 3568
> --
> sd - my dictionary
> md - MaWeakIdentityKeyDictionary
>
> smalltalk vs mine:
> --
> time to add to sd: 159
> time to add to md: 63178
> time to access to sd: 147
> time to access to md: 49052
> time to replace to sd: 132
> time to replace to md: 50736
> time to remove 300 from sd: 2
> --
> sd - my dict
> md - smalltalk Dictionary. removing keys takes forever, so i stopped
> it before it printed a result..
> <BucketDictionary.st>
> <DictBucket.st>
>

Igor Stasenko

Re: Faster dictionaries?

In reply to this post by Ian Piumarta-2

On 19/07/07, Ian Piumarta <[hidden email]> wrote:
> On Jul 19, 2007, at 5:10 AM, sig wrote:
>
> > its a scratch implementation of dictionaries using linked list of
> > associations.
>
> I think the two 'at:ifAbsent:' methods neglect to send #value to the
> errorBlock?
>
no they dont :)

its just a shortcut. instead of putting:
^ bucket ifNil: [ aBlock value ]

i can put:

^ bucket ifNil: aBlock

since IfNil expects block as argument, #value is sent to aBlock
anyways. No reason to encapsulate it into another block.

> Otherwise the lookup performance ain't too shabby for relatively
> small tallies. (For anyone who can decode this sentence: BucketDict
> is almost as fast as Pepsi's SlotDictionary when used as the basis
> for lexical environments in the Jolt compiler. This is my usual
> 'quasi-realistic' benchmark for a Dictionary.)
>
> One classic optimisation would be to use an ArrayedCollection of some
> kind (instead of a linked list) for the buckets. When #findKeyOrNil:
> hits and the association is not the first element in the bucket then
> exchange it with the element immediately preceding it. Associations
> related to popular keys will gradually migrate to the front of their
> buckets.

i tempted to implement something like:
#findKeyOrNil: key withPrevious: aBlock

so, when it finds association with corresponding key, it sends #value:
to aBlock with previous association in linked list just before
returning a result.
Then it can be easily reordered. But i'm not sure if this needed at all.

Dicts are very sensitive to rehash frequencies.

in given sample:

| sd md r |
Transcript clear.
r := OrderedCollection new.
ByteString allInstancesDo: [:each | r add: each ].
sd := BDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
[ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := Dictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
[ r do: [:each | sd at: each put: each ] ] timeToRun printString.

i got following results:
Making 'tally >= (sz *4)' in #checkForOverflow, i got:

time to add all strings to bd: 3184
time to add all strings to sd: 3010

Making 'tally >= (sz )' in #checkForOverflow, i got:

time to add all strings to bd: 6286
time to add all strings to sd: 2991

Making tally >= sz * 2

time to add all strings to bd: 3162
time to add all strings to sd: 2985

And please , note i made these dicts mainly for replacing Magma ones,
which working with huge sets and poorly hashed keys taken
IdentityDictionaries:

time to add all strings to bd: 4146
time to add all strings to sd: 7151

PS. In attachment new version.. I will put it into mantis too.

Collections-Unordered-FastDicts.st (38K) Download Attachment

Ian Piumarta-2

Re: Faster dictionaries?

In reply to this post by Nicolas Cellier-3

On Jul 19, 2007, at 1:14 PM, nicolas cellier wrote:

> > exchange it with the element immediately preceding it. Associations
> > related to popular keys will gradually migrate to the front of their
> > buckets.
> But the Bucket efficiency in adding/removing comes from linked list.
> How a copy of a bucket array would do better?

You're right, of course. An ArrayedCollection only makes sense if
the ratio of removal to other operations is small.

A doubly-linked list perhaps? ;-)

Cheers,
Ian

Igor Stasenko

Re: Faster dictionaries?

In reply to this post by Andres Valloud-3

On 19/07/07, Andres Valloud <[hidden email]> wrote:

> sig wrote:
> > look at MaDictionary class comment.
> >
> > both dictionaries used in comparison tests populated simply using just:
> >
> > obj := Object new.
> > dict at: obj put: obj.
> I don't have a Squeak image handy right now, but I'd suggest checking
> these cases out...
>
> Time it takes to add all strings in the image to a dictionary.
>
> Time it takes to add all symbols in the image to a dictionary.

for symbols:

time to add all strings to bd: 467
time to add all strings to sd: 998

>
> Time it takes to add 100k consecutive integers as keys of a dictionary.
>
| sd md r |
Transcript clear.
r := OrderedCollection new.
1 to: 100000 do: [:each | r add: each ].
sd := BDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
[ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := Dictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
[ r do: [:each | sd at: each put: each ] ] timeToRun printString.

time to add all strings to bd: 719
time to add all strings to sd: 870

| sd md r |
Transcript clear.
r := OrderedCollection new.
1 to: 100000 do: [:each | r add: each ].
sd := IdentityBDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to bd: ',
[ r do: [:each | sd at: each put: each ] ] timeToRun printString.
sd := IdentityDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add all strings to sd: ',
[ r do: [:each | sd at: each put: each ] ] timeToRun printString.

time to add all strings to bd: 719
time to add all strings to sd: 1580

> Thanks,
> Andres.
>
>
>

Nicolas Cellier-3

Re: Faster dictionaries?

In reply to this post by stephane ducasse

stephane ducasse a écrit :
> did you check the hashtable implementation on squeaksource.
> t is better than Squeak dictionary since it does lineary degrades.
> in your tests did you try to have large dictionaries? Because this has
> also an
> impact on the performance.
>
> Stef
>

sig's work is essentially same as HashTable...

Except some details:
HashTable has lot of message indirections to ease subclassing (Weak,
Identity...)
HashTable stores hash code instead of re-computing (memory/speed compromise)
HashTable optimize link list enumeration with [...
(item := item next) isNil] whileFalse loop rather than using recursive
message send

Nicolas

Igor Stasenko

Re: Faster dictionaries?

On 20/07/07, nicolas cellier <[hidden email]> wrote:

> stephane ducasse a écrit :
> > did you check the hashtable implementation on squeaksource.
> > t is better than Squeak dictionary since it does lineary degrades.
> > in your tests did you try to have large dictionaries? Because this has
> > also an
> > impact on the performance.
> >
> > Stef
> >
>
> sig's work is essentially same as HashTable...
>
> Except some details:
> HashTable has lot of message indirections to ease subclassing (Weak,
> Identity...)
> HashTable stores hash code instead of re-computing (memory/speed compromise)
> HashTable optimize link list enumeration with [...
> (item := item next) isNil] whileFalse loop rather than using recursive
> message send
>

Interesting, why its not included to kernel classes?

> Nicolas
>
>
>

stephane ducasse

Re: Faster dictionaries?

>> sig's work is essentially same as HashTable...
>>
>> Except some details:
>> HashTable has lot of message indirections to ease subclassing (Weak,
>> Identity...)
>> HashTable stores hash code instead of re-computing (memory/speed
>> compromise)
>> HashTable optimize link list enumeration with [...
>> (item := item next) isNil] whileFalse loop rather than using
>> recursive
>> message send
>>
>
> Interesting, why its not included to kernel classes?

it would be good to have a nice package with the different
implementation.
I think that speeding up Dictionary would also have an impact on the
overall
performance of squeak (to be verified). Now we never got the energy
to push that.
But if someone want to have a look this would be great.

Stref

Colin Putney

Re: Faster dictionaries?

On Jul 20, 2007, at 2:10 AM, stephane ducasse wrote:

>>> sig's work is essentially same as HashTable...
>>
>> Interesting, why its not included to kernel classes?
>
> it would be good to have a nice package with the different
> implementation.
> I think that speeding up Dictionary would also have an impact on
> the overall
> performance of squeak (to be verified). Now we never got the energy
> to push that.
> But if someone want to have a look this would be great.

That someone should look at memory usage as well as speed. HashTable
uses significantly more memory than a standard Dictionary. Sometimes
it's a good idea to trade space for speed, but not always.

Colin