Faster dictionaries?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Andres Valloud-3
stephane ducasse wrote:
> it would be good to have a nice package with the different
> implementation.
> I think that speeding up Dictionary would also have an impact on the
> overall
> performance of squeak (to be verified). Now we never got the energy to
> push that.
> But if someone want to have a look this would be great.
Hey, talking about that... IIRC, the slang for hashMultiply was the
direct translation of the Smalltalk source code into C.  However, the
Smalltalk code is just the small integer friendly version of * 1664525
bitAnd: 16rFFFFFFF (1664525 in decimal, or 16r65 and 16r260D in 14-bit
chunks).

So really, the primitive could just do * 1664525 while in the loop,
letting the unsigned long variable overflow into the void since the
result will be bitAnd'ed with a 28 bit mask anyway, and just do a last
bitAnd: at the end.

My 2 cents,
Andres.

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
In reply to this post by stephane ducasse
On 20/07/07, stephane ducasse <[hidden email]> wrote:

> >> sig's work is essentially same as HashTable...
> >>
> >> Except some details:
> >> HashTable has lot of message indirections to ease subclassing (Weak,
> >> Identity...)
> >> HashTable stores hash code instead of re-computing (memory/speed
> >> compromise)
> >> HashTable optimize link list enumeration with [...
> >> (item := item next) isNil] whileFalse loop rather than using
> >> recursive
> >> message send
> >>
> >
> > Interesting, why its not included to kernel classes?
>
> it would be good to have a nice package with the different
> implementation.

you can find it here:
http://bugs.squeak.org/view.php?id=6569

Want me to put it in squeaksource?

> I think that speeding up Dictionary would also have an impact on the
> overall
> performance of squeak (to be verified). Now we never got the energy
> to push that.
> But if someone want to have a look this would be great.
>

SpaceTally new spaceForInstancesOf: Association withInstanceCount: 1
returns 12

SpaceTally new spaceForInstancesOf: BAssociation withInstanceCount: 1
returns 20

Now, for dictionary with 100 items:
Dictionary:

412 bytes for array ivar. + 100 * 12 for associations
+ 16 bytes for Dictionary instance.
= 1628 bytes

BDictionary:

412 bytes for array ivar. + 100 * 20 for associations
+ 16 bytes for Dictionary instance.
= 2428 bytes

2428 / 1628  ~ 1.5

And regarding speed, there are still some things which can be
fine-tuned to get more performance.

> Stref
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Adrian Lienhard
In reply to this post by Igor Stasenko
On Jul 19, 2007, at 22:53 , sig wrote:

>>
>> Time it takes to add 100k consecutive integers as keys of a  
>> dictionary.
>>
> | sd md r |
> Transcript clear.
> r := OrderedCollection new.
> 1 to: 100000 do: [:each | r add: each ].
> sd := BDictionary new.
> Smalltalk garbageCollect.
> Transcript cr; show: 'time to add all strings to bd: ',
> [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
> sd := Dictionary new.
> Smalltalk garbageCollect.
> Transcript cr; show: 'time to add all strings to sd: ',
> [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
>
>
> time to add all strings to bd: 719
> time to add all strings to sd: 870

However, accessing the integer keys is faster with the Smalltalk  
Dictionary!

I got the folllowing numbers:

time to add to bd: 298
time to add to sd: 347

time to access bd: 207
time to access sd: 110

| sd r |
r := OrderedCollection new.
1 to: 100000 do: [:each | r add: each ].
bd := BDictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add to bd: ',
        [ r do: [:each | bd at: each put: each ] ] timeToRun printString.
sd := Dictionary new.
Smalltalk garbageCollect.
Transcript cr; show: 'time to add to sd: ',
        [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
r := r shuffled.
Smalltalk garbageCollect.
Transcript cr; show: 'time to access bd: ',
        [ r do: [:each | bd at: each ] ] timeToRun printString.
Smalltalk garbageCollect.
Transcript cr; show: 'time to access sd: ',
        [ r do: [:each | sd at: each ] ] timeToRun printString.


On Jul 20, 2007, at 12:35 , sig wrote:
>>
> SpaceTally new spaceForInstancesOf: Association withInstanceCount: 1
> returns 12
>
> SpaceTally new spaceForInstancesOf: BAssociation withInstanceCount: 1
> returns 20

ok, the additional memory consumption seems reasonable as there is  
only one additional instance variable in BAssociation (note,  
Association is a compact class and hence its header is only 4 bytes  
compared to 8 bytes of the header of BAssociation; but BAssocitaion  
could also be made a compact class).

As far as I can tell, your implementation performs very well for  
objects with no reasonable hash, like the identityHash of  Squeak  
(which is way too small). Else, this does not seem to be the case.

Cheers,
Adrian

___________________
Adrian Lienhard
www.adrian-lienhard.ch


Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
On 20/07/07, Adrian Lienhard <[hidden email]> wrote:

> On Jul 19, 2007, at 22:53 , sig wrote:
> >>
> >> Time it takes to add 100k consecutive integers as keys of a
> >> dictionary.
> >>
> > | sd md r |
> > Transcript clear.
> > r := OrderedCollection new.
> > 1 to: 100000 do: [:each | r add: each ].
> > sd := BDictionary new.
> > Smalltalk garbageCollect.
> > Transcript cr; show: 'time to add all strings to bd: ',
> >       [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
> > sd := Dictionary new.
> > Smalltalk garbageCollect.
> > Transcript cr; show: 'time to add all strings to sd: ',
> >       [ r do: [:each | sd at: each put: each ] ] timeToRun printString.
> >
> >
> > time to add all strings to bd: 719
> > time to add all strings to sd: 870
>
> However, accessing the integer keys is faster with the Smalltalk
> Dictionary!
>
> I got the folllowing numbers:
>
> time to add to bd: 298
> time to add to sd: 347
>
> time to access bd: 207
> time to access sd: 110
>

This is no longer true :)

time to add all  to md: 499
time to add all  to sd: 868

time to access to md: 315
time to add access to sd: 325

md - stands for MaDictionary. i replaced them with mine and tuned some
methods :)

>
> ok, the additional memory consumption seems reasonable as there is
> only one additional instance variable in BAssociation (note,
> Association is a compact class and hence its header is only 4 bytes
> compared to 8 bytes of the header of BAssociation; but BAssocitaion
> could also be made a compact class).
>
> As far as I can tell, your implementation performs very well for
> objects with no reasonable hash, like the identityHash of  Squeak
> (which is way too small). Else, this does not seem to be the case.
>

My dicts showing most better results , when you need to remove keys.
In most cases, dictionaries used to be populated once, then
accessing/replacing values.
But in case, when you need to remove keys (especially weak key
dictionaries) current squeak implementation is very slow.

Dictionary calls #fixCollisionsFrom:
and WeakKeyDictionary calls rehash on each key removal!
This is a show-stopper approach.
Dictionary with linked associations don't require immediate rehashing
upon removing a key.
Actually, in my case i need rehashing just for two reasons: optimize
access time and optimize memory consumption.
So, whenever you using dictionary not only for populate and access
purposes, but also deleting keys from it , then its preferable to use
linked-assoc. dicts.

> Cheers,
> Adrian
>
> ___________________
> Adrian Lienhard
> www.adrian-lienhard.ch
>
>
>

Reply | Threaded
Open this post in threaded view
|

RE: Faster dictionaries?

J J-6
In reply to this post by Igor Stasenko





> Date: Fri, 20 Jul 2007 18:18:23 +0300
> From: [hidden email]
> To: [hidden email]
> Subject: Re: Faster dictionaries?
>
> My dicts showing most better results , when you need to remove keys.
> In most cases, dictionaries used to be populated once, then
> accessing/replacing values.
> But in case, when you need to remove keys (especially weak key
> dictionaries) current squeak implementation is very slow.

Are you using the updated WeakDictionary that someone put on Mantis a few months back?  I hope that has made it into the image by now, but I don't recall if it did or not.


Missed the show?  Watch videos of the Live Earth Concert on MSN. See them now!

Reply | Threaded
Open this post in threaded view
|

Re: Faster dictionaries?

Igor Stasenko
On 22/07/07, J J <[hidden email]> wrote:

>
>
>
>
>
> ________________________________
> > Date: Fri, 20 Jul 2007 18:18:23 +0300
> > From: [hidden email]
> > To: [hidden email]
> > Subject: Re: Faster dictionaries?
> >
> > My dicts showing most better results , when you need to remove keys.
> > In most cases, dictionaries used to be populated once, then
> > accessing/replacing values.
> > But in case, when you need to remove keys (especially weak key
> > dictionaries) current squeak implementation is very slow.
>
> Are you using the updated WeakDictionary that someone put on Mantis a few
> months back?  I hope that has made it into the image by now, but I don't
> recall if it did or not.
>

I'm currently working on project which using 3.8 image as base.
Look at this:

WeakKeyDictionary>>fixCollisionsFrom: oldIndex
        "The element at index has been removed and replaced by nil."
        self rehash. "Do it the hard way - we may have any number of nil keys
and #rehash deals with them"

method's stamp:
ar 10/21/2000 19:58

> ________________________________
> Missed the show?  Watch videos of the Live Earth Concert on MSN. See them
> now!
>
>
>


--
Best regards,
Igor Stasenko AKA sig.

12