IdentitySet but using #hash rather than #identityHash ?


Re: IdentitySet but using #hash rather than #identityHash ?

Levente Uzonyi-2
On Fri, 16 Dec 2011, Henrik Sperre Johansen wrote:

> On 16.12.2011 00:40, Henrik Sperre Johansen wrote:
>> On 15.12.2011 23:35, Mariano Martinez Peck wrote:
>>> Well... it is much slower :( It seems that the cost of (aKey identityHash
>>> + (aKey mareaClass identityHash bitShift: 12) + (aKey basicSize bitShift:
>>> 24))
>>> is bigger than the cost of the collisions.
>>> Anyway, thanks for the nice thread. I learned.
>>>
>>> Cheers
>>>
>> Well, first of all, that always creates a LargeInteger, since identityHash
>> is already shifted by 22... You'd want to use basicIdentityHash instead.
>> Second, basicSize is useless for non-variable classes, which I guess is
>> the common case.
>> Third, putting the class hash in the high bits won't really help much if
>> you're serializing many instances of the same class, as they will all
>> occupy a sequential hash range.
>>
>> So if I were you, I'd try with something like
>> obj basicIdentityHash << 12 + (obj class basicIdentityHash)
>> before discarding it outright due to computation cost.
>>
>> Cheers,
>> Henry
>>
> Note: This is not portable to Squeak.
> They kept the old version of identityHash (renamed to basicIdentityHash in
> Pharo) and introduced scaledIdentityHash (doing the same as identityHash in
> Pharo), due to backwards compatibility concerns for users explicitly relying
> on #identityHash returning a value in range [0, 4095].
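[Editorial note: the range arithmetic behind the points above can be replayed outside Smalltalk. The sketch below uses Python purely for illustration; the 12-bit basicIdentityHash, the 22-bit scaling, and the 31-bit signed SmallInteger width are taken from this thread, not from running an image.]

```python
# Model of the hash ranges discussed above (illustrative sketch only).

BASIC_BITS = 12                    # basicIdentityHash: range [0, 4095]
SHIFT = 22                         # Pharo's #identityHash = basicIdentityHash << 22
SMALLINT_MAX = (1 << 30) - 1       # 31-bit signed SmallInteger on a 32-bit VM

max_basic = (1 << BASIC_BITS) - 1  # 4095
max_scaled = max_basic << SHIFT    # largest scaled #identityHash value

# The scaled #identityHash alone already exceeds the SmallInteger range,
# so any hash summed from it (as in the original attempt) is a LargeInteger:
assert max_scaled > SMALLINT_MAX

# Henrik's suggestion combines the *unscaled* hashes instead:
#   obj basicIdentityHash << 12 + obj class basicIdentityHash
combined_max = (max_basic << 12) + max_basic   # 24 bits total
assert combined_max < SMALLINT_MAX             # stays a SmallInteger
print(combined_max)  # 16777215
```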

I saw no benefit (and I still don't see any) in changing the behavior
of #identityHash and using #basicIdentityHash, so I didn't
integrate Martin's suggestion. It's just a side effect that "old code"
still works in Squeak.


Levente



Re: IdentitySet but using #hash rather than #identityHash ?

Henrik Sperre Johansen
In reply to this post by Levente Uzonyi-2
On 16.12.2011 03:26, Levente Uzonyi wrote:

>
> How about my numbers? :)
>
> "Preallocate objects, so we won't count gc time."
> n := 1000000.
> objects := Array new: n streamContents: [ :stream |
>     n timesRepeat: [ stream nextPut: Object new ] ].
>
> set := IdentitySet new: n.
> Smalltalk garbageCollect.
> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "4949"
>
> set := LargeIdentitySet new.
> Smalltalk garbageCollect.
> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "331"
>
> set := (PluggableSet new: n)
>     hashBlock: [ :object | object identityHash * 4096 + object class
> identityHash * 64 ]; "Change this to #basicIdentityHash in Pharo"
>     equalBlock: [ :a :b | a == b ];
>     yourself.
> Smalltalk garbageCollect.
> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "5511"
>
>
> I also have a LargeIdentityDictionary, which is relatively fast, but
> not as fast as LargeIdentitySet, because (for some unknown reason) we
> don't have a primitive that could support it. If we had a primitive
> like primitive 132 which would return the index of the element if
> found or 0 if not, then we could have a really fast
> LargeIdentityDictionary.
>
>
> Levente
Hehe yes, if writing a version fully exploiting the limited range,
that's probably the approach I would go for as well.
(Assuming it's the version at
http://leves.web.elte.hu/squeak/LargeIdentitySet.st)

Mariano commented in the version at
http://www.squeaksource.com/FuelExperiments that it's slow for them,
which I guess is due to not adapting the #identityHash calls to
#basicIdentityHash for Pharo:
((0 to: 4095) collect: [:each | each << 22 \\ 4096 ]) asSet size -> 1
So it basically uses 1 bucket instead of 4096... Whoops. :)
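[Editorial note: the one-liner above can be checked outside Smalltalk. A Python rendering of the same arithmetic, for illustration:]

```python
# Why the unadapted LargeIdentitySet degenerates to a single bucket in Pharo.
# Pharo's #identityHash is basicIdentityHash << 22; bucketing with \\ 4096
# (mod 2**12) only looks at the low 12 bits, which the shift leaves at zero,
# so every object maps to bucket 0:
scaled_buckets = {(h << 22) % 4096 for h in range(4096)}
print(len(scaled_buckets))    # 1 -- everything lands in one bucket

# With the unshifted basicIdentityHash the low bits vary, and the same
# bucketing spreads objects across all 4096 buckets:
basic_buckets = {h % 4096 for h in range(4096)}
print(len(basic_buckets))     # 4096
```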

I uploaded a new version to the MC repository which is adapted for Pharo.
On the same machine my numbers were taken from, it does the same test as
I used above in 871 ms (181 with preallocation).

Cheers,
Henry


Re: IdentitySet but using #hash rather than #identityHash ?

Levente Uzonyi-2
On Fri, 16 Dec 2011, Henrik Sperre Johansen wrote:

> Uploaded a new version to the MC repository which is adapted for Pharo, on
> the same machine my numbers were taken from, it does the same test as I used
> above in 871 ms. (181 with preallocation).

Cool. One more thing: in Squeak the method using primitive 132 directly
was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
this were also added to Pharo, then the #pointsTo: sends should be changed
to #instVarsInclude:, otherwise Array can be reported as included even if
it wasn't added.
I'll upload my LargeIdentityDictionary implementation to the same place
this evening, since it's still a factor of 2-3 faster than other solutions
and there seems to be demand for it.


Levente



Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
Wow......

Henrik: thank you so much for teaching me :)  Indeed, I didn't notice that in that case I was using a LargePositiveInteger, and indeed I had removed the basicSize since I also noticed it wouldn't make much sense.  Second, thanks a LOT for finding Levente's LargeIdentitySet in the Fuel repo with our small adaptation (#addIfNotPresent: anObject ifPresentDo: aBlock). Thanks for fixing it and committing a new version.

All I can say is that I am impressed by the numbers; it is really much faster.
I still don't understand why I sent this email with a subject saying IdentitySet, because what I really need is a fast/large IdentityDictionary :(  Anyway, there's a place where we can use this LargeIdentitySet in Fuel, I think.

So Levente, are you saying it is not possible to adapt this for a dictionary? Can we contact Eliot to provide such a primitive?

thanks


--
Mariano
http://marianopeck.wordpress.com


Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
Levente, one more question: is there a case (small sets?) where LargeIdentitySet is not recommended? My question is: should I ONLY use it in places where I know I can have large sets, or should it also work for small sets as well? From my tests it seems to work well with small ones too... but just wondering.

Thanks




--
Mariano
http://marianopeck.wordpress.com


Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
In reply to this post by Levente Uzonyi-2


On Fri, Dec 16, 2011 at 3:28 PM, Levente Uzonyi <[hidden email]> wrote:
Cool. One more thing: in Squeak the method using primitive 132 directly was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If this was also added to Pharo, then the #pointsTo: sends should be changed to #instVarsInclude:, otherwise Array can be reported as included even if it wasn't added.

In Pharo we have:

pointsTo: anObject
    "This method returns true if self contains a pointer to anObject,
        and returns false otherwise"
    <primitive: 132>

So I guess it is correct to leave it like this.
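[Editorial note: the Array false positive Levente warns about can be modeled abstractly. The toy Python sketch below reflects one reading of the distinction: the raw primitive 132 scans an object's pointer fields, while Squeak's corrected #pointsTo: additionally answers true when the argument is the receiver's class. All class and method names here are made up for illustration; the real behavior lives in the VM and the Squeak image.]

```python
# Toy model of #pointsTo: vs #instVarsInclude: (illustrative sketch only).

class Obj:
    def __init__(self, cls, fields):
        self.cls = cls                 # the object's class reference
        self.fields = list(fields)     # all pointer fields (named + indexable)

    def inst_vars_include(self, x):
        # primitive-132 style: scans only the object's own pointer fields
        return any(f is x for f in self.fields)

    def points_to(self, x):
        # corrected #pointsTo: also answers true for the receiver's class
        return self.cls is x or self.inst_vars_include(x)

ArrayClass = object()                  # stands in for the class Array
bucket = Obj(ArrayClass, fields=[])    # an empty bucket array in the set

# A membership test via #pointsTo: wrongly reports the class Array as "in"
# the bucket, even though nothing was ever added:
print(bucket.points_to(ArrayClass))          # True  (false positive)
print(bucket.inst_vars_include(ArrayClass))  # False
```

This would explain why, if Pharo adopted the Squeak change, the #pointsTo: sends in LargeIdentitySet would need to become #instVarsInclude: sends.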

 
I'll upload my LargeIdentityDictionary implementation to the same place this evening, since it's still a factor of 2-3 faster than other solutions and there seems to be demand for it.


I am lost. I thought I read something saying you couldn't do that for Dictionaries because you needed a primitive? Sorry... long day, maybe I am just crazy ;)
Anyway, I would appreciate it and will take a look at LargeIdentityDictionary if you do it.
BTW, I guess both are MIT?

Thanks Levente

 

--
Mariano
http://marianopeck.wordpress.com


Re: IdentitySet but using #hash rather than #identityHash ?

Levente Uzonyi-2
In reply to this post by Mariano Martinez Peck
On Fri, 16 Dec 2011, Mariano Martinez Peck wrote:

> So Levente, you say this is not possible to adapt this for dictionary?  can
> we contact Eliot to provide such a primitive?

As promised, I uploaded my LargeIdentityDictionary implementation to
http://leves.web.elte.hu/squeak/LargeIdentityDictionary.st .
The numbers will be a bit worse compared to LargeIdentitySet, because of
the lack of the primitive, but it's still 2-3x faster than other
solutions (IdentityDictionary, PluggableIdentityDictionary, subclassing,
etc). I'm about to propose this primitive with other improvements on the
vm-dev list.


Levente



Re: IdentitySet but using #hash rather than #identityHash ?

Levente Uzonyi-2
In reply to this post by Mariano Martinez Peck
On Fri, 16 Dec 2011, Mariano Martinez Peck wrote:

> Levente, one more question: is there a case (small sets?) where
> LargeIdentitySet is not recommended? My question is: should I ONLY use it
> in places where I know I can have large sets, or should it also work for
> small sets as well? From my tests it seems to work well with small
> ones too... but just wondering.

One drawback of this implementation is that it allocates an array with
4096 slots even when it's empty, so it has an overhead of about 16 kB
compared to IdentitySet. This number doubles for LargeIdentityDictionary.
If you're only using a few of these collections, they won't cause
any problem.
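[Editorial note: a back-of-envelope check of the overhead figures above. This assumes 32-bit object pointers, typical for 2011-era Squeak/Pharo images, and assumes the dictionary's doubled cost comes from a second parallel 4096-slot array for values; both are inferences, not from the source code.]

```python
# Fixed memory overhead of an empty LargeIdentitySet / LargeIdentityDictionary.

SLOTS = 4096          # one bucket slot per possible basicIdentityHash value
POINTER_BYTES = 4     # assumed 32-bit object pointers

set_overhead = SLOTS * POINTER_BYTES       # the set's bucket array
print(set_overhead)                         # 16384 bytes, i.e. ~16 kB

# Assumed: the dictionary keeps a second 4096-slot array for the values,
# doubling the fixed cost:
dict_overhead = 2 * SLOTS * POINTER_BYTES
print(dict_overhead)                        # 32768 bytes, i.e. ~32 kB
```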


Levente



Re: IdentitySet but using #hash rather than #identityHash ?

Levente Uzonyi-2
In reply to this post by Mariano Martinez Peck
On Fri, 16 Dec 2011, Mariano Martinez Peck wrote:

> In Pharo we have:
>
> pointsTo: anObject
>    "This method returns true if self contains a pointer to anObject,
>        and returns false otherwise"
>    <primitive: 132>
>
> So I guess it is correct to leave it like this.

Right, until you apply the patch. :)

>
>
>
>> I'll upload my LargeIdentityDictionary implementation to the same place
>> this evening, since it's still a factor of 2-3 faster than other solutions
>> and there seems to be demand for it.
>>
>>
> I am lost. I thought I read something saying you couldn't do that for
> Dictionaries because you needed a primitive? Sorry... long day, maybe I am
> just crazy ;)
> Anyway, I would appreciate it and will take a look at LargeIdentityDictionary
> if you do it.
> BTW, I guess both are MIT?

Yes, there's a license.txt file at the download page.


Levente

>
> Thanks Levente


Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
In reply to this post by Levente Uzonyi-2


On Sat, Dec 17, 2011 at 12:52 AM, Levente Uzonyi <[hidden email]> wrote:
On Fri, 16 Dec 2011, Mariano Martinez Peck wrote:

Wow......

Henrik: thank you so much for teaching me :)  Indeed, I didn't notice that
in that case I use using a LargePositiveInteger, and indeed I had removed
the basicSize since I also noticed it wouldn't make much sense.  Second,
thanks a LOT for finding Levente's LargeIdentitySet in Fuel repo with our
small adaptation (#addIfNotPresent: anObject ifPresentDo: aBlock). Thanks
for fixing it and commiting a new version.

All I can say is that I am impressed by the numbers; it is really much
faster.
I still don't understand why I sent this email with a subject saying
IdentitySet, because what I really need is a fast/large IdentityDictionary
:(  Anyway, there's a place where we can use this LargeIdentitySet in Fuel,
I think.

So Levente, you say it is not possible to adapt this for a dictionary? Can
we contact Eliot to provide such a primitive?

As promised, I uploaded my LargeIdentityDictionary implementation to http://leves.web.elte.hu/squeak/LargeIdentityDictionary.st .
The numbers will be a bit worse compared to LargeIdentitySet, because of the lack of the primitive, but it's still 2-3x faster than other solutions (IdentityDictionary, PluggableIdentityDictionary, subclassing, etc). I'm about to propose this primitive with other improvements on the vm-dev list.


Thanks Levente. Is there something else I should change apart from #identityHash to #basicIdentityHash for Pharo?
Because I tried running the Fuel tests using an instance of LargeIdentityDictionary in the place where we usually use an IdentityDictionary, and some tests are failing. It is difficult to reproduce them or to isolate them from Fuel. I will try to figure it out, but in the meanwhile, maybe you already know something.
I committed the first changes (#basicIdentityHash) in http://www.squeaksource.com/FuelExperiments and explained in the commit what should be changed in Fuel. I commented that because magically Henry fixed it for us ;)

Time to sleep now...thanks!

 

Levente


thanks

On Fri, Dec 16, 2011 at 3:28 PM, Levente Uzonyi <[hidden email]> wrote:

On Fri, 16 Dec 2011, Henrik Sperre Johansen wrote:

 On 16.12.2011 03:26, Levente Uzonyi wrote:


How about my numbers? :)

"Preallocate objects, so we won't count gc time."
n := 1000000.
objects := Array new: n streamContents: [ :stream |
  n timesRepeat: [ stream nextPut: Object new ] ].

set := IdentitySet new: n.
Smalltalk garbageCollect.
[1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "4949"

set := LargeIdentitySet new.
Smalltalk garbageCollect.
[1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "331"

set := (PluggableSet new: n)
  hashBlock: [ :object | object identityHash * 4096 + object class
identityHash * 64 ]; "Change this to #basicIdentityHash in Pharo"
  equalBlock: [ :a :b | a == b ];
  yourself.
Smalltalk garbageCollect.
[1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "5511"


I also have a LargeIdentityDictionary, which is relatively fast, but not
as fast as LargeIdentitySet, because (for some unknown reason) we don't
have a primitive that could support it. If we had a primitive like
primitive 132 which would return the index of the element if found or 0 if
not, then we could have a really fast LargeIdentityDictionary.
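A sketch of what a dictionary lookup could look like if such a primitive existed. Everything below is hypothetical: the selector #identityIndexOf: stands in for the wished-for primitive (index of the element, or 0 if absent), and parallel keys/values bucket arrays are assumed:

```smalltalk
at: key ifAbsent: aBlock
	"Hypothetical sketch. In Pharo, basicIdentityHash answers a value
	in 0..4095, so it can select one of 4096 buckets directly."
	| bucketIndex valueIndex |
	bucketIndex := key basicIdentityHash + 1.
	"the wished-for primitive: answers the index of key, or 0 if absent"
	valueIndex := (keys at: bucketIndex) identityIndexOf: key.
	valueIndex = 0 ifTrue: [ ^ aBlock value ].
	^ (values at: bucketIndex) at: valueIndex
```

With primitive support, the whole lookup would be one hash computation plus one primitive scan, mirroring what LargeIdentitySet already does with primitive 132.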


Levente

Hehe yes, if writing a version fully exploiting the limited range, that's
probably the approach I would go for as well.
(Assuming it's the version at http://leves.web.elte.hu/squeak/LargeIdentitySet.st )

Mariano commented in the version at http://www.squeaksource.com/FuelExperiments
that it's slow for them, which I guess is due to not adapting the #identityHash
calls to #basicIdentityHash for Pharo:
((0 to: 4095) collect: [:each | each << 22 \\ 4096 ]) asSet size -> 1
So it basically uses 1 bucket instead of 4096... Whoops. :)
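The collapse is easy to see in a workspace. In these two expressions the first models Pharo's #basicIdentityHash values (unshifted) and the second the pre-shifted #identityHash values, as discussed above:

```smalltalk
((0 to: 4095) collect: [ :each | each \\ 4096 ]) asSet size.        "4096 distinct buckets"
((0 to: 4095) collect: [ :each | each << 22 \\ 4096 ]) asSet size.  "1 bucket"
```

Shifting left by 22 zeroes out the low 12 bits, so every hash is congruent to 0 modulo 4096.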

Uploaded a new version to the MC repository which is adapted for Pharo;
on the machine my numbers were taken from, it does the same test as
used above in 871 ms (181 ms with preallocation).


Cool. One more thing: in Squeak the method using primitive 132 directly
was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
this was also added to Pharo, then the #pointsTo: sends should be changed
to #instVarsInclude:, otherwise Array can be reported as included even if
it wasn't added.
I'll upload my LargeIdentityDictionary implementation to the same place
this evening, since it's still 2-3 times faster than other solutions and
there seems to be demand for it.


Levente


Cheers,
Henry






--
Mariano
http://marianopeck.wordpress.com





--
Mariano
http://marianopeck.wordpress.com

Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
BTW...I noticed that I cannot do: LargeIdentityDictionary new: 5454
So...it doesn't help if I know beforehand the exact number of objects I will put?

thanks

--
Mariano
http://marianopeck.wordpress.com

Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
In reply to this post by Mariano Martinez Peck


Just for fun I took a Pharo image and in IdentityDictionaryTest I changed the method

classToBeTested

    ^ LargeIdentityDictionary

and there are 7 failures and 48 errors, while with IdentityDictionary they are all green. So I guess there are some differences. The errors are easy to explain: I guess LargeIdentityDictionary does not implement the whole protocol but only a subset. The problem is the failures, I think, because they suggest the usage or API is different?

Thanks in advance,

 



--
Mariano
http://marianopeck.wordpress.com

Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Benoit St-Jean
In reply to this post by Levente Uzonyi-2
Hi Levente,

Have you tried different values for your array and tallies sizes instead of 4096 in the code you provided for LargeIdentitySet, such as prime numbers, as is usually the case for hash tables?

I've tried 3079 and 4093 (prime numbers), and 3079 performs a little bit better. Other meaningful suggestions can be found here:

http://planetmath.org/encyclopedia/GoodHashTablePrimes.html

P.S.  So far, 3079 is the best candidate saving between 26 and 35% on very large sets !!
P.P.S.  Nice work!
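A rough way to compare candidate table sizes is to count how many distinct buckets a batch of fresh objects actually hits. This is only a sketch: it assumes the set buckets by basicIdentityHash \\ tableSize (my reading of the code under discussion), and the counts depend on the image's identity-hash distribution:

```smalltalk
| objects |
objects := (1 to: 100000) collect: [ :i | Object new ].
"For each candidate size, count the distinct buckets actually used."
#(3079 4093 4096) collect: [ :size |
	size -> ((objects collect: [ :each | each basicIdentityHash \\ size ]) asSet size) ]
```

A size that spreads the same objects over more buckets means shorter per-bucket scans, which would explain the savings reported above.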


-----------------
Benoit St-Jean
Yahoo! Messenger: bstjean
A standpoint is an intellectual horizon of radius zero.
(Albert Einstein)

From: Levente Uzonyi <[hidden email]>
To: [hidden email]
Sent: Friday, December 16, 2011 6:52:29 PM
Subject: Re: [Pharo-project] IdentitySet but using #hash rather than #identityHash ?




Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Stéphane Ducasse
In reply to this post by Levente Uzonyi-2
I would love to have that!

Stef

>
>
> I also have a LargeIdentityDictionary, which is relatively fast, but not as fast as LargeIdentitySet, because (for some unknown reason) we don't have a primitive that could support it. If we had a primitive like primitive 132 which would return the index of the element if found or 0 if not, then we could have a really fast LargeIdentityDictionary.
>
>
> Levente
>
>>
>> Cheers,
>> Henry
>>
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Stéphane Ducasse
In reply to this post by Levente Uzonyi-2

On Dec 16, 2011, at 3:28 PM, Levente Uzonyi wrote:

> Cool. One more thing: in Squeak the method using primitive 132 directly was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If this was also added to Pharo, then the #pointsTo: sends should be changed to #instVarsInclude:, otherwise Array can be reported as included even if it wasn't added.
> I'll upload my LargeIdentityDictionary implementation to the same place this evening, since it's still 2-3 factor faster than other solutionts and there seem to be demand for it.

Levente

> "in Squeak the method using primitive 132 directly was renamed to #instVarsInclude:, so now #pointsTo: works as expected."
I do not get the following. Indeed #pointsTo: now looks like #instVarsInclude:, but I do not understand the rest of your paragraph.

Stef
Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck
Hi Levente. I took a deeper look to see why LargeIdentityDictionary is not working as expected for us.

The way we are putting objects in the Dict is always this way:

registerIndexesOn: aDictionary

    self objects do: [ :instance | aDictionary at: instance put: aDictionary size + 1 ].


So…keys are the objects of the graph and the values are their internal positions in the serialization. However, when I run a serialization, it looks like the following happens when we are doing:

>> encodeReferenceTo: anObject

    indexCluster
        serialize: (objectsIndexes at: anObject)
        on: stream


(objectsIndexes at: anObject) == anObject   ->  true

so (objectsIndexes at: anObject) is answering the key itself rather than the number (which was the dict size + 1 at the moment of insertion).

Something interesting is that most tests seem to pass and the problem is with benchmarks (where graphs are much bigger), so maybe it could be something related to that.  For example, the following benchmark for bitmaps is failing:

    numbers := OrderedCollection new.
    bitmaps := OrderedCollection new.
    numbers addAll: (1 << 30 to: (1 << 30) + 10000) asArray.
    numbers do: [ :each | bitmaps add: (Bitmap with: each with: each + 1) ].
    ^ bitmaps

the mentioned problem happens when trying to serialize the collection 'bitmaps'.


I tried to reproduce it outside Fuel and I think I succeeded (not sure). The following does work with IdentityDictionary but not with LargeIdentityDictionary. Basically I am simulating the way we use this dict in Fuel:

testLargeIdentityDictionary

    | numbers bitmaps dict |
   
"Create some bitmaps"

    numbers := OrderedCollection new.
    bitmaps := OrderedCollection new.
    numbers addAll:  ( 1 << 30 to:( ( 1 << 30) + 10000) ) asArray.
    numbers do: [:each | bitmaps add: (Bitmap with: each with: each + 1) ].
   
"Put them in a dictionary, using each bitmap as key"
    dict := LargeIdentityDictionary new.
    bitmaps do: [:each |
         dict at: each put: dict size + 1.
        ].
   
"Check they are correct"
    bitmaps doWithIndex: [:each :index | self assert: index == (dict at: each)]
   
Of course that test works with IdentityDictionary.
   

Thanks for any idea.






--
Mariano
http://marianopeck.wordpress.com

Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Nicolas Cellier
2011/12/17 Mariano Martinez Peck <[hidden email]>:

Can't you debug it ?


Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet but using #hash rather than #identityHash ?

Nicolas Cellier
It took 2 minutes to find this sentence at line 19 of LargeIdentityDictionary>>at:put:

(values at: hash) at: newIndex put: key

Can you replace key with value and retry the tests ?

Nicolas
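In other words, the buggy line stored the key into the values bucket, so a later lookup answered the key itself, which is exactly the symptom reported above. The one-word fix:

```smalltalk
"before (line 19 of at:put:):"
(values at: hash) at: newIndex put: key.
"after:"
(values at: hash) at: newIndex put: value.
```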

>>
>> Thanks for any idea.
>>
>>
>>
>> On Sat, Dec 17, 2011 at 1:50 PM, Stéphane Ducasse
>> <[hidden email]> wrote:
>>>
>>>
>>> On Dec 16, 2011, at 3:28 PM, Levente Uzonyi wrote:
>>>
>>> > Cool. One more thing: in Squeak the method using primitive 132 directly
>>> > was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
>>> > this was also added to Pharo, then the #pointsTo: sends should be changed to
>>> > #instVarsInclude:, otherwise Array can be reported as included even if it
>>> > wasn't added.
>>> > I'll upload my LargeIdentityDictionary implementation to the same place
>>> > this evening, since it's still a factor of 2-3 faster than other solutions and
>>> > there seems to be demand for it.
>>>
>>> Levente
>>>
>>> > "in Squeak the method using primitive 132 directly was renamed to
>>> > #instVarsInclude:, so now #pointsTo: works as expected."
>>> I do not get the following. Indeed pointsTo: looks like instVarsInclude:
>>> now I do not understand the rest of your paragraph.
>>>
>>> Stef
>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>


Re: IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck

>>
>
> Can't you debug it?
>


Of course I could do it when I have some time. It already took me 1 hour to isolate the bug from Fuel and make a reproducible test, so I hope you at least find that useful. I found the bug and thought I could send my progress and see if someone could continue from there. Otherwise, I was going to debug it as soon as I had some free time (of which I don't have much this year).
 
It took 2 minutes to find this statement at line 19 of LargeDictionary>>at:put:


But you are so smart, and a Collection hacker. Me, I am a newbie, and just the method at:put: scares me, but I understand if it has to be that way in order to be fast. And so far it does seem fast, so I am happy with it.

 
(values at: hash) at: newIndex put: key
 
Can you replace key with value and retry the tests?


Yes, that works. Thanks!

Now I noticed that storing SmallIntegers is really slow:

    numbers :=  OrderedCollection new.
    numbers add: 1.
    numbers addAll: (1 to: 1 << 29 by: 1 << 14) asArray.
    numbers addAll: ((1 to: 1 << 29 by: 1 << 14) asArray collect: [:each | each negated] ).
   
    dict := LargeIdentityDictionary new.
    [numbers do: [:each |
         dict at: each put: dict size + 1.
        ].
    ] timeToRun   
     -> 12657


whereas with IdentityDictionary it is 37. I guess it could be related to #identityHash or #basicIdentityHash. I tried in Squeak 4.3 and it is also there.
I will try to take a look into it during the next week.
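
One plausible explanation, not verified against the actual LargeIdentityDictionary source: a SmallInteger's identityHash is the integer itself, so if the table derives its bucket index as hash modulo 4096, the benchmark's stride of 1 << 14 (a multiple of 4096) would send every key to the same bucket and degrade lookups to a linear scan. A Python sketch of the arithmetic:

```python
# Keys spaced 1 << 14 apart are all congruent modulo 4096 (16384 = 4 * 4096),
# so a hypothetical "hash mod 4096" bucketing scheme puts them in one bucket.
BUCKETS = 4096
numbers = list(range(1, 1 << 29, 1 << 14))   # same stride as the benchmark
buckets = {n % BUCKETS for n in numbers}
print(len(numbers), sorted(buckets))          # 32768 keys, all in bucket 1
```

If this is indeed the cause, the negated half of the benchmark behaves the same way, which would match the dramatic 12657 ms vs. 37 ms gap.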

Thanks in advance

 
Nicolas

>>
>> Thanks for any idea.
>>
>>
>>
>> On Sat, Dec 17, 2011 at 1:50 PM, Stéphane Ducasse
>> <[hidden email]> wrote:
>>>
>>>
>>> On Dec 16, 2011, at 3:28 PM, Levente Uzonyi wrote:
>>>
>>> > Cool. One more thing: in Squeak the method using primitive 132 directly
>>> > was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
>>> > this was also added to Pharo, then the #pointsTo: sends should be changed to
>>> > #instVarsInclude:, otherwise Array can be reported as included even if it
>>> > wasn't added.
>>> > I'll upload my LargeIdentityDictionary implementation to the same place
>>> > this evening, since it's still a factor of 2-3 faster than other solutions and
>>> > there seems to be demand for it.
>>>
>>> Levente
>>>
>>> > "in Squeak the method using primitive 132 directly was renamed to
>>> > #instVarsInclude:, so now #pointsTo: works as expected."
>>> I do not get the following. Indeed pointsTo: looks like instVarsInclude:
>>> now I do not understand the rest of your paragraph.
>>>
>>> Stef
>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>




--
Mariano
http://marianopeck.wordpress.com


Re: IdentitySet but using #hash rather than #identityHash ?

Levente Uzonyi-2
In reply to this post by Benoit St-Jean
On Fri, 16 Dec 2011, Benoit St-Jean wrote:

> Hi Levente,

Have you tried a different value for your array and tallies size instead of 4096 in the code you provided for LargeIdentitySet? Such as prime numbers, as is usually the case for hash tables?


I've tried 3079 and 4093 (prime numbers), and 3079 performs a little bit better. Other meaningful suggestions can be found here:

http://planetmath.org/encyclopedia/GoodHashTablePrimes.html

P.S. So far, 3079 is the best candidate, saving between 26 and 35% on very large sets!
P.P.S. Nice work!


Using primes is suboptimal in this case, because there are exactly 4096
possible identityHash values (for non-SmallInteger objects), so the
mapping from hash value to slot is as simple (and fast) as possible. Also
note that not all hash tables use prime sizes, using a power of two for
size and multiplying in the hash function with a special prime is also
very common.
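
Levente's argument can be checked with a small Python sketch (the 4096 comes from the thread's identityHash range; the rest is generic hash-table arithmetic, not the actual LargeIdentitySet code):

```python
# With exactly 4096 possible identityHash values, a 4096-slot table maps
# hash -> slot 1:1 via a cheap bitmask; a prime size like 4093 needs a
# modulo and still wraps three hash values onto already-used slots.
SLOTS = 4096
hashes = list(range(SLOTS))                 # every possible identityHash

masked = [h & (SLOTS - 1) for h in hashes]  # power-of-two: identity map
modded = [h % 4093 for h in hashes]         # prime table size

assert masked == hashes                     # no collisions at all
assert len(set(modded)) == 4093             # 4096 values into 4093 slots
```

So with a uniform, bounded hash range the power-of-two size is already collision-optimal, and primes only pay off when the incoming hash values are poorly distributed.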


Levente

-----------------
Benoit St-Jean
Yahoo! Messenger: bstjean
A standpoint is an intellectual horizon of radius zero.
(Albert Einstein)


>________________________________
> From: Levente Uzonyi <[hidden email]>
>To: [hidden email]
>Sent: Friday, December 16, 2011 6:52:29 PM
>Subject: Re: [Pharo-project] IdentitySet but using #hash rather than #identityHash ?
>
>On Fri, 16 Dec 2011, Mariano Martinez Peck wrote:
>
>> Wow......
>>
>> Henrik: thank you so much for teaching me :)  Indeed, I didn't notice that
>> in that case I was using a LargePositiveInteger, and indeed I had removed
>> the basicSize since I also noticed it wouldn't make much sense.  Second,
>> thanks a LOT for finding Levente's LargeIdentitySet in Fuel repo with our
>> small adaptation (#addIfNotPresent: anObject ifPresentDo: aBlock). Thanks
>> for fixing it and committing a new version.
>>
>> All I can say is that I am impressed by the numbers it is really much
>> faster.
>> I still don't understand why I sent this email with a subject saying
>> IdentitySet, because what I really need is a fast/large IdentityDictionary
>> :(  Anyway, there's a place where we can use this LargeIdentitySet in Fuel,
>> I think.
>>
>> So Levente, you say it is not possible to adapt this for a dictionary? Can
>> we contact Eliot to provide such a primitive?
>
>As promised, I uploaded my LargeIdentityDictionary implementation to
>http://leves.web.elte.hu/squeak/LargeIdentityDictionary.st .
>The numbers will be a bit worse compared to LargeIdentitySet, because of
>the lack of the primitive, but it's still 2-3x faster than other
>solutions (IdentityDictionary, PluggableIdentityDictionary, subclassing,
>etc). I'm about to propose this primitive with other improvements on the
>vm-dev list.
>
>
>Levente
>
>>
>> thanks
>>
>> On Fri, Dec 16, 2011 at 3:28 PM, Levente Uzonyi <[hidden email]> wrote:
>>
>>> On Fri, 16 Dec 2011, Henrik Sperre Johansen wrote:
>>>
>>>  On 16.12.2011 03:26, Levente Uzonyi wrote:
>>>>
>>>>>
>>>>> How about my numbers? :)
>>>>>
>>>>> "Preallocate objects, so we won't count gc time."
>>>>> n := 1000000.
>>>>> objects := Array new: n streamContents: [ :stream |
>>>>>    n timesRepeat: [ stream nextPut: Object new ] ].
>>>>>
>>>>> set := IdentitySet new: n.
>>>>> Smalltalk garbageCollect.
>>>>> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "4949"
>>>>>
>>>>> set := LargeIdentitySet new.
>>>>> Smalltalk garbageCollect.
>>>>> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "331"
>>>>>
>>>>> set := (PluggableSet new: n)
>>>>>    hashBlock: [ :object | object identityHash * 4096 + object class
>>>>> identityHash * 64 ]; "Change this to #basicIdentityHash in Pharo"
>>>>>    equalBlock: [ :a :b | a == b ];
>>>>>    yourself.
>>>>> Smalltalk garbageCollect.
>>>>> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "5511"
>>>>>
>>>>>
>>>>> I also have a LargeIdentityDictionary, which is relatively fast, but not
>>>>> as fast as LargeIdentitySet, because (for some unknown reason) we don't
>>>>> have a primitive that could support it. If we had a primitive like
>>>>> primitive 132 which would return the index of the element if found or 0 if
>>>>> not, then we could have a really fast LargeIdentityDictionary.
>>>>>
>>>>>
>>>>> Levente
>>>>>
>>>> Hehe yes, if writing a version fully exploiting the limited range, that's
>>>> probably the approach I would go for as well.
>>>> (Assuming it's the version at http://leves.web.elte.hu/squeak/LargeIdentitySet.st )
>>>>
>>>> Mariano commented in the version at http://www.squeaksource.com/FuelExperiments that it's
>>>> slow for them, which I guess is due to not adapting #identityHash calls to
>>>> #basicIdentityHash calls for Pharo:
>>>> ((0 to: 4095) collect: [:each | each << 22 \\ 4096 ]) asSet size -> 1
>>>> So it basically uses 1 bucket instead of 4096... Whoops. :)
>>>>
>>>> Uploaded a new version to the MC repository which is adapted for Pharo,
>>>> on the same machine my numbers were taken from, it does the same test as I
>>>> used above in 871 ms. (181 with preallocation).
>>>>
>>>
>>> Cool. One more thing: in Squeak the method using primitive 132 directly
>>> was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
>>> this was also added to Pharo, then the #pointsTo: sends should be changed
>>> to #instVarsInclude:, otherwise Array can be reported as included even if
>>> it wasn't added.
>>> I'll upload my LargeIdentityDictionary implementation to the same place
>>> this evening, since it's still a factor of 2-3 faster than other solutions and
>>> there seems to be demand for it.
>>>
>>>
>>> Levente
>>>
>>>
>>>> Cheers,
>>>> Henry
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>
>
>
>
>