Ideas about sets and dictionaries

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
69 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Igor Stasenko
2009/11/13 Andreas Raab <[hidden email]>:

> Igor Stasenko wrote:
>>
>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>> unexpected passes
>>
>>
>> After applying changes to sets using nil wrappers [1]:
>>
>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>> unexpected passes
>>
>>
>> After adding changes to sets using negative tally[2]:
>>
>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>> unexpected passes
>
> Those are great results!
>
>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>
> Yeah... seeing the code I like the wrapper solution even better. It's just
> so elegant. Virtually no overhead, nicely dealing with all sorts of
> nestings, having the option for future extensions (weak elements, collection
> elements etc). I think I've just promoted that to my top choice ;-)
>
Yes, i can't resist the elegancy of idea as well, that's why i put it
first in my list of preference :)

> Seriously folks, look at that code. It's a great solution.
>
> Cheers,
>  - Andreas
>
>



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Bert Freudenberg
In reply to this post by Andreas.Raab

On 13.11.2009, at 08:39, Andreas Raab wrote:

> Igor Stasenko wrote:
>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>> unexpected passes
>> After applying changes to sets using nil wrappers [1]:
>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>> unexpected passes
>> After adding changes to sets using negative tally[2]:
>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>> unexpected passes
>
> Those are great results!
>
>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>
> Yeah... seeing the code I like the wrapper solution even better. It's just so elegant. Virtually no overhead, nicely dealing with all sorts of nestings, having the option for future extensions (weak elements, collection elements etc). I think I've just promoted that to my top choice ;-)
>
> Seriously folks, look at that code. It's a great solution.
>
> Cheers,
>  - Andreas
>

I agree it's elegant. Unless someone actually puts a nil in the Set, its inner structure is exactly the same as before. From the description it sounded like any element would be wrapped, but of course only nil gets wrapped (and instances of SetElement itself).

There is a performance decrease due to the additional message sends. The following micro benchmark demonstrates it:

| r a b s |
r := Random seed: 1234567.
a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
s := Set new.
Smalltalk garbageCollectMost.
[ a do: [:each | s add: each].
        b do: [:each | s includes: each].
        b do: [:each | s like: each].
        a do: [:each | s remove: each ifAbsent: []].
] timeToRun

I'm purposely not including my results, do measure yourself if you care (but don't spoil the fun for others).

However, unless someone can demonstrate this affects a macro benchmark, I'd choose this for its generality and minimal compatibility risk.

+1 from me :)

- Bert -


Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Levente Uzonyi-2
On Fri, 13 Nov 2009, Bert Freudenberg wrote:

>
> On 13.11.2009, at 08:39, Andreas Raab wrote:
>
>> Igor Stasenko wrote:
>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>> unexpected passes
>>> After applying changes to sets using nil wrappers [1]:
>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>> unexpected passes
>>> After adding changes to sets using negative tally[2]:
>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>> unexpected passes
>>
>> Those are great results!
>>
>>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>>
>> Yeah... seeing the code I like the wrapper solution even better. It's just so elegant. Virtually no overhead, nicely dealing with all sorts of nestings, having the option for future extensions (weak elements, collection elements etc). I think I've just promoted that to my top choice ;-)
>>
>> Seriously folks, look at that code. It's a great solution.
>>
>> Cheers,
>>  - Andreas
>>
>
> I agree it's elegant. Unless someone actually puts a nil in the Set, its inner structure is exactly the same as before. From the description it sounded like any element would be wrapped, but of course only nil gets wrapped (and instances of SetElement itself).
>
> There is a performance decrease due to the additional message sends. The following micro benchmark demonstrates it:
>
> | r a b s |
> r := Random seed: 1234567.
> a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
> b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
> s := Set new.
> Smalltalk garbageCollectMost.
> [ a do: [:each | s add: each].
> b do: [:each | s includes: each].
> b do: [:each | s like: each].
> a do: [:each | s remove: each ifAbsent: []].
> ] timeToRun
>
> I'm purposely not including my results, do measure yourself if you care (but don't spoil the fun for others).
>
> However, unless someone can demonstrate this affects a macro benchmark, I'd choose this for its generality and minimal compatibility risk.

I was about to publish my measurements when I read your mail.
I could write a macro benchmark that could abuse the "weakness" of this
implementation and would show real slowdown, but it's easier to decrease
the performance penalty of the implementation to almost 0.
Btw how often do you use Set >> #like:?

Levente

>
> +1 from me :)
>
> - Bert -
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Nicolas Cellier
2009/11/13 Levente Uzonyi <[hidden email]>:

> On Fri, 13 Nov 2009, Bert Freudenberg wrote:
>
>>
>> On 13.11.2009, at 08:39, Andreas Raab wrote:
>>
>>> Igor Stasenko wrote:
>>>>
>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>> unexpected passes
>>>> After applying changes to sets using nil wrappers [1]:
>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>> unexpected passes
>>>> After adding changes to sets using negative tally[2]:
>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>> unexpected passes
>>>
>>> Those are great results!
>>>
>>>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>>>
>>> Yeah... seeing the code I like the wrapper solution even better. It's
>>> just so elegant. Virtually no overhead, nicely dealing with all sorts of
>>> nestings, having the option for future extensions (weak elements, collection
>>> elements etc). I think I've just promoted that to my top choice ;-)
>>>
>>> Seriously folks, look at that code. It's a great solution.
>>>
>>> Cheers,
>>>  - Andreas
>>>
>>
>> I agree it's elegant. Unless someone actually puts a nil in the Set, its
>> inner structure is exactly the same as before. From the description it
>> sounded like any element would be wrapped, but of course only nil gets
>> wrapped (and instances of SetElement itself).
>>
>> There is a performance decrease due to the additional message sends. The
>> following micro benchmark demonstrates it:
>>
>> | r a b s |
>> r := Random seed: 1234567.
>> a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>> b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>> s := Set new.
>> Smalltalk garbageCollectMost.
>> [       a do: [:each | s add: each].
>>        b do: [:each | s includes: each].
>>        b do: [:each | s like: each].
>>        a do: [:each | s remove: each ifAbsent: []].
>> ] timeToRun
>>
>> I'm purposely not including my results, do measure yourself if you care
>> (but don't spoil the fun for others).
>>
>> However, unless someone can demonstrate this affects a macro benchmark,
>> I'd choose this for its generality and minimal compatibility risk.
>
> I was about to publish my measurements when I read your mail.
> I could write a macro benchmark that could abuse the "weakness" of this
> implementation and would show real slowdown, but it's easier to decrease the
> performance penalty of the implementation to almost 0.
> Btw how often do you use Set >> #like:?
>
> Levente
>

Every time you look up or create a Symbol

>>
>> +1 from me :)
>>
>> - Bert -
>>
>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Levente Uzonyi-2
On Fri, 13 Nov 2009, Nicolas Cellier wrote:

> 2009/11/13 Levente Uzonyi <[hidden email]>:
>> On Fri, 13 Nov 2009, Bert Freudenberg wrote:
>>
>>>
>>> On 13.11.2009, at 08:39, Andreas Raab wrote:
>>>
>>>> Igor Stasenko wrote:
>>>>>
>>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>>> unexpected passes
>>>>> After applying changes to sets using nil wrappers [1]:
>>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>>> unexpected passes
>>>>> After adding changes to sets using negative tally[2]:
>>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>>> unexpected passes
>>>>
>>>> Those are great results!
>>>>
>>>>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>>>>
>>>> Yeah... seeing the code I like the wrapper solution even better. It's
>>>> just so elegant. Virtually no overhead, nicely dealing with all sorts of
>>>> nestings, having the option for future extensions (weak elements, collection
>>>> elements etc). I think I've just promoted that to my top choice ;-)
>>>>
>>>> Seriously folks, look at that code. It's a great solution.
>>>>
>>>> Cheers,
>>>>  - Andreas
>>>>
>>>
>>> I agree it's elegant. Unless someone actually puts a nil in the Set, its
>>> inner structure is exactly the same as before. From the description it
>>> sounded like any element would be wrapped, but of course only nil gets
>>> wrapped (and instances of SetElement itself).
>>>
>>> There is a performance decrease due to the additional message sends. The
>>> following micro benchmark demonstrates it:
>>>
>>> | r a b s |
>>> r := Random seed: 1234567.
>>> a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>>> b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>>> s := Set new.
>>> Smalltalk garbageCollectMost.
>>> [       a do: [:each | s add: each].
>>>        b do: [:each | s includes: each].
>>>        b do: [:each | s like: each].
>>>        a do: [:each | s remove: each ifAbsent: []].
>>> ] timeToRun
>>>
>>> I'm purposely not including my results, do measure yourself if you care
>>> (but don't spoil the fun for others).
>>>
>>> However, unless someone can demonstrate this affects a macro benchmark,
>>> I'd choose this for its generality and minimal compatibility risk.
>>
>> I was about to publish my measurements when I read your mail.
>> I could write a macro benchmark that could abuse the "weakness" of this
>> implementation and would show real slowdown, but it's easier to decrease the
>> performance penalty of the implementation to almost 0.
>> Btw how often do you use Set >> #like:?
>>
>> Levente
>>
>
> Every time you look up or create a Symbol
That's WeakSet >> #like:, not Set >> #like:.

Levente

>
>>>
>>> +1 from me :)
>>>
>>> - Bert -
>>>
>>>
>>>
>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Nicolas Cellier
2009/11/13 Levente Uzonyi <[hidden email]>:

> On Fri, 13 Nov 2009, Nicolas Cellier wrote:
>
>> 2009/11/13 Levente Uzonyi <[hidden email]>:
>>>
>>> On Fri, 13 Nov 2009, Bert Freudenberg wrote:
>>>
>>>>
>>>> On 13.11.2009, at 08:39, Andreas Raab wrote:
>>>>
>>>>> Igor Stasenko wrote:
>>>>>>
>>>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>>>> unexpected passes
>>>>>> After applying changes to sets using nil wrappers [1]:
>>>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>>>> unexpected passes
>>>>>> After adding changes to sets using negative tally[2]:
>>>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>>>> unexpected passes
>>>>>
>>>>> Those are great results!
>>>>>
>>>>>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>>>>>
>>>>> Yeah... seeing the code I like the wrapper solution even better. It's
>>>>> just so elegant. Virtually no overhead, nicely dealing with all sorts
>>>>> of
>>>>> nestings, having the option for future extensions (weak elements,
>>>>> collection
>>>>> elements etc). I think I've just promoted that to my top choice ;-)
>>>>>
>>>>> Seriously folks, look at that code. It's a great solution.
>>>>>
>>>>> Cheers,
>>>>>  - Andreas
>>>>>
>>>>
>>>> I agree it's elegant. Unless someone actually puts a nil in the Set, its
>>>> inner structure is exactly the same as before. From the description it
>>>> sounded like any element would be wrapped, but of course only nil gets
>>>> wrapped (and instances of SetElement itself).
>>>>
>>>> There is a performance decrease due to the additional message sends. The
>>>> following micro benchmark demonstrates it:
>>>>
>>>> | r a b s |
>>>> r := Random seed: 1234567.
>>>> a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>>>> b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>>>> s := Set new.
>>>> Smalltalk garbageCollectMost.
>>>> [       a do: [:each | s add: each].
>>>>        b do: [:each | s includes: each].
>>>>        b do: [:each | s like: each].
>>>>        a do: [:each | s remove: each ifAbsent: []].
>>>> ] timeToRun
>>>>
>>>> I'm purposely not including my results, do measure yourself if you care
>>>> (but don't spoil the fun for others).
>>>>
>>>> However, unless someone can demonstrate this affects a macro benchmark,
>>>> I'd choose this for its generality and minimal compatibility risk.
>>>
>>> I was about to publish my measurements when I read your mail.
>>> I could write a macro benchmark that could abuse the "weakness" of this
>>> implementation and would show real slowdown, but it's easier to decrease
>>> the
>>> performance penalty of the implementation to almost 0.
>>> Btw how often do you use Set >> #like:?
>>>
>>> Levente
>>>
>>
>> Every time you look up or create a Symbol
>
> That's WeakSet >> #like:, not Set >> #like:.
>
> Levente
>

Oh yes, I forgot they were different.
I remembered this usage because of
http://code.google.com/p/pharo/issues/detail?id=331

Nicolas

>>
>>>>
>>>> +1 from me :)
>>>>
>>>> - Bert -
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Levente Uzonyi-2
In reply to this post by Bert Freudenberg
On Fri, 13 Nov 2009, Bert Freudenberg wrote:

>
> On 13.11.2009, at 08:39, Andreas Raab wrote:
>
>> Igor Stasenko wrote:
>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>> unexpected passes
>>> After applying changes to sets using nil wrappers [1]:
>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>> unexpected passes
>>> After adding changes to sets using negative tally[2]:
>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>> unexpected passes
>>
>> Those are great results!
>>
>>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>>
>> Yeah... seeing the code I like the wrapper solution even better. It's just so elegant. Virtually no overhead, nicely dealing with all sorts of nestings, having the option for future extensions (weak elements, collection elements etc). I think I've just promoted that to my top choice ;-)
>>
>> Seriously folks, look at that code. It's a great solution.
>>
>> Cheers,
>>  - Andreas
>>
>
> I agree it's elegant. Unless someone actually puts a nil in the Set, its inner structure is exactly the same as before. From the description it sounded like any element would be wrapped, but of course only nil gets wrapped (and instances of SetElement itself).
>
> There is a performance decrease due to the additional message sends. The following micro benchmark demonstrates it:
>
> | r a b s |
> r := Random seed: 1234567.
> a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
> b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
> s := Set new.
> Smalltalk garbageCollectMost.
> [ a do: [:each | s add: each].
> b do: [:each | s includes: each].
> b do: [:each | s like: each].
> a do: [:each | s remove: each ifAbsent: []].
> ] timeToRun
>
> I'm purposely not including my results, do measure yourself if you care (but don't spoil the fun for others).
>

According to my own microbenchmark the slowdown for #add: #includes: and
#remove:ifAbsent is between 1 and 4%, because of the extra message send
in #scanFor:. The new implementation of #like: gave ~27-28% slowdown
mostly because of another extra message send and the block activation (in
case the element is not in the set), but this can be reduced by
reimplementing #like:.

Levente

> However, unless someone can demonstrate this affects a macro benchmark, I'd choose this for its generality and minimal compatibility risk.
>
> +1 from me :)
>
> - Bert -
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Igor Stasenko
2009/11/17 Levente Uzonyi <[hidden email]>:

> On Fri, 13 Nov 2009, Bert Freudenberg wrote:
>
>>
>> On 13.11.2009, at 08:39, Andreas Raab wrote:
>>
>>> Igor Stasenko wrote:
>>>>
>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>> unexpected passes
>>>> After applying changes to sets using nil wrappers [1]:
>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>> unexpected passes
>>>> After adding changes to sets using negative tally[2]:
>>>> 2428 run, 2406 passes, 0 expected failures, 15 failures, 7 errors, 0
>>>> unexpected passes
>>>
>>> Those are great results!
>>>
>>>> [1] http://bugs.squeak.org/file_download.php?file_id=3829&type=bug
>>>
>>> Yeah... seeing the code I like the wrapper solution even better. It's
>>> just so elegant. Virtually no overhead, nicely dealing with all sorts of
>>> nestings, having the option for future extensions (weak elements, collection
>>> elements etc). I think I've just promoted that to my top choice ;-)
>>>
>>> Seriously folks, look at that code. It's a great solution.
>>>
>>> Cheers,
>>>  - Andreas
>>>
>>
>> I agree it's elegant. Unless someone actually puts a nil in the Set, its
>> inner structure is exactly the same as before. From the description it
>> sounded like any element would be wrapped, but of course only nil gets
>> wrapped (and instances of SetElement itself).
>>
>> There is a performance decrease due to the additional message sends. The
>> following micro benchmark demonstrates it:
>>
>> | r a b s |
>> r := Random seed: 1234567.
>> a := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>> b := (1 to: 100000) collect: [:i | r nextInt: 1000000].
>> s := Set new.
>> Smalltalk garbageCollectMost.
>> [       a do: [:each | s add: each].
>>        b do: [:each | s includes: each].
>>        b do: [:each | s like: each].
>>        a do: [:each | s remove: each ifAbsent: []].
>> ] timeToRun
>>
>> I'm purposely not including my results, do measure yourself if you care
>> (but don't spoil the fun for others).
>>
>
> According to my own microbenchmark the slowdown for #add: #includes: and
> #remove:ifAbsent is between 1 and 4%, because of the extra message send in
> #scanFor:. The new implementation of #like: gave ~27-28% slowdown mostly
> because of another extra message send and the block activation (in case the
> element is not in the set), but this can be reduced by reimplementing
> #like:.
>

i introduced a new #like:ifAbsent: method, so #like: just using it by
providing a default block for absent case.
As you can conclude, we need a more generic #like:ifAbsent: since
#like: originally returns nil if object not in set, but since nils now
can be included in set, it is more recommended to use like:ifAbsent: .
Btw, if you take a look on a single, real usage of #like:
Symbol class>>hasInterned: aString ifTrue: symBlock
you can see that it could be optimized to use a new message, and
completely eliminate an introduced slowdown of extra block activation.

> Levente




--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: Re: Ideas about sets and dictionaries

Levente Uzonyi-2
On Tue, 17 Nov 2009, Igor Stasenko wrote:

> i introduced a new #like:ifAbsent: method, so #like: just using it by
> providing a default block for absent case.
> As you can conclude, we need a more generic #like:ifAbsent: since
> #like: originally returns nil if object not in set, but since nils now
> can be included in set, it is more recommended to use like:ifAbsent: .

I agree, #like:ifAbsent: is a must. I don't know any use of Set >> #like:,
but there may be packages which use this method. The only reason we should
keep it is for backwards compatibility with these possible packages.

> Btw, if you take a look on a single, real usage of #like:
> Symbol class>>hasInterned: aString ifTrue: symBlock
> you can see that it could be optimized to use a new message, and
> completely eliminate an introduced slowdown of extra block activation.
>

Symbols are stored in WeakSets and AFAIK those aren't changed yet here:
http://bugs.squeak.org/file_download.php?file_id=3829&type=bug

Levente

>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>

1234