Smalltalk › Pharo › Pharo Smalltalk Developers

FasterSets improvements

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

20 messages Options

Ralph Boland

FasterSets improvements

2009/7/1 Henrik Johansen <[hidden email]>:

> I took some time to check this yesterday, my comments on the code itself are
> basically the same as what Lucas posted in the issue tracker.
> Works as advertised, but should probably clean out the "gunk" which is
> unused after it's integrated.
>
> Some feedback on possible improvements:
> - What it does, is basically speed up the adds to the new array in a grow
> that might happen Why not take this all to the logical conclusion, and
> instead optimize an addAll: method to be used in grow, that assumes elements
> are not in set (and themselves are not duplicates)? F.ex. it could also add
> to tally just once, instead of one increase per element as the noCompareAdd:
> does. (an add method with no tally add, no = object check, and no
> growCheck).

>
> - The usual addAll: method can be sped up in a similar way, by performing
> one grow operation before adding elements if necessary (pessimistically
> assuming all elements are unique), then use an add method which does not
> check for grow need at all. (an add method with tally add and = object
> check, but no growCheck)
> If desirable, you could also shrink the capacity after adds are done, to
> match the real amount of new elements.
>
> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
> 1/3rd the time of the regular addAll: method, even with the lower grow cost
> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
> bigger part of cost)
>

I considered this when I created FasterSets. I didn't do this because I was
being conservative. I didn't want FasterSets rejected because it did things
that weren't wanted. I am surprised though at the amount of performance
gain you see.
As things stand I see no reason for not including
FasterSets other than the potential of breaking packages that subclass
Set and then override methods crucial to FasterSets.
My vote is obviously for fixing any packages that get broken rather than
rejecting FasterSets.

> Note: If you want to stay ANSI-compatible, one tricky point in such an
> optimization is adding all objects in collection up to a nil element, then
> raising an error. (ie: must produce equivalent results to add: performed of
> each of collections elements in sequence )
>
> Cheers,
> Henry
>
>

I created FasterSets for adding to Squeak and did not consider Pharo but
of course am delighted to have it considered for Pharo.
You can of course make whatever improvements you want.
I encourage though that you create a FasterSets2 containing all the
improvements and leave the original FasterSets alone. I am still interested
in having FasterSets included in Squeak and I do not know if the powers that
be will want the additional improvements or not so I prefer that both versions
exist.

One thing that I have in my code is a method called nastyAddNew: which
adds an element to a set without checking if the element is already there.
If the element is already there of course it gets added again and your set
contains two copies of the element; thus your set is corrupted. This problem
explains the 'nasty' part of the name nastyAddNew. I also have a method
addNew: which reports an error if the object is already in the set.
nastyAddNew: can be used in set operations such as copy and intersection
making these operations faster and can also be used externally by brave users.

If improvements to FasterSets are being contemplated then I suggest that
adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
is added I see no point in making it private because some users will
use it anyway.
I am not necessarily voting for their inclusion because we are then
giving the user the opportunity to corrupt his data.
Of course I do use them in my current project.
thus, for my current project at least, I am a brave user.

Regards,

Ralph Boland

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

Anyone considered fixing Sets to allow a nil as an element?

2009/7/3 Ralph Boland <[hidden email]>:

> 2009/7/1 Henrik Johansen <[hidden email]>:
>> I took some time to check this yesterday, my comments on the code itself are
>> basically the same as what Lucas posted in the issue tracker.
>> Works as advertised, but should probably clean out the "gunk" which is
>> unused after it's integrated.
>>
>> Some feedback on possible improvements:
>> - What it does, is basically speed up the adds to the new array in a grow
>> that might happen Why not take this all to the logical conclusion, and
>> instead optimize an addAll: method to be used in grow, that assumes elements
>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>> to tally just once, instead of one increase per element as the noCompareAdd:
>> does. (an add method with no tally add, no = object check, and no
>> growCheck).
>
>
>>
>> - The usual addAll: method can be sped up in a similar way, by performing
>> one grow operation before adding elements if necessary (pessimistically
>> assuming all elements are unique), then use an add method which does not
>> check for grow need at all. (an add method with tally add and = object
>> check, but no growCheck)
>> If desirable, you could also shrink the capacity after adds are done, to
>> match the real amount of new elements.
>>
>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>> bigger part of cost)
>>
>
> I considered this when I created FasterSets. I didn't do this because I was
> being conservative. I didn't want FasterSets rejected because it did things
> that weren't wanted. I am surprised though at the amount of performance
> gain you see.
> As things stand I see no reason for not including
> FasterSets other than the potential of breaking packages that subclass
> Set and then override methods crucial to FasterSets.
> My vote is obviously for fixing any packages that get broken rather than
> rejecting FasterSets.
>
>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>> optimization is adding all objects in collection up to a nil element, then
>> raising an error. (ie: must produce equivalent results to add: performed of
>> each of collections elements in sequence )
>>
>> Cheers,
>> Henry
>>
>>
>
> I created FasterSets for adding to Squeak and did not consider Pharo but
> of course am delighted to have it considered for Pharo.
> You can of course make whatever improvements you want.
> I encourage though that you create a FasterSets2 containing all the
> improvements and leave the original FasterSets alone. I am still interested
> in having FasterSets included in Squeak and I do not know if the powers that
> be will want the additional improvements or not so I prefer that both versions
> exist.
>
> One thing that I have in my code is a method called nastyAddNew: which
> adds an element to a set without checking if the element is already there.
> If the element is already there of course it gets added again and your set
> contains two copies of the element; thus your set is corrupted. This problem
> explains the 'nasty' part of the name nastyAddNew. I also have a method
> addNew: which reports an error if the object is already in the set.
> nastyAddNew: can be used in set operations such as copy and intersection
> making these operations faster and can also be used externally by brave users.
>
> If improvements to FasterSets are being contemplated then I suggest that
> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
> is added I see no point in making it private because some users will
> use it anyway.
> I am not necessarily voting for their inclusion because we are then
> giving the user the opportunity to corrupt his data.
> Of course I do use them in my current project.
> thus, for my current project at least, I am a brave user.
>
> Regards,
>
> Ralph Boland
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Andres Valloud-4

Re: FasterSets improvements

Keep in mind that, as they're implemented, there will be always some
object that cannot be added to a Set because the set uses the object as
a sentinel value. On the other hand, since it does not make a whole lot
of sense to add a set to itself, maybe it should use self as the
sentinel value.

Igor Stasenko wrote:

> Anyone considered fixing Sets to allow a nil as an element?
>
> 2009/7/3 Ralph Boland <[hidden email]>:
>
>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>
>>> I took some time to check this yesterday, my comments on the code itself are
>>> basically the same as what Lucas posted in the issue tracker.
>>> Works as advertised, but should probably clean out the "gunk" which is
>>> unused after it's integrated.
>>>
>>> Some feedback on possible improvements:
>>> - What it does, is basically speed up the adds to the new array in a grow
>>> that might happen Why not take this all to the logical conclusion, and
>>> instead optimize an addAll: method to be used in grow, that assumes elements
>>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>>> to tally just once, instead of one increase per element as the noCompareAdd:
>>> does. (an add method with no tally add, no = object check, and no
>>> growCheck).
>>>
>>
>>> - The usual addAll: method can be sped up in a similar way, by performing
>>> one grow operation before adding elements if necessary (pessimistically
>>> assuming all elements are unique), then use an add method which does not
>>> check for grow need at all. (an add method with tally add and = object
>>> check, but no growCheck)
>>> If desirable, you could also shrink the capacity after adds are done, to
>>> match the real amount of new elements.
>>>
>>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>>> bigger part of cost)
>>>
>>>
>> I considered this when I created FasterSets. I didn't do this because I was
>> being conservative. I didn't want FasterSets rejected because it did things
>> that weren't wanted. I am surprised though at the amount of performance
>> gain you see.
>> As things stand I see no reason for not including
>> FasterSets other than the potential of breaking packages that subclass
>> Set and then override methods crucial to FasterSets.
>> My vote is obviously for fixing any packages that get broken rather than
>> rejecting FasterSets.
>>
>>
>>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>>> optimization is adding all objects in collection up to a nil element, then
>>> raising an error. (ie: must produce equivalent results to add: performed of
>>> each of collections elements in sequence )
>>>
>>> Cheers,
>>> Henry
>>>
>>>
>>>
>> I created FasterSets for adding to Squeak and did not consider Pharo but
>> of course am delighted to have it considered for Pharo.
>> You can of course make whatever improvements you want.
>> I encourage though that you create a FasterSets2 containing all the
>> improvements and leave the original FasterSets alone. I am still interested
>> in having FasterSets included in Squeak and I do not know if the powers that
>> be will want the additional improvements or not so I prefer that both versions
>> exist.
>>
>> One thing that I have in my code is a method called nastyAddNew: which
>> adds an element to a set without checking if the element is already there.
>> If the element is already there of course it gets added again and your set
>> contains two copies of the element; thus your set is corrupted. This problem
>> explains the 'nasty' part of the name nastyAddNew. I also have a method
>> addNew: which reports an error if the object is already in the set.
>> nastyAddNew: can be used in set operations such as copy and intersection
>> making these operations faster and can also be used externally by brave users.
>>
>> If improvements to FasterSets are being contemplated then I suggest that
>> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
>> is added I see no point in making it private because some users will
>> use it anyway.
>> I am not necessarily voting for their inclusion because we are then
>> giving the user the opportunity to corrupt his data.
>> Of course I do use them in my current project.
>> thus, for my current project at least, I am a brave user.
>>
>> Regards,
>>
>> Ralph Boland
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
> .
>
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

2009/7/3 Andres Valloud <[hidden email]>:
> Keep in mind that, as they're implemented, there will be always some
> object that cannot be added to a Set because the set uses the object as
> a sentinel value. On the other hand, since it does not make a whole lot
> of sense to add a set to itself, maybe it should use self as the
> sentinel value.
>
There is no difference what object to use as a sentinel - self or nil.
In fact you can use any object which fits your needs.
AND be able to include it as an element of set.

Just to illustrate:

Set>>add: anObject
sentinel == anObject ifTrue: [
includesSentinel := true.
].
.... rest of code ...

Set>>includes: anObject
anObject == sentinel ifTrue: [ ^ indludesSentinel ].
.... rest of code ..

> Igor Stasenko wrote:
>> Anyone considered fixing Sets to allow a nil as an element?
>>
>> 2009/7/3 Ralph Boland <[hidden email]>:
>>
>>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>>
>>>> I took some time to check this yesterday, my comments on the code itself are
>>>> basically the same as what Lucas posted in the issue tracker.
>>>> Works as advertised, but should probably clean out the "gunk" which is
>>>> unused after it's integrated.
>>>>
>>>> Some feedback on possible improvements:
>>>> - What it does, is basically speed up the adds to the new array in a grow
>>>> that might happen Why not take this all to the logical conclusion, and
>>>> instead optimize an addAll: method to be used in grow, that assumes elements
>>>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>>>> to tally just once, instead of one increase per element as the noCompareAdd:
>>>> does. (an add method with no tally add, no = object check, and no
>>>> growCheck).
>>>>
>>>
>>>> - The usual addAll: method can be sped up in a similar way, by performing
>>>> one grow operation before adding elements if necessary (pessimistically
>>>> assuming all elements are unique), then use an add method which does not
>>>> check for grow need at all. (an add method with tally add and = object
>>>> check, but no growCheck)
>>>> If desirable, you could also shrink the capacity after adds are done, to
>>>> match the real amount of new elements.
>>>>
>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>>>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>>>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>>>> bigger part of cost)
>>>>
>>>>
>>> I considered this when I created FasterSets. I didn't do this because I was
>>> being conservative. I didn't want FasterSets rejected because it did things
>>> that weren't wanted. I am surprised though at the amount of performance
>>> gain you see.
>>> As things stand I see no reason for not including
>>> FasterSets other than the potential of breaking packages that subclass
>>> Set and then override methods crucial to FasterSets.
>>> My vote is obviously for fixing any packages that get broken rather than
>>> rejecting FasterSets.
>>>
>>>
>>>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>>>> optimization is adding all objects in collection up to a nil element, then
>>>> raising an error. (ie: must produce equivalent results to add: performed of
>>>> each of collections elements in sequence )
>>>>
>>>> Cheers,
>>>> Henry
>>>>
>>>>
>>>>
>>> I created FasterSets for adding to Squeak and did not consider Pharo but
>>> of course am delighted to have it considered for Pharo.
>>> You can of course make whatever improvements you want.
>>> I encourage though that you create a FasterSets2 containing all the
>>> improvements and leave the original FasterSets alone. I am still interested
>>> in having FasterSets included in Squeak and I do not know if the powers that
>>> be will want the additional improvements or not so I prefer that both versions
>>> exist.
>>>
>>> One thing that I have in my code is a method called nastyAddNew: which
>>> adds an element to a set without checking if the element is already there.
>>> If the element is already there of course it gets added again and your set
>>> contains two copies of the element; thus your set is corrupted. This problem
>>> explains the 'nasty' part of the name nastyAddNew. I also have a method
>>> addNew: which reports an error if the object is already in the set.
>>> nastyAddNew: can be used in set operations such as copy and intersection
>>> making these operations faster and can also be used externally by brave users.
>>>
>>> If improvements to FasterSets are being contemplated then I suggest that
>>> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
>>> is added I see no point in making it private because some users will
>>> use it anyway.
>>> I am not necessarily voting for their inclusion because we are then
>>> giving the user the opportunity to corrupt his data.
>>> Of course I do use them in my current project.
>>> thus, for my current project at least, I am a brave user.
>>>
>>> Regards,
>>>
>>> Ralph Boland
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>> .
>>
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Andres Valloud-4

Re: FasterSets improvements

So you can't have a set with size = 0?... No set isEmpty?...

Igor Stasenko wrote:

> 2009/7/3 Andres Valloud <[hidden email]>:
>
>> Keep in mind that, as they're implemented, there will be always some
>> object that cannot be added to a Set because the set uses the object as
>> a sentinel value. On the other hand, since it does not make a whole lot
>> of sense to add a set to itself, maybe it should use self as the
>> sentinel value.
>>
>>
> There is no difference what object to use as a sentinel - self or nil.
> In fact you can use any object which fits your needs.
> AND be able to include it as an element of set.
>
> Just to illustrate:
>
> Set>>add: anObject
> sentinel == anObject ifTrue: [
> includesSentinel := true.
> ].
> .... rest of code ...
>
> Set>>includes: anObject
> anObject == sentinel ifTrue: [ ^ indludesSentinel ].
> .... rest of code ..
>
>
>
>> Igor Stasenko wrote:
>>
>>> Anyone considered fixing Sets to allow a nil as an element?
>>>
>>> 2009/7/3 Ralph Boland <[hidden email]>:
>>>
>>>
>>>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>>>
>>>>
>>>>> I took some time to check this yesterday, my comments on the code itself are
>>>>> basically the same as what Lucas posted in the issue tracker.
>>>>> Works as advertised, but should probably clean out the "gunk" which is
>>>>> unused after it's integrated.
>>>>>
>>>>> Some feedback on possible improvements:
>>>>> - What it does, is basically speed up the adds to the new array in a grow
>>>>> that might happen Why not take this all to the logical conclusion, and
>>>>> instead optimize an addAll: method to be used in grow, that assumes elements
>>>>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>>>>> to tally just once, instead of one increase per element as the noCompareAdd:
>>>>> does. (an add method with no tally add, no = object check, and no
>>>>> growCheck).
>>>>>
>>>>>
>>>>> - The usual addAll: method can be sped up in a similar way, by performing
>>>>> one grow operation before adding elements if necessary (pessimistically
>>>>> assuming all elements are unique), then use an add method which does not
>>>>> check for grow need at all. (an add method with tally add and = object
>>>>> check, but no growCheck)
>>>>> If desirable, you could also shrink the capacity after adds are done, to
>>>>> match the real amount of new elements.
>>>>>
>>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>>>>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>>>>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>>>>> bigger part of cost)
>>>>>
>>>>>
>>>>>
>>>> I considered this when I created FasterSets. I didn't do this because I was
>>>> being conservative. I didn't want FasterSets rejected because it did things
>>>> that weren't wanted. I am surprised though at the amount of performance
>>>> gain you see.
>>>> As things stand I see no reason for not including
>>>> FasterSets other than the potential of breaking packages that subclass
>>>> Set and then override methods crucial to FasterSets.
>>>> My vote is obviously for fixing any packages that get broken rather than
>>>> rejecting FasterSets.
>>>>
>>>>
>>>>
>>>>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>>>>> optimization is adding all objects in collection up to a nil element, then
>>>>> raising an error. (ie: must produce equivalent results to add: performed of
>>>>> each of collections elements in sequence )
>>>>>
>>>>> Cheers,
>>>>> Henry
>>>>>
>>>>>
>>>>>
>>>>>
>>>> I created FasterSets for adding to Squeak and did not consider Pharo but
>>>> of course am delighted to have it considered for Pharo.
>>>> You can of course make whatever improvements you want.
>>>> I encourage though that you create a FasterSets2 containing all the
>>>> improvements and leave the original FasterSets alone. I am still interested
>>>> in having FasterSets included in Squeak and I do not know if the powers that
>>>> be will want the additional improvements or not so I prefer that both versions
>>>> exist.
>>>>
>>>> One thing that I have in my code is a method called nastyAddNew: which
>>>> adds an element to a set without checking if the element is already there.
>>>> If the element is already there of course it gets added again and your set
>>>> contains two copies of the element; thus your set is corrupted. This problem
>>>> explains the 'nasty' part of the name nastyAddNew. I also have a method
>>>> addNew: which reports an error if the object is already in the set.
>>>> nastyAddNew: can be used in set operations such as copy and intersection
>>>> making these operations faster and can also be used externally by brave users.
>>>>
>>>> If improvements to FasterSets are being contemplated then I suggest that
>>>> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
>>>> is added I see no point in making it private because some users will
>>>> use it anyway.
>>>> I am not necessarily voting for their inclusion because we are then
>>>> giving the user the opportunity to corrupt his data.
>>>> Of course I do use them in my current project.
>>>> thus, for my current project at least, I am a brave user.
>>>>
>>>> Regards,
>>>>
>>>> Ralph Boland
>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [hidden email]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>>
>>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>> .
>>>
>>>
>>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
> .
>
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Andres Valloud-4

Re: FasterSets improvements

In reply to this post by Igor Stasenko

I think it's just a bad idea... enumeration of the set would become
unnecessarily complicated, and a set that includes itself is asking for
trouble. For example, how do you hash the set? How do you determine
whether a set that includes itself is equal to itself? How do these
print themselves? In Smalltalk everything is an object, and
consequently, some object must represent the absence of an object in a
storage slot. That object cannot, at the same time, represent the
presence of something.

Igor Stasenko wrote:

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

In reply to this post by Andres Valloud-4

2009/7/3 Andres Valloud <[hidden email]>:
> So you can't have a set with size = 0?... No set isEmpty?...
>
why?

isEmpty
^ tally == 0 and: [ includesSentinel not]

> Igor Stasenko wrote:
>> 2009/7/3 Andres Valloud <[hidden email]>:
>>
>>> Keep in mind that, as they're implemented, there will be always some
>>> object that cannot be added to a Set because the set uses the object as
>>> a sentinel value. On the other hand, since it does not make a whole lot
>>> of sense to add a set to itself, maybe it should use self as the
>>> sentinel value.
>>>
>>>
>> There is no difference what object to use as a sentinel - self or nil.
>> In fact you can use any object which fits your needs.
>> AND be able to include it as an element of set.
>>
>> Just to illustrate:
>>
>> Set>>add: anObject
>> sentinel == anObject ifTrue: [
>> includesSentinel := true.
>> ].
>> .... rest of code ...
>>
>> Set>>includes: anObject
>> anObject == sentinel ifTrue: [ ^ indludesSentinel ].
>> .... rest of code ..
>>
>>
>>
>>> Igor Stasenko wrote:
>>>
>>>> Anyone considered fixing Sets to allow a nil as an element?
>>>>
>>>> 2009/7/3 Ralph Boland <[hidden email]>:
>>>>
>>>>
>>>>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>>>>
>>>>>
>>>>>> I took some time to check this yesterday, my comments on the code itself are
>>>>>> basically the same as what Lucas posted in the issue tracker.
>>>>>> Works as advertised, but should probably clean out the "gunk" which is
>>>>>> unused after it's integrated.
>>>>>>
>>>>>> Some feedback on possible improvements:
>>>>>> - What it does, is basically speed up the adds to the new array in a grow
>>>>>> that might happen Why not take this all to the logical conclusion, and
>>>>>> instead optimize an addAll: method to be used in grow, that assumes elements
>>>>>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>>>>>> to tally just once, instead of one increase per element as the noCompareAdd:
>>>>>> does. (an add method with no tally add, no = object check, and no
>>>>>> growCheck).
>>>>>>
>>>>>>
>>>>>> - The usual addAll: method can be sped up in a similar way, by performing
>>>>>> one grow operation before adding elements if necessary (pessimistically
>>>>>> assuming all elements are unique), then use an add method which does not
>>>>>> check for grow need at all. (an add method with tally add and = object
>>>>>> check, but no growCheck)
>>>>>> If desirable, you could also shrink the capacity after adds are done, to
>>>>>> match the real amount of new elements.
>>>>>>
>>>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>>>>>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>>>>>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>>>>>> bigger part of cost)
>>>>>>
>>>>>>
>>>>>>
>>>>> I considered this when I created FasterSets. I didn't do this because I was
>>>>> being conservative. I didn't want FasterSets rejected because it did things
>>>>> that weren't wanted. I am surprised though at the amount of performance
>>>>> gain you see.
>>>>> As things stand I see no reason for not including
>>>>> FasterSets other than the potential of breaking packages that subclass
>>>>> Set and then override methods crucial to FasterSets.
>>>>> My vote is obviously for fixing any packages that get broken rather than
>>>>> rejecting FasterSets.
>>>>>
>>>>>
>>>>>
>>>>>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>>>>>> optimization is adding all objects in collection up to a nil element, then
>>>>>> raising an error. (ie: must produce equivalent results to add: performed of
>>>>>> each of collections elements in sequence )
>>>>>>
>>>>>> Cheers,
>>>>>> Henry
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> I created FasterSets for adding to Squeak and did not consider Pharo but
>>>>> of course am delighted to have it considered for Pharo.
>>>>> You can of course make whatever improvements you want.
>>>>> I encourage though that you create a FasterSets2 containing all the
>>>>> improvements and leave the original FasterSets alone. I am still interested
>>>>> in having FasterSets included in Squeak and I do not know if the powers that
>>>>> be will want the additional improvements or not so I prefer that both versions
>>>>> exist.
>>>>>
>>>>> One thing that I have in my code is a method called nastyAddNew: which
>>>>> adds an element to a set without checking if the element is already there.
>>>>> If the element is already there of course it gets added again and your set
>>>>> contains two copies of the element; thus your set is corrupted. This problem
>>>>> explains the 'nasty' part of the name nastyAddNew. I also have a method
>>>>> addNew: which reports an error if the object is already in the set.
>>>>> nastyAddNew: can be used in set operations such as copy and intersection
>>>>> making these operations faster and can also be used externally by brave users.
>>>>>
>>>>> If improvements to FasterSets are being contemplated then I suggest that
>>>>> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
>>>>> is added I see no point in making it private because some users will
>>>>> use it anyway.
>>>>> I am not necessarily voting for their inclusion because we are then
>>>>> giving the user the opportunity to corrupt his data.
>>>>> Of course I do use them in my current project.
>>>>> thus, for my current project at least, I am a brave user.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Ralph Boland
>>>>>
>>>>> _______________________________________________
>>>>> Pharo-project mailing list
>>>>> [hidden email]
>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [hidden email]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>> .
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>> .
>>
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

In reply to this post by Andres Valloud-4

2009/7/3 Andres Valloud <[hidden email]>:
> I think it's just a bad idea... enumeration of the set would become
> unnecessarily complicated, and a set that includes itself is asking for
> trouble. For example, how do you hash the set? How do you determine
> whether a set that includes itself is equal to itself? How do these
> print themselves? In Smalltalk everything is an object, and
> consequently, some object must represent the absence of an object in a
> storage slot. That object cannot, at the same time, represent the
> presence of something.
>
well, you can try and see what will happen if you try this:

set := Set new.
set add: set.
set printString.

or try using Array instead of set.
Or try using any other object which prints its slots, and put in one
of its slots itself.
What makes a Set an exceptional in this regard?

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Andres Valloud-4

Re: FasterSets improvements

Unlike arrayed collections, hashed collections use a sentinel value for
implementation purposes. Such collections shouldn't be allowed to
contain the sentinel because doing so introduces unnecessary complexity
for the vast majority of practical applications. It's not worth it. If
adding nil to a set is important, then using self as the sentinel could
do (at the cost of initializing the set to contain self at all slots
when the instance is created, thus duplicating the nilling the VM does
for you). But then, the set shouldn't contain itself.

Now, this is just my opinion... :).

Igor Stasenko wrote:

> 2009/7/3 Andres Valloud <[hidden email]>:
>
>> I think it's just a bad idea... enumeration of the set would become
>> unnecessarily complicated, and a set that includes itself is asking for
>> trouble. For example, how do you hash the set? How do you determine
>> whether a set that includes itself is equal to itself? How do these
>> print themselves? In Smalltalk everything is an object, and
>> consequently, some object must represent the absence of an object in a
>> storage slot. That object cannot, at the same time, represent the
>> presence of something.
>>
>>
> well, you can try and see what will happen if you try this:
>
> set := Set new.
> set add: set.
> set printString.
>
> or try using Array instead of set.
> Or try using any other object which prints its slots, and put in one
> of its slots itself.
> What makes a Set an exceptional in this regard?
>
>
>
>> Igor Stasenko wrote:
>>
>>> 2009/7/3 Andres Valloud <[hidden email]>:
>>>
>>>
>>>> Keep in mind that, as they're implemented, there will be always some
>>>> object that cannot be added to a Set because the set uses the object as
>>>> a sentinel value. On the other hand, since it does not make a whole lot
>>>> of sense to add a set to itself, maybe it should use self as the
>>>> sentinel value.
>>>>
>>>>
>>>>
>>> There is no difference what object to use as a sentinel - self or nil.
>>> In fact you can use any object which fits your needs.
>>> AND be able to include it as an element of set.
>>>
>>> Just to illustrate:
>>>
>>> Set>>add: anObject
>>> sentinel == anObject ifTrue: [
>>> includesSentinel := true.
>>> ].
>>> .... rest of code ...
>>>
>>> Set>>includes: anObject
>>> anObject == sentinel ifTrue: [ ^ indludesSentinel ].
>>> .... rest of code ..
>>>
>>>
>>>
>>>
>>>> Igor Stasenko wrote:
>>>>
>>>>
>>>>> Anyone considered fixing Sets to allow a nil as an element?
>>>>>
>>>>> 2009/7/3 Ralph Boland <[hidden email]>:
>>>>>
>>>>>
>>>>>
>>>>>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I took some time to check this yesterday, my comments on the code itself are
>>>>>>> basically the same as what Lucas posted in the issue tracker.
>>>>>>> Works as advertised, but should probably clean out the "gunk" which is
>>>>>>> unused after it's integrated.
>>>>>>>
>>>>>>> Some feedback on possible improvements:
>>>>>>> - What it does, is basically speed up the adds to the new array in a grow
>>>>>>> that might happen Why not take this all to the logical conclusion, and
>>>>>>> instead optimize an addAll: method to be used in grow, that assumes elements
>>>>>>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>>>>>>> to tally just once, instead of one increase per element as the noCompareAdd:
>>>>>>> does. (an add method with no tally add, no = object check, and no
>>>>>>> growCheck).
>>>>>>>
>>>>>>>
>>>>>>> - The usual addAll: method can be sped up in a similar way, by performing
>>>>>>> one grow operation before adding elements if necessary (pessimistically
>>>>>>> assuming all elements are unique), then use an add method which does not
>>>>>>> check for grow need at all. (an add method with tally add and = object
>>>>>>> check, but no growCheck)
>>>>>>> If desirable, you could also shrink the capacity after adds are done, to
>>>>>>> match the real amount of new elements.
>>>>>>>
>>>>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>>>>>>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>>>>>>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>>>>>>> bigger part of cost)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I considered this when I created FasterSets. I didn't do this because I was
>>>>>> being conservative. I didn't want FasterSets rejected because it did things
>>>>>> that weren't wanted. I am surprised though at the amount of performance
>>>>>> gain you see.
>>>>>> As things stand I see no reason for not including
>>>>>> FasterSets other than the potential of breaking packages that subclass
>>>>>> Set and then override methods crucial to FasterSets.
>>>>>> My vote is obviously for fixing any packages that get broken rather than
>>>>>> rejecting FasterSets.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>>>>>>> optimization is adding all objects in collection up to a nil element, then
>>>>>>> raising an error. (ie: must produce equivalent results to add: performed of
>>>>>>> each of collections elements in sequence )
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Henry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I created FasterSets for adding to Squeak and did not consider Pharo but
>>>>>> of course am delighted to have it considered for Pharo.
>>>>>> You can of course make whatever improvements you want.
>>>>>> I encourage though that you create a FasterSets2 containing all the
>>>>>> improvements and leave the original FasterSets alone. I am still interested
>>>>>> in having FasterSets included in Squeak and I do not know if the powers that
>>>>>> be will want the additional improvements or not so I prefer that both versions
>>>>>> exist.
>>>>>>
>>>>>> One thing that I have in my code is a method called nastyAddNew: which
>>>>>> adds an element to a set without checking if the element is already there.
>>>>>> If the element is already there of course it gets added again and your set
>>>>>> contains two copies of the element; thus your set is corrupted. This problem
>>>>>> explains the 'nasty' part of the name nastyAddNew. I also have a method
>>>>>> addNew: which reports an error if the object is already in the set.
>>>>>> nastyAddNew: can be used in set operations such as copy and intersection
>>>>>> making these operations faster and can also be used externally by brave users.
>>>>>>
>>>>>> If improvements to FasterSets are being contemplated then I suggest that
>>>>>> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
>>>>>> is added I see no point in making it private because some users will
>>>>>> use it anyway.
>>>>>> I am not necessarily voting for their inclusion because we are then
>>>>>> giving the user the opportunity to corrupt his data.
>>>>>> Of course I do use them in my current project.
>>>>>> thus, for my current project at least, I am a brave user.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Ralph Boland
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pharo-project mailing list
>>>>>> [hidden email]
>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Igor Stasenko AKA sig.
>>>>>
>>>>> _______________________________________________
>>>>> Pharo-project mailing list
>>>>> [hidden email]
>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>> .
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [hidden email]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>>
>>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>> .
>>>
>>>
>>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
> .
>
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

2009/7/3 Andres Valloud <[hidden email]>:
> Unlike arrayed collections, hashed collections use a sentinel value for
> implementation purposes. Such collections shouldn't be allowed to
> contain the sentinel because doing so introduces unnecessary complexity
> for the vast majority of practical applications. It's not worth it. If
> adding nil to a set is important, then using self as the sentinel could
> do (at the cost of initializing the set to contain self at all slots
> when the instance is created, thus duplicating the nilling the VM does
> for you). But then, the set shouldn't contain itself.
>

I think you are mixing an object's interface with its inner
implementation details.
A Set can include any object in itself, without any discrimination.
But its implemented in such way that it's impossible to include nil in it.
So, its not worth proving that due to limitation, which comes from
implementation detail, its not worth to have Sets which can include
nils.

> Now, this is just my opinion... :).
>
> Igor Stasenko wrote:
>> 2009/7/3 Andres Valloud <[hidden email]>:
>>
>>> I think it's just a bad idea... enumeration of the set would become
>>> unnecessarily complicated, and a set that includes itself is asking for
>>> trouble. For example, how do you hash the set? How do you determine
>>> whether a set that includes itself is equal to itself? How do these
>>> print themselves? In Smalltalk everything is an object, and
>>> consequently, some object must represent the absence of an object in a
>>> storage slot. That object cannot, at the same time, represent the
>>> presence of something.
>>>
>>>
>> well, you can try and see what will happen if you try this:
>>
>> set := Set new.
>> set add: set.
>> set printString.
>>
>> or try using Array instead of set.
>> Or try using any other object which prints its slots, and put in one
>> of its slots itself.
>> What makes a Set an exceptional in this regard?
>>
>>
>>
>>> Igor Stasenko wrote:
>>>
>>>> 2009/7/3 Andres Valloud <[hidden email]>:
>>>>
>>>>
>>>>> Keep in mind that, as they're implemented, there will be always some
>>>>> object that cannot be added to a Set because the set uses the object as
>>>>> a sentinel value. On the other hand, since it does not make a whole lot
>>>>> of sense to add a set to itself, maybe it should use self as the
>>>>> sentinel value.
>>>>>
>>>>>
>>>>>
>>>> There is no difference what object to use as a sentinel - self or nil.
>>>> In fact you can use any object which fits your needs.
>>>> AND be able to include it as an element of set.
>>>>
>>>> Just to illustrate:
>>>>
>>>> Set>>add: anObject
>>>> sentinel == anObject ifTrue: [
>>>> includesSentinel := true.
>>>> ].
>>>> .... rest of code ...
>>>>
>>>> Set>>includes: anObject
>>>> anObject == sentinel ifTrue: [ ^ indludesSentinel ].
>>>> .... rest of code ..
>>>>
>>>>
>>>>
>>>>
>>>>> Igor Stasenko wrote:
>>>>>
>>>>>
>>>>>> Anyone considered fixing Sets to allow a nil as an element?
>>>>>>
>>>>>> 2009/7/3 Ralph Boland <[hidden email]>:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I took some time to check this yesterday, my comments on the code itself are
>>>>>>>> basically the same as what Lucas posted in the issue tracker.
>>>>>>>> Works as advertised, but should probably clean out the "gunk" which is
>>>>>>>> unused after it's integrated.
>>>>>>>>
>>>>>>>> Some feedback on possible improvements:
>>>>>>>> - What it does, is basically speed up the adds to the new array in a grow
>>>>>>>> that might happen Why not take this all to the logical conclusion, and
>>>>>>>> instead optimize an addAll: method to be used in grow, that assumes elements
>>>>>>>> are not in set (and themselves are not duplicates)? F.ex. it could also add
>>>>>>>> to tally just once, instead of one increase per element as the noCompareAdd:
>>>>>>>> does. (an add method with no tally add, no = object check, and no
>>>>>>>> growCheck).
>>>>>>>>
>>>>>>>>
>>>>>>>> - The usual addAll: method can be sped up in a similar way, by performing
>>>>>>>> one grow operation before adding elements if necessary (pessimistically
>>>>>>>> assuming all elements are unique), then use an add method which does not
>>>>>>>> check for grow need at all. (an add method with tally add and = object
>>>>>>>> check, but no growCheck)
>>>>>>>> If desirable, you could also shrink the capacity after adds are done, to
>>>>>>>> match the real amount of new elements.
>>>>>>>>
>>>>>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method took 1/2 to
>>>>>>>> 1/3rd the time of the regular addAll: method, even with the lower grow cost
>>>>>>>> of FasterSets. (Lower gains if Set is presized, and if hash-calculation is
>>>>>>>> bigger part of cost)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I considered this when I created FasterSets. I didn't do this because I was
>>>>>>> being conservative. I didn't want FasterSets rejected because it did things
>>>>>>> that weren't wanted. I am surprised though at the amount of performance
>>>>>>> gain you see.
>>>>>>> As things stand I see no reason for not including
>>>>>>> FasterSets other than the potential of breaking packages that subclass
>>>>>>> Set and then override methods crucial to FasterSets.
>>>>>>> My vote is obviously for fixing any packages that get broken rather than
>>>>>>> rejecting FasterSets.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Note: If you want to stay ANSI-compatible, one tricky point in such an
>>>>>>>> optimization is adding all objects in collection up to a nil element, then
>>>>>>>> raising an error. (ie: must produce equivalent results to add: performed of
>>>>>>>> each of collections elements in sequence )
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Henry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I created FasterSets for adding to Squeak and did not consider Pharo but
>>>>>>> of course am delighted to have it considered for Pharo.
>>>>>>> You can of course make whatever improvements you want.
>>>>>>> I encourage though that you create a FasterSets2 containing all the
>>>>>>> improvements and leave the original FasterSets alone. I am still interested
>>>>>>> in having FasterSets included in Squeak and I do not know if the powers that
>>>>>>> be will want the additional improvements or not so I prefer that both versions
>>>>>>> exist.
>>>>>>>
>>>>>>> One thing that I have in my code is a method called nastyAddNew: which
>>>>>>> adds an element to a set without checking if the element is already there.
>>>>>>> If the element is already there of course it gets added again and your set
>>>>>>> contains two copies of the element; thus your set is corrupted. This problem
>>>>>>> explains the 'nasty' part of the name nastyAddNew. I also have a method
>>>>>>> addNew: which reports an error if the object is already in the set.
>>>>>>> nastyAddNew: can be used in set operations such as copy and intersection
>>>>>>> making these operations faster and can also be used externally by brave users.
>>>>>>>
>>>>>>> If improvements to FasterSets are being contemplated then I suggest that
>>>>>>> adding methods addNew: and nastyAddNew: be considered. If nastyAddNew:
>>>>>>> is added I see no point in making it private because some users will
>>>>>>> use it anyway.
>>>>>>> I am not necessarily voting for their inclusion because we are then
>>>>>>> giving the user the opportunity to corrupt his data.
>>>>>>> Of course I do use them in my current project.
>>>>>>> thus, for my current project at least, I am a brave user.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ralph Boland
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pharo-project mailing list
>>>>>>> [hidden email]
>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Igor Stasenko AKA sig.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pharo-project mailing list
>>>>>> [hidden email]
>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>> .
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> Pharo-project mailing list
>>>>> [hidden email]
>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>> .
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [hidden email]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>> .
>>
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

Here the implementation of Set which can include nils.
Where do you see so-called 'complexity'?
I had to override only 8 methods.

2009/7/3 Igor Stasenko <[hidden email]>:

> 2009/7/3 Andres Valloud <[hidden email]>:
>> Unlike arrayed collections, hashed collections use a sentinel value for
>> implementation purposes. Such collections shouldn't be allowed to
>> contain the sentinel because doing so introduces unnecessary complexity
>> for the vast majority of practical applications. It's not worth it. If
>> adding nil to a set is important, then using self as the sentinel could
>> do (at the cost of initializing the set to contain self at all slots
>> when the instance is created, thus duplicating the nilling the VM does
>> for you). But then, the set shouldn't contain itself.
>>
>
> I think you are mixing an object's interface with its inner
> implementation details.
> A Set can include any object in itself, without any discrimination.
> But its implemented in such way that it's impossible to include nil in it.
> So, its not worth proving that due to limitation, which comes from
> implementation detail, its not worth to have Sets which can include
> nils.
>

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

GenericSet.st (3K) Download Attachment

Andres Valloud-4

Re: FasterSets improvements

Why does #like: have to be disabled? Why can't you provide an
implementation for it?

Every add: has an additional nil check. I'd bet that in 99.9999% of the
cases, the nil check is effectively overhead. In efficient
implementations of hashed collections, you can tell when you add a few
message sends because it costs about 10% (as I measured for
findElementOrNil: in VW's IdentitySet).

Besides the additional complexity, is do: actually correct? How can a
set includeNil but have a tally of 0? I am suspicious that add: should
increase the tally when you add nil (more complexity).

Every removal has an additional nil check. I also note that the tally
is not updated either...

So. I wanted to point out that these 8 refined methods give you:

* Inflexible implementation. If Set changes, then you have to change
GenericSet too.

* Additional code complexity that people have to deal with.

* But you can add nil to a Set.

Let's say you write code with GenericSet and use the fact that you can
add nils. In the vast majority of Smalltalks (all?), you can't add nil
to a set. So, if you want to port code, you have to port GenericSet
(and apply the necessary refinements N times). You can't use the
everyday Set that IIRC is specified in the ANSI standard.

Again, is it worth the effort?

Finally, of course you compare GenericSet to e.g.: HPS, and there's no
complexity at all. Nevertheless, I've come to think that "I can do it"
generally doesn't mean "I should do it".

Andres.

Igor Stasenko wrote:

> Here the implementation of Set which can include nils.
> Where do you see so-called 'complexity'?
> I had to override only 8 methods.
>
>
> 2009/7/3 Igor Stasenko <[hidden email]>:
>
>> 2009/7/3 Andres Valloud <[hidden email]>:
>>
>>> Unlike arrayed collections, hashed collections use a sentinel value for
>>> implementation purposes. Such collections shouldn't be allowed to
>>> contain the sentinel because doing so introduces unnecessary complexity
>>> for the vast majority of practical applications. It's not worth it. If
>>> adding nil to a set is important, then using self as the sentinel could
>>> do (at the cost of initializing the set to contain self at all slots
>>> when the instance is created, thus duplicating the nilling the VM does
>>> for you). But then, the set shouldn't contain itself.
>>>
>>>
>> I think you are mixing an object's interface with its inner
>> implementation details.
>> A Set can include any object in itself, without any discrimination.
>> But its implemented in such way that it's impossible to include nil in it.
>> So, its not worth proving that due to limitation, which comes from
>> implementation detail, its not worth to have Sets which can include
>> nils.
>>
>>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Stéphane Ducasse

Re: FasterSets improvements

In reply to this post by Igor Stasenko

This would be good
Once eliot mentioned that we need a new primitive for new which
allocate with a passed arg (if I remember well)
and some days after tim disappeared from squeak world.

On Jul 3, 2009, at 4:53 AM, Igor Stasenko wrote:

> Anyone considered fixing Sets to allow a nil as an element?
>
> 2009/7/3 Ralph Boland <[hidden email]>:
>> 2009/7/1 Henrik Johansen <[hidden email]>:
>>> I took some time to check this yesterday, my comments on the code
>>> itself are
>>> basically the same as what Lucas posted in the issue tracker.
>>> Works as advertised, but should probably clean out the "gunk"
>>> which is
>>> unused after it's integrated.
>>>
>>> Some feedback on possible improvements:
>>> - What it does, is basically speed up the adds to the new array in
>>> a grow
>>> that might happen Why not take this all to the logical conclusion,
>>> and
>>> instead optimize an addAll: method to be used in grow, that
>>> assumes elements
>>> are not in set (and themselves are not duplicates)? F.ex. it could
>>> also add
>>> to tally just once, instead of one increase per element as the
>>> noCompareAdd:
>>> does. (an add method with no tally add, no = object check, and no
>>> growCheck).
>>
>>
>>>
>>> - The usual addAll: method can be sped up in a similar way, by
>>> performing
>>> one grow operation before adding elements if necessary
>>> (pessimistically
>>> assuming all elements are unique), then use an add method which
>>> does not
>>> check for grow need at all. (an add method with tally add and =
>>> object
>>> check, but no growCheck)
>>> If desirable, you could also shrink the capacity after adds are
>>> done, to
>>> match the real amount of new elements.
>>>
>>> Adding (1 to: 1200000) to a new Set with such an addAll: method
>>> took 1/2 to
>>> 1/3rd the time of the regular addAll: method, even with the lower
>>> grow cost
>>> of FasterSets. (Lower gains if Set is presized, and if hash-
>>> calculation is
>>> bigger part of cost)
>>>
>>
>> I considered this when I created FasterSets. I didn't do this
>> because I was
>> being conservative. I didn't want FasterSets rejected because it
>> did things
>> that weren't wanted. I am surprised though at the amount of
>> performance
>> gain you see.
>> As things stand I see no reason for not including
>> FasterSets other than the potential of breaking packages that
>> subclass
>> Set and then override methods crucial to FasterSets.
>> My vote is obviously for fixing any packages that get broken rather
>> than
>> rejecting FasterSets.
>>
>>> Note: If you want to stay ANSI-compatible, one tricky point in
>>> such an
>>> optimization is adding all objects in collection up to a nil
>>> element, then
>>> raising an error. (ie: must produce equivalent results to add:
>>> performed of
>>> each of collections elements in sequence )
>>>
>>> Cheers,
>>> Henry
>>>
>>>
>>
>> I created FasterSets for adding to Squeak and did not consider
>> Pharo but
>> of course am delighted to have it considered for Pharo.
>> You can of course make whatever improvements you want.
>> I encourage though that you create a FasterSets2 containing all the
>> improvements and leave the original FasterSets alone. I am still
>> interested
>> in having FasterSets included in Squeak and I do not know if the
>> powers that
>> be will want the additional improvements or not so I prefer that
>> both versions
>> exist.
>>
>> One thing that I have in my code is a method called nastyAddNew:
>> which
>> adds an element to a set without checking if the element is already
>> there.
>> If the element is already there of course it gets added again and
>> your set
>> contains two copies of the element; thus your set is corrupted.
>> This problem
>> explains the 'nasty' part of the name nastyAddNew. I also have a
>> method
>> addNew: which reports an error if the object is already in the set.
>> nastyAddNew: can be used in set operations such as copy and
>> intersection
>> making these operations faster and can also be used externally by
>> brave users.
>>
>> If improvements to FasterSets are being contemplated then I suggest
>> that
>> adding methods addNew: and nastyAddNew: be considered. If
>> nastyAddNew:
>> is added I see no point in making it private because some users will
>> use it anyway.
>> I am not necessarily voting for their inclusion because we are then
>> giving the user the opportunity to corrupt his data.
>> Of course I do use them in my current project.
>> thus, for my current project at least, I am a brave user.
>>
>> Regards,
>>
>> Ralph Boland
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Stéphane Ducasse

Re: FasterSets improvements

In reply to this post by Andres Valloud-4

Hi andres

thanks for your comments. I thought that in ANSI you can have nil in
Set.
Is it not the case in VW?

Stef

On Jul 3, 2009, at 8:18 AM, Andres Valloud wrote:

> Why does #like: have to be disabled? Why can't you provide an
> implementation for it?
>
> Every add: has an additional nil check. I'd bet that in 99.9999% of
> the
> cases, the nil check is effectively overhead. In efficient
> implementations of hashed collections, you can tell when you add a few
> message sends because it costs about 10% (as I measured for
> findElementOrNil: in VW's IdentitySet).
>
> Besides the additional complexity, is do: actually correct? How can a
> set includeNil but have a tally of 0? I am suspicious that add:
> should
> increase the tally when you add nil (more complexity).
>
> Every removal has an additional nil check. I also note that the tally
> is not updated either...
>
> So. I wanted to point out that these 8 refined methods give you:
>
> * Inflexible implementation. If Set changes, then you have to change
> GenericSet too.
>
> * Additional code complexity that people have to deal with.
>
> * But you can add nil to a Set.
>
> Let's say you write code with GenericSet and use the fact that you can
> add nils. In the vast majority of Smalltalks (all?), you can't add
> nil
> to a set. So, if you want to port code, you have to port GenericSet
> (and apply the necessary refinements N times). You can't use the
> everyday Set that IIRC is specified in the ANSI standard.
>
> Again, is it worth the effort?
>
> Finally, of course you compare GenericSet to e.g.: HPS, and there's no
> complexity at all. Nevertheless, I've come to think that "I can do
> it"
> generally doesn't mean "I should do it".
>
> Andres.
>
>
> Igor Stasenko wrote:
>> Here the implementation of Set which can include nils.
>> Where do you see so-called 'complexity'?
>> I had to override only 8 methods.
>>
>>
>> 2009/7/3 Igor Stasenko <[hidden email]>:
>>
>>> 2009/7/3 Andres Valloud <[hidden email]>:
>>>
>>>> Unlike arrayed collections, hashed collections use a sentinel
>>>> value for
>>>> implementation purposes. Such collections shouldn't be allowed to
>>>> contain the sentinel because doing so introduces unnecessary
>>>> complexity
>>>> for the vast majority of practical applications. It's not worth
>>>> it. If
>>>> adding nil to a set is important, then using self as the sentinel
>>>> could
>>>> do (at the cost of initializing the set to contain self at all
>>>> slots
>>>> when the instance is created, thus duplicating the nilling the VM
>>>> does
>>>> for you). But then, the set shouldn't contain itself.
>>>>
>>>>
>>> I think you are mixing an object's interface with its inner
>>> implementation details.
>>> A Set can include any object in itself, without any discrimination.
>>> But its implemented in such way that it's impossible to include
>>> nil in it.
>>> So, its not worth proving that due to limitation, which comes from
>>> implementation detail, its not worth to have Sets which can include
>>> nils.
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Stéphane Ducasse

Re: FasterSets improvements

In reply to this post by Igor Stasenko

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Andres Valloud-4

Re: FasterSets improvements

In reply to this post by Stéphane Ducasse

IIRC, the behavior of adding nil to a Set is undefined as per ANSI. In
VW, adding nil is a no-op.

Stéphane Ducasse wrote:

> Hi andres
>
> thanks for your comments. I thought that in ANSI you can have nil in
> Set.
> Is it not the case in VW?
>
> Stef
>
>
> On Jul 3, 2009, at 8:18 AM, Andres Valloud wrote:
>
>
>> Why does #like: have to be disabled? Why can't you provide an
>> implementation for it?
>>
>> Every add: has an additional nil check. I'd bet that in 99.9999% of
>> the
>> cases, the nil check is effectively overhead. In efficient
>> implementations of hashed collections, you can tell when you add a few
>> message sends because it costs about 10% (as I measured for
>> findElementOrNil: in VW's IdentitySet).
>>
>> Besides the additional complexity, is do: actually correct? How can a
>> set includeNil but have a tally of 0? I am suspicious that add:
>> should
>> increase the tally when you add nil (more complexity).
>>
>> Every removal has an additional nil check. I also note that the tally
>> is not updated either...
>>
>> So. I wanted to point out that these 8 refined methods give you:
>>
>> * Inflexible implementation. If Set changes, then you have to change
>> GenericSet too.
>>
>> * Additional code complexity that people have to deal with.
>>
>> * But you can add nil to a Set.
>>
>> Let's say you write code with GenericSet and use the fact that you can
>> add nils. In the vast majority of Smalltalks (all?), you can't add
>> nil
>> to a set. So, if you want to port code, you have to port GenericSet
>> (and apply the necessary refinements N times). You can't use the
>> everyday Set that IIRC is specified in the ANSI standard.
>>
>> Again, is it worth the effort?
>>
>> Finally, of course you compare GenericSet to e.g.: HPS, and there's no
>> complexity at all. Nevertheless, I've come to think that "I can do
>> it"
>> generally doesn't mean "I should do it".
>>
>> Andres.
>>
>>
>> Igor Stasenko wrote:
>>
>>> Here the implementation of Set which can include nils.
>>> Where do you see so-called 'complexity'?
>>> I had to override only 8 methods.
>>>
>>>
>>> 2009/7/3 Igor Stasenko <[hidden email]>:
>>>
>>>
>>>> 2009/7/3 Andres Valloud <[hidden email]>:
>>>>
>>>>
>>>>> Unlike arrayed collections, hashed collections use a sentinel
>>>>> value for
>>>>> implementation purposes. Such collections shouldn't be allowed to
>>>>> contain the sentinel because doing so introduces unnecessary
>>>>> complexity
>>>>> for the vast majority of practical applications. It's not worth
>>>>> it. If
>>>>> adding nil to a set is important, then using self as the sentinel
>>>>> could
>>>>> do (at the cost of initializing the set to contain self at all
>>>>> slots
>>>>> when the instance is created, thus duplicating the nilling the VM
>>>>> does
>>>>> for you). But then, the set shouldn't contain itself.
>>>>>
>>>>>
>>>>>
>>>> I think you are mixing an object's interface with its inner
>>>> implementation details.
>>>> A Set can include any object in itself, without any discrimination.
>>>> But its implemented in such way that it's impossible to include
>>>> nil in it.
>>>> So, its not worth proving that due to limitation, which comes from
>>>> implementation detail, its not worth to have Sets which can include
>>>> nils.
>>>>
>>>>
>>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>>
>> _______________________________________________
>> Pharo-project mailing list
>> [hidden email]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>
> .
>
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: FasterSets improvements

In reply to this post by Andres Valloud-4

2009/7/3 Andres Valloud <[hidden email]>:
> Why does #like: have to be disabled? Why can't you provide an
> implementation for it?
>
> Every add: has an additional nil check. I'd bet that in 99.9999% of the
> cases, the nil check is effectively overhead. In efficient
> implementations of hashed collections, you can tell when you add a few
> message sends because it costs about 10% (as I measured for
> findElementOrNil: in VW's IdentitySet).
>
you forgot that #add: in Set has an additional nil check already :)

> Besides the additional complexity, is do: actually correct? How can a
> set includeNil but have a tally of 0? I am suspicious that add: should
> increase the tally when you add nil (more complexity).
>
again, 'tally' is an internal state. Not something you can see or
meaningfully use outside.

> Every removal has an additional nil check. I also note that the tally
> is not updated either...
>
well, you can test it, if you think it wrongly implemented.

> So. I wanted to point out that these 8 refined methods give you:
>
> * Inflexible implementation. If Set changes, then you have to change
> GenericSet too.
>
can't accept such argument. Should i implement it from scratch by
subclassing from Object?

> * Additional code complexity that people have to deal with.
>
where? and what you mean by 'deal with'?
Did i extended a Set protocol, or something like that?

> * But you can add nil to a Set.
>
> Let's say you write code with GenericSet and use the fact that you can
> add nils. In the vast majority of Smalltalks (all?), you can't add nil
> to a set. So, if you want to port code, you have to port GenericSet
> (and apply the necessary refinements N times). You can't use the
> everyday Set that IIRC is specified in the ANSI standard.
>
I can say similar about situations where you need to use sets where it
meaningfully can contain nils.
Each time, in every such place, you have to write a workarounds to
count nil separately.
Now compare this with a class, which provides you this in a simple and
usual manner.

> Again, is it worth the effort?
>
> Finally, of course you compare GenericSet to e.g.: HPS, and there's no
> complexity at all. Nevertheless, I've come to think that "I can do it"
> generally doesn't mean "I should do it".
>

Sure thing. But you will recall my words each time when you will need
to deal with nil, as with useful and valuable object, which you can
treat in same way as others.

> Andres.
>
>

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Stéphane Ducasse

Re: FasterSets improvements

In reply to this post by Andres Valloud-4

Thanks

> IIRC, the behavior of adding nil to a Set is undefined as per ANSI.
> In
> VW, adding nil is a no-op.

In that case I would not let nil to be a possible element and avoid
incompatibility.
But this is strange since I remember having problem to port code from
VW to Squeak

Stef

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Reinout Heeck

Re: FasterSets improvements

>> In
>> VW, adding nil is a no-op.
>

Which is a bug that still needs fixing, it should raise an exception
(like dictionary does in VW).

> In that case I would not let nil to be a possible element and avoid
> incompatibility.

Oh, how compatible does Pharo need to be/intend to be with bugs of
other platforms?

R
-

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Andres Valloud-4

Re: FasterSets improvements

Whether it's a bug depends on your point of view. According to the ANSI
spec, it's not a bug because the behavior in that case is undefined. On
the other hand, at first sight it seems Set should behave like Dictionary...

Reinout Heeck wrote:

>>> In
>>> VW, adding nil is a no-op.
>>>
>
> Which is a bug that still needs fixing, it should raise an exception
> (like dictionary does in VW).
>
>
>
>> In that case I would not let nil to be a possible element and avoid
>> incompatibility.
>>
>
> Oh, how compatible does Pharo need to be/intend to be with bugs of
> other platforms?
>
>
>
> R
> -
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project