Sets which not allowing contain nil as element is a point of inconvenience.
There are two ways how get around that: - initialize an array which contains set elements with unique object ~~ nil. and fix methods which testing for an empty slots to compare against this object, not nil. This can cause a slowdown during rehashing, because VM initially creating arrays filled with nils, while we need to fill them with another object. There is also a problem with preserving it 'uniqueness' by not giving this object outside a set. - add an instVar 'containsNil' then when set receiving 'add: nil' , it simply sets this flag to true. modify #collect: , #do: , #remove: to be aware of flag value. I find the second way is more appropriate. While it costs additional memory per Set/IdentitySet instance, it costs almost nothing in speed. What do you think about supporting Sets to contain nils in general, and about methods how to achieve that? -- Best regards, Igor Stasenko AKA sig. |
I'll ask the inevitable question... given that ability, what do you really want to do? :) -C |
In reply to this post by Igor Stasenko
>>>>> "Igor" == Igor Stasenko <[hidden email]> writes:
Igor> What do you think about supporting Sets to contain nils in general, Igor> and about methods how to achieve that? I ran across this problem as well. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <[hidden email]> <URL:http://www.stonehenge.com/merlyn/> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc. See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion |
In reply to this post by Igor Stasenko
Sets are defined as not holding nils. Changing that would mess up
quite a bit of code I suspect. Sets are sparsely populated arrays (around 40% IIRC) that use the nils to mark the unused slots. How do you imagine that we would mark unused slots (which are an important part of the whole sparse array thing) if nils were allowed? And how would you expect #do:/collect: etc to work - you would have to make all your blocks test for nil. Oh and how would code discriminate between nils you put in the array and nils that are put there by the system for the sparseness management? If you need Sets and want a distinguished value, make one. Just don't mess up the whole system in the process, eh? tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Strange OpCodes: RDR: Rotate Disk Right |
In reply to this post by ccrraaiigg
2008/8/9 Craig Latta <[hidden email]>:
> > I'll ask the inevitable question... given that ability, what do you > really want to do? :) > Some practicle usage: pick an arbitrary array and tell the number of _unique_ objects it contains, including nils, of course. -- Best regards, Igor Stasenko AKA sig. |
On 8-Aug-08, at 5:20 PM, Igor Stasenko wrote: > 2008/8/9 Craig Latta <[hidden email]>: >> >> I'll ask the inevitable question... given that ability, what do >> you >> really want to do? :) >> > > Some practicle usage: pick an arbitrary array and tell the number of > _unique_ objects it contains, including nils, of course. For Sets that is trivial without any need whatsoever to alter anything. It is theSet size + 1 since all Sets will have at least one nil. Unless of course some evil person messes up the growing protocol for Sets. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Variables won't; constants aren't. |
In reply to this post by timrowledge
2008/8/9 tim Rowledge <[hidden email]>:
> Sets are defined as not holding nils. That's the problem. A nil is just an object. What other reasons besides implementaion limitations restricts from putting nils in set? Sets are commonly used as a sets of _arbitrary_ objects. But hey - nil is an object too. So what the point in such discrimination? > Changing that would mess up quite a bit of code I suspect. I suspect not. Currently Sets will throw an error if you try to include nil as element. This means that in current system there are no sets containing nils and no code which intentionally put nils to sets, because this leads to error. >From the above i can conclude that removing limitation of inclusion nils to sets will not break anything in current system. > > Sets are sparsely populated arrays (around 40% IIRC) that use the nils to > mark the unused slots. How do you imagine that we would mark unused slots > (which are an important part of the whole sparse array thing) if nils were > allowed? I thought i outlined possible solutions good enough :) > And how would you expect #do:/collect: etc to work - you would > have to make all your blocks test for nil. Oh and how would code > discriminate between nils you put in the array and nils that are put there > by the system for the sparseness management? > Let me illustrate you: Collection subclass: #Set instanceVariableNames: 'tally array containsNil' Set>>initialize containsNil := false Set>>add: newObject newObject ifNil: [ containsNil := true. ^ nil ]. " instead of self error: blabla ... " ... blabla... Set>>do: aBlock containsNil ifTrue: [ aBlock value: nil ]. ... the rest of method is same as previous .. Set>>collect: aBlock | newSet | newSet := Set new: self size. array do: [:each | each ifNotNil: [newSet add: (aBlock value: each)]]. containsNil ifTrue: [ newSet add: nil ]. ^ newSet so, its actually a question of adding 'containsNil ifTrue: [] ' everywhere it needs to be. > If you need Sets and want a distinguished value, make one. Just don't mess > up the whole system in the process, eh? > > tim > -- > tim Rowledge; [hidden email]; http://www.rowledge.org/tim > Strange OpCodes: RDR: Rotate Disk Right > -- Best regards, Igor Stasenko AKA sig. |
In reply to this post by Igor Stasenko
On Fri, Aug 8, 2008 at 3:53 PM, Igor Stasenko <[hidden email]> wrote: Sets which not allowing contain nil as element is a point of inconvenience. Here's a third approach (a variation on your 2nd approach). Have an instVar 'includesSelf' and fill the array with the Set itself. So add a new primitive new:fillWith: (primitiveNewWithArgAndFillValue?) and use this to create the empty array filled with the Set itself. Check for adding the set itself, and itself being the null entry. The advantage over the flag for nil approach is that you kill two birds with one stone.
1. You need a unique value anyway, and the Set can nicely be its own unique value 2. recursive collections are a problem to print and with the explicit flag this becomes much easier to deal with.
|
In reply to this post by timrowledge
On Fri, Aug 8, 2008 at 5:37 PM, tim Rowledge <[hidden email]> wrote:
But it doesn't deal with the case where the array includes nil. So one has to come up with something clumsy such as (theArray select: [:obj| obj notNil]) asSet size + ((theArray includes: nil) ifTrue: [1] ifFalse: [0])
which creates an extra collection for the select, requires an extra pass to find nil (n the case where it doesn't contain it) and is plain ugly or [| forNil | forNil := 0. (theArray inject: Set new into: [:set :item| item == nil ifTrue: [hasNil := 1] ifFalse: [set add: item]. set]) + forNil]
which is faster but even more verbose and confusing. Compare this to theArray asSet size I'm with Igor. This would indeed be an improvement.
Just because "its always been this way" doesn't mean its right. Variables won't; constants aren't. |
In reply to this post by Eliot Miranda-2
2008/8/9 Eliot Miranda <[hidden email]>:
> > > On Fri, Aug 8, 2008 at 3:53 PM, Igor Stasenko <[hidden email]> wrote: >> >> Sets which not allowing contain nil as element is a point of >> inconvenience. >> >> There are two ways how get around that: >> - initialize an array which contains set elements with unique object ~~ >> nil. >> and fix methods which testing for an empty slots to compare against >> this object, not nil. >> This can cause a slowdown during rehashing, because VM initially >> creating arrays filled with nils, while we need to fill them with >> another object. >> There is also a problem with preserving it 'uniqueness' by not giving >> this object outside a set. >> >> - add an instVar 'containsNil' >> then when set receiving 'add: nil' , it simply sets this flag to true. >> modify #collect: , #do: , #remove: to be aware of flag value. >> >> I find the second way is more appropriate. While it costs additional >> memory per Set/IdentitySet instance, it costs almost nothing in speed. >> >> What do you think about supporting Sets to contain nils in general, >> and about methods how to achieve that? > > Here's a third approach (a variation on your 2nd approach). Have an instVar > 'includesSelf' and fill the array with the Set itself. So add a new > primitive new:fillWith: (primitiveNewWithArgAndFillValue?) and use this to > create the empty array filled with the Set itself. Check for adding the set > itself, and itself being the null entry. The advantage over the flag for > nil approach is that you kill two birds with one stone. > 1. You need a unique value anyway, and the Set can nicely be its own unique > value > 2. recursive collections are a problem to print and with the explicit flag > this becomes much easier to deal with. In math domain, any set includes itself , not as element of course, but as subset :) And i don't see how this is better comparing to having 'containsNil' ivar? You still have to test this flag in each method which deals with elements , so be it containsFoo or containsBar - no real difference. >> >> -- >> Best regards, >> Igor Stasenko AKA sig. >> -- Best regards, Igor Stasenko AKA sig. |
On Fri, Aug 8, 2008 at 5:46 PM, Igor Stasenko <[hidden email]> wrote: 2008/8/9 Eliot Miranda <[hidden email]>: One difference is the use of self for the unique object instead of another. Another difference is that recursive sets no longer fail to print, inspect, etc.
I think those are real differences.
|
2008/8/9 Eliot Miranda <[hidden email]>:
> > > On Fri, Aug 8, 2008 at 5:46 PM, Igor Stasenko <[hidden email]> wrote: >> >> 2008/8/9 Eliot Miranda <[hidden email]>: >> > >> > >> > On Fri, Aug 8, 2008 at 3:53 PM, Igor Stasenko <[hidden email]> >> > wrote: >> >> >> >> Sets which not allowing contain nil as element is a point of >> >> inconvenience. >> >> >> >> There are two ways how get around that: >> >> - initialize an array which contains set elements with unique object ~~ >> >> nil. >> >> and fix methods which testing for an empty slots to compare against >> >> this object, not nil. >> >> This can cause a slowdown during rehashing, because VM initially >> >> creating arrays filled with nils, while we need to fill them with >> >> another object. >> >> There is also a problem with preserving it 'uniqueness' by not giving >> >> this object outside a set. >> >> >> >> - add an instVar 'containsNil' >> >> then when set receiving 'add: nil' , it simply sets this flag to true. >> >> modify #collect: , #do: , #remove: to be aware of flag value. >> >> >> >> I find the second way is more appropriate. While it costs additional >> >> memory per Set/IdentitySet instance, it costs almost nothing in speed. >> >> >> >> What do you think about supporting Sets to contain nils in general, >> >> and about methods how to achieve that? >> > >> > Here's a third approach (a variation on your 2nd approach). Have an >> > instVar >> > 'includesSelf' and fill the array with the Set itself. So add a new >> > primitive new:fillWith: (primitiveNewWithArgAndFillValue?) and use this >> > to >> > create the empty array filled with the Set itself. Check for adding the >> > set >> > itself, and itself being the null entry. The advantage over the flag >> > for >> > nil approach is that you kill two birds with one stone. >> > 1. You need a unique value anyway, and the Set can nicely be its own >> > unique >> > value >> > 2. recursive collections are a problem to print and with the explicit >> > flag >> > this becomes much easier to deal with. >> >> In math domain, any set includes itself , not as element of course, >> but as subset :) >> And i don't see how this is better comparing to having 'containsNil' ivar? >> You still have to test this flag in each method which deals with >> elements , so be it containsFoo or containsBar - no real difference. > > One difference is the use of self for the unique object instead of another. > Another difference is that recursive sets no longer fail to print, inspect, > etc. > I think those are real differences. > Valid point. As a bytecode proofy, can you tell how much different a bytecode will be for: obj == nil versus obj == self versus obj isNil what is faster/slower? -- Best regards, Igor Stasenko AKA sig. |
In reply to this post by Eliot Miranda-2
2008/8/9 Eliot Miranda <[hidden email]>:
> > > On Fri, Aug 8, 2008 at 5:37 PM, tim Rowledge <[hidden email]> wrote: >> >> On 8-Aug-08, at 5:20 PM, Igor Stasenko wrote: >> >>> 2008/8/9 Craig Latta <[hidden email]>: >>>> >>>> I'll ask the inevitable question... given that ability, what do you >>>> really want to do? :) >>>> >>> >>> Some practicle usage: pick an arbitrary array and tell the number of >>> _unique_ objects it contains, including nils, of course. >> >> >> For Sets that is trivial without any need whatsoever to alter anything. It >> is theSet size + 1 since all Sets will have at least one nil. Unless of >> course some evil person messes up the growing protocol for Sets. > > But it doesn't deal with the case where the array includes nil. So one has > to come up with something clumsy such as > (theArray select: [:obj| obj notNil]) asSet size + ((theArray includes: > nil) ifTrue: [1] ifFalse: [0]) > which creates an extra collection for the select, requires an extra pass to > find nil (n the case where it doesn't contain it) and is plain ugly > or > [| forNil | forNil := 0. (theArray inject: Set new into: [:set :item| > item == nil ifTrue: [hasNil := 1] ifFalse: [set add: item]. set]) + forNil] > which is faster but even more verbose and confusing. My, maybe more verbose sample, but most efficient one, i guess :) : daSet := Set new. daObject := Object new. daArray do: [:item | item ifNil: [ daSet add: daObject ] ifNotNil: [ daSet add: item] ]. ^ daSet size A use case example from the parrallel domain - databases: SELECT DISTINCT field FROM someTable WHERE blabla - returns a set of disctinct field values (including NULL values). But while we can deal more or less painless with calculating a number of distinct values, operating with distinct values causing a lot more problem: suggest you want to pass a set of distinct values somewhere else: Now since you want to keep notion that nil present or not as distinct value - you have to use additional argument, or convert a set to OrderedCollection and add nil. Nedless to say, its very inconvenient. > Compare this to > > theArray asSet size > I'm with Igor. This would indeed be an improvement. > Just because "its always been this way" doesn't mean its right. >> >> tim >> -- >> tim Rowledge; [hidden email]; http://www.rowledge.org/tim >> Variables won't; constants aren't. >> -- Best regards, Igor Stasenko AKA sig. |
In reply to this post by Igor Stasenko
On Fri, Aug 8, 2008 at 6:09 PM, Igor Stasenko <[hidden email]> wrote:
what's a proofy? ;) obj == nil Depends. In an Interpreter obj == self likely to be slightly faster than obj == nil since nil must be fetched either form the specialObjectsArray (if bytecode set has pushNl, as it does) or from the method's literal frame. In a JIT they're liely the same because self is a read through the frame pointer and nil is a constant embedded in the instruction stream. These are likely to be of similar cost.
obj isNil will either be the same cost as == nil or slower depending on whether the bytecode compiler inlines it. But the difference between obj == self & obj == nil/obj isNil will be in the noise.
You should make the decision on convenience. what is faster/slower? anArray asSet size is faster than any of the alternatives because it is easier to thunk about ;) Faster thought is much more valuable than faster processing, c.f. Smalltalk programs vs C++ programs ;)
|
2008/8/9 Eliot Miranda <[hidden email]>:
> > > On Fri, Aug 8, 2008 at 6:09 PM, Igor Stasenko <[hidden email]> wrote: >> >> 2008/8/9 Eliot Miranda <[hidden email]>: >> > >> > >> > On Fri, Aug 8, 2008 at 5:46 PM, Igor Stasenko <[hidden email]> >> > wrote: >> >> >> >> 2008/8/9 Eliot Miranda <[hidden email]>: >> >> > >> >> > >> >> > On Fri, Aug 8, 2008 at 3:53 PM, Igor Stasenko <[hidden email]> >> >> > wrote: >> >> >> >> >> >> Sets which not allowing contain nil as element is a point of >> >> >> inconvenience. >> >> >> >> >> >> There are two ways how get around that: >> >> >> - initialize an array which contains set elements with unique object >> >> >> ~~ >> >> >> nil. >> >> >> and fix methods which testing for an empty slots to compare against >> >> >> this object, not nil. >> >> >> This can cause a slowdown during rehashing, because VM initially >> >> >> creating arrays filled with nils, while we need to fill them with >> >> >> another object. >> >> >> There is also a problem with preserving it 'uniqueness' by not >> >> >> giving >> >> >> this object outside a set. >> >> >> >> >> >> - add an instVar 'containsNil' >> >> >> then when set receiving 'add: nil' , it simply sets this flag to >> >> >> true. >> >> >> modify #collect: , #do: , #remove: to be aware of flag value. >> >> >> >> >> >> I find the second way is more appropriate. While it costs additional >> >> >> memory per Set/IdentitySet instance, it costs almost nothing in >> >> >> speed. >> >> >> >> >> >> What do you think about supporting Sets to contain nils in general, >> >> >> and about methods how to achieve that? >> >> > >> >> > Here's a third approach (a variation on your 2nd approach). Have an >> >> > instVar >> >> > 'includesSelf' and fill the array with the Set itself. So add a new >> >> > primitive new:fillWith: (primitiveNewWithArgAndFillValue?) and use >> >> > this >> >> > to >> >> > create the empty array filled with the Set itself. Check for adding >> >> > the >> >> > set >> >> > itself, and itself being the null entry. The advantage over the flag >> >> > for >> >> > nil approach is that you kill two birds with one stone. >> >> > 1. You need a unique value anyway, and the Set can nicely be its own >> >> > unique >> >> > value >> >> > 2. recursive collections are a problem to print and with the explicit >> >> > flag >> >> > this becomes much easier to deal with. >> >> >> >> In math domain, any set includes itself , not as element of course, >> >> but as subset :) >> >> And i don't see how this is better comparing to having 'containsNil' >> >> ivar? >> >> You still have to test this flag in each method which deals with >> >> elements , so be it containsFoo or containsBar - no real difference. >> > >> > One difference is the use of self for the unique object instead of >> > another. >> > Another difference is that recursive sets no longer fail to print, >> > inspect, >> > etc. >> > I think those are real differences. >> > >> >> Valid point. >> >> As a bytecode proofy, can you tell how much different a bytecode will be >> for: > > what's a proofy? ;) > >> >> obj == nil >> versus >> obj == self >> versus >> obj isNil > > Depends. In an Interpreter obj == self likely to be slightly faster than > obj == nil since nil must be fetched either form the specialObjectsArray (if > bytecode set has pushNl, as it does) or from the method's literal frame. In > a JIT they're liely the same because self is a read through the frame > pointer and nil is a constant embedded in the instruction stream. These are > likely to be of similar cost. > obj isNil will either be the same cost as == nil or slower depending on > whether the bytecode compiler inlines it. > But the difference between obj == self & obj == nil/obj isNil will be in the > noise. > You should make the decision on convenience. >> >> what is faster/slower? > > anArray asSet size is faster than any of the alternatives because it is > easier to thunk about ;) Faster thought is much more valuable than faster > processing, c.f. Smalltalk programs vs C++ programs ;) > yes, i raised this topic exactly from this reason. What i'm not sure that is this change is so badly needed. Maybe its only me who get stuck with a problem how to deal with sets where nils are meaningful and useful value. -- Best regards, Igor Stasenko AKA sig. |
On Fri, Aug 8, 2008 at 7:07 PM, Igor Stasenko <[hidden email]> wrote:
I've certainly run into it before and I know colleagues have in the past. What's hard to tell is how much code out there is working around the limitation.
|
In reply to this post by Igor Stasenko
Igor Stasenko ha scritto:
> What do you think about supporting Sets to contain nils in general, > and about methods how to achieve that? > Messing in such a way with a standard collection class is definetively wrong. What you could do is create another class that implements all Set's protocols. In such a class I'd use the instance itself as a placeholder to put in the empty slots. Giovanni |
In reply to this post by Igor Stasenko
Igor Stasenko ha scritto:
> > yes, i raised this topic exactly from this reason. > What i'm not sure that is this change is so badly needed. > Maybe its only me who get stuck with a problem how to deal with sets > where nils are meaningful and useful value. > Could it be that you're possibly missing an abstraction here? why not use a different object, specialized for your needs, to represent your null value? Giovanni |
2008/8/9 Giovanni Corriga <[hidden email]>:
> Igor Stasenko ha scritto: >> >> yes, i raised this topic exactly from this reason. >> What i'm not sure that is this change is so badly needed. >> Maybe its only me who get stuck with a problem how to deal with sets >> where nils are meaningful and useful value. >> > > Could it be that you're possibly missing an abstraction here? why not use a > different object, specialized for your needs, to represent your null value? > Because other part(s) of the system may not know about nulls as representation of nils, and in own turn may want to abstract a set containing nils - so it will use own null2 abstraction.. In result your end up operating with sets containing nulls and null2-s :) > Giovanni > -- Best regards, Igor Stasenko AKA sig. |
On Sat, 09 Aug 2008 14:41:42 +0200, Igor Stasenko wrote:
> 2008/8/9 Giovanni Corriga : >> Igor Stasenko ha scritto: >>> >>> yes, i raised this topic exactly from this reason. >>> What i'm not sure that is this change is so badly needed. >>> Maybe its only me who get stuck with a problem how to deal with sets >>> where nils are meaningful and useful value. >>> >> >> Could it be that you're possibly missing an abstraction here? why not >> use a >> different object, specialized for your needs, to represent your null >> value? >> > > Because other part(s) of the system may not know about nulls as > representation of nils, and in own turn may want to abstract a set > containing nils - so it will use own null2 abstraction.. In result > your end up operating with sets containing nulls and null2-s :) If it where a Smalltalk system the two of you are talking about, the Set in question follows encapsulation principle, and nobody outside of Set would know how it memorizes unoccupied slots. The implementor of Set is free to make own choice (all other things kept equal, like for example the contract with the VM on MethodDictionary while the latter inherits from Set). /Klaus |
Free forum by Nabble | Edit this page |