Sets with distinct entries (Was: Re: Re: Ideas about sets and dictionaries)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Sets with distinct entries (Was: Re: Re: Ideas about sets and dictionaries)

Igor Stasenko
2009/11/12 Tobias Pape <[hidden email]>:

> Hello —,
> Am 2009-11-12 um 06:35 schrieb Andreas Raab:
>
>> Igor Stasenko wrote:
>>> The main purpose to be able to handle nils in sets as an elements is to make Set
>>> be able to hold _any_ potentially existing objects in image.
>>
>> Absolutely not. The main purpose of having nil be included in sets is to remove a particular restriction (namely that Sets cannot contain nil), not to "be able to hold _any_ potentially existing objects in an image". What is being proposed is trading one restriction (Sets cannot contain nil) against another one (Sets cannot contain their internal private marker) which is deemed advantageous as it allows us to convert many collections that contain nil into sets without blowing up. *That* is the main purpose.
>>
>> The goal of "be able to hold _any_ potentially existing objects in an image" is neither achievable nor worthwhile. Try this for starters:
>>
>>  (s := Set new) add: s; add: s.
>
> As far as I know, another approach is to have
> Distinct Entry Object inside the set. cf. the GemStone
> approach.  A set, there, contains an ordered collection of
> nils an Entry elements. Thus, the actual element is stored _inside_
> the entry element, not in the underlying (storage) collection
> of the array itself.  As far as I know, this is similar to its Dictionary
> approach.
>   Probably, this will introduce a slightly broader spectrum of
> optimization possibilities.
>
> So long,
>        -Tobias
>

I haven't seen how it implemented in GemStone (so i think that
implementing it will be license clean (tm) ;)

As i see it, the trick is to use a message dispatch to object which
going to be an element, to tell it how it would like
to represent itself in Set. And this is where story begins.

add: newObject
        "Include newObject as one of the receiver's elements, but only if
        not already present. Answer newObject."

        | entry index |
        entry := newObject asSetElement.

        index := self scanFor: entry.
        (array at: index) ifNil: [self atNewIndex: index put: entry].
        ^ newObject

a SetElement class then is a one-inst-var object, holding a reference
to proxied object, and
needs to properly implement #hash #= , required by Set.
Also, at each place where we need to feed a real object , a Set should
send an #unwrap (or give more distinct selector name)
to return enclosed object.

a subclasses, CollectionSetElement and NilSetElement implemented
slightly differently to take in account that we are still using nil as
filler object, and collection included in set should prevent going too
deep in #hash and guard against potential recursion.
As an alternative , we migth use already existing LookupKey class, and
subclass from it, instead of having separate SetElement.

All objects then obliged to implement #asSetElement and #unwrap - by
providing the default methods in Object class, which by default doing
nothing, return self.
While Collection and UndefinedObject classes overriding asSetElement
in order to return wrappers instead of real objects.

The main overhead is cost of extra messages sent at each #add: and at
each point where we need to deliver a real set element (#unwrap).

The space overhead can be measured roughly as:

count := 0.
Set allSubInstancesDo: [:each |
        each do: [:value |  value isCollection ifTrue: [ count := count +1 ]]].
count

in my image its = 97197

but another test reveals that this number is much smaller, because we
don't have to be worry about ByteArrays, Strings and Symbols (we could
override #asSetElement back to return self, instead of constructing
wrapper):

count := 0.
Set allSubInstancesDo: [:each |
        each do: [:value |
                (value isCollection and: [value class isBytes not and: [value
isString not] ]) ifTrue: [ count := count +1 ]]].
count
- 5021

So, what you think? Is it worth with trying to implemet it, and see
the real numbers, or its already clear that its wrong way to go?



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: Sets with distinct entries (Was: Re: Re: Ideas about sets and dictionaries)

Tobias Pape-2
Hello —,
Am 2009-11-12 um 12:11 schrieb Igor Stasenko:

> a SetElement class then is a one-inst-var object, holding a reference
> to proxied object, and
> needs to properly implement #hash #= , required by Set.
> Also, at each place where we need to feed a real object , a Set should
> send an #unwrap (or give more distinct selector name)
> to return enclosed object.

probaly, the folowing would also be possible:

for all, except the “special” object,

#asSetElement

        ^ self

the other two shall return an actual wrapper.

this shall retrun the actual entry upon called #value:

#value

        ^ wrappedEntry

Then, to obtain an entry , it would be called
asking for its #value.

Problem then: Blocks :)

so long,
        -Tobias


Reply | Threaded
Open this post in threaded view
|

Re: Sets with distinct entries (Was: Re: Re: Ideas about sets and dictionaries)

Igor Stasenko
In reply to this post by Igor Stasenko
Guys, i took a time to implement this idea (not precisely as
described, but close to).

Please review at:
  http://bugs.squeak.org/view.php?id=7413



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: Sets with distinct entries (Was: Re: Re: Ideas about sets and dictionaries)

Igor Stasenko
In reply to this post by Tobias Pape-2
2009/11/12 Tobias Pape <[hidden email]>:

> Hello —,
> Am 2009-11-12 um 12:11 schrieb Igor Stasenko:
>
>> a SetElement class then is a one-inst-var object, holding a reference
>> to proxied object, and
>> needs to properly implement #hash #= , required by Set.
>> Also, at each place where we need to feed a real object , a Set should
>> send an #unwrap (or give more distinct selector name)
>> to return enclosed object.
>
> probaly, the folowing would also be possible:
>
> for all, except the “special” object,
>
> #asSetElement
>
>        ^ self
>
> the other two shall return an actual wrapper.
>
> this shall retrun the actual entry upon called #value:
>
> #value
>
>        ^ wrappedEntry
>
> Then, to obtain an entry , it would be called
> asking for its #value.
>
> Problem then: Blocks :)
>

not only with block, but with anything else, which implements #value
differently.
This is why i took completely different message selectors for that
(#asSetElement and #enclosedSetElement)

> so long,
>        -Tobias
>
>
>



--
Best regards,
Igor Stasenko AKA sig.