[vwnc] Bug in MethodDictionary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[vwnc] Bug in MethodDictionary

Holger Kleinsorgen-4
Hello,

while loading some stuff from Store, I encountered an a "key not found
error" which resulted from a bug in MethodDictionary:

The method dictionary of a class showed some inconsistencies:

    "self includesKey: #packagesMLS"
evaluated to false, but

    "self keys includes: #packagesMLS"
evaluated to true

the keys were properly sorted, so it hat to be something else.
the first three selector symbols had the same identity hash:

   "(self basicAt: 1) identityHash" -> 3
   "(self basicAt: 3) identityHash" -> 3
   "(self basicAt: 5) identityHash" -> 3

which triggered an error in #findIndexOfKey:

    ...
    "Equal identityHashes.  Need linear search both ways until found."
    low := index - 2.
    [   low < 1
          ifTrue: [ ^ 0 ].
       ( probe := self basicAt: low ) == key
          ifTrue: [ ^ low ].
       probe identityHash = keyIdHash
    ] whileTrue: [ low := low - 2 ].
    low := index + 2.
    [   low > high
          ifTrue: [ ^ 0 ].
       ( probe := self basicAt: low ) == key
          ifTrue: [ ^ low ].
       probe identityHash = keyIdHash
    ] whileTrue: [ low := low + 2 ].
    ^ 0
    ...

if the hashes of all symbols from low to index are equal, but do not
include the key, the method will return 0 without looking in the range
from index to high

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Andres Valloud-6
Holger,

Ahhh, very nice catch indeed.  We do have a somewhat related AR,

AR# 46540
Description SortedCollection>>includes:/indexOf: binary search

which implements binary search / binary inclusion checks for sorted
collections.  Note that this functionality is not meant to replace what
includes: / indexOf: are doing right now, but rather they provide this
via new selectors.

Something that came up with that AR is how binary search proceeds in the
presence of duplicates.  I think I found a beautiful solution to that,
and I also think it's correct.  If this AR moves, then I could simply
adapt the solution to MethodDictionary and this problem would go away.
I made another AR, 54789, back referencing AR 46540, for the problem
with MethodDictionary.

Thank you for letting us know about this!

Andres.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Holger Kleinsorgen
Sent: Wednesday, July 16, 2008 3:37 AM
To: [hidden email]
Subject: [vwnc] Bug in MethodDictionary

Hello,

while loading some stuff from Store, I encountered an a "key not found
error" which resulted from a bug in MethodDictionary:

The method dictionary of a class showed some inconsistencies:

    "self includesKey: #packagesMLS"
evaluated to false, but

    "self keys includes: #packagesMLS"
evaluated to true

the keys were properly sorted, so it hat to be something else.
the first three selector symbols had the same identity hash:

   "(self basicAt: 1) identityHash" -> 3
   "(self basicAt: 3) identityHash" -> 3
   "(self basicAt: 5) identityHash" -> 3

which triggered an error in #findIndexOfKey:

    ...
    "Equal identityHashes.  Need linear search both ways until found."
    low := index - 2.
    [   low < 1
          ifTrue: [ ^ 0 ].
       ( probe := self basicAt: low ) == key
          ifTrue: [ ^ low ].
       probe identityHash = keyIdHash
    ] whileTrue: [ low := low - 2 ].
    low := index + 2.
    [   low > high
          ifTrue: [ ^ 0 ].
       ( probe := self basicAt: low ) == key
          ifTrue: [ ^ low ].
       probe identityHash = keyIdHash
    ] whileTrue: [ low := low + 2 ].
    ^ 0
    ...

if the hashes of all symbols from low to index are equal, but do not
include the key, the method will return 0 without looking in the range
from index to high

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Holger Kleinsorgen-4
> Ahhh, very nice catch indeed.  We do have a somewhat related AR,
>
> AR# 46540
> Description SortedCollection>>includes:/indexOf: binary search
>
> which implements binary search / binary inclusion checks for sorted
> collections.  Note that this functionality is not meant to replace what
> includes: / indexOf: are doing right now, but rather they provide this
> via new selectors.

may I ask why? Interval didn't implement #includes: in the past, too,
until a specific implementation was added in 7.6 (43789). This was nice,
because all senders automatically benefited from the improvement.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Alan Knight-2
At 07:57 AM 7/17/2008, Holger Kleinsorgen wrote:
> Ahhh, very nice catch indeed.  We do have a somewhat related AR,
>
> AR#            46540
> Description    SortedCollection>>includes:/indexOf: binary search
>
> which implements binary search / binary inclusion checks for sorted
> collections.  Note that this functionality is not meant to replace what
> includes: / indexOf: are doing right now, but rather they provide this
> via new selectors.

may I ask why? Interval didn't implement #includes: in the past, too,
until a specific implementation was added in 7.6 (43789). This was nice,
because all senders automatically benefited from the improvement.

This came up quite some time ago, and I can't seem to find the reference, but John Brant quite extensively pointed out the issues with includes: using binary search being difficult or impossible to reconcile with the ANSI standard behaviour. That's unfortunate, and arguably a bug with ANSI, but it does seem to make it inadvisable to make that the default implementation. It also does mean (if the methods are available) that you can deliberately invoke binary searches on collections that you happen to believe are sorted, even if they aren't strictly sorted collections.

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Alan Knight-2
At 09:42 AM 7/17/2008, Alan Knight wrote:
At 07:57 AM 7/17/2008, Holger Kleinsorgen wrote:
> Ahhh, very nice catch indeed.  We do have a somewhat related AR,
>
> AR#            46540
> Description    SortedCollection>>includes:/indexOf: binary search
>
> which implements binary search / binary inclusion checks for sorted
> collections.  Note that this functionality is not meant to replace what
> includes: / indexOf: are doing right now, but rather they provide this
> via new selectors.

may I ask why? Interval didn't implement #includes: in the past, too,
until a specific implementation was added in 7.6 (43789). This was nice,
because all senders automatically benefited from the improvement.

This came up quite some time ago, and I can't seem to find the reference, but John Brant quite extensively pointed out the issues with includes: using binary search being difficult or impossible to reconcile with the ANSI standard behaviour. That's unfortunate, and arguably a bug with ANSI, but it does seem to make it inadvisable to make that the default implementation. It also does mean (if the methods are available) that you can deliberately invoke binary searches on collections that you happen to believe are sorted, even if they aren't strictly sorted collections.

OK, I found it. In comp.lang.smalltalk, September 2003, with the rather unrelated subject line "Re: ST standardization: (was Re: Is Python's library more standard than Smalltalk's?) "


--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Andres Valloud-3
Here's a more permlink to the messages...

http://groups.google.com/group/comp.lang.smalltalk/browse_thread/thread/3c47ffb41e6433d3/d21a50cbcb3a49de?q=ANSI&lnk=nl&

FIrst, things like 'abcd' asSortedCollection includes: 5 would fail if
includes: used binary search because the argument may not be comparable
to what is inside the collection.  And wrapping the sort block in an
exception handler may not work either:

" [...] consider the following:
    | coll |
    coll := ((1 to: 10) collect: [:each | 2 raisedTo: each])
        asSortedCollection: [:a :b | a highBit <= b highBit].
    coll allSatisfy: [:each | coll includes: each asFloat]
This should return true by the ANSI standard. However, it will raise an
error if you try to look for the floating point value using the sort block.

Also, does [binary search] work for the case where the sort block
doesn't uniquely
distinguish items?
    | coll |
    coll := (1 to: 1000) asSortedCollection: [:a :b | true].
    coll allSatisfy: [:each | coll includes: each]
In this case, [includes:] has to linearly look through the whole
collection for each element. "

Andres.


Alan Knight wrote:

> At 09:42 AM 7/17/2008, Alan Knight wrote:
>  
>> At 07:57 AM 7/17/2008, Holger Kleinsorgen wrote:
>>    
>>>> Ahhh, very nice catch indeed.  We do have a somewhat related AR,
>>>>
>>>> AR#           46540
>>>> Description   SortedCollection>>includes:/indexOf: binary search
>>>>
>>>> which implements binary search / binary inclusion checks for sorted
>>>> collections.  Note that this functionality is not meant to replace what
>>>> includes: / indexOf: are doing right now, but rather they provide this
>>>> via new selectors.
>>>>        
>>> may I ask why? Interval didn't implement #includes: in the past, too,
>>> until a specific implementation was added in 7.6 (43789). This was nice,
>>> because all senders automatically benefited from the improvement.
>>>      
>> This came up quite some time ago, and I can't seem to find the reference, but John Brant quite extensively pointed out the issues with includes: using binary search being difficult or impossible to reconcile with the ANSI standard behaviour. That's unfortunate, and arguably a bug with ANSI, but it does seem to make it inadvisable to make that the default implementation. It also does mean (if the methods are available) that you can deliberately invoke binary searches on collections that you happen to believe are sorted, even if they aren't strictly sorted collections.
>>    
>
> OK, I found it. In comp.lang.smalltalk, September 2003, with the rather unrelated subject line "Re: ST standardization: (was Re: Is Python's library more standard than Smalltalk's?) "
>
>
> --
> Alan Knight [|], Engineering Manager, Cincom Smalltalk
> [hidden email]
> [hidden email]
> http://www.cincom.com/smalltalk
>
>  
> ------------------------------------------------------------------------
>
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>  

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Holger Kleinsorgen-4
Andres Valloud schrieb:

> Here's a more permlink to the messages...
>
> http://groups.google.com/group/comp.lang.smalltalk/browse_thread/thread/3c47ffb41e6433d3/d21a50cbcb3a49de?q=ANSI&lnk=nl&
>
> FIrst, things like 'abcd' asSortedCollection includes: 5 would fail if
> includes: used binary search because the argument may not be comparable
> to what is inside the collection.  And wrapping the sort block in an
> exception handler may not work either:
>
> " [...] consider the following:
>     | coll |
>     coll := ((1 to: 10) collect: [:each | 2 raisedTo: each])
>         asSortedCollection: [:a :b | a highBit <= b highBit].
>     coll allSatisfy: [:each | coll includes: each asFloat]
> This should return true by the ANSI standard. However, it will raise an
> error if you try to look for the floating point value using the sort block.
>
> Also, does [binary search] work for the case where the sort block
> doesn't uniquely
> distinguish items?
>     | coll |
>     coll := (1 to: 1000) asSortedCollection: [:a :b | true].
>     coll allSatisfy: [:each | coll includes: each]
> In this case, [includes:] has to linearly look through the whole
> collection for each element. "

thank you both for the explanation and links.

Slightly related:
Here at work we rarely use #asSortedCollection: [:a :b | ... ]. Instead,
we usually use a custom extension, #sortedBy: [ : a | ... ], which
returns a comparable value. There are two reasons for doing this:

- sorting is usually done by computing a comparable value based on each
item of the collection, so the expression to compute the comparable
value is the same for a and b.

- computing the comparable value is often expensive, so #sortedBy: first
  computes the values for all elements and then uses the precomputed
values to sort the collection.

Maybe it would be useful to have a method similar to #sortedBy: in the
base image, in two flavors (with and without precomputation).

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Paul Baumann
Holger wrote:
>>Maybe it would be useful to have a method similar to #sortedBy: in the
base image

Here are techniques I use for efficient sorting. Given that...

Sorting usually involves multiple attributes. Sorting needs to be fast.
Determining all the attributes in advance for each of the objects to be
compared would not be efficient. A general sorting framework should
allow for the sorting of dissimilar objects without crashing.

Here is an example of how I sort code models of GemKit:

GbcCode>><= other
        | i a b |
        i := 1.
        [
                (a := self sortableFieldAt: i) isNil ifTrue: [^true].
                (b := other sortableFieldAt: i) isNil ifTrue: [^false].
                a < b ifTrue: [^true].
                b < a ifTrue: [^false].
                (i := i + 1) < 1000
        ] whileTrue: [].
        self error: '#sortableFieldAt: returned an unlikely number of
matches before returning nil'.

GbcCode>>sortableFieldAt: pos
        ^nil
GbcClassReference>>sortableFieldAt: pos
        pos = 1 ifTrue: [^50]. "Order: namespaces, variables, classes,
methods"
        pos = 2 ifTrue: [^self hierarchyName].
        pos = 3 ifTrue: [^self lastSourceMagnitude].
        ^nil
GbcGsUser>>sortableFieldAt: pos
        pos = 1 ifTrue: [^25]. "Order: namespaces, variables, classes,
methods"
        pos = 2 ifTrue: [^self name].
        pos = 3 ifTrue: [^self lastSourceMagnitude].
        ^nil
GbcMethod>>sortableFieldAt: pos
        pos = 1 ifTrue: [^100]. "Order: namespaces, variables, classes,
methods"
        pos = 2 ifTrue: [^self className].
        pos = 3 ifTrue: [^self isMeta ifTrue: [50] ifFalse: [51]].
        pos = 4 ifTrue: [^self selector].
        pos = 5 ifTrue: [^self lastSourceMagnitude].
        ^nil
GbcNamespaceAlias>>sortableFieldAt: pos
        pos = 1 ifTrue: [^20]. "Order: namespaces, variables, classes,
methods"
        pos = 2 ifTrue: [^self name].
        pos = 3 ifTrue: [^self lastSourceMagnitude].
        ^nil
GbcSharedVariable>>sortableFieldAt: pos
        pos = 1 ifTrue: [^30]. "Order: namespaces, variables, classes,
methods"
        pos = 2 ifTrue: [^self name].
        pos = 3 ifTrue: [^self lastSourceMagnitude].
        ^nil

Now I just use #asSortedCollection to allow all the models to be
sequenced. The big thing is that it only incurs the cost of fetching
sort attributes when necessary for a comparison. Temporary objects are
not created for the sorting behavior.

In this implementation I returned a generality for the first field.
There are plenty of other ways it can be done for more generic use, but
this was the most efficient for my needs. It is simple, fast, and
flexible. You could for example change the code around to pass a
sortContext as a second argument (like if you wanted to sort by a
different attribute for a window).

Paul Baumann
 
--------------------------------------------------------
This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Bug in MethodDictionary

Holger Kleinsorgen-4
Paul Baumann schrieb:

> Holger wrote:
>>> Maybe it would be useful to have a method similar to #sortedBy: in the
> base image
>
> Here are techniques I use for efficient sorting. Given that...
>
> Sorting usually involves multiple attributes. Sorting needs to be fast.
> Determining all the attributes in advance for each of the objects to be
> compared would not be efficient. A general sorting framework should
> allow for the sorting of dissimilar objects without crashing.
>
> Here is an example of how I sort code models of GemKit:
 > [snip]

we use something similiar in conjunction  with #sortedBy: - a class
called MultiSortKey. It's a subclass of OrderedCollection, and each
element is a sort criteria block.

Example usage (a somewhat constructed example, sorry):

^ self persons sortedBy: [ : person |
     MultiSortKey
        with: person lastName
        with: [ person residence postalCode ]
   ]

the persons are sorted by name first, then by the postal code. in this
example, fetching the postal code is wrapped in a block, because getting
the residence object might require querying a DB. For sorting this will
only be required for persons with the same last name.

The MultiSortKey basically works like this:

MultiSortKey>>= otherCollection

   | size |
   self species == otherCollection species ifFalse: [^false].
   (size := self size) = otherCollection size ifFalse: [^false].
   1 to: size do: [:index |
     (self resolvedAt: index) = (otherCollection resolvedAt: index)
        ifFalse: [^false]].
   ^true


MultiSortKey>>resolvedAt: index
    " lazy evaluation, value is cached "

    | v |
    v := self at: index.
    v kIsBlock
       ifTrue: [
          v := v value.
          self at: index put: v
       ].
    ^ v

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc