Hello,
while loading some stuff from Store, I encountered an a "key not found error" which resulted from a bug in MethodDictionary: The method dictionary of a class showed some inconsistencies: "self includesKey: #packagesMLS" evaluated to false, but "self keys includes: #packagesMLS" evaluated to true the keys were properly sorted, so it hat to be something else. the first three selector symbols had the same identity hash: "(self basicAt: 1) identityHash" -> 3 "(self basicAt: 3) identityHash" -> 3 "(self basicAt: 5) identityHash" -> 3 which triggered an error in #findIndexOfKey: ... "Equal identityHashes. Need linear search both ways until found." low := index - 2. [ low < 1 ifTrue: [ ^ 0 ]. ( probe := self basicAt: low ) == key ifTrue: [ ^ low ]. probe identityHash = keyIdHash ] whileTrue: [ low := low - 2 ]. low := index + 2. [ low > high ifTrue: [ ^ 0 ]. ( probe := self basicAt: low ) == key ifTrue: [ ^ low ]. probe identityHash = keyIdHash ] whileTrue: [ low := low + 2 ]. ^ 0 ... if the hashes of all symbols from low to index are equal, but do not include the key, the method will return 0 without looking in the range from index to high _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Holger,
Ahhh, very nice catch indeed. We do have a somewhat related AR, AR# 46540 Description SortedCollection>>includes:/indexOf: binary search which implements binary search / binary inclusion checks for sorted collections. Note that this functionality is not meant to replace what includes: / indexOf: are doing right now, but rather they provide this via new selectors. Something that came up with that AR is how binary search proceeds in the presence of duplicates. I think I found a beautiful solution to that, and I also think it's correct. If this AR moves, then I could simply adapt the solution to MethodDictionary and this problem would go away. I made another AR, 54789, back referencing AR 46540, for the problem with MethodDictionary. Thank you for letting us know about this! Andres. -----Original Message----- From: [hidden email] [mailto:[hidden email]] On Behalf Of Holger Kleinsorgen Sent: Wednesday, July 16, 2008 3:37 AM To: [hidden email] Subject: [vwnc] Bug in MethodDictionary Hello, while loading some stuff from Store, I encountered an a "key not found error" which resulted from a bug in MethodDictionary: The method dictionary of a class showed some inconsistencies: "self includesKey: #packagesMLS" evaluated to false, but "self keys includes: #packagesMLS" evaluated to true the keys were properly sorted, so it hat to be something else. the first three selector symbols had the same identity hash: "(self basicAt: 1) identityHash" -> 3 "(self basicAt: 3) identityHash" -> 3 "(self basicAt: 5) identityHash" -> 3 which triggered an error in #findIndexOfKey: ... "Equal identityHashes. Need linear search both ways until found." low := index - 2. [ low < 1 ifTrue: [ ^ 0 ]. ( probe := self basicAt: low ) == key ifTrue: [ ^ low ]. probe identityHash = keyIdHash ] whileTrue: [ low := low - 2 ]. low := index + 2. [ low > high ifTrue: [ ^ 0 ]. ( probe := self basicAt: low ) == key ifTrue: [ ^ low ]. probe identityHash = keyIdHash ] whileTrue: [ low := low + 2 ]. ^ 0 ... if the hashes of all symbols from low to index are equal, but do not include the key, the method will return 0 without looking in the range from index to high _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
> Ahhh, very nice catch indeed. We do have a somewhat related AR,
> > AR# 46540 > Description SortedCollection>>includes:/indexOf: binary search > > which implements binary search / binary inclusion checks for sorted > collections. Note that this functionality is not meant to replace what > includes: / indexOf: are doing right now, but rather they provide this > via new selectors. may I ask why? Interval didn't implement #includes: in the past, too, until a specific implementation was added in 7.6 (43789). This was nice, because all senders automatically benefited from the improvement. _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
At 07:57 AM 7/17/2008, Holger Kleinsorgen wrote:
> Ahhh, very nice catch indeed. We do have a somewhat related AR, This came up quite some time ago, and I can't seem to find the reference, but John Brant quite extensively pointed out the issues with includes: using binary search being difficult or impossible to reconcile with the ANSI standard behaviour. That's unfortunate, and arguably a bug with ANSI, but it does seem to make it inadvisable to make that the default implementation. It also does mean (if the methods are available) that you can deliberately invoke binary searches on collections that you happen to believe are sorted, even if they aren't strictly sorted collections. --
Alan Knight [|], Engineering Manager, Cincom Smalltalk
_______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
At 09:42 AM 7/17/2008, Alan Knight wrote:
At 07:57 AM 7/17/2008, Holger Kleinsorgen wrote: OK, I found it. In comp.lang.smalltalk, September 2003, with the rather unrelated subject line "Re: ST standardization: (was Re: Is Python's library more standard than Smalltalk's?) " --
Alan Knight [|], Engineering Manager, Cincom Smalltalk
_______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Here's a more permlink to the messages...
http://groups.google.com/group/comp.lang.smalltalk/browse_thread/thread/3c47ffb41e6433d3/d21a50cbcb3a49de?q=ANSI&lnk=nl& FIrst, things like 'abcd' asSortedCollection includes: 5 would fail if includes: used binary search because the argument may not be comparable to what is inside the collection. And wrapping the sort block in an exception handler may not work either: " [...] consider the following: | coll | coll := ((1 to: 10) collect: [:each | 2 raisedTo: each]) asSortedCollection: [:a :b | a highBit <= b highBit]. coll allSatisfy: [:each | coll includes: each asFloat] This should return true by the ANSI standard. However, it will raise an error if you try to look for the floating point value using the sort block. Also, does [binary search] work for the case where the sort block doesn't uniquely distinguish items? | coll | coll := (1 to: 1000) asSortedCollection: [:a :b | true]. coll allSatisfy: [:each | coll includes: each] In this case, [includes:] has to linearly look through the whole collection for each element. " Andres. Alan Knight wrote: > At 09:42 AM 7/17/2008, Alan Knight wrote: > >> At 07:57 AM 7/17/2008, Holger Kleinsorgen wrote: >> >>>> Ahhh, very nice catch indeed. We do have a somewhat related AR, >>>> >>>> AR# 46540 >>>> Description SortedCollection>>includes:/indexOf: binary search >>>> >>>> which implements binary search / binary inclusion checks for sorted >>>> collections. Note that this functionality is not meant to replace what >>>> includes: / indexOf: are doing right now, but rather they provide this >>>> via new selectors. >>>> >>> may I ask why? Interval didn't implement #includes: in the past, too, >>> until a specific implementation was added in 7.6 (43789). This was nice, >>> because all senders automatically benefited from the improvement. >>> >> This came up quite some time ago, and I can't seem to find the reference, but John Brant quite extensively pointed out the issues with includes: using binary search being difficult or impossible to reconcile with the ANSI standard behaviour. That's unfortunate, and arguably a bug with ANSI, but it does seem to make it inadvisable to make that the default implementation. It also does mean (if the methods are available) that you can deliberately invoke binary searches on collections that you happen to believe are sorted, even if they aren't strictly sorted collections. >> > > OK, I found it. In comp.lang.smalltalk, September 2003, with the rather unrelated subject line "Re: ST standardization: (was Re: Is Python's library more standard than Smalltalk's?) " > > > -- > Alan Knight [|], Engineering Manager, Cincom Smalltalk > [hidden email] > [hidden email] > http://www.cincom.com/smalltalk > > > ------------------------------------------------------------------------ > > _______________________________________________ > vwnc mailing list > [hidden email] > http://lists.cs.uiuc.edu/mailman/listinfo/vwnc > _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Andres Valloud schrieb:
> Here's a more permlink to the messages... > > http://groups.google.com/group/comp.lang.smalltalk/browse_thread/thread/3c47ffb41e6433d3/d21a50cbcb3a49de?q=ANSI&lnk=nl& > > FIrst, things like 'abcd' asSortedCollection includes: 5 would fail if > includes: used binary search because the argument may not be comparable > to what is inside the collection. And wrapping the sort block in an > exception handler may not work either: > > " [...] consider the following: > | coll | > coll := ((1 to: 10) collect: [:each | 2 raisedTo: each]) > asSortedCollection: [:a :b | a highBit <= b highBit]. > coll allSatisfy: [:each | coll includes: each asFloat] > This should return true by the ANSI standard. However, it will raise an > error if you try to look for the floating point value using the sort block. > > Also, does [binary search] work for the case where the sort block > doesn't uniquely > distinguish items? > | coll | > coll := (1 to: 1000) asSortedCollection: [:a :b | true]. > coll allSatisfy: [:each | coll includes: each] > In this case, [includes:] has to linearly look through the whole > collection for each element. " thank you both for the explanation and links. Slightly related: Here at work we rarely use #asSortedCollection: [:a :b | ... ]. Instead, we usually use a custom extension, #sortedBy: [ : a | ... ], which returns a comparable value. There are two reasons for doing this: - sorting is usually done by computing a comparable value based on each item of the collection, so the expression to compute the comparable value is the same for a and b. - computing the comparable value is often expensive, so #sortedBy: first computes the values for all elements and then uses the precomputed values to sort the collection. Maybe it would be useful to have a method similar to #sortedBy: in the base image, in two flavors (with and without precomputation). _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Holger wrote:
>>Maybe it would be useful to have a method similar to #sortedBy: in the base image Here are techniques I use for efficient sorting. Given that... Sorting usually involves multiple attributes. Sorting needs to be fast. Determining all the attributes in advance for each of the objects to be compared would not be efficient. A general sorting framework should allow for the sorting of dissimilar objects without crashing. Here is an example of how I sort code models of GemKit: GbcCode>><= other | i a b | i := 1. [ (a := self sortableFieldAt: i) isNil ifTrue: [^true]. (b := other sortableFieldAt: i) isNil ifTrue: [^false]. a < b ifTrue: [^true]. b < a ifTrue: [^false]. (i := i + 1) < 1000 ] whileTrue: []. self error: '#sortableFieldAt: returned an unlikely number of matches before returning nil'. GbcCode>>sortableFieldAt: pos ^nil GbcClassReference>>sortableFieldAt: pos pos = 1 ifTrue: [^50]. "Order: namespaces, variables, classes, methods" pos = 2 ifTrue: [^self hierarchyName]. pos = 3 ifTrue: [^self lastSourceMagnitude]. ^nil GbcGsUser>>sortableFieldAt: pos pos = 1 ifTrue: [^25]. "Order: namespaces, variables, classes, methods" pos = 2 ifTrue: [^self name]. pos = 3 ifTrue: [^self lastSourceMagnitude]. ^nil GbcMethod>>sortableFieldAt: pos pos = 1 ifTrue: [^100]. "Order: namespaces, variables, classes, methods" pos = 2 ifTrue: [^self className]. pos = 3 ifTrue: [^self isMeta ifTrue: [50] ifFalse: [51]]. pos = 4 ifTrue: [^self selector]. pos = 5 ifTrue: [^self lastSourceMagnitude]. ^nil GbcNamespaceAlias>>sortableFieldAt: pos pos = 1 ifTrue: [^20]. "Order: namespaces, variables, classes, methods" pos = 2 ifTrue: [^self name]. pos = 3 ifTrue: [^self lastSourceMagnitude]. ^nil GbcSharedVariable>>sortableFieldAt: pos pos = 1 ifTrue: [^30]. "Order: namespaces, variables, classes, methods" pos = 2 ifTrue: [^self name]. pos = 3 ifTrue: [^self lastSourceMagnitude]. ^nil Now I just use #asSortedCollection to allow all the models to be sequenced. The big thing is that it only incurs the cost of fetching sort attributes when necessary for a comparison. Temporary objects are not created for the sorting behavior. In this implementation I returned a generality for the first field. There are plenty of other ways it can be done for more generic use, but this was the most efficient for my needs. It is simple, fast, and flexible. You could for example change the code around to pass a sortContext as a second argument (like if you wanted to sort by a different attribute for a window). Paul Baumann -------------------------------------------------------- This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired. _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Paul Baumann schrieb:
> Holger wrote: >>> Maybe it would be useful to have a method similar to #sortedBy: in the > base image > > Here are techniques I use for efficient sorting. Given that... > > Sorting usually involves multiple attributes. Sorting needs to be fast. > Determining all the attributes in advance for each of the objects to be > compared would not be efficient. A general sorting framework should > allow for the sorting of dissimilar objects without crashing. > > Here is an example of how I sort code models of GemKit: we use something similiar in conjunction with #sortedBy: - a class called MultiSortKey. It's a subclass of OrderedCollection, and each element is a sort criteria block. Example usage (a somewhat constructed example, sorry): ^ self persons sortedBy: [ : person | MultiSortKey with: person lastName with: [ person residence postalCode ] ] the persons are sorted by name first, then by the postal code. in this example, fetching the postal code is wrapped in a block, because getting the residence object might require querying a DB. For sorting this will only be required for persons with the same last name. The MultiSortKey basically works like this: MultiSortKey>>= otherCollection | size | self species == otherCollection species ifFalse: [^false]. (size := self size) = otherCollection size ifFalse: [^false]. 1 to: size do: [:index | (self resolvedAt: index) = (otherCollection resolvedAt: index) ifFalse: [^false]]. ^true MultiSortKey>>resolvedAt: index " lazy evaluation, value is cached " | v | v := self at: index. v kIsBlock ifTrue: [ v := v value. self at: index put: v ]. ^ v _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Free forum by Nabble | Edit this page |