The Inbox: Collections-cmm.576.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: Collections-cmm.576.mcz

commits-2
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.576.mcz

==================== Summary ====================

Name: Collections-cmm.576
Author: cmm
Time: 24 July 2014, 10:39:06.452 am
UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
Ancestors: Collections-eem.575

When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.

=============== Diff against Collections-eem.575 ===============

Item was changed:
  ----- Method: SequenceableCollection>>findBinaryIndex:do:ifNone: (in category 'enumerating') -----
+ findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
- findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
  "Search for an element in the receiver using binary search.
  The argument aBlock is a one-element block returning
  0 - if the element is the one searched for
  <0 - if the search should continue in the first half
  >0 - if the search should continue in the second half
  If found, evaluate actionBlock with the index as argument
  If no matching element is found, evaluate exceptionBlock,
  with the indexes of the 'bounding' elements as optional
  arguments. Warning: Might give invalid indexes, see
  examples below.
  Examples:
+ #(1 3 5 7 11 11 15 23)
- #(1 3 5 7 11 15 23)
  findBinaryIndex: [ :arg | 11 - arg ]
  do: [ :found | found ]
  ifNone: [ :a :b | ('between: ', {a. b} printString)]
  #(1 3 5 7 11 15 23)
  findBinaryIndex: [ :arg | 12 - arg ]
  do: [ :found | found ]
  ifNone: [ :a :b | ('between: ', {a. b} printString) ]
  #(1 3 5 7 11 15 23) d
  findBinaryIndex: [ :arg | 0.5 - arg ]
  do: [ :found | found ]
  ifNone: [ :a :b | ('between: ', {a. b} printString) ]
  #(1 3 5 7 11 15 23)
  findBinaryIndex: [ :arg | 25 - arg ]
  do: [ :found | found ]
  ifNone: [ :a :b | ('between: ',{a. b} printString) ]
  "
  | index low high |
  low := 1.
  high := self size.
+ [ index := high + low // 2.
+ low > high ] whileFalse:
+ [ | test |
+ test := aBlock value: (self at: index).
+ test = 0
+ ifTrue:
+ [ "In case there are duplicate keys, walk backward to the first."
+ [ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
+ ^ actionBlock value: index ]
+ ifFalse:
+ [ test > 0
- [
- index := high + low // 2.
- low > high ] whileFalse: [
- | test |
- test := aBlock value: (self at: index).
- test = 0
- ifTrue: [ ^actionBlock value: index ]
- ifFalse: [ test > 0
  ifTrue: [ low := index + 1 ]
  ifFalse: [ high := index - 1 ] ] ].
+ ^ exceptionBlock
+ cull: high
+ cull: low!
- ^exceptionBlock cull: high cull: low!


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.576.mcz

Frank Shearar-3
On 24 July 2014 16:39,  <[hidden email]> wrote:

> Chris Muller uploaded a new version of Collections to project The Inbox:
> http://source.squeak.org/inbox/Collections-cmm.576.mcz
>
> ==================== Summary ====================
>
> Name: Collections-cmm.576
> Author: cmm
> Time: 24 July 2014, 10:39:06.452 am
> UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
> Ancestors: Collections-eem.575
>
> When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.
>
> =============== Diff against Collections-eem.575 ===============
>
> Item was changed:
>   ----- Method: SequenceableCollection>>findBinaryIndex:do:ifNone: (in category 'enumerating') -----
> + findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
> - findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
>         "Search for an element in the receiver using binary search.
>         The argument aBlock is a one-element block returning
>                 0       - if the element is the one searched for
>                 <0      - if the search should continue in the first half
>                 >0      - if the search should continue in the second half
>         If found, evaluate actionBlock with the index as argument
>         If no matching element is found, evaluate exceptionBlock,
>         with the indexes of the 'bounding' elements as optional
>         arguments.      Warning: Might give invalid indexes, see
>         examples below.
>         Examples:
> +               #(1 3 5 7 11 11 15 23)
> -               #(1 3 5 7 11 15 23)
>                         findBinaryIndex: [ :arg | 11 - arg ]
>                         do: [ :found | found ]
>                         ifNone: [ :a :b | ('between: ', {a. b} printString)]
>                 #(1 3 5 7 11 15 23)
>                         findBinaryIndex: [ :arg | 12 - arg ]
>                         do: [ :found | found ]
>                         ifNone: [ :a :b | ('between: ', {a. b} printString) ]
>                 #(1 3 5 7 11 15 23) d
>                         findBinaryIndex: [ :arg | 0.5 - arg ]
>                         do: [ :found | found ]
>                         ifNone: [ :a :b | ('between: ', {a. b} printString) ]
>                 #(1 3 5 7 11 15 23)
>                         findBinaryIndex: [ :arg | 25 - arg ]
>                         do: [ :found | found ]
>                         ifNone: [ :a :b | ('between: ',{a. b} printString) ]
>         "
>         | index low high |
>         low := 1.
>         high := self size.
> +       [ index := high + low // 2.
> +       low > high ] whileFalse:
> +               [ | test |
> +               test := aBlock value: (self at: index).
> +               test = 0
> +                       ifTrue:
> +                               [ "In case there are duplicate keys, walk backward to the first."
> +                               [ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
> +                               ^ actionBlock value: index ]
> +                       ifFalse:
> +                               [ test > 0
> -       [
> -               index := high + low // 2.
> -               low > high ] whileFalse: [
> -                       | test |
> -                       test := aBlock value: (self at: index).
> -                       test = 0
> -                               ifTrue: [ ^actionBlock value: index ]
> -                               ifFalse: [ test > 0
>                                         ifTrue: [ low := index + 1 ]
>                                         ifFalse: [ high := index - 1 ] ] ].
> +       ^ exceptionBlock
> +               cull: high
> +               cull: low!
> -       ^exceptionBlock cull: high cull: low!

The change looks good, and certainly I approve of the feature.
Wouldn't mind seeing some tests to go with it too, especially if you
took the time to backfill the other tests this method needs!

(It took me a good few minutes to parse the diff and see that the
_actual_ change is just (something like)

> +                               [ "In case there are duplicate keys, walk backward to the first."
> +                               [ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
> -                               ifTrue: [ ^actionBlock value: index ]


It's _really hard_ to see what the actual change is, when it's drowned
out by semantically meaningless whitespace changes. (If you were in an
image and did a "pretty print diff" you'd get an accurate diff, at the
expense of not seeing the horrible things the author was doing to the
layout (regardless of what layout you, the viewer, preferred).))

frank

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.576.mcz

Levente Uzonyi-2
In reply to this post by commits-2
On Thu, 24 Jul 2014, [hidden email] wrote:

> Chris Muller uploaded a new version of Collections to project The Inbox:
> http://source.squeak.org/inbox/Collections-cmm.576.mcz
>
> ==================== Summary ====================
>
> Name: Collections-cmm.576
> Author: cmm
> Time: 24 July 2014, 10:39:06.452 am
> UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
> Ancestors: Collections-eem.575
>
> When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.

There are two problems with this change
- it breaks this method's runtime guarantee which is O(log(n)) for all cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard).
- it introduces unwanted behavior. What if I want the index of the last matching element?

If you want the index of the first matching element, then you can do two
things:
- find the index after the binary search yourself (which is still not a
good idea because of the possible performance loss)
- change the binary search to behave as you'd like to

| haystack needle |
haystack := #(1 3 5 7 11 11 15 23).
needle := 11.
haystack
  findBinaryIndex: [ :each |
  needle = each
  ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ]
  ifFalse: [ needle - each ] ]
  do: nil "We'll never get here, so it can be anything."
  ifNone: [ :lower :upper |
  " upper is the index of the leftmost match if there's a match, and its value is at least 1 "
  (upper <= haystack size and: [ (haystack at: upper) = needle ])
  ifTrue: [ upper ]
  ifFalse: [ 0 ] ]

With a slight modification you can find the index of the last match too.


Levente

[1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations
[2] Here's a small benchmark demonstrating the edge case with your change:

| haystack needle |
Smalltalk garbageCollect.
haystack := Array new: 100000 withAll: 11.
needle := 11.
{ [ haystack
  findBinaryIndex: [ :each |
  needle = each
  ifTrue: [ -1 ]
  ifFalse: [ needle - each ] ]
  do: nil
  ifNone: [ :lower :upper |
  (upper <= haystack size and: [ (haystack at: upper) = needle ])
  ifTrue: [ upper ]
  ifFalse: [ 0 ] ] ] bench.
[ haystack
  findBinaryIndex: [ :each | needle - each ]
  do: [ :each | each ]
  ifNone: [ 0 ] ] bench }
#('1,150,000 per second.' '741 per second.')

>
> =============== Diff against Collections-eem.575 ===============
>
> Item was changed:
>  ----- Method: SequenceableCollection>>findBinaryIndex:do:ifNone: (in category 'enumerating') -----
> + findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
> - findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
>   "Search for an element in the receiver using binary search.
>   The argument aBlock is a one-element block returning
>   0 - if the element is the one searched for
>   <0 - if the search should continue in the first half
>   >0 - if the search should continue in the second half
>   If found, evaluate actionBlock with the index as argument
>   If no matching element is found, evaluate exceptionBlock,
>   with the indexes of the 'bounding' elements as optional
>   arguments. Warning: Might give invalid indexes, see
>   examples below.
>   Examples:
> + #(1 3 5 7 11 11 15 23)
> - #(1 3 5 7 11 15 23)
>   findBinaryIndex: [ :arg | 11 - arg ]
>   do: [ :found | found ]
>   ifNone: [ :a :b | ('between: ', {a. b} printString)]
>   #(1 3 5 7 11 15 23)
>   findBinaryIndex: [ :arg | 12 - arg ]
>   do: [ :found | found ]
>   ifNone: [ :a :b | ('between: ', {a. b} printString) ]
>   #(1 3 5 7 11 15 23) d
>   findBinaryIndex: [ :arg | 0.5 - arg ]
>   do: [ :found | found ]
>   ifNone: [ :a :b | ('between: ', {a. b} printString) ]
>   #(1 3 5 7 11 15 23)
>   findBinaryIndex: [ :arg | 25 - arg ]
>   do: [ :found | found ]
>   ifNone: [ :a :b | ('between: ',{a. b} printString) ]
>   "
>   | index low high |
>   low := 1.
>   high := self size.
> + [ index := high + low // 2.
> + low > high ] whileFalse:
> + [ | test |
> + test := aBlock value: (self at: index).
> + test = 0
> + ifTrue:
> + [ "In case there are duplicate keys, walk backward to the first."
> + [ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
> + ^ actionBlock value: index ]
> + ifFalse:
> + [ test > 0
> - [
> - index := high + low // 2.
> - low > high ] whileFalse: [
> - | test |
> - test := aBlock value: (self at: index).
> - test = 0
> - ifTrue: [ ^actionBlock value: index ]
> - ifFalse: [ test > 0
>   ifTrue: [ low := index + 1 ]
>   ifFalse: [ high := index - 1 ] ] ].
> + ^ exceptionBlock
> + cull: high
> + cull: low!
> - ^exceptionBlock cull: high cull: low!
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.576.mcz

Chris Muller-3
On Thu, Jul 24, 2014 at 4:16 PM, Levente Uzonyi <[hidden email]> wrote:
On Thu, 24 Jul 2014, [hidden email] wrote:

Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.576.mcz

==================== Summary ====================

Name: Collections-cmm.576
Author: cmm
Time: 24 July 2014, 10:39:06.452 am
UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
Ancestors: Collections-eem.575

When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.

There are two problems with this change
- it breaks this method's runtime guarantee which is O(log(n)) for all cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard).
- it introduces unwanted behavior. What if I want the index of the last matching element?

For me, it introduced the _wanted_ behavior.  :)  But yes, its very easy to imagine other cases where it wouldn't be needed and so the extra traversal would make it unwanted.  Since what I need can be done on the outside, we don't need such unwantedness on the inside.

Incidentally, it seems like the current behavior might not always answer the last-matching element; like dividing odd sizes, remainders and one-off's, you might get 2nd from last matching or even one besides that?  Boy, Franks words are ringing clear about tests for this.
 
If you want the index of the first matching element, then you can do two things:
- find the index after the binary search yourself (which is still not a good idea because of the possible performance loss)

That's what I ended up doing.  I know the composition of the data; very few duplicate keys.  It's a perfect solution for me.

- change the binary search to behave as you'd like to

| haystack needle |
haystack := #(1 3 5 7 11 11 15 23).
needle := 11.
haystack
        findBinaryIndex: [ :each |
                needle = each
                        ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ]
                        ifFalse: [ needle - each ] ]
        do: nil "We'll never get here, so it can be anything."
        ifNone: [ :lower :upper |
                " upper is the index of the leftmost match if there's a match, and its value is at least 1 "
                (upper <= haystack size and: [ (haystack at: upper) = needle ])
                        ifTrue: [ upper ]
                        ifFalse: [ 0 ] ]

Ah, interesting..
 

With a slight modification you can find the index of the last match too.


Levente

[1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations
[2] Here's a small benchmark demonstrating the edge case with your change:

| haystack needle |
Smalltalk garbageCollect.
haystack := Array new: 100000 withAll: 11.
needle := 11.
{ [ haystack
        findBinaryIndex: [ :each |
                needle = each
                        ifTrue: [ -1 ]
                        ifFalse: [ needle - each ] ]
        do: nil
        ifNone: [ :lower :upper |
                (upper <= haystack size and: [ (haystack at: upper) = needle ])
                        ifTrue: [ upper ]
                        ifFalse: [ 0 ] ] ] bench.
[ haystack
        findBinaryIndex: [ :each | needle - each ]
        do: [ :each | each ]
        ifNone: [ 0 ] ] bench }
#('1,150,000 per second.' '741 per second.')


LOL!!!!!  Good one!  Oh, okay, yes, you did call that an "edge" case.  More like a "nil" case, but it sure made for a dramatic comparison!   :-D