Smalltalk › Pharo › Pharo Smalltalk Users

[GSoC blog post] Binary Search Trees

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

15 messages Options

Smiljana Knezev

[GSoC blog post] Binary Search Trees

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

Best regards,

Smiljana Knezev

ducasse

Re: [GSoC blog post] Binary Search Trees

On 7 Jul 2019, at 23:12, Smiljana Knezev <[hidden email]> wrote:

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

pay attention to the formatting of your code.

It makes is difficult to read (I do not know if this is wordpress effect or not).

Once you have your tree implementation and tests I would really like to see if you can get another implementation and measure

the impact of introducing a NullNode to remove all the isNil checks.

Best regards,
Smiljana Knezev

ducasse

Re: [GSoC blog post] Binary Search Trees

On 8 Jul 2019, at 05:51, ducasse <[hidden email]> wrote:

On 7 Jul 2019, at 23:12, Smiljana Knezev <[hidden email]> wrote:

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

pay attention to the formatting of your code.
It makes is difficult to read (I do not know if this is wordpress effect or not).

Once you have your tree implementation and tests I would really like to see if you can get another implementation and measure
the impact of introducing a NullNode to remove all the isNil checks.

Also no need of extra parentheses between binary and keywords

depth:aNode "Returns depth of a tree starting from the given node" | leftDepth rightDepth | leftDepth := -1. aNode leftChild isNotNil ifTrue: [ leftDepth := self depth: aNode leftChild ]. rightDepth := -1. aNode rightChild isNotNil ifTrue: [ rightDepth := self depth: aNode rightChild ]. ( leftDepth > rightDepth ) ifTrue: [ ^ (1 + leftDepth) ] ifFalse: [^ (1 + rightDepth ) ].

leftDepth > rightDepth ifTrue: [ ^ 1 + leftDepth ] ifFalse: [^ 1 + rightDepth ].

And even better move out of block returns

when possible

^ leftDepth > rightDepth ifTrue: [ 1 + leftDepth ] ifFalse: [ 1 + rightDepth ].

With a nullNode the code should look like

depth:aNode "Returns depth of a tree starting from the given node”

| leftDepth rightDepth | leftDepth := self depth: aNode leftChild. rightDepth := self depth: aNode rightChild. ^ leftDepth > rightDepth ifTrue: [ 1 + leftDepth ] ifFalse: [1 + rightDepth ]

Best regards,
Smiljana Knezev

ducasse

Re: [GSoC blog post] Binary Search Trees

In reply to this post by Smiljana Knezev

More feedback :)

Do not return nil

Check the last lecture of the mooc

Did you read Pharo with Style?

I do not know if LinkedList is the one of the system used by Process but in that case use the alternate implementation.

bfs: info "Breadth fisrt search for an information. If there is no key containing given infomration returns nil. If the info is found returns a node containing it" | queue | self isEmpty ifTrue: [ ^nil ]. queue := LinkedList new. queue addLast: self root. [ queue isNotEmpty ] whileTrue: [ | node | node := queue removeFirst. node key == info ifTrue: [ ^node ]. node leftChild isNotNil ifTrue: [ queue addLast: node leftChild ]. node rightChild isNotNil ifTrue: [ queue addLast: node rightChild ]. ]. ^nil.

bfs: info "Breadth fisrt search for an information. If there is no key containing given infomration returns nil. If the info is found returns a node containing it" | queue | self isEmpty ifTrue: [ ^ self ]. queue := LinkedList new. queue addLast: self root. [ queue isNotEmpty ] whileTrue: [ | node | node := queue removeFirst. node key == info ifTrue: [ ^node ]. node leftChild isNotNil ifTrue: [ queue addLast: node leftChild ]. node rightChild isNotNil ifTrue: [ queue addLast: node rightChild ]. ]

ducasse

Re: [GSoC blog post] Binary Search Trees

On 8 Jul 2019, at 05:57, ducasse <[hidden email]> wrote:

More feedback :)

Do not return nil
Check the last lecture of the mooc

Did you read Pharo with Style?

I do not know if LinkedList is the one of the system used by Process but in that case use the alternate implementation.

Sorry I’ still jet lagged and sleepy

But

- bfs: is a bad name.

- Also why not having a block to specify what we are looking for?

bfs: info "Breadth fisrt search for an information. If there is no key containing given infomration returns nil. If the info is found returns a node containing it" | queue | self isEmpty ifTrue: [ ^nil ]. queue := LinkedList new. queue addLast: self root. [ queue isNotEmpty ] whileTrue: [ | node | node := queue removeFirst. node key == info ifTrue: [ ^node ]. node leftChild isNotNil ifTrue: [ queue addLast: node leftChild ]. node rightChild isNotNil ifTrue: [ queue addLast: node rightChild ]. ]. ^nil.

=>

bfs: info "Breadth fisrt search for an information. If there is no key containing given infomration returns nil. If the info is found returns a node containing it" | queue | self isEmpty ifTrue: [ ^ self ]. queue := LinkedList new. queue addLast: self root. [ queue isNotEmpty ] whileTrue: [ | node | node := queue removeFirst. node key == info ifTrue: [ ^node ]. node leftChild isNotNil ifTrue: [ queue addLast: node leftChild ]. node rightChild isNotNil ifTrue: [ queue addLast: node rightChild ]. ]

ducasse

Re: [GSoC blog post] Binary Search Trees

bfs: aValue

findBreadthFirst: aValue

dfs: info node: aNode

findDeepFirst: aValue startingFrom: aNode

webwarrior

Re: [GSoC blog post] Binary Search Trees

In reply to this post by Smiljana Knezev

Smiljana Knezev wrote
> Dear all,
>
> I've written about implementing binary search trees:
> https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html
>
> Feedback and opinions is always welcome :)
>
> Best regards,
> Smiljana Knezev

Your implementation is unnecesarily complicated.

Remember, tree is a recursive data structure. You may have have reasons to
split tree and node into different classes, but at least delegate operations
to node.

For example, #depth and #size can be defined as following methods of Node
class:

depth
self isLeaf ifTrue: [ ^ 0 ].
^ leftChild depth max: rightChild depth

size
self isLeaf ifTrue: [ ^ 1 ].
^ leftChild size + rightChlid size

Most other operations can also be defined in similar fashion.

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Ben Coman

Re: [GSoC blog post] Binary Search Trees

In reply to this post by ducasse

I agree #bfs: is not explicit, but #breadthFirstSearch: is a well
know term, why not just that? Or even #breathFirstSearchFor:

In comparison #findBreadthFirst: and #findDeepFirst: feel a bit
awkward to me.
(I think its the repetition rhythm of "f" words)

cheers -ben

On Mon, 8 Jul 2019 at 12:55, ducasse <[hidden email]> wrote:

>
> bfs: aValue
>
> =>
>
> findBreadthFirst: aValue
>
>
> dfs: info node: aNode
>
> =>
>
> findDeepFirst: aValue startingFrom: aNode

Smiljana Knezev

Re: [GSoC blog post] Binary Search Trees

Thank you all very much. This is quite useful for me and I can learn a lot.

best regards,

Smiljana Knezev

сре, 10. јул 2019. у 12:29 Ben Coman <[hidden email]> је написао/ла:

I agree #bfs: is not explicit, but #breadthFirstSearch: is a well
know term, why not just that? Or even #breathFirstSearchFor:

In comparison #findBreadthFirst: and #findDeepFirst: feel a bit
awkward to me.
(I think its the repetition rhythm of "f" words)

cheers -ben

On Mon, 8 Jul 2019 at 12:55, ducasse <[hidden email]> wrote:
>
> bfs: aValue
>
> =>
>
> findBreadthFirst: aValue
>
>
> dfs: info node: aNode
>
> =>
>
> findDeepFirst: aValue startingFrom: aNode

Pharo Smalltalk Users mailing list

Re: [GSoC blog post] Binary Search Trees

In reply to this post by Smiljana Knezev

Hello Smiljana,

Thanks for having written down this document. I am not expert in algorithm, so I would consider myself a simple user. I have developed complex software for some times and I have never seen the need of having a binary search tree. I guess this is probably partly because of my lack of expertise in binary search tree and partly because experts in binary search trees assume that people know what it is about and in what it is useful.

My question is, when should a programmer ever need to use binary search tree? Can you add some examples on what these trees are good for, and how an average programmer should look into it. I think this will be a valuable and easy way to expand your blog.

Cheers,

Alexandre

On Jul 7, 2019, at 5:12 PM, Smiljana Knezev <[hidden email]> wrote:

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

Best regards,
Smiljana Knezev

Sven Van Caekenberghe-2

Re: [GSoC blog post] Binary Search Trees

Binary search is extremely useful to quickly find something in a sorted collection, like finding if a string or number is included in a large collection.

BTW, we already have an implementation, see SequenceableCollection>>#findBinary: and friends.

> On 10 Jul 2019, at 18:53, Alexandre Bergel via Pharo-users <[hidden email]> wrote:
>
>
> From: Alexandre Bergel <[hidden email]>
> Subject: Re: [Pharo-users] [GSoC blog post] Binary Search Trees
> Date: 10 July 2019 at 18:53:01 GMT+2
> To: Any question about pharo is welcome <[hidden email]>
>
>
> Hello Smiljana,
>
> Thanks for having written down this document. I am not expert in algorithm, so I would consider myself a simple user. I have developed complex software for some times and I have never seen the need of having a binary search tree. I guess this is probably partly because of my lack of expertise in binary search tree and partly because experts in binary search trees assume that people know what it is about and in what it is useful.
>
> My question is, when should a programmer ever need to use binary search tree? Can you add some examples on what these trees are good for, and how an average programmer should look into it. I think this will be a valuable and easy way to expand your blog.
>
> Cheers,
> Alexandre
>
>> On Jul 7, 2019, at 5:12 PM, Smiljana Knezev <[hidden email]> wrote:
>>
>> Dear all,
>>
>> I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html
>>
>> Feedback and opinions is always welcome :)
>>
>> Best regards,
>> Smiljana Knezev
>
>
>

Richard Sargent

Re: [GSoC blog post] Binary Search Trees

Administrator

In reply to this post by Pharo Smalltalk Users mailing list

On Wed, Jul 10, 2019 at 9:53 AM Alexandre Bergel via Pharo-users <[hidden email]> wrote:

Hello Smiljana,

Thanks for having written down this document. I am not expert in algorithm, so I would consider myself a simple user. I have developed complex software for some times and I have never seen the need of having a binary search tree. I guess this is probably partly because of my lack of expertise in binary search tree and partly because experts in binary search trees assume that people know what it is about and in what it is useful.

Hi Alexandre,

I can tell you that GemStone/S makes extensive use of b-trees and variants in its internal data structures. Two examples are the object table itself and the index data structures. I agree that the average programmer may never have a need for them, but I don't like to close doors on (other) developers.

Smiljana,

Others have suggested things like the Null Object pattern for the left and right links, but an alternative or supplement to that would be a space optimizing leaf node which doesn't have instance variables for left and right sub-trees. But, don't go down the optimizing paths until you have good solutions for everything else. i.e. make right, then make it fast.

I noted your code looks "C-like", which I suspect arises from being inspired by implementation in other languages.

For example, I think only an empty tree should answer 0 for depth. A tree with one node should answer 1 while a tree with 2 or 3 nodes should answer 2, etc. If you drew a tree out on paper, how many levels would you see? That is the depth, in my opinion.

Another example arises from one of the most powerful idioms in Smalltalk. Call it the #at:ifAbsent: pattern. (See also #detect:ifNone:) Someone else mentioned the passing around of nil and suggested you not do that. The #at:ifAbsent: pattern is the key to avoiding that: tell the receiver what *you* want to *do* if the thing can't be found by passing in your own block.

Additionally, you will want your nodes to support various payloads. Right now the payload is the object held in the key. That works for identity structures, but it won't work for more advanced needs, such as one would find in a database index structure. Eventually, you will want the ability to have nodes that support additional attribute(s). I've found P. J Plauger's "0, 1, infinity" rule to be a superb design guideline. In short, once you go beyond having one of something (e.g. the key itself), you need to design things so that you can have an arbitrarily large number. The step from 1 to >1 is huge and messy if retrofitted, but easy when planned in.

Eventually, you will need queries against the tree, allowing range selection, identity versus equality, etc. The tree should provide a basic infrastructure for searching, but a separate modelling of queries will be necessary. That may be out of scope for GSOC, of course.

Good work and good luck with the rest of the project!

My question is, when should a programmer ever need to use binary search tree? Can you add some examples on what these trees are good for, and how an average programmer should look into it. I think this will be a valuable and easy way to expand your blog.

Cheers,
Alexandre

On Jul 7, 2019, at 5:12 PM, Smiljana Knezev <[hidden email]> wrote:

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

Best regards,
Smiljana Knezev

Ben Coman

Re: [GSoC blog post] Binary Search Trees

In reply to this post by Pharo Smalltalk Users mailing list

On Thu, 11 Jul 2019 at 00:53, Alexandre Bergel via Pharo-users
<[hidden email]> wrote:
>
> Hello Smiljana,
>
> Thanks for having written down this document. I am not expert in algorithm, so I would consider myself a simple user. I have developed complex software for some times and I have never seen the need of having a binary search tree. I guess this is probably partly because of my lack of expertise in binary search tree and partly because experts in binary search trees assume that people know what it is about and in what it is useful.
>
> My question is, when should a programmer ever need to use binary search tree? Can you add some examples on what these trees are good for, and how an average programmer should look into it. I think this will be a valuable and easy way to expand your blog.

My "computer science" is a bit rusty but Binary Search Trees and their
generalisation to B-Trees is Databases 101.
Almost guaranteed you are "using" them with any database you use,
you're just not directly exposed to them.

If you skim "Redesigning String Data Structures to Exploit Cache"
(https://people.eng.unimelb.edu.au/jzobel/fulltext/acmjea10.pdf)
looking at the performance graphs, for "unordered searching" the best
bet for many applications is probably our hash-based Dictionary if
that covers all your needs.
BSTs are more memory efficient than Dictionarys, but less cache-friendly.

A few applications where BSTs can be useful:
* doing range searches efficiently [1]
* traversing elements in order, which is more difficult with hash tables [1] [2]
* good for "order" statistics [2] like:
- inexact search - finding closest lower and greater elements.
- determining maximum & minimum elements
- determining successor & predecessor elements
- range queries - easy to do with BSTs and not a natural operation
with Hash Tables.
* Self-Balancing BSTs guarantee operations work in O(Logn) time.
Hashing has Θ(1) is average time but some particular operations are
costly, especially when table resizing happen [2]
but also they are slower so if the data is sufficiently random
simple BSTs may do.

[1] https://stackoverflow.com/questions/371136/binary-trees-vs-linked-lists-vs-hash-tables
[2] https://www.geeksforgeeks.org/advantages-of-bst-over-hash-table/

This side-related article...
"Number crunching: Why you should never, ever, EVER use linked-list
in your code again"
(https://kjellkod.wordpress.com/2012/02/25/why-you-should-never-ever-ever-use-linked-list-in-your-code-again)
indicates that for many In-Memory operations you are better using
sorted-Arrays (e.g. SequenceableCollection>>#findBinary:)
than cache-invalidating pointer-base structures like Lists & Trees -
thus the strong association of BSTs & BTrees with slower access
disk-based databases.
Also notice that additional BST features closely match common database
operations.

cheers -ben

Richard O'Keefe

Re: [GSoC blog post] Binary Search Trees

In reply to this post by Pharo Smalltalk Users mailing list

Search trees (which come in many varieties) are good for collections in which

adding, deleting, and lookup are interleaved, where there is an order on the

elements (if used as a set) or the keys (if used as a dictionary).  So are

hash tables.  But (balanced) search trees have guaranteed O(lg n) worst case

time, which hash tables do not.  And search trees let you enumerate their

contents in ascending or descending order, at any time.  And search trees can

support queries like

 - tell me all the elements {<, <=, >, >=} x

 - tell me all the elements between L and U

 - tell me the next element (before,after) x

My Smalltalk library includes Sorted{Set,Bag,Dictionary} implemented as splay trees.

Some time I must try an alternative.

in time O(lg n + number of answers).

On Thu, 11 Jul 2019 at 04:53, Alexandre Bergel via Pharo-users <[hidden email]> wrote:

Hello Smiljana,

Thanks for having written down this document. I am not expert in algorithm, so I would consider myself a simple user. I have developed complex software for some times and I have never seen the need of having a binary search tree. I guess this is probably partly because of my lack of expertise in binary search tree and partly because experts in binary search trees assume that people know what it is about and in what it is useful.

My question is, when should a programmer ever need to use binary search tree? Can you add some examples on what these trees are good for, and how an average programmer should look into it. I think this will be a valuable and easy way to expand your blog.

Cheers,
Alexandre

On Jul 7, 2019, at 5:12 PM, Smiljana Knezev <[hidden email]> wrote:

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

Best regards,
Smiljana Knezev

Pharo Smalltalk Users mailing list

Re: [GSoC blog post] Binary Search Trees

Thanks for your detailed answers

Cheers,

Alexandre

On Jul 12, 2019, at 8:49 AM, Richard O'Keefe <[hidden email]> wrote:

Search trees (which come in many varieties) are good for collections in which
adding, deleting, and lookup are interleaved, where there is an order on the
elements (if used as a set) or the keys (if used as a dictionary). So are
hash tables. But (balanced) search trees have guaranteed O(lg n) worst case
time, which hash tables do not. And search trees let you enumerate their
contents in ascending or descending order, at any time. And search trees can
support queries like
- tell me all the elements {<, <=, >, >=} x
- tell me all the elements between L and U
- tell me the next element (before,after) x

My Smalltalk library includes Sorted{Set,Bag,Dictionary} implemented as splay trees.
Some time I must try an alternative.
in time O(lg n + number of answers).

On Thu, 11 Jul 2019 at 04:53, Alexandre Bergel via Pharo-users <[hidden email]> wrote:
Hello Smiljana,

Thanks for having written down this document. I am not expert in algorithm, so I would consider myself a simple user. I have developed complex software for some times and I have never seen the need of having a binary search tree. I guess this is probably partly because of my lack of expertise in binary search tree and partly because experts in binary search trees assume that people know what it is about and in what it is useful.

My question is, when should a programmer ever need to use binary search tree? Can you add some examples on what these trees are good for, and how an average programmer should look into it. I think this will be a valuable and easy way to expand your blog.

Cheers,
Alexandre

On Jul 7, 2019, at 5:12 PM, Smiljana Knezev <[hidden email]> wrote:

Dear all,

I've written about implementing binary search trees: https://pharokeepers.github.io/pharo/2019/07/07/SmiljanaBinarySearchTreeinPharo.html

Feedback and opinions is always welcome :)

Best regards,
Smiljana Knezev