GSoC 2019 Introduction

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

GSoC 2019 Introduction

Smiljana Knezev
Hello everyone,

I am a final year undergraduate student of Information Technologies in Novi Sad, Serbia. I first heard about Pharo about a year ago when professor Stephane Ducasse gave a guest lecture at our Faculty. I thought I was familiar with Object Oriented Programming, having been learning Java, but it was the first time I heard branching in OO is a sign of bad design. I was drawn to Pharo because of it's simplicity and elegance.

This Tuesday I found out I was accepted for GSoC program for New Collections for Pharo project and I'm pretty excited about it. You can read about proposal here: https://summerofcode.withgoogle.com/projects/#4973054243897344
It will focus on documenting, refactoring and testing existing collections and migrating them to Containers library. Also some new collections like: SimpleTrees, B-Trees,  AVL Trees, QuadTrees, OcTrees, ... will be developed
Final part would be about research and development specialized/advanced data structures like: Immutable Queues, Immutable Lists, Immutable bidirectional maps,Threaded binary trees ... I would appreciate your feedback about the proposal. Should I change something? Maybe add or focus on some additional data structures?

Other candidate from Serbia, Nina Medic and I discussed about writing a blog together for our weekly posts. It can be found here: https://smiljanaknezev.github.io/blog/?fbclid=IwAR2vC-N38wlPGY42ijokv67Anqd3CENh8N2q47Yl9CzrR5RDxd5Tr_bjVy0 
It will be further customized and ready until Sunday. Also I have setup a twitter account to promote our blog posts and keep in touch with community. You can find me with @KnezevSmiljana .

If you have any additional tips/tricks about making this GSoC project successful, I would like to hear about them.

Looking forward to hearing from you.

Best regards,
Smiljana




Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2019 Introduction

Richard O'Keefe
Branching as such is not a sign of bad design in OOP.
The actual dogma is that *testing the class of an
argument instead of dispatching* is a sign of bad design.
Just consider the case of binary search.  That is not
the kind of branching that lends itself to dynamic
dispatch.  Nor is the kind of branching you will have
to write if you implement some sort of balanced tree.
Or quad-trees or oct-trees or k-d trees or R-trees or ...

I invite you to consider the following two methods
from java.lang.String:
  byte[] getBytes(String charsetName);
  byte[] getBytes(CharSet charset);
In Smalltalk, you have two options.
The first is double dispatch.

  getBytes: encoding
    ^encoding getBytesFor: self
  String>>getBytesFor: 
  CharSet>>getBytesFor:
In this case, that requires you to modify two classes
instead of one; in general it would require N+1 where
N is the number of overloaded methods.  It also leads
to *coupling*, where too many classes depend on each
other.  Worst of all, in a project, you might not be
*allowed* to modify CharSet.

The other is to use an "if":

  getBytesFor: encoding
    |charset|
    charset := (encoding isKindOf: CharSet)
                 ifTrue:  [encoding]
                 ifFalse: [CharSet named: encoding].
    ...

This approach
 - involves just one method, not N+1;
 - does not increase coupling;
 - does not require the ability to change system classes;
 - does not introduce additional non-private method names;
 - BUT does not cope well with additional ad hoc overloads.

There is no silver bullet.

Perhaps you may now look at Java with different eyes:
ad hoc overloading is similar to using type-based branching.

When it comes to data structures, I remember when we had a
visitor to the CS department I was working in.  His goal
while he was with us was to carefully implement and benchmark
about a dozen "advanced" priority queue data structures from
the literature on the subject.  I told him "make sure you
include the classic array-based heap".  Guess which algorithm
won!  That's right, the classic array-based heap.  The point
of the story is that a simple algorithm implemented well is
likely to do surprisingly well.  The rule of thumb I use is
"keep the number of memory references down".

As for which data structures you should look at, perhaps it
is better to start by considering what *tasks* you want to
support.  If you want to support computational geometry,
that will lead you to certain choices.  If you want to
support constraint satisfaction and/or combinatorial
optimisation, that will lead you to others.  If you want
to make concurrent programming easier, that will lead you
to concurrent stacks, queues, sets, bags, dictionaries &c.
Don't think of "AVL tree", think of "efficient sorted sets,
bags, and dictionaries" and work back from that.

Here are two (related) data structures I've wanted.
A triple store
 - is a set of (Att,Obj,Val) triples
 - supports *all* of the access patterns,
   where + means that value is known (given)
     and - means that value is unknown (iterated over)
   (+,+,+) (+,+,-) (+,-,+) (+,-,-)
   (-,+,+) (-,+,-) (-,-,+) (-,-,-)
   where each has good asymptotic efficiency
 - has low space overhead.
This has obvious application to RDF and rule-based engines.

A good graph data structure, as in nodes and edges.
Well, graph AND digraph, obviously.  You might want to
look at Knuth's book "The Stanford GraphBase" for ideas.





On Fri, 10 May 2019 at 22:00, Smiljana Knezev <[hidden email]> wrote:
Hello everyone,

I am a final year undergraduate student of Information Technologies in Novi Sad, Serbia. I first heard about Pharo about a year ago when professor Stephane Ducasse gave a guest lecture at our Faculty. I thought I was familiar with Object Oriented Programming, having been learning Java, but it was the first time I heard branching in OO is a sign of bad design. I was drawn to Pharo because of it's simplicity and elegance.

This Tuesday I found out I was accepted for GSoC program for New Collections for Pharo project and I'm pretty excited about it. You can read about proposal here: https://summerofcode.withgoogle.com/projects/#4973054243897344
It will focus on documenting, refactoring and testing existing collections and migrating them to Containers library. Also some new collections like: SimpleTrees, B-Trees,  AVL Trees, QuadTrees, OcTrees, ... will be developed
Final part would be about research and development specialized/advanced data structures like: Immutable Queues, Immutable Lists, Immutable bidirectional maps,Threaded binary trees ... I would appreciate your feedback about the proposal. Should I change something? Maybe add or focus on some additional data structures?

Other candidate from Serbia, Nina Medic and I discussed about writing a blog together for our weekly posts. It can be found here: https://smiljanaknezev.github.io/blog/?fbclid=IwAR2vC-N38wlPGY42ijokv67Anqd3CENh8N2q47Yl9CzrR5RDxd5Tr_bjVy0 
It will be further customized and ready until Sunday. Also I have setup a twitter account to promote our blog posts and keep in touch with community. You can find me with @KnezevSmiljana .

If you have any additional tips/tricks about making this GSoC project successful, I would like to hear about them.

Looking forward to hearing from you.

Best regards,
Smiljana