Design Principles Behind Smalltalk, Revisited


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Göran Krampe
Hi all!

Todd Blanchard <[hidden email]> wrote:
> Funny, I just blogged about this.
>
> http://www.blackbagops.net/?p=93

And a "response" from me:

        http://goran.krampe.se/blog/Bits/ODBvsRDB.rdoc

But... let me ramble a bit about the RDB life cycle stuff.

JJ IIRC talks about making a "proper" relational model and then letting
multiple apps written in various languages over time operate on it - or
parts of it. The idea is that the data "lives forever" and the apps come
and go.

Is this idea really based on real-world observations? I dunno; I have
only been exposed to a few "enterprises", so my experience is of course
no proof, but I have a feeling that it goes more like this:

1. Someone builds an app. Or hey, the company buys one. A big business
system, or whatever. It has tons of interesting data in an RDB. It is
not object oriented and it has few or very bad interfaces to the outside
world.

2. Another app, bought or homemade, wants to use that data - or even
manipulate it! No one at the company has ever thought of the concept of
encapsulation - so what the heck, let's go straight to the source - use
SQL right into the RDB - these table and column names don't look so hard
to grok... For read-only queries we will hopefully get it right; for
manipulations we damn sure *hope* to get it right.

3. And yet another app pops up, putting its fingers in the cookie jar
too, and so it goes. Eventually we have tons of apps written in a bunch
of languages using/abusing the RDB, adding tables of their own, breaking
a few rules here and there perhaps, not using the proper SQL and so on.
It might be tempting to say this is BY DESIGN and that this is GOOD, but
I think that is often a reconstruction of the truth.

I also don't think that first app ever really dies without the DB going
down with it. I also don't think you *first* design the DB, then build
apps to use it. Nope, it is that first app that comes with the DB and
the DB can't just stand on its own without it. Sure, you might *rewrite*
that app using the same DB - but have you ever seen that being actually
done? Some of the apps that came afterwards may go, but the original
system typically is only *replaced including the DB* with something else
when it gets unbearable.

Using the RDB as a sharing ground for applications is IMHO really,
really bad. Sure, it *works* kinda, but very fast you end up with
replicated SQL statements all over the place. Then someone says "stored
procedures" and hey... why not consider OBJECTS? There is probably a
reason why people are so worked up about Services these days. :)

Just my 3 cents of course.

regards, Göran

PS. On a given day and context lots of factors come into play. I just
don't buy simple answers about RDBs being superior for enterprises based
on these particular arguments. There are large mission critical systems
built using ODBs running at "Enterprises". If it fits they rock.


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

dcorking
In reply to this post by Howard Stearns
On 1/2/07, Howard Stearns <[hidden email]> wrote:
> There are also problems for which pencil and paper really aren't suited
> for. Same for RDBMS. They can be made to work with the great expenditure
> of resources, chewing gum, bailing wire, duct tape, vise grips, etc....

> What I'm trying to do -- and of course, this isn't a Squeak question at
> all, but I hope it is a Squeak community question -- is try to learn
> what domain a perfectly running RDBMS is a good fit for by design,
> compared with a perfectly running alternative (even a hypothetical one).

I am not clear what you mean by "good fit by design".

When you asked in an earlier message "whether the math techniques that
were developed to provide efficient random access over disks 20 years
ago are still valid", were you referring to the math techniques of the
relational model?

If so, my hunch is that you are framing the question upon an incorrect
perception of the purpose of the relational calculus.   My
understanding is that the calculus, or specifically SQL, is a _problem
statement language_, a way for engineers to specify what needs to be
done, leaving the computer to figure out how to do it.

I wasn't doing this 20 years ago, but my reading of history is that
engineers knew perfectly well how to make efficient use of disks, and
when their employer bought the leading RDBMS they got a slow layer of
murky proprietary code, with a shiny standardised data model and API.

In other words, RDBs make data access slower, _but_ make engineering
easier for some problem domains.

David


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Bryce Kampjes
In reply to this post by Howard Stearns
Howard Stearns writes:

 > What I'm trying to do -- and of course, this isn't a Squeak question at
 > all, but I hope it is a Squeak community question -- is try to learn
 > what domain a perfectly running RDBMS is a good fit for by design,
 > compared with a perfectly running alternative (even a hypothetical one).
I'd say it is a good fit if you're placing the database schema at the center
of your large system or you're using the query facilities. Relational algebra
is often just powerful enough to model commercially interesting
systems. Its lack of expressive power makes it a very powerful system
to manipulate, either during design or by a query optimizer.

The great strength of RDBMSes is that they are a mathematically decidable
and complete system. If you can translate a problem into relational
algebra you can always find a solution; however, such a system is not
powerful enough to model arithmetic on the natural numbers.

Bryce


RE: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Peter Crowther-2
In reply to this post by Howard Stearns
> From: Howard Stearns
> I'm asking what kinds of problems RDBMS are
> uniquely best at solving (or at least no worse).

If you could go from a clean slate for each unique problem, probably
none.  Same for almost any other widely-deployed technology - almost by
definition, if it has been deployed outside its niche then it has been
deployed in sub-optimal ways.

> I'm not asking whether
> they CAN be used for this problem or that.  I'm asking this from an
> engineering/mathematics perspective, not a business ("we've always done
> things this way" or "we like this vendor") perspective.

Ah.  Theory :-).  In theory, I agree with you.  In reality, I agree with
Andreas - RDBMSs are stable and widely understood, and they aren't
*that* bad for quite a wide class of problems.

> [Naively, it seems like the obvious solution for this
> (mathematically)
> is a hashing operation to keep the data evenly distributed over
> in-memory systems on a LAN, plus an in-memory cache of recently used
> chunks. But let's assume I'm missing something. The task here is to
> figure out what I'm not seeing.]

Stability and incremental development.  How long would it take to
develop your system and get the showstopper defect rate down low enough
for the system to be in line-of-business use?  How would you extend your
system when the next application area came along?  How would you
convince your funder (who wants some part of this system live *now*) to
wait long enough to get the defects out?

> Maybe this isn't typical

Alarmingly, it's not atypical.  My day job involves a *lot* of plumbing
- connecting up previously-incompatible data sources.  This is because
most organisations grow organically, and their IT systems grow
organically with them.  The systems are patch upon patch, and it's never
possible to rip them out and start again.

> Anyway, either the data AS USED fits into memory or doesn't.

I think that's naive.  Could I instead propose "the data AS USED fits
into memory plus what can reasonably be transferred via the mass storage
subsystem"?  For many of the apps I use, 98+% of the data accessed comes
from RAM - but it's nice for the remaining 2% to be able to be 10x or
100x the size of RAM without major ill effects.  However, are you
looking at the correct boundary?  Consider tape vs disk, L2 cache versus
main memory, registers and L1 cache versus L2, etc.  I would presume you
could get even faster performance reading all this data into a mass of
Athlon or Core L2 caches and using the HyperTransports to hook 'em
together - why should we use this slow RAM stuff when we have this much
faster on-chip capability?  In other words, what's your rationale for
picking RAM and disk as the boundary?

> Is this still the fastest way? (Answer is no.)

No.  Neither's your proposed approach of using main memory, I suspect.
It may, however, be the fastest per dollar of expenditure on the end
system.

> Is there some
> circumstance in which it is the fastest? Or the safest? Or allow us to
> do something that we could not do otherwise?

The latter, yes: develop a sufficiently robust and functional
application in a sufficiently short time with a sufficiently cheap set
of developers.

> Having tools to allow a cult of specialists to break your own computing
> model (the relational calculus) is not a feature, but a signal that
> something is wrong.

Agree entirely :-).

> Maybe if we define the
> problem as "and you only have one commodity box to do it on." That's
> fair. Maybe that's it?  (Then we need to find an "enterprise" with only
> one box...)

Or /n/ commodity boxes, where n is the capital the organisation can
reasonably deploy in that area.  I suspect you're coming from a
background of solving "hard" problems, where throwing tin at the job is
acceptable, to a world where return on investment determines whether a
project can be justified or not.  If it's not justifiable, it shouldn't
get done - and there are plenty of quotes we've put in where we've been
the cheapest, but the company's decided not to proceed because,
actually, the cost of the system is more than they would ever save from
using it.  That's a pretty sharp razor for business applications, but
ultimately it's the appropriate one to use - it avoids wasting capital
and human effort to produce a shining solution when, ultimately, it
would have been cheaper to use lots of monkeys with typewriters.

                - Peter


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Howard Stearns


Peter Crowther wrote:
>> ...

lots of good comments. (Thanks.)

> I suspect you're coming from a
> background of solving "hard" problems, where throwing tin at the job is
> acceptable, to a world where return on investment determines whether a
> project can be justified or not.  ...

Heh. Bingo.

But the other side of the coin is that so many projects in this "easy
problem" world are failures. A higher failure rate than the clean-slate
hard problem world!

I've had a $300K hard-problem budget zeroed because, in part, the last
easy-problem folks spent $26M (some say $52M) to implement a very
standard three-tier system that didn't work. OK. I get that. This is the
way it is, and I've got plenty to say about that, too, over a beer.

But I'm an engineer. I want to understand why the three-tier projects
fail. And how to avoid that. I know that there are people who can make
them succeed despite the math, using leadership and operations research
and charm and ruthlessness and lots of money, or whatever. But that's
not my domain. I may not have the choice to always pick the right tool
for the job, but I do want to try to understand what makes something the
right (or wrong) tool.

--
Howard Stearns
University of Wisconsin - Madison
Division of Information Technology
mailto:[hidden email]
jabber:[hidden email]
voice:+1-608-262-3724


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Jimmie Houchin-3
In reply to this post by Peter Crowther-2
Peter Crowther wrote:

>> From: J J
>>> From: Howard Stearns <[hidden email]>
>>> That's something I've never really understood: what is the domain in
>>> which Relational Databases excel?
>> Handling large amounts of enterprise data.
>
> Handling and dynamically querying large amounts of data where the data
> format is not necessarily completely stable and ad-hoc query performance
> is important.  "Large" here is "much larger than main memory of the
> machine(s) concerned".  I routinely handle data sets of tens of gigs on
> current commodity hardware - storing the data in RAM would be somewhat
> faster, but too expensive for the available capital.
>
> The strength of relational over other forms is in being able to form
> arbitrary joins *relatively* efficiently, and hence in being able to
> query across data many times larger than main memory without excessive
> disk traffic.
>
> Google isn't a good counter-example, as the ad-hoc querying is missing.
> The types of queries done on the Google database are very limited and
> are well known in advance.

My apologies for an ignorant and naive reply, so forgive me if I am way
off base.

But it seems to me that being able to perform arbitrary joins relatively
efficiently is a requisite of an RDBMS because an RDBMS requires you to
arbitrarily partition your data in such a way as to require such joins.

Any time I've spent reading a book on SQL that speaks of "normalizing"
my data, I've never liked what I read.

1st Normal Form contains:
Atomicity: Each attribute must contain a single value, not a set of values.

Since a list is a natural and common way of grouping things, it is by
nature (IMO) an unnatural thing to decompose the list just so that I have
the express ability to recompose it.
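
To make that concrete with a tiny sketch (a Squeak workspace snippet, all names invented): the object simply keeps its list, while first normal form pulls the list out into a child table that has to be joined back together.

"The object form: an order just holds its items as a collection."
order := Dictionary new.
order at: #number put: 1001.
order at: #items put: (OrderedCollection withAll: #('widget' 'sprocket' 'gear')).

"Recomposing the list is just reading the attribute back."
(order at: #items) do: [:each | Transcript show: each; cr].

"The 1NF form, sketched as comments since only the shape matters here:
   ORDERS(order_no)             one row: 1001
   ORDER_ITEMS(order_no, item)  three rows: (1001, widget) (1001, sprocket) (1001, gear)
 and getting the list back means a select/join on order_no."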

Things like that I don't believe are common to other methods of
persistence. I may be wrong.

So I don't believe that comparing a requisite of an RDBMS (efficient
joins) to other persistence methods (OODBMS, filesystem, etc.) which don't
require such joins is a valid comparison, or at least not one in which the
RDBMS wins.

Of course this is a simple argument and could be debated and go off into
the reasons of RDB theory. But there's not enough room for that.

I also don't understand what queries could be performed with an RDBMS
that can't be with Google. Or that couldn't if Google partitioned its data
for such queries. After all, the data set would have to be partitioned
correctly for an RDB to perform said queries too.

Personally I've almost always been pleased with the performance of my
Google queries. I've also been to many, many, many sites backed by an
RDB in which the queries were horribly slow.

So I personally would be reticent to say Google made a wrong decision.
(and yes, I know you didn't say so either. :)

Jimmie


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Joshua Gargus-2
In reply to this post by J J-6
(I see that the conversation has moved along since the time that I  
started to draft this, but here goes...)


On Jan 2, 2007, at 11:10 AM, J J wrote:
> Sanity check:  google is trying to keep a current snapshot of all  
> websites and run it on commodity hardware.  You could do exactly  
> the same thing with a lot less CPU's using a highly tuned,  
> distributed RDBMS.  They chose to hand tune code instead of an RDBMS.

What, really?  There are many possible reasons that Google doesn't use
an RDBMS to index the web: stupidity, arrogance, excessive cost of an
RDBMS, sound engineering decisions, or a combination of these.

According to the computer systems research community, Google has
sound engineering reasons for its architecture; they have published
papers at top conferences such as OSDI and SOSP.  See
http://labs.google.com/papers ("The Google File System" and "BigTable..."
might be the most relevant to this conversation).

That's not to rule out the possibility of stupidity, arrogance,
excessive cost, etc.  But it does cast doubt on the unsubstantiated
claim that Google could "do exactly the same thing with a lot less
CPUs".

>
>> Finally, in world with great distributed computing power, is  
>> centralized transaction processing really a superior model?
>
> Some people seem to think so:
> http://lambda-the-ultimate.org/node/463
>
> And there is more then that.  I believe in that paper (dont have  
> time to verify) they mention that hardware manufacturers are also  
> starting to take this approach as well because fine grain locking  
> is so bad.

As you mentioned in a follow-up email, this wasn't the paper you  
meant.  Although it has nothing whatsoever to do with RDBMSes, I  
would recommend anyone who has enough free time to learn enough  
Haskell to read that paper.

Did you happen to find the intended link?

>
>> - Working with other applications that are designed to use RDB's?  
>> Maybe, but that's a tautology, no?
>
> Again, one has to work in a large company to appreciate the nature  
> of enterprise application development.

I have no doubt that you're right, but it doesn't answer the  
question: what is it that RDBs *fundamentally* get correct?  It's  
quite like the easy but unsatisfying answer to "why is Smalltalk so  
great?"... "well, you can't appreciate it unless you've grokked  
Smalltalk".

Certainly RDBs are essential to the operations of the modern  
enterprise, but how much of this is because RDBs are really the best  
imaginable approach to this sort of thing, and how much is due to a  
complicated process of co-evolution that has resulted in the current  
enterprise software ecosystem?

Josh



RE: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Peter Crowther-2
In reply to this post by Howard Stearns
> From: Howard Stearns
> Peter Crowther wrote:
> lots of good comments. (Thanks.)

Thanks for the response - didn't know how they'd be taken.

> But the other side of the coin is that so many projects in this "easy
> problem" world are failures. A higher failure rate than the clean-slate
> hard problem world!

Yup.  Because the (technically) "easy problems" are organisational
nightmares, with vendor/customer politics, sales/tech politics, many
stakeholders at the client who are using the system for political
infighting and empire-building, and (in general) at least one key
stakeholder who will do his/her best to sabotage the project as it's to
their personal advantage to have it fail.  And that's in a *small*
project.

By contrast, the clean-slate projects typically have a few key
stakeholders, clear and non-conflicting requirements, and less in the
way of internal politics.

Consider the following (typical) example of an "easy" problem:
- The client invites tenders for a specified system;
- The salesperson wants their bonus, so deliberately tenders for less
than they know the system will cost to develop;
- The developing company wins the bid;
- The salesperson gets their bonus, and moves on to the next sale;
- The project *cannot* be a win for both remaining sides, as it is not
possible to bring it in with all features and within budget - and that's
ignoring requirements creep;
- Somebody loses.  Probably both sides lose: there's no profit in the
job, and the end system doesn't do what the client wants.

Unless you consider the systems angle, you don't see the full system.
The *full* system includes all the humans who interact to produce it,
and therein lies a large chunk of the problem.

> I want to understand why the three-tier projects
> fail. And how to avoid that. I know that there are people who can make
> them succeed despite the math, using leadership and operations research
> and charm and ruthlessness and lots of money, or whatever. But that's
> not my domain. I may not have the choice to always pick the right tool
> for the job, but I do want to try to understand what makes something
> the right (or wrong) tool.

From observation (perhaps with charcoal-tinted spectacles), I think
you're looking at the end of the problem that can make a few percent of
difference.  If, instead, you look at all the messy leadership, charm,
ruthlessness and money side, I think you're looking at the side where
most of the difference in the success of a project is *actually* made.
It's possible to do so, but you need to apply the principles of systems
analysis... and competition, of course.  Even if your company is
sensible and bids at a level where they can do the job, they'll be
undercut by a lying salestoad from a company who (eventually) can't.

Cynical?  Moi? :-)

                - Peter


RE: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Peter Crowther-2
In reply to this post by Joshua Gargus-2
> From: Joshua Gargus
> what is it that RDBs *fundamentally* get correct?

People find it easy to understand tabular data and to cross-reference
between tables.  Relational databases contain tabular data.  So people
find relational databases easy to understand compared to the
alternatives.  Other than the uniformity ("everything is either a tuple
or an atomic value"), there's little else to commend them - but that
ease of understanding has been enough, I think.

The rest of the system is optimisation to try to get relatively
efficient use of the machine despite expressing the problem in ways that
are easy for humans to understand.  Oh, and trying to make up for the
*dreadful* query language that IBM inflicted on the world with SQL.  I
learned the principles of relational systems with db++, a very odd
relational system that used a very clean relational algebra where
select, project and join were first-class operations.  It's certainly
helped me cut through the fog of SQL, where the principles are far less
clear!

                - Peter


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

tblanchard
In reply to this post by Göran Krampe

On Jan 2, 2007, at 1:36 PM, [hidden email] wrote:

> Using the RDB as a sharing ground for applications is IMHO really,
> really bad. Sure, it *works* kinda, but very fast you end up with
> replicated SQL statements all over the place. Then someone says "stored
> procedures" and hey... why not consider OBJECTS? There is probably a
> reason why people are so worked up about Services these days. :)


So if it is objects instead of tables - how is this different?
Uh, and the alternative would be what?  Take a typical company that makes and sells stuff.

They have customers (hopefully).  

The marketing guys want the customer's demographics and contacts to generate targeted messages.  
The accounting people want to know their credit status, payments, and order totals.
The inventory/production planning guys don't really care who is buying what, but they want to see how much of each thing is going out the door.
The product development people are looking to spot new kinds of demand trends.
The sales guys want recent order history, contact event logs, etc.

There are many cross cutting concerns.

If you take the naive object model you probably have
Customers->>Accounts->>Orders->>Items-----(CatalogItem)->InventoryStatus

Works for most traversals; you put the customers in a dictionary at the root by some identifier.
But for the people who process orders or do shipping, this model is a drag.  They just want orders and items, and navigating to all the orders by searching starting at customers is nuts.  So maybe you add a second root for orders for them.  Then there's the inventory stuff....
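
A workspace-style sketch of that "second root" idea (Dictionaries standing in for real domain classes, all names invented):

"Two roots over the same order objects, so order processing and shipping
 don't have to start every traversal at the customers."
customersById := Dictionary new.    "root 1: customer id -> customer"
ordersByNumber := Dictionary new.   "root 2: order number -> order"

customer := Dictionary new.
customer at: #name put: 'Acme Corp'; at: #orders put: OrderedCollection new.
customersById at: 42 put: customer.

order := Dictionary new.
order at: #number put: 1001; at: #open put: true.
(customer at: #orders) add: order.
ordersByNumber at: 1001 put: order.   "register under the second root too"

"Shipping now asks the orders root directly."
openOrders := ordersByNumber values select: [:each | each at: #open].

And every one of those extra roots has to be invented and maintained by hand, which is exactly the pain being described here.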

Everybody wants a different view with different entry points.  I'm talking enterprise architecture here - bigger than systems, which is bigger than applications.

Relational databases don't care about your relationships or roots - anything can be a root.  Anything can be correlated.  Any number of object models can map to a well-normalized schema.

RDBMS systems have a couple of nice properties - you can produce lots of different views tailored to a viewpoint/area of responsibility, and they guarantee data consistency and integrity - something I find lacking in OO solutions.

Here's a fun game.  Build an OO db.  Get a lot of data.  Over time, deprecate one class and start using another in its place.  Give it to another developer who doesn't know the entire history.  One day he deletes a class because it appears not to be referenced in the image anymore.  Six months later, try to traverse the entire db to build a report and find errors: 'no such class for record'.  What will you do?

This has happened BTW to me. If I have long lived data, with different classes of users and different areas of responsibilities, I want the RDBMS because it is maximally flexible while providing the highest guarantees of data integrity and consistency.  The problems I've heard described are the result of poor design and unwillingness to refactor as the business changes and grows.

FWIW I have worked at everything from 5 person Mac software companies (anyone remember Tempo II?) to telcos, aerospace, government agencies, and the world's largest internet retailer (a scenario where the relational database turns out to be not the best fit overall).  My solution selection hierarchy as the amount of data grows runs:

1) In the image
2) Image segments or PLists in directories
3) RDBMS/ORM like glorp
4) RDBMS with optimized SQL
5) SOA

I'm pretty sour on OODBMSs based on my long-running experiences with them.

-Todd Blanchard



Re: Design Principles Behind Smalltalk, Revisited

Marcel Weiher
In reply to this post by J J-6

On Dec 26, 2006, at 3:18 , J J wrote:

>
>> Again, to contrast with Python, Squeak wants to run the show, but  
>> Python plays nice with all the other free tools of the GNU/Linux  
>> ecosystem.
>
> I keep on seeing this, but it appears largely overstated.  Java has  
> it's own VM, threads etc. as well.

Yes, Java.  I think Python is very different from Java in this  
context, as Java also wants to run the show, and I think this is where  
it is quite similar to Smalltalk.  Python on the other hand is quite  
happy to play along with others, just like Ruby, Perl and, of course, C.

[more java comparison]

>
> And if you mean more to address the tools, well yes you *can* edit  
> Java code in vi if you really want to.  But no one really wants to.  
> And if your interface to the language is through some program  
> anyway, then the "barrier" of the code not being on the file system  
> disappears.

Once again, I think that Java is not a valid substitute for Python in
this context.  In my experience, hacking Python or Ruby in
vi is not just doable but quite useful.  I can't say the same for Java.

Marcel



Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Marcel Weiher
In reply to this post by J J-6

On Jan 2, 2007, at 11:10 , J J wrote:
>> J J wrote:
>>>> ... I simply believe in the right tool for the right job,
>>> and you can't beat an RDB in it's domain. ...
>>
>> That's something I've never really understood: what is the domain  
>> in which Relational Databases excel?
>
> Handling large amounts of enterprise data.  If you have never worked  
> in a large company, you probably wont appreciate this.

Well, I have worked in a large-ish enterprise and my experience was  
that moving *away* from the RDB was central to improving performance  
around a hundred- to a thousandfold, with the bigger improvement for  
the project that completely eliminated the RDB.

> But in a large company you have a *lot* of data, and different  
> applications want to see different parts of it.  In an RDBMS this is  
> no problem, you normalize the data and take one of a few strategies  
> to supply it to the different consumers (e.g. views, stored  
> procedures, etc.).

Har har.  Sorry, but I have seen very few actually reusable data models.

>
>> - Data too large to fit in memory? Well, most uses today may have  
>> been too large to fit in memory 20 years ago, but aren't today. And  
>> even for really big data sets today, networks are much faster than  
>> disk drives, so a distributed database (e.g., a DHT) will be  
>> faster.   Sanity check: Do you think Google uses an RDB for storing  
>> indexes and a cache of the WWW?
>
> Are you serious with this (data too large to fit into memory)?  And  
> if you use a good RDBMS then you don't have to worry about disk  
> speed or distribution.

You are kidding, right?

> The DBA's can watch how the database is being used and tune this  
> (i.e. partition the data and move it to another CPU, etc., etc.).

To some limited extent, yes.  But they can't work miracles, and  
neither can the DB.  In fact, if you know your access patterns, you  
can (almost) always do better than the DB, simply because there are  
fewer layers between you and the code.

> Oh, but you found one example where someone with a lot of data  
> didn't use a RDB.  I guess we can throw the whole technology sector  
> in the trash.  Sanity check:  google is trying to keep a current  
> snapshot of all websites and run it on commodity hardware.  You  
> could do exactly the same thing with a lot less CPU's using a highly  
> tuned, distributed RDBMS.

That's a big claim, mister.  Care to back it up?

Marcel



Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Marcel Weiher
In reply to this post by Andreas.Raab

On Jan 2, 2007, at 12:57 , Andreas Raab wrote:

> Howard Stearns wrote:
>> Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are  
>> uniquely best at solving (or at least no worse). I'm not asking  
>> whether they CAN be used for this problem or that.  I'm asking this  
>> from an engineering/mathematics perspective, not a business ("we've  
>> always done things this way" or "we like this vendor") perspective.
>
> The main benefit: They work.

For some definition of 'work', yes.  My experience so far is more that  
they are 'perceived' to work by enterprisey management-types.  "If  
it's got a database (preferably Oracle), then it must be a real  
business system.  Otherwise it's some weird toy".  This perception is  
not to be ignored lightly, but it isn't the same as actual technical  
merit and not necessarily backed by facts.

> There is no question how to use them, apply them to problems, map  
> them into different domains etc.

Really?  A pretty smart acquaintance of mine who does "enterprise"  
consulting with a company that's actually pretty damn good (and has a  
pretty good reputation and track record AFAICT) once asked the not  
entirely rhetorical question why, in this day and age, every CRUD  
application turns into a PhD thesis.

The Sports system I was supposed to improve had a very complicated  
schema, but all the interesting data had to be stored in serialized  
dictionaries anyway because it simply wasn't regular enough.  During  
the development of the replacement (which doesn't use an RDB at all),  
we were presented with a 'standardized' relational schema for the  
domain.  We fell over laughing.  I don't think we could have printed  
it on an A0 sheet.  And that isn't the only time I have seen this sort  
of thing.

I did come up with a schema that would have worked, and apparently our  
DBAs were quite impressed with it, but I wasn't, as it was really just  
a meta-model for defining arbitrary key-value pairs and relations  
between them.

> This has all been worked out, there is nothing new to find out, just  
> a book or two to read. From an engineering perspective that is  
> vastly advantageous since it represents a solution with a proven  
> track-record and no surprises.

At least not until you put the system into production and wonder why  
it doesn't actually work at all, or performs worse than 2 people doing  
the same job manually.

Marcel



Re: Design Principles Behind Smalltalk, Revisited

Marcel Weiher
In reply to this post by Paul D. Fernhout

On Dec 28, 2006, at 10:51 , Jecel Assumpcao Jr wrote:

[also snipped rather radically]

[community around a new Squeak?]

> That is something I thought a lot about back in 1998. And I watch
> closely the community's reaction to stuff like Coke or Slate. What I
> concluded was that there are several rather different groups. The
> largest group is the eToys users and they are extremely under
> represented here. There is a tiny "use Squeak to build something better"
> group but most people here are in the "we need a great open source
> Smalltalk-80" (there is a lot of overlap, of course). So I don't see how
> you can change things and not lose a significant part of this
> (squeak-dev) community.

In theory: refactor.  Make the "something better" something that is more abstract than either eToys or Smalltalk-80 and allow it to have "subclasses" that are, in effect, eToys and a great Smalltalk-80.  Which is not the same as a big bowl of everything at once.

> Increasing complexity is easy; moving it around in tradeoffs is harder;
> reducing it is hardest and generally requires a leap of the imagination.

And/or a lot of refactoring... :-)  Or rather: refactorings that require a leap of the imagination.  But probably not "extensions" to what is already there.

>> Sorry to say it, but Squeak still seems pretty complex to me, and moreso
>> than ten years ago. :-)
>
> Which is why it is a good thing that the version of Neo Smalltalk I am
> working on right now is 16 bits. When you only have 32K objects total
> simplicity is not optional.

That is probably a good learning experience, but I am not sure that this is the same kind of simplicity.

Marcel





Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Marcel Weiher
In reply to this post by Howard Stearns

On Jan 2, 2007, at 12:36 , Howard Stearns wrote:

> I'm new to the Enterprise Software world, having been mostly in either industrial or "hard problem" software. But the 3-tier application architecture we use for financial processing at our 26 state campuses (University of Wisconsin) appears to me to be typical: large numbers of individual browsers (not communicating with each other) interact through a Web server farm with the Application Servers. The overall application is too large as implemented to allow the load to be accommodated, so it is divided by functional area into a farm of individual applications that do not talk directly to each other. This partitioning isn't very successful, because the users tend to do the same functional activities at the same times of day, so most of the applications sit idle while a few are at their limit. I assumed that a single database was used so that the RDBMS could ensure data consistency between all these different applications.

This sounds so incredibly familiar, even if the domain is quite different.  And I thought that financial processing would be the one area where RDBMSes would be able to shine...

> But it turns out that the Oracle database can't handle that, so instead, each functional area gets its own database.  Most of the work done by the system (and most of the work of programmers like me) is to COPY data from one table to another at night when the system is otherwise quiet.

And of course use various bits of 

> Maybe this isn't typical, but it is the architecture that Oracle and its PeopleSoft division pushes on us in their extensive training classes. And it appears to be the architecture discussed in the higher education IT conferences and Web sites in the U.S.

I am starting to fear that it *is* typical.  Good thing I am now pretty much completely out of the enterprisey world. :-)

Cheers,

Marcel




Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Göran Krampe
In reply to this post by tblanchard
Todd Blanchard <[hidden email]> wrote:
> On Jan 2, 2007, at 1:36 PM, [hidden email] wrote:
> > Using the RDB as a sharing ground for applications is IMHO really,
> > really bad. Sure, it *works* kinda, but very fast you end up with
> > replicated SQL statements all over the place. Then someone says  
> > "stored
> > procedures" and hey... why not consider OBJECTS? There is probably a
> > reason why people are so worked up about Services these days. :)
>
> So if it is objects instead of tables - how is this different?

Objects offer encapsulation and sharable behavior. Tables offer just
shared data.

> Uh, and the alternative would be what?  Take a typical company that  
> makes and sells stuff.
[SNIP of quick description]
> There are many cross cutting concerns.
 
[SNIP]
> RDBMS systems have a couple nice properties - you can produce lots of  
> different views tailored to a viewpoint/area of responsibility.  They  
> guarantee data consistency in integrity.  Something I find lacking  
> from OO solutions.

I am not saying that I have a Grand Solution. I agree that an ODB is
focused on having a Real object model at the core instead of a bunch of
loosely interrelated tables that can be viewed in 1000 different ways. I
still believe the object model offers real value in the form of proper
behavior, a better and more natural model, reuse of business rules and so on.

But I agree that ODBs do not offer that "twist and turn"-ability, I just
often see that as an advantage instead of a disadvantage. :)

One project I was involved in was interesting - we built a proper, good
object model in GemStone for quite a complicated domain. Then we added
an "export tool" on top of it which could produce tabular data from it -
you picked what aspects you wanted, calculated or not - and got it out
as a batch job. Then you could analyze that to your heart's content in
an OLAP tool on the side.
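
In rough workspace terms (invented collection and accessor names), that export pass is just a projection over the object model:

"Flatten picked aspects of each order into rows for the external OLAP tool.
 The orders would come from the GemStone model; here the collection is left empty."
orders := OrderedCollection new.
pickedAspects := #(customerName totalAmount itemCount).   "stored or calculated"

rows := orders collect: [:each |
    pickedAspects collect: [:aspect | each perform: aspect]].

"Dump the rows as tab-separated text for the analysis tool to chew on."
export := String streamContents: [:out |
    rows do: [:row |
        row do: [:cell | out print: cell] separatedBy: [out tab].
        out cr]].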

It would be interesting to know if there are any ODBs or ORDBs that
offer that ability - but online instead of offline. Most of the use
cases you mentioned were about getting information (and not writing).

regards, Göran


Re: Design Principles Behind Smalltalk, Revisited

Laurence Rozier
In reply to this post by Paul D. Fernhout
Paul,

Thanks for sharing this essay. I think it brings up many important topics which I'd like to comment on one at a time (or perhaps on my blog) ...

On 12/25/06, Paul D. Fernhout <[hidden email]> wrote:
When I was looking at GST vs. Ruby benchmarks today,
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gst&lang2=ruby
I came across a link at the bottom to the original "Design Principles
Behind Smalltalk" paper by Dan Ingalls, see:
http://users.ipa.net/~dwighth/smalltalk/byte_augc81/design_principles_behind_smalltalk.html

This essay attempts to look at Dan's 1981 essay and move beyond it,
especially by considering supporting creativity by a group instead of
creativity by an isolated individual, and also by calling into question
"objects" as a sole major metaphor for a system supporting creativity.
Some of this thinking about "objects" is informed by the late William
Kent's work, especially Kent's book "Data & Reality":
   http://www.bkent.net/
   http://www.bkent.net/Doc/darxrp.htm

<snip>


=== objects are illusions, but useful ones ===

In my undergraduate work in psychology I wrote a senior paper in 1985
entitled: "Why intelligence: Object, Evolution, Stability, and Model"
where I argued the impression of a world of well-defined objects is an
illusion, but a useful one. Considered in the context of the section
above, we can also see that how you parse the world into objects may
depend on the particular goal you have (reaching your car without being
wet) or the particular approach you are taking to reaching the goal
(either the strategy, walking outside, or any helping tool used, like a
neural net or 2D map). Yet, the world is the same, even as what we
consider to be an "object" may vary from time to time; in one situation
"rain" might be an object, in another a "rain drop" might be an object, in
another the weather might be of little interest. So objects are a
*convenience* to reaching goals (in terms of internal states), not reality
(which our best physics says is more continuous than anything else in
terms of quantum probabilities, or at best, more conventionally a
particle-wave duality). So objects, as tools of thought, then have no
meaning apart from the context in which we create them -- and the contexts
include our viewpoints, our goals, our tools, or history, or relations to
the community, and so on.


While there are certainly valuable insights in "Data & Reality" and I would agree that some data objects are merely "tools of thought", *many* objects have meaning and exist independent of our view/model. Quantum physics does tell us that the boundaries of "things" are hard to define precisely, but "things" themselves as aggregates are held together by forces of nature, not by external views. A keyboard can be remapped in software and different people using it can have different views of the individual key "objects". Even the keyboard itself could be viewed differently - as a word processor, game controller, or a cash register. However, any observer, human, machine or otherwise, of the measurable physical characteristics of the keyboard will not see any changes. The wave-functions underlying all of the sub-atomic particles making up that keyboard have a unique history going back at least to just after the big bang.

Today, more and more so-called information systems are being used not just for description but to augment/effect the external world. In this evolving hyperlinked meshverse of simulation and "reality", data often enters into a symbiotic relationship with "reality", where changing views can change "reality".  The "real" Mars Climate Orbiter object was destroyed because it was dependent on the data a model object had. If one accepts that a paradigm shift is underway which Croquet offers something of value in, then there are important ramifications for database and language choices.

Laurence



Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Ralph Johnson
In reply to this post by James Foster-4
I agree that RDBMSs tend to be knee-jerk reactions that produce as
many problems as they solve.  My favorite alternative is not a real
OODBMS, but instead a pattern that is best exemplified by Prevayler, a
Java framework.  The main idea is to represent your data as objects,
and to ensure that every change to the data is represented by a
Command.  Executing a Command will cause it to write itself out on a
log.  You get persistence by periodically (once a day, perhaps)
writing all your objects out to disk and recovering from crashes by
restarting from the last checkpoint and then replaying the log of
Commands.  You get multiuser access by implementing the transactions
inside the system, making them fast (no disk access) and just having a
single lock for the whole system.
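
A minimal workspace sketch of that pattern (names and log format invented): the model lives in memory, and every change appends a replayable entry to the log before it is applied.

"In-memory model plus an append-only command log; an OrderedCollection
 stands in for the log file."
balances := Dictionary new.
log := OrderedCollection new.

deposit := [:account :amount |
    log add: (String streamContents: [:s |
        s nextPutAll: 'deposit '; print: account; space; print: amount]).
    balances at: account put: (balances at: account ifAbsent: [0]) + amount].

deposit value: #savings value: 100.
deposit value: #savings value: 50.

"Recovery = load the last checkpoint (here: an empty model) and replay the log;
 a real system would parse each entry back into a Command object and execute it."
Transcript show: (balances at: #savings) printString; cr.   "prints 150"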

There are lots of things this doesn't give you.  You don't get a query
language.  This is a big deal in Java, not so big a deal in Smalltalk,
because Smalltalk makes a pretty good ad-hoc query language (for
Smalltalk programmers).  You don't get multilanguage access.  The data
must all fit in memory, or suddenly your assumptions of instantanious
transactions break down.  You have to be a decent programmer, though
it really isn't very hard, and if you let your  just barely decent
programmers build a toy system to learn the pattern then they should
do fine.  Lots of people learn the pattern by working on a production
system, but that is probably a bad idea for all patterns, not just
this one.
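
To illustrate the ad-hoc query point above (invented data): ordinary collection protocol does the work of select and project.

accounts := OrderedCollection new.
accounts add: (Dictionary new at: #owner put: 'Ada'; at: #balance put: 900; yourself).
accounts add: (Dictionary new at: #owner put: 'Bob'; at: #balance put: 120; yourself).

"Roughly: SELECT owner FROM accounts WHERE balance > 500"
rich := accounts select: [:each | (each at: #balance) > 500].
owners := rich collect: [:each | each at: #owner].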

I did this in Smalltalk long before Prevayler was invented.  In fact,
Smalltalk-80 has always used this pattern.  Smalltalk programs are
stored this way.  Smalltalk programs are classes and methods, not the
ASCII stored on disk.  The ASCII stored on disk is several things,
including a printable representation with things like comments that
programmers need but the computer doesn't.  But the changes file, in
particular, is a log and when your image crashes, you often will
replay the log to get back to the version of the image at the time
your system crashed.  The real data is in the image, the log is just
to make sure your changes are persistent.

But this message stream is about what RDBMSs are good for, and I'd
like to address that.  First, even though SQL is rightly criticised,
it is a standard query language that enables people who are not
intimately familiar with the data to access it to make reports, browse
the data, or write simple applications.  Most groups I've seen have
only programmers using SQL and so don't take advantage of this, but
I've seen shops where secretaries used SQL or query-by-example tools
to make reports for their bosses, so it can be done.  I suppose an OO
database or a Prevayler-like system could provide a query-by-example
tool, too, but I have never seen one.

Second, even though the use of an RDBMS  as the glue for a system is
rightly criticised, this is common practice.  It tends to produce a
big ball of mud, but for many organizations, this seems to be the best
they can do.  See http://www.laputan.org/mud/  One advantage of using
the RDBMS as the glue is that it is supported by nearly every language
and programming environment.  I think that the growing use of SOA will
make this less important, because people will use XML and web services
as the glue rather than a database.

Third, data in an RDBMS is a lot like plain text.  It is more or less
human readable.  It stands up to abuse pretty well, tolerating null
fields, non-normalized data, and use of special characters to store
several values in one field.  For the past few years, I have had
undergraduate teams migrating databases for a city government.  The
students are always amazed at how bad the data is.  I laugh at them.
All databases contain bad data, and it is important for the system to
tolerate it.

An RDBMS works best with relatively simple data models.  One of its
weaknesses is trees, since you have to make a separate query for each
descent.  It also has problems with versioned data, i.e. data with a
date or date range as part of the key.  But it can deal pretty well
with the usual set of objects that represent the state of the
business, and another set of objects that represent important events.
For example, a bank has deposit accounts and loans to customers, and
it records deposits, cash withdrawals, computation of interest,
payments, and checks written to other organizations.  Huge amounts of
data are OK for a RDBMS, but complex data models tend to cause
troubles.
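
A tiny sketch of the tree point (invented structure): walking an object graph is one recursive pass, while the relational version needs a fresh query for every level of descent.

"A two-level tree of Dictionaries; a schema would hold (id, parent_id) rows instead."
root := Dictionary new.
root at: #name put: 'root'; at: #children put: OrderedCollection new.
leaf := Dictionary new.
leaf at: #name put: 'leaf'; at: #children put: OrderedCollection new.
(root at: #children) add: leaf.

"One recursive block visits everything; with rows, each level of descent
 would cost another query against parent_id."
visit := nil.
visit := [:node :depth |
    Transcript show: (String new: depth withAll: $-) , (node at: #name); cr.
    (node at: #children) do: [:child | visit value: child value: depth + 1]].
visit value: root value: 0.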

It is wrong to think that persistence = RDBMS.  Architects should also
consider XML, a Prevayler-like system, binary files, OODBMS.  Each has
advantages and disadvantages.  An architect needs to have experience
with all these technologies to make a good decision.  Of course, which
one is best often depends on what is going to happen ten years in the
future, which is impossible to predict.  It is good to encapsulate
this decision so that it can be changed.  This is another advantage of
a SOA; your clients don't care how you store the data.

In the end, technology decisions on large projects depend as much on
politics as on technical reasons.  RDBMSs are the standard, the safe
course.  "Nobody ever got fired for buying Oracle".  They are usually
not chosen for technical reasons.  There are times when they really
are the best technical choice, but they are used a lot more often than
that.

-Ralph


Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

tblanchard
In reply to this post by Göran Krampe
FWIW,

The usual answer to this is to encapsulate behavior behind an API written as stored procedures and to forbid direct table access.
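
For comparison, the same encapsulation idea sketched on the object side (class and message names invented; the database version would be stored procedures plus revoked table privileges):

Object subclass: #CustomerGateway
    instanceVariableNames: 'customers'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Enterprise-Sketch'

CustomerGateway >> creditStatusFor: customerId
    "The only sanctioned way at the credit data - callers get behavior, never raw storage."
    ^ (customers at: customerId ifAbsent: [^ #unknown]) at: #creditStatus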

On Jan 3, 2007, at 12:26 AM, [hidden email] wrote:

>> So if it is objects instead of tables - how is this different?
>
> Objects offer encapsulation and sharable behavior. Tables offer just
> shared data.





Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Göran Krampe
Hi!

Todd Blanchard <[hidden email]> wrote:
> FWIW,
>
> The usual answer to this is to encapsulate behavior behind an api  
> written as stored procedures and forbid direct table access.

Yes, I kinda wrote that.... :) I wrote:

> Then someone says  
> "stored
> procedures" and hey... why not consider OBJECTS? There is probably a
> reason why people are so worked up about Services these days. :)

regards, Göran
