Hi all!
Todd Blanchard <[hidden email]> wrote:
> Funny, I just blogged about this.
>
> http://www.blackbagops.net/?p=93

And a "response" from me: http://goran.krampe.se/blog/Bits/ODBvsRDB.rdoc

But... let me ramble a bit about the RDB life cycle stuff. JJ, IIRC, talks
about making a "proper" relational model and then letting multiple apps,
written in various languages over time, operate on it - or parts of it.
The idea is that the data "lives forever" and the apps come and go.

Is this idea really based on real-world observations? I dunno; I have only
been exposed to a few "enterprises", so my experience is of course no
proof, but I have a feeling that it goes more like this:

1. Someone builds an app. Or hey, the company buys one. A big business
system, or whatever. It has tons of interesting data in an RDB. It is not
object oriented, and it has few or very bad interfaces to the outside
world.

2. Another app, bought or homemade, wants to use that data - or even
manipulate it! No one at the company has ever thought of the concept of
encapsulation - so what the heck, let's go straight to the source and use
SQL right into the RDB; these table and column names don't look so hard to
grok... For read-only queries we will hopefully get it right; for
manipulations we damn sure *hope* to get it right.

3. And yet another app pops up, putting its fingers in the cookie jar too,
and so it goes.

Eventually we have tons of apps written in a bunch of languages
using/abusing the RDB, adding tables of their own, breaking a few rules
here and there perhaps, not using the proper SQL and so on.

It might be tempting to say this is BY DESIGN and that this is GOOD, but I
think that is often a reconstruction of the truth. I also don't think that
first app ever really dies without the DB going down with it. Nor do I
think you *first* design the DB and then build apps to use it. Nope, it is
that first app that comes with the DB, and the DB can't just stand on its
own without it. Sure, you might *rewrite* that app using the same DB - but
have you ever seen that actually being done? Some of the apps that came
afterwards may go, but the original system typically is only *replaced,
including the DB,* with something else when it gets unbearable.

Using the RDB as a sharing ground for applications is IMHO really, really
bad. Sure, it *works* kinda, but very fast you end up with replicated SQL
statements all over the place. Then someone says "stored procedures" and
hey... why not consider OBJECTS? There is probably a reason why people are
so worked up about Services these days. :)

Just my 3 cents of course.

regards, Göran

PS. On a given day and in a given context lots of factors come into play.
I just don't buy simple answers about RDBs being superior for enterprises
based on these particular arguments. There are large mission-critical
systems built using ODBs running at "Enterprises". If it fits, they rock.
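PPS. To make the objects-instead-of-scattered-SQL point concrete, here is
a minimal Smalltalk sketch. All class and selector names are invented, and
#query:with: stands in for whatever driver API you happen to have. The
only point is that the SQL and the business rule live in exactly one
place:

  "Every app talks to this one object; none of them ever sees
   table or column names."
  CustomerBase >> customerNamed: aString
      ^ self query: 'SELECT id, name, credit FROM customer WHERE name = ?'
             with: aString

  CustomerBase >> extendCreditFor: aCustomer by: anAmount
      "The rule travels with the data; callers cannot bypass it."
      aCustomer credit + anAmount > self creditCeiling
          ifTrue: [^ self error: 'credit ceiling exceeded'].
      aCustomer credit: aCustomer credit + anAmount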
In reply to this post by Howard Stearns
On 1/2/07, Howard Stearns <[hidden email]> wrote:
> There are also problems for which pencil and paper really aren't suited
> for. Same for RDBMS. They can be made to work with the great expenditure
> of resources, chewing gum, bailing wire, duct tape, vise grips, etc....
> What I'm trying to do -- and of course, this isn't a Squeak question at
> all, but I hope it is a Squeak community question -- is try to learn
> what domain a perfectly running RDBMS is a good fit for by design,
> compared with a perfectly running alternative (even a hypothetical one).

I am not clear what you mean by "good fit by design". When you asked in an
earlier message "whether the math techniques that were developed to
provide efficient random access over disks 20 years ago are still valid",
were you referring to the math techniques of the relational model? If so,
my hunch is that you are framing the question upon an incorrect perception
of the purpose of the relational calculus.

My understanding is that the calculus, or specifically SQL, is a _problem
statement language_, a way for engineers to specify what needs to be done,
leaving the computer to figure out how to do it.

I wasn't doing this 20 years ago, but my reading of history is that
engineers knew perfectly well how to make efficient use of disks, and when
their employer bought the leading RDBMS they got a slow layer of murky
proprietary code, with a shiny standardised data model and API. In other
words, RDBs make data access slower, _but_ make engineering easier for
some problem domains.

David
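PS. A toy restatement of the "problem statement" idea in Smalltalk terms
(collection names invented): the first line is the "what" an engineer
writes; the second is the kind of hand-picked access path - here a
pre-built Dictionary index - that an RDBMS planner derives for you from
the same one-line statement:

  "The 'what': no access path mentioned anywhere."
  orders select: [:o | o customerId = 42].

  "One possible 'how', chosen and maintained by hand."
  ordersByCustomer at: 42 ifAbsent: [#()].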
In reply to this post by Howard Stearns
Howard Stearns writes:
> What I'm trying to do -- and of course, this isn't a Squeak question at
> all, but I hope it is a Squeak community question -- is try to learn
> what domain a perfectly running RDBMS is a good fit for by design,
> compared with a perfectly running alternative (even a hypothetical one).

I'd say it's a good fit if you're placing the database schema at the
center of your large system, or if you're using the query facilities.

Relational algebra is often just powerful enough to model commercially
interesting systems. Its lack of expressive power makes it a very
tractable system to manipulate, either during design or by a query
optimizer.

The great strength of RDBMSes is that they are a mathematically decidable
and complete system. If you can translate a problem into relational
algebra, you can always find a solution; such a system is, however, not
powerful enough to model arithmetic on the natural numbers.

Bryce
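PS. A concrete instance of that trade-off: classic relational algebra
cannot express transitive closure ("all subparts of a part, at any
depth"), while in a general-purpose language it is a short recursion. A
hedged Smalltalk sketch, assuming a hypothetical Part class whose
#subparts answers a collection, and an acyclic containment graph:

  "Not expressible in plain relational algebra - which is exactly
   the restriction that keeps the algebra decidable and optimizable."
  Part >> allSubparts
      ^ self subparts inject: Set new into: [:all :each |
          all add: each;
              addAll: each allSubparts;
              yourself]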
In reply to this post by Howard Stearns
> From: Howard Stearns
> I'm asking what kinds of problems RDBMS are
> uniquely best at solving (or at least no worse).

If you could go from a clean slate for each unique problem, probably none.
The same goes for almost any other widely-deployed technology - almost by
definition, if it has been deployed outside its niche then it has been
deployed in sub-optimal ways.

> I'm not asking whether
> they CAN be used for this problem or that. I'm asking this from an
> engineering/mathematics perspective, not a business ("we've always done
> things this way" or "we like this vendor") perspective.

Ah. Theory :-). In theory, I agree with you. In reality, I agree with
Andreas - RDBMSs are stable and widely understood, and they aren't *that*
bad for quite a wide class of problems.

> [Naively, it seems like the obvious solution for this (mathematically)
> is a hashing operation to keep the data evenly distributed over
> in-memory systems on a LAN, plus an in-memory cache of recently used
> chunks. But let's assume I'm missing something. The task here is to
> figure out what I'm not seeing.]

Stability and incremental development. How long would it take to develop
your system and get the showstopper defect rate low enough for the system
to be in line-of-business use? How would you extend your system when the
next application area came along? How would you convince your funder (who
wants some part of this system live *now*) to wait long enough to get the
defects out?

> Maybe this isn't typical

Alarmingly, it's not atypical. My day job involves a *lot* of plumbing -
connecting up previously-incompatible data sources. This is because most
organisations grow organically, and their IT systems grow organically with
them. The systems are patch upon patch, and it's never possible to rip
them out and start again.

> Anyway, either the data AS USED fits into memory or doesn't.

I think that's naive. Could I instead propose "the data AS USED fits into
memory plus what can reasonably be transferred via the mass storage
subsystem"? For many of the apps I use, 98+% of the data accessed comes
from RAM - but it's nice for the remaining 2% to be able to be 10x or 100x
the size of RAM without major ill effects.

However, are you looking at the correct boundary? Consider tape vs disk,
L2 cache versus main memory, registers and L1 cache versus L2, etc. I
would presume you could get even faster performance reading all this data
into a mass of Athlon or Core L2 caches and using the HyperTransports to
hook 'em together - why should we use this slow RAM stuff when we have
this much faster on-chip capability? In other words, what's your rationale
for picking RAM and disk as the boundary?

> Is this still the fastest way? (Answer is no.)

No. Neither's your proposed approach of using main memory, I suspect. It
may, however, be the fastest per dollar of expenditure on the end system.

> Is there some
> circumstance in which it is the fastest? Or the safest? Or allow us to
> do something that we could not do otherwise?

The latter, yes: develop a sufficiently robust and functional application
in a sufficiently short time with a sufficiently cheap set of developers.

> Having tools to allow a cult of specialists to break your own computing
> model (the relational calculus) is not a feature, but a signal that
> something is wrong.

Agree entirely :-).

> Maybe if we define the
> problem as "and you only have one commodity box to do it on." That's
> fair. Maybe that's it? (Then we need to find an "enterprise" with only
> one box...)
Or /n/ commodity boxes, where n is the capital the organisation can
reasonably deploy in that area.

I suspect you're coming from a background of solving "hard" problems,
where throwing tin at the job is acceptable, to a world where return on
investment determines whether a project can be justified or not. If it's
not justifiable, it shouldn't get done - and there are plenty of quotes
we've put in where we've been the cheapest, but the company's decided not
to proceed because, actually, the cost of the system is more than they
would ever save from using it.

That's a pretty sharp razor for business applications, but ultimately it's
the appropriate one to use - it avoids wasting capital and human effort to
produce a shining solution when, ultimately, it would have been cheaper to
use lots of monkeys with typewriters.

- Peter
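PS. For what it's worth, the hash-and-cache scheme quoted above really is
only a few lines in the image - whether it survives node failure,
rebalancing and audit is where the expensive part hides. A hedged
Smalltalk sketch; the node objects and their Dictionary-like protocol are
invented, not any real library:

  "Spread keys over n in-memory nodes; keep a local cache of
   recently used entries. No replication, no failover - that is
   where the real engineering cost lives."
  nodes := Array with: nodeA with: nodeB with: nodeC.
  nodeFor := [:key | nodes at: key hash \\ nodes size + 1].

  "write"
  (nodeFor value: aKey) at: aKey put: aValue.

  "read through the cache"
  cache := Dictionary new.
  result := cache at: aKey
      ifAbsent: [cache at: aKey put: ((nodeFor value: aKey) at: aKey)].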
Peter Crowther wrote:
>> ... lots of good comments. (Thanks.)
> I suspect you're coming from a
> background of solving "hard" problems, where throwing tin at the job is
> acceptable, to a world where return on investment determines whether a
> project can be justified or not. ...

Heh. Bingo. But the other side of the coin is that so many projects in
this "easy problem" world are failures. A higher failure rate than in the
clean-slate hard-problem world! I've had a $300K hard-problem budget
zeroed because, in part, the last easy-problem folks spent $26M (some say
$52M) to implement a very standard three-tier system that didn't work.

OK. I get that. This is the way it is, and I've got plenty to say about
that, too, over a beer. But I'm an engineer. I want to understand why the
three-tier projects fail. And how to avoid that. I know that there are
people who can make them succeed despite the math, using leadership and
operations research and charm and ruthlessness and lots of money, or
whatever. But that's not my domain. I may not have the choice to always
pick the right tool for the job, but I do want to try to understand what
makes something the right (or wrong) tool.

--
Howard Stearns
University of Wisconsin - Madison
Division of Information Technology
mailto:[hidden email]
jabber:[hidden email]
voice:+1-608-262-3724
In reply to this post by Peter Crowther-2
Peter Crowther wrote:
>> From: J J
>>> From: Howard Stearns <[hidden email]>
>>> That's something I've never really understood: what is the domain in
>>> which Relational Databases excel?
>> Handling large amounts of enterprise data.
>
> Handling and dynamically querying large amounts of data where the data
> format is not necessarily completely stable and ad-hoc query performance
> is important. "Large" here is "much larger than main memory of the
> machine(s) concerned". I routinely handle data sets of tens of gigs on
> current commodity hardware - storing the data in RAM would be somewhat
> faster, but too expensive for the available capital.
>
> The strength of relational over other forms is in being able to form
> arbitrary joins *relatively* efficiently, and hence in being able to
> query across data many times larger than main memory without excessive
> disk traffic.
>
> Google isn't a good counter-example, as the ad-hoc querying is missing.
> The types of queries done on the Google database are very limited and
> are well known in advance.

My apologies for an ignorant and naive reply, and forgive me if I am way
off base. But it seems to me that being able to perform arbitrary joins
relatively efficiently is a requisite of an RDBMS because an RDBMS
requires you to arbitrarily partition your data in such a way as to
require such joins.

Any time I've spent reading a book on SQL that speaks of "normalizing" my
data, I've never liked what I read. 1st Normal Form requires atomicity:
each attribute must contain a single value, not a set of values. But a
list is a natural and common way of grouping things, so it is by nature
(IMO) an unnatural thing to decompose the list only so that I have the
express ability to recompose it. Requirements like that, I believe, are
not common to other methods of persistence. I may be wrong.

So I don't believe that comparing a requisite of an RDBMS (efficient
joins) to other persistence methods (OODBMS, filesystem, etc.) which don't
require such joins is a valid comparison - or at least not one in which
the RDBMS wins. Of course this is a simple argument; it could be debated
off into the depths of RDB theory, but there's not enough room for that
here.

I also don't understand what queries could be performed with an RDBMS that
can't be with Google - or that couldn't be if Google partitioned its data
for such queries. After all, the data set would have to be partitioned
correctly for an RDB to perform said queries also. Personally, I've
almost always been pleased with the performance of my Google queries. I've
also been to many, many, many sites backed by an RDB in which the queries
were horribly slow. So I personally would be reticent to say Google made a
wrong decision. (And yes, I know you didn't say so either. :)

Jimmie
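PS. To illustrate what I mean, a hedged Smalltalk sketch with invented
names, assuming a Customer class that initializes #phoneNumbers to an
OrderedCollection. In the object model, the grouping just stays what it
is:

  "A customer keeps its phone numbers as, well, a list."
  customer := Customer new.
  customer phoneNumbers
      add: '555-0100';
      add: '555-0199'.

  customer phoneNumbers.   "one message send back; no join"

  "Under 1NF the same list is shredded into rows of a child table
   (customer_phone: customer_id, phone) whose only purpose is to
   let a join reassemble the list we started with."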
In reply to this post by J J-6
(I see that the conversation has moved along since the time that I
started to draft this, but here goes...)

On Jan 2, 2007, at 11:10 AM, J J wrote:

> Sanity check: google is trying to keep a current snapshot of all
> websites and run it on commodity hardware. You could do exactly the
> same thing with a lot less CPU's using a highly tuned, distributed
> RDBMS. They chose to hand tune code instead of an RDBMS.

What, really? There are many possible reasons that Google don't use an
RDBMS to index the web: stupidity, arrogance, excessive cost of an RDBMS,
sound engineering decisions, or a combination of these. According to the
computer systems research community, Google has sound engineering reasons
for its architecture; they have published papers at top conferences such
as OSDI and SOSP. See http://labs.google.com/papers ("The Google File
System" and "BigTable..." might be the most relevant to this
conversation).

That's not to rule out the possibility of stupidity, arrogance, excessive
cost, etc. But it does cast doubt on the unsubstantiated claim that Google
could "do exactly the same thing with a lot less CPUs".

>> Finally, in world with great distributed computing power, is
>> centralized transaction processing really a superior model?
>
> Some people seem to think so:
> http://lambda-the-ultimate.org/node/463
>
> And there is more then that. I believe in that paper (dont have time to
> verify) they mention that hardware manufacturers are also starting to
> take this approach as well because fine grain locking is so bad.

As you mentioned in a follow-up email, this wasn't the paper you meant.
Although it has nothing whatsoever to do with RDBMSes, I would recommend
anyone who has enough free time to learn enough Haskell to read that
paper. Did you happen to find the intended link?

>> - Working with other applications that are designed to use RDB's?
>> Maybe, but that's a tautology, no?
>
> Again, one has to work in a large company to appreciate the nature of
> enterprise application development.

I have no doubt that you're right, but it doesn't answer the question:
what is it that RDBs *fundamentally* get correct? It's quite like the easy
but unsatisfying answer to "why is Smalltalk so great?"... "well, you
can't appreciate it unless you've grokked Smalltalk". Certainly RDBs are
essential to the operations of the modern enterprise, but how much of this
is because RDBs are really the best imaginable approach to this sort of
thing, and how much is due to a complicated process of co-evolution that
has resulted in the current enterprise software ecosystem?

Josh
In reply to this post by Howard Stearns
> From: Howard Stearns
> Peter Crowther wrote: > lots of good comments. (Thanks.)

Thanks for the response - didn't know how they'd be taken.

> But the other side of the coin is that so many projects in this "easy
> problem" world are failures. A higher failure rate than the clean-slate
> hard problem world!

Yup. Because the (technically) "easy problems" are organisational
nightmares, with vendor/customer politics, sales/tech politics, many
stakeholders at the client who are using the system for political
infighting and empire-building, and (in general) at least one key
stakeholder who will do his/her best to sabotage the project as it's to
their personal advantage to have it fail. And that's in a *small* project.
By contrast, the clean-slate projects typically have a few key
stakeholders, clear and non-conflicting requirements, and less in the way
of internal politics.

Consider the following (typical) example of an "easy" problem:

- The client invites tenders for a specified system;
- The salesperson wants their bonus, so deliberately tenders for less
  than they know the system will cost to develop;
- The developing company wins the bid;
- The salesperson gets their bonus, and moves on to the next sale;
- The project *cannot* be a win for both remaining sides, as it is not
  possible to bring it in with all features and within budget - and
  that's ignoring requirements creep;
- Somebody loses. Probably both sides lose: there's no profit in the job,
  and the end system doesn't do what the client wants.

Unless you consider the systems angle, you don't see the full system. The
*full* system includes all the humans who interact to produce it, and
therein lies a large chunk of the problem.

> I want to understand why the three-tier projects
> fail. And how to avoid that. I know that there are people who can make
> them succeed despite the math, using leadership and operations research
> and charm and ruthlessness and lots of money, or whatever. But that's
> not my domain. I may not have the choice to always pick the right tool
> for the job, but I do want to try to understand what makes something
> the right (or wrong) tool.

From observation (perhaps with charcoal-tinted spectacles), I think you're
looking at the end of the problem that can make a few percent of
difference. If, instead, you look at all the messy leadership, charm,
ruthlessness and money side, I think you're looking at the side where most
of the difference in the success of a project is *actually* made. It's
possible to do so, but you need to apply the principles of systems
analysis... and competition, of course. Even if your company is sensible
and bids at a level where they can do the job, they'll be undercut by a
lying salestoad from a company who (eventually) can't.

Cynical? Moi? :-)

- Peter
In reply to this post by Joshua Gargus-2
> From: Joshua Gargus
> what is it that RDBs *fundamentally* get correct?

People find it easy to understand tabular data and to cross-reference
between tables. Relational databases contain tabular data. So people find
relational databases easy to understand compared to the alternatives.

Other than the uniformity ("everything is either a tuple or an atomic
value"), there's little else to commend them - but that ease of
understanding has been enough, I think. The rest of the system is
optimisation, to try to get relatively efficient use of the machine
despite expressing the problem in ways that are easy for humans to
understand. Oh, and trying to make up for the *dreadful* query language
that IBM inflicted on the world with SQL.

I learned the principles of relational systems with db++, a very odd
relational system that used a very clean relational algebra where select,
project and join were first-class operations. It's certainly helped me cut
through the fog of SQL, where the principles are far less clear!

- Peter
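PS. Smalltalk readers may notice how directly that clean algebra maps onto
the collection protocol. A hedged sketch, treating Dictionaries as tuples
(all data names invented): select: is restriction, collect: is projection,
and a naive nested-loop join is only a few lines:

  adults := people select: [:t | (t at: #age) >= 18].   "select"
  names  := adults collect: [:t | t at: #name].         "project"

  "A deliberately naive nested-loop join on a shared key."
  join := [:r :s :key | | out |
      out := OrderedCollection new.
      r do: [:t |
          s do: [:u |
              (t at: key) = (u at: key) ifTrue: [ | merged |
                  merged := t copy.
                  u keysAndValuesDo: [:k :v | merged at: k put: v].
                  out add: merged]]].
      out].

  customersAndOrders := join value: customers value: orders value: #custId.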
In reply to this post by Göran Krampe
On Jan 2, 2007, at 1:36 PM, [hidden email] wrote:
Uh, and the alternative would be what?

Take a typical company that makes and sells stuff. They have customers
(hopefully). The marketing guys want the customers' demographics and
contacts to generate targeted messages. The accounting people want to know
their credit status, payments, and order totals. The inventory/production
planning guys don't really care who is buying what, but they want to see
how much of each thing is going out the door. The product development
people are looking for trends to spot new kinds of demand. The sales guys
want recent order history, contact event logs, etc. There are many
cross-cutting concerns.

If you take the naive object model, you probably have

  Customers->>Accounts->>Orders->>Items-----(CatalogItem)->InventoryStatus

This works for most traversals; you put the customers in a dictionary at
the root by some identifier. But for the people who process orders or do
shipping, this model is a drag. They just want orders and items, and
navigating to all the orders by searching starting at customers is nuts.
So maybe you add a second root for orders for them. Then there's the
inventory stuff... Everybody wants a different view with different entry
points. I'm talking enterprise architecture here - bigger than systems,
which is bigger than applications.

Relational databases don't care about your relationships or roots -
anything can be a root. Anything can be correlated. Any number of object
models can map to a well-normalized schema. RDBMS systems have a couple of
nice properties: you can produce lots of different views tailored to a
viewpoint/area of responsibility, and they guarantee data consistency and
integrity. Something I find lacking from OO solutions.

Here's a fun game. Build an OO db. Get a lot of data. Over time, deprecate
one class and start using another in its place. Give it to another
developer who doesn't know the entire history. One day he deletes a class
because it appears not to be referenced in the image anymore. 6 months
later, try to traverse the entire db to build a report and find errors:
'no such class for record'. What will you do? This has happened to me,
BTW.

If I have long-lived data, with different classes of users and different
areas of responsibility, I want the RDBMS because it is maximally flexible
while providing the highest guarantees of data integrity and consistency.
The problems I've heard described are the result of poor design and
unwillingness to refactor as the business changes and grows.

FWIW, I have worked at everything from 5-person Mac software companies
(anyone remember Tempo II?) to telcos, aerospace, government agencies, and
the world's largest internet retailer (a scenario where the relational
database turns out to be not the best fit overall). My solution selection
hierarchy as the amount of data grows runs:

1) In the image
2) Image segments or PLists in directories
3) RDBMS/ORM like Glorp
4) RDBMS with optimized SQL
5) SOA

I'm pretty sour on OODBMS's based on my long running experiences with
them.

-Todd Blanchard
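PS. The "second root" dodge, as a hedged sketch (all names invented): one
shared set of objects, several hand-maintained entry points. Note the
bookkeeping - every new viewpoint costs another root to keep current,
which is exactly what a well-normalized RDB gives you for free:

  customersById  := Dictionary new.
  ordersByNumber := Dictionary new.
  openOrders     := OrderedCollection new.

  "Registration has to keep every view current by hand."
  registerOrder := [:anOrder |
      ordersByNumber at: anOrder number put: anOrder.
      anOrder isOpen ifTrue: [openOrders add: anOrder]].

  "Shipping now starts from orders, never from customers."
  toShip := openOrders select: [:o | o isPaid].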
In reply to this post by J J-6
On Dec 26, 2006, at 3:18 , J J wrote:

>> Again, to contrast with Python, Squeak wants to run the show, but
>> Python plays nice with all the other free tools of the GNU/Linux
>> ecosystem.
>
> I keep on seeing this, but it appears largely overstated. Java has it's
> own VM, threads etc. as well.

Yes, Java. I think Python is very different from Java in this context, as
Java also wants to run the show, and I think this is where it is quite
similar to Smalltalk. Python on the other hand is quite happy to play
along with others, just like Ruby, Perl and, of course, C.

[more java comparison]

> And if you mean more to address the tools, well yes you *can* edit Java
> code in vi if you really want to. But no one really wants to. And if
> your interface to the language is through some program anyway, then the
> "barrier" of the code not being on the file system disappears.

Once again, I think that Java is not a valid substitute for Python in this
context. In my experience, hacking Python or Ruby in vi is not just doable
but quite useful. I can't say the same for Java.

Marcel
In reply to this post by J J-6
On Jan 2, 2007, at 11:10 , J J wrote:

>> J J wrote:
>>>> ... I simply believe in the right tool for the right job,
>>> and you can't beat an RDB in it's domain. ...
>>
>> That's something I've never really understood: what is the domain in
>> which Relational Databases excel?
>
> Handling large amounts of enterprise data. If you have never worked in
> a large company, you probably wont appreciate this.

Well, I have worked in a large-ish enterprise, and my experience was that
moving *away* from the RDB was central to improving performance around a
hundred- to a thousandfold, with the bigger improvement for the project
that completely eliminated the RDB.

> But in a large company you have a *lot* of data, and different
> applications want to see different parts of it. In an RDBMS this is no
> problem, you normalize the data and take one of a few strategies to
> supply it to the different consumers (e.g. views, stored procedures,
> etc.).

Har har. Sorry, but I have seen very few actually reusable data models.

>> - Data too large to fit in memory? Well, most uses today may have been
>> too large to fit in memory 20 years ago, but aren't today. And even
>> for really big data sets today, networks are much faster than disk
>> drives, so a distributed database (e.g., a DHT) will be faster. Sanity
>> check: Do you think Google uses an RDB for storing indexes and a cache
>> of the WWW?
>
> Are you serious with this (data too large to fit into memory)? And if
> you use a good RDBMS then you don't have to worry about disk speed or
> distribution.

You are kidding, right?

> The DBA's can watch how the database is being used and tune this (i.e.
> partition the data and move it to another CPU, etc., etc.).

To some limited extent, yes. But they can't work miracles, and neither can
the DB. In fact, if you know your access patterns, you can (almost) always
do better than the DB, simply because there are fewer layers between you
and the code.

> Oh, but you found one example where someone with a lot of data didn't
> use a RDB. I guess we can throw the whole technology sector in the
> trash. Sanity check: google is trying to keep a current snapshot of all
> websites and run it on commodity hardware. You could do exactly the
> same thing with a lot less CPU's using a highly tuned, distributed
> RDBMS.

That's a big claim, mister. Care to back it up?

Marcel
In reply to this post by Andreas.Raab
On Jan 2, 2007, at 12:57 , Andreas Raab wrote:

> Howard Stearns wrote:
>> Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are
>> uniquely best at solving (or at least no worse). I'm not asking
>> whether they CAN be used for this problem or that. I'm asking this
>> from an engineering/mathematics perspective, not a business ("we've
>> always done things this way" or "we like this vendor") perspective.
>
> The main benefit: They work.

For some definition of 'work', yes. My experience so far is more that they
are 'perceived' to work by enterprisey management types. "If it's got a
database (preferably Oracle), then it must be a real business system.
Otherwise it's some weird toy." This perception is not to be ignored
lightly, but it isn't the same as actual technical merit, and it is not
necessarily backed by facts.

> There is no question how to use them, apply them to problems, map them
> into different domains etc.

Really? A pretty smart acquaintance of mine who does "enterprise"
consulting with a company that's actually pretty damn good (and has a
pretty good reputation and track record, AFAICT) once asked the not
entirely rhetorical question why, in this day and age, every CRUD
application turns into a PhD thesis.

The Sports system I was supposed to improve had a very complicated schema,
but all the interesting data had to be stored in serialized dictionaries
anyway, because it simply wasn't regular enough. During the development of
the replacement (which doesn't use an RDB at all), we were presented with
a 'standardized' relational schema for the domain. We fell over laughing.
I don't think we could have printed it on an A0 sheet. And that isn't the
only time I have seen this sort of thing.

I did come up with a schema that would have worked, and apparently our
DBAs were quite impressed with it, but I wasn't, as it was really just a
meta-model for defining arbitrary key-value pairs and relations between
them.

> This has all been worked out, there is nothing new to find out, just a
> book or two to read. From an engineering perspective that is vastly
> advantageous since it represents a solution with a proven track-record
> and no surprises.

At least not until you put the system into production and wonder why it
doesn't actually work at all, or performs worse than 2 people doing the
same job manually.

Marcel
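PS. For the curious, the in-image equivalent of that meta-model is roughly
this hedged sketch (names invented). Once everything is a generic bag of
pairs, the schema no longer says anything, which is why it left me cold:

  "An 'entity' is just attributes; 'relations' are more pairs."
  athlete := Dictionary new.
  athlete
      at: #type put: #Athlete;
      at: #name put: 'A. Runner';
      at: #results put: OrderedCollection new.

  "The relational schema underneath is essentially two tables -
   entity, and attribute(entity_id, key, value) - and every real
   constraint lives in application code instead of the schema."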
In reply to this post by Paul D. Fernhout
On Dec 28, 2006, at 10:51 , Jecel Assumpcao Jr wrote:
[also snipped rather radically]
In theory: refactor. Make the "something better" something that is more abstract than either eToys or Smalltalk-80 and allow it to have "subclasses" that are, in effect, eToys and a great Smalltalk-80. Which is not the same as a big bowl of everything at once.
And/or a lot of refactoring... :-) Or rather: refactorings that require a leap of the imagination. But probably not "extensions" to what is already there.
That is probably a good learning experience, but I am not sure that this is the same kind of simplicity.

Marcel
In reply to this post by Howard Stearns
On Jan 2, 2007, at 12:36 , Howard Stearns wrote:
This sounds so incredibly familiar, even if the domain is quite different. And I thought that financial processing would be the one area where RDBMSes would be able to shine...
And of course use various bits of
I am starting to fear that it *is* typical. Good thing I am now pretty
much completely out of the enterprisey world. :-)

Cheers, Marcel
In reply to this post by tblanchard
Todd Blanchard <[hidden email]> wrote:
> On Jan 2, 2007, at 1:36 PM, [hidden email] wrote:
> > Using the RDB as a sharing ground for applications is IMHO really,
> > really bad. Sure, it *works* kinda, but very fast you end up with
> > replicated SQL statements all over the place. Then someone says
> > "stored procedures" and hey... why not consider OBJECTS? There is
> > probably a reason why people are so worked up about Services these
> > days. :)
>
> So if it is objects instead of tables - how is this different?

Objects offer encapsulation and sharable behavior. Tables offer just
shared data.

> Uh, and the alternative would be what? Take a typical company that
> makes and sells stuff.

[SNIP of quick description]

> There are many cross cutting concerns.

[SNIP]

> RDBMS systems have a couple nice properties - you can produce lots of
> different views tailored to a viewpoint/area of responsibility. They
> guarantee data consistency and integrity. Something I find lacking from
> OO solutions.

I am not saying that I have a Grand Solution. I agree that an ODB is
focused on having a Real object model at the core instead of a bunch of
loosely interrelated tables that can be viewed in 1000 different ways. I
still believe the object model offers real value in the form of proper
behavior, a better and more natural model, reuse of business rules and so
on. But I agree that ODBs do not offer that "twist and turn"-ability; I
just often see that as an advantage instead of a disadvantage. :)

One project I was involved in was interesting - we built a proper, good
object model in GemStone for quite a complicated domain. Then we added an
"export tool" on top of it which could produce tabular data from it - you
picked what aspects you wanted, calculated or not - and got it out as a
batch job. Then you could analyze that to your heart's content in an OLAP
tool on the side.

It would be interesting to know if there are any ODBs or ORDBs that offer
that ability - but online instead of offline. Most of the use cases you
mentioned were about getting information (and not writing).

regards, Göran
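PS. The export tool was, in essence, just a projection of the object graph
into rows. A hedged sketch with invented names - each "aspect" is a block,
so calculated columns cost nothing extra:

  aspects := {
      [:each | each name].
      [:each | each startDate].
      [:each | each totalCost] }.   "a calculated aspect"

  rows := domainObjects collect: [:each |
      aspects collect: [:aspect | (aspect value: each) printString]].

  "Each row then goes out comma-separated to the batch file; the
   OLAP tool never needs to see the object model at all."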
In reply to this post by Paul D. Fernhout
Paul,
Thanks for sharing this essay. I think it brings up many important topics
which I'd like to comment on one at a time (or perhaps on my blog)...

On 12/25/06,
Paul D. Fernhout <[hidden email]> wrote:
> When I was looking at GST vs. Ruby benchmarks today, <snip>
While there are certainly valuable insights in "Data & Reality" and I
would agree that some data objects are merely "tools of thought", *many*
objects have meaning and exist independent of our view/model. Quantum
physics does tell us that the boundaries of "things" are hard to define
precisely, but "things" themselves as aggregates are held together by
forces of nature, not by external views.

A keyboard can be remapped in software, and different people using it can
have different views of the individual key "objects". Even the keyboard
itself could be viewed differently - as a word processor, game controller,
or cash register. However, any observer (human, machine, or otherwise) of
the measurable physical characteristics of the keyboard will not see any
changes. The wave functions underlying all of the sub-atomic particles
making up that keyboard have a unique history going back at least to just
after the big bang.

Today, more and more so-called information systems are being used not just
for description but to augment/affect the external world. In this evolving
hyperlinked meshverse of simulation and "reality", data often enters into
a symbiotic relationship with "reality", where changing views can change
"reality". The "real" Mars Climate Orbiter object was destroyed because it
was dependent on the data a model object had.

If one accepts that a paradigm shift is underway which Croquet offers
something of value in, then there are important ramifications for database
and language choices.

Laurence
In reply to this post by James Foster-4
I agree that RDBMSs tend to be knee-jerk reactions that produce as
many problems as they solve. My favorite alternative is not a real OODBMS,
but instead a pattern that is best exemplified by Prevayler, a Java
framework.

The main idea is to represent your data as objects, and to ensure that
every change to the data is represented by a Command. Executing a Command
will cause it to write itself out on a log. You get persistence by
periodically (once a day, perhaps) writing all your objects out to disk,
and you recover from crashes by restarting from the last checkpoint and
then replaying the log of Commands. You get multiuser access by
implementing the transactions inside the system, making them fast (no
disk access) and just having a single lock for the whole system.

There are lots of things this doesn't give you. You don't get a query
language. This is a big deal in Java, not so big a deal in Smalltalk,
because Smalltalk makes a pretty good ad-hoc query language (for Smalltalk
programmers). You don't get multilanguage access. The data must all fit in
memory, or suddenly your assumptions of instantaneous transactions break
down. You have to be a decent programmer, though it really isn't very
hard, and if you let your just barely decent programmers build a toy
system to learn the pattern then they should do fine. Lots of people learn
the pattern by working on a production system, but that is probably a bad
idea for all patterns, not just this one.

I did this in Smalltalk long before Prevayler was invented. In fact,
Smalltalk-80 has always used this pattern. Smalltalk programs are stored
this way. Smalltalk programs are classes and methods, not the ASCII stored
on disk. The ASCII stored on disk is several things, including a printable
representation with things like comments that programmers need but the
computer doesn't. But the changes file, in particular, is a log, and when
your image crashes, you often will replay the log to get back to the
version of the image at the time your system crashed. The real data is in
the image; the log is just to make sure your changes are persistent.

But this message stream is about what RDBMSs are good for, and I'd like to
address that.

First, even though SQL is rightly criticised, it is a standard query
language that enables people who are not intimately familiar with the data
to access it to make reports, browse the data, or write simple
applications. Most groups I've seen have only programmers using SQL and so
don't take advantage of this, but I've seen shops where secretaries used
SQL or query-by-example tools to make reports for their bosses, so it can
be done. I suppose an OO database or a Prevayler-like system could provide
a query-by-example tool, too, but I have never seen one.

Second, even though the use of an RDBMS as the glue for a system is
rightly criticised, this is common practice. It tends to produce a big
ball of mud, but for many organizations, this seems to be the best they
can do. See http://www.laputan.org/mud/ One advantage of using the RDBMS
as the glue is that it is supported by nearly every language and
programming environment. I think that the growing use of SOA will make
this less important, because people will use XML and web services as the
glue rather than a database.

Third, data in an RDBMS is a lot like plain text. It is more or less human
readable. It stands up to abuse pretty well, tolerating null fields,
non-normalized data, and use of special characters to store several values
in one field. For the past few years, I have had undergraduate teams
migrating databases for a city government. The students are always amazed
at how bad the data is. I laugh at them. All databases contain bad data,
and it is important for the system to tolerate it.

An RDBMS works best with relatively simple data models. One of its
weaknesses is trees, since you have to make a separate query for each
descent. It also has problems with versioned data, i.e. data with a date
or date range as part of the key. But it can deal pretty well with the
usual set of objects that represent the state of the business, and another
set of objects that represent important events. For example, a bank has
deposit accounts and loans to customers, and it records deposits, cash
withdrawals, computation of interest, payments, and checks written to
other organizations. Huge amounts of data are OK for an RDBMS, but complex
data models tend to cause trouble.

It is wrong to think that persistence = RDBMS. Architects should also
consider XML, a Prevayler-like system, binary files, or an OODBMS. Each
has advantages and disadvantages. An architect needs to have experience
with all these technologies to make a good decision. Of course, which one
is best often depends on what is going to happen ten years in the future,
which is impossible to predict. It is good to encapsulate this decision so
that it can be changed. This is another advantage of a SOA: your clients
don't care how you store the data.

In the end, technology decisions on large projects depend as much on
politics as on technical reasons. RDBMSs are the standard, the safe
course. "Nobody ever got fired for buying Oracle". They are usually not
chosen for technical reasons. There are times when they really are the
best technical choice, but they are used a lot more often than that.

-Ralph
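P.S. A minimal sketch of the pattern in Smalltalk, since it is short
enough to show. Class names and the log protocol are invented here, and
snapshotting, error handling and log truncation are all omitted:

  "Every change is a Command that knows how to apply itself."
  Deposit >> executeOn: aBank
      (aBank accountAt: accountId) addBalance: amount

  "The system logs each command before running it, under one lock
   (mutex is a Semaphore forMutualExclusion)."
  Bank >> apply: aCommand
      mutex critical: [
          log store: aCommand.   "append to the on-disk log"
          aCommand executeOn: self]

  "Recovery: load the last checkpoint, then replay the log."
  bank := Bank fromCheckpoint: 'bank.checkpoint'.
  (CommandLog on: 'bank.commands') commandsDo: [:each |
      each executeOn: bank].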
In reply to this post by Göran Krampe
FWIW,
The usual answer to this is to encapsulate behavior behind an API written
as stored procedures and to forbid direct table access.

On Jan 3, 2007, at 12:26 AM, [hidden email] wrote:
Hi!
Todd Blanchard <[hidden email]> wrote:
> FWIW,
>
> The usual answer to this is to encapsulate behavior behind an api
> written as stored procedures and forbid direct table access.

Yes, I kinda wrote that... :) I wrote:

> Then someone says "stored procedures" and hey... why not consider
> OBJECTS? There is probably a reason why people are so worked up about
> Services these days. :)

regards, Göran