Hi,
We’ve been experimenting with GOODS as an object database persistency mechanism for a couple of weeks. We plan to use Dali with it too. We’re really excited about this but have a burning question. Question: How do we handle changes to the “model” layer? Example: Suppose that last week we had a “Person” and “Profession” object. Each “Person” could only have one “Profession.” We’ve been saving some data to our Object Database and have 500 people and 25 professions currently saved and used with our app. This week we realize that our world view is wrong. Some people are both a “Chef” and a “Computer Scientist.” They have 2 or more professions. We must change our “model.” The class “Person” must change. How do we fix our object database now? Background: This isn’t really a GOODS specific question is it? I imagine it is the same issue with Magma, GemStone, etc. But in all the examples I’ve found nobody seems to talk about “migrations” or maybe I’ve overlooked it. Can someone point me in the right direction? In NeXT / Apple WebObjects I would write a “migration” to do the necessary steps to change SQL tables, make new DB constraints, etc. as an executable script. I would commit this migration to the repository. Any of the other developers on our team would automatically get it when they pulled from the repository. They don’t need to think about it. As soon as the WebObjects app launches the migrations would be run and they could see the new functionality and make “Marcus” both a “Chef and a Computer Scientist." When we deploy the new code in production, the migration will automatically run there too. This is all explained in detail here: Any helpful pointers are appreciated. Thank you,
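For concreteness, the schema change being described might be sketched in ordinary Smalltalk like this (the class definitions and variable names here are my own invention for illustration, not from GOODS or the actual application):

```smalltalk
"Old model: a Person holds exactly one Profession."
Object subclass: #Person
    instanceVariableNames: 'name profession'
    classVariableNames: ''
    category: 'Example-Model'

"New model: a Person holds a collection of Professions."
Object subclass: #Person
    instanceVariableNames: 'name professions'
    classVariableNames: ''
    category: 'Example-Model'
```

The migration question is then what to do with the 500 already-committed Person instances whose single-valued slot still holds one Profession.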
_______________________________________________ seaside mailing list [hidden email] http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside |
I use Magma, but yes, data-quality management applies to any persistent model, and it is a little more involved than it first seems, because it's tempting to just bang out a workspace script which enumerates the appropriate objects in your DB and fixes/upgrades them.

But doing only that is loaded with caveats. After running a repair script, can one _really_ feel comfortable that the model is "all fixed" and ready for production? Did the script really work? Did it catch all cases, or was there possibly a bug in it?
To address these concerns, Magma employs a first-class DataRepair object, which is used to perform data repairs/migrations on Magma models in a controlled and reliable fashion. It is based on the repair process we used at Florida Power and Light for patching/upgrading the models in its GemStone databases.
The requirements of a data repair are:

- Enumerate all objects needing repair.
- Of the enumerated objects, #check, #count or #identify the ones that are in need of repair.
- Of the enumerated objects, #improve or #repair the ones that are in need of repair.
- Before committing a #repair, _verify_ whether the repaired objects are, in fact, repaired, using the same check as for #check, #count and #identify. If they are, commit; if not, abort.
- Connect a brand-new session, run a final #check, and report whether the repair succeeded or failed.
- (Optional) If successful, you might wish to persist the DataRepair object itself somewhere in your database, so it has a history of what was done to it.
- Output a report for each of the above actions.

Here is an example of Magma's DataRepair constructor:

  (MagmaDataRepair
      at: (MagmaRemoteLocation host: 'prod1' port: 51199)
      enumerate: [ : session : repair |
          session root accountsDo: [ : eachAccount | repair check: eachAccount ] ]
      check: [ : eachAccount : repair |
          eachAccount importFilters anySatisfy: [ : each | each class = MaxImportFilter ] ])
      repair: [ : eachAccount : repair |
          eachAccount importFilters withIndexDo: [ : eachFilter : index |
              eachAccount importFilters at: index put: eachFilter asKeywordFilter ] ]
To this object, I can send any of the five actions: #check, #count, #identify, #improve or #repair. The first three are read-only operations. The fourth commits the repairs as it goes; the last commits only after a successful pre-verification.
Although you, the user, must specify the enumerate:, check: and repair: blocks, the DataRepair object encapsulates the _process_ of repairing models in a controlled and reliable fashion, using those user-specified, model-specific blocks.
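Applied to the Person/Profession example from the original question, a repair along the same lines might look something like this. This is a sketch under my own assumptions: the peopleDo: enumeration, the professions: setter, and the instance variables are hypothetical application code, not actual Magma or application API.

```smalltalk
"Hypothetical sketch: convert each Person's single profession
 into a one-element collection of professions."
| repair |
repair := (MagmaDataRepair
    at: (MagmaRemoteLocation host: 'prod1' port: 51199)
    enumerate: [ : session : r |
        session root peopleDo: [ : eachPerson | r check: eachPerson ] ]
    check: [ : eachPerson : r |
        "needs repair while it still holds a single Profession"
        eachPerson profession isKindOf: Profession ])
    repair: [ : eachPerson : r |
        eachPerson professions: (OrderedCollection with: eachPerson profession) ].
repair count.   "read-only: how many still need repair?"
repair repair.  "verify, then commit the fix"
```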
- Chris

On Sat, Apr 26, 2014 at 11:55 AM, Aaron Rosenzweig <[hidden email]> wrote:
> - Enumerate all objects needing repair.

Enumerate all objects _possibly_ needing repair.
Hi Chris,
You don’t just use Magma, you *are* Magma! Thank you for the thoughtful reply. That does look like a great feature of Magma, but it leads me to four questions:

1) Old vs. new?

As you instruct MagmaDataRepair to enumerate and “hydrate” your objects from the DB, how do you look at the “old way” and the “new way” at the same time so you can do the #improve message? In my example, the old way had Person -> Profession (a to-one relationship, a single object). In the new way, Person ->> Profession (a to-many relationship: an array, a B-Tree, etc.). The class in Smalltalk is already “new,” so how can it deal with the “old” form of what was stored in the DB?

2) Lazy improve

What if the object itself, “Person,” could know how to improve itself? As you access Persons, they automatically improve themselves. Then you wouldn’t need to modify all the objects at once, and they could self-heal as needed. Each object would need a way to look at its old self and “self-improve.” Is this a bad idea? Has anyone done this?

3) Auto repair / improve

It looks like MagmaDataRepair is something I would run from a workspace. Suppose I was working on this app with Dirk and Dave. Do I really have to tell them both, “Hey guys, I’m going to Skype you some text to run in our workspace to update our data model, but before you do, pull from the repo”? That seems like a lot to remember and a lot that can go wrong. Isn’t there a way, after pulling from the repo and accessing the app, for it to “self improve,” so I don’t actually have to tell Dirk and Dave anything, and when we deploy we don’t have to remember to do anything either?

4) Queued repairs / improvements

Imagine one of my team-mates was on vacation for two weeks. After he comes back, he wants to get back into our Smalltalk app. During that time, seven MagmaDataRepair scripts have been created by various people (me and Dirk, mostly). Isn’t there a way to have them all put together in chronological order, so that when Dave returns from his cruise he can just “run” them and all seven MagmaDataRepairs will be applied without him thinking about it?

Thanks,
On Apr 26, 2014, at 5:13 PM, Chris Muller <[hidden email]> wrote:
Hi Everyone,
I found a short statement here that is interesting: It appears to quote Avi Bryant and says two interesting things:

1) There is a way to do a mass migration of the data on the GOODS server, and we can see the goodsrv documentation for how to do it. But in the only documentation I know of, I don’t see anything about this.

2) There is a way to do lazy migrations with an “old / new” pair of values.

Has any of this been done by anyone, or written about anywhere, or should I ask Avi for an additional clue?

Thanks,
On Apr 26, 2014, at 6:24 PM, Aaron Rosenzweig <[hidden email]> wrote:
Hi Aaron,

To avoid migration trouble I usually use lazy initialisers, accessing instance variables only through their accessors. In other words: I keep my initialize method as empty as possible. And when I have a “migrated” value, I try to come up with a new name and put the migration code in the initialiser. When this initialisation becomes complex, I use the following pattern:

  x
      ^ x ifNil: [ x := self buildX ]

  buildX
      | newX |
      "do the complex initialisation"
      ^ newX

I do not know if this can be easily applied to databases as well (I myself use Pharo / GemStone), and I still have moments where something goes wrong because the data is not migrated properly, because I forgot something, but it saves a lot of trouble.

Diego
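Applied to the Person example from earlier in the thread, the lazy pattern might look like this. This is a sketch under my own assumptions: the idea of keeping the old profession slot around alongside a new professions slot, and all the names, are hypothetical, not from Diego's mail:

```smalltalk
"Person keeps the old 'profession' slot and lazily builds the
 new 'professions' collection from it on first access."
professions
    ^ professions ifNil: [ professions := self buildProfessions ]

buildProfessions
    "Migrate the old to-one value into a to-many collection."
    ^ profession
        ifNil: [ OrderedCollection new ]
        ifNotNil: [ :old | OrderedCollection with: old ]
```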
Aaron,
As you say, this isn’t really a GOODS-specific question but a more generic one. The Programming Guide for GemStone/S 64 Bit has an entire chapter devoted to the topic (see http://gemtalksystems.com/index.php/community/gss-support/documentation/gs64/).

There are a variety of approaches to changing the schema of an object. First, since Smalltalk has dynamic typing, you don’t have to have a Profession instance in the profession instance variable of Person. You can store either a Profession instance or a Collection of Profession instances. You can then have accessors that return either one or a collection.

Alternatively, in GemStone you can add a method to the old class version to return a collection, and create a new version of the class that stores a collection. Then, at your convenience, you can migrate all the instances of the old version to instances of the new version.

James

On Apr 26, 2014, at 9:55 AM, Aaron Rosenzweig <[hidden email]> wrote:
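A minimal sketch of the first approach, an accessor tolerating both shapes of the instance variable (the method name and the choice to normalise everything to a collection are my assumptions, not code from the Programming Guide):

```smalltalk
"Accessor that works whether the slot holds the old shape
 (a single Profession, or nil) or the new shape (a Collection)."
professions
    ^ (profession isKindOf: Collection)
        ifTrue: [ profession ]
        ifFalse: [ profession
            ifNil: [ #() ]
            ifNotNil: [ :single | Array with: single ] ]
```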
Aaron,
Because of GemStone’s architecture, we don’t think of an object as being “hydrated.” That is, objects are represented on disk in the same way they are represented in memory. Every object is an instance of some class; the concepts of “previously committed state” and “current class” aren’t really meaningful. The closest idea is whether there are other classes in the ClassHistory collection.

For performance reasons, many developers would prefer not to have the VM do additional checks and migrate an object if the behavior of the old version is equivalent to the behavior of the new version. Next, if an object were automatically migrated by two or more concurrent sessions, this would create a commit conflict. Finally, since migration can take significant time and other resources, developers prefer to control when those resources are consumed.

Depending on the change, one might choose never to migrate. For example, one could add the following to the “old” class to achieve compatibility with the “new” class:

  professions
      ^ profession ifNil: [ #() ] ifNotNil: [ Array with: profession ]

Now there is less to be gained by migrating.

As to your list: (1) GemStone objects know how to migrate themselves (#’migrate’); (2) #’dbVersion’ could be implemented as '^(anObject class classHistory indexOf: anObject class)’; (3) & (4) the migrate methods should be passed the prior state (as in GemStone’s #’migrateFrom:instVarMap:’), not made available at any time (so we can garbage collect prior states); and (5) while a framework could provide auto-migration, the default approach would be slow and any refinement would likely be application-dependent.

James

On Apr 28, 2014, at 7:24 PM, Aaron Rosenzweig <[hidden email]> wrote:
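For the Person example, a migrate hook on the new class version might look roughly like this. This is only a sketch: the instance-variable names and the assumption that the old version exposes a profession accessor are mine; consult the GemStone/S Programming Guide for the real #migrateFrom:instVarMap: contract.

```smalltalk
"On the new version of Person: build the to-many 'professions'
 collection from the old version's to-one 'profession' slot."
migrateFrom: anOldPerson instVarMap: aMap
    super migrateFrom: anOldPerson instVarMap: aMap.
    professions := anOldPerson profession
        ifNil: [ OrderedCollection new ]
        ifNotNil: [ :old | OrderedCollection with: old ]
```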