I use Magma, but yes, data-quality management applies to any persistent model, and it is a little more involved than it at first seems, because it's tempting to just spank out a workspace script which enumerates the appropriate objects in your DB and fixes/upgrades them.
But doing only that is loaded with caveats. After running a repair script, can one _really_ feel comfortable that the model is "all fixed" and ready for production? Did the script really work? Did it catch all cases, or was there possibly a bug in it?
To address these concerns, Magma provides a first-class DataRepair class, which is used to perform data repairs/migrations on Magma models in a controlled and reliable fashion. It is based on the repair process we used at Florida Power and Light for patching/upgrading the models in its GemStone databases.
The requirements of a data-repair are:

 - Enumerate all objects needing repair.
 - Of the enumerated objects, #check, #count or #identify the ones that are in need of repair.
 - Of the enumerated objects, #improve or #repair the ones that are in need of repair.
 - Before committing a #repair, _verify_ that the repaired objects are, in fact, repaired, using the same check as for #check, #count and #identify. If they are, commit; if not, abort.
 - Connect a brand-new session, run a final #check, and report whether the repair succeeded or failed.
 - (Optional) If successful, you might wish to persist the DataRepair object itself somewhere in your database, so the database carries a history of what was done to it.
 - Output a report for each of the above actions.

Here is an example of Magma's DataRepair constructor:

  (MagmaDataRepair
     at: (MagmaRemoteLocation host: 'prod1' port: 51199)
     enumerate:
        [ : session : repair |
        session root accountsDo:
           [ : eachAccount | repair check: eachAccount ] ]
     check:
        [ : eachAccount : repair |
        eachAccount importFilters anySatisfy:
           [ : each | each class = MaxImportFilter ] ])
     repair:
        [ : eachAccount : repair |
        eachAccount importFilters withIndexDo:
           [ : eachFilter : index |
           eachAccount importFilters at: index put: eachFilter asKeywordFilter ] ]
To this object, I can send any of the 5 actions: #check, #count, #identify, #improve or #repair. The first three are read-only operations. The fourth commits the repairs as it goes; the last one commits only after a successful pre-verification.
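Driving the five actions might then look something like this. This is only an illustrative sketch: the selector names (#check, #count, #identify, #improve, #repair) come from the description above, but I have assumed they can be sent as unary messages to the configured instance, and the enumerate:/check: blocks are abbreviated from the earlier example.

```smalltalk
"Build the repair (same shape as the constructor example above)."
| repair |
repair := MagmaDataRepair
   at: (MagmaRemoteLocation host: 'prod1' port: 51199)
   enumerate: [ : session : r | session root accountsDo: [ : each | r check: each ] ]
   check: [ : each : r | each importFilters anySatisfy: [ : f | f class = MaxImportFilter ] ].

repair count.      "read-only: how many objects need repair?"
repair identify.   "read-only: which objects need repair?"
repair check.      "read-only: does anything need repair?"
repair improve.    "writes: commits the repairs as it goes"
repair repair.     "writes: verifies first, commits only if verification passes"
```

For #improve and #repair, the repair: block would also need to be supplied, as in the constructor example.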
Although you, the user, must specify the enumerate:, check: and repair: blocks, the DataRepair object encapsulates the _process_ of repairing models in a controlled and reliable fashion, using those user-specified, model-specific blocks.
- Chris

On Sat, Apr 26, 2014 at 11:55 AM, Aaron Rosenzweig <[hidden email]> wrote:
_______________________________________________ Magma mailing list [hidden email] http://lists.squeakfoundation.org/mailman/listinfo/magma |
> - Enumerate all objects needing repair.

Enumerate all objects _possibly_ needing repair.
Hi Chris,
You don’t just use Magma, you *are* Magma! Thank you for the thoughtful reply. That does look like a great feature of Magma, but it leads me to four questions:

1) Old vs. new?

As you instruct MagmaDataRepair to enumerate and “hydrate” your objects from the DB, how do you look at the “old way” and the “new way” at the same time so you can do the #improve message? In my example the old way had a Person -> Profession (a to-one relationship, a single object). In the new way, Person ->> Profession (a to-many relationship: an Array, a B-Tree, etc.). The class in Smalltalk is already “new,” so how can it deal with the “old” form of what was stored in the DB?

2) Lazy improve

What if the object itself, “Person,” could know how to improve itself? As you access Persons, they automatically improve themselves. Then you wouldn’t need to modify all the objects at once, and they could self-heal as needed. It would need a way to look at its old self and “self improve.” Is this a bad idea? Has anyone done this?

3) Auto repair / improve

It looks like MagmaDataRepair is something I would run from a workspace. Suppose I was working on this app with Dirk and Dave. Do I really have to tell them both, “Hey guys, I’m going to Skype you some text to run in your workspace to update our data model, but before you do, pull from the repo”? That seems like a lot to remember and a lot that can go wrong. Isn’t there a way that, after pulling from the repo and accessing your app, it will “self improve,” so I don’t actually have to tell Dirk and Dave anything, and when we deploy we don’t have to remember to do anything either?

4) Queued repairs / improvements

Imagine one of my team-mates was on vacation for two weeks. After he comes back he wants to get back into our Smalltalk app. During that time there have been 7 (seven) MagmaDataRepair scripts created by various people (me and Dirk, mostly). Isn’t there a way to have them all put together in chronological order, so that when Dave returns from his cruise he can just “run” and all 7 MagmaDataRepairs will be applied without him thinking about it?

Thanks,
On Apr 26, 2014, at 5:13 PM, Chris Muller <[hidden email]> wrote:
Hi Everyone,
I found a short statement here that is interesting. It appears to quote Avi Bryant and says two interesting things:

1) There is a way to do a mass migration of the data on the GOODS server, and it says we can see the goodsrv documentation for how to do it. But in the only documentation I know of, I don’t see anything about this.

2) There is a way to do lazy migrations with an “old / new” pair of values.

Has any of this been done by anyone, or written about anywhere, or should I ask Avi for an additional clue?

Thanks,
On Apr 26, 2014, at 6:24 PM, Aaron Rosenzweig <[hidden email]> wrote:
Hi,
Sure, sometimes a migration can require two successive repairs: the first to add the new ivar / relationship, the second to remove the old one. In fact, sometimes there's no big hurry to remove the old one. The old ivars can remain for one release of your app or so, and then, upon the next release, be removed.
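Using Aaron's Person example, the two-step idea might be sketched like this. Everything here is illustrative: the ivar/accessor names (profession, professions), the peopleDo: enumeration, and the host/port are assumptions, not actual model or Magma API.

```smalltalk
"Step 1 (this release): Person temporarily carries BOTH ivars.
 A repair copies the old to-one value into the new to-many collection."
| step1 |
step1 := MagmaDataRepair
   at: (MagmaRemoteLocation host: 'prod1' port: 51199)
   enumerate: [ : session : r | session root peopleDo: [ : each | r check: each ] ]
   check: [ : each : r | each professions isNil and: [ each profession notNil ] ].
step1 repair: [ : each : r |
   each professions: (OrderedCollection with: each profession) ].

"Step 2 (a later release, once no code reads the old ivar):
 a second repair nils out the old ivar, after which the ivar
 itself can be dropped from the class definition."
```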
Nothing wrong with the lazy-improve idea if it suits the situation. My only comment is that I don't particularly like to put commits into lazy-initializing accessors; with that approach, my Person objects would be "healing" themselves on every access from a new session. Plus, the DB would be only partially migrated, which can lead to a mess if, subsequently, you want to do yet _another_ migration along the same ivars (e.g., you now have two possible input states instead of just one, which makes getting to a third state more difficult).
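For concreteness, the kind of lazy-initializing accessor in question might look like this (purely illustrative names; the buried commit is the objectionable part):

```smalltalk
"A lazy 'self-healing' accessor. The commit hidden inside an
 accessor is what makes this pattern messy."
Person >> professions
   ^ professions ifNil:
      [ professions := OrderedCollection new.
      profession ifNotNil:
         [ professions add: profession.
         profession := nil ].
      "a session commit would have to happen here, on every
       first access from a new session"
      professions ]
```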
Well, I don't actually define my Repairs in workspaces, but as class-side methods of the root object of the DB, where they remain for at least a few versions. I assume your app has a bootstrap / initialization step, so you could certainly invoke the repair at that time. A repair failure would be treated just like a regular Error, which I assume would cause the app's bootstrap process to abort accordingly.
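A sketch of that arrangement (all class, selector and host names here are made up for illustration, and whether the repair runs eagerly or must be triggered separately depends on how you wire it):

```smalltalk
"A repair kept as a class-side method of the DB root object."
MyAppRoot class >> repairConvertImportFilters
   ^ (MagmaDataRepair
      at: (MagmaRemoteLocation host: 'prod1' port: 51199)
      enumerate: [ : session : r | session root accountsDo: [ : each | r check: each ] ]
      check: [ : each : r | each importFilters anySatisfy: [ : f | f class = MaxImportFilter ] ])
      repair: [ : each : r |
         each importFilters withIndexDo: [ : f : i |
            each importFilters at: i put: f asKeywordFilter ] ]

"Invoked from the app's bootstrap; a repair failure surfaces as a
 regular Error and aborts startup."
MyApp class >> start
   [ MyAppRoot repairConvertImportFilters ]
      on: Error
      do: [ : err | self abortStartup: err ]
```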
Right now, multiple successive Repairs are left to the app developers. Certainly, a SequenceableCollection of DataRepairs could be applied in sequence to your data-model, where each Repair is designated to be applied only to a particular #version of your app's data-model, so that earlier ones would be skipped if they were no longer necessary.
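At the app level, such a version-gated sequence might be sketched as follows. This is entirely hypothetical: the model-version bookkeeping (#modelVersion, #modelVersion:) and the repair selectors are invented for illustration.

```smalltalk
"Apply only the repairs newer than the data-model's current version,
 in chronological order, bumping the version after each one."
MyAppRoot >> applyPendingRepairs
   | pending |
   pending := OrderedCollection new.
   pending add: 2 -> [ self class repairAddProfessions ].
   pending add: 3 -> [ self class repairConvertImportFilters ].
   pending do: [ : assoc |
      self modelVersion < assoc key ifTrue:
         [ assoc value value.        "run this DataRepair"
         self modelVersion: assoc key ] ]
```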
But Magma doesn't know what the individual app's version naming / numbering scheme is, so this is currently left to the app developers to handle themselves.

- Chris
_______________________________________________ Magma mailing list [hidden email] http://lists.squeakfoundation.org/mailman/listinfo/magma |