Hi,
We’ve been experimenting with GOODS as an object database persistence mechanism for a couple of weeks. We plan to use Dali with it too. We’re really excited about this but have a burning question.

Question: How do we handle changes to the “model” layer?

Example: Suppose that last week we had a “Person” and a “Profession” object, and each “Person” could only have one “Profession.” We’ve been saving data to our object database and currently have 500 people and 25 professions saved and used with our app. This week we realize that our world view is wrong: some people are both a “Chef” and a “Computer Scientist.” They have 2 or more professions. We must change our “model.” The class “Person” must change. How do we fix our object database now?

Background: This isn’t really a GOODS-specific question, is it? I imagine it is the same issue with Magma, GemStone, etc. But in all the examples I’ve found, nobody seems to talk about “migrations,” or maybe I’ve overlooked it. Can someone point me in the right direction?

In NeXT / Apple WebObjects I would write a “migration,” an executable script that does the necessary steps to change SQL tables, add new DB constraints, and so on. I would commit this migration to the repository. Any of the other developers on our team would automatically get it when they pulled from the repository; they don’t need to think about it. As soon as the WebObjects app launches, the migrations would run and they could see the new functionality and make “Marcus” both a “Chef” and a “Computer Scientist.” When we deploy the new code in production, the migration will automatically run there too. This is all explained in detail here:

Any helpful pointers are appreciated.

Thank you,
I use Magma, but yes, data-quality management applies to any persistent model, and it is a little more involved than it seems at first, because it's tempting to just bang out a workspace script which enumerates the appropriate objects in your DB and fixes/upgrades them.
But doing only that is loaded with caveats. After running a repair script, can one _really_ feel comfortable the model is "all fixed" and ready for production? Did the script really work? Did it catch all cases or was there possibly a bug in it?
To address these concerns, Magma employs a first-class DataRepair which is used to perform data repairs/migrations on Magma models in a controlled and reliable fashion. It is based on the repair process we used at Florida Power and Light for patching/upgrading the models in its GemStone databases.
The requirements of a data repair are:

- Enumerate all objects needing repair.
- Of the enumerated objects, #check, #count or #identify the ones that are in need of repair.
- Of the enumerated objects, #improve or #repair the ones that are in need of repair.
- Before committing a #repair, _verify_ that the repaired objects are, in fact, repaired, using the same check as for #check, #count and #identify. If they are, commit; if not, abort.
- Connect a brand-new session, run a final #check, and report whether the repair was successful or failed.
- (Optional) If successful, you might wish to persist the DataRepair object itself somewhere in your database, so it has a history of what was done to it.
- Output a report for each of the above actions.

Here is an example of Magma's DataRepair constructor:

  (MagmaDataRepair
     at: (MagmaRemoteLocation host: 'prod1' port: 51199)
     enumerate:
        [ : session : repair |
        session root accountsDo: [ : eachAccount | repair check: eachAccount ] ]
     check:
        [ : eachAccount : repair |
        eachAccount importFilters anySatisfy: [ : each | each class = MaxImportFilter ] ])
     repair:
        [ : eachAccount : repair |
        eachAccount importFilters withIndexDo:
           [ : eachFilter : index |
           eachAccount importFilters at: index put: eachFilter asKeywordFilter ] ]
To this object, I can send any of the five actions: #check, #count, #identify, #improve or #repair. The first three are read-only operations. The fourth (#improve) commits the repairs as it goes; the last (#repair) commits only after a successful pre-verification.
Although you, the user, must specify the enumerate:, check: and repair: blocks, the DataRepair object encapsulates the _process_ of repairing models in a controlled and reliable fashion, using those user-specified, model-specific blocks.
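A typical run might then be (a minimal sketch, assuming dataRepair holds the DataRepair built by the constructor expression above):

  dataRepair count.     "read-only: how many objects fail the check?"
  dataRepair identify.  "read-only: which objects are they?"
  dataRepair repair.    "verify the fix, then commit only if verification passes"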
- Chris

On Sat, Apr 26, 2014 at 11:55 AM, Aaron Rosenzweig <[hidden email]> wrote:
> - Enumerate all objects needing repair.

Enumerate all objects _possibly_ needing repair.
Hi Chris,
You don’t just use Magma, you *are* Magma! Thank you for the thoughtful reply. That does look like a great feature of Magma, but it leads me to four questions:

1) Old vs. new?

As you instruct MagmaDataRepair to enumerate and “hydrate” your objects from the DB, how do you look at the “old way” and the “new way” at the same time so you can do the #improve message? In my example the old way had Person -> Profession (a to-one relationship, a single object). In the new way it is Person ->> Profession (a to-many relationship: an array, a B-Tree, etc.). The class in Smalltalk is already “new,” so how can it deal with the “old” form of what was stored in the DB?

2) Lazy improve

What if the object itself, “Person,” could know how to improve itself? As you access Persons, they automatically improve themselves. Then you wouldn’t need to modify all the objects at once; they could self-heal as needed. Each object would need a way to look at its old self and “self improve” (a sketch of what I mean follows below). Is this a bad idea? Has anyone done this?

3) Auto repair / improve

It looks like MagmaDataRepair is something I would run from a workspace. Suppose I was working on this app with Dirk and Dave. Do I really have to tell them both, “Hey guys, I’m going to Skype you some text to run in our workspace to update our data model, but before you do, pull from the repo”? That seems like a lot to remember and a lot that can go wrong. Isn’t there a way so that, after pulling from the repo and launching the app, it will “self improve”? Then I wouldn’t have to tell Dirk and Dave anything, and when we deploy we wouldn’t have to remember to do anything either.

4) Queued repairs / improvements

Imagine one of my team-mates was on vacation for two weeks. When he comes back he wants to get back into our Smalltalk app. During that time there have been seven MagmaDataRepair scripts created by various people (me and Dirk, mostly). Isn’t there a way to have them all put together in chronological order, so that when Dave returns from his cruise he can just “run” them, and all seven MagmaDataRepairs will be applied without him thinking about it?

Thanks,
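The “lazy improve” idea from question 2 might look something like this on “Person” (a hypothetical sketch: it assumes the class temporarily keeps both the old ‘profession’ ivar and the new ‘professions’ ivar, and a real database would also need the change marked dirty and committed):

  professions
     "Self-heal on first access: fold the old to-one 'profession'
     value into the new to-many 'professions' collection."
     profession ifNotNil:
        [ professions := OrderedCollection with: profession.
        profession := nil ].
     ^ professions ifNil: [ professions := OrderedCollection new ]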
On Apr 26, 2014, at 5:13 PM, Chris Muller <[hidden email]> wrote:
Hi Everyone,
I found a short statement here that is interesting. It appears to quote Avi Bryant and says two interesting things:

1) There is a way to do a mass migration of the data on the GOODS server, and we can see the goodsrv documentation for how to do it. But in the only documentation I know of, I don’t see anything about this.

2) There is a way to do lazy migrations with an “old / new” pair of values.

Has any of this been done by anyone, or written about anywhere, or should I ask Avi for an additional clue?

Thanks,
On Apr 26, 2014, at 6:24 PM, Aaron Rosenzweig <[hidden email]> wrote:
Hi Aaron,

To avoid migration trouble I usually use lazy initialisers, accessing them as properties. In other words: I keep my initialize method as empty as possible. And when I have a “migrated” value, I try to come up with a new name and put the migration code in the initialiser. When this initialisation becomes complex I use the following pattern (applied to the Person example in a sketch below):

  x
     ^ x ifNil: [ x := self buildX ]

  buildX
     | newX |
     "do the complex initialisation"
     ^ newX

I do not know if this can be easily applied to databases as well (I myself use Pharo / GemStone), and I still have moments where something goes wrong because the data is not migrated properly, because I forgot something, but it saves me a lot of trouble.

Diego
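Applied to the Person example from this thread, Diego’s pattern might look like this (a sketch; it assumes the old to-one ‘profession’ ivar is still present to read from):

  professions
     ^ professions ifNil: [ professions := self buildProfessions ]

  buildProfessions
     "Build the new to-many collection, migrating the old
     to-one 'profession' value if one was stored."
     ^ profession
        ifNil: [ OrderedCollection new ]
        ifNotNil: [ :old | OrderedCollection with: old ]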
Aaron,
As you say, this isn’t really a GOODS-specific question but a more generic one. The Programming Guide for GemStone/S 64 Bit has an entire chapter devoted to the topic (see http://gemtalksystems.com/index.php/community/gss-support/documentation/gs64/).

There are a variety of approaches to changing the schema of an object. First, since Smalltalk has dynamic typing, you don’t have to have a Profession instance in the profession instance variable of Person. You can store either a Profession instance or a Collection of Profession instances, and have accessors that return either one or a collection (a sketch follows below).

Alternatively, in GemStone you can add a method to the old class version to return a collection, and create a new version of the class that stores a collection. Then, at your convenience, you can migrate all the instances of the old version to instances of the new version.

James

On Apr 26, 2014, at 9:55 AM, Aaron Rosenzweig <[hidden email]> wrote:
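For the dynamic-typing approach, a tolerant accessor on Person might look something like this (a sketch only; it assumes a single ‘profession’ ivar that may hold nil, one Profession, or a collection of them):

  professions
     "Answer a collection regardless of whether the stored value
     is nil, a single Profession, or already a collection."
     profession ifNil: [ ^ #() ].
     ^ (profession isKindOf: Collection)
        ifTrue: [ profession ]
        ifFalse: [ Array with: profession ]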
Thank you James, Diego, and Chris for your thoughts,
1) Lazy initializers

Thanks Diego. I see how what you say could work for many cases, and it is generally good practice even without thinking about migrations. But when the change is something more complex, or there have been a lot of code changes over the week and you’re only pulling “now,” it wouldn’t handle that. And when you’ve been making a lot of changes over the month and want to push to production, it doesn’t give you a straightforward game plan to follow.

2) GemStone documentation

Thanks James. I got really excited to look into the docs, but in the end felt like it was only half of the solution. It does have the notion of “ClassHistory,” which every GemStone class has. That is cool! It also has the notion of the message “migrateFrom:instVarMap:”, where you have access to the old values and can create the new values. That is also very cool! But migrations are not automatic, and you cannot chain them together in a predetermined way; “that’s an exercise for the app developer,” much as Chris is suggesting.

I see value in having a documented way for everyone to upgrade their objects. Sure, app developers could always roll their own methods, but if a unified way were available, it would make things easier for all of us and quicker to move between projects in this community. Maybe we can add this to Dali, so it is available for Magma, GOODS, and GemStone; actually, for any object DB that has a layer built into Dali.

Here’s a rough idea of what I’m thinking (a sketch of the protocol follows below):

1) The object knows. The object should know how to migrate itself. When it is “hydrated” from the data store, it compares the previously “committed” state to what the current class looks like and automatically applies migration after migration to become current. Even if you have not pulled new code from the repo for over a month, it will upgrade your object database for you. You’ll get all the latest and greatest features from your colleagues without any effort. You don’t have to think.

2) #dbVersion. This method would return 1 as the default. Every time you make a change to the class that requires a migration, you increment the value. Simply adding a new method that can be satisfied with lazy initializers wouldn’t require a change to #dbVersion. Changing #profession -> #professions (going from one to many) would require you to increment #dbVersion.

3) #migrate2, #migrate3, … #migrateN. To go from #dbVersion 1 to 2, you would define a method called #migrate2. Likewise, to go from #dbVersion 2 to 3, you would define #migrate3. This would be a running history of “how” you migrate to the new hotness of your app.

4) #priorState. Inside each of our #migrateN methods we could reference a #priorState and send it messages like #profession so that we can build our current ivars.

5) Extending Dali. We could then extend Dali to take our #dbVersion and #migrateN methods and auto-run migrations on our behalf. We could have the choice of doing it as objects are loaded, or of running a command to do them all in a batch, like Chris has with his Magma-specific command.

This idea of full “migrations” doesn’t appear to exist even in a particular commercial flavor such as GemStone… but maybe through an improved Dali we could all benefit from this approach. Once and for all! No more reinventing the wheel. We would all have the same structured game plan no matter what OODBMS we use. Sound good?
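Something like the following is what I have in mind for “Person” (entirely hypothetical: neither #dbVersion, #migrate2, nor #priorState exists in Dali today, and the collection choice is arbitrary):

  Person class >> dbVersion
     "Bumped from 1 (the default) to 2 when #profession became #professions."
     ^ 2

  Person >> migrate2
     "Build the new to-many 'professions' ivar from the old to-one
     value, read from the hypothetical #priorState snapshot."
     professions := self priorState profession
        ifNil: [ OrderedCollection new ]
        ifNotNil: [ :old | OrderedCollection with: old ]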
On Apr 28, 2014, at 2:14 PM, James Foster <[hidden email]> wrote:
Aaron,
Because of GemStone’s architecture, we don’t think of an object as being “hydrated.” That is, objects are represented on disk in the same way they are represented in memory. Every object is an instance of some class; the concepts of “previously committed state” and “current class” aren’t really meaningful. The closest idea is whether there are other classes in the ClassHistory collection.

For performance reasons, many developers would prefer not to have the VM do additional checks and migrate an object if the behavior of the old version is equivalent to the behavior of the new version. Next, if an object were automatically migrated by two or more concurrent sessions, this would create a commit conflict. Finally, since migration can take significant time and other resources, developers prefer to control when those resources are consumed.

Depending on the change, one might choose never to migrate. For example, one could add the following to the “old” class to achieve compatibility with the “new” class:

  professions
     ^ profession
        ifNil: [ #() ]
        ifNotNil: [ Array with: profession ]

Now there is less to be gained by migrating.

As to your list: (1) GemStone objects know how to migrate themselves (#'migrate'); (2) #'dbVersion' could be implemented as ‘^ anObject class classHistory indexOf: anObject class’; (3) & (4) the migrate methods should be passed the prior state (as in GemStone’s #'migrateFrom:instVarMap:', of which a sketch follows below), not have it available at any time (so we can garbage-collect prior states); and (5) while a framework could provide auto-migration, the default approach would be slow and any refinement would likely be application-dependent.

James

On Apr 28, 2014, at 7:24 PM, Aaron Rosenzweig <[hidden email]> wrote:
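A possible override of that #'migrateFrom:instVarMap:' hook for the new Person class version, sketched under two assumptions: that the old class version still responds to #profession, and that calling up to the default implementation copies the instance variables whose names match (consult the Programming Guide chapter referenced earlier for the exact contract):

  migrateFrom: oldPerson instVarMap: aMap
     "Let the default migration copy matching instance variables,
     then build the new to-many 'professions' from the old
     to-one 'profession' (both names are from this thread's example)."
     super migrateFrom: oldPerson instVarMap: aMap.
     professions := oldPerson profession
        ifNil: [ Array new ]
        ifNotNil: [ :old | Array with: old ]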