Re: [Seaside] Migrations - Object Database

Re: [Seaside] Migrations - Object Database

Chris Muller-3
I use Magma, but yes, data-quality management applies to any persistent model, and it is a little more involved than it at first seems, because it's tempting to just spank out a workspace script that enumerates the appropriate objects in your DB and fixes/upgrades them.

But doing only that is loaded with caveats.  After running a repair script, can one _really_ feel comfortable the model is "all fixed" and ready for production?  Did the script really work?  Did it catch all cases or was there possibly a bug in it?

To address these concerns, Magma employs a first-class DataRepair which is used to perform data repairs/migrations on Magma models in a controlled and reliable fashion.  It is based on the repair process we used at Florida Power and Light for patching/upgrading the models in its GemStone databases.

The requirements of a data-repair are:

- Enumerate all objects needing repair.
- Of the enumerated objects, #check, #count or #identify the ones that are in need of repair.
- Of the enumerated objects, #improve or #repair the ones that are in need of repair.
- Before committing #repair, _verify_ that the repaired objects are, in fact, repaired, using the same check as for #check, #count and #identify.  If they are, commit; if not, abort.
- Connect a brand-new session, run a final #check, and report whether the repair succeeded or failed.
- (Optional) If successful, you might wish to persist the DataRepair object itself somewhere in your database, so the database carries a history of what was done to it.
- Output a report for each of the above actions.

Here is an example of Magma's DataRepair constructor:

(MagmaDataRepair
    at: (MagmaRemoteLocation host: 'prod1' port: 51199)
    enumerate: [ : session : repair | session root accountsDo: [ : eachAccount | repair check: eachAccount ] ]
    check: [ : eachAccount : repair | eachAccount importFilters anySatisfy: [ : each | each class = MaxImportFilter ] ]
    repair: [ : eachAccount : repair | eachAccount importFilters withIndexDo: [ : eachFilter : index | eachAccount importFilters at: index put: eachFilter asKeywordFilter ] ])

To this object, I can send any of the 5 actions:  #check, #count, #identify, #improve or #repair.  The first three are read-only operations.  The 4th (#improve) commits the repairs as it goes; the last (#repair) commits only after a successful pre-verification.
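
For instance, assuming the constructor expression above has been assigned to a variable named repair, a typical workflow might look like this (an illustrative sketch based on the five actions just described):

    repair count.     "read-only: report how many enumerated objects fail the check"
    repair identify.  "read-only: report which objects fail the check"
    repair repair.    "fix them, re-run the same check to verify, then commit only on success"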

Although you, the user, must specify the enumerate:, check: and repair: blocks, the DataRepair object encapsulates the _process_ of repairing models in a controlled and reliable fashion, using those user-specified, model-specific blocks.

 - Chris



On Sat, Apr 26, 2014 at 11:55 AM, Aaron Rosenzweig <[hidden email]> wrote:
Hi,

We’ve been experimenting with GOODS as an object database persistency mechanism for a couple of weeks. We plan to use Dali with it too. We’re really excited about this but have a burning question.

Question: 
How do we handle changes to the “model” layer? 

Example: 
Suppose that last week we had a “Person” and “Profession” object. Each “Person” could only have one “Profession.” We’ve been saving some data to our Object Database and have 500 people and 25 professions currently saved and used with our app.

This week we realize that our world view is wrong. Some people are both a “Chef” and a “Computer Scientist.” They have 2 or more professions. We must change our “model.” The class “Person” must change. 
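
(For concreteness, in Squeak syntax the before/after class definitions might look something like this; these are purely illustrative, not our actual code:)

    "Last week: one profession per person."
    Object subclass: #Person
        instanceVariableNames: 'name profession'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'MyApp-Model'.

    "This week: a person can hold many professions."
    Object subclass: #Person
        instanceVariableNames: 'name professions'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'MyApp-Model'.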

How do we fix our object database now?

Background:
This isn’t really a GOODS-specific question, is it? I imagine it is the same issue with Magma, GemStone, etc. But in all the examples I’ve found, nobody seems to talk about “migrations”, or maybe I’ve overlooked it. Can someone point me in the right direction?

In NeXT / Apple WebObjects I would write a “migration”, an executable script that does the necessary steps to change SQL tables, make new DB constraints, etc. I would commit this migration to the repository. Any of the other developers on our team would automatically get it when they pulled from the repository; they wouldn’t need to think about it. As soon as the WebObjects app launched, the migrations would be run, they could see the new functionality, and they could make “Marcus” both a “Chef” and a “Computer Scientist.” When we deploy the new code in production, the migration will automatically run there too. This is all explained in detail here:


Any helpful pointers are appreciated.

Thank you,
Aaron Rosenzweig / Chat 'n Bike
e:  [hidden email]  t:  (301) 956-2319

Re: [Seaside] Migrations - Object Database

Chris Muller-3
> - Enumerate all objects needing repair.

Enumerate all objects _possibly_ needing repair.



Re: [Seaside] Migrations - Object Database

Aaron Rosenzweig
In reply to this post by Chris Muller-3
Hi Chris,

You don’t just use Magma, you *are* Magma! Thank you for the thoughtful reply. That does look like a great feature of Magma but it leads me to four questions:

1) Old vs. New?
As you instruct MagmaDataRepair to enumerate and “hydrate” your objects from the DB, how do you look at the “old way” and the “new way” at the same time so you can do the #improve message? In my example the old way had a Person -> Profession (a to-one relationship, a single object). In the new way Person ->> Profession (a to-many relationship, an array, or B-Tree, etc). The class in Smalltalk is already “new” so how can it deal with the “old” form of what was stored in the DB?

2) Lazy Improve
What if the object itself, “Person” could know how to improve itself? And as you access Persons they automatically improve themselves. Then you wouldn’t need to modify all the objects at once and they could self-heal as needed. It would need a way to look at its old self and “self improve.” Is this a bad idea? Anyone done this?

3) Auto repair / improve
It looks like MagmaDataRepair is something I would run from a workspace. Suppose I was working on this app with Dirk and Dave. Do I really have to tell them both “Hey guys, I’m going to Skype you some text to run in our workspace to update our data model, but before you do, pull from the repo”? That seems like a lot to remember and a lot that can go wrong. Isn’t there a way, after pulling from the repo and accessing your app, for it to “self improve”, so that I don’t actually have to tell Dirk and Dave anything, and so that when we deploy we don’t have to remember to do anything either?

4) Queued repairs / improvements
Imagine one of my team-mates was on vacation for two weeks. After he comes back he wants to get back into our Smalltalk app. During that time there have been 7 (seven) MagmaDataRepair scripts created by various people (Me and Dirk mostly). Isn’t there a way to have them all put together in chronological order so when Dave returns from his cruise he can just “run” and all 7 MagmaDataRepairs will be applied without him thinking about it?

Thanks,
Aaron Rosenzweig / Chat 'n Bike
e:  [hidden email]  t:  (301) 956-2319

Re: [Seaside] Migrations - Object Database

Aaron Rosenzweig
Hi Everyone,

I found a short statement here that is interesting:


It appears to quote Avi Bryant and says two interesting things:

1) There is a way to do a mass migration of the data on the GOODS server, and it says to see the goodsrv documentation for how to do it. But in the only documentation I know of, I don’t see anything about this:


2) There is a way to do lazy migrations with an “old / new” pair of values. 

Has any of this been done by anyone, or written about anywhere, or should I ask Avi for an additional clue?

Thanks,
Aaron Rosenzweig / Chat 'n Bike
e:  [hidden email]  t:  (301) 956-2319

Re: [Seaside] Migrations - Object Database

Chris Muller-4
In reply to this post by Aaron Rosenzweig
Hi,

...

1) Old vs. New?
As you instruct MagmaDataRepair to enumerate and “hydrate” your objects from the DB, how do you look at the “old way” and the “new way” at the same time so you can do the #improve message? In my example the old way had a Person -> Profession (a to-one relationship, a single object). In the new way Person ->> Profession (a to-many relationship, an array, or B-Tree, etc). The class in Smalltalk is already “new” so how can it deal with the “old” form of what was stored in the DB?

Sure, sometimes a migration can require two successive repairs:  the first to add the new ivar / relationship, the second to remove the old one.

In fact, sometimes there's no big hurry to remove the old one.  The old ivar can remain for a release of your app or so, and then be removed upon the next release.
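
For the Person/Profession example, the two repairs might look something like this (a sketch only; peopleDo:, profession, professions and professions: are assumed selectors on your own model, not Magma API, and the host/port are simply copied from the earlier example):

    "Repair 1: populate the new to-many ivar from the old to-one ivar."
    MagmaDataRepair
        at: (MagmaRemoteLocation host: 'prod1' port: 51199)
        enumerate: [ : session : repair | session root peopleDo: [ : eachPerson | repair check: eachPerson ] ]
        check: [ : eachPerson : repair | eachPerson professions isNil and: [ eachPerson profession notNil ] ]
        repair: [ : eachPerson : repair | eachPerson professions: (OrderedCollection with: eachPerson profession) ]

    "Repair 2, a release or two later: clear out the old to-one ivar."
    MagmaDataRepair
        at: (MagmaRemoteLocation host: 'prod1' port: 51199)
        enumerate: [ : session : repair | session root peopleDo: [ : eachPerson | repair check: eachPerson ] ]
        check: [ : eachPerson : repair | eachPerson profession notNil ]
        repair: [ : eachPerson : repair | eachPerson profession: nil ]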
 
2) Lazy Improve
What if the object itself, “Person” could know how to improve itself? And as you access Persons they automatically improve themselves. Then you wouldn’t need to modify all the objects at once and they could self-heal as needed. It would need a way to look at its old self and “self improve.” Is this a bad idea? Anyone done this?

Nothing wrong with this if it suits the situation.  My only comment is that I don't particularly like to put commits into lazy-initializing accessors.  And so, without a commit, my Person objects would be "healing" themselves on every access from a new session.  Plus, the DB would be only partially migrated, which can lead to a mess if, subsequently, you want to do yet _another_ migration along the same ivars (e.g., you now have two possible input states instead of just one, which makes getting to a third state more difficult).
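
An illustrative lazy-healing accessor for the Person example (a sketch of the idea in plain Smalltalk, not Magma API; note it deliberately does not commit, per the caveat above):

    Person >> professions
        "Self-healing accessor (illustrative): converts the legacy to-one
        'profession' ivar into a collection on first access."
        professions ifNil:
            [ professions := OrderedCollection new.
            profession ifNotNil:
                [ professions add: profession.
                profession := nil ] ].
        ^ professions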
 

3) Auto repair / improve
It looks like MagmaDataRepair is something I would run from a workspace. Suppose I was working on this app with Dirk and Dave. Do I really have to tell them both “Hey guys, I’m going to Skype you some text to run in our workspace to update our data model, but before you do, pull from the repo”? That seems like a lot to remember and a lot that can go wrong. Isn’t there a way, after pulling from the repo and accessing your app, for it to “self improve”, so that I don’t actually have to tell Dirk and Dave anything, and so that when we deploy we don’t have to remember to do anything either?

Well, I don't actually define my Repairs in workspaces but as class-side methods of the root object of the DB, where they remain for at least a few versions.

I assume your app has a bootstrap / initialization step, and so you could certainly invoke the repair at that time.  A repair failure would be treated just like a regular Error, which I assume would cause the app bootstrap process to abort accordingly.
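
For example (a sketch; MyAppRoot, #dataRepairAddProfessions and #initializeDatabase are made-up names, and the repair definition simply reuses the pattern shown earlier):

    MyAppRoot class >> dataRepairAddProfessions
        "Keep the repair definition in code, so teammates pick it up with a normal pull from the repo."
        ^ MagmaDataRepair
            at: (MagmaRemoteLocation host: 'prod1' port: 51199)
            enumerate: [ : session : repair | session root peopleDo: [ : eachPerson | repair check: eachPerson ] ]
            check: [ : eachPerson : repair | eachPerson professions isNil ]
            repair: [ : eachPerson : repair | eachPerson professions: (OrderedCollection with: eachPerson profession) ]

    MyApp class >> initializeDatabase
        "Run at app bootstrap; per the above, a failed repair raises an Error and aborts startup."
        MyAppRoot dataRepairAddProfessions repair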
 
4) Queued repairs / improvements
Imagine one of my team-mates was on vacation for two weeks. After he comes back he wants to get back into our Smalltalk app. During that time there have been 7 (seven) MagmaDataRepair scripts created by various people (Me and Dirk mostly). Isn’t there a way to have them all put together in chronological order so when Dave returns from his cruise he can just “run” and all 7 MagmaDataRepairs will be applied without him thinking about it?

Right now, multiple successive Repairs are left to the app-developers.  Certainly, a SequenceableCollection of DataRepairs could be applied in sequence to your data-model, where each Repair is designated to be applied only to a particular #version of your app's data-model, so that earlier ones would be skipped if they were no longer necessary.

But Magma doesn't know what the individual app's version naming / numbering scheme is, and so this is currently left to the app-developers to handle themselves.
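
One possible app-side arrangement (a sketch only; Magma provides none of this, and the version numbers, #dataRepairClearOldProfession and #currentDataModelVersion are made up for illustration):

    MyAppRoot class >> repairsAfterVersion: currentVersion
        "Answer the data repairs, in chronological order, for model versions newer than currentVersion."
        | all |
        all := OrderedCollection new.
        all add: 2 -> self dataRepairAddProfessions.
        all add: 3 -> self dataRepairClearOldProfession.
        ^ all select: [ : each | each key > currentVersion ]

    "At bootstrap, apply whatever is pending, oldest first:"
    (MyAppRoot repairsAfterVersion: MyAppRoot currentDataModelVersion)
        do: [ : each | each value repair ]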

  - Chris
