Hi,

Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file, however, it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello.

I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.

I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-a-single-image

Cheers,
Max
|
I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to a database, then exit without saving. It's a great feature.

I know OSProcess, when combined with CommandShell, has a RemoteTask which allows efficient forking of the image (via Linux copy-on-write memory sharing), so a solution like what happens in Windows is not really good.

Instead of putting a pop-up in front of the user, perhaps one way to solve the problem would be, upon image save, to simply go through all the changes since the last save and re-flush them to the .changes file.

That way, if someone does want to save the same image on top of itself, at least whichever saved last "wins"....

On Tue, Jun 28, 2016 at 5:04 AM, Max Leske <[hidden email]> wrote:
> Hi,
>
> Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file however it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello.
>
> I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.
>
> I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-a-single-image
>
> Cheers,
> Max
|
On Tue, Jun 28, 2016 at 04:47:00PM -0500, Chris Muller wrote:
> On Tue, Jun 28, 2016 at 5:04 AM, Max Leske <[hidden email]> wrote:
> > Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file however it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello.
> >
> > I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.

If the offsets are wrong in this scenario, it's a bug in the image. The image is supposed to seek to the end of the changes file before writing the next chunk. While this sounds horrible in theory, in practice it works remarkably well, and I have been happily surprised at how reliable it is after many years of using and abusing the feature. That is a very good thing.

Adding a lock to prevent the scenario would be bad, because it would surely break a number of other legitimate use cases.

> I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to database, then exit themselves without saving. It's a great feature.

That is consistent with my experience. I remember expecting horrible things to happen if I had two images sharing a changes file, but nothing bad ever happened. It just works.

> I know OSProcess, when combined with CommandShell, has a RemoteTask which allows efficient forking of the image (via Linux copy-on-write memory sharing) and so a solution like what happens in Windows is not really good.

My assumption with RemoteTask was that someone doing complex or long-running jobs would more or less know what they were doing, and would have the good sense to stop writing to the changes file from a bunch of forked images. But in actual practice, I have never seen a problem related to this. It just works.

> Instead of putting a pop-up in front of the user, perhaps one way to solve the problem would be to, upon image save, simply goes through all the changes since the last save and re-flushes them to the .changes file.
>
> That way, if someone does want to save the same image on top of themself, at least it would be whichever saved last "wins"....

There must be a problem somewhere, otherwise Max would not be raising the issue. So whatever combination of operating system and image is having a problem, I would be inclined to fix that.

Windows cannot be a problem, because the operating system will not permit you to open the changes file twice. The Unix/Linux systems that I have used all work fine.

Max, which operating system/VM/image are you using? Is this on a Mac?

Dave
|
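Dave's description above, that each writer seeks to the end of the changes file before writing the next chunk, can be sketched roughly as follows. This is Python rather than Squeak or Pharo code, and the names `write_chunk_at_end` and `demo_seek_to_end` are made up for illustration; it only shows the idea, not how either image actually implements it.

```python
import os
import tempfile


def write_chunk_at_end(handle, chunk: bytes) -> int:
    """Seek to the current end of file, write the chunk, return where it landed."""
    handle.seek(0, os.SEEK_END)   # ask the OS where the file ends right now
    offset = handle.tell()        # remember this as the chunk's source pointer
    handle.write(chunk)
    handle.flush()                # push the bytes out so other handles can see them
    return offset


def demo_seek_to_end() -> bool:
    """Two handles stand in for two images sharing one changes file."""
    fd, path = tempfile.mkstemp(suffix=".changes")
    os.close(fd)
    try:
        first = b"foo ^ nil + 1!"
        second = b"foo ^ nil - 1!"
        with open(path, "r+b") as image_a, open(path, "r+b") as image_b:
            off_a = write_chunk_at_end(image_a, first)
            off_b = write_chunk_at_end(image_b, second)
        with open(path, "rb") as f:
            data = f.read()
        # Each recorded offset should still point at the chunk that was written there.
        return (data[off_a:off_a + len(first)] == first
                and data[off_b:off_b + len(second)] == second)
    finally:
        os.remove(path)
```

Note that a small window remains between the seek and the write in which another process could append; the sketch does not close that window.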
> On 29 Jun 2016, at 02:06, David T. Lewis <[hidden email]> wrote:
>
> On Tue, Jun 28, 2016 at 04:47:00PM -0500, Chris Muller wrote:
>> I have several applications which launch multiple copies of the same image for multicore processing. The images do their work, commit it to database, then exit themselves without saving. It's a great feature.

Doing work is not the problem. Modifying source code is the problem.

> There must be a problem somewhere, otherwise Max would not be raising the issue. So whatever combination of operating system and image is having a problem, I would be inclined fix that.

:) Thanks Dave!

> Max, which operating system/VM/image are you using? Is this on a Mac?

Mac OS X 10.11.5, Pharo 6 (60086)

I actually didn’t open the issue for myself but because of a student who ran into this. I’ve been in the same situation before, but I’m an experienced user, while students at the research group sometimes spend just a couple of weeks with Pharo, and then such things are a real problem.

Interestingly, the issue is, as you already suggested, pretty hard to reproduce, i.e., modifying arbitrary methods in both images did not show the symptoms I was looking for.

Here’s a reproducible case (at least on my machine):

1. Create a new method in both images:

foo
    ^ nil

2. Modify it in one image:

foo
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
    ^ nil + 1

3. Modify it in the other image:

foo
    ^ nil - 1 isEmpty ifTrue: [ "blah" nil ]

In my case, saving in step three produces a syntax error when the source is loaded from file again. I don’t really have a clue as to what the underlying issue is, but I suspect it may have to do with comments and a particular situation in which the position is not being correctly updated before or after writing.

I agree with Chris that locks may be problematic; it just seemed like the simplest obvious solution (although of course it gets complicated when an image crashes and doesn’t clean up the lock…).

Cheers,
Max
|
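The three steps above can be imitated outside Smalltalk with a small simulation. It is written in Python, not Pharo or Squeak, and it rests on an assumption the thread has not confirmed: that each image writes at an end-of-file offset it recorded earlier instead of asking the file where it ends now. The class `CachedOffsetWriter` and the function `demo_stale_offsets` are hypothetical names used only for this sketch.

```python
import os
import tempfile


class CachedOffsetWriter:
    """Stand-in for an image that remembers where it believes the file ends."""

    def __init__(self, path: str):
        self.handle = open(path, "r+b")
        self.handle.seek(0, os.SEEK_END)
        self.end = self.handle.tell()   # cached once, never refreshed (the assumed bug)

    def write_chunk(self, chunk: bytes) -> int:
        offset = self.end               # stale if another writer has appended since
        self.handle.seek(offset)
        self.handle.write(chunk)
        self.handle.flush()
        self.end = offset + len(chunk)
        return offset

    def close(self) -> None:
        self.handle.close()


def demo_stale_offsets():
    """Return (source image A wrote, source found later at A's offset)."""
    fd, path = tempfile.mkstemp(suffix=".changes")
    os.close(fd)
    image_a = CachedOffsetWriter(path)
    image_b = CachedOffsetWriter(path)
    try:
        long_source = b'foo "a long comment, as in step 2" ^ nil + 1'
        short_source = b"foo ^ nil - 1"
        off_a = image_a.write_chunk(long_source)
        image_b.write_chunk(short_source)   # lands on the same offset as off_a
    finally:
        image_a.close()
        image_b.close()
    try:
        # Load the source for image A's method from the file again.
        with open(path, "rb") as f:
            f.seek(off_a)
            seen_by_a = f.read(len(long_source))
    finally:
        os.remove(path)
    return long_source, seen_by_a
```

In this simulation the text found at image A's offset begins with image B's shorter method and continues with the tail of A's comment, which is the kind of mixture that would no longer parse as a method.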
Max,

Confirming on Linux and Squeak. See below.

On Wed, Jun 29, 2016 at 08:53:12AM +0200, Max Leske wrote:
> Here’s a reproducible case (at least on my machine):
>
> 1. create a new method in both images:
>
> foo
>     ^ nil
>
> 2. Modify it in one image:
>
> foo
>     "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
>     ^ nil + 1
>
> 3. Modify it in the other image:
>
> foo
>     ^ nil - 1 isEmpty ifTrue: [ "blah" nil ]

Confirmed on Linux + Squeak.

I did your test above using #forkSqueak so that I had two identical images sharing the same changes file. In each image, I saved the #foo method. At that point, the changes file contained exactly what I would expect.

I then did a save and exit from the child image, followed by a save and exit from the original image. I can see that the changes from the child image are now overwriting the changes from the original parent image. Since the parent image is the one that was saved last, its #foo method now has corrupted source.

This is not a scenario that I have ever encountered, but I can see how it might happen in a classroom setting.

I can't look into this further right now, but it seems possible that the problem happens only when saving the image, in which case we could force the changes file to seek to end of file before doing the save. But we'll need to do some more testing to make sure that this is the only scenario in which it happens.

Dave
|
Great, thanks.

In the scenario I described I did not save either image. Of course, without saving the problem will not exist as soon as you start the image anew (the old pointers are still valid and new content will be written to the end). The problem does exhibit itself without saving though.

Since this is not anything critical, don’t put too much effort into it. I’ll have time in a couple of weeks to look at it in detail and then, once we understand the problem, we can discuss possible solutions.

Cheers,
Max
|
On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
> Let's not solve the wrong problem folks. I only looked at this for 10 minutes this morning, and I think (but I am not sure) that the issue affects the case of saving the image, and that the normal writing of changes is fine.

I am wrong. I spent some more time with this, and it is clear that two images saving chunks to the same changes file will result in corrupted change records in the changes file. It is not just an issue related to the image save as I suggested above.

In practice, this is not an issue that either Chris or I have noticed, probably because we are not doing software development (saving method changes) at the same time that we are running RemoteTask and similar. But I can certainly see how it might be a problem if, for example, I had a bunch of students running the same image from a network shared folder.

Dave

> Max was running on Pharo, which may or may not be handling changes the same way. I think he may be seeing a different problem from the one I confirmed.
>
> So a bit more testing and verification would be in order. I can't look at it now though.
>
> Dave
>
> > >> On 29-06-2016, at 10:35 AM, Eliot Miranda <[hidden email]> wrote:
> > >>
> > {snip much rant}
> >
> >> The most obvious place where this is an issue is where two images are using the same changes file and think they’re appending. Image A seeks to the end of the file, “writes” stuff. Image B near-simultaneously does the same. Eventually each process gets around to pushing data to hardware. Oops! And let’s not dwell too much on the problems possible if either process causes a truncation of the file. Oh, wait, I think we actually had a problem with that some years ago.
> >>
> >> The thing is that this problem bites even if we have a unitary primitive that both positions and writes if that primitive is written above a substrate that, as unix and stdio streams do, separates positioning from writing. The primitive is neat but it simply drives the problem further underground.
> >
> > Oh absolutely - we only have real control over a small part of it. It would probably be worth making use of that where we can.
> >
> >> A more robust solution might be to position, write, reposition, read, and compare, shortening on corruption, and retrying, using exponential back-off like ethernet packet transmission. Most of the time this adds only the overhead of reading what's written.
> >
> > Yes, for anything we want reliable that’s probably a good way. A limit on the number of retries would probably be smart to stop infinite recursion. Imagine the fun of an error causing infinite retries of writing an error log about an infinite recursion. On an infinitely large Beowulf cluster!
> >
> > It’s all yet another example of where software meeting reality leads to nightmares.
> >
> > tim
> > --
> > tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> > If it was easy, the hardware people would take care of it.
|
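The scheme quoted above (position, write, reposition, read, compare, shorten on corruption, retry with exponential back-off) together with tim's suggested cap on retries might look something like the sketch below. It is Python, not VM or image code, and `verified_append`, `demo_verified_append`, `max_retries` and `base_delay` are names invented for this illustration. The truncation step follows the quoted wording, but it is only safe if nobody else has appended in the meantime, which is exactly the hazard tim mentions.

```python
import os
import random
import tempfile
import time


def verified_append(path: str, chunk: bytes,
                    max_retries: int = 5, base_delay: float = 0.01) -> int:
    """Append chunk, read it back to verify, retry with back-off; return its offset."""
    for attempt in range(max_retries):
        with open(path, "r+b") as f:
            f.seek(0, os.SEEK_END)             # position
            offset = f.tell()
            f.write(chunk)                     # write
            f.flush()
            f.seek(offset)                     # reposition
            if f.read(len(chunk)) == chunk:    # read and compare
                return offset
            # "Shortening on corruption": drop what we wrote.
            # Caution: this would also drop anything another writer appended after us.
            f.truncate(offset)
        # Exponential back-off with jitter, so competing writers drift apart.
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise OSError(f"could not append a verified chunk after {max_retries} attempts")


def demo_verified_append():
    """Append two chunks to a fresh file; return (chunks, offsets, file contents)."""
    fd, path = tempfile.mkstemp(suffix=".changes")
    os.close(fd)
    try:
        chunks = [b"foo ^ nil!", b"foo ^ nil + 1!"]
        offsets = [verified_append(path, c) for c in chunks]
        with open(path, "rb") as f:
            return chunks, offsets, f.read()
    finally:
        os.remove(path)
```

Raising after a bounded number of attempts keeps a persistent failure from turning into the endless retry loop tim jokes about.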
In reply to this post by Max Leske
On Tue, Jun 28, 2016 at 6:04 PM, Max Leske <[hidden email]> wrote:
> Hi,
>
> Opening the same image twice works fine as long as no writes to the .changes file occur. When both images write to the .changes file however it will be broken for both because the offsets for the changes are wrong. This can lead to lost data and predominantly to invalid method source code, which is a pain with Monticello.
>
> I suggest that we implement a kind of lock mechanism to ensure that only one image (the first one opened) can write to the .changes file.
>
> I’ve opened an issue for Pharo here: https://pharo.fogbugz.com/f/cases/18635/The-changes-file-should-be-bound-to-a-single-image
>
> Cheers,
> Max

I just learnt something quite surprising that is probably important to be aware of...

"Locks given by fcntl are not associated with the file-descriptor or open-file table entries. Instead, they are bound to the process itself. For example, a process has multiple open file descriptors for a particular file and gets a read/write lock using any one of these descriptors. Now closing any of these file descriptors will release the lock, the process holds on the file. The descriptor that was used to acquire the lock in the first place might still be open, but the process will loose its lock. So, it does not require an explicit unlock or a close ONLY on the descriptor that was used to acquire the lock in fcntl call. Doing unlock or close on any of the open file descriptors will release the lock owned by the process on the particular file."

https://loonytek.com/2015/01/15/advisory-file-locking-differences-between-posix-and-bsd-locks/

cheers -ben
|
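The behaviour Ben quotes can be observed with a short experiment. The sketch below is Python on a POSIX system (Linux or OS X), using `fcntl.lockf`, which wraps the fcntl record-locking calls; `PROBE`, `other_process_can_lock` and `demo_lock_lost_on_close` are names made up for this illustration. It takes a lock through one descriptor, closes a second, unrelated descriptor on the same file, and asks a separate process whether it can now take the lock.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Script run in a separate process: exit 0 if it can take an exclusive lock
# without blocking, exit 1 if some other process currently holds one.
PROBE = """
import fcntl, sys
with open(sys.argv[1], "r+b") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(1)
sys.exit(0)
"""


def other_process_can_lock(path: str) -> bool:
    return subprocess.run([sys.executable, "-c", PROBE, path]).returncode == 0


def demo_lock_lost_on_close():
    """Return (lock held before the close, lock held after the close)."""
    fd, path = tempfile.mkstemp(suffix=".changes")
    os.close(fd)
    locking = open(path, "r+b")    # descriptor used to take the lock
    unrelated = open(path, "rb")   # second descriptor on the same file
    try:
        fcntl.lockf(locking, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_before = not other_process_can_lock(path)
        unrelated.close()          # closing ANY descriptor for the file...
        held_after = not other_process_can_lock(path)   # ...drops the process's lock
        return held_before, held_after
    finally:
        unrelated.close()
        locking.close()
        os.remove(path)
```

On systems with POSIX semantics the lock should be reported as held before the close and as gone afterwards, even though the descriptor that took it is still open. Results may differ on filesystems that do not support advisory locks.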
> On 30 Jun 2016, at 05:09, Ben Coman <[hidden email]> wrote:
>
> I just learnt something quite surprising that is probably important to be aware of... "Locks given by fcntl are not associated with the file-descriptor or open-file table entries. Instead, they are bound to the process itself. [...]"
>
> https://loonytek.com/2015/01/15/advisory-file-locking-differences-between-posix-and-bsd-locks/
>
> cheers -ben

Which would solve the problem of a crashed image not cleaning up its lock. Thanks for sharing Ben.
|
On Thu, Jun 30, 2016 at 09:59:37AM +0200, Max Leske wrote:
> > On 30 Jun 2016, at 05:09, Ben Coman <[hidden email]> wrote:
> >
> > I just learnt something quite surprising that is probably important to be aware of... "Locks given by fcntl are not associated with the file-descriptor or open-file table entries. Instead, they are bound to the process itself. [...]"
> >
> > https://loonytek.com/2015/01/15/advisory-file-locking-differences-between-posix-and-bsd-locks/
>
> Which would solve the problem of a crashed image not cleaning up its lock. Thanks for sharing Ben.

FYI, file locking for Unix/Linux/OS X is supported in OSProcess, see UnixProcessFileLockTestCase and the 'file locking' tests in UnixProcessAccessorTestCase.

Dave
|