>> In practice, this is not an issue that either Chris or I have noticed,
>> probably because we are not doing software development (saving method
>> changes) at the same time that we are running RemoteTask and similar.
>> But I can certainly see how it might be a problem if, for example, I
>> had a bunch of students running the same image from a network shared
>> folder.
>
> Maybe it's time to consider a fundamental change in how method-sources
> are referred to.
> Taking inspiration from git... A content addressable key-value file
> store might solve concurrent access. Each CompiledMethod gets written
> to a file named for the hash of its contents, which is the only
> reference the Image gets to a method's source. Each such file would

It sounds like a lot of files.. so how would I move an image to another computer? I gotta know which files go with which image?

Plus, it doesn't really solve the fundamental problem of two images writing to the same file. Multiple images could still change the same method to the same contents at the same time. You may have made the problem less likely, except for when you have your first hash-collision of *different* sources (it COULD happen), in which case it wouldn't even require the changes to occur at the same time.

I guess it would also lose the order-sequence of the change log too... unless you were to try to use the underlying filesystem's timestamps on each file but... that wouldn't work after I've copied all the files via scp, because they all get new timestamps...

Might be better to teach the class, who are learning about Smalltalk anyway, about the nature of the changes file..?
Another thought...
Upon launching of the image, start a temporary changes file, [image-name]-[some UUID].changes.

Upon image save, append the temp changes file to the main changes file, but in an atomic way (first do the append as a new unique filename, then rename it to the original changes file name).

Hmm, but then we would have to check two changes files when accessing sources..
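For illustration only, here is a minimal sketch of the append-then-rename step described above, written in Python rather than Smalltalk. The function name `merge_session_changes` and the staging-file naming scheme are assumptions made for this example; nothing like this exists in Squeak as far as this thread shows.

```python
import os
import shutil
import uuid


def merge_session_changes(main_path: str, session_path: str) -> None:
    """Append the per-session changes file to the main one via a unique copy and a rename."""
    # Hypothetical staging name; it sits next to main_path so that the final
    # rename stays on one filesystem, which is what makes it atomic.
    staging_path = f"{main_path}.{uuid.uuid4().hex}.tmp"
    if os.path.exists(main_path):
        shutil.copyfile(main_path, staging_path)
    # "ab" creates the staging file if the main changes file did not exist yet.
    with open(staging_path, "ab") as staging, open(session_path, "rb") as session:
        shutil.copyfileobj(session, staging)
        staging.flush()
        os.fsync(staging.fileno())  # ask the OS to push the bytes to the device
    # Readers see either the old file or the complete new one, never a mix.
    os.replace(staging_path, main_path)
    os.remove(session_path)
```

The rename only makes the swap itself atomic. Two images merging at nearly the same moment would each start from the same old copy and the later rename would win, so some form of locking would still be needed around the whole merge.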
Sounds like a better idea to me, but I don't think it would solve the problem of multiple images almost simultaneously attempting to update themselves (as in a classroom).
In reply to this post by Chris Muller-3
On Fri, Jul 1, 2016 at 4:10 AM, Chris Muller <[hidden email]> wrote:
> It sounds like a lot of files.. so how would I move an image to
> another computer? I gotta know which files go with which image?

Yes, that would be a sticking point. You couldn't just grab any saved Image file off disk. The image would first need to generate an archive transfer file. Except if these methods were automatically pushed through to a private web service; then, presuming pervasive web access, that sleeping Image would pull down its sources wherever it boots back up (which, even if that would be cool, is not the problem of the original post).

> Plus, it doesn't really solve the fundamental problem of two images
> writing to the same file. Multiple images could still change the same
> method to the same contents at the same time.

The hash-named-file would never be written to twice. It's a fixed point in space-time ;) A second image with the same hash would write the *same* contents, so there is no need to write. If the hash-named-file exists, do nothing. To handle any race condition between checking file existence and writing to it, the first image could take an exclusive write lock.

> You may have made the
> problem less-likely, except for when you have your first
> hash-collision of *different* sources (it COULD happen),

Some equivalent things...
* Pick a random atom from the volume of the moon, then another random pick gets the same atom. http://stackoverflow.com/a/23253149
* Win the national lottery 11 times in a row. http://stackoverflow.com/a/29146396
* Your chances of winning the Powerball lottery are far better than finding a hash collision. After all, lotteries often have actual winners. The probability of a hash collision is more like a lottery that has been running since prehistoric times and has never had a winner and will probably not have a winner for billions of years. http://ericsink.com/vcbe/html/cryptographic_hashes.html

> in which case it wouldn't even require the changes to occur at the same time.

When the second Image finds the hash-named-file already exists, it could check the contents and flag an error if they don't match, so at least it's not a silent error. The same applies when integrating different repositories.

> I guess it would also lose the order-sequence of the change log too...
> unless you were to try to use the underlying filesystem's timestamps
> on each file but... it wouldn't work after I've copied all the files
> via scp and because they all get new timestamps...

Good point. This would complicate changes-replay for a crashed image. Although this case is only important "now" and could be handled by a "/tmp/${username}.${last-image-save-checkpoint-id}" file that records the order of commits for a session and is checked for on Image startup, which is similar to what you already suggested...

> Upon launching of the image, start a, temporary changes file,
> [image-name]-[some UUID].changes.
>
> Upon image save, append the temp changes file to the main changes
> file, but in an atomic way (first do the append as a new unique
> filename, then rename it to the original changes file name).

Good idea. This would eliminate the need for my idea here. You'd need some way to match the UUID with the Image being opened, so I guess the UUID would need to be stored in the saved Image, be constant for the session, and be updated on each save of the Image. The temporary changes filename could include the username to distinguish between users. If the same user opens an Image twice, there would be two files, and upon recovering from a crash the user would be presented a choice between the two files.

> Might be better to teach the class, who are learning about Smalltalk
> anyway, about the nature of the changes file..?

This seemed more of a classroom system administration issue. Actually, in that case, maybe the network executable startup script could just copy both image and changes file to the user's personal area?

cheers -ben
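To make the write-once behaviour discussed above concrete, here is a small Python sketch of a content-addressed source store. Everything in it is an assumption made for illustration: the names `store_source` and `load_source`, the choice of SHA-256, and the flat one-file-per-hash layout. It uses exclusive file creation (`O_EXCL`) in place of the write lock mentioned above, and it checks the contents when the file already exists so that a mismatch is not silent.

```python
import hashlib
import os


def store_source(store_dir: str, source: str) -> str:
    """Store method source under the hex digest of its contents and return that digest."""
    data = source.encode("utf-8")
    key = hashlib.sha256(data).hexdigest()  # assumed hash; the thread names none
    path = os.path.join(store_dir, key)
    try:
        # O_EXCL makes creation fail if the file already exists, so there is
        # no gap between "check existence" and "create".
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        # Someone already stored this hash: verify instead of rewriting.
        with open(path, "rb") as existing:
            if existing.read() != data:
                raise RuntimeError(f"contents differ for hash {key}")
        return key
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return key


def load_source(store_dir: str, key: str) -> str:
    """Fetch method source by its content hash."""
    with open(os.path.join(store_dir, key), "rb") as f:
        return f.read().decode("utf-8")
```

One known gap in this sketch: a second image that arrives while the first is still writing would read a partial file and report a mismatch. Writing to a temporary name and then linking it into place would close that gap. How reliably exclusive creation behaves on a given network share is not something this sketch establishes.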
In reply to this post by Ben Coman
Ben,
> On Jun 29, 2016, at 9:48 PM, Ben Coman <[hidden email]> wrote:
>
>> On Thu, Jun 30, 2016 at 7:07 AM, David T. Lewis <[hidden email]> wrote:
>>> On Wed, Jun 29, 2016 at 02:00:19PM -0400, David T. Lewis wrote:
>>> Let's not solve the wrong problem folks. I only looked at this for 10
>>> minutes this morning, and I think (but I am not sure) that the issue
>>> affects the case of saving the image, and that the normal writing of
>>> changes is fine.
>>
>> I am wrong.
>>
>> I spent some more time with this, and it is clear that two images saving
>> chunks to the same changes file will result in corrupted change records
>> in the changes file. It is not just an issue related to the image save
>> as I suggested above.
>>
>> In practice, this is not an issue that either Chris or I have noticed,
>> probably because we are not doing software development (saving method
>> changes) at the same time that we are running RemoteTask and similar.
>> But I can certainly see how it might be a problem if, for example, I
>> had a bunch of students running the same image from a network shared
>> folder.
>
> Maybe it's time to consider a fundamental change in how method-sources
> are referred to.

The changes file is not merely the repository for sources of newly minted methods. It is also a log file, a crash recovery mechanism. It is simple. It works. You propose something horribly complex to solve a problem that a) doesn't affect very many people, b) is easy to work around and c) is feasible to fix with a well-known approach. It doesn't wash for me.

> Taking inspiration from git... A content addressable key-value file
> store might solve concurrent access. Each CompiledMethod gets written
> to a file named for the hash of its contents, which is the only
> reference the Image gets to a method's source. Each such file would
> *only* need be written once and thereafter could be read
> simultaneously by multiple Images. Anyone on the network wanting to
> store the same source would see the file already exists and have
> nothing to do.
> Perhaps having many individual files implies abysmal performance.
>
> Or maybe something similar to Mercurial's revlog format [1] could be
> used, one file per class.
>
> The thing about the Image *only* referring to a method's source by its
> content hash would seem to give great flexibility in backends to
> locate/store that source. Possibly...
> * stored as individual files as above
> * bundled in a zip file in random order
> * a school could configure a database server in Image provided to students
> * hashes could be thrown at a service on the Internet
> * cached locally with a key-value database like LMDB [2]
> * remote replication to multiple internet backup locations
> * in an emergency you could throw a bundle of hashes as a query to the
>   mail list and get an adhoc response of individual files.
> * Inter-Smalltalk image communication
>
> Pharo has a stated goal to get rid of the changes file. Changing to
> content-hash-addressable method-source seems a logical step along
> that road. Even if the Squeak community doesn't want to go so far as
> eliminating the .changes file, can they see value in changing method
> source references to be content-hashes rather than indexes into a
> particular file?
>
> [1] http://blog.prasoonshukla.com/mercurial-vs-git-scaling
> [2] https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database
>
>
> Just having a poke at this, it seems a new form of
> CompiledMethodTrailer may need to be defined, being invoked from
> CompiledMethod>>sourceCode. CompiledMethodTrailer>>sourceCode would
> find the source code based on a content-hash held by the
> CompiledMethod. If found, the call to #getSourceFromFile that
> accesses the .changes file will be bypassed, and could remain as a
> backup.
>
> cheers -ben
>
>>
>> Dave
>>
>>
>>>
>>> Max was running on Pharo, which may or may not be handling changes the
>>> same way. I think he may be seeing a different problem from the one I
>>> confirmed.
>>>
>>> So a bit more testing and verification would be in order. I can't look at
>>> it now though.
>>>
>>> Dave
>>>
>>>>
>>>>> On 29-06-2016, at 10:35 AM, Eliot Miranda <[hidden email]>
>>>>> wrote:
>>>> {snip much rant}
>>>>
>>>>> The most obvious place where this is an issue is where two images are
>>>>> using the same changes file and think they're appending. Image A seeks
>>>>> to the end of the file, "writes" stuff. Image B near-simultaneously
>>>>> does the same. Eventually each process gets around to pushing data to
>>>>> hardware. Oops! And let's not dwell too much on the problems possible
>>>>> if either process causes a truncation of the file. Oh, wait, I think we
>>>>> actually had a problem with that some years ago.
>>>>>
>>>>> The thing is that this problem bites even if we have a unitary primitive
>>>>> that both positions and writes if that primitive is written above a
>>>>> substrate that, as unix and stdio streams do, separates positioning from
>>>>> writing. The primitive is neat but it simply drives the problem further
>>>>> underground.
>>>>
>>>>
>>>> Oh absolutely - we only have real control over a small part of it. It
>>>> would probably be worth making use of that where we can.
>>>>
>>>>>
>>>>> A more robust solution might be to position, write, reposition, read,
>>>>> and compare, shortening on corruption, and retrying, using exponential
>>>>> back-off like ethernet packet transmission. Most of the time this adds
>>>>> only the overhead of reading what's written.
>>>>
>>>> Yes, for anything we want reliable that's probably a good way. A limit
>>>> on the number of retries would probably be smart to stop infinite
>>>> recursion. Imagine the fun of an error causing infinite retries of writing
>>>> an error log about an infinite recursion. On an infinitely large Beowulf
>>>> cluster!
>>>>
>>>> It's all yet another example of where software meeting reality leads to
>>>> nightmares.
>>>>
>>>>
>>>> tim
>>>> --
>>>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>>>> If it was easy, the hardware people would take care of it.
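The "position, write, reposition, read, and compare" idea quoted above, together with the suggested cap on retries, could look roughly like the following Python sketch. The function name `append_verified`, the default retry count, the delays and the random jitter are all assumptions added for the example and do not describe how any Squeak or Pharo primitive behaves.

```python
import os
import random
import time


def append_verified(path: str, chunk: bytes,
                    max_retries: int = 5, base_delay: float = 0.01) -> int:
    """Append chunk to an existing file, read it back, and retry on mismatch.

    Returns the offset at which the chunk was written.
    """
    for attempt in range(max_retries):
        with open(path, "r+b") as f:
            f.seek(0, os.SEEK_END)       # position
            start = f.tell()
            f.write(chunk)               # write
            f.flush()
            os.fsync(f.fileno())
            f.seek(start)                # reposition
            if f.read(len(chunk)) == chunk:  # read and compare
                return start
            # Mismatch: shorten the file back to where this attempt began.
            f.truncate(start)
        # Exponential back-off with jitter (the jitter is an assumption of
        # this sketch) so two colliding writers do not retry in lockstep.
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise OSError(f"could not append to {path} after {max_retries} attempts")
```

The bounded loop is there so that a persistent failure surfaces as an error instead of retrying forever. Note that a read-back in the same process may be answered from a cache, so a successful compare is evidence of a clean write, not proof that the bytes reached the disk.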
In reply to this post by Max Leske
It’s nice to see the enthusiasm (both pro and con) on this issue. I just want to clarify that it has nothing to do with a classroom setting, where the changes file is being shared or copied so students have access. I have run into the corrupted .changes file problem myself a couple of times, mainly for two reasons:

a) I’ve done a lot of work but need to check something against code that wasn’t modified (and no, checking package changes in Monticello wouldn’t help in the case I’m thinking of. Imagine for example a huge refactoring across multiple packages). So I open a second copy of the image. I keep both images open because it’s convenient, but at some point I accidentally make a change in the wrong image. Now I’m screwed.

b) I forgot that I already had the image running (e.g. minimised). I start a fresh copy and work on it until I realise that some of my method sources are broken. Again: screwed.

Another thing I want to mention is that the semantics of flush depend on the operating / file system (I have experienced this first hand between Linux (ext4) and OS X (HFS+)). Just because you’ve flushed your buffer doesn’t mean that the contents have actually been written to the file. So while it may be true that there is a #flush missing somewhere, I would not expect that adding the #flush will solve the problem entirely (which is one reason for proposing a locking mechanism in the first place).

Cheers,
Max
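As a rough illustration of what a locking mechanism for the two scenarios above might involve, here is a Python sketch of a lock file placed next to the changes file. The class name `ChangesLock` and the `.lock` suffix are made up for this example; it is not the mechanism proposed earlier in the thread, whose details are not shown here.

```python
import os


class ChangesLock:
    """Advisory lock file that lets a second image notice the first one."""

    def __init__(self, changes_path: str) -> None:
        self.lock_path = changes_path + ".lock"  # hypothetical naming scheme
        self.held = False

    def acquire(self) -> bool:
        """Return True if this process now owns the lock, False if another does."""
        try:
            # O_EXCL: creation fails if the lock file already exists.
            fd = os.open(self.lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for diagnostics
        self.held = True
        return True

    def release(self) -> None:
        """Remove the lock file, but only if this instance created it."""
        if self.held:
            os.remove(self.lock_path)
            self.held = False
```

An image that fails to acquire the lock could warn the user or open its sources read-only. A crashed image would leave a stale lock file behind, so a real implementation would need some way to detect and clear that, which this sketch does not attempt.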