Hi -
[I couldn't find a mailing list dedicated to SqueakSource discussions, so I'm abusing Squeak-dev here. If there is one, please point me to it.]

We've been using SqueakSource at Qwaq for our internal projects and unfortunately it works only so-so. Mostly it works, but every other day it spins out at 100% CPU and has to be killed. Since this usually loses the last checkin(s), it's a major annoyance, which we work around by sending an email message that includes a portion of the log, so that at least you have a chance to see if it was "your" checkin that was just lost.

That was until two days ago. About a week or so ago, we ran out of disk space on the box, and after restoring it the server was working quite well until it again spun out at 100%. After the restart we noticed that we hadn't lost just one but several dozen checkins - basically everything that happened after we ran out of disk space didn't show up.

Since this smelled like a major disaster, I actually dug into the SqueakSource code to see what could be done to restore our data (fortunately, I could see that all of the data was actually on the server). This immediately showed a couple of major issues:

1) When SSFilesystem saves a repository it uses a mutex to serialize access, but it doesn't protect against a client modifying the repository *while* it is saving. Since the save is a process running at background priority, two saves in quick succession will lead to the second save modifying the repository that the first one is still trying to write to disk. And indeed, looking at our problems, many of them show a pattern of two commits close together, like here:

    2007-07-10T21:09:57+00:00 PUT /Qwaq/QwaqForums-1.0.42.mcm (qwaq)
    2007-07-10T21:09:57+00:00 MODIFIED by SSSession>>putRequest:
    2007-07-10T21:09:57+00:00 BEGIN SAVING
    2007-07-10T21:11:03+00:00 PUT /Qwaq/QwaqForums-1.0.41.mcm (qwaq)
    2007-07-10T21:11:03+00:00 MODIFIED by SSSession>>putRequest:

(Note that the "END SAVING" is missing before the second PUT.) So it seems like one of the failure modes is that the repository is being modified *while* it is being saved. In addition, I think that one of the reasons why so many of the saved snapshots are "kaputt" is simply that they are broken by the same concurrent modification.

I'd appreciate some insight from the authors (or anyone else knowledgeable) on what the right fix for this problem might be. I have no idea how Seaside in general deals with these concurrency issues, but it seems pretty clear that SSFilesystem is *not* safe in the face of concurrent modifications of the repository.

2) Much to my surprise I found that SSFilesystem actually *has* code that can be used to recover versions if any of the above happens (SSFilesystem>>importVersionsFor:), but it seems to be pretty much unused and affected by some bit rot. One of the things I did for our version is to hook this code up to the case where the last snapshot is kaputt, so that if there is a broken snapshot SSFilesystem automatically imports all the versions that aren't currently present in the repository. I'm attaching the recovery code in case anyone else has had similar problems.

Question: Does anyone use similar/other changes like those? If so, I'd be interested in learning about them.

3) The speed (and snapshot size) of SSFilesystem is pretty abysmal (on our box a repository snapshot takes about 4 minutes and is about 4MB). Looking at what it's writing, it seems that most of it is information that is easily available from the .MCZs and really doesn't need to be kept in the snapshot.

Question: Is anyone using alternative storage mechanisms (lightweight & fast perhaps)? If so, what do you use and how does it work out? Generally speaking, what *do* people use for SqueakSource storage, given that SSFilesystem is generally quite unreliable?

I'd appreciate any help on the above issues.

Cheers,
  - Andreas

Attachment: SSRecovery.1.cs (2K)
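
To illustrate the concurrent-modification problem from point 1 above, here is a minimal workspace sketch of one possible fix. It is not the SSFilesystem code: `repository`, `put` and `save` are hypothetical names, and a plain OrderedCollection stands in for the real repository. The idea is that every modification and the snapshot copy go through the same mutex, and the slow disk write then works on a private copy that later commits cannot touch.

```smalltalk
"Hypothetical sketch only; none of these names come from SqueakSource."
| mutex repository put save snapshot |
mutex := Semaphore forMutualExclusion.
repository := OrderedCollection new.

"All modifications take the same lock that the save uses."
put := [:versionName | mutex critical: [repository add: versionName]].

"Copy under the lock and answer the private copy for the (slow) writer.
A real repository would need a deep copy, or serialization into an
in-memory buffer, while the lock is held."
save := [mutex critical: [repository copy]].

put value: 'QwaqForums-1.0.42.mcm'.
snapshot := save value.
put value: 'QwaqForums-1.0.41.mcm'.

"The snapshot is unaffected by the commit that arrived after it was taken."
snapshot size.      "1"
repository size.    "2"
```

The trade-off in this sketch is that the copy costs memory and time while the lock is held, but commits are blocked only for the duration of the copy rather than for the whole disk write.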
> Question: Is anyone using alternative storage mechanisms (lightweight &
> fast perhaps)? If so, what do you use and how does it work out?
> Generally speaking, what *do* people use for Squeaksource storage given
> that SSFilesystem is generally quite unreliable?

Are you on the latest version?

www.squeaksource.com saves the image every hour. For that number of objects, ReferenceStreams simply don't work anymore. We host about 30'000 versions with more than 2 GB of data.

The following code is executed from a transcript:

    [ [ Smalltalk saveSession.
        SSRepository storage log: 'IMAGE SAVE' ] fork.
      (Delay forSeconds: 60 * 60) wait ] repeat

In case the image crashes (which hasn't happened for a very long time now), it automatically reloads the missing versions from the hard disk:

    SSRepository current projects do: [ :each |
        SSRepository storage importVersionsFor:
            (SSRepository current projectAt: each id) ]

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
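
The re-import step can be pictured as a set difference between the .mcz files on disk and the versions the restored image already knows about. The sketch below uses plain collections and made-up file names purely for illustration. It makes no claim about how `importVersionsFor:` is actually implemented.

```smalltalk
"Hypothetical data; on a real server these would come from the project
directory and from the repository as it exists in the restored image."
| onDisk known missing |
onDisk := #('Foo-ab.1.mcz' 'Foo-ab.2.mcz' 'Foo-cd.3.mcz').
known := Set withAll: #('Foo-ab.1.mcz').

"Versions present as files but absent from the image need re-importing."
missing := onDisk reject: [:fileName | known includes: fileName].
missing size.    "2"
```

Because the .mcz files are the source of truth here, running such a step twice is harmless: the second pass finds nothing missing.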
In reply to this post by Andreas.Raab
On 8/2/07, Andreas Raab <[hidden email]> wrote:
> Generally speaking, what *do* people use for Squeaksource storage given
> that SSFilesystem is generally quite unreliable?

Just for what it's worth: what we've taken to doing is to always commit to the local filesystem, and then periodically do a 2-way rsync to a directory on a server. That way we're never waiting for network traffic of any kind when we're actually saving/loading/merging, the workflow is identical whether you're on or offline, and everyone has a complete backup of the entire repository at all times.

Avi
In reply to this post by Andreas.Raab
2007/8/3, Andreas Raab <[hidden email]>:
> Question: Is anyone using alternative storage mechanisms (lightweight &
> fast perhaps)? If so, what do you use and how does it work out?
> Generally speaking, what *do* people use for Squeaksource storage given
> that SSFilesystem is generally quite unreliable?

There is a Magma version in the Impara repository. It has also been ported to GemStone/S. I have never used either of them or heard any usage reports.

Cheers
Philippe
In reply to this post by Lukas Renggli
Lukas Renggli wrote:
>> Question: Is anyone using alternative storage mechanisms (lightweight &
>> fast perhaps)? If so, what do you use and how does it work out?
>> Generally speaking, what *do* people use for Squeaksource storage given
>> that SSFilesystem is generally quite unreliable?
>
> Are you on the latest version?

That would have been my question too. Bert did some work on the storage to ensure this stuff doesn't happen, adding auto-recovery if a data file is broken etc.

> www.squeaksource.com saves the image every hour. For that amount of
> objects ReferenceStreams simply don't work anymore. We host about
> 30'000 versions with more than 2 GB of data.

After poking him long enough ;-) Bert also did a Magma backend, but we never got around to trying it out on a server. OK, we chickened out of using it on a production server ;-)

Isn't it time we add a real database backend to SqueakSource? Saving the image isn't a good solution either: one full disk or similar incident and there goes your entire database.

What if all people with real production-level usage pitch in a few bucks and we get someone to write something using Glorp and Postgres? Or at least test the Magma backend on a copy of the squeaksource or Sophie repository?

Michael
In reply to this post by Lukas Renggli
Lukas Renggli wrote:
> Are you on the latest version?

Pretty much so. I skipped some of the tagging stuff but other than that I think we're up to date.

> www.squeaksource.com saves the image every hour. For that amount of
> objects ReferenceStreams simply don't work anymore. We host about
> 30'000 versions with more than 2 GB of data.

That's pretty clever. It definitely avoids all of the issues we are seeing and is also faster and atomic. I'll try this. Out of curiosity, is anyone actively working on SqueakSource to address such issues?

Thanks!
  - Andreas
> seeing and is also faster and atomic. I'll try this. Out of curiosity,
> is anyone actively working on SqueakSource to address such issues?

Bert (I think), Philippe and Adrian have provided (and are providing) many extensions and patches, but otherwise there is no active development. One problem is that SqueakSource runs on a very old version of Seaside.

Not that we haven't had many ideas on how to improve SqueakSource (e.g. a senders/implementors/text search across all the data would be terrific), but the time is lacking. We always planned to have SqueakSource 2 go with Monticello 2, but ...

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
In reply to this post by Michael Rueger-4
Indeed it would be a good idea. Please please do it. I may even motivate the ESUG board to participate. Merging all the fixes and changes would be good too.

> What if all people with real production level usage pitch in a few
> bucks and we got someone to write something using Glorp and
> postgres? Or at least test the magma backend a copy of the
> squeaksource or sophie repository?

Stef
In reply to this post by Michael Rueger-4
Michael Rueger wrote:
> Isn't it time we add a real database backend to squeak source? Saving
> the image isn't a good solution either, one disk full or similar
> incident and there goes your entire database.

Even better is to run SqueakSource inside a real object database server. No filesystem nonsense, no object-relational mapping struggles, full ACID behavior, etc. See http://seaside.gemstone.com/ss/ for an in-production example.

James
> Even better, is to run Squeak Source inside a real object database
> server. No filesystem nonsense, no object-relational mapping struggles,
> full ACID behavior, etc. See http://seaside.gemstone.com/ss/ for an
> in-production example.

One of the design decisions in the initial version was to use the file system for the mcz files. That way, versions could also be served directly by Apache, which was the de facto standard at the time. I agree that there might be better solutions today.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
In reply to this post by jgfoster
James -
Interesting. Where can I find information about platforms, pricing etc.?

Thanks,
  - Andreas
Andreas Raab wrote:
> Interesting. Where can I find information about platforms, pricings etc?

See http://seaside.gemstone.com/docs/Announcement-StS2007.pdf for the Smalltalk Solutions 2007 announcement. The summary is that GemStone/S 64-bit Web Edition will be available in Q3 (by the end of September). The Web Edition is free--even for commercial applications--and is the full commercial version, but with certain limitations, including the following:

- it will only use one machine (all VMs must be on the same host as the database),
- it will only use 1 GB for the shared page cache,
- it will only allow up to 4 GB for the database size, and
- it will only use one CPU.

Overall, this should run a decent-sized SqueakSource repository without much trouble. If you are on Windows, you can use the free VMware Server to host a Linux OS.

We will be presenting more on this at ESUG. Will we see you there?

James