[SqueakSource] SSFilesystem woes

Andreas.Raab
Hi -

[I couldn't find a mailing list dedicated to SqueakSource discussions so
I'm abusing Squeak-dev here. If there is one, please point me to it]

We've been using SqueakSource at Qwaq for our internal projects and
unfortunately it works only so-so. Mostly it works, but every other day
it spins out at 100% CPU and has to be killed.
Since this usually loses the last checkin(s), it's a major annoyance,
which we work around by sending an email message that includes a
portion of the log, so that at least you have a chance to see if it was
"your" checkin that was just lost.

That was until two days ago. About a week or so ago, we ran out of disk
space on the box, and after restoring it the server was working quite
well until it again spun out at 100% CPU. After the restart we noticed
that we had lost not just one but several dozen checkins - basically
everything that happened after we ran out of disk space didn't show up.

Since this smelled like a major disaster, I actually dug into the
SqueakSource code to see what could be done to restore our data
(fortunately, I could see that all of the data was actually on the
server). This immediately showed a couple of major issues:

1) When SSFilesystem saves a repository it uses a mutex to serialize
access, but it doesn't prevent a client from modifying the repository
*while* it is saving. Since the save is a process running at background
priority, two saves in quick succession will lead to the second save
modifying the repository that the first one is still trying to write to
disk. And indeed, looking at our problems, many of them show a pattern
of two commits close together, like here:

2007-07-10T21:09:57+00:00 PUT /Qwaq/QwaqForums-1.0.42.mcm (qwaq)
2007-07-10T21:09:57+00:00 MODIFIED by SSSession>>putRequest:
2007-07-10T21:09:57+00:00 BEGIN SAVING
2007-07-10T21:11:03+00:00 PUT /Qwaq/QwaqForums-1.0.41.mcm (qwaq)
2007-07-10T21:11:03+00:00 MODIFIED by SSSession>>putRequest:

(note that the "END SAVING" is missing before the second put). So it
seems like one of the failure modes is that the repository is being
modified *while* it is being saved. In addition, I think that one of the
reasons why so many of the saved snapshots are "kaputt" is simply that
they are broken by the same concurrent modification.

I'd appreciate some insight from the authors (or anyone else
knowledgeable) on what the right fix for this problem might be. I have
no idea how Seaside in general deals with these concurrency issues, but
it seems pretty clear that SSFilesystem is *not* safe in the face of
concurrent modifications of the repository.
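
One possible direction, sketched as a workspace snippet rather than as
actual SSFilesystem code: copy the repository state while holding the
same lock that every commit takes, and write the copy instead of the
live object. The names lock, repository and snapshot, and the file
names, are made up for illustration, and a plain collection stands in
for the real repository object graph (which would need a deeper copy
than the shallow one shown here).

```smalltalk
"Hypothetical sketch, not SqueakSource API. Assumes that every
 writer goes through the same lock."
| lock repository snapshot |
lock := Semaphore forMutualExclusion.
repository := OrderedCollection new.

"A commit modifies the repository only while holding the lock."
lock critical: [ repository add: 'Example-1.mcz' ].

"The saver copies the state while holding the same lock..."
snapshot := lock critical: [ repository copy ].

"...so a later commit cannot change what is being written."
lock critical: [ repository add: 'Example-2.mcz' ].

"The background process would serialize 'snapshot', never
 'repository'. Here snapshot holds 1 entry, repository 2."
snapshot size
```

The copy is the only work done inside the lock, so commits would be
blocked for the duration of the copy but not for the duration of the
disk write.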

2) Much to my surprise I found that SSFilesystem actually *has* code
that can be used to recover versions if any of the above happens
(SSFilesystem>>importVersionsFor:) but it seems to be pretty much unused
and affected by some bit rot. One of the things I did for our version is
to hook this code up with the case that the last snapshot is kaputt, so
that if there is a broken snapshot SSFilesystem automatically imports
all the versions that aren't currently present in the repository. I'm
attaching the recovery code in case anyone else has had similar problems.

Question: Does anyone use similar/other changes like those? If so I'd be
interested in learning about them.
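
The idea behind the recovery can be sketched as a set difference
between the files on disk and the versions the restored snapshot knows
about. This is only an illustration with made-up data, not the attached
change set; the variable names and file names are hypothetical, and in
practice the two lists would come from the project's directory and from
the repository itself.

```smalltalk
"Hypothetical sketch with made-up file names."
| onDisk known missing |
onDisk := #('Example-1.mcz' 'Example-2.mcz' 'Example-3.mcz').
known := Set withAll: #('Example-1.mcz').

"Files present on disk that the restored snapshot does not list."
missing := onDisk reject: [ :name | known includes: name ].

"Each missing file is a candidate for re-import, for example via
 SSFilesystem>>importVersionsFor: on the owning project."
missing do: [ :name | Transcript show: 'would import ', name; cr ]
```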

3) The speed (and snapshot size) of SSFilesystem is pretty abysmal (on
our box a repository snapshot takes about 4 minutes and is about 4MB).
Looking at what it's writing it seems that most of it is information
that is easily available from the .MCZs and really doesn't need to be
kept in the snapshot.

Question: Is anyone using alternative storage mechanisms (lightweight &
fast perhaps)? If so, what do you use and how does it work out?
Generally speaking, what *do* people use for SqueakSource storage given
that SSFilesystem is generally quite unreliable?

I'd appreciate any help on the above issues.

Cheers,
   - Andreas




Attachment: SSRecovery.1.cs (2K)

Re: [SqueakSource] SSFilesystem woes

Lukas Renggli
> Question: Is anyone using alternative storage mechanisms (lightweight &
> fast perhaps)? If so, what do you use and how does it work out?
> Generally speaking, what *do* people use for Squeaksource storage given
> that SSFilesystem is generally quite unreliable?

Are you on the latest version?

www.squeaksource.com saves the image every hour. For that amount of
objects ReferenceStreams simply don't work anymore. We host about
30'000 versions with more than 2 GB of data.

The following code is executed from a transcript:

[ [ Smalltalk saveSession.
     SSRepository storage log: 'IMAGE SAVE' ] fork.
  (Delay forSeconds: 60 * 60) wait ] repeat

In case the image crashes (which hasn't happened for a very long time
now) it automatically reloads the missing versions from the hard disk.

SSRepository current projects do: [ :each |
  SSRepository storage importVersionsFor: (SSRepository current
    projectAt: each id) ]

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Re: [SqueakSource] SSFilesystem woes

Avi Bryant-2
In reply to this post by Andreas.Raab
On 8/2/07, Andreas Raab <[hidden email]> wrote:

> Generally speaking, what *do* people use for Squeaksource storage given
> that SSFilesystem is generally quite unreliable?

Just for what it's worth: what we've taken to doing is to always
commit to the local filesystem, and then periodically do a 2-way rsync
to a directory on a server.  That way we're never waiting for network
traffic of any kind when we're actually saving/loading/merging, the
workflow is identical whether you're on or offline, and everyone has a
complete backup of the entire repository at all times.

Avi


Re: [SqueakSource] SSFilesystem woes

Philippe Marschall
In reply to this post by Andreas.Raab
2007/8/3, Andreas Raab <[hidden email]>:

> Question: Is anyone using alternative storage mechanisms (lightweight &
> fast perhaps)? If so, what do you use and how does it work out?
> Generally speaking, what *do* people use for Squeaksource storage given
> that SSFilesystem is generally quite unreliable?

There is a Magma version in the Impara repository. It has also been
ported to GemStone/S. I have never used either of them or heard any
usage reports.

Cheers
Philippe



Re: [SqueakSource] SSFilesystem woes

Michael Rueger-4
In reply to this post by Lukas Renggli
Lukas Renggli wrote:
>> Question: Is anyone using alternative storage mechanisms (lightweight &
>> fast perhaps)? If so, what do you use and how does it work out?
>> Generally speaking, what *do* people use for Squeaksource storage given
>> that SSFilesystem is generally quite unreliable?
>
> Are you on the latest version?

would have been my question too. Bert did some work on the storage to
ensure this stuff doesn't happen, adding auto recover if a data file is
broken etc.

> www.squeaksource.com saves the image every hour. For that amount of
> objects ReferenceStreams simply don't work anymore. We host about
> 30'000 versions with more than 2 GB of data.

After poking him long enough ;-) Bert also did a Magma backend, but we
never got around to trying it out on a server. OK, we chickened out of
using it on a production server ;-)

Isn't it time we added a real database backend to SqueakSource? Saving
the image isn't a good solution either: one full disk or similar
incident and there goes your entire database.

What if all people with real production-level usage pitched in a few
bucks and we got someone to write something using Glorp and Postgres? Or
at least tested the Magma backend on a copy of the SqueakSource or
Sophie repository?

Michael


Re: [SqueakSource] SSFilesystem woes

Andreas.Raab
In reply to this post by Lukas Renggli
Lukas Renggli wrote:
> Are you on the latest version?

Pretty much so. I skipped some of the tagging stuff but other than that
I think we're up to date.

> www.squeaksource.com saves the image every hour. For that amount of
> objects ReferenceStreams simply don't work anymore. We host about
> 30'000 versions with more than 2 GB of data.

That's pretty clever. It definitely avoids all of the issues we are
seeing and is also faster and atomic. I'll try this. Out of curiosity,
is anyone actively working on SqueakSource to address such issues?

Thanks!
   - Andreas




Re: [SqueakSource] SSFilesystem woes

Lukas Renggli
> seeing and is also faster and atomic. I'll try this. Out of curiosity,
> is anyone actively working on SqueakSource to address such issues?

Bert (I think), Philippe and Adrian provided (are providing) many
extensions and patches, but otherwise there is no active development.
One problem is that SqueakSource works on a very old version of Seaside.

Not that we didn't have many ideas on how to improve SqueakSource (e.g.
a senders/implementors/text search across all the data would be
terrific), but the time is lacking. We always planned to have
SqueakSource 2 go with Monticello 2, but ...

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Re: [SqueakSource] SSFilesystem woes

stephane ducasse
In reply to this post by Michael Rueger-4
Indeed it would be a good idea. Please please do it. I may even  
motivate the ESUG board to participate.

Merging all the fixes and changes would be good too.

> What if all people with real production level usage pitch in a few  
> bucks and we got someone to write something using Glorp and  
> postgres? Or at least test the magma backend a copy of the  
> squeaksource or sophie repository?

Stef




Re: [SqueakSource] SSFilesystem woes

jgfoster
In reply to this post by Michael Rueger-4
Michael Rueger wrote:
> Isn't it time we add a real database backend to squeak source? Saving
> the image isn't a good solution either, one disk full or similar
> incident and there goes your entire database.
Even better is to run SqueakSource inside a real object database
server. No filesystem nonsense, no object-relational mapping struggles,
full ACID behavior, etc. See http://seaside.gemstone.com/ss/ for an
in-production example.

James


Re: [SqueakSource] SSFilesystem woes

Lukas Renggli
> Even better, is to run Squeak Source inside a real object database
> server. No filesystem nonsense, no object-relational mapping struggles,
> full ACID behavior, etc. See http://seaside.gemstone.com/ss/ for an
> in-production example.

One of the design decisions in the initial version was to use the
file system for the mcz files. This way versions could also be served
directly by Apache, which was the de facto standard at that time. I
agree that there might be better solutions today.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Re: [SqueakSource] SSFilesystem woes

Andreas.Raab
In reply to this post by jgfoster
James -

Interesting. Where can I find information about platforms, pricing etc.?

Thanks,
   - Andreas



Re: [SqueakSource] SSFilesystem woes

jgfoster
Andreas Raab wrote:
> Interesting. Where can I find information about platforms, pricings etc?
See http://seaside.gemstone.com/docs/Announcement-StS2007.pdf for the
Smalltalk Solutions 2007 announcement. The summary is that GemStone/S
64-bit Web Edition will be available in Q3 (by the end of September).
The Web Edition is free--even for commercial applications--and is the
full commercial version, but with certain limitations, including the
following: it will only use one machine (all VMs must be on the same
host as the database), it will only use 1 GB for the shared page cache,
it will only allow up to 4 GB for the database size, and it will only
use one CPU. Overall, this should run a decent-sized SqueakSource
repository without much trouble. If you are on Windows, you can use the
free VMware Server to host a Linux OS.

We will be presenting more on this at ESUG. Will we see you there?

James