Larry,
Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore.

James

On Feb 14, 2012, at 11:17 AM, Lawrence Kellogg wrote:

> Hello James,
>   Yes, I restored into an extent that was already large, I would say. How do I get a clean extent? I don't suppose it is a matter of just deleting the large extent.
>   I guess I could find the extent0 from the distribution and copy it in?
>
> Larry
>
> On Feb 14, 2012, at 2:07 PM, James Foster wrote:
>
>> Larry,
>>
>> Did you start the Staging system with a clean extent ($GEMSTONE/bin/extent0.dbf) before doing the restore? Or did you restore into an extent that was already large?
>>
>> James
>>
>> On Feb 14, 2012, at 10:42 AM, Lawrence Kellogg wrote:
>>
>>> Hello,
>>>   So, I run two separate Amazon instances, one called Production and one called Staging.
>>>   My plan was to back up Production twice a day and load the backups into the Staging environment.
>>>   Well, here are my extent sizes:
>>>
>>>    515899392 Feb 14 18:32 extent0.dbf - Production
>>>   4192206848 Feb 14 18:32 extent0.dbf - Staging
>>>
>>>   Why is the size of the extent in Staging eight times the extent in Production when I loaded a backup from Production into Staging? I'm reading the Admin Guide as fast as I can, but I don't know what is going on.
>>>
>>>   I removed old tranlogs in Staging, got the disk space down to 70%, looked the other way, and it went to 100%. Puzzling. Is there a way to shrink the extent and clean things up?
>>>
>>> Thanks,
>>>
>>> Larry
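[For readers following along, the section 9.5 cycle looks roughly like this at the shell. This is a sketch only: the stone name "seaside" and <dataDir> are placeholders for your own installation; stopstone and startstone are the standard GemStone/S administration scripts.

    stopstone seaside
    rm <dataDir>/extent0.dbf
    cp $GEMSTONE/bin/extent0.dbf <dataDir>/extent0.dbf
    chmod +w <dataDir>/extent0.dbf
    startstone seaside

Then, from a Topaz session, restore the backup with SystemRepository restoreFromBackup: '<backupFile>' and finish with SystemRepository commitRestore. Check the Admin Guide for the exact messages in your version.]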
In reply to this post by Dale Henrichs
Hi Dale,
Thanks for that pointer! Your numbers are quite impressive... even on my local MacBook Pro I'm getting only numbers like the ones below. Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats.

----
PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 29371 ms
Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 2163 ms
Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
Avg page read rate: 924.64 pages/s (1.0815e+00 ms/page read)
----

On 14 Feb 2012, at 19:00, Dale Henrichs wrote:

> Johan,
>
> We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow`, then enter the following two commands at the 'PGSVR>' prompt:
>
>   '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
>   <numpages> testreadrate
>   <numpages in block> <numsamples> testbigreadrate
>
> The `testreadrate` command reads <numpages> random pages from the given extent. The answer you get gives random read performance.
>
> The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
> Here's sample output from one of our desktop boxes on a standard file system (basically reading from the file buffer):
>
> ---------------------------------------------------------------------------------
> % $GEMSTONE/sys/pgsvrslow
> PGSVR>'extent0.dbf' opendbfnolock
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 16 ms
> Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
>
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 4 ms
> Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/page read)
> PGSVR>
> ---------------------------------------------------------------------------------
>
> These commands can be run against the extent for a running stone... but you'll want to get measurements with a variety of configurations...
>
> At the moment we're guessing that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues)... also, are you sure you aren't being throttled by your provider?
>
> Finally, it is worth looking at a copy of the config file for the stone to see if there's anything there...
>
> Dale
>
> ----- Original Message -----
> | From: "Johan Brichau" <[hidden email]>
> | To: "GemStone Seaside beta discussion" <[hidden email]>
> | Sent: Tuesday, February 14, 2012 5:43:58 AM
> | Subject: Re: [GS/SS Beta] slow data page reads?
> |
> | As mentioned in Dale's blog post, I went on to try a raw disk
> | partition for the extent and the tranlogs and got exactly the same
> | results: *very* low disk read speed (see below). Starting GemStone
> | and reading the SPC takes a long time.
> |
> | We are pretty certain the SAN is not overloaded because all other
> | disk operations can reach a lot higher speeds. For example, the
> | copydbf operation from the extent file to the partition reached very
> | good speeds of over 30MB/s.
> |
> | So we are only seeing this issue when GemStone is doing read access
> | on this kind of setup. I have other servers where everything is
> | running smoothly.
> |
> | If anybody has any ideas... that would be cool ;-)
> |
> | Johan
> |
> | Sample read speed during GemStone page read:
> |
> | Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> | sda5     111.60    0.00  37.00  0.00   0.58   0.00     32.00      1.00  26.90  27.01  99.92
> |
> | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> |
> | > Well.. it turns out that we were wrong and we still experience the problem...
> | >
> | > Dale,
> | >
> | > What we are seeing sounds very similar to this:
> | >
> | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> | >
> | > "The issue with the i/o anomalies that we observed in Linux has
> | > not been as easy to resolve. I spent some time tuning GemStone/S
> | > to make sure that GemStone/S wasn't the source of the anomaly.
> | > Finally our IS guy was able to reproduce the anomaly and he ran
> | > into a few other folks on the net that have observed similar
> | > anomalies.
> | >
> | > At this writing we haven't found a solution to the anomaly, but we
> | > are pretty optimistic that it is resolvable. We've seen different
> | > versions of Linux running on similar hardware that doesn't show
> | > the anomaly, so it is either a function of the kernel version or
> | > the settings of some of the kernel parameters. As soon as we
> | > figure it out we'll let you know."
> | >
> | > Do you have more information on this?
> | >
> | > Johan
> | >
> | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> | >
> | >> Hi Johan,
> | >>
> | >> We had a machine hosted on a VPS, with a "state of the art" SAN, with
> | >> similar issues.
> | >> We complained every so often and the service provider
> | >> responded with their inability to control some users on the same VPS
> | >> host doing "extremely heavy" disk io. We got the client off the VPS
> | >> onto a normal machine with a SATA disk and have had joy ever since
> | >> (a 10-20x improvement over the VPS at its best).
> | >>
> | >> I think that the randomness of the reads, thrown on top of other VMs on
> | >> the same host, just caused unpredictable io; so we prefer avoiding VMs.
> | >>
> | >> Alternatively, if it can work for you, put the extents in RAM.
> | >>
> | >> Otto
> | >>
> | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]> wrote:
> | >>
> | >>> Hi all,
> | >>>
> | >>> Never mind my question below: our hosters have identified the
> | >>> problem on their SAN.
> | >>> Strange behavior though...
> | >>>
> | >>> phew ;-)
> | >>> Johan
> | >>>
> | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> | >>>
> | >>>> Hi Gemstoners,
> | >>>>
> | >>>> Is there any condition (other than a slow filesystem) that would
> | >>>> trigger slow page reads when a gem needs to hit disk and load objects?
> | >>>>
> | >>>> Here is the problem I'm trying to chase: a Seaside gem is
> | >>>> processing a request and (according to the statmonitor output)
> | >>>> ends up requesting pages. The page read goes terribly slowly
> | >>>> (takes approx. 50s) and I see only 5 to 15 pages per
> | >>>> second being read during that time period. There is no other
> | >>>> activity at that moment and I'm puzzled by why the read goes so
> | >>>> slow (other than a slow filesystem -- see next).
> | >>>>
> | >>>> Because the iostat system monitoring also shows the same low
> | >>>> read speed and indicates a 100% disk util statistic, my obvious
> | >>>> first impression was that the disk is saturated and we have a
> | >>>> datastore problem.
> | >>>> However, the disk read speed proves to be
> | >>>> good when I'm doing other disk activity outside of GemStone.
> | >>>> Moreover, the _write_ speed is very good at all times.
> | >>>>
> | >>>> So, I'm currently trying to chase something that only triggers
> | >>>> slow page read speed from a GemStone topaz session.
> | >>>>
> | >>>> GEM_IO_LIMIT is set at its default setting of 5000
> | >>>>
> | >>>> For illustration, these are some io stats when GemStone
> | >>>> is doing read access:
> | >>>>
> | >>>> Time: 06:40:21 PM
> | >>>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> | >>>> sda3       0.00    0.20  6.00  0.40   0.09   0.00     30.75      1.00  166.88 156.00  99.84
> | >>>>
> | >>>> Time: 06:40:26 PM
> | >>>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> | >>>> sda3       0.00    0.20  8.20  0.40   0.13   0.00     31.07      1.05  119.91 115.72  99.52
> | >>>>
> | >>>> Time: 06:40:31 PM
> | >>>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> | >>>> sda3       0.00    0.20  5.99  0.40   0.09   0.00     30.75      1.01  157.75 156.25  99.80
In reply to this post by James Foster-8
On Feb 14, 2012, at 2:21 PM, James Foster wrote:

> Larry,
>
> Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore.

Thanks, James. Clearly, I'm no GemStone admin. ;-) I was assuming that the extent would assume the size of the backup; a bad assumption, I see…

Larry
On Feb 14, 2012, at 2:43 PM, Lawrence Kellogg wrote:

> On Feb 14, 2012, at 2:21 PM, James Foster wrote:
>
>> Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore.

James,

Ok, I was successful in copying over the original extent and applying my backup on the Staging system. Things are much better now.

Now, I want to do the same thing for Production, where I also have an inflated repository from some backups I loaded there.

I just want to start from scratch from a backup, ignoring all the tranlogs, some of which are huge. In a scramble for disk space over there, I deleted some old tranlogs. I know, I know, stupid.

How do I start over? I don't have a lot of user traffic yet, so now seems like the time. Given Seaside, how do I make sure there are no users hitting the server so I can make my backup and restore it? Should I kill nginx?

I can't seem to find a command to force off all current sessions and then suspend login.

Larry
Larry,
As long as you have a good production system, the loss of previous transaction logs is not really a problem. They are only necessary for restoring from a crash.

To shrink your production repository, you want to make a backup of an otherwise inactive system (i.e., all other user sessions logged off). To do this, log in to a new session (Topaz, Jade, GemTools, etc.) and evaluate the following:

    System stopUserSessions.

Then proceed with a full backup, shutdown, copy in a new extent, start the stone, restore the backup, commit the restore, and then start up the user sessions.

You might check the size before and after:

    SystemRepository fileSizeReport.

-James

On Feb 14, 2012, at 1:58 PM, Lawrence Kellogg wrote:

> James,
>
> Ok, I was successful in copying over the original extent and applying my backup on the Staging system. Things are much better now.
>
> Now, I want to do the same thing for Production, where I also have an inflated repository from some backups I loaded there.
>
> I just want to start from scratch from a backup, ignoring all the tranlogs, some of which are huge. In a scramble for disk space over there, I deleted some old tranlogs. I know, I know, stupid.
>
> How do I start over? I don't have a lot of user traffic yet, so now seems like the time. Given Seaside, how do I make sure there are no users hitting the server so I can make my backup and restore it? Should I kill nginx?
>
> I can't seem to find a command to force off all current sessions and then suspend login.
>
> Larry
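[Put together, the cycle James describes might look like this in Topaz. This is a sketch, not a transcript: the backup path is a placeholder, and fullBackupTo:, restoreFromBackup:, and commitRestore are the standard SystemRepository messages -- see the Admin Guide for the exact forms in your version.

    topaz 1> printit
    System stopUserSessions.
    SystemRepository fullBackupTo: '/opt/gemstone/backups/prod.dbf'.
    %

    (now stop the stone, copy in a clean $GEMSTONE/bin/extent0.dbf, and restart the stone)

    topaz 1> printit
    SystemRepository restoreFromBackup: '/opt/gemstone/backups/prod.dbf'.
    %
    topaz 1> printit
    SystemRepository commitRestore.
    %
]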
On Feb 14, 2012, at 5:14 PM, James Foster wrote:

> Larry,
>
> As long as you have a good production system, the loss of previous transaction logs is not really a problem. They are only necessary for restoring from a crash.
>
> To shrink your production repository, you want to make a backup of an otherwise inactive system (i.e., all other user sessions logged off). To do this, log in to a new session (Topaz, Jade, GemTools, etc.) and evaluate the following:
>
>     System stopUserSessions.

Thanks, the funny thing is that it still shows me some user sessions after doing that… maybe it takes a minute for them to go away…???

topaz 1> printit
System stopUserSessions.
%
System class
  superClass            Object class
  format                32
  instVars              0
  instVarNames          an Array
  constraints           an Array
  classVars             a SymbolDictionary
  methodDict            a GsMethodDictionary
  poolDictionaries      an Array
  categories            a GsMethodDictionary
  secondarySuperclasses nil
  name                  System
  classHistory          a ClassHistory
  description           a GsClassDocumentation
  migrationDestination  nil
  timeStamp             a DateTime
  userId                SystemUser
  extraDict             a SymbolDictionary
  classCategory         nil
  subclasses            nil

topaz 1> printit
System currentSessionNames.
%

session number: 2    UserId: GcUser
session number: 3    UserId: SymbolUser
session number: 4    UserId: GcUser
session number: 6    UserId: DataCurator
topaz 1>

> Then proceed with a full backup, shutdown, copy in a new extent, start the stone, restore the backup, commit the restore, and then start up the user sessions.
>
> You might check the size before and after:
>
>     SystemRepository fileSizeReport.
>
> -James
Good observation and it is smart to check. In this case, though, it is okay. We make a distinction between "user" sessions and "system" sessions. The two GcUsers are doing background garbage collection and the SymbolUser is available to create Symbols if needed. These are built-in users and activities. While they can be stopped, it is rare to need to do so. The remaining one is you, DataCurator. You may proceed with the backup with them logged in...
On Feb 14, 2012, at 2:25 PM, Lawrence Kellogg wrote:

> Thanks, the funny thing is that it still shows me some user sessions after doing that… maybe it takes a minute for them to go away…???
>
> session number: 2    UserId: GcUser
> session number: 3    UserId: SymbolUser
> session number: 4    UserId: GcUser
> session number: 6    UserId: DataCurator
On Feb 14, 2012, at 5:28 PM, James Foster wrote: > Good observation and it is smart to check. In this case, though, it is okay. We make a distinction between "user" sessions and "system" sessions. The two GcUsers are doing background garbage collection and the SymbolUser is available to create Symbols if needed. These are built-in users and activities. While they can be stopped, it is rare to need to do so. The remaining one is you, DataCurator. You may proceed with the backup with them logged in… > James, Whew. I did it but have a few questions. First, why did my Free Space go to 25 meg?? Do I have to manually add an extent? SystemRepository fileSizeReport % Extent #1 ----------- Filename = !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf File size = 492.00 Megabytes Space available = 279.48 Megabytes Totals ------ Repository size = 492.00 Megabytes Free Space = 279.48 Megabytes topaz 1> AFTER SystemRepository fileSizeReport % Extent #1 ----------- Filename = !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf File size = 218.00 Megabytes Space available = 25.88 Megabytes Totals ------ Repository size = 218.00 Megabytes Free Space = 25.88 Megabytes topaz 1> Also, I guess this is nil because there are no new transactions after the restore, but I am not sure… topaz 1> printit SystemRepository restoreStatusOldestFileId % nil topaz Larry > On Feb 14, 2012, at 2:25 PM, Lawrence Kellogg wrote: > >> >> On Feb 14, 2012, at 5:14 PM, James Foster wrote: >> >>> Larry, >>> >>> As long as your have a good production system, then the loss of previous transaction logs is not really a problem. They are only necessary for restoring from a crash. >>> >>> To shrink your production repository, you want to make a backup of an otherwise inactive system (i.e., all other user sessions logged off). To do this, log in to a new session (Topaz, Jade, GemTools, etc.) 
and evaluate the following: >>> System stopUserSessions. >> >> Thanks, the funny thing is that it still shows me some user sessions after doing that….maybe it takes a >> minute for them to go away….??? >> >> topaz 1> printit >> System stopUserSessions. >> % >> System class >> superClass Object class >> format 32 >> instVars 0 >> instVarNames an Array >> constraints an Array >> classVars a SymbolDictionary >> methodDict a GsMethodDictionary >> poolDictionaries an Array >> categories a GsMethodDictionary >> secondarySuperclasses nil >> name System >> classHistory a ClassHistory >> description a GsClassDocumentation >> migrationDestination nil >> timeStamp a DateTime >> userId SystemUser >> extraDict a SymbolDictionary >> classCategory nil >> subclasses nil >> >> topaz 1> printit >> System currentSessionNames. >> % >> >> session number: 2 UserId: GcUser >> session number: 3 UserId: SymbolUser >> session number: 4 UserId: GcUser >> session number: 6 UserId: DataCurator >> topaz 1> >> >>> Then proceed with a full backup, shutdown, copy in a new extent, start the stone, restore the backup, commit the restore, and then start up the user sessions. >>> >>> You might check the size before and after: >>> SystemRepository fileSizeReport. >>> >>> -James >>> >>> >>> On Feb 14, 2012, at 1:58 PM, Lawrence Kellogg wrote: >>> >>>> >>>> On Feb 14, 2012, at 2:43 PM, Lawrence Kellogg wrote: >>>> >>>>> >>>>> >>>>> On Feb 14, 2012, at 2:21 PM, James Foster wrote: >>>>> >>>>>> Larry, >>>>>> >>>>>> Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore. >>>>>> >>>>> >>>>> >>>> >>>> James, >>>> >>>> Ok, I was successful in copying over the original extent, and applying my backup on the Staging system. Things are much better now. 
>>>> >>>> Now, I want to do the same thing for production where I also have an inflated repository from some backups I loaded there. >>>> >>>> I just want to start from scratch from a backup, ignoring all the tranlogs, some of which are huge. In a scrabble for >>>> disk space over there, I deleted some old tranelogs. I know, I know, stupid. >>>> >>>> How do I start over? I don't have a lot of user traffic yet so now seems like the time. Given Seaside, how do >>>> I make sure there are no users hitting the server so I can make my backup and restore it? Should I >>>> kill nginx? >>>> >>>> I can't seem to find a command to force off all current sessions and then suspend login. >>>> >>>> Larry >>>> >>>> >>>> >>>>> >>>>>> James >>>>>> >>>>>> On Feb 14, 2012, at 11:17 AM, Lawrence Kellogg wrote: >>>>>> >>>>>>> Hello James, >>>>>>> Yes, I restored into an extent that was already large, I would say. How do I get a clean extent? I don't suppose it is a matter of just deleting the large extent. >>>>>>> I guess I could find the extent0 from the distribution and copy it in? >>>>>>> >>>>>>> Larry >>>>>>> >>>>>>> >>>>>>> On Feb 14, 2012, at 2:07 PM, James Foster wrote: >>>>>>> >>>>>>>> Larry, >>>>>>>> >>>>>>>> Did you start the Staging system with a clean extent ($GEMSTONE/bin/extent0.dbf) before doing the restore? Or did you restore into an extent that was already large? >>>>>>>> >>>>>>>> James >>>>>>>> >>>>>>>> On Feb 14, 2012, at 10:42 AM, Lawrence Kellogg wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> So, I run two separate Amazon instances, once called Production, and one called Staging. >>>>>>>>> My plan was to back up production twice a day and load the backups into the Staging environment. >>>>>>>>> Well, here are my extent sizes: >>>>>>>>> >>>>>>>>> 515899392 Feb 14 18:32 extent0.dbf - Production >>>>>>>>> 4192206848 Feb 14 18:32 extent0.dbf - Staging >>>>>>>>> >>>>>>>>> Why is the size of the extent in staging eight! 
times the extent in Production when I loaded a backup from Production >>>>>>>>> into Staging? I'm reading the Admin guide as fast as I can but I don't know what is going on. >>>>>>>>> >>>>>>>>> I removedbf old tranlogs in Staging, get the disk space down to 70%, look the other way, and it >>>>>>>>> goes to 100%. Puzzling. Is there a way to shrink the extent and clean things up? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Larry >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > |
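For a quick sanity check on the "eight times" figure in this thread, the two extent sizes from the ls output can be compared directly in the shell:

```shell
# Ratio of the two extent sizes reported earlier in the thread.
staging=4192206848      # Staging extent0.dbf, bytes
production=515899392    # Production extent0.dbf, bytes
awk -v s="$staging" -v p="$production" \
    'BEGIN { printf "Staging is %.1fx Production\n", s/p }'
# -> Staging is 8.1x Production
```

So "eight times" is right on the money; as James explains, the inflation comes from restoring into an extent that was already large, since an extent does not shrink on its own.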
On Feb 14, 2012, at 3:11 PM, Lawrence Kellogg wrote: > > On Feb 14, 2012, at 5:28 PM, James Foster wrote: > >> Good observation and it is smart to check. In this case, though, it is okay. We make a distinction between "user" sessions and "system" sessions. The two GcUsers are doing background garbage collection and the SymbolUser is available to create Symbols if needed. These are built-in users and activities. While they can be stopped, it is rare to need to do so. The remaining one is you, DataCurator. You may proceed with the backup with them logged in… >> > > James, > Whew. I did it but have a few questions. > > First, why did my Free Space go to 25 meg?? Do I have to manually add an extent? Just as doing a restore in a big database will leave lots of free space, doing a restore in a small database will leave a little bit of free space (imagine the object tree that existed moments before you did the commitRestore). Also, it is possible that there was some garbage collection in process at the time of the backup and it finished after the restore. I'm not certain of either of these theories, but ~12% free space is not too big a deal. With luck, your business will have reason to consume that space soon! 
> SystemRepository fileSizeReport > % > Extent #1 > ----------- > Filename = !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf > > File size = 492.00 Megabytes > Space available = 279.48 Megabytes > > Totals > ------ > Repository size = 492.00 Megabytes > Free Space = 279.48 Megabytes > > topaz 1> > > > AFTER > > SystemRepository fileSizeReport > % > Extent #1 > ----------- > Filename = !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf > > File size = 218.00 Megabytes > Space available = 25.88 Megabytes > > Totals > ------ > Repository size = 218.00 Megabytes > Free Space = 25.88 Megabytes > > topaz 1> > > > Also, I guess this is nil because there are no new transactions after the restore, but I am not sure… Once you commit the restore, then there are never any more transaction logs to apply. > topaz 1> printit > SystemRepository restoreStatusOldestFileId > % > nil > topaz > > > Larry > > >> On Feb 14, 2012, at 2:25 PM, Lawrence Kellogg wrote: >> >>> >>> On Feb 14, 2012, at 5:14 PM, James Foster wrote: >>> >>>> Larry, >>>> >>>> As long as your have a good production system, then the loss of previous transaction logs is not really a problem. They are only necessary for restoring from a crash. >>>> >>>> To shrink your production repository, you want to make a backup of an otherwise inactive system (i.e., all other user sessions logged off). To do this, log in to a new session (Topaz, Jade, GemTools, etc.) and evaluate the following: >>>> System stopUserSessions. >>> >>> Thanks, the funny thing is that it still shows me some user sessions after doing that….maybe it takes a >>> minute for them to go away….??? >>> >>> topaz 1> printit >>> System stopUserSessions. 
>>> % >>> System class >>> superClass Object class >>> format 32 >>> instVars 0 >>> instVarNames an Array >>> constraints an Array >>> classVars a SymbolDictionary >>> methodDict a GsMethodDictionary >>> poolDictionaries an Array >>> categories a GsMethodDictionary >>> secondarySuperclasses nil >>> name System >>> classHistory a ClassHistory >>> description a GsClassDocumentation >>> migrationDestination nil >>> timeStamp a DateTime >>> userId SystemUser >>> extraDict a SymbolDictionary >>> classCategory nil >>> subclasses nil >>> >>> topaz 1> printit >>> System currentSessionNames. >>> % >>> >>> session number: 2 UserId: GcUser >>> session number: 3 UserId: SymbolUser >>> session number: 4 UserId: GcUser >>> session number: 6 UserId: DataCurator >>> topaz 1> >>> >>>> Then proceed with a full backup, shutdown, copy in a new extent, start the stone, restore the backup, commit the restore, and then start up the user sessions. >>>> >>>> You might check the size before and after: >>>> SystemRepository fileSizeReport. >>>> >>>> -James >>>> >>>> >>>> On Feb 14, 2012, at 1:58 PM, Lawrence Kellogg wrote: >>>> >>>>> >>>>> On Feb 14, 2012, at 2:43 PM, Lawrence Kellogg wrote: >>>>> >>>>>> >>>>>> >>>>>> On Feb 14, 2012, at 2:21 PM, James Foster wrote: >>>>>> >>>>>>> Larry, >>>>>>> >>>>>>> Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore. >>>>>>> >>>>>> >>>>>> >>>>> >>>>> James, >>>>> >>>>> Ok, I was successful in copying over the original extent, and applying my backup on the Staging system. Things are much better now. >>>>> >>>>> Now, I want to do the same thing for production where I also have an inflated repository from some backups I loaded there. >>>>> >>>>> I just want to start from scratch from a backup, ignoring all the tranlogs, some of which are huge. 
In a scrabble for >>>>> disk space over there, I deleted some old tranelogs. I know, I know, stupid. >>>>> >>>>> How do I start over? I don't have a lot of user traffic yet so now seems like the time. Given Seaside, how do >>>>> I make sure there are no users hitting the server so I can make my backup and restore it? Should I >>>>> kill nginx? >>>>> >>>>> I can't seem to find a command to force off all current sessions and then suspend login. >>>>> >>>>> Larry >>>>> >>>>> >>>>> >>>>>> >>>>>>> James >>>>>>> >>>>>>> On Feb 14, 2012, at 11:17 AM, Lawrence Kellogg wrote: >>>>>>> >>>>>>>> Hello James, >>>>>>>> Yes, I restored into an extent that was already large, I would say. How do I get a clean extent? I don't suppose it is a matter of just deleting the large extent. >>>>>>>> I guess I could find the extent0 from the distribution and copy it in? >>>>>>>> >>>>>>>> Larry >>>>>>>> >>>>>>>> >>>>>>>> On Feb 14, 2012, at 2:07 PM, James Foster wrote: >>>>>>>> >>>>>>>>> Larry, >>>>>>>>> >>>>>>>>> Did you start the Staging system with a clean extent ($GEMSTONE/bin/extent0.dbf) before doing the restore? Or did you restore into an extent that was already large? >>>>>>>>> >>>>>>>>> James >>>>>>>>> >>>>>>>>> On Feb 14, 2012, at 10:42 AM, Lawrence Kellogg wrote: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> So, I run two separate Amazon instances, once called Production, and one called Staging. >>>>>>>>>> My plan was to back up production twice a day and load the backups into the Staging environment. >>>>>>>>>> Well, here are my extent sizes: >>>>>>>>>> >>>>>>>>>> 515899392 Feb 14 18:32 extent0.dbf - Production >>>>>>>>>> 4192206848 Feb 14 18:32 extent0.dbf - Staging >>>>>>>>>> >>>>>>>>>> Why is the size of the extent in staging eight! times the extent in Production when I loaded a backup from Production >>>>>>>>>> into Staging? I'm reading the Admin guide as fast as I can but I don't know what is going on. 
>>>>>>>>>> >>>>>>>>>> I removedbf old tranlogs in Staging, get the disk space down to 70%, look the other way, and it >>>>>>>>>> goes to 100%. Puzzling. Is there a way to shrink the extent and clean things up? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Larry >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > |
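James's "~12% free space" can be checked against the AFTER fileSizeReport in this exchange (25.88 MB free of 218 MB):

```shell
# Free-space percentage from the post-restore fileSizeReport.
awk 'BEGIN { printf "%.0f%% free\n", 100 * 25.88 / 218 }'
# -> 12% free
```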
On Feb 14, 2012, at 8:51 PM, James Foster wrote:
Ha ha, business! I certainly hope my project leads to some income, but I keep my expectations low; that way, I'm rarely disappointed. ;-) So, I read a whole bunch more and found this: "STN_FREE_SPACE_THRESHOLD sets the minimum amount of free space (in MB) to be available in the repository. If the Stone cannot maintain this level by growing an extent, it begins actions to prevent shutdown of the system; for information, see “Repository Full” on page 190." So, to answer my own question (correctly, I hope): the repository will grow an extent as long as there is available disk space. Sounds cool to me. I'm sorry I'm so hopeless with these admin tasks; I never administered a Gemstone system in my previous jobs, I just wrote domain code. That was hard enough.
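For reference, the setting quoted above lives in the stone's configuration file (system.conf in this thread's layout). A sketch of the relevant fragment — the option names are real GemStone config parameters, but the values shown are illustrative assumptions, not recommendations:

```
# system.conf fragment (illustrative values only)
STN_FREE_SPACE_THRESHOLD = 1;                     # MB of free space the stone tries to maintain
DBF_EXTENT_NAMES = $GEMSTONE_DATADIR/extent0.dbf;
DBF_EXTENT_SIZES = ;                              # left empty, the extent grows as needed
```

If DBF_EXTENT_SIZES caps the extent instead, the stone cannot grow it and the "Repository Full" handling kicks in sooner.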
Ok, that makes sense. I'm still kind of confused about how to handle transaction logs going forward, though. How do I know which ones I can safely delete? When? Is it overkill to do two full backups a day and keep rotating them to my Staging environment?
On Feb 14, 2012, at 6:38 PM, Lawrence Kellogg wrote:
> I'm still kind of confused about how to handle transaction logs going forward, though. > How do I know which ones I can safely delete? When? Is it overkill to do two full backups a day and > keep rotating them to my Staging environment? The general policy for backups and transaction logs is very much the same for GemStone as it is for any other database system. You need to keep enough redundancy to recover from some statistical risk of loss. A typical rule is "no single point of failure" which means two of everything. The standard recommendation is to have at least three disks: one for the OS, swap space, and backups; one for the extents; and one for the transaction logs. (In addition to providing duplication, this helps performance since it allows the transaction logs to be written without competition from other activity.) Ideally, each of these would be on redundant disks (e.g., RAID). If the system crashes without any disk failures (e.g., power loss), then on restart GemStone recognizes that things were not shut down cleanly and it replays the transactions since the last checkpoint. If there is a failure of the OS disk, then when it is repaired GemStone will recover just as if it were another OS crash. On restart you should take a current backup (or two!). If there is a failure of the extent disk, then when it is replaced you restore the backup and replay the transaction logs. If there is a failure of the transaction disk, then the system will pause until you provide a new disk for transactions (which can be done while the system is up). In any case, it is generally a good idea to keep two backups and the transaction logs going back to the beginning of the earlier backup. If you want to keep a "warm standby" then you have a second system in restore mode and each time you finish a transaction log on the production system you restore it to the standby system. 
I know of at least one customer who has the standby system in a second data center and transfers a log every 15 minutes. (You don't have to wait for it to be full; you can explicitly start a new log.) How this plays into Amazon, I'm not sure. I assume that you get only one disk, and it is on a shared SAN (and can have performance problems!), but almost certainly has RAID. You have the option of specifying different data centers for different instances. |
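Part of the retention policy James describes ("keep two backups and the transaction logs going back to the beginning of the earlier backup") can be sketched in the shell. The file naming below is an assumption for illustration, and tranlog pruning — which must key off the starting log of the older backup — is deliberately left out:

```shell
# Keep only the two newest full backups in a directory (assumed naming scheme).
prune_backups() {
  ls -1t "$1"/backup-*.dbf 2>/dev/null | tail -n +3 | xargs -r rm -f
}

# Demo on a scratch directory:
d=$(mktemp -d)
for i in 1 2 3 4; do touch "$d/backup-$i.dbf"; done
prune_backups "$d"
ls "$d" | wc -l    # -> 2
rm -rf "$d"
```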
If this is true:
[seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF /opt/gemstone/product/seaside/data/system.conf Why does Gemstone insist on trying to start the stone in the other directory, the one with no extent0.dbf??? I'm too tired to figure this out now. I've started and stopped this thing dozens of times and can't figure out why this is failing. Luckily it's just Staging so it can wait. I had shut down the stone to restore a backup and then replay log files… but I don't understand this error… Larry [seasideuser@ip-10-191-194-75 data]$ startstone seaside startstone[Info]: GemStone version '2.4.4.1' startstone[Info]: Starting Stone repository monitor "seaside". startstone[Info]: GEMSTONE is: "/opt/gemstone/product". startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so startstone[Info]: GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf <<<<<<<<<<<<<<<<???????????????? GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<???????????????? startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'. startstone[Error]: Stone process (id=14170) has died. startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information. Excerpt follows: configuredSize 1000 MB Directory 1: configured name $GEMSTONE_DATADIR/ expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/ configuredSize 1000 MB ------------------------------------------------------- GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf DBF Op: Open; DBF Record: -1; Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied) An error occurred opening the repository for exclusive access.
Stone startup has failed. |
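One part of the mystery has a mundane explanation: /opt/gemstone/product is a symbolic link to the GemStone64Bit2.4.4.1-x86_64.Linux tree (the ls -alF output later in the thread confirms this), so startstone is not using a different directory at all — it simply prints the resolved path. A small demonstration of the effect on a scratch directory:

```shell
# "product" as a symlink: both spellings name the same directory.
d=$(mktemp -d)
mkdir "$d/GemStone64Bit2.4.4.1-x86_64.Linux"
ln -s "$d/GemStone64Bit2.4.4.1-x86_64.Linux" "$d/product"
[ "$(readlink -f "$d/product")" = "$(readlink -f "$d/GemStone64Bit2.4.4.1-x86_64.Linux")" ] \
  && echo "same directory"    # -> same directory
rm -rf "$d"
```

The actual failure is the EACCES at the bottom of the log, not the config path.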
Larry,
What do you get from 'ls -alF /opt/gemstone'? What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file! Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff. -James On Feb 14, 2012, at 8:23 PM, Lawrence Kellogg wrote: > If this is true: > > [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF > /opt/gemstone/product/seaside/data/system.conf > > > why does Gemstone insist in trying to start the stone in the other directory, the one with no extent0.dbf??? > > I'm too tired to figure this out now. I've started and stopped this thing dozens of times and > can't figure out why this is failing. :luckily it's just Staging so it can wait. I had shut down the stone > to restore a backup and then reply log files….but I don't understand this error… > > Larry > > > > [seasideuser@ip-10-191-194-75 data]$ startstone seaside > startstone[Info]: GemStone version '2.4.4.1' > startstone[Info]: Starting Stone repository monitor "seaside". > startstone[Info]: GEMSTONE is: "/opt/gemstone/product". > startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so > startstone[Info]: > GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf <<<<<<<<<<<<<<<<???????????????? > GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<???????????????? > startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'. > > startstone[Error]: Stone process (id=14170) has died. 
> startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information. Excerpt follows: > configuredSize 1000 MB > Directory 1: > configured name $GEMSTONE_DATADIR/ > expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/ > configuredSize 1000 MB > ------------------------------------------------------- > > GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf > reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf > DBF Op: Open; DBF Record: -1; > Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied) > > An error occurred opening the repository for exclusive access. > > Stone startup has failed. > |
On Feb 14, 2012, at 11:40 PM, James Foster wrote: > Larry, > > What do you get from 'ls -alF /opt/gemstone'? What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file! > [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone total 28 drwxrwx--- 5 seasideuser seasideuser 4096 Oct 12 19:18 ./ drwxr-xr-x 4 root root 4096 Oct 12 19:18 ../ drwxr-xr-x 17 seasideuser seasideuser 4096 Oct 13 02:15 GemStone64Bit2.4.4.1-x86_64.Linux/ drwxrwx--- 2 seasideuser seasideuser 4096 Feb 15 04:18 locks/ drwxrwxrwx 3 seasideuser seasideuser 12288 Feb 15 04:05 log/ lrwxrwxrwx 1 seasideuser seasideuser 47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/ [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone/product/seaside/data total 47144 drwxrwxr-x 4 seasideuser seasideuser 4096 Feb 15 04:26 ./ drwxrwxr-x 9 seasideuser seasideuser 4096 Jul 13 2010 ../ drwxrwxr-x 2 seasideuser seasideuser 4096 Feb 15 02:41 backups/ -rw------- 1 root root 14680064 Feb 15 03:40 extent0.dbf -rw-r--r-- 1 seasideuser seasideuser 229 Jul 13 2010 gem.conf drwxr-xr-x 2 root root 4096 Feb 15 03:40 old/ -rw-r--r-- 1 seasideuser seasideuser 478 Jul 13 2010 system.conf -rw-rw-r-- 1 seasideuser seasideuser 35840 Feb 14 20:56 tranlog1.dbf -rw-rw-r-- 1 seasideuser seasideuser 4320256 Feb 15 03:39 tranlog2.dbf -rw-rw-r-- 1 seasideuser seasideuser 753152 Feb 15 03:24 tranlog5.dbf -rw-rw-r-- 1 seasideuser seasideuser 10240 Feb 15 03:24 tranlog6.dbf -rw-rw-r-- 1 seasideuser seasideuser 28443136 Feb 15 03:27 tranlog7.dbf [seasideuser@ip-10-191-194-75 ~]$ This seems right to me, unless I'm missing something. Is the symbolic link for product correct? I see gem.conf and system.conf files. I swear I only copied in the new extent0 but perhaps I somehow deleted the config file. 
I did a ". defSeaside" and that runs ok. Just when I think I understand how it all works, I run into these sorts of issues that leave me stymied. I will migrate to 2.4.5 when I get a few spare moments. It's not easy being the admin/programmer/designer/customer service rep/marketeer, but it sure is a lot of fun. Larry > Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff. > > -James > > On Feb 14, 2012, at 8:23 PM, Lawrence Kellogg wrote: > >> If this is true: >> >> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF >> /opt/gemstone/product/seaside/data/system.conf >> >> >> why does Gemstone insist in trying to start the stone in the other directory, the one with no extent0.dbf??? >> >> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and >> can't figure out why this is failing. :luckily it's just Staging so it can wait. I had shut down the stone >> to restore a backup and then reply log files….but I don't understand this error… >> >> Larry >> >> >> >> [seasideuser@ip-10-191-194-75 data]$ startstone seaside >> startstone[Info]: GemStone version '2.4.4.1' >> startstone[Info]: Starting Stone repository monitor "seaside". >> startstone[Info]: GEMSTONE is: "/opt/gemstone/product". >> startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so >> startstone[Info]: >> GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf <<<<<<<<<<<<<<<<???????????????? >> GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<???????????????? >> startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'. >> >> startstone[Error]: Stone process (id=14170) has died. 
>> startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information. Excerpt follows: >> configuredSize 1000 MB >> Directory 1: >> configured name $GEMSTONE_DATADIR/ >> expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/ >> configuredSize 1000 MB >> ------------------------------------------------------- >> >> GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf >> reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf >> DBF Op: Open; DBF Record: -1; >> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied) >> >> An error occurred opening the repository for exclusive access. >> >> Stone startup has failed. >> > |
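The ls -alF listing above shows the actual cause of the startstone failure: extent0.dbf is owned by root with mode 600 (-rw-------), so seasideuser gets exactly the EACCES reported in the stone log — presumably the clean extent was copied in as root. The fix would be a chown to the stone's user (run as root). A tiny pre-flight check of the kind that catches this, with a scratch file standing in for the extent path:

```shell
# Verify the current user can open a file read-write before starting the stone.
extent=$(mktemp)    # stand-in for .../seaside/data/extent0.dbf
chmod 600 "$extent"
if [ -r "$extent" ] && [ -w "$extent" ]; then
  echo "extent accessible"    # we own the scratch file, so this branch runs
else
  echo "EACCES likely: check owner (chown) and mode (chmod 600)"
fi
rm -f "$extent"
```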
As another reference, I have two Gemstone 2.4.4.1 servers.
The first runs on an EC2 micro-instance (613 MB):
$ $GEMSTONE/sys/pgsvrslow '/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock PGSVR>10000 testreadrate 10000 random pages read in 3600 ms
Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read) PGSVR>100 20 testbigreadrate 2000 random pages read in 20 IO calls in 197 ms Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read)
Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read) ---- The second is running on a more sensibly sized (2G RAM) Linode instance: PGSVR>10000 testreadrate 10000 random pages read in 67836 ms Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read) PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 798 ms Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read) Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read) as the first was so slow I repeated: PGSVR>10000 testreadrate 10000 random pages read in 28384 ms Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read)
and again: PGSVR>10000 testreadrate 10000 random pages read in 12671 ms Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read)
odd... The EC2 instance has recently been through a backup and restore and has a smaller extent (320M), whereas the Linode instance hasn't been through a backup and restore and has a very large extent (4.9G). Perhaps there is some memory paging that is affecting performance?
Nick On 14 February 2012 19:21, Johan Brichau <[hidden email]> wrote: Hi Dale, |
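For anyone comparing figures across this thread, pgsvrslow's summary lines are simple derived rates. Recomputing Nick's EC2 numbers above (10000 pages in 3600 ms; 20 IO calls totalling 2000 pages in 197 ms):

```shell
# How the pgsvrslow summary lines derive from the raw counts.
awk 'BEGIN {
  printf "Avg random read rate: %.2f pages/s\n", 10000 / (3600 / 1000)
  printf "Avg random IO rate: %.2f IO/s\n",         20 / ( 197 / 1000)
  printf "Avg page read rate: %.2f pages/s\n",    2000 / ( 197 / 1000)
}'
# -> Avg random read rate: 2777.78 pages/s
# -> Avg random IO rate: 101.52 IO/s
# -> Avg page read rate: 10152.28 pages/s
```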
Nick,
Thanks for those figures. At least that gives me some comparison on 'cloud' infrastructure. First off: when I run the test on an extent whose stone is not running, I'm getting *very* good statistics (on the order of those shown by Dale...). Dale: is that normal? I also notice that there is a correlation with extent size, although I suspect it has more to do with file system buffers. I have also had the chance to measure +10000 pages/s random read performance on an extent of 5Gb as well (with 3Gb free space). I'm not an expert on this (by far) but I suspect that smaller extents get buffered more quickly and that the same pages get read multiple times due to their total number being smaller. Here are some representative runs on operational extents: PGSVR>10000 testreadrate 10000 random pages read in 128217 ms Avg random read rate: 77.99 pages/s (1.2822e+01 ms/read) PGSVR>100 20 testbigreadrate 2000 random pages read in 20 IO calls in 4713 ms Avg random IO rate: 4.24 IO/s (2.3565e+02 ms/read) Avg page read rate: 424.36 pages/s (2.3565e+00 ms/ page read) ***** PGSVR>10000 testreadrate 10000 random pages read in 95294 ms Avg random read rate: 104.94 pages/s (9.5294e+00 ms/read) PGSVR>100 20 testbigreadrate 2000 random pages read in 20 IO calls in 3378 ms Avg random IO rate: 5.92 IO/s (1.6890e+02 ms/read) Avg page read rate: 592.07 pages/s (1.6890e+00 ms/ page read) On 15 Feb 2012, at 12:41, Nick Ager wrote: > As another reference I have two Gemstone 2.4.4.1 servers.
> > The first runs on a EC2 micro-instance (613 MB): > > $ $GEMSTONE/sys/pgsvrslow > '/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock > PGSVR>10000 testreadrate > > 10000 random pages read in 3600 ms > Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read) > PGSVR>100 20 testbigreadrate > > 2000 random pages read in 20 IO calls in 197 ms > Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read) > Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read) > > ---- > > The second is running on a more sensibly sized (2G RAM) Linode instance: > > PGSVR>10000 testreadrate > > 10000 random pages read in 67836 ms > Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read) > PGSVR>100 20 testbigreadrate > > 2000 random pages read in 20 IO calls in 798 ms > Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read) > Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read) > > as the first was so slow I repeated: > > PGSVR>10000 testreadrate > > 10000 random pages read in 28384 ms > Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read) > > and again: > > PGSVR>10000 testreadrate > > 10000 random pages read in 12671 ms > Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read) > > odd... > > The EC2 instance has recently been through a backup and restore and has a smaller extent (320M). Whereas the Linode instance hasn't been through a backup and restore and has a very large extent (4.9G). Perhaps there is some memory paging which is effecting performance? > > Nick > > On 14 February 2012 19:21, Johan Brichau <[hidden email]> wrote: > Hi Dale, > > Thanks for that pointer! > > Your numbers are quite impressive ... even on my local macbook pro I'm getting only numbers like the one below. > Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats. 
> > ---- > > PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock > PGSVR>10000 testreadrate > > 10000 random pages read in 29371 ms > Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read) > PGSVR>100 20 testbigreadrate > > 2000 random pages read in 20 IO calls in 2163 ms > Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read) > Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read) > > > On 14 Feb 2012, at 19:00, Dale Henrichs wrote: > > > Johan, > > > > We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt : > > > > '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock > > <numpages> testreadrate > > <numpages in block> <numsamples> testbigreadrate > > > > The `testreadrate` command does reads <numpages> random pages from the given extent. The answer you get gives random read performance. > > > > The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance. > > > > Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer): > > > > --------------------------------------------------------------------------------- > > % $GEMSTONE/sys/pgsvrslow > > PGSVR>'extent0.dbf' opendbfnolock > > > > PGSVR>10000 testreadrate > > > > 10000 random pages read in 16 ms > > Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read) > > > > PGSVR>100 20 testbigreadrate > > > > 2000 random pages read in 20 IO calls in 4 ms > > Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read) > > Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read) > > PGSVR> > > --------------------------------------------------------------------------------- > > > > These commands can be run against the extent for a running stone ... 
but you'll want to get measurements with a variety of configurations... > > > > At the moment we're guessing that that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also are you sure the you aren't be throttled by your provider? > > > > Finally it is worth looking at a copy of the config file for the stone to see if there's anything there... > > > > Dale > > > > ----- Original Message ----- > > | From: "Johan Brichau" <[hidden email]> > > | To: "GemStone Seaside beta discussion" <[hidden email]> > > | Sent: Tuesday, February 14, 2012 5:43:58 AM > > | Subject: Re: [GS/SS Beta] slow data page reads? > > | > > | As mentioned in Dale's blogpost, I went on to try a raw disk > > | partition for the extent and the tranlogs and got exactly the same > > | results: *very* low disk read speed (see below). Starting Gemstone > > | and reading the SPC takes a long time. > > | > > | We are pretty certain the SAN is not overloaded because all other > > | disk operations can reach a lot higher speeds. For example, the > > | copydbf operation from the extent file to the partition reached very > > | good speeds of over 30MB/s. > > | > > | So we are only seeing this issue when gemstone is doing read access > > | on this kind of setup. I have other servers where everything is > > | running smoothly. > > | > > | If anybody has any ideas... that would be cool ;-) > > | > > | Johan > > | > > | Sample read speed during gemstone page read: > > | > > | Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > > | avgrq-sz avgqu-sz await svctm %util > > | sda5 111.60 0.00 37.00 0.00 0.58 0.00 > > | 32.00 1.00 26.90 27.01 99.92 > > | > > | > > | On 13 Feb 2012, at 21:09, Johan Brichau wrote: > > | > > | > Well.. it turns out that we were wrong and we still experience the > > | > problem... 
> > | > > > | > Dale, > > | > > > | > What we are seeing sounds very similar to this: > > | > > > | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/ > > | > > > | > " The issue with the i/o anomalies that we observed in Linux has > > | > not been as easy to resolve. I spent some time tuning GemStone/S > > | > to make sure that GemStone/S wasn't the source of the anomaly. > > | > Finally our IS guy was able to reproduce the anomaly and he ran > > | > into a few other folks on the net that have observed similar > > | > anomalies. > > | > > > | > At this writing we haven't found a solution to the anomaly, but we > > | > are pretty optimistic that it is resolvable. We've seen different > > | > versions of Linux running on similar hardware that doesn't show > > | > the anomaly, so it is either a function of the kernel version or > > | > the settings of some of the kernel parameters. As soon as we > > | > figure it out we'll let you know." > > | > > > | > Do you have more information on this? > > | > > > | > Johan > > | > > > | > > > | > On 13 Feb 2012, at 19:39, Otto Behrens wrote: > > | > > > | >> Hi Johan, > > | >> > > | >> We had a machine hosted on a VPS, with a "state of the art" san, > > | >> with > > | >> similar issues. We complained every so often and the service > > | >> provider > > | >> responded with their inability to control some users on the same > > | >> VPS > > | >> host doing "extremely heavy" disk io. We got the client off the > > | >> vps > > | >> onto a normal machine with a SATA disk and have had joy ever since > > | >> (10-20x improvement with the vps at its best). > > | >> > > | >> I think that the randomness of the reads thrown on top of other > > | >> vms on > > | >> the same host just caused unpredictable io; so we prefer avoiding > > | >> vms. > > | >> > > | >> Alternatively, if it can work for you, put the extents in RAM. 
> > | >> > > | >> Otto > > | >> > > | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]> > > | >> wrote: > > | >> > > | >>> Hi all, > > | >>> > > | >>> Never mind my question below: our hosters have identified the > > | >>> problem on their SAN. > > | >>> Strange behavior though... > > | >>> > > | >>> phew ;-) > > | >>> Johan > > | >>> > > | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote: > > | >>> > > | >>>> Hi Gemstoners, > > | >>>> > > | >>>> Is there any condition (other than a slow filesystem) that would > > | >>>> trigger slow page reads when a gem needs to hit disk and load > > | >>>> objects? > > | >>>> > > | >>>> Here is the problem I'm trying to chase: a seaside gem is > > | >>>> processing a request and (according to the statmonit output) > > | >>>> ends up requesting pages. The pageread process goes terribly > > | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per > > | >>>> second being read during that time period. There is no other > > | >>>> activity at that moment and I'm puzzled by why the read goes so > > | >>>> slow (other than a slow filesystem -- see next). > > | >>>> > > | >>>> Because the iostat system monitoring also shows the same low > > | >>>> read speed and indicates a 100% disk util statistic, my obvious > > | >>>> first impression was that the disk is saturated and we have > > | >>>> datastore problem. However, the disk read speed proves to be > > | >>>> good when I'm doing other disk activity outside of Gemstone. > > | >>>> Moreover, the _write_ speed is terribly good at all times. > > | >>>> > > | >>>> So, I'm currently trying to chase something that only triggers > > | >>>> slow page read speed from a Gemstone topaz session. 
> > | >>>> > > | >>>> GEM_IO_LIMIT is set at default setting of 5000 > > | >>>> > > | >>>> For illustration, these are some kind of io stats when Gemstone > > | >>>> is doing read access: > > | >>>> > > | >>>> Time: 06:40:21 PM > > | >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > > | >>>> avgrq-sz avgqu-sz await svctm %util > > | >>>> sda3 0.00 0.20 6.00 0.40 0.09 0.00 > > | >>>> 30.75 1.00 166.88 156.00 99.84 > > | >>>> > > | >>>> Time: 06:40:26 PM > > | >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > > | >>>> avgrq-sz avgqu-sz await svctm %util > > | >>>> sda3 0.00 0.20 8.20 0.40 0.13 0.00 > > | >>>> 31.07 1.05 119.91 115.72 99.52 > > | >>>> > > | >>>> Time: 06:40:31 PM > > | >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > > | >>>> avgrq-sz avgqu-sz await svctm %util > > | >>>> sda3 0.00 0.20 5.99 0.40 0.09 0.00 > > | >>>> 30.75 1.01 157.75 156.25 99.80 > > | >>> > > | > > > | > > | > > |
Hi Johan,
Aren't there two extent-related variables? The size of the extent used in the test, and the size of the extent in use by the running stone when the test is executed.
I'll try the tests again using the base 48M /opt/gemstone/product/bin/extent0.seaside.dbf extent and get back to you.

Nick

On 15 February 2012 12:46, Johan Brichau <[hidden email]> wrote:

> Nick,
In reply to this post by Dale Henrichs
I have a dedicated server at Hetzner hosting (server type EQ6). Linux is installed on it with OpenVZ as virtualization. Storage is on SCSI disks with software RAID1. We have 25 OpenVZ instances running, containing streaming servers and such, but the I/O load is not too high. The following numbers are from a GemStone installation within an OpenVZ instance.
instance1:
----
PGSVR>'/opt/application/test/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 8538 ms
Avg random read rate: 1171.23 pages/s (8.5380e-01 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 97 ms
Avg random IO rate: 206.19 IO/s (4.8500e+00 ms/read)
Avg page read rate: 20618.56 pages/s (4.8500e-02 ms/ page read)
PGSVR>
----

instance2:
----
PGSVR>'/opt/application/taskforus/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 185795 ms
Avg random read rate: 53.82 pages/s (1.8579e+01 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 1009 ms
Avg random IO rate: 19.82 IO/s (5.0450e+01 ms/read)
Avg page read rate: 1982.16 pages/s (5.0450e-01 ms/ page read)
----

Instance1 and instance2 are nearly the same. But on instance1 I have a 150MB extent. On instance2, which is a real test server and where I do not garbage collect, the extent size is 29GB.

Norbert

Am 14.02.2012 um 19:00 schrieb Dale Henrichs:

> Johan,
>
> We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt:
>
> '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
> <numpages> testreadrate
> <numpages in block> <numsamples> testbigreadrate
>
> The `testreadrate` command reads <numpages> random pages from the given extent. The answer you get gives random read performance.
>
> The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
> Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer):
>
> ---------------------------------------------------------------------------------
> % $GEMSTONE/sys/pgsvrslow
> PGSVR>'extent0.dbf' opendbfnolock
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 16 ms
> Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
>
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 4 ms
> Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> PGSVR>
> ---------------------------------------------------------------------------------
>
> These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
>
> At the moment we're guessing that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also, are you sure you aren't being throttled by your provider?
>
> Finally it is worth looking at a copy of the config file for the stone to see if there's anything there...
>
> Dale
>
> ----- Original Message -----
> | From: "Johan Brichau" <[hidden email]>
> | To: "GemStone Seaside beta discussion" <[hidden email]>
> | Sent: Tuesday, February 14, 2012 5:43:58 AM
> | Subject: Re: [GS/SS Beta] slow data page reads?
> |
> | As mentioned in Dale's blogpost, I went on to try a raw disk
> | partition for the extent and the tranlogs and got exactly the same
> | results: *very* low disk read speed (see below). Starting Gemstone
> | and reading the SPC takes a long time.
> |
> | We are pretty certain the SAN is not overloaded because all other
> | disk operations can reach a lot higher speeds. For example, the
> | copydbf operation from the extent file to the partition reached very
> | good speeds of over 30MB/s.
> | > | So we are only seeing this issue when gemstone is doing read access > | on this kind of setup. I have other servers where everything is > | running smoothly. > | > | If anybody has any ideas... that would be cool ;-) > | > | Johan > | > | Sample read speed during gemstone page read: > | > | Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > | avgrq-sz avgqu-sz await svctm %util > | sda5 111.60 0.00 37.00 0.00 0.58 0.00 > | 32.00 1.00 26.90 27.01 99.92 > | > | > | On 13 Feb 2012, at 21:09, Johan Brichau wrote: > | > | > Well.. it turns out that we were wrong and we still experience the > | > problem... > | > > | > Dale, > | > > | > What we are seeing sounds very similar to this: > | > > | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/ > | > > | > " The issue with the i/o anomalies that we observed in Linux has > | > not been as easy to resolve. I spent some time tuning GemStone/S > | > to make sure that GemStone/S wasn't the source of the anomaly. > | > Finally our IS guy was able to reproduce the anomaly and he ran > | > into a few other folks on the net that have observed similar > | > anomalies. > | > > | > At this writing we haven't found a solution to the anomaly, but we > | > are pretty optimistic that it is resolvable. We've seen different > | > versions of Linux running on similar hardware that doesn't show > | > the anomaly, so it is either a function of the kernel version or > | > the settings of some of the kernel parameters. As soon as we > | > figure it out we'll let you know." > | > > | > Do you have more information on this? > | > > | > Johan > | > > | > > | > On 13 Feb 2012, at 19:39, Otto Behrens wrote: > | > > | >> Hi Johan, > | >> > | >> We had a machine hosted on a VPS, with a "state of the art" san, > | >> with > | >> similar issues. 
We complained every so often and the service > | >> provider > | >> responded with their inability to control some users on the same > | >> VPS > | >> host doing "extremely heavy" disk io. We got the client off the > | >> vps > | >> onto a normal machine with a SATA disk and have had joy ever since > | >> (10-20x improvement with the vps at its best). > | >> > | >> I think that the randomness of the reads thrown on top of other > | >> vms on > | >> the same host just caused unpredictable io; so we prefer avoiding > | >> vms. > | >> > | >> Alternatively, if it can work for you, put the extents in RAM. > | >> > | >> Otto > | >> > | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]> > | >> wrote: > | >> > | >>> Hi all, > | >>> > | >>> Never mind my question below: our hosters have identified the > | >>> problem on their SAN. > | >>> Strange behavior though... > | >>> > | >>> phew ;-) > | >>> Johan > | >>> > | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote: > | >>> > | >>>> Hi Gemstoners, > | >>>> > | >>>> Is there any condition (other than a slow filesystem) that would > | >>>> trigger slow page reads when a gem needs to hit disk and load > | >>>> objects? > | >>>> > | >>>> Here is the problem I'm trying to chase: a seaside gem is > | >>>> processing a request and (according to the statmonit output) > | >>>> ends up requesting pages. The pageread process goes terribly > | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per > | >>>> second being read during that time period. There is no other > | >>>> activity at that moment and I'm puzzled by why the read goes so > | >>>> slow (other than a slow filesystem -- see next). > | >>>> > | >>>> Because the iostat system monitoring also shows the same low > | >>>> read speed and indicates a 100% disk util statistic, my obvious > | >>>> first impression was that the disk is saturated and we have > | >>>> datastore problem. 
However, the disk read speed proves to be > | >>>> good when I'm doing other disk activity outside of Gemstone. > | >>>> Moreover, the _write_ speed is terribly good at all times. > | >>>> > | >>>> So, I'm currently trying to chase something that only triggers > | >>>> slow page read speed from a Gemstone topaz session. > | >>>> > | >>>> GEM_IO_LIMIT is set at default setting of 5000 > | >>>> > | >>>> For illustration, these are some kind of io stats when Gemstone > | >>>> is doing read access: > | >>>> > | >>>> Time: 06:40:21 PM > | >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > | >>>> avgrq-sz avgqu-sz await svctm %util > | >>>> sda3 0.00 0.20 6.00 0.40 0.09 0.00 > | >>>> 30.75 1.00 166.88 156.00 99.84 > | >>>> > | >>>> Time: 06:40:26 PM > | >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > | >>>> avgrq-sz avgqu-sz await svctm %util > | >>>> sda3 0.00 0.20 8.20 0.40 0.13 0.00 > | >>>> 31.07 1.05 119.91 115.72 99.52 > | >>>> > | >>>> Time: 06:40:31 PM > | >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > | >>>> avgrq-sz avgqu-sz await svctm %util > | >>>> sda3 0.00 0.20 5.99 0.40 0.09 0.00 > | >>>> 30.75 1.01 157.75 156.25 99.80 > | >>> > | > > | > | |
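Returning to Norbert's two PGSVR transcripts above: the reported rates are just pages divided by elapsed time, and recomputing them makes the gap explicit — the 29GB extent reads random pages roughly 22 times slower than the 150MB one on near-identical instances. A throwaway sketch (numbers copied from the transcripts; the `rate` helper is ours, not part of GemStone):

```shell
# Recompute the PGSVR random-read rates from the raw transcript numbers.
# pages/s = pages / (elapsed_ms / 1000)
rate() {
  awk -v p="$1" -v ms="$2" 'BEGIN { printf "%.2f pages/s\n", p / (ms / 1000) }'
}

rate 10000 8538     # instance1, 150MB extent -> 1171.23 pages/s
rate 10000 185795   # instance2, 29GB extent  ->   53.82 pages/s
```

Both results match the averages printed by pgsvrslow itself, so the tool's arithmetic is not the issue — the per-read latency really is ~19x worse on the large extent.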
In reply to this post by Nick
Rerunning tests against base 48M extent:

EC2 instance:
PGSVR>'/opt/gemstone/product/bin/extent0.seaside.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 2254 ms
Avg random read rate: 4436.56 pages/s (2.2540e-01 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 13 ms
Avg random IO rate: 1538.46 IO/s (6.5000e-01 ms/read)
Avg page read rate: 153846.15 pages/s (6.5000e-03 ms/ page read)
---
Linode:
PGSVR>'/opt/gemstone/product/bin/extent0.seaside.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 139 ms
Avg random read rate: 71942.45 pages/s (1.3900e-02 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 10 ms
Avg random IO rate: 2000.00 IO/s (5.0000e-01 ms/read)
Avg page read rate: 200000.00 pages/s (5.0000e-03 ms/ page read)
---
Again with EC2 320M (170M free space) extent:
PGSVR>'/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 3927 ms
Avg random read rate: 2546.47 pages/s (3.9270e-01 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 136 ms
Avg random IO rate: 147.06 IO/s (6.8000e+00 ms/read)
Avg page read rate: 14705.88 pages/s (6.8000e-02 ms/ page read)
-----
Again with the 4.9G (4.2G free space) Linode extent:
'/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 38912 ms
Avg random read rate: 256.99 pages/s (3.8912e+00 ms/read)
PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 623 ms
Avg random IO rate: 32.10 IO/s (3.1150e+01 ms/read)
Avg page read rate: 3210.27 pages/s (3.1150e-01 ms/ page read)

On 15 February 2012 13:22, Nick Ager <[hidden email]> wrote:

> Hi Johan,
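Nick's reruns isolate extent size as the dominant factor: the same host gets dramatically slower random reads once the extent grows. A small sketch computing the slowdown factors from the elapsed times above (the `slowdown` helper name is ours, for illustration only):

```shell
# Slowdown factor for 10000 random page reads as the extent grows,
# using the elapsed times from the transcripts above.
slowdown() {
  awk -v small="$1" -v big="$2" 'BEGIN { printf "%.1fx\n", big / small }'
}

slowdown 2254 3927    # EC2:    48M extent -> 320M extent  = 1.7x
slowdown 139 38912    # Linode: 48M extent -> 4.9G extent  = 279.9x
```

The Linode box is far faster on the clean 48M extent yet degrades by more than two orders of magnitude at 4.9G, which is consistent with random reads falling out of the filesystem cache as the extent outgrows RAM.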
In reply to this post by Larry Kellogg
Larry,
>>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>>> /opt/gemstone/product/seaside/data/system.conf
>>> GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf <<<<<<<<<<<<<<<<????????????????

> lrwxrwxrwx 1 seasideuser seasideuser 47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/

I believe that the symbolic link is being resolved and that these two are equivalent.

>>> GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>> reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>> DBF Op: Open; DBF Record: -1;
>>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)

> -rw------- 1 root root 14680064 Feb 15 03:40 extent0.dbf

I believe that having the extent owned by root is preventing GemStone from opening the file since seasideuser does not have read/write permission.

>>> GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf

> I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.

It seems that seaside.conf was there when GemStone tried to start (it would have reported an error otherwise), but was not there when you did the listing.

>>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>>> can't figure out why this is failing.

Perhaps a little rest and a fresh look will make it a bit more clear! You are actually doing very well and your willingness to journal your saga on the mailing list is helpful to others and provides a nice testimony that people are starting new projects with GemStone. That is a positive reinforcement for all of us!

-James

On Feb 15, 2012, at 2:06 AM, Lawrence Kellogg wrote:

>
> On Feb 14, 2012, at 11:40 PM, James Foster wrote:
>
>> Larry,
>>
>> What do you get from 'ls -alF /opt/gemstone'?
>> What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file!
>>
>
> [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone
> total 28
> drwxrwx--- 5 seasideuser seasideuser 4096 Oct 12 19:18 ./
> drwxr-xr-x 4 root root 4096 Oct 12 19:18 ../
> drwxr-xr-x 17 seasideuser seasideuser 4096 Oct 13 02:15 GemStone64Bit2.4.4.1-x86_64.Linux/
> drwxrwx--- 2 seasideuser seasideuser 4096 Feb 15 04:18 locks/
> drwxrwxrwx 3 seasideuser seasideuser 12288 Feb 15 04:05 log/
> lrwxrwxrwx 1 seasideuser seasideuser 47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/
> [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone/product/seaside/data
> total 47144
> drwxrwxr-x 4 seasideuser seasideuser 4096 Feb 15 04:26 ./
> drwxrwxr-x 9 seasideuser seasideuser 4096 Jul 13 2010 ../
> drwxrwxr-x 2 seasideuser seasideuser 4096 Feb 15 02:41 backups/
> -rw------- 1 root root 14680064 Feb 15 03:40 extent0.dbf
> -rw-r--r-- 1 seasideuser seasideuser 229 Jul 13 2010 gem.conf
> drwxr-xr-x 2 root root 4096 Feb 15 03:40 old/
> -rw-r--r-- 1 seasideuser seasideuser 478 Jul 13 2010 system.conf
> -rw-rw-r-- 1 seasideuser seasideuser 35840 Feb 14 20:56 tranlog1.dbf
> -rw-rw-r-- 1 seasideuser seasideuser 4320256 Feb 15 03:39 tranlog2.dbf
> -rw-rw-r-- 1 seasideuser seasideuser 753152 Feb 15 03:24 tranlog5.dbf
> -rw-rw-r-- 1 seasideuser seasideuser 10240 Feb 15 03:24 tranlog6.dbf
> -rw-rw-r-- 1 seasideuser seasideuser 28443136 Feb 15 03:27 tranlog7.dbf
> [seasideuser@ip-10-191-194-75 ~]$
>
> This seems right to me, unless I'm missing something. Is the symbolic link for product correct?
> I see gem.conf and system.conf files. I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.
> I did a ". defSeaside" and that runs ok.
>
> Just when I think I understand how it all works, I run into these sorts of issues that leave me stymied. I will migrate to
> 2.4.5 when I get a few spare moments. It's not easy being the admin/programmer/designer/customer service rep/marketeer, but it
> sure is a lot of fun.
>
> Larry
>
>
>> Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff.
>>
>> -James
>>
>> On Feb 14, 2012, at 8:23 PM, Lawrence Kellogg wrote:
>>
>>> If this is true:
>>>
>>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>>> /opt/gemstone/product/seaside/data/system.conf
>>>
>>>
>>> why does Gemstone insist in trying to start the stone in the other directory, the one with no extent0.dbf???
>>>
>>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>>> can't figure out why this is failing. :luckily it's just Staging so it can wait. I had shut down the stone
>>> to restore a backup and then reply log files….but I don't understand this error…
>>>
>>> Larry
>>>
>>>
>>> [seasideuser@ip-10-191-194-75 data]$ startstone seaside
>>> startstone[Info]: GemStone version '2.4.4.1'
>>> startstone[Info]: Starting Stone repository monitor "seaside".
>>> startstone[Info]: GEMSTONE is: "/opt/gemstone/product".
>>> startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so
>>> startstone[Info]:
>>> GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf <<<<<<<<<<<<<<<<????????????????
>>> GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<????????????????
>>> startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'.
>>>
>>> startstone[Error]: Stone process (id=14170) has died.
>>> startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information. Excerpt follows:
>>> configuredSize 1000 MB
>>> Directory 1:
>>> configured name $GEMSTONE_DATADIR/
>>> expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/
>>> configuredSize 1000 MB
>>> -------------------------------------------------------
>>>
>>> GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>> reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>> DBF Op: Open; DBF Record: -1;
>>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)
>>>
>>> An error occurred opening the repository for exclusive access.
>>>
>>> Stone startup has failed.
>>>
>>
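James's diagnosis above (extent0.dbf owned by root, so seasideuser gets errno=13 EACCES at startstone time) suggests a mechanical fix. A hedged sketch, assuming the paths from the listings above and a stone named `seaside`; `install_clean_extent` is our own helper for illustration, not a GemStone command:

```shell
# Put a clean extent in place with permissions the GemStone user can use.
# startstone fails with errno=13 (EACCES) when extent0.dbf is owned by root.
install_clean_extent() {
  src=$1    # clean source extent, e.g. $GEMSTONE/bin/extent0.dbf
  dst=$2    # target extent path
  owner=$3  # user that runs the stone, e.g. seasideuser
  cp "$src" "$dst"
  chown "$owner" "$dst"
  chmod 600 "$dst"   # read/write for the owning user only
}

# Usage (run as root or via sudo, with the stone stopped):
# stopstone seaside
# install_clean_extent /opt/gemstone/product/bin/extent0.dbf \
#     /opt/gemstone/product/seaside/data/extent0.dbf seasideuser
# startstone seaside
```

The root-owned `old/` directory in Larry's listing hints that the extent copy was done as root, which would explain how the ownership flipped in the first place.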