slow data page reads?


Re: Extent size explosion

James Foster-8
Larry,

Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore.
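
A minimal shell sketch of that sequence (a sketch only: it assumes a stone named seaside and extents under $GEMSTONE_DATADIR, the names that appear later in this thread; the extent shipped in $GEMSTONE/bin is read-only, hence the chmod; see the System Admin Guide for the full procedure):

        stopstone seaside                               # shut the stone down cleanly
        rm $GEMSTONE_DATADIR/extent0.dbf                # delete the bloated extent
        cp $GEMSTONE/bin/extent0.dbf $GEMSTONE_DATADIR/ # copy in a clean extent
        chmod +w $GEMSTONE_DATADIR/extent0.dbf          # the shipped copy is read-only
        startstone seaside                              # restart, then log in and run the restore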

James

On Feb 14, 2012, at 11:17 AM, Lawrence Kellogg wrote:

> Hello James,
>  Yes, I restored into an extent that was already large, I would say. How do I get a clean extent? I don't suppose it is a matter of just deleting the large extent.
> I guess I could find the extent0 from the distribution and copy it in?
>
>  Larry
>
>
> On Feb 14, 2012, at 2:07 PM, James Foster wrote:
>
>> Larry,
>>
>> Did you start the Staging system with a clean extent ($GEMSTONE/bin/extent0.dbf) before doing the restore? Or did you restore into an extent that was already large?
>>
>> James
>>
>> On Feb 14, 2012, at 10:42 AM, Lawrence Kellogg wrote:
>>
>>> Hello,
>>> So, I run two separate Amazon instances, one called Production and one called Staging.
>>> My plan was to back up production twice a day and load the backups into the Staging environment.
>>> Well, here are my extent sizes:
>>>
>>> 515899392 Feb 14 18:32 extent0.dbf - Production
>>> 4192206848 Feb 14 18:32 extent0.dbf - Staging
>>>
>>> Why is the size of the extent in Staging eight (!) times the size of the extent in Production when I loaded a backup from Production
>>> into Staging?  I'm reading the Admin guide as fast as I can but I don't know what is going on.
>>>
>>> I removed old tranlogs in Staging, get the disk space down to 70%, look the other way, and it
>>> goes to 100%. Puzzling. Is there a way to shrink the extent and clean things up?
>>>
>>> Thanks,
>>>
>>> Larry
>>>
>>>
>>
>


Re: slow data page reads?

Johan Brichau-2
In reply to this post by Dale Henrichs
Hi Dale,

Thanks for that pointer!

Your numbers are quite impressive ... even on my local MacBook Pro I'm only getting numbers like the ones below.
Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats.

----

PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 29371 ms
Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 2163 ms
Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read)


On 14 Feb 2012, at 19:00, Dale Henrichs wrote:

> Johan,
>
> We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` and then enter the following commands at the 'PGSVR>' prompt:
>
>  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
>  <numpages> testreadrate
>  <numpages in block> <numsamples> testbigreadrate
>
> The `testreadrate` command reads <numpages> random pages from the given extent. The answer you get gives a measure of random read performance.
>
> The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
>
> Here's sample output from one of our desktop boxes on a standard file system (basically reading from the file buffer):
>
> ---------------------------------------------------------------------------------
> % $GEMSTONE/sys/pgsvrslow
> PGSVR>'extent0.dbf' opendbfnolock
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 16 ms
> Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
>
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 4 ms
> Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> PGSVR>
> ---------------------------------------------------------------------------------
>
> These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
>
> At the moment we're guessing that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also, are you sure you aren't being throttled by your provider?
>
> Finally, it is worth looking at a copy of the config file for the stone to see if there's anything there...
>
> Dale
>
> ----- Original Message -----
> | From: "Johan Brichau" <[hidden email]>
> | To: "GemStone Seaside beta discussion" <[hidden email]>
> | Sent: Tuesday, February 14, 2012 5:43:58 AM
> | Subject: Re: [GS/SS Beta] slow data page reads?
> |
> | As mentioned in Dale's blogpost, I went on to try a raw disk
> | partition for the extent and the tranlogs and got exactly the same
> | results: *very* low disk read speed (see below). Starting Gemstone
> | and reading the SPC takes a long time.
> |
> | We are pretty certain the SAN is not overloaded because all other
> | disk operations can reach a lot higher speeds. For example, the
> | copydbf operation from the extent file to the partition reached very
> | good speeds of over 30MB/s.
> |
> | So we are only seeing this issue when gemstone is doing read access
> | on this kind of setup. I have other servers where everything is
> | running smoothly.
> |
> | If anybody has any ideas... that would be cool ;-)
> |
> | Johan
> |
> | Sample read speed during gemstone page read:
> |
> | Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> | sda5     111.60    0.00  37.00  0.00   0.58   0.00     32.00      1.00  26.90  27.01  99.92
> |
> |
> | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> |
> | > Well.. it turns out that we were wrong and we still experience the
> | > problem...
> | >
> | > Dale,
> | >
> | > What we are seeing sounds very similar to this:
> | >
> | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> | >
> | > " The issue with the i/o anomalies that we observed in Linux has
> | > not been as easy to resolve. I spent some time tuning GemStone/S
> | > to make sure that GemStone/S wasn't the source of the anomaly.
> | > Finally our IS guy was able to reproduce the anomaly and he ran
> | > into a few other folks on the net that have observed similar
> | > anomalies.
> | >
> | > At this writing we haven't found a solution to the anomaly, but we
> | > are pretty optimistic that it is resolvable. We've seen different
> | > versions of Linux running on similar hardware that doesn't show
> | > the anomaly, so it is either a function of the kernel version or
> | > the settings of some of the kernel parameters. As soon as we
> | > figure it out we'll let you know."
> | >
> | > Do you have more information on this?
> | >
> | > Johan
> | >
> | >
> | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> | >
> | >> Hi Johan,
> | >>
> | >> We had a machine hosted on a VPS, with a "state of the art" san,
> | >> with
> | >> similar issues. We complained every so often and the service
> | >> provider
> | >> responded with their inability to control some users on the same
> | >> VPS
> | >> host doing "extremely heavy" disk io. We got the client off the
> | >> vps
> | >> onto a normal machine with a SATA disk and have had joy ever since
> | >> (10-20x improvement with the vps at its best).
> | >>
> | >> I think that the randomness of the reads thrown on top of other
> | >> vms on
> | >> the same host just caused unpredictable io; so we prefer avoiding
> | >> vms.
> | >>
> | >> Alternatively, if it can work for you, put the extents in RAM.
> | >>
> | >> Otto
> | >>
> | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
> | >> wrote:
> | >>
> | >>> Hi all,
> | >>>
> | >>> Never mind my question below: our hosters have identified the
> | >>> problem on their SAN.
> | >>> Strange behavior though...
> | >>>
> | >>> phew ;-)
> | >>> Johan
> | >>>
> | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> | >>>
> | >>>> Hi Gemstoners,
> | >>>>
> | >>>> Is there any condition (other than a slow filesystem) that would
> | >>>> trigger slow page reads when a gem needs to hit disk and load
> | >>>> objects?
> | >>>>
> | >>>> Here is the problem I'm trying to chase: a seaside gem is
> | >>>> processing a request and (according to the statmonit output)
> | >>>> ends up requesting pages. The pageread process goes terribly
> | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per
> | >>>> second being read during that time period. There is no other
> | >>>> activity at that moment and I'm puzzled by why the read goes so
> | >>>> slow (other than a slow filesystem -- see next).
> | >>>>
> | >>>> Because the iostat system monitoring also shows the same low
> | >>>> read speed and indicates a 100% disk util statistic, my obvious
> | >>>> first impression was that the disk is saturated and we have a
> | >>>> datastore problem. However, the disk read speed proves to be
> | >>>> good when I'm doing other disk activity outside of Gemstone.
> | >>>> Moreover, the _write_ speed is terribly good at all times.
> | >>>>
> | >>>> So, I'm currently trying to chase something that only triggers
> | >>>> slow page read speed from a Gemstone topaz session.
> | >>>>
> | >>>> GEM_IO_LIMIT is set at default setting of 5000
> | >>>>
> | >>>> For illustration, these are some kind of io stats when Gemstone
> | >>>> is doing read access:
> | >>>>
> | >>>> Time: 06:40:21 PM
> | >>>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await   svctm  %util
> | >>>> sda3       0.00    0.20  6.00  0.40   0.09   0.00     30.75      1.00  166.88  156.00  99.84
> | >>>>
> | >>>> Time: 06:40:26 PM
> | >>>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await   svctm  %util
> | >>>> sda3       0.00    0.20  8.20  0.40   0.13   0.00     31.07      1.05  119.91  115.72  99.52
> | >>>>
> | >>>> Time: 06:40:31 PM
> | >>>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await   svctm  %util
> | >>>> sda3       0.00    0.20  5.99  0.40   0.09   0.00     30.75      1.01  157.75  156.25  99.80
> | >>>
> | >
> |
> |


Re: Extent size explosion

Larry Kellogg
In reply to this post by James Foster-8


On Feb 14, 2012, at 2:21 PM, James Foster wrote:

> Larry,
>
> Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore.
>

Thanks, James. Clearly, I'm no Gemstone admin. ;-) I was assuming that the extent would
take on the size of the backup, a bad assumption, I see…

  Larry




Re: Extent size explosion

Larry Kellogg

On Feb 14, 2012, at 2:43 PM, Lawrence Kellogg wrote:

>
>
> On Feb 14, 2012, at 2:21 PM, James Foster wrote:
>
>> Larry,
>>
>> Restore from backup is covered in section 9.5 of the System Admin Guide. The recommendation is to stop the stone, delete the old extent, copy a clean extent from $GEMSTONE/bin/extent0.dbf, start the stone, and do the restore.
>>
>
>

James,

 Ok, I was successful in copying over the original extent, and applying my backup on the Staging system. Things are much better now.

  Now, I want to do the same thing for production where I also have an inflated repository from some backups I loaded there.

  I just want to start from scratch from a backup, ignoring all the tranlogs, some of which are huge. In a scramble for
disk space over there, I deleted some old tranlogs. I know, I know, stupid.

  How do I start over? I don't have a lot of user traffic yet so now seems like the time. Given Seaside, how do
I make sure there are no users hitting the server so I can make my backup and restore it? Should I
kill nginx?

  I can't seem to find a command to force off all current sessions and then suspend login.

  Larry





Re: Extent size explosion

James Foster-8
Larry,

As long as you have a good production system, then the loss of previous transaction logs is not really a problem. They are only needed to recover from a crash.

To shrink your production repository, you want to make a backup of an otherwise inactive system (i.e., all other user sessions logged off). To do this, log in to a new session (Topaz, Jade, GemTools, etc.) and evaluate the following:
        System stopUserSessions.
Then proceed with a full backup, shutdown, copy in a new extent, start the stone, restore the backup, commit the restore, and then start up the user sessions.

You might check the size before and after:
        SystemRepository fileSizeReport.
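
A minimal Topaz sketch of that whole sequence (the backup path is illustrative, and the selectors are the ones documented for 2.4.x, fullBackupTo: / restoreFromBackup: / commitRestore; double-check section 9.5 before relying on it):

        topaz 1> printit
        System stopUserSessions.
        SystemRepository fullBackupTo: '/opt/gemstone/backups/prod-backup.dbf'.
        %

        (now stopstone, copy a clean $GEMSTONE/bin/extent0.dbf into place, chmod +w it, startstone, and log back in)

        topaz 1> printit
        SystemRepository restoreFromBackup: '/opt/gemstone/backups/prod-backup.dbf'.
        SystemRepository commitRestore.
        SystemRepository fileSizeReport.
        %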

-James




Re: Extent size explosion

Larry Kellogg

On Feb 14, 2012, at 5:14 PM, James Foster wrote:

> Larry,
>
> As long as your have a good production system, then the loss of previous transaction logs is not really a problem. They are only necessary for restoring from a crash.
>
> To shrink your production repository, you want to make a backup of an otherwise inactive system (i.e., all other user sessions logged off). To do this, log in to a new session (Topaz, Jade, GemTools, etc.) and evaluate the following:
> System stopUserSessions.

Thanks, the funny thing is that it still shows me some user sessions after doing that….maybe it takes a
minute for them to go away….???

topaz 1> printit
System stopUserSessions.
%
System class
  superClass      Object class
  format          32
  instVars        0
  instVarNames    an Array
  constraints     an Array
  classVars       a SymbolDictionary
  methodDict      a GsMethodDictionary
  poolDictionaries an Array
  categories      a GsMethodDictionary
  secondarySuperclasses nil
  name            System
  classHistory    a ClassHistory
  description     a GsClassDocumentation
  migrationDestination nil
  timeStamp       a DateTime
  userId          SystemUser
  extraDict       a SymbolDictionary
  classCategory   nil
  subclasses      nil

topaz 1> printit
System currentSessionNames.
%

session number: 2    UserId: GcUser
session number: 3    UserId: SymbolUser
session number: 4    UserId: GcUser
session number: 6    UserId: DataCurator
topaz 1>



Re: Extent size explosion

James Foster-8
Good observation and it is smart to check. In this case, though, it is okay. We make a distinction between "user" sessions and "system" sessions. The two GcUsers are doing background garbage collection and the SymbolUser is available to create Symbols if needed. These are built-in users and activities. While they can be stopped, it is rare to need to do so. The remaining one is you, DataCurator. You may proceed with the backup with them logged in...



Re: Extent size explosion

Larry Kellogg

On Feb 14, 2012, at 5:28 PM, James Foster wrote:

> Good observation and it is smart to check. In this case, though, it is okay. We make a distinction between "user" sessions and "system" sessions. The two GcUsers are doing background garbage collection and the SymbolUser is available to create Symbols if needed. These are built-in users and activities. While they can be stopped, it is rare to need to do so. The remaining one is you, DataCurator. You may proceed with the backup with them logged in…
>

James,
  Whew. I did it but have a few questions.

  First, why did my Free Space go to 25 meg?? Do I have to manually add an extent?

BEFORE

SystemRepository fileSizeReport
%
Extent #1
-----------
   Filename = !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf

   File size =       492.00 Megabytes
   Space available = 279.48 Megabytes

Totals
------
   Repository size = 492.00 Megabytes
   Free Space =      279.48 Megabytes

topaz 1>


AFTER

SystemRepository fileSizeReport
%
Extent #1
-----------
   Filename = !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf

   File size =       218.00 Megabytes
   Space available = 25.88 Megabytes

Totals
------
   Repository size = 218.00 Megabytes
   Free Space =      25.88 Megabytes

topaz 1>


Also, I guess this is nil because there are no new transactions after the restore, but I am not sure…

topaz 1> printit
SystemRepository restoreStatusOldestFileId
%
nil
topaz


Larry




Re: Extent size explosion

James Foster-8

On Feb 14, 2012, at 3:11 PM, Lawrence Kellogg wrote:

>
> On Feb 14, 2012, at 5:28 PM, James Foster wrote:
>
>> Good observation and it is smart to check. In this case, though, it is okay. We make a distinction between "user" sessions and "system" sessions. The two GcUsers are doing background garbage collection and the SymbolUser is available to create Symbols if needed. These are built-in users and activities. While they can be stopped, it is rare to need to do so. The remaining one is you, DataCurator. You may proceed with the backup with them logged in…
>>
>
> James,
>  Whew. I did it but have a few questions.
>
>  First, why did my Free Space go to 25 meg?? Do I have to manually add an extent?

Just as doing a restore in a big database will leave lots of free space, doing a restore in a small database will leave a little bit of free space (imagine the object tree that existed moments before you did the commitRestore). Also, it is possible that there was some garbage collection in process at the time of the backup and it finished after the restore. I'm not certain of either of these theories, but ~12% free space is not too big a deal. With luck, your business will have reason to consume that space soon!

> Also, I guess this is nil because there are no new transactions after the restore, but I am not sure…

Once you commit the restore, then there are never any more transaction logs to apply.



Re: Extent size explosion

Larry Kellogg

On Feb 14, 2012, at 8:51 PM, James Foster wrote:


>> First, why did my Free Space go to 25 meg?? Do I have to manually add an extent?
>
> Just as doing a restore in a big database will leave lots of free space, doing a restore in a small database will leave a little bit of free space (imagine the object tree that existed moments before you did the commitRestore). Also, it is possible that there was some garbage collection in process at the time of the backup and it finished after the restore. I'm not certain of either of these theories, but ~12% free space is not too big a deal. With luck, your business will have reason to consume that space soon!


  Ha ha, business. I certainly hope my project leads to some income, but I keep my expectations low; that way, I'm rarely disappointed. ;-)
So, I read a whole bunch more and see this: 

"STN_FREE_SPACE_THRESHOLD sets the minimum amount of free space (in MB) to be available in the repository. If the Stone cannot maintain this level by growing an extent, it begins actions to prevent shutdown of the system; for information, see “Repository Full” on page 190."

  So, to answer my own question (correctly, I hope): the repository will grow an extent as long as there is available
disk space. Sounds cool to me. I'm sorry that I'm so hopeless with these admin tasks; I never administered a
Gemstone system in my previous jobs, I just wrote domain code. That was hard enough.
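
For reference, the knobs involved live in the stone's config file; a sketch of the relevant lines (parameter names as in the shipped config file, values purely illustrative):

        # excerpt from a stone config file such as system.conf
        DBF_EXTENT_NAMES = $GEMSTONE_DATADIR/extent0.dbf;
        DBF_EXTENT_SIZES = ;              # empty = no fixed limit, the extent grows as needed
        STN_FREE_SPACE_THRESHOLD = 1;     # MB of free space the stone tries to maintain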

>> So, I guess this is nil because there are no new transactions after the restore, but I am not sure…
>
> Once you commit the restore, then there are never any more transaction logs to apply.


  Ok, that makes sense. I'm still kind of confused about how to handle transaction logs going forward, though. 
How do I know which ones I can safely delete? When? Is it overkill to do two full backups a day and 
keep rotating them to my Staging environment?




Re: Extent size explosion

James Foster-8
On Feb 14, 2012, at 6:38 PM, Lawrence Kellogg wrote:
> I'm still kind of confused about how to handle transaction logs going forward, though.
> How do I know which ones I can safely delete? When? Is it overkill to do two full backups a day and
> keep rotating them to my Staging environment?

The general policy for backups and transaction logs is very much the same for GemStone as it is for any other database system. You need to keep enough redundancy to recover from some statistical risk of loss. A typical rule is "no single point of failure" which means two of everything.

The standard recommendation is to have at least three disks: one for the OS, swap space, and backups; one for the extents; and one for the transaction logs. (In addition to providing duplication, this helps performance since it allows the transaction logs to be written without competition from other activity.) Ideally, each of these would be on redundant disks (e.g., RAID).
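
As a sketch, that layout maps onto the stone's config file roughly like this (paths are made up; DBF_EXTENT_NAMES, STN_TRAN_LOG_DIRECTORIES, and STN_TRAN_LOG_SIZES are the standard parameters, but check your own conf file):

        # OS, swap, and backups on the system disk; extents and tranlogs each on their own disk
        DBF_EXTENT_NAMES = /gsdata/extents/extent0.dbf;
        STN_TRAN_LOG_DIRECTORIES = /gsdata/tranlogs/, /gsdata/tranlogs/;
        STN_TRAN_LOG_SIZES = 100, 100;    # MB per log before the stone switches to the next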

If the system crashes without any disk failures (e.g., power loss), then on restart GemStone recognizes that things were not shut down cleanly and it replays the transactions since the last checkpoint. If there is a failure of the OS disk, then when it is repaired GemStone will recover just as if it were another OS crash. On restart you should take a current backup (or two!).

If there is a failure of the extent disk, then when it is replaced you restore the backup and replay the transaction logs.

If there is a failure of the transaction disk, then the system will pause until you provide a new disk for transactions (which can be done while the system is up).

In any case, it is generally a good idea to keep two backups and the transaction logs going back to the beginning of the earlier backup.

If you want to keep a "warm standby" then you have a second system in restore mode and each time you finish a transaction log on the production system you restore it to the standby system. I know of at least one customer who has the standby system in a second data center and transfers a log every 15 minutes. (You don't have to wait for it to be full; you can explicitly start a new log.)
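
A rough sketch of the log-shipping step (assuming the startNewLog selector is available in your version and that the standby replays the shipped logs per the restore chapter of the Admin Guide; host and file names are placeholders):

        topaz 1> printit
        SystemRepository startNewLog.
        %

        # then ship the just-closed tranlog to the standby for replay there
        scp /gsdata/tranlogs/tranlog42.dbf standby-host:/gsdata/tranlogs/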

How this plays into Amazon, I'm not sure. I assume that you get only one disk, and it is on a shared SAN (and can have performance problems!), but almost certainly has RAID. You have the option of specifying different data centers for different instances.

Silly configuration issue...

Larry Kellogg
If this is true:

[seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
/opt/gemstone/product/seaside/data/system.conf


why does Gemstone insist on trying to start the stone in the other directory, the one with no extent0.dbf???

I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
can't figure out why this is failing. Luckily it's just Staging, so it can wait. I had shut down the stone
to restore a backup and then replay log files… but I don't understand this error…

Larry



[seasideuser@ip-10-191-194-75 data]$ startstone seaside
startstone[Info]: GemStone version '2.4.4.1'
startstone[Info]: Starting Stone repository monitor "seaside".
startstone[Info]: GEMSTONE is: "/opt/gemstone/product".
startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so
startstone[Info]:
    GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf  <<<<<<<<<<<<<<<<????????????????
    GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<????????????????
startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'.

startstone[Error]: Stone process (id=14170) has died.
startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information.  Excerpt follows:
        configuredSize 1000 MB
      Directory   1:
        configured name $GEMSTONE_DATADIR/
        expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/
        configuredSize 1000 MB
    -------------------------------------------------------

    GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
       reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
DBF Op: Open; DBF Record: -1;
Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)

    An error occurred opening the repository for exclusive access.

    Stone startup has failed.


Re: Silly configuration issue...

James Foster-8
Larry,

What do you get from 'ls -alF /opt/gemstone'? What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file!

Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff.
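
For illustration, a sketch of that layout (the product symlink mirrors the one already on Larry's box; the conf file then points the extents, tranlogs, and logs at these directories):

        mkdir -p /opt/gemstone/etc /opt/gemstone/data /opt/gemstone/log
        # 'product' points at the installed server, so an upgrade is just a re-link
        ln -sfn /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux /opt/gemstone/product
        # keep the config files outside the product tree
        cp /opt/gemstone/product/seaside/data/system.conf /opt/gemstone/etc/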

-James


Reply | Threaded
Open this post in threaded view
|

Re: Silly configuration issue...

Larry Kellogg

On Feb 14, 2012, at 11:40 PM, James Foster wrote:

> Larry,
>
> What do you get from 'ls -alF /opt/gemstone'? What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file!
>

[seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone
total 28
drwxrwx---  5 seasideuser seasideuser  4096 Oct 12 19:18 ./
drwxr-xr-x  4 root        root         4096 Oct 12 19:18 ../
drwxr-xr-x 17 seasideuser seasideuser  4096 Oct 13 02:15 GemStone64Bit2.4.4.1-x86_64.Linux/
drwxrwx---  2 seasideuser seasideuser  4096 Feb 15 04:18 locks/
drwxrwxrwx  3 seasideuser seasideuser 12288 Feb 15 04:05 log/
lrwxrwxrwx  1 seasideuser seasideuser    47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/
[seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone/product/seaside/data
total 47144
drwxrwxr-x 4 seasideuser seasideuser     4096 Feb 15 04:26 ./
drwxrwxr-x 9 seasideuser seasideuser     4096 Jul 13  2010 ../
drwxrwxr-x 2 seasideuser seasideuser     4096 Feb 15 02:41 backups/
-rw------- 1 root        root        14680064 Feb 15 03:40 extent0.dbf
-rw-r--r-- 1 seasideuser seasideuser      229 Jul 13  2010 gem.conf
drwxr-xr-x 2 root        root            4096 Feb 15 03:40 old/
-rw-r--r-- 1 seasideuser seasideuser      478 Jul 13  2010 system.conf
-rw-rw-r-- 1 seasideuser seasideuser    35840 Feb 14 20:56 tranlog1.dbf
-rw-rw-r-- 1 seasideuser seasideuser  4320256 Feb 15 03:39 tranlog2.dbf
-rw-rw-r-- 1 seasideuser seasideuser   753152 Feb 15 03:24 tranlog5.dbf
-rw-rw-r-- 1 seasideuser seasideuser    10240 Feb 15 03:24 tranlog6.dbf
-rw-rw-r-- 1 seasideuser seasideuser 28443136 Feb 15 03:27 tranlog7.dbf
[seasideuser@ip-10-191-194-75 ~]$

This seems right to me, unless I'm missing something. Is the symbolic link for product correct?
I see gem.conf and system.conf files. I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.
I did a ". defSeaside" and that runs ok.

Just when I think I understand how it all works, I run into these sorts of issues that leave me stymied. I will migrate to
2.4.5 when I get a few spare moments. It's not easy being the admin/programmer/designer/customer service rep/marketeer, but it
sure is a lot of fun.

  Larry
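
A side note on the listing above: extent0.dbf (and the old/ directory) is owned by root with mode -rw-------, while the stone is being started as seasideuser. That would be consistent with the errno=13 EACCES failure in the startstone log. Assuming the stone really should run as seasideuser and sudo is available, a minimal sketch of checking and fixing the ownership:

# confirm who owns the extent
ls -l /opt/gemstone/product/seaside/data/extent0.dbf

# hand the files back to the user that runs the stone
sudo chown seasideuser:seasideuser /opt/gemstone/product/seaside/data/extent0.dbf
sudo chown -R seasideuser:seasideuser /opt/gemstone/product/seaside/data/old

# then try again
startstone seaside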
 


> Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff.
>
> -James
>
> On Feb 14, 2012, at 8:23 PM, Lawrence Kellogg wrote:
>
>> If this is true:
>>
>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>> /opt/gemstone/product/seaside/data/system.conf
>>
>>
>> why does Gemstone insist in trying to start the stone in the other directory, the one with no extent0.dbf???
>>
>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>> can't figure out why this is failing. :luckily it's just Staging so it can wait. I had shut down the stone
>> to restore a backup and then reply log files….but I don't understand this error…
>>
>> Larry
>>
>>
>>
>> [seasideuser@ip-10-191-194-75 data]$ startstone seaside
>> startstone[Info]: GemStone version '2.4.4.1'
>> startstone[Info]: Starting Stone repository monitor "seaside".
>> startstone[Info]: GEMSTONE is: "/opt/gemstone/product".
>> startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so
>> startstone[Info]:
>>   GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf  <<<<<<<<<<<<<<<<????????????????
>>   GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<????????????????
>> startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'.
>>
>> startstone[Error]: Stone process (id=14170) has died.
>> startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information.  Excerpt follows:
>>       configuredSize 1000 MB
>>     Directory   1:
>>       configured name $GEMSTONE_DATADIR/
>>       expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/
>>       configuredSize 1000 MB
>>   -------------------------------------------------------
>>
>>   GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>      reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>> DBF Op: Open; DBF Record: -1;
>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)
>>
>>   An error occurred opening the repository for exclusive access.
>>
>>   Stone startup has failed.
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: slow data page reads?

Nick
In reply to this post by Johan Brichau-2
As another reference I have two Gemstone 2.4.4.1 servers.

The first runs on an EC2 micro-instance (613 MB RAM):

$ $GEMSTONE/sys/pgsvrslow
'/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 3600 ms
Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 197 ms
Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read)
Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read)

----

The second is running on a more sensibly sized (2G RAM) Linode instance:

PGSVR>10000 testreadrate

10000 random pages read in 67836 ms
Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 798 ms
Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read)
Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read)

As the first Linode run was so slow, I repeated it:

PGSVR>10000 testreadrate

10000 random pages read in 28384 ms
Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read)

and again:

PGSVR>10000 testreadrate

10000 random pages read in 12671 ms
Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read)

odd...

The EC2 instance has recently been through a backup and restore and has a smaller extent (320M), whereas the Linode instance hasn't been through a backup and restore and has a very large extent (4.9G). Perhaps there is some memory paging which is affecting performance?

Nick
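
One rough way to check the paging hypothesis with standard Linux tools (nothing GemStone-specific here) is to watch swap and cache activity while the test runs:

# in a second terminal, sample memory/swap activity every 5 seconds;
# watch the si/so columns for swap-in/swap-out and the cache column for growth
vmstat 5

# snapshot memory and page cache before and after a testreadrate run
free -m

If si/so stay at zero and the cached figure grows by roughly the amount of extent data read, the slow first run looks more like a cold page cache than swapping.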

On 14 February 2012 19:21, Johan Brichau <[hidden email]> wrote:
Hi Dale,

Thanks for that pointer!

Your numbers are quite impressive ... even on my local macbook pro I'm getting only numbers like the one below.
Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats.

----

PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 29371 ms
Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 2163 ms
Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read)


On 14 Feb 2012, at 19:00, Dale Henrichs wrote:

> Johan,
>
> We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt :
>
>  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
>  <numpages> testreadrate
>  <numpages in block> <numsamples> testbigreadrate
>
> The `testreadrate` command does reads <numpages> random pages from the given extent. The answer you get gives random read performance.
>
> The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
>
> Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer):
>
> ---------------------------------------------------------------------------------
> % $GEMSTONE/sys/pgsvrslow
> PGSVR>'extent0.dbf' opendbfnolock
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 16 ms
> Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
>
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 4 ms
> Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> PGSVR>
> ---------------------------------------------------------------------------------
>
> These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
>
> At the moment we're guessing that that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also are you sure the you aren't be throttled by your provider?
>
> Finally it is worth looking at a copy of the config file for the stone to see if there's anything there...
>
> Dale
>
> ----- Original Message -----
> | From: "Johan Brichau" <[hidden email]>
> | To: "GemStone Seaside beta discussion" <[hidden email]>
> | Sent: Tuesday, February 14, 2012 5:43:58 AM
> | Subject: Re: [GS/SS Beta] slow data page reads?
> |
> | As mentioned in Dale's blogpost, I went on to try a raw disk
> | partition for the extent and the tranlogs and got exactly the same
> | results: *very* low disk read speed (see below). Starting Gemstone
> | and reading the SPC takes a long time.
> |
> | We are pretty certain the SAN is not overloaded because all other
> | disk operations can reach a lot higher speeds. For example, the
> | copydbf operation from the extent file to the partition reached very
> | good speeds of over 30MB/s.
> |
> | So we are only seeing this issue when gemstone is doing read access
> | on this kind of setup. I have other servers where everything is
> | running smoothly.
> |
> | If anybody has any ideas... that would be cool ;-)
> |
> | Johan
> |
> | Sample read speed during gemstone page read:
> |
> | Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | avgrq-sz avgqu-sz   await  svctm  %util
> | sda5            111.60     0.00 37.00  0.00     0.58     0.00
> |    32.00     1.00   26.90  27.01  99.92
> |
> |
> | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> |
> | > Well.. it turns out that we were wrong and we still experience the
> | > problem...
> | >
> | > Dale,
> | >
> | > What we are seeing sounds very similar to this:
> | >
> | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> | >
> | > " The issue with the i/o anomalies that we observed in Linux has
> | > not been as easy to resolve. I spent some time tuning GemStone/S
> | > to make sure that GemStone/S wasn't the source of the anomaly.
> | > Finally our IS guy was able to reproduce the anomaly and he ran
> | > into a few other folks on the net that have observed similar
> | > anomalies.
> | >
> | > At this writing we haven't found a solution to the anomaly, but we
> | > are pretty optimistic that it is resolvable. We've seen different
> | > versions of Linux running on similar hardware that doesn't show
> | > the anomaly, so it is either a function of the kernel version or
> | > the settings of some of the kernel parameters. As soon as we
> | > figure it out we'll let you know."
> | >
> | > Do you have more information on this?
> | >
> | > Johan
> | >
> | >
> | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> | >
> | >> Hi Johan,
> | >>
> | >> We had a machine hosted on a VPS, with a "state of the art" san,
> | >> with
> | >> similar issues. We complained every so often and the service
> | >> provider
> | >> responded with their inability to control some users on the same
> | >> VPS
> | >> host doing "extremely heavy" disk io. We got the client off the
> | >> vps
> | >> onto a normal machine with a SATA disk and have had joy ever since
> | >> (10-20x improvement with the vps at its best).
> | >>
> | >> I think that the randomness of the reads thrown on top of other
> | >> vms on
> | >> the same host just caused unpredictable io; so we prefer avoiding
> | >> vms.
> | >>
> | >> Alternatively, if it can work for you, put the extents in RAM.
> | >>
> | >> Otto
> | >>
> | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
> | >> wrote:
> | >>
> | >>> Hi all,
> | >>>
> | >>> Never mind my question below: our hosters have identified the
> | >>> problem on their SAN.
> | >>> Strange behavior though...
> | >>>
> | >>> phew ;-)
> | >>> Johan
> | >>>
> | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> | >>>
> | >>>> Hi Gemstoners,
> | >>>>
> | >>>> Is there any condition (other than a slow filesystem) that would
> | >>>> trigger slow page reads when a gem needs to hit disk and load
> | >>>> objects?
> | >>>>
> | >>>> Here is the problem I'm trying to chase: a seaside gem is
> | >>>> processing a request and (according to the statmonit output)
> | >>>> ends up requesting pages. The pageread process goes terribly
> | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per
> | >>>> second being read during that time period. There is no other
> | >>>> activity at that moment and I'm puzzled by why the read goes so
> | >>>> slow (other than a slow filesystem -- see next).
> | >>>>
> | >>>> Because the iostat system monitoring also shows the same low
> | >>>> read speed and indicates a 100% disk util statistic, my obvious
> | >>>> first impression was that the disk is saturated and we have
> | >>>> datastore problem. However, the disk read speed proves to be
> | >>>> good when I'm doing other disk activity outside of Gemstone.
> | >>>> Moreover, the _write_ speed is terribly good at all times.
> | >>>>
> | >>>> So, I'm currently trying to chase something that only triggers
> | >>>> slow page read speed from a Gemstone topaz session.
> | >>>>
> | >>>> GEM_IO_LIMIT is set at default setting of 5000
> | >>>>
> | >>>> For illustration, these are some kind of io stats when Gemstone
> | >>>> is doing read access:
> | >>>>
> | >>>> Time: 06:40:21 PM
> | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> | >>>> sda3              0.00     0.20  6.00  0.40     0.09     0.00
> | >>>>    30.75     1.00  166.88 156.00  99.84
> | >>>>
> | >>>> Time: 06:40:26 PM
> | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> | >>>> sda3              0.00     0.20  8.20  0.40     0.13     0.00
> | >>>>    31.07     1.05  119.91 115.72  99.52
> | >>>>
> | >>>> Time: 06:40:31 PM
> | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> | >>>> sda3              0.00     0.20  5.99  0.40     0.09     0.00
> | >>>>    30.75     1.01  157.75 156.25  99.80
> | >>>
> | >
> |
> |


Reply | Threaded
Open this post in threaded view
|

Re: slow data page reads?

Johan Brichau-2
Nick,

Thanks for those figures. At least that gives me some comparison on 'cloud' infrastructure.

First off: when I run the test on an extent that is not in use by a running stone, I'm getting *very* good statistics (on the order of those shown by Dale...).
Dale: is that normal?

I also notice that there is a correlation with extent size, although I suspect it has more to do with file system buffers. I have also had the chance to measure more than 10,000 pages/s random read performance on an extent of 5 GB (with 3 GB free space). I'm not an expert on this (by far), but I suspect that smaller extents get buffered more quickly and that the same pages get read multiple times because their total number is smaller (one way to control for this is sketched after the runs below).

Here are some representative runs on operational extents:

PGSVR>10000 testreadrate

10000 random pages read in 128217 ms
Avg random read rate: 77.99 pages/s (1.2822e+01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 4713 ms
Avg random IO rate: 4.24 IO/s (2.3565e+02 ms/read)
Avg page read rate: 424.36 pages/s (2.3565e+00 ms/ page read)

*****
PGSVR>10000 testreadrate        

10000 random pages read in 95294 ms
Avg random read rate: 104.94 pages/s (9.5294e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 3378 ms
Avg random IO rate: 5.92 IO/s (1.6890e+02 ms/read)
Avg page read rate: 592.07 pages/s (1.6890e+00 ms/ page read)
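
To control for the file system buffering suspected above, one rough approach on Linux (it needs root, and the drop_caches interface is only present on reasonably recent kernels) is to flush the page cache before each measurement and then repeat the same PGSVR commands:

# as root: write out dirty pages and drop the page cache
sync
echo 3 > /proc/sys/vm/drop_caches

# then repeat the measurement cold
$GEMSTONE/sys/pgsvrslow
PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

Cold-cache numbers taken this way should be comparable across extents of different sizes; warm-cache numbers mostly tell you how much of the extent already fits in RAM.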

On 15 Feb 2012, at 12:41, Nick Ager wrote:

> As another reference I have two Gemstone 2.4.4.1 servers.
>
> The first runs on a EC2 micro-instance (613 MB):
>
> $ $GEMSTONE/sys/pgsvrslow
> '/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 3600 ms
> Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 197 ms
> Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read)
> Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read)
>
> ----
>
> The second is running on a more sensibly sized (2G RAM) Linode instance:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 67836 ms
> Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 798 ms
> Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read)
> Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read)
>
> as the first was so slow I repeated:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 28384 ms
> Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read)
>
> and again:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 12671 ms
> Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read)
>
> odd...
>
> The EC2 instance has recently been through a backup and restore and has a smaller extent (320M). Whereas the Linode instance hasn't been through a backup and restore and has a very large extent (4.9G). Perhaps there is some memory paging which is effecting performance?
>
> Nick
>
> On 14 February 2012 19:21, Johan Brichau <[hidden email]> wrote:
> Hi Dale,
>
> Thanks for that pointer!
>
> Your numbers are quite impressive ... even on my local macbook pro I'm getting only numbers like the one below.
> Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats.
>
> ----
>
> PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 29371 ms
> Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 2163 ms
> Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
> Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read)
>
>
> On 14 Feb 2012, at 19:00, Dale Henrichs wrote:
>
> > Johan,
> >
> > We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt :
> >
> >  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
> >  <numpages> testreadrate
> >  <numpages in block> <numsamples> testbigreadrate
> >
> > The `testreadrate` command does reads <numpages> random pages from the given extent. The answer you get gives random read performance.
> >
> > The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
> >
> > Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer):
> >
> > ---------------------------------------------------------------------------------
> > % $GEMSTONE/sys/pgsvrslow
> > PGSVR>'extent0.dbf' opendbfnolock
> >
> > PGSVR>10000 testreadrate
> >
> > 10000 random pages read in 16 ms
> > Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
> >
> > PGSVR>100 20 testbigreadrate
> >
> > 2000 random pages read in 20 IO calls in 4 ms
> > Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> > Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> > PGSVR>
> > ---------------------------------------------------------------------------------
> >
> > These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
> >
> > At the moment we're guessing that that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also are you sure the you aren't be throttled by your provider?
> >
> > Finally it is worth looking at a copy of the config file for the stone to see if there's anything there...
> >
> > Dale
> >
> > ----- Original Message -----
> > | From: "Johan Brichau" <[hidden email]>
> > | To: "GemStone Seaside beta discussion" <[hidden email]>
> > | Sent: Tuesday, February 14, 2012 5:43:58 AM
> > | Subject: Re: [GS/SS Beta] slow data page reads?
> > |
> > | As mentioned in Dale's blogpost, I went on to try a raw disk
> > | partition for the extent and the tranlogs and got exactly the same
> > | results: *very* low disk read speed (see below). Starting Gemstone
> > | and reading the SPC takes a long time.
> > |
> > | We are pretty certain the SAN is not overloaded because all other
> > | disk operations can reach a lot higher speeds. For example, the
> > | copydbf operation from the extent file to the partition reached very
> > | good speeds of over 30MB/s.
> > |
> > | So we are only seeing this issue when gemstone is doing read access
> > | on this kind of setup. I have other servers where everything is
> > | running smoothly.
> > |
> > | If anybody has any ideas... that would be cool ;-)
> > |
> > | Johan
> > |
> > | Sample read speed during gemstone page read:
> > |
> > | Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | avgrq-sz avgqu-sz   await  svctm  %util
> > | sda5            111.60     0.00 37.00  0.00     0.58     0.00
> > |    32.00     1.00   26.90  27.01  99.92
> > |
> > |
> > | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> > |
> > | > Well.. it turns out that we were wrong and we still experience the
> > | > problem...
> > | >
> > | > Dale,
> > | >
> > | > What we are seeing sounds very similar to this:
> > | >
> > | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> > | >
> > | > " The issue with the i/o anomalies that we observed in Linux has
> > | > not been as easy to resolve. I spent some time tuning GemStone/S
> > | > to make sure that GemStone/S wasn't the source of the anomaly.
> > | > Finally our IS guy was able to reproduce the anomaly and he ran
> > | > into a few other folks on the net that have observed similar
> > | > anomalies.
> > | >
> > | > At this writing we haven't found a solution to the anomaly, but we
> > | > are pretty optimistic that it is resolvable. We've seen different
> > | > versions of Linux running on similar hardware that doesn't show
> > | > the anomaly, so it is either a function of the kernel version or
> > | > the settings of some of the kernel parameters. As soon as we
> > | > figure it out we'll let you know."
> > | >
> > | > Do you have more information on this?
> > | >
> > | > Johan
> > | >
> > | >
> > | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> > | >
> > | >> Hi Johan,
> > | >>
> > | >> We had a machine hosted on a VPS, with a "state of the art" san,
> > | >> with
> > | >> similar issues. We complained every so often and the service
> > | >> provider
> > | >> responded with their inability to control some users on the same
> > | >> VPS
> > | >> host doing "extremely heavy" disk io. We got the client off the
> > | >> vps
> > | >> onto a normal machine with a SATA disk and have had joy ever since
> > | >> (10-20x improvement with the vps at its best).
> > | >>
> > | >> I think that the randomness of the reads thrown on top of other
> > | >> vms on
> > | >> the same host just caused unpredictable io; so we prefer avoiding
> > | >> vms.
> > | >>
> > | >> Alternatively, if it can work for you, put the extents in RAM.
> > | >>
> > | >> Otto
> > | >>
> > | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
> > | >> wrote:
> > | >>
> > | >>> Hi all,
> > | >>>
> > | >>> Never mind my question below: our hosters have identified the
> > | >>> problem on their SAN.
> > | >>> Strange behavior though...
> > | >>>
> > | >>> phew ;-)
> > | >>> Johan
> > | >>>
> > | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> > | >>>
> > | >>>> Hi Gemstoners,
> > | >>>>
> > | >>>> Is there any condition (other than a slow filesystem) that would
> > | >>>> trigger slow page reads when a gem needs to hit disk and load
> > | >>>> objects?
> > | >>>>
> > | >>>> Here is the problem I'm trying to chase: a seaside gem is
> > | >>>> processing a request and (according to the statmonit output)
> > | >>>> ends up requesting pages. The pageread process goes terribly
> > | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per
> > | >>>> second being read during that time period. There is no other
> > | >>>> activity at that moment and I'm puzzled by why the read goes so
> > | >>>> slow (other than a slow filesystem -- see next).
> > | >>>>
> > | >>>> Because the iostat system monitoring also shows the same low
> > | >>>> read speed and indicates a 100% disk util statistic, my obvious
> > | >>>> first impression was that the disk is saturated and we have
> > | >>>> datastore problem. However, the disk read speed proves to be
> > | >>>> good when I'm doing other disk activity outside of Gemstone.
> > | >>>> Moreover, the _write_ speed is terribly good at all times.
> > | >>>>
> > | >>>> So, I'm currently trying to chase something that only triggers
> > | >>>> slow page read speed from a Gemstone topaz session.
> > | >>>>
> > | >>>> GEM_IO_LIMIT is set at default setting of 5000
> > | >>>>
> > | >>>> For illustration, these are some kind of io stats when Gemstone
> > | >>>> is doing read access:
> > | >>>>
> > | >>>> Time: 06:40:21 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  6.00  0.40     0.09     0.00
> > | >>>>    30.75     1.00  166.88 156.00  99.84
> > | >>>>
> > | >>>> Time: 06:40:26 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  8.20  0.40     0.13     0.00
> > | >>>>    31.07     1.05  119.91 115.72  99.52
> > | >>>>
> > | >>>> Time: 06:40:31 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  5.99  0.40     0.09     0.00
> > | >>>>    30.75     1.01  157.75 156.25  99.80
> > | >>>
> > | >
> > |
> > |
>
>

Reply | Threaded
Open this post in threaded view
|

Re: slow data page reads?

Nick
Hi Johan,

Aren't there two extent-related variables? The size of the extent used in the test, and the size of the extent in use by the running stone when the test is executed.

I'll try the tests again using the base 48M /opt/gemstone/product/bin/extent0.seaside.dbf extent and get back to you.

Nick

On 15 February 2012 12:46, Johan Brichau <[hidden email]> wrote:
Nick,

Thanks for those figures. At least that gives me some comparison on 'cloud' infrastructure.

First off: when I run the test on an extent that is not running, I'm getting *very* good statistics (in the order of those shown by Dale...).
Dale: is that normal?

I also notice that there is a correlation with extent size, although I suspect it has more to do with file system buffers. I have also had the chance to measure +10000 pages/s random read performance on an extent of 5Gb as well (with 3Gb free space). I'm not an expert on this (by far) but I suspect that smaller extents get buffered more quickly and that the same pages get read multiple times due to their total number being smaller.

Here are some representative runs on operational extents:

PGSVR>10000 testreadrate

10000 random pages read in 128217 ms
Avg random read rate: 77.99 pages/s (1.2822e+01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 4713 ms
Avg random IO rate: 4.24 IO/s (2.3565e+02 ms/read)
Avg page read rate: 424.36 pages/s (2.3565e+00 ms/ page read)

*****
PGSVR>10000 testreadrate

10000 random pages read in 95294 ms
Avg random read rate: 104.94 pages/s (9.5294e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 3378 ms
Avg random IO rate: 5.92 IO/s (1.6890e+02 ms/read)
Avg page read rate: 592.07 pages/s (1.6890e+00 ms/ page read)

On 15 Feb 2012, at 12:41, Nick Ager wrote:

> As another reference I have two Gemstone 2.4.4.1 servers.
>
> The first runs on a EC2 micro-instance (613 MB):
>
> $ $GEMSTONE/sys/pgsvrslow
> '/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 3600 ms
> Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 197 ms
> Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read)
> Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read)
>
> ----
>
> The second is running on a more sensibly sized (2G RAM) Linode instance:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 67836 ms
> Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 798 ms
> Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read)
> Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read)
>
> as the first was so slow I repeated:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 28384 ms
> Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read)
>
> and again:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 12671 ms
> Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read)
>
> odd...
>
> The EC2 instance has recently been through a backup and restore and has a smaller extent (320M). Whereas the Linode instance hasn't been through a backup and restore and has a very large extent (4.9G). Perhaps there is some memory paging which is effecting performance?
>
> Nick
>
> On 14 February 2012 19:21, Johan Brichau <[hidden email]> wrote:
> Hi Dale,
>
> Thanks for that pointer!
>
> Your numbers are quite impressive ... even on my local macbook pro I'm getting only numbers like the one below.
> Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats.
>
> ----
>
> PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 29371 ms
> Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 2163 ms
> Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
> Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read)
>
>
> On 14 Feb 2012, at 19:00, Dale Henrichs wrote:
>
> > Johan,
> >
> > We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt :
> >
> >  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
> >  <numpages> testreadrate
> >  <numpages in block> <numsamples> testbigreadrate
> >
> > The `testreadrate` command does reads <numpages> random pages from the given extent. The answer you get gives random read performance.
> >
> > The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
> >
> > Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer):
> >
> > ---------------------------------------------------------------------------------
> > % $GEMSTONE/sys/pgsvrslow
> > PGSVR>'extent0.dbf' opendbfnolock
> >
> > PGSVR>10000 testreadrate
> >
> > 10000 random pages read in 16 ms
> > Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
> >
> > PGSVR>100 20 testbigreadrate
> >
> > 2000 random pages read in 20 IO calls in 4 ms
> > Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> > Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> > PGSVR>
> > ---------------------------------------------------------------------------------
> >
> > These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
> >
> > At the moment we're guessing that that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also are you sure the you aren't be throttled by your provider?
> >
> > Finally it is worth looking at a copy of the config file for the stone to see if there's anything there...
> >
> > Dale
> >
> > ----- Original Message -----
> > | From: "Johan Brichau" <[hidden email]>
> > | To: "GemStone Seaside beta discussion" <[hidden email]>
> > | Sent: Tuesday, February 14, 2012 5:43:58 AM
> > | Subject: Re: [GS/SS Beta] slow data page reads?
> > |
> > | As mentioned in Dale's blogpost, I went on to try a raw disk
> > | partition for the extent and the tranlogs and got exactly the same
> > | results: *very* low disk read speed (see below). Starting Gemstone
> > | and reading the SPC takes a long time.
> > |
> > | We are pretty certain the SAN is not overloaded because all other
> > | disk operations can reach a lot higher speeds. For example, the
> > | copydbf operation from the extent file to the partition reached very
> > | good speeds of over 30MB/s.
> > |
> > | So we are only seeing this issue when gemstone is doing read access
> > | on this kind of setup. I have other servers where everything is
> > | running smoothly.
> > |
> > | If anybody has any ideas... that would be cool ;-)
> > |
> > | Johan
> > |
> > | Sample read speed during gemstone page read:
> > |
> > | Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | avgrq-sz avgqu-sz   await  svctm  %util
> > | sda5            111.60     0.00 37.00  0.00     0.58     0.00
> > |    32.00     1.00   26.90  27.01  99.92
> > |
> > |
> > | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> > |
> > | > Well.. it turns out that we were wrong and we still experience the
> > | > problem...
> > | >
> > | > Dale,
> > | >
> > | > What we are seeing sounds very similar to this:
> > | >
> > | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> > | >
> > | > " The issue with the i/o anomalies that we observed in Linux has
> > | > not been as easy to resolve. I spent some time tuning GemStone/S
> > | > to make sure that GemStone/S wasn't the source of the anomaly.
> > | > Finally our IS guy was able to reproduce the anomaly and he ran
> > | > into a few other folks on the net that have observed similar
> > | > anomalies.
> > | >
> > | > At this writing we haven't found a solution to the anomaly, but we
> > | > are pretty optimistic that it is resolvable. We've seen different
> > | > versions of Linux running on similar hardware that doesn't show
> > | > the anomaly, so it is either a function of the kernel version or
> > | > the settings of some of the kernel parameters. As soon as we
> > | > figure it out we'll let you know."
> > | >
> > | > Do you have more information on this?
> > | >
> > | > Johan
> > | >
> > | >
> > | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> > | >
> > | >> Hi Johan,
> > | >>
> > | >> We had a machine hosted on a VPS, with a "state of the art" san,
> > | >> with
> > | >> similar issues. We complained every so often and the service
> > | >> provider
> > | >> responded with their inability to control some users on the same
> > | >> VPS
> > | >> host doing "extremely heavy" disk io. We got the client off the
> > | >> vps
> > | >> onto a normal machine with a SATA disk and have had joy ever since
> > | >> (10-20x improvement with the vps at its best).
> > | >>
> > | >> I think that the randomness of the reads thrown on top of other
> > | >> vms on
> > | >> the same host just caused unpredictable io; so we prefer avoiding
> > | >> vms.
> > | >>
> > | >> Alternatively, if it can work for you, put the extents in RAM.
> > | >>
> > | >> Otto
> > | >>
> > | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
> > | >> wrote:
> > | >>
> > | >>> Hi all,
> > | >>>
> > | >>> Never mind my question below: our hosters have identified the
> > | >>> problem on their SAN.
> > | >>> Strange behavior though...
> > | >>>
> > | >>> phew ;-)
> > | >>> Johan
> > | >>>
> > | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> > | >>>
> > | >>>> Hi Gemstoners,
> > | >>>>
> > | >>>> Is there any condition (other than a slow filesystem) that would
> > | >>>> trigger slow page reads when a gem needs to hit disk and load
> > | >>>> objects?
> > | >>>>
> > | >>>> Here is the problem I'm trying to chase: a seaside gem is
> > | >>>> processing a request and (according to the statmonit output)
> > | >>>> ends up requesting pages. The pageread process goes terribly
> > | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per
> > | >>>> second being read during that time period. There is no other
> > | >>>> activity at that moment and I'm puzzled by why the read goes so
> > | >>>> slow (other than a slow filesystem -- see next).
> > | >>>>
> > | >>>> Because the iostat system monitoring also shows the same low
> > | >>>> read speed and indicates a 100% disk util statistic, my obvious
> > | >>>> first impression was that the disk is saturated and we have
> > | >>>> datastore problem. However, the disk read speed proves to be
> > | >>>> good when I'm doing other disk activity outside of Gemstone.
> > | >>>> Moreover, the _write_ speed is terribly good at all times.
> > | >>>>
> > | >>>> So, I'm currently trying to chase something that only triggers
> > | >>>> slow page read speed from a Gemstone topaz session.
> > | >>>>
> > | >>>> GEM_IO_LIMIT is set at default setting of 5000
> > | >>>>
> > | >>>> For illustration, these are some kind of io stats when Gemstone
> > | >>>> is doing read access:
> > | >>>>
> > | >>>> Time: 06:40:21 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  6.00  0.40     0.09     0.00
> > | >>>>    30.75     1.00  166.88 156.00  99.84
> > | >>>>
> > | >>>> Time: 06:40:26 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  8.20  0.40     0.13     0.00
> > | >>>>    31.07     1.05  119.91 115.72  99.52
> > | >>>>
> > | >>>> Time: 06:40:31 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  5.99  0.40     0.09     0.00
> > | >>>>    30.75     1.01  157.75 156.25  99.80
> > | >>>
> > | >
> > |
> > |
>
>


Reply | Threaded
Open this post in threaded view
|

Re: slow data page reads?

NorbertHartl
In reply to this post by Dale Henrichs
I have a dedicated server at Hetzner hosting (server type EQ6). It runs Linux with OpenVZ as the virtualization layer. Storage is on SCSI disks with software RAID 1. We have 25 OpenVZ instances running, containing streaming servers and the like, but the I/O load is not too high. The following numbers are from a GemStone installation within one of the OpenVZ instances.

instance1:
----
PGSVR>'/opt/application/test/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 8538 ms
Avg random read rate: 1171.23 pages/s (8.5380e-01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 97 ms
Avg random IO rate: 206.19 IO/s (4.8500e+00 ms/read)
Avg page read rate: 20618.56 pages/s (4.8500e-02 ms/ page read)
PGSVR>

----
instance2:
----

PGSVR>'/opt/application/taskforus/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate


10000 random pages read in 185795 ms
Avg random read rate: 53.82 pages/s (1.8579e+01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 1009 ms
Avg random IO rate: 19.82 IO/s (5.0450e+01 ms/read)
Avg page read rate: 1982.16 pages/s (5.0450e-01 ms/ page read)

----

Instance1 and instance2 are nearly the same, but on instance1 I have a 150 MB extent. On instance2, which is a real test server where I do not garbage collect, the extent size is 29 GB.

Norbert

Am 14.02.2012 um 19:00 schrieb Dale Henrichs:

> Johan,
>
> We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt :
>
>  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
>  <numpages> testreadrate
>  <numpages in block> <numsamples> testbigreadrate
>
> The `testreadrate` command does reads <numpages> random pages from the given extent. The answer you get gives random read performance.
>
> The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
>
> Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer):
>
> ---------------------------------------------------------------------------------
> % $GEMSTONE/sys/pgsvrslow
> PGSVR>'extent0.dbf' opendbfnolock
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 16 ms
> Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
>
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 4 ms
> Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> PGSVR>
> ---------------------------------------------------------------------------------
>
> These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
>
> At the moment we're guessing that that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also are you sure the you aren't be throttled by your provider?
>
> Finally it is worth looking at a copy of the config file for the stone to see if there's anything there...
>
> Dale
>
> ----- Original Message -----
> | From: "Johan Brichau" <[hidden email]>
> | To: "GemStone Seaside beta discussion" <[hidden email]>
> | Sent: Tuesday, February 14, 2012 5:43:58 AM
> | Subject: Re: [GS/SS Beta] slow data page reads?
> |
> | As mentioned in Dale's blogpost, I went on to try a raw disk
> | partition for the extent and the tranlogs and got exactly the same
> | results: *very* low disk read speed (see below). Starting Gemstone
> | and reading the SPC takes a long time.
> |
> | We are pretty certain the SAN is not overloaded because all other
> | disk operations can reach a lot higher speeds. For example, the
> | copydbf operation from the extent file to the partition reached very
> | good speeds of over 30MB/s.
> |
> | So we are only seeing this issue when gemstone is doing read access
> | on this kind of setup. I have other servers where everything is
> | running smoothly.
> |
> | If anybody has any ideas... that would be cool ;-)
> |
> | Johan
> |
> | Sample read speed during gemstone page read:
> |
> | Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | avgrq-sz avgqu-sz   await  svctm  %util
> | sda5            111.60     0.00 37.00  0.00     0.58     0.00
> |    32.00     1.00   26.90  27.01  99.92
> |
> |
> | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> |
> | > Well.. it turns out that we were wrong and we still experience the
> | > problem...
> | >
> | > Dale,
> | >
> | > What we are seeing sounds very similar to this:
> | >
> | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> | >
> | > " The issue with the i/o anomalies that we observed in Linux has
> | > not been as easy to resolve. I spent some time tuning GemStone/S
> | > to make sure that GemStone/S wasn't the source of the anomaly.
> | > Finally our IS guy was able to reproduce the anomaly and he ran
> | > into a few other folks on the net that have observed similar
> | > anomalies.
> | >
> | > At this writing we haven't found a solution to the anomaly, but we
> | > are pretty optimistic that it is resolvable. We've seen different
> | > versions of Linux running on similar hardware that doesn't show
> | > the anomaly, so it is either a function of the kernel version or
> | > the settings of some of the kernel parameters. As soon as we
> | > figure it out we'll let you know."
> | >
> | > Do you have more information on this?
> | >
> | > Johan
> | >
> | >
> | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> | >
> | >> Hi Johan,
> | >>
> | >> We had a machine hosted on a VPS, with a "state of the art" san,
> | >> with
> | >> similar issues. We complained every so often and the service
> | >> provider
> | >> responded with their inability to control some users on the same
> | >> VPS
> | >> host doing "extremely heavy" disk io. We got the client off the
> | >> vps
> | >> onto a normal machine with a SATA disk and have had joy ever since
> | >> (10-20x improvement with the vps at its best).
> | >>
> | >> I think that the randomness of the reads thrown on top of other
> | >> vms on
> | >> the same host just caused unpredictable io; so we prefer avoiding
> | >> vms.
> | >>
> | >> Alternatively, if it can work for you, put the extents in RAM.
> | >>
> | >> Otto
> | >>
> | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
> | >> wrote:
> | >>
> | >>> Hi all,
> | >>>
> | >>> Never mind my question below: our hosters have identified the
> | >>> problem on their SAN.
> | >>> Strange behavior though...
> | >>>
> | >>> phew ;-)
> | >>> Johan
> | >>>
> | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> | >>>
> | >>>> Hi Gemstoners,
> | >>>>
> | >>>> Is there any condition (other than a slow filesystem) that would
> | >>>> trigger slow page reads when a gem needs to hit disk and load
> | >>>> objects?
> | >>>>
> | >>>> Here is the problem I'm trying to chase: a seaside gem is
> | >>>> processing a request and (according to the statmonit output)
> | >>>> ends up requesting pages. The pageread process goes terribly
> | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per
> | >>>> second being read during that time period. There is no other
> | >>>> activity at that moment and I'm puzzled by why the read goes so
> | >>>> slow (other than a slow filesystem -- see next).
> | >>>>
> | >>>> Because the iostat system monitoring also shows the same low
> | >>>> read speed and indicates a 100% disk util statistic, my obvious
> | >>>> first impression was that the disk is saturated and we have
> | >>>> datastore problem. However, the disk read speed proves to be
> | >>>> good when I'm doing other disk activity outside of Gemstone.
> | >>>> Moreover, the _write_ speed is terribly good at all times.
> | >>>>
> | >>>> So, I'm currently trying to chase something that only triggers
> | >>>> slow page read speed from a Gemstone topaz session.
> | >>>>
> | >>>> GEM_IO_LIMIT is set at default setting of 5000
> | >>>>
> | >>>> For illustration, these are some kind of io stats when Gemstone
> | >>>> is doing read access:
> | >>>>
> | >>>> Time: 06:40:21 PM
> | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> | >>>> sda3              0.00     0.20  6.00  0.40     0.09     0.00
> | >>>>    30.75     1.00  166.88 156.00  99.84
> | >>>>
> | >>>> Time: 06:40:26 PM
> | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> | >>>> sda3              0.00     0.20  8.20  0.40     0.13     0.00
> | >>>>    31.07     1.05  119.91 115.72  99.52
> | >>>>
> | >>>> Time: 06:40:31 PM
> | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
> | >>>> avgrq-sz avgqu-sz   await  svctm  %util
> | >>>> sda3              0.00     0.20  5.99  0.40     0.09     0.00
> | >>>>    30.75     1.01  157.75 156.25  99.80
> | >>>
> | >
> |
> |

Reply | Threaded
Open this post in threaded view
|

Re: slow data page reads?

Nick
In reply to this post by Nick
Rerunning tests against base 48M extent:

EC2 instance:

PGSVR>'/opt/gemstone/product/bin/extent0.seaside.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 2254 ms
Avg random read rate: 4436.56 pages/s (2.2540e-01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 13 ms
Avg random IO rate: 1538.46 IO/s (6.5000e-01 ms/read)
Avg page read rate: 153846.15 pages/s (6.5000e-03 ms/ page read)

---

Linode

PGSVR>'/opt/gemstone/product/bin/extent0.seaside.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 139 ms
Avg random read rate: 71942.45 pages/s (1.3900e-02 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 10 ms
Avg random IO rate: 2000.00 IO/s (5.0000e-01 ms/read)
Avg page read rate: 200000.00 pages/s (5.0000e-03 ms/ page read)

---

Again with the EC2 320M (170M free space) extent:

PGSVR>'/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 3927 ms
Avg random read rate: 2546.47 pages/s (3.9270e-01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 136 ms
Avg random IO rate: 147.06 IO/s (6.8000e+00 ms/read)
Avg page read rate: 14705.88 pages/s (6.8000e-02 ms/ page read)

-----

Again with the 4.9G (4.2G free space) Linode extent:

'/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate

10000 random pages read in 38912 ms
Avg random read rate: 256.99 pages/s (3.8912e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 623 ms
Avg random IO rate: 32.10 IO/s (3.1150e+01 ms/read)
Avg page read rate: 3210.27 pages/s (3.1150e-01 ms/ page read)


 
On 15 February 2012 13:22, Nick Ager <[hidden email]> wrote:
Hi Johan,

Aren't there two extent related variables. The size of the extent used in the test and the size of the extent in-use by the running stone when the test is executed.

I'll try the tests again using the base 48M /opt/gemstone/product/bin/extent0.seaside.dbf extent and get back to you

Nick

On 15 February 2012 12:46, Johan Brichau <[hidden email]> wrote:
Nick,

Thanks for those figures. At least that gives me some comparison on 'cloud' infrastructure.

First off: when I run the test on an extent that is not running, I'm getting *very* good statistics (in the order of those shown by Dale...).
Dale: is that normal?

I also notice that there is a correlation with extent size, although I suspect it has more to do with file system buffers. I have also had the chance to measure +10000 pages/s random read performance on an extent of 5Gb as well (with 3Gb free space). I'm not an expert on this (by far) but I suspect that smaller extents get buffered more quickly and that the same pages get read multiple times due to their total number being smaller.

Here are some representative runs on operational extents:

PGSVR>10000 testreadrate

10000 random pages read in 128217 ms
Avg random read rate: 77.99 pages/s (1.2822e+01 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 4713 ms
Avg random IO rate: 4.24 IO/s (2.3565e+02 ms/read)
Avg page read rate: 424.36 pages/s (2.3565e+00 ms/ page read)

*****
PGSVR>10000 testreadrate

10000 random pages read in 95294 ms
Avg random read rate: 104.94 pages/s (9.5294e+00 ms/read)
PGSVR>100 20 testbigreadrate

2000 random pages read in 20 IO calls in 3378 ms
Avg random IO rate: 5.92 IO/s (1.6890e+02 ms/read)
Avg page read rate: 592.07 pages/s (1.6890e+00 ms/ page read)

On 15 Feb 2012, at 12:41, Nick Ager wrote:

> As another reference I have two Gemstone 2.4.4.1 servers.
>
> The first runs on a EC2 micro-instance (613 MB):
>
> $ $GEMSTONE/sys/pgsvrslow
> '/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 3600 ms
> Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 197 ms
> Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read)
> Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read)
>
> ----
>
> The second is running on a more sensibly sized (2G RAM) Linode instance:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 67836 ms
> Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 798 ms
> Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read)
> Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read)
>
> as the first was so slow I repeated:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 28384 ms
> Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read)
>
> and again:
>
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 12671 ms
> Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read)
>
> odd...
>
> The EC2 instance has recently been through a backup and restore and has a smaller extent (320M). Whereas the Linode instance hasn't been through a backup and restore and has a very large extent (4.9G). Perhaps there is some memory paging which is effecting performance?
>
> Nick
>
> On 14 February 2012 19:21, Johan Brichau <[hidden email]> wrote:
> Hi Dale,
>
> Thanks for that pointer!
>
> Your numbers are quite impressive ... even on my local macbook pro I'm getting only numbers like the one below.
> Now, I've seen stats 10 times slower on the SAN, depending on the stone, so I'm currently gathering stats.
>
> ----
>
> PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
> PGSVR>10000 testreadrate
>
> 10000 random pages read in 29371 ms
> Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
> PGSVR>100 20 testbigreadrate
>
> 2000 random pages read in 20 IO calls in 2163 ms
> Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
> Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read)
>
>
> On 14 Feb 2012, at 19:00, Dale Henrichs wrote:
>
> > Johan,
> >
> > We have a program that tests performance for a system doing random page reads against an extent. Launch `$GEMSTONE/sys/pgsvrslow` then enter the following two commands at the 'PGSVR>' prompt :
> >
> >  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
> >  <numpages> testreadrate
> >  <numpages in block> <numsamples> testbigreadrate
> >
> > The `testreadrate` command does reads <numpages> random pages from the given extent. The answer you get gives random read performance.
> >
> > The `testbigreadrate` command does <numsamples> reads of <numpages in block> pages from random locations in the given extent. The answer you get gives you a measure of sequential read performance.
> >
> > Here's sample output from one of our desktop boxes on standard file system (basically reading from file buffer):
> >
> > ---------------------------------------------------------------------------------
> > % $GEMSTONE/sys/pgsvrslow
> > PGSVR>'extent0.dbf' opendbfnolock
> >
> > PGSVR>10000 testreadrate
> >
> > 10000 random pages read in 16 ms
> > Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
> >
> > PGSVR>100 20 testbigreadrate
> >
> > 2000 random pages read in 20 IO calls in 4 ms
> > Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
> > Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
> > PGSVR>
> > ---------------------------------------------------------------------------------
> >
> > These commands can be run against the extent for a running stone ... but you'll want to get measurements with a variety of configurations...
> >
> > At the moment we're guessing that the SAN might be optimized for sequential reads rather than random reads (i.e., buffering issues) ... also, are you sure you aren't being throttled by your provider?
> >
> > Finally it is worth looking at a copy of the config file for the stone to see if there's anything there...
> >
> > Dale
> >
> > ----- Original Message -----
> > | From: "Johan Brichau" <[hidden email]>
> > | To: "GemStone Seaside beta discussion" <[hidden email]>
> > | Sent: Tuesday, February 14, 2012 5:43:58 AM
> > | Subject: Re: [GS/SS Beta] slow data page reads?
> > |
> > | As mentioned in Dale's blogpost, I went on to try a raw disk
> > | partition for the extent and the tranlogs and got exactly the same
> > | results: *very* low disk read speed (see below). Starting Gemstone
> > | and reading the SPC takes a long time.
> > |
> > | We are pretty certain the SAN is not overloaded because all other
> > | disk operations can reach a lot higher speeds. For example, the
> > | copydbf operation from the extent file to the partition reached very
> > | good speeds of over 30MB/s.
> > |
> > | So we are only seeing this issue when gemstone is doing read access
> > | on this kind of setup. I have other servers where everything is
> > | running smoothly.
> > |
> > | If anybody has any ideas... that would be cool ;-)
> > |
> > | Johan
> > |
> > | Sample read speed during gemstone page read:
> > |
> > | Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > | sda5            111.60     0.00 37.00  0.00     0.58     0.00    32.00     1.00   26.90  27.01  99.92
> > |
> > |
> > | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
> > |
> > | > Well.. it turns out that we were wrong and we still experience the
> > | > problem...
> > | >
> > | > Dale,
> > | >
> > | > What we are seeing sounds very similar to this:
> > | >
> > | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
> > | >
> > | > " The issue with the i/o anomalies that we observed in Linux has
> > | > not been as easy to resolve. I spent some time tuning GemStone/S
> > | > to make sure that GemStone/S wasn't the source of the anomaly.
> > | > Finally our IS guy was able to reproduce the anomaly and he ran
> > | > into a few other folks on the net that have observed similar
> > | > anomalies.
> > | >
> > | > At this writing we haven't found a solution to the anomaly, but we
> > | > are pretty optimistic that it is resolvable. We've seen different
> > | > versions of Linux running on similar hardware that doesn't show
> > | > the anomaly, so it is either a function of the kernel version or
> > | > the settings of some of the kernel parameters. As soon as we
> > | > figure it out we'll let you know."
> > | >
> > | > Do you have more information on this?
> > | >
> > | > Johan
> > | >
> > | >
> > | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
> > | >
> > | >> Hi Johan,
> > | >>
> > | >> We had a machine hosted on a VPS, with a "state of the art" san,
> > | >> with
> > | >> similar issues. We complained every so often and the service
> > | >> provider
> > | >> responded with their inability to control some users on the same
> > | >> VPS
> > | >> host doing "extremely heavy" disk io. We got the client off the
> > | >> vps
> > | >> onto a normal machine with a SATA disk and have had joy ever since
> > | >> (10-20x improvement with the vps at its best).
> > | >>
> > | >> I think that the randomness of the reads thrown on top of other
> > | >> vms on
> > | >> the same host just caused unpredictable io; so we prefer avoiding
> > | >> vms.
> > | >>
> > | >> Alternatively, if it can work for you, put the extents in RAM.
> > | >>
> > | >> Otto
> > | >>
> > | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
> > | >> wrote:
> > | >>
> > | >>> Hi all,
> > | >>>
> > | >>> Never mind my question below: our hosters have identified the
> > | >>> problem on their SAN.
> > | >>> Strange behavior though...
> > | >>>
> > | >>> phew ;-)
> > | >>> Johan
> > | >>>
> > | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
> > | >>>
> > | >>>> Hi Gemstoners,
> > | >>>>
> > | >>>> Is there any condition (other than a slow filesystem) that would
> > | >>>> trigger slow page reads when a gem needs to hit disk and load
> > | >>>> objects?
> > | >>>>
> > | >>>> Here is the problem I'm trying to chase: a seaside gem is
> > | >>>> processing a request and (according to the statmonit output)
> > | >>>> ends up requesting pages. The pageread process goes terribly
> > | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages per
> > | >>>> second being read during that time period. There is no other
> > | >>>> activity at that moment and I'm puzzled by why the read goes so
> > | >>>> slow (other than a slow filesystem -- see next).
> > | >>>>
> > | >>>> Because the iostat system monitoring also shows the same low
> > | >>>> read speed and indicates a 100% disk util statistic, my obvious
> > | >>>> first impression was that the disk is saturated and we have
> > | >>>> datastore problem. However, the disk read speed proves to be
> > | >>>> good when I'm doing other disk activity outside of Gemstone.
> > | >>>> Moreover, the _write_ speed is terribly good at all times.
> > | >>>>
> > | >>>> So, I'm currently trying to chase something that only triggers
> > | >>>> slow page read speed from a Gemstone topaz session.
> > | >>>>
> > | >>>> GEM_IO_LIMIT is set at default setting of 5000
> > | >>>>
> > | >>>> For illustration, these are some kind of io stats when Gemstone
> > | >>>> is doing read access:
> > | >>>>
> > | >>>> Time: 06:40:21 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  6.00  0.40     0.09     0.00    30.75     1.00  166.88 156.00  99.84
> > | >>>>
> > | >>>> Time: 06:40:26 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  8.20  0.40     0.13     0.00    31.07     1.05  119.91 115.72  99.52
> > | >>>>
> > | >>>> Time: 06:40:31 PM
> > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > | >>>> sda3              0.00     0.20  5.99  0.40     0.09     0.00    30.75     1.01  157.75 156.25  99.80
> > | >>>
> > | >
> > |
> > |
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Silly configuration issue...

James Foster-8
In reply to this post by Larry Kellogg
Larry,

>>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>>> /opt/gemstone/product/seaside/data/system.conf

>>>  GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf  <<<<<<<<<<<<<<<<????????????????

> lrwxrwxrwx  1 seasideuser seasideuser    47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/

I believe that the symbolic link is being resolved and that these two are equivalent.
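
A quick way to convince yourself (assuming GNU coreutils is on the box) is to let readlink resolve the path; given the product link in your listing it should print something like:

  $ readlink -f /opt/gemstone/product/seaside/data/system.conf
  /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf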


>>>  GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>>     reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>> DBF Op: Open; DBF Record: -1;
>>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)

> -rw------- 1 root        root        14680064 Feb 15 03:40 extent0.dbf

I believe that having the extent owned by root is preventing GemStone from opening the file since seasideuser does not have read/write permission.
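
If that is the cause, changing the ownership back should be all it takes. Something along these lines, run as root (I'm assuming seasideuser is the account the stone runs as; note that the old/ directory in your listing is owned by root too):

  chown seasideuser:seasideuser /opt/gemstone/product/seaside/data/extent0.dbf
  chown -R seasideuser:seasideuser /opt/gemstone/product/seaside/data/old

The existing -rw------- mode is fine once seasideuser owns the file.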


>>>  GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf

> I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.

It seems that seaside.conf was there when GemStone tried to start (it would have reported an error otherwise), but was not there when you did the listing.
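
An easy way to settle it is to list the exact paths that startstone reported (copied from your log; adjust if yours differ):

  ls -l /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf \
        /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf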


>>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>>> can't figure out why this is failing.

Perhaps a little rest and a fresh look will make it a bit clearer! You are actually doing very well, and your willingness to journal your saga on the mailing list is helpful to others and is nice evidence that people are starting new projects with GemStone. That is positive reinforcement for all of us!

-James

On Feb 15, 2012, at 2:06 AM, Lawrence Kellogg wrote:

>
> On Feb 14, 2012, at 11:40 PM, James Foster wrote:
>
>> Larry,
>>
>> What do you get from 'ls -alF /opt/gemstone'? What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file!
>>
>
> [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone
> total 28
> drwxrwx---  5 seasideuser seasideuser  4096 Oct 12 19:18 ./
> drwxr-xr-x  4 root        root         4096 Oct 12 19:18 ../
> drwxr-xr-x 17 seasideuser seasideuser  4096 Oct 13 02:15 GemStone64Bit2.4.4.1-x86_64.Linux/
> drwxrwx---  2 seasideuser seasideuser  4096 Feb 15 04:18 locks/
> drwxrwxrwx  3 seasideuser seasideuser 12288 Feb 15 04:05 log/
> lrwxrwxrwx  1 seasideuser seasideuser    47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/
> [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone/product/seaside/data
> total 47144
> drwxrwxr-x 4 seasideuser seasideuser     4096 Feb 15 04:26 ./
> drwxrwxr-x 9 seasideuser seasideuser     4096 Jul 13  2010 ../
> drwxrwxr-x 2 seasideuser seasideuser     4096 Feb 15 02:41 backups/
> -rw------- 1 root        root        14680064 Feb 15 03:40 extent0.dbf
> -rw-r--r-- 1 seasideuser seasideuser      229 Jul 13  2010 gem.conf
> drwxr-xr-x 2 root        root            4096 Feb 15 03:40 old/
> -rw-r--r-- 1 seasideuser seasideuser      478 Jul 13  2010 system.conf
> -rw-rw-r-- 1 seasideuser seasideuser    35840 Feb 14 20:56 tranlog1.dbf
> -rw-rw-r-- 1 seasideuser seasideuser  4320256 Feb 15 03:39 tranlog2.dbf
> -rw-rw-r-- 1 seasideuser seasideuser   753152 Feb 15 03:24 tranlog5.dbf
> -rw-rw-r-- 1 seasideuser seasideuser    10240 Feb 15 03:24 tranlog6.dbf
> -rw-rw-r-- 1 seasideuser seasideuser 28443136 Feb 15 03:27 tranlog7.dbf
> [seasideuser@ip-10-191-194-75 ~]$
>
> This seems right to me, unless I'm missing something. Is the symbolic link for product correct?
> I see gem.conf and system.conf files. I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.
> I did a ". defSeaside" and that runs ok.
>
> Just when I think I understand how it all works, I run into these sorts of issues that leave me stymied. I will migrate to
> 2.4.5 when I get a few spare moments. It's not easy being the admin/programmer/designer/customer service rep/marketeer, but it
> sure is a lot of fun.
>
>  Larry
>
>
>
>> Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff.
>>
>> -James
>>
>> On Feb 14, 2012, at 8:23 PM, Lawrence Kellogg wrote:
>>
>>> If this is true:
>>>
>>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>>> /opt/gemstone/product/seaside/data/system.conf
>>>
>>>
>>> why does Gemstone insist on trying to start the stone in the other directory, the one with no extent0.dbf???
>>>
>>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>>> can't figure out why this is failing. Luckily it's just Staging so it can wait. I had shut down the stone
>>> to restore a backup and then replay log files… but I don't understand this error…
>>>
>>> Larry
>>>
>>>
>>>
>>> [seasideuser@ip-10-191-194-75 data]$ startstone seaside
>>> startstone[Info]: GemStone version '2.4.4.1'
>>> startstone[Info]: Starting Stone repository monitor "seaside".
>>> startstone[Info]: GEMSTONE is: "/opt/gemstone/product".
>>> startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so
>>> startstone[Info]:
>>>  GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf  <<<<<<<<<<<<<<<<????????????????
>>>  GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<????????????????
>>> startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'.
>>>
>>> startstone[Error]: Stone process (id=14170) has died.
>>> startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information.  Excerpt follows:
>>>      configuredSize 1000 MB
>>>    Directory   1:
>>>      configured name $GEMSTONE_DATADIR/
>>>      expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/
>>>      configuredSize 1000 MB
>>>  -------------------------------------------------------
>>>
>>>  GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>>     reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>> DBF Op: Open; DBF Record: -1;
>>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)
>>>
>>>  An error occurred opening the repository for exclusive access.
>>>
>>>  Stone startup has failed.
>>>
>>
>
