slow data page reads?


Re: Silly configuration issue...

Larry Kellogg

On Feb 15, 2012, at 10:51 AM, James Foster wrote:

> Larry,
>
>>>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>>>> /opt/gemstone/product/seaside/data/system.conf
>
>>>> GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf  <<<<<<<<<<<<<<<<????????????????
>
>> lrwxrwxrwx  1 seasideuser seasideuser    47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/
>
> I believe that the symbolic link is being resolved and that these two are equivalent.
>
>
>>>> GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>>>    reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>>> DBF Op: Open; DBF Record: -1;
>>>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)
>
>> -rw------- 1 root        root        14680064 Feb 15 03:40 extent0.dbf
>
> I believe that having the extent owned by root is preventing GemStone from opening the file since seasideuser does not have read/write permission.
>

  Yeah, this was my problem. I must have used sudo cp to move the extent over and messed up the permissions. Sometimes the solution is right there, but you can't see it. I know I'm supposed to be using copydbf, but that command sometimes complains when I try to run it….

  I was also confused by the soft link to the product directory… now I see what is going on.
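
  For reference, a minimal sketch of that cleanup (the paths match the listings quoted below; the backup source path is illustrative):

    # confirm where the product symlink actually points
    readlink -f /opt/gemstone/product

    # give the extent back to the user that runs the stone
    sudo chown seasideuser:seasideuser /opt/gemstone/product/seaside/data/extent0.dbf
    sudo chmod 600 /opt/gemstone/product/seaside/data/extent0.dbf

    # with the stone stopped, copydbf (run as seasideuser) is the safer way
    # to copy an extent into place
    copydbf /some/backup/extent0.dbf /opt/gemstone/product/seaside/data/extent0.dbf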


>
>>>> GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf
>
>> I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.
>
> It seems that seaside.conf was there when GemStone tried to start (it would have reported an error otherwise), but was not there when you did the listing.
>
>
>>>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>>>> can't figure out why this is failing.
>
> Perhaps a little rest and a fresh look will make it a bit more clear! You are actually doing very well and your willingness to journal your saga on the mailing list is helpful to others and provides a nice testimony that people are starting new projects with GemStone. That is a positive reinforcement for all of us!
>

  Thanks! Yes, I am very happy with how things turned out with my use of Gemstone on this project. The people on the list have been incredibly helpful.

  You don't know how ecstatic I am that I don't have to write object-relational mapping code! Life is good. I'll triumph over my admin problems. It's not that hard to administer a Gemstone system but you can't do it in your sleep. ;-)


  Larry



> -James
>
> On Feb 15, 2012, at 2:06 AM, Lawrence Kellogg wrote:
>
>>
>> On Feb 14, 2012, at 11:40 PM, James Foster wrote:
>>
>>> Larry,
>>>
>>> What do you get from 'ls -alF /opt/gemstone'? What about 'ls -alF /opt/gemstone/product/seaside/data/'? I ask because I generally set up product as a symbolic link to the actual directory so it can be easily changed for server upgrades (when will you move to 2.4.5?). Also, we need to make sure that there really is a config file!
>>>
>>
>> [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone
>> total 28
>> drwxrwx---  5 seasideuser seasideuser  4096 Oct 12 19:18 ./
>> drwxr-xr-x  4 root        root         4096 Oct 12 19:18 ../
>> drwxr-xr-x 17 seasideuser seasideuser  4096 Oct 13 02:15 GemStone64Bit2.4.4.1-x86_64.Linux/
>> drwxrwx---  2 seasideuser seasideuser  4096 Feb 15 04:18 locks/
>> drwxrwxrwx  3 seasideuser seasideuser 12288 Feb 15 04:05 log/
>> lrwxrwxrwx  1 seasideuser seasideuser    47 Oct 12 19:18 product -> /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/
>> [seasideuser@ip-10-191-194-75 ~]$ ls -alF /opt/gemstone/product/seaside/data
>> total 47144
>> drwxrwxr-x 4 seasideuser seasideuser     4096 Feb 15 04:26 ./
>> drwxrwxr-x 9 seasideuser seasideuser     4096 Jul 13  2010 ../
>> drwxrwxr-x 2 seasideuser seasideuser     4096 Feb 15 02:41 backups/
>> -rw------- 1 root        root        14680064 Feb 15 03:40 extent0.dbf
>> -rw-r--r-- 1 seasideuser seasideuser      229 Jul 13  2010 gem.conf
>> drwxr-xr-x 2 root        root            4096 Feb 15 03:40 old/
>> -rw-r--r-- 1 seasideuser seasideuser      478 Jul 13  2010 system.conf
>> -rw-rw-r-- 1 seasideuser seasideuser    35840 Feb 14 20:56 tranlog1.dbf
>> -rw-rw-r-- 1 seasideuser seasideuser  4320256 Feb 15 03:39 tranlog2.dbf
>> -rw-rw-r-- 1 seasideuser seasideuser   753152 Feb 15 03:24 tranlog5.dbf
>> -rw-rw-r-- 1 seasideuser seasideuser    10240 Feb 15 03:24 tranlog6.dbf
>> -rw-rw-r-- 1 seasideuser seasideuser 28443136 Feb 15 03:27 tranlog7.dbf
>> [seasideuser@ip-10-191-194-75 ~]$
>>
>> This seems right to me, unless I'm missing something. Is the symbolic link for product correct?
>> I see gem.conf and system.conf files. I swear I only copied in the new extent0 but perhaps I somehow deleted the config file.
>> I did a ". defSeaside" and that runs ok.
>>
>> Just when I think I understand how it all works, I run into these sorts of issues that leave me stymied. I will migrate to
>> 2.4.5 when I get a few spare moments. It's not easy being the admin/programmer/designer/customer service rep/marketeer, but it
>> sure is a lot of fun.
>>
>> Larry
>>
>>
>>
>>> Also, while the default installation puts the config files, extents, transaction logs, and log files in the product tree, I prefer to have them elsewhere (/opt/gemstone/etc, /opt/gemstone/data, /opt/gemstone/log) so that upgrades do not require moving as much stuff.
>>>
>>> -James
>>>
>>> On Feb 14, 2012, at 8:23 PM, Lawrence Kellogg wrote:
>>>
>>>> If this is true:
>>>>
>>>> [seasideuser@ip-10-191-194-75 data]$ echo $GEMSTONE_SYS_CONF
>>>> /opt/gemstone/product/seaside/data/system.conf
>>>>
>>>>
>>>> why does Gemstone insist on trying to start the stone in the other directory, the one with no extent0.dbf???
>>>>
>>>> I'm too tired to figure this out now. I've started and stopped this thing dozens of times and
>>>> can't figure out why this is failing. Luckily it's just Staging so it can wait. I had shut down the stone
>>>> to restore a backup and then replay log files….but I don't understand this error…
>>>>
>>>> Larry
>>>>
>>>>
>>>>
>>>> [seasideuser@ip-10-191-194-75 data]$ startstone seaside
>>>> startstone[Info]: GemStone version '2.4.4.1'
>>>> startstone[Info]: Starting Stone repository monitor "seaside".
>>>> startstone[Info]: GEMSTONE is: "/opt/gemstone/product".
>>>> startstone[Warning]: /usr/lib64/libposix-aio.so not found, using librt.so
>>>> startstone[Info]:
>>>> GEMSTONE_SYS_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/system.conf  <<<<<<<<<<<<<<<<????????????????
>>>> GEMSTONE_EXE_CONF=/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/seaside.conf <<<<<<<<<<<<<<<<<????????????????
>>>> startstone[Info]: Log file is '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log'.
>>>>
>>>> startstone[Error]: Stone process (id=14170) has died.
>>>> startstone[Error]: Examine '/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/data/seaside.log' for more information.  Excerpt follows:
>>>>     configuredSize 1000 MB
>>>>   Directory   1:
>>>>     configured name $GEMSTONE_DATADIR/
>>>>     expanded name /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/
>>>>     configuredSize 1000 MB
>>>> -------------------------------------------------------
>>>>
>>>> GemStone is unable to open the file !TCP@localhost#dbf!/opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>>>    reason = File = /opt/gemstone/GemStone64Bit2.4.4.1-x86_64.Linux/seaside/data/extent0.dbf
>>>> DBF Op: Open; DBF Record: -1;
>>>> Error: open() failure; System Codes: errno=13,EACCES, Authorization failure (permission denied)
>>>>
>>>> An error occurred opening the repository for exclusive access.
>>>>
>>>> Stone startup has failed.
>>>>
>>>
>>
>
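
  Picking up James's note above about keeping configuration, extents, and tranlogs out of the product tree: a sketch of the relevant system.conf entries (the /opt/gemstone/data layout here is illustrative):

    # system.conf sketch: data lives outside the product tree
    DBF_EXTENT_NAMES = /opt/gemstone/data/extent0.dbf;
    STN_TRAN_LOG_DIRECTORIES = /opt/gemstone/data/, /opt/gemstone/data/;
    STN_TRAN_LOG_SIZES = 100, 100;

  With that layout, an upgrade only has to repoint the product symlink; the data and configuration stay put.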


Re: slow data page reads?

Thelliez
In reply to this post by Nick
I ran the same tests on a MacBookPro with the extent on an internal SSD. But what are the numbers showing? I found that the more I run the tests, the better the performance gets. Is this measuring accesses to pages in memory or on disk?

PGSVR>'/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
PGSVR>10000 testreadrate
10000 random pages read in 4149 ms
Avg random read rate: 2410.22 pages/s (4.1490e-01 ms/read)

PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 184 ms
Avg random IO rate: 108.70 IO/s (9.2000e+00 ms/read)
Avg page read rate: 10869.57 pages/s (9.2000e-02 ms/ page read) 

and then it kept improving:

PGSVR>10000 testreadrate
10000 random pages read in 2303 ms
Avg random read rate: 4342.16 pages/s (2.3030e-01 ms/read)

PGSVR>10000 testreadrate
10000 random pages read in 1177 ms
Avg random read rate: 8496.18 pages/s (1.1770e-01 ms/read)

PGSVR>10000 testreadrate
10000 random pages read in 711 ms
Avg random read rate: 14064.70 pages/s (7.1100e-02 ms/read) 

PGSVR>10000 testreadrate
10000 random pages read in 431 ms 
Avg random read rate: 23201.86 pages/s (4.3100e-02 ms/read)

PGSVR>10000 testreadrate
10000 random pages read in 308 ms
Avg random read rate: 32467.53 pages/s (3.0800e-02 ms/read) 

PGSVR>10000 testreadrate
10000 random pages read in 214 ms
Avg random read rate: 46728.97 pages/s (2.1400e-02 ms/read) 

(ok, I stopped here ;-)

PGSVR>100 20 testbigreadrate
2000 random pages read in 20 IO calls in 61 ms
Avg random IO rate: 327.87 IO/s (3.0500e+00 ms/read)
Avg page read rate: 32786.89 pages/s (3.0500e-02 ms/ page read) 


Thierry



Re: slow data page reads?

Dale Henrichs
In reply to this post by Johan Brichau-2
I'm no expert in this area but it seems to me that we are seeing the effects of file buffering.

When you are reading random pages from a large extent you are stressing the file buffers between you and the bits of iron oxide on the platter...

File buffering is affected by other processes that may be accessing disk ... on a relatively idle system, the file buffer may end up with big chunks of the extent in memory (the case for the numbers in the test results I sent in my original message). The file buffering algorithms are affected by writes to the file system:

   If one is using file system tranlogs, then the stone flushes the
   file buffer on each commit and that can lead to significant
   performance issues.

We recommend that users put tranlogs onto raw disk partitions, so that no file buffers are involved ... on a SAN ... I'm not so sure that "raw partitions" are completely unbuffered... We also recommend that tranlogs and extents be located on separate disks, to avoid bad interactions between the high sequential write rate for tranlogs (where I/O speed is typically the limiting factor) and the high random read/write rates for extents...
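
One way to take the file buffer out of a measurement on Linux is to drop the page cache between test runs; a minimal sketch (requires root):

    sync                                 # flush dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries, and inodes

After that, the next testreadrate run should reflect actual disk reads rather than buffer hits.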

I'm not a file system tuning expert, but I think this is the territory being entered ... the fact that "reads are slow and writes are fast" rings a bell: I seem to recall that the Linux I/O strategy is to schedule writes ahead of reads, so if there is concurrent writing and reading going on, that may be one explanation. But my recollection could easily be flawed and I am no expert ...

EC2 instances, laptops, systems with multiple disk spindles, and systems with SANs all have different tuning formulas ...

Dale
----- Original Message -----
| From: "Johan Brichau" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Wednesday, February 15, 2012 4:46:58 AM
| Subject: Re: [GS/SS Beta] slow data page reads?
|
| Nick,
|
| Thanks for those figures. At least that gives me some comparison on
| 'cloud' infrastructure.
|
| First off: when I run the test on an extent whose stone is not running, I'm
| getting *very* good statistics (in the order of those shown by
| Dale...).
| Dale: is that normal?
|
| I also notice that there is a correlation with extent size, although
| I suspect it has more to do with file system buffers. I have also
| had the chance to measure +10000 pages/s random read performance on
| an extent of 5Gb as well (with 3Gb free space). I'm not an expert on
| this (by far) but I suspect that smaller extents get buffered more
| quickly and that the same pages get read multiple times due to their
| total number being smaller.
|
| Here are some representative runs on operational extents:
|
| PGSVR>10000 testreadrate
|
| 10000 random pages read in 128217 ms
| Avg random read rate: 77.99 pages/s (1.2822e+01 ms/read)
| PGSVR>100 20 testbigreadrate
|
| 2000 random pages read in 20 IO calls in 4713 ms
| Avg random IO rate: 4.24 IO/s (2.3565e+02 ms/read)
| Avg page read rate: 424.36 pages/s (2.3565e+00 ms/ page read)
|
| *****
| PGSVR>10000 testreadrate
|
| 10000 random pages read in 95294 ms
| Avg random read rate: 104.94 pages/s (9.5294e+00 ms/read)
| PGSVR>100 20 testbigreadrate
|
| 2000 random pages read in 20 IO calls in 3378 ms
| Avg random IO rate: 5.92 IO/s (1.6890e+02 ms/read)
| Avg page read rate: 592.07 pages/s (1.6890e+00 ms/ page read)
|
| On 15 Feb 2012, at 12:41, Nick Ager wrote:
|
| > As another reference I have two Gemstone 2.4.4.1 servers.
| >
| > The first runs on a EC2 micro-instance (613 MB):
| >
| > $ $GEMSTONE/sys/pgsvrslow
| > '/opt/gemstone/product/seaside/data/extent0.dbf' opendbfnolock
| > PGSVR>10000 testreadrate
| >
| > 10000 random pages read in 3600 ms
| > Avg random read rate: 2777.78 pages/s (3.6000e-01 ms/read)
| > PGSVR>100 20 testbigreadrate
| >
| > 2000 random pages read in 20 IO calls in 197 ms
| > Avg random IO rate: 101.52 IO/s (9.8500e+00 ms/read)
| > Avg page read rate: 10152.28 pages/s (9.8500e-02 ms/ page read)
| >
| > ----
| >
| > The second is running on a more sensibly sized (2G RAM) Linode
| > instance:
| >
| > PGSVR>10000 testreadrate
| >
| > 10000 random pages read in 67836 ms
| > Avg random read rate: 147.41 pages/s (6.7836e+00 ms/read)
| > PGSVR>100 20 testbigreadrate
| >
| > 2000 random pages read in 20 IO calls in 798 ms
| > Avg random IO rate: 25.06 IO/s (3.9900e+01 ms/read)
| > Avg page read rate: 2506.27 pages/s (3.9900e-01 ms/ page read)
| >
| > as the first was so slow I repeated:
| >
| > PGSVR>10000 testreadrate
| >
| > 10000 random pages read in 28384 ms
| > Avg random read rate: 352.31 pages/s (2.8384e+00 ms/read)
| >
| > and again:
| >
| > PGSVR>10000 testreadrate
| >
| > 10000 random pages read in 12671 ms
| > Avg random read rate: 789.20 pages/s (1.2671e+00 ms/read)
| >
| > odd...
| >
| > The EC2 instance has recently been through a backup and restore and
| > has a smaller extent (320M), whereas the Linode instance hasn't
| > been through a backup and restore and has a very large extent
| > (4.9G). Perhaps there is some memory paging which is affecting
| > performance?
| >
| > Nick
| >
| > On 14 February 2012 19:21, Johan Brichau <[hidden email]>
| > wrote:
| > Hi Dale,
| >
| > Thanks for that pointer!
| >
| > Your numbers are quite impressive ... even on my local MacBook Pro
| > I'm getting only numbers like the ones below.
| > Now, I've seen stats 10 times slower on the SAN, depending on the
| > stone, so I'm currently gathering stats.
| >
| > ----
| >
| > PGSVR>'/opt/gemstone/stones/test/data/extent0.dbf' opendbfnolock
| > PGSVR>10000 testreadrate
| >
| > 10000 random pages read in 29371 ms
| > Avg random read rate: 340.47 pages/s (2.9371e+00 ms/read)
| > PGSVR>100 20 testbigreadrate
| >
| > 2000 random pages read in 20 IO calls in 2163 ms
| > Avg random IO rate: 9.25 IO/s (1.0815e+02 ms/read)
| > Avg page read rate: 924.64 pages/s (1.0815e+00 ms/ page read)
| >
| >
| > On 14 Feb 2012, at 19:00, Dale Henrichs wrote:
| >
| > > Johan,
| > >
| > > We have a program that tests performance for a system doing
| > > random page reads against an extent. Launch
| > > `$GEMSTONE/sys/pgsvrslow` then enter the following commands
| > > at the 'PGSVR>' prompt:
| > >
| > >  '$GEMSTONE/seaside/data/extent0.dbf' opendbfnolock
| > >  <numpages> testreadrate
| > >  <numpages in block> <numsamples> testbigreadrate
| > >
| > > The `testreadrate` command reads <numpages> random pages
| > > from the given extent. The answer you get gives random read
| > > performance.
| > >
| > > The `testbigreadrate` command does <numsamples> reads of
| > > <numpages in block> pages from random locations in the given
| > > extent. The answer you get gives you a measure of sequential
| > > read performance.
| > >
| > > Here's sample output from one of our desktop boxes on standard
| > > file system (basically reading from file buffer):
| > >
| > > ---------------------------------------------------------------------------------
| > > % $GEMSTONE/sys/pgsvrslow
| > > PGSVR>'extent0.dbf' opendbfnolock
| > >
| > > PGSVR>10000 testreadrate
| > >
| > > 10000 random pages read in 16 ms
| > > Avg random read rate: 625000.00 pages/s (1.6000e-03 ms/read)
| > >
| > > PGSVR>100 20 testbigreadrate
| > >
| > > 2000 random pages read in 20 IO calls in 4 ms
| > > Avg random IO rate: 5000.00 IO/s (2.0000e-01 ms/read)
| > > Avg page read rate: 500000.00 pages/s (2.0000e-03 ms/ page read)
| > > PGSVR>
| > > ---------------------------------------------------------------------------------
| > >
| > > These commands can be run against the extent for a running stone
| > > ... but you'll want to get measurements with a variety of
| > > configurations...
| > >
| > > At the moment we're guessing that the SAN might be optimized
| > > for sequential reads rather than random reads (i.e., buffering
| > > issues) ... also, are you sure that you aren't being throttled by
| > > your provider?
| > >
| > > Finally it is worth looking at a copy of the config file for the
| > > stone to see if there's anything there...
| > >
| > > Dale
| > >
| > > ----- Original Message -----
| > > | From: "Johan Brichau" <[hidden email]>
| > > | To: "GemStone Seaside beta discussion"
| > > | <[hidden email]>
| > > | Sent: Tuesday, February 14, 2012 5:43:58 AM
| > > | Subject: Re: [GS/SS Beta] slow data page reads?
| > > |
| > > | As mentioned in Dale's blogpost, I went on to try a raw disk
| > > | partition for the extent and the tranlogs and got exactly the
| > > | same
| > > | results: *very* low disk read speed (see below). Starting
| > > | Gemstone
| > > | and reading the SPC takes a long time.
| > > |
| > > | We are pretty certain the SAN is not overloaded because all
| > > | other
| > > | disk operations can reach a lot higher speeds. For example, the
| > > | copydbf operation from the extent file to the partition reached
| > > | very
| > > | good speeds of over 30MB/s.
| > > |
| > > | So we are only seeing this issue when gemstone is doing read
| > > | access
| > > | on this kind of setup. I have other servers where everything is
| > > | running smoothly.
| > > |
| > > | If anybody has any ideas... that would be cool ;-)
| > > |
| > > | Johan
| > > |
| > > | Sample read speed during gemstone page read:
| > > |
| > > | Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s
| > > | avgrq-sz avgqu-sz   await  svctm  %util
| > > | sda5            111.60     0.00 37.00  0.00     0.58     0.00
| > > |    32.00     1.00   26.90  27.01  99.92
| > > |
| > > |
| > > | On 13 Feb 2012, at 21:09, Johan Brichau wrote:
| > > |
| > > | > Well.. it turns out that we were wrong and we still
| > > | > experience the
| > > | > problem...
| > > | >
| > > | > Dale,
| > > | >
| > > | > What we are seeing sounds very similar to this:
| > > | >
| > > | > http://gemstonesoup.wordpress.com/2007/10/19/scaling-seaside-with-gemstones/
| > > | >
| > > | > " The issue with the i/o anomalies that we observed in Linux
| > > | > has
| > > | > not been as easy to resolve. I spent some time tuning
| > > | > GemStone/S
| > > | > to make sure that GemStone/S wasn't the source of the
| > > | > anomaly.
| > > | > Finally our IS guy was able to reproduce the anomaly and he
| > > | > ran
| > > | > into a few other folks on the net that have observed similar
| > > | > anomalies.
| > > | >
| > > | > At this writing we haven't found a solution to the anomaly,
| > > | > but we
| > > | > are pretty optimistic that it is resolvable. We've seen
| > > | > different
| > > | > versions of Linux running on similar hardware that doesn't
| > > | > show
| > > | > the anomaly, so it is either a function of the kernel version
| > > | > or
| > > | > the settings of some of the kernel parameters. As soon as we
| > > | > figure it out we'll let you know."
| > > | >
| > > | > Do you have more information on this?
| > > | >
| > > | > Johan
| > > | >
| > > | >
| > > | > On 13 Feb 2012, at 19:39, Otto Behrens wrote:
| > > | >
| > > | >> Hi Johan,
| > > | >>
| > > | >> We had a machine hosted on a VPS, with a "state of the art"
| > > | >> san,
| > > | >> with
| > > | >> similar issues. We complained every so often and the service
| > > | >> provider
| > > | >> responded with their inability to control some users on the
| > > | >> same
| > > | >> VPS
| > > | >> host doing "extremely heavy" disk io. We got the client off
| > > | >> the
| > > | >> vps
| > > | >> onto a normal machine with a SATA disk and have had joy ever
| > > | >> since
| > > | >> (10-20x improvement with the vps at its best).
| > > | >>
| > > | >> I think that the randomness of the reads thrown on top of
| > > | >> other
| > > | >> vms on
| > > | >> the same host just caused unpredictable io; so we prefer
| > > | >> avoiding
| > > | >> vms.
| > > | >>
| > > | >> Alternatively, if it can work for you, put the extents in
| > > | >> RAM.
| > > | >>
| > > | >> Otto
| > > | >>
| > > | >> On 13 Feb 2012, at 20:16, Johan Brichau <[hidden email]>
| > > | >> wrote:
| > > | >>
| > > | >>> Hi all,
| > > | >>>
| > > | >>> Never mind my question below: our hosters have identified
| > > | >>> the
| > > | >>> problem on their SAN.
| > > | >>> Strange behavior though...
| > > | >>>
| > > | >>> phew ;-)
| > > | >>> Johan
| > > | >>>
| > > | >>> On 13 Feb 2012, at 14:05, Johan Brichau wrote:
| > > | >>>
| > > | >>>> Hi Gemstoners,
| > > | >>>>
| > > | >>>> Is there any condition (other than a slow filesystem) that
| > > | >>>> would
| > > | >>>> trigger slow page reads when a gem needs to hit disk and
| > > | >>>> load
| > > | >>>> objects?
| > > | >>>>
| > > | >>>> Here is the problem I'm trying to chase: a seaside gem is
| > > | >>>> processing a request and (according to the statmonitor
| > > | >>>> output)
| > > | >>>> ends up requesting pages. The pageread process goes
| > > | >>>> terribly
| > > | >>>> slow (takes approx +- 50s) and I see only 5 to 15 pages
| > > | >>>> per
| > > | >>>> second being read during that time period. There is no
| > > | >>>> other
| > > | >>>> activity at that moment and I'm puzzled by why the read
| > > | >>>> goes so
| > > | >>>> slow (other than a slow filesystem -- see next).
| > > | >>>>
| > > | >>>> Because the iostat system monitoring also shows the same
| > > | >>>> low
| > > | >>>> read speed and indicates a 100% disk util statistic, my
| > > | >>>> obvious
| > > | >>>> first impression was that the disk is saturated and we
| > > | >>>> have a
| > > | >>>> datastore problem. However, the disk read speed proves to
| > > | >>>> be
| > > | >>>> good when I'm doing other disk activity outside of
| > > | >>>> Gemstone.
| > > | >>>> Moreover, the _write_ speed is terribly good at all times.
| > > | >>>>
| > > | >>>> So, I'm currently trying to chase something that only
| > > | >>>> triggers
| > > | >>>> slow page read speed from a Gemstone topaz session.
| > > | >>>>
| > > | >>>> GEM_IO_LIMIT is set at default setting of 5000
| > > | >>>>
| > > | >>>> For illustration, these are some kind of io stats when
| > > | >>>> Gemstone
| > > | >>>> is doing read access:
| > > | >>>>
| > > | >>>> Time: 06:40:21 PM
| > > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s
| > > | >>>>    wMB/s
| > > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
| > > | >>>> sda3              0.00     0.20  6.00  0.40     0.09
| > > | >>>>     0.00
| > > | >>>>    30.75     1.00  166.88 156.00  99.84
| > > | >>>>
| > > | >>>> Time: 06:40:26 PM
| > > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s
| > > | >>>>    wMB/s
| > > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
| > > | >>>> sda3              0.00     0.20  8.20  0.40     0.13
| > > | >>>>     0.00
| > > | >>>>    31.07     1.05  119.91 115.72  99.52
| > > | >>>>
| > > | >>>> Time: 06:40:31 PM
| > > | >>>> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s
| > > | >>>>    wMB/s
| > > | >>>> avgrq-sz avgqu-sz   await  svctm  %util
| > > | >>>> sda3              0.00     0.20  5.99  0.40     0.09
| > > | >>>>     0.00
| > > | >>>>    30.75     1.01  157.75 156.25  99.80
| > > | >>>
| > > | >
| > > |
| > > |
| >
| >
|
|

Re: Extent size explosion

Larry Kellogg
In reply to this post by James Foster-8
>
> If you want to keep a "warm standby" then you have a second system in restore mode and each time you finish a transaction log on the production system you restore it to the standby system. I know of at least one customer who has the standby system in a second data center and transfers a log every 15 minutes. (You don't have to wait for it to be full; you can explicitly start a new log.)

James,
    So, if you don't have your secondary system in restore mode all the time, as I don't, because I use the secondary system for development, then I need to take another full backup of production, and start with a new extent to move Production to Staging. It's not really safe to try to apply tranlogs from the primary system to the secondary system, is it?


  Larry
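
  For reference, that backup-based refresh boils down to roughly the following sketch (paths are illustrative; the full procedure is in the System Administration Guide):

    # topaz on production: write a full backup
    run
    SystemRepository fullBackupTo: '/opt/gemstone/product/seaside/data/backups/prod.dbf'
    %

    # copy the backup over, start the staging stone on a fresh copy of
    # $GEMSTONE/bin/extent0.dbf, then in topaz on staging:
    run
    SystemRepository restoreFromBackup: '/opt/gemstone/product/seaside/data/backups/prod.dbf'
    %
    run
    SystemRepository commitRestore
    %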



Re: Extent size explosion

James Foster-8
On Feb 15, 2012, at 6:47 PM, Lawrence Kellogg wrote:

>> If you want to keep a "warm standby" then you have a second system in restore mode and each time you finish a transaction log on the production system you restore it to the standby system. I know of at least one customer who has the standby system in a second data center and transfers a log every 15 minutes. (You don't have to wait for it to be full; you can explicitly start a new log.)
>
> James,
>    So, if you don't have your secondary system in restore mode all the time, as I don't, because I use the secondary system for development, then I need to take another full backup of production, and start with a new extent to move Production to Staging. It's not really safe to try to apply tranlogs from the primary system to the secondary system, is it?
>  Larry

Larry,
When a system is in restore mode, no transactions may be committed (so you can look, but not do any development). When a system is not in restore mode, no transaction logs may be applied. So you are right: not only is it not safe, it isn't even allowed.
James
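
A minimal sketch of the warm-standby cycle (host name, paths, and log number are illustrative; the standby must already be in restore mode from a full backup, and the exact expressions are in the System Administration Guide):

    # on production (topaz): close out the current tranlog so it can be shipped
    run
    SystemRepository startNewLog
    %

    # ship the finished log into the standby's tranlog directory
    scp /opt/gemstone/data/tranlog7.dbf standby:/opt/gemstone/data/

    # on the standby (topaz): replay any complete logs, staying in restore mode
    run
    SystemRepository restoreFromCurrentLogs
    %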

Re: Extent size explosion

Larry Kellogg

On Feb 15, 2012, at 10:13 PM, James Foster wrote:

> On Feb 15, 2012, at 6:47 PM, Lawrence Kellogg wrote:
>
>>> If you want to keep a "warm standby" then you have a second system in restore mode and each time you finish a transaction log on the production system you restore it to the standby system. I know of at least one customer who has the standby system in a second data center and transfers a log every 15 minutes. (You don't have to wait for it to be full; you can explicitly start a new log.)
>>
>> James,
>>   So, if you don't have your secondary system in restore mode all the time, as I don't, because I use the secondary system for development, then I need to take another full backup of production, and start with a new extent to move Production to Staging. It's not really safe to try to apply tranlogs from the primary system to the secondary system, is it?
>> Larry
>
> Larry,
> When a system is in restore mode, no transactions may be committed (so you can look, but not do any development). When a system is not in restore mode, no transaction logs may be applied. So you are right: not only is it not safe, it isn't even allowed.

 Thanks for this comment, it clears up a mistaken impression I have been carrying around about my Staging system. I have been thinking that there is a way to use it for development and apply tranlogs from Production, but that is just not permitted. I spent a lot of time searching for an option to apply tranlogs outside of restore mode, or a way to get into restore mode without restoreFromBackup, ha ha.

Larry


> James





Re: Extent size explosion

James Foster-8
On Feb 16, 2012, at 2:16 AM, Lawrence Kellogg wrote:

> I have been thinking that there is a way to use it for development and apply tranlogs from Production, but that is just not permitted. I spent a lot of time searching for an option to apply tranlogs outside of restore mode, or a way to get into restore mode without restoreFromBackup, ha ha.

Right. If you made changes to your staging system, then replayed logs from production, the object references could be corrupted. Imagine you deleted an object in staging (dereferenced and then GC'd), then replayed a transaction that referenced the missing (or replaced!) object. Since the transaction logs deal with object IDs, you could end up with a reference to a missing object. GemStone attempts to avoid that sort of database corruption.

Of course, you were thinking of modifying "code" in staging and then updating "data" from production. In a world where code and data are separate, that makes sense. In a world ("image") where everything is just an object referenced by a 64-bit identifier, it doesn't work. At the object manager level there is no distinction between code and data. Even if you could distinguish between code and data, new object identifiers could collide because they could be assigned from either system: a method in your staging system might have the same ID (OOP) as a customer in your production system. There is no way to replay logs from one in the other.

-James

Re: Extent size explosion

Larry Kellogg

On Feb 16, 2012, at 12:40 PM, James Foster wrote:

> On Feb 16, 2012, at 2:16 AM, Lawrence Kellogg wrote:
>
>> I have been thinking that there is a way to use it for development and apply tranlogs from Production, but that is just not permitted. I spent a lot of time searching for an option to apply tranlogs outside of restore mode, or a way to get into restore mode without restoreFromBackup, ha ha.
>
> Right. If you made changes to your staging system, then replayed logs from production, the object references could be corrupted. Imagine you deleted an object in staging (dereferenced and then GC'd), then replayed a transaction that referenced the missing (or replaced!) object. Since the transaction logs deal with object IDs, you could end up with a reference to a missing object. GemStone attempts to avoid that sort of database corruption.
>
> Of course, you were thinking of modifying "code" in staging and then updating "data" from production. In a world where code and data are separate, that makes sense. In a world ("image") where everything is just an object referenced by a 64-bit identifier, it doesn't work. At the object manager level there is no distinction between code and data. Even if you could distinguish between code and data, new object identifiers could collide because they could be assigned from either system: a method in your staging system might have the same ID (OOP) as a customer in your production system. There is no way to replay logs from one in the other.
>

James,
  Of course, you're absolutely right; it makes perfect sense to me, and I knew this already. I just lost sight of how the system works when thinking about trying to keep two versions of the database in sync.

  I'm still mulling over whether I need a second GemStone system in restore mode, pulling in tranlogs every fifteen minutes. It's a slick approach, I have to say. Does it come down to figuring out the chances of losing my primary extent? I could buy another large instance, in a different region, and start doing that. Of course, it is more expensive, hosting-wise. Still, it might be a good idea.

  Thoughts?

  Larry



> -James


Re: Extent size explosion

James Foster-8
On Feb 16, 2012, at 11:20 AM, Lawrence Kellogg wrote:

>  I'm still mulling over whether I need a second GemStone system in restore mode, pulling in tranlogs every fifteen minutes. It's a slick approach, I have to say. Does it come down to figuring out the chances of losing my primary extent? I could buy another large instance, in a different region, and start doing that. Of course, it is more expensive, hosting-wise. Still, it might be a good idea.
>
> Thoughts?


What does your professional liability insurance carrier recommend?

Seriously, disaster recovery strategy is strictly a matter of insurance. What risks are you willing to take? What is the likelihood of Amazon going down (it has happened)? What is the likelihood of two Amazon data centers going down together (it has happened!)? How much would your business and customers suffer if your application were off-line for some time?

Also, keep in mind that even with a "warm standby" you might lose some transactions (logs that hadn't been copied over) and it would take some manual intervention to switch over to your standby system. If Amazon came down how soon would it come back up? Would the disks be lost or just unavailable for some minutes?

I suspect that Amazon has some redundancy on disks, so the risk of data loss due to disk failure is small (though it has happened!). I suspect that most Amazon outages are short and do not involve loss of data (e.g., a machine dies and they restart on another server with the old disks). I suspect that the system would be back up without loss of data before you knew it had been down. If a system were down, you would need to discover it, and decide to start the standby system with loss of data since the last transaction log. At that point you would be balancing the cost of delay ("they will probably have everything back up soon without loss of data") against the cost of being down ("I'm tired of being down and my customers can stand the loss of a few minutes of activity").

For many startups the cost of the redundancy probably isn't worth it but by the time they are successful they will be worried about other problems and this won't come up till there actually is a disaster. If you don't bother with the warm standby, then at least copy regular backups to an off-site location. Also, mark your calendar for a quarterly review of disaster recovery options. Give it a few minutes thought every few months.

-James
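
As a sketch of that minimal off-site habit (the bucket name is made up, and s3cmd stands in for whatever copy tool you prefer):

    # after each full backup completes, compress it and push it off-site
    gzip -c /opt/gemstone/product/seaside/data/backups/prod.dbf > /tmp/prod.dbf.gz
    s3cmd put /tmp/prod.dbf.gz s3://my-offsite-backups/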



Re: Extent size explosion

Larry Kellogg

On Feb 16, 2012, at 2:41 PM, James Foster wrote:

> On Feb 16, 2012, at 11:20 AM, Lawrence Kellogg wrote:
>
>> I'm still mulling over whether I need a second GemStone system in restore mode, pulling in tranlogs every fifteen minutes. It's a slick approach, I have to say. Does it come down to figuring out the chances of losing my primary extent? I could buy another large instance, in a different region, and start doing that. Of course, it is more expensive, hosting-wise. Still, it might be a good idea.
>>
>> Thoughts?
>
>
> What does your professional liability insurance carrier recommend?

  Good question. I'll try to find out.

>
> Seriously, disaster recovery strategy is strictly a matter of insurance. What risks are you willing to take? What is the likelihood of Amazon going down (it has happened)? What is the likelihood of two Amazon data centers going down together (it has happened!)? How much would your business and customers suffer if your application were off-line for some time?
>
> Also, keep in mind that even with a "warm standby" you might lose some transactions (logs that hadn't been copied over) and it would take some manual intervention to switch over to your standby system. If Amazon came down how soon would it come back up? Would the disks be lost or just unavailable for some minutes?
>
> I suspect that Amazon has some redundancy on disks, so the risk of data loss due to disk failure is small (though it has happened!). I suspect that most Amazon outages are short and do not involve loss of data (e.g., a machine dies and they restart on another server with the old disks). I suspect that the system would be back up without loss of data before you knew it had been down. If a system were down, you would need to discover it, and decide to start the standby system with loss of data since the last transaction log. At that point you would be balancing the cost of delay ("they will probably have everything back up soon without loss of data") against the cost of being down ("I'm tired of being down and my customers can stand the loss of a few minutes of activity").
>
> For many startups the cost of the redundancy probably isn't worth it but by the time they are successful they will be worried about other problems and this won't come up till there actually is a disaster. If you don't bother with the warm standby, then at least copy regular backups to an off-site location. Also, mark your calendar for a quarterly review of disaster recovery options. Give it a few minutes thought every few months.
>

 Thanks for this discussion. It provides a lot of food for thought. I think you're right: the cost of this kind of redundancy probably isn't worth it, and it is largely covered by what Amazon already provides. After all, I suppose better redundancy is part of the reason I'm paying the big bucks for hosting on Amazon. I'm sure Amazon doesn't want to be seen as an unreliable host; being seen as unreliable is a surefire way for them to lose the hosting business.

  I definitely will make periodic backups and stash them in a few places so I have a fallback plan in case disaster strikes. In the end, we all take our chances.

  Regards,

  Larry


> -James
>
>
