lock not acquired and error

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

lock not acquired and error

EstebanLM
Hi,

I'm having a recurrent error in my production server: more or less once a week, sessions start to die with lock not acquired and then some nil errors. Something more or less like this:

-----------  Unreportable ERROR Encountered: 2012-03-15T03:25:60-07:00UserDefinedError 2318: User defined error, <nil>-----------
-----------  Unreportable ERROR Encountered: 2012-03-15T03:51:06-07:00UserDefinedError 2318: User defined error, <nil>-----------
----------- Lock not acquired - retrying LOG ENTRY: WASessionLockNotAcquiredException: Session lock denied: 2075-----------
-----------  Unreportable ERROR Encountered: 2012-03-15T06:06:15-07:00UserDefinedError 2318: User defined error, <nil>-----------
-----------  Unreportable ERROR Encountered: 2012-03-15T07:06:07-07:00UserDefinedError 2318: User defined error, <nil>-----------
-----------  Unreportable ERROR Encountered: 2012-03-15T08:08:21-07:00UserDefinedError 2318: User defined error, <nil>-----------
-----------  Unreportable ERROR Encountered: 2012-03-15T09:21:34-07:00UserDefinedError 2318: User defined error, <nil>-----------

I think is related (or should be) about WASession objects not being garbage collected or something like that (also, my database is 9g and should not be bigger than 1.5-2g).

So... How can I clean all old objects?  (I don't want to export data and import it again)
Can this be related? Any other idea?

thanks,
Esteban
Reply | Threaded
Open this post in threaded view
|

Re: lock not acquired and error

Dale Henrichs
Esteban,

The 'Lock not acquired - retrying' message is not indicative of a real problem ... it indicates that you have two concurrent requests for the same session ... this is not unusual if you are using something like Ajax.

The "Unreportable Error" is coming from FSGsSocketServer>>notifyUnreportableError: and indicates that you are getting a FastCGI error of some sort ... odd that you're getting nil. There are 4 senders of that message, so a little probing with print statements should show the root cause ... when you you get more info about the real error condition, I'd be interested, since that's an obvious hole in the error handling.

With FastCGI getting errors your sessions will "die", so you'll want to track that one down a bit more.

First, I assume that you are running the maintenance vm for seaside session expiration and garbage collection.

Next, I would be suspicious that the ObjectLog could be hanging onto a bunch or error continuations that in turn are keeping a bunch of objects alive. The following expression will give you a count of continuations in your ObjectLog:

    (ObjectLogEntry objectLog select: [ :ea | ea hasContinuation ]) size.

If It's a big number then you should consider cleaning out the continuations:

    ObjectLogEntry objectLog removeAll: (ObjectLogEntry objectLog select: [ :ea | ea hasContinuation ])

Run an MFC to reap the garbage.

Before going to far, you should run a file size report:

  SystemRepository fileSizeReport.

to find out how much of your 9Bb repository is data and how much is free space ... just in case you have more free space than you thought.

There are also several persistent caches[1] that you should clean up before getting to far into this.

If you've done all of the above and the repository is still bigger than you think it should be, then it is time to analyze the repository contents and look for leaks.

James has written tool for scanning a backup of your repository and listing instance counts by class ... this report should give you a starting point for tracking down instances of classes that you don't think should be surviving in your repository.

Dale

[1] http://code.google.com/p/glassdb/wiki/ClearPersistentCaches
[2] http://seaside.gemstone.com/ss/ScanBackup.html

----- Original Message -----
| From: "Esteban Lorenzano" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Friday, March 16, 2012 6:30:10 AM
| Subject: [GS/SS Beta] lock not acquired and error
|
| Hi,
|
| I'm having a recurrent error in my production server: more or less
| once a week, sessions start to die with lock not acquired and then
| some nil errors. Something more or less like this:
|
| -----------  Unreportable ERROR Encountered:
| 2012-03-15T03:25:60-07:00UserDefinedError 2318: User defined error,
| <nil>-----------
| -----------  Unreportable ERROR Encountered:
| 2012-03-15T03:51:06-07:00UserDefinedError 2318: User defined error,
| <nil>-----------
| ----------- Lock not acquired - retrying LOG ENTRY:
| WASessionLockNotAcquiredException: Session lock denied:
| 2075-----------
| -----------  Unreportable ERROR Encountered:
| 2012-03-15T06:06:15-07:00UserDefinedError 2318: User defined error,
| <nil>-----------
| -----------  Unreportable ERROR Encountered:
| 2012-03-15T07:06:07-07:00UserDefinedError 2318: User defined error,
| <nil>-----------
| -----------  Unreportable ERROR Encountered:
| 2012-03-15T08:08:21-07:00UserDefinedError 2318: User defined error,
| <nil>-----------
| -----------  Unreportable ERROR Encountered:
| 2012-03-15T09:21:34-07:00UserDefinedError 2318: User defined error,
| <nil>-----------
|
| I think is related (or should be) about WASession objects not being
| garbage collected or something like that (also, my database is 9g
| and should not be bigger than 1.5-2g).
|
| So... How can I clean all old objects?  (I don't want to export data
| and import it again)
| Can this be related? Any other idea?
|
| thanks,
| Esteban
Reply | Threaded
Open this post in threaded view
|

Re: lock not acquired and error

EstebanLM
Hi,

(sorry for stupid questions below, but it was some time I'm not using glass... so I'm rusty)

1) I *think* I have the maintenance vm running... but just to be sure: how can I check that?
2) what means "MFC"?

thanks,
Esteban

El 16/03/2012, a las 12:45p.m., Dale Henrichs escribió:

> Esteban,
>
> The 'Lock not acquired - retrying' message is not indicative of a real problem ... it indicates that you have two concurrent requests for the same session ... this is not unusual if you are using something like Ajax.
>
> The "Unreportable Error" is coming from FSGsSocketServer>>notifyUnreportableError: and indicates that you are getting a FastCGI error of some sort ... odd that you're getting nil. There are 4 senders of that message, so a little probing with print statements should show the root cause ... when you you get more info about the real error condition, I'd be interested, since that's an obvious hole in the error handling.
>
> With FastCGI getting errors your sessions will "die", so you'll want to track that one down a bit more.
>
> First, I assume that you are running the maintenance vm for seaside session expiration and garbage collection.
>
> Next, I would be suspicious that the ObjectLog could be hanging onto a bunch or error continuations that in turn are keeping a bunch of objects alive. The following expression will give you a count of continuations in your ObjectLog:
>
>    (ObjectLogEntry objectLog select: [ :ea | ea hasContinuation ]) size.
>
> If It's a big number then you should consider cleaning out the continuations:
>
>    ObjectLogEntry objectLog removeAll: (ObjectLogEntry objectLog select: [ :ea | ea hasContinuation ])
>
> Run an MFC to reap the garbage.
>
> Before going to far, you should run a file size report:
>
>  SystemRepository fileSizeReport.
>
> to find out how much of your 9Bb repository is data and how much is free space ... just in case you have more free space than you thought.
>
> There are also several persistent caches[1] that you should clean up before getting to far into this.
>
> If you've done all of the above and the repository is still bigger than you think it should be, then it is time to analyze the repository contents and look for leaks.
>
> James has written tool for scanning a backup of your repository and listing instance counts by class ... this report should give you a starting point for tracking down instances of classes that you don't think should be surviving in your repository.
>
> Dale
>
> [1] http://code.google.com/p/glassdb/wiki/ClearPersistentCaches
> [2] http://seaside.gemstone.com/ss/ScanBackup.html
>
> ----- Original Message -----
> | From: "Esteban Lorenzano" <[hidden email]>
> | To: "GemStone Seaside beta discussion" <[hidden email]>
> | Sent: Friday, March 16, 2012 6:30:10 AM
> | Subject: [GS/SS Beta] lock not acquired and error
> |
> | Hi,
> |
> | I'm having a recurrent error in my production server: more or less
> | once a week, sessions start to die with lock not acquired and then
> | some nil errors. Something more or less like this:
> |
> | -----------  Unreportable ERROR Encountered:
> | 2012-03-15T03:25:60-07:00UserDefinedError 2318: User defined error,
> | <nil>-----------
> | -----------  Unreportable ERROR Encountered:
> | 2012-03-15T03:51:06-07:00UserDefinedError 2318: User defined error,
> | <nil>-----------
> | ----------- Lock not acquired - retrying LOG ENTRY:
> | WASessionLockNotAcquiredException: Session lock denied:
> | 2075-----------
> | -----------  Unreportable ERROR Encountered:
> | 2012-03-15T06:06:15-07:00UserDefinedError 2318: User defined error,
> | <nil>-----------
> | -----------  Unreportable ERROR Encountered:
> | 2012-03-15T07:06:07-07:00UserDefinedError 2318: User defined error,
> | <nil>-----------
> | -----------  Unreportable ERROR Encountered:
> | 2012-03-15T08:08:21-07:00UserDefinedError 2318: User defined error,
> | <nil>-----------
> | -----------  Unreportable ERROR Encountered:
> | 2012-03-15T09:21:34-07:00UserDefinedError 2318: User defined error,
> | <nil>-----------
> |
> | I think is related (or should be) about WASession objects not being
> | garbage collected or something like that (also, my database is 9g
> | and should not be bigger than 1.5-2g).
> |
> | So... How can I clean all old objects?  (I don't want to export data
> | and import it again)
> | Can this be related? Any other idea?
> |
> | thanks,
> | Esteban

Reply | Threaded
Open this post in threaded view
|

Re: lock not acquired and error

Dale Henrichs
Esteban,

No problem ... "stupid questions are welcome!" ...

If you are running with a standard setup, then if you are using runSeasideGems30 to start your seaside gems and the maintenance vm is started by that script as well.

In the standard setup you have 3 seaside gems and a maintence gem, so you should see 4 topaz proceses running:

  ps -efw | grep topaz

The log files for the gems should all be written to /opt/gemstone/log and there should be a maintenance_gem.log at the end of the log there will be a header from the topas process starting with the following line:

  | PROCESS ID: 4067      DATE: 12/06/2011 15:20:08 PST                         |

that will give you the process id of the gem ... the log should also record what it's been doing:

Unregistering...2012-03-16T08:25:36.55504608154297-07:00
...Expired: 0 sessions.
Unregistering...2012-03-16T08:26:36.59475708007812-07:00
...Expired: 0 sessions.
Starting markForCollect.: 2012-03-16T08:26:36.602952003479-07:00
--  MFC  denied, vote state 1 = voting, session not voted: session 8 pid 13431 on foos.gemstone.com, 03/16/2012 08:26:36 AM PDT
--  MFC  denied, vote state 1 = voting, session not voted: session 8 pid 13431 on foos.gemstone.com, 03/16/2012 08:28:16 AM PDT
AbortingError 3006: Garbage collection aborted, reason: <' MFC  denied, vote state 1 = voting, session not voted: session 8 pid 13431 on foos.gemstone.com, 03/16/2012 08:28:26 AM PDT'>, conflict code: 2.  Garbage collection was aborted and should be tried again later.
...finished markForCollect.2012-03-16T08:28:36.62052488327026-07:00


The "Unregistering" records indicate that the gem is expiring seaside sessions.

The "Startting markForCollect" records give you the status of the hourly mark for collect (MFC) run. In the case above the MFC did not run, because a session "did not vote" in this case I had a GemTools image sitting idle and logged in ... normal in a development setup, but not a good idea in a production setup...besides blocking the MFC's an open gemtools session connected to production will eventually cause a "Commit record backlog" which has significant performance impacts ...

Dale

----- Original Message -----
| From: "Esteban Lorenzano" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Friday, March 16, 2012 9:17:13 AM
| Subject: Re: [GS/SS Beta] lock not acquired and error
|
| Hi,
|
| (sorry for stupid questions below, but it was some time I'm not using
| glass... so I'm rusty)
|
| 1) I *think* I have the maintenance vm running... but just to be
| sure: how can I check that?
| 2) what means "MFC"?
|
| thanks,
| Esteban
|
| El 16/03/2012, a las 12:45p.m., Dale Henrichs escribió:
|
| > Esteban,
| >
| > The 'Lock not acquired - retrying' message is not indicative of a
| > real problem ... it indicates that you have two concurrent
| > requests for the same session ... this is not unusual if you are
| > using something like Ajax.
| >
| > The "Unreportable Error" is coming from
| > FSGsSocketServer>>notifyUnreportableError: and indicates that you
| > are getting a FastCGI error of some sort ... odd that you're
| > getting nil. There are 4 senders of that message, so a little
| > probing with print statements should show the root cause ... when
| > you you get more info about the real error condition, I'd be
| > interested, since that's an obvious hole in the error handling.
| >
| > With FastCGI getting errors your sessions will "die", so you'll
| > want to track that one down a bit more.
| >
| > First, I assume that you are running the maintenance vm for seaside
| > session expiration and garbage collection.
| >
| > Next, I would be suspicious that the ObjectLog could be hanging
| > onto a bunch or error continuations that in turn are keeping a
| > bunch of objects alive. The following expression will give you a
| > count of continuations in your ObjectLog:
| >
| >    (ObjectLogEntry objectLog select: [ :ea | ea hasContinuation ])
| >    size.
| >
| > If It's a big number then you should consider cleaning out the
| > continuations:
| >
| >    ObjectLogEntry objectLog removeAll: (ObjectLogEntry objectLog
| >    select: [ :ea | ea hasContinuation ])
| >
| > Run an MFC to reap the garbage.
| >
| > Before going to far, you should run a file size report:
| >
| >  SystemRepository fileSizeReport.
| >
| > to find out how much of your 9Bb repository is data and how much is
| > free space ... just in case you have more free space than you
| > thought.
| >
| > There are also several persistent caches[1] that you should clean
| > up before getting to far into this.
| >
| > If you've done all of the above and the repository is still bigger
| > than you think it should be, then it is time to analyze the
| > repository contents and look for leaks.
| >
| > James has written tool for scanning a backup of your repository and
| > listing instance counts by class ... this report should give you a
| > starting point for tracking down instances of classes that you
| > don't think should be surviving in your repository.
| >
| > Dale
| >
| > [1] http://code.google.com/p/glassdb/wiki/ClearPersistentCaches
| > [2] http://seaside.gemstone.com/ss/ScanBackup.html
| >
| > ----- Original Message -----
| > | From: "Esteban Lorenzano" <[hidden email]>
| > | To: "GemStone Seaside beta discussion"
| > | <[hidden email]>
| > | Sent: Friday, March 16, 2012 6:30:10 AM
| > | Subject: [GS/SS Beta] lock not acquired and error
| > |
| > | Hi,
| > |
| > | I'm having a recurrent error in my production server: more or
| > | less
| > | once a week, sessions start to die with lock not acquired and
| > | then
| > | some nil errors. Something more or less like this:
| > |
| > | -----------  Unreportable ERROR Encountered:
| > | 2012-03-15T03:25:60-07:00UserDefinedError 2318: User defined
| > | error,
| > | <nil>-----------
| > | -----------  Unreportable ERROR Encountered:
| > | 2012-03-15T03:51:06-07:00UserDefinedError 2318: User defined
| > | error,
| > | <nil>-----------
| > | ----------- Lock not acquired - retrying LOG ENTRY:
| > | WASessionLockNotAcquiredException: Session lock denied:
| > | 2075-----------
| > | -----------  Unreportable ERROR Encountered:
| > | 2012-03-15T06:06:15-07:00UserDefinedError 2318: User defined
| > | error,
| > | <nil>-----------
| > | -----------  Unreportable ERROR Encountered:
| > | 2012-03-15T07:06:07-07:00UserDefinedError 2318: User defined
| > | error,
| > | <nil>-----------
| > | -----------  Unreportable ERROR Encountered:
| > | 2012-03-15T08:08:21-07:00UserDefinedError 2318: User defined
| > | error,
| > | <nil>-----------
| > | -----------  Unreportable ERROR Encountered:
| > | 2012-03-15T09:21:34-07:00UserDefinedError 2318: User defined
| > | error,
| > | <nil>-----------
| > |
| > | I think is related (or should be) about WASession objects not
| > | being
| > | garbage collected or something like that (also, my database is 9g
| > | and should not be bigger than 1.5-2g).
| > |
| > | So... How can I clean all old objects?  (I don't want to export
| > | data
| > | and import it again)
| > | Can this be related? Any other idea?
| > |
| > | thanks,
| > | Esteban
|
|