Random Internal Server Error

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Random Internal Server Error

keropotter
Hi!!

This time i need to ask about a random "internal server error" that occurs on our application. Looking at the fastCGI logs (we have 3 fastCGI running) we get this kind of lines when the error occurs:


Write-Dependency Conflicts...
Write-ReadLock Conflicts...
Write-WriteLock Conflicts...
Rc-Write-Write Conflicts...
Synchronized-Commit Conflicts...
----------- Commit failure - retrying LOG ENTRY: aSymbolDictionary-----------
-----------  Unreportable ERROR Encountered: 2012-04-19T11:40:51.53651309013367-07:00UserDefinedError: send(69,0x2b7e586c3e10,117,0) failed with errno=107,ENOTCONN, Socket is not connected-----------
failure
Read-Write Conflicts...
Write-Write Conflicts...
    16975315969
    16975317249
    16975319297
    16975674881
    16975675137
    16975675393
    16975675649
    16975675905
    16976480513
    16976484353
    16976484609
    16976484865
    16976485121
    16976485377
    16976485889
    16976486145
    16976486401
    16976486657
    16976486913
    16979452673
    16984274177
Write-Dependency Conflicts...
Write-ReadLock Conflicts...
Write-WriteLock Conflicts...
Rc-Write-Write Conflicts...
Synchronized-Commit Conflicts...
----------- Commit failure - retrying LOG ENTRY: aSymbolDictionary-----------
-----------  Unreportable ERROR Encountered: 2012-04-19T11:41:05.54662108421326-07:00UserDefinedError: send(70,0x2b7e587f2688,117,0) failed with errno=107,ENOTCONN, Socket is not connected-----------
-----------  Unreportable ERROR Encountered: 2012-04-19T16:16:42.11317491531372-07:00UserDefinedError: send(69,0x2b7e5874a2e0,94,0) failed with errno=107,ENOTCONN, Socket is not connected-----------





Does anybody know how to solve this issue or why it happens? It's absolutely random. Sometimes you just click on a link that requires almost no processing on the server and it crashes like that.

We're working on Gemstone 3.0.1 64bits, Seaside 3.0.

Any hints? We're about to go live with the project and this should not happen on a production environment.

Thanks in advance!!
Alejandro
Reply | Threaded
Open this post in threaded view
|

Re: Random Internal Server Error

Dale Henrichs
Alejandro,

You have two different errors showing up in the log:

  Commit failure
  ENOTCONN

The "Commit failure" is not quite as bad as it looks. The system is actually working as expected:

  When you get a commit failure while processing a transaction, the system aborts
  and retries the original HTTP request. If you get 10 retry failures, you'll get a
  "Too many retries" log entry.

Since you have reported any "Too many retries" messages from your log, then it is likely that the commit failures are affecting the user's experience.

With that said, I agree that you should dig into this a bit deeper ... under load a commit conflict like this may turn into a 'real problem'.

If you look at GRGemstonePlatfrom>>seasideProcessRequestWtihRetry:resultBlock: you'll see the code that is logging the commit conflict:

  self doCommitTransaction
  ifFalse: [ | conflicts |
    conflicts := System transactionConflicts.
    self doAbortTransaction.
    self
    saveLogEntry: (WAObjectLogEntry warn: 'Commit failure - retrying' request: aNativeRequest object: conflicts)  
    shouldCommit: true.
    ^nil "retry request"].

Notice that we are using the #saveLogEntry:shouldCommit: and passing in the result of System class>>transactionConflicts, which is a dictionary reporting the objects that are involved in the commit conflict. Besides writing to the gem log file (those numbers are oops), the dictionary is stashed in the Object Log.

You can inspect the Object Log from a GemTools image (see the ObjectLog class for access methods) or better, you can look at the Object Log through Seaside. If you haven't already enabled the object log tool, evaluate the following:

  WAAdmin
        register: WAObjectLog
        asApplicationAt: 'tools/objectLog'
        user: 'admin' password: 'tool'.

For a production systems you want to make sure that your customer facing web browser will not allow access to 'tools/objectLog' ... oh and to be safe, change the password:). Then set up a separate seaside gem that runs Swazoo instead of fastCGI and make the port for Swazoo accessible to your development engineers.

When you hit the object log tool, you'll be able to filter out all log entries except errors and inspect the conflicts dictionary directly ... with this you can see which classes are involved in the conflicts:

  You are getting write-write conflicts which means that two
  separate Seaside requests are concurrently updating the same object.

If the objects look to be Seaside session state objects, then it is likely a bug that I will have to fix. If the objects are your business objects, then you will have to address the problem ... Most likely you are updating a shared collection of users, inventory or something similar and you will need to replace the collection with one of the RC classes.

When you've characterized your issue a bit I can help there ...

If you can't tell why the conflicts are occurring then I'l try to help with debugging (there are some techniques that you can try).
 
The ENOTCONN sounds like a case where the client disconnected before we had finished processing. The error is coming from FSGsSocketServer and it could provide more information[1] to help with understanding the root cause ...

Basically, an ENOTCONN during accept (one of the callers) has a different significance, than getting an ENOTCONN during a read or write (client disconnect)...

Dale

[1] http://code.google.com/p/glassdb/issues/detail?id=339
----- Original Message -----
| From: "keropotter" <[hidden email]>
| To: [hidden email]
| Sent: Saturday, April 21, 2012 6:45:42 PM
| Subject: [GS/SS Beta] Random Internal Server Error
|
| Hi!!
|
| This time i need to ask about a random "internal server error" that
| occurs
| on our application. Looking at the fastCGI logs (we have 3 fastCGI
| running)
| we get this kind of lines when the error occurs:
|
|
| Write-Dependency Conflicts...
| Write-ReadLock Conflicts...
| Write-WriteLock Conflicts...
| Rc-Write-Write Conflicts...
| Synchronized-Commit Conflicts...
| ----------- Commit failure - retrying LOG ENTRY:
| aSymbolDictionary-----------
| -----------  Unreportable ERROR Encountered:
| 2012-04-19T11:40:51.53651309013367-07:00UserDefinedError:
| send(69,0x2b7e586c3e10,117,0) failed with errno=107,ENOTCONN, Socket
| is not
| connected-----------
| failure
| Read-Write Conflicts...
| Write-Write Conflicts...
|     16975315969
|     16975317249
|     16975319297
|     16975674881
|     16975675137
|     16975675393
|     16975675649
|     16975675905
|     16976480513
|     16976484353
|     16976484609
|     16976484865
|     16976485121
|     16976485377
|     16976485889
|     16976486145
|     16976486401
|     16976486657
|     16976486913
|     16979452673
|     16984274177
| Write-Dependency Conflicts...
| Write-ReadLock Conflicts...
| Write-WriteLock Conflicts...
| Rc-Write-Write Conflicts...
| Synchronized-Commit Conflicts...
| ----------- Commit failure - retrying LOG ENTRY:
| aSymbolDictionary-----------
| -----------  Unreportable ERROR Encountered:
| 2012-04-19T11:41:05.54662108421326-07:00UserDefinedError:
| send(70,0x2b7e587f2688,117,0) failed with errno=107,ENOTCONN, Socket
| is not
| connected-----------
| -----------  Unreportable ERROR Encountered:
| 2012-04-19T16:16:42.11317491531372-07:00UserDefinedError:
| send(69,0x2b7e5874a2e0,94,0) failed with errno=107,ENOTCONN, Socket
| is not
| connected-----------
|
|
|
|
|
| Does anybody know how to solve this issue or why it happens? It's
| absolutely
| random. Sometimes you just click on a link that requires almost no
| processing on the server and it crashes like that.
|
| We're working on Gemstone 3.0.1 64bits, Seaside 3.0.
|
| Any hints? We're about to go live with the project and this should
| not
| happen on a production environment.
|
| Thanks in advance!!
| Alejandro
|
| --
| View this message in context:
| http://forum.world.st/Random-Internal-Server-Error-tp4577501p4577501.html
| Sent from the GLASS mailing list archive at Nabble.com.
|