Status: Accepted
Owner: [hidden email]
Labels: Type-Defect Priority-Medium GLASS-Server Version-1.0-beta.8.1 Milestone-1.0-beta.8.6

New issue 223 by [hidden email]: "A fatal network protocol error occurred on the Gem to Stone network" [2.4.4.1]
http://code.google.com/p/glassdb/issues/detail?id=223

Johan Brichau and I have seen this error on a Mac. In my case I saw the error while doing a seaside30 load into a virgin 2.4.4.1 extent on both Mac and Linux. The Linux error is tied to Issue 222: when I start the stone with the -s option on Linux, the protocol error goes away.

Unfortunately, the -s option for the stone doesn't fix the Mac-based problem ... I am still characterizing the problem.

Comment #1 on issue 223 by [hidden email]: "A fatal network protocol error occurred on the Gem to Stone network" [2.4.4.1]
http://code.google.com/p/glassdb/issues/detail?id=223

By setting:

  GEM_HALT_ON_ERROR=4034;

in the system.conf (or gem.conf), the gem will stop and allow one to attach gdb to get a stack (on the Mac). On Linux the stack is dumped to the log file. Here is the C stack for Linux:

Thread 1 (Thread 0x7f555ccba720 (LWP 28086)):
#0 0x00007f555c8ad48d in waitpid () from /lib/libpthread.so.0
#1 0x00007f555b5ff332 in forkAndWait (
#2 0x00007f555b5ff4fc in HostPrintCStack ()
#3 0x00007f555b635955 in HostPrintCStack_notifyStone() ()
#4 0x00007f555b5ff855 in HostCallDebuggerMsg (
#5 0x00007f555b5ff8eb in HostCallDebuggerMsg_fl (
#6 0x00007f555b4886a1 in GemSupErr(char const*, unsigned long, int, int, ...)
#7 0x00007f555b61cc1f in StnCallProcessMessage(RepWorkspaceType*) ()
#8 0x00007f555b61d142 in StnCallConverseWithStn_(RepWorkspaceType*, GscMsgBufRefSType*) ()
#9 0x00007f555b50f44a in StnCallUpdateSharedCounter(WorkEntrySType*, int, int, long) ()
#10 0x00007f555b5951a6 in SysPrimUpdatePersistentCounter(IntStateSType*, omObjSType**) () at intloopamd64.asmm4:6258
#11 0x00007f555b54c49c in IntLpBCLoop () at intloopam64.m4:1
#12 0x00007f555b53f829 in IntLpSupControlLoop(IntStateSType*) ()
#13 0x00007f555b53237d in IntContinue(IntStateSType*, omObjSType**, omObjSType**, int, unsigned int, GciErrSType*) ()
#14 0x00007f555b47deb5 in GemDoContinue(unsigned long, unsigned long*, unsigned long, int, GciErrSType*) ()
#15 0x00007f555b4556e6 in dispatchLoop(IntStateSType*, LgcStateSType*, RpcStateSType*) ()
#16 0x00007f555b457497 in GemDoRpcLoop(IntStateSType*, LgcStateSType*, IntGciActStateSType*) ()
#17 0x00007f555b45241e in doConnect() ()
#18 0x00007f555b452fea in dispatchCommand() ()
#19 0x00007f555b45365f in Gdbg(char const*, int) ()
#20 0x00007f555b45818f in gemMain ()
#21 0x00000000004015d7 in main (argc=4, argv=0x7fff60efced8)

As mentioned, starting the stone on Linux with the -s flag appears to avoid the problem.

Comment #2 on issue 223 by [hidden email]: "A fatal network protocol error occurred on the Gem to Stone network" [2.4.4.1]
http://code.google.com/p/glassdb/issues/detail?id=223

Here's the stack from Johan's Mac:

Attaching to process 13344.
Reading symbols for shared libraries . done
Reading symbols for shared libraries ....warning: Could not find object file "/export/orpheus2/users/buildgss/gs64/244x-1/fast42/omdebug.o" - no debug information available for "/export/orpheus2/users/buildgss/gs64/244x-1/src/omdebug.c".
warning: Could not find object file "/export/orpheus2/users/buildgss/gs64/244x-1/fast42/hostdebugmt.o" - no debug information available for "/export/orpheus2/users/buildgss/gs64/244x-1/src/hostdebugmt.c".
warning: Could not find object file "/export/orpheus2/users/buildgss/gs64/244x-1/fast42/hostdebug.o" - no debug information available for "/export/orpheus2/users/buildgss/gs64/244x-1/src/hostdebug.c".
. done
0x00007fff835e8fca in __semwait_signal ()
(gdb) bt
#0 0x00007fff835e8fca in __semwait_signal ()
#1 0x00007fff8360fd56 in pthread_join ()
#2 0x00000001003fc3a5 in RepOobThreadType::joinThread ()
#3 0x00000001003feaeb in StnCallLogout_ ()
#4 0x00000001002eb68b in StnCallReportFatalError ()
#5 0x0000000100417c76 in HostPrintCStack_notifyStone ()
#6 0x00000001003e3afe in HostCallDebuggerMsg ()
#7 0x00000001003e3b72 in HostCallDebuggerMsg_fl ()
#8 0x0000000100258854 in GemSupErr ()
#9 0x00000001003fed41 in StnCallProcessMessage ()
#10 0x0000000100319e23 in IntLpSupProcessInterruptFlag ()
#11 0x0000000100335444 in .LuncacheCheck_Interrupts2 ()
#12 0x00000001003207f8 in IntLpSupControlLoop ()
#13 0x0000000100314c8d in IntExecuteMethod ()
#14 0x000000010024ea82 in executeFromCtx ()
#15 0x000000010024ec60 in GemDoExecuteFromContext ()
#16 0x0000000100204db6 in gciCall5Args ()
#17 0x0000000100206348 in GciExecuteFromContextDbg ()
#18 0x000000010000c759 in TpAuxDispatchCmd ()
#19 0x0000000100005969 in processCommand ()
#20 0x00000001000077d8 in topazMain ()
#21 0x0000000100001424 in start ()
(gdb)

Note that there is no SysPrimUpdatePersistentCounter() function on the stack, so it appears to be a different problem...

Comment #3 on issue 223 by [hidden email]: "A fatal network protocol error occurred on the Gem to Stone network" [2.4.4.1]
http://code.google.com/p/glassdb/issues/detail?id=223

Here's the stack from my Mac ... very similar to Johan's Mac stack:

0x00007fff827d01a6 in poll ()
(gdb) where
#0 0x00007fff827d01a6 in poll ()
#1 0x00000001003e50c1 in HostMilliSleep ()
#2 0x00000001003fce03 in RepOobThreadType::requestThrStop ()
#3 0x00000001003fe715 in StnCallLogout_ ()
#4 0x00000001002eb68b in StnCallReportFatalError ()
#5 0x0000000100417c76 in HostPrintCStack_notifyStone ()
#6 0x00000001003e3afe in HostCallDebuggerMsg ()
#7 0x00000001003e3b72 in HostCallDebuggerMsg_fl ()
#8 0x0000000100258854 in GemSupErr ()
#9 0x00000001003fed41 in StnCallProcessMessage ()
#10 0x0000000100319e23 in IntLpSupProcessInterruptFlag ()
#11 0x0000000100335444 in .LuncacheCheck_Interrupts2 ()
#12 0x00000001003207f8 in IntLpSupControlLoop ()
#13 0x0000000100315a4c in IntContinue ()
#14 0x0000000100248f96 in GemDoContinue ()
#15 0x000000010022294b in dispatchLoop ()
#16 0x0000000100224b01 in GemDoRpcLoop ()
#17 0x000000010021edf8 in doConnect ()
#18 0x0000000100220761 in dispatchCommand ()
#19 0x0000000100220d5f in Gdbg ()
#20 0x0000000100225553 in gemMain ()
#21 0x0000000100000c30 in main ()
(gdb)
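
The comparison made in Comment #2 (whether SysPrimUpdatePersistentCounter() appears on a given stack) can be automated when several backtraces need checking. The following is only a minimal sketch: the helper name frame_functions and the regular expression are illustrative assumptions, not part of gdb or GemStone, and the sample text is a short excerpt of the Linux and Mac stacks posted in this thread.

```python
import re

# Hypothetical helper (not part of gdb or GemStone): matches one gdb frame, e.g.
#   "#9 0x00007f555b50f44a in StnCallUpdateSharedCounter(WorkEntrySType*, ...) ()"
# and captures the function name up to the first "(" or whitespace.
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+([^\s(]+)")


def frame_functions(backtrace):
    """Return the function names found in a gdb backtrace, innermost frame first."""
    return [name for _num, name in FRAME_RE.findall(backtrace)]


# Excerpts copied from the stacks in this thread (frames around GemSupErr).
LINUX_EXCERPT = """
#6 0x00007f555b4886a1 in GemSupErr(char const*, unsigned long, int, int, ...)
#7 0x00007f555b61cc1f in StnCallProcessMessage(RepWorkspaceType*) ()
#8 0x00007f555b61d142 in StnCallConverseWithStn_(RepWorkspaceType*, GscMsgBufRefSType*) ()
#9 0x00007f555b50f44a in StnCallUpdateSharedCounter(WorkEntrySType*, int, int, long) ()
#10 0x00007f555b5951a6 in SysPrimUpdatePersistentCounter(IntStateSType*, omObjSType**) ()
"""

MAC_EXCERPT = """
#8 0x0000000100258854 in GemSupErr ()
#9 0x00000001003fed41 in StnCallProcessMessage ()
#10 0x0000000100319e23 in IntLpSupProcessInterruptFlag ()
#11 0x0000000100335444 in .LuncacheCheck_Interrupts2 ()
#12 0x00000001003207f8 in IntLpSupControlLoop ()
"""

linux = frame_functions(LINUX_EXCERPT)
mac = frame_functions(MAC_EXCERPT)

# Frames that appear in both excerpts, in Linux stack order.
common = [name for name in linux if name in mac]

print("SysPrimUpdatePersistentCounter" in linux)  # True
print("SysPrimUpdatePersistentCounter" in mac)    # False
print(common)  # ['GemSupErr', 'StnCallProcessMessage']
```

Because the pattern does not rely on line breaks, it should also work on backtraces that have been flattened onto a single line by a mail archive.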

Updates:
    Labels: bugid-40330

Comment #4 on issue 223 by [hidden email]: "A fatal network protocol error occurred on the Gem to Stone network" [2.4.4.1]
http://code.google.com/p/glassdb/issues/detail?id=223

This is an instance of GS/64 server bug 40330, introduced in version 2.3.0, fixed in versions 2.5 and 3.0 of the server product:

Bug Number: 40330
Bug Comment: protocol errors from StnCallUpdateSharedCounter
Mail From: bille
Date: Thu Feb 10 2011 10:16

BugNote Comment:
Version: 2.3.0, 2.3.1, 2.3.1.1, 2.3.1.2, 2.3.1.4, 2.3.1.5, 2.3.1.6, 2.3.1.7, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.4.1, 2.4.4.2, 2.4.4.3, 2.4.4.4
Platform: All platforms
BugNote Title: Changing shared counters can trigger gsErrStnNetProtocol error
BugNote Text: Large-scale applications making heavy use of shared counters may occasionally hit #gsErrStnNetProtocol errors (error 4034).

Affected methods include:
  System>>_updateSharedCounterAt:by:withOpCode:
  System>>persistentCounterAt:put:
  System>>persistentCounterAt:incrementBy:
  System>>persistentCounterAt:decrementBy:

Workaround: No workaround.
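
For anyone triaging a similar report, the version list in the BugNote can be turned into a quick lookup. This is only a sketch: the names AFFECTED_VERSIONS and is_listed_as_affected are hypothetical, and the set simply copies the versions listed for bug 40330 above. A version missing from the list is not proven unaffected; it is merely not named in the BugNote.

```python
# Versions listed as affected in the BugNote for bug 40330 (copied from this thread).
AFFECTED_VERSIONS = frozenset({
    "2.3.0", "2.3.1", "2.3.1.1", "2.3.1.2", "2.3.1.4", "2.3.1.5",
    "2.3.1.6", "2.3.1.7", "2.4.0", "2.4.1", "2.4.2", "2.4.3",
    "2.4.4", "2.4.4.1", "2.4.4.2", "2.4.4.3", "2.4.4.4",
})


def is_listed_as_affected(version):
    """Return True if the given server version string appears in the BugNote list."""
    return version.strip() in AFFECTED_VERSIONS


print(is_listed_as_affected("2.4.4.1"))  # True (the version in this issue)
print(is_listed_as_affected("2.5"))      # False (reported as fixed)
print(is_listed_as_affected("3.0"))      # False (reported as fixed)
```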