A fatal network protocol error occurred on the Gem to Stone network ?

A fatal network protocol error occurred on the Gem to Stone network ?

Johan Brichau-2
Hi all,

Topaz is throwing the following error message at me.
I'm running a script that loops to create a lot of objects in the database (I'm benchmarking my database size requirements). It is expected to run for quite some time, but after a couple of minutes this error occurs.

What is happening?

btw: Best wishes!
Johan

===========

Unexpected packet received from Stone:
 2,
[Info]: Logging out at 01/04/11 16:09:04 CET
-----------------------------------------------------
GemStone: Error         Fatal
A fatal network protocol error occurred on the Gem to Stone network.,
22
Error Category: [GemStone] Number: 4034 Arg Count: 1
Arg 1: 178

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Sounble, Hassan
Maybe you are not committing frequently enough, and hence are getting a large CR (commit record) backlog?
Hassan
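
A minimal sketch of what Hassan suggests, written as the body of a topaz run command: create the benchmark objects in fixed-size batches and commit after each batch, so that new objects are regularly written out of temporary object memory and the session keeps refreshing its view of the repository instead of holding on to an old one. The global name #BenchRoot, the batch size, the total count and the shape of each object are hypothetical placeholders, not Johan's actual script.

```smalltalk
"Sketch only: #BenchRoot, batchSize and total are hypothetical values.
 Committing after every batch writes the new objects to the repository
 and gives the session a fresh view instead of an old one."
| root batchSize total |
batchSize := 1000.
total := 100000.
root := OrderedCollection new.
UserGlobals at: #BenchRoot put: root.
System commitTransaction
    ifFalse: [nil error: 'initial commit failed'].
1 to: total do: [:i |
    root add: (Array with: i with: i printString).
    i \\ batchSize = 0
        ifTrue: [System commitTransaction
            ifFalse: [nil error: 'commit failed at object ', i printString]]].
"Commit whatever is left over from a final partial batch."
System commitTransaction
    ifFalse: [nil error: 'final commit failed'].
root size
```

The batch size trades commit overhead against the amount of uncommitted data the session carries between commits.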

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan,

Could you check the stone log? I would expect a corresponding message
in the stone log that would give us a bit more information.

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Johan Brichau-2
Hi Dale,

I checked the stone log and, as impossible as that may seem, there is no mention whatsoever of the incident.

The last log entries right before the incident are in the reclaimgcgem and admingcgem logs, and they indicate normal operation (to me).
Just to be sure: you do mean $GEMSTONE/log/seaside.log?

I also found out that we seem to be leaking repository space at an enormous rate (at least in the script that is filling my db). After clearing my "db root" from UserGlobals, no repository space is regained. When I trace a reference path to the instances that are still present (but should be gone), I always end up with a reference path that starts with: SharedDependencyLists -> aDepListBucket(...) -> aDependencyList( ... ) -> ...
The problem is that this doesn't teach me much about why these objects are kept alive :-(

Each iteration of the script's loop is a series of transactions, because otherwise I was running out of temp space.

The two issues might be related, or might not; what they have in common is that we don't see either of them in normal Seaside operation.

Thanks for reading this far; if you can shed more light on this, that would be awesome ;-)
Johan

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Johan Brichau-2
I found the cause of our 'repository space leak': it seems we are keeping a lot of indexes alive that we thought were dead, and therefore markable and sweepable.

Okay, I need to re-read that part of the GS manual to fix it for sure ;-)

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Johan Brichau-2
Hey Johan,

The protocol error that you are getting in topaz is usually generated
when the topaz (or gem) executable is from a different release than the
stone it is running against, and I initially assumed that the error
would happen immediately upon login... But it is possible that one
particular message between the stone and topaz is running into the
protocol error.

So you should make sure that your topaz and stone executables are from
the same version by comparing the most recent header in the stone log
($GEMSTONE/log/seaside.log):

+-----------------------------------------------------------------------------+
|    PROGRAM: STONE, Stone Repository Monitor
|    VERSION: 2.4.4.1, Fri Jul  9 10:53:12 2010 dhenrich private build
|      BUILD: gss64_2_4_4_x_branch-23807-PRIVATE
|  BUILT FOR: x86-64 (Linux)

and the header that topaz prints when you start it:

+-----------------------------------------------------------------------------+
|    PROGRAM: topaz, Linear GemStone Interface (Remote Session)
|    VERSION: 2.4.4.1, Fri Jul  9 10:53:12 2010 dhenrich private build
|      BUILD: gss64_2_4_4_x_branch-23807 PRIVATE
|  BUILT FOR: x86-64 (Linux)
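
Besides eyeballing the two headers, a logged-in session can compare the versions itself. This is a minimal sketch that is not from the thread; it assumes that System class>>gemVersionReport and System class>>stoneVersionReport exist in your release and answer dictionaries with a 'gsVersion' entry, so please confirm that in your image first.

```smalltalk
"Sketch, assuming gemVersionReport/stoneVersionReport exist and
 contain a 'gsVersion' entry. Answers the gem version, the stone
 version, and whether the two strings are equal."
| gemVersion stoneVersion |
gemVersion := System gemVersionReport at: 'gsVersion'.
stoneVersion := System stoneVersionReport at: 'gsVersion'.
Array with: gemVersion with: stoneVersion with: gemVersion = stoneVersion
```

Matching version strings are not the whole story: both builds quoted in this thread (23807 and 23813) report 2.4.4.1, so the BUILD lines of the headers are still worth comparing.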

Regarding the "leaking repository space": if you have entries in the
SharedDependencyLists, then you are most likely using indexes.
Indexed collections are strongly referenced from internal data structures
(SharedDependencyLists is just one of them). Before dropping an indexed
collection on the floor, you need to explicitly remove the indexes on
the collection. You can do a bulk cleanup of all indexes with the
following expression:

   IndexManager current removeAllIndexes.

The IndexManager has strong references to all collections that have indexes:

   IndexManager current getAllNSCRoots

returns the set of all collections with indexes...
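
Applied to a single collection, the advice above might look like the following sketch. It assumes a hypothetical indexed collection stored under the global #IndexedBenchRoot, and that your release provides UnorderedCollection>>removeAllIndexes as the per-collection counterpart of the IndexManager expression (please check it in your image). The last statement uses getAllNSCRoots to confirm the collection is no longer registered as an indexed root.

```smalltalk
"Sketch: #IndexedBenchRoot is a hypothetical global holding an indexed
 collection. Remove its indexes first, then drop the root reference and
 commit, so the indexing structures no longer keep it alive."
| coll |
coll := UserGlobals at: #IndexedBenchRoot.
coll removeAllIndexes.
UserGlobals removeKey: #IndexedBenchRoot.
System commitTransaction
    ifFalse: [nil error: 'commit failed'].
"Should answer false once the indexes are gone."
IndexManager current getAllNSCRoots includes: coll
```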

If you are not using indexes, then you must be using modification
tracking (the indexing system uses modification tracking). If you _are_
using modification tracking, then you should call DependencyList
class>>removeTracker:for: when you no longer want to track the object
(i.e., when you are dropping it on the floor).

You can do a bulk cleanup of the DependencyList with the following:

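   | keys |
   "Detach the dependency list from every object that has one.
    This also corrupts any existing indexes."
   keys := DependencyList depMapKeys.
   keys do: [:key |
     DependencyList set: nil for: key
   ].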

Note that doing the above will corrupt any indexes that may exist, so
this expression is really only useful when you are dropping
everything on the floor...
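
To see whether a cleanup actually released anything, it can help to take a measurement before and after. This small diagnostic sketch uses only the two calls mentioned above, changes nothing in the repository, and answers an Array that topaz prints.

```smalltalk
"Read-only diagnostic: how many collections are still registered as
 indexed roots, and how many objects still have a dependency list.
 Run it before and after a cleanup and compare the two numbers."
| indexedRoots depKeys |
indexedRoots := IndexManager current getAllNSCRoots.
depKeys := DependencyList depMapKeys.
Array with: indexedRoots size with: depKeys size
```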

Hope this helps...

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Johan Brichau-2
I guess our emails crossed paths on the internets :) At least I've given you
a clue or two about how to clean up the leftover indexes :)

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Johan Brichau-2
In reply to this post by Dale Henrichs
Dale,

It seems I'm getting this error only on Mac installations (tried 2 separate installations).
I'm doing the exact same experiment on a Linux installation and it never pops up.

On 05 Jan 2011, at 21:17, Dale Henrichs wrote:

> The protocol error that you are getting in topaz is usually generated
> when the topaz (or gem) executable is from a different release than the
> stone that it is running against, and I initially thought that the error
> should/would happen immediately upon login.....But there is the
> possibility that a certain message between the stone and topaz is
> running into the protocol error.
>
> So you should ensure that your topaz and stone executables are for the
> same version by looking at the most recent header in the stone log
> ($GEMSTONE/log/seaside.log):

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Johan Brichau-2

On 08 Jan 2011, at 12:48, Johan Brichau wrote:

> It seems I'm getting this error only on Mac installations (tried 2 separate installations).
> I'm doing the exact same experiment on a linux installation and it never pops up.

I have to withdraw that statement: it fails under Linux too.
It just takes more time before it fails.

Johan

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan continues to get this error in 2.4.4.1 on a regular basis. It
happens more frequently on Mac than on Linux, but he has seen it on both
platforms. This isn't happening in production...

His topaz session and stone are from the same build:

+-----------------------------------------------------------------------------+
|    PROGRAM: STONE, Stone Repository Monitor
|    VERSION: 2.4.4.1, Tue Jul 13 15:19:59 2010
|      BUILD: gss64_2_4_4_x_branch-23813

+-----------------------------------------------------------------------------+
|    PROGRAM: topaz, Linear GemStone Interface (Linked Session)
|    VERSION: 2.4.4.1, Tue Jul 13 15:19:59 2010
|      BUILD: gss64_2_4_4_x_branch-23813
|  BUILT FOR: Darwin (Apple Macintosh x86)

Any other ideas about how to go about debugging this?

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan,

I'm checking on what we should try as the next step for debugging this...

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Dale Henrichs
Whoops ... meant to send this to an internal mailing list :)

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
In reply to this post by Dale Henrichs
Johan,

For the next step:

   Set GEM_HALT_ON_ERROR=4034 (in your system.conf file) and send us
   the stacks from the gem log file.

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Johan Brichau-2
Hi Dale,

This is the only thing I'm getting:


Unexpected packet received from Stone:
 2, GemStone error 4034 matches GEMCFG_HALT_ON_ERROR config value.

HostCallDebugger: invoked at: Tue Jan 11 22:54:10 CET 2011

UTL_GUARANTEE failed, File /export/orpheus2/users/buildgss/gs64/244x-1/src/gemsup.c line 1051

Begin attempt to print C-level stack at: Tue Jan 11 22:54:10 CET 2011


End of C-level stack:


hostcalldebugger invoked in process 3935, at 01/11/11 22:54:10.866 CET
 notifying stone of fatal error

[Info]: Logging out at 01/11/11 22:54:12 CET


Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
Johan,

Apparently the C-stack printing on the Mac is broken. Here's the
recommendation from one of our developers:

   On the Mac, you have to be superuser to run gdb, so the automatic C
   stack printing is broken. Once the process goes into HostCallDebugger,
   he can run 'sudo gdb' and attach to the process to get the stack. Or
   just run on Linux.

If you're not familiar with gdb:

   Once you get the (gdb) prompt after 'sudo gdb', attach to the gem
   process with 'attach <pid>', using the pid of the gem process.
   Then type 'where' to get the stack.

Dale

Re: A fatal network protocol error occurred on the Gem to Stone network ?

Dale Henrichs
Johan,

One more detail if you want to use gdb to get the stack:

   you'll also want to set GS_CORE_TIME_OUT in your shell environment
   (or in the environment of the netldi, if you are running RPC sessions)
   to the number of seconds you want processes to hang around after
   entering HostCallDebugger().

The default timeout is about 60 seconds, so you should raise it if you
need to attach with gdb or won't be watching the process closely. In
development our timeout is set to a couple of days, so that the process
will survive a weekend and still be attachable...

Dale
