errors in indexes

errors in indexes

Johan Brichau
Hi,

In some situations, we end up with errors in an equality index.
Since the comment of UnsortedCollection>>auditIndexes says:
 
        If the audit returns errors, the indexes should be dropped and rebuilt
  and the incident should be reported to Gemstone support for analysis.

... I guess I need to send an email here :-)

The scenario unfolds *sometimes* when we abort a transaction that had modified the indexed collections.
I guess this scenario is what is meant on p. 124 of the GS Programming Guide, in the small paragraph:

"If you modify objects that participate in an index, try to commit your transaction, and your commit operation fails, query results can become inconsistent. If this occurs, abort the transaction and try again."

However, in our case we just abort the transaction, so perhaps it is not quite that scenario?

The setup is as follows:
- 2 equality indexes created on a collection of type Set
- both indexes are created on an instance variable of type DateAndTime

The biggest problem is that the index error eventually leads to bad query results, which are displayed to the user. So, for now, it seems I need to run an index audit after each tx abort that changed the index?

Somehow, that seems like quite an overhead...

Johan

Re: errors in indexes

Dale Henrichs
Johan,

If you could supply a test case that helps us track this down ...

Do you use any of the commit-when-almost-out-of-memory handlers? It
could be that a commit of partial results followed by an abort is
the culprit ...

When we get these types of errors, we usually do an analysis of the
tranlogs and the extent ... so in the absence of a reproducible test
case, we may end up needing your extent and tranlogs along with
the oops of the objects involved. Tranlog analysis will give us a trace
of the commits and aborts that affect the objects.

There are some tranlog analysis scripts that are shipped with the
product...I should be able to give you some instructions for at least
the first order analysis that you can run yourself as well...

I'll also need to check to see if there are any recent bug reports/fixes
that apply...

So let me do a little research here on my end and I'll get back to you.

Dale


(Reply to Johan Brichau's message of 04/18/2011 08:54 AM.)

Re: errors in indexes

Johan Brichau
Hi Dale,

No, we are not using the almost-out-of-mem handlers.
Our tx strategy is the standard GLASS tx strategy (i.e., one tx per request), in which we also abort the tx (and retry the request) when a semantic conflict occurs in our own tx blocks nested inside the application code.

I will do my best to reconstruct a test case, but right now it's trial and error. The error pops up quite frequently, but not deterministically (as far as I can see).

Thanks for looking into this!
Johan


(Reply to Dale Henrichs's message of 18 Apr 2011, 18:13.)

Re: errors in indexes

Dale Henrichs
Johan,

Here's a quick example of a script that you can use to start getting a
handle on what might be going on:

This Smalltalk code creates a persistent set and a DateAndTime, dumps
some data into the set, and commits. It then adds the date to the set,
aborts, adds the date again, and commits. Finally, it returns the oop
of the set and the oop of the date:

| set date |
UserGlobals at: #TEST_JOHAN put: (set := Set new).
date := DateAndTime now.

1 to: 100 do: [:i |
        set add: i; add: i printString ].
System commitTransaction.

set add: date.

System abortTransaction.

set add: date.

System commitTransaction.

{ set asOop. date asOop }.

Run the following shell script in the directory where your tranlogs are
located (or copy the tranlogs to the current directory). It will dump
out information from the tranlogs about the lifetime of the objects:

   $GEMSTONE/bin/searchlogs.sh searchlogs.sh <set oop> <date oop>

In my case I ran:

   $GEMSTONE/bin/searchlogs.sh searchlogs.sh 33649665 33648897

to get the following report:

3522.159.0  BeginData session: 5 beginId:(289789 0) user: DataCurator
     gemhost: foos.gemstone.com clientIP: 10.80.250.
   newobj  33649665 cls 102401 onPage 343,  newobj  33648897 cls 7908097
     onPage 343,
3522.191.0  Commit session: 5 beginId:(289789 0) user: DataCurator
     gemhost: foos.gemstone.com clientIP: 10.80.250.
   timeWritten: 04/18/2011 10:08:10 AM PDT

3522.192.0  BeginData session: 5 beginId:(289804 3) user: DataCurator
     gemhost: foos.gemstone.com clientIP: 10.80.250.
   object  33649665 cls 102401 onPage 337,
3522.192.1  Commit session: 5 beginId:(289804 3) user: DataCurator
     gemhost: foos.gemstone.com clientIP: 10.80.250.
   timeWritten: 04/18/2011 10:08:10 AM PDT

From this you can see the creation of the two objects (newobj) and
their commit in record 3522.191.0, then the modification of the set
(record 3522.192.0) and its commit in record 3522.192.1.
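For longer reports, the per-record entries can be pulled out mechanically. The sketch below is a hypothetical helper (summarize_report is not part of the GemStone product); it assumes the report layout shown above, where each record header starts with a record id such as 3522.159.0 and object entries appear as "newobj <oop>" (created) or "object <oop>" (modified). It prints one line per record/oop pair, optionally filtered to a single oop.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GemStone): summarize a searchlogs.sh
# report read on stdin. Assumes the layout of the sample report:
#   - a record header line starts with a record id such as 3522.159.0
#   - created objects appear as "newobj <oop>", modified ones as "object <oop>"
# Prints "<record id> <newobj|object> <oop>" for each entry. If an oop is
# passed as the first argument, only entries for that oop are printed.
summarize_report() {
  awk -v want="${1:-}" '
    # Remember the id of the current tranlog record; headers hold no entries.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+$/ { record = $1; next }
    {
      # Scan every field, since one line can hold several entries;
      # the oop is the field right after the newobj/object keyword.
      for (i = 1; i < NF; i++) {
        if (($i == "newobj" || $i == "object") && (want == "" || $(i + 1) == want)) {
          print record, $i, $(i + 1)
        }
      }
    }
  '
}

# Example (report.txt is a saved searchlogs.sh report):
#   summarize_report 33649665 < report.txt
```

On the sample report above, filtering on the set's oop would list its creation in record 3522.159.0 and its modification in record 3522.192.0, which is the same lifetime read off by hand.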

If you can get the oops of the objects involved in the index
corruption (the set, the btree, the objects referencing the DateAndTime
objects, etc.), we should be able to trace their lifetimes and start
getting a picture of what might have gone bad ... so if you have old
audit results and tranlogs, you could get a quick picture ...

Send the resulting output with a description of the various oops, and I
should be able to interpret it.

The searchlogs.sh script ships with 2.4.4.1.

There is more detailed analysis that can be done, but this can give us a
quick overview and start to point to where we can look next.

Dale

(Reply to Johan Brichau's message of 04/18/2011 10:06 AM.)