Hi,
In some situations, we end up with errors in an equality index. Since the comment of UnsortedCollection>>auditIndexes says:

    If the audit returns errors, the indexes should be dropped and rebuilt
    and the incident should be reported to Gemstone support for analysis.

... I guess I need to send an email here :-)

The scenario unfolds *sometimes* when we abort a transaction that had modified the indexed collections. I guess this scenario is what is meant on p.124 of the GS Prog guide in the small paragraph:

"If you modify objects that participate in an index, try to commit your transaction, and your commit operation fails, query results can become inconsistent. If this occurs, abort the transaction and try again."

Although we do abort the transaction, so it still is not quite that scenario?

The setup is as follows:
- 2 equality indexes created on a collection of type Set
- both indexes are created on an instance variable of type DateAndTime

The biggest problem is that the index error eventually means I'm getting bad query results back, which are displayed to the user. So, at this time it seems as if I need to run an audit index after each tx abort that changed the index?

Somehow, that seems like quite an overhead...

Johan
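For readers following along, the setup and the audit-after-abort workaround described above could look roughly like the sketch below. It is an illustration under assumptions, not Johan's actual code: the global name #Events and the instance variable names 'start' and 'end' are hypothetical, and the thread does not show what auditIndexes returns, so the result is simply left as the value of the expression.

```smalltalk
"Sketch only. #Events, 'start' and 'end' are hypothetical names."
| events |
events := UserGlobals at: #Events put: Set new.

"Two equality indexes, each on an instance variable holding a DateAndTime."
events createEqualityIndexOn: 'start' withLastElementClass: DateAndTime.
events createEqualityIndexOn: 'end' withLastElementClass: DateAndTime.
System commitTransaction.

"... later: a transaction that modified indexed objects is abandoned ..."
System abortTransaction.

"Audit the indexes after the abort; per the method comment quoted above,
 errors here mean the indexes should be dropped and rebuilt."
events auditIndexes
```

Running the audit after every abort walks the index structures each time, which is the overhead the message is worried about.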
Johan,
If you could supply a test case that helps us track this down ...

Do you use any of the commit-when-almost-out-of-memory handlers? It could be that a commit of partial results followed by an abort is the culprit ...

When we get these types of errors, we usually do an analysis of the tranlogs and the extent ... so in the absence of a reproducible test case we may end up needing to get your extent and tranlogs along with the oops of the objects involved... tranlog analysis will give us a trace of the commits and aborts that affect the object.

There are some tranlog analysis scripts that are shipped with the product... I should be able to give you some instructions for at least the first order analysis that you can run yourself as well...

I'll also need to check to see if there are any recent bug reports/fixes that apply...

So let me do a little research here on my end and I'll get back to you..

Dale

On 04/18/2011 08:54 AM, Johan Brichau wrote:
> In some situations, we end up with errors in an equality index.
> [...]
Hi Dale,
No, we are not using the almost-out-of-mem handlers.
Our tx strategy is the standard GLASS tx strategy (i.e. one tx per request) where we also trigger an abort tx (and retry request) when a semantic conflict happened in our own tx blocks nested inside the application code.

I will do my best to reconstruct a testcase, but right now, it's trial and error. The error pops up quite frequently though, but not deterministically (as far as I can see).

thanks for looking into this!
Johan

On 18 Apr 2011, at 18:13, Dale Henrichs wrote:
> If you could supply a test case that help us track this down ...
> [...]
Johan.
Here's a quick example of a script that you can use to start getting a handle on what might be going on.

This Smalltalk code creates a persistent set and a DateAndTime, dumps some data into the set, commits, adds the date and aborts, then adds the date again and commits. It returns the oop of the set and the oop of the date:

    | set date |
    "Make the set persistent by rooting it in UserGlobals"
    UserGlobals at: #TEST_JOHAN put: (set := Set new).
    date := DateAndTime now.
    1 to: 100 do: [:i | set add: i; add: i printString ].
    System commitTransaction.
    "Modify the set, then throw the change away"
    set add: date.
    System abortTransaction.
    "Make the same change again and commit it this time"
    set add: date.
    System commitTransaction.
    { set asOop. date asOop }.

If you run the following shell script in the directory where your tranlogs are located (or copy the tranlogs to the current directory), it will dump out information from the tranlogs about the lifetime of the objects:

    $GEMSTONE/bin/searchlogs.sh searchlogs.sh <set oop> <date oop>

In my case I ran:

    $GEMSTONE/bin/searchlogs.sh searchlogs.sh 33649665 33648897

to get the following report:

    3522.159.0 BeginData session: 5 beginId:(289789 0) user: DataCurator gemhost: foos.gemstone.com clientIP: 10.80.250.
        newobj 33649665 cls 102401 onPage 343,
        newobj 33648897 cls 7908097 onPage 343,
    3522.191.0 Commit session: 5 beginId:(289789 0) user: DataCurator gemhost: foos.gemstone.com clientIP: 10.80.250.
        timeWritten: 04/18/2011 10:08:10 AM PDT
    3522.192.0 BeginData session: 5 beginId:(289804 3) user: DataCurator gemhost: foos.gemstone.com clientIP: 10.80.250.
        object 33649665 cls 102401 onPage 337,
    3522.192.1 Commit session: 5 beginId:(289804 3) user: DataCurator gemhost: foos.gemstone.com clientIP: 10.80.250.
        timeWritten: 04/18/2011 10:08:10 AM PDT

From this you can see the creation of the two objects (newobj) and the commit in record 3522.191.0, then the modification of the set in the commit in record 3522.192.1.

If you can get access to the oops of the objects involved in the index corruption (set, btree, and objects referencing the DateAndTime objects, etc.) we should be able to get a picture of their lifetime and start getting a picture of what might have gone bad ... so if you have old audit results and tranlogs you could get a quick picture ...

Send the resulting output with a description of the various oops and I should be able to interpret...

The searchlogs.sh script is shipped in 2.4.4.1.

There is more detailed analysis that can be done, but this can give us a quick overview and start to point to where we can look next.

Dale

On 04/18/2011 10:06 AM, Johan Brichau wrote:
> No, we are not using the almost-out-of-mem handlers.
> [...]
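As a first pass at gathering the oops asked for above, something like the following sketch could be run against the affected collection. It is an assumption-laden illustration: it reuses the #TEST_JOHAN key from the example script as a stand-in for the real collection, and it only collects the set itself and any DateAndTime elements it holds directly; the oops of the index btree and of objects that reference the DateAndTime values would have to be found separately.

```smalltalk
"Sketch only. #TEST_JOHAN stands in for the real indexed collection."
| set oops |
set := UserGlobals at: #TEST_JOHAN.
oops := OrderedCollection new.

"The collection's own oop comes first."
oops add: set asOop.

"Then the oop of every DateAndTime held directly in the set."
set do: [:each |
    (each isKindOf: DateAndTime)
        ifTrue: [ oops add: each asOop ] ].

"The resulting list of oops can be passed to searchlogs.sh."
oops
```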