Hello everybody,
I need some advice from people who know the inner workings of Glorp. First, let me describe the background.

I am currently confronted with an older VW application which is based on the ObjectLens and uses an Oracle database. It has performance problems, because the ObjectLens executes inserts and updates as separate statements for each row.

The Oracle Call Interface (OCI) supports array binding, i.e. executing a single call of the OCIStmtExecute() function with arrays as parameters and the "iters" parameter set to the number of elements of the arrays (sketched at the end of this message). The ObjectLens does not support this, but will execute 1000 INSERT statements when inserting 1000 rows, which is slower than using array binding for a single insert by an incredibly large factor. It will also drive your DBA nuts. ;-)

I looked inside the ObjectLens to see if it could be altered to use array binding for Oracle databases, but that would not be easy. The code is a bit too convoluted, with methods doing preprocessing, executing SQL, and postprocessing for single rows. It's probably possible to refactor this, but it won't be what I consider fun.

Another option would be porting the application to Glorp. However, looking into the code, it seems that Glorp supports array binding only for INSERT statements (InsertCommands). The method comment of RowBasedCommand>>supportsGroupWriting says:

"Return true if array binding or a similar mechanism can be used to execute us with multiple commands at once. I think in general this can only ever make sense for inserts (and possibly selects under some circumstances). Updates may want to update only part of the fields."

IMO it is wrong that array binding only makes sense for INSERTs, at least with an Oracle database. AFAIK, the Oracle Call Interface supports UPDATE and DELETE commands with array binding as well. E.g., one can delete multiple rows with the statement

    DELETE FROM myTable WHERE someColumn = :1

by binding :1 to an array of values.

Also, Glorp limits the number of rows which will be passed to an INSERT command by array binding to 1000 (RowBasedCommand>>tooBigForSingleWrite). The OCI has no such limitation. Of course, binding more than 1000 rows to a single call will allocate larger OracleBuffers, but if the array elements are small (e.g. Integers) and there is enough memory available, it is no problem at all to use array binding with many more elements in a bind array.

Another issue is that for each row (or group of rows when using array binding), a new ExternalDatabaseSession is created, prepared, bound, and executed. This is bad practice (which could be fixed at the EXDI level in VW, but I don't know about other dialects), because preparing the same statement over and over again is an expensive operation which involves parsing the SQL statement, computing an execution plan, etc. Oracle servers will cache the results in the SGA, but this still requires unnecessary network roundtrips and computations in the database server. It should be implemented in a way which caches the result of the "prepare" step, and binds and executes the ExternalDatabaseSession in a loop.

These limitations can seriously hamper the performance of Glorp-based applications when they have to process mass data, just as was the case with the ObjectLens.

So, what I would like to know is:

* Is my analysis concerning the behavior of Glorp correct?

* Would it be sufficient to alter the default return value of supportsGroupWriting to true to enable array binding for UPDATE and DELETE commands?
  Or would there be other things which need to be changed, e.g. in DatabasePlatform and its subclasses?

* For UPDATEs, supporting array binding may not be trivial, depending on the changes which have to be posted. Different objects may have changes in different attributes. It would be possible to compute the set of all columns which have a change for at least one row within a group of rows, and update them all for all of the rows. When the number of rows is large enough and array binding can be used, this will still be much faster than posting dozens or hundreds of different single-row updates (not to speak of thousands). How difficult would it be to change the Glorp implementation to do this?

* How much effort would it be to reuse prepared ExternalDatabaseSessions in Glorp?

Thanks in advance,

Joachim Geidel
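For illustration, a bulk insert via array binding could look roughly like this at the EXDI level. This is only a sketch: prepare:/bindInput:/execute is the usual ExternalDatabaseSession protocol, but whether the Oracle EXDI accepts column-wise value arrays in bindInput: as a bulk bind (mapping to one OCIStmtExecute() call with iters = 5000) is an assumption here, and myTable is made up.

    "Sketch: one INSERT, many rows, ideally one round trip.
     Assumes the driver turns column-wise arrays into an array bind."
    | session ids names |
    ids := (1 to: 5000) asArray.
    names := ids collect: [:i | 'name ' , i printString].
    session := connection getSession.
    session prepare: 'INSERT INTO myTable (id, name) VALUES (:1, :2)'.
    session bindInput: (Array with: ids with: names).
    session execute.
    connection commit.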
(Responding on Saturday afternoon without having actually looked at any code)
As I recall, the issue with updates is that you're only modifying a subset of fields, so you'd have to make the update statement say it was updating all of the fields and generate the previous values in there. This reduces the improvement in efficiency, introduces the possibility of spurious concurrency conflicts if a field had changed but you didn't actually care about that change, and would require a certain amount of additional work. It would probably be possible to make it work, but I think it would be a middling amount of effort.

Making it work for deletes would, I think, be quite easy (see the sketch below).

I think the limit on the maximum number of rows does not apply when array binding, but was intended for things like PostgreSQL, where we don't have array binding, but rather concatenate multiple statements together semicolon-separated. Or if it does apply, I don't think there's any reason it ought to, and the code should probably be changed.

Glorp caches prepared statement (ExternalDatabaseSession) objects, so if the SQL statement matches, it will try to reuse a previously prepared session.

-- Alan Knight [|], Engineering Manager, Cincom Smalltalk
http://www.cincom.com/smalltalk
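To make the easy case concrete: if Glorp's delete command class follows the InsertCommand naming (a DeleteCommand subclass of RowBasedCommand, which is an assumption here), enabling group writing for deletes might be as small as

    DeleteCommand >> supportsGroupWriting
        "Sketch: let DELETE ... WHERE key = :1 run once with a bind
         array instead of once per row."
        ^true

Whether that override alone would be sufficient is exactly Joachim's open question above.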
In reply to this post by Joachim Geidel
I did something of the same for Lens -> Oracle DB, modifying the addAll:
to perform array binding (per type, of course) of the objects instead of committing them one by one... Strangely enough, this sped things up by a lot!

I can't guarantee that it'll always work, but for us it's left no visible side effects, and it's been in production for 1/2 year or so with no reported problems. Can't really remember if I put any constraints on the inputs anymore, guess I should comment a bit more :)

I named it addAllFast: as it wasn't really needed for all operations, and I'd like to isolate potential problems as much as possible.

Cheers,
Henry

Attachment: LensSession-addAllFast.zip (1K)
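For readers without the attachment, the general shape of such a method might be as follows. This is a guess at the idea, not the code from LensSession-addAllFast.zip; the grouping by class and the bulkInsert: helper are invented for the sketch.

    addAllFast: aCollection
        "Sketch: group the objects by class, so each group maps to one
         table, then issue a single array-bound INSERT per group."
        | groups |
        groups := Dictionary new.
        aCollection do: [:each |
            (groups at: each class ifAbsentPut: [OrderedCollection new])
                add: each].
        groups do: [:objects | self bulkInsert: objects]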
Interesting. I'm surprised it worked as well as that. Glorp goes to a fair bit of effort to make that sort of thing possible, although it also gets other benefits from the same work. In particular, it takes all of the objects to be written in a particular transaction, generates all the row information, and then sorts the rows by type and by the foreign key constraints, in order to write as many rows as possible in the same operation.
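That sort amounts to a topological ordering of the per-table row groups by their foreign-key dependencies. A toy version of the idea, not Glorp's actual code: rowsByTable, referencedTables and writeGroup: are invented names, and foreign-key cycles are assumed absent.

    | remaining |
    remaining := rowsByTable keys asOrderedCollection.
    [remaining isEmpty] whileFalse: [
        | ready |
        "Write tables whose referenced tables have all been written."
        ready := remaining select: [:table |
            table referencedTables conform: [:ref | (remaining includes: ref) not]].
        ready do: [:table | self writeGroup: (rowsByTable at: table)].
        remaining removeAll: ready]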
-- Alan Knight [|], Engineering Manager, Cincom Smalltalk
http://www.cincom.com/smalltalk
In reply to this post by Henrik Sperre Johansen
TBH though, I forgot to mention...
In our use case, the objects saved are never used/retrieved from the DB again; it's simply data results from an analysis, used to create reports in Crystal Reports. I have no idea if this approach will work for persistent objects in an application, as I've never used the method for those.

Cheers,
Henry
Henrik, Alan,
thanks a lot for sharing your code and comments. I'll look into it as soon as possible.

I am not surprised at all that enabling array binding in the Lens speeds up updates a lot. In one project I have been involved with over the course of the last few years, using array binding wherever possible made a huge difference, usually in the range of one to two orders of magnitude when storing dozens to tens of thousands of objects.

On 07.03.09 22:57, Henrik Johansen wrote:
> In our use case, the objects saved are never used/retrieved from the DB
> again [...] I have no idea if this approach will work for persistent
> objects in an application, as I've never used the method for those.

This happens to be the most critical use case in the application I mentioned. :-)

Alan Knight wrote:
> As I recall, the issue with updates is that you're only modifying a subset
> of fields, so you'd have to make the update statement say it was updating
> all of the fields and generate the previous values in there. This reduces
> the improvement in efficiency, introduces the possibility of spurious
> concurrency conflicts if a field had changed but you didn't actually care
> about that change, and would require a certain amount of additional work.

Whether updating all fields and using array binding is faster or not will probably depend on the number of rows written, and it may make a difference whether primary key columns are involved or can be omitted. However, for large numbers of rows (dozens to many thousands), I suspect that array binding will always be much faster. Anyway, that's gut feeling - we need measurements instead. ;-)

I don't understand your comment about spurious concurrency conflicts. Do you remember what the problem was?

As to caching of prepared sessions in Glorp, I haven't yet spotted the code which does it. I'll have to dig deeper, but it's good to know that Glorp does this.

Thanks again,
Joachim
Dear Joachim,
> As to caching of prepared sessions in Glorp, I haven't yet spotted the code
> which does it. I'll have to dig deeper, but it's good to know that Glorp
> does this.

Follow the trail from VWDatabaseAccessor>>statementFor:

FYI, Glorp's binding implementation has some platform-specific code. In the current Glorp, it is only supported in VW; all other platforms simply inherit the superclass behaviour, which routes bound Glorp commands back to unbound calls at the interface. I believe I have figured out how to extend binding to VADatabaseAccessor and hope to implement this in odd moments over the coming months. I do not know if anyone is working on this in other dialects.

HTH.

Yours faithfully
Niall Ross
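The idea behind that trail can be pictured as follows. This is a simplified sketch, not the real method: VWDatabaseAccessor>>statementFor: actually uses VW's weak caching mechanism (as Alan notes below), while this version treats preparedStatementCache as a plain Dictionary.

    statementFor: sqlString
        "Answer a prepared session for sqlString, preparing it only on
         first use; later calls re-bind and re-execute the cached session."
        ^preparedStatementCache
            at: sqlString
            ifAbsentPut:
                [| session |
                session := connection getSession.
                session prepare: sqlString.
                session]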
In reply to this post by Joachim Geidel
At 01:40 AM 3/8/2009, Joachim Geidel wrote:
>I don't understand your comment about spurious concurrency conflicts. Do you
>remember what the problem was?

So, the original problem I ran into was that Glorp computes the differences that need to be written. If we're then going to do the updates, we need to fill in the old values for all of the fields that haven't changed. We've somewhat thrown that information away at that point, although that really just means that Glorp would need to be changed to retain that information for that case.

But what I meant by concurrency conflicts is a case where we write back the old value for a field that we didn't touch, but that someone else has updated in the meantime, i.e.

  Person 1
    Update Row 47 to set Name = 'Alan'
    <also update a bunch of other stuff>
  Person 2
    Update Row 47 to set LastTransactionID to 54321 (previous value 99987)
    <also update a bunch of other stuff>

So both update Row 47, but in different ways that are potentially compatible. But if we used array binding, then if we did them in order, at the end of it the name will have been changed but then changed back to its old value.

This is perhaps a somewhat contrived case, because different sessions updating fields from the same row in ways that aren't conflicting is likely to be rare and is rather frightening to be relying on, but it might happen.

>As to caching of prepared sessions in Glorp, I haven't yet spotted the code
>which does it. I'll have to dig deeper, but it's good to know that Glorp
>does this.

See the instance variable preparedStatementCache in VWDatabaseAccessor. (It relies on the weak caching mechanism, so isn't portable at the moment.)

-- Alan Knight [|], Engineering Manager, Cincom Smalltalk
http://www.cincom.com/smalltalk
In reply to this post by Joachim Geidel
On 09.03.09 17:04, Alan Knight wrote:
> But what I meant by concurrency conflicts is a case where we write back
> the old value for a field that we didn't touch, but that someone else has
> updated in the meantime. [...] So both update Row 47, but in different
> ways that are potentially compatible. But if we used array binding, then
> if we did them in order, at the end of it the name will have been changed
> but then changed back to its old value.

This kind of conflict does happen whether you update only columns with changed values or all columns in a row. Limiting the updates to columns with changed values somewhat reduces the probability that the problem occurs, but it does not eliminate it. For example:

  Person 1
    Read Row 47 with account balance = 100
  Person 2
    Read Row 47 with account balance = 100
  Person 1
    Update Row 47 to set account balance = account balance + 20 (120)
  Person 2
    Update Row 47 to set account balance = account balance + 10 (110, not 130)

There are two ways to prevent this kind of inconsistency:

- Reading and writing have to be in the same database transaction, which has to be in "serializable" mode (in Oracle terms), not in "read committed" mode. The second commit would fail, which has to be handled appropriately.

- One has to acquire a lock for the row before reading it (or re-read it after acquiring the lock), and hold the lock until the commit. It does not matter if one uses a row lock via SELECT FOR UPDATE, or if the locking mechanism is based on something else like Oracle User Locks, a home-grown locking mechanism, etc. (A sketch of this variant follows below.)

> This is perhaps a somewhat contrived case, because different sessions
> updating fields from the same row in ways that aren't conflicting is
> likely to be rare and is rather frightening to be relying on, but it
> might happen.

To rephrase this: it is not a good idea to rely on lowering the probability of the problem by updating only columns with actual value changes. Indeed, it isn't. If the problem can happen, the application has a serious design problem concerning transactional integrity. It has to be solved in one of the ways described above. In applications with high numbers of concurrent sessions and high transaction rates, even low probabilities make the occurrence of errors a certainty. Been there, seen that.

That said, in my experience, throughput of high-volume updates is always much faster when using array binding at the price of updating more columns than needed. The reasons are that

- the SQL has to be parsed only once (even "soft parses" which hit a cache in the Oracle server take time), and

- single-row updates need one network roundtrip per row: 1,000 updates at 10ms per roundtrip mean 10 seconds of execution time, while updating 1,000 rows with a single update using array binding can usually be done in tenths of a second.

IMHO, the minuscule reduction in the probability of hitting the consequences of a design problem in an application does not outweigh the serious performance impact of using single-row statements for updates and deletes in Glorp.

Of course, this is only true when using a database which supports bind variables in SQL statements as well as array binding, such as Oracle. If I am not mistaken, DB2 and ODBC support this too, but I don't know if the ODBCEXDI and DB2EXDI support array binding. If they don't, it would be good to have this feature added.

Joachim Geidel
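The pessimistic variant could look roughly like this at the EXDI level. A sketch only: getSession/prepare:/bindInput:/execute/answer and begin/commit are assumed to follow the usual VW EXDI protocol, the accounts table is made up, and error handling is omitted.

    "Read the row under a row lock, then update and commit.
     The lock is held from the SELECT FOR UPDATE until the commit."
    | read write rows balance |
    connection begin.
    read := connection getSession.
    read prepare: 'SELECT balance FROM accounts WHERE id = :1 FOR UPDATE'.
    read bindInput: (Array with: 47).
    read execute.
    rows := read answer upToEnd.
    balance := rows first first.
    write := connection getSession.
    write prepare: 'UPDATE accounts SET balance = :1 WHERE id = :2'.
    write bindInput: (Array with: balance + 20 with: 47).
    write execute.
    connection commit.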
Yes, these are the reasons I called it rather frightening to rely on.

In terms of performance, we may have to reparse sometimes anyway. You're assuming that we always have the same list of columns. We may not, so if we wanted to consistently write all the columns, we might actually have to do a large set of reads of information that is otherwise irrelevant. That's likely to bring the performance down some.

But the largest obstacle to implementing that is that you'd have to rework the way Glorp is generating the rows to be written, or else do a postprocessing stage, so that it has the data for the unchanged columns.

-- Alan Knight [|], Engineering Manager, Cincom Smalltalk
http://www.cincom.com/smalltalk
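The postprocessing stage mentioned here could, in principle, look like this. Purely illustrative: rows, changedColumns, oldValueFor: and at:put: on a row are invented names, and it assumes the old values were retained rather than thrown away.

    "Widen every row in a group to the union of all changed columns,
     filling unchanged columns from retained old values, so all rows
     bind the same column list and can share one array-bound UPDATE."
    | allColumns |
    allColumns := rows
        inject: Set new
        into: [:set :row | set addAll: row changedColumns; yourself].
    rows do: [:row |
        allColumns do: [:column |
            (row changedColumns includes: column)
                ifFalse: [row at: column put: (row oldValueFor: column)]]]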
In reply to this post by Joachim Geidel
At 06:40 PM 3/14/2009, Joachim Geidel wrote:
>Of course, this is only true when using a database which supports bind
>variables in SQL statements as well as array binding, such as Oracle. If I
>am not mistaken, DB2 and ODBC support this too, but I don't know if the
>ODBCEXDI and DB2EXDI support array binding.

Also, I'm curious what it would mean for ODBC to support this. Presumably it can't do anything unless the underlying database supports this type of feature, so, for example, it would be rather useless when using SQL Server with its <expletive deleted> identity columns.

-- Alan Knight [|], Engineering Manager, Cincom Smalltalk
http://www.cincom.com/smalltalk
In reply to this post by Joachim Geidel
On 15.03.09 16:34, Alan Knight wrote:
> Also, I'm curious what it would mean for ODBC to support this. Presumably
> it can't do anything unless the underlying database supports this type of
> feature, so, for example, it would be rather useless when using SQL Server
> with its <expletive deleted> identity columns.

Details are here:
http://msdn.microsoft.com/en-us/library/ms713841(VS.85).aspx

"Unfortunately, not many data sources support parameter arrays. However, a driver can emulate parameter arrays by executing an SQL statement once for each set of parameter values. This can lead to increases in speed because the driver can then prepare the statement that it plans to execute once for each parameter set. It might also lead to simpler application code."

So, it has advantages even when the database itself does not support array binding. If MSDN says it, it must be true. ;-)

Joachim Geidel
In reply to this post by Joachim Geidel
On 15.03.09 16:25, Alan Knight wrote:
> In terms of performance, we may have to reparse sometimes anyway. You're
> assuming that we always have the same list of columns. We may not, so if
> we wanted to consistently write all the columns, we might actually have to
> do a large set of reads of information that is otherwise irrelevant.
> That's likely to bring the performance down some.

That depends on what is actually done in a transaction. I would say that the typical case is that all columns of a row are read anyway. Whether the information has to be re-read depends on the design of the transactions and the locking scheme used. Glorp is quite flexible in the way it maps between tables and objects, so there may be difficult cases. However, I think they can be avoided by grouping together those rows which lead to exactly the same SQL statement, and processing them with array binding (sketched below). As most high-volume updates will change the same attributes in all rows anyway, this would speed up most cases.

> But the largest obstacle to implementing that is that you'd have to rework
> the way Glorp is generating the rows to be written, or else do a
> postprocessing stage, so that it has the data for the unchanged columns.

Too bad. This means that switching to Glorp will not have any advantages over sticking with the ObjectLens in the application with the performance problem I was talking about. It also means that Glorp is not yet ready for mass data processing. :-(

Joachim Geidel
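The grouping proposed above could be prototyped along these lines (illustrative only: updateRows, updateSqlString and executeArrayBound:rows: are invented names; the point is that rows generating identical SQL can share one prepared, array-bound statement):

    "Bucket update rows by the exact SQL they would generate, i.e. by
     their set of changed columns, then run one array-bound UPDATE per
     bucket instead of one statement per row."
    | buckets |
    buckets := Dictionary new.
    updateRows do: [:row |
        (buckets at: row updateSqlString ifAbsentPut: [OrderedCollection new])
            add: row].
    buckets keysAndValuesDo: [:sql :group |
        self executeArrayBound: sql rows: group]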