[vwnc] Glorp and Array Binding for Oracle


[vwnc] Glorp and Array Binding for Oracle

Joachim Geidel
Hello everybody,

I need some advice from people who know the inner workings of Glorp. First,
let me describe the background.

I am currently confronted with an older VW application which is based on the
ObjectLens and uses an Oracle database. It has performance problems, because
the ObjectLens executes inserts and updates as separate statements for each
row.

The Oracle Call Interface and the OCI support array binding, i.e. executing
a single call of the OCIStmtExecute() function with arrays as parameters and
the "iters" parameter set to the number of elements of the arrays. The
ObjectLens does not support this, but will execute 1000 INSERT statements
when inserting 1000 rows, which is slower than using array binding for a
single insert by an incredibly large factor. It will also drive your DBA
nuts. ;-)
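
The difference can be sketched with any DB-API driver; sqlite3 stands in for Oracle here, and the table and column names are made up. With an Oracle driver such as cx_Oracle, the `executemany` call below maps onto a single OCIStmtExecute() invocation with "iters" set to the number of rows, which is exactly the array binding described above.

```python
# Row-at-a-time inserts vs. one batched/array-bound execute.
# sqlite3 is a stand-in for Oracle; names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")

rows = [(i, "name%d" % i) for i in range(1000)]

# What the ObjectLens effectively does: 1000 separate statements.
for row in rows:
    conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", row)

# What array binding makes possible: one call, all rows bound at once.
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 2000
```

Both paths produce the same rows; the batched form simply does it in one round trip per 1000 rows instead of 1000.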

I looked inside the ObjectLens to see if it could be altered to use array
binding for Oracle databases, but that would not be easy. The code is a bit
too convoluted, with methods doing preprocessing, executing SQL, and
postprocessing for single rows. It's probably possible to refactor this, but
it won't be what I consider fun.

Another option would be porting the application to Glorp. However, when
looking into the code it seems that it supports array binding only for
INSERT statements (InsertCommands). The method comment of
RowBasedCommand>>supportsGroupWriting says:

"Return true if array binding or a similar mechanism can be used to execute
us with multiple commands at once. I think in general this can only ever
make sense for inserts (and possibly selects under some circumstances).
Updates may want to update only part of the fields."

IMO it is wrong that array binding only makes sense for INSERTS, at least
with an Oracle database. The Oracle Call Interface supports update and
delete commands with array binding AFAIK. E.g., one can delete multiple rows
with the statement
    DELETE FROM myTable WHERE someColumn = :1
by binding :1 to an array of values.
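
The same batching works for DELETE. A minimal sketch (sqlite3 as a stand-in for Oracle, `?` instead of `:1`, table and column names made up): bind the placeholder to a sequence of values and execute once.

```python
# Batched DELETE: one statement, many bound values.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (someColumn INTEGER)")
conn.executemany("INSERT INTO myTable (someColumn) VALUES (?)",
                 [(v,) for v in range(10)])

# Delete rows 0, 2, 4, 6, 8 with a single batched call.
conn.executemany("DELETE FROM myTable WHERE someColumn = ?",
                 [(v,) for v in range(0, 10, 2)])

print(conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0])  # 5
```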

Also, Glorp limits the number of rows which will be passed to an INSERT
command by array binding to 1000 (RowBasedCommand>>tooBigForSingleWrite).
The OCI has no such limitation. Of course, binding more than 1000 rows to a
single call will allocate larger OracleBuffers, but if the array elements
are small (e.g. Integers) and there is enough memory available, it is no
problem at all to use array binding with many more elements in a bind array.

Another issue is that for each row (or group of rows when using array
binding), a new ExternalDatabaseSession is created, prepared, bound, and
executed. This is a bad practice (which could be fixed at the EXDI level in
VW, but I don't know about other dialects), because preparing the same
statement over and over again is an expensive operation which involves
parsing the SQL statement, computing an execution path etc. Oracle servers
will cache the results in the SGA, but this still requires unnecessary
network roundtrips and computations in the database server. It should be
implemented in a way which caches the result of the "prepare" step, and
binds and executes the ExternalDatabaseSession in a loop.
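
The suggested fix can be sketched as a cache keyed by SQL string: prepare each distinct statement once, then only bind and execute on later calls. The class and counter below are purely illustrative (they are not Glorp's or EXDI's API), and sqlite3 again stands in for the database.

```python
# Illustrative prepared-statement cache: prepare once, bind/execute many.
import sqlite3

class StatementCache:
    def __init__(self, conn):
        self.conn = conn
        self.cache = {}      # SQL string -> prepared handle
        self.prepares = 0    # how many real prepare steps we performed

    def execute(self, sql, params):
        handle = self.cache.get(sql)
        if handle is None:
            handle = self.conn.cursor()  # stands in for the prepare step
            self.cache[sql] = handle
            self.prepares += 1
        handle.execute(sql, params)      # bind + execute only
        return handle

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
cache = StatementCache(conn)
for i in range(100):
    cache.execute("INSERT INTO t (x) VALUES (?)", (i,))
print(cache.prepares)  # one prepare for 100 executions
```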

These limitations can seriously hamper the performance of Glorp-based
applications when they have to process mass data, just as was the case
with the ObjectLens.

So, what I would like to know is:

* Is my analysis concerning the behavior of Glorp correct?

* Would it be sufficient to alter the default return value of
supportsGroupWriting to true to enable array binding for UPDATE and DELETE
commands? Or would there be other things which need to be changed, e.g. in
DatabasePlatform and its subclasses?

* For UPDATEs, supporting array binding may not be trivial, depending on the
changes which have to be posted. Different objects may have changes in
different attributes. It would be possible to compute the set of all columns
which have a change for at least one row within a group of rows, and update
them all for all of the rows. When the number of rows is large enough and
array binding can be used, this will still be much faster than posting
dozens or hundreds of different single-row updates (not to speak of
thousands). How difficult would it be to change the Glorp implementation to
do this?
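
The column-union idea can be sketched in a few lines. Given per-row dictionaries of changed values, compute the union of changed columns, then fill in the current value for any column a given row did not change, so every row fits the same UPDATE statement and can be array-bound. All names here are illustrative, not Glorp's.

```python
# Group heterogeneous per-row changes into one bindable UPDATE shape.
def group_for_batched_update(changed, current):
    # changed: list of {column: new_value} dicts, one per row
    # current: list of {column: existing_value} dicts, same order
    columns = sorted({col for row in changed for col in row})
    bind_rows = [
        tuple(row.get(col, cur[col]) for col in columns)
        for row, cur in zip(changed, current)
    ]
    return columns, bind_rows

cols, rows = group_for_batched_update(
    [{"name": "a"}, {"qty": 2}],
    [{"name": "x", "qty": 1}, {"name": "y", "qty": 1}])
print(cols)   # ['name', 'qty']
print(rows)   # [('a', 1), ('y', 2)]
```

Each tuple in `rows` can then be array-bound to a single `UPDATE ... SET name = ?, qty = ? WHERE ...` statement.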
 
* How much effort would it be to reuse prepared ExternalDatabaseSessions in
Glorp?

Thanks in advance,

Joachim Geidel



_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] Glorp and Array Binding for Oracle

Alan Knight-2
(Responding on Saturday afternoon without having actually looked at any code)

As I recall, the issue with updates is that you're only modifying a subset of fields, so you'd have to make the update statement say it was updating all of the fields and generate the previous values in there. This reduces the efficiency improvement, introduces the possibility of spurious concurrency conflicts if a field had changed but you didn't actually care about that change, and would require a certain amount of additional work. It would probably be possible to make it work, but I think it would be a middling amount of effort. Making it work for deletes would, I think, be quite easy.

I think the limit on the maximum number of rows does not apply when array binding, but was intended for things like PostgreSQL, where we don't have array binding but instead concatenate multiple statements together, semicolon-separated. Or if it does apply, I don't think there's any reason it ought to, and the code should probably be changed.

Glorp caches prepared statement (ExternalDatabaseSession) objects, so if the SQL statement matches, it will try to reuse a previously prepared session.

At 09:59 AM 3/7/2009, Joachim Geidel wrote:

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk


Re: [vwnc] Glorp and Array Binding for Oracle

Henrik Sperre Johansen
In reply to this post by Joachim Geidel
I did something similar for Lens -> Oracle DB, modifying addAll:
to perform array binding (per type, of course) of the objects instead of
committing them one by one... Strangely enough, this sped things up
a lot!

I can't guarantee that it'll always work, but for us it's left no
visible side effects and has been in production for half a year or so with no
reported problems. I can't really remember if I put any constraints on the
inputs anymore; I guess I should comment a bit more. :)
I named it addAllFast: as it wasn't really needed for all operations, and
I'd like to isolate potential problems as much as possible.

Cheers,
Henry

On 07.03.2009 15:59, Joachim Geidel wrote:



LensSession-addAllFast.zip (1K) Download Attachment

Re: [vwnc] Glorp and Array Binding for Oracle

Alan Knight-2
Interesting. I'm surprised it worked as well as that. Glorp goes to a fair bit of effort to make that sort of thing possible, although it also gets other benefits from the same work. In particular, it takes all of the objects to be written in a particular transaction, generates all the row information, and then sorts the rows by type and based on the foreign key constraints in order to write as many rows as possible in the same operation.
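
The sorting step described here is essentially a topological sort of the tables by their foreign-key dependencies: referenced tables are written first, and all rows for a given table can then go out as one batched insert. A minimal sketch with the standard library (table names and the dependency map are invented for illustration; this is not Glorp's actual code):

```python
# Order tables so FK targets are written before the tables that reference them.
from graphlib import TopologicalSorter

# table -> set of tables it references via foreign keys (illustrative)
fks = {
    "order_line": {"order", "product"},
    "order": {"customer"},
    "product": set(),
    "customer": set(),
}

write_order = list(TopologicalSorter(fks).static_order())
print(write_order)  # e.g. customer and product before order before order_line
```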

At 02:19 PM 3/7/2009, Henrik Johansen wrote:



Re: [vwnc] Glorp and Array Binding for Oracle

Henrik Sperre Johansen
In reply to this post by Henrik Sperre Johansen
TBH though, I forgot to mention...
In our use case, the objects saved are never used/retrieved from the DB again; it's simply data results from an analysis, used to create reports in Crystal Reports.
I have no idea if this approach will work for persistent objects in an application, as I've never used the method for those.

Cheers,
Henry

On 07.03.2009 22:29, Alan Knight wrote:

Re: [vwnc] Glorp and Array Binding for Oracle

Joachim Geidel
Henrik, Alan,

thanks a lot for sharing your code and comments. I'll look into it as soon
as possible. I am not surprised at all that enabling array binding in the
Lens speeds up updates a lot. In one project I have been involved with over
the course of the last few years, using array binding wherever possible
made a huge difference, usually in the range of one to two orders of
magnitude when storing dozens to tens of thousands of objects.

Am 07.03.09 22:57 schrieb Henrik Johansen:
> TBH though, I forgot to mention...
> In our usecase, the objects saved are never used/retrieved from the DB
> again, it's simply data results from an analysis used to create reports
> in Crystal Reports.
> I have no idea if this approach will work for persistent objects in an
> application, as I've never used the method for those.

This happens to be the most critical use case in the application I
mentioned. :-)

Alan Knight wrote:
> As I recall, the issue with updates is that you're only modifying a subset of
> fields, so you'd have to make the update statement say it was updating all of
> the fields and generate the previous values in there. This reduces the
> improvement in efficiency, introduces the possibility for spurious concurrency
> conflicts if a field had changed but you didn't actually care about that
> change, and would require a certain amount more additional work.

Whether updating all fields and using array binding is faster or not will
probably depend on the number of rows written, and it may make a difference
whether primary key columns are involved or can be omitted. However, for
large numbers of rows (dozens to many thousands), I suspect that array
binding will always be much faster. Anyway, that's gut feeling - we need
measurements instead. ;-)

I don't understand your comment about spurious concurrency conflicts. Do you
remember what the problem was?

As to caching of prepared sessions in Glorp, I haven't yet spotted the code
which does it. I'll have to dig deeper, but it's good to know that Glorp
does this.

Thanks again,
Joachim


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] Glorp and Array Binding for Oracle

Niall Ross
Dear Joachim,

 > As to caching of prepared statements in Glorp, I haven't yet spotted the
 > code which does it. I'll have to dig deeper, but it's good to know that
 > Glorp does this.

Follow the trail from VWDatabaseAccessor>>statementFor:

FYI, Glorp's binding implementation has some platform-specific code.  In
the current Glorp, it is only supported in VW;  all other platforms
simply inherit from the superclass the routing of bound Glorp commands
back to unbound calls at the interface.  I believe I have figured out
how to extend binding to VADatabaseAccessor and hope to implement this
in odd moments over the coming months.  I do not know if anyone is
working on this in other dialects.

HTH.
          Yours faithfully
             Niall Ross


Re: [vwnc] Glorp and Array Binding for Oracle

Alan Knight-2
In reply to this post by Joachim Geidel
At 01:40 AM 3/8/2009, Joachim Geidel wrote:
>Whether updating all fields and using array binding is faster or not will
>probably depend on the number of rows written, and it may make a difference
>whether primary key columns are involved or can be omitted. However, for
>large numbers of rows (dozens to many thousands), I suspect that array
>binding will always be much faster. Anyway, that's gut feeling - we need
>measurements instead. ;-)
>
>I don't understand your comment about spurious concurrency conflicts. Do you
>remember what the problem was?

So, the original problem I ran into was that Glorp computes the differences that need to be written. If we're then going to do the updates, we need to fill in the old values for all of the fields that haven't changed. We've somewhat thrown that information away at that point, although that really just means that Glorp would need to be changed to retain that information for that case. But what I meant by concurrency conflicts is a case where we write back the old value for a field that we didn't touch, but in the meantime someone else has updated it. For example:

 Person 1
    Update Row 47 to set Name = 'Alan'
    <also update a bunch of other stuff>
 Person2
   Update Row 47 to set LastTransactionID to 54321 (previous value 99987)
   <also update a bunch of other stuff>

So both update Row 47, but in different ways that are potentially compatible. But if we used array binding, then if we did them in order, at the end of it, the name will have been changed but then changed back to its old value.
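The write-back effect described here can be simulated without any database at all; the following sketch models two "sessions" as plain Python dicts, each reading a full row snapshot and later writing all columns back, as a full-row array-binding update would (the row fields are illustrative, not anything from Glorp):

```python
# The shared row, as stored in the database.
row = {"id": 47, "name": "Joachim", "last_tx": 99987}

session1 = dict(row)  # Person 1 reads the full row
session2 = dict(row)  # Person 2 reads the full row

session1["name"] = "Alan"    # Person 1 changes only the name
row.update(session1)         # ...but writes back every column

session2["last_tx"] = 54321  # Person 2 changes only the transaction id
row.update(session2)         # ...and also writes back every column,
                             # silently reverting Person 1's name change

print(row["name"])     # "Joachim" - the name change is lost
print(row["last_tx"])  # 54321
```

The second full-row write restores the stale name it read earlier, which is exactly the spurious overwrite a changed-columns-only update would have avoided in this case.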

This is perhaps a somewhat contrived case, because different sessions updating fields from the same row in ways that aren't conflicting is likely to be rare and is rather frightening to be relying on, but it might happen.

>As to caching of prepared statements in Glorp, I haven't yet spotted the code
>which does it. I'll have to dig deeper, but it's good to know that Glorp
>does this.

See the instance variable preparedStatementCache in VWDatabaseAccessor (it relies on the weak caching mechanism, so isn't portable at the moment).


--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk


Re: [vwnc] Glorp and Array Binding for Oracle

Joachim Geidel
In reply to this post by Joachim Geidel
On 09.03.09 at 17:04, Alan Knight wrote:

> So, the original problem I ran into was that Glorp computes the differences
> that need to be written. If we're then going to do the updates, we need to
> fill in the old values for all of the fields that haven't changed. We've
> somewhat thrown that information away at that point, although that really just
> means that Glorp would need to be changed to retain that information for that
> case. But what I meant by concurrency conflicts is a case where we write back
> the old value for a field that we didn't touch, but in the meantime someone
> else has updated. i.e.
>
>  Person 1
>     Update Row 47 to set Name = 'Alan'
>     <also update a bunch of other stuff>
>  Person2
>    Update Row 47 to set LastTransactionID to 54321 (previous value 99987)
>    <also update a bunch of other stuff>
>
> So both update Row 47, but in different ways that are potentially compatible.
> But if we used array binding, then if we did them in order, at the end of it,
> the name will have been changed but then changed back to its old value.

This kind of conflict does happen whether you update only columns with
changed values or all columns in a row. Limiting the updates to columns with
changed values somewhat reduces the probability that the problem occurs, but
it does not eliminate it. For example:

  Person 1
    Read Row 47 with account balance = 100
  Person 2
    Read Row 47 with account balance = 100
  Person 1
     Update Row 47 to set account balance = account balance + 20 (120)
  Person 2
     Update Row 47 to set account balance = account balance + 10 (110, not 130)

There are two ways to prevent this kind of inconsistency:
- Reading and writing have to be in the same database transaction, which has
  to be in "serializable" mode (in Oracle terms), not in "read
  committed" mode. The second commit would fail, which has to be handled
  appropriately.
- One has to acquire a lock for the row before reading it (or re-read it
  after acquiring the lock), and hold the lock until the commit. It does
  not matter if one uses a row lock using SELECT FOR UPDATE, or if the
  locking mechanism is based on something else like Oracle User Locks, a
  home-grown locking mechanism, etc.
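The lost update itself is easy to reproduce with any database. The sketch below uses Python's built-in sqlite3 purely as a stand-in (sqlite has no SELECT FOR UPDATE or Oracle-style serializable mode, so it only demonstrates the broken read-then-write pattern, not the remedies above; table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (47, 100)")

# Both "sessions" read first - each sees a balance of 100...
b1 = conn.execute("SELECT balance FROM account WHERE id = 47").fetchone()[0]
b2 = conn.execute("SELECT balance FROM account WHERE id = 47").fetchone()[0]

# ...then each writes back a value computed on the client side,
# without holding a lock between read and write.
conn.execute("UPDATE account SET balance = ? WHERE id = 47", (b1 + 20,))
conn.execute("UPDATE account SET balance = ? WHERE id = 47", (b2 + 10,))

final = conn.execute("SELECT balance FROM account WHERE id = 47").fetchone()[0]
print(final)  # 110, not 130 - Person 1's +20 is lost
```

Either remedy above closes the gap between the read and the write; which one is appropriate depends on the application's transaction design.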

> This is perhaps a somewhat contrived case, because different sessions updating
> fields from the same row in ways that aren't conflicting is likely to be rare
> and is rather frightening to be relying on, but it might happen.

To rephrase this: It is not a good idea to rely on lowering the probability
of the problem by updating only columns with actual value changes. Indeed,
it isn't. If the problem can happen, the application has a serious design
problem concerning transactional integrity. It has to be solved in one of
the ways described above. In applications with high numbers of concurrent
sessions and high transaction rates, even low probabilities make the
occurrence of errors a certainty. Been there, seen that.

That said, in my experience, throughput of high-volume updates is always
much faster when using array binding at the price of updating more columns
than needed. The reasons are that
- the SQL has to be parsed only once (even "soft parses" which hit a cache
in the Oracle server need time)
- single-row updates need one network roundtrip per row: 1,000 updates at
10 ms per roundtrip mean 10 seconds of execution time, while updating 1,000
rows with a single statement using array binding can usually be done in
tenths of a second.
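At the driver API level this batching usually surfaces as an executemany-style call: one statement, one parse, many parameter sets. The sketch below uses Python's built-in sqlite3 as an illustration (sqlite is in-process, so there are no network roundtrips to save, but the API shape is the same as binding arrays and calling OCIStmtExecute once with iters set to the row count):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

rows = [(i, "row %d" % i) for i in range(1000)]

# One statement, one parse, 1,000 bound parameter sets - instead of
# 1,000 separately parsed and executed single-row INSERTs.
conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 1000
```

Against a remote Oracle server, the difference between this and a loop of single-row statements is dominated by the saved roundtrips and parses, as described above.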

IMHO, the minuscule reduction in the probability of hitting the consequences
of a design problem in an application does not outweigh the serious
performance impact of using single row statements for updates and deletes in
Glorp.

Of course, this is only true when using a database which supports bind
variables in SQL statements as well as array binding, such as Oracle. If I
am not mistaken, DB2 and ODBC support this, too, but I don't know if the
ODBCEXDI and DB2EXDI support array binding. If they don't, it would be good
to have this feature added.

Joachim Geidel



Re: [vwnc] Glorp and Array Binding for Oracle

Alan Knight-2
At 06:40 PM 3/14/2009, Joachim Geidel wrote:

>[full quote of Joachim's message of 14 March snipped; key passage retained:]
>
>To rephrase this: It is not a good idea to rely on lowering the probability
>of the problem by updating only columns with actual value changes. Indeed,
>it isn't. If the problem can happen, the application has a serious design
>problem concerning transactional integrity. [...] In applications with high
>numbers of concurrent sessions and high transaction rates, even low
>probabilities make the occurrence of errors a certainty.

Yes, these are the reasons I called it rather frightening to rely on.

In terms of performance, we may have to reparse sometimes anyway. You're assuming that we always have the same list of columns. We may not, so if we actually wanted to consistently write all the columns, we might actually have to do a large set of reads of information that is otherwise irrelevant. That's likely to bring the performance down some.

But the largest obstacle to implementing that is that you'd have to rework the way Glorp is generating the rows to be written, or else do a postprocessing stage, so that it has the data for the unchanged columns.


--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk


Re: [vwnc] Glorp and Array Binding for Oracle

Alan Knight-2
In reply to this post by Joachim Geidel
At 06:40 PM 3/14/2009, Joachim Geidel wrote:
>Of course, this is only true when using a database which supports bind
>variables in SQL statements as well as array binding, such as Oracle. If I
>am not mistaken, DB2 and ODBC support this, too, but I don't know if the
>ODBCEXDI and DB2EXDI support array binding. If they don't, it would be good
>to have this feature added.

Also, I'm curious what it would mean for ODBC to support this. Presumably it can't do anything unless the underlying database supports this type of feature, so for example, it would be rather useless if using SQL Server with its <expletive deleted> identity columns.

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk


Re: [vwnc] Glorp and Array Binding for Oracle

Joachim Geidel
In reply to this post by Joachim Geidel
On 15.03.09 at 16:34, Alan Knight wrote:

> At 06:40 PM 3/14/2009, Joachim Geidel wrote:
>> Of course, this is only true when using a database which supports bind
>> variables in SQL statements as well as array binding, such as Oracle. If I
>> am not mistaken, DB2 and ODBC support this, too, but I don't know if the
>> ODBCEXDI and DB2EXDI support array binding. If they don't, it would be good
>> to have this feature added.
>
> Also, I'm curious what it would mean for ODBC to support this. Presumably it
> can't do anything unless the underlying database supports this type of
> feature, so for example, it would be rather useless if using SQL Server with
> its <expletive deleted> identity columns.

Details are here:
    http://msdn.microsoft.com/en-us/library/ms713841(VS.85).aspx

"Unfortunately, not many data sources support parameter arrays. However, a
driver can emulate parameter arrays by executing an SQL statement once for
each set of parameter values. This can lead to increases in speed because
the driver can then prepare the statement that it plans to execute once for
each parameter set. It might also lead to simpler application code."

So, it has advantages even when the database itself does not support array
binding. If MSDN says it, it must be true. ;-)

Joachim Geidel



Re: [vwnc] Glorp and Array Binding for Oracle

Joachim Geidel
In reply to this post by Joachim Geidel
On 15.03.09 at 16:25, Alan Knight wrote:
> In terms of performance, we may have to reparse sometimes anyway. You're
> assuming that we always have the same list of columns. We may not, so if we
> actually wanted to consistently write all the columns, we might actually have
> to do a large set of reads of information that is otherwise irrelevant. That's
> likely to bring the performance down some.

That depends on what is actually done in a transaction. I would say that the
typical case is that all columns of a row are read anyway. Whether the
information has to be re-read depends on the design of the transactions and
the locking scheme used. Glorp is quite flexible in the way it maps between
tables and objects, so there may be difficult cases. However, I think that
they can be avoided by grouping together those rows which lead to exactly
the same SQL statement, and processing each group with array binding. As
most high-volume updates will change the same attributes in all rows anyway,
this would speed up most cases.
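The grouping idea can be sketched in a few lines; here the update commands are reduced to hypothetical (sql_text, parameters) pairs - illustrative only, not Glorp code:

```python
from collections import defaultdict

# Hypothetical update commands as (sql_text, parameter_tuple) pairs.
commands = [
    ("UPDATE person SET name = ? WHERE id = ?", ("Alan", 1)),
    ("UPDATE person SET name = ? WHERE id = ?", ("Niall", 2)),
    ("UPDATE person SET salary = ? WHERE id = ?", (100, 3)),
    ("UPDATE person SET name = ? WHERE id = ?", ("Joachim", 4)),
]

# Group rows by identical SQL text; each group can then be executed
# once with array binding (e.g. cursor.executemany(sql, param_sets))
# instead of one roundtrip per row.
groups = defaultdict(list)
for sql, params in commands:
    groups[sql].append(params)

for sql in sorted(groups):
    print("%d bound row(s) for: %s" % (len(groups[sql]), sql))
```

Rows whose generated SQL differs simply end up in their own (possibly single-row) groups, so the worst case degrades to what Glorp does today.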

> But the largest obstacle to implementing that is that you'd have to rework the
> way Glorp is generating the rows to be written, or else do a postprocessing
> stage, so that it has the data for the unchanged columns.

Too bad. This means that switching to Glorp will not have any advantages
over sticking with the ObjectLens in the application with the performance
problem I was talking about. It also means that Glorp is not yet ready for
mass data processing. :-(

Joachim Geidel

