[vwnc] Performance problem in OracleEXDI when fetching LOBs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[vwnc] Performance problem in OracleEXDI when fetching LOBs

Joachim Geidel
The speed of fetching LOBs from an Oracle database via OracleEXDI could be
considerably higher than it currently is in certain cases. I have seen the
problem in a real application, and got to the point where I have a proof of
concept for a solution.

The problem:
------------
Although OracleSessions have a boolean property named #answerLobProxy, which
controls wether a query answers LOB locators or LOB data, LOBs are always
fetched using Lob locators, see OracleLargeObjectBuffer>>getBlobAt:. For
each row in the answer set, there are two calls of OCI (Oracle Call
Interface) functions, OCILobGetLength and OCILobRead. The latter causes a
network roundtrip for each call.

The OCI can fetch LOBs in two ways. One is the technique used in the
OracleEXDI. The way the output variables for LOB columns in queries are
defined in OracleSession>>allocateRowBufferExternal:, the OCI will always
return a Lob locator.

This is perfectly okay for use cases where LOBs are used to store large
unstructured data, e.g. large files, video or image data etc. where one
usually fetches only a single or very few rows.

However, there is another use case where this is not optimal and leads to
unnecessary performance losses. LOBs can be used to overcome the length
restrictions of RAW and VARCHAR2 columns in Oracle. This makes sense in
situations where one has to store binary data with a size of a few kBytes,
e.g. small PDF documents, images, or serialized objects. Without LOBs, one
would have to store the data in several 2000 byte or 4000 byte chunks in RAW
or VARCHAR2 columns. This needs additional code to split and join the chunks
in VisualWorks.

For these cases, the OCI lets you define LOB output variables with the type
SQLT_LNG (type code 8) for CLOBs and SQLT_LBI (24) for BLOBs and a buffer
size larger than 4000. If one does this, the OCI fetches LOBs "inline"
without using Lob locators. This avoids having to call OCILobRead for each
of the rows in the answer set. To use this way of fetching LOBs, one either
has to know the maximum size of the BLOBs, or compute it in advance using
"select max(length(blob_column)) from ...", which is fast enough. If the
buffer size is smaller than a LOB's length, the LOB will be truncated.

The technique is described in the OCI Programmer's Guide, chapter 5 "Binding
and Defining in OCI", section "Defining LOB Data", and in the "Oracle
Database Application Developer's Guide - Large Objects", chapter 13 "Data
Interface for Persistent LOBs".

Performance impact of using Lob locators:
-----------------------------------------
I did an experiment with a table from a productive application containing
BLOBs of less than 16 kByte, comparing it with a table where the same data
is stored in 2 kByte chunks in a RAW column. On average, there were two rows
in the table with the RAW column for every row in the table with the BLOBs.
Fetching ca. 1200 objects from the table with the BLOBs lasted more than 10
times(!) as long as fetching the same data from the table with the RAWs,
9-10 seconds vs. 0.8 seconds.

The time for fetching rows containing a single LOB column rises linearly
with the number of rows. If your network needs 10 milliseconds for a
roundtrip, then fetching 1000 rows will last at least 10 seconds. With
higher network latency, performance will be worse. Fetching RAWs is much
faster - it has to transfer the same amount of data, but with fewer network
roundtrips.

Proof of concept for a solution:
--------------------------------
The attached file OracleLobBuffer.st contains a very simplistic
implementation of a subclass of OracleBuffer for BLOBs. It overrides
OracleSession class>>largeObjectBufferClass to answer my new class (it would
be a bad idea to use this code in a real application!). There is no support
for CLOBs, for INSERT and UPDATE statements and for bind variables, the
buffer size is hard coded, and it is clearly a prototypical hack not
suitable for productive use.

I created a table with a BLOB column in Oracle:
    CREATE TABLE BLOBTEST ( ID NUMBER, DATA BLOB );

Test data was created via Smalltalk:

connection := OracleConnection new
    environment: 'env'; username: 'secret'; password: 'verysecret'.
connection connect.
session := connection getSession.
session prepare: 'insert into blobtest ( id, data ) values (:1, :2)'.
data := (String new: 8000 withAll: $x) asByteArray.
1 to: 2000 do: [:i |
    session bindInput: (Array with: i with: data).
    session execute; answer; cancel].
session disconnect.
connection commit; disconnect

Retrieving 1000 and 2000 rows from the table does not make much of a
difference with a local Oracle XE database on my MacBook pro:

Original OracleEXDI using OracleLargeObjectBuffer:
1.16s (1000 rows)
2.08s (2000 rows)
Modified OracleEXDI using OracleLOBBuffer:
0.9s (1000 rows)
1.7s (2000 rows)

With network roundtrips to a remote Oracle database, the difference is much
larger. The following is teh output from a test with a remote database
located ca. 400-500 km from the client machine, so network latency plays a
considerable role (time in milliseconds):

select * from blobtest where rownum <= 1000
using OracleLargeObjectBuffer    5870
using OracleLargeObjectBuffer    5975
using OracleLargeObjectBuffer    5988
using OracleLOBBuffer    2895
using OracleLOBBuffer    2918
using OracleLOBBuffer    2924

select * from blobtest where rownum <= 2000
using OracleLargeObjectBuffer    11632
using OracleLargeObjectBuffer    11851
using OracleLargeObjectBuffer    11748
using OracleLOBBuffer    6935
using OracleLOBBuffer    5829
using OracleLOBBuffer    5878

I suspect that the difference becomes larger with longer cables and more
firewalls and routers between the client and the database server.


Design problem in OracleEXDI (just one of them ;-) ):
-----------------------------
The class for new large object buffers is determined using a class method of
OracleSession. However, this should be an instance method of
OracleColumnDescription, as the buffer class might actually be different for
each column. Changing the EXDI accordingly would also make it possible to
remove the "case" logic in allocateRowBufferExternal:

colDescr isLargeObject
                ifFalse:
                    [( ( colDescr isTimestamp ifTrue: [ self class
timestampBufferClass ] ifFalse: [ self class bufferClass ] ) new  encoding:
self connection encoding)
                        columnDescription: colDescr
                        count: self blockFactor
                        maxLongBytes: self maxLongBytes]
                ifTrue:
                    [(self class largeObjectBufferClass new  encoding: self
connection encoding)
                        columnDescription: colDescr
                        count: self blockFactor
                        session: self].

This could be rewritten as:

    colDescr newBufferFor: self

Best regards,
Joachim Geidel


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

OracleLobBuffer.st.zip (1K) Download Attachment