The speed of fetching LOBs from an Oracle database via OracleEXDI could be
considerably higher than it currently is in certain cases. I have seen the problem in a real application, and got to the point where I have a proof of concept for a solution. The problem: ------------ Although OracleSessions have a boolean property named #answerLobProxy, which controls wether a query answers LOB locators or LOB data, LOBs are always fetched using Lob locators, see OracleLargeObjectBuffer>>getBlobAt:. For each row in the answer set, there are two calls of OCI (Oracle Call Interface) functions, OCILobGetLength and OCILobRead. The latter causes a network roundtrip for each call. The OCI can fetch LOBs in two ways. One is the technique used in the OracleEXDI. The way the output variables for LOB columns in queries are defined in OracleSession>>allocateRowBufferExternal:, the OCI will always return a Lob locator. This is perfectly okay for use cases where LOBs are used to store large unstructured data, e.g. large files, video or image data etc. where one usually fetches only a single or very few rows. However, there is another use case where this is not optimal and leads to unnecessary performance losses. LOBs can be used to overcome the length restrictions of RAW and VARCHAR2 columns in Oracle. This makes sense in situations where one has to store binary data with a size of a few kBytes, e.g. small PDF documents, images, or serialized objects. Without LOBs, one would have to store the data in several 2000 byte or 4000 byte chunks in RAW or VARCHAR2 columns. This needs additional code to split and join the chunks in VisualWorks. For these cases, the OCI lets you define LOB output variables with the type SQLT_LNG (type code 8) for CLOBs and SQLT_LBI (24) for BLOBs and a buffer size larger than 4000. If one does this, the OCI fetches LOBs "inline" without using Lob locators. This avoids having to call OCILobRead for each of the rows in the answer set. To use this way of fetching LOBs, one either has to know the maximum size of the BLOBs, or compute it in advance using "select max(length(blob_column)) from ...", which is fast enough. If the buffer size is smaller than a LOB's length, the LOB will be truncated. The technique is described in the OCI Programmer's Guide, chapter 5 "Binding and Defining in OCI", section "Defining LOB Data", and in the "Oracle Database Application Developer's Guide - Large Objects", chapter 13 "Data Interface for Persistent LOBs". Performance impact of using Lob locators: ----------------------------------------- I did an experiment with a table from a productive application containing BLOBs of less than 16 kByte, comparing it with a table where the same data is stored in 2 kByte chunks in a RAW column. On average, there were two rows in the table with the RAW column for every row in the table with the BLOBs. Fetching ca. 1200 objects from the table with the BLOBs lasted more than 10 times(!) as long as fetching the same data from the table with the RAWs, 9-10 seconds vs. 0.8 seconds. The time for fetching rows containing a single LOB column rises linearly with the number of rows. If your network needs 10 milliseconds for a roundtrip, then fetching 1000 rows will last at least 10 seconds. With higher network latency, performance will be worse. Fetching RAWs is much faster - it has to transfer the same amount of data, but with fewer network roundtrips. Proof of concept for a solution: -------------------------------- The attached file OracleLobBuffer.st contains a very simplistic implementation of a subclass of OracleBuffer for BLOBs. It overrides OracleSession class>>largeObjectBufferClass to answer my new class (it would be a bad idea to use this code in a real application!). There is no support for CLOBs, for INSERT and UPDATE statements and for bind variables, the buffer size is hard coded, and it is clearly a prototypical hack not suitable for productive use. I created a table with a BLOB column in Oracle: CREATE TABLE BLOBTEST ( ID NUMBER, DATA BLOB ); Test data was created via Smalltalk: connection := OracleConnection new environment: 'env'; username: 'secret'; password: 'verysecret'. connection connect. session := connection getSession. session prepare: 'insert into blobtest ( id, data ) values (:1, :2)'. data := (String new: 8000 withAll: $x) asByteArray. 1 to: 2000 do: [:i | session bindInput: (Array with: i with: data). session execute; answer; cancel]. session disconnect. connection commit; disconnect Retrieving 1000 and 2000 rows from the table does not make much of a difference with a local Oracle XE database on my MacBook pro: Original OracleEXDI using OracleLargeObjectBuffer: 1.16s (1000 rows) 2.08s (2000 rows) Modified OracleEXDI using OracleLOBBuffer: 0.9s (1000 rows) 1.7s (2000 rows) With network roundtrips to a remote Oracle database, the difference is much larger. The following is teh output from a test with a remote database located ca. 400-500 km from the client machine, so network latency plays a considerable role (time in milliseconds): select * from blobtest where rownum <= 1000 using OracleLargeObjectBuffer 5870 using OracleLargeObjectBuffer 5975 using OracleLargeObjectBuffer 5988 using OracleLOBBuffer 2895 using OracleLOBBuffer 2918 using OracleLOBBuffer 2924 select * from blobtest where rownum <= 2000 using OracleLargeObjectBuffer 11632 using OracleLargeObjectBuffer 11851 using OracleLargeObjectBuffer 11748 using OracleLOBBuffer 6935 using OracleLOBBuffer 5829 using OracleLOBBuffer 5878 I suspect that the difference becomes larger with longer cables and more firewalls and routers between the client and the database server. Design problem in OracleEXDI (just one of them ;-) ): ----------------------------- The class for new large object buffers is determined using a class method of OracleSession. However, this should be an instance method of OracleColumnDescription, as the buffer class might actually be different for each column. Changing the EXDI accordingly would also make it possible to remove the "case" logic in allocateRowBufferExternal: colDescr isLargeObject ifFalse: [( ( colDescr isTimestamp ifTrue: [ self class timestampBufferClass ] ifFalse: [ self class bufferClass ] ) new encoding: self connection encoding) columnDescription: colDescr count: self blockFactor maxLongBytes: self maxLongBytes] ifTrue: [(self class largeObjectBufferClass new encoding: self connection encoding) columnDescription: colDescr count: self blockFactor session: self]. This could be rewritten as: colDescr newBufferFor: self Best regards, Joachim Geidel _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc OracleLobBuffer.st.zip (1K) Download Attachment |
Free forum by Nabble | Edit this page |