[PATCH] [RFC] dbd-sqlite: Do not fetch all rows of a select at once

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] [RFC] dbd-sqlite: Do not fetch all rows of a select at once

Holger Freyther
From: Holger Hans Peter Freyther <[hidden email]>

The current code is loading all rows of a select into the memory. For
my application this consumes too much memory.

SQLite3 does not support to query the size of the result set and this
code will return -1 for the rowSize. This is inline with several other
SQL frameworks, e.g. QtSql and python. To complicate the code further
it tries to allow to get the entire result set using the #rows selector.

When the ResultSet is created the query is started. This allows us
to free the bound parameters immediately. If #rows is called on a clean
ResultSet all rows will be fetched. If not the #atEnd, #next will work
on the ResultSet.
---
 packages/dbd-sqlite/ResultSet.st   |   58 +++++++++++++++++++++++++-----------
 packages/dbd-sqlite/SQLiteTests.st |    4 ++-
 packages/dbd-sqlite/Statement.st   |   10 ++-----
 packages/dbi/ResultSet.st          |    6 ++--
 4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/packages/dbd-sqlite/ResultSet.st b/packages/dbd-sqlite/ResultSet.st
index 985295f..abf05a5 100644
--- a/packages/dbd-sqlite/ResultSet.st
+++ b/packages/dbd-sqlite/ResultSet.st
@@ -8,7 +8,7 @@
 
 "======================================================================
 |
-| Copyright 2007, 2008 Free Software Foundation, Inc.
+| Copyright 2007, 2008, 2012 Free Software Foundation, Inc.
 | Written by Daniele Sciascia
 |
 | This file is part of the GNU Smalltalk class library.
@@ -31,9 +31,12 @@
  ======================================================================
 "
 
-
 ResultSet subclass: SQLiteResultSet [
-    | handle rows columns index |
+    | handle rows columns index lastRes |
+
+    <comment: 'This class is doing three things at once. It can handle
+    SELECT and DML. For the result of a select a legacy fetch all interface
+    is provided or a stream based one.'>
   
     SQLiteResultSet class >> on: aStatement [
         <category: 'instance creation'>
@@ -46,7 +49,7 @@ ResultSet subclass: SQLiteResultSet [
         self statement: aStatement.
         self handle: (aStatement handle).
         self isSelect
-    ifTrue: [self populate]
+    ifTrue: [lastRes := self handle exec.]
     ifFalse: [self exec]
     ]
     
@@ -58,17 +61,17 @@ ResultSet subclass: SQLiteResultSet [
  rows := handle changes
     ]
 
-    populate [
+    populateAllRows [
         <category: 'initialization'>
-        | resCode |
         
         rows := OrderedCollection new.
-        [ resCode := self handle exec.
-          resCode = 100
-        ] whileTrue: [rows addLast:
-                        (SQLiteRow forValues: self handle returnedRow copy in: self)].
+        [lastRes = 100]
+            whileTrue: [
+                rows addLast:
+                        (SQLiteRow forValues: self handle returnedRow copy in: self).
+                lastRes := self handle exec].
         
-        self handle checkError: resCode = 101.
+        self handle checkError: lastRes = 101.
     ]
     
     handle [
@@ -83,14 +86,28 @@ ResultSet subclass: SQLiteResultSet [
 
     next [
         <category: 'cursor access'>
+        | res |
+
+        "At the end?"
      self atEnd ifTrue: [self error: 'No more rows'].
-     index := index + 1.
-     ^self rows at: index
+
+        "Using cached results?"
+        rows isNil ifFalse: [index := index + 1. ^self rows at: index].
+
+        "first row handling.."
+        index := index + 1.
+        res := SQLiteRow forValues: self handle returnedRow copy in: self.
+        lastRes := self handle exec.
+        lastRes = 101 ifTrue: [self handle reset].
+
+        ^ res.
     ]
 
     atEnd [
         <category: 'cursor access'>
-        ^index >= self rowCount
+        ^ rows isNil
+            ifFalse: [index >= self rows size]
+            ifTrue:  [lastRes ~= 100].
     ]
     
     position [
@@ -100,7 +117,9 @@ ResultSet subclass: SQLiteResultSet [
 
     position: anInteger [
         <category: 'stream protocol'>
-        (anInteger between: 0 and: self size)
+        rows isNil ifTrue: [self error: 'Can not set the position on SQLite'].
+
+        (anInteger between: 0 and: self rows size)
             ifTrue: [ index := anInteger ]
             ifFalse: [ SystemExceptions.IndexOutOfRange signalOn: self withIndex: anInteger ].
         ^index
@@ -142,13 +161,18 @@ ResultSet subclass: SQLiteResultSet [
     
     rows [
         <category: 'accessing'>
-        ^rows
+        rows isNil ifFalse: [^rows].
+        index = 0 ifFalse: [
+            ^ self error: 'Can only ask for the row set before the first fetch.'].
+
+        self populateAllRows.
+        ^ rows
     ]
 
     rowCount [
         <category: 'accessing'>
         self isSelect
-            ifTrue: [^self rows size]
+            ifTrue: [^-1]
             ifFalse: [^self error: 'Not a SELECT statement.']
     ]
 
diff --git a/packages/dbd-sqlite/SQLiteTests.st b/packages/dbd-sqlite/SQLiteTests.st
index fe9d0ef..0616cda 100644
--- a/packages/dbd-sqlite/SQLiteTests.st
+++ b/packages/dbd-sqlite/SQLiteTests.st
@@ -120,7 +120,9 @@ SQLiteBaseTest subclass: SQLiteResultSetTestCase [
     ]
     
     testRowCount [
-        self should: [rs rowCount = 3]
+        self should: [rs rowCount = -1].
+        self should: [rs rows size = 3].
+        self should: [rs rowCount = -1].
     ]
 ]
 
diff --git a/packages/dbd-sqlite/Statement.st b/packages/dbd-sqlite/Statement.st
index ea6166d..9b0cfc4 100644
--- a/packages/dbd-sqlite/Statement.st
+++ b/packages/dbd-sqlite/Statement.st
@@ -71,23 +71,19 @@ Statement subclass: SQLiteStatement [
     
     execute [
         <category: 'querying'>
+        self handle reset.
         ^SQLiteResultSet on: self
     ]
     
     executeWithAll: aParams [
         <category: 'querying'>
         | resCode |
+        self handle reset.
         ^[aParams keysAndValuesDo: [:i :param |
                 resCode := self handle bindingAt: i put: param.
                 self handle checkError: resCode = 0].
                                           
-        SQLiteResultSet on: self] ensure: [self resetAndClear]
-    ]
-    
-    resetAndClear [
-        <category: 'private'>
-        self handle reset.
-        self handle clearBindings.
+        SQLiteResultSet on: self] ensure: [self handle clearBindings]
     ]
     
     getCommand [
diff --git a/packages/dbi/ResultSet.st b/packages/dbi/ResultSet.st
index 308a268..f43d472 100644
--- a/packages/dbi/ResultSet.st
+++ b/packages/dbi/ResultSet.st
@@ -123,14 +123,16 @@ case I only hold the number of rows affected.'>
     ]
 
     size [
- "Returns the number of rows in the result set."
+ "Returns the number of rows in the result set or -1 if the backend
+         does not support this information."
  <category: 'stream protocol'>
  ^self rowCount
     ]
 
     rowCount [
  "Returns the number of rows in the result set;
- error for DML statements."
+ error for DML statements and -1 if the backend does not support
+         this information."
 
  <category: 'accessing'>
  self error: 'Not a SELECT statement.'
--
1.7.10.4


_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [RFC] dbd-sqlite: Do not fetch all rows of a select at once

Paolo Bonzini-2
Il 20/11/2012 15:08, Holger Hans Peter Freyther ha scritto:
>      rowCount [
>          <category: 'accessing'>
>          self isSelect
> -            ifTrue: [^self rows size]
> +            ifTrue: [^-1]

I think it's simpler to leave this method aside.  Code using
#next/#atEnd will simply not call #rowCount.

Otherwise the patch looks good, thanks!

Paolo

>              ifFalse: [^self error: 'Not a SELECT statement.']
>      ]


_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [RFC] dbd-sqlite: Do not fetch all rows of a select at once

Holger Freyther
On Mon, Nov 26, 2012 at 01:02:51PM +0100, Paolo Bonzini wrote:

> I think it's simpler to leave this method aside.  Code using
> #next/#atEnd will simply not call #rowCount.

Okay, then I change the documentation to not call rowCount when
atEnd/next? is used? E.g. for Postgres one can query the size of
the resultset.


_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [RFC] dbd-sqlite: Do not fetch all rows of a select at once

Paolo Bonzini-2
Il 26/11/2012 17:07, Holger Hans Peter Freyther ha scritto:
> > I think it's simpler to leave this method aside.  Code using
> > #next/#atEnd will simply not call #rowCount.
>
> Okay, then I change the documentation to not call rowCount when
> atEnd/next? is used? E.g. for Postgres one can query the size of
> the resultset.

Yes, I think this would be better.  We would need a separate method name
if #rowCount could return -1 (or better, nil) on SQLite.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

[PATCH] dbd-sqlite: Do not fetch all rows of a select at once

Holger Freyther
In reply to this post by Holger Freyther
The current code is loading all rows of a select into the memory. For
my application this consumes too much memory. SQLite3 does not support
to query the size of the result set. This means that >>#rowSize can not
determine the size of the result set. When using the >>#next/>>#atEnd
selectors the >>#rowSize will not be available.

When using >>#rows/>>#rowSize/>>#size before any other query will result
in all results to be fetched.
---
 NEWS                               |    6 +++-
 packages/dbd-sqlite/ChangeLog      |    8 +++++
 packages/dbd-sqlite/ResultSet.st   |   59 ++++++++++++++++++++++++++----------
 packages/dbd-sqlite/SQLiteTests.st |   39 +++++++++++++++++++++++-
 packages/dbd-sqlite/Statement.st   |   10 ++----
 packages/dbi/ChangeLog             |    4 +++
 packages/dbi/ResultSet.st          |    8 +++--
 7 files changed, 106 insertions(+), 28 deletions(-)

diff --git a/NEWS b/NEWS
index d768a8f..0d47cf4 100644
--- a/NEWS
+++ b/NEWS
@@ -1,10 +1,14 @@
 List of user-visible changes in GNU Smalltalk
 
-NEWS FROM 3.2.5 to 3.2.90
+NEWS FROM 3.2.5 to 3.2.91
 
 o   Add PackageLoader>>#loadPackageFromFile: to load a package by using
     a package.xml. This can make the development more effective.
 
+o   Change the semantic of >>#rowCount for the SQLite backend. One can
+    either use the >>#next/#atEnd selectors to stream over the result or use
+    the #>>rows/#rowCount selectors.
+
 -----------------------------------------------------------------------------
 
 NEWS FROM 3.2.4 to 3.2.5
diff --git a/packages/dbd-sqlite/ChangeLog b/packages/dbd-sqlite/ChangeLog
index 2608bb5..303158f 100644
--- a/packages/dbd-sqlite/ChangeLog
+++ b/packages/dbd-sqlite/ChangeLog
@@ -1,3 +1,11 @@
+2013-04-15  Holger Hans Peter Freyther  <[hidden email]>
+
+ * ResultSet.st: Implement streaming usage to conserve memory.
+ * SQLiteTests.st: Add SQLiteStreamRowTestCase class. Modify
+ existing tests.
+ * Statement.st: Reset the handle before executing the query
+ and do not reset it after doing the query.
+
 2011-04-09  Paolo Bonzini  <[hidden email]>
 
  * Statement.st: Move #resetAndClear inside an #ensure: block.
diff --git a/packages/dbd-sqlite/ResultSet.st b/packages/dbd-sqlite/ResultSet.st
index 985295f..7536773 100644
--- a/packages/dbd-sqlite/ResultSet.st
+++ b/packages/dbd-sqlite/ResultSet.st
@@ -8,7 +8,7 @@
 
 "======================================================================
 |
-| Copyright 2007, 2008 Free Software Foundation, Inc.
+| Copyright 2007, 2008, 2013 Free Software Foundation, Inc.
 | Written by Daniele Sciascia
 |
 | This file is part of the GNU Smalltalk class library.
@@ -31,9 +31,12 @@
  ======================================================================
 "
 
-
 ResultSet subclass: SQLiteResultSet [
-    | handle rows columns index |
+    | handle rows columns index lastRes |
+
+    <comment: 'This class is doing three things at once. It can handle
+    SELECT and DML. For the result of a select a legacy fetch all interface
+    is provided or a stream based one.'>
   
     SQLiteResultSet class >> on: aStatement [
         <category: 'instance creation'>
@@ -46,7 +49,7 @@ ResultSet subclass: SQLiteResultSet [
         self statement: aStatement.
         self handle: (aStatement handle).
         self isSelect
-    ifTrue: [self populate]
+    ifTrue: [lastRes := self handle exec.]
     ifFalse: [self exec]
     ]
     
@@ -58,17 +61,17 @@ ResultSet subclass: SQLiteResultSet [
  rows := handle changes
     ]
 
-    populate [
+    populateAllRows [
         <category: 'initialization'>
-        | resCode |
         
         rows := OrderedCollection new.
-        [ resCode := self handle exec.
-          resCode = 100
-        ] whileTrue: [rows addLast:
-                        (SQLiteRow forValues: self handle returnedRow copy in: self)].
+        [lastRes = 100]
+            whileTrue: [
+                rows addLast:
+                        (SQLiteRow forValues: self handle returnedRow copy in: self).
+                lastRes := self handle exec].
         
-        self handle checkError: resCode = 101.
+        self handle checkError: lastRes = 101.
     ]
     
     handle [
@@ -83,14 +86,28 @@ ResultSet subclass: SQLiteResultSet [
 
     next [
         <category: 'cursor access'>
+        | res |
+
+        "At the end?"
      self atEnd ifTrue: [self error: 'No more rows'].
-     index := index + 1.
-     ^self rows at: index
+
+        "Using cached results?"
+        rows isNil ifFalse: [index := index + 1. ^self rows at: index].
+
+        "first row handling.."
+        index := index + 1.
+        res := SQLiteRow forValues: self handle returnedRow copy in: self.
+        lastRes := self handle exec.
+        lastRes = 101 ifTrue: [self handle reset].
+
+        ^ res.
     ]
 
     atEnd [
         <category: 'cursor access'>
-        ^index >= self rowCount
+        ^ rows isNil
+            ifFalse: [index >= self rows size]
+            ifTrue:  [lastRes ~= 100].
     ]
     
     position [
@@ -100,7 +117,9 @@ ResultSet subclass: SQLiteResultSet [
 
     position: anInteger [
         <category: 'stream protocol'>
-        (anInteger between: 0 and: self size)
+        rows isNil ifTrue: [self error: 'Can not set the position on SQLite'].
+
+        (anInteger between: 0 and: self rows size)
             ifTrue: [ index := anInteger ]
             ifFalse: [ SystemExceptions.IndexOutOfRange signalOn: self withIndex: anInteger ].
         ^index
@@ -142,11 +161,19 @@ ResultSet subclass: SQLiteResultSet [
     
     rows [
         <category: 'accessing'>
-        ^rows
+        rows isNil ifFalse: [^rows].
+        index = 0 ifFalse: [
+            ^ self error: 'Can only ask for the row set before the first fetch.'].
+
+        self populateAllRows.
+        ^ rows
     ]
 
     rowCount [
         <category: 'accessing'>
+        "I'm only available for SELECT statements and only when used together with
+        >>#rows. For streaming usage with >>#atEnd and >>#next I may not be used.
+        This is because SQLite3 does not indicate the size of the query set."
         self isSelect
             ifTrue: [^self rows size]
             ifFalse: [^self error: 'Not a SELECT statement.']
diff --git a/packages/dbd-sqlite/SQLiteTests.st b/packages/dbd-sqlite/SQLiteTests.st
index fe9d0ef..d3bb1c2 100644
--- a/packages/dbd-sqlite/SQLiteTests.st
+++ b/packages/dbd-sqlite/SQLiteTests.st
@@ -120,7 +120,13 @@ SQLiteBaseTest subclass: SQLiteResultSetTestCase [
     ]
     
     testRowCount [
-        self should: [rs rowCount = 3]
+        self should: [rs rowCount = 3].
+        self should: [rs rows size = 3].
+    ]
+
+    testMixRowCountAtEnd [
+        rs next.
+        self should: [rs rowCount] raise: Error description: 'May not mix next/rowCount'.
     ]
 ]
 
@@ -146,6 +152,34 @@ SQLiteBaseTest subclass: SQLiteRowTestCase [
     ]
 ]
 
+SQLiteBaseTest subclass: SQLiteStreamRowTestCase [
+    | rs |
+
+    setUp [
+        super setUp.
+        rs := self connection select: 'select * from test'.
+    ]
+
+    testRead [
+        | row |
+        self shouldnt: [rs atEnd].
+
+        "First row"
+        row := rs next.
+        self should: [(row at: 'string_field') = 'one'].
+        self shouldnt: [rs atEnd].
+
+        "Second row"
+        row := rs next.
+        self should: [(row at: 'string_field') = 'two'].
+        self shouldnt: [rs atEnd].
+
+        "Third row"
+        row := rs next.
+        self should: [(row at: 'string_field') = 'three'].
+        self should: [rs atEnd].
+    ]
+]
 
 SQLiteBaseTest subclass: SQLitePreparedStatementTestCase [
     | stmt stmt2 stmt3 |
@@ -204,10 +238,13 @@ TestSuite subclass: SQLiteTestSuite [
         self addTest: (SQLiteResultSetTestCase selector: #testAtEnd).
         self addTest: (SQLiteResultSetTestCase selector: #testColumnNames).
         self addTest: (SQLiteResultSetTestCase selector: #testRowCount).
+        self addTest: (SQLiteResultSetTestCase selector: #testMixRowCountAtEnd).
 
         self addTest: (SQLiteRowTestCase selector: #testAt).
         self addTest: (SQLiteRowTestCase selector: #testAtIndex).
 
+        self addTest: (SQLiteStreamRowTestCase selector: #testRead).
+
         self addTest: (SQLiteDMLResultSetTestCase selector: #testRowsAffected).
         
         self addTest: (SQLitePreparedStatementTestCase selector: #testExecute).
diff --git a/packages/dbd-sqlite/Statement.st b/packages/dbd-sqlite/Statement.st
index ea6166d..9b0cfc4 100644
--- a/packages/dbd-sqlite/Statement.st
+++ b/packages/dbd-sqlite/Statement.st
@@ -71,23 +71,19 @@ Statement subclass: SQLiteStatement [
     
     execute [
         <category: 'querying'>
+        self handle reset.
         ^SQLiteResultSet on: self
     ]
     
     executeWithAll: aParams [
         <category: 'querying'>
         | resCode |
+        self handle reset.
         ^[aParams keysAndValuesDo: [:i :param |
                 resCode := self handle bindingAt: i put: param.
                 self handle checkError: resCode = 0].
                                           
-        SQLiteResultSet on: self] ensure: [self resetAndClear]
-    ]
-    
-    resetAndClear [
-        <category: 'private'>
-        self handle reset.
-        self handle clearBindings.
+        SQLiteResultSet on: self] ensure: [self handle clearBindings]
     ]
     
     getCommand [
diff --git a/packages/dbi/ChangeLog b/packages/dbi/ChangeLog
index 67b1647..e0a8c96 100644
--- a/packages/dbi/ChangeLog
+++ b/packages/dbi/ChangeLog
@@ -1,3 +1,7 @@
+2013-04-15  Holger Hans Peter Freyther  <[hidden email]>
+
+ * ResultSet.st: Add documentation to >>#rowCount and >>#size.
+
 2011-04-08  Holger Hans Peter Freyther  <[hidden email]>
 
  * Statement.st: Add Statement class>>#getCommand:.
diff --git a/packages/dbi/ResultSet.st b/packages/dbi/ResultSet.st
index 308a268..c3c312d 100644
--- a/packages/dbi/ResultSet.st
+++ b/packages/dbi/ResultSet.st
@@ -8,7 +8,7 @@
 "======================================================================
 |
 | Copyright 2006 Mike Anderson
-| Copyright 2007, 2008, 2009 Free Software Foundation, Inc.
+| Copyright 2007, 2008, 2009, 2013 Free Software Foundation, Inc.
 |
 | Written by Mike Anderson
 |
@@ -123,14 +123,16 @@ case I only hold the number of rows affected.'>
     ]
 
     size [
- "Returns the number of rows in the result set."
+ "Returns the number of rows in the result set. See >>#rowCount for
+ details."
  <category: 'stream protocol'>
  ^self rowCount
     ]
 
     rowCount [
  "Returns the number of rows in the result set;
- error for DML statements."
+ error for DML statements. Not all implementations allow to query
+ the size of the ResultSet. In this case an Error will be raised."
 
  <category: 'accessing'>
  self error: 'Not a SELECT statement.'
--
1.7.10.4


_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk