[vwnc] Large queries with PostgreSQL driver

[vwnc] Large queries with PostgreSQL driver

david.long
Hi everyone,

I have a large query that seems to be crashing my image. I'm running it on Postgres; it may be happening when I have a breakpoint in the code.
I'm not sure. Does anyone have any suggestions on handling large sets of data coming from the database into a Smalltalk image?

thanks,

David
-- 

CEO
SageTea Group
613 722 2091 x 5
http://sageteagroup.com

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Re: [vwnc] Large queries with PostgreSQL driver

Alan Knight-2
The PostgreSQL driver does not support streaming right now, so large queries come back all at once. The easiest workaround is to use PostgreSQL's OFFSET and LIMIT clauses to break the results up into chunks.
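Alan's suggestion can be sketched as a simple loop. The following is an illustrative Python sketch, not the Cincom driver API: the database is replaced by an in-memory list, and the `fetch_chunk` function stands in for issuing `SELECT ... LIMIT ... OFFSET ...` through whatever driver you use. All names here are invented for illustration.

```python
# Sketch of the LIMIT/OFFSET chunking loop.
# The "database" is faked as a list so the paging logic is self-contained;
# in real code each fetch_chunk call would issue something like
#   SELECT * FROM foo ORDER BY id LIMIT <limit> OFFSET <offset>
ROWS = list(range(250))  # stand-in for the table's rows

def fetch_chunk(offset, limit):
    """Simulate one LIMIT/OFFSET query against the fake table."""
    return ROWS[offset:offset + limit]

def fetch_all_in_chunks(chunk_size=100):
    """Accumulate the full result set one chunk at a time."""
    results = []
    offset = 0
    while True:
        chunk = fetch_chunk(offset, chunk_size)
        if not chunk:       # an empty chunk means we are past the end
            break
        results.extend(chunk)
        offset += chunk_size
    return results
```

One caveat: without a stable ORDER BY, rows can move between chunks as the table changes underneath you, which is why running the whole sequence inside one transaction comes up later in this thread.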

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

Re: [vwnc] Large queries with PostgreSQL driver

giorgiof
In reply to this post by david.long
Hi, David,

I don't know what large means for you, but I was able to load at least 80,000 to 100,000 rows of data into an image without problems, using Postgres. I only had to change the memory settings; otherwise I hit memory-limit problems. Obviously, I also had to wait a bit...
This was with VW 7.6 and the postgres parcel found in the 7.6 distro.

Generally speaking, it is probably better to do it in steps in any case, for example using LIMIT and OFFSET.

ciao

Giorgio

Re: [vwnc] Large queries with PostgreSQL driver

david.long
ah, OK - so count the entries and use LIMIT and OFFSET to get them in
smaller chunks... awesome, thanks, fellas.

David
Re: [vwnc] Large queries with PostgreSQL driver

Mike Hales
I've found similar problems. I use cursors to chunk over large queries in Postgres. I built a simple object that wraps the cursor and query and has some of the same API as streams, so I can stream over the results. It's interesting that I can get the same number of rows into the image with no problem when the cursor returns a few (several thousand) at a time, but pulling them in in one fell swoop causes problems, even though in the end I still have all the values in the image in one collection no matter which way I get them.

Mike

Mike Hales
Engineering Manager
KnowledgeScape
www.kscape.com


Re: [vwnc] Large queries with PostgreSQL driver

david.long
Hi Mike,
yup - I have some rudimentary paging running now - it's cruising along
and is past where it was blowing up. I might upload this to the public
repository - no doubt there are a few people who can improve on this
"wrapper", and it would be useful for the Postgres people out there..

David
Re: [vwnc] Large queries with PostgreSQL driver

Mark Pirogovsky-3
In reply to this post by Alan Knight-2
Does anybody care to elaborate?

Are you suggesting issuing multiple queries of the form

Select * from foo limit 100

first, and then

Select * from foo offset 100 limit 100
Select * from foo offset 200 limit 100
................................

and, to ensure that your view of the DB stays consistent, doing it all inside one
transaction?



Re: [vwnc] Large queries with PostgreSQL driver

Mark Pirogovsky-3
In reply to this post by Mike Hales
Mike, that is very interesting! Can you share it with others?

Re: [vwnc] Large queries with PostgreSQL driver

Michael Lucas-Smith-2
In reply to this post by Mark Pirogovsky-3
Yes, that is pretty much right. That's the technique we use in
WebVelocity for doing 'pages' of results from the db.

Michael
Re: [vwnc] Large queries with PostgreSQL driver

Mike Hales
You can use a cursor to get stream-like behavior over a large query.

DECLARE mycursor CURSOR FOR SELECT * FROM foo;

Then you can position with the MOVE command, or use the FETCH command:

FETCH FORWARD 100 FROM mycursor;

This returns the next 100 rows (or fewer, if the cursor reaches the end) and leaves the cursor positioned correctly for the next fetch.

This can be faster than separate SELECT statements and is meant for exactly this kind of thing.

My little object wraps a cursor and a ReadStream. When initialized, it builds the cursor, performs the first fetch, and opens a ReadStream on the results. It has a stream-like API that calls into the real stream, but when the stream is at the end it does another fetch; if that returns results, it replaces the stream with a new one on the new results and keeps going. When the stream is at the end and the fetch returns no more results, I return nil and close the cursor. It doesn't have the full stream API, but it is close enough that my trends, histograms and tables can be passed either a ReadStream or my SQLCursor object and exercise the same API on both. I don't have problems blowing up the image with this technique even when the tables are really big.
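For concreteness, here is a rough Python sketch of the kind of wrapper Mike describes (the names CursorReadStream, fetch_fn and make_fake_fetch are all invented for illustration; a real version would drive DECLARE/FETCH through the database connection instead of the fake fetch callable used here):

```python
class CursorReadStream:
    """Stream-like wrapper over a batched cursor, after Mike's description.

    fetch_fn(n) stands in for FETCH FORWARD n FROM mycursor: any callable
    returning the next batch of rows, or an empty list when exhausted.
    """

    def __init__(self, fetch_fn, batch_size=100):
        self.fetch_fn = fetch_fn
        self.batch_size = batch_size
        self.buffer = []       # current batch (the inner "ReadStream")
        self.pos = 0           # read position within the buffer
        self.closed = False
        self._refill()         # perform the first fetch up front

    def _refill(self):
        """Fetch the next batch; close when the cursor is exhausted."""
        self.buffer = self.fetch_fn(self.batch_size)
        self.pos = 0
        if not self.buffer:
            self.closed = True

    def next(self):
        """Return the next row, or None at the end (as in Mike's design)."""
        if self.closed:
            return None
        if self.pos >= len(self.buffer):
            self._refill()
            if self.closed:
                return None
        row = self.buffer[self.pos]
        self.pos += 1
        return row

def make_fake_fetch(rows):
    """Simulate a server-side cursor over an in-memory row list."""
    state = {'offset': 0}
    def fetch(n):
        batch = rows[state['offset']:state['offset'] + n]
        state['offset'] += n
        return batch
    return fetch
```

With `CursorReadStream(make_fake_fetch(list(range(10))), batch_size=3)`, repeatedly calling `next()` until it answers None yields all ten rows while holding at most three in the buffer at a time, which is the point of the technique.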

Mike
 


Mike Hales
Engineering Manager
KnowledgeScape
www.kscape.com


Re: [vwnc] Large queries with PostgreSQL driver

david.long
I gave this a quick run in the ad hoc query tool in VW - I had to
add "WITH HOLD"...

DECLARE mycursor CURSOR WITH HOLD FOR SELECT * FROM foo;

This will work quite nicely, I expect.

thanks Mike.

David
