Query results as Streams and indexes question

BrunoBB

Hi All,

For collections with indexes you can use a readStream for large query results (page 104, gs prg manual).

If you try to send #readStream to a collection without an equality index then:
a GsMalformedQueryExpressionError occurred (error 2710), reason:acceptPredicate:, Query may not be streamed.

So when an application needs a stream on a GsQuery, you have to pay NO attention to "the rule of thumb" that only a collection with more than 2000 objects should have indexes (page 95, gs prg manual).

Is it correct to say that when a readStream is needed, you have to create the index after the first element is added?

Or should my application code be smart enough to use a readStream when an index is present, and a simple GsQuery without readStream when there is no index?

How do you handle these cases?

regards
bruno

GLASS mailing list

Re: Query results as Streams and indexes question

On 1/5/17 5:38 AM, BrunoBB via Glass wrote:
> Hi All,
>
> For collection with indexes you can use a readStream for large query result
> (page 104 gs prg manual).
>
> If you try to send #readStream to a collection without an equality index
> then:
> a GsMalformedQueryExpressionError occurred (error 2710),
> reason:acceptPredicate:, Query may not be streamed.
I'd like to see the expression that generates this error ... it looks
like you are sending #readStream to a GsQuery and not a "collection". In
looking at the method
GsStreamableConjunctiveClauseChecker>>acceptPredicate: where this
particular error is signalled, there should be a more detailed
explanation of the reason for the error ... there are 4 different
possible reasons, so I need to see the complete error message to
understand what may have gone wrong.
> So in the case an application need aStream on a GsQuery --> you have to pay
> NO attention to "the rule of thumb" that only a collection with more that
> 2000 objects should have indexes (page 95 gs prg manual).
The rule of thumb is "As a rule of thumb, if your collection contains
fewer than about 2000 objects, it may not be worthwhile to create an
index", which has nothing to do with whether or not you should use a
stream to view results of a query.
>
> Is correct to say when a readStream is needed then you have to create the
> index after the first element is added ?
Whether or not you use a stream on the query results is really a
function of how big you think the expected result will be ... for "small
result sets" it is not that expensive to use #queryResult.

#do: and #readStream both attempt to avoid scanning the entire result
set and can both be used if you don't intend to scan the entire
result set ... if you are going to touch all of the objects in the result
set, then using #queryResult is probably a bit more efficient than using
#do: or #readStream.
>
> Or my application code should be smart enough to use a readStream when there
> is an index present and a simple GsQuery without readStream when there is no
> index ?
I think you have misunderstood the reason for the error message above
... there are certain types of queries that are not streamable (look at
the error messages in
GsStreamableConjunctiveClauseChecker>>acceptPredicate: for the types of
queries that are not streamable).

And this error message has nothing to do with whether or not the
collection has an index present, but with the form of the query
that you are trying to execute ... like I said earlier, if you show me an
example of the query itself, and/or provide the full error message, I can
tell you a bit more about what may be going wrong ...

Dale

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

BrunoBB

Re: Query results as Streams and indexes question

Dale,

My mail was a little confusing, I think...

Yes readStream is sent to aGsQuery (using GS 3.3.0):
('each.username = ''admin''' asQueryOn: instancesSet) readStream.
"where <instancesSet> is an RcIdentityBag"

The error arises when there is NO index on this collection <instancesSet>.
After executing the following, I get <aRangeIndexReadStream> from the previous statement (no error):
GsIndexSpec new
    equalityIndex: 'each.username' lastElementClass: String;
    equalityIndex: 'each.groupname' lastElementClass: String;
    createIndexesOn: instancesSet.

I was trying to figure out how to deal with indexes at the code level in a very specific situation.

But I think it is solved by what you said:
"#do: and #readStream both attempt to avoid scanning the entire result
set and can both be used if you don't intend to to scan the entire
result set"

I thought (I do not know why) that #do: would load all objects into memory :(
If that is NOT the case, then #do: over aGsQuery will do the job for me :)

Also, from that large result I need to copy a small segment of objects (for paging purposes on a web page).
The result may have 100,000 objects and I want to get the objects from position 20 to 30.

There is no #copyFrom:to: in GsQuery. What is the best thing to do?
aGsQuery queryResult copyFrom: 20 to: 30.
"this will load all result to memory ?, size use #queryResult so it should not load all objects to memory"

Or use: aGsQuery readStream position: 20.

Sorry for my previous confusing mail ...

regards,
bruno

GLASS mailing list

Re: Query results as Streams and indexes question

On 01/06/2017 07:21 AM, BrunoBB via Glass wrote:
> Dale,
>
> My mail was a little confusing i think...
>
> Yes readStream is sent to aGsQuery (using GS 3.3.0):
> ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
> "where <instancesSet> is an RcIdentityBag"
Okay and the complete error message would have been:

a GsMalformedQueryExpressionError occurred (error 2710),
reason:acceptPredicate:, Query may not be streamed. Predicate:
'(each.key = ''admin'')' must use an equality index.
(#queryIsNotStreamable).

so the error was directly complaining about the lack of an index ...
there were too many possible error messages ... sorry about that

> There error arise when there is NO index on this collection <instancesSet>.
> After executing the following then i get <aRangeIndexReadStream> from the
> previous sentence (no error):
> GsIndexSpec new
> equalityIndex: 'each.username' lastElementClass: String;
> equalityIndex: 'each.groupname' lastElementClass: String;
> createIndexesOn: instancesSet.
>
> I was trying to figure out how to deal with indexes at code level in a very
> specific situation.
>
> But i think is solved when you said:
> "#do: and #readStream both attempt to avoid scanning the entire result
> set and can both be used if you don't intend to to scan the entire
> result set"

In fact, do: uses a stream underneath the covers for indexed queries.
For non-indexed queries each element is passed to the do: block instead
of adding it to the result set. As a result the order of result elements
encountered while using a do: block may differ between an indexed query
and a non-indexed query ... indexed queries using equality indexes will
produce results in "sort order" while non-indexed queries will produce
results in "collection order"
> I thought (i do not why) that #do: will load all objects to memory :(
> If that NOT case then #do: over aGsQuery will do the job for me :)
Yeah, do: was implemented so that you could get an early exit from a
query that may have a lot of results.
> Also from that large result i need to copy a small segment of objects (for
> paging purpose on a web page).
> The result maybe has 100.000 objects and i want to get objects from position
> 20 to 30.
>
> There is no #copyFrom:to: in GsQuery. What is the best to do ?
> aGsQuery queryResult copyFrom: 20 to: 30.
> "this will load all result to memory ?, size use #queryResult so it should
> not load all objects to memory"
Yes, this will load all results into memory; also, you will get back
another UnorderedCollection, for which copyFrom:to: isn't implemented.
> Or use: aGsQuery readStream position: 20.
The BtreeReadStreams are not PositionableStreams, so you can use position:

So the best bet (for now) would be to do something like:

19 timesRepeat: [stream next].
10 timesRepeat: [col add: stream next]

Currently this is the best that can be done for skipping about within a
stream ...

For 3.4 we are planning on replacing the current btree implementation
using something similar to a B+tree structure (linked list of leaf
nodes) and in the process we may also introduce a counted B+tree
implementation that would then allow us to introduce an efficient skip
--- when the result set is large and you need to copy from the end of
the result set ...
> Sorry for my previous confusing mail ...
No problem ... I think we're now on the same page and that is what is
important

Dale

GLASS mailing list

Re: Query results as Streams and indexes question

On 01/06/2017 10:26 AM, Dale Henrichs wrote:
>
>
>
> On 01/06/2017 07:21 AM, BrunoBB via Glass wrote:
>> Or use: aGsQuery readStream position: 20.
> The BtreeReadStreams are not PositionableStreams, so you can use position:
The BtreeReadStreams are not PositionableStreams, so you _cannot_ use
position:

BrunoBB

Re: Query results as Streams and indexes question

Dale,

/*
So the best bet (for now) would be to do something like:
19 timesRepeat: [stream next].
10 timesRepeat: [col add: stream next]
*/

OK, I got it.

/*
For 3.4 we are planning on replacing the current btree implementation
using something similar to a B+tree structure (linked list of leaf
nodes) and in the process we may also introduce a counted B+tree
implementation that would then allow us to introduce an efficient skip
--- when the result set is large and you need to copy from the end of
the result set ...
*/
It will be excellent to have this in 3.4.

Regards,
Bruno
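
For reference, the skip-and-collect paging approach suggested earlier in the thread could be sketched roughly as follows. This is only an illustration, not code from any of the posters: it assumes instancesSet already has the equality index on each.username, and it assumes the stream answered by #readStream understands #atEnd as well as #next. The names skipCount, pageSize and page are hypothetical.

```smalltalk
"Sketch only. Assumes instancesSet has an equality index on
 each.username, as created earlier in the thread. The names skipCount,
 pageSize and page are hypothetical, introduced for illustration."
| stream page skipCount pageSize |
skipCount := 19.    "elements to discard before the page starts"
pageSize := 10.     "elements to collect, matching the snippet in the thread"
stream := ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
page := OrderedCollection new.
"Discard the leading elements; stop early if the stream is exhausted."
skipCount timesRepeat: [stream atEnd ifFalse: [stream next]].
"Collect up to pageSize elements, again guarding against a short result."
[stream atEnd or: [page size >= pageSize]]
    whileFalse: [page add: stream next].
page
```

The #atEnd guards only protect against result sets shorter than the requested page. The skip still walks the stream one element at a time, so the cost grows with the page offset, which is the limitation the counted B+tree idea mentioned for 3.4 is meant to address.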