Hi All,
For collections with indexes you can use a readStream for large query results (page 104, gs prg manual).

If you try to send #readStream to a collection without an equality index, you get:
a GsMalformedQueryExpressionError occurred (error 2710), reason:acceptPredicate:, Query may not be streamed.

So when an application needs a stream on a GsQuery, you have to pay NO attention to "the rule of thumb" that only a collection with more than 2000 objects should have indexes (page 95, gs prg manual).

Is it correct to say that when a readStream is needed, you have to create the index after the first element is added?

Or should my application code be smart enough to use a readStream when there is an index present, and a simple GsQuery without readStream when there is no index?

How do you handle these cases?

regards,
bruno
On 1/5/17 5:38 AM, BrunoBB via Glass wrote:
> Hi All,
>
> For collection with indexes you can use a readStream for large query result
> (page 104 gs prg manual).
>
> If you try to send #readStream to a collection without an equality index
> then:
> a GsMalformedQueryExpressionError occurred (error 2710),
> reason:acceptPredicate:, Query may not be streamed.

I'd like to see the expression that generates this error ... it looks like you are sending #readStream to a GsQuery and not a "collection". Looking at the method GsStreamableConjunctiveClauseChecker>>acceptPredicate: where this particular error is signalled, there should be a more detailed explanation of the reason for the error ... there are 4 different possible reasons, so I need to see the complete error message to understand what may have gone wrong.

> So in the case an application need aStream on a GsQuery --> you have to pay
> NO attention to "the rule of thumb" that only a collection with more that
> 2000 objects should have indexes (page 95 gs prg manual).

The rule of thumb is "As a rule of thumb, if your collection contains fewer than about 2000 objects, it may not be worthwhile to create an index", which has nothing to do with whether or not you should use a stream to view the results of a query.

>
> Is correct to say when a readStream is needed then you have to create the
> index after the first element is added ?

Whether or not you use a stream on the query results is really a function of how big you think the expected result will be ... for "small result sets" it is not that expensive to use #queryResult. #do: and #readStream both attempt to avoid scanning the entire result set and can both be used if you don't intend to scan the entire result set ... if you are going to touch all of the objects in the result set, then using #queryResult is probably a bit more efficient than using #do: or #readStream.
>
> Or my application code should be smart enough to use a readStream when there
> is an index present and a simple GsQuery without readStream when there is no
> index ?

I think you have misunderstood the reason for the error message above ... there are certain types of queries that are not streamable (look at the error messages in GsStreamableConjunctiveClauseChecker>>acceptPredicate: for the types of queries that are not streamable). This error message has nothing to do with whether or not the collection has an index present, but with the form of the query that you are trying to execute ... like I said earlier, if you show me an example of the query itself, and/or provide the full error message, I can tell you a bit more about what may be going wrong ...

Dale
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Dale,
My mail was a little confusing, I think...

Yes, readStream is sent to aGsQuery (using GS 3.3.0):
    ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
    "where <instancesSet> is an RcIdentityBag"

The error arises when there is NO index on this collection <instancesSet>.
After executing the following, I get <aRangeIndexReadStream> from the previous statement (no error):
    GsIndexSpec new
        equalityIndex: 'each.username' lastElementClass: String;
        equalityIndex: 'each.groupname' lastElementClass: String;
        createIndexesOn: instancesSet.

I was trying to figure out how to deal with indexes at the code level in a very specific situation.

But I think it is solved by what you said:
"#do: and #readStream both attempt to avoid scanning the entire result set and can both be used if you don't intend to scan the entire result set"

I thought (I do not know why) that #do: would load all objects into memory :(
If that is NOT the case, then #do: over aGsQuery will do the job for me :)

Also, from that large result I need to copy a small segment of objects (for paging purposes on a web page).
The result may have 100.000 objects and I want to get the objects from position 20 to 30.

There is no #copyFrom:to: in GsQuery. What is the best thing to do?
    aGsQuery queryResult copyFrom: 20 to: 30.
    "will this load the whole result into memory? #size uses #queryResult, so it should not load all objects into memory"

Or use: aGsQuery readStream position: 20.

Sorry for my previous confusing mail ...

regards,
bruno
On 01/06/2017 07:21 AM, BrunoBB via Glass wrote:
> Dale,
>
> My mail was a little confusing i think...
>
> Yes readStream is sent to aGsQuery (using GS 3.3.0):
> ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
> "where <instancesSet> is an RcIdentityBag"

Okay, and the complete error message would have been:

    a GsMalformedQueryExpressionError occurred (error 2710), reason:acceptPredicate:, Query may not be streamed. Predicate: '(each.key = ''admin'')' must use an equality index. (#queryIsNotStreamable).

so the error was directly complaining about the lack of an index ... there were too many possible error messages ... sorry about that.

> There error arise when there is NO index on this collection <instancesSet>.
> After executing the following then i get <aRangeIndexReadStream> from the
> previous sentence (no error):
> GsIndexSpec new
> equalityIndex: 'each.username' lastElementClass: String;
> equalityIndex: 'each.groupname' lastElementClass: String;
> createIndexesOn: instancesSet.
>
> I was trying to figure out how to deal with indexes at code level in a very
> specific situation.
>
> But i think is solved when you said:
> "#do: and #readStream both attempt to avoid scanning the entire result
> set and can both be used if you don't intend to to scan the entire
> result set"

For non-indexed queries, each element is passed to the do: block instead of being added to the result set. As a result, the order in which elements are encountered in a do: block may differ between an indexed query and a non-indexed query ... indexed queries using equality indexes will produce results in "sort order", while non-indexed queries will produce results in "collection order".

> I thought (i do not why) that #do: will load all objects to memory :(
> If that NOT case then #do: over aGsQuery will do the job for me :)

Yeah, do: was implemented so that you could get an early exit from a query that may have a lot of results.
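A minimal sketch of that early exit, reusing the query and instancesSet from earlier in the thread. The pageSize variable is hypothetical, and leaving the block with a non-local return (^) is an assumption about how one might stop early, not necessarily the only or intended way:

```smalltalk
"Sketch only. instancesSet and the predicate come from the thread;
 pageSize is a hypothetical name introduced for illustration."
| query firstPage pageSize |
pageSize := 10.
query := 'each.username = ''admin''' asQueryOn: instancesSet.
firstPage := OrderedCollection new.
query do: [:each |
    firstPage add: each.
    "Assumed exit: the non-local return stops the enumeration
     as soon as enough elements have been collected."
    firstPage size >= pageSize ifTrue: [^firstPage]].
"Reached only when the query yields fewer than pageSize elements."
^firstPage
```

Because #do: works with or without an index, the elements collected this way arrive in "sort order" for an indexed query and in "collection order" otherwise, so the contents of the page can differ between the two cases.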
> Also from that large result i need to copy a small segment of objects (for
> paging purpose on a web page).
> The result maybe has 100.000 objects and i want to get objects from position
> 20 to 30.
>
> There is no #copyFrom:to: in GsQuery. What is the best to do ?
> aGsQuery queryResult copyFrom: 20 to: 30.
> "this will load all result to memory ?, size use #queryResult so it should
> not load all objects to memory"

Yes, this will load all results into memory, but also, you will get back another UnorderedCollection, so copyFrom:to: isn't implemented.

> Or use: aGsQuery readStream position: 20.

The BtreeReadStreams are not PositionableStreams, so you can use position:

So the best bet (for now) would be to do something like:

    19 timesRepeat: [stream next].
    10 timesRepeat: [col add: stream next]

Currently this is the best that can be done for skipping about within a stream ...

For 3.4 we are planning on replacing the current btree implementation using something similar to a B+tree structure (linked list of leaf nodes), and in the process we may also introduce a counted B+tree implementation that would then allow us to introduce an efficient skip, for when the result set is large and you need to copy from the end of the result set ...

> Sorry for my previous confusing mail ...

No problem ... I think we're now on the same page, and that is what is important.

Dale
On 01/06/2017 10:26 AM, Dale Henrichs wrote:
> On 01/06/2017 07:21 AM, BrunoBB via Glass wrote:
>> Or use: aGsQuery readStream position: 20.
> The BtreeReadStreams are not PositionableStreams, so you can use position:

The BtreeReadStreams are not PositionableStreams, so you _cannot_ use position:
Dale,
/*
So the best bet (for now) would be to do something like:
    19 timesRepeat: [stream next].
    10 timesRepeat: [col add: stream next]
*/
Ok, I got it.

/*
For 3.4 we are planning on replacing the current btree implementation using something similar to a B+tree structure (linked list of leaf nodes) and in the process we may also introduce a counted B+tree implementation that would then allow us to introduce an efficient skip, for when the result set is large and you need to copy from the end of the result set ...
*/
It will be excellent to have this in 3.4.

Regards,
Bruno
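For reference, the skip-then-collect approach quoted above might be sketched as follows. It assumes the equality index on 'each.username' has already been created (otherwise the query is not streamable, as discussed earlier) and that the stream answers #atEnd, which is used here to guard against a result shorter than the requested page. The names skip and pageSize are hypothetical:

```smalltalk
"Sketch only. Assumes an equality index on 'each.username' exists on
 instancesSet, and that the stream responds to #atEnd."
| stream page skip pageSize |
skip := 19.       "elements to discard before the page starts"
pageSize := 10.   "elements to keep, i.e. positions 20 through 29"
stream := ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
page := OrderedCollection new.
"Advance past the skipped elements one at a time, since position: is not available."
skip timesRepeat: [stream atEnd ifFalse: [stream next]].
"Collect up to pageSize elements, stopping early if the stream runs out."
[stream atEnd or: [page size >= pageSize]]
    whileFalse: [page add: stream next].
page
```

The cost of reaching a page grows with its offset, because every skipped element is still visited; that is the limitation an efficient skip would remove.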