GsQuery results are duplicated

BrunoBB
Hi All,

The following query is run on an RcIdentityBag:
(each.groupname|username = 'admin') | (each.groupname = 'orbeon-role')

But it returns duplicated domain objects.
Why does this query answer duplicated objects (same oop)?
I found nothing in the Gs Manual about duplicated objects in a query.
(Is there no GsQueryOptions ignoreDuplicates?)

With:
aGsQuery asArray asSet
are all the objects going to be loaded into memory?

Regards,
Bruno

Re: GsQuery results are duplicated

GLASS mailing list
Bruno,

I wouldn't expect the query to return duplicates unless the count for
those objects in the RcIdentityBag is greater than 1
(occurrencesOf: > 1).

Could you check if that is the case? If not then I will have to dig a
bit deeper.

Dale

On 1/7/17 1:05 PM, BrunoBB via Glass wrote:

> [...]
> View this message in context: http://forum.world.st/GsQuery-results-are-duplicated-tp4929023.html
> Sent from the GLASS mailing list archive at Nabble.com.

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Re: GsQuery results are duplicated

BrunoBB
Dale,

There are no duplicates in the main collection.

(instancesSet size = instancesSet asSet size). "answer true"
instancesSet occurrencesOf: (instancesSet detect: [:each | each username = 'admin']). "answer 1"

(instancesSet select: [:each | (each username = 'admin') or: [(each groupname = 'admin')
or:[(each groupname = 'orbeon-role')]]]) size. "answer 12"

('(each.username = ''admin'') | (each.groupname = ''admin'') | (each.groupname = ''orbeon-role'')' asQueryOn: instancesSet)
size. "answer 24"

('(each.groupname|username = ''admin'') | (each.groupname = ''orbeon-role'')' asQueryOn: instancesSet)
size. "answer 24"

('(each.groupname|username = ''admin'') | (each.groupname = ''admin'') | (each.groupname = ''orbeon-role'')' asQueryOn: instancesSet)
size. "answer 36"

('(each.username = ''admin'') | (each.groupname = ''admin'')' asQueryOn: instancesSet)
size. "answer 24"

('(each.username = ''admin'')' asQueryOn: instancesSet)
size. "answer 12"

No idea what is going on...

regards
bruno

Re: GsQuery results are duplicated

GLASS mailing list
Thanks for providing the additional details.

Hmmm, seems like a bug ... interesting that we haven't run across this
earlier ...

I am assuming that your instancesSet is actually a kind of Bag and that
all of the result sets are Bags as well - otherwise there would not be
duplicates...

It looks like the behavior you are seeing stems from the fact that we
use the #+ operation when combining intermediate results for predicates
when the #| query operator is used (see
GsCompoundClause>>resultOperatorFor: and IdentityBag>>+). The #+
operator sums the occurrence counts of the two bags when creating the
new bag, so an object that satisfies more than one predicate shows up
once for each predicate it satisfies ...
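
A minimal sketch of that effect, using plain bags in place of the
intermediate predicate results. The variable names are made up for
illustration, and the expected count assumes that IdentityBag>>+ sums
occurrence counts as described above:

```smalltalk
| user byUsername byGroupname combined |
user := Object new.                      "stand-in for a domain object"
byUsername := IdentityBag new.           "hypothetical result of the first predicate"
byUsername add: user.
byGroupname := IdentityBag new.          "hypothetical result of the second predicate"
byGroupname add: user.
combined := byUsername + byGroupname.    "how the #| operator is said to merge results"
combined occurrencesOf: user             "expected to answer 2, although each input holds user once"
```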

Sooooo this bug is tied into the expected behavior of bags and I'm not
quite sure that there is a right answer here ... suggestions for
alternative behaviors would be welcome :)

After a bit of study, I have a feeling that the "right answer" will be
something like the following:

   | nsc query |
   nsc := IdentityBag new.
   nsc add: 1 -> '1'.
   nsc add: 2 -> '1'.
   nsc add: 3 -> '1'.
   query := '(each.key = 2) | (each.value = ''1'')' asQueryOn: nsc.
   query queryResult * nsc

The effect of `* nsc` is to normalize the object counts in your result
to match the counts in the original bag and give a result that would be
correct even if the original bag has multiple occurrences of the
objects... #* is implemented as a primitive and our identity-based
intersection and union primitives are pretty efficient (they do not page
the objects into memory ... operations are performed at the collection
leaf level without touching the elements themselves).

If you know that your Bag does not contain multiple occurrences (or you
don't care), then converting the query result using #asSet would also
work ... but #asSet will end up paging in all of the objects in the
result set, and I actually think that `* nsc` would end up being more
efficient.
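
To see why `* nsc` also holds up when the original bag does contain
repeats, here is a sketch with an object stored twice. It again uses
plain bags as stand-ins for the predicate results, and it assumes that
IdentityBag>>* is a bag intersection that keeps the smaller of the two
occurrence counts:

```smalltalk
| user nsc matchesFirst matchesSecond combined normalized |
user := Object new.
nsc := IdentityBag new.
nsc add: user; add: user.                 "the original bag holds user twice"
matchesFirst := IdentityBag new.
matchesFirst add: user; add: user.        "hypothetical result of predicate 1"
matchesSecond := IdentityBag new.
matchesSecond add: user; add: user.       "hypothetical result of predicate 2"
combined := matchesFirst + matchesSecond. "counts are summed: 4 occurrences"
normalized := combined * nsc.             "intersection with the original bag"
normalized occurrencesOf: user            "expected to answer 2, matching nsc"
```

Converting `combined` with #asSet instead would leave a single
occurrence, which drops the count that the original bag recorded.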

I was thinking that using a do: block might actually give better
results, but it turns out that the block also gives duplicate results
(for a slightly different reason):

   | nsc query ar |
   nsc := IdentityBag new.
   nsc add: 1 -> '1'.
   nsc add: 2 -> '1'.
   nsc add: 3 -> '1'.
   query := '(each.key = 2) | (each.value = ''1'')' asQueryOn: nsc.
   ar := {}.
   query do: [ :each | ar add: each ].
   ar

I've created internal bug reports for both of these:

   46607  GsCompoundClause>>executeAndDo: and GsCompoundClause>>executeAndDo:using: not correct for #| query operator
   46609  GsCompoundClause>>executeClauseUsing: and GsCompoundClause>>executeClauseNegated not correct for #| query operator on bag-based collections

Thanks again for reporting this ... I imagine that the solution to both
bugs will be in Smalltalk code, so if you are interested, I can probably
provide a patch for one or both problems, otherwise the fix will be in
the upcoming 3.4.0 (we're aiming at late spring, early summer at the
moment).

Dale

On 1/7/17 2:20 PM, BrunoBB via Glass wrote:

> [...]
> View this message in context: http://forum.world.st/GsQuery-results-are-duplicated-tp4929023p4929027.html

Re: GsQuery results are duplicated

BrunoBB
Dale,

I think I will apply:
   | nsc query |
   nsc := IdentityBag new.
   nsc add: 1 -> '1'.
   nsc add: 2 -> '1'.
   nsc add: 3 -> '1'.
   query := '(each.key = 2) | (each.value = ''1'')' asQueryOn: nsc.
   query queryResult * nsc
(thanks for the patch :)

But the better solution will be to migrate to 3.4 when it ships :)

The information about the * operator is very useful:
"is implemented as a primitive and our identity-based
intersection and union primitives are pretty efficient (they do not page
the objects into memory ... operations are performed at the collection
leaf level without touching the elements themselves)."

Regards,
Bruno