Where are indices really useful ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Where are indices really useful ?

GLASS mailing list

Yes, I know: indices are a world of their own. But I asked myself, what can an index bring for me.

I had an IdentitySet with around 3759515 elements. The objects contained have several attributes, but I am only considering one attribute for my queries via GsQuery: the attribute instances are member of SmallDouble, no nil containing.

I produced a query like:

"(40,0 <= each.floatvalue <= 60,0)"

and the results are around 1719852 elements (around 50 % of the total numbers of elements). I do not select the returned objects I just count them.

When I define an index on this set (equalityIndex) it took around 6 seconds to count them all, without index creation the system get the job done in 2.5 seconds. (always a warm database).

SQLite3 takes around 0.7s without index and 0.17s with index (total numbers are not useful here, but it shows that here an index is useful).

It seems for me, that an index in Gemstone is useful, IF the number of expected elements are really small .


Any comment ?

Marten


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list

Ok, I read the documentation and I understand where the problem is:

The query support creates a temporary container to held the result and then you iterate over the result container. I assume, that it takes time to create and maintain this result container (if the result size is larger). The example below can be transferred into a streamable query, which speed ups the query by at least a factor of 2.

Marten


Marten Feldtmann via Glass <[hidden email]> hat am 28. Januar 2017 um 16:03 geschrieben:

Yes, I know: indices are a world of their own. But I asked myself, what can an index bring for me.

I had an IdentitySet with around 3759515 elements. The objects contained have several attributes, but I am only considering one attribute for my queries via GsQuery: the attribute instances are member of SmallDouble, no nil containing.

I produced a query like:

"(40,0 <= each.floatvalue <= 60,0)"

and the results are around 1719852 elements (around 50 % of the total numbers of elements). I do not select the returned objects I just count them.

When I define an index on this set (equalityIndex) it took around 6 seconds to count them all, without index creation the system get the job done in 2.5 seconds. (always a warm database).

SQLite3 takes around 0.7s without index and 0.17s with index (total numbers are not useful here, but it shows that here an index is useful).

It seems for me, that an index in Gemstone is useful, IF the number of expected elements are really small .


Any comment ?

Marten


 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list
In reply to this post by GLASS mailing list
Vía phone.. Quick answer.. Small double are immediate objects so always are identical. Can you measure performance with identity index? 

On Jan 28, 2017 12:03 PM, "Marten Feldtmann via Glass" <[hidden email]> wrote:

Yes, I know: indices are a world of their own. But I asked myself, what can an index bring for me.

I had an IdentitySet with around 3759515 elements. The objects contained have several attributes, but I am only considering one attribute for my queries via GsQuery: the attribute instances are member of SmallDouble, no nil containing.

I produced a query like:

"(40,0 <= each.floatvalue <= 60,0)"

and the results are around 1719852 elements (around 50 % of the total numbers of elements). I do not select the returned objects I just count them.

When I define an index on this set (equalityIndex) it took around 6 seconds to count them all, without index creation the system get the job done in 2.5 seconds. (always a warm database).

SQLite3 takes around 0.7s without index and 0.17s with index (total numbers are not useful here, but it shows that here an index is useful).

It seems for me, that an index in Gemstone is useful, IF the number of expected elements are really small .


Any comment ?

Marten


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list
In reply to this post by GLASS mailing list
That's correct.  With a low selectivity query (aka large result set), most of the time can be spent constructing the result set.  The #selectAsStream: methods avoid this penalty.

Also Dale is working on a major overhaul of the indexing system in 3.4.  Indexing should perform better in that release.

Norm


On 1/28/2017 8:24 AM, Marten Feldtmann via Glass wrote:

Ok, I read the documentation and I understand where the problem is:

The query support creates a temporary container to held the result and then you iterate over the result container. I assume, that it takes time to create and maintain this result container (if the result size is larger). The example below can be transferred into a streamable query, which speed ups the query by at least a factor of 2.

Marten


Marten Feldtmann via Glass [hidden email] hat am 28. Januar 2017 um 16:03 geschrieben:

Yes, I know: indices are a world of their own. But I asked myself, what can an index bring for me.

I had an IdentitySet with around 3759515 elements. The objects contained have several attributes, but I am only considering one attribute for my queries via GsQuery: the attribute instances are member of SmallDouble, no nil containing.

I produced a query like:

"(40,0 <= each.floatvalue <= 60,0)"

and the results are around 1719852 elements (around 50 % of the total numbers of elements). I do not select the returned objects I just count them.

When I define an index on this set (equalityIndex) it took around 6 seconds to count them all, without index creation the system get the job done in 2.5 seconds. (always a warm database).

SQLite3 takes around 0.7s without index and 0.17s with index (total numbers are not useful here, but it shows that here an index is useful).

It seems for me, that an index in Gemstone is useful, IF the number of expected elements are really small .


Any comment ?

Marten


 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list
In reply to this post by GLASS mailing list

First streamable queries assume equality indices and I assume, that comparing numerical values (with something different than =) can not benefit from identity indices.

Marten

Mariano Martinez Peck <[hidden email]> hat am 28. Januar 2017 um 17:25 geschrieben:
Vía phone.. Quick answer.. Small double are immediate objects so always are identical. Can you measure performance with identity index? 



_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list

But using streamable result sets pop up another interface problem: they are only allowed to use, if an equalityIndex is defined on that attribute. In the example I create an index only on larger sets (>50000 items) - that means sometimes the index is available, but sometimes not .. so to make it right, you should ask GsQuery (if this is possible at all) if the query can be executed at all and then according to the result, use the streaming interface - otherwise the do: interface.


Marten


First streamable queries assume equality indices and I assume, that comparing numerical values (with something different than =) can not benefit from identity indices.

Marten

Mariano Martinez Peck <[hidden email]> hat am 28. Januar 2017 um 17:25 geschrieben:
Vía phone.. Quick answer.. Small double are immediate objects so always are identical. Can you measure performance with identity index? 



 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list

After testing the index support in Gemstone I come (for me) to the following point:

a) try to define a query with only one predicate to get a streamable result set (and hope to reduce the size of your result set)

b) if you have more than one predicate -> build your own block and then execute this block against the result set from (a).

c) Defining more than one index on a set may result in strange (slower) query execution time - the query is executed faster, if no index is defined at all.

Puuuh, lots of new things to learn. Perhaps it would be nice to ask GsQuery NOT to use indices at all - so you have ONE programming interface for all cases.

Marten

Marten Feldtmann via Glass <[hidden email]> hat am 28. Januar 2017 um 18:09 geschrieben:

But using streamable result sets pop up another interface problem: they are only allowed to use, if an equalityIndex is defined on that attribute. In the example I create an index only on larger sets (>50000 items) - that means sometimes the index is available, but sometimes not .. so to make it right, you should ask GsQuery (if this is possible at all) if the query can be executed at all and then according to the result, use the streaming interface - otherwise the do: interface.


Marten


First streamable queries assume equality indices and I assume, that comparing numerical values (with something different than =) can not benefit from identity indices.

Marten

Mariano Martinez Peck <[hidden email]> hat am 28. Januar 2017 um 17:25 geschrieben:
Vía phone.. Quick answer.. Small double are immediate objects so always are identical. Can you measure performance with identity index? 



 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass


 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list
In reply to this post by GLASS mailing list

In 3.4 identity indexes will be streamable ...


On 1/28/17 8:58 AM, Marten Feldtmann via Glass wrote:

First streamable queries assume equality indices and I assume, that comparing numerical values (with something different than =) can not benefit from identity indices.

Marten

Mariano Martinez Peck [hidden email] hat am 28. Januar 2017 um 17:25 geschrieben:
Vía phone.. Quick answer.. Small double are immediate objects so always are identical. Can you measure performance with identity index? 




_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Where are indices really useful ?

GLASS mailing list
In reply to this post by GLASS mailing list

Marten,

I suggest that you wait for 3.4 before making up your mind about indexing support.

The work that we are doing for 3.4 should significantly speed up the execution of queries, but I don't think that query speedup is the only thing that can be done ...

The main thrust of my efforts for 3.4 involves eliminating the need for the RcIndexDictionary ...  Without going into gory details, eliminating the RcIndexDictionary means that we can eliminate quite a bit of object faulting and execution time associated with **each object in a query result***. In my initial prototype, I saw an overall speedup of 2x in running one of the major index tests in our test suite ... this test spends most of it's time in query evaluation, but it also sequences through the whole life cycle of an index: index creation/removal; adding, updating, and removing indexed objects... and my initial prototype is written all in Smalltalk ... there are several areas where we can add new primitives to speed things up even more.

You have been making several general observations about index performance, but unfortunately you haven't provided any examples or test cases to back up your observations ...

I am not claiming that your observations are wrong, but without an example or test case I cannot know what the root cause may be and whether or not the performance issues you are seeing in your particular test cases will be addressed or not by the new work or even what additional work that _could_ be done to improve things for you.

You have made some observations that are likely to not be directly affected by my work, like order of predicate execution.

As you have observed, currently, you can only "hope to reduce the size of your result set". One of features we are are adding is a counted b plus tree implementation ... with a counted b plus tree, it should be possible to calculate the size of a result set without executing the entire query ... depending upon the exact nature of your queries, it may be possible to define a query optimizer that utilizes the estimated result set sizes to determine predicate evaluation order ... otoh, without knowing the details of your exact use case, we may not implement the optimizer in such a way that addresses your use case ...

If you are willing, I am sure that we could make an alpha release of 3.4 available to so you could give us better feedback. In fact, I would really like to get a closer look at your use cases. Ideally I would get a copy of your development extent so that I can work with exactly the same sample data and code that you are using ... If you are unable to share extents, then a set of test cases that exhibit the same behavior as you are seeing would be much appreciated.

This happens to be the perfect time to be looking at index performance issues ... I am a couple of weeks from finishing up the initial work on the new implementation and being able to get direct feedback from you would help us focus and prioritize any additional work that might be needed to round out the usability and performance of the new implementation.

Dale


On 1/28/17 9:27 AM, Marten Feldtmann via Glass wrote:

After testing the index support in Gemstone I come (for me) to the following point:

a) try to define a query with only one predicate to get a streamable result set (and hope to reduce the size of your result set)

b) if you have more than one predicate -> build your own block and then execute this block against the result set from (a).

c) Defining more than one index on a set may result in strange (slower) query execution time - the query is executed faster, if no index is defined at all.

Puuuh, lots of new things to learn. Perhaps it would be nice to ask GsQuery NOT to use indices at all - so you have ONE programming interface for all cases.

Marten

Marten Feldtmann via Glass [hidden email] hat am 28. Januar 2017 um 18:09 geschrieben:

But using streamable result sets pop up another interface problem: they are only allowed to use, if an equalityIndex is defined on that attribute. In the example I create an index only on larger sets (>50000 items) - that means sometimes the index is available, but sometimes not .. so to make it right, you should ask GsQuery (if this is possible at all) if the query can be executed at all and then according to the result, use the streaming interface - otherwise the do: interface.


Marten


First streamable queries assume equality indices and I assume, that comparing numerical values (with something different than =) can not benefit from identity indices.

Marten

Mariano Martinez Peck [hidden email] hat am 28. Januar 2017 um 17:25 geschrieben:
Vía phone.. Quick answer.. Small double are immediate objects so always are identical. Can you measure performance with identity index? 



 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass


 

_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass