I have a question concerning postgres-indices:
What "field" exactly is covered by an index?
Is it correct, that indices are effective on columns? Does each field in the column then has an index?
Is it correct, that all columns used in queries should have an index?
By comparison of seq-scan and idx-scan one can detect missing indices. What exactly is a missing index and how can they got lost?
No, not all columns should have an index. Indexes are for use on the columns that are used in filtering critera; ON, WHERE, HAVING. Generally, only the most selective column or columns should be used to define the index (and you can use more than one column). Now, some queries can benefit from adding columns to the INCLUDE clause of an index. But, that's not the same as making them the keys. The basics for indexes are here.
Indexes aren't really missing. That's the optimizer making a suggestion for a candidate index. Don't assume it's correct. Test and validate the suggestion.
Related
I have a table in PostgreSQL, it has 20 columns, which are mostly of an enum type. And this table has millions of rows.
I'd like to support and speed up for queries searching for rows with multiple fields, for instance: col2=value1&col3=value2&col5=value3 page=1
I can't use PostgreSQL's compound index,
because it only works with a fixed order of the columns. For instance, If I build an index on (col2,col3,col5), then it can't be used for queries searching for col1=value1&col2=value2
And I'd like also to support queries like:
col1=value1&col2=(value3 or value4) orderby=col3 page=1
What would be a solution to this problem? And if I don't need full-text search on any of these columns (since they are all enum types), could the solution be lightweight?
If you want an OR in your search condition, that's pretty mush “game over” for performance (I'm exaggerating a little for effect).
But if you have only ANDs and equality conditions, I want to call your attention to Bloom filters.
You just have to
CREATE EXTENSION bloom;
and then create an index USING bloom on all columns together.
Unlike other indexes, this single index can speed up queries with all possible combinations of columns in the WHERE condition. The index is just a filter that will pass some false positives, so there always has to be a recheck of the condition, but it will significantly speed up the query.
I have a collection which has an optional field xy_id. About 10% of the documents (out of 500k) does not have this xy_id field.
I have quite a lot of queries to this collection like find({xy_id: <id>}).
I tried indexing it normally (.createIndex({xy_id: 1}, {"background": true})) and it does improve the query speed.
Is this the correct way to index the field in this case? or should I be using a sparse index or another way?
Yes, this is the correct way. The default behaviour of MongoDB is serving well in this case. You can see in the docs that index creation supports an unique flag, which is false by default. All your documents missing the index key will be indexed under a single index entry. Queries can use this index in all cases because all the documents are indexed.
On the other hand, if you use sparse index the documents missing the index key will not be indexed at all. Some operations such as count, sort and other queries will not be able to use the sparse index unless explicitly hinted to do so. If explicitly hinted, you should be okay with incorrect results - the entries not in the index will be omitted in the result. You can read about it here.
I'd like to use a MongoDB unique compound index (with two fields) as a covering index by adding two more fields. Can I specify the uniqueness of the four field index is defined by the first two fields only?
Reading the documentation it sounds like I may have to have one compound four field index for the covering, and another two field index purely for asserting the uniqueness constraint.
You are right that you need to indices for achieving what you want. And there is nothing wrong with it. While uniqueness is checked during writes (and the according index will be used for it), the other index will either be used automatically or you can hint MongoDB to use it.
We have two types of high-volume queries. One looks for docs involving 5 attributes: a date (lte), a value stored in an array, a value stored in a second array, one integer (gte), and one float (gte).
The second includes these five attributes plus two more.
Should we create two compound indices, one for each query? Assume each attribute has a high cardinality.
If we do, because each query involves multiple arrays, it doesn't seem like we can create an index because of Mongo's restriction. How do people structure their Mongo databases in this case?
We're using MongoMapper.
Thanks!
Indexes for queries after the first ranges in the query the value of the additional index fields drops significantly.
Conceptually, I find it best to think of the addition fields in the index pruning ever smaller sub-trees from the query. The first range chops off a large branch, the second a smaller, the third smaller, etc. My general rule of thumb is only the first range from the query in the index is of value.
The caveat to that rule is that additional fields in the index can be useful to aid sorting returned results.
For the first query I would create a index on the two array values and then which ever of the ranges will exclude the most documents. The date field is unlikely to provide high exclusion unless you can close the range (lte and gte). The integer and float is hard to tell without knowing the domain.
If the second query's two additional attributes also use ranges in the query and do not have a significantly higher exclusion value then I would just work with the one index.
Rob.
In mongo, When creating an index I am trying to figure out whether the following query would have an index on a) category_ids and status, OR b) category_ids, status and name???
Source.where(category_ids: [1,2,3], status: Status::ACTIVE).order_by(:name) # ((Ruby/Mongoid code))
Essentially, I am trying to figure out whether indexes should include the ORDER_BY columns? or only the WHERE clauses? Where could I read some more about this?
Yes, an index on thius particular query would be beneficial to the speed of the query. However there is one caveat here, the order of the index fields.
I have noticed you are using an $in there on category_ids. This link is particularly useful in understanding a little complexity which exists from using an $in with an index on the sort (or a sort in general in fact): http://blog.mongolab.com/2012/06/cardinal-ins/
Towards the end it gives you an indea of an optimal index order for your type of query:
The order of fields in an index should be:
First, fields on which you will query for exact values.
Second, fields on which you will sort.
Finally, fields on which you will query for a range of values.
For reference a couple of other helpful links are as follows:
http://docs.mongodb.org/manual/applications/indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/
why does direction of index matter in MongoDB?
And, http://www.slideshare.net/kbanker/mongo-indexoptimizationprimer
These will help you get started on optimising your indexes and making them work for your queries.