Is there a way to force a specific secondary index to be used first in queries in Oracle NoSQL Database? - nosql

My application creates secondary indexes on a few fields of the JSON document. Below is a sample document. Let's suppose indexes are created on age and income.
Sample record:
{ "id":1, "appUser" : {
"firstname":"John",
"lastname":"Doe",
"age":27,
"income":150000,
"lastLogin" : "2020-10-29T18:43:59.8319",
"address":{"street":"150 Route 2",
"city":"Antioch",
"state":"TN",
"zipcode" : 37013,
"phones":[{"type":"home", "areacode":423,
"number":123456}]
},
"connections":[2, 3],
"expenses":{"food":1000, "gas":180}
}
}
Let’s say the query is:
SELECT * from mySample WHERE income > 300000 and age < 40
During query processing, does Oracle NoSQL Database decide for itself which index to apply first and which second (e.g., income then age, or vice versa) to achieve optimal performance?
If I know the data characteristics of my documents, can I specify and force my own order of indexes for queries?

Oracle NoSQL can use only one index for any query. When multiple indexes are applicable to a query, it uses a heuristic to choose the "best" index. However, the chosen index may not always be the best one, because the heuristic is not based on index key statistics and as a result does not compute predicate selectivity. If the heuristic choice turns out to be the wrong one, Oracle NoSQL supports index hints, which are placed inside the query itself and tell the query processor what index to use.
For the example above, both indexes (on age and income) are equally good according to the heuristic, and the query processor will choose the one whose name comes first alphabetically; say this is the index on age. To force the use of the income index instead, name it in a FORCE_INDEX hint (the index name idx_income below is only an assumption; use whatever name the index was actually created with):
SELECT /*+ FORCE_INDEX(mySample idx_income) */ *
from mySample
WHERE income > 300000 and age < 40

Related

What is the correct way to index in MongoDB when a big combination of fields exists

Consider that I have a search panel that includes multiple filter options.
I'm working with MongoDB and have created a compound index on 3-4 properties in a specific order.
But when I run different combinations of searches, I see a different order in the execution plan (explain()) every time. Sometimes it falls back to a collection scan (bad), and sometimes it hits the index correctly (IXSCAN).
The selective fields that should be handled by MongoDB indexes are: brand, type, status, warehouse, carries, and search (only by id).
My question is:
Do I have to create every combination of these fields in different orders? That could be 10-20 compound indexes. Or 1-3 big compound indexes, which again would not solve the ordering problem?
What is the best strategy for dealing with a large variety of field combinations?
I use the same query structure with different combinations of fields.
// Example Query.
// fields could be different every time according to the user's selection (and order)!!
db.getCollection("orders").find({
'$and': [
{
'status': {
'$in': [
'XXX',
'YYY'
]
}
},
{
'searchId': {
'$in': [
'3859447'
]
}
},
{
'origin.brand': {
'$in': [
'aaaa',
'bbbb',
'cccc',
'ddd',
'eee',
'bundle'
]
}
},
{
'$or': [
{
'origin.carries': 'YYY'
},
{
'origin.carries': 'ZZZ'
},
{
'origin.carries': 'WWWW'
}
]
}
]
}).sort({"timestamp":1})
// My compound index is:
{ status: 1, searchId: -1, "origin.brand": 1, "origin.carries": 1, timestamp: 1 }
but that is only one combination... there could be plenty, like:
a. { status: 1 }
b. { status: 1, searchId: -1 }
c. { status: 1, searchId: -1, "origin.brand": 1 }
d. { status: 1, searchId: -1, "origin.brand": 1, "origin.carries": 1 }
...
Additionally, what will happen to write/read performance? I assume writes will get slower while reads benefit...
The query patterns are:
1. find(...) with '$and'/'$or' + sort
2. Aggregation with $match/$sort
thanks
Generally, indexes are only useful if they are over a selective field. This means the number of documents that have a particular value is small relative to the overall number of documents.
What "small" means varies on the data set and the query. A 1% selectivity is pretty safe when deciding whether an index makes sense. If an particular value exists in, say, 10% of documents, performing a table scan may be more efficient than using an index over the respective field.
With that in mind, some of your fields will be selective and some will not be. For example, I suspect filtering by "OK" will not be very selective. You can eliminate non-selective fields from indexing considerations - if someone wants all orders which are "OK" with no other conditions they'll end up doing a table scan. If someone wants orders which are "OK" and have other conditions, whatever index is applicable to other conditions will be used.
Now that you are left with selective (or at least somewhat selective) fields, consider what queries are both popular and selective. For example, perhaps brand+type would be such a combination. You could add compound indexes that match popular queries which you expect to be selective.
Now, what happens if someone filters by brand only? This could be selective or not depending on the data. If you already have a compound index on brand+type, you'd leave it up to the database to determine whether a brand only query is more efficient to fulfill via the brand+type index or via a collection scan.
Continue in this manner with other popular queries and fields.
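As a rough illustration of that approach in the mongo shell (the collection name "orders" comes from the question above, but the "type" field path and the exact field choices are assumptions to adapt to your own data):
// Hypothetical compound index for a popular, selective combination (brand + type):
db.orders.createIndex({ "origin.brand": 1, "type": 1 })
// A brand-only query may still use this index via its prefix, or the planner
// may prefer a collection scan; explain() shows which plan was chosen:
db.orders.find({ "origin.brand": "aaaa" }).explain()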
So you have subdocuments, ranged queries, and sorting by one field only.
That eliminates most of the possible permutations, assuming there are no other surprises.
D. SM already covered selectivity - you should really listen to what the man says and at least upvote.
The other thing to consider is the order of the fields in the compound index:
fields that have direct match like $eq
fields you sort on
fields with range queries: $in, $lt, $or, etc.
These are common rules for all b-trees. Now things that are specific to mongo:
A compound index can contain at most one multikey field - that is, a field reached through an array, like "origin.brand" when origin is an array of subdocuments. Again I assume origins are embedded docs in an array, so the document's shape is like this:
{
  _id: ...,
  status: ...,
  timestamp: ...,
  origin: [
    { brand: ..., carries: ... },
    { brand: ..., carries: ... },
    { brand: ..., carries: ... }
  ]
}
For your query the best index would be
{
  searchId: 1,
  timestamp: 1,
  status: 1,            /** only if it is selective enough **/
  "origin.carries": 1   /** or brand, depending on data **/
}
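A minimal sketch of creating that index and checking that the question's query picks it up (the collection name "orders" is taken from the question; adjust the field choices to your data):
db.orders.createIndex({
  searchId: 1,
  timestamp: 1,
  status: 1,
  "origin.carries": 1
})
// Look for an IXSCAN stage on this index in the winning plan:
db.orders.find({ searchId: { $in: ["3859447"] }, status: { $in: ["XXX", "YYY"] } })
  .sort({ timestamp: 1 })
  .explain("executionStats")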
Regarding the number of indexes - it depends on data size. Ensure all indexes fit into RAM; otherwise it will be really slow.
Last but not least - indexing is not a one-off job but a lifestyle. Data changes over time, and so do queries. If you care about performance and have finite resources, you should keep an eye on the database. Check slow queries to add new indexes, and collect stats from users' queries to remove unused indexes and free up some room. Basically, apply common sense.
I noticed this one-year-old topic because I am more or less struggling with a similar issue: users can request queries with an unpredictable set of fields, which makes it nearly impossible to decide (or change) how indexes should be defined.
Even worse: the user should indicate some value (or range) for the fields that make up the sharding key, otherwise we cannot help MongoDB limit its search to only a few shards (or chunks, for that matter).
When the user needs the liberty to search on other fields that are not necessarily the ones which make up the sharding key, then we're stuck with a full-database search. Our database is some tens of TB in size...
Indexes should fit in RAM? This can only be achieved with small databases, meaning some hundreds of GB at most. How about my 37 TB database? Its indexes won't fit in RAM.
So I am trying out a POC inspired by the UNIX filesystem structures where we have inodes pointing to data blocks:
we have a cluster with 108 shards, each containing 100 chunks
at insert time, we take some fields that we know yield a good cardinality of the data, and we compute the sharding key from those fields; the document goes into the main collection (call it "Main_col") on that computed shard, so with a certain chunk number (which equals our computed sharding-key value)
from the original document, we take a few 'crucial' fields (the list of such fields can evolve as your needs change) and store a small extra document in another collection (call these "Crucial_col_A", "Crucial_col_B", etc., one for each such field): that document contains the value of this crucial field, plus an array with the chunk number where the original full document has been stored in the 'big' collection "Main_col"; consider this a 'pointer' to the chunk in collection "Main_col" where this full document exists. These "Crucial_col_X" collections are sharded based on the value of the 'crucial' field.
when we insert another document that has the same value for some 'crucial' field "A", the array of chunk numbers in "Crucial_col_A" will be updated (with 'merge') to contain the (different or same) chunk number of this next full document from "Main_col"
a user can now define queries with criteria for at least one of those 'crucial' fields, plus (optionally) any other criteria on other fields in the documents; the first criterion, for the crucial field (say field "B"), will run very quickly (because the collection is sharded on the value of "B") and return the small document from "Crucial_col_B", in which we have the array of chunk numbers in "Main_col" where any document exists that has field "B" equal to the given criterion. Then we run a second set of parallel queries, one for each shard-key value = chunk number (or one per shard, to be decided) that we find in the array from before. We combine the results of those parallel subqueries, and then apply further filtering if the user gave additional criteria.
Thus this involves 2 query-steps: first in the "Crucial_col_X" collection to obtain the array with chunk-numbers where the full documents exist, and then the second query on those specific chunks in "Main_col".
The first query is done with a precise value for the 'crucial' field, so the exact shard/chunk is known, thus this query goes very fast.
The second (set of) queries are done with precise values for the sharding keys (= the chunk numbers), so these are also expected to go very fast.
This way of working would eliminate the burden of defining many index combinations.
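A hedged sketch of what the two-step lookup could look like in the mongo shell; the collection names "Crucial_col_B" and "Main_col" come from the description above, while the field names value, chunks, chunkNo, B, and otherField are purely illustrative assumptions:
// Step 1: fetch the small 'pointer' document for the crucial field "B"
// (fast, because Crucial_col_B is sharded on that value):
var pointer = db.Crucial_col_B.findOne({ value: "someBValue" });

// Step 2: query Main_col restricted to the chunk numbers found in step 1,
// then apply any additional user criteria:
db.Main_col.find({
  chunkNo: { $in: pointer.chunks },  // shard-key values, so the subqueries are targeted
  B: "someBValue",
  otherField: { $gt: 100 }           // optional extra criteria from the user
})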

Which MongoDB indexes should be created for different sorting and filtering conditions to improve performance?

I have a MongoDB collection with ~100,000,000 records.
On the website, users search for these records with "Refinement search" functionality, where they can filter by multiple criteria:
by country, state, region;
by price range;
by industry;
Also, they can review search results sorted:
by title (asc/desc),
by price (asc/desc),
by bestMatch field.
I need to create indexes to avoid a full scan for any of the combinations above (because users use most of the combinations). Following the Equality-Sort-Range rule for creating indexes, I have to create a lot of indexes:
All filter combinations × all sortings × all range filters, like the following:
country_title
state_title
region_title
title_price
industry_title
country_title_price
country_industry_title
state_industry_title
...
country_price
state_price
region_price
...
country_bestMatch
state_bestMatch
region_bestMatch
...
In reality, I have more criteria (including equality & range) and more sortings. For example, I have multiple price fields and users can sort by any of those prices, so I have to create all the filtering indexes for each price field in case the user sorts by that price.
We use MongoDB 4.0.9, only one server yet.
Before I had sorting, it was easier; at least I could have one compound index like country_state_region and always include country & state in the query when one searches for a region. But with a sorting field at the end, I cannot do that anymore - I have to create different indexes even for location (country/state/region) for every sorting combination.
Also, not all products have a price, so I cannot just sort by the price field. Instead, I have to create two indexes: {hasPrice: -1, price: 1} and {hasPrice: -1, price: -1} (here, hasPrice is -1 so that records with hasPrice=true always come first, no matter the price sort direction).
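For illustration, the two indexes and the corresponding sort might look like this in the mongo shell (the collection name "products" is an assumption):
db.products.createIndex({ hasPrice: -1, price: 1 })   // supports ascending price sort
db.products.createIndex({ hasPrice: -1, price: -1 })  // supports descending price sort
// Priced records first, then by ascending price:
db.products.find({}).sort({ hasPrice: -1, price: 1 })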
Currently, I use NodeJS code to generate the indexes, similar to the following (a simplified example):
for (const filterFields of getAllCombinationsOf(['country', 'state', 'region', 'industry', 'price'])) {
  for (const sortingField of ['name', 'price', 'bestMatch']) {
    const index = {
      ...(_.fromPairs(filterFields.map(x => [x, 1]))),
      [sortingField]: 1
    };
    await collection.ensureIndex(index);
  }
}
So, the code above generates more than 90 indexes. And in my real task, this number is even higher.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Thanks!
Firstly, in MongoDB (refer to https://docs.mongodb.com/manual/reference/limits/), a single collection can have no more than 64 indexes. Also, you should never actually create 64 indexes unless writes are nonexistent or very minimal.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Without sacrificing either of functionality and query performance, you can't.
A few things you can do (assuming you are using pagination to show results):
Create a separate (not compound) index on each column and let MongoDB execution planner choose index based on meta-information (cardinality, number, etc) it has. Of course, there will be a performance hit.
Based on your judgment and some analytics create compound indexes only for combinations which will be used most frequently.
Most important - while creating compound indexes you can leave off the sort column. Say you are filtering based on industry and sorting based on price. If you have a compound index (industry, price) then everything will work fine. But if you have an index only on industry (assuming paginated results), then for the first few pages the query will be quite fast, but it will keep degrading as you move on to later pages. Generally, users don't navigate past 5-6 pages. Also, keep in mind that for larger skip values the query will start to fail because of the 32 MB memory limit for sorting. This can be overcome with aggregation (instead of a plain query) with allowDiskUse enabled.
Check out keyset pagination (also called the seek method) to see if it can be used in your use case; a rough sketch follows below.
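A sketch of the seek method for a price sort, assuming a hypothetical "products" collection and a supporting index such as { industry: 1, price: 1, _id: 1 }; instead of skip(), the next page starts after the last (price, _id) pair already shown:
// Values taken from the last document of the previous page (placeholders):
var lastPrice = 199;
var lastId = ObjectId("5f0000000000000000000000");
db.products.find({
  industry: "retail",
  $or: [
    { price: { $gt: lastPrice } },
    { price: lastPrice, _id: { $gt: lastId } }
  ]
}).sort({ price: 1, _id: 1 }).limit(20)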

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes from the MongoDB docs.
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
  "_id": ObjectId("123abc123abc"),
  "name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, it is a misunderstanding of how indexes work.
Indexes don't change the output of a query, only the way the query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
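For example (a sketch; the exact output depends on your server version), an IXSCAN stage on { name: 1 } in the winning plan confirms the index is used:
db.pets.find({ name: "My pet's name" }, { _id: 0 }).explain("executionStats")
db.pets.find({}, { _id: 0 }).sort({ name: 1 }).explain("executionStats")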
You may want to refer to the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

What should the indexing strategy be to support queries that are a combination of different fields?

Let's say I have a User collection, where a document looks like this:
{
  "name": "Starlord",
  "age": 24,
  "gender": "Male",
  "height": 180,
  "weight": 230,
  "hobbies": "Flying Spaceships"
}
Now, I want someone to be able to search for User based on one or more of these fields. So I add a compound index containing all these fields in the order above.
The issue is that MongoDB indexing works great when the query fields are a prefix of the indexed fields. For example, if I query by name, age and gender then the performance of the query is great. If I query by name, gender and weight, then the performance of the query is not so great (although it still uses the index and is faster than no-index).
What indexing strategy do you use when you have a use case like this?
The reason why your query by name, age and gender works great while the query by name, gender and weight does not is because the order of the fields matter significantly for compound indexes in MongoDB, especially the index's prefixes. As explained in this page in the documentation, a compound index can support queries on any prefix of its fields. So assuming you created the index in the order you presented the fields, the query for name, age and gender is a prefix of your compound index, while name, gender and weight can only take advantage of the name part of the index.
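A small sketch of that prefix behaviour (the collection name "users" and the index field order are assumptions based on the question):
db.users.createIndex({ name: 1, age: 1, gender: 1, height: 1, weight: 1, hobbies: 1 })
// Uses the (name, age, gender) prefix of the index:
db.users.find({ name: "Starlord", age: 24, gender: "Male" })
// Only the leading "name" part of the index can help here:
db.users.find({ name: "Starlord", gender: "Male", weight: 230 })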
Supporting all possible combinations of queries on these fields would require you to create enough compound indexes so that all possible queries are prefixes of your indexes. I would say that this is not something you would want to do. Since your question asks about indexing strategies for queries with multiple fields, I would suggest that you look into the specific data access patterns that are most useful for your data set and create a few compound indexes that support these, taking advantage of the prefixes concept and omitting certain fields with low cardinality from the index, such as gender.
If you need to be able to query for all combinations, the number of indexes required explodes quickly. The feature that comes to the rescue is called "index intersection".
Create a simple index on each field and trust the query optimizer to perform the correct index intersection. This feature is relatively new (since 2.6) and not as feature-complete as in the well-known RDBMSes. It makes sense to track the Jira ticket for index intersections to know the limitations, because the limitations are quite severe. It usually makes sense to carefully mix simple indexes (that can be intersected) and compound indexes (for very common queries).
In your specific case, you can utilize the fact that many fields are numeric and the range of valid values is very limited (e.g., for age, height and weight). The gender field has low selectivity and shouldn't be indexed in any case. Filter the gender in the last step, because it will, on average, only double the amount of data that must be processed.
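A hedged sketch of the single-index approach that leaves the planner free to intersect (the collection name "users" is assumed; whether an intersection is actually chosen can be checked with explain()):
db.users.createIndex({ age: 1 })
db.users.createIndex({ height: 1 })
db.users.createIndex({ weight: 1 })
// The planner may combine these via an AND_SORTED/AND_HASH index intersection:
db.users.find({ age: { $lt: 30 }, height: { $gt: 175 }, weight: { $lt: 250 } }).explain()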
Creating n! compound indexes is almost certainly not an option for n > 3...

MongoDB : Indexes order and query order must match?

This question concerns the internal method of managing indexes and searching BSON documents.
When you create multiple indexes like "index1", "index2", "index3", the indexes are stored to be used during queries, but what about the order of the query conditions and the resulting performance?
sample
index1,index2,index3----> query in the same order index1,index2,index3 (best case)
index1,index2,index3----> query in another order index2,index1,index3 (the order altered)
Many times you use nested queries involving these 3 indexes and other items, or more indexes. Would the order of the conditions in a query cause some time to be lost? Must queries be written respecting the order in which the indexes were defined, or does the internal architecture take care of this ordering? I want to know whether I have to take care of this myself or can write my queries freely.
Thanks.
The order of the conditions in your query does not affect whether it can use an index or not.
e.g.
typical document structure:
{
  "FieldA" : "A",
  "FieldB" : "B"
}
If you have a compound index on A and B:
db.MyCollection.ensureIndex({FieldA : 1, FieldB : 1})
Then both of the following queries will be able to use that index:
db.MyCollection.find({FieldA : "A", FieldB : "B"})
db.MyCollection.find({FieldB : "B", FieldA : "A"})
So the ordering of the conditions in the query does not prevent the index from being used - which I think is the question you are asking.
You can easily test this out by trying the 2 queries in the shell and adding .explain() after the find. I just did this to confirm, and they both showed that the compound index was used.
however, if you run the following query, this will NOT use the index as FieldA is not being queried on:
db.MyCollection.find({FieldB : "B"})
So it's the ordering of the fields in the index that defines whether it can be used by a query and not the ordering of the fields in the query itself (this was what Lucas was referring to).
From http://www.mongodb.org/display/DOCS/Indexes:
If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on a,b,c, you can use it to query on: a; a,b; or a,b,c.
So yes, order matters. You should clarify your question a bit if you need a more precise answer.