CosmosDB MongoDB 3.6 fails sort() query with compounded index - azure-cosmosdb-mongoapi

MongoDB and CosmosDB newbie here. I've read the answer to the question "How does MongoDB treat find().sort() queries with respect to single and compound indexes?" and the official MongoDB docs, and I believe my index creation mirrors that answer, so I am leaning towards this being a CosmosDB issue. But according to their documentation, CosmosDB 3.6 supports compound indexes as well, so I am at a loss right now.
I am able to run sort() queries like db.Videos.find().sort({"PublishedOn": 1}) from the mongo command line on a collection with an index created as db.Videos.createIndex({"PublishedOn": 1}) or db.Videos.createIndex({"PublishedOn": -1}).
And when I add a 'where' clause to the find like this db.Videos.find({"IsPinned": false}).sort({"PublishedOn": 1}) the above index still works.
However, I now have document lookups which I want to avoid, so I drop the above single-field index and create a compound index like this db.Videos.createIndex({"IsPinned": 1, "PublishedOn": 1}) or db.Videos.createIndex({"PublishedOn": 1, "IsPinned": 1}), but now the query always fails with the error "The index path corresponding to the specified order-by item is excluded."
Is this a limitation of CosmosDB or is my 'ordering' in the index bad?

The issue with CosmosDB is that it expects all fields used in the filter (the 'where' part) to be used in the sort (the ORDER BY part) as well, in exactly the same order as in the index; otherwise it won't use the index.
Creating an index as db.Videos.createIndex({"IsPinned": 1, "PublishedOn": 1}) and then updating the query to be db.Videos.find({"IsPinned": false}).sort({"IsPinned": 1, "PublishedOn": 1}) works like a charm.
I inferred this from reading the CosmosDB documentation on indexing policies (https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy) as the MongoDB documentation suddenly stops after the index creation (https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb-indexing) section.
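The workaround above can be sketched in plain Python. This is only an illustration, assuming the behaviour described in this answer (the sort has to repeat the compound index keys in index order); the helper name sort_spec_for_compound_index is hypothetical and nothing here talks to a server.

```python
def sort_spec_for_compound_index(index_keys, filter_fields):
    """Build a sort spec that repeats a compound index's keys in order.

    index_keys: list of (field, direction) pairs, as given to createIndex.
    filter_fields: field names used as equality filters in find().
    Hypothetical helper; it assumes the sort must mirror the index.
    """
    index_fields = [field for field, _ in index_keys]
    missing = [f for f in filter_fields if f not in index_fields]
    if missing:
        raise ValueError(f"filter fields not covered by index: {missing}")
    # Sort on every indexed field, in index order, so the order-by
    # item lines up with the compound index path.
    return list(index_keys)


# Index from the answer: {"IsPinned": 1, "PublishedOn": 1}
videos_index = [("IsPinned", 1), ("PublishedOn", 1)]
videos_sort = sort_spec_for_compound_index(videos_index, ["IsPinned"])
```

The resulting list of pairs is the shape that PyMongo's cursor sort() accepts, so it could be passed along with the {"IsPinned": False} filter.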

Related

MongoDb Index Search returns the entire collection when using a string contains operation

Why would MongoDb (4.2.6) return every row from an index (Collation locale: en_US, strength: 1), when searching for a string contained in the document field? Example query:
db.eClearFaces.aggregate()
.match({
"Name": /Test/s
})
.collation({ locale: "en_US", "strength": 1 })
The index it is using is:
Name is simply a string field on the document. Resulting query plan shows that every single record in the collection is returned:
You can see in stage IXSCAN, it returned 56k+ docs (where I expected it to return only 6). That caused the next stage to fetch all 56k docs, but out of those fetched, it returned 6 (the correct count).
I am confused as to why. I have the collation for the index and the query configured the same, and it's obviously hitting the index. I don't understand why it's returning all those extra rows to the next stage.
Index output from profiler:
Did I miss a MongoDb Index or Query fundamental?
The solution ended up being in the MongoDb docs.
The $regex implementation is not collation-aware and is unable to
utilize case-insensitive indexes.
https://www.mongodb.com/docs/manual/reference/operator/query/regex/
So, supplying the collation ended up hindering the performance. Solution was to have an index without collation. It still performed an index scan as expected, but resulted in far fewer results before it fetched from the table.
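To compare the two indexes, it can help to put numbers on how much work a plan did per returned document. The sketch below is a minimal example that assumes you already have the executionStats section of an explain result as a Python dict; the function name scan_ratios and the sample numbers are illustrative, not taken from the original profiler output.

```python
def scan_ratios(execution_stats):
    """Return keys and documents examined per returned document.

    Expects the executionStats section of explain output, which carries
    nReturned, totalKeysExamined and totalDocsExamined.
    """
    n_returned = execution_stats["nReturned"]
    # Avoid division by zero when the query returns nothing.
    denom = max(n_returned, 1)
    return {
        "keys_per_result": execution_stats["totalKeysExamined"] / denom,
        "docs_per_result": execution_stats["totalDocsExamined"] / denom,
    }


# Illustrative numbers only: a scan that walks far more keys than it returns.
wasteful = {"nReturned": 6, "totalKeysExamined": 56000, "totalDocsExamined": 56000}
# Illustrative numbers only: a scan that examines just what it returns.
tight = {"nReturned": 6, "totalKeysExamined": 6, "totalDocsExamined": 6}
```

A ratio close to 1 suggests the index is narrowing the scan; a ratio in the thousands suggests the index is being walked almost in full.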

MongoDB - explain.executionStats

Are there any elements within the output of MongoDB's explain("executionStats") that give an idea or a hint about whether the query is using a given index for filtering, for sorting, or for both?
I read the following URLs
Mongodb compound indexes for filtering and sorting on BIG collection [points to below URL and has brief discussion]
https://emptysqua.re/blog/optimizing-mongodb-compound-indexes/ [ this one gives general idea, but the explain output uses older format/elements that don't exist in Mongodb 4.0 that I am using ]
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/ [documents how to determine the index and leverage index prefixes, but does not show explain output confirming the usage]
From MongoDB Docs:
If MongoDB can use an index scan to obtain the requested sort order,
the result will not include a SORT stage. Otherwise, if MongoDB cannot
use the index to sort, the explain result will include a SORT stage.
Example:
Look at the sample data from sortop collection.
Explain plan for a query without index:
Create Index on the collection:
Run the same query and check SORT stage in explain plan:
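The check described in the docs quote can also be done programmatically. This is a rough sketch that assumes the winning plan is available as a nested Python dict with stage, inputStage and inputStages keys; the helper names and the two sample plans are made up for illustration and are not the sortop output.

```python
def plan_stages(plan):
    """Collect stage names from a winningPlan-style dict, depth first."""
    stages = [plan.get("stage")]
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    # Some stages (for example OR) carry several children.
    children.extend(plan.get("inputStages", []))
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def sorts_in_memory(plan):
    """True if the plan has a SORT stage, i.e. no index provided the order."""
    return "SORT" in plan_stages(plan)


# Hand-written sample plans, not real explain output.
plan_without_index = {"stage": "SORT", "inputStage": {"stage": "COLLSCAN"}}
plan_with_index = {"stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}
```

If sorts_in_memory returns False and the plan contains an IXSCAN, the index is supplying the sort order.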

Pymongo ignoring my limit parameter

I am using Pymongo (v3.5.1) in a Python v3.6.3 Jupyter notebook.
Problem
Even though I am limiting my results, db.collection.find() is still retrieving all results before returning.
My code:
for post in posts.find({'subreddit_1': "the_donald"}, limit=2):
    print(post)
    exit
Background
I have imported the Reddit comment data set (RC_2017-01) from files.pushshift.io and created an index on the subreddit field (subreddit_1).
My Indexes
I believe this is caused by the collection having no index on your query term, as exhibited by the line:
planSummary: COLLSCAN
which means that to answer your query, MongoDB is forced to look at each document in the collection one by one.
Creating an index to support your query should help. You can create an index in the mongo shell by executing:
db.posts.createIndex({'subreddit_1': 1})
This is assuming your collection is named posts.
Please note that creating that index would only help with the query you posted. It's likely that different index would be needed for different type of queries.
To read more about how indexing works in MongoDB, check out https://docs.mongodb.com/manual/indexes/
I think you need to change the query, because the second parameter of find() is the projection. find() always returns a cursor, and limit() always works on the cursor.
So the syntax should look like this:
for post in posts.find({'subreddit_1': "the_donald"})[<start_index>:<end_index>]:
    print(post)
    exit
OR
for post in posts.find({'subreddit_1': "the_donald"}).limit(2):
    print(post)
    exit
Please read the doc for detail

MongoDB: Indexes, Sorting

After having read the official documentation on indexes, sort, and intersection, I'm a little bit confused about how everything works together.
I'm having trouble making my query use the indexes I've created. I'm working on MongoDB 3.0.3, on a collection of ~4 million documents.
To simplify, let's say my document is composed of 6 fields:
{
a:<text>,
b:<boolean>,
c:<text>,
d:<boolean>,
e:<date>,
f:<date>
}
The query I want to achieve is the following :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
So intuitively I've created two indexes
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1, e:1 }, {background: true,name: "test1"})
db.mycoll.createIndex({f:1}, {background: true,name: "test2"})
But explain() shows that the first index is not used at all.
I know there are some limitations when ranges are in play in the filter (on the e field), but I can't find my way around them.
Also, instead of a single-field index on f, I tried a compound index on {e:1,f:1}, but it didn't change anything.
So what have I misunderstood?
Thanks for your support.
Update: I also found the following rule for MongoDB 2.6:
A good rule of thumb for queries with sort is to order the indexed fields in this order:
First, the field(s) on which you will query for exact values.
Second, the field(s) on which you will sort.
Finally, field(s) on which you will query for a range of values (e.g., $gt, $lt, $in)
An example of using this rule of thumb is in the section on “Sorting the results of a complex query on a range of values” below, including a link to further reading.
Does this also apply for 3.X version?
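The rule of thumb quoted above can be written down as a small helper. This is a sketch only: the function name compound_index_keys is hypothetical, equality and range fields default to ascending as an assumption, and it only builds the key list without contacting a server.

```python
def compound_index_keys(equality, sort, range_fields):
    """Order index keys as: exact-match fields, sort fields, range fields.

    equality: field names queried for exact values.
    sort: list of (field, direction) pairs from the sort().
    range_fields: field names queried with ranges such as $gte/$lt.
    """
    keys = []
    seen = set()

    def add(field, direction):
        # Keep the first position a field was assigned; skip repeats.
        if field not in seen:
            seen.add(field)
            keys.append((field, direction))

    for field in equality:
        add(field, 1)  # assumed ascending
    for field, direction in sort:
        add(field, direction)
    for field in range_fields:
        add(field, 1)  # assumed ascending
    return keys


# Fields from the query in this question: a-d exact, sort on f, range on e.
suggested = compound_index_keys(["a", "b", "c", "d"], [("f", 1)], ["e"])
```

The list of pairs is the form PyMongo's create_index accepts for compound keys.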
Update 2: following the rule above, I created the following index
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1 , f:1, e:1}, {background: true,name: "test1"})
And for the same query :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
the index is indeed used. However, too many keys seem to be scanned; I may need to find a better order for the fields in the query/index.
Mongo acts sometimes a bit strange when it comes to the index selection.
Mongo decides automatically which index to use. In my experience, the smaller an index is, the more likely it is to be used (especially indexes with only one field). Maybe this happens because it is more often already loaded in RAM? To find out which index to use, Mongo performs test queries when it is idle. However, the result is sometimes unexpected.
Therefore, if you know which index to use, you can force a query to use a specific index with the $hint option. You should try that.
The two indexes used by your query and your sort do not overlap, so MongoDB cannot use them for index intersection:
Index intersection does not apply when the sort() operation requires an index completely separate from the query predicate.

Add _id when ensuring index?

I am building a webapp using Codeigniter (PHP) and MongoDB.
I am creating indexes and have one question.
If I am querying on three fields (_id, status, type) and want to
create an index do I need to include _id when ensuring the index like this:
db.comments.ensureIndex({_id: 1, status : 1, type : 1});
or will this do?
db.comments.ensureIndex({status : 1, type : 1});
You would need to explicitly include _id in your ensureIndex call if you wanted to include it in your compound index. But because filtering by _id already provides selectivity of a single document that's very rarely the right thing to do. I think it would only make sense if your documents are very large and you're trying to use covered indexes.
MongoDB will currently only use one index per query with the exception of $or queries. If your common query will always be searching on those three fields (_id, status, type) then a compound index would be helpful.
From within the DB shell you can use the explain() command on your query to get information on the indexes used.
You don't need to explicitly create an index on the _id field; it's done automatically. See the mongo documentation:
The _id Index
For all collections except capped collections, an index is automatically created for the _id field. This index is special and cannot be deleted. The _id index enforces uniqueness for its keys (except for some situations with sharding).