How does a Mongo query work? - mongodb

I would like to understand the internal process of how a Mongo query works. I went through the MongoDB source code, but I could not find a good way to follow it.
Mongo Query 1:
db.collection.find({
class : 10,
subject:"Physics",
name :"John Doe"
});
Mongo Query 2:
db.collection.find({
name :"John Doe",
class : 10,
subject:"Physics",
});
Consider a scenario where, in a collection, the name value "John Doe" occurs only twice and the class value 10 occurs a hundred times. Will the second query execute faster, or will both queries take the same time? Does the order of the fields in the query matter?

It all depends on the indexes: MongoDB gives preference to indexed keys.
You can use explain(), which reports what you want to know, such as how many documents were examined to produce the result and whether an index was used for your query.
I like using Compass, which shows the explain output visually.
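For a plain equality filter like the two queries in the question, the order of the fields should not change which documents match, and as far as I know the planner picks an index from the fields present rather than from their order. The sketch below is pure Python, not MongoDB code: `matches` is a hypothetical stand-in for equality matching, and the sample documents are made up.

```python
# Illustrative stand-in for equality matching; this is not MongoDB's implementation.
def matches(doc, query):
    """Return True when every field in the filter equals the document's value."""
    return all(doc.get(field) == value for field, value in query.items())

# Made-up sample data.
docs = [
    {"name": "John Doe", "class": 10, "subject": "Physics"},
    {"name": "John Doe", "class": 9, "subject": "Physics"},
    {"name": "Jane Roe", "class": 10, "subject": "Physics"},
]

# Same conditions, different field order (Query 1 and Query 2 from the question).
query_1 = {"class": 10, "subject": "Physics", "name": "John Doe"}
query_2 = {"name": "John Doe", "class": 10, "subject": "Physics"}

result_1 = [d for d in docs if matches(d, query_1)]
result_2 = [d for d in docs if matches(d, query_2)]
print(result_1 == result_2)
```

Against a real server, running explain() on both forms and comparing the winning plans is the way to confirm this for your own indexes.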

Related

MongoDB - Using Index to get nested IDs is slow

I have a MongoDB collection with 8k+ documents, around 40GB. Inside it, the data follows this format:
{
_id: ...,
_session: {
_id: ...
},
data: {...}
}
I need to get all the _session._id for my application. The following approach (python) takes too long to get them:
cursor = collection.find({}, projection={'_session._id': 1})
I have created an Index in MongoDB Compass, but I'm not sure if my query is making use of it at all.
Is there a way to speed this query such that I get all the _session._id very fast?
In the mongo shell you can use hint() to tell the query optimizer to use the available index, as follows:
db.collection.find({},{_id:0,"_session._id":1}).hint({"_session._id":1})
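The following test is confirmed to work from Python:
import pymongo
# user, pass, host and port are placeholders for your own connection details
db = pymongo.MongoClient("mongodb://user:pass@localhost:12345")
mydb = db["test"]
# hint() forces the standard index on x.y
docs = mydb.test2.find({}).hint([("x.y", pymongo.ASCENDING)])
for i in docs:
    print(i)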
db.test2.createIndex({"x.y":1})
{
"v" : 2,
"key" : {
"x.y" : 1
},
"name" : "x.y_1"
}
Tested with Python 3.7, pymongo 3.11.2, and mongod 5.0.5.
In your case it seems to be a text index. By the way, it seems a bit strange for the session ID to have a text index. For a text index, something like this should work:
db.test2.find({}).hint("x.y_text").explain()
And here is a working example with a text index:
import pymongo
# user, pass, host and port are placeholders for your own connection details
db = pymongo.MongoClient("mongodb://user:pass@localhost:12345")
print('Docs from test.test2 matching x.y == "3":')
mydb = db["test"]
# hint() also accepts the index name as a string
docs = mydb.test2.find({"x.y": "3"}).hint("x.y_text")
print("===start:====")
for i in docs:
    print(i)
db.test2.createIndex({"x.y":"text"}):
{
"v" : 2,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "x.y_text",
"weights" : {
"x.y" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 3
}
There are a few points of confusion in this question and the ensuing discussion which generally come down to:
What indexes are present in the environment (and why the attempts to hint it failed)
When using indexing is most appropriate
Current Indexes
I think there are at least 5 indexes that were mentioned so far:
A standard index of {"_session._id":1} mentioned originally in @R2D2's answer.
A text index on the _session._id field (mentioned in this comment)
A text index on the _ts_meta.session field (mentioned in this comment)
A standard index of {"x.y":1} mentioned second in @R2D2's answer.
A text index of {"x.y":"text"} mentioned at the end of @R2D2's answer.
Only the first of these is likely to be relevant to the original question. Note the difference: a text index is a specialized index meant for more advanced text searching. Such indexes are not required for simple string matching or value retrieval. Standard indexes, such as { '_session._id': 1 }, also store string values and are the relevant kind here.
What Indexing is For
Indexes are typically useful for retrieving a small subset of results from the database. The larger that set of results becomes relative to the overall size of the collection, the less helpful using an index will become. In your situation you are looking to retrieve data from all of the documents in the collection which is why the database doesn't consider using any index at all.
Now it is still possible that an index could help in this situation. That would be if we used it to perform a covered query, which means that the data can be retrieved from the index alone without looking at the documents themselves. In this case the database would still have to scan the full index, so it is not clear whether it would be faster. But you could certainly try. To do so you would need to follow @R2D2's instructions, specifically by creating the index and then hinting it in the query (while also excluding the _id field from the projection):
db.collection.createIndex({"_session._id":1})
db.collection.find({},{_id:0,"_session._id":1}).hint({"_session._id":1})
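Why the `_id: 0` matters can be sketched in a few lines of Python. This is a rough, hypothetical check (`is_covered` is not a MongoDB or pymongo function) of whether a projection can be answered from an index alone. It deliberately ignores multikey (array) indexes and query predicates, both of which also affect whether a real query is covered.

```python
def is_covered(index_keys, projection):
    """Rough check: can this projection be served from the index alone?

    index_keys: list of field names in the index, e.g. ["_session._id"].
    projection: dict in find() style, e.g. {"_id": 0, "_session._id": 1}.
    Simplification: ignores multikey indexes and the query filter.
    """
    # Fields the projection asks to include, other than _id.
    needed = [f for f, flag in projection.items() if flag and f != "_id"]
    # _id is returned by default unless the projection explicitly excludes it.
    if projection.get("_id", 1):
        needed.append("_id")
    return bool(needed) and all(f in index_keys for f in needed)

# With _id excluded, everything requested lives in the index.
print(is_covered(["_session._id"], {"_id": 0, "_session._id": 1}))
# Without _id: 0, the server must fetch each document to read _id.
print(is_covered(["_session._id"], {"_session._id": 1}))
```

From Python, the equivalent call would presumably be `collection.find({}, {"_id": 0, "_session._id": 1}).hint([("_session._id", 1)])`. Note that the projection in the original question does not exclude `_id`.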
Additional Questions
There were two other things mentioned in the question that are important to address.
I have created an Index in MongoDB Compass, but I'm not sure if my query is making use of it at all.
We talked about why this was the case above. To find out whether the database is using the index, you can navigate to the Explain tab in Compass and take a look. The explain plan visualization should indicate whether the index was used. Remember that for this query you will need to hint the index.
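The same check can be done outside Compass by reading the execution statistics from an explain result, for example one produced by explain("executionStats") in the shell. The sketch below assumes the classic explain layout with an `executionStats` section. The `summarize_explain` helper is hypothetical, and the sample dictionary is hand-written and trimmed, with made-up numbers.

```python
def summarize_explain(explain_output):
    """Pull the headline counters out of an explain result."""
    stats = explain_output["executionStats"]
    return {
        "returned": stats["nReturned"],
        "keys_examined": stats["totalKeysExamined"],
        "docs_examined": stats["totalDocsExamined"],
        # An index-only (covered) plan examines index keys but no documents.
        "covered": stats["totalKeysExamined"] > 0
        and stats["totalDocsExamined"] == 0,
    }

# Hand-written sample, trimmed to the fields used here; the numbers are made up.
sample = {
    "executionStats": {
        "nReturned": 8000,
        "totalKeysExamined": 8000,
        "totalDocsExamined": 0,
    }
}
summary = summarize_explain(sample)
print(summary["covered"])
```

If `docs_examined` equals the collection size and `keys_examined` is zero, the query ran as a full collection scan and the index was not used.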
Is there a way to speed this query such that I get all the _session._id very fast?
What is your definition of "very fast" here?
The general answer is that your operation requires scanning either all documents in the collection or a full index. There is no way to do this more efficiently based on the current schema. Therefore how fast it happens is largely going to come down to the hardware that the database is running on and it will slow down as the collection grows.
If this operation is something that you will be running frequently or have strict performance requirements around, then it may be important to think through your intended goals to see if there are other ways of achieving them. What will you or the application be doing with this list of session IDs?

mongodb Add results of second query to first query which is unrelated to first one

How can I add some result from another query on another collection which is completely unrelated to "main" query with mongodb?
For example I have one big query (which I don't completely understand myself right now ;) written by someone else) containing some $lookup, $group and other stuff and I have second query (written by me) completely unrelated to first which finally gives something like this:
{
//output of the second query
_id : "klhgh-8872-4301-8d89-fskghsfgs"
activeUsers : 3
}
I believe this isn't a case for $lookup, because the two queries don't share a foreignField and localField, but I'm not an expert on MongoDB so I could be wrong. :)

Parse-Server source _rperm and _wperm mongo queries, and should I index them?

I'm going through my mongo logs to add indexes for my unindexed queries. By default, mongo only logs queries that take over 100ms to complete.
I've found that I have several on _wperm and _rperm keys. I see that is how the ACL gets broken down. But what type of Parse.Query call might create a query like this in the logs?
query: { orderby: {}, $query: { _rperm: { $in: [ null, "*", "[UserId]" ] } } }
I'm even noticing that this query is on a class that has only 8 total objects, yet is taking 133ms to complete, which seems really slow for such a small class, even if it had to do an in memory sort and scan.
Should I solve this at the code level, modifying my query to avoid this type of mongo query? Or should I add an index for these types of queries?
I notice I also have a few that are showing up in the Slow Queries tab on mLab. The query looks like {"_id":"<val>","_wperm":{"$in":["<vals>"]}}, with the suggested index {"_id": 1, "_wperm": 1}, but it has the following note:
"_id" is in the existing {"_id": 1} unique index. The following index recommendation should only be necessary in certain circumstances.
Yet, this is one of my slower queries, taking 320 ms to complete. It's on the _User class. Is that just because the _User class has a lot of rows? Since the _id is unique I feel like it shouldn't really make a difference adding a _wperm index, since I end up with only a single object.
I'm curious if I will see a benefit from taking action on these queries or if I should safely ignore them.
You should index your collections following the MongoDB recommendations. In the good old parse.com days, those indexes were created automatically based on the observed workloads. Now you need to create them yourself. Both indexes make sense: _rperm is hit on every query run without a masterKey, and _wperm on every write. In the future, we could automatically create the _id + _wperm index, as all writes use it.
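To see what the logged `_rperm` filter is doing, here is a small pure-Python imitation of `{ _rperm: { $in: [ null, "*", "[UserId]" ] } }`. It is a sketch under two assumptions: that `_rperm` is a list of user IDs (with "*" meaning public read), and that a missing or null `_rperm` means no read restriction. `readable_by` is a hypothetical helper and the documents are made up.

```python
def readable_by(doc, user_id):
    """Imitate {_rperm: {$in: [None, "*", user_id]}} for one document."""
    allowed = {"*", user_id}
    rperm = doc.get("_rperm")
    if rperm is None:
        # $in with null matches documents where the field is missing or null.
        return True
    # For an array field, $in matches when any element is in the list.
    return any(entry in allowed for entry in rperm)

# Made-up documents.
docs = [
    {"_id": "a", "_rperm": ["*"]},      # public read
    {"_id": "b", "_rperm": ["user1"]},  # readable by user1 only
    {"_id": "c"},                       # no _rperm field at all
    {"_id": "d", "_rperm": ["user2"]},  # readable by user2 only
]
visible = [d["_id"] for d in docs if readable_by(d, "user1")]
print(visible)
```

Because this filter is attached to every query made without the masterKey, an index on `_rperm` (for example via pymongo's `create_index([("_rperm", 1)])`) is the kind of index the answer is recommending.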

Does MongoDB simplify query logic?

Are the following logically equivalent queries handled differently by the server?
{"Name":"1"}
{$and:[{"Name":"1"},{$or:[{"Name":"1"},{"Tag":"a"}]}]}
Since the second query includes the "Tag" field, does it affect index usage?
If you want to experiment and see what mongo is doing for each query you can use an explainable object in the mongo shell.
You can create it with a cursor (https://docs.mongodb.org/manual/reference/method/cursor.explain/):
db.example.find({a:17, b:55}).sort({b:-1}).explain()
Or you can create an explainable object and execute queries with it (https://docs.mongodb.org/v3.0/reference/explain-results/#explain-output):
var exp = db.example.explain()
exp.help() // see what you can do with it
exp.find({a:17, b:55}).sort({b:-1}) // execute query over the object
I cannot answer your question since you do not provide information about the indexes defined in your database, but using this you can see it for yourself in the "queryPlanner" section. If the query uses an index, the "stage" field shows "IXSCAN".
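If you have the explain output as a dictionary (for example from pymongo, or pasted from the shell), the stage check can be automated. The sketch below walks `queryPlanner.winningPlan` through its `inputStage` / `inputStages` children and collects the stage names. `plan_stages` is a hypothetical helper, and the sample plan is hand-written and heavily trimmed. Real output has many more fields and may be laid out differently depending on the server version.

```python
def plan_stages(plan):
    """Return stage names from the top of a winning plan down to its leaves."""
    stages = [plan["stage"]]
    # A stage has either a single child (inputStage) or several (inputStages).
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.insert(0, plan["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages

# Hand-written, trimmed sample; the index name is made up.
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Name_1"},
        }
    }
}
stages = plan_stages(sample_explain["queryPlanner"]["winningPlan"])
uses_index = "IXSCAN" in stages
print(stages, uses_index)
```

Running this for both forms of the query and comparing the two stage lists shows whether the extra "Tag" clause changed the plan.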

How to compare all records of two collections in mongodb using mapreduce?

I have a use case in which I want to compare each record of two collections in MongoDB and, after comparing, find the mismatched fields of every record.
Let us take an example, in collection1 I have one record as {id : 1, name : "bks"}
and in collection2 I have a record as {id : 1, name : "abc"}
When I compare the above two records, which have the same key, the field name is a mismatch because its value differs.
I am thinking of achieving this with mapReduce in MongoDB, but I am having trouble accessing the other collection by name in the map function. When I tried to compare in the map function, I got this error: "errmsg" : "exception: ReferenceError: db is not defined near '
Can anyone give me some thoughts on how to compare records using mapreduce?
It might have helped you to read the documentation:
When upgrading to MongoDB 2.4, you will need to refactor your code if your map-reduce operations, group commands, or $where operator expressions include any global shell functions or properties that are no longer available, such as db.
So from your error fragment, you appear to be referencing db in order to access another collection. You cannot do that.
If indeed you are intending to "compare" items in one collection to those in another, then there is no approach other than looping code:
db.collection.find().forEach(function(doc) {
var another = db.anothercollection.findOne({ "_id": doc._id });
// Code to compare
})
MongoDB versions before 3.2 have no concept of "joins" as such, and operations such as mapReduce work strictly with one collection only. (The $lookup aggregation stage, added in 3.2, can perform a left outer join against a second collection.)
The exception is db.eval(), but as per all of the strict warnings in the documentation, this is almost always a very bad idea.
Live with your comparison in looping code.
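The same looping comparison can be sketched in Python. Here plain lists of dictionaries stand in for the results of `find()` on each collection, and the helper names (`mismatched_fields`, `compare_collections`) are made up for illustration. Loading one side into a dictionary keyed by `id` avoids a separate lookup per document, on the assumption that it fits in memory.

```python
def mismatched_fields(doc_a, doc_b, ignore=("_id",)):
    """Return the sorted field names whose values differ between two documents."""
    fields = (set(doc_a) | set(doc_b)) - set(ignore)
    return sorted(f for f in fields if doc_a.get(f) != doc_b.get(f))

def compare_collections(docs_a, docs_b, key="id"):
    """Map each shared key to the list of fields that do not match."""
    by_key = {d[key]: d for d in docs_b}
    report = {}
    for doc in docs_a:
        other = by_key.get(doc[key])
        if other is None:
            continue  # no counterpart in the second collection
        diff = mismatched_fields(doc, other)
        if diff:
            report[doc[key]] = diff
    return report

# Sample records from the question, plus one matching pair.
collection1 = [{"id": 1, "name": "bks"}, {"id": 2, "name": "same"}]
collection2 = [{"id": 1, "name": "abc"}, {"id": 2, "name": "same"}]
print(compare_collections(collection1, collection2))  # {1: ['name']}
```

One limitation of this sketch: a field that is missing on one side and explicitly null on the other is treated as equal.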