MongoDB using a less performant index

I am using mongodb 2.6, and I have the following query:
db.getCollection('Jobs').find(
  { $and: [
      { RunID: { $regex: ".*_0" } },
      { $or: [
          { JobType: "TypeX" },
          { JobType: "TypeY" },
          { JobType: "TypeZ" },
          { $and: [ { Info: { $regex: "Weekly.*" } }, { JobType: "YetAnotherType" } ] }
      ] }
  ] })
I have three different indexes: RunID, RunID + JobType, and RunID + JobType + Info. Mongo is always using the index containing only RunID, although the other indexes seem more likely to produce faster results. It is even sometimes using an index consisting of RunID + StartTime, while StartTime is not even in the list of queried fields. Any idea why it is choosing that index?

Note1:
You can drop your first 2 indexes, RunID and RunID + JobType. It is enough to use just the expanded compound index RunID + JobType + Info; it can also be used to query on the RunID or RunID + JobType fields. From the docs:
In addition to supporting queries that match on all the index fields,
compound indexes can support queries that match on the prefix of the
index fields.
When you drop those indexes, Mongo will choose the only remaining index.
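The prefix rule quoted above can be sketched as a small check in plain JavaScript (an illustrative helper, not a MongoDB API; MongoDB's real planner considers much more than field names):

```javascript
// Illustrative helper (not a MongoDB API): a compound index can support a
// query when the queried fields form a prefix of the index's key list.
function supportsAsPrefix(indexKeys, queryFields) {
  // Every queried field must appear, in order, at the front of the index.
  return queryFields.every((field, i) => indexKeys[i] === field);
}

const index = ["RunID", "JobType", "Info"];
console.log(supportsAsPrefix(index, ["RunID"]));            // true
console.log(supportsAsPrefix(index, ["RunID", "JobType"])); // true
console.log(supportsAsPrefix(index, ["JobType"]));          // false
```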
Note2:
You can always use hint to tell Mongo to use a specific index:
db.getCollection('Jobs').find().hint({RunID:1, JobType:1, Info:1})

Thanks to Sergiu's answer and Sammaye's comment, I think I found what I am looking for:
I got rid of the RunID index: since RunID is a prefix of many other indexes, MongoDB will use one of them if it needs only RunID.
Concerning $or, we have the following in the documentation:
When evaluating the clauses in the $or expression, MongoDB either
performs a collection scan or, if all the clauses are supported by
indexes, MongoDB performs index scans. That is, for MongoDB to use
indexes to evaluate an $or expression, all the clauses in the $or
expression must be supported by indexes. Otherwise, MongoDB will
perform a collection scan.
As I mentioned earlier, RunID is already indexed, so we need a new index for the other fields in the query: JobType and Info. JobType needs to be the index's prefix so that the index can also be used by queries that do not contain the Info field. So the second index I created is
{ "JobType": 1.0, "Info": 1.0}
As a result, MongoDB will use a complex plan in which different indexes are used.
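The $or rule quoted above can be sketched as a simple check (illustrative only; here each entry in indexedFields stands for a field that has its own index):

```javascript
// Illustrative: MongoDB can use index scans for $or only when every clause
// is supported by an index; otherwise it falls back to a collection scan.
function orPlan(indexedFields, orClauseFields) {
  const allSupported = orClauseFields.every(f => indexedFields.includes(f));
  return allSupported ? "IXSCAN" : "COLLSCAN";
}

console.log(orPlan(["RunID", "JobType"], ["RunID", "JobType"])); // "IXSCAN"
console.log(orPlan(["RunID", "JobType"], ["RunID", "Info"]));    // "COLLSCAN"
```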

Related

What is the difference between MongoDB find and aggregate in the queries below?

select records using aggregate:
db.getCollection('stock_records').aggregate([
  {
    "$project": {
      "info.created_date": 1,
      "info.store_id": 1,
      "info.store_name": 1,
      "_id": 1
    }
  },
  {
    "$match": {
      "$and": [
        { "info.store_id": "563dcf3465512285781608802a" },
        { "info.created_date": { $gt: ISODate("2021-07-18T21:07:42.313+00:00") } }
      ]
    }
  }
])
select records using find:
db.getCollection('stock_records').find({
  'info.store_id': '563dcf3465512285781608802a',
  'info.created_date': { $gt: ISODate('2021-07-18T21:07:42.313+00:00') }
})
What is difference between these queries and which is best for select by id and date condition?
I think your question should be rephrased to "what's the difference between find and aggregate".
Before I dive into that, I will say that both commands are similar and will generally perform the same at scale. One specific difference is that you did not add a projection to your find query, so it will return the full documents.
Regarding which is better: generally speaking, unless you need a specific aggregation operator, it is best to use find, as it performs better.
Now, why is the aggregation framework's performance "worse"? It's simple: it just does more.
For any pipeline stage, aggregation needs to fetch the BSON for each document and convert it into an internal object for processing; at the end of the pipeline the objects are converted back to BSON and sent to the client.
This, especially for large queries, carries a very significant overhead compared to a find, where the BSON is just sent straight back to the client.
Because of this, if you could execute your aggregation as a find query, you should.
Aggregation is slower than find.
In your example, Aggregation
In the first stage, you are returning all the documents with projected fields
For example, if your collection has 1000 documents, you are returning all 1000 documents each having specified projection fields. This will impact the performance of your query.
Now in the second stage, you are filtering the documents that match the query filter.
For example, out of the 1000 documents from stage 1, you select only a few.
In your example, find
First, you are filtering the documents that match the query filter.
For example, if your collection has 1000 documents, you are returning only the documents that match the query condition.
Here you did not specify the fields to return for the documents that match the query filter, so the returned documents will have all fields.
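The stage-ordering difference described above can be simulated on a plain JavaScript array (illustrative only; the field names are simplified from the question):

```javascript
// Simulate both shapes over 1000 plain objects: the aggregate-style pipeline
// projects all documents first and then matches; the find-style approach
// matches first and only then trims the surviving documents.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  store_id: i % 2 === 0 ? "A" : "B",
  store_name: "store" + i,
  extra: "unneeded field",
}));

// Aggregate-style: the $project step touches all 1000 documents, then $match filters.
const projected = docs.map(({ _id, store_id, store_name }) => ({ _id, store_id, store_name }));
const aggResult = projected.filter(d => d.store_id === "A");

// find-style: filter first, then keep only the needed fields of the survivors.
const findResult = docs
  .filter(d => d.store_id === "A")
  .map(({ _id, store_id, store_name }) => ({ _id, store_id, store_name }));

console.log(projected.length, aggResult.length, findResult.length); // 1000 500 500
```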
You can use projection in find instead of using aggregation:
db.getCollection('stock_records').find(
  {
    'info.store_id': '563dcf3465512285781608802a',
    'info.created_date': { $gt: ISODate('2021-07-18T21:07:42.313+00:00') }
  },
  {
    "info.created_date": 1,
    "info.store_id": 1,
    "info.store_name": 1,
    "_id": 1
  }
)

Spring Data Mongo - Perform Regular Search?

I went through many links like this: How to create full text search query in mongodb with spring-data?, but did not get the correct approach.
I have an Employee collection which holds 1000 documents. I want to provide a case-insensitive search where, when I search for ra, I should get names like Ravi, Ram, and rasika.
I used the logic below, which works fine, but I wanted to understand it from a performance perspective. Is there a better solution than this?
Query query = new Query(Criteria.where("employeeName").regex("^"+employeeName, "i"));
You can create an index on the field to which you are applying the regular-expression query filter. For example, consider the documents in a person collection:
{ "name" : "ravi" }
{ "name" : "ram" }
{ "name" : "John" }
{ "name" : "renu" }
{ "name" : "Raj" }
{ "name" : "peter" }
The following query (run from Mongo Shell) finds and fetches the four documents with the names starting with the letter "r" or "R":
db.person.find( { name: { $regex: "^r", $options: "i" } } )
But without an index on the name field, the query performs a collection scan. So, create an index on the field:
db.person.createIndex( { name: 1 } )
Now, run the query again and generate a query plan for it (using explain()). The query plan shows that it is an IXSCAN (an index scan), and this will be an efficiently performing query.
Note that prefix searches (as in the above query, using the ^) on indexed fields result in faster queries.
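The anchored, case-insensitive match can be reproduced in plain JavaScript against the sample names above (this only mirrors the matching semantics, not the index behaviour):

```javascript
// Same matching semantics as { name: { $regex: "^r", $options: "i" } },
// applied to the sample person documents above.
const names = ["ravi", "ram", "John", "renu", "Raj", "peter"];
const matches = names.filter(name => /^r/i.test(name));
console.log(matches); // [ 'ravi', 'ram', 'renu', 'Raj' ]
```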
From the documentation:
For case sensitive regular expression queries, if an index exists for
the field, then MongoDB matches the regular expression against the
values in the index, which can be faster than a collection scan.
Further optimization can occur if the regular expression is a “prefix
expression”, which means that all potential matches start with the
same string. This allows MongoDB to construct a “range” from that
prefix and only match against those values from the index that fall
within that range.
Though the documentation says the following (see the paragraph below), the query I ran did use the index, and the query plan generated using explain() showed an index scan.
Case insensitive regular expression queries generally cannot use
indexes effectively. The $regex implementation is not collation-aware
and is unable to utilize case-insensitive indexes.

MongoDB: using indexes on multiple fields or an array

I'm new to Mongo.
Entity:
{
  "sender": {
    "id": <unique key inside type>,
    "type": <enum value>
  },
  "recipient": {
    "id": <unique key inside type>,
    "type": <enum value>
  },
  ...
}
I need to create an effective search for the query "find entities where sender or recipient equals a user from the collection", with paging:
foreach member in memberIHaveAccessTo:
    condition ||= member == recipient || member == sender
I have read a bit about Mongo indexes. My problem can probably be solved by storing an additional field "members", an array containing both sender and recipient, and then creating an index on that array.
Is it possible to build such an index with Mongo?
Is Mongo a good choice for indexes like this?
Some thoughts about the issues raised in the question about querying and the application of indexes on the queried fields.
(i) The $or and two indexes:
I need to create effective search by query "find entities where sender
or recipient equal to user from collection...
Your query is going to be like this:
db.test.find( { $or: [ { "sender.id": "someid" }, { "recipient.id": "someid" } ] } )
With indexes defined on "sender.id" and "recipient.id", two individual indexes, the query with the $or operator will use both the indexes.
From the docs ($or Clauses and Indexes):
When evaluating the clauses in the $or expression, MongoDB either
performs a collection scan or, if all the clauses are supported by
indexes, MongoDB performs index scans.
Running the query with an explain() and examining the query plan shows that indexes are used for both the conditions.
(ii) Index on members array:
Probably my problem can be solve by storing addtional field "members"
which will be array contains sender and recipient and then create
index on this array...
With the members array field, the query will be like this:
db.test.find( { members_array: "someid" } )
When an index is defined on the members_array field, the query will use the index; the generated query plan shows the index usage. Note that an index defined on an array field is referred to as a multikey index.
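One way to maintain that extra field (a sketch; the members field name and the document shape are taken from the question) is to derive it when writing the document:

```javascript
// Sketch: derive a "members" array from sender and recipient before insert,
// so a single multikey index on "members" can serve the either-side lookup.
function withMembers(entity) {
  return { ...entity, members: [entity.sender.id, entity.recipient.id] };
}

const doc = withMembers({
  sender:    { id: "u1", type: "USER" },
  recipient: { id: "u2", type: "GROUP" },
});
console.log(doc.members); // [ 'u1', 'u2' ]
```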

MongoDB best optimisation on embedded documents using index

In MongoDB, how can you best optimize a collection with embedded documents using indexes, if you need to query the embedded document?
For example, in a collection with the following format:
{
  name: "Andy",
  address: {
    city: "London",
    street: "Sunny St."
  }
}
If we need to query by:
db.collection.find( { $and: [ { "address.city": "London" }, { "address.street": "Sunny St." } ] } )
Which type of index will be better:
1. db.collection.createIndex({ "address": 1 })
2. db.collection.createIndex({ "address.city": 1 })
   db.collection.createIndex({ "address.street": 1 })
3. db.collection.createIndex({ "address.city": 1, "address.street": 1 })
Thanks
For the given query, proposal number 3
db.collection.createIndex({ "address.city": 1, "address.street": 1 })
will do the job, as there is a logical relation city => street.
If you need more precise output on how Mongo uses the index, perform your own tests and use query.explain("executionStats") to see index usage.

Mongodb $and returns no results

This is a similar question to $and query returns no result, however the answer there does not apply in this case.
I have the following query:
{
  $and: [
    { ownerId: "505b832c975a5c3ca6e9523b" },
    { privacyLevel: "Public" }
  ]
}
My collection has 16 documents, all of which are "Public" and 7 of which have the ownerId above. The subqueries behave correctly and return the correct documents so I would expect 7 results from this query.
The $and query returns nothing, and I am at a loss as to why that might be.
If you are just querying two fields, you do not need an $and operator. Your query can simply be:
.find({ownerId: "505b832c975a5c3ca6e9523b", privacyLevel: "Public"})
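Both forms express the same conjunction; as a plain JavaScript predicate over sample documents (illustrative only; the second ownerId is made up):

```javascript
// The explicit $and and the implicit two-field form are the same conjunction.
const docs = [
  { ownerId: "505b832c975a5c3ca6e9523b", privacyLevel: "Public" },
  { ownerId: "someoneelse", privacyLevel: "Public" },
];
const result = docs.filter(d =>
  d.ownerId === "505b832c975a5c3ca6e9523b" && d.privacyLevel === "Public");
console.log(result.length); // 1
```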