Mongo group query does not use indexes and slows down queries - mongodb

I am using mongodb 1.8.1, with a collection that contains more than 1.8 million records. All records in this collection are flat objects, meaning no nested objects or arrays, like the following:
{ "name" : "xyz", "id" : 123, "a" : "na", "c" : "in", "cmp" : "pq", "ttl" : "sd" }
All records are like this.
At any one time, up to 5 queries fire against this collection. Two are simple queries: one contains an $exists condition, and the other is a simple query which uses an index properly. Another two are group queries whose condition fields are indexed; one of them also contains $exists. The last is a distinct query with a proper condition which is also indexed.
The order in which the queries fire is: first the group queries, then one simple query, then the distinct query, and finally the last simple query.
As a result, data loads slowly. If 2-3 such calls are made at the same time, it loads very slowly and sometimes fails with a read timeout.
The collection has more than one index.

$exists queries do not use indexes (fixed from 1.9.1 onwards)
group commands use the JS context of mongodb, which is exclusively locked while it's being used. This will affect the performance of concurrent group queries. A new aggregation framework is under development that should help with this (2.1 onwards). Monitor https://jira.mongodb.org/browse/SERVER-447 for progress. In my experience it's usually more performant to do "group"-like aggregation app-side.
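As a minimal sketch of the app-side approach (the collection name is hypothetical; the field names are borrowed from the question's document shape):
// Hedged sketch: tally a "group by cmp" client-side so the server's JS
// context is never locked by a group command.
var counts = {};
db.mycoll.find({ "c" : "in" }, { "cmp" : 1 }).forEach(function (doc) {
    counts[doc.cmp] = (counts[doc.cmp] || 0) + 1;
});
printjson(counts);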

Related

MongoDB query is slow even when searching by indexes

I have a collection called calls containing the properties DateStarted, DateEnded, IdAccount, From, To, FromReversed, ToReversed. In other words, this is what a call document looks like:
{
    _id : "LKDJLDKJDLKDJDLKJDLKDJDLKDJLK",
    IdAccount: 123,
    DateStarted: ISODate('2020-11-05T05:00:00Z'),
    DateEnded: ISODate('2020-11-05T05:20:00Z'),
    From: "1234567890",
    FromReversed: "0987654321",
    To: "1231231234",
    ToReversed: "4321321321"
}
On our website we want to give customers the option to search for calls. When they search, they must specify DateStarted and DateEnded; those fields are required, the other ones are optional. The IdAccount will be injected on our end so that the customer can only get calls that belong to his account.
Because we have about 5 million records, we have created the following indexes:
db.calls.ensureIndex({"IdAccount":1});
db.calls.ensureIndex({"DateStarted":1});
db.calls.ensureIndex({"DateEnded":1});
db.calls.ensureIndex({"From":1});
db.calls.ensureIndex({"FromReversed":1});
db.calls.ensureIndex({"To":1});
db.calls.ensureIndex({"ToReversed":1});
The reason we did not create a compound index is that we want to be able to search by custom criteria. For example, we may want to search for all calls with DateStarted earlier than December 11 from a specific account.
Because of the indexes all these queries execute very fast:
db.calls.find({'DateStarted' : {'$gte': ISODate('2020-11-05T05:00:00Z')}}).limit(200).explain();
db.calls.find({'DateEnded' : {'$lte': ISODate('2020-11-05T05:00:00Z')}}).limit(200).explain();
db.calls.find({'IdAccount' : 123}).limit(200).explain();
// etc...
Even queries that use regexes execute very fast, but only if I anchor them with ^, meaning the pattern must match from the start of the string:
db.calls.find({ 'From' : /^305/ }).limit(200).explain();
That is the reason we created the fields FromReversed and ToReversed. If I want to search for a To phone number that ends with 3985, I will execute:
db.calls.find({ 'ToReversed' : /^5893/ }).limit(200).explain(); // note I have to reverse the search string too
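A tiny helper makes the reversal explicit (hypothetical function, shown only to illustrate the technique):
// Reverse the digits so a suffix search on To becomes an anchored
// prefix search on ToReversed.
function reversed(s) { return s.split('').reverse().join(''); }
db.calls.find({ 'ToReversed' : new RegExp('^' + reversed('3985')) }).limit(200);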
So the only queries that are slow are the ones whose pattern is not anchored at the start, such as this query:
db.calls.find({ 'ToReversed' : /1234/ }).limit(200).explain();
Question
Why is it that if I combine all these conditions the query becomes very slow? For example, this query is very slow:
db.calls.find({
    'DateStarted' : { '$gte' : ISODate('2018-11-05T05:00:00Z') },
    'DateEnded' : { '$lte' : ISODate('2020-11-05T05:00:00Z') },
    'IdAccount' : 123,
    'ToReversed' : /^5893/
}).limit(200).explain();
The problem is the 'ToReversed' : /^5893/ condition. If I execute that query by itself it is really fast, even if I use a pattern that does not quickly reach the limit of 200 results. Should I add a compound index as well, just for the scenario where it is slow?
I need to give our customers the option to search for phone numbers that start with or end with specific digits. The moment I add extra conditions to the query it becomes really slow.
Edit
From researching on the internet, I found that if I use the hint option the query is faster: it goes from 20 seconds to 5 seconds.
db.calls.find({
    'DateStarted' : { '$gte' : ISODate('2018-11-05T05:00:00Z') },
    'DateEnded' : { '$lte' : ISODate('2020-11-05T05:00:00Z') },
    'IdAccount' : 123,
    'ToReversed' : /^5893/
}).hint({'ToReversed':1}).limit(200).explain();
This is still slow, and it would be great to lower it to about a second, just as the simple queries take only milliseconds.
For the find query you showed us, which filters on 4 fields, the optimal index would ideally cover all 4 fields:
db.calls.createIndex({
    "DateStarted": 1,
    "DateEnded": 1,
    "IdAccount": 1,
    "ToReversed": 1
})
As to which fields should appear first: a common guideline is to place equality-match fields (here IdAccount) before range and anchored-regex fields, and among comparable fields to put the most restrictive first. Check the cardinality of your data to determine this.
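Under that guideline, a hedged sketch of an alternative ordering would be (same fields, reordered; compare both plans with explain() on your own data):
// Sketch only: equality field first, then the anchored-regex field,
// then the two range fields. Verify with explain() before committing.
db.calls.createIndex({
    "IdAccount": 1,
    "ToReversed": 1,
    "DateStarted": 1,
    "DateEnded": 1
})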

How to count total SELECT query requests for specific collection from MongoDB logs?

I have a MongoDB collection "cars", and the following gives me how many documents exist inside it:
let carsColl = db.getCollection('cars');
carsColl.count();
However, I need to know how many times the cars collection is queried in total. Also, say I have two cars x and y; can I find out how many SELECT queries were executed for 'x' vs 'y' documents?
I checked MongoDB triggers, but they can only be created for INSERT, UPDATE, REPLACE and DELETE, not SELECT.
Any guidance will be of great help.
From your question, you're basically not looking to query documents in a collection, but rather to query the types of operations being done on the cars collection, i.e. the logs.
Whereas your code below only gives the number of documents in the cars collection:
let carsColl = db.getCollection('cars');
carsColl.count();
Since that's not what you're looking for, try the following.
Steps:
Enable DB profiling. By default your DB only logs slow-running queries, i.e. any query that runs longer than 100ms, so in this case you need to log all queries. Execute the following to log all operations on the DB:
db.setProfilingLevel(2)
Note: doing this can affect your DB performance, as it will log all operations, which might become an issue for production servers with a high volume of DB calls. Ref: manage-the-database-profiler. Also check how to execute the above query, as it might need admin access.
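As a minimal sketch, remember to switch the profiler back off once you have gathered enough data:
// Level 2 profiles all operations; level 0 (the default) disables profiling.
db.setProfilingLevel(2)
// ... run your workload and inspect db.system.profile ...
db.setProfilingLevel(0)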
Once you have done the above step, you can execute these queries:
To get a count of all read queries on the cars collection inside the test DB, try the query below. Here op: 'query' refers to all find calls (remember that aggregation is different: it has op: 'command').
db.system.profile.find( { ns : 'test.cars', op: 'query' }).count()
To get counts of specific queries: say your collection has a few documents with a field carType: 'x' and a few others with carType: 'y', and you're querying them like db.cars.find({ carType: 'x' }); then:
db.system.profile.find( { ns : 'test.cars', op: 'query' , 'command.filter.carType' : 'x' }).count() // will give you all queries on x
db.system.profile.find( { ns : 'test.cars', op: 'query' , 'command.filter.carType' : 'y' }).count() // will give you all queries on y
Note: basically, you pass your filter criteria to db.system.profile.find() to differentiate the finds you care about from all the other logged finds.

How to improve mongodb group query performance

I am currently using Solr to store public tweet information. I have fields such as content, sentiment, keywords, tstamp, language, tweet_id to capture the essence of the tweet. I am also evaluating MongoDB for the same use case, benchmarking MongoDB and Solr with one million records each.
What I have observed is that group queries in MongoDB are 2.5 to 3 times slower than the facet queries of Solr.
The following MongoDB query
db.tweets.aggregate([
    {
        $group : {
            _id : "$sentiment",
            total : { $sum : 1 }
        }
    }
])
takes 481ms. I have an index on the sentiment field.
However, the same thing in Solr using a facet query takes 93ms.
Is there any other configuration in MongoDB that needs to be set to improve group query performance?
A $group operation and a facet search are not really comparable operations and the $group won't use an index. It looks like you are trying to compute the number of documents with each distinct value of sentiment. MongoDB doesn't have a specific function for this. For a specific value, a much better operation to get the count would be
db.collection.count({ "sentiment" : sentiment })
and you can get all of the distinct values with
db.collection.distinct("sentiment")
Both of these can use an index on { "sentiment" : 1 }. You will need multiple queries to get counts for multiple values of sentiment, so it's not as convenient as Solr. Faceted searching is a core competency of full-text search engines, so it's not surprising this is easier in Solr than in MongoDB. MongoDB and Solr are meant for totally different uses, so I can't see why you'd benchmark one against the other. It's like racing a boat against a car.
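A minimal shell sketch combining the two calls above (illustrative only; note it issues one count() round trip per distinct value):
// For each distinct sentiment, run an indexed count; both distinct() and
// count() can use the { "sentiment" : 1 } index.
db.collection.distinct("sentiment").forEach(function (s) {
    print(s + ": " + db.collection.count({ "sentiment" : s }));
});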

MongoDB Query for records with non-existant field & indexing

We have a Mongo database with around 1M documents, and we want to poll this database using a processed field to find documents which we haven't seen before. To do this we are setting a new field called _processed.
To query for documents which need to be processed, we query for documents which do not have this processed field:
db.stocktwits.find({ "_processed" : { "$exists" : false } })
However, this query takes around 30 seconds to complete each time, which is rather slow. There is a (descending) index on the _processed field:
db.stocktwits.ensureIndex({ "_processed" : -1 },{ "name" : "idx_processed" });
Adding this index does not change query performance. There are a few other indexes on the collection (namely the _id index and a unique index over a couple of fields in each document).
The _processed field is a long; perhaps it should be changed to a bool to make things quicker?
We have tried using a $where query (i.e. $where : this._processed == null) to do the same thing as $exists : false, and the performance is about the same (a few seconds slower, which makes sense)...
Any ideas on what could be causing the slow performance (or is it normal)? Does anyone have any suggestions on how to improve the query speed?
Cheers!
Upgrading to 2.0 is going to fix this for you:
From MongoDB.org:
Before v2.0, $exists is not able to use an index. Indexes on other fields are still used.
It's slow because checking for _processed not existing doesn't offer much selectivity. It's like having an index on "Gender": since there are only two possible values, male or female, if you have 1M rows and an index on Gender it will have to scan 50%, or 500K rows, to find all the males.
You need to make your index more selective.
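One common workaround (my suggestion, not something stated in the answer above) is to write the flag explicitly, so the poll becomes an indexed equality match on the small unprocessed subset instead of an $exists check:
// Hedged sketch: store _processed explicitly rather than relying on absence.
db.stocktwits.ensureIndex({ "_processed" : 1 });
// one-off backfill of existing documents (multi-update)
db.stocktwits.update({ "_processed" : { "$exists" : false } }, { "$set" : { "_processed" : false } }, false, true);
// the poll is now an indexed equality query
db.stocktwits.find({ "_processed" : false });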

MongoDB : Indexes order and query order must match?

This question concerns the internal method MongoDB uses to manage indexes and search BSON documents.
When you create multiple indexes like "index1", "index2", "index3"... the indexes are stored to be used during queries, but what about the order of the query conditions and the resulting performance?
sample
index1,index2,index3----> query in the same order index1,index2,index3 (best case)
index1,index2,index3----> query in another order index2,index1,index3 (the order altered)
Many times you use compound queries involving these 3 indexed fields along with other items or more indexes. Does the order of the conditions in the query imply some lost time? Must queries be written respecting the order in which the indexes were defined, or does the internal architecture take care of this ordering? I want to know whether I have to take care of this myself or can write my queries in any order.
Thanks.
The order of the conditions in your query does not affect whether it can use an index or not.
e.g.
typical document structure:
{
    "FieldA" : "A",
    "FieldB" : "B"
}
If you have a compound index on FieldA and FieldB:
db.MyCollection.ensureIndex({FieldA : 1, FieldB : 1})
Then both of the following queries will be able to use that index:
db.MyCollection.find({FieldA : "A", FieldB : "B"})
db.MyCollection.find({FieldB : "B", FieldA : "A"})
So the ordering of the conditions in the query does not prevent the index from being used, which I think is the question you are asking.
You can easily test this out by trying the 2 queries in the shell and adding .explain() after the find. I just did this to confirm, and they both showed that the compound index was used.
However, if you run the following query, it will NOT use the index, as FieldA is not being queried on:
db.MyCollection.find({FieldB : "B"})
So it's the ordering of the fields in the index that defines whether it can be used by a query and not the ordering of the fields in the query itself (this was what Lucas was referring to).
From http://www.mongodb.org/display/DOCS/Indexes:
If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on
a,b,c
you can use it to query on
a
a,b
a,b,c
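A quick shell illustration of that prefix rule (hypothetical collection name; confirm each plan with .explain()):
// Assuming db.coll has the compound index { a : 1, b : 1, c : 1 }:
db.coll.find({ a : 1 })                 // uses the index (prefix: a)
db.coll.find({ a : 1, b : 2 })          // uses the index (prefix: a,b)
db.coll.find({ a : 1, b : 2, c : 3 })   // uses the index (full key)
db.coll.find({ b : 2, c : 3 })          // cannot use it: no leading a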
So yes, order matters. You should clarify your question a bit if you need a more precise answer.