MongoDB speed query when searching by part of text

I have a smallish database with 2 million records of phone calls.
When I execute:
db.getCollection('calls').find({
    'IsIncoming': true,
    'DateCreated': { '$gte': ISODate('2010-12-02T02:26:22.478Z') },
    'CallerIdNum': "2545874578"
}).limit(100).count({})
it is super fast and takes 95 ms. Note that IsIncoming, DateCreated, and CallerIdNum have indexes. Every time I search using those fields it is super fast.
The moment I search for something containing part of the text it is very slow. For example, this query takes 25 seconds:
db.getCollection('calls').find({
    'IsIncoming': true,
    'DateCreated': { '$gte': ISODate('2010-12-02T02:26:22.478Z') },
    'CallerIdNum': /2545874/
}).limit(100).count({})
I know the reason is that I am searching within CallerIdNum. If I knew the full caller ID in advance, as in my first query, it would be fast.
Question
I would like the last query to execute faster. I know it is probably impossible, and the only way of getting great performance is to search by the whole CallerIdNum. But maybe/hopefully I am wrong and someone can help me find a way of executing my last query faster.

The problem here is that you are searching for a substring of a caller ID number: /2545874/. This is not sargable, and generally can't use an index. Assuming you really want numbers which start with that prefix, use this sargable version:
db.getCollection('calls').find({
    'IsIncoming': true,
    'DateCreated': { '$gte': ISODate('2010-12-02T02:26:22.478Z') },
    'CallerIdNum': /^2545874/
}).limit(100).count({})
You might also want to add a compound index on all three fields, though at least the version of the query I gave above can use an index involving the CallerIdNum field.
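A sketch of such an index, assuming the usual equality-before-range ordering (IsIncoming is an equality match, while the anchored regex and the date filter are both range-like; verify the field order against your own data):
db.getCollection('calls').createIndex({
    'IsIncoming': 1,   // equality match
    'CallerIdNum': 1,  // an anchored-prefix regex scans a bounded range of this field
    'DateCreated': 1   // range match
})
Check the explain() output afterwards to confirm the winning plan actually uses it.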

Related

MongoDB - Using Index to get nested IDs is slow

I have a MongoDB collection with 8k+ documents, around 40GB. Inside it, the data follows this format:
{
    _id: ...,
    _session: {
        _id: ...
    },
    data: {...}
}
I need to get all the _session._id for my application. The following approach (python) takes too long to get them:
cursor = collection.find({}, projection={'_session._id': 1})
I have created an Index in MongoDB Compass, but I'm not sure if my query is making use of it at all.
Is there a way to speed up this query so that I get all the _session._id values very fast?
In the mongo shell you can hint() the query optimizer to use the available index as follows:
db.collection.find({},{_id:0,"_session._id":1}).hint({"_session._id":1})
The following test is confirmed to work via Python:
import pymongo

db = pymongo.MongoClient("mongodb://user:pass@localhost:12345")
mydb = db["test"]
docs = mydb.test2.find({}).hint([("x.y", pymongo.ASCENDING)])
for i in docs:
    print(i)
db.test2.createIndex({"x.y": 1})
{
    "v" : 2,
    "key" : {
        "x.y" : 1
    },
    "name" : "x.y_1"
}
Tested with: Python 3.7, PyMongo 3.11.2, mongod 5.0.5.
In your case it seems to be a text index; by the way, it seems a bit strange that the session field would have a text index. For a text index, something like this should work:
db.test2.find({}).hint("x.y_text").explain()
And here is a working example with a text index:
import pymongo

db = pymongo.MongoClient("mongodb://user:pass@localhost:12345")
print('Get first 10 docs from test.test:')
mydb = db["test"]
docs = mydb.test2.find({"x.y": "3"}).hint("x.y_text")
print("===start:====")
for i in docs:
    print(i)
db.test2.createIndex({"x.y": "text"})
{
    "v" : 2,
    "key" : {
        "_fts" : "text",
        "_ftsx" : 1
    },
    "name" : "x.y_text",
    "weights" : {
        "x.y" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 3
}
There are a few points of confusion in this question and the ensuing discussion which generally come down to:
What indexes are present in the environment (and why the attempts to hint it failed)
When using indexing is most appropriate
Current Indexes
I think there are at least 5 indexes that were mentioned so far:
A standard index of {"_session._id":1} mentioned originally in @R2D2's answer.
A text index on the _session._id field (mentioned in this comment)
A text index on the _ts_meta.session field (mentioned in this comment)
A standard index of {"x.y":1} mentioned second in @R2D2's answer.
A text index of {"x.y":"text"} mentioned at the end of @R2D2's answer.
Only the first of these is likely to even be relevant to the original question. Note that a text index is a specialized index meant for performing more advanced text searching. Such indexes are not required for simple string matching or value retrieval. But a standard index, { '_session._id': 1 }, will also store string values and is relevant here.
What Indexing is For
Indexes are typically useful for retrieving a small subset of results from the database. The larger that set of results becomes relative to the overall size of the collection, the less helpful using an index will become. In your situation you are looking to retrieve data from all of the documents in the collection, which is why the database doesn't consider using any index at all.
Now it is still possible that an index could help in this situation. That would be if we used it to perform a covered query, which means that the data can be retrieved from the index alone, without looking at the documents themselves. In this case the database would have to scan the full index, so it is not clear whether it would be faster or not. But you could certainly try. To do so you would need to follow @R2D2's instructions, specifically by creating the index and then hinting it in the query (while also projecting out the _id field):
db.collection.createIndex({"_session._id":1})
db.collection.find({},{_id:0,"_session._id":1}).hint({"_session._id":1})
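To confirm the query is actually covered (a hedged check; the exact explain output varies by server version), inspect the execution stats:
db.collection.find({}, {_id: 0, "_session._id": 1})
    .hint({"_session._id": 1})
    .explain("executionStats")
// A covered plan shows executionStats.totalDocsExamined: 0 and a winning
// plan containing an IXSCAN stage with no FETCH stage.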
Additional Questions
There were two other things mentioned in the question that are important to address.
I have created an Index in MongoDB Compass, but I'm not sure if my query is making use of it at all.
We talked about why this was the case above. But to find out whether the database is using it or not, you can navigate to the Explain tab in Compass and take a look. The explain plan visualization should indicate whether the index was used. Remember that you will need to hint the index based on your query.
Is there a way to speed this query such that I get all the _session._id very fast?
What is your definition of "very fast" here?
The general answer is that your operation requires scanning either all documents in the collection or a full index. There is no way to do this more efficiently based on the current schema. Therefore how fast it happens is largely going to come down to the hardware that the database is running on and it will slow down as the collection grows.
If this operation is something that you will be running frequently or have strict performance requirements around, then it may be important to think through your intended goals to see if there are other ways of achieving them. What will you or the application be doing with this list of session IDs?
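For instance, if a deduplicated list of session IDs would serve the goal (an assumption about the use case), an aggregation grouping on the indexed field returns each distinct value once instead of one result per document:
// Hedged sketch: returns each distinct _session._id once. On recent
// server versions this $group may be answered with a DISTINCT_SCAN
// over the {"_session._id": 1} index rather than a full collection scan.
db.collection.aggregate([
    { $group: { _id: "$_session._id" } }
])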

MongoDB query is slow even when searching by indexes

I have a collection called calls containing properties DateStarted, DateEnded, IdAccount, From, To, FromReversed, ToReversed. In other words, this is what a call document looks like:
{
    _id: "LKDJLDKJDLKDJDLKJDLKDJDLKDJLK",
    IdAccount: 123,
    DateStarted: ISODate('2020-11-05T05:00:00Z'),
    DateEnded: ISODate('2020-11-05T05:20:00Z'),
    From: "1234567890",
    FromReversed: "0987654321",
    To: "1231231234",
    ToReversed: "4321321321"
}
On our website we want to give customers the option to search for calls by custom criteria. When they search for calls they must specify DateStarted and DateEnded; those fields are required, the other ones are optional. The IdAccount will be injected on our end so that the customer can only get calls that belong to his account.
Because we have about 5 million records, we have created the following indexes:
db.calls.ensureIndex({"IdAccount":1});
db.calls.ensureIndex({"DateStarted":1});
db.calls.ensureIndex({"DateEnded":1});
db.calls.ensureIndex({"From":1});
db.calls.ensureIndex({"FromReversed":1});
db.calls.ensureIndex({"To":1});
db.calls.ensureIndex({"ToReversed":1});
The reason why we did not create a compound index is that we want to be able to search by custom criteria. For example, we may want to search for all calls with a date earlier than December 11 from a specific account.
Because of the indexes all these queries execute very fast:
db.calls.find({'DateStarted': {'$gte': ISODate('2020-11-05T05:00:00Z')}}).limit(200).explain();
db.calls.find({'DateEnded': {'$lte': ISODate('2020-11-05T05:00:00Z')}}).limit(200).explain();
db.calls.find({'IdAccount': 123}).limit(200).explain();
// etc...
// etc...
Even queries that use regexes execute very fast, but only if I anchor them with ^, meaning the field must start with the search pattern:
db.calls.find({ 'From': /^305/ }).limit(200).explain();
and that is the reason why we created the fields FromReversed and ToReversed. If I want to search for a To phone number that ends with 3985, I will execute:
db.calls.find({ 'ToReversed': /^5893/ }).limit(200).explain(); // note I have to reverse the search term too
So the only queries that are slow are the ones that do not start with something, such as this query:
db.calls.find({ 'ToReversed': /1234/ }).limit(200).explain();
Question
Why is it that when I combine all the criteria it is very slow? For example, this query is very slow:
db.calls.find({
    'DateStarted': { '$gte': ISODate('2018-11-05T05:00:00Z') },
    'DateEnded': { '$lte': ISODate('2020-11-05T05:00:00Z') },
    'IdAccount': 123,
    'ToReversed': /^5893/
}).limit(200).explain();
The problem is the 'ToReversed': /^5893/ clause. If I execute that query by itself it is really fast, even if I use a pattern that does not quickly fill the limit of 200 results. Should I add a compound index as well, just for the scenario where it is slow?
I need to give our customers the option to search for phone numbers that end with or start with specific criteria. The moment I add extra stuff to the query it becomes really slow.
Edit
From researching on the internet, I found that if I use the hint option it is faster: it goes from 20 seconds to 5 seconds.
db.calls.find({
    'DateStarted': { '$gte': ISODate('2018-11-05T05:00:00Z') },
    'DateEnded': { '$lte': ISODate('2020-11-05T05:00:00Z') },
    'IdAccount': 123,
    'ToReversed': /^5893/
}).hint({ 'ToReversed': 1 }).limit(200).explain();
This is still slow, and it would be great if I could lower it to 1 second, just like the simple queries that take milliseconds.
For the find query you showed us, which filters on 4 fields, the optimal index would ideally cover all 4 fields:
db.calls.createIndex({
    "DateStarted": 1,
    "DateEnded": 1,
    "IdAccount": 1,
    "ToReversed": 1
})
As to which fields should appear first, you should generally place the most restrictive (most selective) fields first. Check the cardinality of your data to determine this.
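A hedged way to gauge cardinality from the shell (field names taken from the question): count the distinct values per field, since a field with few distinct values makes a poor leading key. Note that MongoDB's general equality-before-range guidance would also argue for putting the equality-matched IdAccount ahead of the two date-range fields; compare the explain() output for both orderings.
// Rough cardinality check: the larger the count, the more selective the field
db.calls.aggregate([
    { $group: { _id: "$IdAccount" } },
    { $count: "distinctIdAccounts" }
])
db.calls.aggregate([
    { $group: { _id: "$ToReversed" } },
    { $count: "distinctToReversedValues" }
])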

Mongodb: Skip collection values from between (not a normal pagination)

I have browsed through various examples but have failed to find what I am looking for. What I want is to fetch a specific document by _id and skip multiple ranges of its array in one query, or some alternative which is fast enough for my case.
The following query would skip the first comment and return the second:
db.posts.find( { "_id" : 1 }, { comments: { $slice: [ 1, 1 ] } } )
That is: skip element 0, return element 1, and leave the rest out of the result.
But what if there were, say, 10000 comments and I wanted to use the same pattern, but return the array values like this:
skip 0, return 1, skip 2, return 3, skip 4, return 5
That would return a document whose comments array is of size 5000, because half of them are skipped. Is this possible? I used a large number like 10000 because I fear that using multiple queries to do this would not be wise performance-wise (an example is shown here: multiple queries to accomplish something similar). Thanks!
I went through several resources and concluded that currently this is impossible to do with one query. Instead, I settled on the fact that there are only two options to overcome this problem:
1.) Make a loop of some sort and run several slice queries while increasing the position of the slice, similar to the resource I linked:
var skip = NUMBER_OF_ITEMS * (PAGE_NUMBER - 1)
// $slice must be applied to a specific array field in the projection
db.companies.find({}, { comments: { $slice: [skip, NUMBER_OF_ITEMS] } })
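A sketch of that loop in the shell (NUMBER_OF_ITEMS and the posts/comments names are assumptions carried over from the question; the loop stops when a page comes back empty):
var NUMBER_OF_ITEMS = 100;
for (var page = 1; ; page++) {
    var skip = NUMBER_OF_ITEMS * (page - 1);
    var doc = db.posts.findOne({ _id: 1 }, { comments: { $slice: [skip, NUMBER_OF_ITEMS] } });
    if (doc == null || doc.comments == null || doc.comments.length === 0) break;
    // process doc.comments here
}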
However, depending on the data, I would not want to run 5000 individual queries just to get half of the array contents, so I decided to use option 2, which seems relatively fast and performant to me.
2.) Make a single query by _id for the document you want, and before returning the results to the client (or some other part of your code), drop the unwanted array items with a for loop, then return the results. I did this on the Java side, since I talk to Mongo via Morphia. I also ran explain() against Mongo and saw that returning a single document with a 10000-item array while filtering by _id is so fast that speed wasn't really an issue; I bet the slice-skip approach would only be slower.
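A minimal sketch of option 2 in the mongo shell (the original was done in Java via Morphia; the posts/comments names follow the question):
// Fetch the document once by _id, keeping only the comments array
var doc = db.posts.findOne({ _id: 1 }, { comments: 1 });
// Keep elements at indexes 1, 3, 5, ... (skip 0, return 1, skip 2, ...)
var everyOther = doc.comments.filter(function (c, idx) { return idx % 2 === 1; });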

What is the fastest way to see when the last update to MongoDB was made

I'm writing a long-polling script to check for new documents in my Mongo collection, and the only way I know of checking whether changes have been made since the last iteration of my loop is to query for the last document's ID, parse the timestamp out of that ID, and see if it's greater than the timestamp saved since I last made a request.
Is there some kind of approach that doesn't involve making a query every time, or something that makes that query the fastest?
I was thinking a query like this:
db.chat.find().sort({_id:-1}).limit(1);
But it would be using the PHP driver.
The fastest way is to create an index on the timestamp field.
Creating index:
db.posts.ensureIndex( { timestamp : 1 } )
Optimizes this query:
db.posts.find().sort( { timestamp : -1 } )
findOne() gives you only the last timestamp.
Glad to help.
@Zagorulkin Your suggestion is surely going to help in this scenario. However, I don't think sort() works with findOne().
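A hedged workaround for that limitation: use find() with sort and limit, then take the single result from the cursor.
// Newest document by _id, without relying on sort support in findOne()
var last = db.chat.find({}, { _id: 1 }).sort({ _id: -1 }).limit(1).next();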

MongoDB Query for records with non-existent field & indexing

We have a mongo database with around 1M documents, and we want to poll this database using a processed field to find documents which we haven't seen before. To do this we are setting a new field called _processed.
To query for documents which need to be processed, we query for documents which do not have this processed field:
db.stocktwits.find({ "_processed" : { "$exists" : false } })
However, this query takes around 30 seconds to complete each time, which is rather slow. There is an index on the _processed field:
db.stocktwits.ensureIndex({ "_processed" : -1 },{ "name" : "idx_processed" });
Adding this index does not change query performance. There are a few other indexes sitting on the collection (namely the ID idx & a unique index of a couple of fields in each document).
The _processed field is a long; perhaps this should be changed to a bool to make things quicker?
We have tried using a $where query (i.e. $where: this._processed == null) to do the same thing as $exists: false, and the performance is about the same (a few seconds slower, which makes sense)...
Any ideas on what would be causing the slow performance (or is it normal)? Does anyone have any suggestions on how to improve the query speed?
Cheers!
Upgrading to 2.0 is going to do this for you:
From MongoDB.org:
Before v2.0, $exists is not able to use an index. Indexes on other fields are still used.
It's slow because checking that _processed does not exist doesn't offer much selectivity. It's like having an index on "Gender": since there are only two possible values, male or female, if you have 1M rows and an index on Gender, it will have to scan 50% of them, or 500K rows, to find all the males.
You need to make your index more selective.
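As a hedged illustration of making the predicate index-friendly (assuming you control the insert path): write an explicit flag value instead of relying on field absence, since an equality match on a value can use the index directly, whereas { $exists: false } historically could not.
// Assumption: new documents are inserted with an explicit _processed: false
db.stocktwits.ensureIndex({ "_processed" : 1 })
db.stocktwits.insert({ "symbol" : "AAPL", "_processed" : false })  // hypothetical document shape
// Equality on the flag can use the index, unlike { "$exists" : false }
db.stocktwits.find({ "_processed" : false })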