Reducing latency of MongoDB querying for large datasets

I have the following code for querying ideas in my web app, and I'm trying to figure out how to optimize it to run faster. Currently it takes around 350ms to send a response for 100 entries, around 1000ms for 1,000 entries, and around 6000ms for 10,000 entries. I've looked into indexing, but it doesn't seem to make anything faster. I used MongoLabs to create the index on tool, username, and rating_level for now. I was thinking about using Redis, but I'm not sure how to implement it.
app.get('/api/ideas', function(req, res, next) {
    var query;
    if (req.query.tool) {
        query = Idea.find({ tool: req.query.tool });
    } else if (req.query.username) {
        query = Idea.find({ username: req.query.username });
    } else {
        query = Idea.find();
    }
    if (req.query.sortRank) {
        query = query.sort({ rating_level: req.query.sortRank });
    } else if (req.query.sortDate) {
        query = query.sort({ datetime: req.query.sortDate });
    } else {
        query = query.sort({ rating_level: -1 });
    }
    query.exec(function(err, ideas) {
        if (err) return next(err);
        res.send(ideas);
    });
});

You are generating several different query patterns and sort combinations in your code:
tool, rating_level
tool, datetime
username, rating_level
username, datetime
(none), rating_level
(none), datetime
Performant queries will match the keys and sort order (asc/desc) defined for a compound index, or a prefix of those keys reading left to right.
If you only have a single index on "tool, username, rating_level", this will not efficiently support any of your queries as listed above.
The index you've suggested would be useful for queries on:
tool
tool, username
tool, username, rating_level
To return results without having to perform an in-memory sort, the sort order in the compound index should match the sort order in the queries.
For example:
db.collection.find({tool:1}).sort({rating_level: -1})
...would ideally want an index of
{tool:1, rating_level:-1}
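In the mongo shell (a sketch, assuming the collection is named ideas), that index could be created with:
db.ideas.createIndex({ tool: 1, rating_level: -1 })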
Given it looks like your sort order could include both ascending & descending variations, there are potentially a lot of indexes required to cover all your query variations efficiently.
You'll have to consider the tradeoffs for query performance vs index maintenance and storage overhead.
Some approaches to consider include:
Add all the required indexes. With only 10,000 documents it's probably reasonable to do so.
Optimise index coverage for your most common queries. Use the explain() command to get a better understanding of how queries are using indexes.
Adjust your data model to simplify indexes. For example, you could use a multikey array so that tool and username are key/value pairs that can be included in the same index (see the sketch after this list).
Adjust your application user interface to simplify the number of query and sort permutations.
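A minimal sketch of that attribute-pattern remodel (the collection name ideas and the attrs field are assumptions for illustration):
db.ideas.insertOne({
    attrs: [
        { k: "tool", v: "sketchup" },
        { k: "username", v: "alice" }
    ],
    rating_level: 5,
    datetime: new Date()
});
// A single multikey index can then serve equality on either attribute
// plus the default rating sort:
db.ideas.createIndex({ "attrs.k": 1, "attrs.v": 1, rating_level: -1 });
db.ideas.find({ attrs: { $elemMatch: { k: "tool", v: "sketchup" } } })
    .sort({ rating_level: -1 });
Whether the sort can actually use the index depends on the multikey bounds, so check the winning plan with explain("executionStats") before committing to this shape.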
Some helpful references:
Use Indexes to Sort Query Results
Optimizing MongoDB Compound Indexes
Indexing Schemaless Documents in Mongo

Related

Which MongoDB indexes should be created for different sorting and filtering conditions to improve performance?

I have a MongoDB collection with ~100,000,000 records.
On the website, users search for these records with "Refinement search" functionality, where they can filter by multiple criteria:
by country, state, region;
by price range;
by industry;
Also, they can review search results sorted:
by title (asc/desc),
by price (asc/desc),
by bestMatch field.
I need to create indexes to avoid a full scan for any of the combinations above (because users use most of the combinations). Following the Equality-Sort-Range rule for creating indexes, I have to create a lot of indexes:
all filter combinations × all sortings × all range filters, like the following:
country_title
state_title
region_title
title_price
industry_title
country_title_price
country_industry_title
state_industry_title
...
country_price
state_price
region_price
...
country_bestMatch
state_bestMatch
region_bestMatch
...
In reality, I have more criteria (including equality & range) and more sortings. For example, I have multiple price fields and users can sort by any of those prices, so I have to create all the filtering indexes for each price field in case a user sorts by that price.
We use MongoDB 4.0.9, with only one server so far.
Before I had sorting it was easier: at least I could have one compound index like country_state_region and always include country & state in the query when someone searches for a region. But with a sort field at the end, I cannot do that anymore - I have to create different indexes even for location (country/state/region) with all the sorting combinations.
Also, not all products have a price, so I cannot just sort by the price field. Instead, I have to create two indexes: {hasPrice: -1, price: 1} and {hasPrice: -1, price: -1} (here, hasPrice is -1 to have records with hasPrice=true always first, no matter the price sort direction).
Currently, I use NodeJS code similar to the following to generate the indexes (this is a simplified example):
// (uses lodash's _.fromPairs)
for (const filterFields of getAllCombinationsOf(['country', 'state', 'region', 'industry', 'price'])) {
    for (const sortingField of ['name', 'price', 'bestMatch']) {
        const index = {
            ...(_.fromPairs(filterFields.map(x => [x, 1]))),
            [sortingField]: 1
        };
        await collection.ensureIndex(index);
    }
}
So, the code above generates more than 90 indexes, and in my real task the number is even higher.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Thanks!
Firstly, in MongoDB (refer to https://docs.mongodb.com/manual/reference/limits/), a single collection can have no more than 64 indexes. And you should never get anywhere near 64 indexes unless the collection sees no writes, or very few.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Without sacrificing either functionality or query performance, you can't.
A few things you can do (assuming you are using pagination to show results):
Create a separate (not compound) index on each field and let MongoDB's query planner choose an index based on the meta-information it has (cardinality, size, etc.). Of course, there will be a performance hit.
Based on your judgment and some analytics, create compound indexes only for the combinations that will be used most frequently.
Most important: while creating compound indexes, you can leave off the sort field. Say you are filtering by industry and sorting by price. If you have a compound index (industry, price), everything will work fine. But if you have an index only on industry (assuming paginated results), the query will be quite fast for the first few pages and keep degrading as you move to later pages. Generally, users don't navigate past 5-6 pages. You also have to keep in mind that for larger skip values the query will start to fail because of the 32MB memory limit for sorting. This can be overcome with an aggregation (instead of a find) with allowDiskUse enabled.
Check whether keyset pagination (also called the seek method) can be used in your use case; a sketch follows below.
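A minimal keyset-pagination sketch (the products collection and fields here are illustrative, not from the question):
// Instead of .skip(n), remember the sort value and _id of the last document
// on the previous page and continue from there; the _id tie-breaker keeps
// the order stable when many documents share the same price.
const lastPrice = 100;        // price of the last document shown (hypothetical)
const lastId = someLastId;    // its _id (hypothetical variable)
db.products.find({
    industry: 'retail',
    $or: [
        { price: { $gt: lastPrice } },
        { price: lastPrice, _id: { $gt: lastId } }
    ]
}).sort({ price: 1, _id: 1 }).limit(20);
Unlike large skip values, this stays fast on deep pages because the index seek starts right where the previous page ended.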

Sort results of search by regex in MongoDB

So, there's a web server with a number of methods used for autocompleting input fields on the client. Each method takes a string and scans a specific property of a MongoDB collection using a regexp.
Pretty common stuff, right? But here's the problem: these methods need to sort results based on how close the searched string is to the start of the result string. For example, if I searched for countries and typed "ru", "Russia" should come before "Peru".
I don't see how I can sort results like this without performing multiple searches. Right now I can only think of something like this:
const limit = 20;
const resultsStartOfLine = db.countries.find({name: /^ru/i})
    .limit(limit)
    .toArray();
const resultsRest = db.countries.find({
        name: /ru/i,
        _id: {$nin: _.map(resultsStartOfLine, '_id')}
    })
    .limit(limit - resultsStartOfLine.length)
    .toArray();
I know that Mongo can't do this kind of sort by default, but maybe there's a better way to do it?
As I've learned, searching by regex is usually bad practice: unless the expression is a case-sensitive prefix match (like /^ru/), it can't use an index and is therefore pretty slow.
So I created an index for full-text search and sort the results by weights.
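For reference, a minimal sketch of that approach (the name field and its weight are assumptions):
db.countries.createIndex(
    { name: "text" },
    { weights: { name: 10 } }
);
db.countries.find(
    { $text: { $search: "ru" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });
One caveat: $text matches whole words and stems rather than arbitrary prefixes, so how well this serves autocomplete depends on the data.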

MongoDB - how to get fields fill-rates as quickly as possible?

We have a very big MongoDB collection of documents with some pre-defined fields that can either have a value or not.
We need to gather the fill rates of those fields. We wrote a script that goes over all documents and counts the fill rate of each field, but the problem is that it takes a long time to process all the documents.
Is there a way to use db.collection.aggregate or db.collection.mapReduce to run such a script server-side?
Would it bring significant performance improvements?
Will it slow down other usages of that collection (e.g. holding a major lock)?
Answering my own question: I was able to migrate my script from using a cursor to scan the whole collection to a map-reduce query, and running on a sample of the collection, the map-reduce version seems at least twice as fast.
Here's how the old script worked (in node.js):
var cursor = collection.find(query, projection).sort({_id: 1}).limit(limit);
var next = function() {
    cursor.nextObject(function(err, doc) {
        processDoc(doc, next);
    });
};
next();
and this is the new script:
collection.mapReduce(
    function () {
        var processDoc = function(doc) {
            ...
        };
        processDoc(this);
    },
    function (key, values) {
        return Array.sum(values);
    },
    {
        query: query,
        out: {inline: 1}
    },
    function (error, results) {
        // print results
    }
);
processDoc stayed basically the same, but instead of incrementing a counter on a global stats object, I do:
emit(field_name, 1);
Running old and new on a sample of 100k documents, the old script took 20 seconds and the new one took 8.
Some notes:
map-reduce's limit option doesn't work on sharded collections; I had to query for _id : { $gte, $lte } to create the sample size needed.
map-reduce's jsMode : true performance option doesn't work on sharded collections either (it might have improved performance even more); it might be possible to run it manually on each shard to gain that feature.
As I understand it, what you want to achieve is to compute something over your documents and end up with a new "document" that can be queried, without storing the computed values.
If you don't need to write the computed values back into the documents, you can use the Aggregation Framework.
Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
https://docs.mongodb.com/manual/aggregation/
Since the Aggregation Framework has a lot of features, I can't give you more information about how to resolve your specific issue.
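That said, here is a hedged sketch of a fill-rate pipeline (the field names name and price stand in for your pre-defined fields):
db.collection.aggregate([
    { $group: {
        _id: null,
        total: { $sum: 1 },
        // $gt against null is true for any existing, non-null value
        nameFilled:  { $sum: { $cond: [ { $gt: ["$name",  null] }, 1, 0 ] } },
        priceFilled: { $sum: { $cond: [ { $gt: ["$price", null] }, 1, 0 ] } }
    } },
    { $project: {
        _id: 0,
        nameFillRate:  { $divide: ["$nameFilled",  "$total"] },
        priceFillRate: { $divide: ["$priceFilled", "$total"] }
    } }
], { allowDiskUse: true });
Since this runs in a single server-side pass, it avoids shipping every document to the client the way the cursor script does.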

how to use indexing for fetching all the documents in mongodb?

Indexing helps when fetching particular records, sorting, ordering, etc. But suppose a collection contains a great many documents and it takes a long time to fetch and display them all. How can this query be made faster using indexing? Is that possible at all? If yes, is it the best way, or is there another way?
EDIT 1
Since indexing can't be used in my case, what is the most efficient way to write a query that fetches millions of records?
EDIT 2
This is my mongoose query function for fetching data from a collection. If the collection has millions of documents, performance will obviously suffer, so how would you use indexing in this case to get good performance?
Info.statics.findAllInfo = function(callback) {
    this.aggregate([
        { $project: {
            "name": "$some_name",
            "age": "$some_age",
            "city": "$some_city",
            "state": "$some_state",
            "country": "$some_country",
            "zipcode": "$some_zipcode",
            "time": { $dateToString: { format: "%Y-%m-%d %H:%M:%S ", date: "$some_time" } }
        } },
        { $sort: {
            _id: -1
        } }
    ], callback);
};
I haven't tried the lean() method yet due to a temporary issue, but I would still like to know whether it will help or not.

In MongoDB how do you query for records that contain ONLY certain fields and no others

In MongoDB, to query for records that contain certain fields you can do:
collection.find({'field_name1': {'$exists': true}})
And that will return any record that has the 'field_name1' field set...
But how do you query mongo to find records that contains ONLY 'field_name1' (and no other fields)? I'd like to be able to do this for, say, a list of fields.
The sad answer, as you'll often find with MongoDB and other NoSQL databases, is that it would probably be best to structure your data in a way that allows you to query it as simply as possible.
That said, there are ways of doing this, but as far as I know it requires executing JavaScript server-side. This will be slow and cannot take advantage of indexes and other logical features of MongoDB, so if performance is at all important, use it only when absolutely necessary.
So, the easiest way to do this, is probably to create a function that returns the number of fields in an object, which we can use with the $where query syntax. This allows you to run arbitrary JavaScript queries against your data, and can be combined with normal syntax queries.
Sadly, my JavaScript-fu is a little weak, so I don't know how (or if) you can get the member count of an object in a one-liner; instead, I would store a function server-side.
From the mongo shell, execute the following:
db.system.js.save({
    "_id": "countFields",
    "value": function(x) { var i = 0; for (var p in x) { i++; } return i; }
})
With that, you have a saved JavaScript function, server side, called countFields that returns the number of elements in an object. Now, you need to execute your find-operation with the $where query:
db.collection.find({
    'field_name1': {'$exists': true},
    '$where': 'countFields(this) == 2'
})
This would give you only the documents that meet both the $exists condition and the $where clause. Note that I'm comparing with 2 in the example, since the countFields function counts _id as a field.
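For what it's worth, the one-liner does exist: Object.keys(this).length gives the member count directly, so (a sketch, assuming the server's JavaScript engine supports Object.keys) the stored function can be skipped entirely:
db.collection.find({
    'field_name1': {'$exists': true},
    '$where': 'Object.keys(this).length == 2'
})
The same caveats apply: $where runs JavaScript for every candidate document and can't use an index for the count check.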