Indexing helps when fetching particular records, sorting, ordering and so on, but suppose a collection contains a very large number of documents and fetching and displaying all of them is slow. How can such a query be made faster? Can indexing help at all here, and if so, is it the best option or is there another approach?
EDIT 1
Since indexing can't be used in my case, what is the most efficient way to write a query that fetches millions of records?
Edit 2
This is my Mongoose query function for fetching data from a collection. If the collection holds millions of documents, performance will obviously suffer, so how would you use indexing in this case to get good performance?
Info.statics.findAllInfo = function (callback) {
  this.aggregate([
    {
      $project: {
        "name": "$some_name",
        "age": "$some_age",
        "city": "$some_city",
        "state": "$some_state",
        "country": "$some_country",
        "zipcode": "$some_zipcode",
        "time": { $dateToString: { format: "%Y-%m-%d %H:%M:%S", date: "$some_time" } }
      }
    },
    {
      $sort: {
        _id: -1
      }
    }
  ], callback);
};
I haven't tried the lean() method yet due to a temporary issue, but I would still like to know whether it will help.
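For what it's worth, one thing that may help with very large result sets is to put the $sort stage first, so the server can walk the default _id index instead of sorting millions of projected documents, and to stream the results with an aggregation cursor instead of building one huge array. lean() should not matter here, because aggregate() already returns plain objects rather than Mongoose documents. A rough sketch, keeping the field names from the question; the exact cursor()/eachAsync() API depends on your Mongoose version, so treat this as illustrative rather than a drop-in replacement:
Info.statics.streamAllInfo = function (onDoc) {
  return this.aggregate([
    // sorting on _id as the first stage lets the server use the default _id index
    { $sort: { _id: -1 } },
    {
      $project: {
        "name": "$some_name",
        "age": "$some_age",
        "city": "$some_city",
        "state": "$some_state",
        "country": "$some_country",
        "zipcode": "$some_zipcode",
        "time": { $dateToString: { format: "%Y-%m-%d %H:%M:%S", date: "$some_time" } }
      }
    }
  ])
    .cursor({ batchSize: 1000 })   // stream in batches instead of one huge array
    .eachAsync(onDoc);             // process each document as it arrives
};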
Related
I am running examples of aggregate queries similar to this:
https://www.compose.com/articles/aggregations-in-mongodb-by-example/
db.mycollection.aggregate([
  { $match: { "nested.field": "1110" } },
  {
    $group: {
      _id: null,
      total: {
        $sum: "$nested.field"
      },
      average_transaction_amount: {
        $avg: "$nested.field"
      },
      min_transaction_amount: {
        $min: "$nested.field"
      },
      max_transaction_amount: {
        $max: "$nested.field"
      }
    }
  }
]);
One collection that I created has 5,000,000 large JSON documents inserted (around 1,000 key-value pairs each, some of them nested).
Before adding an index on one nested field, a count on that field takes around 5 minutes.
After adding the index, the count takes less than a second (which is good).
Now when I try SUM, AVG or any of the other operations in the example above, it takes minutes (not seconds).
Is there a way to improve aggregate queries in MongoDB?
Thanks!
Unfortunately, $group currently does not use indexes in MongoDB; only $sort and $match can take advantage of them. So the query as you wrote it is about as optimized as it can be.
There are a couple of things you could do. For max and min, you could just query them instead of using the aggregation framework: put an index on nested.field, sort by it, and take a single document. The same index serves both the ascending and descending sort.
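A minimal sketch of that idea in the shell, reusing the collection and field names from the question; with an index on nested.field, both queries below are resolved from the index and return almost immediately:
db.mycollection.createIndex({ "nested.field": 1 });

// maximum: highest value first, take one document
db.mycollection.find({}, { "nested.field": 1 }).sort({ "nested.field": -1 }).limit(1);

// minimum: lowest value first, take one document
db.mycollection.find({}, { "nested.field": 1 }).sort({ "nested.field": 1 }).limit(1);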
If you have any control over when the data is inserted, and the query is as simple as it looks, you could keep track of the totals yourself. Have a collection in Mongo keyed by the "Id" (or whatever you are grouping on) with fields for "count" and "sum"; increment them on every insert, and then getting totals and averages becomes a fast lookup (see the sketch below). Not sure if that's an option for your situation, but it's the best you can do.
Generally, Mongo is very fast. In my opinion, the only place it's not quite as good as SQL is aggregation, and to me the benefits heavily outweigh the struggles. I generally maintain separate reporting collections for this kind of situation, as recommended above.
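A minimal sketch of the running-totals idea, assuming inserts go through your own code; the "stats" collection and its fields are hypothetical:
// upsert one stats document per group and keep count/sum current on every insert
function recordTransaction(db, groupId, amount) {
  db.getCollection("stats").updateOne(
    { _id: groupId },
    { $inc: { count: 1, sum: amount } },
    { upsert: true }
  );
}

// later, totals and averages are a single indexed lookup:
// var s = db.stats.findOne({ _id: someGroupId });
// var average = s.sum / s.count;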
I have one big MongoDB collection (3 million docs, 50 GB), and querying the data is very slow even though I have created indexes.
db.collection.find({"C123":1, "C122":2})
For example, this query times out or is extremely slow (10 s at least), even though I have created separate indexes for C123 and C122.
Should I create more indexes or add physical memory to speed up the query?
For such a query you should create a compound index on both fields; then it should be very efficient. Creating separate indexes won't help you much, because the MongoDB engine will use the first index to get the results of the first part of the query, and the second index, if it is used at all, won't help much (in some cases it can even slow the query down because of the extra lookup in the index and then in the actual data again). You can confirm which indexes are used by calling .explain() on your query in the shell.
See compound indexes:
https://docs.mongodb.com/manual/core/index-compound/
Also consider the sort direction of both fields when building the index.
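A quick sketch using the field names from the question; a single compound index covers the equality predicates on both fields, and explain() confirms whether the planner actually uses it:
db.collection.createIndex({ C123: 1, C122: 1 });

// look for an IXSCAN stage on the new index in the output
db.collection.find({ C123: 1, C122: 2 }).explain("executionStats");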
The answer is really simple.
You don't need to create more indexes, you need to create the right indexes. An index on field c124 won't help queries on field c123, so there is no point in creating it.
Use better/more hardware. More RAM, more machines (sharding).
Create the right indexes and use compound indexes carefully (you can have at most 64 indexes per collection and 31 fields in a compound index).
Use Mongo-side pagination.
Find your most frequently used queries and build compound indexes around them.
Compound indexes strictly follow the field order, so read the documentation and run trials.
Also try covered queries for 'summary'-style queries (see the sketch after this list).
Learned it the hard way.
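A minimal sketch of a covered query, with hypothetical collection and field names; when the filter, sort and projection use only fields contained in the index (and _id is excluded), MongoDB can answer the query from the index alone without touching the documents:
db.orders.createIndex({ customerId: 1, total: 1 });

// project only indexed fields and exclude _id so the query stays covered
db.orders.find(
  { customerId: 42 },
  { _id: 0, customerId: 1, total: 1 }
);
// explain("executionStats") should report totalDocsExamined: 0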
Use skip and limit, and run a loop that processes 50,000 documents at a time.
https://docs.mongodb.com/manual/reference/method/cursor.skip/
https://docs.mongodb.com/manual/reference/method/cursor.limit/
Example:
// run on your collection (the collection name here is a placeholder)
db.mycollection.aggregate(
  [
    {
      $group: {
        _id: "$myDoc.homepage_domain",
        count: { $sum: 1 },
        entry: {
          $push: {
            location_city: "$myDoc.location_city",
            homepage_domain: "$myDoc.homepage_domain",
            country: "$myDoc.country",
            employee_linkedin: "$myDoc.employee_linkedin",
            linkedin_url: "$myDoc.linkedin_url",
            homepage_url: "$myDoc.homepage_url",
            industry: "$myDoc.industry",
            read_at: "$myDoc.read_at"
          }
        }
      }
    },
    // skip the groups already processed, then take the next batch of 50,000
    { $skip: 50000 },
    { $limit: 50000 }
  ],
  {
    allowDiskUse: true
  }
).forEach(function (myDoc) {
  // each result has _id (the grouped domain), a count and the pushed entries
  print(
    db.Or9.insert({
      "HomepageDomain": myDoc._id,
      "location_city": myDoc.entry[0].location_city
    })
  );
});
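A rough sketch of the loop this answer describes, in shell JavaScript; the batch size, collection names and stopping condition are illustrative, and a $sort is added so the skip-based paging has a stable order. Note that $skip re-scans everything it skips, so later batches get progressively slower.
var batchSize = 50000;
var offset = 0;
var fetched;

do {
  fetched = 0;
  db.mycollection.aggregate(
    [
      { $group: { _id: "$myDoc.homepage_domain", count: { $sum: 1 } } },
      { $sort: { _id: 1 } },       // stable order so paging is repeatable
      { $skip: offset },
      { $limit: batchSize }
    ],
    { allowDiskUse: true }
  ).forEach(function (doc) {
    db.Or9.insert({ HomepageDomain: doc._id, count: doc.count });
    fetched += 1;
  });
  offset += batchSize;
} while (fetched === batchSize);   // stop when a batch comes back short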
We have a very big MongoDB collection of documents with some pre-defined fields that can either have a value or not.
We need to gather fill rates for those fields. We wrote a script that goes over all documents and counts the fill rate of each field, but the problem is that it takes a long time to process all the documents.
Is there a way to use db.collection.aggregate or db.collection.mapReduce to run such a script server-side?
Would that give a significant performance improvement?
Will it slow down other usages of that collection (e.g. holding a major lock)?
Answering my own question: I was able to migrate my script from a cursor that scans the whole collection to a map-reduce query, and on a sample of the collection the map-reduce version is at least twice as fast.
Here's how the old script worked (in node.js):
// open a cursor and walk the collection one document at a time
var cursor = collection.find(query, projection).sort({_id: 1}).limit(limit);
var next = function() {
  cursor.nextObject(function(err, doc) {
    // doc is null once the cursor is exhausted
    processDoc(doc, next);
  });
};
next();
and this is the new script:
collection.mapReduce(
  // map: runs server-side for every document matching the query
  function () {
    var processDoc = function(doc) {
      ...
    };
    processDoc(this);
  },
  // reduce: sums the 1s emitted per field name
  function (key, values) {
    return Array.sum(values);
  },
  {
    query: query,
    out: {inline: 1}   // return results inline instead of writing to a collection
  },
  function (error, results) {
    // print results
  }
);
processDoc stayed basically the same, but instead of incrementing a counter on a global stats object, I do:
emit(field_name, 1);
Running the old and new versions on a sample of 100k documents, the old one took 20 seconds and the new one took 8.
Some notes:
map-reduce's limit option doesn't work on sharded collections; I had to query for _id : { $gte, $lte } to create the sample size I needed.
map-reduce's performance-boost option jsMode: true doesn't work on sharded collections either (it might have improved performance even more); it might work to run the job manually on each shard to regain that feature.
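For reference, the kind of map function this describes might look like the sketch below; the field list and the processDoc body are assumptions, since the original code elides them:
function () {
  // hypothetical list of the pre-defined fields whose fill rate is measured
  var FIELDS = ["field_a", "field_b", "field_c"];
  var doc = this;
  FIELDS.forEach(function (name) {
    // emit 1 for every field that actually holds a value in this document
    if (doc[name] !== undefined && doc[name] !== null) {
      emit(name, 1);
    }
  });
}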
As I understand it, what you want to achieve is to compute something over your documents, ending up with a new "document" that can be queried, without having to store the computed "new values".
If you don't need to write the "new values" back into the documents, you can use the Aggregation Framework.
Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
https://docs.mongodb.com/manual/aggregation/
Since the Aggregation Framework has a lot of features, I can't give you more specific information about how to resolve your issue.
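As a rough illustration of how the Aggregation Framework could compute fill rates server-side (the field names are placeholders, not taken from the question):
// one server-side pass: total documents plus, per field, how many documents
// have a non-null value for it
db.collection.aggregate([
  {
    $group: {
      _id: null,
      total: { $sum: 1 },
      field_a_filled: { $sum: { $cond: [{ $gt: ["$field_a", null] }, 1, 0] } },
      field_b_filled: { $sum: { $cond: [{ $gt: ["$field_b", null] }, 1, 0] } }
    }
  }
], { allowDiskUse: true });
// fill rate of field_a = field_a_filled / total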
I have a collection subscribers.
I want to get a segment of subscribers by applying (sometimes complex) filters in a query like db.subscribers.find({ age: { $gt: 20 }, ...etc }), but I don't want to save the result, since that would be inefficient.
Instead, I would like to save only the filters applied in the query as a set of rules in the segments collection.
Is that a good approach and what would be an efficient way to do that?
Should I just save the query object itself as a document or define a more restrictive schema before saving?
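To make that concrete, a stored segment could look something like the document below; the shape is purely illustrative of what "saving only the filters" might mean, not a recommendation either way:
// one hypothetical document in the "segments" collection; the filter is stored,
// not the matching subscribers, and can be re-run later with
//   db.subscribers.find(segment.filter)
db.segments.insertOne({
  name: "young-adults",
  filter: { age: { $gt: 20, $lt: 30 } },
  createdAt: new Date()
});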
I have some simple transaction-style data in a flat format like the following:
{
'giver': 'Alexandra',
'receiver': 'Julie',
'amount': 20,
'year_given': 2015
}
There can be multiple entries for any 'giver' or 'receiver'.
I am mostly querying this data based on the giver field, and then split up by year. So, I would like to speed up these queries.
I'm fairly new to Mongo so I'm not sure which of the following methods would be the best course of action:
1. Restructure the data into the format:
{
'giver': 'Alexandra',
'transactions': {
'2015': [
{
'receiver': 'Julie',
'amount': 20
},
...
],
'2014': ...,
...
}
}
This makes the most sense to me. We place all transactions into subdocuments of a person rather than having transactions scattered all over the collection. It provides the data in the form I query it by the most, so it should be fast to query by 'giver' and then by year within 'transactions'.
I'm unsure whether restructuring the data like this is possible inside of Mongo, or whether I should export it and reshape it outside of Mongo with some programming language.
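For what it's worth, a reshape roughly along these lines can be done inside Mongo with the aggregation framework; this is only a sketch (the source collection name is assumed, and it produces a flat per-giver transactions array rather than the exact year-keyed shape above):
// group every transaction under its giver and write the result to a new
// collection; the year is kept on each entry so it can still be filtered on
db.transactions.aggregate([
  {
    $group: {
      _id: "$giver",
      transactions: {
        $push: {
          year_given: "$year_given",
          receiver: "$receiver",
          amount: "$amount"
        }
      }
    }
  },
  { $out: "transactions_by_giver" }
], { allowDiskUse: true });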
2. Simply index by 'giver'
This doesn't quite match the way I'm querying the data (by 'giver' and then 'year'), but it could be fast enough for what I'm looking for. It's simple to do within Mongo and doesn't require restructuring the data.
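A compound index that mirrors the query pattern is a middle ground here; the collection name is assumed:
// one compound index supports queries on giver alone and on giver + year,
// with no restructuring of the documents
db.transactions.createIndex({ giver: 1, year_given: 1 });

// both of these can use the index above
db.transactions.find({ giver: "Alexandra" });
db.transactions.find({ giver: "Alexandra", year_given: 2015 });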
How should I go about adjusting my database to make my queries faster? And which way is the 'Mongo way'?