Making a collection from a collection subset in MongoDB

I have a huge collection of documents (more than two million) and I found myself querying only a very small subset, using something like:
scs = db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}})
which gives fewer than 5k results. The question is: can I create a new collection with the results of this query?
I tried insert:
db.scs.insert(db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}}).toArray())
But it gives me an error: Socket say send() errno:32 Broken pipe 127.0.0.1:27017
I tried aggregate:
db.balance_sheets.aggregate([{ "9087n":{$gte:40}, "20/58n":{ $lte:40000000}} ,{$out:"pme"}])
And I get "exception: A pipeline stage specification object must contain exactly one field."
Any hints?
Thanks

The first option would be:
var cursor = db.balance_sheets.find({ "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } });
while (cursor.hasNext()) {
    var doc = cursor.next();
    db.pme.save(doc);
}
As for the aggregation, try
db.balance_sheets.aggregate([
    { "$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } } },
    { "$out": "pme" }
]);
For improved performance, especially when dealing with large collections, take advantage of the Bulk API for bulk inserts: the operations are sent to the server in batches of, say, 500, which performs better because you are not making a round trip for every document, only once per 500 operations.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to insert all the documents matching the query from the balance_sheets collection into the pme collection:
var bulk = db.pme.initializeUnorderedBulkOp(),
    counter = 0;

db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).forEach(function (doc) {
    bulk.insert(doc);
    counter++;
    if (counter % 500 == 0) {
        bulk.execute(); // Execute per 500 operations
        // and re-initialize the bulk builder for the next batch
        bulk = db.pme.initializeUnorderedBulkOp();
    }
});

// Clean up remaining operations in the queue
if (counter % 500 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which has since deprecated the Bulk API and provides a newer set of APIs using bulkWrite():
var bulkOps = db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).map(function (doc) {
    return { "insertOne": { "document": doc } };
});

db.pme.bulkWrite(bulkOps);

Related

MongoDB aggregation slow count using $facet

I want to use a $facet to create a simple query that I can use to get paged data. However, I have noticed that doing this gives really poor performance compared to running just two separate queries.
As a quick test, I created a collection with 50,000 random documents and ran the following test.
var x = new Date();
var a = {
    count: db.getCollection("test").find({}).count(),
    data: db.getCollection("test").find({}).skip(0).limit(10)
};
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate([
    { "$match": {} },
    {
        "$facet": {
            "data": [
                { "$skip": 0 },
                { "$limit": 10 }
            ],
            "pageInfo": [
                { "$group": { "_id": null, "count": { "$sum": 1 } } }
            ]
        }
    }
]);
var y = new Date();
print('result ' + a);
print(y - x);
The result is that the two separate queries (one for the find, the other for the count) take around 2 milliseconds, vs the single aggregation query taking upwards of 500 milliseconds.
Why is it that the aggregation is so slow?
Update
Even just a count without a facet within an aggregation is slow
var x = new Date();
var a = db.getCollection("test").find({}).count();
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate(
[
{ "$count" : "count" }
]
)
var y = new Date();
print('result ' + a);
print(y - x);
In the above, with my test data set, the aggregation count takes 200 ms vs 2 ms for the count() method.
This issue extends into the Node.js MongoDB driver, where the count() method has been deprecated and replaced with a countDocuments() method. Under the hood, the new countDocuments() method uses an aggregation rather than the count command on a find. Just like my example above, it has significantly worse performance, to the point where I will continue using the deprecated method over the newer countDocuments().
Of course it is slow. The count() method just returns the cursor size after a query is applied (which does not necessarily require all documents to be read, depending on your query and indices). Furthermore, with an empty query, the query optimizer knows that all documents ought to be returned and basically only has to return length(_id_1).
Aggregations, by definition, do not work that way. Unless there is a match stage actually ruling out a document, each and every document is read from “disk” (MongoDB’s own cache and FS caches aside for the moment) for further processing.
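One way to see this for yourself is to compare the query plans against the same test collection (a minimal sketch; the exact explain output varies by server version):
// The plain count on an empty filter can be answered from collection metadata / the _id index
db.getCollection("test").explain("executionStats").count({});
// ...while the $count aggregation streams every document through the pipeline
db.getCollection("test").explain("executionStats").aggregate([{ "$count": "count" }]);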
I am running into the same issue, and I just hope that someone might have a better answer than what was previously posted.
I have a "user" collection with 12 million users in it, using MongoDB 5.0.
My query looks like this:
db.users.aggregate([
    { '$sort': { updated_at: -1 } },
    {
        '$facet': {
            results: [
                { $skip: 0 },
                { $limit: 20 }
            ],
            total: [
                { $count: 'count' }
            ]
        }
    }
])
The query takes around 1 minute, so that is not acceptable.
I have an index on "updated_at", that is not the issue.
Also, I have this issue even if I run it directly on MongoShell in Compass. So it is not related to any NodeJs Mongo Driver as was previously suspected.
Can I somehow tell Mongo to use the estimated count here?
Or is there any other way to improve the query?

How to bulk copy one field to the first object of an array and update the document in MongoDB?

I want to copy price information in my document to prices[] array.
var entitiesCol = db.getCollection('entities');
entitiesCol.find({"type": "item"}).forEach(function(item){
item.prices = [ {
"value": item.price
}];
entitiesCol.save(item);
});
It takes too long and some fields are not updated.
I am using Mongoose on the server side and I can also use it.
What can I do about that?
In the mongo shell, you can use the bulkWrite() method to carry out the updates in a fast and efficient manner. Consider the following example:
var entitiesCol = db.getCollection('entities'),
    counter = 0,
    ops = [];

entitiesCol.find({
    "type": "item",
    "prices.0": { "$exists": false }
}).snapshot().forEach(function(item){
    ops.push({
        "updateOne": {
            "filter": { "_id": item._id },
            "update": {
                "$push": {
                    "prices": { "value": item.price }
                }
            }
        }
    });
    counter++;

    if (counter % 500 === 0) {
        entitiesCol.bulkWrite(ops);
        ops = [];
    }
});

if (counter % 500 !== 0)
    entitiesCol.bulkWrite(ops);
The counter variable above is there to manage your bulk updates effectively when your collection is large. It lets you batch the update operations and send the writes to the server in groups of 500, which gives better performance since you are not making a round trip for every request, only once every 500 requests.
For bulk operations MongoDB imposes a default internal limit of 1000 operations per batch, so the choice of 500 documents is good in the sense that you keep some control over the batch size rather than letting MongoDB impose the default, which matters for larger operations on the order of more than 1000 documents.
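Since you mention using Mongoose on the server side, the same batching pattern translates fairly directly to Model.bulkWrite(). A rough sketch (the Entity model name here is illustrative, not from your code):
async function copyPrices() {
    // Stream matching documents instead of loading them all into memory
    const cursor = Entity.find({ "type": "item", "prices.0": { "$exists": false } }).cursor();
    let ops = [];
    for await (const item of cursor) {
        ops.push({
            updateOne: {
                filter: { _id: item._id },
                update: { "$push": { "prices": { "value": item.price } } }
            }
        });
        if (ops.length === 500) {   // send in batches of 500, as in the shell example
            await Entity.bulkWrite(ops);
            ops = [];
        }
    }
    if (ops.length > 0) await Entity.bulkWrite(ops);
}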

Update join in MongoDB. Is it possible?

Given these three documents:
db.test.save({"_id":1, "foo":"bar1", "xKey": "xVal1"});
db.test.save({"_id":2, "foo":"bar2", "xKey": "xVal2"});
db.test.save({"_id":3, "foo":"bar3", "xKey": "xVal3"});
And a separate array of information that references those documents:
[{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}]
Is it possible to update "foo" on the two referenced documents (1 and 2) in a single operation?
I know I can loop through the array and do them one by one but I have thousands of documents, which means too many round trips to the server.
Many thanks for your thoughts.
It's not possible to update "foo" on the two referenced documents (1 and 2) in a single atomic operation as MongoDB has no such mechanism. However, seeing that you have a large collection, one option is to take advantage of the Bulk API which allows you to send your updates in batches instead of every update request to the server.
The process involves looping all matched documents within the array and process Bulk updates which will at least allow many operations to be sent in a single request with a singular response.
This gives you much better performance since you won't be sending every request to the server but just once in every 500 requests, thus making your updates more efficient and quicker.
-EDIT-
Choosing a lower value is generally a controlled choice. As noted in the documentation, MongoDB by default will send operations to the server in batches of at most 1000 at a time, and there is no guarantee that these default 1000-operation requests actually fit under the 16 MB BSON limit. So you still need to be on the "safe" side and impose a lower batch size that you can effectively manage, so that each batch totals less than the data limit in size when sent to the server.
Let's use an example to demonstrate the approaches above:
a) If using MongoDB v3.0 or below:
var bulk = db.test.initializeOrderedBulkOp(),
    largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
    counter = 0;

largeArray.forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({ "$set": { "foo": doc.foo } });
    counter++;
    if (counter % 500 == 0) {
        bulk.execute();
        bulk = db.test.initializeOrderedBulkOp();
    }
});

if (counter % 500 != 0) bulk.execute();
b) If using MongoDB v3.2.x or above (MongoDB 3.2 has since deprecated the Bulk() API and provides a newer set of APIs using bulkWrite()):
var largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
bulkUpdateOps = [];
largeArray.forEach(function(doc){
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "foo": doc.foo } }
}
});
if (bulkUpdateOps.length === 500) {
db.test.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) db.test.bulkWrite(bulkUpdateOps);

How to get a pointer to the current document to update in updateMany

I have the latest MongoDB 3.2 and there is a collection of many items that have a timestamp.
I need to convert milliseconds to a Date object, and now I use this function:
db.myColl.find().forEach(function (doc) {
    doc.date = new Date(doc.date);
    db.myColl.save(doc);
});
It takes a very long time to update 2 million documents.
I tried to use updateMany (it seems to be very fast), but how can I get access to the current document? Is there any chance to rewrite the query above using updateMany?
Thank you.
You can leverage other bulk update APIs like the bulkWrite() method, which allows you to use an iterator to access each document, manipulate it, add the modified document to a list, and then send the list of update operations in a batch to the server for execution.
The following demonstrates this approach, in which you use the cursor's forEach() method to iterate the collection and modify each document, at the same time pushing the update operation to a batch of about 1000 documents which can then be sent at once using the bulkWrite() method.
This is as efficient as using updateMany(), since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({"date": { "$exists": true, "$type": 1 }}),
bulkUpdateOps = [];
cursor.forEach(function(doc){
var newDate = new Date(doc.date);
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "date": newDate } }
}
});
if (bulkUpdateOps.length == 1000) {
db.myColl.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }
The current query is the only solution for setting a field value from itself or from another field's value (you could compute some data using more than one field of the document).
There is a way to improve the performance of that query: execute it via the mongo shell directly on the server, so that no data is passed to the client.
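That said, if upgrading past the 3.2 release mentioned in the question is an option, MongoDB 4.2+ lets updateMany() take an aggregation pipeline as the update, which gives you access to the current document's fields entirely on the server. A sketch, assuming date is stored as a millisecond number:
db.myColl.updateMany(
    { "date": { "$exists": true, "$type": 1 } },          // numeric (double) dates only
    [ { "$set": { "date": { "$toDate": "$date" } } } ]    // $toDate converts milliseconds to a Date
);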

How to change date and time in MongoDB? [duplicate]

I have a DB with names and dates. I need to change the old date to the date that is +3 days after it. For example, the oldDate is 01.02.2015 and the new one is 03.02.2015.
I was trying just to put another date for all documents, but that means all exams would end up on one day.
db.getCollection('school.exam').update( {}, { $set : { "oldDay" : new ISODate("2016-01-11T03:34:54Z") } }, true, true);
The problem is just how to replace the old date with one shifted by some number of days.
Since MongoDB doesn't yet support applying the $inc operator to dates (see the JIRA ticket on that), an alternative to increment the date field is to iterate the cursor returned by the find() method using forEach(); in the loop, convert the old date field to a timestamp, add the number of days in milliseconds to the timestamp, and then update the field using the $set operator.
Take advantage of the Bulk API for bulk updates, which offers better performance: the operations are sent to the server in batches of, say, 1000, so you are not making a round trip for every request, only once per 1000 requests.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by adding 3 days to the date field:
var bulk = db.getCollection("school.exam").initializeUnorderedBulkOp(),
counter = 0,
daysInMilliSeconds = 86400000,
numOfDays = 3;
db.getCollection("school.exam").find({ "oldDay": { $exists : true, "$type": 2 }}).forEach(function (doc) {
var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds ));
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "oldDay": incDate }
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
bulk = db.getCollection('school.exam').initializeUnorderedBulkOp();
}
})
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which has since deprecated the Bulk API and provides a newer set of APIs using bulkWrite():
var bulkOps = [],
    daysInMilliSeconds = 86400000,
    numOfDays = 3;

db.getCollection("school.exam").find({ "oldDay": { "$exists": true, "$type": 9 } }).forEach(function (doc) {
    var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds));
    bulkOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "oldDay": incDate } }
        }
    });
});

db.getCollection("school.exam").bulkWrite(bulkOps, { "ordered": true });