Using async loop in mongodb shell for updating many documents - mongodb

I have a problem with the following query in the MongoDB shell, but ONLY when the array gets bigger, for example more than 100 elements.
newPointArray is an array with 500 elements:
newPointArray.forEach(function(newDoc){
//update the mongodb properties for each doc
db.getCollection('me_all_test')
.update({ '_id': newDoc._id },
{ $set: { "properties": newDoc.properties } },
{ upsert: true });
})
Can someone guide me on how to run this query IN THE MONGODB SHELL for a larger array, using an async loop, a promise, or something else?
Thanks in advance

Rather than doing individual .update() calls, use a .bulkWrite() operation. This should reduce the overhead of asking MongoDB to perform many individual operations. This assumes you are doing general operations; I'm not clear on whether newPointArray always contains new points that don't exist yet.
Given your example, I believe your script would mimic the following:
// I'm assuming this is your array (but truncated)
let newPointArray = [
{
_id: "1",
properties: {
foo: "bar"
}
},
{
_id: "2",
properties: {
foo: "buzz"
}
}
// Whatever other points you have in your array
];
db
.getCollection("me_all_test")
.bulkWrite(newPointArray
// Map your array to a query bulkWrite understands
.map(point => {
return {
updateOne: {
filter: {
_id: point._id
},
update: {
$set: {
properties: point.properties
}
},
upsert: true
}
};
}));
You may also want to consider setting ordered to false in the operation, which may bring further performance gains. That would look something like this:
db
.getCollection("me_all_test")
.bulkWrite([SOME_ARRAY_SIMILAR_TO_ABOVE_EXAMPLE], {
ordered: false
});

Related

MongoDB Aggregation - filter results according to an array of data

I want a group of random documents of a Connection type schema, but I want to filter them against an array 'isConnection', which is a collection of user IDs, so that the fetched data does not include any document whose UserID is in the 'isConnection' array.
Initially, I tried the following logic, and many others that I found on Stack Overflow, but none worked.
"isConnection": [
"62adca9063bc6adb05d79db8",
"62af5f3a3920076f461d8a3f",
"62bfd0935d52cdde5a4e89a9",
"62b3bb29342af89edcd11264"
]
let Connections = await ConnectionModel.aggregate([
{ $match: { UserID: { $ne: isConnection } } },
{ $sample: { size: 10 } }
])
$ne looks for an exact match. You want to use $nin (not in), like so:
let Connections = await ConnectionModel.aggregate([
{ $match: { UserID: { $nin: isConnection } } },
{ $sample: { size: 10 } }
])
Also, just make sure UserID is actually a string type and not an ObjectId; otherwise you'll have to convert the input array as well.
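If UserID does turn out to be stored as an ObjectId, a minimal sketch of that conversion (assuming Mongoose is in use and imported as mongoose, which the ConnectionModel/await style suggests) could look like this:
// Hypothetical sketch: convert the string IDs before matching against an ObjectId field.
const isConnectionIds = isConnection.map(id => new mongoose.Types.ObjectId(id));

let Connections = await ConnectionModel.aggregate([
  { $match: { UserID: { $nin: isConnectionIds } } },
  { $sample: { size: 10 } }
])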

Most efficient way to put fields of an embedded document in its parent for an entire MongoDB collection?

I am looking for the most efficient way to modify all the documents of a collection from this structure:
{
[...]
myValues:
{
a: "any",
b: "content",
c: "can be found here"
}
[...]
}
so it becomes this:
{
[...]
a: "any",
b: "content",
c: "can be found here"
[...]
}
Basically, I want everything under the field myValues to be put in its parent document for all the documents of a collection.
I have been looking for a way to do this in a single query using dbCollection.updateMany(), but it does not seem possible, unless the content of myValues is the same for all documents. In my case, however, the content of myValues changes from one document to the other. For example, I tried:
db.getCollection('myCollection').updateMany({ myValues: { $exists: true } }, { $set: '$myValues' });
thinking it would perhaps resolve the myValues object and use it to set the fields in the document. But it returns an error saying it is illegal to assign a string to the $set field.
So what would be the most efficient approach for what I am trying to do? Is there a way to update all the documents of the collection as I need in a single command?
Or do I need to iterate on each document of the collection, and update them one by one?
For now, I iterate on all documents with the following code:
var documents = await myCollection.find({ myValues: { $exists: true } });
for (var document = await documents.next(); document != null; document = await documents.next())
{
await myCollection.updateOne({ _id: document._id }, { $set: document.myValues, $unset: { myValues: 1} });
}
Since my collection is very large, it takes really long to execute.
You can consider using $out as an alternative, single-command solution. It can be used to replace an existing collection with the output of an aggregation. Knowing that, you can write the following aggregation pipeline:
db.myCollection.aggregate([
{
$replaceRoot: {
newRoot: {
$mergeObjects: [ "$$ROOT", "$myValues" ]
}
}
},
{
$project: {
myValues: 0
}
},
{
$out: "myCollection"
}
])
$replaceRoot allows you to promote an object which merges the old $$ROOT and myValues to the root level.
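If you are on MongoDB 4.2 or newer, roughly the same merge can also be expressed as an updateMany() with an aggregation pipeline, which updates documents in place instead of rewriting the whole collection with $out. A sketch only, not part of the original answer, so verify it against your version:
db.getCollection('myCollection').updateMany(
  { myValues: { $exists: true } },
  [
    // Promote the embedded fields to the root of the document...
    { $replaceWith: { $mergeObjects: [ "$$ROOT", "$myValues" ] } },
    // ...then drop the now-redundant subdocument.
    { $unset: "myValues" }
  ]
)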

Mongo aggregation and MongoError: exception: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit

I'm trying to aggregate data from my Mongo collection to produce some statistics for FreeCodeCamp by making a large json file of the data to use later.
I'm running into the error in the title. There doesn't seem to be a lot of information about this, and the other posts here on SO don't have an answer. I'm using the latest version of MongoDB and drivers.
I suspect there is probably a better way to run this aggregation, but it runs fine on a subset of my collection. My full collection is ~7GB.
I'm running the script via node aggScript.js > ~/Desktop/output.json
Here is the relevant code:
MongoClient.connect(secrets.db, function(err, database) {
if (err) {
throw err;
}
database.collection('user').aggregate([
{
$match: {
'completedChallenges': {
$exists: true
}
}
},
{
$match: {
'completedChallenges': {
$ne: ''
}
}
},
{
$match: {
'completedChallenges': {
$ne: null
}
}
},
{
$group: {
'_id': 1, 'completedChallenges': {
$addToSet: '$completedChallenges'
}
}
}
], {
allowDiskUse: true
}, function(err, results) {
if (err) { throw err; }
var aggData = results.map(function(camper) {
return _.flatten(camper.completedChallenges.map(function(challenges) {
return challenges.map(function(challenge) {
return {
name: challenge.name,
completedDate: challenge.completedDate,
solution: challenge.solution
};
});
}), true);
});
console.log(JSON.stringify(aggData));
process.exit(0);
});
});
Aggregate, by default, returns a single document containing all the result data, which caps how much can be returned at the maximum BSON document size.
Assuming that you do actually want all this data, there are two options:
Request a cursor from aggregate instead of a single result document. You can then iterate over the results one at a time (a sketch follows this list).
Add a $out stage as the last stage of your pipeline. This tells MongoDB to write your aggregation output to the specified collection. The aggregate command itself returns no data, and you then query that collection as you would any other.
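A minimal sketch of the cursor approach, assuming a reasonably recent Node.js driver in which collection.aggregate() returns an AggregationCursor (pipeline stands for the stages shown above):
async function streamResults(database, pipeline) {
  const cursor = database.collection('user').aggregate(pipeline, { allowDiskUse: true });
  // Handle one result document at a time instead of building one huge result object.
  for await (const doc of cursor) {
    console.log(JSON.stringify(doc));
  }
}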
It just means that the result object you are building has become too large; this kind of issue is not affected by the MongoDB version. The fix implemented for 2.5.0 only prevents the crash from occurring.
You need to filter ($match) properly so that only the data you actually need ends up in the result, and group on the appropriate fields. The results are built in a 64MB buffer, so reduce your data: $project only the fields you require in the result, not whole documents.
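For this particular pipeline, that could be a $project stage placed before the $group which keeps only the sub-fields the mapping code actually uses (a sketch, to be adapted to your documents):
{
  $project: {
    _id: 0,
    "completedChallenges.name": 1,
    "completedChallenges.completedDate": 1,
    "completedChallenges.solution": 1
  }
}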
You can also combine your three $match stages into a single one to shorten the pipeline:
{
  $match: {
    'completedChallenges': {
      // $nin covers both "not null" and "not an empty string"; repeating $ne
      // inside one object would keep only the last value.
      $exists: true,
      $nin: [ null, "" ]
    }
  }
}
I had this issue and I couldn't debug the problem so I ended up abandoning the aggregation approach. Instead I just iterated through each entry and created a new collection. Here's a stripped down shell script which might help you see what I mean:
db.new_collection.ensureIndex({my_key:1}); //for performance, not a necessity
db.old_collection.find({}).noCursorTimeout().forEach(function(doc) {
db.new_collection.update(
{ my_key: doc.my_key },
{
$push: { stuff: doc.stuff, other_stuff: doc.other_stuff},
$inc: { thing: doc.thing},
},
{ upsert: true }
);
});
I don't imagine that this approach would suit everyone, but hopefully that helps anyone who was in my particular situation.

Limiting results in MongoDB but still getting the full count?

For speed, I'd like to limit a query to 10 results
db.collection.find( ... ).limit(10)
However, I'd also like to know the total count, so to say "there were 124 but I only have 10". Is there a good efficient way to do this?
By default, count() ignores limit() and counts the results of the entire query.
So when you, for example, do var a = db.collection.find(...).limit(10);,
running a.count() will give you the total count of your query.
Doing count(true) (or the equivalent count(1)) makes it take limit and skip into account.
The accepted answer by @johnnycrab is for the mongo CLI.
If you have to write the same code in Node.js and Express.js, you will have to use it like this to be able to use the count along with the result of toArray():
var curFind = db.collection('tasks').find({query});
Then you can run two functions after it like this (one nested in the other)
curFind.count(function (e, count) {
// Use count here
curFind.skip(0).limit(10).toArray(function(err, result) {
// Use result here and count here
});
});
cursor.count() should ignore cursor.skip() and cursor.limit() by default.
Source: http://docs.mongodb.org/manual/reference/method/cursor.count/#cursor.count
You can use a $facet stage which processes multiple aggregation pipelines within a single stage on the same set of input documents:
// { item: "a" }
// { item: "b" }
// { item: "c" }
db.collection.aggregate([
{ $facet: {
limit: [{ $limit: 2 }],
total: [{ $count: "count" }]
}},
{ $set: { total: { $first: "$total.count" } } }
])
// { limit: [{ item: "a" }, { item: "b" }], total: 3 }
This way, within the same query, you can get both some documents (limit: [{ $limit: 2 }]) and the total count of documents ({ $count: "count" }).
The final $set stage is an optional clean-up step, just there to project the result of the $count stage, such that "total" : [ { "count" : 3 } ] becomes total: 3.
There is a solution using push and slice: https://stackoverflow.com/a/39784851/4752635
I prefer a solution based on two queries:
The first query filters and then groups by ID to get the number of filtered elements. Do not sort here, it is unnecessary.
The second query filters, sorts and paginates.
The solution with pushing $$ROOT and using $slice runs into the 16MB document memory limitation for large collections. Also, for large collections the two queries together seem to run faster than the one with $$ROOT pushing. You can run them in parallel as well, so you are limited only by the slower of the two queries, probably the one which sorts (a parallel sketch follows the example below).
I have settled on this solution using two queries and the aggregation framework (note: I use Node.js in this example, but the idea is the same):
var aggregation = [
{
// If you can match fields at the begining, match as many as early as possible.
$match: {...}
},
{
// Projection.
$project: {...}
},
{
// Some things you can match only after projection or grouping, so do it now.
$match: {...}
}
];
// Copy the filtering stages of the pipeline - they are the same both for counting the filtered elements and for the pagination query.
var aggregationPaginated = aggregation.slice(0);
// Count filtered elements.
aggregation.push(
{
$group: {
_id: null,
count: { $sum: 1 }
}
}
);
// Sort in pagination query.
aggregationPaginated.push(
{
$sort: sorting
}
);
// Paginate.
aggregationPaginated.push(
{
$limit: skip + length
},
{
$skip: skip
}
);
// I use mongoose.
// Get total count.
model.count(function(errCount, totalCount) {
// Count filtered.
model.aggregate(aggregation)
.allowDiskUse(true)
.exec(
function(errFind, documents) {
if (errFind) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_counting'
});
}
else {
// Number of filtered elements.
var numFiltered = documents[0].count;
// Filter, sort and paginate.
model.aggregate(aggregationPaginated)
.allowDiskUse(true)
.exec(
function(errFindP, documentsP) {
if (errFindP) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_pagination'
});
}
else {
return res.json({
'success': true,
'recordsTotal': totalCount,
'recordsFiltered': numFiltered,
'response': documentsP
});
}
});
}
});
});
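Since the counting and pagination aggregations are independent, they can indeed run in parallel. A compact sketch, assuming a Mongoose version whose aggregate().exec() returns a promise, reusing aggregation and aggregationPaginated from above inside an async request handler:
// Run the count pipeline and the paginated pipeline at the same time.
const [countResult, documentsP] = await Promise.all([
  model.aggregate(aggregation).allowDiskUse(true).exec(),
  model.aggregate(aggregationPaginated).allowDiskUse(true).exec()
]);
// The count pipeline returns an empty array when nothing matches, so guard against that.
const numFiltered = countResult.length ? countResult[0].count : 0;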

How to limit number of updating documents in mongodb

How can I implement something similar to db.collection.find().limit(10), but while updating documents?
Right now I'm using something really crappy: getting documents with db.collection.find().limit() and then updating them.
In general, I want to return a given number of records and change one field in each of them.
Thanks.
You can use:
db.collection.find().limit(NUMBER_OF_ITEMS_YOU_WANT_TO_UPDATE).forEach(
function (e) {
e.fieldToChange = "blah";
....
db.collection.save(e);
}
);
(Credits for forEach code: MongoDB: Updating documents using data from the same document)
This will change only the number of entries you specify. So if, for example, you want to add a field called "newField" with value 1 to only half of your entries inside "collection", you can put in:
db.collection.find().limit(db.collection.count() / 2).forEach(
function (e) {
e.newField = 1;
db.collection.save(e);
}
);
If you then want to make the other half also have "newField" but with value 2, you can do an update with the condition that newField doesn't exist:
db.collection.update( { newField : { $exists : false } }, { $set : { newField : 2 } }, {multi : true} );
Using forEach to individually update each document is slow. You can update the documents in bulk using:
ids = db.collection.find(<condition>).limit(<limit>).map(
function(doc) {
return doc._id;
}
);
db.collection.updateMany({ _id: { $in: ids } }, <update>)
The solutions that iterate over all objects then update them individually are very slow.
Retrieving them all then updating simultaneously using $in is more efficient.
ids = People.where(firstname: 'Pablo').limit(10000).only(:_id).to_a.map(&:id)
People.in(_id: ids).update_all(lastname: 'Cantero')
The query is written using Mongoid, but can be easily rewritten in Mongo Shell as well.
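For reference, a rough mongo shell equivalent of the Mongoid snippet might look like this (sketch only; the collection name people is an assumption based on the model name):
// Collect the _ids of up to 10000 matching documents...
var ids = db.people.find({ firstname: "Pablo" }, { _id: 1 })
  .limit(10000)
  .map(function (doc) { return doc._id; });

// ...then update them all in one command.
db.people.updateMany(
  { _id: { $in: ids } },
  { $set: { lastname: "Cantero" } }
);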
Unfortunately the workaround you have is the only way to do it AFAIK. There is a boolean flag multi which will either update all the matches (when true) or only the first match (when false).
As that answer states, there is still no way to limit the number of documents to update (or delete) to a value greater than 1. A workaround is to use something like:
db.collection.find(<condition>).limit(<limit>).forEach(function(doc){db.collection.update({_id:doc._id},{<your update>})})
If your _id is a sequence number and not an ObjectId, you can do this in a for loop:
let batchSize = 10;
for (let i = 0; i <= 1000000; i += batchSize) {
  // multi: true added here so the whole _id range is updated, not just the first match
  db.collection.update({ $and: [{ "_id": { $lte: i + batchSize } }, { "_id": { $gt: i } }] }, {<your update>}, { multi: true })
}
// Get the distinct values of "key" and keep only as many as should be updated...
let fetchStandby = await db.model.distinct("key", {});
fetchStandby = fetchStandby.slice(0, no_of_docs_to_be_updated)
// ...then update every document whose key falls in that slice.
let fetch = await db.model.updateMany({
  key: { $in: fetchStandby }
}, {
  $set: { "qc.status": "pending" }
})
I also recently wanted something like this. I think querying for a long list of _ids just to update them with $in is perhaps slow too, so I tried using an aggregation plus $merge instead:
while (true) {
const record = db.records.findOne({ isArchived: false }, {_id: 1})
if (!record) {
print("No more records")
break
}
db.records.aggregate([
{ $match: { isArchived: false } },
{ $limit: 100 },
{
$project: {
_id: 1,
isArchived: {
$literal: true
},
updatedAt: {
$literal: new Date()
}
}
},
{
$merge: {
into: "records",
on: "_id",
whenMatched: "merge"
}
}
])
print("Done update")
}
But feel free to comment on whether this is better or worse than a bulk update with $in.