Performance problem of updating MongoDB with self reference

I am a MongoDB Atlas cloud user. I need to update more than 10K documents remotely, all documents in a collection, at once.
Currently I am using ".forEach" with a self-reference, but it is too slow and I wonder whether there is a better way.
myCollection.find({})
  .forEach(function(o) {
    const totalPoint = o.gamepoint + 10;
    return myCollection.update(
      { _id: o._id },
      { $set: { total: totalPoint } }
    );
  });
It works, but it is very slow. Is there no way to improve the performance?
If upgrading the hardware is the only way, which should be upgraded: the app hosting server or the MongoDB server?
Update: My apologies, the former code sample was not suited to explaining this problem. This is the corrected one:
myCollection.find({})
  .forEach(function(o) {
    const totalPoint = o.gamepoint + o.gamepoint;
    return myCollection.update(
      { _id: o._id },
      { $set: { total: totalPoint } }
    );
  });

You can use a MongoDB update operator to get the desired outcome:
Collection.update({}, { $inc: { total: 10 } }, { multi: true }, function(err, updated) {
  console.log('updated'); // this will update all documents
});

The following update query will get you there, but you will need MongoDB server v4.2 for it to work.
db.myCollection.update({}, [{
  $set: {
    total: {
      $add: ["$gamepoint", "$gamepoint"]
    }
  }
}], {
  multi: true
});
reference: Update with Aggregation Pipeline
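If the server cannot be upgraded to 4.2, one alternative (a sketch only, assuming the gamepoint/total fields from the question and an arbitrary batch size of 1000) is to compute the value client-side as before, but send the writes in large batches with bulkWrite instead of issuing one update call per document:
let ops = [];
db.myCollection.find({}, { gamepoint: 1 }).forEach(function (o) {
  ops.push({
    updateOne: {
      filter: { _id: o._id },
      update: { $set: { total: o.gamepoint + o.gamepoint } } // same client-side calculation as the question
    }
  });
  if (ops.length === 1000) { // flush in batches to keep memory bounded
    db.myCollection.bulkWrite(ops, { ordered: false });
    ops = [];
  }
});
if (ops.length > 0) {
  db.myCollection.bulkWrite(ops, { ordered: false });
}
Each bulkWrite sends up to 1000 updates in one round trip, which is usually the main saving over the per-document update loop.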

Related

Golang MongoDB insertMany if not exist

So I am writing code where I want to insert many articles into MongoDB, but I want to check whether there are already articles with the same ID and skip them if there are. I cannot find an implementation of this logic online; can anyone help me with a solution?
collection.InsertMany works fine, but it does not check for existing documents.
You can use the '$setOnInsert' operator.
Like this:
db.products.update(
  { _id: 1 },
  {
    $set: { item: "apple" },
    $setOnInsert: { defaultQty: 100 }
  },
  { upsert: true }
)
The documentation is at this link:
https://docs.mongodb.com/manual/reference/operator/update/setOnInsert/
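Applied to the question's "insert many articles but skip existing IDs" case, a minimal sketch (in mongo shell syntax; the articles array, its fields, and the articles collection name are made-up examples) turns every insert into an upsert whose fields are only written on insert, so existing documents are left unchanged. The Go driver exposes the same idea through BulkWrite with UpdateOneModel, SetUpsert(true) and a $setOnInsert update.
var articles = [
  { _id: "article-1", title: "first" },
  { _id: "article-2", title: "second" }
];
db.articles.bulkWrite(
  articles.map(function (a) {
    return {
      updateOne: {
        filter: { _id: a._id },                       // match on the article ID
        update: { $setOnInsert: { title: a.title } }, // written only if this _id does not exist yet
        upsert: true
      }
    };
  }),
  { ordered: false } // order of the upserts does not matter here
);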

Strong Loopback group_by aggregation

I have searched a lot for a way to do aggregation using LoopBack with MongoDB, but unfortunately I have not found a good solution. One of them is here,
but I can't implement it. Can anyone help me solve this problem, either with a new solution or by explaining the link above?
Loopback doesn't provide a way to do an aggregation query, but you can find another solution in: https://github.com/strongloop/loopback/issues/890
// Using the datasource we make a direct request to MongoDB instead of using LoopBack's PersistedModel
var bookCollection = Book.getDataSource().connector.collection(Book.modelName);
bookCollection.aggregate([
  {
    $group: {
      _id: { category: "$category", author: "$author" },
      total: { $sum: 1 }
    }
  }
], function(err, groupByRecords) {
  if (err) {
    next(err);
  } else {
    next();
  }
});

Mongo aggregation and MongoError: exception: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit

I'm trying to aggregate data from my Mongo collection to produce some statistics for FreeCodeCamp by making a large json file of the data to use later.
I'm running into the error in the title. There doesn't seem to be a lot of information about this, and the other posts here on SO don't have an answer. I'm using the latest version of MongoDB and drivers.
I suspect there is probably a better way to run this aggregation, but it runs fine on a subset of my collection. My full collection is ~7GB.
I'm running the script via node aggScript.js > ~/Desktop/output.json
Here is the relevant code:
MongoClient.connect(secrets.db, function(err, database) {
  if (err) {
    throw err;
  }
  database.collection('user').aggregate([
    {
      $match: {
        'completedChallenges': { $exists: true }
      }
    },
    {
      $match: {
        'completedChallenges': { $ne: '' }
      }
    },
    {
      $match: {
        'completedChallenges': { $ne: null }
      }
    },
    {
      $group: {
        '_id': 1,
        'completedChallenges': { $addToSet: '$completedChallenges' }
      }
    }
  ], {
    allowDiskUse: true
  }, function(err, results) {
    if (err) { throw err; }
    var aggData = results.map(function(camper) {
      return _.flatten(camper.completedChallenges.map(function(challenges) {
        return challenges.map(function(challenge) {
          return {
            name: challenge.name,
            completedDate: challenge.completedDate,
            solution: challenge.solution
          };
        });
      }), true);
    });
    console.log(JSON.stringify(aggData));
    process.exit(0);
  });
});
Aggregate returns a single document containing all the result data, which limits how much data can be returned to the maximum BSON document size.
Assuming that you do actually want all this data, there are two options:
Use a cursor rather than a single result document (aggregateCursor or the cursor option of aggregate, depending on your driver), which you can then iterate over.
Add a $out stage as the last stage of your pipeline; a sketch follows below. This tells MongoDB to write your aggregation results to the specified collection. The aggregate command itself returns no data, and you then query that collection as you would any other.
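A sketch of the $out option, using the user collection from the question (the output collection name completedChallengesAgg is an arbitrary choice). Note that the original $group with $addToSet builds a single result document, which would itself hit the 16MB per-document limit on a large collection, so this sketch only filters and projects before writing out:
db.user.aggregate([
  { $match: { completedChallenges: { $exists: true, $nin: [null, ''] } } },
  { $project: { completedChallenges: 1 } },
  { $out: 'completedChallengesAgg' }   // write the results to a collection instead of returning them
], { allowDiskUse: true });
// then query that collection as you would any other:
db.completedChallengesAgg.find().forEach(function (doc) { /* process doc.completedChallenges */ });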
It just means that the result object you are building became too large. This kind of issue should not be affected by the version; the fix implemented for 2.5.0 only prevents the crash from occurring.
You need to filter ($match) properly so that the result contains only the data you need, and group on the proper fields. The results are put into a 64MB buffer, so reduce your data: $project only the fields you require in the result, not whole documents (see the $project sketch below).
You can combine your three $match stages into a single one to reduce the number of pipeline stages:
{
  $match: {
    'completedChallenges': {
      $exists: true,
      $nin: [null, ""]   // replaces the two $ne conditions; duplicate $ne keys in one object would overwrite each other
    }
  }
}
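And a sketch of the $project suggestion, keeping only the field the $group stage actually uses (placing it before the $group is the assumption here):
{
  $project: {
    _id: 0,
    completedChallenges: 1   // drop everything else before grouping
  }
}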
I had this issue and I couldn't debug the problem so I ended up abandoning the aggregation approach. Instead I just iterated through each entry and created a new collection. Here's a stripped down shell script which might help you see what I mean:
db.new_collection.ensureIndex({ my_key: 1 }); // for performance, not a necessity
db.old_collection.find({}).noCursorTimeout().forEach(function(doc) {
  db.new_collection.update(
    { my_key: doc.my_key },
    {
      $push: { stuff: doc.stuff, other_stuff: doc.other_stuff },
      $inc: { thing: doc.thing }
    },
    { upsert: true }
  );
});
I don't imagine that this approach would suit everyone, but hopefully that helps anyone who was in my particular situation.

MongoDB - change simple field into an object

In MongoDB, I want to change the structure of my documents from:
{
  discount: 10,
  discountType: "AMOUNT"
}
to:
{
  discount: {
    value: 10,
    type: "AMOUNT"
  }
}
so I tried the following query in the mongo shell:
db.discounts.update({},
  {
    $rename: {
      discount: "discount.value",
      discountType: "discount.type"
    }
  },
  {multi: true}
)
but it throws an error:
"writeError" : {
  "code" : 2,
  "errmsg" : "The source and target field for $rename must not be on the same path: discount: \"discount.value\""
}
A workaround that comes to my mind is to do it in 2 steps: first assign the new structure to a new field (let's say discount2) and then rename it to discount. But maybe there is a way to do it in one step?
The simplest way is to do it in two steps as you allude to in your question; initially renaming discount to a temporary field name so that it can be reused in the second step:
db.discounts.update({}, {$rename: {discount: 'temp'}}, {multi: true})
db.discounts.update({},
{$rename: {temp: 'discount.value', discountType: 'discount.type'}},
{multi: true})
The reason you are getting this error is that, as mentioned in the documentation:
The $rename operator logically performs an $unset of both the old name and the new name, and then performs a $set operation with the new name. As such, the operation may not preserve the order of the fields in the document; i.e. the renamed field may move within the document.
And the problem with this is that you can't $set and $unset the same field at the same time in MongoDB.
The solution is to use bulk operations to update your documents and change their structure, and even then you need to use a field name that doesn't already exist in your collection. Bulk operations are also the most efficient way to do this.
MongoDB 3.2 or newer
MongoDB 3.2 deprecates Bulk() and its associated methods. You need to use the .bulkWrite() method.
var operations = [];
db.discounts.find().forEach(function(doc) {
  var discount = doc.discount;
  var discountType = doc.discountType;
  var operation = { 'updateOne': {
    'filter': { '_id': doc._id },
    'update': {
      '$unset': { 'discount': '', 'discountType': '' },
      '$set': { 'discounts.value': discount, 'discounts.type': discountType }
    }
  }};
  operations.push(operation);
});
// the options object is the second argument to bulkWrite, not another operation
db.discounts.bulkWrite(operations, {
  ordered: true,
  writeConcern: { w: "majority", wtimeout: 5000 }
});
Which yields:
{
  "_id" : ObjectId("56682a02e6a2321d88f6d078"),
  "discounts" : {
    "value" : 10,
    "type" : "AMOUNT"
  }
}
MongoDB 2.6
Prior to MongoDB 3.2, with MongoDB version 2.6 or newer, you can use the "Bulk" API.
var bulk = db.discounts.initializeOrderedBulkOp();
var count = 0;
db.discounts.find().forEach(function(doc) {
  var discount = doc.discount;
  var discountType = doc.discountType;
  bulk.find( { '_id': doc._id } ).updateOne( {
    '$unset': { 'discount': '', 'discountType': '' },
    '$set': { 'discounts.value': discount, 'discounts.type': discountType } });
  count++;
  if (count % 500 === 0) {
    // execute the batch and re-initialize to keep memory usage bounded
    bulk.execute();
    bulk = db.discounts.initializeOrderedBulkOp();
  }
})
if (count > 0)
  bulk.execute();
This query yields the same result as the previous one.
Thanks to answers from Update MongoDB field using value of another field I figured out the following solution:
db.discounts.find().snapshot().forEach(
  function(elem) {
    elem.discount = {
      value: elem.discount,
      type: elem.discountType
    };
    delete elem.discountType;
    db.discounts.save(elem);
  }
)
I quite like it because the source code reads nicely, but performance suffers for a large number of documents.

How to limit number of updating documents in mongodb

How do I implement something similar to db.collection.find().limit(10), but while updating documents?
Right now I'm using something really crude: getting documents with db.collection.find().limit() and then updating them.
In general I want to return a given number of records and change one field in each of them.
Thanks.
You can use:
db.collection.find().limit(NUMBER_OF_ITEMS_YOU_WANT_TO_UPDATE).forEach(
  function (e) {
    e.fieldToChange = "blah";
    ....
    db.collection.save(e);
  }
);
(Credits for forEach code: MongoDB: Updating documents using data from the same document)
What this will do is only change the number of entries you specify. So if you want to add a field called "newField" with value 1 to only half of your entries inside "collection", for example, you can put in
db.collection.find().limit(db.collection.count() / 2).forEach(
  function (e) {
    e.newField = 1;
    db.collection.save(e);
  }
);
If you then want to make the other half also have "newField" but with value 2, you can do an update with the condition that newField doesn't exist:
db.collection.update( { newField : { $exists : false } }, { $set : { newField : 2 } }, {multi : true} );
Using forEach to individually update each document is slow. You can update the documents in bulk using:
ids = db.collection.find(<condition>).limit(<limit>).map(
  function(doc) {
    return doc._id;
  }
);
db.collection.updateMany({_id: {$in: ids}}, <update>)
The solutions that iterate over all objects then update them individually are very slow.
Retrieving them all then updating simultaneously using $in is more efficient.
ids = People.where(firstname: 'Pablo').limit(10000).only(:_id).to_a.map(&:id)
People.in(_id: ids).update_all(lastname: 'Cantero')
The query is written using Mongoid, but it can easily be rewritten for the mongo shell as well (see the sketch below).
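For reference, a rough mongo shell equivalent of the Mongoid snippet above (a sketch; the people collection name and the firstname/lastname fields are taken from that example):
var ids = db.people.find({ firstname: 'Pablo' }, { _id: 1 })
  .limit(10000)
  .map(function (doc) { return doc._id; });   // collect only the _id values
db.people.updateMany({ _id: { $in: ids } }, { $set: { lastname: 'Cantero' } });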
Unfortunately the workaround you have is the only way to do it AFAIK. There is a boolean flag multi which will either update all the matches (when true) or only the first match (when false).
As the answer states, there is still no way to limit the number of documents to update (or delete) to a specific value greater than 1. A workaround is to use something like:
db.collection.find(<condition>).limit(<limit>).forEach(function(doc){db.collection.update({_id:doc._id},{<your update>})})
If your id is a sequence number and not an ObjectId you can do this in a for loop:
let batchSize = 10;
for (let i = 0; i <= 1000000; i += batchSize) {
  db.collection.update(
    { $and: [ { "_id": { $lte: i + batchSize } }, { "_id": { $gt: i } } ] },
    { <your update> },
    { multi: true }   // update every document in the _id range, not just the first match
  );
}
// take the first no_of_docs_to_be_updated distinct values of "key", then update the documents holding those values
let fetchStandby = await db.model.distinct("key", {});
fetchStandby = fetchStandby.slice(0, no_of_docs_to_be_updated);
let fetch = await db.model.updateMany({
  key: { $in: fetchStandby }
}, {
  $set: { "qc.status": "pending" }
});
I also recently wanted something like this. I think querying for a long list of _id values just to update them with $in might be slow too, so I tried to use an aggregation + $merge instead:
while (true) {
  const record = db.records.findOne({ isArchived: false }, { _id: 1 });
  if (!record) {
    print("No more records");
    break;
  }
  db.records.aggregate([
    { $match: { isArchived: false } },
    { $limit: 100 },
    {
      $project: {
        _id: 1,
        isArchived: { $literal: true },
        updatedAt: { $literal: new Date() }
      }
    },
    {
      $merge: {
        into: "records",
        on: "_id",
        whenMatched: "merge"
      }
    }
  ]);
  print("Done update");
}
But feel free to comment if this is better or worse than a bulk update with $in.