Insert or update many documents in MongoDB

Is there a way to insert or update/replace multiple documents in MongoDB with a single query?
Assume the following collection:
[
{_id: 1, text: "something"},
{_id: 4, text: "baz"}
]
Now I would like to add multiple documents of which some might already be in the collection. If the documents are already in the collection, I would like to update/replace them. For example, I would like to insert the following documents:
[
{_id:1, text: "something else"},
{_id:2, text: "foo"},
{_id:3, text: "bar"}
]
The query should insert the documents with _id 2 and 3. It should also update/replace the document with _id 1. After the process, the collection should look as follows:
[
{_id:1, text: "something else"},
{_id:2, text: "foo"},
{_id:3, text: "bar"},
{_id:4, text: "baz"}
]
One approach might be to use insertMany:
db.collection.insertMany(
[ {...}, {...}, {...} ],
{
ordered: false,
}
)
If duplicates occur, that query raises a write error whose writeErrors field contains an array of objects with the indexes of the documents that failed to insert. I could go through them and update those documents instead.
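For completeness, that fallback would look roughly like this (untested sketch; docs stands for the array of documents above):
try {
  db.collection.insertMany(docs, { ordered: false })
} catch (e) {
  // e.writeErrors has one entry per document that failed with a duplicate key
  e.writeErrors.forEach(we => {
    const doc = docs[we.index]
    db.collection.replaceOne({ _id: doc._id }, doc)
  })
}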
But that process is cumbersome. Is there a way to insert or update/replace many documents in one query?

As described here, to do what you need you can put something like this in script.js (warning: untested code):
use YOUR_DB
var bulk = db.collection.initializeUnorderedBulkOp();
bulk.find( { _id : 1 } ).upsert().update( { $set: { "text": "something else" } } );
bulk.find( { _id : 4 } ).upsert().update( { $set: { "text": "baz" } } );
bulk.find( { _id : 99 } ).upsert().update( { $set: { "text": "mrga" } } );
bulk.execute();
and run it with
mongo < script.js
I had to do it this way as anything I tried for updating/inserting more than 1000 documents didn't work because of the limit.
Write commands can accept no more than 1000 operations. The Bulk() operations in the mongo shell and comparable methods in the drivers do not have this limit.
source
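On newer shells and drivers the same thing can be done with bulkWrite using replaceOne with upsert; a minimal sketch for the documents from the question:
db.collection.bulkWrite(
  [
    { replaceOne: { filter: { _id: 1 }, replacement: { _id: 1, text: "something else" }, upsert: true } },
    { replaceOne: { filter: { _id: 2 }, replacement: { _id: 2, text: "foo" }, upsert: true } },
    { replaceOne: { filter: { _id: 3 }, replacement: { _id: 3, text: "bar" }, upsert: true } }
  ],
  { ordered: false }
)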

You can also use the bulkWrite API to update or insert multiple documents in a single call. Here is an example:
var ops = []
items.forEach(item => {
  ops.push({
    updateOne: {
      filter: { _id: unique_id },
      update: {
        $set: { fields_to_update_if_exists },
        $setOnInsert: { fields_to_insert_if_does_not_exist }
      },
      upsert: true
    }
  })
})
db.collection('collection_name').bulkWrite(ops, { ordered: false });
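Filled in with the documents from the original question (a sketch, not tested), that might look like:
const items = [
  { _id: 1, text: "something else" },
  { _id: 2, text: "foo" },
  { _id: 3, text: "bar" }
]
const ops = items.map(item => ({
  updateOne: {
    filter: { _id: item._id },
    update: { $set: { text: item.text } },
    upsert: true
  }
}))
await db.collection('collection_name').bulkWrite(ops, { ordered: false })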

Consider a data item shaped as follows:
interface Item {
id: string; // unique key across collection
someValue: string;
}
If you have fewer items than the 1,000 limit, you can make a bulk write operation like this:
public async insertOrUpdateBulk(items: Item[]) {
  try {
    const bulkOperation = this._itemCollection.initializeUnorderedBulkOp();
    for (let itemIndex = 0; itemIndex < items.length; itemIndex++) {
      const item = items[itemIndex];
      // replace the whole document on match, insert it otherwise
      bulkOperation.find({ id: item.id }).upsert().replaceOne(item);
    }
    await bulkOperation.execute();
    return true;
  } catch (err) {
    console.log(err);
    return false;
  }
}
If the number of items exceeds the 1,000 limit, you can issue simultaneous promises instead:
public async insertOrUpdate(items: Item[]) {
  try {
    const promises: Array<Promise<UpdateWriteOpResult>> = [];
    for (let itemIndex = 0; itemIndex < items.length; itemIndex++) {
      const item = items[itemIndex];
      // wrap the item in $set so the update document contains atomic operators
      const updatePromise = this.itemCollection.updateOne({ id: item.id }, { $set: item }, { upsert: true });
      promises.push(updatePromise);
    }
    await Promise.all(promises);
    console.log('done...');
    return true;
  } catch (err) {
    console.log(err);
    return false;
  }
}

Related

How do you consistently migrate a large MongoDB collection?

I was trying to migrate a large MongoDB collection of ~600k documents, like so:
for await (const doc of db.collection('collection').find({
  legacyProp: { $exists: true },
})) {
  // additional data fetching from separate collections here
  const newPropValue = await fetchNewPropValue(doc._id)
  await db.collection('collection').findOneAndUpdate({ _id: doc._id }, [{ $set: { newProp: newPropValue } }, { $unset: ['legacyProp'] }])
}
When the migration script finished, data was still being updated for about 30 minutes or so. I concluded this by computing the count of documents still containing the legacyProp property:
db.collection.countDocuments({ legacyProp: { $exists: true } })
which kept decreasing on subsequent calls. After a while the updates stopped, and the final count of documents containing legacyProp was around 300k, so the update failed silently, resulting in data loss. I'm curious what exactly happened and, most importantly, how do you update large MongoDB collections without any data loss? Keep in mind that there is additional data fetching involved before every update operation.
My first attempt would be to rebuild the logic of fetchNewPropValue() inside an aggregation pipeline.
Have a look at Aggregation Pipeline Operators
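If, for example, the new value lives in a separate collection keyed by the same _id, the whole migration could run server-side with $lookup and $merge (MongoDB 4.4+). This is only a sketch; the sources collection and its value field are assumed names:
db.getCollection('collection').aggregate([
  { $match: { legacyProp: { $exists: true } } },
  // assumed: the new value is stored in a "sources" collection under "value"
  { $lookup: { from: "sources", localField: "_id", foreignField: "_id", as: "source" } },
  { $set: { newProp: { $first: "$source.value" } } },
  { $unset: ["legacyProp", "source"] },
  // write the transformed documents back onto the same collection
  { $merge: { into: "collection", on: "_id", whenMatched: "replace" } }
])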
If this is not possible, you can try to put all the newPropValues into an array and use it like this; 600k properties should fit easily into your RAM.
const newPropValues = await fetchNewPropValue() // getting all new properties as array [{_id: ..., val: ...}, {_id: ..., val: ...}, ...]
db.getCollection('collection').updateMany(
  { legacyProp: { $exists: true } },
  [
    {
      $set: {
        newProp: {
          $first: {
            $filter: { input: newPropValues, cond: { $eq: ["$_id", "$$this._id"] } }
          }
        }
      }
    },
    { $set: { legacyProp: "$$REMOVE", newProp: "$newProp.val" } }
  ]
)
Or you can try bulkWrite:
let bulkOperations = []
const cursor = db.getCollection('collection').find({ legacyProp: { $exists: true } })
for await (const doc of cursor) {
  const newPropValue = await fetchNewPropValue(doc._id)
  bulkOperations.push({
    updateOne: {
      filter: { _id: doc._id },
      update: {
        $set: { newProp: newPropValue },
        $unset: { legacyProp: "" }
      }
    }
  })
  if (bulkOperations.length >= 10000) {
    await db.getCollection('collection').bulkWrite(bulkOperations, { ordered: false })
    bulkOperations = []
  }
}
if (bulkOperations.length > 0) {
  await db.getCollection('collection').bulkWrite(bulkOperations, { ordered: false })
}

How can I get mongo documents with multiple non-exclusive selectors using Mongoose?

I need to build a single query to return:
Documents from their IDs (optional)
Documents matching a text search string (optional)
The other documents sorted by score and with a count limited by an integer argument
The total limit is the provided limit plus the length of the IDs array from the first condition.
I used to work with Meteor, where you can return an array of query cursors. In this case, I am working with a Mongoose backend and I am not sure how to proceed. I assume I need to use Model.aggregate and provide my conditions as an array. However, the request fails with the error Arguments must be aggregate pipeline operators.
Each of my conditions works fine individually with a regular find() query.
Here is my GraphQL query resolver; I can't see what is going wrong:
async (root, { search, selected = 0, limit = 10 }, { models: { tag } }) => {
try {
let selector = [{}] // {} should return the documents by default if no other condition is set
if (selected.length) selector.push({ _id: { $in: selected } })
if (search && search.length) selector.push({
$text: {
$search: search,
$caseSensitive: false,
$diacriticSensitive: false
}
})
const tags = await tag.aggregate(selector).sort('-score').limit(limit + selected.length)
return {
ok: true,
message: "Tags fetched",
data: tags
}
} catch (err) { return { ok: false, message: err.message }; }
}
),
When I log the selector with all the arguments set, it returns an array of the following form:
[
{},
{ _id: { '$in': [Array] } },
{
'$text': {
'$search': 'test',
'$caseSensitive': false,
'$diacriticSensitive': false
}
}
]
UPDATE
Based on #Ashh's answer, with an additional $or operator, the full aggregator variable looks like this:
{
'$match': {
'$or': {
_id: {
'$in': [ '5e39745e0ac14b1731a779a3', '5e39745d0ac14b1731a76984' ]
},
'$text': {
'$search': 'test',
'$caseSensitive': false,
'$diacriticSensitive': false
}
}
}
},
{ '$sort': { score: -1 } },
{ limit: 12 }
I still get the "Arguments must be aggregate pipeline operators" error, and I don't see how, when the $text argument is not present, I would still get the default documents sorted by score.
#Ashh, I'll wait for your updated answer to validate it. Thanks again for your help.
Mongoose's aggregate() function expects its stages as an array of elements; the $match stage is the equivalent of find() and filters the documents. You can check the example here: Mongoose Aggregate.
The rest is an issue in your code. It should be:
async (root, { search, selected = 0, limit = 10 }, { models: { tag } }) => {
  try {
    const aggregate = []
    let selector = { $match: {} };
    aggregate.push(selector)
    if (selected.length) {
      aggregate[0].$match['$or'] = [];
      aggregate[0].$match.$or.push({ _id: { $in: selected } });
    }
    if (search && search.length) {
      aggregate[0].$match['$or'] = aggregate[0].$match['$or'] ? aggregate[0].$match['$or'] : []
      aggregate[0].$match.$or.push({
        $text: {
          $search: search,
          $caseSensitive: false,
          $diacriticSensitive: false
        }
      })
    }
    aggregate.push({ $sort: { score: -1 } })
    aggregate.push({ $limit: limit })
    const tags = await tag.aggregate(aggregate)
    return {
      ok: true,
      message: "Tags fetched",
      data: tags
    };
  } catch (err) {
    return { ok: false, message: err.message };
  }
};
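For reference, with search = 'test' and two selected ids, the code above builds a pipeline along these lines (illustrative; note that $or is an array of conditions, not an object):
[
  {
    $match: {
      $or: [
        { _id: { $in: ['5e39745e0ac14b1731a779a3', '5e39745d0ac14b1731a76984'] } },
        { $text: { $search: 'test', $caseSensitive: false, $diacriticSensitive: false } }
      ]
    }
  },
  { $sort: { score: -1 } },
  { $limit: 10 } // the limit argument; the question's total of limit + selected.length would be 12 here
]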

Efficient MongoDB query to split a field into an array

This code splits the nicknames field in the cities collection into an array, but it's way too slow:
db.cities
.find()
.snapshot()
.forEach(function(el) {
el.nicknames = el.nicknames.split('->')
db.cities.save(el)
})
This code also splits the nicknames field in the cities collection into an array and it's much faster, but it temporarily causes the database size to double which crashes my database.
db.cities.aggregate(
[
{ "$addFields": {
"nicknames": { "$split": [ "$nicknames", "->" ] }
}},
{ "$out": "cities" }
]
)
This seems like a trivial database task. There has to be a better way... right?
Yes, take advantage of the bulkWrite method for efficient bulk updates. You can split up the update operation into batches for large collections.
Using the cursor from the aggregate operation (minus the final $out stage), you can compose the bulk update operations as:
let bulkUpdateOps = [];
const cursor = db.cities.aggregate([
{ "$project": { "nicknames": { "$split": [ "$nicknames", "->" ] } } }
]);
cursor.forEach(doc => {
const { _id, nicknames } = doc;
bulkUpdateOps.push({
"updateOne": {
"filter": { _id },
"update": { "$set": { nicknames } },
"upsert": true
}
});
if (bulkUpdateOps.length === 1000) {
db.cities.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) {
db.cities.bulkWrite(bulkUpdateOps);
}
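On MongoDB 4.2 or newer, an alternative worth considering is a single updateMany with a pipeline update, which avoids pulling documents to the client at all. A sketch, untested against the original data; the $type filter keeps it from re-splitting documents that were already converted:
db.cities.updateMany(
  { nicknames: { $type: "string" } },
  [ { $set: { nicknames: { $split: ["$nicknames", "->"] } } } ]
)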

MongoDB multiple update attributes

I have a collection A that has documents in form of:
{
_id: 12345,
title: "title"
}
and document B in form of:
{
_id: 12345,
newAttribute: "newAttribute12345"
}
I want to update collection A to have documents like:
{
_id: 12345,
title: "title",
newAttribute: "newAttribute12345"
}
At this time I do it with update({_id: doc._id}, {$set: {newAttribute: doc.newAttribute}}), but I need to run it 10,000 times in a loop for all my documents.
How can I update multiple documents like these (by _id) in one db call, or in the most efficient way? (This is basically a join / bulk attribute update operation.)
I use mongodb 2.6
Consider the following scenario: two collections, named title and attribute.
The title collection contains the following documents:
[{
_id: 12345,
title: "title"
},
{
_id: 12346,
title: "title1"
}]
and the attribute collection contains the following documents:
[{
_id: 12345,
newAttribute: "newAttribute12345"
},
{
_id: 12346,
newAttribute: "newAttribute12346"
},
{
_id: 12347,
newAttribute: "newAttribute12347"
}]
To update the title collection using the criterion title._id = attribute._id, use a mongo bulk update with the following script:
var bulk = db.title.initializeOrderedBulkOp();
var counter = 0;
db.attribute.find().forEach(function(data) {
var updoc = {
"$set": {}
};
var updateKey = "newAttribute";
updoc["$set"][updateKey] = data.newAttribute;
bulk.find({
"_id": data._id
}).update(updoc);
counter++;
// Drain and re-initialize every 1000 update statements
if(counter % 1000 == 0) {
bulk.execute();
bulk = db.title.initializeOrderedBulkOp();
}
})
// Add the rest in the queue
if(counter % 1000 != 0) bulk.execute();
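If everything worked, db.title.find() should then return documents like the following (illustrative output):
{ "_id" : 12345, "title" : "title", "newAttribute" : "newAttribute12345" }
{ "_id" : 12346, "title" : "title1", "newAttribute" : "newAttribute12346" }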
A possible (but problematic) answer is a hacky join in mongo (maybe there is something better):
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
The problem with this is that I have to swap the collections later, and it requires me to know the properties of my collection:
var r = function(key, values){
var result = { prop1: null, prop2: null };
values.forEach(function(value){
if (result.prop1 === null && value.prop1 !== null) {
result.prop1 = value.prop1;
}
if (result.prop2 === null && value.prop2 !== null) {
result.prop2 = value.prop2;
}
})
return result;
};
var m = function(){
emit(this._id, { prop1: this.prop1, prop2: this.prop2 })
}
db.A.mapReduce(m, r, { out: { reduce: 'C' }});
db.B.mapReduce(m, r, { out: { reduce: 'C' }});
You can use the cursor.forEach method
db.collectionA.find().forEach(function(docA){
db.collectionB.find().forEach(function(docB){
if(docA._id === docB._id){
docA.newAttribute = docB.newAttribute;
db.collectionA.save(docA);
}
})
})
> db.collectionA.find()
{ "_id" : 12345, "title" : "title", "newAttribute" : "newAttribute12345" }

Update multiple documents by id set. Mongoose

I wonder if mongoose has some method to update multiple documents by id set. For example:
for (var i = 0, l = ids.length; i < l; i++) {
Element.update({'_id': ids[i]}, {'visibility': visibility} ,function(err, records){
if (err) {
return false;
} else {
return true;
};
});
};
What I want to know is whether Mongoose can do something like this:
Element.update({'_id': ids}, {'visibility': visibility}, {multi: true} ,function(err, records){
if (err) {
return false;
}
});
where ids is an array of ids, like ['id1', 'id2', 'id3'] - sample array.
Same question for find.
Most probably yes. It is done using the $in operator in the MongoDB update query:
db.Element.update(
{ _id: { $in: ['id1', 'id2', 'id3'] } },
{ $set: { visibility : yourvisibility } },
{multi: true}
)
All you need is to find out how to use $in in Mongoose.
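In Mongoose that translates roughly to the following (a sketch using the ids and visibility variables from the question; update with { multi: true } matches the older API shown above, while newer Mongoose versions offer updateMany as in the other answers):
Element.update(
  { _id: { $in: ids } },
  { $set: { visibility: visibility } },
  { multi: true },
  function (err, result) {
    if (err) { /* handle error */ }
  }
);
// same idea for find:
Element.find({ _id: { $in: ids } }, function (err, docs) { /* ... */ });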
With the updateMany function there is no need for { multi: true }:
db.collectionName.updateMany(
  { _id: { $in: [ ObjectId("your object id"), ObjectId("your object id") ] } },
  { $inc: { quantity: 100 } }
)
One more point: you can also use $in to fetch multiple documents:
db.collectionName.find(
  { _id: { $in: [ ObjectId("your object id"), ObjectId("your object id") ] } }
)
updateMany updates all documents that match the specified filter for a collection:
let ids = ["kwwe232244h3j44jg3h4", "23h2u32g2h3b3hbh", "fhfu3h4h34u35"];
let visibility = true;
Element.updateMany(
  { _id: { $in: ids } },
  { $set: { visibility } },
  function(err, records) {
    if (err) {
      return false;
    }
  }
);