MongoDB - return an array of the _ids of all updated documents

I need to update some documents in one collection, and send an array of the _ids of the updated documents to another collection.
Since update() returns the number of updated documents rather than their _ids, I've come up with the following to get the array:
var docsUpdated = [];
var cursor = myCollection.find(<myQuery>);
cursor.forEach(function (doc) {
    myCollection.update({ _id: doc._id }, <myUpdate>, function (error, response) {
        docsUpdated.push(doc._id);
    });
});
Or I could do:
var docsUpdated = myCollection.distinct("_id", <myQuery>);
myCollection.update(<myQuery>, <myUpdate>, {multi : true});
I'm guessing the second version would be faster because it only calls the database twice. But both seem annoyingly inefficient - is there another way of doing this without multiple database calls? Or am I overcomplicating things?

I think you need the ".aggregate()" method
db.orders.aggregate([
    { $group: { _id: "$_id" } }
])
Something along those lines should return the _ids of all documents in the collection.
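For example, a fuller sketch of that idea, keeping the <myQuery> placeholder from the question, could collect the matching _ids into a single array:
db.myCollection.aggregate([
    { $match: <myQuery> },                              // same filter used for the update
    { $group: { _id: null, ids: { $push: "$_id" } } }   // collect all matching _ids into one array field
])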

Related

MongoDB query is performing slow when using $in operator

With some complex queries and aggregation, I got the list of object ids as below:
var objectIdsCollection = Array of ObjectIds
I have another collection, let's say Collection1, and it has an OtherCollectionObjectId field. Now I need to filter Collection1 based on the ObjectIds in that array.
Collection1 {
    _id: ObjectId('xyz'),
    name: ...,
    someOtherAttributes: ...,
    OtherCollectionObjectId: ObjectId('abc')
}
Below is the query I am trying. I need to use await because the result of this query depends on another query.
let queryData = await Document1.aggregate([
    {
        $match: {
            OtherCollectionObjectId: {
                $in: objectIdsCollection,
            },
            Deleted: false,
        },
    },
]);
But this query is performing very slowly; sometimes it takes around a minute to fetch the results.
I tried a couple of other suggestions from the internet, but nothing seems to work for this kind of scenario.
Please let me know anything that could improve the performance of this query.
Thanks
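One hedged sketch of a first thing to check (assuming the collection behind Document1 is named collection1) is whether a compound index covers the $match fields, since an unindexed $in forces a full collection scan:
db.collection1.createIndex({ Deleted: 1, OtherCollectionObjectId: 1 })
// explain("executionStats") shows whether the aggregation actually uses the index
db.collection1.explain("executionStats").aggregate([
    { $match: { OtherCollectionObjectId: { $in: objectIdsCollection }, Deleted: false } }
])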

Mongoose Updating an array in multiple documents by passing an array of filters to update query

I have multiple documents (3 documents in this example) in one collection that look like this:
{
    _id: 123,
    bizs: [{ _id: '', name: 'a' }, { _id: '', name: 'b' }]
},
{
    _id: 456,
    bizs: [{ _id: '', name: 'e' }, { _id: '', name: 'f' }]
},
{
    _id: 789,
    bizs: [{ _id: '', name: 'x' }, { _id: '', name: 'y' }]
}
Now, I want to update the bizs subdocument by matching with my array of ids.
That is to say, my array filter for update query is [123,789], which will match against the _id fields of each document.
I have tried using findByIdAndUpdate() but that doesn't allow an array for the update query
How can I update the 2 matching documents (as in my example above) without having to put findByIdAndUpdate inside a for loop to match each array element with the _id?
You cannot use findByIdAndUpdate to update multiple documents. findByIdAndUpdate is from Mongoose and is a wrapper around native MongoDB's findOneAndUpdate. When you pass a single string as a filter to findByIdAndUpdate, like Collection.findByIdAndUpdate('5e179dac627ef7823643cd97', {}), Mongoose internally converts the string to an ObjectId and forms the filter _id: ObjectId('5e179dac627ef7823643cd97') before executing findOneAndUpdate. So it can only update one document at a time. If you have multiple documents to update, use update with the option {multi: true}, or updateMany.
Assuming you wanted to push a new object to bizs, this is how the query would look:
collection.updateMany({ _id: { $in: [123, 789] } }, {
    $push: {
        bizs: {
            "_id": "",
            "name": "new"
        }
    }
})
Note: update operations don't return the documents in the response; they return a write result with information about the number of documents matched and modified.
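For example, a minimal sketch of reading that write result with the Node.js driver (same filter and update as above):
const result = await collection.updateMany(
    { _id: { $in: [123, 789] } },
    { $push: { bizs: { _id: "", name: "new" } } }
);
// result.matchedCount  -> number of documents that matched the filter
// result.modifiedCount -> number of documents that were actually modified
console.log(result.matchedCount, result.modifiedCount);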

How to fetch just the "_id" field from MongoDB find()

I wish to return just the document id's from mongo that match a find() query.
I know I can pass an object to exclude or include in the result set, however I cannot find a way to just return the _id field.
My thought process is returning just this bit of information is going to be way more efficient (my use case requires no other document data just the ObjectId).
An example query that I expected to work was:
collection.find({}, { _id: 1 }).toArray(function (err, docs) {
    ...
});
However this returns the entire document and not just the _id field.
You just need to use a projection to fetch what you want.
collection.find({filter criteria here}, {foo: 0, bar: 0, _id: 1});
Since I don't know what your document collection looks like, this is all I can do for you. foo: 0, for example, means exclude that property.
I found that by using the cursor object directly I can specify the required projection. With the mongodb package on npm, calling toArray() was returning the entire document regardless of the projection specified in the initial find(). Below is a fixed working example that satisfies my requirement of getting just the _id field.
Example document:
{
    _id: new ObjectId(...),
    test1: "hello",
    test2: "world!"
}
Working Projection
var cursor = collection.find({});
cursor.project({
    test1: 0,
    test2: 0
});
cursor.toArray(function (err, docs) {
    // Importantly, the docs objects here only
    // have the field _id
});
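For what it's worth, with version 3.x and later of the mongodb driver on npm, the second argument to find() is an options object, so the projection has to be wrapped; a minimal sketch of that form:
collection.find({}, { projection: { _id: 1 } }).toArray(function (err, docs) {
    // docs now contain only the _id field
});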
Because _id is by definition unique, you can use distinct to get an array of the _id values of all documents as:
collection.distinct('_id', function (err, ids) {
    ...
});
You can do it like this:
collection.find({}, '_id').toArray(function (err, docs) {
    ...
});

How to get pointer of current document to update in updateMany

I have the latest MongoDB 3.2, and there is a collection with many items that have a timestamp.
I need to convert the milliseconds to a Date object, and at the moment I use this function:
db.myColl.find().forEach(function (doc) {
    doc.date = new Date(doc.date);
    db.myColl.save(doc);
})
It takes a very long time to update 2 million documents.
I tried to use updateMany (it seems to be very fast), but how can I get access to the current document? Is there any way to rewrite the query above using updateMany?
Thank you.
You can leverage the bulk update APIs, such as the bulkWrite() method, which allow you to use an iterator to access each document, manipulate it, add the modified document to a list, and then send that list of update operations in a batch to the server for execution.
The following demonstrates this approach: use the cursor's forEach() method to iterate the collection and modify each document, at the same time pushing the update operation into a batch of about 1000 documents, which can then be sent at once using the bulkWrite() method.
This is as efficient as using updateMany(), since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({ "date": { "$exists": true, "$type": 1 } }),
    bulkUpdateOps = [];

cursor.forEach(function (doc) {
    var newDate = new Date(doc.date);
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "date": newDate } }
        }
    });

    if (bulkUpdateOps.length == 1000) {
        db.myColl.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }
At the moment a query like this is the only way to set a field's value from the document itself or from another field (you could also compute the new value from more than one field of the document).
There is a way to improve the performance of that query: execute it via the mongo shell directly on the server, so that no data is passed to the client.
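For reference, on MongoDB 4.2 or newer (not the 3.2 server in the question), updateMany also accepts an aggregation pipeline as the update document, which can reference the existing field value directly; a minimal sketch:
db.myColl.updateMany(
    { "date": { "$exists": true, "$type": 1 } },
    [ { "$set": { "date": { "$toDate": "$date" } } } ]   // $toDate converts a millisecond number to a Date
)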

How to remove duplicates based on a key in Mongodb?

I have a collection in MongoDB with around 3 million records. A sample record looks like this:
{ "_id" = ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
"source_references" : [
"_id" : ObjectId("5045xxxxxxxxxxxxxx"),
"name" : "xxx",
"key" : 123
]
}
I have a lot of duplicate records in the collection with the same source_references.key. (By duplicate I mean the same source_references.key, not the same _id.)
I want to remove duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove it if a duplicate exists.
Is there a way to remove the duplicates from within the Mongo shell?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach is required in most cases. For example, you could use aggregation as suggested in: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
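For example, the sparse variant mentioned above would look like this (the same MongoDB 2.6-or-older caveat applies):
db.things.ensureIndex({ 'source_references.key': 1 }, { unique: true, dropDups: true, sparse: true })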
This is the easiest query I used on my MongoDB 3.2
db.myCollection.find({}, { myCustomKey: 1 }).sort({ _id: 1 }).forEach(function (doc) {
    db.myCollection.remove({ _id: { $gt: doc._id }, myCustomKey: doc.myCustomKey });
})
Index your customKey before running this to increase speed
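For example (assuming the same myCustomKey field as above):
db.myCollection.createIndex({ myCustomKey: 1 })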
While @Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious when doing that. There are two other options:
Let MongoDB do it for you using Map Reduce.
Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys, and delete if that search returns more than one document.
db.collection.distinct("key").forEach((num)=>{
var i = 0;
db.collection.find({key: num}).forEach((doc)=>{
if (i) db.collection.remove({key: num}, { justOne: true })
i++
})
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create an array to store all duplicate record ids */
var duplicates = [];

/** Start aggregation pipeline */
db.collection.aggregate([
    {
        $match: { /** Add any filter here. Add an index for the filter keys */
            filterKey: {
                $exists: false
            }
        }
    },
    {
        $sort: { /** Sort it in such a way that the element you want to retain comes first */
            createdAt: -1
        }
    },
    {
        $group: {
            _id: {
                key1: "$key1", key2: "$key2" /** These are the keys that define a duplicate. Here documents with the same value for key1 and key2 are considered duplicates */
            },
            dups: {
                $push: {
                    _id: "$_id"
                }
            },
            count: {
                $sum: 1
            }
        }
    },
    {
        $match: {
            count: {
                "$gt": 1
            }
        }
    }
],
{
    allowDiskUse: true
}).forEach(function (doc) {
    doc.dups.shift();
    doc.dups.forEach(function (dupId) {
        duplicates.push(dupId._id);
    })
})

/** Delete the duplicates */
var i, j, temparray, chunk = 100000;
for (i = 0, j = duplicates.length; i < j; i += chunk) {
    temparray = duplicates.slice(i, i + chunk);
    db.collection.bulkWrite([{ deleteMany: { "filter": { "_id": { "$in": temparray } } } }])
}
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
    var i = 0;
    db.collection.find({ "field": fieldValue }).forEach(doc => {
        if (i) {
            db.collection.remove({ _id: doc._id });
        }
        i++;
        x += 1;
        if (x % 100 === 0) {
            print(x); // Every time we process 100 docs.
        }
    });
});
The improvement is basically using the document _id for the remove, which should be faster, and also printing the progress of the operation; you can change the iteration value to whatever amount you want.
Also, indexing the field before the operation helps.
pip install mongo_remove_duplicate_indexes
Create a script in any language and iterate over your collection.
Create a new collection, and create a new index on it with unique set to true. Remember, this index has to be on the same field you wish to remove duplicates from in your original collection, with the same name.
For example, you have a collection gaming, and in this collection you have a field genre which contains duplicates that you wish to remove, so just create a new collection:
db.createCollection("cname")
Create the new index:
db.cname.createIndex({ 'genre': 1 }, { unique: true })
Now when you insert a document with the same genre, only the first will be accepted; the others will be rejected with a duplicate key error.
Now just insert the JSON values you retrieved into the new collection and handle the exception using exception handling, for example pymongo.errors.DuplicateKeyError.
Check out the source code of the mongo_remove_duplicate_indexes package for a better understanding.
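The package describes a pymongo script, but a rough mongo shell sketch of the same idea (using the example gaming collection, cname target, and genre field from above) could look like this:
db.createCollection("cname");
db.cname.createIndex({ genre: 1 }, { unique: true });
db.gaming.find().forEach(function (doc) {
    try {
        db.cname.insertOne(doc);   // rejected with a duplicate key error if this genre already exists
    } catch (e) {
        // duplicate key error (code 11000) means the genre is already present, so skip the document
        if (e.code !== 11000) throw e;
    }
});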
If you have enough memory, you can do something like this in Scala:
cole.find().groupBy(_.customField).filter(_._2.size > 1).map(_._2.tail).flatten.map(_.id)
    .foreach(x => cole.remove("id" $eq x))