Upsert instead of save mongodb - mongodb

I have two collections (so and temp_so).
My query:
db.getCollection('temp_sale_order').find({},'orderItemId').
forEach(function(element) { db.getCollection('sale_order').save(element); });
I have a document with orderItemId 123 in temp_so. If a document with that orderItemId exists in so, the query should update the whole document in so. If no document with that orderItemId exists in so, it should create one.
The query should do this for all the documents in temp_so.
My current query just inserts the data into so and doesn't check whether a document with that orderItemId already exists in so. How can I fix it?
In SQL this can be written as:
insert into so ( select * from temp_so where orderItemId not in ( select orderItemId from so));
Update so set col = select col from temp_so where temp_so.orderItemId = so.orderItemId.
*update for all the columns*

Technically speaking, .save() is already a "wrapper" around an "upsert"; it just only ever matches on the _id field, and only when that field is present.
So you just need to be explicit about which field you are matching on, and also remove the existing _id:
// No projection here: the whole source document is needed for the replacement.
db.getCollection('temp_sale_order').find({}).
forEach(function(element) {
    delete element._id;
    db.getCollection('sale_order').update(
        { "orderItemId": element.orderItemId },
        element,
        { "upsert": true }
    );
});
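To see what the loop above does without a database, here is a minimal sketch in plain JavaScript. Arrays stand in for the two collections, and `upsertReplace` is a hypothetical helper written for this illustration, not a MongoDB API. It assumes the matching field is `orderItemId`, as in the question.

```javascript
// In-memory stand-in for the upsert loop: arrays play the role of collections.
// upsertReplace is a hypothetical helper, not a MongoDB API.
function upsertReplace(target, keyField, doc) {
  const replacement = { ...doc };
  delete replacement._id; // the source _id must not overwrite the target's _id
  const index = target.findIndex((d) => d[keyField] === replacement[keyField]);
  if (index === -1) {
    // No match: insert, generating a new _id as the server would.
    target.push({ _id: `gen-${target.length + 1}`, ...replacement });
    return "inserted";
  }
  // Match: replace every field but keep the existing _id.
  target[index] = { _id: target[index]._id, ...replacement };
  return "replaced";
}

const tempSaleOrder = [
  { _id: "t1", orderItemId: 123, qty: 5 },
  { _id: "t2", orderItemId: 456, qty: 1 },
];
const saleOrder = [{ _id: "s1", orderItemId: 123, qty: 2, note: "old" }];

const outcomes = tempSaleOrder.map((doc) =>
  upsertReplace(saleOrder, "orderItemId", doc)
);
console.log(outcomes, saleOrder);
```

Note that the replaced document loses its old `note` field: a replacement-style upsert swaps the whole document rather than merging fields.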
You might also decide that you don't want to replace the whole document and would rather just "insert" the first occurrence only, which is what $setOnInsert does here:
db.getCollection('temp_sale_order').find({},{ 'orderItemId': 1 }).
forEach(function(element) {
    delete element._id;
    db.getCollection('sale_order').update(
        { "orderItemId": element.orderItemId },
        { "$setOnInsert": { "orderItemId": element.orderItemId } },
        { "upsert": true }
    );
});
This means that anything inside $setOnInsert is applied only when the operation is actually an "upsert" that creates a new document. If the query simply matches an existing document, nothing in that document is changed.

Related

Using MongoDB mongo go driver for counters persistent collection [duplicate]

As the title says, I want to perform a find (one) for a document by _id and, if it doesn't exist, have it created; then, whether it was found or created, have it returned in the callback.
I don't want to update it if it exists, as I've read findAndModify does. I have seen many other questions on Stack Overflow about this but, again, I don't wish to update anything.
I am unsure whether the creation (if it doesn't exist) is actually the update everyone is talking about; it's all so confuzzling :(
Beginning with MongoDB 2.4, it's no longer necessary to rely on a unique index (or any other workaround) for atomic findOrCreate-like operations.
This is thanks to the $setOnInsert operator new to 2.4, which allows you to specify updates which should only happen when inserting documents.
This, combined with the upsert option, means you can use findAndModify to achieve an atomic findOrCreate-like operation.
db.collection.findAndModify({
query: { _id: "some potentially existing id" },
update: {
$setOnInsert: { foo: "bar" }
},
new: true, // return new doc if one is upserted
upsert: true // insert the document if it does not exist
})
As $setOnInsert only affects documents being inserted, no modification occurs if an existing document is found. If no document exists, one is upserted with the specified _id and the fields given in $setOnInsert. In both cases, the document is returned.
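The find-or-create behaviour described here can be modelled in a few lines of plain JavaScript. This is only a sketch of the semantics: a `Map` stands in for the collection, and `findOrCreate` is a hypothetical name, not part of any driver.

```javascript
// Map-based model of an upsert that uses only $setOnInsert and returns the document.
// findOrCreate is a hypothetical name used for illustration.
function findOrCreate(store, id, defaults) {
  if (!store.has(id)) {
    // Insert path: the defaults are applied exactly once.
    store.set(id, { _id: id, ...defaults });
  }
  // Both paths return the stored document; an existing one is never modified.
  return store.get(id);
}

const store = new Map();
const first = findOrCreate(store, "a1", { foo: "bar" });
const second = findOrCreate(store, "a1", { foo: "changed" });
console.log(first, second);
```

The second call passes different defaults, but since the document already exists they are ignored, which mirrors how $setOnInsert is skipped on a match.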
Driver Versions > 2
With newer driver versions (> 2), use findOneAndUpdate, since findAndModify was deprecated. The method takes three arguments: the filter, the update object (which contains the default properties to set on a newly inserted document), and the options, where you have to request the upsert. Note that later driver releases replace returnOriginal: false with returnDocument: 'after'.
Using the promise syntax, it looks like this:
const result = await collection.findOneAndUpdate(
{ _id: new ObjectId(id) },
{
$setOnInsert: { foo: "bar" },
},
{
returnOriginal: false,
upsert: true,
}
);
const newOrUpdatedDocument = result.value;
It's a bit dirty, but you can just insert it.
Be sure that the key has a unique index on it (if you use the _id it's fine, since it's already unique).
This way, if the element is already present, the insert throws an exception that you can catch.
If it isn't present, the new document is inserted.
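A minimal sketch of this insert-and-catch pattern, in plain JavaScript with no database. `UniqueCollection` and `createIfMissing` are hypothetical names made up for the illustration; the error code 11000 is the duplicate key code that MongoDB reports.

```javascript
// Hypothetical in-memory collection with a unique key.
// 11000 mirrors MongoDB's duplicate key error code.
class UniqueCollection {
  constructor(keyField) {
    this.keyField = keyField;
    this.docs = new Map();
  }
  insert(doc) {
    const key = doc[this.keyField];
    if (this.docs.has(key)) {
      const err = new Error(`duplicate key: ${key}`);
      err.code = 11000;
      throw err;
    }
    this.docs.set(key, doc);
  }
}

// Returns true if the document was created, false if it already existed.
function createIfMissing(collection, doc) {
  try {
    collection.insert(doc);
    return true;
  } catch (err) {
    if (err.code === 11000) return false; // already present: nothing modified
    throw err; // any other failure is a real error
  }
}

const users = new UniqueCollection("_id");
const created = createIfMissing(users, { _id: 1, name: "first" });
const createdAgain = createIfMissing(users, { _id: 1, name: "second" });
console.log(created, createdAgain, users.docs.get(1));
```

Only the duplicate key error is swallowed; rethrowing everything else keeps genuine failures visible.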
Updated: a detailed explanation of this technique on the MongoDB Documentation
Here's what I did (Ruby MongoDB driver):
$db[:tags].update_one({:tag => 'flat'}, {'$set' => {:tag => 'earth' }}, { :upsert => true })
It will update it if it exists, and insert it if it doesn't.

Mongoose document update- selected objects in array - with $in operator

My schema is as below.
Customer: {
orders: [{
...
status: 'Pending'
...
}]
}
I have a list of ids of orders which have been delivered to the customer, so I need to change their status to 'Closed'. That is my simple requirement. The code below updates the status to 'Closed' only for the first order id in orderIdList.
orderIdList = ["54899c0cbdde6b281e9aaa22","54899c28bdde6b281e9aaa23","54899c2abdde6b281e9aaa24","54899c2cbdde6b281e9aaa25"]
Customer.update({
'orders._id': {
$in: orderIdList
}
}, {
'$set': {
'orders.$.status': 'Closed'
}
}, {
multi: true,
upsert: true
}, function(err, rows) {
console.log("error");
console.log(err);
console.log("rows updated");
console.log(rows);
});
Only one row is updated, for the orderId "54899c0cbdde6b281e9aaa22".
Why is it not updating all the order ids? Please suggest a solution.
Thanks in advance.
which updates the status 'closed' only for the first order id in the
OrderIdList.
Yes, that is how the positional operator($) works. It updates only the first item in an array that has matched the condition in the query.
From the docs,
the positional $ operator acts as a placeholder for the first element
that matches the query document.
multi: true applies to parent documents, not to embedded documents. If it is set to true, every document that has an order matching one of the ids will be updated, but within each of those documents the positional operator still updates only the first matching subdocument.
Updating all the matching orders in all the matching documents was not possible in a single MongoDB update query at the time of writing (MongoDB 3.6 later added the filtered positional operator $[<identifier>] with arrayFilters for this). Without that, you need to perform the logic in the application code.
One way of doing it in the application code:
1. Match all the customer documents which contain an order that we are looking for.
2. For each customer document, identify the orders that we need to update.
3. Modify the order.
4. Perform an update based on the Customer document _id.
Sample Code:
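var orderIdList = ['54899c0cbdde6b281e9aaa22'];
Customer.find({ "orders._id": { $in: orderIdList } }, function (err, resp) {
    if (err) { return console.log(err); }
    resp.forEach(function (doc) {
        var orders = doc.orders;
        // Mark every order in this document whose id is in the list.
        orders.forEach(function (order) {
            if (orderIdList.indexOf(order._id.toString()) != -1) {
                order.status = 'Closed';
            }
        });
        // One update per customer document, after all its orders are modified.
        Customer.update({ "_id": doc._id }, { $set: { "orders": orders } }, function (err, aff, raw) {
            console.log("Updated: " + aff);
        });
    });
});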
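The per-document part of the steps above can be isolated as a small pure function, which makes it easy to check without Mongoose. This is a sketch only: `closeOrders` is a hypothetical helper, and `customer` is a plain object shaped like the schema in the question.

```javascript
// closeOrders is a hypothetical helper; customer is a plain object
// shaped like the Customer schema (an orders array with _id and status).
function closeOrders(customer, orderIdList) {
  const wanted = new Set(orderIdList);
  let changed = 0;
  for (const order of customer.orders) {
    // Unlike the positional $ operator, this visits every matching order.
    if (wanted.has(String(order._id)) && order.status !== "Closed") {
      order.status = "Closed";
      changed += 1;
    }
  }
  return changed; // the caller writes the orders array back only if changed > 0
}

const customer = {
  _id: "c1",
  orders: [
    { _id: "54899c0cbdde6b281e9aaa22", status: "Pending" },
    { _id: "54899c28bdde6b281e9aaa23", status: "Pending" },
    { _id: "ffffffffffffffffffffffff", status: "Pending" },
  ],
};
const changed = closeOrders(customer, [
  "54899c0cbdde6b281e9aaa22",
  "54899c28bdde6b281e9aaa23",
]);
console.log(changed, customer.orders.map((o) => o.status));
```

Returning the count lets the caller skip the write entirely when nothing in the document changed.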

How to remove duplicates based on a key in Mongodb?

I have a collection in MongoDB with around 3 million records. A sample record looks like this:
{ "_id" : ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
"source_references" : [
{
"_id" : ObjectId("5045xxxxxxxxxxxxxx"),
"name" : "xxx",
"key" : 123
}
]
}
I have a lot of duplicate records in the collection with the same source_references.key. (By duplicate I mean the same source_references.key, not the same _id.)
I want to remove duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove it if a duplicate exists.
Is there a way to remove the duplicates from within the MongoDB command line?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach will be required in most cases. For example, you could use aggregation as suggested on: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
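The keep-first behaviour, including the caveat about missing keys, can be illustrated on a plain array. This is a model of the semantics described above, not MongoDB code: `keepFirstPerKey` and its `sparse` flag are hypothetical and only mimic what the index options are described as doing.

```javascript
// keepFirstPerKey is a hypothetical helper that models
// "unique index + dropDups" on an in-memory array.
function keepFirstPerKey(docs, getKey, sparse = false) {
  const seen = new Set();
  return docs.filter((doc) => {
    let key = getKey(doc);
    if (key === undefined) {
      if (sparse) return true; // sparse index: documents without the field are skipped
      key = null; // non-sparse: a missing field is indexed as null
    }
    if (seen.has(key)) return false; // later duplicate: dropped
    seen.add(key);
    return true; // first occurrence: kept
  });
}

const docs = [
  { _id: 1, key: 123 },
  { _id: 2, key: 123 },
  { _id: 3 },
  { _id: 4 },
  { _id: 5, key: 7 },
];
const strict = keepFirstPerKey(docs, (d) => d.key).map((d) => d._id);
const sparse = keepFirstPerKey(docs, (d) => d.key, true).map((d) => d._id);
console.log(strict, sparse);
```

Without the sparse flag, document 4 is dropped only because it shares the implicit null key with document 3, which is exactly the data loss the note warns about.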
This is the easiest query I used on my MongoDB 3.2
db.myCollection.find({}, {myCustomKey:1}).sort({_id:1}).forEach(function(doc){
db.myCollection.remove({_id:{$gt:doc._id}, myCustomKey:doc.myCustomKey});
})
Index myCustomKey before running this to increase speed.
While @Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious when doing that. There are two other options:
1. Let MongoDB do it for you using Map Reduce.
2. Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys and delete documents if that search returns more than one.
db.collection.distinct("key").forEach((num)=>{
var i = 0;
db.collection.find({key: num}).forEach((doc)=>{
if (i) db.collection.remove({key: num}, { justOne: true })
i++
})
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create an array to store all duplicate record ids*/
var duplicates = [];
/** Start Aggregation pipeline*/
db.collection.aggregate([
{
$match: { /** Add any filter here. Add index for filter keys*/
filterKey: {
$exists: false
}
}
},
{
$sort: { /** Sort it in such a way that you want to retain first element*/
createdAt: -1
}
},
{
$group: {
_id: {
key1: "$key1", key2:"$key2" /** These are the keys which define the duplicate. Here document with same value for key1 and key2 will be considered duplicate*/
},
dups: {
$push: {
_id: "$_id"
}
},
count: {
$sum: 1
}
}
},
{
$match: {
count: {
"$gt": 1
}
}
}
],
{
allowDiskUse: true
}).forEach(function(doc){
doc.dups.shift();
doc.dups.forEach(function(dupId){
duplicates.push(dupId._id);
})
})
/** Delete the duplicates*/
var i,j,temparray,chunk = 100000;
for (i=0,j=duplicates.length; i<j; i+=chunk) {
temparray = duplicates.slice(i,i+chunk);
db.collection.bulkWrite([{deleteMany:{"filter":{"_id":{"$in":temparray}}}}])
}
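The logic of the pipeline and the chunked delete can be followed on a small in-memory array. The sketch below is not a replacement for the aggregation: `findDuplicateIds` and `chunk` are hypothetical helpers, `createdAt` is assumed to be numeric, and the initial $match filter is left out.

```javascript
// Hypothetical helpers mirroring the pipeline:
// sort newest first, group by the duplicate key, drop the head of each group.
function findDuplicateIds(docs, keyOf) {
  const groups = new Map();
  const newestFirst = [...docs].sort((a, b) => b.createdAt - a.createdAt);
  for (const doc of newestFirst) {
    const key = keyOf(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc._id);
  }
  // slice(1) plays the role of dups.shift(): the newest id in each group is retained.
  return [...groups.values()].flatMap((ids) => ids.slice(1));
}

// Splits the id list into batches, like the slice loop before bulkWrite.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const docs = [
  { _id: "a", key1: 1, key2: "x", createdAt: 1 },
  { _id: "b", key1: 1, key2: "x", createdAt: 3 },
  { _id: "c", key1: 1, key2: "x", createdAt: 2 },
  { _id: "d", key1: 2, key2: "y", createdAt: 1 },
];
const duplicates = findDuplicateIds(docs, (d) => `${d.key1}|${d.key2}`);
const batches = chunk(duplicates, 1);
console.log(duplicates, batches);
```

Here "b" is the newest document for the first key pair, so only "c" and "a" end up in the delete list; "d" has no duplicates and is left alone.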
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
var i = 0;
db.collection.find({ "field": fieldValue }).forEach(doc => {
if (i) {
db.collection.remove({ _id: doc._id });
}
i++;
x += 1;
if (x % 100 === 0) {
print(x); // Every time we process 100 docs.
}
});
});
The improvement is basically removing by document id, which should be faster, plus printing the progress of the operation; you can change the reporting interval to whatever you like.
Also, indexing the field before the operation helps.
pip install mongo_remove_duplicate_indexes
1. Create a script in any language.
2. Iterate over your collection.
3. Create a new collection and create a new index in it with unique set to true. This index has to be on the same field (with the same name) as the one you wish to remove duplicates from in your original collection.
For example, say you have a collection gaming with a field genre that contains duplicates you wish to remove. Create a new collection:
db.createCollection("cname")
Create the new index:
db.cname.createIndex({'genre': 1}, {unique: true})
Now, when you insert documents with the same genre, only the first will be accepted; the others will be rejected with a duplicate key error.
4. Insert the documents you read into the new collection and handle the duplicate key exception, for example pymongo.errors.DuplicateKeyError.
Check out the source code of the mongo_remove_duplicate_indexes package for a better understanding.
If you have enough memory, you can do something like this in Scala:
cole.find().groupBy(_.customField).filter(_._2.size>1).map(_._2.tail).flatten.map(_.id)
.foreach(x=>cole.remove({id $eq x}))

mongodb: upserting: only set value if document is being inserted

Considering a simple mongo document structure:
{ _id, firstTime, lastTime }
The client needs to insert a document with a known ID, or update an existing document. The 'lastTime' should always be set to the latest time. As for 'firstTime': if a document is being inserted, 'firstTime' should be set to the current time. However, if the document already exists, 'firstTime' should remain unchanged. I would like to do it purely with upserts (to avoid lookups).
I've crawled the http://www.mongodb.org/display/DOCS/Updating, but I just don't see how that particular operation can be done.
I don't believe this is something unreasonable, there are $push and $addToSet operations that effectively do that on array fields, just nothing that would do the same on simple fields. It's like there should be something like $setIf operation.
I ran into the exact same problem and there was no simple solution for <2.4. However, since 2.4 the $setOnInsert operator lets you do exactly that.
db.collection.update( <query>,
{ $setOnInsert: { "firstTime": <TIMESTAMP> } },
{ upsert: true }
)
See the 2.4 release notes of setOnInsert for more info.
I ran into a very similar problem when attempting to upsert documents based on existing content--maybe this solution will work for you also:
Try removing the _id attribute from your record and only use it in the query portion of your update (you'll have to translate from pymongo speak...)
myid = doc.get('_id')
del doc['_id']
mycollection.update({'_id':myid}, {'$set':doc}, upsert=True)
If you run the following code twice in a row, the first run sets both firstVisit and lastVisit when the document is inserted (and returns upsertedId in the response), while the second run updates only lastVisit (and returns modifiedCount: 1).
Tested with MongoDB 4.0.5, though I believe it should work with older versions too.
db.collection.updateOne(
{_id: 1},
{
$set: {
lastVisit: Date.now()
},
$setOnInsert: {
firstVisit: Date.now()
}
},
{ upsert: true }
);
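The two-run behaviour of combining $set with $setOnInsert can be modelled without a server. In this sketch a `Map` stands in for the collection and `recordVisit` is a hypothetical helper; the timestamps are passed in explicitly so the result is deterministic.

```javascript
// recordVisit is a hypothetical helper; a Map stands in for the collection.
function recordVisit(visits, id, now) {
  const existing = visits.get(id);
  if (existing) {
    existing.lastVisit = now; // the $set part: applied on every call
    return "modified";
  }
  // Insert path: both the $set and the $setOnInsert fields are written.
  visits.set(id, { _id: id, firstVisit: now, lastVisit: now });
  return "upserted";
}

const visits = new Map();
const firstRun = recordVisit(visits, 1, 1000);
const secondRun = recordVisit(visits, 1, 2000);
console.log(firstRun, secondRun, visits.get(1));
```

After the second call `firstVisit` still holds the original timestamp while `lastVisit` has moved forward, which is the split the question asks for.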
Before MongoDB 2.4 there was no way to do this with a single upsert. You had to do it as two operations: first try to insert the document, which fails with a duplicate key violation on the _id index if it already exists, then run an update to set lastTime to now.

Upserts in mongodb when using custom _id values

I need to insert a document if it doesn't exist. I know that the "upsert" option can do that, but I have some particular needs.
First I need to create the document with its _id field only, but only if it doesn't exist already. My _id field is a number generated by me (not an ObjectId). If I use the "upsert" option then I get "Mod on _id not allowed"
db.mycollection.update({ _id: id }, { _id: id }, { upsert: true });
I know that we can't use the _id in a $set.
So, my question is: is there any way to do a "create if it doesn't exist" atomically in MongoDB?
EDIT:
As proposed by @Barrie this works (using nodejs and mongoose):
var newUser = new User({ _id: id });
newUser.save(function (err) {
    if (err && err.code === 11000) {
        console.log('If duplicate key the user already exists', newUser);
        return;
    }
    console.log('New user or err', newUser);
});
But I still wonder if it is the best way to do it.
I had the same problem, but found a better solution for my needs. You can use that same query style if you simply remove the _id attribute from the update object. So if at first you get an error with this:
db.mycollection.update({ _id: id }, {$set: { _id: id, name: 'name' }}, { upsert: true });
instead use this:
db.mycollection.update({ _id: id }, {$set: { name: 'name' }}, { upsert: true });
This is better because it works for both insert and update.
UPDATE: Upsert with _id can be done without $setOnInsert, as explained by @Barrie above.
The trick is to use $setOnInsert:{_id:1} with upsert, that way the _id is only written to if it's an insert, and never for updates.
Only, there was a bug preventing this from working until v2.6 - I just tried it on 2.4 and it's not working.
The workaround I use is having another ID field with a unique index. Eg. $setOnInsert:{myId:1}.
You can just use insert(). If the document with the _id you specify already exists, the insert() will fail, nothing will be modified - so "create if it doesn't exist" is what it's already doing by default when you use insert() with a user-created _id.
Please note that $setOnInsert doesn't work easily when you upsert a plain key => value object (one without $set or other operators).
I had to use this (in PHP):
public function update($criteria , $new_object, array $options = array()){
// In 2.6, $setOnInsert with upsert == true work with _id field
if(isset($options['upsert']) && $options['upsert']){
$firstKey = array_keys($new_object)[0];
if(strpos($firstKey, '$')===0){
$new_object['$setOnInsert']['_id'] = $this->getStringId();
}
//Even, we need to check if the object exists
else if($this->findOne($criteria, ['_id'])===null){
//In this case, we need to set the _id
$new_object['_id'] = $this->getStringId();
}
}
return parent::update($criteria, $new_object, $options);
}