MongoDB updateOne with upsert failed: Duplicate Key - mongodb

I have a collection with a compound unique index on two fields, uuid and id. I want to update the document that has a given (uuid, id) pair if it exists and insert it otherwise, and I found in the documentation that updateOne with upsert=true can do this. So I use:
db.collection("messages").updateOne({uuid:this.uuid, id:new_message.id}, {$set: {uuid: this.uuid, ...new_message}}, {upsert:true})
but this always throws an error saying that a document with the duplicate values uuid=xxx and id=yyy already exists. I looked it up and found a post stating that there is a data race between the update and the insert in MongoDB's upsert operation, so this will always happen. Is there another way to do this? How do I properly and efficiently upsert into a collection with 1 million documents?
EDIT:
I gave the wrong code for this question. The code should be:
db.collection("messages").updateOne({uuid:this.uuid, key:{id:new_message.key.id}}, {$set: {uuid: this.uuid, ...new_message}}, {upsert:true})

Since you have multiple threads, this is a common problem. Any of the supported write operations in MongoDB can run into it, because it stems from your architecture rather than from a particular command.
You can catch the exception and retry the operation. One of the threads will succeed on its first attempt; the other will go through the exception handling and succeed on the retry. This is a feasible workaround.
When do you expect both threads to update the same document at the same time? If that happens regularly, it is a serious design problem, because it can alter the desired document state.
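The catch-and-retry workaround could look roughly like the sketch below. It assumes a Node.js driver collection (anything with a compatible async updateOne works); upsertWithRetry and maxAttempts are made-up names, and the only server detail relied on is that duplicate key errors carry code 11000.

```javascript
// Hypothetical helper: retries an upsert when it loses the insert race.
// `collection` is assumed to expose an async updateOne(filter, update, options).
async function upsertWithRetry(collection, filter, update, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await collection.updateOne(filter, update, { upsert: true });
    } catch (err) {
      // 11000 is MongoDB's duplicate key error code (E11000).
      const isDuplicateKey = Boolean(err) && err.code === 11000;
      if (!isDuplicateKey || attempt === maxAttempts) throw err;
      // Another writer inserted the document first; on the next attempt
      // the filter should match it and the call runs as a plain update.
    }
  }
}
```

A retry only helps when the error really comes from a race. If the filter never matches the existing document, every attempt fails the same way.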

So, after trying things out, I found that I should use dot notation in the query. I changed it to:
db.collection("messages").updateOne({uuid:this.uuid, "key.id":new_message.key.id}, {$set: {uuid: this.uuid, ...new_message}}, {upsert:true})
and now it works.
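A likely explanation, shown here as a simplified model and not as MongoDB's actual matching code: a filter such as key: {id: ...} is an equality match on the whole embedded document, so it only matches when key contains exactly that one field, while "key.id" compares just the nested field. If the stored messages carry other fields inside key (an assumption; remoteJid below is invented for illustration), the first filter matches nothing, the upsert tries to insert, and the unique index rejects the insert.

```javascript
// Simplified model of the two filter styles; not MongoDB's real matcher.

// Whole-document equality: every field (and its order) must be identical.
function matchesEmbedded(doc, keyValue) {
  return JSON.stringify(doc.key) === JSON.stringify(keyValue);
}

// Dot notation: only the nested field is compared.
function matchesDotted(doc, id) {
  return doc.key !== undefined && doc.key !== null && doc.key.id === id;
}

// Hypothetical stored message whose key holds more than just id.
const stored = { uuid: "u1", key: { id: "m1", remoteJid: "someone" } };

console.log(matchesEmbedded(stored, { id: "m1" })); // false
console.log(matchesDotted(stored, "m1"));           // true
```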

Related

How should I efficiently delete a lot of records from a mongodb collection?

I am using Mongo to store multi-tenant data. As part of data cleanup for a tenant, I want to delete everything related to the tenant. The tenantId is indexed, but there are a lot of rows, the delete takes a long time to run, and I have no easy way to get the progress.
Currently I do something like:
db.records.deleteMany({tenantId: x})
Is there a better way?
I am thinking of doing it in batches, e.g. query for x records and then build a list of ids to delete. It seems very manual, but is it the recommended way?
Some options that I can think of:
Drop the indexes that the delete does not need before deleting, and recreate them after the deletion. Keep the tenantId index, since the delete filter relies on it.
Change the write concern to a lower value, possibly 0. With w: 0 the request does not wait for any acknowledgement of the write.
db.records.deleteMany({tenantId: x}, {writeConcern: {w: 0}});
If there is another field whose values split the tenant's documents into smaller groups, try including it in the query.
Ex: if anotherField has 0, 1, 2, 3 as values, execute the delete command 4 times, each time with a different value.
db.records.deleteMany({tenantId: x, anotherField: 0}, {writeConcern: {w: 0}});
db.records.deleteMany({tenantId: x, anotherField: 1}, {writeConcern: {w: 0}});
db.records.deleteMany({tenantId: x, anotherField: 2}, {writeConcern: {w: 0}});
db.records.deleteMany({tenantId: x, anotherField: 3}, {writeConcern: {w: 0}});
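When no such field exists, the batching idea from the question gives a similar split and also yields a progress figure. The following is a minimal sketch, assuming a Node.js driver collection; deleteInBatches, batchSize and onProgress are hypothetical names, and the batch size of 1000 is arbitrary.

```javascript
// Sketch: delete matching documents in fixed-size batches and report progress.
// `collection` is assumed to be a Node.js driver Collection.
async function deleteInBatches(collection, filter, batchSize = 1000, onProgress = () => {}) {
  let totalDeleted = 0;
  for (;;) {
    // Fetch only the _id values of the next batch of matching documents.
    const batch = await collection
      .find(filter)
      .project({ _id: 1 })
      .limit(batchSize)
      .toArray();
    if (batch.length === 0) break; // nothing left to delete

    const ids = batch.map((doc) => doc._id);
    const result = await collection.deleteMany({ _id: { $in: ids } });
    totalDeleted += result.deletedCount;
    onProgress(totalDeleted); // caller can log or display the running total
  }
  return totalDeleted;
}
```

A call might look like deleteInBatches(db.collection("records"), {tenantId: x}, 5000, n => console.log(n + " deleted")). Each batch is a short operation, so other work on the collection gets a chance to run between batches.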
The performance may depend on a variety of factors, but here are some options you can try to improve it.
Bulk operations
Bulk operations might help here. bulk.find(query).remove() is a version of db.collection.remove(query) that is optimized for large numbers of operations. You can read more about it here
You can use it in the following way:
Declare a search query:
var query= {tenantId: x};
Initialize and use a bulk:
var bulk = db.yourCollection.initializeUnorderedBulkOp()
bulk.find(query).remove() // or try delete() instead of remove()
bulk.execute()
The idea here is not so much to speed up the removal as to produce less load.
You could also try bulkWrite():
db.yourCollection.bulkWrite([
  { deleteMany: { filter: query } }
])
TTL indexes
It may not be suitable for your use case, but there is an entirely different approach in which you don't remove the data yourself at all.
If it suits you to delete data based on a timestamp, then a TTL index might help. The idea is that a record is removed when its TTL expires.
Implemented as a special index type, TTL collections make it possible
to store data in MongoDB and have the mongod automatically remove data
after a specified period of time.
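Creating such an index is a single createIndex call with the expireAfterSeconds option. The sketch below wraps it in a small function; createTtlIndex and the createdAt field are assumptions, and the indexed field has to hold a date for expiry to apply.

```javascript
// Sketch: create a TTL index so that mongod removes documents on its own.
// Documents expire roughly `seconds` after the date stored in `dateField`.
function createTtlIndex(collection, dateField, seconds) {
  return collection.createIndex({ [dateField]: 1 }, { expireAfterSeconds: seconds });
}

// Usage in mongosh (collection and field names are assumptions):
// createTtlIndex(db.records, "createdAt", 60 * 60 * 24 * 30); // about 30 days
```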
deleteMany: I think there must be something in common between all the rows that you want to remove from the collection.
You can find out what it is and then create a query accordingly.
This will help you remove those records fast.
Let me give you one example. I want to remove all the records where username does not exist.
db.collection.deleteMany({ username: {$exists: false} })
The best place to start is to find something that all the records have in common, in order to remove them all at once.
For example, the following code deletes all entries that don't contain an email address.
db.users.deleteMany({ email: { $exists: false } })
The MongoDB documentation has great examples. Link provided below.
https://www.mongodb.com/docs/manual/reference/method/db.collection.deleteMany/#delete-multiple-documents
You might also want to consider dropping the index, since it can be recreated after you're done with the operation.
Finally, you might want to lower the write concern of your operation in order to speed things up. A complete list of options can be found here:
https://www.mongodb.com/docs/v5.0/reference/write-concern/#w-option
I found a good tutorial on https://www.geeksforgeeks.org/mongodb-delete-multiple-documents-using-mongoshell/ that might help you further.
apologies for any grammatical mistakes since English is not my native tongue
I would suggest two solutions. Please also export your data first, so that you have a backup if anything goes wrong, or try this in your test DB first.
1) You can use tenantId as the condition instead of matching on _id, with extra logic: delete every record that has a tenantId field, so that all of your tenant data is removed with a single query.
db.records.deleteMany({tenantId: {$exists: true}})
// Suggestion: if some of your tenant data has a tenantId field whose value is null, you can also check for a null value to delete those records.

2) Find data that is common to all of the records; if there is any, use it as the condition to delete those records.
For example, if all of your tenant data has a common field called type with the same value, use a delete statement like:
db.records.deleteMany({type : 1})

How can I add validation to adding email to my db? Upsert?

I'm trying to understand the best way to do this.
I am getting the name and email and I want to add them to my collection.
However, if the email already exists, then I don't want to insert the name and email. Is there a way to do this using upsert? I'm trying to understand it from the documentation, but it's a bit confusing for me. http://docs.mongodb.org/manual/reference/method/db.collection.update/ Any help is greatly appreciated.
First of all, you should consider creating a unique index on the email field to ensure that there can be only one document for any particular email:
db.collection.createIndex({email: 1}, {unique: true})
You could also add the sparse option to allow documents without an email.
Then you have two options, depending on your use case: use upsert, or use insert and ignore duplicate key errors.
Upsert
Using the following upsert operation
db.collection.update({email: email}, {$set: {name: name}}, {upsert: true})
you will:
create new document if there is no such email yet;
update existing document with new name if the email already exists.
Here is a quotation from the MongoDB documentation explaining upsert behavior when no document matches the query criteria:
The update creates a base document from the equality clauses in the <query> parameter, and then applies the update expressions from the <update> parameter.
Insert
If you don't want to update the name field of an existing document, you should use a basic insert operation instead:
db.collection.insert({email: email, name: name})
and ignore any E11000 (error code 11000) duplicate key errors.
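Ignoring only the duplicate key error, while still surfacing every other failure, could be sketched as follows. This assumes the unique index on email from above and a Node.js driver collection; insertIfAbsent is a hypothetical helper name.

```javascript
// Sketch: insert a document and treat a duplicate key error as "already there".
// `collection` is assumed to expose an async insertOne(doc).
async function insertIfAbsent(collection, doc) {
  try {
    await collection.insertOne(doc);
    return true; // the document was inserted
  } catch (err) {
    // E11000 (code 11000): the unique index rejected an existing email.
    if (err && err.code === 11000) return false;
    throw err; // anything else is a real failure
  }
}
```

The boolean result tells the caller whether the email was new, which is often what the validation step needs.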

Mongodb - duplicate fields in $set and $setOnInsert

In this post, the accepted answer explains that you cannot have the same fields under $set and $setOnInsert in an upsert operation.
Can someone explain why this is? It seems like the $setOnInsert shouldn't conflict with $set, since the former is used when a document is inserted, and the latter is used when the document is updated.
I faced this problem. If someone is looking for a solution, you need to understand how the $set and $setOnInsert mechanism works:
When a matching document is found, $set updates it and $setOnInsert is ignored.
When no document matches, a new record is inserted with the $setOnInsert fields, and $set is applied as well.
I did not know this and thought that only one of the operators would be applied. Once I understood it, I was able to get rid of the duplicate fields.
The $set operator is applied on an upsert insert too, so it makes no sense to reference the same fields in both $set and $setOnInsert.
Just try this on an empty collection:
db.items.remove({});
db.items.update({}, {$set: {a: 1}, $setOnInsert: {b: 2}}, {upsert: true})
db.items.find({});
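On an empty collection, the find should return a single document that contains both a: 1 and b: 2 (plus a generated _id), because the insert path applies both operators. Below is a simplified model of that behaviour in plain JavaScript, not MongoDB's implementation; applyUpsert is a hypothetical name, and the error it throws only stands in for the conflict error the server reports.

```javascript
// Simplified model of how $set and $setOnInsert combine during an upsert.
function applyUpsert(existingDoc, update) {
  const set = update.$set || {};
  const setOnInsert = update.$setOnInsert || {};

  // Both operators run on the insert path, so a shared field is ambiguous.
  for (const field of Object.keys(setOnInsert)) {
    if (field in set) {
      throw new Error(`conflicting update for field '${field}'`);
    }
  }

  if (existingDoc) {
    // Match found: only $set is applied.
    return { ...existingDoc, ...set };
  }
  // No match: the new document gets the $setOnInsert fields and the $set fields.
  return { ...setOnInsert, ...set };
}

console.log(applyUpsert(null, { $set: { a: 1 }, $setOnInsert: { b: 2 } }));
// { b: 2, a: 1 }
console.log(applyUpsert({ a: 0, b: 9 }, { $set: { a: 1 }, $setOnInsert: { b: 2 } }));
// { a: 1, b: 9 }
```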

What is 'upsert' in the context of MongoDB?

In the context of MongoDB, what is upsert?
Is this an update and insert?
Just curious as I see the usage of this term in many articles and documentation on the MongoDB website.
From the documentation: An operation that will either update the first document matched by a query or insert a new document if none matches. The new document will have the fields implied by the operation.
See http://docs.mongodb.org/manual/reference/glossary/#term-upsert
To put it in SQL terms, it is much like MySQL's INSERT ... ON DUPLICATE KEY UPDATE, except that it isn't so verbose to write.
So essentially, when you run an update and MongoDB doesn't find a matching document, it inserts one.
The condition for the upsert accepts all the same things as a normal update, and an upsert can additionally use the $setOnInsert ( http://docs.mongodb.org/manual/reference/operator/update/setOnInsert/ ) operator, which lets you define a set of fields that are only taken into consideration on an insert.
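As an illustration, one upsert can refresh some fields on every call and set others only when the document is created. This is a sketch with made-up names (upsertUser, email, name, createdAt), assuming a collection object that exposes updateOne as the Node.js driver and mongosh do.

```javascript
// Sketch: always update `name`, but set `createdAt` only on first insert.
function upsertUser(collection, email, name, now = new Date()) {
  return collection.updateOne(
    { email }, // query; its equality clause also ends up in a newly inserted document
    { $set: { name }, $setOnInsert: { createdAt: now } },
    { upsert: true }
  );
}
```

Calling it twice with the same email should leave one document whose name reflects the second call and whose createdAt still holds the time of the first.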

Is there an "upsert" option in the mongodb insert command?

I know this may be a silly question, but I read on an e-book that there is an upsert option in MongoDB insert. I couldn't find proper documentation about this. Can someone educate me about this?
Since upsert is defined as an operation that "creates a new document when no document matches the query criteria", there is no place for upserts in the insert command. It is an option for the update command. If you execute a command like the one below, it works as an update if there is a document matching the query, or as an insert of the document described by the update argument.
db.collection.update(query, update, {upsert: true})
MongoDB 3.2 adds replaceOne:
db.collection.replaceOne(query, replacement, {upsert: true})
which has similar behavior, but its replacement cannot contain update operators.
As in the links provided by PKD, db.collection.insert() provides no upsert possibility; it simply inserts a new document into a collection. Upsert is available through methods such as db.collection.update() and db.collection.save().
If you pass db.collection.insert() a document whose _id equals the _id of a document already in the collection, it will throw a duplicate key exception.
For upserting a single document using the Java driver:
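import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndReplaceOptions;

// Replace the matching document, or insert `document` if none matches.
FindOneAndReplaceOptions replaceOptions = new FindOneAndReplaceOptions();
replaceOptions.upsert(true);
collection.findOneAndReplace(
    Filters.eq("key", "value"),
    document,
    replaceOptions
);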
Note that uniqueness of the field used in Filters.eq("key", "value") should be ensured (for example with a unique index); otherwise there is a possibility of adding multiple documents. See this for more