What is the correct way to insert multiple documents, say 5,000 of them, in one command, where any document that collides with an existing one on a unique index replaces it on all fields?
For instance, out of the 5,000 documents, 1,792 are new (no duplicates by unique index), so they are inserted, and 3,208 have duplicates in the collection by unique index, so they should replace the existing documents entirely.
I tried insertMany() with the unordered option, but it seems to skip duplicate documents.
And updateMany() with upsert: true isn't for inserting multiple documents, only for updating certain fields in a collection?
Is this possible at all?
========Example=========
For a business collection with unique index of field "name":
{"name":"Google", "address":"...", "employees":38571, "phone":12345}
{"name":"Microsoft", "address":"...", "employees":73859, "phone":54321}
{"name":"Apple", "address":"...", "employees":55177, "phone":88888}
{"name":"Meta", "address":"...", "employees":88901, "phone":77777}
Now we want to update the collection with these 4 documents:
{"name":"Apple", "address":"...", "employees":55177, "phone":22222}
{"name":"Dell", "address":"...", "employees":77889, "phone":11223}
{"name":"Google", "address":"...", "employees":33333, "phone":44444}
{"name":"IBM", "address":"...", "employees":77777, "phone":88888}
In MySQL, I could just do this in one query:
INSERT INTO business (name, address, employees, phone)
VALUES
('Apple', '...', 55177, 22222),
('Dell', '...', 77889, 11223),
('Google', '...', 33333, 44444),
('IBM', '...', 77777, 88888)
AS new
ON DUPLICATE KEY UPDATE
address = new.address,
employees = new.employees,
phone = new.phone
And the collection documents become:
{"name":"Google", "address":"...", "employees":33333, "phone":44444} # updated
{"name":"Microsoft", "address":"...", "employees":73859, "phone":54321} # no change
{"name":"Apple", "address":"...", "employees":55177, "phone":22222} # updated
{"name":"Meta", "address":"...", "employees":88901, "phone":77777} # no change
{"name":"Dell", "address":"...", "employees":77889, "phone":11223} # inserted
{"name":"IBM", "address":"...", "employees":77777, "phone":88888} # inserted
How do I do this in MongoDB?
You probably just need $merge. Put the documents you want to upsert into another collection (say, toBeInserted), then $merge toBeInserted into the existing collection.
db.toBeInserted.aggregate([
  {
    "$project": {
      // select the relevant fields
      _id: 0,
      name: 1,
      address: 1,
      employees: 1,
      phone: 1
    }
  },
  {
    "$merge": {
      "into": "companies",
      "on": "name",
      "whenMatched": "merge",
      "whenNotMatched": "insert"
    }
  }
])
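If staging the data in a helper collection is inconvenient, an alternative is a single bulkWrite() of replaceOne operations with upsert: true, which replaces on a unique-key hit and inserts otherwise. A sketch, assuming the collection is named business as in the question; the driver call is left commented so the snippet stands alone:

```javascript
// Incoming documents; "name" is the unique key.
const docs = [
  { name: "Apple", address: "...", employees: 55177, phone: 22222 },
  { name: "Dell", address: "...", employees: 77889, phone: 11223 },
  { name: "Google", address: "...", employees: 33333, phone: 44444 },
  { name: "IBM", address: "...", employees: 77777, phone: 88888 },
];

// One replaceOne-with-upsert operation per document: existing documents
// are replaced wholesale, missing ones are inserted.
const ops = docs.map((doc) => ({
  replaceOne: { filter: { name: doc.name }, replacement: doc, upsert: true },
}));

// db.business.bulkWrite(ops, { ordered: false });
```

Unordered mode lets the server apply the 5,000 operations without stopping at the first error.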
I want to delete the documents from one collection (collection2) that do not match documents in another collection (collection1).
collection1 document: {_id: <autogen>, items: [{key: 'key', ...}]}
collection2 documents: {_id: 'key', action: {key1: 'key1', ...}}, {_id: 'non_matching_key', action: {key1: 'key1', ...}}
Delete all documents from collection2 whose _id does not match any items.key in collection1. Referring to the example above, the document with _id 'non_matching_key' should be deleted from collection2. There would be more documents in collection2 similar to that one.
The approach I thought of was mark and sweep:
Mark: add a field to every collection2 document whose _id matches some items.key in collection1.
Sweep: delete all documents from collection2 where the newly added field does not exist.
Could you please advise if there is a better way of doing this?
Thanks,
It's not fully clear what your documents look like and what the exact conditions are, but you could do it like this:
var ids = db.collection1.distinct("items.key");
db.collection2.deleteMany({ _id: { $nin: ids } });
So basically, you want to collect the _ids from one collection and delete the documents in the other collection whose _id is not among them. Document by document:
db.collection1.find({}).forEach((doc) => {
  if (!db.collection2.findOne({ _id: doc._id })) {
    db.collection1.deleteOne({ _id: doc._id });
  }
});
or
let idsToDelete = db.collection2.distinct('_id')
let deleteResponse = db.collection1.deleteMany({_id: {$nin: idsToDelete}})
Swap the collection names to go the other way.
NOTE: The code is just to give an overview and is not tested.
I have this Document structure:
{
  id: "xxxxx",
  name: "John",
  pets: [
    {
      "id": "yyyyyy",
      "type": "Chihuahua"
    },
    {
      "id": "zzzzzz",
      "type": "Labrador"
    }
  ]
}
The pets field is an array of embedded documents (it does not reference any other collection).
I want the pet ids to be unique both across documents and within a single document, but the MongoDB official docs say it's not possible and don't offer another solution:
For unique indexes, the unique constraint applies across separate documents in the collection rather than within a single document.
Because the unique constraint applies to separate documents, for a
unique multikey index, a document may have array elements that result
in repeating index key values as long as the index key values for that
document do not duplicate those of another document.
https://docs.mongodb.com/manual/core/index-multikey/
I have tried this using the MongoDB Go driver:
_, err = collection.Indexes().CreateOne(context.TODO(), mongo.IndexModel{
    Keys:    bson.M{"pets.id": 1},
    Options: options.Index().SetUnique(true),
})
but like the docs say, it allows two pets of the same person to have the same id, while not allowing a pet of a different person to share an id with a pet of the first person...
Is there any way to enforce this in MongoDB?
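A hedged workaround for the within-document half, assuming MongoDB 3.6+ (where collection validators may contain $expr) and a collection name people that is not in the question: keep the unique multikey index for the across-documents half, and add a validator that rejects any document whose pets.id values contain repeats. The sketch shows the shell command as a comment and the equivalent check in plain JavaScript:

```javascript
// Validator idea: a document is valid when its "pets.id" values,
// deduplicated, have the same count as the raw values.
// Shell form (collection name "people" is an assumption):
//
// db.runCommand({
//   collMod: "people",
//   validator: {
//     $expr: {
//       $eq: [
//         { $size: "$pets.id" },
//         { $size: { $setUnion: ["$pets.id", []] } }
//       ]
//     }
//   }
// });
//
// The same check in plain JavaScript:
function petIdsUnique(doc) {
  const ids = doc.pets.map((p) => p.id);
  return new Set(ids).size === ids.length; // duplicates shrink the set
}

const ok = petIdsUnique({
  name: "John",
  pets: [{ id: "yyyyyy", type: "Chihuahua" }, { id: "zzzzzz", type: "Labrador" }],
});
const dup = petIdsUnique({
  name: "Jane",
  pets: [{ id: "yyyyyy", type: "Pug" }, { id: "yyyyyy", type: "Beagle" }],
});
```

The validator fires on inserts and updates, so together with the unique index both halves of the constraint are covered; this is untested against a live server.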
I have a collection that looks like:
transactions
{
"_id": ...,
"status": "new"
}
processed
{
"_id": id value from collection above
}
So I want to update the status value in the transactions collection to "processed" if there is a record in the processed collection.
The processed _id value comes from the transactions collection _id.
Is this possible to do in the mongo console, or do I have to do it in code?
You can do that with a $merge stage in aggregation.
remove all fields except the _id
add any fields with the updated value
merge with the other collection
db.processed.aggregate([
  { $project: { _id: 1 } },
  { $addFields: { status: "processed" } },
  { $merge: "transactions" }
])
According to the API doc of MongoDB, $addToSet is described as follows:
If the value is a document, MongoDB determines that the document is a
duplicate if an existing document in the array matches the to-be-added
document exactly; i.e. the existing document has the exact same fields
and values and the fields are in the same order. As such, field order
matters and you cannot specify that MongoDB compare only a subset of
the fields in the document to determine whether the document is a
duplicate of an existing array element.
So, other than $addToSet, what is the best way to insert a new document into an array only when there is no duplicate, comparing only specific fields to determine duplication? Let's create an example.
[{
"Name" : "Python",
"Month" : new Date("2020-01-01"),
"Sales":[
{"date": new Date("2020-01-01"), "volume":1, product: ["a"]},
{"date": new Date("2020-01-02"), "volume":2, product: ["a"]},
{"date": new Date("2020-01-03"), "volume":3, product: ["a","b"]},
{"date": new Date("2020-01-04"), "volume":4, product: ["a"]},
{"date": new Date("2020-01-05"), "volume":5, product: ["a","b"]},
]
}]
I would like to insert new embedded documents into the Sales array field of the example document above.
Edit: the sub-documents use date as the key:
[{"date": new Date("2020-01-05"), "volume":8, product: ["a","b","d"]}, // date already exists
 {"date": new Date("2020-01-06"), "volume":6, product: ["a","b","c"]}] // new data
Three questions:
1. For data that already exists in the database, how do I update it with the new data?
2. How do I perform this in batch instead of inserting one by one (i.e. pass an array, with the check and update done per element)?
3. How do I insert only when there is no duplicate?
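One hedged approach to all three points at once, assuming MongoDB 4.2+ (pipeline-style updates) and a collection name sales that is not in the question: issue a single update whose pipeline drops the Sales entries whose date appears in the incoming batch and then concatenates the batch. The sketch shows the shell form as a comment and the merge-by-date logic in plain JavaScript (dates simplified to strings for the simulation):

```javascript
// Shell form (MongoDB 4.2+; "sales" is an assumed collection name,
// "incoming" is the batch of new sub-documents):
//
// db.sales.updateOne({ Name: "Python" }, [{
//   $set: { Sales: { $concatArrays: [
//     { $filter: {
//         input: "$Sales",
//         cond: { $not: { $in: ["$$this.date", incoming.map(d => d.date)] } }
//     } },
//     incoming
//   ] } }
// }]);
//
// The same merge-by-date logic in plain JavaScript:
function mergeByDate(existing, incoming) {
  const newDates = new Set(incoming.map((d) => d.date));
  // keep only existing entries whose date is NOT being re-sent, then append the batch
  return existing.filter((d) => !newDates.has(d.date)).concat(incoming);
}

const existing = [
  { date: "2020-01-04", volume: 4, product: ["a"] },
  { date: "2020-01-05", volume: 5, product: ["a", "b"] },
];
const incoming = [
  { date: "2020-01-05", volume: 8, product: ["a", "b", "d"] }, // replaces old entry
  { date: "2020-01-06", volume: 6, product: ["a", "b", "c"] }, // appended as new
];
const merged = mergeByDate(existing, incoming);
```

Because the whole batch is handled in one update expression, there is no per-element round trip, and duplicates (by date) are replaced rather than appended.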
There is a database in MongoDB containing a products collection, which was created by importing from a CSV file with a unique _id. Each document in products has a field articul corresponding to the manufacturer's article number, and a field size indicating the size of the product. Since one product can come in several sizes, the import creates multiple documents with the same articul but different size values.
How do I select from products and create another collection holding one document per unique articul that contains all the size values for that articul?
What you are looking for is aggregation. You can group the documents on articul and save the result in a different collection:
db.products.aggregate([
  { $group: { _id: '$articul', sizes: { $addToSet: '$size' } } },
  { $out: 'articles' }
])
To add a new size to an existing articul in the new collection (note that $group puts the articul into _id):
db.articles.updateOne(
  { "_id": "Banana" },
  { $addToSet: { sizes: 9 } }
)
If nothing matched the above query, you need a separate insert:
db.articles.insertOne({
  "_id": "Apple", "sizes": [8]
});
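As an aside, a match-then-insert pair like this can be collapsed into one command with upsert: true, which inserts a fresh document when the filter matches nothing. A sketch, with the shell form in a comment and the upsert semantics simulated in plain JavaScript:

```javascript
// Shell form (one command covers both the update and the insert case):
//
// db.articles.updateOne(
//   { _id: "Apple" },
//   { $addToSet: { sizes: 8 } },
//   { upsert: true }
// );
//
// Upsert semantics, simulated in plain JavaScript:
function upsertSize(collection, id, size) {
  const doc = collection.find((d) => d._id === id);
  if (doc) {
    if (!doc.sizes.includes(size)) doc.sizes.push(size); // $addToSet: no duplicates
  } else {
    collection.push({ _id: id, sizes: [size] }); // upsert inserts a new document
  }
}

const articles = [{ _id: "Banana", sizes: [5] }];
upsertSize(articles, "Banana", 9); // existing document gets the new size
upsertSize(articles, "Apple", 8);  // missing document is inserted
```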