Compare only a subset of the fields in MongoDB

According to the MongoDB API docs, $addToSet is described as follows:
If the value is a document, MongoDB determines that the document is a
duplicate if an existing document in the array matches the to-be-added
document exactly; i.e. the existing document has the exact same fields
and values and the fields are in the same order. As such, field order
matters and you cannot specify that MongoDB compare only a subset of
the fields in the document to determine whether the document is a
duplicate of an existing array element.
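In other words, even a reordered copy of the same fields counts as a new element. A hypothetical illustration (the collection name c is a placeholder):

// The second update adds another element, because {b: 2, a: 1} does not
// match {a: 1, b: 2} exactly: the field order differs.
db.c.updateOne({ _id: 1 }, { $addToSet: { arr: { a: 1, b: 2 } } }, { upsert: true })
db.c.updateOne({ _id: 1 }, { $addToSet: { arr: { b: 2, a: 1 } } })
// arr is now [ { a: 1, b: 2 }, { b: 2, a: 1 } ]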
So other than $addToSet, what is the best way to insert a new document into an array only when there is no duplicate (comparing only specific fields to determine duplication)? Let's create an example.
[{
    "Name": "Python",
    "Month": new Date("2020-01-01"),
    "Sales": [
        {"date": new Date("2020-01-01"), "volume": 1, product: ["a"]},
        {"date": new Date("2020-01-02"), "volume": 2, product: ["a"]},
        {"date": new Date("2020-01-03"), "volume": 3, product: ["a","b"]},
        {"date": new Date("2020-01-04"), "volume": 4, product: ["a"]},
        {"date": new Date("2020-01-05"), "volume": 5, product: ["a","b"]}
    ]
}]
I would like to insert new embedded documents into the Sales array field of the example document above.
Edit: the sub-documents use date as the key.
[{"date": new Date("2020-01-05"), "volume": 8, product: ["a","b","d"]},  // date already exists
{"date": new Date("2020-01-06"), "volume": 6, product: ["a","b","c"]}]  // new data
Three questions:
1. For data that already exists in the database, I would like to update it with the new data.
2. How to perform this in batch, instead of inserting one by one (i.e., insert an array, performing the check and update for each element)?
3. Insert only when there is no duplicate.
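One way to cover all three at once is a bulkWrite that issues two guarded updates per incoming element. This is a sketch, not the only approach; the collection name db.sales and the assumption that Name identifies the target document are mine:

var newSales = [
    { date: new Date("2020-01-05"), volume: 8, product: ["a", "b", "d"] },
    { date: new Date("2020-01-06"), volume: 6, product: ["a", "b", "c"] }
];

var ops = [];
newSales.forEach(function (s) {
    // 1. Overwrite the matching array element if the date already exists.
    ops.push({
        updateOne: {
            filter: { Name: "Python", "Sales.date": s.date },
            update: { $set: { "Sales.$": s } }
        }
    });
    // 2. Push the element only when no entry with that date exists.
    ops.push({
        updateOne: {
            filter: { Name: "Python", "Sales.date": { $ne: s.date } },
            update: { $push: { Sales: s } }
        }
    });
});

// ordered: true matters: the overwrite must run before the guarded push,
// so a freshly updated date is never pushed a second time.
db.sales.bulkWrite(ops, { ordered: true });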

Related

Insert multiple documents on duplicate update existing document with the new document?

What is the correct method for inserting multiple documents, say 5,000 of them, in one command, updating existing documents with the new documents on all fields whenever there is a duplicate on a unique index?
For instance, out of the 5,000 documents, 1,792 of them are new with no duplicates by unique index, so they are inserted, and 3,208 of them have duplicates in the collection by unique index, so they should replace the existing ones on all values.
I tried insertMany() with the unordered option, but it seems to skip duplicate documents.
And updateMany() with upsert: true isn't for inserting multiple documents, but only for updating certain fields in a collection?
Is this possible at all?
========Example=========
For a business collection with unique index of field "name":
{"name":"Google", "address":"...", "employees":38571, "phone":12345}
{"name":"Microsoft", "address":"...", "employees":73859, "phone":54321}
{"name":"Apple", "address":"...", "employees":55177, "phone":88888}
{"name":"Meta", "address":"...", "employees":88901, "phone":77777}
Now we want to update the collection with these 4 documents:
{"name":"Apple", "address":"...", "employees":55177, "phone":22222}
{"name":"Dell", "address":"...", "employees":77889, "phone":11223}
{"name":"Google", "address":"...", "employees":33333, "phone":44444}
{"name":"IBM", "address":"...", "employees":77777, "phone":88888}
In MySQL, I could just do this in one query:
INSERT INTO business (name, address, employees, phone)
VALUES
    ('Apple', '...', 55177, 22222),
    ('Dell', '...', 77889, 11223),
    ('Google', '...', 33333, 44444),
    ('IBM', '...', 77777, 88888)
AS new
ON DUPLICATE KEY UPDATE
    address = new.address,
    employees = new.employees,
    phone = new.phone
And the collection documents become:
{"name":"Google", "address":"...", "employees":33333, "phone":44444} # updated
{"name":"Microsoft", "address":"...", "employees":73859, "phone":54321} # no change
{"name":"Apple", "address":"...", "employees":55177, "phone":22222} # updated
{"name":"Meta", "address":"...", "employees":88901, "phone":77777} # no change
{"name":"Dell", "address":"...", "employees":77889, "phone":11223} # inserted
{"name":"IBM", "address":"...", "employees":77777, "phone":88888} # inserted
How do I do this in MongoDB?
You probably just need $merge. Put the documents you need to process into another collection (say, toBeInserted), then $merge toBeInserted into the existing collection.
db.toBeInserted.aggregate([
    {
        "$project": {
            // select the relevant fields
            "_id": 0,
            "name": 1,
            "address": 1,
            "employees": 1,
            "phone": 1
        }
    },
    {
        "$merge": {
            "into": "companies",
            "on": "name",
            "whenMatched": "merge",
            "whenNotMatched": "insert"
        }
    }
])
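If the new documents are already in application memory rather than in a staging collection, a bulkWrite of replaceOne upserts is a possible alternative (a sketch; it replaces matched documents on all fields, which is what the question asks for):

var docs = [
    { "name": "Apple", "address": "...", "employees": 55177, "phone": 22222 },
    { "name": "Dell", "address": "...", "employees": 77889, "phone": 11223 },
    { "name": "Google", "address": "...", "employees": 33333, "phone": 44444 },
    { "name": "IBM", "address": "...", "employees": 77777, "phone": 88888 }
];

db.companies.bulkWrite(
    docs.map(function (doc) {
        return {
            replaceOne: {
                filter: { "name": doc.name },   // matches via the unique index
                replacement: doc,
                upsert: true                    // insert when there is no match
            }
        };
    }),
    { ordered: false }                          // unordered for throughput
);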

Unique multikey in mongodb for field of array of embedded fields

I have this Document structure:
id: "xxxxx",
name: "John",
pets: [
{
"id": "yyyyyy",
"type": "Chihuahua",
},
{
"id": "zzzzzz",
"type": "Labrador",
}
]
The pets field is an array of embedded documents (it does not reference any other collection).
I want the pet ids to be unique across separate documents and within the same document, but the official MongoDB docs say this is not possible and do not offer another solution:
For unique indexes, the unique constraint applies across separate documents in the collection rather than within a single document.
Because the unique constraint applies to separate documents, for a
unique multikey index, a document may have array elements that result
in repeating index key values as long as the index key values for that
document do not duplicate those of another document.
https://docs.mongodb.com/manual/core/index-multikey/
I have tried this using mongodb golang driver:
_, err = collection.Indexes().CreateOne(context.TODO(), mongo.IndexModel{
Keys: bson.M{"pets.id": 1},
Options: options.Index().SetUnique(true),
})
but, as the docs say, it allows two pets of the same person to have the same ID, while not allowing a pet of a different person to share an ID with a pet of the first person...
Is there any way to enforce this in MongoDB?
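One common workaround (a sketch; personId and newPet are placeholders): since no index can enforce uniqueness within a single document's array, guard each push so it only succeeds when the id is absent, and keep the unique multikey index for the cross-document half.

// Guarded push: the person only matches if no existing pet has newPet.id,
// so a duplicate id is never inserted into the same document.
db.people.updateOne(
    { _id: personId, "pets.id": { $ne: newPet.id } },
    { $push: { pets: newPet } }
)
// A matchedCount of 0 means the id was already present (or the person
// does not exist), and the application can react accordingly.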

Query in mongodb mongoose

There is a MongoDB database containing a products collection, which was created by importing from a CSV file with unique _id values. Each document in products has a field articul corresponding to the manufacturer's article number, and a field size indicating the size of the product. Since one product can come in several sizes, the import creates documents that share the same articul but have different size values.
How do I query products and create another collection containing one document per unique articul, holding all the size values for that articul?
What you are looking for is aggregation. You can group the documents on articul and save the result in a different collection:
db.products.aggregate([
    { $group: { _id: '$articul', sizes: { $addToSet: '$size' } } },
    { $out: 'articles' }
])
To store a new size for an existing articul:
db.products.update(
    { "articul": "Banana" },
    { $addToSet: { size: 9 } }
)
If nothing matched the above query, you need to create a new insert:
db.products.insertOne({
"articul": "Apple", "size": [8]
});
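Alternatively, the two steps can be combined into a single statement with upsert (a sketch, assuming articul identifies the document): if no document with that articul exists, the upsert creates one with size: [9]; otherwise $addToSet appends the size.

db.products.update(
    { "articul": "Banana" },
    { $addToSet: { size: 9 } },
    { upsert: true }    // creates { articul: "Banana", size: [9] } when missing
)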

How do I manage a sublist in Mongodb?

I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type).
I'm interested in using Mongodb to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with document references", they have the following:
{
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA",
    books: [12346789, 234567890, ...]
}
I am currently importing the data with a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) I need to create an index on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the IDs to insert each new related ID into books?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
    "publisher": "O'Reilly Media",
    "founded": 1980,
    "location": "CA",
    "book": 12346789
},
{
    "publisher": "O'Reilly Media",
    "founded": 1980,
    "location": "CA",
    "book": 234567890
}
So in order to get those sorts of row results into the structure you wanted, one way to do this would be using the "upsert" functionality of the .update() method.
Assuming you have some way of looping the input values, and that they are identified with some structure, an analog to this would be something like:
books.forEach(function(book) {
    db.publishers.update(
        { "name": book.publisher },
        {
            "$setOnInsert": {
                "founded": book.founded,
                "location": book.location
            },
            "$addToSet": { "books": book.book }
        },
        { "upsert": true }
    );
})
This essentially simplifies the code so that MongoDB does all of the data collection work for you. Where the "name" of the publisher is considered to be unique, the statement first searches for a document in the collection that matches the given query condition on "name".
In the case where that document is not found, then a new document is inserted. So either the database or driver will take care of creating the new _id value for this document and your "condition" is also automatically inserted to the new document since it was an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under the $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server in order to add the new entry to the "books" array (set) and where it does not already exist.
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However it is not very "batch"-like, as you are still performing one operation against the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
    batch.push({
        "q": { "name": book.publisher },
        "u": {
            "$setOnInsert": {
                "founded": book.founded,
                "location": book.location
            },
            "$addToSet": { "books": book.book }
        },
        "upsert": true
    });

    // Send the accumulated statements to the server every 500 items
    if ( batch.length % 500 == 0 ) {
        db.runCommand({ "update": "publishers", "updates": batch });
        batch = [];
    }
});

// Send any remaining statements
if ( batch.length > 0 ) {
    db.runCommand({ "update": "publishers", "updates": batch });
}
So what this does is set up all of the constructed update statements and send them to the server in a single call, with a sensible number of operations per batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be altered as appropriate to your data.
If your MongoDB version is lower than 2.6, then you can either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert, then you need to do all of the pre-aggregation work within your own code.
All of these methods are of course supported by the PHP driver, so it is just a matter of adapting this to your actual code and deciding which course you want to take.
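For reference, in modern drivers and shells the same pattern is usually written with bulkWrite, which handles the batching for you. A minimal sketch, assuming the same books input and publishers collection as above:

db.publishers.bulkWrite(
    books.map(function (book) {
        return {
            updateOne: {
                filter: { "name": book.publisher },
                update: {
                    "$setOnInsert": {
                        "founded": book.founded,
                        "location": book.location
                    },
                    "$addToSet": { "books": book.book }
                },
                upsert: true    // insert when no publisher matches
            }
        };
    })
);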

Clone MongoDB objects in Meteorjs by changing value of a single field

I am working on a Meteorjs application which uses MongoDB on the back end.
In my collection there are some objects which have a common field named parent_id, e.g.
{name:'A',parent_id:'acd'}
{name:'b',parent_id:'acd'}
{name:'c',parent_id:'acd'}
{name:'d',parent_id:'acd'}
I want to copy all of these objects in the database, changing the parent_id field, e.g.
{name:'A',parent_id:'acdef'}
{name:'b',parent_id:'acdef'}
{name:'c',parent_id:'acdef'}
{name:'d',parent_id:'acdef'}
so that afterwards all of these objects are in the database like this:
{name:'A',parent_id:'acd'}
{name:'b',parent_id:'acd'}
{name:'c',parent_id:'acd'}
{name:'d',parent_id:'acd'}
{name:'A',parent_id:'acdef'}
{name:'b',parent_id:'acdef'}
{name:'c',parent_id:'acdef'}
{name:'d',parent_id:'acdef'}
For this I have fetched the elements from the db which have parent_id: 'acd':
items = db.collection.find({parent_id: 'acd'}).fetch()
and using a loop I have changed the parent_id of each item and then tried this command:
for (i = 0; i < items.length; i++) {
    items[i].parent_id = 'acdef';
    meteor.collection.insert(items[i])
}
but it is giving me a duplicate _id error.
Well it will unless you delete the _id value from the object first:
for (i = 0; i < items.length; i++) {
    items[i].parent_id = 'acdef';
    delete items[i]["_id"];
    meteor.collection.insert(items[i])
}
So the delete should clear that up, and a new _id will be generated.
When you Collection.find() your documents you can use the field specifier to exclude the _id field.
var items = collection.find({}, {fields: {name: 1, parent_id: 1, _id: 0}}).fetch();
Then when you modify and insert those documents again, they will be duplicates with each having its own unique _id.
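Putting both answers together, a minimal sketch in Meteor (using the example's values; the collection handle is a placeholder):

// Fetch the originals without _id so each insert gets a fresh one.
var items = collection.find(
    { parent_id: 'acd' },
    { fields: { _id: 0 } }
).fetch();

items.forEach(function (item) {
    item.parent_id = 'acdef';   // retag the copy
    collection.insert(item);    // a new _id is generated automatically
});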