I have the following two schemas:
var SchemaOne = new mongoose.Schema({
id_headline: { type: String, required: true },
tags: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Tag' }]
});
var tagSchema = new mongoose.Schema({
_id: { type: String, required: true, index: { unique: true } }, // value
name: { type: String, required: true }
});
As you can see, in the first schema there is an array of references to the second schema.
My problem is:
Suppose that, in my backend server, I receive an array of tags (just the ids) and, before creating the SchemaOne document, I need to verify whether the received tags already exist in the database and, if not, create them. Only after all the tags are stored in the database can I assign this array to the tags array of the SchemaOne document to be created.
I'm not sure how to implement this. Can you give me a helping hand?
So let's assume you have input being sent to your server that essentially resolves to this:
var input = {
"id_headline": "title",
"tags": [
{ "name": "one" },
{ "name": "two" }
]
};
And as you state, you are not sure whether any of the "tags" entries already exist, but of course the "name" is also unique for looking up the associated object.
What you are basically going to have to do here is "look up" each of the elements within "tags" and return the document with the reference to use for the objects in the "Tag" model. The ideal method here is .findOneAndUpdate(), with the "upsert" option set to true. This will create the document in the collection where it is not found, and in any case will return the document content with the reference that was created.
Note that naturally, you want to ensure you have those array items resolved "first", before proceeding to save the main "SchemaOne" object. The async library has some methods that help structure this:
async.waterfall(
    [
        // Stage 1: upsert each tag and collect the resulting documents
        function(callback) {
            async.map(input.tags, function(tag, callback) {
                Tag.findOneAndUpdate(
                    { "name": tag.name },
                    { "$setOnInsert": { "name": tag.name } },
                    { "upsert": true, "new": true },
                    callback
                );
            }, callback);
        },
        // Stage 2: upsert the main document ("Model" being the model compiled
        // from SchemaOne) and add the tag references to its array as a set
        function(tags, callback) {
            Model.findOneAndUpdate(
                { "id_headline": input.id_headline },
                { "$addToSet": {
                    "tags": { "$each": tags.map(function(tag) { return tag._id }) }
                }},
                { "upsert": true, "new": true },
                callback
            );
        }
    ],
    function(err, result) {
        // if err then do something to report it, otherwise it's done
    }
);
So async.waterfall is a special flow control method that passes the result returned from each function in the array of arguments on to the next one, right up until the end of execution, where you can optionally pass in the result of the final function in the list. It basically "cascades" or "waterfalls" results down to each step. This is used here to pass the results of the "tags" creation into the main model creation/modification.
The async.map within the first executed stage looks at each of the elements within the input array. So for each item contained in "tags", the .findOneAndUpdate() method is called to look for, and possibly create if not found, the specified "tag" entry in the collection.
Since the output of .map() is going to be an array of those documents, it is simply passed through to the next stage. Each iteration returns a document, so when the iteration is complete you have all the documents.
The next usage of .findOneAndUpdate() with "upsert" is optional, and of course considers that the document with the matching "id_headline" may or may not exist. The same applies: if it is there then the "update" is processed; if not, it is simply created. You could optionally .insert() or .create() if the document was known not to be there, but the "update" action gives some interesting options.
Namely here is the usage of $addToSet, where if the document already existed then the specified items would be "added" to any content that was already there, and of course as a "set", any items already present would not be new additions. Note that only the _id fields are required here when adding to the array with an atomic operator, hence the .map() function employed.
An alternate case on "updating" could be to simply "replace" the array content using the $set atomic operation if it was the intent to only store those items that were mentioned in the input and no others.
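For illustration, a sketch of that alternative: the second stage's update document could instead use $set with the mapped ids (same placeholder Model, input, and tags as above):

// Sketch only: replaces the whole "tags" array instead of merging into it
Model.findOneAndUpdate(
    { "id_headline": input.id_headline },
    { "$set": {
        "tags": tags.map(function(tag) { return tag._id })
    }},
    { "upsert": true, "new": true },
    callback
)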
In a similar manner the $setOnInsert shown when "creating"/"looking for" items in "Tags" makes sure that there is only actual "modification" when the object is "created/inserted", and that removes some write overhead on the server.
So using .findOneAndUpdate(), at least for the "Tags" entries, is the most optimal way of handling this. It avoids double handling such as:
Querying to see if the document exists by name
If no result is returned, sending an additional statement to create one
That means two operations to the database with communication back and forth, which the actions here using "upserts" simplifies into a single request for each item.
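For what it's worth, on promise-capable Mongoose versions the same pattern can be expressed without the async library. A minimal sketch, assuming Tag and Model are the models compiled from the schemas above:

// Sketch only: upsert every tag first, then upsert the main document
async function createWithTags(input) {
    // Upsert each tag and collect the resulting documents
    const tags = await Promise.all(input.tags.map(tag =>
        Tag.findOneAndUpdate(
            { name: tag.name },
            { $setOnInsert: { name: tag.name } },
            { upsert: true, new: true }
        ).exec()
    ));

    // Then upsert the main document, adding the tag references as a set
    return Model.findOneAndUpdate(
        { id_headline: input.id_headline },
        { $addToSet: { tags: { $each: tags.map(tag => tag._id) } } },
        { upsert: true, new: true }
    ).exec();
}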
How can I update a mongo document with the following requirements:
Find a document by email property:
If the document exists:
If both retrieved and new document have property A, keep property A (the retrieved one).
If retrieved document property A is null or undefined or doesn't exist, update using property A of the new object.
If the document doesn't exist
Insert the new document.
findOneAndUpdate alone doesn't seem to cover all three of these requirements. Thanks.
My recommendation is to go the following path:
db.getCollection('<some-collection>').update(
    { email: 'someguy@email.com' },
    {
        $set: {
            name: 'some guy',
            username: 'someguy',
            tel: '1234'
        }
    },
    { upsert: true }
);
Check upsert documentation:
https://docs.mongodb.com/manual/reference/method/db.collection.update/#upsert-option
Let's go through your requirements now:
3. If the document doesn't exist, insert the new document.
Yes, it will insert a new document into the collection if it doesn't find the document by email. The resulting document will be a combination of the find condition + $set + an autogenerated _id, so it will look something like this:
{
    _id: ObjectId(...),
    email: 'someguy@email.com',
    name: 'some guy',
    username: 'someguy',
    tel: '1234'
}
2. If retrieved document property A is null or undefined or doesn't exist, update using property A of the new object.
All properties provided in $set will unconditionally be persisted in the database, which also covers your requirement of updating null/undefined values.
1. If both retrieved and new document have property A, keep property A (the retrieved one).
If both the newly provided A and the database A are the same, we don't have a problem.
If the As are different, don't you want to store the new A value?
If you are afraid of null/undefined values, you can omit them before providing the object to $set.
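For example, a minimal sketch of omitting such values before building the $set (plain JavaScript; the pruneNils helper is hypothetical):

// Drop null/undefined properties so they never reach $set
function pruneNils(obj) {
    var out = {};
    Object.keys(obj).forEach(function(key) {
        if (obj[key] !== null && obj[key] !== undefined) {
            out[key] = obj[key];
        }
    });
    return out;
}

var update = { $set: pruneNils({ name: 'some guy', username: 'someguy', tel: null }) };
// update.$set is now { name: 'some guy', username: 'someguy' }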
What is the use-case for not wanting to update a database property with the newly provided value?
One use-case I can see is that you want to set createdAt when creating a new record, but don't want to update that value for existing records.
If that's the case, and you know those properties in advance, you can use the $setOnInsert update operator: https://docs.mongodb.com/manual/reference/operator/update/#id1
So your update query can look like this:
db.getCollection('<some-collection>').update(
    { email: 'someguy@email.com' },
    {
        $set: {
            name: 'some guy',
            username: 'someguy',
            tel: '1234'
        },
        $setOnInsert: {
            createdAt: new Date(),
            updatedAt: new Date()
        }
    },
    { upsert: true }
);
I hope this helps!
You need not retrieve the document to update property A. You can use MongoDB's update API to do so. Please find the pseudo code below:
db.<collection>.update({
    "email": "someguy@email.com",  // target the document by email
    "$or": [
        { "PropertyA": { "$exists": false } },
        { "PropertyA": null }
    ]
}, { "$set": { "PropertyA": "NewValue" } });
The above code is for one property, but I think you can figure out how to scale it up.
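For instance, one hedged sketch of scaling it up is to issue one conditional update per property, since each property needs its own $or condition (the property names and values here are placeholders):

// Placeholder property/value pairs; each needs its own conditional update
var newValues = { "PropertyA": "NewValueA", "PropertyB": "NewValueB" };

Object.keys(newValues).forEach(function(prop) {
    var missing = {}, isNull = {}, setDoc = {};
    missing[prop] = { "$exists": false };
    isNull[prop] = null;
    setDoc[prop] = newValues[prop];

    db.getCollection('<some-collection>').update(
        { "email": "someguy@email.com", "$or": [ missing, isNull ] },
        { "$set": setDoc }
    );
});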
Hope this helps!
MongoDB does not allow replacing an item in an array in a single operation. Instead it's a pull followed by a push operation.
Unfortunately we have a case where we end up with a race condition on the same item in the array on parallel requests (distributed environment), i.e.
2x pull runs first, then 2x push. This results in duplicate entries, e.g.
{
"_id": ...,
"nestedArray": [
{
"subId": "1"
},
{
"subId": "1"
},
{
"subId": "2"
}
]
}
Are there any workarounds?
I usually use an optimistic lock for this situation.
To prepare for this, you need to add a version field to your model, which you will increment each time you modify that model. Then you use this method:
Model.findOneAndUpdate(
    { _id: <current_id>, version: <current_version> },
    {
        $set: { nestedArray: <new_nested_array> },
        $inc: { version: 1 }  // bump the version on every successful write
    }
).exec(function(err, result) {
    if (err) {
        // handle error
    }
    if (!result) {
        // the model has been updated in the meantime
    }
    // all is good
});
This means that you first need to get the model and compute the new array <new_nested_array>. This way you can be sure that only one modification will take place for a certain version.
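To make that concrete, a rough sketch of a retry loop around it (computeNewArray is a hypothetical callback that stands in for however you build <new_nested_array>):

// Sketch: retry the read-modify-write until the version check succeeds
function updateNestedArray(id, computeNewArray, done) {
    Model.findById(id, function(err, doc) {
        if (err) return done(err);

        Model.findOneAndUpdate(
            { _id: id, version: doc.version },
            {
                $set: { nestedArray: computeNewArray(doc.nestedArray) },
                $inc: { version: 1 }
            },
            function(err, result) {
                if (err) return done(err);
                if (!result) {
                    // Lost the race: another writer bumped the version; try again
                    return updateNestedArray(id, computeNewArray, done);
                }
                done(null, result);
            }
        );
    });
}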
Hope I explained myself.
I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type).
I'm interested in using MongoDB to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [12346789, 234567890, ...]
}
I am currently importing the data from a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) I need to do an ensureIndex on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the IDs to insert a new related ID into the books array?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 12346789
},
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 234567890
}
So in order to get those sorts of row results into the structure you wanted, one way to do this would be to use the "upsert" functionality of the .update() method. Assuming you have some way of looping the input values, and they are identified with some structure, then an analog to this would be something like:
books.forEach(function(book) {
    db.publishers.update(
        { "name": book.publisher },
        {
            "$setOnInsert": {
                "founded": book.founded,
                "location": book.location
            },
            "$addToSet": { "books": book.book }
        },
        { "upsert": true }
    );
});
This essentially simplifies the code so that MongoDB is doing all of the data collection work for you. So where the "name" of the publisher is considered to be unique, the statement first searches for a document in the collection that matches the query condition given, the "name".
In the case where that document is not found, then a new document is inserted. So either the database or driver will take care of creating the new _id value for this document and your "condition" is also automatically inserted to the new document since it was an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under the $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server in order to add the new entry to the "books" array (set) and where it does not already exist.
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However it is not very "batch" like as you are still performing some operation to the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
    batch.push({
        "q": { "name": book.publisher },
        "u": {
            "$setOnInsert": {
                "founded": book.founded,
                "location": book.location
            },
            "$addToSet": { "books": book.book }
        },
        "upsert": true
    });
    // Send to the server once every 500 queued operations
    if ( batch.length % 500 == 0 ) {
        db.runCommand({ "update": "publishers", "updates": batch });
        batch = [];
    }
});
// Send any remaining operations
if ( batch.length > 0 )
    db.runCommand({ "update": "publishers", "updates": batch });
So what this is doing is setting up all of the constructed update statements and sending them to the server in a single call, with a sensible number of operations sent in each batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be altered as appropriate for your data.
If your MongoDB version is lower than 2.6 then you either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert then you need to do all the pre-aggregation work within your code.
All of the methods are of course supported with the PHP driver, so it is just a matter of adapting this to your actual code and which course you want to take.
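For reference, on MongoDB 3.2+ the same batching can be expressed through the drivers' bulkWrite helper; a minimal sketch in the shell, reusing the publishers example above:

// Sketch: the same upserts queued as a bulkWrite, flushed every 500 rows
var ops = [];
books.forEach(function(book) {
    ops.push({
        "updateOne": {
            "filter": { "name": book.publisher },
            "update": {
                "$setOnInsert": { "founded": book.founded, "location": book.location },
                "$addToSet": { "books": book.book }
            },
            "upsert": true
        }
    });
    if (ops.length == 500) {
        db.publishers.bulkWrite(ops);
        ops = [];
    }
});
if (ops.length > 0) db.publishers.bulkWrite(ops);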
I have this collection:
Books
[
    {
        title: "Book1",
        References: [ObjectId(1), ObjectId(3), ObjectId(5)], // ids referencing another collection
        "Main-Reference": ObjectId(5)
    },
    {
        title: "Book2",
        References: [ObjectId(2), ObjectId(5), ObjectId(7)],
        "Main-Reference": ObjectId(5)
    },
    {
        title: "Book3",
        References: [ObjectId(5), ObjectId(7), ObjectId(9)],
        "Main-Reference": ObjectId(7)
    }
]
I have an operation where I delete a Reference from the Books collection
Example: Assume I have to delete Reference ObjectId(5) from my collection
So my new collection becomes this:
Books
[
{
title: Book1,
References: [ObjectId(1), ObjectId(3)] <- ObjectId(5) is pulled
Main-Reference: ObjectId(1) <- ObjectId(1) is new value as ObjectId(5) is deleted
},
{
title: Book2,
References: [ObjectId(2), ObjectId(7)] <- ObjectId(5) is pulled
Main-Reference: ObjectId(2) <- ObjectId(2) is now main reference
}
{
title: Book3,
References: [ObjectId(7), ObjectId(9)] <- ObjectId(5) is pulled
Main-Reference: ObjectId(7) <- no changes here as ObjectId(7) still exists in References
},
]
Currently this is how I am doing:
Step 1: Pull ObjectId(5) from all Books where References[] has ObjectId(5)
Step 2: Query the Books collection where Main-Reference = ObjectId(5) and use References: { $slice: 1 } to get the first array element from the References array
Step 3: Update all of the books found in Step 2 and replace Main-Reference with the first array element I get from the slice
This seems clumsy to me and trying to see if there is a better way to do this.
If I essentially get your gist, you basically want to:
Pull the item that is not required from your references array
Set the value of your main-reference field to the first element of the altered array
And get that done all in one update without moving documents across the wire.
But this sadly cannot be done. The main problem with this is that there is no way to refer to the value of another field within the document being updated. Even so, to do this without iterating you would also need to access the changed array in order to get the new first element.
Perhaps one approach is to re-think your schema in order to accomplish what you want. My option here would be expanding on your references documents a little and removing the need for the main-reference field.
It seems that the assumption you are willing to live with on the updates is that if the removed reference was the main-reference then you can just set the new main-reference to the first element in the array. With that in mind consider the following structure:
refs: [ { oid: "object1" }, { oid: "object2" }, { oid: "object5", main: true } ]
By changing these to documents with an oid property that would be set to the ObjectId, it gives the option to have an additional property on the document that specifies which is the default. This can easily be queried to determine which Id is the main reference.
Now also consider what would happen if the document matching "object5" in the oid field was pulled from the array:
refs: [ { oid: "object1" }, { oid: "object2" } ]
So when you query for which is the main-reference as per the earlier logic, you accept the first document in the array. Now of course, per your application requirements, if you want to set a different main-reference you just alter the document:
refs: [ { oid: "object1" }, { oid: "object2", main: true } ]
And now the logic remains: the array element whose main property is true is chosen in preference, and, as shown above, if that property does not exist on any element's document then you fall back to the first element.
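As an illustration, a tiny hypothetical helper applying that selection logic client-side:

// Prefer the element flagged main; otherwise fall back to the first element
function getMainReference(refs) {
    if (!refs || refs.length === 0) return null;
    for (var i = 0; i < refs.length; i++) {
        if (refs[i].main === true) return refs[i].oid;
    }
    return refs[0].oid;
}

getMainReference([{ oid: "object1" }, { oid: "object2", main: true }]); // "object2"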
With all of that digested, your operation to pull all references to an object out of that array in all documents becomes quite simple, as done in the shell (the same format should basically apply to whatever driver):
db.books.update(
    { "refs.oid": "object5" },
    { "$pull": { "refs": { "oid": "object5" } } },
    false,
    true
)
The two extra arguments to the query and update operation being upsert and multi respectively. In this case, upsert does not make much sense as we only want to modify documents that exist, and multi means that we want to update everything that matched. The default is to change just the first document.
Naturally I shortened all the notation but of course the values can be actual ObjectId's as per your intent. It seemed also reasonable to presume that your main usage of the main-reference is once you have retrieved the document. Defining a query that returns the main-reference by following the logic that was outlined should be possible, but as it stands I have typed a lot out here and need to break for dinner :)
I think this presents a worthwhile case for re-thinking your schema to avoid over-the-wire iterations for what you want to achieve.
I would like to mark all messages as read by 'jim'. Here is the structure of a thread:
db.threads.save({
messages: [
{
read_by: ['bob', 'jim']
},
{
read_by: ['bob']
},
{
read_by: ['bob']
}
]
})
As you can see, one message has already been read by 'jim', the rest only by 'bob'. I'd like to find and modify any embedded documents so that 'jim' is appended to the read_by array.
Here is where I got:
db.threads.findAndModify({
query: {
'messages.read_by': {
$ne: 'jim'
}
},
update: {
$push: {
'messages.$.read_by': 'jim'
}
}
})
I get this error:
uncaught exception: findAndModifyFailed failed: "can't append to array using string field name [$]"
The query works with a db.threads.find() so I guess the problem is with the update part of the findAndModify() call.
I know it's been a while, but pushing into nested documents is indeed possible. You need to add an $each to it. For example:
db.threads.update(
    { 'messages.read_by': { $ne: 'jim' } },
    {
        $push: {
            'messages.read_by': {
                $each: ['jim']
            }
        }
    }
)
See here for more samples - http://docs.mongodb.org/manual/reference/operator/update/push/
Even though you are passing only one value, for nested arrays you need to pass $each. If the document already contains a read_by field with some values in it, then an update without $each works. Using $each will work irrespective of whether the field exists or not.
With one operation, you can't do this yet. See this question, which is the same as yours.
You will have to do this sort of operation in your application: find() all messages which 'jim' has not already read, append him to them, and then set the messages field of your thread to this array.
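A rough sketch of that application-side approach in the shell (threadId is a hypothetical, already-known thread _id):

// Load the thread, append 'jim' where missing, write the array back
db.threads.find({ _id: threadId }).forEach(function(thread) {
    thread.messages.forEach(function(message) {
        if (message.read_by.indexOf('jim') === -1) {
            message.read_by.push('jim');
        }
    });
    db.threads.update({ _id: thread._id }, { $set: { messages: thread.messages } });
});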