How do I implement this MongoDB query & update operation (C# driver)?

I have this collection:
Books
[
  {
    title: Book1,
    References: [ObjectId(1), ObjectId(3), ObjectId(5)], <- these are ObjectIds of another collection
    Main-Reference: ObjectId(5)
  },
  {
    title: Book2,
    References: [ObjectId(2), ObjectId(5), ObjectId(7)],
    Main-Reference: ObjectId(5)
  },
  {
    title: Book3,
    References: [ObjectId(5), ObjectId(7), ObjectId(9)],
    Main-Reference: ObjectId(7)
  }
]
I have an operation where I delete a Reference from the Books collection.
Example: assume I have to delete Reference ObjectId(5) from my collection.
So my new collection becomes this:
Books
[
  {
    title: Book1,
    References: [ObjectId(1), ObjectId(3)], <- ObjectId(5) is pulled
    Main-Reference: ObjectId(1) <- ObjectId(1) is the new value as ObjectId(5) is deleted
  },
  {
    title: Book2,
    References: [ObjectId(2), ObjectId(7)], <- ObjectId(5) is pulled
    Main-Reference: ObjectId(2) <- ObjectId(2) is now the main reference
  },
  {
    title: Book3,
    References: [ObjectId(7), ObjectId(9)], <- ObjectId(5) is pulled
    Main-Reference: ObjectId(7) <- no change here as ObjectId(7) still exists in References
  }
]
Currently this is how I am doing it:
Step 1: Pull ObjectId(5) from all Books where References[] contains ObjectId(5)
Step 2: Query the Books collection where Main-Reference = ObjectId(5) and use References: {$slice: 1} to get the first element of the References array
Step 3: Update all of the books found in Step 2, replacing Main-Reference with the first array element I get from the slice
This seems clumsy to me and I am trying to see if there is a better way to do this.

If I understand your gist correctly, you basically want to:
Pull the item that is no longer required from your References array
Set the value of your Main-Reference field to the first element of the altered array
And get that all done in one update, without moving documents across the wire.
But this sadly cannot be done. The main problem with this is that there is no way to refer to the value of another field within the document being updated. Even so, to do this without iterating you would also need to access the changed array in order to get the new first element.
Perhaps one approach is to re-think your schema in order to accomplish what you want. My suggestion here would be expanding on your reference documents a little and removing the need for the Main-Reference field.
It seems that the assumption you are willing to live with on the updates is that if the removed reference was the Main-Reference, then you can just set the new Main-Reference to the first element in the array. With that in mind, consider the following structure:
refs: [ { oid: "object1" }, { oid: "object2" }, { oid: "object5", main: true } ]
By changing these to documents with an oid property set to the ObjectId, you gain the option of an additional property on the document that specifies which one is the default. This can easily be queried to determine which id is the main reference.
Now also consider what would happen if the document matching "object5" in the oid field was pulled from the array:
refs: [ { oid: "object1" }, { oid: "object2" } ]
So when you query for the main reference, per the earlier logic, you accept the first document in the array. And of course, if your application requires a different main reference, you just alter the document:
refs: [ { oid: "object1" }, { oid: "object2", main: true } ]
So the logic becomes: choose the array element that has the main property set to true in preference, and, as shown above, if that property does not exist on any element's document, fall back to the first element.
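That fallback logic can be sketched in plain JavaScript (an illustrative helper, not part of any driver):

```javascript
// Pick the main reference from a refs array: prefer the element
// flagged main: true, otherwise fall back to the first element.
// Returns null for an empty array.
function mainReference(refs) {
  if (!refs || refs.length === 0) return null;
  const flagged = refs.find((r) => r.main === true);
  return (flagged || refs[0]).oid;
}

// After "object5" (the flagged main) is pulled, the first element wins:
console.log(mainReference([{ oid: "object1" }, { oid: "object2" }])); // "object1"
// An explicit main flag takes precedence over position:
console.log(mainReference([{ oid: "object1" }, { oid: "object2", main: true }])); // "object2"
```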
With all of that digested, your operation to pull all references to an object out of that array in all documents becomes quite simple, as done in the shell (the same format should basically apply to whatever driver):
db.books.update(
    { "refs.oid": "object5" },
    { "$pull": { "refs": { "oid": "object5" } } },
    false,
    true
)
The two extra arguments to the query and update operation being upsert and multi respectively. In this case, upsert does not make much sense as we only want to modify documents that exist, and multi means that we want to update everything that matched. The default is to change just the first document.
Naturally I shortened all the notation, but of course the values can be actual ObjectIds, as per your intent. It also seemed reasonable to presume that your main usage of the main reference is once you have retrieved the document. Defining a query that returns the main reference by following the logic outlined should be possible, but as it stands I have typed a lot out here and need to break for dinner :)
I think this presents a worthwhile case for re-thinking your schema to avoid over-the-wire iterations for what you want to achieve.

Related

Firestore arrayUnion on aggregated array of objects?

I want to arrayUnion the votes field from the following document:
{
  answers: [
    {
      title: "title",
      votes: [
        "id1",
        "id2",
      ]
    }
  ]
}
It's important to me to use arrayUnion since I need to use an atomic operation (in case a user goes offline and then back online).
Since your answers field is an array value, you'll need to specify the entire value of the array item when using arrayUnion on that field. There is no way to use an arrayUnion operation on the nested votes subfield in there, as that'd be providing a partial update to an array item, which isn't a supported operation.
So you'll have to:
Read the document and get the entire answers field from it into your application code.
Modify the correct array item with the new votes subvalue.
Write the entire array back to the database.
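The three-step read-modify-write above can be sketched in plain JavaScript (names are illustrative; the actual reads and writes go through the Firestore SDK, ideally inside a transaction so concurrent votes are not clobbered):

```javascript
// Merge new vote ids into the answer at the given index, deduplicating
// the way arrayUnion would, and return a new answers array to write back.
function unionVotes(answers, answerIndex, newVotes) {
  return answers.map((answer, i) => {
    if (i !== answerIndex) return answer;
    // Set-union: keep the existing order, append only unseen ids.
    const merged = [...answer.votes];
    for (const v of newVotes) {
      if (!merged.includes(v)) merged.push(v);
    }
    return { ...answer, votes: merged };
  });
}

const answers = [{ title: "title", votes: ["id1", "id2"] }];
console.log(unionVotes(answers, 0, ["id2", "id3"])[0].votes); // ["id1", "id2", "id3"]
```

Because this is a read-modify-write rather than a single atomic arrayUnion, the read and the write should happen inside one Firestore transaction; otherwise a concurrent vote can be lost between the two steps.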

How can I make the update of MongoDB stop when updating a field of a nested array?

I have a database like this:
{
"universe":"comics",
"saga":[
{
"name":"x-men",
"characters":[
{
"character":"wolverine",
"picture":"618035022351.png"
},
{
"character":"wolverine",
"picture":"618035022352.png"
}
]
}
]
},
{
"universe":"dc",
"saga":[
{
"name":"spiderman",
"characters":[
{
"character":"venom",
"picture":"618035022353.png"
}
]
}
]
}
And with this code, I update the fields where character: "wolverine":
db.getCollection('collection').findOneAndUpdate(
{
"universe": "comics"
},
{
$set: {
"saga.$[outer].characters.$[inner].character": "lobezno",
"saga.$[outer].characters.$[inner].picture": "618035022354.png"
}
},
/*{
"saga.characters": 1
},*/
{
"arrayFilters": [
{
"outer.name": "x-men"
},
{
"inner.character": "wolverine"
}
],
"multi":false
}
)
I want to update just the first object where there is a match, and then stop.
For example, if I have an array of 100,000 elements and the matching object is in the tenth position, MongoDB will update that record but keep going through the entire array, which seems inefficient to me even though the update is already done.
Note: if I did the update using an _id inside universe.saga.characters instead of using the name, it would still loop through the rest of the elements.
How can I do it?
Update using arrayFilters conditions
I don't think it finds and updates through a loop, and it does not matter if the collection has 100,000 sub-documents; there is a nice explanation of $[<identifier>] in the documentation, which mentions:
The filtered positional operator $[<identifier>] identifies the array elements that match the arrayFilters conditions for an update operation.
In the update document, use the $[<identifier>] filtered positional operator to define an identifier, which you then reference in the array filter documents. Note that you cannot have an array filter document for an identifier if the identifier is not included in the update document.
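As a mental model of what the filtered positional operators do (all matching elements are updated, not just the first), here is a plain JavaScript simulation; the names are illustrative and the real matching happens server-side:

```javascript
// Simulate $set with $[outer]/$[inner] filtered positional operators:
// every saga element matching outerFilter has every character matching
// innerFilter updated in place. Note that ALL matching array elements
// receive the update, not only the first one.
function setCharacter(doc, outerFilter, innerFilter, changes) {
  for (const saga of doc.saga) {
    if (saga.name !== outerFilter.name) continue;
    for (const ch of saga.characters) {
      if (ch.character !== innerFilter.character) continue;
      Object.assign(ch, changes);
    }
  }
  return doc;
}

const doc = {
  universe: "comics",
  saga: [{
    name: "x-men",
    characters: [
      { character: "wolverine", picture: "618035022351.png" },
      { character: "wolverine", picture: "618035022352.png" },
    ],
  }],
};
setCharacter(doc, { name: "x-men" }, { character: "wolverine" },
             { character: "lobezno", picture: "618035022354.png" });
console.log(doc.saga[0].characters.map((c) => c.character)); // ["lobezno", "lobezno"]
```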
Update using _id
Your point,
Note: if I did the update using an _id inside of universe.saga.characters instead of doing the update using the name, it would still loop through the rest of the elements.
MongoDB will certainly use the _id index. There is a nice answer to the question MongoDB Update Query Performance, from which you will get a better idea on the above point.
Update using indexed fields
You can create an index according to the query section of your update command. MongoDB Indexes and Indexing Strategies explain why indexes are important.
In your example, let's compare two cases:
Example 1: If the document has 2 sub-documents and you run the update and check with explain("executionStats"), assume it takes 1 second to update.
Example 2: If the document has 1,000 sub-documents, the same update might take more than 1 second.
If you create an index on the fields (universe, saga.characters.character and saga.characters.picture), it will definitely take less time than without the index; the main benefit of an index is that it points directly to the indexed fields.
Create Index for your fields
db.maxData.createIndex({
"universe": 1,
"saga.characters.character": 1,
"saga.characters.picture": 1
})
For more experimentation, use the above two examples' data with and without the index and check executionStats; you will get more clarity.

Mongo best practice to structure nested document array

I've been struggling to find a solution to the following problem and seem to get conflicting advice from various mongodb posts. I am trying to figure out how to correctly represent an "array" of sub-objects such that:
they can be upserted (i.e. updated or new element created if needed, in a single operation)
the ids of the objects are available as values that can be searched, not just keys (that you can't really search in mongo).
I have a structure that I can represent as an array (repr A):
{
_id: 1,
subdocs: [
{ sd_id: 1, title: t1 },
{ sd_id: 2, title: t2 },
...
]
}
or as a nested document (repr B)
{
_id: 1,
subdocs: {
1: { title: t1 },
2: { title: t2 },
...
}
}
I would like to be able to update OR insert (i.e. upsert) new subdocs without having to use extra in-application logic.
In repr B this is straight-forward as I can simply use set
$set: {subdocs.3.title: t3}
in an update with upsert: true.
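The dotted-path behaviour that makes repr B convenient can be modeled in plain JavaScript (an illustrative sketch of the server's $set semantics, not driver code):

```javascript
// Model $set with a dot-notation path: intermediate objects are
// created on demand, so update-or-insert of a subdoc is one operation.
function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const k of keys.slice(0, -1)) {
    if (typeof node[k] !== "object" || node[k] === null) node[k] = {};
    node = node[k];
  }
  node[keys[keys.length - 1]] = value;
  return doc;
}

const doc = { _id: 1, subdocs: { 1: { title: "t1" } } };
setPath(doc, "subdocs.3.title", "t3"); // creates subdocs["3"] on the fly
console.log(doc.subdocs["3"]); // { title: "t3" }
setPath(doc, "subdocs.1.title", "t1b"); // updates in place
console.log(doc.subdocs["1"].title); // "t1b"
```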
In repr A it is possible to update an existing record using arrayFilters with something like:
update({_id: 1}, {$set: {"subdocs.$[i].title": t3}}, {arrayFilters: [{"i.sd_id": 3}], upsert: true})
The problem is that while the above will update an existing subobject, it will not create a new subobject (i.e. with sd_id: 3) if it does not exist (it is not an upsert). The docs claim that $[] does support upsert but this does not work for me.
While repr B does allow for update/upserts there is no way to search on the ids of the subdocuments because they are now keys rather than values.
The only solution to the above is to use a denormalized representation with e.g. the id being stored as both a key and a value:
subdocs: {
1: { sd_id: 1, title: t1 },
2: { sd_id: 2, title: t2 },
...
}
But this seems precarious (because the values might get out of sync).
So my question is whether there is a way around this? Am I perhaps missing a way to do an upsert in case A?
UPDATE: I found a workaround that lets me effectively use repr A, even though I'm not sure it's optimal. It involves using two writes rather than one:
update({_id: 1, "subdocs.sd_id": {$ne: 3}}, {$push: {subdocs: {sd_id: 3}}})
update({_id: 1}, {$set: {subdocs.$[i].title: t3}}, {arrayFilter: [{i.sd_id: 3}]})
The first line in the above ensures that we only ever insert one subdoc with sd_id 3 (and only has an effect if the id does not exist) while the second line updates the record (which should now definitely exist). I can probably put these in an ordered bulkwrite to make it all work.
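The two-write workaround can be modeled in plain JavaScript (illustrative only) to show why it is safe to run the pair unconditionally:

```javascript
// Step 1: push a stub only if no subdoc with that sd_id exists yet
// (mirrors the {$ne: ...} guard on the query side of the first update).
function pushIfAbsent(doc, sdId) {
  if (!doc.subdocs.some((s) => s.sd_id === sdId)) {
    doc.subdocs.push({ sd_id: sdId });
  }
}

// Step 2: set the title on the (now guaranteed) matching subdoc
// (mirrors $set with an arrayFilter on sd_id).
function setTitle(doc, sdId, title) {
  for (const s of doc.subdocs) {
    if (s.sd_id === sdId) s.title = title;
  }
}

const doc = { _id: 1, subdocs: [{ sd_id: 1, title: "t1" }] };
pushIfAbsent(doc, 3);
setTitle(doc, 3, "t3");
console.log(doc.subdocs); // [{ sd_id: 1, title: "t1" }, { sd_id: 3, title: "t3" }]
// Running the pair again is idempotent: step 1 is a no-op, step 2
// rewrites the same title.
pushIfAbsent(doc, 3);
setTitle(doc, 3, "t3");
console.log(doc.subdocs.length); // 2
```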

Insert multiple documents referenced by another Schema

I have the following two schemas:
var SchemaOne = new mongoose.Schema({
id_headline: { type: String, required: true },
tags: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Tag' }]
});
var tagSchema = new mongoose.Schema({
_id: { type: String, required: true, index: { unique: true } }, // value
name: { type: String, required: true }
});
As you can see, in the first schema there is an array of references to the second schema.
My problem is:
Suppose that, in my backend server, I receive an array of tags (just the id's) and, before creating the SchemaOne document, I need to verify if the received tags already exist in the database and, if not, create them. Only after having all the tags stored in the database, I may assign this received array to the tags array of the to be created SchemaOne document.
I'm not sure on how to implement this? Can you give me a helping hand?
So lets assume you have input being sent to your server that essentially resolves to this:
var input = {
"id_headline": "title",
"tags": [
{ "name": "one" },
{ "name": "two" }
]
};
And as you state, you are not sure whether any of the "tags" entries already exists, but of course the "name" is also unique for looking up the associated object.
What you are basically going to have to do here is "lookup" each of the elements within "tags" and return the document with the reference to use to the objects in the "Tag" model. The ideal method here is .findOneAndUpdate(), with the "upsert" option set to true. This will create the document in the collection where it is not found, and at any rate will return the document content with the reference that was created.
Note that naturally, you want to ensure you have those array items resolved "first", before proceeding to save the main "SchemaOne" object. The async library has some methods that help structure this:
async.waterfall(
[
function(callback) {
async.map(input.tags,function(tag,callback) {
Tag.findOneAndUpdate(
{ "name": tag.name },
{ "$setOnInsert": { "name": tag.name } },
{ "upsert": true, "new": true },
callback
)
},callback);
},
function(tags,callback) {
Model.findOneAndUpdate(
{ "id_headline": input.id_headline },
{ "$addToSet": {
"tags": { "$each": tags.map(function(tag) { return tag._id }) }
}},
{ "upsert": true, "new": true },
callback
)
}
],
function(err,result) {
// if err then do something to report it, otherwise it's done.
}
)
So async.waterfall is a special flow control method that will pass the result returned from each of the functions specified in the array of arguments to the next one, right until the end of execution, where you can optionally receive the result of the final function in the list. It basically "cascades" or "waterfalls" results down each step. This is used here to pass the results of the "tags" creation into the main model creation/modification.
The async.map within the first executed stage looks at each of the elements within the array of the input. So for each item contained in "tags", the .findOneAndUpdate() method is called to look for and possibly create if not found, the specified "tag" entry in the collection.
Since the output of .map() is going to be an array of those documents, it is simply passed through to the next stage. Therefore each iteration returns a document, when the iteration is complete you have all documents.
The next usage of .findOneAndUpdate() with "upsert" is optional, and of course considers that the document with the matching "id_headline" may or may not exist. The same case is true that if it is there then the "update" is processed, if not then it is simply created. You could optionally .insert() or .create() if the document was known not to be there, but the "update" action gives some interesting options.
Namely here is the usage of $addToSet, where if the document already existed then the specified items would be "added" to any content that was already there, and of course as a "set", any items already present would not be new additions. Note that only the _id fields are required here when adding to the array with an atomic operator, hence the .map() function employed.
An alternate case on "updating" could be to simply "replace" the array content using the $set atomic operation if it was the intent to only store those items that were mentioned in the input and no others.
In a similar manner the $setOnInsert shown when "creating"/"looking for" items in "Tags" makes sure that there is only actual "modification" when the object is "created/inserted", and that removes some write overhead on the server.
So the basic principle of using .findOneAndUpdate(), at least for the "Tags" entries, is the most optimal way of handling this. It avoids double handling such as:
Querying to see if the document exists by name
If no result is returned, then sending an additional statement to create one
That means two operations to the database with communication back and forth, which the actions here using "upserts" simplify into a single request for each item.
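The single-request upsert behaviour being relied on can be modeled locally; a plain JavaScript sketch with illustrative names (the real .findOneAndUpdate() does this atomically on the server):

```javascript
// Model of findOneAndUpdate with upsert: true and new: true over an
// in-memory "collection": return the matching doc, creating it first
// from the query plus the $setOnInsert fields when nothing matches.
function upsertTag(collection, name) {
  let doc = collection.find((d) => d.name === name);
  if (!doc) {
    doc = { _id: collection.length + 1, name }; // $setOnInsert applies only here
    collection.push(doc);
  }
  return doc; // one "round trip" either way
}

const tags = [{ _id: 1, name: "one" }];
const a = upsertTag(tags, "one"); // existing doc, no write happens
const b = upsertTag(tags, "two"); // created on the fly
console.log([a._id, b._id, tags.length]); // [1, 2, 2]
```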

MongoDB update fields of subarrays that meet criteria

I am having a problem where I need to update a specific field found in arrays contained in a bigger array that match certain criteria as of MongoDB v2.2.3.
I have the following mongodb sample document.
{
_id: ObjectId("50be30b64983e5100a000009"),
user_id: 0,
occupied: {
Greece: [
{
user_id: 3,
deadline: ISODate("2013-02-08T19:19:28Z"),
fulfilled: false
},
{
user_id: 4,
deadline: ISODate("2013-02-16T19:19:28Z"),
fulfilled: false
}
],
Italy: [
{
user_id: 2,
deadline: ISODate("2013-02-15T19:19:28Z"),
fulfilled: false
}
]
}
}
Each country in the occupied object has its own array of entries.
What I am trying to do is find the document where user_id is 0, search through the occupied.Greece array only for elements that have "deadline": {$gt: ISODate(current-date)} and change their individual "fulfilled" fields to true.
I have tried the $ and $elemMatch operators, but they match only the first array element in a query, whereas I need to match all eligible elements by the given criteria and make the update without running the same query multiple times or processing the arrays client-side.
Is there no server-side solution for generic updates in a single document? I am developing using PHP though a solution to this should be universal.
I'm afraid this is not possible. From the documentation:
Remember that the positional $ operator acts as a placeholder for the first match of the update query selector. [emphasis not mine]
This is tracked in the MongoDB Jira under SERVER-1243.
There are quite a number of related feature requests in the Jira, mostly under the topic 'virtual collections'.
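Until SERVER-1243 was addressed (the filtered positional operator $[<identifier>] with arrayFilters eventually arrived in MongoDB 3.6), the common workaround was to read the document, compute the indexes of the matching elements client-side, and issue positional $set updates by index. A sketch of the index computation in plain JavaScript (illustrative names):

```javascript
// Build the dot-notation paths of every occupied.<country> element whose
// deadline is still in the future, so each can then be targeted with
// an update like { $set: { "occupied.Greece.1.fulfilled": true } }.
function expiringPaths(doc, country, now) {
  const paths = [];
  (doc.occupied[country] || []).forEach((slot, i) => {
    if (slot.deadline > now) {
      paths.push(`occupied.${country}.${i}.fulfilled`);
    }
  });
  return paths;
}

const doc = {
  user_id: 0,
  occupied: {
    Greece: [
      { deadline: new Date("2013-02-08T19:19:28Z"), fulfilled: false },
      { deadline: new Date("2013-02-16T19:19:28Z"), fulfilled: false },
    ],
  },
};
console.log(expiringPaths(doc, "Greece", new Date("2013-02-10T00:00:00Z")));
// ["occupied.Greece.1.fulfilled"]
```

Note the usual caveat with index-based paths: if another writer reorders or pulls elements between the read and the write, the indexes can point at the wrong elements, so this pattern should pair the update with a query condition that re-checks the matched values.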