I have 2 collections in mongodb 4.2:
article - [Id,ArticleTypeId,BestResponseId,Topic,PredecessorId]
{ Id: 1, ArticleTypeId:1, BestResponseId:2, Topic:"XYZ" },
{ Id: 2, ArticleTypeId:2, PredecessorId:1 },
{ Id: 3, ArticleTypeId:2, PredecessorId:1 },
{ Id: 4, ArticleTypeId:2, BestResponseId:5, Topic:"ABC" },
{ Id: 5, ArticleTypeId:1, PredecessorId:4 },
{ Id: 6, ArticleTypeId:2, PredecessorId:4 }
result-[Id,ArticleId,ResultTypeId]
{ Id: 1, ArticleId:1, ResultTypeId:2 },
{ Id: 2, ArticleId:2, ResultTypeId:2 },
{ Id: 3, ArticleId:2, ResultTypeId:2 },
{ Id: 4, ArticleId:3, ResultTypeId:2 },
{ Id: 5, ArticleId:2, ResultTypeId:2 },
{ Id: 6, ArticleId:4, ResultTypeId:2 },
{ Id: 7, ArticleId:5, ResultTypeId:2 },
{ Id: 8, ArticleId:6, ResultTypeId:2 },
{ Id: 9, ArticleId:6, ResultTypeId:2 }
In the article collection, BestResponseId is the Id of the best response to the given article, i.e., for the article with Id = 1 and Topic "XYZ", the best response is the article with Id = 2, and so on.
The PredecessorId indicates which article the response is for.
In the result collection, ArticleId is the foreign key referencing the article Id.
We need to find the list of topics where Count(ResultTypeId = 2) is greater for any other response than for the best response. So in this example:
For results 1-5, Count(ResultTypeId = 2) for ArticleId 2 is 3, but for the other response to the same article (ArticleId 3) Count(ResultTypeId = 2) is 1, so the best response got the best result and we won't include it in the output.
But for the other results, 6-9, Count(ResultTypeId = 2) for ArticleId 5 is 1, whereas Count(ResultTypeId = 2) for ArticleId 6 is 2,
hence the expected output will be the
Topic
"ABC"
So, basically, you do a join between article & article [on Id and
PredecessorId, a self join] and get the list of PredecessorIds along with
which one among them is the BestResponseId, so the first level of lookup
should give output like:
PredecessorId|ArticleId|IsBestResponse
1 |2 | true
1 |3 | false
4 |5 | true
4 |6 | false
Next you join this with result (on ArticleId) and do the count of
ResultTypeId=2 grouped by ArticleId. So after the second level of
lookup, the output will be:
ArticleId|PredecessorId|IsBestResponse|ResultType2_Count
2 | 1 | true | 3
3 | 1 | false | 1
5 | 4 | true | 1
6 | 4 | false | 2
Now, we need to output the topic name for the predecessors of
articles for which IsBestResponse=false but ResultType2_Count is
greater than the ResultType2_Count of the article for which
IsBestResponse=true belonging to the same predecessor.
Between ArticleId 5 and 6, this condition is satisfied, and the
corresponding Topic of their predecessor ["ABC"] is the expected output.
If 2 & 3 had satisfied the same condition, we would have printed "XYZ"
as well. But they do not.
I am kind of new to mongodb and lookup, this is what I have done so far:
db.article.aggregate([
{
$lookup:{
from:"article",
localField:"ArticleId",
foreignField:"PredecessorId",
as:"articles"
}
},
{$unwind:"$articles"},
{$lookup:{
from:"result",
localField:"answers.Id",
foreignField:"ArticleId",
as:"articles"
}},
{$unwind:"$articles"}
])
I am sure I need to do $sum or $count in the second level of nested look up.
Is there any way I can accomplish it inside the same query?
Thank you in advance!
So it would appear that what you are looking for is actually the following:
db.article.aggregate([
{ "$match": { "Topic": { "$exists": true } } },
{ "$lookup": {
"from": "article",
"let": { "id": "$Id", "bestResponse": "$BestResponseId" },
"pipeline": [
{ "$match": {
"$expr": { "$eq": [ "$$id", "$PredecessorId" ] }
}},
{ "$lookup": {
"from": "result",
"let": { "articleId": "$Id" },
"pipeline": [
{ "$match": {
"ResultTypeId": 2,
"$expr": { "$eq": [ "$$articleId", "$ArticleId" ] }
}},
{ "$count": "count" }
],
"as": "results"
}},
{ "$addFields": {
"results": "$$REMOVE",
"count": { "$sum": "$results.count" },
"isBestResponse": { "$eq": ["$$bestResponse", "$Id"] }
}}
],
"as": "responses"
}},
{ "$match": {
"$expr": {
"$gt": [
{ "$max": "$responses.count" },
{ "$arrayElemAt": [
"$responses.count",
{ "$indexOfArray": [ "$responses.Id", "$BestResponseId" ] }
]}
]
}
}}
])
And that would provide (as more MongoDB-like output than the relational output you describe):
{
"_id" : ObjectId("5da1206f22b8db5a00668cc4"),
"Id" : 4,
"ArticleTypeId" : 2,
"BestResponseId" : 5,
"Topic" : "ABC",
"responses" : [
{
"_id" : ObjectId("5da1206f22b8db5a00668cc5"),
"Id" : 5,
"ArticleTypeId" : 1,
"PredecessorId" : 4,
"count" : 1,
"isBestResponse" : true
},
{
"_id" : ObjectId("5da1206f22b8db5a00668cc6"),
"Id" : 6,
"ArticleTypeId" : 2,
"PredecessorId" : 4,
"count" : 2,
"isBestResponse" : false
}
]
}
Now I'll walk through that and explain why it is so.
Firstly you want a $match stage right at the start of the pipeline to exclude anything other than the documents with a valid Topic. This uses a simple $exists to retrieve only the documents that have that field, which therefore meet the condition for the first "join".
The actual $lookup uses the modern form with a pipeline expression. There are two primary reasons for this:
We actually want an "inner" $lookup expression to get results from the other collection.
We want to do manipulations on the results before they are returned "as an array", which is always the output of $lookup. This is more efficient than manipulating the "array" returned afterwards.
One thing to note in this syntax is the let expression:
"let": { "id": "$Id", "bestResponse": "$BestResponseId" },
The most common use case here is to provide values from the parent document which can be used in the $expr logic within an initial $match indicating the "join" conditions, i.e. which field value matches for local and foreign. But in this case we actually have another valid use, notably for the bestResponse value declared.
Once we have "joined" (the "self-join" part that gets the relevant child items), the very next thing we want is another $lookup nested within this pipeline expression. In this case we want the initial $match stage in its own pipeline expression to use an additional constraint of ResultTypeId: 2, which is part of what the question asks. This is basically how you can include multiple conditions on a "join".
Since we are not interested in the nested details from the result collection and do not need a results array within the other array of "children", we use the $count pipeline stage within this sub-pipeline to reduce the results.
Now that's not completely what you want, so in the initial $lookup operation, within its pipeline expression, you then add the $addFields stage to turn what is essentially an array in the results property (albeit of just a single document with one property) into a single property on each child with a singular value, via the $sum operator. You could do:
"count": { "$arrayElemAt": [ "$results.count", 0 ] }
And that would be the same result whenever at least one result matches, but it's notably a longer expression than just "$sum": "$results.count". When nothing matches, $count emits no document, so $sum over the empty array yields 0 whereas $arrayElemAt would leave the count field out altogether.
The other thing you wanted (though not really necessary for the remaining logic) is to identify which "child" actually matches the BestResponseId value. This is what we use the bestResponse variable we declared earlier for. Since that is the value in the parent, it is processed against each child from within the pipeline and simply returns true or false depending on whether the current child's Id field matches that value from the parent.
Once out of the $lookup pipeline stage, the only thing left to do is determine after the "join" which of the result documents actually meets the condition of having an article with a higher results count than the one marked as the "BestResponse". This is done with another $match pipeline stage, which employs the $expr operator again.
In short, $max is used to obtain the maximum of the count values returned within each child entry of the responses array from the $lookup. This is compared to the count of the best response, whose position is found by the $indexOfArray operator matching the Id fields in the responses array against the parent BestResponseId (or alternately, where isBestResponse is true, which is why I noted that field was not needed). Having that matching "index value", you can extract the singular count value from that array via $arrayElemAt and do the comparison. If the maximum is actually greater, then that document qualifies as a returned result.
Of course you can simply return the document with original fields if you want using another $project or even $addFields, or $unwind to "denormalize" if you again really want a result that looks the same as an SQL "join" result. But the basic logic really only needs the three stages (and a $lookup within a $lookup) for the essential parts of the implementation.
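To make the comparison logic concrete, here is a minimal plain-JavaScript sketch that walks through the same steps on in-memory copies of the sample data from the question. It is not MongoDB code: the function name topicsWhereBestIsBeaten is made up for illustration, and it simply skips any parent whose BestResponseId is not among its responses.

```javascript
// In-memory copies of the sample collections from the question.
const articles = [
  { Id: 1, ArticleTypeId: 1, BestResponseId: 2, Topic: "XYZ" },
  { Id: 2, ArticleTypeId: 2, PredecessorId: 1 },
  { Id: 3, ArticleTypeId: 2, PredecessorId: 1 },
  { Id: 4, ArticleTypeId: 2, BestResponseId: 5, Topic: "ABC" },
  { Id: 5, ArticleTypeId: 1, PredecessorId: 4 },
  { Id: 6, ArticleTypeId: 2, PredecessorId: 4 }
];
// [Id, ArticleId] pairs; every sample result has ResultTypeId 2.
const results = [
  [1, 1], [2, 2], [3, 2], [4, 3], [5, 2], [6, 4], [7, 5], [8, 6], [9, 6]
].map(([Id, ArticleId]) => ({ Id, ArticleId, ResultTypeId: 2 }));

// Hypothetical helper mirroring the three pipeline stages.
function topicsWhereBestIsBeaten(articles, results) {
  return articles
    // Stage 1: keep only the parents, i.e. documents that have a Topic.
    .filter(parent => parent.Topic !== undefined)
    // Stage 2: "self-join" the responses and count their type-2 results.
    .map(parent => {
      const responses = articles
        .filter(child => child.PredecessorId === parent.Id)
        .map(child => ({
          Id: child.Id,
          count: results.filter(
            r => r.ResultTypeId === 2 && r.ArticleId === child.Id
          ).length,
          isBestResponse: child.Id === parent.BestResponseId
        }));
      return { ...parent, responses };
    })
    // Stage 3: keep parents where some response beats the best response.
    .filter(parent => {
      const best = parent.responses.find(r => r.isBestResponse);
      const maxCount = Math.max(...parent.responses.map(r => r.count));
      return best !== undefined && maxCount > best.count;
    })
    .map(parent => parent.Topic);
}

const topics = topicsWhereBestIsBeaten(articles, results);
console.log(topics); // [ 'ABC' ]
```

For article 1 the counts are 3 (best) and 1, so it is filtered out; for article 4 they are 1 (best) and 2, so "ABC" is returned.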
So I have this document in my database like below
{
"_id": {
"$oid": "59a8668f900bea0528b63fdc"
},
"userId": "KingSlizzard",
"credits": 15,
"settings": {
"music": 1,
"sfx": 0
}
}
I have this method for updating just specific fields in a document
function setPlayerDataField(targetUserId, updateObject) {
playerDataCollection.update({
"userId": targetUserId //Looks for a doc with the userId of the player
}, { $set: updateObject }, //Uses the $set Mongo modifier to set value at a path
false, //upsert: do NOT create the document if it does not exist
true //multi: update every document that matches the query
);
}
It works fine if I do a command like
setPlayerDataField("KingSlizzard",{"credits": 20});
It would result in the document like this
{
"_id": {
"$oid": "59a8668f900bea0528b63fdc"
},
"userId": "KingSlizzard",
"credits": 20,
"settings": {
"music": 1,
"sfx": 0
}
}
The value for credits is now 20 yay! This is desired.
However, if I do this command...
setPlayerDataField("KingSlizzard",{"settings": {"music":0}});
It would result in the document like this
{
"_id": {
"$oid": "59a8668f900bea0528b63fdc"
},
"userId": "KingSlizzard",
"credits": 20,
"settings": {
"music": 0
}
}
All I wanted to do was set only the settings/music value to 0. This result is NOT desired since we lost the sfx value.
So my question is, how do I update a value in a sub object without replacing the whole sub object itself?
To set a specific property of a child document, use dot notation. In your example, write it as:
setPlayerDataField("KingSlizzard", {"settings.music": 0});
See the example in the MongoDB docs.
To specify a <field> in an embedded document or in an array, use dot notation.
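The difference between the two calls can be sketched in plain JavaScript. The applySet helper below is hypothetical and only imitates how $set treats each key of the update object as a path; it is not part of any MongoDB driver, and unlike the server it silently overwrites a non-object value found along the path.

```javascript
// Hypothetical helper that imitates $set: a key such as "a.b" descends
// into a, while a key such as "a" replaces the whole value of a.
function applySet(doc, updateObject) {
  const copy = JSON.parse(JSON.stringify(doc)); // deep copy of plain data
  for (const [path, value] of Object.entries(updateObject)) {
    const keys = path.split(".");
    let target = copy;
    for (const key of keys.slice(0, -1)) {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {}; // simplification: create missing intermediate objects
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return copy;
}

const player = {
  userId: "KingSlizzard",
  credits: 20,
  settings: { music: 1, sfx: 0 }
};

// Whole sub-object given: settings is replaced and sfx is lost.
const replaced = applySet(player, { settings: { music: 0 } });
// Dot notation: only settings.music changes and sfx survives.
const merged = applySet(player, { "settings.music": 0 });

console.log(replaced.settings); // { music: 0 }
console.log(merged.settings);   // { music: 0, sfx: 0 }
```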
My recent project encountered the same problem as this one: the question
db.test.update(
{name:"abc123", "config.a":1 },
{$addToSet:{ config:{a:1,b:2} } },
true
)
This produces the following error:
Cannot apply $addToSet to a non-array field
But after changing it to:
db.test.update(
{name:"abc123", "config.a":{$in:[1]} },
{$addToSet:{ config:{a:1,b:2} } },
true
)
It works fine.
Also referenced this link: Answer
Can anyone explain what's going on? Does "config.a":1 turn config into an object, whereas "config.a":{$in:[1]} does not?
What you are trying to do here is add a new item to an array only where the item does not exist and also create a new document where it does not exist. You chose $addToSet because you want the items to be unique, but in fact you really want them to be unique by "a" only.
So $addToSet will not do that, and you rather need to "test" for the element being present. But the real problem here is that it is not possible to both do that and "upsert" at the same time. The logic cannot work as a new document will be created whenever the array element was not found, rather than appending to the array like you want.
The current operation errors because, on an "upsert", the equality condition "config.a": 1 from the query is copied into the new document, which makes config an embedded document rather than an array, and $addToSet cannot be applied to a non-array field. A condition such as { "$in": [1] } is not an equality match, so nothing is copied and the error does not occur. But as stated already, you have other problems with achieving the logic.
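Why $addToSet does not give uniqueness by "a" alone can be illustrated with a small plain-JavaScript emulation. The addToSet function here is a hypothetical stand-in, not a driver API; it uses JSON.stringify as a simple equality check that, like the exact-match comparison $addToSet applies to embedded documents, is sensitive to field order.

```javascript
// Hypothetical emulation of $addToSet on an array of embedded documents:
// the value is appended unless an identical element already exists.
function addToSet(array, value) {
  const key = JSON.stringify(value);
  const exists = array.some(item => JSON.stringify(item) === key);
  return exists ? array : [...array, value];
}

let config = [{ a: 1, b: 2 }];
config = addToSet(config, { a: 1, b: 2 }); // identical element: no change
config = addToSet(config, { a: 1, b: 3 }); // same "a", different "b": appended

console.log(config.length); // 2, so "a" is no longer unique in the array
```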
What you need here is a sequence of update operations that each "try" to perform their expected action. This can only be done with multiple statements:
// attempt "upsert" where document does not exist
// do not alter the document if this is an update
db.test.update(
{ "name": "abc" },
{ "$setOnInsert": { "config": [{ "a": 1, "b": 2 }] }},
{ "upsert": true }
)
// $push the element where "a": 1 does not exist
db.test.update(
{ "name": "abc", "config.a": { "$ne": 1 } },
{ "$push": { "config": { "a": 1, "b": 2 } }}
)
// $set the element where "a": 1 does exist
db.test.update(
{ "name": "abc", "config.a": 1 },
{ "$set": { "config.$.b": 2 } }
)
On a first iteration the first statement will "upsert" the document and create the array with items. The second statement will not match the document because the "a" element has the value that was specified. The third statement will match the document but it will not alter it in a write operation because the values have not changed.
If you now change the input to "b": 3 you get different responses but the desired result:
db.test.update(
{ "name": "abc" },
{ "$setOnInsert": { "config": [{ "a": 1, "b": 3 }] }},
{ "upsert": true }
)
db.test.update(
{ "name": "abc", "config.a": { "$ne": 1 } },
{ "$push": { "config": { "a": 1, "b": 3 } }}
)
db.test.update(
{ "name": "abc", "config.a": 1 },
{ "$set": { "config.$.b": 3 } }
)
So now the first statement matches a document with "name": "abc" but does not do anything, since its only operations apply on "insert". The second statement does not match because an element with "a": 1 already exists, so the $ne condition fails. The third statement matches the value of "a" and changes "b" in the matched element to the desired value.
Subsequently changing "a" to another value that does not exist in the array leaves the first and third statements doing nothing, while the second statement adds another member to the array, keeping the content unique by "a" key.
Also, submitting a statement with no changes from the existing data will of course result in a response that says nothing is changed on all counts.
That's how you do your operations. You can do this with "ordered" Bulk operations so that there is only a single request and response from the server with the valid response to modified or created.
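The net effect of the three statements can be traced with a plain-JavaScript emulation on an in-memory array. This is only a sketch of the logic: the upsertConfig function and the docs array are hypothetical stand-ins for the collection, not MongoDB calls.

```javascript
// Hypothetical in-memory stand-in for the three update statements.
function upsertConfig(docs, name, a, b) {
  // 1. "upsert" with $setOnInsert: create the document only if missing.
  let doc = docs.find(d => d.name === name);
  if (doc === undefined) {
    doc = { name, config: [{ a, b }] };
    docs.push(doc);
  }
  // 2. $push guarded by "config.a": { $ne: a }: append only when no
  //    element has this value of "a".
  if (!doc.config.some(item => item.a === a)) {
    doc.config.push({ a, b });
  }
  // 3. $set via the positional $: set "b" on the first element whose
  //    "a" matches (one always exists after steps 1 and 2).
  const match = doc.config.find(item => item.a === a);
  match.b = b;
  return doc;
}

const docs = [];
upsertConfig(docs, "abc", 1, 2); // first run: creates the document
upsertConfig(docs, "abc", 1, 3); // changes "b" on the existing element
upsertConfig(docs, "abc", 2, 5); // new "a": appends another element

console.log(JSON.stringify(docs));
// [{"name":"abc","config":[{"a":1,"b":3},{"a":2,"b":5}]}]
```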
As explained in this issue on the MongoDB JIRA (https://jira.mongodb.org/browse/SERVER-3946), this can be solved in a single query:
The following update fails because we use $addToSet on a field which we have also included in the first argument (the field which accepts the fields and values to query for). As far as I understand it, you can't use upsert: true in this scenario where we $addToSet to the same field we query with.
db.foo.update({x : "a"}, {$addToSet: {x: "b"}} , {upsert: true}); // FAILS
The solution is to use $elemMatch: {$eq: value} in the query:
db.foo.update({x: {$elemMatch: {$eq: "a"}}}, {$addToSet: {x: "b"}}, {upsert: true});
It looks like you cannot have both an upsert and an array element update operation.
If you do (python):
findDct = {
"_id": ObjectId("535e3ab9c36b4417d031402f"),
'events.ids': '176976332'
}
print col.update(findDct, {"$set" : {"events.$.foo": "bar"} }, upsert=True)
It will throw:
pymongo.errors.DuplicateKeyError: insertDocument :: caused by :: 11000 E11000
duplicate key error index: test.col.$_id_ dup key: { : ObjectId('535e3ab9c36b4417d031402f') }
This happens because "_id" is of course an index, and mongo tries to insert the document as a new one since the find query fails on its 'events.ids': '176976332' part (cheat).
Is it possible to update an unknown element in an array with upsert set to True, and if so, how?
Yes it is, but you are going about it in the wrong way. Rather than trying to "find" an element when you are not sure whether it exists, apply the $addToSet operator instead:
db.collection.update(
{ "_id": ObjectId("535e3ab9c36b4417d031402f") },
{
"$addToSet": { "events": { "foo": "bar" } }
},
{ "upsert": true }
)
Please also note from the positional $ operator documentation that you should not use the $ operator with "upserts" as this will result in the field name being interpreted as a "literal" ( which includes the value as in "events.$.foo" ) and that will be the actual field inserted into the document.
Try to make sure that your array "insert/upsert" operations specify the whole array content in order to make this work.
Another adaptation is with the "bulk" methods. The pymongo driver already has a nice API for this, but this is a general form:
db.runCommand({
"update": "collection",
"updates": [
{
"q": { "_id": ObjectId("535e3ab9c36b4417d031402f") },
"u": {
"$addToSet": {
"events": {
"foo": "bar", "bar": "baz"
}
}
},
"upsert": true
},
{
"q": { "_id": ObjectId("535e3ab9c36b4417d031402f") },
"u": {
"$set": { "events.foo": "bar" }
}
}
]
})
But still be very careful that you are not producing duplicates in your sub-document array, which you can clearly see could happen there. It is a method though, as each update will cascade down even if the first form failed to add anything. Not the best case example, but I hope you see the point.
I'm trying to set items.qty to 0 for a document obtained by an id query.
db.warehouses.update(
// query
{
_id:ObjectId('5322f07e139cdd7e31178b78')
},
// update
{
$set:{"items.$.qty":0}
},
// options
{
"multi" : true, // update only one document
"upsert" : true // insert a new document, if no existing document match the query
}
);
Return:
Cannot apply the positional operator without a corresponding query field containing an array.
This is the document in which I want to set all items.qty to 0:
{
"_id": { "$oid" : "5322f07e139cdd7e31178b78" },
"items": [
{
"_id": { "$oid" : "531ed4cae604d3d30df8e2ca" },
"brand": "BJFE",
"color": "GDRNCCD",
"hand": 1,
"model": 0,
"price": 500,
"qty": 0,
"type": 0
},
{
"brand": "BJFE",
"color": "GDRNCCD",
"hand": 1,
"id": "23",
"model": 0,
"price": 500,
"qty": 4,
"type": 0
},
{
"brand": "BJFE",
"color": "GDRNCCD",
"hand": 1,
"id": "3344",
"model": 0,
"price": 500,
"qty": 6,
"type": 0
}
],
"name": "a"
}
EDIT
The detail missing from the question was that the required field to update was actually in a sub-document. This changes the answer considerably:
This is a constraint of what you can possibly do with updating array elements. And this is clearly explained in the documentation. Mostly in this paragraph:
The positional $ operator acts as a placeholder for the first element that matches the query document
So here is the thing. Trying to update all of the array elements in a single statement like this will not work. In order to do this you must do the following:
db.warehouses.find({ "items.qty": { "$gt": 0 } }).forEach(function(doc) {
doc.items.forEach(function(item) {
item.qty = 0;
});
db.warehouses.update({ "_id": doc._id }, doc );
})
Which is basically the way to update every array element.
The multi setting in .update() means across multiple "documents". It cannot be applied to multiple elements of an array. So presently the best option is to replace the whole array. Or in this case we may just as well replace the whole document, since we need to do that anyway.
For real bulk data, use db.eval(). But please read the documentation first, and note that db.eval() was deprecated in MongoDB 3.0 and removed in 4.2, so this only applies to older servers:
db.eval(function() {
db.warehouses.find({ "items.qty": { "$gt": 0 } }).forEach(function(doc) {
doc.items.forEach(function(item) {
item.qty = 0;
});
db.warehouses.update({ "_id": doc._id }, doc );
});
})
Updating all the elements in an array across the whole collection is not simple.
Original
Pretty much exactly what the error says. In order to use a positional operator you need to match something first. As in:
db.warehouses.update(
// query
{
_id:ObjectId('5322f07e139cdd7e31178b78'),
"items.qty": { "$gt": 0 }
},
// update
{
$set:{"items.$.qty":0}
},
// options
{
"multi" : true,
"upsert" : true
}
);
So where the match condition finds the position of the first item whose qty is greater than 0, that index is passed to the positional operator.
P.S.: When multi is true it means the update applies to every document that matches the query. Leave it false, which is the default, if you only mean one.
You can use the $ positional operator only when you specify an array in the first argument (i.e., the query part used to identify the document you want to update).
The positional $ operator identifies an element in an array field to update without explicitly specifying the position of the element in the array.
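The "first matching element only" behaviour can be sketched in plain JavaScript. The setFirstMatchingQty function below is a hypothetical emulation of $set on "items.$.qty" combined with the query condition "items.qty": { "$gt": 0 } for a single document; it is not a MongoDB API.

```javascript
// Hypothetical emulation of { $set: { "items.$.qty": qty } } for one document.
function setFirstMatchingQty(doc, predicate, qty) {
  const index = doc.items.findIndex(predicate); // position matched by the query
  if (index === -1) return false;               // no match: nothing to update
  doc.items[index].qty = qty;                   // only that one element changes
  return true;
}

const warehouse = { name: "a", items: [{ qty: 0 }, { qty: 4 }, { qty: 6 }] };

// One pass zeroes only the first element with qty > 0.
setFirstMatchingQty(warehouse, item => item.qty > 0, 0);
const afterOnePass = warehouse.items.map(item => item.qty);
console.log(afterOnePass); // [ 0, 0, 6 ]

// Repeating the update until nothing matches clears the remaining elements.
while (setFirstMatchingQty(warehouse, item => item.qty > 0, 0)) {}
const afterAllPasses = warehouse.items.map(item => item.qty);
console.log(afterAllPasses); // [ 0, 0, 0 ]
```

This is why a single positional update leaves the later elements untouched, and why replacing the whole array in one write is usually simpler than repeating the update.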