MongoDB/PyMongo: upsert array element

I have the following document:
{'software_house': 'k1',
'client_id': '1234',
'transactions': [
{'antecedents': 12345,
'consequents': '015896018',
'antecedent support': 0.0030889166727954697},
{'antecedents': '932696735',
'consequents': '939605046',
'antecedent support': 0.0012502757961314996}
...
]}
The 'transactions' key stores an array whose items each have three fields.
I would like to upsert items in the 'transactions' array, matching on 'software_house', 'client_id', 'transactions.antecedents' and 'transactions.consequents', that is:
overwrite the element within the array if it exists;
append a new element to 'transactions' if it doesn't.
How could I achieve that using PyMongo?

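You can do this with an update that uses an aggregation pipeline (supported since MongoDB 4.2). First, $filter out the element that matches; then $setUnion the remaining array with the item you want to upsert. Note that $setUnion treats arrays as sets: it removes duplicate elements and does not guarantee element order.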
PyMongo:
from pymongo import MongoClient

db = MongoClient()["test"]  # assumed connection and database name

db.collection.update_many(
    filter={
        # the criteria you want to match outside the array
        "software_house": "k1",
        "client_id": "1234",
    },
    update=[
        {
            "$addFields": {
                "transactions": {
                    "$filter": {
                        "input": "$transactions",
                        "as": "t",
                        "cond": {
                            "$not": [{
                                "$and": [
                                    # the criteria you want to match in the array
                                    {"$eq": ["$$t.antecedents", 12345]},
                                    {"$eq": ["$$t.consequents", "015896018"]},
                                ]
                            }]
                        },
                    }
                }
            }
        },
        {
            "$addFields": {
                "transactions": {
                    "$setUnion": [
                        "$transactions",
                        [
                            # the entry you want to upsert
                            {
                                "antecedents": 12345,
                                "consequents": "015896018",
                                "antecedent support": -1,
                            }
                        ],
                    ]
                }
            }
        },
    ],
)
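The pipeline above hard-codes the values to match. As a sketch, assuming you want to reuse it for any element, a small helper (the name build_upsert_pipeline is hypothetical, not part of PyMongo) can build the same two stages from the element you want to upsert. It only builds a Python list, so it runs without a server.

```python
from typing import Any, Dict, List


def build_upsert_pipeline(item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the two-stage update pipeline for one element (hypothetical helper)."""
    # Stage 1: keep only the elements that do NOT match on antecedents + consequents.
    not_matching = {
        "$not": [{
            "$and": [
                {"$eq": ["$$t.antecedents", item["antecedents"]]},
                {"$eq": ["$$t.consequents", item["consequents"]]},
            ]
        }]
    }
    return [
        {"$addFields": {"transactions": {
            "$filter": {"input": "$transactions", "as": "t", "cond": not_matching}
        }}},
        # Stage 2: add the new (or replacement) element to what is left.
        {"$addFields": {"transactions": {"$setUnion": ["$transactions", [item]]}}},
    ]
```

Usage, assuming a PyMongo database object db: db.collection.update_many({"software_house": "k1", "client_id": "1234"}, build_upsert_pipeline(item)).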
Native MongoDB query:
db.collection.updateMany(
  {
    // the criteria you want to match outside the array
    "software_house": "k1",
    "client_id": "1234"
  },
  [
    {
      "$addFields": {
        "transactions": {
          "$filter": {
            "input": "$transactions",
            "as": "t",
            "cond": {
              $not: [{
                $and: [
                  // the criteria you want to match in the array
                  { $eq: ["$$t.antecedents", 12345] },
                  { $eq: ["$$t.consequents", "015896018"] }
                ]
              }]
            }
          }
        }
      }
    },
    {
      "$addFields": {
        "transactions": {
          "$setUnion": [
            "$transactions",
            [
              // the entry you want to upsert
              {
                "antecedents": 12345,
                "consequents": "015896018",
                "antecedent support": -1
              }
            ]
          ]
        }
      }
    }
  ]
)
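To check the intended behaviour without a server, here is a plain-Python model of what the two stages do to one document. It is an illustration only (upsert_transaction is a hypothetical name), it assumes the transactions array exists, and it appends the new element at the end, as $concatArrays would, whereas $setUnion does not guarantee order.

```python
from typing import Any, Dict


def upsert_transaction(doc: Dict[str, Any], new_item: Dict[str, Any]) -> Dict[str, Any]:
    """Plain-Python model of the $filter + union update (no MongoDB involved)."""
    key = (new_item["antecedents"], new_item["consequents"])
    # Stage 1 ($filter): drop the element with the same antecedents/consequents.
    kept = [
        t for t in doc["transactions"]
        if (t.get("antecedents"), t.get("consequents")) != key
    ]
    # Stage 2: add the new element, so it either replaces the old one or is appended.
    return {**doc, "transactions": kept + [new_item]}


doc = {
    "software_house": "k1",
    "client_id": "1234",
    "transactions": [
        {"antecedents": 12345, "consequents": "015896018",
         "antecedent support": 0.0030889166727954697},
        {"antecedents": "932696735", "consequents": "939605046",
         "antecedent support": 0.0012502757961314996},
    ],
}
replaced = upsert_transaction(
    doc, {"antecedents": 12345, "consequents": "015896018", "antecedent support": -1})
appended = upsert_transaction(
    doc, {"antecedents": 1, "consequents": "2", "antecedent support": 0.1})
```

The first call overwrites the matching element; the second finds no match and appends.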

Related

MongoDB $filter nested array by date does not work

I have a document with a nested array which looks like this:
[
{
"id": 1,
data: [
[
ISODate("2000-01-01T00:00:00Z"),
2,
3
],
[
ISODate("2000-01-03T00:00:00Z"),
2,
3
],
[
ISODate("2000-01-05T00:00:00Z"),
2,
3
]
]
},
{
"id": 2,
data: []
}
]
As you can see, we have an array of arrays. For each element in the data array, the first element is a date.
I want to create an aggregation pipeline that keeps only the elements of data whose date is later than a given date.
db.collection.aggregate([
{
"$match": {
"id": 1
}
},
{
"$project": {
"data": {
"$filter": {
"input": "$data",
"as": "entry",
"cond": {
"$gt": [
"$$entry.0",
ISODate("2000-01-04T00:00:00Z")
]
}
}
}
}
}
])
The problem is that with $gt, this just returns an empty array for data. With $lt this returns all elements. So the filtering clearly does not work.
Expected result:
[
{
"id": 1,
"data": [
[
ISODate("2000-01-05T00:00:00Z"),
2,
3
]
]
}
]
Any ideas?
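I believe the issue is that, in aggregation expressions, a path such as $$entry.0 does not index into the array: it refers to the variable entry and looks for a field named "0" in each of its elements. The inner arrays contain no such field, so the expression evaluates to an empty array, and because arrays sort before dates in MongoDB's BSON comparison order, $gt is always false and $lt always true. You can use the $first array operator (MongoDB 4.4+) to get the first element instead: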
db.collection.aggregate([
{
"$match": {
"id": 1
}
},
{
"$project": {
"data": {
"$filter": {
"input": "$data",
"as": "entry",
"cond": {
"$gt": [
{
$first: "$$entry"
},
ISODate("2000-01-04T00:00:00Z")
]
}
}
}
}
}
])
I don't think $$entry.0 works to get the first element of the array. Use the $arrayElemAt operator instead.
db.collection.aggregate([
{
"$match": {
"id": 1
}
},
{
"$project": {
"data": {
"$filter": {
"input": "$data",
"as": "entry",
"cond": {
"$gt": [
{
"$arrayElemAt": [
"$$entry",
0
]
},
ISODate("2000-01-04T00:00:00Z")
]
}
}
}
}
}
])
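If you run this from PyMongo instead of the shell, ISODate(...) is written as a datetime.datetime (naive datetimes are treated as UTC). A sketch, assuming the same collection layout; after_date_pipeline and filter_after are hypothetical names, and the second function is only a plain-Python model of the $filter condition for checking expectations.

```python
from datetime import datetime
from typing import Any, Dict, List


def after_date_pipeline(doc_id: int, cutoff: datetime) -> List[Dict[str, Any]]:
    """PyMongo form of the $arrayElemAt answer; the datetime becomes a BSON date."""
    return [
        {"$match": {"id": doc_id}},
        {"$project": {"data": {"$filter": {
            "input": "$data",
            "as": "entry",
            # compare the first element of each inner array with the cutoff
            "cond": {"$gt": [{"$arrayElemAt": ["$$entry", 0]}, cutoff]},
        }}}},
    ]


def filter_after(data: List[list], cutoff: datetime) -> List[list]:
    """Plain-Python model of the same condition: keep entries whose first item is later."""
    return [entry for entry in data if entry[0] > cutoff]
```

Usage: db.collection.aggregate(after_date_pipeline(1, datetime(2000, 1, 4))).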
To specify which element of the array you are comparing, it is better to use $arrayElemAt instead of $$ARRAY.0. $arrayElemAt takes two arguments: the first is the array, which in your case is $$entry, and the second is the index, which in your case is 0.
This is the solution I came up with:
db.collection.aggregate([
{
"$match": {
"id": 1
}
},
{
"$project": {
"data": {
"$filter": {
"input": "$data",
"as": "entry",
"cond": {
"$gt": [
{
"$arrayElemAt": [
"$$entry",
0
]
},
ISODate("2000-01-04T00:00:00Z")
]
}
}
}
}
}
])

MongoDB how to filter in nested array

I have the data below. In the inner array that belongs to name=name2, I want to keep only value=v2 (removing the other values that are not equal to v2). How do I write an aggregation for this? The hard part for me is filtering only the nestedArray that belongs to name=name2.
{
"_id": 1,
"array": [
{
"name": "name1",
"nestedArray": [
{
"value": "v1"
},
{
"value": "v2"
}
]
},
{
"name": "name2",
"nestedArray": [
{
"value": "v1"
},
{
"value": "v2"
}
]
}
]
}
The desired output is below. Note that value=v1 remains under name=name1, while value=v1 under name=name2 is removed.
{
"_id": 1,
"array": [
{
"name": "name1",
"nestedArray": [
{
"value": "v1"
},
{
"value": "v2"
}
]
},
{
"name": "name2",
"nestedArray": [
{
"value": "v2"
}
]
}
]
}
You can try:
$set to update the array field,
$map to iterate over the elements of the array field,
$cond to check whether name is name2, and if so $filter to keep only the documents in nestedArray whose value is v2,
$mergeObjects to merge the result with the existing object.
let name = "name2", value = "v2";
db.collection.aggregate([
{
$set: {
array: {
$map: {
input: "$array",
in: {
$mergeObjects: [
"$$this",
{
$cond: [
{ $eq: ["$$this.name", name] }, //name add here
{
nestedArray: {
$filter: {
input: "$$this.nestedArray",
cond: { $eq: ["$$this.value", value] } //value add here
}
}
},
{}
]
}
]
}
}
}
}
}
])
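The same per-element logic, modelled in plain Python so the expected output can be checked without a server (an illustration only; filter_nested is a hypothetical name):

```python
from typing import Any, Dict


def filter_nested(doc: Dict[str, Any], name: str, value: str) -> Dict[str, Any]:
    """Model of the $map + $cond + $filter + $mergeObjects answer."""
    new_array = []
    for item in doc["array"]:
        if item["name"] == name:
            # $filter: keep only nested entries whose value matches
            kept = [n for n in item["nestedArray"] if n["value"] == value]
            # $mergeObjects: overwrite nestedArray, keep the item's other fields
            new_array.append({**item, "nestedArray": kept})
        else:
            # the $cond else-branch merges {}, so the element is unchanged
            new_array.append(item)
    return {**doc, "array": new_array}
```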
You can use the following aggregation query:
db.collection.aggregate([
{
$project: {
"array": {
"$concatArrays": [
{
"$filter": {
"input": "$array",
"as": "array",
"cond": {
"$ne": [
"$$array.name",
"name2"
]
}
}
},
{
"$filter": {
"input": {
"$map": {
"input": "$array",
"as": "array",
"in": {
"name": "$$array.name",
"nestedArray": {
"$filter": {
"input": "$$array.nestedArray",
"as": "nestedArray",
"cond": {
"$eq": [
"$$nestedArray.value",
"v2"
]
}
}
}
}
}
},
"as": "array",
"cond": {
"$eq": [
"$$array.name",
"name2"
]
}
}
}
]
}
}
}
])
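One consequence of this approach, shown with a plain-Python model (concat_arrays_version is a hypothetical name, no MongoDB involved): $concatArrays puts all non-name2 elements first and the rebuilt name2 elements after them, so the original order is kept only when the name2 elements are already last, and the $map keeps only name and nestedArray for the name2 elements.

```python
from typing import Any, Dict


def concat_arrays_version(doc: Dict[str, Any], name: str, value: str) -> Dict[str, Any]:
    """Model of the $concatArrays answer."""
    # first $filter: every element whose name is not `name`, unchanged
    others = [item for item in doc["array"] if item["name"] != name]
    # $map + $filter: rebuild elements as {name, filtered nestedArray}, keep only `name`
    matched = [
        {"name": item["name"],
         "nestedArray": [n for n in item["nestedArray"] if n["value"] == value]}
        for item in doc["array"]
        if item["name"] == name
    ]
    # $concatArrays: the `name` elements always end up last
    return {**doc, "array": others + matched}
```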

Using $elemMatch and $or to implement a fallback logic (in projection)

db.projects.findOne({"_id": "5CmYdmu2Aanva3ZAy"},
{
"responses": {
"$elemMatch": {
"match.nlu": {
"$elemMatch": {
"intent": "intent1",
"$and": [
{
"$or": [
{
"entities.entity": "entity1",
"entities.value": "value1"
},
{
"entities.entity": "entity1",
"entities.value": {
"$exists": false
}
}
]
}
],
"entities.1": {
"$exists": false
}
}
}
}
}
})
In a given project I need a projection containing only one response, hence $elemMatch. Ideally, look for an exact match:
{
"entities.entity": "entity1",
"entities.value": "value1"
}
But if such a match doesn't exist, look for a record where entities.value does not exist.
The query above doesn't work because, if it finds an item with entities.value not set, it returns that item. How can I get this fallback logic in a Mongo query?
Here is an example document:
{
"_id": "5CmYdmu2Aanva3ZAy",
"responses": [
{
"match": {
"nlu": [
{
"entities": [],
"intent": "intent1"
}
]
},
"key": "utter_intent1_p3vE6O_XsT"
},
{
"match": {
"nlu": [
{
"entities": [{
"entity": "entity1",
"value": "value1"
}],
"intent": "intent1"
}
]
},
"key": "utter_intent1_p3vE6O_XsT"
},
{
"match": {
"nlu": [
{
"intent": "intent2",
"entities": []
},
{
"intent": "intent1",
"entities": [
{
"entity": "entity1"
}
]
}
]
},
"key": "utter_intent2_Laag5aDZv2"
}
]
}
To answer the question, the first thing to note is that what you want is not as simple as an $elemMatch projection; it requires the projection logic of the aggregation framework. The second main principle here is that "nesting arrays is a really bad idea", and this is exactly why:
db.collection.aggregate([
{ "$match": { "_id": "5CmYdmu2Aanva3ZAy" } },
{ "$addFields": {
"responses": {
"$filter": {
"input": {
"$map": {
"input": "$responses",
"in": {
"match": {
"nlu": {
"$filter": {
"input": {
"$map": {
"input": "$$this.match.nlu",
"in": {
"entities": {
"$let": {
"vars": {
"entities": {
"$filter": {
"input": "$$this.entities",
"cond": {
"$and": [
{ "$eq": [ "$$this.entity", "entity1" ] },
{ "$or": [
{ "$eq": [ "$$this.value", "value1" ] },
{ "$ifNull": [ "$$this.value", false ] }
]}
]
}
}
}
},
"in": {
"$cond": {
"if": { "$gt": [{ "$size": "$$entities" }, 1] },
"then": {
"$slice": [
{ "$filter": {
"input": "$$entities",
"cond": { "$eq": [ "$$this.value", "value1" ] }
}},
1
]
},
"else": "$$entities"
}
}
}
},
"intent": "$$this.intent"
}
}
},
"cond": { "$ne": [ "$$this.entities", [] ] }
}
}
},
"key": "$$this.key"
}
}
},
"cond": { "$ne": [ "$$this.match.nlu", [] ] }
}
}
}}
])
Will return:
{
"_id" : "5CmYdmu2Aanva3ZAy",
"responses" : [
{
"match" : {
"nlu" : [
{
"entities" : [
{
"entity" : "entity1",
"value" : "value1"
}
],
"intent" : "intent1"
}
]
},
"key" : "utter_intent1_p3vE6O_XsT"
}
]
}
That extracts (as best I can determine your specification) the first matching element from the nested inner array of entities where the conditions for both entity and value are met, OR where the value property does not exist.
Note the additional fallback: if both conditions match multiple array elements, only the first match where the value is present and matching is returned.
Querying deeply nested arrays requires chained usage of $map and $filter to traverse those array contents and return only the items which match the conditions. You cannot specify these conditions in an $elemMatch projection, nor, until recent releases of MongoDB, was it even possible to atomically update such structures without overwriting significant parts of the document or introducing problems with update concurrency.
More detailed explanation of this is on my existing answer to Updating a Nested Array with MongoDB and from the query side on Find in Double Nested Array MongoDB.
Note that both responses there show usage of $elemMatch as a "query" operator, which is really only about "document selection" (therefore it does not apply to an _id match condition) and cannot be used in concert with the former "projection" variant nor the positional $ projection operator.
You would be advised then to "not nest arrays" and instead take the option of "flatter" data structures as those answers already discuss at length.
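If the aggregation becomes unwieldy, one alternative (a swapped-in technique, not what this answer does) is to fetch the document and apply the fallback in client code. This Python sketch assumes the document shape from the question; pick_response is a hypothetical name, and it selects a whole response, first by an exact entity/value match and then by a matching entity with no value, requiring exactly one entity as the question's "entities.1": {"$exists": false} does.

```python
from typing import Any, Dict, List, Optional


def pick_response(responses: List[Dict[str, Any]], intent: str, entity: str,
                  value: str) -> Optional[Dict[str, Any]]:
    """Client-side fallback: exact entity/value match first, else entity without value."""

    def matches(nlu: List[Dict[str, Any]], exact: bool) -> bool:
        for item in nlu:
            entities = item.get("entities", [])
            # same intent and exactly one entity
            if item.get("intent") != intent or len(entities) != 1:
                continue
            ent = entities[0]
            if ent.get("entity") != entity:
                continue
            if exact and ent.get("value") == value:
                return True
            if not exact and "value" not in ent:
                return True
        return False

    # first pass: exact match; second pass: fall back to a missing value
    for exact in (True, False):
        for response in responses:
            if matches(response["match"]["nlu"], exact):
                return response
    return None
```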

Return first element if no match found in array

I have the following document:
{
_id: 123,
state: "AZ",
products: [
{
product_id: 1,
desc: "P1"
},
{
product_id: 2,
desc: "P2"
}
]
}
I need to write a query to return a single element from the products array where state is "AZ" and product_id is 2. If the matching product_id is not found, then return the first (or any) element from the products array.
For example: If product_id is 2 (match found), then the result should be:
products: [
{
product_id: 2,
desc: "P2"
}
]
If the product_id is 3 (not found), then the result should be:
products: [
{
product_id: 1,
desc: "P1"
}
]
I was able to meet one condition when the match is found but not sure how to satisfy the second condition in the same query:
db.getCollection('test').find({"state": "AZ"}, {_id: 0, state: 0, products: { "$elemMatch": {"product_id": 2}}})
I tried using the aggregation pipeline as well but could not find a working solution.
Note: This is different from the following question as I need to return a default element if the match is not found:
Retrieve only the queried element in an object array in MongoDB collection
You can try the aggregation below.
Basically, you need to $filter the products array and use $cond to check whether the result is empty (equal to []); if it is, $slice the first element of the products array instead.
db.collection.aggregate([
{ "$addFields": {
"products": {
"$cond": [
{
"$eq": [
{ "$filter": {
"input": "$products",
"cond": { "$eq": ["$$this.product_id", 2] }
}},
[]
]
},
{ "$slice": ["$products", 1] },
{ "$filter": {
"input": "$products",
"cond": { "$eq": ["$$this.product_id", 2] }
}}
]
}
}}
])
Or, using the $let operator:
db.collection.aggregate([
{ "$addFields": {
"products": {
"$let": {
"vars": {
"filt": {
"$filter": {
"input": "$products",
"cond": { "$eq": ["$$this.product_id", 2] }
}
}
},
"in": {
"$cond": [
{ "$eq": ["$$filt", []] },
{ "$slice": ["$products", 1] },
"$$filt"
]
}
}
}
}}
])
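Both versions implement the same "matches, else the first element" rule. A plain-Python model of that rule (product_or_first is a hypothetical name; no MongoDB involved):

```python
from typing import Any, Dict, List


def product_or_first(products: List[Dict[str, Any]], product_id: int) -> List[Dict[str, Any]]:
    """Model of the $filter/$cond/$slice answer."""
    filt = [p for p in products if p["product_id"] == product_id]  # $filter
    # $cond: if the filtered array is empty, fall back to $slice: ["$products", 1]
    return products[:1] if filt == [] else filt
```

With an empty products array, the fallback is also empty, just as $slice of an empty array is.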
If you don't care which element you get back, this is the way to go (you'll get the last element of the array when there is no match, since $indexOfArray returns -1 and $arrayElemAt treats a negative index as counting from the end):
db.getCollection('test').aggregate([{
$addFields: {
"products": {
$arrayElemAt: [ "$products", { $indexOfArray: [ "$products.product_id", 2 ] } ]
},
}
}])
If you want the first then do this instead ($max will take care of transforming -1 into index 0 which is the first element):
db.getCollection('test').aggregate([{
$addFields: {
"products": {
$arrayElemAt: [ "$products", { $max: [ 0, { $indexOfArray: [ "$products.product_id", 2 ] } ] } ]
},
}
}])
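The -1 behaviour of both variants can be modelled in plain Python, where a negative list index also counts from the end. A sketch with hypothetical names; it assumes products is non-empty (with an empty array, $arrayElemAt yields no value, while this model raises IndexError).

```python
from typing import Any, Dict, List


def index_of_array(values: List[Any], target: Any) -> int:
    """Like $indexOfArray: first index of target, or -1 if absent."""
    return values.index(target) if target in values else -1


def elem_for(products: List[Dict[str, Any]], product_id: int, clamp: bool) -> Dict[str, Any]:
    idx = index_of_array([p["product_id"] for p in products], product_id)
    if clamp:
        idx = max(0, idx)  # the $max: [0, ...] variant
    # $arrayElemAt: a negative index counts from the end, so -1 is the last element
    return products[idx]
```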
Here is a version that should work on v3.2 as well:
db.getCollection('test').aggregate([{
"$project": {
"products": {
$slice: [{
$concatArrays: [{
$filter: {
"input": "$products",
"cond": { "$eq": ["$$this.product_id", 2] }
}},
"$products" // simply append the "products" array
// alternatively, you could append only the first or a specific item like this [ { $arrayElemAt: [ "$products", 0 ] } ]
]
},
1 ] // take first element only
}
}
}])

Mongo Query to Return only a subset of SubDocuments

Using the example from the Mongo docs:
{ _id: 1, results: [ { product: "abc", score: 10 }, { product: "xyz", score: 5 } ] }
{ _id: 2, results: [ { product: "abc", score: 8 }, { product: "xyz", score: 7 } ] }
{ _id: 3, results: [ { product: "abc", score: 7 }, { product: "xyz", score: 8 } ] }
db.survey.find(
{ id: 12345, results: { $elemMatch: { product: "xyz", score: { $gte: 6 } } } }
)
How do I return survey 12345 (regardless of whether it has any matching results) but only return the results with a score greater than 6? In other words, I don't want the document disqualified from the results based on the subdocument; I want the document, but only a subset of its subdocuments.
What you are asking for is not so much a "query" but is basically just a filtering of content from the array in each document.
You do this with .aggregate() and $project:
db.survey.aggregate([
{ "$project": {
"results": {
"$setDifference": [
{ "$map": {
"input": "$results",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$eq": [ "$$el.product", "xyz" ] },
{ "$gte": [ "$$el.score", 6 ] }
]},
"$$el",
false
]
}
}},
[false]
]
}
}}
])
So rather than "constraining" results to documents that have an array member matching the condition, all this does is "filter" out the array members that do not match the condition, returning the document with an empty array if need be.
The fastest present way to do this is with $map to inspect all elements and $setDifference to filter out any values of false returned from that inspection. The possible downside is a "set" must contain unique elements, so this is fine as long as the elements themselves are unique.
Future releases will have a $filter operator, which is similar to $map in structure but directly removes non-matching results, whereas $map just returns them (via the $cond, as either the matching element or false); $filter is therefore better suited.
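The uniqueness caveat can be seen in a plain-Python model of the two approaches (hypothetical function names, no MongoDB involved). The model keeps input order for simplicity, whereas $setDifference does not guarantee any order.

```python
from typing import Any, Dict, List


def map_then_set_difference(results: List[Dict[str, Any]], product: str,
                            min_score: int) -> List[Dict[str, Any]]:
    """Model of $map + $setDifference [false]: non-matches become False, then set rules apply."""
    mapped = [
        el if el["product"] == product and el["score"] >= min_score else False
        for el in results
    ]
    out: List[Dict[str, Any]] = []
    for value in mapped:
        # drop the False markers and, as a set operation would, any duplicates
        if value is not False and value not in out:
            out.append(value)
    return out


def filter_results(results: List[Dict[str, Any]], product: str,
                   min_score: int) -> List[Dict[str, Any]]:
    """Model of $filter: keeps every matching element, duplicates included."""
    return [el for el in results if el["product"] == product and el["score"] >= min_score]
```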
Otherwise, if the elements are not unique or the MongoDB server version is earlier than 2.6, you have to do this with $unwind, in a non-performant way:
db.survey.aggregate([
{ "$unwind": "$results" },
{ "$group": {
"_id": "$_id",
"results": { "$push": "$results" },
"matched": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$results.product", "xyz" ] },
{ "$gte": [ "$results.score", 6 ] }
]},
1,
0
]
}
}
}},
{ "$unwind": "$results" },
{ "$match": {
"$or": [
{
"results.product": "xyz",
"results.score": { "$gte": 6 }
},
{ "matched": 0 }
]
}},
{ "$group": {
"_id": "$_id",
"results": { "$push": "$results" },
"matched": { "$first": "$matched" }
}},
{ "$project": {
"results": {
"$cond": [
{ "$ne": [ "$matched", 0 ] },
"$results",
[]
]
}
}}
])
Which is pretty horrible in both design and performance. As such, you are probably better off doing the filtering per document in client code instead.
You can use $filter in MongoDB 3.2 and later:
db.survey.aggregate([{
$match: {
id: 12345
}
}, {
$project: {
results: {
$filter: {
input: "$results",
as: "results",
cond:{$gt: ['$$results.score', 6]}
}
}
}
}]);
It will return all the subdocuments that have a score greater than 6. If you want to return only the first matching subdocument, you can use the positional '$' projection operator with find().
You can use $redact in this way:
db.survey.aggregate( [
{ $match : { _id : 12345 }},
{ $redact: {
$cond: {
if: {
$or: [
{ $eq: [ "$_id", 12345 ] },
{ $and: [
{ $eq: [ "$product", "xyz" ] },
{ $gte: [ "$score", 6 ] }
]}
]
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}
] );
It will $match by _id: 12345 first, and then "$$PRUNE" every subdocument that does not have both "product": "xyz" and a score greater than or equal to 6. I added the condition ($cond) { $eq: [ "$_id", 12345 ] } so that it wouldn't prune the whole document before it reaches the subdocuments.
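A rough plain-Python model of how the $$DESCEND/$$PRUNE decision is applied level by level (an illustration only, not MongoDB's implementation; redact and keep are hypothetical names). It also shows why the _id branch is needed: without it, the top-level document fails the product/score test and is pruned before its subdocuments are reached.

```python
from typing import Any, Callable, Dict, Optional

Decision = Callable[[Dict[str, Any]], bool]  # True -> $$DESCEND, False -> $$PRUNE


def redact(doc: Dict[str, Any], descend: Decision) -> Optional[Dict[str, Any]]:
    """Model of $redact with only $$DESCEND / $$PRUNE outcomes."""
    if not descend(doc):
        return None  # $$PRUNE: drop this (sub)document entirely
    out: Dict[str, Any] = {}
    for key, val in doc.items():
        if isinstance(val, dict):
            sub = redact(val, descend)
            if sub is not None:
                out[key] = sub
        elif isinstance(val, list):
            # documents inside arrays are evaluated one by one; other values are kept
            kept = []
            for x in val:
                if isinstance(x, dict):
                    sub = redact(x, descend)
                    if sub is not None:
                        kept.append(sub)
                else:
                    kept.append(x)
            out[key] = kept
        else:
            out[key] = val
    return out


def keep(d: Dict[str, Any]) -> bool:
    # same condition as the answer: the top-level document, or xyz with score >= 6
    score = d.get("score")
    return d.get("_id") == 12345 or (
        d.get("product") == "xyz" and isinstance(score, (int, float)) and score >= 6
    )
```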