Exclude nested documents based on a condition - mongodb

I have a document in the following form:
{
"_id": "5c9a53b348a0ac000140b5f9",
"e": [
{
"_id": "d6c74cd5-5808-4b0c-b857-57ddbcc72ce5",
"isDeleted": true
},
{
"_id": "d6c74cd5-5808-4b0c-b857-57ddbcc72ce6",
"isDeleted": false
}
]
}
Every document has a list of elements, each of which may or may not be deleted. By default, I don't want to return the deleted data. Right now I filter them server-side, but that still means a lot of data gets transmitted unnecessarily. Is it possible to exclude this data in the database?
I've looked at $elemMatch, but that only returns a single element, so it doesn't look like the right tool for the job.
Is there a way to project a document with an array of nested documents to only include those subdocuments that adhere to a certain condition?

You can use the $filter aggregation operator here. It keeps only the array elements for which cond evaluates to true:
db.collection.aggregate([
{ "$addFields": {
"e": {
"$filter": {
"input": "$e",
// keep only the elements that are not flagged as deleted
"cond": { "$ne": ["$$this.isDeleted", true] }
}
}
}}
])
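As a rough mental model of what that stage does to each document (not how the server executes it), here is the same transformation written as a plain JavaScript function. The name withoutDeleted is hypothetical. Elements without an isDeleted field are treated as not deleted, which is what a `$ne: true` condition does.

```javascript
// Hypothetical helper: mimics, for one document,
//   { $addFields: { e: { $filter: { input: "$e", cond: { $ne: ["$$this.isDeleted", true] } } } } }
// It returns a new object and leaves the input untouched.
function withoutDeleted(doc) {
  // Simplification: documents without an "e" array are returned as a shallow copy.
  if (!Array.isArray(doc.e)) return { ...doc };
  // Keep an element unless it is explicitly flagged as deleted.
  return { ...doc, e: doc.e.filter((el) => el.isDeleted !== true) };
}

const sample = {
  _id: "5c9a53b348a0ac000140b5f9",
  e: [
    { _id: "d6c74cd5-5808-4b0c-b857-57ddbcc72ce5", isDeleted: true },
    { _id: "d6c74cd5-5808-4b0c-b857-57ddbcc72ce6", isDeleted: false },
  ],
};

console.log(withoutDeleted(sample).e.map((el) => el._id));
// → [ 'd6c74cd5-5808-4b0c-b857-57ddbcc72ce6' ]
```

Filtering on the server like this means the deleted elements never cross the network. The per-document work is done by MongoDB instead of the application.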

Related

Mongodb - aggregate match subarray

I'm trying to match data in a subarray. For some reason it is grouped like this.
Data:
{
"_id": 1,
"addresDetails": [
[
{
"Name":"John",
"Place":"Berlin",
"Pincode":"10001"
},
{
"Name":"Sarah",
"Place":"Newyork",
"Pincode":"10002"
}
],
[
{
"Name":"Mark",
"Place":"Tokyo",
"Pincode":"10003"
},
{
"Name":"Michael",
"Place":"Newyork",
"Pincode":"10002"
}
]
]
}
I tried with this Match query:
{
"$match":{
"attributes":{
"$elemMatch":{
"$in":["Mark"]
}
}
}
}
I am getting "No data found". How do I match the elements in these subarrays?
Query
This is the aggregation way. In general, if you are stuck and the query or update operators don't seem to be enough, the aggregation framework provides many more operators and is a good alternative.
It uses two nested $filter expressions over the two array levels to find a Name that is in the array ["Mark"].
*There may be a shorter, more declarative way with $elemMatch, possibly one that can use an index. Also consider a schema change: maybe you don't really need an array whose members are arrays. (The query below does not use an index.)
*I used addressDetails (with two s). Your sample data spells it addresDetails, so make the field name match your actual data, otherwise the query will not find the names you expect.
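db.collection.aggregate([
  {"$match": {
    "$expr": {
      // keep the document if at least one inner array survives the outer $filter
      "$ne": [
        {"$filter": {
          "input": "$addressDetails",
          "as": "a",
          // keep an inner array if its own filtered result is not empty
          "cond": {"$ne": [
            {"$filter": {
              "input": "$$a",
              "as": "d",
              "cond": {"$in": ["$$d.Name", ["Mark"]]}
            }},
            []
          ]}
        }},
        []
      ]
    }
  }}
])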
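To make the nested logic easier to follow, here is a minimal client-side sketch of the same test for a single document. The function name is hypothetical, and the field name defaults to the sample data's spelling (addresDetails).

```javascript
// Hypothetical helper: mirrors the nested $filter / $ne logic for one document.
// Returns true when any inner array contains an element whose Name is in `names`.
function hasNameInNestedArrays(doc, names, field = "addresDetails") {
  const outer = Array.isArray(doc[field]) ? doc[field] : [];
  // Outer $filter: keep inner arrays whose own filtered result is non-empty.
  const matchingInner = outer.filter(
    (inner) =>
      Array.isArray(inner) &&
      inner.some((d) => d != null && names.includes(d.Name)) // inner $filter + $in
  );
  // Outer $ne [..., []]: the document matches if anything survived.
  return matchingInner.length > 0;
}

const doc = {
  _id: 1,
  addresDetails: [
    [
      { Name: "John", Place: "Berlin", Pincode: "10001" },
      { Name: "Sarah", Place: "Newyork", Pincode: "10002" },
    ],
    [
      { Name: "Mark", Place: "Tokyo", Pincode: "10003" },
      { Name: "Michael", Place: "Newyork", Pincode: "10002" },
    ],
  ],
};

console.log(hasNameInNestedArrays(doc, ["Mark"])); // true
console.log(hasNameInNestedArrays(doc, ["Anna"])); // false
```

Looking up a misspelled field name such as addressDetails against this data returns false. This is the same pitfall mentioned in the note above.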
You can apparently nest $elemMatch as well, e.g.:
db.collection.find({
"addresDetails": {
$elemMatch: {
$elemMatch: {
"Name": "Mark"
}
}
}
})
This matches your document (tested in Mongo Playground), but it is probably not very efficient.
Alternatively, you can use the aggregation framework. For example, $unwind can flatten out your nested arrays and make the match easier afterwards:
db.collection.aggregate([
{
"$unwind": "$addresDetails"
},
{
"$match": {
"addresDetails.Name": "Mark"
}
}
])
But $unwind is usually not preferred as the first stage of an aggregation pipeline either, again for performance reasons.
Also note that the results of these two options are different! The find() query returns the whole matching document. The $unwind version returns one document per inner array that contains "Mark", with addresDetails replaced by that inner array.
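The difference in output shape can be sketched with two small client-side models. Both function names are hypothetical. They imitate only the *shape* of each option's result, not MongoDB's execution.

```javascript
// Option 1 model: find() with nested $elemMatch returns whole matching documents unchanged.
function findWithNestedElemMatch(docs, name) {
  return docs.filter((doc) =>
    (doc.addresDetails || []).some(
      (inner) => Array.isArray(inner) && inner.some((d) => d.Name === name)
    )
  );
}

// Option 2 model: $unwind + $match emits one document per inner array that contains
// the name, with addresDetails replaced by that inner array.
function unwindThenMatch(docs, name) {
  const out = [];
  for (const doc of docs) {
    for (const inner of doc.addresDetails || []) {
      if (Array.isArray(inner) && inner.some((d) => d.Name === name)) {
        out.push({ ...doc, addresDetails: inner });
      }
    }
  }
  return out;
}

const docs = [
  {
    _id: 1,
    addresDetails: [
      [{ Name: "John", Place: "Berlin" }, { Name: "Sarah", Place: "Newyork" }],
      [{ Name: "Mark", Place: "Tokyo" }, { Name: "Michael", Place: "Newyork" }],
    ],
  },
];

console.log(findWithNestedElemMatch(docs, "Mark")[0].addresDetails.length); // 2
console.log(unwindThenMatch(docs, "Mark")[0].addresDetails.map((d) => d.Name)); // [ 'Mark', 'Michael' ]
```

The find() result still carries both inner arrays. The unwound result carries only the inner array that matched.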

Query on all nested docs inside a nested doc in MongoDB

I have a collection with documents that look like this:
{ keyA1: "stringVal",
keyA2: "stringVal",
keyA3: { keyB1: { feild1: intVal,
feild2: intVal},
keyB2: { feild1: intVal,
feild2: intVal}
}
}
Currently the [keyB1, keyB2, ...] set is 7 keys, the same for all documents in the collection. I want to query the intVals on specific fields for all keyB's. So, for example, I might want to find all documents where field2 has a value greater than 100, regardless of which keyB it falls in.
For any one specific keyB, I simply use dot notation: {"keyA3.keyB2.field2": {$gte: 100}}. Right now I have the option of looping over all keyB's, but this may not be the case in the future, when more keyB values may be added. I don't want to have to modify the code then, and I would like to avoid hardcoding those values in any way. I also need the solution to be fairly fast, as the final deployment is expected to have over 20M documents.
How can I write a query that can "skip" the keyB field in the dot notation and just go through all the embedded docs?
FWIW, I'm implementing this in python using pymongo. Thanks.
First, convert the keyA3 object to an array and add it as a new field with $addFields.
Then filter the new array to keep the entries whose feild2 value is greater than 100, and store the size of that result.
Finally, match the documents where the size of the matched array is greater than 0, then remove the extra fields we added.
db.collection.aggregate([
{
"$addFields": {
"arr": {
"$objectToArray": "$keyA3"
}
}
},
{
"$addFields": {
"matchArrSize": {
$size: {
"$filter": {
"input": "$arr",
"as": "z",
"cond": {
$gt: [
"$$z.v.feild2",
100
]
}
}
}
}
}
},
{
$match: {
matchArrSize: {
$gt: 0
}
}
},
{
$unset: [
"arr",
"matchArrSize"
]
}
])
https://mongoplayground.net/p/VumwL9y7Km1
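The same idea can be sketched in plain JavaScript. Here Object.entries plays the role of $objectToArray and produces { k, v } pairs, so no keyB name is hardcoded. The function name and sample values are hypothetical. Because the condition is computed per document, neither this sketch nor the pipeline above can use an index on the nested field.

```javascript
// Hypothetical client-side equivalent of the $objectToArray / $filter / $size pipeline.
function anyKeyBAbove(doc, field, threshold) {
  // $objectToArray: turn keyA3 into [{ k, v }, ...] without naming any keyB.
  const pairs = Object.entries(doc.keyA3 || {}).map(([k, v]) => ({ k, v }));
  // $filter: keep the pairs whose value object has field > threshold.
  const matches = pairs.filter((z) => z.v != null && z.v[field] > threshold);
  // $match on matchArrSize > 0.
  return matches.length > 0;
}

const docs = [
  { keyA1: "doc1", keyA2: "x", keyA3: { keyB1: { feild1: 1, feild2: 50 }, keyB2: { feild1: 2, feild2: 150 } } },
  { keyA1: "doc2", keyA2: "y", keyA3: { keyB1: { feild1: 3, feild2: 10 }, keyB2: { feild1: 4, feild2: 99 } } },
];

console.log(docs.filter((d) => anyKeyBAbove(d, "feild2", 100)).map((d) => d.keyA1)); // [ 'doc1' ]
```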

Return only matched sub-document elements within a nested array

The main collection is retailer, which contains an array for stores. Each store contains an array of offers (you can buy in this store). This offers array has an array of sizes. (See example below)
Now I'm trying to find all offers which are available in size L.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
I've tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect output like this:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers which matched my query?
Greetings and thanks.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer, of course, is that unless you are really saving a lot of bandwidth by filtering out such detail, you should not even try, or at least not beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the outermost array.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", so every "offer" within the matched "stores" element would still be returned.
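A hypothetical client-side model of that projection makes the limitation concrete. Only the first matching store is kept, and its offers array is left untouched, including the offers that do not contain "L".

```javascript
// Hypothetical model of find({ 'stores.offers.size': 'L' }, { 'stores.$': 1 }) for one document:
// keep _id plus only the FIRST store that has an offer with the size; offers are not filtered.
function positionalStores(doc, size) {
  const first = (doc.stores || []).find((store) =>
    (store.offers || []).some((offer) => (offer.size || []).includes(size))
  );
  return first ? { _id: doc._id, stores: [first] } : null; // null: document not matched
}

const retailer = {
  _id: "r1",
  stores: [
    { _id: "s1", offers: [{ _id: "o1", size: ["XS", "S", "M"] }, { _id: "o2", size: ["S", "L", "XL"] }] },
    { _id: "s2", offers: [{ _id: "o3", size: ["L"] }] },
  ],
};

const projected = positionalStores(retailer, "L");
console.log(projected.stores.map((s) => s._id));           // [ 's1' ]
console.log(projected.stores[0].offers.map((o) => o._id)); // [ 'o1', 'o2' ]
```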
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tool MongoDB actually has for this level of manipulation is the aggregation framework. But the analysis should show you why you "probably" should not do this, and should instead just filter the array in code.
Here is how you can achieve this, by version.
First, with MongoDB 3.2.x and later, using the $filter operator:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally, in any version from MongoDB 2.2.x onwards, where the aggregation framework was introduced:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Let's break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here, since it is designed with this purpose in mind. Since there are multiple levels of the array, you need to apply it at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for?". In this logical context, the short thing to do is use the $setIsSubset operator to compare an array ("set") of ["L"] to the target array. Where that condition is true (it contains "L"), the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned or otherwise it is removed.
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter in this version you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false] all false elements in the returned array would be removed.
In all other ways, the logic is the same as above.
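The 2.6 idiom can be modelled in two steps. This is a hypothetical client-side sketch, not MongoDB's implementation. It also shows a side effect worth knowing: $setDifference has set semantics, so duplicate surviving elements collapse into one. An element whose real value is false would be dropped too.

```javascript
// Hypothetical model of:
//   { $setDifference: [ { $map: { input, as: "el", in: { $cond: [test, "$$el", false] } } }, [false] ] }
function mapThenSetDifference(input, test) {
  // Step 1 ($map + $cond): replace every non-matching element with false.
  const mapped = input.map((el) => (test(el) ? el : false));
  // Step 2 ($setDifference with [false]): drop the false markers and, as a set
  // operation, collapse duplicates. JSON.stringify is a crude stand-in for BSON equality.
  // This sketch keeps first-occurrence order; MongoDB does not guarantee any order.
  const result = [];
  for (const el of mapped) {
    const seen = result.some((r) => JSON.stringify(r) === JSON.stringify(el));
    if (el !== false && !seen) result.push(el);
  }
  return result;
}

console.log(mapThenSetDifference([1, 5, 5, 7, 2], (x) => x > 3)); // [ 5, 7 ]
```

This is why the 2.6 approach is only safe when the filtered elements are unique, for example offers that each carry a distinct _id.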
MongoDB 2.2.x and up
So below MongoDB 2.6, the only tool for working with arrays is $unwind, and you should not use the aggregation framework "just" for this filtering purpose.
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it still has its purpose, its main intent is not this sort of per-document filtering. In fact, in modern releases its only use should be when an element of the array(s) needs to become part of the "grouping key" itself.
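To see what the $unwind / $match / $group / $group round trip does, here is a hypothetical client-side model of it. Note two consequences. Stores without a matching offer disappear entirely, and any other top-level fields are lost unless they are carried through the $group stages. This sketch keys groups by plain string _id values. Real ObjectIds would need converting to strings first.

```javascript
// Hypothetical client-side model of the $unwind / $unwind / $match / $group / $group pipeline.
function unwindFilterRegroup(docs, size) {
  // $unwind "$stores" then $unwind "$stores.offers": one row per (doc, store, offer).
  const rows = [];
  for (const doc of docs)
    for (const store of doc.stores || [])
      for (const offer of store.offers || [])
        rows.push({ docId: doc._id, storeId: store._id, offer });

  // $match on the unwound offer.
  const matched = rows.filter((r) => (r.offer.size || []).includes(size));

  // First $group: key { _id, storeId }, $push the offers (Map keeps insertion order).
  const byStore = new Map();
  for (const r of matched) {
    const key = JSON.stringify([r.docId, r.storeId]);
    if (!byStore.has(key)) byStore.set(key, { docId: r.docId, storeId: r.storeId, offers: [] });
    byStore.get(key).offers.push(r.offer);
  }

  // Second $group: key _id, $push { _id: storeId, offers }.
  const byDoc = new Map();
  for (const g of byStore.values()) {
    if (!byDoc.has(g.docId)) byDoc.set(g.docId, { _id: g.docId, stores: [] });
    byDoc.get(g.docId).stores.push({ _id: g.storeId, offers: g.offers });
  }
  return [...byDoc.values()];
}

const retailers = [
  {
    _id: "r1",
    stores: [
      { _id: "s1", offers: [{ _id: "o1", size: ["XS", "S", "M"] }, { _id: "o2", size: ["S", "L", "XL"] }] },
      { _id: "s2", offers: [{ _id: "o3", size: ["M"] }] },
    ],
  },
];

console.log(JSON.stringify(unwindFilterRegroup(retailers, "L")));
// [{"_id":"r1","stores":[{"_id":"s1","offers":[{"_id":"o2","size":["S","L","XL"]}]}]}]
```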
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general, though, those listings still have some complexity to them. Unless you are drastically reducing the content returned by such filtering, in a way that significantly improves the bandwidth used between the server and client, you are better off filtering the result of the initial query and basic projection in code.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated, the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}
As your array is embedded, we cannot use $elemMatch. Instead, you can use the aggregation framework to get your results:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
What this query does is unwind the arrays (twice), match the size, and then reshape the document back to its previous form. You can remove the $group stages to see what the intermediate output looks like.
Have fun!
It also works without aggregate, since MongoDB 4.4 and later accept aggregation expressions in find() projections.
Here is the solution link: https://mongoplayground.net/p/Q5lxPvGK03A
db.collection.find({
"stores.offers.size": "L"
},
{
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [
[
"L"
],
"$$offer.size"
]
}
}
}
}
}
},
"as": "store",
"cond": {
"$ne": [
"$$store.offers",
[]
]
}
}
}
})

Mongodb Aggregate a $slice to get an element in exact position from a nested array

I would like to retrieve a value from a nested array where it exists at an exact position within the array.
I want to create name-value pairs by doing $slice[0,1] for the name and then $slice[1,1] for the value.
Before I attempt to use aggregate, I want to attempt a find within a nested array. I can do what I want on a single depth array in a document as shown below:
{
"_id" : ObjectId("565cc5261506995581569439"),
"a" : [
4,
2,
8,
71,
21
]
}
I apply the following: db.getCollection('anothertest').find({},{_id:0, a: {$slice:[0,1]}})
and I get:
{
"a" : [
4
]
}
This is fantastic. However, what if the array I want to $slice [0,1] is located within the document at objectRawOriginData.Reports.Rows.Rows.Cells?
If I can first of all FIND then I want to apply the same as an AGGREGATE.
Your best bet here, especially if your application is not yet ready for release, is to hold off deployment until MongoDB 3.2, or at least start working with a release candidate in the interim. The main reason is that the projection form of $slice does not work with the aggregation framework, and neither do the other array-matching projection forms. This has been addressed for the upcoming release.
That release adds a couple of new aggregation operators, $slice and $arrayElemAt, which can be used to address array elements by position in the aggregation pipeline.
Either:
db.getCollection('anothertest').aggregate([
{ "$project": {
"_id": 0,
"a": { "$slice": ["$a",0,1] }
}}
])
Which returns the familiar:
{ "a" : [ 4 ] }
Or:
db.getCollection('anothertest').aggregate([
{ "$project": {
"_id": 0,
"a": { "$arrayElemAt": ["$a", 0] }
}}
])
Which is just the element and not an array:
{ "a" : 4 }
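Here is a hypothetical client-side model of the positional rules of the two new operators. It covers the three-argument $slice form and $arrayElemAt, including negative positions counted from the end. Error handling for invalid arguments is omitted, and n is assumed to be positive. The cells array at the end is made-up data for the name-value idea in the question.

```javascript
// Hypothetical model of { $slice: [array, position, n] }.
function aggSlice(arr, position, n) {
  // A negative position counts back from the end, clamped to the start of the array.
  const start = position < 0 ? Math.max(arr.length + position, 0) : position;
  // A start past the end simply yields an empty array.
  return arr.slice(start, start + n);
}

// Hypothetical model of { $arrayElemAt: [array, idx] }.
function aggArrayElemAt(arr, idx) {
  const i = idx < 0 ? arr.length + idx : idx;
  // Out of range: MongoDB omits the field; undefined is the closest JS analogue.
  return i >= 0 && i < arr.length ? arr[i] : undefined;
}

const a = [4, 2, 8, 71, 21];
console.log(aggSlice(a, 0, 1));     // [ 4 ]
console.log(aggArrayElemAt(a, 0));  // 4
console.log(aggArrayElemAt(a, -1)); // 21

// Name-value pair from a hypothetical two-element cells array.
const cells = ["Revenue", 1200];
console.log({ name: aggArrayElemAt(cells, 0), value: aggArrayElemAt(cells, 1) });
// { name: 'Revenue', value: 1200 }
```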
Until that release becomes available other than in release candidate form, the currently available operators make it quite easy for the "first" element of the array:
db.getCollection('anothertest').aggregate([
{ "$unwind": "$a" },
{ "$group": {
"_id": "$_id",
"a": { "$first": "$a" }
}}
])
Through use of the $first operator after $unwind. But getting another indexed position becomes horribly iterative:
db.getCollection('anothertest').aggregate([
{ "$unwind": "$a" },
// Keeps the first element
{ "$group": {
"_id": "$_id",
"first": { "$first": "$a" },
"a": { "$push": "$a" }
}},
{ "$unwind": "$a" },
// Removes the first element
{ "$redact": {
"$cond": {
"if": { "$ne": [ "$first", "$a" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Top is now the second element
{ "$group": {
"_id": "$_id",
"second": { "$first": "$a" }
}}
])
And so on, and also a lot of handling to alter that to deal with arrays that might be shorter than the "nth" element you are looking for. So "possible", but ugly and not performant.
Also note that this is "not really" working with "indexed positions", and is purely matching on values. So duplicate values would easily be removed, unless there was another unique identifier per array element to work with. In the upcoming release, $unwind can also project an array index (includeArrayIndex), which is handy for other purposes, but the new operators are more useful for this specific case than that feature.
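The duplicate-value caveat is easy to demonstrate with a hypothetical client-side model of the $redact trick. It removes the first element by *value*, so every copy of that value goes with it. A true positional lookup does not.

```javascript
// Hypothetical model of the $unwind / $group($first) / $redact($ne) sequence above.
function secondByValue(arr) {
  if (arr.length === 0) return undefined;
  const first = arr[0];
  const rest = arr.filter((v) => v !== first); // $redact prunes every copy of `first`
  return rest[0]; // undefined when nothing is left
}

// What a positional lookup such as { $arrayElemAt: ["$a", 1] } would return.
function secondByIndex(arr) {
  return arr[1];
}

console.log(secondByValue([4, 2, 8]), secondByIndex([4, 2, 8])); // 2 2
console.log(secondByValue([4, 4, 8]), secondByIndex([4, 4, 8])); // 8 4
```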
So for my money I would wait till you had the feature available to be able to integrate this in an aggregation pipeline, or at least re-consider why you believe you need it and possibly design around it.

How to search embedded array

I want to get all matching values, using $elemMatch.
// create test data
db.foo.insert({values:[0,1,2,3,4,5,6,7,8,9]})
db.foo.find({},{
'values':{
'$elemMatch':{
'$gt':3
}
}
}) ;
My expected result is {values:[4,5,6,7,8,9]}, but the actual result is {values:[4]}.
I read the MongoDB documentation, and I understand this is the specified behavior.
How do I get multiple matching values?
Also, I use 'skip' and 'limit'.
Any idea?
Using Aggregation:
db.foo.aggregate([
{$unwind:"$values"},
{$match:{"values":{$gt:3}}},
{$group:{"_id":"$_id","values":{$push:"$values"}}}
])
You can add further filter condition in the $match, if you would like to.
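Here is a hypothetical client-side model of that pipeline. A document with no matching values disappears from the output entirely, instead of coming back with an empty array. For the 'skip' and 'limit' part of the question, $skip and $limit stages can be appended to the pipeline, ideally after a $sort, since $group does not guarantee output order.

```javascript
// Hypothetical model of [ $unwind "$values", $match { values: { $gt: threshold } }, $group/$push ].
function unwindMatchGroup(docs, threshold) {
  const groups = new Map();
  for (const doc of docs) {
    for (const v of doc.values || []) { // $unwind
      if (v > threshold) {              // $match
        if (!groups.has(doc._id)) groups.set(doc._id, []);
        groups.get(doc._id).push(v);    // $group + $push
      }
    }
  }
  return [...groups].map(([_id, values]) => ({ _id, values }));
}

const docs = [
  { _id: 1, values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] },
  { _id: 2, values: [0, 1, 2] },
];

console.log(JSON.stringify(unwindMatchGroup(docs, 3)));
// [{"_id":1,"values":[4,5,6,7,8,9]}]
```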
You can't achieve this using the $elemMatch operator since, as the MongoDB docs say:
The $elemMatch projection operator limits the contents of an array
field that is included in the query results to contain only the array
element that matches the $elemMatch condition.
Note
The elements of the array are documents.
If you look carefully at the documentation on $elemMatch or the counterpart to query of the positional $ operator then you would see that only the "first" matched element is returned by this type of "projection".
What you are looking for is actually "manipulation" of the document contents where you want to "filter" the content of the array in the document rather than return the original or "matched" element, as there can be only one match.
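As a loose analogy only (this is plain JavaScript, not MongoDB), an $elemMatch or positional $ projection behaves like Array.prototype.find, returning at most one element. The result the question wants behaves like Array.prototype.filter.

```javascript
// Analogy only, ignoring the no-match case (where the projection would omit the field).
const values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

const likeElemMatchProjection = [values.find((v) => v > 3)]; // [ 4 ]
const likeFilter = values.filter((v) => v > 3);              // [ 4, 5, 6, 7, 8, 9 ]

console.log(likeElemMatchProjection, likeFilter);
```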
For true "filtering" you need the aggregation framework, as there is more support there for document manipulation:
db.foo.aggregate([
// No point selecting documents that do not match your condition
{ "$match": { "values": { "$gt": 3 } } },
// Unwind the array to de-normalize as documents
{ "$unwind": "$values" },
// Match to "filter" the array
{ "$match": { "values": { "$gt": 3 } } },
// Group by to the array form
{ "$group": {
"_id": "$_id",
"values": { "$push": "$values" }
}}
])
Or with modern versions of MongoDB from 2.6 and onwards, where the array values are "unique" you could do this:
db.foo.aggregate([
{ "$project": {
"values": {
"$setDifference": [
{ "$map": {
"input": "$values",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el", 3 ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])