MongoDB unwind multiple empty arrays

The sample data in the database looks something like this:
{
  'data': [
    { 'Log': {
      'IP': ['8.8.8.8', '8.8.4.4'],
      'URL': ['www.google.com'],
      'Hash': ['d2a12319bf1221ce7681928cc']
    }},
    { 'Log': {
      'IP': ['1.2.3.4'],
      'URL': ['www.cnn.com'],
      'Hash': []
    }}
  ]
}
I am trying to aggregate a list of unique IP, URL and Hash values from the above list of logs. My current query looks something like this:
db.loglist.aggregate([{'$match':{'data.Log':{'$exists':true}}},
{'$unwind':'$data'},
{'$unwind':'$data.Log.URL'},
{'$unwind':'$data.Log.Hash'},
{'$unwind':'$data.Log.IP'},
{'$group':{'_id':'$ioc',
'FHList':{'$addToSet':'$data.Log.Hash'},
'URLList':{'$addToSet':'$data.Log.URL'},
'IPList':{'$addToSet':'$data.Log.IP'}}
}])
It works well if, for every log, there is at least one element in each of the three arrays. However, when an empty array appears in any one of the logs, Mongo returns empty for the whole query. I figured out from a few similar posts that this is the default behavior of $unwind. But what is the standard way to use $unwind then, so that if, say, we have no results for "Hash", we can still keep the results for "IP" and "URL"?
Thanks in advance for any answer.

The $cond operator is the main helper here, with a test to see if the array is empty, and replace it with another value to filter later:
db.loglist.aggregate([
{"$match":{"data.Log":{"$exists":true}}},
{"$unwind":"$data"},
{ "$project": {
"ioc": 1,
"data": {
"Log": {
"IP": { "$cond": [
{ "$ne": [ "$data.Log.IP", [] ] },
"$data.Log.IP",
[false]
]},
"URL": { "$cond": [
{ "$ne": [ "$data.Log.URL", [] ] },
"$data.Log.URL",
[false]
]},
"Hash": { "$cond": [
{ "$ne": [ "$data.Log.Hash", [] ] },
"$data.Log.Hash",
[false]
]}
}
}
}},
{"$unwind":"$data.Log.URL"},
{"$unwind":"$data.Log.Hash"},
{"$unwind":"$data.Log.IP"},
{"$group":{
"_id":"$ioc",
"FHList":{"$addToSet":"$data.Log.Hash"},
"URLList":{"$addToSet":"$data.Log.URL"},
"IPList":{"$addToSet":"$data.Log.IP"}
}},
{ "$project": {
"FHList":{ "$setDifference": ["$FHList", [false]] },
"URLList":{ "$setDifference": ["$URLList", [false]] },
"IPList":{ "$setDifference": ["$IPList", [false]] }
}}
])
Once the set is constructed, the unwanted placeholder value is filtered away.
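To see why the placeholder helps, the same idea can be sketched in plain JavaScript, outside of MongoDB. This is only an illustration under stated assumptions: `collectUnique`, `orPlaceholder` and the `logs` sample are hypothetical names, and nested loops stand in for the $unwind stages, a `Set` for $addToSet, and a final filter for $setDifference.

```javascript
// Plain JavaScript sketch of the placeholder technique (not MongoDB API).
// Hypothetical sample mirroring the "Log" entries from the question.
const logs = [
  { IP: ["8.8.8.8", "8.8.4.4"], URL: ["www.google.com"], Hash: ["d2a12319bf1221ce7681928cc"] },
  { IP: ["1.2.3.4"], URL: ["www.cnn.com"], Hash: [] }
];

// Replace an empty array with [false] so that "unwinding" never drops the log.
const orPlaceholder = arr => (arr.length === 0 ? [false] : arr);

function collectUnique(logs) {
  const sets = { IPList: new Set(), URLList: new Set(), FHList: new Set() };
  for (const log of logs) {
    // The nested loops play the role of the three $unwind stages:
    // an empty inner array would mean the body never runs for this log.
    for (const ip of orPlaceholder(log.IP)) {
      for (const url of orPlaceholder(log.URL)) {
        for (const hash of orPlaceholder(log.Hash)) {
          sets.IPList.add(ip);   // like $addToSet
          sets.URLList.add(url);
          sets.FHList.add(hash);
        }
      }
    }
  }
  // Like $setDifference against [false]: drop the placeholder again.
  const clean = set => [...set].filter(v => v !== false);
  return {
    IPList: clean(sets.IPList),
    URLList: clean(sets.URLList),
    FHList: clean(sets.FHList)
  };
}

console.log(collectUnique(logs));
```

Without `orPlaceholder`, the second log would contribute nothing at all, because its empty Hash array stops the innermost loop from running, which is the loop equivalent of what $unwind does to the document.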
If your MongoDB version is less than 2.6 and you do not have $setDifference, then you can filter after unwinding again, presuming that no result array is expected to be empty here:
db.loglist.aggregate([
{"$match":{"data.Log":{"$exists":true}}},
{"$unwind":"$data"},
{ "$project": {
"ioc": 1,
"data": {
"Log": {
"IP": { "$cond": [
{ "$ne": [ "$data.Log.IP", [] ] },
"$data.Log.IP",
[false]
]},
"URL": { "$cond": [
{ "$ne": [ "$data.Log.URL", [] ] },
"$data.Log.URL",
[false]
]},
"Hash": { "$cond": [
{ "$ne": [ "$data.Log.Hash", [] ] },
"$data.Log.Hash",
[false]
]}
}
}
}},
{"$unwind":"$data.Log.URL"},
{"$unwind":"$data.Log.Hash"},
{"$unwind":"$data.Log.IP"},
{"$group":{
"_id":"$ioc",
"FHList":{"$addToSet":"$data.Log.Hash"},
"URLList":{"$addToSet":"$data.Log.URL"},
"IPList":{"$addToSet":"$data.Log.IP"}
}},
{ "$unwind": "$FHList" },
{ "$match": { "FHList": { "$ne": false } }},
{ "$unwind": "$URLList" },
{ "$match": { "URLList": { "$ne": false } }},
{ "$unwind": "$IPList" },
{ "$match": { "IPList": { "$ne": false } }},
{ "$group": {
"_id": "$_id",
"FHList":{ "$addToSet":"$FHList" },
"URLList":{ "$addToSet":"$URLList" },
"IPList":{ "$addToSet":"$IPList" }
}}
])
If one of your grouped arrays could legitimately end up empty, the second form gets trickier, but it is still possible.

Related

Get property with highest value from key value pair

In MongoDB, I have documents with a structure like this:
{
_id: "123456...", // an ObjectId
name: "foobar",
classification: {
class_1: 0.45,
class_2: 0.11,
class_3: 0.44
}
}
Using the aggregation pipeline, is it possible to give me an object that contains the highest classification? So, given the above, I would like something like this as result:
{
_id: "123456...", // an ObjectId
name: "foobar",
classification: "class_1"
}
I thought I could use $unwind but the classification property is not an array.
For what it's worth: I know there will always be three properties in classification, so it's ok to hard-code the keys in the query.
You should probably note here that every technique applied is essentially based on "coercion" of the "key/value" pairs into an "array" format for comparison and extraction. So the real lesson to learn is that your document "should" in fact store this as an "array" instead. But onto the techniques.
If you have MongoDB 3.4 then you can use $objectToArray to turn the "keys" into an array so you can get the value:
Dynamic
db.collection.aggregate([
{ "$addFields": {
"classification": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$classification" },
"as": "c",
"cond": {
"$eq": [
"$$c.v",
{ "$max": {
"$map": {
"input": { "$objectToArray": "$classification" },
"as": "c",
"in": "$$c.v"
}
}}
]
}
}
},
"as": "c",
"in": "$$c.k",
}},
0
]
}
}}
])
Otherwise just do the transformation as you iterate the cursor, if you do not really need it for further aggregation. As a basic JavaScript example:
db.collection.find().map(d => Object.assign(
d,
{ classification: Object.keys(d.classification)
.filter(k => d.classification[k] === Math.max.apply(null,
Object.keys(d.classification).map(k => d.classification[k])
))[0]
}
));
And that's also the same basic logic that you apply using mapReduce if you were actually aggregating something.
Both produce:
/* 1 */
{
"_id" : "123456...",
"name" : "foobar",
"classification" : "class_1"
}
HardCoding
On the "hardcoding" case, which you say is okay, you can construct it like this with $switch, supplying $max with each of the values:
db.collection.aggregate([
{ "$addFields": {
"classification": {
"$let": {
"vars": {
"max": {
"$max": [
"$classification.class_1",
"$classification.class_2",
"$classification.class_3"
]
}
},
"in": {
"$switch": {
"branches": [
{ "case": { "$eq": [ "$classification.class_1", "$$max" ] }, "then": "class_1" },
{ "case": { "$eq": [ "$classification.class_2", "$$max" ] }, "then": "class_2" },
{ "case": { "$eq": [ "$classification.class_3", "$$max" ] }, "then": "class_3" },
]
}
}
}
}
}}
])
This in turn can be written out longer using $cond. The only real constraint then is the change to $max in MongoDB 3.2, which allowed it to take an array of arguments, as opposed to its previous role as an "accumulator only":
db.collection.aggregate([
{ "$addFields": {
"classification": {
"$let": {
"vars": {
"max": {
"$max": [
"$classification.class_1",
"$classification.class_2",
"$classification.class_3"
]
}
},
"in": {
"$cond": {
"if": { "$eq": [ "$classification.class_1", "$$max" ] },
"then": "class_1",
"else": {
"$cond": {
"if": { "$eq": [ "$classification.class_2", "$$max" ] },
"then": "class_2",
"else": "class_3"
}
}
}
}
}
}
}}
])
If you were "really" constrained then you could "force" the "max" through a separate pipeline stage using $map and $unwind on the array then $group again. This would make the operations compatible with MongoDB 2.6:
db.collection.aggregate([
{ "$project": {
"name": 1,
"classification": 1,
"max": {
"$map": {
"input": [1,2,3],
"as": "e",
"in": {
"$cond": {
"if": { "$eq": [ "$$e", 1 ] },
"then": "$classification.class_1",
"else": {
"$cond": {
"if": { "$eq": [ "$$e", 2 ] },
"then": "$classification.class_2",
"else": "$classification.class_3"
}
}
}
}
}
}
}},
{ "$unwind": "$max" },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"classification": { "$first": "$classification" },
"max": { "$max": "$max" }
}},
{ "$project": {
"name": 1,
"classification": {
"$cond": {
"if": { "$eq": [ "$classification.class_1", "$max" ] },
"then": "class_1",
"else": {
"$cond": {
"if": { "$eq": [ "$classification.class_2", "$max" ] },
"then": "class_2",
"else": "class_3"
}
}
}
}
}}
])
And going really ancient, we can instead $unwind from $const, which was (and still is) a "hidden" and undocumented operator equal in function to $literal (which is technically aliased to it in modern versions). Combined with the alternate "array" syntax of $cond as a ternary operation, this becomes compatible with all versions since the aggregation framework existed:
db.collection.aggregate([
{ "$project": {
"name": 1,
"classification": 1,
"temp": { "$const": [1,2,3] }
}},
{ "$unwind": "$temp" },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"classification": { "$first": "$classification" },
"max": {
"$max": {
"$cond": [
{ "$eq": [ "$temp", 1 ] },
"$classification.class_1",
{ "$cond": [
{ "$eq": [ "$temp", 2 ] },
"$classification.class_2",
"$classification.class_3"
]}
]
}
}
}},
{ "$project": {
"name": 1,
"classification": {
"$cond": [
{ "$eq": [ "$max", "$classification.class_1" ] },
"class_1",
{ "$cond": [
{ "$eq": [ "$max", "$classification.class_2" ] },
"class_2",
"class_3"
]}
]
}
}}
])
But it is of course possible, even if extremely messy.
You can use the $indexOfArray operator to find the position of the $max value in classification and then project the matching key. $objectToArray, available from MongoDB 3.4.4, converts the classification embedded document into an array of key/value pairs.
db.collection.aggregate([
{
"$addFields": {
"classification": {
"$let": {
"vars": {
"classificationkv": {
"$objectToArray": "$classification"
}
},
"in": {
"$let": {
"vars": {
"classificationmax": {
"$arrayElemAt": [
"$$classificationkv",
{
"$indexOfArray": [
"$$classificationkv.v",
{
"$max": "$$classificationkv.v"
}
]
}
]
}
},
"in": "$$classificationmax.k"
}
}
}
}
}
}
])
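For comparison, the same steps can be written in plain JavaScript. This is a sketch rather than MongoDB code: `highestKey` is a hypothetical helper, with `Object.entries` standing in for $objectToArray and `indexOf` for $indexOfArray. It assumes the classification object has at least one key.

```javascript
// Hypothetical helper mirroring the $objectToArray / $max / $indexOfArray steps.
// Assumes `classification` has at least one numeric property.
function highestKey(classification) {
  const kv = Object.entries(classification);   // [[key, value], ...]
  const values = kv.map(([, v]) => v);          // just the values
  const maxIndex = values.indexOf(Math.max(...values)); // first index on ties
  return kv[maxIndex][0];                       // the key at that position
}

const doc = {
  _id: "123456...",
  name: "foobar",
  classification: { class_1: 0.45, class_2: 0.11, class_3: 0.44 }
};

// Replace the embedded document with the name of its highest-valued key.
const result = Object.assign({}, doc, {
  classification: highestKey(doc.classification)
});

console.log(result);
```

As with `indexOf` here, a tie resolves to whichever maximum appears first in the key order.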
In the end, I went with a simpler solution, though not as generic as the other ones posted here. I used a switch case statement:
{'$project': {'_id': 1, 'name': 1,
'classification': {'$switch': {
'branches': [
{'case': {'$and': [{'$gt': ['$classification.class_1', '$classification.class_2']},
{'$gt': ['$classification.class_1', '$classification.class_3']}]},
'then': "class_1"},
{'case': {'$and': [{'$gt': ['$classification.class_2', '$classification.class_1']},
{'$gt': ['$classification.class_2', '$classification.class_3']}]},
'then': "class_2"},
{'case': {'$and': [{'$gt': ['$classification.class_3', '$classification.class_1']},
{'$gt': ['$classification.class_3', '$classification.class_2']}]},
'then': "class_3"}],
'default': ''}}
}}
This works for me, but the other answers might be a better option, YMMV.
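One property of the strict $gt comparisons is worth being aware of: when two classes share the highest value, none of the branches matches and the 'default' value is returned. A plain JavaScript sketch of the same branch logic shows the effect (`strictSwitch` is a hypothetical helper, not part of any MongoDB API):

```javascript
// Hypothetical helper reproducing the three strict-comparison branches.
function strictSwitch(c) {
  if (c.class_1 > c.class_2 && c.class_1 > c.class_3) return "class_1";
  if (c.class_2 > c.class_1 && c.class_2 > c.class_3) return "class_2";
  if (c.class_3 > c.class_1 && c.class_3 > c.class_2) return "class_3";
  return ""; // corresponds to the 'default' of the $switch
}

// A clear winner picks its branch.
console.log(strictSwitch({ class_1: 0.45, class_2: 0.11, class_3: 0.44 }));

// A tie for first place matches no branch and falls through to the default.
console.log(JSON.stringify(strictSwitch({ class_1: 0.4, class_2: 0.4, class_3: 0.2 })));
```

If ties can occur in the data, using $gte in the comparisons would make the first tied class win instead of falling through.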

Return Sub-document only when matched but keep empty arrays

I have a collection set with documents like :
{
"_id": ObjectId("57065ee93f0762541749574e"),
"name": "myName",
"results" : [
{
"_id" : ObjectId("570e3e43628ba58c1735009b"),
"color" : "GREEN",
"week" : 17,
"year" : 2016
},
{
"_id" : ObjectId("570e3e43628ba58c1735009d"),
"color" : "RED",
"week" : 19,
"year" : 2016
}
]
}
I am trying to build a query which allows me to return all documents of my collection, but only include subdocuments in the 'results' field if week > X and year > Y.
I can select the documents where week > X and year > Y with the aggregate function and a $match, but then I miss the documents with no match.
So far, here is my function :
query = ModelUser.aggregate(
  {$unwind:{path:'$results', preserveNullAndEmptyArrays:true}},
  {$match:{
    $or: [
      {$and:[
        {'results.week':{$gte:parseInt(week)}},
        {'results.year':{$eq:parseInt(year)}}
      ]},
      {'results.year':{$gt:parseInt(year)}},
      {'results.week':{$exists: false}}
    ]
  }},
  {$group:{
    _id: {
      _id:'$_id',
      name: '$name'
    },
    results: {$push:{
      _id:'$results._id',
      color: '$results.color',
      week: '$results.week',
      year: '$results.year'
    }}
  }},
  {$project: {
    _id: '$_id._id',
    name: '$_id.name',
    results: '$results'
  }}
);
The only thing I miss is : I have to get all 'name' even if there is no result to display.
Any idea how to do this without 2 queries ?
It looks like you actually have MongoDB 3.2, so use $filter on the array. This will just return an "empty" array [] where the conditions supplied did not match anything:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$or": [
{ "$gt": [ "$$result.week", week ] },
{ "$not": { "$ifNull": [ "$$result.week", false ] } }
]}
]
}
}
}
}}
])
The $ifNull test, used here in place of $exists as a logical form, can actually "compact" the condition, since it returns an alternate value where the property is not present:
db.collection.aggregate([
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$filter": {
"input": "$results",
"as": "result",
"cond": {
"$and": [
{ "$eq": [ "$$result.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$$result.week", week+1 ] },
week
]}
]
}
}
}
}}
])
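The effect of that default value is easy to check in plain JavaScript. The following is a sketch only, not MongoDB code: `filterResults` and the `sample` array are hypothetical, and `year` and `week` stand in for the variables used in the pipelines above.

```javascript
// Hypothetical inputs standing in for the `year` and `week` variables.
const year = 2016;
const week = 17;

function filterResults(results) {
  return results.filter(r => {
    // Like $ifNull: a missing (or null) week is replaced by week + 1,
    // which always passes the "greater than week" test.
    const effectiveWeek = r.week == null ? week + 1 : r.week;
    return r.year === year && effectiveWeek > week;
  });
}

const sample = [
  { color: "GREEN", week: 17, year: 2016 }, // week is not greater than 17
  { color: "RED", week: 19, year: 2016 },   // passes both tests
  { color: "BLUE", year: 2016 }             // no week, passes via the default
];

console.log(filterResults(sample).map(r => r.color)); // RED and BLUE remain
console.log(filterResults([]));                        // stays an empty array
```

An input with no matching elements simply yields an empty array, which is the behavior the question asks for.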
In MongoDB 2.6 releases, you can probably get away with using $redact and $$DESCEND, but of course need to fake the match in the top level document. This has similar usage of the $ifNull operator:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [{ "$ifNull": [ "$year", year ] }, year ] },
{ "$gt": [
{ "$ifNull": [ "$week", week+1 ] },
week
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
If you actually have MongoDB 2.4, then you are probably better off filtering the array content in client code instead. Every language has methods for filtering array content, but as a JavaScript example reproducible in the shell:
db.collection.find().forEach(function(doc) {
doc.results = doc.results.filter(function(result) {
return (
result.year == year &&
( result.hasOwnProperty('week') ? result.week > week : true )
)
});
printjson(doc);
})
The reason is that prior to MongoDB 2.6 you need to use $unwind and $group, and various stages in-between. This is a "very costly" operation on the server, considering that all you want to do is remove items from the arrays of documents and not actually "aggregate" from items within the array.
MongoDB releases have gone to great lengths to provide array processing that does not use $unwind, since its usage for that purpose alone is not a performant option. It should only ever be used in the case where you are removing a "significant" amount of data from arrays as a result.
The whole point is that otherwise the "cost" of the aggregation operation is likely greater than the "cost" of transferring the data over the network to be filtered on the client instead. Use with caution:
db.collection.aggregate([
// Create a placeholder array if one does not exist or is empty
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$and": [
{ "$ifNull": [ "$results", false ] },
{ "$ne": [ "$results", [] ] }
]},
"$results",
[false]
]
}
}},
// Unwind the array
{ "$unwind": "$results" },
// Conditionally $push based on match expression and conditionally count
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": {
"$push": {
"$cond": [
{ "$or": [
{ "$not": "$results" },
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
]}
] },
"$results",
false
]
}
},
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$results.year", year ] },
{ "$gt": [
{ "$ifNull": [ "$results.week", week+1 ] },
week
]}
] },
1,
0
]
}
}
}},
// $unwind again
{ "$unwind": "$results" },
// Filter out false items unless count is 0
{ "$match": {
"$or": [
{ "results": { "$ne": false } },
{ "count": 0 }
]
}},
// Group again
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"user": { "$first": "$user" },
"results": { "$push": "$results" }
}},
// Now swap [false] for []
{ "$project": {
"name": 1,
"user": 1,
"results": {
"$cond": [
{ "$ne": [ "$results", [false] ] },
"$results",
[]
]
}
}}
])
Now that is a lot of operations and shuffling just to "filter" content from an array compared to all of the other approaches which are really quite simple. And aside from the complexity, it really does "cost" a lot more to execute on the server.
So if your server version actually supports the newer operators that can do this optimally, then it's okay to do so. But if you are stuck with that last process, then you probably should not be doing it and instead do your array filtering in the client.

Mongo Query to Return only a subset of SubDocuments

Using the example from the Mongo docs:
{ _id: 1, results: [ { product: "abc", score: 10 }, { product: "xyz", score: 5 } ] }
{ _id: 2, results: [ { product: "abc", score: 8 }, { product: "xyz", score: 7 } ] }
{ _id: 3, results: [ { product: "abc", score: 7 }, { product: "xyz", score: 8 } ] }
db.survey.find(
{ id: 12345, results: { $elemMatch: { product: "xyz", score: { $gte: 6 } } } }
)
How do I return survey 12345 (regardless of whether it has results or not) but only include the results with a score greater than 6? In other words, I don't want the document disqualified from the results based on the subdocument; I want the document, but only with a subset of its subdocuments.
What you are asking for is not so much a "query" but is basically just a filtering of content from the array in each document.
You do this with .aggregate() and $project:
db.survey.aggregate([
{ "$project": {
"results": {
"$setDifference": [
{ "$map": {
"input": "$results",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$eq": [ "$$el.product", "xyz" ] },
{ "$gte": [ "$$el.score", 6 ] }
]},
"$$el",
false
]
}
}},
[false]
]
}
}}
])
So rather than "constrain" results to documents that have an array member matching the condition, all this is doing is "filtering" out the array members that do not match the condition, while still returning the document with an empty array if need be.
The fastest present way to do this is with $map to inspect all elements and $setDifference to filter out any values of false returned from that inspection. The possible downside is a "set" must contain unique elements, so this is fine as long as the elements themselves are unique.
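Both the technique and its uniqueness caveat can be illustrated in plain JavaScript. This is a sketch under stated assumptions, not MongoDB code: `mapThenSetDifference` and `isMatch` are hypothetical names, and `JSON.stringify` is used as a crude stand-in for the structural equality that set operators apply (adequate here only because the elements are flat and share the same key order). MongoDB set operators also do not guarantee output order, which this sketch does not model.

```javascript
// Hypothetical helper emulating $map + $cond followed by $setDifference.
function mapThenSetDifference(results, predicate) {
  // $map with $cond: each element becomes itself or false.
  const mapped = results.map(el => (predicate(el) ? el : false));

  // $setDifference against [false]: the array is treated as a set,
  // so false is removed and duplicate elements collapse into one.
  const seen = new Set();
  const out = [];
  for (const el of mapped) {
    if (el === false) continue;
    const key = JSON.stringify(el); // crude structural equality
    if (!seen.has(key)) {
      seen.add(key);
      out.push(el);
    }
  }
  return out;
}

const isMatch = el => el.product === "xyz" && el.score >= 6;

// One matching element is kept.
console.log(mapThenSetDifference(
  [{ product: "abc", score: 8 }, { product: "xyz", score: 7 }], isMatch));

// No match leaves an empty array, so the document itself would survive.
console.log(mapThenSetDifference(
  [{ product: "abc", score: 10 }, { product: "xyz", score: 5 }], isMatch));

// The downside: two identical matching elements collapse into one.
console.log(mapThenSetDifference(
  [{ product: "xyz", score: 7 }, { product: "xyz", score: 7 }], isMatch));
```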
Future releases will have a $filter method, which is similar to $map in structure but directly removes non-matching results, whereas $map just returns them (via the $cond, as either the matching element or false), and is therefore better suited.
Otherwise if not unique or the MongoDB server version is less than 2.6, you are doing this using $unwind, in a non performant way:
db.survey.aggregate([
{ "$unwind": "$results" },
{ "$group": {
"_id": "$_id",
"results": { "$push": "$results" },
"matched": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$results.product", "xyz" ] },
{ "$gte": [ "$results.score", 6 ] }
]},
1,
0
]
}
}
}},
{ "$unwind": "$results" },
{ "$match": {
"$or": [
{
"results.product": "xyz",
"results.score": { "$gte": 6 }
},
{ "matched": 0 }
]
}},
{ "$group": {
"_id": "$_id",
"results": { "$push": "$results" },
"matched": { "$first": "$matched" }
}},
{ "$project": {
"results": {
"$cond": [
{ "$ne": [ "$matched", 0 ] },
"$results",
[]
]
}
}}
])
Which is pretty horrible in both design and performance. As such you are probably better off doing the filtering per document in client code instead.
You can use $filter in mongoDB 3.2
db.survey.aggregate([{
$match: { _id: 12345 }
}, {
$project: {
results: {
$filter: {
input: "$results",
as: "results",
cond:{$gt: ['$$results.score', 6]}
}
}
}
}]);
It will return all the sub documents that have a score greater than 6. If you want to return only the first matched sub document, then you can use the '$' operator.
You can use $redact in this way:
db.survey.aggregate( [
{ $match : { _id : 12345 }},
{ $redact: {
$cond: {
if: {
$or: [
{ $eq: [ "$_id", 12345 ] },
{ $and: [
{ $eq: [ "$product", "xyz" ] },
{ $gte: [ "$score", 6 ] }
]}
]
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}
] );
It will $match by _id: 12345 first and then it will "$$PRUNE" all the subdocuments that don't have "product":"xyz" and don't have score greater or equal 6. I added the condition ($cond) { $eq: [ "$_id", 12345 ] } so that it wouldn't prune the whole document before it reaches the subdocuments.
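The level-by-level evaluation is easier to follow in a small plain JavaScript sketch. `redact` below is a hypothetical helper that emulates only the $$DESCEND and $$PRUNE outcomes, and it assumes documents made of plain objects, arrays and scalars (no BSON types such as ObjectId or Date, and no arrays nested directly inside arrays).

```javascript
// Hypothetical helper emulating $redact with only $$DESCEND and $$PRUNE.
// `keep` is evaluated at every document level, starting at the top.
function redact(doc, keep) {
  if (!keep(doc)) return undefined; // $$PRUNE: drop this level entirely
  const out = {};
  for (const [field, value] of Object.entries(doc)) {
    if (Array.isArray(value)) {
      // Descend into embedded documents inside arrays; scalars are kept.
      out[field] = value
        .map(v => (v !== null && typeof v === "object" ? redact(v, keep) : v))
        .filter(v => v !== undefined);
    } else if (value !== null && typeof value === "object") {
      const sub = redact(value, keep);
      if (sub !== undefined) out[field] = sub;
    } else {
      out[field] = value;
    }
  }
  return out; // $$DESCEND: this level is kept, children were checked in turn
}

const survey = {
  _id: 12345,
  results: [
    { product: "abc", score: 8 },
    { product: "xyz", score: 7 },
    { product: "xyz", score: 5 }
  ]
};

// Same condition as the pipeline: the _id test lets the top level through.
const keep = d => d._id === 12345 || (d.product === "xyz" && d.score >= 6);
console.log(JSON.stringify(redact(survey, keep)));

// Without the _id test the top-level document fails the condition
// and is pruned before any subdocument is even looked at.
console.log(redact(survey, d => d.product === "xyz" && d.score >= 6));
```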

MongoDB distinct values on subdocuments

I have a little weird database structure it is as follows:
I have a document with normal properties, then I have a metadata property which is an array of objects.
metadata: [
{
key: [key],
value: [value]
},
...
]
Edit: There will never be a metadata sub-document which has a duplicate key
It was done this way to retain the order of the metadata objects
Now I want to get distinct values of a metadata object with a given key.
I want to find every distinct [value] where [key] = "x" using MongoDB. And have the distinct values returned in an array (not the document)
I guess this is not possible using the distinct command, but is this possible using an aggregation pipeline or do I have to use Map-Reduce?
Any suggestions?
Thanks in advance! :)
I presume you mean this:
{
"metadata": [
{ "key": "abc", "value": "borf" },
{ "key": "cdc", "value": "biff" }
]
},
{
"metadata": [
{ "key": "bbc", "value": "barf" },
{ "key": "abc", "value": "borf" },
{ "key": "abc", "value": "barf" }
]
}
In that case you can filter for "abc" and get the distinct "value" entries like this:
db.collection.aggregate([
{ "$match": { "metadata.key": "abc" } },
{ "$unwind": "$metadata" },
{ "$match": { "metadata.key": "abc" } },
{ "$group": {
"_id": "$metadata.value"
}}
])
Or even better:
db.collection.aggregate([
{ "$match": { "metadata.key": "abc" } },
{ "$redact": {
"$cond": {
"if": { "$eq": [ { "$ifNull": [ "$key", "abc" ] }, "abc" ] },
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
{ "$unwind": "$metadata" },
{ "$group": {
"_id": "$metadata.value",
"count": { "$sum": 1 }
}}
])
Which would basically give:
{ "_id": "barf", "count": 1 },
{ "_id": "borf", "count": 2 }
But it is not possible for the aggregation output to be just a plain array of "barf" and "borf". The distinct() method does return an array of values only, but it is also very limited. It can only do this:
db.collection.distinct("metadata.value",{ "metadata.key": "abc" })
[ "biff", "borf", "barf" ]
Which is incorrect as a result. So just take the "document" results from above and apply some "post processing":
db.collection.aggregate([
{ "$match": { "metadata.key": "abc" } },
{ "$redact": {
"$cond": {
"if": { "$eq": [ { "$ifNull": [ "$key", "abc" ] }, "abc" ] },
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
{ "$unwind": "$metadata" },
{ "$group": {
"_id": "$metadata.value"
}}
]).map(function(doc) {
return doc._id;
})
And that result is a plain array of just the distinct values:
[ "borf", "barf" ]
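The difference between the two results comes down to where the filtering happens: per document or per array element. A plain JavaScript sketch makes this visible; `distinctPerDocument` and `distinctPerElement` are hypothetical helper names and `docs` repeats the sample data from above.

```javascript
const docs = [
  { metadata: [
    { key: "abc", value: "borf" },
    { key: "cdc", value: "biff" }
  ]},
  { metadata: [
    { key: "bbc", value: "barf" },
    { key: "abc", value: "borf" },
    { key: "abc", value: "barf" }
  ]}
];

// Roughly what distinct() with a query does: select whole documents,
// then collect every value found inside them.
function distinctPerDocument(docs, key) {
  const matched = docs.filter(d => d.metadata.some(m => m.key === key));
  return [...new Set(matched.flatMap(d => d.metadata.map(m => m.value)))];
}

// Roughly what the pipeline does: filter the individual array elements
// first, then collect the distinct values.
function distinctPerElement(docs, key) {
  return [...new Set(
    docs.flatMap(d => d.metadata)
      .filter(m => m.key === key)
      .map(m => m.value)
  )];
}

console.log(distinctPerDocument(docs, "abc")); // includes "biff"
console.log(distinctPerElement(docs, "abc"));  // only values whose key is "abc"
```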

how to avoid $push-ing nulls in mongo aggregation framework

$push is aggregating nulls if the field is not present.
I would like to avoid this.
Is there a way to make a sub expression for $push operator in such way that null values will be skipped and not pushed into the resulting array ?
Bit late to the party, but..
I wanted to do the same thing, and found that I could accomplish it with an expression like this:
// Pushes events only if they have the value 'A'
"events": {
"$push": {
"$cond": [
{
"$eq": [
"$event",
"A"
]
},
"A",
"$noval"
]
}
}
The thinking here is that when you do
{ "$push": "$event" }
then it seems to only push non-null values.
So I made up a column that doesn't exist, $noval, to be returned as the false condition of my $cond.
It seems to work. I'm not sure if it is non-standard and therefore susceptible to breaking one day but..
It's really not completely clear what your specific case is without an example. There is the $ifNull operator which can "replace" a null value or missing field with "something else", but to truly "skip" is not possible.
That said, you can always "filter" the results depending on your actual use case.
If your resulting data is actually a "Set" and you have a MongoDB version that is 2.6 or greater then you can use $setDifference with some help from $addToSet to reduce the number of null values that are kept initially:
db.collection.aggregate([
{ "$group": {
"_id": "$key",
"list": { "$addToSet": "$field" }
}},
{ "$project": {
"list": { "$setDifference": [ "$list", [null] ] }
}}
])
So there would only be one null and then the $setDifference operation will "filter" that out in the comparison.
In earlier versions or when the values are not in fact "unique" and not a "set", then you "filter" by processing with $unwind and $match:
db.collection.aggregate([
{ "$group": {
"_id": "$key",
"list": { "$push": "$field" }
}},
{ "$unwind": "$list" },
{ "$match": { "list": { "$ne": null } }},
{ "$group": {
"_id": "$_id",
"list": { "$push": "$list" }
}}
])
If you don't want to be "destructive" of arrays that would end up "empty" because they contained "nothing but" null, then keep a count using $ifNull and match on the conditions:
db.collection.aggregate([
{ "$group": {
"_id": "$key",
"list": { "$push": "$field" },
"count": {
"$sum": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$field", null ] }, null ] },
0,
1
]
}
}
}},
{ "$unwind": "$list" },
{ "$match": {
"$or": [
{ "list": { "$ne": null } },
{ "count": 0 }
]
}},
{ "$group": {
"_id": "$_id",
"list": { "$push": "$list" },
"count": { "$first": "$count" }
}},
{ "$project": {
"list": {
"$cond": [
{ "$eq": [ "$count", 0 ] },
{ "$const": [] },
"$list"
]
}
}}
])
With a final $project replacing any array that simply consisted of null values only with an empty array object.
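The intended end result, groups that keep their non-null values and groups that become an empty array rather than disappearing, can be stated compactly in plain JavaScript. This is only a sketch of the expected outcome, not MongoDB code; `groupWithoutNulls` and the `rows` sample are hypothetical.

```javascript
// Hypothetical rows; `field` is missing or null on some of them.
const rows = [
  { key: "a", field: 1 },
  { key: "a" },
  { key: "a", field: 2 },
  { key: "b", field: null },
  { key: "b" }
];

function groupWithoutNulls(rows) {
  const groups = new Map();
  for (const row of rows) {
    // The group is created even if nothing ends up being pushed into it.
    if (!groups.has(row.key)) groups.set(row.key, []);
    // `!= null` skips both null and missing (undefined) values.
    if (row.field != null) groups.get(row.key).push(row.field);
  }
  return [...groups].map(([key, list]) => ({ _id: key, list }));
}

console.log(JSON.stringify(groupWithoutNulls(rows)));
```

Group "a" keeps its two values, while group "b", which contained nothing but null or missing values, is returned with an empty list.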