How to use $slice on a $filter result in MongoDB?

I have an item collection with the following format:
{
"_id": 123,
"items": [{
"name": "item1",
"status" : "inactive",
"created" : ISODate("2018-02-14T10:39:28.321Z")
},
{
"name": "item2",
"status" : "active",
"created" : ISODate("2018-02-14T10:39:28.321Z")
},
{
"name": "item3",
"status" : "active",
"created" : ISODate("2018-02-14T10:39:28.321Z")
},
{
"name": "item4",
"status" : "inactive",
"created" : ISODate("2018-02-14T10:39:28.321Z")
},
{
"name": "item5",
"status" : "active",
"created" : ISODate("2018-02-14T10:39:28.321Z")
}
]
}
I want to query on the status field of items so that only the objects with status "active" are returned in the array, and the query should also skip the last 1 and limit the output to 2.
At present I am using $filter (aggregation) for this operation; with the following query it returns only the last record:
db.item.aggregate([
{ "$match": { "items.status": "active" } },
{ "$project": {
"items": {
"$slice": [
{ "$filter": {
"input": "$items",
"as": "item",
"cond": { "$eq": [ "$$item.status", "active" ] }
}},
-1,2
]
}
}}
])
Output should be:
{
"_id": 123,
"items": [
{
"name": "item2",
"status" : "active",
"created" : ISODate("2018-02-14T10:39:28.321Z")
},
{
"name": "item3",
"status" : "active",
"created" : ISODate("2018-02-14T10:39:28.321Z")
}]
}
Please help me to achieve this result.

One way of doing it would be like this (there's no need for the $match stage; depending on the structure of your data and your indexing setup you might still want to keep it, though, for performance reasons):
db.item.aggregate([
{ "$project": {
"items": {
"$slice": [
{ "$filter": {
"input": "$items",
"as": "item",
"cond": { "$eq": [ "$$item.status", "active" ] }
}},
-3,2
]
}
}}
])
I would think it's probably better, though, to use the following query:
db.item.aggregate([
{
$project: {
"items": {
"$filter": {
"input": "$items",
"as": "item",
"cond": { "$eq": [ "$$item.status", "active" ] }
}
}
}
}, {
$project: {
"items": {
$slice: [
{
$slice: [
"$items",
{
$subtract: [ { $size: "$items" }, 1 ] // length of items array minus one
}
]
}, 2 // max two elements
]
}
}
}])
as this one will first get rid of the last element and then limit the output to two items, which is probably more what you want in a situation with fewer than 3 "active" elements.
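To make the difference concrete, here is a sketch using a hypothetical document where only item2 and item3 are "active", so the filtered array has just two elements:
// filtered = [ item2, item3 ]
//
// Single $slice: { "$slice": [ filtered, -3, 2 ] }
// When a negative position's absolute value exceeds the array length,
// $slice starts from the beginning of the array, so this returns
// [ item2, item3 ] and the last "active" element is NOT skipped.
//
// Nested $slice: the inner slice keeps the first (size - 1) elements,
// i.e. [ item2 ], and the outer { "$slice": [ ..., 2 ] } then limits
// the output to at most two, returning [ item2 ] as intended.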

Related

MongoDB: How to filter the main collection by a second collection using $lookup

How can I add a filter after a $lookup, or is there any other method to do this?
My data collection test is:
{ "_id" : ObjectId("570557d4094a4514fc1291d6"), "id" : 100, "value" : "0", "contain" : [ ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d7"), "id" : 110, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d8"), "id" : 120, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d9"), "id" : 121, "value" : "2", "contain" : [ 100, 120 ] }
I select id 100 and aggregate the child documents:
db.test.aggregate([ {
$match : {
id: 100
}
}, {
$lookup : {
from : "test",
localField : "id",
foreignField : "contain",
as : "childs"
}
}]);
I get back:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d9"),
"id":121,
"value":"2",
"contain":[ 100, 120 ]
}
]
}
But I want only the child documents that match "value": "1".
At the end I expect this result:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}
The question here is actually about something different and does not need $lookup at all. But for anyone arriving here purely from the title of "filtering after $lookup" then these are the techniques for you:
MongoDB 3.6 - Sub-pipeline
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"let": { "id": "$id" },
"pipeline": [
{ "$match": {
"value": "1",
"$expr": { "$in": [ "$$id", "$contain" ] }
}}
],
"as": "childs"
}}
])
Earlier - $lookup + $unwind + $match coalescence
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$unwind": "$childs" },
{ "$match": { "childs.value": "1" } },
{ "$group": {
"_id": "$_id",
"id": { "$first": "$id" },
"value": { "$first": "$value" },
"contain": { "$first": "$contain" },
"childs": { "$push": "$childs" }
}}
])
If you question why you would $unwind as opposed to using $filter on the array, then read "Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size" for all the detail on why this is generally necessary and far more optimal.
For releases of MongoDB 3.6 and onwards, then the more expressive "sub-pipeline" is generally what you want to "filter" the results of the foreign collection before anything gets returned into the array at all.
Back to the answer, though, which actually describes why the question asked needs "no join" at all....
Original
Using $lookup like this is not the most "efficient" way to do what you want here. But more on this later.
As a basic concept, just use $filter on the resulting array:
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$project": {
"id": 1,
"value": 1,
"contain": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$eq": [ "$$child.value", "1" ] }
}
}
}}
]);
Or use $redact instead:
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$eq": [ "$value", "0" ] },
{ "$eq": [ "$value", "1" ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
]);
Both get the same result:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}
Bottom line is that $lookup itself cannot "yet" query to only select certain data. So all "filtering" needs to happen after the $lookup.
But really for this type of "self join" you are better off not using $lookup at all and avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group instead:
db.test.aggregate([
{ "$match": {
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}},
{ "$group": {
"_id": {
"$cond": {
"if": { "$eq": [ "$value", "0" ] },
"then": "$id",
"else": { "$arrayElemAt": [ "$contain", 0 ] }
}
},
"value": { "$first": { "$literal": "0"} },
"childs": {
"$push": {
"$cond": {
"if": { "$ne": [ "$value", "0" ] },
"then": "$$ROOT",
"else": null
}
}
}
}},
{ "$project": {
"value": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$ne": [ "$$child", null ] }
}
}
}}
])
Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:
{
"_id" : 100,
"value" : "0",
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [ 100 ]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [ 100 ]
}
]
}
So the only real issue here is "filtering" any null result from the array, created when the current document was the parent in processing items to $push.
What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have concluded or possibly found elsewhere is "designed" so that you can get a "node" and all of its "children" in a single query request.
That means just the "query" is all that is really needed, and the data collection ( which is all that is happening since no content is really being "reduced" ) is just a function of iterating the cursor result:
var result = {};
db.test.find({
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}).sort({ "contain.0": 1 }).forEach(function(doc) {
if ( doc.id == 100 ) {
result = doc;
result.childs = []
} else {
result.childs.push(doc)
}
})
printjson(result);
This does exactly the same thing:
{
"_id" : ObjectId("570557d4094a4514fc1291d6"),
"id" : 100,
"value" : "0",
"contain" : [ ],
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [
100
]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [
100
]
}
]
}
And serves as proof that all you really need to do here is issue the "single" query to select both the parent and children. The returned data is just the same, and all you are doing on either server or client is "massaging" into another collected format.
This is one of those cases where you can get "caught up" in thinking of how you did things in a "relational" database, and not realize that since the way the data is stored has "changed", you no longer need to use the same approach.
That is exactly the point of the documentation example "Model Tree Structures with Child References": its structure makes it easy to select parents and children within one query.
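For reference, the "child references" model from that documentation page looks roughly like this (a minimal sketch; the categories collection and field names follow the manual's example, not the data in this question):
db.categories.insertMany([
  { "_id": "MongoDB", "children": [] },
  { "_id": "dbm", "children": [] },
  { "_id": "Databases", "children": [ "MongoDB", "dbm" ] }
])
// One query fetches the parent, and its "children" array immediately
// supplies the keys for selecting the child nodes:
var parent = db.categories.findOne({ "_id": "Databases" });
db.categories.find({ "_id": { "$in": parent.children } })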

Aggregate and Reduce Nested Documents and Arrays

EDIT:
Our use case:
We get continuous reports from servers about visitors. We pre-aggregate the data on the servers for a few seconds, but after that we insert these "reports" into MongoDB.
In our dashboard we would like to query the different browsers, OSes, geolocation (country etc.) based on time ranges.
So like: Within the last 7 days, there were 1000 visitors using Chrome, 500 from Germany, 200 from England and so on.
I'm pretty stuck with a MongoDB query we need for our dashboard.
We have following report entries:
{
"_id" : ObjectId("59b9d08e402025326e1a0f30"),
"channel_perm_id" : "c361049fb4144b0e81b71c0b6cfdc296",
"source_id" : "insomnia",
"start_timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"end_timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"resource_uri" : "b755d62a-8c0a-4e8a-945f-41782c13535b",
"sources_info" : {
"browsers" : [
{
"name" : "Chrome",
"count" : NumberLong(2)
}
],
"operating_systems" : [
{
"name" : "Mac OS X",
"count" : NumberLong(2)
}
],
"continent_ids" : [
{
"name" : "EU",
"count" : NumberLong(1)
}
],
"country_ids" : [
{
"name" : "DE",
"count" : NumberLong(1)
}
],
"city_ids" : [
{
"name" : "Solingen",
"count" : NumberLong(1)
}
]
},
"unique_sources" : NumberLong(1),
"requests" : NumberLong(1),
"cache_hits" : NumberLong(0),
"cache_misses" : NumberLong(1),
"cache_hit_size" : NumberLong(0),
"cache_refill_size" : NumberLong("170000000000")
}
Now, we need to aggregate these reports based on timestamp.
So far, so easy:
db.channel_report.aggregate([{
$group: {
_id: {
$dateToString: {
format: "%Y",
date: "$timestamp"
}
},
sources_info: {
$push: "$sources_info"
}
},
}])
But now it gets difficult for me. As you might have already noticed, the sources_info object is the problem.
Instead of just "pushing" all the sources info into an array per group, we need to actually accumulate it.
So, if we have something like this:
{
sources_info: [
{
browsers: [
{
name: "Chrome,
count: 1
}
]
},
{
browsers: [
{
name: "Chrome,
count: 1
}
]
}
]
}
The array should be reduced to this:
{
sources_info:
{
browsers: [
{
name: "Chrome,
count: 2
}
]
}
}
We migrated from MySQL to MongoDB for analytics, but I have no clue how to model this behaviour in Mongo. Judging by the docs, I almost think it is not possible, at least not with the current data structure.
Is there a nice solution for this? Or maybe even a different kind of data structure?
Cheers,
Chris from StriveCDN
The basic problem you have is that you are using "named keys" where you probably really should instead be using values at a consistent attribute path. This means that instead of keys like "browsers", each entry should probably simply carry "type": "browsers" and so on.
The reasoning for this should become apparent on the general approaches to aggregating the data. It also really helps in querying in general. But the approaches basically involve coercing your initial data format into this kind of structure in order to aggregate it first.
With most recent releases ( MongoDB 3.4.4 and greater ), we can work with your named keys via $objectToArray and manipulate as follows:
db.channel_report.aggregate([
{ "$project": {
"timestamp": 1,
"sources": {
"$reduce": {
"input": {
"$map": {
"input": { "$objectToArray": "$sources_info" },
"as": "s",
"in": {
"$map": {
"input": "$$s.v",
"as": "v",
"in": {
"type": "$$s.k",
"name": "$$v.name",
"count": "$$v.count"
}
}
}
}
},
"initialValue": [],
"in": { "$concatArrays": ["$$value", "$$this"] }
}
}
}},
{ "$unwind": "$sources" },
{ "$group": {
"_id": {
"year": { "$year": "$timestamp" },
"type": "$sources.type",
"name": "$sources.name"
},
"count": { "$sum": "$sources.count" }
}},
{ "$group": {
"_id": { "year": "$_id.year", "type": "$_id.type" },
"v": { "$push": { "name": "$_id.name", "count": "$count" } }
}},
{ "$group": {
"_id": "$_id.year",
"sources_info": {
"$push": { "k": "$_id.type", "v": "$v" }
}
}},
{ "$addFields": {
"sources_info": { "$arrayToObject": "$sources_info" }
}}
])
Taking that back a notch to MongoDB 3.4 ( which should be default on most hosted services by now ) you could alternately declare each key name manually:
db.channel_report.aggregate([
{ "$project": {
"timestamp": 1,
"sources": {
"$concatArrays": [
{ "$map": {
"input": "$sources_info.browsers",
"in": {
"type": "browsers",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.operating_systems",
"in": {
"type": "operating_systems",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.continent_ids",
"in": {
"type": "continent_ids",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.country_ids",
"in": {
"type": "country_ids",
"name": "$$this.name",
"count": "$$this.count"
}
}},
{ "$map": {
"input": "$sources_info.city_ids",
"in": {
"type": "city_ids",
"name": "$$this.name",
"count": "$$this.count"
}
}}
]
}
}},
{ "$unwind": "$sources" },
{ "$group": {
"_id": {
"year": { "$year": "$timestamp" },
"type": "$sources.type",
"name": "$sources.name"
},
"count": { "$sum": "$sources.count" }
}},
{ "$group": {
"_id": { "year": "$_id.year", "type": "$_id.type" },
"v": { "$push": { "name": "$_id.name", "count": "$count" } }
}},
{ "$group": {
"_id": "$_id.year",
"sources": {
"$push": { "k": "$_id.type", "v": "$v" }
}
}},
{ "$project": {
"sources_info": {
"browsers": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "browsers" ] }
]
},
"operating_systems": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "operating_systems" ] }
]
},
"continent_ids": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "continent_ids" ] }
]
},
"country_ids": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "country_ids" ] }
]
},
"city_ids": {
"$arrayElemAt": [
"$sources.v",
{ "$indexOfArray": [ "$sources.k", "city_ids" ] }
]
}
}
}}
])
We can even wind that back to MongoDB 3.2 by using $map and $filter in place of $indexOfArray, but the general approach is the main thing to explain.
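As a sketch of that MongoDB 3.2 fallback (assuming the same "k"/"v" pairs pushed into "sources" by the grouping stages), each named key in the final $project would be looked up like this:
"browsers": {
  "$arrayElemAt": [
    { "$map": {
      "input": { "$filter": {
        "input": "$sources",
        "as": "s",
        "cond": { "$eq": [ "$$s.k", "browsers" ] }
      }},
      "as": "s",
      "in": "$$s.v"
    }},
    0
  ]
}
// ...repeated with the matching key name for each of the other types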
Concatenate arrays
The main thing that needs to happen is to take the data from the many different arrays with named keys and make a "single array" with a "type" property representing each key name. This is arguably how the data should be stored in the first place, and the first aggregation stage of either approach comes out like this:
/* 1 */
{
"_id" : ObjectId("59b9d08e402025326e1a0f30"),
"timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
"sources" : [
{
"type" : "browsers",
"name" : "Chrome",
"count" : NumberLong(2)
},
{
"type" : "operating_systems",
"name" : "Mac OS X",
"count" : NumberLong(2)
},
{
"type" : "continent_ids",
"name" : "EU",
"count" : NumberLong(1)
},
{
"type" : "country_ids",
"name" : "DE",
"count" : NumberLong(1)
},
{
"type" : "city_ids",
"name" : "Solingen",
"count" : NumberLong(1)
}
]
}
Unwind and Group
Part of the data you want to accumulate on actually includes those "type" and "name" properties from "within" the array. Whenever you need to accumulate across documents from "within an array", the process you use is $unwind in order to be able to access those values as part of the grouping key.
What this means is that after using $unwind on the combined array, you then want to $group on both of those keys and the reduced "timestamp" detail in order to $sum the "count" values.
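To visualize that, each document from the first stage becomes one output document per array element after $unwind, so the combined array shown above produces five documents shaped like this (sketching just the "browsers" entry):
{
  "_id" : ObjectId("59b9d08e402025326e1a0f30"),
  "timestamp" : ISODate("2017-09-14T00:42:54.510Z"),
  "sources" : { "type" : "browsers", "name" : "Chrome", "count" : NumberLong(2) }
}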
Since you then have "sub-levels" of detail ( i.e. each browser name within browsers ) you use additional $group pipeline stages, gradually decreasing the granularity of the grouping keys and using $push to accumulate the details into arrays.
In either case, omitting the very last output stage, the accumulated structure comes out as:
/* 1 */
{
"_id" : 2017,
"sources_info" : [
{
"k" : "continent_ids",
"v" : [
{
"name" : "EU",
"count" : NumberLong(1)
}
]
},
{
"k" : "city_ids",
"v" : [
{
"name" : "Solingen",
"count" : NumberLong(1)
}
]
},
{
"k" : "country_ids",
"v" : [
{
"name" : "DE",
"count" : NumberLong(1)
}
]
},
{
"k" : "browsers",
"v" : [
{
"name" : "Chrome",
"count" : NumberLong(2)
}
]
},
{
"k" : "operating_systems",
"v" : [
{
"name" : "Mac OS X",
"count" : NumberLong(2)
}
]
}
]
}
This really is the final state of the data, though not represented in the same form as it was originally found. It is arguably complete at this point as any further processing is merely cosmetic to output as named keys again.
Output to named keys
As shown, the varied approaches either look up the array entries by the matching key name, or use $arrayToObject to transform the array content back into an object with named keys.
An alternative is to simply do that very last manipulation in code, as shown by this .map() example of manipulating the cursor result in the shell:
db.channel_report.aggregate([
{ "$project": {
"timestamp": 1,
"sources": {
"$reduce": {
"input": {
"$map": {
"input": { "$objectToArray": "$sources_info" },
"as": "s",
"in": {
"$map": {
"input": "$$s.v",
"as": "v",
"in": {
"type": "$$s.k",
"name": "$$v.name",
"count": "$$v.count"
}
}
}
}
},
"initialValue": [],
"in": { "$concatArrays": ["$$value", "$$this"] }
}
}
}},
{ "$unwind": "$sources" },
{ "$group": {
"_id": {
"year": { "$year": "$timestamp" },
"type": "$sources.type",
"name": "$sources.name"
},
"count": { "$sum": "$sources.count" }
}},
{ "$group": {
"_id": { "year": "$_id.year", "type": "$_id.type" },
"v": { "$push": { "name": "$_id.name", "count": "$count" } }
}},
{ "$group": {
"_id": "$_id.year",
"sources_info": {
"$push": { "k": "$_id.type", "v": "$v" }
}
}},
/*
{ "$addFields": {
"sources_info": { "$arrayToObject": "$sources_info" }
}}
*/
]).map( d => Object.assign(d,{
"sources_info": d.sources_info.reduce((acc,curr) =>
Object.assign(acc,{ [curr.k]: curr.v }),{})
}))
Which of course applies to either aggregation pipeline approach.
And of course even $concatArrays can be replaced with $setUnion, as long as all the entries have a unique identifying combination of "name" and "type" ( as they appear to have ), which means that, by also modifying the final output through processing the cursor instead, you can apply the technique even as far back as MongoDB 2.6.
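A sketch of what that MongoDB 2.6 form would look like for the first stage (manually declared keys as before, with $setUnion doing the "unique concatenation"; note the "as" argument is required on $map in older releases):
{ "$project": {
  "timestamp": 1,
  "sources": {
    "$setUnion": [
      { "$map": {
        "input": "$sources_info.browsers",
        "as": "b",
        "in": { "type": "browsers", "name": "$$b.name", "count": "$$b.count" }
      }},
      { "$map": {
        "input": "$sources_info.operating_systems",
        "as": "o",
        "in": { "type": "operating_systems", "name": "$$o.name", "count": "$$o.count" }
      }}
      // ...and the same $map for the remaining named keys
    ]
  }
}}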
Final Output
And the final output ( actually aggregated of course, but the question only samples one document ) accumulates for all the sub-keys and reconstructs from the last sample output as shown:
{
"_id" : 2017,
"sources_info" : {
"continent_ids" : [
{
"name" : "EU",
"count" : NumberLong(1)
}
],
"city_ids" : [
{
"name" : "Solingen",
"count" : NumberLong(1)
}
],
"country_ids" : [
{
"name" : "DE",
"count" : NumberLong(1)
}
],
"browsers" : [
{
"name" : "Chrome",
"count" : NumberLong(2)
}
],
"operating_systems" : [
{
"name" : "Mac OS X",
"count" : NumberLong(2)
}
]
}
}
Where every array entry under each key of sources_info is reduced down to its cumulative count for every other entry sharing the same "name".

Filter Array Content to a Query containing $concatArrays

I have a data set that I am querying. The data looks like this:
db.activity.insert(
{
"_id" : ObjectId("5908e64e3b03ca372dc945d5"),
"startDate" : ISODate("2017-05-06T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("5908ebf96ae5003a4471c9b2"),
"walkDistance" : "03",
"jogDistance" : "01",
"runDistance" : "08",
"sprintDistance" : "01"
}
]
}
)
db.activity.insert(
{
"_id" : ObjectId("58f79163bebac50d5b2ae760"),
"startDate" : ISODate("2017-05-07T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("58f7948fbebac50d5b2ae7f2"),
"walkDistance" : "01",
"jogDistance" : "02",
"runDistance" : "09",
"sprintDistance" : ""
}
]
}
)
Using this function, thanks to Neil Lunn, I am able to get my desired output:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
However, I cannot add a match statement to the beginning.
db.activity.aggregate([
{$match: {"startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" },
{$unwind: '$details'},
{$match: {"startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" },
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Because it gives an error message of:
> $concatArrays only supports arrays, not string
How can I modify this query so that a $match statement can be added?
Don't $unwind the array you are feeding to $concatArrays. Instead apply $filter to only extract the matching values. And as stated, we can just use $setUnion for the 'unique concatenation' instead:
db.activity.aggregate([
{ "$match": { "startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" } },
{ "$project": {
"_id": 0,
"unique": {
"$let": {
"vars": {
"filtered": {
"$filter": {
"input": "$details",
"cond": { "$eq": [ "$$this.code", "2" ] }
}
}
},
"in": {
"$setDifference": [
{ "$setUnion": [
"$$filtered.walkDistance",
"$$filtered.jogDistance",
"$$filtered.runDistance",
"$$filtered.sprintDistance"
]},
[""]
]
}
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Using $let makes things a bit cleaner syntax-wise, since you don't need to specify multiple $map and $filter statements "inline" as the source for $setUnion.
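For comparison, without $let each of the four source paths would need its own inline copy of the $filter, along the lines of this (hypothetical) fragment:
"$setUnion": [
  { "$map": {
    "input": { "$filter": {
      "input": "$details",
      "as": "dd",
      "cond": { "$eq": [ "$$dd.code", "2" ] }
    }},
    "as": "d",
    "in": "$$d.walkDistance"
  }},
  // ...the same $filter repeated inline for jogDistance, runDistance
  // and sprintDistance
]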

How to Join Arrays in the Same Document?

I would like to combine the data in one collection using the IDs of the two arrays.
An example is shown below.
{
"_id": ObjectId ("5976fd2eb0adec0a32fa9831"),
"People": [
{
"_id": 1, <--- ID
"Name": "jane"
},
{
"_id": 2, <--- ID
"Name": "Mark"
}
],
"Contents": [
{
"userID": 2, <--- People ID
"Text": "111"
},
{
"userID": 1, <--- People ID
"Text": "Hi"
}
]
}
I want to transform the above document into the one below.
{
"_id": ObjectId ("5976fd2eb0adec0a32fa9831"),
"People": [
{
"_id": 1,
"Name" : "Jane"
},
{
"_id": 2,
"Name": "Mark"
}
],
"Contents": [
{
"userID": 2,
"Name": "Mark", <-- Adding
"Text": "111",
},
{
"userID": 1,
"Name": "Jane", <-- Adding
"Text": "Hi",
}
]
}
I have tried various things like $lookup and $unwind in .aggregate(), but I cannot get the result.
You want $map and $indexOfArray ideally:
db.collection.aggregate([
{ "$addFields": {
"Contents": {
"$map": {
"input": "$Contents",
"as": "c",
"in": {
"userID": "$$c.userID",
"Name": {
"$arrayElemAt": [
"$People.Name",
{ "$indexOfArray": [ "$People._id", "$$c.userID" ] }
]
},
"Text": "$$c.Text"
}
}
}
}}
])
Which basically grabs the value from the other array via $arrayElemAt for the matching "index" returned by $indexOfArray.
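One caveat worth noting: $indexOfArray returns -1 when no match is found, and $arrayElemAt with index -1 returns the last element of the array, so an unmatched userID would silently pick up the wrong name. If that can occur in your data, a guard along these lines is safer (a sketch using $let; the idx variable name is just illustrative):
"Name": {
  "$let": {
    "vars": { "idx": { "$indexOfArray": [ "$People._id", "$$c.userID" ] } },
    "in": {
      "$cond": [
        { "$eq": [ "$$idx", -1 ] },
        null,
        { "$arrayElemAt": [ "$People.Name", "$$idx" ] }
      ]
    }
  }
}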
If your MongoDB needs to fall back to a version without that operator, then you could use $filter instead:
db.collection.aggregate([
{ "$addFields": {
"Contents": {
"$map": {
"input": "$Contents",
"as": "c",
"in": {
"userID": "$$c.userID",
"Name": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": "$People",
"as": "p",
"cond": { "$eq": [ "$$p._id", "$$c.userID" ] }
}
},
"as": "p",
"in": "$$p.Name"
}},
0
]
},
"Text": "$$c.Text"
}
}
}
}}
])
Where basically you $filter the results of the other array down to matches and simply return the first matching element via the 0 index with $arrayElemAt.
In either case, there is no need to "self-join" using $lookup, and that's just really unnecessary overhead best avoided.
From the document in the question you get the following:
/* 1 */
{
"_id" : ObjectId("5976fd2eb0adec0a32fa9831"),
"People" : [
{
"_id" : 1.0,
"Name" : "jane"
},
{
"_id" : 2.0,
"Name" : "Mark"
}
],
"Contents" : [
{
"userID" : 2.0,
"Name" : "Mark",
"Text" : "111"
},
{
"userID" : 1.0,
"Name" : "jane",
"Text" : "Hi"
}
]
}
Generally speaking though, there is no real need for any aggregation operators at all, as this sort of operation is generally best left to post-processing in the cursor. In fact, since you are actually "adding" data to the document to return, it's better to do the modification after the document is sent over the network.
Here is a common idiom for the above, shown as JavaScript for the shell:
db.collection.find().map( d =>
Object.assign(
d,
{
"Contents": d.Contents.map( c =>
Object.assign(c,
{ "Name": d.People.map(p => p.Name)[d.People.map(p => p._id).indexOf(c.userID)] }
)
)
}
)
)
Produces the exact same result, and is generally a bit easier on the eyes to read and interpret.

MongoDB aggregate/grouping by key-value pairs

My data looks something like this:
{
"_id" : "9aa072e4-b706-47e6-9607-1a39e904a05a",
"customerId" : "2164289-4",
"channelStatuses" : {
"FOO" : {
"status" : "done"
},
"BAR" : {
"status" : "error"
}
},
"channel" : "BAR",
}
My aggregate/group looks like this:
{
"_id" : {
"customerId" : "$customerId",
"channel" : "$channel",
"status" : "$channelStatuses[$channel].status"
},
"count" : {
"$sum" : 1
}
}
So basically with the example data the group should give me a group grouped by:
{"customerId": "2164289-4", "channel": "BAR", "status": "error"}
But I cannot use []-indexing in an aggregate/group. What should I do instead?
You cannot get the result you want with the current structure using .aggregate(). You "could" change the structure to use an array rather than named keys, and the operation is actually quite simple.
So with a document like:
{
"_id" : "9aa072e4-b706-47e6-9607-1a39e904a05a",
"customerId" : "2164289-4",
"channelStatuses" : [
{
"channel": "FOO",
"status" : "done"
},
{
"channel": "BAR",
"status" : "error"
}
],
"channel" : "BAR",
}
You can then do this in modern releases with $filter, $map and $arrayElemAt:
{ "$group": {
"_id": {
"customerId" : "$customerId",
"channel" : "$channel",
"status": {
"$arrayElemAt": [
{ "$map": {
"input": { "$filter": {
"input": "$chanelStatuses",
"as": "el",
"cond": { "$eq": [ "$$el.channel", "$channel" ] }
}},
"as": "el",
"in": "$$el.status"
}},
0
]
}
},
"count": { "$sum": 1 }
}}
Older versions of MongoDB are going to require $unwind to access the matched array element.
In MongoDB 2.6 you can still "pre-filter" the array before the $unwind:
[
{ "$project": {
"customerId": 1,
"channel": 1,
"status": {
"$setDifference": [
{ "$map": {
"input": "$channelStatuses",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.channel", "$channel" ] },
"$$el.status",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$status" },
{ "$group": {
"_id": {
"customerId": "$customerId",
"channel": "$channel",
"status": "$status"
},
"count": { "$sum": 1 }
}}
]
And in anything prior to that, you "filter" after $unwind instead:
[
{ "$unwind": "$channelStatuses" },
{ "$project": {
"customerId": 1,
"channel": 1,
"status": "$channelStatuses.status",
"same": { "$eq": [ "$channelStatuses.status", "$channel" ] }
}},
{ "$match": { "same": true } },
{ "$group": {
"_id": "$_id",
"customerId": { "$first": "$customerId" },
"channel": { "$first": "$channel" },
"status": { "$first": "$status" }
}},
{ "$group": {
"_id": {
"customerId": "$customerId",
"channel": "$channel",
"status": "$status"
},
"count": { "$sum": 1 }
}}
]
In a lesser version than MongoDB 2.6 you also need to $project the result of the equality test between the two fields and then $match on the result in a separate stage. You might also note the "two" $group stages, since the first one removes any possible duplicates of the "channel" values after the filter via the $first accumulators. The following $group is exactly the same as in the previous listing.
But if you cannot change the structure and need "flexible" matching of keys where you cannot supply every name, then you must use mapReduce:
db.collection.mapReduce(
function() {
emit({
"customerId": this.customerId,
"channel": this.channel,
"status": this.channelStatuses[this.channel].status
},1);
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Where of course you can use that sort of bracket notation, since the map function is just JavaScript executed against each document.
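With { "out": { "inline": 1 } } the result comes back in a "results" array; for the single sample document shown it would look roughly like this (trimmed of the usual timing and counts fields):
{
  "results" : [
    {
      "_id" : {
        "customerId" : "2164289-4",
        "channel" : "BAR",
        "status" : "error"
      },
      "value" : 1
    }
  ],
  "ok" : 1
}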