I have data in the format below:
{
"Array1" : [
"A",
"B",
"C",
"D",
"E"
],
"tag": "X"
}
{
"Array1" : [
"A",
"B",
"C",
"X",
"Y"
],
"tag": "X"
}
{
"Array1" : [
"A",
"B",
"C",
"L",
"M"
],
"tag": "U"
}
I need to pop the last element of Array1 during aggregation so that it is ignored. I am trying the command below:
aggregate([
{$unwind: "$Array1"},
{$group: {_id: "$Array1", count: {$sum: 1}}},
])
Similarly, would it be possible to ignore the first element of the array?
Edit: Expected output:
{
"A": 3,
"B": 3,
"C": 3,
"D": 1,
"X": 1,
"L": 1
}
I'm going to skip the PHP translation because it's both late at night for me and also quite trivial. But the basic process is this:
db.collection.aggregate([
{ "$unwind": "$Array1" },
{ "$group": {
"_id": "$_id",
"Array1": { "$push": "$Array1" },
"last": { "$last": "$Array1" }
}},
{ "$project": {
"Array1": {
"$setDifference": [
"$Array1",
{ "$map": { "input": ["A"], "as": "el", "in": "$last" } }
]
}
}}
])
If your array items are not actually unique, or the order is important so that the "set" operator there interferes, then do this instead:
db.collection.aggregate([
{ "$unwind": "$Array1" },
{ "$group": {
"_id": "$_id",
"Array1": { "$push": "$Array1" },
"last": { "$last": "$Array1" }
}},
{ "$unwind": "$Array1" },
{ "$redact": {
"$cond": {
"if": { "$eq": [ "$Array1", "$last" ] },
"then": "$$PRUNE",
"else": "$$KEEP"
}
}},
{ "$group": {
"_id": "$_id",
"Array1": { "$push": "$Array1" }
}}
])
In either case, you are essentially comparing the $last element found in the array with the whole array and removing that from the selection.
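One consequence of comparing by value is worth spelling out. The following is a plain JavaScript sketch, not MongoDB code, and the helper names are made up for illustration. It mimics the effect of the $redact variant on a single array and contrasts it with a purely positional pop: when an earlier element equals the last one, the value-based comparison removes it as well.

```javascript
// Hypothetical helpers, for illustration only. They emulate the effect of
// the pipelines above on one array, outside of MongoDB.

// Value-based removal, as in the $redact variant: every element equal to
// the last one is dropped, wherever it appears.
function removeByLastValue(arr) {
  const last = arr[arr.length - 1];
  return arr.filter((el) => el !== last);
}

// Positional removal: only the final slot is dropped.
function popLast(arr) {
  return arr.slice(0, -1);
}

console.log(removeByLastValue(["A", "B", "C", "D", "E"])); // [ 'A', 'B', 'C', 'D' ]
console.log(removeByLastValue(["A", "B", "A"]));           // [ 'B' ]
console.log(popLast(["A", "B", "A"]));                     // [ 'A', 'B' ]
```

For the sample data in the question the two behave the same, since the last element of each array is unique within that array.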
But personally, unless you need this type of operation for further aggregation, do it in client code. Or wait for the next release of MongoDB (3.2), where the new $slice operator makes this simple:
db.collection.aggregate([
{ "$project": {
"Array1": {
"$slice": [
"$Array1",
0,
{ "$subtract": [
{ "$size": "$Array1" },
1
]}
]
}
}}
])
All produce ( in varying forms, as with the "set" operation ) :
{
"_id" : ObjectId("55cb4ef04f67f8a950c7b8fa"),
"Array1" : [
"A",
"B",
"C",
"D"
]
}
{
"_id" : ObjectId("55cb4ef04f67f8a950c7b8fb"),
"Array1" : [
"A",
"B",
"C",
"X"
]
}
{
"_id" : ObjectId("55cb4ef04f67f8a950c7b8fc"),
"Array1" : [
"A",
"B",
"C",
"L"
]
}
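For the "do it in client code" route, a minimal sketch in plain JavaScript could look like the following. It assumes the documents have already been fetched into an array; the function name and its skipFirst / skipLast parameters are hypothetical. It also covers the follow-up about ignoring the first element.

```javascript
// Sample documents copied from the question.
const sampleDocs = [
  { Array1: ["A", "B", "C", "D", "E"], tag: "X" },
  { Array1: ["A", "B", "C", "X", "Y"], tag: "X" },
  { Array1: ["A", "B", "C", "L", "M"], tag: "U" },
];

// Count elements across all documents, skipping `skipFirst` leading and
// `skipLast` trailing elements of each array (hypothetical helper).
function countElements(documents, skipFirst = 0, skipLast = 1) {
  const counts = {};
  for (const doc of documents) {
    const end = Math.max(doc.Array1.length - skipLast, skipFirst);
    for (const el of doc.Array1.slice(skipFirst, end)) {
      counts[el] = (counts[el] || 0) + 1;
    }
  }
  return counts;
}

// Ignore the last element of each array.
console.log(countElements(sampleDocs));
// { A: 3, B: 3, C: 3, D: 1, X: 1, L: 1 }

// Ignore the first element instead.
console.log(countElements(sampleDocs, 1, 0));
```

The first call reproduces the expected output given in the question.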
Related
I'm just learning the MongoDB aggregation framework.
The data is in the format below:
{
"questionType": "multiple",
"multipleOptions": ["first", "second", "third", "forth"],
"answers": ["first", "second", "second", "first", "first", "forth"]
},
{
"questionType": "multiple",
"multipleOptions": ["awful", "bad", "soso", "good", "excellent"],
"answers": ["bad", "bad", "good", "soso", "bad", "excellent", "awful", "soso"]
}
I want to aggregate these to something like this:
{
"result": { "first": 3, "second": 2, "forth": 1 }
},
{
"result": { "awful": 1, "bad": 3, "soso": 2, "good": 1, "excellent": 1 }
}
Or like this (no difference):
{
"result": [["first", 3], ["second", 2], ["forth", 1]]
},
{
"result": [["awful", 1], ["bad", 3], ["soso", 2], ["good", 1], ["excellent", 1]]
}
Is there a way to do this in a $project stage?
This can be done with a cohort of array operators working in conjunction to produce the desired effect.
You essentially need an operation that creates an array of key/value pairs of the counts you need. This will then be converted to a hash map. The array of key value pairs is essentially a map which is constructed by looping through the multipleOptions array and checking the size of the elements that match in the answers array.
TLDR;
The final pipeline you need to run follows:
db.collection.aggregate([
{ "$project": {
"result": {
"$arrayToObject": {
"$map": {
"input": { "$range": [ 0, { "$size": "$multipleOptions" } ] },
"as": "idx",
"in": {
"$let": {
"vars": {
"k": {
"$arrayElemAt": [
"$multipleOptions",
"$$idx"
]
},
"v": {
"$size": {
"$filter": {
"input": "$answers",
"as": "ans",
"cond": {
"$eq": [
"$$ans",
{
"$arrayElemAt": [
"$multipleOptions",
"$$idx"
]
}
]
}
}
}
}
},
"in": { "k": "$$k", "v": "$$v" }
}
}
}
}
}
} }
])
To demonstrate this step by step, let's first create an additional field in an aggregate operation. This field will be an array holding, for each element of multipleOptions, the count of matching answers. We need something like:
{
"questionType": "multiple",
"multipleOptions": ["awful", "bad", "soso", "good", "excellent"],
"answersCount": [1, 3, 2, 1, 1],
"answers": ["bad", "bad", "good", "soso", "bad", "excellent", "awful", "soso"]
}
To get this we need a way to loop through the multipleOptions and for each option, iterate the answers array, filter it and count the number of elements in the filtered array. The pseudo-algorithm follows:
answersCount = []
for each elem in ["awful", "bad", "soso", "good", "excellent"]:
    filteredAnswers = [<answers array containing only elem>]
    count = filteredAnswers.length
    answersCount.push(count)
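The pseudo-algorithm above can be sketched in plain JavaScript as follows. This is only an illustration of the logic, run outside MongoDB, and countPerOption is a made-up name.

```javascript
// For each option, keep only the answers equal to it and count them.
// Mirrors: $map over the options, $filter on answers, $size of the result.
function countPerOption(multipleOptions, answers) {
  return multipleOptions.map(
    (option) => answers.filter((ans) => ans === option).length
  );
}

const options = ["awful", "bad", "soso", "good", "excellent"];
const answers = ["bad", "bad", "good", "soso", "bad", "excellent", "awful", "soso"];

console.log(countPerOption(options, answers)); // [ 1, 3, 2, 1, 1 ]
```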
In MongoDB, the filtering can be done with $filter on the answers array, and the current option can be looked up with $arrayElemAt:
{
"$filter": {
"input": "$answers",
"as": "ans",
"cond": {
"$eq": [
"$$ans",
{ "$arrayElemAt": [ "$multipleOptions", "$$idx" ] }
]
}
}
}
The counts are derived using $size on the above expression
{
"$size": {
"$filter": {
"input": "$answers",
"as": "ans",
"cond": {
"$eq": [
"$$ans",
{ "$arrayElemAt": [ "$multipleOptions", "$$idx" ] }
]
}
}
}
}
For getting the outer loop, we can use $range and $map as
{
"$map": {
"input": { "$range": [ 0, { "$size": "$multipleOptions" } ] },
"as": "idx",
"in": {
"$let": {
"vars": {
"v": {
"$size": {
"$filter": {
"input": "$answers",
"as": "ans",
"cond": {
"$eq": [
"$$ans",
{ "$arrayElemAt": [ "$multipleOptions", "$$idx" ] }
]
}
}
}
}
},
"in": "$$v"
}
}
}
}
This will produce the answersCount in the following aggregate operation
db.collection.aggregate([
{ "$addFields": {
"answersCount": {
"$map": {
"input": { "$range": [ 0, { "$size": "$multipleOptions" } ] },
"as": "idx",
"in": {
"$let": {
"vars": {
"v": {
"$size": {
"$filter": {
"input": "$answers",
"as": "ans",
"cond": {
"$eq": [
"$$ans",
{ "$arrayElemAt": [ "$multipleOptions", "$$idx" ] }
]
}
}
}
}
},
"in": "$$v"
}
}
}
}
} }
])
To then get to the desired output, you need the answersCount to be an array of key/value pairs i.e.
{
"answersCount": [
{ "k": "awful", "v": 1},
{ "k": "bad", "v": 3},
{ "k": "soso", "v": 2},
{ "k": "good", "v": 1},
{ "k": "excellent", "v": 1}
],
}
and when you apply $arrayToObject to that array (written here as a literal for illustration) i.e.
{ "$arrayToObject": { "$literal": [
{ "k": "awful", "v": 1},
{ "k": "bad", "v": 3},
{ "k": "soso", "v": 2},
{ "k": "good", "v": 1},
{ "k": "excellent", "v": 1}
] } }
you get
{
"awful" : 1,
"bad" : 3,
"soso" : 2,
"excellent" : 1,
"good" : 1
}
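As a rough plain JavaScript analogue of the whole pipeline (a sketch only; tallyByOption is a hypothetical name), Object.fromEntries plays the role of $arrayToObject. One thing the sketch makes visible: because the loop runs over multipleOptions, every option shows up in the result, including options nobody chose, which get a count of 0.

```javascript
// Build { k, v } pairs per option, then turn the pairs into an object.
function tallyByOption(multipleOptions, answers) {
  const pairs = multipleOptions.map((option) => ({
    k: option,
    v: answers.filter((ans) => ans === option).length,
  }));
  // Object.fromEntries expects [key, value] pairs.
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

console.log(
  tallyByOption(
    ["first", "second", "third", "forth"],
    ["first", "second", "second", "first", "first", "forth"]
  )
);
// { first: 3, second: 2, third: 0, forth: 1 }
```

If zero counts are not wanted, the pairs could be filtered on v before the conversion.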
This is a good use case for "multi-stage grouping." Let's begin with an $unwind of answers:
c = db.foo.aggregate([
{$unwind: "$answers"}
]);
{
"_id" : 0,
"questionType" : "multiple",
"multipleOptions" : [
"first",
"second",
"third",
"forth"
],
"answers" : "first"
}
{
"_id" : 0,
"questionType" : "multiple",
"multipleOptions" : [
"first",
"second",
"third",
"forth"
],
"answers" : "second"
}
{
"_id" : 0,
"questionType" : "multiple",
"multipleOptions" : [
"first",
"second",
"third",
"forth"
],
"answers" : "second"
}
// ...
Now we have answers and _id as peers ready to group:
db.foo.aggregate([
{$unwind: "$answers"}
,{$group: {_id: {Xid:"$_id", answer:"$answers"}, n:{$sum:1} }}
]);
{ "_id" : { "Xid" : 1, "answer" : "awful" }, "n" : 1 }
{ "_id" : { "Xid" : 1, "answer" : "excellent" }, "n" : 1 }
{ "_id" : { "Xid" : 1, "answer" : "soso" }, "n" : 2 }
{ "_id" : { "Xid" : 1, "answer" : "bad" }, "n" : 3 }
{ "_id" : { "Xid" : 0, "answer" : "forth" }, "n" : 1 }
Now we group again, this time by the _id.Xid and then use $push to construct the output array of results:
db.foo.aggregate([
{$unwind: "$answers"}
,{$group: {_id: {Xid:"$_id", answer:"$answers"}, n:{$sum:1} }}
,{$group: {_id: "$_id.Xid", result: {$push: {answer: "$_id.answer", n: "$n" }} }}
]);
{
"_id" : 0,
"result" : [
{
"answer" : "forth",
"n" : 1
},
{
"answer" : "second",
"n" : 2
},
{
"answer" : "first",
"n" : 3
}
]
}
{
"_id" : 1,
"result" : [
{
"answer" : "awful",
"n" : 1
},
{
"answer" : "excellent",
"n" : 1
},
{
"answer" : "soso",
"n" : 2
},
{
"answer" : "bad",
"n" : 3
},
{
"answer" : "good",
"n" : 1
}
]
}
So in spirit we have a solution but to really press the point, we will use the $arrayToObject function to turn the array of options from the values of the answer key to keys in their own right. To do so, we will name the $push object args k and v to properly drive the function:
db.foo.aggregate([
{$unwind: "$answers"}
,{$group: {_id: {Xid:"$_id", answer:"$answers"}, n:{$sum:1} }}
,{$group: {_id: "$_id.Xid", QQ: {$push: {k: "$_id.answer", v: "$n" }} }}
,{$project: {_id: true, result: {$arrayToObject: "$QQ"} }}
]);
which yields:
{ "_id" : 0, "result" : { "forth" : 1, "second" : 2, "first" : 3 } }
{
"_id" : 1,
"result" : {
"awful" : 1,
"excellent" : 1,
"soso" : 2,
"bad" : 3,
"good" : 1
}
}
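The two grouping stages can be mirrored in plain JavaScript to make the data flow easier to follow. This is a sketch under the assumption that the documents carry the _id values 0 and 1 shown in the output above; tallyAnswers is a made-up name.

```javascript
const surveyDocs = [
  { _id: 0, answers: ["first", "second", "second", "first", "first", "forth"] },
  { _id: 1, answers: ["bad", "bad", "good", "soso", "bad", "excellent", "awful", "soso"] },
];

function tallyAnswers(documents) {
  // Stage 1 ($unwind + first $group): one counter per (Xid, answer) pair.
  const pairCounts = new Map();
  for (const doc of documents) {
    for (const answer of doc.answers) {
      const key = JSON.stringify([doc._id, answer]);
      pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
    }
  }
  // Stage 2 (second $group + $arrayToObject): one result object per Xid.
  const results = new Map();
  for (const [key, n] of pairCounts) {
    const [xid, answer] = JSON.parse(key);
    if (!results.has(xid)) results.set(xid, { _id: xid, result: {} });
    results.get(xid).result[answer] = n;
  }
  return [...results.values()];
}

console.log(JSON.stringify(tallyAnswers(surveyDocs)));
```

Unlike the $map approach, only answers that actually occur are counted, so "third" does not appear for the first document.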
Can someone help me with a query that sorts an array by date in ascending order and also displays the cCode? I am able to sort the array and project it, but I am unable to project the cCode along with the bal array:
db.collection.aggregate([
{ "$match": {
"_id": {
"$eq": {
"a": "NA",
"b": "HXYZ",
"c": "12345",
"d": "AA"
}
}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {"_id": "$_id",
"bal": {"$push": "$bal"}}},
{ $project: {
bal: { $slice: ["$bal",2]} ,"cCode": 1}
}
])
My collection:
/* 1 */
{
"_id" : {
"a" : "NA",
"b" : "HXYZ",
"c" : "12345",
"d" : "AA"
},
"cCode" : "HHH",
"bal" : [
{
"type" : "E",
"date" : "2015-08-02"
},
{
"type" : "E",
"date" : "2015-08-01"
},
{
"type" : "E",
"date" : "2015-07-07"
}
]
}
Please help me find the problem in the above query. Thanks in advance.
Your cCode field vanishes at the $group stage, because $group only outputs the fields you name in it. To carry that field through the pipeline, use the $first accumulator. Something like this:
db.collection.aggregate([
{ "$match": {
"_id": { "$eq": { "a": "NA", "b": "HXYZ", "c": "12345", "d": "AA" }}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {
"_id": "$_id",
"bal": { "$push": "$bal" },
"cCode": { "$first": "$cCode" }
}},
{ "$project": { "bal": { "$slice": ["$bal", 2] } ,"cCode": 1 }}
])
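To illustrate why the field has to be named in the $group, here is a plain JavaScript sketch of the same steps applied to the sample document. It is not MongoDB code, and firstBalances is a hypothetical helper.

```javascript
// Sample document copied from the question.
const account = {
  _id: { a: "NA", b: "HXYZ", c: "12345", d: "AA" },
  cCode: "HHH",
  bal: [
    { type: "E", date: "2015-08-02" },
    { type: "E", date: "2015-08-01" },
    { type: "E", date: "2015-07-07" },
  ],
};

function firstBalances(doc, limit) {
  // $unwind + $sort: order the entries by date. The dates are "YYYY-MM-DD"
  // strings, so comparing them as text gives chronological order.
  const sorted = [...doc.bal].sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : 0
  );
  // $group + $project: only the keys listed here exist in the output, so
  // cCode has to be carried over explicitly (the role of $first).
  return { _id: doc._id, cCode: doc.cCode, bal: sorted.slice(0, limit) };
}

console.log(firstBalances(account, 2));
```

Leaving cCode out of the returned object is the JavaScript equivalent of the original pipeline, where the later $project has nothing left to include.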
Given this function, I have a data set that I am querying. The data looks like this:
db.activity.insert(
{
"_id" : ObjectId("5908e64e3b03ca372dc945d5"),
"startDate" : ISODate("2017-05-06T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("5908ebf96ae5003a4471c9b2"),
"walkDistance" : "03",
"jogDistance" : "01",
"runDistance" : "08",
"sprintDistance" : "01"
}
]
}
)
db.activity.insert(
{
"_id" : ObjectId("58f79163bebac50d5b2ae760"),
"startDate" : ISODate("2017-05-07T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("58f7948fbebac50d5b2ae7f2"),
"walkDistance" : "01",
"jogDistance" : "02",
"runDistance" : "09",
"sprintDistance" : ""
}
]
}
)
Using this function, thanks to Neil Lunn, I am able to get my desired output:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
However, I cannot add a match statement to the beginning.
db.activity.aggregate([
{$match: {"startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" }},
{$unwind: '$details'},
{$match: {"startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" }},
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Because it gives an error message of:
> $concatArrays only supports arrays, not string
How can I modify this query so that a $match statement can be added?
Don't $unwind the array you are feeding to $concatArrays. Instead apply $filter to only extract the matching values. And as stated, we can just use $setUnion for the 'unique concatenation' instead:
db.activity.aggregate([
{ "$match": { "startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" } },
{ "$project": {
"_id": 0,
"unique": {
"$let": {
"vars": {
"filtered": {
"$filter": {
"input": "$details",
"cond": { "$eq": [ "$$this.code", "2" ] }
}
}
},
"in": {
"$setDifference": [
{ "$setUnion": [
"$$filtered.walkDistance",
"$$filtered.jogDistance",
"$$filtered.runDistance",
"$$filtered.sprintDistance"
]},
[""]
]
}
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Using $let makes things a bit cleaner syntax-wise, since you don't need to specify multiple $map and $filter statements "inline" as the source for $setUnion.
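The error message in the question comes down to the shape of the data after $unwind. A small plain JavaScript sketch (an emulation with a trimmed-down copy of the sample document, not MongoDB code) shows the difference between the two shapes:

```javascript
// Trimmed copy of the first sample document.
const activity = {
  details: [
    { code: "2", walkDistance: "03", jogDistance: "01", runDistance: "08", sprintDistance: "01" },
  ],
};

// Before $unwind, "$details.walkDistance" resolves to an array with one
// entry per element of details.
const beforeUnwind = activity.details.map((d) => d.walkDistance);

// After $unwind, details is a single sub-document, so the same path yields
// a plain string, which $concatArrays rejects.
const unwound = { ...activity, details: activity.details[0] };
const afterUnwind = unwound.details.walkDistance;

console.log(Array.isArray(beforeUnwind), typeof afterUnwind); // true string

// Filtering instead of unwinding keeps the array shape.
const filtered = activity.details.filter((d) => d.code === "2");
console.log(filtered.map((d) => d.walkDistance)); // [ '03' ]
```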
I have a data set that I am querying. The data looks like this:
db.activity.insert(
{
"_id" : ObjectId("5908e64e3b03ca372dc945d5"),
"startDate" : ISODate("2017-05-06T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("5908ebf96ae5003a4471c9b2"),
"walkDistance" : "03",
"jogDistance" : "01",
"runDistance" : "08",
"sprintDistance" : "01"
}
]
}
)
db.activity.insert(
{
"_id" : ObjectId("58f79163bebac50d5b2ae760"),
"startDate" : ISODate("2017-05-07T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("58f7948fbebac50d5b2ae7f2"),
"walkDistance" : "01",
"jogDistance" : "02",
"runDistance" : "09",
"sprintDistance" : ""
}
]
}
)
My desired output looks as such:
[
{
"_id": null,
"uniqueValues": [
"03",
"01",
"08",
"02",
"09"
]
}
]
In order to do that, I've developed the following code:
db.activity.aggregate([
{
$facet: {
"walk": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.walkDistance"}}}
], "jog": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.jogDistance"}}}
], "run": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.runDistance"}}}
], "sprint": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.sprintDistance"}}}
]
}
}])
However, I am still getting 4 different facets, each with its own _id: null and uniqueValues array. How do I change the query so that they are all included in a single array, and the "" is also excluded?
$facet really is not the best thing to use here. You should really just be applying $concatArrays and filtering down the result with $setDifference and $filter:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Returns the result:
/* 1 */
{
"_id" : null,
"uniqueArray" : [
"09",
"03",
"01",
"02",
"08"
]
}
So after bringing all the array values into a single array using $concatArrays, you apply $setDifference to reduce the list to the "unique" values. The $filter removes the "" values you don't want.
Then it's just a matter of applying $unwind on the singular and reduced list and bringing it back together in the $group with $addToSet to only keep unique values across documents.
You could also just $concatArrays only and then $unwind and $match, but the other operators don't really cost much and reduce some of the load by already narrowing down to "unique" within the document before you get to the $unwind. So it's better to do it that way.
Really this can even be broken down further, to simply $setUnion and $setDifference, since we are talking about "sets" after all:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$setDifference": [
{ "$setUnion": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[""]
]
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
And that means that the overall statement becomes compatible back to MongoDB 2.6, or would be if all the forms such as $details.walkDistance were written out in their longer form using $map:
"$setDifference": [
{ "$setUnion": [
{ "$map": { "input": "$details", "as": "d", "in": "$$d.walkDistance" } },
{ "$map": { "input": "$details", "as": "d", "in": "$$d.jogDistance" } },
{ "$map": { "input": "$details", "as": "d", "in": "$$d.runDistance" } },
{ "$map": { "input": "$details", "as": "d", "in": "$$d.sprintDistance" } }
]},
[""]
]
On the other hand running $facet causes a "brute force" parse through the whole collection for every property from within the array, and $unwind being processed on each of those passes. So it's a really inefficient way to obtain the result. So don't do it that way.
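The two levels of "unique" in the recommended pipelines, first within each document and then across documents, can be sketched in plain JavaScript. This is an illustration only, with trimmed copies of the sample documents and hypothetical function names.

```javascript
const activities = [
  { details: [{ code: "2", walkDistance: "03", jogDistance: "01", runDistance: "08", sprintDistance: "01" }] },
  { details: [{ code: "2", walkDistance: "01", jogDistance: "02", runDistance: "09", sprintDistance: "" }] },
];

const DISTANCE_FIELDS = ["walkDistance", "jogDistance", "runDistance", "sprintDistance"];

// Per document: $setUnion of the four field arrays, then $setDifference
// with [""] to drop the empty string.
function uniquePerDocument(doc) {
  const values = DISTANCE_FIELDS.flatMap((f) => doc.details.map((d) => d[f]));
  return [...new Set(values)].filter((v) => v !== "");
}

// Across documents: $unwind followed by $group with $addToSet.
function uniqueAcrossCollection(docs) {
  return [...new Set(docs.flatMap((doc) => uniquePerDocument(doc)))];
}

console.log(uniqueAcrossCollection(activities).sort());
// [ '01', '02', '03', '08', '09' ]
```

The sort is only there to make the printed result stable; as with $addToSet, the order of a set should not be relied upon.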
Using the example zipcodes collection, I have a query like this:
db.zipcodes.aggregate([
{ "$match": {"state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": { "city": "$city" }, "ZipsPerCity": {"$addToSet": "$_id"}}},
{ "$match": { "ZipsPerCity" : { "$size": 2 }}},
]).pretty()
This is just an example that looks for cities (in the state of NY and PA) that have 2 zipcodes:
{
"_id" : {
"city" : "BETHLEHEM"
},
"ZipsPerCity" : [
"18018",
"18015"
]
}
{
"_id" : {
"city" : "BEAVER SPRINGS"
},
"ZipsPerCity" : [
"17843",
"17812"
]
}
Now suppose that I want to compare "BEAVER SPRINGS" zip codes to "BETHLEHEM" zip codes, using the "$setDifference" set operator? I tried using the "$setDifference" operator in a $project operator, like this:
db.zipcodes.aggregate([
{ "$match": { "state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": {"city" : "$city"}, "ZipsPerCity": {"$addToSet": "$_id"}}},
{ "$match": { "ZipsPerCity" : { $size: 2 }}},
{ "$project": {
"int": { "$setDifference":[
"$_id.city.BETHLEHEM.ZipsPerCity",
"$_id.city.BEAVER SPRINGS.ZipsPerCity"
]}
}}
]).pretty()
That doesn't even look right, let alone produce results. No errors though.
How would you refer to a couple of arrays built using $addToSet like this, using $setDifference (or any of the set operators)?
The first thing about what you are trying to do here is that the arrays you want to compare are actually in two different documents. All of the aggregation framework operators in fact work on only one document at a time, with the exception of $group which is meant to "aggregate" documents and possibly $unwind which essentially turns one document into many.
In order to compare you would need the data to occur in one document, or at least be "paired" in some way. So there is a technique to do that:
db.zipcodes.aggregate([
{ "$match": {"state": { "$in": [ "PA","NY" ] } }},
{ "$group": {
"_id": "$city",
"ZipsPerCity": { "$addToSet": "$_id"}
}},
{ "$match": { "ZipsPerCity" : { "$size": 2 } }},
{ "$group": {
"_id": null,
"A": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BETHLEHEM" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}},
"B": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BEAVER SPRINGS" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}}
}},
{ "$project": {
"A": 1,
"B": 1,
"C": { "$setDifference": [ "$A.ZipsPerCity", "$B.ZipsPerCity" ] }
}}
])
That is a little contrived, and I am well aware that the actual result set has more than two cities, but the point is to illustrate that the arrays/sets sent to the "set operators" such as $setDifference need to be in the same document.
The result here compares the "left" array with the "right" array, returning the members from the "left" that are different to the "right". Both sets are unique here with no overlap so the results should be expected:
{
"_id" : null,
"A" : {
"city" : "BETHLEHEM",
"ZipsPerCity" : [
"18018",
"18015"
]
},
"B" : {
"city" : "BEAVER SPRINGS",
"ZipsPerCity" : [
"17843",
"17812"
]
},
"C" : [
"18018",
"18015"
]
}
This is really better illustrated with actual "sets" with common members. So this document:
{ "A" : [ "A", "A", "B", "C", "D" ], "B" : [ "B", "C" ] }
Responds to $setDifference:
{ "C" : [ "A", "D" ] }
And $setEquals:
{ "C" : false }
$setIntersection:
{ "C" : [ "B", "C" ] }
$setUnion:
{ "C" : [ "B", "D", "C", "A" ] }
$setIsSubset, reversing the order to $B, $A:
{ "C" : true }
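For readers who want to check those results by hand, the set operators can be approximated in plain JavaScript. These helpers are a sketch of the semantics only, written for this example; they are not how MongoDB implements the operators, and the element order they return is not something to rely on.

```javascript
// Duplicates are ignored, as with the aggregation set operators.
const unique = (arr) => [...new Set(arr)];
const setDifference = (a, b) => unique(a).filter((x) => !b.includes(x));
const setIntersection = (a, b) => unique(a).filter((x) => b.includes(x));
const setUnion = (a, b) => unique([...a, ...b]);
const setIsSubset = (a, b) => a.every((x) => b.includes(x));
const setEquals = (a, b) => setIsSubset(a, b) && setIsSubset(b, a);

const setDoc = { A: ["A", "A", "B", "C", "D"], B: ["B", "C"] };

console.log(setDifference(setDoc.A, setDoc.B));   // [ 'A', 'D' ]
console.log(setEquals(setDoc.A, setDoc.B));       // false
console.log(setIntersection(setDoc.A, setDoc.B)); // [ 'B', 'C' ]
console.log(setUnion(setDoc.A, setDoc.B));        // [ 'A', 'B', 'C', 'D' ]
console.log(setIsSubset(setDoc.B, setDoc.A));     // true
```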
The other set operators $anyElementTrue and $allElementsTrue are likely most useful when used along with the $map operator which can re-shape arrays and evaluate conditions against each element.
A very good usage of $map is alongside $setDifference, where you can "filter" array contents without using $unwind:
db.arrays.aggregate([
{ "$project": {
"A": {
"$setDifference": [
{
"$map": {
"input": "$A",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
That can be very handy when you have a lot of results in the pipeline and you do not want to "expand" out all of those results by "unwinding" the array. But note that this is a "set" and as such only one element matching "A" is returned:
{ "A" : ["A"] }
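The $map plus $setDifference trick can be mirrored in plain JavaScript to show where the duplicate "A" goes. This is a sketch with a made-up helper name, not MongoDB code.

```javascript
// Step 1 ($map + $cond): replace every non-matching element with false.
// Step 2 ($setDifference with [false]): treat the result as a set and
// remove false, which also collapses duplicate matches.
function filterViaSet(arr, wanted) {
  const mapped = arr.map((el) => (el === wanted ? el : false));
  return [...new Set(mapped)].filter((el) => el !== false);
}

console.log(filterViaSet(["A", "A", "B", "C", "D"], "A")); // [ 'A' ]
console.log(filterViaSet(["A", "A", "B", "C", "D"], "Z")); // []
```

If the duplicates matter, a non-set filter (such as $filter in later MongoDB versions, or Array.prototype.filter here) keeps them.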
So the things to keep in mind here are:
You operate only within the "same" document at a time.
The results are generally "sets", which means they are both "unique" and "un-ordered".
Overall that should be a decent run-down on what the set operators are and how you use them.