Combining Unique Items From Arrays - mongodb

I have a data set that I am querying. The data looks like this:
db.activity.insert(
{
"_id" : ObjectId("5908e64e3b03ca372dc945d5"),
"startDate" : ISODate("2017-05-06T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("5908ebf96ae5003a4471c9b2"),
"walkDistance" : "03",
"jogDistance" : "01",
"runDistance" : "08",
"sprintDistance" : "01"
}
]
}
)
db.activity.insert(
{
"_id" : ObjectId("58f79163bebac50d5b2ae760"),
"startDate" : ISODate("2017-05-07T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("58f7948fbebac50d5b2ae7f2"),
"walkDistance" : "01",
"jogDistance" : "02",
"runDistance" : "09",
"sprintDistance" : ""
}
]
}
)
My desired output looks as such:
[
{
"_id": null,
"uniqueValues": [
"03",
"01",
"08",
"02",
"09"
]
}
]
In order to do that, I've developed the following code:
db.activity.aggregate([
{
$facet: {
"walk": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.walkDistance"}}}
], "jog": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.jogDistance"}}}
], "run": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.runDistance"}}}
], "sprint": [
{$unwind: '$details'},
{$group: {_id: null, uniqueValues: {$addToSet: "$details.sprintDistance"}}}
]
}
}])
However, I still get 4 different facets, each with its own _id: null and uniqueValues array. How do I change the query so that they are all included in a single array, and the "" is also excluded?

$facet really is not the best thing to use here. You should really just be applying $concatArrays and filtering down the result with $setDifference and $filter:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Returns the result:
/* 1 */
{
"_id" : null,
"uniqueArray" : [
"09",
"03",
"01",
"02",
"08"
]
}
So after bringing all the array values into a single array using $concatArrays, you apply $setDifference to reduce the list to the "unique" values. The $filter removes the "" values you don't want.
Then it's just a matter of applying $unwind on the singular and reduced list and bringing it back together in the $group with $addToSet to only keep unique values across documents.
You could also just $concatArrays only and then $unwind and $match, but the other operators don't really cost much and reduce some of the load by already narrowing down to "unique" within the document before you get to the $unwind. So it's better to do it that way.
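To make the data flow concrete, here is a minimal plain-JavaScript sketch of what the pipeline computes on the two sample documents. It is an emulation that runs without a database, not MongoDB code; the `docs` array is trimmed to the relevant fields and the helper name `perDocumentUnique` is made up for illustration.

```javascript
// Plain-JavaScript emulation of the pipeline above (illustrative only, no MongoDB involved).
const docs = [
  { details: [{ walkDistance: "03", jogDistance: "01", runDistance: "08", sprintDistance: "01" }] },
  { details: [{ walkDistance: "01", jogDistance: "02", runDistance: "09", sprintDistance: "" }] }
];

const fields = ["walkDistance", "jogDistance", "runDistance", "sprintDistance"];

// $project step: concatenate the four per-field arrays, de-duplicate, drop "".
function perDocumentUnique(doc) {
  const all = fields.flatMap(f => doc.details.map(d => d[f])); // like $concatArrays
  const unique = [...new Set(all)];                            // like $setDifference with []
  return unique.filter(v => v !== "");                         // like $filter with $ne ""
}

// $unwind + $group/$addToSet step: merge the per-document sets into one.
const uniqueArray = [...new Set(docs.flatMap(perDocumentUnique))];

console.log(uniqueArray);
```

The order of the values is an artifact of this sketch; MongoDB makes no ordering promise for set results, which is why the output above lists the same five values in a different order.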
Really this can even be broken down further, to simply $setUnion and $setDifference, since we are talking about "sets" after all:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$setDifference": [
{ "$setUnion": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[""]
]
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
And that means that the overall statement becomes compatible back to MongoDB 2.6, or would be if all the forms such as $details.walkDistance were written out in their longer form using $map:
"$setDifference": [
{ "$setUnion": [
{ "$map": { "input": "$details", "as": "d", "in": "$$d.walkDistance" } },
{ "$map": { "input": "$details", "as": "d", "in": "$$d.jogDistance" } },
{ "$map": { "input": "$details", "as": "d", "in": "$$d.runDistance" } },
{ "$map": { "input": "$details", "as": "d", "in": "$$d.sprintDistance" } }
]},
[""]
]
On the other hand running $facet causes a "brute force" parse through the whole collection for every property from within the array, and $unwind being processed on each of those passes. So it's a really inefficient way to obtain the result. So don't do it that way.

Related

merge records in to one using mongodb

Here is my collection structure. I tried $mergeObjects but did not know how to use it the right way. Please help:
{
"_id" : ObjectId("5e39b407eb2b5e4c3c80c5b0"),
"groupId":"1",
"emp" : {
"roles" : [
{
"roleId" : "role1"
}
],
"designation" : [
"Manager"
],
"dept" : [
{
"deptId" : "dept1"
}
]
}
},
{
"_id" : ObjectId("5e39b435eb2b5e4c3c80c5b1"),
"groupId":"1",
"emp" : {
"roles" : [
{
"roleId" : "role2"
}
],
"designation" : [
"Developer"
],
"dept" : [
{
"deptId" : "dept2"
}
]
}
}
I want an aggregate query which merges the two documents w.r.t. "groupId" like this using mongodb
{
"_id" : <some id>,
"groupId":"1",
"emp" : {
"roles" : [
{
"roleId" : "role1"
},
{
"roleId" : "role2"
}
],
"designation" : [
"Manager","Developer"
],
"dept" : [
{
"deptId" : "dept1"
},
{
"deptId" : "dept2"
}
]
}
}
There are around 200 to 300 fields like arrays, array of an array , array of an array of an array and so on.
I don't think the $mergeObjects operator would work here, given that it overrides fields; for example, the roleId in roles would always be role2 if $mergeObjects were used. You are going to need a multistage solution, combining a $group to bring the records together and a $project to restructure the fields inside emp into what you want.
Try this:
db.collection.aggregate([
{ "$group": {
"_id": "$groupId",
"roles": { "$push": "$emp.roles" },
"designation": { "$push": "$emp.designation" },
"dept": { "$push": "$emp.dept" },
}},
{ "$project": {
"_id": 0,
"groupId": "$_id",
"roles": {
"$reduce": {
"input": "$roles",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
},
"designation": {
"$reduce": {
"input": "$designation",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
},
"dept": {
"$reduce": {
"input": "$dept",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
}
}}
]);
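For anyone who wants to see what the two stages do without a database at hand, the following is a rough plain-JavaScript emulation, assuming the two sample documents from the question. The function name `mergeByGroup` and the `employees` array are made up for this sketch; it is not MongoDB code.

```javascript
// Plain-JavaScript emulation of $group/$push followed by $reduce/$concatArrays (illustrative only).
const employees = [
  { groupId: "1", emp: { roles: [{ roleId: "role1" }], designation: ["Manager"], dept: [{ deptId: "dept1" }] } },
  { groupId: "1", emp: { roles: [{ roleId: "role2" }], designation: ["Developer"], dept: [{ deptId: "dept2" }] } }
];

function mergeByGroup(documents) {
  const groups = new Map();
  for (const doc of documents) {
    // $group with $push: collect each document's array, giving an array of arrays.
    if (!groups.has(doc.groupId)) {
      groups.set(doc.groupId, { roles: [], designation: [], dept: [] });
    }
    const g = groups.get(doc.groupId);
    g.roles.push(doc.emp.roles);
    g.designation.push(doc.emp.designation);
    g.dept.push(doc.emp.dept);
  }
  // $project with $reduce + $concatArrays: flatten each array of arrays.
  const flatten = arrays => arrays.reduce((value, current) => value.concat(current), []);
  return [...groups].map(([groupId, g]) => ({
    groupId,
    roles: flatten(g.roles),
    designation: flatten(g.designation),
    dept: flatten(g.dept)
  }));
}

const merged = mergeByGroup(employees);
console.log(JSON.stringify(merged, null, 2));
```

Like the pipeline, the sketch leaves roles, designation and dept at the top level of the result; nesting them under emp again would take one more reshaping step.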
Merging the arrays is done with the $concatArrays operator: push all the arrays together in $group, then flatten them with $reduce, as in the previous answer by Josh Balcitis. Note that $concatArrays on its own keeps repeated values.
Another operator that gives unique results is $addToSet, used instead of $push, but you have to $unwind all the arrays before the $group so that unique single elements are added.
ex:
db.collection.aggregate([{
$unwind: {
path: "$emp.roles",
preserveNullAndEmptyArrays: true
}
},
{
$unwind: {
path: "$emp.designation",
preserveNullAndEmptyArrays: true
}
},
{
$unwind: {
path: "$emp.dept",
preserveNullAndEmptyArrays: true
}
},
{
"$group": {
"_id": "$groupId",
"roles": {
$addToSet: "$emp.roles"
},
"designation": {
$addToSet: "$emp.designation"
},
"dept": {
$addToSet: "$emp.dept"
},
}
},
]);
Which approach is better? That depends on your data and on $group vs $reduce performance.
Note: the preserveNullAndEmptyArrays option is used to prevent documents with empty or missing arrays from being dropped by $unwind, which would affect the whole result. It is optional.

Filter Array Content to a Query containing $concatArrays

I have a data set that I am querying. The data looks like this:
db.activity.insert(
{
"_id" : ObjectId("5908e64e3b03ca372dc945d5"),
"startDate" : ISODate("2017-05-06T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("5908ebf96ae5003a4471c9b2"),
"walkDistance" : "03",
"jogDistance" : "01",
"runDistance" : "08",
"sprintDistance" : "01"
}
]
}
)
db.activity.insert(
{
"_id" : ObjectId("58f79163bebac50d5b2ae760"),
"startDate" : ISODate("2017-05-07T00:00:00Z"),
"details" : [
{
"code" : "2",
"_id" : ObjectId("58f7948fbebac50d5b2ae7f2"),
"walkDistance" : "01",
"jogDistance" : "02",
"runDistance" : "09",
"sprintDistance" : ""
}
]
}
)
Using this function, thanks to Neil Lunn, I am able to get my desired output:
db.activity.aggregate([
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
However, I cannot add a match statement to the beginning.
db.activity.aggregate([
{$match: {"startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" }},
{$unwind: '$details'},
{$match: {"startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" }},
{ "$project": {
"_id": 0,
"unique": {
"$filter": {
"input": {
"$setDifference": [
{ "$concatArrays": [
"$details.walkDistance",
"$details.jogDistance",
"$details.runDistance",
"$details.sprintDistance"
]},
[]
]
},
"cond": { "$ne": [ "$$this", "" ] }
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Because it gives an error message of:
> $concatArrays only supports arrays, not string
How can I modify this query so that a $match statement can be added?
Don't $unwind the array you are feeding to $concatArrays. Instead apply $filter to only extract the matching values. And as stated, we can just use $setUnion for the 'unique concatenation' instead:
db.activity.aggregate([
{ "$match": { "startDate" : ISODate("2017-05-06T00:00:00Z"), "details.code": "2" } },
{ "$project": {
"_id": 0,
"unique": {
"$let": {
"vars": {
"filtered": {
"$filter": {
"input": "$details",
"cond": { "$eq": [ "$$this.code", "2" ] }
}
}
},
"in": {
"$setDifference": [
{ "$setUnion": [
"$$filtered.walkDistance",
"$$filtered.jogDistance",
"$$filtered.runDistance",
"$$filtered.sprintDistance"
]},
[""]
]
}
}
}
}},
{ "$unwind": "$unique" },
{ "$group": {
"_id": null,
"uniqueArray": { "$addToSet": "$unique" }
}}
])
Using $let makes things a bit cleaner syntax-wise, since you don't need to specify multiple $map and $filter statements "inline" as the source for $setUnion.
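The reason the $unwind version fails can be sketched in plain JavaScript. The function `resolvePath` below is a made-up stand-in for how a field path such as "$details.walkDistance" behaves, under the assumption that a path through an array yields one value per element while a path through a single subdocument yields the bare value. It is an illustration, not MongoDB's implementation.

```javascript
// Plain-JavaScript sketch of how a field path such as "$details.walkDistance" resolves (illustrative only).
function resolvePath(details, field) {
  // Array of subdocuments: the path yields an array with one value per element.
  if (Array.isArray(details)) {
    return details.map(d => d[field]);
  }
  // Single subdocument (the shape after $unwind): the path yields the bare value.
  return details[field];
}

const original = { details: [{ code: "2", walkDistance: "03" }] };
const unwound = { details: { code: "2", walkDistance: "03" } };

const beforeUnwind = resolvePath(original.details, "walkDistance");
const afterUnwind = resolvePath(unwound.details, "walkDistance");

// Before $unwind the value is an array; afterwards it is a plain string,
// which is what $concatArrays and $setUnion reject.
console.log(Array.isArray(beforeUnwind), typeof afterUnwind);
```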

Mongodb count all array elements in all objects matching by criteria

I have a collection that is log of activity on objects like this:
{
"_id" : ObjectId("55e3fd1d7cb5ac9a458b4567"),
"object_id" : "1",
"activity" : [
{
"action" : "test_action",
"time" : ISODate("2015-08-31T00:00:00.000Z")
},
{
"action" : "test_action",
"time" : ISODate("2015-08-31T00:00:22.000Z")
}
]
}
{
"_id" : ObjectId("55e3fd127cb5ac77478b4567"),
"object_id" : "2",
"activity" : [
{
"action" : "test_action",
"time" : ISODate("2015-08-31T00:00:00.000Z")
}
]
}
{
"_id" : ObjectId("55e3fd0f7cb5ac9f458b4567"),
"object_id" : "1",
"activity" : [
{
"action" : "test_action",
"time" : ISODate("2015-08-30T00:00:00.000Z")
}
]
}
If I run the following query:
db.objects.find({
"createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")},
"activity.action" : "test_action"
}).count()
it returns the count of documents containing "test_action" (3 in this set), but I need the count of all test_actions (4 in this set). How do I do that?
The most "performant" way to do this is to skip the $unwind altogether and simply $group to count. Essentially, "filter" each array, take the $size of the result, and $sum those sizes:
db.objects.aggregate([
{ "$match": {
"createddate": {
"$gte": ISODate("2015-08-30T00:00:00.000Z")
},
"activity.action": "test_action"
}},
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$size": {
"$setDifference": [
{ "$map": {
"input": "$activity",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.action", "test_action" ] },
"$$el",
false
]
}
}},
[false]
]
}
}
}
}}
])
Since MongoDB version 3.2 we can use $filter, which makes this much more simple:
db.objects.aggregate([
{ "$match": {
"createddate": {
"$gte": ISODate("2015-08-30T00:00:00.000Z")
},
"activity.action": "test_action"
}},
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$size": {
"$filter": {
"input": "$activity",
"as": "el",
"cond": {
"$eq": [ "$$el.action", "test_action" ]
}
}
}
}
}
}}
])
Using $unwind causes the documents to de-normalize and effectively creates a copy per array entry. Where possible you should avoid this due to the often extreme cost. Filtering and counting array entries per document is much faster by comparison, as is a simple $match and $group pipeline compared to many stages.
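The "filter, size, sum" idea is easy to check by hand. Here is a small plain-JavaScript sketch over the three sample documents, trimmed to the fields that matter. The name `countActions` is made up, and the date condition from the $match is left out because the sample documents shown do not carry that field.

```javascript
// Plain-JavaScript emulation of $sum over $size of a filtered array (illustrative only).
const logs = [
  { object_id: "1", activity: [{ action: "test_action" }, { action: "test_action" }] },
  { object_id: "2", activity: [{ action: "test_action" }] },
  { object_id: "1", activity: [{ action: "test_action" }] }
];

// Per document: filter the array and take its length; then sum across documents.
function countActions(documents, action) {
  return documents.reduce(
    (sum, doc) => sum + doc.activity.filter(el => el.action === action).length,
    0
  );
}

const total = countActions(logs, "test_action");
console.log(total);
```

No document is ever copied per array entry here, which is the same reason the $filter form avoids the cost of $unwind.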
You can do so by using aggregation:
db.objects.aggregate([
{$match: {"createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")}, "activity.action" : "test_action"}},
{$unwind: "$activity"},
{$match: {"activity.action" : "test_action"}},
{$group: {_id: null, count: {$sum: 1}}}
])
This will produce a result like:
{
"_id" : null,
"count" : 4
}

How to find document and single subdocument matching given criterias in MongoDB collection

I have collection of products. Each product contains array of items.
> db.products.find().pretty()
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "01.02.2100",
"purchasePrice" : 1,
"sellingPrice" : 10,
"count" : 15
},
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
So, can you please advise me how to query MongoDB to retrieve all products with only the single item whose date equals the date I pass to the query as a parameter?
The result for "31.08.2014" must be:
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
What you are looking for is the positional $ operator and "projection". For a single field you need to match the required array element using "dot notation", for more than one field use $elemMatch:
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
)
Or the $elemMatch for more than one matching field:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "shop": 1, "name":1, "items.$": 1 }
)
These work for a single array element only though and only one will be returned. If you want more than one array element to be returned from your conditions then you need more advanced handling with the aggregation framework.
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$unwind": "$items" },
{ "$match": { "items.date": "31.08.2014" } },
{ "$group": {
"_id": "$_id",
"shop": { "$first": "$shop" },
"name": { "$first": "$name" },
"items": { "$push": "$items" }
}}
])
Or possibly in shorter/faster form since MongoDB 2.6 where your array of items contains unique entries:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$project": {
"shop": 1,
"name": 1,
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.date", "31.08.2014" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
Or possibly with $redact, but a little contrived:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$redact": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$date", "31.08.2014" ] }, "31.08.2014" ] },
"$$DESCEND",
"$$PRUNE"
]
}}
])
More modern, you would use $filter:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$addFields": {
"items": {
"$filter": {
"input": "$items",
"cond": { "$eq": [ "$$this.date", "31.08.2014" ] }
}
}
}}
])
And with multiple conditions, use $elemMatch in the $match and $and within the $filter:
db.products.aggregate([
{ "$match": {
"items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}
}},
{ "$addFields": {
"items": {
"$filter": {
"input": "$items",
"cond": {
"$and": [
{ "$eq": [ "$$this.date", "31.08.2014" ] },
{ "$eq": [ "$$this.purchasePrice", 1 ] }
]
}
}
}
}}
])
So it just depends on whether you always expect a single element to match or multiple elements, and then which approach is better. But where possible the .find() method will generally be faster since it lacks the overhead of the other operations, although the last two forms do not lag that far behind at all.
As a side note, your "dates" are represented as strings which is not a very good idea going forward. Consider changing these to proper Date object types, which will greatly help you in the future.
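The difference between the positional projection and the $filter form comes down to "first match" versus "all matches". A minimal plain-JavaScript sketch of that difference follows. The third item is a hypothetical extra entry added so that two elements match, and the function names are made up; this is an emulation, not MongoDB code.

```javascript
// Plain-JavaScript emulation of "first match" versus "all matches" (illustrative only).
const product = {
  shop: "shop1",
  name: "product1",
  items: [
    { date: "31.08.2014", purchasePrice: 10, count: 5 },
    { date: "01.02.2100", purchasePrice: 1, count: 15 },
    { date: "31.08.2014", purchasePrice: 12, count: 7 } // hypothetical extra entry
  ]
};

const wanted = "31.08.2014";

// Like the positional $ projection: only the first matching element survives.
function projectFirstMatch(doc, date) {
  const first = doc.items.find(item => item.date === date);
  return { ...doc, items: first === undefined ? [] : [first] };
}

// Like $filter: every matching element survives.
function projectAllMatches(doc, date) {
  return { ...doc, items: doc.items.filter(item => item.date === date) };
}

console.log(projectFirstMatch(product, wanted).items.length);
console.log(projectAllMatches(product, wanted).items.length);
```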
Based on Neil Lunn's code, I use this solution; it automatically includes all first-level keys (but you could also exclude keys if you want):
db.products.find(
{ "items.date": "31.08.2014" },
{ items: { $elemMatch: { date: "31.08.2014" } } },
)
With multiple requirements:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ items: { $elemMatch: { "date": "31.08.2014", "purchasePrice": 1 } } },
)
Mongo supports dot notation for sub-queries.
See: http://docs.mongodb.org/manual/reference/glossary/#term-dot-notation
Depending on your driver, you want something like:
db.products.find({"items.date":"31.08.2014"});
Note that the attribute is in quotes for dot notation, even if usually your driver doesn't require this.

How to address arrays with mongodb Set Operators

Using the example zipcodes collection, I have a query like this:
db.zipcodes.aggregate([
{ "$match": {"state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": { "city": "$city" }, "ZipsPerCity": {"$addToSet": "$_id"}}},
{ "$match": { "ZipsPerCity" : { "$size": 2 }}},
]).pretty()
This is just an example that looks for cities (in the state of NY and PA) that have 2 zipcodes:
{
"_id" : {
"city" : "BETHLEHEM"
},
"ZipsPerCity" : [
"18018",
"18015"
]
}
{
"_id" : {
"city" : "BEAVER SPRINGS"
},
"ZipsPerCity" : [
"17843",
"17812"
]
}
Now suppose that I want to compare "BEAVER SPRINGS" zip codes to "BETHLEHEM" zip codes, using the "$setDifference" set operator? I tried using the "$setDifference" operator in a $project operator, like this:
db.zipcodes.aggregate([
{ "$match": { "state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": {"city" : "$city"},"ZipsPerCity": {"$addToSet": "$_id"}}},
{ "$match": { "ZipsPerCity" : { $size: 2 }}},
{ "$project": {
"int": { "$setDifference":[
"$_id.city.BETHLEHEM.ZipsPerCity",
"$_id.city.BEAVER SPRINGS.ZipsPerCity"
]}
}}
]).pretty()
That doesn't even look right, let alone produce results. No errors though.
How would you refer to a couple of arrays built using $addToSet like this, using $setDifference (or any of the set operators)?
The first thing about what you are trying to do here is that the arrays you want to compare are actually in two different documents. All of the aggregation framework operators in fact work on only one document at a time, with the exception of $group which is meant to "aggregate" documents and possibly $unwind which essentially turns one document into many.
In order to compare you would need the data to occur in one document, or at least be "paired" in some way. So there is a technique to do that:
db.zipcodes.aggregate([
{ "$match": {"state": { "$in": [ "PA","NY" ] } }},
{ "$group": {
"_id": "$city",
"ZipsPerCity": { "$addToSet": "$_id"}
}},
{ "$match": { "ZipsPerCity" : { "$size": 2 } }},
{ "$group": {
"_id": null,
"A": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BETHLEHEM" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}},
"B": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BEAVER SPRINGS" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}}
}},
{ "$project": {
"A": 1,
"B": 1,
"C": { "$setDifference": [ "$A.ZipsPerCity", "$B.ZipsPerCity" ] }
}}
])
That is a little contrived and I am well aware that the actual result set has more than two cities, but the point is to illustrate that the arrays/sets sent to the "set operators" such as $setDifference need to be in the same document.
The result here compares the "left" array with the "right" array, returning the members from the "left" that are different to the "right". Both sets are unique here with no overlap so the results should be expected:
{
"_id" : null,
"A" : {
"city" : "BETHLEHEM",
"ZipsPerCity" : [
"18018",
"18015"
]
},
"B" : {
"city" : "BEAVER SPRINGS",
"ZipsPerCity" : [
"17843",
"17812"
]
},
"C" : [
"18018",
"18015"
]
}
This is really better illustrated with actual "sets" with common members. So this document:
{ "A" : [ "A", "A", "B", "C", "D" ], "B" : [ "B", "C" ] }
Responds to $setDifference:
{ "C" : [ "A", "D" ] }
And $setEquals:
{ "C" : false }
$setIntersection:
{ "C" : [ "B", "C" ] }
$setUnion:
{ "C" : [ "B", "D", "C", "A" ] }
$setIsSubset reversing the order to $B, $A:
{ "C" : true }
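Those results can be checked with a few one-line helpers in plain JavaScript. The helpers below are named after the operators but are only a sketch of set semantics on arrays of strings; they are not the MongoDB implementations, and the element order they produce is incidental.

```javascript
// Plain-JavaScript sketch of set semantics on the example arrays (illustrative only).
const A = ["A", "A", "B", "C", "D"];
const B = ["B", "C"];

const unique = arr => [...new Set(arr)];
const setDifference = (x, y) => unique(x).filter(v => !y.includes(v));
const setIntersection = (x, y) => unique(x).filter(v => y.includes(v));
const setUnion = (x, y) => unique([...x, ...y]);
const setIsSubset = (x, y) => x.every(v => y.includes(v));
const setEquals = (x, y) => setIsSubset(x, y) && setIsSubset(y, x);

console.log(setDifference(A, B));   // members of A that are not in B
console.log(setEquals(A, B));
console.log(setIntersection(A, B));
console.log(setUnion(A, B));
console.log(setIsSubset(B, A));     // note the reversed argument order
```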
The other set operators $anyElementTrue and $allElementsTrue are likely most useful when used along with the $map operator which can re-shape arrays and evaluate conditions against each element.
A very good usage of $map is alongside $setDifference, where you can "filter" array contents without using $unwind:
db.arrays.aggregate([
{ "$project": {
"A": {
"$setDifference": [
{
"$map": {
"input": "$A",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
That can be very handy when you have a lot of results in the pipeline and you do not want to "expand" out all of those results by "unwinding" the array. But note that this is a "set" and as such only one element matching "A" is returned:
{ "A" : ["A"] }
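The collapse of the duplicate "A" is worth seeing side by side with an ordinary filter. This is a plain-JavaScript sketch with made-up variable names, not MongoDB code.

```javascript
// Plain-JavaScript sketch of the $map + $setDifference "filter" trick (illustrative only).
const values = ["A", "A", "B", "C", "D"];

// $map with $cond: keep the element when it matches, otherwise emit false.
const mapped = values.map(el => (el === "A" ? el : false));

// $setDifference against [false]: treat the input as a set, then remove false.
const filtered = [...new Set(mapped)].filter(v => v !== false);

// For comparison, an order- and duplicate-preserving filter (what $filter would give).
const plainFilter = values.filter(el => el === "A");

console.log(filtered, plainFilter);
```

So the trick is only a safe replacement for a real filter when the array elements are already unique, as noted for the products example.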
So the things to keep in mind here are:
The set operators only work within the "same" document at a time.
The results are generally "sets", which means they are both "unique" and "un-ordered".
Overall that should be a decent run-down on what the set operators are and how you use them.