MongoDB counting array combinations

I have documents like these:
[
{
....
tags : ["A","B"]
},
{
....
tags : ["A","B"]
},
{
....
tags : ["J","K"]
},
{
....
tags : ["A","B","C"]
}
]
With the Aggregation Framework I'd like to group by array combinations to get something like this:
[
{
_id:["A","B"],
count : 2
},
{
_id:["J","K"],
count : 1
},
{
_id:["A","B","C"],
count : 1
},
]
Is it possible to do that?
Thank you

Not sure why you didn't even think this would work:
db.collection.aggregate([
{ "$group": {
"_id": "$tags",
"count": { "$sum": 1 }
}}
])
Returns:
{ "_id" : [ "A", "B", "C" ], "count" : 1 }
{ "_id" : [ "J", "K" ], "count" : 1 }
{ "_id" : [ "A", "B" ], "count" : 2 }
MongoDB "does not care" what you throw into the value of a "field" or "property". This applies to the "grouping key" of _id in the $group operator as well. Everything is a "document" and therefore a BSON value and is therefore valid.
Anything works. So long as it's what you want.
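To make the grouping behaviour concrete, here is a minimal in-memory sketch in plain JavaScript of what the $group stage above computes. It is only an illustration, not how MongoDB implements it; countByTags and tagDocs are hypothetical names. One thing it makes visible: the whole array is the key, so element order matters and ["A","B"] is a different group from ["B","A"].

```javascript
// In-memory sketch of { $group: { _id: "$tags", count: { $sum: 1 } } }.
// countByTags is a hypothetical helper, not a MongoDB API.
function countByTags(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Serialising keeps element order, so ["A","B"] and ["B","A"] stay distinct.
    const key = JSON.stringify(doc.tags);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Turn the map back into documents shaped like the $group output.
  return [...counts].map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

// Sample data taken from the question.
const tagDocs = [
  { tags: ["A", "B"] },
  { tags: ["A", "B"] },
  { tags: ["J", "K"] },
  { tags: ["A", "B", "C"] },
];

console.log(countByTags(tagDocs));
```

If the same tags in a different order should land in one group, the arrays would need to be normalised (for example sorted) before grouping.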

Related

How to find the records with same key value assigned to multiple values in MongoDB

I have data like the following,
Student | Subject
A | Language
A | Math
B | Science
A | Arts
C | Biology
B | History
and so on...
I want to fetch the students who have the same name but are enrolled in two different subjects, Language & Math only.
I tried to use the query:
$group:{
_id:"$student",
sub:"{$addToSet:"$subject"}
},
$match:{
sub:{$in:["Language","Math"]}
}
But I am getting no documents to preview in MongoDB Compass. I am working in a VM machine, Compass is able to group only biology, history, science, arts only but not able to group language and math. I wanted to get A as my output.
Thanks in loads.
The collection data and the expected output:
{ Student:"A", Subject:"Language" },
{ Student:"A", Subject:"Math" },
{ Student:"B", Subject:"Science" },
{ Student:"A", Subject:"Arts" },
{ Student:"C", Subject:"Biology" },
{ Student:"B", Subject:"History" }
I am looking to get A as my output.

You are almost there; your aggregation pipeline just needs a small tweak:
const pipeline = [
{
$group:
{
_id: '$Student', // Group students by name
subjects: {
$addToSet: '$Subject', // Push all the subjects they take uniquely into an array
},
},
},
{
// Filter for students who take only Language and Math
$match: { subjects: { $all: ['Language', 'Math'], $size: 2 } },
},
];
db.students.aggregate(pipeline);
That should give an output array like this:
[
{ "_id" : studentName1 , "subjects" : [ "Language", "Math" ] },
{ "_id" : studentName2 , "subjects" : [ "Language", "Math" ] },
....
]

You can use an aggregation operator, $setIsSubset. The $in (aggregation) operator checks an array for one value only. I think you are thinking of the $in query operator.
The Query:
db.student_subjects.aggregate( [
{ $group: {
_id: "$student",
studentSubjects: { $addToSet: "$subject" }
}
},
{ $project: {
subjectMatches: { $setIsSubset: [ [ "Language", "Math" ], "$studentSubjects" ] }
}
},
{ $match: {
subjectMatches: true
}
},
{ $project: {
matched_student: "$_id", _id: 0
}
}
] )
The Result:
{ "matched_student" : "A" }
NOTES:
If you replace [ "Language", "Math" ] with [ "History" ], you will get the result: { "matched_student" : "B" }.
You can also try and see other set operators (aggregation), like the $allElementsTrue. Use the best one that suits your application.
[ EDIT ADD ]
Sample data for student_subjects collection:
{ "_id" : 1, "student" : "A", "subject" : "Language" }
{ "_id" : 2, "student" : "A", "subject" : "Math" }
{ "_id" : 3, "student" : "B", "subject" : "Science" }
{ "_id" : 4, "student" : "A", "subject" : "Arts" }
{ "_id" : 5, "student" : "C", "subject" : "Biology" }
{ "_id" : 6, "student" : "B", "subject" : "History" }
The Result After Each Stage:
1st Stage: $group
{ "_id" : "C", "studentSubjects" : [ "Biology" ] }
{ "_id" : "B", "studentSubjects" : [ "History", "Science" ] }
{ "_id" : "A", "studentSubjects" : [ "Arts", "Math", "Language" ] }
2nd Stage: $project
{ "_id" : "C", "subjectMatches" : false }
{ "_id" : "B", "subjectMatches" : false }
{ "_id" : "A", "subjectMatches" : true }
3rd Stage: $match
{ "_id" : "A", "subjectMatches" : true }
4th Stage: $project
{ "matched_student" : "A" }
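The two answers above are not equivalent: $all with $size: 2 keeps students whose subject set is exactly Language and Math, while $setIsSubset keeps students who take at least those two. A small in-memory sketch in plain JavaScript shows the difference on the sample data. The helper names are hypothetical and this is not MongoDB's implementation.

```javascript
// Hypothetical in-memory helpers; the names are illustrative, not MongoDB APIs.
function subjectsByStudent(rows) {
  const grouped = new Map();
  for (const { student, subject } of rows) {
    if (!grouped.has(student)) grouped.set(student, new Set());
    grouped.get(student).add(subject); // a Set mirrors $addToSet
  }
  return grouped;
}

// Mirrors $setIsSubset: every wanted subject is present, extras are allowed.
function studentsWithAtLeast(rows, wanted) {
  return [...subjectsByStudent(rows)]
    .filter(([, subjects]) => wanted.every((s) => subjects.has(s)))
    .map(([student]) => student);
}

// Mirrors $all + $size: every wanted subject is present and nothing else.
// Assumes `wanted` contains no duplicates.
function studentsWithExactly(rows, wanted) {
  return [...subjectsByStudent(rows)]
    .filter(
      ([, subjects]) =>
        subjects.size === wanted.length && wanted.every((s) => subjects.has(s))
    )
    .map(([student]) => student);
}

// Sample data from the answer above.
const enrolments = [
  { student: "A", subject: "Language" },
  { student: "A", subject: "Math" },
  { student: "B", subject: "Science" },
  { student: "A", subject: "Arts" },
  { student: "C", subject: "Biology" },
  { student: "B", subject: "History" },
];

console.log(studentsWithAtLeast(enrolments, ["Language", "Math"]));
console.log(studentsWithExactly(enrolments, ["Language", "Math"]));
```

On this data student A also takes Arts, so only the "at least" variant returns A. Which one is right depends on whether "Language & Math only" is meant strictly.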

MongoDB Except equivalent

I have a question about a problem I came across while trying to use $setDifference on a collection of documents.
All I want to have are all documents that are contained in Root 1 and remove all documents that are also included in Root 2 based on the "reference.id".
My collection represents two tree structures and basically looks like this:
/* Tree Root 1 */
{
"_id" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"name" : "Root 1",
"children" : [
LUUID("ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac"),
LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213")
]
},
/* Child 1 - Root 1 */
{
"_id" : LUUID("ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac"),
"parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"reference" : {
"type" : "someType",
"id" : LUUID("331503FB-C4D1-4F7A-A461-933C701EF9AB")
},
"rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"name" : "Child 1 (Root 1)"
}
/* Child 2 - Root 1 */
{
"_id" : LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213"),
"parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"reference" : {
"type" : "someType",
"id" : LUUID("23E8B540-3EFB-455A-AA5C-2B67D6B59943")
},
"rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"displayName" : "Child 2 (Root 1)"
}
/* Tree Root 2 */
{
"_id" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"name" : "Root 2",
"children" : [
LUUID("ad4ad076-322e-4c26-8855-91c9b1912d1f"),
LUUID("66452420-dd2f-4d27-91c9-78bd0990817c")
]
},
/* Child 1 - Root 2 */
{
"_id" : LUUID("ad4ad076-322e-4c26-8855-91c9b1912d1f"),
"parentId" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"reference" : {
"type" : "someType",
"id" : LUUID("331503FB-C4D1-4F7A-A461-933C701EF9AB")
},
"rootReferenceId" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"displayName" : "Child 1 (Root 2)"
}
That means in the end I want to have the document:
/* Child 2 - Root 1 */
{
"_id" : LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213"),
"parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"reference" : {
"type" : "someType",
"id" : LUUID("23E8B540-3EFB-455A-AA5C-2B67D6B59943")
},
"rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"displayName" : "Child 2 (Root 1)"
}
Because its reference.id is contained in Root 1 but not in Root 2 (so it will not be excluded from the result set like Child 1)
I already wrote an aggregation stage to group the "reference.id"s like this:
db.getCollection('test').aggregate([
{
$match: {
rootReferenceId: { $ne: null }
}
},
{
$group: {
_id: "$rootReferenceId",
referenceIds: { $addToSet: "$reference.id" }
}
}
])
Which returns this:
/* 1 */
{
"_id" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"referenceIds" : [
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
}
/* 2 */
{
"_id" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"referenceIds" : [
LUUID("23e8b540-3efb-455a-aa5c-2b67d6b59943"),
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
}
Does anyone have an idea how I can $project this into a format that $setDifference accepts?
I think it needs to look like this:
{
LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9") : [
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
LUUID("9f3a73df-bca7-48b7-b111-285359e50a02") : [
LUUID("23e8b540-3efb-455a-aa5c-2b67d6b59943"),
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
}
Or is there a completely different way to achieve this that I am not aware of?
Any help is appreciated!
Edit Solution:
The solution is now like dnickless suggested. Really a nice one! Thanks a lot for this!

Here is what you could do without storing duplicate values in a string format. What's nice about this solution is that
a) it returns the entire document that you are interested in so you don't need a second query (if you do not need the entire document then the $filter operator can simply be replaced with the $setDifference bit)
b) it consists of very few and cheap stages (no grouping!) and will leverage indices on the rootReferenceId field (if there are any which I would recommend).
db.getCollection('test').aggregate([
{ "$facet": {
"allInRoot1": [{
"$match": { "rootReferenceId": LUUID("9f3a73df-bca7-48b7-b111-285359e50a02") }
}],
"allInRoot2": [{
"$match": { "rootReferenceId": LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9") }
}]
}}, {
"$project": {
"difference": {
"$filter": {
"input": "$allInRoot1",
"as": "this",
"cond": { "$in": [ "$$this.reference.id", { "$setDifference": [ "$allInRoot1.reference.id", "$allInRoot2.reference.id" ] } ] }
}
}
}
}
])
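For readers who want to see the "except" logic on its own, here is a rough in-memory sketch in plain JavaScript of what the $facet / $filter / $setDifference combination works out. It is only an illustration under simplified assumptions: the ids are shortened placeholder strings instead of LUUID values, and exceptByReferenceId is a hypothetical helper.

```javascript
// Hypothetical helper: keep documents of one tree whose reference.id
// does not occur in the other tree.
function exceptByReferenceId(docs, keepRootId, excludeRootId) {
  // Collect every reference.id that belongs to the tree to exclude.
  const excluded = new Set(
    docs
      .filter((d) => d.rootReferenceId === excludeRootId)
      .map((d) => d.reference.id)
  );
  // Root documents have no rootReferenceId, so the first check skips them.
  return docs.filter(
    (d) => d.rootReferenceId === keepRootId && !excluded.has(d.reference.id)
  );
}

// Placeholder ids standing in for the LUUID values in the question.
const nodes = [
  { _id: "root1", name: "Root 1" },
  { _id: "child1-root1", rootReferenceId: "root1", reference: { id: "ref-331503fb" } },
  { _id: "child2-root1", rootReferenceId: "root1", reference: { id: "ref-23e8b540" } },
  { _id: "root2", name: "Root 2" },
  { _id: "child1-root2", rootReferenceId: "root2", reference: { id: "ref-331503fb" } },
];

console.log(exceptByReferenceId(nodes, "root1", "root2"));
```

As in the pipeline, the full matching documents come back, here only Child 2 of Root 1.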

You can try the aggregation below in MongoDB 4.0 and above (the $toString operator it uses was added in 4.0).
db.getCollection('test').aggregate([
{ "$match": { "rootReferenceId": { "$ne": null }}},
{ "$group": {
"_id": "$rootReferenceId",
"referenceIds": { "$addToSet": "$reference.id" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": { "k": { "$toString": "$_id" }, "v": "$referenceIds" }
}
}},
{ "$replaceRoot": { "newRoot": { "$arrayToObject": "$data" }}}
])
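The reshaping done by the last two stages (collecting k/v pairs, then $arrayToObject) can be pictured with this small plain JavaScript sketch. It assumes the group keys are already strings; toKeyedObject and the placeholder ids are hypothetical and only stand in for the real LUUID values.

```javascript
// Hypothetical helper mirroring $push of { k, v } pairs followed by $arrayToObject.
function toKeyedObject(groups) {
  return Object.fromEntries(
    groups.map((g) => [String(g._id), g.referenceIds]) // k: stringified _id, v: the id list
  );
}

// Output of the first $group stage, with placeholder string ids.
const groupedRefs = [
  { _id: "root2", referenceIds: ["ref-331503fb"] },
  { _id: "root1", referenceIds: ["ref-23e8b540", "ref-331503fb"] },
];

console.log(toKeyedObject(groupedRefs));
```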

How to group documents on index of array elements?

I'm looking for a way to take data such as this
{ "_id" : 5, "count" : 1, "arr" : [ "aga", "dd", "a" ] },
{ "_id" : 6, "count" : 4, "arr" : [ "aga", "ysdf" ] },
{ "_id" : 7, "count" : 4, "arr" : [ "sad", "aga" ] }
I would like to sum the count based on the 1st item(index) of arr. In another aggregation I would like to do the same with the 1st and the 2nd item in the arr array.
I've tried using unwind, but that breaks up the data and the hierarchy is then lost.
I've also tried using
$group: {
_id: {
arr_0:'$arr.0'
},
total:{
$sum: '$count'
}
}
but the result is blank arrays

Actually, you can't use dot notation to group your documents by the element at a specified index. To do that you have two options:
First, the optimal way: use the $arrayElemAt operator, new in MongoDB 3.2, which returns the element at a specified index in the array.
db.collection.aggregate([
{ "$group": {
"_id": { "$arrayElemAt": [ "$arr", 0 ] },
"count": { "$sum": 1 }
}}
])
In MongoDB 3.0 and earlier you need to denormalise the array with $unwind, then $group by _id and use the $first operator to pick the first array item of each document. From there, regroup the documents on that value and use $sum to get the total. This only works for the first and last index, because MongoDB also provides the $last operator.
db.collection.aggregate([
{ "$unwind": "$arr" },
{ "$group": {
"_id": "$_id",
"arr": { "$first": "$arr" }
}},
{ "$group": {
"_id": "$arr",
"count": { "$sum": 1 }
}}
])
which yields something like this:
{ "_id" : "sad", "count" : 1 }
{ "_id" : "aga", "count" : 2 }
To group by the element at position p in your array, you have a better chance using the mapReduce function.
var mapFunction = function(){ emit(this.arr[0], 1); };
var reduceFunction = function(key, value) { return Array.sum(value); };
db.collection.mapReduce(mapFunction, reduceFunction, { "out": { "inline": 1 } } )
Which returns:
{
"results" : [
{
"_id" : "aga",
"value" : 2
},
{
"_id" : "sad",
"value" : 1
}
],
"timeMillis" : 27,
"counts" : {
"input" : 3,
"emit" : 3,
"reduce" : 1,
"output" : 2
},
"ok" : 1
}
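Note that the examples above count documents ($sum: 1), whereas the question asks for the total of the count field, first per leading element and then per leading pair. Here is a minimal in-memory sketch of that in plain JavaScript; sumCountByPrefix is a hypothetical helper, not a MongoDB API. In a pipeline the equivalent would presumably be "$sum": "$count", with the $slice operator (also 3.2+) supplying a two-element key.

```javascript
// Hypothetical helper: total the `count` field per leading n elements of `arr`.
function sumCountByPrefix(docs, n) {
  const totals = new Map();
  for (const doc of docs) {
    // The first n elements form the grouping key; order is preserved.
    const key = JSON.stringify(doc.arr.slice(0, n));
    totals.set(key, (totals.get(key) || 0) + doc.count);
  }
  return [...totals].map(([key, total]) => ({ _id: JSON.parse(key), total }));
}

// Sample data from the question.
const arrDocs = [
  { _id: 5, count: 1, arr: ["aga", "dd", "a"] },
  { _id: 6, count: 4, arr: ["aga", "ysdf"] },
  { _id: 7, count: 4, arr: ["sad", "aga"] },
];

console.log(sumCountByPrefix(arrDocs, 1)); // grouped on the first item
console.log(sumCountByPrefix(arrDocs, 2)); // grouped on the first two items
```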

List of userids without duplicates in mongodb [duplicate]

I'm trying to learn MongoDB and how it'd be useful for analytics for me. I'm simply playing around with the JavaScript console available on their website and have created the following items:
{"title": "Cool", "_id": {"$oid": "503e4dc0cc93742e0d0ccad3"}, "tags": ["twenty", "sixty"]}
{"title": "Other", "_id": {"$oid": "503e4e5bcc93742e0d0ccad4"}, "tags": ["ten", "thirty"]}
{"title": "Ouch", "_id": {"$oid": "503e4e72cc93742e0d0ccad5"}, "tags": ["twenty", "seventy"]}
{"title": "Final", "_id": {"$oid": "503e4e72cc93742e0d0ccad6"}, "tags": ["sixty", "seventy"]}
What I'd like to do is query so I get a list of unique tags for all of these objects. The result should look something like this:
["ten", "twenty", "thirty", "sixty", "seventy"]
How do I query for this? I'm trying to distinct() it, but the call always fails without even querying.

The code that fails on their website works on an actual MongoDB instance:
> db.posts.insert({title: "Hello", tags: ["one", "five"]});
> db.posts.insert({title: "World", tags: ["one", "three"]});
> db.posts.distinct("tags");
[ "one", "three", "five"]
Weird.

You can use the aggregation framework. Depending on how you'd like the results structured, you can use either
var pipeline = [
{"$unwind": "$tags" } ,
{ "$group": { _id: "$tags" } }
];
R = db.tb.aggregate( pipeline );
printjson(R);
{
"result" : [
{
"_id" : "seventy"
},
{
"_id" : "ten"
},
{
"_id" : "sixty"
},
{
"_id" : "thirty"
},
{
"_id" : "twenty"
}
],
"ok" : 1
}
or
var pipeline = [
{"$unwind": "$tags" } ,
{ "$group":
{ _id: null, tags: {"$addToSet": "$tags" } }
}
];
R = db.tb.aggregate( pipeline );
printjson(R);
{
"result" : [
{
"_id" : null,
"tags" : [
"seventy",
"ten",
"sixty",
"thirty",
"twenty"
]
}
],
"ok" : 1
}

You should be able to use this:
db.mycollection.distinct("tags").sort()

Another way of getting unique array elements using the aggregation pipeline:
db.blogs.aggregate(
[
{$group:{_id : null, uniqueTags : {$push : "$tags"}}},
{$project:{
_id : 0,
uniqueTags : {
$reduce : {
input : "$uniqueTags",
initialValue :[],
in : {$let : {
vars : {elem : { $concatArrays : ["$$this", "$$value"] }},
in : {$setUnion : "$$elem"}
}}
}
}
}}
]
)
collection
> db.blogs.find()
{ "_id" : ObjectId("5a6d53faca11d88f428a2999"), "name" : "sdfdef", "tags" : [ "abc", "def", "efg", "abc" ] }
{ "_id" : ObjectId("5a6d5434ca11d88f428a299a"), "name" : "abcdef", "tags" : [ "abc", "ijk", "lmo", "zyx" ] }
>
pipeline
> db.blogs.aggregate(
... [
... {$group:{_id : null, uniqueTags : {$push : "$tags"}}},
... {$project:{
... _id : 0,
... uniqueTags : {
... $reduce : {
... input : "$uniqueTags",
... initialValue :[],
... in : {$let : {
... vars : {elem : { $concatArrays : ["$$this", "$$value"] }},
... in : {$setUnion : "$$elem"}
... }}
... }
... }
... }}
... ]
... )
result
{ "uniqueTags" : [ "abc", "def", "efg", "ijk", "lmo", "zyx" ] }
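The $reduce / $concatArrays / $setUnion combination above boils down to "concatenate every tags array, then drop duplicates". A plain JavaScript sketch of the same idea, with uniqueTags as a hypothetical helper and not MongoDB's implementation:

```javascript
// Hypothetical helper: merge all tags arrays and keep each tag once.
function uniqueTags(docs) {
  // Concatenate every document's tags into one array (the $concatArrays step).
  const merged = docs.reduce((acc, doc) => acc.concat(doc.tags), []);
  // A Set removes duplicates (the $setUnion step) and keeps first-seen order.
  return [...new Set(merged)];
}

// Sample data from the answer above.
const blogs = [
  { name: "sdfdef", tags: ["abc", "def", "efg", "abc"] },
  { name: "abcdef", tags: ["abc", "ijk", "lmo", "zyx"] },
];

console.log(uniqueTags(blogs));
```

Unlike this sketch, $setUnion does not guarantee any particular order of the resulting elements.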

There are a couple of web mongo consoles available:
http://try.mongodb.org/
http://www.mongodb.org/#
But if you type help in them you will realise they only support a very small number of ops:
HELP
Note: Only a subset of MongoDB's features are provided here.
For everything else, download and install at mongodb.org.
db.foo.help() help on collection method
db.foo.find() list objects in collection foo
db.foo.save({a: 1}) save a document to collection foo
db.foo.update({a: 1}, {a: 2}) update document where a == 1
db.foo.find({a: 1}) list objects in foo where a == 1
it use to further iterate over a cursor
As such distinct does not work because it is not supported.

Aggregation framework flatten subdocument data with parent document

I am building a dashboard that rotates between different webpages. I want to pull all slides that are part of the "Test" deck and order them appropriately. After the query my result would ideally look like this:
[
{ "url" : "http://10.0.1.187", "position": 1, "duration": 10 },
{ "url" : "http://10.0.1.189", "position": 2, "duration": 3 }
]
I currently have a dataset that looks like the following
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
},
{
"title" : "Other Deck",
"position" : 1,
"duration" : 10
}
],
"url" : "http://10.0.1.187"
}
My attempted query looks like:
db.slides.aggregate([
{
"$match": {
"decks.title": "Test"
}
},
{
"$sort": {
"decks.position": 1
}
},
{
"$project": {
"_id": 0,
"position": "$decks.position",
"duration": "$decks.duration",
"url": 1
}
}
]);
But it does not yield my desired results. How can I query my dataset and get my expected results in an optimal way?

Well, to truly "flatten" the document as your title suggests, $unwind is always going to be employed, as there really is no other way to do that. There are however some different approaches if you can live with the array being filtered down to the matching element.
Basically speaking, if you really only have one thing to match in the array then your fastest approach is to simply use .find() matching the required element and projecting:
db.slides.find(
{ "decks.title": "Test" },
{ "decks.$": 1 }
).sort({ "decks.position": 1 }).pretty()
That is still an array but as long as you have only one element that matches then this does work. Also the items are sorted as expected, though of course the "title" field is not dropped from the matched documents, as that is beyond the possibilities for simple projection.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
}
]
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
Another approach, as long as you have MongoDB 2.6 or greater available, is using the $map operator and some others in order to both "filter" and re-shape the array "in-place" without actually applying $unwind:
db.slides.aggregate([
{ "$project": {
"url": 1,
"decks": {
"$setDifference": [
{
"$map": {
"input": "$decks",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.title", "Test" ] },
{
"position": "$$el.position",
"duration": "$$el.duration"
},
false
]
}
}
},
[false]
]
}
}},
{ "$sort": { "decks.position": 1 }}
])
The advantage there is that you can make the changes without "unwinding", which can reduce processing time with large arrays as you are not essentially creating new documents for every array member and then running a separate $match stage to "filter" or another $project to reshape.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"position" : 1,
"duration" : 2
}
],
"url" : "http://10.0.1.187"
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"position" : 2,
"duration" : 3
}
]
}
You can again either live with the "filtered" array or if you want you can again "flatten" this truly by adding in an additional $unwind where you do not need to filter with $match as the result already contains only the matched items.
But generally speaking if you can live with it then just use .find() as it will be the fastest way. Otherwise what you are doing is fine for small data, or there is the other option for consideration.

Well as soon as I posted I realized I should be using an $unwind. Is this query the optimal way to do it, or can it be done differently?
db.slides.aggregate([
{
"$unwind": "$decks"
},
{
"$match": {
"decks.title": "Test"
}
},
{
"$sort": {
"decks.position": 1
}
},
{
"$project": {
"_id": 0,
"position": "$decks.position",
"duration": "$decks.duration",
"url": 1
}
}
]);
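To see what each stage of the $unwind pipeline above contributes, here is a rough stage-by-stage trace in plain JavaScript over the sample documents. It is only an in-memory sketch with a hypothetical helper name, slidesForDeck, not a replacement for the query.

```javascript
// Hypothetical helper tracing $unwind -> $match -> $sort -> $project in memory.
function slidesForDeck(slides, title) {
  return slides
    // $unwind: one entry per (slide, deck) pair
    .flatMap((slide) => slide.decks.map((deck) => ({ url: slide.url, deck })))
    // $match: keep only the requested deck
    .filter(({ deck }) => deck.title === title)
    // $sort: order by position within that deck
    .sort((a, b) => a.deck.position - b.deck.position)
    // $project: lift the deck fields to the top level
    .map(({ url, deck }) => ({
      url,
      position: deck.position,
      duration: deck.duration,
    }));
}

// Sample data from the question (without the ObjectId values).
const slides = [
  {
    url: "http://10.0.1.189",
    decks: [{ title: "Test", position: 2, duration: 3 }],
  },
  {
    url: "http://10.0.1.187",
    decks: [
      { title: "Test", position: 1, duration: 2 },
      { title: "Other Deck", position: 1, duration: 10 },
    ],
  },
];

console.log(slidesForDeck(slides, "Test"));
```

Because the match happens after unwinding, the sort only ever sees positions from the "Test" deck, which is what the first attempt without $unwind could not guarantee.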