MongoDB - count of items using $addToSet

I grouped by organization and used $addToSet to show the distinct machineIds associated with each organization. I would like to get the count of machineIds for each organization. However, the code below returns the count of all machineIds (the number of documents in the group), not the count of distinct ones. Is there another way to get the total number of unique machineIds?
db.getCollection('newcollections').aggregate([{
$group: {
_id: {
organization: "$user.organization"
},
machineId: {
"$addToSet": "$user.machineId"
},
count: {
$sum: 1
}
}
}])

You need to use the $size operator in a $project stage, like the following:
db.collection.aggregate([{
$group: {
_id: {
organization: "$user.organization"
},
machineId: {
"$addToSet": "$user.machineId"
}
}
}, {
$project: {
"organization": "$_id.organization",
"machineId": 1,
"_id": 0,
"size": {
$size: "$machineId"
}
}
}])
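To see why $size gives the distinct count while $sum: 1 does not, here is a minimal plain JavaScript emulation of what the $group/$addToSet and $project/$size stages compute. It needs no database; the organization and machineId values are made up, and a JavaScript Set stands in for $addToSet. Keep in mind that MongoDB does not guarantee the order of elements in an $addToSet array.

```javascript
// In-memory emulation of the $group/$addToSet + $project/$size pipeline.
// The sample documents below are invented for illustration.
function distinctMachinesPerOrg(docs) {
  // $group by organization; a Set plays the role of $addToSet
  const groups = new Map();
  for (const doc of docs) {
    const org = doc.user.organization;
    if (!groups.has(org)) groups.set(org, new Set());
    groups.get(org).add(doc.user.machineId);
  }
  // $project: organization, the machineId array, and size ($size of the set)
  return [...groups].map(([organization, ids]) => ({
    organization,
    machineId: [...ids],
    size: ids.size,
  }));
}

const sample = [
  { user: { organization: "acme", machineId: "m1" } },
  { user: { organization: "acme", machineId: "m1" } },
  { user: { organization: "acme", machineId: "m2" } },
  { user: { organization: "globex", machineId: "m3" } },
];

// acme has 3 documents ($sum: 1 would report 3) but only 2 distinct machineIds
console.log(distinctMachinesPerOrg(sample));
```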

Related

mongodb - check a collection for mutual likes

I had a query that worked when the likes collection stored users in from and to fields. The query below checked both fields and whether like was true.
Since then I have changed the way users are stored in the likes collection.
.collection('likes')
.aggregate([ {$match: {$and : [{$or: [{from: userId}, {to: userId}]}, {like: true}]}},
{$group: {
_id: 0,
from: {$addToSet: "$from"},
to: {$addToSet: "$to"},
}
},
{$project: {
_id: 0,
users: {
$filter: {
input: {$setIntersection: ["$from", "$to"]},
cond: {$ne: ["$$this", userId]}
}
}
}
}
])
This is how the likes collection used to store data (when the query above worked for finding the mutual likers of the userId passed in req.body):
{
"_id": "xw5vk1s_PpJaal46di",
"from": "xw5vk1s",
"to": "PpJaal46di",
"like": true
}
and now I changed users to an array.
{
"_id": "xw5vk1s_PpJaal46di",
"users": [
"xw5vk1s", // first element, equivalent to the old from
"PpJaal46di" // second element, equivalent to the old to
],
"like": true
}
I am not sure how to modify the query to check the elements of the users array, now that the two users who like each other are no longer stored in from and to.
For MongoDB v5.2+, you can use $sortArray to build a common key from the users field (so ["a", "b"] and ["b", "a"] produce the same key) and $group on it to get a count. Groups with count > 1 are mutual likes.
db.collection.aggregate([
{
$match: {
users: "xw5vk1s",
like: true
}
},
{
$group: {
_id: {
$sortArray: {
input: "$users",
sortBy: 1
}
},
count: {
$sum: 1
}
}
},
{
"$match": {
count: {
$gt: 1
}
}
}
])
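The grouping idea can be checked without a database. Below is a minimal plain JavaScript sketch of the $match, $group on a sorted users key, and $match on count > 1 steps. Like the original from/to query, it only counts documents with like: true. The "abc123" user is hypothetical, and the sketch assumes each direction of a like is stored at most once (which the from_to _id scheme appears to guarantee); otherwise a duplicate one-way like would also reach count 2.

```javascript
// Plain JavaScript emulation of: $match -> $group on a sorted users key ->
// $match { count: { $gt: 1 } }. The like documents are hypothetical.
function mutualLikers(likes, userId) {
  const counts = new Map(); // sorted pair (serialized) -> number of documents
  for (const doc of likes) {
    // $match: only positive likes that involve userId
    if (doc.like !== true || !doc.users.includes(userId)) continue;
    // $sortArray: ["a", "b"] and ["b", "a"] yield the same grouping key
    const key = JSON.stringify([...doc.users].sort());
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // keep pairs seen in both directions, then return the other user of each pair
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([key]) => JSON.parse(key).find((u) => u !== userId));
}

const likes = [
  { _id: "xw5vk1s_PpJaal46di", users: ["xw5vk1s", "PpJaal46di"], like: true },
  { _id: "PpJaal46di_xw5vk1s", users: ["PpJaal46di", "xw5vk1s"], like: true },
  { _id: "xw5vk1s_abc123", users: ["xw5vk1s", "abc123"], like: true }, // one-way like
];

console.log(mutualLikers(likes, "xw5vk1s")); // logs [ 'PpJaal46di' ]
```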
Prior to MongoDB v5.2, you can build the sorted key with $unwind and $sort, and then $push the users back into an array in $group.
db.collection.aggregate([
{
$match: {
users: "xw5vk1s",
like: true
}
},
{
"$unwind": "$users"
},
{
$sort: {
users: 1
}
},
{
$group: {
_id: "$_id",
users: {
$push: "$users"
},
like: {
$first: "$like"
}
}
},
{
"$group": {
"_id": "$users",
"count": {
"$sum": 1
}
}
},
{
"$match": {
count: {
$gt: 1
}
}
}
])
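The pre-5.2 version relies on $push appending values in the order documents reach $group, so sorting the unwound rows first rebuilds every users array in sorted order. A minimal plain JavaScript sketch of just those three stages, using the two documents from the question (the reverse-direction document is assumed):

```javascript
// Emulation of the pre-5.2 trick: $unwind, $sort by user, then $group by _id
// with $push, which appends values in the order documents arrive.
function sortUsersPerDoc(likes) {
  // $unwind: one row per (document, user)
  const rows = likes.flatMap((doc) =>
    doc.users.map((user) => ({ _id: doc._id, users: user, like: doc.like }))
  );
  // $sort: { users: 1 }
  rows.sort((a, b) => (a.users < b.users ? -1 : a.users > b.users ? 1 : 0));
  // $group: { _id: "$_id", users: { $push: "$users" }, like: { $first: "$like" } }
  const byId = new Map();
  for (const row of rows) {
    if (!byId.has(row._id)) byId.set(row._id, { _id: row._id, users: [], like: row.like });
    byId.get(row._id).users.push(row.users);
  }
  return [...byId.values()];
}

const docs = [
  { _id: "xw5vk1s_PpJaal46di", users: ["xw5vk1s", "PpJaal46di"], like: true },
  { _id: "PpJaal46di_xw5vk1s", users: ["PpJaal46di", "xw5vk1s"], like: true },
];

// both documents now carry users: ["PpJaal46di", "xw5vk1s"], so the second
// $group (on "$users") puts them into one group with count 2
console.log(sortUsersPerDoc(docs));
```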

Count number of rows and get only the last row in MongoDB

I have a collection of posts as follows:
{
"author": "Rothfuss",
"text": "Name of the Wind",
"likes": 1007,
"date": ISODate("2013-03-20T11:30:05Z")
},
{
"author": "Rothfuss",
"text": "Doors of Stone",
"likes": 1,
"date": ISODate("2051-03-20T11:30:05Z")
}
I want to get the count of each author's posts and his/her last post.
There is an SQL answer to the same question; I am trying to find its MongoDB equivalent.
I have ended up with this query so far:
db.collection.aggregate([
{
"$group": {
"_id": "$author",
"count": {
"$sum": 1
},
"lastPost": {
"$max": {
"_id": "$date",
"post": "$text"
}
}
}
}
])
It seems to work, but different runs generate different results. It can be tested in Mongo Playground.
I don't understand how to use $max to select another property from the document containing the maximum. I am new to MongoDB, so describing the basics is also warmly appreciated.
extra question
Is it possible to limit $sum to only add posts with likes more than 100?
its different runs generate different results.
I don't understand how to use $max to select another property from the document containing the maximum.
$max does not select other fields from the document that contains the maximum. When it is given an embedded document such as { _id: "$date", post: "$text" }, it compares whole documents field by field in field order, so the result depends on that order being preserved; if the fields get reordered, a different post can be selected, which is why different runs can give different results.
To get an accurate result, add a $sort stage before the $group stage that sorts by date in descending order, and then select the values in the $group stage with the $first operator:
{ $sort: { date: -1 } },
{
$group: {
_id: "$author",
count: { $sum: 1 },
date: { $first: "$date" },
post: { $first: "$text" }
}
}
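The $sort + $first pattern can be sketched in plain JavaScript to show why it is deterministic: after sorting newest first, the first post seen for each author is the latest one. This is only an in-memory emulation under the assumption that date holds real Date values, using the two posts from the question.

```javascript
// Emulation of { $sort: { date: -1 } } followed by $group with $sum and $first.
function lastPostPerAuthor(posts) {
  // $sort: newest first
  const sorted = [...posts].sort((a, b) => b.date - a.date);
  const groups = new Map();
  for (const p of sorted) {
    const g = groups.get(p.author);
    if (g) {
      g.count += 1; // count: { $sum: 1 }
    } else {
      // the first post seen for an author is the newest one ($first)
      groups.set(p.author, { _id: p.author, count: 1, date: p.date, post: p.text });
    }
  }
  return [...groups.values()];
}

const posts = [
  { author: "Rothfuss", text: "Name of the Wind", likes: 1007, date: new Date("2013-03-20T11:30:05Z") },
  { author: "Rothfuss", text: "Doors of Stone", likes: 1, date: new Date("2051-03-20T11:30:05Z") },
];

// one group for Rothfuss: count 2, post "Doors of Stone" (the 2051 date)
console.log(lastPostPerAuthor(posts));
```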
Is it possible to limit $sum to only add posts with likes more than 100?
Your requirement can be read in two ways. I am not sure which one you mean, so here are solutions for both.
If you only want to exclude posts with 100 likes or fewer from count, but still want the last post's date and text even when that post has 100 likes or fewer:
$cond checks whether likes is greater than 100; if so it adds 1 to the count, otherwise 0.
db.collection.aggregate([
{ $sort: { date: -1 } },
{
$group: {
_id: "$author",
count: {
$sum: {
$cond: [{ $gt: ["$likes", 100] }, 1, 0]
}
},
date: { $first: "$date" },
post: { $first: "$text" }
}
}
])
If you want to exclude such posts both from count and from the last post:
add a $match stage at the beginning that keeps only posts with likes greater than 100. The final query would be:
db.collection.aggregate([
{ $match: { likes: { $gt: 100 } } },
{ $sort: { date: -1 } },
{
$group: {
_id: "$author",
count: { $sum: 1 },
date: { $first: "$date" },
post: { $first: "$text" }
}
}
])
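The difference between the two readings is easiest to see on the question's two posts. Here is a minimal plain JavaScript emulation (not MongoDB code) with a hypothetical mode parameter: "cond" mirrors the $cond inside $sum, and "match" mirrors the leading $match stage.

```javascript
// Emulation of the two variants: "cond" counts only posts with likes > 100
// via $cond inside $sum; "match" drops those posts first with $match.
function summarize(posts, mode) {
  // $match: { likes: { $gt: 100 } } (only in "match" mode)
  const input = mode === "match" ? posts.filter((p) => p.likes > 100) : posts;
  const sorted = [...input].sort((a, b) => b.date - a.date); // $sort: { date: -1 }
  const groups = new Map();
  for (const p of sorted) {
    // $cond: [{ $gt: ["$likes", 100] }, 1, 0]; after $match it is always 1,
    // which matches the plain $sum: 1 of the second query
    const add = p.likes > 100 ? 1 : 0;
    const g = groups.get(p.author);
    if (g) g.count += add;
    else groups.set(p.author, { _id: p.author, count: add, date: p.date, post: p.text });
  }
  return [...groups.values()];
}

const posts = [
  { author: "Rothfuss", text: "Name of the Wind", likes: 1007, date: new Date("2013-03-20T11:30:05Z") },
  { author: "Rothfuss", text: "Doors of Stone", likes: 1, date: new Date("2051-03-20T11:30:05Z") },
];

console.log(summarize(posts, "cond"));  // count 1, last post "Doors of Stone"
console.log(summarize(posts, "match")); // count 1, last post "Name of the Wind"
```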
Your query looks OK to me. Adding a $match stage filters out the posts that do not have likes > 100. (You can also do it in $sum with $cond, but there is no need here.)
Query
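The $max accumulator can also be used on documents: it compares them field by field, so with the date as the first field (_id), the maximum is the document with the latest date.
The MongoDB documentation on BSON comparison order describes how documents are compared.
Mongo Playground has a problem: it loses the order of fields in documents (it behaves as if they were hash maps, which they are not), so test this in your driver as well.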
db.collection.aggregate([
{
"$match": {
"likes": {
"$gt": 100
}
}
},
{
"$group": {
"_id": "$author",
"count": {
"$sum": 1
},
"lastPost": {
"$max": {
_id: "$date",
post: "$text"
}
}
}
}
])
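Why field order matters can be sketched in plain JavaScript. This is a simplified emulation of document comparison, assuming both documents have the same field names in the same order and only Date or string values; real BSON comparison also takes field types and names into account.

```javascript
// Simplified emulation of how $max compares embedded documents: field by
// field, in the order the fields appear (Date and string values only).
function compareValues(a, b) {
  const x = a instanceof Date ? a.getTime() : a;
  const y = b instanceof Date ? b.getTime() : b;
  return x < y ? -1 : x > y ? 1 : 0;
}

function compareDocs(a, b) {
  for (const key of Object.keys(a)) {
    const c = compareValues(a[key], b[key]);
    if (c !== 0) return c; // the first differing field decides
  }
  return 0;
}

const maxDoc = (docs) => docs.reduce((best, d) => (compareDocs(d, best) > 0 ? d : best));

const d2013 = new Date("2013-03-20T11:30:05Z");
const d2051 = new Date("2051-03-20T11:30:05Z");

// date first, as in { _id: "$date", post: "$text" }: the newest post wins
console.log(maxDoc([{ _id: d2013, post: "Name of the Wind" }, { _id: d2051, post: "Doors of Stone" }]).post);
// text first (field order lost): "Name of the Wind" > "Doors of Stone" as strings
console.log(maxDoc([{ post: "Name of the Wind", _id: d2013 }, { post: "Doors of Stone", _id: d2051 }]).post);
```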

aggregate with unwind, how to limit per document and not globally? (mongodb)

I have a collection with 300 documents; each document has an array field called items (each element of the array is an object), something like this:
*DOCUMENT 1:*
_id: **********,
title: "test",
desc: "test desc",
items (array)
0: (object)
title: (string)
tags: (array of strings)
1: (object)
etc.
and I need to retrieve items by tags. I'm using the query below. I have to $limit the results to something like 200, or the result is too big. The problem is that if the first document has more than 200 matching items, only items from that document are returned. What I need is to limit the results per document: for instance, retrieve 5 items from each document whose items match ($all) the provided tags.
const foundItems = await db.collection('store').aggregate([
{
$unwind: '$items'
},
{
$match: {
'items.tags': { $all : tagsArray }
}
},
{
$project: {
myitem: '$items',
desc: 1,
title: 1
}
},
{
$limit: 200
}
]).toArray()
To make it clearer, what I would need in an ideal world is something like:
{
$limit: 5,
$per: _id,
$totalLimit: 200
}
instead of $limit: 200. Is this achievable somehow? I didn't find any explanation of it in the official documentation.
I tried adding $sort right before $limit, which would make sense if it behaved the way I'm looking for (and maybe not if placed after the limit), but unfortunately it doesn't work that way: placing it before or after the limit makes no difference.
And I can't really use $sample, since the results are more than 5% of the documents.
Updated demo - https://mongoplayground.net/p/nM6T9XVa-XK
db.collection.aggregate([
{ $unwind: "$items" },
{
$match: {
"items.tags": {
$all: [ "a","b" ]
}
}
},
{
"$group": {
"_id": "$_id",
"myitem": { "$push": "$items" },
desc: { "$first": "$desc" },
title: { "$first": "$title" }
}
},
{
"$project": {
"_id": 1,
desc: 1,
title: 1,
"myitem": { $slice: [ "$myitem", 2 ]
}
}
},
{
$unwind: "$myitem"
}
])
Demo - https://mongoplayground.net/p/BESptnyUfSS
After matching the records, you can $group them by _id and then limit the items per document in a $project stage using $slice:
db.collection.aggregate([
{ $unwind: "$items" },
{
$match: {
"items.tags": { $all: [ "a", "b" ]
}
}
},
{
$project: {
_id: 1, myitem: "$items", desc: 1,title: 1
}
},
{
"$group": {
"_id": "$_id",
"myitem": { "$push": "$myitem" }
}
},
{
"$project": {
"_id": 1,
"myitem": {
$slice: [ "$myitem", 1 ] // limit records here per group / id
}
}
}
])
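Both answers follow the same shape: unwind, match, regroup per _id, then $slice. A minimal plain JavaScript emulation of that shape is below; the store data is invented, and totalLimit is a hypothetical parameter standing for a trailing { $limit: 200 } stage (the asker's "$totalLimit"). Note that MongoDB does not guarantee the order of $group output, nor which items are kept, unless you $sort before grouping.

```javascript
// Emulation of: $unwind items -> $match tags $all -> $group by _id with $push
// -> $slice to perDoc -> $unwind again, plus an optional final $limit.
function itemsPerDoc(docs, tags, perDoc, totalLimit = Infinity) {
  const out = [];
  for (const doc of docs) {
    // $unwind + $match: items whose tags contain every requested tag ($all)
    const matching = doc.items.filter((item) => tags.every((t) => item.tags.includes(t)));
    // $slice: at most perDoc items per document; final $unwind: one row each
    for (const item of matching.slice(0, perDoc)) {
      out.push({ _id: doc._id, title: doc.title, desc: doc.desc, myitem: item });
    }
  }
  // a trailing { $limit: totalLimit } caps the overall number of rows
  return out.slice(0, totalLimit);
}

const store = [
  { _id: 1, title: "test", desc: "test desc", items: [
    { title: "i1", tags: ["a", "b"] },
    { title: "i2", tags: ["a", "b", "c"] },
    { title: "i3", tags: ["b", "a"] },
  ] },
  { _id: 2, title: "other", desc: "other desc", items: [
    { title: "i4", tags: ["a"] },
    { title: "i5", tags: ["a", "b"] },
  ] },
];

// at most 2 matching items per document: i1, i2 from _id 1 and i5 from _id 2
console.log(itemsPerDoc(store, ["a", "b"], 2).map((r) => r.myitem.title));
```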

Finding top 3 students in each subject MongoDB

I have searched for ways to solve my problem, but my database is set up differently from the examples I found.
My documents in my collection are something like this:
{name: "MAX", date: "2020-01-01", Math: 98, Science: 60, English: 80},
{name: "JANE", date: "2020-01-01", Math: 80, Science: 70, English: 79},
{name: "ALEX", date: "2020-01-01", Math: 95, Science: 68, English: 70},
{name: "JOHN", date: "2020-01-01", Math: 95, Science: 68, English: 70},
{name: "MAX", date: "2020-06-01", Math: 97, Science: 78, English: 90},
{name: "JANE", date: "2020-06-01", Math: 78, Science: 76, English: 66},
{name: "ALEX", date: "2020-06-01", Math: 93, Science: 75, English: 82},
{name: "JOHN", date: "2020-06-01", Math: 92, Science: 80, English: 50}
I want to find the top 3 students for each subject without regard for the dates. I only managed to find the top 3 students in 1 subject.
So I group the students by name first and add a field with each student's max score for one subject (Math in this case), then sort in descending order and limit the results to 3.
db.student_scores.aggregate(
[
{$group:{
_id: "$name",
maxMath: { $max: "$Math" }}},
{$sort:{"maxMath":-1}},
{$limit : 3}
]
)
Is there any way to get the top 3 students for each subject?
So the result would be the top 3 for Math, the top 3 for Science and the top 3 for English, something like:
{
Math:{MAX, JANE, JOHN},
Science:{JOHN, ALEX, JANE},
English:{JANE, MAX, JOHN}
}
I just applied your code three times, once per subject, inside a $facet stage.
If you prefer a more compact result, add this stage after $facet:
{$project:{English:"$Eng._id", Science:"$sci._id", Math:"$math._id"}}
PIPELINE
db.collection.aggregate([
{
"$facet": {
"math": [
{
$group: {
_id: "$name",
maxMath: {
$max: "$Math"
}
}
},
{
$sort: {
"maxMath": -1
}
},
{
$limit: 3
}
],
"sci": [
{
$group: {
_id: "$name",
maxSci: {
$max: "$Science"
}
}
},
{
$sort: {
"maxSci": -1
}
},
{
$limit: 3
}
],
"Eng": [
{
$group: {
_id: "$name",
maxEng: {
$max: "$English"
}
}
},
{
$sort: {
"maxEng": -1
}
},
{
$limit: 3
}
]
}
}
])
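Each $facet branch runs the same three steps on the same input. Here is a minimal plain JavaScript emulation of one branch ($group by name with $max, $sort descending, $limit 3) applied to the question's data, followed by the compact projection. ALEX and JOHN tie on 95 in Math; JavaScript's stable sort keeps ALEX first here, but MongoDB does not guarantee the order of ties unless you add a tiebreaker field to $sort.

```javascript
// Emulation of one $facet branch: $group by name with $max, $sort desc, $limit 3.
function topThree(docs, subject) {
  const best = new Map(); // name -> highest score for the subject
  for (const d of docs) {
    const prev = best.has(d.name) ? best.get(d.name) : -Infinity;
    best.set(d.name, Math.max(prev, d[subject]));
  }
  return [...best]
    .map(([name, score]) => ({ _id: name, score }))
    .sort((a, b) => b.score - a.score) // $sort: { score: -1 }
    .slice(0, 3); // $limit: 3
}

const scores = [
  { name: "MAX", date: "2020-01-01", Math: 98, Science: 60, English: 80 },
  { name: "JANE", date: "2020-01-01", Math: 80, Science: 70, English: 79 },
  { name: "ALEX", date: "2020-01-01", Math: 95, Science: 68, English: 70 },
  { name: "JOHN", date: "2020-01-01", Math: 95, Science: 68, English: 70 },
  { name: "MAX", date: "2020-06-01", Math: 97, Science: 78, English: 90 },
  { name: "JANE", date: "2020-06-01", Math: 78, Science: 76, English: 66 },
  { name: "ALEX", date: "2020-06-01", Math: 93, Science: 75, English: 82 },
  { name: "JOHN", date: "2020-06-01", Math: 92, Science: 80, English: 50 },
];

// $facet: every branch sees the same input documents
const facet = { math: topThree(scores, "Math"), sci: topThree(scores, "Science"), Eng: topThree(scores, "English") };
// compact $project: keep only the names
console.log({
  English: facet.Eng.map((s) => s._id),
  Science: facet.sci.map((s) => s._id),
  Math: facet.math.map((s) => s._id),
});
```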
Your question is not entirely clear, but I can see two possible scenarios.
Scenario 1: the same student may appear more than once, along with the date:
$project the required fields and convert the subject scores to an array using $objectToArray
$unwind the subjects array
$sort by subject score in descending order
$group by subject name and collect an array of students
$project to keep the top 3 students from the students array
db.collection.aggregate([
{
$project: {
name: "$name",
date: "$date",
subjects: {
$objectToArray: {
Math: "$Math",
Science: "$Science",
English: "$English"
}
}
}
},
{ $unwind: "$subjects" },
{ $sort: { "subjects.v": -1 } },
{
$group: {
_id: "$subjects.k",
students: {
$push: {
name: "$name",
date: "$date",
score: "$subjects.v"
}
}
}
},
{
$project: {
_id: 0,
subject: "$_id",
students: { $slice: ["$students", 3] }
}
}
])
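The scenario 1 steps can be emulated in plain JavaScript to show how the same student can show up twice: every (document, subject) pair becomes its own row, so both of MAX's Math scores compete. This is an in-memory sketch on the question's data, not MongoDB code; for tied scores, MongoDB's order is not guaranteed.

```javascript
// Emulation of scenario 1: $objectToArray + $unwind, $sort by score desc,
// $group by subject with $push, then $slice to the top n entries.
function topEntriesPerSubject(docs, subjects, n) {
  // one row per (document, subject), like $objectToArray followed by $unwind
  const rows = docs.flatMap((d) => subjects.map((k) => ({ k, v: d[k], name: d.name, date: d.date })));
  rows.sort((a, b) => b.v - a.v); // $sort: { "subjects.v": -1 }
  const bySubject = new Map(); // $group by subject; $push keeps the sorted order
  for (const r of rows) {
    if (!bySubject.has(r.k)) bySubject.set(r.k, []);
    bySubject.get(r.k).push({ name: r.name, date: r.date, score: r.v });
  }
  // $project: { subject: "$_id", students: { $slice: ["$students", n] } }
  return [...bySubject].map(([subject, students]) => ({ subject, students: students.slice(0, n) }));
}

const scores = [
  { name: "MAX", date: "2020-01-01", Math: 98, Science: 60, English: 80 },
  { name: "JANE", date: "2020-01-01", Math: 80, Science: 70, English: 79 },
  { name: "ALEX", date: "2020-01-01", Math: 95, Science: 68, English: 70 },
  { name: "JOHN", date: "2020-01-01", Math: 95, Science: 68, English: 70 },
  { name: "MAX", date: "2020-06-01", Math: 97, Science: 78, English: 90 },
  { name: "JANE", date: "2020-06-01", Math: 78, Science: 76, English: 66 },
  { name: "ALEX", date: "2020-06-01", Math: 93, Science: 75, English: 82 },
  { name: "JOHN", date: "2020-06-01", Math: 92, Science: 80, English: 50 },
];

// Math: MAX (98, 2020-01-01), MAX (97, 2020-06-01), then a student with 95
console.log(JSON.stringify(topEntriesPerSubject(scores, ["Math", "Science", "English"], 3), null, 2));
```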
Scenario 2: sum each student's scores over all dates, so each student appears at most once per subject:
$group by name and sum each subject's scores using $sum
$project to convert the subject totals to an array using $objectToArray
$unwind the subjects array
$sort by subject score in descending order
$group by subject name and collect an array of students
$project to keep the top 3 students from the students array
db.collection.aggregate([
{
$group: {
_id: "$name",
Math: { $sum: "$Math" },
Science: { $sum: "$Science" },
English: { $sum: "$English" }
}
},
{
$project: {
subjects: {
$objectToArray: {
Math: "$Math",
Science: "$Science",
English: "$English"
}
}
}
},
{ $unwind: "$subjects" },
{ $sort: { "subjects.v": -1 } },
{
$group: {
_id: "$subjects.k",
students: {
$push: {
name: "$_id",
score: "$subjects.v"
}
}
}
},
{
$project: {
_id: 0,
subject: "$_id",
students: { $slice: ["$students", 3] }
}
}
])
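For scenario 2, the result per subject is simply the top 3 of the per-student totals. The plain JavaScript sketch below computes that same result directly (it sums per student, as the first $group does, and then ranks each subject) rather than mimicking the $unwind/$group steps one by one; the data is the question's.

```javascript
// Emulation of scenario 2: $group by name with $sum per subject, then rank
// the totals per subject and keep the top n.
function topTotalsPerSubject(docs, subjects, n) {
  const totals = new Map(); // name -> { Math: sum, Science: sum, English: sum }
  for (const d of docs) {
    if (!totals.has(d.name)) totals.set(d.name, Object.fromEntries(subjects.map((k) => [k, 0])));
    const t = totals.get(d.name);
    for (const k of subjects) t[k] += d[k];
  }
  return subjects.map((subject) => ({
    subject,
    students: [...totals]
      .map(([name, t]) => ({ name, score: t[subject] }))
      .sort((a, b) => b.score - a.score) // highest total first
      .slice(0, n), // top n per subject
  }));
}

const scores = [
  { name: "MAX", date: "2020-01-01", Math: 98, Science: 60, English: 80 },
  { name: "JANE", date: "2020-01-01", Math: 80, Science: 70, English: 79 },
  { name: "ALEX", date: "2020-01-01", Math: 95, Science: 68, English: 70 },
  { name: "JOHN", date: "2020-01-01", Math: 95, Science: 68, English: 70 },
  { name: "MAX", date: "2020-06-01", Math: 97, Science: 78, English: 90 },
  { name: "JANE", date: "2020-06-01", Math: 78, Science: 76, English: 66 },
  { name: "ALEX", date: "2020-06-01", Math: 93, Science: 75, English: 82 },
  { name: "JOHN", date: "2020-06-01", Math: 92, Science: 80, English: 50 },
];

// Math totals: MAX 195, ALEX 188, JOHN 187 (JANE 158 is dropped)
console.log(JSON.stringify(topTotalsPerSubject(scores, ["Math", "Science", "English"], 3), null, 2));
```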

Retrieving a count that matches specified criteria in a $group aggregation

So I am looking to group documents in my collection on a specific field, and for the output results of each group, I am looking to include the following:
A count of all documents in the group that match a specific query (i.e. a count of documents that satisfy some expression { "$Property": "Value" })
The total number of documents in the group
(Bonus, as I suspect that this is not easily accomplished) Properties of a document that correspond to a $min/$max accumulator
I am very new to the MongoDB query syntax and don't quite understand how it all works, but after some research I've managed to get it down to the following query (please note, I am currently using version 3.0.12 of MongoDB, but I believe we will upgrade in a couple of months' time):
db.getCollection('myCollection').aggregate(
[
{
$group: {
_id: {
GroupID: "$GroupID",
Status: "$Status"
},
total: { $sum: 1 },
GroupName: { $first: "$GroupName" },
EarliestCreatedDate: { $min: "$DateCreated" },
LastModifiedDate: { $max: "$LastModifiedDate" }
}
},
{
$group: {
_id: "$_id.GroupID",
Statuses: {
$push: {
Status: "$_id.Status",
Count: "$total"
}
},
TotalCount: { $sum: "$total" },
GroupName: { $first: "$GroupName" },
EarliestCreatedDate: { $min: "$EarliestCreatedDate" },
LastModifiedDate: { $max: "$LastModifiedDate" }
}
}
]
)
Essentially what I am looking to retrieve is the Count for specific Status values, and project them into one final result document that looks like the following:
{
GroupName,
EarliestCreatedDate,
EarliestCreatedBy,
LastModifiedDate,
LastModifiedBy,
TotalCount,
PendingCount,
ClosedCount
}
Where PendingCount and ClosedCount are the total number of documents in each group that have a status Pending/Closed. I suspect I need to use $project with some other expression to extract this value, but I don't really understand the aggregation pipeline well enough to figure this out.
Also the EarliestCreatedBy and LastModifiedBy are the users who created/modified the document(s) corresponding to the EarliestCreatedDate and LastModifiedDate respectively. As I mentioned, I think retrieving these values will add another layer of complexity, so if there is no practical solution, I am willing to forgo this requirement.
Any suggestions/tips would be very much appreciated.
You can try the aggregation stages below.
$group
Calculate all the necessary counts TotalCount, PendingCount and ClosedCount for each GroupID
Calculate EarliestCreatedDate with $min and LastModifiedDate with $max, and $push the CreatedBy, LastModifiedBy, DateCreated and LastModifiedDate fields into a CreatedByLastModifiedBy array, which is compared later to fetch EarliestCreatedBy and LastModifiedBy for each GroupID
$project
Project all the fields for the response
$filter the CreatedByLastModifiedBy array for entries whose DateCreated equals EarliestCreatedDate, $map the matches to their CreatedBy value, and use $arrayElemAt to take the first element of the resulting array as EarliestCreatedBy.
Use similar steps to calculate LastModifiedBy.
db.getCollection('myCollection').aggregate(
[{
$group: {
_id: "$GroupID",
TotalCount: {
$sum: 1
},
PendingCount: {
$sum: {
$cond: {
if: {
$eq: ["$Status", "Pending"] // "$Status" is the field path; "Status" would be a literal string
},
then: 1,
else: 0
}
}
},
ClosedCount: {
$sum: {
$cond: {
if: {
$eq: ["$Status", "Closed"]
},
then: 1,
else: 0
}
}
},
GroupName: {
$first: "$GroupName"
},
EarliestCreatedDate: {
$min: "$DateCreated"
},
LastModifiedDate: {
$max: "$LastModifiedDate"
},
CreatedByLastModifiedBy: {
$push: {
CreatedBy: "$CreatedBy",
LastModifiedBy: "$LastModifiedBy",
DateCreated: "$DateCreated",
LastModifiedDate: "$LastModifiedDate"
}
}
}
}, {
$project: {
_id: 0,
GroupName: 1,
EarliestCreatedDate: 1,
EarliestCreatedBy: {
$arrayElemAt: [{
$map: {
input: {
$filter: {
input: "$CreatedByLastModifiedBy",
as: "CrBy",
cond: {
"$eq": ["$EarliestCreatedDate", "$$CrBy.DateCreated"]
}
}
},
as: "EaCrBy",
in: "$$EaCrBy.CreatedBy"
}
}, 0]
},
LastModifiedDate: 1,
LastModifiedBy: {
$arrayElemAt: [{
$map: {
input: {
$filter: {
input: "$CreatedByLastModifiedBy",
as: "MoBy",
cond: {
"$eq": ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
}
}
},
as: "LaMoBy",
in: "$$LaMoBy.LastModifiedBy"
}
}, 0]
},
TotalCount: 1,
PendingCount: 1,
ClosedCount: 1
}
}]
)
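To check the logic of this answer without a server, here is a minimal plain JavaScript emulation of the $group and $project stages. The group, status and user values are invented, and dates are assumed to be ISO 8601 strings so that <, > and === compare them correctly. The Pending/Closed counts compare each document's Status field, which in a pipeline is written as the field path "$Status".

```javascript
// Emulation of the $group + $project answer (dates as ISO 8601 strings).
function summarizeGroups(docs) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.GroupID)) {
      groups.set(d.GroupID, {
        GroupName: d.GroupName, // $first
        TotalCount: 0, PendingCount: 0, ClosedCount: 0,
        EarliestCreatedDate: d.DateCreated, LastModifiedDate: d.LastModifiedDate,
        rows: [], // CreatedByLastModifiedBy ($push)
      });
    }
    const g = groups.get(d.GroupID);
    g.TotalCount += 1; // $sum: 1
    g.PendingCount += d.Status === "Pending" ? 1 : 0; // $sum of $cond on "$Status"
    g.ClosedCount += d.Status === "Closed" ? 1 : 0;
    if (d.DateCreated < g.EarliestCreatedDate) g.EarliestCreatedDate = d.DateCreated; // $min
    if (d.LastModifiedDate > g.LastModifiedDate) g.LastModifiedDate = d.LastModifiedDate; // $max
    g.rows.push(d);
  }
  // $project: $filter the pushed rows on the extreme date, $map to the user,
  // and take element 0 ($arrayElemAt)
  return [...groups.values()].map(({ rows, ...g }) => ({
    ...g,
    EarliestCreatedBy: rows.filter((r) => r.DateCreated === g.EarliestCreatedDate).map((r) => r.CreatedBy)[0],
    LastModifiedBy: rows.filter((r) => r.LastModifiedDate === g.LastModifiedDate).map((r) => r.LastModifiedBy)[0],
  }));
}

const docs = [
  { GroupID: 1, GroupName: "Alpha", Status: "Pending", DateCreated: "2017-01-01", CreatedBy: "ann", LastModifiedDate: "2017-02-01", LastModifiedBy: "bob" },
  { GroupID: 1, GroupName: "Alpha", Status: "Closed", DateCreated: "2017-01-05", CreatedBy: "cat", LastModifiedDate: "2017-03-01", LastModifiedBy: "dan" },
  { GroupID: 1, GroupName: "Alpha", Status: "Pending", DateCreated: "2017-01-03", CreatedBy: "eve", LastModifiedDate: "2017-01-20", LastModifiedBy: "fay" },
];

// TotalCount 3, PendingCount 2, ClosedCount 1, EarliestCreatedBy "ann", LastModifiedBy "dan"
console.log(summarizeGroups(docs));
```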
Update for Version < 3.2
$filter is not available in your version either (like $arrayElemAt, it was added in 3.2). Below is an equivalent.
The comparison logic is the same: $map creates an array that holds LastModifiedBy for every matching entry and false for every non-matching one.
The next step uses $setDifference to compare that array with [false]; it returns the elements that exist only in the first set, which drops the false values.
LastModifiedBy: {
$setDifference: [{
$map: {
input: "$CreatedByLastModifiedBy",
as: "MoBy",
in: {
$cond: [{
$eq: ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
},
"$$MoBy.LastModifiedBy",
false
]
}
}
},
[false]
]
}
Add an $unwind stage after the $project stage to turn the resulting single-element array into a plain value:
{$unwind:"$LastModifiedBy"}
Use similar steps to calculate EarliestCreatedBy.
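The pre-3.2 trick can be sketched in plain JavaScript: map non-matching entries to false, then remove false with a set difference. The rows below are invented. As a set operation, $setDifference also removes duplicates, and MongoDB does not guarantee the order of its result; the Set here only approximates that.

```javascript
// Emulation of the pre-3.2 replacement for $filter: $map each entry to its
// LastModifiedBy when the date matches and to false otherwise, then
// $setDifference against [false] (which also drops duplicate values).
function lastModifiedByLegacy(rows, lastModifiedDate) {
  const mapped = rows.map((r) => (r.LastModifiedDate === lastModifiedDate ? r.LastModifiedBy : false));
  return [...new Set(mapped)].filter((v) => v !== false);
}

const rows = [
  { LastModifiedDate: "2017-02-01", LastModifiedBy: "bob" },
  { LastModifiedDate: "2017-03-01", LastModifiedBy: "dan" },
  { LastModifiedDate: "2017-03-01", LastModifiedBy: "dan" },
];

// a single-element array; the $unwind stage then turns it into the value "dan"
console.log(lastModifiedByLegacy(rows, "2017-03-01")); // logs [ 'dan' ]
```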