Summing a value of a key over multiple documents in MongoDB - mongodb

I have a collection named users with the following structure to its documents
{
"_id" : <user_id>,
"NAME" : "ABC",
"TIME" : 53.0,
"OBJECTS" : 1
},
{
"_id" : <user_id>,
"NAME" : "ABCD",
"TIME" : 353.0,
"OBJECTS" : 70
}
Now, I want to sum the value of OBJECTS over the entire collection and return the value along with the objects.
Something like this
{
{
"_id" : <user_id>,
"NAME" : "ABC",
"TIME" : 53.0,
"OBJECTS" : 1
},
{
"_id" : <user_id>,
"NAME" : "ABCD",
"TIME" : 353.0,
"OBJECTS" : 70
},
"TOTAL_OBJECTS": 71
}
Or any way wherein I don't have to compute on the received object and can directly access from it. Now, I've tried looking this up but I found none where the hierarchy of the existing documents isn't destroyed.

You can use $group specifying null as a grouping id. You'll gather all documents into one array (using $$ROOT variable) and another field can represent a sum of OBJECT like below:
db.users.aggregate([
{
$group: {
_id: null,
documents: { $push: "$$ROOT" },
TOTAL_OBJECTS: { $sum: "$OBJECTS" }
}
}
])

db.users.aggregate(
// Pipeline
[
// Stage 1
{
$group: {
_id: null,
TOTAL_OBJECTS: {
$sum: '$OBJECTS'
},
documents: {
$addToSet: '$$CURRENT'
}
}
},
]
);
Into above aggregate query I have pushed all documents into an array using $addToSet operator as a part of $group stage of aggregate operation

Related

Querying aggregates on subdocuments then grouping by field in parent document

I'm a noob when it comes to Mongo and I've been struggling to wrap my head around how to fetch data in the following fashion. I have a collection of order documents that contain some data such as an event_id and a subcollection (if that's the term) of issued_tickets. issued_tickets contains one to many subdocuments that contain fields such as name, date, etc. What I am trying to do is fetch the number of each type of issued tickets for each event_id in the parent document. So I would be wanting to do a count on each issued_tickets grouped by issued_tickets.name and then that goes up to the parent which is then summed and grouped on the parent's event_id.
Can anyone help me accomplish this? I keep spinning myself out on trying groupings and projections still.
Here is a sample document:
{
"_id" : ObjectId("5ce7335c1c666f000414f74a"),
"event_id" : ObjectId("5cb54f966668a9719ef6a103"),
"subtotal" : 3000,
"service_fee" : 760,
"processing_fee" : 143,
"total" : 3903,
"customer_id" : ObjectId("5ce7666c1c335f000414f747"),
"updated_at" : ISODate("2019-05-23T23:57:17.524Z"),
"created_at" : ISODate("2019-05-23T23:57:17.524Z"),
"ref" : "60d5fcf9-86c6-469b-b86b-315a9b55caca",
"issued_tickets" : [
{
"_id" : ObjectId("5ce7335c1c335f000414f666"),
"name" : "Tier 1",
"stub_name" : "Tier 1",
"price" : 1500,
"base_fee" : 200,
"perc_fee" : "0.12",
"access_code" : "163a1b9ee98338a8a4288a1c87446665",
"redeemed" : false
},
{
"_id" : ObjectId("5ce7335c1c335f0004146669"),
"name" : "Tier 2",
"stub_name" : "Tier 2",
"price" : 1500,
"base_fee" : 200,
"perc_fee" : "0.12",
"access_code" : "f50f262cd0bf1ec4ab36667c2a762446",
"redeemed" : true
}
]
}
We can do aggregations like following
$unwind to deconstruct the array
$group to reconstruct the array. While regrouping by eventId and issued_tickets.name, we can count using $sum
Mongo script :
db.collection.aggregate([
{
$unwind: "$issued_tickets"
},
{
$group: {
_id: {
_id: "$event_id",
ticketName: "$issued_tickets.name"
},
count: {
$sum: 1
}
}
},
{
$project: {
event_id: "$_id._id",
ticketName: "$_id.ticketName",
count: 1,
_id: 0
}
}
])
Working Mongo playground

how to find duplicate records in mongo db query to use

I have below collection, need to find duplicate records in mongo, how can we find that as below is one sample of collection we have around more then 10000 records of collections.
/* 1 */
{
"_id" : 1814099,
"eventId" : "LAS012",
"eventName" : "CustomerTab",
"timeStamp" : ISODate("2018-12-31T20:09:09.820Z"),
"eventMethod" : "click",
"resourceName" : "CustomerTab",
"targetType" : "",
"resourseUrl" : "",
"operationName" : "",
"functionStatus" : "",
"results" : "",
"pageId" : "CustomerPage",
"ban" : "290824901",
"jobId" : "87377713",
"wrid" : "87377713",
"jobType" : "IBJ7FXXS",
"Uid" : "sc343x",
"techRegion" : "W",
"mgmtReportingFunction" : "N",
"recordPublishIndicator" : "Y",
"__v" : 0
}
We can first find the unique ids using
const data = await db.collection.aggregate([
{
$group: {
_id: "$eventId",
id: {
"$first": "$_id"
}
}
},
{
$group: {
_id: null,
uniqueIds: {
$push: "$id"
}
}
}
]);
And then we can make another query, which will find all the duplicate documents
db.collection.find({_id: {$nin: data.uniqueIds}})
This will find all the documents that are redundant.
Another way
To find the event ids which are duplicated
db.collection.aggregate(
{"$group" : { "_id": "$eventId", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }
)
To get duplicates from db, you need to get only the groups that have a count of more than one, we can use the $match operator to filter our results. Within the $match pipeline operator, we'll tell it to look at the count field and tell it to look for counts greater than one using the $gt operator representing "greater than" and the number 1. This looks like the following:
db.collection.aggregate([
{$group: {
_id: {eventId: "$eventId"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
]);
I assume that eventId is a unique id.

combining distinct on projection in mongodb

Is there a query i can use on the following collection to get the result at the bottom?
Example:
{
"_id" : ObectId(xyz),
"name" : "Carl",
"something":"else"
},
{
"_id" : ObectId(aaa),
"name" : "Lenny",
"something":"else"
},
{
"_id" : ObectId(bbb),
"name" : "Carl",
"something":"other"
}
I need a query to get this result:
{
"_id" : ObectId(xyz),
"name" : "Carl"
},
{
"_id" : ObectId(aaa),
"name" : "Lenny"
},
A set of documents with no identical names. Its not important which _ids are kept.
You can use aggregation framework to get this shape, the query could look like this:
db.collection.aggregate(
[
{
$group:
{
_id: "$name",
id: { $first: "$_id" }
}
},
{
$project:{
_id:"$id",
name:"$_id"
}
}
]
)
As long as you don't need other fields this will be sufficient.
If you need to add other fields - please update document structure and expected result.
as you don't care about ids it can be simplified
db.collection.aggregate([{$group:{_id: "$name"}}])

Aggregation query returning array of all objects for mongodb

I'm using mongo for the first time. I'm trying to aggregate some documents in a collection using the query below. Instead the query returns an object with a key "result" that contains an array of all the documents that fit with $match.
Below is the query.
db.events_2015_04_10.aggregate([
{$group:{
_id: "$uid",
count: {$sum: 1},
},
$match : {promo:"bc40100abc8d4eb6a0c68f81f4a756c7", evt:"login"}
}
]
);
Below is a sample document in the collection:
{
"_id" : ObjectId("552712c3f92ea17426000ace"),
"product" : "Mobile Safari",
"venue_id" : NumberLong(71540),
"uid" : "dd542fea6b4443469ff7bf1f56472eac",
"ag" : 0,
"promo" : "bc40100abc8d4eb6a0c68f81f4a756c7",
"promo_f" : NumberLong(1),
"brand" : NumberLong(17),
"venue" : "ovation_2480",
"lt" : 0,
"ts" : ISODate("2015-04-10T00:01:07.734Z"),
"evt" : "login",
"mac" : "00:00:00:00:00:00",
"__ns__" : "wifipromo",
"pvdr" : NumberLong(42),
"os" : "iPhone",
"cmpgn" : "fc6de34aef8b4f57af0b8fda98d8c530",
"ip" : "192.119.43.250",
"lng" : 0,
"product_ver" : "8"
}
I'm trying to get it all grouped by uid's with the total sum of each group... What is the correct way to achieve this?
Try the following aggregation framework which has the $match pipeline stage first and then the $group pipeline later:
db.events_2015_04_10.aggregate([
{
$match: {
promo: "bc40100abc8d4eb6a0c68f81f4a756c7",
evt: "login"
}
},
{
$group: {
_id: "$uid",
count: {
$sum: 1
}
}
}
])

Obtaining $group result with group count

Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" :
"John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to span hundreds of thousands, perhaps millions, that I need to query for posts by tags and group the results by tag and display the results paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output from previous pipeline (the first group). Is there a way that the two operations be combined while preserving each pipeline's output? I know that the output of the whole aggregate operation can be cast to an array in some language and have the contents counted but there may be a possibility that the pipeline output may exceed the 16Mb limit. Also, performing the same query just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp
Use $push or addToSet to store tmp into your data list.
Code:
db.test.aggregate(
{$unwind: '$tags'},
{$group:{_id: '$tags', count:{$sum:1}}},
{$project:{tmp:{tag:'$_id', count:'$count'}}},
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
)
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
I'm not sure you need the aggregation framework for this other than counting all the tags eg:
db.posts.aggregate(
{ "unwind" : "$tags" },
{ "group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
);
For paginating through per tag you can just use the normal query syntax - like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)