Count on aggregate in MongoDB

This is my aggregate query in PyMongo:
db.connection_log.aggregate([
{ '$match': {
'login_time': {'$gte': datetime.datetime(2014, 5, 30, 6, 57)}
}},
{ '$group': {
'_id': {
'username': '$username',
'ras_id': '$ras_id',
'user_id': '$user_id'
},
'total': { '$sum': '$type_details.in_bytes'},
'total1': {'$sum': '$type_details.out_bytes'}
}},
{ '$sort': {'total': 1, 'total1': 1}}
])
How do I count all the results of this aggregation?

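Add this stage to the end of your aggregation pipeline. It collapses the grouped results into a single document holding their count:
{ '$group': {
'_id': None,
'count': { '$sum': 1 }
}}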
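A minimal sketch of that idea in the question's PyMongo style. The pipeline is plain Python data, `add_count_stage` is a hypothetical helper, and `count_groups` only imitates in memory what the server would compute (one document per distinct `_id`, then a single document whose `count` is the number of those groups), using made-up rows so it runs without a database.

```python
import datetime

# The question's pipeline, expressed as plain Python data.
BASE_PIPELINE = [
    {'$match': {'login_time': {'$gte': datetime.datetime(2014, 5, 30, 6, 57)}}},
    {'$group': {
        '_id': {'username': '$username', 'ras_id': '$ras_id', 'user_id': '$user_id'},
        'total': {'$sum': '$type_details.in_bytes'},
        'total1': {'$sum': '$type_details.out_bytes'},
    }},
    {'$sort': {'total': 1, 'total1': 1}},
]


def add_count_stage(pipeline):
    """Hypothetical helper: return a new pipeline ending in a counting $group."""
    return pipeline + [{'$group': {'_id': None, 'count': {'$sum': 1}}}]


def count_groups(rows, since):
    """In-memory imitation of $match + $group(3-key _id) + $group(_id=None, $sum: 1)."""
    keys = {
        (r['username'], r['ras_id'], r['user_id'])
        for r in rows
        if r['login_time'] >= since
    }
    return {'_id': None, 'count': len(keys)}


# Made-up rows: two logins share a key, one is older than the $match cutoff.
since = datetime.datetime(2014, 5, 30, 6, 57)
rows = [
    {'username': 'a', 'ras_id': 1, 'user_id': 10, 'login_time': since},
    {'username': 'a', 'ras_id': 1, 'user_id': 10,
     'login_time': since + datetime.timedelta(hours=1)},
    {'username': 'b', 'ras_id': 2, 'user_id': 20, 'login_time': since},
    {'username': 'c', 'ras_id': 3, 'user_id': 30,
     'login_time': since - datetime.timedelta(days=1)},
]
pipeline = add_count_stage(BASE_PIPELINE)
print(pipeline[-1])               # {'$group': {'_id': None, 'count': {'$sum': 1}}}
print(count_groups(rows, since))  # {'_id': None, 'count': 2}
```

Against a real server you would pass the extended pipeline to `db.connection_log.aggregate(...)` and read the one returned document. On MongoDB 3.4 or later, `{'$count': 'count'}` is a shorter equivalent of the final stage, except that its output has no `_id` field.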
See also the SQL to Aggregation Mapping Chart in the MongoDB documentation.

If you really want your results combined with a total count, you can push the results into their own array:
import datetime

result = db.connection_log.aggregate([
{ '$match': {
'login_time': {'$gte': datetime.datetime(2014, 5, 30, 6, 57)}
}},
{ '$group': {
'_id': {
'username': '$username',
'ras_id': '$ras_id',
'user_id': '$user_id'
},
'total': { '$sum': '$type_details.in_bytes'},
'total1': {'$sum': '$type_details.out_bytes'}
}},
{ '$sort': {'total': 1, 'total1': 1}},
{ '$group': {
'_id': None,
'results': {
'$push': {
'_id': '$_id',
'total': '$total',
'total1': '$total1'
}
},
'count': { '$sum': 1 }
}}
])
If you are using MongoDB 2.6 or later, you can simply use '$push': '$$ROOT' instead of listing every field of the document.
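A sketch of that shorter final stage, again as PyMongo-style data. `push_root_stage` and `push_with_count` are hypothetical names, and the latter only imitates the server's output on a small made-up list.

```python
def push_root_stage():
    """Hypothetical helper: final $group stage that keeps whole documents via $$ROOT."""
    return {'$group': {
        '_id': None,
        'results': {'$push': '$$ROOT'},  # each incoming document, unchanged
        'count': {'$sum': 1},            # adds 1 per incoming document
    }}


def push_with_count(docs):
    """In-memory imitation of what that stage returns for a list of documents."""
    docs = list(docs)
    return {'_id': None, 'results': docs, 'count': len(docs)}


# Made-up documents shaped like the output of the earlier $group/$sort stages.
grouped = [
    {'_id': {'username': 'a', 'ras_id': 1, 'user_id': 10}, 'total': 5, 'total1': 7},
    {'_id': {'username': 'b', 'ras_id': 2, 'user_id': 20}, 'total': 9, 'total1': 2},
]
out = push_with_count(grouped)
print(out['count'])  # 2
```

Because all results end up inside one document, this approach only works while the combined output stays below MongoDB's 16 MB document size limit.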
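But really, unless you are using MongoDB 2.6 or later and explicitly ask for a cursor, the aggregate command already returns its results as an array, so you don't need an inner array plus a count. Just take the length of the results. With PyMongo 3.x and later, aggregate() always returns a cursor, so turn it into a list first (older PyMongo 2.x returned the raw command response, with the array under result['result']):
len(list(result))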
If you do use a cursor for a large result set, or page through results with $skip and $limit, you will need two queries, one of which just computes the "total count". Otherwise you don't need this at all.
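A minimal sketch of the two-query approach, assuming a shared base pipeline. The helpers `page_pipeline`, `count_pipeline` and `run_in_memory` are hypothetical; the last one is only a stand-in that applies `$skip`, `$limit` and the counting `$group` to a list, in place of calling `collection.aggregate` twice.

```python
def page_pipeline(base, page, per_page):
    """Hypothetical helper: one page of results (pages numbered from 0)."""
    return base + [{'$skip': page * per_page}, {'$limit': per_page}]


def count_pipeline(base):
    """Hypothetical helper: a single document holding the total number of results."""
    return base + [{'$group': {'_id': None, 'count': {'$sum': 1}}}]


def run_in_memory(rows, stages):
    """Stand-in for collection.aggregate that understands only these three stages."""
    for stage in stages:
        if '$skip' in stage:
            rows = rows[stage['$skip']:]
        elif '$limit' in stage:
            rows = rows[:stage['$limit']]
        elif '$group' in stage:
            rows = [{'_id': None, 'count': len(rows)}]
    return rows


base = [{'$sort': {'total': 1}}]
rows = [{'total': n} for n in range(7)]  # pretend the base stages already ran
extra_page = page_pipeline(base, 1, 3)[len(base):]
extra_count = count_pipeline(base)[len(base):]
print(run_in_memory(rows, extra_page))   # [{'total': 3}, {'total': 4}, {'total': 5}]
print(run_in_memory(rows, extra_count))  # [{'_id': None, 'count': 7}]
```

Note that $skip comes before $limit in the page pipeline; reversing them would skip within an already-truncated page.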

Related

Count the documents and sum the field values of all documents in MongoDB

I have a set of documents produced in MongoDB by this pipeline:
[{"$project":{"pred":1, "base-url":1}},
{"$group":{
"_id":"$base-url",
"invalid":{"$sum": { "$cond": [{ "$eq": ["$pred", "invalid"] }, 1, 0] }},
"pending":{"$sum": { "$cond": [{ "$eq": ["$pred", "null"] }, 1, 0] }},
}},
]
which gives the following documents:
[{'_id': 'https://www.example1.org/', 'invalid': 3, 'pending': 6},
{'_id': 'https://example2.com/', 'invalid': 10, 'pending': 4},
{'_id': 'https://www.example3.org/', 'invalid': 2, 'pending': 6}]
How can I get the number of documents and the sum of the other fields, to obtain the following result?
{"count":3, "invalid":15,"pending":16}
You just need a $group stage with $sum.
The $sum documentation has good examples:
db.collection.aggregate([
{
$group: {
_id: null,
pending: {
$sum: "$pending"
},
invalid: {
$sum: "$invalid"
},
count: {
$sum: 1 //counting each record
}
}
},
{
$project: {
_id: 0 //removing _id field from the final output
}
}
])
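The same pipeline in PyMongo form (with `None` for the shell's `null`), plus an in-memory check against the three documents from the question. `summarize` is a hypothetical helper that imitates the `$group`; appending these two stages to the question's own pipeline should let a single `aggregate` call return the summary.

```python
# PyMongo form of the answer's pipeline.
SUMMARY_PIPELINE = [
    {'$group': {
        '_id': None,
        'pending': {'$sum': '$pending'},
        'invalid': {'$sum': '$invalid'},
        'count': {'$sum': 1},  # adds 1 per input document
    }},
    {'$project': {'_id': 0}},  # drop _id from the final output
]


def summarize(docs):
    """Hypothetical helper: in-memory imitation of SUMMARY_PIPELINE."""
    return {
        'pending': sum(d['pending'] for d in docs),
        'invalid': sum(d['invalid'] for d in docs),
        'count': len(docs),
    }


docs = [
    {'_id': 'https://www.example1.org/', 'invalid': 3, 'pending': 6},
    {'_id': 'https://example2.com/', 'invalid': 10, 'pending': 4},
    {'_id': 'https://www.example3.org/', 'invalid': 2, 'pending': 6},
]
print(summarize(docs))  # {'pending': 16, 'invalid': 15, 'count': 3}
```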

How do I get the sum of an array with mapReduce in MongoDB?

Given the following document schema:
{
'_id': 5079,
'name': 'Lincoln County',
'state': 'AR',
'population': 13024,
'cases': [{'date': '2020-03-16', 'count': 1}, {'date': '2020-03-22', 'count': 1},
{'date': '2020-03-24', 'count': 1}, {'date': '2020-03-26', 'count': 2}],
'deaths': [{'date': '2020-03-27', 'count': 1}, {'date': '2020-04-02', 'count': 1},
{'date': '2020-05-28', 'count': 2}, {'date': '2020-05-30', 'count': 1}]
}
What MongoDB mapReduce function would generate a collection with the total number of COVID-19 cases for each state, i.e. one record per state with its 2-letter abbreviation and its total case count?
Try this query:
db.collection.aggregate([
{
"$project": {
"total": {
"$sum": {
"$map": {
"input": "$cases",
"as": "c",
"in": "$$c.count"
}
}
},
"state": 1
}
}
])
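The query uses $map to build an array of the cases.count values and then $sums them.
The output fields are total, which holds the $sum, and state, included via state: 1. Note that this gives one total per county document; to get one record per state, as the question asks, follow it with a $group stage on $state that sums total.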
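Since the question asks for one record per state, here is a sketch that adds a `$group` on `$state` after the answer's `$project`, written as PyMongo-style data. `cases_by_state` is a hypothetical helper that imitates the pipeline in memory; apart from Lincoln County, the sample counties are made up. (This uses the aggregation pipeline rather than mapReduce, which MongoDB has deprecated since version 5.0.)

```python
# PyMongo-style pipeline: per-county total first, then one document per state.
STATE_PIPELINE = [
    {'$project': {
        'state': 1,
        'total': {'$sum': {'$map': {'input': '$cases', 'as': 'c', 'in': '$$c.count'}}},
    }},
    {'$group': {'_id': '$state', 'total_cases': {'$sum': '$total'}}},
]


def cases_by_state(counties):
    """Hypothetical helper: in-memory imitation of STATE_PIPELINE."""
    totals = {}
    for county in counties:
        county_total = sum(c['count'] for c in county.get('cases', []))  # $map + $sum
        totals[county['state']] = totals.get(county['state'], 0) + county_total  # $group
    return [{'_id': s, 'total_cases': n} for s, n in sorted(totals.items())]


counties = [
    {'_id': 5079, 'name': 'Lincoln County', 'state': 'AR',
     'cases': [{'date': '2020-03-16', 'count': 1}, {'date': '2020-03-22', 'count': 1},
               {'date': '2020-03-24', 'count': 1}, {'date': '2020-03-26', 'count': 2}]},
    # The next two counties are made up for illustration.
    {'_id': 1, 'name': 'Sample County A', 'state': 'AR',
     'cases': [{'date': '2020-03-20', 'count': 4}]},
    {'_id': 2, 'name': 'Sample County B', 'state': 'TX',
     'cases': [{'date': '2020-03-21', 'count': 2}]},
]
print(cases_by_state(counties))
# [{'_id': 'AR', 'total_cases': 9}, {'_id': 'TX', 'total_cases': 2}]
```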

Merging multiple aggregation queries into one with MongoDB

I'm using these three queries to build a pandas dataframe with the columns 'Date', '% part of business 2' and '% part of business 3' (for each day, the percentage of the gain that comes from businesses 2 and 3).
query_business2 = collection.aggregate( [
{
'$match': {'Business': 2}
},
{
'$group': {
'_id': '$Date',
'stab2': {'$sum': '$Money'}
}
},
{
'$sort': {'_id': 1}
}
])
query_business3 = collection.aggregate([
{
'$match': {'Business':3}
},
{
'$group': {
'_id': '$Date',
'stab3': {'$sum': '$Money'}
}
},
{
'$sort': {'_id': 1}
}
])
query_total = collection.aggregate([
{
'$group': {
'_id': '$Date',
'total': {'$sum': '$Money'}
}
},
{
'$sort': {'_id': 1}
}
])
To make this faster, I would like to merge these three queries into one. I tried using '$or', but it failed with an "unhashable dict" error.
Is there a better way to do this? Ideally I would also build the dataframe directly, without further pandas processing after these queries, and compute each business's percentage of the total money earned in the query itself. Thank you for your help.
Thanks to prasad_, the answer is:
query_business = collection.aggregate([
{
'$group':{
'_id': '$Date',
'total_2': {'$sum' : {'$cond': [{'$eq': ['$Business', 2]}, '$Money', 0]}},
'total_3': {'$sum' : {'$cond': [{'$eq': ['$Business', 3]}, '$Money', 0]}},
'total': {'$sum': '$Money'},
}
},
{
'$match': {'$and': [{ 'total_2': {'$gt': 0}}, {'total': {'$gt': 0}},{'total_3':{'$gt':0}}]}
},
{
'$addFields':{
'part_2': { "$multiply": [ { "$divide": ["$total_2","$total"] }, 100 ] },
'part_3': { "$multiply": [{'$divide': ['$total_3','$total']}, 100]}
}
}
])
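To see what the combined `$group` computes, here is an in-memory imitation in plain Python on made-up rows; `business_shares` is a hypothetical name. It sorts the output by date, which the real pipeline would need an extra `{'$sort': {'_id': 1}}` stage to guarantee, because `$group` does not promise any output order.

```python
def business_shares(rows):
    """Hypothetical helper: in-memory imitation of the combined pipeline."""
    by_date = {}
    for r in rows:
        acc = by_date.setdefault(r['Date'], {'total_2': 0, 'total_3': 0, 'total': 0})
        # Like {'$sum': {'$cond': [{'$eq': ['$Business', 2]}, '$Money', 0]}}
        acc['total_2'] += r['Money'] if r['Business'] == 2 else 0
        acc['total_3'] += r['Money'] if r['Business'] == 3 else 0
        acc['total'] += r['Money']  # like {'$sum': '$Money'}
    out = []
    for date, acc in sorted(by_date.items()):
        # Like the $match stage: keep days where all three sums are positive.
        if acc['total_2'] > 0 and acc['total_3'] > 0 and acc['total'] > 0:
            row = {'_id': date, **acc}
            # Like $addFields: $divide by the total, then $multiply by 100.
            row['part_2'] = acc['total_2'] / acc['total'] * 100
            row['part_3'] = acc['total_3'] / acc['total'] * 100
            out.append(row)
    return out


rows = [  # made-up sample data
    {'Date': '2021-01-01', 'Business': 1, 'Money': 50},
    {'Date': '2021-01-01', 'Business': 2, 'Money': 30},
    {'Date': '2021-01-01', 'Business': 3, 'Money': 20},
    {'Date': '2021-01-02', 'Business': 2, 'Money': 10},  # no business 3: filtered out
]
for row in business_shares(rows):
    print(row['_id'], round(row['part_2'], 2), round(row['part_3'], 2))
# 2021-01-01 30.0 20.0
```

As the sample shows, the $match stage drops any day on which business 2 or business 3 earned nothing, so those days are missing from the dataframe rather than showing 0%.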

Return 5 elements for each type with aggregation

How do I create an aggregation that shows me 5 documents for each type?
For example, I need to show 5 documents with type=1, 5 with type=2 and 5 with type=3.
I have tried:
db.items.aggregate([
{$match : { "type" : { $gte:1,$lte:3 }}},
{$project: { "type": 1, "subtipo": 1, "dateupdate": 1, "latide": 1, "long": 1, "view": 1,month: { $month: "$dateupdate" } }},
{$sort:{view: -1, dateupdate: -1}},
{$limit:5}
]);
After the $match stage, add a $group stage that collects the original documents for each type into an array. You can then $slice that array to return 5 elements.
The idea is shown in this example:
db.items.aggregate([
{ '$match' : { 'type': { '$gte': 1, '$lte': 3 } } },
{
'$group': {
'_id': '$type',
'docs': { '$push': '$$ROOT' },
}
},
{
'$project': {
'five_docs': {
'$slice': ['$docs', 5]
}
}
}
])
The above returns 5 documents per type in an array, in no particular order. If you need the top 5 documents in sorted order, add a $sort stage before the $group stage, so that documents enter the $group ordered by the type and dateupdate fields:
db.items.aggregate([
{ '$match' : { 'type': { '$gte': 1, '$lte': 3 } } },
{ '$sort': { 'type': 1, 'dateupdate': -1 } }, // <-- re-order here
{
'$group': {
'_id': '$type',
'docs': { '$push': '$$ROOT' },
}
},
{
'$project': {
'top_five': {
'$slice': ['$docs', 5]
}
}
}
])
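An in-memory imitation of the sorted version, with made-up documents and a hypothetical `top_n_per_type` helper, to show why sorting before the `$group` yields the newest five per type. One caveat: pushing every document of a type into one array can exceed the 16 MB document limit when a type has many documents; servers from 5.2 on also offer the `$topN` accumulator, which avoids building the full array.

```python
import datetime
from itertools import groupby


def top_n_per_type(docs, n=5):
    """Hypothetical helper: imitates $match -> $sort -> $group/$push -> $slice."""
    matched = [d for d in docs if 1 <= d['type'] <= 3]  # like the $match stage
    # Two stable sorts give: type ascending, then dateupdate descending.
    ordered = sorted(matched, key=lambda d: d['dateupdate'], reverse=True)
    ordered = sorted(ordered, key=lambda d: d['type'])
    result = []
    for type_value, group in groupby(ordered, key=lambda d: d['type']):
        pushed = list(group)  # like 'docs': {'$push': '$$ROOT'}
        result.append({'_id': type_value, 'top_five': pushed[:n]})  # like $slice
    return result


# Made-up data: 7 documents for each of types 1-4 (type 4 fails the $match).
docs = [
    {'type': t, 'dateupdate': datetime.date(2020, 1, day)}
    for t in (1, 2, 3, 4)
    for day in range(1, 8)
]
result = top_n_per_type(docs)
print([r['_id'] for r in result])                            # [1, 2, 3]
print([d['dateupdate'].day for d in result[0]['top_five']])  # [7, 6, 5, 4, 3]
```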

How to do HAVING COUNT in MongoDB?

My documents look like this:
{
"_id": ObjectId("5698fcb5585b2de0120eba31"),
"id": "26125242313",
"parent_id": "26125241841",
"link_id": "10024080",
"name": "26125242313",
"author": "gigaquack",
"body": "blogging = creative writing",
"subreddit_id": "6",
"subreddit": "reddit.com",
"score": "27",
"created_utc": "2007-10-22 18:39:31"
}
What I'm trying to do is create a query that finds users who posted to only 1 subreddit. I did this in SQL by using the query:
Select distinct author, subreddit from reddit group by author having count(*) = 1;
I'm trying to do something similar in MongoDB but am having some trouble.
I managed to recreate SELECT DISTINCT with an aggregate $group, but I can't figure out how to do the HAVING COUNT part.
This is what my query looks like:
db.collection.aggregate(
[{"$group":
{ "_id": { author: "$author", subreddit: "$subreddit" } } },
{$match:{count:1}} // This part is not working
])
Am I using $match wrong?
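Your $match finds nothing because the $group stage never creates a count field; add one with $sum: 1. Your query should look like this: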
db.collection.aggregate([{
'$group': {
'_id': {'author': '$author', 'subreddit': '$subreddit'},
'count': {'$sum': 1},
'data': {'$addToSet': '$$ROOT'}}
}, {
'$match': {
'count': {'$eq': 1}
}}])
Here data is a one-element list containing the matched document.
If you only want a specific field, it should look like this:
db.collection.aggregate([{
'$group': {
'_id': {'author': '$author', 'subreddit': '$subreddit'},
'count': {'$sum': 1},
'author': {'$last': '$author'}}
}, {
'$match': {
'count': {'$eq': 1}
}}])
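Note that grouping on the (author, subreddit) pair and keeping count == 1 finds pairs with exactly one post, which is not quite "users who posted to only one subreddit". A sketch of an alternative that targets that goal, in PyMongo style: group by `author` alone, collect the distinct subreddits with `$addToSet`, and keep authors whose set has one element (`$size` is a standard query operator usable inside `$match`). The two in-memory helpers and the sample posts (including the author `alice`) are hypothetical and only imitate what the server would return.

```python
# Alternative pipeline: one group per author, distinct subreddits collected in a set.
ONE_SUBREDDIT_PIPELINE = [
    {'$group': {
        '_id': '$author',
        'subreddits': {'$addToSet': '$subreddit'},
        'posts': {'$sum': 1},
    }},
    {'$match': {'subreddits': {'$size': 1}}},
]


def pairs_with_one_post(posts):
    """Imitates the answer's pipeline: (author, subreddit) pairs seen exactly once."""
    counts = {}
    for p in posts:
        key = (p['author'], p['subreddit'])
        counts[key] = counts.get(key, 0) + 1
    return sorted(k for k, c in counts.items() if c == 1)


def authors_in_one_subreddit(posts):
    """Imitates ONE_SUBREDDIT_PIPELINE: authors whose posts span a single subreddit."""
    subs = {}
    for p in posts:
        subs.setdefault(p['author'], set()).add(p['subreddit'])
    return sorted(a for a, s in subs.items() if len(s) == 1)


posts = [
    {'author': 'gigaquack', 'subreddit': 'reddit.com'},
    {'author': 'gigaquack', 'subreddit': 'reddit.com'},
    {'author': 'alice', 'subreddit': 'reddit.com'},
    {'author': 'alice', 'subreddit': 'python'},
]
print(pairs_with_one_post(posts))       # [('alice', 'python'), ('alice', 'reddit.com')]
print(authors_in_one_subreddit(posts))  # ['gigaquack']
```

With these posts the two approaches disagree completely: the pair-based query returns only alice, who posted in two subreddits, while the author-based one returns gigaquack.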