Mongodb sort and group by - mongodb

I'm not sure my question is phrased correctly, but here goes:
I have a set of documents in MongoDB, like:
[{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}},
{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]
I want to sort them by date and group them, because the same date has multiple rows from different countries.
So the result should look something like this:
[{'2018-07-16T00:00:00.000Z': [{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}}]},
{'2018-10-30T00:00:00.000Z': [{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]}]
I tried:
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {'ccode': 1}}, # ccode or date?
{'$sort': {"date": 1}},
])
But I got an error:
The field * must be an accumulator object
I googled the error and the message is clear, but it does not seem related to my case: I don't need any sum, avg, etc. functions.

Query
group by date and collect the $$ROOT documents
sort the groups by date (ascending here; use -1 for descending); $group does not guarantee the order of its output, so the sort comes after it
replace the root so that the date becomes the key
*This assumes your dates are stored as strings, which is a bad idea. If you convert them to date objects, you can still use the query, but build the key with
"k":{"$dateToString" : {"date" :"$_id"}}
Test code here
db.getCollection('daily_stats').aggregate(
[{"$group":{"_id":"$date", "docs":{"$push":"$$ROOT"}}},
{"$sort":{"_id":1}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$docs"}]]}}}])
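To check the expected shape without a database, the same sort-and-group step can be sketched in plain Python. This only illustrates what the result should look like; it is not MongoDB code, and the helper name `group_by_date` and the trimmed sample documents are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def group_by_date(docs):
    """Return one {date: [docs]} dict per distinct date, in ascending date order."""
    # ISO-8601 strings in the same format sort chronologically as plain strings.
    ordered = sorted(docs, key=itemgetter("date"))
    return [
        {date: list(group)}
        for date, group in groupby(ordered, key=itemgetter("date"))
    ]


# Trimmed-down sample documents (only the fields needed for grouping).
sample = [
    {"ccode": "US", "date": "2018-10-30T00:00:00.000Z"},
    {"ccode": "RU", "date": "2018-07-16T00:00:00.000Z"},
    {"ccode": "US", "date": "2018-07-16T00:00:00.000Z"},
    {"ccode": "RU", "date": "2018-10-30T00:00:00.000Z"},
]
result = group_by_date(sample)
```

Each element of `result` is a single-key dict whose key is the date and whose value is the list of documents for that date.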

When using $group, you need an _id
From the docs
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
In your case...
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {
'_id': "$ccode",
'rates': { $addToSet: '$rates' },
'date': { $first: '$date' }
}},
{'$sort': {"date": 1}},
{'$project': { "_id": 0, "country": "$_id", "rates": 1, "date": 1 }}
])
Playground: https://mongoplayground.net/p/B31XLS9p-6W

Related

Group all elements with same name with their IDs Mongodb

I want to group all elements with same name and find their IDs and $push them in a list.
I have a dataset like
{
'id': 1,
'name': 'Refrigerator'
},
{
'id': 2,
'name': 'Refrigerator'
},
{
'id': 3,
'name': 'TV'
},
{
'id': 4,
'name': 'TV'
}
Expected output
{
'equipment_name': 'Refrigerator',
'equipment_id': [1, 2]
},
{
'equipment_name': 'TV',
'equipment_id': [3, 4]
}
What I've tried
{'$group': {'_id': '$_id', 'equipmne_name': '$name'}}
{'$project': {'name': {'$push': {'$expr': ['$name', '$name']}}}
And a few more aggregation techniques with $cond
[
{'$group': {'_id': {'key': '$name', 'value': '$_id'}}},
{'$group': {'_id': '$_id.key', 'result': {'$push': {'$toString': '$$ROOT._id.value'}}}},
{'$project': {'_id': 0, 'equipment_name': '$_id', 'equipment_id': '$result'}}
]
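For comparison, a single `$group` keyed on the name with a `$push` of the ids should be enough to produce the expected output. The sketch below writes that pipeline as a Python list (not executed here, and assuming the field is called `id` as in the sample data) and mirrors it with a plain-Python helper, `group_ids_by_name`, which is a hypothetical name used only for illustration.

```python
from collections import defaultdict

# Pipeline as it could be passed to a driver's aggregate(); shown for reference only.
pipeline = [
    {"$group": {"_id": "$name", "equipment_id": {"$push": "$id"}}},
    {"$project": {"_id": 0, "equipment_name": "$_id", "equipment_id": 1}},
]


def group_ids_by_name(items):
    """Plain-Python equivalent: collect the ids of all items sharing a name."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["name"]].append(item["id"])
    return [
        {"equipment_name": name, "equipment_id": ids}
        for name, ids in grouped.items()
    ]


equipment = [
    {"id": 1, "name": "Refrigerator"},
    {"id": 2, "name": "Refrigerator"},
    {"id": 3, "name": "TV"},
    {"id": 4, "name": "TV"},
]
result = group_ids_by_name(equipment)
```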

How to properly use $group operator in MongoDB?

I am currently struggling with the MongoDB query in which I want to group data by 2 fields myId and myType, but the results that I get in return don't look like what I need.
My goal is to have for each myId results with myType grouping. Like:
myId : {myType1 : 5, myType2 : 3, myType3 : 1}
But when I am trying to provide query with group operator like below:
db.collection.aggregate([{
"$project": {
"myId": "$myId",
"myType": "$eventType",
}
},
{
"$group":{
"_id":{
"myId":"$myId",
"myType":"$Type"
},
"count":{
"$sum":1
}
}
}
])
Results returned by this kind of grouping look like this:
[{'_id': {'myId': 'qwerty123', 'myType': 'created', 'count': 1}},
{'_id': {'myId': 'qwerty123', 'myType': 'removed', 'count': 3}},
{'_id': {'myId': 'qwerty123', 'myType': 'updated', 'count': 2}},
{'_id': {'myId': 'asd123', 'myType': 'created', 'count': 1}},
{'_id': {'myId': 'asd123', 'myType': 'removed', 'count': 2}}]
But what I would like to achieve is a structure like below:
[{'_id': {'myId': 'qwerty123', 'myType': {'created': 1, 'removed': 3, 'updated': 2}}},
{'_id': {'myId': 'asd123', 'myType': {'created': 1, 'removed': 2}}}]
Or maybe like this:
[{'qwerty123', 'myType': {'created': 1, 'removed': 3, 'updated': 2}},
{'asd123', 'myType': {'created': 1, 'removed': 2}}]
Is it possible to achieve results from $group operator with the above schema? If yes, how can I achieve it?
Thank you.
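Add the stage below after your existing $group stage. It reads the count from "$_id.count" to match the output shown above; if count is a top-level field in your results, use "$count" instead.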
db.collection.aggregate([
{ $group: {
_id: "$_id.myId",
myType: {
$push: {
$arrayToObject: [
[
{
k: "$_id.myType",
v: "$_id.count"
}
]
]
}
}
}}
])
MongoPlayground
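The stage above pushes one single-key object per type, so myType ends up as an array of objects. If one merged object per myId is wanted, as in the question, the target shape can be sketched in plain Python. This is an illustration only: the helper name `merge_type_counts` is hypothetical, and the rows assume that `count` is a top-level field next to `_id`.

```python
from collections import defaultdict


def merge_type_counts(rows):
    """Collapse one row per (myId, myType) pair into one row per myId."""
    merged = defaultdict(dict)
    for row in rows:
        key = row["_id"]
        merged[key["myId"]][key["myType"]] = row["count"]
    return [{"_id": my_id, "myType": counts} for my_id, counts in merged.items()]


# Rows shaped like the output of the first $group stage (count assumed top-level).
rows = [
    {"_id": {"myId": "qwerty123", "myType": "created"}, "count": 1},
    {"_id": {"myId": "qwerty123", "myType": "removed"}, "count": 3},
    {"_id": {"myId": "qwerty123", "myType": "updated"}, "count": 2},
    {"_id": {"myId": "asd123", "myType": "created"}, "count": 1},
    {"_id": {"myId": "asd123", "myType": "removed"}, "count": 2},
]
result = merge_type_counts(rows)
```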

Getting last entry of the months from mongo collection

Say the collection store data in the below format. Every day a new entry is added in the collection. Dates are in ISO format.
|id|dt|data|
---
|1|2021-03-17|{key:"A", value:"B"}
...
|1|2021-03-14|{key:"A", value:"B"}
...
|1|2021-02-28|{key:"A", value:"B"}
|1|2021-02-27|{key:"A", value:"B"}
...
|1|2021-02-01|{key:"A", value:"B"}
|1|2021-01-31|{key:"A", value:"B"}
|1|2021-01-30|{key:"A", value:"B"}
...
|1|2021-01-01|{key:"A", value:"B"}
|1|2020-12-31|{key:"A", value:"B"}
...
|1|2020-11-30|{key:"A", value:"B"}
...
I need help with a query that returns the last available entry of each month for a given period. The query below is what I have so far; it does not return the last day of the current month, because I am sorting by day, month and year.
db.getCollection('data').aggregate([
{
$match: {dt: {$gt: ISODate("2020-01-01")}}
},
{
$project: {
dt: "$dt",
month: {
$month: "$dt"
},
day: {
$dayOfMonth: "$dt"
},
year: {
$year: "$dt"
},
data: "$data"
}
},
{
$sort: {day: -1, month: -1, year: -1}
},
{ $limit: 24},
{
$sort: {dt: -1}
},
])
The results I am after is:
|1|2021-03-17|{key:"A", value:"B"}
|1|2021-02-28|{key:"A", value:"B"}
|1|2021-01-31|{key:"A", value:"B"}
|1|2020-12-31|{key:"A", value:"B"}
|1|2020-11-30|{key:"A", value:"B"}
...
|1|2020-01-31|{key:"A", value:"B"}
Sort by dt descending, group the records by year and month, and take the first document of each group, which is the latest one for that month.
db.getCollection('data').aggregate([
{ $match: { dt: { $gt: ISODate("2020-01-01") } } },
{ $sort: { dt: -1 } }, // newest first, so $first below picks each month's latest document
{ $group: { // group by
_id: { $substr: ['$dt', 0, 7] }, // get year and month eg 2020-01
dt: { $max: "$dt" }, // find the max date
doc:{ "$first" : "$$ROOT" } } // to get the document
},
{ "$replaceRoot": { "newRoot": "$doc"} }, // project the document
{ $sort: { dt: -1 } }
]);
$substr
$group
$replaceRoot
$max
$first
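The idea of keeping the latest entry for each year and month can also be checked in plain Python without a database. The sketch below is an illustration under assumptions: the helper name `last_entry_per_month` is hypothetical, and the rows are a small subset of the sample data with `dt` stored as `datetime.date` values.

```python
from datetime import date


def last_entry_per_month(rows):
    """Keep, for every (year, month), the row with the greatest dt."""
    latest = {}
    for row in rows:
        key = (row["dt"].year, row["dt"].month)
        if key not in latest or row["dt"] > latest[key]["dt"]:
            latest[key] = row
    # Newest month first, like the final $sort on dt descending.
    return sorted(latest.values(), key=lambda r: r["dt"], reverse=True)


rows = [
    {"id": 1, "dt": date(2021, 2, 27)},
    {"id": 1, "dt": date(2021, 3, 14)},
    {"id": 1, "dt": date(2021, 1, 31)},
    {"id": 1, "dt": date(2021, 3, 17)},
    {"id": 1, "dt": date(2021, 2, 28)},
    {"id": 1, "dt": date(2021, 1, 30)},
]
result = last_entry_per_month(rows)
```

For the current month this returns the latest entry that exists (2021-03-17 in the sample), which matches the result the question asks for.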
I sketched a possible solution for you in Python, but without your DB I can't be sure that it works.
First, there's a function that takes an integer representing a month and returns the last day of that month (the year is hard-coded to 2021, and month 0 gives 2020-12-31).
import datetime as dt
def last_day_of_month(month):
    # First day of the following month minus one day; valid for month 0 to 11.
    return dt.datetime(2021, month+1, 1) - dt.timedelta(days=1)
Next, I built the query with a separate function.
def build_query(last_month):
    return [
        {
            "$and": [
                {"date": {"$gte": last_day_of_month(i)}},
                {"date": {"$lt": last_day_of_month(i) + dt.timedelta(days=1)}}
            ]
        }
        for i in range(0, last_month)
    ]
Here's the output. It would be inside an $or operator in the $match stage.
{'$match': {'$or': [{'$and': [{'date': {'$gte': datetime.datetime(2020, 12, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 1, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 1, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 2, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 2, 28, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 3, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 3, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 4, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 4, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 5, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 5, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 6, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 6, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 7, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 7, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 8, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 8, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 9, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 9, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 10, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 10, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 11, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 11, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 12, 1, 0, 0)}}]}]}}

How to get the sum of an array with mapReduce in MongoDB?

Given the following database schema:
{
'_id': 5079,
'name': 'Lincoln County',
'state': 'AR',
'population': 13024,
'cases': [{'date': '2020-03-16', 'count': 1}, {'date': '2020-03-22', 'count': 1},
{'date': '2020-03-24', 'count': 1}, {'date': '2020-03-26', 'count': 2}],
'deaths': [{'date': '2020-03-27', 'count': 1}, {'date': '2020-04-02', 'count': 1},
{'date': '2020-05-28', 'count': 2}, {'date': '2020-05-30', 'count': 1}]
}
Which MongoDB mapReduce function would generate a collection with the total number of COVID-19 cases for each state? It should produce one record per state, with its 2-letter abbreviation and its total number of cases.
Try this query:
db.collection.aggregate([
{
"$project": {
"total": {
"$sum": {
"$map": {
"input": "$cases",
"as": "c",
"in": "$$c.count"
}
}
},
"state": 1
}
}
])
Example here
The query uses $map to build an array of the cases.count values and then applies $sum to it.
The output fields are total, which holds that sum, and state, which is kept via state: 1.
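The query above yields one total per document, that is, per county, while the question asks for one record per state. One way to get there would be a further $group on state that sums the per-document totals. The plain-Python sketch below shows the intended result; the helper name `cases_per_state` and the two extra counties are hypothetical and only serve the example.

```python
from collections import Counter


def cases_per_state(counties):
    """Sum every cases[].count value per state."""
    totals = Counter()
    for county in counties:
        totals[county["state"]] += sum(c["count"] for c in county.get("cases", []))
    return [{"state": state, "total": total} for state, total in sorted(totals.items())]


counties = [
    # Document from the question (cases only).
    {"name": "Lincoln County", "state": "AR",
     "cases": [{"date": "2020-03-16", "count": 1}, {"date": "2020-03-22", "count": 1},
               {"date": "2020-03-24", "count": 1}, {"date": "2020-03-26", "count": 2}]},
    # Hypothetical extra documents, made up for the example.
    {"name": "County B", "state": "AR", "cases": [{"date": "2020-03-20", "count": 3}]},
    {"name": "County C", "state": "TX",
     "cases": [{"date": "2020-03-18", "count": 2}, {"date": "2020-03-19", "count": 2}]},
]
result = cases_per_state(counties)
```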

Group by several fields and custom sums with two conditions

I want to group rows with two conditions. The first one to get total (now it works), the second to get unread messages. I cannot imagine how to do it. Inserts are:
db.messages.insert({'source_user': 'test1', 'destination_user': 'test2', 'is_read': true})
db.messages.insert({'source_user': 'test1', 'destination_user': 'test2', 'is_read': false})
db.messages.insert({'source_user': 'test1', 'destination_user': 'test3', 'is_read': true})
my code:
db.messages.aggregate([
{'$match': {'source_user': user}},
{'$group': {
'_id': {
'source_user': '$source_user',
'destination_user': '$destination_user',
'is_read': '$is_read'
},
'total': {'$sum': 1}}
},
{'$project': {
'source_user': '$_id.source_user',
'destination_user': '$_id.destination_user',
#'unread': {'$sum': {'$_id.is_read': False}},
'total': '$total',
'_id': 0
}}
])
as a result I want to get:
[{
'source_user': 'test1',
'destination_user': 'test2',
'unread': 1,
'total': 2
}, {
'source_user': 'test1',
'destination_user': 'test3',
'unread': 0,
'total': 1
}
]
Should I add a new group, or can I use the is_read flag in the same group?
Thank you!
You can count unread messages the same way you count the total, but use $cond so that read messages add 0 and unread ones add 1. Note that is_read has to be removed from the group _id:
db.messages.aggregate([
{'$match': {'source_user': user}},
{'$group': {
'_id': {
'source_user': '$source_user',
'destination_user': '$destination_user'
},
'total': {'$sum': 1},
'unread': {'$sum': { '$cond': [ '$is_read', 0, 1 ] }}
}
},
{'$project': {
'source_user': '$_id.source_user',
'destination_user': '$_id.destination_user',
'total': 1,
'unread': 1,
'_id': 0
}}
])
MongoDB Playground
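The conditional sum can be sanity-checked in plain Python against the three inserted messages. This is only a sketch of the counting logic, not MongoDB code, and the helper name `message_stats` is hypothetical.

```python
from collections import defaultdict


def message_stats(messages, source_user):
    """Count total and unread messages per (source_user, destination_user) pair."""
    stats = defaultdict(lambda: {"total": 0, "unread": 0})
    for msg in messages:
        if msg["source_user"] != source_user:  # same filter as the $match stage
            continue
        key = (msg["source_user"], msg["destination_user"])
        stats[key]["total"] += 1
        # Same idea as $cond: add 0 for read messages and 1 for unread ones.
        stats[key]["unread"] += 0 if msg["is_read"] else 1
    return [
        {"source_user": src, "destination_user": dst, **counts}
        for (src, dst), counts in stats.items()
    ]


messages = [
    {"source_user": "test1", "destination_user": "test2", "is_read": True},
    {"source_user": "test1", "destination_user": "test2", "is_read": False},
    {"source_user": "test1", "destination_user": "test3", "is_read": True},
]
result = message_stats(messages, "test1")
```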