I want to group all elements with the same name, collect their IDs, and $push them into a list.
I have a dataset like this:
{
'id': 1,
'name': 'Refrigerator'
},
{
'id': 2,
'name': 'Refrigerator'
},
{
'id': 3,
'name': 'TV'
},
{
'id': 4,
'name': 'TV'
}
Expected output
{
'equipment_name': 'Refrigerator',
'equipment_id': [1, 2]
},
{
'equipment_name': 'TV',
'equipment_id': [3, 4]
}
What I've tried
{'$group': {'_id': '$_id', 'equipmne_name': '$name'}}
{'$project': {'name': {'$push': {'$expr': ['$name', '$name']}}}
And a few more aggregation techniques with $cond
[
{'$group': {'_id': {'key': '$name', 'value': '$_id'}}},
{'$group': {'_id': '$_id.key', 'result': {'$push': {'$toString': '$$ROOT._id.value'}}}},
{'$project': {'_id': 0, 'equipment_name': '$_id', 'equipment_id': '$result'}}
]
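A minimal sketch of the same grouping, assuming the field is called `id` as in the sample data (the pipeline above reads `$_id` and converts the values to strings, which would not give the integer lists shown in the expected output). The `pipeline` list is a candidate for pymongo's `aggregate()` and has not been run against a server here; `group_ids_by_name` is a hypothetical helper that only emulates the stages in plain Python so the result can be checked locally.

```python
from collections import defaultdict

# Sample documents copied from the question (the field is `id`, not `_id`).
docs = [
    {"id": 1, "name": "Refrigerator"},
    {"id": 2, "name": "Refrigerator"},
    {"id": 3, "name": "TV"},
    {"id": 4, "name": "TV"},
]

# Candidate pipeline: one $group with $push, then a $project to rename fields.
pipeline = [
    {"$group": {"_id": "$name", "equipment_id": {"$push": "$id"}}},
    {"$project": {"_id": 0, "equipment_name": "$_id", "equipment_id": 1}},
]


def group_ids_by_name(documents):
    """Plain-Python emulation of the $group/$push and $project stages."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["name"]].append(doc["id"])  # $push appends every value
    return [
        {"equipment_name": name, "equipment_id": ids}
        for name, ids in grouped.items()
    ]


result = group_ids_by_name(docs)
```

MongoDB does not guarantee the order of the groups that $group emits, so add a $sort stage if the order matters.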
I'm not sure that my question is correct, but it seems so:
I have a set of documents in my MongoDB collection, like:
[{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}},
{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]
I want to sort them by date and combine the rows that share a date, because each date has several rows from different countries.
So the result should look something like this:
[{'2018-07-16T00:00:00.000Z': [{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}}]},
{'2018-10-30T00:00:00.000Z': [{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]}]
I tried:
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {'ccode': 1}}, # ccode or date?
{'$sort': {"date": 1}},
])
But I got an error:
The field * must be an accumulator object
I googled the error. The message is clear enough, but it does not seem related to my case: I don't need any sum, avg, or similar functions.
Query
sort by date (ascending here; use -1 for descending)
group by date and collect the $$ROOT documents
replace the root so that the date becomes the key
*This assumes the dates are stored as strings, which is a bad idea. If you convert them to date objects you can still use this query, but build the key with
"k":{"$dateToString" : {"date" :"$_id"}}
Test code here
aggregate(
[{"$sort":{"date":1}},
{"$group":{"_id":"$date", "docs":{"$push":"$$ROOT"}}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$docs"}]]}}}])
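To make the shape of the result concrete, here is a plain-Python emulation of the three stages above. It is only a sketch: `group_by_date` is a hypothetical helper, the `rates` sub-documents are left out of the sample rows for brevity, and nothing here talks to MongoDB.

```python
from itertools import groupby
from operator import itemgetter

# Rows from the question, deliberately shuffled and without `rates`.
rows = [
    {"_id": "5b4c9aa7ddc752c1f5844315", "ccode": "RU", "date": "2018-07-16T00:00:00.000Z"},
    {"_id": "5bd88204af4c814883a414b2", "ccode": "US", "date": "2018-10-30T00:00:00.000Z"},
    {"_id": "5b4cad0dddc752c1f5844322", "ccode": "US", "date": "2018-07-16T00:00:00.000Z"},
    {"_id": "5bd88204af4c814883a414b3", "ccode": "RU", "date": "2018-10-30T00:00:00.000Z"},
]


def group_by_date(documents):
    """Sort by date, collect whole documents per date, key each group by its date."""
    ordered = sorted(documents, key=itemgetter("date"))  # like $sort
    return [
        {date: list(group)}  # like $group + $replaceRoot/$arrayToObject
        for date, group in groupby(ordered, key=itemgetter("date"))
    ]


by_date = group_by_date(rows)
```

Sorting ISO-8601 strings lexicographically happens to match chronological order, which is why the string dates still sort correctly in this sample.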
When using $group, you need an _id
From the docs
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
In your case...
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {
'_id': "$ccode",
'rates': { $addToSet: '$rates' },
'date': { $first: '$date' }
}},
{'$sort': {"date": 1}},
{'$project': { "_id": 0, "country": "$_id", "rates": 1, "date": 1 }}
])
Playground: https://mongoplayground.net/p/B31XLS9p-6W
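For readers who want to see what the accumulators in this pipeline do, the following plain-Python sketch mimics $addToSet and $first when grouping by `ccode`. `group_by_country` is a hypothetical helper and the `rates` objects are trimmed to two keys; it is an illustration of the semantics, not the server's implementation.

```python
rows = [
    {"ccode": "RU", "date": "2018-07-16T00:00:00.000Z",
     "rates": {"reg_emails_confirmed": 4, "registered": 1}},
    {"ccode": "US", "date": "2018-07-16T00:00:00.000Z",
     "rates": {"reg_emails_confirmed": 4, "registered": 2}},
    {"ccode": "US", "date": "2018-10-30T00:00:00.000Z",
     "rates": {"reg_emails_confirmed": 2, "registered": 1}},
    {"ccode": "RU", "date": "2018-10-30T00:00:00.000Z",
     "rates": {"reg_emails_confirmed": 2, "registered": 1}},
]


def group_by_country(documents):
    """Emulate {_id: '$ccode', rates: {$addToSet}, date: {$first}} plus $sort."""
    groups = {}
    for doc in documents:
        group = groups.setdefault(
            doc["ccode"],
            # $first: keep the date of the first document seen for this key
            {"country": doc["ccode"], "rates": [], "date": doc["date"]},
        )
        if doc["rates"] not in group["rates"]:  # $addToSet skips duplicates
            group["rates"].append(doc["rates"])
    return sorted(groups.values(), key=lambda g: g["date"])  # $sort by date


per_country = group_by_country(rows)
```

Note that grouping by `ccode` keeps a single date per country. If the goal is one entry per date, group on `$date` instead.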
Let's say we have a collection containing the following documents:
[
{'_id': ..., 'name': 'Type A', 'version': 1, ...},
{'_id': ..., 'name': 'Type B', 'version': 1, ...},
{'_id': ..., 'name': 'Type B', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 3, ...},
{'_id': ..., 'name': 'Type C', 'version': 1, ...},
{'_id': ..., 'name': 'Type C', 'version': 2, ...},
{'_id': ..., 'name': 'Type A', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 4, ...},
{'_id': ..., 'name': 'Type A', 'version': 3, ...},
{'_id': ..., 'name': 'Type B', 'version': 5, ...},
]
I want to return a list containing the documents with the highest version for their respective name, such that the return would look like this, essentially returning the $$ROOT for each distinct name with the highest version:
[
{'_id': ..., 'name': 'Type A', 'version': 3, ...},
{'_id': ..., 'name': 'Type C', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 5, ...},
]
I know that I need to use the aggregation pipeline with $group, $sort and $limit, but I can't get the result I'm after.
$sort by version in descending order
$group by name and take the first $$ROOT document of each group
(optional) $replaceRoot to promote the stored document back to the top level
pipeline = [
{ $sort: { version: -1 } },
{
$group: {
_id: "$name",
root: { $first: "$$ROOT" }
}
},
{ $replaceRoot: { newRoot: "$root" } }
]
result = db.collection.aggregate(pipeline)
Playground
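The sort-then-take-first idea can be checked without a database. The sketch below is a plain-Python emulation of the pipeline; `latest_per_name` is a hypothetical helper and the documents carry only the `name` and `version` fields from the question.

```python
docs = [
    {"name": "Type A", "version": 1},
    {"name": "Type B", "version": 1},
    {"name": "Type B", "version": 2},
    {"name": "Type B", "version": 3},
    {"name": "Type C", "version": 1},
    {"name": "Type C", "version": 2},
    {"name": "Type A", "version": 2},
    {"name": "Type B", "version": 4},
    {"name": "Type A", "version": 3},
    {"name": "Type B", "version": 5},
]


def latest_per_name(documents):
    """Sort by version descending, then keep the first document seen per name."""
    latest = {}
    for doc in sorted(documents, key=lambda d: d["version"], reverse=True):
        latest.setdefault(doc["name"], doc)  # behaves like {$first: '$$ROOT'}
    return list(latest.values())


newest = latest_per_name(docs)
```

Because the documents arrive sorted by version, the first one seen for each name is necessarily the one with the highest version, which is why $first works here.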
Given the following document schema:
{
'_id': 5079,
'name': 'Lincoln County',
'state': 'AR',
'population': 13024,
'cases': [{'date': '2020-03-16', 'count': 1}, {'date': '2020-03-22', 'count': 1},
{'date': '2020-03-24', 'count': 1}, {'date': '2020-03-26', 'count': 2}],
'deaths': [{'date': '2020-03-27', 'count': 1}, {'date': '2020-04-02', 'count': 1},
{'date': '2020-05-28', 'count': 2}, {'date': '2020-05-30', 'count': 1}]
}
What MongoDB mapReduce function would generate a collection of the total number of COVID-19 cases for each state? It should produce one record per state, containing the state's 2-letter abbreviation and its total case count.
Try this query:
db.collection.aggregate([
{
"$project": {
"total": {
"$sum": {
"$map": {
"input": "$cases",
"as": "c",
"in": "$$c.count"
}
}
},
"state": 1
}
}
])
Example here
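The query uses $map to build an array of the cases.count values and then applies $sum to that array.
The output fields are total, which holds the $sum, and state, which is kept via state: 1. Note that this yields one total per document (that is, per county); to get one record per state, add a $group stage on $state that sums total.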
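A sketch of the per-state roll-up follows. The `pipeline` list reuses the $project stage from the answer and appends a $group on `$state`; it is written for pymongo's `aggregate()` but has not been run against a server here. `total_cases_by_state` is a hypothetical plain-Python equivalent, and the second and third sample counties are invented purely for illustration.

```python
from collections import Counter

counties = [
    {"_id": 5079, "name": "Lincoln County", "state": "AR",
     "cases": [{"date": "2020-03-16", "count": 1}, {"date": "2020-03-22", "count": 1},
               {"date": "2020-03-24", "count": 1}, {"date": "2020-03-26", "count": 2}]},
    # The two documents below are made-up sample data, not from the question.
    {"_id": 1, "name": "Example County", "state": "AR",
     "cases": [{"date": "2020-03-20", "count": 3}]},
    {"_id": 2, "name": "Sample County", "state": "TX",
     "cases": [{"date": "2020-03-18", "count": 4}]},
]

# Candidate pipeline: per-document total, then one group per state.
pipeline = [
    {"$project": {
        "state": 1,
        "total": {"$sum": {"$map": {"input": "$cases", "as": "c", "in": "$$c.count"}}},
    }},
    {"$group": {"_id": "$state", "total": {"$sum": "$total"}}},
]


def total_cases_by_state(documents):
    """Sum every cases[].count per state, like the two stages above."""
    totals = Counter()
    for doc in documents:
        totals[doc["state"]] += sum(c["count"] for c in doc.get("cases", []))
    return dict(totals)


state_totals = total_cases_by_state(counties)
```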
I want to group rows and compute two counts: the total number of messages (this already works) and the number of unread messages. I can't figure out how to do the second one. The inserts are:
db.messages.insert({'source_user': 'test1', 'destination_user': 'test2', 'is_read': true})
db.messages.insert({'source_user': 'test1', 'destination_user': 'test2', 'is_read': false})
db.messages.insert({'source_user': 'test1', 'destination_user': 'test3', 'is_read': true})
My code:
db.messages.aggregate([
{'$match': {'source_user': user}},
{'$group': {
'_id': {
'source_user': '$source_user',
'destination_user': '$destination_user',
'is_read': '$is_read'
},
'total': {'$sum': 1}}
},
{'$project': {
'source_user': '$_id.source_user',
'destination_user': '$_id.destination_user',
#'unread': {'$sum': {'$_id.is_read': False}},
'total': '$total',
'_id': 0
}}
])
As a result I want to get:
[{
'source_user': 'test1',
'destination_user': 'test2',
'unread': 1,
'total': 2
}, {
'source_user': 'test1',
'destination_user': 'test3',
'unread': 0,
'total': 1
}
]
Should I add a second $group stage, or can I use the $is_read flag in the same group?
Thank you!
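You can count unread messages the same way you count the total, but use $cond so that each read message adds 0 and each unread message adds 1. Note that is_read must also be removed from the group _id, otherwise read and unread messages end up in separate groups: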
db.messages.aggregate([
{'$match': {'source_user': user}},
{'$group': {
'_id': {
'source_user': '$source_user',
'destination_user': '$destination_user'
},
'total': {'$sum': 1},
'unread': {'$sum': { '$cond': [ '$is_read', 0, 1 ] }}
}
},
{'$project': {
'source_user': '$_id.source_user',
'destination_user': '$_id.destination_user',
'total': 1,
'unread': 1,
'_id': 0
}}
])
MongoDB Playground
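The conditional sum can be verified locally with a plain-Python emulation. This is only a sketch: `count_messages` is a hypothetical helper that mirrors the $match, $group and $project stages above, using the three inserted messages from the question.

```python
messages = [
    {"source_user": "test1", "destination_user": "test2", "is_read": True},
    {"source_user": "test1", "destination_user": "test2", "is_read": False},
    {"source_user": "test1", "destination_user": "test3", "is_read": True},
]


def count_messages(msgs, user):
    """Count total and unread messages per (source, destination) pair."""
    stats = {}
    for msg in msgs:
        if msg["source_user"] != user:  # like $match
            continue
        key = (msg["source_user"], msg["destination_user"])  # the group _id
        entry = stats.setdefault(key, {
            "source_user": key[0],
            "destination_user": key[1],
            "unread": 0,
            "total": 0,
        })
        entry["total"] += 1                           # {'$sum': 1}
        entry["unread"] += 0 if msg["is_read"] else 1  # {'$sum': {'$cond': [...]}}
    return list(stats.values())


summary = count_messages(messages, "test1")
```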
Dealing with $lookup was fun until I tried to make a join within the same collection.
Say I have the following collection:
{'_id': ObjectId('5a1a62026462db0032897179'),
'department': ObjectId('5a1982646462db032d58c3f9'),
'name': 'Standards and Quality Department',
'type': 'sub'}, {
'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': false,
'desc': 'Operations Department',
'type': 'main'}
As you can see, documents link back to other documents in the same collection through the department key, which is false for a top-level department.
I'm using the following query (Python) to populate the results:
query = [{'$lookup': {'as': '__department',
'foreignField': '_id',
'from': 'departments',
'localField': 'department'}},
{'$unwind': '$__department'},
{'$group': {'__department': {'$first': '$__department'},
'_id': '$_id',
'department': {'$first': '$department'},
'name': {'$first': '$name'},
'type': {'$first': '$type'}}}]
for doc in conn.db.departments.aggregate(query): pprint(doc)
What I'm expecting to get:
{'__department': None,
'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': false,
'name': 'Operations Department',
'type': 'main'},
{'__department': {'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': 'false',
'name': 'Operations Department',
'type': 'main'},
'_id': ObjectId('5a1a62026462db0032897179'),
'department': ObjectId('5a1982646462db032d58c3f9'),
'name': 'Standards and Quality Department',
'type': 'sub'}
What I'm actually getting is:
{'__department': {'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': 'false',
'name': 'Operations Department',
'type': 'main'},
'_id': ObjectId('5a1a62026462db0032897179'),
'department': ObjectId('5a1982646462db032d58c3f9'),
'name': 'Standards and Quality Department',
'type': 'sub'}
I'm not sure why only one document is left after $unwind, although before applying $unwind I do get both of them separately.
Any suggestions?
That is because $lookup creates an empty __department array in any document that has no match. This is what your orphan document looks like:
{
"_id" : ObjectId("5a1982646462db032d58c3f9"),
"department" : false,
"desc" : "Operations Department",
"type" : "main",
"__department" : []
}
$unwind emits one document per array element, so a document with an empty array produces no output and gets lost in the process. If you want to keep it, you have to "normalize" the array by adding this stage after your $lookup and before your $unwind:
{
$project: {
_id: 1,
department: 1,
name: 1,
type: 1,
__department: {
$cond: [{
$eq: ["$__department", []]
},
[{
_id: 0,
department: "None",
desc: "None",
type: "None"
}], '$__department'
]
}
}
}
So all together it should look like this:
[{
'$lookup': {
'as': '__department',
'foreignField': '_id',
'from': 'departments',
'localField': 'department'
}
},
{
'$project': {
_id: 1,
department: 1,
name: 1,
type: 1,
__department: {
$cond: [{
$eq: ["$__department", []]
},
[{
_id: 0,
department: "None",
desc: "None",
type: "None"
}], '$__department'
]
}
}
},
{'$unwind': "$__department"},
{'$group': {'__department': {'$first': '$__department'},
'_id': '$_id',
'department': {'$first': '$department'},
'name': {'$first': '$name'},
'type': {'$first': '$type'}}}]
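As an alternative to the $project normalization, $unwind also accepts a document form with preserveNullAndEmptyArrays (available since MongoDB 3.2), which keeps documents whose array is empty. The sketch below shows that stage as a Python dict and emulates both behaviours with a hypothetical `unwind` helper. The helper only handles list values, and the ObjectIds are replaced with plain strings so the example runs without a driver installed.

```python
# Stage form that keeps unmatched documents (not executed against a server here).
unwind_stage = {
    "$unwind": {"path": "$__department", "preserveNullAndEmptyArrays": True}
}

# Documents as they look right after $lookup; ids are plain strings here.
joined = [
    {"_id": "5a1a62026462db0032897179", "name": "Standards and Quality Department",
     "type": "sub",
     "__department": [{"_id": "5a1982646462db032d58c3f9", "type": "main"}]},
    {"_id": "5a1982646462db032d58c3f9", "desc": "Operations Department",
     "type": "main", "__department": []},
]


def unwind(documents, field, preserve_empty=False):
    """Emulate $unwind for list-valued fields only."""
    out = []
    for doc in documents:
        values = doc.get(field)
        if isinstance(values, list) and values:
            # One output document per array element.
            out.extend({**doc, field: value} for value in values)
        elif preserve_empty:
            # An empty array is kept, with the field omitted from the output.
            out.append({k: v for k, v in doc.items() if k != field})
    return out


dropped = unwind(joined, "__department")
kept = unwind(joined, "__department", preserve_empty=True)
```

With this form the top-level department comes through without a __department field, so a later $group with $first yields null for it rather than a placeholder object.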