Group by name, then select one document of each name with highest arbitrary field value - mongodb

Let's say we have a collection containing the following documents:
[
{'_id': ..., 'name': 'Type A', 'version': 1, ...},
{'_id': ..., 'name': 'Type B', 'version': 1, ...},
{'_id': ..., 'name': 'Type B', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 3, ...},
{'_id': ..., 'name': 'Type C', 'version': 1, ...},
{'_id': ..., 'name': 'Type C', 'version': 2, ...},
{'_id': ..., 'name': 'Type A', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 4, ...},
{'_id': ..., 'name': 'Type A', 'version': 3, ...},
{'_id': ..., 'name': 'Type B', 'version': 5, ...},
]
I want to return a list containing, for each name, the document with the highest version, essentially returning the $$ROOT of the highest-version document for each distinct name. The result would look like this:
[
{'_id': ..., 'name': 'Type A', 'version': 3, ...},
{'_id': ..., 'name': 'Type C', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 5, ...},
]
I know that I need to use the aggregation pipeline with $group, $sort and $limit, but I can't seem to get what I'm trying to achieve.

$sort by version in descending order
$group by name and take the first document ($$ROOT) of each group, which is now the one with the highest version
(optional) $replaceRoot to promote that grouped document back to the top level
pipeline = [
{ $sort: { version: -1 } },
{
$group: {
_id: "$name",
root: { $first: "$$ROOT" }
}
},
{ $replaceRoot: { newRoot: "$root" } }
]
result = db.collection.aggregate(pipeline)
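To see why $sort followed by $group/$first picks the highest version, here is a rough pure-Python model of the three stages, with plain lists and dicts standing in for the collection. `latest_per_name` is a hypothetical helper, not a MongoDB or pymongo API. Note that $group does not guarantee the order of its output documents; add a final `{ $sort: { name: 1 } }` if the order matters.

```python
from typing import Any

Doc = dict[str, Any]

def latest_per_name(docs: list[Doc], key: str = "name", field: str = "version") -> list[Doc]:
    # { $sort: { version: -1 } }: highest version first (sorted() is stable)
    ordered = sorted(docs, key=lambda d: d[field], reverse=True)
    # { $group: { _id: "$name", root: { $first: "$$ROOT" } } }
    # setdefault keeps only the first document seen for each name
    groups: dict[Any, Doc] = {}
    for doc in ordered:
        groups.setdefault(doc[key], {"_id": doc[key], "root": doc})
    # { $replaceRoot: { newRoot: "$root" } }: unwrap the stored document
    return [group["root"] for group in groups.values()]

docs = [
    {"name": "Type A", "version": 1}, {"name": "Type B", "version": 1},
    {"name": "Type B", "version": 2}, {"name": "Type B", "version": 3},
    {"name": "Type C", "version": 1}, {"name": "Type C", "version": 2},
    {"name": "Type A", "version": 2}, {"name": "Type B", "version": 4},
    {"name": "Type A", "version": 3}, {"name": "Type B", "version": 5},
]

for doc in sorted(latest_per_name(docs), key=lambda d: d["name"]):
    print(doc)
```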


Group all elements with same name with their IDs Mongodb

I want to group all elements with the same name, find their IDs, and $push them into a list.
I have a dataset like
{
'id': 1,
'name': 'Refrigerator'
},
{
'id': 2,
'name': 'Refrigerator'
},
{
'id': 3,
'name': 'TV'
},
{
'id': 4,
'name': 'TV'
}
Expected output
{
'equipment_name': 'Refrigerator',
'equipment_id': [1, 2]
},
{
'equipment_name': 'TV',
'equipment_id': [3, 4]
}
What I've tried
{'$group': {'_id': '$_id', 'equipmne_name': '$name'}}
{'$project': {'name': {'$push': {'$expr': ['$name', '$name']}}}
And a few more aggregation techniques with $cond
[
{'$group': {'_id': {'key': '$name', 'value': '$_id'}}},
{'$group': {'_id': '$_id.key', 'result': {'$push': {'$toString': '$$ROOT._id.value'}}}},
{'$project': {'_id': 0, 'equipment_name': '$_id', 'equipment_id': '$result'}}
]
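If the IDs you want are the `id` values shown in the dataset (an assumption; the answer above groups on `_id` and converts it with $toString), a single $group with $push may be enough: `[{'$group': {'_id': '$name', 'equipment_id': {'$push': '$id'}}}, {'$project': {'_id': 0, 'equipment_name': '$_id', 'equipment_id': 1}}]`. The plain-Python sketch below models what those two stages compute; `ids_by_name` is a hypothetical helper.

```python
from typing import Any

def ids_by_name(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # {'$group': {'_id': '$name', 'equipment_id': {'$push': '$id'}}}
    groups: dict[Any, list[Any]] = {}
    for doc in docs:
        groups.setdefault(doc["name"], []).append(doc["id"])
    # {'$project': {'_id': 0, 'equipment_name': '$_id', 'equipment_id': 1}}
    return [{"equipment_name": name, "equipment_id": ids} for name, ids in groups.items()]

data = [
    {"id": 1, "name": "Refrigerator"},
    {"id": 2, "name": "Refrigerator"},
    {"id": 3, "name": "TV"},
    {"id": 4, "name": "TV"},
]
print(ids_by_name(data))
```

$push keeps duplicate IDs; the two-stage $group in the answer above (or $addToSet) removes duplicate name/ID pairs.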

Mongodb sort and group by

I'm not sure my question is phrased correctly, but here it is:
I have a set of rows in my MongoDB collection, like:
[{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}},
{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]
I want to sort them by date and combine them, because for the same date there are multiple rows from different countries.
So the result should look something like this:
[{'2018-07-16T00:00:00.000Z': [{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}}]},
{'2018-10-30T00:00:00.000Z': [{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]}]
I tried:
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {'ccode': 1}}, # ccode or date?
{'$sort': {"date": 1}},
])
But I got an error:
The field * must be an accumulator object
I googled the error; it's pretty clear, but it doesn't seem related to my case. I don't need any sum, avg, etc. functions.
Query
sort by date (ascending here; for descending use -1)
group by date and collect the $$ROOT documents
replace the root so that the date becomes the key
*This assumes your dates are stored as strings, which is a bad idea. If you convert them to date objects, you can still use the query, but use this as the key instead:
"k":{"$dateToString" : {"date" :"$_id"}}
db.getCollection('daily_stats').aggregate(
[{"$sort":{"date":1}},
{"$group":{"_id":"$date", "docs":{"$push":"$$ROOT"}}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$docs"}]]}}}])
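The three stages can be modelled in plain Python to check the shape of the result. This is a sketch only: `group_by_date` is a hypothetical helper, and the documents are trimmed to `ccode` and `date`. Because the dates are fixed-width ISO-8601 strings, sorting them as strings also sorts them chronologically.

```python
from typing import Any

def group_by_date(docs: list[dict[str, Any]]) -> list[dict[str, list[dict[str, Any]]]]:
    # {"$sort": {"date": 1}}
    ordered = sorted(docs, key=lambda d: d["date"])
    # {"$group": {"_id": "$date", "docs": {"$push": "$$ROOT"}}}
    groups: dict[str, list[dict[str, Any]]] = {}
    for doc in ordered:
        groups.setdefault(doc["date"], []).append(doc)
    # {"$replaceRoot": {"newRoot": {"$arrayToObject": [[{"k": "$_id", "v": "$docs"}]]}}}
    # each output document has a single key: the date
    return [{date: items} for date, items in groups.items()]

rows = [
    {"ccode": "RU", "date": "2018-07-16T00:00:00.000Z"},
    {"ccode": "US", "date": "2018-07-16T00:00:00.000Z"},
    {"ccode": "US", "date": "2018-10-30T00:00:00.000Z"},
    {"ccode": "RU", "date": "2018-10-30T00:00:00.000Z"},
]
for entry in group_by_date(rows):
    for date, items in entry.items():
        print(date, [item["ccode"] for item in items])
```

$group does not guarantee that its output follows the preceding $sort, so if the order of the dates matters in MongoDB, an extra `{"$sort": {"_id": 1}}` between $group and $replaceRoot is the safer choice.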
When using $group, you need to specify an _id.
From the docs:
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
In your case...
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {
'_id': "$ccode",
'rates': { $addToSet: '$rates' },
'date': { $first: '$date' }
}},
{'$sort': {"date": 1}},
{'$project': { "_id": 0, "country": "$_id", "rates": 1, "date": 1 }}
])
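Every field in $group other than _id has to be an accumulator, which is what the error in the question is complaining about. As a rough illustration of what the accumulators in this answer do, here is a plain-Python model of the pipeline after $match (`group_by_ccode` is hypothetical and the rates are trimmed to one field). This answer groups by ccode; grouping by date, as the question asks, would use `'_id': '$date'` instead. Note that $first simply takes the value from whichever document reaches the group first, so it is only predictable after a $sort, and $addToSet does not guarantee the order of its elements.

```python
from typing import Any

Doc = dict[str, Any]

def group_by_ccode(docs: list[Doc]) -> list[Doc]:
    groups: dict[Any, Doc] = {}
    for doc in docs:
        group = groups.get(doc["ccode"])
        if group is None:
            # '_id': '$ccode' defines the group; 'date': {$first: '$date'}
            # keeps the date of the first document that reaches this group
            group = {"_id": doc["ccode"], "rates": [], "date": doc["date"]}
            groups[doc["ccode"]] = group
        # 'rates': {$addToSet: '$rates'} adds a value only if it is not already present
        if doc["rates"] not in group["rates"]:
            group["rates"].append(doc["rates"])
    # {'$sort': {"date": 1}}
    ordered = sorted(groups.values(), key=lambda g: g["date"])
    # {'$project': {"_id": 0, "country": "$_id", "rates": 1, "date": 1}}
    return [{"country": g["_id"], "rates": g["rates"], "date": g["date"]} for g in ordered]

rows = [
    {"ccode": "RU", "date": "2018-07-16T00:00:00.000Z", "rates": {"registered": 1}},
    {"ccode": "US", "date": "2018-07-16T00:00:00.000Z", "rates": {"registered": 2}},
    {"ccode": "US", "date": "2018-10-30T00:00:00.000Z", "rates": {"registered": 1}},
    {"ccode": "RU", "date": "2018-10-30T00:00:00.000Z", "rates": {"registered": 1}},
]
for row in group_by_ccode(rows):
    print(row)
```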
Playground: https://mongoplayground.net/p/B31XLS9p-6W

db.createView() by matching and comparing two tables in mongodb

I would like to create a view by comparing and grouping two tables in MongoDB.
table1
{id: 1, item: 'apple', house: 'A'},
{id: 2, item: 'apple', house: 'A'},
{id: 3, item: 'apple', house: 'B'},
{id: 4, item: 'banana', house: 'B'},
{id: 5, item: 'banana', house: 'C'},
{id: 6, item: 'pear', house: 'A'},
{id: 7, item: 'pear', house: 'B'},
{id: 8, item: 'pear', house: 'A'},
{id: 9, item: 'pear', house: 'C'},
And I need to compare and match against table2:
table2
{id: 1, fruits: 'apple', type: 'important'},
{id: 2, fruits: 'banana', type: 'important'},
{id: 3, fruits: 'pear', type: 'notImportant'},
The result I want to get:
houses
{id: 1, house: 'A', totalItem: 4, noOfImportant: 2 },
{id: 2, house: 'B', totalItem: 3, noOfImportant: 2},
{id: 3, house: 'C', totalItem: 2, noOfImportant: 1},
I have tried:
db.createView(
'houses',
'table1',
[
{$lookup: { from: 'table2', localField: 'fruits', foreignField: 'item', as: 'fruits'}},
{
'$group': {
house: '$house',
totalItem: {$sum:1},
noOfImportant: {$sum:'$table2.notImpotant'},
}
},
]
)
but I can't seem to get anything out. Please help me. Thank you very much in advance!
The $lookup stage returns an array, even if it only finds a single matching document.
You can use the $arrayElemAt operator to pick out the first element, or you could simply $unwind the result.
Also, localField and foreignField are swapped in your $lookup: localField is the field in table1 (item), and foreignField is the field in table2 (fruits).
There also seems to be a typo in your group, noOfImportant: {$sum:'$table2.notImpotant'}: there doesn't appear to be a field named "table2", so I suspect you meant to check the type field of the fruits document.
The $group stage requires an explicitly defined _id field; if you need that field to be called house, you can rename it afterward with a projection.
Since the type field is not numeric, you need to use $cond in order to count the important items.
db.createView(
'houses',
'table1',
[
{$lookup: {
from: 'table2',
localField: 'item',
foreignField: 'fruits',
as: 'fruits'
}},
{$unwind: {
path: '$fruits',
preserveNullAndEmptyArrays: true
}},
{$group: {
_id: '$house',
totalItem: {$sum:1},
noOfImportant: {
$sum:{
$cond:{
if:{$eq:['$fruits.type', 'important']},
then: 1,
else: 0
}
}
},
}},
{$addFields: { house: "$_id" }},
{$project: { _id: 0 }}
]
)
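To double-check the expected numbers, here is a plain-Python model of the view's pipeline on the sample data. `houses_view` is a hypothetical helper, and the sketch assumes each item matches at most one document in table2, as in the sample. With this data, house B ends up with three items (apple, banana, pear), two of which are important.

```python
from typing import Any, Optional

Doc = dict[str, Any]

table1 = [
    {"id": 1, "item": "apple", "house": "A"}, {"id": 2, "item": "apple", "house": "A"},
    {"id": 3, "item": "apple", "house": "B"}, {"id": 4, "item": "banana", "house": "B"},
    {"id": 5, "item": "banana", "house": "C"}, {"id": 6, "item": "pear", "house": "A"},
    {"id": 7, "item": "pear", "house": "B"}, {"id": 8, "item": "pear", "house": "A"},
    {"id": 9, "item": "pear", "house": "C"},
]
table2 = [
    {"id": 1, "fruits": "apple", "type": "important"},
    {"id": 2, "fruits": "banana", "type": "important"},
    {"id": 3, "fruits": "pear", "type": "notImportant"},
]

def houses_view(items: list[Doc], fruits: list[Doc]) -> list[Doc]:
    totals: dict[Any, Doc] = {}
    for doc in items:
        # $lookup: localField 'item' against foreignField 'fruits' always yields a list
        matches = [f for f in fruits if f["fruits"] == doc["item"]]
        # $unwind with preserveNullAndEmptyArrays: an unmatched item still counts once
        unwound: list[Optional[Doc]] = matches or [None]
        for fruit in unwound:
            # $group on '$house'
            group = totals.setdefault(doc["house"], {"totalItem": 0, "noOfImportant": 0})
            group["totalItem"] += 1  # totalItem: {$sum: 1}
            # noOfImportant: {$sum: {$cond: {if: {$eq: ['$fruits.type', 'important']}, then: 1, else: 0}}}
            if fruit is not None and fruit["type"] == "important":
                group["noOfImportant"] += 1
    # {$addFields: {house: "$_id"}} followed by {$project: {_id: 0}}
    return [{"house": house, **counts} for house, counts in totals.items()]

for row in houses_view(table1, table2):
    print(row)
```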
The aggregate() function returns much more descriptive error messages, so you should test the pipeline with it before creating the view.

MongoDB $expr compare field in array

{
name: 'product A',
min_price: 5,
max_price: 15,
stores: [
{name: 'A', price: 6},
{name: 'B', price: 4}
]
},
{
name: 'product B',
min_price: 9,
max_price: 14,
stores: [
{name: 'C', price: 12},
{name: 'B', price: 10}
]
}
How can I find the products that have a store price $lt min_price?
I tried:
{$expr: { $lt: [ "$min_price", "$stores.price"] }}
Seems like I am doing it wrong!
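You're close: you need $min to reduce the array of store prices to a single value, and the lowest store price has to be the first argument of $lt, so that the condition reads "lowest store price < min_price":
{$expr: { $lt: [ {$min: "$stores.price"}, "$min_price" ] }}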
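Product A has a store (B, price 4) that is cheaper than its min_price of 5, while both store prices for product B (12 and 10) are above its min_price of 9, so only product A should match. Here is a plain-Python model of the condition "lowest store price < min_price". `below_min_price` is a hypothetical helper, and skipping products with an empty stores list is an assumption about the desired behavior.

```python
from typing import Any

products = [
    {"name": "product A", "min_price": 5, "max_price": 15,
     "stores": [{"name": "A", "price": 6}, {"name": "B", "price": 4}]},
    {"name": "product B", "min_price": 9, "max_price": 14,
     "stores": [{"name": "C", "price": 12}, {"name": "B", "price": 10}]},
]

def below_min_price(docs: list[dict[str, Any]]) -> list[str]:
    matching = []
    for doc in docs:
        # "$stores.price" resolves to the list of prices, e.g. [6, 4]
        prices = [store["price"] for store in doc["stores"]]
        # {$lt: [{$min: "$stores.price"}, "$min_price"]}
        if prices and min(prices) < doc["min_price"]:
            matching.append(doc["name"])
    return matching

print(below_min_price(products))  # ['product A']
```

In the mongo shell the condition goes into find(), e.g. `db.collection.find({$expr: {$lt: [{$min: "$stores.price"}, "$min_price"]}})`, where `collection` stands for your collection name.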

$lookup within the same collection is nesting docs instead of returning all docs

Dealing with $lookup was fun until I thought of making a join within the same collection.
Say I have the following collection:
{'_id': ObjectId('5a1a62026462db0032897179'),
'department': ObjectId('5a1982646462db032d58c3f9'),
'name': 'Standards and Quality Department',
'type': 'sub'}, {
'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': false,
'desc': 'Operations Department',
'type': 'main'}
As you can see, documents link back to each other within the same collection through the department key, which can be false to indicate a top-level department.
I'm using the following query (Python) to populate the results:
query = [{'$lookup': {'as': '__department',
'foreignField': '_id',
'from': 'departments',
'localField': 'department'}},
{'$unwind': '$__department'},
{'$group': {'__department': {'$first': '$__department'},
'_id': '$_id',
'department': {'$first': '$department'},
'name': {'$first': '$name'},
'type': {'$first': '$type'}}}]
for doc in conn.db.departments.aggregate(query): pprint(doc)
What I'm expecting to get:
{'__department': None,
'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': false,
'name': 'Operations Department',
'type': 'main'},
{'__department': {'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': 'false',
'name': 'Operations Department',
'type': 'main'},
'_id': ObjectId('5a1a62026462db0032897179'),
'department': ObjectId('5a1982646462db032d58c3f9'),
'name': 'Standards and Quality Department',
'type': 'sub'}
What I'm actually getting is:
{'__department': {'_id': ObjectId('5a1982646462db032d58c3f9'),
'department': 'false',
'name': 'Operations Department',
'type': 'main'},
'_id': ObjectId('5a1a62026462db0032897179'),
'department': ObjectId('5a1982646462db032d58c3f9'),
'name': 'Standards and Quality Department',
'type': 'sub'}
I'm not sure why $unwind seems to merge both docs together, although before applying $unwind I do get both of them separately.
Any suggestions?
That is because $lookup creates an empty __department array in the document that didn't find a match. This is what your orphan document looks like:
{
"_id" : ObjectId("5a1982646462db032d58c3f9"),
"department" : false,
"desc" : "Operations Department",
"type" : "main",
"__department" : []
}
When you $unwind, there is nothing to unwind in this document, so it gets dropped. If you want to keep it, you have to "normalize" the array by adding this stage after your $lookup and before your $unwind:
{
$project: {
_id: 1,
department: 1,
name: 1,
type: 1,
__department: {
$cond: [{
$eq: ["$__department", []]
},
[{
_id: 0,
department: "None",
desc: "None",
type: "None"
}], '$__department'
]
}
}
}
So, all together, it should look like this:
[{
'$lookup': {
'as': '__department',
'foreignField': '_id',
'from': 'departments',
'localField': 'department'
}
},
{
'$project': {
_id: 1,
department: 1,
name: 1,
type: 1,
__department: {
$cond: [{
$eq: ["$__department", []]
},
[{
_id: 0,
department: "None",
desc: "None",
type: "None"
}], '$__department'
]
}
}
},
{'$unwind': "$__department"},
{'$group': {'__department': {'$first': '$__department'},
'_id': '$_id',
'department': {'$first': '$department'},
'name': {'$first': '$name'},
'type': {'$first': '$type'}}}]
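To illustrate why the orphan document disappears, here is a plain-Python model of $unwind on the two documents as they look after the $lookup. `unwind` is a hypothetical helper, and the ObjectIds are replaced by short strings. It also models the `preserveNullAndEmptyArrays: true` option of $unwind (available since MongoDB 3.2, and used in the createView answer above), which keeps such documents without the $project/$cond workaround; in that case the empty array field is simply left out of the output document.

```python
from typing import Any

Doc = dict[str, Any]

def unwind(docs: list[Doc], field: str, preserve_null_and_empty: bool = False) -> list[Doc]:
    out: list[Doc] = []
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, list):
            if value:
                # one output document per array element
                out.extend({**doc, field: item} for item in value)
            elif preserve_null_and_empty:
                # empty array: keep the document but drop the field
                out.append({k: v for k, v in doc.items() if k != field})
            # empty array without the option: the document is dropped
        elif value is not None:
            # a non-array value behaves like a one-element array
            out.append(dict(doc))
        elif preserve_null_and_empty:
            # null or missing field: keep the document as it is
            out.append(dict(doc))
    return out

after_lookup = [
    {"_id": "sub-id", "department": "main-id", "name": "Standards and Quality Department",
     "type": "sub", "__department": [{"_id": "main-id", "department": False, "type": "main"}]},
    {"_id": "main-id", "department": False, "desc": "Operations Department",
     "type": "main", "__department": []},
]

print([d["_id"] for d in unwind(after_lookup, "__department")])        # ['sub-id']
print([d["_id"] for d in unwind(after_lookup, "__department", True)])  # ['sub-id', 'main-id']
```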