$group after $lookup is taking way too long - mongodb

I have following mongo collection:
{
  "_id" : "22pTvYLd7azAAPL5T",
  "plate" : "ABC-123",
  "company" : "AMZ",
  "_portfolioType" : "account"
},
{
  "_id" : "22pTvYLd7azAAPL5U",
  "plate" : "ABC-123",
  "_portfolioType" : "sale",
  "price" : 87.3
},
{
  "_id" : "22pTvYLd7azAAPL5V",
  "plate" : "ABC-123",
  "_portfolioType" : "sale",
  "price" : 88.9
}
I am trying to aggregate all documents that share the same value in the plate field. Below is the query I have written so far:
db.getCollection('temp').aggregate([
  {
    $lookup: {
      from: 'temp',
      let: { p: '$plate', t: '$_portfolioType' },
      pipeline: [{
        $match: {
          _portfolioType: 'sale',
          $expr: { $and: [
            { $eq: ['$plate', '$$p'] },
            { $eq: ['$$t', 'account'] }
          ]}
        }
      }],
      as: 'revenues'
    }
  },
  {
    $project: {
      plate: 1,
      company: 1,
      totalTrades: { $arrayElemAt: ['$revenues', 0] }
    }
  },
  {
    $addFields: {
      revenue: { $add: [{ $multiply: ['$totalTrades.price', 100] }, 99] }
    }
  },
  {
    $group: {
      _id: '$company',
      revenue: { $sum: '$revenue' }
    }
  }
])
The query works fine if I remove the $group stage; however, as soon as I add the $group stage MongoDB seems to process indefinitely. I tried adding a $match as the first stage to limit the number of documents to process, but without any luck. E.g.:
{
$match: { $or: [{ _portfolioType: 'account' }, { _portfolioType: 'sale' }] }
},
I also tried using { explain: true } but it doesn't return anything helpful.
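For what it's worth, the default explain verbosity often reports little for an aggregation; here is a minimal sketch of requesting execution statistics instead (assuming a reasonably recent mongo shell):
// "executionStats" includes per-stage timing and documents examined,
// which is usually more telling than the default query-planner output.
db.getCollection('temp').explain("executionStats").aggregate([
  { $match: { _portfolioType: { $in: ['account', 'sale'] } } }
  // ...remaining stages of the pipeline above...
])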

As Neil Lunn noticed, you very likely don't need the lookup to reach your "end goal", which is still quite vague.
Please read comments and adjust as needed:
db.temp.aggregate([
  {$group: {
    // Get unique plates
    _id: "$plate",
    // Not clear what you expect if there are documents with
    // a different company and the same plate.
    // Assuming "it never happens".
    // You may need to $cond it here with {$eq: ["$_portfolioType", "account"]},
    // but you never voiced it.
    company: {$first: "$company"},
    // Not exactly all documents with _portfolioType: sale,
    // but rather the price from all documents for this plate.
    // Assuming the price field is available only in documents
    // with "_portfolioType": "sale". Otherwise add a $cond here.
    // If you really need "all documents", push $$ROOT instead.
    prices: {$push: "$price"}
  }},
  {$project: {
    company: 1,
    // Apply your math here, or on the previous stage,
    // to calculate revenue per plate
    revenue: "$prices"
  }},
  {$group: {
    // Get one document for each "company"
    _id: "$company",
    // Revenue associated with each plate
    revenuePerPlate: {$push: {k: "$_id", v: "$revenue"}}
  }},
  {$project: {
    _id: 0,
    company: "$_id",
    // Count of unique plates
    platesCnt: {$size: "$revenuePerPlate"},
    // arrayToObject if you wish plate names as properties
    revenuePerPlate: {$arrayToObject: "$revenuePerPlate"}
  }}
])
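For what it's worth, two details help explain the observed behaviour if you keep the original self-$lookup. Without $group the server only materializes the first batch of cursor results, so most of the $lookup work never actually runs; $group is a blocking stage that forces every document through the pipeline, which is why adding it makes the full cost visible. Also, an $expr equality inside a $lookup sub-pipeline generally cannot use an index on older server versions, so each input document rescans temp. A sketch of an index that may still help, using the field names from the question:
// Supports the initial $match on _portfolioType as well as the
// per-document plate lookups:
db.temp.createIndex({ plate: 1, _portfolioType: 1 })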


MongoDB select best matched document

I have a collection of documents like this:
[{
  "_id" : ObjectId("6347e5aa0c009a37b81da700"),
  "testField1" : "1000",
  "testField2" : "2000",
  "testField3" : NumberInt(1)
},
{
  "_id" : ObjectId("6347e5890c009a37b81da701"),
  "testField2" : 2000,
  "testField3" : NumberInt(2)
},
{
  "_id" : ObjectId("6347e5960c009a37b81da702"),
  "testField3" : NumberInt(3)
}]
I need to retrieve documents with the below order of precedence:
If testField1 and testField2 exist and match their values, the query should return that document.
Otherwise, if testField2 exists and matches its value, the query should return that document.
Otherwise it should return the last document, where testField1 & testField2 do not exist.
I tried the below query, but it returns all the documents.
db.getCollection("TEST_COLLECTION").aggregate([
{
$match: {
$expr: {
$cond: {
if: {
$and: {"testField1": "1000", "testField2": "2000"}
},
then: {
$and: {"testField1": "1000", "testField2": "2000"}
},
else : {
$cond: {
if: {
$and: {"testField1": null, "testField2": "2000"}
},
then: {
$and: {"testField1": null, "testField2": "2000"}
},
else : {
$and: {"testField1": null, "testField2": null}
}
}
}
}
}
}
}
])
There are definitely still some open questions from the comments. @ray has an interesting approach linked in there that uses $setWindowFields, which may be appropriate depending on exactly what you're looking for.
I took a different approach (and perhaps interpretation) and built out the following aggregation that uses $unionWith:
db.collection.aggregate([
  {
    $match: {
      testField1: "1000",
      testField2: "2000"
    }
  },
  {
    $addFields: {
      sortOrder: 1
    }
  },
  {
    $unionWith: {
      coll: "collection",
      pipeline: [
        {
          $match: {
            testField2: "2000"
          }
        },
        {
          $addFields: {
            sortOrder: 2
          }
        }
      ]
    }
  },
  {
    $unionWith: {
      coll: "collection",
      pipeline: [
        {
          $match: {
            testField1: { $exists: false },
            testField2: { $exists: false }
          }
        },
        {
          $addFields: {
            sortOrder: 3
          }
        }
      ]
    }
  },
  {
    $sort: {
      sortOrder: 1
    }
  },
  {
    $limit: 1
  },
  {
    $unset: "sortOrder"
  }
])
Basically the aggregation will internally issue three queries, one corresponding to each of the three precedence conditions. Similar to @ray's solution, it creates a field to sort on (sortOrder in mine), since the ordering of $unionWith is otherwise unspecified per the documentation. After the $sort we can $limit to a single result and $unset the temporary sorting field prior to returning the result to the client. Depending on the version you are running, you could consider adding a couple of inline $limits to each of the sub-pipelines to reduce the amount of work being done. Along with appropriate indexes (perhaps just { testField2: 1, testField1: 1 }), this operation should be reasonably efficient.
Here is the playground link.
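As a sketch of the inline $limit idea mentioned above: each precedence level needs at most one document, so a $limit directly after a branch's $match keeps that branch from doing extra work (shown here for the second branch, with field names as in the question):
{
  "$unionWith": {
    "coll": "collection",
    "pipeline": [
      { $match: { testField2: "2000" } },
      { $limit: 1 },
      { $addFields: { sortOrder: 2 } }
    ]
  }
}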
If there are several groups and you need to return the wanted document per group, I would go with @ray's answer. If there is only one group (as implied in your comment, and in @user20042973's nice answer), I would like to point out another obvious option:
db.collection.aggregate([
{$facet: {
op1: [{$match: {testField1: "1000", testField2: "2000"}}],
op2: [{$match: {testField1: null, testField2: "2000"}}],
op3: [{$match: {testField1: null, testField2: null}},
{$sort: {timestamp: -1}}, {$limit: 1}]
}},
{$project: {res: {$ifNull: [{$first: "$op1"}, {$first: "$op2"}, {$first: "$op3"}]}}},
{$replaceRoot: {newRoot: "$res"}}
])
See how it works on the playground example
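One caveat worth noting: $facet sub-pipelines cannot use indexes, so this approach scans the collection once regardless of how selective the three $match stages are; on large collections the indexed $unionWith approach above may well be faster.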

Is there a way to project max value in a range then finding documents within a new range starting at this max value in just one aggregate?

Given the following data in a Mongo collection:
{
  _id: "1",
  dateA: ISODate("2021-12-31T00:00:00.000Z"),
  dateB: ISODate("2022-01-11T00:00:00.000Z")
},
{
  _id: "2",
  dateA: ISODate("2022-01-02T00:00:00.000Z"),
  dateB: ISODate("2022-01-08T00:00:00.000Z")
},
{
  _id: "3",
  dateA: ISODate("2022-01-03T00:00:00.000Z"),
  dateB: ISODate("2022-01-05T00:00:00.000Z")
},
{
  _id: "4",
  dateA: ISODate("2022-01-09T00:00:00.000Z"),
  dateB: null
},
{
  _id: "5",
  dateA: ISODate("2022-01-11T00:00:00.000Z"),
  dateB: ISODate("2022-01-11T00:00:00.000Z")
},
{
  _id: "6",
  dateA: ISODate("2022-01-12T00:00:00.000Z"),
  dateB: null
}
And given the range below:
ISODate("2022-01-01T00:00.000Z") .. ISODate("2022-01-10T00:00.000Z")
I want to find all documents with dateA within the given range, then I want to shrink the range so that it starts from the max dateB value, and finally fetch all documents whose dateB is null.
In summary:
I'll start with the range
ISODate("2022-01-01T00:00:00.000Z") .. ISODate("2022-01-10T00:00:00.000Z")
Then change to the range
ISODate("2022-01-08T00:00:00.000Z") .. ISODate("2022-01-10T00:00:00.000Z")
Then find with
dateB: null
Finally, the result would be the document with
_id: "4"
Is there a way to find the document with _id: "4" in just one aggregate?
I know how to do it programmatically using 2 queries, but the main goal is to have just one request to the database.
You can use $max to find the maxDateB first. Then perform a self $lookup to apply the $match and find doc _id: "4".
db.collection.aggregate([
  {
    $match: {
      dateA: {
        $gte: ISODate("2022-01-01"),
        $lt: ISODate("2022-01-10")
      }
    }
  },
  {
    $group: {
      _id: null,
      maxDateB: { $max: "$dateB" }
    }
  },
  {
    $lookup: {
      from: "collection",
      let: {
        start: "$maxDateB",
        end: ISODate("2022-01-10")
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $gte: ["$dateA", "$$start"] },
                { $lt: ["$dateA", "$$end"] },
                { $eq: ["$dateB", null] }
              ]
            }
          }
        }
      ],
      as: "result"
    }
  },
  {
    $unwind: "$result"
  },
  {
    $replaceRoot: {
      newRoot: "$result"
    }
  }
])
Here is the Mongo Playground for your reference.
Assuming the matched initial dateA range is not huge, here is an alternate approach that exploits $push and $filter and avoids the cost of a $lookup stage:
db.foo.aggregate([
{$match: {dateA: {$gte: new ISODate("2022-01-01"), $lt: new ISODate("2022-01-10")} }},
// Kill 2 birds with one stone here. Get the max dateB AND prep
// an array to filter later. The items array will be as large
// as the match above but the output of this stage is a single doc:
{$group: {_id: null,
maxDateB: {$max: "$dateB" },
items: {$push: "$$ROOT"}
}},
{$project: {X: {$filter: {
input: "$items",
cond: {$and: [
// Each element of 'items' is passed as $$this so use
// dot notation to get at individual fields. Note that
// all other peer fields to 'items' like 'maxDateB' are
// in scope here and addressable using '$':
{$gt: [ "$$this.dateA", "$maxDateB"]},
{$eq: [ "$$this.dateB", null ]}
]}
}}
}}
]);
This yields a single doc result (I added an additional doc _id 41 to test the null equality for more than 1 doc):
{
"_id" : null,
"X" : [
{
"_id" : "4",
"dateA" : ISODate("2022-01-09T00:00:00Z"),
"dateB" : null
},
{
"_id" : "41",
"dateA" : ISODate("2022-01-09T00:00:00Z"),
"dateB" : null
}
]
}
It is possible to $unwind and $replaceRoot after this but there is little need to do so.
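For completeness, a sketch of that tail if single documents are preferred over the X array (appended to the pipeline above):
{ $unwind: "$X" },
{ $replaceRoot: { newRoot: "$X" } }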

Mongodb - Get sales per hours

Good people! I am in need of your help.
I am trying to create a line graph using apexcharts with data imported from Mongodb.
I am trying to graph hourly sales, so I need the number of sales for each hour of the day.
Example MongoDB document:
{
  "_id" : ObjectId("5dbee4eed6f04aaf191abc59"),
  "seller_id" : "5aa1c2c35ef7a4e97b5e995a",
  "temp" : "4.3",
  "sale_type" : "coins",
  "createdAt" : ISODate("2020-05-10T00:10:00.000Z"),
  "updatedAt" : ISODate("2019-11-10T14:32:14.650Z")
}
Up to now I have a query like this:
db.getCollection('sales').aggregate([
  { "$facet": {
    "00:00": [
      { "$match": {
        createdAt: { $gte: ISODate("2020-05-10T00:00:00.000Z"), $lt: ISODate("2020-05-10T00:59:00.001Z") },
        seller_id: "5aa1c2c35ef7a4e97b5e995a"
      }},
      { "$count": "sales" }
    ],
    "01:00": [
      { "$match": {
        createdAt: { $gte: ISODate("2020-05-10T01:00:00.000Z"), $lt: ISODate("2020-05-10T01:59:00.001Z") },
        seller_id: "5aa1c2c35ef7a4e97b5e995a"
      }},
      { "$count": "sales" }
    ],
    "02:00": [
      { "$match": {
        createdAt: { $gte: ISODate("2020-05-10T02:00:00.000Z"), $lt: ISODate("2020-05-10T02:59:00.001Z") },
        seller_id: "5aa1c2c35ef7a4e97b5e995a"
      }},
      { "$count": "sales" }
    ],
    "03:00": [
      { "$match": {
        createdAt: { $gte: ISODate("2020-05-10T03:00:00.000Z"), $lt: ISODate("2020-05-10T03:59:00.001Z") },
        seller_id: "5aa1c2c35ef7a4e97b5e995a"
      }},
      { "$count": "sales" }
    ]
  }},
  { "$project": {
    "ventas0": { "$arrayElemAt": ["$01:00.sales", 0] },
    "ventas1": { "$arrayElemAt": ["$02:00.sales", 0] },
    "ventas3": { "$arrayElemAt": ["$03:00.sales", 0] }
  }}
])
But I am sure there is a more efficient way to do this.
My expected output looks like this:
[countsale(00:00),countsale(01:00),countsale(02:00),countsale(03:00), etc to 24 hs]
You are correct, there is a more efficient way to do this. We can use the date expression operators, specifically grouping with $hour.
db.getCollection('sales').aggregate([
{
$match: {
createdAt: {$gte: ISODate("2020-05-10T00:00:00.000Z"), $lt: ISODate("2020-05-11T00:00:00.001Z")}
}
},
{
$group: {
_id: {$hour: "$createdAt"},
count: {$sum: 1}
}
},
{
$sort: {
_id: 1
}
}
]);
This will give you this result:
[
  { _id: 0, count: x },
  { _id: 1, count: y },
  ...
  { _id: 23, count: z }
]
From here you can easily restructure the data as you wish.
One problem I foresee is that hours without any matches (i.e. count = 0) will not exist in the result set; you'll have to fill in those gaps manually.
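A minimal sketch of filling those gaps client-side (plain shell JavaScript; results is assumed to hold the array returned by the aggregation above):
// Start with 24 zeroed buckets, then overwrite the hours we did get.
var byHour = new Array(24).fill(0);
results.forEach(function (doc) { byHour[doc._id] = doc.count; });
// byHour now matches the expected output shape:
// [countsale(00:00), countsale(01:00), ..., countsale(23:00)]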

MongoDB aggregate pipeline group

I am trying to build a pipeline which will search for documents based on certain criteria and group certain fields to give the desired output. The document structure of deals is:
{
  "_id": "123",
  "status": "New",
  "deal_amount": "5200",
  "deal_date": "2018-03-05",
  "data_source": "API",
  "deal_type": "New Business",
  "account_id": "A1"
},
{
  "_id": "456",
  "status": "New",
  "deal_amount": "770",
  "deal_date": "2018-02-11",
  "data_source": "API",
  "deal_type": "New Business",
  "account_id": "A2"
},
{
  "_id": "885",
  "status": "Old",
  "deal_amount": "4070",
  "deal_date": "2017-09-22",
  "data_source": "API",
  "deal_type": "New Business",
  "account_id": "A2"
}
Account name is a referenced field. The account document goes like this:
{
  "_id": "A1",
  "name": "Sarah"
},
{
  "_id": "A2",
  "name": "Amber"
}
The pipeline should search for documents whose 'status' is 'New' and 'deal amount' is more than 2000, and it should group by 'account name'. The pipeline I have used goes like this:
db.deal.aggregate([{
$match: {
status: New,
deal_amount: {
$gte: 2000,
}
}
}, {
$group: {
_id: "$account_name",
}
},{
$lookup:{
from:"accounts",
localField:"account_id",
foreignField:"_id",
as:"acc",
}
}
])
I want to show only the fields deal_amount, deal_type, deal_date and account name in the result.
Expected Result:
{
  "_id": "123",
  "deal_amount": "5200",
  "deal_date": "2018-03-05",
  "deal_type": "New Business",
  "account_name": "Sarah"
},
{
  "_id": "885",
  "deal_amount": "4070",
  "deal_date": "2017-09-22",
  "deal_type": "New Business",
  "account_name": "Amber"
}
Do I have to include all of these fields (deal_amount, deal_type, deal_date & account name) in the $group stage for them to show in the result, or is there another way to do it? Any help is highly appreciated.
Please use this query.
db.deal.aggregate([
  {
    $match: {
      status: "New",
      deal_amount: { $gte: 2000 }
    }
  },
  {
    $lookup: {
      from: "accounts",
      localField: "account_id",
      foreignField: "_id",
      as: "acc"
    }
  },
  {
    $unwind: {
      path: "$acc",
      preserveNullAndEmptyArrays: true
    }
  },
  {
    $group: {
      _id: "$acc._id",
      deal_amount: { $first: "$deal_amount" },
      deal_date: { $first: "$deal_date" },
      deal_type: { $first: "$deal_type" }
    }
  }
])
You can do it by:
1) using $$ROOT
reference: link
{ $group : {
_id : "$author",
data: { $push : "$$ROOT" }
}}
2) by assigning each field individually:
{
$group: {
_id: "$account_name",
deal_amount: { $first: '$deal_amount' },
deal_date: { $first: '$deal_date' },
.
.
}
}
Not sure why you need the $group stage. You just need to add a $project stage to output the account name from the referenced collection:
{
  "$project": {
    "deal_amount": 1,
    "deal_type": 1,
    "deal_date": 1,
    "account_name": { "$let": { "vars": { "accl": { "$arrayElemAt": ["$acc", 0] } }, "in": "$$accl.name" } }
  }
}
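As a side note, on MongoDB 4.4+ the same projection can be written more tersely with the $first array operator (a sketch, equivalent to the $let/$arrayElemAt form above):
{
  "$project": {
    "deal_amount": 1,
    "deal_type": 1,
    "deal_date": 1,
    // "$acc.name" resolves to an array of names; $first takes element 0
    "account_name": { "$first": "$acc.name" }
  }
}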
One thing to start with, your $gte operator doesn't work on the string field deal_amount, so you might want to change the field to integers or something similar:
// Convert String to Integer
db.deals.find().forEach(function(data) {
  db.deals.update(
    { _id: data._id },
    { $set: { deal_amount: parseInt(data.deal_amount) } });
});
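Alternatively, on MongoDB 4.0+ the cast can stay inside the pipeline instead of rewriting the stored documents (a sketch):
// Prepend this stage so the later $gte compares numbers, not strings:
{ $addFields: { deal_amount: { $toInt: "$deal_amount" } } }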
Then, to get just the fields you need, reshape the document using $project:
db.deals.aggregate([{
$match: {
"status": "New",
"deal_amount" : {
"$gte" : 2000
}
}
},
{
$lookup:{
from:"accounts",
localField:"account_id",
foreignField:"_id",
as:"acc",
}
},
{
$project: {
_id: 1,
deal_amount: 1,
deal_type: 1,
deal_date: 1,
"account_name": {"$let":{"vars":{"accl":{"$arrayElemAt":["$acc", 0]}}, in:"$$accl.name"}}
}
}
]);
For me, this produced:
{
  "_id" : "123",
  "deal_amount" : 5200.0,
  "deal_date" : "2018-03-05",
  "deal_type" : "New Business",
  "account_name" : "Sarah"
}
db.deal.aggregate([
  { $match: { status: { $eq: 'New' }, deal_amount: { $gte: '2000' } } },
  { $group: { _id: { accountName: '$account_id', type: '$deal_type', amount: '$deal_amount' } } }
])
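Note that with deal_amount stored as a string, $gte: '2000' compares lexicographically, so for example '770' sorts above '2000' and would match even though 770 < 2000; the integer conversion shown in the previous answer avoids that.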

Mongodb Aggregation count array/set size

Here's my problem:
Model:
{ application: "abc", date: Time.now, status: "1" user_id: [ id1, id2,
id4] }
{ application: "abc", date: Time.yesterday, status: "1", user_id: [
id1, id3, id5] }
{ application: "abc", date: Time.yesterday-1, status: "1", user_id: [
id1, id3, id5] }
I need to count the unique number of user_ids in a period of time.
Expected result:
{ application: "abc", status: "1", unique_id_count: 5 }
I'm currently using the aggregation framework and counting the ids outside mongodb.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: { status: "$status" }, users: { $addToSet: "$users" } } }
My arrays of user ids are very large, so I have to iterate over the dates or I'll hit the maximum document size limit (16MB).
I could also $group by
{ year: { $year: "$date" }, month: { $month: "$date" }, day: { $dayOfMonth: "$date" } }
but I also get the document size limitation.
Is it possible to count the set size in mongodb?
thanks
The following will return the number of unique users per application. It applies a group operation to the result of another group operation, using MongoDB's pipeline feature.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: "$status", users: { $addToSet: "$users" } } },
{ $unwind:"$users" },
{ $group : {_id : "$_id", count : {$sum : 1} } }
Hopefully this will be doable in an easier way in future releases of mongo, via an operator which gives the size of an array under a projection: {$project: {id: "$_id", count: {$size: "$uniqueUsers"}}}
https://jira.mongodb.org/browse/SERVER-4899
Cheers
Sorry I'm a little late to the party. Simply grouping on the 'user_id' and counting the result with a trivial group works just fine and doesn't run into doc size limits.
[
{$match: {application: 'abc', date: {$gte: startDate, $lte: endDate}}},
{$unwind: '$user_id'},
{$group: {_id: '$user_id'}},
{$group: {_id: 'singleton', count: {$sum: 1}}}
];
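This works because the first $group's _id is a single user_id, so no stage ever accumulates all of the ids into one array; the 16MB document limit from the question therefore never comes into play.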
Use $size to get the size of the set.
[
  {
    $match: { "application": "abc" }
  },
  {
    $unwind: "$user_id"
  },
  {
    $group: {
      "_id": "$status",
      "application": { $first: "$application" },
      "unique_user_id": { $addToSet: "$user_id" }
    }
  },
  {
    $project: {
      "_id": "$_id",
      "application": "$application",
      "count": { $size: "$unique_user_id" }
    }
  }
]