I'm trying to figure out something simple in this aggregation. The field "totalArrests" under metadata is coming back as 0; for some reason it cannot sum this field from the previous stage. Please advise.
const agg = await KID.aggregate([
{
$group: {
_id: "$source", // group by this
title: { "$last": "$title"},
comments: { "$last": "$comments"},
body: { "$last": "$body"},
date: { "$last": "$date"},
media: { "$last": "$media"},
source: { "$last": "$source"},
count: { "$sum": 1},
arrestCount: { "$sum": "$arrested"},
rescuedCount: { "$sum": "$rescued"},
}
},
// sorting
{
$sort: {date: sort}
},
// facets for paging
{
$facet: {
metadata: [
{ $count: "total" }, // Returns a count of the number of documents at this stage
{ $addFields: {
page: page,
limit: 30,
totalArrests: {$sum: "$arrestCount"}
}},
],
kids: [ { $skip: (page-1)*30 }, { $limit: 30 } ]
}
},
]);
Here is a sample document in the collection.
[
{
_id: 5e8b922aaf5ccf5ac588398c,
counter: 4,
date: 2017-01-01T17:00:00.000Z,
name: 'Steven Tucker',
arrested: 1,
rescued: 0,
country: 'US',
state: 'NH',
comments: 'Sex trafficking of a minor',
source: 'https://www.justice.gov/opa/pr/new-hampshire-man-indicted-sex-trafficking-minor-connection-interstate-prostitution',
title: 'New ....',
body: 'Steven Tucker, 31, ....',
__v: 0,
media: {
title: 'New Hampshire Man Indicted for Sex ...',
open_graph: [Object],
twitter_card: [Object],
favicon: 'https://www.justice.gov/sites/default/files/favicon.png'
},
id: 5e8b922aaf5ccf5ac588398c,
text: 'New Hampshire Man Indicted',
utcDate: '2017-01-01T12:00'
}
]
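$count outputs a single document that contains only the count; every other field, including arrestCount, is dropped, so the $addFields that follows has nothing left to sum.
So, inside the metadata pipeline of $facet, replace $count with a $group on _id: null, which computes the count and the total in one stage: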
{ $facet: {
metadata: [
{ $group: {
_id: null,
total: { $sum: 1 },
totalArrested: { $sum: "$arrestCount" }
}},
{ $project: {
total: 1,
totalArrested: 1,
page: page,
limit: 30,
hasMore: { $gt: [{ $ceil: { $divide: ["$total", 30] }}, page] }
}}
],
kids: [{ $skip: (page-1) * 30 }, { $limit: 30 }]
}}
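To see why the original totalArrests came back as 0, here is a minimal sketch that mimics the two metadata sub-pipelines in plain JavaScript, with no MongoDB connection. The `grouped` array and the helper names are made up for illustration; only the field names come from the question.

```javascript
// Hypothetical output of the first $group stage: one document per source.
const grouped = [
  { _id: "a", arrestCount: 1 },
  { _id: "b", arrestCount: 2 },
  { _id: "c", arrestCount: 0 },
];

// Mimics { $count: "total" }: a single document holding only the count.
function countStage(docs) {
  return [{ total: docs.length }];
}

// Mimics $addFields with totalArrests: { $sum: "$arrestCount" }.
// $sum over a missing field yields 0, which `?? 0` imitates here.
function addTotalArrests(docs) {
  return docs.map((d) => ({ ...d, totalArrests: d.arrestCount ?? 0 }));
}

// Mimics $group with _id: null, total: { $sum: 1 },
// totalArrested: { $sum: "$arrestCount" }.
function groupAll(docs) {
  return [{
    _id: null,
    total: docs.length,
    totalArrested: docs.reduce((sum, d) => sum + (d.arrestCount ?? 0), 0),
  }];
}

const original = addTotalArrests(countStage(grouped));
const fixed = groupAll(grouped);

console.log(original); // [ { total: 3, totalArrests: 0 } ]
console.log(fixed);    // [ { _id: null, total: 3, totalArrested: 3 } ]
```

By the time $addFields runs in the original pipeline, arrestCount no longer exists, which is why grouping everything in one stage gives the expected total.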
I have a set of documents in my MongoDB collection. I want a date-wise aggregate count of documents if the date range is longer than a day, and an hourly aggregate count if the query covers a single day. Several documents may share the same conversationId, so it is also necessary to group by conversationId. Below is a sample of the data for reference:
[
{
"_id":"c438a671-2391-4b85-815c-ecfcb3d2bb54",
"status":"INTERNAL_UPDATE",
"conversationId":"ac44781d-caab-4410-a708-9d6db8480fc3",
"messageIds":[],
"messageId":"4dc02026-ac06-4eb1-aa59-e385fcce4a36",
"responseId":"0c00c83d-61c5-4937-846c-2e6a46aae857",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-04T11:40:06.552Z",
"source":{}
},
{
"_id":"98370ddf-9ff8-4347-bab7-1f7777ab9e9d",
"status":"NEW",
"conversationId":"b5dc39d2-56a1-4eb6-a728-cdbe33dca580",
"messageIds":[],
"messageId":"ba94b839-f795-44f2-aea0-173d26006f14",
"responseId":"a2b75364-447b-4345-8008-2beccd6cbb34",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-05T11:40:30.897Z",
"source":{}
},
{
"_id":"db1eae2b-62d9-455c-ab46-dbfc5baf8b67",
"status":"INTERNAL_UPDATE",
"conversationId":"b5dc39d2-56a1-4eb6-a728-cdbe33dcb584",
"messageIds":[],
"messageId":"b83c743b-d36e-4fdd-9c03-21988af47263",
"responseId":"97198c09-0130-48dc-a225-6d0faeff3116",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-05T11:40:31.418Z",
"source":{}
},
{
"_id":"12a21495-f857-4f18-a06e-f8ba0b951ade",
"status":"NEW",
"conversationId":"8e37c704-add8-4f9f-8e70-d630c24f653b",
"messageIds":[],
"messageId":"51a48362-545c-4f9f-930b-42e4841fc974",
"responseId":"4691468b-a43b-41d1-83df-1349fb554bfa",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-06T11:43:58.174Z",
"source":{}
},
{
"_id":"4afaa735-4618-40cf-8b4f-00ee83b2c3c5",
"status":"INTERNAL_UPDATE",
"conversationId":"8e37c704-add8-4f9f-8e70-d630c24f653b",
"messageIds":[],
"messageId":"7c860126-bf1e-41b2-a7d3-6bcec3e8d5fb",
"responseId":"09cec9a1-2621-481d-b527-d98b007ef5be",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-06T11:43:58.736Z",
"source":{}
},
{
"_id":"cf8deeca-2cfd-497e-b92b-03204c84217a",
"status":"NEW",
"conversationId":"3c6870b5-88d6-4e21-8629-28137dea3fee",
"messageIds":[],
"messageId":"da84e414-2269-4812-8ddd-e2cd6c9be4fd",
"responseId":"ae1014b2-0cc1-41f0-9990-cf724ed67ab7",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-06T13:37:55.060Z",
"source":{}
}
]
Presently I am able to group by conversationId, but I am unable to aggregate the data by date, or by hour when the date range is a single day.
Below is the query:
db.documentName.aggregate([
{
'$match': {
'$and': [
{
timestamp: {
'$gte': ISODate('2021-05-01T00:00:00.000Z'),
'$lte': ISODate('2021-05-10T23:59:59.999Z')
}
},
{ 'source.author': { '$regex': 'user', '$options': 'i' } },
{},
{}
]
}
},
{ '$group': {
_id: {'conversationId': '$conversationId'} } },
{ '$count': 'document_count' }
])
I have tried adding something like $hour: '$timestamp', comma-separated next to conversationId in $group, but it does not work and gives an error.
The desired result for the above data is something like this:
[{"date": "2021-05-04", "doc_count": 1},
{"date": "2021-05-05", "doc_count": 2},
{"date": "2021-05-06", "doc_count": 2}]
For 2021-05-05 there are 2 docs with different conversationIds, and for 2021-05-06 there are 3 docs in total, but 2 of them share the same conversationId, so the aggregate count for 2021-05-06 is also 2. I hope this clarifies my question.
The question is not entirely clear to me, but it sounds like you want something like the query below.
The groupId field rebuilds the date, with or without the hour, according to your condition:
db.collection.aggregate([
{$match: {
timestamp: {
$gte: ISODate("2021-05-01T00:00:00.000Z"),
$lte: ISODate("2021-05-07T23:59:59.999Z")
}
}
},
{$project: {
conversationId: 1,
groupId: {
$dateFromParts: {
year: {$year: "$timestamp"},
month: {$month: "$timestamp"},
day: {$dayOfMonth: "$timestamp"},
hour: {$cond: [
{$gte: [
{$dateDiff: {
startDate: ISODate("2021-05-01T00:00:00.000Z"),
endDate: ISODate("2021-05-07T23:59:59.999Z"),
unit: "day"}}, 1]},
0,
{$hour: "$timestamp"}]}
}
}
}
},
{$group: {_id: {conversationId: "$conversationId", groupId: "$groupId"}}},
{$group: {_id: "$_id.groupId", doc_count: {$sum: 1}}},
{$project: {date: {$toString: "$_id"}, doc_count: 1, _id: 0}}
])
See how it works on the playground example
As suggested by #nimrodserok, for MongoDB version 4.2.9, which does not have $dateDiff (it was introduced in MongoDB 5.0), the query would be:
db.collection.aggregate([
{
$match: {
timestamp: {
$gte: ISODate("2021-05-01T00:00:00.000Z"),
$lte: ISODate("2021-05-07T23:59:59.999Z")
}
}
},
{
$project: {
conversationId: 1,
groupId: {
$dateFromParts: {
year: {
$year: "$timestamp"
},
month: {
$month: "$timestamp"
},
day: {
$dayOfMonth: "$timestamp"
},
hour: {
$cond: [
{
$gte: [
{
$subtract: [
{
$toLong: ISODate("2021-05-07T23:59:59.999Z")
},
{
$toLong: ISODate("2021-05-01T00:00:00.000Z")
}
]
},
86400000
]
},
0,
{
$hour: "$timestamp"
}
]
}
}
}
}
},
{
$group: {
_id: {
conversationId: "$conversationId",
groupId: "$groupId"
}
}
},
{
$group: {
_id: "$_id.groupId",
doc_count: {
$sum: 1
}
}
},
{
$project: {
date: {
$toString: "$_id"
},
doc_count: 1,
_id: 0
}
}
])
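To check the expected counts against the sample data without a database, the bucketing logic can be mirrored in plain JavaScript. This is only a sketch under assumptions: `countConversationsPerBucket` is a hypothetical helper, it follows the millisecond-difference rule of the 4.2.9 variant, and the shortened conversationIds stand in for the real ones.

```javascript
const DAY_MS = 86400000;

function countConversationsPerBucket(docs, start, end) {
  // Hourly buckets only when the requested range is shorter than one day.
  const hourly = end.getTime() - start.getTime() < DAY_MS;
  const seen = new Set();
  const counts = new Map();
  for (const doc of docs) {
    const ts = new Date(doc.timestamp);
    if (ts < start || ts > end) continue; // $match
    // $dateFromParts equivalent: truncate to the UTC day or hour.
    const bucket = ts.toISOString().slice(0, hourly ? 13 : 10);
    const key = bucket + "|" + doc.conversationId;
    if (seen.has(key)) continue; // first $group: one entry per conversation and bucket
    seen.add(key);
    counts.set(bucket, (counts.get(bucket) || 0) + 1); // second $group
  }
  return [...counts]
    .sort((a, b) => (a[0] < b[0] ? -1 : 1))
    .map(([date, doc_count]) => ({ date, doc_count }));
}

// Timestamps from the question; conversationIds shortened for readability.
const sampleDocs = [
  { conversationId: "a", timestamp: "2021-05-04T11:40:06.552Z" },
  { conversationId: "b", timestamp: "2021-05-05T11:40:30.897Z" },
  { conversationId: "c", timestamp: "2021-05-05T11:40:31.418Z" },
  { conversationId: "d", timestamp: "2021-05-06T11:43:58.174Z" },
  { conversationId: "d", timestamp: "2021-05-06T11:43:58.736Z" },
  { conversationId: "e", timestamp: "2021-05-06T13:37:55.060Z" },
];

const daily = countConversationsPerBucket(
  sampleDocs,
  new Date("2021-05-01T00:00:00.000Z"),
  new Date("2021-05-07T23:59:59.999Z")
);
console.log(daily);
// [ { date: '2021-05-04', doc_count: 1 },
//   { date: '2021-05-05', doc_count: 2 },
//   { date: '2021-05-06', doc_count: 2 } ]
```

Passing a range inside a single day (for example 2021-05-06T00:00:00.000Z to 2021-05-06T23:59:59.999Z) switches the same function to hourly keys such as "2021-05-06T11".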
I am new to MongoDB and am looking for expert views on the query I wrote. Basically, I am calculating the percentage of skills that were in demand in the past 90 days. The query gives the desired output, but I feel it could be optimized. Kindly point out my mistakes so that I can optimize this query with a better understanding of MongoDB.
Sample Document:
{
"_id":{
"$oid":"630a2ba9fe850ebc2d2f4a25"
},
"category":{
"name":"SQL",
"path":",web development,",
"id":{
"$oid":"62fe35f3f1793d2014bfe05f"
}
},
"createdAt":{
"$date":{
"$numberLong":"1661610921812"
}
}
}
My Query:
const SkillsInDemand = await Job.aggregate([
{
$match: { createdAt: { $gte: new Date((new Date().getTime() - (90 * 24 * 60 * 60 * 1000))) }}
},
{
$group: { _id: "$category.name",count: { $sum: 1 }}
},
{
$project: {count: "$count",total: {$sum:count} }
},
{
$project: { percentage: { $multiply: [{ $divide: ["$count", "$total" ] }, 100] }}
},
{
$sort: { percentage: -1 }
}
])
$group stage output:
[
{ _id: 'Flutter', count: 1 },
{ _id: 'SQL', count: 2 },
{ _id: 'Python', count: 9 }
]
1st $project stage output:
[
{ _id: 'Python', count: 9, total: 12 },
{ _id: 'Flutter', count: 1, total: 12 },
{ _id: 'SQL', count: 2, total: 12 }
]
2nd $project stage output:
[
{ _id: 'Python', percentage: 75 },
{ _id: 'Flutter', percentage: 8.333333333333332 },
{ _id: 'SQL', percentage: 16.666666666666664 }
]
Final Output:
[
{ _id: 'Python', percentage: 75 },
{ _id: 'SQL', percentage: 16.666666666666664 },
{ _id: 'Flutter', percentage: 8.333333333333332 }
]
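A $project stage works on one document at a time, so a grand total across all skills usually has to come from a $group with _id: null. The sketch below shows one possible way to do that; it is an assumption-labelled alternative, not the query from the post. `percentageStages` is a hypothetical name for the stages that would follow the first $group, and `toPercentages` is a plain-JavaScript mirror of the arithmetic for checking the numbers.

```javascript
// Assumption: these stages replace the two $project stages and run after
// the $group that produces one { _id: <skill>, count: <n> } document per skill.
const percentageStages = [
  // Collapse all skill groups into one document carrying the grand total.
  { $group: {
      _id: null,
      total: { $sum: "$count" },
      skills: { $push: { name: "$_id", count: "$count" } },
  } },
  // Fan back out to one document per skill, each still carrying `total`.
  { $unwind: "$skills" },
  { $project: {
      _id: "$skills.name",
      percentage: { $multiply: [{ $divide: ["$skills.count", "$total"] }, 100] },
  } },
  { $sort: { percentage: -1 } },
];

// Plain-JS mirror of the same arithmetic; no MongoDB connection needed.
function toPercentages(groups) {
  const total = groups.reduce((sum, g) => sum + g.count, 0);
  return groups
    .map((g) => ({ _id: g._id, percentage: (g.count / total) * 100 }))
    .sort((a, b) => b.percentage - a.percentage);
}

const skillShares = toPercentages([
  { _id: "Flutter", count: 1 },
  { _id: "SQL", count: 2 },
  { _id: "Python", count: 9 },
]);
console.log(skillShares.map((s) => s._id)); // [ 'Python', 'SQL', 'Flutter' ]
```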
I'm trying to return the size of 'orders' and the sum of the 'item' values over all orders, from documents like this example:
orders: [
{
order_id: 1,
items: [
{
item_id: 1,
value:100
},
{
item_id: 2,
value:200
}
]
},
{
order_id: 2,
items: [
{
item_id: 3,
value:300
},
{
item_id: 4,
value:400
}
]
}
]
I'm using the following aggregation to return them. Everything works fine except that I can't get the size of the 'orders' array: after $unwind, 'orders' is turned into an object, so I can't call $size on it.
db.users.aggregate([
{
$unwind: "$orders"
},
{
$project: {
_id: 0,
total_values: {
$reduce: {
input: "$orders.items",
initialValue: 0,
in: { $add: ["$$value", "$$this.value"] }
}
},
order_count: {$size: '$orders'}, //I get 'The argument to $size must be an array, but was of type: object' error
}
},
])
The result I expected is:
{order_count:2, total_values:1000} //For example document
{order_count:3, total_values:1500}
{order_count:5, total_values:2500}
I found a way to get the results that I wanted. Here is the code
db.users.aggregate([
{
$project: {
_id: 1, orders: 1, order_count: { $size: '$orders' }
}
},
{ $unwind: '$orders' },
{
$project: {
_id: '$_id', items: '$orders.items', order_count: '$order_count'
}
},
{ $unwind: '$items' },
{
$project: {
_id: '$_id', sum: { $sum: '$items.value' }, order_count: '$order_count'
}
},
{
$group: {
_id: { _id: '$_id', order_count: '$order_count' }, total_values: { $sum: '$sum' }
}
},
])
output:
{ _id: { _id: ObjectId("5dffc33002ef525620ef09f1"), order_count: 2 }, total_values: 1000 }
{ _id: { _id: ObjectId("5dffc33002ef525620ef09f2"), order_count: 3 }, total_values: 1500 }
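The same numbers can probably be obtained without any $unwind, since $size and $sum both accept arrays directly. What follows is a sketch of that idea rather than the approach found above: `singleStage` is a hypothetical name for a one-stage alternative, and `summarizeOrders` is a plain-JavaScript mirror used to check the arithmetic against the example document.

```javascript
// Assumption: a single $project could replace the unwind/group chain.
// "$$order.items.value" resolves to the array of values of one order.
const singleStage = {
  $project: {
    _id: 0,
    order_count: { $size: "$orders" },
    total_values: {
      $sum: {
        $map: { input: "$orders", as: "order", in: { $sum: "$$order.items.value" } },
      },
    },
  },
};

// Plain-JS mirror: count the orders and add up every item value.
function summarizeOrders(user) {
  const total = user.orders.reduce(
    (sum, order) => sum + order.items.reduce((s, item) => s + item.value, 0),
    0
  );
  return { order_count: user.orders.length, total_values: total };
}

// The example document from the question.
const exampleUser = {
  orders: [
    { order_id: 1, items: [{ item_id: 1, value: 100 }, { item_id: 2, value: 200 }] },
    { order_id: 2, items: [{ item_id: 3, value: 300 }, { item_id: 4, value: 400 }] },
  ],
};

console.log(summarizeOrders(exampleUser)); // { order_count: 2, total_values: 1000 }
```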
I would like to count the status and group them by country.
Data:
[
{ id: 100, status: 'ordered', country: 'US', items: [] },
{ id: 101, status: 'ordered', country: 'UK', items: [] },
{ id: 102, status: 'shipped', country: 'UK', items: [] },
]
Desired aggregation outcome:
[
{ _id: 'US', status: { ordered: 1} },
{ _id: 'UK', status: { ordered: 1, shipped: 1 } }
]
I can $count and $group, but I am not sure how to put this together. Any hint is appreciated.
Thanks,
bluepuama
$group by country and status, and count total
$group by only country and construct array of status and count in key-value format
$set to update status field to object using $arrayToObject
db.collection.aggregate([
{
$group: {
_id: { country: "$country", status: "$status" },
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.country",
status: { $push: { k: "$_id.status", v: "$count" } }
}
},
{ $set: { status: { $arrayToObject: "$status" } } }
])
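The three steps above can be mirrored in plain JavaScript to check the expected shape against the sample data. This is only an illustrative sketch; `countStatusByCountry` is a hypothetical helper, and unlike $group it returns countries in first-seen order.

```javascript
function countStatusByCountry(docs) {
  // First $group: count documents per (country, status) pair.
  const pairCounts = new Map();
  for (const { country, status } of docs) {
    const key = JSON.stringify([country, status]);
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }
  // Second $group: collect { k, v } pairs per country.
  const perCountry = new Map();
  for (const [key, count] of pairCounts) {
    const [country, status] = JSON.parse(key);
    if (!perCountry.has(country)) perCountry.set(country, []);
    perCountry.get(country).push({ k: status, v: count });
  }
  // $set with $arrayToObject: turn each [{ k, v }] array into an object.
  return [...perCountry].map(([country, pairs]) => ({
    _id: country,
    status: Object.fromEntries(pairs.map(({ k, v }) => [k, v])),
  }));
}

// Sample data from the question.
const sampleOrders = [
  { id: 100, status: "ordered", country: "US", items: [] },
  { id: 101, status: "ordered", country: "UK", items: [] },
  { id: 102, status: "shipped", country: "UK", items: [] },
];

console.log(countStatusByCountry(sampleOrders));
// [ { _id: 'US', status: { ordered: 1 } },
//   { _id: 'UK', status: { ordered: 1, shipped: 1 } } ]
```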
If the set of statuses is known in advance, you can do it with a single $group stage like so:
db.collection.aggregate([
{
$group: {
_id: "$country",
"shipped": {
$sum: {
$cond: [
{
$eq: [
"$status",
"shipped"
]
},
1,
0
]
}
},
"ordered": {
$sum: {
$cond: [
{
$eq: [
"$status",
"ordered"
]
},
1,
0
]
}
}
}
},
{
$project: {
_id: 1,
status: {
shipped: "$shipped",
ordered: "$ordered"
}
}
}
])
I need to match documents in which at least one of two fields is not equal to zero. How can I implement this?
I tried these solutions, but with no luck:
Solution 1:
Model.aggregate([
{
$project: {
accountID: "$_id.accountID",
locationID: "$_id.locationID",
time: "$_id.time",
value: "$value",
actualValue: "$actualValue",
total: { $add: ["$value", "$actualValue"] },
},
},
{
$match: {
total: { $ne: 0 },
},
},
])
This solution goes wrong when a negative value is added to its positive counterpart. For example, -1500 + 1500 becomes zero.
Solution 2:
Model.aggregate([
{
$group: {
_id: {
accountID: "$accountID",
locationID: "$locationID",
time: "$time",
},
value: { $sum: "$values.val" },
actualValue: { $sum: "$values.actualVal" },
},
},
{
$addFields: {
absVal: { $abs: "$value" },
absActualVal: { $abs: "$actualValue" },
},
},
{
$project: {
accountID: "$_id.accountID",
locationID: "$_id.locationID",
time: "$_id.time",
value: "$value",
actualValue: "$actualValue",
total: { $add: ["$absVal", "$absActualVal"] },
},
},
{
$match: {
total: { $ne: 0 },
},
},
])
It works, but it costs an extra second (4.5 s instead of 3.5 s) when searching 1 million documents.
Any suggestions? Thank you in advance.
Some basic boolean logic should suffice, use something like:
Model.aggregate([
{
$match: {
$or: [
{
value: {$ne: 0}
},
{
actualValue: {$ne: 0}
}
]
}
},
{
$project: {
accountID: "$_id.accountID",
locationID: "$_id.locationID",
time: "$_id.time",
value: "$value",
actualValue: "$actualValue",
total: {$add: ["$value", "$actualValue"]},
},
}
])
If you care about efficiency make sure you have a compound index that covers both value and actualValue.
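As a quick sanity check of the boolean logic, the two conditions can be compared in plain JavaScript. The `rows` array below is made-up illustration data, not taken from the question; it only shows why the $or form keeps pairs that cancel out while the sum-based check drops them.

```javascript
// Hypothetical rows, including a pair of nonzero values that cancel out.
const rows = [
  { value: -1500, actualValue: 1500 },
  { value: 0, actualValue: 0 },
  { value: 0, actualValue: 20 },
];

// Solution 1 from the question: the sum must be nonzero.
const bySum = rows.filter((r) => r.value + r.actualValue !== 0);

// The $or form: at least one of the two fields is nonzero.
const byOr = rows.filter((r) => r.value !== 0 || r.actualValue !== 0);

console.log(bySum.length); // 1
console.log(byOr.length);  // 2
```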