I have a MongoDB database with many users. One of the subdocuments I track is file uploads and their statuses through a review process. Every file upload will eventually have an attachment status. I want to pull some metrics: the total count of each current status across all uploaded files. I started building an aggregation query that pulls the latest attachment status subdocument from each uploaded file and counts them.
The data structure is as follows:
"userName": "johnDoe",
"email": "johnDoe#gmail.com",
"uploads" : [
{
"_id" : ObjectId("adh12451e0012ce9da0"),
"fileName" : "TestDoc.txt",
"fileType" : "text/plain",
"created" : ISODate("2021-01-06T15:26:14.166Z"),
"attachmentStatus" : [ ]
},
{
"_id" : ObjectId("5ff5d6c066cacc0012ed655a"),
"fileName" : "testerABC.txt",
"fileType" : "text/plain",
"created" : ISODate("2021-01-06T15:26:56.027Z"),
"attachmentStatus" : [
{
"_id" : ObjectId("60884f733f88bd00129b9ad4"),
"status" : "Uploaded",
"date" : ISODate("2021-04-22T02:23:00Z")
},
{
"_id" : ObjectId("60884f733f88bd00129b9ad5"),
"status" : "Processing",
"date" : ISODate("2021-04-26T04:54:00Z")
}
]
},
{
"_id" : ObjectId("6075c82a19fdcc0012f81907"),
"fileName" : "Test file.docx",
"fileType" : "application/word",
"created" : ISODate("2021-04-13T16:34:50.955Z"),
"attachmentStatus" : [
{
"_id" : ObjectId("72844f733f88bd11479b9ad7"),
"status" : "Uploaded",
"date" : ISODate("2021-04-23T03:42:00Z")
},
{
"_id" : ObjectId("724986d73f88bd00147c9wt8"),
"status" : "Completed",
"date" : ISODate("2021-04-24T01:37:00Z")
}
]
}
]
"userName": "janeDoe",
"email": "janeDoe#gmail.com",
"uploads" : [
{
"_id" : ObjectId("ej9784652h0012ce9da0"),
"fileName" : "myResume.txt",
"fileType" : "text/plain",
"created" : ISODate("2021-02-13T12:36:14.166Z"),
"attachmentStatus" : [
{
"_id" : ObjectId("15dhdf6f88bd00147c9wt8"),
"status" : "Completed",
"date" : ISODate("2021-04-24T01:37:00Z")
}
]
},
How can I pull the latest attachment status out for each file uploaded and then summarize the statuses?
I want something like this:
{ "status" : "Completed", "Count" : 2 }
{ "status" : "Processing", "Count" : 1 }
...
I get very close with this aggregation query, but it grabs each and every status, not just the single most current status for each file (one current status per file).
db.myDB.aggregate([
{
"$match" : {
"uploads.attachmentStatus": {
"$elemMatch": { "status": { "$exists": true } }
}
}
},
{ $unwind: "$uploads"},
{ $unwind: "$uploads.attachmentStatus"},
{
$sortByCount: "$uploads.attachmentStatus.status"
},
{
$project: {
_id:0,
status: "$_id",
Count: "$count"
}
}
]).pretty();
Any suggestions?
Demo - https://mongoplayground.net/p/zzOR9qhqny0
{ $sort: { "uploads.attachmentStatus.date": -1 } } sorts the unwound statuses by date, newest first.
{ $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } } groups the records by uploads._id and takes the first status in each group, which is the latest status after the descending sort by date.
Query
{ $sort: { "uploads.attachmentStatus.date": -1 } },
{ $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } },
Complete query
db.collection.aggregate([
{ $match: { "uploads.attachmentStatus": { "$elemMatch": { "status": { "$exists": true } } } } },
{ $unwind: "$uploads" },
{ $unwind: "$uploads.attachmentStatus" },
{ $sort: { "uploads.attachmentStatus.date": -1 } },
{ $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } },
{ $sortByCount: "$status" },
{ $project: { _id: 0, status: "$_id", Count: "$count" } }
])
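To sanity-check what the pipeline should return, the same logic can be mirrored in plain JavaScript on in-memory data. This is only a sketch, not part of the answer's query: `users` is a trimmed copy of the sample documents from the question, and `countLatestStatuses` is a hypothetical helper name.

```javascript
// Plain JavaScript mirror of the pipeline, run on in-memory data (no MongoDB needed).
// `users` is a trimmed, hand-copied version of the sample documents in the question.
const users = [
  {
    userName: "johnDoe",
    uploads: [
      { fileName: "TestDoc.txt", attachmentStatus: [] },
      {
        fileName: "testerABC.txt",
        attachmentStatus: [
          { status: "Uploaded", date: new Date("2021-04-22T02:23:00Z") },
          { status: "Processing", date: new Date("2021-04-26T04:54:00Z") },
        ],
      },
      {
        fileName: "Test file.docx",
        attachmentStatus: [
          { status: "Uploaded", date: new Date("2021-04-23T03:42:00Z") },
          { status: "Completed", date: new Date("2021-04-24T01:37:00Z") },
        ],
      },
    ],
  },
  {
    userName: "janeDoe",
    uploads: [
      {
        fileName: "myResume.txt",
        attachmentStatus: [
          { status: "Completed", date: new Date("2021-04-24T01:37:00Z") },
        ],
      },
    ],
  },
];

// Hypothetical helper: one latest status per upload, then a count per status.
function countLatestStatuses(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const upload of doc.uploads) {
      // $unwind drops uploads whose attachmentStatus array is empty.
      if (upload.attachmentStatus.length === 0) continue;
      // The newest entry is what $sort (date: -1) followed by $first selects.
      const latest = upload.attachmentStatus.reduce((a, b) =>
        b.date > a.date ? b : a
      );
      counts.set(latest.status, (counts.get(latest.status) || 0) + 1);
    }
  }
  // Like $sortByCount: highest count first.
  return [...counts.entries()]
    .map(([status, Count]) => ({ status, Count }))
    .sort((a, b) => b.Count - a.Count);
}

console.log(countLatestStatuses(users));
```

On this sample the upload with an empty attachmentStatus array contributes nothing, which matches how $unwind behaves in the pipeline.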
Related
I have multiple documents of order logs and I am trying to compress them into one document using $objectToArray. Below is the collection and the output I'm trying to figure out. I also include my query but it does not compress the data.
{
"ordernumber": 21001,
"ordername":"testorder1",
"status" : "Ordered",
"modifiedat" : ISODate("2021-06-30T17:02:17.165Z")
},
{
"ordernumber": 21001,
"ordername":"testorder1",
"status" : "Order Received",
"modifiedat" : ISODate("2021-07-01T03:57:47.533Z")
},
{
"ordernumber": 21001,
"ordername":"testorder1",
"status" : "Delivered",
"modifiedat" : ISODate("2021-08-17T23:53:24.878Z")
},
{
"ordernumber": 21002,
"ordername":"testorder2",
"status" : "Ordered",
"modifiedat" : ISODate("2021-07-17T23:53:24.878Z")
},
{
"ordernumber": 21002,
"ordername":"testorder2",
"status" : "Order Received",
"modifiedat" : ISODate("2021-07-19T04:07:47.686Z")
},
{
"ordernumber": 21002,
"ordername":"testorder2",
"status" : "Order Cancelled",
"modifiedat" : ISODate("2021-07-20T15:42:23.123Z")
},
Each ordernumber should have all of its logs in one document.
OUTPUT:
{
"ordernumber": 21001,
"ordername":"testorder1",
"orderlogs": [
{
"status" : "Ordered",
"modifiedat" : ISODate("2021-06-30T17:02:17.165Z")
},
{
"status" : "Order Received",
"modifiedat" : ISODate("2021-07-01T03:57:47.533Z")
},
{
"status" : "Delivered",
"modifiedat" : ISODate("2021-08-17T23:53:24.878Z")
}
]
},
{
"ordernumber": 21002,
"ordername":"testorder2",
"orderlogs": [
{
"status" : "Ordered",
"modifiedat" : ISODate("2021-07-17T23:53:24.878Z")
},
{
"status" : "Order Received",
"modifiedat" : ISODate("2021-07-19T04:07:47.686Z")
},
{
"status" : "Order Cancelled",
"modifiedat" : ISODate("2021-07-20T15:42:23.123Z")
}
]
},
I have created a query, but it returns only one array per document (the results are still in multiple documents).
{
$project: {
ordernumber: "$ordernumber",
ordername:"$ordername",
orderlogs:
{$objectToArray: {
status:"$status",
modifiedat: "$modifiedat"
}
}
}
}
$addFields - Add the current document ($$ROOT) as a new field, orderlog
$project - Exclude _id, ordernumber, and ordername from orderlog
$group - Group by ordernumber and ordername, pushing each orderlog into orderlogs
$project - Display the ordernumber, ordername, and orderlogs fields
db.collection.aggregate([
{
"$addFields": {
"orderlog": "$$ROOT"
}
},
{
$project: {
"orderlog": {
"_id": 0,
"ordernumber": 0,
"ordername": 0
}
}
},
{
$group: {
_id: {
ordernumber: "$ordernumber",
ordername: "$ordername"
},
orderlogs: {
$push: "$orderlog"
}
}
},
{
$project: {
_id: 0,
ordernumber: "$_id.ordernumber",
ordername: "$_id.ordername",
orderlogs: "$orderlogs"
}
}
])
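The effect of the four stages can be checked without a database by mirroring them in plain JavaScript. This is a rough sketch under assumptions: `logs` is a trimmed copy of the sample documents with dates kept as ISO strings rather than ISODate values, and `groupOrderLogs` is a hypothetical helper name.

```javascript
// In-memory mirror of the $group/$push step, using a trimmed copy of the sample data.
// Dates are plain ISO strings here; in MongoDB they would be ISODate values.
const logs = [
  { ordernumber: 21001, ordername: "testorder1", status: "Ordered", modifiedat: "2021-06-30T17:02:17.165Z" },
  { ordernumber: 21001, ordername: "testorder1", status: "Order Received", modifiedat: "2021-07-01T03:57:47.533Z" },
  { ordernumber: 21001, ordername: "testorder1", status: "Delivered", modifiedat: "2021-08-17T23:53:24.878Z" },
  { ordernumber: 21002, ordername: "testorder2", status: "Ordered", modifiedat: "2021-07-17T23:53:24.878Z" },
  { ordernumber: 21002, ordername: "testorder2", status: "Order Received", modifiedat: "2021-07-19T04:07:47.686Z" },
  { ordernumber: 21002, ordername: "testorder2", status: "Order Cancelled", modifiedat: "2021-07-20T15:42:23.123Z" },
];

// Hypothetical helper: one output document per (ordernumber, ordername) pair.
function groupOrderLogs(docs) {
  const byOrder = new Map();
  for (const { ordernumber, ordername, status, modifiedat } of docs) {
    const key = `${ordernumber}|${ordername}`; // plays the role of the $group _id
    if (!byOrder.has(key)) {
      byOrder.set(key, { ordernumber, ordername, orderlogs: [] });
    }
    // Only status and modifiedat go into the log entry, as in the $project stage.
    byOrder.get(key).orderlogs.push({ status, modifiedat });
  }
  return [...byOrder.values()];
}

console.log(JSON.stringify(groupOrderLogs(logs), null, 2));
```

Unlike this sketch, $group does not guarantee the order of its output documents, so a trailing $sort may be needed if order matters.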
Sample collection data:
{
"_id" : ObjectId("5f30df23243ffsdfwer3d14568bf"),
"value" : {
"busId" : 200.0,
"status" : {
"code" : {
"id" : 1.0,
"key" : "2100",
"value" : "Complete"
}
}
}
}
My query provides the right result, but I would like to condense the output further using additional grouping, $project, or other aggregation operators.
Mongo query:
db.suraj_coll.aggregate([
{
$addFields: {
"value.available": {
$cond: [
{
$in: [
"$value.status.code.value",
[
"Accept",
"Complete"
]
]
},
"Approved",
"Rejected"
]
}
}
},
{
"$group": {
"_id": {
busID: "$value.busId",
status: "$value.available"
},
"subtotal": {
$sum: 1
}
}
}
])
Output:
/* 1 */
{
"_id" : {
"busID" : 200.0,
"status" : "Approved"
},
"subtotal" : 3.0
}
/* 2 */
{
"_id" : {
"busID" : 200.0,
"status" : "Rejected"
},
"subtotal" : 1.0
}
Is it possible to condense the output further with additional grouping?
The output should look like this:
{
"_id" : {
"busID" : 200.0,
"Approved" : 3.0,
"Rejected" : 1.0
}
}
I tried $project, keeping the count in a document, but couldn't place the count against Approved or Rejected.
Any suggestions would be great.
You can add two more pipeline stages after your query:
$group - Group by busID and push each status and its count into a status array as k/v pairs
$project - Convert the status array to an object using $arrayToObject and merge it with busID using $mergeObjects
{
$group: {
_id: "$_id.busID",
status: {
$push: {
k: "$_id.status",
v: "$subtotal"
}
}
}
},
{
$project: {
_id: {
$mergeObjects: [
{ busID: "$_id" },
{ $arrayToObject: "$status" }
]
}
}
}
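As a rough illustration of what these two stages do, here is the same pivot in plain JavaScript. It is a sketch, not the pipeline itself: `grouped` is a hand-copied version of the two documents from the question's output, and `pivotByBus` is a hypothetical helper name.

```javascript
// Input: the two documents produced by the question's $group stage (hand-copied).
const grouped = [
  { _id: { busID: 200, status: "Approved" }, subtotal: 3 },
  { _id: { busID: 200, status: "Rejected" }, subtotal: 1 },
];

// Hypothetical helper mirroring the extra $group and $project stages.
function pivotByBus(rows) {
  const byBus = new Map();
  for (const { _id, subtotal } of rows) {
    if (!byBus.has(_id.busID)) byBus.set(_id.busID, []);
    // Same key/value pairs that $push collects as { k, v }.
    byBus.get(_id.busID).push([_id.status, subtotal]);
  }
  return [...byBus.entries()].map(([busID, pairs]) => ({
    // Object.fromEntries plays the role of $arrayToObject,
    // and the object spread plays the role of $mergeObjects.
    _id: { busID, ...Object.fromEntries(pairs) },
  }));
}

console.log(JSON.stringify(pivotByBus(grouped)));
```

Each distinct status becomes a field name, so the shape of the result depends on the status values present in the data.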
How can I get each group's percentage of the total when grouping by date in MongoDB?
Link example : https://mongoplayground.net/p/aNND4EPQhcb
I have a collection structured like this:
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4b"),
"date" : "2019-05-03T10:39:53.108Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4c"),
"date" : "2019-05-03T10:39:53.133Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4d"),
"date" : "2019-05-03T10:39:53.180Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4e"),
"date" : "2019-05-03T10:39:53.218Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
I have a MongoDB query that gets data from the collection. How can I get each group's percentage of the total? The example query below gets the data:
db.name_collection.aggregate(
[
{ "$match": {
"update_at": { "$gte": "2019-11-04T00:00:00.0Z", "$lt": "2019-11-06T00:00:00.0Z"},
"id": { "$in": [166] }
} },
{
"$group" : {
"_id": {
$substr: [ '$update_at', 0, 10 ]
},
"count" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"date" : "$_id",
"count" : "$count"
}
},
{
"$sort" : {
"date" : 1
}
}
]
)
This is the response:
{
"date" : "2019-11-04",
"count" : 39
},
{
"date" : "2019-11-05",
"count" : 135
}
How can I get each row's percentage of the total from the count key? Example of the desired response:
{
"date" : "2019-11-04",
"count" : 39,
"percentage" : "22%"
},
{
"date" : "2019-11-05",
"count" : 135,
"percentage" : "78%"
}
You have to group by null to get the total count and then use $map to calculate each percentage. $round is a useful operator here. Finally, you can $unwind and $replaceRoot to get back the same number of documents:
db.collection.aggregate([
// previous aggregation steps
{
$group: {
_id: null,
total: { $sum: "$count" },
docs: { $push: "$$ROOT" }
}
},
{
$project: {
docs: {
$map: {
input: "$docs",
in: {
date: "$$this.date",
count: "$$this.count",
percentage: { $concat: [ { $toString: { $round: { $multiply: [ { $divide: [ "$$this.count", "$total" ] }, 100 ] } } }, '%' ] }
}
}
}
}
},
{
$unwind: "$docs"
},
{
$replaceRoot: { newRoot: "$docs" }
}
])
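The arithmetic behind these stages can be checked with a small plain-JavaScript sketch. `daily` is copied from the response shown in the question, and `addPercentages` is a hypothetical helper name. Note one assumption: Math.round and $round may treat exact .5 ties differently, which does not matter for this sample.

```javascript
// Rows as returned by the question's pipeline (date + count per day).
const daily = [
  { date: "2019-11-04", count: 39 },
  { date: "2019-11-05", count: 135 },
];

// Hypothetical helper: total first (the $group by null), then one percentage per row ($map).
function addPercentages(rows) {
  const total = rows.reduce((sum, row) => sum + row.count, 0);
  return rows.map((row) => ({
    ...row,
    // count / total * 100, rounded to a whole number and suffixed with "%".
    percentage: `${Math.round((row.count / total) * 100)}%`,
  }));
}

console.log(addPercentages(daily));
```

With a total of 174, the two days come out as 39 / 174 ≈ 22.4% and 135 / 174 ≈ 77.6% before rounding.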
I'm trying to clean a huge database.
Sample DB :
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:22:31.254Z"),
"_id" : ObjectId("5a0062170f3c330012bafe77"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-06T13:32:41.084Z"),
"_id" : ObjectId("5a0064790f3c330012baff03"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff32"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
I have a lot of duplicate values, but I need to remove duplicates only within each day.
For example, to obtain this:
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
How can I aggregate by day and then delete the later duplicate values?
I need to keep the values for each day even if they are identical to those of another day.
At the time of writing, the aggregation framework cannot update documents in place. However, you can use the following aggregation pipeline to compute the desired output and then apply it with, for example, a bulk replace:
db.collection.aggregate({
$unwind: "$tracking" // flatten the "tracking" array into separate documents
}, {
$sort: {
"tracking.timeCheck": 1 // sort by timeCheck to allow us to use the $first operator in the next stage reliably
}
}, {
$group: {
_id: { // group by
"_id": "$_id", // "_id" and
"rank": "$tracking.rank", // "rank" and
"date": { // the "date" part of the "timeCheck" field
$dateFromParts : {
year: { $year: "$tracking.timeCheck" },
month: { $month: "$tracking.timeCheck" },
day: { $dayOfMonth: "$tracking.timeCheck" }
}
}
},
"doc": { $first: "$$ROOT" } // only keep the first document per group
}
}, {
$sort: {
"doc.tracking.timeCheck": 1 // restore ascending sort order - may or may not be needed...
}
}, {
$group: {
_id: "$_id._id", // merge everything again per "_id"
"addedAt": { $first: "$doc.addedAt" },
"__v": { $first: "$doc.__v" },
"check": { $first: "$doc.check" },
"lastCheck": { $first: "$doc.lastCheck" },
"tracking": { $push: "$doc.tracking" } // in order to join the tracking values into an array again
}
})
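The intended grouping (one entry per document, rank, and calendar day, keeping the earliest) can be checked in plain JavaScript. This is a sketch under assumptions: `tracking` is a trimmed copy of the sample array without the _id fields, `dedupePerDay` is a hypothetical helper name, and days are taken in UTC, which is also the default for MongoDB's date operators.

```javascript
// Trimmed copy of the sample "tracking" array (subdocument _id fields omitted).
const tracking = [
  { timeCheck: new Date("2017-11-06T13:17:20.861Z"), rank: 2395 },
  { timeCheck: new Date("2017-11-06T13:22:31.254Z"), rank: 2395 },
  { timeCheck: new Date("2017-11-06T13:27:40.551Z"), rank: 2379 },
  { timeCheck: new Date("2017-11-06T13:32:41.084Z"), rank: 2395 },
  { timeCheck: new Date("2017-11-06T13:37:51.012Z"), rank: 2379 },
  { timeCheck: new Date("2017-11-07T13:37:51.012Z"), rank: 2379 },
];

// Hypothetical helper: keep the earliest entry for each (UTC day, rank) pair.
function dedupePerDay(entries) {
  const seen = new Set();
  return [...entries]
    .sort((a, b) => a.timeCheck - b.timeCheck) // ascending, like the first $sort
    .filter((entry) => {
      // "YYYY-MM-DD" in UTC plus the rank forms the group key.
      const key = `${entry.timeCheck.toISOString().slice(0, 10)}|${entry.rank}`;
      if (seen.has(key)) return false; // a later duplicate within the same day
      seen.add(key);
      return true; // first occurrence, like $first per group
    });
}

console.log(dedupePerDay(tracking));
```

On the sample this keeps three entries: rank 2395 and rank 2379 on 2017-11-06, and rank 2379 on 2017-11-07, which is the output requested in the question.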
I have the following dataset:
{
"_id" : ObjectId("59668a22734d1d48cf34de08"),
"name" : "Nobody Cares",
"menus" : [
{
"_id" : "menu_123",
"name" : "Weekend Menu",
"description" : "A menu for the weekend",
"groups" : [
{
"name" : "Spirits",
"has_mixers" : true,
"sizes" : [
"Single",
"Double"
],
"categories" : [
{
"name" : "Vodka",
"description" : "Maybe not necessary?",
"drinks" : [
{
"_id" : "drink_123",
"name" : "Absolut",
"description" : "Fancy ass vodka",
"sizes" : [
{
"_id" : "size_123",
"size" : "Single",
"price" : 300
}
]
}
]
}
]
}
],
"mixers" : [
{
"_id" : "mixer_1",
"name" : "Coca Cola",
"price" : 150
},
{
"_id" : "mixer_2",
"name" : "Lemonade",
"price" : 120
}
]
}
]
}
I'm attempting to retrieve a single drink from that dataset using the following aggregation query:
db.getCollection('places').aggregate([
{ $match : {"menus.groups.categories.drinks._id" : "drink_123"} },
{ $unwind: "$menus" },
{ $project: { "_id": 1, "menus": { "groups": { "categories": { "drinks": { "name": 1 } } } } } }
])
However, it returns the full nested structure of the dataset wrapped around the correct data.
So instead of:
{
"_id": "drink_123",
"name": "Absolut"
}
I get:
{
"_id": ObjectId("59668a22734d1d48cf34de08"),
"menus": {
"groups": {
"categories": {
"drinks": { "name": "Absolut" }
}
}
}
}
For example. Any ideas how to just retrieve the subdocument?
If you need to retain the deeply nested model then this call will produce the desired output:
db.getCollection('places').aggregate([
{ $match : {"menus.groups.categories.drinks._id" : "drink_123"} },
{ $project: {"_id": '$menus.groups.categories.drinks._id', name: '$menus.groups.categories.drinks.name'}},
{ $unwind: "$name" },
{ $unwind: "$name" },
{ $unwind: "$name" },
{ $unwind: "$name" },
{ $unwind: "$_id" },
{ $unwind: "$_id" },
{ $unwind: "$_id" },
{ $unwind: "$_id" }
])
The numerous unwinds are the result of the deep nesting of the drinks subdocuments.
Though, FWIW, this sort of query does perhaps suggest that the model isn't 'read friendly'.
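To show what the repeated unwinding amounts to, here is a plain JavaScript sketch that flattens the four nesting levels and picks one drink by _id. It runs on an in-memory document rather than in MongoDB: `place` is a trimmed copy of the sample dataset, and `findDrink` is a hypothetical helper name.

```javascript
// Trimmed copy of the sample document: menus -> groups -> categories -> drinks.
const place = {
  name: "Nobody Cares",
  menus: [
    {
      _id: "menu_123",
      groups: [
        {
          name: "Spirits",
          categories: [
            {
              name: "Vodka",
              drinks: [
                { _id: "drink_123", name: "Absolut", description: "Fancy ass vodka" },
              ],
            },
          ],
        },
      ],
    },
  ],
};

// Hypothetical helper: each flatMap removes one level of array nesting,
// which is what each $unwind does in the pipeline.
function findDrink(doc, drinkId) {
  const drinks = doc.menus
    .flatMap((menu) => menu.groups)
    .flatMap((group) => group.categories)
    .flatMap((category) => category.drinks);
  const match = drinks.find((drink) => drink._id === drinkId);
  // Return only the two fields the question asks for, or null if nothing matches.
  return match ? { _id: match._id, name: match.name } : null;
}

console.log(findDrink(place, "drink_123"));
```

The sketch filters on the drink's _id after flattening, so it still returns a single drink when a category contains several.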