I have the data set below in my MongoDB collection. I need to find the latest document for each value of the field "eventType".
{
"_id" : ObjectId("5d5690843248b8c20481f5e9"),
"mrn" : "xp35",
"eventType" : "LAB",
"eventSubType" : "CBC",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:21.393Z")
}
{
"_id" : ObjectId("5d5690843248b8c20481f5e9"),
"mrn" : "xp35",
"eventType" : "LAB",
"eventSubType" : "CBB",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:22.393Z")
}
{
"_id" : ObjectId("5d5690843248b8c20481f5ea"),
"mrn" : "zfwy",
"eventType" : "EDLIST",
"eventSubType" : "Lipids",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:23.394Z")
}
{
"_id" : ObjectId("5d5690843248b8c20481f5ea"),
"mrn" : "zfwy",
"eventType" : "EDLIST",
"eventSubType" : "L",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:24.394Z")
}
I used 'aggregation' and 'find' queries and sorted on the timestamp field "charttime" to fetch the latest data, but it is not working. I need to fetch data based on the field "eventType" so that for each 'eventType' I get the latest document. So in the given example, I should get the latest data for "LAB" and "EDLIST". Ideally, it should return:
{
"_id" : ObjectId("5d5690843248b8c20481f5e9"),
"mrn" : "xp35",
"eventType" : "LAB",
"eventSubType" : "CBB",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:22.393Z")
}
{
"_id" : ObjectId("5d5690843248b8c20481f5ea"),
"mrn" : "zfwy",
"eventType" : "EDLIST",
"eventSubType" : "L",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:24.394Z")
}
Follow the steps below:
Sort all documents by charttime, newest first.
Group them by eventType, taking the first (latest) document of each group.
Project to move id back into _id (not necessary if you are OK with the id key).
Sort the results again (not necessary if you do not need the different eventType results ordered by date).
db.collection.aggregate([
{ $sort: {"charttime": -1 }}, // newest first, so $first picks the latest document per group
{ $group: {
_id: "$eventType",
id: {$first: "$_id"},
"mrn": {$first: "$mrn"},
"eventType": {$first: "$eventType"},
"eventSubType": {$first: "$eventSubType"},
"value": {$first: "$value"},
"units": {$first: "$units"},
"charttime": {$first: "$charttime"}
}},
{$project: {
_id: "$id",
"mrn": 1,
"eventType": 1,
"eventSubType": 1,
"value": 1,
"units": 1,
"charttime": 1
}},
{ $sort: {"charttime": 1 }}
])
Hope this helps!
Output:
/* 1 */
{
"_id" : ObjectId("5d5cedb1fc18699f18a24fa2"),
"mrn" : "xp35",
"eventType" : "LAB",
"eventSubType" : "CBB",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:22.393Z")
}
/* 2 */
{
"_id" : ObjectId("5d5cedc1fc18699f18a24fa9"),
"mrn" : "zfwy",
"eventType" : "EDLIST",
"eventSubType" : "L",
"value" : 1,
"units" : 1,
"charttime" : ISODate("2019-08-16T16:46:24.394Z")
}
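To make clear what the sort-then-group pipeline computes, here is a minimal sketch of the same idea over an in-memory array. It is plain JavaScript rather than MongoDB shell code, the helper name latestPerGroup is hypothetical, and the sample documents are trimmed copies of the ones in the question.

```javascript
// Plain JavaScript over an in-memory array; not MongoDB shell code.
// latestPerGroup is a hypothetical helper name used only for illustration.
function latestPerGroup(docs, groupKey, timeKey) {
  const latest = new Map();
  // Sort a copy newest-first, then keep the first document seen per group,
  // which mirrors $sort (descending) followed by $group with $first.
  const sorted = [...docs].sort((a, b) => b[timeKey] - a[timeKey]);
  for (const doc of sorted) {
    if (!latest.has(doc[groupKey])) latest.set(doc[groupKey], doc);
  }
  return [...latest.values()];
}

const events = [
  { eventType: "LAB", eventSubType: "CBC", charttime: new Date("2019-08-16T16:46:21.393Z") },
  { eventType: "LAB", eventSubType: "CBB", charttime: new Date("2019-08-16T16:46:22.393Z") },
  { eventType: "EDLIST", eventSubType: "Lipids", charttime: new Date("2019-08-16T16:46:23.394Z") },
  { eventType: "EDLIST", eventSubType: "L", charttime: new Date("2019-08-16T16:46:24.394Z") },
];

const latestEvents = latestPerGroup(events, "eventType", "charttime");
console.log(latestEvents.map((d) => d.eventType + ":" + d.eventSubType));
```

If the sort direction were ascending, the first document per group would be the oldest one, so the direction of the initial sort is what decides between "latest" and "earliest".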
===== UPDATE =====
As you asked, here is an optimized query:
db.collection.aggregate([
{ $sort: {"charttime": -1 }}, // Sort in descending. (So we would not have another sort after group)
{ $group: {
_id: "$eventType", // Group by event type
data: {$first: "$$ROOT"} // Take whole first record
}},
{ $replaceRoot: { newRoot: "$data" }} // Replaceroot to have document as per your requirement
])
===== UPDATE 2 ====
For too many records:
- Find each eventType and its maximum charttime
- Iterate over each result and fetch the matching records (this makes multiple calls to the DB, but it will take less time)
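var latestDocs = []; // collects the latest document(s) for every eventType
db.getCollection('Vehicle').aggregate([
{ $group: {
_id: "$eventType", // Group by event type
maxChartTime: {$max: "$charttime"} // Latest charttime in the group
}}
]).forEach(function(data) {
// Fetch the document(s) that carry the latest charttime for this event type
db.getCollection('Vehicle').find({
"eventType": data._id,
"charttime": data.maxChartTime
}).forEach(function(doc) { latestDocs.push(doc); });
});
// latestDocs now holds all retrieved documents.
// You can handle this from your back end too.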
Note:- I have tested it with 506983 records and got results in 0.526 sec.
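One thing worth checking with this two-step approach is what happens on ties. The sketch below reproduces the idea in plain JavaScript over an in-memory array (not MongoDB shell code); the helper names and the sample rows are hypothetical. It suggests that when two documents of the same eventType share the maximum charttime, both come back, whereas the $first-based pipelines return exactly one document per group.

```javascript
// Plain JavaScript over an in-memory array; not MongoDB shell code.
// maxTimePerGroup and docsAtMaxTime are hypothetical helper names.
function maxTimePerGroup(docs, groupKey, timeKey) {
  const maxByGroup = new Map();
  for (const doc of docs) {
    const time = doc[timeKey].getTime();
    const current = maxByGroup.get(doc[groupKey]);
    // Same role as { $max: "$charttime" } inside $group.
    if (current === undefined || time > current) maxByGroup.set(doc[groupKey], time);
  }
  return maxByGroup;
}

function docsAtMaxTime(docs, groupKey, timeKey) {
  const maxByGroup = maxTimePerGroup(docs, groupKey, timeKey);
  // Same role as find({ eventType: ..., charttime: ... }) run once per group.
  return docs.filter((doc) => doc[timeKey].getTime() === maxByGroup.get(doc[groupKey]));
}

// Made-up rows: "b" and "c" deliberately share the latest LAB timestamp.
const readings = [
  { eventType: "LAB", mrn: "a", charttime: new Date("2019-08-16T16:46:21Z") },
  { eventType: "LAB", mrn: "b", charttime: new Date("2019-08-16T16:46:22Z") },
  { eventType: "LAB", mrn: "c", charttime: new Date("2019-08-16T16:46:22Z") },
  { eventType: "EDLIST", mrn: "d", charttime: new Date("2019-08-16T16:46:24Z") },
];

const newestReadings = docsAtMaxTime(readings, "eventType", "charttime");
console.log(newestReadings.map((d) => d.mrn));
```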
First sort the data by charttime (descending) so that the $first accumulator works properly.
Then group by eventType and find the latest of the dates with the $max accumulator.
The $project stage retains the original _id under the same key name. If it is not required as _id, you can remove the stage altogether.
Aggregation Query:
db.collection.aggregate([
{ $sort: { charttime: -1 } },
{
$group: {
_id: "$eventType",
id: { $first: "$_id" },
mrn: { $first: "$mrn" },
eventType: { $first: "$eventType" },
eventSubType: { $first: "$eventSubType" },
value: { $first: "$value" },
units: { $first: "$units" },
charttime: { $max: "$charttime" }
}
},
{
$project: {
_id: "$id",
mrn: 1,
eventType: 1,
eventSubType: 1,
value: 1,
units: 1,
charttime: 1
}
}
]);
Related
I have a document with multiple levels of embedded subdocuments, each with a nested array. Using $unwind and $sort, I sort by day in descending order and use $push to combine the rows back into a single array. The $push works only at one level, i.e. only one $push is allowed. If I try to do the same thing at the nested level while retaining the top-level data, I get "errmsg" : "Unrecognized expression '$push'".
{
"_id" : ObjectId("5f5638d0ff25e01482432803"),
"name" : "XXXX",
"mobileNo" : 323232323,
"payroll" : [
{
"_id" : ObjectId("5f5638d0ff25e01482432801"),
"month" : "Jan",
"salary" : 18200,
"payrollDetails" : [
{
"day" : "1",
"salary" : 200,
},
{
"day" : "2",
"salary" : 201,
}
]
},
{
"_id" : ObjectId("5f5638d0ff25e01482432802"),
"month" : "Feb",
"salary" : 8300,
"payrollDetails" : [
{
"day" : "1",
"salary" : 300,
},
{
"day" : "2",
"salary" : 400,
}
]
}
],
}
Expected Result:
{
"_id" : ObjectId("5f5638d0ff25e01482432803"),
"name" : "XXXX",
"mobileNo" : 323232323,
"payroll" : [
{
"_id" : ObjectId("5f5638d0ff25e01482432801"),
"month" : "Jan",
"salary" : 18200,
"payrollDetails" : [
{
"day" : "2",
"salary" : 201
},
{
"day" : "1",
"salary" : 200
}
]
},
{
"_id" : ObjectId("5f5638d0ff25e01482432802"),
"month" : "Feb",
"salary" : 8300,
"payrollDetails" : [
{
"day" : "2",
"salary" : 400
},
{
"day" : "1",
"salary" : 300
}
]
}
],
}
Only the days are sorted; everything else stays the same.
This is what I tried, but it fails with unrecognized expression '$push':
db.employee.aggregate([
{$unwind: '$payroll'},
{$unwind: '$payroll.payrollDetails'},
{$sort: {'payroll.payrollDetails.day': -1}},
{$group: {_id: '$_id', payroll: {$push: {payrollDetails:{$push:
'$payroll.payrollDetails'} }}}}])
This requires two $group stages; you can't nest one $push inside another in the same field.
$group by main id and payroll id, and construct the payrollDetails array
$sort by payroll id (you can skip this if not required)
$group by main id and construct the payroll array
db.employee.aggregate([
{ $unwind: "$payroll" },
{ $unwind: "$payroll.payrollDetails" },
{ $sort: { "payroll.payrollDetails.day": -1 } },
{
$group: {
_id: {
_id: "$_id",
pid: "$payroll._id"
},
name: { $first: "$name" },
mobileNo: { $first: "$mobileNo" },
payrollDetails: { $push: "$payroll.payrollDetails" },
month: { $first: "$payroll.month" },
salary: { $first: "$payroll.salary" }
}
},
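{ $sort: { "_id.pid": 1 } }, // after the first $group the payroll id lives in _id.pid; ascending matches the expected Jan, Feb order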
{
$group: {
_id: "$_id._id",
name: { $first: "$name" },
mobileNo: { $first: "$mobileNo" },
payroll: {
$push: {
_id: "$_id.pid",
month: "$month",
salary: "$salary",
payrollDetails: "$payrollDetails"
}
}
}
}
])
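As a cross-check of the expected shape, the same reordering can be sketched in plain JavaScript on one in-memory document (not MongoDB shell code). The helper name sortPayrollDetailsDesc and the sample values are hypothetical. Note one assumption that differs from the pipeline: "day" is stored as a string, and strings sort lexicographically, so the sketch converts it to a number before comparing.

```javascript
// Plain JavaScript on an in-memory document; not MongoDB shell code.
// sortPayrollDetailsDesc is a hypothetical helper name.
function sortPayrollDetailsDesc(employee) {
  return {
    ...employee,
    // Rebuild every payroll entry with its payrollDetails sorted by day, descending.
    payroll: employee.payroll.map((entry) => ({
      ...entry,
      payrollDetails: [...entry.payrollDetails].sort(
        (a, b) => Number(b.day) - Number(a.day) // assumption: day holds a numeric string
      ),
    })),
  };
}

// Made-up document in the shape used in the question.
const employee = {
  name: "XXXX",
  payroll: [
    {
      month: "Jan",
      payrollDetails: [
        { day: "1", salary: 200 },
        { day: "2", salary: 201 },
        { day: "10", salary: 210 },
      ],
    },
  ],
};

const sortedEmployee = sortPayrollDetailsDesc(employee);
console.log(sortedEmployee.payroll[0].payrollDetails.map((d) => d.day));
```

With a plain string comparison, "10" would sort between "1" and "2", which is worth keeping in mind if the days in the collection go past 9.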
{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id":1, "name" : "foo"}
{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id":2, "name" : "bar"}
{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id":3, "name" : "baz"}
{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id":4, "name" : "foo"}
{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id":5, "name" : "bar"}
{ "_id" : ObjectId("4f127fa55e7242718200002d"), "id":6, "name" : "bar"}
I want to find all the duplicated entries in this collection by the "name" field using aggregation. E.g. "foo" appears twice and "bar" appears 3 times.
You can use a $group stage in the aggregation:
db.collection.aggregate([{
$group: {
_id: "$name",
count: { $sum: 1 },
name: { $first: "$name" }
}
}])
You can group by name and count, and then filter for a count greater than 1.
db.collection.aggregate([
{
$group: {
_id: "$name",
count: { $sum: 1 }
}
},
{
$match:{count:{$gt:1}}
}
])
Output:
{ "_id" : "foo", "count":2}
{ "_id" : "bar", "count":3}
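The group-count-filter logic can also be sketched over an in-memory array, which may help when checking the pipeline's result on a small sample. This is plain JavaScript, not MongoDB shell code, and findDuplicates is a hypothetical helper name.

```javascript
// Plain JavaScript over an in-memory array; not MongoDB shell code.
// findDuplicates is a hypothetical helper name.
function findDuplicates(docs, key) {
  const counts = new Map();
  // Same role as $group with count: { $sum: 1 }.
  for (const doc of docs) {
    counts.set(doc[key], (counts.get(doc[key]) || 0) + 1);
  }
  // Same role as $match: { count: { $gt: 1 } }.
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([value, count]) => ({ _id: value, count: count }));
}

const people = [
  { id: 1, name: "foo" },
  { id: 2, name: "bar" },
  { id: 3, name: "baz" },
  { id: 4, name: "foo" },
  { id: 5, name: "bar" },
  { id: 6, name: "bar" },
];

const duplicateNames = findDuplicates(people, "name");
console.log(duplicateNames);
```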
I'm trying to clean a huge database.
Sample DB :
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:22:31.254Z"),
"_id" : ObjectId("5a0062170f3c330012bafe77"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-06T13:32:41.084Z"),
"_id" : ObjectId("5a0064790f3c330012baff03"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff32"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
I have a lot of duplicate values, but I need to clean them up only within each day.
For example, to obtain this:
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
How can I aggregate by day and then delete the later duplicate values?
I need to keep the values per day even if they are identical to those of another day.
The aggregation framework cannot update data at this stage. However, you can use the following aggregation pipeline in order to get the desired output and then use e.g. a bulk replace to update all your documents:
db.collection.aggregate({
$unwind: "$tracking" // flatten the "tracking" array into separate documents
}, {
$sort: {
"tracking.timeCheck": 1 // sort by timeCheck to allow us to use the $first operator in the next stage reliably
}
}, {
$group: {
_id: { // group by
"_id": "$_id", // "_id" and
"rank": "$tracking.rank", // "rank" and
"date": { // the "date" part of the "timeCheck" field
$dateFromParts : {
year: { $year: "$tracking.timeCheck" },
month: { $month: "$tracking.timeCheck" },
day: { $dayOfMonth: "$tracking.timeCheck" } // day of the month, not $dayOfWeek, so the parts form the real calendar date
}
}
},
"doc": { $first: "$$ROOT" } // only keep the first document per group
}
}, {
$sort: {
"doc.tracking.timeCheck": 1 // restore ascending sort order - may or may not be needed...
}
}, {
$group: {
_id: "$_id._id", // merge everything again per "_id"
"addedAt": { $first: "$doc.addedAt" },
"__v": { $first: "$doc.__v" },
"check": { $first: "$doc.check" },
"lastCheck": { $first: "$doc.lastCheck" },
"tracking": { $push: "$doc.tracking" } // in order to join the tracking values into an array again
}
})
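To sanity-check the output before writing anything back, the per-day deduplication can be sketched in plain JavaScript on a single "tracking" array (not MongoDB shell code). The helper name dedupeTrackingPerDay is hypothetical, and the sketch assumes that "day" means the UTC calendar day of timeCheck.

```javascript
// Plain JavaScript on an in-memory array; not MongoDB shell code.
// dedupeTrackingPerDay is a hypothetical helper name.
function dedupeTrackingPerDay(tracking) {
  const seen = new Set();
  // Oldest first, so the earliest entry of each (day, rank) pair is the one kept.
  const sorted = [...tracking].sort((a, b) => a.timeCheck - b.timeCheck);
  return sorted.filter((entry) => {
    const day = entry.timeCheck.toISOString().slice(0, 10); // UTC date, e.g. "2017-11-06"
    const key = day + "|" + entry.rank;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Same timestamps and ranks as the sample document in the question.
const tracking = [
  { timeCheck: new Date("2017-11-06T13:17:20.861Z"), rank: 2395 },
  { timeCheck: new Date("2017-11-06T13:22:31.254Z"), rank: 2395 },
  { timeCheck: new Date("2017-11-06T13:27:40.551Z"), rank: 2379 },
  { timeCheck: new Date("2017-11-06T13:32:41.084Z"), rank: 2395 },
  { timeCheck: new Date("2017-11-06T13:37:51.012Z"), rank: 2379 },
  { timeCheck: new Date("2017-11-07T13:37:51.012Z"), rank: 2379 },
];

const cleanedTracking = dedupeTrackingPerDay(tracking);
console.log(cleanedTracking.map((e) => e.timeCheck.toISOString() + " " + e.rank));
```

The last entry survives even though rank 2379 already appeared, because it falls on a different day.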
I'm using MongoDB with the following sample collection:
{
"_id" : ObjectId("5703750ca9c436386c4814c9"),
"user_id" : NumberLong(17),
"activitytype_id" : NumberLong(1),
"created_date" : ISODate("2015-10-03T03:52:03.000Z")
},
{
"_id" : ObjectId("5703750ca9c436386c4814ca"),
"s_id" : NumberLong(132919),
"user_id" : NumberLong(17),
"activitytype_id" : NumberLong(4),
"created_date" : ISODate("2016-03-18T17:13:43.000Z")
},
{
"_id" : ObjectId("5703750ca9c436386c4814cb"),
"s_id" : NumberLong(215283),
"user_id" : NumberLong(17),
"activitytype_id" : NumberLong(4),
"created_date" : ISODate("2015-10-03T04:12:33.000Z")
}
,
{
"_id" : ObjectId("5703750ca9c436386c4814cc"),
"s_id" : NumberLong(360888),
"user_id" : NumberLong(17),
"activitytype_id" : NumberLong(4),
"created_date" : ISODate("2015-10-03T04:12:41.000Z")
}
This is my aggregation pipeline
db.activitylogs.aggregate([
{ $group: {
_id: {
user_id: "$user_id",
activitytype_id: "$activitytype_id"
},
activity_log_docs: {
$addToSet: {
s_id: "$s_id",
friend_id: "$friend_id",
playlist_id: "$playlist_id",
created_date:"$created_date"
}
}
}},
])
I need to get distinct s_id values in activity_log_docs, i.e. the array should not contain duplicated s_id entries.
I think something like this should do :
db.activitylogs.aggregate([
{ $group: {
_id: {
user_id: "$user_id",
activitytype_id: "$activitytype_id" ,
s_id:"$s_id"
},
friend_id: {$first:"$friend_id"},
playlist_id: {$first:"$playlist_id"},
created_date: {$first:"$created_date"}
}},
{ $group: {
_id: {
user_id: "$_id.user_id",
activitytype_id: "$_id.activitytype_id"
},
activity_log_docs: {
$addToSet: {
s_id: "$_id.s_id",
friend_id: "$friend_id",
playlist_id: "$playlist_id",
created_date:"$created_date"
}
}
}},
])
But please double-check your own field names.
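The reason a single $addToSet does not remove the repeated s_id values is that it compares whole subdocuments, so two entries with the same s_id but a different created_date both stay in the set. The sketch below shows the two-level grouping in plain JavaScript over an in-memory array (not MongoDB shell code). The helper name groupDistinctBySid is hypothetical, and the second sample row is made up to create a duplicate s_id.

```javascript
// Plain JavaScript over an in-memory array; not MongoDB shell code.
// groupDistinctBySid is a hypothetical helper name.
function groupDistinctBySid(logs) {
  const groups = new Map();
  for (const log of logs) {
    const groupKey = log.user_id + "|" + log.activitytype_id;
    if (!groups.has(groupKey)) {
      groups.set(groupKey, {
        _id: { user_id: log.user_id, activitytype_id: log.activitytype_id },
        bySid: new Map(),
      });
    }
    const bySid = groups.get(groupKey).bySid;
    // Keep only the first entry seen for each s_id (the role of $first in the first $group).
    if (!bySid.has(log.s_id)) {
      bySid.set(log.s_id, { s_id: log.s_id, created_date: log.created_date });
    }
  }
  return [...groups.values()].map((group) => ({
    _id: group._id,
    activity_log_docs: [...group.bySid.values()],
  }));
}

const activityLogs = [
  { user_id: 17, activitytype_id: 4, s_id: 132919, created_date: new Date("2016-03-18T17:13:43Z") },
  { user_id: 17, activitytype_id: 4, s_id: 132919, created_date: new Date("2016-03-19T09:00:00Z") }, // made-up duplicate
  { user_id: 17, activitytype_id: 4, s_id: 215283, created_date: new Date("2015-10-03T04:12:33Z") },
];

const groupedLogs = groupDistinctBySid(activityLogs);
console.log(groupedLogs[0].activity_log_docs.map((d) => d.s_id));
```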
I asked a related question before, using this data:
{
"_id" : ObjectId("5539d45ee3cd0e48e99c3fa6"),
"userId" : 1,
"movieId" : 6,
"rating" : 2.0000000000000000,
"timestamp" : 9.80731e+008
}
{
"_id" : ObjectId("5539d45ee3cd0e48e99c1fa7"),
"userId" : 1,
"movieId" : 22,
"rating" : 3.0000000000000000,
"timestamp" : 9.80731e+008
},
{
"_id" : ObjectId("5539d45ee3cd0e48e99c1fa8"),
"userId" : 1,
"movieId" : 32,
"rating" : 2.0000000000000000,
"timestamp" : 9.80732e+008
},
{
"_id" : ObjectId("5539d45ee3cd0e48e99c1fa9"),
"userId" : 2,
"movieId" : 32,
"rating" : 4.0000000000000000,
"timestamp" : 9.80732e+008
},
{
"_id" : ObjectId("5539d45ee3cd0e48e99c1fa3"),
"userId" : 2,
"movieId" : 6,
"rating" : 5.0000000000000000,
"timestamp" : 9.80731e+008
}
Back then I needed to get the common (intersecting) items for two given users (like userId:1 and userId:2), e.g. [6,32].
But now I need to get them together with each user's rating, like [ {"movieId":6,"user1_rating" : 2,"user2_rating" : 5},{"movieId":32,"user1_rating" : 2,"user2_rating" : 4} ]
How can I get that?
I tried this:
db.collection.aggregate([
{$match: {"$or":[{"userId":2},{"userId":1}]}},
{$group: {_id: "$movieId", users: {$push: {"userId":"$userId","rating":"$rating"}}}},
{$project: { movieId: "$_id", _id: 0,rating:"$users.rating", allUsersIncluded: { $setIsSubset: [ [1,2], "$users.userId"]}}},
{$match: { allUsersIncluded: true }},
{$group: { _id: null, movies: {$push: {"movie":"$movieId","Rating":"$rating"}}}}
])
But I get [ {"movie":6,0 : 2,1 : 4},{"movie":32,0 : 2,1 : 5} ]
I finally achieved my goal. The answer is:
db.collection.aggregate([
{$match: {"$or":[{"userId":2},{"userId":1}]}},
{$group: {_id: "$movieId", users: {$addToSet: {"userId":"$userId","rating":"$rating"}}}},
{$project: { movieId: "$_id", _id: 0,user:"$users", allUsersIncluded: { $setIsSubset: [ [1,2], "$users.userId"]}}},
{$match: { allUsersIncluded: true }},
{$group: { _id: null, movies: {$addToSet: {"movie":"$movieId","user":"$user"}}}}
])
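The pipeline above returns each common movie with an array of { userId, rating } pairs. If the flat user1_rating / user2_rating shape from the question is needed, one option is to reshape on the client. The following is a minimal sketch in plain JavaScript over an in-memory array (not MongoDB shell code); the helper name commonRatings is hypothetical and the rows are copied from the sample data.

```javascript
// Plain JavaScript over an in-memory array; not MongoDB shell code.
// commonRatings is a hypothetical helper name.
function commonRatings(ratings, user1, user2) {
  const byMovie = new Map();
  for (const r of ratings) {
    if (r.userId !== user1 && r.userId !== user2) continue; // role of the $match stage
    if (!byMovie.has(r.movieId)) byMovie.set(r.movieId, {});
    byMovie.get(r.movieId)[r.userId] = r.rating; // role of grouping by movieId
  }
  const result = [];
  for (const [movieId, byUser] of byMovie) {
    // Keep only movies rated by both users (role of the $setIsSubset check).
    if (user1 in byUser && user2 in byUser) {
      result.push({ movieId: movieId, user1_rating: byUser[user1], user2_rating: byUser[user2] });
    }
  }
  return result;
}

const movieRatings = [
  { userId: 1, movieId: 6, rating: 2 },
  { userId: 1, movieId: 22, rating: 3 },
  { userId: 1, movieId: 32, rating: 2 },
  { userId: 2, movieId: 32, rating: 4 },
  { userId: 2, movieId: 6, rating: 5 },
];

const sharedRatings = commonRatings(movieRatings, 1, 2);
console.log(sharedRatings);
```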