I'm trying to clean up a huge database.
Sample document:
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:22:31.254Z"),
"_id" : ObjectId("5a0062170f3c330012bafe77"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-06T13:32:41.084Z"),
"_id" : ObjectId("5a0064790f3c330012baff03"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff32"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
I have a lot of duplicate values, but I only need to deduplicate within each day.
For example, to obtain this:
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
How can I group by day and then delete the duplicated values within each day?
I need to keep the values for each day even if they are identical to those of another day.
The aggregation framework cannot update data at this stage. However, you can use the following aggregation pipeline to get the desired output and then use e.g. a bulk replace (sketched after the pipeline) to update all your documents:
db.collection.aggregate({
$unwind: "$tracking" // flatten the "tracking" array into separate documents
}, {
$sort: {
"tracking.timeCheck": 1 // sort by timeCheck to allow us to use the $first operator in the next stage reliably
}
}, {
$group: {
_id: { // group by
"_id": "$_id", // "_id" and
"rank": "$tracking.rank", // "rank" and
"date": { // the "date" part of the "timeCheck" field
$dateFromParts : {
year: { $year: "$tracking.timeCheck" },
month: { $month: "$tracking.timeCheck" },
day: { $dayOfMonth: "$tracking.timeCheck" } // $dayOfMonth (not $dayOfWeek) so that documents are grouped per calendar day
}
}
},
"doc": { $first: "$$ROOT" } // only keep the first document per group
}
}, {
$sort: {
"doc.tracking.timeCheck": 1 // restore ascending sort order - may or may not be needed...
}
}, {
$group: {
_id: "$_id._id", // merge everything again per "_id"
"addedAt": { $first: "$doc.addedAt" },
"__v": { $first: "$doc.__v" },
"check": { $first: "$doc.check" },
"lastCheck": { $first: "$doc.lastCheck" },
"tracking": { $push: "$doc.tracking" } // in order to join the tracking values into an array again
}
})
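Here is a minimal sketch of that bulk replace, assuming the collection is literally named collection and the result set fits in one batch of operations; adapt the names and batching to your data:
var bulkOps = [];
db.collection.aggregate([ /* the pipeline above */ ]).forEach(function(doc) {
    // replace each original document with its deduplicated counterpart
    bulkOps.push({ replaceOne: { filter: { _id: doc._id }, replacement: doc } });
});
if (bulkOps.length > 0) {
    db.collection.bulkWrite(bulkOps);
}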
I have a document with multiple levels of embedded subdocuments, each of which has a nested array. Using $unwind and $sort, I sort by day in descending order and use $push to combine the row records into a single array. This works only at one level, i.e. it allows only one $push. When I try to do the same thing at the nested level while retaining the top-level data, I get "errmsg" : "Unrecognized expression '$push'".
{
"_id" : ObjectId("5f5638d0ff25e01482432803"),
"name" : "XXXX",
"mobileNo" : 323232323,
"payroll" : [
{
"_id" : ObjectId("5f5638d0ff25e01482432801"),
"month" : "Jan",
"salary" : 18200,
"payrollDetails" : [
{
"day" : "1",
"salary" : 200,
},
{
"day" : "2",
"salary" : 201,
}
]
},
{
"_id" : ObjectId("5f5638d0ff25e01482432802"),
"month" : "Feb",
"salary" : 8300,
"payrollDetails" : [
{
"day" : "1",
"salary" : 300,
},
{
"day" : "2",
"salary" : 400,
}
]
}
],
}
Expected Result:
{
"_id" : ObjectId("5f5638d0ff25e01482432803"),
"name" : "XXXX",
"mobileNo" : 323232323,
"payroll" : [
{
"_id" : ObjectId("5f5638d0ff25e01482432801"),
"month" : "Jan",
"salary" : 18200,
"payrollDetails" : [
{
"day" : "2",
"salary" : 201
},
{
"day" : "1",
"salary" : 200
}
]
},
{
"_id" : ObjectId("5f5638d0ff25e01482432802"),
"month" : "Feb",
"salary" : 8300,
"payrollDetails" : [
{
"day" : "2",
"salary" : 400
},
{
"day" : "1",
"salary" : 300
}
]
}
],
}
Only the day should be sorted; everything else stays the same.
I have tried the following, but it fails with unrecognized expression '$push':
db.employee.aggregate([
{$unwind: '$payroll'},
{$unwind: '$payroll.payrollDetails'},
{$sort: {'payroll.payrollDetails.day': -1}},
{$group: {_id: '$_id', payroll: {$push: {payrollDetails:{$push:
'$payroll.payrollDetails'} }}}}])
It requires two $group stages; you can't use the $push operator twice within a single field:
$group by the main id and the payroll id, and construct the payrollDetails array
$sort by the payroll id (you can skip this if not required)
$group by the main id and construct the payroll array
db.employee.aggregate([
{ $unwind: "$payroll" },
{ $unwind: "$payroll.payrollDetails" },
{ $sort: { "payroll.payrollDetails.day": -1 } },
{
$group: {
_id: {
_id: "$_id",
pid: "$payroll._id"
},
name: { $first: "$name" },
mobileNo: { $first: "$mobileNo" },
payrollDetails: { $push: "$payroll.payrollDetails" },
month: { $first: "$payroll.month" },
salary: { $first: "$payroll.salary" }
}
},
{ $sort: { "payroll._id": -1 } },
{
$group: {
_id: "$_id._id",
name: { $first: "$name" },
mobileNo: { $first: "$mobileNo" },
payroll: {
$push: {
_id: "$_id.pid",
month: "$month",
salary: "$salary",
payrollDetails: "$payrollDetails"
}
}
}
}
])
Playground
I am trying to find the list of users who are new as of day-1 (yesterday). I have written a query to find the users who arrived up to the day before yesterday, and another for the users who arrived yesterday. Now I want to subtract one set from the other; how can I do that in a single aggregate call?
Query to get the list up to the day before yesterday:
db.chat_question_logs.aggregate([
{
$match : {"createdDate":{$lte: ISODate("2020-04-29T00:00:00Z")}}
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
Similarly, for day-1:
db.chat_question_logs.aggregate([
{
$match : {"createdDate":{$gte: ISODate("2020-04-30T00:00:00Z"),$lte: ISODate("2020-05-01T00:00:00Z")}}
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
The resulting JSON is as below:
/* 1 */
{
"_id" : {
"userId" : "2350202241750776"
},
"count" : 1
},
/* 2 */
{
"_id" : {
"userId" : "26291570771793121"
},
"count" : 1
},
/* 3 */
{
"_id" : {
"userId" : "2742872209107866"
},
"count" : 5
},
/* 4 */
{
"_id" : {
"userId" : "23502022417507761212"
},
"count" : 1
},
/* 5 */
{
"_id" : {
"userId" : "2629157077179312"
},
"count" : 43
}
How can I find the difference?
It sounds like what you want is to get all users created yesterday (which is the 28th in this example).
db.chat_question_logs.aggregate([
{
$match : { $and: [
{ "createdDate":{$lt: ISODate("2020-04-29T00:00:00Z")} },
{ "createdDate": {$gte: ISODate("2020-04-28T00:00:00Z") }}
] }
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
Is this what you want?
Hi, I found the solution, which is below.
I grouped by the userId, took the first appearance of each Id, and then filtered the records on the date range I wanted. The query is as below:
db.chat_question_logs.aggregate([
{
$group:
{
_id: "$userInfo.userId",
firstApprance: { $first: "$createdDate" }
}
},
{
$match : { "firstApprance": { $gte: new ISODate("2020-05-03"), $lt: new ISODate("2020-05-05") } }
}
])
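Note that $first simply returns the first document encountered in each group, which is the user's earliest createdDate only if the documents happen to arrive in that order. To make it deterministic you could sort before grouping; a minimal sketch, assuming the same collection and field names:
db.chat_question_logs.aggregate([
    { $sort: { "createdDate": 1 } }, // earliest createdDate first, so $first below really is the first appearance
    {
        $group:
        {
            _id: "$userInfo.userId",
            firstApprance: { $first: "$createdDate" }
        }
    },
    {
        $match : { "firstApprance": { $gte: new ISODate("2020-05-03"), $lt: new ISODate("2020-05-05") } }
    }
])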
How do I get the percentage of the total when grouping by date in MongoDB?
Link example : https://mongoplayground.net/p/aNND4EPQhcb
I have a collection structure like this:
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4b"),
"date" : "2019-05-03T10:39:53.108Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4c"),
"date" : "2019-05-03T10:39:53.133Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4d"),
"date" : "2019-05-03T10:39:53.180Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
{
"_id" : ObjectId("5ccbb96706d1d47a4b2ced4e"),
"date" : "2019-05-03T10:39:53.218Z",
"id" : 166,
"update_at" : "2019-05-03T10:45:36.208Z",
"type" : "image"
}
I have a query in MongoDB to get the data from the collection; how do I get each count as a percentage of the total? Below is the example query to get the data:
db.name_collection.aggregate(
[
{ "$match": {
"update_at": { "$gte": "2019-11-04T00:00:00.0Z", "$lt": "2019-11-06T00:00:00.0Z"},
"id": { "$in": [166] }
} },
{
"$group" : {
"_id": {
$substr: [ '$update_at', 0, 10 ]
},
"count" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"date" : "$_id",
"count" : "$count"
}
},
{
"$sort" : {
"date" : 1
}
}
]
)
and this is the response:
{
"date" : "2019-11-04",
"count" : 39
},
{
"date" : "2019-11-05",
"count" : 135
}
How can I get the percentage of the total from the count key? For example, the response should be:
{
"date" : "2019-11-04",
"count" : 39,
"percentage" : "22%"
},
{
"date" : "2019-11-05",
"count" : 135,
"percentage" : "78%"
}
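(As a check of the expected numbers: 39 + 135 = 174 documents in total, so 39 / 174 ≈ 22% and 135 / 174 ≈ 78%.)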
You have to group by null to get the total count and then use $map to calculate the percentage. $round (available from MongoDB 4.2) is a useful operator in this case. Finally you can $unwind and $replaceRoot to get back to the same number of documents:
db.collection.aggregate([
// previous aggregation steps
{
$group: {
_id: null,
total: { $sum: "$count" },
docs: { $push: "$$ROOT" }
}
},
{
$project: {
docs: {
$map: {
input: "$docs",
in: {
date: "$$this.date",
count: "$$this.count",
percentage: { $concat: [ { $toString: { $round: { $multiply: [ { $divide: [ "$$this.count", "$total" ] }, 100 ] } } }, '%' ] }
}
}
}
}
},
{
$unwind: "$docs"
},
{
$replaceRoot: { newRoot: "$docs" }
}
])
Mongo Playground
I am new to Mongo and am trying to write an aggregation query that calculates the min/max of the timestamps for a given document.
Sample documents are below:
{
"_id" : ObjectId("5c9cd93adddca9ebb2b3fcba"),
"frequency" : 5,
"s_id" : "30081993",
"timestamp" : NumberLong(1546300800000),
"date" : ISODate("2019-01-01T00:00:00.000Z"),
"values" : {
"1547439900000" : {
"number_of_values" : 3,
"min_value" : 32.13,
"max_value" : 81.42
},
"1547440200000" : {
"number_of_values" : 3,
"min_value" : 48.08,
"max_value" : 84.52
},
"1547440500000" : {
"number_of_values" : 2,
"min_value" : 27.39,
"max_value" : 94.64
}
}
}
{
"_id" : ObjectId("5c9cd851dddca9ebb2b3f2ac"),
"frequency" : 5,
"s_id" : "27061995",
"timestamp" : NumberLong(1546300800000),
"date" : ISODate("2019-01-01T00:00:00.000Z"),
"values" : {
"1547539900000" : {
"number_of_values" : 31,
"min_value" : 322.13,
"max_value" : 831.42
},
"1547540200000" : {
"number_of_values" : 3,
"min_value" : 418.08,
"max_value" : 8114.52
},
"1547740500000" : {
"number_of_values" : 2,
"min_value" : 207.39,
"max_value" : 940.64
}
}
}
I have come up with the following query, which works for a single document:
db.testdb.aggregate([
{
$match: {
"s_id": "30081993",
"frequency": 5,
}
},
{
$project: {
_id: 1,
valuesarray: {
$objectToArray: "$values"
}
}
},
{
$unwind: "$valuesarray"
},
{
$group: {
"_id": "",
"min_timestamp": {
$min: "$valuesarray.k"
},
"max_timestamp": {
$max: "$valuesarray.k"
}
}
}
]);
The output is below
{
"_id" : "",
"min_timestamp" : "1547439900000",
"max_timestamp" : "1547440500000"
}
I want an aggregation query that can calculate the max/min of the timestamps, but for multiple documents, i.e. I want to use an $in operator in the $match stage and get the min/max for every s_id. Is this possible?
Expected:
{
"_id" : "30081993",
"min_timestamp" : "1547439900000",
"max_timestamp" : "1547440500000"
}
{
"_id" : "27061995",
"min_timestamp" : "1547539900000",
"max_timestamp" : "1547740500000"
}
Yes, only small changes are required to make this work for multiple documents.
In $match stage, specify your $in query:
$match: {
"s_id": { $in : [ "30081993", "27061995" ] },
"frequency": 5,
}
In $project stage, rename s_id to _id, to ensure we keep the s_id associated with each document:
$project: {
_id: "$s_id",
valuesarray: {
$objectToArray: "$values"
}
}
In $group stage, group by _id (originally s_id), to ensure we correctly group the timestamps together before calculating $min/$max:
$group: {
"_id": "$_id",
"min_timestamp": {
$min: "$valuesarray.k"
},
"max_timestamp": {
$max: "$valuesarray.k"
}
}
Whole pipeline:
db.testdb.aggregate([
{
$match: {
"s_id": { $in : [ "30081993", "27061995" ] },
"frequency": 5,
}
},
{
$project: {
_id: "$s_id",
valuesarray: {
$objectToArray: "$values"
}
}
},
{
$unwind: "$valuesarray"
},
{
$group: {
"_id": "$_id",
"min_timestamp": {
$min: "$valuesarray.k"
},
"max_timestamp": {
$max: "$valuesarray.k"
}
}
}
]);
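One caveat: the keys produced by $objectToArray are strings, so $min/$max compare them lexicographically. That happens to give the right result here because every key is a 13-digit millisecond timestamp, but if the key lengths could ever differ you may want to compare numerically instead. A minimal sketch of the final $group stage, assuming MongoDB 4.0+ for $toLong:
{
    $group: {
        "_id": "$_id",
        "min_timestamp": { $min: { $toLong: "$valuesarray.k" } }, // convert the string key to a number before comparing
        "max_timestamp": { $max: { $toLong: "$valuesarray.k" } }
    }
}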
My collection looks like this:
{
"_id":ObjectId("5744b6cd9c408cea15964d18"),
"uuid":"bbde4bba-062b-4024-9bb0-8b12656afa7e",
"version":1,
"categories":["sport"]
},
{
"_id":ObjectId("5745d2bab047379469e10e27"),
"uuid":"bbde4bba-062b-4024-9bb0-8b12656afa7e",
"version":2,
"categories":["sport", "shopping"]
},
{
"_id":ObjectId("5744b6359c408cea15964d15"),
"uuid":"561c3705-ba6d-432b-98fb-254483fcbefa",
"version":1,
"categories":["politics"]
}
I want to count the number of documents for every category. To do this, I unwind the categories array:
db.collection.aggregate(
{$unwind: '$categories'},
{$group: {_id: '$categories', count: {$sum: 1}} }
)
Result:
{ "_id" : "sport", "count" : 2 }
{ "_id" : "shopping", "count" : 1 }
{ "_id" : "politics", "count" : 1 }
Now I want to count the number of documents for every category, but counting only the latest version of each document.
This is where I am stuck.
It's ugly but I think this gives you what you're after:
db.collection.aggregate(
{ $unwind : "$categories" },
{ $group :
{ "_id" : { "uuid" : "$uuid" },
"doc" : { $push : { "version" : "$version", "category" : "$categories" } },
"maxVersion" : { $max : "$version" }
}
},
{ $unwind : "$doc" },
{ $project : { "_id" : 0, "uuid" : "$id.uuid", "category" : "$doc.category", "isCurrentVersion" : { $eq : [ "$doc.version", "$maxVersion" ] } } },
{ $match : { "isCurrentVersion" : true }},
{ $group : { "_id" : "$category", "count" : { $sum : 1 } } }
)
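With the three sample documents above, this should produce (document order may vary):
{ "_id" : "sport", "count" : 1 }
{ "_id" : "shopping", "count" : 1 }
{ "_id" : "politics", "count" : 1 }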
You can do this by first grouping the denormalized documents (from the $unwind step) by two keys, i.e. the categories and version fields. This is necessary for the subsequent pipeline step, which orders the grouped documents and their accumulated counts by the version (desc) and categories (asc) keys respectively using the $sort operator.
Another grouping is then required to get the top document in each categories group after ordering, using the $first operator. The following shows this:
db.collection.aggregate(
{ "$unwind": "$categories" },
{
"$group": {
"_id": {
'categories': '$categories',
'version': '$version'
},
"count": { "$sum": 1 }
}
},
{ "$sort": { "_id.version": -1, "_id.categories": 1 } },
{
"$group": {
"_id": "$_id.categories",
"count": { "$first": "$count" },
"version": { "$first": "$_id.version" }
}
}
)
Sample Output
{ "_id" : "shopping", "count" : 1, "version" : 2 }
{ "_id" : "sport", "count" : 1, "version" : 2 }
{ "_id" : "politics", "count" : 1, "version" : 1 }