I have a scheduler running on my server that generates a list of unsigned medications every day. It sometimes generates duplicates, and while I work on the underlying issue I want a script that finds and deletes the duplicate items. I'd appreciate your help.
Here is some sample data that was generated. I want to match on the fields [med, Time, Date], return only one document per match, and remove all the duplicates.
[
{
"_id": ObjectId("6375ea026a0b4e0015d80f77"),
"med": ObjectId("610845e7f5b0e00017754d50"),
"Time": "8:00am",
"Date": "2022-11-16T08:00:00.008+0000"
},
{
"_id": ObjectId("6375ea026a0b4e0015d80fd4"),
"med": ObjectId("61f988e82cf5760018113cee"),
"Time": "7:00am",
"Date": "2022-11-16T08:00:00.008+0000"
},
{
"_id": ObjectId("6375ea026a0b4e0015d80fdd"),
"med": ObjectId("62d1c6e93603ed00177812ee"),
"Time": "6:00am",
"Date": "2022-11-16T08:00:00.008+0000"
},
{
"_id": ObjectId("6375ea02304a870015dfa2ec"),
"med": ObjectId("610845e7f5b0e00017754d50"),
"Time": "8:00am",
"Date": "2022-11-16T08:00:00.005+0000"
},
{
"_id": ObjectId("6375ea02304a870015dfa349"),
"med": ObjectId("61f988e82cf5760018113cee"),
"Time": "7:00am",
"Date": "2022-11-16T08:00:00.005+0000"
},
{
"_id": ObjectId("6375ea02304a870015dfa352"),
"med": ObjectId("62d1c6e93603ed00177812ee"),
"Time": "6:00am",
"Date": "2022-11-16T08:00:00.005+0000"
}
]
Expected output: (Note: Time and Date are different)
[
{
"med": ObjectId("610845e7f5b0e00017754d50"),
"Time": "8:00am",
"Date": "2022-11-16T08:00:00.008+0000"
},
{
"med": ObjectId("61f988e82cf5760018113cee"),
"Time": "7:00am",
"Date": "2022-11-16T08:00:00.008+0000"
},
{
"med": ObjectId("62d1c6e93603ed00177812ee"),
"Time": "6:00am",
"Date": "2022-11-16T08:00:00.008+0000"
}
]
Here is a MongoDB Playground that shows the result; the URL and the code are below. I hope this answer is helpful.
Note: the Date values are not identical across your objects, so your expected output may be wrong.
https://mongoplayground.net/p/hYrgPd7od02
db.collection.aggregate([
{
"$group": {
"_id": {
med: "$med",
Time: "$Time",
Date: "$Date",
},
"med": {
"$first": "$med"
},
"Time": {
"$first": "$Time"
},
"Date": {
"$first": "$Date"
}
}
},
{
"$project": {
_id: 0
}
}
])
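The pipeline above returns one row per distinct (med, Time, Date) triple, but it does not delete anything, and because the sample Date values differ by a few milliseconds they never compare equal. Below is a minimal pure-Python sketch of the selection step, assuming duplicates should be matched on med, Time and only the calendar day of Date (an assumption, the question does not confirm it) and that Date is stored as a string as in the sample. `find_duplicate_ids` and the shortened ids are hypothetical.

```python
def find_duplicate_ids(docs):
    """Return the _id of every document after the first one seen per key."""
    seen = set()
    duplicates = []
    for doc in docs:
        # Assumption: only the calendar day (first 10 chars, YYYY-MM-DD) matters.
        key = (doc["med"], doc["Time"], doc["Date"][:10])
        if key in seen:
            duplicates.append(doc["_id"])
        else:
            seen.add(key)
    return duplicates


# Made-up documents shaped like the sample data, with shortened ids.
docs = [
    {"_id": "a1", "med": "m1", "Time": "8:00am", "Date": "2022-11-16T08:00:00.008+0000"},
    {"_id": "a2", "med": "m2", "Time": "7:00am", "Date": "2022-11-16T08:00:00.008+0000"},
    {"_id": "b1", "med": "m1", "Time": "8:00am", "Date": "2022-11-16T08:00:00.005+0000"},
    {"_id": "b2", "med": "m2", "Time": "7:00am", "Date": "2022-11-16T08:00:00.005+0000"},
]
duplicate_ids = find_duplicate_ids(docs)
print(duplicate_ids)
```

With PyMongo, the resulting list could then be passed to `collection.delete_many({"_id": {"$in": duplicate_ids}})`; it is worth printing the list and checking it before deleting anything.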
My collection called "sets" currently looks like this:
[{
"_id": {
"$oid": "61c2c90b04a5c1fd873bca6c"
},
"exercise": "Flat Barbell Bench Press",
"repetitions": 8,
"rpe": 8,
"__v": 0,
"weight": 90,
"createdAt": {
"$date": {
"$numberLong": "1640155403594"
}
}
}]
It's an array of about 1500 documents, several months' worth of workouts.
What I'm trying to accomplish is this:
[{
"_id": {
"$oid": "62f3cee8d149f0c3534d848c"
},
"user": {
"$oid": "62d11eaa0caf6d2b3133b4b9"
},
"sets": [
{
"weight": 50,
"exercise": "Bench Press",
"repetitions": 8,
"rpe": 8,
"notes": "some note",
"_id": {
"$oid": "62f3cee8d149f0c3534d848d"
}
},
{},
{}
],
"createdAt": {
"$date": {
"$numberLong": "1660145384923"
}
}
}]
Essentially, what I'm trying to accomplish here is embedding an array of "set" objects as a field value for "sets" field. So that instead of a list of sets I have a list of workouts where sets are stored as an array of objects in a field called "sets".
Each "set" object has a date stamp and what I also need to do is to group these sets by day. So at the end of the day each new document represents one workout and has an id, user and sets fields, where each set is from that day.
My Stack Overflow research tells me that I need to use aggregation, but I can't quite wrap my mind around how exactly I would do that.
Any help would be greatly appreciated!
/* UPDATE */
Here's the final query I came up with, hope someone will find it useful.
db.collection.aggregate([
{
$group: {
_id: {
$dateToString: {
format: "%Y-%m-%d",
date: "$createdAt"
}
},
sets: {
$push: {
_id: "$_id",
exercise: "$exercise",
repetitions: "$repetitions",
rpe: "$rpe",
__v: "$__v",
weight: "$weight",
createdAt: "$createdAt"
}
}
}
},
{
"$addFields": {
"user": "UserID",
"date": "$_id"
}
},
{
$project: {
"_id": 0
}
}
])
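To make the effect of the $group stage concrete, here is a plain-Python sketch of the same day bucketing, run on made-up sets rather than a real collection. It assumes createdAt is given as epoch milliseconds and that days are cut in UTC, which is what $dateToString does when no timezone is passed. `group_sets_by_day` is a hypothetical helper, and the "Squat" entry is invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_sets_by_day(sets):
    """Bucket set documents into one workout per UTC calendar day."""
    by_day = defaultdict(list)
    for s in sets:
        # createdAt is assumed to be epoch milliseconds.
        day = datetime.fromtimestamp(s["createdAt"] / 1000, tz=timezone.utc)
        by_day[day.strftime("%Y-%m-%d")].append(s)
    return [{"date": day, "sets": by_day[day]} for day in sorted(by_day)]


sets = [
    {"exercise": "Flat Barbell Bench Press", "weight": 90, "createdAt": 1640155403594},
    {"exercise": "Squat", "weight": 120, "createdAt": 1640159000000},
    {"exercise": "Bench Press", "weight": 50, "createdAt": 1660145384923},
]
workouts = group_sets_by_day(sets)
for workout in workouts:
    print(workout["date"], len(workout["sets"]))
```

The first two sets fall on the same day and end up in one workout; the third becomes its own workout.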
The collection has fields such as key, version, date and status. The same key can have multiple entries in the collection, each with a unique version, date and status.
How can we write an aggregation that finds, for each key, the document with the max version?
For example - I have a sample collection created here - https://mongoplayground.net/p/nyAdYmzf59H
The expected output is
[{
"key": 1,
"version": 2,
"date": "Feb/10",
"status": "DRAFT"
}, {
"key": 2,
"version": 1,
"date": "March/10",
"status": "ACTIVE"
}, {
"key": 3,
"version": 3,
"date": "Jun/10",
"status": "DRAFT"
}
]
Demo - https://mongoplayground.net/p/WyKH2fVbWfA
db.collection.aggregate([
{ "$sort": { "version": -1 } }, // sort descending by version
{
"$group": {
"_id": "$key", // group by key
"version": { "$first": "$version" }, // pick top version which will be max here
"date": { "$first": "$date" },
"status": { "$first": "$status"}
}
},
{ $project: { _id: 0, key: "$_id", version: 1, date: 1, status: 1 }}
])
Demo - https://mongoplayground.net/p/e1Bw7rVGd0Q
db.collection.aggregate([
{ $sort: { "version": -1 } },
{ $group: { "_id": "$key", "doc": { "$first": "$$ROOT" } } },
{ $replaceRoot: { "newRoot": "$doc" } },
{ $project: { _id: 0 } }
])
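Both pipelines rely on the same idea: after sorting by version in descending order, the first document in each key group is the one with the highest version. A plain-Python sketch of the equivalent "max version per key" selection, done in a single pass without sorting, is shown below. The input rows are made up to match the expected output in the question (the playground data is not reproduced here), and `latest_per_key` is a hypothetical name.

```python
def latest_per_key(docs):
    """Keep, for each key, the document with the highest version."""
    best = {}
    for doc in docs:
        current = best.get(doc["key"])
        if current is None or doc["version"] > current["version"]:
            best[doc["key"]] = doc
    # Return the winners ordered by key for a stable result.
    return [best[key] for key in sorted(best)]


docs = [
    {"key": 1, "version": 1, "date": "Jan/10", "status": "ACTIVE"},
    {"key": 1, "version": 2, "date": "Feb/10", "status": "DRAFT"},
    {"key": 2, "version": 1, "date": "March/10", "status": "ACTIVE"},
    {"key": 3, "version": 3, "date": "Jun/10", "status": "DRAFT"},
    {"key": 3, "version": 2, "date": "May/10", "status": "ACTIVE"},
]
latest = latest_per_key(docs)
for doc in latest:
    print(doc)
```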
I have a mongodb database that collects device data.
Example document is
{
"_id" : ObjectId("5c125a185dea1b0252c5352"),
"time" : ISODate("2018-12-13T15:09:42.536Z"),
"mac" : "10:06:21:3e:0a:ff",
}
The goal would be to count the unique mac values per day, from the first document in the db to the last document in the db.
I've been playing around and came to the conclusion that I would need to have multiple groups as well as projects during my aggregations.
This is what I tried - not sure if it's in the right direction or not or just completely messed up.
pipeline = [
{"$project": {
"_id": 1,
"mac": 1,
"day": {
"$dayOfMonth":"$time"
},
"month": {
"$month":"$time"
},
"year": {
"$year":"$time"
}
}
},
{
"$project": {
"_id": 1,
"mac": 1,
"time": {
"$concat": [{
"$substr":["$year", 0, 4]
},
"-", {
"$substr": ["$month", 0, 2]
},
"-",
{
"$substr":["$day", 0, 2]
}]
}
}
},
{
"$group": {
"_id": {
"time": "$time",
"mac": "$mac"
}
},
"$group": {
"_id": "$_id.time",
"count":{"$sum": 1},
}
}
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
The output now doesn't look like it did any aggregation,
[{"_id": null, "count": 751050}]
I'm using Pymongo as my driver and using Mongodb 4.
Ideally it should just show the date and count (eg { "_id" : "2018-12-13", "count" : 2 }.
I would love some feedback and advice.
Thanks in advance.
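One thing worth checking in the pipeline above: both "$group" stages sit inside a single Python dict literal. Python keeps only the last value for a repeated key, so the first "$group" is silently dropped and the server receives a single stage that groups on "$_id.time", a path that does not exist on the raw documents. That would be consistent with everything landing in one `_id: null` bucket. A small demonstration with a trimmed-down stage:

```python
# A dict literal with a repeated key keeps only the last value.
stage = {
    "$group": {"_id": {"time": "$time", "mac": "$mac"}},
    "$group": {"_id": "$_id.time", "count": {"$sum": 1}},
}
print(len(stage))       # number of keys that survived
print(stage["$group"])  # only the second definition is left

# Writing each stage as its own dict keeps both.
stages = [
    {"$group": {"_id": {"time": "$time", "mac": "$mac"}}},
    {"$group": {"_id": "$_id.time", "count": {"$sum": 1}}},
]
print(len(stages))
```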
I prefer to minimize the number of stages, and especially to avoid unnecessary $group stages. So I would do it with the following pipeline:
pipeline = [
{ '$group' : {
'_id': { '$dateToString': { 'format': "%Y-%m-%d", 'date': "$time" } },
'macs':{ '$addToSet': '$mac' }
} },
{ '$addFields': { 'macs': { '$size': '$macs' } } }
]
There's an operator called "$dateToString", which would solve most of your problems.
Edit: I didn't read the question carefully. @Asya Kamsky, thank you for pointing that out. Here's the new answer.
pipeline = [
{
"$group": {
"_id": {
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$time"
}
},
"mac": "$mac"
}
}
},
{
"$group": {
"_id": "$_id.date",
"count": {
"$sum": 1
}
}
}
]
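For checking the result on a small sample without a server, the two "$group" stages can be mirrored in plain Python: first collapse the documents to distinct (day, mac) pairs, then count the pairs per day. This is only a sketch on made-up documents; `unique_macs_per_day` is a hypothetical helper and the second MAC address is invented.

```python
from collections import Counter
from datetime import datetime


def unique_macs_per_day(docs):
    """Count distinct mac values for each calendar day."""
    # Step 1: distinct (day, mac) pairs, like the first $group.
    pairs = {(d["time"].strftime("%Y-%m-%d"), d["mac"]) for d in docs}
    # Step 2: number of pairs per day, like the second $group.
    return dict(Counter(day for day, _mac in pairs))


docs = [
    {"time": datetime(2018, 12, 13, 15, 9, 42), "mac": "10:06:21:3e:0a:ff"},
    {"time": datetime(2018, 12, 13, 18, 0, 0), "mac": "10:06:21:3e:0a:ff"},
    {"time": datetime(2018, 12, 13, 20, 30, 0), "mac": "aa:bb:cc:dd:ee:01"},
    {"time": datetime(2018, 12, 14, 9, 0, 0), "mac": "aa:bb:cc:dd:ee:01"},
]
counts = unique_macs_per_day(docs)
print(counts)
```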
from bson.son import SON  # used by the $sort stage at the end

pipeline = [
{
"$project": {
"_id": 1,
"mac": 1,
"time": { "$dateToString": { "format": "%Y-%m-%d", "date": "$time", "timezone": "Africa/Johannesburg"}}
},
},
{
"$group": {
"_id":{
"time": "$time",
"mac": "$mac",
}}},{
"$group": {
"_id": "$_id.time",
"count":{"$sum": 1}
}},
{"$sort": SON([("_id", -1)])}
]
Does exactly what it should do.
Thanks. :)
I have many tweet objects like this:
{
"_id" : ObjectId("5a2f4a381cb29b482553e2c9"),
"user_id" : 21898942,
"created_at" : ISODate("2009-03-09T19:48:50Z"),
"id" : 1301923516,
"place" : "",
"retweet_count" : 0,
"tweet" : "Save the Date! March 28th Vietnamese Cooking Class! Call to Reserve 312.255.0088",
"favorite_count" : 0,
"type" : "A"
}
I'm using this code to group the tweets by date and by type:
pipeline = [
{
"$group": {
"_id": {
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$created_at"
}
},
"type": "$type"
},
"count": {
"$sum": 1
}
}
}
]
results = mongo.db.tweets.aggregate(pipeline)
Here is the result I get:
{
"_id": {
"date": "2009-03-17",
"type": "A"
},
"count": 4
},
{
"_id": {
"date": "2009-03-17",
"type": "B"
},
"count": 6
}
But now I want to have the result in this format:
{date: "2009-03-17", A: 4, B: 6, C: 9}
Is there any way I can achieve this directly through aggregate?
Note: I'm using MongoDB and PyMongo
You can try the aggregation query below on version 3.6.
The second $group builds an array of type/count key-value pairs. $arrayToObject then turns that array into an object keyed by type, and $mergeObjects merges it with the date field to produce the expected response.
$replaceRoot promotes the merged document to the top level.
pipeline = [
{
"$group": {
"_id": {
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$created_at"
}
},
"type": "$type"
},
"count": {
"$sum": 1
}
}
},
{
"$group": {
"_id": "$_id.date",
"typeandcount": {
"$push": {
"k": "$_id.type",
"v": "$count"
}
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
"date": "$_id"
},
{
"$arrayToObject": "$typeandcount"
}
]
}
}
}
]
Mongo 3.4 version:
Replace the last stage with the one below:
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$concatArrays": [
[
{
"k": "date",
"v": "$_id"
}
],
"$typeandcount"
]
}
}
}
}
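If neither operator is available on the server, the same reshaping can also be done client-side on the output of the first "$group" stage. A minimal Python sketch, assuming the rows have the `_id.date` / `_id.type` / `count` shape shown in the question; `pivot_counts` is a hypothetical helper and the rows are taken from the question's example values.

```python
def pivot_counts(rows):
    """Turn one row per (date, type) into one dict per date keyed by type."""
    by_date = {}
    for row in rows:
        date = row["_id"]["date"]
        # Start each date's document with its "date" field, then add a type column.
        by_date.setdefault(date, {"date": date})[row["_id"]["type"]] = row["count"]
    return list(by_date.values())


rows = [
    {"_id": {"date": "2009-03-17", "type": "A"}, "count": 4},
    {"_id": {"date": "2009-03-17", "type": "B"}, "count": 6},
    {"_id": {"date": "2009-03-17", "type": "C"}, "count": 9},
]
pivoted = pivot_counts(rows)
print(pivoted)
```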
I have a set of objects that share the same description but have slightly different amounts.
[
{
"_id": "101",
"description": "DD from my employer1",
"amount": 1000.33
},
{
"_id": "102",
"description": "DD from my employer1",
"amount": 1000.34
},
{
"_id": "103",
"description": "DD from my employer1",
"amount": 1000.35
},
{
"_id": "104",
"description": "DD from employer1",
"amount": 5000.00
},
{
"_id": "105",
"description": "DD from my employer2",
"amount": 2000.33
},
{
"_id": "106",
"description": "DD from my employer2",
"amount": 2000.33
},
{
"_id": "107",
"description": "DD from my employer2",
"amount": 2000.33
}
]
Below, I am able to group them using the description:
[
{
"$group": {
"_id": {
"description": "$description"
},
"count": {
"$sum": 1
},
"ids": {
"$addToSet": "$_id"
}
}
},
{
"$match": {
"count": {
"$gte": 3
}
}
}
]
Is there a way to include all the amounts in the group (_ids 101, 102 and 103, plus 105, 106 and 107) even if they differ slightly, but exclude the bonus amount, which in the sample above is _id 104?
I don't believe it can be done in a single group stage, but is there something that could be done at a later stage to group _ids 101, 102 and 103 together and exclude _id 104? Basically, I want MongoDB to ignore the small differences between 101, 102 and 103 and group them together, since they are paychecks coming from the same employer.
I have been working with $stdDevPop, but can't get a solid formula down.
I am looking for a simple array output of just the _ids.
{
"result": [
"101",
"102",
"103",
"105",
"106",
"107"
]
}
You can do this by doing some math on the "amount" to round it down to the nearest 1000 and use that as the grouping _id:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$trunc": "$amount" },
{ "$mod": [
{ "$trunc": "$amount" },
1000
]}
]
},
"results": { "$push": "$_id" }
}},
{ "$redact": {
"$cond": {
"if": { "$gt": [ { "$size": "$results" }, 1 ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$unwind": "$results" },
{ "$group": {
"_id": null,
"results": { "$push": "$results" }
}}
])
If your MongoDB is older than 3.2 then you would just need to use a long form with $mod of what $trunc is doing. And if your MongoDB is older than 2.6 then rather than $redact you would $match. So in the longer form this is:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [
"$amount",
{ "$mod": [ "$amount", 1 ] }
]},
{ "$mod": [
{ "$subtract": [
"$amount",
{ "$mod": [ "$amount", 1 ] }
]},
1000
]}
]
},
"results": { "$push": "$_id" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } },
{ "$unwind": "$results" },
{ "$group": {
"_id": null,
"results": { "$push": "$results" }
}}
])
Either way, the output is just the _id values whose amounts fall into a 1000-wide bucket that contains more than one document.
{ "_id" : null, "results" : [ "105", "106", "107", "101", "102", "103" ] }
You could either add a $sort in there or live with sorting the result array in client code.
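The bucketing arithmetic may be easier to follow outside the pipeline. The sketch below applies the same "truncate, then subtract the remainder modulo 1000" step in plain Python to the amounts from the question and keeps only buckets with more than one document. `recurring_ids` and its `bucket` parameter are hypothetical names.

```python
import math
from collections import defaultdict


def recurring_ids(docs, bucket=1000):
    """Return the ids whose amount shares a bucket with at least one other doc."""
    groups = defaultdict(list)
    for doc in docs:
        whole = math.trunc(doc["amount"])
        # Round down to the nearest multiple of `bucket`, as in the $group _id.
        groups[whole - whole % bucket].append(doc["_id"])
    return sorted(i for ids in groups.values() if len(ids) > 1 for i in ids)


docs = [
    {"_id": "101", "amount": 1000.33},
    {"_id": "102", "amount": 1000.34},
    {"_id": "103", "amount": 1000.35},
    {"_id": "104", "amount": 5000.00},
    {"_id": "105", "amount": 2000.33},
    {"_id": "106", "amount": 2000.33},
    {"_id": "107", "amount": 2000.33},
]
result = recurring_ids(docs)
print(result)
```

Note that fixed buckets split values that straddle a boundary: amounts of 999.99 and 1000.01 would land in different groups, so the bucket size needs to suit the data.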
db.yourDBNameHere.aggregate( [
{ $match: { "amount" : { $lt : 5000 } } },
{ $project: { _id: 1 } },
])
That returns only the _id of every transaction with an amount below 5000.