Restructuring a collection in MongoDB

My collection called "sets" currently looks like this:
[{
  "_id": { "$oid": "61c2c90b04a5c1fd873bca6c" },
  "exercise": "Flat Barbell Bench Press",
  "repetitions": 8,
  "rpe": 8,
  "__v": 0,
  "weight": 90,
  "createdAt": { "$date": { "$numberLong": "1640155403594" } }
}]
It's a collection of about 1500 documents, several months' worth of workouts.
What I'm trying to accomplish is this:
[{
  "_id": { "$oid": "62f3cee8d149f0c3534d848c" },
  "user": { "$oid": "62d11eaa0caf6d2b3133b4b9" },
  "sets": [
    {
      "weight": 50,
      "exercise": "Bench Press",
      "repetitions": 8,
      "rpe": 8,
      "notes": "some note",
      "_id": { "$oid": "62f3cee8d149f0c3534d848d" }
    },
    {},
    {}
  ],
  "createdAt": { "$date": { "$numberLong": "1660145384923" } }
}]
Essentially, what I'm trying to accomplish here is embedding an array of "set" objects as the value of a "sets" field, so that instead of a flat list of sets I have a list of workouts, where each workout stores its sets as an array of objects.
Each "set" object has a date stamp, and what I also need to do is group these sets by day. At the end, each new document represents one workout and has "_id", "user", and "sets" fields, where every set in "sets" is from that day.
My Stack Overflow research tells me that I need to use aggregation, but I can't quite wrap my mind around how exactly I would do that.
Any help would be greatly appreciated!
/* UPDATE */
Here's the final query I came up with; I hope someone will find it useful.
db.collection.aggregate([
  {
    $group: {
      _id: {
        $dateToString: { format: "%Y-%m-%d", date: "$createdAt" }
      },
      sets: {
        $push: {
          _id: "$_id",
          exercise: "$exercise",
          repetitions: "$repetitions",
          rpe: "$rpe",
          __v: "$__v",
          weight: "$weight",
          createdAt: "$createdAt"
        }
      }
    }
  },
  {
    "$addFields": {
      "user": "UserID",
      "date": "$_id"
    }
  },
  {
    $project: { "_id": 0 }
  }
])
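A possible follow-up, not part of the original update: to actually materialize the restructured documents as a new collection, instead of only reading the aggregation output, a final $out stage can be appended. In this sketch the target name "workouts" is hypothetical, $push: "$$ROOT" is shorthand for the explicit field list above, and "user": "UserID" is still the placeholder from the update; note that $out replaces the target collection if it already exists.
db.sets.aggregate([
  {
    $group: {
      // one bucket per calendar day
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
      sets: { $push: "$$ROOT" }   // push each original set document whole
    }
  },
  { $addFields: { user: "UserID", date: "$_id" } },
  { $project: { _id: 0 } },
  { $out: "workouts" }            // write the results into a new collection
])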

Related

Remove MongoDB documents with the same fields

I have a scheduler running on my server that generates a list of medications that weren't signed each day. This problem occurs sometimes, and while solving it I want to create a script that checks for and deletes the duplicate items that get generated. I'd appreciate your help.
Here is some sample data that was generated. I want to match on the fields [med, Time, Date], return only one document per group, and remove all the duplicates.
[
  {
    "_id": ObjectId("6375ea026a0b4e0015d80f77"),
    "med": ObjectId("610845e7f5b0e00017754d50"),
    "Time": "8:00am",
    "Date": "2022-11-16T08:00:00.008+0000"
  },
  {
    "_id": ObjectId("6375ea026a0b4e0015d80fd4"),
    "med": ObjectId("61f988e82cf5760018113cee"),
    "Time": "7:00am",
    "Date": "2022-11-16T08:00:00.008+0000"
  },
  {
    "_id": ObjectId("6375ea026a0b4e0015d80fdd"),
    "med": ObjectId("62d1c6e93603ed00177812ee"),
    "Time": "6:00am",
    "Date": "2022-11-16T08:00:00.008+0000"
  },
  {
    "_id": ObjectId("6375ea02304a870015dfa2ec"),
    "med": ObjectId("610845e7f5b0e00017754d50"),
    "Time": "8:00am",
    "Date": "2022-11-16T08:00:00.005+0000"
  },
  {
    "_id": ObjectId("6375ea02304a870015dfa349"),
    "med": ObjectId("61f988e82cf5760018113cee"),
    "Time": "7:00am",
    "Date": "2022-11-16T08:00:00.005+0000"
  },
  {
    "_id": ObjectId("6375ea02304a870015dfa352"),
    "med": ObjectId("62d1c6e93603ed00177812ee"),
    "Time": "6:00am",
    "Date": "2022-11-16T08:00:00.005+0000"
  }
]
Expected output (note: the Time and Date values differ between documents):
[
  {
    "med": ObjectId("610845e7f5b0e00017754d50"),
    "Time": "8:00am",
    "Date": "2022-11-16T08:00:00.008+0000"
  },
  {
    "med": ObjectId("61f988e82cf5760018113cee"),
    "Time": "7:00am",
    "Date": "2022-11-16T08:00:00.008+0000"
  },
  {
    "med": ObjectId("62d1c6e93603ed00177812ee"),
    "Time": "6:00am",
    "Date": "2022-11-16T08:00:00.008+0000"
  }
]
Here is a MongoDB Playground for displaying the result -> Mongodb Playground
Hope this answer will be helpful.
Note: your Date is not the same in every object, so your expected output may be wrong.
I have attached a playground URL with the code.
https://mongoplayground.net/p/hYrgPd7od02
db.collection.aggregate([
  {
    "$group": {
      "_id": {
        med: "$med",
        Time: "$Time",
        Date: "$Date"
      },
      "med": { "$first": "$med" },
      "Time": { "$first": "$Time" },
      "Date": { "$first": "$Date" }
    }
  },
  {
    "$project": { _id: 0 }
  }
])
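A hedged addition, not part of the original answer: the question also asks to delete the duplicates, not just to display one document per group. Here is a minimal mongosh sketch, assuming duplicates match exactly on med, Time, and Date (in the sample data the Date values differ at the millisecond level, so you may need to normalize Date first, e.g. truncate it to the day, before grouping):
db.collection.aggregate([
  {
    "$group": {
      "_id": { med: "$med", Time: "$Time", Date: "$Date" },
      "ids": { "$push": "$_id" }
    }
  },
  { "$match": { "ids.1": { "$exists": true } } }   // keep only groups that have duplicates
]).forEach(group => {
  const [keep, ...extras] = group.ids;             // keep the first _id in each group
  db.collection.deleteMany({ _id: { "$in": extras } });
});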

Including additional fields in a Mongodb aggregate query

I have a data structure like this. Each student will have multiple entries, one for each time they enter a classroom. The query needs to get the latest record for each student, based on a list of student ids and a department name. It should also show the teacher id and the last swipe timestamp.
[
  {
    "studentid": "stu-1234",
    "dept": "geog",
    "teacher_id": 1,
    "LastSwipeTimestamp": "2021-11-25T10:50:00.5230694Z"
  },
  {
    "studentid": "stu-1234",
    "dept": "geog",
    "teacher_id": 2,
    "LastSwipeTimestamp": "2021-11-25T11:50:00.5230694Z"
  },
  {
    "studentid": "stu-abc",
    "dept": "geog",
    "teacher_id": 11,
    "LastSwipeTimestamp": "2021-11-25T09:15:00.5230694Z"
  },
  {
    "studentid": "stu-abc",
    "dept": "geog",
    "teacher_id": 21,
    "LastSwipeTimestamp": "2021-11-25T11:30:00.5230694Z"
  }
]
Here is what I have, but it doesn't show teacher id or the last swipe timestamp. What do I need to change or add?
Maybe you need something like this
db.collection.aggregate([
  {
    $match: {
      "studentid": { "$in": ["stu-abc", "stu-1234"] },
      "dept": "geog"
    }
  },
  {
    $sort: { "LastSwipeTimestamp": -1 }
  },
  {
    $group: {
      "_id": { "studentid": "$studentid", "dept": "$dept" },
      "teacher_id": { $first: "$teacher_id" },
      "LastSwipeTimestamp": { $first: "$LastSwipeTimestamp" }
    }
  },
  {
    $project: {
      _id: 0,
      "studentid": "$_id.studentid",
      "dept": "$_id.dept",
      "teacher_id": "$teacher_id",
      "LastSwipeTimestamp": "$LastSwipeTimestamp"
    }
  }
])
Explained:
You need to carry the non-grouped fields through the $group stage (here via $first) so they are also available to the next $project stage...
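A hedged alternative, not from the original answer: on MongoDB 5.0 or newer, the same "latest record per student" query can be written with $setWindowFields instead of the $sort + $group + $first pattern. Since dept is already pinned by the $match, partitioning by studentid alone is enough:
db.collection.aggregate([
  {
    $match: {
      "studentid": { "$in": ["stu-abc", "stu-1234"] },
      "dept": "geog"
    }
  },
  {
    $setWindowFields: {
      partitionBy: "$studentid",
      sortBy: { LastSwipeTimestamp: -1 },   // newest swipe first within each student
      output: { rank: { $rank: {} } }
    }
  },
  { $match: { rank: 1 } },                  // keep only the latest record per student
  { $unset: "rank" }                        // drop the helper field
])
Both versions return one document per student carrying the teacher_id and LastSwipeTimestamp of the newest swipe.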

MongoDB aggregate: add a field that contains the object's current position in the array

After the last stage of my aggregation pipeline I receive the following array:
[{
  "cdrId": "61574b3e58fb1cae1494df2c",
  "date": "2021-10-01T17:54:06.057Z",
  "intentType": "FALLBACK"
},
{
  "cdrId": "61574b3e58fb1cae1494df2c",
  "date": "2021-10-01T17:54:06.057Z",
  "intentType": "FAQ"
},
{
  "cdrId": "61570b37522aba5e2f205356",
  "date": "2021-10-01T13:20:55.601Z",
  "intentType": "TRANS/DISAM"
},
{
  "cdrId": "61570b37522aba5e2f205356",
  "date": "2021-10-01T13:20:55.601Z",
  "intentType": "FAQ"
}]
I'm looking to add an index field showing each object's current position in the array. The output should look something like this:
[{
  "index": 0,
  "cdrId": "61574b3e58fb1cae1494df2c",
  "date": "2021-10-01T17:54:06.057Z",
  "intentType": "FALLBACK"
},
{
  "index": 1,
  "cdrId": "61574b3e58fb1cae1494df2c",
  "date": "2021-10-01T17:54:06.057Z",
  "intentType": "FAQ"
},
{
  "index": 2,
  "cdrId": "61570b37522aba5e2f205356",
  "date": "2021-10-01T13:20:55.601Z",
  "intentType": "TRANS/DISAM"
},
{
  "index": 3,
  "cdrId": "61570b37522aba5e2f205356",
  "date": "2021-10-01T13:20:55.601Z",
  "intentType": "FAQ"
}]
I will use this value when a later pipeline stage sorts this array, so I still have each object's original position from before the sort.
Is there a way that I can do this with aggregate? I'm using MongoDB 4.2.
Try this one:
db.collection.aggregate([
  // {$sort: {...} },
  {
    $group: {
      _id: null,
      data: { $push: "$$ROOT" }
    }
  },
  {
    $unwind: {
      path: "$data",
      includeArrayIndex: "index"
    }
  },
  {
    $replaceRoot: {
      newRoot: { $mergeObjects: [ "$data", { index: "$index" } ] }
    }
  }
])
Mongo Playground
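A side note, not part of the original answer: grouping with _id: null packs the entire result set into a single document, so this trick is bounded by the 16 MB BSON document limit. On MongoDB 5.0+ (newer than the asker's 4.2), $setWindowFields can number documents without that round-trip; $documentNumber is 1-based, so subtract 1 for a zero-based index:
db.collection.aggregate([
  {
    $setWindowFields: {
      sortBy: { date: 1 },                          // the order whose positions you want to capture
      output: { index: { $documentNumber: {} } }    // 1-based position in that order
    }
  },
  { $addFields: { index: { $subtract: ["$index", 1] } } }  // convert to zero-based
])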

MongoDB - Aggregate by distinct field then count per day

I have a MongoDB database that collects device data.
An example document is:
{
  "_id" : ObjectId("5c125a185dea1b0252c5352"),
  "time" : ISODate("2018-12-13T15:09:42.536Z"),
  "mac" : "10:06:21:3e:0a:ff"
}
The goal is to count the unique mac values per day, from the first document in the db to the last.
I've been playing around and came to the conclusion that I need multiple $group and $project stages in my aggregation.
This is what I tried; I'm not sure if it's in the right direction or just completely messed up.
pipeline = [
    {"$project": {
        "_id": 1,
        "mac": 1,
        "day": {"$dayOfMonth": "$time"},
        "month": {"$month": "$time"},
        "year": {"$year": "$time"}
    }},
    {"$project": {
        "_id": 1,
        "mac": 1,
        "time": {"$concat": [
            {"$substr": ["$year", 0, 4]},
            "-",
            {"$substr": ["$month", 0, 2]},
            "-",
            {"$substr": ["$day", 0, 2]}
        ]}
    }},
    {
        "$group": {
            "_id": {
                "time": "$time",
                "mac": "$mac"
            }
        },
        "$group": {
            "_id": "$_id.time",
            "count": {"$sum": 1},
        }
    }
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
The output now doesn't look like it did any aggregation:
[{"_id": null, "count": 751050}]
I'm using PyMongo as my driver and MongoDB 4.
Ideally it should just show the date and count, e.g. { "_id" : "2018-12-13", "count" : 2 }.
I would love some feedback and advice.
Thanks in advance.
I prefer to minimize the number of stages, and especially to avoid unnecessary $group stages, so I would do it with the following pipeline. (Incidentally, the last element of your pipeline puts two "$group" keys into a single dict; Python keeps only the second one, so the surviving stage groups on the not-yet-existing "$_id.time", which is why everything collapses into _id: null.)
pipeline = [
    { '$group': {
        '_id': { '$dateToString': { 'format': "%Y-%m-%d", 'date': "$time" } },
        'macs': { '$addToSet': '$mac' }
    } },
    { '$addFields': { 'macs': { '$size': '$macs' } } }
]
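One trade-off worth noting with this version: $addToSet materializes every distinct mac for a day inside a single result document, so days with a very large number of unique devices push against the accumulator memory and the 16 MB document limit. The two-$group pipeline in the edit below counts (day, mac) pairs instead and never holds the full sets.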
There's an operator called "$dateToString", which would solve most of your problems.
Edit: I didn't read the question carefully; @Asya Kamsky, thank you for pointing that out. Here's the new answer.
pipeline = [
    {
        "$group": {
            "_id": {
                "date": {
                    "$dateToString": {
                        "format": "%Y-%m-%d",
                        "date": "$time"
                    }
                },
                "mac": "$mac"
            }
        }
    },
    {
        "$group": {
            "_id": "$_id.date",
            "count": {"$sum": 1}
        }
    }
]
This is the final pipeline I ended up with, with a timezone added (note: SON requires "from bson.son import SON"):
[
    {
        "$project": {
            "_id": 1,
            "mac": 1,
            "time": {"$dateToString": {"format": "%Y-%m-%d", "date": "$time", "timezone": "Africa/Johannesburg"}}
        }
    },
    {
        "$group": {
            "_id": {
                "time": "$time",
                "mac": "$mac"
            }
        }
    },
    {
        "$group": {
            "_id": "$_id.time",
            "count": {"$sum": 1}
        }
    },
    {"$sort": SON([("_id", -1)])}
]
Does exactly what it should do.
Thanks. :)

Schema design for MongoDB pre-aggregated reports

I'm following the official MongoDB docs (http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/) about pre-aggregated reports. According to the tutorial, a pre-aggregated document should look like this:
{
  _id: "20101010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif"
  },
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457
  },
  minute: {
    "0": {
      "0": 3612,
      "1": 3241,
      ...
      "59": 2130
    },
    "1": {
      "0": ...,
    },
    ...
    "23": {
      "59": 2819
    }
  }
}
The thing is that I'm currently using this approach, and I already have some data stored this way. But now I want to add another dimension to the metadata subdocument, so I'm reconsidering the whole thing.
My question is: is there a reason to build the _id attribute from the same information stored in the metadata attribute? Wouldn't it be enough to create a unique compound index on metadata and use an ObjectId for the _id key?
Thanks!
Another way ;)
You can create a simple collection:
{
  "ts": "unix timestamp",
  "site": "site-1",
  "page": "/apache_pb.gif"
}
This collection will have very good insert performance,
and you can use a complex aggregate query (grouping by any time grain):
db.test.aggregate([
  {
    "$project": {
      "ts": 1,
      "_id": 0,
      "grain": {
        "$subtract": [
          { "$divide": ["$ts", 3600] },
          { "$mod": [{ "$divide": ["$ts", 3600] }, 1] }
        ]
      },
      "site": 1,
      "page": 1
    }
  },
  {
    "$group": {
      "_id": {
        "site": "$site",
        "page": "$page",
        "grain": "$grain"
      }
    }
  },
  {
    "$group": {
      "tsum": { "$sum": 1 },
      "_id": { "grain": "$_id.grain" }
    }
  },
  {
    "$project": {
      "tsum": "$tsum",
      "_id": 0,
      "grain": "$_id.grain"
    }
  },
  {
    "$sort": { "grain": 1 }
  }
])
This aggregates your statistics by one hour (3600 seconds in this example).
IMHO this is a simpler and more manageable solution, without a complex data model, and with good performance (don't forget about an index).
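A hedged aside, not from the original answer: on MongoDB 3.2+ the $subtract/$mod trick can be written more directly with $floor, and the index the answer alludes to might look like the sketch below (field names taken from the sample document; adjust to your actual queries):
// Equivalent hourly grain: floor(ts / 3600) is the hour bucket number.
"grain": { "$floor": { "$divide": ["$ts", 3600] } }
// A possible supporting index for this workload:
db.test.createIndex({ "ts": 1, "site": 1, "page": 1 })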