I'm following the official MongoDB docs (http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/) about pre-aggregated reports. According to the tutorial, a pre-aggregated document should look like this:
{
  _id: "20101010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "0": {
      "0": 3612,
      "1": 3241,
      ...
      "59": 2130 },
    "1": {
      "0": ...,
    },
    ...
    "23": {
      "59": 2819 }
  }
}
The thing is, I'm currently using this approach and I already have some data stored this way. But now I want to add another dimension to the metadata subdocument, and I am reconsidering the whole design.
My question is: is there a reason to build the _id attribute from the same information stored in the metadata attribute? Wouldn't it be enough to create a unique compound index on the metadata fields and use an ObjectId for the _id key?
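For reference, the alternative I have in mind would look roughly like this (just a sketch; the collection name is made up and the field names come from the example document above):
// Let _id default to an ObjectId and enforce uniqueness of the metadata
// combination with a unique compound index instead.
db.stats.createIndex(
  { "metadata.date": 1, "metadata.site": 1, "metadata.page": 1 },
  { unique: true }
)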
Thanks!
Another way ;)
You can create a simple collection:
{
  "ts": "unix timestamp",
  "site": "site-1",
  "page": "/apache_pb.gif"
}
This collection will have very good insert performance, and you can use a more complex aggregation query (grouping by any time grain):
db.test.aggregate([
  {
    "$project": {
      "_id": 0,
      "ts": 1,
      "site": 1,
      "page": 1,
      // floor(ts / 3600): the number of whole hours since the epoch
      "grain": {
        "$subtract": [
          { "$divide": ["$ts", 3600] },
          { "$mod": [{ "$divide": ["$ts", 3600] }, 1] }
        ]
      }
    }
  },
  { "$group": { "_id": { "site": "$site", "page": "$page", "grain": "$grain" } } },
  { "$group": { "_id": { "grain": "$_id.grain" }, "tsum": { "$sum": 1 } } },
  { "$project": { "_id": 0, "grain": "$_id.grain", "tsum": "$tsum" } },
  { "$sort": { "grain": 1 } }
])
This example aggregates the statistics by one hour (3600 seconds).
IMHO this is a simpler and more manageable solution than a complex pre-aggregated data model, with good performance (don't forget about indexes).
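For example, an index covering the fields used above might be created like this (the exact index shape is my assumption, not part of the original answer):
// Assumed compound index to support filtering and grouping by site, page and time;
// adjust the field order to match your most common queries.
db.test.createIndex({ "site": 1, "page": 1, "ts": 1 })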
I have a collection of documents that contains three fields: DateTime, Score, and Name. I would like to limit the data to display only relevant information in MongoDB Charts. Basically, what I need is to select the documents with the minimal date and the maximum date and pass them to MongoDB Charts. Can you please suggest the best way to do this?
Example documents:
{
  "_id": { "$oid": "62f172b99d3a18179cee4c4c" },
  "Name": "pc",
  "Score": 46,
  "DateTime": { "$date": { "$numberLong": "1659646800000" } }
},
{
  "_id": { "$oid": "62f172b99d3a18179cee4c4c" },
  "Name": "pc",
  "Score": 46,
  "DateTime": { "$date": { "$numberLong": "1649646800000" } }
}
There are a number of these documents, with different values taken at different points in time. I was able to write a simple query that sorts by date and limits the result to one entry, which returns only the document with the minimal date or the maximum date. The expected output for me would be to return both of them.
With MongoDB v5.0+, you can use $setWindowFields to compute ranks according to ascending and descending sorts of DateTime, then pick the documents with rank 1 to get the max/min DateTime.
db.collection.aggregate([
  {
    "$setWindowFields": {
      "partitionBy": null,
      "sortBy": { "DateTime": 1 },
      "output": { "minRank": { "$rank": {} } }
    }
  },
  {
    "$setWindowFields": {
      "partitionBy": null,
      "sortBy": { "DateTime": -1 },
      "output": { "maxRank": { "$rank": {} } }
    }
  },
  {
    "$match": {
      "$expr": {
        "$or": [
          { "$eq": ["$minRank", 1] },
          { "$eq": ["$maxRank", 1] }
        ]
      }
    }
  },
  {
    // cosmetics
    "$unset": ["minRank", "maxRank"]
  }
])
Here is the Mongo Playground for your reference.
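As an alternative sketch (not part of the original answer), since you already have a working sort-and-limit query, both extremes can be fetched in a single round trip with $facet:
db.collection.aggregate([
  {
    "$facet": {
      // one ascending and one descending sort, each limited to a single document
      "earliest": [ { "$sort": { "DateTime": 1 } }, { "$limit": 1 } ],
      "latest": [ { "$sort": { "DateTime": -1 } }, { "$limit": 1 } ]
    }
  }
])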
My collection called "sets" currently looks like this:
[{
  "_id": { "$oid": "61c2c90b04a5c1fd873bca6c" },
  "exercise": "Flat Barbell Bench Press",
  "repetitions": 8,
  "rpe": 8,
  "__v": 0,
  "weight": 90,
  "createdAt": { "$date": { "$numberLong": "1640155403594" } }
}]
It's an array with about 1,500 documents, several months' worth of workouts.
What I'm trying to accomplish is this:
[{
  "_id": { "$oid": "62f3cee8d149f0c3534d848c" },
  "user": { "$oid": "62d11eaa0caf6d2b3133b4b9" },
  "sets": [
    {
      "weight": 50,
      "exercise": "Bench Press",
      "repetitions": 8,
      "rpe": 8,
      "notes": "some note",
      "_id": { "$oid": "62f3cee8d149f0c3534d848d" }
    },
    {},
    {}
  ],
  "createdAt": { "$date": { "$numberLong": "1660145384923" } }
}]
Essentially, what I'm trying to accomplish here is embedding an array of "set" objects as the value of a "sets" field, so that instead of a list of sets I have a list of workouts, where the sets are stored as an array of objects in a field called "sets".
Each "set" object has a date stamp, and what I also need to do is group these sets by day, so that in the end each new document represents one workout and has _id, user, and sets fields, where every set is from that day.
My Stack Overflow research tells me that I need to use aggregation, but I can't quite wrap my mind around how exactly I would do that.
Any help would be greatly appreciated!
/* UPDATE */
Here's the final query I came up with, hope someone will find it useful.
db.collection.aggregate([
{
$group: {
_id: {
$dateToString: {
format: "%Y-%m-%d",
date: "$createdAt"
}
},
sets: {
$push: {
_id: "$_id",
exercise: "$exercise",
repetitions: "$repetitions",
rpe: "$rpe",
__v: "$__v",
weight: "$weight",
createdAt: "$createdAt"
}
}
}
},
{
"$addFields": {
"user": "UserID",
"date": "$_id"
}
},
{
$project: {
"_id": 0
}
}
])
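If you also want to materialize these grouped workouts into their own collection rather than just reading the aggregation result, a $merge stage could be appended as the last stage (MongoDB 4.2+; the target collection name below is hypothetical):
  {
    $merge: {
      into: "workouts",          // hypothetical target collection
      whenMatched: "replace",
      whenNotMatched: "insert"
    }
  }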
I have a huge collection of objects where data is stored for different employees.
{
  "employee": "Joe",
  "areAllAttributesMatched": false,
  "characteristics": [
    { "step": "A", "name": "house", "score": "1" },
    { "step": "B", "name": "car" },
    { "step": "C", "name": "job", "score": "3" }
  ]
}
There are cases where the score for an object is completely missing, and I want to find all of these in the database.
To do this, I have written the following query, but it seems I am going wrong somewhere, because it is not displaying the expected output.
I want the data in the following format, so that it is easy to find out which employee is missing the score, and for which step and name.
db.collection.aggregate([
{
"$unwind": "$characteristics"
},
{
"$match": {
"characteristics.score": {
"$exists": false
}
}
},
{
"$project": {
"employee": 1,
"name": "$characteristics.name",
"step": "$characteristics.step",
_id: 0
}
}
])
You need to use $exists to check whether the field exists.
playground
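For a quick check without an aggregation pipeline, the same $exists condition can also be expressed as a plain find (a sketch, not part of the original answer):
// Returns employees that have at least one characteristic without a score field.
db.collection.find({
  "characteristics": { "$elemMatch": { "score": { "$exists": false } } }
})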
You can use $ifNull to handle both cases: 1. the score field is missing, 2. score is null.
db.collection.aggregate([
{
"$unwind": "$characteristics"
},
{
"$match": {
$expr: {
$eq: [
{
"$ifNull": [
"$characteristics.score",
null
]
},
null
]
}
}
},
{
"$group": {
_id: null,
documents: {
$push: {
"employee": "$employee",
"name": "$characteristics.name",
"step": "$characteristics.step",
}
}
}
},
{
$project: {
_id: false
}
}
])
Here is the Mongo playground for your reference.
I've been using MongoDB for just a week and I have problems achieving this result: I want to group my documents by date while also keeping track of the number of entries that have a certain field set to a certain value.
So, my documents look like this:
{
  "_id" : ObjectId("5f3f79fc266a891167ca8f65"),
  "recipe" : "A",
  "timestamp" : ISODate("2020-08-22T09:38:36.306Z")
}
where recipe is either "A", "B" or "C". Right now I'm grouping the documents by date using this pymongo query:
mongo.db.aggregate(
# Pipeline
[
# Stage 1
{
"$project": {
"createdAt": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$timestamp"
}
},
"progressivo": 1,
"temperatura_fusione": 1
}
},
# Stage 2
{
"$group": {
"_id": {
"createdAt": "$createdAt"
},
"products": {
"$sum": 1
}
}
},
# Stage 3
{
"$project": {
"label": "$_id.createdAt",
"value": "$products",
"_id": 0
}
}])
Which gives me results like this:
[{"label": "2020-08-22", "value": 1}, {"label": "2020-08-15", "value": 2}, {"label": "2020-08-11", "value": 1}, {"label": "2020-08-21", "value": 5}]
What I'd like to have is also the counting of how many times each recipe appears on every date. So, if for example on August 21 I have 2 entries with the "A" recipe, 3 with the "B" recipe and 0 with the "C" recipe, the desired output would be
{"label": "2020-08-21", "value": 5, "A": 2, "B":3, "C":0}
Do you have any tips?
Thank you!
You can do it like the following; what you have done so far is excellent. After that:
In the second grouping, we get the total value and the value for each recipe.
$map is used to go through/modify each object.
$arrayToObject is used to convert the array we built via $map (key/value pairs) into an object.
$ifNull is used because sometimes your data might not have "A", "B", or "C", but the value should be 0 when a recipe is missing, as in the expected output.
Here is the code
[
{
"$project": {
"createdAt": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$timestamp"
}
},
recipe: 1,
"progressivo": 1,
"temperatura_fusione": 1
}
},
{
"$group": {
"_id": {
"createdAt": "$createdAt",
"recipeName": "$recipe",
},
"products": {
$sum: 1
}
}
},
{
"$group": {
"_id": "$_id.createdAt",
value: {
$sum: "$products"
},
recipes: {
$push: {
name: "$_id.recipeName",
val: "$products"
}
}
}
},
{
$project: {
"content": {
"$arrayToObject": {
"$map": {
"input": "$recipes",
"as": "el",
"in": {
"k": "$$el.name",
"v": "$$el.val"
}
}
}
},
value: 1
}
},
{
$project: {
_id: 1,
value: 1,
A: {
$ifNull: [
"$content.A",
0
]
},
B: {
$ifNull: [
"$content.B",
0
]
},
C: {
$ifNull: [
"$content.C",
0
]
}
}
}
]
Working Mongo playground
I have a mongodb database that collects device data.
Example document is
{
  "_id" : ObjectId("5c125a185dea1b0252c5352"),
  "time" : ISODate("2018-12-13T15:09:42.536Z"),
  "mac" : "10:06:21:3e:0a:ff"
}
The goal would be to count the unique mac values per day, from the first document in the db to the last document in the db.
I've been playing around and came to the conclusion that I would need to have multiple groups as well as projects during my aggregations.
This is what I tried; I'm not sure if it's in the right direction or just completely messed up.
pipeline = [
{"$project": {
"_id": 1,
"mac": 1,
"day": {
"$dayOfMonth":"$time"
},
"month": {
"$month":"$time"
},
"year": {
"$year":"$time"
}
}
},
{
"$project": {
"_id": 1,
"mac": 1,
"time": {
"$concat": [{
"$substr":["$year", 0, 4]
},
"-", {
"$substr": ["$month", 0, 2]
},
"-",
{
"$substr":["$day", 0, 2]
}]
}
}
},
{
"$group": {
"_id": {
"time": "$time",
"mac": "$mac"
}
},
"$group": {
"_id": "$_id.time",
"count":{"$sum": 1},
}
}
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
The output now doesn't look like it did any aggregation,
[{"_id": null, "count": 751050}]
I'm using Pymongo as my driver and using Mongodb 4.
Ideally it should just show the date and count (e.g. { "_id" : "2018-12-13", "count" : 2 }).
I would love some feedback and advice.
Thanks in advance.
I prefer to minimize the number of stages, and especially to avoid unnecessary $group stages. So I would do it with the following pipeline:
pipeline = [
    { '$group' : {
        '_id': { '$dateToString': { 'format': "%Y-%m-%d", 'date': "$time" } },
        'macs': { '$addToSet': '$mac' }
    } },
    { '$addFields': { 'macs': { '$size': '$macs' } } }
]
There's an operator called "$dateToString", which would solve most of your problems.
Edit: I didn't read the question carefully; @Asya Kamsky, thank you for pointing that out. Here's the new answer.
pipeline = [
    {
        "$group": {
            "_id": {
                "date": { "$dateToString": { "format": "%Y-%m-%d", "date": "$time" } },
                "mac": "$mac"
            }
        }
    },
    {
        "$group": {
            "_id": "$_id.date",
            "count": { "$sum": 1 }
        }
    }
]
# Requires "from bson.son import SON" for the $sort stage below (PyMongo).
[
    {
        "$project": {
            "_id": 1,
            "mac": 1,
            "time": { "$dateToString": { "format": "%Y-%m-%d", "date": "$time", "timezone": "Africa/Johannesburg" } }
        }
    },
    {
        "$group": {
            "_id": {
                "time": "$time",
                "mac": "$mac"
            }
        }
    },
    {
        "$group": {
            "_id": "$_id.time",
            "count": { "$sum": 1 }
        }
    },
    { "$sort": SON([("_id", -1)]) }
]
Does exactly what it should do.
Thanks. :)