MongoDB Query for Time Series data

I am trying to write a find query that retrieves only the data for hour "1" (i.e. hourly : "1") from the following events document. Below is the output of db.events.find().pretty().
In the real scenario, I would be querying by _id and hour.
{
"_id" : "08062017/cpu",
"metadata" : {
"host" : "localhost",
"metric" : "cpu"
},
"hourly" : {
"0" : {
"total" : 234,
"used" : 123
},
"1" : {
"total" : 234,
"used" : 123
}
}
}
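No answer was posted for this question. A minimal sketch, relying on MongoDB's standard dot-notation projection: passing a projection such as {"hourly.1": 1} as the second argument to find() returns _id plus only that hourly bucket. The helpers hourlyProjection and hourlyQuery are hypothetical names, written in plain JavaScript so the projection can be built for any hour.

```javascript
// Hypothetical helper: builds a find() projection that keeps metadata and a
// single hourly bucket, using dot notation on the embedded "hourly" document.
function hourlyProjection(hour) {
  const projection = { metadata: 1 };
  projection["hourly." + String(hour)] = 1; // e.g. "hourly.1"
  return projection;
}

// Hypothetical helper: filter on the document _id, project one hour.
function hourlyQuery(id, hour) {
  return { filter: { _id: id }, projection: hourlyProjection(hour) };
}

// In the mongo shell this would be used as:
//   const q = hourlyQuery("08062017/cpu", 1);
//   db.events.find(q.filter, q.projection).pretty();
const q = hourlyQuery("08062017/cpu", 1);
console.log(JSON.stringify(q.filter), JSON.stringify(q.projection));
```

_id is returned by default, so the result contains _id, metadata and hourly with only the "1" entry.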

Related

Query Time series data based on Date

Given the document below, I would like to return the same document but containing only the hourly entry selected by the minute of the date field (here the minute is 1, so hourly.1).
Does anyone know how to do this dynamically? If the date has minute 0, I want hourly.0 returned, and so on up to minute 59.
{
"_id" : "08062017/cpu",
"date" : ISODate("2018-04-11T02:01:00.000Z"),
"metadata" : {
"host" : "localhost",
"metric" : "cpu"
},
"hourly" : {
"0" : {
"total" : 234,
"used" : 123
},
"1" : {
"total" : 234,
"used" : 123
}
}
}
RESULT:
{
"_id" : "08062017/cpu",
"date" : ISODate("2018-04-11T02:01:00.000Z"),
"metadata" : {
"host" : "localhost",
"metric" : "cpu"
},
"hourly" : {
"1" : {
"total" : 234,
"used" : 123
}
}
}
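A possible approach, sketched under assumptions: on MongoDB 4.0 or later (needed for $toString; $objectToArray and $arrayToObject need 3.4.4), an aggregation can turn hourly into key/value pairs, keep only the pair whose key equals the minute of date, and rebuild the object. $minute uses UTC by default. The pipeline is written as a plain JavaScript value so it can be passed to db.events.aggregate(); keepHourlyForMinute is a hypothetical application-side equivalent of the same logic, not a MongoDB API.

```javascript
// Aggregation sketch (assumes MongoDB 4.0+ for $toString).
// 1. $objectToArray turns hourly into [{k: "0", v: {...}}, {k: "1", v: {...}}].
// 2. $filter keeps the pair whose key equals the minute of "date" (UTC).
// 3. $arrayToObject turns the remaining pair back into { "1": {...} }.
const pipeline = [
  {
    $addFields: {
      hourly: {
        $arrayToObject: {
          $filter: {
            input: { $objectToArray: "$hourly" },
            as: "entry",
            cond: { $eq: ["$$entry.k", { $toString: { $minute: "$date" } }] }
          }
        }
      }
    }
  }
];
// Usage in the mongo shell: db.events.aggregate(pipeline)

// Hypothetical plain-JavaScript equivalent, e.g. for doing the selection
// in application code instead of in the database.
function keepHourlyForMinute(doc) {
  const key = String(doc.date.getUTCMinutes());
  const hourly = {};
  if (Object.prototype.hasOwnProperty.call(doc.hourly, key)) {
    hourly[key] = doc.hourly[key];
  }
  return { ...doc, hourly };
}

// The document from the question.
const doc = {
  _id: "08062017/cpu",
  date: new Date("2018-04-11T02:01:00.000Z"),
  metadata: { host: "localhost", metric: "cpu" },
  hourly: { "0": { total: 234, used: 123 }, "1": { total: 234, used: 123 } }
};
console.log(JSON.stringify(keepHourlyForMinute(doc).hourly));
```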

Query data in MongoDB vs Filtering in Code

This is a performance question.
If I have a collection that I want to query on multiple fields (fieldValue < x < fieldValue, status = 'pending', etc.), is it better to filter with a MongoDB query, or to retrieve the subset of the collection that matches a simpler query such as status = 'pending' and then filter the data further in the server code?
When would you recommend each approach, and when not?
Thank you for taking your time.
Regards,
Emir
Go for the single query. Filtering out unwanted data and fetching only the required documents should be done in the database itself: the database can use an index (for example a compound index on status and value) to find matching documents, and only those documents are sent over the network. Filtering in application code means transferring and deserializing documents that are then thrown away, which costs extra time and resources for the same result. In this case we can use $gt and $lt on value together with an equality match on status.
Sample Collection
{ "_id" : ObjectId("5a13c7e08e1b021d0f556c29"), "value" : 10, "status" : "pending" }
{ "_id" : ObjectId("5a13c7e58e1b021d0f556c2a"), "value" : 20, "status" : "completed" }
{ "_id" : ObjectId("5a13c7e88e1b021d0f556c2b"), "value" : 40, "status" : "In Progress" }
{ "_id" : ObjectId("5a13c7ec8e1b021d0f556c2c"), "value" : 50, "status" : "pending" }
{ "_id" : ObjectId("5a13c7f08e1b021d0f556c2d"), "value" : 750, "status" : "completed" }
{ "_id" : ObjectId("5a13c7f68e1b021d0f556c2e"), "value" : 90, "status" : "pending" }
{ "_id" : ObjectId("5a13c7fb8e1b021d0f556c2f"), "value" : 190, "status" : "pending" }
{ "_id" : ObjectId("5a13c7fe8e1b021d0f556c30"), "value" : 120, "status" : "completed" }
{ "_id" : ObjectId("5a13c8038e1b021d0f556c31"), "value" : 220, "status" : "completed" }
{ "_id" : ObjectId("5a13c8078e1b021d0f556c32"), "value" : 720, "status" : "pending" }
{ "_id" : ObjectId("5a13c80b8e1b021d0f556c33"), "value" : 7420, "status" : "In Progress" }
Sample Query: 20 < x < 300 and status = pending
Both conditions on value belong in one sub-document (if value is written twice in the same object, JavaScript keeps only the last occurrence, so the $gt: 20 condition would be silently dropped):
db.collection.find({value: {$gt: 20, $lt: 300}, status: "pending"})
The result will be
{ "_id" : ObjectId("5a13c7ec8e1b021d0f556c2c"), "value" : 50, "status" : "pending" }
{ "_id" : ObjectId("5a13c7f68e1b021d0f556c2e"), "value" : 90, "status" : "pending" }
{ "_id" : ObjectId("5a13c7fb8e1b021d0f556c2f"), "value" : 190, "status" : "pending" }
Hope it helps!
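A query written as {value: {$gt: 20}, value: {$lt: 300}, ...} does not do what it appears to: the mongo shell evaluates it as a JavaScript object literal, and a repeated key overwrites the earlier one before the query is ever sent. The sketch below reproduces this on the sample collection. The matches function is a hypothetical, hand-written evaluator covering only $gt, $lt and status equality; it is not MongoDB's query matcher.

```javascript
// A repeated key in an object literal keeps only the last value,
// so this filter ends up as { value: { $lt: 300 }, status: { $eq: "pending" } }.
const buggy = { value: { $gt: 20 }, value: { $lt: 300 }, status: { $eq: "pending" } };
// Both range operators in one sub-document: 20 < value < 300.
const fixed = { value: { $gt: 20, $lt: 300 }, status: "pending" };

// The sample collection from above (only the fields used here).
const docs = [
  { value: 10, status: "pending" }, { value: 20, status: "completed" },
  { value: 40, status: "In Progress" }, { value: 50, status: "pending" },
  { value: 750, status: "completed" }, { value: 90, status: "pending" },
  { value: 190, status: "pending" }, { value: 120, status: "completed" },
  { value: 220, status: "completed" }, { value: 720, status: "pending" },
  { value: 7420, status: "In Progress" }
];

// Hypothetical minimal evaluator for just these operators.
function matches(doc, filter) {
  const range = filter.value;
  const wanted = typeof filter.status === "object" ? filter.status.$eq : filter.status;
  if ("$gt" in range && !(doc.value > range.$gt)) return false;
  if ("$lt" in range && !(doc.value < range.$lt)) return false;
  return doc.status === wanted;
}

console.log(docs.filter(d => matches(d, buggy)).map(d => d.value).join(", ")); // 10, 50, 90, 190
console.log(docs.filter(d => matches(d, fixed)).map(d => d.value).join(", ")); // 50, 90, 190
```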

Getting array of object with limit and offset doesn't work using mongodb

First, let me say that I am new to MongoDB. I am trying to get data from a collection.
Here is the document in my collection student:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"id" : "l_7c0e37b9-132e-4054-adbf-649dbc29f43d",
"name" : "Raj",
"class" : "10th",
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
The output I need is:
{
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
To get this response I used the following query:
db.getCollection('student').find({},{"assignments":1})
Now I am trying to apply a limit and offset to the assignments list. I tried $slice: [0, 3], but it returns the whole document with the sliced array rather than the assignments alone. How can I combine these two to get only the assignments with a limit and offset?
You'll need to aggregate rather than find because aggregate allows you to project+slice.
Given the document from your question, the following command ...
db.getCollection('student').aggregate([
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
... returns:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
This returns the assignments array (plus _id, which $project includes unless you add _id: 0), starting at index 2 and containing at most 5 elements; since the array has only 6 elements, you get the elements at indexes 2 to 5. You can change the two numbers (2 and 5 in the above example) to apply your own offset and limit: the first is the offset (the number of elements to skip) and the second is the limit (the maximum number of elements to return).
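To make the offset/limit semantics of the aggregation form {$slice: [array, position, n]} concrete, here is a small plain-JavaScript model of it for non-negative positions, plus a hypothetical assignmentsPage helper that builds the pipeline for one page. Both function names are illustrative assumptions, not MongoDB APIs; the model ignores negative positions, which MongoDB counts from the end of the array.

```javascript
// Models {$slice: [array, position, n]} for position >= 0:
// skip `position` elements, then return at most `n` elements.
function sliceLikeMongo(array, position, n) {
  return array.slice(position, position + n);
}

// Hypothetical helper: pipeline for one page of assignments.
// offset = elements to skip, limit = maximum elements to return.
// In the shell, pass studentId as ObjectId("...") to target one document.
function assignmentsPage(offset, limit, studentId) {
  const stages = [];
  if (studentId !== undefined) {
    stages.push({ $match: { _id: studentId } });
  }
  // _id: 0 suppresses _id so only the sliced assignments array is returned.
  stages.push({ $project: { _id: 0, assignments: { $slice: ["$assignments", offset, limit] } } });
  return stages;
}

// The six assignments from the question (ids omitted for brevity).
const assignments = ["1", "2", "3", "4", "5", "6"].map(name => ({ name }));
console.log(sliceLikeMongo(assignments, 2, 5).map(a => a.name).join(", ")); // 3, 4, 5, 6
console.log(sliceLikeMongo(assignments, 0, 3).map(a => a.name).join(", ")); // 1, 2, 3
// Usage in the shell: db.getCollection('student').aggregate(assignmentsPage(0, 3))
```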
If you want to add a match condition (to address specific documents) to the above then you'd do something like this:
db.getCollection('student').aggregate([
// match a specific document
{$match: {"_id": ObjectId("5979e0473f00003717a9bd62")}},
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
More details are in the MongoDB documentation for the $match stage.

MongoDB Aggregation for Time Series data

I have the following document structure (events collection) for time series data. I based this structure on the article at https://docs.mongodb.com/ecosystem/use-cases/pre-aggregated-reports-mmapv1/
{
"_id" : "06062017/cpu",
"metadata" : {
"host" : "localhost",
"createdDate" : ISODate("2017-06-09T11:18:56.120Z"),
"metric" : "cpu"
},
"hourly" : {
"0" : {
"total" : 2,
"used" : 1
},
"1" : {
"total" : 3,
"used" : 2
}
},
"minute" : {
"0" : {
"0" : {
"total" : 9789789,
"used" : 353
},
"1" : {
"total" : 0,
"used" : 0
}
},
"1" : {
"0" : {
"total" : 0,
"used" : 0
},
"1" : {
"total" : 234234,
"used" : 123
}
}
}
}
Now I am trying to compute the average CPU used from the per-minute values and store it in hourly. The aggregation and map functions work well on lists and arrays, but I could not get the average per hour from the minutes, and I would also like to get the average per day.
I am using Java as the programming language. Is it better to calculate this within the application or with the aggregation framework?
Any help is greatly appreciated.
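No answer was posted for this question. A minimal sketch of both options it asks about, assuming MongoDB 3.4.4 or later for $objectToArray and $arrayToObject: because the minutes are stored as object keys rather than arrays, the pipeline converts them to arrays first and then averages them with $avg, which averages the elements when given a single array expression in $project. The JavaScript functions show the application-side alternative; all names (hourlyAvgPipeline, averageUsedPerHour, averageUsedPerDay, hourlyAvgUsed) are hypothetical, and the Java driver translation is not shown.

```javascript
// Aggregation sketch (assumes MongoDB 3.4.4+).
// For each hour key under "minute", average the "used" values of its minutes
// and return them as an object keyed by hour.
const hourlyAvgPipeline = [
  {
    $project: {
      hourlyAvgUsed: {
        $arrayToObject: {
          $map: {
            input: { $objectToArray: "$minute" },
            as: "h",
            in: {
              k: "$$h.k",
              // $avg over one array expression averages its elements.
              v: { $avg: { $map: { input: { $objectToArray: "$$h.v" }, as: "m", in: "$$m.v.used" } } }
            }
          }
        }
      }
    }
  }
];
// Usage in the mongo shell: db.events.aggregate(hourlyAvgPipeline)

// Application-side alternative: the same averages in plain JavaScript.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function averageUsedPerHour(doc) {
  const result = {};
  for (const [hour, minutes] of Object.entries(doc.minute)) {
    result[hour] = mean(Object.values(minutes).map(m => m.used));
  }
  return result;
}

// Daily average over every minute sample (equal to the mean of the hourly
// averages only when every hour has the same number of minutes).
function averageUsedPerDay(doc) {
  const all = [];
  for (const minutes of Object.values(doc.minute)) {
    for (const m of Object.values(minutes)) all.push(m.used);
  }
  return mean(all);
}

// The "minute" part of the sample document above.
const sample = {
  minute: {
    "0": { "0": { total: 9789789, used: 353 }, "1": { total: 0, used: 0 } },
    "1": { "0": { total: 0, used: 0 }, "1": { total: 234234, used: 123 } }
  }
};
console.log(averageUsedPerHour(sample), averageUsedPerDay(sample));
```

Doing this in the database avoids shipping every minute sample to the application; doing it in Java keeps the logic in one place. Either way, a pre-aggregated design usually updates the hourly fields at write time instead.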

how to use aggregation of mongodb

From the data given below, I want to sum all Value fields.
Please let me know how I can do it using the MongoDB aggregation framework.
{"MetricRecord":
{ "SchemaVersion" : "0.12",
"Product": {
"ProductName" : "abc",
"ProductVersion": "7.5.0.1" ,
"ProductId" : "1234567890ABDFGH12345",
"InstanceId" : "12345BA32",
"InstanceName" : "1234SS123",
"SystemId" : "somehost.com"
},
"Tenant" : {
"CustomerId" : "222-555-124",
"ServiceCode": "xyzxyzxyz12345yyy"
},
"Metrics" : [
{
"ReportType" :[
{ "report" : "billing" },
],
"LogTime" : "2013-12-08T12:34:56:01Z" ,
"Type" : "AuthorizedUsers",
"SubType" : "registered",
"Value" : "125",
"UnitOfMeasure": "USD",
"Period" : {
"StartTime" : "2013-12-07T00:00:00:01Z",
"EndTime" : "2013-12-08T00:00:00:01Z"
}
},
{
"ReportType" :[
{ "report" : "billing" }
],
"LogTime" : "2013-12-08T12:34:56:01Z" ,
"Type" : "NumberOfTickets",
"SubType" : "resolved",
"Value" : "430",
"UnitOfMeasure": "USD",
"Period" : {
"StartTime" : "2013-12-07T00:00:00:01Z",
"EndTime" : "2013-12-08T00:00:00:01Z"
}
}
]
}
}
So the result I expect from summing the values is 125 + 430, i.e. 555.
Your document stores MetricRecord.Metrics[index].Value as a string, and $sum ignores non-numeric values, so summing these fields as they are would return 0. If that is a typo and your documents actually contain numeric values in MetricRecord.Metrics[index].Value, you can try the following query:
db.metrics.aggregate([
{$unwind:"$MetricRecord.Metrics"},
{$group:{_id:"$_id",sum:{$sum:"$MetricRecord.Metrics.Value"}}}
])
If the Value fields in the document above are numbers, i.e.
MetricRecord.Metrics[0].Value is 125(not "125")
MetricRecord.Metrics[1].Value is 430(not "430")
you will get the following output
{
"result" : [
{
"_id" : ObjectId("xxxxxxxxxxxxxxxxxxxxxxxx"),
"sum" : 555
}
],
"ok" : 1
}
The sample query above assumes the default MongoDB "_id" field and a collection named metrics. Adjust the query to fit your requirements.
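If the Value fields really are stored as strings, a sketch that converts them before summing, assuming MongoDB 4.0 or later for $toDouble: $toDouble raises an error on strings that are not numeric, so where that can happen $convert with an onError value is the more forgiving variant. sumPipeline and sumValues are hypothetical names; sumValues is the application-side equivalent for a single document.

```javascript
// Sketch assuming MongoDB 4.0+ ($toDouble). $sum ignores non-numeric values,
// so each string Value is converted to a number before it is summed.
const sumPipeline = [
  { $unwind: "$MetricRecord.Metrics" },
  {
    $group: {
      _id: "$_id",
      sum: { $sum: { $toDouble: "$MetricRecord.Metrics.Value" } }
    }
  }
];
// Usage in the mongo shell: db.metrics.aggregate(sumPipeline)

// Application-side equivalent for one document. Number() converts "125" to 125;
// without it, adding strings would concatenate them instead of summing.
function sumValues(doc) {
  return doc.MetricRecord.Metrics.reduce((total, m) => total + Number(m.Value), 0);
}

// The relevant fields of the document from the question.
const metricDoc = {
  MetricRecord: {
    Metrics: [
      { Type: "AuthorizedUsers", Value: "125" },
      { Type: "NumberOfTickets", Value: "430" }
    ]
  }
};
console.log(sumValues(metricDoc)); // 555
```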