MongoDB query for finding schemes with more than a year? - mongodb

Given the following schema, I want to write a MongoDB query to find all schemes with a duration of more than a year, i.e. (endDate - startDate) > 1 year.
I am not sure whether such an expression can be specified in a MongoDB query.
{
"update" : ISODate("2017-09-26T15:22:13.172Z"),
"create" : ISODate("2017-09-26T15:22:13.172Z"),
"scheme" : {
"since" : ISODate("2017-09-26T15:22:13.172Z"),
"startDate": ISODate("2017-09-26T15:22:13.172Z"),
"endDate": ISODate("2018-09-26T15:22:13.172Z"),
},
}

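You can use aggregation with $subtract:
db.yourcollection.aggregate([
  // Compute the scheme duration; subtracting two dates yields milliseconds.
  { $project : { dateDifference: { $subtract: [ "$scheme.endDate", "$scheme.startDate" ] } } },
  // Keep only the documents whose duration exceeds one year.
  { $match : { dateDifference : { $gt : 31536000000 } } }
]);
(* 31536000000 = milliseconds in a 365-day year)
The $project stage above passes on only _id and dateDifference, so add any other fields you need in the matching documents to that $project stage.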
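As an alternative to the two-stage pipeline, here is a minimal sketch that puts the same comparison into a plain find() filter using $expr, assuming a server version that supports it (MongoDB 3.6 or later). The names MS_PER_YEAR and longerThanAYear are introduced here for illustration only.

```javascript
// Milliseconds in a 365-day year (leap years are ignored, as in the answer above).
const MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000;

// Filter document: keep schemes where endDate - startDate exceeds one year.
// $expr allows aggregation expressions inside a regular query (assumes MongoDB 3.6+).
const longerThanAYear = {
  $expr: {
    $gt: [
      { $subtract: ["$scheme.endDate", "$scheme.startDate"] },
      MS_PER_YEAR
    ]
  }
};

// In the mongo shell this would be used as:
//   db.yourcollection.find(longerThanAYear)
console.log(JSON.stringify(longerThanAYear));
```

Note that the sample document in the question spans exactly 365 days (2017-09-26 to 2018-09-26), so a strict $gt does not match it; use $gte if schemes of exactly one year should be included.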

Related

MongoDB: calculate 90th percentile among all documents

I need to calculate the 90th percentile of the duration where the duration for each document is defined as finish_time - start_time.
My plan was to:
Create $project to calculate that duration in seconds for each document.
Calculate the index (in the sorted list of documents) that corresponds to the 90th percentile: 90th_percentile_index = 0.9 * amount_of_documents.
Sort the documents by the duration field that was created.
Use the 90th_percentile_index to $limit the documents.
Choose the first document out of the limited subset of document.
I'm new to MongoDB so I guess the query can be improved. So, the query looks like:
db.getCollection('scans').aggregate([
{
$project: {
duration: {
$divide: [{$subtract: ["$finish_time", "$start_time"]}, 1000] // duration is in seconds
},
Percentile90Index: {
$multiply: [0.9, "$total_number_of_documents"] // I don't know how to get the total number of documents..
}
}
},
{
$sort : { duration: 1 }
},
{
$limit: "$Percentile90Index"
},
{
$group: {
_id: "_id",
percentiles90 : { $max: "$duration" } // selecting the max, i.e, first document after the limit , should give the result.
}
}
])
The problem I have is that I don't know how to get the total_number_of_documents and therefore I can't calculate the index.
Example:
let's say I have only 3 documents:
{
"_id" : ObjectId("1"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:01:00.000Z"),
}
{
"_id" : ObjectId("2"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:03:00.000Z"),
}
{
"_id" : ObjectId("3"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:08:00.000Z"),
}
So I would expect the result to be something like:
{
percentiles50 : 3 // in minutes, since percentiles50=3 is the minimum value that satisfies the requirement that at least 50% of the documents have duration <= percentiles50
}
I used the 50th percentile in the example because I only gave 3 documents, but it really doesn't matter; please just show me a query for the i-th percentile and it will be fine :-)
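One possible approach, sketched here under assumptions rather than offered as a definitive answer: sort by duration, collect all durations into one array with $group, and pick the nearest-rank element with $arrayElemAt, so the total count comes from $size instead of a separate query. The function names percentilePipeline and nearestRank are hypothetical, the sketch relies on $push keeping the order produced by the preceding $sort, and the single grouped document must stay below the 16 MB document limit.

```javascript
// Build a nearest-rank percentile pipeline; p is a fraction such as 0.9.
function percentilePipeline(p) {
  return [
    // Duration in seconds for each document.
    { $project: { duration: { $divide: [{ $subtract: ["$finish_time", "$start_time"] }, 1000] } } },
    { $sort: { duration: 1 } },
    // Collect every duration into one array (assumes the result fits in one document).
    { $group: { _id: null, durations: { $push: "$duration" } } },
    // Nearest-rank index: ceil(p * N) - 1.
    { $project: {
        _id: 0,
        percentile: {
          $arrayElemAt: [
            "$durations",
            { $subtract: [{ $ceil: { $multiply: [p, { $size: "$durations" }] } }, 1] }
          ]
        }
    } }
  ];
}

// Plain-JavaScript mirror of the last stage, for checking the index arithmetic.
function nearestRank(sortedValues, p) {
  return sortedValues[Math.ceil(p * sortedValues.length) - 1];
}

// Durations of the three example documents, in seconds.
console.log(nearestRank([60, 180, 480], 0.5)); // 180 seconds, i.e. 3 minutes

// In the mongo shell: db.getCollection('scans').aggregate(percentilePipeline(0.9))
```

For the three example documents this gives 180 seconds for the 50th percentile, which matches the expected 3 minutes.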

Search inside a range of fields MongoDB

I have a DB on MongoDB, with a collection like this:
{
"_id" : ObjectId("59a64b4cfb80146432aff6ac"),
"name": "Mid Range",
"end" : NumberLong("50000"),
"start" : NumberLong("10000"),
},
{
"_id" : ObjectId("59a64b4cfb80146432aff6ac"),
"name": "Hi Range",
"end" : NumberLong("150000"),
"start" : NumberLong("100000"),
}
The user enters a number to validate, e.g. 125000, and I need a query that returns the "Hi Range" document.
How can I do that? I'm trying to avoid doing this in application code, but if there is no other choice, that's OK.
Thanks!
You could even make do without the $and. You just need to set the query to find a result such that your number is less than or equal to its end and greater than or equal to its start. For example:
db.collection.find({start: {$lte: 125000}, end: {$gte: 125000}})
Note: be careful with the boundaries. If the range should include its start and end values, use $lte and $gte; $lt and $gt exclude them.
You can do it like this -
db.collection.find({name: /Hi Range/}) // regular expression
To avoid hard-coding the logic that determines that 125000 falls in "Hi Range", you could do something like this:
db.collection.find( { $and: [ { start: { $lte: 125000 } }, { end: { $gt: 125000 } } ] } )
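To make the boundary behaviour of these range filters concrete, here is a small sketch that builds the inclusive filter for any input number and mirrors it in plain JavaScript so the logic can be checked without a server. The helper names rangeFilter and inRange are hypothetical, and the sample data assumes "Hi Range" runs from 100000 to 150000.

```javascript
// Build an inclusive range filter for a user-supplied number.
function rangeFilter(n) {
  return { start: { $lte: n }, end: { $gte: n } };
}

// Plain-JavaScript equivalent of the filter above.
function inRange(doc, n) {
  return doc.start <= n && n <= doc.end;
}

// Hypothetical sample data; "Hi Range" is assumed to span 100000..150000.
const ranges = [
  { name: "Mid Range", start: 10000, end: 50000 },
  { name: "Hi Range", start: 100000, end: 150000 }
];

const matches = ranges.filter((doc) => inRange(doc, 125000)).map((doc) => doc.name);
console.log(matches);

// In the mongo shell: db.collection.find(rangeFilter(125000))
```

With inclusive bounds, a value equal to a range's start or end still matches that range; swapping in $lt or $gt would exclude it.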

Jaspersoft mongo aggregation query use parameter in match expression to get one or all results

I have designed JasperReports report with MongoDB data source. See my mongodb pipeline query below:
{
runCommand:{
aggregate:"my_collection",
pipeline:[
{$match :
{$and : [
{ tenant_id: 1},
{ $or: [ { $P{Location}: -1 }, { location : $P{Location}} ] },
]}},
{
$project : {
product_attribute_value : 1,
inventory_on_hand : 1 ,
unit_cost : 1
}
},
{
$group : {
_id : "$product_attribute_value",
itemsCount: { $sum : 1 },
inventoryValue: { $sum : { $multiply : ["$inventory_on_hand", "$unit_cost"] } }
}
}
]
}
}
I have a Location parameter with a corresponding input control, a dropdown, and I want the grouped data to follow the dropdown selection. The dropdown has ALL as its default value. When the user selects ALL, the parameter value is -1 and I want to get records for all locations. But $P{Location} is not a valid Mongo field, so the $or clause does not work.
I know of another way: add a $project stage before $match that defines a literal with the value -1, and use that literal in the $match stage. But with $project before $match, the pipeline fetches all the data first and only then applies the match, which will degrade performance. I don't want to do this; I want to apply the filters first and then pass the filtered documents to the later pipeline stages.
Please suggest an alternative.

MongoDB Aggregate Time Series

I'm using MongoDB to store time series data using a similar structure to "The Document-Oriented Design" explained here: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
The objective is to query for the top 10 busiest minutes of the day on the whole system. Each document stores 1 hour of data using 60 sub-documents (1 for each minute). Each minute stores various metrics embedded in the "vals" field. The metric I care about is "orders". A sample document looks like this:
{
"_id" : ObjectId("54d023802b1815b6ef7162a4"),
"user" : "testUser",
"hour" : ISODate("2015-01-09T13:00:00Z"),
"vals" : {
"0" : {
"orders" : 11,
"anotherMetric": 15
},
"1" : {
"orders" : 12,
"anotherMetric": 20
},
.
.
.
}
}
Note there are many users in the system.
I've managed to flatten the structure (somewhat) by doing an aggregate with the following group object:
group = {
$group: {
_id: {
hour: "$hour"
},
0: {$sum: "$vals.0.orders"},
1: {$sum: "$vals.1.orders"},
2: {$sum: "$vals.2.orders"},
.
.
.
}
}
But that just gives me 24 documents (1 for each hour) with the # of orders for each minute during that hour, like so:
{
"_id" : {
"hour" : ISODate("2015-01-20T14:00:00Z")
},
"0" : 282086,
"1" : 239358,
"2" : 289188,
.
.
.
}
Now I need to somehow get the top 10 minutes of the day from this. I suspect it can be done with $project, but I'm not sure how.
You could aggregate as follows:
$match the documents for the specific date.
Construct the $group and $project objects before querying.
$group by the $hour, accumulating all the documents per hour per minute in an array. Keep the minute somewhere within the document.
$project a variable docs as the $setUnion of all the documents per hour.
$unwind the documents.
$sort by orders.
$limit to the top 10 documents, which is what we require.
Code:
var inputDate = new ISODate("2015-01-09T13:00:00Z");
var group = {};
var set = [];
// Build one $push accumulator per minute (0..59) and remember each field name.
for (var i = 0; i < 60; i++) {
  group[i] = {$push: {"doc": "$vals." + i,
                      "hour": "$hour",
                      "min": {$literal: i}}};
  set.push("$" + i);
}
group["_id"] = {$hour: "$hour"};
// Merge the 60 per-minute arrays into a single array per hour.
var project = {"docs": {$setUnion: set}};
db.t.aggregate([
  // Matches only this exact hour; widen the range to cover a whole day.
  {$match: {"hour": {$lte: inputDate, $gte: inputDate}}},
  {$group: group},
  {$project: project},
  {$unwind: "$docs"},
  {$sort: {"docs.doc.orders": -1}},
  {$limit: 10},
  {$project: {"_id": 0,
              "hour": "$_id",
              "doc": "$docs.doc",
              "min": "$docs.min"}}
])
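A different technique from the answer above, sketched under assumptions: instead of generating 60 accumulators, $objectToArray can turn the "vals" sub-document into an array of key/value pairs, which is then unwound and summed per hour and minute across all users. This requires a server version that supports $objectToArray (3.4.4 or later, to my knowledge). The function names and the dayStart/dayEnd parameters are hypothetical; the in-memory function mirrors the same logic so it can be checked without a server.

```javascript
// Pipeline: top n busiest minutes between dayStart (inclusive) and dayEnd (exclusive).
function topMinutesPipeline(dayStart, dayEnd, n) {
  return [
    { $match: { hour: { $gte: dayStart, $lt: dayEnd } } },
    // Turn { "0": {...}, "1": {...} } into [{ k: "0", v: {...} }, ...].
    { $project: { hour: 1, minutes: { $objectToArray: "$vals" } } },
    { $unwind: "$minutes" },
    // Sum orders over all users for each (hour, minute) pair.
    { $group: {
        _id: { hour: "$hour", minute: "$minutes.k" },
        orders: { $sum: "$minutes.v.orders" }
    } },
    { $sort: { orders: -1 } },
    { $limit: n }
  ];
}

// Plain-JavaScript mirror of the pipeline's grouping, sorting and limiting.
function topMinutesInMemory(docs, n) {
  const totals = new Map();
  for (const doc of docs) {
    for (const [minute, metrics] of Object.entries(doc.vals)) {
      const key = doc.hour.toISOString() + "|" + minute;
      totals.set(key, (totals.get(key) || 0) + metrics.orders);
    }
  }
  return [...totals.entries()]
    .map(([key, orders]) => {
      const [hour, minute] = key.split("|");
      return { hour, minute, orders };
    })
    .sort((a, b) => b.orders - a.orders)
    .slice(0, n);
}

// In the mongo shell, for 9 January 2015:
//   db.t.aggregate(topMinutesPipeline(ISODate("2015-01-09T00:00:00Z"), ISODate("2015-01-10T00:00:00Z"), 10))
```

Grouping on the full "hour" date rather than on $hour keeps minutes from different days apart if the $match range is ever widened beyond one day.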

Mongodb Aggregation Framework and timestamp

I have a collection
{ "_id" : 1325376000, "value" : 13393}
{ "_id" : 1325462400, "value" : 13393}
The _id values are Unix timestamps (in seconds), stored manually as Numbers at insert time.
Now I'm looking for a way to calculate the sum of the values for each month with the Aggregation Framework.
Here is a way you can do it by generating the aggregation pipeline programmatically:
var numberOfMonths = 24; /* number of months you want to go back from today */
var now = new Date();
var year = now.getFullYear();
var mo = now.getMonth();
var months = [];
for (var i = 0; i < numberOfMonths; i++) {
  var m1 = mo - i + 1, m2 = m1 - 1;
  var d = new Date(year, m1, 1);   /* first day of the following month (local time) */
  var d2 = new Date(year, m2, 1);  /* first day of the month itself */
  var from = d2.getTime() / 1000;  /* boundaries in seconds, to match the _id values */
  var to = d.getTime() / 1000;
  months.push({from: from, to: to, month: d2});
}
var prev = "$nothing";
var cond = {};
/* nest one $cond per month; documents outside every range fall through to "$nothing" */
months.forEach(function(m) {
  cond = {$cond: [{$and: [{$gte: ["$_id", m.from]}, {$lt: ["$_id", m.to]}]}, m.month, prev]};
  prev = cond;
});
/* now you can use the "cond" variable in your pipeline to generate the month */
db.collection.aggregate([
  { $project: { month: cond, value: 1 } },
  { $group: { _id: "$month", sum: { $sum: "$value" } } }
]);
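A shorter alternative to generating nested $cond expressions, offered as a sketch rather than as the answer's method: convert the numeric _id to a date inside the pipeline by adding milliseconds to the epoch, then group by $year and $month. This groups by UTC months, whereas the code above uses local-time month boundaries. The names monthlySumPipeline, monthKey and sumByMonth are hypothetical; the plain-JavaScript functions mirror the grouping so it can be checked without a server.

```javascript
// Pipeline: seconds since epoch -> Date -> group by UTC year and month.
const monthlySumPipeline = [
  { $project: {
      value: 1,
      // Adding a number of milliseconds to a date yields a date.
      date: { $add: [new Date(0), { $multiply: ["$_id", 1000] }] }
  } },
  { $group: {
      _id: { year: { $year: "$date" }, month: { $month: "$date" } },
      sum: { $sum: "$value" }
  } },
  { $sort: { "_id.year": 1, "_id.month": 1 } }
];

// Plain-JavaScript mirror: "YYYY-MM" key in UTC for a Unix timestamp in seconds.
function monthKey(seconds) {
  const date = new Date(seconds * 1000);
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return date.getUTCFullYear() + "-" + month;
}

// Sum the values per UTC month.
function sumByMonth(docs) {
  const sums = {};
  for (const doc of docs) {
    const key = monthKey(doc._id);
    sums[key] = (sums[key] || 0) + doc.value;
  }
  return sums;
}

// The two sample documents from the question.
const sample = [
  { _id: 1325376000, value: 13393 },
  { _id: 1325462400, value: 13393 }
];
console.log(sumByMonth(sample));

// In the mongo shell: db.collection.aggregate(monthlySumPipeline)
```

Both sample timestamps fall in January 2012 (UTC), so they end up in a single group with a sum of 26786.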