Given the following schema, I want to write a MongoDB query to find all schemes with a duration of more than one year, i.e. (endDate - startDate) > 1 year.
I am not sure whether such an expression can be specified in a MongoDB query.
{
"update" : ISODate("2017-09-26T15:22:13.172Z"),
"create" : ISODate("2017-09-26T15:22:13.172Z"),
"scheme" : {
"since" : ISODate("2017-09-26T15:22:13.172Z"),
"startDate": ISODate("2017-09-26T15:22:13.172Z"),
"endDate": ISODate("2018-09-26T15:22:13.172Z"),
},
}
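You can use the aggregation framework with $subtract:
db.yourcollection.aggregate([
  // Difference between the two dates, in milliseconds
  { $project : { dateDifference: { $subtract: [ "$scheme.endDate", "$scheme.startDate" ] } } },
  // Keep only the documents whose duration exceeds one year
  { $match : { dateDifference : { $gt : 31536000000 } } }
]);
(* 31536000000 = milliseconds in a 365-day year)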
List any other fields you need in the $project stage as well, since $project drops every field it does not mention (apart from _id).
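On MongoDB 3.6 or later, the same comparison can probably also be written as a plain find filter with $expr, which returns the whole document without a $project. This is only a minimal sketch using the field names from the question; the helper name longerThanFilter and the constant MS_PER_YEAR are made up for illustration.

```javascript
// Milliseconds in a 365-day year (leap years are ignored here).
const MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000;

// Hypothetical helper: builds a filter matching documents whose
// scheme.endDate is more than minMs milliseconds after scheme.startDate.
function longerThanFilter(minMs) {
  return {
    $expr: {
      $gt: [{ $subtract: ["$scheme.endDate", "$scheme.startDate"] }, minMs]
    }
  };
}

// In the mongo shell: db.yourcollection.find(longerThanFilter(MS_PER_YEAR))
console.log(JSON.stringify(longerThanFilter(MS_PER_YEAR)));
```

Note that the sample document spans exactly 365 days, so a strict $gt would not match it; $gte would be the choice if a full year should count.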
I need to calculate the 90th percentile of the duration where the duration for each document is defined as finish_time - start_time.
My plan was to:
Create $project to calculate that duration in seconds for each document.
Calculate the index (in the sorted list of documents) that corresponds to the 90th percentile: 90th_percentile_index = 0.9 * amount_of_documents.
Sort the documents by the $duration variable that was created.
Use the 90th_percentile_index to $limit the documents.
Choose the last document (the one with the largest duration) out of the limited subset of documents.
I'm new to MongoDB so I guess the query can be improved. So, the query looks like:
db.getCollection('scans').aggregate([
{
$project: {
duration: {
$divide: [{$subtract: ["$finish_time", "$start_time"]}, 1000] // duration is in seconds
},
Percentile90Index: {
$multiply: [0.9, "$total_number_of_documents"] // I don't know how to get the total number of documents..
}
}
},
{
$sort : {"$duration": 1},
},
{
$limit: "$Percentile90Index"
},
{
$group: {
_id: "_id",
percentiles90 : { $max: "$duration" } // selecting the max, i.e, first document after the limit , should give the result.
}
}
])
The problem I have is that I don't know how to get the total_number_of_documents and therefore I can't calculate the index.
Example:
let's say I have only 3 documents:
{
"_id" : ObjectId("1"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:01:00.000Z"),
}
{
"_id" : ObjectId("2"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:03:00.000Z"),
}
{
"_id" : ObjectId("3"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:08:00.000Z"),
}
So I would expect the result to be something like:
{
percentiles50 : 3 // in minutes, since percentiles50=3 is the minimum value that satisfies the requirement that at least 50% of the documents have duration <= percentiles50
}
I used the 50th percentile in the example because I only gave 3 documents, but it really doesn't matter; please just show me a query for the i-th percentile and it will be fine :-)
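One way to sidestep the missing document count might be to collect the sorted durations into a single array and pick the element at the nearest-rank index inside the pipeline. The sketch below is untested against a live server and rests on several assumptions: MongoDB 3.2 or later (for $ceil and $arrayElemAt), that $push keeps the order produced by $sort, that all durations fit in one 16 MB document, and that p is a fraction in (0, 1]. The function names are made up for illustration.

```javascript
// Nearest-rank percentile on an array of numbers; p is a fraction in (0, 1].
function percentileNearestRank(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical pipeline builder doing the same computation server-side.
function buildPercentilePipeline(p) {
  return [
    // Duration in seconds ($subtract on two dates yields milliseconds).
    { $project: { duration: { $divide: [{ $subtract: ["$finish_time", "$start_time"] }, 1000] } } },
    { $sort: { duration: 1 } },
    // Gather every duration into one array so its size is known.
    { $group: { _id: null, durations: { $push: "$duration" } } },
    // Zero-based index = ceil(p * count) - 1.
    { $project: {
        _id: 0,
        percentile: {
          $arrayElemAt: [
            "$durations",
            { $subtract: [{ $ceil: { $multiply: [p, { $size: "$durations" }] } }, 1] }
          ]
        }
    } }
  ];
}

// The three sample durations from the question, in minutes.
console.log(percentileNearestRank([1, 3, 8], 0.5)); // 3
// In the mongo shell: db.getCollection('scans').aggregate(buildPercentilePipeline(0.9))
```

The plain JavaScript function is there only to make the nearest-rank rule easy to check by hand; the pipeline returns seconds rather than minutes because it divides by 1000.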
I have a DB on MongoDB, with a collection like this:
{
"_id" : ObjectId("59a64b4cfb80146432aff6ac"),
"name": "Mid Range",
"end" : NumberLong("50000"),
"start" : NumberLong("10000"),
},
{
"_id" : ObjectId("59a64b4cfb80146432aff6ac"),
"name": "Hi Range",
"end" : NumberLong("150000"),
"start" : NumberLong("100000"),
}
The user enters a number to validate, e.g. 125000, and I need a query that returns the "Hi Range" document.
How can I do that? I'd like to avoid doing this in application code, but if there is no choice, that's OK.
Thanks!
You could even make do without the $and. You just need to set the query to find a result such that your number is less than or equal to its end and greater than or equal to its start. For example:
db.collection.find({start: {$lte: 125000}, end: {$gte: 125000}})
Note: if you want the range to include the start and end values themselves, use $lte and $gte. $lt and $gt exclude them.
You can do it like this:
db.collection.find({name: /Hi Range/}) // regular expression match on the name
To avoid application logic that determines that 125000 falls in "Hi Range", you could do something like this:
db.collection.find( { $and: [ { start: { $lte: 125000 } }, { end: { $gt: 125000 } } ] } )
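The range check from these answers can be sketched as a small filter builder, with a plain JavaScript equivalent to check the logic without a server. The helper names are hypothetical, and the sample bounds assume that "Hi Range" runs from 100000 to 150000 (start below end).

```javascript
// Hypothetical helper: filter for ranges that contain value (bounds inclusive).
function rangeFilter(value) {
  return { start: { $lte: value }, end: { $gte: value } };
}

// Plain JavaScript equivalent of the filter, for checking the logic locally.
function matchesRange(doc, value) {
  return doc.start <= value && value <= doc.end;
}

// Sample ranges; the "Hi Range" bounds are an assumption (100000..150000).
const ranges = [
  { name: "Mid Range", start: 10000, end: 50000 },
  { name: "Hi Range", start: 100000, end: 150000 }
];

const hits = ranges.filter((r) => matchesRange(r, 125000)).map((r) => r.name);
console.log(hits); // [ 'Hi Range' ]
// In the mongo shell: db.collection.find(rangeFilter(125000))
```

Listing both conditions in one filter document is an implicit AND, which is why the explicit $and is optional here.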
I have designed a JasperReports report with a MongoDB data source. My MongoDB pipeline query is below:
{
runCommand:{
aggregate:"my_collection",
pipeline:[
{$match :
{$and : [
{ tenant_id: 1},
{ $or: [ { $P{Location}: -1 }, { location : $P{Location}} ] },
]}},
{
$project : {
product_attribute_value : 1,
inventory_on_hand : 1 ,
unit_cost : 1
}
},
{
$group : {
_id : "$product_attribute_value",
itemsCount: { $sum : 1 },
inventoryValue: { $sum : { $multiply : ["$inventory_on_hand", "$unit_cost"] } }
}
}
]
}
}
I have a location parameter with a corresponding input control, a dropdown. I want to group data based on the value selected in the location dropdown. The dropdown has ALL as its default value. When the user selects ALL, the parameter value is -1 and I want to get records for all locations. But $P{Location} is not a valid Mongo field, so the query does not work.
I know another way: add a $project stage before $match that defines a literal with value -1, and use that literal in the $match stage. But with $project before $match, all data is fetched first and only then filtered, which will degrade performance. I don't want that; I want to apply the filters first and then pass the filtered documents to the later pipeline stages.
Please suggest an alternative.
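One possible direction, if the $match stage can be assembled before the query is handed to the report (for example in application code, which is an assumption about the setup), is to leave the location condition out entirely when ALL is selected. The sketch below only illustrates that logic; the helper name is hypothetical, and how JasperReports would substitute such a pre-built fragment is not covered here.

```javascript
// Hypothetical helper: builds the $match stage, adding the location
// condition only when a specific location (anything other than -1) is chosen.
function buildMatchStage(tenantId, location) {
  const match = { tenant_id: tenantId };
  if (location !== -1) {
    match.location = location;
  }
  return { $match: match };
}

console.log(JSON.stringify(buildMatchStage(1, -1))); // {"$match":{"tenant_id":1}}
console.log(JSON.stringify(buildMatchStage(1, 7)));  // {"$match":{"tenant_id":1,"location":7}}
```

Because the filter stays in the first stage, the server can still apply it before any $project or $group work.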
I'm using MongoDB to store time series data using a similar structure to "The Document-Oriented Design" explained here: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
The objective is to query for the top 10 busiest minutes of the day on the whole system. Each document stores 1 hour of data using 60 sub-documents (1 for each minute). Each minute stores various metrics embedded in the "vals" field. The metric I care about is "orders". A sample document looks like this:
{
"_id" : ObjectId("54d023802b1815b6ef7162a4"),
"user" : "testUser",
"hour" : ISODate("2015-01-09T13:00:00Z"),
"vals" : {
"0" : {
"orders" : 11,
"anotherMetric": 15
},
"1" : {
"orders" : 12,
"anotherMetric": 20
},
.
.
.
}
}
Note there are many users in the system.
I've managed to flatten the structure (somewhat) by doing an aggregate with the following group object:
group = {
$group: {
_id: {
hour: "$hour"
},
0: {$sum: "$vals.0.orders"},
1: {$sum: "$vals.1.orders"},
2: {$sum: "$vals.2.orders"},
.
.
.
}
}
But that just gives me 24 documents (1 for each hour) with the # of orders for each minute during that hour, like so:
{
"_id" : {
"hour" : ISODate("2015-01-20T14:00:00Z")
},
"0" : 282086,
"1" : 239358,
"2" : 289188,
.
.
.
}
Now I need to get the top 10 minutes of the day from this. I suspect it can be done with $project, but I'm not sure how.
You could aggregate as follows:
$match the documents for the specific date.
Construct the $group and $project objects before querying.
$group by the hour, accumulating all the documents per hour per minute in an array. Keep the minute somewhere within each document.
$project a variable docs as the $setUnion of all the documents per hour.
$unwind the documents.
$sort by orders, descending.
$limit to the top 10 documents, which is what we require.
Code:
// Match the whole day: from midnight (inclusive) to the next midnight (exclusive)
var dayStart = new ISODate("2015-01-09T00:00:00Z");
var dayEnd = new ISODate("2015-01-10T00:00:00Z");
var group = {};
var set = [];
for(var i=0;i<60;i++){ // minutes 0..59
group[i] = {$push:{"doc":"$vals."+i,
"hour":"$hour",
"min":{$literal:i}}};
set.push("$"+i);
}
group["_id"] = {$hour:"$hour"};
var project = {"docs":{$setUnion:set}}
db.t.aggregate([
{$match:{"hour":{$gte:dayStart,$lt:dayEnd}}},
{$group:group},
{$project:project},
{$unwind:"$docs"},
{$sort:{"docs.doc.orders":-1}},
{$limit:10},
{$project:{"_id":0,
"hour":"$_id",
"doc":"$docs.doc",
"min":"$docs.min"}}
])
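On MongoDB 3.4.4 or later, a shorter variant might use $objectToArray to turn the vals sub-document into an array, which avoids generating 60 accumulator fields and sums orders across all users per minute. This is a sketch under those assumptions, not a tested replacement; the builder name and its parameters are hypothetical.

```javascript
// Hypothetical builder: top-n busiest minutes between dayStart (inclusive)
// and dayEnd (exclusive), with orders summed over all users.
function buildTopMinutesPipeline(dayStart, dayEnd, n) {
  return [
    { $match: { hour: { $gte: dayStart, $lt: dayEnd } } },
    // Turn { "0": {...}, "1": {...} } into [{ k: "0", v: {...} }, ...]
    { $project: { hour: 1, minutes: { $objectToArray: "$vals" } } },
    { $unwind: "$minutes" },
    // One group per (hour, minute) pair across every user document.
    { $group: {
        _id: { hour: "$hour", minute: "$minutes.k" },
        orders: { $sum: "$minutes.v.orders" }
    } },
    { $sort: { orders: -1 } },
    { $limit: n }
  ];
}

const pipeline = buildTopMinutesPipeline(
  new Date("2015-01-09T00:00:00Z"),
  new Date("2015-01-10T00:00:00Z"),
  10
);
console.log(pipeline.length); // 6
// In the mongo shell: db.t.aggregate(pipeline)
```

The minute comes back as a string key ("0" to "59") because it is taken from the field name.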
I have a collection
{ "_id" : 1325376000, "value" : 13393}
{ "_id" : 1325462400, "value" : 13393}
The _id values are Unix timestamps (in seconds), stored manually as Numbers at insert time.
I'm looking for a way to calculate the sum of the values for each month with the Aggregation Framework.
Here is a way you can do it by generating the aggregation pipeline programmatically:
numberOfMonths=24; /* number of months you want to go back from today */
now=new Date();
year=now.getFullYear();
mo=now.getMonth();
months=[];
for (i=0;i<numberOfMonths;i++) {
m1=mo-i+1; m2=m1-1;
d = new Date(year,m1,1);
d2=new Date(year,m2,1);
from= d2.getTime()/1000;
to= d.getTime()/1000;
dt={from:from, to:to, month:d2}; months.push(dt);
}
prev="$nothing";
cond={};
months.forEach(function(m) {
cond={$cond: [{$and :[ {$gte:["$_id",m.from]}, {$lt:["$_id",m.to]} ]}, m.month, prev]};
prev=cond;
} );
/* now you can use "cond" variable in your pipeline to generate month */
db.collection.aggregate([ { $project: { month: cond , value:1 } },
{ $group: {_id:"$month", sum:{$sum:"$value"} } }
])
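An alternative that avoids the generated $cond chain might be to convert the numeric _id to a date inside the pipeline and group by year and month. The sketch assumes _id holds Unix seconds as in the question and that grouping by UTC calendar month is acceptable; the builder name is hypothetical.

```javascript
// Hypothetical builder: converts a numeric _id (Unix seconds) to a Date,
// then groups by calendar year and month (UTC) and sums "value".
function buildMonthlySumPipeline() {
  return [
    { $project: {
        value: 1,
        // Adding milliseconds to the epoch date yields a Date.
        date: { $add: [new Date(0), { $multiply: ["$_id", 1000] }] }
    } },
    { $group: {
        _id: { year: { $year: "$date" }, month: { $month: "$date" } },
        sum: { $sum: "$value" }
    } },
    { $sort: { "_id.year": 1, "_id.month": 1 } }
  ];
}

// Sanity check of the conversion in plain JavaScript, using the first sample _id.
const sample = new Date(1325376000 * 1000);
console.log(sample.toISOString()); // 2012-01-01T00:00:00.000Z
// In the mongo shell: db.collection.aggregate(buildMonthlySumPipeline())
```

Unlike the $cond approach, this does not need to know in advance how many months to cover.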