Find rows between two dates that are an interval n apart - mongodb

Say I have an entry for every day of the year (or possibly every hour, every minute, ...). I'd like to query all rows that fall between two dates and return only one entry per interval n (e.g. one entry per week, or one entry every second day).
For a more specific example, my database has entries like this:
{ _id: ..., date: ISODate("2014-07-01T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-07-02T12:00:00Z"), values: ... }
...
{ _id: ..., date: ISODate("2015-03-17T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2015-03-18T12:00:00Z"), values: ... }
I want every result between 2014-12-05 and 2015-02-05 but only one every 3 days. The result set should look like this:
{ _id: ..., date: ISODate("2014-12-05T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-08T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-11T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-14T12:00:00Z"), values: ... }
...
Can this be done somehow?

Using the aggregation framework (and an awfully complicated query), you can achieve your goal. Something along the lines of the following:
db.coll.aggregate([
  {$match: {
    date: {
      $gte: ISODate("2014-12-08T12:00:00.000Z"),
      $lt: ISODate("2014-12-12T00:00:00.000Z")
    }
  }},
  {$project: {
    date: 1,
    value: 1,
    grp: {$let: {
      vars: {delta: {$subtract: ["$date", ISODate("2014-12-08T12:00:00.000Z")]}},
      in: {$subtract: ["$$delta", {$mod: ["$$delta", 3*24*3600*1000]}]}
    }}
  }},
  {$sort: {date: 1}},
  {$group: {_id: "$grp", date: {$first: "$date"}, value: {$first: "$value"}}}
])
The $match step keeps only the rows in the desired range;
the $project step keeps date and value, and computes a "group number" based on the date. delta is the time difference in ms between the document's date and some arbitrary, application-dependent origin (here, the start of the range). As MongoDB does not have an integer division operator, I use a substitute: delta - mod(delta, 3*24*3600*1000). This value changes every 3 days (3 days × 24 hours × 3600 sec × 1000 ms);
the $sort step may not be required, depending on your use case. I use it to ensure a deterministic result when keeping the first date and value of each group in the next step;
finally (!), $group groups documents by the grp value calculated before, keeping only the first date and value of each group.
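To see how the "group number" arithmetic behaves, here is a minimal Python sketch of the same idea, run on in-memory dictionaries rather than through MongoDB. The names bucket and first_per_bucket and the sample documents are made up for illustration, and the sketch assumes every date is at or after the origin.

```python
from datetime import datetime, timedelta, timezone

INTERVAL_MS = 3 * 24 * 3600 * 1000  # 3 days in milliseconds


def bucket(date, origin, interval_ms=INTERVAL_MS):
    """Return delta - (delta mod interval), which is constant within an interval."""
    delta_ms = int((date - origin).total_seconds() * 1000)
    return delta_ms - delta_ms % interval_ms


def first_per_bucket(docs, origin, interval_ms=INTERVAL_MS):
    """Keep the earliest document of each bucket, like $sort then $group/$first."""
    firsts = {}
    for doc in sorted(docs, key=lambda d: d["date"]):
        # setdefault only stores the first document seen for a bucket
        firsts.setdefault(bucket(doc["date"], origin, interval_ms), doc)
    return [firsts[key] for key in sorted(firsts)]


# Hypothetical sample data: one document per day for ten days
origin = datetime(2014, 12, 5, 12, tzinfo=timezone.utc)
docs = [{"date": origin + timedelta(days=i), "value": i} for i in range(10)]
picked = first_per_bucket(docs, origin)
print([d["date"].day for d in picked])  # [5, 8, 11, 14]
```

Using the start of the requested range as the origin makes the first bucket begin exactly at the first requested date, which is why the kept days line up with the result set asked for in the question.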

You can query for ranges using the following syntax:
db.collection.find( { field: { $gt: value1, $lt: value2 } } );
In your case, field would be the date field and this question may help you format the values:
return query based on date
Edit: I did not see the requirement for retrieving every nth document. I'm not sure MongoDB has built-in support for that, so you may have to manipulate the returned array yourself: once you have the documents in the range, you can filter them by index. Here's some boilerplate (Array.prototype.filter would also work, since its callback receives the element's index as its second argument, but a loop with a step of 3 avoids visiting the skipped elements):
// Keep every 3rd element of inputArray, starting with the first one
function everyThird(inputArray) {
  var result = [];
  for (var i = 0; i < inputArray.length; i += 3) {
    result.push(inputArray[i]);
  }
  return result;
}
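The same index-based selection can be sketched in Python with a slice step. The function name every_nth is made up for illustration. Note the assumption behind this approach: taking every nth document only equals "one entry every n days" if the range holds exactly one document per day, sorted by date, with no gaps.

```python
def every_nth(docs, n):
    """Return every n-th item of docs, starting with the first one."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return docs[::n]


# Hypothetical stand-in for a sorted query result: day numbers 1..10
days = list(range(1, 11))
print(every_nth(days, 3))  # [1, 4, 7, 10]
```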

Related

Optimal match all records within date range, large mongo data set

MongoDB 5
9 million records in a collection
sparse index on a createdAt field.
Aggregation match to find all records created before a certain date
Issue is, the index is hit, but the IXSCAN takes about 60 seconds.
What can be done to speed this up?
const result = await collection.aggregate([{ $match: { createdAt: { $lte: /* SOME DATE */ } } }]);

Converting StringDate to a queryable representation. Group and project sum of today,yesterday, week, month

I am trying to figure out how to solve the following problem.
It consists of two parts. First, the dates in my collection are stored as plain strings in MySQL format (YYYY-mm-dd HH:mm:ss). Second, I need to project summaries for today, yesterday, the last 7 days, and the month.
I have been experimenting, and this is what I came up with.
Pipe 1.
$match - nothing fancy there just a simple field = value.
Pipe 2.
$addFields - trying to parse the string date as an ISO date, I believe? I am not sure.
{
  expired: {
    $dateFromString: {
      dateString: '$expired',
      timezone: 'America/New_York'
    }
  }
}
Pipe 3.
$match - Commented out. I wanted to select only a specific range (no more than 30 days back), but it doesn't work
expired: {
$gt: ISODate(new Date(new Date(ISODate().getTime() - 1000*60*60*24*30)))
}
Pipe 4.
$group - Here I group and sum everything per day. So an output is
_id: 2021-09-27, theVal : 100
{
  _id: {
    $dateToString: {
      date: { $toDate: "$expired" },
      format: "%Y-%m-%d"
    }
  },
  // $values is an array ([0].quantity, [1].quantity, [2].quantity); I am only interested in the first element.
  theVal: { $sum: { $first: "$values.quantity" } }
}
Pipe 5.
$project - dropping the _id field by renaming it to date, and keeping theVal.
{
  "date": "$_id",
  "theVal": 1,
  "_id": 0
}
theVal is a sum of integers within a day.
Questions
Between Pipe 1 and Pipe 2 (the current Pipe 3), shouldn't I be able to match only dates within the last 30 days, to reduce the processing?
How to get a desired output like this:
{
today : 100,
yesterday : 10,
7days : 220,
month: 1000,
}
Really appreciate any help here.
I tried to replicate what you intend to do, as you didn't provide sample test data.
You may want to do the following in an aggregation pipeline:
1. $match: filter on the ids you want, same as your Pipe 1
2. $dateFromString: use "format": "%Y-%m-%d %H:%M:%S", "timezone": "America/New_York"
3. $match: keep only the records that are within the last 30 days, using $expr and $$NOW
4. $group: group by date without time, achieved by converting to a date string with the date part only
5. $addFields: project flags that indicate whether the record falls within today, yesterday, 7days, month
6. $group: as you didn't specify what today, yesterday, 7days, and month mean, I assumed that they are the cumulative sums over those ranges. A simple conditional $sum will do the summation with the help of the flags from step 5
Here is a Mongo playground for your reference.
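The flag-and-conditional-sum logic of the steps above can be sketched in plain Python on in-memory rows. This is only an illustration under stated assumptions: the function names, the flat quantity field (standing in for values[0].quantity), and the sample rows are made up, the strings are treated as local wall-clock times, and today is supplied by the caller in that same timezone.

```python
from datetime import date, datetime


def parse_expired(text):
    """Parse a MySQL-style 'YYYY-mm-dd HH:MM:SS' string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def summarize(rows, today):
    """Sum quantities into cumulative today / yesterday / 7days / month totals."""
    totals = {"today": 0, "yesterday": 0, "7days": 0, "month": 0}
    for row in rows:
        age = (today - parse_expired(row["expired"]).date()).days
        if not 0 <= age < 30:
            continue  # outside the 30-day window (the second $match)
        qty = row["quantity"]
        totals["month"] += qty  # every row in the window counts for the month
        if age < 7:
            totals["7days"] += qty
        if age == 1:
            totals["yesterday"] += qty
        if age == 0:
            totals["today"] += qty
    return totals


# Hypothetical sample rows; "quantity" stands in for values[0].quantity
rows = [
    {"expired": "2021-09-27 08:15:00", "quantity": 100},
    {"expired": "2021-09-26 23:59:59", "quantity": 10},
    {"expired": "2021-09-22 12:00:00", "quantity": 110},
    {"expired": "2021-09-01 00:00:00", "quantity": 780},
    {"expired": "2021-08-01 00:00:00", "quantity": 5},
]
print(summarize(rows, today=date(2021, 9, 27)))
# {'today': 100, 'yesterday': 10, '7days': 220, 'month': 1000}
```

Here "month" is interpreted as the last 30 days and each total includes the shorter ranges, which matches the cumulative assumption stated in the answer.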

Aggregating last n days in MongoDB

I'm trying to build a query in MongoDB (to be run with pymongo) to get a sum by group for the last 30 days. I'm really struggling to combine the aggregate function with date differences. The SQL equivalent of the query would be:
SELECT item, sum(volume) from table
where date >= DATEADD(DAY, -30, now())
group by item
Can anyone help?
This is quite simple: first $match the documents by their date field, then use $group to calculate the sum.
First, the setup the code needs:
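import pymongo
from datetime import datetime, timedelta, timezone

connection = pymongo.MongoClient("connection url")
collection = connection["db_name"]["collection_name"]
Main Aggregation:
# PyMongo cannot encode datetime.date values, so use a datetime
thirty_days_ago = datetime.now(timezone.utc) - timedelta(days=30)
results = list(collection.aggregate([
    {
        "$match": {
            # $gte mirrors the SQL ">=" comparison
            "date": {"$gte": thirty_days_ago}
        }
    },
    {
        "$group": {
            # "$item" refers to the field; a plain "item" would put every document in one group
            "_id": "$item",
            "sum": {"$sum": "$volume"}
        }
    }
]))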
I had to guess some of the fields names as you did not provide a schema but you should be able to easily adapt it to your needs.
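For checking the expected result without a database, the SQL query (date >= now - 30 days, grouped by item) can be sketched in plain Python. The function name and sample documents are made up for illustration, and the field names item, volume, and date are the same guesses as above.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def sum_volume_by_item(docs, now, days=30):
    """Sum `volume` per `item` for docs dated within the last `days` days."""
    cutoff = now - timedelta(days=days)
    totals = defaultdict(int)
    for doc in docs:
        if doc["date"] >= cutoff:  # the date filter (WHERE / $match)
            totals[doc["item"]] += doc["volume"]  # the per-item sum (GROUP BY / $group)
    return dict(totals)


# Hypothetical sample documents
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
docs = [
    {"item": "a", "volume": 5, "date": now - timedelta(days=1)},
    {"item": "a", "volume": 7, "date": now - timedelta(days=29)},
    {"item": "b", "volume": 2, "date": now - timedelta(days=30)},
    {"item": "b", "volume": 9, "date": now - timedelta(days=31)},
]
print(sum_volume_by_item(docs, now))  # {'a': 12, 'b': 2}
```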

How can I get results from a MongoDB collection using a date range on the "utc_timestamp" field?

My question consists of two parts:
First, I want to collect the data from MongoDB using a date range (from date and to date).
Secondly, I need day-wise data from my collection within the date range.
Did you try Comparison Query Operators?
$eq Matches values that are equal to a specified value.
$gt Matches values that are greater than a specified value.
$gte Matches values that are greater than or equal to a specified value.
$in Matches any of the values specified in an array.
$lt Matches values that are less than a specified value.
$lte Matches values that are less than or equal to a specified value.
$ne Matches all values that are not equal to a specified value.
$nin Matches none of the values specified in an array.
Example 1:
db.collectionName.find({
  utc_timestamp: {
    $gte: new Date("2010-04-29T00:00:00.000Z"),
    $lt: new Date("2010-05-01T00:00:00.000Z")
  }
})
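Example 2, for the case where utc_timestamp is stored as a string (this assumes MongoDB 3.6 or later, where $expr and $dateFromString are available; the bounds are the same as in Example 1):
db.collectionName.find({
  $expr: {
    $and: [
      { $gte: [ { $dateFromString: { dateString: "$utc_timestamp" } }, new Date("2010-04-29T00:00:00.000Z") ] },
      { $lt: [ { $dateFromString: { dateString: "$utc_timestamp" } }, new Date("2010-05-01T00:00:00.000Z") ] }
    ]
  }
})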
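For the second part of the question (day-wise data within the range), one possible approach is to bucket the matching documents by calendar day. The Python sketch below does this on in-memory dictionaries. The names docs_by_day and utc and the sample documents are made up for illustration, and it assumes utc_timestamp holds timezone-aware UTC datetimes.

```python
from collections import defaultdict
from datetime import datetime, timezone


def docs_by_day(docs, start, end):
    """Group docs with start <= utc_timestamp < end by their UTC calendar day."""
    days = defaultdict(list)
    for doc in docs:
        ts = doc["utc_timestamp"]
        if start <= ts < end:  # half-open range, like $gte / $lt
            days[ts.date().isoformat()].append(doc)
    return dict(sorted(days.items()))


def utc(*args):
    """Shorthand for building a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical sample documents around the range boundaries
docs = [
    {"utc_timestamp": utc(2010, 4, 28, 23, 59)},
    {"utc_timestamp": utc(2010, 4, 29, 0, 0)},
    {"utc_timestamp": utc(2010, 4, 29, 18, 30)},
    {"utc_timestamp": utc(2010, 4, 30, 9, 0)},
    {"utc_timestamp": utc(2010, 5, 1, 0, 0)},
]
grouped = docs_by_day(docs, utc(2010, 4, 29), utc(2010, 5, 1))
print({day: len(items) for day, items in grouped.items()})
# {'2010-04-29': 2, '2010-04-30': 1}
```

The half-open range (inclusive start, exclusive end) means a document stamped exactly at midnight on the end date is left out, so consecutive ranges never overlap.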

elasticsearch filter dates based on datetime fields

Assume I have the following nested document structure, where my document contains nested routes, each with an array of date-time values.
{
  property_1: ...,
  routes: [
    {
      start_id: 1,
      end_id: 2,
      execution_times: ['2016-08-28T11:11:47+02:00', ...]
    }
  ]
}
Now I could filter my documents that match certain execution_times with something like this.
query: {
  filtered: {
    query: {
      match_all: { }
    },
    filter: {
      nested: {
        path: 'routes',
        filter: {
          bool: {
            must: [
              {
                terms: {
                  'routes.execution_times': ['2016-08-28T11:11:47+02:00', ...]
                }
              },
              ...
            ]
          }
        }
      }
    }
  }
}
But what if I would like to filter my documents based on execution dates? What's the best way of achieving this?
Should I use a range filter to map my dates to time ranges?
Or is it better to use a script query and do a conversion of the execution_times to dates there?
Or is the best way to change the document structure to contain both, the execution_date and execution_time?
Update
"The dates are not a range but individual dates like [today, day after tomorrow, 4 days from now, 10 days from now]"
Well, this is still a range, as a day means 24 hours. So if you store your field as a date-time, you can leverage a range query: from 20-Nov-2010 00:00:00 to 20-Nov-2010 23:59:59, with the appropriate time zone, for a specific day.
If you store it as a String, you will lose all the flexibility of date maths and will only be able to do exact String matches. You will then have to do all the date manipulation on the client side to find exact matches and ranges.
I suggest playing with range queries using the Sense plugin; I am sure it will satisfy almost all your requirements.
-----------------------
You should make sure that you use an appropriate date-time mapping for your field and use a range filter over that field. You don't need to split it into 2 separate fields. Date maths will allow you to query based on the date alone.
This will make your life much easier if you want to do aggregations over the date-time field.
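As a rough illustration of the "each day is a range" idea, the Python sketch below builds one range clause per wanted day as plain dictionaries. The function name day_range_clauses and the sample dates are made up, the field name and the +02:00 offset are taken from the question's example, and the time_zone option of the range query should be checked against the Elasticsearch version in use.

```python
from datetime import date, timedelta


def day_range_clauses(days, field="routes.execution_times", time_zone="+02:00"):
    """Build one half-open [day, next day) range clause per calendar day."""
    clauses = []
    for day in days:
        clauses.append({
            "range": {
                field: {
                    "gte": day.isoformat(),
                    "lt": (day + timedelta(days=1)).isoformat(),
                    "time_zone": time_zone,
                }
            }
        })
    return clauses


# Hypothetical list of individual dates: today, in 2 days, in 4 days
today = date(2016, 8, 28)
wanted = [today, today + timedelta(days=2), today + timedelta(days=4)]
# A document matches if any one of the day ranges matches (bool/should)
day_filter = {"bool": {"should": day_range_clauses(wanted)}}
print(day_filter["bool"]["should"][1]["range"]["routes.execution_times"])
# {'gte': '2016-08-30', 'lt': '2016-08-31', 'time_zone': '+02:00'}
```

Using an exclusive upper bound at the start of the next day, instead of 23:59:59, avoids missing timestamps in the last second of the day. The resulting bool clause would take the place of the terms clause inside the nested filter shown in the question.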
Reference:
Date Maths: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math
Date Mapping: https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html
Date Range Queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html