Optimally match all records within a date range in a large MongoDB data set

MongoDB 5, 9 million records in a collection, with a sparse index on the createdAt field.
An aggregation $match is used to find all records created before a certain date.
The issue is that the index is hit, but the IXSCAN takes about 60 seconds.
What can be done to speed this up?
const result = await collection.aggregate([{ $match: { createdAt: { $lte: /* SOME DATE */ } } }]).toArray();
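One way to confirm what the planner is doing and how many index keys are actually examined is to explain the same pipeline. A minimal sketch, assuming the same collection handle and an arbitrary placeholder cutoff date:

const cutoff = new Date("2023-01-01T00:00:00Z"); // placeholder date
const plan = await collection
  .aggregate([{ $match: { createdAt: { $lte: cutoff } } }])
  .explain("executionStats");
// Compare totalKeysExamined / totalDocsExamined with nReturned in the output.
console.log(JSON.stringify(plan, null, 2));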

Related

Getting updated records from the last 2 hours

I have a large amount of data in my MongoDB and I want to query those records which were updated in the last 2 hours. Can someone help?
Pretty simple, actually. First, add an updatedAt attribute to your collection, but I assume you already have this.
So, in short:
db.collection.find({ "updatedAt" : { $lt: new Date(Date.now() - 2 * 60 * 60 * 1000) } })
If you did not have an updatedAt attribute, then this one is also possible.
db.collection.find({ $where: function () {
  return Date.now() - this._id.getTimestamp() < (2 * 60 * 60 * 1000)
} })
Explanation:
The first query finds all documents whose updatedAt attribute is within the last 7200 seconds (2 hours).
The second query finds all documents whose ObjectId was generated within the last 7200 seconds.
Remember that an ObjectId embeds its creation timestamp, which can be retrieved with getTimestamp() (see the sketch below).
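A minimal mongosh sketch of both ideas; ObjectId.createFromTime is available in mongosh and the Node.js bson library, and the two-hour window is just the example from the question:

// An ObjectId embeds its creation time in its first 4 bytes:
const id = ObjectId()
id.getTimestamp() // ISODate of when the id was generated

// Index-friendly alternative to the $where version: build an ObjectId from a
// Unix timestamp (in seconds) and range-query _id directly.
const twoHoursAgo = ObjectId.createFromTime(Math.floor(Date.now() / 1000) - 2 * 60 * 60)
db.collection.find({ _id: { $gte: twoHoursAgo } })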

Aggregating last n days in MongoDB

I'm trying to build a query in MongoDB (to be run with pymongo) to get a sum by groups for the last 30 days. I'm really struggling to combine the aggregation with date arithmetic. The SQL equivalent of the query would be:
SELECT item, sum(volume) from table
where date >= DATEADD(DAY, -30, now())
group by item
Can anyone help?
This is quite simple: first $match the relevant documents by their date field, then use $group to calculate the sum.
First what we need for the code:
import pymongo
from datetime import datetime, timedelta

connection = pymongo.MongoClient("connection url")
collection = connection["db_name"]["collection_name"]
Main Aggregation:
# Use datetime rather than date: BSON cannot encode datetime.date objects.
thirty_days_ago = datetime.utcnow() - timedelta(days=30)
results = list(collection.aggregate([
    {
        "$match": {
            "date": {"$gt": thirty_days_ago}
        }
    },
    {
        "$group": {
            "_id": "$item",
            "sum": {"$sum": "$volume"}
        }
    }
]))
I had to guess some of the field names as you did not provide a schema, but you should be able to easily adapt it to your needs.

How to get results from a MongoDB collection using a date range on the "utc_timestamp" field?

My question consists of two parts:
I want to collect the data from MongoDB using a date range (from date and to date).
Secondly, I need day-wise data from my collection within the date range.
Did you try Comparison Query Operators?
$eq Matches values that are equal to a specified value.
$gt Matches values that are greater than a specified value.
$gte Matches values that are greater than or equal to a specified value.
$in Matches any of the values specified in an array.
$lt Matches values that are less than a specified value.
$lte Matches values that are less than or equal to a specified value.
$ne Matches all values that are not equal to a specified value.
$nin Matches none of the values specified in an array.
Example 1:
db.collectionName.find({
  utc_timestamp: {
    $gte: new Date("2010-04-29T00:00:00.000Z"),
    $lt: new Date("2010-05-01T00:00:00.000Z")
  }
})
Example 2 (if utc_timestamp is stored as a string rather than a BSON Date, convert it with $dateFromString inside $expr):
db.collectionName.find({
  $expr: {
    $and: [
      { $gte: [{ $dateFromString: { dateString: "$utc_timestamp" } }, new Date("2010-04-29T00:00:00.000Z")] },
      { $lt: [{ $dateFromString: { dateString: "$utc_timestamp" } }, new Date("2010-05-01T00:00:00.000Z")] }
    ]
  }
})
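The examples above cover the first part (the range query). For the second part of the question (day-wise data within the range), one possible sketch groups on the calendar day; it assumes utc_timestamp is stored as a BSON Date and simply counts documents per day:

db.collectionName.aggregate([
  { $match: {
      utc_timestamp: {
        $gte: new Date("2010-04-29T00:00:00.000Z"),
        $lt: new Date("2010-05-01T00:00:00.000Z")
      }
  } },
  { $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$utc_timestamp" } },
      count: { $sum: 1 }
  } },
  { $sort: { _id: 1 } }
])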

Best way to get documents in specific timeframe MongoDB

I am creating a script that would run after every x minutes and needs to gather data from MongoDB by timestamps.
For example, how would I match the documents with aggregation that have a timestamp in the following timeframe:
start_time = current_time - 60 min
end_time = start_time + 30 min
And I would need to get all the documents that fall within that time frame.
The MongoDB objects have ISODate timestamps on them.
Thanks!
You can create date objects in mongo shell like so:
db.getCollection('order').aggregate([
  {
    $match: {
      timestamp: {
        $gte: new Date(ISODate().getTime() - 1000 * 60 * 60), // start_time = now - 60 min
        $lte: new Date(ISODate().getTime() - 1000 * 60 * 30)  // end_time = start_time + 30 min
      }
    }
  }
  ...
])
You can use this in aggregate but also in normal find queries.
Note I wrote this without testing, so it might have syntax errors..
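For example, a find version of the same filter might look like this (a sketch assuming the field is called timestamp, as above):

db.getCollection('order').find({
  timestamp: {
    $gte: new Date(ISODate().getTime() - 1000 * 60 * 60),
    $lte: new Date(ISODate().getTime() - 1000 * 60 * 30)
  }
})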
db.collection.find({"createdAt": { $gt: new Date('2017-04-25')},"updatedAt":{$lt:new Date('2017-06-25')}})
updatedAt and createdAt are the feilds I have taken at the time of designing the schema by timestamp. You could give feilds according to you design.
the find query would be little better than aggregate in this case as no complex functions have to be performed

Find rows between two dates that are an interval n apart

Say I have an entry for every day in the year (or possibly every hour, every minute, ...). What I'd like to do is query all rows that fall within the range of two dates and only return one entry for every interval n (e.g. one entry each week, or one entry every second day, ...)
For a more specific example, my database has entries like this:
{ _id: ..., date: ISODate("2014-07-T01:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-07-02T12:00:00Z"), values: ... }
...
{ _id: ..., date: ISODate("2015-03-17T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2015-03-18T12:00:00Z"), values: ... }
I want every result between 2014-12-05 and 2015-02-05 but only one every 3 days. The result set should look like this:
{ _id: ..., date: ISODate("2014-12-05T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-08T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-11T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-14T12:00:00Z"), values: ... }
...
Can this be done somehow?
Using the aggregation framework (and an awfully complicated query), you can achieve your goal. Something along the lines of the following:
db.coll.aggregate([
  { $match: {
      date: {
        $gte: ISODate("2014-12-08T12:00:00.000Z"),
        $lt: ISODate("2014-12-12T00:00:00.000Z")
      }
  }},
  { $project: {
      date: 1,
      value: 1,
      grp: { $let: {
        vars: { delta: { $subtract: ["$date", ISODate("2014-12-08T12:00:00.000Z")] } },
        in: { $subtract: ["$$delta", { $mod: ["$$delta", 3 * 24 * 3600 * 1000] }] }
      }}
  }},
  { $sort: { date: 1 } },
  { $group: { _id: "$grp", date: { $first: "$date" }, value: { $first: "$value" } } }
])
The $match step keeps only rows in the desired range.
The $project step keeps date and value, and computes a "group number" based on the date. delta is the time difference in ms between the given date and some arbitrary, application-dependent origin. As MongoDB does not have an integer division operator, I use a substitute: delta - mod(delta, 3*24*3600*1000). This value changes every 3 days (3 days × 24 hours × 3600 sec × 1000 ms); see the sketch after this list.
The $sort step is maybe not required, depending on your use case. I use it to ensure a deterministic result when keeping the first date and value of each group in the next step.
Finally (!), $group groups documents by the grp value calculated before, keeping only the first date and value of each group.
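A small plain-JavaScript sketch of that "integer division substitute", with the same origin as above and arbitrarily chosen sample dates:

const origin = new Date("2014-12-08T12:00:00Z").getTime();
const bucketMs = 3 * 24 * 3600 * 1000;                 // 3 days in ms

function bucket(dateStr) {
  const delta = new Date(dateStr).getTime() - origin;  // ms since the origin
  return delta - (delta % bucketMs);                   // floor delta to a 3-day boundary
}

bucket("2014-12-09T06:00:00Z"); // 0         -> same group as the origin
bucket("2014-12-11T18:00:00Z"); // 259200000 -> the next 3-day group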
You can query for ranges using the following syntax:
db.collection.find( { field: { $gt: value1, $lt: value2 } } );
In your case, field would be the date field and this question may help you format the values:
return query based on date
Edit: I did not see the requirement for retrieving every nth document. I'm not sure MongoDB has built-in support for that, so you may have to manipulate the returned array yourself: once you have the range, filter by index. Here's some boilerplate (a filter-based version, using the index that Array.prototype.filter passes to its callback, is sketched after the loop below):
var result = [];
for (var i = 0; i < inputArray.length; i += 3) {
  result.push(inputArray[i]);
}
return result;
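The same selection with Array.prototype.filter, which passes the element index as the second callback argument (a sketch, using the same hypothetical inputArray as above):

var result = inputArray.filter(function (doc, i) {
  return i % 3 === 0; // keep every 3rd element
});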