MongoDB 5
9 million records in a collection
sparse index on a createdAt field.
Aggregation match to find all records created before a certain date
Issue is, the index is hit, but the IXSCAN takes about 60 seconds.
What can be done to speed this up?
const result = await collection.aggregate([{ $match: { createdAt: { $lte: /* SOME DATE */ } } }]);
I have a collection with the fields vehicleId and Scraped Date. I am trying to get the average number of days a car is in inventory, and I want to calculate it for every historical day.
Sample documents:
{"_id":{"$oid":"5e1b46d853848fae2832e01a"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376788"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01b"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376771"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01c"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376734"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01d"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376706"}}
{"_id":{"$oid":"5e1b46d853848fae2832e01e"},"Scraped Date":{"$date":{"$numberLong":"1578845911324"}},"vehicleId":{"$numberInt":"1376505"}}
collection.aggregate([
    {'$group': {
        '_id': {'vehicleId': '$vehicleId'},
        'date': {'$addToSet': "$Scraped Date"}
    }}
])
This code gives me a list of the dates on which each vehicleId was found in the inventory. How can I convert this into a list of dates with the average length of time the cars had been in inventory on that day? I could take the average length of the dates column, but that won't give me the data day-wise.
The current output looks like this in a dataframe:
[Image: dataframe view]
I figured out a solution. I created a simple for loop over every date, used a $match stage to filter the results first, and then calculated the average length. The question is closed for now. I will update the code in the original question in a while.
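The loop described above is not shown in the post, so the following is only a rough in-memory sketch of one possible reading of it, not the poster's actual code. It assumes that "days in inventory" for a vehicle on a given day means the number of distinct days on which it was scraped up to and including that day, and that only vehicles scraped on that day count towards the day's average. The helper names toDay and avgDaysInInventoryPerDay are made up for illustration.

```javascript
// Hypothetical helper: reduce a Date to its UTC calendar day ("YYYY-MM-DD").
function toDay(date) {
  return date.toISOString().slice(0, 10);
}

// docs: array of { vehicleId, "Scraped Date": Date }.
// Assumption: days in inventory = distinct scrape days up to and including the day.
function avgDaysInInventoryPerDay(docs) {
  // Map each vehicle to the set of distinct days on which it was scraped.
  const daysByVehicle = new Map();
  for (const doc of docs) {
    if (!daysByVehicle.has(doc.vehicleId)) {
      daysByVehicle.set(doc.vehicleId, new Set());
    }
    daysByVehicle.get(doc.vehicleId).add(toDay(doc["Scraped Date"]));
  }

  const allDays = [...new Set(docs.map((d) => toDay(d["Scraped Date"])))].sort();
  const result = {};
  for (const day of allDays) {
    let total = 0;
    let vehicles = 0;
    for (const days of daysByVehicle.values()) {
      if (!days.has(day)) continue; // vehicle was not scraped on this day
      // ISO day strings compare chronologically as plain strings.
      total += [...days].filter((d) => d <= day).length;
      vehicles += 1;
    }
    result[day] = total / vehicles;
  }
  return result;
}
```

If "days in inventory" should instead mean the time since a vehicle was first seen, only the line that computes total would need to change.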
Is there any way I can sort a collection in descending order by the day of a createdAt field?
What I want is to sort content by the day it was created (taken from the createdAt field) and then by the rating field, so that I get the best-rated content for each day.
If I sort with { rating: -1, createdAt: -1 }, I don't get the desired output.
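One possible explanation is that sorting on rating first orders everything by rating regardless of day, while sorting on the full createdAt timestamp first almost never produces ties for rating to break. A minimal sketch of a workaround, assuming createdAt is stored as a date and that grouping by UTC day is acceptable, is to derive a day value first and sort on that before rating. The field name createdDay and the in-memory helper are illustrative only.

```javascript
// Aggregation pipeline sketch: derive a day string, then sort by day and rating.
// createdDay is a hypothetical field name introduced for this example.
const sortByDayThenRating = [
  {
    $addFields: {
      createdDay: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
    },
  },
  { $sort: { createdDay: -1, rating: -1 } },
];

// In-memory equivalent of the same ordering, useful for checking the logic.
function sortByDayThenRatingInMemory(items) {
  const dayOf = (item) => item.createdAt.toISOString().slice(0, 10);
  return [...items].sort((a, b) => {
    const dayA = dayOf(a);
    const dayB = dayOf(b);
    if (dayA !== dayB) return dayA < dayB ? 1 : -1; // newer day first
    return b.rating - a.rating; // higher rating first within a day
  });
}
```

The pipeline would be passed to collection.aggregate(sortByDayThenRating); the in-memory function only mirrors the intended ordering.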
Assume I have the following nested document structure, where my document contains nested routes, each with an array of date-time values.
{
  property_1: ...,
  routes: [
    {
      start_id: 1,
      end_id: 2,
      execution_times: ['2016-08-28T11:11:47+02:00', ...]
    }
  ]
}
Now I could filter my documents that match certain execution_times with something like this.
query: {
  filtered: {
    query: {
      match_all: { }
    },
    filter: {
      nested: {
        path: 'routes',
        filter: {
          bool: {
            must: [
              {
                terms: {
                  'routes.execution_times': ['2016-08-28T11:11:47+02:00', ...]
                }
              },
              ...
            ]
          }
        }
      }
    }
  }
}
But what if I would like to filter my documents based on execution dates? What's the best way of achieving this?
Should I use a range filter to map my dates to time ranges?
Or is it better to use a script query and convert the execution_times to dates there?
Or is the best way to change the document structure to contain both the execution_date and the execution_time?
Update
"The dates are not a range but individual dates like [today, day after tomorrow, 4 days from now, 10 days from now]"
Well, this is still a range, since a day means 24 hours. So if you store your field as a date-time, you can use a range query, from 20-Nov-2010 00:00:00 to 20-Nov-2010 23:59:59 with the appropriate time zone, to match a specific day.
If you store it as a String, you will lose all the flexibility of date math and will only be able to do exact String matches. You will then have to do all the date manipulation on the client side to find exact matches and ranges.
I suggest playing with range queries using the Sense plugin; I am sure they will satisfy almost all your requirements.
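As a rough sketch of what such a query could look like for several individual days, the helper below builds one range clause per day and combines them inside a nested query. It assumes routes.execution_times is mapped as a date, and it uses the nested/bool syntax of more recent Elasticsearch versions rather than the filtered query from the question. The function name executionDatesQuery and the time zone value are placeholders.

```javascript
// Builds a nested query that matches routes executed on any of the given days.
// days: array of "yyyy-MM-dd" strings; timeZone: offset or zone id (assumed input).
function executionDatesQuery(days, timeZone) {
  return {
    nested: {
      path: "routes",
      query: {
        bool: {
          // One range clause per day; a document matches if any clause matches.
          should: days.map((day) => ({
            range: {
              "routes.execution_times": {
                gte: `${day}||/d`, // date math: rounds down to the start of the day
                lte: `${day}||/d`, // date math: rounds up to the end of the day
                time_zone: timeZone,
              },
            },
          })),
          minimum_should_match: 1,
        },
      },
    },
  };
}
```

For example, executionDatesQuery(["2016-08-28", "2016-08-30"], "+02:00") yields a query body with two range clauses, one for each day.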
-----------------------
You should make sure that you use an appropriate date-time mapping for your field and use a range filter over that field. You don't need to split it into 2 separate fields. Date math will allow you to query based on the date alone.
This will make your life much easier if you want to do aggregations over the date-time field.
Reference:
Date Math: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math
Date Mapping: https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html
Date Range Queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
Say I have an entry for every day in the year (or possibly every hour, every minute, ...). What I'd like to do is query all rows that are in between the range of two dates and only return one entry for every interval n (e.g. one entry each week or one entry every second day, ...)
For a more specific example, my database has entries like this:
{ _id: ..., date: ISODate("2014-07-T01:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-07-02T12:00:00Z"), values: ... }
...
{ _id: ..., date: ISODate("2015-03-17T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2015-03-18T12:00:00Z"), values: ... }
I want every result between 2014-12-05 and 2015-02-05 but only one every 3 days. The result set should look like this:
{ _id: ..., date: ISODate("2014-12-05T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-08T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-11T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-14T12:00:00Z"), values: ... }
...
Can this be done somehow?
Using the aggregation framework (and an awfully complicated query), you can achieve your goal. Something along the lines of the following:
db.coll.aggregate([
  {$match: {
    date: {
      $gte: ISODate("2014-12-08T12:00:00.000Z"),
      $lt: ISODate("2014-12-12T00:00:00.000Z")
    }
  }},
  {$project: {
    date: 1,
    value: 1,
    grp: {$let: {
      vars: {delta: {$subtract: ["$date", ISODate("2014-12-08T12:00:00.000Z")]}},
      in: {$subtract: ["$$delta", {$mod: ["$$delta", 3*24*3600*1000]}]}
    }}
  }},
  {$sort: {date: 1}},
  {$group: {_id: "$grp", date: {$first: "$date"}, value: {$first: "$value"}}}
])
the $match step will keep only rows in the desired range;
the $project step will keep date and value, and will compute a "group number" based on the date. delta is the time difference in ms between the given date and some arbitrary, application-dependent origin. As MongoDB does not have the integer division operator, I use a substitute: delta-mod(delta, 3*24*3600*1000). This value changes every 3 days (3 days × 24 hours × 3600 sec × 1000 ms);
the $sort step may not be required, depending on your use case. I use it to ensure a deterministic result when keeping the first date and value of each group in the next step;
finally (!) $group will group documents by the grp value calculated before, keeping only the first date and value of each group.
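To make the bucketing arithmetic easier to follow, here is a small in-memory sketch of the same idea in plain JavaScript. It is not a replacement for the pipeline; it only mirrors the delta minus mod(delta, interval) trick and the "first document per group" step. The function names bucketKey and firstPerBucket are made up for illustration, and the origin is assumed to be the lower bound of the range so that delta is never negative.

```javascript
const THREE_DAYS_MS = 3 * 24 * 3600 * 1000;

// Same arithmetic as the $project stage: delta - (delta mod interval).
// All dates within the same interval after the origin share one key.
function bucketKey(date, origin, intervalMs) {
  const delta = date.getTime() - origin.getTime();
  return delta - (delta % intervalMs);
}

// Keeps the earliest document of each bucket, like $sort followed by $group/$first.
function firstPerBucket(docs, origin, intervalMs) {
  const sorted = [...docs].sort((a, b) => a.date - b.date);
  const firstByKey = new Map();
  for (const doc of sorted) {
    const key = bucketKey(doc.date, origin, intervalMs);
    if (!firstByKey.has(key)) firstByKey.set(key, doc);
  }
  // A Map iterates in insertion order, so the result stays sorted by date.
  return [...firstByKey.values()];
}
```

Unlike the Map used here, the output of $group is not guaranteed to be ordered, so a trailing $sort stage may be needed if the order of the results matters.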
You can query for ranges using the following syntax:
db.collection.find( { field: { $gt: value1, $lt: value2 } } );
In your case, field would be the date field and this question may help you format the values:
return query based on date
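As a small illustration of the range syntax with dates, the sketch below builds the filter document from native Date objects, which drivers such as the Node.js one store as BSON dates. The helper name dateRangeFilter is hypothetical, and treating the end of the range as exclusive is an assumption; use $lte if the end date should be included.

```javascript
// Hypothetical helper that builds a filter for a half-open date range [from, to).
function dateRangeFilter(field, from, to) {
  return { [field]: { $gte: from, $lt: to } };
}

// Range taken from the question: 2014-12-05 up to (but excluding) 2015-02-05.
const filter = dateRangeFilter(
  "date",
  new Date("2014-12-05T00:00:00Z"),
  new Date("2015-02-05T00:00:00Z")
);
// The filter can then be passed to find(), e.g. db.collection.find(filter).
```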
Edit: I did not see the requirement for retrieving every nth document. In that case, I'm not sure MongoDB has built-in support for that, and you may have to manipulate the returned array yourself: once you have the range, you can filter by index. Here's some boilerplate (I couldn't figure out an efficient use of Array.prototype.filter since that function removes the need for indices -- the opposite of what you want.):
function everyThird(inputArray) {
  var result = [];
  // Step through the array three elements at a time.
  for (var i = 0; i < inputArray.length; i += 3) {
    result.push(inputArray[i]);
  }
  return result;
}
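For what it's worth, the callback given to Array.prototype.filter receives the element's index as its second argument, so the same selection can also be written without an explicit loop. This is just an alternative sketch; the name everyNth is made up for illustration.

```javascript
// Keeps the elements at indices 0, n, 2n, ... of the input array.
function everyNth(items, n) {
  return items.filter((_, index) => index % n === 0);
}
```

Calling everyNth(docs, 3) on the array returned by the range query would keep every third document, starting with the first.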