I am building a kind of room reservation system in which a collection contains documents that each hold two dates: a begin date and an end date.
I would like to find all the reservations whose begin date falls between two dates and whose end date also falls between two dates.
I currently use a MongoDB compound index on the start date and end date fields.
However, I am wondering whether I can improve my query performance by using GEO2D indexes. For this we could convert the begin date and end date to Unix time, so that each booking is a point whose position is (start date, end date).
Using the $within operator, it becomes possible to query for reservations that fall within a range of start date AND end date.
Since geo indexes are mostly used for spatial data, would it make sense to use them for this specific use case?
Finally, since GEO2D indexes are implemented as B-trees in MongoDB and not as R-trees, what is the difference between traditional indexes and this geo one?
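For concreteness, here is a minimal sketch of the two query shapes being compared, written as plain Python dictionaries of the kind a MongoDB driver accepts as a filter. The field names "begin", "end" and "span" are assumptions made for illustration, not names from the actual collection. Note also that, as far as I know, a 2d index defaults to coordinate bounds of -180 to 180, so Unix timestamps would need custom min/max bounds when the index is created.

```python
from datetime import datetime, timezone


def to_unix(dt):
    """Seconds since the epoch for a timezone-aware datetime."""
    return int(dt.timestamp())


def range_filter(begin_lo, begin_hi, end_lo, end_hi):
    # Filter that a compound index on (begin, end) can serve.
    # "begin" and "end" are assumed field names.
    return {
        "begin": {"$gte": begin_lo, "$lte": begin_hi},
        "end": {"$gte": end_lo, "$lte": end_hi},
    }


def box_filter(begin_lo, begin_hi, end_lo, end_hi):
    # Same constraint expressed as a box query on a hypothetical "span"
    # field that stores the point [begin_unix, end_unix].
    # $box takes the bottom-left corner first, then the upper-right corner.
    return {
        "span": {
            "$within": {
                "$box": [
                    [to_unix(begin_lo), to_unix(end_lo)],
                    [to_unix(begin_hi), to_unix(end_hi)],
                ]
            }
        }
    }
```

Both functions express the same rectangle in (begin, end) space; only the index that would serve them differs.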
It is an interesting idea, but I don't think it will help your search speed or efficiency. Geo indexes in MongoDB are just B-trees applied to a geohash, where the geohash is a mechanism for converting something two-dimensional into something one-dimensional so that it can be stored in a B-tree. Geohashing is a powerful concept, but it has a peculiarity: points that are close together can end up in totally different buckets. This can make searching for the x nearest points to a given point quite inefficient, as 9 boxes have to be searched around your point of interest. A within query would have the same issue.
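To illustrate the bucket problem, here is a small sketch that uses plain bit interleaving (a Z-order or Morton code) as a simplified stand-in for a geohash. It is not MongoDB's actual implementation, only a demonstration of how a two-dimensional point collapses into one sortable number and how neighbours can land far apart in that ordering.

```python
def interleave(x, y, bits=16):
    """Interleave the bits of x and y into one integer (a Z-order code)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


# Three horizontally adjacent points on the same row:
for x in (6, 7, 8):
    print((x, 7), interleave(x, 7))
# (6, 7) 62
# (7, 7) 63
# (8, 7) 106
```

The points (6, 7) and (7, 7) get consecutive codes, but the equally close pair (7, 7) and (8, 7) is separated by a jump from 63 to 106, because x crosses a power-of-two boundary. A range scan over the one-dimensional codes therefore cannot cover a two-dimensional box with a single contiguous interval.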
I would have thought that sharding on a date column (possibly as Unix time) would be a more efficient way to improve performance, though there are some caveats around using a monotonically increasing value, such as a timestamp, as a shard key; see MongoDB shard keys.
Related
If I have a Cloudant MapReduce view with year/month/day as the key array, can I query the dataset just by month or just by day?
No. You can query by y/m/d or by y/m or by y.
In other words, you are allowed to omit fields, but you cannot have gaps, so you have to start omitting from the right.
Examples:
Querying by y/m/d -- key=[2022,5,20] finds everything for one day
Querying by y/m -- startkey=[2022,1]&endkey=[2022,2] finds everything in January
Querying by y -- startkey=[2021]&endkey=[2022] finds everything in 2021
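The reason gaps are not allowed can be sketched in a few lines of Python, whose list comparison orders integer arrays element by element with a shorter prefix sorting first. That matches how these array keys are ordered in the view as far as this example is concerned. The rows and the view_range helper are made up for illustration; the helper treats endkey as inclusive.

```python
def view_range(rows, startkey, endkey):
    """Return the rows whose key satisfies startkey <= key <= endkey."""
    return [row for row in rows if startkey <= row[0] <= endkey]


# Hypothetical view rows as (key, value) pairs, already in key order.
rows = [
    ([2021, 12, 31], "a"),
    ([2022, 1, 1], "b"),
    ([2022, 1, 31], "c"),
    ([2022, 2, 1], "d"),
    ([2022, 5, 20], "e"),
]

january = view_range(rows, [2022, 1], [2022, 2])
year_2021 = view_range(rows, [2021], [2022])
one_day = view_range(rows, [2022, 5, 20], [2022, 5, 20])
```

Here january contains rows "b" and "c", because [2022, 2, 1] sorts after the shorter key [2022, 2]. All rows for one month of one year are contiguous in the sorted view, whereas "every January of any year" is scattered across it, so no single startkey/endkey pair can select it.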
Are you able to use MongoDB to combine rows of data into one row?
I'm using dates with year, month, day and hour. The data is shown per hour. Is there a way to combine the hourly data into one entry per day? I would basically remove the hour column and sum the hourly data into per-day data.
I'm not sure what you mean by "the data is shown per hour" - do you mean it's stored in the database that way?
MongoDB doesn't have rows and columns - the equivalent of a row is a document, and the column equivalent is a field. Unlike in traditional SQL, a field isn't just one piece of information (a string, number/date, boolean, null, etc). It can be more than one piece of data - it can be an array, or a document, or an array of documents, etc.
Anyway, based on the small amount of information I have on your situation, I'd absolutely design the data with the bucket pattern. https://www.mongodb.com/blog/post/building-with-patterns-the-bucket-pattern
You could $unset the 'measurements' array and just keep the sum/count fields if that's what you want.
If your data is already set in stone, then I'd use an aggregation pipeline to group all the documents ('rows') together - the group _id would be year, month, day, and you could sum/count/min/max/etc the data in the group too.
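As a rough sketch of that last option, the pipeline below groups by year, month and day and sums a value field. The document shape (fields "year", "month", "day", "hour", "value") is an assumption, since the question does not show the real schema. The pipeline is an ordinary Python list of the kind a MongoDB driver's aggregate call accepts; the small helper computes the same result in plain Python so the logic can be checked without a database.

```python
from collections import defaultdict

# Assumed document shape:
# {"year": 2023, "month": 5, "day": 20, "hour": 13, "value": 4}
DAILY_PIPELINE = [
    {
        "$group": {
            "_id": {"year": "$year", "month": "$month", "day": "$day"},
            "total": {"$sum": "$value"},  # sum of the hourly values
            "count": {"$sum": 1},         # number of hourly documents
        }
    },
    {"$sort": {"_id.year": 1, "_id.month": 1, "_id.day": 1}},
]


def sum_per_day(docs):
    """Plain-Python equivalent of the $group stage above."""
    totals = defaultdict(lambda: {"total": 0, "count": 0})
    for doc in docs:
        key = (doc["year"], doc["month"], doc["day"])
        totals[key]["total"] += doc["value"]
        totals[key]["count"] += 1
    return dict(totals)
```

Other accumulators such as $min, $max or $avg can be added to the $group stage in the same way as the sum.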
I'm new to Neo4j so maybe I'm just completely wrong on this, but I'll give it a try!
Our data is mostly composed of reservations, users and facilities stored as nodes.
I need to both count the total reservations that occurred in a specific time frame and compute the overall income (stored as reservation.income) in that time frame.
I was thinking of solving the problem by creating the date as a node; that way I can assign a [:PURCHASED_ON] relationship to all the reservations that occurred on a specific date.
As far as I've understood, creating the date as a node could give me a few pros:
I could split the date from dd/mm/yyyy and store the parts as integer properties; that way I could use comparison operators such as > and <
I could create a label for the node representing each month as a word
It should be easier to sum() the income on a day or a month
Basically, I was thinking about doing something like this:
CREATE (d:Day {name: '01/11/2016', day: TOINT('01'), month: TOINT('11'), year: TOINT('2016')})
I have seen that a possible solution could be to create a node for every year, every month (1-12) and every day (1-31), but I think that would terribly complicate the architecture of my graph, since every reservation has an "insert_date" (the day it's created) and then the official "reservation_date" (the day it's due).
Am I onto something here or is it just a waste of time? Thanks!
You may want to look at the GraphAware TimeTree library, as date handling is a complex thing, and this seems to be the logical conclusion of the direction you're going. The TimeTree also has support for finding events attached to your time tree between date ranges, at which point you can perform further operations (counting, summing of income, etc).
There are many date/time functions in the APOC plugin that you should take a look at.
As an example, you can use apoc.date.fields (incorrectly called by the obsolete name apoc.date.fieldsFormatted in the APOC doc) to get the year, month, day to put in your node:
WITH '01/11/2016' AS s
WITH s, apoc.date.fields(s, 'MM/dd/yyyy') AS f
CREATE (d:Day {name: s, day: f.days, month: f.months, year: f.years});
NOTE: The properties in the returned map have names that are oddly plural. I have submitted an issue requesting that the names be made singular.
How do I set up an index on a DynamoDB table to compare dates? For example, I have a column in my table called synchronizedAt, and I want my query to fetch all the rows that were never synchronized (i.e. 0) or weren’t synchronized in the past 2 weeks, i.e. (new Date().getTime()) - (1000 * 60 * 60 * 24 * 7 * 4)
It depends on the other attributes of your table.
You may use a Hash and Range primary key if the set of hash values is relatively small and stable; in that case you can filter the dates by putting them in the range key. However, every query must also specify a hash value, so it may or may not make sense to first fetch all the hash values and then loop over them, querying the range of interest for each one.
An alternative could be a Hash and Range GSI. In this case you might use a fixed dummy value as the hash key, so that you can query the range across all items at once.
Lastly, there is the less efficient Scan, but it will be a problem with large tables (the larger the table, the longer the Scan takes to complete).
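As a hedged sketch of the GSI option, the function below builds the parameters for a DynamoDB Query against such an index. The table name, index name, the dummy hash attribute "gsiPartition" and its value "ALL" are all hypothetical, and the two-week window is taken from the wording of the question. Because never-synchronized rows store 0, a single "less than cutoff" condition on the range key covers both cases.

```python
import time

TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000


def stale_items_query(now_ms=None):
    """Build Query parameters for items never or not recently synchronized."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - TWO_WEEKS_MS
    return {
        "TableName": "Items",                     # hypothetical table name
        "IndexName": "gsiPartition-syncedAt",     # hypothetical GSI name
        # Hash key must be tested for equality; the range key may use "<".
        "KeyConditionExpression": "gsiPartition = :p AND synchronizedAt < :cutoff",
        "ExpressionAttributeValues": {
            ":p": {"S": "ALL"},                   # the fixed dummy hash value
            ":cutoff": {"N": str(cutoff)},        # numbers are sent as strings
        },
    }
```

The resulting dictionary is in the shape the low-level Query operation expects. One trade-off to keep in mind is that a single fixed hash value puts every item of the index under the same partition key.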
I had a similar requirement to query on a date range. In my case the date range was the only criterion. The issue with DynamoDB is that you cannot create an index with just a range key. It always requires a hash key, and a query on such an index always expects an "equal to" condition on the hash key.
So I tricked the DB. I created a key called Century and populated it with the century part of the date's year. For example, for 1 Jan 2019 the century key value is 20. For 1 Jan 2020 the century key value is also 20. It is very easy to derive from any date. Then I created a GSI with Century as the hash key and the date as the range key. When querying, it is very easy to derive the century from the date range and build a query condition with the century as the hash key plus the date range. Since I am dealing with data no more than 5 years old, the trick won't fail for the next 75 years. :)
It is not a very nice workaround, but it works quite well for me. Maybe it will help someone else as well.
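A minimal sketch of deriving the century key, assuming the key is simply the year divided by 100 (which matches the 2019 and 2020 examples above) and that the range key is stored as an ISO date string. The helper names are made up for illustration. The second function shows what the trick needs if a date range ever does cross a century boundary: one query per century.

```python
from datetime import date


def century_key(d):
    """Hash key as described above: 2019 -> 20, 2020 -> 20."""
    return d.year // 100


def century_queries(start, end):
    """One (century, range_start, range_end) tuple per century touched."""
    queries = []
    for century in range(century_key(start), century_key(end) + 1):
        # Clamp the requested range to the part inside this century.
        lo = max(start, date(century * 100, 1, 1))
        hi = min(end, date(century * 100 + 99, 12, 31))
        queries.append((century, lo.isoformat(), hi.isoformat()))
    return queries
```

For a range that stays within one century this returns a single tuple, which maps directly onto a hash-equals plus range-between key condition.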
I have a lookup file in which each record has a start and an end of a range, both of type Long. My input is a Long value that can fall within one of these ranges. The lookup has 5 million records.
I have tried tIntervalMatch and it seems to be very slow.
Is there any alternative?