How to perform an interval search using Talend 5.6

I have a lookup file containing ranges whose start and end values are Longs. My input is a long value that can fall within one of these ranges. The lookup has 5 million records.
I have tried tIntervalMatch, but it is very slow.
Is there any alternative?
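One alternative worth sketching: read the lookup file once, sort the ranges by their start value, and binary-search each input key instead of scanning all 5 million intervals for every row. In Talend this kind of logic can live in a custom routine (routines are plain Java); the class and field names below are hypothetical, and the sketch assumes the ranges do not overlap.

import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Talend API: load the ranges once, sort them by
// start, then answer each lookup with a binary search.
public class IntervalLookup {

    private final long[] starts;
    private final long[] ends;
    private final String[] payloads;  // whatever the lookup row should return

    // ranges.get(i) = {start, end}, paired with payloads.get(i).
    // Assumes the ranges do not overlap.
    public IntervalLookup(List<long[]> ranges, List<String> payloads) {
        Integer[] order = new Integer[ranges.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(ranges.get(a)[0], ranges.get(b)[0]));

        this.starts = new long[order.length];
        this.ends = new long[order.length];
        this.payloads = new String[order.length];
        for (int i = 0; i < order.length; i++) {
            this.starts[i] = ranges.get(order[i])[0];
            this.ends[i] = ranges.get(order[i])[1];
            this.payloads[i] = payloads.get(order[i]);
        }
    }

    // Returns the payload of the range containing key, or null if none does.
    public String match(long key) {
        // Binary search for the last range whose start <= key.
        int lo = 0, hi = starts.length - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= key) { candidate = mid; lo = mid + 1; } else { hi = mid - 1; }
        }
        return (candidate >= 0 && key <= ends[candidate]) ? payloads[candidate] : null;
    }
}

Each lookup then costs O(log n) comparisons against in-memory arrays, which is usually what makes a 5-million-row lookup tolerable. Overlapping ranges would need an interval tree or a similar structure instead of the plain binary search.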

Related

PostgreSQL: UNIX Time Dynamic query

I'm making a Power BI report where the data I need to show is stored in a PostgreSQL database.
The table I query holds data from 4 years ago until today, but for my report I only need the last week of records (I know I can filter in Power BI, but my goal is to make the query as light as possible).
The time-related fields in the database are UNIX timestamps, so this is how I'm filtering at the moment:
SELECT
DATABASE.INCIDENT_NUMBER
,DATABASE.SUBMITTER
,DATABASE.CREATE_DATE
,DATABASE.MODIFIED_DATE
,DATABASE.CLOSED_DATE
,DATABASE.SUBJECT
FROM
DATABASE
WHERE
1643670000 < DATABASE.CREATE_DATE
ORDER BY DATABASE.INCIDENT_NUMBER, DATABASE.CREATE_DATE ASC
That works, but I want to improve it by making a dynamic query that returns the records from last week until today, without hard-coding a UNIX timestamp constant.
How can I do that?
That's an excellent example of why it is a bad idea to store timestamps as numbers. The correct data type is timestamp or timestamp with time zone.
If you had used the correct data type, your condition would be as simple as
WHERE current_date - 7 < database.create_date
But with numbers, you have to convert the cutoff yourself (converting the constant rather than the column keeps an index on create_date usable):
WHERE EXTRACT(epoch FROM current_date - 7) < DATABASE.CREATE_DATE

Truncate datetimes by second for all queries, but keep milliseconds stored in Postgres

I'm trying to find a way to tell Postgres to truncate all datetime columns so that they are displayed and filtered by seconds (ignoring milliseconds).
I'm aware of the
date_trunc('second', my_date_field)
method, but I don't want to apply it to every datetime field in every SELECT and WHERE clause that mentions them. Comparisons in the WHERE clause also need to match records at the granularity of seconds.
Ideally, I'd avoid stripping milliseconds from the data when it is stored. But then again, maybe this is the best way. I'd really like to avoid that data migration.
I can imagine Postgres having some kind of runtime configuration like this:
SET DATE_TRUNC 'seconds';
similar to how time zones are configured, but of course that doesn't work, and I'm unable to find anything else in the docs. Do I need to write my own Postgres extension? Has someone already written this?

Date/time with time zone

The date/time strings we're sending over to Pub/Sub look like this:
2018-07-18T17:30:08Z
I created a Dataflow job to insert these into BigQuery, and it failed on insert.
Stripping out the "Z" at the end like this was successful:
2018-07-18T17:30:08
The problem is that BigQuery seems to be interpreting this as a local time, and not UTC.
I've tried both of these ways to insert the time zone:
2018-07-18T17:30:08+00:00
2018-07-18T17:30:08+0000
Both are rejected.
What's the correct way to do this, or is there some other way I can force BigQuery to interpret these times as UTC?

GEO2D indexes for searching by two date ranges (time series)

I am building a kind of room-reservation system where a collection contains documents with two dates: a begin date and an end date.
I would like to be able to find all reservations whose begin date is between two dates and whose end date is also between two dates.
I have used MongoDB compound indexes, so I am indexing the start date and end date fields.
However, I am wondering if I can improve my query performance by using GEO2D indexes. For this, we could convert the begin date and end date to Unix time; then each booking is a point whose position is (start date, end date).
Using the $within operator, it would then be possible to query for reservations that fall in a range of start date AND end date.
Since geo indexes are mostly used for spatial data, I guess, would it make sense to use them for this specific use case?
Finally, since GEO2D indexes are implemented as B-trees in MongoDB and not as R-trees, what is the difference between traditional indexes and this geo one?
It is an interesting idea, but I don't think it will help your search speed or efficiency. Geo indexes in MongoDB are just B-trees applied to a geohash, where the geohash is a mechanism to convert something two-dimensional into something one-dimensional so that it can be used by B-trees. Geohashing is a powerful concept, but it has some peculiarities: points that are close together can end up in totally different buckets, which can make searching for the x nearest points to a point quite inefficient, as nine boxes around your point of interest have to be searched. A $within query would have the same issues.
I would have thought that sharding on a date column (possibly as Unix time) would be a more efficient way to improve performance, though there are some caveats around using a monotonically increasing datatype, such as a timestamp, as a shard key; see MongoDB shard keys.
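For reference, here is the two-range query under discussion expressed against the plain compound index, sketched with the MongoDB Java driver; the connection string, database, collection, and field names are assumptions.

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.Date;

public class ReservationRangeQuery {
    public static void main(String[] args) {
        MongoCollection<Document> reservations = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("booking")
                .getCollection("reservations");

        // Compound index on both date fields, as described in the question.
        reservations.createIndex(Indexes.ascending("begin", "end"));

        Date beginFrom = new Date(1700000000000L), beginTo = new Date(1700600000000L);
        Date endFrom   = new Date(1700300000000L), endTo   = new Date(1700900000000L);

        // Reservations whose begin date is in [beginFrom, beginTo]
        // AND whose end date is in [endFrom, endTo].
        Bson filter = Filters.and(
                Filters.gte("begin", beginFrom), Filters.lte("begin", beginTo),
                Filters.gte("end", endFrom), Filters.lte("end", endTo));

        for (Document reservation : reservations.find(filter)) {
            System.out.println(reservation.toJson());
        }
    }
}

With a compound index on (begin, end) the scan is bounded mainly by the begin range, and the end condition is checked against the index keys inside that range, so that is where any tuning effort usually goes.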

How to increment a Datetime field in a MongoDB document with atomic operations? And the same in Morphia?

Well. That's it.
I have a document with a Datetime field in it. Now I need to perform an atomic operation that will increase this value by some period, e.g. one day.
How can I do that?
And ultimately I need to do that via Morphia; if you know how to, please share.
Dates are stored as milliseconds since the Unix epoch (MongoDB Dates).
So you can query for the document whose date you want to increment and add (inc/dec in Morphia) the number of milliseconds you need, for example one day's worth.
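One caveat: $inc applies only to numeric values, so the inc/dec route works when the timestamp is stored as a plain number of milliseconds rather than as a BSON Date. For a BSON Date field, here is a sketch of one way to do the shift, using the plain Java driver and an aggregation-pipeline update (requires MongoDB 4.2+; the collection, field, and filter values are hypothetical).

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.Arrays;
import java.util.Collections;

public class ShiftDateByOneDay {
    public static void main(String[] args) {
        MongoCollection<Document> events = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("test")
                .getCollection("events");

        long oneDayMs = 24L * 60 * 60 * 1000;

        // Pipeline-style update: $add applied to a date and a number of
        // milliseconds yields a new date, so this pushes "scheduledAt" forward
        // by one day, atomically for the matched document.
        events.updateOne(
                Filters.eq("name", "nightly-batch"),  // hypothetical selector
                Collections.singletonList(new Document("$set",
                        new Document("scheduledAt",
                                new Document("$add", Arrays.asList("$scheduledAt", oneDayMs))))));
    }
}

If the value is stored as epoch milliseconds in a numeric field instead, the simpler $inc approach from the answer applies directly, and the update is equally atomic for the matched document.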