Hive query with case statement - date

I am trying to use a field in my data called priority in order to drive a numerical value for the DATE_ADD function. Essentially, the priority determines how many days before the issue is out of SLA.
I am trying to use this priority by saying:
pseudo code - If priority=p0, DATE_ADD (date, INTERVAL 1 day) Else If priority=p1, DATE_ADD (date, INTERVAL 15 day)
Here is my code I am trying:
SELECT
jira.jiraid as `JIRA / FR`,
jira.priority as `Priority`,
DATE_FORMAT(jira.created,"MM/dd/Y") as `Date Jira Created`,
DATE_FORMAT(DATE_ADD(jira.created, INTERVAL
CASE jira.status
WHEN "P0" THEN 1
WHEN "P1" THEN 15
WHEN "P2" THEN 40
WHEN "P3" THEN 70
ELSE 70
END day),"MM/dd/Y") as `Date when Out of SLA`
FROM jira
Does Hive support this type of if/else statement?

You do not need INTERVAL in Hive to add days: the date_add function accepts an integer number of days. Calculate the number of days in a subquery; this works and looks cleaner:
select
s.jiraid as `JIRA / FR`,
s.priority as `Priority`,
DATE_FORMAT(s.created,'MM/dd/yyyy') as `Date Jira Created`, -- 'yyyy' is the calendar year; 'Y' is the week-based year
DATE_FORMAT(DATE_ADD(s.created, s.days_interval),'MM/dd/yyyy') as `Date when Out of SLA`
from
(
SELECT
j.jiraid,
j.priority,
j.created,
CASE j.priority -- the question describes priority, not status, as the driver
WHEN 'P0' THEN 1
WHEN 'P1' THEN 15
WHEN 'P2' THEN 40
ELSE 70
END as days_interval
FROM jira j
)s;
You can also place the CASE expression directly inside the date_add call as its second parameter; that should work as well.
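The lookup itself is easy to check outside Hive. Below is a minimal Python sketch of the same priority-to-days mapping. The names `SLA_DAYS`, `DEFAULT_SLA_DAYS` and `out_of_sla_date` are hypothetical and exist only for illustration; the day counts are taken from the CASE expression above.

```python
from datetime import date, timedelta

# Days until an issue is out of SLA, per priority (values from the CASE expression).
SLA_DAYS = {"P0": 1, "P1": 15, "P2": 40}
DEFAULT_SLA_DAYS = 70  # mirrors the ELSE branch


def out_of_sla_date(created: date, priority: str) -> date:
    """Return the creation date shifted by the SLA window for the given priority."""
    return created + timedelta(days=SLA_DAYS.get(priority, DEFAULT_SLA_DAYS))


# Format the result the same way the query does (month/day/year).
print(out_of_sla_date(date(2021, 3, 1), "P1").strftime("%m/%d/%Y"))
```

An unknown priority falls through to the default of 70 days, just like the ELSE branch.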

Related

Postgres generate date series with exactly 100 steps

Let's say we have the dates
'2017-01-01'
and
'2017-01-04'
and I would like to get a series of exactly N timestamps between these dates (endpoints included), in this case 7 timestamps:
SELECT * FROM
generate_series_n(
'2017-01-01'::timestamp,
'2017-01-04'::timestamp,
7
)
Which I would like to return something like this:
2017-01-01-00:00:00
2017-01-01-12:00:00
2017-01-02-00:00:00
2017-01-02-12:00:00
2017-01-03-00:00:00
2017-01-03-12:00:00
2017-01-04-00:00:00
How can I do this in postgres?
This may be useful: use generate_series and do the math in the SELECT list:
select '2022-01-01'::date + generate_series *('2022-05-31'::date - '2022-01-01'::date)/15
FROM generate_series(1, 15)
;
output
?column?
------------
2022-01-11
2022-01-21
2022-01-31
2022-02-10
2022-02-20
2022-03-02
2022-03-12
2022-03-22
2022-04-01
2022-04-11
2022-04-21
2022-05-01
2022-05-11
2022-05-21
2022-05-31
(15 rows)
WITH seconds AS
(
SELECT EXTRACT(epoch FROM('2017-01-04'::timestamp - '2017-01-01'::timestamp))::integer AS sec
),
step_seconds AS
(
SELECT sec / (7 - 1) AS step FROM seconds -- 7 values are separated by 6 steps
)
SELECT generate_series('2017-01-01'::timestamp, '2017-01-04'::timestamp, (step || 'S')::interval)
FROM step_seconds
Converting this to a function is easy; let me know if you have trouble with it.
One caveat with this solution is that extracting the epoch from an interval treats every month as 30 days. If that is a problem for your use case (long intervals), you can tweak the logic for getting the seconds from the interval.
You can divide the difference between the end and the start value by the number of steps, which is one less than the number of values you want:
SELECT *
FROM generate_series('2017-01-01'::timestamp,
'2017-01-04'::timestamp,
('2017-01-04'::timestamp - '2017-01-01'::timestamp) / (7 - 1))
This could be wrapped into a function if you want to avoid repeating the start and end value.
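The step arithmetic is worth checking by hand: n values that include both endpoints are separated by n - 1 steps. Here is a minimal Python sketch of the question's 7-value example. The function name `evenly_spaced` is hypothetical and is not a Postgres function.

```python
from datetime import datetime
from typing import List


def evenly_spaced(start: datetime, end: datetime, n: int) -> List[datetime]:
    """Return n timestamps from start to end (both included), evenly spaced."""
    if n < 2:
        raise ValueError("n must be at least 2 to include both endpoints")
    step = (end - start) / (n - 1)  # n values are separated by n - 1 steps
    return [start + i * step for i in range(n)]


points = evenly_spaced(datetime(2017, 1, 1), datetime(2017, 1, 4), 7)
for p in points:
    print(p)
```

With three days split into six steps, each step is 12 hours, which matches the output requested in the question.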

Recurring future date at every 3 month after create date in PostgreSQL

I am looking for a function in PostgreSQL that helps me generate recurring dates every 90 days from the created date.
For example, here is a demo table of mine:
id date name
1 "2020-09-08" "abc"
2 "2020-09-08" "xyz"
3 "2020-09-08" "def"
I need future dates like 2020-12-08, 2021-03-08, 2021-06-08, and so on.
First it's important to note that, if you happen to have a date represented as text, then you can convert it to a date via:
SELECT TO_DATE('2017-01-03','YYYY-MM-DD');
So, if your input is text, you will need to convert it to a date first. Next, once you have a date, you can add days to it, like
SELECT CURRENT_DATE + INTERVAL '90 day';
You can also multiply an interval by a number, so the offset can be computed dynamically, like:
select now() + interval '1 day' * 180;
Finally, you will need a temporary table to generate several values described as above. Read more here: How to return temp table result in postgresql function
Summary:
create a function
that generates a temporary table
where you insert as many records as you like
having the date shifted
and converting text to date if needed
You can create a function that returns a SETOF dates/timestamps. The function below takes 3 parameters: a timestamp, an interval, and the number of periods desired (num_of_periods). It returns num_of_periods + 1 timestamps: the original timestamp plus num_of_periods further timestamps, each the specified interval apart.
create or replace
function generate_periodic_time_intervals
( start_date timestamp
, period_length interval
, num_of_periods integer
, out gen_timestamp timestamp
)
returns setof timestamp
language sql
immutable strict
as $$
select (start_date + n * period_length)::timestamp
from generate_series(0,num_of_periods) gs(n)
$$;
For your particular case, cast the result to timestamp or date as necessary. The same function works with the interval specified as '3 months' or as '90 days'; the interval can be any valid INTERVAL value. See here for a demo, which also shows the difference between 3 months and 90 days.
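The difference between 3 months and 90 days can be checked with plain calendar arithmetic. The sketch below is in Python and uses the created date from the question. The helper `add_months` is a hypothetical illustration (Python's standard library has no month arithmetic), and clamping the day to the end of shorter months is an assumption of this sketch.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift d by whole calendar months, clamping the day to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months  # months counted from year 0
    year, month0 = divmod(index, 12)              # month0 is zero-based
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


created = date(2020, 9, 8)
by_months = [add_months(created, 3 * k) for k in range(1, 4)]
by_days = [created + timedelta(days=90 * k) for k in range(1, 4)]
print(by_months)  # stays on the 8th of the month
print(by_days)    # drifts away from the 8th
```

The month-based series produces the dates listed in the question (2020-12-08, 2021-03-08, 2021-06-08), while the 90-day series starts at 2020-12-07 and keeps drifting.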

Add_months function error based on postgres database

I tried to run this query in Postgres:
Select to_char((select add_months (to_date ('10/10/2019', 'dd/mm/yyyy'), '11/11/2019') ) , 'dd/mm/yyyy') as temp_date
I got an error :
Function add_months (date, unknown) does not exist
Hint: no function matches the given name and argument types. You might need to add explicit type casts.
Please help
As documented in the manual, there is no add_months function in Postgres.
But you can simply add an interval:
select to_date('10/10/2019', 'dd/mm/yyyy') + interval '10 months'
If you need to format that date value to something:
select to_char(to_date('10/10/2019', 'dd/mm/yyyy') + interval '10 months', 'yyyy-mm-dd')
No one, even on Oracle, has run the original query, at least not successfully. The query appears to expect add_months to add two months together (in this case October and November). That is not what the function does: it adds an integer number of months to the specified date and returns the resulting date. As already indicated, in Postgres you can just add the desired interval. However, if you have many occurrences of add_months (for example when converting from Oracle), the following implements a Postgres version:
create or replace function add_months(
date_in date
, n_months_in integer)
returns date
language sql immutable strict
as
$$
-- given a date and an integer for number of months return the calendar date for the specified number of months away.
select (date_in + n_months_in * interval '1 month')::date
$$ ;
-- test
-- +/- 6 months from today.
select current_date "today"
, add_months(current_date,6) "6 months from now"
, add_months(current_date,-6) "6 months ago"
;

How to bin timestamp data into buckets of n minutes in postgres

I have the following query which works, binning timestamped "observations" into buckets whose boundaries are defined by the bins table:
SELECT
count(id),
width_bucket(
time :: TIMESTAMP,
(SELECT ARRAY(SELECT start_time
FROM bins
WHERE owner_id = 'some id'
ORDER BY start_time ASC) :: TIMESTAMP[])
) bucket
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket
ORDER BY bucket;
I would like to modify this to allow for querying arbitrary n-minute bins starting from a specified timestamp, rather than having to pull from an actual "bins" table.
That is, given a start time, a "bin width" in minutes, and a number of bins, is there a way I can generate the array of timestamps to pass into the width_bucket function?
Alternatively, is there a different/simpler approach to get the same results?
Use the function generate_series(start, stop, step interval), e.g.
select array(
select generate_series(
timestamp '2018-04-15 00:00',
'2018-04-15 01:00',
'30 minutes'))
array
---------------------------------------------------------------------
{"2018-04-15 00:00:00","2018-04-15 00:30:00","2018-04-15 01:00:00"}
(1 row)
Example in Db<>fiddle.
The above answers seem to do what you want, but as of PostgreSQL 14, there is now a function date_bin just for binning timestamps.
Quoting the documentation:
date_bin(stride,source,origin)
source is a value expression of type timestamp or timestamp with time zone. (Values of type date are cast automatically to timestamp.) stride is a value expression of type interval. The return value is likewise of type timestamp or timestamp with time zone, and it marks the beginning of the bin into which the source is placed.
Examples:
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01');
Result: 2020-02-11 15:30:00
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01 00:02:30');
Result: 2020-02-11 15:32:30
In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call, but the difference is that date_bin can truncate to an arbitrary interval.
The stride interval must be greater than zero and cannot contain units of month or larger.
I would like to call special attention to the line
The return value [...] marks the beginning of the bin into which the source is placed.
This means that input timestamps will always be binned by "rounding down", rather than binning to whichever bin is closest. E.g. if you do:
SELECT date_bin('1 hour', '2021-10-13 00:59:59', '2021-10-13 00:00:00');
Then the result will be 2021-10-13 00:00:00 (rounded down by 59 minutes and 59 seconds), NOT 2021-10-13 01:00:00 (which is only one second away from the supplied timestamp). So the date_bin function does something slightly different from exactly what you ask for, but I figure this is good to post for anyone coming here in the future.
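The rounding-down behaviour follows from the arithmetic: count the whole strides between origin and source, then step forward that many strides from the origin. Below is a minimal Python model of that idea. It is a sketch of the documented semantics, not the Postgres implementation, and the Python function merely borrows the name `date_bin`.

```python
from datetime import datetime, timedelta


def date_bin(stride: timedelta, source: datetime, origin: datetime) -> datetime:
    """Return the start of the stride-wide bin (aligned to origin) that contains source."""
    whole_strides = (source - origin) // stride  # floor division, so it rounds down
    return origin + whole_strides * stride


# The two documentation examples, then the one-hour example from this answer.
print(date_bin(timedelta(minutes=15), datetime(2020, 2, 11, 15, 44, 17), datetime(2001, 1, 1)))
print(date_bin(timedelta(minutes=15), datetime(2020, 2, 11, 15, 44, 17), datetime(2001, 1, 1, 0, 2, 30)))
print(date_bin(timedelta(hours=1), datetime(2021, 10, 13, 0, 59, 59), datetime(2021, 10, 13)))
```

If you need the nearest bin instead, one option is to add half a stride to the source before binning.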
A different approach without a series:
Divide the difference of time and start by the width of the bin (5 minutes in the example) and add 1 because the first bucket of width_bucket(...) is 1 not 0.
floor(extract(epoch from (time - '2019-06-04 00:00'::timestamp)) / (5 * 60) ) + 1 as bucket
Getting the start of the bin is also possible:
to_timestamp(floor(extract(epoch from time) / (5 * 60)) * (5 * 60)) as bin_start
Putting this all together:
SELECT
count(id),
floor(extract(epoch from (time - '2019-06-04 00:00'::timestamp)) / (5 * 60) ) + 1 as bucket,
to_timestamp(floor(extract(epoch from time) / (5 * 60)) * (5 * 60)) as bin_start
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket, bin_start
ORDER BY bucket;
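To see how the 1-based bucket numbers fall out, here is a small Python sketch of the same floor-and-add-one arithmetic, followed by a count per bucket. The sample observations are made up for illustration, and `START`, `WIDTH` and `bucket` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta

START = datetime(2019, 6, 4, 0, 0)  # same start as in the query
WIDTH = timedelta(minutes=5)        # same bin width as in the query


def bucket(t: datetime) -> int:
    """1-based bucket number: whole bin widths since START, plus one."""
    return (t - START) // WIDTH + 1


observations = [  # made-up sample timestamps
    datetime(2019, 6, 4, 0, 0, 0),
    datetime(2019, 6, 4, 0, 4, 59),
    datetime(2019, 6, 4, 0, 5, 0),
    datetime(2019, 6, 4, 0, 12, 30),
]
counts = Counter(bucket(t) for t in observations)
print(sorted(counts.items()))
```

Timestamps earlier than the start land in bucket 0 or below, so filter them out if that matters for your data.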

Postgresql Query very slow with ::date, ::time, and interval

I have a sql query that is very slow:
select number1 from mytable
where symbol = 25
and timeframe = 1
and date::date = '2008-02-05'
and date::time='10:40:00' + INTERVAL '30 minutes'
The goal is to return one value, and postgresql takes 1.7 seconds to return the desired value(always a single value). I need to execute hundreds of those queries for one task, so this gets extremely slow.
Executing the same query, but pointing to the time directly without using interval and ::date, ::time takes only 17ms:
select number1 from mytable
where symbol = 25
and timeframe = 1
and date = '2008-02-05 11:10:00'
I thought it would be faster if I would not use ::date and ::time, but when I execute a query like:
select number1 from mytable
where symbol = 25
and timeframe = 1
and date = '2008-02-05 10:40:00' + interval '30 minutes'
I get a sql error (22007). I've experimented with different variations but I couldn't get interval to work without using ::date and ::time. Date/Time Functions on postgresql.org didn't help me out.
The table has a multi-column index on symbol, timeframe, date.
Is there a fast way to execute the query with adding time, or a working syntax with interval where I do not have to use ::date and ::time? Or do I need to have a special index when using queries like these?
Postgresql version is 9.2.
Edit:
The format of the table is:
date = timestamp with time zone,
symbol, timeframe = numeric.
Edit 2:
Using
select open from ohlc_dukascopy_bid
where symbol = 25
and timeframe = 1
and date = timestamp '2008-02-05 10:40:00' + interval '30' minute
Explain shows:
"Index Scan using mcbidindex on mytable (cost=0.00..116.03 rows=1 width=7)"
" Index Cond: ((symbol = 25) AND (timeframe = 1) AND (date = '2008-02-05 11:10:00'::timestamp without time zone))"
Time is now considerably faster: 86ms on first run.
The first version will not use a (regular) index on the column named date, because the casts date::date and date::time are applied to the column itself.
You didn't provide much information, but assuming the column named date has the datatype timestamp (and not date), then the following should work:
and date = timestamp '2008-02-05 10:40:00' + interval '30 minutes'
This should use an index on the column named date (but only if it is in fact a timestamp, not a date). It is essentially the same as your query; the only difference is the explicit timestamp literal. Without the timestamp keyword, the type of the untyped string '2008-02-05 10:40:00' has to be inferred from the + interval expression, which is probably what raised your error.
You will need to run an explain to find out if it's using an index.
And please: change the name of that column. It's bad practice to use a reserved word as an identifier, and it's a really horrible name that says nothing about what kind of information is stored in the column. Is it the "start date", the "end date", the "due date", ...?