generate_series function in Amazon Redshift

I tried the below:
SELECT * FROM generate_series(2,4);
generate_series
-----------------
2
3
4
(3 rows)
SELECT * FROM generate_series(5,1,-2);
generate_series
-----------------
5
3
1
(3 rows)
But when I try,
select * from generate_series('2011-12-31'::timestamp, '2012-12-31'::timestamp, '1 day');
It generated an error.
ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
I use PostgreSQL 8.0.2 on Redshift 1.0.757.
Any idea why this happens?
UPDATE:
generate_series is working with Redshift now (see the answer below).

The version of generate_series() that supports dates and timestamps was added in Postgres 8.4.
As Redshift is based on Postgres 8.0, you need to use a different approach:
select timestamp '2011-12-31 00:00:00' + (i * interval '1 day')
from generate_series(1, (date '2012-12-31' - date '2011-12-31')) i;
If you "only" need dates, this can be abbreviated to:
select date '2011-12-31' + i
from generate_series(1, (date '2012-12-31' - date '2011-12-31')) i;
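If the start date itself should be included in the series, a small variation (my assumption about the desired range, not part of the original answer) is to start counting at 0:
-- Include the start date itself by beginning the series at 0.
select date '2011-12-31' + i
from generate_series(0, (date '2012-12-31' - date '2011-12-31')) i;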

generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This will generate the dates for the last 31 days.

I found a solution here for my problem of not being able to generate a time dimension table on Redshift using generate_series(). You can generate a temporary sequence by using the following SQL snippet.
with digit as (
    select 0 as d union all
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9
),
seq as (
    select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
    from digit a
    cross join digit b
    cross join digit c
    cross join digit d
    order by 1
)
select (getdate()::date - seq.num)::date as "Date"
from seq;
The generate_series() function, it seems, is not completely supported on Redshift yet. If I run the SQL mentioned in the answer by DJo, it works, because that SQL runs only on the leader node. If I prepend insert into dim_time to the same SQL, it doesn't work.
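To make that concrete, here is a minimal sketch of populating such a table with the digit-based sequence instead (the dim_time definition below is an assumption for illustration; only the table name comes from the paragraph above):
-- Hypothetical target table for illustration only.
create table dim_time (calendar_date date);
-- An INSERT ... SELECT FROM generate_series() fails on Redshift because
-- generate_series() runs only on the leader node; the digit/seq CTEs
-- below execute on the compute nodes, so this INSERT works.
insert into dim_time
with digit as (
    select 0 as d union all
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9
),
seq as (
    select a.d + (10 * b.d) + (100 * c.d) as num
    from digit a
    cross join digit b
    cross join digit c
)
select (getdate()::date - seq.num)::date
from seq
where seq.num < 365;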

There is no generate_series() support in Redshift for date ranges, but you can generate the series with the steps below.
Step 1: Create a table genid and insert the constant value 1 once for each element of the series you need. If you need the series generated for 12 months, insert 12 rows. It is better to insert more rows than you expect to need, say 100, so that you do not run short; one way to load 100 rows in a single statement is sketched below.
create table genid(id int);
-- insert one row for each element of the series (e.g. one per month)
insert into genid values(1);
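As referenced above, one possible shortcut (a sketch reusing the digit trick from the earlier answer, not part of the original steps) loads 100 rows of the constant at once:
-- Load 100 rows of the constant 1 into genid in one statement by
-- cross-joining a ten-row digit set with itself (10 x 10 = 100 rows).
insert into genid
with digit as (
    select 0 as d union all
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9
)
select 1
from digit a
cross join digit b;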
Step 2: The table for which you need to generate the series.
create table pat(patid varchar(10),stdt timestamp, enddt timestamp);
insert into pat values('Pat01','2018-03-30 00:00:00.0','2018-04-30 00:00:00.0')
insert into pat values('Pat02','2018-02-28 00:00:00.0','2018-04-30 00:00:00.0')
insert into pat values('Pat03','2017-10-28 00:00:00.0','2018-04-30 00:00:00.0')
Step 3: This query will generate the series for you.
with cte as (
    select max(enddt) as maxdt
    from pat
),
cte2 as (
    select dateadd('month', -1 * row_number() over (order by 1), maxdt::date) as gendt
    from genid, cte
)
select *
from pat, cte2
where gendt between stdt and enddt;

Related

extract days of daterange grouped by month postgresql

I have a pickupDate and returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped/ordered by month. A CTE seems to be the solution, but I don't get how to implement it in my query, since the CTEs I saw were referring to themselves where it says "FROM cte".
I tried something like this:
SELECT
    SUM(EXTRACT(DAY FROM ("OrderHistory"."returnDate") - ("OrderHistory"."pickupDate"))) as traveltime
    , to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
    "OrderHistory"
GROUP BY
    M
ORDER BY
    M
But the outcome doesn't split bookings between two months (e.g. pickupDate = 27 March 2022 and returnDate = 3 April 2022) and assigns the whole 7 days to March, since the return date falls in it. It should show 4 days in March and 3 in April.
Sorry for the probably very stupid question, but I am a beginner. (My code is written in PostgreSQL, by the way.)
See PostgreSQL naming conventions and Are PostgreSQL column names case-sensitive?:
use legal, lower-case names exclusively so double-quoting is not needed.
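For example, a minimal sketch of that rename, assuming the quoted camel-case identifiers from the question:
-- Rename the quoted, camel-case identifiers to legal lower-case names
-- so double-quoting is no longer needed in any of the queries below.
alter table "OrderHistory" rename to order_history;
alter table order_history rename column "pickupDate" to pickup_date;
alter table order_history rename column "returnDate" to return_date;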
Final result in db fiddle
Add a daterange column:
alter table order_history add column date_ranges daterange;
with a(m_begin, m_end, pickup_date) as (
    select date_trunc('month', pickup_date)::date,
           (date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
           pickup_date
    from order_history
)
update order_history
set date_ranges = daterange(a.m_begin, a.m_end, '[]')
from a
where a.pickup_date = order_history.pickup_date;
Then the final query:
with a as (
    select pickup_date,
           return_date,
           return_date - pickup_date as total,
           case when return_date <@ date_ranges then (return_date - pickup_date)
                else (date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
           end as partial_mth
    from order_history
),
b as (
    select *, a.total - partial_mth as partial_not_mth
    from a
)
select *,
       case when to_char(pickup_date, 'YYYY-MM') = to_char(return_date, 'YYYY-MM')
            then sum(partial_mth) over (partition by to_char(pickup_date, 'YYYY-MM')) +
                 sum(partial_not_mth) over (partition by to_char(return_date, 'YYYY-MM'))
            else sum(partial_mth) over (partition by to_char(pickup_date, 'YYYY-MM'))
       end
from b;
After trying different things, I think I found the best answer to my question, which I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
1, 2, 3
ORDER BY
3
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name       Start       End
---------------------------------------
Project 1  2020-08-01  2020-09-10
Project 2  2020-01-01  2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start is from the aliased main query and the dates are hard-coded for now, this correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However, a subquery can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name       Start       End         2020-08-01  2020-09-01  ...
-------------------------------------------------------------------------
Project 1  2020-08-01  2020-09-10  21          8
Project 2  2020-01-01  2025-01-01  21          10
Or simply return the whole subquery as JSON, but that doesn't seem to be working either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.
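If a flat value for one specific month is needed later, a possible follow-up sketch (assuming the calendar table from above is populated and my_projects has the name, start, and "end" columns used here; the month key '2020-08-01' is purely illustrative) is to read it back out of the aggregated jsonb:
-- Illustrative only: wrap the aggregation in CTEs and extract one
-- month's work-day count per project from the resulting jsonb.
with bymonth as (
    select p.name,
           date_trunc('month', c.ddate) as month_start,
           count(*) as work_days
    from my_projects p
    join calendar c on c.ddate between p.start and p."end"
    where c.is_work_day
    group by p.name, month_start
), pivoted as (
    select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
           || jsonb_object_agg('name', name) as result
    from bymonth
    group by name
)
select result ->> 'name'              as project,
       (result ->> '2020-08-01')::int as work_days_aug_2020
from pivoted;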

CROSSTAB PostgreSQL - Alternative for PIVOT in Oracle

I'm migrating an Oracle PIVOT query to a PostgreSQL crosstab() query.
create table x_c(cntry numeric, week numeric, year numeric, days text, day text);
insert into x_c values(1,15,2015,'DAY1','MON');
...
insert into x_c values(1,15,2015,'DAY7','SUN');
insert into x_c values(2,15,2015,'DAY1','MON');
...
insert into x_c values(4,15,2015,'DAY7','SUN');
I have 4 weeks with 28 rows like this in a table. My Oracle query looks like this:
SELECT * FROM(select * from x_c)
PIVOT (MIN(DAY) FOR (DAYS) IN
('DAY1' AS DAY1 ,'DAY2' DAY2,'DAY3' DAY3,'DAY4' DAY4,'DAY5' DAY5,'DAY6' DAY6,'DAY7' DAY7 ));
Result:
cntry|week|year|day1|day2|day3|day4|day5|day6|day7|
---------------------------------------------------
1 | 15 |2015| MON| TUE| WED| THU| FRI| SAT| SUN|
...
4 | 18 |2015| MON| ...
Now I have written a Postgres crosstab query like this:
select *
from crosstab('select cntry,week,year,days,min(day) as day
from x_c
group by cntry,week,year,days'
,'select distinct days from x_c order by 1'
) as (cntry numeric,week numeric,year numeric
,day1 text,day2 text,day3 text,day4 text, day5 text,day6 text,day7 text);
I'm getting only one row as output:
1|17|2015|MON|TUE| ... -- only this row is coming
Where am I going wrong?
ORDER BY was missing in your original query. The manual:
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row.
More importantly (and more tricky), crosstab() requires exactly one row_name column. Detailed explanation in this closely related answer:
Crosstab splitting results due to presence of unrelated field
The solution you found is to nest multiple columns in an array and later unnest again. That's needlessly expensive, error prone and limited (only works for columns with identical data types or you need to cast and possibly lose proper sort order).
Instead, generate a surrogate row_name column with rank() or dense_rank() (rnk in my example):
SELECT cntry, week, year, day1, day2, day3, day4, day5, day6, day7
FROM crosstab (
'SELECT dense_rank() OVER (ORDER BY cntry, week, year)::int AS rnk
, cntry, week, year, days, day
FROM x_c
ORDER BY rnk, days'
, $$SELECT unnest('{DAY1,DAY2,DAY3,DAY4,DAY5,DAY6,DAY7}'::text[])$$
) AS ct (rnk int, cntry int, week int, year int
, day1 text, day2 text, day3 text, day4 text, day5 text, day6 text, day7 text)
ORDER BY rnk;
I use the data type integer for the output columns cntry, week, year because that seems to be the (cheaper) appropriate type. You can also use numeric as you had it.
Basics for crosstab queries here:
PostgreSQL Crosstab Query
I got this figured out from http://www.postgresonline.com/journal/categories/24-tablefunc
select year_wk_cntry.t[1], year_wk_cntry.t[2], year_wk_cntry.t[3],
       day1, day2, day3, day4, day5, day6, day7
from crosstab('select ARRAY[cntry::numeric, week, year] as t, days, min(day) as day
               from x_c group by cntry, week, year, days order by 1,2',
              'select distinct days from x_c order by 1')
as year_wk_cntry (t numeric[], day1 text, day2 text, day3 text,
                  day4 text, day5 text, day6 text, day7 text);
thanks!!

Get Data From Postgres Table At every nth interval

Below is my table. I am inserting data from my Windows .NET application at a 1-second interval. I want to write a query that fetches data from the table at every nth interval, for example every 5 seconds. Below is the query I am using, but I am not getting the required result. Please help me.
CREATE TABLE table_1
(
timestamp_col timestamp without time zone,
value_1 bigint,
value_2 bigint
)
This is the query I am using:
select timestamp_col,value_1,value_2
from (
select timestamp_col,value_1,value_2,
INTERVAL '5 Seconds' * (row_number() OVER(ORDER BY timestamp_col) - 1 )
+ timestamp_col as r
from table_1
) as dt
Where r = 1
Use the date_part() function with the modulo operator:
select timestamp_col, value_1, value_2
from table_1
where date_part('second', timestamp_col)::int % 5 = 0
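Note that date_part('second', ...) wraps around every minute, so if the sampling interval can exceed 60 seconds, a similar sketch (an alternative I am suggesting, not part of the original answer) is to take the modulo of the epoch seconds instead:
-- Keep one row every n seconds, where n may be larger than 60.
-- Here n = 300, i.e. one row every 5 minutes.
select timestamp_col, value_1, value_2
from table_1
where extract(epoch from timestamp_col)::bigint % 300 = 0;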

Getting Dates by Selecting a week in oracle

I have a textbox that takes numbers from 1 to 52, which are calendar week numbers, and a drop-down for the year.
For example, if I select 2 in the textbox with the year 2014, then I want the dates to be shown as 05-01-2014 - 11-01-2014. Is it possible to do this?
I have also tried one query, which doesn't match my requirement:
SELECT date_val, TO_CHAR (date_val, 'ww')
FROM (SELECT TO_DATE ('01-jan-2013', 'DD-MON-YYYY') + LEVEL AS date_val
FROM DUAL
CONNECT BY LEVEL <= 365)
Please help.
Try this. Here 2 is the week number within the year (FirstSunday + (NumberOfWeek-1)*7 as WeekStart, FirstSunday + NumberOfWeek*7 - 1 as WeekEnd) and 2014 is the year:
select
FirstSunday+(2-1)*7 as WeekStart,
FirstSunday+ 2*7-1 as WeekEnd
from
(
Select NEXT_DAY(TO_DATE('01/01/'||'2014','DD/MM/YYYY')-7, 'SUN') as FirstSunday
from dual
)
SQLFiddle demo
Try this too,
SELECT start_date,
       start_date + 6 AS end_day
FROM (
    SELECT TRUNC(TRUNC(TO_DATE('2014', 'YYYY'), 'YYYY') + 1 * 7, 'IW') - 1 AS start_date
    FROM dual
);