Day of Quarter in Redshift - amazon-redshift

I'm trying to create a calculation based on days in a quarter. Problem is, I just can't find anything on how to do it that's similar to how you would do day of year.
SELECT
event_date,
day_of_quarter
FROM table_a
WHERE event_date BETWEEN '2018-10-01' AND '2019-03-31'
So in the above example for day_of_quarter, 2018-10-01 would return 1, 2018-12-31 would return 92 and, say, 2019-03-05 would return 64.

Here is the SQL for this
SELECT
event_date,
(event_date - date_trunc('quarter', event_date)::date)+1 day_of_quarter
FROM table_a
WHERE event_date BETWEEN '2018-10-01' AND '2019-03-31'
Subtracting date_trunc('quarter', event_date) from event_date on its own gives a zero-based offset (0 on the first day of the quarter), hence the + 1.
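If you want to sanity-check the numbers outside the database, the same arithmetic can be sketched in Python. This is only an illustration of the "difference from quarter start, plus one" idea, not Redshift code; the function name day_of_quarter is made up for the example.

```python
from datetime import date


def day_of_quarter(d: date) -> int:
    """Return the 1-based day number of d within its calendar quarter."""
    # First month of the quarter: 1, 4, 7 or 10.
    quarter_start_month = 3 * ((d.month - 1) // 3) + 1
    quarter_start = date(d.year, quarter_start_month, 1)
    # The plain difference is zero-based, so add 1.
    return (d - quarter_start).days + 1


print(day_of_quarter(date(2018, 10, 1)))
print(day_of_quarter(date(2018, 12, 31)))
```

The two calls use the dates from the question, so the results can be compared with what the SQL returns.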

Related

Generate missing data and fill it down - postgresql

I have the dataset:
The problem is that the records are added only if an event happened, e.g. for the row with id 13897, the record was updated on 4/18/2020 and then on 5/1/2020 - the status was changed. What I need is the status of each record at the end of every month.
I was thinking about the below logic:
1. generate the series of dates from the min(date) till now - T1
2. get distinct id from the dataset - T2
3. do cross join between two above tables so that we get a new row for every row in the second table - T3
4. extract the dataset with all required fields - T4
5. merge T3 and T4 by concatenate(date and id) - T5
6. sort T5 by id and d asc - T5
7. fill-down all the fields grouped by id - T5
8. generate the series of dates from min(date) till now with the interval of one month and get the last day of each month - T6
9. merge T5 and T6 by date - right join so that we get only rows with the date = end of month
I am on step 6.
SELECT *
FROM (SELECT d, Concat(dt, t2.id) AS cnct
FROM (SELECT d, d::date AS dt
FROM generate_series(
(SELECT min(created_at::date)
FROM new_table), CURRENT_DATE, interval '1 day') d) t1
CROSS JOIN
(SELECT DISTINCT id FROM new_table) t2) t3
-- keep one row per id and day, in case a record with the same id was updated several times throughout the day
LEFT JOIN (WITH cte AS
(SELECT id, status,
created_at at time zone 'eat' at time zone 'utc' AS "created_at",
updated_at::date AS date,
updated_at::date,
row_number() OVER (PARTITION BY id, updated_at::date ORDER BY updated_at DESC) rn
FROM new_table)
SELECT cte.*, Concat(updated_at::date, id) AS cnct
FROM cte
WHERE rn = 1) t4
ON t3.cnct = t4.cnct
I am stuck on step 7. I found "fill column with last value from partition in postgresql", but it is not what I need. I envision sorting by date block, i.e. the dates from the min date to now for one id (13894) are block 1, the dates from the min date to now for another id (13897) are block 2, and so on. The next step, I thought, is to fill down all fields per block.
And another question, how do you deal with the event-based data to adapt it for the time-series?
Tried:
You can use PostgreSQL's DISTINCT ON feature to do this. We'll generate a series with the start of every month (you'll need to supply start and end dates here) and put the ID and the month start into the DISTINCT ON so that we get only one row of new_table for each distinct ID and month pair. Then we filter and order so that the row we get for each ID and month is the latest row whose date is before the start of that month.
SELECT DISTINCT ON (new_table.id, month_start) *
FROM new_table, generate_series(start_date, end_date, interval '1 month') month_start
WHERE new_table.date < month_start
ORDER BY new_table.id, month_start ASC, new_table.date DESC;
(If you need your results to have the last day of the month and not the first day of the next month, you can just subtract 1 day from month_start in your select clause.)
EDIT: Running on the data you supplied, I get this:
SELECT DISTINCT ON (new_table.id, month_start) new_table.id, month_start - interval '1 day' as month_end, new_table.status
FROM new_table, generate_series('2020-05-01', '2020-06-01', interval '1 month') month_start
WHERE new_table.date < month_start
ORDER BY new_table.id, month_start ASC, new_table.date DESC;
id | month_end | status
-------+------------------------+--------
13894 | 2020-04-30 00:00:00-07 | 5
13894 | 2020-05-31 00:00:00-07 | 5
13897 | 2020-04-30 00:00:00-07 | 2
13897 | 2020-05-31 00:00:00-07 | 5
(4 rows)
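To make the "latest row before each month start" logic concrete, here is a small Python sketch of the same idea. It is not a replacement for the query. The two rows for 13897 follow the dates mentioned in the question; the single row for 13894 is a made-up date, since the original dataset is not shown, and the function name is hypothetical.

```python
from bisect import bisect_left
from datetime import date


def status_at_month_starts(events, month_starts):
    """events: iterable of (id, date, status).

    Returns {(id, month_start): status} using, for each id, the latest
    event dated strictly before month_start (same filter as the query).
    """
    by_id = {}
    for rec_id, d, status in sorted(events, key=lambda e: (e[0], e[1])):
        by_id.setdefault(rec_id, []).append((d, status))

    result = {}
    for rec_id, rows in by_id.items():
        dates = [d for d, _ in rows]
        for ms in month_starts:
            i = bisect_left(dates, ms)  # number of events with date < ms
            if i > 0:  # ids with no earlier event produce no row
                result[(rec_id, ms)] = rows[i - 1][1]
    return result


# Hypothetical sample data (the 13894 date is an assumption).
events = [
    (13897, date(2020, 4, 18), 2),
    (13897, date(2020, 5, 1), 5),
    (13894, date(2020, 4, 10), 5),
]
months = [date(2020, 5, 1), date(2020, 6, 1)]
snapshot = status_at_month_starts(events, months)
```

As in the SQL, an id with no event before a given month start simply has no entry for that month.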

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name Start End
---------------------------------------
Project 1 2020-08-01 2020-09-10
Project 2 2020-01-01 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
Here p.start comes from the aliased main query, and the dates are hard-coded for now. This correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name Start End 2020-08-01 2020-09-01 ...
-------------------------------------------------------------------------
Project 1 2020-08-01 2020-09-10 21 8
Project 2 2020-01-01 2025-01-01 21 10
Or simply return the whole subquery as JSON, but that doesn't seem to work either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.
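For checking the counts by hand, the per-month working-day tally can be sketched in Python. This assumes, like the queries above, that "working day" just means Monday to Friday with no holidays; the function name is made up for the example.

```python
from collections import Counter
from datetime import date, timedelta


def work_days_by_month(start: date, end: date) -> dict:
    """Count Monday-Friday days per month in the inclusive range [start, end].

    Keys are the first day of each month as 'YYYY-MM-DD', mirroring the
    JSON keys produced by the query.
    """
    counts = Counter()
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0..4 is Monday..Friday
            counts[day.replace(day=1).isoformat()] += 1
        day += timedelta(days=1)
    return dict(counts)


project_1 = work_days_by_month(date(2020, 8, 1), date(2020, 9, 10))
```

Project 1's range from the question is used as input, so the result can be compared against the 21 and 8 in the desired output.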

Postgresql split date range by year parts (financial year)

I have a table like follows:
id start_date end_date
1 2020-01-01 2020-05-01
2 2020-03-01 2021-04-02
I need to be able to split the rows by financial year (e.g. 2020-04-01 -> 2021-03-31).
So the result of the query would be as follows:
id start_date end_date
1 2020-01-01 2020-03-31
1 2020-04-01 2020-05-01
2 2020-03-01 2020-03-31
2 2020-04-01 2021-03-31
2 2021-04-01 2021-04-02
Actually another post helped me resolve this: Date split-up based on Fiscal Year
DROP TABLE IF EXISTS your_table;
CREATE TABLE your_table (id int, start_date date, end_date date);
INSERT INTO your_table VALUES (1, '2020-01-01', '2020-05-01');
INSERT INTO your_table VALUES (2, '2020-03-01', '2021-04-02');
SELECT
id,
-- make_date avoids DateStyle-dependent parsing of strings like '01-04-2020'
GREATEST(start_date, make_date(series.year, 4, 1)) AS year_start,
LEAST(end_date, make_date(series.year + 1, 3, 31)) AS year_end
FROM
(SELECT
id,
start_date,
end_date,
generate_series(
date_part('year', your_table.start_date - INTERVAL '3 months')::int,
date_part('year', your_table.end_date - INTERVAL '3 months')::int)
FROM your_table) AS series(id, start_date, end_date, year)
ORDER BY
id, year_start;
Result:
"id","year_start","year_end"
1,"2020-01-01","2020-03-31"
1,"2020-04-01","2020-05-01"
2,"2020-03-01","2020-03-31"
2,"2020-04-01","2021-03-31"
2,"2021-04-01","2021-04-02"
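The splitting rule itself (clip the range to each April-to-March year it touches) can also be sketched in Python for a quick check. This is an illustration under the same April 1 assumption, with hypothetical function names, not a translation of the query.

```python
from datetime import date


def fiscal_year_start(d: date) -> date:
    """Start (April 1) of the financial year that contains d."""
    year = d.year if d.month >= 4 else d.year - 1
    return date(year, 4, 1)


def split_by_fiscal_year(start: date, end: date) -> list:
    """Split the inclusive range [start, end] at financial-year boundaries."""
    parts = []
    fy_start = fiscal_year_start(start)
    while fy_start <= end:
        fy_end = date(fy_start.year + 1, 3, 31)
        # Clip the financial year to the requested range.
        parts.append((max(start, fy_start), min(end, fy_end)))
        fy_start = date(fy_start.year + 1, 4, 1)
    return parts


row_1 = split_by_fiscal_year(date(2020, 1, 1), date(2020, 5, 1))
row_2 = split_by_fiscal_year(date(2020, 3, 1), date(2021, 4, 2))
```

The two calls use the rows from the question, so the pieces should line up with the result listed above.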

Postgresql - Partial sum per day and overall in one query

I've got a table of different transactions with the according timestamps:
Table: Transactions
Recipient Amount Date
--------------------------------------------------
Bob 52 2019-04-21 11:06:32
Jack 12 2019-06-26 12:08:11
Jill 50 2019-04-19 24:50:26
Bob 90 2019-03-20 16:34:35
Jack 81 2019-03-25 12:26:54
Jenny 53 2019-04-20 09:07:02
Jenny 5 2019-03-29 06:15:35
Now I want to get all of Jack's transactions for today and overall:
Result
Person Amount_Today Amount_Overall
-----------------------------------------------
Jack 12 93
What's the most performant way to achieve this in PostgreSQL? At the moment I run two queries - this one is for Amount_Today:
select Recipient, sum(Amount)
from Transactions
where Recipient = 'Jack'
and created_at > NOW() - INTERVAL '1 day'
But that doesn't seem like the right way.
You can use the filter clause:
select Recipient,
sum(Amount) as Amount_Overall,
sum(Amount) FILTER (WHERE created_at > NOW() - INTERVAL '1 day') as Amount_Today
from Transactions
where Recipient = 'Jack'
GROUP BY recipient;
You have probably realized this, but now() - interval '1 day' is not really today, it is the last 24 hours. If you want just today, you could compare against date_trunc instead, e.g. created_at >= date_trunc('day', now()).
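The difference between "last 24 hours" and "today" is easy to miss, so here is a small Python sketch that computes both next to the overall total in one pass over the data. It only uses Jack's two rows from the question; the value of now passed in is an assumption chosen to make the two windows differ, and the function name is made up.

```python
from datetime import datetime, timedelta

# Jack's rows from the question: (recipient, amount, timestamp).
ROWS = [
    ("Jack", 12, datetime(2019, 6, 26, 12, 8, 11)),
    ("Jack", 81, datetime(2019, 3, 25, 12, 26, 54)),
]


def totals(rows, recipient, now):
    """Return (overall, last_24h, today) sums for one recipient."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    mine = [(amount, ts) for who, amount, ts in rows if who == recipient]
    overall = sum(amount for amount, _ in mine)
    # Same window as: created_at > now() - interval '1 day'
    last_24h = sum(amount for amount, ts in mine if ts > now - timedelta(days=1))
    # Same window as: created_at >= date_trunc('day', now())
    today = sum(amount for amount, ts in mine if ts >= midnight)
    return overall, last_24h, today


# Hypothetical "now": the morning after Jack's latest transaction.
result = totals(ROWS, "Jack", datetime(2019, 6, 27, 9, 0, 0))
```

With that timestamp, the transaction from the previous afternoon still falls inside the 24-hour window but is no longer part of "today".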

SQL Group by Month

I have a table of Sales data with each line as a sale date mm/dd/yy.
I'm trying to create a query so I can see total sales for each month I have.
Would I have to create a column separate that dictates only the month? Or is there a way that it can take the month from that date format?
The short answer is: You don't need a separate column. You can group by the result of a function call.
The details of what that function might be depend on your database, how you want the results formatted, and performance considerations.
The following both work in Oracle:
SELECT extract(YEAR FROM ae.saledate), extract(MONTH FROM ae.saledate), count(*)
FROM mytable ae
GROUP BY extract(YEAR FROM ae.saledate), extract(MONTH FROM ae.saledate);
SELECT TO_CHAR(ae.saledate, 'YYYY-MM'), count(*)
FROM mytable ae
GROUP BY TO_CHAR(ae.saledate, 'YYYY-MM');
Edited to add versions that ignore year and only look at month (since I was making an assumption above that wasn't actually in the question):
SELECT extract(MONTH FROM ae.saledate), count(*)
FROM mytable ae
GROUP BY extract(MONTH FROM ae.saledate);
SELECT TO_CHAR(ae.saledate, 'MM'), count(*)
FROM mytable ae
GROUP BY TO_CHAR(ae.saledate, 'MM');
The following query (SQL Server syntax) may be helpful.
CREATE TABLE #TEMP
(SalesDate DATETime,
Amount float
)
INSERT INTO #TEMP
SELECT '2016-01-12', 12
UNION
SELECT '2016-01-13', 12
UNION
SELECT '2016-02-12', 12
UNION
SELECT '2016-03-12', 12
SELECT CONVERT(VARCHAR(7), SalesDate, 120) AS 'YYYY-MM',
SUM(Amount) as 'Amount'
FROM #Temp
GROUP BY CONVERT(VARCHAR(7), SalesDate, 120)
Output:
YYYY-MM Amount
------- ----------------------
2016-01 24
2016-02 12
2016-03 12
(3 row(s) affected)
To group by month only, ignoring the year:
SELECT RIGHT(CONVERT(VARCHAR(7), SalesDate, 120), 2) AS 'MM',
sum(Amount) as 'Amount'
from #Temp
group by RIGHT(CONVERT(VARCHAR(7), SalesDate, 120), 2)
Output:
MM Amount
---- ----------------------
01 24
02 12
03 12
(3 row(s) affected)
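The underlying idea in all of these answers is the same: derive a month key from each date and sum per key. A minimal Python sketch of that, using the sample rows from the SQL Server answer (the function name is made up for the example):

```python
from collections import defaultdict
from datetime import date


def sales_by_month(rows):
    """Sum amounts per 'YYYY-MM' key; rows are (sale_date, amount) pairs."""
    totals = defaultdict(float)
    for sale_date, amount in rows:
        # The month key plays the role of TO_CHAR(..., 'YYYY-MM')
        # or CONVERT(VARCHAR(7), ..., 120) in the queries above.
        totals[sale_date.strftime("%Y-%m")] += amount
    return dict(sorted(totals.items()))


sample = [
    (date(2016, 1, 12), 12),
    (date(2016, 1, 13), 12),
    (date(2016, 2, 12), 12),
    (date(2016, 3, 12), 12),
]
monthly = sales_by_month(sample)
```

Switching the key to strftime("%m") would give the month-only grouping, at the cost of mixing the same month from different years.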