PostgreSQL grouping timestamp by day

I have a table x(x_id, ts), where ts is a timestamp.
And I have a second table y(y_id, day, month, year), which is supposed to have its values from x(ts).
(Both x_id and y_id are serial)
For example:
x:
 x_id |          ts
------+-----------------------
    1 | '2019-10-17 09:10:08'
    2 | '2019-01-26 11:12:02'

y:
 y_id | day | month | year
------+-----+-------+------
    1 |  17 |    10 | 2019
    2 |  26 |     1 | 2019
However, if x has 2 timestamps on the same day but at different hours, this is how both tables should look:
x:
 x_id |          ts
------+-----------------------
    1 | '2019-10-17 09:10:08'
    2 | '2019-10-17 11:12:02'

y:
 y_id | day | month | year
------+-----+-------+------
    1 |  17 |    10 | 2019
Meaning y can't have 2 rows with the same day, month and year.
Currently, the way I'm doing this is:
INSERT INTO y(day, month, year)
SELECT
    EXTRACT(day FROM ts) AS day,
    EXTRACT(month FROM ts) AS month,
    EXTRACT(year FROM ts) AS year
FROM x
ORDER BY year, month, day;
However, as you probably know, this doesn't check if the timestamps share the same date, so how can I do that?
Thank you for your time!

Assuming you build the unique index as recommended in the other answer, change your insert to:
insert into y(day, month, year)
select extract(day from ts) as day
     , extract(month from ts) as month
     , extract(year from ts) as year
from x
on conflict do nothing;
I hope your table x is not very large, as the above insert (like your original) will attempt to insert a row into y for every row in x on every execution, since there is no WHERE clause.
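If that ever becomes a problem, one way to limit the rework is an anti-join against y, so dates that are already recorded are filtered out before the conflict check. A minimal sketch, assuming the x and y tables above:

insert into y(day, month, year)
select distinct
       extract(day from ts) as day
     , extract(month from ts) as month
     , extract(year from ts) as year
from x
where not exists (
        select 1
        from y
        where y.day   = extract(day from x.ts)
          and y.month = extract(month from x.ts)
          and y.year  = extract(year from x.ts)
      )
on conflict do nothing;

The filter does not replace the unique index; it only avoids re-sending rows that are already known to exist.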

Add a UNIQUE constraint on table y to prevent adding the same date twice.
CREATE UNIQUE INDEX CONCURRENTLY y_date
ON y (year, month, day);
Then add it to y:
ALTER TABLE y
ADD CONSTRAINT y_unique_date
UNIQUE USING INDEX y_date;
Note that you'll get an SQL error when the constraint is violated. If you don't want that and would rather silently ignore the INSERT, use a BEFORE INSERT trigger that returns NULL when it detects the "date" already exists, or simply use ON CONFLICT DO NOTHING in your INSERT statement, as hinted by @Belayer.
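For reference, a minimal sketch of that trigger variant (PostgreSQL 11+ syntax; the function and trigger names are purely illustrative):

CREATE FUNCTION y_skip_duplicate_date() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF EXISTS (
        SELECT 1 FROM y
        WHERE day = NEW.day AND month = NEW.month AND year = NEW.year
    ) THEN
        -- Returning NULL from a BEFORE ROW trigger silently discards the row.
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER y_skip_duplicate_date
BEFORE INSERT ON y
FOR EACH ROW EXECUTE FUNCTION y_skip_duplicate_date();

ON CONFLICT DO NOTHING is usually the simpler option; the trigger is mainly interesting if you cannot create the unique index.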

Related

PostgreSQL to get the xth business day for the given month

Get the xth business day of a calendar month. For example, for Nov '21 the 3rd business day is 3rd November, but for Oct '21 the 3rd business day is 5th October. We need to build a query or function to get this dynamically, excluding weekends (0, 6) and any public holidays (from a table with public holidays).
I believe we don't have a direct calendar function in Postgres; maybe we can take the month and an integer (the xth business day) as input and return a date.
If the input is Nov/11 (month) and 3 (xth business day), the output should be '2021-11-03'.
create or replace function nth_bizday(y integer, m integer, bizday integer)
returns date language sql as
$$
select max(d)
from (
    select d
    from generate_series
    (
        make_date(y, m, 1),
        make_date(y, m, 1) + interval '1 month - 1 day',
        interval '1 day'
    ) t(d)
    where extract(isodow from d) < 6
    -- and not exists (select from nb_days where nb_day = d)
    order by d          -- make the LIMIT deterministic
    limit bizday
) t;
$$;
select nth_bizday(2021, 11, 11);
-- 2021-11-15
If you want to skip other non-business days besides weekends, then the WHERE clause should be extended as @SQLPro suggests, something like this (assuming you have the non-business days listed in a table, nb_days):
where extract(isodow from d) < 6
and not exists (select from nb_days where nb_day = d)
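For completeness, a minimal sketch of what that nb_days table could look like (the layout and the sample row are purely illustrative):

create table nb_days (
    nb_day      date primary key,  -- one row per public holiday or other non-business day
    description text
);

-- hypothetical sample entry, for illustration only
insert into nb_days (nb_day, description)
values ('2021-11-04', 'example public holiday');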
Business days are generally specific to the organization... You must create a CALENDAR table with one row per date, from the beginning to the end of the period you care about, and a boolean column that indicates whether a day is on or off...
Then a view can compute the nth "on" day for every month...
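A rough sketch of that approach (the table and view names, the date range, and the weekend-only is_work_day rule are assumptions for illustration):

create table calendar (
    ddate       date primary key,
    is_work_day boolean not null
);

insert into calendar
select d::date,
       extract(isodow from d) < 6   -- weekends off; holidays would be updated afterwards
from generate_series('2021-01-01'::date, '2025-12-31'::date, interval '1 day') g(d);

create view business_days as
select ddate,
       date_trunc('month', ddate)::date as month_start,
       row_number() over (partition by date_trunc('month', ddate) order by ddate) as nth_business_day
from calendar
where is_work_day;

-- e.g. the 3rd business day of November 2021
select ddate
from business_days
where month_start = '2021-11-01' and nth_business_day = 3;
-- 2021-11-03, matching the example in the question

Marking public holidays would then just be an UPDATE setting is_work_day to false for the relevant dates.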

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application that shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
    count(Event.id).label('count'),
    extract('year', Event.date).label('year'),
    extract('month', Event.date).label('month')
).filter(
    Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count | Year | Month
------+------+------
  100 | 2021 |     1
   50 | 2021 |     2
   75 | 2021 |     3
While this is okay on its own, I want it to display the running total over time, not just the number of events in that month, so the desired output should be:
Count | Year | Month
------+------+------
  100 | 2021 |     1
  150 | 2021 |     2
  225 | 2021 |     3
I read in various places that I should use a window function via SQLAlchemy's over function; however, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
    count(Event.id).over(
        order_by=(
            extract('year', Event.date),
            extract('month', Event.date)
        ),
        partition_by=Event.date
    ).label('count'),
    extract('year', Event.date).label('year'),
    extract('month', Event.date).label('month')
).filter(
    Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
    extract('year', Event.date).label('year'),
    extract('month', Event.date).label('month'),
    func.sum(func.count(Event.id)).over(order_by=(
        extract('year', Event.date),
        extract('month', Event.date)
    )).label('count'),
).filter(
    Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want the number of events up to that month, for each month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single SELECT statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
    SELECT extract(year FROM events.date) AS year
         , extract(month FROM events.date) AS month
         , COUNT(events.id) AS events_count
    FROM events
    GROUP BY 1, 2
) monthly
but not sure how to do that in SqlAlchemy

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name      | Start      | End
----------+------------+------------
Project 1 | 2020-08-01 | 2020-09-10
Project 2 | 2020-01-01 | 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p."end"), '1 day'::interval) days
where extract(dow from days) not in (0, 6)
group by d_month
where p.start is from the aliased main query and the dates are hard-coded for now, this correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name      | Start      | End        | 2020-08-01 | 2020-09-01 | ...
----------+------------+------------+------------+------------+-----
Project 1 | 2020-08-01 | 2020-09-10 |         21 |          8 |
Project 2 | 2020-01-01 | 2025-01-01 |         21 |         10 |
Or simply return the whole subquery as JSON, but that doesn't seem to be working either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
    ddate date not null primary key,
    is_work_day boolean default true
);

insert into calendar
select ts::date as ddate,
       extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
       '2000-01-01'::timestamp,
       '2099-12-31'::timestamp,
       interval '1 day'
     ) as gs(ts);
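For illustration, a hedged sketch of the kind of join that calendar table enables (it reuses the my_projects table and column names from the query below, so those are assumptions here):

select p.name,
       date_trunc('month', c.ddate)::date as month_start,
       count(*) as work_days
from my_projects p
join calendar c on c.ddate between p.start and p."end"
where c.is_work_day
group by p.name, month_start
order by p.name, month_start;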
Assuming a calendar table is not within scope, you can do this:
with bounds as (
    select min(start) as first_start, max("end") as last_end
    from my_projects
), cal as (
    select ts::date as ddate,
           extract(dow from ts) not in (0,6) as is_work_day
    from bounds
    cross join generate_series(
           first_start,
           last_end,
           interval '1 day'
         ) as gs(ts)
), bymonth as (
    select p.name, p.start, p."end",
           date_trunc('month', c.ddate) as month_start,
           count(*) as work_days
    from my_projects p
    join cal c on c.ddate between p.start and p."end"
    where c.is_work_day
    group by p.name, p.start, p."end", month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
       || jsonb_object_agg('name', name)
       || jsonb_object_agg('start', start)
       || jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.
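If the per-month counts are later needed as rows again (for instance in application code), here is a hedged sketch of unpacking that JSON with jsonb_each_text; the CTE is only a stand-in for the query above, with the literal values taken from the question's expected output for Project 1:

with results as (
    -- stand-in for the query above, for illustration only
    select '{"name": "Project 1", "start": "2020-08-01", "end": "2020-09-10",
             "2020-08-01": 21, "2020-09-01": 8}'::jsonb as result
)
select r.result->>'name' as name,
       e.key             as month_start,
       e.value::int      as work_days
from results r
cross join lateral jsonb_each_text(r.result) as e
where e.key not in ('name', 'start', 'end')
order by 1, 2;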

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
 id | unique | start_date | end_date   | data
----+--------+------------+------------+------
  1 | a      | 2019-01-01 | 2019-01-31 | X
  2 | a      | 2019-02-01 | 2019-02-28 | Y
  3 | b      | 2019-01-01 | 2019-06-30 | Y
Plan Table:
 id | item_unique | start_date | end_date
----+-------------+------------+------------
  1 | a           | 2019-01-01 | 2019-01-10
  2 | a           | 2019-01-15 | 'infinity'
I am trying to find a way to produce the following
Missing:
 item_unique | from       | to
-------------+------------+------------
 a           | 2019-01-11 | 2019-01-14
 b           | 2019-01-01 | 2019-06-30
step-by-step demo: db<>fiddle
WITH excepts AS (
    SELECT
        item,
        generate_series(start_date, end_date, interval '1 day') gs
    FROM items
    EXCEPT
    SELECT
        item,
        generate_series(start_date, CASE WHEN end_date = 'infinity' THEN (SELECT MAX(end_date) AS max_date FROM items) ELSE end_date END, interval '1 day')
    FROM plan
)
SELECT
    item,
    MIN(gs::date) AS start_date,
    MAX(gs::date) AS end_date
FROM (
    SELECT
        *,
        SUM(same_day) OVER (PARTITION BY item ORDER BY gs) AS sum
    FROM (
        SELECT
            item,
            gs,
            COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) AS same_day
        FROM excepts
    ) s
) s
GROUP BY item, sum
ORDER BY 1, 2
Finding the missing days is quite simple. This is done within the WITH clause:
Generate all days of each item's date range and subtract from that the expanded day list of the second table. All dates that do not occur in the second table are kept. The infinity end is a little bit tricky, so I replaced the infinity occurrence with the max date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is to reaggregate this list again, which is the part outside the WITH clause:
The lag() window function fetches the previous date. If the gap to that previous date is more than one day, the expression yields 1, otherwise 0. (A time-change issue showed up here: this is why I am not checking for a one-day difference but a two-day difference. Between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time.)
These 0 and 1 values are summed cumulatively. Every gap greater than one day starts a new interval (the days in between are covered).
This yields a groupable column which can be used to aggregate and find the min and max date of each interval.
I tried something with date ranges, which seems to be a better way, especially for avoiding the expansion of long date lists, but didn't come up with a proper solution. Maybe someone else will?
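For the record, a hedged sketch of that range-based idea on PostgreSQL 14+ (assuming the same items and plan tables and column names as in the query above): range_agg collapses each item's rows into a multirange, and subtracting the plan multirange leaves exactly the gaps, without expanding any day lists.

WITH covered AS (
    SELECT item, range_agg(daterange(start_date, end_date, '[]')) AS r
    FROM items
    GROUP BY item
), planned AS (
    SELECT item, range_agg(daterange(start_date, end_date, '[]')) AS r
    FROM plan
    GROUP BY item
)
SELECT c.item,
       lower(gap)     AS start_date,
       upper(gap) - 1 AS end_date     -- daterange upper bounds are exclusive
FROM covered c
LEFT JOIN planned p USING (item)
CROSS JOIN LATERAL unnest(c.r - COALESCE(p.r, '{}'::datemultirange)) AS gap
ORDER BY 1, 2;
-- a | 2019-01-11 | 2019-01-14
-- b | 2019-01-01 | 2019-06-30

The 'infinity' end date needs no special handling here, since an unbounded plan range simply removes everything from that date onwards.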

PostgreSQL Selecting The Closest Previous Month of June

I am trying to write a piece for a query that grabs the closest, past June 1st. For example, today is 10/2/2018. If I run the query today, I need it to use the date 6/1/2018. If I run it on 5/29/2019, it still needs to grab 6/1/2018. If I run it on 6/2/2019, it should then grab 6/1/2019. If I run it on 6/2/2022, it should then grab 6/1/2022 and so on.
I believe I need to start with something like this:
SELECT CASE WHEN EXTRACT(MONTH FROM NOW())>=6 THEN 'CURRENT' ELSE 'RF LAST' END AS X
--If month is greater than or equal to 6, you are in the CURRENT YEAR (7/1/CURRENT YEAR)
--If month is less than 6, then reference back to the last year (YEAR MINUS ONE)
And I believe I need to truncate the date and then perform an operation. I am unsure which approach to take (whether I should be adding a year to a timestamp such as '6/1/1900', or whether I should try to disassemble the date parts and perform an operation). I keep getting errors in my attempts, such as "operator does not exist". Things I have tried include:
SELECT (CURRENT_DATE- (CURRENT_DATE-INTERVAL '7 months'))
--This does not work as it just gives me a count of days.
SELECT (DATE_TRUNC('month',NOW())+TIMESTAMP'1900-01-01 00:00:00')
--Variations of this just don't work and generally error out.
Use a CASE expression to determine whether you need to use the current year or the previous year (months 1 to 5):
case when extract(month from current_date) >= 6 then 0 else -1 end
then add that to the year extracted from current_date, e.g. using to_date()
select to_Date('06' || (extract(year from current_date)::int + case when extract(month from current_date) >= 6 then 0 else -1 end)::varchar, 'mmYYYY');
You could also use make_date(year int, month int, day int) in postgres 9.4+
select make_date(extract(year from current_date)::int + case when extract(month from current_date) >= 6 then 0 else -1 end, 6, 1);
If the month is lower than 6, truncate to the year and subtract 7 months (June 1st of the previous year).
Else truncate to the year and add 5 months (June 1st of the current year).
set datestyle to SQL,MDY;
select
    case when extract(month from date::date) < 6
         then date_trunc('year', date) - interval '7 months'
         else date_trunc('year', date) + interval '5 months'
    end as closest_prev_june,
    another_column,
    another_column2
from mytable;
But the date format is assumed to be the default, and it is supposed that you have a column named date.
If you want to do this with now(), replace the date column with the now() function, as in the sketch below.
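A minimal sketch of that now() variant (assuming the June-1st target from the question, hence the 5/7 month offsets from January 1st):

select case
         when extract(month from now()) < 6
           then date_trunc('year', now()) - interval '7 months'
         else date_trunc('year', now()) + interval '5 months'
       end::date as closest_prev_june;
-- run on 2018-10-02 this gives 2018-06-01; run on 2019-05-29 it still gives 2018-06-01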