Produce a row for dates that do not exist in a table - postgresql

I have a PostgreSQL table userDistributions like this:
user_id, start_date, end_date, project_id, distribution
I need to write a query that, given a date range and a user id, outputs the sum of all distributions for each day for that user.
So for the input '2-2-2012' - '2-4-2012' and some user id, the output should look like this:
Date SUM(Distribution)
2-2-2012 12
2-3-2012 15
2-4-2012 34
A user has distribution in many projects, so I need to sum the distributions in all projects for each day and output that sum against that day.
My problem is: what should I group by? If I had a single date field (instead of start_date and end_date), I could just write something like
select date, sum(distribution) from userDistributions group by date;
but in this case I am stumped as to what to do. Thanks for the help.

Use generate_series to produce your dates, something like this:
select dt.d::date, sum(u.distribution)
from userdistributions u
join generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') as dt(d)
on dt.d::date between u.start_date and u.end_date
where u.user_id = 42 -- placeholder: the user you are reporting on
group by dt.d::date
order by dt.d::date
Your date format is ambiguous, so I guessed when converting it to ISO 8601.

This is much like @mu's answer.
However, to cover days with no matches you should use LEFT JOIN:
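SELECT d.d::date, coalesce(sum(u.distribution), 0) AS dist_sum
FROM generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') AS d(d)
LEFT JOIN userdistributions u
  ON u.user_id = 42 -- placeholder user id; keep this filter in ON so empty days survive
  AND d.d::date BETWEEN u.start_date AND u.end_date
GROUP BY 1
ORDER BY 1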
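To sanity-check what these queries return without a database, here is a minimal Python model of the same logic, filtered to one user as the question asks. It is a sketch, not part of either answer; the function name and the sample rows are hypothetical. The `keep_empty_days` flag shows the difference between the two answers: with the inner join a day with no matching distribution disappears, with the LEFT JOIN it stays. The model reports such days as 0; the SQL returns NULL for them unless the sum is wrapped in coalesce(..., 0).

```python
from datetime import date, timedelta

# Hypothetical rows: (user_id, start_date, end_date, project_id, distribution)
SAMPLE_ROWS = [
    (1, date(2012, 2, 1), date(2012, 2, 2), 10, 5),
    (1, date(2012, 2, 2), date(2012, 2, 2), 11, 7),
    (2, date(2012, 2, 3), date(2012, 2, 4), 10, 9),
]


def daily_sums(rows, user_id, first, last, keep_empty_days=True):
    """Per-day sum of distributions for one user over [first, last].

    keep_empty_days=True mimics the LEFT JOIN version (empty days -> 0);
    keep_empty_days=False mimics the inner JOIN version (empty days dropped).
    """
    result = []
    day = first
    while day <= last:  # like generate_series(first, last, '1 day'), inclusive
        # BETWEEN is inclusive on both ends: start_date <= day <= end_date
        matches = [r[4] for r in rows if r[0] == user_id and r[1] <= day <= r[2]]
        if matches or keep_empty_days:
            result.append((day, sum(matches)))
        day += timedelta(days=1)
    return result
```

For user 1 and the range 2012-02-02 to 2012-02-04, the LEFT JOIN model yields 12 for Feb 2 and 0 for the other two days, while the inner join model yields only the Feb 2 row.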

Related

Calculate the difference between two dates in business days

I'm trying to calculate the difference in business days between two dates. In my searches I found solutions using functions and generate_series(), but I would like something more practical.
Sample Table:
|start_date |end_date  |
|2022-06-01 |2022-06-01|
|2022-05-29 |2022-06-02|
What would you consider more practical? It seems generate_series and extract are specifically made for what you are asking.
select start_date, end_date, count(*) "Business days"
from sometable
cross join generate_series(start_date,end_date,interval '1 day') gs(dt)
where extract(isodow from dt) < 6
group by start_date, end_date;
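A minimal Python model of the same count can help check expected values by hand. `date.isoweekday()` uses the same Monday=1 ... Sunday=7 numbering as `isodow`, so `< 6` keeps Monday to Friday. The function name is hypothetical; both ends of the range are included, as with generate_series.

```python
from datetime import date, timedelta


def business_days(start, end):
    """Count Monday-Friday days in [start, end], both ends included."""
    count = 0
    day = start
    while day <= end:
        # isoweekday(): Monday=1 ... Sunday=7, same numbering as isodow
        if day.isoweekday() < 6:
            count += 1
        day += timedelta(days=1)
    return count
```

For the sample rows this gives 1 (2022-06-01 is a Wednesday) and 4 (Sunday 2022-05-29 to Thursday 2022-06-02). One difference worth knowing: the SQL version drops a (start_date, end_date) pair entirely when it contains no weekdays, for example a Saturday-to-Sunday range, because the WHERE clause removes all of its series rows before grouping; the model returns 0 instead.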

postgresql generating a list of dates between two dates fields

I would like to get the list of dates from two fields: start and end.
I found a case here: Show a list of dates between two dates?
But I would like to have a better solution without going through an intermediary table.
Here is the initial table:
Here is the result I would like to have:
Thank you
Use generate_series()
select t.id, t.name, g.dt::date as start_end
from the_table t
cross join generate_series(t.date_start, t.date_end, interval '1 day') as g(dt)
order by t.id, g.dt;
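What the cross join produces can be modelled in plain Python: one output row per day of each [date_start, date_end] range, ordered by id and then by day. The row shape (id, name, date_start, date_end) is inferred from the columns the query uses, since the original table was not shown; the function name and sample data are hypothetical.

```python
from datetime import date, timedelta


def expand_ranges(rows):
    """Return (id, name, day) for every day from date_start to date_end.

    rows: hypothetical (id, name, date_start, date_end) tuples.
    A row with date_start > date_end yields nothing, like generate_series.
    """
    out = []
    for row_id, name, date_start, date_end in rows:
        day = date_start
        while day <= date_end:
            out.append((row_id, name, day))
            day += timedelta(days=1)
    out.sort(key=lambda r: (r[0], r[2]))  # order by t.id, g.dt
    return out
```

Because every day becomes its own row, no intermediary calendar table is needed; the row count per input row is simply the number of days in its range.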

Postgres: How to change start day of week and apply it in date_part?

with partial as(
select
date_part('week', activated_at) as weekly,
count(*) as count
from vendors
where activated_at notnull
group by weekly
)
select weekly, count from partial order by weekly;
This query counts the number of vendors activated per week. I need to change the start day of the week from Monday to Saturday. There are similar posts, such as "how to change the first day of the week in PostgreSQL" and "Making Postgres date_trunc() use a Sunday based week", but none of them explains how to apply this inside date_part. I would like to know how to do this in my query so that weeks start on Saturday.
Thanks in advance.
This may be a bit of overkill, but you can use some CTEs and a window function. First generate your week intervals, starting with your first Saturday (e.g. 2018-01-06 00:00) and ending with the last day you want (e.g. 2018-12-31). Then select your data, join it to the intervals and sum it. As a benefit, you also get weeks with zero activations:
with temp_days as (
SELECT a as a ,
a + '7 days'::interval as e
FROM generate_series('2018-01-06 00:00'::timestamp,
'2018-12-31 00:00', '7 day') as a
),
temp_data as (
select
1 as counter,
vendors.activated_at
from vendors
where activated_at notnull
),
temp_order as
(
select *
from temp_days
left join temp_data on temp_data.activated_at >= temp_days.a
and temp_data.activated_at < temp_days.e -- half-open, so an activation at Saturday 00:00 is counted once
)
select
distinct on (temp_order.a)
temp_order.a,
temp_order.e,
coalesce(sum(temp_order.counter) over (partition by temp_order.a),0) as result
from temp_order
order by temp_order.a;
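The intended result (one row per Saturday-based week, with zero for empty weeks) can be checked with a small Python model. It assumes half-open weeks [Saturday 00:00, next Saturday 00:00). Note that it finds each week start with modular arithmetic on the weekday rather than with the CTE join, so it is a different technique from the answer with the same output shape; the function names and timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

SATURDAY = 5  # datetime.weekday(): Monday=0 ... Saturday=5, Sunday=6


def week_start_saturday(ts):
    """Midnight of the Saturday on or before ts."""
    days_back = (ts.weekday() - SATURDAY) % 7
    day = ts.date() - timedelta(days=days_back)
    return datetime(day.year, day.month, day.day)


def weekly_counts(activations, first_saturday, last_day):
    """Activations per Saturday-based week, including weeks with zero."""
    # Skip missing values, like "where activated_at notnull"
    counts = Counter(week_start_saturday(ts) for ts in activations if ts is not None)
    result = []
    week = first_saturday
    while week <= last_day:  # like generate_series(first, last, '7 day')
        result.append((week, counts.get(week, 0)))
        week += timedelta(days=7)
    return result
```

With activations at 2018-01-06 00:00, 2018-01-12 23:59 and 2018-01-13 00:00, the weeks starting 2018-01-06, 2018-01-13 and 2018-01-20 get 2, 1 and 0 respectively.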

Selecting certain columns from a table with dates as columns

I have a table whose column names are months, like "2020-05", "2020-06", "2020-07", etc., with many months as columns. I need to select only the columns for the current month, the next month and the month after that. (DB: PostgreSQL version 11)
Since the column names are text in the format YYYY-MM, how can I select only the current month and the next 2 months from this table without hard-coding the column names?
Below is the table structure (name: static_data).
The table contains 14 months of data, with the dates as columns, as in the screenshot above. From it I want the columns for the current month and the next 2 months, along with their data. The required SELECT statement is something like this:
SELECT "2020-05","2020-06","2020-07" from static_data
-- SELECT Current month and next 2 months
Required output:
It's nearly impossible to get the actual value of the current month as the column name, but you can do something like this:
select d.item_sku,
d.status,
to_jsonb(d) ->> to_char(current_date, 'yyyy-mm') as current_month,
to_jsonb(d) ->> to_char(current_date + interval '1 month', 'yyyy-mm') as "month + 1",
to_jsonb(d) ->> to_char(current_date + interval '2 month', 'yyyy-mm') as "month + 2"
from static_data d
;
Technically, you can use the information schema to achieve this. But, as GMB said, please re-design your schema and do not approach the problem this way in the first place.
The special schema information_schema contains metadata about your database, including details about existing columns. In other words, you can query the column names and convert them into dates to compare them with the months you need.
Here are a few hints.
Query existing column names.
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table'
Compare two dates.
SELECT now() + INTERVAL '3 months' < now() AS compare;
compare
---------
f
(1 row)
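The "convert the names into dates and compare" step can be sketched in Python, applied to the list of names that the information_schema query returns. This is an illustration under stated assumptions: the column names and the `today` date are hypothetical, and the selected names would still have to be quoted and spliced into a dynamic SELECT.

```python
from datetime import date


def month_index(year, month):
    """Months counted from year 0, so consecutive months differ by 1."""
    return year * 12 + (month - 1)


def wanted_columns(column_names, today, months=3):
    """Keep 'YYYY-MM' names for the current month and the following months - 1."""
    current = month_index(today.year, today.month)
    keep = []
    for name in column_names:
        try:
            year_str, month_str = name.split("-")
            year, month = int(year_str), int(month_str)
        except ValueError:
            continue  # not a 'YYYY-MM' column, e.g. item_sku or status
        if not 1 <= month <= 12:
            continue
        if current <= month_index(year, month) < current + months:
            keep.append(name)
    return keep
```

Counting months as a single integer makes the year boundary trivial: for December 2020 the window is "2020-12", "2021-01" and "2021-02".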
You're already pretty close with the conversion yourself.
Have fun and re-design your schema!
Disclaimer: this does not answer your question - but it's too long for a comment.
You need to fix the design of this table. Instead of storing dates in columns, you should have each date on a separate row.
There are numerous drawbacks to your current design:
very simple queries become utterly complicated: filtering on dates, aggregation... all of these operations require dynamic SQL, which adds a great deal of complexity
adding or removing dates requires modifying the structure of the table
storage is wasted for rows where not all columns are filled
Instead, consider this simple design, with one table that stores the master data of each item_sku, and a child table that stores one value per item_sku and date:
create table myskus (
item_sku int primary key,
name text,
cat_level_3_name text
);
create table myvalues (
item_sku int references myskus(item_sku),
date_sku date,
value_sku text,
primary key (item_sku, date_sku)
);
Now your original question is easy to solve:
select v.*, s.name, s.cat_level_3_name
from myskus s
inner join myvalues v on v.item_sku = s.item_sku
where
v.date_sku >= date_trunc('month', now())
and v.date_sku < date_trunc('month', now()) + interval '3 month'

Postgres generate_series excluding date ranges

I'm creating a subscription management system and need to generate a list of upcoming billing dates for the next 2 years. I've been able to use generate_series to get the appropriate dates like this:
SELECT i::DATE
FROM generate_series('2015-08-01', '2017-08-01', '1 month'::INTERVAL) i
The last step is to exclude specific date ranges from the calculation. These excluded ranges may span any length of time. Additionally, they should not count toward the time range covered by generate_series.
For example, say we have an excluded date range from '2015-08-27' to '2015-09-03'. The resulting series should skip that week and push all later monthly billing dates one week into the future:
2015-08-01
2015-09-10
2015-10-10
2015-11-10
2015-12-10
First you create a time series of dates over the next two years, EXCEPT your blackout dates:
SELECT dt
FROM generate_series('2015-08-01'::date, '2017-08-01'::date, interval '1 day') AS s(dt)
EXCEPT
SELECT dt
FROM generate_series('2015-08-27'::date, '2015-09-03'::date, interval '1 day') as ex1(dt)
Note that you can have as many EXCEPT clauses as you need. For individual blackout days (as opposed to ranges) you could use a VALUES clause instead of a SELECT.
Then use a window function over that time series to number the billable days:
SELECT row_number() OVER (ORDER BY dt) AS rn, dt
FROM (<query above>) x
Then you select those days where you want to bill:
SELECT dt
FROM (<query above>) y
WHERE rn % 30 = 1; -- billing on the first day of the period
(This last query follows Craig's advice of billing every 30 days.)
Putting it all together:
SELECT dt
FROM (
SELECT row_number() OVER (ORDER BY dt) AS rn, dt
FROM (
SELECT dt
FROM generate_series('2015-08-01'::date, '2017-08-01'::date, interval '1 day') AS s(dt)
EXCEPT
SELECT dt
FROM generate_series('2015-08-27'::date, '2015-09-03'::date, interval '1 day') as ex1(dt)
) x
) y
WHERE rn % 30 = 1
ORDER BY dt;
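The same steps (daily series, remove the blackout days, number the remaining days, keep every 30th) can be modelled in Python to check which dates come out. This is a sketch under the answer's 30-day-period assumption; the function names are hypothetical.

```python
from datetime import date, timedelta


def daily_series(first, last):
    """All dates from first to last inclusive, like generate_series(..., '1 day')."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def billing_dates(first, last, blackouts, period=30):
    """Every period-th billable day, starting with the first one.

    blackouts: inclusive (start, end) date ranges to skip (the EXCEPT part).
    """
    excluded = set()
    for b_start, b_end in blackouts:
        excluded.update(daily_series(b_start, b_end))
    billable = [d for d in daily_series(first, last) if d not in excluded]
    # row_number() starts at 1, so "rn % period = 1" means index % period == 0
    return [d for i, d in enumerate(billable) if i % period == 0]
```

With the sample blackout from 2015-08-27 to 2015-09-03, the first three billing dates are 2015-08-01, 2015-09-08 and 2015-10-08. The second date is 2015-09-08 rather than the 2015-09-10 in the question, because this approach bills every 30 billable days instead of on a fixed day of the month.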
You will have to split the generate_series call around the exclusions, something like this:
a UNION of 3 queries:
the first query pulls dates from your start date up to the beginning of the exclusion range;
the second query pulls dates from the end of the exclusion range to your end date;
the third query pulls dates when none of your series dates cross the exclusion range.
Note: you still need a way to loop through the exclusion list (if you have one). Also, this query may not be very efficient; such scenarios are better handled with functions or procedural code.