How to use date_part function to split value per month to each day and country - postgresql

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
country VARCHAR(255),
sales_date DATE,
sales_volume DECIMAL,
fix_costs DECIMAL
);
INSERT INTO sales
(country, sales_date, sales_volume, fix_costs
)
VALUES
('DE', '2020-01-03', '500', '2000'),
('FR', '2020-01-03', '350', '2000'),
('None', '2020-01-31', '0', '2000'),
('DE', '2020-02-15', '0', '5000'),
('FR', '2020-02-15', '0', '5000'),
('None', '2020-02-29', '0', '5000'),
('DE', '2020-03-27', '180', '4000'),
('FR', '2020-03-27', '970', '4000'),
('None', '2020-03-31', '0', '4000');
Expected Result:
sales_date | country | sales_volume | fix_costs
-------------|--------------|------------------|------------------------------------------
2020-01-03 | DE | 500 | 37.95 (= 2000/31 = 64.5 x 0.59)
2020-01-03 | FR | 350 | 26.57 (= 2000/31 = 64.5 x 0.41)
-------------|--------------|------------------|------------------------------------------
2020-02-15 | DE | 0 | 86.21 (= 5000/29 = 172.4 x 0.50)
2020-02-15 | FR | 0 | 86.21 (= 5000/29 = 172.4 x 0.50)
-------------|--------------|------------------|------------------------------------------
2020-03-27 | DE | 180 | 20.20 (= 4000/31 = 129.0 x 0.16)
2020-03-27 | FR | 970 | 108.84 (= 4000/31 = 129.0 x 0.84)
-------------|--------------|------------------|-------------------------------------------
The column fix_costs in the expected result is calculated as follows:
Step 1) Get the daily rate of the fix_costs per month. (2000/31 = 64.5; 5000/29 = 172.4; 4000/31 = 129.0)
Step 2) Split the daily value to the countries DE and FR based on their share in the sales_volume. (500/850 = 0.59; 350/850 = 0.41; 180/1150 = 0.16; 970/1150 = 0.84)
Step 3) In case the sales_volume is 0 the daily rate gets split 50/50 to DE and FR as you can see for 2020-02-15.
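The three steps can be cross-checked outside the database. The following is only a sketch in Python, not part of the original question: the function name impute_fix_costs and the tuple layout are assumptions made for illustration, and the 'None' rows are left out just as the SQL filters them.

```python
import calendar
from collections import defaultdict
from datetime import date

# Rows from the question without the 'None' placeholder rows:
# (country, sales_date, sales_volume, monthly fix_costs)
rows = [
    ("DE", date(2020, 1, 3), 500, 2000),
    ("FR", date(2020, 1, 3), 350, 2000),
    ("DE", date(2020, 2, 15), 0, 5000),
    ("FR", date(2020, 2, 15), 0, 5000),
    ("DE", date(2020, 3, 27), 180, 4000),
    ("FR", date(2020, 3, 27), 970, 4000),
]

def impute_fix_costs(rows):
    """Return {(sales_date, country): daily fix costs, rounded to 2 places}."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row[1]].append(row)

    result = {}
    for sales_date, group in by_date.items():
        total_volume = sum(volume for _, _, volume, _ in group)
        days_in_month = calendar.monthrange(sales_date.year, sales_date.month)[1]
        for country, _, volume, fix_costs in group:
            daily_rate = fix_costs / days_in_month      # step 1: daily rate
            if total_volume > 0:
                share = volume / total_volume           # step 2: share of volume
            else:
                share = 1 / len(group)                  # step 3: even split
            result[(sales_date, country)] = round(daily_rate * share, 2)
    return result

for (sales_date, country), value in sorted(impute_fix_costs(rows).items()):
    print(sales_date, country, value)
```

The even split divides by the number of countries present on that date, which is 2 for the sample data and therefore matches the 50/50 rule.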
In MariaDB I was able to do this with the below query:
SELECT
s.sales_date,
s.country,
s.sales_volume,
(CASE WHEN SUM(sales_volume) OVER (PARTITION BY sales_date) > 0
THEN ((s.fix_costs/ DAY(LAST_DAY(sales_date))) *
sales_volume / NULLIF(SUM(sales_volume) OVER (PARTITION BY sales_date), 0)
)
ELSE (s.fix_costs / DAY(LAST_DAY(sales_date))) * 1 / SUM(country <> 'None') OVER (PARTITION by sales_date)
END) AS imputed_fix_costs
FROM sales s
WHERE country <> 'None'
GROUP BY 1,2,3
ORDER BY 1;
However, in PostgreSQL I get an error on DAY(LAST_DAY(sales_date)).
I tried to replace this part with (date_part('DAY', ((date_trunc('MONTH', s.sales_date) + INTERVAL '1 MONTH - 1 DAY')::date)))
However, this is causing another error.
How do I need to modify the query to get the expected result?

The PostgreSQL equivalent of DAY(LAST_DAY(sales_date)) would be:
extract(day from (date_trunc('month', sales_date + interval '1 month') - interval '1 day'))
The expression SUM(country <> 'None') also needs to be fixed as
SUM(case when country <> 'None' then 1 else 0 end)
It might be a good idea to define this compatibility function:
create function last_day(d date) returns date as
$$
-- the date arithmetic yields a timestamp, so cast it back to the declared return type
select (date_trunc('month', d + interval '1 month') - interval '1 day')::date;
$$ language sql immutable;
Then the first expression becomes simply
extract(day from last_day(sales_date))
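The underlying idea (go to the first day of the next month, then step back one day) can be checked with a small Python sketch. It is an illustration only; the Python function last_day below merely mirrors the name of the SQL helper.

```python
from datetime import date, timedelta

def last_day(d: date) -> date:
    """Return the last day of the month that contains d."""
    first_of_month = d.replace(day=1)
    # Day 1 plus 32 days always lands somewhere in the following month.
    first_of_next_month = (first_of_month + timedelta(days=32)).replace(day=1)
    return first_of_next_month - timedelta(days=1)

print(last_day(date(2020, 2, 15)).day)  # length of February 2020
```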

I would create a function to return the last day (number) for a given date - which is actually the "length" of the month.
create function month_length(p_input date)
returns integer
as
$$
-- extract() does not return an integer, so cast to match the declared return type
select extract(day from (date_trunc('month', p_input) + interval '1 month - 1 day'))::int;
$$
language sql
immutable;
Then the query can be written as:
select sales_date, country,
sum(sales_volume) as sales_volume,
sum(fix_costs_per_day * cost_factor) as fix_costs
from (
select id, country, sales_date, sales_volume, fix_costs,
fix_costs / month_length(sales_date) as fix_costs_per_day,
case
when sum(sales_volume) over (partition by sales_date) > 0
then sales_volume::numeric / sum(sales_volume) over (partition by sales_date)
-- no sales on that date: split evenly across the countries present
else 1.0 / count(*) over (partition by sales_date)
end as cost_factor
from sales
where country <> 'None'
) t
group by sales_date, country
order by sales_date, country;

Related

postgres, group by date, and bucketize per hour

I would like to create a result object that can be used with Grafana for a heatmap. In order to display the data correctly, I need the output to look like this:
| date | 00:00 | 01:00 | 02:00 | 03:00 | ...etc |
| 2023-01-01 | 1 | 2 | 0 | 1 | ... |
| 2023-01-02 | 0 | 0 | 1 | 1 | ... |
| 2023-01-03 | 4 | 0 | 2 | 0 | ... |
my data table structure:
trades
-----
id
closed_at
asset
I know that I need to use generate_series and the interval function to return the hours, but I also need my query to output these hours as columns. I have not been able to do that, as it is getting a bit too advanced for me.
So far I have the following query:
SELECT
closed_at::DATE,
COUNT(id)
FROM trades
GROUP BY closed_at
ORDER BY closed_at
It now shows the amount of rows grouped by the days, I want to further aggregate the data, so it outputs the count per hour, as shown above.
Thanks for your help!
You can add more columns in the same way; the query below only covers 0:00 to 5:00.
filter usage: https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
date_trunc usage: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
BEGIN;
-- temporary test table; the timestamp column is named closed_at as in the question
CREATE temp TABLE trades (
id bigint GENERATED BY DEFAULT AS IDENTITY,
closed_at timestamp,
asset text
) ON COMMIT DROP;
-- random test rows for 2023-01-01
INSERT INTO trades (closed_at)
SELECT
date '2023-01-01' + interval '10 min' * (random() * i * 10)::int
FROM
generate_series(1, 10) g (i);
-- random test rows for 2023-01-02
INSERT INTO trades (closed_at)
SELECT
date '2023-01-02' + interval '10 min' * (random() * i * 10)::int
FROM
generate_series(1, 10) g (i);
-- one row per day, one filtered count per hour
SELECT
closed_at::date
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date) AS "0:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '1 hour') AS "1:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '2 hour') AS "2:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '3 hour') AS "3:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '4 hour') AS "4:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '5 hour') AS "5:00"
FROM
trades
GROUP BY
1;
END;
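To sanity-check the pivoted output, the same date-by-hour bucketing can be sketched in Python. This is an illustration under the assumption that the closed_at values have already been fetched as datetime objects; hourly_counts is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Return {date: [count for hour 0, ..., count for hour 23]}."""
    counts = Counter((ts.date(), ts.hour) for ts in timestamps)
    days = sorted({ts.date() for ts in timestamps})
    # A Counter returns 0 for missing keys, so empty hours show up as 0.
    return {day: [counts[(day, hour)] for hour in range(24)] for day in days}

sample = [
    datetime(2023, 1, 1, 0, 10),
    datetime(2023, 1, 1, 1, 5),
    datetime(2023, 1, 1, 1, 50),
    datetime(2023, 1, 2, 3, 0),
]
for day, row in hourly_counts(sample).items():
    print(day, row[:6])
```

Unlike the SQL version, this covers all 24 hours without writing one expression per column.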

Gaps and Islands - get a list of dates unemployed over a date range with PostgreSQL

I have a table called Position, in this table, I have the following, dates are inclusive (yyyy-mm-dd), below is a simplified view of the employment dates
id, person_id, start_date, end_date , title
1 , 1 , 2001-12-01, 2002-01-31, 'admin'
2 , 1 , 2002-02-11, 2002-03-31, 'admin'
3 , 1 , 2002-02-15, 2002-05-31, 'sales'
4 , 1 , 2002-06-15, 2002-12-31, 'ops'
I'd like to be able to calculate the gaps in employment, assuming some of the dates overlap to produce the following output for the person with id=1
person_id, start_date, end_date , last_position_id, gap_in_days
1 , 2002-02-01, 2002-02-10, 1 , 10
1 , 2002-06-01, 2002-06-14, 3 , 14
I have looked at numerous solutions, UNIONS, Materialized views, tables with generated calendar date ranges, etc. I really am not sure what is the best way to do this. Is there a single query where I can get this done?
step-by-step demo:db<>fiddle
You just need the lead() window function. It brings a value from the next row (start_date in this case) into the current row.
SELECT
person_id,
end_date + 1 AS start_date,
lead - 1 AS end_date,
id AS last_position_id,
lead - (end_date + 1) AS gap_in_days
FROM (
SELECT
*,
lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date)
FROM
positions
) s
WHERE lead - (end_date + 1) > 0
After getting the next start_date you are able to compare it with the current end_date. If they differ, you have a gap. These positive values can be filtered within the WHERE clause.
(if 2 positions overlap, the diff is negative. So it can be ignored.)
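Comparing each row only with the next start_date works for the sample data, but it can report a false gap when one position lies completely inside an earlier, longer one. A variant that tracks the latest end date seen so far avoids this. The following Python sketch shows that variant for a single person; the function name employment_gaps and the tuple layout are assumptions made for illustration.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def employment_gaps(positions):
    """positions: (id, start_date, end_date) tuples with inclusive dates.

    Returns (gap_start, gap_end, last_position_id, gap_in_days) tuples.
    """
    gaps = []
    latest_end = None   # latest end date seen so far
    latest_id = None    # id of the position that ends on latest_end
    for pos_id, start, end in sorted(positions, key=lambda p: p[1]):
        if latest_end is not None and start > latest_end + ONE_DAY:
            gap_start = latest_end + ONE_DAY
            gap_end = start - ONE_DAY
            gaps.append((gap_start, gap_end, latest_id, (gap_end - gap_start).days + 1))
        if latest_end is None or end > latest_end:
            latest_end, latest_id = end, pos_id
    return gaps

positions = [
    (1, date(2001, 12, 1), date(2002, 1, 31)),
    (2, date(2002, 2, 11), date(2002, 3, 31)),
    (3, date(2002, 2, 15), date(2002, 5, 31)),
    (4, date(2002, 6, 15), date(2002, 12, 31)),
]
for gap in employment_gaps(positions):
    print(gap)
```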
First, find which date ranges overlap (see "Determine Whether Two Date Ranges Overlap").
Then merge those ranges into a single one and keep the last id.
Finally, calculate the number of days between one end_date and the next start_date - 1.
SQL DEMO
with find_overlap as (
SELECT t1."id" as t1_id, t1."person_id", t1."start_date", t1."end_date",
t2."id" as t2_id, t2."start_date" as t2_start_date, t2."end_date" as t2_end_date
FROM Table1 t1
LEFT JOIN Table1 t2
ON t1."person_id" = t2."person_id"
AND t1."start_date" <= t2."end_date"
AND t1."end_date" >= t2."start_date"
AND t1.id < t2.id
), merge_overlap as (
SELECT
person_id,
start_date,
COALESCE(t2_end_date, end_date) as end_date,
COALESCE(t2_id, t1_id) as last_position_id
FROM find_overlap
WHERE t1_id NOT IN (SELECT t2_id FROM find_overlap WHERE t2_ID IS NOT NULL)
), cte as (
SELECT *,
LEAD(start_date) OVER (partition by person_id order by start_date) next_start
FROM merge_overlap
)
SELECT *,
DATE_PART('day',
(next_start::timestamp - INTERVAL '1 DAY') - end_date::timestamp
) as days
FROM cte
WHERE next_start IS NOT NULL
OUTPUT
| person_id | start_date | end_date | last_position_id | next_start | days |
|-----------|------------|------------|------------------|------------|------|
| 1 | 2001-12-01 | 2002-01-31 | 1 | 2002-02-11 | 10 |
| 1 | 2002-02-11 | 2002-05-31 | 3 | 2002-06-15 | 14 |

How to show sum per day AND year postgresql

I want to get the sum of row values per day and per year, shown on the same row.
The database that the first and second queries read from includes a table like this (ltg_data):
time lon lat geom
2018-01-30 11:20:21 -105.4333 32.3444 01010....
And then some geometries that I'm joining to.
One query:
SELECT to_char(time, 'MM/DD/YYYY') as day, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
day strikes
01/28/2018 22
03/23/2018 15
12/19/2017 20
12/20/2017 12
Second query:
SELECT to_char(time, 'YYYY') as year, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
year strikes
2017 32
2018 37
What I'd like is:
day daily_strikes year yearly_strikes
01/28/2018 22 2018 37
03/23/2018 15 2018 37
12/19/2017 20 2017 32
12/20/2017 12 2017 32
I found that union all shows the year totals at the very bottom, but I'd like to have the results horizontally, even if there are repeat yearly totals. Thanks for any help!
You can try this kind of approach. It's not optimal, but at least it works:
I have a test table like this:
postgres=# select * from test;
d | v
------------+---
2001-02-16 | a
2002-02-16 | a
2002-02-17 | a
2002-02-17 | a
(4 rows)
And query:
select
q.year,
sum(q.countPerDay) over (partition by extract(year from q.day)),
q.day,
q.countPerDay
from (
select extract('year' from d) as year, date_trunc('day', d) as day, count(*) as countPerDay from test group by day, year
) as q
So the result looks like this:
2001 | 1 | 2001-02-16 00:00:00+01 | 1
2002 | 3 | 2002-02-16 00:00:00+01 | 1
2002 | 3 | 2002-02-17 00:00:00+01 | 2
create table strikes (game_date date,
strikes int
) ;
insert into strikes (game_date, strikes)
values ('01/28/2018', 22),
('03/23/2018', 15),
('12/19/2017', 20),
('12/20/2017', 12)
;
select * from strikes ;
select game_date, strikes, sum(strikes) over(partition by extract(year from game_date) ) as sum_strikes_by_year
from strikes ;
"2017-12-19" 20 "32"
"2017-12-20" 12 "32"
"2018-01-28" 22 "37"
"2018-03-23" 15 "37"
This application of aggregation is known as "windowing" functions or analytic functions:
PostgreSQL Docs
---- EDIT --- based on comments...
create table strikes_tally (strike_time timestamp,
lat varchar(10),
long varchar(10),
geom varchar(10)
) ;
insert into strikes_tally (strike_time, lat, long, geom)
values ('2018-01-01 12:43:00', '100.1', '50.8', '1234'),
('2018-01-01 12:44:00', '100.1', '50.8', '1234'),
('2018-01-01 12:45:00', '100.1', '50.8', '1234'),
('2018-01-02 20:01:00', '100.1', '50.8', '1234'),
('2018-01-02 20:02:00', '100.1', '50.8', '1234'),
('2018-01-02 22:03:00', '100.1', '50.8', '1234') ;
select distinct -- one row per day instead of one row per strike
to_char(strike_time, 'dd/mm/yyyy') as strike_date,
count(strike_time) over(partition by to_char(strike_time, 'dd/mm/yyyy')) as daily_strikes,
to_char(strike_time, 'yyyy') as year,
count(strike_time) over(partition by to_char(strike_time, 'yyyy') ) as yearly_strikes
from strikes_tally
;
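What the window functions compute can be seen in a short Python sketch: every daily row keeps its own count and also receives the total of its year. The function name day_and_year_totals is hypothetical, and the input uses the daily counts from the question.

```python
from collections import Counter
from datetime import date

def day_and_year_totals(daily):
    """daily: {date: strikes}. Returns (day, daily, year, yearly) rows."""
    yearly = Counter()
    for day, strikes in daily.items():
        yearly[day.year] += strikes
    # Each daily row repeats the total of the year it belongs to.
    return [(day, strikes, day.year, yearly[day.year])
            for day, strikes in sorted(daily.items())]

daily = {
    date(2018, 1, 28): 22,
    date(2018, 3, 23): 15,
    date(2017, 12, 19): 20,
    date(2017, 12, 20): 12,
}
for row in day_and_year_totals(daily):
    print(row)
```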

Find date sequence in PostgreSQL

I'm trying to find the maximum sequence of consecutive days per customer in my data. If someone uses my app on 25/08/16, 26/08/16, 27/08/16, 01/09/16 and 02/09/16, the max sequence is 3 days (25, 26, 27).
In the end (the output) I want to get two fields: custid | MaxDaySequence
I have the following fields in my data table: custid | orderdate (timestamp)
For example:
custid orderdate
1 25/08/2007
1 03/10/2007
1 13/10/2007
1 15/01/2008
1 16/03/2008
1 09/04/2008
2 18/09/2006
2 08/08/2007
2 28/11/2007
2 04/03/2008
3 27/11/2006
3 15/04/2007
3 13/05/2007
3 19/06/2007
3 22/09/2007
3 25/09/2007
3 28/01/2008
I'm using PostgreSQL 2014.
Thanks
Trying:
select custid, max(num_days) as longest
from (
select custid,rn, count (*) as num_days
from (
select custid, date(orderdate),
cast (row_number() over (partition by custid order by date(orderdate)) as varchar(5)) as rn
from table_
) x group by custid, CURRENT_DATE - INTERVAL rn|| ' day'
) y group by custid
Try:
SELECT custid, max( abc ) as max_sequence_of_days
FROM (
SELECT custid, yy, count(*) abc
FROM (
SELECT * ,
SUM( xx ) OVER (partition by custid order by orderdate ) yy
FROM (
select * ,
CASE WHEN
orderdate - lag( orderdate ) over (partition by custid order by orderdate )
<= 1
THEN 0 ELSE 1 END xx
from mytable
) x
) z
GROUP BY custid, yy
) q
GROUP BY custid
Demo: http://sqlfiddle.com/#!15/00422/11
===== EDIT ===========
Got "operator does not exist: interval <= integer"
This means that orderdate column is of type timestamp, not date.
In this case you need to use <= interval '1' day condition instead of <= 1:
Please see this link: https://www.postgresql.org/docs/9.0/static/functions-datetime.html to learn more about date arithmetic in PostgreSQL
Please see this demo:
http://sqlfiddle.com/#!15/7c2200/2
SELECT custid, max( abc ) as max_sequence_of_days
FROM (
SELECT custid, yy, count(*) abc
FROM (
SELECT * ,
SUM( xx ) OVER (partition by custid order by orderdate ) yy
FROM (
select * ,
CASE WHEN
orderdate - lag( orderdate ) over (partition by custid order by orderdate )
<= interval '1' day
THEN 0 ELSE 1 END xx
from mytable
) x
) z
GROUP BY custid, yy
) q
GROUP BY custid
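The grouping idea behind these queries (start a new group whenever the distance to the previous date is more than one day) can be checked with a small Python sketch. It is an illustration only; longest_run is a hypothetical name. Note one deliberate difference: the sketch removes duplicate days first, whereas the SQL above counts rows, so two orders on the same day count twice there.

```python
from datetime import date, timedelta

def longest_run(dates):
    """Length of the longest run of consecutive calendar days in dates."""
    best = current = 0
    previous = None
    for day in sorted(set(dates)):
        if previous is not None and day - previous == timedelta(days=1):
            current += 1      # continues the current run
        else:
            current = 1       # gap found: start a new run
        best = max(best, current)
        previous = day
    return best

visits = [date(2016, 8, 25), date(2016, 8, 26), date(2016, 8, 27),
          date(2016, 9, 1), date(2016, 9, 2)]
print(longest_run(visits))
```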

Postgres: Calculating the number of working months in the last X years

I have a users table and a jobs table. A user has many jobs, and each job has a start_date and an end_date:
Column | Type | Modifiers
----------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('jobs_id_seq'::regclass)
title | character varying |
employer | character varying |
start_date | date |
end_date | date |
user_id | integer |
I need to calculate the total number of months that a person has spent working within the past X years.
I've looked at OVERLAPS and played with intervals a bit, but I can't quite figure out what I need. I want to make sure that even if the start_date is outside the X years range, I still count the months that are inside the range.
Here is what I have so far:
select sum(EXTRACT(YEAR FROM months) * 12 + EXTRACT(MONTH FROM months))
as working_months
from (
select CASE current
WHEN true THEN
age(current_date, start_date)
ELSE age(end_date, start_date)
END as months
from jobs inner join users on jobs.user_id = users.id
where users.id = 4
) as employment_time;
with jobs (start_date, end_date, user_id) as ( values
('2000-01-01'::date, '2005-12-31'::date, 1),
('2007-10-01', '2008-09-30', 1),
('2010-09-01', '2014-10-20', 1)
)
select
user_id,
extract(year from work_time) * 12 + extract(month from work_time) as months
from (
select
user_id,
sum(age(upper(period), lower(period))) as work_time
from (
select
user_id,
daterange(start_date, end_date, '[]') *
daterange((current_date - interval '10 years')::date, current_date)
as period
from jobs
) s
group by user_id
) s
;
user_id | months
---------+--------
1 | 70
Range type -
Range functions
The basic query would be this:
SELECT sum(extract(year from months) * 12 + extract(month from months)) AS working_months
FROM (
SELECT
-- age(later, earlier): the end of the job goes first, the clipped start second
age((CASE current
WHEN true THEN current_date
ELSE end_date
END)::timestamp,
(CASE (start_date, start_date) OVERLAPS (current_date, interval '-5 years')
WHEN true THEN start_date
ELSE current_date - interval '5 years'
END)::timestamp) AS months
FROM jobs
WHERE user_id = 4) AS employment_time;
You may also put this in a SQL function with parameters for the number of years and user_id. Note that you throw away partial months from individual jobs. You can add extract(day from months) / 30 to the top SELECT to harvest those partial months into full months.
This assumes that jobs cannot overlap. If they do, then the query becomes much more complex.
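The clipping step (count only the part of each job that falls inside the look-back window) can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, job dates are inclusive, the window is half-open, jobs do not overlap, and partial months are dropped per job, so the total can differ slightly from a query that sums intervals first.

```python
from datetime import date, timedelta

def clip(start, end, window_start, window_end):
    """Intersect inclusive [start, end] with half-open [window_start, window_end)."""
    lower = max(start, window_start)
    upper = min(end + timedelta(days=1), window_end)  # exclusive upper bound
    return (lower, upper) if lower < upper else None

def whole_months(lower, upper):
    """Number of complete months between lower and upper."""
    months = (upper.year - lower.year) * 12 + (upper.month - lower.month)
    if upper.day < lower.day:
        months -= 1  # the last month is not complete
    return months

def working_months(jobs, window_start, window_end):
    total = 0
    for start, end in jobs:
        period = clip(start, end, window_start, window_end)
        if period is not None:
            total += whole_months(*period)
    return total

jobs = [
    (date(2000, 1, 1), date(2005, 12, 31)),
    (date(2007, 10, 1), date(2008, 9, 30)),
    (date(2010, 9, 1), date(2014, 10, 20)),
]
# Fixed window so the result is reproducible: 10 years up to 2014-10-20.
print(working_months(jobs, date(2004, 10, 20), date(2014, 10, 20)))
```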