PostgreSQL Query: Column Sum for the latest available date of each month

Given a PostgreSQL table which looks like this:
date | data
2015-01-23 | 15
2015-01-23 | 11
2015-02-25 | 15
2015-02-25 | 11
2015-01-25 | 24
2015-01-25 | 2
2015-01-25 | 13
2015-01-29 | 5
2015-02-28 | 12
2015-02-28 | 1
2015-05-15 | 12
2015-05-16 | 1
How can I get the sum of data for the last available date of each month?
Example result:
date | data
2015-01-29 | 5
2015-02-28 | 13
2015-05-16 | 1
This is what I've tried so far:
SELECT year,month,max(day),sum(data) FROM
(
SELECT
date,
date_part('year', date) AS year,
date_part('month', date) AS month,
date_part('day', date) AS day,
sum(data) AS tdata
FROM table a
GROUP BY date, date_part('year', date), date_part('month', date), date_part('day', date)
ORDER BY year ASC, month ASC, day ASC
) dataq
GROUP BY year,month
The sum I get from this appears to be wrong.

Calculate the sums in the inner query, grouping by date, then select the latest day of each month in the outer query:
select distinct on (year, month)
make_date(year::int, month::int, day::int) as date,
data
from (
select
date_part('year', date) as year,
date_part('month', date) as month,
date_part('day', date) as day,
sum(data) as data
from my_table
group by date
) s
order by year, month, day desc
date | data
------------+------
2015-01-29 | 5
2015-02-28 | 13
2015-05-16 | 1
(3 rows)
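The same idea can be written more compactly by letting date_trunc() identify the month instead of splitting the date into parts. This is only a sketch, assuming the same my_table(date, data) layout as the query above:

```sql
-- Sum per day, then keep only the latest day within each month.
-- DISTINCT ON keeps the first row of each group in ORDER BY order,
-- so sorting by date descending within a month selects the last date.
select distinct on (date_trunc('month', date))
       date,
       sum(data) as data
from my_table
group by date
order by date_trunc('month', date), date desc;
```

Because date_trunc('month', ...) keeps the year, months from different years are not mixed together.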

You just need to exclude the days that you don't want to sum, i.e. every day that is not the last available one in its month. For example, using NOT EXISTS:
SELECT year,month,max(day),sum(tdata) tdata FROM
(
SELECT
d,
date_part('year', d) AS year,
date_part('month', d) AS month,
date_part('day', d) AS day,
sum(data) AS tdata
FROM tab a
WHERE NOT EXISTS
(
SELECT *
FROM tab a2
WHERE date_part('year', a.d) = date_part('year', a2.d) AND
date_part('month', a.d) = date_part('month', a2.d) AND
date_part('day', a.d) < date_part('day', a2.d)
)
GROUP BY d, date_part('year', d), date_part('month', d), date_part('day', d)
ORDER BY year ASC, month ASC, day ASC
) dataq
GROUP BY year,month

Related

Postgresql split date range by year parts (financial year)

I have a table like follows:
id start_date end_date
1 2020-01-01 2020-05-01
2 2020-03-01 2021-04-02
I need to be able to split the rows by financial year (e.g. 2020-04-01 -> 2021-03-31).
So the result of the query would be as follows:
id start_date end_date
1 2020-01-01 2020-03-31
1 2020-04-01 2020-05-01
2 2020-03-01 2020-03-31
2 2020-04-01 2021-03-31
2 2021-04-01 2021-04-02
Actually another post helped me resolve this: Date split-up based on Fiscal Year
DROP TABLE IF EXISTS your_table;
CREATE TABLE your_table (id int, start_date date, end_date date);
INSERT INTO your_table VALUES (1, '2020-01-01', '2020-05-01');
INSERT INTO your_table VALUES (2, '2020-03-01', '2021-04-02');
SELECT
id,
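-- make_date() avoids building 'DD-MM-YYYY' strings, which only parse correctly when DateStyle is DMY
GREATEST(start_date, make_date(series.year, 4, 1)) AS year_start,
LEAST(end_date, make_date(series.year + 1, 3, 31)) AS year_end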
FROM
(SELECT
id,
start_date,
end_date,
generate_series(
date_part('year', your_table.start_date - INTERVAL '3 months')::int,
date_part('year', your_table.end_date - INTERVAL '3 months')::int)
FROM your_table) AS series(id, start_date, end_date, year)
ORDER BY
id, year_start;
Result:
"id","year_start","year_end"
1,"2020-01-01","2020-03-31"
1,"2020-04-01","2020-05-01"
2,"2020-03-01","2020-03-31"
2,"2020-04-01","2021-03-31"
2,"2021-04-01","2021-04-02"
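The key step in the query above is subtracting three months before taking the year, which maps each date to the calendar year in which its financial year starts. A small sketch of that mapping on a few hand-picked dates (the alias fiscal_year is only illustrative):

```sql
-- A financial year running 1 April to 31 March is labelled by its starting year.
-- Shifting a date back by three months moves 1 April onto 1 January,
-- so the calendar year of the shifted date is the fiscal year.
select d as day,
       date_part('year', d - interval '3 months')::int as fiscal_year
from (values (date '2020-01-01'),
             (date '2020-03-31'),
             (date '2020-04-01'),
             (date '2021-04-02')) as v(d);
-- 2020-01-01 and 2020-03-31 map to 2019, 2020-04-01 to 2020, 2021-04-02 to 2021
```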

First record by month & by year

I have a Rails application with 20+ years of data.
I'm struggling to create two SQL queries:
Fetch the first record of each year (based on filters)
Fetch the first record of each month (based on filters)
I made a DBFiddle here: https://www.db-fiddle.com/f/wjQqrrpaJeiYG8zkExbaos/0
For the first query (yearly), the result should be:
a | b_id | created_at
74780 | 82373 | 2020-01-02 01:34:33 +0000
15670 | 16639 | 2019-02-24 14:33:56 +0000
14586 | 87594 | 2018-01-06 09:14:31 +0000
I can fetch the years and months using date_part('year', created_at) and date_part('month', created_at), but didn't find a way to "glue" them with min(created_at).
Try using a window function with OVER:
with grouped as(
select *, min(created_at) over(partition by date_trunc('year', created_at))
from z order by date_trunc('year', created_at) desc
)
select a, b_id, created_at from grouped where min = created_at
For the first record of each month, use the same approach and replace every date_trunc('year', created_at) with date_trunc('month', created_at).
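PostgreSQL's DISTINCT ON is another way to pick one row per period. A minimal sketch, assuming the same table z with columns a, b_id and created_at as in the query above:

```sql
-- One row per month: the row with the earliest created_at in that month.
-- Swap 'month' for 'year' to get the first record of each year instead.
select distinct on (date_trunc('month', created_at))
       a, b_id, created_at
from z
order by date_trunc('month', created_at), created_at;
```

Unlike the min() window version, this returns exactly one row per period even when several rows share the earliest timestamp; which of the tied rows is returned is not defined unless another column is added to ORDER BY.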

How to show sum per day AND year postgresql

I want to get row counts per day and per year, and show both on the same row.
The database that the first and second queries get their results from includes a table like this (ltg_data):
time lon lat geom
2018-01-30 11:20:21 -105.4333 32.3444 01010....
And then some geometries that I'm joining to.
One query:
SELECT to_char(time, 'MM/DD/YYYY') as day, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
day strikes
01/28/2018 22
03/23/2018 15
12/19/2017 20
12/20/2017 12
Second query:
SELECT to_char(time, 'YYYY') as year, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
year strikes
2017 32
2018 37
What I'd like is:
day daily_strikes year yearly_strikes
01/28/2018 22 2018 37
03/23/2018 15 2018 37
12/19/2017 20 2017 32
12/20/2017 12 2017 32
I found that union all shows the year totals at the very bottom, but I'd like to have the results horizontally, even if there are repeat yearly totals. Thanks for any help!
You can try this kind of approach. It's not optimal, but at least it works:
I have a test table like this:
postgres=# select * from test;
d | v
------------+---
2001-02-16 | a
2002-02-16 | a
2002-02-17 | a
2002-02-17 | a
(4 rows)
And query:
select
q.year,
sum(q.countPerDay) over (partition by extract(year from q.day)),
q.day,
q.countPerDay
from (
select extract('year' from d) as year, date_trunc('day', d) as day, count(*) as countPerDay from test group by day, year
) as q
So the result looks like this:
2001 | 1 | 2001-02-16 00:00:00+01 | 1
2002 | 3 | 2002-02-16 00:00:00+01 | 1
2002 | 3 | 2002-02-17 00:00:00+01 | 2
create table strikes (game_date date,
strikes int
) ;
insert into strikes (game_date, strikes)
values ('01/28/2018', 22),
('03/23/2018', 15),
('12/19/2017', 20),
('12/20/2017', 12)
;
select * from strikes ;
select game_date, strikes, sum(strikes) over(partition by extract(year from game_date) ) as sum_strikes_by_year
from strikes ;
"2017-12-19" 20 "32"
"2017-12-20" 12 "32"
"2018-01-28" 22 "37"
"2018-03-23" 15 "37"
Applying an aggregate this way, without collapsing the rows, is done with window functions (also called analytic functions):
PostgreSQL Docs
---- EDIT --- based on comments...
create table strikes_tally (strike_time timestamp,
lat varchar(10),
long varchar(10),
geom varchar(10)
) ;
insert into strikes_tally (strike_time, lat, long, geom)
values ('2018-01-01 12:43:00', '100.1', '50.8', '1234'),
('2018-01-01 12:44:00', '100.1', '50.8', '1234'),
('2018-01-01 12:45:00', '100.1', '50.8', '1234'),
('2018-01-02 20:01:00', '100.1', '50.8', '1234'),
('2018-01-02 20:02:00', '100.1', '50.8', '1234'),
('2018-01-02 22:03:00', '100.1', '50.8', '1234') ;
-- distinct collapses the per-strike rows into one row per day
select distinct to_char(strike_time, 'dd/mm/yyyy') as strike_date,
count(strike_time) over(partition by to_char(strike_time, 'dd/mm/yyyy')) as daily_strikes,
to_char(strike_time, 'yyyy') as year,
count(strike_time) over(partition by to_char(strike_time, 'yyyy') ) as yearly_strikes
from strikes_tally
;
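The daily and yearly counts can also be produced in one pass by grouping per day and then applying a window function to the aggregate itself. A minimal sketch, assuming the strikes_tally table defined above:

```sql
-- count(*) is evaluated per day by GROUP BY;
-- sum(count(*)) over (...) then adds those daily counts up per year.
select to_char(strike_time::date, 'MM/DD/YYYY')   as day,
       count(*)                                   as daily_strikes,
       extract(year from strike_time::date)       as year,
       sum(count(*)) over (
           partition by extract(year from strike_time::date)
       )                                          as yearly_strikes
from strikes_tally
group by strike_time::date
order by strike_time::date;
```

Window functions are evaluated after grouping, which is why an aggregate may appear inside the window call. With the six sample rows above, this should return one row per day, each carrying the same yearly total.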

Getting data from postgres weekly (according to date)

user timespent(in sec) date(in timestamp)
u1 10 t1(2015-08-15)
u1 20 t2(2015-08-19)
u1 15 t3(2015-08-28)
u1 16 t4(2015-09-06)
Above is the format of my table, which represents the time spent by a user on a course, ordered by timestamp. I want to get the sum of timespent for a particular user, say u1, per week, in this format:
start_date end_date sum
2015-08-15 2015-08-21 30
2015-08-22 2015-08-28 15
2015-08-29 2015-09-04 0
2015-09-05 2015-09-11 16
The difficulty is that the seven-day periods you want are not regular weeks starting on Monday.
You therefore cannot use the standard functions that derive a week number from a date, and have to generate your own weeks with generate_series().
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrary chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
------------+------------+-------
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
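The same result can be reached with a single left join from the generated weeks to the sessions. This is only a sketch, assuming the sessions table from the example data above; it also uses a half-open comparison so that a session late on the last day of a week is still counted:

```sql
with weeks as (
    -- one row per seven-day period, starting on the chosen first day
    select d::date as start_date, d::date + 6 as end_date
    from generate_series(timestamp '2015-08-15',
                         timestamp '2015-09-06',
                         interval '7 days') as d
)
select w.start_date,
       w.end_date,
       coalesce(sum(s.time_spent), 0) as total
from weeks w
left join sessions s
       on s.user_name = 'u1'
      and s.session_date >= w.start_date
      and s.session_date <  w.end_date + 1   -- up to, not including, the next week
group by w.start_date, w.end_date
order by w.start_date;
```

Keeping the user filter in the join condition rather than in WHERE preserves weeks with no sessions, which then show a total of 0.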
select
ui,
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
To make Sunday the first day of the week, I added an extra day to "date" in the group by.
select (start_date - date_part('dow', start_date) * interval '1 day')::date start_date,
(start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date end_date,
total_time_spent
from (
select min("date") start_date, sum(timespent) total_time_spent
from mytable
where "user" = 'u1' -- user is a reserved word, so the column name must be quoted
group by date_part('year', "date"), date_part('week', "date" + interval '1 day')) "tmp"
order by start_date
This is a more generic approach, for any date interval.

postgresql daysdiff between two dates grouped by month

I have a table with two date columns (start_date, end_date), and I want to calculate the difference between these dates, grouped by month.
I am able to get the difference in days, but I do not know how to group it by month. Any suggestions?
Table:
id Start_date End_date days
1234 2014-06-03 2014-07-05 32
12345 2014-02-02 2014-05-10 97
Expected results:
month diff_days
2 26
3 30
4 31
5 10
6 27
7 5
I think your expected output numbers are off a little. You might want to double-check.
I use a calendar table myself, but this query uses a CTE and date arithmetic. Avoiding the hard-coded date '2014-01-01' and the interval for 365 days is straightforward, but it makes the query harder to read, so I just used those values directly.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
group by calendar_month
order by calendar_month;
calendar_month count
--
2 27
3 31
4 30
5 10
6 28
7 5
As a rule of thumb, you should never group by the month alone--doing that risks grouping data from different years. This is a safer version that includes the year, and which also restricts output to a single calendar year.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 700) n
)
select extract (year from calendar_date) calendar_year, extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
where calendar_date between '2014-01-01' and '2014-12-31'
group by calendar_year, calendar_month
order by calendar_year, calendar_month;
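The hard-coded calendar can be avoided by expanding each row into its own days with generate_series(). A minimal sketch, using the same two sample ranges and counting both endpoints, as the queries above do:

```sql
with your_data(start_date, end_date) as (
    values (date '2014-06-03', date '2014-07-05'),
           (date '2014-02-02', date '2014-05-10')
)
select date_trunc('month', g.day)::date as month_start,
       count(*)                         as days
from your_data
-- one generated row per day in each range, both endpoints included
cross join lateral generate_series(start_date::timestamp,
                                   end_date::timestamp,
                                   interval '1 day') as g(day)
group by 1
order by 1;
```

Grouping by the first day of the month keeps the year in the key, so ranges that span several years are not merged by month number.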
with min_max as (
select min(start_date) as start_date, max(end_date) as end_date
from t
), g as (
select daterange(d::date, (d + interval '1 month')::date, '[)') as r
from generate_series(
(select date_trunc('month', start_date) from min_max),
(select end_date from min_max),
'1 month'
) g(d)
)
select *
from (
select
to_char(lower(r), 'YYYY Mon') as "Month",
sum(upper(r) - lower(r)) as days
from (
select t.r * g.r as r
from
(
select daterange(start_date, end_date, '[]') as r
from t
) t
inner join
g on t.r && g.r
) s
group by 1
) s
order by to_timestamp("Month", 'YYYY Mon')
;
Month | days
----------+------
2014 Feb | 27
2014 Mar | 31
2014 Apr | 30
2014 May | 10
2014 Jun | 28
2014 Jul | 5
Range data types
Range functions and operators