PostgreSQL: difference in days between two dates, grouped by month

I have a table with two date columns (start_date, end_date), and I want to calculate the difference between these dates in days, grouped by month.
I can get the difference in days, but I do not know how to break it down by month. Any suggestions?
Table:
id Start_date End_date days
1234 2014-06-03 2014-07-05 32
12345 2014-02-02 2014-05-10 97
Expected results:
month diff_days
2 26
3 30
4 31
5 10
6 27
7 5

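I think your expected output numbers are off a little, so you might want to double-check them. March has 31 days and April has 30, so those two look swapped. Note also that the query below counts both the start date and the end date, so each range comes out one day longer than your days column (33 and 98 rather than 32 and 97).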
I use a calendar table myself, but this query uses a CTE and date arithmetic. Avoiding the hard-coded date '2014-01-01' and the interval for 365 days is straightforward, but it makes the query harder to read, so I just used those values directly.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
group by calendar_month
order by calendar_month;
calendar_month count
--
2 27
3 31
4 30
5 10
6 28
7 5
As a rule of thumb, you should never group by the month alone, because that risks combining data from different years. This safer version includes the year and also restricts the output to a single calendar year.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 700) n
)
select extract (year from calendar_date) calendar_year, extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
where calendar_date between '2014-01-01' and '2014-12-31'
group by calendar_year, calendar_month
order by calendar_year, calendar_month;
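For reference, here is one way the hard-coded calendar bounds could be avoided: expand each row into its own days with generate_series and group on the first day of the month, which keeps the year as well. This is only a sketch against the same two sample rows, not a tuned query; the names days, month_start and day_count are my own.

```sql
with your_data as (
    select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
    select date '2014-02-02', date '2014-05-10'
), days as (
    -- one row per calendar day in each range, both endpoints included
    select generate_series(start_date::timestamp,
                           end_date::timestamp,
                           interval '1 day')::date as calendar_date
    from your_data
)
select date_trunc('month', calendar_date)::date as month_start,  -- hypothetical alias
       count(*) as day_count
from days
group by 1
order by 1;
```

With the sample rows this should give the same six counts as above (27, 31, 30, 10, 28, 5), keyed by 2014-02-01 through 2014-07-01. If two ranges overlap, the shared days are counted once per range.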

with min_max as (
select min(start_date) as start_date, max(end_date) as end_date
from t
), g as (
select daterange(d::date, (d + interval '1 month')::date, '[)') as r
from generate_series(
(select date_trunc('month', start_date) from min_max),
(select end_date from min_max),
'1 month'
) g(d)
)
select *
from (
select
to_char(lower(r), 'YYYY Mon') as "Month",
sum(upper(r) - lower(r)) as days
from (
select t.r * g.r as r
from
(
select daterange(start_date, end_date, '[]') as r
from t
) t
inner join
g on t.r && g.r
) s
group by 1
) s
order by to_timestamp("Month", 'YYYY Mon')
;
Month | days
----------+------
2014 Feb | 27
2014 Mar | 31
2014 Apr | 30
2014 May | 10
2014 Jun | 28
2014 Jul | 5
See the PostgreSQL documentation on range data types and on range functions and operators.
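To see what the range query does for a single row and a single month, the intersection can be tried on its own. This sketch uses the first sample row and June 2014; the aliases r and days mirror the query above, and the literal dates are only for illustration.

```sql
select r,
       upper(r) - lower(r) as days
from (
    -- '[]' includes both endpoints; '[)' excludes the upper bound.
    -- The * operator returns the intersection of the two ranges.
    select daterange(date '2014-06-03', date '2014-07-05', '[]')
         * daterange(date '2014-06-01', date '2014-07-01', '[)') as r
) s;
```

PostgreSQL stores date ranges in the canonical [) form, so the result is [2014-06-03,2014-07-01) and upper(r) - lower(r) is 28, the number of days of the row that fall in June.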

Related

I need help in writing a subquery

I have a query like this to create a date series:
Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"
And the result looks like this:
month
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Now I want to count the rows with a given status in the KYC table for each month.
So I tried this:
Select
(Select month
From
(select to_char(created_date, 'Mon') as Month,
created_date::date as start_day,
(created_date::date + interval '1 month - 1 day ')::date as end_day
from generate_series(date '2021-01-26',
date '2022-04-26', interval '1 month') as g(created_date)) AS "thang"),
count(*) filter (where status = 4) as "KYC_Success"
From kyc
group by 1
I hope the result will be like this:
Month | KYC_Success
Jan | 234
Feb | 435
Mar | 546
Apr | 157
But it fails with:
error: more than one row returned by a subquery used as an expression
What should I change in this query?
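A scalar subquery in the SELECT list may return at most one row, which is why your query fails. Join the generated months to the kyc table instead.
Let us assume that the kyc table has a timestamp column called created_date and a status column, and that you want to count the successful rows per month, including months with zero successes. Note that the series spans more than one year, so grouping by the month name alone adds January 2021 and January 2022 together; group by start_date as well if you need them separate.
SELECT thang.month
     , count(CASE WHEN kyc.status = 'success' THEN 1 END) AS successes -- use status = 4 if status is numeric, as in the question
FROM (
    SELECT to_char(created_date, 'Mon') AS month
         , created_date::date AS start_date
         , (created_date::date + interval '1 month')::date AS end_date -- exclusive upper bound
    FROM generate_series(date '2021-01-26', date '2022-04-26', interval '1 month') AS g(created_date)
) AS thang
-- half-open interval [start_date, end_date) so the last day of each period is not lost
LEFT JOIN kyc ON kyc.created_date >= thang.start_date
             AND kyc.created_date < thang.end_date
GROUP BY thang.month;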

How to show sum per day AND year postgresql

I want to get row counts per day and per year, and show both on the same row.
The database that the first and second queries read from includes a table like this (ltg_data):
time lon lat geom
2018-01-30 11:20:21 -105.4333 32.3444 01010....
And then some geometries that I'm joining to.
One query:
SELECT to_char(time, 'MM/DD/YYYY') as day, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
day strikes
01/28/2018 22
03/23/2018 15
12/19/2017 20
12/20/2017 12
Second query:
SELECT to_char(time, 'YYYY') as year, count(*) as strikes FROM counties JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom) WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours' group by 1;
Results are like:
year strikes
2017 32
2018 37
What I'd like is:
day daily_strikes year yearly_strikes
01/28/2018 22 2018 37
03/23/2018 15 2018 37
12/19/2017 20 2017 32
12/20/2017 12 2017 32
I found that union all shows the year totals at the very bottom, but I'd like to have the results horizontally, even if there are repeat yearly totals. Thanks for any help!
You can try this kind of approach. It is not optimal, but at least it works.
I have a test table like this:
postgres=# select * from test;
d | v
------------+---
2001-02-16 | a
2002-02-16 | a
2002-02-17 | a
2002-02-17 | a
(4 rows)
And query:
select
q.year,
sum(q.countPerDay) over (partition by extract(year from q.day)),
q.day,
q.countPerDay
from (
select extract('year' from d) as year, date_trunc('day', d) as day, count(*) as countPerDay from test group by day, year
) as q
So the result looks like this:
2001 | 1 | 2001-02-16 00:00:001 | 1
2002 | 3 | 2002-02-16 00:00:001 | 1
2002 | 3 | 2002-02-17 00:00:001 | 2
create table strikes (game_date date,
strikes int
) ;
insert into strikes (game_date, strikes)
values ('01/28/2018', 22),
('03/23/2018', 15),
('12/19/2017', 20),
('12/20/2017', 12)
;
select * from strikes ;
select game_date, strikes, sum(strikes) over(partition by extract(year from game_date) ) as sum_strikes_by_year
from strikes ;
"2017-12-19" 20 "32"
"2017-12-20" 12 "32"
"2018-01-28" 22 "37"
"2018-03-23" 15 "37"
This use of aggregates is known as window functions (also called analytic functions); see the PostgreSQL documentation on window functions.
---- EDIT --- based on comments...
create table strikes_tally (strike_time timestamp,
lat varchar(10),
long varchar(10),
geom varchar(10)
) ;
insert into strikes_tally (strike_time, lat, long, geom)
values ('2018-01-01 12:43:00', '100.1', '50.8', '1234'),
('2018-01-01 12:44:00', '100.1', '50.8', '1234'),
('2018-01-01 12:45:00', '100.1', '50.8', '1234'),
('2018-01-02 20:01:00', '100.1', '50.8', '1234'),
('2018-01-02 20:02:00', '100.1', '50.8', '1234'),
('2018-01-02 22:03:00', '100.1', '50.8', '1234') ;
select to_char(strike_time, 'dd/mm/yyyy') as strike_date,
count(strike_time) over(partition by to_char(strike_time, 'dd/mm/yyyy')) as daily_strikes,
to_char(strike_time, 'yyyy') as year,
count(strike_time) over(partition by to_char(strike_time, 'yyyy') ) as yearly_strikes
from strikes_tally
;
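The query above returns one row per strike, so each day is repeated. If one row per day is wanted, as in the question, a possible variant is to aggregate per day first and then apply the window function to the daily counts. This is a sketch against the strikes_tally test table above; the CTE name daily is my own, and the output date format is left as the default.

```sql
with daily as (
    -- one row per calendar day with its strike count
    select strike_time::date as strike_date,
           count(*) as daily_strikes
    from strikes_tally
    group by 1
)
select strike_date,
       daily_strikes,
       extract(year from strike_date) as year,
       -- sum of the daily counts across all days of the same year
       sum(daily_strikes) over (partition by extract(year from strike_date)) as yearly_strikes
from daily
order by strike_date;
```

With the six test rows this should return two rows, 2018-01-01 and 2018-01-02, each with 3 daily strikes and 6 yearly strikes.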

How to query hourly aggregated data by date with postgresql?

There is one table:
ID DATE
1 2017-09-16 20:12:48
2 2017-09-16 20:38:54
3 2017-09-16 23:58:01
4 2017-09-17 00:24:48
5 2017-09-17 00:26:42
..
The result I need is the last 7 days of data, with an hourly count of rows:
COUNT DATE
2 2017-09-16 21:00:00
0 2017-09-16 22:00:00
0 2017-09-16 23:00:00
1 2017-09-17 00:00:00
2 2017-09-17 01:00:00
..
I tried different approaches with EXTRACT and DISTINCT, and also used the generate_series function (mostly taken from similar Stack Overflow questions).
This attempt is the best one so far:
SELECT
date_trunc('hour', demotime) as date,
COUNT(demotime) as count
FROM demo
GROUP BY date
How can I generate an hourly series for 7 days and fill in the row counts?
SELECT dd, count("demotime")
FROM generate_series
( current_date - interval '7 days'
, current_date
, '1 hour'::interval) dd
LEFT JOIN Table1
ON dd = date_trunc('hour', demotime)
GROUP BY dd;
To cover the 7 days up to the current hour instead of up to midnight today:
SELECT dd, count("demotime")
FROM generate_series
( date_trunc('hour', NOW()) - interval '7 days'
, date_trunc('hour', NOW())
, '1 hour'::interval) dd
LEFT JOIN Table1
ON dd = date_trunc('hour', demotime)
GROUP BY dd;

PostgreSQL Query: Column Sum for the latest available date of each month

Given a PostgreSQL table which looks like this:
date | data
2015-01-23 | 15
2015-01-23 | 11
2015-02-25 | 15
2015-02-25 | 11
2015-01-25 | 24
2015-01-25 | 2
2015-01-25 | 13
2015-01-29 | 5
2015-02-28 | 12
2015-02-28 | 1
2015-05-15 | 12
2015-05-16 | 1
How can I get the sum of data for the last available date of each month?
Example result:
date | data
2015-01-29 | 5
2015-02-28 | 13
2015-05-16 | 1
This is what I've tried so far:
SELECT year,month,max(day),sum(data) FROM
(
SELECT
date,
date_part('year', date) AS year,
date_part('month', date) AS month,
date_part('day', date) AS day,
sum(data) AS tdata
FROM table a
GROUP BY date, date_part('year', date), date_part('month', date), date_part('day', date)
ORDER BY year ASC, month ASC, day ASC
) dataq
GROUP BY year,month
The sum I get from this appears to be wrong.
You should calculate the sums in the inner query, grouping by a single day. Then select the latest day of each month in the outer query:
select distinct on (year, month)
make_date(year::int, month::int, day::int) as date,
data
from (
select
date_part('year', date) as year,
date_part('month', date) as month,
date_part('day', date) as day,
sum(data) as data
from my_table
group by date
) s
order by year, month, day desc
date | data
------------+------
2015-01-29 | 5
2015-02-28 | 13
2015-05-16 | 1
(3 rows)
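The same idea can probably be written more compactly by letting date_trunc identify the month instead of splitting the date into parts. This is a sketch under the same assumptions (a table my_table with columns date and data), not a replacement for the query above.

```sql
select distinct on (date_trunc('month', date))
       date,
       sum(data) as data          -- total for that single day
from my_table
group by date
-- DISTINCT ON keeps the first row per month, so sort the latest day first
order by date_trunc('month', date), date desc;
```

For the sample data this should return the same three rows: 2015-01-29 with 5, 2015-02-28 with 13, and 2015-05-16 with 1.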
I guess you just need to exclude the days that you don't want to sum, for example with NOT EXISTS:
SELECT year,month,max(day),sum(tdata) tdata FROM
(
SELECT
d,
date_part('year', d) AS year,
date_part('month', d) AS month,
date_part('day', d) AS day,
sum(data) AS tdata
FROM tab a
WHERE NOT EXISTS
(
SELECT *
FROM tab a2
WHERE date_part('year', a.d) = date_part('year', a2.d) AND
date_part('month', a.d) = date_part('month', a2.d) AND
date_part('day', a.d) < date_part('day', a2.d)
)
GROUP BY d, date_part('year', d), date_part('month', d), date_part('day', d)
ORDER BY year ASC, month ASC, day ASC
) dataq
GROUP BY year,month

Getting data from postgres weekly (according to date)

user timespent(in sec) date(in timestamp)
u1 10 t1(2015-08-15)
u1 20 t2(2015-08-19)
u1 15 t3(2015-08-28)
u1 16 t4(2015-09-06)
Above is the format of my table, which records the time spent by a user on a course, ordered by timestamp. I want to get the weekly sum of timespent for a particular user, say u1, in this format:
start_date end_date sum
2015-08-15 2015-08-21 30
2015-08-22 2015-08-28 15
2015-08-29 2015-09-04 0
2015-09-05 2015-09-11 16
The difficulty is that the seven-day periods you want are not regular weeks starting on Monday.
You therefore cannot use the standard functions that derive a week number from a date, and have to build your own weeks with generate_series().
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrarily chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
------------+------------+-------
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
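The inner join followed by an outer join can likely be collapsed into a single left join, with the user filter moved into the join condition so that empty weeks are kept. This is a sketch against the sessions example table above. It also uses a half-open comparison, on the assumption that session_date may carry a time of day, which a plain between on the end date would miss.

```sql
with weeks as (
    select d::date as start_date, d::date + 6 as end_date
    from generate_series(timestamp '2015-08-15',
                         timestamp '2015-09-06',
                         interval '7 days') as d
)
select w.start_date, w.end_date, coalesce(sum(s.time_spent), 0) as total
from weeks w
left join sessions s
       on s.user_name = 'u1'                 -- filter here, not in WHERE, to keep empty weeks
      and s.session_date >= w.start_date
      and s.session_date <  w.end_date + 1   -- up to, but excluding, the next week's start
group by w.start_date, w.end_date
order by w.start_date;
```

For the example data this should produce the same four rows as above, with totals 30, 15, 0 and 16.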
select
ui,
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
To make Sunday the first day of the week, I added an extra day to "date" in the group by.
select (start_date - date_part('dow', start_date) * interval '1 day')::date start_date,
(start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date end_date,
total_time_spent
from (
select min("date") start_date, sum(timespent) total_time_spent
from mytable
where "user" = 'u1' -- user is a reserved word and must be quoted; u1 is a string literal
group by date_part('year', "date"), date_part('week', "date" + interval '1 day')) "tmp"
order by start_date
This is a more generic approach, for any date interval.