SQL get count grouped by week and type given a date interval - tsql

Given a table of projects with start and end dates as well as the region in which they take place, I am trying to output the number of active projects per week over a given interval, grouped by region.
I have many records in Projects that look like
region      start_date  end_date
Alabama     2012-07-08  2012-08-15
Texas       2012-06-13  2012-07-24
Alabama     2012-07-25  2012-09-13
Texas       2012-08-08  2012-10-28
Florida     2012-07-03  2012-08-07
Louisiana   2012-07-14  2012-08-12
....
If I want results for a single week, I can do something like
DECLARE @today datetime
SET @today = '2012-11-09'
SELECT
    [Region],
    count(*) as ActiveProjectCount
FROM [MyDatabase].[dbo].[Projects]
WHERE (datecompleted is null and datestart < @today) OR (datestart < @today AND @today < datecompleted)
GROUP BY region
ORDER BY region asc
This produces
Region        ActiveProjectCount
Arkansas      15
Louisiana     18
North Dakota  18
Oklahoma      27
...
How can I alter this query to produce results that look like
Region        10/06  10/13  10/20  10/27
Arkansas      15     17     12     14
Louisiana     3      0      1      5
North Dakota  18     17     16     15
Oklahoma      27     23     19     22
...
where, for each weekly interval, I can see the total number of active projects (projects whose start and end dates span that week)?

You could do something like this:
with "nums"
as
(
select 1 as "value"
union all select "value" + 1 as "value"
from "nums"
where "value" <= 52
)
, "intervals"
as
(
select
"id" = "value"
, "startDate" = cast( dateadd( week, "value" - 1, dateadd(year, datediff(year, 0, getdate()), 0)) as date )
, "endDate" = cast( dateadd( week, "value", dateadd( year, datediff( year, 0, getdate()), 0 )) as date )
from
"nums"
)
, "counted"
as
(
select
"intervalId" = I."id"
, I."startDate"
, I."endDate"
, D."region"
, "activeProjects" = count(D."region") over ( partition by I."id", D."region" )
from
"intervals" I
inner join "data" D
on D."startDate" <= I."startDate"
and D."endDate" > I."endDate"
)
select
*
from
(
select
"region"
, "intervalId"
from
"counted"
) as Data
pivot
( count(Data."intervalId") for intervalId in ("25", "26", "27", "28", "29", "30", "31")) as p
The intervals can be defined however you wish.
See SQL Fiddle.
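If you would rather avoid PIVOT, a conditional-aggregation version can produce the same shape directly from the Projects table in the question. This is only a sketch under the assumption that the columns are named datestart and datecompleted, as in the single-week query above, and that the report weeks are the hard-coded dates 2012-10-06 through 2012-10-27; adjust the week boundaries as needed.
-- Hypothetical week boundaries; replace with the weeks you want to report on.
DECLARE @w1 date = '2012-10-06', @w2 date = '2012-10-13', @w3 date = '2012-10-20', @w4 date = '2012-10-27'

SELECT
    [Region],
    -- each SUM repeats the single-week "active" test from the question for one reference date
    SUM(CASE WHEN datestart < @w1 AND (datecompleted IS NULL OR @w1 < datecompleted) THEN 1 ELSE 0 END) AS [10/06],
    SUM(CASE WHEN datestart < @w2 AND (datecompleted IS NULL OR @w2 < datecompleted) THEN 1 ELSE 0 END) AS [10/13],
    SUM(CASE WHEN datestart < @w3 AND (datecompleted IS NULL OR @w3 < datecompleted) THEN 1 ELSE 0 END) AS [10/20],
    SUM(CASE WHEN datestart < @w4 AND (datecompleted IS NULL OR @w4 < datecompleted) THEN 1 ELSE 0 END) AS [10/27]
FROM [MyDatabase].[dbo].[Projects]
GROUP BY [Region]
ORDER BY [Region]
Each column counts the projects whose date range covers that week's reference date, so no recursive CTE or PIVOT is needed; the trade-off is that you have to list the weeks by hand (or build the query dynamically).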

Related

How to show sum per day AND year postgresql

I want to get summed row values per day and per year, and show them on the same row.
The database that the first and second queries get results from includes a table like this (ltg_data):
time                 lon        lat      geom
2018-01-30 11:20:21  -105.4333  32.3444  01010....
And then some geometries that I'm joining to.
One query:
SELECT to_char(time, 'MM/DD/YYYY') as day, count(*) as strikes
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours'
GROUP BY 1;
Results are like:
day         strikes
01/28/2018  22
03/23/2018  15
12/19/2017  20
12/20/2017  12
Second query:
SELECT to_char(time, 'YYYY') as year, count(*) as strikes
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR' and time >= (now() at time zone 'utc') - interval '50500 hours'
GROUP BY 1;
Results are like:
year  strikes
2017  32
2018  37
What I'd like is:
day         daily_strikes  year  yearly_strikes
01/28/2018  22             2018  37
03/23/2018  15             2018  37
12/19/2017  20             2017  32
12/20/2017  12             2017  32
I found that UNION ALL shows the year totals at the very bottom, but I'd like to have the results arranged horizontally, even if the yearly totals repeat. Thanks for any help!
You can try this kind of approach. It's not very optimal, but at least it works:
I have a test table like this:
postgres=# select * from test;
     d      | v
------------+---
 2001-02-16 | a
 2002-02-16 | a
 2002-02-17 | a
 2002-02-17 | a
(4 rows)
And query:
select
q.year,
sum(q.countPerDay) over (partition by extract(year from q.day)),
q.day,
q.countPerDay
from (
select extract('year' from d) as year, date_trunc('day', d) as day, count(*) as countPerDay from test group by day, year
) as q
So the result looks like this:
2001 | 1 | 2001-02-16 00:00:00 | 1
2002 | 3 | 2002-02-16 00:00:00 | 1
2002 | 3 | 2002-02-17 00:00:00 | 2
create table strikes (game_date date,
strikes int
) ;
insert into strikes (game_date, strikes)
values ('01/28/2018', 22),
('03/23/2018', 15),
('12/19/2017', 20),
('12/20/2017', 12)
;
select * from strikes ;
select game_date, strikes, sum(strikes) over(partition by extract(year from game_date) ) as sum_strikes_by_year
from strikes ;
"2017-12-19" 20 "32"
"2017-12-20" 12 "32"
"2018-01-28" 22 "37"
"2018-03-23" 15 "37"
This application of aggregation is known as a window (or analytic) function:
PostgreSQL Docs
---- EDIT --- based on comments...
create table strikes_tally (strike_time timestamp,
lat varchar(10),
long varchar(10),
geom varchar(10)
) ;
insert into strikes_tally (strike_time, lat, long, geom)
values ('2018-01-01 12:43:00', '100.1', '50.8', '1234'),
('2018-01-01 12:44:00', '100.1', '50.8', '1234'),
('2018-01-01 12:45:00', '100.1', '50.8', '1234'),
('2018-01-02 20:01:00', '100.1', '50.8', '1234'),
('2018-01-02 20:02:00', '100.1', '50.8', '1234'),
('2018-01-02 22:03:00', '100.1', '50.8', '1234') ;
select to_char(strike_time, 'dd/mm/yyyy') as strike_date,
count(strike_time) over(partition by to_char(strike_time, 'dd/mm/yyyy')) as daily_strikes,
to_char(strike_time, 'yyyy') as year,
count(strike_time) over(partition by to_char(strike_time, 'yyyy') ) as yearly_strikes
from strikes_tally
;
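The query above returns one row per strike, because window functions do not collapse rows. As a minimal follow-up sketch (assuming the same strikes_tally table), adding DISTINCT collapses the output to one row per day, which matches the shape the question asks for:
-- one row per day, with its daily count and the count for its year
select distinct
       to_char(strike_time, 'dd/mm/yyyy') as strike_date,
       count(*) over (partition by to_char(strike_time, 'dd/mm/yyyy')) as daily_strikes,
       to_char(strike_time, 'yyyy') as year,
       count(*) over (partition by to_char(strike_time, 'yyyy')) as yearly_strikes
from strikes_tally;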

How does this Time Difference Calculation work?

I wanted to display the difference in HH:MM:SS between two datetime fields in SQL Server 2014.
I found a solution in this Stack Overflow post, and it works perfectly, but I want to understand why it arrives at the correct answer.
T-SQL:
SELECT y.CustomerID ,
y.createDate ,
y.HarvestDate ,
y.DateDif ,
DATEDIFF ( DAY, 0, y.DateDif ) AS [Days] ,
DATEPART ( HOUR, y.DateDif ) AS [Hours] ,
DATEPART ( MINUTE, y.DateDif ) AS [Minutes]
FROM (
SELECT x.createDate - x.HarvestDate AS [DateDif] ,
x.createDate ,
x.HarvestDate ,
x.CustomerID
FROM (
SELECT CustomerID ,
HarvestDate ,
createDate
FROM dbo.CustomerHarvestReports
WHERE HarvestDate >= DATEADD ( MONTH, -6, GETDATE ())
) AS [x]
) AS [y]
ORDER BY DATEDIFF ( DAY, 0, y.DateDif ) DESC;
Results:
CustomerID  createDate               HarvestDate              DateDif                  Days  Hours  Minutes
1239090     2017-11-07 08:51:03.870  2017-10-14 11:39:49.540  1900-01-24 21:11:14.330  23    21    11
1239090     2017-11-07 08:51:04.823  2017-10-19 11:17:48.320  1900-01-19 21:33:16.503  18    21    33
1843212     2017-10-27 19:14:02.070  2017-10-21 10:49:57.733  1900-01-07 08:24:04.337  6     8     24
1843212     2017-10-27 19:14:03.057  2017-10-21 10:49:57.733  1900-01-07 08:24:05.323  6     8     24
The first column is CustomerID; the second and third columns are the ones I wanted to calculate the time difference between. The fourth column is the difference between those two columns, and it is one of the parts of the code I do not understand.
If you subtract two datetime fields like this (createDate - HarvestDate), why does the result default to the year 1900?
And regarding DATEDIFF ( DAY, 0, y.DateDif ) - what does the 0 mean? Does the 0 set the date as '01-01-1900'?
It works, and for that I am grateful, but I was hoping to get an explanation of why this behavior works.
I've added some comments that should explain it:
SELECT y.CustomerID ,
y.createDate ,
y.HarvestDate ,
y.DateDif ,
DATEDIFF ( DAY, 0, y.DateDif ) AS [Days] , -- calculates the number of whole days between 0 and the difference
DATEPART ( HOUR, y.DateDif ) AS [Hours] , -- the number of hours between the two dates has already been cleverly
-- calculated in [DateDif], therefore, all that is required is to extract
-- that figure using DATEPART
DATEPART ( MINUTE, y.DateDif ) AS [Minutes] -- same explanation as [Hours]
FROM (
SELECT x.createDate - x.HarvestDate AS [DateDif] , -- calculates the difference expressed as a datetime;
-- 0 is '1900-01-01 00:00:00.000' as a datetime, so the
-- resulting datetime will be that plus the difference
x.createDate ,
x.HarvestDate ,
x.CustomerID
FROM (
SELECT CustomerID ,
HarvestDate ,
createDate
FROM dbo.CustomerHarvestReports
WHERE HarvestDate >= DATEADD ( MONTH, -6, GETDATE ())
) AS [x]
) AS [y]
ORDER BY DATEDIFF ( DAY, 0, y.DateDif ) DESC;
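A small standalone illustration of the base-date behavior, using the values from the first result row above: the datetime type is stored as an offset from 1900-01-01, so subtracting two datetimes yields a datetime whose offset from 1900-01-01 equals the difference, and the integer 0 in DATEDIFF is implicitly converted to that same base date.
-- The difference of the two datetimes, expressed as a datetime offset from 1900-01-01
SELECT CAST('2017-11-07 08:51:03.870' AS datetime)
     - CAST('2017-10-14 11:39:49.540' AS datetime) AS DateDif;                   -- 1900-01-24 21:11:14.330

-- 0 converts to 1900-01-01, so DATEDIFF counts the whole days contained in the difference
SELECT DATEDIFF(DAY, 0, CAST('1900-01-24 21:11:14.330' AS datetime)) AS Days,    -- 23
       DATEPART(HOUR,   CAST('1900-01-24 21:11:14.330' AS datetime)) AS Hours,   -- 21
       DATEPART(MINUTE, CAST('1900-01-24 21:11:14.330' AS datetime)) AS Minutes; -- 11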

Number of entries between dates

I have a table with the following structure:
day         id
2016-03-13  123
2016-03-13  123
2016-03-13  231
2016-03-14  231
2016-03-14  231
2016-03-15  129
And I'd like to build a table that looks like:
id   d1  d7  d14
123  1   1   1
231  1   2   2
129  1   1   1
Essentially, for a given id, list the number of days that have an entry within each time window. So if id 123 has entries on 10 different days within the last 14 days, d14 would be 10.
So far I have:
SELECT
day,
id
FROM
events
WHERE
datediff (DAY, day, getdate()) <= 7
GROUP BY
day,
id
This query will do:
SELECT
id,
COUNT(DISTINCT CASE WHEN current_date - day <= 1 THEN day END) d1,
COUNT(DISTINCT CASE WHEN current_date - day <= 7 THEN day END) d7,
COUNT(DISTINCT CASE WHEN current_date - day <= 14 THEN day END) d14
FROM
events
GROUP BY
id
ORDER BY
id
Or, since PostgreSQL 9.4, slightly more concise:
SELECT
id,
COUNT(DISTINCT day) FILTER (WHERE current_date - day <= 1) d1,
COUNT(DISTINCT day) FILTER (WHERE current_date - day <= 7) d7,
COUNT(DISTINCT day) FILTER (WHERE current_date - day <= 14) d14
FROM
events
GROUP BY
id
ORDER BY
id
try this:
SELECT id
, count(case when DAY = cast(getdate() as date) then 1 else null end) as d1
, count(case when DAY + 7 >= getdate() then 1 else null end) as d7
, count(case when DAY + 14 >= getdate() then 1 else null end) as d14
FROM events
WHERE DAY >= getdate() - 14
--or if you can have day > today ... and DAY between getdate() - 14 and getdate()
GROUP By id
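Since the question itself uses getdate() and datediff, here is a minimal SQL Server flavoured sketch of the same idea (assuming the events(day, id) table above). Each conditional count only counts distinct days inside its trailing window, which is what the expected output shows:
-- counts the distinct days with at least one entry in each trailing window
SELECT id,
       COUNT(DISTINCT CASE WHEN [day] >= CAST(DATEADD(DAY, -1,  GETDATE()) AS date) THEN [day] END) AS d1,
       COUNT(DISTINCT CASE WHEN [day] >= CAST(DATEADD(DAY, -7,  GETDATE()) AS date) THEN [day] END) AS d7,
       COUNT(DISTINCT CASE WHEN [day] >= CAST(DATEADD(DAY, -14, GETDATE()) AS date) THEN [day] END) AS d14
FROM events
GROUP BY id
ORDER BY id;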

postgresql daysdiff between two dates grouped by month

I have a table with date columns (start_date, end_date), and I want to calculate the difference between these dates, grouped by month.
I am able to get the difference in days, but I do not know how to group it by month. Any suggestions?
Table:
id     start_date  end_date    days
1234   2014-06-03  2014-07-05  32
12345  2014-02-02  2014-05-10  97
Expected results:
month  diff_days
2      26
3      30
4      31
5      10
6      27
7      5
I think your expected output numbers are off a little. You might want to double-check.
I use a calendar table myself, but this query uses a CTE and date arithmetic. Avoiding the hard-coded date '2014-01-01' and the interval for 365 days is straightforward, but it makes the query harder to read, so I just used those values directly.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
group by calendar_month
order by calendar_month;
calendar_month  count
--------------  -----
2               27
3               31
4               30
5               10
6               28
7               5
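For example, the February bucket covers 2014-02-02 through 2014-02-28 inclusive, which is 27 days rather than the 26 in the question's expected output; a quick check:
-- counts the calendar days from 2014-02-02 to 2014-02-28 inclusive
select count(*) from generate_series(date '2014-02-02', date '2014-02-28', interval '1 day');  -- 27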
As a rule of thumb, you should never group by the month alone--doing that risks grouping data from different years. This is a safer version that includes the year, and which also restricts output to a single calendar year.
with your_data as (
select date '2014-06-03' as start_date, date '2014-07-05' as end_date union all
select '2014-02-02', '2014-05-10'
), calendar as (
select date '2014-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 700) n
)
select extract (year from calendar_date) calendar_year, extract (month from calendar_date) calendar_month, count(*) from calendar
inner join your_data on calendar.calendar_date between start_date and end_date
where calendar_date between '2014-01-01' and '2014-12-31'
group by calendar_year, calendar_month
order by calendar_year, calendar_month;
SQL Fiddle
with min_max as (
select min(start_date) as start_date, max(end_date) as end_date
from t
), g as (
select daterange(d::date, (d + interval '1 month')::date, '[)') as r
from generate_series(
(select date_trunc('month', start_date) from min_max),
(select end_date from min_max),
'1 month'
) g(d)
)
select *
from (
select
to_char(lower(r), 'YYYY Mon') as "Month",
sum(upper(r) - lower(r)) as days
from (
select t.r * g.r as r
from
(
select daterange(start_date, end_date, '[]') as r
from t
) t
inner join
g on t.r && g.r
) s
group by 1
) s
order by to_timestamp("Month", 'YYYY Mon')
;
Month | days
----------+------
2014 Feb | 27
2014 Mar | 31
2014 Apr | 30
2014 May | 10
2014 Jun | 28
2014 Jul | 5
Range data types
Range functions and operators
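As a small illustration of the range operators used above (borrowing the second sample row from the question): && tests whether two ranges overlap, * returns their intersection, and subtracting the bounds of a daterange gives its length in days.
select daterange('2014-02-02', '2014-05-10', '[]') && daterange('2014-02-01', '2014-03-01', '[)') as overlaps,      -- true
       daterange('2014-02-02', '2014-05-10', '[]') *  daterange('2014-02-01', '2014-03-01', '[)') as february_part, -- [2014-02-02,2014-03-01)
       upper(daterange('2014-02-02', '2014-03-01', '[)'))
     - lower(daterange('2014-02-02', '2014-03-01', '[)')) as days;                                                  -- 27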

Rolling sum per time interval per group

Table, data and task as follows.
See the SQL Fiddle link for demo data and expected results.
create table "data"
(
"item" int
, "timestamp" date
, "balance" float
, "rollingSum" float
)
insert into "data" ( "item", "timestamp", "balance", "rollingSum" ) values
( 1, '2014-02-10', -10, -10 )
, ( 1, '2014-02-15', 5, -5 )
, ( 1, '2014-02-20', 2, -3 )
, ( 1, '2014-02-25', 13, 10 )
, ( 2, '2014-02-13', 15, 15 )
, ( 2, '2014-02-16', 15, 30 )
, ( 2, '2014-03-01', 15, 45 )
I need to get all rows in a defined time interval. The above table doesn't hold a record per item for every possible date; only dates on which changes were applied are recorded (it is possible that there are n rows per timestamp per item).
If the given interval does not line up exactly with stored timestamps, the latest timestamp before the start date (nearest smaller neighbour) should be used for the starting balance/rolling sum.
Expected results (time interval: startdate = '2014-02-13', enddate = '2014-02-20'):
"item", "timestamp" , "balance", "rollingSum"
1 , '2014-02-13' , -10 , -10
1 , '2014-02-15' , 5 , -5
1 , '2014-02-20' , 2 , -3
2 , '2014-02-13' , 15 , 15
2 , '2014-02-16' , 15 , 30
I checked questions like this one and googled a lot, but haven't found a solution yet.
I don't think it's a good idea to extend the "data" table with one row per missing date per item, since the complete interval (smallest date to latest date per item) may span several years.
Thanks in advance!
select sum(balance)
from table
where timestamp >= (select max(timestamp) from table where timestamp <= 'startdate')
and timestamp <= 'enddate'
Don't know what you mean by rolling-sum.
Here is an attempt. It seems to give the right result, though it's not pretty. It would have been easier in SQL Server 2012+:
declare @from date = '2014-02-13'
declare @to date = '2014-02-20'
;with x as
(
select
item, timestamp, balance, row_number() over (partition by item order by timestamp, balance) rn
from (select item, timestamp, balance from data
union all
select distinct item, @from, null from data) z
where timestamp <= @to
)
, y as
(
select item,
timestamp,
coalesce(balance, rollingsum) balance ,
a.rollingsum,
rn
from x d
cross apply
(select sum(balance) rollingsum from x where rn <= d.rn and d.item = item) a
where timestamp between '2014-02-13' and '2014-02-20'
)
select item, timestamp, balance, rollingsum from y
where rollingsum is not null
order by item, rn, timestamp
Result:
item timestamp balance rollingsum
1 2014-02-13 -10,00 -10,00
1 2014-02-15 5,00 -5,00
1 2014-02-20 2,00 -3,00
2 2014-02-13 15,00 15,00
2 2014-02-16 15,00 30,00
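For what it's worth, here is a sketch of the SQL Server 2012+ variant hinted at above (same "data" table and @from/@to values; this is an assumption-laden rewrite, not the original answer): a windowed SUM computes the running balance directly, and a row_number per item picks the latest row before @from so it can carry the opening balance into the interval.
declare @from date = '2014-02-13';
declare @to   date = '2014-02-20';

with running as
(
    select  item,
            [timestamp],
            balance,
            -- running balance per item up to and including this row
            sum(balance) over (partition by item
                               order by [timestamp], balance
                               rows unbounded preceding) as rollingSum,
            -- 1 = latest row within its group (rows before @from vs. rows from @from on)
            row_number() over (partition by item,
                                            case when [timestamp] < @from then 0 else 1 end
                               order by [timestamp] desc, balance desc) as rn
    from    [data]
    where   [timestamp] <= @to
)
select  item,
        -- rows before the interval are reported on the interval start date...
        case when [timestamp] < @from then @from else [timestamp] end as [timestamp],
        -- ...and their accumulated sum becomes the opening balance
        case when [timestamp] < @from then rollingSum else balance end as balance,
        rollingSum
from    running
where   [timestamp] >= @from
        or (rn = 1 and [timestamp] < @from)   -- latest row before @from, if any
order by item, 2;
On the sample data this returns the five expected rows; the cross apply from the answer above is replaced by the windowed running sum, which is the part that requires SQL Server 2012 or later.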