Rolling sum per time interval per group - tsql

Table, data and task as follows.
See SQL-Fiddle-Link for demo-data and estimated results.
create table "data"
(
"item" int
, "timestamp" date
, "balance" float
, "rollingSum" float
)
insert into "data" ( "item", "timestamp", "balance", "rollingSum" ) values
( 1, '2014-02-10', -10, -10 )
, ( 1, '2014-02-15', 5, -5 )
, ( 1, '2014-02-20', 2, -3 )
, ( 1, '2014-02-25', 13, 10 )
, ( 2, '2014-02-13', 15, 15 )
, ( 2, '2014-02-16', 15, 30 )
, ( 2, '2014-03-01', 15, 45 )
I need to get all rows in an defined time interval. The above table doesn't hold a record per item for each possible date - only dates on which changes applied are recorded ( it is possible that there are n rows per timestamp per item )
If the given interval does not fit exactly on stored timestamps, the latest timestamp before startdate ( nearest smallest neighbour ) should be used as start-balance/rolling-sum.
estimated results ( time interval: startdate = '2014-02-13', enddate = '2014-02-20' )
"item", "timestamp" , "balance", "rollingSum"
1 , '2014-02-13' , -10 , -10
1 , '2014-02-15' , 5 , -5
1 , '2014-02-20' , 2 , -3
2 , '2014-02-13' , 15 , 15
2 , '2014-02-16' , 15 , 30
I checked questions like this and googled a lot, but didn't found a solution yet.
I don't think it's a good idea to extend "data" table with one row per missing date per item, thus the complete interval ( smallest date <-----> latest date per item may expand over several years ).
Thanks in advance!

select sum(balance)
from table
where timestamp >= (select max(timestamp) from table where timestamp <= 'startdate')
and timestamp <= 'enddate'
Don't know what you mean by rolling-sum.

here is an attempt. Seems it gives the right result, not so beautiful. Would have been easier in sqlserver 2012+:
declare #from date = '2014-02-13'
declare #to date = '2014-02-20'
;with x as
(
select
item, timestamp, balance, row_number() over (partition by item order by timestamp, balance) rn
from (select item, timestamp, balance from data
union all
select distinct item, #from, null from data) z
where timestamp <= #to
)
, y as
(
select item,
timestamp,
coalesce(balance, rollingsum) balance ,
a.rollingsum,
rn
from x d
cross apply
(select sum(balance) rollingsum from x where rn <= d.rn and d.item = item) a
where timestamp between '2014-02-13' and '2014-02-20'
)
select item, timestamp, balance, rollingsum from y
where rollingsum is not null
order by item, rn, timestamp
Result:
item timestamp balance rollingsum
1 2014-02-13 -10,00 -10,00
1 2014-02-15 5,00 -5,00
1 2014-02-20 2,00 -3,00
2 2014-02-13 15,00 15,00
2 2014-02-16 15,00 30,00

Related

Moving grouped MEDIAN / Get the MEDIAN of specific months from the past IN T-SQL

Let's say I have a table:
DATE
ID
VALUE
01.2010
1
100
02.2010
1
200
...
...
...
12.2010
1
300
01.2011
1
150
02.2011
1
250
...
...
...
12.2011
1
350
01.2012
1
200
02.2012
1
300
...
...
...
12.2012
1
400
I want to get a median of VALUE grouped by months i.e. get something like
DATE
ID
VALUE
MEDIAN
01.2010
1
100
100
02.2010
1
200
200
...
...
...
...
12.2010
1
300
300
01.2011
1
150
125 = (100+150)/2
02.2011
1
250
225 = (200+250)/2
...
...
...
...
12.2011
1
350
325 = (300+350)/2
01.2012
1
200
150
02.2012
1
300
250
...
...
...
...
12.2012
1
400
350
I have more ID in table so I would like to get this result for every ID.
I have tried doing
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY VALUE) OVER (PARTITION BY Id, MONTH(Date) ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
but I get "The function 'PERCENTILE_CONT' may not have a window frame.
I've also tried the following (but also without any results):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY VALUE)
OVER (PARTITION BY Id, MONTH(Date))
FROM tab1 LEFT JOIN tab2
ON tab1.key = tab2.key
WHERE tab1.Date BETWEEN Min(Date) AND tab2.Date
EDIT
So far I have resolved it with
SELECT (CASE WHEN Date =2010 THEN PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN Date = 2010 THEN VALUE ELSE NULL) OVER (PARTITION BY Id, MONTH(Date)) ELSE 0 END) +
(CASE WHEN Date =2011 THEN PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN Date <= 2011 THEN VALUE ELSE NULL) OVER (PARTITION BY Id, MONTH(Date)) ELSE 0 END) +
(CASE WHEN Date =2012 THEN PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY CASE WHEN Date <= 2012 THEN VALUE ELSE NULL) OVER (PARTITION BY Id, MONTH(Date)) ELSE 0 END)
FROM tab1
But to be honest, I would like to have an resolution without assumption of a priori knowledge of dates. I've thought about WHILE LOOP and updating column while #MinYear <= #MaxYear where in every iteration #MinYear = #MinYear+1 but in this case I would have to create temporary tables which I'm trying to avoid.
My idea is to use (Value1+value2)/2 as median as your requirement is little complicated.
CREATE TABLE MedianData
(
[Date] VARCHAR(100)
,ID INT
,[Value] INT
)
INSERT INTO MedianData VALUES ('01.2010', 1, 100)
,('02.2010', 1, 200)
,('12.2010', 1, 300)
,('01.2011', 1, 150)
,('02.2011', 1, 250)
,('12.2011', 1, 350)
,('01.2012', 1, 200)
,('02.2012', 1, 300)
,('12.2012', 1, 400)
SELECT *
,ROW_NUMBER() OVER ( PARTITION BY Substring([Date],1,2 ) ORDER BY [Date] ) AS [row]
,Substring([Date],1,2 ) as [MONTH]
INTO #Temp_tbl2
FROM MedianData
SELECT
A.Date
,A.ID
,A.[Value]
--Logic is applied here. I used (Value1+value2)/2 as median
,CASE WHEN A.[row] = 3 THEN ( A.[Value] + ( SELECT T.[Value] FROM #Temp_tbl2
T where T.[MONTH] = Substring(A.[Date],1,2 ) AND T.[row] = 1 ) )/2
WHEN A.[row] != 1 THEN (A.total/2)
ELSE A.total END as [Median]
INTO #Temp_table
FROM
(
SELECT *
,ROW_NUMBER() OVER ( PARTITION BY Substring([Date],1,2 ) ORDER BY [Date] ) AS [row]
,SUM ([Value] ) OVER ( PARTITION BY Substring([Date],1,2 ) ORDER BY [Date] ) AS [total]
FROM MedianData
) AS A
--to make the table data order
SELECT MedianData.*, #Temp_table.Median
FROM MedianData
INNER JOIN #Temp_table
ON MedianData.[Date] = #Temp_table.[Date]
drop table #Temp_table
drop table #Temp_tbl2

How does this Time Difference Calculation work?

I wanted to display the difference in HH:MM:SS between two datetime fields in SQL Server 2014.
I found a solution in this Stack Overflow post. And it works perfectly. But I want to understand the "why" of how this arrives at the correct answer.
T-SQL:
SELECT y.CustomerID ,
y.createDate ,
y.HarvestDate ,
y.DateDif ,
DATEDIFF ( DAY, 0, y.DateDif ) AS [Days] ,
DATEPART ( HOUR, y.DateDif ) AS [Hours] ,
DATEPART ( MINUTE, y.DateDif ) AS [Minutes]
FROM (
SELECT x.createDate - x.HarvestDate AS [DateDif] ,
x.createDate ,
x.HarvestDate ,
x.CustomerID
FROM (
SELECT CustomerID ,
HarvestDate ,
createDate
FROM dbo.CustomerHarvestReports
WHERE HarvestDate >= DATEADD ( MONTH, -6, GETDATE ())
) AS [x]
) AS [y]
ORDER BY DATEDIFF ( DAY, 0, y.DateDif ) DESC;
Results:
1239090 2017-11-07 08:51:03.870 2017-10-14 11:39:49.540 1900-01-24 21:11:14.330 23 21 11
1239090 2017-11-07 08:51:04.823 2017-10-19 11:17:48.320 1900-01-19 21:33:16.503 18 21 33
1843212 2017-10-27 19:14:02.070 2017-10-21 10:49:57.733 1900-01-07 08:24:04.337 6 8 24
1843212 2017-10-27 19:14:03.057 2017-10-21 10:49:57.733 1900-01-07 08:24:05.323 6 8 24
The first column in Customer ID - the second and third columns are the columns I wanted to calculate the time difference between. The third column is the difference between the two columns - and one of the points in the code in which I do not understand.
If you subtract two datetime fields like this create date - harvestdate, why does it default to the year 1900?
And regarding DATEDIFF ( DAY, 0 , y.DateDiff) - what does the 0 mean? Does the 0 set the date as '01-01-1900'?
It works - for that I am grateful. I was hoping I could get an explanation as to why this behavior works?
I've added some comments that should explain it:
SELECT y.CustomerID ,
y.createDate ,
y.HarvestDate ,
y.DateDif ,
DATEDIFF ( DAY, 0, y.DateDif ) AS [Days] , -- calculates the number of whole days between 0 and the difference
DATEPART ( HOUR, y.DateDif ) AS [Hours] , -- the number of hours between the two dates has already been cleverly
-- calculated in [DateDif], therefore, all that is required is to extract
-- that figure using DATEPART
DATEPART ( MINUTE, y.DateDif ) AS [Minutes] -- same explanation as [Hours]
FROM (
SELECT x.createDate - x.HarvestDate AS [DateDif] , -- calculates the difference expressed as a datetime;
-- 0 is '1900-01-01 00:00:00.000' as a datetime, so the
-- resulting datetime will be that plus the difference
x.createDate ,
x.HarvestDate ,
x.CustomerID
FROM (
SELECT CustomerID ,
HarvestDate ,
createDate
FROM dbo.CustomerHarvestReports
WHERE HarvestDate >= DATEADD ( MONTH, -6, GETDATE ())
) AS [x]
) AS [y]
ORDER BY DATEDIFF ( DAY, 0, y.DateDif ) DESC;

Find date sequence in PostgreSQL

I'm trying to find the maximum sequence of days by customer in my data. I want to understand what is the max sequence of days that specific customer made. If someone enter to my app in the 25/8/16 AND 26/08/16 AND 27/08/16 AND 01/09/16 AND 02/09/16 - The max sequence will be 3 days (25,26,27).
In the end (The output) I want to get two fields: custid | MaxDaySequence
I have the following fields in my data table: custid | orderdate(timestemp)
For exmple:
custid orderdate
1 25/08/2007
1 03/10/2007
1 13/10/2007
1 15/01/2008
1 16/03/2008
1 09/04/2008
2 18/09/2006
2 08/08/2007
2 28/11/2007
2 04/03/2008
3 27/11/2006
3 15/04/2007
3 13/05/2007
3 19/06/2007
3 22/09/2007
3 25/09/2007
3 28/01/2008
I'm using PostgreSQL 2014.
Thanks
Trying:
select custid, max(num_days) as longest
from (
select custid,rn, count (*) as num_days
from (
select custid, date(orderdate),
cast (row_number() over (partition by custid order by date(orderdate)) as varchar(5)) as rn
from table_
) x group by custid, CURRENT_DATE - INTERVAL rn|| ' day'
) y group by custid
Try:
SELECT custid, max( abc ) as max_sequence_of_days
FROM (
SELECT custid, yy, count(*) abc
FROM (
SELECT * ,
SUM( xx ) OVER (partition by custid order by orderdate ) yy
FROM (
select * ,
CASE WHEN
orderdate - lag( orderdate ) over (partition by custid order by orderdate )
<= 1
THEN 0 ELSE 1 END xx
from mytable
) x
) z
GROUP BY custid, yy
) q
GROUP BY custid
Demo: http://sqlfiddle.com/#!15/00422/11
===== EDIT ===========
Got "operator does not exist: interval <= integer"
This means that orderdate column is of type timestamp, not date.
In this case you need to use <= interval '1' day condition instead of <= 1:
Please see this link: https://www.postgresql.org/docs/9.0/static/functions-datetime.html to learn more about date arithmetic in PostgreSQL
Please see this demo:
http://sqlfiddle.com/#!15/7c2200/2
SELECT custid, max( abc ) as max_sequence_of_days
FROM (
SELECT custid, yy, count(*) abc
FROM (
SELECT * ,
SUM( xx ) OVER (partition by custid order by orderdate ) yy
FROM (
select * ,
CASE WHEN
orderdate - lag( orderdate ) over (partition by custid order by orderdate )
<= interval '1' day
THEN 0 ELSE 1 END xx
from mytable
) x
) z
GROUP BY custid, yy
) q
GROUP BY custid

Get Data From Postgres Table At every nth interval

Below is my table and i am inserting data from my windows .Net application at every 1 Second Interval. i want to write query to fetch data from the table at every nth interval for example at every 5 second.Below is the query i am using but not getting result as required. Please Help me
CREATE TABLE table_1
(
timestamp_col timestamp without time zone,
value_1 bigint,
value_2 bigint
)
This is my query which i am using
select timestamp_col,value_1,value_2
from (
select timestamp_col,value_1,value_2,
INTERVAL '5 Seconds' * (row_number() OVER(ORDER BY timestamp_col) - 1 )
+ timestamp_col as r
from table_1
) as dt
Where r = 1
Use date_part() function with modulo operator:
select timestamp_col, value_1, value_2
from table_1
where date_part('second', timestamp_col)::int % 5 = 0

SQL get count grouped by week and type given a date interval

Given a table of projects with start and end dates as well as a region that they are taking place, I am trying to get my result to output the number of active projects per week over a given interval and grouped by region.
I have many records in Projects that look like
region start_date end_date
Alabama 2012-07-08 2012-08-15
Texas 2012-06-13 2012-07-24
Alabama 2012-07-25 2012-09-13
Texas 2012-08-08 2012-10-28
Florida 2012-07-03 2012-08-07
Lousiana 2012-07-14 2012-08-12
....
If I want results for a single week, I can do something like
DECLARE #today datetime
SET #today ='2012-11-09'
SELECT
[Region],
count(*) as ActiveProjectCount
FROM [MyDatabase].[dbo].[Projects]
where (datecompleted is null and datestart < #today) OR (datestart < #today AND #today < datecompleted)
Group by region
order by region asc
This produces
Region ActiveProjectCount
Arkansas 15
Louisiana 18
North Dakota 18
Oklahoma 27
...
How can I alter this query to produce results that look like
Region 10/06 10/13 10/20 10/27
Arkansas 15 17 12 14
Louisiana 3 0 1 5
North Dakota 18 17 16 15
Oklahoma 27 23 19 22
...
Where on a weekly interval, I am able to see the total number of active projects (projects between start and end date)
you could do sth. like this:
with "nums"
as
(
select 1 as "value"
union all select "value" + 1 as "value"
from "nums"
where "value" <= 52
)
, "intervals"
as
(
select
"id" = "value"
, "startDate" = cast( dateadd( week, "value" - 1, dateadd(year, datediff(year, 0, getdate()), 0)) as date )
, "endDate" = cast( dateadd( week, "value", dateadd( year, datediff( year, 0, getdate()), 0 )) as date )
from
"nums"
)
, "counted"
as
(
select
"intervalId" = I."id"
, I."startDate"
, I."endDate"
, D."region"
, "activeProjects" = count(D."region") over ( partition by I."id", D."region" )
from
"intervals" I
inner join "data" D
on D."startDate" <= I."startDate"
and D."endDate" > I."endDate"
)
select
*
from
(
select
"region"
, "intervalId"
from
"counted"
) as Data
pivot
( count(Data."intervalId") for intervalId in ("25", "26", "27", "28", "29", "30", "31")) as p
intervals can be defined as you wish.
see SQL-Fiddle