Is it possible to Average after every fixed interval and group by one column in MSSQL ?
Suppose I have a table A as under:
NAME Interval Data1 Data2
1 0.01 1 4
1 0.05 4 2
1 0.09 7 6
1 0.11 1 2
1 0.15 7 6
1 0.18 3 1
1 0.19 2 5
2 0.209 9 0
I want the Output to group by Name and run average every 10 counts.
So for expamle
Name - 1
Interval Start - 0
Interval End - 10
Data 1 Avg - 4 [(1 + 4 + 7) / 3]
Data 3 Avg - 4 [(4 + 2 + 6) / 3]
AND
Name - 1
Interval Start - 10
Interval End - 20
Data 1 Avg - 3.25 [(1 + 7 + 3 + 2) / 4]
Data 3 Avg - 3.50 [(2 + 6 + 1 + 5) / 4]
So I want the Ouput as below. The interval per "Name" column is different.
Name Interval-Start Interval-End DataAvg1 DataAvg2
1 0 10 4 4
1 10 20 3.25 3.50
2 0 10 0 0
2 10 20 0 0
2 20 30 9 0
I used the below query, but cant figure out logic per interval.
SELECT Name, Interval, AVG(Data1) AS Data1Avg, AVG(Data2) AS Data2Avg
FROM TableA
GROUP BY Name;
Can someone please help me with it.
using cursor and temp table
--drop table dbo.#result
--drop table dbo.#steps
CREATE TABLE dbo.#result
(
[Name] varchar(50),
[Interval-Start] float,
[Interval-End] float,
[DataAvg1] float,
[DataAvg2] float
)
CREATE TABLE dbo.#steps
(
[IntervalStart] float,
[IntervalEnd] float
)
declare #min int, #max int, #step float
DECLARE #Name varchar(50), #IntervalStart float, #IntervalEnd float;
set #min = 0
set #max = 1
set #step = 0.1
insert into #steps
select #min + Number * #step IntervalStart, #min + Number * #step + #step IntervalEnd
from master..spt_values
where type = 'P' and number between 0 and (#max - #min) / #step
DECLARE _cursor CURSOR FOR
SELECT [Name], [IntervalStart], [IntervalEnd] FROM
(select [Name] from [TableA] Group by [Name]) t
INNER JOIN #steps on 1=1
OPEN _cursor;
FETCH NEXT FROM _cursor
INTO #Name, #IntervalStart, #IntervalEnd;
WHILE ##FETCH_STATUS = 0
BEGIN
insert into dbo.#result
select #Name, #IntervalStart, #IntervalEnd, AVG(CAST(Data1 as FLOAT)), AVG(CAST(Data2 as FLOAT))
FROM [TableA]
where [NAME] = #Name and Interval between #IntervalStart and #IntervalEnd
FETCH NEXT FROM _cursor
INTO #Name, #IntervalStart, #IntervalEnd;
END
CLOSE _cursor;
DEALLOCATE _cursor;
select * from dbo.#result
I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to create one more column that would create a cumulative sum of the days_difference, partitioned by user_id, but would reset whenever the value reaches 30 and starts counting from 0. I have been trying to do it, but I couldn't figure it out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
This should do what you want:
with cte as (
select t.a, t.b, t.c, t.c as sumc
from t
where b = 1
union all
select t.a, t.b, t.c,
(case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
from t join
cte
on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;
Here is a rextester.
TSQL MSSQL 2008r2
I'm re-writing the question to try and make it clear what the issue is that I'm trying to explain.
I've got a stored proc that takes 3 parameters. VehicleKey, StartDate and EndDateTime. I'm querying a Data Warehouse db. So the data shouldn't change.
When the proc is called with the same parameters then most of the time the results will be as expected but on some random occasions, with those same parameters, the results differ. I'm querying a Data WH so the data doesn't change.
The problem is with the dynamic derived column "Island".
It's completely random. The proc can be executed 20 times and give the expected results and then the next 2 will give incorrect results.
There can be 1 or more VehicleKey/DriverKey combinations in a given date range.
This is the problem query
SELECT
A.VehicleKey
,A.NodeId
,A.DriverKey
,MIN(A.StartTrip) 'StartTrip'
,MAX(A.EndTrip) 'EndTrip'
,SUM(A.PrivOdo) 'Private'
,SUM(A.BusOdo) 'Business'
,SUM(A.TravOdo) 'Travel'
,SUM(A.PrivOdo + A.BusOdo + A.TravOdo )'Total'
FROM
(
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
) AS A
GROUP BY
A.Island
,A.VehicleKey
,A.NodeId
,A.DriverKey
ORDER BY
A.VehicleKey
,MIN(A.StartTrip);
I am of the understanding that the ORDER BY should be on the outside of the derived table for it to take effect.
I think I've narrowed it down to the issue presenting itself only when a Vehicle has 2 or more DriverKey combinations.
for example, Parameters VehicleKey 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'
This is the correct result - including Island column
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 0 2016-09-06 14:06:08 2016-09-28 17:02:08 54.75 737.83 0 792.58
4865 458 1202 134 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
and this is when it's wrong. Parameters VehicleKey 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'
- including Island column
The first two rows here should be combined.
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 98 2016-09-06 14:06:08 2016-09-21 09:15:49 0 313.87 0 313.87
4865 458 0 -63 2016-09-21 09:21:10 2016-09-28 17:02:08 54.75 423.96 0 478.71
4865 458 1202 71 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
If I show the first few rows from the derived table, I've broken down the "Island" column
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,Island_x =( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) )
,Island_y = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
The correct result should be
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
0 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
0 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
0 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
0 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
0 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
0 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
0 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
0 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
but I sometimes get this with the same input paramaters.
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
98 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
98 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
98 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
98 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
98 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
98 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
98 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
98 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
Why is the "Island" calculated column wrong? 1-1 = 0 not 98.
Where am I going wrong?
EDIT - #YourData now looks like your raw table
Declare #YourTable table (VehicleKey int,NodeId int,DriverKey int,StartTrip datetime,EndTrip datetime,PrivOdo decimal(10,2),BusOdo decimal(10,2), TravOdo decimal(10,2))
Insert Into #YourTable values
(4865,458,0 ,'2016-09-06 14:06:08','2016-09-21 09:15:49',0 ,313.87,0),
(4865,458,0 ,'2016-09-21 09:21:10','2016-09-28 17:02:08',54.75,423.96,0),
(4865,458,1202,'2016-09-29 11:10:04','2016-09-30 17:25:51',0 ,211.32,0),
(4865,458,0 ,'2016-10-03 07:39:25','2016-10-14 17:00:15',0 ,579.81,0)
Select VehicleKey
,NodeID
,VehicleKey
,DriverKey
,StartTrip = min(StartTrip)
,EndTrip = max(EndTrip)
,Private = sum(PrivOdo)
,Business = sum(BusOdo)
,Travel = sum(TravOdo)
,Total = sum(PrivOdo + BusOdo + TravOdo )
From (
Select Island = ( ROW_NUMBER() OVER (PARTITION BY VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY VehicleKey, DriverKey ORDER BY StartTrip) )
,*
From #YourTable
) A
Group By Island,VehicleKey,NodeID,VehicleKey,DriverKey
Order By min(StartTrip)
Returns
FYI - The sub-query produces
Is there a way to detect subseries of zeros of length at least 3 within a time series in Postgres?
year value
--------------
1 0
2 0
3 0
4 33
5 72
6 0
7 0
8 0
9 0
10 25
11 0
12 56
13 37
So in this example I'd like to return years 1-3 and 6-9, but not year 11.
This one will do it:
WITH d(y,v) AS (VALUES
(1,0),(2,0),(3,0),(4,33),(5,72),
(6,0),(7,0),(8,0),(9,0),(10,25),
(11,0),(12,56),(13,37)
)
SELECT grp, numrange(min(y),max(y),'[]') as ys, count(*) as len
FROM (
/* group identifiers via running total */
SELECT y, v, g, sum(g) OVER (ORDER BY y) grp
FROM (
/* group boundaries */
SELECT y, v, CASE WHEN
v IS DISTINCT FROM lag(v) OVER (ORDER BY y) THEN 1
END g
FROM d) s
WHERE v=0) s
GROUP BY grp
HAVING count(*) >= 3;
Recently i needed to implement a way to allow for Table Records to be Ranked.
Initially i deployed an Update statement to seed the ranks:
;with cte as (
select
t.id,
Rank() Over (
Partition by t.field2
Order by t.id
) as [Rank],
t.index,
t.field2,
t.field3 ,
t.field4
from dbo.Table t
where t.field2 = #fldValue
) Update cte
set index = [Rank]
But now i need to be able to have the end-user re-order the ranks. Any suggestions on how to allow an end-user to take Rank value 92 to Rank value 15 and have everything be re-ranked appropriately.
I had thought about doing this via cursor but am trying to do this via Set based operation.
My first goto was to do a Procedural based operation but need to get more inline with Set based operation.
Table Schema
Table:
id bigint
field2 int
field3 int ---> This field will be the key pivoting column for ranking
field4 int
Data:
id field2 field3 field4
1 0 1 1
2 0 1 1
3 0 1 1
4 0 1 2
5 0 1 2
6 0 1 1
7 0 1 1
8 0 1 1
9 0 1 1
10 0 1 2
11 0 1 2
12 0 1 1
13 0 1 1
14 0 1 1
15 0 1 2
16 0 1 1
17 0 1 2
18 0 1 2
19 0 1 1