SQL create update statement using python script - python-3.7

I'm trying to build an SQL UPDATE statement with a script. The database has columns named 'col_prev12', 'col_prev24', etc., each of which needs to add up monthly buckets, i.e. 1-12 for prev12 and 13-24 for prev24, and the same thing needs to happen for several columns. I created an array to loop through and tried to format the statement:
prev12cols = [
"colA",
"colB",
"colC",
"colD"
]
for col_prefix in prev12cols:
    monthly_cols.extend(["{0} = {3}{1}+{3}{2}".format(col_prefix + "_prev_12Mo", str(i),str(i+1), col_prefix) for i in range(1,12)])
Each array element corresponds to columns named colA1, colA2, ..., colA24. What I want is something like colA_prev_12Mo = colA1 + colA2 + ... + colA12, and then the same for 13-24.

Choose meaningful variable names and don't do too many things at once; take things step by step instead. We start with just the column prefixes:
column_prefixes = ["colA", "colB", "colC", "colD"]
for column_prefix in column_prefixes:
    print(column_prefix)
Which prints:
colA
colB
colC
colD
Now, we add an additional loop for how many years back the data is. Remember that range doesn't include the "last step", so if we want the current year 0 and one year back 1, we need to run range(0, 2). It might help to write range(0, 1 + 1) instead:
column_prefixes = ["colA", "colB", "colC", "colD"]
for column_prefix in column_prefixes:
    for year in range(0, 1 + 1):
        print(column_prefix, year)
Which prints:
colA 0
colA 1
colB 0
colB 1
colC 0
colC 1
colD 0
colD 1
Now, we add the final loop for the number of the month. You will have to choose start and stop for range depending on year:
column_prefixes = ["colA", "colB", "colC", "colD"]
for column_prefix in column_prefixes:
    for year in range(0, 1 + 1):
        for month in range(year * 12 + 1, (year + 1) * 12 + 1):
            print(column_prefix, year, month)
Which prints:
colA 0 1
colA 0 2
colA 0 3
colA 0 4
colA 0 5
colA 0 6
colA 0 7
colA 0 8
colA 0 9
colA 0 10
colA 0 11
colA 0 12
colA 1 13
colA 1 14
colA 1 15
colA 1 16
colA 1 17
colA 1 18
colA 1 19
colA 1 20
colA 1 21
colA 1 22
colA 1 23
colA 1 24
colB 0 1
colB 0 2
colB 0 3
colB 0 4
colB 0 5
colB 0 6
colB 0 7
colB 0 8
colB 0 9
colB 0 10
colB 0 11
colB 0 12
colB 1 13
colB 1 14
colB 1 15
colB 1 16
colB 1 17
colB 1 18
colB 1 19
colB 1 20
colB 1 21
colB 1 22
colB 1 23
colB 1 24
colC 0 1
colC 0 2
colC 0 3
colC 0 4
colC 0 5
colC 0 6
colC 0 7
colC 0 8
colC 0 9
colC 0 10
colC 0 11
colC 0 12
colC 1 13
colC 1 14
colC 1 15
colC 1 16
colC 1 17
colC 1 18
colC 1 19
colC 1 20
colC 1 21
colC 1 22
colC 1 23
colC 1 24
colD 0 1
colD 0 2
colD 0 3
colD 0 4
colD 0 5
colD 0 6
colD 0 7
colD 0 8
colD 0 9
colD 0 10
colD 0 11
colD 0 12
colD 1 13
colD 1 14
colD 1 15
colD 1 16
colD 1 17
colD 1 18
colD 1 19
colD 1 20
colD 1 21
colD 1 22
colD 1 23
colD 1 24
Now, all the information we need is available as a variable at some point during the loops and we'll only need to assemble the line you want to see:
column_prefixes = ["colA", "colB", "colC", "colD"]
for column_prefix in column_prefixes:
    for year in range(0, 1 + 1):
        line = column_prefix + "_" + str(year + 1) + "_years_back = "
        for month in range(year * 12 + 1, (year + 1) * 12 + 1):
            line += column_prefix + str(month) + " + "
        line = line[:-3]
        print(line)
Which prints:
colA_1_years_back = colA1 + colA2 + colA3 + colA4 + colA5 + colA6 + colA7 + colA8 + colA9 + colA10 + colA11 + colA12
colA_2_years_back = colA13 + colA14 + colA15 + colA16 + colA17 + colA18 + colA19 + colA20 + colA21 + colA22 + colA23 + colA24
colB_1_years_back = colB1 + colB2 + colB3 + colB4 + colB5 + colB6 + colB7 + colB8 + colB9 + colB10 + colB11 + colB12
colB_2_years_back = colB13 + colB14 + colB15 + colB16 + colB17 + colB18 + colB19 + colB20 + colB21 + colB22 + colB23 + colB24
colC_1_years_back = colC1 + colC2 + colC3 + colC4 + colC5 + colC6 + colC7 + colC8 + colC9 + colC10 + colC11 + colC12
colC_2_years_back = colC13 + colC14 + colC15 + colC16 + colC17 + colC18 + colC19 + colC20 + colC21 + colC22 + colC23 + colC24
colD_1_years_back = colD1 + colD2 + colD3 + colD4 + colD5 + colD6 + colD7 + colD8 + colD9 + colD10 + colD11 + colD12
colD_2_years_back = colD13 + colD14 + colD15 + colD16 + colD17 + colD18 + colD19 + colD20 + colD21 + colD22 + colD23 + colD24
Since every loop iteration of the months adds a " + ", there is a superfluous + at the end after the last month that we need to remove with line = line[:-3], which deletes the last 3 characters of our line. Of course, instead of print, you can also append the line to a list.
Further reading: The code above is perfectly fine, but if you're interested in a different coding style, look into using f-strings and list comprehensions.
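For what it's worth, here is a minimal sketch of that alternative style, using f-strings and str.join so there is no trailing " + " to strip. The function names build_assignments and build_update and the table name my_table are made up for illustration; they are not from the question.

```python
column_prefixes = ["colA", "colB", "colC", "colD"]


def build_assignments(prefixes, years=2, months_per_year=12):
    """Return one 'target = col1 + col2 + ...' string per prefix and year."""
    lines = []
    for prefix in prefixes:
        for year in range(years):
            first = year * months_per_year + 1
            last = (year + 1) * months_per_year
            # join() puts " + " only between terms, so nothing needs trimming
            terms = " + ".join(f"{prefix}{month}" for month in range(first, last + 1))
            lines.append(f"{prefix}_{year + 1}_years_back = {terms}")
    return lines


def build_update(table_name, prefixes):
    """Assemble a single UPDATE statement; table_name is a hypothetical placeholder."""
    return f"UPDATE {table_name} SET " + ", ".join(build_assignments(prefixes))


assignments = build_assignments(column_prefixes)
update_sql = build_update("my_table", column_prefixes)
print(assignments[0])
```

The target column names follow the _years_back naming used above; swap in whatever naming your table actually uses.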
Final note: You might not need to write those lines in Python, it might be possible to do that with SQL only - but that's another question.

Related

Average after fixed interval and Group By in SQL

Is it possible to average over every fixed interval and group by one column in MSSQL?
Suppose I have a table A as follows:
NAME Interval Data1 Data2
1 0.01 1 4
1 0.05 4 2
1 0.09 7 6
1 0.11 1 2
1 0.15 7 6
1 0.18 3 1
1 0.19 2 5
2 0.209 9 0
I want the output to group by Name and compute the average over every 10 counts.
So, for example:
Name - 1
Interval Start - 0
Interval End - 10
Data 1 Avg - 4 [(1 + 4 + 7) / 3]
Data 2 Avg - 4 [(4 + 2 + 6) / 3]
AND
Name - 1
Interval Start - 10
Interval End - 20
Data 1 Avg - 3.25 [(1 + 7 + 3 + 2) / 4]
Data 2 Avg - 3.50 [(2 + 6 + 1 + 5) / 4]
So I want the output as below. The interval per "Name" column is different.
Name Interval-Start Interval-End DataAvg1 DataAvg2
1 0 10 4 4
1 10 20 3.25 3.50
2 0 10 0 0
2 10 20 0 0
2 20 30 9 0
I used the query below, but can't figure out the logic per interval.
SELECT Name, Interval, AVG(Data1) AS Data1Avg, AVG(Data2) AS Data2Avg
FROM TableA
GROUP BY Name;
Can someone please help me with it?
Using a cursor and temp tables:
--drop table dbo.#result
--drop table dbo.#steps
CREATE TABLE dbo.#result
(
[Name] varchar(50),
[Interval-Start] float,
[Interval-End] float,
[DataAvg1] float,
[DataAvg2] float
)
CREATE TABLE dbo.#steps
(
[IntervalStart] float,
[IntervalEnd] float
)
declare @min int, @max int, @step float
DECLARE @Name varchar(50), @IntervalStart float, @IntervalEnd float;
set @min = 0
set @max = 1
set @step = 0.1
insert into #steps
select @min + Number * @step IntervalStart, @min + Number * @step + @step IntervalEnd
from master..spt_values
where type = 'P' and number between 0 and (@max - @min) / @step
DECLARE _cursor CURSOR FOR
    SELECT [Name], [IntervalStart], [IntervalEnd] FROM
    (select [Name] from [TableA] Group by [Name]) t
    INNER JOIN #steps on 1=1
OPEN _cursor;
FETCH NEXT FROM _cursor
INTO @Name, @IntervalStart, @IntervalEnd;
WHILE @@FETCH_STATUS = 0
BEGIN
    insert into dbo.#result
    select @Name, @IntervalStart, @IntervalEnd, AVG(CAST(Data1 as FLOAT)), AVG(CAST(Data2 as FLOAT))
    FROM [TableA]
    where [NAME] = @Name and Interval between @IntervalStart and @IntervalEnd
    FETCH NEXT FROM _cursor
    INTO @Name, @IntervalStart, @IntervalEnd;
END
CLOSE _cursor;
DEALLOCATE _cursor;
select * from dbo.#result
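To make the bucketing logic easier to check by hand, here is a rough Python sketch of the same idea, not a replacement for the T-SQL. It assumes a step of 0.1 (as in the script above), uses half-open buckets [start, end) rather than BETWEEN, and only reports buckets that contain rows; the function name bucket_averages is made up.

```python
import math
from collections import defaultdict

# Sample rows from the question: (name, interval, data1, data2)
rows = [
    ("1", 0.01, 1, 4), ("1", 0.05, 4, 2), ("1", 0.09, 7, 6),
    ("1", 0.11, 1, 2), ("1", 0.15, 7, 6), ("1", 0.18, 3, 1),
    ("1", 0.19, 2, 5), ("2", 0.209, 9, 0),
]


def bucket_averages(rows, step=0.1):
    """Average data1/data2 per (name, bucket index), where bucket = floor(interval / step)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for name, interval, data1, data2 in rows:
        # The small epsilon guards against float artifacts such as 0.3 / 0.1 == 2.999...
        bucket = math.floor(interval / step + 1e-9)
        acc = sums[(name, bucket)]
        acc[0] += data1
        acc[1] += data2
        acc[2] += 1
    return {key: (s1 / n, s2 / n) for key, (s1, s2, n) in sums.items()}


averages = bucket_averages(rows)
for (name, bucket), (avg1, avg2) in sorted(averages.items()):
    print(name, bucket, avg1, avg2)
```

Bucket index 0 corresponds to the interval 0 to 10 in the desired output, index 1 to 10 to 20, and so on.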

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to add one more column holding a cumulative sum of days_difference, partitioned by user_id, that resets whenever the value reaches 30 and starts counting from 0 again. I have been trying, but I couldn't figure out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
This should do what you want:
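-- a, b, c presumably stand for user_id, touchpoint_number, days_difference
-- PostgreSQL requires the RECURSIVE keyword for a self-referencing CTE
with recursive cte as (
    select t.a, t.b, t.c, t.c as sumc
    from t
    where b = 1
    union all
    select t.a, t.b, t.c,
           (case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
    from t join
         cte
         on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;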
Here is a rextester.
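If it helps to see the reset rule outside SQL, this is a small Python sketch of the same logic: the first touchpoint seeds the sum, and any later sum above the limit drops back to 0. The function name running_sum_with_reset is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (user_id, touchpoint_number, days_difference)
rows = [
    (1, 1, 5), (1, 2, 20), (1, 3, 25), (1, 4, 10),
    (2, 1, 2), (2, 2, 30), (2, 3, 4),
]


def running_sum_with_reset(rows, limit=30):
    """Per-user running sum that restarts at 0 once it would exceed limit."""
    result = []
    # sorted() orders by user_id, then touchpoint_number
    for user_id, group in groupby(sorted(rows), key=itemgetter(0)):
        total = None
        for _, touchpoint, days in group:
            if total is None:
                total = days  # anchor row, like "where b = 1" in the CTE
            else:
                total = 0 if total + days > limit else total + days
            result.append((user_id, touchpoint, days, total))
    return result


cumulative = running_sum_with_reset(rows)
for row in cumulative:
    print(row)
```

The last element of each tuple reproduces the cum_sum_upto30 column from the desired outcome for the sample data.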

TSQL Order BY on occasion doesn't order correctly

TSQL MSSQL 2008r2
I'm re-writing the question to try and make it clear what the issue is that I'm trying to explain.
I've got a stored proc that takes 3 parameters: VehicleKey, StartDate and EndDateTime. I'm querying a Data Warehouse db, so the data shouldn't change.
When the proc is called with the same parameters, most of the time the results are as expected, but on some random occasions, with those same parameters, the results differ.
The problem is with the dynamic derived column "Island".
It's completely random. The proc can be executed 20 times and give the expected results and then the next 2 will give incorrect results.
There can be 1 or more VehicleKey/DriverKey combinations in a given date range.
This is the problem query
SELECT
    A.VehicleKey
    ,A.NodeId
    ,A.DriverKey
    ,MIN(A.StartTrip) 'StartTrip'
    ,MAX(A.EndTrip) 'EndTrip'
    ,SUM(A.PrivOdo) 'Private'
    ,SUM(A.BusOdo) 'Business'
    ,SUM(A.TravOdo) 'Travel'
    ,SUM(A.PrivOdo + A.BusOdo + A.TravOdo )'Total'
FROM
(
    SELECT
        Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
        ,NodeId
        ,VehicleKey
        ,DriverKey
        ,StartTrip
        ,EndTrip
        ,BusOdo
        ,PrivOdo
        ,TravOdo
    FROM
        #xYTD_BPTotals T
) AS A
GROUP BY
    A.Island
    ,A.VehicleKey
    ,A.NodeId
    ,A.DriverKey
ORDER BY
    A.VehicleKey
    ,MIN(A.StartTrip);
My understanding is that the ORDER BY should be on the outside of the derived table for it to take effect.
I think I've narrowed it down to the issue presenting itself only when a Vehicle has 2 or more DriverKey combinations.
For example, with parameters VehicleKey = 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'
This is the correct result - including Island column
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 0 2016-09-06 14:06:08 2016-09-28 17:02:08 54.75 737.83 0 792.58
4865 458 1202 134 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
And this is when it's wrong, with the same parameters (VehicleKey = 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'), again including the Island column.
The first two rows here should be combined.
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 98 2016-09-06 14:06:08 2016-09-21 09:15:49 0 313.87 0 313.87
4865 458 0 -63 2016-09-21 09:21:10 2016-09-28 17:02:08 54.75 423.96 0 478.71
4865 458 1202 71 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
Here are the first few rows from the derived table, with the "Island" column broken down into its two parts:
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,Island_x =( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) )
,Island_y = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
The correct result should be
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
0 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
0 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
0 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
0 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
0 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
0 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
0 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
0 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
but I sometimes get this with the same input parameters.
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
98 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
98 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
98 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
98 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
98 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
98 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
98 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
98 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
Why is the "Island" calculated column wrong? 1-1 = 0 not 98.
Where am I going wrong?
EDIT - @YourTable now looks like your raw table
Declare @YourTable table (VehicleKey int,NodeId int,DriverKey int,StartTrip datetime,EndTrip datetime,PrivOdo decimal(10,2),BusOdo decimal(10,2), TravOdo decimal(10,2))
Insert Into @YourTable values
(4865,458,0 ,'2016-09-06 14:06:08','2016-09-21 09:15:49',0 ,313.87,0),
(4865,458,0 ,'2016-09-21 09:21:10','2016-09-28 17:02:08',54.75,423.96,0),
(4865,458,1202,'2016-09-29 11:10:04','2016-09-30 17:25:51',0 ,211.32,0),
(4865,458,0 ,'2016-10-03 07:39:25','2016-10-14 17:00:15',0 ,579.81,0)
Select VehicleKey
    ,NodeID
    ,VehicleKey
    ,DriverKey
    ,StartTrip = min(StartTrip)
    ,EndTrip = max(EndTrip)
    ,Private = sum(PrivOdo)
    ,Business = sum(BusOdo)
    ,Travel = sum(TravOdo)
    ,Total = sum(PrivOdo + BusOdo + TravOdo )
From (
    Select Island = ( ROW_NUMBER() OVER (PARTITION BY VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY VehicleKey, DriverKey ORDER BY StartTrip) )
        ,*
    From @YourTable
) A
Group By Island,VehicleKey,NodeID,VehicleKey,DriverKey
Order By min(StartTrip)
Returns
FYI - The sub-query produces
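As a side note, the row-number difference behind the Island column can be sketched in Python. This is only an illustration under stated assumptions: it orders the overall sequence by the full StartTrip value, which is unique in the sample rows, whereas ORDER BY MONTH(StartTrip) leaves every row within the same month tied, and SQL Server does not guarantee the order of tied rows. That may be worth checking as a source of the run-to-run differences. The function name label_islands is made up.

```python
from collections import defaultdict

# (driver_key, start_trip) for vehicle 4865, taken from the sample rows above
trips = [
    (0, "2016-09-06 14:06:08"),
    (0, "2016-09-21 09:21:10"),
    (1202, "2016-09-29 11:10:04"),
    (0, "2016-10-03 07:39:25"),
]


def label_islands(trips):
    """Island = overall row number minus per-driver row number, both by start time."""
    ordered = sorted(trips, key=lambda trip: trip[1])  # ISO timestamps sort as text
    per_driver_count = defaultdict(int)
    labelled = []
    for overall_rn, (driver, start) in enumerate(ordered, start=1):
        per_driver_count[driver] += 1
        island = overall_rn - per_driver_count[driver]
        labelled.append((driver, start, island))
    return labelled


islands = label_islands(trips)
for row in islands:
    print(row)
```

Consecutive trips by the same driver share an island value, so grouping by (driver, island) yields three groups here, matching the three rows of the expected result.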

postgres detect repeating patterns of zeros

Is there a way to detect subseries of zeros of length at least 3 within a time series in Postgres?
year value
--------------
1 0
2 0
3 0
4 33
5 72
6 0
7 0
8 0
9 0
10 25
11 0
12 56
13 37
So in this example I'd like to return years 1-3 and 6-9, but not year 11.
This one will do it:
WITH d(y,v) AS (VALUES
    (1,0),(2,0),(3,0),(4,33),(5,72),
    (6,0),(7,0),(8,0),(9,0),(10,25),
    (11,0),(12,56),(13,37)
)
SELECT grp, numrange(min(y),max(y),'[]') as ys, count(*) as len
FROM (
    /* group identifiers via running total */
    SELECT y, v, g, sum(g) OVER (ORDER BY y) grp
    FROM (
        /* group boundaries */
        SELECT y, v, CASE WHEN
            v IS DISTINCT FROM lag(v) OVER (ORDER BY y) THEN 1
        END g
        FROM d) s
    WHERE v=0) s
GROUP BY grp
HAVING count(*) >= 3;
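For comparison, here is the same run detection as a short Python sketch using itertools.groupby, which groups consecutive equal keys much like the boundary flag plus running total in the query. The function name zero_runs is made up, and the series is assumed to be already sorted by year.

```python
from itertools import groupby

# (year, value) pairs from the question, already in year order
series = [
    (1, 0), (2, 0), (3, 0), (4, 33), (5, 72),
    (6, 0), (7, 0), (8, 0), (9, 0), (10, 25),
    (11, 0), (12, 56), (13, 37),
]


def zero_runs(series, min_len=3):
    """Return (first_year, last_year, length) for each run of zeros of at least min_len."""
    runs = []
    # groupby splits the series wherever "value is zero" flips
    for is_zero, group in groupby(series, key=lambda pair: pair[1] == 0):
        members = list(group)
        if is_zero and len(members) >= min_len:
            runs.append((members[0][0], members[-1][0], len(members)))
    return runs


runs = zero_runs(series)
print(runs)
```

For the sample data this yields the runs for years 1-3 and 6-9, and skips the single zero in year 11.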

Reorder Ranked rows

Recently I needed to implement a way to allow table records to be ranked.
Initially I deployed an UPDATE statement to seed the ranks:
;with cte as (
    select
        t.id,
        Rank() Over (
            Partition by t.field2
            Order by t.id
        ) as [Rank],
        t.[index],
        t.field2,
        t.field3 ,
        t.field4
    from dbo.[Table] t
    where t.field2 = @fldValue
) Update cte
set [index] = [Rank]
But now I need to let the end user re-order the ranks. Any suggestions on how to allow an end user to move Rank value 92 to Rank value 15 and have everything else re-ranked appropriately?
I had thought about doing this with a cursor, but I am trying to do it as a set-based operation.
My first instinct was a procedural approach, but I need to get more in line with set-based operations.
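To pin down what "re-ranked appropriately" would mean, here is a small Python sketch of the intended shift, under the assumption that ranks are unique and contiguous within one field2 group. Each row's new rank depends only on its own old rank, so the same rule could presumably be written as a single UPDATE with a CASE expression. The function name move_rank is made up.

```python
def move_rank(ranks, old_rank, new_rank):
    """Move one row from old_rank to new_rank and shift the rows in between by one.

    ranks maps row id -> current rank (assumed unique and contiguous).
    """
    low, high = sorted((old_rank, new_rank))
    # Moving up (92 -> 15) pushes the rows in between down by one, and vice versa
    shift = 1 if new_rank < old_rank else -1
    updated = {}
    for row_id, rank in ranks.items():
        if rank == old_rank:
            updated[row_id] = new_rank
        elif low <= rank <= high:
            updated[row_id] = rank + shift
        else:
            updated[row_id] = rank
    return updated


# 100 rows whose rank initially equals their id, as after the seeding update
ranks = {row_id: row_id for row_id in range(1, 101)}
moved = move_rank(ranks, old_rank=92, new_rank=15)
print(moved[92], moved[15], moved[91])
```

Rows ranked below 15 or above 92 are untouched, so only the affected range would need to be written back.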
Table Schema
Table:
id bigint
field2 int
field3 int ---> This field will be the key pivoting column for ranking
field4 int
Data:
id field2 field3 field4
1 0 1 1
2 0 1 1
3 0 1 1
4 0 1 2
5 0 1 2
6 0 1 1
7 0 1 1
8 0 1 1
9 0 1 1
10 0 1 2
11 0 1 2
12 0 1 1
13 0 1 1
14 0 1 1
15 0 1 2
16 0 1 1
17 0 1 2
18 0 1 2
19 0 1 1