Running total with constant numbers T-SQL - tsql

I have data that looks like this:
MILEAGE January February ........December
0 0 0
2 0.066 0.052
3 0.081 0
5 0 0.062
6 0.080 0 .........
813 0 0 and so on
I want the data to look like this:
Mileage January February ..... December
0 (total for mileage less than or equal to zero, for each month)
2000 (total for mileage up to 2000, for each month)
4000 (total for mileage up to 4000, for each month)
6000 (total for mileage up to 6000, for each month)
8000 and so on....
10000
12000
14000
(in increments of 2,000, up to)
50000
Thank you very much for your help. I am using SQL Server 2008 R2 and am not sure how to achieve this.

You can use a recursive CTE to build the table of threshold values on the fly, as shown below. However, you would only want to do this for an occasional ad-hoc query. If you are going to run it often (say, every month), create the thresholds as a regular table; then you can index it and join to it.
The most common approach is to keep a counting table holding 0, 1, 2, 3, etc. up to some large number. Then you could get the thresholds with SELECT val*2000 FROM counting_table WHERE val*2000 <= 50000. A counting table is general purpose and can be reused for many similar cases.
WITH mile_table AS
(
-- use a recursive CTE to generate the thresholds 0 to 50000 in steps of 2000
SELECT 0 AS milage
UNION ALL
SELECT mile_table.milage + 2000
FROM mile_table
WHERE mile_table.milage + 2000 <= 50000
)
SELECT mile_table.milage,
SUM(a.January) AS January,
SUM(a.February) AS February,
--- ....
SUM(a.December) AS December
FROM mile_table
-- each threshold collects every row at or below it, which gives the running total
JOIN your_table a ON a.MILEAGE <= mile_table.milage
GROUP BY mile_table.milage
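If it helps to see the running-total logic outside the database, here is a minimal Python sketch of the same idea: for every threshold, sum all rows whose mileage is at or below it. The function name `running_totals` and the two-month row layout are assumptions made for illustration, and the sample rows are just the first few from the question.

```python
def running_totals(rows, step=2000, limit=50000):
    """rows: iterable of (mileage, january, february) tuples (hypothetical layout).

    Returns one (threshold, january_total, february_total) tuple per threshold,
    where each total covers every row with mileage <= threshold.
    """
    rows = list(rows)
    result = []
    for threshold in range(0, limit + 1, step):
        january = sum(r[1] for r in rows if r[0] <= threshold)
        february = sum(r[2] for r in rows if r[0] <= threshold)
        result.append((threshold, january, february))
    return result


# First rows of the sample data from the question (January and February only).
sample = [
    (0, 0.0, 0.0),
    (2, 0.066, 0.052),
    (3, 0.081, 0.0),
    (5, 0.0, 0.062),
    (6, 0.080, 0.0),
    (813, 0.0, 0.0),
]
totals = running_totals(sample)
```

The threshold 0 row only picks up mileage 0, while the 2000 row and every later one picks up all six sample rows, which is the "up to" behaviour asked for.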

Try something like this:
DECLARE @i AS INT = -2000
CREATE TABLE #T
(
MileageFrom INT,
MileageTo INT
)
WHILE (@i <= 50000)
BEGIN
INSERT INTO #T
VALUES (@i, @i + 2000)
SET @i = @i + 2000
END
SELECT
B.MileageTo,
SUM(A.January) AS January,
SUM(A.February) AS February, -- ....
SUM(A.December) AS December
FROM YourTable A
JOIN #T B ON A.Mileage BETWEEN B.MileageFrom AND B.MileageTo
GROUP BY B.MileageTo
You just need to tweak the MileageFrom value a bit so that the first range includes all the values less than -2000.


PostgreSQL select statement to return rows after where condition

I am working on a query to return the next 7 days worth of data every time an event happens indicated by "where event = 1". The goal is to then group all the data by the user id and perform aggregate functions on this data after the event happens - the event is encoded as binary [0, 1].
So far, I have been attempting to use nested select statements to structure the data how I would like to have it, but using the window functions is starting to restrict me. I am now thinking a self join could be more appropriate but need help in constructing such a query.
The query currently first creates daily aggregate values grouped by user and date (3rd level nested select). Then, the 2nd level sums the data "value_x" to obtain an aggregate value grouped by the user. Then, the 1st level nested select statement uses the lead function to grab the next rows value over and partitioned by each user which acts as selecting the next day's value when event = 1. Lastly, the select statement uses an aggregate function to calculate the average "sum_next_day_value_after_event" grouped by user and where event = 1. Put together, where event = 1, the query returns the avg(value_x) of the next row's total value_x.
However, this doesn't follow my time rule; "where event = 1", return the next 7 days worth of data after the event happens. If there is not 7 days worth of data, then return whatever data is <= 7 days. Yes, I currently only have one lead with the offset as 1, but you could just put 6 more of these functions to grab the next 6 rows. But, the lead function currently just grabs the next row without regard to date. So theoretically, the next row's "value_x" could actually be 15 days from where "event = 1". Also, as can be seen below in the data table, a user may have more than one row per day.
Here is the query I have so far:
select
f.user_id,
avg(f.sum_next_day_value_after_event) as sum_next_day_values
from (
select
bld.user_id,
lead(bld.sum_daily_value_x, 1) over(partition by bld.user_id order by bld.daily) as sum_next_day_value_after_event
from (
select
l.user_id,
l.daily,
sum(l.value_x) as sum_daily_value_x
from (
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x) l
group by l.user_id, l.daily
order by l.user_id) bld) f
group by f.user_id
Below is a snippet of the data from table_1:
user_id day_ts value_x event
50 4/2/21 07:37 25 0
50 4/2/21 07:42 45 0
50 4/2/21 09:14 67 1
50 4/5/21 10:09 8 0
50 4/5/21 10:24 75 0
50 4/8/21 11:08 34 0
50 4/15/21 13:09 32 1
50 4/16/21 14:23 12 0
50 4/29/21 14:34 90 0
55 4/4/21 15:31 12 0
55 4/5/21 15:23 34 0
55 4/17/21 18:58 32 1
55 4/17/21 19:00 66 1
55 4/18/21 19:57 54 0
55 4/23/21 20:02 34 0
55 4/29/21 20:39 57 0
55 4/30/21 21:46 43 0
Technical details:
PostgreSQL, supported by EDB, version = 14.1
pgAdmin4, version 5.7
Thanks for the help!
"The query currently first creates daily aggregate values"
I don't see any aggregate function in your first (innermost) query, so the GROUP BY clause there is useless.
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x
could be simplified as
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
which in turn adds no real value, so this first query can be removed and the second query becomes:
select user_id
, date_trunc('day', day_ts) as daily
, sum(value_x) as sum_daily_value_x
from table_1
group by user_id, date_trunc('day', day_ts)
Note that date_trunc('day', day_ts) is used here instead of date_part('day', day_ts): date_part returns only the day of the month (1 to 31), so it mixes up different months and cannot be ordered as a date.
The order by user_id clause can also be removed at this step.
Now if you want to calculate the average value of sum_daily_value_x over the 7 days following each day (I'm referring to the avg() function in your top query), you can use avg() as a window function restricted to that period:
select f.user_id
, f.daily
, avg(f.sum_daily_value_x) over (partition by f.user_id order by f.daily range between current row and '7 days' following) as sum_next_day_values
from (
select user_id
, date_trunc('day', day_ts) as daily
, sum(value_x) as sum_daily_value_x
from table_1
group by user_id, date_trunc('day', day_ts)
) AS f
The partition by f.user_id clause keeps each user's rows in a separate window. There is no outer group by, because the window function already returns one row per user and day.
You can replace the avg() window function with any other one, for instance sum(), which would better fit the alias sum_next_day_values.
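To make the "7 days after each event" rule from the question concrete, here is a small Python sketch of that logic applied to user 50's sample rows. It assumes the window starts strictly after the event row and ends exactly 7 * 24 hours later, and that dates such as 4/2/21 are month/day/year; the function name `totals_after_events` is made up for illustration.

```python
from datetime import datetime, timedelta


def totals_after_events(rows, days=7):
    """rows: list of (user_id, timestamp, value_x, event) tuples.

    For every row with event == 1, sum value_x over the same user's rows
    that fall after the event and within `days` days of it.
    """
    results = []
    for user_id, event_ts, _value, event in rows:
        if event != 1:
            continue
        window_end = event_ts + timedelta(days=days)
        total = sum(
            value
            for other_user, ts, value, _event in rows
            if other_user == user_id and event_ts < ts <= window_end
        )
        results.append((user_id, event_ts, total))
    return results


# User 50's rows from the sample data in the question.
sample = [
    (50, datetime(2021, 4, 2, 7, 37), 25, 0),
    (50, datetime(2021, 4, 2, 7, 42), 45, 0),
    (50, datetime(2021, 4, 2, 9, 14), 67, 1),
    (50, datetime(2021, 4, 5, 10, 9), 8, 0),
    (50, datetime(2021, 4, 5, 10, 24), 75, 0),
    (50, datetime(2021, 4, 8, 11, 8), 34, 0),
    (50, datetime(2021, 4, 15, 13, 9), 32, 1),
    (50, datetime(2021, 4, 16, 14, 23), 12, 0),
    (50, datetime(2021, 4, 29, 14, 34), 90, 0),
]
after_events = totals_after_events(sample)
```

The first event (4/2/21 09:14) picks up the rows on 4/5 and 4/8, and the second event (4/15/21 13:09) picks up only the 4/16 row, because 4/29 is more than 7 days later.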

SELECT record based upon dates

Assuming data such as the following:
ID EffDate Rate
1 12/12/2011 100
1 01/01/2012 110
1 02/01/2012 120
2 01/01/2012 40
2 02/01/2012 50
3 01/01/2012 25
3 03/01/2012 30
3 05/01/2012 35
How would I find the rate for ID 2 as of 1/15/2012?
Or, the rate for ID 1 for 1/15/2012?
In other words, how do I do a query that finds the correct rate when the date falls between the EffDate for two records? (Rate should be for the date prior to the selected date).
Thanks,
John
How about this:
SELECT Rate
FROM Table1
WHERE ID = 1 AND EffDate = (
SELECT MAX(EffDate)
FROM Table1
WHERE ID = 1 AND EffDate <= '2012-01-15');
I assume here that the ID/EffDate pair is unique across the table (the opposite wouldn't make sense).
SELECT TOP 1 Rate FROM the_table
WHERE ID=whatever AND EffDate <='whatever'
ORDER BY EffDate DESC
if I read you right.
(edited to suit my idea of ms-sql which I have no idea about).
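Both queries above do the same "as of" lookup: among the rows for an ID whose EffDate is on or before the requested date, take the one with the latest EffDate. A minimal Python sketch of that rule follows; the function name `rate_as_of` is hypothetical, and the sample dates are read as month/day/year.

```python
from datetime import date


def rate_as_of(rows, item_id, as_of):
    """rows: list of (id, eff_date, rate). Returns the rate in effect on as_of,
    or None when no row for that id is effective yet."""
    candidates = [
        (eff_date, rate)
        for row_id, eff_date, rate in rows
        if row_id == item_id and eff_date <= as_of
    ]
    if not candidates:
        return None
    # The latest effective date wins (ID/EffDate pairs are assumed unique).
    return max(candidates)[1]


# Sample data from the question.
rates = [
    (1, date(2011, 12, 12), 100),
    (1, date(2012, 1, 1), 110),
    (1, date(2012, 2, 1), 120),
    (2, date(2012, 1, 1), 40),
    (2, date(2012, 2, 1), 50),
    (3, date(2012, 1, 1), 25),
    (3, date(2012, 3, 1), 30),
    (3, date(2012, 5, 1), 35),
]
rate_id_2 = rate_as_of(rates, 2, date(2012, 1, 15))
rate_id_1 = rate_as_of(rates, 1, date(2012, 1, 15))
```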

tsql PIVOT function

Need help with the following query:
Current Data format:
StudentID EnrolledStartTime EnrolledEndTime
1 7/18/2011 1.00 AM 7/18/2011 1.05 AM
2 7/18/2011 1.00 AM 7/18/2011 1.09 AM
3 7/18/2011 1.20 AM 7/18/2011 1.40 AM
4 7/18/2011 1.50 AM 7/18/2011 1.59 AM
5 7/19/2011 1.00 AM 7/19/2011 1.05 AM
6 7/19/2011 1.00 AM 7/19/2011 1.09 AM
7 7/19/2011 1.20 AM 7/19/2011 1.40 AM
8 7/19/2011 1.10 AM 7/19/2011 1.59 AM
I would like to calculate the time difference between EnrolledEndTime and EnrolledStartTime, group it into 15-minute intervals, and count the students that enrolled in each interval.
Expected Result :
Count(StudentID) Date 0-15Mins 16-30Mins 31-45Mins 46-60Mins
4 7/18/2011 3 1 0 0
4 7/19/2011 2 1 0 1
Can I use the PIVOT function to achieve the required result? Any pointers would be helpful.
Create a table variable or temp table that includes all the columns from the original table, plus one column that marks each row as 0, 16, 31 or 46. Then
SELECT * FROM <temp table name> PIVOT (COUNT(StudentID) FOR <new column name> IN ([0], [16], [31], [46])) AS p
That should put you pretty close.
It's possible (just see the basic pivot instructions here: http://msdn.microsoft.com/en-us/library/ms177410.aspx), but one problem you'll have using pivot is that you need to know ahead of time which columns you want to pivot into.
E.g., you mention 0-15, 16-30, etc. but actually, you have no idea how long some students might take -- some might take 24-hours, or your full session timeout, or what have you.
So to alleviate this problem, I'd suggest having a final column as a catch-all, labeled something like '>60'.
Other than that, just do a select on this table, selecting the student ID, the date, and a CASE statement, and you'll have everything you need to work the pivot on.
CASE WHEN DATEDIFF(minute, date1, date2) <= 15 THEN '0-15' WHEN DATEDIFF(minute, date1, date2) <= 30 THEN '16-30' ... ELSE '>60' END.
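As a rough illustration of the bucketing step (the CASE expression plus the count per date), here is a Python sketch run on the sample rows. The last row's end time is taken as 7/19/2011 1:59 AM, and the names `bucket_label` and `count_by_date_and_bucket` as well as the '>60Mins' label are assumptions for the example.

```python
from collections import Counter
from datetime import datetime


def bucket_label(minutes):
    """Map a duration in minutes to its 15-minute bucket, with a catch-all."""
    if minutes <= 15:
        return "0-15Mins"
    if minutes <= 30:
        return "16-30Mins"
    if minutes <= 45:
        return "31-45Mins"
    if minutes <= 60:
        return "46-60Mins"
    return ">60Mins"


def count_by_date_and_bucket(enrollments):
    """enrollments: list of (start, end) datetimes.

    Returns a Counter keyed by (start date, bucket label)."""
    counts = Counter()
    for start, end in enrollments:
        minutes = int((end - start).total_seconds() // 60)
        counts[(start.date(), bucket_label(minutes))] += 1
    return counts


enrollments = [
    (datetime(2011, 7, 18, 1, 0), datetime(2011, 7, 18, 1, 5)),
    (datetime(2011, 7, 18, 1, 0), datetime(2011, 7, 18, 1, 9)),
    (datetime(2011, 7, 18, 1, 20), datetime(2011, 7, 18, 1, 40)),
    (datetime(2011, 7, 18, 1, 50), datetime(2011, 7, 18, 1, 59)),
    (datetime(2011, 7, 19, 1, 0), datetime(2011, 7, 19, 1, 5)),
    (datetime(2011, 7, 19, 1, 0), datetime(2011, 7, 19, 1, 9)),
    (datetime(2011, 7, 19, 1, 20), datetime(2011, 7, 19, 1, 40)),
    (datetime(2011, 7, 19, 1, 10), datetime(2011, 7, 19, 1, 59)),
]
counts = count_by_date_and_bucket(enrollments)
```

Pivoting is then just a matter of laying the bucket labels out as columns for each date; buckets with no students simply have a count of 0.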
I have an old version of MS SQL Server that doesn't support PIVOT, so I did my best but couldn't test the pivot part. The rest of the SQL gives you exactly the data needed for the pivot. If you can accept NULL instead of 0, it can be written a lot more simply: you can skip the subselect defined in "with a ...".
declare @t table (EnrolledStartTime datetime, EnrolledEndTime datetime)
insert @t values('2011/7/18 01:00', '2011/7/18 01:05')
insert @t values('2011/7/18 01:00', '2011/7/18 01:09')
insert @t values('2011/7/18 01:20', '2011/7/18 01:40')
insert @t values('2011/7/18 01:50', '2011/7/18 01:59')
insert @t values('2011/7/19 01:00', '2011/7/19 01:05')
insert @t values('2011/7/19 01:00', '2011/7/19 01:09')
insert @t values('2011/7/19 01:20', '2011/7/19 01:40')
insert @t values('2011/7/19 01:10', '2011/7/19 01:59')
;with a
as
(select * from
-- one row per distinct date (time part stripped) and per time bucket
(select distinct dateadd(day, datediff(day, 0, EnrolledStartTime), 0) date from @t) dates
cross join
(select '0-15Mins' t, 0 group1 union select '16-30Mins', 1 union select '31-45Mins', 2 union select '46-60Mins', 3) i)
, b as
-- bucket number per enrollment: 0 for 0-15 minutes, 1 for 16-30, and so on
(select (datediff(minute, EnrolledStartTime, EnrolledEndTime)-1)/15 group1, dateadd(day, datediff(day, 0, EnrolledStartTime), 0) date
from @t)
select count(b.date) count, a.date, a.t, a.group1 from a
left join b
on a.group1 = b.group1
and a.date = b.date
group by a.date, a.t, a.group1
-- PIVOT(max(date)
-- FOR group1
-- in(['0-15Mins'], ['16-30Mins'], ['31-45Mins'], ['46-60Mins'])AS p

How to find the average of certain records T-SQL

I have a table variable that I am dumping data into:
DECLARE @TmpTbl_SKUs AS TABLE
(
Vendor VARCHAR (255),
Number VARCHAR(4),
SKU VARCHAR(20),
PurchaseOrderDate DATETIME,
LastReceivedDate DATETIME,
DaysDifference INT
)
Some records don't have a purchase order date or last received date, so the days difference is null as well. I have tried a lot of inner joins of the table on itself, but the query takes too long or returns incorrect results most of the time.
Is it possible to get the average days difference per SKU? How would I check whether there is only 1 record for that SKU? I need the data: if there is only 1 record, then I have to find the average at a champvendor level.
Here is the structure:
Vendor has many Numbers and Numbers has many SKUs
Any help would be great, I can't seem to crack this one, nor can I find anything related to this online. Thanks in advance.
Here is some sample data:
Vendor Number SKU PurchaseOrderDate LastReceivedDate DaysDifference
OTHER PMDD 1111 OP1111 2009-08-21 00:00:00.000 2009-09-02 00:00:00.000 12
OTHER PMDD 1111 OP1112 2009-12-09 00:00:00.000 2009-12-17 00:00:00.000 8
MANTOR 3333 MA1111 2006-02-15 00:00:00.000 2006-02-23 00:00:00.000 8
MANTOR 3333 MA1112 2006-02-15 00:00:00.000 2006-02-23 00:00:00.000 8
I'm sorry I may have written this wrong. If there is only 1 SKU for a record, then I want to return the DaysDifference (if it's not null), if it has more than 1 record and they are not null, then return the average days difference. If it is all nulls, then at a vendor level check for the average of the skus that are not null, otherwise it should just return 7. This is what I have tried:
SELECT t1.SKU, ISNULL
(
AVG(t1.DaysDifference),
(
SELECT ISNULL(AVG(t2.DaysDifference), 7)
FROM @TmpTbl_SKUs t2
WHERE t2.SKU=t1.SKU
GROUP BY t2.ChampVendor, t2.VendorNumber, t2.SKU
)
)
FROM @TmpTbl_SKUs t1
GROUP BY t1.SKU
I keep playing with this. I have it partly working, but I don't understand how to check whether a SKU has multiple records, or how to check at a vendor level.
Try this:
EDITED: added NULLIF(..., 0) to treat 0s as NULLs.
SELECT
t1.SKU,
COALESCE(
NULLIF(AVG(t1.DaysDifference), 0),
NULLIF(t2.AvgDifferenceVendor, 0),
7
) AS AvgDiff
FROM @TmpTbl_SKUs t1
INNER JOIN (
SELECT Vendor, AVG(DaysDifference) AS AvgDifferenceVendor
FROM @TmpTbl_SKUs
GROUP BY Vendor
) t2 ON t1.Vendor = t2.Vendor
GROUP BY t1.SKU, t2.AvgDifferenceVendor
EDIT 2: how I tested the script.
For testing I'm using the sample data posted with the question.
DECLARE @TmpTbl_SKUs AS TABLE
(
Vendor VARCHAR (255),
Number VARCHAR(4),
SKU VARCHAR(20),
PurchaseOrderDate DATETIME,
LastReceivedDate DATETIME,
DaysDifference INT
)
INSERT INTO @TmpTbl_SKUs
(Vendor, Number, SKU, PurchaseOrderDate, LastReceivedDate, DaysDifference)
SELECT 'OTHER PMDD', '1111', 'OP1111', '2009-08-21 00:00:00.000', '2009-09-02 00:00:00.000', 12
UNION ALL
SELECT 'OTHER PMDD', '1111', 'OP1112', '2009-12-09 00:00:00.000', '2009-12-17 00:00:00.000', 8
UNION ALL
SELECT 'MANTOR', '3333', 'MA1111', '2006-02-15 00:00:00.000', '2006-02-23 00:00:00.000', 8
UNION ALL
SELECT 'MANTOR', '3333', 'MA1112', '2006-02-15 00:00:00.000', '2006-02-23 00:00:00.000', 8;
First I'm running the script on the unmodified data. Here's the result:
SKU AvgDiff
-------------------- -----------
MA1111 8
MA1112 8
OP1111 12
OP1112 8
AvgDiff for every SKU is identical to the original DaysDifference for every SKU, because there's only one row per each one.
Now I'm changing DaysDifference for SKU='MA1111' to 0 and running the script again. The result is:
SKU AvgDiff
-------------------- -----------
MA1111 4
MA1112 8
OP1111 12
OP1112 8
Now AvgDiff for MA1111 is 4. Why? Because the average for the SKU is 0, and so the average by Vendor is taken, which has been calculated as (0 + 8) / 2 = 4.
Next step is to set DaysDifference to 0 for all the SKUs of the same Vendor. In this case I'm setting it for SKUs MA1111 and MA1112. Here's the result of the script for this change:
SKU AvgDiff
-------------------- -----------
MA1111 7
MA1112 7
OP1111 12
OP1112 8
So now AvgDiff is 7 for both MA1111 and MA1112. How has it become so? Both have DaysDifference = 0. That means that the average by Vendor should be taken for each one. But Vendor average is 0 too in this case. According to the requirement, the average here should default to 7, which is what the script has returned.
So the script seems to be working correctly. I understand that it's either me having missed something or you having forgotten to mention some details. In any case, I would be glad to see where this script fails to solve your problem.
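For anyone who wants to trace the fallback chain by hand, here is a minimal Python sketch of the same COALESCE/NULLIF logic: SKU average first, then vendor average, then the default of 7, with 0 and NULL both treated as "missing". The helper names `int_avg` and `sku_averages` are made up for this example, and integer division stands in for AVG over an INT column (equivalent for non-negative values).

```python
from collections import defaultdict


def int_avg(values):
    """Integer average that skips None, similar to AVG over an INT column."""
    present = [v for v in values if v is not None]
    return sum(present) // len(present) if present else None


def sku_averages(rows, default=7):
    """rows: list of (vendor, sku, days_difference); days_difference may be None.

    Returns {sku: average}, falling back from SKU level to vendor level to
    `default` whenever an average is None or 0."""
    by_sku = defaultdict(list)
    by_vendor = defaultdict(list)
    for vendor, sku, diff in rows:
        by_sku[(vendor, sku)].append(diff)
        by_vendor[vendor].append(diff)
    # `or` skips both None and 0, mirroring COALESCE(NULLIF(..., 0), ...).
    return {
        sku: int_avg(diffs) or int_avg(by_vendor[vendor]) or default
        for (vendor, sku), diffs in by_sku.items()
    }


# Sample data posted with the question.
sample = [
    ("OTHER PMDD", "OP1111", 12),
    ("OTHER PMDD", "OP1112", 8),
    ("MANTOR", "MA1111", 8),
    ("MANTOR", "MA1112", 8),
]
averages = sku_averages(sample)
```

Setting MA1111 to 0 in `sample` should fall back to the MANTOR vendor average of 4, and setting both MANTOR rows to 0 should fall through to the default of 7, matching the three test runs described above.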

Insert rownumber repeatedly in records in t-sql

I want to add a row number to my records that counts rows within a fixed range and then starts over. Example output:
RowNumber ID Name
1 20 a
2 21 b
3 22 c
1 23 d
2 24 e
3 25 f
1 26 g
2 27 h
3 28 i
1 29 j
2 30 k
I tried using row_number() over (partition by column order by column), but my real records don't contain a column I could partition by to get a 1-3 row number.
I have already tried looping over each record to insert a row count of 1-3, but the loop hurts the performance of the query. The query will be used for an RDL report, which is why its performance must be as good as possible.
Any suggestions are welcome. Thanks.
Have you tried applying modulo to row_number()?
SELECT
((row_number() over (order by ID) - 1) % 3) + 1 as RowNumber,
ID,
Name
FROM your_table
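The arithmetic is easy to check outside SQL. Here is a tiny Python sketch of the same modulo trick, using the IDs from the example output; the function name `cyclic_row_numbers` is just for illustration.

```python
def cyclic_row_numbers(ids, cycle=3):
    """Pair each id (in ascending order) with a number cycling 1..cycle."""
    return [
        (position % cycle + 1, item_id)
        for position, item_id in enumerate(sorted(ids))
    ]


# IDs 20 to 30, as in the example output from the question.
numbered = cyclic_row_numbers(range(20, 31))
```

`enumerate` starts at 0, which plays the role of `row_number() - 1` in the query, so adding 1 after the modulo gives 1, 2, 3, 1, 2, 3 and so on.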