Sql Server - Running Totals Based on Conditions - tsql

I have been banging my head trying to come up with the correct logic (SQL Server 2012) needed to achieve something I would imagine would be fairly routine but I have been unable to find any examples of this anywhere. Basically, I have 3 columns in a table: product, flag, value. It is possible for a product to be listed multiple times within the table but only once with a unique flag (i.e. product1 can have flag1 or flag2 with different/identical but there will never be 2 records with product1 and flag1 and different/identical values).
The flag represents a pre-defined value (1,2,3,4) and the intention behind this field is to be able to assign a unique mathematical equation based on the value of the flag. The end result would yield a single product, the unique flag, and a new cumulative total based on the mathematical equation output. For instance, let's say product1 was listed 4 times with flag values of flag1, flag2, flag3, flag4 (see below):
Product-----Flag-----Value
Product1----Flag1----1.00
Product1----Flag2----3.00
Product1----Flag3----5.00
Product1----Flag4----7.00
Product-----Flag-----Value
Product1----Flag1----1.00 (flag1 value)
Product1----Flag2----4.00 (flag1+flag2 value)
Product1----Flag3----6.00 (flag1+flag3 value)
Product1----Flag4----10.00 (flag2+flag4 value)
Flag1 is defined as add flag1 only. Flag2 is defined as add flag1 and flag2. Flag 3 is defined as add flag1 and flag 3. Flag 4 is defined as add flag2 and flag4. the new output would be product1 listed four times with flag values of flag1, flag2, flag3, flag4 but new values as flag1, flag1_flag2, flag1+flag3, flag2+flag4.
I have tried to apply the logic via a case statement but I can't figure out how to traverse all the products for each condition and I have tried to go with a running totals solution but I am not sure how to incorporate the flag condition into it so it only performs a running total for when those conditions are true. Any assistance and/or article to help get me going down the right path would be greatly appreciated.

While I'm not sure I fully understand your question I think this might be what you want. For this to work it assumes flag1 is always present when flags 1 through 3 are and that flag2 is present when flag4 is.
;with cte as (
select
product,
max(case when flag = 'Flag1' then Value end) as f1Value,
max(case when flag = 'Flag2' then Value end) as f2Value,
max(case when flag = 'Flag3' then Value end) as f3Value,
max(case when flag = 'Flag4' then Value end) as f4Value
from flags group by Product
)
select
flags.Product,
flags.Flag,
flags.Value as "Org. value",
case flag
when 'Flag1' then f1Value
when 'Flag2' then f1Value + f2Value
when 'Flag3' then f1Value + f3Value
when 'Flag4' then f2Value + f4Value
else flags.Value -- take the present value when flag is not Flag1-4
end as "New value"
from flags
inner join cte on flags.Product = cte.Product
Take a look at this Sample SQL Fiddle to see it in action.

You can join a table to itself, and pick the conditions appropriately:
SELECT p1.product,p1.Flag,p1.Value + COALESCE(p2.Value,0)
FROM
Products p1
left join
Products p2
on
p1.Product = p2.Product and
p2.Flag = CASE p1.Flag
--1 doesn't need a previous value
WHEN 2 THEN 1
WHEN 3 THEN 1
WHEN 4 THEN 2
END

I assumed and tried on Range values.
CREATE TABLE #tmp (Product VARCHAR(10), flag VARCHAR(10),value numeric(13,2))
GO
INSERT INTO #tmp
SELECT 'Product1' , 'Flag1',1
UNION
SELECT 'Product1' , 'Flag2',3
UNION
SELECT 'Product1' , 'Flag3',5
UNION
SELECT 'Product1' , 'Flag4',7
GO
;WITH cte
AS
(
SELECT row_number () OVER(
ORDER BY flag) 'row',*
FROM #tmp
)
SELECT *,value 'RT'
FROM cte
WHERE row = 1
UNION
SELECT * ,(
SELECT cte.value
FROM cte
WHERE row = 1
) + value 'RT'
FROM cte
WHERE row BETWEEN 2
AND 3
UNION
SELECT * ,(
SELECT cte.value
FROM cte
WHERE row =2
) + value 'RT'
FROM cte
WHERE row >3
GO
DROP TABLE #tmp

Related

Update Multiple Columns in One Statement Based On a Field with the Same Value as the Column Name

Not sure if this is possible without some sort of Dynamic SQL or a Pivot (which I want to stay away from)... I have a report that displays total counts for various types/ various status combinations... These types and statuses are always going to be the same and present on the report, so returning no data for a specific combination yields a zero. As of right now there are only three caseTypes (Vegetation, BOA, and Zoning) and 8 statusTypes (see below).
I am first setting up the skeleton of the report using a temp table. I have been careful to name the temp table columns the same as what the "statusType" column will contain in my second table "#ReportData". Is there a way to update the different columns in "#FormattedData" based on the value of the "statusType" column in my second table?
Creation of Formatted Table (for report):
CREATE TABLE #FormattedReport (
caseType VARCHAR(50)
, underInvestigation INT NOT NULL DEFAULT 0
, closed INT NOT NULL DEFAULT 0
, closedDPW INT NOT NULL DEFAULT 0
, unsubtantiated INT NOT NULL DEFAULT 0
, currentlyMonitored INT NOT NULL DEFAULT 0
, judicialProceedings INT NOT NULL DEFAULT 0
, pendingCourtAction INT NOT NULL DEFAULT 0
, other INT NOT NULL DEFAULT 0
)
INSERT INTO #FormattedReport (caseType) VALUES ('Vegetation')
INSERT INTO #FormattedReport (caseType) VALUES ('BOA')
INSERT INTO #FormattedReport (caseType) VALUES ('Zoning')
Creation of Data Table (to populate #FormattedReport):
SELECT B.Name AS caseType, C.Name AS StatusType, COUNT(*) AS Amount
INTO #ReportData
FROM table1 A
INNER JOIN table2 B ...
INNER JOIN table3 C ...
WHERE ...
GROUP BY B.Name, C.Name
CURRENT Update Statement (Currently will be 1 update per column in #FormattedReport):
UPDATE A SET underInvestigation = Amount FROM #ReportData B
INNER JOIN #FormattedReport A ON B.CaseType LIKE CONCAT('%', A.caseType, '%')
WHERE B.StatusType = 'Under Investigation'
UPDATE A SET closed = Amount FROM #ReportData B
INNER JOIN #FormattedReport A ON B.CaseType LIKE CONCAT('%', A.caseType, '%')
WHERE B.StatusType = 'Closed'
...
REQUESTED Update Statement: Would like to have ONE update statement knowing which column to update when "#ReportData.statusType" is the same as a "#FormattedData" column's name. For my "other" column, I'll just do that one manually using a NOT IN.
Assuming I understand the question, I think you can use conditional aggregation for this:
;WITH CTE AS
(
SELECT CaseType
,SUM(CASE WHEN StatusType = 'Under Investigation' THEN Amount ELSE 0 END) As underInvestigation
,SUM(CASE WHEN StatusType = 'Closed' THEN Amount ELSE 0 END) As closed
-- ... More of the same
FROM #ReportData
GROUP BY CaseType
)
UPDATE A
SET underInvestigation = B.underInvestigation
,closed = b.closed
-- more of the same
FROM #FormattedReport A
INNER JOIN CTE B
ON B.CaseType LIKE CONCAT('%', A.caseType, '%')

lead and lag on large table 1billion rows

I have a table T as follows with 1 Billion records. Currently, this table has no Primary key or Indexes.
create table T(
day_c date,
str_c varchar2(20),
comm_c varchar2(20),
src_c varchar2(20)
);
some sample data:
insert into T
select to_date('20171011','yyyymmdd') day_c,'st1' str_c,'c1' comm_c,'s1' src_c from dual
union
select to_date('20171012','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171013','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171014','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171015','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171016','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171017','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171018','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171019','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171020','yyyymmdd'),'st1','c1','s1' from dual;
The expected result is to generate the date ranges for the changes in column src_c.
I have the following code snippet which provides the desired result. However, it is slow as the cost of running lag and lead is quite high on the table.
WITH EndsMarked AS (
SELECT
day_c,str_c,comm_c,src_c,
CASE WHEN src_c= LAG(src_c,1) OVER (ORDER BY day_c)
THEN 0 ELSE 1 END AS IS_START,
CASE WHEN src_c= LEAD(src_c,1) OVER (ORDER BY day_c)
THEN 0 ELSE 1 END AS IS_END
FROM T
), GroupsNumbered AS (
SELECT
day_c,str_c,comm_c,
src_c,
IS_START,
IS_END,
COUNT(CASE WHEN IS_START = 1 THEN 1 END)
OVER (ORDER BY day_c) AS GroupNum
FROM EndsMarked
WHERE IS_START=1 OR IS_END=1
)
SELECT
str_c,comm_c,src_c,
MIN(day_c) AS GROUP_START,
MAX(day_c) AS GROUP_END
FROM GroupsNumbered
GROUP BY str_c,comm_c, src_c,GroupNum
ORDER BY groupnum;
Output :
STR_C COMM_C SRC_C GROUP_START GROUP_END
st1 c1 s1 11-OCT-17 13-OCT-17
st1 c1 s2 14-OCT-17 16-OCT-17
st1 c1 s1 17-OCT-17 20-OCT-17
Any suggestion to speed up?
Oracle database :12c.
SGA Memory:20GB
Total CPU:22
Explain plan:
Order by day_c only, or do you need to partition by str_c and comm_c first? It seems so - in which case I am not sure your query is correct, and Sentinel's solution will need to be adjusted accordingly.
Then:
For some reason (which escapes me), it appears that the match_recognize clause (available only since Oracle 12.1) is faster than analytic functions, even when the work done seems to be the same.
In your problem, (1) you must read 1 billion rows from disk, which can't be done faster than the hardware allows (do you REALLY need to do this on all 1 billion rows, or should you archive a large portion of your table, perhaps after performing this identification of GROUP_START and GROUP_END)? (2) you must order the data by day_c no matter what method you use, and that is time consuming.
With that said, the tabibitosan method (see Sentinel's answer) will be faster than the start-of-group method (which is close to, but simpler than what you currently have).
The match_recognize solution, which will probably be faster than any solution based on analytic functions, looks like this:
select str_c, comm_c, src_c, group_start, group_end
from t
match_recognize(
partition by str_c, comm_c
order by day_c
measures x.src_c as src_c,
first(day_c) as group_start,
last(day_c) as group_end
pattern ( x y* )
define y as src_c = x.src_c
)
-- Add ORDER BY clause here, if needed
;
Here is a quick explanation of how this works; for developers who are not familiar with match_recognize, I provided links to a few good tutorials in a Comment below this Answer.
The match_recognize clause partitions the input rows by str_c and comm_c and orders them by day_c. So far this is exactly the same work that analytic functions do.
Then in the PATTERN and DEFINE clauses I declare and define two "classes" of rows, which will be flagged as X and Y, respectively. X is any row (there are no restrictions on it in the DEFINE clause). However, Y is restricted: it must have the same src_c as the last X row preceding it.
So, in each partition, and reading from the earliest row to the latest (within the partition), I am looking for any number of matches, where a match consists of an arbitrary row (marked X), followed by as many Y rows as possible; where Y means "same src_c as the first row in this match. So, this will identify sequences of rows where the src_c did not change.
For each match that is found, the clause will output the src_c value from the X row (which is the same, really, for all the rows in that match), and the first and the last value in the day_c column for that match. That is what we need to put in the SELECT clause of the overall query.
You can eliminate one CTE by using the Tabibito-san (Traveler) method:
with Groups as (
select t.*
, row_number() over (order by day_c)
- row_number() over (partition by str_c
, comm_c
, src_c
order by day_c) GroupNum
from t
)
select str_c
, comm_c
, src_c
, min(day_c) GROUP_START
, max(day_c) GROUP_END
from Groups
group by str_c
, comm_c
, src_c
, GroupNum

After doing CTE Select Order By and then Update, Update results are not ordered the same (TSQL)

The code is roughly like this:
WITH cte AS
(
SELECT TOP 4 id, due_date, check
FROM table_a a
INNER JOIN table_b b ON a.linkid = b.linkid
WHERE
b.status = 1
AND due_date > GetDate()
ORDER BY due_date, id
)
UPDATE cte
SET check = 1
OUTPUT
INSERTED.id,
INSERTED.due_date
Note: the actual data has same due_date.
When I ran the SELECT statement only inside the cte, I could get the result, for ex: 1, 2, 3, 4.
But after the UPDATE statement, the updated results are: 4, 1, 2, 3
Why is this (order-change) happening?
How to keep or re-order the results back to 1,2,3,4 in this same 1 query?
In MSDN https://msdn.microsoft.com/pl-pl/library/ms177564(v=sql.110).aspx you can read that
There is no guarantee that the order in which the changes are applied
to the table and the order in which the rows are inserted into the
output table or table variable will correspond.
Thats mean you can't solve your problem with only one query. But you still can use one batch to do what you need. Because your output don't guarantee the order then you have to save it in another table and order it after update. This code will return your output values in order that you assume:
declare #outputTable table( id int, due_date date);
with cte as (
select top 4 id, due_date, check
from table_a a
inner join table_b b on a.linkid = b.linkid
where b.status = 1
and due_date > GetDate()
order by due_date, id
)
update cte
set check = 1
output inserted.id, inserted.due_date
into #outputTable;
select *
from #outputTable
order by due_date, id;

How to match records for two different groups?

I have one main table called Event_log which contains all of the records that I need for this query. Within this table there is one column that I'm calling "Grp". To simplify things, assume that there are only two possible values for this Grp: A and B. So now we have one table, Event_log, with one column "Grp" and one more column called "Actual Date". Lastly I want to add one more Flag column to this table, which works as follows.
First, I order all of the records in descending order by date as demonstrated below. Then, I want to flag each Group "A" row with a 1 or a 0. For all "A" rows, if the previous record (earlier in date) = "B" row then I want to flag 1. Otherwise flag a 0. So this initial table looks like this before setting this flag:
Actual Date Grp Flag
1-29-13 A
12-27-12 B
12-26-12 B
12-23-12 A
12-22-12 A
But after these calculations are done, it should look like this:
Actual Date Grp Flag
1-29-13 A 1
12-27-12 B NULL
12-26-12 B NULL
12-23-12 A 0
12-22-12 A 0
How can I do this? This is simpler to describe than it is to query!
You can use something like:
select el.ActualDate
, el.Grp
, Flag = case
when el.grp = 'B' then null
when prev.grp = 'B' then 1
else 0
end
from Event_log el
outer apply
(
select top 1 prev.grp
from Event_log prev
where el.ActualDate > prev.ActualDate
order by prev.ActualDate desc
) prev
order by el.ActualDate desc
SQL Fiddle with demo.
Try this
;with cte as
(
SELECT CAST('01-29-13' As DateTime) ActualDate,'A' Grp
UNION ALL SELECT '12-27-12','B'
UNION ALL SELECT '12-26-12','B'
UNION ALL SELECT '12-23-12','A'
UNION ALL SELECT '12-22-12','A'
)
, CTE2 as
(
SELECT *, ROW_NUMBER() OVER (order by actualdate desc) rn
FROM cte
)
SELECT a.*,
case
when A.Grp = 'A' THEN
CASE WHEN b.Grp = 'B' THEN 1 ELSE 0 END
ELSE NULL
END Flag
from cte2 a
LEFT OUTER JOIN CTE2 b on a.rn + 1 = b.rn

query for a range of records in result

I am wondering if there is some easy way, a function, or other method to return data from a query with the following results.
I have a SQL Express DB 2008 R2, a table that contains numerical data in a given column, say col T.
I am given a value X in code and would like to return up to three records. The record where col T equals my value X, and the record before and after, and nothing else. The sort is done on col T. The record before may be beginning of file and therefore not exist, likewise, if X equals the last record then the record after would be non existent, end of file/table.
The value of X may not exist in the table.
This I think is similar to get a range of results in numerical order.
Any help or direction in solving this would be greatly appreciated.
Thanks again,
It might not be the most optimal solution, but:
SELECT T
FROM theTable
WHERE T = X
UNION ALL
SELECT *
FROM
(
SELECT TOP 1 T
FROM theTable
WHERE T > X
ORDER BY T
) blah
UNION ALL
SELECT *
FROM
(
SELECT TOP 1 T
FROM theTable
WHERE T < X
ORDER BY T DESC
) blah2
DECLARE #x int = 100
;WITH t as
(
select ROW_NUMBER() OVER (ORDER BY T ASC) AS row_nm,*
from YourTable
)
, t1 as
(
select *
from t
WHERE T = #x
)
select *
from t
CROSS APPLY t1
WHERE t.row_nm BETWEEN t1.row_nm -1 and t1.row_nm + 1