T-SQL: Determine Status Changes in History Table

I have an application that logs changes to records in the "production" table to a "history" table. The history table is basically a field-for-field copy of the production table, with a few extra columns such as last modified date, last modified by user, etc.
This works well because we get a snapshot of the record any time the record changes. However, it makes it hard to determine unique status changes to a record. An example is below.
BoxID  StatusID  SubStatusID  ModifiedTime
1      4         27           2011-08-11 15:31
1      4         11           2011-08-11 15:28
1      4         11           2011-08-10 09:07
1      5         14           2011-08-09 08:53
1      5         14           2011-08-09 08:19
1      4         11           2011-08-08 14:15
1      4         9            2011-07-27 15:52
1      4         9            2011-07-27 15:49
1      2         8            2011-07-26 12:00
As you can see in the above table (the data comes from the real system, with other fields removed for brevity and security), BoxID 1 has had 9 changes to the production record. Some of those updates changed the status and some did not, which means other fields (not shown) changed.
I need to extract the unique status changes from this data in T-SQL. Given the input table above, the output I am looking for is:
BoxID  StatusID  SubStatusID  ModifiedTime
1      4         27           2011-08-11 15:31
1      4         11           2011-08-10 09:07
1      5         14           2011-08-09 08:19
1      4         11           2011-08-08 14:15
1      4         9            2011-07-27 15:49
1      2         8            2011-07-26 12:00
This is not as easy as grouping by StatusID and SubStatusID, taking MIN(ModifiedTime) and joining back to the history table, because statuses can also go backwards (StatusID 4, SubStatusID 11 is set twice).
Any help would be greatly appreciated!

Does this work for you?
;WITH Boxes_CTE AS
(
    -- Number each box's history rows from oldest to newest
    SELECT BoxID, StatusID, SubStatusID, ModifiedTime,
           ROW_NUMBER() OVER (PARTITION BY BoxID ORDER BY ModifiedTime) AS Seq
    FROM Boxes
)
SELECT b1.BoxID, b1.StatusID, b1.SubStatusID, b1.ModifiedTime
FROM Boxes_CTE b1
-- Pair each row with the row immediately before it for the same box
LEFT OUTER JOIN Boxes_CTE b2 ON b1.BoxID = b2.BoxID
                            AND b1.Seq = b2.Seq + 1
WHERE b1.StatusID <> b2.StatusID        -- status changed
   OR b1.SubStatusID <> b2.SubStatusID  -- sub-status changed
   OR b2.BoxID IS NULL                  -- first row for the box (no predecessor)
ORDER BY b1.ModifiedTime DESC;
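On SQL Server 2012 or later, LAG can read the previous row directly instead of self-joining on ROW_NUMBER. Below is a minimal sketch of that variant, run through Python's built-in sqlite3 module so it can be tried without a SQL Server instance; it assumes SQLite 3.25 or newer (the first release with window functions) and reuses the table name Boxes from the answer above.

```python
import sqlite3

# Sample history rows from the question: (BoxID, StatusID, SubStatusID, ModifiedTime)
ROWS = [
    (1, 4, 27, "2011-08-11 15:31"),
    (1, 4, 11, "2011-08-11 15:28"),
    (1, 4, 11, "2011-08-10 09:07"),
    (1, 5, 14, "2011-08-09 08:53"),
    (1, 5, 14, "2011-08-09 08:19"),
    (1, 4, 11, "2011-08-08 14:15"),
    (1, 4, 9, "2011-07-27 15:52"),
    (1, 4, 9, "2011-07-27 15:49"),
    (1, 2, 8, "2011-07-26 12:00"),
]

# Keep a row only when its (StatusID, SubStatusID) pair differs from the
# previous row for the same box; the first row per box has no previous row.
QUERY = """
WITH ordered AS (
    SELECT BoxID, StatusID, SubStatusID, ModifiedTime,
           LAG(StatusID)    OVER (PARTITION BY BoxID ORDER BY ModifiedTime) AS PrevStatusID,
           LAG(SubStatusID) OVER (PARTITION BY BoxID ORDER BY ModifiedTime) AS PrevSubStatusID
    FROM Boxes
)
SELECT BoxID, StatusID, SubStatusID, ModifiedTime
FROM ordered
WHERE PrevStatusID IS NULL
   OR StatusID <> PrevStatusID
   OR SubStatusID <> PrevSubStatusID
ORDER BY ModifiedTime DESC
"""


def status_changes(rows):
    """Load rows into an in-memory table and return the status changes, newest first."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Boxes (BoxID INTEGER, StatusID INTEGER, "
        "SubStatusID INTEGER, ModifiedTime TEXT)"
    )
    con.executemany("INSERT INTO Boxes VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in status_changes(ROWS):
        print(row)
```

The SQL inside QUERY should carry over to T-SQL as is. NULL status values would need extra handling, since `<>` never evaluates to true against NULL.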

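SELECT CurrentStaty.BoxID, CurrentStaty.StatusID, CurrentStaty.SubStatusID, CurrentStaty.ModifiedTime
FROM Staty AS CurrentStaty
LEFT JOIN Staty AS PriorStaty
    ON PriorStaty.BoxID = CurrentStaty.BoxID
   -- the prior row is the latest row for the same box before the current one
   AND PriorStaty.ModifiedTime = (SELECT MAX(p.ModifiedTime)
                                  FROM Staty AS p
                                  WHERE p.BoxID = CurrentStaty.BoxID
                                    AND p.ModifiedTime < CurrentStaty.ModifiedTime)
WHERE PriorStaty.BoxID IS NULL    -- first row for the box has no prior row
   OR NOT (CurrentStaty.StatusID = PriorStaty.StatusID
           AND CurrentStaty.SubStatusID = PriorStaty.SubStatusID)
ORDER BY CurrentStaty.BoxID, CurrentStaty.ModifiedTime DESC;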

Related

Redshift SQL query for reducing years

I have data with the fields shown below:
id   grade  grade_id  year  Diff
101  5      7         2022  9
105  k      2         2021  2
106  4      6         2020  5
110  pk     1         2022  1
For every record in the table, I want to insert records for the same id, decrementing grade, grade_id, year and Diff by one each time, until grade reaches pk. For id 101 that looks like this:
id   grade  grade_id  year  Diff
101  5      7         2022  9
101  4      6         2021  8
101  3      5         2020  7
101  2      4         2019  6
101  1      3         2018  5
101  k      2         2017  4
101  pk     1         2016  3
I need help with the SQL code. Here is what I have so far:
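create table amish.cte_test
(id int,
 grade int,      -- note: the sample grades include 'k' and 'pk', which an int column cannot hold
 year int,
 diff int);

insert into amish.cte_test
values (101, 5, 2022, 9);

-- Decrement grade, year and diff by 1 per step.
-- The stop condition lets grade run down to -1, i.e. it appears to treat k as 0 and pk as -1.
with recursive temp1 (id, grade, year, diff) as
(
    select id, grade, year, diff from amish.cte_test
    union all
    select id, grade - 1, year - 1, diff - 1 from temp1
    where grade - 1 > -2
)
select * from temp1;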
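One possible approach is to drive the recursion from the numeric grade_id, which already reaches 1 at pk, and derive the grade label from it afterwards. This is a minimal sketch run through Python's sqlite3 module. The mapping (grade_id 1 is pk, 2 is k, and n of 3 or more is grade n - 2) is an assumption read off the sample rows, and the schema prefix amish. is dropped because SQLite has no such schema. Redshift also accepts WITH RECURSIVE with a column list, but this exact statement has not been tested there.

```python
import sqlite3

# Source rows from the question: (id, grade, grade_id, year, diff)
SOURCE = [
    (101, "5", 7, 2022, 9),
    (105, "k", 2, 2021, 2),
    (106, "4", 6, 2020, 5),
    (110, "pk", 1, 2022, 1),
]

# Recursion is driven by grade_id, which is 1 on the 'pk' row.
# The grade label is derived from grade_id, ASSUMING the mapping seen in the
# sample: 1 -> 'pk', 2 -> 'k', n >= 3 -> grade n - 2.
QUERY = """
WITH RECURSIVE expanded (id, grade_id, year, diff) AS (
    SELECT id, grade_id, year, diff
    FROM cte_test
    UNION ALL
    SELECT id, grade_id - 1, year - 1, diff - 1
    FROM expanded
    WHERE grade_id > 1          -- stop once the 'pk' row has been produced
)
SELECT id,
       CASE grade_id
           WHEN 1 THEN 'pk'
           WHEN 2 THEN 'k'
           ELSE CAST(grade_id - 2 AS TEXT)
       END AS grade,
       grade_id, year, diff
FROM expanded
ORDER BY id, grade_id DESC
"""


def expand_grades(rows):
    """Return every generated row, ordered by id and then from highest grade_id down."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE cte_test (id INTEGER, grade TEXT, grade_id INTEGER, "
        "year INTEGER, diff INTEGER)"
    )
    con.executemany("INSERT INTO cte_test VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in expand_grades(SOURCE):
        print(row)
```

Storing grade as text avoids the int-column problem with k and pk, and the numeric grade_id gives the recursion a natural stopping point.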

Trouble getting rid of duplicate rows

I have base data that has multiple lab items (A, B, C) that occur on the same date.
id datetime_1 order_datetime item value
-----------------------------------------------------
1 9/1/21 09:57 9/2/21 04:21 A 13
1 9/1/21 09:57 9/2/21 04:21 B 8
1 9/1/21 09:57 9/2/21 04:21 C 11
1 9/1/21 09:57 9/3/21 16:00 A 10
1 9/1/21 09:57 9/3/21 16:00 B 4
1 9/1/21 09:57 9/3/21 16:00 C 7
1 9/2/21 02:30 9/2/21 04:21 A 13
1 9/2/21 02:30 9/2/21 04:21 B 8
1 9/2/21 02:30 9/2/21 04:21 C 11
1 9/2/21 02:30 9/3/21 16:00 A 10
1 9/2/21 02:30 9/3/21 16:00 B 4
1 9/2/21 02:30 9/3/21 16:00 C 7
I need the output to look like this:
id datetime_1 a_level b_level c_level
------------------------------------------------
1 9/1/21 09:57 13 8 11
1 9/2/21 02:30 13 8 11
My current code is:
with lab_setup as (
select id, datetime_1, row_number() over (partition by id, datetime_1 order by order_datetime) as lab_order)
from data
group by id, datetime_1, order_datetime
)
, lab_first as (
select id, datetime_1,
max(case when item = 'A' then value end) as a_level,
max(case when item = 'B' then value end) as b_level,
max(case when item = 'C' then value end) as c_level
from lab_setup
group by id, datetime_1, item, value
)
select *
from lab_first
group by id, datetime_1, a_level, b_level, c_level
The problem is that I keep getting duplicate rows in response to this code, looking like:
id datetime_1 a_level b_level c_level
------------------------------------------------
1 9/1/21 09:57 13 null null
1 9/1/21 09:57 null null 11
1 9/1/21 09:57 null 8 null
I've tried DISTINCT, GROUP BY and MAX(CASE WHEN ...), but so far it still returns multiple rows per datetime_1, which is not what I want. Does anyone have a clue how to merge these multiple rows into one?
You are close, but you have a lot of extra, unnecessary work that makes the query more complicated than needed. First, the query as posted does not produce the posted results; it seems that somewhere along the line you grabbed the wrong query. The posted query is not valid: the lab_first CTE uses item and value from lab_setup, but lab_setup contains neither of them. Further, the purpose of lab_setup seems to be deriving the column lab_order, but that column is never used afterwards. Finally, the main query selects only what lab_first already returns, without change. Thus neither CTE is needed; just put the pivoting (the MAX functions) into the main query. The duplicate rows come from the extra GROUP BY columns: grouping by item and value puts each item in its own group, so each output row has only one non-NULL level. So (see demo):
select id
, datetime_1
, max(case when item = 'A' then value end) as a_level
, max(case when item = 'B' then value end) as b_level
, max(case when item = 'C' then value end) as c_level
from data
group by id, datetime_1
order by id, datetime_1;
Note on the demo: your sample data just repeats the same target values and only changes the control values, so a matching result could simply be picking up the same values by chance. Further, a universe of one id is not sufficiently discriminating. For the demo I changed the value on a couple of rows and added a few more. The results look good, but you have to decide.
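MAX(CASE ...) returns the largest value per item across both order_datetime batches, which happens to equal the first batch in this sample. If the intent of the original lab_order column was to keep only the earliest order per (id, datetime_1), one way is to rank the orders first and pivot only rank 1. This is a minimal sketch via Python's sqlite3 module (SQLite 3.25+ for DENSE_RANK); the dates are rewritten as ISO strings so they sort correctly, and the table is assumed to be named data as in the answer.

```python
import sqlite3

# The two order batches from the question, keyed by order_datetime.
BATCHES = {
    "2021-09-02 04:21": {"A": 13, "B": 8, "C": 11},
    "2021-09-03 16:00": {"A": 10, "B": 4, "C": 7},
}

# Rows: (id, datetime_1, order_datetime, item, value); both datetime_1 values
# see both batches, exactly as in the question.
DATA = [
    (1, dt1, order_dt, item, value)
    for dt1 in ("2021-09-01 09:57", "2021-09-02 02:30")
    for order_dt, items in BATCHES.items()
    for item, value in items.items()
]

# DENSE_RANK gives every item of the earliest order the same rank 1,
# so only that batch survives the WHERE before pivoting.
QUERY = """
WITH ranked AS (
    SELECT id, datetime_1, item, value,
           DENSE_RANK() OVER (PARTITION BY id, datetime_1
                              ORDER BY order_datetime) AS lab_order
    FROM data
)
SELECT id, datetime_1,
       MAX(CASE WHEN item = 'A' THEN value END) AS a_level,
       MAX(CASE WHEN item = 'B' THEN value END) AS b_level,
       MAX(CASE WHEN item = 'C' THEN value END) AS c_level
FROM ranked
WHERE lab_order = 1
GROUP BY id, datetime_1
ORDER BY id, datetime_1
"""


def first_order_levels(rows):
    """Pivot the earliest order's A/B/C values into one row per (id, datetime_1)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE data (id INTEGER, datetime_1 TEXT, "
        "order_datetime TEXT, item TEXT, value INTEGER)"
    )
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_order_levels(DATA):
        print(row)
```

Unlike a plain MAX, this keeps returning the first batch even if a later order has larger values.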

T-SQL counting particular values in one row with multiple columns

I have a small problem counting cells with a particular value in one row in SSMS.
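The table looks like this:
ID   | Month | 1    | 2    | 3 | 4 | 5    | 6 | 7 | 8    | 9    | 10 | 11 | 12 | 13 | 14 | 15   | 16   | 17 | 18 | 19 | 20 | 21 | 22   | ... | 31
5000 | 1     | null | null | 1 | 1 | null | 1 | 1 | null | null | 2  | 2  | 2  | 2  | 2  | null | null | 3  | 3  | 3  | 3  | 3  | null | ... | 1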
I need to count how many cells in one row have a particular value, for example 1. In this case it would be 5.
The data represents worker shifts in a month. Be aware that there is a column named Month (an FK with values 1-12); I don't want to count that in the result.
Column ID is ALWAYS a 4-digit number.
One possibility is to use COUNT(CASE WHEN ...), but the examples I found only have two or three columns, not 31, so the statement would be very long. Is there any other way to count it?
Thanks for any advice.
I'm going to strongly suggest that you abandon your current table design and instead store one day per record rather than one day per column. That is, use this design:
ID | Date | Value
5000 | 2021-01-01 | NULL
5000 | 2021-01-02 | NULL
5000 | 2021-01-03 | 1
5000 | 2021-01-04 | 1
5000 | 2021-01-05 | NULL
...
5000 | 2021-01-31 | 5
Then use this query:
SELECT
ID,
CONVERT(varchar(7), Date, 120) AS year_month,  -- style 120 truncated to 7 chars gives 'yyyy-mm'
COUNT(CASE WHEN Value = 1 THEN 1 END) AS one_cnt
FROM yourTable
GROUP BY
ID,
CONVERT(varchar(7), Date, 120);
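A minimal sketch of the suggested one-row-per-day design, run through Python's sqlite3 module. SQLite has no CONVERT, so strftime('%Y-%m', Date) stands in for CONVERT(varchar(7), Date, 120); yourTable is the answer's placeholder name, and January 2021 is an assumed month. Only the days actually shown in the question are loaded, because days 23 to 30 are elided there.

```python
import sqlite3

# Shift values for worker 5000, taken from the question's row.
# Days 23-30 are elided ("...") in the question, so they are simply omitted.
DAY_VALUES = {
    1: None, 2: None, 3: 1, 4: 1, 5: None, 6: 1, 7: 1, 8: None, 9: None,
    10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: None, 16: None,
    17: 3, 18: 3, 19: 3, 20: 3, 21: 3, 22: None, 31: 1,
}

# One output row per worker per month; COUNT ignores the NULLs the CASE yields
# for every value other than 1.
QUERY = """
SELECT ID,
       strftime('%Y-%m', Date) AS year_month,
       COUNT(CASE WHEN Value = 1 THEN 1 END) AS one_cnt
FROM yourTable
GROUP BY ID, strftime('%Y-%m', Date)
"""


def count_ones(day_values, worker_id=5000, year_month="2021-01"):
    """Store one row per day, then count the days whose value is 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (ID INTEGER, Date TEXT, Value INTEGER)")
    con.executemany(
        "INSERT INTO yourTable VALUES (?, ?, ?)",
        [(worker_id, f"{year_month}-{day:02d}", v) for day, v in day_values.items()],
    )
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(count_ones(DAY_VALUES))
```

Counting a different shift value only means changing the literal in the CASE, instead of editing 31 column references.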

Running Count Total with PostgreSQL

I'm fairly close to this solution, but I just need a little help getting over the end.
I'm trying to get a running count of the occurrences of each client_id regardless of the date; however, I need the dates and ids to still appear in my results so I can verify everything.
I found part of the solution here but have not been able to modify it enough for my needs.
Here is what the answer should be, counting the occurrences of each client_id sequentially:
id client_id deliver_on running_total
1 138 2017-10-01 1
2 29 2017-10-01 1
3 138 2017-10-01 2
4 29 2013-10-02 2
5 29 2013-10-02 3
6 29 2013-10-03 4
7 138 2013-10-03 3
However, here is what I'm getting:
id client_id deliver_on running_total
1 138 2017-10-01 1
2 29 2017-10-01 1
3 138 2017-10-01 1
4 29 2013-10-02 3
5 29 2013-10-02 3
6 29 2013-10-03 1
7 138 2013-10-03 2
Rather than counting the times the client_id appears sequentially, the code counts the time the id appears in the previous date range.
Here is my code and any help would be greatly appreciated.
Thank you,
SELECT n.id, n.client_id, n.deliver_on, COUNT(n.client_id) AS "running_total"
FROM orders n
LEFT JOIN orders o
ON (o.client_id = n.client_id
AND n.deliver_on > o.deliver_on)
GROUP BY n.id, n.deliver_on, n.client_id
ORDER BY n.deliver_on ASC
* EDIT WITH ANSWER *
I ended up solving my own question. Here is the solution, with comments:
-- Give every order a "1" to be counted later
WITH DATA AS (
    SELECT
        orders.id,
        orders.client_id,
        orders.deliver_on,
        COUNT(1) AS cnt  -- one row per order id, so this is always 1
    FROM orders
    GROUP BY 1           -- groups by orders.id (assumed to be the primary key)
)
SELECT
    id,
    client_id,
    deliver_on,
    -- Running count of each client_id's occurrences; id breaks ties between orders on the same date
    SUM(cnt) OVER (PARTITION BY client_id
                   ORDER BY deliver_on, id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM DATA
ORDER BY deliver_on, id;
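The CTE is not strictly needed: COUNT(*) used as a window function gives the running count directly. Here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+); the same window clause is valid in PostgreSQL. With the sample data, ordering the window by id reproduces the expected running_total column exactly, whereas ordering by deliver_on would put the 2013 rows first. Which column defines "sequential" is an assumption to check against the real data.

```python
import sqlite3

# Orders from the question: (id, client_id, deliver_on)
ORDERS = [
    (1, 138, "2017-10-01"),
    (2, 29, "2017-10-01"),
    (3, 138, "2017-10-01"),
    (4, 29, "2013-10-02"),
    (5, 29, "2013-10-02"),
    (6, 29, "2013-10-03"),
    (7, 138, "2013-10-03"),
]

# Running count per client: rows from the start of the client's partition
# up to and including the current row, in id order (ASSUMED sequence).
QUERY = """
SELECT id, client_id, deliver_on,
       COUNT(*) OVER (PARTITION BY client_id
                      ORDER BY id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders
ORDER BY id
"""


def running_totals(rows):
    """Return (id, client_id, deliver_on, running_total) for every order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, deliver_on TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in running_totals(ORDERS):
        print(row)
```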

How can I evaluate data over time in PostgreSQL?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is called the gaps-and-islands problem; specifically, it's an islands problem over a date range. You should:
Fix this question up.
Flag it to be sent to dba.stackexchange.com.
To solve this:
Reduce the posts to one row per user per calendar month with at least three posts (truncating to the month rather than extracting it, so that different years stay apart).
Create a pseudo column with a window function that is 1 when the row's month does not directly follow the preceding row's month.
Turn those flags into groups with a running COUNT().
Check that the count(*) for each group is greater than or equal to three.
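Query:
SELECT l.owneruserid,
       min(l.postmonth) AS first_month,
       max(l.postmonth) AS last_month,
       count(*)         AS months_in_a_row
FROM (
    SELECT m.owneruserid,
           m.postmonth,
           -- running count of resets numbers each island of consecutive months
           count(m.range_reset) OVER (PARTITION BY m.owneruserid ORDER BY m.postmonth) AS island
    FROM (
        SELECT owneruserid,
               postmonth,
               -- 1 when this month does NOT directly follow the user's previous qualifying month
               CASE
                   WHEN lag(postmonth) OVER (PARTITION BY owneruserid ORDER BY postmonth)
                        = postmonth - interval '1 month'
                   THEN NULL
                   ELSE 1
               END AS range_reset
        FROM (
            -- one row per user per calendar month with at least three posts
            SELECT owneruserid, date_trunc('month', creationdate) AS postmonth
            FROM posts
            GROUP BY owneruserid, date_trunc('month', creationdate)
            HAVING count(*) >= 3
        ) AS monthly
    ) AS m
) AS l
GROUP BY l.owneruserid, l.island
HAVING count(*) >= 3;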
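The steps above can be tried end to end in a small sketch using Python's sqlite3 module (SQLite 3.25+). SQLite has no date_trunc or interval arithmetic, so each month is numbered as year * 12 + month, which makes consecutive months consecutive integers even across a year boundary. The posts rows generated by make_posts are hypothetical test data.

```python
import sqlite3


def make_posts():
    """Hypothetical posts: (id, owneruserid, creationdate)."""
    spec = [
        (1, ["2021-01", "2021-02", "2021-03"], 3),  # three consecutive months
        (2, ["2021-01", "2021-03", "2021-04"], 3),  # gap in February
        (3, ["2020-11", "2020-12", "2021-01"], 3),  # run crosses a year boundary
        (4, ["2021-01", "2021-02", "2021-03"], 2),  # too few posts per month
    ]
    posts = []
    for user, months, per_month in spec:
        for ym in months:
            for day in range(1, per_month + 1):
                posts.append((len(posts) + 1, user, f"{ym}-{day:02d}"))
    return posts


QUERY = """
WITH post_months AS (
    -- year * 12 + month: consecutive months become consecutive integers
    SELECT owneruserid,
           CAST(strftime('%Y', creationdate) AS INTEGER) * 12
             + CAST(strftime('%m', creationdate) AS INTEGER) AS month_no
    FROM posts
),
monthly AS (
    -- one row per user per month with at least three posts
    SELECT owneruserid, month_no
    FROM post_months
    GROUP BY owneruserid, month_no
    HAVING COUNT(*) >= 3
),
flagged AS (
    -- 1 when this month does NOT directly follow the previous qualifying month
    SELECT owneruserid, month_no,
           CASE WHEN LAG(month_no) OVER (PARTITION BY owneruserid ORDER BY month_no)
                     = month_no - 1
                THEN NULL ELSE 1 END AS range_reset
    FROM monthly
),
islands AS (
    -- running count of resets numbers each run of consecutive months
    SELECT owneruserid, month_no,
           COUNT(range_reset) OVER (PARTITION BY owneruserid ORDER BY month_no) AS island
    FROM flagged
)
SELECT owneruserid, COUNT(*) AS months_in_a_row
FROM islands
GROUP BY owneruserid, island
HAVING COUNT(*) >= 3
ORDER BY owneruserid
"""


def users_with_streaks(posts):
    """Return (owneruserid, months_in_a_row) for every run of 3+ qualifying months."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE posts (id INTEGER, owneruserid INTEGER, creationdate TEXT)")
    con.executemany("INSERT INTO posts VALUES (?, ?, ?)", posts)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(users_with_streaks(make_posts()))
```

User 2 is excluded by the February gap and user 4 by the per-month threshold, while user 3 shows why grouping on extract(month) alone would mishandle runs that span December and January.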