trouble getting rid of duplicate rows

trouble getting rid of duplicate rows - postgresql

I have base data that has multiple lab items (A, B, C) that occur on same date.
id datetime_1 order_datetime item value
-----------------------------------------------------
1 9/1/21 09:57 9/2/21 04:21 A 13
1 9/1/21 09:57 9/2/21 04:21 B 8
1 9/1/21 09:57 9/2/21 04:21 C 11
1 9/1/21 09:57 9/3/21 16:00 A 10
1 9/1/21 09:57 9/3/21 16:00 B 4
1 9/1/21 09:57 9/3/21 16:00 C 7
1 9/2/21 02:30 9/2/21 04:21 A 13
1 9/2/21 02:30 9/2/21 04:21 B 8
1 9/2/21 02:30 9/2/21 04:21 C 11
1 9/2/21 02:30 9/3/21 16:00 A 10
1 9/2/21 02:30 9/3/21 16:00 B 4
1 9/2/21 02:30 9/3/21 16:00 C 7
I need output to show as :
id datetime_1 a_level b_level c_level
------------------------------------------------
1 9/1/21 09:57 13 8 11
1 9/2/21 02:30 13 8 11
My current code is:
with lab_setup as (
select id, datetime_1, row_number() over (partition by id, datetime_1 order by order_datetime) as lab_order)
from data
group by id, datetime_1, order_datetime
)
, lab_first as (
select id, datetime_1,
max(case when item = 'A' then value end) as a_level,
max(case when item = 'B' then value end) as b_level,
max(case when item = 'C' then value end) as c_level
from lab_setup
group by id, datetime_1, item, value
)
select *
from lab_first
group by id, datetime_1, a_level, b_level, c_level
The problem is that I keep getting duplicate rows in response to this code, looking like:
id datetime_1 a_level b_level c_level
------------------------------------------------
1 9/1/21 09:57 13 null null
1 9/1/21 09:57 null null 11
1 9/1/21 09:57 null 8 null
I've tried distinct, group by, max(case when) but so far it still provides multiple rows per datetime_1, which is not what I want. Does anyone have clue how to help merge these multiple rows into one?

You are close, but have a lot of extra, unnecessary work, making the query more complicated then needed. First off the query as posted did not produce the posted results. Seems somewhere along the line you grabbed the wrong query. The posted query is not valid. The lab_first cte uses item and value from lab_setup, but lab_setup does not contain either of them. Further it seems the purpose on of lab first is to derive the column lab_order, but that is not used afterward. Finally the main query selects only what would have been selected from the lab first cte without change. Thus neither cte is needed. Just incorporate the filtering (max functions) into the main. So (see demo)
select id
, datetime_1
, max(case when item = 'A' then value end) as a_level
, max(case when item = 'B' then value end) as b_level
, max(case when item = 'C' then value end) as c_level
from data
group by id, datetime_1
order by id, datetime_1;
Note on Demo: Just repeating the target data values but change the control values. How do I know afterword except for I not picking up the the exact same target values. Further a universe of 1 (id) is not sufficiently discriminating. For demo I changed of the value on a couple rows and added a few. The results look good, but you have to decide.

Related

Getting data from alternate dates of same ID column

I've a table data as below, now I need to fetch the record with in same code, where (Value2-Value1)*2 of one row >= (Value2-Value1) of consequtive date row. (all dates are uniform with in all codes)
---------------------------------------
code Date Value1 Value2
---------------------------------------
1 1-1-2018 13 14
1 2-1-2018 14 16
1 4-1-2018 15 18
2 1-1-2019 1 3
2 2-1-2018 2 3
2 4-1-2018 3 7
ex: output needs to be
1 1-1-2018 13 14
as I am begginer to SQL coding, tried my best, but cannot get through with compare only on consequtive dates.

Use a self join.
You can specify all the conditions you've listed in the ON clause:
SELECT T0.code, T0.Date, T0.Value1, T0.Value2
FROM Table As T0
JOIN Table As T1
ON T0.code = T1.code
AND T0.Date = DateAdd(Day, 1, T1.Date)
AND (T0.Value2 - T0.Value1) * 2 >= T1.Value2 - T1.Value1

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order which, in the case where order_rank = 1 then 0 else calculate the number of days since the previous order (with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 79
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!

use the lag window function
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0)
FROM <table_name>

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.

This is called the Islands and Gaps problem, specifically it's an Island problem with a date range. You should,
Fix this question up.
Flag it to be sent to dba.stackexchange.com
To solve this,
Create a pseudo column with a window that has 1 if the row preceding it does not correspond to the preceding mont
Create groups out of that with COUNT()
Check to make sure the count(*) for the group is greater than or equal to three.
Query,
SELECT l.id, creationdaterange, count(*)
FROM (
SELECT t.id,
t.creationdate,
count(range_reset) OVER (PARTITION BY t.id ORDER BY creationdate) AS creationdaterange
FROM (
SELECT id,
creationdate,
CASE
WHEN date_trunc('month',creationdate::date)::date - interval '1 month' = date_trunc('month',lag(creationdate))::date OVER (PARTITION BY id ORDER BY creationdate)
THEN 1
END AS range_reset
FROM post
ORDER BY id, creationdate
) AS t;
) AS l
GROUP BY t.id, creationdaterange
HAVING count(*) >= 3;

T-SQL Determine Status Changes in History Table

I have an application which logs changes to records in the "production" table to a "history" table. The history table is basically a field for field copy of the production table, with a few extra columns like last modified date, last modified by user, etc.
This works well because we get a snapshot of the record anytime the record changes. However, it makes it hard to determine unique status changes to a record. An example is below.
BoxID StatusID SubStatusID ModifiedTime
1 4 27 2011-08-11 15:31
1 4 11 2011-08-11 15:28
1 4 11 2011-08-10 09:07
1 5 14 2011-08-09 08:53
1 5 14 2011-08-09 08:19
1 4 11 2011-08-08 14:15
1 4 9 2011-07-27 15:52
1 4 9 2011-07-27 15:49
1 2 8 2011-07-26 12:00
As you can see in the above table (data comes from the real system with other fields removed for brevity and security) BoxID 1 has had 9 changes to the production record. Some of those updates resulted in statuses being changed and some did not, which means other fields (those not shown) have changed.
I need to be able, in TSQL, to extract from this data the unique status changes. The output I am looking for, given the above input table, is below.
BoxID StatusID SubStatusID ModifiedTime
1 4 27 2011-08-11 15:31
1 4 11 2011-08-10 09:07
1 5 14 2011-08-09 08:19
1 4 11 2011-08-08 14:15
1 4 9 2011-07-27 15:49
1 2 8 2011-07-26 12:00
This is not as easy as grouping by StatusID and SubStatusID and taking the min(ModifiedTime) then joining back into the history table since statuses can go backwards as well (see StatusID 4, SubStatusID 11 gets set twice).
Any help would be greatly appreciated!

Does this do work for you
;WITH Boxes_CTE AS
(
SELECT Boxid, StatusID, SubStatusID, ModifiedTime,
ROW_NUMBER() OVER (PARTITION BY Boxid ORDER BY ModifiedTime) AS SEQUENCE
FROM Boxes
)
SELECT b1.Boxid, b1.StatusID, b1.SubStatusID, b1.ModifiedTime
FROM Boxes_CTE b1
LEFT OUTER JOIN Boxes_CTE b2 ON b1.Boxid = b2.Boxid
AND b1.Sequence = b2.Sequence + 1
WHERE b1.StatusID <> b2.StatusID
OR b1.SubStatusID <> b2.SubStatusID
OR b2.StatusID IS NULL
ORDER BY b1.ModifiedTime DESC
;

Select BoxID,StatusID,SubStatusID FROM Staty CurrentStaty
INNER JOIN ON
(
Select BoxID,StatusID,SubStatusID FROM Staty PriorStaty
)
Where Staty.ModifiedTime=
(Select Max(PriorStaty.ModifiedTime) FROM PriorStaty
Where PriortStaty.ModifiedTime<Staty.ModifiedTime)
AND Staty.BoxID=PriorStaty.BoxID
AND NOT (
Staty.StatusID=PriorStaty.StatusID
AND
Staty.SubStatusID=PriorStaty.StatusID
)

How do I add totals/subtotals to a set of results without grouping the row data?

I'm constructing a SQL query for a business report. I need to have both subtotals (grouped by file number) and grand totals on the report.
I'm entering unknown SQL territory, so this is a bit of a first attempt. The query I made is almost working. The only problem is that the entries are being grouped -- I need them separated in the report.
Here is my sample data:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3B Mar 28/10 1 3
3B Mar 28/10 5 10
When I run this query
SELECT
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM SubtotalTesting
GROUP BY FileNumber, Date WITH ROLLUP
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END), -- Put NULLs after data
Date
I get the following:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 6 13 <--
3B NULL 6 13
NULL NULL 17 38
What I want is:
FileNumber Date Cost Charge
3 Dec 22/09 5 10
3 Jan 13/10 6 15
3 NULL 11 25
3B Mar 28/10 1 3 <--
3B Mar 28/10 5 10 <--
3B NULL 6 13
NULL NULL 17 38
I can clearly see why the entries are being grouped, but I have no idea how to separate them while still returning the subtotals and grand total.
I'm a bit green when it comes to doing advanced SQL queries like this, so if I'm taking the wrong approach to the problem by using WITH ROLLUP, please suggest some preferred alternatives -- you don't have to write the whole query for me, I just need some direction. Thanks!

WITH SubtotalTesting (FileNumber, Date, Cost, Charge) AS
(
SELECT '3', CAST('2009-22-12' AS DATETIME), 5, 10
UNION ALL
SELECT '3', '2010-13-06', 6, 15
UNION ALL
SELECT '3B', '2010-28-03', 1, 3
UNION ALL
SELECT '3B', '2010-28-03', 5, 10
),
q AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY filenumber) AS rn
FROM SubTotalTesting
)
SELECT rn,
CASE
WHEN (GROUPING(FileNumber) = 1) THEN NULL
ELSE FileNumber
END AS FileNumber,
CASE
WHEN (GROUPING(Date) = 1) THEN NULL
ELSE Date
END AS Date,
SUM(Cost) AS Cost,
SUM(Charge) AS Charge
FROM q
GROUP BY
FileNumber, Date, rn WITH ROLLUP
HAVING GROUPING(rn) <= GROUPING(Date)
ORDER BY
(CASE WHEN FileNumber IS NULL THEN 1 ELSE 0 END),
FileNumber,
(CASE WHEN Date IS NULL THEN 1 ELSE 0 END),
Date