Looping updates - better way to do this? - tsql

I am stuck on this problem. I am thinking that I may need a looping update (if that exists), but maybe there is a better way?
I am working with claims drug prescription data, so essentially 5 columns
User, Drug, RxStartDate, DaySupply, and
'RxEndDate' = dateadd(dd, DaySupply-1, RxStartDate)
If the same user has 2 prescriptions that overlap (Rx1 EndDate >= Rx2 StartDate), then I need to sum the DaySupply together.
Once I sum the DaySupply, the RxEndDate will extend and I need to check again if there is overlap in the prescription.
Currently I have the following code, which I have to run and re-run until there are no more updates, but I know there must be a better way to do this...
UPDATE b
SET b.RxStartDate = a.RxStartDate
FROM RxClaims a
JOIN RxClaims b on a.[User] = b.[User] and a.Drug = b.Drug
WHERE b.RxStartDate <= a.RxEndDate
and a.RxStartDate < b.RxStartDate
SELECT [User], Drug, RxStartDate, sum(DaySupply) as DaySupply,
'RxEndDate' = dateadd(dd, sum(DaySupply)-1, RxStartDate)
into RxClaims2
from RxClaims
group by [User], Drug, RxStartDate
Thoughts anyone?
sample data:
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 5 3/4/2017
Amy Humera 3/3/2017 5 3/7/2017
Amy Humera 3/8/2017 2 3/9/2017
Amy Humera 3/10/2017 7 3/16/2017
Amy Humera 3/17/2017 30 4/15/2017
Amy Humera 3/22/2017 2 3/23/2017
Amy Humera 3/24/2017 2 3/25/2017
Amy Humera 3/31/2017 3 4/2/2017
Amy Humera 4/7/2017 5 4/11/2017
Amy Humera 4/13/2017 30 5/12/2017
after 1st time running my current code
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 10 3/9/2017
Amy Humera 3/8/2017 2 3/9/2017
Amy Humera 3/10/2017 7 3/16/2017
Amy Humera 3/17/2017 72 5/27/2017
after 2nd time running my current code
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 12 3/11/2017
Amy Humera 3/10/2017 7 3/16/2017
Amy Humera 3/17/2017 72 5/27/2017
after 3rd time running my current code
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 19 3/18/2017
Amy Humera 3/17/2017 72 5/27/2017
after 4th time running my current code
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 91 5/29/2017
There is no more overlap…finished!

I think this needs recursion: each merge extends the end date, which can create new overlaps, so the accumulated DaySupply has to be calculated in a loop, and I see no way of doing that with non-recursive lookups. You can do this with a recursive CTE.
A possible implementation:
DECLARE @test TABLE (
[User] VARCHAR(100),
Drug VARCHAR(100),
RxStartDate DATE,
DaySupply INT,
RxEndDate DATE
)
INSERT @test
VALUES
('Amy', 'Humera', '2/12/2017', '7', '2/18/2017'),
('Amy', 'Humera', '2/28/2017', '5', '3/4/2017'),
('Amy', 'Humera', '3/3/2017', '5', '3/7/2017'),
('Amy', 'Humera', '3/8/2017', '2', '3/9/2017'),
('Amy', 'Humera', '3/10/2017', '7', '3/16/2017'),
('Amy', 'Humera', '3/17/2017', '30', '4/15/2017'),
('Amy', 'Humera', '3/22/2017', '2', '3/23/2017'),
('Amy', 'Humera', '3/24/2017', '2', '3/25/2017'),
('Amy', 'Humera', '3/31/2017', '3', '4/2/2017'),
('Amy', 'Humera', '4/7/2017', '5', '4/11/2017'),
('Amy', 'Humera', '4/13/2017', '30', '5/12/2017'),
('Amy', 'Other', '3/24/2017', '7', '3/30/2017'),
('Amy', 'Other', '3/31/2017', '3', '4/2/2017'),
('Amy', 'Other', '4/7/2017', '5', '4/11/2017'),
('Amy', 'Other', '4/13/2017', '30', '5/12/2017'),
('Joe', 'Humera', '3/24/2017', '8', '3/31/2017'),
('Joe', 'Humera', '3/31/2017', '3', '4/2/2017'),
('Joe', 'Humera', '4/12/2017', '5', '4/16/2017'),
('Joe', 'Humera', '4/23/2017', '30', '5/22/2017'),
('Joe', 'Other', '3/24/2017', '60', '5/23/2017'),
('Joe', 'Other', '3/31/2017', '3', '4/2/2017'),
('Joe', 'Other', '4/7/2017', '5', '4/11/2017'),
('Joe', 'Other', '4/13/2017', '30', '5/12/2017')
-- You can comment this out, it is just to show progress:
SELECT * FROM @test ORDER BY [User], Drug, RxStartDate
DECLARE @test_2 TABLE (
[User] VARCHAR(100),
Drug VARCHAR(100),
RxStartDate_base DATE,
DaySupplyCumulative INT
)
;WITH CTE_RxEndDateExtended as (
-- Anchor: every prescription starts a chain of its own
SELECT [User], Drug, RxStartDate, DaySupply,
DaySupply as DaySupplyCumulative,
RxStartDate as RxStartDate_base,
RxStartDate as RxStartDateExtended,
dateadd (dd, DaySupply, RxStartDate) as RxEndDateExtended
FROM @test
-- WHERE [User] = 'Amy' and Drug = 'Humera' and RxStartDate = '2/28/2017'
UNION ALL
-- Recursive step: append a later prescription that starts no later than the extended end date
SELECT t.[User], t.Drug, t.RxStartDate, t.DaySupply,
c.DaySupplyCumulative + t.DaySupply as DaySupplyCumulative,
c.RxStartDate_base,
t.RxStartDate as RxStartDateExtended,
dateadd (dd, t.DaySupply, c.RxEndDateExtended) as RxEndDateExtended
FROM CTE_RxEndDateExtended as c INNER JOIN @test as t
on c.[User] = t.[User] and c.Drug = t.Drug
and c.RxEndDateExtended >= t.RxStartDate and c.RxStartDateExtended < t.RxStartDate
)
INSERT @test_2
SELECT [User], Drug, RxStartDate_base, MAX (DaySupplyCumulative) as DaySupplyCumulative -- comment this out and use this for debugging: SELECT *
FROM CTE_RxEndDateExtended
GROUP BY [User], Drug, RxStartDate_base -- comment this out for debugging
OPTION (MAXRECURSION 0) -- comment this out and use this for debugging (to avoid infinite loops): OPTION (MAXRECURSION 1000)
-- You can comment this out, it is just to show progress:
SELECT * FROM @test_2
ORDER BY [User], Drug, RxStartDate_base -- comment this out and use this for debugging: ORDER BY [User], Drug, RxStartDate_base, RxStartDate, DaySupplyCumulative
-- Keep only the chains that are not contained in a chain with an earlier start date
SELECT base.*, dateadd (dd, base.DaySupplyCumulative - 1, base.RxStartDate_base) as RxEndDateCumulative
FROM @test_2 as base LEFT OUTER JOIN @test_2 as filter
on base.[User] = filter.[User] and base.Drug = filter.Drug
and base.RxStartDate_base > filter.RxStartDate_base
and dateadd (dd, base.DaySupplyCumulative, base.RxStartDate_base) <= dateadd (dd, filter.DaySupplyCumulative, filter.RxStartDate_base)
WHERE filter.[User] IS NULL
ORDER BY [User], Drug, RxStartDate_base
You may want to optimize it by simplifying the logic, but be careful not to create an infinite loop: MAXRECURSION 0 removes the recursion limit, so when debugging use OPTION (MAXRECURSION N) with N other than zero.
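As a cross-check outside the database, the merge rule from the question can be sketched as a single ordered pass. This is a minimal Python sketch, not the recursive CTE above: `merge_prescriptions` is a hypothetical helper, it assumes prescriptions can be processed in start-date order per user and drug, and it uses the question's inclusive end date (RxStartDate + DaySupply - 1).

```python
from datetime import date, timedelta
from itertools import groupby


def merge_prescriptions(rows):
    """rows: (user, drug, start_date, day_supply) tuples.

    Returns (user, drug, start_date, total_supply, end_date) tuples,
    one per run of overlapping prescriptions.
    """
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (user, drug), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        start = supply = end = None
        for _, _, rx_start, rx_supply in group:
            if start is not None and rx_start <= end:
                # Overlaps the running episode: add the supply to it.
                supply += rx_supply
            else:
                # No overlap: close the previous episode and open a new one.
                if start is not None:
                    merged.append((user, drug, start, supply, end))
                start, supply = rx_start, rx_supply
            # Inclusive end date, as defined in the question.
            end = start + timedelta(days=supply - 1)
        merged.append((user, drug, start, supply, end))
    return merged


# Amy / Humera rows from the question's sample data.
amy_humera = [
    ("Amy", "Humera", date(2017, 2, 12), 7),
    ("Amy", "Humera", date(2017, 2, 28), 5),
    ("Amy", "Humera", date(2017, 3, 3), 5),
    ("Amy", "Humera", date(2017, 3, 8), 2),
    ("Amy", "Humera", date(2017, 3, 10), 7),
    ("Amy", "Humera", date(2017, 3, 17), 30),
    ("Amy", "Humera", date(2017, 3, 22), 2),
    ("Amy", "Humera", date(2017, 3, 24), 2),
    ("Amy", "Humera", date(2017, 3, 31), 3),
    ("Amy", "Humera", date(2017, 4, 7), 5),
    ("Amy", "Humera", date(2017, 4, 13), 30),
]

result = merge_prescriptions(amy_humera)
for row in result:
    print(row)
```

For the sample rows this yields the same two episodes the question reaches after its fourth run: 7 days starting 2/12/2017, and 91 days from 2/28/2017 through 5/29/2017.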

Related

PostgreSQL: Exclude duplicates as sorted by another key

Consider the following table that stores update history of some attributes of certain objects, organized by effective and published dates:
create table update_history(
obj_id integer,
effective date,
published date,
attr1 text,
attr2 integer,
attr3 boolean,
primary key(obj_id, effective, published)
);
insert into update_history values
(1, '2021-01-01', '2021-01-01', 'foo', null, null),
(1, '2021-01-01', '2021-01-02', null, 1, false),
(1, '2021-01-02', '2021-01-01', 'foo', 1, false),
(1, '2021-01-02', '2021-01-02', 'bar', 1, false),
(1, '2021-01-03', '2021-01-01', 'bar', 1, true),
(1, '2021-01-04', '2021-01-01', 'bar', 1, true),
(1, '2021-01-05', '2021-01-01', 'bar', 2, true),
(1, '2021-01-05', '2021-01-02', 'bar', 1, true),
(1, '2021-01-05', '2021-01-03', 'bar', 1, true),
(1, '2021-01-06', '2021-01-04', 'bar', 1, true)
;
I need to write a PostgreSQL query that will simplify the history view for a given obj_id by excluding those update records that did not change any attributes from the immediately preceding update, as ordered by the effective and published columns. In essence, those would be rows 6, 9 and 10 in the table below:
#   obj_id  effective   published   attr1   attr2   attr3
1   1       2021-01-01  2021-01-01  foo     (null)  (null)
2   1       2021-01-01  2021-01-02  (null)  1       false
3   1       2021-01-02  2021-01-01  foo     1       false
4   1       2021-01-02  2021-01-02  bar     1       false
5   1       2021-01-03  2021-01-01  bar     1       true
6   1       2021-01-04  2021-01-01  bar     1       true
7   1       2021-01-05  2021-01-01  bar     2       true
8   1       2021-01-05  2021-01-02  bar     1       true
9   1       2021-01-05  2021-01-03  bar     1       true
10  1       2021-01-06  2021-01-04  bar     1       true
Keep in mind that in the real life case there are way more attributes to deal with and I don't want the query to get too messy.
The closest I got to the desired result was using the rank window function:
select
obj_id, effective, published,
attr1, attr2, attr3
from (
select *,
rank() over (
partition by attr1, attr2, attr3
order by effective, published
) as rank
from update_history
where obj_id = 1) as d
where rank = 1
order by effective, published;
That results in this:
obj_id  effective   published   attr1   attr2   attr3
1       2021-01-01  2021-01-01  foo     (null)  (null)
1       2021-01-01  2021-01-02  (null)  1       false
1       2021-01-02  2021-01-01  foo     1       false
1       2021-01-02  2021-01-02  bar     1       false
1       2021-01-03  2021-01-01  bar     1       true
1       2021-01-05  2021-01-01  bar     2       true
As you can see, row #8 from the original table is erroneously excluded, although it changed attr2 from its previous row, #7. Apparently, the problem is that partitioning is applied before sorting in the window definition.
I wonder if there is another way to accomplish this with a single PostgreSQL query.
I would use lag() for this, comparing each row's attributes with those of the preceding row:
select *
from (
select obj_id, effective, published,
attr1, attr2, attr3,
(attr1, attr2, attr3) is distinct from lag( (attr1,attr2,attr3) ) over (partition by obj_id order by effective, published) as is_different
from update_history
) t
where is_different
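The same previous-row comparison can be sketched procedurally to check which rows should survive. This is a minimal Python sketch under stated assumptions, not the query itself: `drop_unchanged` is a hypothetical helper, rows are plain tuples laid out like the table's columns, and `None` stands in for SQL NULL (two `None` values compare as equal here, mirroring `is distinct from`).

```python
_NO_PREVIOUS = object()  # sentinel: an object has no preceding row yet


def drop_unchanged(rows):
    """rows: (obj_id, effective, published, attr1, attr2, attr3) tuples.

    Keeps a row only if its attributes differ from the preceding row of
    the same obj_id, ordered by effective and published.
    """
    kept = []
    previous = {}  # obj_id -> attribute tuple of the preceding row
    for row in sorted(rows, key=lambda r: r[:3]):
        obj_id, attrs = row[0], row[3:]
        if previous.get(obj_id, _NO_PREVIOUS) != attrs:
            kept.append(row)
        previous[obj_id] = attrs
    return kept


history = [
    (1, "2021-01-01", "2021-01-01", "foo", None, None),
    (1, "2021-01-01", "2021-01-02", None, 1, False),
    (1, "2021-01-02", "2021-01-01", "foo", 1, False),
    (1, "2021-01-02", "2021-01-02", "bar", 1, False),
    (1, "2021-01-03", "2021-01-01", "bar", 1, True),
    (1, "2021-01-04", "2021-01-01", "bar", 1, True),
    (1, "2021-01-05", "2021-01-01", "bar", 2, True),
    (1, "2021-01-05", "2021-01-02", "bar", 1, True),
    (1, "2021-01-05", "2021-01-03", "bar", 1, True),
    (1, "2021-01-06", "2021-01-04", "bar", 1, True),
]

kept = drop_unchanged(history)
for row in kept:
    print(row)
```

On the sample data this keeps rows 1 to 5, 7 and 8, and drops rows 6, 9 and 10. Because the whole attribute tuple is compared at once, adding more attributes does not change the logic.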

Redshift - Many Columns to Rows (Unpivot)

In Redshift:
I have a table with 30 dimension fields and more than 150 measure fields.
To make good use of this data in a visualization tool (Tableau), I need to unpivot the measure columns into a single measure column plus one dimension column that categorizes them.
Short Example:
Date Country Order Banana Apple Orange Kiwi Lemon
1-10-2018 Belgium XYZ789 14 0 10 16 7
1-10-2018 Germany ABC123 10 15 3 15 3
2-10-2018 Belgium KLM456 9 9 7 1 7
Result :
Date Country Order Measure_Name Measure_Value
1-10-2018 Belgium XYZ789 Banana 14
1-10-2018 Belgium XYZ789 Apple 0
1-10-2018 Belgium XYZ789 Orange 10
1-10-2018 Belgium XYZ789 Kiwi 16
1-10-2018 Belgium XYZ789 Lemon 7
1-10-2018 Germany ABC123 Banana 10
1-10-2018 Germany ABC123 Apple 15
1-10-2018 Germany ABC123 Orange 3
1-10-2018 Germany ABC123 Kiwi 15
1-10-2018 Germany ABC123 Lemon 3
2-10-2018 Belgium KLM456 Banana 9
2-10-2018 Belgium KLM456 Apple 9
2-10-2018 Belgium KLM456 Orange 7
2-10-2018 Belgium KLM456 Kiwi 1
2-10-2018 Belgium KLM456 Lemon 7
I know about the 'UNION ALL' solution and have tried it, but my table has millions of rows, and with more than 150 columns to unpivot the query is far too large for that approach (the SQL alone is more than 8k lines long).
Do you have any idea that could help me?
Thanks a lot,
If you wrote this in an 'imperative' way, you would generate several rows out of each one, possibly using something like flatMap (or its equivalent in your programming language). To generate rows in SQL, you have to use a JOIN.
This problem can be solved by CROSS JOINing your table with another one that has as many rows as there are columns to unpivot. Add a CASE expression that picks the matching column for each generated row, and voilà!
CREATE TABLE t (
"Date" date,
"Country" varchar,
"Order" varchar,
"Banana" varchar,
"Apple" varchar,
"Orange" varchar,
"Kiwi" varchar,
"Lemon" varchar
);
INSERT INTO t VALUES ('1-10-2018', 'Belgium', 'XYZ789', '14', '0', '10', '16', '7');
INSERT INTO t VALUES ('1-10-2018', 'Germany', 'ABC123', '10', '15', '3', '15', '3');
INSERT INTO t VALUES ('2-10-2018', 'Belgium', 'KLM456', '9', '9', '7', '1', '7');
WITH
cols as (
select 'Banana' as c
union all
select 'Apple' as c
union all
select 'Orange' as c
union all
select 'Kiwi' as c
union all
select 'Lemon' as c
)
select
"Date",
"Country",
"Order",
c "Fruit Type",
CASE c
WHEN 'Banana' THEN "Banana"
WHEN 'Apple' THEN "Apple"
WHEN 'Orange' THEN "Orange"
WHEN 'Kiwi' THEN "Kiwi"
WHEN 'Lemon' THEN "Lemon"
ELSE NULL
END as "Amount Ordered"
from t cross join cols;
https://www.db-fiddle.com/f/kojuPAjpS5twCKXSPVqYyP/3
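The flatMap analogy can be made concrete with a small sketch: every input row is paired with every measure name, which is what the cross join with `cols` does. This is an illustrative Python sketch, not Redshift code; `unpivot` and the `Measure_Name` / `Measure_Value` keys are assumptions taken from the question's desired output.

```python
def unpivot(rows, id_columns, measure_columns):
    """Turn each wide row into one long row per measure column."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        # One output row per (input row, measure column) pair,
        # i.e. the cross join of the table with the column list.
        for name in measure_columns:
            long_rows.append(
                {**ids, "Measure_Name": name, "Measure_Value": row[name]}
            )
    return long_rows


orders = [
    {"Date": "1-10-2018", "Country": "Belgium", "Order": "XYZ789",
     "Banana": 14, "Apple": 0, "Orange": 10, "Kiwi": 16, "Lemon": 7},
    {"Date": "1-10-2018", "Country": "Germany", "Order": "ABC123",
     "Banana": 10, "Apple": 15, "Orange": 3, "Kiwi": 15, "Lemon": 3},
    {"Date": "2-10-2018", "Country": "Belgium", "Order": "KLM456",
     "Banana": 9, "Apple": 9, "Orange": 7, "Kiwi": 1, "Lemon": 7},
]

id_columns = ["Date", "Country", "Order"]
# Derive the measure list instead of typing out every column by hand.
measure_columns = [col for col in orders[0] if col not in id_columns]

long_rows = unpivot(orders, id_columns, measure_columns)
for row in long_rows:
    print(row)
```

Deriving `measure_columns` from the row keys is the part that matters with 150 measures: the same list could be used to generate the `cols` CTE and the CASE branches rather than writing them manually.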
Given that you have 150 columns to transpose, I do not think it's feasible to do it with SQL. I had almost exactly the same scenario and used Python to solve it. The pseudo-code and explanation are in this question:
Redshift. How can we transpose (dynamically) a table from columns to rows?

Populate column rows based on common name (subquery returns only one value)

I am trying to populate the blanks in the CabSN column with the SN of the row whose Name matches the TempName column.
TempName  CabSN    SN       Name      Order  RowID
DevCab01  SN12345  SN12345  DevCab01  19     1
DevCab01           SN12346  Test2     18     2
DevCab01           SN12347  Test3     17     3
DevCab01           SN12348  Test4     16     4
DevCab01           SN12352  Test8     15     5
DevCab01           SN12353  Test9     14     6
DevCab01           SN12354  Test10    13     7
DevCab02  SN12355  SN12355  DevCab02  9      8
DevCab02           SN12356  Test12    8      9
DevCab02           SN12357  Test13    7      10
DevCab02           SN12358  Test14    6      11
DevCab03  SN12359  SN12359  DevCab03  5      12
DevCab03           SN12360  Test16    4      13
DevCab03           SN12361  Test17    3      14
DevCab04  SN12349  SN12349  DevCab04  15     15
DevCab04           SN12350  Test6     14     16
DevCab04           SN12351  Test7     13     17
My script attempt (which failed) at populating the blank rows in CabSN with the matching TempName
DECLARE @CabID AS nvarchar(50)
SET @CabID = NULL
(Regardless of where I placed the variable, it didn't work; the error said that more than one value was returned.)
UPDATE m
set
m.[CabSN] =
CASE WHEN m.[CabSN] is NULL
THEN (
SELECT m3.[CabSN]
FROM [tblname1] m3
JOIN inserted i ON i.[TempName] = m3.[TempName]
WHERE m3.[RowID] =
(
SELECT MAX(i.RowID)
FROM [tblname1] m2
JOIN inserted i ON i.[TempName] = m2.[TempName]
WHERE m2.[RowID] < m.[RowID]
and m2.[CabSN] is not NULL)
)
ELSE m.[CabSN]
Full working example:
DECLARE @DataSource TABLE
(
[TempName] VARCHAR(12)
,[CabSN] VARCHAR(12)
,[SN] VARCHAR(12)
,[Name] VARCHAR(12)
,[Order] SMALLINT
,[RowID] SMALLINT
);
INSERT INTO @DataSource ([TempName], [CabSN], [SN], [Name], [Order], [RowID])
VALUES ('DevCab01', 'SN12345', 'SN12345', 'DevCab01', '19', '1')
,('DevCab01', '', 'SN12346', 'Test2', ' 18', '2')
,('DevCab01', '', 'SN12347', 'Test3', ' 17', '3')
,('DevCab01', '', 'SN12348', 'Test4', ' 16', '4')
,('DevCab01', '', 'SN12352', 'Test8', ' 15', '5')
,('DevCab01', '', 'SN12353', 'Test9', ' 14', '6')
,('DevCab01', '', 'SN12354', 'Test10', '13', '7')
,('DevCab02', 'SN12355', 'SN12355', 'DevCab02', '9', '8')
,('DevCab02', '', 'SN12356', 'Test12', ' 8', '9')
,('DevCab02', '', 'SN12357', 'Test13', ' 7', '10')
,('DevCab02', '', 'SN12358', 'Test14', ' 6', '11')
,('DevCab03', 'SN12359', 'SN12359', 'DevCab03', '5', '12')
,('DevCab03', '', 'SN12360', 'Test16', ' 4', '13')
,('DevCab03', '', 'SN12361', 'Test17', ' 3', '14')
,('DevCab04', 'SN12349', 'SN12349', 'DevCab04', '15', '15')
,('DevCab04', '', 'SN12350', 'Test6', ' 14', '16')
,('DevCab04', '', 'SN12351', 'Test7', ' 13', '17');
-- One (TempName, CabSN) pair per cabinet, taken from the rows that already have a CabSN
WITH DataSource AS
(
SELECT DISTINCT [TempName]
,[CabSN]
FROM @DataSource
WHERE [CabSN] <> ''
)
-- Fill the blank CabSN values from the matching cabinet
UPDATE T
SET [CabSN] = S.[CabSN]
FROM @DataSource T
INNER JOIN DataSource S
ON T.[TempName] = S.[TempName]
WHERE T.[CabSN] = '';
SELECT *
FROM @DataSource;
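The update above amounts to a lookup from TempName to its known CabSN, followed by a fill of the blanks. A minimal Python sketch of that idea, assuming (as the sample data suggests) that each TempName has exactly one non-blank CabSN; `fill_cab_sn` is a hypothetical helper and only a few of the sample rows are included:

```python
def fill_cab_sn(rows):
    """Fill blank CabSN values from the row that already has one
    for the same TempName."""
    # Lookup built from the rows whose CabSN is already populated.
    known = {r["TempName"]: r["CabSN"] for r in rows if r["CabSN"]}
    return [
        dict(r, CabSN=r["CabSN"] or known.get(r["TempName"], ""))
        for r in rows
    ]


rows = [
    {"TempName": "DevCab01", "CabSN": "SN12345", "SN": "SN12345", "Name": "DevCab01"},
    {"TempName": "DevCab01", "CabSN": "", "SN": "SN12346", "Name": "Test2"},
    {"TempName": "DevCab01", "CabSN": "", "SN": "SN12347", "Name": "Test3"},
    {"TempName": "DevCab02", "CabSN": "SN12355", "SN": "SN12355", "Name": "DevCab02"},
    {"TempName": "DevCab02", "CabSN": "", "SN": "SN12356", "Name": "Test12"},
]

filled = fill_cab_sn(rows)
for row in filled:
    print(row["TempName"], row["CabSN"], row["SN"], row["Name"])
```

A join against a deduplicated lookup avoids the "more than one value" problem from the question, because no scalar subquery has to return a single row.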

PostgreSQL, SUM and GROUP from numeric column and hstore

I would kindly ask if someone could write me a query that sums up values from a numeric column and from an hstore column. This is obviously too much for my SQL abilities.
A table:
DROP TABLE IF EXISTS mytry;
CREATE TABLE IF NOT EXISTS mytry
(mybill int, price numeric, paym text, combined_paym hstore);
INSERT INTO mytry (mybill, price, paym, combined_paym)
VALUES (10, 10.14, '0', ''),
(11, 23.56, '0', ''),
(12, 12.16, '3', ''),
(13, 12.00, '6', '"0"=>"4","3"=>"4","2"=>"4"'),
(14, 14.15, '6', '"0"=>"2","1"=>"4","3"=>"4","4"=>"4.15"'),
(15, 13.00, '1', ''),
(16, 9.00, '4', ''),
(17, 4.00, '4', ''),
(18, 4.00, '1', '');
This is a list of bills with the price and payment method for each bill.
Some bills (here 13 and 14) have a combined payment. Payment methods are numbered from 0 to 5, each number identifying a specific payment method.
For this I wrote the following query:
SELECT paym, SUM(price) FROM mytry WHERE paym::int<6 GROUP BY paym ORDER BY paym;
This sums the prices for payment methods 0-5. The value 6 is not a payment method but a flag meaning that the payment methods and amounts should be taken from the hstore column 'combined_paym' instead. This is what I don't know how to solve: how to sum the payment methods and amounts from 'combined_paym' together with the ones from 'paym' and 'price'.
This query gives result:
"0";33.70
"1";17.00
"3";12.16
"4";13.00
But this result is incorrect, because the data from bills 13 and 14 is not included in the sums.
The correct result should be:
"0";39.70
"1";21.00
"2";4.00
"3";20.16
"4";17.15
Could someone please write a proper query that gives this last result from the given data?
Unnest the hstore column:
select key, value::dec
from mytry, each(combined_paym)
where paym::int = 6
key | value
-----+-------
0 | 4
2 | 4
3 | 4
0 | 2
1 | 4
3 | 4
4 | 4.15
(7 rows)
and use it in union:
select paym, sum(price)
from (
select paym, price
from mytry
where paym::int < 6
union all
select key, value::dec
from mytry, each(combined_paym)
where paym::int = 6
) s
group by 1
order by 1;
paym | sum
------+-------
0 | 39.70
1 | 21.00
2 | 4
3 | 20.16
4 | 17.15
(5 rows)
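The arithmetic behind the union can be checked with a short sketch: plain bills contribute their price to their own method, and flagged bills contribute each hstore entry to the method named by its key. This is an illustrative Python sketch, not PostgreSQL code; `totals_by_method` is a hypothetical helper and the hstore values are modelled as plain dicts of strings.

```python
from collections import defaultdict
from decimal import Decimal

# (mybill, price, paym, combined_paym) as in the mytry table.
bills = [
    (10, Decimal("10.14"), "0", {}),
    (11, Decimal("23.56"), "0", {}),
    (12, Decimal("12.16"), "3", {}),
    (13, Decimal("12.00"), "6", {"0": "4", "3": "4", "2": "4"}),
    (14, Decimal("14.15"), "6", {"0": "2", "1": "4", "3": "4", "4": "4.15"}),
    (15, Decimal("13.00"), "1", {}),
    (16, Decimal("9.00"), "4", {}),
    (17, Decimal("4.00"), "4", {}),
    (18, Decimal("4.00"), "1", {}),
]


def totals_by_method(bills):
    totals = defaultdict(Decimal)  # Decimal() is 0
    for _bill, price, paym, combined in bills:
        if int(paym) < 6:
            # Single payment method: the whole price goes to it.
            totals[paym] += price
        elif int(paym) == 6:
            # Flag 6: take the methods and amounts from the hstore instead.
            for method, amount in combined.items():
                totals[method] += Decimal(amount)
    return dict(sorted(totals.items()))


totals = totals_by_method(bills)
for method, amount in totals.items():
    print(method, amount)
```

Decimal is used rather than float so the sums match the numeric column exactly, for example 39.70 for method 0.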

T-SQL pulling multiple columns of data from a single column field

I am trying to pull 3 columns of data from one field. For argument's sake, say I have a table in which a single field holds the following data for a car:
Color,
Model,
Year.
It is itemized so that ID4 is Color, ID5 is Model and ID6 is Year. I can pull one data set with no problem using a filter, e.g. Filter = 4, 5 or 6. But I cannot pull several at once: I just get the headers and no data at all.
Assuming you are using SQL Server 2005+, and your question really is "how do you break one column in a table into multiple named columns based on another field in the same table", here is a simple example patterned after your question.
Given this dataset:
declare @tbl table (id int, tag char(3), data varchar(255))
insert into @tbl values
(1, 'ID4', 'Red'), (1, 'ID5', 'Toyota'), (1, 'ID6', '1999'),
(2, 'ID4', 'Blue'), (2, 'ID5', 'Honda'), (2, 'ID6', '2000'),
(3, 'ID4', 'Green'), (3, 'ID5', 'Nissan'), (3, 'ID6', '2004'),
(4, 'ID4', 'Red'), (4, 'ID5', 'Nissan'), (4, 'ID6', '1990'),
(5, 'ID4', 'Black'), (5, 'ID5', 'Toyota'), (5, 'ID6', '2002')
A simple select statement returns this data:
select * from @tbl
id tag data
1 ID4 Red
1 ID5 Toyota
1 ID6 1999
2 ID4 Blue
2 ID5 Honda
2 ID6 2000
3 ID4 Green
3 ID5 Nissan
3 ID6 2004
4 ID4 Red
4 ID5 Nissan
4 ID6 1990
5 ID4 Black
5 ID5 Toyota
5 ID6 2002
This pivot query returns the data -- one row per car -- with Color, Model and Year as their own columns:
select id, [ID4] as 'Color', [ID5] as 'Model', [ID6] as 'Year'
from (select id, tag, data from @tbl) as p
pivot (max(data) for tag in ([ID4], [ID5], [ID6])) as pvt
order by pvt.id
This is how the output looks:
id Color Model Year
1 Red Toyota 1999
2 Blue Honda 2000
3 Green Nissan 2004
4 Red Nissan 1990
5 Black Toyota 2002
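What the pivot does can be sketched as grouping the tagged rows by id and turning each tag into a named column. This is a minimal Python sketch, not T-SQL; `pivot` is a hypothetical helper, and it assumes each (id, tag) pair appears at most once, whereas the query above resolves duplicates with max(data).

```python
def pivot(rows, names):
    """rows: (id, tag, data) tuples; names: tag -> output column name.

    Returns one dict per id with the tagged values as named columns.
    """
    cars = {}
    for car_id, tag, data in rows:
        if tag in names:  # tags outside the mapping are ignored
            cars.setdefault(car_id, {"id": car_id})[names[tag]] = data
    return [cars[car_id] for car_id in sorted(cars)]


tagged = [
    (1, "ID4", "Red"), (1, "ID5", "Toyota"), (1, "ID6", "1999"),
    (2, "ID4", "Blue"), (2, "ID5", "Honda"), (2, "ID6", "2000"),
    (3, "ID4", "Green"), (3, "ID5", "Nissan"), (3, "ID6", "2004"),
    (4, "ID4", "Red"), (4, "ID5", "Nissan"), (4, "ID6", "1990"),
    (5, "ID4", "Black"), (5, "ID5", "Toyota"), (5, "ID6", "2002"),
]

names = {"ID4": "Color", "ID5": "Model", "ID6": "Year"}
cars = pivot(tagged, names)
for car in cars:
    print(car)
```

This also shows why filtering on a single tag returns only one attribute at a time: all three tags have to be read for the same id before a full row can be assembled.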