I have a dataset at ID level in which some records have overlapping date ranges. I need to find those rows and add an indicator that flags the overlapping records.
Data:
ID ITEM StrDate EndDate
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
Below is the Result that I am expecting
Want:
ID ITEM StrDate EndDate Indicator
1001 A121 02/01/2022 02/15/2022 N
1001 B121 03/10/2022 03/10/2022 N
1002 C121 02/01/2022 02/10/2022 Y
1002 D121 02/05/2022 02/15/2022 Y
1003 E121 03/10/2022 03/21/2022 Y
1003 F121 03/12/2022 03/21/2022 Y
1004 G121 01/12/2022 01/14/2022 N
I tried sorting the data first on StrDate and EndDate
Proc sort data=Data; by ID StrDate EndDate;run;
Then I tried using the lag function to find rows with the same ID and subtract the dates, but I figured that's not the correct way of doing it.
I appreciate your help here. Thanks.
SAS Date values are integers that can be used as an index into a tracking array. This technique is called a direct-index search.
Example:
A double DOW solution can be coded to find the overlapping records. The first loop flags dates in use and the second loop evaluates the range for an overlap by finding a flag via direct-index.
data have;
input ID ITEM $ StrDate EndDate;
attrib strdate enddate format=mmddyy10. informat=mmddyy10.;
datalines;
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
;
data want;
array tracker(100000) _temporary_ ;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
do _i_ = strdate to enddate;
tracker(_i_) + 1; /* flag date using direct-index */
end;
end;
do _n_ = 1 to _n_;
set have;
/* no overlap would mean no dates in range would find a flag set */
/* and loop would exit with _i_ > enddate */
do _i_ = strdate to enddate while (tracker(_i_) = 1);
end;
length overlap_indicator $1;
overlap_indicator = ifc (_i_ > enddate, 'N', 'Y');
output;
end;
call missing (of tracker(*));
drop _: ;
run;
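For readers who do not have SAS at hand, the same two-pass idea can be sketched in Python. This is only an illustration under some assumptions: `flag_overlaps` is a hypothetical name, `date.toordinal()` stands in for the integer SAS date value, and a dictionary keyed by (ID, day) replaces the temporary array that the SAS step clears after each ID.

```python
from collections import defaultdict
from datetime import date

# Sample rows copied from the question: (ID, ITEM, StrDate, EndDate)
rows = [
    (1001, "A121", date(2022, 2, 1), date(2022, 2, 15)),
    (1001, "B121", date(2022, 3, 10), date(2022, 3, 10)),
    (1002, "C121", date(2022, 2, 1), date(2022, 2, 10)),
    (1002, "D121", date(2022, 2, 5), date(2022, 2, 15)),
    (1003, "E121", date(2022, 3, 10), date(2022, 3, 21)),
    (1003, "F121", date(2022, 3, 12), date(2022, 3, 21)),
    (1004, "G121", date(2022, 1, 12), date(2022, 1, 14)),
]


def flag_overlaps(rows):
    # Pass 1: count how many records of each ID cover each day.
    tracker = defaultdict(int)
    for id_, _item, start, end in rows:
        for day in range(start.toordinal(), end.toordinal() + 1):
            tracker[(id_, day)] += 1

    # Pass 2: a record overlaps if any of its days is covered more than once.
    flagged = []
    for id_, item, start, end in rows:
        days = range(start.toordinal(), end.toordinal() + 1)
        overlap = any(tracker[(id_, day)] > 1 for day in days)
        flagged.append((id_, item, "Y" if overlap else "N"))
    return flagged


result = flag_overlaps(rows)
for row in result:
    print(row)
```

Like the SAS version, the cost grows with the total number of days covered, so this sketch suits short ranges better than multi-decade ones.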
My idea is to extend each range into one row per date, count the rows per date, and remerge the counts with the original data.
*An extra observation added to ID 1002;
data have;
input ID $ ITEM $ StrDate mmddyy10. +1 EndDate mmddyy10.;
format StrDate EndDate mmddyy10.;
cards;
1001 A121 02/01/2022 02/15/2022
1001 B121 03/10/2022 03/10/2022
1002 C121 02/01/2022 02/10/2022
1002 D121 02/05/2022 02/15/2022
1002 D121 03/05/2022 03/15/2022
1003 E121 03/10/2022 03/21/2022
1003 F121 03/12/2022 03/21/2022
1004 G121 01/12/2022 01/14/2022
;
run;
*Extend;
data middle;
set have;
do date=StrDate to EndDate;
output;
end;
run;
*Count and remerge;
proc sql noprint;
create table want as
select distinct a.*, ifc(b.count and a.StrDate<=b.date<=a.EndDate,'Y','N') as Indicator
from have as a
left join (
select id, date, count(date) as count from middle
group by id, date
having count>1
) as b on a.id=b.id
;
quit;
By the way, if only some of the records of an ID overlap but you want to flag every record of that ID, modify the lookup condition by removing a.StrDate<=b.date<=a.EndDate.
Simple overlap logic:
proc sql;
create table want as
select
a.*,
/* simple overlap logic */
case
when a.strdate <= b.strdate & a.enddate >= b.strdate then 'Y'
when b.strdate < a.strdate & b.enddate >= a.strdate then 'Y'
else 'N'
end as overlap
from
have a
left join
have b
on a.id = b.id /* join on same ids */
and a.item <> b.item /* but not the same item */
;
quit;
Result:
ID ITEM StrDate EndDate overlap
1001 B121 03/10/2022 03/10/2022 N
1001 A121 02/01/2022 02/15/2022 N
1002 D121 02/05/2022 02/15/2022 Y
1002 C121 02/01/2022 02/10/2022 Y
1003 E121 03/10/2022 03/21/2022 Y
1003 F121 03/12/2022 03/21/2022 Y
1004 G121 01/12/2022 01/14/2022 N
Overlap occurs if StartA <= StartB when EndA >= StartB:
StartA        EndA >= StartB
|-------------|
         |---------
         StartB
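Assuming every range has its start on or before its end, the two CASE branches above collapse into a single symmetric test. A minimal Python sketch (the function name `overlaps` is hypothetical):

```python
from datetime import date


def overlaps(start_a, end_a, start_b, end_b):
    # Two closed ranges overlap when each one starts on or before the other ends.
    return start_a <= end_b and start_b <= end_a


# C121 and D121 from ID 1002 share 02/05/2022 through 02/10/2022.
c_vs_d = overlaps(date(2022, 2, 1), date(2022, 2, 10),
                  date(2022, 2, 5), date(2022, 2, 15))

# A121 and B121 from ID 1001 are weeks apart.
a_vs_b = overlaps(date(2022, 2, 1), date(2022, 2, 15),
                  date(2022, 3, 10), date(2022, 3, 10))

print(c_vs_d, a_vs_b)
```

The test is symmetric, so it does not matter which record is treated as A and which as B.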
Related
How can I update the value of the element named "A" to 10.2 where Id equals 1003 in a PostgreSQL table?
Json Data Table
Id    Column
1001  {"results":[{"name":"A","value":"7.8"}, {"name":"B","value":"0.5"}]}
1002  {"results":[{"name":"B","value":"5.4"}, {"name":"D","value":"4.5"}]}
1003  {"results":[{"name":"D","value":"4.8"}, {"name":"A","value":"6.7"}]}
Result after update
Id    Column
1001  {"results":[{"name":"A","value":"7.8"}, {"name":"B","value":"0.5"}]}
1002  {"results":[{"name":"B","value":"5.4"}, {"name":"D","value":"4.5"}]}
1003  {"results":[{"name":"D","value":"4.8"}, {"name":"A","value":"10.2"}]}
It isn't a simple query; I was only able to do it with a CTE. I refer to your example table as test and to its JSON column as column1:
with item_in_list_pos as (
select
pos - 1 as pos
from test, jsonb_array_elements(column1->'results') with ordinality a(elem, pos)
where (
id = 1003
and elem->>'name' = 'A'
)
)
update test
set
column1 = jsonb_set(column1, array['results', pos::text, 'value'], to_jsonb('10.2'::text)) /* pos is a bigint, so cast it to text for the path array */
from item_in_list_pos
where (
id = 1003
)
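The query works in two steps: the CTE finds the zero-based position of the matching array element, and the update then overwrites the value at the path results -> position -> value. The same two steps on a single document can be sketched in plain Python; `set_value_by_name` is a hypothetical helper used only for illustration, not part of PostgreSQL.

```python
import json

# The JSON document stored for Id 1003 in the question.
doc = json.loads(
    '{"results":[{"name":"D","value":"4.8"}, {"name":"A","value":"6.7"}]}'
)


def set_value_by_name(document, name, new_value):
    # Step 1 (the CTE): find the zero-based position of each matching element.
    for pos, elem in enumerate(document["results"]):
        if elem["name"] == name:
            # Step 2 (jsonb_set): overwrite results -> pos -> value.
            document["results"][pos]["value"] = new_value
    return document


updated = set_value_by_name(doc, "A", "10.2")
print(json.dumps(updated))
```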
My Data resembles something like this
Column A Column B
101 1001
101 1002
101 1003
101 1004
102 1001
102 1005
102 1006
101 1001
102 1001
Expected Output is like this
column_a unique_column_b_vals
101 4
102 3
The COUNT function supports a DISTINCT argument, so you can count the unique values of column_b within each group:
http://www.postgresqltutorial.com/postgresql-count-function/
select column_a , count(distinct column_b)
from f1
group by column_a
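To check the query against the sample data, here is a small sketch that runs it with Python's built-in sqlite3 module. SQLite is an assumption made only to keep the example self-contained; the question itself is about PostgreSQL, where the query is the same. The table name f1 comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table f1 (column_a integer, column_b integer)")
conn.executemany(
    "insert into f1 values (?, ?)",
    [
        (101, 1001), (101, 1002), (101, 1003), (101, 1004),
        (102, 1001), (102, 1005), (102, 1006),
        (101, 1001), (102, 1001),  # repeated pairs from the question
    ],
)

counts = conn.execute(
    """
    select column_a, count(distinct column_b) as unique_column_b_vals
    from f1
    group by column_a
    order by column_a
    """
).fetchall()

for row in counts:
    print(row)
```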
I have raw data that looks like this:
Class Cert Name Benefit Coverage
-------------------------------
1 1001 ABC EHC Family
1 1001 ABC DEN Family
2 1002 XYZ EHC Single
2 1002 XYZ DEN Single
3 1003 LMN EHC Couple
3 1003 LMN DEN Couple
I want the final output to look like
Class  Benefit
       EHC-Single  EHC-Couple  EHC-Family  DEN-Single  DEN-Couple  DEN-Family
1                              1                                   1
2      1                                   1
3                  1                                   1
The values below the columns are counts of certificates.
Yes, you can do it as shown below.
;WITH CTE
AS (SELECT COUNT(*) Counts,
Class,
Benefit + '-' + Coverage AS [Benefits]
FROM ##MyTemp
GROUP BY Class,
Benefit,
Coverage)
SELECT Class,
[EHC-Single],
[EHC-Couple],
[EHC-Family],
[DEN-Single],
[DEN-Couple],
[DEN-Family]
FROM CTE
PIVOT(MAX(Counts)
FOR [Benefits] IN ([EHC-Single],
[EHC-Couple],
[EHC-Family],
[DEN-Single],
[DEN-Couple],
[DEN-Family])) AS TempList;
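Where PIVOT is not available, the same layout can be produced with conditional aggregation. The sketch below runs in SQLite through Python's sqlite3 module; the table name my_temp is a stand-in for ##MyTemp, and the fixed list of Benefit-Coverage pairs mirrors the IN (...) list above. One difference to be aware of: empty cells come back as 0 here, whereas the PIVOT query returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_temp "
    "(class integer, cert integer, name text, benefit text, coverage text)"
)
conn.executemany(
    "insert into my_temp values (?, ?, ?, ?, ?)",
    [
        (1, 1001, "ABC", "EHC", "Family"),
        (1, 1001, "ABC", "DEN", "Family"),
        (2, 1002, "XYZ", "EHC", "Single"),
        (2, 1002, "XYZ", "DEN", "Single"),
        (3, 1003, "LMN", "EHC", "Couple"),
        (3, 1003, "LMN", "DEN", "Couple"),
    ],
)

# One output column per Benefit-Coverage pair, in the order used by the PIVOT.
pairs = [
    ("EHC", "Single"), ("EHC", "Couple"), ("EHC", "Family"),
    ("DEN", "Single"), ("DEN", "Couple"), ("DEN", "Family"),
]
columns = ", ".join(
    "count(case when benefit = '{0}' and coverage = '{1}' then 1 end) "
    "as [{0}-{1}]".format(benefit, coverage)
    for benefit, coverage in pairs
)
query = "select class, " + columns + " from my_temp group by class order by class"

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```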
I have two tables:
table1 =tbl_main:
item_id fastec_qty sourse_qty
001 102 100
002 200 230
003 300 280
004 400 500
table2= tbl_dOrder
order_id item_id amount
1001 001 30
1001 002 40
1002 001 50
1002 003 70
How can I write a query so that the result of the tables are as follows:
sum(fastec_qty) sum(sourse_qty) difference1 sum(amount) difference2
1002 1110 -108 190 812
difference1 =sum(fastec_qty)-sum(sourse_qty);
difference2 =sum(fastec_qty)-sum(amount);
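-- aggregate each table separately, then combine the two one-row results;
-- joining on item_id first would repeat item 001 and drop item 004
select m.fastec_sum
     , m.sourse_sum
     , m.fastec_sum - m.sourse_sum as difference1
     , o.amount_sum
     , m.fastec_sum - o.amount_sum as difference2
from (select sum(fastec_qty) as fastec_sum
           , sum(sourse_qty) as sourse_sum
      from tbl_main) m
cross join
     (select sum(amount) as amount_sum
      from tbl_dOrder) o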
SELECT sum(a.fastec_qty) AS amount, sum(a.sourse_qty) AS samount,
sum(a.fastec_qty - a.sourse_qty) AS sfd,
(select sum(ITEM_QTY) from TBL_DO) AS qty,
sum(a.fastec_qty) - (select sum(ITEM_QTY) from TBL_DO) AS difference
FROM tbl_main a
amount samount sfd qty difference
1002 1110 -108 190 812
Thanks, all.
I'll give you a hint, start by joining the tables on the item_id.
select tbl_main.item_id
     , tbl_main.fastec_qty
     , tbl_main.sourse_qty
     , tbl_dOrder.order_id
     , tbl_dOrder.amount
from tbl_main
   , tbl_dOrder
where tbl_main.item_id = tbl_dOrder.item_id;
The next step is to sum the three columns and finally do the subtraction.
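One caveat when summing: in the sample data item 001 appears in two orders and item 004 in none, so totals taken over the joined rows would count 001 twice and miss 004. A sketch that aggregates each table on its own and reproduces the expected row is shown below. It runs in SQLite through Python's sqlite3 module, which is an assumption made only to keep the example runnable, since the question does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl_main (item_id text, fastec_qty integer, sourse_qty integer)"
)
conn.execute(
    "create table tbl_dOrder (order_id integer, item_id text, amount integer)"
)
conn.executemany(
    "insert into tbl_main values (?, ?, ?)",
    [("001", 102, 100), ("002", 200, 230), ("003", 300, 280), ("004", 400, 500)],
)
conn.executemany(
    "insert into tbl_dOrder values (?, ?, ?)",
    [(1001, "001", 30), (1001, "002", 40), (1002, "001", 50), (1002, "003", 70)],
)

# Each subquery returns exactly one row, so the cross join yields one row too.
query = """
select m.fastec_sum,
       m.sourse_sum,
       m.fastec_sum - m.sourse_sum as difference1,
       o.amount_sum,
       m.fastec_sum - o.amount_sum as difference2
from (select sum(fastec_qty) as fastec_sum,
             sum(sourse_qty) as sourse_sum
      from tbl_main) m
cross join
     (select sum(amount) as amount_sum
      from tbl_dOrder) o
"""
totals = conn.execute(query).fetchone()
print(totals)
```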
This question is a follow-up to my previous question. Besides doing merges with hash objects, I am struggling with hash objects when it comes to doing a lookup within the same dataset. I have this dataset, in which the order_number of a client is continuously updated:
Client  Order_number  New_number
XYZ     1000          1001
        1001          1002
ABC     1006          1009
        1009          1017
SST     1010          1011
        1017          1020
        1020          1030
        1011          1050
Similarly to my previous question, I need the following:
Client Order_number New_number
XYZ 1000 1001
XYZ 1001 1002
ABC 1006 1009
ABC 1009 1017
SST 1010 1011
ABC 1017 1020
ABC 1020 1030
SST 1011 1050
In other words, when the client name is missing, I use the order_number to match with a previous new_number to find the client.
The orders are first sorted by order_number and then new_number.
I am trying to achieve this with some changes to the code posted in my previous question, but with no success.
This should work if your data is a SAS dataset. This example overwrites the existing dataset. What it does is filter on all observations where the client is known and then loop through the chain of order numbers using a hash while the client is empty.
data orders (keep=c o n rename=(c=client o=order_number n=new_number));
length client $8 order_number 8 new_number 8;
* declare hash object;
if _n_ = 1 then do;
declare hash h(dataset:'orders');
h.definekey('order_number');
h.definedata('client','new_number');
h.definedone();
call missing(client, order_number, new_number);
end;
* set statement with rename of original column names;
set orders (rename=(order_number=o new_number=n client=c) where=(c ne ''));
* find in hash ;
rc = h.find(key:n);
* write first observation;
output;
* do loop through chain of order numbers while client is empty;
do while (rc = 0 and client = '');
* update values of output dataset;
o = n;
n = new_number;
rc = h.find(key:n);
* write current observation;
output;
end;
run;
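The chain-following logic may be easier to see outside the DATA step. Here is a rough Python sketch in which a dictionary keyed by order_number plays the role of the hash object. The function name `fill_clients` is hypothetical, an empty string stands for a missing client, and the sketch assumes each order_number occurs once and the chains contain no cycles.

```python
# Rows from the question: (client, order_number, new_number); "" means missing.
orders = [
    ("XYZ", 1000, 1001),
    ("", 1001, 1002),
    ("ABC", 1006, 1009),
    ("", 1009, 1017),
    ("SST", 1010, 1011),
    ("", 1017, 1020),
    ("", 1020, 1030),
    ("", 1011, 1050),
]


def fill_clients(orders):
    # Lookup keyed by order_number, like the hash object in the SAS step.
    by_order = {order: (client, new) for client, order, new in orders}

    filled = []
    for client, order, new in orders:
        if not client:
            continue  # chains only start from rows whose client is known
        filled.append((client, order, new))
        # Follow new_number -> order_number links while the linked row
        # has no client of its own.
        while new in by_order and not by_order[new][0]:
            order, new = new, by_order[new][1]
            filled.append((client, order, new))

    # Sort by order_number, then new_number, as described in the question.
    return sorted(filled, key=lambda row: (row[1], row[2]))


filled = fill_clients(orders)
for row in filled:
    print(row)
```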