How can I achieve this selection in SAS

Say I have a SAS table tbl with a column col. This column holds different values, say {"a","s","d","f",...}, but one value (say "d") occurs MUCH more often than the others. How can I select only the rows with this value?
It would be something like:
data tbl;
set tbl;
where col eq "the most present element of col in this case d";
run;

One of many methods to accomplish this...
data test;
n+1;
input col $;
datalines;
a
b
c
d
d
d
d
e
f
g
d
d
a
b
d
d
;
run;
proc freq data=test order=freq; *order=freq automatically puts the most frequent on top;
tables col/out=test_count;
run;
data want;
set test;
if _n_ = 1 then set test_count(keep=col rename=(col=col_keep)); *RENAME= takes a parenthesized list;
if col = col_keep;
run;
To put this into a macro variable (see comments):
data _null_;
set test_count;
call symput("mvar",col); *put it to a macro variable;
stop; *only want the first row;
run;

I would use PROC SQL for this.
Here's an example that gets "d" into a macro variable and then filters the original dataset, as requested in your question.
This will work even if there is a multi-way tie for the most frequent value.
data tbl;
input col: $1.;
datalines;
a
a
b
b
b
b
c
c
c
c
d
d
d
;run;
proc sql noprint;
create table tbl_freq as
select col, count(*) as freq
from tbl
group by col;
select quote(col) into: mode_values separated by ', '
from tbl_freq
where freq = (select max(freq) from tbl_freq);
quit;
%put mode_values = &mode_values.;
data tbl_filtered;
set tbl;
where col in (&mode_values.);
run;
Note the use of QUOTE(), which is needed to wrap the values of col in quotation marks (omit this if col is a numeric variable).
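
For readers who want to check the tie-handling logic outside SAS, here is a minimal sketch in Python (not SAS). It is only an illustration of the idea behind the PROC SQL answer above; the function name most_frequent_values is made up for this example.

```python
from collections import Counter


def most_frequent_values(values):
    """Return the set of values sharing the highest count (ties included)."""
    counts = Counter(values)
    if not counts:
        return set()
    top = max(counts.values())
    return {value for value, count in counts.items() if count == top}


# Same data as the tbl example above: "b" and "c" tie with 4 rows each.
col = ["a", "a", "b", "b", "b", "b", "c", "c", "c", "c", "d", "d", "d"]
modes = most_frequent_values(col)

# Keep only the rows whose value is one of the modes,
# mirroring: where col in (&mode_values.)
filtered = [value for value in col if value in modes]
```

With this data both tied values are kept, which is the same behaviour the IN (&mode_values.) filter gives.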

Related

Compare a column with every element of another column which is an array

I have 3 columns in a postgresql table like:
col A col B col C
---------- --------- ---------
2020-01-01 2024-01-01 {2020-01-01, 2020-05-01, 2022-03-01}
2020-05-01 2021-05-01 {2020-01-01, 2020-05-01, 2022-03-01}
2022-03-01 2023-03-01 {2020-01-01, 2020-05-01, 2022-03-01}
col C is basically the array_agg of col A over the window. What I need to check, for each row, is whether the datetime in col A is >= any of the elements of the array in col C. What is a possible solution?
Note: In my actual case there's another col D, which is the array_agg of col B. So what I'll actually be checking is col A >= any of the elements of the array in col C and col B <= any of the elements of the array in col D. I mainly don't know how to compare a value with each element of an array.

The syntax here is pretty nice: you can just write
WHERE A >= ANY (C)
If you need to do more complicated checks on the elements of an array, you can also use unnest() to expand the array into a set of rows and then write SQL against it. For example,
WHERE 0 < (SELECT COUNT(*) FROM unnest(C) AS elt WHERE A >= elt)
is a more elaborate (but more general) way to do the same thing.
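
To make the semantics concrete, here is a small Python sketch (not PostgreSQL) of what "value >= ANY(array)" checks. The helper name ge_any is invented for this illustration, and the sketch ignores SQL NULL handling, which behaves differently.

```python
from datetime import date


def ge_any(a, c):
    """True if a is >= at least one element of c (like A >= ANY(C), ignoring NULLs)."""
    return any(a >= elt for elt in c)


# The array from col C in the question.
col_c = [date(2020, 1, 1), date(2020, 5, 1), date(2022, 3, 1)]

# The three col A values from the question, plus one earlier date for contrast.
checks = {
    a: ge_any(a, col_c)
    for a in [date(2020, 1, 1), date(2020, 5, 1), date(2022, 3, 1), date(2019, 12, 31)]
}
```

For non-empty arrays without NULLs, this is the same as comparing against the smallest element, which is why every row in the sample table passes.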

Merge two dataframes by date in SAS

I have two tables and I want to merge them by id and by the latest date before the date in df1 for the relevant id.
data df1;
input id $ date value ;
informat date yymmdd10.;
format date yymmdd10. ;
cards;
a 19991231 1
a 20011231 2
b 20151231 4
;
data df2;
input id $ date ;
informat date yymmdd10.;
format date yymmdd10.;
cards;
a 20020101
c 20160701
;
I tried this, but there's something missing.
proc sql;
create table output
as select a.*, b.date
from df1 as a, df2 as b
where a.id = b.id
group by a.id, b.id
having (a.date) > max(b.date);
quit;
Desired output:
data output;
input id $ date value;
informat date yymmdd10.;
format date yymmdd10.;
cards;
a 20011231 2
;

I'd do it in two steps, with a PROC SQL to join and sort the two tables, then a data step to only output the latest date for each ID.
proc sql;
create table o1 as
select a.id,
a.date,
a.value
from df1 a
join df2 b
on b.id = a.id
and b.date > a.date
order by a.id, a.date
;
quit;
data output;
set o1;
by id;
if last.id then output;
run;

You can use SET to interleave the records. Use RETAIN to keep the last version of VALUE from the first dataset. You didn't indicate whether you have any missing values of VALUE, but let's test for that anyway.
data want;
set df1(in=in1) df2(in=in2);
by id date ;
retain last_value;
if first.id then last_value=.;
if in1 and not missing(value) then last_value=value;
if in2 and not missing(last_value);
run;
Result:
Obs    id    date          value    last_value
1      a     2002-01-01    .        2
Note that this method takes the value on or before the DATE in the second dataset. If you want it to take only the last value strictly BEFORE that date, reverse the order in which the two datasets are referenced in the SET statement.

proc sort data=df1;
by id descending date;
proc sort data=df2;
by id;
data want;
merge df1 (in=in1) df2 (in=in2 rename=(date=date_max));
by id;
** Assume you want only values that are in both datasets **;
if in1 & in2;
retain flag;
if first.id then flag = 0;
** If no dates before max date yet and this one is before max date, we have a winner **;
if flag = 0 & date < date_max then do;
** Set flag to indicate this ID has already found the max date **;
flag = 1;
output;
end;
run;
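
As a cross-check of the logic the answers above are aiming for, here is a minimal Python sketch (not SAS). It assumes "latest date before" means strictly before, and the function name latest_before is hypothetical.

```python
from datetime import date


def latest_before(df1, df2):
    """For each (id, cutoff) in df2, pick the df1 row with the same id
    and the latest date strictly before the cutoff, if there is one."""
    result = []
    for key_id, cutoff in df2:
        candidates = [row for row in df1 if row[0] == key_id and row[1] < cutoff]
        if candidates:
            result.append(max(candidates, key=lambda row: row[1]))
    return result


# Rows are (id, date, value) for df1 and (id, date) for df2, as in the question.
df1 = [
    ("a", date(1999, 12, 31), 1),
    ("a", date(2001, 12, 31), 2),
    ("b", date(2015, 12, 31), 4),
]
df2 = [("a", date(2002, 1, 1)), ("c", date(2016, 7, 1))]

output = latest_before(df1, df2)
```

On the sample data this keeps only the row a / 2001-12-31 / 2, matching the desired output in the question.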

sequence increment like 2018AA000001, 2018AB000001, 2018AC000001

I need to increment a sequence from 2018AA000001 to 2018AA100000; after the first sequence is complete, the next one should start, running from 2018AB000001 to 2018AB100000.
Using a trigger in PostgreSQL I have it working for only one sequence, but I need to implement 2018AA, then 2018AB, 2018AC, and so on.
Please suggest how to do this.
Thanks,
Vittal

You can probably make a function out of this. Maybe some people here can edit this into a sequence that you can use.
This outputs the results you specified in your description.
discard temp;
We will generate the list of letters using this.
create temp table generate_letters as
select chr(i) as letter from generate_series(65,90) i;
letter
--------
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
(26 rows)
And generate numbers up to 100000.
create temp table generate_num as
select lpad(i::text,6,'0') as num from generate_series(1,100000) i;
select * from generate_num limit 10;
num
--------
000001
000002
000003
000004
000005
000006
000007
000008
000009
000010
(10 rows)
Then we just have to cross join.
select concat_ws('','2018',gl1.letter,gl2.letter,d.num) as seq
from generate_letters gl1
cross join generate_letters gl2
cross join generate_num d limit 100003;
seq
--------------
2018AA000001
2018AA000002
2018AA000003
2018AA000004
2018AA000005
2018AA000006
2018AA000007
2018AA000008
2018AA000009
...skipping...
2018AA099997
2018AA099998
2018AA099999
2018AA100000
2018AB000001
2018AB000002

How to turn groups of rows into separate columns?

I have a postgresql table that looks like this:
a|b|c|result
0|3|6|50
0|3|7|51
0|4|6|52
0|4|7|53
1|3|6|54
1|3|7|55
1|4|6|56
1|4|7|57
Is there an easy way to SELECT something like:
a|result for b=3|result for b=4
0|sum(50,51) |sum(52,53)
1|sum(54,55) |sum(56,57)
In other words, how to convert the groups of values of b into columns of aggregate functions like sum(), avg(), or others?
Thanks for your comments.

Not sure I understand your question completely, but I think you are looking for a CASE expression inside the aggregate.
-- drop table if exists sample;
create table sample
(a int,
b int,
c int,
result int);
insert into sample values
(0,3,6,50),
(0,3,7,51),
(0,4,6,52),
(0,4,7,53),
(1,3,6,54),
(1,3,7,55),
(1,4,6,56),
(1,4,7,57)
;
select
a,
sum(case when b = 3 then result end) as result_for_b3,
sum(case when b = 4 then result end) as result_for_b4
from
sample
group by
a
Result:
a;result_for_b3;result_for_b4
1;109;113
0;101;105
And if you (but I hope you don't) need the output exactly as in your question, then you need to use the string_agg function:
select
a,
'aggreg(' || string_agg(case when b = 3 then result end::varchar, ',') || ')' as result_for_b3,
'aggreg(' || string_agg(case when b = 4 then result end::varchar, ',') || ')' as result_for_b4
from
sample
group by
a
Result:
a;result_for_b3;result_for_b4
0;aggreg(50,51);aggreg(52,53)
1;aggreg(54,55);aggreg(56,57)
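
To double-check the numbers from the first query, here is the same conditional aggregation sketched in Python (not SQL). The function name pivot_sum is made up for this example.

```python
from collections import defaultdict

# Same rows as the sample table: (a, b, c, result).
rows = [
    (0, 3, 6, 50), (0, 3, 7, 51), (0, 4, 6, 52), (0, 4, 7, 53),
    (1, 3, 6, 54), (1, 3, 7, 55), (1, 4, 6, 56), (1, 4, 7, 57),
]


def pivot_sum(rows):
    """Group by a, and inside each group sum result separately per value of b."""
    totals = defaultdict(dict)
    for a, b, _c, result in rows:
        totals[a][b] = totals[a].get(b, 0) + result
    return {a: dict(by_b) for a, by_b in totals.items()}


pivoted = pivot_sum(rows)
```

Each inner dictionary plays the role of the result_for_b3 and result_for_b4 columns, so swapping the sum for another aggregate only changes the accumulation line.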

show bool if row exists or doesn't

I have the table [Contracts] with columns [id], [number]. I also have some numbers in string format: '12342', '23252', '1256532'. I want to get output something like this:
1535325 | no
12342 | yes
23252 | yes
434574 | no
1256532 | yes
Of course I can write the query below and get the rows I have, but how can I determine whether a row doesn't exist and get the output above?
SELECT [Id]
,[Number]
FROM [Contracts]
where [Number] in
('12342', '23252', '1256532')

You can put the values into a temporary table or a table variable and do a left join (note that a table variable is declared with @, not #):
declare @d table (Number varchar(10))
insert into @d values ('12342'), ('23252'), ('1256532'), ('xxxx') -- last one is not in Contracts
SELECT c.[Id], c.[Number], case when d.Number is NULL then 'no' else 'yes' end [This Number from C is in D also]
FROM [Contracts] c
left join @d d on d.Number = c.Number
For the "opposite" direction (one row per number in your list, flagged by whether it exists in Contracts), use a right join:
SELECT c.[Id], d.[Number], case when c.Number is NULL then 'no' else 'yes' end [This Number from D is in C also]
FROM [Contracts] c
right join @d d on d.Number = c.Number
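
The right-join version is essentially a membership test for every number in your list. A tiny Python sketch (not T-SQL) of that idea follows. The contract numbers used here are invented sample data, and existence_report is a hypothetical name.

```python
def existence_report(wanted, contract_numbers):
    """Return (number, 'yes'/'no') for every wanted number, in the order given."""
    existing = set(contract_numbers)
    return [(number, "yes" if number in existing else "no") for number in wanted]


# Hypothetical contents of Contracts.Number, for illustration only.
contract_numbers = ["12342", "23252", "1256532", "999"]
wanted = ["1535325", "12342", "23252", "434574", "1256532"]

report = existence_report(wanted, contract_numbers)
```

Every wanted number appears exactly once in the report, including the ones with no match, which is what the right join with the NULL check achieves.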
