SAS: Proc freq by group for all variables at once? - group-by

I use Proc freq to calculate the Somers' D between the dependent variable (log salary) and the independent variable (crhits, crhome, etc.)
Is there a way to get the all the results in one proc freq statement?
The code I use currently is
DATA baseball;
SET sashelp.baseball;
RUN;
PROC SORT
DATA = baseball;
BY team;
RUN;
PROC FREQ
DATA = baseball
;
TABLES logsalary * crhits
/
MEASURES;
OUTPUT OUT = somersd
(KEEP = team N _SMDCR_
RENAME = (_SMDCR_ = somers_d
N = num_in_group))
MEASURES;
BY team;
RUN;
I'm looking to get the output "somersd" for each variable crhits, crhome, etc. in one table and to do this all in one procedure, is this possible?

Why not just transpose the data so that the independent variable becomes a new BY group?
proc transpose data=sashelp.baseball out=tall name=stat;
by name team logsalary notsorted;
var crhits crhome;
run;
proc sort;
by team stat;
run;
ods exclude all;
PROC FREQ DATA = tall ;
BY team stat;
TABLES logsalary * col1 / MEASURES;
output measures out=measures
(KEEP = team stat N _SMDCR_
RENAME = (_SMDCR_ = somers_d N = num_in_group)
)
;
RUN;
ods exclude none;

Related

SAS RANDOM SAMPLING, WITH GROUP SIZE

I have a data set with accounts and its attribute (there are 9 main attributes).
Each attribute group contains a different amount of account, where 2 groups hold a massively higher amount of accounts. Therefore, when I use PROC SURVEYSELECT, to randomly select 5 accounts per STRATA, and the METHOD=SRS, I get more results from attributes which contain more accounts.
How can I correct that? how can I make SAS consider the group's volume when sampling?
The above mentioned code:
PROC SURVEYSELECT DATA=FINAL_RANDOM OUT=FINAL_RANDOM_1 NOPRINT
METHOD=srs
SAMPSIZE = 5
SELECTALL;
STRATA Account_Branch_Id ;
RUN;
I'm not quite getting your point, if you use Sample Size as number of rows, you'll get exact number of rows by your strata Random Sample node in SAS Guide
And the same goes for your code,
PROC SURVEYSELECT DATA=*YOURDATASET* OUT=DATA
METHOD=srs
SAMPSIZE = 5
SELECTALL;
STRATA *YOUR_STRATA_VARIABLE*;
RUN;
Difference between N number of observations (in 'Random Sample' node in SAS Guide) and SAMPSIZE (in the code) is that N gives you total number of observations, while SAMPSIZE gives you number per strata variable.
Here is some example generated data with two strata (branches) each containing 35% of all the accounts and the other 7 strata containing the remaining 30% of account equally distributed.
The code in the question does select 5 samples from each strata (branch_id), and randomly over attribute (risk).
I am not so good at SurverySelect to say you can't select 'evenly' over risk within strata -- but if able to do so, you will need to specify a METHOD= other than SRS
Example
data have;
call streaminit (123);
do account_id = 1 to 100000;
x = rand('uniform');
select;
when (x>.65) branch_id = 1;
when (x>.30) branch_id = 2;
otherwise branch_id = 3 + floor(rand('uniform',7));
end;
open_date = round(rand('uniform', today())); format open_date yymmdd10.;
balance = round(rand('uniform',1e6)); format balance dollar9.;
if branch_id = 7 then do;
open_date = round(rand('uniform', 4000));
balance = 5e5 + round(rand('uniform',5e5)); format balance dollar9.;
end;
select (branch_id);
when (1) risk = ceil(rand('normal',5,1));
when (5) risk = ceil(rand('uniform', 2)) * 2;
when (6) risk = 3 + ceil(rand('uniform', 4));
when (9) risk = 4 + ceil(rand('uniform', 5));
otherwise risk = ceil(rand('uniform', 9));
end;
output;
end;
drop x;
run;
proc sort data=have;
by branch_id account_id;
run;
proc tabulate data=have;
title "Original risk distribution by branch";
class branch_id risk;
table branch_id='', risk * n='' * format=comma9. * [s=[width=1cm]] / box='branch_id';
run;
* survery select;
PROC SURVEYSELECT NOPRINT DATA=have OUT=want
METHOD=srs
SAMPSIZE = 5
SELECTALL
;
STRATA Branch_Id ;
RUN;
proc sql;
create table sample_classes_with_zeros as
select class.branch_id, class.risk,
case when sample.risk then 1 else 0 end as z
from (
select distinct branch_id, risk from have
) as class
left join sample
on class.branch_id = sample.branch_id
& class.risk = sample.risk
;
proc tabulate data=sample_classes_with_zeros;
title "A SurveySelect SRS SAMPSIZE=5 sampling, risk distribution by branch";
class branch_id risk;
var z;
table branch_id='', risk * z='' * sum='' * f=comma9. * [s=[width=1cm textalign=center]] / box='branch_id';
run;

SAS: How to reference a global macro variable to create new table or dataset?

I'm having some trouble referencing a global macro variable outside of the macro to create a new data set. The global variable was created to run a loop for creating several yearly data sets using a vector of specified years, as you can see in the code below:
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /*Creates a dataset for each year, e.g. blah2004, blah2005, etc.) */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah&year._rail as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah&year._rail;
run;
When I try to create the master data set outside of the macro, however, I get the following error:
data blah;
set blah&year._rail;
run;
ERROR: File work.blah2018_rail.data does not exist
This is frustrating because I'm only trying to create the master set based on 2004-2017 data, as referenced in the macro variable. Can someone help me pinpoint my error -- is it in the way I defined the global variable, or am I missing a step somewhere? Any help is appreciated.
Thanks!
This is an interesting quirk of both macro and data step do-loops in SAS - the loop counter is incremented before the exit condition is checked, so after your loop has run it will be one increment past your stop value, e.g.:
%macro example;
%do i = 1 %to 3;
%put i = &i;
%end;
%put i = &i;
%mend;
%example;
Output:
i = 1
i = 2
i = 3
i = 4
For your final step you probably want the set statement to look like this:
set blah2004_rail ... blah2017_rail;
You could write a macro loop to generate the list and move the data step inside your macro, e.g.
set %do year = 2004 %to 2017; blah&year._rail %end;;
The second semi-colon is important! You need one to close the %end and one to terminate the set statement.
Change your naming structure. Have a common prefix and put the year at the end, then you can use the semi colon to short reference all the datasets at once.
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /*Creates a dataset for each year, e.g. blah2004, blah2005, etc.) */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah_rail_&year. as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah_rail: ;
run;

Defining Fixed SAS Macro Variables

I am trying to have a macro run but I'm not sure if it will resolve since I don't have connection to my database for a little while. I want to know if the macro is written correctly and will resolve the states on each pass through the code (ie do it repetitively and create a table for each state).
The second thing I would like to know is if I can run a macro through a from statement. For example let entpr be the database that I'm pulling from. Would the following resolve correctly:
proc sql;
select * from entpr.&state.; /*Do I need the . after &state?*/
The rest of my code:
libname mdt "........."
%let state = ny il ar ak mi;
proc sql;
create table mdt.&state._members
as select
corp_ent_cd
,mkt_sgmt_admnstn_cd
,fincl_arngmt_cd
,aca_ind
,prod_type
,cvyr
,cvmo
,sum(1) as mbr_cnt
from mbrship1_&state.
group by 1,2,3,4,5,6,7;
quit;
If &state contains ny il ar ak mi then as it is written, the from statement in your code will resolve to: from mbrship1_ny il ar ak mi - which is invalid SQL syntax.
My guess is that you're wanting to run the SQL statement for each of the following tables:
mbrship1_ny
mbrship1_il
mbrship1_ar
mbrship1_ak
mbrship1_mi
In which case the simplest macro would look something like this:
%macro do_sql(state=);
proc sql;
create table mdt.&state._members
as select
...
from mbrship1_&state
group by 1,2,3,4,5,6,7;
quit;
%mend;
%do_sql(state=ny);
%do_sql(state=il);
%do_sql(state=ar);
%do_sql(state=ak);
%do_sql(state=mi);
As to your question regarding whether or not to include the . the rule is that if the character following your macro variable is not a-Z, 0-9, or the underscore, then the period is optional. Those characters are the list of valid characters for a macro variable name, so as long as it's not one of those you don't need it as SAS will be able to identify where the name of the macro finishes. Some people always include it, personally I leave it out unless it's required.
When selecting data from multiple tables, whose names themselves contain some data (in your case the state) you can stack the data with:
UNION ALL in SQL
SET in Data step
As long as you are stacking data, you should also add a new column to the query selection that tracks the state.
Consider this pattern for stacking in SQL
data one;
do index = 1 to 10; do _n_ = 1 to 2; output; end; end;
run;
data two;
do index = 101 to 110; do _n_ = 1 to 2; output; end; end;
run;
proc sql;
create table want as
select
source, index
from
(select 'one' as source, * from one)
union all
(select 'two' as source, * from two)
;
The pattern can be abstracted into a template for SQL source code that will be generated by macro.
%macro my_ultimate_selector (out=, inlib=, prefix= states=);
%local index n state;
%let n = %sysfunc(countw(&states));
proc sql;
create table &out as
select
state
, corp_ent_cd
, mkt_sgmt_admnstn_cd
, fincl_arngmt_cd
, aca_ind
, prod_type
, cvyr
, cvmo
, count(*) as state_7dim_level_cnt
from
%* ----- use the UNION ALL pattern for stacking data -----;
%do index = 1 %to &n;
%let state = %scan(&states, &index);
%if &index > 1 %then %str(UNION ALL);
(select "&state" as state, * from &inlib..&prefix.&state.)
%end;
group by 1,2,3,4,5,6,7,8 %* this seems to be to much grouping ?;
;
quit;
%mend;
%my_ultimate_selector (out=work.want, inlib=mdt, prefix=mbrship1_, states=ny il ar ak mi)
If the columns of the inlib tables are not identical with regard to column order and type, use a UNION ALL CORRESPONDING to have the SQL procedure line up the columns for you.

Macro increment

I have table lookup values as below
sno date
1 200101
2 200102
3 200103
4 200104
I wrote below macro
%let date=200102
proc sql;
select sno into :no from lookup where date=&date.;
quit;
I need a help on how to convert the entire table lookup into macro increment by creating first s.no and date as two macro variable then increment. So that i don’t need to update dates in my table lookup every time. So if i look up for date 201304 i need to get its corresponding s.no
Is there pattern to the SNO values? Are you basically numbering the months since 01JAN2001? If so then use INTCK() function.
data test;
input date yymmdd8. ;
format date yymmdd10. ;
sno = 1+intck('month','01JAN2001'd,date);
cards;
20010112
20010213
20010314
20010415
;
So you could create two macro variables. One with the base date and the other with the base SNO value.
36 %let basedate='01JAN2001'd ;
37 %let basesno=1;
38 %let date='01JAN2001'd ;
39 %let sno=%eval(&basesno + %sysfunc(intck(month,&basedate,&date)));
40 %put &=date &=sno;
DATE='01JAN2001'd SNO=1
41
42 %let date="%sysfunc(today(),date9)"d;
43 %let sno=%eval(&basesno + %sysfunc(intck(month,&basedate,&date)));
44 %put &=date &=sno;
DATE="16NOV2017"d SNO=203
If you want to simply translate one (unique) value into another. You can use (in)formats. They can do much more than just changing how data are read/displayed. They are easy to use, fast (in-memory) and don't depend on the table once created. Change the library to a permanent one if work (=> temporary library) doesn't suit your needs.
options fmtsearch=(formats,work);
data fmt(keep = fmtname type start end label hlo default);
length fmtname $10 type $1 start end $6 label 8 hlo $1 default 8;
fmtname = 'date_to_no';
type = 'I';
label=0;
do y = 2001 to 2099;
do m = 1 to 12;
start = put(y,4.) || put(m,z2.);
end = start;
label + 1;
default=50; /*default length of the string compared when informat is used. Should be higher than both start and end*/
output;
end;
end;
/*if you want to assign a value (=label) to inputs not found. In this case it's -2*/
hlo="O";
start = "";
end = start;
label= -2;
output;
run;
proc format library=work cntlin=fmt;
run;
data test;
no = input('200101',date_to_no.); output;
no = input('201710',date_to_no.); output;
no = input('201713',date_to_no.); output;
run;
Build a lookup table dynamically and create a macro variable for each row in the table. The macro variables will be named date_200101,date_200102,...and so on. They will contain a value equal to the corresponding sno value:
data lookup;
length var_name $20;
do sno = 1 to intck('month','01jan2001'd,date())+1;
date = input(put(intnx('month','01jan2001'd, sno-1, 'beginning'),yymmn6.),best.);
var_name = cats('date_',date);
call symput(var_name, cats(sno));
output;
end;
run;
You can then refer to the macro variables like so:
%let date =200103;
%put &&date_&date;
...or...
%put &date_200101;
The first usage example is using double macro resolution. Basically the macro processes needs to perform 2 iterations of the macro token &&date_&date in order to fully resolve it. On the first pass, it gets resolved to &date_200101. On the second pass, the macro token &date_200101 gets resolved to 1.

sas macro index or other?

I have 169 towns for which I want to iterate a macro. I need the output files to be saved using the town-name (rather than a town-code). I have a dataset (TOWN) with town-code and town-name. Is it possible to have a %let statement that is set to the town-name for each iteration where i=town-code?
I know that I can list out the town-names using the index function, but I'd like a way to set the index function so that it sets a %let statement to the TOWN.town-name when i=TOWN.town-code.
All the answers below seem possible. I have used the %let = %scan( ,&i) option for now. A limitation is that the town names can be more than one word, so I've substituted underscores for spaces that I correct later.
This is my macro. I output proc report to excel for each of the 169 towns. I need the excel file to be saved as the name of the town and for the header to include the name of the town. Then, in excel, I merge all 169 worksheets into a single workbook.
%MACRO BY_YEAR;
%let townname=Andover Ansonia Ashford Avon ... Woodbury Woodstock;
%do i = 1999 %to 2006;
%do j = 1 %to 169;
%let name = %scan(&townname,&j);
ods tagsets.msoffice2k file="&ASR.\Town_Annual\&i.\&name..xls" style=minimal;
proc report data=ASR nofs nowd split='/';
where YR=&i and TWNRES=&j;
column CODNUM AGENUM SEX,(dths_sum asr_sum seasr_sum);
define CODNUM / group ;
define agenum / group ;
define sex / across ;
define dths_sum / analysis ;
define asr_sum / analysis ;
define seasr_sum / analysis ;
break after CODNUM / ul;
TITLE1 "&name Resident Age-Specific Mortality Rates by Sex, &i";
TITLE2 "per 100,000 population for selected causes of death";
run;
ods html close;
%end;
%end;
%MEND;
My guess is that the reason why you want to look up the town name by town index is to repeatedly call a macro with each town name. If this is the case, then you don't even need to get involved with the town index business at all. Just call the macro with each town name. There are many ways to do this. Here is one way using call execute().
data towns;
infile cards dlm=",";
input town :$char10. ##;
cards;
My Town,Your Town,His Town,Her Town
;
run;
%macro doTown(town=);
%put Town is &town..;
%mend doTown;
/* call the macro for each town */
data _null_;
set towns;
m = catx(town, '%doTown(town=', ')');
call execute(m);
run;
/* on log
Town is My Town.
Town is Your Town.
Town is His Town.
Town is Her Town.
*/
If you do need to do a table lookup, then one way is to convert your town names into a numeric format and write a simple macro to retrieve the name, given an index value. Something like:
data towns;
infile cards dlm=",";
input town :$char10. ##;
cards;
My Town,Your Town,His Town,Her Town
;
run;
/* make a numeric format */
data townfmt;
set towns end=end;
start = _n_;
rename town = label;
retain fmtname 'townfmt' type 'n';
run;
proc format cntlin=townfmt;
run;
%macro town(index);
%trim(%sysfunc(putn(&index,townfmt)))
%mend town;
%*-- check --*;
%put %town(1),%town(2),%town(3),%town(4);
/* on log
My Town,Your Town,His Town,Her Town
*/
Or how about you just pass both the code and the name to the macro as parameters? Like this?
%MACRO DOSTUFF(CODE=, NAME=);
DO STUFF...;
PROC EXPORT DATA=XYZ OUTFILE="&NAME."; RUN;
%MEND;
DATA _NULL_;
SET TOWNS;
CALL EXECUTE("%DOSTUFF(CODE=" || STRIP(CODE) || ", NAME=" || STRIP(NAME) || ");");
RUN;