I am trying to implement a macro which will allow me to run several logistic regression models that have the same outcome but a different main explanatory variable (the covariates would be common for all models) for several datasets. I have written a scan and eval macro that scans two global variables but it's not quite working. The code is shown below:
%let numbers=5 7 8 9 10 12 13 14 16 18 19 24 26
32 33 35 37 39 41 44 45 48 50 52
55 56 58 66 67 68 ;
%let list=voting national local safe street violence say free;
%macro logistic;
%let j=1;
%let m=1;
%let first=%scan(&list,%eval(&j));
%let second=%scan(&numbers,%eval(&m));
%do %while (&first ne );
%do %while (&second ne );
proc logistic data=socialcapital&second. descending;
model depression= &first. agec married edu inc_2 inc_3 inc_4 inc_5/risklimits;
ods output ParameterEstimates=mv_model1&second._&first.;
run;
%let j=%eval(&j+1);
%let m=%eval(&m+1);
%let first=%scan(&list,%eval(&j));
%let second=%scan(&numbers,%eval(&m));
%end;
%end;
run;
%mend;
%logistic;
The global variable numbers identifies the "socialcapital" datasets that I am using. Each dataset represents a country, so each number in the "numbers" global variable refers to one dataset. The global variable list holds the main explanatory variables that I want to include in the model, one main explanatory variable per model. What I am looking to get is 8 separate multivariable logistic regression results for each country.
However, the scan function is not working as I intended, so I know that I have done something wrong, but I am not sure what. The macro seems to assign 1 variable from &list to 1 dataset from &numbers until it runs out of variables from &list, and then simply runs the model with just the covariates. What I want instead is for it to run all 8 models using dataset 5, then all 8 models again using dataset 7, and so forth.
Basically, I have messed up something with the numbering and I am not quite sure how to proceed with this macro. I know that I can get rid of the &numbers global variable by using a "by statement" in proc logistic with a stacked dataset but I would really like to learn how to get this to work for future models where that might not be an option.
Maggie,
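I believe the code below will do what you want. I commented out the LOGISTIC procedure and put in a %PUT statement for testing, and the macro variables resolve the way I think you expect them to. The key change is that the inner counter (m) is reset to 1 inside the outer loop, and the outer counter (j) is only incremented after the inner loop has finished.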
%let numbers=5 7 8 9 10 12 13 14 16 18 19 24 26
32 33 35 37 39 41 44 45 48 50 52
55 56 58 66 67 68 ;
%let list=voting national local safe street violence say free;
%macro logistic;
%let j=1;
%let first=%scan(&list,%eval(&j));
%do %while (&first ne );
%let m=1;
%let second=%scan(&numbers,%eval(&m));
%do %while (&second ne );
/*
proc logistic data=socialcapital&second. descending;
model depression= &first. agec married edu inc_2 inc_3 inc_4 inc_5/risklimits;
ods output ParameterEstimates=mv_model1&second._&first.;
run;
*/
%put J=&j - M=&m - FIRST=&first - SECOND=&second;
%let m=%eval(&m+1);
%let second=%scan(&numbers,%eval(&m));
%end;
%let j=%eval(&j+1);
%let first=%scan(&list,%eval(&j));
%end;
run;
%mend;
%logistic;
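The difference between the original macro and the corrected one can be sketched outside SAS. The following Python is only a language-neutral illustration of the two loop structures, not a replacement for the macro; the helper names (scan, lockstep, nested) are hypothetical, and scan only imitates the way %scan returns an empty value once the word index runs past the end of the list.

```python
def scan(words, n):
    """Imitate %scan: n-th blank-delimited word (1-based), '' past the end."""
    parts = words.split()
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def lockstep(var_list, numbers):
    """Original logic: both counters advance inside the inner loop."""
    runs = []
    j = m = 1
    first, second = scan(var_list, j), scan(numbers, m)
    while first != "":
        while second != "":
            runs.append((second, first))
            j += 1
            m += 1
            first, second = scan(var_list, j), scan(numbers, m)
    return runs


def nested(var_list, numbers):
    """Corrected logic: the inner counter restarts for every variable."""
    runs = []
    j = 1
    first = scan(var_list, j)
    while first != "":
        m = 1
        second = scan(numbers, m)
        while second != "":
            runs.append((second, first))
            m += 1
            second = scan(numbers, m)
        j += 1
        first = scan(var_list, j)
    return runs


# Small stand-ins for &numbers and &list (shortened for readability).
lockstep_runs = lockstep("voting national", "5 7 8")
nested_runs = nested("voting national", "5 7 8")
```

With these shortened inputs, lockstep pairs each dataset with at most one variable and ends with a run whose variable is empty (the "covariates only" model described in the question), while nested produces every dataset and variable combination.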
Here's another way to do it: (if you end up with NUMBERS and LIST in data sets, we can alter the code to handle that too)
%let numbers=5 7 8 9 10 12 13 14 16 18 19 24 26
32 33 35 37 39 41 44 45 48 50 52
55 56 58 66 67 68 ;
%let list=voting national local safe street violence say free;
%macro logistic(First=, Second=);
%Put FIRST= &first;
%Put SECOND= &second;
/*proc logistic data=socialcapital&second. descending;*/
/*model depression= &first. agec married edu inc_2 inc_3 inc_4 inc_5/risklimits;*/
/*ods output ParameterEstimates=mv_model1&second._&first.;*/
/*run;*/
%mend logistic;
%Macro Test;
%do i = 1 %to %sysfunc(countw(&list));
%Let first=%scan(&list,&i);
%do j = 1 %to %sysfunc(countw(&numbers));
%Let second=%scan(&numbers,&j);
%logistic(First=&first,Second=&second)
%end;
%end;
%Mend test;
%test
Whoops, small correction. I should have used "numbers" not index "i" below.
You can do this with a macro, but you can also do it in a data step (using call execute) or in PROC IML (9.22 or higher) with submit blocks nested in a loop. To get an idea, see below.
Data _Null_;
Do numbers = 5, 7,
8 to 10,
12 to 14,
16, 18, 19, 24, 26, 32,
33 to 41 by 2,
44, 45, 48, 50, 52, 55, 56, 58,
66 to 68;
Do IndpVar = "voting", "national", "local", "safe", "street", "violence", "say", "free";
call execute( '%Put '||strip(Indpvar)||strip(put(numbers,best.))||';');
/* Logistic code goes here: build the PROC LOGISTIC step as a string and pass it to call execute */
End;
End;
Run;
I have two variables (varx and vary) in data set "dat" and need to create a final score by first categorizing varx and vary, and then translating the score categories into a final score according to a look-up table "lookup".
I managed to get past the categorizing part and am now stuck on how to tell SAS to use the categories I created (i.e., "varxcat" and "varycat") as row and column indices of "lookup", grab the value I need for each observation, and put it into a final score variable (call it "score") in "dat".
In R (in which I normally code) this can easily be done with something like a for loop. Is there anything similar in SAS? (I don't have to use "varxcat" and "varycat"; I just need to eventually create "score".)
data dat;
input ID $ varx vary;
datalines;
1 1 1
2 4 5
3 11 12
4 23 14
5 24 20
;
data lookup;
input x01to10 x11to20 x21to30;
datalines;
21 52 73
84 95 96
107 118 149
; /*first row is for y01to10, second row is for y11to20, and third row is for y21to30,
such that if someone's x score is in category 1 and y score is in category 3,
the person's final score should be 107*/
data dat;
set dat;
if varx <= 10 then varxcat = 1;
else if varx > 10 & varx <= 20 then varxcat = 2;
else if varx > 20 & varx <= 30 then varxcat = 3;
if vary <= 10 then varycat = 1;
else if vary > 10 & vary <= 20 then varycat = 2;
else if vary > 20 & vary <= 30 then varycat = 3;
run;
Desired "dat" looks like
data dat;
input ID $ varx vary score;
datalines;
1 1 1 21
2 4 5 21
3 11 12 95
4 23 14 96
5 24 20 96
;
A lookup table for data value mapping is essentially a left join operation. SAS has a lot of ways to left join data, including
SQL
Merge
Hash object
Array (direct addressing)
Formats
Informats
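Here are four ways: SQL, Merge, Array and Hash. The mapping from var* to category is done by the functional mapping int(value/10), which yields 0-based indices. Note that this does not reproduce the question's cutoffs exactly at the boundaries: a value of 10, 20 or 30 lands in the next category up (for example, vary = 20 maps to index 2 rather than 1, so ID 5 would score 149 instead of the desired 96). Replacing int(value/10) with ceil(value/10) - 1 matches the question's <= cutoffs for positive values.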
data have;
input ID $ varx vary;
datalines;
1 1 1
2 4 5
3 11 12
4 23 14
5 24 20
6 5 29 /* score should be 107 */
;
data lookup;
do index_y = 0 to 2;
do index_x = 0 to 2;
input lookup_value @@;
output;
end;
end;
datalines;
21 52 73
84 95 96
107 118 149
;
*------------------- SQL;
proc sql;
create table want as
select
id, lookup_value as score
from
have
left join
lookup
on
int (have.varx/10) = lookup.index_x
and
int (have.vary/10) = lookup.index_y
order by
id
;
quit;
*------------------- MERGE;
data have2(index=(myindexname=(xcat ycat)));
set have;
xcat = int(varx/10);
ycat = int(vary/10);
run;
proc sort data=lookup;
by index_x index_y;
options msglevel=i;
data want2(keep=id lookup_value rename=(lookup_value=score));
merge
have2(rename=(xcat=index_x ycat=index_y) in=left)
lookup
;
by index_x index_y;
if left;
run;
proc sort data=want2;
by id;
run;
*------------------- ARRAY DIRECT ADDRESSING;
data want3;
array lookup [0:2,0:2] _temporary_;
if _n_ = 1 then do until (endlookup);
set lookup end=endlookup;
lookup[index_x,index_y] = lookup_value;
end;
set have;
xcat = int(varx/10); /* integer subscripts, consistent with the other three methods */
ycat = int(vary/10);
score = lookup[xcat,ycat];
keep id score;
run;
*------------------- HASH LOOKUP;
data want4;
if 0 then set lookup;
if _n_ = 1 then do;
declare hash lookup(dataset:'lookup');
lookup.defineKey('index_x', 'index_y');
lookup.defineData('lookup_value');
lookup.defineDone();
end;
set have;
index_x = int(varx/10);
index_y = int(vary/10);
if (lookup.find() = 0) then
score = lookup_value;
keep id score;
run;
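As a cross-check on the category mapping, here is a small Python sketch of the same two-way lookup. It is an illustration only, not SAS: the names LOOKUP, category and score are hypothetical, and it assumes the question's cutoffs (<= 10, <= 20, <= 30), implemented as ceil(value/10) - 1 so that boundary values such as 20 stay in the lower category.

```python
import math

# Rows are y categories, columns are x categories, as in the question's table.
LOOKUP = [
    [21, 52, 73],
    [84, 95, 96],
    [107, 118, 149],
]


def category(value):
    """0-based category for cutoffs <= 10, <= 20, <= 30.

    Values <= 0 fall in the first category, as in the question's first rule.
    """
    return max(0, math.ceil(value / 10) - 1)


def score(varx, vary):
    """Look up the final score; values above 30 raise IndexError."""
    return LOOKUP[category(vary)][category(varx)]


# (varx, vary) pairs from the example data, including the extra sixth row.
rows = [(1, 1), (4, 5), (11, 12), (23, 14), (24, 20), (5, 29)]
scores = [score(x, y) for x, y in rows]
```

The fifth pair (24, 20) is the one that distinguishes the two mappings: with these cutoffs it resolves to the second row of the table.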
I have a text file with comments that I need to import in SAS.
The text file looks like this:
# DATA1
#
# --
#
ID nbmiss x1 x2 x3 x4
1 1 45 38 47
2 0 37 45 39 51
3 3 58
4 4
5 0 68 45 73 76
6 2 52 48
My output in SAS must look like this:
Obs x1 x2 x3 x4
1 . 45 38 47
2 37 45 39 51
3 . . . 58
4 . . . .
5 68 45 73 76
6 . . 52 48
Here is what I did. It gives me what I am looking for, but it's long. I think there is a simpler way.
proc import datafile= 'Z:\bloc1data\data\data1.txt'
out=class
dbms=dlm
replace;
datarow=6;
delimiter='09'x;
run;
proc print data = work.class label;
var VAR3 VAR4 VAR5 VAR6;
label VAR3='x1' VAR4='x2' VAR5='x3' VAR6='x4';
run;
My question is: how can I get the same output in a simpler way?
Thank you for your time.
This is the part that's doing the import:
proc import datafile= 'Z:\bloc1data\data\data1.txt'
out=class
dbms=dlm
replace;
datarow=6;
delimiter='09'x;
run;
That seems pretty short to me. Four actual lines of code, around a hundred characters... The equivalent code in the data step is basically the same.
data want;
infile 'z:\bloc1data\data\data1.txt' dlm='09'x dsd firstobs=6;
input id nbmiss x1 x2 x3 x4;
run;
That file unfortunately doesn't work well for determining the names automatically (which otherwise you could do). DBMS=DLM does not have a namerow option to tell it where to pick names up from, so you would need to preprocess the file to remove the extraneous lines to do that. You're welcome to ask as a separate question how to do so, but it's not "simpler" than the above (though it is probably "better").
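For readers who want to see what such preprocessing amounts to, here is a rough Python sketch, not SAS. It assumes the file is tab-delimited (as the delimiter='09'x setting suggests), that comment lines start with "#", and that missing values are empty fields; the function name read_commented_tsv and the inline sample text are hypothetical.

```python
import csv


def read_commented_tsv(text):
    """Drop '#' comment lines, take the next line as names, parse the rest."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    records = []
    for row in reader:
        # Pad short rows so every column name gets a value.
        row = row + [""] * (len(header) - len(row))
        records.append({name: (int(v) if v.strip() else None)
                        for name, v in zip(header, row)})
    return records


# Two data rows in the assumed layout; tabs are written explicitly.
sample = (
    "# DATA1\n#\n# --\n#\n"
    "ID\tnbmiss\tx1\tx2\tx3\tx4\n"
    "1\t1\t\t45\t38\t47\n"
    "4\t4\t\t\t\t\n"
)
records = read_commented_tsv(sample)
```

Once the comment lines are gone, the first remaining line can serve as the name row, which is the step that DBMS=DLM cannot do on the raw file.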
I found this on the official SAS website.
Use the GROUPFORMAT option in the BY statement to ensure that
1. formatted values are used to group observations when a FORMAT statement and a BY statement are used together in a DATA step
2. the FIRST.variable and LAST.variable are assigned by the formatted values of the variable
And the example it uses to illustrate the usage of groupformat is
proc format;
value range
low -55 = 'Under 55'
55-60 = '55 to 60'
60-65 = '60 to 65'
65-70 = '65 to 70'
other = 'Over 70';
run;
proc sort data=class out=sorted_class;
by height;
run;
data _null_;
format height range.;
set sorted_class;
by height groupformat;
if first.height then
put 'Shortest in ' height 'measures ' height:best12.;
run;
But I don't understand how this example shows groupformat "ensures"
formatted values are used to group observations when a FORMAT statement and a BY statement are used together in a DATA step.
Look at the results with and without the groupformat option:
4805
4806 data _null_;
4807 format height range.;
4808 set sorted_class;
4809 by height groupformat;
4810 if first.height then
4811 put 'Shortest in ' height 'measures ' height:best12.;
4812 run;
Shortest in Under 55 measures 51.3
Shortest in 55 to 60 measures 56.3
Shortest in 60 to 65 measures 62.5
Shortest in 65 to 70 measures 65.3
Shortest in Over 70 measures 72
NOTE: There were 19 observations read from the data set WORK.SORTED_CLASS.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.01 seconds
4813
4814 data _null_;
4815 format height range.;
4816 set sorted_class;
4817 by height ;
4818 if first.height then
4819 put 'Shortest in ' height 'measures ' height:best12.;
4820 run;
Shortest in Under 55 measures 51.3
Shortest in 55 to 60 measures 56.3
Shortest in 55 to 60 measures 56.5
Shortest in 55 to 60 measures 57.3
Shortest in 55 to 60 measures 57.5
Shortest in 55 to 60 measures 59
Shortest in 55 to 60 measures 59.8
Shortest in 60 to 65 measures 62.5
Shortest in 60 to 65 measures 62.8
Shortest in 60 to 65 measures 63.5
Shortest in 60 to 65 measures 64.3
Shortest in 60 to 65 measures 64.8
Shortest in 65 to 70 measures 65.3
Shortest in 65 to 70 measures 66.5
Shortest in 65 to 70 measures 67
Shortest in 65 to 70 measures 69
Shortest in Over 70 measures 72
NOTE: There were 19 observations read from the data set WORK.SORTED_CLASS.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
From this it is obvious that GROUPFORMAT forms the BY groups from the FORMATTED value, so FIRST.height is set once per formatted range. Without it, the BY groups use the RAW value of HEIGHT, so FIRST.height is set for every distinct height.
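The same idea can be sketched in Python as an analogy only: grouping consecutive sorted values by the value itself versus by a label computed from it. The function height_range is a hypothetical stand-in for the range. format, and the heights are a few values taken from the log above.

```python
from itertools import groupby


def height_range(h):
    """Stand-in for the range. format used in the example."""
    if h <= 55:
        return "Under 55"
    if h <= 60:
        return "55 to 60"
    if h <= 65:
        return "60 to 65"
    if h <= 70:
        return "65 to 70"
    return "Over 70"


def firsts(values, key=None):
    """First value of each run of consecutive equal keys (like FIRST.var)."""
    return [next(group) for _, group in groupby(values, key=key)]


# Already sorted, as PROC SORT guarantees in the SAS example.
heights = [51.3, 56.3, 56.5, 57.3, 62.5, 62.8, 65.3, 72.0]

by_raw = firsts(heights)                          # like BY height
by_formatted = firsts(heights, key=height_range)  # like BY height GROUPFORMAT
```

Grouping on the raw values treats every distinct height as its own group, whereas grouping on the label yields one "first" value per range.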
I have a (maybe simple) question about Matlab data import. I want to import a huge dataset (~1GB) which has a comma separated format like this:
08:05, 12, 33, 124, 13, 08:06, 22, 84, 12, 35, ..
Every fifth value is a timestamp. I want to import it with a fixed number of columns (5 columns), but there is no delimiter marking the end of a row. It should look like this in the end:
08:05 12 33 124 13
08:06 22 14 1 35
08:07 22 124 12 34
08:08 22 12 12 0
I thought about replacing every 5th comma with a subroutine, but it's too time-consuming. Do you know a better solution? I'm hoping for a nice built-in function.
You can use fscanf and C-type format strings to accomplish this. For example:
fid=fopen('filename.txt');
A=reshape(fscanf(fid,'%d:%d, %d, %d, %d, %d, '),6,[])';
fclose(fid);
This stores your answer in a matrix A which will contain
A =
8 5 12 33 124 13
8 6 22 84 12 35
If you want to format this into a string or output file as you listed, you could use:
fprintf('%02d:%02d %-3d %-3d %-3d %-3d\n',A')
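The underlying idea (read a flat stream of fields, then cut it into rows of fixed width) can also be sketched in Python. This is only an illustration on a short string, not a MATLAB solution; the function name rows_of_five is hypothetical, and a file of about 1 GB would need to be processed in chunks rather than loaded in one piece as done here.

```python
def rows_of_five(text, width=5):
    """Split comma-separated fields and regroup them into rows of `width`."""
    fields = [f.strip() for f in text.split(",")]
    fields = [f for f in fields if f]  # drop the empty field after a trailing comma
    return [fields[i:i + width] for i in range(0, len(fields), width)]


sample = "08:05, 12, 33, 124, 13, 08:06, 22, 84, 12, 35, "
rows = rows_of_five(sample)
for row in rows:
    print(" ".join(row))
```

Unlike the fscanf approach, this keeps the timestamp as a single text field instead of splitting it into hours and minutes.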
Suppose I have a .csv file with the following values:
A 23 45
B 69 84
C 48 78
D 12 34
So it has two columns. What I need to do is add values starting from the 3rd column without deleting the values in the 1st and 2nd columns.
I tried this code:
fileID = fopen('exp.csv','A');
fprintf(fileID,' %12.4f\n',D);
fclose(fileID);
But the issue is that the new values are all appended below the existing data, in one column, like this:
23
69
48
12
......
45
84
75
38
How can I do this?
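Appending with fopen can only add text at the end of the file, so it cannot add a column. Instead, use the csvread / csvwrite functions to load the existing file, append a column, and write the new data. Note that csvread only reads numeric data, so this assumes the file contains numbers only. (In recent MATLAB releases, readmatrix and writematrix are the recommended replacements.)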
data = csvread('exp.csv');
toadd = (1:4)';
newdata = [data toadd];
csvwrite('out.csv', newdata);
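The same read, append, write pattern is sketched below in Python for comparison. It is an illustration rather than MATLAB code: the function name append_column is hypothetical, the data is held in a string instead of a file to keep the example self-contained, and it assumes the CSV holds only the two numeric columns.

```python
import csv
import io


def append_column(csv_text, new_values):
    """Return csv_text with one value from new_values added to each row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if len(rows) != len(new_values):
        raise ValueError("need exactly one new value per existing row")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row, value in zip(rows, new_values):
        writer.writerow(row + [value])
    return out.getvalue()


original = "23,45\n69,84\n48,78\n12,34\n"
updated = append_column(original, [1, 2, 3, 4])
```

The key point is the same as in the MATLAB answer: the whole table is rewritten with the extra column, because appending to the end of the file can only add rows.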