I have a text file with comments that I need to import into SAS.
The text file looks like this:
# DATA1
#
# --
#
ID nbmiss x1 x2 x3 x4
1 1 45 38 47
2 0 37 45 39 51
3 3 58
4 4
5 0 68 45 73 76
6 2 52 48
My output in SAS must look like this:
Obs x1 x2 x3 x4
1 . 45 38 47
2 37 45 39 51
3 . . . 58
4 . . . .
5 68 45 73 76
6 . . 52 48
Here is what I did. It gives me what I am looking for, but it's long. I think there is a simpler way.
proc import datafile= 'Z:\bloc1data\data\data1.txt'
out=class
dbms=dlm
replace;
datarow=6;
delimiter='09'x;
run;
proc print data = work.class label;
var VAR3 VAR4 VAR5 VAR6;
label VAR3='x1' VAR4='x2' VAR5='x3' VAR6='x4';
run;
My question is: how can I get the same output in a simpler way?
Thank you for your time.
This is the part that's doing the import:
proc import datafile= 'Z:\bloc1data\data\data1.txt'
out=class
dbms=dlm
replace;
datarow=6;
delimiter='09'x;
run;
That seems pretty short to me. Four actual lines of code, around a hundred characters... The equivalent code in the data step is basically the same.
data want;
infile 'z:\bloc1data\data\data1.txt' dlm='09'x dsd firstobs=6;
input id nbmiss x1 x2 x3 x4;
run;
That file unfortunately doesn't work well for determining the names automatically (which otherwise you could do). DBMS=DLM does not have a namerow option to tell it where to pick names up from, so you would need to preprocess the file to remove the extraneous lines to do that. You're welcome to ask as a separate question how to do so, but it's not "simpler" than the above (though it is probably "better").
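If you do go the preprocessing route, it does not have to happen in SAS. As a rough sketch outside SAS (Python here; the function name strip_comment_lines and the sample text are made up for illustration, and the tab layout is an assumption based on the delimiter='09'x above), dropping every line that starts with # leaves a file whose first row is the header:

```python
def strip_comment_lines(text, marker="#"):
    """Return the text without the lines that start with the comment marker."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(marker)]
    return "\n".join(kept) + "\n"


# Hypothetical sample mirroring the top of the file (assumed tab-delimited,
# with an empty field where x1 is missing).
sample = (
    "# DATA1\n"
    "#\n"
    "# --\n"
    "#\n"
    "ID\tnbmiss\tx1\tx2\tx3\tx4\n"
    "1\t1\t\t45\t38\t47\n"
)

cleaned = strip_comment_lines(sample)
print(cleaned, end="")
```

The cleaned text would then be written to a new file and imported with the header on the first row.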
I have a function that takes as input some of the values in a table and returns a tuple if you will - three separate return values, which I want to transpose into the output of a query. Here's a simplified example of what I want to achieve:
multiplier:{(x*2;x*3;x*4)};
select twoX:multiplier[price][0], threeX:multiplier[price][1], fourX:multiplier[price][2] from data;
The above basically works (I think I've got the syntax right for the simplified example - if not then hopefully my intention is clear), but is inefficient because I'm calling the function three times and throwing away most of the output each time. I want to rewrite the query to only call the function once, and I'm struggling.
Update
I think I missed a crucial piece of information in my explanation of the problem which affects the outcome - I need to get other data in the query alongside the output of my function. Here's a hopefully more realistic example:
multiplier:{(x*2;x*3;x*4)};
select average:avg price, total:sum price, twoX:multiplier[sum price][0], threeX:multiplier[sum price][1], fourX:multiplier[sum price][2] by category from data;
I'll have a go at adapting your answers to fit this requirement anyway, and apologies for missing this bit of information. The real function is a proprietary and fairly complex algorithm and the real query has about 30 output columns, hence the attempt at simplifying the example :)
If you're just looking for the results themselves you can extract (exec) as lists, create dictionary and then flip the dictionary into a table:
q)exec flip`twoX`threeX`fourX!multiplier[price] from ([]price:til 10)
twoX threeX fourX
-----------------
0 0 0
2 3 4
4 6 8
6 9 12
8 12 16
10 15 20
12 18 24
14 21 28
16 24 32
18 27 36
If you need other columns from the original table too then it's trickier, but you could join the tables sideways using ,'
q)t:([]price:til 10)
q)t,'exec flip`twoX`threeX`fourX!multiplier[price] from t
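For anyone who wants to see the shape of this outside q, here is a small Python sketch of the same idea: compute the multi-valued result once per row, transpose it into named columns, and put those next to the original column. The multiplier below is only a stand-in for the real function, and the dict-of-lists "table" is an assumption made for illustration.

```python
def multiplier(x):
    # Stand-in for the real function: returns three values at once.
    return (x * 2, x * 3, x * 4)


prices = list(range(10))

# One call per row instead of three, keeping the whole tuple each time.
results = [multiplier(p) for p in prices]

# Transpose the list of 3-tuples into three columns.
two_x, three_x, four_x = (list(col) for col in zip(*results))

# "Join sideways": the new columns sit next to the original one.
table = {"price": prices, "twoX": two_x, "threeX": three_x, "fourX": four_x}
print(table["fourX"])
```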
An amend with @ can also achieve what you want. Here data is just a table with 10 random prices. @ is then used to apply the multiplier function to the price column while also assigning a column name to each of the three resulting lists:
q)data:([] price:10?100)
q)multiplier:{(x*2;x*3;x*3)}
q)@[data;`twoX`threeX`fourX;:;multiplier data`price]
price twoX threeX fourX
-----------------------
80 160 240 240
24 48 72 72
41 82 123 123
0 0 0 0
81 162 243 243
10 20 30 30
36 72 108 108
36 72 108 108
16 32 48 48
17 34 51 51
I want to make a Matlab function that takes two matrices A and B (of the same size) and combines them in a certain way to give an output that can be used in a LaTeX table.
I want the first row of the output matrix to consist of the first row of matrix A, with ampersands (&) in between the entries, ending with a double backslash.
The second row should be the first row of B with parentheses around them, and ampersands in between. And so on for the rest of A and B.
If I let A=rand(1,2), I could do this by using [num2str(A(1)), ' & ', num2str(A(2)),' \\'] and so on.
But I want to be able to make a function that does this for any size of the matrix A. I guess I have to make cell structures in some way. But how?
This could be one approach -
%// First off, make the "mixed" matrix of A and B
AB = zeros(size(A,1)*2,size(A,2));
AB(1:2:end) = A;
AB(2:2:end) = B;
%// Convert all numbers of AB to characters with ampersands separating them
AB_amp_backslash = num2str(AB,'%1d & ');
%// Remove the ending ampersands
AB_amp_backslash(:,end-1:end) = [];
%// Append the string ` \\` and make a cell array for the final output
ABcat_char = strcat(AB_amp_backslash,' \\');
ABcat_cell = cellstr(ABcat_char)
Sample run -
A =
183 163 116 50
161 77 107 91
150 124 56 46
B =
161 108 198 4
198 18 14 137
6 161 188 157
ABcat_cell =
'183 & 163 & 116 & 50 \\'
'161 & 108 & 198 & 4 \\'
'161 & 77 & 107 & 91 \\'
'198 & 18 & 14 & 137 \\'
'150 & 124 & 56 & 46 \\'
' 6 & 161 & 188 & 157 \\'
You can use sprintf, it will repeat the format spec as many times as required until all input variables are processed:
%combine both to one matrix
C=nan(size(A).*[2,1]);
C(1:2:end)=A;
C(2:2:end)=B;
%print
sprintf('%f & %f \\\\\n',C.')
The transpose (.') is required to fix the ordering.
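To make the target layout concrete for any number of columns, including the parentheses around the entries of B that the question asks for, here is a language-neutral sketch in Python. The name latex_rows and the small sample matrices are made up for illustration; this is not a translation of the MATLAB code above.

```python
def latex_rows(A, B):
    """Interleave the rows of A and B as LaTeX table rows.

    Rows from A are written as-is; entries from B are wrapped in parentheses.
    """
    lines = []
    for row_a, row_b in zip(A, B):
        lines.append(" & ".join(str(v) for v in row_a) + r" \\")
        lines.append(" & ".join(f"({v})" for v in row_b) + r" \\")
    return lines


# Hypothetical sample matrices of the same size.
A = [[183, 163], [161, 77]]
B = [[161, 108], [198, 18]]
for line in latex_rows(A, B):
    print(line)
```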
Let's say I have a table like this:
post user date
____ ____ ________________
1 A 12.01.2014 13:05
2 B 15.01.2014 20:17
3 A 16.01.2014 05:22
I want to create a smaller table (but not delete the original one!) containing all posts of - for example - user A including the dates that those were posted on.
When looking at MATLAB's documentation (see the very last part for deleting rows) I discovered that MATLAB allows you to create a mask for a table based on some criterion. So in my case if I do something like this:
postsA = myTable.user == 'A'
I get a nice mask vector as follows:
>> postsA =
1
0
1
where the 1s are obviously those rows in myTable, which satisfy the rule I have given.
In the documentation I have pointed at above, rows are deleted from the original table:
postsNotA = myTable.user ~= 'A' % note that I have to reverse the criterion since I'm choosing stuff that will be removed
myTable(postsNotA,:) = [];
I would however - as stated above - like to not touch my original table. One possible solution here is to create an empty table with two columns:
post date
____ ____
iterate through all rows of my original table while also looking at the current value of my mask vector postsA, and if it's equal to 1, copy the two columns in that row that I'm interested in and concatenate this shrunk row to my smaller table. What I'd like to know is whether there is a more or less 1-2 line solution for this problem?
Assuming myTable is your original table.
You can just do
myTable(myTable.user == 'A',:)
Sample Code:
user = ['A';'B';'A';'C';'B'];
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T = table(user,Age,Height,Weight,BloodPressure)
T(T.user=='A',:)
Gives:
T =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
B 43 69 163 109 77
A 38 64 131 125 83
C 40 67 133 117 75
B 49 64 119 122 80
ans =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
A 38 64 131 125 83
Suppose I have a .csv file with the following values:
A 23 45
B 69 84
C 48 78
D 12 34
So it has two columns. Now what I need to do is add values starting from the 3rd column without deleting the values in the 1st and 2nd columns.
I tried this code:
fileID = fopen('exp.csv','A');
fprintf(fileID,' %12.4f\n',D);
fclose(fileID);
But the issue is that this is added all in one column like:
23
69
48
12
......
45
84
75
38
How can I do this?
Use the csvread / csvwrite functions to load in the existing file, append a column, and write the new data.
data = csvread('exp.csv');
toadd = (1:4)';
newdata = [data toadd];
csvwrite('out.csv', newdata);
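The read, append, write pattern is the same in any language. As a sketch in Python (the helper name append_column is made up, and the file is assumed to be comma-delimited), every existing row is kept, text label included, and gains one extra field:

```python
import csv
import io


def append_column(csv_text, new_values):
    """Return csv_text with one value from new_values appended to each row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) != len(new_values):
        raise ValueError("need exactly one new value per existing row")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row, value in zip(rows, new_values):
        writer.writerow(row + [value])  # existing fields first, new one last
    return out.getvalue()


# Hypothetical contents standing in for exp.csv.
existing = "A,23,45\nB,69,84\nC,48,78\nD,12,34\n"
print(append_column(existing, [1, 2, 3, 4]), end="")
```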
I am trying to implement a macro which will allow me to run several logistic regression models that have the same outcome but a different main explanatory variable (the covariates would be common for all models) for several datasets. I have written a scan and eval macro that scans two global variables but it's not quite working. The code is shown below:
%let numbers=5 7 8 9 10 12 13 14 16 18 19 24 26
32 33 35 37 39 41 44 45 48 50 52
55 56 58 66 67 68 ;
%let list=voting national local safe street violence say free;
%macro logistic;
%let j=1;
%let m=1;
%let first=%scan(&list,%eval(&j));
%let second=%scan(&numbers,%eval(&m));
%do %while (&first ne );
%do %while (&second ne );
proc logistic data=socialcapital&second. descending;
model depression= &first. agec married edu inc_2 inc_3 inc_4 inc_5/risklimits;
ods output ParameterEstimates=mv_model1&second._&first.;
run;
%let j=%eval(&j+1);
%let m=%eval(&m+1);
%let first=%scan(&list,%eval(&j));
%let second=%scan(&numbers,%eval(&m));
%end;
%end;
run;
%mend;
%logistic;
The global variable numbers refers to the "socialcapital" dataset that I am using. Each dataset represents a country and so each number in the "numbers" global variable refers to a dataset. The global variable list refers to the list of main explanatory variables that I want to include in the model, one main explanatory variable per model. What I am looking to get is 8 separate multivariable logistic regression results for each country.
However, it appears that the scan function is not working properly for me so I know that I have done something wrong, but I am not sure what. It seems that the macro assigns 1 variable from &list to 1 dataset from &numbers until it runs out of variables from &list and simply runs the model with just the covariates instead of running all 8 models using the dataset 5, then running all 8 models again using dataset 7, and so forth.
Basically, I have messed up something with the numbering and I am not quite sure how to proceed with this macro. I know that I can get rid of the &numbers global variable by using a "by statement" in proc logistic with a stacked dataset but I would really like to learn how to get this to work for future models where that might not be an option.
Maggie,
I believe the code below will do what you want. I commented out the LOGISTIC procedure and put in a PUT statement for testing, and it seems to resolve the way that I expect you think it should.
%let numbers=5 7 8 9 10 12 13 14 16 18 19 24 26
32 33 35 37 39 41 44 45 48 50 52
55 56 58 66 67 68 ;
%let list=voting national local safe street violence say free;
%macro logistic;
%let j=1;
%let first=%scan(&list,%eval(&j));
%do %while (&first ne );
%let m=1;
%let second=%scan(&numbers,%eval(&m));
%do %while (&second ne );
/*
proc logistic data=socialcapital&second. descending;
model depression= &first. agec married edu inc_2 inc_3 inc_4 inc_5/risklimits;
ods output ParameterEstimates=mv_model1&second._&first.;
run;
*/
%put J=&j - M=&m - FIRST=&first - SECOND=&second;
%let m=%eval(&m+1);
%let second=%scan(&numbers,%eval(&m));
%end;
%let j=%eval(&j+1);
%let first=%scan(&list,%eval(&j));
%end;
run;
%mend;
%logistic;
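The difference between the two versions is easier to see outside the macro language. In this Python sketch (shortened, made-up lists), advancing both counters inside the same loop pairs items up in lockstep and leaves a blank variable once the shorter list runs out, while resetting the inner counter on every outer pass gives every combination:

```python
from itertools import zip_longest

# Shortened stand-ins for &list and &numbers.
variables = ["voting", "national"]
numbers = [5, 7, 8]

# Both counters advance together: one variable per dataset, and a blank
# variable for the datasets left over after the variable list is used up.
paired = list(zip_longest(variables, numbers, fillvalue=""))

# Inner loop restarts from the first dataset for each variable.
crossed = []
for var in variables:
    for num in numbers:
        crossed.append((var, num))

print(paired)
print(crossed)
```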
Here's another way to do it: (if you end up with NUMBERS and LIST in data sets, we can alter the code to handle that too)
%let numbers=5 7 8 9 10 12 13 14 16 18 19 24 26
32 33 35 37 39 41 44 45 48 50 52
55 56 58 66 67 68 ;
%let list=voting national local safe street violence say free;
%macro logistic(First=, Second=);
%Put FIRST= &first;
%Put SECOND= &second;
/*proc logistic data=socialcapital&second. descending;*/
/*model depression= &first. agec married edu inc_2 inc_3 inc_4 inc_5/risklimits;*/
/*ods output ParameterEstimates=mv_model1&second._&first.;*/
/*run;*/
%mend logistic;
%Macro Test;
%do i = 1 %to %sysfunc(countw(&list));
%Let first=%scan(&list,&i);
%do j = 1 %to %sysfunc(countw(&numbers));
%Let second=%scan(&numbers,&j);
%logistic(First=&first,Second=&second)
%end;
%end;
%Mend test;
%test
Whoops, small correction. I should have used "numbers" not index "i" below.
You can do this with a macro, but you can also do it in either a data step (using call execute) or in Proc IML (with 9.22 or higher) with submit blocks nested in a loop. To get an idea, please see below.
Data _Null_;
Length IndpVar $ 8; /* otherwise the first value, "voting", fixes the length at 6 and longer names are truncated */
Do numbers = 5, 7,
8 to 10,
12 to 14,
16, 18, 19, 24, 26, 32,
33 to 41 by 2,
44, 45, 48, 50, 52, 55, 56, 58,
66 to 68;
Do IndpVar = "voting", "national", "local", "safe", "street", "violence", "say", "free";
call execute( '%Put '||strip(IndpVar)||strip(put(numbers,best.))||';');
/* Logistic code goes here */
End;
End;
Run;