I'm working on a table in SAS with a column that contains dates (FENTREGA in the picture). I want to fill the empty cells of this column with "Empty". Can you help me with the code?
This is the structure of my table; the column I need to use is FENTREGA.
Since your column is numeric, you cannot put text in it. However, you can make missing values (normally shown as a period) display as the text Empty by using a custom format.
Alternatively, you can make the whole column text, but then you cannot do date operations or calculations on the column without converting it back.
proc format;
value empty_dates
. = 'Empty'
Other = [mmddyyd10.];
run;
proc sql;
....
t1.FENTREGA format=empty_dates.,
....
EDIT: Fully tested solution, works as expected
DATA have;
informat FENTREGA mmddyy10.;
format FENTREGA date9.;
input FENTREGA;
datalines;
12/10/2003
10/15/2006
07/20/2010
05/11/2006
10/01/2006
07/03/2012
05/08/2015
.
.
.
.
;
RUN;
proc format;
value empty_dates
. = 'Empty'
Other = [mmddyyd10.];
run;
proc sql;
select
FENTREGA format=empty_dates.
from have;
quit;
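For the other option mentioned above (making the whole column text), a minimal sketch could look like the following. It assumes the HAVE dataset from the tested example; the names WANT and FENTREGA_TXT are made up for illustration.

```sas
/* Sketch: build a character copy of the date column.             */
/* FENTREGA_TXT and WANT are hypothetical names, not from the      */
/* original post.                                                  */
data want;
    set have;
    length FENTREGA_TXT $10;
    if missing(FENTREGA) then FENTREGA_TXT = 'Empty';
    else FENTREGA_TXT = put(FENTREGA, mmddyyd10.);
run;
```

The numeric FENTREGA stays in the dataset, so date calculations remain possible; drop it only if the text version is all you need.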
Related
I have a dataset with 20 observations and 6 variables: ID, Gender, Age, Height, Weight, Year. All are numeric except Gender. I would like to extract 10 observations starting from the fifth observation using SAS macros.
I have the code below to import and extract the selected rows from the table.
I want to extract the selected rows using macros as part of an exercise. Please advise how to use macros to extract specific observations.
Thank you for your time.
%macro one (a, b, c);
proc import out=&a
datafile= "C:\Users\komal\Desktop\&b"
dbms=&c replace;
getnames=yes;
run;
%mend one;
%one (outcsv, Sample.csv, csv);
data test;
set outcsv;
if _N_ in (5,6,7,8,9,10,11,12,13,14) then output;
run;
You could do something like this:
%macro one (a, b, c,strtpt,endpt);
proc import out=&a
datafile= "C:\Users\komal\Desktop\&b"
dbms=&c replace;
getnames=yes;
run;
data test;
set &a;
if _n_ >= &strtpt and _n_ <= &endpt;
run;
%mend one;
%one (outcsv, Sample.csv, csv,5,14);
There is no need to use PROC IMPORT to read a CSV file, especially if you already know the names and types of the variables. Something like this should work:
data want ;
infile "C:\Users\komal\Desktop\Sample.csv" dsd firstobs=5 obs=14 truncover ;
input ID Gender $ Age Height Weight Year ;
run;
You might need to use firstobs=6 and obs=15 instead if the file has a header row.
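Since the exercise asks for macros, the same INFILE approach can be wrapped in one. This is only a sketch: the macro name read_rows and its parameters file, first, last and header are invented here, and header=1 assumes the CSV has exactly one header line.

```sas
/* Sketch: read data rows &first..&last from a CSV.                */
/* read_rows, file, first, last and header are hypothetical names. */
/* header= is the number of header lines to skip (assumed 1).      */
%macro read_rows(file, first, last, header=1);
    data want;
        infile "&file" dsd truncover
               firstobs=%eval(&first + &header)
               obs=%eval(&last + &header);
        input ID Gender $ Age Height Weight Year;
    run;
%mend read_rows;

%read_rows(C:\Users\komal\Desktop\Sample.csv, 5, 14);
```

With the defaults this reads file lines 6 through 15, which are data rows 5 through 14 when there is one header line. Pass header=0 for a file without a header.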
I am importing a file whose column headers include a $ sign (e.g. "Sales $") with PROC IMPORT. The import seems to rename such a column to something like "VAR11".
proc import out = raw
datafile="example.xlsx"
dbms=xlsx replace;
range = "Sheet1$A1:B50";
getnames = yes;
run;
Is there a way to still read in the name of the column, but just drop the $ sign so it is a meaningful header?
If the only problem names are in that format then you should be able to read it and then rename the variable using the label. So get the names and labels into a dataset. You can query dictionary tables, use proc contents, or you could use PROC TRANSPOSE.
proc transpose data=raw (obs=0) out=names ;
var _all_ ;
run;
Now build a list of oldname=newname pairs in a macro variable.
proc sql noprint ;
select catx('=',_name_,translate(trim(compress(_label_,'$')),'_',' '))
into :renames separated by ' '
from names
where upcase(_name_) ne upcase(substrn(_label_,1,32))
;
quit;
You can then use that in a RENAME statement or RENAME= dataset option.
data renamed;
set raw(rename=(&renames)) ;
run;
I have a dataset of CASE_ID (x, y and z), a set of multiple dates (including duplicate dates) for each CASE_ID, and a variable VAR. I would like to create a dummy variable DUMMYVAR by group within a group, whereby if VAR="C" for CASE_ID x on some specific date, then DUMMYVAR=1 for all observations corresponding to CASE_ID x on that date.
I believe that a classic double DOW loop (2XDOW) would be the key here, but this is my third week using SAS and I am having difficulty getting this to work with two BY groups.
I have referenced and attempted to write a variation of Haikuo's code here:
PROC SORT have;
by CASE_ID DATE;
RUN;
data want;
do until (last.DATE);
set HAVE;
by date notsorted;
if var='c' then DUMMYVAR=1;
do until (last.DATE);
set HAVE;
by DATE notsorted;
if DATE=1 then ????????
end;
run;
Change your BY statements to match the grouping you are doing. And in the second loop add a simple OUTPUT; statement. Then your new dataset will have all the rows in your original dataset and the new variable DUMMYVAR.
data want;
do until (last.DATE);
set HAVE;
by case_id date;
if var='c' then DUMMYVAR=1;
end;
do until (last.DATE);
set HAVE;
by case_id date;
output;
end;
run;
This will create the variable DUMMYVAR with values of either 1 or missing. If you want the values to be 1 or 0, you could either set DUMMYVAR to 0 before the first DO loop or add an if first.date then dummyvar=0; statement before the existing IF statement. Character comparisons in SAS are case-sensitive, so test for var='C' if the data holds an uppercase C.
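As a sketch of the 1-or-0 variant, assuming HAVE is already sorted by CASE_ID and DATE and that the flag value is stored as a lowercase 'c' as in the code above:

```sas
/* Sketch: double DOW loop producing DUMMYVAR = 1 or 0.            */
/* Assumes HAVE is sorted by case_id date.                         */
data want;
    /* First pass: scan one case_id/date group and set the flag.   */
    do until (last.date);
        set have;
        by case_id date;
        if first.date then dummyvar = 0;
        if var = 'c' then dummyvar = 1;
    end;
    /* Second pass: re-read the same group and write every row     */
    /* with the flag value found in the first pass.                */
    do until (last.date);
        set have;
        by case_id date;
        output;
    end;
run;
```

Each iteration of the data step handles exactly one CASE_ID/DATE group, which is why the flag computed in the first loop is still available while the second loop writes the rows.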
This is my example. In SAS I have a table where column1 contains n variables, each repeated several times. Column2 contains either OK or KO.
I want to generate a summary table where every variable appears only once (so I'll have only n records) and where column2 is OK if ALL the values of column2 for variableK are OK, and KO if even one of the values of column2 for variableK is KO.
How can I do that?
Thanks
You can do this with a Data Step and the BY statement.
First sort your data by column1:
proc sort data=have out=want;
by column1;
run;
Then filter the table as you require
data want;
set want;
by column1;
format variableK $2.;
retain variableK;
if first.column1 then
variableK = "OK";
if column2 = "KO" then
variableK = "KO";
if last.column1 then
output;
run;
RETAIN tells SAS to keep the value of variableK between records.
first.column1 is a flag that lets us know when we are at the start of a new value in column1. Here we set variableK to OK.
Then we check to see if column2 is ='KO' and set variableK accordingly.
Finally last.column1 tells us when we are at the end of a value in column1. We use the OUTPUT statement to explicitly tell SAS to only output that record.
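A different technique, not the data step approach described above, is to aggregate with PROC SQL. The sketch below assumes the same HAVE table; the output name SUMMARY is made up. In SAS, the comparison column2 = 'KO' evaluates to 1 or 0, so summing it counts the KO rows in each group.

```sas
/* Sketch: one row per column1 value, KO if any row is KO.         */
/* SUMMARY is a hypothetical output name.                          */
proc sql;
    create table summary as
    select column1,
           case when sum(column2 = 'KO') > 0 then 'KO'
                else 'OK'
           end as variableK
    from have
    group by column1;
quit;
```

This version needs no prior sort, because GROUP BY does the grouping itself.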
I'm working on a very big data set (more than 100 variables and 11 million observations). In this data set, I have a variable named DTDSI (simulation date) in DATE9. format (for example: 01APR2015, 02MAR2015...). I have a macro program to analyse this data set by comparing the observations in 2 different months:
%macro analysis (data_input , m , m_1);
.....
%mend;
The 2 macro variables m and m_1 are the months that I want to compare. Their format is MONYY7. (APR2015, MAR2015...). Keep in mind that I cannot modify my data_input (it's my company's data). At the beginning of my macro program, I want to create a new data set with only the observations of the &m and &m_1 months. I can easily do that by creating a new date variable from DTDSI (real_month, for example) in the format MONYY7. Then I just select the observations where real_month equals &m or real_month equals &m_1:
Data new;
Set &data_input;
mois_real = input(DTDSI,MONYY7);
RUN;
PROC SQL;
CREATE TABLE NEW AS;
SELECT *
WHERE mois_real in ("&m" , "&m_1")
FROM NEW;
....
The problem is that my first data step duplicates data_input, which is bad because it took 30 minutes. So can you tell me how I can make my selection (DTDSI = m or DTDSI = m_1) right in my first step?
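You can use functions in your WHERE/IF condition, so apply the expression from step 1 directly in the filter of a single step. Since DTDSI is a numeric date, use PUT (not INPUT) with the MONYY7. format to get text such as APR2015.
Data new;
set &data_input;
WHERE put(DTDSI,MONYY7.) in ("&m" , "&m_1");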
run;
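To show where this sits in the macro from the question, here is a minimal sketch. The rest of the macro body is still elided, and the dataset name mylib.big_table in the call is a placeholder, not a name from the question.

```sas
/* Sketch: filter while reading, so only the two months are copied. */
/* mylib.big_table is a placeholder for the real input dataset.     */
%macro analysis(data_input, m, m_1);
    data new;
        set &data_input(where=(put(DTDSI, monyy7.) in ("&m", "&m_1")));
    run;
    /* ..... rest of the analysis ..... */
%mend analysis;

%analysis(mylib.big_table, APR2015, MAR2015);
```

The WHERE= dataset option filters rows as they are read, so NEW contains only the two requested months and the source table is left untouched.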