I have two dates in my SAS dataset that I would like to compare, i.e., does date1 = date2? I tried the ifn and ifc functions but suspect they are not working correctly. I took this code directly from a SUGI paper, and I have used these functions successfully to compare character/numeric variables. The comparison also works in proc sql, but I want to learn to do this in a data step.
My code is below:
data not_in_HDD_3;
set not_in_HDD_2;
start=STMT_PERIOD_FROM;
if start=. then start=ADMIT_START_OF_CARE;
start_2=input(start, ANYDTDTE8.);
format start_2 MMDDYY10.;
drop start;
rename start_2=start;
dob=input(birth_date, ANYDTDTE8.);
format dob MMDDYY10.;
Birth_record = ifn (start eq dob, 0 , 1);
ifc_result = ifc(start=dob,"True","False","Missing");
ifn_result = ifn(start=dob,1,0,.);
ifn_result_fmt = put(ifn_result,ifn_label.);
fuzz_result = ifc(fuzz(start-dob),"True","False","Missing");
drop ifn_result;
run;
proc sql;
create table not_in_HDD_4 as
select *,
case
when (start=dob) then "True"
else "False"
end as sql_case_var length=8
from not_in_HDD_3;
quit;
Any insight is appreciated!
SAS dates are simply numbers relative to Jan 1st, 1960 (i.e., day 0). You can compare them using any standard type of comparison. Here are a few ways to do it.
data want;
set have;
dob = input(birth_date, anydtdte8.);
/* 1 and 0 indicator if they are equal/not equal */
result1 = (start = dob);
/* 1, 0, missing indicator: we are exploiting the fact
that adding a missing value produces a missing value.
Note that this is not the case with the sum() function */
if(start + dob = .) then call missing(result2);
else if(start = dob) then result2 = 1;
else result2 = 0;
/* select statement */
select;
when(start = dob) result3 = 1;
when(start + dob = .) result3 = .;
otherwise result3 = 0;
end;
run;
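As a cross-check outside SAS, the 1 / 0 / missing logic behind result2 and result3 can be sketched in Python. This is only an illustration: the function name dates_equal is hypothetical, None stands in for a SAS missing value, and the integers are arbitrary day counts. Note that result1 behaves differently in SAS, because the comparison (. = .) is true there, so two missing dates yield 1.

```python
from typing import Optional


def dates_equal(start: Optional[int], dob: Optional[int]) -> Optional[int]:
    """Return 1 if equal, 0 if different, None if either date is missing."""
    # Mirrors "if (start + dob = .)": any missing operand makes the result missing.
    if start is None or dob is None:
        return None
    return 1 if start == dob else 0


# Arbitrary day counts used purely for illustration.
print(dates_equal(18263, 18263))  # both present and equal
print(dates_equal(18263, 18264))  # both present, different
print(dates_equal(None, 18263))   # one side missing
```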
I hope you can assist.
I have a SAS data set which has two columns, ID and Date which looks like this:
In some instances, the date column skips a month. I need code that will create the missing date for each ID, e.g. 2022/11/20 for AY273 and 2022/12/15 for WG163.
You can merge the data with itself shifted one observation forward (to get a lead value) and loop across that range.
Example:
data have;
input id $ date yymmdd10.;
format date yymmdd10.;
datalines;
AAAAA 2021-11-20
AY273 2022-10-20
AY273 2022-12-20
AY273 2023-01-20
WG163 2022-10-15
WG163 2022-11-15
WG163 2023-01-15
ZZZZZ 2022-01-15
;
data want(keep=id date fillflag);
merge have have(rename=(date=leaddate id=leadid) firstobs=2);
if id eq leadid then
do while (intck('month',date,leaddate) > 0);
output;
date = intnx('month',date,1,'sameday');
fillflag = 1;
end;
else
output;
run;
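The look-ahead idea in the merge step above can also be sketched in Python, which may help when checking the expected output by hand. This is a rough equivalent under stated assumptions, not a translation of the SAS engine: add_month_same_day, months_between and fill_months are hypothetical helper names, rows are assumed to be sorted by id and date, and the month step clamps the day to the length of the target month.

```python
import calendar
from datetime import date


def add_month_same_day(d: date) -> date:
    """Advance one month, clamping the day to the target month's length."""
    year = d.year + (d.month // 12)
    month = d.month % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def months_between(a: date, b: date) -> int:
    """Count month boundaries crossed going from a to b."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def fill_months(rows):
    """rows: (id, date) tuples sorted by id, date. Returns (id, date, filled)."""
    out = []
    for i, (rid, d) in enumerate(rows):
        out.append((rid, d, False))
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        # Only fill when the next row belongs to the same id.
        if nxt is None or nxt[0] != rid:
            continue
        cur = add_month_same_day(d)
        while months_between(cur, nxt[1]) > 0:
            out.append((rid, cur, True))
            cur = add_month_same_day(cur)
    return out


sample = [
    ("AY273", date(2022, 10, 20)),
    ("AY273", date(2022, 12, 20)),
    ("AY273", date(2023, 1, 20)),
    ("WG163", date(2022, 10, 15)),
    ("WG163", date(2022, 11, 15)),
    ("WG163", date(2023, 1, 15)),
]
for row in fill_months(sample):
    print(row)
```

Rows flagged True are the inserted months; all original rows pass through unchanged.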
Try this
data WANT (drop = this_date last_date);
set HAVE(rename=(date = this_date));
by id;
last_date = lag(this_date);
if first.id then do;
date = this_date;
output;
end;
else do date = this_date to last_date + 16 by -30;
output;
end;
format date yymmdd10.;
proc sort;
by id date;
run;
If it does not work, I will correct it.
I am facing an issue while importing an Excel file into the SAS environment. The Excel file has a few columns named as follows:
Geography
AR_NO
31-Jan-18
28-Feb-18
31-Mar-18
30-Apr-18
31-May-18
30-Jun-18
After running the code below:
%macro FX_Lkup(sheet);
FILENAME FXFILE "/idn/home/Module2/excel.xlsx";
PROC IMPORT DATAFILE=FXFILE
DBMS=XLSX
OUT=&sheet.
REPLACE
;
SHEET="&sheet.";
RUN;
%mend FX_Lkup;
%FX_Lkup(LENDING_TEMPLATE);
%FX_Lkup(2018FXRates);
SAS prints the columns as
Geography
AR_NO
43131
43159
43190
43220
and so on.
Does anyone have a solution for this? Any lead would be really appreciated : )
Thanks !
It is imported correctly; SAS stores dates as numbers. To display a date in your final table, you need to assign a date format, for instance format = AFRDFDE7.
If you have mixed character and numeric values in the same column then SAS will be forced to create the variable as character. When it does that it stores the number that Excel uses for the date as a string of digits. To convert it to a date in SAS first convert the string of digits to a number and then adjust the number to account for the difference in how SAS and Excel count days.
data want ;
set LENDING_TEMPLATE ;
date = input(geography,??32.) + '30DEC1899'd;
format date date9.;
run;
Dates as headers
If your Excel file is using dates as column headers then SAS will also convert them to digit strings, since variable names are always character strings, not numbers. One quick way to fix it is to use PROC TRANSPOSE. This will be easy when each row is uniquely identified by the other variables and when all of the "date" variables are numeric.
proc transpose data=LENDING_TEMPLATE out=tall ;
by geography ar_no ;
run;
data tall ;
set tall ;
date = input(_name_ , 32.) + '30DEC1899'd ;
format date date9. ;
drop _name_;
run;
You could stop here as now you have a useful dataset where the date values are in a variable instead of hiding in the metadata (variable name).
To get back to your original wide layout just add another PROC TRANSPOSE and tell it to use DATE as the ID variable.
proc transpose data=tall out=wide ;
by geography ar_no ;
id date;
var col1;
run;
IMPORT is using Excel date valued column headers as Excel epoch date numbers.
Use Proc DATASETS to change the column label, and possibly rename the columns to something generic such as DATE1-DATE6. Or, continue on, and further unpivot the data into a categorical form with columns GEO, AR_NO, DATE, VALUE
You might be asking yourself "Where do those numbers, such as 43131, come from?", or "What is an Excel epoch date number?"
They are unformatted Excel date values. The human-readable date a number represents is determined by the system's epoch, i.e. the date represented by the number 0.
Different systems of timekeeping have different epochs (starting points) and time units. Some examples:
30DEC1899 Excel datetime number 0, 1 = 1 day
01JAN1960 SAS date number 0, 1 = 1 day
01JAN1960 SAS datetime number 0, 1 = 1 second
01JAN1970 Unix OS datetime number 0, 1 = 1 second
To convert an Excel date number to a SAS date number you need to subtract 21916, which is the number of days from 30DEC1899 to 01JAN1960.
This understanding of date epochs will be used when setting the label of a SAS column and renaming the column.
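The epoch arithmetic can be verified independently of SAS with a few lines of Python. This is a minimal sketch, assuming Excel's 1900 date system, in which the effective day 0 for modern dates is 30 Dec 1899; the constant and function names are hypothetical and chosen only for this illustration.

```python
from datetime import date, timedelta

EXCEL_EPOCH = date(1899, 12, 30)  # assumed effective day 0 of Excel's 1900 date system
SAS_EPOCH = date(1960, 1, 1)      # SAS date value 0

# Number of days to subtract from an Excel serial to get a SAS date value.
OFFSET_DAYS = (SAS_EPOCH - EXCEL_EPOCH).days


def excel_serial_to_sas_date(serial: int) -> int:
    """Convert an Excel day serial to a SAS date value (days since 01JAN1960)."""
    return serial - OFFSET_DAYS


def sas_date_to_calendar(sas_date: int) -> date:
    """Convert a SAS date value to a calendar date."""
    return SAS_EPOCH + timedelta(days=sas_date)


print(OFFSET_DAYS)
for serial in (43131, 43159, 43190, 43220):
    print(serial, sas_date_to_calendar(excel_serial_to_sas_date(serial)))
```

The four serials are the column names from the question, and they map back to the month-end headers 31-Jan-18 through 30-Apr-18.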
For others fiddling with code, the following will create an Excel worksheet having Date valued column headers. I speculate such a situation can otherwise arise when importing a worksheet containing an Excel pivot table.
First create some sample SAS data
data demo_tall;
do Geography = 'Mountains', 'Plains';
do AR_NO = 1 to 3;
_n_ = 0;
length label $200;
do label = '31-Jan-18', '28-Feb-18', '31-Mar-18',
'30-Apr-18', '31-May-18', '30-Jun-18'
;
_n_ + 1;
name = cats('Date',_n_);
value + 1;
output;
end;
end;
end;
run;
proc transpose data=demo_tall out=demo_wide(drop=_name_);
by Geography AR_NO;
var value;
id name;
idlabel label;
run;
Sample SAS data set (pivoted with transpose)
Then create Excel sheet with Excel date valued and formatted column headers
ods noresults;
ods excel file='%TEMP%\across.xlsx' options(sheet_name='Sample');
data _null_;
declare odsout xl();
if 0 then set demo_wide;
length _name_ $32;
xl.table_start();
* header;
xl.row_start();
do _n_ = 1 to 100; * 100 column guard;
call vnext(_name_);
if _name_ = '_name_' then leave;
_label_ = vlabelx(_name_);
_date_ = input(_label_, ?? date9.);
* make some header cells an Excel date formatted value;
if missing(_date_) then
xl.format_cell(data:_label_);
else
xl.format_cell(
data:_date_,
style_attr:"width=9em tagattr='type:DateTime format:dd-mmm-yy'"
);
end;
xl.row_end();
* data rows;
do _n_ = 1 by 1 while (not lastrow);
set demo_wide end=lastrow;
xl.row_start();
call missing(_name_);
do _index_ = 1 to 100; * 100 column guard;
call vnext(_name_);
if _name_ = '_name_' then leave;
xl.format_cell(data:vvaluex(_name_));
end;
xl.row_end();
end;
xl.table_end();
stop;
run;
ods excel close;
ods results;
Excel file created
IMPORT Excel worksheet
Log will show the 'funkiness' of date valued column headers
options msglevel=I;
proc import datafile='%temp%\across.xlsx' dbms=xlsx replace out=want;
sheet = "Sample";
run;
proc contents noprint data=want out=want_meta(keep=name label varnum);
run;
----- LOG -----
1380 proc import datafile='%temp%\across.xlsx' dbms=xlsx replace out=want;
1381 sheet = "Sample";
1382 run;
NOTE: Variable Name Change. 43131 -> _43131
NOTE: Variable Name Change. 43159 -> _43159
NOTE: Variable Name Change. 43190 -> _43190
NOTE: Variable Name Change. 43220 -> _43220
NOTE: Variable Name Change. 43251 -> _43251
NOTE: Variable Name Change. 43281 -> _43281
NOTE: VARCHAR data type is not supported by the V9 engine. Variable Geography has been converted
to CHAR data type.
NOTE: The import data set has 6 observations and 8 variables.
NOTE: WORK.WANT data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Modify the header (metadata) of the imported data set
Date valued column names will be renamed DATE1-DATE6 and the label will be changed to be the corresponding date in SAS format DATE11. (dd-mon-yyyy)
%let renames=;
%let labels=;
data _null_;
length newname $32;
length renames labels $32767;
retain renames labels;
set want_meta end=lastvar;
date = coalesce(input(label, ?? 5.),1e9) + '30dec1899'd;
if '01jan1980'd < date < today() then do;
index + 1;
newname = cats('DATE',index);
label = quote(trim(put(date,date11.)));
labels = catx(' ', labels, catx('=',name,label));
renames = catx(' ', renames, catx('=',name,newname));
end;
if lastvar;
if not missing(labels) then call symput('labels', trim('LABEL ' || labels));
if not missing(renames) then call symput('renames', trim('RENAME ' || renames));
run;
proc datasets nolist lib=work;
modify want;
&labels;
&renames;
run;
quit;
%symdel labels renames;
%let syslast = want;
The result, when printed.
Optional
Unpivot to a categorical form (tall layout)
proc transpose data=want out=stage1(rename=(col1=value _label_=date_string));
by geography ar_no;
var date:;
label _name_ = ' ';
label date_string = ' ';
run;
data want_tall;
set stage1;
date = input (date_string, date11.);
format date date11.;
keep geography ar_no _name_ date value;
run;
I'm trying to take 2 dates in an Excel sheet and use the DateDiff function to get the number of days between them. I am essentially adding the numbers of days together and dividing by the number of rows to get an average number of days. So far I have it to where the running total of days for every row is displayed in column "E" and the running number of rows is placed in column "F". I know I am close because at one point it worked, but I changed something and now it does not. Here is my code and the Excel sheet.
Sub GetDays()
Range("C1").Select
Do Until ActiveCell.Value = ""
date1 = DateValue(ActiveCell.Offset(1, 0).Value)
date2 = DateValue(ActiveCell.Offset(1, 0).EntireRow.Cells(1, "D").Value)
DayCount = DateDiff("d", date1, date2) + DayCount
ActiveCell.Offset(1, 0).EntireRow.Cells(1, "E").Value = DayCount
StudentCount = StudentCount + 1
ActiveCell.Offset(1, 0).EntireRow.Cells(1, "F").Value = StudentCount
ActiveCell.Offset(1, 0).Select
Loop
End Sub
Here is a snippet of the sheet
The issue I discovered when testing your code is that your loop tests the ActiveCell value to determine when to exit, but the code inside the loop operates on the cell below ActiveCell, as a result of the Offset(1,0) call. So when your loop is on the last line of data, ActiveCell.Value = "3/25/2015 10:52", but the next line of code tries to populate date1 with the DateValue of an empty cell, since it is offset down one row. This throws a Type Mismatch error.
I've adjusted your code below, this works for me:
Sub GetDays()
Range("C1").Select
Do Until ActiveCell.Value = ""
date1 = DateValue(ActiveCell.Value)
date2 = DateValue(ActiveCell.Offset(0, 1).Value)
DayCount = DateDiff("d", date1, date2) + DayCount
ActiveCell.Offset(0, 2).Value = DayCount
StudentCount = StudentCount + 1
ActiveCell.Offset(0, 3).Value = StudentCount
ActiveCell.Offset(1, 0).Select
Loop
End Sub
I adjusted the offset command so that we are looking at the same row at all times each loop. I replaced the "EntireRow.Cells(1, "D")" sections by just using the column integer in Offset().
You may need to change the second line to: Range ("C2").Select for my code to work, depending on if your data starts on row 1 or row 2.
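For checking the macro's totals by hand, the same running calculation can be sketched in Python. This is only an illustration under assumptions: the function name average_days is hypothetical, the timestamps are assumed to look like "3/25/2015 10:52", and the sample pairs are invented. As with DateValue followed by DateDiff("d", ...), the time of day is dropped before the difference is taken.

```python
from datetime import datetime


def average_days(pairs, fmt="%m/%d/%Y %H:%M"):
    """Average whole-day difference between (start, end) timestamp strings."""
    total = 0
    for start_text, end_text in pairs:
        # Keep only the date part so that partial days do not affect the count.
        start = datetime.strptime(start_text, fmt).date()
        end = datetime.strptime(end_text, fmt).date()
        total += (end - start).days
    return total / len(pairs) if pairs else None


# Invented sample rows for illustration only.
rows = [
    ("3/20/2015 09:00", "3/25/2015 10:52"),
    ("3/1/2015 08:00", "3/4/2015 07:00"),
]
print(average_days(rows))
```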
I'm having some frustration with dates in SAS.
I am using proc forecast and am trying to make my dates evenly spaced. I did some pre-processing with proc sql to get my counts by month, but my dates are incorrect.
Though my dataset looks good (because I used format MONYY.), the actual value of that variable is wrong.
date year month count
Jan10 2010 1 100
Feb10 2010 2 494
...
..
.
The Date value is actually the full SAS representation of the date (18267), meaning that it includes the day count.
Do I need to convert the variable to a string and back to a date, or is there a quick proc I can run?
My goal is to use the date variable with proc forecast so I only want Month and year.
Thanks for any help!
A SAS date variable is the number of days elapsed since 1jan1960, so you can't define one that excludes the day.
What you can do is hide the day with a format like monyy., but the underlying number will always contain that information.
Maybe you can use the interval=month option in proc forecast?
Please add some detail about the problem you're encountering with the forecast procedure.
EDIT: check this example:
data past;
keep date sales;
format date monyy5.;
lu = 0;
n = 25;
do i = -10 to n;
u = .7 * lu + .2 * rannor(1234);
lu = u;
sales = 10 + .10 * i + u;
date = intnx( 'month', '1jul1991'd, i - n );
if i > 0 then output;
end;
run;
proc forecast data=past interval=month lead=10 out=pred;
var sales;
id date;
run;
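If the goal is simply one consistent value per month, one option is to align every date to the first day of its month, which is what the intnx('month', ...) call in the example above produces. The arithmetic can be sketched in Python as a sanity check; month_start is a hypothetical helper name, and 18267 is the value quoted in the question.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def month_start(sas_date: int) -> int:
    """Return the SAS date value for the first day of the same month."""
    d = SAS_EPOCH + timedelta(days=sas_date)
    return (d.replace(day=1) - SAS_EPOCH).days


value = 18267
print(SAS_EPOCH + timedelta(days=value))               # calendar date behind the raw value
print(month_start(value))                              # aligned SAS date value
print(SAS_EPOCH + timedelta(days=month_start(value)))  # first day of that month
```

Both the original and the aligned value display identically under a MONYY. format; only the underlying day count differs.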
I have a list of intra-day prices at 9am, 10am, 11am etc. for each day (which are in number formats such as 15011 15012 etc.)
I want to only keep the observations from the last available 5 days and the next 5 available days from the date 't' and delete everything else.
Is there a way to do this?
I tried using
if date < &t - 5 or date > &t + 5 then delete;
However, since there are weekends/holidays I don't get all the observations I want.
Thanks a lot in advance!
Not much info to go on, but here is a possible solution:
/* Invent some data */
data have;
do date=15001 to 15020;
do time='09:00't,'10:00't,'11:00't;
price = ranuni(0) * 10;
output;
end;
end;
run;
/* Your macro variable identifying the target "date" */
%let t=15011;
/* Subset for current and following dates */
proc sort data=have out=temp(where=(date >= &t));
by date;
run;
/* Process to keep only current and following five days */
data current_and_next5;
set temp;
by date;
if first.date then keep_days + 1; /* Set counter for each day */
if keep_days <= 6; /* Days to keep (target and next five) */
drop keep_days; /* Drop this utility variable */
run;
/* Subset for previous and sort by date descending */
proc sort data=have out=temp(where=(date < &t));
by descending date;
run;
/* Process to keep only five previous days */
data prev5;
set temp;
by descending date;
if first.date then keep_days + 1; /* Set counter for each day */
if keep_days <= 5; /* Number of days to keep */
drop keep_days; /* Drop this utility variable */
run;
/* Concatenate together and re-sort by date */
data want;
set current_and_next5
prev5;
run;
proc sort data=want;
by date;
run;
Of course, this solution assumes that your starting data contains observations for all valid "trading days" and returns everything without doing date arithmetic. A much better solution would require that you create a "trading calendar" dataset with all valid dates. You can easily deal with weekends, but holidays and other "non-trading days" are very site specific; hence using a calendar is almost always preferred.
UPDATE: Joe's comment made me re-read the question more carefully. This should return a total of eleven (11) days of data; five days prior, five days following, and the target date. But still, a better solution would use a calendar reference table.
Try this
/* Get distinct dates before and after &T */
proc freq data=mydata noprint ;
table Date /out=before (where=(Date < &T)) ;
table Date /out=after (where=(Date > &T)) ;
run ;
/* Take 5 days before and after */
proc sql outobs=5 ;
create table before2 as
select Date
from before
order by Date descending ;
create table after2 as
select Date
from after
order by Date ;
quit ;
/* Subset to 5 available days before & after */
proc sql ;
create table final as
select *
from mydata
where Date >= (select min(date) from before2)
and Date <= (select max(date) from after2)
order by Date ;
quit ;
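Both answers come down to the same idea: rank the distinct dates that actually occur in the data and take five on each side of the target, rather than subtracting five calendar days. A minimal Python sketch of that selection follows; window_dates is a hypothetical name, the dates are plain integers like those in the question, and the gaps in the sample are invented to stand in for weekends.

```python
def window_dates(available_dates, t, n=5):
    """Return the set of dates to keep: n available days before and after t, plus t."""
    dates = sorted(set(available_dates))
    before = [d for d in dates if d < t]
    after = [d for d in dates if d > t]
    before = before[-n:] if n > 0 else []
    after = after[:n] if n > 0 else []
    keep = set(before) | set(after)
    if t in dates:
        keep.add(t)  # the target date itself, when it has observations
    return keep


# Invented availability: 15006, 15007, 15013 and 15014 have no observations.
available = [d for d in range(15001, 15021) if d not in (15006, 15007, 15013, 15014)]
keep = window_dates(available, 15011)
print(sorted(keep))

# Filtering observations is then a membership test on the date.
observations = [(d, hour) for d in available for hour in (9, 10, 11)]
subset = [obs for obs in observations if obs[0] in keep]
print(len(subset))
```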