SAS 9.4: How to calculate the number of days until the next record (date)

Using SAS, I want to calculate, for each record, the number of days between its date and the date of the next record.
The required output is:
Date        Num Days
10/09/2020  1
11/09/2020  1
12/09/2020  1
14/09/2020  2
15/09/2020  1
16/09/2020  1
17/09/2020  1
18/09/2020  1
20/09/2020  2
I have tried using LAG and RETAIN but just can't get it to work.
Any advice or suggestions would be really appreciated.

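If you sort the data by descending DATE, it is easier, because then you just need to look backwards to find the next date, so you can use the LAG() or DIF() function. Note that in descending order DIF(date) is the current date minus the next (later) date, which is negative, so negate it to get the number of days.
proc sort data=have out=have_desc;
by descending date;
run;
data want;
set have_desc;
by descending date;
num_days = -dif(date); /* DIF(date) = date - LAG(date), and LAG(date) is the next date */
run;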
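To simulate a "lead" function (with the data sorted by ascending DATE), you can read a second copy of the data starting at the second observation.
data want;
set have;
/* Same data, one row ahead. The extra zero-variable observation from HAVE(OBS=1 DROP=_ALL_) */
/* keeps the step from stopping early, so the last row is still output with NEXT_DATE missing */
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=1 drop=_all_);
num_days = next_date - date;
run;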
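A self-contained run of the look-ahead approach on the question's sample dates may make it easier to check. This is a sketch: it assumes the dates are day/month/year (the 14/09/2020 row suggests so, hence the DDMMYY10. informat) and that HAVE is already sorted by ascending DATE. Because it counts the days until the next record, the last date gets a missing NUM_DAYS.

```sas
/* Sample data from the question, read as day/month/year */
data have;
  input date :ddmmyy10.;
  format date ddmmyy10.;
  datalines;
10/09/2020
11/09/2020
12/09/2020
14/09/2020
15/09/2020
16/09/2020
17/09/2020
18/09/2020
20/09/2020
;
run;

/* Look-ahead: the second SET reads HAVE starting one row later.
   The one-row, zero-variable read at the end keeps the step from
   stopping before the last row of HAVE is output. */
data want;
  set have;
  set have(firstobs=2 keep=date rename=(date=next_date))
      have(obs=1 drop=_all_);
  num_days = next_date - date; /* missing on the last row */
  format next_date ddmmyy10.;
run;

proc print data=want;
run;
```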

Related

How do I write a filter in SAS that subtracts dates?

Hello Stack Overflow community, I hope you can help with this SAS question.
I need to create a filter for a table that returns only the records that are active from last year onward.
I would like to obtain something like :
data want;
set have;
where expire_date >= current(date) - 1year:
run;
The expire_date column is displayed like 03MAY2022 (DATE9. format). I tried transforming the date into a number and then subtracting 365, but I guess there is a better solution.
Can someone enlighten me?
Thanks in advance.
I think you are searching for the INTNX() function:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/p10v3sa3i4kfxfn1sovhi5xzxh8n.htm#n1lchasgjah7ran0z2wlmsbfwdx2
For example:
data a;
format num_date num_date_minus_year DATE9.;
char_date="03MAY2022";
num_date = inputn (char_date, "DATE9.");
num_date_minus_year = intnx ('YEAR', num_date, -1, "SAME");
put num_date= num_date_minus_year=;
run;
Output:
num_date=03MAY2022 num_date_minus_year=03MAY2021
You can get the current date using the DATE() function.
Do you want one YEAR or 365 days?
To use a date interval use the INTNX() function.
where expire_date >= intnx('year',date(),-1,'same') ;
To use a fixed number of days just subtract the number.
where expire_date >= date() - 365 ;
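To see where the two options above differ, here is a small sketch that fixes a hypothetical "today" of 01MAR2024, just after a leap day, instead of calling DATE(); WINDOW_CHECK and the variable names are illustrative.

```sas
/* Calendar-year window vs. fixed 365-day window.
   TODAY is a hypothetical fixed value, not DATE(). */
data window_check;
  today         = '01MAR2024'd;
  one_year_back = intnx('year', today, -1, 'same'); /* same day last year */
  days_back     = today - 365;                      /* exactly 365 days   */
  format today one_year_back days_back date9.;
  put one_year_back= days_back=;
run;
```

The PUT statement should write one_year_back=01MAR2023 days_back=02MAR2023 to the log, because the year ending 01MAR2024 contains 29FEB2024 and is therefore 366 days long.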

Re-assign value based on dates

I need to compare the dates and reassign the values to two new variables by ID.
If there are two dates for the same ID, then:
If the date is the earlier one, its status should be reassigned to "EarlierStatus".
If the date is the later one, its status should be reassigned to "CurrentStatus".
If there is only one date for the ID, its status should be reassigned to "CurrentStatus", and "EarlierStatus" should be missing.
If there are more than two dates for the ID, the status on the middle date(s) should be ignored, and only the earliest and most recent values used.
Any thoughts? Much appreciated!
This is the code that I have tried:
data origin;
input id date mmddyy8. status;
datalines;
1 1/1/2010 0
1 1/1/2011 1
2 2/2/2002 1
3 3/3/2003 1
3 2/5/2010 0
4 1/1/2000 0
4 1/1/2003 0
4 1/1/2005 1
;
run;
proc print; format date yymmdd8.; run;
proc sort data=origin out=a1;
by id date;
run;
data need; set a1;
if first.date then EarlierStatus=status;
else if last.date then CurrentStatus=status;
by id;
run;
proc print; format date yymmdd8.; run;
So, a couple of things. First, note a few corrections to your code, in particular the : (colon) modifier on the date informat, which is critical when you mix an informat with list input: it makes SAS read up to the next blank instead of a fixed number of columns.
Second, you need to RETAIN EarlierStatus; otherwise it is reset to missing at the start of each DATA step iteration.
Third, you need to use FIRST.id, not FIRST.date (and similarly LAST.id). FIRST.id means "this is the first observation for a new value of ID". FIRST.date is what you would say in English ("the first date for that ID"), but SAS only creates FIRST./LAST. variables for the variables in the BY statement, and DATE is not one of them here.
Finally, you need a couple more tests to set your variables the way you described.
data origin;
input id date :mmddyy10. status;
format date mmddyy10.;
datalines;
1 1/1/2010 0
1 1/1/2011 1
2 2/2/2002 1
3 3/3/2003 1
3 2/5/2010 0
4 1/1/2000 0
4 1/1/2003 0
4 1/1/2005 1
;
run;
proc sort data=origin out=a1;
by id date;
run;
data need;
set a1;
by id;
retain EarlierStatus;
if first.id then call missing(EarlierStatus); *first time through for an ID, clear EarlierStatus;
if first.id and not last.id then EarlierStatus=status; *if it is first time for the id, but not ONLY time, then set EarlierStatus;
else if last.id then CurrentStatus=status; *if it is last time for the id, then set CurrentStatus;
if last.id then output; *and if it is last time for the id, then output;
run;
The IF/ELSE logic there could be written slightly differently, depending on exactly what you want; I tried to keep it direct so it is clear how each condition relates to the others.
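The FIRST./LAST. logic in the answer above can be easier to follow if you print the flags themselves. A minimal sketch on a hypothetical subset of the question's data (IDS and FLAGS are illustrative names):

```sas
/* Hypothetical subset of the question's data, already sorted by ID */
data ids;
  input id status;
  datalines;
1 0
1 1
2 1
4 0
4 0
4 1
;
run;

data flags;
  set ids;
  by id;
  is_first = first.id; /* 1 on the first row of each ID, else 0 */
  is_last  = last.id;  /* 1 on the last row of each ID, else 0  */
run;

proc print data=flags;
run;
```

The middle row of ID 4 has both flags equal to 0, which is why the IF/ELSE chain never assigns it. ID 2 has both flags equal to 1, so `first.id and not last.id` is false and its status goes to CurrentStatus.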
This proc sql will get what you want:
proc sql;
create table need as
select distinct
t1.id,
t2.EarlierStatus,
t1.CurrentStatus
from (select distinct
id,
date,
status as CurrentStatus
from origin
group by id
having date=max(date)) as t1
left join (select distinct
id,
date,
status as EarlierStatus
from origin
group by id
having date = min(date) and date < max(date)) as t2 on t1.id=t2.id;
quit;
The above code has two subqueries. In the first subquery, you keep only the rows with the latest date for each ID and rename status to CurrentStatus. In the second subquery, you keep only the row with the earliest date for each ID, provided it is not also the latest date, and rename status to EarlierStatus. So if an ID has only one date, that date is both the earliest and the latest, and the row is dropped from the second subquery; if an ID has three or more dates, the middle ones are ignored. Then a left join between the first and second subqueries pulls EarlierStatus into the first; if no EarlierStatus is found, it is missing.

intck() giving negative value

I am new to SAS and I am having trouble with finding the difference between 2 dates.
I have 2 columns: checkin_date and checkout_date
The dates are in MMDDYY10. format (mm/dd/yyyy).
I have used the following code:
stay_days= intck('day', checkin_day, checkout_day);
I am getting the right values for dates in the same month but wrong values for dates that span two months. For example, the difference between 02/06/2014 and 02/11/2014 is 5, but the difference between 1/31/2014 and 2/13/2014 is -18, which is incorrect.
I have also tried simply subtracting them:
stay_day = checkout_day - checkin_day;
I am getting the same result for that too.
My entire code:
data hotel;
infile "XXXX\Hotel.dat";
input room_no num_guests checkin_month checkin_day checkin_year checkout_month checkout_day checkout_year internet_used $ days_used room_type $16. room_rate;
checkin_date = mdy(checkin_month,checkin_day,checkin_year);
informat checkin_date mmddyy.;
format checkin_date mmddyy10.;
checkout_date = mdy(checkout_month,checkout_day,checkout_year);
informat checkout_date mmddyy.;
format checkout_date mmddyy10.;
stay_day= intck('day', checkin_day, checkout_day);
Your problem is a typo: you are passing the wrong variables to the INTCK() function. The xxx_DAY variables hold the day of the month, not the full date. Change the call to stay_day = intck('day', checkin_date, checkout_date);
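A quick way to confirm the typo explanation is to compute both versions side by side. This sketch hard-codes the one stay from the question (1/31/2014 to 2/13/2014) instead of reading the hotel file, whose path is elided above; STAY_CHECK and WRONG_DAYS are hypothetical names.

```sas
/* Hypothetical single stay taken from the question's example */
data stay_check;
  checkin_month  = 1; checkin_day  = 31; checkin_year  = 2014;
  checkout_month = 2; checkout_day = 13; checkout_year = 2014;

  checkin_date  = mdy(checkin_month,  checkin_day,  checkin_year);
  checkout_date = mdy(checkout_month, checkout_day, checkout_year);
  format checkin_date checkout_date mmddyy10.;

  /* Wrong: day-of-month values, so this is just 13 - 31 */
  wrong_days = intck('day', checkin_day, checkout_day);

  /* Right: full SAS date values */
  stay_day = intck('day', checkin_date, checkout_date);

  put wrong_days= stay_day=;
run;
```

The log should show wrong_days=-18 stay_day=13: INTCK treats 31 and 13 as SAS date values, so the first call is simply 13 - 31.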
Your data probably has the date values in the wrong variables. When using subtraction, the order should be ENDDATE - STARTDATE. When using the INTCK() function, the order should be from STARTDATE to ENDDATE. In either case, if the value in the STARTDATE variable is after the value in the ENDDATE variable, the difference will be negative.
Perhaps you need to clean the data?
The only way to get -18 comparing 2014-01-31 and 2014-02-13 would be if you extracted the day of the month and subtracted them.
diff3 = day(end) - day(start);
which would be the same as subtracting 31 from 13.
Example using your dates:
data check;
input start end ;
informat start end mmddyy.;
format start end yymmdd10.;
diff1=intck('day',start,end);
diff2=end-start;
cards;
02/06/2014 02/11/2014
1/31/2014 2/13/2014
;
run;
proc print data=check;
run;
Results:
Obs start end diff1 diff2
1 2014-02-06 2014-02-11 5 5
2 2014-01-31 2014-02-13 13 13

Better way to make/compare date ranges?

I often have data with a DATE1 and a DATE2. DATE1 is the date we guess the event will happen, and DATE2 is the date of an actual event. I usually create 2 extra variables by moving DATE1 forward a week and backward a week, and then compare DATE2 with them. However, I keep thinking there must be a better way to create a date range and compare a second date with it!
Is there a way to do this in SAS? Basically, I want to take DATE1 and DATE2 and produce the dataset below, and I am wondering whether I MUST create 2 additional variables (DATE1 - 7 days and DATE1 + 7 days).
Input dataset:
DATE1 DATE2
10/23/2014 2/12/2015
2/12/2015 2/10/2015
Current output:
DATE1_wk_before Date1_wk_after Date2 In_range_indicator
10/16/2014 10/30/2014 2/12/2015 0
2/05/2015 2/19/2015 2/10/2015 1
Where In_range_indicator = 1 if DATE2 is in the range and 0 if it is not.
I want to know if I can do it with something like
In_range_indicator = 1 where Date2 is in range(week before date1, week after date1) without creating 2 extra variables. Creating them seems a waste of time.
I am LITERALLY adding 7 days and subtracting 7 days before and after and it seems a bad way to do this.
You seem to just want to set the value of a variable based on a condition. No need to get too clever with it; just use IF and ELSE in your DATA step:
if date2 ge date1-7 and date2 le date1+7 then ind=1;
else ind=0;
I agree with @DWal; a simple IF/ELSE statement can help. You can also use the IFN function.
data mydates;
infile datalines missover;
input (date1-date2) (:mmddyy10.);
In_range_indicator=ifn( date1-7 <= date2 <= date1+7 , 1,0);
format date1-date2 yymmdd10.;
datalines4;
10/23/2014 2/12/2015
2/12/2015 2/10/2015
;;;;
run;
proc print data=mydates;run;
if abs(date2-date1) <= 7 then ind=1; else ind=0;
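As a further option, a SAS comparison evaluates to 1 when true and 0 when false, so the indicator can be assigned directly without IF/ELSE or IFN(). A sketch using the question's data, assuming the window includes dates exactly 7 days either side (as in the first answer); MYDATES2 is an illustrative name.

```sas
data mydates2;
  input (date1 date2) (:mmddyy10.);
  /* The comparison itself yields 1 or 0; window is inclusive of +/- 7 days.
     Caution: if either date is missing, ABS() is missing, and missing
     compares as smaller than any number, so the indicator would be 1. */
  in_range_indicator = (abs(date2 - date1) <= 7);
  format date1 date2 mmddyy10.;
  datalines;
10/23/2014 2/12/2015
2/12/2015 2/10/2015
;
run;

proc print data=mydates2;
run;
```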

removing day portion of date variable for time series SAS

I'm having some frustration with dates in SAS.
I am using PROC FORECAST and am trying to make my dates evenly spaced. I did some pre-processing with PROC SQL to get my counts by month, but my dates are incorrect.
Though my dataset looks good (because I used the MONYY. format), the actual value of that variable is wrong.
date year month count
Jan10 2010 1 100
Feb10 2010 2 494
...
The date value is actually the full SAS representation of the date (18267), meaning that it includes the day.
Do I need to convert the variable to a string and back to a date, or is there a quick procedure I can run?
My goal is to use the date variable with proc forecast so I only want Month and year.
Thanks for any help!
A SAS date value is the number of days since 01JAN1960, so you can't define a date variable that excludes the day.
What you can do is hide the day with a format like MONYY., but the underlying number will always contain that information.
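If what you need is for every observation in a month to carry the same date value (for example, before summarizing by month), one common approach is to align each date to the first day of its month with INTNX() and the 'b' (beginning) alignment. The day is still stored; it is simply always the 1st. A sketch using the value 18267 from the question, which is 05JAN2010; MONTHLY and MONTH_START are illustrative names:

```sas
data monthly;
  /* Value from the question: 18267 days after 01JAN1960 = 05JAN2010 */
  date = 18267;
  /* Shift by 0 months and align to the beginning of the month */
  month_start = intnx('month', date, 0, 'b');
  format date month_start date9.;
  put date= month_start=;
run;
```

The log should show date=05JAN2010 month_start=01JAN2010.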
Maybe you can use the interval=month option in proc forecast?
Please add some detail about the problem you're encountering with the forecast procedure.
EDIT: check this example:
data past;
keep date sales;
format date monyy5.;
lu = 0;
n = 25;
do i = -10 to n;
u = .7 * lu + .2 * rannor(1234);
lu = u;
sales = 10 + .10 * i + u;
date = intnx( 'month', '1jul1991'd, i - n );
if i > 0 then output;
end;
run;
proc forecast data=past interval=month lead=10 out=pred;
var sales;
id date;
run;