If-Then Block Issue in SAS - date

I am writing SAS code and, since I am new to SAS (I have always worked in R), I am having trouble understanding date formats in SAS.
I have a SAS data set Sales_yyyymm, and I am writing code that takes a date value from the user; if sales data exists for that date, I need to set a flag to 1, otherwise to 0. Currently I am using this code:
%Let check_date = 20010120;
Data A;
Set B;
If date=&check_date then Date_Flag = 1;
else Date_Flag = 0;
run;
date is the Date column in my SAS data set Sales_yyyymm and the values are like 20130129, 20110412, 20140120 etc.
But if I run this code, I get all my Date_Flag values as 0. The IF condition is being ignored and I am not sure how or why this is happening.
Any idea?
Thanks!

You need to read the article "Understanding How SAS Handles Dates" to see how SAS stores a date internally and how arithmetic on dates, including comparisons, is carried out.
In SAS, every date is a unique number on a number line. Dates before
January 1, 1960, are negative numbers; those after January 1, 1960,
are positive. Because SAS date values are numeric variables, you can
sort them easily, determine time intervals, and use dates as
constants, as arguments in SAS functions, or in calculations.
As I mentioned in the comments you really need to specify your date literals in "ddmmmyyyy"d format.
So, %Let check_date = 20010120; should be written as:
%Let check_date = 20JAN2001;
Data A;
Set B;
If date="&check_date."d then Date_Flag = 1;
else Date_Flag = 0;
run;
Interpreted as a SAS date (a count of days since 01JAN1960), 20010120 lies far beyond the range SAS can handle. For example, 2001012 (note there is no zero at the end) corresponds to "03AUG7438"; yes, that's year 7438!
By contrast, 14995 is the integer that SAS understands to be the date 20JAN2001.
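To see the mapping between the integer and the date for yourself, here is a minimal sketch. It assumes the date column is a true SAS date that is merely displayed as yyyymmdd (for example through the YYMMDDN8. format). The data set names A and B come from the question; the two sample rows in B are made up for illustration.

```sas
/* A SAS date is a day count from 01JAN1960; formats only change the display */
data _null_;
  d = '20JAN2001'd;
  put d=;             /* d=14995 */
  put d= date9.;      /* d=20JAN2001 */
  put d= yymmddn8.;   /* d=20010120 */
run;

/* Hypothetical sample data: real SAS dates displayed as yyyymmdd */
data B;
  date = '20JAN2001'd; output;
  date = '29JAN2013'd; output;
  format date yymmddn8.;
run;

/* Keep the user's yyyymmdd input and convert it with an informat instead */
%let check_date = 20010120;
data A;
  set B;
  /* the comparison yields 1 when true and 0 when false */
  Date_Flag = (date = input("&check_date", yymmdd8.));
run;
```

With this variant the user can keep typing the date as yyyymmdd, and the conversion to a day count happens in one place.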

Related

How to Define date format from a given date

I wanted to know how to determine the date format from a given date.
For example, I have the date 20180423, and in SAS I want to define its format as 'yyyymmdd'.
Similarly, if the date is given in the data as 12022018, I want to define it as 'ddmmyyyy'.
Please note that the date is provided to me as a proper date, but I want to define the format now.
The dates given may be in different formats in the future,
so I need to handle all of the date formats in SAS.
What I thought was: given the date 20180422,
use the substr function
data test;
a=20180422;
a=substr(a,1,4);
b=substr(a,5,1);
c=substr(a,7,1);
run;
but not sure.
If anyone can provide a solution, it would really help me in my project work.
Thanks in Advance for help.
It sounds like you want to convert various values to a date. SAS stores dates as a number, being the number of days since 1st Jan 1960. It's then usual to format this number to display as a date, in whichever format is preferred.
When importing dates that are already in a format, it is necessary to use the input function, along with an informat, to convert the formatted value to a SAS date. If the date values being read in are all in the same format, then the specific informat can be used. In your case, where different formats are used, you can use the anydtdte. informat, which will convert most of the standard date formats to a SAS date.
The example below converts 3 different date formats to a SAS date, then displays the SAS date in the date9. format. I've printed both the unformatted and formatted new values to the log, just so you can see they are stored as numbers.
data _null_;
input date_in $20.;
date_out = input(date_in, anydtdte20.);
put date_in date_out date_out :date9.;
datalines;
20180422
12022018
27apr2018
;
run;
Use input(a,anydtdte20.); this converts most common date layouts to a SAS date. Then use the functions Year(), Month() and Day() to extract the parts you want.
You will find this SAS Post very useful about dates and locales.
Solution:
I created a table with two rows, each with a different date format (YYYYMMDD and DDMMYYYY), to show how the code handles different date formats, converts them to SAS dates and breaks them down into year, month and day:
options DATESTYLE=DMY;
data have;
input a :$8.; /* read as character: input() expects a character argument */
datalines;
20180422
12022018
;
run;
data test;
set have;
format date_a date9.;
date_a=input(a,anydtdte20.);
Year_a=year(date_a);
month_a=month(date_a);
day_a=day(date_a);
run;
Output:
a=20180422 date_a=22APR2018 Year_a=2018 month_a=4 day_a=22
a=12022018 date_a=12FEB2018 Year_a=2018 month_a=2 day_a=12
You can use an if condition inside a data step. Check whether the date value satisfies the required criteria, then format it using the put function. The put function takes a source as its first argument and a format as its second argument, and returns the formatted value. That way, different values in the same column can be given different formats.
Something like this,
if a = 'date1CheckCondition' then newA = put(a , dateformat1.);
if a = 'date2' then newA = put(a , dateformat2.);
You may then choose to get all values in a common format like this:
dateA=input(newA,mmddyy6.);
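One way to make the "if condition" idea concrete is to pick the informat per row by inspecting the string. The sketch below is only a heuristic and rests on an assumption that is not in the original post: if the first four digits look like a plausible year (1900 to 2100), the value is treated as yyyymmdd, otherwise as ddmmyyyy. Values that are valid under both readings cannot be told apart this way.

```sas
data test;
  input a $8.;
  /* Heuristic (assumption): a plausible year in the first four digits means yyyymmdd */
  if 1900 <= input(substr(a, 1, 4), 4.) <= 2100 then date_a = input(a, yymmdd8.);
  else date_a = input(a, ddmmyy8.);
  format date_a date9.;
  put a= date_a=;   /* writes each value and its converted date to the log */
  datalines;
20180422
12022018
;
run;
```

For the two sample rows this should log date_a=22APR2018 and date_a=12FEB2018, matching the output shown in the answer above.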

Fuzzy join without proc SQL

Good day,
I wish to match each date in one data set to the closest earlier date in another.
The data sets are huge (500 MB to 1 GB), so proc sql is out of the question.
I have two data sets. The first (Fleet) has the observations; the second has a date and the generation number to use for further processing. Like this:
data Fleet
CreatedPortalDate
2013/2/19
2013/8/22
2013/8/25
2013/10/01
2013/10/07
data gennum_list
date
01/12/2014
08/12/2014
15/12/2014
22/12/2014
29/12/2014
...
What I'd like to have is a link-table like this:
data link_table
CreatedPortalDate date
14-12-03 01/12/2014
14-12-06 01/12/2014
14-12-09 08/12/2014
14-12-11 08/12/2014
14-12-14 08/12/2014
With logic that
Date < CreatedPortalDate and (CreatedPortalDate - date) = min(CreatedPortalDate - date)
What I came up with is a bit clunky and I'm looking for an efficient/better way to accomplish this.
data all_comb;
set devFleet(keep=createdportaldate);
do i=1 to n;
set gennum_list(keep=date) point=i nobs=n;
if createdportaldate > date
and createdportaldate - 15 < date then do;/*Assumption, the generations are created weekly.*/
distance= createdportaldate - date;
output;
end;
end;
run;
proc sort data=all_comb; by createdportaldate distance; run;
data link_table;
set all_comb(drop=distance);
by createdportaldate;
if first.createdportaldate;
run;
Any ideas how to improve or approach this issue?
Ignorant idea: Could I create hash tables where distance would be stored.
Arrays maybe? somehow.
EDIT (answers to questions from the comments):
Are the dates in a common format?
Done.
Where do the billion rows come from?
Yes, there are other data involved, but the date is the only linking variable.
Sorted?
Yes, the data is sorted and can be sorted again.
Are gennum dates always seven days apart?
No. That's the tricky part. Otherwise I could use week and year (or other binning) as a unique identifier.
Huge is a relative term, today's huge is tomorrow's speck.
Key data features indicate a direct addressing lookup scheme is possible
Date values are integers.
Date value ranges are limited.
A date value, or any of the next 14 days will be used as a lookup verifier
The key is a date value, which can be used as an array index.
Load the Gennum lookup once as follows
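array gennum_of ( %sysfunc(today()) ) _temporary_;
retain last_date; /* remember the previous gennum date across rows */
if last_date then
do index = last_date to date-1;
gennum_of(index) = last_date; /* days from last_date up to date-1 map to last_date */
end;
last_date = date;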
And fetch a gennum as
if portaldate > last_date
then portal_gennum = last_date;
else portal_gennum = gennum_of ( portaldate );
If you have many rows due to grouping by account ids, you will have to clear and load up the gennum array per group.
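The fragments above can be assembled into one data step. This is a sketch under stated assumptions, not the answerer's tested code: both columns are real SAS dates, gennum_list is sorted ascending by date, all dates fall between 02JAN1960 and today, and the sample rows are invented. To honour the question's strict rule (date < CreatedPortalDate), the fill range is shifted by one day compared with the fragment above.

```sas
/* Hypothetical sample data */
data gennum_list;
  input date :ddmmyy10.;
  datalines;
01/12/2014
08/12/2014
15/12/2014
;
run;

data fleet;
  input CreatedPortalDate :yymmdd10.;
  datalines;
2014/12/03
2014/12/09
2014/12/20
;
run;

data link_table (keep=CreatedPortalDate date);
  /* one slot per day number, from 1 up to today's SAS date value */
  array gennum_of ( %sysfunc(today()) ) _temporary_;
  retain last_date;
  /* first iteration only: read the whole lookup and fill the array */
  if _n_ = 1 then do until (eof);
    set gennum_list (rename=(date=gen_date)) end=eof;
    if last_date ne . then do index = last_date + 1 to gen_date;
      gennum_of(index) = last_date;   /* closest gennum date strictly before this day */
    end;
    last_date = gen_date;
  end;
  set fleet;
  if CreatedPortalDate > last_date then date = last_date;
  else date = gennum_of(CreatedPortalDate);   /* missing if before the first gennum date */
  format CreatedPortalDate date ddmmyy10.;
run;
```

With the sample rows, 03/12/2014 should pair with 01/12/2014, 09/12/2014 with 08/12/2014 and 20/12/2014 with 15/12/2014.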
This is a typical application of a SAS by statement.
The by statement in a data step is meant for reading two or more data sets at once, sorted by a common variable.
The common variable is the date, but it is named differently in the two datasets. In SQL, you solve that by requiring equality of one variable to the other, Fleet.CreatedPortalDate = gennum_list.date, but the by statement does not allow such a construction, so we have to rename (at least) one of them while reading the datasets. That is what we do in the rename clause within the options of gennum_list
data all_comb;
merge gennum_list (in = in_gennum rename = (date = CreatedPortalDate))
Fleet (in = in_fleet);
by CreatedPortalDate;
I chose to combine the by statement with a merge statement; a set would have done the job too, but then the order of the two input datasets makes a difference.
Also note that I asked SAS to create indicator variables in_gennum and in_fleet, which indicate in which input dataset a value was present. It is handy to know that variables of this type are not written to the result data set.
However, we have to recover the date from the CreatedPortalDate, of course
if in_gennum then date = CreatedPortalDate;
If you are new to SAS, you may be surprised that the above statement does not work unless you explicitly instruct SAS to retain the value of date from one observation to the next. (Observation is SAS jargon for row.)
retain date;
And here we write out one observation for each observation read from the Fleet dataset.
if in_fleet then output;
run;
The advantages of this approach are
you need much less logic to correctly combine the observations from both input datasets (and that is what the data step was invented for)
you never have to hold an array of values in memory, so you cannot run into overflow problems
this solution is linear, O(n), in the size of the datasets (apart from the sorting), so we know upfront that doubling the amount of data will only double the time.
Disclaimer: this answer is under construction.
It will be tested later this week
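Put together, the pieces of this answer form the single data step sketched below. The sample rows are invented, and the sketch assumes both columns are real SAS dates and both data sets are already sorted by date. Note one difference from the question's rule: when a CreatedPortalDate falls exactly on a gennum date, this step pairs it with that same day rather than with the previous gennum date.

```sas
/* Hypothetical sample data, already in date order */
data gennum_list;
  input date :ddmmyy10.;
  datalines;
01/12/2014
08/12/2014
15/12/2014
;
run;

data Fleet;
  input CreatedPortalDate :yymmdd10.;
  datalines;
2014/12/03
2014/12/09
2014/12/20
;
run;

data link_table;
  merge gennum_list (in=in_gennum rename=(date=CreatedPortalDate))
        Fleet       (in=in_fleet);
  by CreatedPortalDate;
  retain date;                                   /* carry the last gennum date forward */
  if in_gennum then date = CreatedPortalDate;    /* a gennum row updates the carried date */
  if in_fleet then output;                       /* write one row per Fleet observation */
  format CreatedPortalDate date ddmmyy10.;
run;
```

With the sample rows, 03/12/2014 should pair with 01/12/2014, 09/12/2014 with 08/12/2014 and 20/12/2014 with 15/12/2014.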

SAS: create time ID variable with program (instead of using the point-and-click system)

SAS offers a point-and-click system to create a time ID variable from a certain starting date using a particular frequency (e.g. weeks, quarters, years).
Since I need to do this process repeatedly, I would like to use code, as it makes things much easier. My data covers 1985-2005 and is divided into quarters (which gives 21 years * 4 quarters = 84 observations).
The date variable column should look like this (or give any other SAS date which can be formatted):
Date:
1985/1
1985/2
1985/3
1985/4
1986/1
etc.
Does anyone know how to write code for this?
Thank you very much in advance!
Rens (a PhD student in sociology working on the music charts)
You can use a data step and the YYQ function.
data quarters;
do year = 1985 to 2005;
do quarter = 1 to 4;
date = yyq(year,quarter);
output;
end;
end;
format date yyq.;
run;
proc print;
run;
Use intnx function.
data have;
do i=0 by 1;
date=intnx('quarter',yyq(1985,1),i);
if date>yyq(2005,4) then return;
output;
end;
format date yyqs6.;
run;

Convert Character Date variable to SAS Date

I have the following Variable called Date in an excel file which I'm reading into SAS:
Date
May2005
June2005
July2005
..
July2015
Both the format and the informat are characters ($8)
I wanted to convert these into a SAS Date variable.
How can I accomplish this task?
I thought about using substr to first create a month and a year variable,
then using proc format to convert all the months to numeric (e.g. 'jan' = 1),
and then using the mdy date function to create a new date. But I wonder if there is a shorter way to accomplish this task?
You can use the ANYDTDTE. informat if you prepend a day to your month+year string.
data want ;
set have ;
actual_date = input('01'||date,anydtdte20.); /* explicit width so longer strings such as 01June2005 are read in full */
format actual_date date9.;
run;
Note that the FORMAT or INFORMAT attached to the character variable is meaningless, but having a variable of only length 8 will not allow room to store longer month names. Perhaps the length got set to only 8 because your particular example set of data did not include any longer month names.
If you are running such an old version of SAS that the ANYDTDTE. informat does not exist or does not work with fully spelled out months then you will need to work a little harder. You could transform the string into DATE9 format.
actual_date = input
('01'||substr(date,1,3)||substr(date,length(date)-3)
,DATE9.);
As @Tom hints, you have to use an informat that lets SAS turn the character date into a numeric value. I'm not sure there is one that reads month names spelled out in full followed by a four-digit year (naturally, ANYDTDTE works, but I prefer to avoid it). In this case, I would use MONYYw., combined with substr to get the three-letter month abbreviation and the two-digit year:
data have;
input Date $13.;
datalines;
January2005
February2005
March2005
April2005
May2005
June2005
July2005
August2005
September2005
October2005
November2005
December2005
;
run;
data want;
set have;
Date2 = input(SUBSTR(Date,1,3)||SUBSTR(Date,length(date)-1,2),MONYY13.);
Format Date2 DATE8.;
run;
proc print data = want; run;
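A small variation on the same idea keeps all four digits of the year, so the result does not depend on the YEARCUTOFF option. This is a sketch, not part of the original answer; the data set names have4 and want4 and the two sample rows are made up for illustration.

```sas
/* Hypothetical sample data */
data have4;
  input Date $13.;
  datalines;
May2005
September2005
;
run;

data want4;
  set have4;
  /* first three letters of the month plus the last four characters (the year), e.g. Sep2005 */
  Date2 = input(substr(Date, 1, 3) || substr(Date, length(Date) - 3), monyy7.);
  format Date2 date9.;
run;

proc print data = want4; run;
```

The two rows should come out as 01MAY2005 and 01SEP2005, since MONYY assigns the first day of the month.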

sas convert date format

I have a var of birth date in this format: 15APR1954
I need to set a new var that will present the current age - as if today's date is 01.01.2011
in order to use the var, how do I convert the date?
otherwise it gives me the following error :"The MDY function call does not have enough arguments".
data DAT2;set DAT1;
array BD{*} birth_date;
Curage=0;
do i=1 to dim(BD);
Curage+(MDY(01012011)-(birth_date));
end;
drop i;
run;
The best way to calculate age is to use the SAS built-in function yrdif().
data dat2;
set dat1;
curage = yrdif(birth_date, today(), 'AGE');
run;
The function today() returns today's date. If you want the age as of a certain date, e.g. 2011-01-01 like in your example, you can replace today() with '01JAN2011'd or with mdy(1, 1, 2011). (Note that your syntax for mdy() was incorrect.)
I'll also note that your array approach doesn't make a whole lot of sense; you're defining an array with only one element, so you might as well just perform operations on that value. Arrays are useful when you wish to perform identical operations to a group of 2 or more variables. For thorough information on array processing in SAS, see this section of the documentation.
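As a minimal sketch of the fixed-date variant described above, using the birth date from the question: the 'AGE' basis of yrdif() is assumed to be available (it was added in SAS 9.3), and the variable name whole_years is introduced here only for illustration.

```sas
data _null_;
  birth_date = '15APR1954'd;
  as_of      = mdy(1, 1, 2011);        /* month, day, year: same value as '01JAN2011'd */
  curage     = yrdif(birth_date, as_of, 'AGE');
  whole_years = floor(curage);         /* completed years of age */
  put curage= whole_years=;            /* whole_years=56 */
run;
```

yrdif() returns a fractional number of years, so floor() is applied when only the completed years are wanted.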