I'd like to assign a person's name to a number based on a range rather than an explicit number. It's possible to do this using formats, but as I have the names in a dataset I'd prefer to avoid the manual process of writing the proc format.
data names;
input low high name $;
datalines;
1 10 John
11 20 Paul
21 30 George
31 40 Ringo
;
data numbers;
input number;
datalines;
33
21
17
5
;
The desired output is:
data output;
input number name $;
datalines;
33 Ringo
21 George
17 Paul
5 John
;
Thanks for any help.
You can do it like this using PROC SQL:
proc sql;
create table output as
select numbers.number, names.name
from numbers left join names
on numbers.number ge names.low
and numbers.number le names.high
;
quit;
One handy feature of proc format is the ability to use a data set to create the format, instead of typing it in by hand. Your scenario seems like a perfect scenario for this feature.
In the example you give, a few small changes to the "names" data set will put it in a form that can be read by proc format.
For example, if I modify the names data set like so..
data names;
retain fmtname "names" type "N";
input start end label $;
datalines;
1 10 John
11 20 Paul
21 30 George
31 40 Ringo
;
I can then issue this command to build the format based on it.
proc format cntlin=names;run;
Now I can use this format just like you would with any other format. For example, to create a new column that contains the desired "name" based on the number, you could do this:
data numbers;
input number;
number_formatted=put(number,names.);
datalines;
33
21
17
5
;
Here is what the output would look like:
number    number_formatted
33        Ringo
21        George
17        Paul
5         John
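If you would rather leave the numbers data set from the question untouched and build the requested output table separately, a minimal sketch could look like the following. It assumes the names. format has already been created with cntlin= as shown above; the $8 length for name is my assumption, so adjust it to your longest label.

```sas
/* Assumes the names. format was already built by PROC FORMAT CNTLIN= */
data output;
  set numbers;                 /* the numbers data set from the question */
  length name $8;              /* assumed length, adjust as needed */
  name = put(number, names.);  /* look up the range label for each number */
  keep number name;
run;
```

One thing to watch for: if a number falls outside every range, PUT returns the number itself as text rather than a missing value, so you may want to check for that case.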
Update to address question:
There isn't much difference in the coding needed to read from a text file. We just need to set it up so that the output data set has the particular variable names that proc format expects (fmtname, type, start, end, and label).
For example, if I have an external comma-separated file called "names.csv" that looks like this:
1,10,John
11,20,Paul
21,30,George
31,40,Ringo
Then I simply can change the code that creates the "names" data set so that it looks like this:
data names;
retain fmtname "names" type "N";
infile "<path to file>/names.csv" dsd;
input start end label $;
run;
Now I can run proc format with the cntlin option like I did before:
proc format cntlin=names;run;
I think SQL is indeed more succinct, but if you aren't a big fan of it and the numbers come in known increments, you may try something like:
data ranges;
set names;
do number = low to high; /* by ... */
output;
end;
proc sort;
by number;
run;
data output;
merge ranges
numbers ( in = innum )
;
by number;
keep number name;
if innum;
run;
Again, it requires numbers to come in predetermined increments, e.g. integers.
Using SAS I want to be able to calculate the number of days between two dates where the value is the number of days until the next record.
The required output will be:
Date Num Days
10/09/2020 1
11/09/2020 1
12/09/2020 1
14/09/2020 2
15/09/2020 1
16/09/2020 1
17/09/2020 1
18/09/2020 1
20/09/2020 2
I have tried using Lag and Retain but just can't get it to work.
Any advice and suggestions would be really appreciated.
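If you sort the data by descending DATE then it is easier, because then you just need to look backwards to find the next date. So you can use the LAG() or DIF() function. Note that DIF(date) is date minus LAG(date), which is negative when the data are in descending order, so the sign has to be flipped. The latest date has no next record, so its value will be missing.
data want;
set have;
by descending date;
num_days = -dif(date);
run;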
To simulate a "lead" function you can set another copy of the data skipping the first observation.
data want;
set have ;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=1 drop=_all_);
num_days = next_date - date;
run;
I need to compare the dates and reassign the values to two new variables by ID.
If there are two dates for same id, then:
If the 'date' variable is earlier, its value should be reassigned to "earlier status".
If the 'date' variable is later, its value should be reassigned to "Current status".
If there is only one date for the id, its value should be assigned to "current status", and "earlier status" needs to be missing.
If there are more than two dates for the id, the values for the middle dates are ignored, and only the earliest and the most current values are used.
Any thoughts? Much appreciated!
This is the code that I have tried:
data origin;
input id date mmddyy8. status;
datalines;
1 1/1/2010 0
1 1/1/2011 1
2 2/2/2002 1
3 3/3/2003 1
3 2/5/2010 0
4 1/1/2000 0
4 1/1/2003 0
4 1/1/2005 1
;
run;
proc print; format date yymmdd8.; run;
proc sort data=origin out=a1;
by id date;
run;
data need; set a1;
if first.date then EarlierStatus=status;
else if last.date then CurrentStatus=status;
by id;
run;
proc print; format date yymmdd8.; run;
So, a couple of things. First - note a few corrections to your code - in particular the : which is critical if you're going to input with mixed list style.
Second, you need to retain EarlierStatus. Otherwise it gets cleared out on each data step iteration.
Third, you need to use first.id not first.date (and similar for last) - what first is doing there is saying "This is the first iteration of a new value of id". Date is what you'd say in English ("The first date for that...").
Finally, you need a couple of more tests to set your variables the way you have them.
data origin;
input id date :mmddyy10. status;
format date mmddyy10.;
datalines;
1 1/1/2010 0
1 1/1/2011 1
2 2/2/2002 1
3 3/3/2003 1
3 2/5/2010 0
4 1/1/2000 0
4 1/1/2003 0
4 1/1/2005 1
;
run;
proc sort data=origin out=a1;
by id date;
run;
data need;
set a1;
by id;
retain EarlierStatus;
if first.id then call missing(EarlierStatus); *first time through for an ID, clear EarlierStatus;
if first.id and not last.id then EarlierStatus=status; *if it is first time for the id, but not ONLY time, then set EarlierStatus;
else if last.id then CurrentStatus=status; *if it is last time for the id, then set CurrentStatus;
if last.id then output; *and if it is last time for the id, then output;
run;
The if/elses that I do there could be done slightly differently, depending on how you want to do things exactly, I was trying to keep things a bit direct as far as how they relate to each other.
This proc sql will get what you want:
proc sql;
create table need as
select distinct
t1.id,
t2.EarlierStatus,
t1.CurrentStatus
from (select distinct
id,
date,
status as CurrentStatus
from origin
group by id
having date=max(date)) as t1
left join (select distinct
id,
date,
status as EarlierStatus
from origin
group by id
having date ~= max(date)) as t2 on t1.id=t2.id;
quit;
The above code has two subqueries. In the first subquery, you keep only the rows with the max of date by id, and rename status to CurrentStatus. In the second subquery, you keep all the rows that do not have the max of date by id and rename status to EarlierStatus. So if your origin table has only one date for an id, that date is also the max and the row is dropped in the second subquery. Then you perform a left join between the first and the second subqueries, pulling EarlierStatus from the second into the first. If EarlierStatus is not found, it is set to missing. Note that for an id with more than two dates, the second subquery keeps every non-latest row, so that id can produce more than one output row when those earlier statuses differ.
I am facing an issue while importing an Excel file into the SAS environment. The Excel file has a few columns named:
Geography
AR_NO
31-Jan-18
28-Feb-18
31-Mar-18
30-Apr-18
31-May-18
30-Jun-18
After running the code below:
%macro FX_Lkup(sheet);
FILENAME FXFILE "/idn/home/Module2/excel.xlsx";
PROC IMPORT DATAFILE=FXFILE
DBMS=XLSX
OUT=&sheet.
REPLACE
;
SHEET="&sheet.";
RUN;
%mend FX_Lkup;
%FX_Lkup(LENDING_TEMPLATE);
%FX_Lkup(2018FXRates);
the SAS data set shows the columns as
Geography
AR_NO
43131
43159
43190
43220
and so on.
Does anyone have a solution for this? Any lead would be really appreciated.
Thanks !
It is imported correctly; SAS uses numbers to store dates. To show a date in your final table, you need to assign a date format, for instance format = AFRDFDE7.
If you have mixed character and numeric values in the same column then SAS will be forced to create the variable as character. When it does that it stores the number that Excel uses for the date as a string of digits. To convert it to a date in SAS first convert the string of digits to a number and then adjust the number to account for the difference in how SAS and Excel count days.
data want ;
set LENDING_TEMPLATE ;
date = input(geography,??32.) + '30DEC1899'd;
format date date9.;
run;
Dates as headers
If your Excel file is using dates as column headers then SAS will also convert them to digit strings, since variable names are always character strings, not numbers. One quick way to fix it is to use PROC TRANSPOSE. This will be easy when each row is uniquely identified by the other variables and when all of the "date" variables are numeric.
proc transpose data=LENDING_TEMPLATE out=tall ;
by geography ar_no ;
run;
data tall ;
set tall ;
date = input(_name_ , 32.) + '30DEC1899'd ;
format date date9. ;
drop _name_;
run;
You could stop here as now you have a useful dataset where the date values are in a variable instead of hiding in the metadata (variable name).
To get back to your original wide layout just add another PROC TRANSPOSE and tell it to use DATE as the ID variable.
proc transpose data=tall out=wide ;
by geography ar_no ;
id date;
var col1;
run;
IMPORT is using Excel date valued column headers as Excel epoch date numbers.
Use Proc DATASETS to change the column label, and possibly rename the columns to something generic such as DATE1-DATE6. Or, continue on, and further unpivot the data into a categorical form with columns GEO, AR_NO, DATE, VALUE
You might be asking yourself "Where do those numbers, such as 43131, come from?", or "What is an Excel epoch date number?"
They are unformatted Excel date values. The human-readable date represented by a number is determined by the system's epoch, that is, the date represented by the number 0.
Different systems of time keeping have different epochs (starting points) and time units. Some examples:
30DEC1899 Excel datetime number 0, 1 = 1 day
01JAN1960 SAS date number 0, 1 = 1 day
01JAN1960 SAS datetime number 0, 1 = 1 second
01JAN1970 Unix OS datetime number 0, 1 = 1 second
To convert an Excel date number to a SAS date number you need to subtract 21916, which is the number of days from 30DEC1899 to 01JAN1960.
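As a quick check of that arithmetic, here is a small sketch using 43131, the first header value from the question. The date literal '30DEC1899'd is stored as -21916, so adding it and subtracting 21916 are the same operation. The variable names are only for illustration.

```sas
data _null_;
  excel_serial = 43131;                      /* header value seen in the question */
  sas_date_a = excel_serial - 21916;         /* subtract the epoch offset */
  sas_date_b = excel_serial + '30DEC1899'd;  /* equivalent: add the (negative) date literal */
  put sas_date_a= date9. sas_date_b= date9.; /* write both to the log as dates */
run;
```

Both values should print as 31JAN2018, which matches the 31-Jan-18 column header in the Excel file.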
This understanding of date epochs will be used when setting the label of a SAS column and renaming the column.
For others fiddling with code, the following will create an Excel worksheet having Date valued column headers. I speculate such a situation can otherwise arise when importing a worksheet containing an Excel pivot table.
First create some sample SAS data
data demo_tall;
do Geography = 'Mountains', 'Plains';
do AR_NO = 1 to 3;
_n_ = 0;
length label $200;
do label = '31-Jan-18', '28-Feb-18', '31-Mar-18',
'30-Apr-18', '31-May-18', '30-Jun-18'
;
_n_ + 1;
name = cats('Date',_n_);
value + 1;
output;
end;
end;
end;
run;
proc transpose data=demo_tall out=demo_wide(drop=_name_);
by Geography AR_NO;
var value;
id name;
idlabel label;
run;
Sample SAS data set (pivoted with transpose)
Then create Excel sheet with Excel date valued and formatted column headers
ods noresults;
ods excel file='%TEMP%\across.xlsx' options(sheet_name='Sample');
data _null_;
declare odsout xl();
if 0 then set demo_wide;
length _name_ $32;
xl.table_start();
* header;
xl.row_start();
do _n_ = 1 to 100; * 100 column guard;
call vnext(_name_);
if _name_ = '_name_' then leave;
_label_ = vlabelx(_name_);
_date_ = input(_label_, ?? date9.);
* make some header cells an Excel date formatted value;
if missing(_date_) then
xl.format_cell(data:_label_);
else
xl.format_cell(
data:_date_,
style_attr:"width=9em tagattr='type:DateTime format:dd-mmm-yy'"
);
end;
xl.row_end();
* data rows;
do _n_ = 1 by 1 while (not lastrow);
set demo_wide end=lastrow;
xl.row_start();
call missing(_name_);
do _index_ = 1 to 100; * 100 column guard;
call vnext(_name_);
if _name_ = '_name_' then leave;
xl.format_cell(data:vvaluex(_name_));
end;
xl.row_end();
end;
xl.table_end();
stop;
run;
ods excel close;
ods results;
Excel file created
IMPORT Excel worksheet
Log will show the 'funkiness' of date valued column headers
options msglevel=I;
proc import datafile='%temp%\across.xlsx' dbms=xlsx replace out=want;
sheet = "Sample";
run;
proc contents noprint data=want out=want_meta(keep=name label varnum);
run;
----- LOG -----
1380 proc import datafile='%temp%\across.xlsx' dbms=xlsx replace out=want;
1381 sheet = "Sample";
1382 run;
NOTE: Variable Name Change. 43131 -> _43131
NOTE: Variable Name Change. 43159 -> _43159
NOTE: Variable Name Change. 43190 -> _43190
NOTE: Variable Name Change. 43220 -> _43220
NOTE: Variable Name Change. 43251 -> _43251
NOTE: Variable Name Change. 43281 -> _43281
NOTE: VARCHAR data type is not supported by the V9 engine. Variable Geography has been converted
to CHAR data type.
NOTE: The import data set has 6 observations and 8 variables.
NOTE: WORK.WANT data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Modify the header (metadata) of the imported data set
Date valued column names will be renamed DATE1-DATE6 and the label will be changed to be the corresponding date in SAS format DATE11. (dd-mon-yyyy)
%let renames=;
%let labels=;
data _null_;
length newname $32;
length renames labels $32767;
retain renames labels;
set want_meta end=lastvar;
date = coalesce(input(label, ?? 5.),1e9) + '30dec1899'd;
if '01jan1980'd < date < today() then do;
index + 1;
newname = cats('DATE',index);
label = quote(trim(put(date,date11.)));
labels = catx(' ', labels, catx('=',name,label));
renames = catx(' ', renames, catx('=',name,newname));
end;
if lastvar;
if not missing(labels) then call symput('labels', trim('LABEL ' || labels));
if not missing(renames) then call symput('renames', trim('RENAME ' || renames));
run;
proc datasets nolist lib=work;
modify want;
&labels;
&renames;
run;
quit;
%symdel labels renames;
%let syslast = want;
The result, when printed.
Optional
Unpivot to a categorical form (tall layout)
proc transpose data=want out=stage1(rename=(col1=value _label_=date_string));
by geography ar_no;
var date:;
label _name_ = ' ';
label date_string = ' ';
run;
data want_tall;
set stage1;
date = input (date_string, date11.);
format date date11.;
keep geography ar_no _name_ date value;
run;
I'm having some frustration with dates in SAS.
I am using proc forecast and am trying to make my dates spread evenly. I did some pre-processing with proc sql to get my counts by month, but my dates are incorrect.
Though my dataset looks good (b/c I used format MONYY.) the actual value of that variable is wrong.
date year month count
Jan10 2010 1 100
Feb10 2010 2 494
...
..
.
The Date value is actually the full SAS representation of the date (18267), meaning that it includes the day count.
Do I need to convert the variable to a string and back to a date or is there a quick proc i can run?
My goal is to use the date variable with proc forecast so I only want Month and year.
Thanks for any help!
You can't define a date variable in SAS (so the number of days passed from 1jan1960) excluding the day.
What you can do is to hide the day with a format like monyy. but the underlying number will always contain that information.
Maybe you can use the interval=month option in proc forecast?
Please add some detail about the problem you're encountering with the forecast procedure.
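If the goal is just to have every date land on the same day of its month, one option is to shift each value to the first of the month with INTNX. This is only a sketch of the idea, using 18267 (the value mentioned in the question) as the input; month_start is a name made up for the illustration.

```sas
data _null_;
  date = 18267;                                /* the value from the question */
  month_start = intnx('month', date, 0, 'b');  /* beginning of the same month */
  put date= date9. month_start= date9.;        /* write both to the log */
run;
```

Here 18267 is 05JAN2010, so month_start should come out as 01JAN2010. Applied to the real data set, every row of a given month would then carry the same date value.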
EDIT: check this example:
data past;
keep date sales;
format date monyy5.;
lu = 0;
n = 25;
do i = -10 to n;
u = .7 * lu + .2 * rannor(1234);
lu = u;
sales = 10 + .10 * i + u;
date = intnx( 'month', '1jul1991'd, i - n );
if i > 0 then output;
end;
run;
proc forecast data=past interval=month lead=10 out=pred;
var sales;
id date;
run;
I have two problems with SAS dates using macros. To make it more complicated, I am stuck with two specific macro variables that I need to use (it's part of the puzzle that I'm trying to solve).
The macro variables that I need to use are:
%let id=741852;
%let month=January February March April May June July August September October November December;
The output that I need to generate is the grade results for students in different disciplines. Just by changing the ID of the student, the output has to update by itself.
The date information is only needed in the titles of my output. My code at the moment is as follows:
Title1 "Grade for &firstname &lastname";
Title2 "Birthdate: &bday";
Title3 "ID: &id";
Title5 "As of &sysdate, the grades are:";
To create the bday macro variable I used a function, since I had the info in my data set:
CALL SYMPUTX('bday',Birth_date);
At the moment my output titles 2 and 5 are as follows:
Birtdate:12556
As of 17NOV12, the grades are:
How can I use the macro variable &month so that both titles read as follows: "Birthdate: 10 January 2012" and "As of 15 November 2012, the grades are:"?
(The dates may look wrong, but I'm working in French and days come before the month.)
I thought of the %SCAN function, but it won't update the month if I change the ID. Please help :)
It's not clear to me what exactly you are trying to accomplish, but here is an example of something similar. I set the locale to French to show how the date is formatted.
data a;
length firstname lastname $20;
input id firstname $ lastname $ grade birthday :date9. ;
datalines;
741852 Mary Jones 92.3 01Jan1980
654654 Chuck Berry 76.9 02Mar1983
823983 Michael Jordan 81.2 04Apr1965
;
run;
options locale=FR;
%macro printinfo(id, ds);
data _null_;
set &ds;
where id=&id;
put "-----------------------------------";
put " Grade for: " firstname lastname;
put " Birthday : " birthday nldate.;
put " ID : " id;
put " As of &sysdate., the grade is: " grade;
put "-----------------------------------";
put " ";
run;
%mend;
option nonotes;
%printinfo(741852,a);
%printinfo(654654,a);
option notes;
Here is the log output
-----------------------------------
Grade for: Mary Jones
Birthday : 01 janvier 1980
ID : 741852
As of 20NOV12, the grade is: 92.3
-----------------------------------
7299 %printinfo(654654,a);
-----------------------------------
Grade for: Chuck Berry
Birthday : 02 mars 1983
ID : 654654
As of 20NOV12, the grade is: 76.9
-----------------------------------
Without changing your other code, try these two title statements:
title2 "Birthdate: %qleft(%sysfunc(putn(&bday,worddatx.)))";
title5 "As of %qleft(%sysfunc(putn(%sysfunc(today()),worddatx.))) the grades are:";
Basically, your first macro variable bday needs to be formatted using the WORDDATX format. Also, you should use the system function TODAY() to get the current system date so you can format it as you want.
The %SYSFUNC macro function lets you execute other SAS functions, in this case PUTN and TODAY(). The %QLEFT macro function trims leading blanks.
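If the &month list from the question really has to be used, a possible sketch is to pick the month name with %SCAN, using the month number that %SYSFUNC(MONTH()) returns for the date. The macro variable names bday_text, today and today_text are made up for this illustration, and 12556 is the birthdate value shown in the question.

```sas
%let month=January February March April May June July August September October November December;
%let bday=12556;   /* in the real program this comes from CALL SYMPUTX */

/* day number, then the month name looked up in &month, then the year */
%let bday_text=%sysfunc(day(&bday)) %scan(&month,%sysfunc(month(&bday))) %sysfunc(year(&bday));

/* same construction for the current date */
%let today=%sysfunc(today());
%let today_text=%sysfunc(day(&today)) %scan(&month,%sysfunc(month(&today))) %sysfunc(year(&today));

title2 "Birthdate: &bday_text";
title5 "As of &today_text, the grades are:";
%put bday_text=&bday_text;
```

For 12556 this should give 18 May 1994. Since bday_text is rebuilt from &bday, the title follows the student ID as long as these %let statements run after the step that calls SYMPUTX.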