OK I'll start with the problem:
I have product tables being created every week which are named in the format:
products_20130701
products_20130708
.
.
.
I'm trying to automate some campaign analysis so that I don't have to manually change the table name in the code every week to use whichever product table is the first one after the maximum end date of my campaign.
e.g
%put &max_enddate.;
/*20130603*/
my product tables in June are:
products_20130602
*products_20130609*
products_20130616
products_20130623
In this instance I would like to use the second table in the list, ignoring over 12 months' worth of product tables and selecting only the table whose date is the first one after my max_enddate macro variable.
I've been Googling all day and I'm stumped so ANY advice would be much appreciated.
Thanks!
A SQL solution:
data product_20130603;
run;
data product_20130503;
run;
data product_20130703;
run;
%let campdate=20130601;
proc sql;
select min(memname) into :datasetname from dictionary.tables
where libname='WORK' and upcase(scan(memname,1,'_'))='PRODUCT' and
input(scan(memname,2,'_'),YYMMDD8.) ge input("&campdate.",YYMMDD8.);
quit;
Now you have &datasetname that you can use in the set statement, so
data my_analysis;
set &datasetname;
/* whatever you are doing */
run;
Modify 'WORK' to the appropriate libname, and if there are any other restrictions add those as well. You might get some warnings about invalid dates if you have product_somethingnotadate, but that shouldn't matter.
The way this works - the dictionary.tables is a list of all tables in all libnames you have accessed (same as sashelp.vtable, but only available in PROC SQL). First this selects all rows that have a name with a date greater than or equal to your campaign end date; then it takes the min(memname) from that. Memname is of course a string, but in strings that are identical except for a number, you can still use min and get the expected result.
This is probably not suitable for your application, but I find it very useful for my datasets, as they absolutely must exist for each Sunday and I check for the existence of the dataset at the beginning of my code. If it doesn't exist, the code sends an email to our IT guys telling them that the file is missing and needs to be re-created or restored.
%LET DSN = PRODUCTS_%SYSFUNC(PUTN(%SYSFUNC(INTNX(WEEK.2,%SYSFUNC(INPUTN(&MAX_ENDDATE.,YYMMDD8.)),0,END)),YYMMDDN8.));
The other suggestions above only return datasets that exist, so if the one you should have been using has been deleted, they will grab the next one and run the job regardless.
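A minimal sketch of the existence check described above, assuming the macro variable DSN has been built with the %LET statement shown and that the product tables live in WORK. The macro name check_dsn is made up for illustration, and the email step is left out; this version only writes a message to the log.

```sas
/* Hypothetical helper: report whether the expected dataset exists. */
%macro check_dsn(dsn);
  %if %sysfunc(exist(&dsn)) %then %do;
    %put NOTE: &dsn exists and can be used.;
  %end;
  %else %do;
    %put ERROR: &dsn is missing and needs to be re-created or restored.;
  %end;
%mend check_dsn;

%check_dsn(&DSN)
```

Wrapping the %IF in a macro keeps the sketch usable on SAS releases that do not allow %IF in open code.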
First, get all possible tables:
data PRODUCT_TABLES;
set SASHELP.VTABLE (keep=libname memname);
*get what you need, here i keep it simple;
where lowcase(substr(memname,1,9))='products_';
run;
Next, sort it by date, easily done due to the format of your dataset names.
proc sort data=PRODUCT_TABLES;
by memname;
run;
Finally, you just need to get out the first record where the date is large enough.
data _NULL_;
set PRODUCT_TABLES;
*compare to your macro variable, the date is the 8 characters starting at position 10 and is converted explicitly to numeric, ?? suppresses notes for names that do not end in a number;
if input(substr(memname,10,8), ?? 8.)>=symgetn("max_enddate") then do;
*set your match into a macro variable, i have put together the libname and memname here, symputx strips trailing blanks;
call symputx("selectedTable",cats(libname,'.',memname));
stop; *do not continue, otherwise you will output simply the latest dataset;
end;
run;
Now you can just put the macro variable when you want to use the appropriate dataset, e.g.:
data SOME_TABLE;
set &selectedTable.;
/*DO SOME STUFF*/
run;
I have some SAS code from my editor here. I am learning to use SAS (this is my first time using it), so I'm not sure how much code is relevant.
proc import
datafile="C:\Users\barnedsm\Desktop\SAS\ToothGrowth.csv"
dbms=csv
out=tooth;
proc print data=tooth (obs=5);
run;
6. create two SAS data sets ToothGrowth_OJ and ToothGrowth_VC for the animals with the
delivery method orange juice and ascorbic acid, respectively. (5 points)
data ToothGrowth_OJ;
set tooth;
where (supp="OJ");
proc print data=ToothGrowth_OJ (obs=5);
run;
data ToothGrowth_VC;
set tooth;
where (supp="VC");
proc print data=ToothGrowth_VC (obs=5);
run;
7. save the two SAS data sets in a permanent folder on your computer. (5 points)
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data mylibr.ToothGrowth_OJ_permanent;
set ToothGrowth_OJ;
run;
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data mylibr.ToothGrowth_VC_permanent;
set ToothGrowth_VC;
run;
For the final question on my assignment, I want to re-combine the last two datasets I made (ToothGrowth_OJ and ToothGrowth_VC) into one dataset (ToothGrowth_combined). How would I do this? My thought was to use a subset condition like the one I used to separate the two. The code I have in mind is below.
data ToothGrowth_combined;
set ToothGrowth_OJ(where=(supp="OJ"));
keep supp Len;
run;
This would tell SAS to keep the values from the ToothGrowth_OJ dataset that have OJ in the "supp" column (which is all of them) and to keep the variable Len. Assuming that I have written this code correctly, I want to add in the values from my ToothGrowth_VC dataset in a similar way, but the output is an empty dataset when I run the same code with "ToothGrowth_OJ" replaced by "ToothGrowth_VC". Is there a way to use the subset code to take these two separate datasets and combine them into one, or is there an easier way?
Your starting code does three things:
It uses PROC IMPORT to guess how to read the text file into a dataset.
It creates a subset of the data with only some of the observations.
It creates a second subset of the data.
To recombine the two subsets use the SET statement and list all of the input datasets you want. To limit the number of variables written to the output dataset use a KEEP statement.
data ToothGrowth_combined;
set ToothGrowth_OJ ToothGrowth_VC ;
keep supp Len;
run;
I am not sure why you added the WHERE= dataset option in your code attempt, since, given the way they were created, each dataset only has observations with a single value of SUPP.
If you want to combine the permanent datasets instead (for example if you started a new SAS session with an empty WORK library) then use those dataset names instead in the SET. Just make sure the libref that points to them is defined in this SAS session.
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data ToothGrowth_combined;
set mylibr.ToothGrowth_OJ_permanent mylibr.ToothGrowth_VC_permanent;
keep supp Len;
run;
I am new to Stack Overflow. I am also a beginner in SAS. I have two datasets: one with a list of IDs and medications by date, and one with IDs and dates by admission number. I am trying to get a list of medications by ID, organized by admission number, in SAS.
I've tried merging by ID number and creating an "admission number" variable by using:
if admission_date-admission_date_1=0 then admission_number="Admission 1"
but all values are missing when I do that.
Here's what I have:
Here's what I want:
Thank you for your help!
It doesn't seem like that second data set is useful at all. What you're doing there is creating an enumeration variable, which can be accomplished using BY-group processing.
proc sort data=have; by id admission_date; run;
data want;
set have;
by id admission_date;
if first.id then admission_number=0;
if first.admission_date then admission_number + 1;
run;
More details are available on the methods here if needed.
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
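As a worked illustration of the BY-group counter, here is a small sketch with made-up data. The variable names id and admission_date follow the answer; the medication variable and all values are hypothetical, since the original tables were not included.

```sas
/* Hypothetical sample data: one row per medication record. */
data have;
  input id admission_date :date9. medication $;
  format admission_date date9.;
  datalines;
1 01JAN2020 aspirin
1 15MAR2020 aspirin
1 01JAN2020 statin
2 02FEB2020 insulin
;
run;

proc sort data=have;
  by id admission_date;
run;

data want;
  set have;
  by id admission_date;
  /* Reset the counter at the start of each id. */
  if first.id then admission_number = 0;
  /* The sum statement retains the counter and bumps it at each new date. */
  if first.admission_date then admission_number + 1;
run;

proc print data=want;
run;
```

With this data, both 01JAN2020 rows for id 1 get admission_number 1, the 15MAR2020 row gets 2, and the single row for id 2 starts again at 1.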
I have a dataset in which each id has multiple incomplete records, and it would make more sense to have a final dataset as shown. Basically, the idea is to have non-missing data fill the blanks, whether the value comes from the 1st line or the 2nd line, as long as it is for the same id.
The easiest way to do this is the self-update. This uses the core property of the update statement, that only non-missing values can replace other values, in a fun way that allows the rows to be simplified like this. The first obs=0 is there simply to give an empty base to update from - the dataset is really being read in from the second mention on that statement.
data have;
id = 1;
input x y z;
datalines;
1 . .
. 1 .
. . 1
;;;;
run;
data want;
update have(obs=0) have;
by id;
run;
proc sql;
create table need as
Select ID, max(v1) as v1,
max(v2) as v2,
max(v3) as v3,
max(v4) as v4
from have
group by ID;
quit;
I'm importing a semi-colon delimited file as such
ID Segment Number Date Payment
1 A1 103RTR 10OCT17 10
2 A1 205FCD 11OCT17 11
...
SAS doesn't like the mixture of numbers and characters when I import this txt file using this code:
proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
options validvarname=v7 missing='';
run;
Even though I'm not trying to load Number, which in the real dataset is much longer (something like 12 digits followed by four characters), it returns this error in the log:
NOTE: Invalid data for Number in line 22157 21-30.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
ERROR: Import unsuccessful. See SAS Log for details.
I would like to do a typical infile and informat, but with 32 variables and 2 million rows, I just cannot take the time to find out what range and style each variable needs to be read in with. So I am asking whether there's a way to format that particular variable while sticking with the ease of proc import.
But I'm also asking whether this actually impacts my import, as the data seems fine when checking the output.
I would like to do a typical infile and informat, but with 32
variables and 2 million rows, I just cannot take the time to find
out what range and style each variable needs to be read in with. So I am
asking whether there's a way to format that particular variable while
sticking with the ease of proc import.
Bad idea: garbage in = garbage out, and you're only dealing with 32 variables, so that's actually not that bad. Taking the time to clean and import the data correctly pays off, and you learn about the data in the process, which speeds up further analysis. This step is not a waste of time.
After importing a data set, it's a good idea to run PROC MEANS and PROC FREQ and review the output to ensure it was read correctly.
proc means data=have;
run;
proc freq data=have;
run;
Set GUESSINGROWS=MAX in the PROC IMPORT. This forces SAS to scan the whole file before deciding on variable types and lengths, which makes a correct result more likely. If you're automating this process and reading the file more than once, then take the code from the log and use that instead of PROC IMPORT, once you've verified the data.
Also, the OPTIONS statement should not be within the PROC IMPORT step; it goes before it.
options validvarname=v7 missing='';
proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
guessingrows=max;
run;
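For the "take the code from the log" route, a hand-written DATA step could look roughly like the sketch below. It is only an illustration based on the five columns shown in the question; the informats and lengths (for example $16. for Number, guessed from "12 numbers followed by four characters") are assumptions and would need to be checked against the real 32-variable file. The file reference "have" is the placeholder used in the question.

```sas
data want;
  /* Semicolon-delimited file, skip the header row. */
  infile "have" dlm=';' dsd firstobs=2 truncover;
  /* Assumed informats: reading Number as character avoids the invalid-data notes. */
  informat ID 8. Segment $2. Number $16. Date date9. Payment 8.;
  format Date date9.;
  input ID Segment $ Number $ Date Payment;
  /* Number still has to be read to reach the later fields, but need not be kept. */
  drop Number;
run;
```

Reading the field as character and then dropping it mirrors what the drop=Number dataset option does in the PROC IMPORT version, without the type-guessing step.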
As the title suggests I'd like to copy SAS tables from a Library to another but not all tables. I'd like to copy the tables which names start with 's' for example.
I know that I have to use proc datasets copy but which option? How ?
(English isn't my first language, so I'm sorry if my question isn't clear.)
It is probably easier to just use PROC COPY. You can use : as a wildcard in the SELECT statement.
12220 proc copy inlib=work outlib=out;
12221 select c: / mtype=data ;
12222 run;
NOTE: Copying WORK.CHECK to OUT.CHECK (memtype=DATA).
NOTE: There were 3 observations read from the data set WORK.CHECK.
NOTE: The data set OUT.CHECK has 3 observations and 4 variables.
NOTE: Copying WORK.CLASS to OUT.CLASS (memtype=DATA).
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set OUT.CLASS has 19 observations and 5 variables.
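Adapted to the question as asked (tables whose names start with 's'), the same idea might look like the sketch below. The librefs work and out are taken from the log above and are assumed to be assigned already. The second step shows the PROC DATASETS form the question mentions, which accepts the same SELECT statement after COPY.

```sas
/* PROC COPY: copy only data sets whose names start with S. */
proc copy inlib=work outlib=out;
  select s: / memtype=data;
run;

/* Equivalent with PROC DATASETS and its COPY statement. */
proc datasets library=work nolist;
  copy out=out memtype=data;
  select s:;
run;
quit;
```

Member names are not case sensitive here, so s: matches names beginning with either "s" or "S".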