Problem importing a CSV file while using a macro variable in SAS

I wrote code in SAS on Unix to import multiple CSV files from the current folder. The macro variables are assigned the correct values, but the files are not imported. I get the following error message:
ERROR: Physical file does not exist, /work/pricepromo/modeler/tolapa01/pawan/&j..csv.
ERROR: Import unsuccessful. See SAS Log for details.
Below is the code.
OPTIONS MERROR MPRINT SERROR MLOGIC SYMBOLGEN ;
X ls *.csv > list;
data name ;
infile 'list' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=1 ;
informat name_list $9. ;
format name_list $9. ;
input
name_list $
;
run;
data name2;
set name;
name_mod=translate(name_list,'','.csv');
run;
proc sql;
select name_mod into :name separated by '*' from name2;
%let count2 = &sqlobs;
quit;
%macro yy;
%do i = 1 %to &count2;
%let j = %scan(&name,&i,*);
proc import out = &j datafile='./&j..csv'
dbms=csv replace;
run;
%end;
%mend;
%yy;

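Use double quotes. Macro variable references are not resolved inside single quotes, which is why the path in the error message still contains the literal &j..csv. Write
datafile="./&j..csv"
not
datafile='./&j..csv'
With all those options switched on, the SAS log already shows this: the path in the error message still contains the unresolved &j.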
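To see the difference in isolation, here is a minimal sketch. The value sales is a made-up file stem for illustration, not one of the asker's files.

```sas
%let j = sales;   /* hypothetical file stem */

data _null_;
  length dq sq $40;
  dq = "./&j..csv";   /* double quotes: &j. is resolved by the macro processor */
  sq = './&j..csv';   /* single quotes: the text is passed through untouched */
  put dq= / sq=;
run;
```

The log should show dq=./sales.csv and sq=./&j..csv. The first period after &j ends the macro variable name and the second one is the literal dot before csv.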


SAS proc import guessingrows issue

I'm trying to import a CSV file into SAS using proc import. I know that the guessingrows option controls how many rows SAS scans to determine the type of each column. One of my CSV files has two columns that are entirely blank. Those columns should be numeric, but after running the code below they are imported as character. Is there a way to change the type of those two columns to numeric, during or after the import?
Here is the code that I run:
proc import datafile="filepath\datasetA.csv"
out=dataA
dbms=csv
replace;
getnames=yes;
delimiter=",";
guessingrows=100;
run;
Thank you !
Modifying @Richard's code, I would do:
filename csv 'c:\tmp\abc.csv';
data _null_;
file csv;
put 'a,b,c,d';
put '1,2,,';
put '2,3,,';
put '3,4,,';
run;
proc import datafile=csv dbms=csv replace out=have;
getnames=yes;
run;
Go to the LOG window and look at the SAS code produced by PROC IMPORT:
data WORK.HAVE ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile CSV delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat a best32. ;
informat b best32. ;
informat c $1. ;
informat d $1. ;
format a best12. ;
format b best12. ;
format c $1. ;
format d $1. ;
input
a
b
c $
d $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Run this code and you will see that the last two columns are imported as character.
Check it:
ods select Variables;
proc contents data=have nodetails;run;
It is possible to modify this code so that the required columns are read as numeric. I would not drop and re-add the columns in SQL, because these columns could contain data in some files.
Modified import code:
data WORK.HAVE ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile CSV delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat a best32. ;
informat b best32. ;
informat c best32. ;
informat d best32. ;
format a best12. ;
format b best12. ;
format c best12. ;
format d best12. ;
input
a
b
c
d
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Check table description:
ods select Variables;
proc contents data=have nodetails;run;
You can change the type of a column whose values are all missing by dropping it and adding it back with the other type.
Example (SQL):
filename csv 'c:\temp\abc.csv';
data _null_;
file csv;
put 'a,b,c,d';
put '1,2,,';
put '2,3,,';
put '3,4,,';
run;
proc import datafile=csv dbms=csv replace out=have;
getnames=yes;
run;
proc sql;
alter table have
drop c, d
add c num, d num
;
quit;

SAS: How to reference a global macro variable to create new table or dataset?

I'm having some trouble referencing a global macro variable outside of the macro to create a new data set. The global variable was created to run a loop for creating several yearly data sets using a vector of specified years, as you can see in the code below:
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /*Creates a dataset for each year, e.g. blah2004, blah2005, etc.) */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah&year._rail as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah&year._rail;
run;
When I try to create the master data set outside of the macro, however, I get the following error:
data blah;
set blah&year._rail;
run;
ERROR: File work.blah2018_rail.data does not exist
This is frustrating because I'm only trying to create the master set based on 2004-2017 data, as referenced in the macro variable. Can someone help me pinpoint my error -- is it in the way I defined the global variable, or am I missing a step somewhere? Any help is appreciated.
Thanks!
This is an interesting quirk of both macro and data step do-loops in SAS - the loop counter is incremented before the exit condition is checked, so after your loop has run it will be one increment past your stop value, e.g.:
%macro example;
%do i = 1 %to 3;
%put i = &i;
%end;
%put i = &i;
%mend;
%example;
Output:
i = 1
i = 2
i = 3
i = 4
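The same behaviour can be checked in a data step. This is a small sketch that is not part of the original question:

```sas
data _null_;
  do i = 1 to 3;
    put 'inside loop: ' i=;
  end;
  /* The counter was incremented once more before the stop test failed */
  put 'after loop: ' i=;
run;
```

The last line written to the log is after loop: i=4.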
For your final step you probably want the set statement to look like this:
set blah2004_rail ... blah2017_rail;
You could write a macro loop to generate the list and move the data step inside your macro, e.g.
set %do year = 2004 %to 2017; blah&year._rail %end;;
The second semi-colon is important! You need one to close the %end and one to terminate the set statement.
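Put together, the stacking step could look roughly like this. The macro name stack_years and its first= and last= parameters are made up for this sketch, and it assumes the blah2004_rail to blah2017_rail tables already exist in WORK.

```sas
%macro stack_years(first=2004, last=2017);
  %local year;   /* keep the loop counter out of the global symbol table */
  data blah_total;
    set
      %do year = &first %to &last;
        blah&year._rail
      %end;
    ;            /* this semicolon terminates the SET statement */
  run;
%mend stack_years;

%stack_years()
```

Because the loop only generates the dataset names, the value of &year after the loop (2018) is never used.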
Change your naming structure. Use a common prefix and put the year at the end; then you can use the colon (name prefix list) to reference all the datasets at once.
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /*Creates a dataset for each year, e.g. blah2004, blah2005, etc.) */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah_rail_&year. as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah_rail: ;
run;
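A tiny self-contained sketch of the prefix list, using two stand-in tables instead of the imported ones. The INDSNAME= option and the derived year variable are additions for illustration and are not part of the code above.

```sas
/* Two stand-in tables; the real ones would come from %loopyear */
data blah_rail_2004 blah_rail_2005;
  var1 = 1;
run;

data blah_total;
  length source $41;
  set blah_rail: indsname=src;            /* every WORK table whose name starts with blah_rail */
  source = src;                           /* e.g. WORK.BLAH_RAIL_2004 */
  year = input(scan(src, -1, '_'), 4.);   /* recover the year from the table name */
run;
```

Note that the prefix list picks up every dataset in the library whose name starts with blah_rail, so leftover tables from earlier runs would be included as well.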

Macro variable for each delimiter to import multiple delimited files in SAS

I have two delimited files (CSV and text) with the following variables: Plant, Type, Treatment, conc, uptake. The first three are character variables and the rest are numeric. The text file has five numeric variables and one character variable. As part of an exercise, I would like to import the two files in SAS using a macro variable for each delimiter.
I have the code below to import multiple files using a macro. I would like your advice on how to create a macro variable for each delimiter (csv, text).
%macro one (output, Sample);
proc import out=output
datafile= "C:\Users\komal\Desktop\Sample.csv"
dbms=csv replace;
getnames=yes;
run;
%mend one;
%one (output, Sample.csv);
%one (data2, datafiletwo.txt);
You are importing different types of files, so you need to specify the file type in dbms=.
%macro one (output, Sample,type);
proc import out=&output
datafile= "C:\Users\komal\Desktop\&Sample"
dbms=&type replace;
getnames=yes;
run;
%mend one;
%one (output, Sample.csv,csv);
%one (data2, datafiletwo.xlsx,excel);
%one (class, class.txt,tab);
Thanks Shenglin
I have tried the code below and it works perfectly.
%macro one (a, b, c);
proc import out=&a
datafile= "C:\Users\komal\Desktop\&b"
dbms=&c replace;
getnames=yes;
run;
%mend one;
%one (outcsv, Sample.csv, csv);
%one (outtab, datafiletwo.txt, tab);

SAS Macro Functions vs Data Step Functions

I'm having an issue with resolving macro variables within a macro. I think the issue is the language, and how SAS is sending my statements to the Macro Processor vs. Compiler.
Here's the gist of my code:
....some import statements...
%MACRO FCERR(date=);
%LET REMHOST=MainFrame PORT;
SIGNON REMHOST USER=&SYSUSERID. PASSWORD= _PROMPT_;
%SYSLPUT date=&yymm. ;
RSUBMIT;
FILENAME FIN "MY.FILE.QUALIFIERS" DISP = shr;
......some datasteps......
LIBNAME METRO "My.File.Qualifiers" DISP=shr;
/********************************************************************
******* *********
** ** **
******* ** * **
** ** * **
******* *********
*
/*******************************************************************/
%IF %SYSFUNC(EXIST(work.EQ_&date._FIN)) %THEN %DO;
PROC UPLOAD Data = work.EQ_&date._FIN
OUT = work.EQ_&date._FIN;
..........a bunch of data steps..................
PROC SQL NOPRINT ;
select count(*) as EQB format=10.0 INTO :EQBEF from EQ_1701_FIN ;
select count(*) as EQA format=10.0 INTO :EQAFTER from trunc_fin_eq ;
QUIT ;
%PUT &EQBEF;
%PUT &EQAFTER;
%IF %SYSFUNC(STRIP(&EQBEF.)) ~= %SYSFUNC(STRIP(&EQAFTER.)) %THEN %DO;
options emailhost= MYEMAILHOST.ORG ;
filename mail email ' '
to= (&recip.)
subject = "EQ Error QA/QC";
DATA _NULL_;
file mail ;
put "There were potential errors processing the Equifax Error file.";
put "The input dataset contains %SYSFUNC(STRIP(&EQBEF.)) observations.";
put "The output dataset contains %SYSFUNC(STRIP(&EQAFTER.)) observations.";
put "Please check the SAS log for additional details.";
RUN;
%END;
%END;
%ENDRSUBMIT;
%SIGNOFF;
%MEND;
%FCERR(date=&yymm.);
I keep getting an error that is stopping my macro from processing. This is it:
> SYMBOLGEN: Macro variable EQBEF resolves to 24707
> 24707
> MLOGIC(FCERR): %PUT &EQAFTER
> WARNING: Apparent symbolic reference EQAFTER not resolved.
> &EQAFTER
> SYMBOLGEN: Macro variable EQBEF resolves to 24707
> WARNING: Apparent symbolic reference EQAFTER not resolved.
> WARNING: Apparent symbolic reference EQAFTER not resolved.
> ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric
> operand is required. The condition was: %SYSFUNC(STRIP(&EQBEF.)) ~=
> %SYSFUNC(STRIP(&EQAFTER.))
> ERROR: The macro FCERR will stop executing.
Question: Is SAS trying to process my second (i.e. the inner) %IF %THEN statement, %IF %SYSFUNC(STRIP(&EQBEF.)) ~= %SYSFUNC(STRIP(&EQAFTER.)) %THEN %DO;, before it compiles and executes the data steps above it? I can see from the log that SAS reports the error before it creates the datasets from my data steps, and I believe &EQBEF resolves only because its source table is created by PROC UPLOAD.
If so, how can I prevent SAS from executing the second %IF %THEN until the data steps are processed, since my second select statement in proc sql depends on the data steps having run?
Also, I'm having trouble getting my date variable to resolve in proc sql;
E.G.
PROC SQL NOPRINT ;
select count(*) as EQB format=10.0 INTO :EQBEF from EQ_1701_FIN ;
select count(*) as EQA format=10.0 INTO :EQAFTER from trunc_fin_eq ;
QUIT ;
is ideally:
PROC SQL NOPRINT ;
select count(*) as EQB format=10.0 INTO :EQBEF from EQ_&DATE._FIN ;
select count(*) as EQA format=10.0 INTO :EQAFTER from trunc_fin_eq ;
QUIT ;
but &DATE. won't resolve in that proc sql statement, even though it resolves perfectly fine in all of my libname statements, etc. Is there some reason why &date. won't resolve within PROC SQL? Do I need to have every variable used in my macro referenced in the parameter list?
Your macro is running in your local SAS session. Because of the RSUBMIT; and ENDRSUBMIT; statements, the SQL code that generates the macro variable runs in the remote SAS session, but the macro statements reference the local macro variable.
For example try this simple program that creates a local and a remote macro variable and tries to show the values.
signon rr sascmd='!sascmd';
%let mvar=Local ;
%syslput mvar=Remote ;
%put LOCAL &=mvar;
rsubmit rr;
%put REMOTE &=mvar ;
endrsubmit;
signoff rr;
If you run it in open code, the %PUT statements will show that MVAR is equal to Local and Remote, respectively.
But if you wrap it inside a macro and run it
%macro xx;
signon rr sascmd='!sascmd';
%let mvar=Local ;
%syslput mvar=Remote ;
%put LOCAL &=mvar;
rsubmit rr;
%put REMOTE &=mvar ;
endrsubmit;
signoff rr;
%mend xx;
options mprint;
%xx;
You will see that both %PUT statements run in the local server and display the value of the local macro variable.
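One way around this, sketched under assumptions, is to define and call the macro inside the RSUBMIT block from open code, so that the macro statements are compiled and executed by the remote session. The session name rr is reused from the example above; the macro name compare_counts, the variables n_before and n_after, and the use of sashelp.class are made up for illustration.

```sas
signon rr sascmd='!sascmd';

rsubmit rr;
  /* Submitted from open code, so this definition is compiled remotely */
  %macro compare_counts;
    %local n_before n_after;
    proc sql noprint;
      select count(*) into :n_before trimmed from sashelp.class;
      select count(*) into :n_after  trimmed from sashelp.class(where=(age > 12));
    quit;
    /* Evaluated remotely, after PROC SQL has created both macro variables */
    %if &n_before ne &n_after %then
      %put NOTE: Row counts differ: &n_before before, &n_after after.;
  %mend compare_counts;

  %compare_counts
endrsubmit;

signoff rr;
```

This only helps if the RSUBMIT block itself is not generated by a local macro. Otherwise the macro statements are again processed locally, as in the %xx example.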
Check the log for the second select
select count(*) as EQA format=10.0 INTO :EQAFTER from trunc_fin_eq ;
If that data set doesn't exist, the macro variable won't be created.
You can set it to 0 to initialize it:
%let EQAFTER=0;

Extracting certain rows from data using hash object in SAS

I have two SAS data tables. The first has many millions of records, and each record is identified with a sequential record ID, like this:
Table A
Rec Var1 Var2 ... VarX
1 ...
2
3
The second table specifies which rows from Table A should be assigned a coding variable:
Table B
Code BegRec EndRec
AA 1200 4370
AX 7241 9488
BY 12119 14763
So the first row of Table B means any data in Table A that has rec between 1200 and 4370 should be assigned code AA.
I know how to accomplish this with proc sql, but I want to see how this is done with a hash object.
In SQL, it's just:
proc sql;
select b.code, a.*
from tableA a, tableB b
where b.begrec<=a.rec<=b.endrec;
quit;
My actual data contains hundreds of gigabytes of data, so I want to do the processing as efficiently as possible. My understanding is that using a hash object may help here, but I haven't been able to figure out how to map what I'm doing to use that way.
A hash object solution (data input code borrowed from @Rob_Penridge).
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data created;
format code $4.;
format begrec endrec best8.;
if _n_=1 then do;
declare hash h(dataset:'lookup');
h.definekey('Code');
h.definedata('code','begrec','endrec');
h.definedone();
call missing(code,begrec,endrec);
declare hiter iter('h');
end;
set big;
iter.first();
do until (rc^=0);
if begrec <= rec <= endrec then do;
code_dup=code;
end;
rc=iter.next();
end;
keep rec code_dup;
run;
I'm not sure a hash table would even be the most efficient approach here. I would probably solve this problem using a SELECT statement, as the conditional logic will be fast and it still requires only one pass through the data:
select;
when ( 1200 <= _n_ <=4370) code = 'AA';
...
otherwise;
end;
Assuming that you will need to run this code multiple times and that the data may change each time, you may not want to hardcode the select statement. A better solution is to build it dynamically using a macro. I have a utility macro that I use for these kinds of situations (included at the bottom):
1) Create the data
data big;
do i = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
2) Save the contents of the smaller table into macro variables. You could also do this using call symput or other preferred method. This method assumes you don't have too many rows in your lookup table.
%table_parse(iDs=lookup, iField=code , iPrefix=code);
%table_parse(iDs=lookup, iField=begrec, iPrefix=begrec);
%table_parse(iDs=lookup, iField=endrec, iPrefix=endrec);
3) Dynamically build the SELECT statement.
%macro ds;
%local cnt;
data final;
set big;
select;
%do cnt=1 %to &code;
when (&&begrec&cnt <= _n_ <= &&endrec&cnt) code = "&&code&cnt";
%end;
otherwise;
end;
run;
%mend;
%ds;
Here is the utility macro:
/*****************************************************************************
** MACRO.TABLE_PARSE.SAS
**
** AS PER %LIST_PARSE BUT IT TAKES INPUT FROM A FIELD IN A TABLE.
** STORE EACH OBSERVATION'S FIELD'S VALUE INTO ITS OWN MACRO VARIABLE.
** THE TOTAL NUMBER OF WORDS IN THE STRING IS ALSO SAVED IN A MACRO VARIABLE.
**
** THIS WAS CREATED BECAUSE %LIST_PARSE WOULD FALL OVER WITH VERY LONG INPUT
** STRINGS. THIS WILL NOT.
**
** EACH VALUE IS STORED TO ITS OWN MACRO VARIABLE. THE NAMES
** ARE IN THE FORMAT <PREFIX>1 .. <PREFIX>N.
**
** PARAMETERS:
** iDS : (LIB.DATASET) THE NAME OF THE DATASET TO USE.
** iFIELD : THE NAME OF THE FIELD WITHIN THE DATASET.
** iPREFIX : THE PREFIX TO USE FOR STORING EACH WORD OF THE ISTRING TO
** ITS OWN MACRO VARIABLE (AND THE TOTAL NUMBER OF WORDS).
** iDSOPTIONS : OPTIONAL. ANY DATASET OPTIONS YOU MAY WANT TO PASS IN
** SUCH AS A WHERE FILTER OR KEEP STATEMENT.
**
******************************************************************************
** HISTORY:
** 1.0 MODIFIED: 01-FEB-2007 BY: ROBERT PENRIDGE
** - CREATED.
** 1.1 MODIFIED: 27-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW UNMATCHED QUOTES ETC IN VALUES BEING RETURNED BY
** CHARACTER FIELDS.
** 1.2 MODIFIED: 30-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW BLANK CHARACTER VALUES AND ALSO REMOVED TRAILING
** SPACES INTRODUCED BY CHANGE 1.1.
** 1.3 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW PARENTHESES IN CHARACTER VALUES.
** 1.4 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - ADDED SOME DEBUG VALUES TO DETERMINE WHY IT SOMETIMES LOCKS TABLES.
*****************************************************************************/
%macro table_parse(iDs=, iField=, iDsOptions=, iPrefix=);
%local dsid pos rc cnt cell_value type;
%let cnt=0;
/*
** OPEN THE TABLE (AND MAKE SURE IT EXISTS)
*/
%let dsid=%sysfunc(open(&iDs(&iDsOptions),i));
%if &dsid eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** GET THE POSITION OF THE FIELD (AND MAKE SURE IT EXISTS)
*/
%let pos=%sysfunc(varnum(&dsid,&iField));
%if &pos eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%else %do;
/*
** DETERMINE THE TYPE OF THE FIELD
*/
%let type = %upcase(%sysfunc(vartype(&dsid,&pos)));
%end;
/*
** READ THROUGH EACH OBSERVATION IN THE TABLE
*/
%let rc=%sysfunc(fetch(&dsid));
%do %while (&rc eq 0);
%let cnt = %eval(&cnt + 1);
%if "&type" = "C" %then %do;
%let cell_value = %qsysfunc(getvarc(&dsid,&pos));
%if "%trim(&cell_value)" ne "" %then %do;
%let cell_value = %qsysfunc(cats(%nrstr(&cell_value)));
%end;
%end;
%else %do;
%let cell_value = %sysfunc(getvarn(&dsid,&pos));
%end;
%global &iPrefix.&cnt ;
%let &iPrefix.&cnt = &cell_value ;
%let rc=%sysfunc(fetch(&dsid));
%end;
/*
** CHECK FOR ABNORMAL TERMINATION OF LOOP
*/
%if &rc ne -1 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** ENSURE THE TABLE IS CLOSED SUCCESSFULLY
*/
%let rc=%sysfunc(close(&dsid));
%if &rc %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%global &iPrefix;
%let &iPrefix = &cnt ;
%mend;
Other examples of calling this macro:
%table_parse(iDs=sashelp.class, iField=sex, iPrefix=myTable, iDsOptions=%str(where=(sex='F')));
%put &mytable &myTable1 &myTable2 &myTable3; *etc...;
I'd be tempted to use the direct access method POINT= here. It reads only the required row numbers rather than the whole dataset.
Here is the code, which uses the same create data code as in Rob's answer.
data want;
set lookup;
do i=begrec to endrec;
set big point=i;
output;
end;
drop begrec endrec;
run;
If you have the code column already in the big dataset and you just wanted to update the values from the lookup dataset, then you could do this using MODIFY.
data big;
set lookup (rename=(code=code1));
do i=begrec to endrec;
modify big point=i;
code=code1;
replace;
end;
run;
Here's my solution, using proc format. This is also done in-memory, much like a hash table, but requires less structural code to work.
(Data input code also borrowed from @Rob_Penridge.)
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
ZZ 0 20
JJ 40 60
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data lookup_f;
set lookup;
rename
BegRec = start
EndRec = end
Code = label;
retain fmtname 'CodeRecFormat';
run;
proc format library = work cntlin=lookup_f; run;
data big_formatted;
format rec CodeRecFormat.;
format rec2 8.;
length code $5.;
set big;
code = putn(rec, "CodeRecFormat.");
rec2 = rec;
run;