Reading SAV labels instead of values on SAS Proc Import

Sometimes, when I import multiple SAV files into the SAS WORK library, a variable imported later overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.
I've determined that this happens because the later dataset's variable gets an auto-generated name for its custom format (built from the SPSS Value Labels) that is identical to the format name of the earlier variable, even though the two variables have different Value Labels in their SAV files.
Is there a way to force SAS not to reuse format names, i.e., to have PROC IMPORT automatically check whether a format name already exists in the WORK format catalog before auto-naming a new custom format? Or is there any other way to prevent this from happening?
Here is my code, as well as an example of the variable names, format names, etc.:
proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace;
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace;
run;
Dataset1 contains the variable Question_1. Its original SPSS Value Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the format name QUESTION. for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV.
DatasetA contains the variable Question_A with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the format name QUESTION. for Question_A, even though the WORK library already contains a format named QUESTION. This overwrites the definition of format QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV.
Therefore, when Dataset1 and DatasetA are both imported, variables Question_1 and Question_A both have the format QUESTION. assigned to them, and the definition of QUESTION. in the WORK library corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. As a result, Question_1 displays as 1=Agree 2=Unsure, even though its values actually mean 1=Yes 2=No.
Ideally, I would like these two variables to get distinct custom format names automatically at the import step. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting?
Thank you.

One way to prevent the literal overwriting is to point each SPSS file you read at a different format catalog, using the optional FMTLIB= statement.
proc import out=dataset1 replace
datafile="S:\folder\Dataset1.SAV" dbms=SAV
;
fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace
datafile="S:\folder\Dataset2.SAV" dbms=SAV
;
fmtlib=work.fmtcat2;
run;
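To see why separate catalogs help, here is a minimal, self-contained sketch that uses two hand-written stand-ins for the SPSS label sets (the values are copied from the question; nothing is read from a SAV file). Each catalog keeps its own QUESTION format, and the FMTSEARCH= system option decides which catalog SAS searches when a variable uses QUESTION. Note that the sketch changes FMTSEARCH= for the rest of the session.

```sas
/* Stand-in for the labels from Dataset1.SAV, stored in its own catalog */
proc format library=work.fmtcat1;
  value question 1='Yes' 2='No';
run;

/* Stand-in for the labels from DatasetA.SAV, stored in a different catalog */
proc format library=work.fmtcat2;
  value question 1='Agree' 2='Unsure' 3='Disagree';
run;

/* Both definitions survive side by side: print each catalog */
proc format library=work.fmtcat1 fmtlib;
run;
proc format library=work.fmtcat2 fmtlib;
run;

/* WORK.FORMATS and LIBRARY.FORMATS are still searched first unless listed;
   after them, SAS searches the catalogs in this list, in order */
options fmtsearch=(work.fmtcat1);
data _null_;
  x = 1;
  put x= question.;   /* uses the QUESTION definition from WORK.FMTCAT1 */
run;
```

Pointing FMTSEARCH= at work.fmtcat2 instead would make the same value print as Agree. Only one QUESTION format can win for a given search path, so separate catalogs keep both definitions intact but do not by themselves let both variables display correctly at the same time.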
You can then rename the conflicting formats later (and change the format attached to each variable in the dataset to use the new name).
If the member name and the format name are short enough, you can generate a unique new name by combining the two (with an underscore in between to avoid conflicts). The macro below renames the formats, changes the format name attached to the variables, and rebuilds the formats into the WORK.FORMATS catalog.
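%macro sav_import(file,memname);
  %local fmtlist;
  %let fmtlist=;
  /* Default the output member name to the file name without folder or extension */
  %if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);
  /* Import, writing the formats to a catalog private to this file */
  proc import datafile=%sysfunc(quote(&file)) dbms=sav
    out=&memname replace
  ;
    fmtlib=work.&memname ;
  run;
  /* Unload that catalog into a data set */
  proc format lib=work.&memname cntlout=formats;
  run;
  /* Prefix each format name with the member name and remember the old name */
  data formats ;
    set formats;
    length oldname $32 prefix $1;
    oldname=fmtname;
    prefix=ifc(type='C','$',' ');   /* character formats are referenced with a $ */
    fmtname=catx('_',"&memname",oldname);
  run;
  proc contents data=&memname noprint out=contents;
  run;
  /* Build "variable newformat." pairs for variables that use one of the old formats */
  proc sql noprint;
    select distinct catx(' ',c.name,cats(f.prefix,f.fmtname,'.'))
      into :fmtlist separated by ' '
      from contents c inner join formats f
      on upcase(c.format) = upcase(cats(f.prefix,f.oldname))
    ;
  quit;
  /* Attach the renamed formats to the variables (skip if nothing matched) */
  %if %length(&fmtlist) %then %do;
    proc datasets nolist lib=work;
      modify &memname;
      format &fmtlist ;
    run;
    quit;
  %end;
  /* Rebuild the renamed formats into WORK.FORMATS */
  proc format lib=work.formats cntlin=formats;
  run;
%mend sav_import;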
%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);

Related

Importing txt file in SAS

I tried to import a text file into SAS with the following code:
PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
OUT= outdata
DBMS=dlm
REPLACE;
delimiter='09'x;
GETNAMES=YES;
RUN;
But the import is unsuccessful because the text file has a period for missing data.
This is what I got in the SAS log:
NOTE: Invalid data for class_size in line 455 16-17.
455 CHAR 454.34.8.32.17.NA.23.125.12.188 31
ZONE 3330330303303304403323330332333
NUMR 454934989329179E1923E125912E188
sl_no=454 school=34 iq=8 test=32 ses=17 class_size=. meanses=23.125 meaniq=12.188 _ERROR_=1 _N_=454
How can I load this text file into SAS?
Did you create that text file from R? R has a nasty habit of writing the text NA for missing numeric values into text files. If you are the one who created the file, check whether the software you are using has a way to not write the NA in the first place. In a delimited file, missing values are normally represented by having nothing in the field, so the delimiters are right next to each other. For SAS, you can also use a period to represent a missing value.
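A small sketch of those two conventions, using made-up inline data with commas instead of tabs so the sample survives copy and paste (the real file would use dlm='09'x). With the DSD option, two delimiters in a row mean a missing value, and a lone period is also read as missing; only text such as NA triggers an invalid-data note.

```sas
/* Hypothetical sample: row 2 has an empty class_size field, row 3 uses a period */
data demo;
  infile datalines dsd dlm=',' truncover;
  input sl_no class_size meaniq;
datalines;
1,25,12.1
2,,12.5
3,.,11.9
;
run;
```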
I wouldn't bother using PROC IMPORT to read a delimited file. Just write a DATA step to read it. Since your file appears to have only eight variables, all numeric, the code is trivial.
data outdata;
infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
dsd dlm='09'x firstobs=2 truncover
;
input sl_no school iq test ses class_size meanses meaniq ;
run;
One way to deal with the NA text in the input file is to replace it with periods. Since all of the fields are numeric, this is easy: you don't have to worry about replacing real text that just happens to contain the letter N followed by the letter A. Here is a trick using the _INFILE_ automatic variable to make the change on the fly while reading the file.
data outdata;
infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
dsd dlm='09'x firstobs=2 truncover
;
input @;                              /* load the line into _INFILE_ and hold it */
_infile_=tranwrd(_infile_,'NA','.'); /* replace NA with a period before parsing */
input sl_no school iq test ses class_size meanses meaniq ;
run;
You are getting the NOTE because of the NA value in the class_size field.
What you presume are periods (.) are actually tabs (hex code 09). Look under each period to confirm: the ZONE is 0 and the NUMR is 9, and 09 is the tab character.
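A quick way to check what a character really is, the way the CHAR/ZONE/NUMR lines in the log do, is to write it with the $HEX. format. A minimal sketch:

```sas
data _null_;
  tab = '09'x;          /* the tab character */
  put tab= $hex2.;      /* writes tab=09 to the log */
  code = rank(tab);     /* position in the character set */
  put code=;            /* writes code=9 */
run;
```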
PROC IMPORT guesses each field's data type by looking at the first few rows of a text file (20 rows by default). The first 20 rows of your file contained only numbers, so the procedure guessed that class_size was numeric.
There are a couple of courses of action.
Do nothing. Read the NOTEs in your log; wherever NA occurred, you will have a missing value in your data set.
Or, read the file as-is, but add a GUESSINGROWS=MAX; statement to your import code.
The mixed-type column class_size will then be guessed as character, and you will need another step to convert the values to numeric (a step in which the non-numeric values become missing values).
Or, edit the text file, replacing every NA with a period (.). The dot marks a missing value during the import, so PROC IMPORT will have nothing to complain about in the log.
Converting a field
PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
DBMS=dlm REPLACE OUT=work.outdata;
delimiter='09'x;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;
data want;
set outdata (rename=(class_size=class_size_char));
class_size = input (class_size_char, ?? best12.);
drop class_size_char;
run;
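The ?? modifier in the INPUT function is what keeps this conversion quiet: it suppresses the invalid-data notes and does not set _ERROR_, so NA simply becomes missing. A minimal sketch with made-up values:

```sas
data _null_;
  length s $8;
  do s = '17', 'NA', '.';
    n = input(s, ?? best12.);   /* NA and . both come back as missing */
    put s= n=;
  end;
run;
```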

MODIFY STATEMENT SAS - EXPECTING NAME ERROR

I'm trying to change the format of a column using PROC DATASETS with a MODIFY statement:
PROC DATASETS LIBRARY= EU.ARRECADACAO_CONTAS_DOCINV;
MODIFY EU.ARRECADACAO_CONTAS_DOCINV;
FORMAT DTPAGTO DDMMYY10.;
FORMAT DTVENCTO DDMMYY10.;
QUIT;
What am I doing wrong?
Thanks.
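Note the changes to LIBRARY= and MODIFY: LIBRARY= takes only the libref (EU), and MODIFY takes only the member name, without the libref.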
PROC DATASETS LIBRARY= EU;
MODIFY ARRECADACAO_CONTAS_DOCINV;
FORMAT DTPAGTO DDMMYY10.;
FORMAT DTVENCTO DDMMYY10.;
QUIT;
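A self-contained sketch of the same fix against the WORK library, using a hypothetical one-row data set in place of EU.ARRECADACAO_CONTAS_DOCINV. LIBRARY= names only the library, MODIFY names only the member, and both variables can share one FORMAT statement.

```sas
/* Hypothetical stand-in data set with two SAS date values */
data work.demo;
  dtpagto  = '15mar2021'd;
  dtvencto = '10mar2021'd;
run;

proc datasets library=work nolist;
  modify demo;                        /* member name only, no libref */
  format dtpagto dtvencto ddmmyy10.;  /* one FORMAT statement for both */
run;
quit;

/* The variable attributes now show DDMMYY10. */
proc contents data=work.demo;
run;
```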

SAS change date format

I want to give my dates a format that displays them like this: 12JAN2010.
I tried using this code:
/* partie B question 2*/
data projet.Ophtalmo_new;
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia)) (RENAME=
(date_examen=date_exa));
date_diagnostic = input (date_dia, DDMMYY10.);
date_examen = input (date_exa, DDMMYY10.);
format date_diagnostic date_examen date9.;
run;
But it sends me the following syntax error :
ERROR 22-322: Syntax error, expecting one of the following: a name,
a quoted string, ;,
CUROBS, END, INDSNAME, KEY, KEYRESET, KEYS, NOBS, OPEN, POINT,
_DATA_, _LAST_, _NULL_.
I'm still a SAS beginner and I can't manage to get it to work properly. I hope you can help, thanks.
The syntax for data set options is a single parenthetical expression. The rename option fits within:
data-set-name ( ... options ... rename=(...) );
The syntax of the RENAME option is the following:
rename=(old-name-1=new-name-1 old-name-2=new-name-2 ...)
So the correct SET statement would be:
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia date_examen=date_exa));
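Here is that SET statement inside a complete step, as a minimal sketch that assumes the two dates are stored as character strings in day/month/year order. The one-row input data set and its values are made up, and the step writes to a new data set rather than overwriting its input.

```sas
/* Hypothetical input: dates stored as text such as 12/01/2010 */
data have;
  length date_diagnostic date_examen $10;
  date_diagnostic = '12/01/2010';
  date_examen     = '15/01/2010';
run;

data want;
  /* one parenthetical, one RENAME= listing both pairs */
  set have (rename=(date_diagnostic=date_dia date_examen=date_exa));
  date_diagnostic = input(date_dia, ddmmyy10.);  /* text -> SAS date */
  date_examen     = input(date_exa, ddmmyy10.);
  format date_diagnostic date_examen date9.;     /* displays as 12JAN2010 */
  drop date_dia date_exa;
run;
```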
Because you say you are a beginner, I added this section.
The code you show applies INPUT to the variables originally named date_diagnostic and date_examen. If these are character variables to begin with, then INPUT is necessary to convert them from character to a SAS date (which is simply a number with special meaning). If, however, the variables are already SAS dates with a format different from the one you want, you only need to change the format of the variables (or use a FORMAT statement to change the format used during a PROC step).
data have;
x = '01-jan-2017'd;
format x ddmmyy10.;
run;
* demonstrate that the permanent format of x is ddmmyy10.;
data _null_;
set have;
put x=;
run;
* demonstrate temporary formatting of variable during step;
data _null_;
set have;
format x date9.; * modify the format temporarily during execution of data _null_;
put x=;
run;
* permanently change format of variable;
* only the dataset metadata (or header data) changes, the entire data set is NOT rewritten;
proc datasets nolist lib=work;
modify have;
format x date9.;
run;
* demonstrate that the permanent format of x has changed to date9.;
data _null_;
set have;
put x=;
run;
I believe the issue is the RENAME= data set option. You can specify it only once for a given data set, so all the renames must go in a single RENAME= list.
Change this:
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia)) (RENAME=
(date_examen=date_exa));
to this:
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia date_examen=date_exa));
Also be careful with the renamed variables. After RENAME= is applied, the variables exist in the step only under their new names (date_dia and date_exa); the old names no longer exist until you create them again. If the INPUT function references a variable that is not in the input data set, SAS creates it as a new, uninitialized variable, and the result is all missing values.
You also shouldn't use the same data set name in your DATA and SET statements. Once such a step runs, the original data no longer exists, so to fix a mistake you first have to back up several steps and recreate the original data. In general, this leads to errors that are harder to diagnose and fix, because even after you fix your code, your data is already wrong, so it still looks as though you have errors.
So, changes:
1. Change name of output data set in data statement.
2. Remove RENAME data set options.
3. Add a DROP statement to remove the variables that are no longer needed.
/* partie B question 2*/
data projet.Ophtalmo2;
set projet.Ophtalmo_new;
date_diagnostic = input (date_dia, DDMMYY10.);
date_examen = input (date_exa, DDMMYY10.);
format date_diagnostic date_examen date9.;
drop date_dia date_exa;
run;

How to write a macro for importing multiple Excel files (xlsx) in SAS and appending them

I have about 50 Excel files (xlsx format) to import into SAS and then append for analysis. All the Excel files have the same header, i.e., the variable names are the same in every file. I need a macro that imports and appends all of them at once, rather than importing the files one by one and appending them later. Your help is much appreciated.
The other issue with the Excel files is that there is a blank line between the variable names and the data points. I have written code to remove it using a DATA step, but can we also do this in the macro while importing?
Data XXX.yyy;
Set XXX.yyy;
if missing(coalesceC(of ASC Brand Cdesc1 Cust_ DGM Desc Family Grp1 High_Level_Product_Desc
Issf Name Prod_Desc Product__Code RVP SA_Desc Terr_ UOM Yr
)) and missing(coalesce(of Acc Int_Margin M_Cost Mth Net_Sales Sls__ Uts )) then delete;
run;
It sounds as though your existing code already does what you need it to do. I doubt there will be much of a performance gain from trying to import all 50 files in one DATA step (which is possible via DDE, but rather fiddly).
If your existing code is set up to process just one hard-coded file, I'd suggest using it to write a simple macro that takes one excel file as input, imports that file, and appends it to the master dataset. Then you can call the macro 50 times.
For example, you could write the macro something like this, incorporating all the relevant bits of your code and replacing all references to specific files with macro variables:
%macro import_and_append(excel_file,base_dataset);
proc import datafile = "&excel_file" dbms = excel out = t_import;
run;
proc append base = &base_dataset data = t_import;
run;
proc datasets lib = work nolist nowarn;
delete t_import;
run;
quit;
%mend;
Then you can call the macro like so:
%import_and_append(c:\excel_file_01.xls,work.master_dataset)
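Rather than typing 50 calls, the calls can be generated from a directory listing. A minimal sketch, assuming all the workbooks sit in one folder and end in .xlsx, and reusing the %import_and_append macro above; %import_dir and the fileref name xlsdir are hypothetical. It lists the folder with the DOPEN, DNUM, DREAD and DCLOSE functions.

```sas
%macro import_dir(dir, base);
  %local filrf rc did i n fname;
  /* With %SYSFUNC, FILENAME takes the NAME of a macro variable holding the fileref */
  %let filrf = xlsdir;
  %let rc  = %sysfunc(filename(filrf, &dir));
  %let did = %sysfunc(dopen(&filrf));
  %if &did = 0 %then %do;
    %put ERROR: could not open directory &dir;
    %return;
  %end;
  %let n = %sysfunc(dnum(&did));
  %do i = 1 %to &n;
    %let fname = %qsysfunc(dread(&did, &i));
    /* Only pick up .xlsx workbooks */
    %if %upcase(%qscan(&fname, -1, .)) = XLSX %then %do;
      %import_and_append(&dir\&fname, &base)
    %end;
  %end;
  %let rc = %sysfunc(dclose(&did));
  %let rc = %sysfunc(filename(filrf));   /* clear the fileref */
%mend import_dir;

/* e.g. %import_dir(c:\excel_files, work.master_dataset) */
```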
Another way to do this is to use the Excel LIBNAME engine: assign a libref to each of your files, and then read all the sheets in one DATA step.
In this example, I have 2 workbooks (Book1.xlsx and Book2.xlsx) in C:\temp. All data is in Sheet1. 3 variables -- X, Y, and Z. Modify as needed for your purpose.
data files;
format file $12.;
input file $;
datalines;
Book1.xlsx
Book2.xlsx
;
run;
%macro read_excel(dir,outdata,files);
data _null_;
set &files end=last;
call execute("libname t" || strip(put(_n_,8.)) || " excel '&dir\" || strip(file) || "';");
if last then
call symputx("n",_n_); /* store the file count without leading blanks */
run;
data &outdata;
set
%do i=1 %to &n;
t&i.."Sheet1$"n
%end;
;
a = sum(x,y,z);
if missing(a) then delete;
run;
%do i=1 %to &n;
libname t&i clear;
%end;
%mend;
%read_excel(c:\temp,data_from_excel,files);

sas proc import txt with delimiter inside observations

This is my first question as a new user of SAS 9.3. I want to use PROC IMPORT to read a large text file with dlm=','. But one variable has a comma inside some observations, e.g. "Hartford, CT" (not all of them; others look like "XL Center"). Is there any way I can read "Hartford, CT" into one variable, just like "XL Center", while using PROC IMPORT on this text file?
Many thanks
Edit: sorry, I shouldn't have put quotes around the values. There are NO quotes around any value, whether XL Center or Hartford, CT. With the delimiter set to a comma, a row containing Hartford, CT produces one extra column and shifts the remaining values into the wrong columns.
As long as your text file has quotes around any value that contains the delimiter, it will work automatically. For example:
/* example data */
data _null_;
file "%sysfunc(pathname(work))/some.csv";
put 'head1,head2,head3';
put 'XL Center,1,"Hartford, CT"';
run;
/* import */
proc import datafile="%sysfunc(pathname(work))/some.csv"
out=example
dbms=dlm
replace;
delimiter=",";
datarow=2;
run;
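For comparison, here is a sketch of the same read with a DATA step instead of PROC IMPORT, writing the same example file first so it stands alone. The DSD option treats a delimiter inside double quotes as data and strips the quotes; as with PROC IMPORT, this only helps if the field is quoted in the file. The variable lengths are assumptions.

```sas
/* Same example file as above */
data _null_;
  file "%sysfunc(pathname(work))/some.csv";
  put 'head1,head2,head3';
  put 'XL Center,1,"Hartford, CT"';
run;

/* DSD: comma-delimited; quoted fields may contain commas; quotes are removed */
data example2;
  infile "%sysfunc(pathname(work))/some.csv" dsd dlm=',' firstobs=2 truncover;
  length head1 $20 head2 8 head3 $20;
  input head1 head2 head3;
run;
```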