When importing an Excel file into SAS, I find that the import is not done properly because variables are assigned the wrong format.
The table I am trying to import looks like this:
ID Barcode
1 56798274
2 56890263
3 60998217
4 SKU89731
...
The code I am using is the following:
PROC IMPORT OUT=WORK.test
    DATAFILE="D:\Barcode.xlsx"
    DBMS=EXCEL REPLACE;
    RANGE="Sheet1$";
    GETNAMES=YES;
    MIXED=NO;
    SCANTEXT=YES;
    USEDATE=YES;
    SCANTIME=YES;
RUN;
What happens is that the column "Barcode" gets the BEST12. format, and therefore cases such as ID=4 end up with a missing value ('.') because they originally contained both characters and numbers.
Since it is not possible to change the format of a variable in a PROC step, how can I import the file correctly, using only the SAS editor?
EDIT:
Another option that does half the work, and might give some inspiration, is to change the format of the variable dynamically by importing through a DATA step:
libname excelxls Excel "D:\Barcode.xlsx";

data want;
    set excelxls.'Sheet1$'n (dbsastype=(Barcode='CHAR(255)'));
run;
With the above code I force SAS to import the variable in the format I want (character), but missing values are still generated for cases such as ID=4.
I think your problem is that you have MIXED=NO. Change this to MIXED=YES and SAS will check a sample of the observations to see if there are any non-numeric characters in the variables; if it finds any, it will make the variable character.
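For example, rerunning the import from the question with only that option changed:

PROC IMPORT OUT=WORK.test
    DATAFILE="D:\Barcode.xlsx"
    DBMS=EXCEL REPLACE;
    RANGE="Sheet1$";
    GETNAMES=YES;
    MIXED=YES; /* scan for mixed numeric/character columns */
    SCANTEXT=YES;
    USEDATE=YES;
    SCANTIME=YES;
RUN;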
You could convert to a CSV (or maybe XLS?) file and either:
use the GUESSINGROWS= option to increase the number of rows SAS uses to determine the type, or
if you want complete control of the import: copy the DATA step code that PROC IMPORT writes to the log and paste it into your program; you can then modify it to read the data exactly as you want. Both options are sketched below.
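A sketch of both, assuming the workbook has been saved as D:\Barcode.csv (a hypothetical path):

/* Option 1: let PROC IMPORT scan more rows before guessing column types */
proc import datafile="D:\Barcode.csv"
    out=work.test
    dbms=csv replace;
    getnames=yes;
    guessingrows=1000; /* scan many more rows than the default */
run;

/* Option 2: a DATA step like the one PROC IMPORT writes to the log,
   edited so that Barcode is always read as character */
data work.test;
    infile "D:\Barcode.csv" dsd firstobs=2 truncover;
    length ID 8 Barcode $20;
    input ID Barcode $;
run;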
I am trying to import .csv files into SAS, and they include dates and times.
In the csv files they are defined as "m-d-yyyy hh:mm:ss" (I am not allowed to change the data in Excel; I have to work just in SAS).
The problem is that when SAS reads them, it thinks m is d and d is m. :(
For example, what is "9-1-2016 8:00:57" in Excel is converted to 09JAN16:08:00:57 in SAS.
I want formats like 01SEP16:08:00:57.
How can I accurately import dates from .csv files in SAS?
Thanks.
I see that you've solved this by using R instead. In case anyone wants an answer for this in future:
One thing you can try is to change your options so that SAS reads dates as MDY instead of DMY. The way to do this is to run:
options datestyle=mdy;
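The DATESTYLE= option controls how the ANYDT informats resolve ambiguous dates, so a sketch of a matching import might look like this (the file name and variable name are hypothetical):

options datestyle=mdy; /* ambiguous dates are read month-first */

data want;
    infile "dates.csv" dsd firstobs=2;
    input ts :anydtdtm32.; /* "9-1-2016 8:00:57" -> 01SEP16:08:00:57 */
    format ts datetime19.;
run;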
I want to import only certain variables from an Excel file that has more than 40.
Would it be possible to import only the variables that I want in a DATA step with INFILE?
Thanks.
You can't use INFILE (without a lot of extra work, anyway) to read an Excel file.
You can, however, put a KEEP= data set option on the output data set in your PROC IMPORT!
proc import file="path\to\myfile.xlsx"
    out=mylib.mydata(keep=myvar)
    dbms=excel replace;
run;
Of course, libname access allows the same KEEP= option.
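A sketch of the libname route, assuming the data sit on Sheet1 (the sheet name is an assumption):

libname xl excel "path\to\myfile.xlsx";

data mylib.mydata;
    set xl.'Sheet1$'n(keep=myvar); /* only the wanted variable is read */
run;

libname xl clear;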
I have a folder with various flat files. There will be new files added every month and I need to import this raw data using an automated job. I have managed everything except for the final little piece.
Here's my logic:
1) I scan the folder and get all the file names that fit a certain description
2) I store all these file names and paths in a dataset
3) A macro has been created to check whether the file has been imported already. If it has, nothing will happen. If it has not yet been imported, it will be imported.
The final part that I need to get right, is I need to loop through all the records in the dataset created in step 2 and execute the macro from step 3 against all file names.
What is the best way to do this?
Look into CALL EXECUTE for executing a macro from a DATA step.
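A sketch, assuming the dataset from step 2 is called files, the path is in a variable named filename, and the checking macro from step 3 is called %import_file (all three names are hypothetical):

data _null_;
    set files;
    /* build one macro call per record; %nrstr delays execution of the
       macro until the data step has finished */
    call execute(cats('%nrstr(%import_file)(', filename, ')'));
run;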
The method I most often use is to write the macro statements to a file and use %INCLUDE to submit it. I guess CALL EXECUTE, as Reeza suggested, is better, but I feel more in control when I do it like this:
filename s temp;

data _null_;
    set table;
    file s;
    put '%macrocall(' variable ');';
run;

%inc s;
I use MATLAB's Import Tool to generate a script that will take care of importing several CSV files with the same columns.
The Import Tool successfully recognizes the type of each column of my CSV file.
However, in the generated script, the same columns are cast as strings (%s in the textscan format string).
Any idea why?
Surprisingly, it works fine with CSV files with fewer columns (it works with 70-column CSV files, but the issue arises with 120-column CSV files). Here is one example of a CSV file that triggers the issue.
I use R2014b x64 with Windows 7 SP1 x64 Ultimate.
This is happening because one of the columns in your file contains data with both numbers and text. The Import Tool predicts that you're going to want to extract the numbers from this field, so it labels the column as 'NUMBER'. However, standard textscan doesn't allow for this, so the Import Tool must generate code that reads in all of the data as text and does the numeric conversion afterwards. The Import Tool is trying to help you avoid errors with textscan.
The result of running the generated code is still numeric, as is shown in the Import Tool.
The specific column is labeled SEGMENT_ID in your example file. It contains data like:
l8K3ziItQ2pRFQ8
L79Y0zA8xF7fVTA
JYYqCCwRrigaUmD
I am trying to import multiple Excel files using the code below. There is a column in each Excel file that has both numeric and text values, but PROC IMPORT is only importing the numeric values and setting the text values to missing ('.').
Can anyone help me with this issue? Thanks much.
%let subdir=S:\Temp\;

filename dir "&subdir.*.xls";

/* collect the full path of every matching file */
data new;
    length filename fname $ 32767;
    infile dir eof=last filename=fname;
    input;
last:
    filename=fname;
run;

proc sort data=new nodupkey;
    by filename;
run;

/* store each path and a target data set name in macro variables */
data _null_;
    set new end=last;
    call symputx(cats('filename',_n_),filename);
    /* scan(...,-1,'\') takes the last token of the path, so this
       works at any folder depth */
    call symputx(cats('dsn',_n_),scan(scan(filename,-1,'\'),1,'.'));
    if last then call symputx('nobs',_n_);
run;
%put &nobs;
%macro import;
    %do i=1 %to &nobs;
        proc import datafile="&&filename&i" out=&&dsn&i
            dbms=excel replace;
            sheet="Sheet1";
            getnames=yes;
            mixed=yes;
        run;
    %end;
%mend import;

%import
The best way to control the data types in an imported Excel workbook is to use the DBSASTYPE= data set option with a libname. This is especially useful when dealing with other data types (like datetime and time values).
For example, let's assume that the affected column is named MY_VAR and should always be read as character with a maximum length of 30. And let's also assume you have a spreadsheet column named START_TIME that contains an Excel-coded date and time stamp. Your macro might be revised like this:
libname x excel "&&filename&i";

data &&dsn&i;
    set x.'Sheet1$'n(dbsastype=(MY_VAR='CHAR(30)' START_TIME='DATETIME'));
run;

libname x clear;
As long as you know the name of the Excel column causing the problem, this should work well.
MIXED=YES should fix things for you, but if it doesn't, there are a few solutions.
First off, you may want to check your scan value. You can see one possible location here:
http://support.sas.com/kb/35/563.html
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines
If you have an older version of office (pre-2007), it's called the "JET Engine" and is located in a slightly different place (you can google for it). Your "12.0" may be different depending on what you have installed (12.0 is Office 2007).
Second, you can force columns to be particular types. The DBSASTYPE= option is what you need for this; see http://www2.sas.com/proceedings/sugi31/020-31.pdf for an example (about the middle of the document; search for DBSASTYPE).