I have a .scores file with two delimiters, space and comma. I need to import it into SAS but cannot do so with the regular import command. The format is as follows:
832783_9399 973299,03200 238003
Thanks!
You can list multiple delimiters in the DLM= option, as shown below:
data want;
infile datalines dlm=' ,' ;
informat var1 var2 var3 var4 $60.;
input var1 $ var2 $ var3 $ var4 $;
datalines;
832783_9399 973299,03200 238003
;
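As a rough sketch, the same idea applied to an external file rather than datalines could look like this. The path and the variable names are placeholders, not taken from the question. Note that without DSD, consecutive delimiters are treated as a single delimiter.

```sas
/* Sketch only: the file path below is a hypothetical placeholder. */
data want;
  /* Both blank and comma act as delimiters; TRUNCOVER avoids   */
  /* jumping to the next line when a record has fewer fields.   */
  infile "/path/to/file.scores" dlm=' ,' truncover;
  length var1-var4 $60;
  input var1 $ var2 $ var3 $ var4 $;
run;
```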
I'm trying to import a CSV file into SAS using PROC IMPORT. I know that the GUESSINGROWS statement controls how many rows SAS scans to determine each column's type automatically. The problem is that one of my CSV files has two columns that are entirely blank. Those columns should be numeric, but after running the code below they become character. Is there a way to change the type of those two columns to numeric, during or after the import?
Here is the code I run:
proc import datafile="filepath\datasetA.csv"
out=dataA
dbms=csv
replace;
getnames=yes;
delimiter=",";
guessingrows=100;
run;
Thank you !
Modifying @Richard's code, I would do the following:
filename csv 'c:\tmp\abc.csv';
data _null_;
file csv;
put 'a,b,c,d';
put '1,2,,';
put '2,3,,';
put '3,4,,';
run;
proc import datafile=csv dbms=csv replace out=have;
getnames=yes;
run;
Go to the Log window and look at the DATA step code that PROC IMPORT generated:
data WORK.HAVE ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile CSV delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat a best32. ;
informat b best32. ;
informat c $1. ;
informat d $1. ;
format a best12. ;
format b best12. ;
format c $1. ;
format d $1. ;
input
a
b
c $
d $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Run this code and you will see that the last two columns are imported as character.
Check it:
ods select Variables;
proc contents data=have nodetails;run;
It is possible to modify this code so that the required columns are loaded as numeric. I would not drop and re-add the columns in SQL, because these columns could contain data somewhere.
Modified import code:
data WORK.HAVE ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile CSV delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat a best32. ;
informat b best32. ;
informat c best32. ;
informat d best32. ;
format a best12. ;
format b best12. ;
format c best12. ;
format d best12. ;
input
a
b
c
d
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Check table description:
ods select Variables;
proc contents data=have nodetails;run;
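If the table has already been imported with c and d as character, another option is to convert them afterwards while keeping any values they hold. This is only a sketch: it assumes the data set is WORK.HAVE with character columns c and d as in the example above, and the names have_num, c_char and d_char are made up for illustration.

```sas
/* Sketch: convert character columns to numeric after the import. */
data have_num;
  /* Rename the character versions so the original names are free. */
  set have(rename=(c=c_char d=d_char));
  /* INPUT() reads the text with a numeric informat; blanks become missing. */
  c = input(c_char, best32.);
  d = input(d_char, best32.);
  drop c_char d_char;
run;
```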
You can change the type of a column whose values are all missing by dropping it and adding it back with the other type.
Example (SQL):
filename csv 'c:\temp\abc.csv';
data _null_;
file csv;
put 'a,b,c,d';
put '1,2,,';
put '2,3,,';
put '3,4,,';
run;
proc import datafile=csv dbms=csv replace out=have;
getnames=yes;
run;
proc sql;
alter table have
drop c, d
add c num, d num
;
quit;
I wrote some code in SAS on Unix to import multiple CSV files from the current folder. The macro variables are assigned the correct values, but the files are not imported. I get the following error message:
ERROR: Physical file does not exist, /work/pricepromo/modeler/tolapa01/pawan/&j..csv.
ERROR: Import unsuccessful. See SAS Log for details.
Below is the code.
OPTIONS MERROR MPRINT SERROR MLOGIC SYMBOLGEN ;
X ls *.csv > list;
data name ;
infile 'list' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=1 ;
informat name_list $9. ;
format name_list $9. ;
input
name_list $
;
run;
data name2;
set name;
name_mod=translate(name_list,'','.csv');
run;
proc sql;
select name_mod into :name separated by '*' from name2;
%let count2 = &sqlobs;
quit;
%macro yy;
%do i = 1 %to &count2;
%let j = %scan(&name,&i,*);
proc import out = &j datafile='./&j..csv'
dbms=csv replace;
run;
%end;
%mend;
%yy;
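Try using double quotes:
datafile="./&j..csv"
not
datafile='./&j..csv'
Macro variable references are not resolved inside single quotes, so SAS looked for a file literally named &j..csv, which is exactly what the error message shows. With all those options turned on, this should have been obvious from reading the SAS log.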
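A quick way to see the difference is to write both forms to the log with %PUT. This is just an illustration; the value assigned to j is a made-up file stem.

```sas
%let j = sales;   /* hypothetical file stem, for illustration only */

/* Inside single quotes the macro processor leaves &j alone. */
%put Single quotes: './&j..csv';

/* Inside double quotes &j. is resolved before the string is used. */
%put Double quotes: "./&j..csv";
```

The first line should appear in the log with &j..csv unchanged, while the second should show ./sales.csv.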
I have a SAS code that looks something like this:
DATA WORK.MY_IMPORT_&stamp;
INFILE "M:\YPATH\myfile_150*.csv"
delimiter = ';' MISSOVER DSD lrecl = 1000000 firstobs = 2 ignoredoseof;
[...]
RUN;
Now, at M:\YPATH I have several files named myfile_150.YYYYMMDD. The code works the way it is supposed to by importing always the latest file. I am wondering how SAS decides which file to choose, since the wildcard * can be replaced by anything. Does it sort the files in descending order and choose the first one?
On my system, SAS 9.4 TS1M4, SAS is reading ALL files that satisfy the wildcard.
I created 3 files (file_A.csv, file_B.csv, and file_C.csv). Each contains 1 record ('A', 'B', and 'C' respectively).
data test;
infile "c:\temp\file_*.csv"
delimiter = ';' MISSOVER DSD lrecl = 1000000 ignoredoseof;
format char $1.;
input char $;
run;
(Note I dropped the firstobs option from your code.)
The resulting TEST data set contains 3 observations, 'A', 'B', and 'C'.
This is the order of files returned when issuing
dir c:\temp\file_*.csv
SAS is using the default behavior of the OS and reading the files in that order.
25 data test;
26 infile "c:\temp\file_*.csv"
27 delimiter = ';' MISSOVER DSD lrecl = 1000000 ignoredoseof;
28 format char $1.;
29 input char $;
30 run;
NOTE: The infile "c:\temp\file_*.csv" is:
Filename=c:\temp\file_A.csv,
File List=c:\temp\file_*.csv,RECFM=V,
LRECL=1000000
NOTE: The infile "c:\temp\file_*.csv" is:
Filename=c:\temp\file_B.csv,
File List=c:\temp\file_*.csv,RECFM=V,
LRECL=1000000
NOTE: The infile "c:\temp\file_*.csv" is:
Filename=c:\temp\file_C.csv,
File List=c:\temp\file_*.csv,RECFM=V,
LRECL=1000000
NOTE: 1 record was read from the infile "c:\temp\file_*.csv".
The minimum record length was 1.
The maximum record length was 1.
NOTE: 1 record was read from the infile "c:\temp\file_*.csv".
The minimum record length was 1.
The maximum record length was 1.
NOTE: 1 record was read from the infile "c:\temp\file_*.csv".
The minimum record length was 1.
The maximum record length was 1.
NOTE: The data set WORK.TEST has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
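If you need to know which file each record came from (for example, to keep only the rows from the latest file afterwards), the FILENAME= option on the INFILE statement can capture it. This is a sketch based on my test files above; the variable names fname and source are my own choice.

```sas
data test;
  length fname source $260;
  infile "c:\temp\file_*.csv"
         delimiter=';' missover dsd lrecl=1000000 ignoredoseof
         filename=fname;      /* fname holds the file currently being read */
  input char :$1.;
  /* The FILENAME= variable is not written to the output data set, */
  /* so copy it into an ordinary variable.                         */
  source = fname;
run;
```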
I have a table with 120 columns, and some of them contain Turkish characters (for example "ç", "ğ", "ı", "ö"). I want to replace these Turkish characters with English characters (for example "c", "g", "i", "o"). Using the TRANWRD function would be painful: I would have to write the function call 120 times, and because the column names sometimes change, I would have to re-check the code column by column each time.
Is there a simple macro that replaces these characters in all columns?
EDIT
In retrospect, this is an overly complicated solution... The translate() function should be used, as pointed by another user. It could be integrated in a SAS function defined with PROC FCMP when used repeatedly.
A combination of regular expressions and a DO loop can achieve that.
Step 1: Build a conversion table in the following manner
Accented letters that resolve to the same replacement character are put on a single line, separated by the | symbol.
data conversions;
infile datalines dsd;
input orig $ repl $;
datalines;
ç,c
ğ,g
ı,i
ö|ò|ó,o
ë|è,e
;
Step 2: Store original and replacement strings in macro variables
proc sql noprint;
select orig, repl, count(*)
into :orig separated by ";",
:repl separated by ";",
:nrepl
from conversions;
quit;
Step 3: Do the actual conversion
Just to show how it works, let's deal with just one column.
data convert(drop=i re);
myString = "ç ğı òö ë, è";
do i = 1 to &nrepl;
re = prxparse("s/" || scan("&orig",i,";") || "/" || scan("&repl",i,";") || "/");
myString = prxchange(re,-1,myString);
end;
run;
Resulting myString: "c gi oo e, e"
To process all character columns, we use an array
Say your table is named mySource and you want all character variables to be processed; we'll create an array called cols for that.
data convert(drop=c i re);
set mySource;
array cols(*) _character_;
do c = 1 to dim(cols);
do i = 1 to &nrepl;
re = prxparse("s/" || scan("&orig",i,";") || "/" || scan("&repl",i,";") || "/");
cols(c) = prxchange(re,-1,cols(c));
end;
end;
run;
When changing single characters, TRANSLATE is the proper function. It takes one line of code:
translated = translate(string,"cgio","çğıö");
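To apply this to every character column at once, the call can be wrapped in an array loop. This is a sketch only: mySource and want are placeholder table names and _k is an arbitrary loop variable. It also assumes a single-byte session encoding; in a UTF-8 session these letters are multi-byte, and TRANSLATE works byte by byte, so KTRANSLATE would likely be the function to look at instead.

```sas
/* Sketch: replace Turkish letters in all character columns. */
data want;
  set mySource;
  /* _CHARACTER_ picks up every character variable read so far. */
  array cols(*) _character_;
  do _k = 1 to dim(cols);
    /* Arguments: source, replacement characters, characters to replace. */
    cols(_k) = translate(cols(_k), "cgio", "çğıö");
  end;
  drop _k;
run;
```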
First get all your columns from the dictionary tables, and then replace the values in all of them in a macro %DO loop.
You can try a program like this (replace MYTABLE with your table name):
proc sql;
/* keep character columns only, since TRANWRD works on character values; */
/* add a libname condition if the table name exists in several libraries */
select name , count(*) into :columns separated by ' ', :count
from dictionary.columns
where memname = 'MYTABLE' and type = 'char';
quit;
%macro m;
data mytable;
set mytable;
%do i=1 %to &count;
%scan(&columns ,&i) = tranwrd(%scan(&columns ,&i),"ç","c");
%scan(&columns ,&i) = tranwrd(%scan(&columns ,&i),"ğ","g");
...
%end;
run;
%mend;
%m;
I've got the following file I'm trying to import
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
I've tried the following PROC IMPORT steps:
proc import out=lesdates datafile="C:\work\studies\project\data\calend\bigfilev2.txt"
dbms=tab REPLACE;
getnames=no;
run;
proc import out=lesdates datafile="C:\travail\etudes\projetpib\donnees\calend\bigfilev2.txt"
dbms=tab REPLACE;
delimiter='09'x;
getnames=no;
run;
But each time, instead of getting 2 variables, I end up with one variable containing both columns:
var1
------------------------------
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
What am I doing wrong???
PS: I can edit the text file but I would like to do the import without touching anything.
That's not a tab-delimited text file, from what I can tell (I've never seen 9+ character tabs, so it seems likely). It's a fixed-width file. You could in theory use a space delimiter, but reading it in as fixed width is better.
data want;
infile "yourfile.txt";
/* @n moves the column pointer; var1 starts in column 1, var2 in column 21 */
input
@1 var1 $20.
@21 var2 ddmmyy10.
;
format var2 ddmmyy10.;
run;
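For comparison, the space-delimited approach mentioned above might look roughly like this. It is a sketch that assumes the names never contain embedded blanks and that every line has both fields; the file name is the same placeholder as above.

```sas
/* Sketch: read the same file as blank-delimited list input. */
data want;
  infile "yourfile.txt" dlm=' ' truncover;
  /* The colon modifier reads up to the next delimiter using the informat. */
  input var1 :$20. var2 :ddmmyy10.;
  format var2 ddmmyy10.;
run;
```

Because runs of blanks count as one delimiter here, the exact column positions no longer matter, which is also why this breaks if a name ever contains a space.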