I am trying to import a csv file into sas using custom code. Below are the sample lines from the raw data.
Item,date,WKLY_QTY,WKLY_SALES
10001,01Apr12,3313,67536.16
10001,15Apr12,889,26577.66
10001,22Apr12,4543,65001.8
10001,29Apr12,2822,74522.02
My SAS Code is as follow:
data LOtpt.Dummy2;
infile "&InptPath.\Dummy2_CSV.csv" dsd dlm=',' FIRSTOBS=2;
input Item date DATE7. WKLY_QTY WKLY_SALES;
run;
Result I am getting is as follow:
Item date WKLY_QTY WKLY_SALES
10001 19084 . 3313
10001 19098 . 889
10001 19105 . 4543
10001 19112 . 2822
Can any one please help me with the mistake. There is some problem with date informat because when I am taking this informat as character everything is going good.
You're mixing two input varieties here. The only difference is a single colon:
data work.Dummy2;
infile datalines dsd dlm=',';
input Item date :DATE7. WKLY_QTY WKLY_SALES;
datalines;
10001,01Apr12,3313,67536.16
10001,15Apr12,889,26577.66
10001,22Apr12,4543,65001.8
10001,29Apr12,2822,74522.02
;;;;
run;
List input does not normally allow an informat in the list (you can put informats in an informat statement). Modified list input (as shown above) is however permitted.
I suggest that you get into the habit of defining all variables used in your data set explicitly using ATTRIB statements. It takes a bit more typing but you end up with code that is much easier to use, especially if you need to get help from other people. Even better, include a KEEP statement to control only the variables needed, which prevents stray variables from showing up.
This has the additional benefit of allowing you the use LIST input where appropriate.
In other words, try this:
data LOtpt.Dummy2;
/* Define all variables and attributes here */
attrib Item informat=5. format=5.;
attrib Date informat=date7. format=yymmdd10.;
attrib Wkly_Qty informat=best10. format=comma9.;
attrib Wkly_Sales informat=best10. format=dollar11.2;
keep Item Date Wkly_Qty Wkly_Sales;
infile "&InptPath.\Dummy2_CSV.csv" dsd dlm=',' firstobs=2;
input Item Date Wkly_Qty Wkly_Sales;
run;
If you get in the habit of doing something like this all the time, it gets much easier over time. Note I selected formats and informats based on what I think your data looks like. You should choose ones that best meet your needs.
Related
I tried to import text file in sas with the following code
PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
OUT= outdata
DBMS=dlm
REPLACE;
delimiter='09'x;
GETNAMES=YES;
RUN;
But I am getting import unsuccessful because text file has period for missing data
this is what i got in SAS log
NOTE: Invalid data for class_size in line 455 16-17.
455 CHAR 454.34.8.32.17.NA.23.125.12.188 31
ZONE 3330330303303304403323330332333
NUMR 454934989329179E1923E125912E188
sl_no=454 school=34 iq=8 test=32 ses=17 class_size=. meanses=23.125 meaniq=12.188 _ERROR_=1 _N_=454
how can load this text file in SAS
Did you create that text file from R? That package has a nasty habit of putting text values of NA for numeric values into text files. If you are the one that created the file the you might check if the system you are using has a way to not put the NA into the file to begin with. In a delimited file missing values are normally represented by having nothing for the field. So the delimiters are right next to each other. For SAS you can use a period to represent a missing value.
I wouldn't bother to use PROC IMPORT to read a delimited file. Just write a data step to read the file. Since it looks like your file only has six variables and they are all numeric the code is trivial.
data outdata;
infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
dsd dlm='09'x firstobs=2 truncover
;
input sl_no school iq test ses class_size meanses meaniq ;
run;
One way to deal with the NA text in the input file is to replace them with periods. Since all of the fields are numeric you can do that easily because you don't have to worry about replacing real text that just happens to have the letter A after the letter N. Here is trick using the _INFILE_ automatic variable that you can use to make the change on the fly while reading the file.
data outdata;
infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
dsd dlm='09'x firstobs=2 truncover
;
input #;
_infile_=tranwrd(_infile_,'NA','.');
input sl_no school iq test ses class_size meanses meaniq ;
run;
You are getting the NOTE: because of the NA value in the class_size field.
What you presume are periods (.) are actually tabs (hex code 09). Look under the period to confirm, the ZONE is 0 and NUMR 9. 09 is the tab character.
Proc IMPORT guesses each fields data type based on looking at the first few rows (default is 20 rows) of a text file. Your file contained only numbers the 20 rows, so the procedure guessed class_size was numeric.
There a couple of courses of action.
Do nothing. Read your log NOTES and know the places where NA occurred you will have a missing value in your data set.
or,Read the file as-is, but add GUESSINGROWS=MAX; statement to your import code
The mixed data type column class_size will be guessed as character and you might have to do another step to convert the values to numeric (a step in which the non-digit values get converted to missing values)
or, Edit the text file replacing all the NA with a period (.). The dot marks a missing value during IMPORT. The IMPORT step will have no incongruities to LOG about.
Converting a field
PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
DBMS=dlm REPLACE OUT=work.outdata;
delimiter='09'x;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;
data want;
set outdata (rename=(class_size=class_size_char));
class_size = input (class_size_char, ?? best12.);
drop class_size_char;
run;
I need to import data from a csv-file. And I'm able to read everything else but the date. The date format is like dd.m.yyyy format: 6;Tiku;17.1.1967;M;191;
I'm guessing if I need to specify an informat to read it in? I can't figure out which one to use because nothing I've tried works.
What I've managed so far:
data [insert name here];
infile [insert name here];
dlm=";" missover;
length Avain 8 Nimi $10 Syntymapaiva 8 Sukupuoli $1 Pituus 8 Paino 5;
input
Avain Nimi $ Syntymapaiva ddmmyyp.(=this doesnt work) Sukupuoli$ Pituus
Paino;
format Paino COMMA5.2 ;
label Syntymapaiva="Syntymäpäivä";
run;
And part of the actual file to read in:
6;Tiku;17.1.1967;M;191;
Thank you for helping this doofus out!
There is no informat named DDMMYYP.. Use the informat DDMMYY. instead.
Also make sure to use the : modifier before the informat specification included in the INPUT statement so that you are still using list mode input instead of formatted input. If you use formatted input instead of list mode input then SAS could read past the delimiter.
input Avain Nimi Syntymapaiva :ddmmyy. Sukupuoli Pituus Paino;
Perhaps you are confused because there is a format named DDMMYYP.
Formats are used to convert values to text. Informats are what you need to use when you want to convert text to values.
553 options nofmterr ;
554 data _null_;
555 str='17.1.1967';
556 ddmmyy = input(str,ddmmyy10.);
557 ddmmyyp = input(str,ddmmyyp10.);
----------
485
NOTE 485-185: Informat DDMMYYP was not found or could not be loaded.
558 put str= (dd:) (= yymmdd10.);
559 _error_=0;
560 run;
NOTE: Invalid argument to function INPUT at line 557 column 13.
str=17.1.1967 ddmmyy=1967-01-17 ddmmyyp=.
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to
missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 557:13
You could use the anydtdte informat, but (as #Tom points out) if your data is known to be fixed in this format, then ddmmyy. would be be better. Also, Tom's advice about using the : modifier is correct, and is preferable to use in most (if not all) cases.
data want;
infile cards dlm=";" missover;
input Avain Nimi:$10. Syntymapaiva:ddmmyy. Sukupuoli:$1. Pituus Paino;
format Paino COMMA5.2 Syntymapaiva date9.;
label Syntymapaiva="Syntymäpäivä";
datalines4;
6;Tiku;17.1.1967;M;191;
;;;;
run;
which gives:
I want to define a date format that takes the following format : 12JAN2010
I tried using this code :
/* partie B question 2*/
data projet.Ophtalmo_new;
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia)) (RENAME=
(date_examen=date_exa));
date_diagnostic = input (date_dia, DDMMYY10.);
date_examen = input (date_exa, DDMMYY10.);
format date_diagnostic date_examen date9.;
run;
But it sends me the following syntax error :
ERROR 22-322: Syntax error, expecting one of the following: un nom, une chaîne
entre guillemets, ;,
CUROBS, END, INDSNAME, KEY, KEYRESET, KEYS, NOBS, OPEN, POINT,
_DATA_, _LAST_, _NULL_.
I'm still a sas beginner and i can't manage to get it to work properly, hope you can help, thanks.
The syntax for data set options is a single parenthetical expression. The rename option fits within:
data-set-name ( ... options ... rename=(...) );
The syntax of the RENAME option is the following:
rename=(old-name-1=new-name-1 old-name-2=new-name-2 ...)
So the correct set statement would be
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia date_examen=date_exa));
Because you state your are a beginner I added this section.
The code you show indicates input of the variables originally named date_diagnostic and date_examen. If these variables are indeed character variables to start, then the input is necessary to convert from character to a SAS date (which is simply a number with special meaning). If, however, the variables were already a SAS date with a format different than the one you want, you only need to update the format of the variables (or use a FORMAT statement to change the format to use during a PROC step)
data have;
x = '01-jan-2017'd;
format x ddmmyy10.;
run;
* demonstrate that the permanent format of x is ddmmyy10.;
data _null_;
put x=;
run;
* demonstrate temporary formatting of variable during step;
data _null_;
set have;
format x date9.; * modify the format temporarily during execution of data _null_;
put x=;
run;
* permanently change format of variable;
* only the dataset metadata (or header data) changes, the entire data set is NOT rewritten;
proc datasets nolist lib=work;
modify have;
format x date9.;
run;
* demonstrate that the permanent format of x has changed to date9.;
data _null_;
set have;
put x=;
run;
I believe the issue is the RENAME statement. You can only call it once.
Change this:
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia)) (RENAME=
(date_examen=date_exa));
to this:
set projet.Ophtalmo_new (RENAME=(date_diagnostic=date_dia date_examen=date_exa));
You can't rename the dates and then use those variables in your INPUT statement. They've been renamed and no longer exist, so trying to access date_dia in the INPUT function will at worst result in all missing values.
You also shouldn't use the notation of having the same data set name in your DATA and SET statement. This means once this step is run, the original data no longer exists. So you need to back up several steps and recreate your original data first before you can even fix your code. In general, this leads to errors that are harder to diagnose and fix because even if you fix your code your original data is wrong so you still think you have errors.
So, changes:
1. Change name of output data set in data statement.
2. Remove RENAME data set options.
3. Add DROP statement to remove the variables no longer desired.
/* partie B question 2*/
data projet.Ophtalmo2;
set projet.Ophtalmo_new;
date_diagnostic = input (date_dia, DDMMYY10.);
date_examen = input (date_exa, DDMMYY10.);
format date_diagnostic date_examen date9.;
drop date_dia date_exa;
run;
I currently have a dataset with dates in the format "FY15 FEB". In attempting to format this variable for use with SAS's times and dates, I've done the following:
data temp;
set pre_temp;
yr = substr(fiscal,3,2);
month = substr(fiscal,6,length(fiscal));
mmmyy = month||yr;
input mmmyy MONYY5.;
datalines;
run;
So, I have the strings representing the year and corresponding month. However, running this code gives me the error "The informat $MONYY was not found or could not be loaded." Doing some background on this error tells me that it has something to do with passing the informat a value with the wrong type; what should I alter in order to get the correct output?
*Edit: I see on the SAS support page for formats that "MONYYw. expects a SAS date value as input;" given this, how do I go from strings to a different date format before this one?
When you see a $, it means character value. In this case, you're feeding SAS a character value and giving it a numeric format. SAS inserts the $ for you, but there is no such format in existence.
I'm going to ignore the datalines statement, because I'm not sure why it's there (though I do notice there is no set statement). You might have an easier time just changing your program to:
data temp;
yr = substr(fiscal,3,2);
month = substr(fiscal,6,length(fiscal));
pre_mmmyy = strip(month)||strip(yr);
mmmyy=input(pre_mmmyy,MONYY5.);
run;
you can also remove the "length(fiscal))" from the substring function. The 3rd argument to the substring function is optional, and will go to the end of the string by default.
I'm trying to convert a SAS string of the form "MO-YR" (e.g. "Jan-04") to a SAS date.
Unfortunately, I don't think there is a date format in SAS that takes that form, so I can't just use an input statement like this date = input(datestring, sasformat).
I've been using this site to find date formats: http://v8doc.sas.com/sashtml/lrcon/zenid-63.htm
Thanks,
Michael
MONYY seems to work for me.
data _null_;
input #1 mydate monyy6.;
put mydate= date9.;
datalines;
Jan-04
Dec-12
;;;;
run;
Puts the correct values to the log.