SAS PROC IMPORT not creating OUT dataset as commanded - import

Situation: I'm importing an xlsx file with PROC IMPORT and want to send the data OUT to a new Netezza database table.
My issue: SAS appears to run fine, but the log shows that a completely different table was created, under a libref that I'm not using anywhere (and that libref has been cleared).
LIBNAME abc sasionza server=server database=db port=123 user=user pass=pass;
PROC IMPORT
OUT = abc.DesiredTableName
DATAFILE= "my/excelfile/file.xlsx"
DBMS=xlsx
REPLACE;
SHEET="Sheet1";
GETNAMES=YES;
RUN;
This "runs" just fine, or so it appears to. I check the log and I see this:
NOTE: The import data set has 11 observations and 7 variables.
NOTE: xyz.ATableCreatedDaysAgoInAnotherProgram data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.55 seconds
cpu time 0.02 seconds
I thought, hmm, that is weird. Libref xyz is actually cleared, so I couldn't possibly be using it, and ATableCreatedDaysAgoInAnotherProgram is a table name used in a completely different SAS E-Guide program I have going on.
Sounds like a memory or cache issue. So I closed all instances of SAS E-Guide and fired up a new one, and created a new program containing only my desired lines (the code listed above).
It runs, and I get the following log as a result:
NOTE: The import data set has 11 observations and 7 variables.
NOTE: WORK._PRODSAVAIL data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.55 seconds
cpu time 0.02 seconds
I will note that this is the first time I've actually tried to use PROC IMPORT to send something directly to a Netezza table. Up until now, I've always imported files into WORK and worked with them for a bit before inserting them into a table in a database. I thought that maybe this is a SAS limitation I wasn't aware of, but the SAS documentation for PROC IMPORT (https://v8doc.sas.com/sashtml/proc/z0308090.htm) says that you can specify a two-level name in the OUT statement, so I feel that this should work. If it can't work, then I feel that SAS should error out instead of randomly creating a table name that isn't even in the code I'm executing.
Summary (tl;dr): Can you PROC IMPORT directly into a Netezza database table using a libref? And if you can't, why does my code execute and produce log output that isn't even related to what I'm doing?
Thanks, everyone!

The Solution: One of the column names in the xlsx file being imported contained a space. Simply removing the space from the column name and saving the changes to the xlsx file allowed the PROC IMPORT code above to execute flawlessly, with the desired results imported into the named Netezza table.
NOTE: This fixed my problem, but it does not explain why the SAS log showed output for code that wasn't actually submitted.

Sounds like you should report to SAS the issue of not getting a proper ERROR message.
To make sure that your SAS/Netezza tables do not end up with variable names containing spaces, change the setting of the VALIDVARNAME option before running your program. That way PROC IMPORT will convert the column headings in the XLSX file into valid variable names.
options validvarname=v7;
libname out ...... ;
proc import out=out.table replace ...
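Putting it together with the question's own LIBNAME and file path (only a sketch; the connection details and path are the placeholders from above), the one change is setting VALIDVARNAME before the import so that column headings containing spaces are converted to valid variable names:

options validvarname=v7;   /* convert invalid column headings (e.g. with spaces) to valid names */

LIBNAME abc sasionza server=server database=db port=123 user=user pass=pass;

PROC IMPORT
    OUT = abc.DesiredTableName
    DATAFILE= "my/excelfile/file.xlsx"
    DBMS=xlsx
    REPLACE;
    SHEET="Sheet1";
    GETNAMES=YES;
RUN;

With VALIDVARNAME=V7, a heading such as a hypothetical "Product Name" should come through as Product_Name instead of making the step fail.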

How to combine two datasets into one in SAS

I have some SAS code from my editor here. I am learning to use SAS (this is my first time using it), so I'm not sure how much code is relevant.
proc import
datafile="C:\Users\barnedsm\Desktop\SAS\ToothGrowth.csv"
dbms=csv
out=tooth;
proc print data=tooth (obs=5);
run;
6. create two SAS data sets ToothGrowth_OJ and ToothGrowth_VC for the animals with the
delivery method orange juice and ascorbic acid, respectively. (5 points)
data ToothGrowth_OJ;
set tooth;
where (supp="OJ");
proc print data=ToothGrowth_OJ (obs=5);
run;
data ToothGrowth_VC;
set tooth;
where (supp="VC");
proc print data=ToothGrowth_VC (obs=5);
run;
7. save the two SAS data sets in a permanent folder on your computer. (5 points)
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data mylibr.ToothGrowth_OJ_permanent;
set ToothGrowth_OJ;
run;
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data mylibr.ToothGrowth_VC_permanent;
set ToothGrowth_VC;
run;
For the final question on my assignment, I want to re-combine the last two datasets I made (ToothGrowth_OJ and ToothGrowth_VC) into one dataset (ToothGrowth_combined). How would I do this? My thought was to use a subsetting approach like the one I used to separate the two. The code I have in mind is below.
data ToothGrowth_combined;
set ToothGrowth_OJ(where=(supp="OJ"));
keep supp Len;
run;
This would tell SAS to keep the observations from the ToothGrowth_OJ dataset that have OJ in the "supp" column (which is all of them) and to keep the variable Len. Assuming that I have done this correctly, I want to add in the values from my ToothGrowth_VC dataset in a similar way, but when I run the same code with "ToothGrowth_OJ" replaced by "ToothGrowth_VC", the output is an empty dataset. Is there a way to use this subsetting code to take these two separate datasets and combine them into one, or is there an easier way?
Your starting code does these steps:
Uses PROC IMPORT to guess how to read the text file into a dataset.
Creates a subset of the data with only some of the observations.
Creates a second subset of the data.
To recombine the two subsets use the SET statement and list all of the input datasets you want. To limit the number of variables written to the output dataset use a KEEP statement.
data ToothGrowth_combined;
set ToothGrowth_OJ ToothGrowth_VC ;
keep supp Len;
run;
I am not sure why you added the WHERE= dataset option in your code attempt, since, by the way they were created, they each only contain observations with a single value of SUPP.
If you want to combine the permanent datasets instead (for example if you started a new SAS session with an empty WORK library) then use those dataset names instead in the SET. Just make sure the libref that points to them is defined in this SAS session.
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data ToothGrowth_combined;
set mylibr.ToothGrowth_OJ_permanent mylibr.ToothGrowth_VC_permanent;
keep supp Len;
run;
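Either way, a quick sanity check (just a sketch) that both delivery methods made it into the combined dataset:
proc freq data=ToothGrowth_combined;
    tables supp;   /* expect two levels: OJ and VC */
run;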

Import Error even when variable is dropped SAS

I'm importing a semicolon-delimited file that looks like this:
ID Segment Number Date Payment
1 A1 103RTR 10OCT17 10
2 A1 205FCD 11OCT17 11
...
SAS doesn't like the mixture of numbers and characters when I import this txt file using this code:
proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
options validvarname=v7 missing='';
run;
Even though I'm not trying to load Number, which in the real dataset is much longer (about 12 digits followed by four characters), it returns this error in the log:
NOTE: Invalid data for Number in line 22157 21-30.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
ERROR: Import unsuccessful. See SAS Log for details.
I would like to do a typical INFILE and INFORMAT approach, but with 32 variables and 2 million rows I just cannot take the time to figure out what range and style each variable needs to be read in with. So I am asking whether there's a way to format that particular variable while sticking with the ease of PROC IMPORT.
But I'm also asking whether this actually impacts my import, as the data seems fine when I check the output.
I would like to do a typical INFILE and INFORMAT approach, but with 32 variables and 2 million rows I just cannot take the time to figure out what range and style each variable needs to be read in with. So I am asking whether there's a way to format that particular variable while sticking with the ease of PROC IMPORT.
Bad idea; garbage in = garbage out, and you're only dealing with 32 variables, so it's actually not that bad. Taking the time to clean and import the data correctly pays off, and you learn about the data in the process, which speeds up further analysis. This step is not a waste of time.
After importing a data set, it's a good idea to run a PROC MEANS and PROC FREQ and review the output to ensure it was read correctly.
proc means data=have;
run;
proc freq data=have;
run;
Set GUESSINGROWS=MAX in the PROC IMPORT. This forces SAS to scan the whole file before importing it, so the guessed variable types and lengths are much more likely to be correct. If you're automating this process and reading the file more than once, then take the DATA step code that PROC IMPORT writes to the log and use that instead of PROC IMPORT, once you've verified the data.
And the OPTIONS statement should not be inside the PROC IMPORT step; it goes before it.
options validvarname=v7 missing='';
proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
guessingrows=max;
run;
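If you do switch from PROC IMPORT to the DATA step it writes to the log, the copied code looks roughly like this (a sketch only; the informats and lengths below are illustrative guesses, not what SAS actually generated for your file):

data want;
    infile "have" delimiter=';' dsd truncover firstobs=2;
    /* informats shown here are placeholders; take the real ones from the log */
    informat ID best32. Segment $8. Number $16. Date date9. Payment best32.;
    format Date date9.;
    input ID Segment $ Number $ Date Payment;
    drop Number;   /* still read the field, just don't keep it */
run;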

Date in table is dd.mm.yyyy - Can't import to postgres via csv

I'm trying to add a .csv to a table in a database.
All dates in the .csv are in this format: dd.mm.yyyy (e.g. 18.10.2017).
I'm importing via pgadmin and always get an invalid input error.
I've tried to use almost all date formatting options for the column but without any luck.
I would rather not change the csv manually.
Can anyone help me with this?
I almost always import data into a staging table where all the columns are strings.
Then I use queries to load the final table.
This has several advantages:
It gives me much more control over how the data is transformed.
It makes it easier to debug problems -- the entire staging table can be queried to find all rows with a particular issue (for instance).
Additional validations can be performed before loading into the final table.
This is just a suggestion, but you might find that overall this takes less time.
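As a rough sketch of that pattern for this case (the table and column names below are made up; the point is only the dd.mm.yyyy handling):

-- staging table: every column as text
CREATE TABLE staging_import (
    id        text,
    some_date text
);

-- client-side load of the CSV into the staging table (psql)
\copy staging_import FROM 'data.csv' WITH (FORMAT csv, HEADER true)

-- convert while loading the real table
INSERT INTO target_table (id, some_date)
SELECT id::integer,
       to_date(some_date, 'DD.MM.YYYY')
FROM staging_import;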
The DateStyle setting is probably set to MDY. You can check this by running:
show datestyle;
Although dd.mm.yyyy isn't listed as a standard input format, if you expect it to work, you will need the DateStyle to line up with the ordering here (DMY).
The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.
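For example (a sketch; 'mydb' is a placeholder database name):

-- for the current session only
SET datestyle TO 'ISO, DMY';

-- or as a default for a whole database
ALTER DATABASE mydb SET datestyle TO 'ISO, DMY';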
See section "Date Order Conventions":
https://www.postgresql.org/docs/current/static/datatype-datetime.html

Data Conversion Failed SQL

I am using the import and export wizard and imported a large csv file. I get the following error.
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data
conversion for column "firms" returned status value 2 and status text "The
value could not be converted because of a potential loss of data.".
(SQL Server Import and Export Wizard)
Upon importing, I use the advanced tab and make all of the adjustments. As for the field in question, I set it as numeric(8,0). I have since gone through this process multiple times and tried 7, 8, 9, 10, and 11 to no avail. I imported the csv into Excel and looked at the respective column, firms. It shows no entry with more than 5 characters. I thought about making it DT_String but will need to manipulate that column eventually by averaging it. I have also searched for spaces or strange characters and found none.
Any other ideas?
1) Try changing the numeric precision to numeric(30,20) in both the source and destination tables.
2) Change the data type to str/wstr and adjust the output column width while importing. It will run fine. This happened to me as well while loading a large CSV file of approx. 5 GB. After the load, use the TRY_CONVERT function to convert the column back to numeric and check which values became NULL during the conversion; that will show you the root cause.
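For example, assuming the file was first loaded into a staging table dbo.staging_firms with firms as a string column (the names here are made up), a query like this lists the values that refuse to convert:

SELECT firms
FROM dbo.staging_firms
WHERE firms IS NOT NULL
  AND TRY_CONVERT(numeric(8, 0), firms) IS NULL;   -- rows that fail the numeric conversion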

Import SPSS data into SAS without Labels and Values

As I am new to SAS, I am having trouble importing SPSS data into SAS using the "proc import" command. The code I was using:
proc import datafile = "C:\Users\spss.sav"
out=work.test
dbms = sav
replace;
run;
The main problem is that when the data is imported into SAS, the variables show the value labels and not the underlying codes. So, for instance, if the variable "Gender" is coded 1=male, 2=female, each observation in SAS shows "female" or "male".
Now according to here:
Proc Import from SPSS
if the following code is added after the code above, then this problem ceases to exist:
proc datasets;
modify my_dataset;
format _all_;
quit;
What still remains is that the variables, instead of keeping their SPSS names, come into SAS carrying the labels that were assigned in SPSS. Is there any command that keeps the variable names in SAS instead of the SPSS labels?
It's possible that you are seeing column labels but that the underlying names still exist. You can modify your datasets procedure to remove the labels as well as the formats. Try this after your proc import:
proc datasets library = work;
modify test;
attrib _ALL_ label = " " format =;
run;
The attrib statement is applying a blank label and format to every variable.
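If you want to confirm that the underlying variable names survived the import, a quick check (sketch) is to list them with PROC CONTENTS:

proc contents data=work.test;
run;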
I had a similar problem. I had yearly SPSS datasets for a survey, and the same format, call it "Yearformat", would go 0=2011, 1=2012, ... for the 2011 data, but 0=2012, 1=2013, ... for the 2012 data, and so on. It seems like there should be a better solution, but what I did was:
In SPSS, save as "SAS 9 for Windows" and tick the option to output the formats to a SAS dataset, then apply/modify the formats as necessary along the way, mainly something like data datacopy; set data; newyear = put(year, yearformat.); run; to preserve the proper years.
But the point is, SPSS will create a SAS dataset without the formats and a script with the formats and code to apply/modify the dataset with those formats. So you have control over the process.