How to combine two datasets into one in SAS - merge

I have some SAS code from my editor here. I am learning to use SAS (this is my first time using it), so I'm not sure how much code is relevant.
proc import
datafile="C:\Users\barnedsm\Desktop\SAS\ToothGrowth.csv"
dbms=csv
out=tooth;
proc print data=tooth (obs=5);
run;
6. create two SAS data sets ToothGrowth_OJ and ToothGrowth_VC for the animals with the
delivery method orange juice and ascorbic acid, respectively. (5 points)
data ToothGrowth_OJ;
set tooth;
where (supp="OJ");
proc print data=ToothGrowth_OJ (obs=5);
run;
data ToothGrowth_VC;
set tooth;
where (supp="VC");
proc print data=ToothGrowth_VC (obs=5);
run;
7. save the two SAS data sets in a permanent folder on your computer. (5 points)
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data mylibr.ToothGrowth_OJ_permanent;
set ToothGrowth_OJ;
run;
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data mylibr.ToothGrowth_VC_permanent;
set ToothGrowth_VC;
run;
For the final question on my assignment, I want to recombine the last two datasets I made (ToothGrowth_OJ and ToothGrowth_VC) into one dataset (ToothGrowth_combined). How would I do this? My thought is to use a subset condition like the one I used to separate the two. The code I have in mind is below.
data ToothGrowth_combined;
set ToothGrowth_OJ(where=(supp="OJ"));
keep supp Len;
run;
This would tell SAS to keep the observations from the ToothGrowth_OJ dataset that have OJ in the "supp" column (which is all of them) and to keep the variables supp and Len. Assuming this code is correct, I want to add the values from my ToothGrowth_VC dataset in a similar way, but when I run the same code with "ToothGrowth_OJ" replaced by "ToothGrowth_VC", the output is an empty dataset. Is there a way to use the subset code to combine these two separate datasets into one, or is there an easier way?

Your starting code does these steps:
Uses PROC IMPORT to guess how to read the text file into a dataset.
Creates a subset of the data with only some of the observations.
Creates a second subset of the data.
To recombine the two subsets use the SET statement and list all of the input datasets you want. To limit the number of variables written to the output dataset use a KEEP statement.
data ToothGrowth_combined;
set ToothGrowth_OJ ToothGrowth_VC ;
keep supp Len;
run;
I am not sure why you added the WHERE= dataset option in your code attempt, since, given how the two datasets were created, each one contains only observations with a single value of SUPP.
If you want to combine the permanent datasets instead (for example if you started a new SAS session with an empty WORK library) then use those dataset names instead in the SET. Just make sure the libref that points to them is defined in this SAS session.
libname mylibr "C:\Users\barnedsm\Desktop\SAS";
data ToothGrowth_combined;
set mylibr.ToothGrowth_OJ_permanent mylibr.ToothGrowth_VC_permanent;
keep supp Len;
run;

Related

Import Error even when variable is dropped SAS

I'm importing a semi-colon delimited file as such
ID Segment Number Date Payment
1 A1 103RTR 10OCT17 10
2 A1 205FCD 11OCT17 11
...
SAS doesn't like the mixture of numbers and characters when I import this txt file using this code:
proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
options validvarname=v7 missing='';
run;
Even though I'm not trying to load Number, which in the real dataset is much longer (something like 12 digits followed by four characters), it returns this error in the log:
NOTE: Invalid data for Number in line 22157 21-30.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
ERROR: Import unsuccessful. See SAS Log for details.
I would like to do a typical infile and informat, but with 32 variables and 2 million rows I just cannot take the time to find out what range and style each variable needs to be read in. So I am asking whether there's a way to format that particular variable while sticking with the ease of proc import.
I'm also asking whether this actually impacts my import, as the data seems fine when I check the output.
I would like to do a typical infile and informat, but with 32
variables and 2 million rows I just cannot take the time to find
out what range and style each variable needs to be read in. So I am
asking whether there's a way to format that particular variable while
sticking with the ease of proc import.
Bad idea: garbage in, garbage out. You're only dealing with 32 variables, which is actually not that bad. Taking the time to clean and import the data correctly pays off, and you learn about the data in the process, which speeds up further analysis. This step is not a waste of time.
After importing a data set, it's a good idea to run PROC MEANS and PROC FREQ on the imported data set (WANT in your code) and review the output to ensure it was read correctly.
proc means data=want;
run;
proc freq data=want;
run;
Set GUESSINGROWS=MAX in the PROC IMPORT step. This forces SAS to scan the whole file before choosing variable types and lengths, so its guesses are more likely to be correct. If you're automating this process and reading the file more than once, take the DATA step code that PROC IMPORT writes to the log and use that instead of PROC IMPORT once you've verified the data.
Also, the OPTIONS statement should not be inside the PROC IMPORT step; it goes before it.
options validvarname=v7 missing='';
proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
guessingrows=max;
run;
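The advice above about replacing PROC IMPORT with a hand-written DATA step can be sketched roughly as follows. This is only an illustration under assumptions: the fileref sample, the temporary file, and the lengths ($2, $16) are hypothetical, and only the five columns shown in the question are read. The point is that declaring Number as character avoids the "Invalid data" note, because SAS no longer tries to read it as a number.

```sas
/* Hypothetical sample file so the sketch runs on its own */
filename sample temp;
data _null_;
  file sample;
  put 'ID;Segment;Number;Date;Payment';
  put '1;A1;103RTR;10OCT17;10';
  put '2;A1;205FCD;11OCT17;11';
run;

/* Read the file with explicit types instead of letting PROC IMPORT guess */
data want;
  infile sample dlm=';' dsd firstobs=2 truncover;
  length ID 8 Segment $2 Number $16 Date 8 Payment 8;  /* lengths are assumptions */
  informat Date date7.;   /* values such as 10OCT17 */
  format Date date9.;
  input ID Segment Number Date Payment;
run;
```

For the real file, the DATA step that PROC IMPORT prints to the log is a convenient starting point; only the informat and length for Number would need to be changed by hand.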

Using proc datasets copy on files that fit a certain condition

As the title suggests, I'd like to copy SAS tables from one library to another, but not all tables. For example, I'd like to copy only the tables whose names start with 's'.
I know that I have to use proc datasets copy, but which option, and how?
(English isn't my first language, so I'm sorry if my question isn't clear.)
It is probably easier to just use PROC COPY. You can use : as a wildcard in the SELECT statement.
proc copy inlib=work outlib=out;
select c: / mtype=data ;
run;
NOTE: Copying WORK.CHECK to OUT.CHECK (memtype=DATA).
NOTE: There were 3 observations read from the data set WORK.CHECK.
NOTE: The data set OUT.CHECK has 3 observations and 4 variables.
NOTE: Copying WORK.CLASS to OUT.CLASS (memtype=DATA).
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set OUT.CLASS has 19 observations and 5 variables.
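Since the question asked about PROC DATASETS specifically, the same selection can probably be written with its COPY statement. The sketch below is a minimal example under assumptions: the libref out, the copy_demo folder, and the table names sales, staff, and class are hypothetical and exist only so the example runs on its own.

```sas
/* Hypothetical setup: a target library and three WORK tables,
   two of which have names starting with s */
options dlcreatedir;
libname out "%sysfunc(pathname(work))/copy_demo";

data work.sales work.staff work.class;
  set sashelp.class;
run;

/* Copy only the data sets whose names start with s */
proc datasets library=work nolist;
  copy out=out memtype=data;
  select s: ;   /* the colon acts as a name-prefix wildcard */
run;
quit;
```

With this setup, SALES and STAFF should be copied to OUT while CLASS stays behind.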

Merging sas data sets without a key variable

I am attempting to merge two data sets without a single key variable. The data looks like this in both data sets:
study_id.....round....other variables different between the two sets
A000019....R012....etc
A000019....R013
A000047....R013
A000047....R014
A000047....R015
A000267....R014
This is my code...
DATA RAKAI.complete;
length study_id $ 8;
MERGE hivgps2 rccsdata;
BY study_id round;
RUN;
I've tried to merge by study_id and round, which are the only two variables shared across the data sets, but it just stacks the two sets, creating double the correct number of IDs. The combination of "study_id" and "round" provides a unique identifier, but no single variable does. Does it make the most sense to create a new unique ID by combining the two variables that are shared by both data sets?
Many Thanks
I realized I can post the code that I meant to deal with potential unwanted spaces here.
DATA hivgps2;
SET hivgps2;
study_id = compress(study_id);
round= compress(round);
RUN;
DATA rccsdata;
SET rccsdata;
study_id = compress(study_id);
round=compress(round);
RUN;
Your code is the correct format for merging by multiple variables. Records from both datasets are included, so if none of the keys match then the result will be the same as if you used SET instead of MERGE.
Are you sure that there is any overlap between the two sets of data? Check that your variables are the same length. If they are character then make sure the values are consistent in their use of upper and lower case letters. Make sure that the values do not have leading spaces or other non-printing characters. Also make sure you haven't attached a format to one of the datasets so that the values you see printed are not what is actually in the data.
In your clean up data steps you should force the length of the variables to be consistent. Also you can compress more than just spaces from the values. I like to eliminate anything that is not a normal 7-bit ASCII code. That will get rid of tabs, non-breaking spaces, nulls and other strange things. In normal 7-Bit ASCII the printable characters are between ! ('21'x or 33 decimal) and ~ ('7E'x or 126 decimal).
data hivgps2_clean ;
length study_id $10 round $5 ;
set hivgps2;
format study_id round ;
/* inner COMPRESS lists the characters outside printable ASCII; outer COMPRESS removes them */
study_id=upcase(compress(study_id,compress(study_id,collate(33,126))));
round=upcase(compress(round,compress(round,collate(33,126))));
run;
proc sort; by study_id round; run;
data rccsdata_clean;
length study_id $10 round $5 ;
set rccsdata;
format study_id round ;
/* inner COMPRESS lists the characters outside printable ASCII; outer COMPRESS removes them */
study_id=upcase(compress(study_id,compress(study_id,collate(33,126))));
round=upcase(compress(round,compress(round,collate(33,126))));
run;
proc sort; by study_id round; run;
data want;
merge hivgps2_clean(in=in1) rccsdata_clean(in=in2);
by study_id round;
run;
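To check whether any keys actually match, the IN= flags from a merge like the one above can be turned into a small diagnostic. The sketch below uses made-up stand-in datasets (a_demo, b_demo) and a hypothetical variable named source; it is meant only to show the idea, not the real data.

```sas
/* Made-up stand-ins for the two cleaned, sorted datasets */
data a_demo;
  length study_id $10 round $5;
  input study_id $ round $ x;
  datalines;
A000019 R012 1
A000019 R013 2
A000047 R013 3
;
run;

data b_demo;
  length study_id $10 round $5;
  input study_id $ round $ y;
  datalines;
A000019 R013 10
A000047 R013 20
A000047 R014 30
;
run;

/* Record where each merged row came from */
data merged_demo;
  merge a_demo(in=in1) b_demo(in=in2);
  by study_id round;
  length source $6;
  if in1 and in2 then source = 'both';
  else if in1 then source = 'a only';
  else source = 'b only';
run;

proc freq data=merged_demo;
  tables source;
run;
```

If the frequency table shows no rows marked both, the key values differ between the two datasets in some way (length, case, hidden characters), which is the situation described in the question.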
You can try that, or you can just use a proc sql join:
proc sql;
create table rakai.complete as select
a.*, b.*
from hivgps2 as a
full join rccsdata as b
on a.study_id = b.study_id and a.round = b.round;
quit;
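One thing to watch with a full join that selects a.* and b.* is that both tables contribute study_id and round, so SAS keeps the first copy, and rows that exist only in the second table end up with missing keys. A possible way around that, sketched here with made-up datasets (left_demo, right_demo) and explicitly listed columns, is to wrap the keys in COALESCE:

```sas
/* Made-up stand-ins for the two datasets */
data left_demo;
  length study_id $10 round $5;
  input study_id $ round $ x;
  datalines;
A000019 R012 1
A000019 R013 2
;
run;

data right_demo;
  length study_id $10 round $5;
  input study_id $ round $ y;
  datalines;
A000019 R013 10
A000047 R014 30
;
run;

/* COALESCE takes the key from whichever side has it */
proc sql;
  create table joined_demo as
  select coalesce(a.study_id, b.study_id) as study_id,
         coalesce(a.round, b.round)       as round,
         a.x,
         b.y
  from left_demo as a
       full join right_demo as b
       on a.study_id = b.study_id and a.round = b.round;
quit;
```

In this sketch the row for A000047 / R014, which exists only in right_demo, keeps its key values instead of coming out blank.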

Import SPSS data into SAS without Labels and Values

As I am new to SAS I am having trouble to import spss data into sas using the "proc import" command. The code I was using:
proc import datafile = "C:\Users\spss.sav"
out=work.test
dbms = sav
replace;
run;
The main problem is that when imported to sas, the datatable variables have the values and not the coding. So for instance if the variable "Gender" is coded 1=male 2=female, each observation in sas has "female" or "male".
Now according to here:
Proc Import from SPSS
if the following code is added after the code above, then this problem ceases to exist:
proc datasets;
modify my_dataset;
format _all_;
quit;
What still remains is that the Variable names from spss, instead of having their name, when imported to sas they have the labels that are assigned in spss. Is there any command that can keep the Names of the variables in SAS, instead of the SPSS labels?
It's possible that you are seeing column labels but that the underlying names still exist. You can modify your datasets procedure to remove the labels as well as the formats. Try this after your proc import:
proc datasets library = work;
modify test;
attrib _ALL_ label = " " format =;
run;
quit;
The attrib statement blanks the label and removes the format for every variable. The quit statement ends the procedure.
I had a similar problem. I had yearly SPSS datasets for a survey and the same format, call it "Yearformat" would go 0=2011, 1=2012, ... for the 2011 data, but 0=2012, 1=2013, ..., for 2012 data, etc. It seems like there should be a better solution, but what I did was ..
In SPSS, save the file as SAS 9 for Windows and tick the option to output the formats to a SAS dataset. I then applied or modified the formats as necessary along the way, mainly with a step such as data datacopy; set data; newyear = put(year, yearformat.); run; to preserve the proper years.
But the point is, SPSS will create a sas dataset without the formats and a script with the formats and code to apply/modify the dataset with those formats. So you have control over the process.

SAS Regression by class variable

I wish to perform multiple regressions conditionally based on the value of a categorical variable. So, for a simple example, consider the sashelp.class data. I need to perform a regression for males and another for females. Since my dataset has many more divisions and is much larger, I start by feeding the different types into macro variables:
proc sql;
select count(distinct Sex) into :numsex
from sashelp.class;
%let numsex=&numsex;
select distinct Sex into :sex1 - :sex&numsex
from sashelp.class;
quit;
Then I am trying to perform a regression on each one by looping them through. I know the commented out code works, but am unsure why my macro function is not working.
/**/
/*data dataF;*/
/* set sashelp.class;*/
/* where Sex='F';*/
/*run;*/
/**/
/*proc reg data=dataF outest=out1;*/
/* model Height=Weight;*/
/*run;*/
%macro regress;
%do i = 1 %to &numsex;
data data&&sex&i;
set sashelp.class;
where Sex='&&sex&i';
run;
proc reg data=data&&sex&i outest=out&i;
model Height=Weight;
run;
%end;
%mend;
%regress;
Also, if there is a better way to do this, then I'm all ears. The current way is a pain since I will have to combine all of my output sets of the estimates to get one dataset. Also, I get a bunch of intermediate datasets that I don't want or need.
Thanks.
Usually BY-group processing is the best way to do this, though I'm not sure if this is exactly what you're looking for:
proc sort data=sashelp.class out=class;
by sex;
run;
proc reg data=class outest=out1;
by sex;
model Height=Weight;
run;
Your macro failed because single quotes prevent macro variable resolution: '&sex' stays as the literal text &sex, whereas "&sex" resolves to "F".
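For completeness, here is a sketch of the original macro with only the quoting changed. It reuses the PROC SQL step from the question (with NOPRINT added) so that it runs on its own; the %local statement and the QUIT after PROC REG are small additions that are not in the original.

```sas
/* Load the distinct values of Sex into macro variables sex1, sex2, ... */
proc sql noprint;
  select count(distinct Sex) into :numsex from sashelp.class;
  %let numsex = &numsex;   /* strips the padding blanks */
  select distinct Sex into :sex1 - :sex&numsex from sashelp.class;
quit;

%macro regress;
  %local i;
  %do i = 1 %to &numsex;
    data data&&sex&i;
      set sashelp.class;
      where Sex = "&&sex&i";   /* double quotes let the macro variable resolve */
    run;

    proc reg data=data&&sex&i outest=out&i;
      model Height = Weight;
    run;
    quit;
  %end;
%mend regress;
%regress;
```

This still leaves one subset and one OUTEST= dataset per value, so the BY-group version above remains the tidier option when a single table of estimates is wanted.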