Hi, I am trying to use a BY statement in SAS to generate multiple graphs. I want to print each graph to an individual file named after the BY variable's value. I also want to add a footnote to each graph: the text "This graph is 2300-01" on graph 1, incremented by 1 for each subsequent graph ("This graph is 2300-02" and so on).
goptions reset=all border;
data grainldr;
length country $ 3 type $ 5;
input year country $ type $ amount;
megtons=amount/1000;
datalines;
1995 BRZ Wheat 1516
1995 BRZ Rice 11236
1995 BRZ Corn 36276
1995 CHN Wheat 102207
1995 CHN Rice 185226
1995 CHN Corn 112331
1995 INS Wheat .
1995 INS Rice 49860
1995 INS Corn 8223
1995 USA Wheat 59494
1995 USA Rice 7888
1995 USA Corn 187300
;
proc sort data=grainldr out=temp;
by country;
run;
proc sgplot data=temp (where=(megtons gt 31));
by country;
series x=type y= amount;
series x=type y=megtons;
title "Leading #byval(country) Producers"
j=c "1995 and 1996";
footnote1 j=r "This graph is 2300-&XY.";
run;
quit;
If you had a suitable BY variable in your data set, you could use it. For example, if you had a variable called CID (country ID) with values "01", "02", etc., you could do something like this:
proc sort data=grainldr out=temp;
by country cid;
run;
footnote1 j=r "This graph is 2300-#byval2";
proc sgplot data=temp (where=(megtons gt 31));
by country cid;
...
...
run;
In this case #BYVAL2 refers to the value of the second BY variable, i.e., CID.
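If the data set does not already contain such a variable, you can derive it from the sorted data. The following is a minimal sketch, assuming the grainldr data set from the question. CID is a new, illustrative variable name, and the WHERE= filter is moved into the DATA step so that the numbering matches the graphs actually produced. It covers only the incrementing footnote, not writing each graph to its own file.

```sas
proc sort data=grainldr out=temp;
  by country;
run;

/* Number the BY groups: CID is "01" for the first country, "02" for the next, ... */
/* The filter is applied here so countries with no plotted rows are not counted.  */
data temp;
  set temp(where=(megtons gt 31));
  by country;
  length cid $ 2;
  if first.country then n + 1;   /* sum statement: n is retained across rows */
  cid = put(n, z2.);             /* 1 -> "01", 2 -> "02", ... */
  drop n;
run;

options nobyline;   /* optional: suppress the default BY line above each graph */
title "Leading #byval(country) Producers";
footnote1 j=r "This graph is 2300-#byval(cid)";

proc sgplot data=temp;
  by country cid;   /* still sorted: CID increases along with COUNTRY */
  series x=type y=megtons;
run;
```

Because CID changes exactly when COUNTRY changes, adding it to the BY statement does not split any graph. It only makes its value available to #BYVAL.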
Part A:
Define a macro variable for the quarter number. The idea is that this is the only thing the "user" should have to change when running the program for a new quarter.
Part B:
Define macro variables for each month in the quarter and set them equal to a month value that is generated from the quarter number. Hint: %if/%then
Given code:
data Month1;
input Name $ sales;
cards;
Joyce 235
Marsha 352
Bill 491
Vernon 210
Sally 418
;
data Month2;
input Name $ sales;
cards;
Joyce 169
Marsha 281
Bill 315
Vernon 397
Sally 305
;
data Month3;
input Name $ sales;
cards;
Joyce 471
Marsha 314
Bill 394
Vernon 291
Sally 337
;
data Month4;
input Name $ sales;
cards;
Joyce 338
Marsha 259
Bill 310
Vernon 432
Sally 362
;
data Month5;
input Name $ sales;
cards;
Joyce 209
Marsha 355
Bill 302
Vernon 416
Sally 475
;
data Month6;
input Name $ sales;
cards;
Joyce 306
Marsha 472
Bill 351
Vernon 405
Sally 358
;
proc sql;
create table qtr1 as
select Month1.name, month1.sales as m1sales, month2.sales as m2sales,
month3.sales as m3sales, sum(month1.sales, month2.sales, month3.sales) as qtr1sales
from month1, month2, month3
where month1.name=month2.name=month3.name;
select sum(m1sales) as m1total, sum(m2sales) as m2total, sum(m3sales) as m3total,
sum(qtr1sales) as qtr1total
from qtr1;
My solution:
/* question a */
%MACRO qtrn(qtr);
proc print data=&qtr ;
run;
%MEND qtrn;
/* question b */
%Macro Firstqtr(qtr);
%Let I = 1;
%If &qtr = qtr1 %then %do %until (&I > 3);
%Let var&I = Month&I;
%let I = %eval(&I + 1);
%end;
%Mend Firstqtr;
%Firstqtr(qtr);
Can anyone help me figure out the correct solution?
Since this looks like a homework problem, here's the main part of your answer. I'll leave the final select for you to add. It should be pretty simple given the following solution:
%macro qtrSales(qtr);
%do i = 1 %to 3;
%let month&i = month%sysevalf((&qtr-1) * 3 + &i);
%put &&month&i;
%end;
proc sql;
create table qtr&qtr as
select &month1..name,
&month1..sales as &month1.sales,
&month2..sales as &month2.sales,
&month3..sales as &month3.sales,
sum(&month1..sales, &month2..sales, &month3..sales) as qtr&qtr.sales
from &month1, &month2, &month3
where &month1..name=&month2..name and &month1..name=&month3..name;
select sum(&month1.sales) as &month1.total,
sum(&month2.sales) as &month2.total,
sum(&month3.sales) as &month3.total,
sum(qtr&qtr.sales) as qtr&qtr.total
from qtr&qtr;
quit;
%mend qtrSales;
%qtrSales(2);
Defining a macro variable simply means using %let. Macro variables are what you create with %let, CALL SYMPUTX, or SELECT INTO in PROC SQL, and then reference with &.
%let qtrn = 3;
There you go. The question specified that the user will adjust this, right? So it isn't asking you to do any work on your end, just to give the user a place to make this change.
As for the second, I don't entirely understand the hint. It doesn't seem necessary to use conditional logic here. Here's an example of what I'd do.
%let month1 = %eval(3*(&qtrn.-1)+1);
That simply calculates the month number of the first month based on the quarter. Quarter 3 is months 7/8/9, right? 3*(3-1)+1 = 7, 3*(3-1)+2 = 8, 3*(3-1)+3 = 9. (Or you could do it differently: 3*3-2 = 7, 3*3-1 = 8, 3*3 = 9.)
Of course, you could do this in a macro with a loop to define them. But it seems excessive to do so - it's not like quarters ever have 4 months in them, or 2, right? They always have 3, it's a defining characteristic of a quarter, so it seems fine to hardcode month1/month2/month3.
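Putting the pieces together, here is a sketch of how the three hardcoded month variables could drive the query from the assignment. It assumes the Month1-Month6 data sets defined above and uses quarter 2 so that the referenced data sets exist. The table aliases a, b and c are illustrative and avoid the macro-variable/period ambiguity in names like &month1..sales.

```sas
/* The only value a user should change for a new quarter */
%let qtrn = 2;

/* Quarter q covers months 3*(q-1)+1, 3*(q-1)+2 and 3*(q-1)+3 */
%let month1 = %eval(3*(&qtrn.-1)+1);
%let month2 = %eval(3*(&qtrn.-1)+2);
%let month3 = %eval(3*(&qtrn.-1)+3);
%put month1=&month1 month2=&month2 month3=&month3;

/* Point the query at Month4, Month5 and Month6 when qtrn=2 */
proc sql;
  create table qtr&qtrn as
  select a.name,
         a.sales as m1sales,
         b.sales as m2sales,
         c.sales as m3sales,
         sum(a.sales, b.sales, c.sales) as qtrsales
  from month&month1 as a, month&month2 as b, month&month3 as c
  where a.name = b.name and a.name = c.name;

  select sum(m1sales) as m1total, sum(m2sales) as m2total,
         sum(m3sales) as m3total, sum(qtrsales) as qtrtotal
  from qtr&qtrn;
quit;
```

With qtrn=2, the %PUT line writes month1=4 month2=5 month3=6 to the log.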
data Month1;
input Name $ sales;
cards;
Joyce 235
Marsha 352
Bill 491
Vernon 210
Sally 418
;
data Month2;
input Name $ sales;
cards;
Joyce 169
Marsha 281
Bill 315
Vernon 397
Sally 305
;
data Month3;
input Name $ sales;
cards;
Joyce 471
Marsha 314
Bill 394
Vernon 291
Sally 337
;
data Month4;
input Name $ sales;
cards;
Joyce 338
Marsha 259
Bill 310
Vernon 432
Sally 362
;
data Month5;
input Name $ sales;
cards;
Joyce 209
Marsha 355
Bill 302
Vernon 416
Sally 475
;
data Month6;
input Name $ sales;
cards;
Joyce 306
Marsha 472
Bill 351
Vernon 405
Sally 358
;
options sgen;
%let qtr=qtr1;
%Macro ProcSql;
Proc Sql;
%if &qtr=qtr1 %then %do;
%let month1=month1;
%let month2=month2;
%let month3=month3;
%end;
%else %if &qtr=qtr2 %then %do;
%let month1=month4;
%let month2=month5;
%let month3=month6;
%end;
%else %if &qtr=qtr3 %then %do;
%let month1=month7;
%let month2=month8;
%let month3=month9;
%end;
%else %%if &qtr=qtr4 %then %do;
%let month1=month10;
%let month2=month11;
%let month3=month12;
%end;
create table &qtr as
select &month1.name, &month1.sales as m1sales, &month2.sales as m2sales,
&month3.sales as m3sales, sum(m1sales, m2sales, m3sales) as
qtrsales
from &month1, &month2, &month3
where &month1.name=&month2.name=&month3.name;
select sum(m1sales) as m1total, sum(m2sales) as m2total, sum(m3sales) as
m3total,
sum(qtrsales) as qtrtotal
from &qtr;
%mend ProcSql;
%ProcSql;
I am getting this error:
ERROR: Function SUM requires a numeric expression as argument 1.
ERROR: Function SUM requires a numeric expression as argument 2.
ERROR: Function SUM requires a numeric expression as argument 3.
ERROR: The following columns were not found in the contributing tables: m1sales, m2sales, m3sales.
ERROR: File WORK.QTR1.DATA does not exist.
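If you want to reference a value derived in the current SELECT statement, you need to add the CALCULATED keyword to your query. Also note that a single period ends a macro variable reference, so &month1.name resolves to month1name; use two periods (&month1..name) to get month1.name.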
create table &qtr as
select &month1..name
, &month1..sales as m1sales
, &month2..sales as m2sales
, &month3..sales as m3sales
, sum(calculated m1sales,calculated m2sales,calculated m3sales) as qtrsales
from &month1, &month2, &month3
where &month1..name=&month2..name
and &month1..name=&month3..name
;
Get rid of multiple datasets as early as possible.
I'd just concatenate the data into a single dataset. Having multiple identically structured datasets for multiple time periods (or other variables) is, in my experience, one of SAS's worst anti-patterns.
data sales;
set month1 (in=m1) month2 (in=m2) month3 (in=m3) month4 (in=m4) month5 (in=m5) month6 (in=m6);
if m1 then month=1;
if m2 then month=2;
if m3 then month=3;
if m4 then month=4;
if m5 then month=5;
if m6 then month=6;
qtr = ceil(month/3);
run;
With the data in one dataset it's much easier to manipulate. You can easily aggregate it in SQL:
proc sql;
create table monthly_sales as
select qtr,
month,
sum(sales) as monthly_sales
from sales
group by qtr, month ;
create table quarterly_sales as
select month,
qtr,
monthly_sales,
sum(monthly_sales) as quarterly_sales
from monthly_sales
group by qtr;
quit;
Or tabulate it:
proc tabulate data=sales;
var sales;
class month qtr;
table qtr*(month all='total')*sales=''*sum='';
run;
Or transpose it:
proc sort data=sales; by name; run;
proc transpose data=sales out=sales_wide prefix=month;
by name;
var sales;
id month;
run;
Use macros to generate code, not for control-flow
If you have to use macros, try using a macro to generate code inside a data step instead of looping over multiple datasets. Macros are meant to generate code; that's what they were designed for. They far too often get abused as a substitute for program control structures, which often leads to an unmaintainable mess.
Here I use a macro to generate the data step used to concatenate the months, where the number of months is a variable:
%macro myset(months);
set %do i=1 %to &months; month&i (in=m&i) %end; ;
%do i=1 %to &months;
if m&i then month=&i;
%end;
%mend;
data sales;
%myset(months=6);
qtr = ceil(month/3);
run;
If you turn on options mprint, you can see in the log that the generated code is the same as above.
I have two datasets, one with contracts and one with market prices. The gist of what I am trying to accomplish is to find the average value of a time series that corresponds to a period of time in a cross-sectional data set. Please see below.
Example Dataset 1:
Beginning Ending Price
1/1/2014 5/15/2014 $19.50
3/2/2012 10/9/2015 $20.31
...
1/1/2012 1/8/2012 $19.00
In the example above there are several contracts, the first spanning from January 2014 to May 2014, the second from March 2012 to October 2015. Each one has a single price. The second dataset has weekly market prices.
Example Dataset 2:
Date Price
1/1/2012 $18
1/8/2012 $17.50
....
1/15/2015 $21.00
I would like to find the average "market price" (i.e. the average of the price in dataset 2) between the beginning and ending period for each contract on dataset 1. So, for the third contract from 1/1/2012 to 1/8/2012, from the second dataset the output would be (18+17.50)/2 = 17.75. Then merge this value back to the original dataset.
I work with Stata, but can also work with R or Excel.
Also, if you have a better suggestion for a title I would really appreciate it!
You can cross the contract (cross-sectional) data with the time series, which forms every pairwise combination, then drop the prices that fall outside each contract's date range and calculate the mean, like this:
/* Fake Data */
tempfile ts ccs
clear
input str9 d p_daily
"1/1/2012" 18
"1/8/2012" 17.50
"1/15/2015" 21.00
end
gen date = date(d,"MDY")
format date %td
drop d
rename date d
save `ts'
clear
input id str8 bd str9 ed p_contract
1 "1/1/2014" "5/15/2014" 19.50
2 "3/2/2012" "10/9/2015" 20.31
3 "1/1/2012" "1/8/2012" 19.00
end
foreach var of varlist bd ed {
gen date = date(`var',"MDY")
format date %td
drop `var'
rename date `var'
}
save `ccs'
/* Calculate Mean Prices and Merge Contracts Back In */
cross using `ts'
sort id d
keep if d >= bd & d <=ed
collapse (mean) mean_p = p_daily, by(id bd ed p_contract)
merge 1:1 id using `ccs', nogen
sort id
This gets you something like this:
id p_contract bd ed mean_p
1 19.5 01jan2014 15may2014 .
2 20.31 02mar2012 09oct2015 21
3 19 01jan2012 08jan2012 17.75
I have two problems with SAS dates in macros. To make it more complicated, I am stuck with two specific macro variables that I need to use (it's part of the puzzle I am trying to solve).
The macro variables that I need to use are:
%let id=741852;
%let month=January February March April May June July August September October November December;
The output I need to generate is the grade results for students in different disciplines. Changing only the student's ID should update the output automatically.
The date information is only needed in the titles of my output. My code at the moment is as follows:
Title1 "Grade for &firstname &lastname";
Title2 "Birthdate : &bday";
Title3 "ID :&id";
title5 "As of &sysdate, the grades are:";
To create the bday macro variable I used CALL SYMPUTX, since I had the information in my data set:
CALL SYMPUTX('bday',Birth_date);
At the moment, titles 2 and 5 of my output read as follows:
Birtdate:12556
As of 17NOV12, the grades are:
How can I use the macro variable &month so that both titles read as follows: "Birthdate: 10 January 2012" and "As of 15 November 2012, the grades are:"?
(The dates may seem wrong, but I am working in French, where the day comes before the month.)
I thought of the %SCAN function, but it won't update the month if I change the ID. Please help :)
It's not clear to me what exactly you are trying to accomplish, but here is an example of something similar. I set the locale to French to show how the date is formatted.
data a;
length firstname lastname $20;
input id firstname $ lastname $ grade birthday :date9. ;
datalines;
741852 Mary Jones 92.3 01Jan1980
654654 Chuck Berry 76.9 02Mar1983
823983 Michael Jordan 81.2 04Apr1965
;
run;
options locale=FR;
%macro printinfo(id, ds);
data _null_;
set &ds;
where id=&id;
put "-----------------------------------";
put " Grade for: " firstname lastname;
put " Birthday : " birthday nldate.;
put " ID : " id;
put " As of &sysdate., the grade is: " grade;
put "-----------------------------------";
put " ";
run;
%mend;
option nonotes;
%printinfo(741852,a);
%printinfo(654654,a);
option notes;
Here is the log output:
-----------------------------------
Grade for: Mary Jones
Birthday : 01 janvier 1980
ID : 741852
As of 20NOV12, the grade is: 92.3
-----------------------------------
7299 %printinfo(654654,a);
-----------------------------------
Grade for: Chuck Berry
Birthday : 02 mars 1983
ID : 654654
As of 20NOV12, the grade is: 76.9
-----------------------------------
Without changing your other code, try these two title statements:
title2 "Birthdate: %qleft(%sysfunc(putn(&bday,worddatx.)))";
title5 "As of %qleft(%sysfunc(putn(%sysfunc(today()),worddatx.))) the grades are:";
Basically, your first macro variable bday needs to be formatted using the WORDDATX format. Also, you should use the system function TODAY() to get the current system date so you can format it as you want.
The %SYSFUNC macro function lets you execute other SAS functions, in this case PUTN and TODAY(). The %QLEFT macro function trims leading blanks.
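If you must use the &month list itself, %SCAN can still follow the ID. Instead of hardcoding the position, compute it from the date with MONTH(). The following is a sketch, assuming bday holds a numeric SAS date as produced by the CALL SYMPUTX call in the question. The value 12556 and the helper variables bday_txt, today and today_txt are hypothetical and exist only for illustration.

```sas
%let month=January February March April May June July August September October November December;

/* Hypothetical sample value; in the real program CALL SYMPUTX sets bday */
%let bday=12556;

/* MONTH() returns 1-12, which selects the matching word from &month,  */
/* so the text changes whenever bday changes (i.e. when the ID changes) */
%let bday_txt=%sysfunc(day(&bday)) %scan(&month, %sysfunc(month(&bday))) %sysfunc(year(&bday));

/* Same idea for the current date, using TODAY() instead of &sysdate */
%let today=%sysfunc(today());
%let today_txt=%sysfunc(day(&today)) %scan(&month, %sysfunc(month(&today))) %sysfunc(year(&today));

title2 "Birthdate: &bday_txt";
title5 "As of &today_txt, the grades are:";
```

This gives the day-month-year order asked for, with English month names taken from &month.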
I'd like to assign a person's name to a number based on a range rather than an explicit number. It's possible to do this using formats, but as I have the names in a dataset I'd prefer to avoid the manual process of writing the proc format.
data names;
input low high name $;
datalines;
1 10 John
11 20 Paul
21 30 George
31 40 Ringo
;
data numbers;
input number;
datalines;
33
21
17
5
;
The desired output is:
data output;
input number name $;
datalines;
33 Ringo
21 George
17 Paul
5 John
;
Thanks for any help.
You can do it like this using PROC SQL:
proc sql;
create table output as
select numbers.number, names.name
from numbers left join names
on numbers.number ge names.low
and numbers.number le names.high
;
quit;
One handy feature of proc format is the ability to use a data set to create the format, instead of typing it in by hand. Your situation seems like a perfect fit for this feature.
In the example you give, a few small changes to the "names" data set will put it in a form that can be read by proc format.
For example, if I modify the names data set like so:
data names;
retain fmtname "names" type "N";
input start end label $;
datalines;
1 10 John
11 20 Paul
21 30 George
31 40 Ringo
;
I can then issue this command to build the format based on it.
proc format cntlin=names;run;
Now I can use this format just like you would with any other format. For example, to create a new column that contains the desired "name" based on the number, you could do this:
data numbers;
input number;
number_formatted=put(number,names.);
datalines;
33
21
17
5
;
Here is what the output would look like:
number  number_formatted
33      Ringo
21      George
17      Paul
5       John
Update to address question:
Reading from a text file needs very little change in the code. We just need to make sure the output data set has the variable names that proc format expects (fmtname, type, start, end, and label).
For example, if I have an external comma-separated file called "names.csv" that looks like this:
1,10,John
11,20,Paul
21,30,George
31,40,Ringo
Then I can simply change the code that creates the "names" data set so that it looks like this:
data names;
retain fmtname "names" type "N";
infile "<path to file>/names.csv" dsd;
input start end label $;
run;
Now I can run proc format with the cntlin option like I did before:
proc format cntlin=names;run;
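If you would rather leave the original names data set from the question untouched (with variables LOW, HIGH and NAME), you can rename the variables on the way into a separate control data set. This is a minimal sketch. The data set name names_cntlin is hypothetical, and the code assumes names still has its original layout, not the modified one above.

```sas
/* Map the question's variables onto the names PROC FORMAT expects */
data names_cntlin;
  set names(rename=(low=start high=end name=label));
  retain fmtname "names" type "N";   /* numeric format called NAMES. */
run;

proc format cntlin=names_cntlin;
run;

/* Apply the format to look up each number's name */
data output;
  set numbers;
  length name $ 8;
  name = put(number, names.);
run;
```

This keeps the lookup table maintained in one place (the data set), so adding a range only requires adding a row to names and rerunning PROC FORMAT.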
I think SQL is indeed more succinct, but if you aren't a big fan of it and the numbers come in known increments, you could try something like:
data ranges;
set names;
do number = low to high; /* by ... */
output;
end;
proc sort;
by number;
run;
proc sort data=numbers;
by number;
run;
data output;
merge ranges
numbers ( in = innum )
;
by number;
keep number name;
if innum;
run;
Again, it requires numbers to come in predetermined increments, e.g. integers.