Part A:
Define a macro variable for the quarter number. The idea is that this is the only thing the "user" should have to change when running the program for a new quarter.
Part B:
Define macro variables for each month in the quarter and set them equal to a month value that is generated from the quarter number. Hint: %if/%then
Given code:
data Month1;
input Name $ sales;
cards;
Joyce 235
Marsha 352
Bill 491
Vernon 210
Sally 418
;
data Month2;
input Name $ sales;
cards;
Joyce 169
Marsha 281
Bill 315
Vernon 397
Sally 305
;
data Month3;
input Name $ sales;
cards;
Joyce 471
Marsha 314
Bill 394
Vernon 291
Sally 337
;
data Month4;
input Name $ sales;
cards;
Joyce 338
Marsha 259
Bill 310
Vernon 432
Sally 362
;
data Month5;
input Name $ sales;
cards;
Joyce 209
Marsha 355
Bill 302
Vernon 416
Sally 475
;
data Month6;
input Name $ sales;
cards;
Joyce 306
Marsha 472
Bill 351
Vernon 405
Sally 358
;
proc sql;
create table qtr1 as
select Month1.name, month1.sales as m1sales, month2.sales as m2sales,
month3.sales as m3sales, sum(month1.sales, month2.sales, month3.sales) as qtr1sales
from month1, month2, month3
where month1.name=month2.name=month3.name;
select sum(m1sales) as m1total, sum(m2sales) as m2total, sum(m3sales) as m3total,
sum(qtr1sales) as qtr1total
from qtr1;
My solution:
/* question a */
%MACRO qtrn(qtr);
proc print data=&qtr ;
run;
%MEND qtrn;
/* question b */
%Macro Firstqtr(qtr);
%Let I = 1;
%If &qtr = qtr1 %then %do %until (&I > 3);
%Let var&I = Month&I;
%let I = %eval(&I + 1);
%end;
%Mend Firstqtr;
%Firstqtr(qtr);
Can anyone help me figure out the correct solution?
Since this looks like a homework problem, here's the main part of your answer. I'll leave the final select for you to add. Should be pretty simple given the following solution:
%macro qtrSales(qtr);
%do i = 1 %to 3;
%let month&i = month%sysevalf((&qtr-1) * 3 + &i);
%put &&month&i;
%end;
proc sql;
create table qtr&qtr as
select &month1..name,
&month1..sales as &month1.sales,
&month2..sales as &month2.sales,
&month3..sales as &month3.sales,
sum(&month1..sales, &month2..sales, &month3..sales) as qtr&qtr.sales
from &month1, &month2, &month3
where &month1..name=&month2..name=&month3..name;
select sum(&month1.sales) as &month1.total,
sum(&month2.sales) as &month2.total,
sum(&month3.sales) as &month3.total,
sum(qtr&qtr.sales) as qtr&qtr.total
from qtr&qtr;
quit;
%mend qtrSales;
%qtrSales(2);
To "define a macro variable" simply means to use %let. Macro variables are things that you define with %let, call symputx, or select into in SQL, and then reference using &.
%let qtrn = 3;
There you go. The question specified that the user will adjust this, right? So it isn't asking you to do any work on your end, just give the user a place to make this change.
As for the second, I don't entirely understand the hint. It doesn't seem necessary to use conditional logic here. Here's an example of what I'd do.
%let month1 = %eval(3*(&qtrn.-1)+1);
That simply calculates the month number of the first month based on the quarter. Quarter 3 is months 7/8/9, right? 3*(3-1)+1 = 7, 3*(3-1)+2 = 8, 3*(3-1)+3 = 9. (Or you could do it differently: 3*3-2 = 7, 3*3-1 = 8, 3*3 = 9.)
Of course, you could do this in a macro with a loop to define them. But it seems excessive to do so - it's not like quarters ever have 4 months in them, or 2, right? They always have 3, it's a defining characteristic of a quarter, so it seems fine to hardcode month1/month2/month3.
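Hardcoding the three assignments might look like the sketch below. It assumes months are numbered 1 to 12 across the year and that qtrn is set with %let as shown earlier; the %put line is only there to check the result in the log.

```sas
%let qtrn = 3;                            /* the only value the user edits */

/* First, second and third month of the quarter */
%let month1 = %eval(3*(&qtrn - 1) + 1);
%let month2 = %eval(3*(&qtrn - 1) + 2);
%let month3 = %eval(3*(&qtrn - 1) + 3);

%put month1=&month1 month2=&month2 month3=&month3;
/* log: month1=7 month2=8 month3=9 */
```

If data set names such as month7 are wanted rather than bare numbers, the literal text can be prefixed: %let month1 = month%eval(3*(&qtrn - 1) + 1);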
data Month1;
input Name $ sales;
cards;
Joyce 235
Marsha 352
Bill 491
Vernon 210
Sally 418
;
data Month2;
input Name $ sales;
cards;
Joyce 169
Marsha 281
Bill 315
Vernon 397
Sally 305
;
data Month3;
input Name $ sales;
cards;
Joyce 471
Marsha 314
Bill 394
Vernon 291
Sally 337
;
data Month4;
input Name $ sales;
cards;
Joyce 338
Marsha 259
Bill 310
Vernon 432
Sally 362
;
data Month5;
input Name $ sales;
cards;
Joyce 209
Marsha 355
Bill 302
Vernon 416
Sally 475
;
data Month6;
input Name $ sales;
cards;
Joyce 306
Marsha 472
Bill 351
Vernon 405
Sally 358
;
options sgen;
%let qtr=qtr1;
%Macro ProcSql;
Proc Sql;
%if &qtr=qtr1 %then %do;
%let month1=month1;
%let month2=month2;
%let month3=month3;
%end;
%else %if &qtr=qtr2 %then %do;
%let month1=month4;
%let month2=month5;
%let month3=month6;
%end;
%else %if &qtr=qtr3 %then %do;
%let month1=month7;
%let month2=month8;
%let month3=month9;
%end;
%else %if &qtr=qtr4 %then %do;
%let month1=month10;
%let month2=month11;
%let month3=month12;
%end;
create table &qtr as
select &month1.name, &month1.sales as m1sales, &month2.sales as m2sales,
&month3.sales as m3sales, sum(m1sales, m2sales, m3sales) as
qtrsales
from &month1, &month2, &month3
where &month1.name=&month2.name=&month3.name;
select sum(m1sales) as m1total, sum(m2sales) as m2total, sum(m3sales) as
m3total,
sum(qtrsales) as qtrtotal
from &qtr;
%mend ProcSql;
%ProcSql;
I am getting these errors:
ERROR: Function SUM requires a numeric expression as argument 1.
ERROR: Function SUM requires a numeric expression as argument 2.
ERROR: Function SUM requires a numeric expression as argument 3.
ERROR: The following columns were not found in the contributing tables: m1sales, m2sales, m3sales.
ERROR: File WORK.QTR1.DATA does not exist.
If you want to reference a value derived in the current SELECT statement, you need to add the CALCULATED keyword to your query. A table-qualified column also needs two periods after the macro variable, because the first period only ends the macro variable reference.
create table &qtr as
select &month1..name
, &month1..sales as m1sales
, &month2..sales as m2sales
, &month3..sales as m3sales
, sum(calculated m1sales,calculated m2sales,calculated m3sales) as qtrsales
from &month1, &month2, &month3
where &month1..name=&month2..name
and &month1..name=&month3..name
;
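The way the macro processor treats a period after a macro variable name can be checked in the log with %put. This is a small sketch; the value month4 is just an example.

```sas
%let month1 = month4;

/* One period: it only terminates the macro variable name and is consumed */
%put &month1.name;     /* log: month4name  */

/* Two periods: the first terminates the name, the second stays in the text */
%put &month1..name;    /* log: month4.name */
```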
Get rid of multiple datasets as early as possible.
I'd just concatenate the data into a single dataset. Having multiple identical datasets for multiple time periods (or other variables) is in my experience one of SAS's worst anti-patterns.
data sales;
set month1 (in=m1) month2 (in=m2) month3 (in=m3) month4 (in=m4) month5 (in=m5) month6 (in=m6);
if m1 then month=1;
if m2 then month=2;
if m3 then month=3;
if m4 then month=4;
if m5 then month=5;
if m6 then month=6;
qtr = ceil(month/3);
run;
With the data in one dataset it's much easier to manipulate. You can easily aggregate it in SQL:
proc sql;
create table monthly_sales as
select qtr,
month,
sum(sales) as monthly_sales
from sales
group by qtr, month;
create table quarterly_sales as
select month,
qtr,
monthly_sales,
sum(monthly_sales) as quarterly_sales
from monthly_sales
group by qtr;
quit;
Or tabulate it:
proc tabulate data=sales;
var sales;
class month qtr;
table qtr*(month all='total')*sales=''*sum='';
run;
Or transpose it:
proc sort data=sales; by name;
proc transpose data=sales out=sales_wide;
by name;
var sales;
id month;
run;
Use macros to generate code, not for control-flow
If you have to use macros, try using a macro to generate code inside a data step instead of looping over multiple datasets. Macros are supposed to be used to generate code; that's what they were designed for. They far too often get abused as a proxy for program control structures, which often leads to an unmaintainable mess.
Here I use a macro to generate the data step used to concatenate the months, where the number of months is a variable:
%macro myset(months);
set %do i=1 %to &months; month&i (in=m&i) %end; ;
%do i=1 %to &months;
if m&i then month=&i;
%end;
%mend;
data sales;
%myset(months=6);
qtr = ceil(month/3);
run;
If you use options mprint you can see that the generated code is the same as above.
I am given 2 dates, a start date and an end date.
I would like to know the date that ends the first 35-day period, then the date that ends each subsequent 30-day period.
I have:
start end
22-Jun-15 22-Oct-15
9-Jan-15 15-May-15
I want:
start end tik1 tik2 tik3 tik4
22-Jun-15 22-Oct-15 27-Jul-15 26-Aug-15 25-Sep-15
9-Jan-15 15-May-15 13-Feb-15 15-Mar-15 14-Apr-15 14-May-15
I am fine with the dates calculations but my real issue is creating a variable and incrementing its name. I decided to include my whole problem because I thought it might be easier to explain in its context.
You can solve the problem with the following logic:
1) Determine the number of columns to add.
2) Calculate the values for those columns based on the requirement.
data test;
input start end;
informat start date9. end date9.;
format start date9. end date9.;
datalines;
22-Jun-15 22-Oct-15
09-Jan-15 15-May-15
;
run;
/*******Determining number of columns*******/
data noc_cal;
set test;
no_of_col = floor((end-start)/30);
run;
proc sql;
select max(no_of_col) into: number_of_columns from noc_cal;
quit;
/*******Making an array where 1st iteration(tik1) is increased by 35days whereas others are incremented by 30days*******/
data test1;
set test;
array tik tik1-tik%sysfunc(COMPRESS(&number_of_columns.));
format tik: date9.;
tik1 = intnx('day',START,35);
do i= 2 to %sysfunc(COMPRESS(&number_of_columns.));
tik[i]= intnx('day',tik[i-1],30);
if tik[i] > end then tik[i]=.;
end;
drop i;
run;
Alternative way (in case you don't want to use proc sql):
data test;
input start end;
informat start date9. end date9.;
format start date9. end date9.;
datalines;
22-Jun-15 22-Oct-15
09-Jan-15 15-May-15
;
run;
/*******Determining number of columns*******/
data noc_cal;
set test;
no_of_col = floor((end-start)/30);
run;
proc sort data=noc_cal;
by no_of_col;
run;
data _null_;
set noc_cal;
by no_of_col;
if last.no_of_col;
call symputx('number_of_columns',no_of_col);
run;
/*******Making an array where 1st iteration(tik1) is increased by 35days whereas others are incremented by 30days*******/
data test1;
set test;
array tik tik1-tik%sysfunc(COMPRESS(&number_of_columns.));
format tik: date9.;
tik1 = intnx('day',START,35);
do i= 2 to %sysfunc(COMPRESS(&number_of_columns.));
tik[i]= intnx('day',tik[i-1],30);
if tik[i] > end then tik[i]=.;
end;
drop i;
run;
My output:
start      end        tik1       tik2       tik3       tik4
22Jun2015  22Oct2015  27Jul2015  26Aug2015  25Sep2015
09Jan2015  15May2015  13Feb2015  15Mar2015  14Apr2015  14May2015
I tend to prefer long vertical structures. I would approach it like:
data want;
set have;
tik=start+35;
do while(tik<=end);
output;
tik=tik+30;
end;
format tik mmddyy10.;
run;
If you really need it wide, you could transpose that dataset in a second step.
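That second step could be a PROC TRANSPOSE along the lines of the sketch below. The have data set is rebuilt here from the dates in the question so the example runs on its own; the names want_wide and the tik prefix are my own choices, and the display format is switched to date9. to match the question.

```sas
/* Stand-in for the asker's data */
data have;
  input start :date9. end :date9.;
  format start end date9.;
  datalines;
22Jun2015 22Oct2015
09Jan2015 15May2015
;
run;

/* Long form: one row per tick date */
data want;
  set have;
  tik = start + 35;
  do while (tik <= end);
    output;
    tik = tik + 30;
  end;
  format tik date9.;
run;

/* Wide form: with PREFIX= and no ID statement the columns are tik1, tik2, ... */
proc transpose data=want out=want_wide(drop=_name_) prefix=tik;
  by start end notsorted;   /* rows for each start/end pair are already contiguous */
  var tik;
run;
```

A start/end pair whose first tick falls after end produces no rows in want, so it would be missing from want_wide as well.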
I'm having some frustration with dates in SAS.
I am using proc forecast and am trying to make my dates spread evenly. I did some pre-processing with proc sql to get my counts by month, but my dates are incorrect.
Though my dataset looks good (because I used format MONYY.), the actual value of that variable is wrong.
date year month count
Jan10 2010 1 100
Feb10 2010 2 494
...
..
.
The Date value is actually the full SAS representation of the date (18267), meaning that it includes the day count.
Do I need to convert the variable to a string and back to a date, or is there a quick proc I can run?
My goal is to use the date variable with proc forecast so I only want Month and year.
Thanks for any help!
A SAS date value is the number of days elapsed since 1 January 1960, so you can't define a date variable that excludes the day.
What you can do is hide the day with a format like monyy., but the underlying number will always contain that information.
Maybe you can use the interval=month option in proc forecast?
Please add some detail about the problem you're encountering with the forecast procedure.
EDIT: check this example:
data past;
keep date sales;
format date monyy5.;
lu = 0;
n = 25;
do i = -10 to n;
u = .7 * lu + .2 * rannor(1234);
lu = u;
sales = 10 + .10 * i + u;
date = intnx( 'month', '1jul1991'd, i - n );
if i > 0 then output;
end;
run;
proc forecast data=past interval=month lead=10 out=pred;
var sales;
id date;
run;
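If the worry is that the day of the month differs from row to row, one possible fix is to align every date to the first of its month with INTNX before forecasting. This is an assumption about the asker's data, not something stated in the thread, and the data set names and values below are made up for illustration.

```sas
/* Hypothetical monthly counts whose dates carry arbitrary day values */
data counts;
  input date :date9. count;
  format date date9.;
  datalines;
05Jan2010 100
17Feb2010 494
;
run;

/* Move each date to the first day of its month; the value stays a SAS date */
data counts_aligned;
  set counts;
  date = intnx('month', date, 0, 'beginning');
  format date monyy5.;
run;
```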
Hi, I am trying to use a BY statement in SAS to generate multiple graphs. I want to print each graph to an individual file named after the BY variable value. I also want to add a footnote to each graph: the text "This graph is 2300-01" on graph 1, then incremented by 1 for the next graph to "This graph is 2300-02", and so on.
goptions reset=all border;
data grainldr;
length country $ 3 type $ 5;
input year country $ type $ amount;
megtons=amount/1000;
datalines;
1995 BRZ Wheat 1516
1995 BRZ Rice 11236
1995 BRZ Corn 36276
1995 CHN Wheat 102207
1995 CHN Rice 185226
1995 CHN Corn 112331
1995 INS Wheat .
1995 INS Rice 49860
1995 INS Corn 8223
1995 USA Wheat 59494
1995 USA Rice 7888
1995 USA Corn 187300
;
proc sort data=grainldr out=temp;
by country;
run;
proc sgplot data=temp (where=(megtons gt 31));
by country;
series x=type y= amount;
series x=type y=megtons;
title "Leading #byval(country) Producers"
j=c "1995 and 1996";
footnote1 j=r "This graph is 2300-&XY.";
run;
quit;
If you had a BY variable in your data set you could use it. For example, if you had a variable called CID (country id), and it had values "01", "02" etc, you could then do something like this:
proc sort data=grainldr out=temp;
by country cid;
run;
footnote1 j=r "This graph is 2300-#byval2";
proc sgplot data=temp (where=(megtons gt 31));
by country cid;
...
...
run;
In this case #BYVAL2 refers to the value of the second BY variable, i.e. CID
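If the data has no such variable, it could be derived from the BY groups with a counter. The sketch below is one way to do that under my own assumptions: temp is a small stand-in for the sorted data from the question (already filtered to megtons gt 31), and group_no and temp_numbered are names I made up.

```sas
/* Small stand-in for the sorted, filtered TEMP data set */
data temp;
  input country $ type $ megtons;
  datalines;
BRZ Corn 36.276
CHN Wheat 102.207
CHN Rice 185.226
USA Wheat 59.494
;
run;

/* Number the BY groups 01, 02, ... in sort order */
data temp_numbered;
  set temp;
  by country;
  if first.country then group_no + 1;   /* sum statement: counter is retained */
  cid = put(group_no, z2.);              /* zero-padded text: "01", "02", ... */
  drop group_no;
run;
```

Plotting temp_numbered with by country cid; then lets #BYVAL2 supply the running number in the footnote. The filter is applied before numbering so that no number is spent on a country that has no rows left to plot.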
I'd like to assign a person's name to a number based on a range rather than an explicit number. It's possible to do this using formats, but as I have the names in a dataset I'd prefer to avoid the manual process of writing the proc format.
data names;
input low high name $;
datalines;
1 10 John
11 20 Paul
21 30 George
31 40 Ringo
;
data numbers;
input number;
datalines;
33
21
17
5
;
The desired output is:
data output;
input number name $;
datalines;
33 Ringo
21 George
17 Paul
5 John
;
Thanks for any help.
You can do it like this using PROC SQL:
proc sql;
create table output as
select numbers.number, names.name
from numbers left join names
on numbers.number ge names.low
and numbers.number le names.high
;
quit;
One handy feature of proc format is the ability to use a data set to create the format, instead of typing it in by hand. Your scenario seems like a perfect scenario for this feature.
In the example you give, a few small changes to the "names" data set will put it in a form that can be read by proc format.
For example, if I modify the names data set like so..
data names;
retain fmtname "names" type "N";
input start end label $;
datalines;
1 10 John
11 20 Paul
21 30 George
31 40 Ringo
;
I can then issue this command to build the format based on it.
proc format cntlin=names;run;
Now I can use this format just like you would with any other format. For example, to create a new column that contains the desired "name" based on the number, you could do this:
data numbers;
input number;
number_formatted=put(number,names.);
datalines;
33
21
17
5
;
Here is what the output would look like:
number_
number formatted
33 Ringo
21 George
17 Paul
5 John
Update to address question:
There isn't much difference in the coding needed to read from a text file. We just need to set it up so that the output data set has the particular variable names that proc format expects (fmtname, type, start, end, and label).
For example, if I have an external comma-separated file called "names.csv" that looks like this:
1,10,John
11,20,Paul
21,30,George
31,40,Ringo
Then I simply can change the code that creates the "names" data set so that it looks like this:
data names;
retain fmtname "names" type "N";
infile "<path to file>/names.csv" dsd;
input start end label $;
run;
Now I can run proc format with the cntlin option like I did before:
proc format cntlin=names;run;
I think SQL is indeed more succinct, but if you aren't a big fan of it and the numbers come in known increments, you may try something like:
data ranges;
set names;
do number = low to high; /* by ... */
output;
end;
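run;
proc sort data=ranges;
by number;
run;
/* MERGE with a BY statement needs both inputs sorted by number */
proc sort data=numbers;
by number;
run;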
data output;
merge ranges
numbers ( in = innum )
;
by number;
keep number name;
if innum;
run;
Again, it requires numbers to come in predetermined increments, e.g. integers.