I have two plots that I would like to merge into one. Each plot shows the proportion of present / not-present observations by their corresponding cumulative test results for the year.
On the merged plot I would like to see bars side by side for each group of test scores, comparing present with not-present.
To illustrate the problem, this is what I have currently:
data test_scores;
do i = 1 to 200;
score = ranuni(200);
output;
end;
drop i;
run;
data test_scores_2;
set test_scores;
if _n_ le 100 then flag = 0;
else flag = 1;
run;
data test_scores_2_0 test_scores_2_1;
set test_scores_2;
if flag = 0 then output test_scores_2_0;
else if flag = 1 then output test_scores_2_1;
run;
PROC GCHART
DATA=test_scores_2_0
;
VBAR
score
/
CLIPREF
FRAME
LEVELS=20
TYPE=PCT
COUTLINE=BLACK
RAXIS=AXIS1
MAXIS=AXIS2
;
RUN;
QUIT;
PROC GCHART
DATA=test_scores_2_1
;
VBAR
score
/
CLIPREF
FRAME
LEVELS=20
TYPE=PCT
COUTLINE=BLACK
RAXIS=AXIS1
MAXIS=AXIS2
;
RUN;
QUIT;
The bars should sum to 100% for present, and separately to 100% for not-present.
TIA
proc sgplot to the rescue. Use the group= option to specify two separate groups. Set the transparency to 50% so one histogram does not cover the other.
proc sgplot data=test_scores_2;
histogram score / group=flag transparency=0.5 binwidth=.05;
run;
With Proc GCHART you can use VBAR options GROUP= and G100 to get bars that represent percent within group. This is useful when the groups have different counts.
The SUBGROUP= option splits the vertical bar according to the different values of the subgroup variable, and produces automatic coloration and legend corresponding to the subgroups.
When the SUBGROUP variable (or values) correspond 1:1 to the group the result is a chart with a different color for each group and a legend corresponding to the group.
For example, modify your data so group 1 has a 50 count and group 2 has 150 count:
data test_scores;
do _n_ = 1 to 200;
score = ranuni(200);
flag = _n_ > 50;
output;
end;
run;
axis1 label=("score");
axis2 ;
axis3 label=none value=none;
PROC GCHART data=test_scores;
VBAR score
/ levels=10
GROUP=flag G100
SUBGROUP=flag
SPACE=0 TYPE=PERCENT freq gaxis=axis3 maxis=axis1 ;
run;
Similar chart showing the effect of a subgroup variable with values different than group values.
data test_scores;
do _n_ = 1 to 200;
subgroup = ceil(5 * ranuni(123)); * random 1 to 5;
score = ranuni(200);
flag = _n_ > 50;
output;
end;
run;
axis1 label=("score");
axis2 ;
axis3 label=none value=none;
PROC GCHART data=test_scores;
VBAR score
/ levels=10
GROUP=flag G100
SUBGROUP=subgroup /* has integer values in [1,5] */
SPACE=0 TYPE=PERCENT freq gaxis=axis3 maxis=axis1;
run;
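If you want to sanity-check what "percent within group" means numerically, here is a rough sketch in Python rather than SAS. It assumes scores in [0, 1) and equal-width bins, which is not necessarily how GCHART picks its midpoints, and the function name percent_within_group is made up for illustration.

```python
import random
from collections import Counter

def percent_within_group(scores, flags, levels=10):
    """Return {flag: [percent per bin]}; each group's list sums to 100."""
    result = {}
    for flag in sorted(set(flags)):
        group = [s for s, f in zip(scores, flags) if f == flag]
        # Assumption: scores lie in [0, 1) and bins are equal width.
        counts = Counter(min(int(s * levels), levels - 1) for s in group)
        result[flag] = [100.0 * counts[b] / len(group) for b in range(levels)]
    return result

random.seed(200)
scores = [random.random() for _ in range(200)]
flags = [int(i >= 50) for i in range(200)]  # 50 rows in group 0, 150 in group 1
pct = percent_within_group(scores, flags)
for flag, values in pct.items():
    print(flag, [round(v, 1) for v in values])
```

Each group is divided by its own count, so the unequal group sizes (50 vs 150) do not distort the comparison.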
Hello I am trying to explore some annual data and it would be convenient to explore them every month. In order to separate the data I used this code for January:
d1 = '2021-01-01 00:00:00';
d2 = '2021-01-31 23:59:00';
t1 = datetime(d1,'InputFormat','yyyy-MM-dd HH:mm:ss');
t2 = datetime(d2,'InputFormat','yyyy-MM-dd HH:mm:ss');
idx_time = (date_time >= t1) & (date_time <= t2);
Is there an easier way to do this?
You could simply use the month method to extract the month component from date_time, like this:
idx_time = month(date_time) == 1;
To create separate arrays for each month of data, you can use findgroups and splitapply, like this.
[g, id] = findgroups(month(date_time));
dataByMonth = splitapply(@(x) {x}, var, g)
This results in dataByMonth being a 12x1 cell array where each element is a single month of data. id tells you which month.
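For readers who don't have MATLAB to hand, the same grouping idea can be sketched in plain Python. This is only an illustration with made-up sample data (one reading per day of 2021); group_by_month is a hypothetical helper, not a library function.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_by_month(times, values):
    """Collect values into a dict keyed by month number (1-12)."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.month].append(v)
    return dict(groups)

# Hypothetical sample: one reading per day for 2021.
start = datetime(2021, 1, 1)
times = [start + timedelta(days=d) for d in range(365)]
values = list(range(365))

data_by_month = group_by_month(times, values)
print(sorted(data_by_month))     # which months are present
print(len(data_by_month[1]))     # number of January readings
```

As with month() in the MATLAB version, grouping on the month number alone pools the same month from different years, which is fine for a single year of data.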
EDIT following discussions in the chat, it turns out that the following approach was what was needed.
l = load('data.mat');
% Create a timetable
tt = timetable(l.date_time, l.var);
% Aggregate per day
amountPerDay = retime(tt, 'daily', 'sum')
% Select days with non-zero amount
anyPerDay = timetable(amountPerDay.Time, double(amountPerDay.Var1 > 0))
% Compute the number of days per month with non-zero amount
retime(anyPerDay, 'monthly', 'sum')
(Note the use of double(amountPerDay.Var1>0) is to work around a limitation in older versions of MATLAB that do not permit retime to aggregate logical data)
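The two-stage aggregation (sum per day, then count the non-zero days per month) can be sketched in Python as follows. The sample timestamps and amounts are invented, and nonzero_days_per_month is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime

def nonzero_days_per_month(times, amounts):
    """Sum amounts per calendar day, then count days with a total > 0 per month."""
    daily = defaultdict(float)
    for t, a in zip(times, amounts):
        daily[t.date()] += a
    per_month = defaultdict(int)
    for day, total in daily.items():
        per_month[(day.year, day.month)] += int(total > 0)
    return dict(per_month)

# Invented sample: two readings on 1 Jan, a dry 2 Jan, one reading on 3 Feb.
times = [datetime(2021, 1, 1, 6), datetime(2021, 1, 1, 18),
         datetime(2021, 1, 2, 12), datetime(2021, 2, 3, 9)]
amounts = [0.5, 1.0, 0.0, 2.0]
result = nonzero_days_per_month(times, amounts)
print(result)
```

Keying on (year, month) rather than the month alone keeps different years apart, which matches what a monthly retime does.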
EDIT 2:
To get the Time variable of the resulting timetable to display as a long month name, you can simply set the Format property of that variable:
rainyDaysPerMonth = retime(anyPerDay, 'monthly', 'sum')
rainyDaysPerMonth.Time.Format = 'MMMM'
EDIT 3:
To get the rainiest day per month, you need splitapply and a small helper function, like this:
g = findgroups(month(amountPerDay.Time));
% Use splitapply to find the day with the maximum amount. Also
% need to return the day on which that occurred, so we need a small
% helper function
rainiestDayPerMonth = splitapply(@iMaxAndLoc, amountPerDay.Time, ...
amountPerDay.Var1, g);
% Given vectors of time and value, return a single-row table
% with the time at which the max value occurs, and that max value
function out = iMaxAndLoc(t, v)
[maxV, idx] = max(v);
out = table(t(idx), maxV, 'VariableNames', {'Time', 'Value'});
end
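The "maximum plus where it occurred, per group" pattern looks like this in a Python sketch. The daily totals are made up, and rainiest_day_per_month is a hypothetical helper; on ties it keeps the earliest day, which is also what max does in MATLAB.

```python
from datetime import date

def rainiest_day_per_month(daily_totals):
    """daily_totals maps date -> amount. Returns {month: (date, amount)}."""
    best = {}
    for day, amount in sorted(daily_totals.items()):
        # Strict '>' keeps the earliest day when two days tie.
        if day.month not in best or amount > best[day.month][1]:
            best[day.month] = (day, amount)
    return best

# Invented daily totals.
daily_totals = {
    date(2021, 1, 1): 1.5,
    date(2021, 1, 5): 3.0,
    date(2021, 2, 3): 2.0,
    date(2021, 2, 10): 2.0,
}
rainiest = rainiest_day_per_month(daily_totals)
print(rainiest)
```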
Hi I programmed a 1d random walker and I am trying to implement a capture zone, where the program will stop if the walker remains in a specific range of values for a certain amount of time. The code I have looks like this:
steps = 1000; %sets the number of steps to 1000
rw = cumsum(-1 + 2 * round(rand(steps,1)),1); %Set up our random walk with cumsum
%Now we will set up our capture zone between 13-18 for fun
if rw >= 13 & rw <= 18
dwc = dwc + 1 %Dwelling counted ticks up every time walker is in 13-18
else dwc = 0; %Once it leaves, it returns to 0
end
while dwc >= 5
fprintf('5 steps or more within range after %d steps, so so breaking out.\n', rw);
break
end
figure(7)
comet(rw); %This will plot our random walk
grid on; %Just to see the capture zone better
hold on;
line(xlim, [13, 13], 'Color', 'r');
line(xlim, [18, 18], 'Color', 'r');
hold off;
title('1d Random Walk with Capture Zone');
xlabel('Steps');
ylabel('Position');
It will run through the walk, but it will never break in the capture zone. I am sure it has been in the capture zone for longer than 5 steps on multiple occasions but it keeps running anyway. Any help is appreciated.
Your code isn't doing what you think. There is no loop that steps through the walk to count steps and check for capture (although you don't need a loop for that anyway).
First issue: rw is a 1000x1 array, so your if condition rw >= 13 & rw <= 18 likewise returns a 1000x1 logical array. An if statement treats a non-empty array as true only when every element is true, so this test doesn't make much sense here.
The second issue is that you never modify the while condition inside the loop. Without the break it would either be skipped or run forever; with the break it behaves like a plain if.
while dwc >= 5
...
break
end
Edit: vectorized version with no loops:
steps = 1000; %sets the number of steps to 1000
rw = cumsum(-1 + 2 * round(rand(steps,1)),1); %Set up our random walk with cumsum
%Now we will set up our capture zone between 13-18 for fun
captureCheck = rw >= 13 & rw <= 18;
%Counts the number of consecutive steps within the capture zone.
consecStepsInZone = diff([0 (find( ~(captureCheck(:).' > 0))) numel(captureCheck) + 1])- 1;
fprintf('The max number of consecutive steps in the zone is: %d\n',max(consecStepsInZone));
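If you do want the walk to stop as soon as it has dwelt in the zone long enough, the counter has to be updated once per step. Here is a rough sketch of that logic in Python rather than MATLAB; the function names and the seed parameter are my own, and the zone limits and dwell length are the ones from the question.

```python
import random

def longest_run(flags):
    """Length of the longest run of consecutive truthy values."""
    best = current = 0
    for f in flags:
        current = current + 1 if f else 0
        best = max(best, current)
    return best

def walk_until_captured(steps=1000, low=13, high=18, dwell_needed=5, seed=None):
    """Walk one step at a time; stop once the walker has spent
    dwell_needed consecutive steps inside [low, high]."""
    rng = random.Random(seed)
    position, dwell, path = 0, 0, []
    for step in range(1, steps + 1):
        position += rng.choice((-1, 1))
        path.append(position)
        dwell = dwell + 1 if low <= position <= high else 0  # reset on leaving
        if dwell >= dwell_needed:
            return path, step
    return path, None  # never captured within the step budget

path, captured_at = walk_until_captured(seed=1)
print("captured at step:", captured_at)
print("longest run in zone:", longest_run(13 <= p <= 18 for p in path))
```

longest_run does the same job as the diff/find one-liner, just written as an explicit loop.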
I have a question about SAS macros. I do my analytics in R and Python, not SAS, so I am struggling with the SAS syntax needed to solve the following exercise.
Write a macro that accepts a table name, a column name, a list of integers, a main axis label and an x axis label. This function should scan over each element in the list of integers and produce a histogram for each integer value, setting the bin count to the element in the input list, and labeling main and x-axis with the specified parameters. You should label the y-axis to read Frequency, bins = and the number of bins.
I also need to test the macro with a data set, using bin counts 12, 36, and 60, so that I can call the macro with something like
%plot_histograms(data, y, 12 36 60, main="Title", xlabel="x_label");
to plot three different histograms of the data set.
Hint: Assume 12 36 60 resolve to a single macro parameter and use %scan, macro definition can look something like
%macro plot_histograms(table_name, column_name, number_of_bins, main="Main", xlabel="X Label")
Thanks in Advance.
I don't fully understand your question, and this is not a free code platform anyway, but this should point you in the right direction:
%macro plot_histograms(table_name, column_name, number_of_bins, main="Main", xlabel="X Label");
%do i=1 %to %sysfunc(countw(&number_of_bins.)); /* loop across elements in your input list */
proc gchart data=&table_name.; /*make a chart for the provided table */
...
/* whatever it is you actually need to do, fetch the current element of the input list like this */
%scan(&number_of_bins.,&i.)
...
run;
%end;
%mend;
First, you really should try this on your own and let us know where you get stuck.
That said, let's break down how to solve this problem.
Make some test data:
data test;
do i=1 to 10000;
r = rannor(1);
output;
end;
run;
How do I create a histogram with this? Use PROC SGPLOT:
proc sgplot data=test;
histogram r / nbins=10;
xaxis label="X LABEL";
yaxis label="Y LABEL";
run;
So, if I make a macro to create this generally:
%macro histogram(data,column,bin,xlabel,ylabel);
proc sgplot data=&data;
histogram &column / nbins=&bin;
xaxis label="&xlabel";
yaxis label="&ylabel";
run;
%mend;
Now %histogram(test,r,10,X LABEL,Y LABEL) produces the same histogram as the PROC SGPLOT step above.
Let's write something that loops over the values of bins and call this macro:
%macro make_histograms(data,column,bins,xlabel,ylabel);
%local i n bin;
%let n=%sysfunc(countw(&bins)); /*Number of words in &bins*/
%do i=1 %to &n;
%let bin=%scan(&bins,&i); /*Get the i-th bin*/
%histogram(&data,&column,&bin,&xlabel,&ylabel);
%end;
%mend;
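Since you mention you know Python: the countw/%scan loop is just splitting the list on blanks and making one plot per word. A rough Python sketch of that loop is below. It only builds the labels and does not plot anything; histogram_specs is a made-up name, and the y-axis text follows the wording in your assignment.

```python
def histogram_specs(bins, main="Main", xlabel="X Label"):
    """Return one dict of plot settings per bin count in the blank-separated list."""
    specs = []
    for word in bins.split():          # plays the role of countw + %scan
        n = int(word)
        specs.append({
            "nbins": n,
            "title": main,
            "xlabel": xlabel,
            "ylabel": f"Frequency, bins = {n}",
        })
    return specs

specs = histogram_specs("12 36 60", main="Title", xlabel="x_label")
for spec in specs:
    print(spec)
```

In the macro, the equivalent of the f-string would be building the y label from &bin inside the loop before calling %histogram.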
I have a maze file which is like this.
1111111
1001111
1101101
1101001
1100011
1111111
I also have a format, $direction, indicating the direction:
start end label
D D down
L L left
R R right
U U up
Then, I have a dataset indicating the start and end point.
Row Column
start 2 2
end 3 6
How can I record the moving direction from the start to the end like this?
direction row column
2 2
right 2 3
down 3 3
down 4 3
down 5 3
I have tried using an array:
array m(i,j)
if m(i,j) = 0 then
row=i;
column=j;
output;
However, the output is simply not in the correct order of movement.
Thanks if you can help.
Here's one way of doing this. Writing a more generalised maze-solving algorithm using SAS data step logic is left as an exercise for the reader, but this should work for labyrinths.
/* Define the format */
proc format;
value $direction
'D' = 'down'
'L' = 'left'
'R' = 'right'
'U' = 'up'
;
run;
data want;
/*Read in the maze and start/end points in (y,x) orientation*/
array maze(6,7) (
1,1,1,1,1,1,1,
1,0,0,1,1,1,1,
1,1,0,1,1,0,1,
1,1,0,1,0,0,1,
1,1,0,0,0,1,1,
1,1,1,1,1,1,1
);
array endpoints (2,2) (
2,2
3,6
);
/*Load the start point and output a row*/
x = endpoints(1,2);
y = endpoints(1,1);
output;
/*
Navigate through the maze.
Assume for the sake of simplicity that it is really more of a labyrinth,
i.e. there is only ever one valid direction in which to move,
other than the direction you just came from,
and that the end point is reachable
*/
do _n_ = 1 by 1 until(x = endpoints(2,2) and y = endpoints(2,1));
if maze(y-1,x) = 0 and direction ne 'D' then do;
direction = 'U';
y + -1;
end;
else if maze(y+1,x) = 0 and direction ne 'U' then do;
direction = 'D';
y + 1;
end;
else if maze(y,x-1) = 0 and direction ne 'R' then do;
direction = 'L';
x + -1;
end;
else if maze(y,x+1) = 0 and direction ne 'L' then do;
direction = 'R';
x + 1;
end;
output;
if _n_ > 15 then stop; /*Set a step limit in case something goes wrong*/
end;
format direction $direction.;
drop maze: endpoints:;
run;
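To make the stepping rule easier to follow, here is the same labyrinth walk sketched in Python. It keeps the same assumptions as the data step (a wall border all the way round, only one way forward at each cell, a reachable end point, a step limit as a safety net); the names MAZE, MOVES and walk are mine.

```python
MAZE = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]

# (direction, row change, column change, opposite direction),
# tried in the same order as the data step: up, down, left, right.
MOVES = [("up", -1, 0, "down"), ("down", 1, 0, "up"),
         ("left", 0, -1, "right"), ("right", 0, 1, "left")]

def walk(maze, start, end, max_steps=15):
    """Follow the single open path from start to end (1-based row, column)."""
    row, col = start[0] - 1, start[1] - 1   # convert to 0-based indexes
    path = [("", start[0], start[1])]
    last = ""
    while (row + 1, col + 1) != tuple(end) and len(path) <= max_steps:
        for name, dr, dc, opposite in MOVES:
            # Take the first open neighbour that is not where we came from.
            # Assumes the maze has a wall border, so indexes stay in range.
            if maze[row + dr][col + dc] == 0 and last != opposite:
                row, col, last = row + dr, col + dc, name
                break
        else:
            break   # dead end: the only way on is back
        path.append((last, row + 1, col + 1))
    return path

path = walk(MAZE, (2, 2), (3, 6))
for direction, r, c in path:
    print(f"{direction:<6}{r} {c}")
```

The first five rows printed match the rows in the question; the walk then carries on to the end point at (3, 6).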
%macro intercept(i1= ,i2= );
%let n = %sysfunc(countw(&i1));
%do i = 1 %to &n;
%let val_i1 = %scan(&i1,&i,'');
%let val_i2 = %scan(&i2,&i,'');
data scores;
set repeat_score2;
/* Segment 1 probablity score */
p1 = 0;
z1 = &val_i1 +
a * 0.03 +
r * -0.0047841 +
p * -0.000916081 ;
p1 = 1/(1+2.71828**-z1);
/* Segment 2 probablity score */
p2 = 0;
z2 = &val_i2 +
r * 0.09 +
m * 0.012786245 +
c * -0.00179618 +
p2 = 1/(1+2.71828**-z2);
logit_score = 0;
if max(p1,p2) = p1 then logit_score = 1;
else if max(p1,p2) = p2 then logit_score = 2;
run;
proc freq data = scores;
table logit_score * clu_ /nocol norow nopercent;
run;
%end;
%mend;
%intercept (i1=-0.456491042, i2=-3.207379842, i3=-1.380627318 , i4=0.035684096, i5=-0.855283373);
%intercept (i1=-0.456491042 0, i2=-3.207379842 -3.207379842, i3=-1.380627318 -1.380627318, i4=0.035684096 0.035684096,
i5=-0.855283373 -0.855283373);
I have the above macro, which takes the intercepts for the two models above, calculates the probability scores, and then assigns a value to a segment based on those probability scores.
The first problem with the macro is that when I execute it with one value per argument, macro variable 'n' resolves to 2 and the loop executes twice. The first iteration gives the right results, while the second is wrong.
For the second call (two values per argument), n resolves to 3 and %scan returns both values together (e.g. i1 for the iteration is -0.45 and 0). If I remove the space, it takes '.' as the delimiter and resolves to (0, 45, 0: one for each iteration). I don't get any results for this case.
How do I get this to work the right way?
Thanks!!!
%SCAN and the COUNTW function by default treat blanks and punctuation symbols as delimiters. Since your arguments include decimal points (and minus signs), you need to state explicitly that the delimiter should be a blank for both COUNTW and %SCAN. You have done that for %SCAN, but not for COUNTW.
So the 2nd line of the code should be:
%let n = %sysfunc(countw(&i1,' '));
And I'm not sure if it's a typo or just formatting thing, but in your %SCAN functions third argument looks like two quotes together '', not quote-blank-quote ' ' as it should be.
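To see why n came out as 2 and 3, here is a small Python imitation of the word splitting. The delimiter list is the ASCII default as I understand it from the SAS documentation, so treat it as an assumption, and split_words is a made-up helper, not a SAS function.

```python
# Assumed default COUNTW / %SCAN delimiters on ASCII systems.
DEFAULT_DELIMS = " !$%&()*+,-./;<^|"

def split_words(text, delims=DEFAULT_DELIMS):
    """Split text into words, treating every character in delims as a separator."""
    words, current = [], ""
    for ch in text:
        if ch in delims:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words

one_value = "-0.456491042"
two_values = "-0.456491042 0"

print(split_words(one_value))          # default delimiters split on '-' and '.'
print(split_words(one_value, " "))     # blank only keeps the number whole
print(split_words(two_values))
print(split_words(two_values, " "))
```

With the default delimiters the single value splits into two words and the pair into three, which matches the n values you saw; with a blank as the only delimiter you get one and two words.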