Does anyone know how to make a graph similar to this one in MATLAB?
To me it seems like a continuous stacked bar plot.
I did not manage to download the same data, so I used other data.
I tried the following code:
clear all
filename = 'C:\Users\andre\Desktop\GDPpercapitaconstant2000US.xlsx';
sheet = 'Data';
xlRange = 'AP5:AP259'; %for example
A = xlsread(filename,sheet,xlRange);
A(isnan(A))=[]; %remove NaNs
%create four subsets
A1=A(1:70);
A2=A(71:150);
A3=A(151:180);
A4=A(181:end);
edges=80:200:8000; %bins of the distributions
n_A1 = histc(A1,edges); %bin counts of the first subset (histc's second output is the bin index, not needed here)
n_A2 = histc(A2,edges);
n_A3 = histc(A3,edges);
n_A4 = histc(A4,edges);
%make stacked bar plot
y = [n_A1 n_A2 n_A3 n_A4]; %combine the four count vectors column-wise (same as a loop over the bins)
bar(y,'stacked', 'BarWidth', 1)
and obtained this:
It is not so bad. Maybe with other data it would look nicer, but I was wondering if someone has better ideas. Maybe it is possible to adapt fitdist in a similar way?
First, define the x axis. If you want it to follow the rules of bar, then use:
x = 0.5:numel(edges)-0.5;
Then use area(x,y), which produces a filled/stacked area plot:
area(x,y)
And if you want the same colors as the example you posted at the top, define the colormap and call colormap as:
map = [
218 96 96
248 219 138
253 249 199
139 217 140
195 139 217
246 221 245
139 153 221]/255;
colormap(map)
(It may not be exactly like the one you posted, but I think I got it quite close. Also, not all colors are shown in the result below, as there are only 4 data series, but all of them are defined.)
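Putting it together, a minimal sketch reusing y and edges from the question:
x = 0.5:numel(edges)-0.5; %x positions matching bar's convention
area(x,y) %stacked, filled version of the bar plot
colormap(map) %apply the custom colors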
Result:
Using Octave 4.2.1 on Windows with the qt graphics toolkit (I can't use gnuplot because it crashes in some other part of the code). I have a dataset which is 35x7x4 (35 data points for 7 conditions on 4 channels) - you can use random data for the purpose of this exercise.
I am trying to create 4 subplots (1 for each channel), with 7 bar graphs on each subplot (one per condition) to see how the distribution of data changes with each condition. Each of the 7x4 = 28 distributions has its own set of bins and frequencies, and I can't seem to be able to combine the 7 datasets on one graph (subplot).
Posting the whole of the code would be too complicated, but here's a simplified version:
nb_channels = 4;
nb_conditions = 7;
nbins = 15;
freq = zeros(nbins,nb_conditions,nb_channels);
xbin = zeros(nbins,nb_conditions,nb_channels);
plot_colours = [91 237 165 255 68 112 255;
155 125 165 192 114 173 0;
213 49 165 0 196 71 255];
plot_colours = plot_colours / 255;
for k = 1:nb_channels
for n = 1:nb_conditions
% some complex calculations to generate temp variable
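temp = randn(35,1); % placeholder: random data standing in for the real calculation, as suggested above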
[freq(:,n,k),xbin(:,n,k)] = hist(temp,nbins);
end
end
figure
for k = 1:nb_channels
subplot(2,2,k)
for n = 1:nb_conditions
bar(xbin(:,n,k),freq(:,n,k),'FaceColor',plot_colours(:,n))
hold on
end
hold off
legend('condition #1','condition #2','condition #3','condition #4','condition #5','condition #6','condition #7')
end
which gives something like this:
So you can't really see anything; all the bars are on top of each other. In addition, Octave doesn't support the transparency property for patch objects (which is what bar charts use), so I can't overlay the histograms on top of each other, which I would really quite like to do.
Is there a better way to approach this? It seems that bar will only accept a vector for x data and not a matrix, so I am stuck in having to use hold on and loop through the various conditions, instead of using a matrix approach.
OK, so I'll try to answer my own question based on the suggestions made in the comments:
Suggestion 1: make all the bins the same
This does improve the results somewhat, but the overlapping bars are still an issue due to the lack of transparency for patch objects.
Code changes:
nbins = 15;
xbin = linspace(5.8,6.5,nbins);
for k = 1:nb_channels
for n = 1:nb_conditions
% some complex calculations to generate temp variable
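temp = 5.8 + 0.7*rand(35,1); % placeholder: random data in the bin range, standing in for the real calculation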
freq_flow(:,n,k) = hist(temp,xbin);
end
end
figure
for k = 1:nb_channels
subplot(2,2,k)
for n = 1:nb_conditions
bar(xbin,freq_flow(:,n,k),'FaceColor',plot_colours(:,n))
hold on
end
hold off
xlim([5.8 6.3])
legend('condition #1','condition #2','condition #3','condition #4','condition #5','condition #6','condition #7')
end
Which gives the following plot:
Suggestion 2: Use line plots instead of bar charts
This helps a bit more in terms of readability. However, the result is a bit "piece-wise".
Code changes:
figure
for k = 1:nb_channels
subplot(2,2,k)
for n = 1:nb_conditions
plot(xbin,freq_flow(:,n,k),'LineStyle','none','marker','.',...
'markersize',12,'MarkerEdgeColor',plot_colours(:,n),...
'MarkerFaceColor',plot_colours(:,n))
hold on
end
hold off
xlim([5.8 6.3])
legend('condition #1','condition #2','condition #3','condition #4','condition #5','condition #6','condition #7')
end
Which gives the following result:
The legend is a bit screwed up, but I can probably sort that out.
A variation on this I also tried was to plot just the points as markers, and then a fitted normal distribution on top. I won't post all the code here, but the result looks something like this:
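A rough sketch of the idea, run inside the per-channel loop (the moment-based normal fit below is a stand-in for my actual fitting code):
xf = linspace(min(xbin), max(xbin), 200);
for n = 1:nb_conditions
f = freq_flow(:,n,k);
mu = sum(xbin(:) .* f) / sum(f); % weighted mean of the bin centres
sd = sqrt(sum((xbin(:) - mu).^2 .* f) / sum(f)); % weighted standard deviation
pdf_fit = exp(-(xf - mu).^2 / (2*sd^2)) / (sd * sqrt(2*pi));
pdf_fit = pdf_fit * max(f) / max(pdf_fit); % scale the curve to the histogram height
plot(xf, pdf_fit, 'Color', plot_colours(:,n))
hold on
end
hold off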
Suggestion 3: transparency workaround with gnuplot
Unfortunately, before I even got to the transparency workaround, gnuplot kept crashing when trying to plot the figure. There's something it doesn't like about subplots and legends, I think (which is why I moved to the qt graphics toolkit in the first place, as I had exactly the same issue in other parts of the code).
Suggestion 4: use a 3D bar graph
I found this on SO: 3D histogram with gnuplot or octave
and used it as such:
figure
for k = 1:nb_channels
subplot(2,2,k)
h = my_bar3(freq_flow(:,:,k));
fvcd = kron((1:numel(freq_flow(:,:,k)))', ones(6,1));
set(h, 'FaceVertexCData',fvcd, 'FaceColor','flat', 'CDataMapping','scaled')
colormap hsv; axis tight; view(50,25)
ylbl = cell(length(xbin),1);
for m = 1:length(xbin) %use a different loop variable so the outer k is not clobbered
ylbl{m} = num2str(xbin(m));
end
set(gca,'YTick',1:2:nbins);
set(gca,'YTickLabel',ylbl(1:2:end));
end
to produce:
Which isn't bad, but probably not as clear as the line plots.
Conclusion
On balance, I will probably end up using one of the line plot approaches, as they tend to be clearer.
I'm having a surprisingly difficult time figuring out something that appears so simple. I have two known coordinates on a graph, (X1,Y1) and (X2,Y2). What I'm trying to identify are the coordinates for (X3,Y3).
I thought of using sin and cos but once I get here my brain stops working. I know that
sin(θ) = y/R
cos(θ) = x/R
so I thought of simply plugging in the length of the line (in this case it was 2) and using the angles, which are known. It seems very simple, but for the life of me, my brain won't wrap around this.
The reason I need this is because I'm trying to print a line onto an image using poly2mask in matlab. The code has to work in the 2D space as I will be building movies using the line.
X1 = [134 134 135 147 153 153 167]
Y1 = [183 180 178 173 164 152 143]
X2 = [133 133 133 135 138 143 147]
Y2 = [203 200 197 189 185 173 163]
YZdist = 2;
for aa = 1:length(X2)
XYdis(aa) = sqrt((X2(aa)-X1(aa))^2 + (Y2(aa)-Y1(aa))^2);
X3(aa) = X1(aa) * tan(XYdis(aa)/YZdist);
Y3(aa) = Y1(aa) * tan(XYdis(aa)/YZdist);
end
polmask = poly2mask([Xdata X3],[Ydata Y3],50,50);
One approach would be to first construct a vector l connecting the points (x1,y1) and (x2,y2), rotate this vector 90 degrees clockwise, and add it to the point (x2,y2).
Thus l = (x2-x1, y2-y1), its rotated version is l' = (y2-y1, x1-x2), and therefore the point of interest is P = (x2, y2) + f*(y2-y1, x1-x2), where f is the desired scaling factor. If the lengths are supposed to be the same, then f = 1 and thus P = (x2 + y2-y1, y2 + x1-x2).
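In MATLAB, applied to the arrays in your question, this is a minimal sketch (assuming the new segment should have length YZdist, so f = YZdist/|l|):
L = sqrt((X2-X1).^2 + (Y2-Y1).^2); % length of each connecting vector l
f = YZdist ./ L; % per-point scale factor giving the new segment length YZdist
X3 = X2 + f .* (Y2 - Y1);
Y3 = Y2 + f .* (X1 - X2);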
I am asking a follow-up to my question here, in which there was a perfect solution that did exactly what I wanted. But I'm wondering how to apply this method, or do something similar, if instead of yes/no as possible responses I had more than 2 responses, e.g. yes/no/maybe. Or how it would generalize to 3+ responses.
This is the answer, reformatted to match my question:
Assuming my data looks like this:
responses = categorical(randi(3,1250,1),[1 2 3],{'no','yes','maybe'});
race = categorical(randi(5,1250,1),1:5,{'Asian','Black','BHispanic','White','WHispanic'});
I would like to go through and do the same thing with my yes/no data, but do this with 3 possibilities, or more. And this will not end up working anymore:
% convert everything to numeric:
yn = double(responses);
rac = double(race);
% calculate all frequencies:
data = accumarray(rac,yn-1);
data(:,2) = accumarray(rac,1)-data;
% get the categories names:
races = categories(race);
answers = categories(responses);
% plotting:
bar(data,0.4,'stacked');
ax = gca;
ax.XTickLabel = races; % set the x-axis ticks to the race names
legend(answers) % add a legend for the colors
colormap(lines(3)) % use nicer colors (close to your example)
ylabel('YES/NO/MAYBE')% set the y-axis label
% some other minor fixes:
box off
ax.YGrid = 'on';
I'm not sure if there is even a way to use the accumarray method to do this, as it doesn't make sense from my understanding to use this with 3 possible responses. I'd like to generalize it to n possible responses too.
UPDATE: I'm currently investigating the crosstab function, which I hadn't found at all until now! I think this may be the function I'm looking for.
Here is a generalized version:
% the data (with even more categories):
yesno = categorical(randi(4,1250,1),1:4,{'no','yes','maybe','don''t know'});
race = categorical(randi(5,1250,1),1:5,{'Asian','Black','BHispanic','White','WHispanic'});
% convert everything to numeric:
yn = double(yesno);
rac = double(race);
% calculate all frequencies:
data = accumarray([rac yn],1);
% get the categories names:
races = categories(race);
answers = categories(yesno);
% plotting:
bar(data,0.4,'stacked');
ax = gca;
ax.XTickLabel = races; % set the x-axis ticks to the race names
legend(answers) % add a legend for the colors
colormap(lines(numel(answers))) % use prettier colors
ylabel('Count') % set the y-axis label
% some other minor fixes:
box off
ax.YGrid = 'on';
The result:
And in a table:
T = array2table(data.','VariableNames',races,'RowNames',answers)
the output:
T =
Asian Black BHispanic White WHispanic
_____ _____ _________ _____ _________
no 58 72 69 66 62
yes 58 53 72 54 58
maybe 63 62 67 62 61
don't know 58 57 66 58 74
As you already mentioned, you can use crosstab for the same task. crosstab(rac,yn) will give you the same result as accumarray([rac yn],1). I think accumarray is faster, though I didn't check it.
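A quick sanity check (a sketch; crosstab needs the Statistics Toolbox, and the comparison assumes every category occurs at least once so both tables have the same size):
isequal(crosstab(rac,yn), accumarray([rac yn],1)) % should return 1 (true)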
I'm in the process of coding what I'm learning about Linear Regression from the Coursera Machine Learning course (MATLAB). There was a similar post that I found here, but I don't seem to be able to understand everything, perhaps because my fundamentals in Machine Learning are a bit weak.
The problem I'm facing is that, for some data, both gradient descent (GD) and the Closed Form Solution (CFS) give the same hypothesis line. However, on one particular dataset, the results are different. I've read something saying that if the data is singular the results should be the same, but I have no idea how to check whether or not my data is singular.
I will try to illustrate the best I can:
1) Firstly, here is the MATLAB code, adapted from here. For the given dataset, everything turned out fine: both GD and the CFS gave similar results.
The Dataset
X Y
2.06587460000000 0.779189260000000
2.36840870000000 0.915967570000000
2.53999290000000 0.905383540000000
2.54208040000000 0.905661380000000
2.54907900000000 0.938988900000000
2.78668820000000 0.966847400000000
2.91168250000000 0.964368240000000
3.03562700000000 0.914459390000000
3.11466960000000 0.939339440000000
3.15823890000000 0.960749710000000
3.32759440000000 0.898370940000000
3.37931650000000 0.912097390000000
3.41220060000000 0.942384990000000
3.42158230000000 0.966245780000000
3.53157320000000 1.05265000000000
3.63930020000000 1.01437910000000
3.67325370000000 0.959694260000000
3.92564620000000 0.968537160000000
4.04986460000000 1.07660650000000
4.24833480000000 1.14549780000000
4.34400520000000 1.03406250000000
4.38265310000000 1.00700090000000
4.42306020000000 0.966836480000000
4.61024430000000 1.08959190000000
4.68811830000000 1.06344620000000
4.97773330000000 1.12372390000000
5.03599670000000 1.03233740000000
5.06845360000000 1.08744520000000
5.41614910000000 1.07029880000000
5.43956230000000 1.16064930000000
5.45632070000000 1.07780370000000
5.56984580000000 1.10697580000000
5.60157290000000 1.09718750000000
5.68776170000000 1.16486030000000
5.72156020000000 1.14117960000000
5.85389140000000 1.08441560000000
6.19780260000000 1.12524930000000
6.35109410000000 1.11683410000000
6.47970330000000 1.19707890000000
6.73837910000000 1.20694620000000
6.86376860000000 1.12510460000000
7.02233870000000 1.12356720000000
7.07823730000000 1.21328290000000
7.15142320000000 1.25226520000000
7.46640230000000 1.24970650000000
7.59738740000000 1.17997060000000
7.74407170000000 1.18972990000000
7.77296620000000 1.30299340000000
7.82645140000000 1.26011340000000
7.93063560000000 1.25622670000000
My MATLAB code:
clear all; close all; clc;
x = load('ex2x.dat');
y = load('ex2y.dat');
m = length(y); % number of training examples
% Plot the training data
figure; % open a new figure window
plot(x, y, '*r');
ylabel('Height in meters')
xlabel('Age in years')
% Gradient descent
x = [ones(m, 1) x]; % Add a column of ones to x
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
MAX_ITR = 1500;
alpha = 0.07;
for num_iterations = 1:MAX_ITR
thetax = x * theta;
% for theta_0 and x_0
grad0 = (1/m) .* sum( x(:,1)' * (thetax - y));
% for theta_1 and x_1
grad1 = (1/m) .* sum( x(:,2)' * (thetax - y));
% Here is the actual update
theta(1) = theta(1) - alpha .* grad0;
theta(2) = theta(2) - alpha .* grad1;
end
% print theta to screen
theta
% Plot the hypothesis (a.k.a. linear fit)
hold on
plot(x(:,2), x*theta, 'ob')
% Plot using the Closed Form Solution
plot(x(:,2), x*((x' * x)\x' * y), '--r')
legend('Training data', 'Linear regression', 'Closed Form')
hold off % don't overlay any more plots on this figure
[EDIT: Sorry for the wrong labeling... It's not Normal Equation, but Closed Form Solution. My mistake]
The results for this code are as shown below (which is peachy :D same results for both GD and CFS):
Now, I am testing my code with another dataset. The URL for the dataset is here - GRAY KANGAROOS. I converted it to CSV and read it into MATLAB. Note that I did scaling (divided by the maximum; if I didn't do that, no hypothesis line appeared at all and the thetas came out as Not a Number (NaN) in MATLAB).
The Gray Kangaroo Dataset:
X Y
609 241
629 222
620 233
564 207
645 247
493 189
606 226
660 240
630 215
672 231
778 263
616 220
727 271
810 284
778 279
823 272
755 268
710 278
701 238
803 255
855 308
838 281
830 288
864 306
635 236
565 204
562 216
580 225
596 220
597 219
636 201
559 213
615 228
740 234
677 237
675 217
629 211
692 238
710 221
730 281
763 292
686 251
717 231
737 275
816 275
The changes I made to the code to read in this dataset:
dataset = load('kangaroo.csv');
% scale?
x = dataset(:,1)/max(dataset(:,1));
y = dataset(:,2)/max(dataset(:,2));
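As an aside, mean/std (z-score) normalization would be a common alternative to dividing by the maximum; a sketch (my assumption, not something the course requires here):
x = (x - mean(x)) / std(x); % zero mean, unit variance
y = (y - mean(y)) / std(y);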
The results that came out were like this: [EDIT: Sorry for the wrong labeling... It's not Normal Equation, but Closed Form Solution. My mistake]
I was wondering if there is any explanation for this discrepancy? Any help would be much appreciated. Thank you in advance!
I haven't run your code, but let me throw some theory at you:
If your code is right (it looks like it): increase MAX_ITR and it will look better.
Gradient descent is not guaranteed to have converged by MAX_ITR, and in fact gradient descent is quite a slow method (convergence-wise).
The convergence of gradient descent for a "standard" convex function (like the one you are trying to solve) looks like this (from the Internets):
Forget about the iteration number, as it depends on the problem, and focus on the shape. What may be happening is that your MAX_ITR falls somewhere around "20" in this image. Thus your result is good, but not the best!
However, solving the normal equations directly will give you the minimum squared error solution. (I assume by normal equations you mean x = (A'*A)^(-1)*A'*b.) The problem is that there are loads of cases where you cannot store A in memory, or where, in an ill-posed problem, the normal equations lead to ill-conditioned matrices that are numerically unstable; thus gradient descent is used.
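As a side note, in MATLAB you would not form that inverse explicitly; a minimal sketch (assuming x already carries the column of ones, as in your code):
theta_cfs = (x' * x) \ (x' * y); % normal equations solved as a linear system
theta_ls = x \ y; % least-squares solve via QR, usually better conditioned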
more info
I think I figured it out.
I naively thought that a maximum of 1500 iterations was enough. I tried with higher values (i.e. 5k and 10k), and both algorithms started to give similar solutions. So my main issue was the number of iterations: it needed more iterations to properly converge for that dataset :D
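For reference, a sketch of how the loop could stop on convergence instead of a fixed MAX_ITR (the tolerance value is an arbitrary assumption):
theta = zeros(2,1); alpha = 0.07; tol = 1e-9;
for it = 1:100000 % generous upper bound
grad = (1/m) * (x' * (x*theta - y)); % vectorized gradient for both components
theta_new = theta - alpha * grad;
if norm(theta_new - theta) < tol
break % update is tiny: treat as converged
end
theta = theta_new;
end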
How can I do edge detection on the ROI (only) of an image, without processing the rest of the image? I have tried the following, but it is not working:
h4 = @(x) edge(x,'log');
Edge_map = roifilt2(Foregound_Newframe,roi_mask,h4);
roi_mask is the binary mask that I am using and Foregound_Newframe is the gray image to be processed. Kindly provide an example. Thanks.
The error I see is that the function you are using to do the filtering requires an input argument of type double; otherwise, your calling syntax should work fine.
i.e. use
YourFilter = @(x) edge(double(x),'log');
When I apply this to an example from the roifilt2 docs it works fine (OK, it looks weird in this case...):
clc
clear
FullImage = imread('eight.tif');
roi_col = [222 272 300 270 221 194];
roi_row = [21 21 75 121 121 75];
ROI = roipoly(FullImage,roi_col,roi_row);
YourFilter = @(x) edge(double(x),'log');
J = roifilt2(FullImage,ROI,YourFilter);
figure, imshow(FullImage), figure, imshow(J)
with following output: