Memory-speed issues when doing a scatter plot in Matlab

I have the following memory-speed problem in Matlab and I would like your help to understand whether there may be a solution.
Consider the following 4 big row vectors X1, X2, Y1, Y2.
clear
rng default
P=10^8;
X1=rand(1,P)*5;
X2=rand(1,P)*5;
Y1=rand(1,P)*5;
Y2=rand(1,P)*5;
What I would like to do is a scatter plot where the x-axis shows the sum of every possible pair of elements from X1 and X2, and the y-axis shows the sum of every possible pair of elements from Y1 and Y2.
I post here three options I thought about that do not work mainly because of memory and speed issues.
Option 1 (issues: too slow when doing the loop, out of memory when doing vertcat)
Xtemp=cell(P,1);
Ytemp=cell(P,1);
for i=1:P
tic
Xtemp{i}=X1(i)+X2(:);
Ytemp{i}=Y1(i)+Y2(:);
toc
end
X=vertcat(Xtemp{:});
Y=vertcat(Ytemp{:});
scatter(X,Y)
Option 2 (issues: too slow when doing the loop, time increasing as the loop proceeds, Matlab going crazy and unable to produce the scatter even if I stop the loop after 5 iterations)
for i=1:P
tic
scatter(X1(i)+X2(:), Y1(i)+Y2(:))
hold on
toc
end
Option 3 (sort of giving up) (issues: as I increase T the scatter gets closer and closer to a square which is correct; I am wondering though whether this is caused by the fact that I generated the data using rand and in option 3 I use randi; maybe with my real data the scatter does not "converge" to the true plot as I increase T; also, what is the "optimal" T and R?).
T=20;
R=500;
for t=1:T
tic
%select R points at random from X1,X2,Y1,Y2
X1sel=(X1(randi(R,R,1)));
X2sel=(X2(randi(R,R,1)));
Y1sel=(Y1(randi(R,R,1)));
Y2sel=(Y2(randi(R,R,1)));
%do option 1 among those points and plot
Xtempsel=cell(R,1);
Ytempsel=cell(R,1);
for r=1:R
Xtempsel{r}=X1sel(r)+X2sel(:);
Ytempsel{r}=Y1sel(r)+Y2sel(:);
end
Xsel=vertcat(Xtempsel{:});
Ysel=vertcat(Ytempsel{:});
scatter(Xsel,Ysel, 'b', 'filled')
hold on
toc
end
Is there a way to do what I want or is it simply impossible?

You are trying to build a vector with P^2 elements, i.e. 10^16. This is many orders of magnitude more than what would fit into the memory of a standard computer (10 GB is 10^10 bytes, or about 1.25 billion double-precision floats).
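The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes 8 bytes per double and ignores any overhead, and the variable names are made up for illustration.

```matlab
P = 1e8;                                % length of each input vector, as in the question
nPairs = P^2;                           % number of pairwise sums: 1e16
bytesPerDouble = 8;                     % a double-precision value takes 8 bytes
totalBytes = nPairs * bytesPerDouble;   % memory for ONE of the two result vectors
doublesIn10GB = 10e9 / bytesPerDouble;  % how many doubles fit in 10 GB
fprintf('%g pairs need about %g PB per result vector\n', nPairs, totalBytes/1e15);
fprintf('10 GB holds about %g doubles\n', doublesIn10GB);
```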
For smaller vectors (i.e. P<1e4), try:
Xsum=bsxfun(@plus,X1,X2.'); %Matrix with the sum of any two elements from X1 and X2
X=Xsum(:); %Reshape to vector
Ysum=bsxfun(@plus,Y1,Y2.');
Y=Ysum(:);
plot(X,Y,'.') %Plot as small dots, likely to take forever if there are too many points
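On MATLAB R2016b or later, implicit expansion lets a row vector and a column vector be added directly, so bsxfun becomes optional. The following is a small sketch with tiny made-up vectors standing in for X1, X2, Y1, Y2; it only illustrates the shape of the result and is not meant for the full-size data.

```matlab
% Tiny made-up row vectors standing in for X1, X2, Y1, Y2
X1 = [1 2 3];  X2 = [10 20 30];
Y1 = [4 5 6];  Y2 = [40 50 60];

% Row + column expands to a 3x3 matrix; entry (j,i) is X1(i) + X2(j)
Xsum = X1 + X2.';
Ysum = Y1 + Y2.';

% The bsxfun form gives the same matrix
XsumB = bsxfun(@plus, X1, X2.');

X = Xsum(:);   % flatten to column vectors of length numel(X1)*numel(X2)
Y = Ysum(:);
plot(X, Y, '.')
```

Because Xsum and Ysum are flattened in the same order, X(n) and Y(n) always come from the same pair of indices.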
To build a figure with a more reasonable number of pairs picked randomly from these large vectors:
Npick=1e4;
sel1=randi(P,[Npick,1]);
sel2=randi(P,[Npick,1]);
Xsel=X1(sel1)+X2(sel2);
Ysel=Y1(sel1)+Y2(sel2);
plot(Xsel,Ysel,'.'); %Plot as small dots
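Note that randi(P,...) draws indices from the whole vector, whereas randi(R,R,1) in Option 3 of the question only ever picks among the first R elements. If a single batch of random pairs looks too sparse, one possible extension (not part of the original suggestion) is to accumulate several batches into a 2D histogram, so that no batch has to be kept in memory. The sketch below assumes histcounts2 is available (R2015b or later); P is reduced and the bin count, Npick and Nbatch are arbitrary choices.

```matlab
rng default
P = 1e5;                            % reduced from 1e8 so the sketch runs quickly
X1 = rand(1,P)*5;  X2 = rand(1,P)*5;
Y1 = rand(1,P)*5;  Y2 = rand(1,P)*5;

edges = linspace(0, 10, 101);       % sums of two values in [0,5] lie in [0,10]
centers = edges(1:end-1) + diff(edges)/2;
counts = zeros(numel(edges)-1);     % 100x100 accumulator
Npick = 1e4;                        % pairs per batch
Nbatch = 20;                        % number of batches

for b = 1:Nbatch
    sel1 = randi(P, [Npick,1]);     % indices into X1 and Y1
    sel2 = randi(P, [Npick,1]);     % indices into X2 and Y2
    Xsel = X1(sel1) + X2(sel2);
    Ysel = Y1(sel1) + Y2(sel2);
    counts = counts + histcounts2(Xsel, Ysel, edges, edges);
end

imagesc(centers, centers, counts.'); % transpose so x runs along the horizontal axis
axis xy
colorbar
```

Increasing Nbatch refines the density estimate while memory use stays fixed at the size of counts.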

Related

Linear Support Vector Machine Implementation in MATLAB (from scratch)

I am looking for some help on determining the linear decision boundary between two classes.
I've taken a look at the search results with no luck.
Implementing a linear, binary SVM (support vector machine) is similar but not quite on the mark.
My question comes down to how to pull the correct line equation out of the weight vector.
Given a matrix of test data X=[Xa Xb], where Xa=[Nx2] && Xb=[Nx2] data samples.
These are in two classes (-1,1) saved in [Nx1] vector Y=[1 1 ... 1 -1 -1... -1]'
I use MATLAB's quadprog.m to solve the quadratic program...
I understand that we want to solve w1x1 + w2x2 + wo=0, so in my code I solve for W=[w1 w2]; and I solve for wo=1/Y1 - WX1^T.
When implemented, my decision boundary plots as:
Clearly this is not what I want. The slope of the line looks legit, but I think I want to translate it north a little bit to optimize. In this picture the yellow dots are the support vectors.
It appears that I used the first data point, X(1,:) to be precise, and my line sucks. If I use different points it draws the line in different places, which makes sense algebraically, but how do I get the optimal boundary, not just a parallel boundary orthogonal to the weight vector?
Thanks!
If you're interested in the code, here it is:
function [alph,w,wo,sv]=svm_binary(Class1,Class2)
x1=Class1;
x2=Class2;
% Combine data into one set
xt=[x1;x2];
% Create class labels
y=[ones(length(x1),1); -1.*ones(length(x2),1)];
N=length(xt);
% Scatter plot of original data class data points
figure
scatter(x1(:,1),x1(:,2));
hold on
scatter(x2(:,1),x2(:,2));
legend('Class1','Class2')
xlabel('x1')
ylabel('x2')
title('Class Data Scatter Plot')
% Data component of Lagrangian dual
H=(xt*xt').*(y*y');
% Vector to flip signs
f=-ones(N,1);
%Constraint 1) a(i)>=0
A= -eye(N);
a=zeros(N,1);
% Constraint 2) sum[a(i)y(i)]=0
B=[y';zeros(N-1,N)];
b=zeros(N,1);
%Solve Quadratic Programming optimization for alpha
alph=quadprog(H+eye(N)*.001,f,A,a,B,b);
%Solve for W
w=(alph.*y)'*xt;
sv=[];
for i=1:length(xt)
if abs(alph(i))>=.0000001
sv=[sv i];
end
end
xtsv=xt(sv,:);
wo=1/y(1)-w*xt(1,:)';
if abs(w(1))<=.000001
y=-wo/w(2).*ones(round(max(xt(:,1))-min(xt(:,1))),1);
x=min(xt(:,2)):(max(xt(:,2))-min(xt(:,2)))/(length(y)-1):max(xt(:,2));
elseif abs(w(2))<=.000001
x=-wo/w(1).*ones(round(max(xt(:,2))-min(xt(:,2))),1);
y=min(xt(:,1)):(max(xt(:,1))-min(xt(:,1)))/(length(x)-1):max(xt(:,1));
else
x=round(min(xt(:,1))):round(max(xt(:,1)))
y=(w(1)/w(2)).*-x-wo/(w(2));
end
sv=[];
for i=1:length(xt)
if abs(alph(i))>=.0000001
sv=[sv i];
end
end
xtsv=xt(sv,:);
scatter(xtsv(:,1),xtsv(:,2),'filled','markeredgecolor','black','markerfacecolor','yellow');
% y=-(w(1).*x)-wo
length(x)
length(y)
hold on
plot(x,y)

Matlab, figures and for loops

I am trying to plot the following simple function; $y=A.*x$ with different A parameter values i.e. A=0,1,2,3 all on the same figure. I know how to plot simple functions i.e. $y=x$ by setting up x as a linspace vector so defining x=linspace(0,10,100); and I know that one can use the hold command.
I thought that one could simply use a for loop, but the problem then is getting a plot of all the permutations on one figure, i.e. I want a plot of y=t,2*t,3*t,4*t on the same figure. My attempt is as follows:
x=linspace(0,10,100);
%Simple example
Y=x;
figure;
plot(Y);
%Extension
B=3;
F=B*x;
figure;
plot(F);
%Attempt a for loop
for A= [0,1,2,3]
G=A*x;
end
figure;
plot(G);
This is how I would plot your for loop example:
figure;
hold all;
for A=[0,1,2,3]
G=A*x;
plot(G);
end
figure creates a new figure. hold all means that subsequent plots will appear on the same figure (hold all will use different colours for each plot as opposed to hold on). Then we plot each iteration of G within the loop.
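Since R2014b, hold on also cycles through the colour order, so it can be used in place of hold all. A minimal variant of the loop, plotting against x rather than against the sample index and adding a legend, might look like this (the names Avals and h are made up for illustration):

```matlab
x = linspace(0,10,100);
Avals = [0,1,2,3];

figure;
hold on;                            % keep every line on the same axes
h = gobjects(1, numel(Avals));      % preallocate line handles (R2013a or later)
for n = 1:numel(Avals)
    G = Avals(n)*x;
    h(n) = plot(x, G, 'DisplayName', sprintf('A = %d', Avals(n)));
end
hold off;
legend show                         % uses the DisplayName of each line
```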
You can also do it without the loop. As with most things in Matlab, removing the loop should give improved performance.
figure;
A=[0,1,2,3];
G=x'*A;
plot(G);
G is the outer product of the two vectors x and A (with x having been transposed into a column vector). plot is used to plot the columns of the 100x4 matrix G.
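As a quick check of the outer-product form, the sketch below confirms the size of G and compares one of its columns with the corresponding loop result. Passing x as the first argument to plot is an addition here, so that the horizontal axis shows x rather than the sample index.

```matlab
x = linspace(0,10,100);
A = [0,1,2,3];
G = x'*A;                           % (100x1)*(1x4) gives a 100x4 matrix
disp(size(G))                       % column n holds A(n)*x

% Compare column 3 with the loop-style result for A = 2
maxDiff = max(abs(G(:,3) - (A(3)*x)'));

figure;
plot(x, G);                         % one line per column of G
```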

Plotting 2 variables of different size in Matlab

I am trying to plot 2 variables of different lengths in a Matlab GUI using a push button,
but because the variables are of different lengths it does not work. Is there a way I can make it plot?
d= pdist([x,y,z],'euclidean') ; % value of my distance
dd= 1:10:d; % interval and end 'd' value
FSL=-120; %value of free space loss get from the GUI
DFSL= 1:10:FSL %interval and end at FSL value
plot(dd,DFSL)
The plot code didn't work and came back with this error:
Error using plot
Vectors must be the same lengths
You can plot vectors of two different lengths, but not against each other. You have used the syntax
plot(x,y)
which means for every element in vector x, there should be a corresponding element in vector y. In your case, you do not have this, hence the error.
You can plot like this though:
plot(x)
figure;
plot(y)
If you are looking to plot them in a single plot, subplot will be useful.
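A small sketch of the subplot approach follows. The vectors are made-up stand-ins with 10 and 12 elements, not the values from the question. The second half shows one possible way to plot one quantity against the other, assuming it is acceptable to resample both onto the same number of points with linspace.

```matlab
x = 1:10:95;                        % 10 points (made-up stand-in for dd)
y = -1:-10:-120;                    % 12 points (made-up stand-in for DFSL)

% Two vectors of different lengths in one figure, each on its own axes
figure;
subplot(2,1,1); plot(x); title('x');
subplot(2,1,2); plot(y); title('y');

% plot(x,y) needs equal lengths; resampling both ranges is one option
n = 12;
xs = linspace(x(1), x(end), n);
ys = linspace(y(1), y(end), n);
figure;
plot(xs, ys);
```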

How can I find equation of a plot connecting data points in Matlab?

I have various plots (with hold on) as shown in the following figure:
I would like to know how to find equations of these six curves in Matlab. Thanks.
I found the interactive fitting tool in Matlab simple and helpful, though somewhat limited in scope.
The graph above seems to be linear interpolation. Given vectors X and Y of data, where X contains the arguments and Y the function points, you could do
f = interp1(X, Y, x)
to get the linearly interpolated value f(x). For example if the data is
X = [0 1 2 3 4 5];
Y = [0 1 4 9 16 25];
then
y = interp1(X, Y, 1.5)
should give you a very rough approximation to 1.5^2. interp1 will match the graph exactly, but you might be interested in fancier curve-fitting operations, like spline approximations etc.
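To see how rough the linear estimate is, the same data can be interpolated with the 'spline' method of interp1 for comparison. This is only a sketch on the example data above; the names yLin and ySpl are made up.

```matlab
X = [0 1 2 3 4 5];
Y = [0 1 4 9 16 25];

yLin = interp1(X, Y, 1.5);             % linear (default): halfway between 1 and 4
ySpl = interp1(X, Y, 1.5, 'spline');   % cubic spline through the same points

fprintf('linear: %g, spline: %g, exact 1.5^2: %g\n', yLin, ySpl, 1.5^2);
```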
Does rxns stand for reactions? In that case, your curves are most likely exponential. An exponential function has the form: y = a*exp(b * x) . In your case, y is the width of mixing zone, and x is the time in years. Now, all you need to do is run exponential regression in Matlab to find the optimal values of parameters a and b, and you'll have your equations.
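One simple way to run such a regression without any toolbox is to fit a straight line to log(y) with polyfit. The sketch below uses synthetic data with made-up parameters (a = 2, b = 0.3) and assumes all y values are positive. Fitting in log space weights the errors differently from a direct nonlinear fit, so treat it as a first estimate.

```matlab
% Synthetic data standing in for one curve (a = 2 and b = 0.3 are made up)
x = (0:10)';
y = 2*exp(0.3*x);

% log(y) = log(a) + b*x is a straight line in x
c = polyfit(x, log(y), 1);          % c(1) is the slope b, c(2) is log(a)
b = c(1);
a = exp(c(2));

yFit = a*exp(b*x);
plot(x, y, 'o', x, yFit, '-');
fprintf('a = %g, b = %g\n', a, b);
```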
My advice, though there may be a better answer, is to look at the rate of increase of the curve. For example, a cubic is more representative than a quadratic if the rate of increase seems fast; find the polynomial and compute the deviation error. For irregular curves, you might try spline fitting. I believe there is also a toolbox in Matlab for spline fitting.
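Comparing polynomial degrees by their deviation error could be sketched as follows. The data are made up (a cubic plus a little noise), and the norm of the residual is used as the deviation measure, which is one choice among several.

```matlab
% Made-up data: a cubic trend plus a little noise
rng default
x = linspace(0, 5, 30)';
y = 0.5*x.^3 - x + 0.1*randn(size(x));

err = zeros(1,3);
for deg = 1:3
    p = polyfit(x, y, deg);               % least-squares polynomial of this degree
    err(deg) = norm(y - polyval(p, x));   % deviation between data and fit
end
disp(err)                                 % one error value per degree 1, 2, 3
```

A large drop in error from one degree to the next suggests the higher degree captures the shape better; a negligible drop suggests the extra term is not needed.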
There is a way to extract information from your graph using the current figure handle (gcf).
For example, you can get the series that were plotted in a graph:
% Some figure is created and data are plotted on it
figure;
hold on;
A = [1 2 3 4 5 7]; % Dummy data
B = A.*A; % Some other dummy data
plot(A,B);
plot(A.*3,B-1);
% These three lines of code retrieve the series that were plotted on your graph
lh=findall(gcf,'type','line'); % Find the plotted line objects in the current figure
xp=get(lh,'xdata'); % Extract the Xs (a cell array when there are several lines)
yp=get(lh,'ydata'); % Extract the Ys
You can probably retrieve other information with findall(gcf,...) as well.
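Once the data have been pulled out of the figure, each series can be passed to a fitting routine to get an equation. The sketch below repeats the dummy plot in a hidden figure and fits a quadratic to every line with polyfit. The degree 2 is an assumption that suits this dummy data (B = A.*A) and would need to be chosen to match the real curves.

```matlab
figure('Visible','off');            % hidden figure, just for the demonstration
hold on;
A = [1 2 3 4 5 7];
B = A.*A;
plot(A, B);
plot(A.*3, B-1);

lh = findall(gcf, 'type', 'line');  % handles of all line objects in the figure
xp = get(lh, 'xdata');              % cell array: one cell per line
yp = get(lh, 'ydata');

for n = 1:numel(lh)
    p = polyfit(xp{n}, yp{n}, 2);   % quadratic coefficients, highest power first
    fprintf('line %d (%d points): ', n, numel(xp{n}));
    disp(p)
end
```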

matlab parfor leads to larger execution time than a for loop

I have a 3-dimensional grid. For each point of the grid I want to calculate a time-dependent function G(t) for a large number of time steps and then sum the G function for each grid point. With 4 nested for loops the execution time becomes very large, so I am trying to avoid this using parfor.
Part of my code:
for i=1:50
for j=1:50
for k=1:25
x_in=i*dx;
y_in=j*dy;
z_in=k*dz;
%dx,dy, dz are some fixed values
r=sqrt((xx-x_in).^2+(yy-y_in).^2+(zz-z_in).^2);
%xx,yy,zz are 50x50x25 matrices generated from meshgrid
% r is a 3d matrix which produced from a 3 for-loop, for all the points of grid
parfor q=1:100
t=0.5*q;
G(q)=((a*p)/(t.^1.5)).*(exp(-r.^2/(4*a*t)));
% a,p are some fixed values
end
GG(i,j,k)=sum(G(:));
end
end
end
When I am using parfor the execution time becomes larger, and I am not sure why this is happening. Maybe I am not so familiar with sliced and indexed variables on a parfor loop.
My PC's processor has 8 threads, and I have 8 GB of DDR3 RAM.
Any help will be great.
Thanks
As has been discussed in a previous question, parfor comes with an overhead. Therefore, a loop that is too simple will execute more slowly with parfor than with a plain for loop.
In your case, the solution may be to parallelize the outermost loops.
%# preassign GG
GG = zeros(50,50,25);
%# loop over indices into GG
parfor idx = 1:(50*50*25)
[i,j,k] = ind2sub([50 50 25],idx);
x_in=i*dx;
y_in=j*dy;
z_in=k*dz;
%dx,dy, dz are some fixed values
r=sqrt((xx-x_in).^2+(yy-y_in).^2+(zz-z_in).^2);
%xx,yy,zz are 50x50x25 matrices generated from meshgrid
% r is a 3d matrix which produced from a 3 for-loop, for all the points of grid
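G=zeros(1,100); %# temporary variable, initialized in every parfor iteration
for q=1:100
t=0.5*q;
%# r is 50x50x25, so sum over the grid to get one scalar per time step
%# (assumed intent; the total matches sum(G(:)) taken over the full matrices)
Gq=((a*p)/(t.^1.5)).*(exp(-r.^2/(4*a*t)));
G(q)=sum(Gq(:));
% a,p are some fixed values
end
GG(idx)=sum(G(:));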
end
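For reference, here is a self-contained sketch of the same idea on a reduced grid, so it runs in seconds. All parameter values (dx, dy, dz, a, p, the grid size and the number of time steps) are made up, and the sum over the grid inside the time loop is an assumption about the intended calculation. Without the Parallel Computing Toolbox, parfor simply runs as an ordinary loop.

```matlab
% Made-up parameters and a reduced grid
sz = [5 5 3];                       % grid size (the question uses 50x50x25)
nq = 10;                            % number of time steps (the question uses 100)
dx = 0.1; dy = 0.1; dz = 0.1; a = 1; p = 1;
[xx, yy, zz] = meshgrid((1:sz(2))*dx, (1:sz(1))*dy, (1:sz(3))*dz);

GG = zeros(sz);                     % preallocate the sliced output
parfor idx = 1:prod(sz)
    [i, j, k] = ind2sub(sz, idx);   % recover the three grid indices
    r = sqrt((xx - i*dx).^2 + (yy - j*dy).^2 + (zz - k*dz).^2);
    total = 0;                      % scalar accumulator, local to this iteration
    for q = 1:nq
        t = 0.5*q;
        term = ((a*p)/(t^1.5)) .* exp(-r.^2/(4*a*t));
        total = total + sum(term(:));
    end
    GG(idx) = total;                % one scalar per grid point
end
```

Each parfor iteration now does a substantial amount of work (a full time loop over a 3D array), which is what makes the parallel overhead worth paying.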