Stochastic gradient descent implementation - MATLAB

I'm trying to implement stochastic gradient descent in MATLAB. I followed the algorithm exactly, but I'm getting very large w (coefficients) for the prediction/fitting function. Is there a mistake in my algorithm?
The Algorithm :
x = 0:0.1:2*pi;              % x-axis
n = size(x,2);
r = -0.2 + 0.4*rand(n,1);    % random noise to be added to sin(x)
t = zeros(1,n);
y = zeros(1,n);
for i = 1:n
    t(i) = sin(x(i)) + r(i); % adding the noise
    y(i) = sin(x(i));        % the function without noise
end
f = randi(n,20,1);           % 20 random indices in 1..n
h = x(f);                    % randomly chosen x points
k = t(f);                    % corresponding noisy y points
m = size(h,2);               % length of the h vector
scatter(h,k,'Red');          % draw the training points (with noise)
%scatter(x,t,2);
hold on;
plot(x,sin(x));              % plot the sin function
w = [0.3 1 0.5];             % starting point of w
a = 0.05;                    % learning rate "alpha"
% ---------------- ALGORITHM ----------------
for i = 1:20
    v = [1 h(i) h(i)^2];     % feature vector
    e = ((w*v') - k(i)).*v;  % (prediction - observation) times the features
    w = w - a*e;             % update w
end
hold on;
l = 0:1:6;
g = w(1) + w(2)*l + w(3)*l.^2;
plot(l,g,'Yellow');          % draw the fitted function

If you use too large a learning rate, SGD is likely to diverge.
The learning rate should decay toward zero over the iterations.
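For illustration, a minimal sketch of the same update loop with a decaying step size (the 1/(1+i) schedule is an assumption of this sketch, reusing w, h, and k from the question):
a0 = 0.05;                  % initial learning rate
for i = 1:20
    v = [1 h(i) h(i)^2];    % feature vector for the i-th sample
    e = ((w*v') - k(i)).*v; % per-sample gradient of the squared error
    w = w - (a0/(1+i))*e;   % step size shrinks as the iterations proceed
end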

Typically, if w ends up with very large values, there is overfitting. I didn't look at your code very carefully, but I think what is missing is a proper regularization term, which prevents the training from overfitting. Also, here:
e = ((w*v') - k(i)).*v;
The v here is not the gradient of the predicted value, is it? According to the algorithm, you should replace it. Let's see what it looks like after doing that.
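For illustration, a hedged sketch of what an L2 (ridge) penalty could look like in this update; lambda is an assumed value, and for simplicity the bias term w(1) is penalized along with the rest:
lambda = 0.01;                          % assumed regularization strength
for i = 1:20
    v = [1 h(i) h(i)^2];
    e = ((w*v') - k(i)).*v + lambda*w;  % squared-error gradient plus the L2 penalty gradient
    w = w - a*e;                        % a is the learning rate from the question
end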


Vectors must be the same length error in Curve Fitting in Matlab

I'm having problems curve fitting my randomized data to the function below (a normal density; mu and the standard deviation are known).
Here is my code:
N = 100;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo'); hold on;
x_ = x(1):0.1:x(end);
y_ = (1./sqrt(8.*pi)).*exp(-((x_-mu).^2)./8);
plot(x_,y_,'b-'); hold on;
It seems like I'm having vector size problems, since it gives me the error
Error using plot
Vectors must be the same length.
Note that I simplified y_ since mu and the standard deviation are known.
Well, first of all, some adjustments to your question:
You are not trying to do curve fitting. What you are trying to do (in my opinion) is to overlay a probability density function on a histogram obtained by taking random points from the same distribution (a normal distribution with parameters (mu, sigma)). These two curves should indeed overlap, as they represent the same thing; one is analytical and the other is obtained numerically.
As seen in the hist documentation, hist is not recommended and you should use histogram instead
First step: Generating your random data
Knowing the distribution is the normal distribution, we can use MATLAB's random function to do that:
N = 150;
rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,N,1);
Second step: Plot the histogram
Because we don't just want a count of the elements in each bin, but an estimate of the probability density function, we can use the 'Normalization','pdf' name-value pair:
Nbins = 25;
f=histogram(r,Nbins,'Normalization','pdf');
hold on
Here I'd rather specify a number of bins than specify the bins themselves, because you never know in advance how far from the mean your data is going to be.
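That said, histogram also accepts explicit bin edges if you do want them; a small sketch reusing the edge spacing from the question (with the mu and sigma defined above):
edges = mu-6*sigma : 0.5 : mu+6*sigma;       % same edges as the original bin vector
histogram(r, edges, 'Normalization', 'pdf');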
Last step: overlay the probability density function over the histogram
The histogram being already consistent with a probability density function, it is sufficient to just overlay the density function:
x_ = linspace(min(r),max(r),100);
y_ = (1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'b-');
With N = 150
With N = 1500
With N = 150,000 and Nbins = 50
If for some obscure reason you want to use the old hist() function
The old hist() function can't handle normalization, so you'll have to do it by hand, by normalizing your density function to fit your histogram:
N = 1500;
% rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,1,N);
Nbins = 50;
[~,centers]=hist(r,Nbins);
hist(r,Nbins); hold on
% Width of bins
Widths = diff(centers);
x_ = linspace(min(r),max(r),100);
y_ = N*mean(Widths)*(1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'r-');

Calculating the sum of two triangular random variables (MATLAB)

I would like to calculate the probability
P(x1+x2 < y)
for the sum of two triangular random variables. Is there a faster way to implement the sum of two triangular random variables in MATLAB?
EDIT: It seems there's possibly a much easier way, as shown in this Minitab demonstration, so it's not impossible. Sadly, it doesn't explain how the PDF was calculated. Still looking into how I can do this in MATLAB.
EDIT2: Following advice, I'm using the conv function in MATLAB to build the PDF of the sum of the two random variables:
clear all;
clc;
pd1 = makedist('Triangular','a',85,'b',90,'c',100);
pd2 = makedist('Triangular','a',90,'b',100,'c',110);
x = linspace(85,290,200);
x1 = linspace(85,100,200);
x2 = linspace(90,110,200);
pdf1 = pdf(pd1,x1);
pdf2 = pdf(pd2,x2);
z = median(diff(x))*conv(pdf1,pdf2,'same');
p1 = trapz(x1,pdf1) %probability P(x1<y)
p2 = trapz(x2,pdf2) %probability P(x2<y)
p12 = trapz(x,z) %probability P(x1+x2 <y)
hold on;
plot(x1,pdf1) %plot pdf of dist. x1
plot(x2,pdf2) %plot pdf of dist. x2
plot(x,z) %plot pdf of x1+x2
hold off;
However this code has two problems:
The PDF of X1+X2 integrates to much more than 1.
The PDF of X1+X2 varies widely depending on the range of x. Intuitively, since X1+X2 can never exceed 210 (the sum of the upper limits "c" of the two triangular distributions, 100 + 110), shouldn't P(X1+X2 < 210) equal 1? Also, since the lower limits "a" are 85 and 90, shouldn't P(X1+X2 < 85) be 0?
The pdf of the sum of independent variables is the convolution of the pdfs of the variables. So you need to compute the convolution of two variables with triangular pdfs. A triangle is piecewise linear, so the convolution will be piecewise quadratic.
There are a few ways to go about it. If a numerical result is acceptable: discretize the pdfs and compute the convolution of the discretized pdfs. There is a function conv in MATLAB for that. If not, you can take the fast Fourier transform (via fft), compute the product point by point, then take the inverse transform (via ifft), since fft(convolution(f, g)) = fft(f) fft(g). You will need to be careful about normalization if you use either conv or fft.
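For what it's worth, a minimal sketch of that FFT route using the same two triangular distributions as the question (the grid spacing dx and the zero-padding length are assumptions of this sketch):
dx = 0.01;
x1 = 85:dx:100;
x2 = 90:dx:110;
pdf1 = pdf(makedist('Triangular','a',85,'b',90,'c',100), x1);
pdf2 = pdf(makedist('Triangular','a',90,'b',100,'c',110), x2);
n = numel(pdf1) + numel(pdf2) - 1;               % full linear-convolution length
pdf12 = real(ifft(fft(pdf1,n).*fft(pdf2,n)))*dx; % zero-padded FFTs give the linear convolution
x12 = x1(1) + x2(1) + (0:n-1)*dx;                % support of the sum: [175, 210]
trapz(x12, pdf12)                                % should be close to 1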
If you must have an exact result, the convolution is just an integral, and if you're careful with the limits of integration, you can figure it out by hand. I don't know if the MATLAB Symbolic Math Toolbox is available to you, and if so, whether it can handle integrals of functions defined piecewise.
Below is the proper implementation for future users. Many thanks to Robert Dodier for guidance.
clear all;
clc;
min1 = 85;
max1 = 100;
min2 = 90;
max2 = 110;
y = 210;
pd1 = makedist('Triangular','a',min1,'b',90,'c',max1);
pd2 = makedist('Triangular','a',min2,'b',100,'c',max2);
dx = 0.01; % to ensure constant spacing
x1 = min1:dx:max1; % Could include some of the region where
x2 = min2:dx:max2; % the pdf is 0, but we don't have to.
x12 = linspace(...
x1(1) + x2(1) , ...
x1(end) + x2(end) , ...
length(x1)+length(x2)-1);
[~,index] = min(abs(x12-y)); % grid point closest to the threshold y
x_short = linspace(min1+min2,x12(index),index);
pdf1 = pdf(pd1,x1);
pdf2 = pdf(pd2,x2);
pdf12 = conv(pdf1,pdf2)*dx; % discretized convolution, scaled by dx
zz = pdf12(1:index);        % truncate at y for P(x1+x2 < y)
zz(index) = 0;              % close the outline so fill() draws a closed region
p1 = trapz(x1,pdf1)
p2 = trapz(x2,pdf2)
p12 = trapz(x_short,zz)
plot(x1,pdf1,x2,pdf2,x12,pdf12)
hold on;
fill(x_short,zz,'blue') % plot x1+x2
hold off;

I need to perform spectral clustering on a two-donut-shaped dataset (MATLAB)

I have tried for hours but I cannot find a solution.
I have a "two donuts" data sample (variable "X"), which you can download from the link below:
donut dataset (rings.mat)
The points form the 2D shape shown in the image below.
The first 250 points are located inside the donut and the last 750 points outside,
and I need to perform spectral clustering.
I made a similarity matrix "W" with the Gaussian similarity measure,
and I made a degree matrix from the sum of each row of "W",
and then I computed the eigenvalues (E) and eigenvectors (V),
and the shape of "V" is not good.
What is wrong with my attempt?
I cannot figure it out.
load rings.mat
[D, N] = size(X); % data stored in X
%initial plot data
figure; hold on;
for i=1:N,
plot(X(1,i), X(2,i),'o');
end
% perform spectral clustering
W = zeros(N,N);
D = zeros(N,N);
sigma = 1;
for i=1:N,
for j=1:N,
xixj2 = (X(1,i)-X(1,j))^2 + (X(2,i)-X(2,j))^2 ;
W(i,j) = exp( -1*xixj2 / (2*sigma^2) ) ; % compute weight here
% if (i==j)
% W(i,j)=0;
% end;
end;
D(i,i) = sum(W(i,:)) ;
end;
L = D - W ;
normL = D^-0.5*L*D^-0.5;
[u,s,v] = svd(normL);
If you use the Laplacian as it is in your code (the "real" Laplacian), then to cluster your points into two sets you will want the eigenvector corresponding to the second smallest eigenvalue.
The intuitive idea is to connect all of your points to each other with springs, where the springs are stiffer if the points are near each other, and less stiff for points far away. The eigenvectors of the Laplacian are the modes of vibration you get if you hit your spring network with a hammer and watch it oscillate - smaller eigenvalues correspond to lower-frequency "bulk" modes, and larger eigenvalues to higher-frequency oscillations. You want the eigenvector corresponding to the second smallest eigenvalue, which will be like the second mode of a drum, with the positive part clustered together and the negative part clustered together.
Now there is some confusion in the comments about whether to use the largest or smallest eigenvalue, and it is because the Laplacian in the paper linked there by dave is slightly different, being the identity minus your Laplacian. So there they want the largest eigenvalues, whereas you want the smallest. The clustering in the paper is also a bit more advanced, and better, but not as easy to implement.
Here is your code, modified to work:
load rings.mat
[D, N] = size(X); % data stored in X
%initial plot data
figure; hold on;
for i=1:N,
plot(X(1,i), X(2,i),'o');
end
% perform spectral clustering
W = zeros(N,N);
D = zeros(N,N);
sigma = 0.3; % <--- Changed to be smaller
for i=1:N,
for j=1:N,
xixj2 = (X(1,i)-X(1,j))^2 + (X(2,i)-X(2,j))^2 ;
W(i,j) = exp( -1*xixj2 / (2*sigma^2) ) ; % compute weight here
% if (i==j)
% W(i,j)=0;
% end;
end;
D(i,i) = sum(W(i,:)) ;
end;
L = D - W ;
normL = D^-0.5*L*D^-0.5;
[u,s,v] = svd(normL);
% New code below this point
cluster1 = find(u(:,end-1) >= 0);
cluster2 = find(u(:,end-1) < 0);
figure
plot(X(1,cluster1),X(2,cluster1),'.b')
hold on
plot(X(1,cluster2),X(2,cluster2),'.r')
hold off
title(sprintf('sigma=%g',sigma)) % %g prints the non-integer sigma correctly
Here is the result:
Now notice that I changed sigma to be smaller - from 1.0 to 0.3. When I left it at 1.0, I got the following result:
which I assume is because with sigma=1, the points in the inner cluster were able to "pull" on the outer cluster (which they are about distance 1 away from) enough so that it was more energetically favorable to split both circles in half like a solid vibrating drum, rather than have two different circles.
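As a side note, since normL is symmetric, the same second-smallest eigenvector can be taken from eig directly instead of from the end of svd's U. A minimal sketch (the variable names here are my own, not from the original code):
[V,E] = eig(normL);
[~,order] = sort(diag(E),'ascend'); % eigenvalues sorted from smallest to largest
fiedler = V(:,order(2));            % eigenvector of the second smallest eigenvalue
cluster1 = find(fiedler >= 0);      % sign may be flipped relative to svd's u(:,end-1),
cluster2 = find(fiedler < 0);       % which only swaps the two cluster labels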

Gradient descent in linear regression goes wrong

I actually want to use a linear model to fit a set of 'sin' data, but it turns out the loss grows larger with each iteration. Is there any problem with my code below? (gradient descent method)
Here is my code in Matlab
m = 20;
rate = 0.1;
x = linspace(0,2*pi,20);
y = sin(x);                 % compute the targets before augmenting x
x = [ones(1,length(x)); x]; % prepend a bias row
w = rand(1,2);
total_loss = [];            % initialize the loss history
for i=1:500
    h = w*x;
    loss = sum((h-y).^2)/m/2
    total_loss = [total_loss loss];
    gradient = (h-y)*x'./m;
    w = w - rate.*gradient;
end
Here is the data I want to fit
There isn't a problem with your code. With your current framework, if you can define data in the form of y = m*x + b, then this code is more than adequate. I actually ran it through a few tests where I define an equation of the line and add some Gaussian random noise to it (amplitude = 0.1, mean = 0, std. dev = 1).
However, one problem I will mention is that if you take a look at your sinusoidal data, you define a domain between [0, 2*pi]. As you can see, several x values map to y values of the same magnitude but opposite sign. For example, at x = pi/2 we get 1, but at x = 3*pi/2 we get -1. This high variability will not bode well for linear regression, so one suggestion I have is to restrict your domain... something like [0, pi]. Another reason why it probably doesn't converge is that the learning rate you chose is too high. I'd set it to something low, like 0.01. As you mentioned in your comments, you already figured that out!
However, if you want to fit non-linear data using linear regression, you're going to have to include higher order terms to account for the variability. As such, try including second order and/or third order terms. This can simply be done by modifying your x matrix like so:
x = [ones(1,length(x)); x; x.^2; x.^3];
If you recall, the hypothesis function can be represented as a summation of linear terms:
h(x) = theta0 + theta1*x1 + theta2*x2 + ... + thetan*xn
In our case, each theta term would build a higher order term of our polynomial. x2 would be x^2 and x3 would be x^3. Therefore, we can still use the definition of gradient descent for linear regression here.
I'm also going to control the random generation seed (via rng) so that you can produce the same results I have gotten:
clear all;
close all;
rng(123123);
total_loss = [];
m = 20;
x = linspace(0,pi,m); %// Change
y = sin(x);
w = rand(1,4); %// Change
rate = 0.01; %// Change
x = [ones(1,length(x)); x; x.^2; x.^3]; %// Change - Second and third order terms
for i=1:500
h = w*x;
loss = sum((h-y).^2)/m/2;
total_loss = [total_loss loss];
% gradient is now in a different expression
gradient = (h-y)*x'./m ; % sum all in each iteration, it's a batch gradient
w = w - rate.*gradient;
end
If we try this, we get for w (your parameters):
>> format long g;
>> w
w =
Columns 1 through 3
0.128369521905694 0.819533906064327 -0.0944622478526915
Column 4
-0.0596638117151464
My final loss after this point is:
loss =
0.00154350916582836
This means that our equation of the line is:
y = 0.12 + 0.819x - 0.094x^2 - 0.059x^3
If we plot this equation of the line with your sinusoidal data, this is what we get:
xval = x(2,:);
plot(xval, y, xval, polyval(fliplr(w), xval))
legend('Original', 'Fitted');

Plotting an ellipse in MATLAB given in matrix form

I have an ellipse in 2 dimensions, defined by a positive definite matrix X as follows: a point x is in the ellipse if x'*X*x <= 1. How can I plot this ellipse in MATLAB? I've done a bit of searching but found surprisingly little.
Figured out the answer actually: I'd post this as an answer, but it won't let me (new user):
Figured it out after a bit of tinkering. Basically, we express the points on the ellipse border (x'*X*x = 1) as a weighted combination of the eigenvectors of X, which makes some of the math to find the points easier. We can just write (au+bv)'X(au+bv)=1 and work out the relationship between a,b. Matlab code follows (sorry it's messy, just used the same notation that I was using with pen/paper):
function plot_ellipse(X, varargin)
% Plots an ellipse of the form x'*X*x <= 1
% plot vectors of the form a*u + b*v where u,v are eigenvectors of X
[V,D] = eig(X);
u = V(:,1);
v = V(:,2);
l1 = D(1,1);
l2 = D(2,2);
pts = [];
delta = .1;
for alpha = -1/sqrt(l1):delta:1/sqrt(l1)  % keep |alpha| <= 1/sqrt(l1) so beta stays real
    beta = sqrt((1 - alpha^2 * l1)/l2);   % upper half of the ellipse
    pts(:,end+1) = alpha*u + beta*v;
end
for alpha = 1/sqrt(l1):-delta:-1/sqrt(l1)
    beta = -sqrt((1 - alpha^2 * l1)/l2);  % lower half of the ellipse
    pts(:,end+1) = alpha*u + beta*v;
end
plot(pts(1,:), pts(2,:), varargin{:})
I stumbled across this post while searching for this topic, and even though it's settled, I thought I might provide another simpler solution, if the matrix is symmetric.
Another way of doing this is to use the Cholesky decomposition of the positive definite matrix E, implemented in MATLAB as the chol function. It computes an upper triangular matrix R such that E = R' * R. Using this, x'*E*x = (R*x)'*(R*x) = z'*z, if we define z as R*x.
The curve to plot is thus such that z'*z = 1, which is a circle. A simple parametrization is z = (cos(t), sin(t)) for 0 <= t <= 2*pi. You then multiply by the inverse of R to get the ellipse.
This is pretty straightforward to translate into the following code:
function plot_ellipse(E)
% plots an ellipse of the form xEx = 1
R = chol(E);
t = linspace(0, 2*pi, 100); % or any high number to make curve smooth
z = [cos(t); sin(t)];
ellipse = R \ z; % solve R*ellipse = z; preferable to forming inv(R)
plot(ellipse(1,:), ellipse(2,:))
end
Hope this might help!
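For completeness, a small usage sketch that works with either function above (the test matrix is an arbitrary positive definite example, not from the original posts):
E = [2 0.5; 0.5 1]; % symmetric positive definite test matrix
plot_ellipse(E)
axis equal          % equal axis scaling so the ellipse is not visually distorted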