MATLAB Perceptron

I've been struggling with this for quite some time now. I can't seem to figure out why I get a percentage error in the thousands. I'm trying to train a perceptron on X1 and X2, which are Gaussian-distributed data sets with distinct means and identical covariances. My code:
N=200;
X = [X1; X2];
X = [X ones(N,1)]; %bias
y = [-1*ones(N/2,1); ones(N/2,1)]; %classification
%Split data into training and test
ii = randperm(N);
Xtr = X(ii(1:N/2),:);
ytr = X(ii(1:N/2),:);
Xts = X(ii(N/2+1:N),:);
yts = y(ii(N/2+1:N),:);
w = randn(3,1);
eta = 0.001;
%learn from training set
for iter=1:500
j = ceil(rand*N/2);
if( ytr(j)*Xtr(j,:)*w < 0)
w = w + eta*Xtr(j,:)';
end
end
%apply what you have learnt to test set
yhts = Xts * w;
disp([yts yhts])
PercentageError = 100*sum(find(yts .*yhts < 0))/Nts;
Any help would be appreciated. Thank you

You have a bug in your error calculation.
On this line:
PercentageError = 100*sum(find(yts .*yhts < 0))/Nts;
The find is returning the indices of the matching items. For your error measure you don't want to sum those indices; you just want the count:
PercentageError = 100*sum( yts .*yhts < 0 )/Nts;
If I generate X1 = randn(100,2); X2 = randn(100,2); and assume Nts=100, I get 2808% for your code, and the expected 50% error for the corrected version (no better than guessing, because my test data cannot be separated).
Update - the perceptron model had a more subtle bug too, see: https://datascience.stackexchange.com/questions/2353/matlab-perceptron
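For reference, the textbook perceptron update scales the correction by the training label. A minimal sketch of what the training loop usually looks like (this is the standard rule, not necessarily the exact fix discussed in the linked post; note also that ytr in your code is built from X where, by analogy with yts, it should come from y):
ytr = y(ii(1:N/2),:);                 % labels for the training split, taken from y
for iter = 1:500
j = ceil(rand*size(Xtr,1));           % pick a random training sample
if ytr(j)*Xtr(j,:)*w < 0              % sample is misclassified
w = w + eta*ytr(j)*Xtr(j,:)';         % correction scaled by the label
end
end
With your setup of distinct means (e.g. X2 = randn(100,2) + 3), the corrected error should then drop well below 50%.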

Related

How to perform adaptive step size using Runge-Kutta fourth order (Matlab)?

For me, the estimated step size h seems to take quite a long time and many iterations to converge.
I tried it with this first ODE.
Basically, you take the difference between RK4 with step size h and RK4 with step size h/2. Note that to reach the same time value, you have to take two steps of size h/2 so that they cover the same interval as one step of size h.
frhs=@(x,y) x.^2*y;
Is my code correct?
clear all;close all;clc
c=[]; i=1; U_saved=[]; y_array=[]; y_array_alt=[];
y_arr=1; y_arr_2=1;
frhs=@(x,y) 20*cos(x);
tol=0.001;
y_ini= 1;
y_ini_2= 1;
c=abs(y_ini-y_ini_2)
hc=1
all_y_values=[];
for m=1:500
if (c>tol || m==1)
fprintf('More')
y_arr
[Unew]=vpa(Runge_Kutta(0,y_arr,frhs,hc))
if (m>1)
y_array(m)=vpa(Unew);
y_array=y_array(logical(y_array));
end
[Unew_alt]=Runge_Kutta(0,y_arr_2,frhs,hc/2);
[Unew_alt]=vpa(Runge_Kutta(hc/2,Unew_alt,frhs,hc/2))
if (m>1)
y_array_alt(m)=vpa(Unew_alt);
y_array_alt=y_array_alt(logical(y_array_alt));
end
fprintf('More')
%y_array_alt(m)=vpa(Unew_alt);
c=vpa(abs(Unew_alt-Unew) )
hc=abs(tol/c)^0.25*hc
if (c<tol)
fprintf('Less')
y_arr=vpa(y_array(end) )
y_arr_2=vpa(y_array_alt(end) )
[Unew]=Runge_Kutta(0,y_arr,frhs,hc)
all_y_values(m)=Unew;
[Unew_alt]=Runge_Kutta(0,y_arr_2,frhs,hc/2);
[Unew_alt]=Runge_Kutta(hc/2,Unew_alt,frhs,hc/2)
c=vpa( abs(Unew_alt-Unew) )
hc=abs(tol/c)^0.2*hc
end
end
end
all_y_values
A better structure for the time loop has only one place where the time step is computed.
x_array = [x0]; y_array = [y0]; h = h_init;
x = x0; y = y0;
while x < x_end
[y_new, err] = RK4_step_w_error(x,y,rhs,h);
factor = abs(tol/err)^0.2;
if factor >= 1
y = y_new;
x = x + h;
y_array(end+1) = y;
x_array(end+1) = x;
end
h = factor*h;
end
For the data given in the code
rhs = #(x,y) 20*cos(x);
x0 = 0; y0 = 1; x_end = 6.5; tol = 1e-3; h_init = 1;
this gives computed points that lie exactly on the exact solution. For the segments between them one would need to use a "dense output" interpolation, or, as a first improvement, just include the middle value from the half-step computation (sketched below, after the helper functions).
function [ y_next, err] = RK4_step_w_error(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y1 = RK4_step(x,y,rhs,h/2);
y1 = RK4_step(x+h/2,y1,rhs,h/2);
y_next = y1;
err = (y2-y1)/15;
end
function y_next = RK4_step(x,y,rhs,h)
k1 = h*rhs(x,y);
k2 = h*rhs(x+h/2,y+k1/2);
k3 = h*rhs(x+h/2,y+k2/2);
k4 = h*rhs(x+h,y+k3);
y_next = y + (k1+2*k2+2*k3+k4)/6;
end
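As a rough sketch of the "include the middle value" idea from above (the name RK4_step_w_error_mid and the extra output are additions made here, not part of the code above): return the half-step value as well and append it on accepted steps.
function [y_next, err, y_mid] = RK4_step_w_error_mid(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y_mid = RK4_step(x,y,rhs,h/2);        % value at x+h/2, usable as an extra plot point
y1 = RK4_step(x+h/2,y_mid,rhs,h/2);
y_next = y1;
err = (y2-y1)/15;
end
On an accepted step one would then append x+h/2, y_mid to the output arrays before x+h, y_new.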
Revision 1
The error returned is the actual step error. The error that is required for the step size control, however, is the unit step error or error density, which is the step error divided by h:
function [ y_next, err] = RK4_step_w_error(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y1 = RK4_step(x,y,rhs,h/2);
y1 = RK4_step(x+h/2,y1,rhs,h/2);
y_next = y1;
err = (y2-y1)/15/h;
end
Changing the example to a simple bi-stable model oscillating between two branches of stable equilibria
rhs = #(x,y) 3*y-y^3 + 3*cos(x);
x0 = 0; y0 = 1; x_end = 13.5; tol = 5e-3; h_init = 5e-2;
gives plots of the solution, the error (measured against an ode45 integration) and the step sizes.
Red crosses are the step sizes of rejected steps.
Revision 2
The error in the function values can be used as an error estimate for the extrapolated value, which is of 5th order, making the method a 5th-order method in extrapolation mode. As it uses the 4th-order error to predict the 5th-order optimal step size, a safety factor is recommended; the code changes in the appropriate places to
factor = 0.75*abs(tol/err)^0.2;
...
function [ y_next, err] = RK4_step_w_error(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y1 = RK4_step(x,y,rhs,h/2);
y1 = RK4_step(x+h/2,y1,rhs,h/2);
y_next = y1+(y1-y2)/15;
err = (y1-y2)/15;
end
In the plots the step size is appropriately larger, but the error shows sharper and larger spikes; this version of the method is apparently less stable.

Stochastic gradient descent algorithm in MATLAB

I'm trying to implement stochastic gradient descent in MATLAB, but I'm going wrong somewhere. I think that maybe the way I am checking for convergence is incorrect (I wasn't quite sure how to update the estimator with each iteration), but I'm not sure. I've been trying just to fit basic linear data, but I'm getting results that are pretty far off and I'm hoping to get some help. Would someone be able to point out where I'm going wrong, and why this isn't working correctly?
Thanks!
Here is the data set up and general code:
clear all;
close all;
clc
N_features = 2;
d = 100;
m = 100;
X_train = 10*rand(d,1);
X_test = 10*rand(d,1);
X_train = [ones(d,1) X_train];
X_test = [ones(d,1) X_test];
y_train = 5 + X_train(:,2) + 0.5*randn(d,1);
y_test = 5 + X_test(:,2) + 0.5*randn(d,1);
gamma = 0.01; %learning rate
[sgd_est_train,sgd_est_test,SSE_train,SSE_test,w] = stoch_grad(d,m,N_features,X_train,y_train,X_test,y_test,gamma);
figure(1)
plot(X_train(:,2),sgd_est_train,'ro',X_train(:,2),y_train,'go')
figure(2)
plot(X_test(:,2),sgd_est_test,'bo',X_test(:,2),y_test,'go')
and the function that actually implements the SGD is:
% stochastic gradient descent
function [sgd_est_train,sgd_est_test,SSE_train,SSE_test,w] = stoch_grad(d,m,N_features,X_train,y_train,X_test,y_test,gamma)
epsilon = 0.01; %convergence criterion
max_iter = 10000;
w0 = zeros(N_features,1); %initial guess
w = zeros(N_features,1); %for convenience
x = zeros(d,1);
z = zeros(d,1);
for jj=1:max_iter;
for kk=1:d;
x = X_train(kk,:)';
z = gamma*((w0'*x-y_train(kk))*x);
w = w0 - z;
end
if norm(w0-w,2)<epsilon
break;
else
w0 = w;
end
end
sgd_est_test = zeros(m,1);
sgd_est_train = zeros(d,1);
for ll=1:m;
sgd_est_test(ll,1) = w'*X_test(ll,:)';
end
for ii=1:d;
sgd_est_train(ii,1) = w'*X_train(ii,:)';
end
SSE_test = sum((sgd_est_test - y_test).^2);
SSE_train = sum((sgd_est_train - y_train).^2);
end
I tried lowering the learning rate to 0.001. The resulting fit tells me that your algorithm produces an estimate of the form y = ax instead of y = ax + b (for some reason it ignores the constant term), and also that you need to lower the learning rate in order to converge.
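For reference, a conventional SGD inner loop updates the running estimate after every sample instead of recomputing it from the epoch-start value w0 each time; a minimal sketch using the question's variable names (this is the standard update, not a claim about the exact fix needed here):
w = w0;
for kk = 1:d
x = X_train(kk,:)';                    % current sample, including the bias column
w = w - gamma*(w'*x - y_train(kk))*x;  % one stochastic gradient step per sample
end
The outer epoch loop and the convergence check can stay as in the question; with the bias column already in X_train, this form updates the constant term as well.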

Interpolation using polyfit (Matlab)

My script is supposed to run Runge-Kutta and then interpolate around the peaks using polyfit to calculate the maximum values at those peaks. I seem to get the x-values of the max points correct, but the y-values are off for some reason. I have sat with it for 3 days now. The problem should be in the last for-loop, where I calculate py?
Function:
function funk = FU(t,u)
L0 = 1;
C = 1*10^-6;
funk = [u(2); 2.*u(1).*u(2).^2./(1+u(1).^2) - u(1).*(1+u(1).^2)./(L0.*C)];
Program:
%Runge kutta
clear all
close all
clc
clf
%Given values
U0 = [240 1200 2400];
L0 = 1;
C = 1*10^-6;
T = 0.003;
h = 0.000001;
W = [];
% Runge-Kutta 4
for i = 1:3
u0 = [0;U0(i)];
u = u0;
U = u;
tt = 0:h:T;
for t=tt(1:end-1)
k1 = FU(t,u);
k2 = FU(t+0.5*h,u+0.5*h*k1);
k3 = FU((t+0.5*h),(u+0.5*h*k2));
k4 = FU((t+h),(u+k3*h));
u = u + (1/6)*(k1+2*k2+2*k3+k4)*h;
U = [U u];
end
W = [W;U];
end
I1 = W(1,:); I2 = W(3,:); I3 = W(5,:);
dI1 = W(2,:); dI2 = W(4,:); dI3 = W(6,:);
I = [I1; I2; I3];
dI = [dI1; dI2; dI3];
%Plot of the currents
figure (1)
plot(tt,I1,'r',tt,I2,'b',tt,I3,'g')
hold on
legend('U0 = 240','U0 = 1200','U0 = 2400')
BB = [];
d = 2;
px = [];
py = [];
format short
for l = 1:3
[H,Index(l)]=max(I(l,:));
Area=[(Index(l)-2:Index(l)+2)*h];
p = polyfit(Area,I(Index(l)-2:Index(l)+2),4);
rotp(1,:) = roots([4*p(1),3*p(2),2*p(3),p(4)]);
B = rotp(1,2);
BB = [BB B];
Imax(l,:)=p(1).*B.^4+p(2).*B.^3+p(3).*B.^2+p(4).*B+p(5);
Tsv(i)=4*rotp(1,l);
%px1 = linspace(h*(Index(l)-d-1),h*(Index(l)+d-2));
px1 = BB;
py1 = polyval(p,px1(1,l));
px = [px px1];
py = [py py1];
end
% Plots the max points
figure(1)
plot(px1(1),py(1),'b*-',px1(2),py(2),'b*-',px1(3),py(3),'b*-')
hold on
disp(Imax)
Your polyfit line should read:
p = polyfit(Area,I(l, Index(l)-2:Index(l)+2),4);
More interestingly, take note of the warnings you get about poor conditioning of that polynomial (I presume you're seeing these). Why? Partly because of numerical precision (your numbers are very small, scaled around 10^-6) and partly because you're asking for a 4th-order fit to five points, which leaves no degrees of freedom and is very sensitive. To do this "better", use more input points (more than 5), or a lower-order polynomial fit (quadratic is usually plenty), and (probably) rescale before you use the polyfit tool.
Having said that, in practice this problem is often solved using three points and a quadratic fit, because it's computationally cheap and gives very nearly the same answers as more complex approaches, but you didn't get that from me (with noiseless data like this, it doesn't much matter anyway).
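As a minimal sketch of that three-point quadratic approach (standard parabolic peak interpolation; everything except I, l, h and Index is a made-up name), the vertex can be computed in closed form without polyfit or roots:
k = Index(l);                                % index of the sampled maximum
y1 = I(l,k-1); y2 = I(l,k); y3 = I(l,k+1);   % three samples around the peak
delta = 0.5*(y1 - y3)/(y1 - 2*y2 + y3);      % vertex offset in samples, lies in (-1,1)
t_peak = (k - 1 + delta)*h;                  % time of the interpolated maximum (tt = 0:h:T)
I_peak = y2 - 0.25*(y1 - y3)*delta;          % interpolated maximum value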

How to vectorize sum of vector functions

I have a for loop in MATLAB which calculates sum of sine functions as follows:
% preliminary constants, etc.
tTot = 2;
fS = 10000;dt = 1/fS; % total time, sampling rate
Npts = tTot * fS; %number of points
t = dt:dt:tTot;
c1 = 2*pi/tTot;
c2 = pi/fS;
s = zeros(1,Npts)
% loop to optimize:
for(k=1:Npts/2)
s = s + sin(c1*k*t - c2*k*(k-1))
end
Basically, it is a one-line for loop that becomes really slow as Npts becomes large. The difficulty comes from the fact that I am summing, over k, vectors which are themselves defined by the parameter k.
Is there a way to make this more efficient by vectorizing? One approach I've taken so far is defining a matrix and summing out the result, but this gives me an out of memory error for larger vectors:
[K,T] = meshgrid(1:1:Npts,t);
s = sum(sin(c1*K.*T - c2*K.*(K-1)),2);
Approach #1
Using the sine-of-difference formula, sin(A-B) = sin(A)cos(B) - cos(A)sin(B), which enables us to leverage fast matrix multiplication -
K = 1:Npts/2;
p1 = bsxfun(@times,c1*K(:),t(:).');
p2 = c2*K(:).*(K(:)-1);
s = cos(p2).'*sin(p1) - sin(p2).'*cos(p1);
Approach #2
With bsxfun -
K = 1:Npts/2;
p1 = bsxfun(@times,c1*K(:),t(:).');
p2 = c2*K(:).*(K(:)-1);
s = sum(sin(bsxfun(@minus, p1,p2)),1);
Approach #1 can be modified to use a smaller loop to accommodate problems with large data arrays, as shown next -
num_blks = 100; %// Edit this based on how much workspace data your RAM can handle
intv_len = Npts/(2*num_blks); %// Interval length based on number of blocks
KP = 1:Npts/2;
P2 = c2*KP(:).*(KP(:)-1);
sin_P2 = sin(P2);
cos_P2 = cos(P2);
s = zeros(1,Npts);
for iter = 1:intv_len:Npts/2
K = iter:iter+intv_len-1;
p1 = bsxfun(@times,c1*K(:),t(:).');
s = s + (cos_P2(K).'*sin(p1) - sin_P2(K).'*cos(p1));
end
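Whichever version is used, it is worth a quick sanity check against the original loop on a small Npts before trusting it on the full problem; a sketch (s_loop is a made-up name for the reference result):
s_loop = zeros(1,Npts);
for k = 1:Npts/2
s_loop = s_loop + sin(c1*k*t - c2*k*(k-1));
end
max(abs(s - s_loop))    % should be near machine precision (e.g. below 1e-10)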

Implementation of shadow free 1d invariant image

I implemented a method for removing shadows based on invariant color features found in the paper Entropy Minimization for Shadow Removal. My implementation seems to be yielding similar computational results sometimes, but they are always off, and my grayscale image is blocky, maybe as a result of incorrectly taking the geometric mean.
Here is an example plot of the information potential from the horse image in the paper, as well as my invariant image. Multiply the x-axis by 3 to get theta (which goes from 0 to 180):
And here is the grayscale image my code outputs for the correct maximum theta (mine is off by 10):
You can see the blockiness that their image doesn't have:
Here is their information potential:
When dividing by the geometric mean, I have tried using NaN and thresholding the image so the smallest possible value is 0.01, but it doesn't seem to change my output.
Here is my code:
I = im2double(imread(strname));
[m,n,d] = size(I);
I = max(I, .01);
chrom = zeros(m, n, 3, 'double');
for i = 1:m
for j = 1:n
% if ((I(i,j,1)*I(i,j,2)*I(i,j,3))~= 0)
chrom(i,j, 1) = I(i,j,1)/((I(i,j,1)*I(i,j,2)*I(i,j, 3))^(1/3));
chrom(i,j, 2) = I(i,j,2)/((I(i,j,1)*I(i,j,2)*I(i,j, 3))^(1/3));
chrom(i,j, 3) = I(i,j,3)/((I(i,j,1)*I(i,j,2)*I(i,j, 3))^(1/3));
% else
% chrom(i,j, 1) = 1;
% chrom(i,j, 2) = 1;
% chrom(i,j, 3) = 1;
% end
end
end
p1 = mat2gray(log(chrom(:,:,1)));
p2 = mat2gray(log(chrom(:,:,2)));
p3 = mat2gray(log(chrom(:,:,3)));
X1 = mat2gray(p1*1/(sqrt(2)) - p2*1/(sqrt(2)));
X2 = mat2gray(p1*1/(sqrt(6)) + p2*1/(sqrt(6)) - p3*2/(sqrt(6)));
maxinf = 0;
maxtheta = 0;
data2 = zeros(1, 61);
for theta = 0:3:180
M = X1*cos(theta*pi/180) - X2*sin(theta*pi/180);
s = sqrt(std2(X1)^(2)*cos(theta*pi/180) + std2(X2)^(2)*sin(theta*pi/180));
s = abs(1.06*s*((m*n)^(-1/5)));
[m, n] = size(M);
length = m*n;
sources = zeros(1, length, 'double');
count = 1;
for x=1:m
for y = 1:n
sources(1, count) = M(x , y);
count = count + 1;
end
end
weights = ones(1, length);
sigma = 2*s;
[xc , Ak] = fgt_model(sources , weights , sigma , 10, sqrt(length) , 6 );
sum1 = sum(fgt_predict(sources , xc , Ak , sigma , 10 ));
sum1 = sum1/sqrt(2*pi*2*s*s);
data2(theta/3 + 1) = sum1;
if (sum1 > maxinf)
maxinf = sum1;
maxtheta = theta;
end
end
InvariantImage2 = cos(maxtheta*pi/180)*X1 + sin(maxtheta*pi/180)*X2;
Assume the Fast Gauss Transform is correct.
I don't know whether this makes any difference, as it has been more than a month now, but the blockiness and the different information potential plot are simply caused by compression of the image you used. You can't expect to get the same results with this image as they did, because they used a raw, high-resolution, uncompressed version of it. I have to say I am fairly impressed with your results, especially with implementing the information potential. That thing went over my head a little.
John.