Why is the accuracy of (multi-class) logistic regression so low? - matlab

I am trying to solve a problem with 3 features and 6 classes (labels). The training dataset is 700 rows * 3 columns. The feature values are continuous from 0-100. I use the one-vs-all method, but I do not know why the prediction accuracy is so low, just 24%. Could anyone tell me why, please? Thank you!
This is how I do the prediction:
function p = predictOneVsAll(all_theta, X)
m = size(X, 1);
num_labels = size(all_theta, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
[m, p] = max(sigmoid(X * all_theta'), [], 2);
end
And the one-vs-all training:
% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
initial_theta = zeros(n+1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 20);
for c = 1:num_labels,
[theta] = ...
fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
initial_theta, options);
all_theta(c,:) = theta';
end

In predictOneVsAll, you don't need to use the sigmoid function; you need it only when calculating the cost. So the correct code is:
[~, p] = max(X * all_theta', [], 2);
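As a quick sanity check, this simplification is safe because the sigmoid is monotonically increasing, so the argmax does not change; a sketch with made-up random scores:
Z = randn(5, 6);                          % 5 samples x 6 classes (hypothetical scores)
[~, p1] = max(Z, [], 2);                  % argmax of raw scores
[~, p2] = max(1 ./ (1 + exp(-Z)), [], 2); % argmax after applying the sigmoid
assert(isequal(p1, p2))                   % same predictions either way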
In oneVsAll, the loop should look like this:
for c = 1:num_labels
all_theta(c,:) = fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
endfor
It is better if you ask these questions on Andrew Ng's ML course discussion forum; they would be more familiar with the code and the problem.

Related

Vectorize a regression map calculation

I compute the regression map of a time series A(t) on a field B(x,y,t) in the following way:
A=1:10; %time
B=rand(100,100,10); %x,y,time
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
for jj=1:size(B,2)
tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
rc(ii,jj) = tmp(1,2); %covariance A and B
end
end
rc = rc/var(A); %regression coefficient
Is there a way to vectorize/speed up code? Or maybe some built-in function that I did not know of to achieve the same result?
In order to vectorize this algorithm, you would have to "get your hands dirty" and compute the covariance yourself. If you take a look inside cov you'll see that it has many lines of input checking and very few lines of actual computation. To summarize the critical steps:
y = varargin{1};
x = x(:);
y = y(:);
x = [x y];
[m,~] = size(x);
denom = m - 1;
xc = x - sum(x,1)./m; % Remove mean
c = (xc' * xc) ./ denom;
To simplify the above somewhat:
x = [x(:) y(:)];
m = size(x,1);
xc = x - sum(x,1)./m;
c = (xc' * xc) ./ (m - 1);
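As a quick check (a sketch on random data), this stripped-down computation matches cov for clean, NaN-free inputs:
a = randn(10,1); b = randn(10,1);
x = [a(:) b(:)];
m = size(x,1);
xc = x - sum(x,1)./m;        % remove mean
c = (xc' * xc) ./ (m - 1);   % 2x2 covariance matrix
assert(norm(c - cov(a,b)) < 1e-12)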
Now this is something that is fairly straightforward to vectorize...
function q51466884
A = 1:10; %time
B = rand(200,200,10); %x,y,time
%% Test Equivalence:
assert( norm(sol1-sol2) < 1E-10);
%% Benchmark:
disp([timeit(@sol1), timeit(@sol2)]);
%%
function rc = sol1()
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
for jj=1:size(B,2)
tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
rc(ii,jj) = tmp(1,2); %covariance A and B
end
end
rc = rc/var(A); %regression coefficient
end
function rC = sol2()
m = numel(A);
rB = reshape(B,[],10).'; % collapse x,y: 10 x 40000, one column per grid point
% Center:
cA = A(:) - sum(A)./m;
cB = rB - sum(rB,1)./m;
% Multiply:
rC = reshape( (cA.' * cB) ./ (m-1), size(B(:,:,1)) ) ./ var(A);
end
end
I get these timings: [0.5381 0.0025], which means we saved two orders of magnitude in runtime :)
Note that a big part of optimizing the algorithm is assuming you don't have any "strangeness" in your data, like NaN values etc. Take a look inside cov.m to see all the checks that we skipped.
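If your data does contain NaNs, one option (assuming a MATLAB release recent enough to support cov's nanflag argument) is to keep the loop but let cov handle the missing values, e.g.:
tmp = cov(A, squeeze(B(ii,jj,:)), 'partialrows'); % pairwise NaN handling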

Precompute weights for multidimensional linear interpolation

I have a non-uniform rectangular grid along D dimensions, a matrix of logical values V on the grid, and a matrix of query data points X. The number of grid points differs across dimensions.
I run the interpolation multiple times for the same grid G and query X, but for different values V.
The goal is to precompute the indexes and weights for the interpolation and to reuse them, because they are always the same.
Here is an example in 2 dimensions, in which I have to compute indexes and values every time within the loop, but I want to compute them only once before the loop. I keep the data types from my application (mostly single and logical gpuArrays).
% Define grid
G{1} = single([0; 1; 3; 5; 10]);
G{2} = single([15; 17; 18; 20]);
% Steps and edges are redundant but help make interpolation a bit faster
S{1} = G{1}(2:end)-G{1}(1:end-1);
S{2} = G{2}(2:end)-G{2}(1:end-1);
gpuInf = 1e10;
% This is my workaround for a bug in the GPU version of discretize in Matlab R2017a.
% It throws an error if edges contain Inf, realmin, or realmax. Seems fixed in the R2017b prerelease.
E{1} = [-gpuInf; G{1}(2:end-1); gpuInf];
E{2} = [-gpuInf; G{2}(2:end-1); gpuInf];
% Generate query points
n = 50; X = gpuArray(single([rand(n,1)*14-2, 14+rand(n,1)*7]));
[G1, G2] = ndgrid(G{1},G{2});
for i = 1 : 4
% Generate values on grid
foo = @(x1,x2) (sin(x1+rand) + cos(x2*rand))>0;
V = gpuArray(foo(G1,G2));
% Interpolate
V_interp = interpV(X, V, G, E, S);
% Plot results
subplot(2,2,i);
contourf(G1, G2, V); hold on;
scatter(X(:,1), X(:,2),50,[ones(n,1), 1-V_interp, 1-V_interp],'filled', 'MarkerEdgeColor','black'); hold off;
end
function y = interpV(X, V, G, E, S)
y = min(1, max(0, interpV_helper(X, 1, 1, 0, [], V, G, E, S) ));
end
function y = interpV_helper(X, dim, weight, curr_y, index, V, G, E, S)
if dim == ndims(V)+1
M = [1,cumprod(size(V),2)];
idx = 1 + (index-1)*M(1:end-1)';
y = curr_y + weight .* single(V(idx));
else
x = X(:,dim); grid = G{dim}; edges = E{dim}; steps = S{dim};
iL = single(discretize(x, edges));
weightL = weight .* (grid(iL+1) - x) ./ steps(iL);
weightH = weight .* (x - grid(iL)) ./ steps(iL);
y = interpV_helper(X, dim+1, weightL, curr_y, [index, iL ], V, G, E, S) +...
interpV_helper(X, dim+1, weightH, curr_y, [index, iL+1], V, G, E, S);
end
end
I found a way to do this and am posting it here because (as of now) two more people are interested. It takes only a slight modification of my original code (see below).
% Define grid
G{1} = single([0; 1; 3; 5; 10]);
G{2} = single([15; 17; 18; 20]);
% Steps and edges are redundant but help make interpolation a bit faster
S{1} = G{1}(2:end)-G{1}(1:end-1);
S{2} = G{2}(2:end)-G{2}(1:end-1);
gpuInf = 1e10;
% This is my workaround for a bug in the GPU version of discretize in Matlab R2017a.
% It throws an error if edges contain Inf, realmin, or realmax. Seems fixed in the R2017b prerelease.
E{1} = [-gpuInf; G{1}(2:end-1); gpuInf];
E{2} = [-gpuInf; G{2}(2:end-1); gpuInf];
% Generate query points
n = 50; X = gpuArray(single([rand(n,1)*14-2, 14+rand(n,1)*7]));
[G1, G2] = ndgrid(G{1},G{2});
[W, I] = interpIW(X, G, E, S); % Precompute weights W and indexes I
for i = 1 : 4
% Generate values on grid
foo = @(x1,x2) (sin(x1+rand) + cos(x2*rand))>0;
V = gpuArray(foo(G1,G2));
% Interpolate
V_interp = sum(W .* single(V(I)), 2);
% Plot results
subplot(2,2,i);
contourf(G1, G2, V); hold on;
scatter(X(:,1), X(:,2), 50,[ones(n,1), 1-V_interp, 1-V_interp],'filled', 'MarkerEdgeColor','black'); hold off;
end
function [W, I] = interpIW(X, G, E, S)
global Weights Indexes
Weights=[]; Indexes=[];
interpIW_helper(X, 1, 1, [], G, E, S, []);
W = Weights; I = Indexes;
end
function [] = interpIW_helper(X, dim, weight, index, G, E, S, sizeV)
global Weights Indexes
if dim == size(X,2)+1
M = [1,cumprod(sizeV,2)];
Weights = [Weights, weight];
Indexes = [Indexes, 1 + (index-1)*M(1:end-1)'];
else
x = X(:,dim); grid = G{dim}; edges = E{dim}; steps = S{dim};
iL = single(discretize(x, edges));
weightL = weight .* (grid(iL+1) - x) ./ steps(iL);
weightH = weight .* (x - grid(iL)) ./ steps(iL);
interpIW_helper(X, dim+1, weightL, [index, iL ], G, E, S, [sizeV, size(grid,1)]);
interpIW_helper(X, dim+1, weightH, [index, iL+1], G, E, S, [sizeV, size(grid,1)]);
end
end
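As a side note, the globals can be avoided by letting the helper return the accumulated columns directly. A sketch of that variant (my rewrite of the same logic, with hypothetical names interpIW2/interpIW_helper2):
function [W, I] = interpIW2(X, G, E, S)
[W, I] = interpIW_helper2(X, 1, 1, [], G, E, S, []);
end
function [W, I] = interpIW_helper2(X, dim, weight, index, G, E, S, sizeV)
if dim == size(X,2)+1
    M = [1, cumprod(sizeV, 2)];
    W = weight;                    % one column of weights
    I = 1 + (index-1)*M(1:end-1)'; % matching column of linear indexes
else
    x = X(:,dim); grid = G{dim}; edges = E{dim}; steps = S{dim};
    iL = single(discretize(x, edges));
    weightL = weight .* (grid(iL+1) - x) ./ steps(iL);
    weightH = weight .* (x - grid(iL)) ./ steps(iL);
    [WL, IL] = interpIW_helper2(X, dim+1, weightL, [index, iL ], G, E, S, [sizeV, size(grid,1)]);
    [WH, IH] = interpIW_helper2(X, dim+1, weightH, [index, iL+1], G, E, S, [sizeV, size(grid,1)]);
    W = [WL, WH]; I = [IL, IH];    % concatenate the two branches
end
end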
To do this task, the whole interpolation process has to be carried out, except for computing the interpolated values themselves. Here is a solution translated from the Octave C++ source. The input format is the same as the first signature of the interpn function, except that no v array is needed. Also, the query inputs Xq should be vectors, not arrays in ndgrid format. Both outputs W (weights) and I (positions) have size (a, b), where a is the number of neighbors of a point on the grid and b is the number of points to be interpolated.
function [W , I] = lininterpnw(varargin)
% [W I] = lininterpnw(X1,X2,...,Xn,Xq1,Xq2,...,Xqn)
n = numel(varargin)/2;
x = varargin(1:n);
y = varargin(n+1:end);
sz = cellfun(@numel,x);
scale = [1 cumprod(sz(1:end-1))];
Ni = numel(y{1});
index = zeros(n,Ni);
x_before = zeros(n,Ni);
x_after = zeros(n,Ni);
for ii = 1:n
jj = interp1(x{ii},1:sz(ii),y{ii},'previous');
index(ii,:) = jj-1;
x_before(ii,:) = x{ii}(jj);
x_after(ii,:) = x{ii}(jj+1);
end
coef(2:2:2*n,1:Ni) = (vertcat(y{:}) - x_before) ./ (x_after - x_before);
coef(1:2:end,:) = 1 - coef(2:2:2*n,:);
bit = permute(dec2bin(0:2^n-1)=='1', [2,3,1]);
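% each slice of bit along dim 3 is one of the 2^n low/high neighbor patterns (one cell corner)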
%I = reshape(1+scale*bsxfun(@plus,index,bit), Ni, []).'; %Octave
I = reshape(1+sum(bsxfun(@times,scale(:),bsxfun(@plus,index,bit))), Ni, []).';
W = squeeze(prod(reshape(coef(bsxfun(@plus,(1:2:2*n).',bit),:).',Ni,n,[]),2)).';
end
Testing:
x={[1 3 8 9],[2 12 13 17 25]};
v = rand(4,5);
y={[1.5 1.6 1.3 3.5,8.1,8.3],[8.4,13.5,14.4,23,23.9,24.2]};
[W I]=lininterpnw(x{:},y{:});
sum(W.*v(I))
interpn(x{:},v,y{:})
Thanks to @SardarUsama for testing and his useful comments.

Comparing the function fminunc with the BFGS method for logistic regression

I'm constructing an algorithm that uses the BFGS method to find the parameters of a logistic regression for a binary dataset in Octave.
Now I'm struggling with something I believe is an overfitting problem. I run the algorithm for several datasets and it actually converges to the same results as Octave's fminunc function. However, for a specific "type of dataset" the algorithm converges to very high parameter values, contrary to fminunc, which gives reasonable values for these parameters. I added a regularization term and actually got my algorithm to converge to the same values as fminunc.
This specific type of dataset has data that can be completely separated by a straight line. My question is: why is this a problem for the BFGS method but not for fminunc? How does this function avoid the issue without regularization? Could I implement this in my algorithm?
The code of my algorithm is the following:
function [beta] = Log_BFGS(data, L_0)
clc
close
%************************************************************************
%************************************************************************
%Loading the data:
[n, e] = size(data);
d = e - 1;
n; %Number of observations.
d; %Number of features.
Y = data(:, e); %Label values
X_o = data(:, 1:d);
X = [ones(n, 1) X_o]; %Feature values
%Initial conditions:
beta_0 = zeros(e, 1);
beta = [];
beta(:, 1) = beta_0;
N = 600; %Max iterations
Tol = 1e-10; %Tolerance
error = .1;
L = L_0; %Regularization parameter
B = eye(e);
options = optimset('GradObj', 'on', 'MaxIter', 600);
[beta_s] = fminunc(@(t)(costFunction(t, X, Y, L)), beta_0, options);
disp('Beta obtained with the fminunc function');
disp("--------------");
disp(beta_s)
k = 1;
a_0 = 1;
% Define the sigmoid function
h = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid (anonymous function; inline is deprecated)
while (error > Tol && k < N)
beta_k = beta(:, k);
x_0 = X*beta_k;
h_0 = h(x_0);
beta_r = [0 ; beta(:, k)(2:e, :)];
g_k = ((X)'*(h_0 - Y) + L*beta_r)/n;
d_k = -pinv(B)*g_k;
a = 0.1; %I'll implement an Armijo line search here (soon)
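%A backtracking sketch (an assumption, not yet this code), reusing the
%costFunction defined above:
%  a = 1; c1 = 1e-4; rho = 0.5;
%  J_k = costFunction(beta_k, X, Y, L);
%  while costFunction(beta_k + a*d_k, X, Y, L) > J_k + c1*a*(g_k'*d_k)
%    a = rho*a;
%  end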
beta(:, k+1) = beta(:, k) + a*d_k;
beta_k_1 = beta(:, k+1);
x_1 = X*beta_k_1;
h_1 = h(x_1);
beta_s = [0 ; beta(:, k+1)(2:e, :)];
g_k_1 = (transpose(X)*(h_1 - Y) + L*beta_s)/n;
s_k = beta(:, k+1) - beta(:, k);
y_k = g_k_1 - g_k;
B = B - B*s_k*s_k'*B/(s_k'*B*s_k) + y_k*y_k'/(s_k'*y_k);
k = k + 1;
error = norm(d_k);
endwhile
%Accuracy of the logistic model:
p = zeros(n, 1);
for j = 1:n
if (1./(1. + exp(-1.*(X(j, :)*beta(:, k)))) >= 0.5)
p(j) = 1;
else
p(j) = 0;
endif
endfor
R = mean(double(p == Y));
beta = beta(:, k);
%Showing the results:
disp("Estimation of logistic regression model Y = 1/(1 + e^(beta*X)),")
disp("using the algorithm BFGS =")
disp("--------------")
disp(beta)
disp("--------------")
disp("with a convergence error in the last iteration of:")
disp(error)
disp("--------------")
disp("and a total number of")
disp(k-1)
disp("iterations")
disp("--------------")
if k == N
disp("The maximum number of iterations was reached before obtaining the desired error")
else
disp("The desired error was reached before reaching the maximum of iterations")
endif
disp("--------------")
disp("The precision of the logistic regression model is given by (max 1.0):")
disp("--------------")
disp(R)
disp("--------------")
endfunction
The results I got for the dataset are shown in the following picture. If you need the data used in this situation, please let me know.
Results of the algorithm
Check the objectives!
The values of the solution vector are nice, but the whole optimization is driven by the objective. You say fminunc gives reasonable values for these parameters, but "reasonable" is not defined within this model.
It would not be impossible that both your low-value and your high-value solutions achieve pretty much the same objective, and that is all these solvers care about (when no regularization term is used).
So the important question is: is there a unique solution (which would rule out these results)? Only when your dataset has full rank! So maybe your data is rank-deficient and you obtain two equally good solutions. Of course there might be slight differences due to numerical issues, which are always a source of error, especially in more complex optimization algorithms.
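One way to test this hypothesis is to evaluate the objective at both solutions; a sketch reusing the costFunction from the question, with the regularization parameter set to 0:
J_fminunc = costFunction(beta_s, X, Y, 0); % cost at fminunc's solution
J_bfgs    = costFunction(beta,   X, Y, 0); % cost at the BFGS solution (beta = beta(:, k))
disp([J_fminunc, J_bfgs]) % if these are close, the solver has no reason to prefer either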

MATLAB sparse matrices: Gauss Seidel and power method using a sparse matrix with CSR (Compressed Sparse Row)

this is my first time here, so I hope that someone can help me.
I'm trying to implement the Gauss-Seidel method and the power method using a matrix in CSR (also called Morse) storage. Unfortunately I can't manage to do better than the following code:
GS-MORSE:
function [y] = gs_morse(aa, diag, col, row, nmax, tol)
[n, n] = size(A);
y = [1, 1, 1, 1];
m = 1;
while m < nmax,
for i = 1: n,
k1 = row(i);
k2 = row(i + 1) - 1;
for k = k1: k2,
y(i) = y(i) + aa(k) * x(col(k));
y(col(k)) = y(col(k)) + aa(k) * diag(i);
end
k2 = k2 + 1;
y(i) = y(i) + aa(k) * diag(i);
end
if (norm(y - x)) < tol
disp(y);
end
m = m + 1;
for i = 1: n,
x(i) = y(i);
end
end
POWER-MORSE:
I was able to implement only the power method, but I don't understand how to use the CSR matrix with it... so my code for the power method is:
function [y, l] = potencia_iterada(A, v)
numiter=100;
eps=1e-10;
x = v(:);
y = x/norm(x);
l = 0;
for k = 1: numiter,
x = A * y;
y = x / norm(x);
l0 = x.' * y; % current estimate of the dominant eigenvalue
if abs(l0 - l) < eps % stop when the estimate settles
l = l0;
return
end
l = l0;
end
Can anyone please help me complete this code, or explain how I can do that? I really don't understand how. Thank you very much.
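For what it's worth, here is a minimal sketch of one Gauss-Seidel formulation over CSR arrays (my own layout assumption: val/col/rowptr store all entries of each row, diagonal included, which differs from the separate-diagonal Morse layout above):
function x = gs_csr(val, col, rowptr, b, x, nmax, tol)
% One Gauss-Seidel sweep per iteration for A*x = b, with A in CSR form.
n = numel(b);
for sweep = 1:nmax
    xold = x;
    for i = 1:n
        s = 0; d = 0;
        for k = rowptr(i) : rowptr(i+1)-1
            j = col(k);
            if j == i
                d = val(k);              % diagonal entry a_ii
            else
                s = s + val(k) * x(j);   % x(j) is already updated for j < i
            end
        end
        x(i) = (b(i) - s) / d;
    end
    if norm(x - xold) < tol
        return                           % converged
    end
end
end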

Calculate a tricky sum in Matlab

With the following variables:
m = 1:4; n = 1:32;
phi = linspace(0, 2*pi, 100);
theta = linspace(-pi, pi, 50);
S_mn = <a 4x32 coefficient matrix, corresponding to m and n>;
how do I compute the sum over m and n of S_mn*exp(1i*(m*theta + n*phi)), i.e. r(theta, phi) = sum_m sum_n S_mn*exp(1i*(m*theta + n*phi))?
I've thought of things like
[m, n] = meshgrid(m,n);
[theta, phi] = meshgrid(theta,phi);
r_mn = S_mn.*exp(1i*(m.*theta + n.*phi));
thesum = sum(r_mn(:));
but that requires theta and phi to have the same number of elements as m and n, and it gives me just one element in return - I want a matrix the size of meshgrid(theta,phi), regardless of the sizes of theta and phi (i.e. I want to be able to evaluate the sum as a function of theta and phi).
How do I do this calculation in MATLAB?
Since I don't know what S is...
S = randn(4,32);
[m,n] = ndgrid(1:4,1:32);
fun = @(theta,phi) sum(sum(S.*exp(sqrt(-1)*(m*theta + n*phi))));
Works fine for me.
fun(pi,3*pi/2)
ans =
-15.8643373238676 - 1.45785698818839i
If you now wish to do this for a large set of values of phi and theta, a pair of loops is now the trivial solution (see the sketch below). Or you can do it all in one computation, although the arrays will get larger. Still not hard. WTP?
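For example, the pair of loops could look like this (a sketch reusing fun from above):
theta = linspace(-pi, pi, 50);
phi   = linspace(0, 2*pi, 100);
thesum = zeros(numel(theta), numel(phi));
for it = 1:numel(theta)
    for ip = 1:numel(phi)
        thesum(it, ip) = fun(theta(it), phi(ip));
    end
end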
You do realize that both meshgrid and ndgrid take more than just two arguments? So it is time to learn how to use bsxfun, and then squeeze.
[m,n,theta,phi] = ndgrid(1:4,1:32,linspace(-pi, pi, 50),linspace(0, 2*pi, 100));
res = bsxfun(@times,S,exp(sqrt(-1)*(m.*theta + n.*phi)));
res = squeeze(sum(sum(res,1),2));
Or do this, which will be a bit faster. The previous computation took my machine 0.07 seconds; this last one took 0.05, so we got some savings by using bsxfun heavily.
m = (1:4)';
n = 1:32;
[theta,phi] = ndgrid(linspace(-pi, pi, 50),linspace(0, 2*pi, 100));
theta = reshape(theta,[1,1,size(theta)]);
phi = reshape(phi,size(theta));
res = bsxfun(@plus,bsxfun(@times,m,theta*sqrt(-1)),bsxfun(@times,n,phi*sqrt(-1)));
res = bsxfun(@times,S,exp(res));
res = squeeze(sum(sum(res,1),2));
If you need to do the above 2000 times, it should take about 100 seconds. WTP? Get some coffee and relax.
First save the number of elements of each variable:
size_m = numel(m);
size_n = numel(n);
size_theta = numel(theta);
size_phi = numel(phi);
Use the ndgrid function like this:
[theta, phi, m, n] = ndgrid(theta, phi, m, n);
This will give you arrays with 4 dimensions (one for each of your variables: theta, phi, m, n). Now you can calculate this:
m.*theta + n.*phi
Now you need to make S_mn have 4 dimensions with sizes size_theta, size_phi, size_m, size_n, like this:
S_tpmn = repmat(permute(S_mn, [3 4 1 2]), [size_theta size_phi 1 1]);
Now you can calculate your sum like this:
aux_sum = S_tpmn.*exp(1i*(m.*theta + n.*phi));
Finally you can sum along the last 2 dimensions (m and n) to get an array with 2 dimensions with size size_theta by size_phi:
final_sum = sum(sum(aux_sum, 4), 3);
Note: I don't have access to Matlab right now, so I can't test if this actually works.
There are several ways you could go about this.
One way is to create a function(-handle) that returns the sum as a function of theta and phi, and then use arrayfun to do the sums. Another is to fully vectorize the computation, though that will use more memory.
The arrayfun version:
[m, n] = meshgrid(m,n);
sumHandle = #(theta,phi)sum(reshape(... %# note: @ below, and S_mn.' to match the 32x4 meshgrid layout
S_mn.'.*exp(1i*(m*theta + n*phi)),...
[],1))
[theta, phi] = meshgrid(theta,phi);
sumAsFunOfThetaPhi = arrayfun(sumHandle,theta,phi);
The vectorized version:
m = (1:4)'; n = 1:32;        %# start again from the original vectors, not the meshgridded ones
m = permute(m(:),[2 4 1 3]); %# vector along dim 3
n = permute(n(:),[2 3 4 1]); %# vector along dim 4
S_mn = repmat( permute(S_mn,[3 4 1 2]), length(theta),length(phi));
theta = theta(:); %# vector along dim 1 (phi is along dim 2 b/c of linspace)
fullSum = S_mn.* exp( 1i*(...
bsxfun(@plus,...
bsxfun(@times, m, theta),...
bsxfun(@times, n, phi)...
)));
sumAsFunOfThetaPhi = sum(sum( fullSum, 3),4);