Prediction of SVM with custom kernel extremely slow in Matlab - matlab

I'm using the Matlab function fitcsvm for training a SVM with a RBF kernel. I'm using the following calls:
SVMModel = fitcsvm(X_train,labels,'KernelFunction','rbf','KernelScale',0.2087,'BoxConstraint',2.8779);
[~,scores] = predict(SVMModel,X_test);
X_train is a NxD matrix with training data, labels is a Nx1 vector with the labels for the training data and X_test is a MxD matrix with test data points.
Now I would like to use custom kernels. To start with, I decided to try the RBF kernel. The implementation goes as follows:
SVMModel = fitcsvm(X_train,labels,'KernelFunction','rbfKernel','BoxConstraint', 2.8779);
[~,scores] = predict(SVMModel,X_test);
function K = rbfKernel(U,V)
sigma = 0.2087;
gamma = 1 ./ (2*(sigma ^2));
K = exp(-gamma .* pdist2(U,V,'euclidean').^2);
end
I stored the function rbfKernel in a rbfKernel.m file.
The result of the built-in kernel as well as the custom kernel is very similar and the fitcsvm method runs for both approaches very fast.
The problem is that the predict method is extremely slow when using the custom kernel. It takes around 1 minute, compared to 5 seconds with the built-in kernel.
Why is this? Is there a mistake I made?

Probably it's due to code optimization. MATLAB engineers spend lots of time optimizing their codes, so my bet is that although your code does the same as the built-in function, it doesn't do as fast as MATLAB code does

Related

How to properly modify libsvm for regression in matlab to get better results?

I want to predict the y = f(X) and I have N sample points.
One of the few solutions I am thinking is using machine learning techniques such as SVM regression and Neural Networks. I have experimented using weka however the accuracy is very poor.
I also used libsvm. because it has matlab implementation (i dont have the svm toolbox for matlab since my matlab is 2013a). I tested a very simple case using this code
%input parameters
training_label_vector = [0;1]
X = [ 1 ; 2.5 ; 3.5 ; 4.0 ; 5.50 ; 6.0 ; 6.5 ; 7.0 ];
Y = X.^2.*(2+sin(2*X))/4;
training_instance_matrix (1,1:8)=X(:);
training_instance_matrix (2,1:8)=Y(:);
%train
model = svmtrain(label_vector,instance_matrix,['-s 3 -t 1']);
%predict
[predict_label, accuracy, dec_values] = svmpredict(label_vector,instance_matrix, model);
but it only shows only one iteration as some output (I dont entirely understands its output) so i guess there is something wrong.
If anyone has some experiences with libsvm in matlab implementation, please give me some light on this matter.
Is there a better way to approximate any function, y = f(X) which can be highly non linear using some functions in matlab. (splines, least squares regression might not be so accurate especially for highly non linear functions)

Kriging / Gaussian Process Conditional Simulations in Matlab

I would like to perform conditional simulations for Gaussian process (GP) models in Matlab. I have found a tutorial by Martin Kolář (http://mrmartin.net/?p=223).
sigma_f = 1.1251; %parameter of the squared exponential kernel
l = 0.90441; %parameter of the squared exponential kernel
kernel_function = #(x,x2) sigma_f^2*exp((x-x2)^2/(-2*l^2));
%This is one of many popular kernel functions, the squared exponential
%kernel. It favors smooth functions. (Here, it is defined here as an anonymous
%function handle)
% we can also define an error function, which models the observation noise
sigma_n = 0.1; %known noise on observed data
error_function = #(x,x2) sigma_n^2*(x==x2);
%this is just iid gaussian noise with mean 0 and variance sigma_n^2s
%kernel functions can be added together. Here, we add the error kernel to
%the squared exponential kernel)
k = #(x,x2) kernel_function(x,x2)+error_function(x,x2);
X_o = [-1.5 -1 -0.75 -0.4 -0.3 0]';
Y_o = [-1.6 -1.3 -0.5 0 0.3 0.6]';
prediction_x=-2:0.01:1;
K = zeros(length(X_o));
for i=1:length(X_o)
for j=1:length(X_o)
K(i,j)=k(X_o(i),X_o(j));
end
end
%% Demo #5.2 Sample from the Gaussian Process posterior
clearvars -except k prediction_x K X_o Y_o
%We can also sample from this posterior, the same way as we sampled before:
K_ss=zeros(length(prediction_x),length(prediction_x));
for i=1:length(prediction_x)
for j=i:length(prediction_x)%We only calculate the top half of the matrix. This an unnecessary speedup trick
K_ss(i,j)=k(prediction_x(i),prediction_x(j));
end
end
K_ss=K_ss+triu(K_ss,1)'; % We can use the upper half of the matrix and copy it to the
K_s=zeros(length(prediction_x),length(X_o));
for i=1:length(prediction_x)
for j=1:length(X_o)
K_s(i,j)=k(prediction_x(i),X_o(j));
end
end
[V,D]=eig(K_ss-K_s/K*K_s');
A=real(V*(D.^(1/2)));
for i=1:7
standard_random_vector = randn(length(A),1);
gaussian_process_sample(:,i) = A * standard_random_vector+K_s/K*Y_o;
end
hold on
plot(prediction_x,real(gaussian_process_sample))
set(plot(X_o,Y_o,'r.'),'MarkerSize',20)
The tutorial generates the conditional simulations using a direct simulation method based on covariance matrix decomposition. It is my understanding that there are several methods of generating conditional simulations that may be better when the number of simulation points is large such as conditioning by Kriging using a local neighborhood. I have found information regarding several methods in J.-P. Chilès and P. Delfiner, “Chapter 7 - Conditional Simulations,” in Geostatistics: Modeling Spatial Uncertainty, Second Edition, John Wiley & Sons, Inc., 2012, pp. 478–628.
Is there an existing Matlab toolbox that can be used for conditional simulations? I am aware of DACE, GPML, and mGstat (http://mgstat.sourceforge.net/). I believe only mGstat offers the capability to perform conditional simulations. However, mGstat also seems to be limited to only 3D models and I am interested in higher dimensional models.
Can anybody offer any advice on getting started performing conditional simulations with an existing toolbox such as GPML?
===================================================================
EDIT
I have found a few more Matlab toolboxes: STK, ScalaGauss, ooDACE
It appears STK is capable of conditional simulations using covariance matrix decomposition. However, is limited to a moderate number (maybe a few thousand?) of simulation points due to the Cholesky factorization.
I used the STK toolbox and I recommend it for others:
http://kriging.sourceforge.net/htmldoc/
I found that if you need conditional simulations at a large number of points then you might consider generating a conditional simulation at the points in a large design of experiment (DoE) and then simply relying on the mean prediction conditional on that DoE.

Improve the accuracy performance on SVM

I am working on people detecting using two different features HOG and LBP. I used SVM to train the positive and negative samples. Here, I wanna ask how to improve the accuracy of SVM itself? Because, everytime I added up more positives and negatives sample, the accuracy is always decreasing. Currently my positive samples are 1500 and negative samples are 700.
%extract features
[fpos,fneg] = features(pathPos, pathNeg);
%train SVM
HOG_featV = loadingV(fpos,fneg); % loading and labeling each training example
fprintf('Training SVM..\n');
%L = ones(length(SV),1);
T = cell2mat(HOG_featV(2,:));
HOGtP = HOG_featV(3,:)';
C = cell2mat(HOGtP); % each row of P correspond to a training example
%extract features from LBP
[LBPpos,LBPneg] = LBPfeatures(pathPos, pathNeg);
LBP_featV = loadingV(LBPpos, LBPneg);
LBPlabel = cell2mat(LBP_featV(2,:));
LBPtP = LBP_featV(3,:);
M = cell2mat(LBPtP)'; % each row of P correspond to a training example
featureVector = [C M];
model = svmlearn(featureVector, T','-t 2 -g 0.3 -c 0.5');
Anyone knows how to find best C and Gamma value for improving SVM accuracy?
Thank you,
To find best C and Gamma value for improving SVM accuracy you typically perform cross-validation. In sum you can leave-one-out (1 sample) and test the VBM for that sample using the different parameters (2 parameters define a 2d grid). Typically you would test each decade of the parameters for a certain range. For example: C = [0.01, 0.1, 1, ..., 10^9]; G= [1^-5, 1^-4, ..., 1000]. This should also improve your SVM accuracy by optimizing the hyper-parameters.
By looking again to your question it seems you are using the svmlearn of the machine learning toolbox (statistics toolbox) of Matlab. Therefore you have already built-in functions for cross-validation. Give a look at: http://www.mathworks.co.uk/help/stats/support-vector-machines-svm.html
I followed ASantosRibeiro's method to optimize the parameters before and it works well.
In addition, you could try to add more negative samples until the proportion of the negative and positive reach 2:1. The reason is that when you implement real-time application, you should scan the whole image step by step and commonly the negative extracted samples would be much more than the people-contained samples.
Thus, add more negative training samples is a quite straightforward but effective way to improve overall accuracy(Both false positive and true negative).

MATLAB fminunc() not completing for large datasets. Works for smaller ones

I am performing logistic regression in MATLAB with L2 regularization on text data. My program works well for small datasets. For larger sets, it keeps running infinitely.
I have seen the potentially duplicate question (matlab fminunc not quitting (running indefinitely)). In that question, the cost for initial theta was NaN and there was an error printed in the console. For my implementation, I am getting a real valued cost and there is no error even with verbose parameters being passed to fminunc(). Hence I believe this question might not be a duplicate.
I need help in scaling it to larger sets. The size of the training data I am currently working on is roughly 10k*12k (10k text files cumulatively containing 12k words). Thus, I have m=10k training examples and n=12k features.
My cost function is defined as follows:
function [J gradient] = costFunction(X, y, lambda, theta)
[m n] = size(X);
g = inline('1.0 ./ (1.0 + exp(-z))');
h = g(X*theta);
J =(1/m)*sum(-y.*log(h) - (1-y).*log(1-h))+ (lambda/(2*m))*norm(theta(2:end))^2;
gradient(1) = (1/m)*sum((h-y) .* X(:,1));
for i = 2:n
gradient(i) = (1/m)*sum((h-y) .* X(:,i)) - (lambda/m)*theta(i);
end
end
I am performing optimization using MATLAB's fminunc() function. The parameters I pass to fminunc() are:
options = optimset('LargeScale', 'on', 'GradObj', 'on', 'MaxIter', MAX_ITR);
theta0 = zeros(n, 1);
[optTheta, functionVal, exitFlag] = fminunc(#(t) costFunction(X, y, lambda, t), theta0, options);
I am running this code on a machine with these specifications:
Macbook Pro i7 2.8GHz / 8GB RAM / MATLAB R2011b
The cost function seems to behave correctly. For initial theta, I get acceptable values of J and gradient.
K>> theta0 = zeros(n, 1);
K>> [j g] = costFunction(X, y, lambda, theta0);
K>> j
j =
0.6931
K>> max(g)
ans =
0.4082
K>> min(g)
ans =
-2.7021e-05
The program takes incredibly long to run. I started profiling keeping MAX_ITR = 1 for fminunc(). With a single iteration, the program did not complete execution even after a couple of hours had elapsed. My questions are:
Am I doing something wrong mathematically?
Should I use any other optimizer instead of fminunc()? With LargeScale=on, fminunc() uses trust-region algorithms.
Is this problem cluster-scale and should not be run on a single machine?
Any other general tips will be appreciated. Thanks!
This helped solve the problem: I was able to get this working by setting the LargeScale flag to 'off' in fminunc(). From what I gather, LargeScale = 'on' uses trust region algorithms, while keeping it 'off' uses quasi-newton methods. Using quasi-newton methods and passing the gradient worked a lot faster for this particular problem and gave very nice results.
I was able to get this working by setting the LargeScale flag to 'off' in fminunc(). From what I gather, LargeScale = 'on' uses trust region algorithms, while keeping it 'off' uses quasi-newton methods. Using quasi-newton methods and passing the gradient worked a lot faster for this particular problem and gave very nice results.
Here is my advise:
-Set the Matlab flag to show debug output during run. If not just print out in your cost function the cost, which will allow you to monitor iteration count and error.
And second, which is very important:
Your problem is illposed, or so to say underdetermined. You have a 12k feature space and provide only 10k examples, which means for an unconstrained optimization the answer is -Inf. To make a quick example why this is, your problem is like:
Minimize x+y+z given that x+y-z = 2. Feature space dim 3, spanned vector space - 1d. I suggest use PCA or CCA to reduce the dimensionality of the the text files by retaining their variation up to 99%. This will probably give you a feature space ~100-200dim.
PS: Just to point out that the problem is very fram from cluster size requirement, which usually is 1kk+ data points and that fminunc is not at all an overkill, and LIBSVM has nothing to do with it because fminunc is just a quadratic optimizer, while LIBSVM is a classifier. To clear out LIBSVM uses something similar to fminunc just with different objective function.
Here's what I suspect to be the issue, based on my experience with this type of problem. You're using a dense representation for X instead of a sparse one. You're also seeing the typical effect in text classification that the number of terms increasing roughly linearly with the number of samples. Effectively, the cost of the matrix multiplication X*theta goes up quadratically with the number of samples.
By contrast, a good sparse matrix representation only iterates over the non-zero elements to do a matrix multiplication, which tends to be roughly constant per document if they're of appropriately constant length, causing linear instead of quadratic slowdown in the number of samples.
I'm not a Matlab guru, but I know it has a sparse matrix package, so try to use that.

matlab neural network gradient descent and mean square error

I want to know how grdient descent algorithm works on matlab network training and how MSE is calculated - I have my own app but it doesnt work as the matlab nn and I want to know why.
My algorithm looks like this:
foreach epoch
gradient_vector = 0 // this is a vector
rmse = 0
foreach sample in data set
output = CalculateForward(sample.input)
error = sample.target - output
rmse += DotProduct(error,error)
gradient_part = CalculateBackward(error)
gradient_vector += (gradient_part / number_of_samples)
end
network.AddToWeights( gradient_vector * learning_rate)
rmse = sqrt(rmse/number_of_samples)
end
I it something similar what matlab does?
It appears close to what MATLAB does, but keep in mind that the toolbox is designed for a broad base of applications. Your algorithm gives each data entry once to the network once per epoch. Matlab's toolbox can present the data multiple times per epoch, update multiple times per epoch, and can update in a number of ways. I assure you that your exact method can be duplicated with the existing matlab toolbox, but with a very specific setting, which can be found by digging around in the help files for the neural network you're using. Some of them may be closer to what you're doing than others, so be discerning. Good luck!