Anomally detection - matlab

I'm writing a code to detect anomalys using Gaussian distribution.
This is the code i wrote to compute the probability density function:
function p = multivariateGaussian(X, mu, Sigma2)
%MULTIVARIATEGAUSSIAN Computes the probability density function of the
%multivariate gaussian distribution.
% p = MULTIVARIATEGAUSSIAN(X, mu, Sigma2) Computes the probability
% density function of the examples X under the multivariate gaussian
% distribution with parameters mu and Sigma2. If Sigma2 is a matrix, it is
% treated as the covariance matrix. If Sigma2 is a vector, it is treated
% as the \sigma^2 values of the variances in each dimension (a diagonal
% covariance matrix)
%
k = length(mu);
if (size(Sigma2, 2) == 1) || (size(Sigma2, 1) == 1)
Sigma2 = diag(Sigma2);
end
X = bsxfun(#minus, X, mu(:)');
p = (2 * pi) ^ (- k / 2) * det(Sigma2) ^ (-0.5) * ...
exp(-0.5 * sum(bsxfun(#times, X * pinv(Sigma2), X), 2));
end
My first question: are there a fastter and clever way to compute this? I have a little matlab cluster setuped here with 2 pcs, but in this case, i have no clue how to parallelize this.
My second question:In one of the matrices i using as trainning set is [42712X19700], even having 24 gb of ram, im getting out of memory error. Is it possible use a technique like random forest(slice the trainning set, and then combine de results?)? Or any other way to circumvent this problem?
I appreciate any help. Tks in advance!

Divide the data into small chunks and apply parfor for processing of each chunk. That is simply my choose for large scale processing. Or you might use process based parallelization and read one chunk as another process is computing on another chunk of data.

Related

What is the reverse process of the repmat or repemel command?

I have a matrix of 50-by-1 that is demodulated data. As this matrix has only one element in each row, I want to repeat this only element 16 times in each row so the matrix become 50 by 16. I did it using the repmat(A,16) command in Matlab. Now at receiving end noise is also added in matrix of 50 by 16. I want to get it back of 50 by 1 matrix. How can I do this?
I tried averaging of all rows but it is not a valid method. How can I know when an error is occurring in this process?
You are describing a problem of the form y = A * x + n, where y is the observed data, A is a known linear transform, and n is noise. The least squares estimate is the simplest estimate of the unknown vector x. The keys here are to express the repmat() function as a matrix and the observed data as a vector (i.e., a 50*16x1 vector rather than a 50x16 matrix).
x = 10 * rand(50,1); % 50x1 data vector;
A = repmat(eye(length(x)),[16,1]); % This stacks 16 replicas of x.
n = rand(50*16,1); % Noise
y = A * x + n; % Observed data
xhat = A \ y; % Least squares estimate of x.
As for what the inverse (what I assume you mean by 'reverse') of A is, it doesn't have one. If you look at its rank, you'll see it is only 50. The best you can do is to use its pseudoinverse, which is what the \ operator does.
I hope the above helps.

numerical integration for Gaussian function - indefinite integral

My approach
fun = #(y) (1/sqrt(pi))*exp(-(y-1).^2).*log(1 + exp(-4*y))
integral(fun,-Inf,Inf)
This gives NaN.
So I tried plotting it.
y= -10:0.1:10;
plot(y,exp(-(y-1).^2).*log(1 + exp(-4*y)))
Then understood that domain (siginificant part) is from -4 to +4.
So changed the limits to
integral(fun,-10,10)
However I do not want to always plot the graph and then know its limits. So is there any way to know the integral directly from -Inf to Inf.
Discussion
If your integrals are always of the form
I would use a high-order Gauss–Hermite quadrature rule.
It's similar to the Gauss-Legendre-Kronrod rule that forms the basis for quadgk but is specifically tailored for integrals over the real line with a standard Gaussian multiplier.
Rewriting your equation with the substitution x = y-1, we get
.
The integral can then be computed using the Gauss-Hermite rule of arbitrary order (within reason):
>> order = 10;
>> [nodes,weights] = GaussHermiteRule(order);
>> f = #(x) log(1 + exp(-4*(x+1)))/sqrt(pi);
>> sum(f(nodes).*weights)
ans =
0.1933
I'd note that the function below builds a full order x order matrix to compute nodes, so it shouldn't be made too large.
There is a way to avoid this by explicitly computing the weights, but I decided to be lazy.
Besides, event at order 100, the Gaussian multiplier is about 2E-98, so the integrand's contribution is extremely minimal.
And while this isn't inherently adaptive, a high-order rule should be sufficient in most cases ... I hope.
Code
function [nodes,weights] = GaussHermiteRule(n)
% ------------------------------------------------------------------------------
% Find the nodes and weights for a Gauss-Hermite Quadrature integration.
%
if (n < 1)
error('There is no Gauss-Hermite rule of order 0.');
elseif (n < 0) || (abs(n - round(n)) > eps())
error('Given order ''n'' must be a strictly positive integer.');
else
n = round(n);
end
% Get the nodes and weights from the Golub-Welsch function
n = (0:n)' ;
b = n*0 ;
a = b + 0.5 ;
c = n ;
[nodes,weights] = GolubWelsch(a,b,c,sqrt(pi));
end
function [xk,wk] = GolubWelsch(ak,bk,ck,mu0)
%GolubWelsch
% Calculate the approximate* nodes and weights (normalized to 1) of an orthogonal
% polynomial family defined by a three-term reccurence relation of the form
% x pk(x) = ak pkp1(x) + bk pk(x) + ck pkm1(x)
%
% The weight scale factor mu0 is the integral of the weight function over the
% orthogonal domain.
%
% Calculate the terms for the orthonormal version of the polynomials
alpha = sqrt(ak(1:end-1) .* ck(2:end));
% Build the symmetric tridiagonal matrix
T = full(spdiags([[alpha;0],bk,[0;alpha]],[-1,0,+1],length(alpha),length(alpha)));
% Calculate the eigenvectors and values of the matrix
[V,xk] = eig(T,'vector');
% Calculate the weights from the eigenvectors - technically, Golub-Welsch requires
% a normalization, but since MATLAB returns unit eigenvectors, it is omitted.
wk = mu0*(V(1,:).^2)';
end
I've had success with transforming such infinite-bounded integrals using a numerical variable transformation, as explained in Numerical Recipes 3e, section 4.5.3. Basically, you substitute in y=c*tan(t)+b and then numerically integrate over t in (-pi/2,pi/2), which sweeps y from -infinity to infinity. You can tune the values of c and b to optimize the process. This approach largely dodges the question of trying to determine cutoffs in the domain, but for this to work reliably using quadrature you have to know that the integrand does not have features far from y=b.
A quick and dirty solution would be to look for a position, where your function is sufficiently small enough and then taking it as limits. This assumes that for x>0 the function fun decreases montonically and fun(x) is roughly the same size as fun(-x) for all x.
%// A small number
epsilon = eps;
%// Stepsize for searching bound
stepTest = 1;
%// Starting position for searching bound
position = 0;
%// Not yet small enough
smallEnough = false;
%// Search bound
while ~smallEnough
smallEnough = (fun(position) < eps);
position = position + stepTest;
end
%// Calculate integral
integral(fun, -position, position)
If your were happy with plotting the function, deciding by eye where you can cut, then this code will suffice, I guess.

Basic SVM Implemented in MATLAB

Linearly Non-Separable Binary Classification Problem
First of all, this program isn' t working correctly for RBF ( gaussianKernel() ) and I want to fix it.
It is a non-linear SVM Demo to illustrate classifying 2 class with hard margin application.
Problem is about 2 dimensional radial random distrubuted data.
I used Quadratic Programming Solver to compute Lagrange multipliers (alphas)
xn = input .* (output*[1 1]); % xiyi
phi = gaussianKernel(xn, sigma2); % Radial Basis Function
k = phi * phi'; % Symmetric Kernel Matrix For QP Solver
gamma = 1; % Adjusting the upper bound of alphas
f = -ones(2 * len, 1); % Coefficient of sum of alphas
Aeq = output'; % yi
beq = 0; % Sum(ai*yi) = 0
A = zeros(1, 2* len); % A * alpha <= b; There isn't like this term
b = 0; % There isn't like this term
lb = zeros(2 * len, 1); % Lower bound of alphas
ub = gamma * ones(2 * len, 1); % Upper bound of alphas
alphas = quadprog(k, f, A, b, Aeq, beq, lb, ub);
To solve this non linear classification problem, I wrote some kernel functions such as gaussian (RBF), homogenous and non-homogenous polynomial kernel functions.
For RBF, I implemented the function in the image below:
Using Tylor Series Expansion, it yields:
And, I seperated the Gaussian Kernel like this:
K(x, x') = phi(x)' * phi(x')
The implementation of this thought is:
function phi = gaussianKernel(x, Sigma2)
gamma = 1 / (2 * Sigma2);
featDim = 10; % Length of Tylor Series; Gaussian Kernel Converge 0 so It doesn't have to Be Inf Dimension
phi = []; % Kernel Output, The Dimension will be (#Sample) x (featDim*2)
for k = 0 : (featDim - 1)
% Gaussian Kernel Trick Using Tylor Series Expansion
phi = [phi, exp( -gamma .* (x(:, 1)).^2) * sqrt(gamma^2 * 2^k / factorial(k)) .* x(:, 1).^k, ...
exp( -gamma .* (x(:, 2)).^2) * sqrt(gamma^2 * 2^k / factorial(k)) .* x(:, 2).^k];
end
end
*** I think my RBF implementation is wrong, but I don' t know how to fix it. Please help me here.
Here is what I got as output:
where,
1) The first image : Samples of Classes
2) The second image : Marking The Support Vectors of Classes
3) The third image : Adding Random Test Data
4) The fourth image : Classification
Also, I implemented Homogenous Polinomial Kernel " K(x, x') = ( )^2 ", code is:
function phi = quadraticKernel(x)
% 2-Order Homogenous Polynomial Kernel
phi = [x(:, 1).^2, sqrt(2).*(x(:, 1).*x(:, 2)), x(:, 2).^2];
end
And I got surprisingly nice output:
To sum up, the program is working correctly with using homogenous polynomial kernel but when I use RBF, it isn' t working correctly, there is something wrong with RBF implementation.
If you know about RBF (Gaussian Kernel) please let me know how I can make it right..
Edit: If you have same issue, use RBF directly that defined above and dont separe it by phi.
Why do you want to compute phi for Gaussian Kernel? Phi will be infinite dimensional vector and you are bounding the terms in your taylor series to 10 when we don't even know whether 10 is enough to approximate the kernel values or not! Usually, the kernel is computed directly instead of getting phi (and the computing k). For example [1].
Does this mean we should never compute phi for Gaussian? Not really, no, but we have to be slightly smarter about it. There have been recent works [2,3] which show how to compute phi for Gaussian so that you can compute approximate kernel matrices while having just finite dimensional phi's. Here [4] I give the very simple code to generate the approximate kernel using the trick from the paper. However, in my experiments I needed to generate anywhere from 100 to 10000 dimensional phi's to be able to get a good approximation of the kernel (depending upon on the number of features the original input had as well as the rate at which the eigenvalues of the original matrix tapers off).
For the moment, just use code similar to [1] to generate the Gaussian kernel and then observe the result of SVM. Also, play around with the gamma parameter, a bad gamma parameter can result in really bad classification.
[1] https://github.com/ssamot/causality/blob/master/matlab-code/Code/mfunc/indep/HSIC/rbf_dot.m
[2] http://www.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf
[3] http://www.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf
[4] https://github.com/aruniyer/misc/blob/master/rks.m
Since Gaussian kernel is often referred as mapping to infinity dimensions, I always have faith in its capacity. The problem here maybe due to a bad parameter while keeping in mind grid search is always needed for SVM training. Thus I propose you could take a look at here where you could find some tricks for parameter tuning. Exponentially increasing sequence is usually used as candidates.

Custom Algorithm for Exp. maximization in Matlab

I try to write an algorithm which determine $\mu$, $\sigma$,$\pi$ for each class from a mixture multivariate normal distribution.
I finish with the algorithm partially, it works when I set the random guess values($\mu$, $\sigma$,$\pi$) near from the real value. But when I set the values far from the real one, the algorithm does not converge. The sigma goes to 0 $(2.30760684053766e-24 2.30760684053766e-24)$.
I think the problem is my covarience calculation, I am not sure that this is the right way. I found this on wikipedia.
I would be grateful if you could check my algorithm. Especially the covariance part.
Have a nice day,
Thanks,
2 mixture gauss
size x = [400, 2] (400 point 2 dimension gauss)
mu = 2 , 2 (1 row = first gauss mu, 2 row = second gauss mu)
for i = 1 : k
gaussEvaluation(i,:) = pInit(i) * mvnpdf(x,muInit(i,:), sigmaInit(i, :) * eye(d));
gaussEvaluationSum = sum(gaussEvaluation(i, :));
%mu calculation
for j = 1 : d
mu(i, j) = sum(gaussEvaluation(i, :) * x(:, j)) / gaussEvaluationSum;
end
%sigma calculation methode 1
%for j = 1 : n
% v = (x(j, :) - muNew(i, :));
% sigmaNew(i) = sigmaNew(i) + gaussEvaluation(i,j) * (v * v');
%end
%sigmaNew(i) = sigmaNew(i) / gaussEvaluationSum;
%sigma calculation methode 2
sub = bsxfun(#minus, x, mu(i,:));
sigma(i,:) = sum(gaussEvaluation(i,:) * (sub .* sub)) / gaussEvaluationSum;
%p calculation
p(i) = gaussEvaluationSum / n;
Two points: you can observe this even when you implement gaussian mixture EM correctly, but in your case, the code does seem to be incorrect.
First, this is just a problem that you have to deal with when fitting mixtures of gaussians. Sometimes one component of the mixture can collapse on to a single point, resulting in the mean of the component becoming that point and the variance becoming 0; this is known as a 'singularity'. Hence, the likelihood also goes to infinity.
Check out slide 42 of this deck: http://www.cs.ubbcluj.ro/~csatol/gep_tan/Bishop-CUED-2006.pdf
The likelihood function that you are evaluating is not log-concave, so the EM algorithm will not converge to the same parameters with different initial values. The link I gave above also gives some solutions to avoid this over-fitting problem, such as putting a prior or regularization term on your parameters. You can also consider running multiple times with different starting parameters and discarding any results with variance 0 components as having over-fitted, or just reduce the number of components you are using.
In your case, your equation is right; the covariance update calculation on Wikipedia is the same as the one on slide 45 of the above link. However, if you are in a 2d space, for each component the mean should be a length 2 vector and the covariance should be a 2x2 matrix. Hence your code (for two components) is wrong because you have a 2x2 matrix to store the means and a 2x2 matrix to store the covariances; it should be a 2x2x2 matrix.

Efficient low-rank appoximation in MATLAB

I'd like to compute a low-rank approximation to a matrix which is optimal under the Frobenius norm. The trivial way to do this is to compute the SVD decomposition of the matrix, set the smallest singular values to zero and compute the low-rank matrix by multiplying the factors. Is there a simple and more efficient way to do this in MATLAB?
If your matrix is sparse, use svds.
Assuming it is not sparse but it's large, you can use random projections for fast low-rank approximation.
From a tutorial:
An optimal low rank approximation can be easily computed using the SVD of A in O(mn^2
). Using random projections we show how to achieve an ”almost optimal” low rank pproximation in O(mn log(n)).
Matlab code from a blog:
clear
% preparing the problem
% trying to find a low approximation to A, an m x n matrix
% where m >= n
m = 1000;
n = 900;
%// first let's produce example A
A = rand(m,n);
%
% beginning of the algorithm designed to find alow rank matrix of A
% let us define that rank to be equal to k
k = 50;
% R is an m x l matrix drawn from a N(0,1)
% where l is such that l > c log(n)/ epsilon^2
%
l = 100;
% timing the random algorithm
trand =cputime;
R = randn(m,l);
B = 1/sqrt(l)* R' * A;
[a,s,b]=svd(B);
Ak = A*b(:,1:k)*b(:,1:k)';
trandend = cputime-trand;
% now timing the normal SVD algorithm
tsvd = cputime;
% doing it the normal SVD way
[U,S,V] = svd(A,0);
Aksvd= U(1:m,1:k)*S(1:k,1:k)*V(1:n,1:k)';
tsvdend = cputime -tsvd;
Also, remember the econ parameter of svd.
You can rapidly compute a low-rank approximation based on SVD, using the svds function.
[U,S,V] = svds(A,r); %# only first r singular values are computed
svds uses eigs to compute a subset of the singular values - it will be especially fast for large, sparse matrices. See the documentation; you can set tolerance and maximum number of iterations or choose to calculate small singular values instead of large.
I thought svds and eigs could be faster than svd and eig for dense matrices, but then I did some benchmarking. They are only faster for large matrices when sufficiently few values are requested:
n k svds svd eigs eig comment
10 1 4.6941e-03 8.8188e-05 2.8311e-03 7.1699e-05 random matrices
100 1 8.9591e-03 7.5931e-03 4.7711e-03 1.5964e-02 (uniform dist)
1000 1 3.6464e-01 1.8024e+00 3.9019e-02 3.4057e+00
2 1.7184e+00 1.8302e+00 2.3294e+00 3.4592e+00
3 1.4665e+00 1.8429e+00 2.3943e+00 3.5064e+00
4 1.5920e+00 1.8208e+00 1.0100e+00 3.4189e+00
4000 1 7.5255e+00 8.5846e+01 5.1709e-01 1.2287e+02
2 3.8368e+01 8.6006e+01 1.0966e+02 1.2243e+02
3 4.1639e+01 8.4399e+01 6.0963e+01 1.2297e+02
4 4.2523e+01 8.4211e+01 8.3964e+01 1.2251e+02
10 1 4.4501e-03 1.2028e-04 2.8001e-03 8.0108e-05 random pos. def.
100 1 3.0927e-02 7.1261e-03 1.7364e-02 1.2342e-02 (uniform dist)
1000 1 3.3647e+00 1.8096e+00 4.5111e-01 3.2644e+00
2 4.2939e+00 1.8379e+00 2.6098e+00 3.4405e+00
3 4.3249e+00 1.8245e+00 6.9845e-01 3.7606e+00
4 3.1962e+00 1.9782e+00 7.8082e-01 3.3626e+00
4000 1 1.4272e+02 8.5545e+01 1.1795e+01 1.4214e+02
2 1.7096e+02 8.4905e+01 1.0411e+02 1.4322e+02
3 2.7061e+02 8.5045e+01 4.6654e+01 1.4283e+02
4 1.7161e+02 8.5358e+01 3.0066e+01 1.4262e+02
With size-n square matrices, k singular/eigen values and runtimes in seconds. I used Steve Eddins' timeit file exchange function for benchmarking, which tries to account for overhead and runtime variations.
svds and eigs are faster if you want a few values from a very large matrix. It also depends on the properties of the matrix in question (edit svds should give you some idea why).