Dimensionality reduction in HOG feature vector - matlab

I computed the HOG feature vector of the following image in MATLAB.
Input Image
I used the following code:
I = imread('input.jpg');
I = rgb2gray(I);
[features, visualization] = extractHOGFeatures(I,'CellSize',[16 16]);
features comes out to be a 1x1944 vector, and I need to reduce its dimensionality (say, to 1x100). What method should I use?
I thought of Principal Component Analysis and ran the following in MATLAB.
prinvec = pca(features);
prinvec comes out to be an empty matrix (1944x0). Am I doing it wrong? If not PCA, what other methods can I use to reduce the dimension?

You can't do PCA on this, since you have more features than observations: with a single observation there is no variance to analyze, so pca centers your one row to all zeros and returns an empty 1944x0 coefficient matrix. Get more observations, some 10,000 ideally, and then you can do PCA.
See PCA in matlab selecting top n components for a more detailed, mathematical explanation of why this is the case.
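As an illustration, here is a minimal sketch of what that looks like, assuming a cell array files of image names (an assumption, not from the original post), all the same size as input.jpg so every HOG vector is 1x1944:

N = numel(files);
X = zeros(N, 1944);                       % one HOG observation per row
for i = 1:N
    I = rgb2gray(imread(files{i}));
    X(i,:) = extractHOGFeatures(I, 'CellSize', [16 16]);
end
[coeff, score] = pca(X);                  % PCA across observations
reduced = score(:, 1:min(100, size(score,2)));   % keep up to 100 dimensions

With enough rows in X, score(:,1:100) gives each image a 1x100 representation.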


PCA for dimensionality reduction MATLAB

I have a feature matrix of size 4096 x 180, where 180 is the number of samples and 4096 is the feature vector length of each sample.
I want to reduce the dimensionality of the data using PCA.
I tried using the built-in pca function of MATLAB, [V, U] = pca(X), and reconstructed the data by X_rec = U(:, 1:n) * V(:, 1:n)', n being the dimension I chose. This returns a matrix of 4096 x 180.
Now I have 3 questions:
How to obtain the reduced dimension?
When I put n as 200, it gave a matrix-dimension error, which led me to assume that the reduced dimension cannot exceed the sample size. Is this true?
How to find the right number of reduced dimensions?
I have to use the reduced dimension feature set for further classification.
If anyone can provide a detailed step-by-step explanation of the PCA code for this, I would be grateful. I have looked in many places, but my confusion persists.
You may want to refer to the MATLAB example that analyzes city data.
Here is some simplified code:
load cities;
[~, pca_scores, ~, ~, var_explained] = pca(ratings);
Here, pca_scores holds the PCA scores (the projected data), with the percentage of variance explained by each component in var_explained. You do not need to do any explicit multiplication after running pca; MATLAB gives you the components directly.
In your case, suppose the data X is a 4096-by-180 matrix, i.e. pca treats it as 4096 samples with 180 features each. Your goal is to reduce the dimensionality such that you have p features, where p < 180. In MATLAB, you can simply run the following:
p = 100;
[~, pca_scores, ~, ~, var_explained] = pca(X, 'NumComponents', p);
pca_scores will be a 4096-by-p matrix, and var_explained reports the percentage of the total variance explained by each component.
To answer your questions:
How to obtain the reduced dimension? In the above example, pca_scores is your reduced-dimension data.
When I put n as 200, it gave an error... Is this true? You can't use 200, since the number of retained components cannot exceed the original feature count, which is 180 here.
How to find the right number of reduced dimensions? You can make this decision by checking the var_explained vector. Typically you want to retain about 99% of the variance of the features. You can read more about this here.
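A minimal sketch of that check, assuming X is the 4096-by-180 data matrix from above:

[~, score, ~, ~, explained] = pca(X);   % explained is in percent
p = find(cumsum(explained) >= 99, 1);   % smallest p retaining ~99% variance
X_reduced = score(:, 1:p);              % 4096-by-p reduced data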

How to get the threshold value of k-means algorithm that is used to binarize the images?

I applied the k-means algorithm to segment images, using the built-in kmeans function. It works properly, but I want to know the threshold value that k-means effectively uses to binarize the image. For comparison, we can get a threshold value using a built-in MATLAB function:
threshold=graythresh(grayscaledImage);
a=im2bw(a,threshold);
%Applying k-means....
imdata = reshape(grayscaledImage, [], 1);   % flatten pixels into a column vector
imdata = double(imdata);
[imdx, mn] = kmeans(imdata, 2);             % 2 clusters; mn holds the centroids
imIdx = reshape(imdx, size(grayscaledImage));
imshow(imIdx, []);
Actually, k-means and the well-known Otsu method for binarizing intensity images with a global threshold have an interesting relationship:
http://www-cs.engr.ccny.cuny.edu/~wolberg/cs470/doc/Otsu-KMeansHIS09.pdf
It can be shown that k-means is a locally optimal, iterative solution to the same objective function as Otsu, where Otsu is a globally optimal, non-iterative solution.
Given grayscale intensity data, one can compute a threshold based on Otsu's method, expressed in MATLAB via graythresh or otsuthresh, depending on which interface you prefer.
A = imread('cameraman.tif');
A = im2double(A);
totsu = otsuthresh(histcounts(A,10000))
[~,c] = kmeans(A(:),2,'Replicates',10);
tkmeans = mean(c)
You can obtain a grayscale threshold from kmeans by simply taking the midpoint of the two centroids. This makes sense geometrically: on either side of that midpoint you are closer to one centroid or the other, and therefore lie in that respective cluster.
totsu =
0.3308
tkmeans =
0.3472
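To actually binarize with the k-means-derived threshold, a minimal follow-on to the snippet above (imbinarize needs R2016a or newer; im2bw works on older releases):

BW = imbinarize(A, tkmeans);   % equivalently: BW = A > tkmeans;
imshow(BW);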
You can't get the threshold, because there is no threshold in the k-means algorithm.
K-means is a clustering algorithm; it returns clusters, which in many cases cannot be obtained with simple thresholding.
See this link to learn more about how k-means works.

PCA for feature extraction MATLAB

I have a data matrix A of size N-by-M.
I wanted to use PCA for dimensionality reduction, setting the target dimension to k.
I understand that after feature extraction, I should get an N-by-k matrix.
I have tried pcares as follows,
[residuals,reconstructed] = pcares(A,k)
But this does not help me.
I am also trying to use the dr toolbox (here)
This returns a k-by-k matrix. How do I proceed further?
Any help would be appreciated.
Thank you.
pcares gives you the residuals, i.e. the error between the input and its reconstruction, which is not what you want here. Use the pca command instead. It returns a matrix coeff whose columns are the principal components (M-by-M when you have more than M samples). You can use the first k of them to construct the features; just do
X = bsxfun(@minus, A, mean(A)) * coeff(:, 1:k);
where coeff is what is returned from the pca command. The bsxfun call subtracts the mean (it centers the data, as pca does internally when computing coeff).
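Equivalently, pca can hand you the projected data directly through its second output, so a minimal sketch (assuming A is N-by-M and k is no larger than the number of returned components) is:

[coeff, score] = pca(A);   % score equals the centered A times coeff
F = score(:, 1:k);         % N-by-k reduced feature matrix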

How to use SVM in Matlab?

I am new to MATLAB. Is there any sample code for classifying some data (with 41 features) with an SVM and then visualizing the result? I want to classify a data set (which has five classes) using the SVM method.
I read the "A Practical Guide to Support Vector Classification" article and I saw some examples. My dataset is KDD99. I wrote the following code:
%% Load Data
[data,colNames] = xlsread('TarainingDataset.xls');
groups = ismember(colNames(:,42),'normal.');
TrainInputs = data;
TrainTargets = groups;
%% Design SVM
C = 100;
svmstruct = svmtrain(TrainInputs,TrainTargets,...
'boxconstraint',C,...
'kernel_function','rbf',...
'rbf_sigma',0.5,...
'showplot',false);
%% Test SVM
[dataTset,colNamesTest] = xlsread('TestDataset.xls');
TestInputs = dataTset;
groups = ismember(colNamesTest(:,42),'normal.');
TestOutputs = svmclassify(svmstruct,TestInputs,'showplot',false);
But I don't know how to get the accuracy or MSE of my classification, and when I set showplot to true in svmclassify, I get this warning:
The display option can only plot 2D training data
Could anyone please help me?
I recommend you use another SVM toolbox, LIBSVM. The link is as follows:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
After adding it to the path of matlab, you can train and use you model like this:
model = svmtrain(train_label, train_feature, '-c 1 -g 0.07 -h 0');
% the parameters can be modified
[label, accuracy, probability] = svmpredict(test_label, test_feature, model);
train_label must be a column vector; if there are more than two classes (not just 0/1), LIBSVM handles the multiclass case automatically.
train_feature is an n-by-L matrix for n samples. You should preprocess (e.g. scale) the features before using them, and preprocess the test features in the same way.
The accuracy you want is printed when testing finishes, but it is only for the whole dataset.
If you need the accuracy for positive and negative samples separately, you still have to calculate it yourself from the predicted labels, as in the sketch below.
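A minimal sketch, assuming binary 0/1 labels in test_label and the predicted labels from svmpredict in label:

acc_pos = mean(label(test_label == 1) == 1);   % accuracy on positive samples
acc_neg = mean(label(test_label == 0) == 0);   % accuracy on negative samples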
Hope this will help you!
Your feature space has 41 dimensions, and plotting more than 3 dimensions is impossible.
A good way to better understand your data and the way an SVM works is to begin with a linear SVM. This type of SVM is interpretable: each of your 41 features has a weight (or 'importance') associated with it after training. You can then use plot3() with your data on the 3 'best' features from the linear SVM. Note how well your data is separated with those features, and choose a basis function and other parameters accordingly. A sketch of this idea follows.
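A minimal sketch using the newer fitcsvm (an assumption: it uses the binary normal/attack labels and the n-by-41 TrainInputs from the question; Beta holds the linear weights):

mdl = fitcsvm(TrainInputs, TrainTargets, 'KernelFunction', 'linear');
[~, idx] = sort(abs(mdl.Beta), 'descend');   % rank features by |weight|
f = idx(1:3);                                % three most influential features
plot3(TrainInputs(:,f(1)), TrainInputs(:,f(2)), TrainInputs(:,f(3)), '.');
grid on;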

Matlab - how to compute PCA on a huge data set [duplicate]

Possible Duplicate:
MATLAB is running out of memory but it should not be
I want to perform PCA analysis on a huge data set of points. To be more specific, I have size(dataPoints) = [329150 132], where 329150 is the number of data points and 132 is the number of features.
I want to extract the eigenvectors and their corresponding eigenvalues so that I can perform PCA reconstruction.
However, when I use the princomp function (i.e. [eigenVectors projectedData eigenValues] = princomp(dataPoints);), I obtain the following error:
>> [eigenVectors projectedData eigenValues] = princomp(pointsData);
Error using svd
Out of memory. Type HELP MEMORY for your options.
Error in princomp (line 86)
[U,sigma,coeff] = svd(x0,econFlag); % put in 1/sqrt(n-1) later
However, if I am using a smaller data set, I have no problem.
How can I perform PCA on my whole dataset in MATLAB? Has anyone encountered this problem?
Edit:
I have modified the princomp function and tried to use svds instead of svd, but I am getting much the same error. I have pasted the error below:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
Error in svds (line 65)
B = [sparse(m,m) A; A' sparse(n,n)];
Error in princomp (line 86)
[U,sigma,coeff] = svds(x0,econFlag); % put in 1/sqrt(n-1) later
Solution based on Eigen Decomposition
You can first compute PCA on X'X, as @david said. Specifically, see the script below:
sz = [329150 132];
X = rand(sz);
[V, D] = eig(X.' * X);   % X'X is only 132-by-132, cheap to decompose
Here, V holds the right singular vectors, which are the principal directions when your data vectors are in rows. The eigenvalues in D are the variances along each direction (exactly so once the data are centered and scaled by 1/(n-1)). The singular values, which play the role of standard deviations, are the square roots of those eigenvalues:
S = sqrt(D);
Then the left singular vectors, U, follow from the relation X = U*S*V'. Note that U holds the principal components if your data vectors are in columns.
U = X*V*S^(-1);
Let us reconstruct the original data matrix and see the L2 reconstruction error:
X2 = U*S*V';
L2ReconstructionError = norm(X(:)-X2(:))
It is almost zero:
L2ReconstructionError =
6.5143e-012
If your data vectors are in columns and you want to convert your data into eigenspace coefficients, compute U.'*X; if they are in rows, as with the 329150-by-132 matrix here, compute X*V instead.
This code snippet takes around 3 seconds on my moderate 64-bit desktop.
Solution based on Randomized PCA
Alternatively, you can use a faster approximate method based on randomized PCA. Please see my answer on Cross Validated. You can directly compute fsvd to get U and V instead of using eig.
You may employ randomized PCA if the data size is too big, but I think the previous approach is sufficient for the size you gave.
My guess is that you have a huge data set and you don't need all of the SVD components. In this case, use svds instead of svd:
Taken directly from Matlab help:
s = svds(A,k) computes the k largest singular values and associated singular vectors of matrix A.
From your question, I understand that you don't call svd directly. But you might as well take a look at princomp (it is editable!) and alter the line that calls svd.
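A minimal standalone sketch of that advice (note that svds builds a large augmented matrix internally, as the error transcript above shows, so it can still need a few GB at this data size):

Xc = bsxfun(@minus, dataPoints, mean(dataPoints));   % center each feature
k = 20;                                              % however many components you need
[U, S, V] = svds(Xc, k);     % only the k largest singular triplets
scores = U * S;              % projected data, the same as Xc * V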
You probably needed to calculate an n-by-n matrix somewhere in your computation, that is to say
329150 * 329150 * 8 bytes ~ 866 GB
of space, which explains why you're getting a memory error. There seems to be an efficient way to calculate PCA using princomp(X, 'econ'), which I suggest you try.
More on this on Stack Overflow and MathWorks.
Manually compute X'X (132-by-132) and run svd or eig on it, or find a NIPALS script.
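A minimal sketch of that route, reusing the variable names from the question:

Xc = bsxfun(@minus, dataPoints, mean(dataPoints));  % center the data
C = (Xc' * Xc) / (size(Xc,1) - 1);                  % 132-by-132 covariance
[V, D] = eig(C);
[eigenValues, order] = sort(diag(D), 'descend');
eigenVectors = V(:, order);
projectedData = Xc * eigenVectors;                  % PCA scores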