I am using EigenJoints skeleton features to perform human action recognition in MATLAB.
I have 320 videos, so the training data is a 320x1 cell array. Each cell contains an Nx2970 double array, where N is the number of frames (variable, because each video contains a different number of frames) and 2970 is the number of features extracted from each frame (constant, because I use the same extraction method for all videos).
How can I format this training data into a single 2-D double matrix to use as input for an SVM? I don't know how to do it, because the SVM requires a double matrix, but what I have is one matrix per video, each of a different size.
Your question is a bit unclear about how you want to classify human motion from your video. You have two options:
Look at each frame in isolation. This would classify each frame separately; basically, it would be a pose classifier.
Build a new feature that treats the data as a time series. This would classify each video clip.
Single Frame Classification
For the first option, the solution to your problem is simple. You simply concatenate all the frames into one big matrix.
Let me give a toy example. I've made X_cell, a cell array containing one video with 2 frames and one video with 3 frames. Your question doesn't specify where your ground-truth labels come from, so I'm going to assume you have per-video labels stored in a vector video_labels:
X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]};
video_labels = [1, 0];
One simple way to concatenate these is with a for loop:
X = [];
Y = [];
for ii = 1:length(X_cell)
    X = [X; X_cell{ii}];                                      % stack this video's frames under the previous ones
    Y = [Y; repmat(video_labels(ii), size(X_cell{ii},1), 1)]; % one label per frame
end
There is probably also a more efficient solution; you could vectorize this code if you need to improve speed, as sketched below.
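For instance, a minimal vectorized sketch (assuming X_cell and video_labels as defined above, and repelem from MATLAB R2015a or later):
X = cell2mat(X_cell(:));                              % stack all frames into one matrix
frames_per_video = cellfun(@(c) size(c,1), X_cell);   % number of frames N in each video
Y = repelem(video_labels(:), frames_per_video(:));    % repeat each video label N times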
Whole Video Classification
Time-series features are a whole topic in themselves. The simplest thing you could do here is resize all the video clips to the same length using imresize, then vectorize the resulting matrix. This creates a very long, redundant feature.
num_frames = 10; % the desired video length
length_frame_feature = 2;
num_videos = length(X_cell);
X = zeros(num_videos, length_frame_feature*num_frames);
for ii = 1:num_videos
    video_feature = imresize(X_cell{ii}, [num_frames, length_frame_feature]);
    X(ii, :) = video_feature(:);
end
Y = video_labels;
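Either way, once X and Y are plain double matrices, training is straightforward; for instance, a minimal sketch assuming a recent Statistics Toolbox (fitcsvm, R2014a+):
model = fitcsvm(X, Y);            % binary SVM on the concatenated features
predicted = predict(model, X);    % sanity check on the training data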
For more sophisticated techniques, take a look at spectrograms.
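For example, a minimal sketch (assuming the Signal Processing Toolbox; the window length, overlap, and FFT size here are arbitrary choices):
feature_ts = randn(100, 1);                       % stand-in for one feature tracked over 100 frames
[S, F, T] = spectrogram(feature_ts, 16, 8, 32);   % 16-sample windows, 50% overlap, 32-point FFT
time_freq_feature = abs(S(:)).';                  % vectorized magnitudes, usable as a feature row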
Related
I have this dataset consisting of around 800 images of the view of a car driving in circles with corresponding coordinates and changing background. The goal is
to train a neural network to predict the position based on the images. I reshaped the images so that
the original 160x320 pixels come down to 1x51200, so that I can feed my NN more easily. However, because
this is quite a large dimension, I applied PCA to reduce it. The PCA indeed worked well: I could take only the 100 eigenvectors with the highest eigenvalues and still retain 90-95% of the total variance.
But now comes my obstacle: I have these 100 images, still reconstructable and visualizable, but I don't know exactly which coordinates they correspond to. I can't just take the first 100 coordinates, because the eigenvectors were obviously derived from different timesteps across the progression of the images. I need this information so that my NN can match them while learning and check its progress while testing. I read a similar question where the answers stated that it's not possible to extract indices from a PCA output, but I'm pretty sure other people must already have faced similar obstacles.
Here is what you need to be able to do with PCA to go from the compact representation back to the original one:
# dataset.shape == (800, 51200)
M = pca(dataset)                          # hypothetical pca(); M.shape == (100, 51200)
original = dataset[k, :]                  # one vector corresponding to one image in your dataset
compressed_dataset = dataset @ M.T        # compressed_dataset.shape == (800, 100)
# if you have a compressed representation of an image, you restore it with
restored = compressed_dataset[k, :] @ M   # restored.shape == (1, 51200)
# here you expect allclose(restored, original), up to the variance lost in compression
Notice that the indices are stored apart from the data vectors: row k of compressed_dataset still corresponds to row k of dataset, so the images never lose their association with their coordinates.
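In MATLAB terms, a minimal sketch of the same round trip (assuming dataset is the 800x51200 matrix and the Statistics Toolbox's pca function):
mu = mean(dataset, 1);                                % pca centers the data, so keep the mean
[coeff, score] = pca(dataset, 'NumComponents', 100);  % coeff: 51200x100, score: 800x100
restored_k = score(k, :) * coeff' + mu;               % approximately dataset(k, :)
% row k of score is the compressed image k, so the ordering is preserved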
I'm using MATLAB to mimic some C code that I've written, so that I can replicate the result and plot it in MATLAB.
In C I am using an open-source sliding median filter, linked here.
I am currently using medfilt1 to mimic it in MATLAB, but there are two problems:
1. It is SLOW! It would be ideal to have a similar filter in MATLAB, but I don't want to take the time to write one. If one does not exist, I'd like to at least be able to accomplish #2.
2. medfilt1 does not wrap around.
x = 1:5;
medfilt1(x,3) % = 1 2 3 4 4
It calculates the first value like
median([0 1 2]) % = 1
This is because medfilt1 pads with zeros near the edges. I would like a way to change this so that the first value is calculated like
median([5 1 2]) % = 2
medfiltcirc(x,3) % = 2 2 3 4 4
UPDATE: For fun, I wrote a sliding median filter in MATLAB, and it turned out to be about 4x slower than using medfilt1 with padarray. Using ordfilt2(padarray(x,[N/2 0],'circular'),N/2,ones(N,1)) proved to be even faster than medfilt1. I believe the only way to improve speed further would be to write a MEX file.
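Spelled out, that approach looks roughly like this (a sketch assuming the Image Processing Toolbox, a column vector x, and an odd window length N; the median of N elements is order statistic (N+1)/2):
x = (1:5)';                                      % data as a column vector
N = 3;                                           % window length (odd)
padded = padarray(x, [(N-1)/2 0], 'circular');   % wrap the ends around
y = ordfilt2(padded, (N+1)/2, ones(N,1));        % order-(N+1)/2 over the window = sliding median
y = y((N-1)/2+1 : end-(N-1)/2);                  % trim the padding; y(1) is median([5 1 2]) = 2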
Another option is to use padarray before applying the median filter:
x = medfilt1(padarray(1:5,[0 1],'circular'));
and take x(2:end-1) as the answer...
To improve on medfilt1, consider using ordfilt2; for example:
x = ordfilt2(1:5,2,[1 1 1]);
This should buy you up to a factor of two. Read more about it here, and do pay attention to the variable class used for A...
You can do the wrapping around yourself. Let the data be defined as
x = 1:5; %// data values
S = 3; %// block size
Then:
s = floor(S/2); %// how many elements need to be wrapped around
n = numel(x); %// length of x
result = medfilt1([x(n-s+1:n) x x(1:s)], S);
result = result(s+1:end-s);
As I'm sure that you're aware, circular statistics can be treacherous – from the Wikipedia article on directional statistics:
When data is concentrated, the median and mode may be defined by analogy to the linear case, but for more dispersed or multi-modal data, these concepts are not useful.
Have you looked at the Circular Statistics Toolbox on the MathWorks File Exchange? It includes a circ_median function. I don't know how fast it is compared to what you're using, but it might be a good starting point for optimization. A paper on the toolbox was published in the Journal of Statistical Software (alt link to PDF). It might be a good idea to at least check whether this function matches the output of the other methods.
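For reference, a minimal usage sketch (assuming the toolbox is on the path and that the data really are angles in radians; the sample values are hypothetical):
alpha = [6.1; 6.2; 0.1; 0.2; 0.3];   % hypothetical angles (radians) clustered around 0
med = circ_median(alpha);            % circular median, respecting the wrap-around at 2*pi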
I'd like to begin by saying that I'm really new to CV, and there may be some obvious things I didn't think about, so don't hesitate to mention anything in that category.
I am trying to achieve scene classification, currently between indoor and outdoor images for simplicity.
My idea to achieve this is to use a gist descriptor, which creates a vector with certain parameters of the scene.
In order to obtain reliable classification, I used indoor and outdoor images, 100 samples each, computed a gist descriptor for each, created a training matrix out of them, and used 'svmtrain' on it. Here's a pretty simple piece of code that shows how I trained on the gist vectors:
train_label = zeros(200,1);
train_label(1:100,1) = 0; % 0 = indoor
train_label(101:200,1) = 1; % 1 = outdoor
training_mat(1:100,:) = gist_indoor1;
training_mat(101:200,:) = gist_outdoor1;
test_mat = gist_test;
SVMStruct = svmtrain(training_mat ,train_label, 'kernel_function', 'rbf', 'rbf_sigma', 0.6);
Group = svmclassify(SVMStruct, test_mat);
The problem is that the results are pretty bad.
I read that optimizing the box constraint and gamma parameters of the 'rbf' kernel should improve the classification, but:
I'm not sure how to optimize with multidimensional data vectors (the optimization example on the MathWorks site is 2-D, while mine is 512-D); any suggestion on how to begin?
I might be completely in the wrong direction, please indicate if it is so.
Edit:
Thanks Darkmoor! I'll try calibrating using this toolbox, and maybe try to improve my feature extraction.
Hopefully when I have a working classification, I'll post it here.
Edit 2: I forgot to update. By obtaining gist descriptors of indoor and urban outdoor images from the SUN database, and training with parameters optimized using the libsvm toolbox, I managed to achieve a classification rate of 95% when testing the model on pictures of my apartment and the street outside.
I did the same with urban outdoor scenes and natural scenes from the database, and achieved similar accuracy when testing on various scenes from my country.
The code I used to create the data matrices is taken from here, with very minor modifications:
% GIST Parameters:
clear param
param.imageSize = [256 256]; % set a normalized image size
param.orientationsPerScale = [8 8 8 8]; % number of orientations per scale (from HF to LF)
param.numberBlocks = 4;
param.fc_prefilt = 4;
%Obtain images from folders
sdirectory = 'C:\Documents and Settings\yotam\My Documents\Scene_Recognition\test_set\indoor&outdoor_test';
jpegfiles = dir([sdirectory '/*.jpg']);
% Pre-allocate gist:
Nfeatures = sum(param.orientationsPerScale)*param.numberBlocks^2;
gist = zeros([length(jpegfiles) Nfeatures]);
% Load first image and compute gist:
filename = [sdirectory '/' jpegfiles(1).name];
img = imresize(imread(filename),param.imageSize);
[gist(1, :), param] = LMgist(img, '', param); % first call
% Loop:
for i = 2:length(jpegfiles)
    filename = [sdirectory '/' jpegfiles(i).name];
    img = imresize(imread(filename), param.imageSize);
    gist(i, :) = LMgist(img, '', param); % the next calls will be faster
end
I suggest you use libsvm; it is very efficient. There is a relevant post on cross-validation with libsvm, and the same logic can be used for the corresponding MATLAB library you mention.
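For instance, a minimal cross-validated grid-search sketch (assuming libsvm's MATLAB interface is on the path, where its svmtrain shadows the toolbox function of the same name, and train_label/training_mat are as in the question):
bestcv = 0;
for log2c = -1:3
    for log2g = -4:1
        opts = sprintf('-t 2 -v 5 -c %g -g %g -q', 2^log2c, 2^log2g);
        cv = svmtrain(train_label, training_mat, opts);   % with -v, returns 5-fold CV accuracy
        if cv > bestcv
            bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
        end
    end
end
model = svmtrain(train_label, training_mat, sprintf('-t 2 -c %g -g %g -q', bestc, bestg));
The grid search treats the classifier as a black box, so it works the same for your 512-dimensional gist vectors as for the 2-D examples in the documentation.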
Your logic is correct: extract features and try to classify them. In any case, do not expect classifier calibration alone to make a huge difference in your results. The key to big improvements is the feature extraction, in combination of course with your classifier calibration ;).
Good luck.
Suppose I have a weight matrix W of size n x m, where m is the number of variables and n is the number of instances, and a data matrix X of the same size. I am trying to find the closest weight vector to each instance in X. However, both matrices are very high-dimensional, so plain methods are not efficient enough. I have tried a GPU trick in MATLAB, but it does not work well, since it is a sequential approach that calculates the closest weight for each instance one at a time. I am now looking for efficient one-shot code that takes all of W and X and finds the winners with some MATLAB tricks, possibly with GPU help. Can anyone suggest a code snippet in MATLAB?
This is what I wrote for the sequential approach:
x_in_d = gpuArray(x_in);                    % move the input instance to the device
W_d = gpuArray(W);                          % move the weight matrix to the device
Dx = W_d - x_in_d(ones(size(W_d,1),1), :);  % replicate the input row across all weight rows
[d_min, winner] = min(sum(Dx.^2, 2));       % squared Euclidean distance to each weight row
d_min = gather(d_min);                      % gather results back to the host
winner = gather(winner);
What do you mean by "so dimensional"? It's just an n x m matrix, right?
It would be really helpful if you could provide some sample data. Based on your description (which isn't the clearest), here is what I think your data looks like:
weights = [1 4 2
           5 3 1]
data = [2 5 1
        1 2 2]
And you want to figure out which row of weights is closest to each row of data? In this case that would be the first row of weights for both rows of data.
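If that is the task, a minimal one-shot sketch (assuming the Statistics Toolbox's pdist2, with weights and data as above):
D = pdist2(data, weights);          % pairwise Euclidean distances, one row per data instance
[d_min, winner] = min(D, [], 2);    % winner(k) is the row of weights closest to data(k,:)
% here winner == [1; 1], matching the description above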
Please edit your question to clarify what you're asking for, and consider adding some examples.
EDIT:
I like Rody's duplicate comment; if I am correct, check out: Link Here
I have 10 images (18x18). I save these images inside an array named images[324][10], where 324 is the number of pixels in an image and 10 is the total number of images I have.
I would like to use these images for a neural network; however, 324 is a big number to give as an input, so I would like to decrease it while retaining as much information as possible.
I heard that you can do this with the princomp function, which implements PCA.
The problem is that I haven't found any examples of how to use this function, especially for my case.
If I run
[COEFF, SCORE, latent] = princomp(images);
it runs fine, but how can I then get the array newimages[number_of_desired_features][10]?
PCA could be a right choice here (but not the only one). You should be aware, though, that PCA does not automatically reduce the number of input features. I recommend reading this tutorial: http://arxiv.org/pdf/1404.1100v1.pdf - it is the one I used to understand PCA, and it's really good for beginners.
Getting back to your question: an image is a vector in a 324-dimensional space. In this space, the first basis vector is the one with a white pixel in the top-left corner, the next one has the next pixel white and all the others black, and so on. This is probably not the best basis to represent the image data. PCA computes new basis vectors (the COEFF matrix - the new vectors expressed as values in the old vector space) and the new image vector values (the SCORE matrix). At that point you have not lost ANY data at all (no reduction in the number of features yet). But you can stop using some of the new basis vectors, because they are probably associated with noise rather than with the data itself. It is all described in detail in the tutorial.
images = rand(10,324);
[COEFF, SCORE] = princomp(images);
reconstructed_images = SCORE / COEFF + repmat(mean(images,1), 10, 1);
images - reconstructed_images
%as you see there are almost only zeros - the non-zero values are effects of small numerical errors
%its possible because you are only switching between the sets of base vectors used to represent the data
SCORE(:, 100:324) = 0;
%we remove the features 100 to 324, leaving only the first 99
%obviously, you could take only the non-zero part of the matrix and use it
%somewhere else, like for your neural network
reconstructed_images_with_reduced_features = SCORE / COEFF + repmat(mean(images,1), 10, 1);
images - reconstructed_images_with_reduced_features
%there are fewer features, but the reconstruction is still pretty good
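To get the newimages array from the question, a minimal follow-up sketch (assuming you keep the first 99 components, as above):
newimages = SCORE(:, 1:99)';   % number_of_desired_features x 10, ready for the neural network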