Split an uncompressed video into segments with ffmpeg on MatLab? - matlab

I have a video sequence (Y4M format) and I want to split it into several segments with the same GoP size.
GoP = 8;
How can I do that in MATLAB using FFmpeg?

One standard way of representing video in Matlab is a 4D matrix. The dimensions are height x width x color channels x frames. Once you have the matrix, it is easy to take time slices by specifying the range of frames that you would like.
For example, you can grab 8 frames at a time in a for-loop:
% Load the video as a 4D matrix
v = VideoReader('xylophone.mp4');
video = []; % start empty so that cat has something to append to
while hasFrame(v)
    video = cat(4, video, readFrame(v));
end
% iterate over the movie in steps of 8 frames (complete groups only)
for i=1:8:size(video, 4)-7
    video_slice = video(:,:,:,i:i+7); % get the next 8 frames
    % do something with the 8 frames here
    % each frame is a slice across the 4th dimension
    frame1 = video_slice(:,:,:,1);
end
%play movie
implay(video)
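If the leftover frames at the end can be dropped, the same grouping can also be done without a loop by reshaping the 4D array into a 5D array whose last dimension indexes the group. This is only a sketch on synthetic data; the sizes and the names gop, nGops and gops are illustrative assumptions, not part of the answer above.

```matlab
gop = 8;                                        % frames per group
video = randi([0 255], 4, 6, 3, 20, 'uint8');   % synthetic stand-in: 20 RGB frames of 4x6 pixels
[h, w, c, n] = size(video);
nGops = floor(n / gop);                         % number of complete groups (2 here)
trimmed = video(:, :, :, 1:nGops*gop);          % drop the incomplete tail (4 frames here)
gops = reshape(trimmed, h, w, c, gop, nGops);   % gops(:,:,:,:,k) is the k-th group
second = gops(:, :, :, :, 2);                   % same frames as video(:,:,:,9:16)
```

Because MATLAB stores arrays column-major, the reshape keeps consecutive frames together, so no data is reordered.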
The other most common way of representing video is in a structure array. You can index a structure array with a range of values to slice 8 frames. The actual frame values in my example are stored in the structure element cdata. Depending on your structure the element might have a different name; look for an element with a 3d matrix value.
% Loads video as structure
load mri
video = immovie(D,map);
% iterate over the movie in steps of 8 frames (complete groups only)
for i=1:8:numel(video)-7
    video_slice = video(i:i+7); % get the next 8 frames
    % do something with the 8 frames here
    % to access the frame values use cdata
    frame1 = video_slice(1).cdata;
end
%play movie
implay(video)
The tricky part is your video format. Y4M is not supported by Matlab's VideoReader which is the most common way to load video. It is also not supported by the FFmpeg Toolbox which only provides a few media formats (MP3, AAC, mpeg4, x264, animated GIF).
There are a couple other questions that look for solutions to this problem including
how to read y4m video(get the frames) file in matlab
How to read yuv videos in matlab?
I would also check on the Matlab File Exchange, but I don't have personal experience with any of these methods.
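For what it is worth, a Y4M file is simple enough that a small reader can be sketched by hand: one text header line, then for every frame a line starting with FRAME followed by the raw planes. The sketch below is untested against real-world files and assumes 8-bit 4:2:0 video with no interlacing; it reads only the luma plane and then cuts it into GoPs of 8. It first writes a tiny synthetic file so that it runs as-is. All variable names are hypothetical.

```matlab
% --- Write a tiny synthetic Y4M file so the sketch is runnable as-is ---
w = 4; h = 2; nFrames = 20;                      % illustrative sizes
fname = [tempname '.y4m'];
fid = fopen(fname, 'w');
fprintf(fid, 'YUV4MPEG2 W%d H%d F25:1 Ip A1:1 C420\n', w, h);
for k = 1:nFrames
    fprintf(fid, 'FRAME\n');
    fwrite(fid, repmat(uint8(k), w*h, 1), 'uint8');               % Y plane, all pixels = k
    fwrite(fid, repmat(uint8(128), 2*(w/2)*(h/2), 1), 'uint8');   % U and V planes
end
fclose(fid);

% --- Read the luma plane of every frame (assumes 8-bit 4:2:0) ---
fid = fopen(fname, 'r');
header = fgetl(fid);                             % text header line
w = str2double(regexp(header, '(?<=\sW)\d+', 'match', 'once'));
h = str2double(regexp(header, '(?<=\sH)\d+', 'match', 'once'));
chromaBytes = 2 * ceil(w/2) * ceil(h/2);         % U + V bytes per frame
luma = zeros(h, w, 0, 'uint8');
while true
    marker = fgetl(fid);                         % 'FRAME' line, or -1 at end of file
    if ~ischar(marker) || ~strncmp(marker, 'FRAME', 5), break; end
    y = fread(fid, [w h], 'uint8=>uint8');       % rows of the image arrive as columns
    if numel(y) < w*h, break; end                % truncated file
    fseek(fid, chromaBytes, 'cof');              % skip the chroma planes
    luma(:, :, end+1) = y.';
end
fclose(fid);
delete(fname);

% --- Split into segments of 8 frames; the last one may be shorter ---
gop = 8;
starts = 1:gop:size(luma, 3);
segments = cell(1, numel(starts));
for s = 1:numel(starts)
    last = min(starts(s) + gop - 1, size(luma, 3));
    segments{s} = luma(:, :, starts(s):last);
end
```

Keeping the segments in a cell array allows a shorter final segment; drop the last cell if only complete GoPs are wanted.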

Related

how to get feature of png image using 2d dct?

Could I use the DCT to extract features from .png images?
Or is the DCT just for JPG? My dataset uses the PNG format.
I've read several journals and found that the 2D DCT can be used to extract features based on its coefficients. I need the features for a neural network.
I've tried basic code to do a 2D DCT (using MATLAB):
i = imread('AB1.png');
b = im2double(i);
d = dct2(b, [64 64]);
But I am still not sure that this code really gives me the appropriate features. Do you have any recommendation for other code?
Also, why does the 'dctmtx' function give me the same coefficients for different images?
Thanks in advance.
First, the PNG format does not matter. As long as you are not doing any alpha-channel processing, reading a PNG is just like reading a JPG, since you compute the DCT on the matrix representation of the image rather than on the file.
Your code:
d = dct2 (b, [64 64]);
should give you the 2D DCT of the image zero-padded (or cropped) to 64 by 64.
To check, you could try something like:
d = dct(dct(b.').') %// zero-pad b to 64x64 first if you want to match the call above
since dct2 is implemented using dct at its core.
As for dctmtx, it gives you the DCT matrix, which you can apply to your image matrix to obtain the DCT of your image. It depends only on the size you pass in, not on the pixel values, so the result of dctmtx is the same for all images of the same size. MATLAB gives a clear example:
A = im2double(imread('rice.png')); %// your image (square, 256x256)
D = dctmtx(size(A,1)); %// DCT matrix matching the size of your image
dctA = D*A*D'; %// 2D DCT (named dctA to avoid shadowing the dct function)
figure, imshow(dctA) %// resulting transform
All three examples should give you the same result.
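A quick way to convince yourself is to run the three variants on the same zero-padded input and compare them numerically. The following is a sketch on a synthetic image; it assumes dct2 and dctmtx (Image Processing Toolbox) and dct are available, and the variable names are made up for illustration.

```matlab
b = rand(48, 60);                          % synthetic grayscale image in [0, 1]
padded = zeros(64, 64);
padded(1:size(b,1), 1:size(b,2)) = b;      % zero-pad at the bottom and right

d1 = dct2(b, [64 64]);                     % dct2 pads internally
d2 = dct(dct(padded.').');                 % 1D DCT along rows, then along columns
D  = dctmtx(64);                           % 64x64 DCT matrix, independent of the image
d3 = D * padded * D';                      % matrix form of the 2D DCT

maxDiff = max([max(abs(d1(:) - d2(:))), max(abs(d1(:) - d3(:)))]);
```

maxDiff should be at the level of floating-point round-off.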
And finally, in terms of best feature extraction algorithm/transformation, it really depends on what you are trying to achieve - recognition/enhancement/encryption but in general, DCT is very good and efficient for regular images.

How to format features for SVM for human recognition?

I am using the Eigenjoints of skeleton features to perform human action recognition by Matlab.
I have 320 videos, so the training data is 320x1 cell array, each one cell contains Nx2970 double array, where N is number of frames (it is variable because each video contains different number of frames), 2970 is number of features extracted from each video (it is constant because I am using same extraction method for all videos).
How can I format the training data into a 2d double matrix to use as input for an SVM? I don't know how to do it because SVM requires double matrix, and the information I have is one matrix for each video of different sizes.
Your question is a bit unclear about how you want to go about classifying human motion from your video. You have two options:
1. Look at each frame in isolation. This would classify each frame separately. Basically, it would be a pose classifier.
2. Build a new feature that treats the data as a time series. This would classify each video clip.
Single Frame Classification
For the first option, the solution to your problem is simple. You simply concatenate all the frames into one big matrix.
Let me give a toy example. I've made X_cell, a cell array with a video with 2 frames and a video with 3 frames. In your question, you don't specify where you get your ground truth labels from. I'm going to assume that you have per video labels stored in a vector video_labels
X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]};
video_labels = [1, 0];
One simple way to concatenate these is to use a for loop,
X = [];
Y = [];
for ii = 1:length(X_cell)
    X = [X; X_cell{ii}];
    % append one label per frame of this video (Y stays a column vector)
    Y = [Y; repmat(video_labels(ii), size(X_cell{ii},1), 1)];
end
There is probably also a more efficient solution. You could think about vectorizing this code if you need to improve speed.
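One possible vectorized variant, sketched on the same toy data: vertcat stacks all frames at once and repelem repeats each video label once per frame. This assumes a MATLAB release that has repelem (R2015a or later); framesPerVideo is a name introduced here for illustration.

```matlab
X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]};
video_labels = [1, 0];

X = vertcat(X_cell{:});                               % stack all frames: 5x3
framesPerVideo = cellfun(@(v) size(v, 1), X_cell);    % [2 3]
Y = repelem(video_labels(:), framesPerVideo(:));      % one label per frame: 5x1
```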
Whole Video Classification
Time series features are a course topic all in themselves. Here the simplest thing you could do is simply resize all the video clips to have the same length using imresize. Then vectorize the resulting matrix. This will create a very long, redundant feature.
num_frames = 10; % the desired video length
length_frame_feature = size(X_cell{1}, 2); % features per frame (3 in the toy example)
num_videos = length(X_cell);
X = zeros(num_videos, length_frame_feature*num_frames);
for ii=1:length(X_cell)
    % resize along time only; the feature dimension keeps its size
    video_feature = imresize(X_cell{ii}, [num_frames, length_frame_feature]);
    X(ii, :) = video_feature(:);
end
Y = video_labels;
For more sophisticated techniques, take a look at spectrograms.
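If the Image Processing Toolbox is not available, the length normalization in the whole-video approach can also be sketched with plain linear interpolation along the time axis. Note that this swaps imresize for interp1, so the numbers will differ slightly from the code above; it also assumes every clip has at least two frames. The names t_old, t_new and num_features are illustrative.

```matlab
X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]};
num_frames = 10;                                     % desired number of frames per clip
num_features = size(X_cell{1}, 2);
X = zeros(numel(X_cell), num_frames * num_features);
for ii = 1:numel(X_cell)
    n = size(X_cell{ii}, 1);                         % original number of frames (>= 2)
    t_old = linspace(0, 1, n);                       % normalized time of the original frames
    t_new = linspace(0, 1, num_frames);              % normalized time of the resampled frames
    resampled = interp1(t_old, X_cell{ii}, t_new);   % num_frames x num_features
    X(ii, :) = resampled(:).';                       % one long row vector per clip
end
```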

Matlab : 256 binary matrices to one 256 levels grayscale image

a process of mine produces 256 binary (logical) matrices, one for each level of a grayscale source image.
Here is the code :
so = imread('bio_sd.bmp');
co = rgb2gray(so);
for l = 1:256
bw = (co == l); % Binary image from level l of original image
be = ordfilt2(bw, 1, ones(3, 3)); % Convolution filter
bl(int16(l)) = {bwlabel(be, 8)}; % Component labelling
end
I obtain a cell array of 256 binary images. Each binary image contains 1s where the source-image pixel at that location has the same level as the index of the binary image.
i.e. the binary image bl{12} contains 1s where the source image has pixels with level 12.
I'd like to create a new image by combining the 256 binary matrices back into a grayscale image.
But I'm very new to MATLAB and I wonder if someone can help me code it :)
PS: I'm using MATLAB R2010a student edition.
This whole answer only applies to the original form of the question.
Let's assume you can get all your binary matrices together into a big n-by-m-by-256 matrix binaryimage(x,y,greyvalue). Then you can calculate your final image as
newimage=sum(bsxfun(@times,binaryimage,reshape(0:255,1,1,[])),3)
The magic here is done by bsxfun, which multiplies the 3D (n x m x 256) binaryimage with the 1 x 1 x 256 vector containing the grey values 0...255. This produces a 3D image where for fixed x and y, the vector (y,x,:) contains many zeros and (for the one grey value G where the binary image contained a 1) it contains the value G. So now you only need to sum over this third dimension to get a n x m image.
Update
To test that this works correctly, let's go the other way first:
fullimage=floor(rand(100,200)*256);
figure;imshow(fullimage,[0 255]);
is a random greyscale image. You can calculate the 256 binary matrices like this:
binaryimage=false([size(fullimage) 256]);
for i=1:size(fullimage,1)
for j=1:size(fullimage,2)
binaryimage(i,j,fullimage(i,j)+1)=true;
end
end
We can now apply the solution I gave above
newimage=sum(bsxfun(@times,binaryimage,reshape(0:255,1,1,[])),3);
and verify that it returns the original image:
all(newimage(:)==fullimage(:))
which gives 1 (true) :-).
Update 2
You now mention that your binary images are in a cell array, I assume binimg{1:256}, with each cell containing an n x m binary array. If you can, it probably makes sense to change the code that produces this data so that it creates the 3D binary array I use above. Cells are mostly useful if different cells contain data of different types, shapes or sizes.
If there are good reasons to stick with a cell array, you can convert it to a 3D array using
binaryimage = reshape(cell2mat(reshape(binimg,1,256)),n,m,256);
with n and m as used above. The inner reshape is not necessary if you already have size(binimg)==[1 256]. So to sum it up, you need to use your cell array binimg to calculate the 3D matrix binaryimage, which you can then use to calculate the newimage that you are interested in using the code at the very beginning of my answer.
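As a side note, concatenating along the third dimension with cat should give the same 3D array without needing n and m explicitly. A small sketch on synthetic data (the sizes and the names viaReshape and viaCat are illustrative):

```matlab
n = 3; m = 4; nLevels = 256;
binimg = cell(1, nLevels);
for k = 1:nLevels
    binimg{k} = rand(n, m) > 0.5;          % synthetic logical n x m matrix
end

viaReshape = reshape(cell2mat(reshape(binimg, 1, nLevels)), n, m, nLevels);
viaCat = cat(3, binimg{:});                % stack the cells along dimension 3
same = isequal(viaReshape, viaCat);
```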
Hope this helps...
What your code does...
I thought it may be best to first go through what the code you posted is actually doing, since there are a couple of inconsistencies. I'll go through each line in your loop:
bw = (co == l);
This simply creates a binary matrix bw with ones where your grayscale image co has a pixel intensity equal to the loop value l. I notice that you loop from 1 to 256, and this strikes me as odd. Typically, images loaded into MATLAB will be an unsigned 8-bit integer type, meaning that the grayscale values will span the range 0 to 255. In such a case, the last binary matrix bw that you compute when l = 256 will always contain all zeroes. Also, you don't do any processing for pixels with a grayscale level of 0. From your subsequent processing, I'm guessing you purposefully want to ignore grayscale values of 0, in which case you probably only need to loop from 1 to 255.
be = ordfilt2(bw, 1, ones(3, 3));
What you are essentially doing here with ORDFILT2 is performing a binary erosion operation. Any values of 1 in bw that have a 0 as one of their 8 neighbors will be set to 0, causing islands of ones to erode (i.e. shrink in size). Small islands of ones will disappear, leaving only the larger clusters of contiguous pixels with the same grayscale level.
bl(int16(l)) = {bwlabel(be, 8)};
Here's where you may be having some misunderstandings. Firstly, the matrices in bl are not logical matrices. In your example, the function BWLABEL will find clusters of 8-connected ones. The first cluster found will have its elements labeled as 1 in the output image, the second cluster found will have its elements labeled as 2, etc. The matrices will therefore contain positive integer values, with 0 representing the background.
Secondly, are you going to use these labeled clusters for anything? There may be further processing you do for which you need to identify separate clusters at a given grayscale intensity level, but with regard to creating a grayscale image from the elements in bl, the specific label value is unnecessary. You only need to identify zero versus non-zero values, so if you aren't using bl for anything else I would suggest that you just save the individual values of be in a cell array and use them to recreate a grayscale image.
Now, onto the answer...
A very simple solution is to concatenate your cell array of images into a 3-D matrix using the function CAT, then use the function MAX to find the indices where the non-zero values occur along the third dimension (which corresponds to the grayscale value from the original image). For a given pixel, if there is no non-zero value found along the third dimension (i.e. it is all zeroes) then we can assume the pixel value should be 0. However, the index for that pixel returned by MAX will default to 1, so you have to use the maximum value as a logical index to set the pixel to 0:
[maxValue,grayImage] = max(cat(3,bl{:}),[],3);
grayImage(~maxValue) = 0;
Note that for the purposes of displaying or saving the image you may want to change the type of the resulting image grayImage to an unsigned 8-bit integer type, like so:
grayImage = uint8(grayImage);
The simplest solution would be to iterate through each of your logical matrices in turn, multiply it by its corresponding weight, and accumulate into an output matrix which will represent your final image.
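A minimal sketch of that loop, assuming (as in the question's code) that bl{l} belongs to grey level l, so that l is the weight. The synthetic bl below skips the erosion and labelling steps so that the result can be compared with the source image; with the real, eroded data the reconstruction will have zeros wherever pixels were eroded away.

```matlab
co = uint8(randi([0 255], 5, 7));              % synthetic grayscale image
bl = cell(1, 256);
for l = 1:256
    bl{l} = double(co == l);                   % stand-in for the eroded/labelled matrices
end

grayImage = zeros(size(co));
for l = 1:numel(bl)
    % any non-zero entry marks a pixel of level l; weight it by l and accumulate
    grayImage = grayImage + l * (bl{l} > 0);
end
grayImage = uint8(grayImage);                  % pixels found in no matrix stay 0
```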

How do I import an .AVI movie into 3d matrix in MATLAB

I am trying to write a function that imports an .AVI file and returns a 3D matrix in MATLAB.
Ultimately, this is so I can perform an fftn on the 3d matrix.
I would use the VIDEOREADER class.
% this is basically for grayscale video
function video = video3d(filename)
% filename: path to the AVI file, e.g. 'carwide.avi'
carobj = VideoReader(filename); % mmreader in older MATLAB releases
nFrames = carobj.NumberOfFrames;
M = carobj.Height; % number of rows
N = carobj.Width; % number of columns
video = zeros(M,N,nFrames,'uint8'); % preallocate the 3D video matrix
for k = 1:nFrames
    im = read(carobj,k);
    im = im(:,:,1); % for grayscale video all three layers hold the same image
    video(:,:,k) = im;
end
end
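Once the frames are in a rows x columns x frames matrix, the transform mentioned in the question is a single call. The sketch below uses a synthetic array in place of a real video and converts to double first so that the arithmetic is not done on 8-bit integers; the variable names are illustrative.

```matlab
video = uint8(randi([0 255], 4, 5, 6));    % synthetic stand-in: 6 frames of 4x5 pixels
F = fftn(double(video));                   % 3D FFT over rows, columns and time
dc = F(1, 1, 1);                           % DC term: equals the sum of all samples
```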

How do i play back a sampled audio file at the same speed as the original?

Question is as stated in the title.
After I decimate an audio signal by keeping only every nth point, the audio clip is sped up by a factor of n. I want the decimated and original clips to have the same length in time.
Here's my code, analyzing and decimating piano.wav:
[piano,fs]=wavread('piano.wav'); % loads piano
play=piano(:,1); % Renames the file as "play"
time = length(play)/fs; % clip duration in seconds
t = linspace(0,time,length(play)); % Time vector
x = play;
y = decimate(x,2);
stem(x(1:30)), axis([0 30 -2 2]) % Original signal
title('Original Signal')
figure
stem(y(1:30)) % Decimated signal
title('Decimated Signal')
%changes the sampling rate
fs1 = fs/2;
fs2 = fs/3;
fs3 = fs/4;
fs4 = fs*2;
fs5 = fs*3;
fs6 = fs*4;
wavwrite(y,fs,'PianoDecimation');
Possible solution: double each of the remaining points, since the new decimated clip is 2x shorter than the original.
I just want to be able to have a side-by-side comparison of the 2 clips.
here is the audio file: http://www.4shared.com/audio/11xvNmkd/piano.html
Although @sage's answer has a lot of good information, I think the answer to the question is as simple as changing your last line to:
wavwrite(y,fs/2,'PianoDecimation')
You have removed half the samples in the file, so in order to play it back over the same time period as the original, you need to set the playback frequency to half as many samples per second.
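The arithmetic behind this can be checked on a synthetic tone: the duration is the number of samples divided by the sample rate, so halving both leaves it unchanged. This sketch keeps every second sample by plain indexing instead of calling decimate (so there is no anti-alias filter), purely to keep it toolbox-free; the variable names are illustrative.

```matlab
fs = 8000;                               % original sample rate in Hz
t = (0:fs-1).' / fs;                     % one second of samples
x = sin(2*pi*440*t);                     % synthetic tone standing in for the piano clip
n = 2;                                   % decimation factor
y = x(1:n:end);                          % keep every n-th sample
fsNew = fs / n;                          % sample rate that matches the decimated signal

durOriginal  = numel(x) / fs;            % 1 s
durDecimated = numel(y) / fsNew;         % 1 s, same length in time
durWrongRate = numel(y) / fs;            % 0.5 s, plays twice as fast
```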
Are you using wavplay, audioplayer, or something else to play the decimated signals? Are you explicitly specifying the sample frequencies?
The functions take the sample frequency as one of the parameters (the second parameter). You are decreasing the sample frequency as you decimate, so you need to update that parameter accordingly.
Also, when you are plotting the data, you should:
1. plot N times as many points of the original data (when decimating by N)
2. provide a corresponding x-axis input. I recommend t = (1/Fs:1/Fs:maxT), where maxT is the maximum time you want to plot; this addresses #1 if you use the updated Fs, which results in larger time steps (and make sure to transpose t if it does not match your signal)
I have added an example that plays chirp and decimated chirp (this chirp is part of the standard MATLAB install). I amplified the decimated version. The tic and toc show that the elapsed time is equivalent (within variations in processor loading, etc.) - note that this also works for decim = 3, etc:
load chirp
inWav = y;
inFs = Fs;
decim = 2;
outWav = decimate(inWav,decim);
outFs = inFs/decim;
tic, wavplay(inWav,inFs),toc
pause(0.2)
tic,wavplay(outWav*decim^2,outFs),toc
The function 'decimate' really messes up the chirp sound (whose sample rate is not very high to begin with), but perhaps you are trying to show something like this...