Keypoint matching between two images: optical flow vs. descriptor matching

I have two pictures, A and B.
Plan A:
I detect SIFT keypoints in picture A
and use optical flow to find the corresponding points in picture B.
Plan B:
I also detect SIFT keypoints in picture B
and match the keypoints between picture A and picture B.
What are the differences between Plan A and Plan B?
Which one is better?

Plan A is motion estimation of the extracted points. Many of these algorithms work by comparing the intensity values in the neighborhood of each point. Lucas-Kanade is one good method.
Plan B may not find all of the points that are in image A.
Plan B is better. Optical flow does not use SIFT descriptors, so the excellent way they describe a neighborhood is lost in Plan A.
In Plan B: many of the keypoints in A will not be matched in B, but those that are matched will have much higher confidence than in Plan A.
In Plan A: approximate matches for the points in A will be found.
If A and B are consecutive frames from a video, it pays to use Plan A: the illumination does not change much between consecutive frames, and Plan B has a higher runtime (computing descriptors takes time). Otherwise, use Plan B.
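To illustrate why Plan B matches fewer points but with higher confidence, here is a minimal sketch in plain Python (not Matlab or OpenCV). It assumes descriptors are plain tuples of numbers and uses a nearest-neighbour ratio test, which is a common filtering choice but is an assumption here, not something the answers above prescribe. The function name match_descriptors, the ratio value and the toy descriptors are all hypothetical.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every descriptor of image B, sorted.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if the best candidate clearly beats the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy 3-D "descriptors" (real SIFT descriptors have 128 dimensions).
desc_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5)]
desc_b = [(0.9, 0.1, 0.0), (0.0, 0.0, 1.0), (0.1, 0.9, 0.0)]
matches = match_descriptors(desc_a, desc_b)
print(matches)
```

The third descriptor in A is equally close to two descriptors in B, so it is dropped as ambiguous. That is the trade-off described above: fewer matches, but the surviving ones are more trustworthy.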

Related

Lucas-Kanade optical flow: Understanding the math

I found a Matlab implementation of the LKT algorithm here and it is based on the brightness constancy equation.
The algorithm calculates the Image gradients in x and y direction by convolving the image with appropriate 2x2 horizontal and vertical edge gradient operators.
The brightness constancy equation in the classic literature has on its right hand side the difference between two successive frames.
However, in the implementation referred to by the aforementioned link, the right hand side is the difference of convolution.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
It_m = im1 - im2;
As you mentioned, in theory only the pixel-by-pixel difference is required for the optical flow computation.
However, in practice, all natural (not synthetic) images contain some degree of noise. Differentiation, on the other hand, acts as a kind of high-pass filter and amplifies the noise relative to the signal.
Therefore, to avoid artifacts caused by noise, image smoothing (low-pass filtering) is usually carried out before any image differentiation (the same is done in edge detection). The code does exactly this, i.e. it applies a moving-average filter to the images to reduce the effect of noise.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
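Because convolution is linear, the line above is equivalent to smoothing the plain difference im1 - im2 with the same 2x2 kernel. A minimal sketch in plain Python (rather than Matlab) that checks this on made-up 3x3 images follows. The helper box_sum_2x2 is hypothetical and keeps only the 'valid' region, whereas conv2 without a shape argument returns the larger 'full' result.

```python
def box_sum_2x2(img):
    """Sum over every 2x2 window (the 'valid' region of a [1,1;1,1] convolution)."""
    rows, cols = len(img), len(img[0])
    return [[img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]
             for c in range(cols - 1)]
            for r in range(rows - 1)]

def subtract(a, b):
    """Element-wise difference of two equally sized images."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Made-up frames; the value 40 in im1 plays the role of a noisy pixel.
im1 = [[10, 12, 11], [13, 40, 12], [11, 12, 10]]
im2 = [[10, 11, 11], [12, 12, 12], [11, 12, 11]]

plain = subtract(im1, im2)                                  # im1 - im2
smoothed = subtract(box_sum_2x2(im1), box_sum_2x2(im2))     # difference of convolutions

# Smoothing each frame and then subtracting equals smoothing the difference.
assert smoothed == box_sum_2x2(plain)
print(plain)
print(smoothed)
```

Note that the kernel sums to 4, so the result is four times a 2x2 moving average of the frame difference.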
(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
Im_t = im1-im2;
to compute the time derivative. Using a spatial smoother when computing the time derivative mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivative with a similar kernel and the 'valid' option ensures that the spatial derivative matrices (Ix_m and its y counterpart) and the time derivative matrix have compatible sizes.
The temporal partial derivative (along t) is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a volume, spatio-temporal volume. At any given point (x,y,t), if you want to estimate partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having 3 filters that have the same kernel support.
For more theory on why this should be so, look up the topic steerable filters, or better yet look up the fundamental concept of what partial derivative is supposed to be, and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and people then tend to treat the temporal derivative as a separate quantity, independent of the x and y components. This can, and very often does, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and a backward flow estimation and combine the results at the end.
One way to think of the gradient that you are estimating is that it has a support region that is 3D. The smallest size of such a region should be 2x2x2.
If you compute the 2D gradients in both the first and the second image using only 2x2 filters, then the corresponding FIR filter for the 3D volume is obtained by averaging the results of the two filters.
The fact that you should have the same filter support region in 2D is clear to most: that's why the Sobel and Scharr operators look the way they do.
You can see the sort of results you get from having sanely designed differential operators for optical flow in this Matlab toolbox that I made, in part to show this particular point.
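The 2x2x2 support region described above can be sketched as follows, in plain Python rather than Matlab. This is an illustrative assumption of how such an estimate may look, not the code of the toolbox mentioned: each derivative is the mean of the four first differences across the cube, so all three share the same support. The function gradient_2x2x2 and the synthetic ramp image are hypothetical, and the sign convention here is It = frame2 - frame1.

```python
def gradient_2x2x2(f1, f2, r, c):
    """Estimate (Ix, Iy, It) from the 2x2x2 cube whose corner is pixel (r, c)."""
    # Mean of the four differences along x (columns), over both frames.
    ix = (f1[r][c + 1] - f1[r][c] + f1[r + 1][c + 1] - f1[r + 1][c]
          + f2[r][c + 1] - f2[r][c] + f2[r + 1][c + 1] - f2[r + 1][c]) / 4.0
    # Mean of the four differences along y (rows), over both frames.
    iy = (f1[r + 1][c] - f1[r][c] + f1[r + 1][c + 1] - f1[r][c + 1]
          + f2[r + 1][c] - f2[r][c] + f2[r + 1][c + 1] - f2[r][c + 1]) / 4.0
    # Mean of the four differences along t, over the 2x2 spatial window.
    it = (f2[r][c] - f1[r][c] + f2[r][c + 1] - f1[r][c + 1]
          + f2[r + 1][c] - f1[r + 1][c] + f2[r + 1][c + 1] - f1[r + 1][c + 1]) / 4.0
    return ix, iy, it

def ramp(u, v, size=4):
    """Synthetic image I(x, y) = 2x + 3y, translated by (u, v) pixels."""
    return [[2.0 * (x - u) + 3.0 * (y - v) for x in range(size)] for y in range(size)]

u, v = 0.5, 0.25                      # known motion between the frames
frame1, frame2 = ramp(0.0, 0.0), ramp(u, v)
ix, iy, it = gradient_2x2x2(frame1, frame2, 1, 1)

# Brightness constancy: Ix*u + Iy*v + It should vanish for the true motion.
residual = ix * u + iy * v + it
print(ix, iy, it, residual)
```

On this linear ramp the estimates are exact, so the brightness constancy residual is zero for the true motion.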

take center of mass or average for matched features

I have an application for tracking, and the player object is as the following photo shows. I need to do the following:
1- detect features in each frame and match them with the next frame (I use SURF)
2- calculate the average point of the feature points found in step 1
3- calculate the distance between the average points from step 2 for each pair of consecutive frames.
I am able to save the locations of the matched features,
surfPoints.Location
but I still don't know the best way to get the center of mass of these points, or to take their average.
Also, how do I filter out the mismatched points? I see that there is a function estimateGeometricTransform, but this function removes many points from the matched ones!
Is there any good approach for that?
So let me sum up:
You have two keypoint arrays, and a matching function that gives you the indices of matches in both lists ("keypoint 7 in the original list matches keypoint 12 in the second").
So now your question is how to estimate the global shift from these local displacements, taking outliers into account?
In that case (fitting a model in the presence of outliers) you should really look into RANSAC (and the eternally funny RANSAC song).
Although the algorithm works great, it is non-deterministic (it involves trying out models based on random samples and counting the outliers for each).
I'll let you do the reading on RANSAC's theory (simple statistics); now let's see how to use RANSAC in your case:
Your problem is thus: given a list of 2D vectors, find the 2D vector that minimizes the number of "outliers".
The model fitting step is then just picking a vector out of the list of vectors.
Outliers are vectors that go "CRAZY WRONG" in direction or norm.
Also, RANSAC explained by Mathworks
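The recipe above might look roughly like this in plain Python (not Matlab). It is a sketch under assumptions: the model is a single 2D translation, an inlier is any displacement within tol pixels of the sampled one, and the final shift is the mean of the inliers. The function ransac_shift, the tolerance, the iteration count and the point coordinates are all hypothetical.

```python
import math
import random

def ransac_shift(displacements, tol=2.0, iterations=50, seed=0):
    """Find the displacement that the largest number of matches agree with."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    best_inliers = []
    for _ in range(iterations):
        candidate = rng.choice(displacements)      # model = one sampled vector
        inliers = [d for d in displacements if math.dist(d, candidate) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the model: average the inlier displacements.
    n = len(best_inliers)
    shift = (sum(d[0] for d in best_inliers) / n,
             sum(d[1] for d in best_inliers) / n)
    return shift, best_inliers

# Matched keypoint locations in two frames (made-up values); the last match is wrong.
pts1 = [(10.0, 10.0), (20.0, 15.0), (30.0, 12.0), (25.0, 30.0), (40.0, 40.0)]
pts2 = [(13.0, 11.0), (23.0, 16.0), (33.0, 13.0), (28.0, 31.0), (5.0, 90.0)]
displacements = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts1, pts2)]

shift, inliers = ransac_shift(displacements)
distance = math.hypot(*shift)   # how far the average point moved between the frames
print(shift, len(inliers), distance)
```

The mean inlier displacement equals the displacement of the inliers' average point, so its length is the frame-to-frame distance asked for in step 3 of the question.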
The difficulty here is that you have non-rigid motion. estimateGeometricTransform is great when the motion can be described by an affine or a projective transformation. However, because you are tracking a complex articulated object, like a person, the motion is much more complicated. This is why estimateGeometricTransform rejects a lot of matches as outliers.
There are several things you can try. One is to try using vision.PointTracker to track the points. It uses the KLT (Kanade-Lucas-Tomasi) algorithm.
Alternatively, if your camera is stationary, you can try using vision.ForegroundDetector, which implements background subtraction. It will give you a binary mask showing all moving objects.

Classifying hand gestures using HMMs in Matlab

I'm currently working on a project where I should classify hand gestures. Many papers propose HMMs as the way to do so, but the tutorials I find all use either a weather example or a dice-and-coin example, and I can't see how to map these to my problem or what my different matrices should be. I currently have a feature vector (the detected features of the hands as an n*2 matrix, where n is the total number of features detected in all the frames, i.e. if the algorithm detected 10 features in each frame and the video has 10 frames, n would be 100, and 2 is for the x and y coordinates) and a motion vector (the motion of the hand itself in the video, of size m*2, where m is the number of frames in the video). I would also welcome suggestions for any other data you would recommend extracting from the video.
I know the papers you are talking about, and the examples about the weather are simplistic and cannot be mapped to most of the problems now processed with HMMs. In your case, you have features corresponding to hand gestures that you know. An HMM can work because the data you have is dynamic, i.e. ordered in time.
My advice is that you should first have a look at the widely used HMM toolbox by Kevin Murphy. It provides all the tools you need to start working with HMMs.
The main idea is to model each gesture type with one dedicated HMM. For a given gesture type, the corresponding HMM will be trained with the available features that you have.
Once trained, you get a state transition probability matrix, an emission probability matrix and a prior for selecting the initial state.
When you have an unknown gesture, you then compute the likelihood that this gesture (its features, actually) could have been generated by each of the trained HMMs. Usually, the query sequence is assigned to the category of the HMM that gives the highest score.
That is the big picture. In your case, you will have to find a way to represent your features as a time series, the "time" being the different frames. With a complex application such as hand gestures, it might be difficult to see what each state of the model represents. Some kinds of HMM, by their topology (left-to-right models for instance), make this analogy easier.
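The classification step described above (one HMM per gesture, pick the model with the highest likelihood) can be sketched in plain Python, independently of any toolbox. Everything specific here is assumed for illustration: the two gesture names, the hand-written (not trained) matrices, and the idea that each frame has already been quantized into one of two discrete symbols. Real x, y features would need quantization or continuous emission densities.

```python
def forward_likelihood(obs, prior, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n_states = len(prior)
    # Initialisation: probability of starting in each state and emitting obs[0].
    alpha = [prior[s] * emit[s][obs[0]] for s in range(n_states)]
    # Induction: propagate through the transition matrix, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
                 for s in range(n_states)]
    return sum(alpha)

# Two hypothetical 2-state left-to-right models over 2 symbols
# (0 = "hand moving left", 1 = "hand moving right").
models = {
    "left_then_right": {
        "prior": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit": [[0.9, 0.1], [0.1, 0.9]],
    },
    "right_then_left": {
        "prior": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit": [[0.1, 0.9], [0.9, 0.1]],
    },
}

query = [0, 0, 0, 1, 1]   # one symbol per frame of the unknown gesture
scores = {name: forward_likelihood(query, **m) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

For long sequences the raw probabilities underflow, so practical implementations work with scaled or log likelihoods. The plain version is kept here only for readability.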

Pattern recognition in Neural Network using matlab simulation

I am new to neural networks in Matlab. I want to create a neural network using a Matlab simulation.
The simulation is meant to do pattern recognition.
I am running on a Windows XP platform.
For example, I have a set of waveforms from a circular shape.
I have extracted the poles.
These poles will teach my neural network that the shape is circular, so that whenever I input another, slightly different, circular-shape waveform, the neural network is able to recognize the shape.
Currently, I have extracted the poles of these 3 shapes: cylinder, circle and rectangle.
But I am clueless about how I should go about creating my neural network.
I'd recommend using a SOM (self-organizing map) for pattern recognition since it's really robust. There's also a SOM Toolbox for Matlab you might be interested in. However, to make it learn waves while ignoring their offsets, you'd need to make some changes to the "similarity function". These changes will increase the SOM's training time quite a lot, but if that's not a problem, keep reading.
For the SOM you'll have to sample your waves into constant-sized vectors, let's say:
sin x -> sin_vector = (a1, a2, a3, ..., aN)
cos x -> cos_vector = (b1, b2, b3, ..., bN)
Usually the similarity of "SOM-vectors" is calculated with the Euclidean distance. The Euclidean distance between those two vectors is huge since they have a different offset. In your case they should be considered similar, i.e. the distance should be small. So.. if you don't sample all the similar waves from the same starting point, they will be classified into different classes. That is probably a problem. But! The similarity of vectors in a SOM is calculated in order to find the BMU (best-matching unit) in the map and to pull the vectors of the BMU and its neighborhood towards the values of the given sample. So all you need to change is the way those vectors are compared and the way the vectors' values are pulled towards the sample, so that both are "offset-tolerant".
A slow but working solution is to first find the best offset index for each vector. The best offset index is the one that produces the smallest Euclidean distance to the sample. The node of the net with the smallest such distance is then the BMU. Then the vectors of the BMU and its neighborhood are pulled towards the given sample, using the offset index calculated for each node just before. Everything else should work out-of-the-box.
This solution is relatively slow but should work great. I'd recommend studying the concept of SOM thoroughly and then reading this post (and angry comments) again :)
PLEASE comment if you know some mathematical solution that would be better than that previous one!
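The "best offset index" search described above can be sketched in plain Python (not with the SOM Toolbox). It assumes the waves are periodic and sampled over exactly one period, so that an offset can be modelled as a circular shift. The functions best_offset_distance and pull_toward are hypothetical names, and the learning rate is arbitrary.

```python
import math

def best_offset_distance(sample, node):
    """Smallest Euclidean distance between `node` and any circular shift of `sample`."""
    best = (float("inf"), 0)
    for k in range(len(sample)):
        shifted = sample[k:] + sample[:k]            # sample shifted by k positions
        best = min(best, (math.dist(shifted, node), k))
    return best                                      # (distance, offset index)

def pull_toward(node, sample, offset, rate=0.5):
    """Move the node's vector towards the sample, aligned with the given offset."""
    shifted = sample[offset:] + sample[:offset]
    return [w + rate * (s - w) for w, s in zip(node, shifted)]

n = 16
sin_vector = [math.sin(2 * math.pi * i / n) for i in range(n)]
cos_vector = [math.cos(2 * math.pi * i / n) for i in range(n)]

plain = math.dist(sin_vector, cos_vector)                    # ordinary distance
dist, offset = best_offset_distance(sin_vector, cos_vector)  # offset-tolerant distance
updated = pull_toward(cos_vector, sin_vector, offset)
print(plain, dist, offset)
```

With the offset search, the sine and cosine vectors are recognized as the same wave (a quarter-period shift), whereas the ordinary distance between them is large. The cost is one distance computation per offset per node, which is why the answer calls it slow.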
You can try Matlab's neural network pattern recognition tool, nprtool, as it is specialized for training and testing neural networks for pattern recognition.

Process for comparing two datasets

I have two datasets at a time (in the form of vectors) and I plot them on the same axes to see how they relate to each other. I specifically look for places where both graphs have a similar shape (i.e. places where both seem to have a positive/negative gradient over approximately the same intervals). Example:
So far I have been working through the data graphically, but since the amount of data is so large, plotting every pair of sets to check how they correlate will take far too much time.
Are there any ideas, scripts or functions that might be useful in order to automate this process somewhat?
The first thing you have to think about is the nature of the criteria you want to apply to establish the similarity. There is a wide variety of ways to measure similarity, and the more precisely you can describe what you want "similar" to mean in your problem, the easier it will be to implement, regardless of the programming language.
Having said that, here are some of the things you could look at:
correlation of the two datasets
difference of the derivative of the datasets (but I don't think it would be robust enough)
spectral analysis as mentioned by #thron of three
etc. ...
Knowing the origin of the datasets and their variability can also help a lot in formulating robust enough algorithms.
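As one possible reading of the "correlation of the two datasets" suggestion, the sketch below computes the Pearson correlation inside a sliding window, so that stretches where the two curves have a similar shape score close to 1. It is plain Python rather than Matlab, and the function names, the window width and the data are made up for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0   # a flat window has no defined correlation; treated as "no similarity"
    return cov / math.sqrt(var_a * var_b)

def windowed_correlation(a, b, width):
    """Correlation of a and b inside each sliding window of `width` samples."""
    return [pearson(a[i:i + width], b[i:i + width])
            for i in range(len(a) - width + 1)]

A = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
B = [5, 7, 9, 11, 9, 7, 5, 4, 3, 2]   # follows A at first, then moves the opposite way
scores = windowed_correlation(A, B, width=4)
print([round(s, 2) for s in scores])
```

Windows with a score near 1 mark intervals where the shapes agree, and scores near -1 mark intervals where they move in opposite directions. Correlation ignores scale and vertical offset, which may or may not be what "similar" should mean for your data.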
Sure. Call your two vectors A and B.
1) (Optional) Smooth your data either with a simple averaging filter (Matlab 'smooth'), or the 'filter' command. This will get rid of local changes in velocity ("gradient") that appear to be essentially noise (as in the ascending component of the red trace).
2) Differentiate both A and B. Now you are directly representing the velocity of each vector (Matlab 'diff').
3) Add the two differentiated vectors together (element-wise). Call this C.
4) Look for all points in C whose absolute value is above a certain threshold (you'll have to eyeball the data to get a good idea of what this should be). Points above this threshold indicate highly similar velocity.
5) Now look for where a high positive value in C is followed by a high negative value, or vice versa. In between these two points you will have similar curves in A and B.
Note: a) You could do the smoothing after step 3 rather than after step 1. b) Re 5), you could have a situation in which a 'hill' in your data is at the edge of the vector and so is 'cut in half', and the vectors descend to baseline before ascending in the next hill. Then 5) would misidentify the hill as coming between the initial descent and subsequent ascent. To avoid this, you could also require that the points in A and B in between the two points of velocity similarity have high absolute values.
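Steps 2 to 5 above can be sketched in plain Python (not Matlab) as follows. Smoothing (step 1) is left out, the threshold is picked by hand for the made-up data, and pairing each strong point with the next strong point of opposite sign is one possible reading of step 5. The function names are hypothetical.

```python
def diff(v):
    """First differences, like Matlab's diff."""
    return [b - a for a, b in zip(v, v[1:])]

def similar_segments(a, b, threshold):
    """Return (start, end) sample indices between a strong joint rise and fall."""
    c = [da + db for da, db in zip(diff(a), diff(b))]                 # steps 2 and 3
    strong = [(i, x) for i, x in enumerate(c) if abs(x) > threshold]  # step 4
    segments = []
    for (i, x), (j, y) in zip(strong, strong[1:]):                    # step 5
        if x * y < 0:                      # strong rise followed by strong fall, or vice versa
            segments.append((i, j + 1))    # c[j] spans samples j and j + 1
    return segments

A = [0, 0, 1, 4, 5, 4, 1, 0, 0]
B = [1, 1, 2, 5, 6, 5, 2, 1, 1]
segments = similar_segments(A, B, threshold=4)
print(segments)
```

On this data the single reported segment covers samples 2 to 6, which is the shared "hill" in both vectors.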