Similarity score between two detected images - matlab

I am currently implementing a detection and tracking system that tracks heads. I am trying to work out a similarity score between two detected images so that I can analyse whether detections in different frames are of the same person or not. I tried converting the RGB images (detections) into HSV and concatenating the Hue and Saturation values into a single vector for each image. Then I compared the vectors of the separate detections by taking the sum of absolute differences.
However, when I tried it, a detection of a bag got a better similarity score than a detection of the same head. Can someone tell me what might be wrong, or suggest a better comparison technique?
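For reference, my current comparison looks roughly like the sketch below (the crops are resized to a common size so the vectors line up; the file names are placeholders):
det1 = imresize(imread('head_frame1.png'), [64 64]);   % detection crop from frame 1
det2 = imresize(imread('head_frame2.png'), [64 64]);   % detection crop from frame 2
hsv1 = rgb2hsv(det1);   hsv2 = rgb2hsv(det2);
v1 = [reshape(hsv1(:,:,1),[],1); reshape(hsv1(:,:,2),[],1)];   % Hue then Saturation
v2 = [reshape(hsv2(:,:,1),[],1); reshape(hsv2(:,:,2),[],1)];
score = sum(abs(v1 - v2));                              % lower score = more similar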

Related

How to construct 3D image from 2D image using Markov Random Field?

I have one 2D CT image and I want to convert it to 3D image using Markov Random Field. There are several papers in the literature in which this technique was used based on 3 2D orthogonal images. However, I can't find a simple and clear resource that explains the conversion process using MRF in clear steps. Here are some papers I found,
http://www.immijournal.com/content/pdf/s40192-014-0019-3.pdf
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1971113/
https://www.eecs.berkeley.edu/Research/Projects/CS/vision/papers/efros-iccv99.pdf
What I have understood is that the image is converted into a graph of connected pixels, and the properties of a pixel depend on the properties of its adjacent ones. But it was not really clear to me how the process actually takes place. The cost-minimization step was also confusing: what parameters are we trying to minimize? And how will this lead to constructing a 3D image from the three 2D orthogonal ones?
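For reference, my current understanding of the generic MRF energy being minimized (this is the standard form, not something specific to those papers) is:
E(x) = sum_i phi_i(x_i) + sum_{(i,j) in N} psi_ij(x_i, x_j)
where the unary term phi_i scores how well a label fits the observed data at site i, and the pairwise term psi_ij penalizes disagreement between neighbouring sites. But I am not sure what plays the role of the labels and of these two terms in the 2D-to-3D case.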
Can anyone please explain to me how the conversion algorithm works using MRF in steps?
Thank You

use scale space representation to filter one image

Currently I hope to use a scale-space representation to filter one image. A feature in an image can be filtered with a Gaussian smoothing filter of one optimal sigma; in other words, under a scale-space representation, different features in one image are expressed best at different scales.
For example, I have an image with a tree in it. In the scale-space representation, three sigma values are used: sigma0, sigma1 and sigma2. The ground is best expressed in the image smoothed with sigma0 because it mainly contains texture. The branches are best expressed in the image smoothed with sigma1, and the trunk in the image smoothed with sigma2. When I filter the image, I would like the filtered pixels for the ground to come from the image smoothed with sigma0, the filtered pixels for the branches from the image smoothed with sigma1, and the filtered pixels for the trunk from the image smoothed with sigma2.
This requires determining in which smoothed image each pixel is expressed best. Is this idea plausible?
I am trying to use the difference-of-Gaussian of two successive smoothed images to perform this task. Is there another way to combine the three smoothed images?
I use Matlab to implement the idea. The values of the three sigmas are 1.0, 2.0 and 3.0, and the corresponding Gaussian kernel sizes are 3, 5 and 7. I use the function fspecial to generate the kernels. Are these parameters reasonable? Please share your experience with scale-space representations; links to useful papers are welcome.
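My current setup looks roughly like this ('tree.jpg' is just a placeholder for the example image described above):
img = im2double(rgb2gray(imread('tree.jpg')));
sigmas = [1.0 2.0 3.0];
ksizes = [3 5 7];
stack  = zeros([size(img) numel(sigmas)]);
for k = 1:numel(sigmas)
    h = fspecial('gaussian', ksizes(k), sigmas(k));   % kernel size, then sigma
    stack(:,:,k) = imfilter(img, h, 'replicate');
end
dog = diff(stack, 1, 3);    % difference-of-Gaussian between successive levels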
Your idea is very much plausible! You are just one step away from it. I did something very similar once and it looked like this:
After smoothing your images and extracting the edges for each smoothing step (I used a weighted Sobel filter for this [to compensate for maxima suppression after Gaussian filtering], since DoG was not quite stable for my application), you can project (and normalize) your whole stack of edge images into a single image ("cumulative edges") which will contain the characteristic edges. You can then compare the cumulative-edges image (using cross-correlation or whatever you wish) with every single image in your edge stack; the biggest value of this comparison is then the smoothing scale at which the pixel is expressed best.
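A rough sketch of that pipeline (this is a simplified per-pixel variant: it uses imgaussfilt and a plain Sobel gradient instead of the weighted Sobel, and elementwise agreement instead of cross-correlation; 'tree.jpg' is a placeholder):
img = im2double(rgb2gray(imread('tree.jpg')));
sigmas = [1.0 2.0 3.0];
edges = zeros([size(img) numel(sigmas)]);
for k = 1:numel(sigmas)
    smoothed = imgaussfilt(img, sigmas(k));
    [gx, gy] = imgradientxy(smoothed, 'sobel');   % Sobel gradients per scale
    edges(:,:,k) = hypot(gx, gy);                 % edge magnitude per scale
end
cumEdges = sum(edges, 3);                         % project the whole edge stack
cumEdges = cumEdges / max(cumEdges(:));           % normalize the cumulative edges
agreement = edges .* cumEdges;                    % per-pixel agreement with each scale
[~, bestScale] = max(agreement, [], 3);           % scale at which each pixel is expressed best
imagesc(bestScale); colorbar;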
Hope that makes sense for you after reading it a couple of times.
Also don't be afraid of using much bigger kernel sizes; while it all depends on your application, I ended up using sizes of 51 and bigger!!! (I was working with 40 MP images though...)
T. Lindeberg has literally dozens of papers related to this problem. I found this one the most useful, but since you are already on the right track, I don't think reading the 50 pages will make you that much smarter. The most important part of it is maybe this one:
Principle for scale selection: In the absence of other evidence, assume that a scale level, at which some (possibly non-linear) combination of normalized derivatives assumes a local maximum over scales, can be treated as reflecting a characteristic length of a corresponding structure in the data.
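In practice, one common instantiation of that principle for blob-like structures is to pick, per position, the scale at which the scale-normalized Laplacian peaks:
L_norm(x, y; sigma) = sigma^2 * (L_xx + L_yy),   sigma_best(x, y) = argmax over sigma of |L_norm(x, y; sigma)|
which is essentially a per-pixel version of what you are already doing, since the DoG approximates the scale-normalized Laplacian.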

Stereo Camera Geometry

This paper describes nicely the geometry of a stereo image system. I am trying to figure out how the calculation would change if the cameras were tilted towards each other at a certain angle. I looked around but couldn't find any reference to tilted camera systems.
Unfortunately, the calculation changes significantly. The rectified case (where both cameras are well aligned with each other) has the advantage that you can calculate the disparity directly, and the depth is inversely proportional to the disparity. This does not hold in the general case.
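For completeness, in the rectified case the relation is simply
Z = f * B / d
with focal length f, baseline B, and disparity d, so nearby points produce large disparities and distant points small ones.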
When you introduce tilts, you end up with something called epipolar geometry. Here is a paper about this I just googled. In order to calculate the depth from a pixel pair you need the fundamental matrix or the essential matrix. Neither is easy to obtain from the image pair alone. If, however, you have the geometric relation of both cameras (translation and rotation), calculating these matrices is a lot easier.
There are several ways to calculate the depth of a pixel pair. One way is to use the fundamental matrix to rectify both images (although rectification is not easy either, nor is it unique) and then run a simple disparity check.
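If you have MATLAB's Computer Vision Toolbox, a rough sketch of that uncalibrated pipeline could look like the following (function availability varies between releases, and 'left.png'/'right.png' are placeholder file names):
I1 = rgb2gray(imread('left.png'));
I2 = rgb2gray(imread('right.png'));
% Find and match features, then estimate the fundamental matrix.
p1 = detectSURFFeatures(I1);  p2 = detectSURFFeatures(I2);
[f1, v1] = extractFeatures(I1, p1);
[f2, v2] = extractFeatures(I2, p2);
idx = matchFeatures(f1, f2);
m1 = v1(idx(:,1));  m2 = v2(idx(:,2));
[F, inliers] = estimateFundamentalMatrix(m1, m2, 'Method', 'RANSAC');
% Projective transforms that bring both images into a rectified pair.
[t1, t2] = estimateUncalibratedRectification(F, m1(inliers), m2(inliers), size(I2));
[I1r, I2r] = rectifyStereoImages(I1, I2, t1, t2);
% On a rectified pair the correspondence search reduces to a 1-D disparity check.
D = disparitySGM(I1r, I2r);
imshow(D, []); colormap jet; colorbar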

how to segment sky & water part in a picture

I'm trying to segment the sky and water part in this image.
Link of the Picture
I've tried many methods like k-means, thresholding, multi-level thresholding, etc., but unfortunately nothing worked very well.
Here is an example of my code (Matlab):
img = imread('1.jpg');
im_gray = rgb2gray(img);
b = imadjust(im_gray);                    % contrast-stretch the gray image
imshow(b);
bw_remove_small = imopen(b, strel('square',5));   % opening to suppress small structures
imshow(bw_remove_small);                  % after 1st iteration
m3 = medfilt2(bw_remove_small, [18,16]);
imshow(m3);
m3 = medfilt2(bw_remove_small, [20,20]);  % second median-filter attempt
imshow(m3);
I1 = m3;
I2 = I1;                                  % m3 is already grayscale, so no rgb2gray here
I = double(I2);
figure
subplot(1,3,1)
imshow(I1)
subplot(1,3,2)
imshow(I2)
g = kmeans(I(:), 4);                      % cluster pixel intensities into 4 groups
J = reshape(g, size(I));
subplot(1,3,3)
imshow(J, []);
Can anyone help me, please?
The picture's two regions are different in hue, texture, and gray level brightness.
The horizon is the best line in the image from our point of view and can be seen by the distinct change in brightness. A single brightness threshold will not work because the image brightness is not flat, so use a model of the brightness distribution to flatten out the sky or the water. This implies knowledge of the objective, but there are two things that can give you an approximate answer: texture and/or hue.
Thresholding the hue at 120 (derived from the hue histogram) will give you the two regions, but they will not be divided cleanly and will have overlapping sections. Still, using these two sections, a model of the brightness can be found.
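A minimal sketch of that hue split (the 120 value is the one from the hue histogram mentioned above; whether it was on a degree scale, and which side of the threshold is sky, are assumptions you should check against your histogram):
img = imread('1.jpg');                 % the image from the question
hsv = rgb2hsv(img);
hueDeg = hsv(:,:,1) * 360;             % hue scaled from [0,1] to degrees
roughSky = hueDeg < 120;               % assumption: sky hues fall below 120 degrees
imshow(roughSky);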
The same goes for texture. Taking a small FFT of the image, subtracting out the DC component, then averaging or just summing the non-DC parts will result in a histogram with two peaks that may not be as distinct as the hue's, but is enough to find a threshold and two areas from which a model of the brightness can be found.
The key fact is that if the sky is modeled properly as a gray surface, then you can subtract it out of the image and use a simple threshold to pull it out.
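Continuing the sketch above, one way to model the sky brightness is a least-squares plane fit over the rough sky mask, then subtract it and threshold the residual (the plane model and the 0.05 tolerance are arbitrary choices for illustration):
gray = im2double(rgb2gray(img));
[rows, cols] = size(gray);
[X, Y] = meshgrid(1:cols, 1:rows);
idx = roughSky(:);                              % pixels assumed to be sky
A = [X(idx) Y(idx) ones(nnz(idx),1)];
coeff = A \ gray(idx);                          % least-squares plane fit to sky brightness
skyModel = reshape([X(:) Y(:) ones(numel(X),1)] * coeff, rows, cols);
residual = gray - skyModel;                     % sky region flattened to roughly zero
skyMask = abs(residual) < 0.05;                 % simple threshold on the residual
imshow(skyMask);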
Edge detection is very noisy in this image to be able to easily see the line but if you can pull out the image lines without losing shape then look for a straight and long contour it may take less code/work.
Hope this helps some! I used this to find mountains in the distance when there was not a big difference between the sky and the mountains. Plus I just tried this on your pic and almost got a good answer without a good model of the sky.

Corner Detection in 2D Vector Data

I am trying to detect corners (x/y coordinates) in 2D scatter vectors of data.
The data is from a laser rangefinder and our current platform uses Matlab (though standalone programs/libs are an option, but the Nav/Control code is on Matlab so it must have an interface).
Corner detection is part of a SLAM algorithm and the corners will serve as the landmarks.
I am also looking to achieve something close to 100 Hz in terms of speed if possible (I know it's Matlab, but my data set is pretty small).
Sample Data:
[Blue is the raw data, red is what I need to detect. (This view is effectively top down.)]
[Actual vector data from above shots]
Thus far I've tried many different approaches, some more successful than others.
I've never formally studied machine vision of any kind.
My first approach was a homebrew least-squares line fitter that would split lines in half recursively until they met some r^2 value and then try to merge ones with similar slopes/intercepts. It would then calculate the intersections of these lines. It wasn't very good, but it did work around 70% of the time with decent accuracy, though it had some bad issues with missing certain features completely.
My current approach uses the clusterdata function to segment my data based on Mahalanobis distance, and then does basically the same thing (least-squares line fitting / merging). It works OK, but I'm assuming there are better methods.
[Source Code to Current Method] Running [cnrs, dat, ~, ~] = CornerDetect(data, 4, 1) on the above data will produce the locations I am getting.
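Roughly, the segmentation and per-cluster fitting in that code looks like this (simplified sketch; pts stands for the N-by-2 array of rangefinder points shown above, and the cutoff value here is just for illustration):
cutoff = 4;                                   % illustrative cutoff, not the tuned value
labels = clusterdata(pts, 'Criterion', 'distance', ...
                     'Distance', 'mahalanobis', 'Cutoff', cutoff);
lineParams = zeros(max(labels), 2);
for c = 1:max(labels)
    p = pts(labels == c, :);
    lineParams(c,:) = polyfit(p(:,1), p(:,2), 1);   % slope / intercept per cluster
end
% lines with similar slope/intercept are then merged and intersected to get corners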
I do not need to write this from scratch; it just seemed like most of the higher-level methods are meant for 2D images or 3D point clouds, not 2D scatter data. I've read a lot about Hough transforms and all sorts of data clustering methods (k-means etc.). I also tried a few canned line detectors without much success. I tried to play around with the Line Segment Detector, but it needs a grayscale image as an input, and I figured it would be prohibitively slow to convert my vectors into a full 2D image just to feed it into something like LSD.
Any help is greatly appreciated!
I'd approach it as a problem of finding extrema of curvature that are stable at multiple scales - and the split-and-merge method you have tried with lines hints at that.
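A rough sketch of that idea, assuming pts is an ordered N-by-2 polyline of scan points (ordering matters for curvature; the smoothing scales and the two thresholds are arbitrary starting values):
sigmas = [1 2 4];                       % smoothing scales to test
K = zeros(size(pts,1), numel(sigmas));
for s = 1:numel(sigmas)
    xs = smoothdata(pts(:,1), 'gaussian', 4*sigmas(s));
    ys = smoothdata(pts(:,2), 'gaussian', 4*sigmas(s));
    dx  = gradient(xs);   dy  = gradient(ys);
    ddx = gradient(dx);   ddy = gradient(dy);
    K(:,s) = abs(dx.*ddy - dy.*ddx) ./ ((dx.^2 + dy.^2).^1.5 + eps);  % curvature per scale
end
% Keep points whose curvature is a local maximum and large at every scale.
isCorner = true(size(pts,1), 1);
for s = 1:numel(sigmas)
    isCorner = isCorner & (K(:,s) == movmax(K(:,s), 9)) & (K(:,s) > 0.1);
end
corners = pts(isCorner, :);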
You could use the Harris corner detector for detecting corners.