I need to find the position of a smaller image inside a bigger image. The smaller image is a subset of the bigger image. A further requirement is that pixel values may differ slightly, for example if the images were produced by different JPEG compressions.
I've implemented a solution that compares bytes on the CPU, but I'm now looking into any possibility of speeding up the process.
Could I somehow utilize OpenGLES and thus iPhone GPU for it?
Note: images are grayscale.
@Ivan, this is a pretty standard problem in video compression (finding the position of the current macroblock in the previous frame). You can use a metric for the difference in pixels such as the sum of absolute differences (SAD), sum of squared differences (SSD), or sum of Hadamard-transformed differences (SATD). I assume you are not trying to compress video but rather looking for something like a watermark.

In many cases, you can use a gradient-descent-type search to find a local minimum (best match), based on the empirical observation that comparing an image (your small image) to a slightly offset version of the same (a match whose position hasn't been found exactly) produces a closer metric than comparing it to a random part of another image. So you can start by sampling the space of all possible offsets/positions (motion vectors in video encoding) rather coarsely, and then do local optimization around the best result. The local optimization works by comparing a match to some number of neighboring matches, moving to the best of those if any is better than your current match, and repeating. This is very much faster than brute force (checking every possible position), but it may not work in all cases (it depends on the nature of what is being matched).

Unfortunately, this type of algorithm does not translate very well to the GPU, because each step depends on the previous steps. It may still be worth it; if you check, e.g., 16 neighbors of the position for a 256x256 image, that is enough parallel computation to send to the GPU, and yes, it absolutely can be done in OpenGL ES. However, the answer really depends on whether you're doing a brute-force or local-minimization-type search, and whether local minimization would work for you.
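For reference, here is a minimal CPU-side Python/NumPy sketch of the coarse-then-local SAD search described above; the step size, the greedy 8-neighbour refinement, and the function names are illustrative assumptions, not the GPU/OpenGL ES version asked about.

```python
import numpy as np

def sad(big, small, x, y):
    """Sum of absolute differences between `small` and the window of `big` at (x, y)."""
    h, w = small.shape
    window = big[y:y + h, x:x + w].astype(np.int32)
    return int(np.abs(window - small.astype(np.int32)).sum())

def find_match(big, small, coarse_step=8):
    """Coarse grid search over all offsets, then greedy refinement over the
    8 neighbours of the current best position. `coarse_step` is illustrative."""
    H, W = big.shape
    h, w = small.shape
    # 1) coarse sampling of the space of all possible offsets
    cost, x, y = min((sad(big, small, cx, cy), cx, cy)
                     for cy in range(0, H - h + 1, coarse_step)
                     for cx in range(0, W - w + 1, coarse_step))
    # 2) local optimization: move to the best neighbour while it improves
    while True:
        neighbours = [(sad(big, small, nx, ny), nx, ny)
                      for nx in (x - 1, x, x + 1)
                      for ny in (y - 1, y, y + 1)
                      if 0 <= nx <= W - w and 0 <= ny <= H - h and (nx, ny) != (x, y)]
        if not neighbours:
            return x, y, cost
        n_cost, nx, ny = min(neighbours)
        if n_cost >= cost:
            return x, y, cost
        cost, x, y = n_cost, nx, ny
```

In a GPU version, step 1 maps naturally onto parallel evaluation, while step 2 is the sequential part discussed above.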
Related
I am working on an application that should determine whether an input image contains a stamp imprint and return its location. For RGB images I am using color segmentation followed by verification (with various shape factors); for grayscale images I thought that SIFT + verification would do the job, but using SIFT would only find those stamps (in the input image) that I already have in my database.
In the ideal case it works really well, as shown in the image below.
Fig. 1.
http://i.stack.imgur.com/JHkUl.png
The problem occurs when the input image contains a stamp that does not exist in the database. The first thing I did was check whether there would be any matching key points when comparing a similar stamp to the one in the input image. In most cases there is not a single matching key point, and when there are some, they refer to other parts of the input image rather than to the stamp, as shown in Fig. 2.:
Fig. 2.
http://i.stack.imgur.com/coA4l.png
I also tried to find a match between the input image and a plain circle image, as the stamps are circular, but the circle image has very few key points, if any.
So I wonder if there is any different approach that would make SIFT a bit more useful in this exact case? I thought about creating a matrix with all the descriptors and key points from my database and then looking for the nearest Euclidean distance between the input image and that matrix, but it probably won't work, as there are a lot of (unwanted) matching key points across the database (see Fig. 2.).
I'm working with Matlab and have tried both the VLFeat and D. Lowe's SIFT implementations.
Edit:
So I found a way to force SIFT to compute descriptors for user-defined points on an image. My test image contained a circle; the descriptors were computed and matched against the input images, including the ones in Fig. 1 and 2. This process was repeated for scales from 0 to 10. Unfortunately, it didn't help either.
This is only a first hint and not a full answer to the SIFT questions.
My impression is that detecting a circle by matching it against an image of a circle via SIFT is not the best approach, especially if the circle you want to detect has some unknown texture inside.
The textbook algorithm for circle detection would be the Hough transform, which is mostly used for line detection but works for any kind of shape that can be described by a small number of parameters (colleagues tell me things get nasty above 3, but a circle has just X, Y and r). There are several implementations on the File Exchange; the link is just to one example. Hough circle detection requires you to put an upper bound on the radii you want to detect, but this seems fine for your application.
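As a point of comparison outside Matlab, here is a minimal Hough-circle sketch using OpenCV in Python (in Matlab, imfindcircles or the File Exchange implementations play the same role); the filename and every parameter value below are placeholders that would need tuning per scan.

```python
import cv2
import numpy as np

# Hough circle detection on a grayscale letter scan.
# "letter.png" and all numeric parameters are illustrative placeholders.
img = cv2.imread("letter.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)                            # suppress noise before voting
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                           param1=100, param2=40,       # edge / accumulator thresholds
                           minRadius=30, maxRadius=120) # upper bound on radii
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print(f"candidate stamp at ({x}, {y}), radius {r}")
```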
From the examples you provided it looks like you should get quite far if you can detect circles reliably.
Actually, I do not think SIFT will solve this problem. I've been playing around with SIFT for quite some time, and my conclusion is that it's really great for identifying identical patterns, but not for similar patterns.
Just have a look at the construction of the SIFT feature vector: the descriptor is composed of several histograms of gradients(!). If you have patterns in the database with very similar blob-like structures in the stamps, then you might have a chance. But if this does not hold, then I guess you will not be very lucky.
From my point of view, you have more or less solved the problem of finding identical objects (stamps) and are now extending it to finding similar objects. This sounds like the same problem, but in my past research I found these problems to be related, yet not identical.
Do you have any runtime constraints in your application? There might be other approaches, but in that case more information about possible constraints would be useful.
Update regarding constraints:
So your next task might be to detect the unknown stamps, right?
This sounds like a classification task.
In your case I would first try to find a descriptor/representation (or SVM) that classifies images into stamp/no-stamp. In order to evaluate this, set up a database with ground truth: a reasonable number of "unknown" stamps plus other images, like random snapshots from the letters, NOT containing stamps. This will be your test set.
Then try some descriptors/representations to calculate the distance/similarity between your images and classify your test set into the classes STAMP / NO-STAMP. When you have found a descriptor/distance measure (or SVM) that classifies well, you can run a sliding-window approach over a letter to find a stamp, as sketched below. The sliding-window approach is certainly not a very fast method, but it is a very easy one.
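A rough Python sketch of that classify-then-slide idea, assuming scikit-learn, raw pixel patches as placeholder features, and made-up window/threshold values; a real pipeline would use a proper descriptor and tuned parameters.

```python
import numpy as np
from sklearn.svm import SVC

def train_stamp_classifier(patches, labels):
    """Train a stamp / no-stamp classifier on fixed-size grayscale patches.
    `patches` is an (N, H, W) array, `labels` is 1 for stamp, 0 otherwise.
    Raw pixels are used here as a placeholder feature; any descriptor works."""
    X = patches.reshape(len(patches), -1).astype(float) / 255.0
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, labels)
    return clf

def sliding_window_detect(image, clf, win=64, step=16, threshold=0.8):
    """Slide a window over the letter image and return likely stamp locations.
    `win` must match the patch size the classifier was trained on."""
    hits = []
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            patch = image[y:y + win, x:x + win].reshape(1, -1).astype(float) / 255.0
            p_stamp = clf.predict_proba(patch)[0, 1]
            if p_stamp >= threshold:
                hits.append((x, y, p_stamp))
    return hits
```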
At least when you have reached this point, you can tune the detection, for example based on interest point detectors... but one step after the other.
I have applied two different image enhancement algorithms to a particular image and obtained two resulting images. Now I want to compare the quality of those two images in order to assess the effectiveness of the two algorithms and pick the more appropriate one, based on a comparison of the feature vectors of the two images. What suitable feature vectors should I compare in this case?
I am asking in the context of comparing the texture features of the images; which feature vector would be more suitable?
I need mathematical support for verifying the effectiveness of one algorithm over the other, based on an evaluation of the images, for example using contrast and variance. Are there any other approaches to do that?
Would a better approach be to compute some noise/signal ratio by comparing the image spectra?
Slayton is right: you need a metric and a way to measure against it, which can be an academic project in itself. However, I can think of one approach straight away; I'm not sure whether it makes sense for your specific task at hand:
Metric:
The sum of abs( colour difference ) across all pixels. The lower, the more similar the images are.
Method:
For each pixel, get the absolute colour difference (or distance, to be precise) in LAB space between the original and the processed image, and sum that up. Don't ruin your day trying to understand the full Wikipedia article and coding it yourself; this has been done before. Try re-using the methods getDistanceLabFrom(Color color) or getDistanceRgbFrom(Color color) from this PHP implementation. It worked like a charm for me when I needed a way to match the colour of pixels in a JPG picture, which is basically the same principle.
The theory behind it (as far as my limited understanding goes): it treats RGB or (better) LAB colour space as a three-dimensional space and then calculates the distance between points in it. That's why it works well, and why looking at a colour code from a one-dimensional perspective hardly worked for me.
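A rough Python analogue of those PHP helpers, summing per-pixel CIE76 Lab distances; it assumes scikit-image is available and that both inputs are same-sized RGB images.

```python
import numpy as np
from skimage import color, io

def lab_difference(path_original, path_processed):
    """Sum of per-pixel Euclidean (CIE76) distances in Lab space.
    Lower totals mean the processed image stays closer to the original."""
    a = color.rgb2lab(io.imread(path_original))
    b = color.rgb2lab(io.imread(path_processed))
    return np.sqrt(((a - b) ** 2).sum(axis=-1)).sum()
```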
The usual way is to start with a reference image (a good one), then add some noise to it (in a controlled way).
Then your algorithm should remove as much of the added noise as possible. The results are easy to compare with a signal-to-noise ratio (see Wikipedia).
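A minimal sketch of that measurement, assuming 8-bit images and using the peak signal-to-noise ratio (a common variant of the SNR idea above):

```python
import numpy as np

def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between the clean reference and the
    algorithm's output; a higher value means more of the added noise was removed."""
    mse = np.mean((reference.astype(float) - processed.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```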
Now, this approach is easy to apply with simple noise models, but if you aim to improve more complex appearance issues, you must devise a way to apply the "noise", which is not easy.
Another quite common way to do it is the one recommended by slayton: get all your colleagues to rate the output of your algorithm, then average their impressions.
If you only have the 2 images and no reference (highest-quality) image, then you can see my crude solution/bash script here: https://photo.stackexchange.com/questions/75995/how-do-i-compare-two-similar-images-sharpness/117823#117823
It gets the 2 filenames and outputs the higher quality filename. It assumes the content of the images is identical (same source image).
It can be fooled though.
I am working on optical flow, and based on the lecture notes here and some samples on the Internet, I wrote this Python code.
All the code and sample images are there as well. For small displacements of around 4-5 pixels, the direction of the calculated vectors seems fine, but the magnitude of the vectors is too small (that's why I had to multiply u, v by 3 before plotting them).
Is this because of a limitation of the algorithm, or an error in the code? The lecture notes shared above also say that the motion needs to be small ("u, v are less than 1 pixel"), so maybe that's why. What is the reason for this limitation?
@belisarius says "LK uses a first order approximation, and so (u,v) should be ideally << 1, if not, higher order terms dominate the behavior and you are toast."
A standard conclusion from the optical flow constraint equation (OFCE, slide 5 of your reference) is that "your motion should be less than a pixel, lest higher order terms kill you". While technically true, you can overcome this in practice by using larger averaging windows. This requires that you do sane statistics, i.e. not pure least squares, as suggested in the slides. Equally fast computations and far superior results can be achieved with Tikhonov regularization. This requires setting a tuning value (the Tikhonov constant). It can be a global constant, or it can be adjusted to local information in the image (such as the Shi-Tomasi confidence, a.k.a. structure tensor determinant).
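A single-scale Python/NumPy sketch of Lucas-Kanade with Tikhonov regularization, to make the idea concrete; the window size, the regularization constant, and the use of a global (rather than locally adapted) constant are all illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def lk_tikhonov(im1, im2, window=15, lam=1e-2):
    """Single-scale Lucas-Kanade flow with Tikhonov regularization.
    Per pixel it solves (A^T A + lam*I) [u, v]^T = -A^T b, where A holds the
    windowed spatial gradients and b the temporal difference. `window` and
    `lam` are illustrative tuning values, not recommended settings."""
    im1 = im1.astype(float)
    im2 = im2.astype(float)
    Ix = sobel(im1, axis=1, mode='nearest')
    Iy = sobel(im1, axis=0, mode='nearest')
    It = im2 - im1

    # Windowed averages of the structure tensor and gradient/time products.
    Ixx = uniform_filter(Ix * Ix, window)
    Iyy = uniform_filter(Iy * Iy, window)
    Ixy = uniform_filter(Ix * Iy, window)
    Ixt = uniform_filter(Ix * It, window)
    Iyt = uniform_filter(Iy * It, window)

    # Closed-form solution of the regularized 2x2 system at every pixel;
    # lam > 0 keeps the determinant strictly positive even in flat regions.
    det = (Ixx + lam) * (Iyy + lam) - Ixy ** 2
    u = (-(Iyy + lam) * Ixt + Ixy * Iyt) / det
    v = (Ixy * Ixt - (Ixx + lam) * Iyt) / det
    return u, v
```

Because the regularization keeps the 2x2 system well conditioned, flow estimates degrade gracefully in low-texture regions instead of blowing up.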
Note that this does not replace the need for multi-scale approaches in order to deal with larger motions. It may extend the range a bit for what any single scale can deal with.
Implementations, visualizations and code are available in tutorial format here, albeit in Matlab rather than Python.
Can you enlarge a feature so that, rather than taking up a certain number of pixels, it actually takes up one or two times that many, to make it easier to analyze? Is there a way to do that in general in MATLAB?
This sounds an awful lot like a fictitious "zoom, enhance!" procedure that you'd hear about on CSI. In general, "blowing up" a feature doesn't make it any easier to analyze, because no additional information is created when you do this. Generally you would apply other, different transformations like noise reduction to make analysis easier.
As John F has stated, you are not adding any information. In fact, with more pixels to crunch through you are making it "harder" in the sense of requiring more processing.
You might be able to intelligently increase the resolution of an image using Compressed Sensing. It will require some work (or at least some serious thought), though, as you'll have to determine how best to sample the image you already have. There's a large number of papers referenced at Rice University Compressive Sensing Resources.
The challenge is that the image is already sampled using Nyquist-Shannon constraints. You essentially have to re-sample it using a linear basis function (with IID random elements) in such a way that the estimate is at the desired resolution and find some surrogate for the original image at that same resolution that doesn't bias the estimate.
The function imresize is useful for, well, resizing images, larger or smaller. And imcrop is useful for cropping images.
You might get other more useful answers if you tag the question image-processing too.
I am trying to implement the Buddhabrot fractal. There is one thing I can't understand: all the implementations I inspected pick random points on the image to calculate the path of the escaping particle. Why do they do this? Why not go over all pixels?
What purpose do the random points serve? More points make better pictures so I think going over all pixels makes the best picture - am I wrong here?
From my test data:
Working on a 400x400 picture, so there are 160,000 pixels to iterate over if I go over all of them.
Using random sampling, the picture only starts to take shape after 1 million points. Good results show up at around 1 billion random points, which takes hours to compute.
Random sampling is better than grid sampling for two main reasons. First, grid sampling introduces grid-like artifacts in the resulting image. Second, grid sampling may not give you enough samples for a converged result: if, after completing a grid pass, you wanted more samples, you would need to make another pass with a slightly offset grid (so as not to resample the same points) or switch to a finer grid, which may end up doing more work than is needed. Random sampling gives very smooth results, and you can stop the process as soon as the image has converged or you are satisfied with the results.
I'm the inventor of the technique so you can trust me on this. :-)
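To make the sampling loop concrete, here is a minimal (and deliberately slow) Python sketch of random-sample Buddhabrot accumulation; the sample count, iteration limit and view window are arbitrary illustrative values.

```python
import numpy as np

def buddhabrot(width=400, height=400, n_samples=2_000_000, max_iter=500, seed=0):
    """Accumulate escape trajectories of randomly sampled points c under
    z -> z^2 + c into a histogram over the view plane [-2, 1] x [-1.5, 1.5]."""
    rng = np.random.default_rng(seed)
    hist = np.zeros((height, width), dtype=np.uint32)
    for _ in range(n_samples):
        c = complex(rng.uniform(-2.0, 1.0), rng.uniform(-1.5, 1.5))
        z = 0j
        trace = []
        for _ in range(max_iter):
            z = z * z + c
            trace.append(z)
            if abs(z) > 2.0:                      # escaped: paint its whole trajectory
                for w in trace:
                    x = int((w.real + 2.0) / 3.0 * width)
                    y = int((w.imag + 1.5) / 3.0 * height)
                    if 0 <= x < width and 0 <= y < height:
                        hist[y, x] += 1
                break                              # points that never escape add nothing
    return hist
```

Stopping the outer loop early simply yields a grainier image, which is exactly the "stop when converged" property described above.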
The same holds for flame fractals: the Buddhabrot is about finding the "attractors",
so even if you start with a random point, it is assumed to converge quite quickly to these attracting curves. You typically avoid painting the first ten or so points of the iteration anyway, so the starting point is not really relevant. BUT, to avoid doing the same computation twice, random sampling is much better. As mentioned, it eliminates the risk of artefacts.
But the most important feature of random sampling is that it has all levels of precision (in theory, at least). This is VERY important for fractals: they have details on all levels of precision, and hence require input from all levels as well.
While I am not 100% sure of the exact reason, I would assume it has more to do with efficiency. If you are going to iterate through every single point multiple times, you are going to waste a lot of processing cycles to get a picture that may not look a whole lot better. With random sampling you can reduce the amount of work that needs to be done, and given a large enough sample size still get a result that is difficult to distinguish visually from iterating over all the pixels.
This is possibly some kind of Monte-Carlo method so yes, going over all pixels would produce the perfect result but would be horribly time consuming.
Why don't you just try it out and see what happens?
Random sampling is used to get as close as possible to the exact solution, which in cases like this cannot be calculated exactly due to the statistical nature of the problem.
You can 'go over all pixels', but since every pixel is in fact some square region with dimensions dx * dy, you would only use num_x_pixels * num_y_pixels points for your calculation and get very grainy results.
Another way would be to use a very large resolution and scale the render down after the calculation. This gives a kind of "systematic" render where every pixel of the final image is divided into an equal number of sub-pixels.
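For example, a high-resolution render can be reduced by simple block averaging; a minimal sketch (the factor of 4 is arbitrary):

```python
import numpy as np

def downscale(render, factor=4):
    """Average factor x factor blocks of a high-resolution render, giving the
    'systematic' supersampled image described above."""
    h, w = render.shape
    h, w = h - h % factor, w - w % factor      # drop any ragged border rows/columns
    blocks = render[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```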
I realize this is an old post, but wanted to add my thoughts based on a current project.
The problems with tying your samples to pixels, as others have said:
It couples your sample grid to your view plane, making it difficult to do projections, zooms, etc.
Not enough fidelity: random sampling is more efficient, as everyone else said, so you need even more samples if you want to sample on a uniform grid.
You're much more likely to see grid artifacts at equivalent sample counts, whereas random sampling tends to just look grainy at low counts.
However, I'm working on a GPU-accelerated version of buddhabrot, and ran into a couple issues specific to GPU code with random sampling:
I've had issues with overhead/quality of PRNGs in GPU code where I need to generate thousands of samples in parallel
Random sampling produces highly scattered traces, and the GPU really, really wants threads in a given block/warp to move together as much as possible. The performance difference for clustered vs scattered traces was extreme in my testing
Hardware support in modern GPUs for efficient atomicAdd means writes to global memory don't bottleneck GPU buddhabrot nearly as much now
Grid sampling makes it very easy to do a two-pass render, skipping blocks of sample points based on a low-res pass to find points that don't escape or don't ever touch a pixel in the view plane
Long story short: while the GPU technically has to do more work this way for equivalent quality, it's actually faster in practice as far as I can tell, and GPUs are so fast that re-rendering is often a matter of seconds or minutes instead of hours (or even milliseconds at lower resolution/quality levels; realtime Buddhabrot is very cool).