Average time for SURF (Speeded-Up Robust Features) implementation in MATLAB - matlab

I have been implementing real-time object tracking, for which I am using the OpenSURF library (http://www.mathworks.com/matlabcentral/fileexchange/28300-opensurf--including-image-warp-) in MATLAB. The problem is that extracting the SURF features alone takes around 0.5 seconds per frame (I have tried this several times). This makes my real-time implementation very slow.
But people say that SURF is the fastest feature-extraction algorithm. Am I doing something wrong here, or is the OpenSURF library just a slow implementation?
Or is there a better algorithm I could implement for real-time object tracking?
Any suggestion is appreciated.

Actually, look at the info in the link you provided. The last comment on that page, from 28 Jul 2010, is someone complaining about the speed of the algorithm because it takes 11 s on the test image. The next day the author updated the code for a 10x speedup, so about 1 s. You are getting 0.5 s, so that's pretty fast!
Recommendation:
Don't run SURF on every image. Run SURF on 1 out of every 10-20 frames and apply optical flow to track those points across the frames in between. Every 10-20 frames, run SURF again and match the feature points to make sure you didn't lose any. If you do this in a parallel thread, even better! (A sketch of this detect-then-track pattern follows below.)
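Here is a minimal MATLAB sketch of that detect-rarely, track-often pattern. It assumes the Computer Vision Toolbox (detectSURFFeatures, vision.PointTracker) rather than OpenSURF, and the video file name and re-detection interval are made-up values; adapt it to your own pipeline.

    % Detect SURF points rarely, track them cheaply with a KLT point tracker.
    reader  = VideoReader('myvideo.avi');        % hypothetical input video (RGB)
    frame   = rgb2gray(readFrame(reader));
    pts     = detectSURFFeatures(frame);         % expensive: do this rarely
    tracker = vision.PointTracker('MaxBidirectionalError', 2);
    initialize(tracker, pts.Location, frame);

    frameCount = 0;
    while hasFrame(reader)
        frame = rgb2gray(readFrame(reader));
        frameCount = frameCount + 1;
        if mod(frameCount, 15) == 0              % re-detect every ~15 frames
            pts = detectSURFFeatures(frame);
            setPoints(tracker, pts.Location);
        else
            [loc, valid] = step(tracker, frame); % cheap optical-flow tracking
            loc = loc(valid, :);                 % keep only reliably tracked points
        end
    end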

Related

Mapping Vision Outputs To Neural Network Inputs

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting the values of centroid to plot trajectory, velocity difference (between frames) and distance between two vehicles. I can successfully track the centroids, and aim to derive the rest of the features.
What I don't know is how to map these to an ANN. I mean, every image has more than one vehicle blob, which means there are multiple centroids in a single frame/image. So how does the NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing the link. Please help me figure it out.
Also, am I looking at time series data?
I am not exactly sure about your question. The problem can be treated both as time-series data and not. You might be able to transform the time-series version of the problem so that it can be solved using an ANN, but that is a bit of a Maslow's hammer :). Also, could you rephrase the problem?
As you said, you could give it features from two or three frames and then use the classifier to detect accident or not, but it might be difficult to train such a classifier. The problem is really difficult, so you might need tons of training samples to get it right, especially really good negative samples (for example, cars travelling close to each other).
There are multiple ways you can try to solve this problem of accident detection. For example: build a classifier (ANN/SVM, etc.) to detect accidents without time-series data. In that case your input would be accident images and non-accident images, or some sort of positive and negative samples for training and later images for testing. In this specific case you are not looking at time-series data, but you might need a lot of features (this is, in some sense, a single-frame version of the problem).
The second method would be to use time-series data, in which case you will have to detect the features, track them (say with Lucas-Kanade or Horn-Schunck optical flow) and then use the information about velocity and centroids to detect the accident. You might even be able to formulate it with HMMs.
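To make the "how do multiple centroids map to NN inputs" part concrete, one common trick (my own illustration, not something stated in the post above) is to build one fixed-length feature vector per vehicle pair over a short window of frames, so the classifier always sees the same input size no matter how many vehicles are visible. A rough MATLAB sketch, assuming you already have a centroids array from your own tracker:

    % 'centroids' is assumed to be F-by-N-by-2 (frames x vehicles x xy),
    % produced by your own tracker; all names here are hypothetical.
    [F, N, ~] = size(centroids);          % needs F >= 3 frames
    samples = [];
    for i = 1:N-1
        for j = i+1:N
            d  = squeeze(vecnorm(centroids(:,i,:) - centroids(:,j,:), 2, 3)); % distance per frame
            vi = squeeze(vecnorm(diff(centroids(:,i,:)), 2, 3));              % speed of vehicle i
            vj = squeeze(vecnorm(diff(centroids(:,j,:)), 2, 3));              % speed of vehicle j
            % Fixed-length vector: closest approach, final distance, mean speeds,
            % and the sharpest frame-to-frame speed drop (a possible impact).
            feat = [min(d), d(end), mean(vi), mean(vj), min(diff(vi)), min(diff(vj))];
            samples = [samples; feat];                                        %#ok<AGROW>
        end
    end
    % Each row of 'samples' is one input vector for the ANN/SVM, labelled
    % accident / no-accident during training.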

Smoothing series data

I need to smooth this kind of plot better. I've already used a moving average (10 points) to get this plot, but it's not yet perfect. I want to remove all the little peaks caused by noise and consider only the bigger ones, because I'm counting the number of beats from a sensor.
(i.e., in the first 30 seconds I should have just one peak instead of several successive little peaks)
I thought about using a cubic spline, but it isn't simple to implement in C and would take almost 1-2 weeks of work.
Is there a simpler method/algorithm to achieve this? I'm working on this project for the iOS (iPhone) environment.
(plot of the sensor data: http://img15.imageshack.us/img15/1929/schermata022455973alle1o.png)
The answer to your question depends a lot on the underlying data. Is the jaggedness of the data really noise, or is it really jagged data?
Strategies you could try:
Window the data and take the median/mean in each window -- say each window spanning 50 units on your x axis (a sketch of this windowed approach follows after this list)
Downsample the data
Nonlinear least-squares curve fit (you'd probably have to use a C/C++ library for that; here is an open-source version you could port: http://www.ics.forth.gr/~lourakis/levmar/)
Some sort of naive Bezier smoothing should be pretty easy.
All of these methods have ramifications and none are without problems. Good luck.
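As a concrete sketch of the first strategy, here is the windowed-median idea written in MATLAB purely to show the logic (the same loop is a handful of lines of C on iOS). The window length and prominence threshold are made-up values to tune against your sensor, and findpeaks comes from the Signal Processing Toolbox:

    % Median smoothing plus prominence-based peak counting; 'x' is the raw signal.
    win = 25;                                  % samples per window - tune this
    xs  = movmedian(x, win);                   % median is robust to small noise spikes
    % Count only the big peaks: a minimum prominence makes the little noise
    % bumps between real beats disappear from the count.
    [pks, locs] = findpeaks(xs, 'MinPeakProminence', 0.5*std(xs));
    numBeats = numel(pks);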

Buddhabrot Fractal

I am trying to implement the Buddhabrot fractal. I can't understand one thing: all the implementations I inspected pick random points on the image to calculate the path of the escaping particle. Why do they do this? Why not go over all the pixels?
What purpose do the random points serve? More points make better pictures, so I would think going over all pixels makes the best picture - am I wrong here?
From my test data:
Working on a 400x400 picture, so 160,000 pixels to iterate over if I go over them all.
Using random sampling, the picture only starts to take shape after 1 million points. Good results show up at around 1 billion random points, which takes hours to compute.
Random sampling is better than grid sampling for two main reasons. First, because grid sampling will introduce grid-like artifacts in the resulting image. Second, because grid sampling may not give you enough samples for a converged result. If after completing a grid pass you wanted more samples, you would need to make another pass with a slightly offset grid (so as not to resample the same points) or switch to a finer grid, which may end up doing more work than is needed. Random sampling gives very smooth results, and you can stop the process as soon as the image has converged or you are satisfied with the result.
I'm the inventor of the technique so you can trust me on this. :-)
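For readers who want to see what the random-sampling loop actually looks like, here is a minimal Buddhabrot sketch in MATLAB. The sample count, iteration cap and colour mapping are illustrative only, and a plain interpreted loop like this is slow; it is just meant to show the logic of accumulating the escape paths of randomly chosen points:

    % Random-sampling Buddhabrot: accumulate the orbits of escaping points.
    W = 400; H = 400;                          % matches the 400x400 example above
    counts  = zeros(H, W);
    maxIter = 500;
    for s = 1:1e6                              % random sample points (illustrative)
        c = complex(4*rand - 2, 4*rand - 2);   % random c in [-2,2] x [-2,2]
        z = 0; n = 0;
        path = complex(zeros(maxIter, 1));
        while abs(z) <= 2 && n < maxIter
            z = z*z + c;
            n = n + 1;
            path(n) = z;
        end
        if n < maxIter                         % the orbit escaped: paint its path
            for k = 1:n
                px = round((real(path(k)) + 2) / 4 * (W-1)) + 1;
                py = round((imag(path(k)) + 2) / 4 * (H-1)) + 1;
                if px >= 1 && px <= W && py >= 1 && py <= H
                    counts(py, px) = counts(py, px) + 1;
                end
            end
        end
    end
    imagesc(sqrt(counts)); colormap gray; axis image off;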
The same holds for flame fractals: the Buddhabrot is about finding the "attractors",
so even if you start with a random point, it is assumed to converge quite quickly to these attracting curves. You typically avoid painting the first 10 or so iterations anyway, so the starting point is not really relevant. BUT, to avoid doing the same computation twice, random sampling is much better. As mentioned, it eliminates the risk of artifacts.
But the most important feature of random sampling is that it has all levels of precision (in theory, at least). This is VERY important for fractals: they have details on all levels of precision, and hence require input from all levels as well.
While I am not 100% sure of the exact reason, I would assume it has more to do with efficiency. If you are going to iterate through every single point multiple times, it's going to waste a lot of processing cycles to get a picture that may not look a whole lot better. By doing random sampling you can reduce the amount of work that needs to be done - and, given a large enough sample size, still get a result that is difficult to distinguish visually from iterating over all the pixels.
This is possibly some kind of Monte-Carlo method so yes, going over all pixels would produce the perfect result but would be horribly time consuming.
Why don't you just try it out and see what happens?
Random sampling is used to get as close as possible to the exact solution, which in cases like this cannot be calculated exactly due to the statistical nature of the problem.
You can 'go over all pixels', but since every pixel is in fact some square region with dimensions dx * dy, you would only use num_x_pixels * num_y_pixels points for your calculation and get very grainy results.
Another way would be to use a very large resolution and scale down the render after the calculation. This would give some kind of 'systematic' render where every pixel of the final render is divided in equal amounts of sub pixels.
I realize this is an old post, but wanted to add my thoughts based on a current project.
The problem with tying your samples to pixels, like others said:
Couples your sample grid to your view plane, making it difficult to do projections, zooms, etc
Not enough fidelity. Random sampling is more efficient as everyone else said, so you need even more samples if you want to sample using a uniform grid
You're much more likely to see grid artifacts at equivalent sample counts, whereas random sampling tends to just look grainy at low counts
However, I'm working on a GPU-accelerated version of buddhabrot, and ran into a couple issues specific to GPU code with random sampling:
I've had issues with overhead/quality of PRNGs in GPU code where I need to generate thousands of samples in parallel
Random sampling produces highly scattered traces, and the GPU really, really wants threads in a given block/warp to move together as much as possible. The performance difference for clustered vs scattered traces was extreme in my testing
Hardware support in modern GPUs for efficient atomicAdd means writes to global memory don't bottleneck GPU buddhabrot nearly as much now
Grid sampling makes it very easy to do a two-pass render, skipping blocks of sample points based on a low-res pass to find points that don't escape or don't ever touch a pixel in the view plane
Long story short, while the GPU technically has to do more work this way for equivalent quality, it's actually faster in practice AFAICT, and GPUs are so fast that re-rendering is often a matter of seconds/minutes instead of hours (or even milliseconds at lower resolution / quality levels, realtime buddhabrot is very cool)

AurioTouch & FFT for an instrument tuner

I'm trying to write a simple tuner (no, not to make yet another tuner app), and am looking at the AurioTouch sample source (has anyone tried to comment this code??).
My worry is that aurioTouch doesn't seem to actually work very well when looking at the frequency-domain graph. I play a single note on an instrument and I don't see a nicely ordered, small set of frequencies with one strong peak at the appropriate frequency of the note.
Has anyone used aurioTouch enough to know whether the underlying code is functional or whether it is just a crude sample?
Other options I have are to use FFTW or KISS FFT. Anyone have any experience with those?
Thanks.
You're expecting the wrong thing!!
Not the library's fault
Whether the library produces it properly or not, you're looking for a pattern that rarely actually exists in real-life sounds. Only a perfect, electronically generated sine wave will produce even a partway-discrete-looking 'spike' in the frequency graph. If you don't believe it, try firing up a 'spectrum analyzer' visualization in Winamp or Media Player. It's not so much the PC's fault.
Real sound waves are complicated animals
Picture a sawtooth or square wave in your mind's eye. Those sharp turnarounds - corners or points on the wave - look like tons of higher harmonics to the FFT (or even to a true Fourier transform). And if you've ever seen a real square wave or sawtooth on a scope, or even a 'sine wave' produced by an instrument that is supposed to produce a sine wave, take a look at all the sharp nooks and crannies in just ONE note (if you don't have a scope, just zoom way in on the wave in Audacity - the more you zoom, the higher the frequencies you're looking at). Yep, those deviations all count as frequencies.
It's hard to tell the difference between one note and a whole orchestra sometimes in a spectrum analysis.
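A quick way to convince yourself: synthesize an idealized square wave and look at its spectrum (MATLAB used here purely for illustration; the frequency and signal length are arbitrary values). Even this "single note" fills the FFT with strong odd harmonics:

    % One second of a 200 Hz square wave: the spectrum shows peaks at
    % 200, 600, 1000, 1400 Hz and so on, not a single spike.
    Fs = 8000;
    t  = (0:Fs-1)/Fs;
    x  = sign(sin(2*pi*200*t));        % idealized square wave, no toolbox needed
    X  = abs(fft(x))/numel(x);
    plot(0:Fs/2-1, 2*X(1:Fs/2));       % bin width is 1 Hz here (Fs samples over 1 s)
    xlabel('Hz'); ylabel('magnitude');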
But I hear single notes!
So how does the ear do it? It considers the entire waveform. Then your lower brain lies to your upper brain about what the input is: one note, not a mess of overtones.
You can't do it as fully, but you can approximate it via 'training.'
Approximation: building some smarts
PLAY the note on the instrument and 'save' the frequency graph. Do this for notes in several frequency ranges, or better yet all notes.
Then interpolate the notes to fill in gaps (by half or quarter steps) by multiplying the frequencies in the saved graphs for that instrument by 2^(1/12) (or 2^(1/24) for quarter steps, etc.).
Figure out how to store them in a quickly searchable data structure like a BST or trie, except that it would have to return a 'how close is this' score. It would also have to identify the match via the proportions of the frequencies, so that it works at different volumes.
Using the smarts
Next time you're looking for a note from that instrument, just take the 'heard' frequency graph and find it in that data structure (see the sketch below). You can record several instruments that make different waveforms and search for them too. If there are background sounds or multiple notes, take the closest match. Then, if you want to identify other notes, 'subtract' the found frequency pattern from the sampled one, and lather, rinse, repeat.
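Here is a rough MATLAB-style sketch of that train-and-match idea. Everything in it is hypothetical: templateA4 stands for a magnitude spectrum you saved yourself from a recording of A4 on the target instrument.

    % Build a shifted template and score candidates by spectral shape.
    Fs = 44100; N = 8192;
    f  = (0:N/2-1) * Fs / N;                        % bin centre frequencies in Hz
    % A note one semitone up has the same spectral shape scaled in frequency
    % by 2^(1/12), so the saved A4 template can be reused for A#4:
    templateAsharp4 = interp1(f, templateA4, f / 2^(1/12), 'linear', 0);
    % Match by shape, not absolute level, so volume doesn't matter:
    score = @(tpl, heard) dot(tpl(:)/norm(tpl), heard(:)/norm(heard));
    % Pick the stored template with the highest score against the heard spectrum,
    % subtract it out, and repeat for any remaining notes.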
It won't work with your voice...
If you've ever tried to tune yourself by singing into a guitar tuner, you'll know that tuners aren't that clever. Of course, some instruments (the voice especially) really float around the pitch and generate an ever-evolving waveform (even without somebody singing).
What are you trying to accomplish?
You wouldn't have to get this fancy for a 'simple' tuner app, but if you're not making just another tuner app then I'm guessing you actually want to identify notes (e.g., maybe you want to auto-generate MIDI files from songs on the radio ;-)
Good luck. I hope you find a library that does all this junk instead of having to roll your own.
Edit 2017
Note this webpage: http://www.feilding.net/sfuad/musi3012-01/html/lectures/015_instruments_II.htm
Well down the page, there are spectrum analyses of various organ pipes. There are many, many overtones. These are possible to detect - with enough work - if you 'train' your app with them first (just like telling a kid, 'this is what a clarinet sounds like...')
aurioTouch looks weird because the frequency axis is on a linear scale. It's very difficult to interpret FFT output when the x-axis is anything other than a logarithmic scale (traditionally log2).
If you can't use aurioTouch's integer-FFT, check out my library:
http://github.com/alexbw/iPhoneFFT
It uses double-precision, has support for multiple window types, and implements Welch's method (which should give you more stable spectra when viewed over time).
@zaph, the FFT does compute a true Discrete Fourier Transform. It is simply an efficient algorithm that takes advantage of the way digital signals are represented.
FFTs use frequency bins, and the bin width is determined by the FFT parameters (the sample rate divided by the FFT length). To find a frequency you will need to sample at a rate of at least twice the highest frequency present in the signal, then find the time between cycles. If it is not a pure tone, this will of course be harder.
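For concreteness, here is a tiny MATLAB illustration of the bin width and of locating the strongest bin; the sample rate, FFT length and test tone are made-up values:

    % Bin width = Fs / N. Locate the strongest bin of a synthetic 440 Hz tone.
    Fs = 44100;                         % sample rate (Hz)
    N  = 4096;                          % FFT length -> bin width ~ 10.8 Hz
    t  = (0:N-1)/Fs;
    x  = sin(2*pi*440*t);               % test tone
    X  = abs(fft(x));
    [~, k] = max(X(1:N/2));             % strongest bin in the positive half
    peakHz = (k-1) * Fs / N;            % about 441 Hz: accurate only to the bin width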
I am using the Ooura FFT to compute the FFT of accelerometer data, and I do not always obtain the correct spectrum. For some reason it sometimes produces completely wrong results, with spectral magnitudes on the order of 10^200 across all frequencies.