we can achieve gestures using kinect,
I am thinking to get images through kinect and then apply neural network on it, to get more accuracy.
is it good approach or i am thinking in wrong way?
Using neural network to recognize human gesture is a good idea, and some guys have published their result.
Related
My current project is to gather data from a sports video using video processing techniques. Specifically, in sports like Tennis or badminton, I want to identify the type of shots taken.
So, I thought of two methods to do this:
Use motion detection techniques and highlight only the players and the ball in the foreground, then use some kind of a filtering algorithm like the Kalman filter to track the player and then track the motion of their hands separately. This method seems to be really hard and complicated, I cannot seem to track the players accurately at all.
Supplying a collection of videos of a particular shot to a neural network, make it train and identify those shots eventually. But I do not know how to supply videos as inputs to a neural networks and I'm not clear as to which software is ideal for this application.
Any help would be great.
The second option looks better to me. Maybe have a look at how images are fed to neural networks, for example in Alexnet. Then feeding videos should not be too different. You'll probably have to first decode the videos, though.
I want to film a batter swinging at a baseball, but the bat is blurry. The video is 30 fps.
Through research I have found that deconvolution seems to be the way to minimize motion blur, but I have no idea if or how I can implement it in my iOS app post processing.
I was hoping someone could point me in the right direction like how to apply a deconvolution algorithm in iOS or what I might need to do...or if it is even possible. I imagine it takes some processing power.
Any suggestions at all are welcome...
Thanks, this is driving me crazy...
After a lot of research and talks with developers about deconvolusion on iOS (Thanks to Brad Larson for taking the time to give me detailed information) I am confident that it is not possible and/or not worth the time. If the hardware can handle the computations (No guarantee) it would be EXTREMELY slow and consume much of the device's battery. I have also been told it could take months to implement the algorithms...if it is possible at all.
Here is the response I received from Apple...
Deconvolution algorithms are generally difficult to implement and can be very computationally intensive. I suggest you starting with a simple sharpening technique. Depending on the amount of the motion blur in your video, it might just suffice.
The sharpen filters, including CISharpenLuminance and CIUnsharpMask, are now available in iOS 6, so it is moderately easy to test them out.
Core Image Filter Reference
https://developer.apple.com/library/mac/#documentation/graphicsimaging/reference/CoreImageFilterReference/Reference/reference.html
Core Image sample code from this year's WWDC session 511 "Core Image Techniques". It's called "Attempt3". This sample demonstrates best practices for applying CIFilter's to a live video taken by the iPhone/iPad camera. You may download the session video from the following page: https://developer.apple.com/videos/wwdc/2012/.
Just wanted to pass this information along.
I need to recognise numbers from the camera image on iPhone, in real-time. I know there will be no more than 5 digits on the image.
Is this problem realistic to solve given the computational specifications of the iPhone?
Does anyone have any experience using the Tesseract OCR library, and do you think it could be solved by using it?
The depends on your definition of "real-time", but yes, it should be possible to do relatively fast recognition of just the digits 0-9 on an iPhone 4, particularly if you can fonts, lighting conditions, etc. that they will appear in.
I highly recommend reading the article on how Sudoku Grab does its recognition of puzzles using the iPhone camera. In their case, a trained neural network was used to identify the digits, which should be reasonably simple and fast on modern iOS hardware.
The current recognition libraries out there, like OpenCV, will use the iPhone's CPU to do the processing. I've heard that they can do even more complex tasks like facial recognition fast enough to use with video sources while showing a minimal amount of stutter.
For even better performance, I believe that there's a lot of potential in the programmable GPUs on the newer iOS devices. In my benchmarks, I saw a 14X - 28X speedup when using the iPhone 4's GPU for simple image processing. While few people are looking at this right now, something like Sudoku Grab's neural network should be a parallel enough process to benefit from running on the GPU.
It should be computationally possible. There are apps that can get a bar code in real time and also an app that does real time translation. (Word Lens). I'm not sure what libraries they use, however.
YES it is possible using the tesseract engine
Here is the sample code if you like to check...
https://github.com/nolanbrown/Tesseract-iPhone-Demo
There is free SDK for that: http://rtrsdk.com/ Supports both iOS and Andorid, works in real-time, helps you capture any text, numbers should not be a problem.
Disclaimer: I work for ABBYY
Yes. Bender can help you with that. It lets you build and run neural nets on iOS. As it uses Metal under the hood, it runs fast and smooth. It also supports running TensorFlow models directly.
So you can run in Bender an existing model in TensorFlow trained for digit recognition Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras if you need help
Disclaimer: I worked on this project.
I want to analyze MIC audio on an ongoing basis (not just a snipper or prerecorded sample), and display frequency graph and filter out certain aspects of the audio. Is the iPhone powerful enough for that? I suspect the answer is a yes, given the Google and iPhone voice recognition, Shazaam and other music recognition apps, and guitar tuner apps out there. However, I don't know what limitations I'll have to deal with.
Anyone play around with this area?
Apple's sample code aurioTouch has a FFT implementation.
The apps that I've seen do some sort of music/voice recognition need an internet connection, so it's highly likely that these just so some sort of feature calculation on the audio and send these features via http to do the recognition on the server.
In any case, frequency graphs and filtering have been done before on lesser CPUs a dozen years ago. The iPhone should be no problem.
"Fast enough" may be a function of your (or your customer's) expectations on how much frequency resolution you are looking for and your base sample rate.
An N-point FFT is on the order of N*log2(N) computations, so if you don't have enough MIPS, reducing N is a potential area of concession for you.
In many applications, sample rate is a non-negotiable, but if it was, this would be another possibility.
I made an app that calculates the FFT live
http://www.itunes.com/apps/oscope
You can find my code for the FFT on GitHub (although it's a little rough)
http://github.com/alexbw/iPhoneFFT
Apple's new iPhone OS 4.0 SDK allows for built-in computation of the FFT with the "Accelerate" library, so I'd definitely start working with the new OS if it's a central part of your app's functionality.
You cant just port FFT code written in C into your app...there is the thumb compiler option that complicates floating point arithmetic. You need to put it in arm mode
I hope this falls within the "programming question" category.
Im all lightheaded from Googling (and reading every post in here on the subject) on the subject "Computer Vision", but Im getting more confused than enlightened.
I have 6 abstract shapes printed on a piece of paper and I would like to have the camera on the iPhone identify these shapes (from different angles, lightning etc.).
I have used OpenCV a while back(Java) and I looked at other libraries out there. The caveat is that it seems that either they rely on a jail broken iPhone or they are so experimental and hard to use that I would probably end up using days learning libraries only to figure out they didn't work.
I have thought of taking +1000 images of my shapes and training a Haar filter. But again
if there is anything out there that is a bit easier to work with I would really appreciate the advise, suggestion of people with a bit of experience.
Thank you for any suggestion or pieces of advise you might have:)
Have a look at at OpenCV's SURF feature extraction (they also have a demo which uses it to detect objects).
Surf features are salient image features which are invariant to rotation and scale. Many algorithms detect objects by extracting such features from an image, and then use simple "bag of words" classification (comparing the set of extracted image features to the features of your "shapes". Even without referring to their spacial alignment you can have good detection rates if you only have 6 shapes).
While not a library, Chris Greening explains how iPhone Sudoku Grab does its image recognition of puzzles in his post here. He does seem to recommend OpenCV, and not just for jailbroken devices.
Also Glen Low talks a bit about how Instaviz does its shape recognition in an interview for the Mobile Orchard podcast.
I do shape recognition in my iPhone app Instaviz and the routines are actually packaged into a library I call "Recog". Only problem is that it is meant for finger or mouse gesture recognition rather than image recognition. You pass the routines a set of points representing the gesture and it tells you whether it's a square, circle etc.
I haven't yet decided on a licensing model but probably use a minimal per-seat royalty.