I've googled around a few times, but I have not gotten a straight answer. I have a matrix that I would like to convolve with a discrete filter (e.g. the Sobel operator for edge detection). Is it possible to do this in an accelerated way with OpenGL ES on the iPhone?
If it is, how how? If it is not, are there other high-performance tricks I can use to speed up the operation? Wizardly ARM assembly operations that can do it fast? Ultimately I want to perform as fast of a convolution as possible on an iPhone's ARM processor.
You should be able to do this using programmable shaders under OpenGL ES 2.0. I describe OpenGL ES 2.0 shaders in more detail in the video for my class on iTunes U.
Although I've not done image convolution myself, I describe some GPU-accelerated image processing for Mac and iOS here. I present a sample application that uses GLSL shaders (based on Core Image filters developed by Apple) that does realtime color tracking from the iPhone's camera feed.
Since I wrote this, I've created an open source framework based on the above example which has built-in image convolution filters, ranging from Sobel edge detection to custom 3x3 convolution kernels. These can run up to 100X faster than CPU-bound implementations.
However, if you were to do this on the CPU, you might be able to use the Accelerate framework to run some of the operations on the iPhone's NEON SIMD unit. In particular, FFT operations (which are usually a key component in image convolution filters, or so I've heard) can get a ~4-5X speedup by using the routines Apple provides here.
Related
I've looked into Metaio which can do Facial 3d reconstructions
video here: https://www.youtube.com/watch?v=Gq_YwW4KSjU
but I'm not looking to do that. I want to simply be able to have the user scan in a small simple object and a 3d model be created from that. I don't need it textured or anything. As far as I can tell Metaio cannot do what I'm looking for, or at least I can't find the documentation for it.
Since you are targeting mobile, you would have to take multiple pictures from different angles and use an approach used in this CSAIL paper.
Steps
For finding the keypoints, I would use FAST, or a method using the Laplacian of Gaussian. Other options include SURF and SIFT.
Once you identify the points, use triangulation to find where the points will be in 3D.
With all of the points, create a point cloud. In unity, I would recommend doing something similar to this project, which used particle systems as the points.
You now have a 3d reconstruction of the object!
Now, in implementing each of these steps, you could reinvent the wheel, or use C++ native plugins in Unity. This enables you to use OpenCV which has many of these operations already implemented (SURF, SIFT, possibly even some 3D reconstruction classes/methods, which use Stereo Calibration*).
That all being said... the Android Computer Vision Plugin(also apparently called "Starry Night") seems to have these capabilities. However, in version 1.0, only PrimeSense sensors are supported. See the description of the plugin**
Starry Night is an easy to use Unity plugin that provides high-level 3D computer vision processing functions that allow applications to interact with the real world. Version 1.0 provides SLAM (Simultaneous Localization and Mapping) functions which can be used for 3D reconstruction, augmented reality, robot controls, and many other applications. Starry Night will interface to any type of 3D sensor or stereo camera. However, version 1.0 will interface only to a PrimeSense Carmine sensor.
*Note: That tutorial is in matlab, but I think the overview section gives a good understanding of stereo calibration
**as of May 12th, 2014
I'm making an app on the iphone and I need a way of detecting the tune of the sounds coming in through the microphone. (I.e. A#, G, C♭, etc.)
I assumed I'd use AVAudio but I really don't know and I can't find anything in the documentation..
Any help?
Musical notes are nothing more than specific frequencies of sound. You will need a way to analyze all of the frequencies in your input signal, and then find a way to isolate the individual notes.
Finding frequencies in an audio signal is done using the Fast Fourier Transform (FFT). There is plenty of source code available online to compute the FFT from an audio signal. In particular, oScope offers an open-source solution for the iPhone.
Edit: Pitch detection seems to be the technical name for what you are trying to do. The answers to a similar question here on SO may be of use.
There's nothing built-in to the iOS APIs for musical pitch estimation. You will have to code your own DSP function. The FFTs in the Accelerate framework will give you spectral frequency information from a PCM sampled waveform, but frequency is different from psycho-perceptual pitch.
There are a bunch of good and bad ways to estimate frequency and pitch. I have a long partial list of various estimation methods on my DSP resources web page.
You can look at Apple's aurioTouch sample app for an example of getting iOS device audio input and displaying it's frequency spectrum.
Like #e.James said, you are looking to find the pitch of a note, its called Pitch Detection. There are a ton of resources at CCRMA, Stanford University for what you are looking for. Just google for Pitch Detection and you will see a brilliant collection of algorithms. As far as wanting to find the FFT of blocks of Audio Samples, you could use the built-in FFT function of the Accelerate Framework (see this and this) or use the MoMu toolkit. Using MoMu has the benefit of it's functions decomposing the audio stream into samples for you and easy application of the FFT using it's own functions.
Is there any library alternative to OpenCV which detects smile.
I dont want to use OpenCV as it sometimes fails to detect faces due to background.
Any one knw other library ? other than OpenCV ?
I would recommend having a look at The Machine Perception Toolbox (MPT Library).
I had a chance to play with it a bit at an Openframeworks OpenCV workshop at Goldsmiths and there is a c++ smile detection sample available.
I imagine you can try the MPT Library for iPhone with openframeworks or simply link to the library from an iphone project.
sometimes fails to detect faces due to
background.
An ideal lighting setup will guarantee better results, but given that you want to use this on a mobile device, you must inform your users that smile detection might fail under extreme conditions (bad lighting)
HTH
How are you doing smile detection? I can't see a smile-specific Haar dataset in the default OpenCV face detection cascades. I suspect your problem is training data rather than OpenCV itself.
Egawer is a good starting point if you need a working app to begin with.
https://github.com/Atrac613/egawer-iOS
I checked the training images of smileD_haarcascade_v0.05, an found that they include the full face. So, it seems to be a "smiling face" detector rather than a smile detector alone. While this seems easier, it can also be less accurate.
The best is to create your own Haar Cascade XML file, but admittedly most of us developers don't have time for that. You can improve the results considerably by equalizing the brightness of the image.
iOS 7 now has native support of simile detection in CoreImage. Here is the API diff:
For iOS 7, Yes, now you can do it with CoreImage.
Here is the API diff in iOS 7 Beta 2:
CoreImage
CIDetector.h
Added CIDetectorEyeBlink
Added CIDetectorSmile
I would like to apply image processing on pictures taken on the iPhone.
This processing would involve 2D matrix convolutions etc.
I'm afraid that the performance with nested NSArrays would be pretty bad. What is the right way to manipulate pixel based images? Should I simply use C arrays allocated with malloc?
Have you looked at the Quartz 2D engine available in the iPhone SDK? Or perhaps Core Graphics? Apple has a nice overview document describing all the different imaging technologies available on the iPhone. Unfortunately there isn't anything as nice as ImageKit on the iPhone yet.
I suggest to use OpenCV image processing library since it contains well optimized algorithms almost for anything you want. OpenCV will be definitely faster than using manual processing with NSArray.
But there are one major drawback - OpenCV library is written on C/C++, so you will have to convert your NSImage to native OpenCV image format to do processing. But it's really easy to google how to do this.
I use OpenCV in my own iPhone project, here is small how-to post of building OpenCv for IPhone: http://computer-vision-talks.com/2010/12/building-opencv-for-ios/
Yes, you would use a C array since that's how you get back the pixel data anyway.
As mentioned, you should look and see if you can use Quartz2D to do the manipulations you are interested in as it would probably perform better being hardware based. If not, just do your own over the array of pixels.
The iPhone also supports OpenCL, and it's GPU has way more processing power than the CPU.
I hope this falls within the "programming question" category.
Im all lightheaded from Googling (and reading every post in here on the subject) on the subject "Computer Vision", but Im getting more confused than enlightened.
I have 6 abstract shapes printed on a piece of paper and I would like to have the camera on the iPhone identify these shapes (from different angles, lightning etc.).
I have used OpenCV a while back(Java) and I looked at other libraries out there. The caveat is that it seems that either they rely on a jail broken iPhone or they are so experimental and hard to use that I would probably end up using days learning libraries only to figure out they didn't work.
I have thought of taking +1000 images of my shapes and training a Haar filter. But again
if there is anything out there that is a bit easier to work with I would really appreciate the advise, suggestion of people with a bit of experience.
Thank you for any suggestion or pieces of advise you might have:)
Have a look at at OpenCV's SURF feature extraction (they also have a demo which uses it to detect objects).
Surf features are salient image features which are invariant to rotation and scale. Many algorithms detect objects by extracting such features from an image, and then use simple "bag of words" classification (comparing the set of extracted image features to the features of your "shapes". Even without referring to their spacial alignment you can have good detection rates if you only have 6 shapes).
While not a library, Chris Greening explains how iPhone Sudoku Grab does its image recognition of puzzles in his post here. He does seem to recommend OpenCV, and not just for jailbroken devices.
Also Glen Low talks a bit about how Instaviz does its shape recognition in an interview for the Mobile Orchard podcast.
I do shape recognition in my iPhone app Instaviz and the routines are actually packaged into a library I call "Recog". Only problem is that it is meant for finger or mouse gesture recognition rather than image recognition. You pass the routines a set of points representing the gesture and it tells you whether it's a square, circle etc.
I haven't yet decided on a licensing model but probably use a minimal per-seat royalty.