I need to recognise numbers from the camera image on iPhone, in real-time. I know there will be no more than 5 digits on the image.
Is this problem realistic to solve given the computational specifications of the iPhone?
Does anyone have any experience using the Tesseract OCR library, and do you think it could be solved by using it?
That depends on your definition of "real-time", but yes, it should be possible to do relatively fast recognition of just the digits 0-9 on an iPhone 4, particularly if you can constrain the fonts, lighting conditions, etc. that they will appear in.
I highly recommend reading the article on how Sudoku Grab does its recognition of puzzles using the iPhone camera. In their case, a trained neural network was used to identify the digits, which should be reasonably simple and fast on modern iOS hardware.
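For a sense of what that involves, here is a minimal sketch of the forward pass such a digit classifier performs. The layer sizes, weight arrays and function name are entirely hypothetical (they are not Sudoku Grab's actual network, which is described in the linked article); the weights would come from offline training:

#include <math.h>

#define INPUTS  256   // e.g. a 16x16 binarized digit image
#define HIDDEN  32    // hypothetical hidden layer size
#define OUTPUTS 10    // digits 0-9

// Returns the index (0-9) of the most likely digit.
int ClassifyDigit(const float pixels[INPUTS],
                  const float w1[HIDDEN][INPUTS], const float b1[HIDDEN],
                  const float w2[OUTPUTS][HIDDEN], const float b2[OUTPUTS])
{
    float hidden[HIDDEN], output[OUTPUTS];

    // Input -> hidden layer with a sigmoid activation.
    for (int h = 0; h < HIDDEN; h++) {
        float sum = b1[h];
        for (int i = 0; i < INPUTS; i++) sum += w1[h][i] * pixels[i];
        hidden[h] = 1.0f / (1.0f + expf(-sum));
    }

    // Hidden -> output layer; pick the highest-scoring digit.
    int best = 0;
    for (int o = 0; o < OUTPUTS; o++) {
        float sum = b2[o];
        for (int h = 0; h < HIDDEN; h++) sum += w2[o][h] * hidden[h];
        output[o] = sum;
        if (output[o] > output[best]) best = o;
    }
    return best;
}

Even on older hardware this is only a few thousand multiply-adds per digit, which is why this kind of approach runs comfortably in real time.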
The current recognition libraries out there, like OpenCV, will use the iPhone's CPU to do the processing. I've heard that they can do even more complex tasks like facial recognition fast enough to use with video sources while showing a minimal amount of stutter.
For even better performance, I believe that there's a lot of potential in the programmable GPUs on the newer iOS devices. In my benchmarks, I saw a 14X - 28X speedup when using the iPhone 4's GPU for simple image processing. While few people are looking at this right now, something like Sudoku Grab's neural network should be a parallel enough process to benefit from running on the GPU.
It should be computationally possible. There are apps that can read a bar code in real time, and also an app that does real-time translation (Word Lens). I'm not sure what libraries they use, however.
Yes, it is possible using the Tesseract engine.
Here is some sample code if you would like to check it out:
https://github.com/nolanbrown/Tesseract-iPhone-Demo
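For reference, a minimal sketch of how the Tesseract C++ API (3.x-style) can be restricted to digits. The pixel buffer, dimensions and tessdata path are placeholders supplied by your own capture code, and the header name/location varies by Tesseract version; the linked demo shows how to bundle tessdata and convert a UIImage.

#include "baseapi.h"   // tesseract::TessBaseAPI (location varies by version)

// Returns a newly allocated UTF-8 string (caller frees it) containing only digits.
char *RecognizeDigits(const unsigned char *pixels, int width, int height,
                      int bytesPerPixel, int bytesPerLine, const char *tessdataPath)
{
    tesseract::TessBaseAPI api;
    api.Init(tessdataPath, "eng");                             // load language data
    api.SetVariable("tessedit_char_whitelist", "0123456789");  // digits only
    api.SetImage(pixels, width, height, bytesPerPixel, bytesPerLine);
    char *text = api.GetUTF8Text();
    api.End();
    return text;
}

Restricting the character whitelist to 0-9 noticeably improves both speed and accuracy compared with full-page OCR.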
There is a free SDK for that: http://rtrsdk.com/ It supports both iOS and Android, works in real time, and helps you capture any text; numbers should not be a problem.
Disclaimer: I work for ABBYY
Yes. Bender can help you with that. It lets you build and run neural nets on iOS. As it uses Metal under the hood, it runs fast and smooth. It also supports running TensorFlow models directly.
So you can run an existing TensorFlow model trained for digit recognition in Bender; see "Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras" if you need help training one.
Disclaimer: I worked on this project.
Is there any library alternative to OpenCV which detects smiles?
I don't want to use OpenCV, as it sometimes fails to detect faces due to the background.
Does anyone know of another library, other than OpenCV?
I would recommend having a look at The Machine Perception Toolbox (MPT Library).
I had a chance to play with it a bit at an openFrameworks/OpenCV workshop at Goldsmiths, and there is a C++ smile detection sample available.
I imagine you can try the MPT Library for iPhone with openFrameworks or simply link to the library from an iPhone project.
"sometimes fails to detect faces due to the background"
An ideal lighting setup will guarantee better results, but given that you want to use this on a mobile device, you must inform your users that smile detection might fail under extreme conditions (bad lighting)
HTH
How are you doing smile detection? I can't see a smile-specific Haar dataset in the default OpenCV face detection cascades. I suspect your problem is training data rather than OpenCV itself.
Egawer is a good starting point if you need a working app to begin with.
https://github.com/Atrac613/egawer-iOS
I checked the training images of smileD_haarcascade_v0.05, and found that they include the full face. So it seems to be a "smiling face" detector rather than a smile detector alone. While this seems easier, it can also be less accurate.
The best is to create your own Haar Cascade XML file, but admittedly most of us developers don't have time for that. You can improve the results considerably by equalizing the brightness of the image.
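As a rough illustration of both points, here is what equalizing the frame and then running a cascade such as smileD_haarcascade_v0.05 can look like. The cascade path, detection parameters and function name are placeholders to tune for your own data:

#include <opencv2/opencv.hpp>

// Sketch: histogram-equalize a grayscale frame, then run a Haar cascade on it.
std::vector<cv::Rect> DetectSmiles(const cv::Mat &grayFrame)
{
    static cv::CascadeClassifier cascade;
    if (cascade.empty())
        cascade.load("smileD_haarcascade_v0.05.xml");   // bundle path is up to you

    cv::Mat equalized;
    cv::equalizeHist(grayFrame, equalized);             // even out brightness

    std::vector<cv::Rect> smiles;
    cascade.detectMultiScale(equalized, smiles,
                             1.1,                // scale factor
                             3,                  // min neighbors
                             0,                  // flags
                             cv::Size(60, 60));  // min object size
    return smiles;
}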
For iOS 7: yes, now you can do it natively, as Core Image has built-in smile detection.
Here is the API diff in iOS 7 Beta 2:
CoreImage
CIDetector.h
Added CIDetectorEyeBlink
Added CIDetectorSmile
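A minimal sketch of using the new key (assuming you already have a CIImage built from your camera frame):

#import <CoreImage/CoreImage.h>

// Detect faces with Core Image on iOS 7+ and check the new smile flag.
BOOL ImageContainsSmile(CIImage *image)
{
    CIDetector *detector = [CIDetector detectorOfType:CIDetectorTypeFace
                                              context:nil
                                              options:@{CIDetectorAccuracy : CIDetectorAccuracyLow}];
    NSArray *features = [detector featuresInImage:image
                                          options:@{CIDetectorSmile : @YES}];
    for (CIFaceFeature *face in features) {
        if (face.hasSmile) {
            return YES;   // at least one smiling face found
        }
    }
    return NO;
}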
I am developing an iPhone application (an audio-processing app), and I have to apply some effects to the audio.
If it were a desktop app there would be many options; we can find good examples and full projects like Audacity. But I want to develop for iPhone.
I found an app with a reverb option (take a look at the following link). I only watched the video; I did not test this application on my iPhone.
http://www.appstorehq.com/reverb-iphone-89870/app
My question is: how can I develop an app with reverb functionality? Is there any documentation for that? If so, please share it with us.
NOTE: We may be able to use an Audio Unit to build the reverb functionality (I am not clear on this).
EDIT: I don't want to use any third-party library.
If anybody has knowledge about this, please share it with us.
Thanks.
If you're targeting iOS 5, you can just use the kAudioUnitSubType_Reverb2 subtype of the effect audio unit.
Reverb unit:
// Describe the Reverb2 effect audio unit (iOS 5 and later)
AudioComponentDescription auEffectUnitDescription;
auEffectUnitDescription.componentType = kAudioUnitType_Effect;
auEffectUnitDescription.componentSubType = kAudioUnitSubType_Reverb2;
auEffectUnitDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
auEffectUnitDescription.componentFlags = 0;
auEffectUnitDescription.componentFlagsMask = 0;

// Add the reverb node to your existing AUGraph (processingGraph)
AUNode auEffectNode;
AUGraphAddNode(processingGraph,
               &auEffectUnitDescription,
               &auEffectNode);
Failing that, you could just write your own reverb code in the RemoteIO render callback. A simple delay might be easier to implement and would sound similar.
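A very rough sketch of what such a delay could look like inside a render callback. It assumes mono, non-interleaved Float32 samples have already been rendered into ioData (e.g. pulled in with AudioUnitRender beforehand); the delay length and feedback constants are arbitrary:

#import <AudioToolbox/AudioToolbox.h>

#define kDelaySamples 11025   // ~0.25 s at 44.1 kHz
static Float32 sDelayLine[kDelaySamples];
static UInt32  sDelayPos = 0;

static OSStatus RenderWithDelay(void *inRefCon,
                                AudioUnitRenderActionFlags *ioActionFlags,
                                const AudioTimeStamp *inTimeStamp,
                                UInt32 inBusNumber,
                                UInt32 inNumberFrames,
                                AudioBufferList *ioData)
{
    const Float32 kFeedback = 0.4f;   // how much delayed signal is mixed back in
    Float32 *samples = (Float32 *)ioData->mBuffers[0].mData;
    for (UInt32 i = 0; i < inNumberFrames; i++) {
        Float32 delayed = sDelayLine[sDelayPos];
        samples[i] = samples[i] + delayed * kFeedback;   // dry + delayed
        sDelayLine[sDelayPos] = samples[i];              // feed output back into the line
        sDelayPos = (sDelayPos + 1) % kDelaySamples;
    }
    return noErr;
}

This is a single feedback delay, not a real reverb, but it is cheap enough for real time and gives you a starting point to build on.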
iOS 5.0 brings native reverb support to OpenAL, so it is now much easier - you don't have to code the algorithm yourself. It also brings support for a variety of reverb spaces:
Small Room
Medium Room
Large Room (2 configurations)
Medium Hall (3 configurations)
Large Hall (2 configurations)
Plate
Medium Chamber
Large Chamber
Cathedral
I suggest that you try the ObjectAL wrapper which already has a great support for the reverb effect:
https://github.com/kstenerud/ObjectAL-for-iPhone
Grab the source from this repository, load "ObjectAL.xcodeproj" and run the ObjectALDemo target on any iOS 5.0 device (should also work on the simulator). This will give you a good starting point and feeling of what the reverb effect is capable of.
If you still don't want to use any 3rd-party library, you can just grab the relevant pieces from ObjectAL. Look for the reverb-related code in the following source files (and their corresponding headers):
https://github.com/kstenerud/ObjectAL-for-iPhone/blob/master/ObjectAL/ObjectAL/OpenAL/ALListener.m
https://github.com/kstenerud/ObjectAL-for-iPhone/blob/master/ObjectAL/ObjectAL/OpenAL/ALSource.m
https://github.com/kstenerud/ObjectAL-for-iPhone/blob/master/ObjectAL/ObjectAL/OpenAL/ALWrapper.m
Good luck with your project!
AUs are a good place to start.
Write your own reverb AU which contains a reverb implementation. There are tons of ways to implement a reverb. A medium/long convolution reverb is too much to ask of a phone, but something such as an FDN (feedback delay network) will not require a lot of memory or CPU.
Both are straightforward to implement if you're familiar with audio programming and optimization. The tough part is actually making one that sounds very good and performs well.
If you're unable to write optimal low-level code, or you do not (presently) understand basic audio signal processing, then you'll have a few obstacles to overcome -- it may be a long road in that case.
Searching the iOS documentation for "reverb" produces a link to the Core Audio Overview, which references reverb as an "effect unit." Perhaps that's worth further study?
No good - I attempted the audio unit approach, and even though it is in the documentation, it is "not" implemented yet by the Apple engineers. Each time you call the function to set the reverb property, you will only get a failure status code. You would have to implement your own reverb effect. Try reading some DSP books and you might find a clue.
You need to learn some DSP-level coding. The DSP cookbook is okay and there are others out there. But basically you need to be comfortable with handling audio signals in the frequency domain and things such as FFTs. Once you have that, implementing a reverb filter should be straightforward.
This is an answer I've given before, but I believe it is relevant here. I am going to agree with the others and say that you are going to have to become a bit more familiar with core-audio if you want to do this properly.
I highly recommend this core-audio book. It will teach what you need to do this right and will save you a lot of frustration.
The chapter on audio effects has not been published yet, but if it is anything like the rest of the book it's worth the wait.
EDIT
You will most likely need to do this with an audio effect (which is a form of an audio unit).
We are developing an iPhone app that needs to process audio data in real time, but we are struggling with performance. The bottlenecks are in the audio effects, which are in fact quite simple, but the performance hit is noticeable when several are added.
Most of the audio effects code is written in C.
We think there are two places we could use GPU hardware to speed things up: using OpenCL for the effects and the hardware for interpolation/smoothing. We are fairly new to this and don't know where to begin.
You probably mean OpenGL, as OpenCL is only present on the desktop. Yes, you could use OpenGL ES 2.0 programmable shaders for this, if you wanted to perform some very fast parallel processing, but that will be extremely complex to pull off.
You might first want to look at the Accelerate framework, which has hardware-accelerated functions for doing just the kind of tasks needed for audio processing. A great place to start is Apple's WWDC 2010 session 202 - "The Accelerate framework for iPhone OS", along with their "Taking Advantage of the Accelerate Framework" article.
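As a small illustration of the kind of call Accelerate gives you (the function and buffer names here are placeholders, not part of any framework), a wet/dry mix for an effect can be vectorized with vDSP like so:

#include <Accelerate/Accelerate.h>

// Sketch: out = dry + wet * wetGain, computed with vectorized vDSP routines.
// In a real effect these would be the Float32 buffers from your audio callback.
void MixWithGain(const float *dry, const float *wet, float wetGain,
                 float *out, vDSP_Length n)
{
    vDSP_vsmul(wet, 1, &wetGain, out, 1, n);   // out = wet * wetGain
    vDSP_vadd(out, 1, dry, 1, out, 1, n);      // out = out + dry
}

Replacing per-sample C loops with calls like these is often the easiest first win before reaching for the GPU.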
Also, don't dismiss Hans' suggestion that you profile your code first, because your performance bottleneck might be somewhere you don't expect.
You might get better DSP acceleration by coding for the ARM NEON SIMD unit. NEON is designed for DSP operations and can pipeline multiple single-precision floating-point operations per cycle. Getting audio data in and out of GPU memory may be possible, but may not be that fast.
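As a rough illustration (the function name, and the assumption that n is a multiple of 4, are mine), a NEON-vectorized gain processing four samples per iteration looks like this; it only compiles for ARM targets:

#include <arm_neon.h>

// Sketch: apply a gain to a buffer, 4 floats at a time. A real implementation
// would also handle the tail when n is not a multiple of 4.
void ApplyGainNEON(const float *in, float *out, float gain, int n)
{
    float32x4_t vgain = vdupq_n_f32(gain);
    for (int i = 0; i < n; i += 4) {
        float32x4_t v = vld1q_f32(in + i);   // load 4 samples
        v = vmulq_f32(v, vgain);             // multiply by gain
        vst1q_f32(out + i, v);               // store 4 samples
    }
}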
But you might want to profile your code to see if something else is the bottleneck. The iPhone 4 CPU can easily keep up with doing multiple FFTs and IIR filters on a real-time audio stream.
I want to analyze mic audio on an ongoing basis (not just a snippet or prerecorded sample), display a frequency graph, and filter out certain aspects of the audio. Is the iPhone powerful enough for that? I suspect the answer is yes, given the Google and iPhone voice recognition, Shazam and other music recognition apps, and guitar tuner apps out there. However, I don't know what limitations I'll have to deal with.
Anyone play around with this area?
Apple's sample code aurioTouch has an FFT implementation.
The apps that I've seen do some sort of music/voice recognition need an internet connection, so it's highly likely that they just do some sort of feature calculation on the audio and send those features via HTTP to do the recognition on the server.
In any case, frequency graphs and filtering have been done before on lesser CPUs a dozen years ago. The iPhone should be no problem.
"Fast enough" may be a function of your (or your customer's) expectations on how much frequency resolution you are looking for and your base sample rate.
An N-point FFT is on the order of N*log2(N) computations, so if you don't have enough MIPS, reducing N is a potential area of concession for you. For example, at a 44.1 kHz sample rate a 1024-point FFT gives bins roughly 43 Hz wide; dropping to 512 points roughly halves the work per transform but doubles the bin width.
In many applications, sample rate is a non-negotiable, but if it was, this would be another possibility.
I made an app that calculates the FFT live
http://www.itunes.com/apps/oscope
You can find my code for the FFT on GitHub (although it's a little rough)
http://github.com/alexbw/iPhoneFFT
Apple's new iPhone OS 4.0 SDK allows for built-in computation of the FFT with the "Accelerate" library, so I'd definitely start working with the new OS if it's a central part of your app's functionality.
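For example, a rough sketch of a real-to-complex FFT using Accelerate's vDSP routines might look like this (the transform size and function name are placeholders; `mag` receives kN/2 squared bin magnitudes):

#include <Accelerate/Accelerate.h>

#define kLog2N 10
#define kN (1 << kLog2N)   // 1024-point FFT

void ComputeMagnitudes(const float *input, float *mag)
{
    static FFTSetup setup = NULL;
    if (!setup) setup = vDSP_create_fftsetup(kLog2N, kFFTRadix2);

    float realp[kN / 2], imagp[kN / 2];
    DSPSplitComplex split = { realp, imagp };

    // Pack the real input into split-complex form and run the FFT in place.
    vDSP_ctoz((const DSPComplex *)input, 2, &split, 1, kN / 2);
    vDSP_fft_zrip(setup, &split, 1, kLog2N, FFT_FORWARD);

    // Squared magnitudes of the kN/2 frequency bins.
    vDSP_zvmags(&split, 1, mag, 1, kN / 2);
}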
You can't just port FFT code written in C into your app as-is... the Thumb compiler option complicates floating-point arithmetic on older ARM devices. You need to build that code in ARM mode.
I hope this falls within the "programming question" category.
I'm all lightheaded from Googling (and reading every post on here about) the subject of "computer vision", but I'm getting more confused than enlightened.
I have 6 abstract shapes printed on a piece of paper and I would like to have the camera on the iPhone identify these shapes (from different angles, lighting, etc.).
I used OpenCV a while back (in Java) and I have looked at other libraries out there. The caveat is that it seems that either they rely on a jailbroken iPhone, or they are so experimental and hard to use that I would probably end up spending days learning a library only to figure out it didn't work.
I have thought of taking 1000+ images of my shapes and training a Haar classifier. But again, if there is anything out there that is a bit easier to work with, I would really appreciate the advice and suggestions of people with a bit of experience.
Thank you for any suggestions or pieces of advice you might have :)
Have a look at OpenCV's SURF feature extraction (they also have a demo which uses it to detect objects).
SURF features are salient image features which are invariant to rotation and scale. Many algorithms detect objects by extracting such features from an image and then using simple "bag of words" classification, comparing the set of extracted image features to the features of your "shapes". Even without referring to their spatial alignment, you can get good detection rates when you only have 6 shapes.
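A rough sketch of that idea using the OpenCV 2.x-style API (the function name and distance threshold are illustrative, not a tuned solution; you would call this once per reference shape and pick the shape with the most good matches):

#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF lives in nonfree in OpenCV 2.4

// Extract SURF features from a camera frame and match them against one
// reference "shape" image; return a crude match count as a similarity score.
int CountSurfMatches(const cv::Mat &frame, const cv::Mat &shapeTemplate)
{
    cv::SurfFeatureDetector detector(400);   // Hessian threshold
    cv::SurfDescriptorExtractor extractor;

    std::vector<cv::KeyPoint> kpFrame, kpShape;
    cv::Mat descFrame, descShape;

    detector.detect(frame, kpFrame);
    detector.detect(shapeTemplate, kpShape);
    extractor.compute(frame, kpFrame, descFrame);
    extractor.compute(shapeTemplate, kpShape, descShape);

    // Brute-force matching; count "good" matches below an ad-hoc distance cutoff.
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<cv::DMatch> matches;
    matcher.match(descShape, descFrame, matches);

    int good = 0;
    for (size_t i = 0; i < matches.size(); i++)
        if (matches[i].distance < 0.25f) good++;
    return good;
}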
While not a library, Chris Greening explains how iPhone Sudoku Grab does its image recognition of puzzles in his post here. He does seem to recommend OpenCV, and not just for jailbroken devices.
Also Glen Low talks a bit about how Instaviz does its shape recognition in an interview for the Mobile Orchard podcast.
I do shape recognition in my iPhone app Instaviz and the routines are actually packaged into a library I call "Recog". Only problem is that it is meant for finger or mouse gesture recognition rather than image recognition. You pass the routines a set of points representing the gesture and it tells you whether it's a square, circle etc.
I haven't yet decided on a licensing model, but will probably use a minimal per-seat royalty.