APIs for converting voice/audio data into text - iPhone

I am working on an iPhone app in which I store users' voice recordings as audio files, and I want to display them as text.
How can this be done? Any ideas about APIs?
Thanks,
Aaryan

Have you seen CMU Sphinx?
Particularly PocketSphinx (written in C).
While it is more recognition-oriented, it has been used for transcription before, so it will depend on what exactly you need.
Further, have you considered a non-native/non-local API, i.e. a web service you could call with your voice data, or are you adamant about a native library/API?
For example, Ribbit has a platform for these sorts of things, and does support transcribing voice to text:
"How do I enable voice-to-text transcriptions?
Available as a paid service, voice-to-text transcriptions are automatically available through the Ribbit API. Please use the $25 Free signup credit to try the service."
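To make the web-service route concrete, here is a minimal sketch of what the client side of such a call might look like. It is written in Python for brevity rather than for the iPhone, and it is not Ribbit's API: the endpoint URL, the bearer-token header and the function name are all assumptions for illustration. It only builds the request; the actual URL, authentication scheme and audio format would come from your provider's documentation.

```python
import urllib.request

# Hypothetical endpoint; substitute the URL from your provider's documentation.
TRANSCRIBE_URL = "https://api.example.com/v1/transcriptions"


def build_transcription_request(audio_bytes, api_key, content_type="audio/wav"):
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=audio_bytes,
        method="POST",
        headers={
            "Content-Type": content_type,
            # Assumed auth scheme; real services may use a different one.
            "Authorization": "Bearer " + api_key,
        },
    )


# Placeholder bytes stand in for the recorded audio file.
request = build_transcription_request(b"RIFF....WAVEfmt ", "my-api-key")
# To actually send it: urllib.request.urlopen(request).read()
```

The same shape carries over to the phone: record to a file, POST the bytes, and display the text that comes back in the response.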

There is one app that does this already: Jott. The way they do it is to send the file to transcribers in India! (source)

You will have to develop the voice recognition engine yourself, I'm afraid. No library that I know of can do this. Apart from that, the iPhone CPU would probably not be powerful enough.

Related

Audio Fingerprinting implementation on iPhone

It's the first time I've posted a question on a blog, but it seems to me this is the best resource on the web for that.
I'm looking for a way to implement audio fingerprinting in an application for iPhone. I had a look at the Last.fm fingerprinter, since I already use other Last.fm API calls, but porting it to the iPhone seems to be a mess and I'm quite sure it would be highly inefficient.
Should I give up the search for now? I am looking for a free service: I'm a young private developer and don't have sufficient economic resources for a paid one. This is also why I cannot install the library on my web server and run it remotely, sending just the audio data to it; the hosting I rely on doesn't allow me to install third-party applications...
MusicBrainz seems to be a solution, but I'm not quite sure how to obtain the fingerprint...
Any suggestions, hint, tips, links, search queries, anything?
Thanks in advance!
Christian
Check out Echoprint, an open source audio fingerprinting service provided by The Echo Nest.
Here is the iOS example they provide in their Echoprint GitHub repo.

Is it possible for a mobile webpage to capture a picture?

Assuming you built a page for each specific mobile browser (Android/iOS/BB/etc.), is it possible to have a web application capture an image and send it to the server for processing?
I'd like there to be "Nothing to install" for my application, but if I need to reach out to the hardware at all, I fear it's not possible.
There is the Video Capture API, but it is very new and I have no idea how widely adopted it is at present.
If this API isn't available, there isn't really much you can do other than asking users to take the picture beforehand and upload it using a standard file upload.
This is one area where a native application would be far, far better, as integration would be easier and more seamless for the user.

Voice Synthesis for the iPhone

I know that Apple hasn't given access to voice recognition, but do we have access to voice synthesis? If they haven't given us an API, would it be possible to hack the accessibility APIs to work even for people with VoiceOver turned off?
The last time I checked, the APIs required for voice synthesis (NSSpeechSynthesizer) were only available on Mac OS. They have not yet been ported to the iPhone.
I have heard a lot from this company:
http://www.acapela-group.com/acapela-for-iphone-multilingual-speech-synthesis-available-for-iphone-applications--2028-speech-synthesis.html
Their product is supposed to work quite well, although their licensing scheme is a bit steep.
The license is a bit steep, but the text-to-speech is excellent. At the end of the day, unless you can write your own TTS engine, you are over a barrel if your app requires that feature. I have looked at some of the free implementations and they are just not ready for prime time. I guess the question is "Is the profit left after that percentage worth writing the application?"

Speech Recognition on iPhone

I need to develop an iPhone application which recognizes speech, and based on the result it performs further tasks.
I know iPhone 3.0 doesn't support speech recognition, so I need to run speech recognition software on the server side. That is all I know; since I am a newbie, I don't know how to go about it.
I mean: which software do I need to buy and set up on the server side, and how do I use that service?
The best open source speech recognition package I know of is Sphinx.
http://cmusphinx.sourceforge.net/
Otherwise, I would suggest looking into Nuance software.
Current speech recognition does well with a limited grammar set (if you know what they are going to say). Open dictation still doesn't quite work well enough to be used reliably for many applications. Keep that in mind while developing your application. I'm hoping now that Google is getting into the transcription game (with Google Voice) that should start improving. I'm thinking they will probably have something in the future.
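As a rough illustration of the limited-grammar point, here is a small Python sketch that snaps whatever text a recognizer returns onto a short list of expected commands and rejects everything else. The command list, the 0.6 cutoff and the function name are assumptions made up for this example; fuzzy string matching with the standard-library `difflib` module stands in for a real grammar-constrained recognizer, which would apply the constraint during decoding rather than afterwards.

```python
import difflib

# Hypothetical command set: the "limited grammar" the app expects to hear.
COMMANDS = ["call home", "play music", "stop music", "send message"]


def match_command(hypothesis, commands=COMMANDS, cutoff=0.6):
    """Return the known command closest to the recognizer output, or None."""
    # Normalize case and whitespace before comparing.
    normalized = " ".join(hypothesis.lower().split())
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this in place, a slightly misrecognized phrase such as "Play  Musik" still resolves to "play music", while open dictation like "what is the weather tomorrow" is rejected instead of triggering the wrong action.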
I don't think there are many server side speech recognition software suites. Open source versions seem virtually non-existent. You might want to take a look at this SDK though:
http://www.scribd.com/doc/17247334/Creaceed-Releases-iPhone-Speech-Recognition-SDK
http://www.creaceed.com/weblog/ceedvocalsdk.html
It might allow you to do what you want on the iPhone itself.
Getting speech recognition right is very tricky and an active research area.
There are a few open-source solutions out there, though, see here. An additional, new one is SCARF, but I don't know if that is ready to use or rather just a proof of concept.
Check out the Nuance Mobile Developer program. We've got libraries for various platforms (including iOS) and an HTTP service if necessary.

What is the iPhone SDK Missing?

I've been doing mobile app development for a long time (2001?), but the systems we worked with back then were dedicated mobile development environments (Symbian, J2ME, BREW). iPhone SDK is a curious hybrid of Mac OS X and Apple's take on mobile (Cocoa Touch).
But it is missing some stuff that other mobile systems have, IMO. Specifically:
Application background processing
SMS/MMS application routing (send an SMS to my application in the background)
API for accessing phone functions/call history/call interception
I realize that Apple has perfectly valid reasons for releasing the SDK the way they did. I am curious what people on SO think the SDK is missing and how would they go about fixing/adding it, were they an Engineering Product Manager at Apple.
The biggest shortcoming in my opinion is support for separating licensing from distribution.
What I mean by this is that it should be possible to download a trial version of an application and later purchase a license for that application (from an API call inside the application or from the app store). This would make it much easier to try-before-you-buy and get rid of the current duplicates of many applications with 'lite' versions.
I think lack of push notifications for apps is the big thing we're missing right now. With push, you can register your application to perform a task (like getting the most recent data from a web service) even when it's not running, at a time and frequency the OS decides is best. In an ideal world, along with the existing concept of iPhone apps loading quickly and resuming where you last left off, this solves the problem of not running in the background. I know some tasks will be more difficult or maybe impossible with this strategy, but it's still a pretty good compromise between third party applications and the iPhone's limited hardware.
Originally push was scheduled for last September, but it was removed from the beta SDK and not spoken of since then.
APIs I'm personally looking for:
Apple80211 as a public API (private, current API is fine if documented)
Access to Volume buttons (semi-accessible via Celestial, private, needs new API)
Access to Calendar (private, API status unknown)
Access to Bluetooth + SPP profile (status unknown)
Access to Camera (directly, API status unknown)
Access to JavaScript runtime (directly, not through UIWebView, API status unknown)
WebKit access that's lower-level than UIWebView (private, current API is fine)
Access to Music Library (private, current API is fine)
Garbage Collection.
CoreData is missing.
You've mentioned some of the big ones - copy & paste (or in fact any way for apps to collaborate) is another huge omission.
It also seems to lack a desktop synch framework (at least if it exists I can't find it).
Language independence, and especially the lack of scripting, is another pet peeve: Objective-C is all very well, but more languages to choose from would be good.
Inability to dynamically extend apps, via scripts or otherwise, is another big omission. This is partly an SDK/OS issue, partly licensing.
My list ordered by priority:
Mapping abstraction (the MapKit looks awesome), but that would require a new Google Maps TOS
Music library
Camera (photo + video)
Access to more UIViews, Apple designed some pretty nice custom ones for their apps
Better UIWebKit abstraction
The features I see missing that it should have are:
Access to SMS
Direct access to the Google Maps app. You should be able to have access to this so you could extend your application to use the built-in features provided by Google Maps.
Access to the Bluetooth functionality of the phone.
Access to the Calendar. Why not allow access to simply post a calendar event for the user.
Access to ActiveSync. It would be great if we could directly access this and communicate back to the Exchange Server.
Core Image. They provide Core Animation but Core Image is missing. I hope that this is added to the API soon.
These are some of the features that my clients have had access to in the past and are surprised to find unavailable.
We definitely miss a Calendar API and SMS access. So many applications could leverage such APIs. The iPhone allows users to have everything in their pocket, but it's almost useless as long as developers cannot leverage this integration in their apps.
A language with proper namespaces.
A limitation that bugs me is lack of access to system features that require root or setuid. For example: opening privileged IP ports.
I'm not sure there is a good solution to this, as long as Apple's policy is to keep the device locked-down.
Allow program to set some kind of local timed event for your application to bring up an alert and launch your app if the user agrees (like any calendar app). You could do that with push notifications but there are many cases I'd hate to have to rely on a whole server infrastructure and network connectivity just to basically do some timed thing.
Some idea of what direction the user is facing. I cannot believe the GPS chip the newer iPhones use is not capable of reporting direction.
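Without a compass, about the best an app can do is estimate the direction of travel from two successive location fixes. The Python sketch below uses the standard great-circle initial-bearing formula; the function name is made up for this example. Note that it gives the course over ground while the user is moving, not the direction they are facing, so it is only a partial workaround for the wish above.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from fix 1 to fix 2, in degrees.

    Inputs are latitudes/longitudes in degrees; 0 = north, 90 = east.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    # atan2 returns an angle in (-180, 180]; map it onto [0, 360).
    return math.degrees(math.atan2(x, y)) % 360.0
```

In practice the two fixes need to be far enough apart that GPS noise does not dominate, so the estimate is unreliable when the user is standing still or walking slowly.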
I would personally love to see
Access to the CoreTelephony framework (currently private), which allows access to all the phone functions (especially sending MMS/SMS).
Some sort of ability to run stuff in the background. Push notifications are OK for most things, but it is a bit hard to leverage CoreLocation with them (i.e. have the app show a notification at a certain location). Of course this would probably need an on/off button, or an app-specific setting like push has.
An animation view that would reduce the work developers need to do to make a cool app. Of course the core business logic still needs more consideration, but the view layer could be easier to use.