I need to find the exact time position in a video using a small part of its audio. I would prefer to use an external API, such as the Google, Amazon, or Microsoft voice/audio solutions. Does anyone know what I can use? Thanks.
Related
I want to integrate voice detection in my iPhone app. The app lets the user search for a word by using their voice. But I don't know anything about voice recognition on the iPhone. Can you please suggest any ideas, tutorials, or sample code for this?
You can also use the Google Chrome speech API to integrate voice recognition into your application, but there is a big problem: the API works only with FLAC-encoded files, and that encoding isn't supported natively on iOS... :/
You can see these two links for more information:
http://www.albertopasca.it/whiletrue/2011/09/objective-c-use-google-speech-iphone/
http://8byte8.com/blog/2012/07/voice-recognition-ios/
EDIT :
I built an application with voice recognition using the Nuance SDK, but it's not free to use. You can register for free and get a developer key that lets you test your application for 90 days. An example application is included; you can look at its code, and it's very easy to implement.
Good luck :)
The best approach will probably be to:
Record the voice on the phone
Send the recording to a server that runs the speech recognition software
Then return something to the phone to indicate what it should do
This approach is favorable because there are a number of open-source speech-to-text packages out there, and you are not limited by the phone's computing power, since the heavy processing happens on the backend.
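As a rough illustration of the record-and-send approach, here is a minimal sketch using AVAudioRecorder and URLSession. The server URL and response format are hypothetical placeholders; a real backend would define its own endpoint and expected audio encoding.

```swift
import AVFoundation

// Minimal sketch of the record-and-send approach.
// The server URL below is a hypothetical placeholder.
final class VoiceUploader: NSObject, AVAudioRecorderDelegate {
    private var recorder: AVAudioRecorder?
    private let fileURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("command.m4a")

    func startRecording() throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record, mode: .default)
        try session.setActive(true)

        let settings: [String: Any] = [
            AVFormatIDKey: kAudioFormatMPEG4AAC,
            AVSampleRateKey: 16_000,   // plenty for speech
            AVNumberOfChannelsKey: 1,
        ]
        recorder = try AVAudioRecorder(url: fileURL, settings: settings)
        recorder?.delegate = self
        _ = recorder?.record()
    }

    func stopAndUpload() {
        recorder?.stop()
        var request = URLRequest(url: URL(string: "https://example.com/recognize")!)
        request.httpMethod = "POST"
        request.setValue("audio/mp4", forHTTPHeaderField: "Content-Type")
        // Upload the file; the response would carry the recognized text.
        URLSession.shared.uploadTask(with: request, fromFile: fileURL) { data, _, error in
            if let data = data, let text = String(data: data, encoding: .utf8) {
                print("server said: \(text)")
            } else if let error = error {
                print("upload failed: \(error)")
            }
        }.resume()
    }
}
```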
Having said that, iOS has OpenEars, which is based on PocketSphinx. It looks promising...
Voice recognition is not something the iPhone provides by itself. All you can do is record the voice on the iPhone. Once that's done, you can either write your own voice recognition module or find a third-party API and reuse it.
A Google search will turn up several options.
I am working on a project where I have to record a voice, convert it into text, match a pattern, and perform an action according to the user's command.
I am able to record the user's voice through AVAudioRecorder and perform an action, but the action is performed on anything the user says. I want it to respond only to particular words; for example, if the user says "play", then playback should start.
Please point me to any tutorial or sample code.
Thanks in advance.
Most apps (including Siri) send the sound file to a remote data center to do the speech recognition, which involves some fairly heavy-duty processing. Nuance may have a commercial API.
Another option might be to try the CMU OpenEars or PocketSphinx speech library, which has been ported to the iPhone. Also look at VocalKit and this article on running PocketSphinx on the iPhone.
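Whichever recognizer you use, the keyword-triggering part of your question is just string matching on the transcript it returns. A minimal sketch, assuming you already get a transcript string back from OpenEars, PocketSphinx, or a server; the command names and actions here are hypothetical examples:

```swift
// Dispatch actions from a transcript produced by any recognizer
// (OpenEars/PocketSphinx, a server-side engine, etc.).
enum Command: String {
    case play, pause, stop
}

func handle(transcript: String) {
    let words = transcript.lowercased()
        .components(separatedBy: .whitespacesAndNewlines)
    for word in words {
        guard let command = Command(rawValue: word) else { continue }
        switch command {
        case .play:  print("start playback")   // replace with player.play()
        case .pause: print("pause playback")
        case .stop:  print("stop playback")
        }
        return // act on the first recognized command only
    }
}

handle(transcript: "Please play the song") // prints "start playback"
```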
Generally, I want to make an app with video chat functionality for the iPhone. But after a lot of searching, I am still not able to find any successful results. Is there any public, or even private, API available for doing this on the iPhone? If the answer is yes, please help me.
Basically, what I want is to read the video streams on both of the devices connected for chatting. Thanks a lot in advance, and please help me if you can.
P.S. - I have already checked iDoubs, but it failed: it always shows some unknown problem and, because of that, doesn't let me connect to anyone.
ALSO: The suggested method I have found is HTTP Live Streaming. But I have multiple doubts about that too.
1.) How do I upload my video from the iPhone to the HTTP server from which I would be broadcasting?
2.) Can you please post something about setting up the server? How do I feed the video to the FFmpeg server?
Mainly, I need to work out the upload method. Right now I am simply sending hex code in the form of NSData to the server, and I am stuck there. The main problem is that it is live. How do I handle that?
It would be best, if you could help me make the iDoubs work properly.
Thank you so much for any kind of support!
Have a look at this guide on how to implement video chat on the iPhone. But before starting, you must have an IMS server up and running.
Here is a live video chat framework that does what you are looking for. It is easy and simple to implement for face-to-face video chat. I have already tried it, and it works very well. A great thing about this framework is its support for multiple platforms.
Tokbox : https://tokbox.com/platform
https://tokbox.com/opentok/tutorials/
Sample Code:
https://github.com/opentok/opentok-ios-sdk-samples/
Edit:
Here is an article explaining OpenTok together with Parse.
http://www.iphonegamezone.net/ios-tutorial-create-iphone-video-chat-app-using-parse-and-opentok-tokbox/
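For orientation, here is a minimal sketch of what connecting to an OpenTok session looks like. It assumes the OpenTok iOS SDK linked above; the API key, session ID, and token are placeholders that must come from your own TokBox/OpenTok account and your server:

```swift
import OpenTok
import UIKit

// Minimal OpenTok connection sketch; credentials are placeholders.
class VideoChatViewController: UIViewController, OTSessionDelegate {
    let apiKey = "YOUR_API_KEY"        // from your OpenTok dashboard
    let sessionId = "YOUR_SESSION_ID"  // created by your server
    let token = "YOUR_TOKEN"           // generated per user by your server
    var session: OTSession?

    override func viewDidLoad() {
        super.viewDidLoad()
        session = OTSession(apiKey: apiKey, sessionId: sessionId, delegate: self)
        var error: OTError?
        session?.connect(withToken: token, error: &error)
        if let error = error { print("connect failed: \(error)") }
    }

    // MARK: - OTSessionDelegate
    func sessionDidConnect(_ session: OTSession) {
        // Typically you create and publish an OTPublisher here.
        print("connected to session")
    }
    func sessionDidDisconnect(_ session: OTSession) { print("disconnected") }
    func session(_ session: OTSession, didFailWithError error: OTError) {
        print("session error: \(error)")
    }
    func session(_ session: OTSession, streamCreated stream: OTStream) {
        // Subscribe here to see the remote participant's video.
    }
    func session(_ session: OTSession, streamDestroyed stream: OTStream) {}
}
```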
HTTP Live Streaming is primarily an approach for adaptive streaming from server to client. For client-to-server, go for traditional streaming instead. There is an open library for streaming; see this question.
While FaceTime shows that two-way chat is possible, it is not certain that you will be able to do it using public iOS APIs. That said, I have implemented one-way live streaming on the iPhone, and the difficult part was not the core streaming itself but the encoding of the payload. You will be able to do H.264 in hardware and AAC/iLBC in software.
How you feed this to FFmpeg depends on your transport, possibly changing from 'file'-oriented H.264 frames to 'streaming' H.264. Check out the H.264 frame types if you implement frame dropping. Reconfiguring the H.264 encoder on the fly is not possible to my knowledge, but restarting it with fresh parameters typically takes no more than a second or so.
Did you attempt to play back a live resource while capturing? That is a good starting point. If you come across an open API for H.264 encoding, please post it here ;-)
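For the hardware H.264 part, one route on iOS is AVAssetWriter with H.264 output settings. A minimal sketch, with illustrative dimensions and output location; getting the encoded frames out for live streaming, rather than into a finished file, is the part you would still need to solve:

```swift
import AVFoundation

// Sketch: hardware H.264 encoding via AVAssetWriter.
// Dimensions and the output URL are illustrative placeholders.
func makeH264Writer(to url: URL) throws -> (AVAssetWriter, AVAssetWriterInput) {
    let writer = try AVAssetWriter(outputURL: url, fileType: .mp4)
    let settings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 640,
        AVVideoHeightKey: 480,
    ]
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
    input.expectsMediaDataInRealTime = true // live capture must not stall
    writer.add(input)
    guard writer.startWriting() else {
        throw writer.error ?? CocoaError(.fileWriteUnknown)
    }
    writer.startSession(atSourceTime: .zero)
    // Feed CMSampleBuffers from your AVCaptureSession's video data output:
    //   if input.isReadyForMoreMediaData { input.append(sampleBuffer) }
    return (writer, input)
}
```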
I am trying to get started with advanced audio in the iPhone SDK. I really want to make professional-level audio components. I know the basics (e.g., how to use AVAudioPlayer), but I don't know where to go for more complicated audio work (e.g., oscillators and audio cones). Does anyone know where to look for this? (I tried researching online, and all that came up were simplistic audio components.)
Core Audio. That's where you'll find an OpenAL implementation. You might also want to look at the Audio Processing Graph API (also part of Core Audio).
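As one concrete example of going beyond AVAudioPlayer: a sine-wave oscillator. The sketch below uses AVAudioSourceNode from AVAudioEngine (the modern Swift layer over Core Audio, iOS 13+); the older route would be an AudioUnit render callback in an audio processing graph, with the same per-sample idea:

```swift
import AVFoundation

// Sine-wave oscillator sketch: fill the output buffers sample by sample.
final class Oscillator {
    private let engine = AVAudioEngine()
    private var phase: Float = 0

    func start(frequency: Float = 440) throws {
        let sampleRate = Float(engine.outputNode.outputFormat(forBus: 0).sampleRate)
        let increment = 2 * Float.pi * frequency / sampleRate

        let source = AVAudioSourceNode { _, _, frameCount, audioBufferList in
            let buffers = UnsafeMutableAudioBufferListPointer(audioBufferList)
            for frame in 0..<Int(frameCount) {
                let sample = sin(self.phase) * 0.25   // attenuate to avoid clipping
                self.phase += increment
                if self.phase > 2 * Float.pi { self.phase -= 2 * Float.pi }
                for buffer in buffers {
                    let out = buffer.mData!.assumingMemoryBound(to: Float.self)
                    out[frame] = sample
                }
            }
            return noErr
        }

        engine.attach(source)
        engine.connect(source, to: engine.mainMixerNode, format: nil)
        try engine.start()
    }
}
```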
In my application, I need to open the camera and read sheet-music data of the kind shown in the attached image:
By reading this, the API should return an audio file generated from the image above.
Do tell me if there is some other way to do this than the one I am trying. Thanks in advance.
OpenOMR is some GPL-licensed Optical Music Recognition code available on SourceForge, although there does not appear to be any iPhone/iOS port. So you'd have to do a major port (including getting a JVM running on iOS) to create a music-OCR API from this code.
There's some history on the subject of Music OCR on Wikipedia.