I am building an application that gets real-time audio from our organization's VoIP system, records the call and transcribe the real-time voice. The transcription then passed to our analytics engine and get the insights.
We are able to transcribe the recorded audio and get the insights from the transcription. We have a solution for real-time transcription also. It will transcribe the voice from the microphone and even an RTSP stream also. We are having trouble finding a solution for getting the real-time audio from SIP/VoIP systems. I read that SIP Trunking and option and also WebRTC is also another option. But I don't know how to and where to start with.
I am experienced in Java and Python, I requesting experts to give me suggestions or examples on how to get the real-time audio stream from a SIP/VoIP conversation.
I am not familiar with SIP/VoIP and never written VoIP application.
A solution that might suit your needs is Oreka, which is the open source version of Orecx, a call recording software for VoIP.
I used it in the past and it works perfectly well with SIP calls that use open audio codecs like g711 (alaw,ulaw) or speex but it may have problems decoding the audio of calls that use the propietary g729 codec (I had to work out my own codecs at that time).
The paid version might support more codecs and protocols like Avaya's H323.
Have in mind that this app works by sniffing the network, so the setup is not trivial. Anyway, I suggest you give it a try.
Link: https://www.orecx.com/open-source/
For anyone out there. if you want to have access to live/realtime audio data from a VoIP call I suggest you use Twilio Streams.
If you're just looking to get realtime transcriptions without access to the actual audio data Twilio and Plivo also provide that.
Related
So generally, I want to make an app which has video chat functionality for iPhone. But after many searches, I am still not able to find any successful results. Is there any public or even for that matter, private API available for doing this on iPhone??? If you have an YES answer, please help me.
Basically, what I want is to read the streams of the video on both the devices connected for chatting. Thanks a lot in advance and please help me if you can.
p.s - I have already checked iDoubs but it failed and always shows some unknown problem and for that reason, doesn't allow me to connect to anyone.
ALSO : The suggested method I have found is via HTTP Live Streaming. But, in that too, I have multiple doubts.
1.) I need to find how do I upload my video from iPhone to the HTTP server from where I would be broadcasting?
2.) Can you please post something related to setting up the server? How do I feed the video to the FFMPEG Server?
Mainly, I need to find the upload method. I am right now simply sending hex-code in the form of NSDATA to the server and I am stuck there. The main problem is, It is live. How do I handle that?
It would be best, if you could help me make the iDoubs work properly.
Thank you so much for any kind of support!
have a look on this how to implement video chat in iphone But before starting you must have a IMS server up & running.
here is the live video chat framework what you are looking for. Its easy and simple to implement for face to face video chat. I have already tried this. Its working very fine. Great thing about this framework is multiple platform support.
Tokbox : https://tokbox.com/platform
https://tokbox.com/opentok/tutorials/
Sample Code:
https://github.com/opentok/opentok-ios-sdk-samples/
Edit:
Here is the article explaining opentok using parse.
http://www.iphonegamezone.net/ios-tutorial-create-iphone-video-chat-app-using-parse-and-opentok-tokbox/
HTTP live streaming is primarily an approach for adaptive streaming from server-to-client. For client-to-server rather go for traditional streaming. There exists an open library for streaming, see this question.
Whilst it is possible to facetime to do two-way chat, it is not certain that you will be able to using public iOS APIs. That said, I have implemented one-way live streaming for iPhone and the difficult part was not the core streaming itself, but encoding of the payload. You will be able to do H264 in hardware and AAC / iLBC in software.
How you want to feed this to the FFMPEG depends on your transport, possibly changing from 'file' H264 frames to 'streaming' H264. Check out the H264 frame types if you implement frame dropping; reconfiguring the H264 encoder on-the-fly is not possible to my knowledge, but restarting with fresh parameters typically does not take more than a second or so.
Did you attempt to play back a live resource while capturing? That is a good starting point. If you come across an open API for H264 encoding, please post it here ;-)
I realize that the official supported streaming protocol for the iPhone is HTTP streaming . This is great, but many appliances implement the RTSP protocol to stream video. I have looked around for quite some time looking for RTSP libraries in objective c and have not found them. Does anyone know of such libaries?
If not, does anyone know of some demo/code examples from people who have tried to get this to work. Since Apple supports h264 in hardware, I'm assuming it is possible to get low level, implement the stream, then construct the video packet and pass it along as if you have streamed using HTTP streaming. Anyone advice on how this might be done is appreciated.
Check out live555. This will handle all the RTSP handshaking and deliver data (in your case, h264) to you application for further processing/decoding. It is a C/C++ library, and hence can run on iOS.
Your options for integration with a cocoa app are:
1) Run live555 on its own thread using the event loop mechanism given as part of the library (note then that all operations directly related to live555 need to be run on this thread as live555 itself is not designed to be thread-safe).
2) Provide a cocoa implementation of "TaskScheduler", in which you use the cocoa library for async network callbacks, timers etc.
The above points will make more sense to you after reviewing the live555 doco.
I want to be able to (live) stream the frames/video FROM the iPhone camera to the internet. I've seen in a Thread (streaming video FROM an iPhone) that it's possible using AVCaptureSession's beginConfiguration and commitConfiguration. But I don't know how to start designing this task. There are already a lot of tutorials about how to stream video TO the iPhone, and it is not actually what I am searching for.
Could you guys give me any ideas which could help me further?
That's a tricky one. You should be able to do it, but it won't be easy.
One way that wouldn't be live (not answering your need, but worth mentioning) is to capture from the camera and save it to a video file. see the AV Foundation Guide on how to do that. Once saved you can then use the HTTP Live Streaming segmenter to generate the proper segments. Apple has applications for Mac OSX, but there's an open source version as well that you could adapt for iOS. On top of that, you'd also have to run an http server to serve those segments. Lots of http servers out there you could adapt.
But to do it live, first as you have already found, you need to collect frames from the camera. Once you have those you want to convert them to h.264. For that you want ffmpeg. Basically you shove the images to ffmpeg's AVPicture, making a stream. Then you'd need to manage that stream so that the live streaming segmenter recognized it as a live streaming h.264 device. I'm not sure how to do that, and it sounds like some serious work. Once you've done that, then you need to have an http server, serving that stream.
What might actually be easier would be to use an RTP/RTSP based stream instead. That approach is covered by open source versions of RTP and ffmpeg supports that fully. It's not http live streaming, but it will work well enough.
Using FFMPEG, Live555, JSON
Not sure how it works but if you look at the source files at http://github.com/dropcam/dropcam_for_iphone you can see that they are using a combination of open source projects like FFMPEG, Live555, JSON etc. Using Wireshark to sniff the packets sent from one of the public cameras that's available to view with the free "Dropcam For Iphone App" at the App Store, I was able to confirm that the iphone was receiving H264 video via RTP/RTSP/RTCP and even RTMPT which looks like maybe some of the stream is tunneled?
Maybe someone could take a look at the open source files and explain how they got RTSP to work on the iphone.
Thanks for the info TinC0ils. After digging a little deeper I'v read that they have modified the Axis camera with custom firmware to limit the streaming to just a single 320x240 H264 feed, to better provide a consistent quality video over different networks and, as you point out, be less of a draw on the phone's hardware etc. My interest was driven by a desire to use my iphone to view live video and audio from a couple of IP cameras that I own without the jerkiness of MJPEG or the inherent latency that is involved with "http live streaming". I think Dropcam have done an excellent job with their hardware/software combo, I just don't need any new hardware at the moment.
Oh yeah, I almost forgot the reason of this post RTSP PROTOCOL DOES WORK ON THE IPHONE!
They are using open source projects to receive the frames and decoding in software instead of using hardware decoders. This will work, however, this runs counter to Apple's requirement that you use their HTTP Streaming. It will also require greater CPU resources such that it doesn't decode video at the desired fps/resolution on older devices and/or decrease battery life compared to HTTP streaming.
I'm wondering if there are any examples atomic examples out there for streaming audio FROM the iPhone to a server. I'm not interested in telephony or SIP style solutions, just a simple socket stream to send an audio clip, in .wav format, as it is being recorded. I haven't had much luck with the google or other obvious avenues, although there seem to be many examples of doing this the other way around.
i cant figure out how to register the unregistered account i initially posted with.
anyway, I'm not really interested in the audio format at present, just the streaming aspect. i want to take the microphone input, and stream it from the iphone to a server. i dont presently care about the transfer rate as ill initially just test from a wifi connection, not the 3g setup. the reason i cant cache it is because im interested in trying out some open source speech recognition stuffs for my undergraduate thesis. caching and then sending the recording is possible but then it takes considerably longer to get the voice data to the server. if i can start sending the data as soon as i start recording, then the response time is considerably improved because most of the data will have already reached the server by the time i let go of the record button. furthermore, if i can get this streaming functionality to work from the iphone then on the server side of things i can also start the speech recognizer as soon as the first bit of audio comes through. again this should considerably speech up the final amount of time that the transaction takes from the user perspective.
colin barrett mentions the phones and phone networks, but these are actually a pretty suboptimal solution for asr, mainly because they provide no good way to recover from errors - doing so over a voip dialogue is a horrible experience. however, the iphone and in particular the touch screen provide a great way to do that, through use of an ime or nbest lists for the other recognition candidates.
if i can figure out the basic architecture for streaming the audio, then i can start thinking about doing flac encoding or something to reduce the required transfer rate. maybe even feature extraction, although that limits the later ability to retrain the system with the recordings.