Is there a way to continuously send snippets of audio being recorded in realtime to backend server in Flutter.io? - flutter

I am creating an application that uses Mozilla's DeepSpeech API to transcribe the user's speech to text. The API takes audio files in a specific format as input, so for this app to work I need to continuously send audio files to my Flask server while the user is recording.
I have seen that most Flutter plugins only let you record, pause, and stop. However, I need a way to keep recording while also sending audio files. If anyone has found a way to accomplish this using Flutter's recording plugins, any guidance and information would help.
My backup plan is to use one of Flutter's speech-to-text plugins like this one: https://pub.dev/packages/speech_to_text#-example-tab-, and then send the text to my backend server through a websocket. However, I don't know which API these plugins use for transcription or how accurate the transcribed text is.
Any ideas on how to accomplish this? Or would anyone happen to know if another framework like Swift/React-Native can accomplish this?
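One way to picture the "keep recording while sending files" requirement is to cut the raw microphone stream into short, self-contained WAV files, each of which can be uploaded on its own. The sketch below is in Python (server-side language of the question) rather than Dart, and only illustrates the chunking logic. It assumes 16 kHz, mono, 16-bit PCM input, which is an assumption about the recorder and the model, not something stated in the question; `snippets` and `pcm_to_wav_bytes` are hypothetical helper names.

```python
import io
import wave

# Assumed audio format (not stated in the question): 16 kHz, mono, 16-bit PCM.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()


def snippets(pcm_chunks, seconds=1.0):
    """Yield complete WAV files of `seconds` length from an iterable of raw
    PCM chunks of arbitrary size. A shorter final snippet is yielded when the
    input ends."""
    target = int(SAMPLE_RATE * seconds) * SAMPLE_WIDTH * CHANNELS
    pending = bytearray()
    for chunk in pcm_chunks:
        pending.extend(chunk)
        while len(pending) >= target:
            yield pcm_to_wav_bytes(bytes(pending[:target]))
            del pending[:target]
    if pending:
        yield pcm_to_wav_bytes(bytes(pending))
```

Each yielded value is a complete WAV file, so it could be posted to the server as soon as it is produced while recording continues.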

Related

How can I handle multiple formats in a video player used in my flutter application for android/ios?

The backend api sends a link url to the video that can be used in the video player's source. But the issue is, the source can vary from m4v, mp4 and other potential formats. I want to support the major formats, if not all, in the app.
Backend is in Django. I am open to modify the api code as well.
If the video formats need conversion, suggest the fastest way with an ETA for a ~100MB file.
I tried the video_player plugin from pub.dev. At worst, I am thinking of integrating flutter_ffmpeg, but I don't want users to wait and ruin their UX.

getting realtime audio stream from voip or sip systems

I am building an application that gets real-time audio from our organization's VoIP system, records the call, and transcribes the voice in real time. The transcription is then passed to our analytics engine to extract insights.
We are able to transcribe the recorded audio and get insights from the transcription. We also have a solution for real-time transcription: it can transcribe voice from the microphone and even from an RTSP stream. We are having trouble finding a way to get the real-time audio from SIP/VoIP systems. I read that SIP trunking is one option and WebRTC is another, but I don't know how or where to start.
I am experienced in Java and Python. I am asking experts for suggestions or examples on how to get the real-time audio stream from a SIP/VoIP conversation.
I am not familiar with SIP/VoIP and have never written a VoIP application.
A solution that might suit your needs is Oreka, which is the open source version of Orecx, a call recording software for VoIP.
I used it in the past and it works perfectly well with SIP calls that use open audio codecs like G.711 (A-law, μ-law) or Speex, but it may have problems decoding the audio of calls that use the proprietary G.729 codec (I had to work out my own codecs at that time).
The paid version might support more codecs and protocols like Avaya's H323.
Keep in mind that this app works by sniffing the network, so the setup is not trivial. Anyway, I suggest you give it a try.
Link: https://www.orecx.com/open-source/
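As a small illustration of what handling the open G.711 codec mentioned above involves: μ-law audio carries one 8-bit companded value per sample, and a transcription engine typically wants linear 16-bit PCM. The following is a minimal sketch of the standard μ-law expansion in Python; it is not taken from Oreka, and the function names are hypothetical.

```python
import struct


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    # 0x84 (132) is the bias added during encoding; remove it after scaling.
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 possible values once; decoding is then a table lookup.
_ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]


def decode_ulaw(payload: bytes) -> bytes:
    """Convert a mu-law payload to little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(payload), *(_ULAW_TABLE[b] for b in payload))
```

G.711 normally runs at 8 kHz, so the decoded PCM may still need resampling before it is handed to a recognizer that expects a different rate.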
For anyone out there: if you want access to live/real-time audio data from a VoIP call, I suggest you use Twilio Streams.
If you're just looking for real-time transcriptions without access to the actual audio data, Twilio and Plivo also provide that.

icecast online radio , how to implement playlist and broadcasting from microphone input

I want to implement an online radio that will live on my own server, where the admin can select mp3 files just as they would in a media player playlist. The admin should also be able to pause the mp3 file that is playing and start broadcasting from the microphone input.
To implement that on an online Linux server:
1) Which source client should I use that will easily fulfill my requirements?
2) Should the mp3 files be uploaded to the server first so the admin can select them from there, or should the admin select them by browsing the local hard drive? Which one is better for performance?
Just use SAM Broadcaster. It's software specially designed to take your mic input and play mp3s, then send them to the Icecast server. There are a couple of guides around the web, but overall it's a simple program. It's what I use, at least, and you don't need to upload anything either.

iPhone: HTTP live streaming without any server side processing

I want to be able to (live) stream the frames/video FROM the iPhone camera to the internet. I've seen in a Thread (streaming video FROM an iPhone) that it's possible using AVCaptureSession's beginConfiguration and commitConfiguration. But I don't know how to start designing this task. There are already a lot of tutorials about how to stream video TO the iPhone, and it is not actually what I am searching for.
Could you guys give me any ideas which could help me further?
That's a tricky one. You should be able to do it, but it won't be easy.
One way that wouldn't be live (not answering your need, but worth mentioning) is to capture from the camera and save it to a video file. See the AV Foundation guide on how to do that. Once it is saved, you can use the HTTP Live Streaming segmenter to generate the proper segments. Apple has applications for Mac OS X, but there's an open source version as well that you could adapt for iOS. On top of that, you'd also have to run an HTTP server to serve those segments. There are lots of HTTP servers out there you could adapt.
But to do it live, first, as you have already found, you need to collect frames from the camera. Once you have those, you want to convert them to H.264. For that you want ffmpeg. Basically you shove the images into ffmpeg's AVPicture, making a stream. Then you'd need to manage that stream so that the live streaming segmenter recognizes it as a live H.264 source. I'm not sure how to do that, and it sounds like some serious work. Once you've done that, you need an HTTP server serving that stream.
What might actually be easier would be to use an RTP/RTSP based stream instead. That approach is covered by open source versions of RTP and ffmpeg supports that fully. It's not http live streaming, but it will work well enough.
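To make the HTTP Live Streaming route above a little more concrete: besides the media segments themselves, the HTTP server has to serve a playlist that lists the most recent segments and is rewritten as new ones appear. Below is a minimal Python sketch of generating such a sliding-window playlist. The `seg<N>.ts` file naming, the window size, and the function name are assumptions for illustration, and it expects at least one finished segment.

```python
import math


def live_playlist(first_seq, durations, window=3):
    """Build the text of a live HLS media playlist.

    first_seq -- sequence number of the first segment ever produced
    durations -- durations in seconds of all segments produced so far
    window    -- how many of the most recent segments to advertise
    """
    recent = durations[-window:]
    # Sequence number of the oldest segment still listed in the playlist.
    seq = first_seq + len(durations) - len(recent)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(recent))}",
        f"#EXT-X-MEDIA-SEQUENCE:{seq}",
    ]
    for i, duration in enumerate(recent):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"seg{seq + i}.ts")  # hypothetical segment file name
    # No #EXT-X-ENDLIST tag: its absence tells players the stream is live.
    return "\n".join(lines) + "\n"
```

Players re-fetch the playlist periodically, so regenerating this text each time a segment is finished is enough for them to pick up new segments.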

Streaming Audio Clips from iPhone to server

I'm wondering if there are any atomic examples out there for streaming audio FROM the iPhone to a server. I'm not interested in telephony or SIP-style solutions, just a simple socket stream to send an audio clip, in .wav format, as it is being recorded. I haven't had much luck with Google or other obvious avenues, although there seem to be many examples of doing this the other way around.
I can't figure out how to register the unregistered account I initially posted with.
Anyway, I'm not really interested in the audio format at present, just the streaming aspect. I want to take the microphone input and stream it from the iPhone to a server. I don't presently care about the transfer rate, as I'll initially just test over a Wi-Fi connection, not 3G. The reason I can't cache it is that I'm interested in trying out some open source speech recognition software for my undergraduate thesis. Caching and then sending the recording is possible, but then it takes considerably longer to get the voice data to the server. If I can start sending the data as soon as I start recording, the response time is considerably improved, because most of the data will have already reached the server by the time I let go of the record button. Furthermore, if I can get this streaming functionality to work from the iPhone, then on the server side I can also start the speech recognizer as soon as the first bit of audio comes through. Again, this should considerably speed up the total time the transaction takes from the user's perspective.
Colin Barrett mentions phones and phone networks, but these are actually a pretty suboptimal solution for ASR, mainly because they provide no good way to recover from errors: doing so over a VoIP dialogue is a horrible experience. However, the iPhone, and in particular the touch screen, provides a great way to do that, through use of an IME or n-best lists for the other recognition candidates.
If I can figure out the basic architecture for streaming the audio, then I can start thinking about doing FLAC encoding or something similar to reduce the required transfer rate. Maybe even feature extraction, although that limits the later ability to retrain the system with the recordings.
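The basic architecture described above (send audio while it is still being recorded, and let the server start working on the first bytes) can be sketched with a very small framing protocol over a plain TCP socket. The sketch below is in Python and shows both ends; the 4-byte length prefix and the empty frame as an end-of-recording marker are design assumptions for illustration, not an established protocol, and all function names are hypothetical.

```python
import socket
import struct


def send_chunk(sock: socket.socket, data: bytes) -> None:
    """Send one audio chunk, prefixed with its length as a 4-byte big-endian int."""
    sock.sendall(struct.pack(">I", len(data)) + data)


def end_stream(sock: socket.socket) -> None:
    """Signal the end of the recording with an empty frame."""
    send_chunk(sock, b"")


def _read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(part)
    return bytes(buf)


def recv_chunks(sock: socket.socket):
    """Yield audio chunks as they arrive, until the empty end-of-stream frame."""
    while True:
        (size,) = struct.unpack(">I", _read_exact(sock, 4))
        if size == 0:
            return
        yield _read_exact(sock, size)
```

On the server, each chunk yielded by `recv_chunks` could be fed to the recognizer immediately, so most of the audio has already been processed by the time the end-of-stream frame arrives.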