Streaming Audio Clips from iPhone to server

Streaming Audio Clips from iPhone to server - iphone

I'm wondering if there are any examples atomic examples out there for streaming audio FROM the iPhone to a server. I'm not interested in telephony or SIP style solutions, just a simple socket stream to send an audio clip, in .wav format, as it is being recorded. I haven't had much luck with the google or other obvious avenues, although there seem to be many examples of doing this the other way around.

i cant figure out how to register the unregistered account i initially posted with.
anyway, I'm not really interested in the audio format at present, just the streaming aspect. i want to take the microphone input, and stream it from the iphone to a server. i dont presently care about the transfer rate as ill initially just test from a wifi connection, not the 3g setup. the reason i cant cache it is because im interested in trying out some open source speech recognition stuffs for my undergraduate thesis. caching and then sending the recording is possible but then it takes considerably longer to get the voice data to the server. if i can start sending the data as soon as i start recording, then the response time is considerably improved because most of the data will have already reached the server by the time i let go of the record button. furthermore, if i can get this streaming functionality to work from the iphone then on the server side of things i can also start the speech recognizer as soon as the first bit of audio comes through. again this should considerably speech up the final amount of time that the transaction takes from the user perspective.
colin barrett mentions the phones and phone networks, but these are actually a pretty suboptimal solution for asr, mainly because they provide no good way to recover from errors - doing so over a voip dialogue is a horrible experience. however, the iphone and in particular the touch screen provide a great way to do that, through use of an ime or nbest lists for the other recognition candidates.
if i can figure out the basic architecture for streaming the audio, then i can start thinking about doing flac encoding or something to reduce the required transfer rate. maybe even feature extraction, although that limits the later ability to retrain the system with the recordings.

Related

Google Assistant for voice-input game

I'd like to develop a game/skill on Google Assistant that requires the following, once the user has entered the game/session (“hey Google, start game123”)
playing an audio file that is a few minutes long
playing a second audio file while the first clip is still playing
always listening. While the files are playing, the game needs to listen and respond for specific voice phrases without the “Hey Google” keyword.
Are these capabilities supported? Thanks in advance.

"Maybe." A lot of it depends what devices on the Actions on Google platform you're looking to support and how necessary some of the requirements are. Depending on your needs, you may be able to play some tricks.
Playing an audio file that is "a few minutes" long.
You can play audio using SSML that is up to 120 seconds long. But that will be played before the microphone is opened to accept a response.
For longer files, you can use a Media Response. This has the interesting feature that when the audio finishes, an event will be sent to your server, so you have some limited way to handle timed responses and looping. On the downside - users have to say "Hey Google" to interrupt it. (And there are currently some bugs when using it.)
Since you're doing a game, you can take advantage of the Interactive Canvas. This will let you use things such as the HTML <audio> tag and the Web Audio API. The big downside is that this is only available on Smart Displays and Android devices - you can't use it on Smart Speakers.
Playing multiple audio tracks
Google has an extension to SSML that allows parallel audio tracks for multiple spoken and audio output. But you can't layer these on top of a Media Response.
If you're using the Web Audio API with the Interactive Canvas, I believe it supports multiple simultaneous inputs.
Can I leave the microphone open so they don't have to say "Hey Google" every time.
Probably not, but this may not be a good idea in some cases, anyway.
For Smart Speakers, you can't do this. People are used to something conversational, so they're waiting for the silence to know when they should be saying something. If you are constantly providing audio, they don't necessarily know when it is their "turn".
With the Interactive Canvas devices, we have a display that we can work with that cues them. And we can keep the microphone open during this time... at least to a point. The downside is that we don't know when the microphone is open and closed, so we can't duck the audio during this time. (At least not yet.)
Can I do what I want?
You're the only judge of that. It sounds like the Interactive Canvas might work well for your needs - but won't work everywhere. In some cases, you might be able to determine the capabilities of the device the user is playing with and present slightly different games depending on the features you have. Google does this, for example, with their "Lucky Trivia" game.

Streaming audio from a microphone on a Mac to an iPhone

I'm working on a personal project where the iPhone connects to a server-type application running on a Mac. The iPhone send and receives textual/ASCII data via standard sockets. I now need to stream the microphone from the Mac to the iPhone. I've done some work with AudioServices before but wanted to check my thoughts here before getting too deep.
I'm thinking I can:
1. Create an Audio Queue in the standard Cocoa application on the Mac.
2. In my Audio Queue Callback function, rather than writing it to a file, write it to another socket I open for audio streaming.
3. On the iPhone, receive the raw sampled/encoded audio data from the TCP stream and dump it into an Audio Queue Player which outputs to headphone/speaker.
I know this is no small task and I've greatly simplified what I need to do but could it be as easy as that?
Thanks for any help you can provide,
Stateful

This looks broadly sensible, but you'll almost certainly need to do a few more things:
Buffering. On the "recording" end, you probably don't want to block the audio queue if the buffer is full. On the "playback" end, I don't think you can just pass buffers into the queue (IIRC you'll need to buffer it until you get a callback).
Concurrency. I'm pretty sure AQ callbacks happen on their own thread, so you'll need some sort of locking/barriers around your buffer accesses.
Buffer pools, if memory allocation ends up being a big overhead.
Compression. AQ might be able to give you "IMA4" frames (IMA ADPCM 4:1, or so); I'm not sure if it does hardware MP3 decompression on the iPhone.
Packetization, if e.g. you need to interleave voice chat with text chat.
EDIT: Playback sync (or whatever you're supposed to call it). You need to be able to handle different effective audio clock rates, whether it's due to a change in latency or something else. Skype does it by changing playback speed (with pitch-correction).
EDIT: Packet loss. You might be able to get away with using TCP over a short link, but that depends a lot on the quality of your wireless network. UDP is a minor pain to get right (especially if you have to detect an MTU hole).
Depending on your data rates, it might be worthwhile going for the lower-level (BSD) socket API and potentially even using readv()/writev().
If all you want is an "online radio" service and you don't care about the protocol used, it might be easier to use AVPlayer/MPMoviePlayer to play audio from a URL instead. This involves implementing a server which speaks Apple's HTTP streaming protocol; I believe Apple has some sample code that does this.

Realtime Audio/Video Streaming FROM iPhone to another device (Browser, or iPhone)

I'd like to get real-time video from the iPhone to another device (either desktop browser or another iPhone, e.g. point-to-point).
NOTE: It's not one-to-many, just one-to-one at the moment. Audio can be part of stream or via telephone call on iphone.
There are four ways I can think of...
Capture frames on iPhone, send
frames to mediaserver, have
mediaserver publish realtime video
using host webserver.
Capture frames on iPhone, convert to
images, send to httpserver, have
javascript/AJAX in browser reload
images from server as fast as
possible.
Run httpServer on iPhone, Capture 1 second duration movies on
iPhone, create M3U8 files on iPhone, have the other
user connect directly to httpServer on iPhone for
liveStreaming.
Capture 1 second duration movies on
iPhone, create M3U8 files on iPhone,
send to httpServer, have the other
user connected to the httpServer
for liveStreaming. This is a good answer, has anyone gotten it to work?
Is there a better, more efficient option?
What's the fastest way to get data off the iPhone? Is it ASIHTTPRequest?
Thanks, everyone.

Sending raw frames or individual images will never work well enough for you (because of the amount of data and number of frames). Nor can you reasonably serve anything from the phone (WWAN networks have all sorts of firewalls). You'll need to encode the video, and stream it to a server, most likely over a standard streaming format (RTSP, RTMP). There is an H.264 encoder chip on the iPhone >= 3GS. The problem is that it is not stream oriented. That is, it outputs the metadata required to parse the video last. This leaves you with a few options.
Get the raw data and use FFmpeg to encode on the phone (will use a ton of CPU and battery).
Write your own parser for the H.264/AAC output (very hard)
Record and process in chunks (will add latency equal to the length of the chunks, and drop around 1/4 second of video between each chunk as you start and stop the sessions).

"Record and process in chunks (will add latency equal to the length of the chunks, and drop around 1/4 second of video between each chunk as you start and stop the sessions)."
I have just wrote such a code, but it is quite possible to eliminate such a gap by overlapping two AVAssetWriters. Since it uses the hardware encoder, I strongly recommend this approach.

We have similar needs; to be more specific, we want to implement streaming video & audio between an iOS device and a web UI. The goal is to enable high-quality video discussions between participants using these platforms. We did some research on how to implement this:
We decided to use OpenTok and managed to pretty quickly implement a proof-of-concept style video chat between an iPad and a website using the OpenTok getting started guide. There's also a PhoneGap plugin for OpenTok, which is handy for us as we are not doing native iOS.
Liblinphone also seemed to be a potential solution, but we didn't investigate further.
iDoubs also came up, but again, we felt OpenTok was the most promising one for our needs and thus didn't look at iDoubs in more detail.

Turning an iPhone or iPod into a wireless webcam

I'd like to stream video from the camera on an iOS device to a receiver via wifi, in effect turning the device into a wireless webcam. Is there a way to build a small app that captures video input on an iOS app and sends it via an RTSP stream or similar?
As this is an ad hoc experiment, I'm not concerned about App Store guidelines and can jailbreak if necessary.

If I interpret your question correctly you more or less need to solve four problems:
Get the camera feed.
Convert/encode this to the right format.
Stream the data.
Prevent the phone from locking itself and going into deep sleep.
The first one is fairly simple and Apple has as always provided good documentation and examples -> API link. Make sure you check out their example in the end as you will get a CMSampleBufferRef data object back.
For the second and third part, you should check out the CFNetwork framework and specially CFFTPStream for streaming using FTP.
If your are only building this for yourself then you can always turn off the Auto-Lock feature in the settings. If you on the other hand would like to distribute this to other users you could use a trick to play a mute sound every 10 seconds. This is more or less how all the alarm clocks work in the App Store. Here's a tutorial. =)
I hope I helped a little bit at least.
Good luck and best regards!

I'm 70% of the way to doing the same thing. Here's how I did it:
Capture content from video input
Chop video into files for use in HTML Live Streaming.
Spin up a web server on the iPhone and make the video files available.
Connect to the IP address of the phone and viola! you've got live streaming video.
Last time I touched the code I was trying to debug my Live Streaming not working. I'll try and get my source code posted on github this weekend, if you'd like to take a look.

Live streaming in iPhone?

I have read a lot of posts about live streaming in iPhone, but none of them really works.
The project I want to work out is as follow:
There is a MUTE movie streaming in a movie theater. I want to get the time code (the position it is playing) through wifi and makes iPhone/iPod Touch to play/stream an audio track at the same time code.
May I ask how to achieve it?
UPDATE: Latency is expected and will be taken into consideration. Small time difference is acceptable in this case.

The variable nature of a wireless connection and the latency involved will completely obliterate the video/audio sync you are trying to achieve.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse