Google Assistant for voice-input game

Google Assistant for voice-input game - actions-on-google

I'd like to develop a game/skill on Google Assistant that requires the following, once the user has entered the game/session (“hey Google, start game123”)
playing an audio file that is a few minutes long
playing a second audio file while the first clip is still playing
always listening. While the files are playing, the game needs to listen and respond for specific voice phrases without the “Hey Google” keyword.
Are these capabilities supported? Thanks in advance.

"Maybe." A lot of it depends what devices on the Actions on Google platform you're looking to support and how necessary some of the requirements are. Depending on your needs, you may be able to play some tricks.
Playing an audio file that is "a few minutes" long.
You can play audio using SSML that is up to 120 seconds long. But that will be played before the microphone is opened to accept a response.
For longer files, you can use a Media Response. This has the interesting feature that when the audio finishes, an event will be sent to your server, so you have some limited way to handle timed responses and looping. On the downside - users have to say "Hey Google" to interrupt it. (And there are currently some bugs when using it.)
Since you're doing a game, you can take advantage of the Interactive Canvas. This will let you use things such as the HTML <audio> tag and the Web Audio API. The big downside is that this is only available on Smart Displays and Android devices - you can't use it on Smart Speakers.
Playing multiple audio tracks
Google has an extension to SSML that allows parallel audio tracks for multiple spoken and audio output. But you can't layer these on top of a Media Response.
If you're using the Web Audio API with the Interactive Canvas, I believe it supports multiple simultaneous inputs.
Can I leave the microphone open so they don't have to say "Hey Google" every time.
Probably not, but this may not be a good idea in some cases, anyway.
For Smart Speakers, you can't do this. People are used to something conversational, so they're waiting for the silence to know when they should be saying something. If you are constantly providing audio, they don't necessarily know when it is their "turn".
With the Interactive Canvas devices, we have a display that we can work with that cues them. And we can keep the microphone open during this time... at least to a point. The downside is that we don't know when the microphone is open and closed, so we can't duck the audio during this time. (At least not yet.)
Can I do what I want?
You're the only judge of that. It sounds like the Interactive Canvas might work well for your needs - but won't work everywhere. In some cases, you might be able to determine the capabilities of the device the user is playing with and present slightly different games depending on the features you have. Google does this, for example, with their "Lucky Trivia" game.

Related

What protocol to use when sending video to multiple devices simultaneously

I don't know if this is the right place to ask, but I have a question..
I am working on a students project where we want to stream a video from one server to multiple devices. But the video should only play on one display at a time. The other displays should be black. But if you switch to another display, the video should continue seamlessly. (Please ask if you need clarification)
The video outputs can be plain displays or with additional servers so a wide range of protocols can be implemented. Wifi connection is possible. Web solutions for running the video in a browser is also possible.
I thought of a lot of alternatives like DLNA, RTP or Chromecast. But I don't know where to start and which is the right solution. It is important that the video that is streamed from the server can be continued on any display seamlessly.
If would mean much to me if you'd give me a hint.

How can I record currently playing audio on the iPhone?

I'd like to record what the iPhone is currently outputting. So I'm thinking about recording audio from Apps like Music (iPod), Skype, any Radio Streaming App, Phone, Instacast... I don't want to record my own audio or the mic input.
Is there an official way to do this? How do I do it? It seems like AVAudioRecorder does not allow this, can somebody confirm?

Officially you can't. The audio stream belongs to the app playing it ,and iOS.
The Sandbox paradigm means that a resource owned by your App can't be used by another App. Resource here means Audio/Video stream or file. Exceptions are when a mediator like Document interaction controller are used.
If you want to do this you'd have to start with deducing AVFoundation's private methods and find out if theres a way there. Needless to say this it wouldn't be saleable on the App store and will probably only be possible on a jailbreak.
Good Luck.

TLDR;
This is only feasible only from time to time, as it's a time expensive process.
You can record the screen while listening your songs on Spotify, Music or whatever music application.
This will generate a video on your Photos application. That video can be converted on MP3 from your computer.

Actually, this is not true. The screen recordings will not actually have the audio from Apple Music at all, as it blocks it. Discord also uses this pipe as well, so you cannot record Discord audio either this way.

Realtime Audio/Video Streaming FROM iPhone to another device (Browser, or iPhone)

I'd like to get real-time video from the iPhone to another device (either desktop browser or another iPhone, e.g. point-to-point).
NOTE: It's not one-to-many, just one-to-one at the moment. Audio can be part of stream or via telephone call on iphone.
There are four ways I can think of...
Capture frames on iPhone, send
frames to mediaserver, have
mediaserver publish realtime video
using host webserver.
Capture frames on iPhone, convert to
images, send to httpserver, have
javascript/AJAX in browser reload
images from server as fast as
possible.
Run httpServer on iPhone, Capture 1 second duration movies on
iPhone, create M3U8 files on iPhone, have the other
user connect directly to httpServer on iPhone for
liveStreaming.
Capture 1 second duration movies on
iPhone, create M3U8 files on iPhone,
send to httpServer, have the other
user connected to the httpServer
for liveStreaming. This is a good answer, has anyone gotten it to work?
Is there a better, more efficient option?
What's the fastest way to get data off the iPhone? Is it ASIHTTPRequest?
Thanks, everyone.

Sending raw frames or individual images will never work well enough for you (because of the amount of data and number of frames). Nor can you reasonably serve anything from the phone (WWAN networks have all sorts of firewalls). You'll need to encode the video, and stream it to a server, most likely over a standard streaming format (RTSP, RTMP). There is an H.264 encoder chip on the iPhone >= 3GS. The problem is that it is not stream oriented. That is, it outputs the metadata required to parse the video last. This leaves you with a few options.
Get the raw data and use FFmpeg to encode on the phone (will use a ton of CPU and battery).
Write your own parser for the H.264/AAC output (very hard)
Record and process in chunks (will add latency equal to the length of the chunks, and drop around 1/4 second of video between each chunk as you start and stop the sessions).

"Record and process in chunks (will add latency equal to the length of the chunks, and drop around 1/4 second of video between each chunk as you start and stop the sessions)."
I have just wrote such a code, but it is quite possible to eliminate such a gap by overlapping two AVAssetWriters. Since it uses the hardware encoder, I strongly recommend this approach.

We have similar needs; to be more specific, we want to implement streaming video & audio between an iOS device and a web UI. The goal is to enable high-quality video discussions between participants using these platforms. We did some research on how to implement this:
We decided to use OpenTok and managed to pretty quickly implement a proof-of-concept style video chat between an iPad and a website using the OpenTok getting started guide. There's also a PhoneGap plugin for OpenTok, which is handy for us as we are not doing native iOS.
Liblinphone also seemed to be a potential solution, but we didn't investigate further.
iDoubs also came up, but again, we felt OpenTok was the most promising one for our needs and thus didn't look at iDoubs in more detail.

Turning an iPhone or iPod into a wireless webcam

I'd like to stream video from the camera on an iOS device to a receiver via wifi, in effect turning the device into a wireless webcam. Is there a way to build a small app that captures video input on an iOS app and sends it via an RTSP stream or similar?
As this is an ad hoc experiment, I'm not concerned about App Store guidelines and can jailbreak if necessary.

If I interpret your question correctly you more or less need to solve four problems:
Get the camera feed.
Convert/encode this to the right format.
Stream the data.
Prevent the phone from locking itself and going into deep sleep.
The first one is fairly simple and Apple has as always provided good documentation and examples -> API link. Make sure you check out their example in the end as you will get a CMSampleBufferRef data object back.
For the second and third part, you should check out the CFNetwork framework and specially CFFTPStream for streaming using FTP.
If your are only building this for yourself then you can always turn off the Auto-Lock feature in the settings. If you on the other hand would like to distribute this to other users you could use a trick to play a mute sound every 10 seconds. This is more or less how all the alarm clocks work in the App Store. Here's a tutorial. =)
I hope I helped a little bit at least.
Good luck and best regards!

I'm 70% of the way to doing the same thing. Here's how I did it:
Capture content from video input
Chop video into files for use in HTML Live Streaming.
Spin up a web server on the iPhone and make the video files available.
Connect to the IP address of the phone and viola! you've got live streaming video.
Last time I touched the code I was trying to debug my Live Streaming not working. I'll try and get my source code posted on github this weekend, if you'd like to take a look.

Streaming Audio Clips from iPhone to server

I'm wondering if there are any examples atomic examples out there for streaming audio FROM the iPhone to a server. I'm not interested in telephony or SIP style solutions, just a simple socket stream to send an audio clip, in .wav format, as it is being recorded. I haven't had much luck with the google or other obvious avenues, although there seem to be many examples of doing this the other way around.

i cant figure out how to register the unregistered account i initially posted with.
anyway, I'm not really interested in the audio format at present, just the streaming aspect. i want to take the microphone input, and stream it from the iphone to a server. i dont presently care about the transfer rate as ill initially just test from a wifi connection, not the 3g setup. the reason i cant cache it is because im interested in trying out some open source speech recognition stuffs for my undergraduate thesis. caching and then sending the recording is possible but then it takes considerably longer to get the voice data to the server. if i can start sending the data as soon as i start recording, then the response time is considerably improved because most of the data will have already reached the server by the time i let go of the record button. furthermore, if i can get this streaming functionality to work from the iphone then on the server side of things i can also start the speech recognizer as soon as the first bit of audio comes through. again this should considerably speech up the final amount of time that the transaction takes from the user perspective.
colin barrett mentions the phones and phone networks, but these are actually a pretty suboptimal solution for asr, mainly because they provide no good way to recover from errors - doing so over a voip dialogue is a horrible experience. however, the iphone and in particular the touch screen provide a great way to do that, through use of an ime or nbest lists for the other recognition candidates.
if i can figure out the basic architecture for streaming the audio, then i can start thinking about doing flac encoding or something to reduce the required transfer rate. maybe even feature extraction, although that limits the later ability to retrain the system with the recordings.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse