According to the documentation, if you are using SSML to play audio, you're limited to a "120 seconds maximum duration".
Is there another way to play longer media (audio) with the "Actions on Google" SDK on Google Home?
Example usage: Meditation sounds that can last longer than 2 minutes.
As answered by Leon Nicholls (Google Assistant Developer Programs Engineer), there is currently no way to play audio longer than 120 seconds. However, they are considering it for the future.
Please check Media responses. You can play an MP3 file that is longer than 2 minutes.
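For example, with the actions-on-google Node.js client library a media response is just a MediaObject added to the webhook reply. The intent name and MP3 URL below are placeholders, so treat this as a sketch rather than a drop-in handler:

    import { dialogflow, MediaObject, Suggestions } from 'actions-on-google';

    const app = dialogflow();

    app.intent('Start Meditation', (conv) => {
      conv.ask('Starting your meditation session.');
      conv.ask(new MediaObject({
        name: 'Ocean Waves',
        description: 'A twenty minute meditation track',
        url: 'https://example.com/audio/ocean-waves.mp3', // placeholder URL
      }));
      // Devices with a screen require suggestion chips alongside a media response.
      conv.ask(new Suggestions('Stop'));
    });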
I ran into the same issue. I created a Voice Metronome app which needs to play for a long time. On the Google Home, I was able to get it to play for 10 minutes. On the phone, the limit was 2 minutes. I don't know if the new speaker devices have longer play times, and I don't know how long they will allow this. Just an FYI.
Related
I'd like to develop a game/skill on Google Assistant that requires the following once the user has entered the game/session ("Hey Google, start game123"):
playing an audio file that is a few minutes long
playing a second audio file while the first clip is still playing
always listening. While the files are playing, the game needs to listen and respond to specific voice phrases without the "Hey Google" keyword.
Are these capabilities supported? Thanks in advance.
"Maybe." A lot of it depends what devices on the Actions on Google platform you're looking to support and how necessary some of the requirements are. Depending on your needs, you may be able to play some tricks.
Playing an audio file that is "a few minutes" long.
You can play audio using SSML that is up to 120 seconds long. But that will be played before the microphone is opened to accept a response.
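For what it's worth, a bare-bones sketch of that SSML option in an actions-on-google webhook looks like this (the intent name and clip URL are made up):

    import { dialogflow } from 'actions-on-google';

    const app = dialogflow();

    app.intent('Play Short Clip', (conv) => {
      // The embedded clip must be 120 seconds or shorter; the fallback text
      // inside <audio> is read aloud if the file cannot be fetched.
      conv.ask(
        '<speak>' +
          'Here is the intro music.' +
          '<audio src="https://example.com/audio/intro.mp3">' +
            'Sorry, the intro could not be played.' +
          '</audio>' +
        '</speak>'
      );
    });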
For longer files, you can use a Media Response. This has the interesting feature that when the audio finishes, an event will be sent to your server, so you have some limited way to handle timed responses and looping. On the downside - users have to say "Hey Google" to interrupt it. (And there are currently some bugs when using it.)
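Here's a hedged sketch of handling that finish event with the Node.js library. The Dialogflow intent has to be wired to the actions_intent_MEDIA_STATUS event, and the track URL is a placeholder:

    import { dialogflow, MediaObject, Suggestions } from 'actions-on-google';

    const app = dialogflow();

    // This intent must be configured in Dialogflow with the
    // actions_intent_MEDIA_STATUS event so it fires when playback ends.
    app.intent('Media Status', (conv) => {
      const mediaStatus = conv.arguments.get('MEDIA_STATUS');
      if (mediaStatus && mediaStatus.status === 'FINISHED') {
        // Loop: queue the same (or the next) track again.
        conv.ask('Looping the track.');
        conv.ask(new MediaObject({
          name: 'Ocean Waves',
          url: 'https://example.com/audio/ocean-waves.mp3', // placeholder
        }));
        conv.ask(new Suggestions('Stop'));
      } else {
        conv.ask('Playback was interrupted.');
      }
    });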
Since you're doing a game, you can take advantage of the Interactive Canvas. This will let you use things such as the HTML <audio> tag and the Web Audio API. The big downside is that this is only available on Smart Displays and Android devices - you can't use it on Smart Speakers.
Playing multiple audio tracks
Google has an extension to SSML that allows parallel audio tracks for multiple spoken and audio output. But you can't layer these on top of a Media Response.
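A rough sketch of that extension, using the same webhook style as above (the intent name and audio URL are placeholders, and remember it can't be combined with a Media Response):

    import { dialogflow } from 'actions-on-google';

    const app = dialogflow();

    app.intent('Layered Intro', (conv) => {
      // Google's SSML extension: <par> plays the nested <media> blocks in
      // parallel, here a spoken prompt over quieter background audio.
      conv.ask(`<speak>
        <par>
          <media begin="0s">
            <speak>Welcome back. Close your eyes and breathe.</speak>
          </media>
          <media begin="0s" soundLevel="-10dB" fadeOutDur="2s">
            <audio src="https://example.com/audio/ocean.mp3"/>
          </media>
        </par>
      </speak>`);
    });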
If you're using the Web Audio API with the Interactive Canvas, I believe it supports multiple simultaneous inputs.
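A minimal Web Audio sketch of that kind of layering for a Canvas web app, with made-up file URLs, might look like:

    const context = new AudioContext();

    async function loadBuffer(url: string): Promise<AudioBuffer> {
      const response = await fetch(url);
      return context.decodeAudioData(await response.arrayBuffer());
    }

    async function playLayered(): Promise<void> {
      const [music, narration] = await Promise.all([
        loadBuffer('https://example.com/audio/background.mp3'), // placeholder
        loadBuffer('https://example.com/audio/narration.mp3'),  // placeholder
      ]);
      // Each track gets its own source node; both mix at the destination.
      for (const buffer of [music, narration]) {
        const source = context.createBufferSource();
        source.buffer = buffer;
        source.connect(context.destination);
        source.start();
      }
    }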
Can I leave the microphone open so they don't have to say "Hey Google" every time?
Probably not, and in some cases this may not be a good idea anyway.
For Smart Speakers, you can't do this. People are used to something conversational, so they're waiting for the silence to know when they should be saying something. If you are constantly providing audio, they don't necessarily know when it is their "turn".
With the Interactive Canvas devices, we have a display that we can work with that cues them. And we can keep the microphone open during this time... at least to a point. The downside is that we don't know when the microphone is open and closed, so we can't duck the audio during this time. (At least not yet.)
Can I do what I want?
You're the only judge of that. It sounds like the Interactive Canvas might work well for your needs - but won't work everywhere. In some cases, you might be able to determine the capabilities of the device the user is playing with and present slightly different games depending on the features you have. Google does this, for example, with their "Lucky Trivia" game.
I would like to use Chrome's speech recognition, webkitSpeechRecognition(), with an audio file as input for testing purposes. I could use a virtual microphone, but this is really hacky and hard to implement with automation; when I tested it, everything worked fine and the speech recognition converted my audio file to text. Now I wanted to use the following Chrome arguments:
--use-file-for-fake-audio-capture="C:/url/to/audio.wav"
--use-fake-device-for-media-stream
--use-fake-ui-for-media-stream
This worked fine on voice recorder sites, for example, and I could hear the audio file play when I replayed the recording. But for some reason, when I try to use this with Chrome's webkitSpeechRecognition, it doesn't use the fake audio device but instead my actual microphone. Is there any way I can fix this or otherwise test my audio files on the website? I am using C#, and I couldn't really find any useful information on automatically adding, managing, and configuring virtual audio devices. What approaches could I take?
Thanks in advance.
Well, it turns out this is not possible because Chrome and Google check whether you are using a fake microphone, etc. They do this specifically to prevent this kind of behavior so people cannot get free speech-to-text. There is a paid API available from Google (the first 60 minutes per month are free).
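If you end up using that API (Cloud Speech-to-Text), a minimal sketch with the Node.js client is below; Google also has a .NET client library if you'd rather stay in C#. The encoding and sample rate are assumptions and have to match your WAV file:

    import { SpeechClient } from '@google-cloud/speech';
    import { readFileSync } from 'fs';

    const client = new SpeechClient();

    async function transcribe(path: string): Promise<string> {
      const [response] = await client.recognize({
        audio: { content: readFileSync(path).toString('base64') },
        config: {
          encoding: 'LINEAR16',    // adjust to match your WAV file
          sampleRateHertz: 16000,  // adjust to match your WAV file
          languageCode: 'en-US',
        },
      });
      // Join the top alternative of each recognized segment.
      return (response.results ?? [])
        .map((r) => r.alternatives?.[0]?.transcript ?? '')
        .join('\n');
    }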
I need to develop an Ionic app (for both iOS and Android) which can play audio at a regular interval, e.g. every 2 hours, at night only.
This should play even if the app is in the background.
Any other ideas are also welcome.
I suggest using the native HTML5 player to play your audio.
Take a look at this documentation: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio
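As a rough sketch of the scheduling side (the asset path and night window are assumptions taken from the question), something like the following works while the app is in the foreground; keeping it running in the background still needs a native/Cordova background-mode plugin, which the HTML5 element alone won't give you:

    const chime = new Audio('assets/audio/chime.mp3'); // placeholder asset path
    const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

    setInterval(() => {
      const hour = new Date().getHours();
      // Only play at night, e.g. from 22:00 to 06:00.
      if (hour >= 22 || hour < 6) {
        chime.currentTime = 0;
        chime.play().catch((err) => console.warn('Playback blocked:', err));
      }
    }, TWO_HOURS_MS);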
There are a ton of SO posts on audio, HTML5, and mobile Safari, such as these:
Reusing HTML5 Audio Object in Mobile Safari
Autoplay an Audio File on Mobile Safari
Preloading HTML5 Audio in Mobile Safari
Will HTML5 support the access of offline cached audio?
However, they are all outdated.
We would prefer solutions that support iOS 3+, but we will take anything that works -- even if it's restricted to iOS 5.
Anyone have the definitive answer as things stand today, or testers on iOS 5 have any insights?
Can audio files be cached in mobile Safari? If so, what are the limitations?
Is there a way to minimize lag or delay between pressing a button and playing a sound?
Thanks!
I recently wrote an HTML5 audio player. I had similar trouble with iOS 4 and iOS 5. First, playback has to be triggered by user input, meaning specifically that it has to be in the same call stack as a click event.
I tested this a lot, and iOS seemed to refuse to cache the audio at all. It fetched the audio with every play. I think this should be considered a bug, but perhaps they are trying to preserve local storage space (audio files can get rather large).
If your audio files are not too large, you might want to consider appending them together into a single file and then using pause / jump to position / play to switch between sounds. I haven't tried it, but it should work. I didn't use the technique because my app was a music player, and music files are a bit too large for that technique to be valuable.
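A rough sketch of that sprite technique (the segment offsets are invented and would come from however the file was concatenated):

    // One combined file; each named sound is a start offset plus a duration.
    const sprite = new Audio('sounds.mp3'); // placeholder combined file

    const segments: Record<string, { start: number; duration: number }> = {
      click: { start: 0, duration: 0.4 },
      chime: { start: 1.0, duration: 1.5 },
    };

    function playSegment(name: string): void {
      const { start, duration } = segments[name];
      sprite.currentTime = start;
      sprite.play();
      // Pause once the segment has elapsed so the next sound doesn't bleed in.
      window.setTimeout(() => sprite.pause(), duration * 1000);
    }

    // iOS only allows play() from inside a user-gesture handler, e.g. a click.
    document.getElementById('playButton')?.addEventListener('click', () => playSegment('chime'));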
I want to build an application where, when the user speaks on the iPhone, the speech is converted into the corresponding text.
I heard that this is possible on the Windows platform.
Is this possible on the iPhone? Is there any API available for it?
I used Nuance’s Dragon Speech SDK for this purpose.
It's free for developers, and their SDK has a sample project for both STT and TTS.
I tried speech-to-text using this SDK on iOS 9 and it works like a charm.
Here is the link.
https://developer.nuance.com/public/Help/DragonMobileSDKReference_iOS/SpeechKit_Guide/RecognizingSpeech.html
Limitations:
60-second recording time limit.
Recorded audio file is not accessible.
Pauses are detected as the end of the recording.
There's an app for that.
Search for "Dragon Speech".
This question has been asked many times here already; this is one of those questions that received quite a few answers and good ideas.
There is no API for doing speech-to-text on the iPhone, but you can record the voice on the phone, send the recording to a server that runs speech recognition software on Windows or whatever OS suits you best, and then return the text results to the phone.
It is possible on the iPhone. Pocketsphinx has been ported; for example, an app called cactus dialer uses Pocketsphinx. No API has been published, but it's not hard to get it built. Many people have.
For full-blown dictation it will be hard. You will need to make it server-based, like Nuance's 'Dragon speech' does, or accept a smaller vocabulary.