Chrome speech recognition webkitSpeechRecognition() not accepting input of fake audio device (--use-file-for-fake-audio-capture) or audio file

I would like to use Chrome's speech recognition, webkitSpeechRecognition(), with an audio file as input, for testing purposes. I could use a virtual microphone, but that is really hacky and hard to automate; when I tested it, everything worked fine and the speech recognition converted my audio file to text. Now I wanted to use the following Chrome arguments instead:
--use-file-for-fake-audio-capture="C:/url/to/audio.wav"
--use-fake-device-for-media-stream
--use-fake-ui-for-media-stream
This worked fine on voice recorder sites, for example: I could hear the audio file play when I replayed the recording. But for some reason, when I try this with Chrome's webkitSpeechRecognition, it doesn't use the fake audio device; it uses my actual microphone instead. Is there any way to fix this, or otherwise test my audio files against the site? I am using C#, and I couldn't find any useful information on automatically adding, managing, and configuring virtual audio devices. What approaches could I take?
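For reference, the page under test drives the API roughly like this (a minimal sketch; the real page's markup and handlers differ):

// Minimal sketch of the recognition code the page runs.
// webkitSpeechRecognition is Chrome's prefixed SpeechRecognition API.
const recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.onresult = (event) => {
  // First alternative of the first (final) result
  console.log('Transcript:', event.results[0][0].transcript);
};
recognition.onerror = (event) => console.error('Recognition error:', event.error);
recognition.start(); // should read from the fake device, but uses the real microphone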
Thanks in advance.

Well, it turns out this is not possible: Chrome and Google check whether you are using a fake microphone, specifically to prevent this kind of workaround from being used as free speech-to-text. There is a paid API available from Google (the first 60 minutes per month are free).
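For completeness, here is a minimal sketch of that paid route, assuming the official @google-cloud/speech Node.js client and a short PCM WAV file (the file name and sample rate are placeholders):

// Transcribe a local WAV file with Google Cloud Speech-to-Text.
const speech = require('@google-cloud/speech');
const fs = require('fs');

async function transcribe() {
  const client = new speech.SpeechClient();
  const [response] = await client.recognize({
    audio: { content: fs.readFileSync('audio.wav').toString('base64') },
    config: {
      encoding: 'LINEAR16',   // plain PCM WAV
      sampleRateHertz: 16000, // must match the file
      languageCode: 'en-US',
    },
  });
  const transcript = response.results
    .map((r) => r.alternatives[0].transcript)
    .join('\n');
  console.log(transcript);
}

transcribe().catch(console.error);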

Related

Google Assistant for voice-input game

I'd like to develop a game/skill on Google Assistant that requires the following once the user has entered the game/session (“Hey Google, start game123”):
Playing an audio file that is a few minutes long
Playing a second audio file while the first clip is still playing
Always listening: while the files are playing, the game needs to listen for and respond to specific voice phrases without the “Hey Google” keyword
Are these capabilities supported? Thanks in advance.
"Maybe." A lot of it depends what devices on the Actions on Google platform you're looking to support and how necessary some of the requirements are. Depending on your needs, you may be able to play some tricks.
Playing an audio file that is "a few minutes" long.
You can play audio using SSML that is up to 120 seconds long. But that will be played before the microphone is opened to accept a response.
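As a sketch, assuming a webhook built on the actions-on-google Node.js library (the clip URL is a placeholder), an inline clip looks like this:

// Play a short clip via SSML; the microphone opens after the prompt ends.
conv.ask(`<speak>
  Here comes the first clip.
  <audio src="https://example.com/clip.mp3">a short fallback description</audio>
  What would you like to do?
</speak>`);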
For longer files, you can use a Media Response. This has the interesting feature that when the audio finishes, an event is sent to your server, so you have some limited ability to handle timed responses and looping. On the downside, users have to say "Hey Google" to interrupt it. (And there are currently some bugs when using it.)
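A sketch of the Media Response route, again assuming the actions-on-google Node.js library (intent names and the track URL are placeholders):

const { dialogflow, MediaObject, Suggestions } = require('actions-on-google');
const app = dialogflow();

app.intent('Play Track', (conv) => {
  conv.ask('Starting the track.');
  conv.ask(new MediaObject({
    name: 'Game track',
    url: 'https://example.com/track.mp3',
  }));
  conv.ask(new Suggestions('Stop')); // surfaces with screens need a suggestion chip
});

// Fires when the audio finishes: map a Dialogflow intent to the
// actions_intent_MEDIA_STATUS event and handle it here.
app.intent('Media Status', (conv) => {
  conv.ask('The track finished. Play another?');
});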
Since you're doing a game, you can take advantage of the Interactive Canvas. This lets you use things such as the HTML <audio> tag and the Web Audio API. The big downside is that this is only available on Smart Displays and Android devices; you can't use it on Smart Speakers.
Playing multiple audio tracks
Google has an extension to SSML that allows parallel audio tracks, mixing spoken and recorded audio output. But you can't layer these on top of a Media Response.
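A sketch of that extension, using Google's <par>/<media> SSML elements (the URL is a placeholder): speech and a background clip start together, offset by half a second.

conv.ask(`<speak>
  <par>
    <media xml:id="narration" begin="0s">
      <speak>Welcome to the game.</speak>
    </media>
    <media begin="0.5s">
      <audio src="https://example.com/background.ogg"/>
    </media>
  </par>
</speak>`);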
If you're using the Web Audio API with the Interactive Canvas, I believe it supports multiple simultaneous sources.
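For example, a minimal Web Audio sketch that mixes two clips at once inside the Canvas page (the file names are placeholders):

const ctx = new AudioContext();

// Fetch and decode one clip into an AudioBuffer.
async function load(url) {
  const res = await fetch(url);
  return ctx.decodeAudioData(await res.arrayBuffer());
}

async function playBoth() {
  const buffers = await Promise.all([load('clip1.ogg'), load('clip2.ogg')]);
  for (const buffer of buffers) {
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(ctx.destination); // both sources mix at the destination
    src.start();
  }
}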
Can I leave the microphone open so they don't have to say "Hey Google" every time?
Probably not, and in some cases it may not be a good idea anyway.
For Smart Speakers, you can't do this. People are used to something conversational, so they're waiting for the silence to know when they should be saying something. If you are constantly providing audio, they don't necessarily know when it is their "turn".
With Interactive Canvas devices, we have a display we can use to cue them, and we can keep the microphone open during this time... at least to a point. The downside is that we don't know when the microphone is open or closed, so we can't duck the audio during that window. (At least not yet.)
Can I do what I want?
You're the only judge of that. It sounds like the Interactive Canvas might work well for your needs, but it won't work everywhere. In some cases, you might be able to determine the capabilities of the device the user is playing on and present a slightly different game depending on the features you have. Google does this, for example, with their "Lucky Trivia" game.
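A sketch of that capability check, assuming the actions-on-google Node.js library (the intent name is a placeholder):

app.intent('Start Game', (conv) => {
  const hasCanvas = conv.surface.capabilities.has(
    'actions.capability.INTERACTIVE_CANVAS');
  if (hasCanvas) {
    conv.ask('Starting the full game.');       // rich audio/visual variant
  } else {
    conv.ask('Starting the voice-only game.'); // Smart Speaker fallback
  }
});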

Syntax for audio-only questions (SSML) in AoG Trivia sample

I'm using the AoG Trivia sample code (there's so much depth to this code!) because it's easier for me to grapple with its functions. I'm trying to create audio-only questions (I host .ogg files in a GCP bucket), but when I use the .audio method of the ssml helper in ssml.js, it fails to use the URL to play the .ogg file. Is there a special way to enter questions in the question.json file that are URLs to audio files? I checked that the SSML was valid using the simulator.
Thanks for your help!
OK, so my bad: in the code I was leaving out AUDIO_BASE_URL, which points to where the hosted audio files live in Firebase. However, a new problem has arisen, but I'll close this question. (I get different behaviour playing the audio in the simulator and Google Assistant on Android versus Google Home, coupled with some intermittent network time-outs; I've raised it with Google. :)
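For anyone hitting the same thing: the src of the <audio> tag has to be a full, absolute URL. Roughly like this (the base URL and file name are placeholders for your own hosting path):

// The <audio> src must be absolute; a bare file name fails silently.
const AUDIO_BASE_URL = 'https://your-project.firebaseapp.com/audio'; // placeholder
const questionSsml = `<speak>
  <audio src="${AUDIO_BASE_URL}/question1.ogg">Question one</audio>
</speak>`;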

How to recognize the human voice in code on iPhone?

I want to integrate voice detection in my iPhone app. The app allows the user to search for a word using their voice. But I don't know anything about voice recognition on iPhone. Can you please suggest any ideas, tutorials, or sample code for this?
You can also use the Google Chrome speech API to integrate voice recognition into your application, but there is a big problem: the API works only with FLAC-encoded files, and this encoding isn't supported natively on iOS... :/
See these two links for more information:
http://www.albertopasca.it/whiletrue/2011/09/objective-c-use-google-speech-iphone/
http://8byte8.com/blog/2012/07/voice-recognition-ios/
EDIT:
I built an application that includes voice recognition using the Nuance SDK, but it's not free to use. You can register for free and get a developer key that lets you test your application for 90 days. A sample application is included; you can look at its code, and it's very easy to implement.
Good luck :)
The best approach will probably be to:
Record the voice on the phone
Send the recording to a server that runs the speech recognition software
Then return something to the phone to indicate what it should do
This approach is favorable because there are a number of open-source speech-to-text packages out there, and you are not limited by computing power on the backend.
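A rough sketch of the server side, assuming Node.js with Express and the pocketsphinx_continuous command-line tool from CMU PocketSphinx (paths and the size limit are placeholders):

const express = require('express');
const { execFile } = require('child_process');
const fs = require('fs');

const app = express();
app.use(express.raw({ type: 'audio/wav', limit: '10mb' }));

// The phone POSTs its recording here and gets the recognized text back.
app.post('/recognize', (req, res) => {
  const path = '/tmp/upload.wav';
  fs.writeFileSync(path, req.body);
  execFile('pocketsphinx_continuous', ['-infile', path, '-logfn', '/dev/null'],
    (err, stdout) => {
      if (err) return res.status(500).send('recognition failed');
      res.send(stdout.trim()); // one line of text per recognized utterance
    });
});

app.listen(3000);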
Having said that, iOS has OpenEars, which is based on PocketSphinx. It looks promising...
Well, voice recognition is not built into the iPhone. All you can do is record the voice on the iPhone; once that's done, you can either write your own voice recognition module or find a third-party API and reuse it.
You can do a Google search for that.

How to convert speech to text on iPhone?

I want to build an application where, when the user says something on the iPhone, it is converted into the corresponding text.
I heard this is possible on the Windows platform.
Is this possible on iPhone? Is any API available for this?
I used Nuance's Dragon Speech SDK for this purpose.
It's free for developers, and the SDK includes a sample project for both STT and TTS.
I tried speech-to-text with this SDK on iOS 9 and it works like a charm.
Here is the link.
https://developer.nuance.com/public/Help/DragonMobileSDKReference_iOS/SpeechKit_Guide/RecognizingSpeech.html
Limitations:
60 seconds recording time limit.
Recorded audio file is not accessible.
Pauses are detected as the end of the recording.
There's an app for that.
Search for "Dragon Speech".
This question has been asked many times here already; it's one of those questions that received quite a few answers and good ideas.
There is no API for doing speech-to-text on the iPhone, but you can record the voice on the phone, send the recording to a server that runs speech recognition software on Windows or whatever OS suits you best, and then return the text results to the phone.
It is possible on the iPhone. PocketSphinx has been ported; for example, an app called Cactus Dialer uses it. No API has been published, but it's not hard to get it built. Many people have.
For full-blown dictation it will be hard. You will need to make it server-based, like Nuance's Dragon Speech does, or accept a smaller vocabulary.

iPhone MP3 Streaming alternative to Segmenting

I have run into a bit of a problem. I built an iPhone app that streams my podcasts via MPMoviePlayerController. Apple will not approve it because it can use too much bandwidth over the carrier network, so their workaround is to use a stream segmenter. I am unable to install a stream segmenter on my server. Are there ANY other solutions people have come up with that can help me stream my podcast to iPhone devices? Even if I have to make it a web application as opposed to a native application.
Thanks,
John
You could use a simple service like Encoding.com to create iPhone-segmented, on-demand versions of your files for multi-bitrate adaptive playback. You could also provide a high- and a low-quality version and only serve the high-quality one when the Reachability class shows that you're on Wi-Fi. I had to do the second option to get one of my apps to pass approval. Hope this helps!
Well, if you don't want a native app, I think you can just put a video link on a webpage; when the user clicks it, QuickTime will take over and play the file. It will play the file as it downloads.
I don't have any experience streaming large files on the iPhone, so I can't help guide you on alternatives that keep it a native app.