Can Google Cloud Text-to-Speech speak input text directly instead of generating MP3 voice files? - google-text-to-speech

I need to convert input text to speech. The Web Speech API speaks the text automatically without generating any audio files, but it has no custom voices, so I am trying to use the Google Cloud Text-to-Speech API instead. As far as I can tell, with the Google API we have to generate MP3 files first and then play those files. My question: how can we make Google Cloud Text-to-Speech speak the text automatically, like the Web Speech API does, without generating audio files such as MP3?
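One possible direction, sketched under assumptions: the Cloud Text-to-Speech REST method `text:synthesize` returns the audio as a base64 string in the `audioContent` field of its JSON response, so the bytes can be decoded and handed to a player in memory rather than saved as a file. The helper names and the sample payload below are illustrative only; the authenticated HTTP call and the playback step are left out.

```python
import base64
import json

# REST endpoint of the synthesize method (a real call needs an API key or OAuth token).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text, language_code="en-US", voice_name=None, encoding="MP3"):
    """Build the JSON body for text:synthesize (hypothetical helper)."""
    voice = {"languageCode": language_code}
    if voice_name:
        voice["name"] = voice_name  # optional: a specific voice chosen by the caller
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {"audioEncoding": encoding},
    }


def audio_bytes_from_response(response_json):
    """Decode the base64 'audioContent' field into raw audio bytes kept in memory."""
    payload = json.loads(response_json)
    return base64.b64decode(payload["audioContent"])


# Stand-in for a real response, so the sketch runs without network access.
fake_response = json.dumps(
    {"audioContent": base64.b64encode(b"fake-audio").decode("ascii")}
)
audio = audio_bytes_from_response(fake_response)
print(len(audio), "bytes ready to hand to an in-memory player")
```

In a browser the same idea would apply: the decoded bytes can be wrapped in an in-memory object and played immediately, so no MP3 file has to be written anywhere.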

Related

Convert speech to text in Flutter for a desktop (Windows) application

I am developing a cross-platform application in Flutter and cannot find any resources for converting speech to text on desktop (Windows).
I tried packages such as speech_to_text.
I also tried google_speech, which requires an audio file to transcribe to text. However, I could not find any package for capturing microphone input on Windows so that I can pass it to an API and get the text back.
Is there any possible solution for this?

Google Cloud Speech API and metadata industry naics code of audio

When using the Google Cloud Speech API, does adding metadata, e.g. the industry NAICS code of the audio (see https://www.naics.com/search), influence the speech recognition? That is, would adding the NAICS code improve recognition in line with the indicated vertical?
No, they do not have this feature.

Photon Voice chat and Speech to text not working together

I am creating a multiplayer game using Photon. The game also supports Photon Voice.
I want to support a bot mechanism where the user can ask the bot questions. After a predefined wake command (hey dummybot), it should capture the question and convert it to text.
I am using the plugins below:
Photon voice
https://assetstore.unity.com/packages/tools/audio/photon-voice-45848
Speech to text
https://assetstore.unity.com/packages/add-ons/machinelearning/google-cloud-speech-recognition-vr-ar-desktop-desktop-72625
Both of these plugins need access to the microphone.
The problem I am facing: if I am connected to Photon Voice (which captures the speech and transmits it to the other networked players) and at the same time try to convert the same speech to text using the speech-to-text plugin, it does not work. The speech-to-text plugin fails to connect to the microphone because Photon Voice is already using it.
Is it possible to give both plugins microphone access? How can I achieve that?
For anyone who is interested, I found a workaround:
Create an audio clip from the microphone data
Save the audio clip
Pass the audio clip to Photon Voice for network transmission
Convert the audio clip to text using the Google Cloud Speech-to-Text API
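The workaround amounts to capturing the microphone once and handing the same samples to both consumers. Here is a minimal sketch of that pattern in Python (not Unity C#); `capture_clip`, `send_to_voice_network` and `transcribe` are hypothetical stand-ins for the microphone recorder, Photon Voice and the Google Cloud speech-to-text call, not real plugin APIs.

```python
from typing import Callable, List, Sequence


def capture_clip() -> List[float]:
    """Hypothetical stand-in for recording one clip from the single microphone owner."""
    return [0.0, 0.1, -0.1, 0.05]


def fan_out(
    samples: Sequence[float],
    consumers: Sequence[Callable[[List[float]], object]],
) -> list:
    """Give every consumer its own copy of the same captured samples."""
    return [consumer(list(samples)) for consumer in consumers]


def send_to_voice_network(samples):
    # Placeholder: a real project would push the clip to the voice transport here.
    return ("sent", len(samples))


def transcribe(samples):
    # Placeholder: a real project would upload the clip to a speech-to-text API here.
    return ("transcribed", len(samples))


clip = capture_clip()
results = fan_out(clip, [send_to_voice_network, transcribe])
print(results)
```

Because only one component ever opens the microphone, neither consumer competes for the device; each simply receives a copy of the recorded clip.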

Encoding of audio (mp3, mp4, m4a, ogg) files for smooth streaming with Windows Azure Media Services

I want to encode audio files (mp3, mp4, m4a, ogg) for streaming and play the encoded files smoothly using the HTML5 player but I think HTML5 player.
Currently I upload a file and encode it on Windows Azure Media Services using the preset "AAC Good Quality Audio". It encodes the file in the .mp4 format, and then I create a SAS locator to play the file. This works well, but the problem is that the user can also download the file, which I don't want to allow.
If I create an OnDemandOrigin locator for the same encoded asset, it gives me a 404 error, which means the file cannot be played.
Below are the steps I used to upload the file to Azure Media Services:
Create an empty asset.
Upload the file into the asset.
Then create a new job with a task to encode the audio file.
The file is encoded successfully, but when I generate the origin URL and browse to it, I get a 404 error.
My questions:
Is the "AAC Good Quality Audio" preset the right one for my task?
How can I prevent the user from downloading the file if I use a SAS locator?
Is it possible to play the encoded file using an origin locator?
Can I encode audio files for smooth streaming? If so, which player should I use to play the encoded file on all browsers, iOS devices and Android devices?
If you want further details please feel free to ask me.
Awaiting your response.
Thanks
If your user is able to listen to the audio you're publishing, they will also be able to download the file. This you can not prevent. At best, you can make it difficult, but not impossible. More to the point, Media Services at its current incarnation has no way for you to do authorization of any kind, so the only tool you've got is time-bombed SAS locators.
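To illustrate what a "time-bombed" link means in general, here is a minimal sketch of an expiring signed URL built with an HMAC. This is not the Azure SAS format or API; the secret, the query parameter names and both helper functions are assumptions made up for the example.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def sign_url(path, expires_at, secret=SECRET):
    """Append an expiry timestamp and an HMAC signature to a path."""
    message = f"{path}?expires={expires_at}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={signature}"


def is_valid(path, expires_at, signature, now=None, secret=SECRET):
    """Accept the link only if it has not expired and the signature matches."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False  # the link has expired
    message = f"{path}?expires={expires_at}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


url = sign_url("/audio/track.mp4", expires_at=1000)
print(url)
```

A link like this stops working after its expiry time, which limits how long it can be shared, but anyone holding a valid link can still download the file while it is live.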
The typical solution for this problem is to use DRM. Media Services supports PlayReady encryption, but you need to either have a PlayReady server or purchase it as a service (there is currently a service in the Azure Marketplace that provides PlayReady for a monthly price).
See the following article on how to protect assets with Microsoft PlayReady technology.
Origin Locators are something you would use to publish a Smooth Stream or HLS asset. It is not useful for regular media files, as it is internally something equivalent to an IIS Media Services endpoint. For regular media files, you can just as well host them in Blob Storage -- and refer to them via the SAS locator.
There is currently no single format that will play across all devices and operating systems. You can get Smooth Streaming to work on most Windows and Mac computers (possibly Linux, too), either with Silverlight or with the Smooth Streaming Plugin for the Flash-based OSMF. For iOS devices you will need to encode to HLS and use the HTML5 video tag. Microsoft Media Platform will support MPEG-DASH, a recently ratified ISO/IEC standard for dynamic adaptive streaming over HTTP. More details on how to use the DASH preview feature can be found here
If you want smooth streaming for audio only, it looks like you will have to create a video asset with an empty video stream -- although there is a Uservoice request to add support for audio only in the future.

Convert the voice to text in iPhone

I want to ask a question about the iPhone application. Does Apple provide any API to the developers to record the phone call and convert it to text message? Thank you.
In short, there are no APIs for recording phone calls or converting speech to text. You would need to create your own speech recognition engine. I suspect the iPhone hardware will not be powerful enough to handle that type of processing, though.
FYI...
APIs for converting Voice/Audio data in to text
API for Voice recognition in among group
iPhone speech recognition API?