Can Google Cloud Text-to-Speech return the IPA transcription for a word? - google-text-to-speech

If I send kittens to Google Cloud Text-to-Speech can it return ˈkɪtnz? I need a Text-to-Speech service that provides both audiofiles and IPA transcriptions. The Oxford English Dictionary and IBM Watson will return both. Can Google Cloud do this?

Yes, Google Cloud Text-to-Speech can synthesize text annotated with IPA transcriptions by using the Speech Synthesis Markup Language (SSML) phoneme tag. The tag lets you supply a custom pronunciation, so you keep the IPA transcription on your end alongside the generated audio.
The following SSML produces the pronunciation of kittens from its IPA transcription in English (United Kingdom), en-GB:
<speak>
<phoneme alphabet="ipa" ph="ˈkɪtnz">kittens</phoneme>
</speak>
Take note that each language has its own set of supported phonemes and levels of stress; ˈkɪtnz is valid under English (United Kingdom). You can go through this documentation to see the phonemes and levels of stress supported for each language.
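As a sketch, you can build the SSML request body programmatically before sending it to the API. The `phoneme_ssml` helper below is a hypothetical name, not part of the client library; it just assembles the markup shown above with basic XML escaping:

```python
from xml.sax.saxutils import escape

def phoneme_ssml(word: str, ipa: str) -> str:
    """Wrap a word in an SSML <phoneme> tag carrying its IPA transcription."""
    # Escape XML special characters; quotes in the attribute value as well.
    attr = escape(ipa, {'"': "&quot;"})
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph="{attr}">{escape(word)}</phoneme>'
        "</speak>"
    )

print(phoneme_ssml("kittens", "ˈkɪtnz"))
```

The resulting string would then be passed as the SSML input of a synthesis request (for example, via the `google-cloud-texttospeech` client).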

Related

Does using saved Google Text-to-Speech audio files violate Google Cloud Usage Terms?

My app has a list of fixed paragraphs that need to be converted to speech. I plan to use Google's Text-to-Speech API to synthesize them once and download the audio files, so that I don't need to constantly call the API, considering that the paragraphs, once again, do not change.
Does this violate the Google Cloud Terms of Service restrictions?
Good news: caching synthesized audio files to avoid re-synthesis and reduce cost is allowed with Google Text-to-Speech; it is even promoted as one of their documented use cases.

Do APIs like Vision, Speech & Video Intelligence require Google Cloud infrastructure?

To use APIs like Vision, Speech & Video Intelligence in a web application, is it mandatory to enrol for Google Cloud infrastructure?
"enrol for google cloud infrastructure" - Yes you need to create a project on google cloud console with your gmail ID.
Google is giving USD300 credit with your first project.
However you don't need to start a compute engine to use speech recognition services.
You will have to create buckets in google storage to upload long audio sequences in FLAC/PCM format.

Speech in a different language

I need to change the speech language for a specific response. I know I can change the TTS voice for the whole app, but I have not found a way to do that for a response. In this case, the supported user locales are English and German, but the text I want Google Assistant to speak is in Korean.
Interestingly, there is no problem if the user locale is German and the text is in English. However, when I tried to create a response with Korean text, there was no audio feedback.
Unfortunately, the Actions on Google platform does not support in-dialog language changes. The case you've outlined may work only because certain languages support a subset of words from another language within the primary language.
One alternative you might consider here is using recorded spoken audio through SSML. This is a popular way to insert custom audio output into your app, which may make sense for your use case.
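Following the document's existing SSML convention, the recorded clip can be referenced with an <audio> tag; the URL below is a placeholder for wherever you host the Korean recording, and the inner text is the fallback spoken if playback fails:

<speak>
  <audio src="https://example.com/audio/korean-response.mp3">
    Sorry, the recorded audio could not be played.
  </audio>
</speak>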

IBM Watson Speech to Text and webm

Currently the IBM Watson Speech to Text service supports only the "ogg" compressed format. However, the new standard for the WebRTC platform is "webm". As a result, we have to either use Firefox or send huge uncompressed "wav" files to Bluemix from the client browser. Is it possible to add support for "webm"?
The service added support for webm on April 10th, 2017. See the release notes. Additionally, here is a list of the audio formats supported by the service.

How to recognize the human voice by code on iPhone?

I want to integrate voice detection in my iPhone app. The app allows the user to search for a word using their voice. But I don't know anything about voice recognition on iPhone. Can you please suggest any ideas, tutorials, or sample code for this?
You can also use the Google Chrome API to integrate voice recognition into your application, but there is a big problem: the API works only with FLAC-encoded files, and this encoding isn't supported natively on iOS... :/
See these two links for more information:
http://www.albertopasca.it/whiletrue/2011/09/objective-c-use-google-speech-iphone/
http://8byte8.com/blog/2012/07/voice-recognition-ios/
EDIT :
I built an application including voice recognition using the Nuance SDK, but it's not free to use. You can register for free and get a developer key that allows you to test your application for 90 days. An application example is included; you can look at the code, and it's very easy to implement.
Good luck :)
The best approach will probably be to:
Record the voice on the phone
Send the recording to a server that runs the speech recognition software
Then return something to the phone to indicate what it should do
This approach is favorable because there are a number of open-source speech-to-text packages out there, and you are not limited by the phone's computing power, since the heavy lifting happens in the backend.
Having said that, iOS has OpenEars, which is based on PocketSphinx. It looks promising...
Well, voice recognition is not something the iPhone provides out of the box. All you can do is record the voice on the iPhone. Once that's done, you can either code your own voice recognition module or find a third-party API and reuse it.
You can do a Google search on that.