If I use the <say-as interpret-as="characters"></say-as> tag in my voice response, Google Assistant suddenly pronounces the whole response differently. It sounds as if Google Assistant removes punctuation for no apparent reason. The pauses normally produced by the periods are suddenly gone.
To reproduce this behaviour, just launch the Actions on Google simulator, choose "English (United States)" as the language and listen to this audio snippet:
<speak>This is a test with number spelled as digit. The number is <say-as interpret-as="characters">12345</say-as>. Ask me "What to do with this number?" if you'd like to have more information.</speak>
If you remove the <say-as interpret-as="characters"></say-as> tag, the pronunciation works again:
<speak>This is a test with number spelled as digit. The number is 12345. Ask me "What to do with this number?" if you'd like to have more information.</speak>
This sounds the way it should. You get the same result if you leave out the <speak> tag entirely.
In German, this issue is even more critical. Using <say-as interpret-as="characters"></say-as> in a German voice response leads to a response that is honestly barely understandable.
This is caused by "text normalization" in Google Assistant's TTS process.
Because of it, SSML that does not use the <s> element won't always produce the pauses you expect when embedded markup like <say-as> is present.
Here's an example using <s> to provide intended pauses in the TTS:
<speak>
This is a test with number spelled as digit.
<s>The number is <say-as interpret-as="characters">12345</say-as>.</s>
Ask me "What to do with this number?" if you'd like to have more information.
</speak>
As long as you are not using embedded markup like <say-as>, you can still feed multiple sentences separated by periods into SSML without <s> and let Google Assistant handle the break generation.
More information about the <s> element can be found in the docs where they cover <p> and <s>:
https://developers.google.com/actions/reference/ssml
That section links to the W3C spec on those elements.
I am developing an app for French kids to learn English vocabulary. I have added a speech-to-text feature using the SpeechToText package, which works really well.
However, I have now hit a wall. One of the activities proposed to the students is simply "Listen and repeat", so that they progressively improve their pronunciation.
I thought of using the SpeechToText package for this as well, and it would work if the students pronounced the words reasonably well. One example: the sound "TH" is problematic for a French speaker and is very often pronounced as a "Z", so the app never really recognizes a word like "Father"; it keeps thinking the user says "Fazza".
Is there a way to compare the "correct pronunciation" of a word to what the user says and get a percentage of similarity? I know we can compare strings that way.
Would anyone know of a solution for this issue? Any advice?
You can use the speechace APIs to get the following features:
Pronunciation Assessment
Fluency Assessment
Spontaneous speech assessment
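If you only want the rough string-similarity percentage the question mentions as a fallback, a plain edit-distance comparison works. Here is a minimal sketch in TypeScript (the function names and the threshold are illustrative, to be tuned against real recordings):

// Levenshtein distance with a single rolling array.
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0];            // holds dp[i-1][j-1]
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];         // dp[i-1][j]
      dp[j] = Math.min(
        dp[j] + 1,               // delete a[i-1]
        dp[j - 1] + 1,           // insert b[j-1]
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitute
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Similarity as a percentage of the longer string's length.
function similarity(expected: string, heard: string): number {
  const e = expected.toLowerCase().trim();
  const h = heard.toLowerCase().trim();
  const maxLen = Math.max(e.length, h.length) || 1;
  return 100 * (1 - levenshtein(e, h) / maxLen);
}

With this, similarity("father", "fazza") comes out around 33%, so a cutoff somewhere around 70-80% would reject it while still tolerating small recognition glitches. Note that this compares spellings, not sounds; running both strings through a phonetic encoding before comparing would match the pronunciation use case more closely.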
I found that when I send back a response with <emphasis> inside a sentence, Actions on Google treats it as a new paragraph and breaks the sentence.
For example:
<speak>
<p>
Do you like <emphasis level="strong">red</emphasis>, <emphasis level="strong">blue</emphasis> or <emphasis level="strong">green</emphasis> car?
</p>
<p>
Do you like red, blue or green car?
</p>
</speak>
Here is the rendered mp3.
This bug can be reproduced in the TTS Simulator, or by sending it as a response through the official Node.js SDK (a sketch is shown at the end of this post).
I added the two sentences and the <p> tags to highlight the difference, but they aren't actually necessary; the problem can be reproduced without them.
According to the SSML standard, <emphasis> may appear in the middle of a sentence.
In addition, I've tried the same tag on Amazon Alexa, and it does not treat it as a new sentence.
PS:
Moreover, Actions on Google also breaks the rendering of the speech text and adds an extra newline before/after the <emphasis>. But that is the minor problem, because I can work around it with displayText. The broken speech is the more important issue here.
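For reference, the response above is sent through the official Node.js SDK roughly like this (a minimal sketch assuming the actions-on-google v2 Dialogflow client; the intent name is made up):

import { dialogflow } from 'actions-on-google';

const app = dialogflow();

// Hypothetical intent; the SSML string mirrors the example above.
app.intent('car.color', (conv) => {
  conv.ask(
    '<speak>Do you like <emphasis level="strong">red</emphasis>, ' +
    '<emphasis level="strong">blue</emphasis> or ' +
    '<emphasis level="strong">green</emphasis> car?</speak>'
  );
});

The SDK passes the SSML string through into the response untouched, so the unwanted sentence break is introduced on the TTS side, which is consistent with the bug also reproducing in the TTS Simulator.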
I am using the Actions on Google Trivia Game template.
Special characters, such as the parentheses below, are not displayed in the chat window.
In Google Sheets, I have entered the content in the following format:
Question: How to Add an item to the end of the list
Answer1: append()
Answer2: extend()
In Google Assistant, they are displayed without the parentheses. How can I provide questions and answers containing parentheses and other special characters?
This is a good one: it looks like the processor that consumes what you entered removes special characters. That does seem odd when you look at the question and the suggestion chips.
However, it makes sense if you think about how people are expected to answer the question. If you run it in "speaker" mode, it won't display the suggestion chips, but users will be expected to give an answer verbally. It is pretty difficult to speak an answer with parentheses, so the system automatically removes them from what is expected.
I'm developing a new action that sends the word "resume" (as in pause/play/resume) to be spoken. When this happens, Google Home pronounces the word as "résumé".
I know SSML supports the <phoneme> tag to handle pronunciations, but it doesn't look like that is currently implemented on Google Home.
Worst case, I could hack the text to be "re zoom", but I'd rather find a more elegant solution. Ideas?
This isn't an ideal solution, but you can use the <sub> tag for SSML to simulate the correct pronunciation. Try something like:
<sub alias="re zoom">resume</sub>
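In a full response, that might look like this (the surrounding sentence is just an illustration):
<speak>Press play to <sub alias="re zoom">resume</sub> your audio.</speak>
The alias is what gets pronounced, while the element's text remains "resume".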
Watson's speech recognizer supports a list of keywords as a parameter, but I'm trying to figure out whether these keywords actually affect recognition. For example, if you were handing Watson an audio clip you knew to contain proper names that might not be properly recognized, would submitting these names as keywords increase the likelihood that Watson would properly recognize them? Do the keywords interact with the recognition itself?
Unfortunately the answer is no; the words won't get added to the vocabulary just because you added them as keywords, so they won't be found.
Dani
Looks like the answer to this is no (though I'd love to be proven wrong here): I ran a test clip that contained proper names that Watson mis-recognized, then submitted the same clip with those names as keywords, and it didn't change the transcript at all.
Not entirely conclusive, but it seems like the answer is no.
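For context, this is roughly how the keywords parameter is passed (a sketch assuming the current ibm-watson Node.js SDK; at the time of this question the package was still watson-developer-cloud, and the credentials and file name below are placeholders):

import SpeechToTextV1 from 'ibm-watson/speech-to-text/v1';
import { IamAuthenticator } from 'ibm-watson/auth';
import { createReadStream } from 'fs';

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: 'YOUR_API_KEY' }),
  serviceUrl: 'https://api.us-south.speech-to-text.watson.cloud.ibm.com',
});

speechToText.recognize({
  audio: createReadStream('clip.wav'),
  contentType: 'audio/wav',
  // Keyword spotting: each match is reported with a confidence score,
  // but the words are not added to the recognition vocabulary.
  keywords: ['Dani', 'Watson'],
  keywordsThreshold: 0.5,
}).then((res) => console.log(JSON.stringify(res.result, null, 2)));

The matches come back in a separate keywords_result section of the response; the transcript itself is produced by the unchanged base model, which matches what the answers above observed.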