How to use the full Dialogflow API functionality (and get speech recognition accuracy) through Google Assistant / Google Home?

I have a dialog agent created through DialogFlow. I want to have a conversation with this agent on a Google Home device.
The problem:
The Dialogflow API (e.g. dialogflow-nodejs-client-v2) gives full access to agents built in Dialogflow. Most importantly, users can interact with the system through either text input or speech input (as a .wav file or an audio stream). When you send a request to the Dialogflow agent (e.g. detect intent from audio), it returns a rich response object which, crucially, includes a "speechRecognitionConfidence" value.
But when interacting with the dialogue agent through a Google Assistant app, the request object sent to a webhook is missing the "speechRecognitionConfidence" value. This means that:
I don't have the input audio
I don't have the ASR confidence
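For comparison, here is roughly what the direct detect-intent-from-audio call looks like with the Python client (google-cloud-dialogflow; the Node client exposes the same fields). The project ID, session ID, and audio file are placeholders:
from google.cloud import dialogflow

session_client = dialogflow.SessionsClient()
session = session_client.session_path("my-project-id", "my-session-id")

# Describe the audio being sent (16 kHz linear PCM in this sketch)
audio_config = dialogflow.InputAudioConfig(
    audio_encoding=dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16,
    language_code="en-US",
    sample_rate_hertz=16000,
)
query_input = dialogflow.QueryInput(audio_config=audio_config)

with open("utterance.wav", "rb") as f:
    input_audio = f.read()

response = session_client.detect_intent(
    request={
        "session": session,
        "query_input": query_input,
        "input_audio": input_audio,
    }
)

# Available here, but absent from the webhook request the Assistant sends:
print(response.query_result.speech_recognition_confidence)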
Questions:
Is it possible to send the ASR confidence (and any other useful info) to a webhook?
Is there another way to access the ASR confidence (i.e. by making an API call)?
Is there a way to run a program built using the dialogflow API on a Google Home (or through the google assistant)?
Thank you in advance for any help. I've been struggling through endless documentation without success.

Related

Convert a text to voice recording and then play it after calling a phone number on Flutter

I'm trying to make an app which gets the location of a user and then calls a number (pre-selected by the user) to play a recorded voice message, e.g. "Location is {longitude} degrees, {latitude} degrees". I can't find anything that does this. I found Twilio while searching, but in the Flutter package twilio_voice I can't find anything that does what I want.
So my question is: can this be done with Twilio, and if not, what else can I do?
(Edit: I have figured out the part that fetches the user location and displays it on the screen; the part that remains is to convert this to speech and play it on the call.)
You can achieve this with Twilio; however, you will need a server from which to make the requests to the Twilio API. To make an API call to Twilio you need your Account SID and Auth Token, and if you were to embed those within your Flutter application, a malicious user would be able to decompile your app, steal your credentials, and abuse your account.
So, the best way is to build a server-side application that can make the API calls on behalf of your app and then send the relevant data (the location, in this case) from your mobile app to the server application.
In the server-side application, once you receive the location data you can generate a call by making a request to the Twilio calls resource, passing the number you want to call, the number you are calling from, and the TwiML you want to execute on the call. TwiML tells Twilio what to do on a call; in this case you can direct Twilio to <Say> the message with the location.
I see you've asked Python questions in the past, so here's a quick example of calling the Twilio API to create a call in Python:
import os
from twilio.rest import Client

# Find your Account SID and Auth Token at twilio.com/console
# and set the environment variables. See http://twil.io/secure
account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']
client = Client(account_sid, auth_token)

# longitude and latitude hold the location data received from the Flutter app;
# TO_NUMBER and YOUR_TWILIO_NUMBER are placeholders for real phone numbers.
call = client.calls.create(
    twiml=f'<Response><Say>Location is {longitude} degrees, {latitude} degrees</Say></Response>',
    to=TO_NUMBER,
    from_=YOUR_TWILIO_NUMBER
)
print(call.sid)
You need to divide your problem into smaller subproblems and check whether the native OS (Android/iOS) allows each of them:
Fetch the current user location
Create an audio file with a text-to-speech generator
Research whether it's possible to make calls programmatically (this might be tricky)
Make a phone call
Research whether it's possible to play an audio file while in a call
Play an audio file on the speaker while in a call

give Google Assistant device commands programmatically

Is it possible to give Google Assistant commands programmatically? For example, I'd like to be able to send a command as text "turn on the fan" and have GA react as if that was the spoken command. I would also accept sending a JSON request in whatever format needed (with device IDs or whatever the API needs).
My situation is I have a ceiling fan that is controlled by Google Assistant. I want to be able to control it programmatically. For example, some event happens and my code wants to turn the fan on. Is there any way my code can tell GA to turn on the fan?
I tried using the Google Assistant SDK. I can send it text like "what time is it?" and get back text and audio, e.g. "It is 11:00am". However, I have a test device called "washer", and if I send the text "is the washer running?" I get back "Sorry, I didn't understand". If I speak the words into my phone, I get back "The washer is running".
Why can't the GA SDK interact with my device? The credentials I give to the GA SDK are the same I use for my SmartHomeApp that defines the "washer" device.
To do this, you can set up a virtual Assistant device and then send commands to it.
Check out Assistant Relay, which is a service that sets up a virtual Assistant device and exposes a REST API so you can send text commands to it, as if they were spoken.
Per the documentation:
Simply send Assistant Relay any query you would normally send Google Assistant, and Assistant Relay will call the Assistant SDK and execute your command.
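As a rough sketch, talking to Assistant Relay is then a single HTTP POST from any language. The host, port, endpoint path, and user name below are assumptions about a typical local deployment; check the Assistant Relay documentation for the exact request shape:
import requests

# Hypothetical local Assistant Relay instance; adjust host/port to your setup.
RELAY_URL = "http://192.168.1.50:3000/assistant"

resp = requests.post(RELAY_URL, json={
    "command": "turn on the fan",  # text handled as if it were spoken
    "user": "me",                  # a user configured in Assistant Relay
})
print(resp.json())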
As for the problem you are having with the Google Assistant SDK, I believe what you are trying to achieve is only possible with a device, be it physical or virtual, and not by using the SDK directly.
There are a lot of firewall and security issues in allowing each smart device to connect to the Internet. To alleviate this problem, Google's design methodology uses a fulfillment device as a bridge to connect to the device locally from one of their devices.
You are locally, on your smartphone, hooking into Google Assistant.
The phone is the fulfillment glue for the "washer" device.
According to this page:
Google Home or Google Nest device is required to perform the fulfillment.
Due to the portable nature of cell phones, it does not make sense to allow one to be used as the fulfillment device remotely, hence the local hook.

Skill closing and Google opens recipe

We are developing interactive audiobooks for voice and have problems with some of our continuations with Google Assistant.
Example: In our story "Das tapfere Schneiderlein" ("The Brave Little Tailor"), the user has to decide whether he wants "Pflaumenmus" (plum butter) or "Apfelmus" (apple puree).
In the test console, everything works fine; both answers lead to the correct audio.
BUT with Google Assistant on a mobile device, only "Pflaumenmus" works. If I answer "Apfelmus", the Action leaves the conversation and opens apple puree recipes with Google Search. (See the example image below; it's German, but still understandable, I guess.)
As we can never know what our customers might answer, how can we prevent this from happening? (We are using Actions Builder.)
[Example image: the Assistant leaves the Action and shows Google Search results for apple puree recipes]
This might be a result of the change in Google Assistant Actions fallback intent behavior that was announced on October 15, 2020.
The message from Google below explains the change and how to make your Action work as you expect:
In order to provide a better experience, we now allow users to ask for some Assistant features, such as the weather or time, from within your Action. To perform this function, the Assistant detects if your Action matched a user's query with a fallback intent or NO_MATCH intent. If that is the case, and an appropriate response is available, Assistant responds to the user's request. If no response is available, or Assistant doesn't understand the query, the conversation continues within your Action.
As of October 15, 2020, this new behavior applies only if the fallback does not use a webhook. Starting January 15th 2021, we'll start enabling this feature for any Dialogflow fallback intent or Actions Builder NO_MATCH intent whether or not they use a webhook.
This change should not impact the operation of your Actions, unless you are using fallbacks as a way to collect input from your users. Going forward, you should only use fallback intents or NO_MATCH intents as a way to reprompt the user in the context of your Action. If you want your Actions to attempt to capture data from a wider range of user responses, create an intent that uses a Free form text type if you use Actions Builder. If you use Dialogflow, add an intent with a @sys.any type as the training phrase.

Is it possible to retrieve the configured rooms/locations in the fulfillment service?

I have been experimenting with Google Smart Home and the protocol flow looks very clear to me. In summary:
action.devices.SYNC - sent by Google Smart Home to the fulfillment service to find out the available devices
action.devices.EXECUTE - sent by Google Smart Home to the fulfillment service to execute a certain action on a device
On the smartphone/tablet, the customer can place a device in a certain location. This allows him to ask questions such as "Turn everything in my office off". Internally, Google Smart Home knows which devices are located in the office, and subsequently sends an action.devices.EXECUTE action for each device in the office, as explained above.
I am now wondering about the following: is it possible to retrieve the configured locations/rooms in the fulfillment service as well? Is this information exposed and available to retrieve?
It is not possible to receive information about a user's home layout through the Home Graph API. When the user gives a command like "Turn everything in my office off", you may get several OnOff commands in your fulfillment, although you will have no way of knowing the original query.
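For illustration, a fulfillment handling "Turn everything in my office off" for two office devices might receive an EXECUTE payload shaped roughly like this (the request ID and device IDs are hypothetical). Note that nothing in it names the room or carries the original query:
# Sketch of an EXECUTE request body, written as a Python dict
execute_request = {
    "requestId": "12345",  # hypothetical
    "inputs": [{
        "intent": "action.devices.EXECUTE",
        "payload": {
            "commands": [{
                "devices": [
                    {"id": "lamp-123"},  # hypothetical device IDs
                    {"id": "fan-456"},
                ],
                "execution": [{
                    "command": "action.devices.commands.OnOff",
                    "params": {"on": False},
                }],
            }],
        },
    }],
}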

Make authorization in app (voice recognition)

I created an app on Google Home, and I want the app to work only if it is MY voice that asks it to perform the skills on my Google Home.
How can I do that? Is this possible? I configured my Google Home at the beginning so that it recognizes my voice; now how do I make that mandatory for this app?
I'm trying to make my app secure because it performs banking operations by voice.
Each user request is sent to your application with a unique anonymous UserID. You will need to determine the UserID that belongs to your account (by looking at your application's logs to see which value is yours) and reject requests from other UserIDs.
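A minimal sketch of that check, assuming a Dialogflow fulfillment webhook built with Flask. The route and the allowed ID are placeholders; the anonymous user ID is read from originalDetectIntentRequest.payload.user.userId, where Assistant requests historically carried it:
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder: the anonymous UserID you observed for yourself in your logs.
ALLOWED_USER_ID = "your-user-id-from-logs"

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json()
    # The Assistant user ID arrives nested in the original detect intent request
    user_id = (
        body.get("originalDetectIntentRequest", {})
            .get("payload", {})
            .get("user", {})
            .get("userId")
    )
    if user_id != ALLOWED_USER_ID:
        return jsonify({"fulfillmentText": "Sorry, this action is private."})
    return jsonify({"fulfillmentText": "Welcome back."})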
Even better would be to set up a proper Account Linking system.
Keep in mind, however, that the voice authentication system isn't perfect, and there is a slight but real chance that others could duplicate the request, either by using a recording of your voice or by having a similar voice. Consider all the risks when designing such applications.