I'm working up a little Dialogflow application and would like to be able to ask it to play music implicitly, much like what happens if I "link" a Pandora account to Google Home.
For instance
"Hey Google, play some jazz in the kitchen"
I have my app properly recognizing and handling the intents when they are directly requested, as in...
"Hey Google, ask My Wizzy App to play some jazz in the kitchen"
But that gets cumbersome quickly.
I've set up the intents such that they should be recognized as implicit intents (via the Google Assistant integrations page), but they don't actually work.
Unfortunately, implicit invocation is not guaranteed and, for this reason, may not be optimal for the use case you've described here. The likelihood of implicit invocation for a given user is based on various factors, including the relevance and quality of the app. You can find out how to better optimize your app for implicit invocation here.
I have a Google Nest Hub Max and I want to increase its capabilities for a custom need:
"Hey Google, add xyz to my work planning"
Then I want to make an HTTP call to my private server
The private server returns a text
The text is displayed in the Google Nest Hub Max screen + speak-out.
How can that be achieved?
Originally I thought this would not be difficult. I imagined a Node.js, Java, Python, or whatever framework where Google gives me the xyz text, I do my thing, and I return a simple text response. And obviously, Google would handle the intent matching and only call my custom code when users say the precise phrase.
I've tried to search for how to do this online, but there is a lot of documentation everywhere. This post summarizes the situation quite well, but I've never found a tutorial or hello world example of such a thing.
Does anyone know how to do it?
For steps 2 and 3, I don't necessarily need to use a private server if I can achieve what the private server does inside the Smart Home Action code itself, mostly with some basic Python code.
First - you're on the right track! There are a few assumptions and terminology issues in your question that we need to clear up first, but your idea is fundamentally sound:
Google uses the term "Smart Home Actions" to describe controlling IoT/smart home devices such as lights, appliances, outlets, etc. Making something that you converse with through the Assistant, including on Smart Speakers and Smart Hubs, means building a Conversational Action.
Most Conversational Actions need to be invoked by name. So you would start your action with something like "Talk to Work Planning" or "Ask Work Planning to add XYZ". There are a limited, but growing, number of built-in intents (BIIs) to cover other verticals - but don't count on them right now.
All Actions are public. They all share an invocation name namespace and anyone can access them. You can add Account Linking or other ways to ensure a limited audience, and there are ways to have more private alpha and beta testing, but there are issues with both. (Consider this an opportunity!)
You're correct that Google will help you with parsing the Intent and getting the parameter values (the XYZ in your example) and then handing this over to your server. However, the server must be at a publicly accessible address with an HTTPS endpoint. (Google refers to this as a webhook.)
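To make that concrete, here is a minimal sketch of such a webhook in Python, assuming you go the Dialogflow ES route mentioned below (the Actions Builder webhook JSON is different). The /webhook path, the "task" parameter name, and the reply wording are all made up for illustration; the queryResult and fulfillmentText fields follow the Dialogflow ES webhook format.

```python
# Minimal Dialogflow ES fulfillment webhook (a sketch, not a drop-in solution).
# Assumes an intent with a parameter named "task" (hypothetical) and that this
# server is reachable at a public HTTPS address, as Google requires.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json(force=True)

    # Dialogflow ES sends the matched intent and its parameters under queryResult.
    query_result = body.get("queryResult", {})
    task = query_result.get("parameters", {}).get("task", "something")

    # This is where your "private server" logic would go (store the task, etc.).
    reply = f"Okay, I added {task} to your work planning."

    # fulfillmentText is the default response; the Assistant integration speaks
    # it and, on devices with a screen, also displays it.
    return jsonify({"fulfillmentText": reply})

if __name__ == "__main__":
    app.run(port=8080)
```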
There are a number of resources available, via Google, StackOverflow, and elsewhere:
On StackOverflow, look for the actions-on-google tag. Frequently, conversational actions are built either with dialogflow-es or, more recently, actions-builder, each of which has its own tag. (And when you post your own questions, don't forget to provide code, errors, screenshots, and as much other information as you can to help us help you overcome the issues.)
Google's documentation about how to design and build conversational actions.
Google also has codelabs and sample code illustrating how to build conversational actions. The codelabs include the "hello world" examples you are probably looking for.
Most sample code uses JavaScript with Node.js, since Google provides a library for it. If you want to use Python, you'll need to work directly with the JSON format that the Assistant sends to your webhook and expects back in response (the sketch above shows the general shape for a Dialogflow ES webhook).
There are articles and videos written about it. For example, this series of blog posts discussing designing and developing actions outlines the steps and shows the code. And this YouTube playlist takes you through the process step-by-step (and there are other videos covering other details if you want more).
I know that Dialogflow can be trained for particular entities. But I wanted some insight into whether or not Google Assistant can understand my entities.
I've tried searching the official site but could not get a clear understanding of whether or not I need to go with Dialogflow.
Actions on Google will allow you to extend Google Assistant by writing your own app (i.e. an Action). In your Action, you can tailor the conversational experience between the Google Assistant and a user. To write an Action, you will need a natural language understanding mechanism, which is what Dialogflow provides.
You can learn more about Actions on Google development in the official docs. There are also official informational talks about Actions on Google and Dialogflow online, such as
"An introduction to developing Actions for the Google Assistant (Google I/O '18)"
I'm not quite sure what you mean by your last sentence; there is no way to define entities for Google Assistant other than through Dialogflow. Regarding your question, there is indeed no information on how entities are handled and how good one can reasonably expect the recognition to be. This is especially frustrating for the automated expansion feature, where it is basically a lottery which values will be picked up and which will not. Extensive testing is really the only thing one can do there.
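For reference, defining a custom (developer) entity with synonyms in Dialogflow ES can be done in the console UI or programmatically. The sketch below uses the current google-cloud-dialogflow Python client; the entity name "music-genre" and its values are made up, and the auto_expansion_mode field is the setting behind the automated expansion behaviour discussed above.

```python
# Sketch: create a developer entity type in Dialogflow ES with the
# google-cloud-dialogflow client. Entity name and values are hypothetical.
from google.cloud import dialogflow

def create_genre_entity(project_id: str):
    client = dialogflow.EntityTypesClient()
    parent = dialogflow.AgentsClient.agent_path(project_id)

    entity_type = dialogflow.EntityType(
        display_name="music-genre",  # hypothetical entity name
        kind=dialogflow.EntityType.Kind.KIND_MAP,
        # Automated expansion lets Dialogflow match values you did not list;
        # as noted above, how well unseen values are picked up is hard to predict.
        auto_expansion_mode=dialogflow.EntityType.AutoExpansionMode.AUTO_EXPANSION_MODE_DEFAULT,
        entities=[
            dialogflow.EntityType.Entity(value="jazz", synonyms=["jazz", "smooth jazz"]),
            dialogflow.EntityType.Entity(value="rock", synonyms=["rock", "classic rock"]),
        ],
    )
    return client.create_entity_type(
        request={"parent": parent, "entity_type": entity_type}
    )
```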
My project on Actions On Google is not getting approved and I'm struggling to find a good reason.
Let's say the name of my app is <appname>. The <appname> is a simple two-word phrase and is not a duplicate of any existing app in the store.
The invocation names configured are:
Talk to <appname>
Play <appname>
Launch <appname>
Open <appname>
Speak to <appname>
While testing on a Google Home Mini, all invocations worked flawlessly. However, the Google review team replied saying that all invocations other than Talk to <appname> are failing.
Thereafter, I tested with Google Assistant on an iPhone. Surprisingly, although the phrases are interpreted correctly (evident from the speech-to-text), every invocation other than Talk to <appname> fails.
They're suggesting I submit only the working invocation, but that will limit access to my app.
Precursor
I realize this isn't exactly a code question, but I believe it belongs on Stack Overflow. There's JSON, queries, and invoking methods through voice involved. If it were about other metadata, such as the description and privacy policies, then it would be inappropriate, in my opinion.
I'll go ahead and respond to the question. Please don't hold the validity of the question against me.
Background
I've been building an Action on Google with the Actions SDK. While you're using Dialogflow, some of the information I learned today should still be helpful. Keep in mind I don't work on Actions on Google, so this is just a response from another user. It's also my first Action, and I'm learning with you.
Solution
I think the issue with your configuration is the trigger words of your invocations. I'm still talking with one of the product managers, but it seems only certain trigger phrases are allowed. The format of an invocation is:
[trigger] + [your action name] + to + [action invocation phrase]
If you look at the Language and Locales doc, you will see:
Docs: The basic verbs to trigger an app by its name are: talk, speak, and ask. Here are some example phrases that users can say to trigger your apps.
"let me talk to $name"
"I want to talk to $name"
"can I talk to $name"
"talk to $name"
...
Therefore, some of your trigger phrases are invalid. (Mine were too, and I'm going to need to fix them for resubmission.)
You: However, the Google review team replied saying that all invocations other than Talk to <appname> are failing.
Talk to <appname> is working because it uses one of the three permitted English trigger phrases (talk).
I'm surprised the other invocations worked on the Google Home Mini. When I added more invocations through the Actions SDK using other triggers, they would not invoke the action. I can pass this along as a potential bug, since, according to your report, invalid triggers work with Dialogflow on test devices.
I'll follow up once my assumptions on trigger phrases are confirmed and will let you know if I learn anything notable.
Edit: One more note: I agree that more trigger phrases are important for app discovery, and I'm trying to find out if they can be added. From what I understand, some, like play, are disabled because they're used for media purposes, i.e. "Hey Google, play [some song]."
I'm creating a setup with a Google Assistant/Home that should IDEALLY respond to the phrase "Okay Google, show pictures of [PARAMETER PHRASE]" by giving me the parameter phrase. It also HAS to be able to function like a regular Home ("Hey Google, how far away is the moon", "... tell me a joke", etc.) without me having to reimplement all of that functionality (unmatched phrases should fall back to the Google Home).
If I use the Home, I'm afraid I won't be able to avoid "... tell [MY APP NAME] to ...", but it has a great mic and speaker built in.
I am alternatively looking into a Raspberry Pi solution for the added layer of control, but the Home has a fantastic mic and speaker already. And importantly, I absolutely don't want to recreate the core Google Home features (possibly by passing off uncaught phrases to the Google Home backend?).
I can mask some non-parameterized commands with Assistant Shortcuts ("Okay Google, cat time!", "Hey Google, show me cats") in order to simplify the call phrase, but that does not work here because shortcuts aren't parameterizable.
TL;DR: I have a setup that needs to (1) work like a normal Google Home but (2) have additional functionality that I implement. I would like to (3) avoid having to say "... tell MY TARGET APP to [...]", but I need (4) parameters to be passed to my code, even if completely unparsed.
What are my options?
There are a bunch of possible approaches here, depending on the exact angle you want to tackle this from. None are really perfect at this time, but since everything is evolving, we'll see what develops.
It sounds like you're making an IoT picture frame or something like that? And you want to be able to talk to it? If so, you may want to look into the Assistant SDK, which lets you embed the Assistant into your IoT device. This would let you implement some voice commands yourself, but pass other things off to the Assistant to handle.
But this isn't a perfect solution, since it splits where the voice recognition is done from where it is applied, and it may not give you the hotword triggering.
It is also still in an early Developer Preview, so things might change, and it may evolve to be something closer to what you want... but it is difficult to tell right now.
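As a sketch of what that split looks like in practice: the (since-deprecated) Python google-assistant-library from the Assistant SDK lets you watch the recognized transcript, claim the phrases you care about, and let everything else fall through to the normal Assistant. The "show pictures of" matching and the show_pictures handler below are hypothetical, credential loading is omitted, and the API may well have changed since this preview.

```python
# Sketch using the Assistant SDK's Python google-assistant-library (developer
# preview, later deprecated): intercept one phrase locally, pass the rest on.
from google.assistant.library import Assistant
from google.assistant.library.event import EventType

def show_pictures(query):
    # Hypothetical local handler for the picture-frame behaviour.
    print("Would display pictures of:", query)

def run(credentials, device_model_id):
    with Assistant(credentials, device_model_id) as assistant:
        for event in assistant.start():
            if event.type == EventType.ON_RECOGNIZING_SPEECH_FINISHED:
                text = event.args["text"].lower()
                if text.startswith("show pictures of"):
                    # Claim this phrase: stop the Assistant's own handling
                    # and use the rest of the utterance as our parameter.
                    assistant.stop_conversation()
                    show_pictures(text[len("show pictures of"):].strip())
            # Anything we don't claim is handled by the Assistant as usual.
```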
Depending on the IoT appliance you're working on, you may be able to leverage the built-in commands by building a Smart Home Action. However, at the moment, these have a fairly limited set of appliance types they can work with. It also sounds like you're trying to deal with media control - which isn't something that Smart Home directly works with, and is (hopefully) a future Action API (there were some hints about this at I/O, with Cast compatibility promised... but no details).
If you really want to build for the Home and Assistant, you'll need to work within the limitations of Actions on Google. And that does include some issues with the triggering name.
However... one good strategy is to pick a name that works well with the prefix phrases that are used. Since "Ask" is a legitimate prefix that Home handles, you could plan for a triggering name such as "awesome photo frame", and make the command "Ask awesome photo frame to show pictures of something".
More risky, since it isn't clearly documented, but it seems that some triggering names work without a prefix at all. So if your application is named "fly to the moon", it seems like you can say "Hey Google, fly to the moon" and the action will be triggered. If you can get a name like this registered, it will feel very natural for the user.
Finally, you can pick a reasonable name, but have your users set an alias or shortcut that makes sense to them. I'm not sure how this would fit in with solution (1), but being able to predefine shortcuts would make it pretty powerful.
You can't invoke your app without first connecting to it using "Ok Google, talk to my app", because otherwise it would be like talking to the core Assistant, not your app.
Google doesn't allow talking to an app without invoking it first.
I have developed a conversational skill using API.AI and deployed it to Google Home, but API.AI's support seems limited and I am unable to do certain things like playing an audio file. The question I have is whether it's better to stick with API.AI or switch to Actions on Google for the long term.
Google has said that API.AI is the recommended way to build an agent for 'actions on google' for those who don't need/want to do their own NLU. They seem to expect that most developers will use API.AI because it does some of the work for you, with the NLU being the prime example, cf. Alexa, where the developer is expected to specify all the different utterance variations for an intent (well, almost all - it will do some minor interpretation for you).
On the other hand, keep in mind that API.AI was created/designed before 'actions on google' existed and before it was purchased by Google - it was designed to be a generic bot creation service. So, where you gain something in creating a single bot that can fulfill many different services and having it do some of the messy work for you, you will certainly lose something compared to the power and control you have when writing to the API of one specific service - something more than just the NLU, IMO, though I can't speak to playing an audio file specifically.
So, if you plan to target just the one service (and an audio bot is not relevant to most of the other services supported by API.AI) and you are finding the API.AI interface limiting, then you should certainly consider writing your service with the 'actions on google' SDK.