Is API.AI the native way to build conversational skills for Google Assistant?

I have developed a conversational skill using API.AI and deployed it to Google Home, but API.AI's support seems limited and I am unable to do certain things, like playing an audio file. The question I have is whether it's better to stick with API.AI or switch to Actions on Google for the long term.

Google has said that API.AI is the recommended way to build an agent for Actions on Google for those who don't need or want to do their own NLU. They seem to expect that most developers will use API.AI because it does some of the work for you, with the NLU being the prime example; cf. Alexa, where the developer is expected to specify all the different utterance variations for an intent (well, almost all; it will do some minor interpretation for you).
On the other hand, keep in mind that API.AI was created before Actions on Google existed and before it was purchased by Google; it was designed to be a generic bot creation service. So, where you gain something in creating a single bot that can fulfill many different services and having it do some of the messy work for you, you will certainly lose something compared to the power and control you get when writing to the API of one specific service. That is more than just the NLU, in my opinion, though I can't speak to playing an audio file specifically.
So, if you plan to target just the one service (and an audio bot is not relevant to most of the other services supported by API.AI) and you are finding the API.AI interface limiting, then you should certainly consider writing your service with the Actions on Google SDK.
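For the audio case specifically, the Assistant can play short clips through SSML whichever route you choose. Here is a minimal sketch using the actions-on-google Node.js client library; the intent name and the clip URL are made-up placeholders:

```js
// Minimal sketch: returning SSML with an <audio> clip from a fulfillment
// webhook built on the actions-on-google Node.js client library.
// The intent name and audio URL are placeholders.
const { dialogflow } = require('actions-on-google');

const app = dialogflow();

app.intent('play_sound', (conv) => {
  // SSML must be wrapped in <speak>, and the audio file must be hosted
  // at a publicly reachable HTTPS URL.
  conv.close(
    '<speak>Here is the clip: ' +
    '<audio src="https://example.com/clip.mp3">a short sound</audio>' +
    '</speak>'
  );
});

module.exports = app;
```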

Related

Create custom Google Smart Home Action

I have a Google Nest Hub Max and I want to increase its capabilities for a custom need:
"Hey Google, add xyz to my work planning"
Then I want to make an HTTP call to my private server
The private server returns a text
The text is displayed in the Google Nest Hub Max screen + speak-out.
How can that be achieved?
Originally I thought that this would not be difficult. I imagined a Node.js, Java, Python, or whatever framework where Google gives me the xyz text, I do my thing, and I return a simple text. And obviously, Google will handle the intent matching and only call my custom code when users say the precise phrase.
I've tried to search for how to do it online, but there is a lot of documentation everywhere. This post summarizes the situation quite well, but I've never found a tutorial or hello-world example of such a thing.
Does anyone know how to do it?
For steps 2 and 3, I don't necessarily need to use a private server if I can achieve what the private server does inside the Smart Home Action code, mostly with some basic Python code.
First - you're on the right track! There are a few assumptions and terminology issues in your question that we need to clear up first, but your idea is fundamentally sound:
Google uses the term "Smart Home Actions" to describe controlling IoT/smart home devices such as lights, appliances, outlets, etc. Making something that you interact with through the Assistant, including on Smart Speakers and Smart Hubs, means building a Conversational Action.
Most Conversational Actions need to be invoked by name. So you would start your action with something like "Talk to Work Planning" or "Ask Work Planning to add XYZ". There are a limited, but growing, number of built-in intents (BIIs) to cover other verticals, but don't count on them right now.
All Actions are public. They all share an invocation name namespace and anyone can access them. You can add Account Linking or other ways to ensure a limited audience, and there are ways to have more private alpha and beta testing, but there are issues with both. (Consider this an opportunity!)
You're correct that Google will help you with parsing the Intent and getting the parameter values (the XYZ in your example) and then handing this over to your server. However, the server must be at a publicly accessible address with an HTTPS endpoint. (Google refers to this as a webhook.)
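To make that last point concrete, here is a rough sketch of such a webhook in Node.js with the actions-on-google library. The intent name add_task, its task parameter, and the port are illustrative assumptions, not names Google prescribes:

```js
// Sketch of a conversational fulfillment webhook. Assumes a Dialogflow
// intent named "add_task" with a parameter named "task" (both hypothetical).
const express = require('express');
const bodyParser = require('body-parser');
const { dialogflow } = require('actions-on-google');

const app = dialogflow();

app.intent('add_task', (conv, { task }) => {
  // This is where you would forward `task` to the private server and
  // build the reply from its response. A smart display both shows and
  // speaks a plain text reply.
  conv.close(`Added ${task} to your work planning.`);
});

// The fulfillment must be served from a publicly reachable HTTPS address.
express().use(bodyParser.json(), app).listen(8080);
```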
There are a number of resources available, via Google, StackOverflow, and elsewhere:
On StackOverflow, look for the actions-on-google tag. Frequently, conversational actions are built with either dialogflow-es or, more recently, actions-builder, each of which has its own tag. (And when you post your own questions, don't forget to provide code, errors, screenshots, and as much other information as you can to help us help you overcome the issues.)
Google's documentation about how to design and build conversational actions.
Google also has codelabs and sample code illustrating how to build conversational actions. The codelabs include the "hello world" examples you are probably looking for.
Most sample code uses JavaScript with node.js, since Google provides a library for it. If you want to use Python, you'll need to work with the JSON format that the Assistant sends to your webhook and expects back in response; see the sketch after this list.
There are articles and videos written about it. For example, this series of blog posts discussing designing and developing actions outlines the steps and shows the code. And this YouTube playlist takes you through the process step-by-step (and there are other videos covering other details if you want more).
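As promised above, here is a rough sketch of that JSON exchange for a Dialogflow ES webhook handled without the client library, so the shapes are visible. It is written in Node.js for consistency with the samples, but the same JSON applies from Python; the fields are trimmed to the essentials, and the intent and parameter names are placeholders:

```js
// Rough shape of the Dialogflow ES (v2) webhook exchange, without the
// client library. Intent and parameter names are placeholders.
const express = require('express');

express()
  .use(express.json())
  .post('/fulfillment', (req, res) => {
    // Request: the matched intent and extracted parameters arrive
    // under queryResult.
    const intent = req.body.queryResult.intent.displayName;
    const params = req.body.queryResult.parameters; // e.g. { task: 'xyz' }

    // Response: fulfillmentText is the simplest possible reply.
    res.json({ fulfillmentText: `Handled ${intent}: ${params.task}` });
  })
  .listen(8080);
```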

Can I make Google Assistant understand my entities and train it for the same, or do I need Dialogflow?

I know that Dialogflow can be trained on particular entities. But I wanted insight into whether or not Google Assistant itself can understand my entities.
I've tried searching the official site but could not get a clear understanding of whether or not I need to go with Dialogflow.
Actions on Google allows you to extend the Google Assistant by writing your own app (i.e. an Action). In your Action, you can tailor the conversational experience between the Google Assistant and a user. To write an Action you will need a natural language understanding mechanism, which is what Dialogflow provides.
You can learn more about Actions on Google development in the official docs. There are also official informational talks about Actions on Google and Dialogflow online, such as
"An introduction to developing Actions for the Google Assistant (Google I/O '18)"
I'm not quite sure what you mean by your last sentence; there is no way to define entities for the Google Assistant other than Dialogflow. Regarding your question, there is indeed no information on how entities are handled and how good one can reasonably expect the recognition to be. This is especially frustrating for the automated expansion feature, where it is basically a lottery which values will be picked up and which will not. Extensive testing is really the only thing one can do there.
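For what it's worth, here is a sketch of where those entities actually get defined: a custom entity type created in Dialogflow ES through its Node.js client. The project ID, entity name, and values are placeholders, and the flag shown is the automated expansion feature mentioned above:

```js
// Sketch: creating a custom entity type in Dialogflow ES with the
// @google-cloud/dialogflow Node.js client. The project ID, entity name,
// and values are placeholders.
const dialogflow = require('@google-cloud/dialogflow');

async function createProjectEntity() {
  const client = new dialogflow.EntityTypesClient();
  await client.createEntityType({
    // "projects/<project-id>/agent" is the ES agent resource path.
    parent: 'projects/my-project-id/agent',
    entityType: {
      displayName: 'project',
      kind: 'KIND_MAP',
      // Automated expansion lets the agent match values beyond this list;
      // as noted above, which values it picks up is hard to predict.
      autoExpansionMode: 'AUTO_EXPANSION_MODE_DEFAULT',
      entities: [
        { value: 'work planning', synonyms: ['work planning', 'planning'] },
      ],
    },
  });
}

createProjectEntity().catch(console.error);
```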

Certifying an Actions On Google Smart Home App

My app has passed the first review (yay), but it has now been passed to Allion for 'hardware review'.
The issue is that I am not providing hardware; I've provided voice interactions for an open source HA system, which in turn can support hundreds of device types.
The Amazon review process was happy for me to provide credentials to my service, which had access to a subset of device types, so they could QA the interactions.
Is this normal for the review process?
Thus far, many of the supported services have been direct hardware partners who own their own hardware and cloud. If your integration is done in a slightly different way, it may require special instructions for your reviewer that differ from the normal process.
I have had further comment from Google. Unless I can provide one of every type of physical device that the underlying HA system supports, they will not review or certify my app. What this tells me is that either Google isn't committed to small SaaS providers and HA enthusiasts, or this comment is not true. I note that there are several equivalent services on the Smart Home app list which do exactly the same thing as mine; I rather doubt they provided one of every manufacturer's Z-Wave switch, light, etc. Very disappointing; Amazon has a more welcoming approach.

Understanding the Microsoft bot platform

My company has started looking into using a platform to generate chat bots, and we came across Microsoft's framework and are considering using it. We have a few concerns that we need to understand better about their product and would appreciate it if you could help us.
1) What kind of support does it give us when using Facebook Messenger, compared to what Facebook gives natively? Things like quick replies, image sending, or buttons on messages: does it support any of that?
2) We would like you to elaborate on exactly what the platform gives us and why we should use it. What we need is to keep all our logic on our own servers and have a platform that interacts with all the messengers for us, so we don't have to write different code for each one.
3) Like question 1, but for Telegram and any other messenger (custom keyboards and things like that)?
Thanks for the help!
Thanks @ejadib
Regarding your second question, your bot's logic does stay within your bot and on your servers. The Bot Framework provides three things:
1) Connectivity services between your bot and the channels your users are on. All of the logic continues to reside in your bot.
2) Optionally, Bot Building SDKs you can use to facilitate dialog within your bot. These are SDKs you would code to, but still deploy to your own servers.
3) A directory where you can optionally publish your bot.
As @ejadib says, where we can be consistent across channels we add functionality to the core API; and where functionality is very specific to a channel, we expose it through the ChannelData property of the C# SDK (sourceEvent in Node).
Regarding 1 and 3, if you want to take advantage of special features or concepts for a channel (Facebook/Telegram), the Bot Framework provides a way for you to send native metadata to that channel, giving you much deeper control over how your bot interacts on it. The way you do this is to pass extra properties via the ChannelData property (in C#).
Some things are already supported in the framework; for example, Rich Cards will render differently depending on the channel.
Here you will find the information (including Facebook and Telegram).
Also, here you can find how, for example, you can use things like quick replies.
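To illustrate, here is a sketch of quick replies sent through the Node SDK of that era (botbuilder v3). The dialog wiring is illustrative; the payload inside sourceEvent is Messenger's own native format and passes through to Facebook untouched:

```js
// Sketch: Facebook Messenger quick replies through the Bot Framework
// Node.js SDK (botbuilder v3). sourceEvent is the Node counterpart of
// the C# ChannelData property.
const builder = require('botbuilder');

const connector = new builder.ChatConnector({
  appId: process.env.MICROSOFT_APP_ID,
  appPassword: process.env.MICROSOFT_APP_PASSWORD,
});

const bot = new builder.UniversalBot(connector, (session) => {
  const msg = new builder.Message(session)
    .text('Was this helpful?')
    // The object under "facebook" is Messenger's native quick_replies
    // format, forwarded to the channel as-is.
    .sourceEvent({
      facebook: {
        quick_replies: [
          { content_type: 'text', title: 'Yes', payload: 'YES' },
          { content_type: 'text', title: 'No', payload: 'NO' },
        ],
      },
    });
  session.send(msg);
});
```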

What is the technology behind Google Buzz?

I am really curious to know how Google Buzz and Facebook implement their comment feature, which updates instantly. Is it similar to Google Wave technology? Are there any resources to learn that technology and implement it on our website?
Thanks!
I work on the Google Buzz team, so hopefully I can give you a good answer for our side of the equation. I obviously won't go into any of the confidential backend stuff, but I'm happy to address the open standards we use and the open source projects involved.
Starting in the UI space, we use technologies like Closure and GWT to build rich, responsive user interfaces. We use a technology vaguely similar to what you see in the Google App Engine Channel API to push real-time updates to the users. GAE is a really good choice for real-time web applications right now.
On the API side of things, we try to use open standards wherever possible. We use the Atom syndication format to enable feed readers to consume Buzz content, and Pubsubhubbub to enable real-time pushes of the content. In fact, we use Pubsubhubbub for our activity firehose — it's possible to subscribe to the entire real-time stream of all updates that happen in Buzz. Needless to say, this sends a massive amount of traffic to your application. On the JSON side of the equation, we use Activity Streams, and we're actively working with the community to refine and improve that specification. Our Atom feeds include Activity Streams as well, but the focus there is on syndication. All our secured API endpoints for Buzz use the OAuth standard for authorization.
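As a rough illustration of the subscriber side of Pubsubhubbub (all URLs below are placeholders, and this assumes Node 18+ for the global fetch):

```js
// Rough sketch of a Pubsubhubbub subscriber in Node.js.
const express = require('express');

const HUB = 'https://pubsubhubbub.appspot.com/';         // hub endpoint
const TOPIC = 'https://example.com/feeds/updates.atom';  // feed to follow
const CALLBACK = 'https://my-app.example.com/push';      // our endpoint

// 1. Ask the hub for a subscription (form-encoded POST per the spec).
fetch(HUB, {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    'hub.mode': 'subscribe',
    'hub.topic': TOPIC,
    'hub.callback': CALLBACK,
  }),
});

express()
  // 2. The hub verifies our intent with a GET; echo hub.challenge back.
  .get('/push', (req, res) => res.send(req.query['hub.challenge']))
  // 3. New entries arrive as POSTs containing the updated feed body.
  .post('/push', express.text({ type: '*/*' }), (req, res) => {
    console.log('pushed update:', req.body.slice(0, 200));
    res.sendStatus(204);
  })
  .listen(8080);
```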
On the backend, I think the only thing we're willing to say publicly is that Protocol Buffers are pretty awesome.
The technology is called the Real-time Web (http://en.wikipedia.org/wiki/Real-time_web). There are many application models for achieving real time, and one of them is Comet (http://en.wikipedia.org/wiki/Comet_%28programming%29). A good server to use in your implementation is APE (http://www.ape-project.org/); it supports many common JavaScript frameworks. You can find more in the links provided.
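To give a feel for Comet, here is a toy long-polling sketch in Node.js with Express; real servers like APE add connection management, fan-out, and fallbacks on top of the same idea:

```js
// Toy Comet-style long poll: the server parks each GET until a new
// comment arrives, so connected clients see it almost instantly.
const express = require('express');

const waiting = [];  // responses parked until there is something to say
const app = express();

app.get('/poll', (req, res) => {
  waiting.push(res);
  // Give up after 30 seconds so proxies don't kill the connection;
  // the client simply re-polls on an empty result.
  setTimeout(() => {
    const i = waiting.indexOf(res);
    if (i !== -1) {
      waiting.splice(i, 1);
      res.json({ comments: [] });
    }
  }, 30000);
});

app.post('/comment', express.json(), (req, res) => {
  // Flush the new comment to every parked client at once.
  while (waiting.length) {
    waiting.pop().json({ comments: [req.body.text] });
  }
  res.sendStatus(201);
});

app.listen(8080);
```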