One similar phrase but for two different intents - chatbot

I have two intents. Both, however, may use a common phrase of "What is the status of ...". The first intent is for Request Tickets and shall always include a Request Number which starts with REQ. For example, "what is the status of REQ0054896?". The second intent is for the status of a service, such as "What is the status of Google Mail?".
I have made a custom entity for the REQ number, which is defined in Dialogflow as REQ#sys.number-integer:number-interger. I have also trained the agent and checked that the intents are matched; however, it does not always return the correct values.
I'd like it so that whenever the REQ number is entered, it matches it to the entity and understands that the user is asking for the status of a request, rather than for a business service.
As the images above show, the phrase "What is the status of ..." is common to both intents. The screenshot shows that a question containing a REQ number is matched to the Business Service intent.

When you can catch some intents with certainty, as in this situation, adding a simple rule-based intent classifier in front of the main one can save you some headaches.
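A minimal sketch of such a rule-based step, assuming request numbers always look like "REQ" followed by digits. The intent names and the `fake_main_classifier` stand-in are hypothetical; in practice the fallback would be the call to the real classifier.

```python
import re
from typing import Callable, Optional

# Assumption: request numbers are "REQ" followed by digits, e.g. REQ0054896.
REQ_PATTERN = re.compile(r"\bREQ\d+\b", re.IGNORECASE)


def rule_based_intent(utterance: str) -> Optional[dict]:
    """Return an intent only when a rule matches with certainty, else None."""
    match = REQ_PATTERN.search(utterance)
    if match:
        return {"intent": "request_status", "request_number": match.group(0).upper()}
    return None


def classify(utterance: str, fallback: Callable[[str], dict]) -> dict:
    """Try the rules first; hand everything else to the main classifier."""
    return rule_based_intent(utterance) or fallback(utterance)


def fake_main_classifier(utterance: str) -> dict:
    # Hypothetical stand-in for the real NLU call.
    return {"intent": "service_status"}
```

With this in place, `classify("what is the status of REQ0054896?", fake_main_classifier)` is decided by the rule, while "What is the status of Google Mail?" falls through to the main classifier.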
I'm not aware of the details of your algorithm (data, model, ...) but clearly "What is the" part should not be important. One technique to reduce the importance of these types of words is using criteria like tf-idf as a weight function.
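To illustrate the weighting idea, here is a rough sketch that computes a plain, unsmoothed inverse document frequency over a handful of training phrases. Two of the phrases come from the question; the other two are made up for the example, and a real setup would likely use a library implementation with smoothing.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(phrases: List[str]) -> Dict[str, float]:
    """Plain idf: log(N / number of phrases that contain the token)."""
    tokenized = [set(phrase.lower().split()) for phrase in phrases]
    document_frequency = Counter(token for tokens in tokenized for token in tokens)
    total = len(tokenized)
    return {token: math.log(total / count) for token, count in document_frequency.items()}


# The second and fourth phrases are hypothetical examples.
TRAINING_PHRASES = [
    "what is the status of REQ0054896",
    "what is the status of REQ0011111",
    "what is the status of google mail",
    "what is the status of the vpn",
]
IDF = inverse_document_frequency(TRAINING_PHRASES)
```

Tokens that appear in every phrase ("what", "is", "the", "status", "of") get a weight of zero, so only the distinguishing tokens contribute to the match.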


Naming the RESTful API route. Naming of an action on a resource. What is the approach?

I have the following route with cars as a collection resource.
/api/v4/cars
/api/v4/cars/{carId}
Now I want to introduce a "price per car", but this price depends on the input rather than being fixed. So when the user calls the pricing endpoint, they should send some data (e.g. color, engine size, etc.) which then determines the price for car X.
My question now is: what should this route look like?
Do any of these make sense, and what is the general approach one should take in such cases?
/api/v4/cars/{carId}/price
/api/v4/cars/{carId}/calculatePrice
/api/v4/cars/{carId}/getPrice
So when the user calls the pricing endpoint, it should send some data ex. color, engine size etc. which then would determine the price for a car X.
Sounds like submitting a web form; on the web, you would end up with the data appearing as encoded key value pairs in the URI.
GET /2c5d1cd4-0259-4c2b-9ca3-6215426732b8?color=red&transmission=automatic
Aside from the fact that browsers already know how to encode form values into a query string, there's no particular advantage to using a query; you could use path segments instead if you wanted to:
GET /2c5d1cd4-0259-4c2b-9ca3-6215426732b8/color/red/transmission/automatic
Clients that understand how URI templates work can handle just about any layout of information you are interested in. HTML processing on the web isn't quite sophisticated enough to handle arbitrary URI templates; encoded key-value pairs are a nice answer if you care about those use cases.
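As a rough illustration of the two layouts, the sketch below builds both spellings with Python's standard library. The `/api/v4/cars/{carId}/price` prefix is taken from the question; the function names are made up for the example.

```python
from urllib.parse import quote, urlencode


def price_uri_query(car_id: str, options: dict) -> str:
    """Encode the options as key-value pairs in the query string."""
    return f"/api/v4/cars/{quote(car_id, safe='')}/price?{urlencode(options)}"


def price_uri_path(car_id: str, options: dict) -> str:
    """Encode the same options as alternating key/value path segments."""
    segments = [quote(str(part), safe="") for pair in options.items() for part in pair]
    return f"/api/v4/cars/{quote(car_id, safe='')}/price/" + "/".join(segments)
```

Both functions identify the same document; the query form is simply the one an HTML form can produce without help.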
/api/v4/cars/{carId}/price
/api/v4/cars/{carId}/calculatePrice
/api/v4/cars/{carId}/getPrice
These are all "fine"; consumer code really doesn't care. A key idea to keep in mind is that a URI is an identifier; which is to say it is the name of the document (resource) that it fetches. If you have a clear understanding of your domain, it should be relatively straightforward to work out what the name of the document is, and choose an identifier spelling that makes understanding/remembering/guessing easier for human beings.

Handle the cases when users input all one-of optional parameters in API response

An API endpoint I wrote supports two one-of parameters, and users are requested to supply values for one of them only, but ideally, they should not specify both.
{"A": "something", "B": "something"}
When users do not supply any of the two, an exception will be thrown.
However, I'm wondering how I should handle the scenarios when users put in values for both.
For context, A is, loosely speaking, a subset of B. My teammate and I have two opinions:
When users input both, A shall prevail.
When users input both, a 400 exception is thrown to remind the users that we only need one of the two.
Thanks!
However, I'm wondering how I should handle the scenarios when users put in values for both.
You could look to media types as an analog.
That is, suppose we had a specification for application/vnd.example.AorB+json, which defined the A field, the B field, and explicitly expressed that exactly one of those two options should be present.
Now when our endpoint gets a POST, or a PUT, with content-type application/vnd.example.AorB+json, we know the entity included in the body of the request should be a json document with additional constraints on the keys and values.
So by analogy, if the keys and values are wrong, our endpoint should reject the document in the same way that it would if the JSON structure itself were not well formed.
If you buy that logic, then the practical answer is to deploy an endpoint that expects json, send it something broken, and see what response you get back. Then model your own design on that.
The WebDAV standard has an interesting discussion related to its definition of the 422 status code.
The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions.
So you might look at your situation, and try to decide whether including both elements is "syntactically correct" or not, and choose the matching status code.
In practice, it really doesn't matter very much. You SHOULD be explaining the specific problem in the body of the response, so human beings can figure out what happened. The different status codes are really metadata, so that generic components can figure out what to do, but there isn't anything in the standard that tells generic components to handle a 422 differently from a 400.
So don't overthink it.
If it's reasonable/meaningful to expect one property, you should only allow one property.
Strict APIs are helpful because they guide users much better to what the intent of the server and resource is.
One great tool for this kind of thing is something like json-schema, which lets you specify exactly what your server does and does not expect. Being able to specify that only one of two properties may appear is one of its features.
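A strict check along these lines can be sketched in a few lines of plain Python. The function name and the shape of the error body are assumptions for the example; the status code is a parameter because, as discussed above, either 400 or 422 is defensible.

```python
from typing import Sequence, Tuple


def validate_one_of(payload: dict, keys: Sequence[str] = ("A", "B"),
                    error_status: int = 400) -> Tuple[int, dict]:
    """Accept the payload only if exactly one of `keys` carries a value."""
    present = [key for key in keys if payload.get(key) is not None]
    if len(present) == 1:
        return 200, {"accepted": present[0]}
    if not present:
        detail = f"Exactly one of {list(keys)} is required; none was supplied."
    else:
        detail = f"Exactly one of {list(keys)} is required; got {present}."
    # The body explains the specific problem so a human can fix the request.
    return error_status, {"error": detail}
```

In JSON Schema, the same constraint is usually expressed with `oneOf` over two subschemas, one requiring `A` and the other requiring `B`.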

What is the relationship between 'intents' and 'actions' in Dialogflow?

I'm having a bit of trouble conceptualizing the relationship between 'intents' and 'actions' in a Dialogflow agent.
I get that intents map the user's spoken request to a particular feature of my fulfillment service, optionally carrying parameters as input variables. This is how intents are defined in the official documentation:
"An intent represents a mapping between what a user says and what
action should be taken by your software."
But what then are actions? Their definition reads almost exactly the same:
"An action corresponds to the step your application will take when a
specific intent has been triggered by a user’s input."
Actions are defined within the context of intents, which means there can only be one action per intent and an action can't be attached to multiple intents. An action doesn't seem to be more than its name, which is also entirely optional as the intent works the same whether I specify an action name or not.
So what is their purpose? Why would I have my service react to actions instead of intents?
You have one slight misstatement in your question, but it illustrates the difference between a Dialogflow Intent and Action. The statement
an action can't be attached to multiple intents
isn't true. You can use the same Action name for multiple Intent names. In this case, it means that you can use the Action as a map to a function in the fulfillment without having to list each Intent that maps in your code.
In Dialogflow, the Intent does more than just match a particular user phrase - it also is used to match conversations that are in a particular state (determined by Contexts that are set) or for particular non-phrase events. Since you may wish to map several of these to the same action on the back-end (for example, if you have two different Incoming Contexts that you need matched with different user phrases), you can set the same Action for them but use different Intent names to identify them.
Some libraries, such as actions-on-google v2 and multivocal, let you work with either, whichever makes the most sense.
When I name Intents, I will generally start all of them that do roughly the same thing with the same name I use for the Action, but add a suffix indicating why the Intents are different. (With the name of the context, or event, or parameters that are different.)
Update to clarify a few things
I generally use the Action name as the one that triggers my functions. However, there are a few cases where I might still group things by Action (because it makes sense to organize them that way) but carve out an exception for one of the Intents. Think of this as subclassing in an OO model. The rule of thumb is to use the Action name, but don't hold to this rigidly if there is a good reason not to. (An example of this is multivocal: the library defines an "unknown" Action that covers both misunderstood input and no input. Sometimes I want to handle one of these differently, however, so I'll define a handler that works on just the Intent.)
The Action name should be available in the JSON that Dialogflow sends your fulfillment in queryResult.action. I'm not sure why the documentation omits this at the moment.
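A minimal sketch of dispatching on the Action name in a webhook, without any particular library. The action name `status.request`, the `number` parameter, and the intent names in the usage note are hypothetical; only `queryResult.action` is taken from the text above, and the `parameters` and `fulfillmentText` fields are assumed to follow the Dialogflow v2 webhook format.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]
HANDLERS: Dict[str, Handler] = {}


def on_action(name: str) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one Action name."""
    def register(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return register


@on_action("status.request")  # hypothetical Action name
def request_status(query_result: dict) -> str:
    number = query_result.get("parameters", {}).get("number", "unknown")
    return f"Looking up request {number}."


def handle_webhook(body: dict) -> dict:
    """Route on queryResult.action, regardless of which Intent matched."""
    query_result = body.get("queryResult", {})
    handler = HANDLERS.get(query_result.get("action", ""))
    text = handler(query_result) if handler else "Sorry, I can't help with that yet."
    return {"fulfillmentText": text}
```

Two Intents with different names (say, `status.request` and `status.request - followup`) can both set the same Action and will reach the same function, so the code never has to list the Intents.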

Rest API: path for accessing derived data

If I have a microservice whose job is to provide some derived data, it is not clear to me how its REST API should be designed. For instance:
If I have a customer and I want to access the customer I would define the API as:
/customer/1234
this would return everything we know about the customer
However, if I want to provide a microservice that simply tells me whether the customer was previously known to the system under another account number, what do I do? I want this logic to be in the microservice, but how do I define the API?
customer/1234/previouslyKnown
customerPreviouslyKnown/1234
Both don't seem correct. In the first case it implies
customer/1234
could be used to get all the customer information but the microservice doesn't offer this.
Confused!
Adding some extra details for clarification.
I suppose my issue is that I don't really want a massive service which handles everything customer-related. It would be better if there were lighter-weight services that handle customer orders, customer info, customer history, customer status (live, lost, dead....).
It strikes me all of these would start with
/customer/XXXX
so would all the services be expected to return a customer object if only customer/XXXX was given, with nothing extra in the path such as /orders?
Also, some of the data, as mentioned, isn't actually persisted anywhere; it is derived, and I want the logic for this hidden in a service and not in the calling code. So how is this requested and returned?
Doing microservices doesn't mean having a separate artifact for each method. The rules of coupling and cohesion also apply in the microservices world. So if you can query several pieces of data that all relate to a customer, the related resources should probably belong to the same service.
So your resource would be /customers/{id}/previous-customer-numbers whereas /customers (plural!) is the list of customers, /customers/{id} is a single customer and /customers/{id}/previous-customer-numbers the list of customer numbers the customer previously had.
Try to think in resources, not operations. So returning the list of previously used customer numbers is better than returning just a boolean value. /customers/{id}/previous-accounts would be even better, I think...
Back to topic: if the value of previous-accounts is directly derived from the same data, i.e. you don't need to query a second database, etc., I would even recommend just adding the value to the customer representation:
{
"id": "1234",
"firstName": "John",
"lastName": "Doe",
"previouslyKnown": true,
"previousAccounts": [
{
"id": "987",
...
}
]
}
Whether the data is stored or derived shouldn't matter to the service client, so it should not be visible at the boundary.
Adding another resource or even another service is unnecessary complexity and complexity kills you in the long run.
You mention other examples:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Orders is clearly different from customer data so it should reside in a separate service. An order typically also has an order id which is globally unique. So there is the resource /orders/{orderId}. Retrieving orders by customer id is also possible:
/orders;customer={customerId}
which reads: give me the list of orders for which the customer is identified by the given customer id.
Parameters like these, which filter a list-like REST resource, are called matrix parameters. You can also use a query parameter: /orders?customer={customerId}. This is also quite common, but a matrix parameter has the advantage that it clearly belongs to a specific part of the URL. Consider the following:
/orders;customer=1234/notifications
This would return the list of notifications belonging to the orders of the customer with the id 1234.
With a query parameter it would look like this:
/orders/notifications?customer=1234
It is not clear from the URL that the orders are filtered and not the notifications.
The drawback is that framework support for matrix parameters varies. Some support them, some don't.
I like matrix parameters best here, but a query parameter is OK, too.
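Where a framework lacks built-in support, matrix parameters are not hard to pull apart by hand. The following is a rough sketch, not a full implementation of any specification; the function name is made up for the example.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote


def parse_matrix_path(path: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a path into (segment name, matrix parameters) pairs."""
    result = []
    for segment in path.strip("/").split("/"):
        name, *params = segment.split(";")
        pairs: Dict[str, str] = {}
        for param in params:
            if not param:
                continue  # tolerate a trailing ";"
            key, _, value = param.partition("=")
            pairs[unquote(key)] = unquote(value)
        result.append((unquote(name), pairs))
    return result
```

For `/orders;customer=1234/notifications`, the customer filter stays attached to the `orders` segment and the `notifications` segment has no parameters, which is exactly the distinction the query-string form loses.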
Going back to your list:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Customer info and customer status most likely belong to the same service (customer core data or the like) or even the same resource. Customer history can also go there. I would place it there as long as there isn't a reason to think of it separately. Maybe customer history is such a complicated domain (and it surely can be) that it's worth a separate service: /customer-history/{id} or maybe just /customers/{id}.
It's no problem that different services use the same paths for providing different information about one customer. They are different services and they have different endpoints so there is no collision whatsoever. Ideally you even have a DNS alias pointing to the corresponding service:
https://customer-core-data.service.lan/customers/1234
https://customer-history.service.lan/customers/1234
I'm not sure if I really understand your question. However, let me show how you can check whether a certain resource exists on your server.
Consider the server provides a URL that locates a certain resource (in this situation, the URL locates a customer with the identifier 1): http://example.org/api/customers/1.
When a client performs a GET request to this URL, the client can expect the following results (there may be other situations, like authentication/authorization problems, but let's keep it simple):
If a customer with the identifier 1 exists, the client is supposed to receive a response with the status code 200 and a representation of the resource (for example, a JSON or XML representing the customer) in the response payload.
If the customer with the identifier 1 does not exist, the client is supposed to receive a response with the status code 404.
To check whether a resource exists or not, the client doesn't need the resource representation (the JSON or XML that represents the customer). What's relevant here is the status code: 200 when the resource exists and 404 when the resource does not exist. Besides GET requests, the URL that locates a customer (http://example.org/api/customers/1) could also handle HEAD requests. The HEAD method is identical to the GET method, but the server won't send the resource representation in HEAD requests. Hence, it's useful to check whether a resource exists or not.
See more details regarding the HEAD method:
4.3.2. HEAD
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification. [...]
If the difference between resource and resource representation is not clear, please check this answer.
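As an illustration, an existence check with HEAD might look like the sketch below, using only Python's standard library. The function names are made up for the example, and the status lookup is passed in as a parameter so the decision logic can be exercised without a live server.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a HEAD request and return the status code; no body is transferred."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is still what we need.
        return error.code


def resource_exists(url: str, fetch_status: Callable[[str], int] = head_status) -> bool:
    """Map 200 to True and 404 to False; anything else is left to the caller."""
    status = fetch_status(url)
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"Unexpected status {status} for {url}")
```

Other outcomes, such as 401 or 403, are deliberately not interpreted here, in line with the "keep it simple" caveat above.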
One thing I want to add to the already great answers: URL design doesn't really matter that much if you do REST correctly.
One of the important tenets of REST is that URLs are discovered. A client that already has the customer's information and wants to find the "previously known" information should be able to discover that URL on the main customer resource. If it links from there to the "previously known" information, it doesn't matter if the URL is on a different domain, path, or even protocol.
So if your application naturally makes more sense with "previouslyKnown" on a separate base path, then maybe you should just go for that.
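A small sketch of what discovery could look like on the client side, assuming the customer representation carries a hypothetical `links` list of `rel`/`href` pairs. The field names, the relation name, and the example host are all assumptions; real APIs use various link formats.

```python
from typing import Optional


def find_link(representation: dict, rel: str) -> Optional[str]:
    """Return the href of the first link with the given relation, if any."""
    for link in representation.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Hypothetical customer representation; the href may live anywhere.
customer = {
    "id": "1234",
    "links": [
        {"rel": "self", "href": "/customers/1234"},
        {"rel": "previously-known", "href": "https://history.example.org/known/1234"},
    ],
}
```

The client only depends on the relation name, so the server is free to move the "previously known" resource to another path or host later.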

exposing operations on resources RESTfully - overloaded POST vs. PUT vs. controller resources

Say you've got a Person resource, and part of its representation includes a Location value which can have values like "at home", "at school" and "at work". How would you RESTfully expose activities like "go home", "go to work", "go to school", etc? For the sake of discussion, let's stipulate that these activities take time, so they are executed asynchronously, and there are various ways in which they could fail (no means of transportation available, transportation breakdown during travel, other act of God, etc.). In addition, the Person resource has other attributes and associated operations that affect those attributes (e.g. attribute=energy-level, operations=eat/sleep/exercise).
Option 1: Overload POST on the Person resource, providing an input parameter indicating what you want the person to do (e.g. action=go-to-school). Return a 202 from the POST and expose activity-in-progress status attributes within the Person's representation that the client can GET to observe progress and success/failure.
Benefits: keeps it simple.
Drawbacks: amounts to tunneling. The action taking place is buried in the payload instead of being visible in the URI, verb, headers, etc. The POST verb on this resource doesn't have a single semantic meaning.
Option 2: Use PUT to set the Person's location to the state you'd like them to have. Return a 202 from the PUT and expose activity-in-progress attributes for status polling via GET.
Benefits: Not sure I see any.
Drawbacks: really, this is just tunneling with another verb. Also, it doesn't work in some cases (both sleeping and eating increase energy-level, so PUTting the energy-level to a higher value is ambiguous in terms of what action you want the resource to perform).
Option 3: expose a generic controller resource that operates on Person objects. For example, create a PersonActivityManager resource that accepts POST requests with arguments that identify the target Person and requested action. The POST could return a PersonActivity resource to represent the activity in progress, which the client could GET to monitor progress and success/failure.
Benefits: Seems a bit cleaner by separating the activity and its status from the Person resource.
Drawbacks: Now we've moved the tunneling to the PersonActivityManager resource.
Option 4:
Establish separate controller resources for each supported action, e.g. a ToWorkTransporter resource that accepts POST requests with an argument (or URI element) that identifies the Person, plus a ToHomeTransporter, a ToSchoolTransporter, a MealServer, a Sleeper, and an Exerciser. Each of these returns an appropriate task-monitoring resource (Commute, Meal, Slumber, Workout) from their POST method, which the client can monitor via GET.
Benefits: OK, we've finally eliminated tunneling. Each POST means only one thing.
Drawbacks: Now we're talking about a lot of resources (maybe we could combine the transporters into one Transporter that accepts a destination argument). And some of them are pretty semantically contrived (a Sleeper?). It may be more RESTful, but is it practical?
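To make the controller-resource idea concrete, here is an in-memory sketch of the combined Transporter variant: a POST is accepted with 202 and points at a Commute resource that the client can poll with GET. The paths, field names, and the `finish_commute` helper (a stand-in for the asynchronous worker) are all assumptions for illustration, not a recommended design.

```python
import itertools
from typing import Dict, Tuple

ACTIVITIES: Dict[str, dict] = {}
_ids = itertools.count(1)


def post_transporter(person_id: str, destination: str) -> Tuple[int, dict]:
    """POST to the Transporter: accept the request and name a monitor resource."""
    activity_id = str(next(_ids))
    ACTIVITIES[activity_id] = {
        "person": person_id,
        "destination": destination,
        "status": "in-progress",
    }
    return 202, {"monitor": f"/commutes/{activity_id}"}


def get_commute(activity_id: str) -> Tuple[int, dict]:
    """GET on the Commute resource: report progress, success, or failure."""
    activity = ACTIVITIES.get(activity_id)
    if activity is None:
        return 404, {}
    return 200, dict(activity)


def finish_commute(activity_id: str, succeeded: bool, reason: str = "") -> None:
    # Stand-in for the asynchronous worker that completes the activity.
    activity = ACTIVITIES[activity_id]
    activity["status"] = "succeeded" if succeeded else "failed"
    if reason:
        activity["reason"] = reason
```

The POST here means exactly one thing (start a commute), and failures such as a transportation breakdown surface on the Commute resource rather than on the Person.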
OK, I've been researching and pondering this for about a week now. Since nobody else has answered, I'll post the results of what I've learned.
Tim Bray, in RESTful Casuistry, talks about PUT-ing a state field vs POST-ing to a controller which will perform an operation affecting that state. He uses the example of a VM and how to RESTfully expose the function of a "reboot button". He says
"If I want to update some fields in an existing resource, I’m inclined
to think about PUT. But that doesn’t work because it’s supposed to be
idempotent, and rebooting a server sure isn’t. Well, OK, do it with
POST I guess; no biggie.
But you’re not really changing a state, you’re requesting a specific
set of actions to happen, as a result of which the state may or may
not attain the desired value. In fact, when you hit the deploy switch,
the state changes to deploying and then after some unpredictable
amount of time to deployed. And the reboot operation is the classic
case of a box with a big red switch on the side; the problem is how to
push the switch.
So, the more I think of it, the more I think that these resources are
like buttons, with only one defined operation: push. People have been
whining about “write-only resources” but I don’t have a problem with
that because it seems accurate. The reboot and halt buttons don’t
really have any state, so you shouldn’t expect anything useful from a
GET."
Tim seems to settle somewhere between my #3 and #4 option, exposing multiple controller resources, but pulling back from "going overboard" and having separate controller resources for everything.
Tim's post led to another by Roy Fielding (It is OK to use POST) in which he says that for situations where there is a monitorable entity state, and an action to potentially change that state, he's inclined to use POST rather than PUT. In response to a commenter's suggestion to expose the monitored state as a separate PUT-able resource, he says
"we only use PUT when the update action is idempotent and the
representation is complete. I think we should define an additional
resource whenever we think that resource might be useful to others in
isolation, and make use of the GET/PUT methods for that resource, but
I don’t think we should define new resources just for the sake of
avoiding POST."
Finally, Bill de hOra, in Just use POST, discusses the specific case of using PUT vs. POST to update the state of a collection resource, and the tradeoffs therein.