Why does a regex entity have priority in Dialogflow CX?

I am establishing two entities in my Dialogflow CX agent:
the first one, called "id" contains some numeric IDs that exist in my domain: e.g. 29042, 29145, 28248, ...
the second one, "wrongId", is defined by a regex that is supposed to capture all numeric sequences similar to the previous entities: \d{5,6}
These two entities are used in different phrases of the same intent.
The goal is to make the agent behave in a certain way if the numeric ID inserted by the user exists; otherwise, the agent will say that such an ID does not exist. For this purpose, I created two different routes. The first one is activated when the first entity is matched; the second one is activated when the regex entity is matched.
Since routes are evaluated in the order they are presented, I would expect that if the user inserted a valid ID, the first route would be activated; if the user inserted an ID that does not exist, then the first route would be discarded and the second one would be activated.
However, I noticed that the second route is always activated, as if the regex entity is always preferred to the regular one when Dialogflow parses the entities in an intent.
Can anyone confirm this behavior, or otherwise point to any mistake I am making?

According to your description above, the entity "id" and the entity "wrongId" have overlapping values. Specifically, regular expression \d{5,6} can match all examples you provided for the entity "id".
Here's a screenshot from https://regex101.com/:
That is, when you input a correct id, either one or the other entity can be matched.
Using entities with competing values is against agent design best practices. Avoiding conflicting (also referred to as competing or ambiguous) training data when designing your agent will help you avoid conflicts at runtime.
If you have a small number of correct IDs, you could add them as entity exclusions to the "wrongId" entity.
Another approach could be to match both correct and wrong IDs with the same (broader) entity and validate the collected values against your ID database in the backend.
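The second approach can be sketched in a few lines of Python. This is only an illustration under assumptions: `KNOWN_IDS` and `classify_id` are hypothetical names, and in a real agent the check would sit in your backend and query the actual ID database.

```python
import re

# Hypothetical set of valid IDs; in practice this would be a database lookup.
KNOWN_IDS = {"29042", "29145", "28248"}

# The same broad pattern the question uses for the "wrongId" entity.
ID_PATTERN = re.compile(r"\d{5,6}")


def classify_id(value: str) -> str:
    """Return 'valid', 'unknown', or 'not_an_id' for a user-supplied value."""
    if not ID_PATTERN.fullmatch(value):
        return "not_an_id"
    return "valid" if value in KNOWN_IDS else "unknown"
```

Note that the pattern also matches every valid ID, which is exactly the overlap described above; the distinction between "valid" and "unknown" is made only by the lookup.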


REST where should end point go?

Suppose there's USERS and ORDERS
for a specific user's order list
You could do
/user/3/order_list
/order/?user=3
Which one is preferred and why?
Optional parameters tend to be easier to put in the query string.
If you want to return a 404 error when the parameter value does not correspond to an existing resource then I would tend towards a path segment parameter. e.g. /customer/232 where 232 is not a valid customer id.
If, however, you want to return an empty list when the parameter value is not found, then use a query string parameter, e.g. /contacts?name=dave
If a parameter affects an entire URI structure, then use a path, e.g. a language parameter: /en/document/foo.txt versus /document/foo.txt?language=en
Unique identifiers generally belong in the path rather than in a query parameter.
Paths are friendly to search engines, browser history, and navigation.
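The 404-versus-empty-list distinction above can be illustrated with a small Python sketch. The data and the function names `get_customer` and `list_contacts` are hypothetical; the functions simply return a status code and a body.

```python
from typing import Optional

# Hypothetical in-memory data standing in for a real data store.
CUSTOMERS = {231: {"id": 231, "name": "Dave"}}
CONTACTS = [{"name": "dave"}, {"name": "alice"}]


def get_customer(customer_id: int):
    """GET /customer/{customer_id}: a missing resource is a 404."""
    customer = CUSTOMERS.get(customer_id)
    if customer is None:
        return 404, {"error": "customer not found"}
    return 200, customer


def list_contacts(name: Optional[str] = None):
    """GET /contacts?name=...: a filter with no matches is still a 200."""
    if name is None:
        return 200, CONTACTS
    return 200, [c for c in CONTACTS if c["name"] == name]
```

Here `get_customer(232)` yields a 404 because the path names a resource that does not exist, while `list_contacts("nobody")` yields a 200 with an empty list because the collection itself exists.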
When I started to create an API, I was thinking about the same question.
A video from Apigee helped me a lot.
In a nutshell when you decide to build an API, you should decide which entity is independent and which is only related to someone.
For example, if you have a specific endpoint for orders with create/update/delete operations, then it will be fine to use a second approach /order/?user=3.
Otherwise, if orders have only one representation, depend on a user, and have no special interactions of their own, then you could use the first approach.
There is also a nice article about best practices.
The whole point of REST is resources. You should try and map them as closely as possible to the actual requests you're going to get. I'd definitely not call it order_list because that looks like an action (you're "listing" the orders, while GET should be enough to tell you that you're getting something)
So, first of all, I think you should have /users instead of /user. Then consider it as a tree structure:
A seller (for lack of a better name) can have multiple users
A user can have multiple orders
An order can have multiple items
So, I'd go for something like:
The seller can see its users with yourdomain.com/my/users
The details of a single user can be seen with yourdomain.com/my/users/3
The orders of a single user can be seen with yourdomain.com/my/users/3/orders
The items of a single order can be seen with yourdomain.com/my/users/3/orders/5
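As a rough sketch of that tree, the following Python resolver walks a path down nested dictionaries. The data, the `resolve` function, and the error handling are all assumptions made for illustration; a real service would use its framework's router.

```python
# Hypothetical data: users own orders, and orders contain items.
USERS = {
    3: {"name": "Example user", "orders": {5: {"items": ["item-a", "item-b"]}}},
}


def resolve(path: str):
    """Walk a path such as /my/users/3/orders/5 down the resource tree."""
    parts = [p for p in path.split("/") if p]
    if parts[:2] != ["my", "users"]:
        raise KeyError(path)
    rest = parts[2:]
    if not rest:
        return sorted(USERS)                          # /my/users
    user = USERS[int(rest[0])]
    if len(rest) == 1:
        return user                                   # /my/users/3
    if rest[1] != "orders":
        raise KeyError(path)
    if len(rest) == 2:
        return sorted(user["orders"])                 # /my/users/3/orders
    if len(rest) == 3:
        return user["orders"][int(rest[2])]["items"]  # /my/users/3/orders/5
    raise KeyError(path)
```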

Master Data Services - Domain based attributes

We are using Master Data Services as an MDM solution for our SQL Server BI environment. I have an entity containing a first name and last name and then I have created a business rule that concatenates these two fields to form a full name which is then stored in the "name" system field of the entity.
I use this entity as a domain-based attribute in another entity. The user can then see the full name before linking it as an attribute in the second entity.
I want to be able to restrict the users from capturing data in the first entity against the name attribute because the business rule deals with the logic to populate this attribute. I have read that there are two ways to do this:
Set the display width to zero of the attribute. This does not seem to work, the explorer version still shows a narrow version of the field in the rows and the user can still edit the field in the detail pane.
Use the security to make the attribute read only. I have tried different combinations of this but it seems that you cannot use this functionality for a name field (system field).
This seems like pretty basic functionality that I require and it seems that there is no clear cut way to do this in MDS.
Any assistance will be appreciated.
Thanks
We do exactly the same thing.
I tested it, and whether you create a new member, or edit an existing member, the business rule just overwrites the manual input value in the name attribute.
Is there a specific 'business' reason why you need to restrict data input in the name field? If it is for UX reasons, you can change the display name of the name attribute to something like 'Don't populate' or alternatively make it a '.', then the users won't know what to input.

Should the natural or surrogate key be returned in an API?

First time I think about it...
Until now, I always used the natural key in my API. For example, in a REST API for managing entities, the URL would look like /entities/{id}, where id is a natural key known to the user (the ID is passed to the POST request that creates the entity). After the entity is created, the user can use multiple commands (GET, DELETE, PUT...) to manipulate the entity. The entity also has a surrogate key generated by the database.
Now, think about the following sequence:
1. A user creates an entity with id 1. (POST /entities with body containing id 1)
2. Another user deletes the entity (DELETE /entities/1)
3. The same other user creates the entity again (POST /entities with body containing id 1)
4. The first user decides to modify the entity (PUT /entities/1 with body)
Before step 4 is executed, there is still an entity with id 1 in the database, but it is not the same entity created during step 1. The problem is that step 4 identifies the entity to modify based on the natural key which is the same for the deleted and new entity (while the surrogate key is different). Therefore, step 4 will succeed and the user will never know it is working on a new entity.
I generally also use optimistic locking in my applications, but I don't think it helps here. After step 1, the entity's version field is 0. After step 3, the new entity's version field is also 0. Therefore, the version check won't help. Is this the right case to use a timestamp field for optimistic locking?
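The sequence can be reproduced with a toy in-memory store. This is a sketch under assumptions: `EntityStore` and its methods are hypothetical stand-ins for the REST endpoints, and the surrogate key is a simple counter.

```python
import itertools


class EntityStore:
    """Toy store keyed by natural key; each row also gets a surrogate key."""

    def __init__(self):
        self._rows = {}
        self._surrogates = itertools.count(1)

    def post(self, natural_id, body):
        if natural_id in self._rows:
            raise ValueError("natural key already in use")
        row = {"surrogate": next(self._surrogates), "version": 0, "body": body}
        self._rows[natural_id] = row
        return dict(row)

    def delete(self, natural_id):
        del self._rows[natural_id]

    def put(self, natural_id, body, expected_version):
        row = self._rows[natural_id]
        if row["version"] != expected_version:
            raise ValueError("stale version")
        row["body"] = body
        row["version"] += 1
        return dict(row)


store = EntityStore()
first = store.post(1, "created by user A")       # step 1
store.delete(1)                                  # step 2
second = store.post(1, "created by user B")      # step 3
# Step 4: user A still holds version 0, and the version check passes.
updated = store.put(1, "edited by user A", expected_version=first["version"])
```

The PUT succeeds even though `first["surrogate"]` and `second["surrogate"]` differ, which is the situation the version field alone cannot detect.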
Is the "good" solution to return surrogate key to the user? This way, the user always provides the surrogate key to the server which can use it to ensure it works on the same entity and not on a new one?
Which approach do you recommend?
It depends on how you want your users to use your API.
REST APIs should try to be discoverable. So if there is benefit in exposing natural keys in your API because it will allow users to modify the URI directly and get to a new state, then do it.
A good example is categories or tags. We could have these following URIs;
GET /some-resource?tag=1 // returns all resources tagged with 'blue'
GET /some-resource?tag=2 // returns all resources tagged with 'red'
or
GET /some-resource?tag=blue // returns all resources tagged with 'blue'
GET /some-resource?tag=red // returns all resources tagged with 'red'
There is clearly more value to a user in the second group, as they can see that the tag is a real word. This then allows them to type ANY word in there to see what's returned, whereas the first group does not allow this: it limits discoverability.
A different example would be orders
GET /orders/1 // returns order 1
or
GET /orders/some-verbose-name-that-adds-no-meaning // returns order 1
In this case there is little value in adding some verbose name to the order to allow it to be discoverable. A user is more likely to want to view all orders first (or a subset) and filter by date or price etc, and then choose an order to view
GET /orders?orderBy={date}&order=asc
Additional
After our discussion over chat, your issue seems to be with versioning and how to manage resource locking.
If you allow resources to be modified by multiple users, you need to send a version number with every request and response. The version number is incremented when any changes are made. If a request sends an older version number when trying to modify a resource, throw an error.
In the case where you allow the same URIs to be reused, there is a potential for conflict as the version number always begins from 0. In this case, you will also need to send over a GUID (surrogate key) and a version number. Or don't use natural URIs (see the original answer above to decide when to do this or not).
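A minimal Python sketch of the GUID-plus-version check follows. The row layout and the names `new_row` and `check_precondition` are assumptions made for illustration, not part of any framework.

```python
import uuid


def new_row(body):
    # A fresh GUID per creation distinguishes a recreated resource from the old one.
    return {"guid": str(uuid.uuid4()), "version": 0, "body": body}


def check_precondition(row, expected_guid, expected_version):
    """Return True only if the caller is editing the same incarnation and version."""
    return row["guid"] == expected_guid and row["version"] == expected_version


original = new_row("first incarnation")
recreated = new_row("second incarnation")   # same URI, created after a delete
```

A client that still holds `original["guid"]` with version 0 fails the check against `recreated`, even though both rows start at version 0.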
There is another option which is to disallow reuse of URIs. This really depends on the use case and your business requirements. It may be fine to reuse a URI as conceptually it means the same thing. Example would be if you had a folder on your computer. Deleting the folder and recreating it, is the same as emptying the folder. Conceptually the folder is the same 'thing' but with different properties.
User account is probably an area where reusing URIs is not a good idea. If you delete an account /accounts/u1, that URI should be marked as deleted, and no other user should be able to create an account with username u1. Conceptually, a new user using the same URI is not the same as when the previous user was using it.
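Disallowing reuse can be sketched as a store that remembers retired usernames. `AccountStore` is a hypothetical class, and a real system would persist the retired set (for example, as a deleted flag on the row).

```python
class AccountStore:
    """Toy account store that never hands out a deleted username again."""

    def __init__(self):
        self._active = {}
        self._retired = set()   # usernames that were deleted and must not be reused

    def create(self, username, profile):
        if username in self._active or username in self._retired:
            raise ValueError(f"/accounts/{username} is not available")
        self._active[username] = profile

    def delete(self, username):
        del self._active[username]
        self._retired.add(username)
```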
It's interesting to see people trying to rediscover solutions to known problems. This issue is not specific to a REST API - it applies to any indexed storage. The only solution I have ever seen implemented is: don't re-use surrogate keys.
If you are generating your surrogate key at the client, use UUIDs or split sequences, but for preference do it serverside.
Also, you should never use surrogate keys to de-reference data if a simple natural key exists in the data. Indeed, even if the natural key is a compound entity, you should consider very carefully whether to expose a surrogate key in the API.
You mentioned the possibility of using a timestamp as your optimistic locking.
Depending how strictly you're following a RESTful principle, the Entity returned by the POST will contain an "edit self" link; this is the URI to which a DELETE or UPDATE can be performed.
Taking your steps above as an example:
Step 1
User A does a POST of Entity 1. The returned Entity object will contain a "self" link indicating where updates should occur, like:
/entities/1/timestamp/312547124138
Step 2
User B gets the existing Entity 1, with the above "self" link, and performs a DELETE to that timestamp versioned URI.
Step 3
User B does a POST of a new Entity 1, which returns an object with a different "self" link, e.g.:
/entities/1/timestamp/312547999999
Step 4
User A, with the original Entity that they obtained in Step 1, tries doing a PUT to the "self" link on their object, which was:
/entities/1/timestamp/312547124138
...your service will recognise that, although Entity 1 does exist, User A is trying a PUT against a version which has since become stale.
The service can then perform the appropriate action. Depending how sophisticated your algorithm is, you could either merge the changes or reject the PUT.
I can't remember the appropriate HTTP status code that you should return, following a PUT to a stale version... It's not something that I've implemented in the Rest framework that I work on, although I have planned to enable it in future. It might be that you return a 410 ("Gone").
Step 5
I know you don't have a step 5, but..! User A, upon finding their PUT has failed, might re-retrieve Entity 1. This could be a GET to their (stale) version, i.e. a GET to:
/entities/1/timestamp/312547124138
...and your service would return a redirect to GET from either a generic URI for that object, e.g.:
/entities/1
...or to the specific latest version, i.e.:
/entities/1/timestamp/312547999999
They can then make the changes intended in Step 4, subject to any application-level merge logic.
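Steps 4 and 5 could be sketched roughly as follows in Python. The `CURRENT` table and the `handle` function are hypothetical, and the status codes are assumptions: 409 (Conflict) is one plausible choice for the stale PUT, and 302 is used here for the redirect.

```python
# Hypothetical state: entity id -> timestamp of its current incarnation.
CURRENT = {1: 312547999999}


def self_link(entity_id):
    """Build the timestamp-versioned "self" link for the current incarnation."""
    return f"/entities/{entity_id}/timestamp/{CURRENT[entity_id]}"


def handle(method, path):
    """Return (status, location) for a request to a timestamp-versioned URI."""
    parts = path.strip("/").split("/")
    entity_id, timestamp = int(parts[1]), int(parts[3])
    if timestamp == CURRENT[entity_id]:
        return 200, path
    if method == "GET":
        # Stale read: redirect to the latest version, as in step 5.
        return 302, self_link(entity_id)
    # Stale write: reject. The status code is an assumption, not a prescription.
    return 409, self_link(entity_id)
```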
Hope that helps.
Your problem can be solved either using ETags for versioning (a record can only be modified if the current ETag is supplied) or by soft deletes (so the deleted record still exists but with a trashed bool which is reset by a PUT).
Sounds like you might also benefit from a batch end point and using transactions.
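The ETag option mentioned above might look like this in Python. It is a sketch under assumptions: the ETag is derived from a hash of the record's JSON, and `conditional_put` is a hypothetical helper that mimics an `If-Match` check, returning 412 (Precondition Failed) on a mismatch.

```python
import hashlib
import json


def etag_for(record: dict) -> str:
    """Derive an ETag from the record's canonical JSON form."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def conditional_put(store: dict, key, new_record: dict, if_match: str) -> int:
    """Apply the update only if the client's ETag matches the stored record."""
    current = store.get(key)
    if current is None:
        return 404
    if if_match != etag_for(current):
        return 412  # Precondition Failed
    store[key] = new_record
    return 204
```

A pure content hash cannot tell a recreated record with identical content from the original; including the surrogate key in the hashed payload would be one way around that.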

Why is a GUID used as the type for the Id fields in the EventSources and Events tables in an EventStore database?

Why is uniqueidentifier (which equates to a GUID in .NET) used as the type for the Id fields in the EventSources and Events tables?
Would it not be faster to use an integer type (like bigint in SQL Server) that functioned as an identity, so that the database could assign the Id as the inserts are performed?
I am a complete newb when it comes to Event Sourcing and CQRS, so I apologize if this has been asked and answered and my searching hasn't been correct enough to find the answer.
Note: Answers 2 and 4 assume that you are following a few basic principles of Domain-Driven Design.
1. IDs should be unique across different aggregate types and even across different bounded contexts.
2. Every aggregate must always be in a valid state. Having a unique ID is part of that. This means you couldn't do anything with the aggregate before the initial event has been stored and the database generated and returned the respective ID.
3. A client that sends the initial command often needs some kind of reference to relate the created aggregate to. How would the client know which ID has been assigned? Remember, commands are void, they don't return anything beyond ack/nack, not even an ID.
4. A domain concern (identification of an entity) would heavily rely on technical implementation details. (This should actually be #1.)
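The point about void commands can be illustrated with a short Python sketch in which the client generates the GUID before sending the command. `CreateCustomer`, `handle`, and the `EVENTS` list are hypothetical names standing in for a real command, handler, and Events table.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateCustomer:
    """Hypothetical command; the aggregate ID is chosen by the client."""
    aggregate_id: uuid.UUID
    name: str


EVENTS = []  # stand-in for the Events table


def handle(command: CreateCustomer) -> None:
    """Command handlers return nothing; the ID is already known to the caller."""
    EVENTS.append({"source_id": command.aggregate_id,
                   "type": "CustomerCreated",
                   "name": command.name})


new_id = uuid.uuid4()                 # generated before any database round trip
handle(CreateCustomer(new_id, "Morgan"))
```

With a database-assigned identity column, the client would have to wait for the insert to learn the ID, which a void command cannot return.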

Informative vs unique generated ID in REST API

Designing a RESTful API. I have two ways of identifying resources (person data). Either by the unique ID generated by the database, or by a social security number (SSN), entered for each person. The SSN is supposedly unique, though can be changed.
Using the ID would be most convenient for me, since it is guaranteed to be unique, and does not change. Hence the URL for the resource, also always stays the same:
GET /persons/12
{
"name": "Morgan",
"ssn": "840212-3312"
}
The argument for using SSN, is that it is more informative and understandable by API clients. SSN is also used more in surrounding systems:
GET /persons/840212-3312
{
"name": "Morgan",
"id": "12"
}
So the question is: Should I go with the first approach, and avoid some implementation headaches where the SSN may change. And maybe provide some helper method that converts from SSN to ID?
Or go with the second approach, providing a more informative API, though having to deal with some not-so-RESTful strangeness where URLs might change due to SSN changes?
URL design is a personal choice. But to give you some more examples which differ from those Ray has already provided, I will give you some of my own.
I have a user account resource and allow access via both URIs:
/users/12
and
/users/morgan
where the numerical value is an auto_incremented ID, and the alphabetic value is a unique username on the system specified by the user. These resources are uncachable, so I do not bother about canonicalisation; however, the /users page links to the alphabetic forms.
No other resources on my system have two unique fields, so are referred to by IDs, /jobs/123, /quotations/456.
As you can see, I prefer plural URI segments ;-)
I think of "job 123" as being from the "jobs" collection, so it seems logical to have a "jobs" resource, with subresources for each job.
You do not need to have a separate /search/ area for performing searches, I think it would be cleaner to apply your search criteria to the collection resource directly:
/people?ssn=123456-7890 (people with SSN matching/containing "123456-7890")
/people?name=morgan (people whose name is/contains "Morgan")
I have something similar, but use only the first letter as a filter:
/sites?alpha=f
Lists all sites beginning with F. You can think of it as a filter, or as a search criteria, those terms are just different sides of the same coin.
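Applying the criteria directly to the collection resource could look like the following Python sketch. The `PEOPLE` data and `query_people` are hypothetical, and the matching rule (case-insensitive "contains") is an assumption based on the examples above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical collection standing in for the /people resource.
PEOPLE = [
    {"id": "12", "name": "Morgan", "ssn": "840212-3312"},
    {"id": "13", "name": "Alex", "ssn": "790101-1234"},
]


def query_people(url: str) -> list:
    """Filter the collection by whatever criteria appear in the query string."""
    criteria = parse_qs(urlsplit(url).query)
    results = PEOPLE
    for field, values in criteria.items():
        wanted = values[0].lower()
        results = [p for p in results if wanted in p.get(field, "").lower()]
    return results
```

With no query string the whole collection is returned, so the same resource serves both listing and searching.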
Good to see someone taking time to think about their Resource urls!
I would make a Url with the unique id to provide resource to a single user. Like:
http://api.mysite.com/person/12/
Where 12 is your unique ID. Note that I also prefer the singular 'person'....
Regardless, the url should return:
{
"ssn": "840212-3312",
"name": "Morgan",
"id": "12"
}
However, I would also create a general search URL that returns a list of users that match the parameters (either a json array or whatever format you need). You can specify search parameters as get params like this:
http://api.mysite.com/person/search/?ssn=840212-3312
Or
http://api.mysite.com/person/search/?name=Morgan
These would return something like this for a single search hit--note it's an array, not a single item like the unique id url that points directly to a single user.
[{
"ssn": "840212-3312",
"name": "Morgan",
"id": "12"
}]
This search could then be later augmented for other search criteria. You might only return the unique id's via the search Url--you could always make a request to the unique id url once you've got it from the search...
I would suggest that you use neither. Generate resource IDs that are unique both to a single user of your API and across all other resources (including other users' resources).
Using the unique database ID is not ideal for a couple of reasons. First, API resources and database records won't necessarily always be 1-to-1 even if you have designed it that way today. Second, you might change to a different data store that would generate different format unique ids.
Also, it is good practice to separate out the ID from other resource properties, such as SSN (as an aside I hope you are storing SSN in a very secure manner, but that's another topic). If for whatever reason an SSN changed, more than one API resource was associated with the same SSN, or you decide that piece of data is not needed someday, you don't want to have to change the ID.
One pattern is to prepend the unique ID with a few characters that indicate the resource type. For example, if User is a resource type in your API, a generated unique ID would be something like USR56382.
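A minimal Python sketch of the prefix pattern is shown below. The `PREFIXES` table and both function names are hypothetical, and the random part uses hex digits rather than the purely numeric suffix in the example above.

```python
import secrets

# Hypothetical mapping from resource type to its ID prefix.
PREFIXES = {"user": "USR", "order": "ORD"}


def new_resource_id(resource_type: str) -> str:
    """Return an opaque ID: the type prefix followed by random hex digits."""
    return PREFIXES[resource_type] + secrets.token_hex(8).upper()


def resource_type_of(resource_id: str) -> str:
    """Recover the resource type from the ID prefix."""
    for name, prefix in PREFIXES.items():
        if resource_id.startswith(prefix):
            return name
    raise ValueError(f"unknown prefix in {resource_id!r}")
```

Because the ID is generated independently of the database key and of the SSN, neither a data store migration nor an SSN change forces the resource ID to change.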
REST is an architectural style that emphasizes a resource-centric design approach.
In my opinion, resources should be named with plural nouns.
Every resource, for example customers, has the following uniform interface:
POST /customers - for creating a resource instance
PUT /customers/{customerId} - for updating a particular instance
GET /customers - to search customers. So @Ray, search does not need to be part of the URI itself. Any filter or query parameters that need to be supported should go on the collection resource.
GET /customers/{customerId} - to retrieve a particular instance of customer
DELETE /customers/{customerId} to delete a particular instance
The reason for the plural is that the collection behaves as a factory. For example, when you are trying to create a new instance of a resource, the instance does not exist yet, so the request cannot be addressed to the instance itself. Hence, the singular is not used.
It also goes hand-in-hand for search/inquiry, where you do not know or hold the actual instance of resource. Hence, the plural form is much recommended.
Now, the question is what to use for a resource id - a database primary key, a generated identifier, or an encrypted token.
In my opinion, database primary keys should not be exposed. A resource identifier should not be designed 1-to-1 with the DB primary key, although it happens a lot. A generated UUID-based key is much preferable because it prevents sequential enumeration attacks, but the world is not always ideal.
Coming back to tokens: an encrypted token is a recommended approach for sensitive APIs and where data is exchanged between two separate applications. If we use one, encryption and decryption should happen solely at the API end. That means the encrypted keys for sub-resources should be returned as part of the parent API response; otherwise it defeats the purpose.