IBM Watson Assistant: Entity (number) extraction in French

I'm doing the restaurant tutorial for IBM Watson Assistant and I need to extract the number of persons for the reservation.
Since I'm French, I wanted to try this tutorial in French. In English, a request to reserve a table contains no number. In French, however, the same request is "Je voudrais réserver une table".
"Une table" can be translated as "a table" or as "one table", and Watson Assistant interprets this "one" as the number of persons (even if I said that I would like to reserve a table for 5 people).
I thought of telling Watson to take only the second number, but the number of persons doesn't have to be in the same phrase, and it could also come before "une table", so this isn't a good solution.
Is there a better solution that would work in the majority of cases?

Related

Architecting support for marking substrings in text using Postgres, and attaching data to them

I need to support "Post"-like structures that contain text, but substrings of that text can be "linked" to some data by the user.
Example: "oh yes I know all about this"
Here the user marked the words "know all" and linked them to some data that the user inserted (e.g. [date: __ , tags: _ , _ , media: ...]).
The client implementation will depend on the architecture/implementation we choose for our Postgres DB.
We would also want to translate those posts and texts, and several linked/marked words in English, for example, can become a single word in another language.
E.g. "I have done" = "hice" in Spanish. So relying on indexes is a bit problematic.
I have thought of two approaches so far:
1. Approach I - Managing a list of indexes of highlighted text
post_links_table:
post_id | indexes | data
1 | [9, 16] | {"data": 2/2/2022, "tags": [#abc, #def], "media": ....}
However:
a. indexes - what if the user edits the post and erases the words, or makes the link shorter/longer? Is managing a list of such indexes really the most efficient and easiest way to handle this use case?
Also, as I mentioned above regarding translation: "I have done" can become one word, "hice", in Spanish. So relying on indexes is a bit problematic.
b. data - if in the future I want to query on (or by) the "data" details, a JSON column could prove problematic, although this could probably be tackled by normalizing the data column.
2. Approach II - Separating the text into JSON parts.
For example, our example above would turn into
[{"text": "oh yes I"},
{"link": "know all"},
{"text": "about this"}]
And then the "link" JSON could also contain the extra data the user attached to that highlighted substring.
However, querying this post (by FTS or other querying) could prove to be inefficient.
Also, essentially this is just like managing a list of indexes. What if the user edited the post and the highlighted/linked parts?
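To make the comparison between the two approaches concrete, here is a minimal Python sketch showing that a segment list (Approach II) can be flattened into plain text plus index ranges (Approach I), so the plain text can still be stored separately for FTS. The segment layout with "type", "value" and "data" keys is a hypothetical format chosen for illustration, not something prescribed by Postgres, and the surrounding spaces are assumed to be kept inside the text segments.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical segment format: "type" is "text" or "link", "value" is the
# literal substring, and links may carry the user's attached "data".
Segment = Dict[str, Any]


def flatten(segments: List[Segment]) -> Tuple[str, List[Tuple[int, int, Any]]]:
    """Return the plain text plus (start, end, data) for every link.

    `end` is inclusive, matching the [9, 16] example of Approach I.
    """
    parts: List[str] = []
    links: List[Tuple[int, int, Any]] = []
    offset = 0
    for seg in segments:
        value = seg["value"]
        if seg["type"] == "link" and value:
            links.append((offset, offset + len(value) - 1, seg.get("data")))
        parts.append(value)
        offset += len(value)
    return "".join(parts), links


post = [
    {"type": "text", "value": "oh yes I "},
    {"type": "link", "value": "know all", "data": {"tags": ["#abc", "#def"]}},
    {"type": "text", "value": " about this"},
]

text, links = flatten(post)
print(text)
print(links)
```

Because the indexes are derived from the segments rather than stored, an edit to a segment cannot leave a stale range behind; the trade-off is that every edit has to go through the segment structure.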

Project Academic Knowledge | Query for and list papers by AA.AuId?

I've got a list of author names but I don't have IDs for any of them.
I'd like to:
Query by author name and store the most probable AuId.
List all papers written by a given AuId.
Is there any way to do this with the current interpret/evaluate APIs? It seems like everything is tied to a paper entity and I want to be sure I am only ever selecting and using one AuId.
Thanks.
I am not aware of such a feature. But indirectly, you could first search for the author name (AA.AuN in the expr field), obtain all the unique author IDs (AA.AuId in the attributes field), and then search for their publications.
(You could even add orderby=logprob:desc, but to be honest, I am not 100% sure what logprob does.)
So, the first step could be to search for the author name (e.g. John Smith) like this and fetch all those AA.AuId where the names (AA.AuN) seem to fit John Smith (let's just add the orderby=logprob:desc):
https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?&expr=Composite(AA.AuN=%27john%20smith%27)&count=100&attributes=AA.AuN,AA.AuId&orderby=logprob:desc&subscription-key={YOUR-KEY}
As a second step, if you have an author ID AA.AuId (here, for example, 3038752200), use it to list that author's papers (ordered by year, descending: orderby=Y:desc):
https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?&expr=Composite(AA.AuId=3038752200)&count=100&attributes=AA.AuN,AA.AuId,DOI,Ti,VFN,Y&orderby=Y:desc&subscription-key={YOUR-KEY}
The approach would be more promising if you had an institutional affiliation as well. Then you could change the expr field to Composite(And(AA.AuN='{AUTHOR-NAME}',AA.AfId={AFFILIATION-ID})) so as to search for all authors named {AUTHOR-NAME} who are affiliated with {AFFILIATION-ID}.
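The two steps above can be sketched in Python as follows. This only builds the request URLs and picks author IDs out of a response that has already been decoded from JSON; the HTTP call itself is left out. The function names are made up for this sketch, and the response layout assumed by unique_author_ids (an "entities" list whose items carry an "AA" list of AuN/AuId pairs) is an assumption that should be checked against a real response.

```python
from typing import Any, Dict, List
from urllib.parse import quote, urlencode

# Endpoint copied from the URLs above; the subscription key is a placeholder.
BASE_URL = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"


def evaluate_url(expr: str, attributes: str, orderby: str, key: str,
                 count: int = 100) -> str:
    """Build an evaluate URL with percent-encoded query parameters."""
    params = {
        "expr": expr,
        "count": count,
        "attributes": attributes,
        "orderby": orderby,
        "subscription-key": key,
    }
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def author_search_url(name: str, key: str) -> str:
    # Step 1: the name is lowercased, mirroring the 'john smith' example.
    expr = f"Composite(AA.AuN='{name.lower()}')"
    return evaluate_url(expr, "AA.AuN,AA.AuId", "logprob:desc", key)


def papers_url(au_id: int, key: str) -> str:
    # Step 2: list the papers of one author ID, newest first.
    expr = f"Composite(AA.AuId={au_id})"
    return evaluate_url(expr, "AA.AuN,AA.AuId,DOI,Ti,VFN,Y", "Y:desc", key)


def unique_author_ids(response: Dict[str, Any], name: str) -> List[int]:
    """Collect distinct AuIds whose AuN equals the searched name.

    Assumes the response layout described in the lead-in.
    """
    wanted = name.lower()
    seen: List[int] = []
    for entity in response.get("entities", []):
        for author in entity.get("AA", []):
            if author.get("AuN") == wanted and author["AuId"] not in seen:
                seen.append(author["AuId"])
    return seen
```

Filtering on AuN matters because each returned paper lists all of its authors, not only the one that matched the query.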

Partially matching a post code with Algolia

I've loaded a dataset into an Algolia search index. Each item in the index is a shop with a catchment area (the catchment area is just an array of UK Postcodes that a store covers). For example:
['DS4 6', 'DS4 7', 'DS5 8', 'DS6 9' ... ]
The search feature works up to a point. If people search for "DS4" then Algolia returns several stores, but most people type their full postcode (for example DS4 8XX) and this doesn't return anything, even though "DS4" is indexed several times.
Is there a configuration in Algolia to search for the first part of a word, even when a person has 'typed past it'?
To clarify this a bit further: I could store every single individual postcode in a catchment area, but there are millions and millions of them. A full UK postcode would be "DS4 7EN", so there are two more characters on the end representing a street in the UK. I only store the first part of a postcode, e.g. "DS4 7", because it seems excessive to store everything when I only really care about the wider area, i.e. DS4, DS5, CV43, AB2 (and so on).
I could also probably use a places api and geocode the address. But I already have this catchment area postcode data, so it seems a shame not to use it if I can.
Algolia, like most search engines, supports prefix search in order to allow search-as-you-type results. The InstantSearch libraries build on this to update results live as the user types. Without prefix search, you would have to wait for the user to enter an entire word before displaying any meaningful result.
In your case, the catchment areas are indexed as, e.g., DS4 6, so when a user types DS4 6XX no records will match: the query acts as a filter on the records based on their searchable attributes, and no record contains a word starting with 6XX.
That said, I see two possible workarounds that you can implement.
The first solution is to use the removeWordsIfNoResults setting and set it to lastWords. This removes the last word of the query if there are no results. For instance, with the query DS4 6XX it will remove 6XX, keep just DS4, and retrieve the items that match this query. Note that this solution relies on the fact that DS4 6XX has two words (separated by a space), and it won't work with DS46XX.
The second solution is to change the structure of the records to add the full postcode of each shop to its item in the index. Since these are shops, I believe that it should be possible. This way your users will be able to search for both the full postcode DS4 6XX and the catchment areas DS4 6. Unless I misunderstood your problem, I don't see the need to store the full list of postcodes associated with a catchment area.
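A further option, not part of the two workarounds above, is to shorten the postcode on the client before it is sent to Algolia, which also covers input typed without a space such as DS46XX. The following Python sketch assumes the usual UK layout in which the last three characters (one digit and two letters) follow the area part; the function name and the regular expression are illustrative and do not validate every special-case postcode.

```python
import re
from typing import Optional

# Area part (e.g. DS4, CV43), then one digit, then the final two letters.
_POSTCODE = re.compile(r"([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9])[A-Z]{2}")


def postcode_sector(query: str) -> Optional[str]:
    """Reduce a full UK postcode to the indexed form, e.g. 'DS4 7EN' -> 'DS4 7'.

    Returns None when the input does not look like a full postcode, so the
    caller can fall back to sending the query unchanged.
    """
    compact = re.sub(r"\s+", "", query).upper()
    match = _POSTCODE.fullmatch(compact)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


for typed in ["DS4 7EN", "ds46xx", "CV43 1AB", "DS4"]:
    print(typed, "->", postcode_sector(typed) or typed)
```

The returned value has the same shape as the entries in the catchment array, so it can be used as the search query in place of what the user typed.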

Access: Adding records with identical data, except for ID

I would like to be upfront. I am by no means an expert or even really all that technologically savvy. However, I inherited a training system where the only way to find out if someone was current was to dig through physical file cabinets and try to find the hard copy. I have put together a basic Access database to try and improve the situation. It is working okay, but I've run into a problem.
Previously, most training occurred in small enough batches that data entry was not a problem (no more than 15-20 entries at any one time). However, regulatory changes now require the company to put everyone through a mandated training course annually. This means all information about the training will be identical, except for the employee ID associated with the record.
Right now I can manually enter this training just like any other, but I have to perform this nearly identical data entry for each of the several hundred employees in the company.
I would like to be able to enter the pertinent details about the training and then have Access create a training record for each employee.
The current form asks the user:
1. Who is the employee that was trained? (The appropriate employee ID # is entered.)
2. Which subject was trained on? (The appropriate selection is made via combo box.)
3. On what date was the training completed? (A date picker is used to fill this in.)
4. What is the file path to the scanned training certificate? (The majority of this field is prepopulated so only the actual file name needs to be typed. For the specific training in question all the employees of the company will be included in the same scanned pdf. Consequently, this field will be identical for all employees.)
The fields on the current form are:
txtEmpID – Text box, where employee ID # is entered. Corresponds to field "empID"
cboTask – Combo box, where the appropriate training subject is selected. Corresponds to field "reqID"
txtDate – Text box, the date the training was completed. Corresponds to field "trngDate"
txtFilePath – Text box, file path to the scanned pdf of the physical training record. Corresponds to field "trngLocat"
I would like to be able to fill in the information for questions 2-4 (subject, date and file path) and then have Access create a record for each employee in my employees table, where all the data from 2-4 is identical.
Is this possible?
Pertinent Tables:
tblEmployees – keyed on field “empID” which is the employee number.
tblTrngSubjects – keyed on field "reqID" which is autonumber.
tblTrngRec – keyed on field “recordID” which is autonumber. Relates to tblEmployees through field “empID”. Relates to tblTrngSubjects through field "reqID".
tblTrngRec is the table in which the records will be stored.
Other information that may be relevant:
I am using Access 2016.
I once had a copy of "Access 2010: The Missing Manual"… but that was in 2010. It has been almost a decade since I did anything more advanced than “docmd.openform”.
I greatly appreciate any and all advice. Thanks, in advance.
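I admit I haven't worked with Access in quite some time, so some of the syntax might be slightly off. The idea is an append query (INSERT INTO ... SELECT) that creates one record per selected employee. You need to know the list of employee IDs that were in that training; if every employee in tblEmployees took it, leave out the WHERE clause. The three constant values have to be entered manually: the reqID of the training subject (25 here), the completion date (Access writes date literals between # signs), and the file path. Access SQL does not accept comments inside a query, so keep notes out of the statement itself.
INSERT INTO tblTrngRec (empID, reqID, trngDate, trngLocat)
SELECT empID,
       25,
       #6/9/2020#,
       "Enter your file path"
FROM tblEmployees
WHERE empID IN (enter a comma delimited list of employee IDs)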
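To show the INSERT INTO ... SELECT pattern end to end, here is a small runnable Python sketch that uses an in-memory SQLite database as a stand-in for Access. The table and field names come from the question; the employee IDs, the subject ID 25, the date and the file path are made-up sample values, and SQLite's date handling and parameter syntax differ from Access.

```python
import sqlite3

# In-memory stand-in for the Access database (sample data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployees (empID INTEGER PRIMARY KEY);
    CREATE TABLE tblTrngRec (
        recordID  INTEGER PRIMARY KEY AUTOINCREMENT,
        empID     INTEGER,
        reqID     INTEGER,
        trngDate  TEXT,
        trngLocat TEXT
    );
    INSERT INTO tblEmployees (empID) VALUES (101), (102), (103);
""")


def add_training_for_all(db: sqlite3.Connection, req_id: int,
                         trng_date: str, trng_locat: str) -> int:
    """Create one training record per employee; return the rows added."""
    cur = db.execute(
        "INSERT INTO tblTrngRec (empID, reqID, trngDate, trngLocat) "
        "SELECT empID, ?, ?, ? FROM tblEmployees",
        (req_id, trng_date, trng_locat),
    )
    db.commit()
    return cur.rowcount


added = add_training_for_all(conn, 25, "2020-06-09", "certs/annual.pdf")
rows = conn.execute(
    "SELECT empID, reqID, trngDate, trngLocat FROM tblTrngRec ORDER BY empID"
).fetchall()
print(added, rows)
```

In Access itself the same statement could be run from a button on the form, with the three constants taken from cboTask, txtDate and txtFilePath instead of being typed into the query.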

How can I match up user inputs to ambiguous city names?

We have a set of tables shown below we use for our other tables to reference for location data. Some examples are:
Find all companies within X miles of X City
Create a company profile's location as X City
We solve the problem of multiple cities with similar names by matching on State as well, but now we have run into a different set of problems. We use Google's Place Autocomplete both for geocoding and for matching a user's query to our cities. This works fairly well until Google's format deviates from ours.
Example:
St. Louis !== Saint Louis and
Ameca del Torro !== Ameca Torro
Is there a way to fuzzy match cities in our queries?
Our query to match cities now looks like:
SELECT c.id
FROM city c
INNER JOIN state s
ON s.id = c.state_id
WHERE c.name = 'Los Angeles' AND s.short_name = 'CA'
I've also considered denormalizing city and simply storing coordinates, which would still allow the radius search. We have around 2 million rows in our company table now, so the radius search would be performed on that table rather than on the city table with a JOIN on company. This would also mean we couldn't (easily, anyway) create custom regions for cities or add other attributes to cities in the future.
I found this answer, but it only affirms that our way of normalizing input is a good method; it doesn't cover how to match against our local table (unless Google offers a City Name export I don't know about).
The short answer is that you can use Postgres's full text search functionality, with a customized search configuration.
Since you're dealing with place names, you probably want to avoid stemming, so you can use the simple configuration as a starting point. You can also add stop-words that make sense for place names (with the examples above, you can probably consider "St.", "Saint", and "del" as stop-words).
A pretty basic outline of setting up your customized configuration is below:
1. Create a stopwords file and put it in your $SHAREDIR/tsearch_data Postgres directory. See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS.
2. Create a dictionary that uses this stopwords list (you can probably use pg_catalog.simple as your template dictionary). See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY.
3. Create a search configuration for place names. See https://www.postgresql.org/docs/9.1/static/textsearch-configuration.html.
4. Alter your search configuration to use the dictionary you created in Step 2 (cf. the link above).
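The effect of such a stop-word dictionary can be imitated outside Postgres to see why it resolves the two examples from the question. The Python sketch below is only a rough model of what a simple-template dictionary does (lowercase each word, drop stop-words); the stop-word set and function names are assumptions for illustration, and real tsvector parsing is more involved.

```python
import re
from typing import Set

# Stop-words suggested above for place names (the trailing dot of "St." is
# dropped by the tokenizer, so it is listed as "st").
PLACE_STOPWORDS = {"st", "saint", "del"}


def place_tokens(name: str) -> Set[str]:
    """Lowercase the name, split it into words, and drop stop-words."""
    words = re.findall(r"\w+", name.lower())
    return {word for word in words if word not in PLACE_STOPWORDS}


def matches(query: str, stored_name: str) -> bool:
    """True when every remaining query word appears in the stored name.

    This mirrors an AND query such as 'Los & Angeles'.
    """
    query_tokens = place_tokens(query)
    return bool(query_tokens) and query_tokens <= place_tokens(stored_name)


print(matches("St. Louis", "Saint Louis"))
print(matches("Ameca Torro", "Ameca del Torro"))
print(matches("Los Angeles", "Los Alamos"))
```

One consequence worth noting is that a query for "Saint Louis" is reduced to the single word "louis", so it would also match any other stored name containing that word; the State filter remains important.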
Another consideration is how to consider internationalization. It seems that the issue for your second example (Ameca del Torro vs. Ameca Torro) might be a Spanish vs. English representation of the name. If that's the case, you could also consider storing both a "localized" and "universal" (e.g. English) version of the city name.
In the end, your query (using full-text search) might look like this (where 'places' is the name of your search configuration):
SELECT cities."id"
FROM cities
INNER JOIN "state" ON "state".id = cities.state_id
WHERE
    "state".short_name = 'CA'
    AND TO_TSVECTOR('places', cities.name) @@ TO_TSQUERY('places', 'Los & Angeles')