word suggestion based on input algorithm? - word

I am thinking of creating a web site, which lets people to rate restaurants. Since I don't have a database containing all the restaurants, this web site relies on user's inputs.
But there is a problem of this method, because people may use different word (name) to describe a same restaurant, but I don't want to create different entries inside the database, as they refer to the same restaurant.
For example, when describing KFC, somebody use the name "KFC", others may use "Kentucky Fried Chicken"
How can I make the system to automatically detect this? and give the user a list of existing items of the database.
This should quite similar to stackoverflow, which tells you "questions with similar title". But I don't know how to implement this.

You can't ... you have to create a list of the restaurant names and their "synonyms" and other possible spellings.

How can I make the system to automatically detect this?
The system doesn't know that "KFC" means "Kentucky Fried Chicken".
Make a map of synonyms, to let it know.
This should quite similar to stackoverflow, which tells you "questions with similar title"
It generally matches word-for-word. It may have an internal list of synonyms to deal with common cases.

Related

Watson Assistant: Can I define Intent using Entities in the Examples?

How to I create an #Intent which looks something like this:
How much is a #ProductType?
Whereas the #ProductType is an simple Entity which consists of:
Soft Drinks: Coke, Pepsi, Sprite, Fanta
Fruits: Apple, Banana, Watermelon
I tried adding an Intent with above settings, but it doesn't seem to work. Is such ability natively supported in IBM Watson? Or otherwise, do I need to manually handle in the Dialog, using Conditions and stuffs? Please kindly advise.
The training is based on regular language and typical sentences or phrases. So #ProductType is not what you want in the phrase, but any of the fruits or drinks.
By defining the entities, Watson Assistant later learns the connection and to identify the entities and intents.
To get started, you define the intents and entities. Both can be imported from lists. Then you add the dialog which references the different types.
This blog should give insight to all the ways to train an entity and how it is used within intents.
https://medium.com/ibm-watson/all-about-entities-dictionaries-and-patterns-with-watson-assistant-part-1-5ef7254df76b
There are a number of possible pipelines you can choose from.
1. Indirect references: this is the preferred method.
Use natural language in your intent training data. "I want to buy a pear"
Watson will automatically see the other values you have related to pear and use those as intent training as well. This will be the fastest and simplest way to manage your data
2. Direct references: this should only be used if absolutely necessary
Directly reference the entity in your intent data. "I want to buy an #pear"
Nothing is done in the UI to tell you this works, but it does. This tells Watson the entity is a very important term and will increase the weight, as well as reference all synonyms with high weight. This is more effort for you to go through your entire workspace and relabel everything this way, hence why it is not recommended unless absolutely necessary. By doing this, you also tell watson that when the system sees various fruits without the # symbol, to ignore them as entities which is not ideal
3. Contextual entities. This is highlighting them like in your screenshot.
Note the UI has been updated so there is no an annotation mode instead of just highlighting. This builds a model around the entity, and is good for things like names or locations, but not necessary for a small list of items like crayons in a box, or fruit in a store. This will ignore all of the dictionary values youve created and only look at the model. It should be used according to the blog above when the use case is ideal.
What #data_henrik answered was partially correct. But it doesn't seem like Watson Assistant "automatically" learns the preferred #Entity just by simply inputting the pure (plain-text) Examples into the #Intent. In fact, that step was required. But we still need to do one more step.
After keying in the good plain-text Examples into the #Intent, we then still need to "right click" on the text-string of the possible #Entity entry, and then choose (teach Watson) the correct #Entity name from the dropdown list appeared.
Only then Watson starts to understand such; this #Intent uses that #Entity, I suppose.
Thank you #data_henrik, and appreciate your hint.

What's the correct RESTful way to structure A Website's Parent/Child Views

NOTE: This is not specifically for an API.
I have three Entities: Building Unit Person
These are pure simple easy Exclusive 1:M relationships
A Person can only live in (1) unit
A Unit can only exist in (1) unit
The Building is essentially the parent.
Should I have URLs like:
The View mode is pretty easy
/buildings //Show all buildings
/buildings/[id] //Show one building
/buildings/[id]/units //Show all units in a building
/buildings/[id]/units/[id]/people //Show all people in a unit
However, this seems kind of verbose. While those URLs may work for PUTS and POSTS which redirect to a GET, if I want to show all the units and people in a building, should I be using a route like buildings/[id]/details or is there some other standard convention?
Also, when I want to display a form to edit the values, is there a standard url path like buildings/[id]/edit A POST and a PUT in this case will essentially be using the same form ( but with the PUT having the fields filled out ) .
I think your question may attract some opinionated answers, but it'd be good to hear about other peoples' practices regarding RESTful API designs.
You say your paths seem kind of verbose, and you may feel that way if your IDs are auto incremented integers and the only way to specify buildings, units, etc is with paths like
buildings/1/units/4/tenants
buildings/1/units/4/tenants/5
To me these are clear interfaces. If I had to maintain your code, I'd think it's pretty obvious what's going on here. If I had to criticize something, I would say you seem to be developing in a way that allows for all or one selection. It's your design choice, though. Maybe that's exactly what you need in this case. Here are some examples that come to mind.
update one tenant
PUT buildings/1/units/4/tenants/2
create three units
POST buildings/2/units //carries message body for SQL in back end
read tenants with certain criteria
GET buildings/1/tenants?params= //GET can't carry a message body
delete tenants with certain criteria
DELETE buildings/5/tenants?criteria= //params needed?

Which features should be added for NER in search result snippets

I want to cluster queries by help of the snippets of the search engine results they are currently returning. While using the noun phrases in the snippet worked well for Google results I felt that I should try a different approach for bing snippets and hence was going for Named Entity Extraction.
I have identified the following entities that can be extracted as of now using standard tools:
Person Names
Organisation Names
Locations
But I think I should be extracting more entities. Could anyone help me out here to identify more entities that may be useful?
This is an endless list, once you get to real data problems.
For example, dates are a common thing to extract. But for example booking codes such as airline tickets, or tracking codes such as parcels are something Google Mail already recognizes and extracts.
I don't think this is a very good question for a Q/A site. Plus, you may want to read more literature, and see what kind of data you can get - it clearly is data-driven what entities you want to extract. When analyzing log files, you might be interested in extracting host names, IPs, usernames and daemon/serivce names, for example.

Can I use Apache Mahout Taste for User Preferences matching?

I am trying to match objects based on predefined user preferences. A simple example would be finding best matching vechicle.
Lets say a user 'Tom' is offered a rented vehicle for travel based on his predefined preferences. In this case, the predefined user preferences will be -
** Pre-defined user preferences for Tom:
PreferredVehicle (Make='ANY', Type='3-wheeler/4-wheeler',
Category='Sedan/Hatchback', AC/Non-AC='AC')
** while the 10 available vehicles are -
Vechile1(Make='Toyota', Type='4-wheeler', Category='Hatchback', AC/Non-AC='AC')
Vechile2(Make='Tata', Type='3-wheeler', Category='Transport', AC/Non-AC='Non-AC')
Vechile3(Make='Honda', Type='4-wheeler', Category='Sedan', AC/Non-AC='AC')
;
;
and so on upto 'Vehicle10'
All I want to do is - choose a vehicle for Tom that best matches his preferences and also probably give him choices in order, i.e. best match first.
Questions I have :
Can this be done with Mahout Taste?
If yes, can someone please point me to some example code where I can start quickly?
A recommender may not be the best tool for the job here, for a few reasons. First, I don't expect that the best answers are all that personal in this domain. If I wanted a Ford Focus, the best alternative you have is likely about the same for most every user. Second, there is not much of a discovery problem here. I'm searching for a vehicle that meets certain needs; I don't particularly want or need to find new and unknown vehicles, like I would for music. Finally you don't have much data per user; I assume most users have never rented before, and very few have even 3+ rentals.
Can you throw this data at a recommender anyway? Sure, try Mahout Taste (I'm the author). If you have the book Mahout in Action it will walk you through it. Since it's non-rating data, I can also recommend the successor project, Myrrix (http://myrrix.com) as it will be easier to set up and run. You can at least evaluate the results to see if it's anywhere near useful.
Either way, your work will just be to make a CSV file of "userID,vehicleID" pairs from your data and feed it in. Then it will give you vehicle IDs as recommendations for any user ID.
But, I imagine you will do much better to analyze what people picked when the car wasn't available, and look at the difference, and learn which attributes they are most and least likely to be sacrificed, and learn to score the alternatives that way. This is entirely feasible since this data set is small, and because you have rich item attribute data.

How to address semantic issues with tag-based web sites

Tag-based web sites often suffer from the delicacy of language such as synonyms, homonyms, etc. For programmers looking for information, say on Stack Overflow, concrete examples are:
Subversion or SVN (or svn, with case-sensitive tags)
.NET or Mono
[Will add more]
The problem is that we do want to preserve our delicacy of language and make the machine deal with it as good as possible.
A site like del.icio.us sees its tag base grow a lot, thus probably hindering usage or search. Searching for SVN-related entries will probably list a majority of entries with both subversion and svn tags, but I can think of three issues:
A search is incomplete as many entries may not have both tags (which are 'synonyms').
A search is less useful as Q/A often lead to more Qs! Notably for newbies on a given topic.
Tagging a question (note: or an answer separately, sounds useful) becomes philosophical: 'Did I Tag the Right Way?'
One way to address these issues is to create semantic links between tags, so that subversion and SVN are automatically bound by the system, not by poor users.
Is it an approach that sounds good/feasible/attractive/useful? How to implement it efficiently?
Recognizing synonyms and semantic connections is something that humans are good at; a solution to organizing an open-ended taxonomy like what SO is featuring would probably be well served by finding a way to leave the matching to humans.
One general approach: someone (or some team) reviews new tags on a daily basis. New synonyms are added to synonym groups. Searches hit synonym groups (or, more nuanced, hit either literal matches or synonym group matches according to user preference).
This requires support for synonym groups on the back end (work for the dev team). It requires a tag wrangler or ten (work for the principals or for trusted users). It doesn't require constant scaling, though—the rate at which the total tag pool grows will likely (after the initial Here Comes Everybody bump of the open beta) will in all likelihood decrease over time, as any organic lexicon's growth-rate does.
Synonymy strikes me as the go-to issue. Hierarchical mapping is an ambitious and more complicated issue; it may be worth it or it may not be, but given the relative complexity of defining the hierarchy it'd probably be better left as a Phase 2 to any potential synonym project's Phase 1.
The way the software on blogspot.com is set up, is that there is an ajax-autocomplete-thingie on the box where you write the name of the tags. This searches all your previous posts for tags that start with the same letters. At least that way you catch different casings and spellings (but not synonyms).
How would the system know which tags to semantically link? Would it keep an ever-growing map of tags? I can't see that working. What if someone typed sbversion instead? How would that get linked?
I think that asking the user when they submit tags could work. For example, "You've entered the following tags: sbversion, pascal and bindings. Did you mean, "Subversion", "Pascal" and "Bindings"?
Obviously the system would have to have a fairly smart matching system for that to work. Doing it this way would be extra input for the user (which'd probably annoy them) but the human input would, if done correctly, make for less duplicate tags.
In fact, having said all that, the system could use the results of the user's input as a basis for automatic tag matching. From the previous example, someone creates a tag of "sbversion" and when prompted changes it to "Subversion" - the system could learn that and do it automatically next time.
Part of the issue you're looking at is that English is rife with synonyms - are the following different: build-management, subversion, cvs, source-control?
Maybe, maybe not. Having a system, like the one [now] in use on SO that brings up the tag you probably meant is extremely helpful. But it doesn't stop people from bulling-through the tagging process.
Maybe you could refuse to accept "new" tags without a user-interaction? Before you let 'sbversion' go in, force a spelling check?
This is definitely an interesting problem. I asked an open question similar to this on my blog last year. A couple of the responses were quite insightful.
I completely agree. The mass of tags that have currently. I don't participate in other tagged based sites. However having a hierarchy of tags would be very helpful, instead of ruby rails ruby-on-rails rubyonrails etc...
Tags are basically our admission that search algorithms aren't up to snuff. If we can get a computer to be smart enough to identify that things tagged "Subversion" have similar content to things tagged "svn", presumably we can parse the contents, so why not skip tags altogether, and match a search term directly to the content (i.e., autotagging, which is basically mapping keywords to results)?!
The problem is to make the search engine use the fact that 'subversion' and 'svn' are very similar to the point that they mean the same 'thing'.
It might be attractive to compute a simple similarity between tags based on frequency: 'subversion' and 'svn' appear very often together, so requesting 'svn' would return SVN-related questions, but also the rare questions only tagged 'subversion' (and vice versa). However, 'java' and 'c#' also appear often together, but for very different reasons (they are not synonyms). So similarity based on frequency is out.
An answer to this problem might be a mix of mechanisms, as the ones suggested in this Q/A thread:
Filtering out typos by suggesting tags when the user inputs them.
Maintaining a user-generated map of synonyms. This map may not be that big if it just targets synonyms.
Allowing multi-tag search, such that the user can put 'subversion svn' or 'subversion && svn' (well, from programmers to programmers) in the search box and get both. This would be quite practical as many users may actually try such approach when they do not know which term is the most meaningful.
#Nick: Agreed. The question is not meant to argue against tags. Tags have great potential, but users will face a growing issue if one cannot search 'across' tags.
#Steve: Maintaining an ever-growing map of tags is definitely not practical. As SO is accumulating an ever-growing bag of tags, how could we shade some light on this bag to make search of Q/A tags even more useful, in a convenient way?
#Espo: 'Ajax-powered' tag suggestions based on existing tags is apparently available on SO when creating a question. This is by the way very helpful to choose tags and appropriate spelling (avoiding the 'subversion' vs. 'sbversion' issue from Steve).