Infer one noun from another - algolia

In this scenario, there are descriptive words for users, which for argument's sake would be "plumbing". Is there any way to fetch a result that contains "plumbing" when searching for "plumber"? I've turned exact word matching off, and tried to play with other options, but I can't seem to match that exact situation.

Related

Use exact search with OR operator inside Sphinx query

There are different search options possible with sphinx extended syntax.
Exact search: "I love to eat" //will match exact phrase
OR search: (eat|sleep|dream)
Is it possible to mix them and build queries like that:
"I love to (eat|sleep|dream)"
I know it is possible to simplify it and split OR condition to different exact phrases, like:
"I love to eat" | "I love to sleep" | "I love to dream"
But I plan to use lot of OR groups with lot of options inside, and extending this query will end up with huge one.
So is it possible to use OR syntax inside exact match syntax in Sphinx?
No, its not possible to use 'OR' within the Phrase Operator (the "s around words that enforces adjacent words) - the proper name for what you call 'exact match'.
Alas there isn't another combined 'strict order' operator and 'near' (ie there isnt a 'just before' operator). So you forced to use both, so something like
("I love to" << (eat|sleep|dream)) NEAR/3 ("I love to" NEAR/1 (eat|sleep|dream))
Which is no simpler, and would argue is more complicated and convoluted! The NEAR/3 in the middle is needed to make sure you matching on the same sentance within the document (otherwise there are edge cases with false positives).
An off the wall idea, if you have a lot of these 'OR' lists, is rather than implement them in the query, use wordforms instead. The drawback is you need to know them in advance (ie compiled into the index) and 'opting out' is more complicated.

overpass-api: regex on keys

According to http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL
queries can use regular expressions on both the values and the keys. While I have no trouble using regex on the values, I'm having a problem with the keys.
The example on the wiki referenced above says (among other examples):
/* finds addr:* tags with value exactly "Foo" */
node[~"^addr:.*$"~"^Foo$"];
So, that's an example of using regex on the keys and the values.
What I am interested in is the name key. Specifically the name:en key. There are a couple problems with searching by name. Not all names are in English, and for those nodes/way/relations whose names are not in English, there is no guarantee there will be a name:en tag with an English version of the name.
In general, there is no way to know in advance if the name will be in English or that there is a name:en tag. If you only ask for name or name:en, you run the risk of finding no hit. (Of course, searching for both is no guarantee of success, either.)
I have a case where I know name fails, but name:en succeeds. That is my test case. I can query the overpass-api.de/api/interpreter using this:
[out:json][timeout:25][bbox:33.465530,36.156006,33.608615,36.574516];
(
node[name~"duma",i][place];
way[name~"duma",i][place];
>;
relation[name~"duma",i][place];
node["name:en"~"duma",i][place];
way["name:en"~"duma",i][place];
>;relation["name:en"~"duma",i][place];
);
out center;
see it on overpass
and it works fine ("duma" is not found through name, but it is found with name:en), but I find it lengthy and somewhat repetitive.
I would like to use a regular expression involving the name and name:en tags, but either the server does not understand the query or I simply am using an incorrect regex.
Using the example shown in the wiki: node[~"^addr:.*$"~"^Foo$"]
I have tried:
[~"name|name:en"~"duma",i]
[~"name.*"~"duma",i]
[~"^name.*$"~"duma",i]
and several others. I even mimicked the example with [~"^name:.*"~"duma",i] just to see if anything would be returned.
Does overpass-api.de recognize regular expressions on the keys or do I just have the regex wrong? I don't get an error from overpass-api.de, just the coordinates of the bbox and an empty result. It's usually very strict about reacting to a poortly formatted query. Thanks in advance.
That's really a bug in the Overpass API implementation concerning case-insensitive key regex matching, see this Github ticket for details.
For the time being, you can already test the patch on the development box:
http://overpass-turbo.eu/s/b1l
BTW: If you don't need case-insensitive regexp matching, this should already work on overpass-api.de as of today.

Sphinx with metaphone and wildcard search

we are an anatomy platform and use sphinx for our search. We want to make our search more fuzzier and started to use metaphone to correct spelling mistakes. It finds for example phalanges even though the search word is falanges.
That's good but we want more. We want that the user could type in falange or even falang and we still find phalanges. Any ideas how to accomplish this?
If you are interested you can checkout our sphinx config file here.
Thanks!
Well you can enable both metaphone and min_prefix_len on an index at once. It will sort of work.
falange*
might then just work. (to match phalanges)
The problem is the 'stripped' letters may change the 'sound' of the word (because change the pronunciation)
eg falange becomes FLNJ, but falang acully becomes FLNK - so they no longer 'substrings' of one another. (ie phalanges becomes FLNJS, which FLNK* wont match)
... to be honest I dont know a good solution. You could perhaps get better results, if was to apply stemming, BEFORE metaphone. (so the endings that change the pronouncation of the words are removed.
Alas Sphinx can't do this. If you enable stemming and metaphone together, only ONE of the processors will ever fire.
Two possible solutions, implement stemming outside of sphinx (or maybe with regexp_filter. Not sure if say a porter stemmer can be implemnented purely with regular expressions)
or modify sphinx, so that ALL morphology processors apply. (rather than just the first one that changes the word)

Doing a different kind of completion

I am writing a function that uses the minibuffer and requires a somewhat different style of completion that might require deleting some characters. For example:
ar<tab> -> artist:
artist:ba<tab> -> artist:'Johann Sebastian Bach'
artist:'Johann Sebastian Bach'<tab> -> artist:'Bela Bartók'
artist:'Bela Bartók' and album:<tab>
etc...
I've already written the completion function, that generates a list of possible strings for the current input, yet I cannot use it with completing-read and completion-table-dynamic, because only the alternatives that do not require deletion are displayed. In this case, only the first step, from ar to artist.
To do the job, I'm considering using the lower-level (read-from-minibuffer) with a custom keymap to do the completion and display the alternatives. Is there a simpler solution? If not, which functions are there to handle displayed and cycling through the Completions buffer?
Thanks!
EDIT: In the end, I rolled my own. Here is the code, if anybody's interested.
Yes, Icicles should give you what you want, IIUC.
I'm not sure how you're handling things like 'artist' and 'album', but there are a few ways Icicles could help.
You can match parts of a completion candidate in any order. So if 'foo' and 'bar' are parts of the same candidate, then you can match any candidates that have both, in either order. This is "progressive completion": you can keep adding patterns to narrow the choices. Patterns are ANDed. You can also subtract candidates that match a pattern.
You can have candidates that are "multi-completions". These are composed of parts, separated by a configurable string. You can match against any or all parts. So, for example, a candidate might be an album name plus artist names.
If you also use Bookmark+ then you can tag files, a la delicious.com tagging. You need not visit files to tag them. Tags are generally strings (including newlines if you want), but they can also have associated Lisp values. Here, you might have a file for each album and tag albums with descriptions and various artists. Then you can complete to find a given album file by matching parts of its name and/or artists.
For all Icicles completion, you can use progressive completion and complementing (#1). For each part of a progression you can use substring or regexp matching (or even fuzzy matching of various sorts).
Some links that might help:
http://www.emacswiki.org/emacs/Icicles_-_Progressive_Completion
http://www.emacswiki.org/emacs/Icicles_-_Multi-Completions
http://www.emacswiki.org/emacs/Icicles_-_Bookmark_Enhancements -- see, for example, command `icicle-find-file-tagged'
You may benefit from the Icicles library. It contains many features for enhancing minibuffer completion.

Test Suite for Double Metaphone?

I've translated Double-Metaphone into ActionScript3 and I want to test it (obviously) before I release the source to ... um ... the open.
I'm looking for a long list of names with the primary and secondary codes. Google does not find anything except one list with pairs of names (presumably they should match).
Thanks
You could find someone else's double metaphone implementation, run it on the same long list of words, and compare the results to your own.
For long lists of words, I like infochimps. They have lots of word lists, like this one of 350,000 english words or this one of place names, and many more.
Here are implementations you can compare your results against. Here is an online example, except that it tests only one word at a time - I guess you'll have to download and run one of the scripts to test a large list of words.
For each word, two codes will be returned; you'll probably want to test that both codes returned match the ones returned of another implementation. You probably know that the reference implementation is here with full source code here, but including the links anyway for others' benefit.