Sphinx exact match to column string - sphinx

I thought that
Match('^Word$')
Would only find records that are exactly 'Word'
However, although this works for single words, it does not for multiple words:
Match ('^Final Word$')
Finds 'Final Word' and 'Final and Last Word'.
as does
Match ('^"Final Word"$')
How do I tell Sphinx to only find an exact match?
Update: After some testing, the best I can do is rely on weighting and a ranker, without the quotes:
MATCH('^Final Word$') order by weight() desc limit 1 OPTION ranker=PROXIMITY_BM25
So I forced an exact match with ranking and a limit, but it would still be nice to know how to actually say 'only return exact matches'.
One issue with the above is that if I do not have 'Final Word' in the table, it will find the others, e.g. 'Final and Last Word', which is behavior I do not want.

You just got your operators in the wrong order :)
Match('"^Final Word$ "')
(having a space after the $ helps with some mysterious Sphinx bug!)
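For anyone assembling this query in application code, here is a minimal Python sketch of the quoted form above. The helper names `escape_term` and `exact_field_query` are hypothetical, and the list of escaped characters is an assumption to check against the Sphinx documentation for your version; the trailing space simply mirrors the workaround mentioned above.

```python
# Hypothetical helpers, not part of any Sphinx client library.
# Assumed set of characters that extended query syntax treats as operators.
SPECIAL = set('\\()|-!@~"&/^$=<')

def escape_term(text):
    """Backslash-escape characters assumed to be query operators."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def exact_field_query(value):
    """Wrap the value as a phrase anchored to field start (^) and field end ($)."""
    return '"^' + escape_term(value.strip()) + '$ "'

print(exact_field_query("Final Word"))  # prints "^Final Word$ "
```

The returned string is only the full-text expression; it would still need normal SQL string escaping or parameter binding before being placed inside MATCH('...').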

So the issue turned out to be that, in my efforts to make this work, one step had been to specify the ranker
Option Ranker=PROXIMITY_BM25
which had worked for me up to then. What actually works is
Match('^Final Word$')
and then either not specifying a ranker, or specifying extended if the ranker in the config is defined otherwise (it is extended by default).

Related

Should I use FIND function or use For Loop to search matching record

I am writing a function to search for a record in sheet2 and return a value to sheet1. I use the Find function and it works in most cases. However, I have run into the issues below.
The FIND function supports only one matching key. If I have more than one value to match, it does not work.
Sometimes the search value may have a trailing space, like "Key1 ". Then, if I search for "Key1", no record is returned. I can't use a wildcard search, as I need an exact match apart from the space.
To solve this, should I simply use a For loop to search for the result instead of FIND? I would have to loop through every row, and I am not sure whether the performance would be acceptable.
Not related to FIND: I need to copy the value as well as the background color. I use the returned cells.interior.colorindex to set the field, but the color is a little different. For example, the original is light green but the copy turns out darker. I checked that both fields have the same colorindex. Any idea?
Thanks,
let's see whether anyone has an idea how to resolve it.

Distinct count to include blank values

In the Running Total Fields, how do you set up a Distinct Count that includes blank values as one of your conditions?
Just found a solution. It's:
isnull({table.column})
Before asking, that's what I tried, but it wasn't working. So for people like me who tried that and it didn't work, that's because you have multiple conditions in your Running Total, and for whatever reason it only works when you edit your syntax and place that near the top of your conditions instead of the bottom. Don't know the reason, but it's working now.
I'd recommend replacing the field you're currently totaling with a new Formula. Something that doesn't ever come up blank, like:
If IsNull({yourfieldhere}) THEN "Blank" ELSE {yourfieldhere}
or if it's an empty string:
If {yourfieldhere}="" THEN "Blank" ELSE {yourfieldhere}
You can replace "Blank" with whatever suits you, even just an empty " " space or 0. But then it's at least something distinct to be counted.
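The idea behind the replacement formula can be sketched outside Crystal Reports. This is a minimal Python illustration, not Crystal syntax; the function name `distinct_count_with_blanks` and the sample rows are made up for the example. Nulls and empty strings are mapped to one placeholder so they count as a single distinct value.

```python
def distinct_count_with_blanks(values, placeholder="Blank"):
    """Count distinct values, treating None and "" as one placeholder value."""
    normalized = {placeholder if v is None or v == "" else v for v in values}
    return len(normalized)

rows = ["A", None, "B", "", "A"]  # made-up sample column
print(distinct_count_with_blanks(rows))  # prints 3
```

Here "A" and "B" count once each, and the null and the empty string collapse into the single "Blank" value.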

Override a stemmed word on the fly in a query with Sphinx?

If I turn on stemming/lemmatizer in Sphinx, can I push a term to it "as needed" that does not use stemming? I know I can use wordforms to always exclude that word from stemming, e.g. Radiology > Radiology, but that results in never stemming the word. I'm looking for a way to avoid adding a wordform exception and instead be able to say in a query, in essence, 'look exactly for "Radiology" and do not stem/lemmatize'. I have tried "Radiology" instead of Radiology to no avail.
http://sphinxsearch.com/docs/current.html#conf-index-exact-words
:)
With index_exact_words enabled, you can then do
=Radiology
(in extended match mode)

overpass-api: regex on keys

According to http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL
queries can use regular expressions on both the values and the keys. While I have no trouble using regex on the values, I'm having a problem with the keys.
The example on the wiki referenced above says (among other examples):
/* finds addr:* tags with value exactly "Foo" */
node[~"^addr:.*$"~"^Foo$"];
So, that's an example of using regex on the keys and the values.
What I am interested in is the name key. Specifically the name:en key. There are a couple problems with searching by name. Not all names are in English, and for those nodes/way/relations whose names are not in English, there is no guarantee there will be a name:en tag with an English version of the name.
In general, there is no way to know in advance if the name will be in English or that there is a name:en tag. If you only ask for name or name:en, you run the risk of finding no hit. (Of course, searching for both is no guarantee of success, either.)
I have a case where I know name fails, but name:en succeeds. That is my test case. I can query the overpass-api.de/api/interpreter using this:
[out:json][timeout:25][bbox:33.465530,36.156006,33.608615,36.574516];
(
node[name~"duma",i][place];
way[name~"duma",i][place];
>;
relation[name~"duma",i][place];
node["name:en"~"duma",i][place];
way["name:en"~"duma",i][place];
>;
relation["name:en"~"duma",i][place];
);
out center;
see it on overpass
and it works fine ("duma" is not found through name, but it is found with name:en), but I find it lengthy and somewhat repetitive.
I would like to use a regular expression involving the name and name:en tags, but either the server does not understand the query or I simply am using an incorrect regex.
Using the example shown in the wiki: node[~"^addr:.*$"~"^Foo$"]
I have tried:
[~"name|name:en"~"duma",i]
[~"name.*"~"duma",i]
[~"^name.*$"~"duma",i]
and several others. I even mimicked the example with [~"^name:.*"~"duma",i] just to see if anything would be returned.
Does overpass-api.de recognize regular expressions on the keys, or do I just have the regex wrong? I don't get an error from overpass-api.de, just the coordinates of the bbox and an empty result. It's usually very strict about reacting to a poorly formatted query. Thanks in advance.
That's really a bug in the Overpass API implementation concerning case-insensitive key regex matching; see this GitHub ticket for details.
For the time being, you can already test the patch on the development box:
http://overpass-turbo.eu/s/b1l
BTW: If you don't need case-insensitive regexp matching, this should already work on overpass-api.de as of today.
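Until key regexes behave as expected, the repetition in the original query can at least be generated rather than typed by hand. The following Python sketch is an assumption-laden illustration: `build_place_query` is a hypothetical helper, it leaves out the `>;` recurse statements from the original query, and it does not escape the search term.

```python
def build_place_query(term, keys=("name", "name:en"),
                      bbox=(33.465530, 36.156006, 33.608615, 36.574516),
                      timeout=25):
    """Build an Overpass QL union with one filter per key and element type."""
    bbox_text = ",".join(f"{c:.6f}" for c in bbox)
    header = f"[out:json][timeout:{timeout}][bbox:{bbox_text}];"
    statements = []
    for key in keys:
        for element in ("node", "way", "relation"):
            # Case-insensitive value regex on an explicitly named (quoted) key.
            statements.append(f'  {element}["{key}"~"{term}",i][place];')
    return "\n".join([header, "("] + statements + [");", "out center;"])

print(build_place_query("duma"))
```

The generated text can be sent to the interpreter endpoint in the same way as the hand-written query; adding another key such as a different name:* language only means extending the `keys` tuple.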

Sphinx with metaphone and wildcard search

We are an anatomy platform and use Sphinx for our search. We want to make our search fuzzier and have started using metaphone to correct spelling mistakes. For example, it finds phalanges even though the search word is falanges.
That's good, but we want more: the user should be able to type falange or even falang and still find phalanges. Any ideas how to accomplish this?
If you are interested, you can check out our Sphinx config file here.
Thanks!
Well, you can enable both metaphone and min_prefix_len on an index at once. It will sort of work.
falange*
might then just work (to match phalanges).
The problem is that the 'stripped' letters may change the 'sound' of the word (because they change the pronunciation),
e.g. falange becomes FLNJ, but falang actually becomes FLNK, so they are no longer 'substrings' of one another (i.e. phalanges becomes FLNJS, which FLNK* won't match).
... to be honest, I don't know a good solution. You could perhaps get better results if you were to apply stemming BEFORE metaphone (so that the endings that change the pronunciation of the words are removed).
Alas, Sphinx can't do this. If you enable stemming and metaphone together, only ONE of the processors will ever fire.
Two possible solutions: implement stemming outside of Sphinx (or maybe with regexp_filter; I am not sure whether, say, a Porter stemmer can be implemented purely with regular expressions),
or modify Sphinx so that ALL morphology processors apply (rather than just the first one that changes the word).
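The prefix problem described above can be made concrete with a small Python sketch. The codes are copied from this answer rather than computed, and `prefix_query_matches` is a hypothetical helper that only mimics how a prefix query compares encoded keywords.

```python
# Metaphone codes as quoted in the answer above (not computed here).
CODES = {"falange": "FLNJ", "falang": "FLNK", "phalanges": "FLNJS"}

def prefix_query_matches(query_word, indexed_word, codes=CODES):
    """True if the query's code is a prefix of the indexed word's code."""
    return codes[indexed_word].startswith(codes[query_word])

for word in ("falange", "falang"):
    print(word, prefix_query_matches(word, "phalanges"))
```

With these codes, falange* matches phalanges (FLNJ is a prefix of FLNJS), while falang* does not (FLNK is not), which is why dropping one more letter breaks the match.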