Elasticsearch autocomplete double-matches a query term - autocomplete

I've implemented autocomplete in Elasticsearch using edge-ngrams. Everything is working correctly, but there is a strange case which my implementation is not smart enough to handle.
Suppose I have indexed two documents,
Green Dragon
Green Griffin
and I type
green gr
the results I get back are
Green Dragon
Green Griffin
I am using a "match" query with the "and" operator, so every term in the query must match for a document to be returned. Green Dragon is returned because the query term "green" matches "Green" and the query term "gr" also matches "Green" (it is one of that word's edge n-grams). Of course, I want to exclude Green Dragon from the results.
It seems like to solve this problem Elasticsearch would need to keep track of which tokens in the index have been matched and not reuse them. Is there any way to do this within Elasticsearch?

Change the field's analyzer. There is a nice explanation here:
http://www.elasticsearch.org/blog/starts-with-phrase-matching/
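To make the double match concrete, here is a minimal pure-Python simulation. It does not call Elasticsearch; the helper names (`index_per_word`, `index_whole_phrase`, and so on) are made up for illustration, and it assumes the suggested analyzer change means building edge n-grams over the whole lowercased title (a keyword-style tokenizer) instead of over each word.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """All prefixes of token between min_gram and max_gram characters."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

def index_per_word(title):
    """Assumed current setup: split into words, lowercase, edge n-gram each word."""
    grams = set()
    for word in title.lower().split():
        grams |= edge_ngrams(word)
    return grams

def index_whole_phrase(title):
    """Assumed fix: treat the whole title as one token, then edge n-gram it."""
    return edge_ngrams(title.lower())

def match_and(query, grams):
    """'match' with operator 'and': every query term must be an indexed token."""
    return all(term in grams for term in query.lower().split())

def match_whole(query, grams):
    """The query is kept as a single token and must equal an indexed prefix."""
    return query.lower() in grams

titles = ["Green Dragon", "Green Griffin"]

# Both "green" and "gr" are prefixes of "Green", so both titles match.
per_word_hits = [t for t in titles if match_and("green gr", index_per_word(t))]

# "green gr" is a prefix of "green griffin" only.
phrase_hits = [t for t in titles if match_whole("green gr", index_whole_phrase(t))]

print(per_word_hits)
print(phrase_hits)
```

The trade-off in this sketch is that the whole-phrase variant only matches from the start of the title, so a query such as "griffin" would no longer find Green Griffin.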

Related

Lucene Wildcard Query - Length of matched string

I have set up a simple Lucene.NET index and am testing out a few queries.
I have an index with a field called "Biography" and I am running this query:
WildcardQuery query = new WildcardQuery(new Term("Biography", "*anag*"));
This returns matches for records containing the word "management", which is great.
If I search for this...
WildcardQuery query = new WildcardQuery(new Term("Biography", "*anagm*"));
then I get no results.
Here are the two strings I have in the index:
"im good at project management"
"im good at programming and project management. i like managing things"
Is there a character limit to wildcard searching?
My use case is a free-text search box for users, so I can't predict what they will type, which is why I want to use a wildcard search.
The partial word "anagm" does not occur in either of your two sentences ("management" contains "anage" and "anagem", but not "anagm"), so returning 0 results is the expected behavior:
"im good at project management"
"im good at programming and project management. i like managing things"
Which sentence did you think would match, and why?
Lucene is more often used to match whole words, or more specifically tokens, from the original sentences. Wildcard matching in Lucene (as one might do in SQL) is much less common, because a pattern with a leading wildcard performs poorly (just as it does in SQL).
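The behavior can be reproduced outside Lucene with a small Python sketch. This is only a simulation under the assumption that the field is split into lowercase word tokens and that the wildcard is tested against each token; `wildcard_hits` is a hypothetical helper, not a Lucene API.

```python
from fnmatch import fnmatchcase

docs = [
    "im good at project management",
    "im good at programming and project management. i like managing things",
]

def wildcard_hits(pattern, documents):
    """Return the documents that have at least one token matching pattern."""
    hits = []
    for doc in documents:
        # Crude tokenizer: lowercase, drop full stops, split on whitespace.
        tokens = doc.lower().replace(".", " ").split()
        if any(fnmatchcase(token, pattern) for token in tokens):
            hits.append(doc)
    return hits

print(len(wildcard_hits("*anag*", docs)))   # both documents contain "management"
print(len(wildcard_hits("*anagm*", docs)))  # no token contains "anagm"
print(len(wildcard_hits("*anagem*", docs))) # "management" contains "anagem"
```

So there is no length limit involved here: the five-letter pattern fails simply because that letter sequence is not in any indexed word.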

How do I make Algolia Search guarantee that it will return a number of results

I need Algolia to always return 5 results from a full-text search, even if the query text itself bears little or no relevance to the returned results. Before someone suggests it, I have already tried setting the removeWordsIfNoResults option to all of its possible modes, and this still doesn't guarantee that I get my 5 results.
The purpose of this is to create a 'relevant entities' sidebar where the name of the current entity is used to search for other entities.
Any suggestions?
Using the removeWordsIfNoResults=allOptional query parameter is indeed a good way to go: because all query words are required to match an object by default, falling back to "optional" is a good way to still retrieve results if one of the query words (or the combination of words) doesn't match anything.
index.search(query, { removeWordsIfNoResults: 'allOptional' });
Another solution is to always consider all query words as optional (not only as a fallback), so that the query foo bar baz is interpreted as OPT(foo) AND OPT(bar) AND OPT(baz) <=> foo OR bar OR baz. The difference is that this query will retrieve more results than the previous one, because a single matching word is enough to retrieve the object.
index.search(query, { optionalWords: query });
That being said, there is no way to force the engine to retrieve "at least" 5 results. What I would recommend is a small piece of frontend logic:
- do the query with removeWordsIfNoResults or optionalWords
- if the engine returns fewer than 5 results, do another query
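The two steps above could be sketched as follows in Python. This is an illustration only: `search` is a hypothetical callable standing in for a wrapper around `index.search` that returns a list of hits, and using an empty query as the "other query" is an assumption, since the answer does not say which second query to run.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (query, params) -> list of hits carrying an "objectID".
SearchFn = Callable[[str, Dict[str, str]], List[dict]]

def search_with_minimum(search: SearchFn, query: str, minimum: int = 5) -> List[dict]:
    """Run the relaxed query first, then pad with a second query if needed."""
    hits = list(search(query, {"removeWordsIfNoResults": "allOptional"}))
    if len(hits) >= minimum:
        return hits[:minimum]

    # Assumption: an empty query serves as the fallback that returns some records.
    seen = {hit["objectID"] for hit in hits}
    for hit in search("", {}):
        if len(hits) >= minimum:
            break
        if hit["objectID"] not in seen:
            hits.append(hit)
            seen.add(hit["objectID"])
    return hits
```

Relevant hits stay at the top and the padding only fills the remaining slots, so the sidebar shows the best matches first. If the index holds fewer than `minimum` records, the function still returns fewer than `minimum` hits.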

Is it possible to see which attributes were matched in a Sphinx resultset?

I have an index which has several different attributes.
MySQL [(none)]> select * FROM products_index WHERE MATCH('red shoes');
This returns a bunch of results. Magic. Love Sphinx.
Now, is it possible to see which attribute Sphinx matched on for each of these results?
For example, I have a "colour" field which the "red" would be matching on (potentially), but it could also match on the product name attribute.
I think PACKEDFACTORS() is the only way to do this
http://sphinxsearch.com/docs/current.html#expr-func-packedfactors
It's a little cumbersome to use and adds a bit of overhead to the query, but it should work.
(The only other option is post-matching, e.g. using Snippets.)

sphinx get non-stemmed results

When I query Sphinx, it first applies stemming to my input keywords and gives me back a set of results with the words that matched. The problem is that the result keywords are also stemmed.
Is there a way to get back from Sphinx the original search keyword and not the stemmed one? For example, if I do a batch query with the following words:
credit card
working
walked
and suppose Sphinx found a match for credit card. The problem is that Sphinx returns "cred card", and I have to manually check (by comparing strings) which of the above keywords the document(s) matched. This could be very inefficient in my circumstances.
Any suggestion?

Thinking sphinx fuzzy search?

I am implementing Sphinx search in my Rails application.
I want to search with fuzzy matching on. It should tolerate spelling mistakes, e.g. if I enter the search query charact*a*ristics, it should search for charact*e*ristics.
How should I implement this?
Sphinx doesn't naturally allow for spelling mistakes: it doesn't care whether the words are spelled correctly or not, it just indexes them and matches them.
There are two options around this: either use thinking-sphinx-raspell to catch spelling errors by users when they search, and offer them the choice to search again with an improved query (much like Google does); or use the soundex or metaphone morphologies, so words are indexed in a way that accounts for how they sound. Search on this page for stemming, you'll find the relevant section. Also have a read of Sphinx's documentation on the matter.
I've no idea how reliable either option would be. Personally, I'd opt for #1.
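To show why a sound-based morphology can absorb this kind of typo, here is a small Python sketch of the classic American Soundex algorithm. It is an illustration only and is not claimed to be identical to the soundex implementation inside Sphinx.

```python
def soundex(word: str) -> str:
    """Classic four-character Soundex code (first letter plus three digits)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""

    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if digit and digit != prev:
            result += digit
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            prev = digit
        if len(result) == 4:
            break
    return result.ljust(4, "0")

print(soundex("charactaristics") == soundex("characteristics"))
```

Because vowels are dropped from the code, the misspelled and the correct word are indexed under the same key. The flip side is that many unrelated words also collapse to the same key, which is one reason to be cautious about this option.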
By default, Sphinx does not pay any attention to wildcard searching using an asterisk character. You can turn it on, though:
development:
  enable_star: true
# ... repeat for other environments
See http://pat.github.io/thinking-sphinx/advanced_config.html Wildcard/Star Syntax section.
Yes, Sphinx generally always uses the extended match modes.
There are the following matching modes available:
SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”);
SPH_MATCH_EXTENDED2, an alias for SPH_MATCH_EXTENDED;
SPH_MATCH_FULLSCAN, matches query, forcibly using the "full scan" mode as below. NB, any query terms will be ignored, such that filters, filter-ranges and grouping will still be applied, but no text-matching.
SPH_MATCH_EXTENDED2 was used during 0.9.8 and 0.9.9 development cycle, when the internal matching engine was being rewritten (for the sake of additional functionality and better performance). By 0.9.9-release, the older version was removed, and SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2 are now just aliases.
enable_star
Enables star-syntax (or wildcard syntax) when searching through prefix/infix indexes. Optional, default is 0 (do not use wildcard syntax), for compatibility with 0.9.7. Known values are 0 and 1.
For example, assume that the index was built with infixes and that enable_star is 1. Searching should work as follows:
"abcdef" query will match only those documents that contain the exact "abcdef" word in them.
"abc*" query will match those documents that contain any words starting with "abc" (including the documents which contain the exact "abc" word only);
"*cde*" query will match those documents that contain any words which have "cde" characters in any part of the word (including the documents which contain the exact "cde" word only).
"*def" query will match those documents that contain any words ending with "def" (including the documents that contain the exact "def" word only).
Example:
enable_star = 1
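The four rules quoted above can be modelled in a few lines of Python. This is only a sketch of the described semantics at the level of single words, not Sphinx code; `star_match` and `matching_words` are hypothetical helpers.

```python
def star_match(pattern: str, word: str) -> bool:
    """Apply the star-syntax rules described above to a single word."""
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern.strip("*")
    if leading and trailing:
        return core in word           # "*cde*": anywhere in the word
    if trailing:
        return word.startswith(core)  # "abc*": word starts with "abc"
    if leading:
        return word.endswith(core)    # "*def": word ends with "def"
    return word == core               # "abcdef": exact word only

def matching_words(pattern, words):
    return [w for w in words if star_match(pattern, w)]

words = ["abc", "abcdef", "cde", "def", "xyz"]
for pattern in ("abcdef", "abc*", "*cde*", "*def"):
    print(pattern, matching_words(pattern, words))
```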