Sphinxsearch min_infix_len = 1 is disabled by force on 2.2.x? - sphinx

I had a previous version of SphinxSearch that worked like a charm: it was fast and the results were accurate. After upgrading to 2.2.10, changes in that release made the search results much worse.
Now, if I search for "Lenovo y", for example, expecting it to match the existing "lenovo y5070", I get no results, although my config contains:
min_word_len = 1
min_infix_len = 1
Searching for "Lenovo y5" works fine, so it seems to me that the minimum infix length is being forced to 2 instead of 1. This is very bad for my search results. Any suggestions?

Try adding expand_keywords = 1 together with index_exact_words = 1.
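For context, min_infix_len sets the shortest substring of a keyword that gets indexed. The Ruby sketch below only illustrates what that setting means for a word like "y5070"; the `infixes` helper is hypothetical, it is not Sphinx code, and it does not explain the 2.2.x behavior described in the question.

```ruby
# Enumerate every substring of `word` that has at least `min_len` characters.
def infixes(word, min_len)
  chars = word.chars
  (0...chars.size).flat_map do |start|
    # An empty range (min_len larger than what is left) simply yields nothing.
    (min_len..chars.size - start).map { |len| chars[start, len].join }
  end.uniq
end

puts infixes('y5070', 1).include?('y')   # "y" is indexable with a minimum of 1
puts infixes('y5070', 2).include?('y')   # but not with a minimum of 2
puts infixes('y5070', 2).include?('y5')  # "y5" is still covered
```

With a minimum of 2, "y" alone is never indexed as an infix while "y5" is, which matches the symptom in the question.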

Related

Are there more robust SphinxQL Diagnostics other than Show Meta?

I have a pretty complex sphinx index.
Recently, most of my searches on one important word returned false positives (meaning text records that do not contain the word at all).
To see what was going on, I ran SHOW META to check whether a synonym or some other issue with the term was causing the false results.
However Show Meta showed 1 keyword, the one I entered.
total 100000
total_found 1254244
time 6.856
keyword[0] book
docs[0] 1254244
hits[0] 3037375
Yet the word actually appears in only a small fraction of the 1.25M+ records found.
I'm wondering whether there is some extension of, or alternative to, SphinxQL's 'Show Meta' that gives more information, or where a good place to start looking for the cause of such an issue would be (I'd expect Meta to indicate it, but it does not).
I checked my cfg and the word is nowhere to be found (not mapped or referenced).
I checked the stopwords and exceptions files, with the same result.
The cfg settings are pretty basic:
exceptions = /etc/sphinxsearch/lemmatizer/exceptions.txt
stopwords = /etc/sphinxsearch/lemmatizer/stopwords.txt
stopword_step = 0
index_sp=1
min_word_len = 1
min_infix_len = 1
min_stemming_len = 1
#index_field_lengths = 1
html_strip = 1
enable_star = 1
So I'm not clear where to even start looking for the issue and was hoping there might be some other diagnostic tools more robust than "Show Meta"

Use wordforms or regexp in Sphinx to force a multiword term to be a "word"

Is there a way to "force" Sphinx to index a term such as iphone 5 as a single term? For various reasons I can't search for it as "iphone 5" or iphone NEAR/1 5; I need to search for it as plain iphone 5. Naturally, the way Sphinx works, this means it searches for both iphone and 5 anywhere in the document, when I want it to search for the exact term iphone 5. Can I somehow index iphone 5 as a single term to make this happen?
I still need to be able to apply wordforms/regexp and other mapping to the term e.g.
iphone 5>iphone5
This way, if someone searches for iphone5 it will find iphone 5, and vice versa. The issue is that a search for iphone 5 will find iphone5 but will also match Selling 5 iphone 6Gs, whereas a search for "iphone 5" will not find iphone5. So my goal is to make iphone 5 a term that is treated as a phrase without needing quotes, since forcing an exact phrase search would break any additional wordform/regexp matching.
Do you control the configuration of the index?
If so, you can configure the index to be built with the index_exact_words option.
From the documentation (http://sphinxsearch.com/docs/current.html#conf-index-exact-words) :
12.2.42. index_exact_words
Whether to index the original keywords along with the stemmed/remapped versions. Optional, default is 0 (do not index). Introduced in version 0.9.9-rc1.
When enabled, index_exact_words forces indexer to put the raw keywords in the index along with the stemmed versions. That, in turn, enables exact form operator in the query language to work. This impacts the index size and the indexing time. However, searching performance is not impacted at all.
Example:
index_exact_words = 1
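The iphone 5>iphone5 mapping from the question can also be pictured as a text normalization step. The Ruby sketch below is only an illustration of that idea applied to plain strings before they reach Sphinx; the `normalize_terms` helper and its `TERM_MAP` table are hypothetical, and this is not how Sphinx applies wordforms internally.

```ruby
# Hypothetical mapping table: multiword phrase => single token.
TERM_MAP = {
  'iphone 5' => 'iphone5'
}.freeze

# Collapse each mapped phrase into its single-token form.
# Word boundaries keep "iphone 5" from matching inside "iphone 50".
def normalize_terms(text, map = TERM_MAP)
  map.reduce(text) do |result, (phrase, token)|
    words = phrase.split.map { |w| Regexp.escape(w) }
    pattern = /\b#{words.join('\s+')}\b/i
    result.gsub(pattern, token)
  end
end

puts normalize_terms('Cheap iPhone 5 for sale')
puts normalize_terms('Selling 5 iphone 6Gs')
```

Applying the same normalization to both the indexed text and the incoming query would let iphone 5 and iphone5 meet on one token, while leaving a string like Selling 5 iphone 6Gs untouched.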

Unexpected results when star enabled

I have an index that looks like this:
index user_core
{
source = user_core_0
path = ...
charset_type = utf-8
min_infix_len = 3
enable_star = 1
}
We escape and wrap all of our searches in asterisks. Every so often, we come across a very strange case in which something like the following happens:
Search: mocuddles
Results: All users with nicknames containing "yellowstone".
This behavior seems unpredictable, but it happens every time for the terms it does affect.
I've been told that there's no real way to debug Sphinx indexes. Is this true? Is there any sort of "explain query" functionality?
I've confirmed at this point that these are instances of CRC32 hash collisions. Bummer.
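To see how unrelated strings can end up sharing one hash value, here is a small Ruby sketch that buckets words by checksum using Zlib.crc32 from the standard library. It assumes the index hashes keywords with a standard CRC32, which is an assumption on my part; `crc_collisions` is a hypothetical helper and the word list is made up for illustration.

```ruby
require 'zlib'

# Group words by checksum and keep only the buckets holding more than
# one distinct word, i.e. the collisions. The hasher is injectable so
# the grouping logic can be checked with a toy hash function.
def crc_collisions(words, hasher: Zlib.method(:crc32))
  words.uniq
       .group_by { |w| hasher.call(w) }
       .select { |_sum, bucket| bucket.size > 1 }
end

# "plumless" and "buckeroo" are a commonly cited CRC32 collision pair.
words = %w[plumless buckeroo yellowstone mocuddles]
crc_collisions(words).each do |sum, bucket|
  puts format('%08x: %s', sum, bucket.join(', '))
end
```

Running a check like this over the infixes produced for the affected terms would be one way to confirm which pieces actually collide.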

Sphinx Search term boost

Is there a way I can add a weight to each word in my query?
I need to do something like this (Lucene query):
"word1^50|word2^45|word3^25|word4^20"
All answers I found online are old and I was hoping this changed.
UPDATE:
Sphinx introduced term boosting in version 2.2.3: http://sphinxsearch.com/docs/current/extended-syntax.html
Usage:
select id,weight() from ljplain where match('open source^2') limit 2 option ranker=expr('sum(max_idf)*1000');
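For the Lucene-style query in the question, the per-word weights could be turned into this ^ syntax programmatically. A minimal Ruby sketch, where `boosted_query` is a hypothetical helper and the weights are the ones from the question; it does no escaping of special characters, and how the boosts affect ranking depends on the ranker you choose.

```ruby
# Build an extended-syntax query string such as "word1^50 | word2^45"
# from a hash of keyword => boost.
def boosted_query(boosts)
  boosts.map { |word, boost| "#{word}^#{boost}" }.join(' | ')
end

query = boosted_query({ 'word1' => 50, 'word2' => 45, 'word3' => 25, 'word4' => 20 })
puts query
```

The resulting string would then go inside match('...') in a SphinxQL statement like the one above.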
No, nothing has really changed. The same old workarounds should still work, though.

Mongoid limit parameter ignored

I tried to grab the latest N records with a unique value (first_name).
So far:
@users = User.all(:limit => 5, :sort => [:created_at, :desc]).distinct(:first_name)
This almost works, but it ignores the limit and sort order.
Also:
@users = User.limit(5).desc(:created_at).distinct(:first_name)
This ignores both 'limit' and 'desc'.
@users = User.limit(5)
This works.
What am I doing wrong?
Any help would be greatly appreciated!
I played with this for a little while and this is the best I could come up with.
Good luck.
@users = User.desc(:created_at).reduce([]) do |arr, user|
  # Keep at most 5 users, skipping any first_name already collected.
  unless arr.length == 5 || arr.detect { |u| u.first_name == user.first_name }
    arr << user
  end
  arr
end
Have you tried using a pagination gem such as amatsuda / kaminari and limiting the results using page().per()?
Both distinct and count ignore the limit command in Mongoid. With count you can pass true (i.e. User.limit(5).count(true)) to force it to pay attention to the scope. Unfortunately there is no such trick for distinct as far as I'm aware (see docs/source here).
If you just want to grab the first 5 first_names, you can do this (not distinct):
User.desc(:created_at).limit(5).map(&:first_name)
This respects the limit, but it still loads 5 full objects from the database and then discards everything except the first name. If you actually need to run distinct, you're better off heading toward an aggregation framework solution.
I haven't tested, but this seems to be what you're looking for: https://stackoverflow.com/a/17568267/127311
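Along the lines of the aggregation suggestion, the following Ruby sketch builds a pipeline that sorts by created_at, groups by first_name, and only then applies the limit. The `latest_distinct_pipeline` helper is hypothetical and has not been run against a live database; the field names come from the question, and passing the result to User.collection.aggregate assumes a Mongoid version backed by the official Ruby driver.

```ruby
# Build an aggregation pipeline that returns the newest document per
# distinct value of `field`, capped at `limit` groups.
def latest_distinct_pipeline(field, limit)
  [
    { '$sort'  => { 'created_at' => -1 } },                # newest first
    { '$group' => { '_id' => '$' + field.to_s,             # one group per value
                    'created_at' => { '$first' => '$created_at' } } },
    { '$sort'  => { 'created_at' => -1 } },                # re-sort the groups
    { '$limit' => limit }                                  # limit applied last
  ]
end

pipeline = latest_distinct_pipeline('first_name', 5)
puts pipeline.inspect
# With Mongoid this would be run roughly as:
#   User.collection.aggregate(pipeline).map { |doc| doc['_id'] }
```

Because the limit is the last stage, it applies to the distinct groups rather than to the raw documents, which is the part that distinct on a criteria cannot express.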
I played with this and found the following:
User.limit(2).count => 10 # but the array contains only two results
User.limit(2).to_a.count => 2
So limit may well return the correct result, while the count query ignores it.