custom sorting in sphinx / sort result by match & distance over a particular field - sphinx

I am using sphinx 2.0.
I want to achieve following results :
user will input tags with other search terms, documents associated with user input tags should come on top, sorted by distance.
After that documents does not contain those tags sorted by distance.
What i am doing:
I am searching on different parameters at the same time using like #name , #tag, #streetname etc.so i am using below
$cl->SetMatchMode(SPH_MATCH_EXTENDED);
and sorting the result by distance using $cl->SetSortMode(SPH_SORT_EXTENDED, '#geodist asc');
tag filed can contain multiple values i am using OR operator to get the desired results.
If i search for only #tags then i am able to achieve the requirement i have mentioned. but if user input is #tag food|dinner #city london #name taxi
then result with name: London Taxi, street: London comes on top or some other position breaking the sorting order by lat-long. because London is there in two parameters.i just want to sort by tag, do not want to include the weight of other search terms in sorting order.
Ranking mode is : $cl->setRankingMode(SPH_RANK_PROXIMITY_BM25);
any suggestion to overcome this issue ? or any other way to implement it.
Many Thanks.

I think the way to solve this would be to arrange for matches on the tag field to rank way way higher. Would have to test it but something like this...
$cl->setFieldWeights(array('tags' => 100000));
$cl->setSelect("*,IF(#weight>100000,1,0) AS matchtags");
$cl->SetSortMode(SPH_SORT_EXTENDED, 'matchtags DESC, #geodist ASC');

Related

use populate data to filter in search mongoose

Hello good evening can someone help me with this question please.
I want to bring all the users by the name of the area when it exists. But the name of the area is in a populate.
I have the image but it doesn't work for me
You can't query by populate.
But you can implement that instead:
First find Area by nombre_area
And then find Users with areaI equal to that area id
It's simple and should be performant, as I believe you don't have thousands of areas.
you have two solutions
1- separate the query and write a method & named it aeraExist(name) and filter it by name
2- in the populate add match & if that filter doesn't match it will return null
.populate({
path: "areaI",
match: {name: "Us"},
})

Extensive filtering

Example:
{
shortName: "KITT",
longName: "Knight Industries Two Thousand",
fromZeroToSixty: 2,
year: 1982,
manufacturer: "Pontiac",
/* 25 more fields */
}
Ability to query by at least 20 fields which means that only 10 fields are left unindexed
There's 3 fields (all number) that could be used for sorting (both ways)
This leaves me wondering that how does sites with lots of searchable fields do it: e.g real estate or car sale sites where you can filter by every small detail and can choose between several sort options.
How could I pull this off with MongoDB? How should I index that kind of collection?
Im aware that there are dbs specifically made for searching but there must be general rules of thumb to do this (even if less performant) in every db. Im sure not everybody uses Elasticsearch or similar.
---
Optional reading:
My reasoning is that index could be huge but the index order matters. You'll always make sure that fields that return the least results are first and most generic fields are last in index. However, what if user chooses only generic fields? Should I include non-generic fields to query anyway? How to solve ordering in both ways? Or index intersection saves the day and I should just add 20 different indexes?
text index is your friend.
Read up on it here: https://docs.mongodb.com/v3.2/core/index-text/
In short, it's a way to tell mongodb that you want full text search over a specific field, multiple fields, or all fields (yay!)
To allow text indexing of all fields, use the special symbol $**, and define it of type 'text':
db.collection.createIndex( { "$**": "text" } )
you can also configure it with Case Insensitivity or Diacritic Insensitivity, and more.
To perform text searches using the index, use the $text query helper, see: https://docs.mongodb.com/v3.2/reference/operator/query/text/#op._S_text
Update:
In order to allow user to select specific fields to search on, it's possible to use weights when creating the text-index: https://docs.mongodb.com/v3.2/core/index-text/#specify-weights
If you carefully select your fields' weights, for example using different prime numbers only, and then add the $meta text score to your results you may be able to figure out from the "textScore" which field was matched on this query, and so filter out the results that didn't get a hit from a selected search field.
Read more here: https://docs.mongodb.com/v3.2/tutorial/control-results-of-text-search/

Sphinx Search: excluding index B results from index A results

Here's my issue:
I have 2 indexes:
A - product titles only
B - product titles and product descriptions
By default I search index A to categorize products (e.g. most bikes have "bike" in title).
Sometimes there instances where to determine category (which might be a sub-category of something) we need to look at description, mostly to exclude irrelevant results. In order for pagination on search result page to work, I need to get this clean result as one array after running RunQueries().
But it does not work. It basically adds results of both queries, and looks like there's no way to subtract results. Anyone has any ideas?
Tell me if I'm completely missing something but it sounds to me like your trying to include results with product titles that match a certain query and exclude results with a description that matches another query?
If this is the case it seems to me that having 2 indexes is useless, and you can have one index with both product titles and descriptions and then run a full text search query as such:
#title queryA #description -queryB
You can use the same query to search for matches that have a title of queryA AND a description of queryB by simply removing the - symbol.
If this is off base the only other way I could think of doing it is using SphinxQL (I'm not well versed in any of the libraries since support for all the libraries which don't use SphinxQL is being phased out in the future as far as I've read)
Using SphinxQL you could run 2 queries, one which is like
SELECT id FROM indexB WHERE MATCH('#description queryB')
And then run a second query using a the list of ids you got from the first query as such
SELECT id FROM indexA WHERE id NOT IN(id1,id2,id3,...)

Is there a way to fetch max and min values in Sphinx?

Im using sphinx for document search. Each document has list of integer parameters, like "length", "publication date (unix)", popularity, ... .
The search process itself works fine. But is there a way to get a maximum and minimum fields values for a specified search query?
The main purpose is to generate a search form which will contain filter fields so user can select document`s length.
Or maybe there is another way to solve this problem?
It is possible if length, date etc are defined as attributes.
http://www.sphinxsearch.com/docs/current.html#attributes
Attributes are additional values
associated with each document that can
be used to perform additional
filtering and sorting during search.
Try GroupBy function by 'length' and select mix(length), max(lenght).
In SphinxQl it is like:
select mix(length), max(lenght) from index_123 group by length
The same for other attributes.

Lucene.Net give one field more weigt than another

I know I've seen this and I just can't seem to find it.
I have 2 fields that i am searching on - Name, and Tags. I want results that are based on a match on the "Name" field have a higher score than those based on the "tags" field.
how can I do this?
Thanks
Along with boosting during search, you can also boost fields differently during indexing. This means that a general search for a term that could show up in either field would still give a better score for those that match your preferred field without overtly stating where you're looking for the term.
You can use the boost operator:
title_wa:something^4
if the title matches 'something', then its score will be boosted according to the factor.