Lucene.Net give one field more weight than another - lucene.net

I know I've seen this and I just can't seem to find it.
I have 2 fields that I am searching on: Name and Tags. I want results based on a match on the "Name" field to have a higher score than those based on the "Tags" field.
How can I do this?
Thanks

Along with boosting during search, you can also boost fields differently during indexing. This means that a general search for a term that could show up in either field would still give a better score for those that match your preferred field without overtly stating where you're looking for the term.

You can use the boost operator:
title_wa:something^4
If the title_wa field matches 'something', the score for that match is boosted by the given factor (4 here).
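For the two-field case in the question, the same operator can be applied per field. Below is a minimal sketch, assuming the fields are named Name and Tags as in the question and that the search term is a single word with no special query-syntax characters. buildBoostedQuery is a hypothetical helper, not part of Lucene.Net; the string it returns would still have to be handed to a query parser.

```javascript
// Hypothetical helper (not part of Lucene.Net): builds a Lucene query string
// that searches one term in several fields, each with its own boost.
function buildBoostedQuery(term, fieldBoosts) {
  return Object.entries(fieldBoosts)
    .map(([field, boost]) =>
      // A boost of 1 is the default, so the ^ suffix is omitted for it.
      boost === 1 ? `${field}:${term}` : `${field}:${term}^${boost}`
    )
    .join(" OR ");
}

const query = buildBoostedQuery("something", { Name: 4, Tags: 1 });
console.log(query); // prints: Name:something^4 OR Tags:something
```

The boost values themselves (4 versus 1) are arbitrary here and would need tuning against real results.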

Related

MongoDB get all documents, sort by a field and add ordering field based on the sorting

I am trying to sort the result of my MongoDB query and add a ranking based on that sorting. Currently I only call .find().sort({total: 1}) and this gives me the correct ordering of the documents. But is it possible to "add a field" based on that sorting (basically a ranking field, starting from 1 and counting up)? I tried googling but didn't find anything that suits this purpose.
Thanks in advance.
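One option, if the result set fits in memory, is to add the rank on the client after the sorted documents have been fetched. This is a minimal sketch, assuming docs stands for the array that the sorted query returns; the sample documents and the rank field name are made up for illustration.

```javascript
// Stand-in for the array a sorted query would return (sample data, already
// ordered by total ascending).
const docs = [
  { name: "a", total: 3 },
  { name: "b", total: 7 },
  { name: "c", total: 12 },
];

// Copy each document and add a 1-based rank taken from its position.
function addRank(sortedDocs) {
  return sortedDocs.map((doc, index) => ({ ...doc, rank: index + 1 }));
}

const ranked = addRank(docs);
console.log(ranked);
```

The rank is only as good as the sort order of the input, so this has to run on the already sorted array.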

Is it possible to see which attributes were matched in a Sphinx resultset?

I have an index which has several different attributes.
MySQL [(none)]> select * FROM products_index WHERE MATCH('red shoes');
This returns a bunch of results. Magic. Love Sphinx.
Now, is it possible to see which attribute Sphinx matched on for each of these results?
For example, I have a "colour" field which the "red" would be matching on (potentially), but it could also match on the product name attribute.
I think PACKEDFACTORS() is the only way to do this (other than post-matching, e.g. using Snippets):
http://sphinxsearch.com/docs/current.html#expr-func-packedfactors
It's a little cumbersome to use and adds a bit of overhead to the query, but it should work.

Stemming does not work properly for MongoDB text index

I am trying to use the full text search feature of MongoDB and am observing some unexpected behavior. The problem is related to the "stemming" aspect of the text indexing feature. The way full text search is described in many articles online, if you have a string "big hunting dogs" in a document's field that is part of the text index, you should be able to search on "hunt" or "hunting" as well as on "dog" or "dogs". MongoDB should normalize or stem the text when indexing and also when searching. So in my example, I would expect it to save the words "dog" and "hunt" in the index and search for a stemmed version of these words. If I search for "hunting", MongoDB should search for "hunt".
Well, this is not how it works for me. I am running MongoDB 2.4.8 on Linux with full text search enabled. If my record has the value "big hunting dogs", only searching for "big" produces a result, while searches for "hunt" or "dog" produce nothing. It is as if the words that are not in their "normalized" form are not stored in the text index (or are stored in a way that it cannot find them). Searches using the $regex operator work fine; that is, I am able to find the document by searching on a string like /hunting/ against the field in question.
I tried dropping and recreating the full text index; nothing changed. I can only find the documents containing words in their "normal" form. Searching for words like "dogs" or "hunting" (or even "dog" or "hunt") produces no results.
Do I misunderstand or misuse the full text search operations or is there a bug in MongoDB?
After a fair amount of experimenting and scratching my head I discovered the reason for this behavior. It turned out that the documents in the collection in question had an attribute named 'language'. Apparently the presence and the value of that attribute made these documents non-searchable. (The value happened to be 'ENG'. It is possible that changing it to 'eng' would make this document searchable again. The field, however, served a completely different purpose.) After I renamed the field to 'lang' I was able to find the document containing the word "dogs" by searching for "dog" or "dogs".
I wonder whether this is expected behavior of MongoDB - that the presence of language attribute in the document would affect the text search.
Michael,
The "language" field (if present) allows each document to override the
language in which the stemming of words would be done. I think, as
you specified to MongoDB a language which it didn't recognize ("ENG"),
it was unable to stem the words at all. As others pointed out, you can use the
language_override option to specify that MongoDB should be using some
other field for this purpose (say "lang") and not the default one ("language").
Below is a nice quote (about full text indexing and searching) which
is exactly related to your issue. It is taken from this book.
"MongoDB: The Definitive Guide, 2nd Edition"
Searching in Other Languages
When a document is inserted (or the index is first created), MongoDB looks at the
indexed fields and stems each word, reducing it to an essential unit. However, different
languages stem words in different ways, so you must specify what language the index
or document is. Thus, text-type indexes allow a "default_language" option to be
specified, which defaults to "english" but can be set to a number of other languages
(see the online documentation for an up-to-date list).
For example, to create a French-language index, we could say:
> db.users.ensureIndex({"profil" : "text", "interets" : "text"}, {"default_language" : "french"})
Then French would be used for stemming, unless otherwise specified. You can, on a
per-document basis, specify another stemming language by having a "language" field
that describes the document’s language:
> db.users.insert({"username" : "swedishChef", "profile" : "Bork de bork", language : "swedish"})
What the book does not mention (at least this page of it doesn't) is that
one can use the language_override option to specify that MongoDB
should be using some other field for this purpose (say "lang") and
not the default one ("language").
In http://docs.mongodb.org/manual/tutorial/specify-language-for-text-index/ take a look at the language_override option when setting up the index. It allows you to change the name of the field that should be used to define the language of the text search. That way you can leave the "language" property for your application's use, and call it something else (e.g. searchlang or something like that).
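A sketch of what that could look like. The options object uses the default_language and language_override options mentioned above; the collection name articles, the field names body and searchlang, and the pickStemLanguage function are assumptions for illustration. pickStemLanguage is only a simplified model of the lookup described in this thread, not MongoDB's implementation.

```javascript
// Options that would be passed as the second argument when creating the
// text index, e.g. in the mongo shell (collection and field names assumed):
//   db.articles.ensureIndex({ body: "text" }, indexOptions)
const indexOptions = {
  default_language: "english",
  language_override: "searchlang", // read the per-document language from here
};

// Simplified model of the lookup: use the override field if the document
// has it, otherwise fall back to the index default.
function pickStemLanguage(doc, options) {
  const field = options.language_override || "language";
  return field in doc ? doc[field] : options.default_language;
}

// "language" is now ordinary application data and is ignored for stemming.
const doc = { body: "big hunting dogs", language: "ENG" };
console.log(pickStemLanguage(doc, indexOptions));
```

With the override in place, the application's own language field no longer needs to be renamed.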

Github API field descriptions

I'm toying with the Github search API (v3) and can't seem to find a description of the fields that are returned. Most of them are obvious, but there are a few like score that aren't. Does anyone know what score means, and does a field reference exist?
The score attribute is the search score of that document for a particular query, and is used for Best Match sorting. In other words, it's used for ranking search results, but it isn't shown in search results on github.com.

custom sorting in sphinx / sort result by match & distance over a particular field

I am using Sphinx 2.0.
I want to achieve the following results:
The user will input tags along with other search terms. Documents associated with the input tags should come on top, sorted by distance.
After that come the documents that do not contain those tags, also sorted by distance.
What I am doing:
I am searching on different parameters at the same time, using fields like @name, @tag, @streetname etc., so I am using the following:
$cl->SetMatchMode(SPH_MATCH_EXTENDED);
and sorting the result by distance using $cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist asc');
The tag field can contain multiple values; I am using the OR operator to get the desired results.
If I search for only @tag then I am able to achieve the requirement I have mentioned. But if the user input is @tag food|dinner @city london @name taxi
then a result with name: London Taxi, street: London comes on top or at some other position, breaking the sort order by lat-long, because London is present in two parameters. I just want to sort by tag; I do not want to include the weight of the other search terms in the sort order.
Ranking mode is: $cl->setRankingMode(SPH_RANK_PROXIMITY_BM25);
Any suggestions to overcome this issue, or any other way to implement it?
Many Thanks.
I think the way to solve this would be to arrange for matches on the tag field to rank way way higher. Would have to test it but something like this...
$cl->setFieldWeights(array('tags' => 100000));
$cl->setSelect("*,IF(@weight>100000,1,0) AS matchtags");
$cl->SetSortMode(SPH_SORT_EXTENDED, 'matchtags DESC, @geodist ASC');
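To see what ordering those three calls are aiming for, here is a small sketch that applies the same rule to in-memory rows. The rows, their weight and geodist values, and the function name are made up; real weights depend on the ranker and would need checking against actual Sphinx output.

```javascript
const TAG_THRESHOLD = 100000; // same cut-off as in the IF() expression

// Sample rows (invented values): weight stands in for the match weight,
// geodist for the distance.
const rows = [
  { id: 1, weight: 2500, geodist: 120 },
  { id: 2, weight: 100002500, geodist: 900 },
  { id: 3, weight: 100001500, geodist: 300 },
  { id: 4, weight: 1500, geodist: 50 },
];

// Tag matches first (matchtags descending), then nearest first.
function sortByTagThenDistance(input) {
  return input
    .map((row) => ({ ...row, matchtags: row.weight > TAG_THRESHOLD ? 1 : 0 }))
    .sort((a, b) => b.matchtags - a.matchtags || a.geodist - b.geodist);
}

console.log(sortByTagThenDistance(rows).map((row) => row.id)); // [ 3, 2, 4, 1 ]
```

Rows 3 and 2 are the assumed tag matches and come first, nearest first; rows 4 and 1 follow in distance order even though row 1 has the higher weight.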