Per field ranking in Sphinx


I have a Sphinx index with three indexed fields. I would like documents with matches on field one to rank above those with matches on field two, and those in turn above documents with matches on field three, so that, for example, a document with one match on field one outranks a document with multiple matches on field two. How can I go about doing this?

Use the SetFieldWeights API function
http://sphinxsearch.com/docs/current.html#api-func-setfieldweights
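SetFieldWeights takes a map from field name to integer weight. How those weights combine with the number of matches depends on the ranking mode (see SetRankingMode), so the Python sketch below uses a simplified word-count-style model, not Sphinx's actual ranker, to show why widely spaced weights produce the ordering asked for. The field names and weight values are hypothetical.

```python
# Simplified, word-count-style scoring model: each field contributes
# (number of keyword matches in that field) * (that field's weight).
# This is NOT Sphinx's ranking code; real results depend on the ranker in use.

def score(field_matches, weights):
    """Sum each field's match count times its weight (unlisted fields use weight 1 here)."""
    return sum(count * weights.get(field, 1) for field, count in field_matches.items())

# Hypothetical field names. Each weight is 100x the next one down, so one match
# in a higher field beats up to 99 matches in the field directly below it.
WEIGHTS = {"field_one": 10000, "field_two": 100, "field_three": 1}

docs = {
    "doc_a": {"field_one": 1, "field_two": 0, "field_three": 0},
    "doc_b": {"field_one": 0, "field_two": 5, "field_three": 20},
    "doc_c": {"field_one": 0, "field_two": 0, "field_three": 50},
}

# Highest score first.
ranking = sorted(docs, key=lambda name: score(docs[name], WEIGHTS), reverse=True)
print(ranking)  # ['doc_a', 'doc_b', 'doc_c']
```

In this model the guarantee only holds while the match count in a lower field stays below the ratio between adjacent weights, so the gap should be chosen with the expected match counts in mind.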

Related

Does length of indexed field matter while searching?

My chat app schema is something like this:
1. conversations {participants[user_1, user_2], conversation_id}
2. messages {sender: user_1, conversation_id, timestamps}
I want to map this relationship using the existing _id:ObjectId, which is already indexed.
But to get all conversations of user_1, I first have to find which conversations that user is involved in, get each conversation's _id, and then search the messages collection again using that conversation _id.
So my questions are:
Does the length of an indexed field (here _id) matter when searching?
Should I create another, shorter indexed field?
Also, if there is a better alternative schema, please suggest it.

I would suggest storing the data as sub-documents instead of an array. The advantage is that you can build a separate index on just the conversation_id field, which is the field you query to find the user's involvement.
When you store it as an array, you cannot index the conversation_id field separately; instead you have to build a multikey index, which indexes all the elements of the array (including the sender and timestamps fields, which you are never going to query on), and that also increases the index size.
Answering your questions:
Does the length of an indexed field (here _id) matter when searching? Not really.
Should I create another, shorter indexed field? Create a sub-document and index conversation_id.
Is there a better alternative schema? Maintain the array fields as sub-documents.
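The two-step lookup described in the question can be modeled in memory as below. This is a hypothetical Python sketch, not MongoDB code: the dicts stand in for a multikey index on conversations.participants and an index on messages.conversation_id, and all documents are made up. It shows that each step is a keyed lookup, as it would be against an indexed field, rather than a scan of the collection.

```python
from collections import defaultdict

# Hypothetical sample data mirroring the question's two collections.
conversations = [
    {"_id": "c1", "participants": ["user_1", "user_2"]},
    {"_id": "c2", "participants": ["user_2", "user_3"]},
]
messages = [
    {"_id": "m1", "sender": "user_1", "conversation_id": "c1", "timestamp": 1},
    {"_id": "m2", "sender": "user_2", "conversation_id": "c1", "timestamp": 2},
    {"_id": "m3", "sender": "user_3", "conversation_id": "c2", "timestamp": 3},
]

# Stand-in for a multikey index on participants: one entry per array element.
by_participant = defaultdict(list)
for conv in conversations:
    for user in conv["participants"]:
        by_participant[user].append(conv["_id"])

# Stand-in for an index on messages.conversation_id.
by_conversation = defaultdict(list)
for msg in messages:
    by_conversation[msg["conversation_id"]].append(msg)

def messages_for_user(user):
    """Step 1: find the user's conversations. Step 2: fetch their messages."""
    result = []
    for conv_id in by_participant.get(user, []):
        result.extend(by_conversation.get(conv_id, []))
    return sorted(result, key=lambda m: m["timestamp"])

print([m["_id"] for m in messages_for_user("user_1")])  # ['m1', 'm2']
```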

Difference between wildcard search and individual text search

Is there a difference between a wildcard text index like $** and text indexes that I create for each of the fields in the collection?
I do see a small difference in response time when I create text indexes on individual fields: targeting individual fields gives a faster response. I am not able to post an example now, but will try to.

A wildcard text index indexes every field that contains string data in each document of the collection (https://docs.mongodb.com/manual/core/index-text/#wildcard-text-indexes).
Because a wildcard text index covers more fields, the index is larger, and queries against it can take longer than against a text index that targets specific fields.
Since you can only have one text index per collection (https://docs.mongodb.com/manual/core/index-text/#create-text-index), it's worth deciding beforehand which fields you plan to query against.
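To make the size difference concrete, the Python sketch below builds a toy inverted index over every string field (the $** case) and over two chosen fields. It is only an illustration under simplifying assumptions: the documents and field names are hypothetical, and unlike MongoDB's text index it does no stemming or stop-word removal.

```python
import re

# Hypothetical documents; "sku" is a string field nobody searches on.
docs = [
    {"_id": 1, "title": "Blue widget", "body": "A sturdy widget", "sku": "BW-100"},
    {"_id": 2, "title": "Red gadget", "body": "Gadget for widgets", "sku": "RG-200"},
]

def build_text_index(documents, fields=None):
    """Map lowercase token -> set of _ids. fields=None mimics $** (every string field)."""
    index = {}
    for doc in documents:
        for name, value in doc.items():
            if not isinstance(value, str):
                continue  # only string data is text-indexed
            if fields is not None and name not in fields:
                continue  # targeted index: skip fields that were not listed
            for token in re.findall(r"\w+", value.lower()):
                index.setdefault(token, set()).add(doc["_id"])
    return index

wildcard = build_text_index(docs)
targeted = build_text_index(docs, fields={"title", "body"})
print(len(wildcard), len(targeted))  # 12 8
```

The extra entries ("bw", "100", "rg", "200") come only from the sku field, which is the kind of overhead a targeted index avoids.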

MongoDB multiple type of index on same field

Can I have multiple types of index on the same field? Will it affect performance?
Example:
db.users.createIndex({"username":"text"})
db.users.createIndex({"username":1})

Yes, you can have different types of index on a single field, e.g. text, 2dsphere, or hashed.
You cannot, however, create two indexes with the same key pattern that differ only in options such as sparse or unique.
Every write to the field has to update the corresponding entry in each of those indexes, so each additional index adds write overhead.
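The write cost can be illustrated with a toy model of the two indexes from the question, one keyed on the whole username value and one keyed on its tokens. The Python sketch below is a hypothetical in-memory stand-in, not MongoDB internals; it simply counts how many index entries each insert touches.

```python
import re

class ToyCollection:
    """In-memory stand-in for a collection with a regular and a text index on one field."""

    def __init__(self, field):
        self.field = field
        self.docs = {}
        self.regular_index = {}  # whole value -> ids, like {"username": 1}
        self.text_index = {}     # token -> ids, like {"username": "text"}
        self.index_writes = 0    # number of index entries touched so far

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        value = doc[self.field]
        # Every insert has to update BOTH indexes on the same field.
        self.regular_index.setdefault(value, set()).add(doc_id)
        self.index_writes += 1
        for token in set(re.findall(r"\w+", value.lower())):
            self.text_index.setdefault(token, set()).add(doc_id)
            self.index_writes += 1

users = ToyCollection("username")
users.insert(1, {"username": "alice"})
users.insert(2, {"username": "bob smith"})
print(users.index_writes)  # 5
```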

The two index options are very different.
When you create a regular index on a string field, it indexes the entire value of the string. That is mostly useful for single-word strings (like a username used for logins) that you match exactly.
A text index, on the other hand, tokenizes and stems the content of the field. It breaks the string into individual words or tokens and further reduces them to their stems, so that variants of the same word will match ("talk" matching "talks", "talked" and "talking", for example, as "talk" is the stem of all three). It is mostly useful for real prose (sentences, paragraphs, etc.).
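The contrast can be sketched in Python with a toy exact-value index and a toy token index. The stemmer here is a crude suffix stripper written for this illustration, not the Snowball stemmer MongoDB uses, and the documents are hypothetical; it is only meant to show exact matching versus matching on whole stemmed tokens.

```python
import re

def toy_stem(word):
    """Crude suffix stripper (NOT MongoDB's stemmer): talks/talked/talking -> talk."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokens(text):
    """Split into lowercase words and stem each one."""
    return {toy_stem(t) for t in re.findall(r"\w+", text.lower())}

docs = {1: "We talked for hours", 2: "Blueberry pie", 3: "talk"}

# Regular index: the whole string is the key, so only an exact value matches.
exact_index = {}
for doc_id, text in docs.items():
    exact_index.setdefault(text, set()).add(doc_id)

# Text index: one entry per stemmed token.
text_index = {}
for doc_id, text in docs.items():
    for tok in tokens(text):
        text_index.setdefault(tok, set()).add(doc_id)

def text_search(query):
    """Documents containing any stemmed query token (whole tokens only)."""
    result = set()
    for tok in tokens(query):
        result |= text_index.get(tok, set())
    return result

print(exact_index.get("talk", set()))  # {3}
print(sorted(text_search("talking")))  # [1, 3]
print(sorted(text_search("blue")))     # []
```

The last line also shows that a token index matches whole (stemmed) words only, so "blue" does not find "blueberry".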
Text Search
Text search supports the search of string content in documents of a collection. MongoDB provides the $text operator to perform text search in queries and in aggregation pipelines.
The text search process:
1. tokenizes and stems the search term(s) during both the index creation and the text command execution.
2. assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
$regex searches can be used with regular indexes on string fields to provide some pattern matching and wildcard search. They do not use indexes very efficiently, but they will use an index where they can:
If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.
http://docs.mongodb.org/manual/core/index-text/
http://docs.mongodb.org/manual/reference/operator/query/regex/
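The "prefix expression" optimization quoted above can be sketched with a sorted list standing in for the index keys: a prefix bounds a contiguous range, while other patterns must be tested against every key. This Python sketch uses hypothetical data and is not MongoDB's implementation.

```python
import bisect
import re

# Sorted keys of a regular index on a string field (hypothetical data).
index_keys = sorted(["alan", "albert", "alice", "bob", "carol", "alfred"])

def prefix_range(keys, prefix):
    """Keys starting with `prefix`, found by scanning only [prefix, upper)."""
    lo = bisect.bisect_left(keys, prefix)
    # Upper bound: bump the last character of the (non-empty) prefix,
    # e.g. "al" -> "am"; every key starting with "al" sorts below it.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect.bisect_left(keys, upper)
    return keys[lo:hi]

def regex_scan(keys, pattern):
    """Non-prefix pattern: every key has to be tested."""
    rx = re.compile(pattern)
    return [k for k in keys if rx.search(k)]

print(prefix_range(index_keys, "al"))  # ['alan', 'albert', 'alfred', 'alice']
print(regex_scan(index_keys, "ice"))   # ['alice']
```

Both approaches return the same keys for /^al/; the difference is that the range version only examines keys inside the bounded range.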

Determining which field Mongo found a $text $search match on

I'm new to Mongo (2.6.4) and I understand that you can only have one text index per collection.
I can include several fields in the index and weight them, but is there any way to determine which field(s) it found a match on?

As at MongoDB 3.0, field weights can be used to set the relative significance of fields in a document when calculating a relevance score, but there is no metadata in the return results to indicate which field(s) matched.
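Since the results carry no per-field match information, one possible workaround (an application-side approach, not a MongoDB feature) is to re-check each returned document against the query words. The Python sketch below does this with plain whole-word matching; because MongoDB's text index also stems words and drops stop words, this approximation can disagree with what $text actually matched. The document and field names are hypothetical.

```python
import re

def matched_fields(doc, query, fields):
    """Return which of `fields` contain any query word (case-insensitive, whole words).

    Approximation only: no stemming or stop-word handling, unlike MongoDB's text index.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for field in fields:
        value = doc.get(field)
        if isinstance(value, str) and terms & set(re.findall(r"\w+", value.lower())):
            hits.append(field)
    return hits

doc = {"title": "Coffee brewing guide", "body": "How to grind beans", "tags": "coffee"}
print(matched_fields(doc, "coffee", ["title", "body", "tags"]))  # ['title', 'tags']
```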

Count total docs containing specific field in Lucene index

I am trying to run a query on a Lucene.NET 2.9.2 index without any luck:
My index holds documents; some of them contain a numeric field called "MyNum" and some do not.
The field is indexed.
I am trying to count the total number of documents that contain the field, regardless of the field's value.
Could someone please help me out?

A query like fieldX:* should return all of the documents that contain the field "fieldX".
You may need to enable leading wildcards in your query parser, since the * is the first character of the term (I don't have a copy of Lucene up at the moment).

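You can use a WildcardQuery to find all documents that have a specific field: just pass * as the term value (a regular wildcard). TotalHits reports the total number of matches, so you only need to request a single hit to get the count. Here is sample code:
using System;
using Lucene.Net.Index;   // Term
using Lucene.Net.Search;  // IndexSearcher, TopDocs, WildcardQuery
IndexSearcher searcher = new IndexSearcher(reader); // reader: an already-open IndexReader
TopDocs docs = searcher.Search(new WildcardQuery(new Term("MyNum", "*")), 1); // n = 1: only the count is needed
Console.WriteLine(docs.TotalHits);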
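Conceptually, a match-all wildcard on one field enumerates every term indexed for that field and unions the documents containing those terms, which is why it counts each document that has the field regardless of its value. The Python sketch below models this with a hypothetical in-memory inverted index (field to term to set of doc ids); it is not Lucene's implementation, but it also hints at the cost, since the work grows with the number of distinct terms in the field.

```python
# Hypothetical inverted index: field -> term -> set of doc ids (postings).
postings_index = {
    "MyNum": {"7": {0, 2}, "42": {3}, "100": {2}},
    "Title": {"hello": {0, 1, 3}},
}

def count_docs_with_field(inverted, field):
    """Union the postings of every term in `field`, as a match-all '*' would."""
    docs = set()
    for postings in inverted.get(field, {}).values():
        docs |= postings  # a document is counted once even if it has several terms
    return len(docs)

print(count_docs_with_field(postings_index, "MyNum"))  # 3 (doc 1 has no MyNum)
```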
