We are trying to build a site-wide search using Sphinx. This means our search must look at all the main indexes and their fields and return the results ordered by relevance.
This is the query:
SELECT *
FROM articles, users, genres
WHERE match ('@(articles.title, genres.title, articles.description, users.nickname) test_sting')
But this does not seem to work. Is there any way to search across multiple indexes and specify the fields that we want to search?
SELECT * FROM articles, users, genres WHERE match ('test_sting')
should just match all fields in all indexes, no need to specify the specific fields.
Otherwise you can use the barely documented @@relaxed operator:
SELECT * FROM articles, users, genres
WHERE match ('@@relaxed @(title, description, nickname) test_sting')
which should work. It will only search the named fields, but @@relaxed means it doesn't matter if a particular field doesn't exist in a particular index.
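If the list of indexes and fields varies per request, the statement can be assembled in code. This is only a minimal sketch: buildRelaxedQuery is a hypothetical helper, not part of any Sphinx client, and it only escapes backslashes and single quotes for the SQL string literal. It assumes the search term contains no other full-text operators.

```javascript
// Hypothetical helper: builds a SphinxQL statement that searches the named
// fields across several indexes. It does not talk to searchd.
function buildRelaxedQuery(indexes, fields, term) {
  // Escape backslashes and single quotes so the term is safe inside '...'.
  const safeTerm = String(term).replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  // @@relaxed goes first so that fields missing from an index are ignored.
  const match = `@@relaxed @(${fields.join(",")}) ${safeTerm}`;
  return `SELECT * FROM ${indexes.join(", ")} WHERE MATCH('${match}')`;
}

console.log(
  buildRelaxedQuery(
    ["articles", "users", "genres"],
    ["title", "description", "nickname"],
    "test_sting"
  )
);
```

The resulting string can be sent over any MySQL-protocol connection to searchd.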
Assume that all my documents are in the same collection, and none of them are nested. I am using Node.js.
Here's some dummy data.
firstName: "Pippa",
lastName: "Smith",
preferredSalutation: "Mrs",
mobileNumber: "123123",
emailAddress: "pippasmith@gmail.com"
I am trying to accomplish three things:
1. Find the correct Mongo query to search for specific fields in a collection of documents that will be displayed in a table. The table has drop-down select boxes beside it that the user can use to filter the results displayed in the table. For now, assume the filters are firstName, lastName and preferredSalutation. The query needs to either return the appropriate match, or nothing. Currently the type of firstName is a String, as it is for all other fields.
2. If the user selects more than one filter, for example firstName and mobileNumber, then the table should display only the documents that match these two filters. Everything else should be excluded until...
3. The user selects the Show all option in the filter dropdown, which removes the given filter for that field and updates the results in the table.
I have tried to use .find() with little success because I struggle to sort the results in the table so that only documents that match all the filter selections are displayed. I tried .aggregate(), but it seems like it may be overkill given that I don't wish to join collections.
Would appreciate some thoughts on the best way to do this.
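The three behaviours described above can be sketched with a plain .find() by building the filter document from the current dropdown state. This is a minimal sketch under assumptions: buildFilter and the "Show all" sentinel value are hypothetical names, and every filter is an exact string match.

```javascript
// Sentinel the dropdown sends when a filter is switched off (an assumption).
const SHOW_ALL = "Show all";

// Turn the current dropdown selections into a MongoDB filter document.
// Fields set to SHOW_ALL (or left empty) are skipped, so they do not restrict
// the result; the remaining fields are combined with an implicit AND.
function buildFilter(selections) {
  const filter = {};
  for (const [field, value] of Object.entries(selections)) {
    if (value === SHOW_ALL || value === undefined || value === null || value === "") {
      continue;
    }
    filter[field] = value;
  }
  return filter;
}

console.log(
  buildFilter({ firstName: "Pippa", lastName: SHOW_ALL, preferredSalutation: "Mrs" })
);
```

With the Node.js driver the result would be passed as collection.find(filter).toArray(). An empty filter object matches every document, which covers the case where all dropdowns are set to Show all.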
Example:
{
shortName: "KITT",
longName: "Knight Industries Two Thousand",
fromZeroToSixty: 2,
year: 1982,
manufacturer: "Pontiac",
/* 25 more fields */
}
Ability to query by at least 20 fields, which means that only 10 fields are left unindexed
There are 3 fields (all numeric) that could be used for sorting (in both directions)
This leaves me wondering how sites with lots of searchable fields do it, e.g. real estate or car sale sites where you can filter by every small detail and choose between several sort options.
How could I pull this off with MongoDB? How should I index that kind of collection?
I'm aware that there are databases made specifically for searching, but there must be general rules of thumb for doing this (even if less performant) in every database. I'm sure not everybody uses Elasticsearch or similar.
---
Optional reading:
My reasoning is that the index could be huge, but the order of the fields in the index matters. You would always make sure that the fields that return the fewest results come first and the most generic fields come last in the index. However, what if the user chooses only generic fields? Should I include the non-generic fields in the query anyway? How do I handle sorting in both directions? Or does index intersection save the day, so that I should just add 20 separate indexes?
A text index is your friend.
Read up on it here: https://docs.mongodb.com/v3.2/core/index-text/
In short, it's a way to tell mongodb that you want full text search over a specific field, multiple fields, or all fields (yay!)
To allow text indexing of all fields, use the special symbol $**, and define it of type 'text':
db.collection.createIndex( { "$**": "text" } )
You can also configure it for case insensitivity, diacritic insensitivity, and more.
To perform text searches using the index, use the $text query operator; see: https://docs.mongodb.com/v3.2/reference/operator/query/text/#op._S_text
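As a rough illustration of a $text query from Node.js, the pieces of the query can be built as plain objects. This is a sketch, not a definitive implementation: textSearchSpec is a hypothetical helper and the output field name score is an arbitrary choice.

```javascript
// Hypothetical helper: returns the filter, projection and sort documents for
// a full-text search ordered by relevance. It does not touch the database.
function textSearchSpec(term) {
  return {
    // $text uses the collection's text index; $search holds the search terms.
    filter: { $text: { $search: term } },
    // Expose the relevance score under the (arbitrary) field name "score".
    projection: { score: { $meta: "textScore" } },
    // Sort by that same relevance score, best match first.
    sort: { score: { $meta: "textScore" } },
  };
}

console.log(JSON.stringify(textSearchSpec("knight industries"), null, 2));
```

With the Node.js driver this would be used roughly as collection.find(spec.filter).project(spec.projection).sort(spec.sort).toArray().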
Update:
In order to allow user to select specific fields to search on, it's possible to use weights when creating the text-index: https://docs.mongodb.com/v3.2/core/index-text/#specify-weights
If you carefully select your fields' weights, for example using different prime numbers only, and then add the $meta text score to your results you may be able to figure out from the "textScore" which field was matched on this query, and so filter out the results that didn't get a hit from a selected search field.
Read more here: https://docs.mongodb.com/v3.2/tutorial/control-results-of-text-search/
I have an index which has several different attributes.
MySQL [(none)]> select * FROM products_index WHERE MATCH('red shoes');
This returns a bunch of results. Magic. Love Sphinx.
Now, is it possible to see which attribute Sphinx matched on for each of these results?
For example, I have a "colour" field which the "red" would be matching on (potentially), but it could also match on the product name attribute.
I think PACKEDFACTORS() is the only way to do this (other than post-matching, e.g. using snippets):
http://sphinxsearch.com/docs/current.html#expr-func-packedfactors
It's a little cumbersome to use and adds a bit of overhead to the query, but it should work.
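To give an idea of what using it involves: the factors are requested with something like SELECT id, PACKEDFACTORS() FROM products_index WHERE MATCH('red shoes') OPTION ranker=expr('1'), and the text that comes back includes a field_mask value. The sketch below decodes that mask on the client side. It assumes that bit i of field_mask corresponds to the i-th full-text field in the order the index declares them. The field names name, colour and description, and the matchedFields helper, are hypothetical.

```javascript
// Hypothetical helper: given the text returned by PACKEDFACTORS() and the
// index's full-text field names in declaration order, list the matched fields.
function matchedFields(packedFactors, fieldNames) {
  const found = /field_mask=(\d+)/.exec(packedFactors);
  if (!found) return [];
  const mask = Number(found[1]);
  // Keep field i when bit i of the mask is set.
  return fieldNames.filter((_, i) => Math.floor(mask / 2 ** i) % 2 === 1);
}

// Sample factor string shortened to the parts this sketch reads.
const sampleFactors = "bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2";
console.log(matchedFields(sampleFactors, ["name", "colour", "description"]));
```

A mask of 2 has only bit 1 set, so under the stated assumption the match came from the second field.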
In my app, users can favorite documents. Sphinx is used to allow them to search for matching documents. If a user wants to search their favorites, I first go directly to the database (MySQL) to fetch a list of document IDs and use that to filter the search in Sphinx. The pseudocode looks something like this:
function searchFavoritesForUser($userId, $query) {
    $favoriteIds = getFavoriteIdsForUser($userId);
    $sphinx = new Sphinx(...);
    $sphinx->setFilter('document_id', $favoriteIds);
    return $sphinx->search($query);
}
This works fine if the user has a reasonable number of favorites. If the user has a large number of favorites, then loading the favorites can use a potentially large amount of memory, and setting the filter in Sphinx can run up against various limits in searchd.
I realize that I can adjust those config values, but it seems like there must be a better way to design this. Ideally, I would be able to eliminate the step where I have to load all of the favorite document IDs from the database into main memory.
When you build the Sphinx index, you can define an MVA (multi-value attribute) for favorites, populated from the (doc_id, user_id) pairs, and then filter on it directly in Sphinx, with no need to query MySQL first.
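A rough sketch of what that could look like, with every name made up for illustration: the attribute favorited_by, the favorites table and its columns, and the documents index are all assumptions. It also assumes that an equality comparison on an MVA matches when any of its values equals the given user id.

```javascript
// Assumed source configuration (sphinx.conf), shown here only as a comment:
//   sql_attr_multi = uint favorited_by from query; \
//       SELECT document_id, user_id FROM favorites
//
// Hypothetical helper: builds a SphinxQL statement that searches only the
// documents favorited by one user, without loading the ids from MySQL.
function buildFavoritesQuery(index, userId, query) {
  if (!Number.isInteger(userId) || userId < 0) {
    throw new TypeError("userId must be a non-negative integer");
  }
  // Escape backslashes and single quotes for the SQL string literal.
  const safeQuery = String(query).replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  return `SELECT id FROM ${index} WHERE MATCH('${safeQuery}') AND favorited_by = ${userId}`;
}

console.log(buildFavoritesQuery("documents", 42, "red shoes"));
```

The trade-off is that the attribute is only as fresh as the index, so new favorites become searchable after the next reindex or attribute update.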
Here's my issue:
I have 2 indexes:
A - product titles only
B - product titles and product descriptions
By default I search index A to categorize products (e.g. most bikes have "bike" in title).
Sometimes there are instances where, to determine the category (which might be a sub-category of something), we need to look at the description, mostly to exclude irrelevant results. In order for pagination on the search result page to work, I need to get this clean result as one array after running RunQueries().
But it does not work. It basically adds the results of both queries, and it looks like there's no way to subtract results. Does anyone have any ideas?
Tell me if I'm completely missing something, but it sounds to me like you're trying to include results with product titles that match a certain query and exclude results with a description that matches another query?
If this is the case it seems to me that having 2 indexes is useless, and you can have one index with both product titles and descriptions and then run a full text search query as such:
@title queryA @description -queryB
You can use the same query to search for matches that have a title of queryA AND a description of queryB by simply removing the - symbol.
If this is off base, the only other way I can think of is to use SphinxQL (I'm not well versed in the client libraries, since, as far as I've read, support for the libraries that don't use SphinxQL is being phased out).
Using SphinxQL you could run 2 queries, one which is like
SELECT id FROM indexB WHERE MATCH('@description queryB')
And then run a second query using the list of ids you got from the first query, as such
SELECT id FROM indexA WHERE id NOT IN(id1,id2,id3,...)
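The second statement could be built from the first result set roughly like this. It is only a sketch: buildExclusionQuery is a hypothetical helper, and adding a MATCH() on the title query to the second statement is my assumption about the intent, since the statement above shows only the id filter. An empty id list is handled separately because NOT IN () with nothing inside is not valid.

```javascript
// Hypothetical helper: builds the second SphinxQL statement, excluding the
// ids returned by the description query on indexB.
function buildExclusionQuery(index, titleQuery, excludedIds) {
  // Only plain non-negative integers are allowed into the id list.
  for (const id of excludedIds) {
    if (!Number.isInteger(id) || id < 0) {
      throw new TypeError("ids must be non-negative integers");
    }
  }
  // Escape backslashes and single quotes for the SQL string literal.
  const safeQuery = String(titleQuery).replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  let sql = `SELECT id FROM ${index} WHERE MATCH('${safeQuery}')`;
  // Skip the NOT IN clause entirely when there is nothing to exclude.
  if (excludedIds.length > 0) {
    sql += ` AND id NOT IN (${excludedIds.join(",")})`;
  }
  return sql;
}

console.log(buildExclusionQuery("indexA", "bike", [3, 7]));
```

Because the exclusion happens inside a single statement, the result is one clean list that can be paginated directly.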