Algolia search with a certain category first - algolia

I'm trying to search Algolia and have results for a certain category show up before all other categories. Here's an example:
Data in Algolia
{ name: Harry Potter, category: book},
{ name: The Avengers, category: movie},
{ name: Pottery, category: movie}
Problem
Let's say the normal Algolia algorithm has Harry Potter way more relevant than the movie Pottery, so normally if you searched pot then Harry Potter would show up ahead of Pottery.
I want to pass Algolia the search term pot and the category movie and then have Pottery show up ahead of everything else. It needs to be dynamic, i.e. I should be able to search pot with category book and get Harry Potter first.
Is there a way to do this with Algolia?

There is actually a nice way to implement that behavior using "optional" facet filters (a soon to be released advanced feature - as of 2016/12/01).
An "Optional Facet Filter" is a facet filter that doesn't need to match to retrieve a result but that will - by default - make sure the hits that have the facet value are retrieved first (thanks to the filters criterion of Algolia's tie-breaking ranking formula).
This is exactly what you want: on every page where you want results sharing a category value to be retrieved first, just query the Algolia index with the category:value optional facet filter.
Make sure your category attribute is part of your attributesForFaceting index setting.
At query time, query the index with index.search('', { optionalFacetFilters: ["category:book"] })
You can read more on this (beta) documentation page.
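To make the intended ordering concrete, here is a small local simulation in Python. It does not call Algolia: the substring match and the `search` helper are assumptions standing in for the real engine, and the filter match is treated as the top sort key, which is a simplification of where the filters criterion sits in the real ranking formula.

```python
# Local simulation only: no Algolia client is used here.
records = [
    {"name": "Harry Potter", "category": "book"},
    {"name": "The Avengers", "category": "movie"},
    {"name": "Pottery", "category": "movie"},
]


def search(records, term, optional_category):
    """Return matching names, with hits in `optional_category` listed first."""
    term = term.lower()
    # Toy substring match, standing in for Algolia's textual relevance.
    hits = [r for r in records if term in r["name"].lower()]
    # sorted() is stable: within each group the original order is preserved.
    # False sorts before True, so hits in the optional category come first.
    hits = sorted(hits, key=lambda r: r["category"] != optional_category)
    return [r["name"] for r in hits]


movie_first = search(records, "pot", "movie")
book_first = search(records, "pot", "book")
print(movie_first)
print(book_first)
```

Hits outside the requested category are still returned, only later, which is the difference between an optional facet filter and a regular one.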

Related

MongoDB Querying Large Datasets

Let's say I have a simple document structure like:
{
"item": {
"name": "Skittles",
"category": "Candies & Snacks"
}
}
On my search page, whenever a user searches by product name, I want to offer filter options by category.
Since categories can be many (like 50 types) I cannot display all of the checkboxes on the sidebar beside the search results. I want to only show those which have products associated with it in the results. So if none of the products in search result have a category, then do not show that category option.
Now, the item search by name itself is paginated. I only show 30 items in a page. And we have tens of thousands of items in our database.
I can search and retrieve all items from all pages, then parse the categories. But if I retrieve tens of thousands of items in one page, it would be really slow.
Is there a way to optimize this query?
You can use different approaches based on your workflow and see what works best in your situation. Some good candidates are:
Use distinct prior to running the query on the large dataset
Use the Aggregation Pipeline as @Lucia suggested
[{$group: { _id: "$item.category" }}]
Use another datastore (either Redis or Mongo itself) to store precomputed information about categories
Finally, based on the approach you choose and the volume of filter requests, you may want to consider indexing some fields.
P.S. You're right about how aggregation works: unless you have a $match filter as the first stage, it will fetch all the documents and then apply the next stage.
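As a rough sketch of the aggregation approach with a $match stage first, the Python helper below builds the pipeline as plain data. The name `build_category_pipeline` and the choice of a case-insensitive regex on `item.name` are assumptions, since the question does not say how the name search itself is implemented.

```python
import re


def build_category_pipeline(search_term):
    """Build a pipeline listing the categories present among matching items."""
    return [
        # $match comes first so only items matching the search reach $group.
        # Assumption: the name search is a case-insensitive regex match.
        {"$match": {"item.name": {"$regex": re.escape(search_term),
                                  "$options": "i"}}},
        # One output document per distinct category, with an item count.
        {"$group": {"_id": "$item.category", "count": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]


pipeline = build_category_pipeline("skittles")
print(pipeline)
```

With pymongo this list could be passed to `collection.aggregate(pipeline)`. The result has at most one document per category, so it stays small regardless of how many items match, and it is independent of the 30-item pagination.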

MongoDB - Tag based search with autocomplete

I am looking to implement a tag search feature and was looking for some advice in terms of efficiency. I am new to MongoDB so I am unsure of best practices for performance.
Okay, so I want to create a link sharing app in which users tag links based on their content. For instance a funny dog image would be tagged with "funny" and "dog". A link would have a:
title,
url,
user_id,
tags: array of tags
Now in order for me to allow users to search for links I need a list of all the tags used. For usability this needs to have auto-complete functionality. So I researched a bit and tested out using a collection of tags where I index the tag value e.g. "funny" and then use a regex.
db.tags.find({value:/^search/})
With a collection of 600,000 documents it searched for all documents beginning with "s" in 63 milliseconds. As the length of the search term increases the execution time decreases.
Now comes the part I'm unsure of. Say for instance I want to find all the links which have the tags "funny" and "dog" (need to use intersects). How should I store the tags? Should I store the object id of each tag? Can I index these object ids? Is there another way to structure the whole database?
Also, I'd like to be able to suggest tags based on the tags they have already entered. I was thinking of just having a related field in the tag document, for instance:
tag
----
id
value
related: [{
tag_id
count
}]
(again unsure as it would suggest tags that could be related to one of the already entered tags and not to another. With an intersect this would return no results.)
Any advice would be much appreciated.
Edit: mistake
Create a text index on the tag array. This will enable you to search quickly for funny, dog, and funny or dog.
https://docs.mongodb.com/manual/core/index-text/
db.tags.createIndex( { tags: "text" }, {background:true} )
As to the related tags, I don't think that you want to reference the _id values. You can probably embed an array of related tags such as:
relatedTags: [{tag1}, {tag2}]
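For the intersect case in the question, one alternative to the text index suggested above is MongoDB's $all operator on the tags array. The sketch below only builds the filter documents in Python; the helper names are hypothetical, and the field names `value` and `tags` are taken from the question.

```python
import re


def autocomplete_filter(prefix):
    """Filter for tag documents whose `value` starts with `prefix`."""
    # An anchored, case-sensitive prefix regex can use an index on `value`.
    return {"value": {"$regex": "^" + re.escape(prefix)}}


def links_with_all_tags(tags):
    """Filter for links whose `tags` array contains every given tag."""
    # $all requires all listed values to be present (an intersection),
    # not just any one of them.
    return {"tags": {"$all": list(tags)}}


print(autocomplete_filter("fun"))
print(links_with_all_tags(["funny", "dog"]))
```

Either dictionary can be passed to a driver's `find` call. Storing tag strings directly in the link's `tags` array, rather than object ids, keeps the second query to a single lookup.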

Github API field descriptions

I'm toying with the Github search API (v3) and can't seem to find a description of the fields that are returned. Most of them are obvious, but there are a few like score that aren't. Does anyone know what score means, and does a field reference exist?
The score attribute is the search score of that document for a particular query, and is used for Best Match sorting. In other words, it's used for ranking search results, but it isn't shown in search results on github.com.

custom sorting in sphinx / sort result by match & distance over a particular field

I am using Sphinx 2.0.
I want to achieve the following results:
The user will input tags along with other search terms. Documents associated with the input tags should come on top, sorted by distance.
After that come the documents that do not contain those tags, also sorted by distance.
What I am doing:
I am searching on different parameters at the same time, like @name, @tag, @streetname etc., so I am using the below
$cl->SetMatchMode(SPH_MATCH_EXTENDED);
and sorting the result by distance using $cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist asc');
The tag field can contain multiple values, so I am using the OR operator to get the desired results.
If I search for only @tags then I am able to achieve the requirement I have mentioned, but if the user input is @tag food|dinner @city london @name taxi
then the result with name: London Taxi, street: London comes on top or in some other position, breaking the sort order by lat-long, because London appears in two parameters. I just want to sort by tag and do not want the weight of the other search terms to affect the sort order.
Ranking mode is : $cl->setRankingMode(SPH_RANK_PROXIMITY_BM25);
Any suggestions to overcome this issue, or any other way to implement it?
Many Thanks.
I think the way to solve this would be to arrange for matches on the tag field to rank way way higher. Would have to test it but something like this...
$cl->setFieldWeights(array('tags' => 100000));
$cl->setSelect("*,IF(@weight>100000,1,0) AS matchtags");
$cl->SetSortMode(SPH_SORT_EXTENDED, 'matchtags DESC, @geodist ASC');
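The ordering this is meant to produce can be illustrated offline in Python. The documents, weights, and distances below are made-up sample values, and `sort_like_sphinx` is a hypothetical helper that mirrors 'matchtags DESC, @geodist ASC'; it is not Sphinx's own ranking.

```python
# Made-up sample data: a weight above the threshold stands for a tag match,
# because the tag field weight (100000) dominates every other field.
docs = [
    {"name": "London Taxi", "weight": 350, "geodist": 120.0},
    {"name": "Dinner Club", "weight": 100040, "geodist": 900.0},
    {"name": "Food Truck", "weight": 100010, "geodist": 300.0},
    {"name": "City Cabs", "weight": 20, "geodist": 50.0},
]

THRESHOLD = 100000


def sort_like_sphinx(docs):
    """Order tag matches first, then by ascending distance within each group."""
    return sorted(
        docs,
        # 0 for a tag match, 1 otherwise, so tag matches sort first;
        # distance breaks ties inside each group.
        key=lambda d: (0 if d["weight"] > THRESHOLD else 1, d["geodist"]),
    )


ordered = [d["name"] for d in sort_like_sphinx(docs)]
print(ordered)
```

The exact weight no longer matters once it is reduced to the 0/1 flag, so a document matching "london" in two fields cannot jump ahead of a closer one in the same group.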

Sphinx Search: excluding index B results from index A results

Here's my issue:
I have 2 indexes:
A - product titles only
B - product titles and product descriptions
By default I search index A to categorize products (e.g. most bikes have "bike" in title).
Sometimes there are instances where, to determine the category (which might be a sub-category of something), we need to look at the description, mostly to exclude irrelevant results. In order for pagination on the search result page to work, I need to get this clean result as one array after running RunQueries().
But it does not work. It basically adds results of both queries, and looks like there's no way to subtract results. Anyone has any ideas?
Tell me if I'm completely missing something, but it sounds to me like you're trying to include results with product titles that match a certain query and exclude results with a description that matches another query?
If this is the case it seems to me that having 2 indexes is useless, and you can have one index with both product titles and descriptions and then run a full text search query as such:
@title queryA @description -queryB
You can use the same query to search for matches that have a title of queryA AND a description of queryB by simply removing the - symbol.
If this is off base the only other way I could think of doing it is using SphinxQL (I'm not well versed in any of the libraries since support for all the libraries which don't use SphinxQL is being phased out in the future as far as I've read)
Using SphinxQL you could run 2 queries, one which is like
SELECT id FROM indexB WHERE MATCH('@description queryB')
And then run a second query using the list of ids you got from the first query, as such
SELECT id FROM indexA WHERE id NOT IN(id1,id2,id3,...)
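The second query has to be assembled from the ids returned by the first. A minimal Python sketch of that step follows; `build_exclusion_query` is a hypothetical helper, and it only produces the query string rather than talking to Sphinx.

```python
def build_exclusion_query(index_name, excluded_ids):
    """Build the second SphinxQL query, excluding ids found by the first."""
    # Coerce to int so only numeric ids can end up in the query text.
    ids = [int(i) for i in excluded_ids]
    if not ids:
        # Nothing to exclude: an empty NOT IN () list would be invalid.
        return f"SELECT id FROM {index_name}"
    id_list = ",".join(str(i) for i in ids)
    return f"SELECT id FROM {index_name} WHERE id NOT IN ({id_list})"


query = build_exclusion_query("indexA", [3, 7, 11])
print(query)
```

Because the exclusion happens inside a single query against indexA, the result comes back as one list, which is what the pagination in the question needs.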