How do I make Algolia Search guarantee that it will return a number of results - algolia

I need Algolia to always return 5 results from a full-text search, even if the query text itself bears little or no relevance to the returned results. Before someone suggests it, I have already tried setting the removeWordsIfNoResults option to all of its possible modes, and this still doesn't guarantee that I get my 5 results.
The purpose of this is to create a 'relevant entities' sidebar where the name of the current entity is used to search for other entities.
Any suggestions?

Using the removeWordsIfNoResults=allOptional query parameter is indeed a good way to go: because all query words are required to match an object by default, falling back to "optional" is a good way to still retrieve results if one of the query words (or the combination of words) doesn't match anything.
index.search(query, { removeWordsIfNoResults: 'allOptional' });
Another solution is to always consider all query words as optional (not only as a fallback), making sure the query foo bar baz is interpreted as OPT(foo) AND OPT(bar) AND OPT(baz) <=> foo OR bar OR baz. The difference is that this query will retrieve more results than the previous one, because a single matching word will be enough to retrieve the object.
index.search(query, { optionalWords: query });
That being said, there is no way to force the engine to retrieve "at least" 5 results. What I would recommend is a small piece of frontend logic (see the sketch after this list):
- do the query with removeWordsIfNoResults or optionalWords
- if the engine returns fewer than 5 results, do another, broader query
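A minimal sketch of that logic with the JavaScript API client; the searchAtLeast name, the empty fallback query, and the de-duplication by objectID are assumptions, not part of the original answer:
async function searchAtLeast(index, query, minHits) {
  // First attempt: let the engine relax the query if nothing matches.
  const first = await index.search(query, { removeWordsIfNoResults: 'allOptional', hitsPerPage: minHits });
  const hits = first.hits;
  if (hits.length >= minHits) return hits;
  // Fallback (assumption): an empty query matches everything, so it can pad the list.
  const fallback = await index.search('', { hitsPerPage: minHits });
  const seen = new Set(hits.map(h => h.objectID));
  for (const hit of fallback.hits) {
    if (hits.length >= minHits) break;
    if (!seen.has(hit.objectID)) hits.push(hit);
  }
  return hits;
}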

Related

Algolia optionalFilters acts like filters on virtual index

I have my index and a virtual index (replica). On my index, a query like:
{
"facetFilters": [["objectID:12345", "tag:Luxury","tag:Makeup"]], // 12345 or luxury or makeup
"optionalFilters": "objectID:12345" // put it as first
}
will return all documents that have the given object id or tag Luxury or tag Makeup, and puts the object with id "12345" first. It behaves as expected.
But when I run the same query on my virtual index, it only returns the document with the given id "12345". So it behaves like a filter, whereas the docs say:
https://www.algolia.com/doc/guides/managing-results/rules/merchandising-and-promoting/in-depth/optional-filters/
Unlike filters, optional filters don’t remove records from your search results when your query doesn’t match them. Instead, they divide your records into two sets: the results that match the optional filter, and the ones that don’t.
Weird. I just set this up and am seeing the same results. I don't see anything in the docs that would explain why the behavior would be different, so I'm reaching out to some engineering colleagues to see what's going on.
UPDATE:
Algolia virtual replicas and optionalFilters both do out-of-band sorting of results at query time. It looks like those two features cause strangeness when they both try to do their sort. I've cut a ticket on this, but for now, to get the expected results, you'll want to use a standard replica with optionalFilters: the standard replica does index-time sorting, and the optionalFilters can then layer their query-time sorting on top of it.
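A minimal sketch of that workaround with the JavaScript client; the client setup and the replica name products_standard_replica are hypothetical:
const replica = client.initIndex('products_standard_replica'); // standard (not virtual) replica
const res = await replica.search('', {
  facetFilters: [['objectID:12345', 'tag:Luxury', 'tag:Makeup']], // 12345 or Luxury or Makeup
  optionalFilters: 'objectID:12345', // promote 12345 to the top
});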

Autocomplete and text search memory issues in apostrophe-cms: need ideas

I'm having trouble using the text search and the autocomplete because I have a piece with 87k+ documents, some of them big (~3.4 MB of text).
I already:
Removed every field from the text index except title, searchBoost and seoDescription; these are the only fields copied to highSearchText, and the field lowSearchText is always set to an empty string.
Modified the standard text index, adding the fields type, published and trash at the beginning of it. I also modified the queries to have equality conditions on these fields. The result returned by the command db.aposDocs.stats() shows:
type_1_published_1_trash_1_highSearchText_text_lowSearchText_text_title_text_searchBoost_text: 12201984 (~11 MB, fits nicely in memory)
Verified that this index is being used, both in the 'toDistinct' query as well as in the final 'toArray' query.
What I think is the biggest problem
The documents have many repeated words in the title, so if the user types a word present in 5k document titles, the server suffers.
Idea I'm testing
The MongoDB docs say that to improve performance the entire collection must fit in RAM (https://docs.mongodb.com/manual/core/index-text/#storage-requirements-and-performance-costs, last bullet).
So, I created a separate collection named “search” with just the fields highSearchText (string, indexed as text) and highSearchWords (array, also indexed), which results in a total size of ~19 MB.
By doing the same operations as the standard apostrophe autocomplete on this collection, I achieved similar results, much faster.
I had to write events to automatically update the search collection when a piece changes, but it seems to work so far.
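For reference, a minimal sketch (mongo shell) of the indexes on that separate collection, assuming the field names described above:
db.search.createIndex({ highSearchText: "text" }); // text index for the full text search
db.search.createIndex({ highSearchWords: 1 });     // multikey index for the autocomplete words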
Issues
I'm testing this search collection with the autocomplete. For the simple text search, I'm just limiting the sorted response to 50 results. Maybe I'll have to use the search collection there as well, because the search could still break.
Is there some easier approach I'm missing? Please, any ideas are welcome.

Extremely Slow MongoDb C# Driver 2.0 RegEx Query

I have the following query - it takes about 20-40 seconds to complete (similar queries without RegEx on the same collection take milliseconds at most):
var filter = Builders<BsonDocument>.Filter.Regex("DescriptionLC", new BsonRegularExpression(descriptionStringToFindFromCallHere, "i"));
var mongoStuff = GetMongoCollection<BsonDocument>(MongoConstants.StuffCollection);
var stuff = await mongoStuff
.Find(filter)
.Limit(50)
.Project(x => Mapper.Map<BsonDocument, StuffViewModel>(x))
.ToListAsync();
I saw an answer here that seems to imply that this query would be faster using the following format (copied verbatim):
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower().Contains("hamster"));
However, the project is using the MongoDB .NET Driver 2.0, which doesn't support LINQ. So, my question comes down to:
a). Would using LINQ be noticeably faster, or about the same? I could go back to the 1.x driver, but I would rather not.
b). Is there anything I can do to speed this up? I am already searching against a lower-case-only field.
------------END ORIGINAL POST------------
Edit: Reducing the number of "stuff" items returned, by changing .Limit(50) to say .Limit(5), reduces the call time linearly: 40 seconds drops to 4 with the latter. I have experimented with different numbers and it seems to be a direct correlation. That's strange to me, but I don't really understand how this works.
Edit 2: It seems that the only solution might be to use "starts with" instead of "contains" regular expressions. Apparently the latter doesn't use indices efficiently according to the docs ("Index Use" section).
Edit 3: In the end, I did three things (field was already indexed):
1). Reduced the number of results returned - this helped dramatically; there is a linear correlation between the number of items returned and the amount of time the call takes.
2). Changed the search to lower-case only - this helped only slightly.
3). Changed the regular expression to only search "starts with" rather than "contains"; again, this barely helped. The change for that was:
//Take the stringToSearch and make it into a "starts with" RegEx
var startingWithSearchRegEx = "^" + stringToSearch;
Then pass that into the new BsonRegularExpression instead of just the search string.
Still looking for any feedback!!
Regex on hundreds of thousands of documents is not recommended, as it essentially performs a collection scan, so no index is being used at all.
This is the main reason why your query is so slow. It has nothing to do with the .NET driver.
If you have a lot of text, or you search for text patterns often, I'd suggest creating a text index on the field of interest and doing a full text search. Please see the docs for $text.
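A minimal sketch (mongo shell) of that suggestion; the collection name stuff is hypothetical, the field name comes from the question:
db.stuff.createIndex({ DescriptionLC: "text" }); // one-time setup
// $text can use the text index, unlike an unanchored "contains" regex:
db.stuff.find({ $text: { $search: "description to find" } }).limit(50);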

Is it possible to make lucene.net ignore case of the field for queries?

I have documents indexed with a "GuidId" field and a "guidid" field. How can I make Lucene.NET ignore the case, so that the following query searches regardless of the case?
TermQuery termQuery = new TermQuery(new Term("GuidId", guidId.ToString()));
I don't want to write another query for the documents with the lowercase field "guidid".
Ideally, don't have field names with funky cases. If you are defining field names dynamically or some such, then you should lowercase them before adding them to the index. That done, it should be easy enough to keep the query fields' names lowercase as well, and you're in good shape.
If, for whatever reason, you are stuck with this case-sensitive data, you'll be stuck expanding your queries to search all the known permutations of the field name to get all your results. Something like:
Query finalQuery = new DisjunctionMaxQuery(0f); // 0f = tie-breaker multiplier
finalQuery.Add(new TermQuery(new Term("GuidId", guidId.ToString())));
finalQuery.Add(new TermQuery(new Term("guidid", guidId.ToString())));
DisjunctionMaxQuery would probably be a good choice here, since it only returns the maximum scoring hit among its query collection, rather than possibly compounding scores across multiple hits.
You can also use MultiFieldQueryParser to similar effect. I don't believe it uses DisjunctionMax, but it doesn't sound like it would likely be that big a deal in this case.

MongoDB skip & limit when querying two collections

Let's say I have two collections, A and B, and a single document in A is related to N documents in B. For example, the schemas could look like this:
Collection A:
{id: (int),
propA1: (int),
propA2: (boolean)
}
Collection B:
{idA: (int), # id for document in Collection A
propB1: (int),
propB2: (...),
...
propBN: (...)
}
I want to return properties propB2-BN and propA2 from my API, and only return information where (for example) propA2 = true, propB6 = 42, and propB1 = propA1.
This is normally fairly simple: I query Collection B to find documents where propB6 = 42, collect the idA values from the result, query Collection A with those values, and then filter the Collection B results using the Collection A documents returned.
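For illustration, a minimal sketch (Node.js driver) of that two-step lookup; the db handle is assumed, and the propB1 = propA1 check is left out for brevity:
const bDocs = await db.collection('B').find({ propB6: 42 }).toArray();
const ids = bDocs.map(doc => doc.idA);
// Keep only the A documents that satisfy the A-side condition.
const aDocs = await db.collection('A').find({ id: { $in: ids }, propA2: true }).toArray();
const validIds = new Set(aDocs.map(doc => doc.id));
const results = bDocs.filter(doc => validIds.has(doc.idA));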
However, adding skip and limit parameters to this seems impossible to do while keeping the behavior users would expect. Naively applying skip and limit to the first query means that, since filtering occurs after the query, fewer than limit documents could be returned. Worse, in some cases no documents could be returned when there are actually still documents in the collection to be read. For example, if the limit was 10 and the first 10 Collection B documents returned pointed to a document in Collection A where propA2 = false, the function would return nothing. Then the user would assume there's nothing left to read, which may not be the case.
A slightly less naive solution is to simply check whether the returned count is < limit and, if so, repeat the queries until the returned count = limit. The problem here is that skip/limit queries where the user would expect mutually exclusive sets of documents could actually return the same documents.
I want to apply skip and limit at the mongo query level, not at the API level, because the results of querying collection B could be very large.
MapReduce and the aggregation framework appear to work only on a single collection, so they don't seem to be alternatives.
This seems like something that'd come up a lot in Mongo use - any ideas/hints would be appreciated.
Note that these posts ask similar sounding questions but don't actually address the issues raised here.
Sounds like you already have a solution (2).
You cannot optimize skip/limit on the first query; depending on the search, you can perhaps do it on the second query.
You will need a loop around it either way, as you write; a sketch of such a loop is below.
I suppose the .skip will always be costly for you, since you will need to fetch all the results and then throw them away to simulate the skip, to give the user consistent behavior.
All the logic would have to go into your loop, unless you can map it in a clever way onto the second query (depending on requirements).
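A minimal sketch (Node.js driver) of such a loop; all names are hypothetical and the propB1 = propA1 join is omitted. skip here counts raw Collection B documents, and nextSkip is returned so consecutive pages stay mutually exclusive:
async function pageOfResults(db, skip, limit) {
  const out = [];
  let offset = skip; // position within the raw Collection B results
  while (out.length < limit) {
    const batch = await db.collection('B').find({ propB6: 42 }).skip(offset).limit(limit).toArray();
    if (batch.length === 0) break; // Collection B is exhausted
    const ids = batch.map(doc => doc.idA);
    const aDocs = await db.collection('A').find({ id: { $in: ids }, propA2: true }).toArray();
    const validIds = new Set(aDocs.map(doc => doc.id));
    for (const doc of batch) {
      offset += 1; // count every examined B document
      if (validIds.has(doc.idA)) out.push(doc);
      if (out.length === limit) break;
    }
  }
  // The caller passes nextSkip as the next page's skip to avoid duplicates.
  return { results: out, nextSkip: offset };
}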
Out of curiosity: Given the time passed, you should have a solution by now?!