Algolia optionalFilters acts like filters on virtual index - algolia

I have my index and virtual index, on my index query like:
{
"facetFilters": [["objectID:12345", "tag:Luxury","tag:Makeup"]], // 12345 or luxury or makeup
"optionalFilters": "objectID:12345" // put it as first
}
will return all documents that have given object id or tag luxury or tag makeup and puts object with id "12345" as first. It behaves like expected.
But when I run the same query on my virtual index it only returns document with given id "12456". So it behave like filter where in docs it says:
https://www.algolia.com/doc/guides/managing-results/rules/merchandising-and-promoting/in-depth/optional-filters/
Unlike filters, optional filters don’t remove records from your search results when your query doesn’t match them. Instead, they divide your records into two sets: the results that match the optional filter, and the ones that don’t.

Weird. I just set this up and am seeing the same results. I don't see anything in the docs that would explain why the behavior would be different, so I'm reaching out to some engineering colleagues to see what's going on.
UPDATE:
Algolia virtual replicas and optionalFilters both do out-of-band sorting of results at query time. It looks like those two features are causing strangeness when they both try to do their sort. I've cut a ticket on this, but for now to get the results you'll want to use a standard replica with optionalFilters -- the standard replica will do index-time sorting and then the optionalFilters can layer their query time filtering on top of it.

Related

Firestore 1 global index vs 1 index per query what is better?

I'm working on my app and I just ran into a dilemma regarding what's the best way to handle indexes for firestore.
I have a query that search for publication in a specify community that contains at least one of the tag and in a geohash range. The index for that query looks like this:
community Ascending tag Ascending location.geohash Ascending
Now if my user doesnt need to filter by tag, I run the query without the arrayContains(tag) which prompt me to create another index:
community Ascending location.geohash Ascending
My question is, is it better to create that second index or, to just use the first one and specifying all possible tags in arrayContains in the query if the user want no filters on tag ?
Neither is pertinently better, but it's a typical space vs time tradeoff.
Adding the extra tags in the query adds some overhead there, but it saves you the (storage) cost for the additional index. So you're trading some small amount of runtime performance for a small amount of space/cost savings.
One thing to check is whether the query with tags can actually run on just the second index, as Firestore may be able to do a zigzag merge join. In that case you could only keep the second, smaller index and save the runtime performance of adding additional clauses, but then get a (similarly small) performance difference on the query where you do specify one or more tags.

Need help querying distinct combinations of nested fields

Desired result
I am trying to query my collection and obtain every unique combination of a batch and entry code. I don't care about anything other than these fields, the parent objects do not matter to me.
What I have tried
I tried running:
db.accountant_ledgers.aggregate( [ {"$group": { "_id": { entryCode: "$actions.entry.entryCode", batchCode: "$actions.entry.batchCode" } } } ]);
Problem
I get unexpected results when I run that query. I'm looking for a list of every unique combination of batch and entry codes, but instead I get a list of arrays? Perhaps these are the results I'm looking for, but I have no idea how to read them if they are.
Theory
I think perhaps this could have to do with the fact that these fields are nested. Each object has several actions, each action has several entries. I believe that the result from that query is just the aggregated entry and batch codes found in each object. I don't know how long the list of results is, but I'd guess it's the same number as the total number of objects in my collection (~90 million).
EDIT: I found out that there are only 182 results from my query, which is clearly significantly smaller than 90 million. My new theory is that it has found all unique objects, with the criteria for "uniqueness" being the list of the batch and entry codes that appear in their actions, which makes sense. There should be a lot of repetition in the collection.
Question
How can I achieve the result I'm looking for? I'm expecting something like:
FEE, MG
EXN, WT
ACH, 9C
...etc
Notes
I apologize if this is a bad question, I'm not sure how else to frame it. Let me know if I can improve my question at all.
Picture below shows the results of the query.
EDIT FOR ADDITIONAL INFORMATION
I can't share any sample documents, but the general structure of the data is shown (crudely) in the below image. Each Entity has several Actions, each Action has one Entry and each Entry has one Batch code and one Entry code.
List item
You are getting a list of documents (each is a map or a hash), not a list of arrays.
The GUI you are using is trying to show you the contents of each document on the top level which is maybe what is confusing.
If you run the query in mongo shell you should see a list of documents.
It looks like your inputs are documents where entry code and batch code are arrays, if so:
Edit your question to include sample documents you are querying as text
You could use $unwind to flatten those arrays before using $group.

Firestore: Where query (==) on map field

I have a collection with shipments in them.
I want to filter on postalcode which is a part of the 'address_from' map.
Is this possible and are there extra steps required, indexes for example, to make this work?
query.where('address_from.postalcode', '==', shipment_code);
That query should just work. Any single field query should work without creating an index, as all fields are indexed by default. Queries that require an index will yield and error message telling you that one needs to be created, and provide a link to the console to automatically do that.

How do I make Algolia Search guaruntee that it will return a number of results

I need Algolia to always return me 5 results from a full text search even if the query text itself bears little or no relevance to the actual returned results. Before someone suggests it, I have already tried to set the removeWordsIfNoResults option to all of it's possible modes and this still doesn't guarantee that I get my 5 results.
The purpose of this is to create a 'relevant entities' sidebar where the name of the current entity is used to search for other entities.
Any suggestions?
Using the removeWordsIfNoResults=allOptional query parameter is indeed a good way to go -> because all query words are required to match an object by default, fallbacking to "optional" is a good way to still retrieve results if one you the query words (or the combination of words) doesn't match anything.
index.search(query, { removeWordsIfNoResults: 'allOptional' });
Another solution is to always consider all query words as optional (not only as a fallback); to make sure the query foo bar baz is interpreted as OPT(foo) AND OPT(bar) AND OPT(baz) <=> foo OR bar OR baz. The difference is that this query will retrieve more results than the previous one because 1 single matching word will be enough to retrieve the object.
index.search(query, { optionalWords: query });
That being said, there is no way to force the engine to retrieve "at least" 5 results. What I would recommend is to have a small frontend logic:
- do the query with removeWordsIfNoResults or optionalWords
- if the engines returns less than 5 results, do another query

MongoDB skip & limit when querying two collections

Let's say I have two collections, A and B, and a single document in A is related to N documents in B. For example, the schemas could look like this:
Collection A:
{id: (int),
propA1: (int),
propA2: (boolean)
}
Collection B:
{idA: (int), # id for document in Collection A
propB1: (int),
propB2: (...),
...
propBN: (...)
}
I want to return properties propB2-BN and propA2 from my API, and only return information where (for example) propA2 = true, propB6 = 42, and propB1 = propA1.
This is normally fairly simple - I query Collection B to find documents where propB6 = 42, collect the idA values from the result, query Collection A with those values, and filter the results with the Collection A documents from the query.
However, adding skip and limit parameters to this seems impossible to do while keeping the behavior users would expect. Naively applying skip and limit to the first query means that, since filtering occurs after the query, less than limit documents could be returned. Worse, in some cases no documents could be returned when there are actually still documents in the collection to be read. For example, if the limit was 10 and the first 10 Collection B documents returned pointed to a document in Collection A where propA2 = false, the function would return nothing. Then the user would assume there's nothing left to read, which may not be the case.
A slightly less naive solution is to simply check if the return count is < limit, and if so, repeat the queries until the return count = limit. The problem here is that skip/limit queries where the user would expect exclusive sets of documents returned could actually return the same documents.
I want to apply skip and limit at the mongo query level, not at the API level, because the results of querying collection B could be very large.
MapReduce and the aggregation framework appear to only work on a single collection, so they don't appear to be alternatives.
This seems like something that'd come up a lot in Mongo use - any ideas/hints would be appreciated.
Note that these posts ask similar sounding questions but don't actually address the issues raised here.
Sounds like you already have a solution (2).
You cannot optimize/skip/limit on first query, depending on search you can perhaps do it on second query.
You will need a loop around it either way, like you write.
I suppose, the .skip will always be costly for you, since you will need to get all the results and then throw them away, to simulate the skip, to give the user consistent behavior.
All the logic would have to go to your loop - unless you can match in a clever way to second query (depending on requirements).
Out of curiosity: Given the time passed, you should have a solution by now?!