What does the DISTINCT ON clause mean in Cloud Datastore, and how does it affect reads? - google-cloud-firestore

This is what the Cloud Datastore documentation says, but I'm having a hard time understanding what exactly it means:
A projection query that does not use the distinct on clause is a small operation and counts as only a single entity read for the query itself.
Grouping
Projection queries can use the distinct on clause to ensure that only the first result for each distinct combination of values for the specified properties will be returned. This will return only the first result for entities which have the same values for the properties that are being projected.
Let's say I have a table for questions and I only want to get the question text sorted by the created date. Would this be counted as a single read, with the rest as small operations?

If your goal is just to project the date and text fields, you can create a composite index on those two fields. Because you are not trying to de-duplicate results (so no DISTINCT ON), the query is a small operation, and the query itself counts as a single read.
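To make the difference between a plain projection and DISTINCT ON concrete, here is a minimal in-memory sketch. It does not call Datastore; entities are plain dicts, and the property names (`text`, `created`, `category`) are hypothetical. `project` returns every entity with only the projected properties, while `distinct_on` keeps only the first row for each distinct combination of the chosen properties, as described in the quoted documentation. The ordering rules here are a simplification, not the service's exact behavior.

```python
from typing import Dict, List, Sequence

Entity = Dict[str, object]


def project(entities: List[Entity], props: Sequence[str],
            order_by: Sequence[str]) -> List[Entity]:
    """Plain projection: every entity, keeping only `props`, sorted by `order_by`."""
    ordered = sorted(entities, key=lambda e: tuple(e[p] for p in order_by))
    return [{p: e[p] for p in props} for e in ordered]


def distinct_on(entities: List[Entity], props: Sequence[str],
                distinct: Sequence[str], order_by: Sequence[str]) -> List[Entity]:
    """Projection with DISTINCT ON: keep only the first row for each
    distinct combination of values of the `distinct` properties."""
    seen = set()
    out = []
    for row in project(entities, props, order_by):
        key = tuple(row[p] for p in distinct)  # distinct props must be projected
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# Hypothetical "Question" entities.
questions = [
    {"text": "What is an index?", "created": 3, "category": "db"},
    {"text": "What is a kind?", "created": 1, "category": "db"},
    {"text": "How do I deploy?", "created": 2, "category": "ops"},
]

# The asker's case: question text ordered by created date, no de-duplication.
print([r["text"] for r in project(questions, ["text", "created"], ["created"])])
# DISTINCT ON category: one row per category, the earliest-created one.
print(distinct_on(questions, ["category", "created"], ["category"], ["category", "created"]))
```

In the asker's case only `project` applies: every question comes back, and no rows are dropped.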

Related

Firestore: one global index vs. one index per query, which is better?

I'm working on my app and just ran into a dilemma about the best way to handle indexes in Firestore.
I have a query that searches for publications in a specific community that contain at least one of the given tags and fall within a geohash range. The index for that query looks like this:
community (Ascending), tag (Ascending), location.geohash (Ascending)
Now, if my user doesn't need to filter by tag, I run the query without the arrayContains(tag) clause, which prompts me to create another index:
community (Ascending), location.geohash (Ascending)
My question is: is it better to create that second index, or to just use the first one and specify all possible tags in arrayContains when the user doesn't want to filter by tag?
Neither is inherently better; it's a typical space-versus-time tradeoff.
Adding the extra tags in the query adds some overhead there, but it saves you the (storage) cost for the additional index. So you're trading some small amount of runtime performance for a small amount of space/cost savings.
One thing to check is whether the query with tags can actually run on just the second index, as Firestore may be able to do a zigzag merge join. In that case you could keep only the second, smaller index and avoid the runtime cost of adding the extra clauses, at the price of a (similarly small) performance difference on the queries where you do specify one or more tags.
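The idea behind a zigzag merge join can be sketched as intersecting two sorted streams of document IDs, skipping ahead in whichever stream is behind. This is a simplified illustration, not Firestore's implementation; it assumes each index scan yields document IDs in the same sorted order, and the IDs below are made up.

```python
from bisect import bisect_left
from typing import List


def zigzag_join(a: List[str], b: List[str]) -> List[str]:
    """Intersect two sorted ID lists by leapfrogging: whenever one side is
    behind, jump it forward to the first ID >= the other side's current ID."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i)  # skip a forward to b[j] or beyond
        else:
            j = bisect_left(b, a[i], j)  # skip b forward to a[i] or beyond
    return out


# Hypothetical IDs matched by the (community, geohash) scan and the tag scan.
by_community_geohash = ["p01", "p03", "p04", "p07", "p09"]
by_tag = ["p02", "p03", "p07", "p08"]
print(zigzag_join(by_community_geohash, by_tag))
```

Because each side jumps rather than steps, the cost depends more on how the two ID lists interleave than on their total length, which is why reusing a smaller index this way can be reasonably cheap.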

Do I need to separate a 'whereIn' and a field value that can contain two values in Firestore?

I'm querying two fields in a document. For one field I'm using the whereIn operator (that's what it's called in Flutter; on the web it's called in). The other field I need to query is a string, and I want to match it against two values. Do I need to split this into separate queries, or is there a way to do it all in a single query?
According to the Firestore documentation on query limitations:
You can use only one in or array-contains-any clause per query. You can't use both in and array-contains-any in the same query.
Since you need two whereIn/in clauses, you need to execute a separate query for each value in the second whereIn and then merge the results in your application code.
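A minimal sketch of that split-and-merge approach, simulated in memory rather than against Firestore: `run_query` is a hypothetical stand-in for one Firestore query (`field_a in [...]` plus an equality filter on `field_b`), and the field names and documents are invented for illustration. The client issues one query per value of the second field and merges the results keyed by document ID.

```python
from typing import Dict, Iterable, List

Doc = Dict[str, object]


def run_query(docs: List[Doc], a_values: Iterable[object], b_value: object) -> List[Doc]:
    """Hypothetical stand-in for one query: field_a IN a_values AND field_b == b_value."""
    allowed = set(a_values)
    return [d for d in docs if d["field_a"] in allowed and d["field_b"] == b_value]


def query_with_two_in_clauses(docs: List[Doc], a_values: List[object],
                              b_values: List[object]) -> List[Doc]:
    """Run one query per value of field_b and merge the results client-side."""
    merged: Dict[object, Doc] = {}
    for b in b_values:
        for d in run_query(docs, a_values, b):
            # A document matches at most one field_b value, but keying by ID
            # also guards against repeated values in b_values.
            merged[d["id"]] = d
    return sorted(merged.values(), key=lambda d: d["id"])


docs = [
    {"id": "1", "field_a": "x", "field_b": "open"},
    {"id": "2", "field_a": "y", "field_b": "closed"},
    {"id": "3", "field_a": "z", "field_b": "open"},
    {"id": "4", "field_a": "x", "field_b": "archived"},
]
print([d["id"] for d in query_with_two_in_clauses(docs, ["x", "y"], ["open", "closed"])])
```

With two values for the second field this costs two queries; any ordering or limit you need has to be reapplied after the merge.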

Using Mongo: should we create an index tailored to each type of high-volume query?

We have two types of high-volume queries. One looks for docs involving 5 attributes: a date (lte), a value stored in an array, a value stored in a second array, one integer (gte), and one float (gte).
The second includes these five attributes plus two more.
Should we create two compound indices, one for each query? Assume each attribute has a high cardinality.
If we do, it doesn't seem like we can create such indexes: each query involves multiple arrays, and MongoDB doesn't allow a compound index on more than one array field. How do people structure their MongoDB databases in this case?
We're using MongoMapper.
Thanks!
In an index, the value of each additional field that comes after the first range in the query drops significantly.
Conceptually, I find it best to think of the additional fields in the index as pruning ever-smaller sub-trees from the query: the first range chops off a large branch, the second a smaller one, the third a smaller one still, and so on. My general rule of thumb is that only the first range from the query is of value in the index.
The caveat to that rule is that additional fields in the index can be useful to aid sorting returned results.
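A toy model can make this rule of thumb concrete. Below, a compound index is modeled as a sorted list of `(a, b)` keys (an assumption for illustration, not MongoDB's B-tree). A range on the leading field `a` selects one contiguous slice, but a range on `b` cannot shrink that slice into a single contiguous run, because `b` is ordered only within each individual `a` value. This is a naive scan; real engines, MongoDB included, may seek past some non-matching keys, so treat the counts as illustrative rather than as MongoDB's `keysExamined`.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Key = Tuple[int, int]


def range_scan(index: List[Key], a_lo: int, a_hi: int,
               b_lo: int, b_hi: int) -> Tuple[int, int]:
    """Return (keys_in_slice, keys_matched) for a in [a_lo, a_hi] AND
    b in [b_lo, b_hi] on an index sorted by (a, b)."""
    # The range on the leading field maps to one contiguous slice.
    start = bisect_left(index, (a_lo, float("-inf")))
    end = bisect_right(index, (a_hi, float("inf")))
    candidates = index[start:end]
    # Within the slice, b is sorted only inside each a value, so a plain
    # ordered scan checks every candidate key against the b range.
    matched = sum(1 for _, b in candidates if b_lo <= b <= b_hi)
    return len(candidates), matched


# 100 hypothetical keys: a and b each take the values 0..9.
index = sorted((a, b) for a in range(10) for b in range(10))
print(range_scan(index, 2, 4, 0, 0))
```

Here the range on `a` narrows 100 keys to a slice of 30, while the very tight range on `b` only filters inside that slice (3 matches) instead of narrowing what the scan has to walk.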
For the first query I would create an index on the two array values and then whichever of the ranges will exclude the most documents. The date field is unlikely to provide high exclusion unless you can close the range (lte and gte). The integer and float are hard to judge without knowing the domain.
If the second query's two additional attributes also use ranges in the query and do not have a significantly higher exclusion value then I would just work with the one index.
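One way to judge "whichever of the ranges will exclude the most documents" is to measure it on a sample of your data. The sketch below is a hypothetical helper, not part of MongoDB or MongoMapper. It computes the fraction of sample documents each range predicate keeps (lower means more selective) and picks the most selective one. The field names and sample values are invented.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]


def selectivity(sample: List[Doc],
                predicates: Dict[str, Callable[[Doc], bool]]) -> Dict[str, float]:
    """Fraction of sample documents each predicate keeps (lower = more selective)."""
    n = len(sample)
    return {name: sum(1 for d in sample if pred(d)) / n
            for name, pred in predicates.items()}


def most_selective(sample: List[Doc],
                   predicates: Dict[str, Callable[[Doc], bool]]) -> str:
    """Name of the predicate that excludes the most sample documents."""
    ratios = selectivity(sample, predicates)
    return min(ratios, key=ratios.get)


# Hypothetical sample with a date, an integer and a float attribute.
sample = [
    {"date": 1, "count": 5, "score": 0.9},
    {"date": 2, "count": 1, "score": 0.2},
    {"date": 3, "count": 7, "score": 0.4},
    {"date": 4, "count": 2, "score": 0.8},
]
predicates = {
    "date": lambda d: d["date"] <= 4,      # open-ended lte: keeps everything here
    "count": lambda d: d["count"] >= 5,    # integer gte
    "score": lambda d: d["score"] >= 0.85, # float gte
}
print(selectivity(sample, predicates), most_selective(sample, predicates))
```

The open-ended date range keeps every sample document, which matches the point above that a date range excludes little unless it is closed on both ends.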
Rob.

MongoDB computed field based on another query

I have a MongoDB query, and I want to add a computed field. The computed field is based on whether or not the item is in the results of another query. So my query returns the columns a, b, c, d, and then column e should be based on whether or not the current row would be matched by another query.
Is there an efficient way to do this in mongo? I'm not really sure how to do this one...
There is currently no way to execute a function as you describe within the database when returning a document via standard functions such as find. It's been requested by the community, but the general request is for functions that operate only on a single document.
You can compute fields with $project in the aggregation framework, but those only operate on the current document in the pipeline, so they can't summarize other queries.
You'll likely need to build your e value in your data access layer.
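A minimal sketch of computing `e` in the data access layer: run the second query once, collect the matching `_id` values into a set, then tag each result of the main query. Both queries are simulated in memory here; `main_query` and `other_query` are hypothetical predicates standing in for two `find()` calls. With pymongo, the ID collection step would look roughly like `{d["_id"] for d in coll.find(other_filter, {"_id": 1})}`, which is not run here.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def with_computed_e(collection: List[Doc],
                    main_query: Callable[[Doc], bool],
                    other_query: Callable[[Doc], bool]) -> List[Doc]:
    """Return the main query's results, each with e = True if the same
    document is also matched by the other query."""
    # Run the second query once and keep only the IDs it matched.
    other_ids = {d["_id"] for d in collection if other_query(d)}
    # Set membership makes the per-row check O(1).
    return [{**d, "e": d["_id"] in other_ids} for d in collection if main_query(d)]


# Hypothetical documents.
collection = [
    {"_id": 1, "a": 1, "status": "x"},
    {"_id": 2, "a": 2, "status": "y"},
    {"_id": 3, "a": 3, "status": "x"},
]
print(with_computed_e(collection, lambda d: d["a"] >= 2, lambda d: d["status"] == "x"))
```

This costs one extra round trip for the second query, and the set of IDs has to fit in application memory.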

MongoDB: How to execute a query to result of another query (nested queries)?

I need to apply a set of filters (queries) to a collection. By default, MongoDB combines all the conditions passed to find with AND. Instead, I need to apply each query sequentially: run the first query to get a set of documents, run the second query on the result of the first, and so on.
Is this possible?
db.list.find({..q1..}).find({..q2..}).find({..q3..});
Instead Of:
db.list.find({ $and: [ {..q1..}, {..q2..}, {..q3..} ] });
Why do I need this?
Because the second query needs to apply an aggregate function to the result of the first query, instead of applying the aggregate to the whole collection.
Yes, this is possible in MongoDB: you can write nested queries as your requirements dictate. I created nested MongoDB queries in my own application. If you are familiar with SQL, compare this with SQL's IN syntax:
select cname from table where cid in (select .....)
In the same way you can create nested MongoDB queries on different collections also.
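The filter-then-aggregate-then-IN pattern can be sketched in memory as three steps: filter one collection, aggregate only over that filtered result, then use the resulting keys as an IN filter on a second collection. The collections, fields and threshold are hypothetical, and the function names are illustrative stand-ins. In pymongo the step-3 filter would be written `{"cid": {"$in": cids}}`. MongoDB's aggregation framework can also express the step-1 then step-2 ordering as a `$match` stage followed by a `$group` stage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Doc = Dict[str, object]


def first_query(orders: List[Doc], min_amount: int) -> List[Doc]:
    """Step 1: filter orders (like {"amount": {"$gte": min_amount}})."""
    return [o for o in orders if o["amount"] >= min_amount]


def totals_by_customer(rows: List[Doc]) -> Dict[object, int]:
    """Step 2: aggregate only over the step-1 result, not the whole collection."""
    totals: Dict[object, int] = defaultdict(int)
    for o in rows:
        totals[o["cid"]] += o["amount"]
    return dict(totals)


def names_for(customers: List[Doc], cids: Iterable[object]) -> List[str]:
    """Step 3: outer query with an IN filter, like
    SELECT cname FROM customers WHERE cid IN (...)."""
    wanted = set(cids)
    return [c["cname"] for c in customers if c["cid"] in wanted]


customers = [{"cid": 1, "cname": "Ada"}, {"cid": 2, "cname": "Bo"}, {"cid": 3, "cname": "Cy"}]
orders = [{"cid": 1, "amount": 50}, {"cid": 2, "amount": 5},
          {"cid": 1, "amount": 20}, {"cid": 3, "amount": 70}]

big_orders = first_query(orders, 20)
totals = totals_by_customer(big_orders)
print(totals, names_for(customers, totals.keys()))
```

Each step runs on the previous step's output, which is the sequential behavior the question asks for; the trade-off is an extra round trip per step when done from application code.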