React Query: order of refetch queries?

When React Query needs to invalidate multiple queries, in which order does it do that? Or is it random?
Does it dispatch multiple requests together, or one at a time?

It's one at a time (no batching), and I don't think there is any specific ordering that you should rely upon.
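For illustration, a minimal sketch (assuming react-query v3; the 'todos' and 'users' keys are hypothetical):

import { useQueryClient } from 'react-query';

function RefreshButton() {
  const queryClient = useQueryClient();
  const refresh = () => {
    // Each matching query refetches with its own request:
    // there is no batching, and no ordering you should rely on.
    queryClient.invalidateQueries('todos');
    queryClient.invalidateQueries('users');
  };
  return <button onClick={refresh}>Refresh</button>;
}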

Related

MongoDB watch single document [scalability]

This is MongoDB's API:
db.foo.watch([{$match: {"bar.baz": "qux" }}])
Let's say that collection foo contains millions of documents. The arguments passed into watch mean that for every document that changes, the system filters out the ones that $match the query (but the stream is still triggered behind the scenes by any document change).
The problem is that as my application scales, my listeners will also scale and my intuition is that I will end up having n^2 complexity with this approach.
I think that as I add more listeners, database performance will deteriorate due to changes to documents that are not part of the $match query. There are other ways to deal with this (web sockets & rooms), but before prematurely optimizing the system, I would like to know if my intuition is correct.
Actual Question:
Can I attach a listener to a single document, such that watch's performance isn't affected by sibling documents?
When I do collection.watch([$matchQuery]), does the MongoDB driver listen to all documents and then filter out the relevant ones? (This is what I am trying to avoid.)
The code collection.watch([$matchQuery]) actually means watch the change stream for that collection rather than the collection directly.
As far as I know, there is no way to add a listener to a single document. Since I do not know of any way, I will give you a couple of tips on how to avoid scalability problems with the approach you have chosen. Your code appears to be using change streams, which should not cause problems unless you open too many of them.
There are two ways to accomplish this task by watching the entire collection with a single external process, neither of which will lead to deterioration of database performance.
If you use change streams, open only a single change stream with logic that checks all the conditions you need to filter for over time. The common mistake is to open many change streams, one per single-document filtering task; that is when people run into problems.
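For example, a single shared stream in the mongo shell might look like this (the field path and values are illustrative, not from your actual schema):

var stream = db.foo.watch(
  [{ $match: { "fullDocument.bar.baz": { $in: ["qux", "quux"] } } }],
  { fullDocument: "updateLookup" }  // also populate fullDocument on updates
);
while (!stream.isExhausted()) {
  if (stream.hasNext()) {
    var change = stream.next();
    // dispatch the change to the interested listener in application code
  }
}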
The simpler way, since you mentioned Atlas, is to use Triggers. You can use something called a match expression in your Trigger configuration so that the trigger function only executes when the match expression evaluates to true. As noted in the documentation, in the example below the trigger function will not execute unless the field status is updated to "blocked", but many other match expressions are possible:
{
  "updateDescription.updatedFields": {
    "status": "blocked"
  }
}
I hope this helps. If not, I can keep digging. I think with change streams or Triggers, you are ok if you want to write a bit of code. :)

The efficiency of the Update Query in CrateData

When executing an update query that triggers a lot of records (e.g. a million) to be updated, as I understand it, the underlying index system needs to re-ingest each document. So for this kind of "heavy" job, is there a way to control its workload, i.e., update at a fixed rate until it's finished?
Currently it is not possible to throttle update queries.
It would probably help your use case to split the update into parts by adding specific filters.
For example, if you have a timestamp field, you could update each month separately by adjusting the queries accordingly.
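A sketch of that (the table and column names are made up), assuming a timestamp column ts:

UPDATE my_table SET some_col = 'new_value'
  WHERE ts >= '2014-01-01' AND ts < '2014-02-01';
-- repeat with the next month's range until all rows are covered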

MongoDB. Use cursor as value for $in in next query

Is there a way to use the cursor returned by the previous query as a value for $in in the next query? For example, something like this:
var users = db.user.find({state:1})
var offers = db.offer.find({user:{$in:users}})
I think this could reduce the traffic between MongoDB and the client in cases where the client doesn't need the user information at all, just the offers. Am I wrong?
Basically you want to do a join between two collections, which MongoDB doesn't support. You can reduce the amount of data transferred from the server by limiting the fields returned by the first query to only the unique user information (i.e. the _id) that you need to query the offers collection.
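For example, a sketch based on the query from the question:

// Fetch only the _id of each matching user, then feed those ids to $in.
var userIds = db.user.find({state: 1}, {_id: 1}).toArray().map(function (u) {
  return u._id;
});
var offers = db.offer.find({user: {$in: userIds}});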
If you really just want to make one query then you should store more information in the offers collection. For example, if you're trying to find offers for active users then you would store the active state of the user in the offers collection.
To work from your comment:
Yes, that's why I used the tag 'join' in the question. The idea is that I can make the first query more complex, using a bunch of fields and regexes, without storing user data in other collections except as references. In these cases I always have to perform two consecutive queries, but transferring the results of the first query is necessary neither for me nor for MongoDB itself. I just want to understand whether it can be done now, whether it will be possible in the future, or whether it cannot be implemented for some technical reasons.
As far as I understand it, there is no immediate hurry to make this possible. Also, the way cursors are coded at the moment means this would be quite a big change to the way they work and are defined; a change big enough to possibly break implementations for other people. It is comparable to the question of whether to make safe writes the default for inserts and updates in all future drivers: it is recognised that safe should be the default, but changing it would break things for people who expect it the other way around.
It is rather inefficient if you don't require the results of the first query at all. However, since most networks are provisioned with high traffic in mind and traffic is cheap, there hasn't been enough demand to make chained queries possible server-side in the cursor.
However, subselects (which this basically is: selecting a set of rows based upon a sub-selection of previous rows) have come up on mongodb-user a couple of times, and there might even be a JIRA ticket for it somewhere; if not, it might be useful to create one.
As for doing it right now: there is no way.

What's the best way to find the most frequently occurring value in MongoDB?

I'm looking for the equivalent of this sort of SQL query.
SELECT field, count(*) AS counter FROM table GROUP BY field ORDER BY counter DESC
What's the best way to achieve this?
Thanks
Use Map-Reduce. Map each document by emitting the key and a value of 1, then aggregate them using a simple reduce operation that sums the values. See http://www.mongodb.org/display/DOCS/MapReduce
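For example, in the mongo shell (a sketch; the collection and field names are placeholders):

var map = function () { emit(this.field, 1); };                    // one vote per document
var reduce = function (key, values) { return Array.sum(values); };
db.table.mapReduce(map, reduce, { out: "field_counts" });
db.field_counts.find().sort({ value: -1 });                        // most frequent value first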
I'd handle aggregation queries by keeping track of the respective counts separately, i.e. in their own collection. This way, you can simply query the "most frequently occurring" collection. Downside: you need to perform another write whenever the data changes.
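A sketch of that extra write (collection and field names are made up; newValue stands for the value being counted):

// On every relevant write, bump the matching counter with an upsert.
db.counters.update(
  { _id: newValue },           // one counter document per distinct value
  { $inc: { count: 1 } },
  { upsert: true }
);
// The "most frequently occurring" query is then a simple sort:
db.counters.find().sort({ count: -1 }).limit(10);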
Of course, you could also update that collection from time to time using Map/Reduce. This depends a bit on how accurate the information must be and how often it changes.
Make sure, however, not to run the Map/Reduce operation very often: it is not meant to be used interactively (i.e. not on every page view) but rather sparingly, in an offline process that updates the counts every hour or so. Hence, if your counts change very quickly, use a counters collection.

Strategies for keeping Map Reduce results around for subsequent queries

I'm using Map Reduce with MongoDB. Simplified scenario: There are users, items and things. Items include any number of things. Each user can rate things. Map reduce is used to calculate the aggregate rating for each user on each item. It's a complex formula using the ratings for each thing in the item and the time of day - it's not something you could ever index on and thus map-reduce is an ideal approach to calculating it.
The question is: having calculated the results using Map Reduce what strategies do people use to maintain these per-user results collections in their NOSQL databases?
1) On demand with automatic deletion: Keep them around for some set period of time and then delete them; regenerate them as necessary when the user makes a new request?
2) On demand, never delete: Keep them around indefinitely. When the user makes a request and the collection is past its use-by date, regenerate it.
3) Scheduled: Regular process running to update all results collections for all users?
4) Other?
The best strategy depends on the nature of your map-reduce job.
If you're using a separate map-reduce call for each individual user, I would go with the first or second strategy. The advantage of the second strategy over the first strategy is that you always have a result ready. So when the user makes a request and the result is outdated, you can still present the old result to the user, while running a new map-reduce in the background to generate a fresh result for the next requests. This has the following advantages:
The user doesn't have to wait for the map-reduce to complete, which is important if the map-reduce may take a while to complete. The exception is of course the very first map-reduce call; at this point there is no old result available.
You're automatically running map-reduce only for the active users, reducing the load on the database.
If you're using a single, application-wide map-reduce call for all users, the third strategy is the best approach. You can easily achieve this by specifying an output collection. The advantages of this approach:
You can easily control the freshness of the result. If you need more up-to-date results, or need to reduce the load on the database, you only have to adjust the schedule.
Your application code isn't responsible for managing the map-reduce calls, which simplifies your application.
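A sketch of such a scheduled call (mapFn and reduceFn, which implement the rating formula, are assumed to exist; the collection names are made up):

// Run from a cron-style job; out.replace atomically swaps in the new results.
db.ratings.mapReduce(mapFn, reduceFn, { out: { replace: "user_item_ratings" } });
// The application then reads precomputed results directly, e.g.:
db.user_item_ratings.find({ "_id.userId": someUserId });  // assuming the map emits {userId, itemId} keys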
If a user can only see his or her own ratings, I'd go with strategy one or two, or include a lastActivity timestamp in user profiles and run an application-wide scheduled map-reduce job on the active subset of the users (strategy 3). If a user can see any other user's ratings, I'd go with strategy 3 as well, as this greatly reduces the complexity of the application.