React Query - query is not using cache? - react-query

I have the following definition of useQuery that I use in a couple of React components:
useQuery("myStuff", getMyStuffQuery().queryFn);
Where getMyStuffQuery looks like this:
export const getMyStuffQuery = () => {
  return {
    queryFn: () => makeSomeApiCall(),
  };
};
I would expect that although all of those components render, makeSomeApiCall() would only make an API call once, and the rest of the time the cached result from that first call would be used.
However, it seems like it keeps calling makeSomeApiCall() again and again, whenever any of those components renders.
Why is React Query not using the cache? Am I doing something wrong?

React Query will cache the data of the query by default, but that does not affect whether or not it thinks that data is stale. If it thinks data is stale, it will call the query function (hit the API) every time useQuery() is called. This means it will read the data from the cache if it has it, but since it thinks that data is stale, it will still hit the API in the background to fetch any updated data.
Fortunately, you have complete control over whether or not React Query considers data to be stale. You can set a staleTime config option to control how long specific data should be considered fresh. You can even set it to Infinity to say that as long as your app is open, it should only ever call the query function (hit the API) one time. By default this value is 0, which is why you are seeing this behavior: React Query refetches the data in the background every time useQuery is called because it immediately considers that data stale (even though it's still cached).
In your example, if you truly only ever want the API to be called one time, you can simply set the staleTime option to Infinity.
useQuery("myStuff", getMyStuffQuery().queryFn, { staleTime: Infinity });
This option, along with all others, can be read about in the docs here https://react-query.tanstack.com/reference/useQuery

React Query has a slightly different model of request caching.
A request can have its results cached, and those results can go stale.
Cached results are returned immediately, but if stale they are re-fetched in the background and the cache is updated.
The default configuration caches results for 5 minutes and makes them stale immediately.
See: https://tanstack.com/query/v4/docs/guides/caching
cacheTime and staleTime can be set as part of the useQuery options object, as shown here for a 5-minute cache time and a 1-minute stale time.
useQuery({ queryKey: ['todos'], queryFn: fetchTodos, staleTime: 1 * 60 * 1000, cacheTime: 5 * 60 * 1000 });
The strategy for refetching cached results can be changed with options like refetchOnWindowFocus.
See: https://tanstack.com/query/v4/docs/guides/important-defaults
You can stop this refetching of stale values on the window focus event like this:
const client = new QueryClient({
  defaultOptions: {
    queries: {
      refetchOnWindowFocus: false,
    },
  },
});

Related

RTK Query manual cache update without known previous query args

How can I make an optimistic/pessimistic cache update (probably with updateQueryData) without knowing the arguments of previous queries?
updateQueryData(
  "getItems",
  HERE_ARE_THE_ARGUMENTS_I_DONT_HAVE,
  (data) => {
    ...
  }
)
For example:
getPosts query with arg search: string
updatePosts mutation
Example actions:
Go to the table; the first request, getPosts with search = "", is cached.
Type "abc" into the search input.
The second request, getPosts with search = "abc", is cached.
I update one element within the table with a pessimistic update, successfully modifying the cache of the second request (previous step).
Clear the search input.
The table shows the same state as in the first step, even though one element should have been modified.
But I need a universal solution: I don't know how many different cached entries there will be, and my case is more complex because I have args other than "search" to worry about (pagination).
Possible solutions?
It would be perfect if there were a method to access all previously cached queries, so I could call updateQueryData for each of them, but I cannot find a simple way to do that.
I thought about accessing getState() within onQueryStarted and retrieving the query parameters from there (in order to do the above), but that's not an elegant way.
I also thought about looking for a way to invalidate previous requests without invalidating the last one, but I cannot find that either.
Okay, I found a solution here:
Is it possible to optimistically update all caches of a endpoint?
Using api.util.selectInvalidatedBy(getState(), [{ type, id }]) gives you back a list of cached queries with their endpointName and originalArgs. You can then easily mutate each of these caches with updateQueryData.
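As a rough sketch of how this fits together inside the mutation's onQueryStarted (this sits inside createApi's endpoints builder; the "Item" tag type, the endpoint names and the item shape are assumptions based on the example above, and getItems is assumed to provide matching tags):

updateItem: build.mutation({
  query: (item) => ({ url: `items/${item.id}`, method: "PUT", body: item }),
  async onQueryStarted(item, { dispatch, getState, queryFulfilled }) {
    const { data: updated } = await queryFulfilled; // pessimistic: wait for the server response
    // Find every cached query that provided a matching tag, whatever its original args were.
    const cachedQueries = api.util.selectInvalidatedBy(getState(), [
      { type: "Item", id: item.id },
    ]);
    for (const { endpointName, originalArgs } of cachedQueries) {
      if (endpointName !== "getItems") continue;
      dispatch(
        api.util.updateQueryData("getItems", originalArgs, (draft) => {
          const cached = draft.find((entry) => entry.id === updated.id);
          if (cached) Object.assign(cached, updated);
        })
      );
    }
  },
}),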
There's also a new problem created here: modifying some particular properties may change the API response in an unpredictable way. For example, in a list you may sort by status, which is something you can modify. In that case changing the cache would be a bad idea, so instead I've created a new tag that is invalidated with each update, { type: "getItems", id: "status" }, and for getItems the provided tag looks like { type: "getItems", id: sortBy }.

Sometimes my Cloud Function returns old data from Firestore. Is it a cache problem?

Client-side, I'm using a listener to detect when the user's "notifications" collection changes. The App then calls a Cloud Function that retrieves the last three unread notifications and the total number of unread notifications.
In my App, I have this:
Listener
firestore.collection("users")
.doc(uid)
.collection("notifications")
.snapshots().listen((QuerySnapshot querySnapshot) {
NotificationsPreviewModel notificationsPreview =
await _cloudFunctionsService.getNotificationsPreview(doctor.id)
})
Cloud Function
exports.getNotificationsPreview = functions.https.onCall(async (data, context) => {
  const userId = data.userId;
  let notifications = [];
  const notificationsDocuments = await db
    .collection("users")
    .doc(userId)
    .collection("notifications")
    .orderBy("dateTime", "desc")
    .get();
  notifications = notificationsDocuments.docs
    .map((rawNotification) => rawNotification.data())
    .filter((element) => element.unread == true);
  const notificationsNumber = notifications.length;
  notifications = notifications.slice(0, 3);
  return { "notifications": notifications, "notificationsNumber": notificationsNumber };
});
The Cloud Function gets called only when a change is detected, so it shouldn't return old data.
The error appears only on the first call to the Cloud Function after the App starts, but not always. The following calls don't generate the error.
How can I solve this? For now, I've added a delay of 500ms, and it works perfectly, but it's not a real solution.
Based on your description, it sounds like you see some form of latency while collecting the data from Firestore. Retrieving data from the Cloud takes time, and a delay of 500ms is not excessive.
I am not familiar enough with Flutter to comment on your code. However, according to the documentation for Java:
By default, get() attempts to provide up-to-date data when possible by waiting for data from the server, but it may return cached data or fail if you are offline and the server cannot be reached. This behavior can be altered via the Source parameter.
Source:
By providing a Source value, these methods can be configured to fetch results only from the server, only from the local cache, or attempt to fetch results from the server and fall back to the cache (which is the default).
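For illustration only, the equivalent Source option in the Firestore web SDK (v8, namespaced API) looks roughly like this; it is a sketch of the concept rather than a drop-in fix for the Flutter client in the question:

firebase.firestore()
  .collection("users")
  .doc(uid)
  .collection("notifications")
  .get({ source: "server" })   // "default" | "server" | "cache"
  .then((querySnapshot) => {
    // Data here comes from the server, not the local cache.
  });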
If you are online, get() checks the server for the latest data, which can take between 300ms and 1500ms depending on several factors. For example, where is your Firestore instance located in comparison to your Cloud Function and client? Try adjusting the delay and see if you can identify the timing.
There are also some soft limits you should be aware of as this might also impact your timings for how quickly you can retrieve the data. There is a maximum sustained write rate to a document of 1 per second. Sustaining a write rate above once per second increases latency and causes contention errors.
As for the documentation:
When you set a listener, Cloud Firestore sends your listener an initial snapshot of the data, and then another snapshot each time the document changes.
It seems that you are initially receiving the snapshot of the data, and then the following updates, as expected.
You can check some possible solutions to this in this post.

Contention-friendly database architecture for large documents and inner arrays

Context
I have a database with a collection of documents using this schema (shortened schema because some data is irrelevant to my problem):
{
  title: string;
  order: number;
  ...
  ...
  ...
  modificationsHistory: HistoryEntry[];
  items: ListRow[];
  finalItems: ListRow[];
  ...
  ...
  ...
}
These documents can easily reach 100 or 200 kB, depending on the amount of items and finalItems that they hold. It's also very important that they are updated as fast as possible, with the smallest bandwidth usage possible.
This is inside a web application context, using Angular 9 and @angular/fire 6.0.0.
Problems
When the end user edits one item inside the object's items array, like editing just a single property, reflecting that inside the database requires me to send the entire object, because Firestore's update method doesn't support array indexes inside the field path; the only operations that can be done on arrays are adding or deleting an element, as described in the documentation.
However, updating an element of the items array by sending the entire document creates poor performance for anyone without a good connection, which is the case for a lot of my users.
Second issue is that having everything in realtime inside one document makes collaboration hard in my case, because some of these elements can be edited by multiple users at the same time, which creates two issues:
Some write operations may fail due to too much contention on the document if two updates are made in the same second.
The updates are not atomic, since we're sending the entire document at once and don't use transactions (to avoid using even more bandwidth).
Solutions I already tried
Subcollections
Description
This was a very simple solution: create a subcollection for items, finalItems and modificationsHistory arrays, making them easy to edit as they now have their own ID so it's easy to reach them to update them.
Why it didn't work
Having a list with 10 finalItems, 30 items and 50 entries inside modificationsHistory means that I need a total of 4 listeners open for one element to be listened to entirely. Considering that a user can have many of these elements open at once, having several dozen documents being listened to creates an equally bad performance situation, probably even worse with a realistic number of users.
It also means that if I want to update a big element with 100 items and I want to update half of them, it'll cost me one write operation per item, not to mention the read operations needed to check permissions, etc., probably 3 per write, so 150 reads + 50 writes just to update 50 items in an array.
Cloud Function to update the document
const { applyPatch } = require('fast-json-patch');

function applyOffsets(data, entries) {
  entries.forEach(customEntry => {
    const explodedPath = customEntry.path.split('/');
    explodedPath.shift();
    let pointer = data;
    for (let fragment of explodedPath.slice(0, -1)) {
      pointer = pointer[fragment];
    }
    pointer[explodedPath[explodedPath.length - 1]] += customEntry.offset;
  });
  return data;
}

exports.updateList = functions.runWith(runtimeOpts).https.onCall((data, context) => {
  const listRef = firestore.collection('lists').doc(data.uid);
  return firestore.runTransaction(transaction => {
    return transaction.get(listRef).then(listDoc => {
      const list = listDoc.data();
      try {
        const [standard, custom] = JSON.parse(data.diff).reduce((acc, entry) => {
          if (entry.custom) {
            acc[1].push(entry);
          } else {
            acc[0].push(entry);
          }
          return acc;
        }, [[], []]);
        applyPatch(list, standard);
        applyOffsets(list, custom);
        transaction.set(listRef, list);
      } catch (e) {
        console.log(data.diff);
      }
    });
  });
});
Description
Using a diff library, I was making a diff between the previous document and the new updated one, and sending this diff to a GCF that performed the update using the transaction API.
The benefit of this approach is that since the transaction happens inside the GCF, it's fast and doesn't consume much bandwidth, plus the update only requires a diff to be sent, not the entire document anymore.
Why it didn't work
In reality, the Cloud Function was really slow and some updates took over 2 seconds; they could also fail due to contention without the Firestore connector knowing it, so there was no way to ensure data integrity in this case.
This will be edited accordingly to add more solutions if I find other things to try.
Question
I feel like I'm missing something, like if firestore had something I just didn't know at all that could solve my use case, but I can't figure out what it is, maybe my previously tested solutions were badly implemented or I missed something important. What did I miss? Is it even possible to achieve what I want to do? I am open to data remodeling, query changes, anything, as it's mostly for learning purpose.
You should be able to reduce the bandwidth required to update your documents by using Maps instead of Arrays to store your data. This would allow you to send only the item that is being updated using its key.
I don't know how involved this would be for you to change, but it sounds like less work than the other options.
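As a rough sketch of what that could look like with the Admin SDK, assuming items becomes a map keyed by an item id (the helper name and the exact field are illustrative):

const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();

// With items stored as a map keyed by item id, updating a single property
// only sends that one field path instead of the whole document.
async function updateItemTitle(listId, itemId, newTitle) {
  // Dot notation reaches into the nested map without rewriting the rest of it.
  await db.collection("lists").doc(listId).update({
    [`items.${itemId}.title`]: newTitle,
  });
}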
You said that it's not impossible for your documents to reach 200kb individually. It would be good to keep in mind that Firestore limits document size to 1mb. If you plan on supporting documents beyond that, you will need to find a way to fragment the data.
Regarding your contention issues... You might consider a system that "locks" the document and prevents it from receiving updates while another user is attempting to save. You could use a simple message system built with websockets or Firebase FCM to do this. A client would subscribe to the document's channel, and publish when they are attempting an update. Other clients would then receive a notice that the document is being updated and have to wait before they can save their own changes.
Also, I don't know what the contents of modificationsHistory look like, but that sounds to me like the type of data that you might keep in a subcollection instead.
Of the solutions you tried, the subcollection seems like the most scalable to me. You could look into the possibility of not using onSnapshot listeners and instead create your own event system to notify clients of changes. I suppose it could work similar to the "locking" system I mentioned above. A client sends an event when it updates an item belonging to a document. Other clients subscribed to that document's channel will know to check the database for the newest version.
Your diff-approach appeared mostly sensible, details aside.
You should store items inline, but defer modificationsHistory into a subcollection. On the root document, record which elements of modificationsHistory have already been merged (a timestamp should suffice); all elements not yet merged have to be re-applied individually on each client, querying with the aforementioned timestamp.
Each entry in modificationsHistory should not describe a single diff, but whenever possible a set of diffs.
Apply changes from the modificationsHistory collection onto items in batches, deferred via a GCF (see the sketch below). You may defer this arbitrarily far, and you may want to exclude modifications performed in the last few seconds, to account for consistency that hasn't yet settled in Firestore. That way, there is no risk of contention.
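A minimal sketch of that write path and deferred merge, assuming a modificationsHistory subcollection of patch documents and a lastMergedAt field on the root document (all names are illustrative):

const admin = require("firebase-admin");
const { applyPatch } = require("fast-json-patch");
admin.initializeApp();
const db = admin.firestore();

// Write path: clients only append patches, so the root document sees no contention.
function appendPatch(listId, ops) {
  return db.collection("lists").doc(listId)
    .collection("modificationsHistory")
    .add({ ops, createdAt: admin.firestore.FieldValue.serverTimestamp() });
}

// Deferred merge (e.g. run from a scheduled GCF): fold all patches newer than
// lastMergedAt into the root document, skipping the most recent few seconds.
function mergePatches(listId) {
  const listRef = db.collection("lists").doc(listId);
  return db.runTransaction(async (tx) => {
    const listDoc = await tx.get(listRef);
    const list = listDoc.data();
    const cutoff = admin.firestore.Timestamp.fromMillis(Date.now() - 5000);
    const pending = await tx.get(
      listRef.collection("modificationsHistory")
        .where("createdAt", ">", list.lastMergedAt || new admin.firestore.Timestamp(0, 0))
        .where("createdAt", "<=", cutoff)
        .orderBy("createdAt", "asc")
    );
    pending.docs.forEach((doc) => applyPatch(list, doc.data().ops));
    list.lastMergedAt = cutoff;
    tx.set(listRef, list);
  });
}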
Cleanup from the modificationsHistory collection has to be deferred even further, until you can be sure that no client has still access to an older revision of the root document. Especially if you consider that the client is not strictly required to update the root document when the listener is triggered.
You may need to reconstruct the patch stack on the client side if modificationsHistory changes in unexpected ways due to eventual consistency constraints. E.g. if you have a total order in the set of patches, you need to re-apply the patch stack from base image if the collection unexpectedly suddenly contains "older" patches unknown to the client before.
All in all, you should be able to avoid frequent updates of the root document altogether, and limit writes solely to inserts into the modificationsHistory sub-collection, with bandwidth requirements not exceeding the cost of fetching the entire document once, plus streaming the collection of not-yet-applied patches. No contention is expected.
You can tweak how long clients may ignore hard updates to the root document, and how many changes they may batch client-side before submitting a new diff. The latter is also a tradeoff with regard to how many documents another client has to fetch initially, given max-documents-per-query limits.
If you require other information which is likely to suffer from contention, like the list of users currently having a specific document open, that should go into sub-collections as well.
Should the latency for seeing changes by other users eventually turn out to be unacceptable, you may opt for an additional, real-time capable data channel for distributing patches on a specific document: ActiveMQ or some other message broker operated on dedicated resources, running independently from Firestore.

Saving the same document twice concurrently will override the other

Saving the same document twice concurrently will only save one.
I have this flow in my app:
doc.money = 0
get doc (flow 1)
get doc (flow 2)
change doc.money += 5 (flow 1)
change doc.money += 10 (flow 2)
save doc (flow 1)
save doc (flow 2)
Now my doc.money is equal to 10 instead of 15.
How can I fix this problem? Not even an error is thrown.
An update with $inc (e.g. { $inc: { money: 5 } }) can't be used in my app because of this:
Logic.js (shared both on client and on server):
var logic = function(doc, options) {
  doc.a = options.x;
  // Some very complex logic here...
};
Server.js
// incoming ajax request
// query database and get a doc
logic(doc, options)
doc.save(...)
Client.js
// I have my doc
logic(doc, options);
// Now I have my logic applied
Benefits?
I only have to write the logic of my app once (logic.js).
No bugs caused by forgetting to update some part of the logic.
Classic way
Server.js
// incoming ajax request
// query database and get a doc
// Some very complex logic here...
var update = {/*insert here the complex part*/}
Doc.update(cond, update, ...)
Client.js
// I have my doc
// Some very complex logic here...
// Now I have my logic applied
Conclusions
As you can see, in the classical way you have your logic twice; in my way you have it only once, and changes are reflected in both the client-side and the server-side logic.
This actually has nothing to do with two-phase commits; it is a versioning problem.
Two separate threads in your application are sending two different versions of the same document down.
The best way to fix this in any database, including ACID ones, is to use versioning: http://askasya.com/post/trackversions
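A minimal sketch of that versioning idea with the Node.js MongoDB driver (the version field and the helper name are illustrative, not part of your schema):

// Optimistic concurrency via a version counter on the document.
async function saveWithVersion(collection, id, applyLogic) {
  for (;;) {
    const current = await collection.findOne({ _id: id });
    const updated = applyLogic({ ...current });            // run your shared logic.js here
    const { _id, version, ...fields } = updated;
    const result = await collection.updateOne(
      { _id: id, version: current.version },               // matches only if nobody saved in between
      { $set: { ...fields, version: current.version + 1 } }
    );
    if (result.matchedCount === 1) return updated;         // our read was still current
    // Another writer won the race: loop, re-read, and re-apply the logic.
  }
}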
This is called a race condition, and it's tricky to solve in MongoDB as opposed to typical SQL databases. There is a solution (or rather a hack) in the cookbook.
Basically, within the document you have a state key, and for every transaction you keep tabs on it. For example, if the state is ready, you can perform work on the document, but first you change the state to pending. Once done, you set it back to ready again. So whichever process gets to it first changes the state and saves it, and only then can the next process work on it. You can extend the idea and make it more fail-safe. Have a look at the cookbook link.
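A rough sketch of that state-key idea with the Node.js MongoDB driver (collection and field names are illustrative):

// Claim the document by atomically flipping its state, do the work, then release it.
async function claimAndProcess(collection, id, work) {
  const claimed = await collection.updateOne(
    { _id: id, state: "ready" },                  // only matches if nobody else claimed it
    { $set: { state: "pending" } }
  );
  if (claimed.matchedCount === 0) return false;   // another process is working on it
  const doc = await collection.findOne({ _id: id });
  await work(doc);                                // apply your changes and save them
  await collection.updateOne({ _id: id }, { $set: { state: "ready" } });
  return true;
}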

Atomic get and delete in memcached?

Is there a way to do atomic get-and-delete in memcached?
In other words, I want to get the value for a key if it exists and delete it immediately, so this value can be read once and only once.
I think this pseudocode might work, but note the caveat postscript:
# When setting:
SET key-0 value
SET key-ns 0
# When getting:
ns = INCR key-ns
GET key-{ns - 1}
Constraint: I have millions of keys that could be accessed millions of times, and only a small percentage will have a value set at any given time. I don't want to have to update an atomic counter for every key with every get access request as above.
The canonical, yet generic, answer to your question is: a lock-free hash table with a relaxed memory model.
The more relaxed your memory model is, the more you gain from a good lock-free design; it's a way to get more performance out of the same chipset.
Here is a talk about that. I don't think it's possible to answer your question in a single post on hash tables and lock-free programming, and I'm not even trying to do that.
You cannot do this with memcached in a single command, since there is no API that supports exactly what you're asking for. What I would do to get the behavior you're looking for is implement some sort of marking scheme to signify that another client has or hasn't read the data. For example, you could create a JSON document as follows:
{
  "data": "value",
  "used": false
}
When you get the item, check whether it has already been used by another client by examining the used field. If it hasn't been used, then set the value using the cas you got from the get command and make sure that the document is updated to reflect the fact that a client has already accessed this key.
If the set operation fails because the cas is invalid, this means that another client has obtained this item and already updated it in memcached to signify that it has been used. In this case you just cancel whatever you were doing with the item and move on.
If the set operation succeeds, this means your client is the sole owner of this data. You can now delete it from memcached and do whatever processing on it you like.
Note that when doing the set I would also add an expiration time of about 5 seconds. This way, if your application crashes, the documents will clean themselves up if you don't finish the entire process of deleting them.
To put some code behind the answer from @mikewied, I think the basic gist is as follows (using Node.js):
var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

var getOnce = function(key, callback) {
  // gets is the check-and-set get (vs regular get)
  memcache.gets(key, function(err, data) {
    if (!data) {
      // Cache miss, nothing to see here.
      callback(null);
    } else {
      var yourData = data[key];
      // Do a check-and-set to remove the data from the cache.
      // This sets the value to null *only* if no one else already did.
      memcache.cas(key, null /* new data */, data.cas, 10, function(err) {
        if (err) {
          // Check-and-set failed! (Here we'll treat it like a cache miss)
          yourData = null;
        }
        callback(yourData);
      });
    }
  });
};
I'm not an expert on Memcached and so I may be wrong. My answer is from reading the documentation and my experience using Memcached.
IMO this is not possible with memcached's current implementation.
To show why this is not currently possible, here is a simple example demonstrating the race condition:
two processes start at the same time
both execute a get/delete at the same time
memcached replies to both get commands at the same time
done (the desired result was for the first get/delete to execute atomically and the second get/delete to fail; instead memcached did get, get, delete, fail to delete)
To get an atomic get/delete would require:
a new command for memcached that is atomic, let's call it get_delete
some sort of synchronization lock method across all the memcached clients to ensure both the get and delete commands are executed while the lock is held
so all clients would grab the synchronization lock whenever they need to enter the critical section (i.e. get, delete), then release the lock after the critical section.