Does Firestore ever purge its own document cache? - google-cloud-firestore

Assume every time an app launches, a listener is attached to a certain document in Firestore. Is it safe to assume that this document will always be available in the cache should the app ever go offline given that a listener is always attached to it? Does Firestore cache some documents with higher priority like this one because they are always accessed? And does Firestore ever purge this cache behind the scenes?
I ask because I have the option to back up this data to disk on the client but wonder if it's even necessary. When would this backup ever really be used?

Documents do not stay in cache forever. The cache has a maximum size, and if a new document in cache would cause that limit to be exceeded, old documents will be evicted from cache. The cache could also be purged by the user if they clear the app's data. According to the documentation:
When persistence is enabled, Cloud Firestore caches every document received from the backend for offline access. Cloud Firestore sets a default threshold for cache size. After exceeding the default, Cloud Firestore periodically attempts to clean up older, unused documents. You can configure a different cache size threshold or disable the clean-up process completely.
If you must have a document available locally at all times, you should implement your own persistent storage to get that guarantee.
To read more about how the cache works, read this post.
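To make the eviction behavior concrete, here is a minimal Java sketch of a bounded cache that drops its least recently used entry once a limit is exceeded. It is a toy model, not Firestore's implementation: the class name BoundedDocumentCache and the entry-count limit are assumptions for illustration (the real threshold is a size in bytes, and clean-up runs periodically rather than on every insert).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy cache: evicts the least recently used entry once maxEntries is exceeded.
class BoundedDocumentCache extends LinkedHashMap<String, String> {
    private final int maxEntries;

    BoundedDocumentCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in least-recently-used order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Called by put(); returning true removes the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedDocumentCache cache = new BoundedDocumentCache(2);
        cache.put("users/alice", "{...}");
        cache.put("users/bob", "{...}");
        cache.get("users/alice");          // touching alice makes bob the eldest entry
        cache.put("users/carol", "{...}"); // limit exceeded, so bob is evicted
        System.out.println(cache.keySet());
    }
}
```

In this model a document that is read on every launch tends to survive, but nothing guarantees it, which is why a separate on-disk backup is the only way to get a hard guarantee.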

Related

General question: Firestore offline persistence and synchronization

I could not find detailed information in the documentation. I have several questions regarding the offline persistence of Firestore.
I understood that Firestore locally caches everything and syncs back once online. My questions:
If I attach an onCompleteListener to my setDocument method, it only fires when the device is online and has network access. But with offline persistence enabled, how can I detect that data has successfully been written to the cache (is it always successful?!) - I see the data is immediately there without any listener ever triggering.
What if I write data to the cache while the device is offline, and then it comes back online and everything gets synced? What if some error happens at that point (so the onSuccessListener would contain an error, but the persistence cache already has the data)? How do I know that offline and online data are ALWAYS in sync once the network connection is restored on all devices?
What about race conditions? Let's say two users update a document at the "same time" while the device is offline. What happens once it comes back online?
But the most pressing question is: right now I continue with my program flow when the onSuccessListener fires, but it never does as long as the device is offline (showing an indefinite progress bar forever). I still need to continue with my program (that's why we have offline persistence). How do I do this?
How can I detect that data has successfully been written to the cache
This is the case when the statement that writes the data has completed. If writing to the local cache fails, an exception is thrown from that write statement.
Your second point is hard to summarize, but:
Firestore keeps the pending writes separate from the snapshots it returns for local reads, and will update the cached snapshot correctly both for successful and for rejected writes.
If you want to know whether the snapshot you read contains any pending writes, you can check the hasPendingWrites property in its metadata.
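The separation between pending writes and the last server-confirmed state can be pictured with a small Java model. This is only a sketch under stated assumptions, not the Firestore SDK: the class OfflineClientModel, its methods, and the accept/reject predicate are all hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical toy client: models local reads that overlay pending writes on confirmed data.
class OfflineClientModel {
    private final Map<String, String> confirmed = new HashMap<>(); // last state acknowledged by the server
    private final List<String[]> pending = new ArrayList<>();      // local writes not yet acknowledged

    // The local write is complete as soon as this method returns, online or not.
    void set(String path, String value) {
        pending.add(new String[] {path, value});
    }

    // Local reads see the confirmed value with any pending writes applied on top.
    String read(String path) {
        String value = confirmed.get(path);
        for (String[] write : pending) {
            if (write[0].equals(path)) value = write[1];
        }
        return value;
    }

    boolean hasPendingWrites(String path) {
        for (String[] write : pending) {
            if (write[0].equals(path)) return true;
        }
        return false;
    }

    // Called when connectivity returns: accepted writes become confirmed, rejected ones are dropped.
    void sync(BiPredicate<String, String> serverAccepts) {
        for (String[] write : pending) {
            if (serverAccepts.test(write[0], write[1])) confirmed.put(write[0], write[1]);
        }
        pending.clear();
    }

    public static void main(String[] args) {
        OfflineClientModel client = new OfflineClientModel();
        client.set("notes/1", "draft");                         // device is offline
        System.out.println(client.read("notes/1"));             // the local read already sees the write
        System.out.println(client.hasPendingWrites("notes/1")); // still waiting for the server
        client.sync((path, value) -> false);                    // back online, the server rejects the write
        System.out.println(client.read("notes/1"));             // the rejected write is no longer visible
    }
}
```

For the progress-bar problem in the question, the practical consequence is to continue the program flow right after the write call returns and to treat the server acknowledgement as a later, optional signal.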
What about race conditions? Let's say two users update a document at the "same time" while the device is offline. What happens once it comes back online?
The last write wins. If that's not what you need, use security rules to enforce your requirements on the server.
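As a rough illustration of "last write wins", the following Java sketch uses a hypothetical in-memory ToyServer (not a Firestore API) that overwrites a document with whatever write reaches it most recently. Note the assumption that "last" means the last write to arrive at the server, not the last edit by wall-clock time on the devices.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the server; not a Firestore API.
class ToyServer {
    private final Map<String, String> documents = new HashMap<>();

    // Each incoming write simply replaces the stored value: no merge, no conflict check.
    void apply(String path, String value) {
        documents.put(path, value);
    }

    String get(String path) {
        return documents.get(path);
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        // Both users edited "docs/1" while offline; their devices reconnect one after the other.
        server.apply("docs/1", "edit from user A"); // A reconnects first
        server.apply("docs/1", "edit from user B"); // B reconnects second and overwrites A
        System.out.println(server.get("docs/1"));
    }
}
```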

Only commit changes to Firestore on app/widget close

As Firestore charges by the read/write, it would be super helpful to keep the changes in memory during the session and only commit them when the user exits either the entire app or a specific section. Is there a way to do that in a Flutter web application?
I think one problem with this approach is that the user might just close the tab containing your app. In this case, you have no time to send your data to Firestore.
This aside, you could use packages like Hive to store your documents offline and run a function later to add the data to Firestore.
You also get 50k reads and 20k writes per day for free with Firebase, which is sufficient for smaller apps. If you exceed this limit, your app is probably big enough to earn money with it anyway.
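The buffering idea can be sketched independently of Flutter or Hive. The Java class below is a hypothetical illustration (WriteBuffer, stage, and flush are invented names): it keeps only the latest value per document in memory and pushes each dirty document once when flushed, so repeated edits to the same document cost a single remote write.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical in-memory buffer that coalesces repeated edits to the same document.
class WriteBuffer {
    private final Map<String, String> dirty = new LinkedHashMap<>();

    // Later edits to the same path replace earlier ones, so only the final value is kept.
    void stage(String path, String value) {
        dirty.put(path, value);
    }

    // Sends each dirty document once through remoteWrite and returns the number of writes issued.
    int flush(BiConsumer<String, String> remoteWrite) {
        int count = dirty.size();
        dirty.forEach(remoteWrite);
        dirty.clear();
        return count;
    }

    public static void main(String[] args) {
        WriteBuffer buffer = new WriteBuffer();
        buffer.stage("settings/theme", "light");
        buffer.stage("settings/theme", "dark"); // replaces the previous edit
        buffer.stage("profile/name", "Ada");
        int writes = buffer.flush((path, value) -> System.out.println("write " + path + " = " + value));
        System.out.println(writes + " remote writes issued");
    }
}
```

The caveat from the answer still applies: anything still sitting in the buffer is lost if the tab is closed before flush runs.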

Keep Firestore collection available offline and sync it on foreground

What is the best approach to keep a collection's data (0 - 100 docs) available offline and sync it on app startup if a connection is available?
UPD: I'm looking at setPersistenceEnabled, but is there any guarantee my collection will be cached after the first retrieval?
If you call setPersistenceEnabled(true), these documents will be available offline and synced when the connection is available again (not necessarily at app startup). Check the documentation:
https://firebase.google.com/docs/database/android/offline-capabilities
By enabling persistence, any data that the Firebase Realtime Database client would sync while online persists to disk and is available offline, even when the user or operating system restarts the app. This means your app works as it would online by using the local data stored in the cache. Listener callbacks will continue to fire for local updates.
Also be aware that when you use this, your listeners will be called TWICE at app startup: once with the offline (cached) data and once with the online data (read from Firebase).

How to perform transactions in firestore when user is offline?

I'm creating a multi-page application where I need to create and store transactions even when the users are offline. How do I achieve this using Firestore? I also need some ideas on how to persist the data received from Firestore locally.
You can't run transactions when offline, but if you think that your data has not changed while you were offline, you can get the data from the cache and update it there using dbRef.addSnapshotListener(MetadataChanges.INCLUDE) and dbRef.update().
How to perform transactions in Firestore when the user is offline?
You cannot! Transactions are not supported for offline use, they can't be cached or saved for later. This is because a transaction absolutely requires round trip communications with the server in order to ensure that the code inside the transaction completes successfully. So you can use transaction only while online because the transactions are network dependent.
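To see why a transaction cannot be queued offline, consider this minimal Java sketch of a read-check-commit loop. It is a toy model and not Firestore's implementation: VersionedCounter and its methods are hypothetical, and the version check is an assumption standing in for whatever validation the server performs. The point is that every attempt needs a fresh read from, and a commit to, the authoritative copy.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for a server-side document with a version number.
class VersionedCounter {
    private int value;
    private long version;

    // Returns {value, version} as seen by the server right now.
    synchronized long[] read() {
        return new long[] {value, version};
    }

    // The commit succeeds only if nobody changed the document since it was read.
    synchronized boolean commitIfUnchanged(long expectedVersion, int newValue) {
        if (version != expectedVersion) return false;
        value = newValue;
        version++;
        return true;
    }

    // Read, compute, try to commit; retry with a fresh read when the commit is refused.
    int runTransaction(IntUnaryOperator update, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long[] snapshot = read();                        // round trip 1: fetch current state
            int next = update.applyAsInt((int) snapshot[0]);
            if (commitIfUnchanged(snapshot[1], next)) {      // round trip 2: conditional commit
                return next;
            }
        }
        throw new IllegalStateException("transaction failed after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        VersionedCounter counter = new VersionedCounter();
        long[] stale = counter.read();                                  // an old snapshot
        System.out.println(counter.runTransaction(v -> v + 1, 3));      // commits against fresh state
        System.out.println(counter.commitIfUnchanged(stale[1], 42));    // the stale snapshot is refused
    }
}
```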
Also I need some idea on how to persist the data received from the Firestore locally.
According to the official documentation of Cloud Firestore regarding offline persistence:
For Android and iOS, offline persistence is enabled by default. To disable persistence, set the PersistenceEnabled option to false.
For the web, offline persistence is disabled by default. To enable persistence, call the enablePersistence method. Cloud Firestore's cache isn't automatically cleared between sessions. Consequently, if your web app handles sensitive information, make sure to ask the user if they're on a trusted device before enabling persistence.
Important: For the web, offline persistence is an experimental feature that is supported only by Chrome, Safari, and Firefox web browsers. Also, if a user opens multiple browser tabs that point to the same Cloud Firestore database, and offline persistence is enabled, Cloud Firestore will work correctly only in the first tab.
Edit:
The Firestore SDK for Android has a local cache that's enabled by default. So all read operations will come from the cache when there is no connectivity. So Firestore provides this feature to handle offline data. This means that if the user tries to add/delete documents while offline, every operation is added to a queue. Once the user regains the connection, every change that is made while offline will be updated on Firebase servers. In other words, all queries will be committed on the server.
Please also note that when you are offline, pending writes that have not yet been synced to the server are held in a queue. If you do too many write operations without going online to sync them, that queue will grow fast, and it will slow down not only the write operations but also your read operations. So I suggest using this database for its online capabilities.

Caching strategy to reduce load on web application server

What is a good tool for applying a layer of caching between a webserver and an application server?
Basic Requirements:
The application server needs a way to remove items from the cache and put items in the cache with an expiration date.
The webserver needs a way to pull items out of the cache in a very light-weight, fast manner without requiring thread allocation on the application server.
It does not necessarily need to be a distributed cache (accessible from multiple machines), but it wouldn't hurt.
Strategies I have considered:
Static file caching. Request comes in, gets hashed, if a file exists we serve it, if not we route the request to the app server. Would high I/O or file locking due to concurrency be a problem? Is it accurate that the file system is actually very fast due to kernel-level caching in memory?
Using a key-value DB like mongodb, or redis. This would store the finished HTML/JSON fragments in db. The webserver would be equipped to read from the DB and route to the app server if needed. The app server would be equipped to insert/remove from the DB.
A memory cache like memcached or Varnish (don't know much about Varnish). My only concern with memcached is that I'm going to want to cache 3 - 10 gigabytes of data at any given time, which is more than I can safely allocate in memory. Does memcached have a method to spill to the filesystem?
Any thoughts on some techniques and pitfalls when trying this type of caching layer?
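The static file strategy described above (hash the request, serve the file if it exists, otherwise route to the app server) can be sketched in Java as follows. This is a minimal illustration under stated assumptions: FileBackedCache, inTempDir, fetch, and invalidate are hypothetical names, the app server is modeled as a plain function, and expiration is left out. New entries are written to a temporary file and then moved into place atomically, which is one way to avoid readers seeing half-written files without explicit locking.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

// Hypothetical file-backed cache keyed by a hash of the request.
class FileBackedCache {
    private final Path dir;

    FileBackedCache(Path dir) {
        this.dir = dir;
    }

    // Convenience factory for experiments: stores entries in a fresh temporary directory.
    static FileBackedCache inTempDir() {
        try {
            return new FileBackedCache(Files.createTempDirectory("page-cache"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps a request key to a file named after its SHA-256 hash.
    private Path fileFor(String requestKey) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(requestKey.getBytes(StandardCharsets.UTF_8));
            return dir.resolve(String.format("%064x", new BigInteger(1, digest)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serves the cached file if present; otherwise asks the app server and stores the result.
    String fetch(String requestKey, Function<String, String> appServer) {
        Path file = fileFor(requestKey);
        try {
            if (Files.exists(file)) {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            String body = appServer.apply(requestKey);
            Path tmp = Files.createTempFile(dir, "entry", ".tmp");
            Files.write(tmp, body.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE); // publish the finished file in one step
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lets the app server remove an entry, e.g. after the underlying data changed.
    void invalidate(String requestKey) {
        try {
            Files.deleteIfExists(fileFor(requestKey));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FileBackedCache cache = FileBackedCache.inTempDir();
        System.out.println(cache.fetch("/home", key -> "<h1>rendered by app server</h1>"));
        System.out.println(cache.fetch("/home", key -> "not called, served from file"));
    }
}
```

Expiration could be layered on top by comparing the file's last-modified time against a time-to-live before serving it; that part is omitted here.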
You can also use GigaSpaces XAP in memory data grid for caching and even hosting your web application. You can choose just the caching option or combine the power of two and gain single management of your environment along other things.
Unlike the key-value pair approach you suggested, with GigaSpaces XAP you'll be able to run complex queries such as SQL, object-based templates and much more. For your caching scenario you should check out the local cache related features more specifically.
Local Cache
Web Container
Disclaimer: I am a developer at GigaSpaces.
Eitan
Just to answer this from the POV of using Coherence (http://coherence.oracle.com/):
1. The application server needs a way to remove items from the cache and put items in the cache with an expiration date.
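// put one item in the cache with an expiry (ttlMillis is a placeholder for the time-to-live in milliseconds)
cache.put(key, value, ttlMillis);
// remove one item from cache
cache.remove(key);
// remove multiple items from cache
cache.keySet().removeAll(keylist);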
2. The webserver needs a way to pull items out of the cache in a very light-weight, fast manner without requiring thread allocation on the application server.
// access one item from cache
Object value = cache.get(key);
// access multiple items from cache
Map mapKV = cache.getAll(keylist);
3. It does not necessarily need to be a distributed cache (accessible from multiple machines), but it wouldn't hurt.
Elastic. Just add nodes. Auto-discovery. Auto-load-balancing. No data loss. No interruption. Every time you add a node, you get more data capacity and more throughput.
Automatic high availability (HA). Kill a process, no data loss. Kill a server, no data loss.
A memory cache like memcached or Varnish (don't know much about Varnish). My only concern with memcached is that I'm going to want to cache 3 - 10 gigabytes of data at any given time, which is more than I can safely allocate in memory. Does memcached have a method to spill to the filesystem?
Use both RAM and flash. Transparently. Easily handle 10s or even 100s of gigabytes per Coherence node (e.g. up to a TB or more per physical server).
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.