I am using Swift and Firestore, and my application has a snapshot listener that retrieves data every time certain documents change. As I expect this to happen many times a second, I would like to limit the snapshot listener to retrieving data once every 2 seconds, say. Is this possible? I looked everywhere but could not find anything.
Cloud Firestore stores your data in multiple data centers, and only confirms the write operations once it's written to all of those. For this reason the maximum update frequency of a single document in Cloud Firestore is roughly once per second. So if your plan is to update a document many times per second, that won't work anyway.
There is no way to set a limit on how frequently Firestore broadcasts out updates to the underlying data. If the data gets updated, it is broadcast out to all active listeners.
The typical solution is to limit how frequently you update the data. If nobody is going to see a significant chunk of the updates, you might as well not write them to the database. This sort of logic is often accomplished with a client-side throttle or debounce.
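As a rough illustration of the throttle idea, here is a minimal Python sketch (the same pattern carries over to Swift). The `Throttle` class, its `write` callback, and the injectable `clock` are all hypothetical names introduced for this example; none of them are part of any Firestore SDK.

```python
import time
from typing import Any, Callable, Optional


class Throttle:
    """Let at most one value through per `interval` seconds; drop the rest."""

    def __init__(self, interval: float, write: Callable[[Any], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.write = write    # hypothetical: whatever persists or displays the value
        self.clock = clock    # injectable so the behaviour can be tested
        self._last: Optional[float] = None

    def submit(self, value: Any) -> bool:
        """Write `value` if the interval has elapsed; return True if written."""
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False      # too soon since the last write: skip this update
        self._last = now
        self.write(value)
        return True
```

Note that this simple version drops skipped values entirely. If the most recent value must always be delivered eventually, a debounce (or a throttle with a trailing call) would be needed instead.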
Related
I am using Google Cloud Functions, and all read and write operations on Firestore are performed through these Cloud Functions. We are seeing an unexpectedly high number of read operations on Firestore, and I am unable to figure out where they come from.
No more than 20K documents are generated per day, but the daily read count is usually more than 25,000,000.
What I am looking for is a way to identify the root cause of this high number of reads in the Cloud Functions.
To start with, I have captured the size of the results of all the Firestore get() calls in the Cloud Functions. But the sum of all those sizes is far lower than the read count I mentioned above.
Need suggestions on ways/practices to identify the source from where these high reads are generating.
You can use a SnapshotListener as a workaround, which lets you listen for changes in real time.
If the listener is disconnected for more than 30 minutes, you are charged for reads as if you had issued a new query. In the worst case, if the listener is disconnected every 31 minutes, you are charged for the full result set each time (50 reads for a query that returns 50 documents).
As a result, this technique is only practical when the listener is not frequently disconnected.
According to the documentation, you can also reduce the number of reads using get().
To each document in the collection, add a new property named lastModified of type Date. Whenever you create a new document or edit an existing one, use FieldValue.serverTimestamp() to set or update the field's value. You can then query for only the documents whose lastModified is later than your previous fetch, so you are charged only for the documents that changed.
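The lastModified approach can be sketched as a small client-side cache that only asks for documents changed since the previous sync. This is a sketch under assumptions: `IncrementalCache` and `fetch_changed_since` are hypothetical names, and in a real application `fetch_changed_since` would run a Firestore query filtering on `lastModified` and call get() on it.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Tuple

# A changed document is modelled as (doc_id, data); `data` carries lastModified.
Doc = Tuple[str, dict]
EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


class IncrementalCache:
    def __init__(self, fetch_changed_since: Callable[[datetime], Iterable[Doc]]) -> None:
        # Hypothetical callback: returns documents with lastModified > the given time.
        self.fetch_changed_since = fetch_changed_since
        self.docs: Dict[str, dict] = {}
        self.last_sync = EPOCH

    def refresh(self) -> int:
        """Pull only documents modified after the last sync; return how many."""
        count = 0
        for doc_id, data in self.fetch_changed_since(self.last_sync):
            self.docs[doc_id] = data
            # Remember the newest timestamp seen so the next fetch starts there.
            self.last_sync = max(self.last_sync, data["lastModified"])
            count += 1
        return count
```

One caveat with this design: a strict "later than" comparison can miss a document written with exactly the same timestamp as the last one synced, and deleted documents are not detected at all, so deletions would need a separate mechanism such as a soft-delete flag.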
According to the Firebase documentation, you should avoid high read or write rates to lexicographically close documents, or your application will experience contention errors. This problem is known as hotspotting, and it can occur if your application does any of the following:
Creates new documents at a high rate and assigns its own monotonically increasing IDs. Cloud Firestore assigns document IDs using a scatter algorithm, so you should not see hotspotting on writes if you create new documents using automatic document IDs.
Creates new documents at a high rate in a collection with few documents.
Creates new documents at a high rate with a monotonically increasing field, such as a timestamp.
Deletes a large number of documents from a collection.
Writes to the database at a high rate without gradually increasing traffic.
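To illustrate why scattered IDs avoid the first problem, here is a minimal sketch of generating random, non-monotonic document IDs. It is not Firestore's actual algorithm; `scattered_id` is a hypothetical helper, and in practice you would simply let the SDK assign automatic IDs.

```python
import secrets
import string

# Letters and digits only, so the ID is safe to use in a document path.
ALPHABET = string.ascii_letters + string.digits


def scattered_id(length: int = 20) -> str:
    """Return a random ID; consecutive calls are not lexicographically close."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Because each ID is drawn independently, consecutive writes land in unrelated parts of the key range instead of piling up at its end, which is what sequential IDs such as `order-000123`, `order-000124` would do.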
I have a collection that is about 2000 docs in size. I want to stream a List of just their ids.
As follows...
Stream<QuerySnapshot> stream = FirebaseFirestore.instance.collection('someCollection').snapshots();
return stream.map((querySnapshot) => querySnapshot.docs.map((doc) => doc.id).toList());
Will this be a significant performance issue on the client side? Is this an unrealistic approach?
When you query for documents in Firestore using the client SDKs, you are pulling down the entire document every time. There is no way to reduce the amount of data - all of the matching documents are sent across the wire in their entirety.
As such, your use of map() to extract only the document ID has no real effect on performance here, since it runs after the query is complete and you already have all of that data in a snapshot on the client. All you are doing is trimming each document down to a string; you are not saving the cost of transferring the entire document.
If you want to make this faster, you should make the query on a backend (such as Cloud Functions), ideally in the same region as your Firestore instance, and trim the documents in your backend code before you send the data to the frontend. That will save you the cost of unnecessarily transferring the document contents you don't need.
Read: Should I query my database directly or use Cloud Functions?
Performance implications will mostly come from:
The bandwidth consumed for transferring the documents.
The memory used for keeping the DocumentSnapshot objects.
Since you're throwing the document.data() of each snapshot away, that is quite directly where the quickest gains will be. If your documents have few fields, the gains will be small. If they have many fields, the gains will be larger.
If you have a lot to gain by not transferring the fields and keeping them in memory, the main options you have are:
Use a server-side SDK to get only the document IDs, and transfer only that back to the client.
Create secondary/proxy documents that contain only the ID, and no data.
While the first approach is tempting because it reduces data duplication, the second one is typically a lot simpler to implement (as you're only going to be impacting the code that handles data writes).
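The second option can be sketched as follows. This is only an illustration: a plain dictionary stands in for the database, and the `<collection>_ids` naming convention as well as the `write_with_proxy` and `list_ids` helpers are hypothetical.

```python
from typing import Dict, List

# In-memory stand-in for the database: {collection_name: {doc_id: data}}.
Database = Dict[str, Dict[str, dict]]


def write_with_proxy(db: Database, collection: str, doc_id: str, data: dict) -> None:
    """Write the full document and an empty proxy document with the same ID."""
    db.setdefault(collection, {})[doc_id] = data
    # Hypothetical naming convention: "<collection>_ids" holds the proxies.
    db.setdefault(f"{collection}_ids", {})[doc_id] = {}


def list_ids(db: Database, collection: str) -> List[str]:
    """Read IDs from the proxy collection only; full documents are never touched."""
    return sorted(db.get(f"{collection}_ids", {}))
```

With real Firestore writes, the two writes would ideally go into a single batch or transaction so that the proxy collection cannot drift out of sync with the main one.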
If I create multiple onSnapshot listeners for the same document in different places in my code, will I be charged once (one document) or multiple times (once for each listener)?
Does it make sense to write a wrapper around Firestore that does this or is this built-in?
As per documentation:
Cloud Firestore allows you to listen to the results of a query and get realtime updates when the query results change.
When you listen to the results of a query, you are charged for a read each time a document in the result set is added or updated. You are also charged for a read when a document is removed from the result set because the document has changed. (In contrast, when a document is deleted, you are not charged for a read.)
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
What you decide to do afterwards will heavily depend on your use case and your application needs.
When a document is read from Firestore, Firestore does not return the data of any documents it references, so currently I request the data at each reference path separately. Does this increase the number of requests to the server, and eventually decrease performance and increase cost? How is storing references helpful in terms of requesting data from the server?
Reading a document that has a reference counts as a read of that document. Reading the referenced document counts as a read of another document. So in total that is two reads.
There is no hidden cost-inflation here: if the server were to automatically follow the reference, it would also have to read both documents.
If you're looking to minimize the number of documents you read, you can consider adding the minimum data you need from the referenced document into the document containing the reference. For example, if you have a chat app:
you might want to include the display name of each user posting the message in the message itself, so that you don't have to read the user's profile document.
if you do so, you'll have to consider what to do if the user updates their display name. See my answer here for some options: How to write denormalized data in Firebase
the number of users is likely smaller than the number of chat messages (and rather limited in a specific time-frame), making the number of reads of linked documents lower than the number of messages.
by duplicating the data, you may be inflating the bandwidth usage, especially if the number of users is much lower than the number of messages.
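The chat example above can be sketched in a few lines. Plain dictionaries stand in for Firestore documents here, and the field names (`userId`, `displayName`) and both helper functions are hypothetical.

```python
from typing import Dict, List


def build_message(text: str, user_id: str, users: Dict[str, dict]) -> dict:
    """Create a message that embeds the author's display name."""
    return {
        "text": text,
        "userId": user_id,
        # Denormalized copy: readers no longer need the profile document.
        "displayName": users[user_id]["displayName"],
    }


def rename_user(user_id: str, new_name: str,
                users: Dict[str, dict], messages: List[dict]) -> int:
    """Update the profile and fan the new name out to existing messages."""
    users[user_id]["displayName"] = new_name
    updated = 0
    for message in messages:
        if message["userId"] == user_id:
            message["displayName"] = new_name
            updated += 1
    return updated
```

The trade-off is visible in `rename_user`: reads get cheaper because each message is self-contained, but a rename now costs one write per message the user ever posted, unless you decide that old messages may keep the old name.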
What this boils down to is: you're likely optimizing prematurely, but even if not: there's no one-size-fits-all approach. NoSQL data modeling depends on the use-cases of your app, and Firestore is no different.
I am collecting data from a streaming API and I want to create a real-time analytics dashboard. This dashboard will display a simple timeseries plotting the number of documents per hour. I am wondering if my current approach is optimal.
In the following example, on_data is fired for each new document in the stream.
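import simplejson

# Mongo collections (db is assumed to be an existing pymongo Database handle).
records = db.records
stats = db.records.statistics

# Stream callback (a method of the stream listener class), fired once per new document.
def on_data(self, data):
    # Create a JSON document from the raw payload.
    document = simplejson.loads(data)
    # Insert the new document into records.
    records.insert_one(document)
    # Update a counter in records.statistics for the hour this document belongs to.
    # The $inc key is a field name, so document['hour'] must be a string.
    stats.update_one({'hour': document['hour']}, {'$inc': {document['hour']: 1}}, upsert=True)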
The above works. I get a beautiful graph which plots the number of documents per hour. My question is about whether this approach is optimal or not. I am making two Mongo requests per document. The first inserts the document, the second updates a counter. The stream sends approximately 10 new documents a second.
Is there, for example, any way to tell Mongo to keep db.records.statistics in RAM? I imagine this would greatly reduce disk access on my server.
MongoDB uses memory-mapped files to handle file I/O, so it essentially treats all data as if it is already in RAM and lets the OS figure out the details. In short, you cannot force your collection to be in memory, but if the operating system handles things well, the stuff that matters will be. Check out this link to the docs for more info on Mongo's memory model and how to optimize your OS configuration to best fit your use case: http://docs.mongodb.org/manual/faq/storage/
But to answer your issue specifically: you should be fine. Your 10 or 20 writes per second should not be a disk bottleneck in any case (assuming you are running on not-ancient hardware). The one thing I would suggest is to build an index over "hour" in stats, if you are not already doing that, to make your updates find documents much faster.
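The index suggestion could look like the sketch below. It assumes `stats` is a pymongo `Collection` (as in the question); the wrapper function `ensure_hour_index` is a hypothetical name added for illustration.

```python
def ensure_hour_index(stats):
    """Create an ascending index on 'hour' so the upsert filter can use it.

    `stats` is assumed to be a pymongo Collection. create_index does nothing
    if an identical index already exists, so calling this at startup is safe.
    """
    # 1 means ascending order (the same value as pymongo.ASCENDING).
    return stats.create_index([("hour", 1)])
```

Calling `ensure_hour_index(db.records.statistics)` once before the stream starts should be enough. Since the upsert keeps one document per hour, the index stays small.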