How does Cloud Firestore read count work? - google-cloud-firestore

This is an image of my project, where there is only one map1 document,
and inside that document there are multiple map objects.
When I fetch only the map1 document, my read count increases by 25, sometimes by 18.
So my question is: why is it increasing like this?
And second, I read in the Firestore documentation that
the read count increases according to the number of documents returned by a query.

On the second question first:
The read count you refer to is called the document read count. As that name implies, it is incremented by one for every document that is read on the server on your behalf. So if you request a number of documents from the server, you will be charged for that many document reads.
The first question is harder to answer, because we have no way of reproducing the issue based on your post. But the most common cause of unexpected reads for folks new to Firestore is keeping the Firebase console open.
If you have the Firestore console open, it also reads documents, and those are charged document reads too.
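As a concrete illustration, here is a minimal sketch with the google-cloud-firestore Python client; the collection name and document IDs are hypothetical:

from google.cloud import firestore

db = firestore.Client()

# Fetching one document is one charged read, no matter how many
# nested map fields it contains.
doc = db.collection('my-collection').document('map1').get()

# A query is charged one read per document it returns, so this
# line costs up to 10 reads.
docs = list(db.collection('my-collection').limit(10).stream())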

Related

Unexpectedly High no. of Reads

I am using Google Cloud Functions, and read and write operations are performed on Firestore through these Cloud Functions. We are seeing an unexpectedly high number of read operations on Firestore, the source of which I am unable to figure out.
Not more than 20K documents are generated on a daily basis, but the daily read count is usually more than 25,000,000.
What I am looking for is ways to identify the root cause of this high number of reads in the Cloud Functions.
To start with, I have captured the sizes of the results of all the Firestore get() calls in the Cloud Functions. But the sum of all those sizes is far lower than the read count I mentioned above.
I need suggestions on ways/practices to identify the source of these high reads.
You can use a snapshot listener as a workaround, which lets you listen for changes in real time.
If the listener is disconnected for more than 30 minutes, you will be charged for reads as if you had sent a new query. In the worst-case scenario, if the listener is disconnected every 31 minutes, you will be charged for the full result set (say, 50 reads) each time.
As a result, this technique is only practical when the listener is not frequently disconnected.
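For illustration, a minimal listener sketch with the google-cloud-firestore Python client, assuming a hypothetical 'items' collection:

from google.cloud import firestore

db = firestore.Client()

def on_items(col_snapshot, changes, read_time):
    # While the listener stays connected, only changed documents
    # are delivered (and charged) after the initial result set.
    for change in changes:
        print(change.type.name, change.document.id)

watch = db.collection('items').on_snapshot(on_items)
# ... later, when no longer needed:
# watch.unsubscribe()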
According to the documentation, you can also reduce the number of reads by making each get() fetch only the documents that changed since the previous one.
To do that, add a new property named lastModified of type Date to each document in the collection. Whenever you create a new document or edit an existing one, use FieldValue.serverTimestamp() to set or update the field's value; your queries can then filter on lastModified so only changed documents are read.
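A minimal sketch of that pattern in Python, assuming a hypothetical 'items' collection; firestore.SERVER_TIMESTAMP is the Python client's equivalent of FieldValue.serverTimestamp():

from google.cloud import firestore

db = firestore.Client()

def save_item(doc_id, data):
    # Stamp every write so later queries can filter on it.
    data['lastModified'] = firestore.SERVER_TIMESTAMP
    db.collection('items').document(doc_id).set(data, merge=True)

def fetch_changed(last_sync):
    # Only documents modified after last_sync are returned and charged.
    query = db.collection('items').where('lastModified', '>', last_sync)
    return list(query.stream())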
Reviewing the Firebase documentation, I found that high read or write rates to lexicographically close documents should be avoided to prevent contention in your application. This problem is called hotspotting, and it can occur if your application does any of the following:
Creates new documents at a rapid rate and assigns its own monotonically increasing IDs. (Cloud Firestore assigns document IDs using a scatter algorithm, so if you create new documents with automatic document IDs, you should not see hotspotting on writes; see the sketch after this list.)
Creates new documents at a rapid rate in a collection with few documents.
Creates new documents at a rapid rate with a monotonically increasing field, such as a timestamp.
Deletes a large number of documents from a collection.
Writes to the database at a rapid rate without gradually increasing traffic.
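To make the first point concrete, here is a minimal sketch contrasting the two ID strategies, assuming a hypothetical 'events' collection:

from google.cloud import firestore

db = firestore.Client()

# Risky at high write rates: monotonically increasing custom IDs
# land on the same backend range and can hotspot.
db.collection('events').document('event-000001').set({'value': 1})

# Safer: automatic IDs come from a scatter algorithm, which
# spreads the write load.
update_time, ref = db.collection('events').add({'value': 1})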

Reading from the same document multiple times in Firestore

If I had a function that reads the same document from Firestore multiple times, does each read count towards the billed read count?
Or does the SDK use the cached version, so that only a single read is counted?
I forgot to add: this is a question about the Admin SDK in a Cloud Function.
The key thing to realize is that you're charged for every document that is read for you on (and usually downloaded from) the server. So if a document is read from the cache, that usually won't count as a charged document read. But if the client needs to check with the server whether its local copy is up to date (the average document-level get() call), that does lead to a document read charge.
The Admin SDKs don't have a persistent cache, so in general each read would have to reach out to the server - and thus count as a charged document read. But some of it depends on how you actually perform the read operation, so it'll be easier to help if you can show an MCVE for that.
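As a minimal sketch (using the google-cloud-firestore client that the Python Admin SDK wraps; collection and field names are hypothetical):

from google.cloud import firestore

db = firestore.Client()
ref = db.collection('users').document('alice')

snap_a = ref.get()  # charged read #1
snap_b = ref.get()  # charged read #2, even if nothing changed

# Reusing one snapshot avoids repeated charged reads:
snap = ref.get()           # one charged read
name = snap.get('name')    # no extra read
email = snap.get('email')  # no extra read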

Firestore pagination - how to find if there are more data after query (using limit)

We use the ndb datastore in our current Python 2.7 standard environment. We are migrating this application to the Python 3.7 standard environment with Firestore (native mode).
We use pagination on ndb datastore and construct our query using fetch.
query_results , next_curs, more_flag = query_structure.fetch_page(10)
The next_curs and more_flag are very useful to indicate if there is more data to be fetched after the current query (to fetch 10 elements). We use this to flag the front end for "Next Page" / "Previous Page".
We can't find an equivalent of this in Firestore. Can someone help how to achieve this?
There is no direct equivalent in Firestore pagination. What you can do instead is fetch one more document than the N documents that the page requires, then use the presence of the N+1 document to determine if there is "more". You would omit the N+1 document from the displayed page, then start the next page at that N+1 document.
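A minimal sketch of that N+1 pattern in Python, assuming a hypothetical 'items' collection ordered by a 'created' field:

from google.cloud import firestore

db = firestore.Client()
PAGE_SIZE = 10

def fetch_page(cursor_snapshot=None):
    q = db.collection('items').order_by('created').limit(PAGE_SIZE + 1)
    if cursor_snapshot is not None:
        q = q.start_at(cursor_snapshot)  # page begins at the cursor doc
    docs = list(q.stream())
    has_more = len(docs) > PAGE_SIZE
    page = docs[:PAGE_SIZE]  # omit the N+1 document from the display
    next_cursor = docs[PAGE_SIZE] if has_more else None
    return page, next_cursor, has_more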
I built a custom Firestore API not long ago to fetch records with pagination. You can take a look at the repository. This is the story of the learning cycle I went through:
My first attempt was to use limit and offset. This seemed to work like a charm, but then I ran into the issue that fetching something like 200,000 records ended up being very costly: when using offset, Google also charges you reads for all the skipped records before it. The Google Firestore Pricing Page clearly states this:
There are no additional costs for using cursors, page tokens, and limits. In fact, these features can help you save money by reading only the documents that you actually need.
However, when you send a query that includes an offset, you are charged a read for each skipped document. For example, if your query uses an offset of 10, and the query returns 1 document, you are charged for 11 reads. Because of this additional cost, you should use cursors instead of offsets whenever possible.
My second attempt was using a cursor to minimize those reads. I ended up fetching N+1 documents and placing the cursor like so:
from google.cloud import firestore

db = firestore.Client()
collection = 'my-collection'
cursor = 'we3adoipjcjweoijfec93r04'  # N+1th doc id

# Fetch the cursor document, then start the query at it.
snapshot = db.collection(collection).document(cursor).get()
q = db.collection(collection).start_at(snapshot)  # place cursor at this document
docs = q.stream()
Google wrote a whole page on pagination in Firestore. Some useful query methods when implementing pagination:
limit() limits the query to a fixed set of documents.
start_at() includes the cursor document.
start_after() starts right after the cursor document.
order_by() ensures all documents are ordered by a specific field.
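For example, a second page that excludes the cursor document could look like this (again assuming a hypothetical 'items' collection with a 'created' field):

from google.cloud import firestore

db = firestore.Client()

first_page = list(db.collection('items')
                  .order_by('created')
                  .limit(10)
                  .stream())

last_doc = first_page[-1]  # assumes the first page was non-empty
next_page = list(db.collection('items')
                 .order_by('created')
                 .start_after(last_doc)  # excludes the cursor document
                 .limit(10)
                 .stream())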

Firestore: Does reading data with references increase the number of requests?

When a document in Firestore is read, Firestore won't return the data of any references it contains, so currently I am requesting the referenced data from Firestore by its reference path. Does this increase the number of requests to the server, and eventually decrease performance and increase pricing? How is storing references helpful in terms of requesting data from the server?
Reading a document that has a reference counts as a read of that document. Reading the referenced document counts as a read of another document. So in total that is two reads.
There is no hidden cost-inflation here: if the server were to automatically follow the reference, it would also have to read both documents.
If you're looking to minimize the number of documents you read, you can consider adding the minimum data you need from the referenced document into the document containing the reference. For example, if you have a chat app:
you might want to include the display name of each user posting the message in the message itself, so that you don't have to read the user's profile document.
if you do so, you'll have to consider what to do if the user updates their display name. See my answer here for some options: How to write denormalized data in Firebase
the number of users is likely smaller than the number of chat messages (and rather limited in a specific time-frame), making the number of reads of linked documents lower than the number of messages.
by duplicating the data, you may be inflating the bandwidth usage, especially if the number of users is much lower than the number of messages.
What this boils down to is: you're likely optimizing prematurely, but even if not: there's no one-size-fits-all approach. NoSQL data modeling depends on the use-cases of your app, and Firestore is no different.
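A minimal sketch of that denormalization, assuming hypothetical 'users' and 'messages' collections:

from google.cloud import firestore

db = firestore.Client()

def post_message(uid, text):
    user = db.collection('users').document(uid).get()  # one read
    db.collection('messages').add({
        'from': uid,
        'fromName': user.get('displayName'),  # duplicated on purpose
        'text': text,
        'sent': firestore.SERVER_TIMESTAMP,
    })
    # Readers of the message now see the sender's name without a
    # second document read.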

Is it a good idea to store chat messages in a MongoDB collection?

I'm developing a chat app with Node.js, Redis, socket.io, and MongoDB. MongoDB comes last, for persisting the messages.
My question is: what would be the best approach for this last step?
I'm afraid a collection with all the messages, like
{
id,
from,
to,
datetime,
message
}
can get too big too soon and become very slow to read from. What do you think?
Is there a better approach you already worked with?
In MongoDB, you store your data in the format in which you will want to read it later.
If what you read from the database is a list of messages filtered on the 'to' field and with a dynamic datetime filter, then this schema is the perfect fit.
Don't forget to add an index on the fields you will be querying on; then it will be reasonably fast to query them, even over millions of records.
If you would, for example, always show the full history of a single day, you would store all messages for that day in one document. If both types of queries occur a lot, you could even store your messages in both formats.
If storage is an issue, you could also use a capped collection (which discards the oldest documents once a size limit is reached) or a TTL index, which will automatically delete messages older than, e.g., 1 year.
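A minimal sketch of that with pymongo, assuming a hypothetical 'chat' database; field names follow the schema above:

from datetime import datetime, timedelta
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient()
messages = client.chat.messages

# Index the fields the queries filter and sort on.
messages.create_index([('to', ASCENDING), ('datetime', DESCENDING)])

# Messages to a user from the last day, newest first.
since = datetime.utcnow() - timedelta(days=1)
recent = messages.find({'to': 'alice', 'datetime': {'$gt': since}}).sort('datetime', DESCENDING)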
I think the DB structure is fine, the way you mentioned it in your question.
You may assign a unique id to the chat between each pair of users and keep it in each chat record, then retrieve messages based on that id when you want to show the chat.
Say 12 is the unique id for the chat between A and B; retrieval should then filter on 12 when you want to show the chat between A and B.
So your db structure can be like:
{
id,
from,
to,
datetime,
message,
uid
}
Remember, you can optimize retrieval by applying a limit (say, 100 messages at a time). If the user scrolls beyond those 100, retrieve the next 100 chats; this saves a lot of unnecessary reads.
When using a limit, retrieve based on the creation date and apply a sort to the find query as well.
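A minimal sketch of that paginated retrieval with pymongo; the 'uid' field and collection names follow the schema above:

from pymongo import MongoClient, DESCENDING

client = MongoClient()
messages = client.chat.messages
PAGE_SIZE = 100

def fetch_chat(uid, before=None):
    # Pass the datetime of the oldest message already loaded to get
    # the next (older) page while scrolling.
    query = {'uid': uid}
    if before is not None:
        query['datetime'] = {'$lt': before}
    return list(messages.find(query).sort('datetime', DESCENDING).limit(PAGE_SIZE))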
Just a thought here: are the messages plain text, or are you allowed to share images and videos as well?
If it's the latter, then storing all the chats for a single day in one document might not work out.
In fact, if image and video sharing is allowed, you also need to take the 16 MB document size restriction into account.