Firestore: Reading data with references do increase in number of requests? - google-cloud-firestore

When documents on firestore is read, firestore wont give references data, if any. so currently I am requesting firestore for data from reference path. Do this increase in number of requests to server, eventually decrease in performance and increase in pricing ? How storing references is helpful in terms of requesting data from server ?

Reading a document that has a reference counts as a read of that document. Reading the referenced document count as a read of another document. So in total that is two reads.
There is no hidden cost-inflation here: if the server were to automatically follow the reference, it would also have to read both documents.
If you're looking to minimize the number of documents you read, you can consider adding the minimum data you need from the referenced document into the document containing the reference. For example, if you have a chat app:
you might want to include the display name of each user posting the message in the message itself, so that you don't have to read the user's profile document.
if you do so, you'll have to consider what to do if the user updates their display name. See my answer here for some options: How to write denormalized data in Firebase
the number of users is likely smaller than the number of chat messages (and rather limited in a specific time-frame), making the number of reads of linked documents lower than the number of messages.
by duplicating the data, you may be inflating the bandwidth usage, especially if the number of users is much lower than the number of messages.
What this boils down to is: you're likely optimizing prematurely, but even if not: there's no one-size-fits-all approach. NoSQL data modeling depends on the use-cases of your app, and Firestore is no different.

Related

Unexpectedly High no. of Reads

I am using Google Cloud Functions and read, write operations are performed on the Firestore thru these Cloud Functions. We are seeing unexpectedly high number of read operations on Firestore, the source of which I am unable to figure out.
Not more than 20K documents are generated on a daily basis. But the daily read count is usually more than 25,000,000
What I am looking for, is ways to identify the root cause of these high number of reads in the Cloud Functions.
To start with, I have captured the size of the results of all the Firestore get() methods in Cloud Functions. But the sum total of all the sizes is much much much lower than the read count I mentioned above.
Need suggestions on ways/practices to identify the source from where these high reads are generating.
You can use a SnapshotListener as a workaround, which allows us to listen for changes in real-time.
You will be charged for readings as if we had sent a new query if the listener is disconnected for more than 30 minutes. If the listener is disconnected every 31 minutes in the worst-case scenario, we will be charged 50 reads each time.
As a result, this technique is only practicable when the listener is not frequently disconnected.
According to the documentation, I found you can reduce the number of reads using get().
In each document in the collection, you must add a new property named lastModified of type Date. When you create a new document or edit an existing one, you must use FieldValue.serverTimestamp() to set or update the field's value.
Reviewing Firebase documentation, I found that high read or write rates to lexicographically close documents need to be avoided to avoid contention faults in your application. Hotspotting is the term for this problem, and it can occur if your program does any of the following:
Creates new documents at a rapid rate and assigns its own IDs that
are monotonically rising.
A scatter algorithm is used by Cloud Firestore to assign document
IDs.
If you use automated document IDs to create new documents, you must
not see hotspotting on writes.
In a collection with few documents, creates new documents at a fast
rate.
Creates new documents at a rapid rate with a monotonically growing
field, such as a timestamp.
Deletes a large number of documents from a collection.
Writes to the database at a rapid rate without growing traffic
gradually.

What are the general limits, if any, of a very large frequently used collection?

What are the general limits, if any, of a very large frequently used collection (loads of writes and reads to the collection at the same time) in firestore?
Say you have an app where a user can scroll through a list of users. The information about each user is stored in a document in a collection. Now imagine a lot of new users is constantly created, and a lot of users is scrolling through the current list of users at the same time e.g. reading from the collection. Furthermore a lot of users searches the collection of users using different fields (name, interests, pets, etc.) and uses the indexes of Firestore.
If all of these things happened at the same time, would it affect the performance of firestore clientwise? Would it be necessary to create multiple smaller collections containing users? My question is to be understanded as an extreme case.
If all of these things happened at the same time, would it affect the performance of firestore clientwise? Would it be necessary to create multiple smaller collections containing users?
Yes. And yes.
Firestore is susceptible to hotspots on writes. If many clients write to the same collection and/or to the same index, you may experience delays in processing of those writes. For the exact limits, see the documentation on limits on writes and transactions.
Creating separate collections to shard out those writes is the common solution for this, although you can also accept the delay in processing.

Designing mongodb data model: embedding vs. referencing

I'm writing an application that gathers statistics of users across multiple social networks accounts. I have a collection of users and I would like to store the statistics information of each user.
Now, I have two options:
Create a collection that stores users statistics documents, and add a reference object to each of the user documents that links it to the corresponding document in the statistics collection.
Embed a statistics document in each of the users document.
Besides for query performance (which I'm less concerned about):
what are the pros and cons of each of these approaches?
What should I take into account if I choose to use references rather than embedding the information inside the user document?
The shape of the data is determined by the application itself.
There’s a good chance that when you are working with the users data, you probably need statistics details.
The decision about what to put in the document is pretty much determined by how the data is used by the application.
The data that is used together as users documents is a good candidate to be pre-joined or embedded.
One of the limitations of this approach is the size of the document. It should be a maximum of 16 MB.
Another approach is to split data between multiple collections.
One of the limitations of this approach is that there is no constraint in MongoDB, so there are no foreign key constraints as well.
The database does not guarantee consistency of the data. Is it up to you as a programmer to take care that your data has no orphans.
Data from multiple collections could be joined by applying the lookup operator. But, a collection is a separate file on disk, so seeking on multiple collections means seeking from multiple files, and that is, as you are probably guessing, slow.
Generally speaking, embedded data is the preferable approach.

Modeling data in MongoDB for a chat application

I'd like to use MongoDB to store chat messages as part of a chat application. The database will be used to display chat history to users joining a channel.
I am trying to determine the best way to model this data in the database. The application is a simple chat app which contains numerous channels that users can chat in. Here are a few options I've considered:
A Messages Collection containing a document for every message. This is easy to implement, however with any significant usage many documents would be created.
A Channels Collection containing a document for every channel. This would result in fewer documents. Messages would be stored as an array on a channel document.
Which of these options is preferred, and why? Is there a better option not listed here?
There are many, many ways to go about modeling something like this. There is no generic "best way," as it really depends on how you plan on using the data, how the app is going to function, etc. However, there are a few things to consider with your approach.
First, having a lot of documents is not an issue. That's what Mongo does - it's great at storing lots of documents. I am a strong advocate of modularity, as it makes things more flexible. I reflect that mindset in my database by separating data as much as possible, and then using references to populate as needed.
This means you have to do more population, but in the end it prevents you from pidgeon holing yourself into having to do things a certain way.
So for your example in particular, a good way would be to combine what you've mentioned above: Have a Messages collection which creates a document for every message. Then have a Channels collection which stores an array of Message IDs (not the message itself).
Why is this useful? I'm assuming you will want to load a Channel, but not all 2,000 messages that are in it. You probably want to load the first 50, and then load more via infinite scroll or something.
This allows you to fetch a Channel document and then populate the first 50 messages. Then you can incrementally fetch 50 more messages at a time if needed.
If you store all of the messages in that array, your Channel document is going to get very, very large.
This also allows a user to edit their Message without editing the Channel document in any way - this is very important!
Having a separate Message schema also allows you to do things like fetch all the messages from a single user. You'll probably want to have a reference in the Message to a User ID.
There is a lot to consider when modeling data like this, but the important things to think about are "How am I going to need to fetch this data?" and "How will I need to modify this data?" Then figure out if your current format makes one of those things difficult.

mongo db design of following and feeds, where should I embed?

I have a basic question about where I should embed a collection of followers/following in a mongo db. It makes sense to have an embedded collection of following in a user object, but does it also make sense to also embed the converse followers collection as well? That would mean I would have to update and embed in the profile record of both the:
following embedded list in the follower
And the followers embedded list of the followee
I can't ensure atomicity on that unless I also somehow keep a transaction or update status somewhere. Is it worth it embedding in both entities or should I just update #1, embed following in the follower's profile and, put an index on it so that I can query for the converse- followers across all profiles? Is the performance hit on that too much?
Is this a candidate for a collection that should not be embedded? Should I just have a collection of edges where I store following in its own collection with followerid and followedbyId ?
Now if I also have to update a feed to both users when they are followed or following, how should I organize that?
As for the use case, the user will see the people they are following when viewing their feeds, which happens quite often, and also see the followers of a profile when they view the profile detail of anyone, which also happens often but not quite as much as the 1st case. In both cases, the total numbers of following and followers shows up on every profile page.
In general, it's a bad idea to embed following/followed-by relationships into user documents, for several reasons:
(1) there is a maximum document size limit of 16MB, and it's plausible that a popular user of a well-subscribed site might end up with hundreds of thousands of followers, which will approach the maximum document size,
(2) followership relationships change frequently, and so the case where a user gains a lot of followers translates into repeated document growth if you're embedding followers. Frequent document growth will significantly hinder MongoDB performance, and so should be avoided (occasional document growth, especially is documents tend to reach a stable final size, is less of a performance penalty).
So, yes, it is best to split out following/followed-by relationship into a separate collection of records each having two fields, e.g., { _id : , oid : }, with indexes on _id (for the "who am I following?" query) and oid (for the "who's following me?" query). Any individual state change is modeled by a single document addition or removal, though if you're also displaying things like follower counts, you should probably keep separate counters that you update after any edge insertion/deletion.
(Of course, this supposes your business requirements allow you some flexibility on the consistency details: in general, if your display code tells a user he's got 304 followers and then proceeds to enumerate them, only the most fussy user will check that the followers enumerated tally up to 304. If business requirements necessitate absolute consistency, you'll either need a database that isolates transactions for you, or else you'll have to do the counting yourself as part of displaying all user identities.)
You can embed them all but create a new document when you reach a certain limit. For example you can limit a document to an array of 500 elements then create a new one. Also, if it is about feed, when viewed, you dont have to keep the viewed publications you can replace by new ones so you don't have to create new document for additional publication storage.
To maintain your performance, I'd advice you to make a collection that can use graphlookup aggregation, where you store your following. Being followed can reach millions of followers, so you have to store what pwople follow instead of who follows them.
I hope it helps you.