My daily quota has just been reset, so I figured it was the moment to go to the Firestore UI in the console to see how many reads are counted simply by retrieving documents.
I have 11 documents, each of which has 3 sub-collections (each containing a certain number of documents), plus 1 dummy document with no sub-collection. Connecting to the Firestore UI costs me 36 reads (1 document is opened; its sub-collections are closed).
I thought it was 1 read per document retrieved, without taking sub-collections into account?
36 reads - how is this even possible? Would this mean my 12 documents are each read 3 times?
Here is my data structure:
myCollection: {
  $docId: {
    data: myDate
    subCollection1: {
      $subDocId
    }
    subCollection2: {
      $subDocId
    }
    subCollection3: {
      $subDocId
    }
  }
}
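For reference, this is roughly what I expected, as a minimal sketch with the Python client (the collection name comes from the structure above; the assumption is that only documents actually returned are billed):
from google.cloud import firestore

db = firestore.Client()

# Fetch only the top-level documents; sub-collections are not read
# unless they are queried explicitly, so I would expect ~12 reads here.
docs = list(db.collection("myCollection").stream())
print(len(docs), "documents fetched")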
I have tested this on a completely fresh project. Indeed, using the front-end UI in the console consumed around 2 times more reads than the number of documents. I created 6 documents with one field each in one collection, and every listing showed a usage of 12 reads. If you add some sub-collections it might be more.
But first of all, I think the console UI is not meant to be used as a working interface, but rather for support/design purposes, which means it is opened only occasionally. Under this assumption, cost effectiveness matters less. If you have 50,000 free reads each day and pay $0.036 per 100,000 reads, a few hundred extra reads when using the UI just don't make any difference in costs.
The larger number of reads is probably a result of the implementation. Firestore is billed based on API calls, and some items are probably queried even if they are not visible at first, to improve the user experience or to support some other feature of the UI.
Firestore cost documentation here.
Related
I am using Google Cloud Functions, and read/write operations are performed on Firestore through these Cloud Functions. We are seeing an unexpectedly high number of read operations on Firestore, the source of which I am unable to figure out.
Not more than 20K documents are generated on a daily basis, but the daily read count is usually more than 25,000,000.
What I am looking for is a way to identify the root cause of this high number of reads in the Cloud Functions.
To start with, I have captured the sizes of the results of all the Firestore get() calls in the Cloud Functions. But the sum total of all those sizes is much lower than the read count mentioned above.
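Roughly speaking, the instrumentation looks like the sketch below (a hypothetical Python wrapper with made-up collection and field names; the real functions differ):
from google.cloud import firestore

db = firestore.Client()
read_counts = {}  # call-site label -> number of documents returned

def counted_get(label, query):
    # Wrap each Firestore get() so reads can be attributed to a call site.
    docs = query.get()
    read_counts[label] = read_counts.get(label, 0) + len(docs)
    return docs

# Hypothetical usage inside a function handler:
orders = counted_get("orders_by_user", db.collection("orders").where("userId", "==", "u123"))
print(read_counts)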
I need suggestions on ways/practices to identify the source these high reads are coming from.
You can use a SnapshotListener as a workaround, which allows you to listen for changes in real time.
If the listener is disconnected for more than 30 minutes, you will be charged for reads as if you had sent a new query. In the worst-case scenario, if the listener is disconnected every 31 minutes, you will be charged 50 reads each time.
As a result, this technique is only practical when the listener is not frequently disconnected.
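For example, with the Python client a listener can be attached roughly like this (a sketch; the collection name is made up):
from google.cloud import firestore

db = firestore.Client()

def on_snapshot(col_snapshot, changes, read_time):
    # After the initial snapshot, only changed documents are delivered
    # (and billed) as long as the listener stays connected.
    for change in changes:
        print(change.type.name, change.document.id)

watch = db.collection("orders").on_snapshot(on_snapshot)  # hypothetical collection
# ... later, when no longer needed:
# watch.unsubscribe()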
According to the documentation, I also found that you can reduce the number of reads when using get().
In each document in the collection, add a new field named lastModified of type Date. When you create a new document or edit an existing one, use FieldValue.serverTimestamp() to set or update the field's value.
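A minimal sketch of this pattern with the Python client (SERVER_TIMESTAMP is the Python counterpart of FieldValue.serverTimestamp(); collection, document and field names are illustrative):
from datetime import datetime, timezone
from google.cloud import firestore

db = firestore.Client()
col = db.collection("myCollection")

# On every create or update, let the server stamp the document.
col.document("someDoc").set({
    "data": "value",
    "lastModified": firestore.SERVER_TIMESTAMP,
}, merge=True)

# On the next fetch, read only what changed since the last sync point you
# persisted locally, instead of re-reading the whole collection.
last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)  # stored from the previous run
changed = col.where("lastModified", ">", last_sync).get()
print(len(changed), "documents read")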
Reviewing the Firebase documentation, I found that high read or write rates to lexicographically close documents need to be avoided to prevent contention errors in your application. Hotspotting is the term for this problem, and it can occur if your program does any of the following:
- Creates new documents at a rapid rate and assigns its own monotonically increasing IDs. Cloud Firestore assigns document IDs using a scatter algorithm, so if you create new documents with automatic document IDs you should not see hotspotting on writes (see the sketch after this list).
- Creates new documents at a fast rate in a collection with few documents.
- Creates new documents at a rapid rate with a monotonically growing field, such as a timestamp.
- Deletes a large number of documents from a collection.
- Writes to the database at a rapid rate without growing traffic gradually.
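As a rough sketch of the document-ID point above (the "events" collection is made up):
from google.cloud import firestore

db = firestore.Client()

# Risky at high write rates: caller-chosen, monotonically increasing IDs
# (for example "00001", "00002", ...) concentrate writes on a narrow key
# range and can cause hotspotting.
db.collection("events").document("00001").set({"value": 1})

# Safer: automatic document IDs, which Firestore scatters across the keyspace.
db.collection("events").document().set({"value": 1})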
Quick question about how Firestore reads are calculated. Say I have a collection with 100 items in it, and I do:
citiesRef.order(by: "name").limit(to: 3)
This would technically have to look at all 100 items, order them by name, and then return 3. Would this count as 3 reads, or would it count as 100 reads, since we're looking at 100 items?
Thanks.
If the above query returns 3 documents then it would count as 3 reads.
You are charged for each document read, write, and delete that you perform with Cloud Firestore.
Charges for writes and deletes are straightforward. For writes, each set or update operation counts as a single write.
Charges for reads have some nuances that you should keep in mind. The following sections explain these nuances in detail:
https://firebase.google.com/docs/firestore/pricing#operations
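To make the billing point concrete, here is a rough Python equivalent of the query in the question (assuming a "cities" collection as in the question):
from google.cloud import firestore

db = firestore.Client()

# Only the documents actually returned are billed as reads:
# this query is charged 3 document reads, not 100.
cities = db.collection("cities").order_by("name").limit(3).stream()
for doc in cities:
    print(doc.id, doc.to_dict())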
I am having some trouble deciding which schema design to pick. I have a document which holds user info; each user has a very big set of items, which can be up to 20k items.
An item has a date, an id and 19 other fields, as well as an internal array which can have 20-30 items, and it can be modified, deleted, newly inserted and, of course, queried by any property that it holds.
So I came up with 2 possible schemas.
1. Putting everything into a single document
{_id:ObjectId("") type:'user' name:'xxx' items:[{.......,internalitems:[]},{.......,internalitems:[]},...]}
{_id:ObjectId("") type:'user' name:'yyy' items:[{.......,internalitems:[]},{.......,internalitems:[]},...]}
2. Separating the items from the user and letting each item have its own document
{_id:ObjectId(""), type:'user', username:'xxx'}
{_id:ObjectId(""), type:'user', username:'yyy'}
{_id:ObjectId(""), type:'useritem' username:'xxx' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'xxx' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'yyy' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'yyy' item:{.......,internalitems:[]}]}
As I explained before, a single user can have thousands of items and I have tens of users; internalitems can have 20-30 entries, each with 9 fields.
Consider also that a single item can be queried by different users but can be modified only by its owner and one other process.
If performance is really important, which design would you pick?
If you would pick neither of them, what schema can you suggest?
On a side note, I will be sharding, and I have a single collection for everything.
I wouldn't recommend the first approach; there is a limit on the maximum document size:
"The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS."
Source: http://docs.mongodb.org/manual/reference/limits/
There is also a performance implication if you exceed the current allocated document space when updating (http://docs.mongodb.org/manual/core/write-performance/ "Document Growth").
Your first solution is susceptible to both of these issues.
The second one (disclaimer: in the case of 20-30 internal items) is less susceptible to reaching the limit, but might still require reallocation when doing updates. I haven't had this issue in a similar scenario, so this might be the way to go. You might also want to look into record padding (http://docs.mongodb.org/manual/core/record-padding/) for some more details.
And, if all else fails, you can always split the internal items out as well.
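If it helps, here is a minimal PyMongo sketch of the second schema (database/collection names and the item fields are made up; the index just covers the properties you would query by most):
from pymongo import MongoClient

coll = MongoClient().mydb.entities  # single collection for everything, names assumed

# Schema 2: one document per user item, owned by 'username'.
coll.insert_one({
    "type": "useritem",
    "username": "xxx",
    "item": {
        "itemId": 42,            # illustrative fields
        "date": "2015-06-01",
        "internalitems": [],
    },
})

# Index the fields you query by so per-item lookups stay fast.
coll.create_index([("type", 1), ("username", 1), ("item.itemId", 1)])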
Hope this helps!
I am collecting data from a streaming API and I want to create a real-time analytics dashboard. This dashboard will display a simple timeseries plotting the number of documents per hour. I am wondering if my current approach is optimal.
In the following example, on_data is fired for each new document in the stream.
import simplejson
from pymongo import MongoClient

db = MongoClient().analytics  # database name assumed

# Mongo collections.
records = db.records
stats = db.records.statistics

def on_data(self, data):
    # Create a JSON document from data.
    document = simplejson.loads(data)
    # Insert the new document into records.
    records.insert(document)
    # Update a counter in records.statistics for the hour this document belongs to.
    stats.update({'hour': document['hour']}, {'$inc': {document['hour']: 1}}, upsert=True)
The above works. I get a beautiful graph which plots the number of documents per hour. My question is about whether this approach is optimal or not. I am making two Mongo requests per document. The first inserts the document, the second updates a counter. The stream sends approximately 10 new documents a second.
Is there, for example, any way to tell Mongo to keep db.records.statistics in RAM? I imagine this would greatly reduce disk access on my server.
MongoDB uses memory-mapped files to handle file I/O, so it essentially treats all data as if it were already in RAM and lets the OS figure out the details. In short, you cannot force your collection to be in memory, but if the operating system handles things well, the data that matters will be. Check out this link to the docs for more info on Mongo's memory model and how to optimize your OS configuration to best fit your use case: http://docs.mongodb.org/manual/faq/storage/
But to answer your issue specifically: you should be fine. Your 10 or 20 writes per second should not be a disk bottleneck in any case (assuming you are running on not-ancient hardware). The one thing I would suggest is to build an index over "hour" in stats, if you are not already doing that, to make your updates find documents much faster.
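For example (a small sketch reusing the stats collection from the question; the database name is assumed):
from pymongo import MongoClient

stats = MongoClient().analytics.records.statistics  # same collection as in the question

# One-time setup: an index on 'hour' so each upsert finds its counter
# document without scanning the whole collection.
stats.create_index("hour")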
Is it a good idea to create per-day collections for data on a given day (we could start with per-day and then move to per-hour if there is too much data)? Is there a limit on the number of collections we can create in MongoDB, or does it result in performance loss (is it an overhead for MongoDB to maintain so many collections)? Does a large number of collections have any adverse effect on performance?
To give you more context, the data will be more like Facebook feeds, and only the latest data (say, the last week or month) is most important to us. Making per-day collections keeps the number of documents low and would probably result in fast access. Even if we need old data, we can fall back to older collections. Does this make sense, or am I heading in the wrong direction?
What you actually need is to archive the old data. I would suggest you take a look at this thread on the mongodb-user mailing list:
https://groups.google.com/forum/#!topic/mongodb-user/rsjQyF9Y2J4
The last post there, from Michael Dirolf (10gen), says:
"The OS will handle LRUing out data, so if all of your queries are touching the same portion of data that should stay in memory independently of the total size of the collection."
So I guess you can stay with a single collection, and good indexes will do the work.
Anyhow, if the collection gets too big you can always run a manual archive process.
Yes, there is a limit to the number of collections you can make. From the Mongo documentation Abhishek referenced:
The limitation on the number of namespaces is the size of the namespace file divided by 628.
A 16 megabyte namespace file can support approximately 24,000 namespaces. Each index also counts as a namespace.
Indexes etc. are included in the namespaces, but even so, it would take something like 60 years to hit that limit.
However! Have you considered what happens when you want data that spans collections? In other words, if you wanted to know how many users have feeds updated in a week, you're in a bit of a tight spot. It's not easy/trivial to query across collections.
I would recommend instead making one collection to store the data and simply move data out periodically as Tamir recommended. You can easily write a job to move data out of the collection every week or every month.
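A sketch of such a job with PyMongo (database, collection and field names are assumptions):
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient

db = MongoClient().feeds_db
live, archive = db.feeds, db.feeds_archive

# Move everything older than 30 days into the archive collection.
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
old_docs = list(live.find({"created_at": {"$lt": cutoff}}))
if old_docs:
    archive.insert_many(old_docs)
    live.delete_many({"_id": {"$in": [d["_id"] for d in old_docs]}})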
Creating a collection is not much overhead, but the overhead is larger than creating a new document inside a collection.
There is a limitation on the number of collections that you can create: http://docs.mongodb.org/manual/reference/limits/#Number of Namespaces
Making new collections won't, in my opinion, make any performance difference, because RAM only caches the data you actually query. In your case that will be the recent feeds etc.
But having per-day/hour collections will help you archive old data very easily.