Firebase Realtime Database Pricing with Query - swift

I have a question about Firebase database pricing. I have about 400,000 rows in the leaderboard of my database, but in my app I only want to load the last 500 rows, so my question is: will I get charged for the 500 rows loaded when I run the query, or will I get charged for all 400,000 rows?
Realtime Database charges $5 per GB stored and $1 per GB downloaded. I did the calculation against Firestore and found that Realtime Database would be way cheaper if I get charged for the 500 rows and not the 400,000 rows.
I searched all the documentation and have not found anything about queries: https://firebase.google.com/pricing
https://firebase.google.com/docs/database/usage/billing
Can someone tell me whether I get charged for just the 500 rows in my collection or for all the data in the collection, and whether there is a way to only get charged for the 500 rows, maybe with security rules?
Here is my query code:
let queryRef = ref.child("Leaderboard").queryOrdered(byChild: "totalStars").queryLimited(toLast: 500)
Here is how the database looks. (It will have about 500,000 children like these and will be loaded about 200,000 times per day, but I just want to be priced on the top 500 that I load and not the whole 500,000 each time a user loads the leaderboard. Is that possible?)

You will only be charged for the number of Firestore documents corresponding to the result of your query (not for the number of docs in the collection).
So in your case a maximum of 500 reads, since you would limit the query to 500 documents.
On the other hand, note that Realtime Database queries are not shallow (while Firestore ones are), and therefore if you query a JSON node you'll get the entire tree under that node.
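To make the Firestore side concrete, here is a minimal sketch with the firebase-admin SDK in TypeScript (the collection and field names "Leaderboard" and "totalStars" come from the question; everything else is assumed):
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

// Only the documents actually returned are billed as reads:
// at most 500 here, no matter how many documents the collection holds.
async function loadTop500(): Promise<void> {
  const snapshot = await db
    .collection("Leaderboard")
    .orderBy("totalStars", "desc")
    .limit(500)
    .get();

  snapshot.docs.forEach((doc) => console.log(doc.id, doc.data()));
}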

Renaud's answer is the correct one but let me add some additional information and restate that:
With the Firebase Realtime Database, for downloads you are charged for what is downloaded, not for how many nodes you are querying.
So the key is to reduce the amount of data you're downloading. Your nodes are already pretty shallow; however, there are huge savings to be made, because in your current structure the node key (the user's uid) is duplicated within the node as a child node, and that's not needed.
You can always get the node key with snapshot.key and remove that child node. So it would look like:
uid
  fullName: "Logan Paul"
  stars: 40
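As a rough sketch of reading that flattened structure (shown with the Admin SDK in TypeScript rather than Swift; the iOS SDK exposes the same snapshot.key, and the field names mirror the example above):
import * as admin from "firebase-admin";

admin.initializeApp();

// The uid is the node key, so it never needs to be stored again inside the node.
async function loadLeaderboard(): Promise<void> {
  const snapshot = await admin
    .database()
    .ref("Leaderboard")
    .orderByChild("totalStars")
    .limitToLast(500)
    .once("value");

  snapshot.forEach((child) => {
    const uid = child.key;               // node key, no duplicated child node needed
    const { fullName, stars } = child.val();
    console.log(uid, fullName, stars);
  });
}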
Also, I think your calculations are off a bit. It looks like each node would be about 400 bytes of data, and Firebase strings are UTF-8 encoded, so if you download 500 nodes per user per day and you have 200,000 users, that's about 38 GiB per day (binary gigabytes).
Roughly 400 bytes * 500 nodes * 200,000 users * 0.000000000931322574615479 GiB per byte ≈ 38 GiB,
so about $38 a day if I did my math correctly.
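For what it's worth, that arithmetic as a quick sketch (the 400-byte node size is the estimate above, not a measured value):
// Rough daily download estimate for the leaderboard.
const bytesPerNode = 400;        // estimated size of one leaderboard entry
const nodesPerLoad = 500;        // queryLimited(toLast: 500)
const loadsPerDay = 200_000;     // expected daily leaderboard loads

const bytesPerDay = bytesPerNode * nodesPerLoad * loadsPerDay;
const gibPerDay = bytesPerDay / 2 ** 30;   // roughly 37-38 GiB
const dollarsPerDay = gibPerDay * 1;       // at about $1 per GB downloaded

console.log(gibPerDay.toFixed(1), dollarsPerDay.toFixed(0));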

Related

What is the precise definition of a "read" in MongoDB?

In the docs it says "$0.10/million reads".
I have a collection with 1 million documents. I was testing some intensive queries, just for one day by myself, and my account accumulated a charge of $26.
Not sure how I got there.
So I need to know the precise definition of a "read". Is it each time a document is accessed from disk?
At the time of writing https://www.mongodb.com/pricing reads:
Item: Read Processing Unit (RPU)
Description: Number of read operations* to the database. *Sum of documents read divided by 4 KB and index bytes read divided by 256 bytes.
Pricing: $0.10 / million for the first 50 million per day*
If I get it right, 1 read is fetching up to 4 KB of a document or up to 256 bytes of index; e.g. querying a whole million documents in the first stage of an aggregation pipeline costs $0.10 if your documents are about 4 KB each.
I would strongly advise talking to Atlas support, though. Don't rely on my understanding of what's written there.
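Under that reading of the table, a back-of-the-envelope estimate could be sketched like this (the inputs are made-up numbers, and the metering rule is my interpretation, not an official formula):
// Estimate Atlas serverless RPUs: documents metered in 4 KB pieces,
// index reads metered in 256-byte pieces.
function estimateRpus(docsRead: number, avgDocBytes: number, indexBytesRead: number): number {
  const docUnits = docsRead * Math.ceil(avgDocBytes / 4096);
  const indexUnits = Math.ceil(indexBytesRead / 256);
  return docUnits + indexUnits;
}

// Scanning 1,000,000 documents of ~4 KB each ≈ 1,000,000 RPUs ≈ $0.10.
const rpus = estimateRpus(1_000_000, 4096, 0);
console.log(rpus, (rpus / 1_000_000) * 0.1);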

Firestore reading quota

I have 1 collection in my Firestore database and there are 2,000 test documents (records) in this collection. Firestore gives a free daily quota of 50,000 reads. When I run my JavaScript code to query documents, my reading quota decreases more than I expected. If I count all the documents using one query, does that mean 2,000 read operations or only 1 read operation?
Currently Firestore doesn't have any native support for aggregate queries over documents, like the sum of some fields or even the count of documents.
So yes, when you count the total number of documents in the collection, you are actually first fetching at least the references for those docs.
So, having 2,000 documents in a collection and using a query to count the number of docs in that collection, you are actually doing 2,000 reads.
To accomplish what you want, you can also take a look at the following: https://stackoverflow.com/a/49407570
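The linked answer boils down to maintaining your own counter on every write, so reading the count later costs a single document read. A minimal sketch with firebase-admin (the "items" and "metadata" collection names are assumptions):
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

// Keep a running count in one metadata document, updated atomically with each insert,
// so reading the count costs 1 read instead of N.
async function addItem(data: FirebaseFirestore.DocumentData): Promise<void> {
  const batch = db.batch();
  batch.set(db.collection("items").doc(), data);
  batch.set(
    db.collection("metadata").doc("itemsCounter"),
    { count: admin.firestore.FieldValue.increment(1) },
    { merge: true }
  );
  await batch.commit();
}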
The Firebase Spark (free) plan gives you:
1 GiB total - the amount of data you can store
10 GiB/month - network egress (egress, in networking, is traffic that exits an entity or network boundary, while ingress is traffic that enters it; in short, the network bandwidth used by the database)
20K writes/day
50K reads/day
20K deletes/day
How many reads you are charged for depends on how you read the documents: each document fetched counts as one read, so reading 2,000 documents, whether one at a time or in a single query, counts as 2,000 reads.
The Firebase Console also consumes some reads and writes, which is why your quota decreases more than you expected.

Reduce number of requests in Firebase

I'd like to know what counts as a read on Cloud Firestore.
For example, the app loads an object from a collection along with all the fields in it. Does it cost 1 read for the object, or does it cost 10 reads since there are 10 fields in this object (name, image link, description, uuid, createDate, price1, price2, price3, etc.)?
If the answer is 10 (which I suppose it is), there is a possibility to reduce reads by deleting the fields I don't need when using my app (createDate and uuid, for example).
Are there any problems with doing that?
Also, can I group some of the fields together? (Let's say price (string) = price1/price2/price3, and then in my app I say price1 is the first number of price, price2 is the one in the middle, and so on.)
Will this reduce the reads by 3 for the price?
Thank you very much for these explanations.
Firestore pricing is based on document (object) reads (https://cloud.google.com/firestore/pricing), with a minimum charge of one document read for every query, even if there are no results.
Since documents contain the key/value pair fields (https://cloud.google.com/firestore/docs/data-model), you are only charged per document, not per field.
Of course, other costs may come into play, as the documentation notes that larger documents can be slower to retrieve (a cost of latency) and of course larger documents will use more network bandwidth, which can incur a cost in some cases.
There is other guidance on the pricing page about how to reduce costs for large result sets, via the use of cursors, but the costs are still based on documents.
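For completeness, the cursor approach the pricing page mentions looks roughly like this with the firebase-admin SDK (the "products" collection and "name" field are placeholders):
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

// Page through a large result set with a cursor; each page is only billed
// for the documents it actually returns.
async function readAllPages(pageSize = 100): Promise<void> {
  let last: FirebaseFirestore.QueryDocumentSnapshot | undefined;
  while (true) {
    let query = db.collection("products").orderBy("name").limit(pageSize);
    if (last) query = query.startAfter(last);
    const page = await query.get();
    if (page.empty) break;
    page.docs.forEach((doc) => console.log(doc.id));
    last = page.docs[page.docs.length - 1];
  }
}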

MongoDB aggregation performance capability

I am trying to work through some performance considerations about using MongoDb for a considerable amount of documents to be used in a variety of aggregations.
I have read that a collection has a 32 TB capacity, depending on the sizes of the chunk and shard key values.
If I have 65,000 customers who each supply to us (on average) 350 sales transactions per day, that ends up being about 22,750,000 documents created daily. When I say a sales transaction, I mean an object which is like an invoice with a header and line items. Each document I have averages about 2.60 KB.
I also have some other data being received from these same customers, like account balances and products from a catalogue. I estimate about 1,000 product records active at any one time.
Based upon the above, I approximate 8,392,475,000 (8.4 billion) documents in a single year, with a total of 20,145,450,000 KB (18.76 TB) of data being stored in a collection.
Based upon the capacity of a MongoDB collection of 32 TB (34,359,738,368 KB), I believe it would be at 58.63% of capacity.
I want to understand how this will perform for different aggregation queries running on it. I want to create a set of staged pipeline aggregations which write to a different collection and are used as source data for business insights analysis.
Across 8.4 billion transactional documents, I aim to create this aggregated data in a different collection via a set of individual services which output using $out, to avoid any issues with the 16 MB document size limit for a single result set.
Am I being overly ambitious here, expecting MongoDB to be able to:
Store that much data in a collection
Aggregate and output the results of refreshed data to drive business insights in a separate collection for consumption by services which provide discrete aspects of a customer's business
Any feedback welcome; I want to understand where the limit is of using MongoDB, as opposed to other technologies, for storing and using this quantity of data.
Thanks in advance
There is no limit on how big a collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with the maximum collection size after which it can no longer be sharded.
MongoDB Docs: Sharding Operational Restrictions
For the amount of data you are planning to have it would make sense to go with a sharded cluster from the beginning.
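As a sketch of the staged-aggregation idea with the Node driver (the database, collection, and field names such as "sales", "customerId", and "amount" are assumptions, not taken from the question):
import { MongoClient } from "mongodb";

// Roll transactions up per customer into a separate summary collection via $out,
// so downstream services read the small summary instead of billions of documents.
async function buildCustomerSummary(uri: string): Promise<void> {
  const client = await MongoClient.connect(uri);
  try {
    await client
      .db("erp")
      .collection("sales")
      .aggregate(
        [
          { $match: { date: { $gte: new Date("2023-01-01") } } },
          { $group: { _id: "$customerId", total: { $sum: "$amount" }, count: { $sum: 1 } } },
          { $out: "salesCustomerSummary" },
        ],
        { allowDiskUse: true }
      )
      .toArray(); // drives the cursor so the $out stage actually executes
  } finally {
    await client.close();
  }
}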

Updating large number of records in a collection

I have a collection called TimeSheet with a few thousand records now. This will eventually increase to 300 million records within a year. In this collection I embed a few fields from another collection called Department, which mostly won't get any updates; only rarely will some records be updated. By rarely I mean only once or twice a year, and also not all records, only less than 1% of the records in the collection.
Mostly, once a department is created there won't be any update; even if there is an update, it will be done initially (when there are not many related records in TimeSheet).
Now if someone updates a department after a year, in a worst-case scenario the TimeSheet collection will have about 300 million records in total and about 5 million matching records for the department which gets updated. The update query condition will be on an indexed field.
Since this update is time consuming and creates locks, I'm wondering whether there is any better way to do it. One option I'm considering is to run the update query in batches by adding an extra condition like UpdatedDateTime > somedate && UpdatedDateTime < somedate.
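A sketch of that batching idea with the Node driver (the embedded field names dept.id and dept.name are assumptions; UpdatedDateTime comes from the question):
import { MongoClient } from "mongodb";

// Update the embedded department fields one date slice at a time so that no
// single updateMany touches millions of documents while holding locks.
async function updateDepartmentInBatches(uri: string, deptId: string, newName: string): Promise<void> {
  const client = await MongoClient.connect(uri);
  try {
    const timeSheet = client.db("hr").collection("TimeSheet");
    let from = new Date("2015-01-01");
    const end = new Date();
    while (from < end) {
      const to = new Date(from.getTime() + 7 * 24 * 60 * 60 * 1000); // one-week slice
      await timeSheet.updateMany(
        { "dept.id": deptId, UpdatedDateTime: { $gte: from, $lt: to } },
        { $set: { "dept.name": newName } }
      );
      from = to;
    }
  } finally {
    await client.close();
  }
}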
Other details:
A single document size could be about 3 or 4 KB
We have a replica set containing three replicas.
Is there any other better way to do this? What do you think about this kind of design? And what do you think if the numbers I've given are smaller, like below?
1) 100 million total records and 100,000 matching records for the update query
2) 10 million total records and 10,000 matching records for the update query
3) 1 million total records and 1000 matching records for the update query
Note: The collection names Department and TimeSheet, and their purpose, are fictional (they are not the real collections), but the statistics I have given are true.
Let me give you a couple of hints based on my global knowledge and experience:
Use shorter field names
MongoDB stores the field names (keys) in every document. This repetition increases disk usage, which can cause performance issues on a very large database like yours.
Pros:
Smaller documents, so less disk space
More documents fit in RAM (more caching)
Indexes will be smaller in some scenarios
Cons:
Less readable names
Optimize index size
The smaller the index is, the more of it fits in RAM and the fewer index misses happen. Consider the SHA-1 hash of a git commit, for example: a commit is often represented by its first 5-6 characters, so simply store those 5-6 characters instead of the whole hash.
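For instance, a sketch of that idea with the Node driver (the commits collection, the prefix length, and the field names are all illustrative):
import { createHash } from "crypto";
import { MongoClient } from "mongodb";

// Index a short prefix of the SHA-1 instead of the full 40-character hash,
// keeping the index small enough to stay in RAM.
async function storeCommit(uri: string, message: string): Promise<void> {
  const fullSha = createHash("sha1").update(message).digest("hex");
  const client = await MongoClient.connect(uri);
  try {
    const commits = client.db("vcs").collection("commits");
    await commits.createIndex({ shaPrefix: 1 });
    await commits.insertOne({ shaPrefix: fullSha.slice(0, 10), message });
  } finally {
    await client.close();
  }
}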
Understand padding factor
Updates that grow a document can cause a costly document move. A document move means deleting the old document, rewriting it at a new, empty location, and updating the indexes, which is expensive.
We need to make sure the document doesn't move when an update happens. For each collection there is a padding factor involved which tells, during document insert, how much extra space to allocate beyond the actual document size.
You can see the collection padding factor using:
db.collection.stats().paddingFactor
Add padding manually
In your case you are pretty sure to start with a small document that will grow. Updating your document after a while will cause multiple document moves, so it is better to add padding to the document. Unfortunately, there is no easy way to add padding. We can do it by adding some random bytes to some key while doing the insert and then deleting that key in the next update query.
Finally, if you are sure that some keys will be added to the documents in the future, then preallocate those keys with some default values so that later updates don't grow the document size and cause document moves.
You can get details about the query causing document move:
db.system.profile.find({ moved: { $exists : true } })
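A sketch of the manual padding and preallocation tricks described above, using the Node driver (field names are illustrative; note that padding factors only matter on the old MMAPv1 storage engine):
import { MongoClient } from "mongodb";

async function insertWithPadding(uri: string): Promise<void> {
  const client = await MongoClient.connect(uri);
  try {
    const posts = client.db("blog").collection("posts");

    // Insert with a throw-away padding field and preallocated keys so the
    // document already occupies roughly its final size on disk.
    const { insertedId } = await posts.insertOne({
      title: "hello",
      comments: [],                // preallocate a key that will be filled later
      padding: "x".repeat(2048),   // reserve ~2 KB of growth room
    });

    // On the first real update, drop the padding while adding real data.
    await posts.updateOne(
      { _id: insertedId },
      { $push: { comments: { user: "u1", text: "hi" } }, $unset: { padding: "" } }
    );
  } finally {
    await client.close();
  }
}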
Large number of collections vs. large number of documents in a few collections
Schema design depends on the application requirements. If there is a huge collection in which we query only the latest N days of data, then we can optionally choose to have a separate collection, and old data can be safely archived. This will make sure that caching in RAM is done properly.
Every collection created incurs a cost beyond the cost of creating the collection itself. Each collection has a minimum size of a few KB plus one index (8 KB). Every collection has an associated namespace; by default we have some 24K namespaces. For example, having a collection per user is a bad choice since it is not scalable; after some point Mongo won't allow us to create new collections or indexes.
Generally, having many collections has no significant performance penalty. For example, we can choose to have one collection per month if we know that we are always querying based on months.
Denormalization of data
It's always recommended to keep all the related data for a query or sequence of queries in the same disk location. You sometimes need to duplicate information across different documents to achieve this. For example, in a blog, you'll want to store a post's comments within the post document (a sketch follows the list of pros below).
Pros:
index size will be much smaller, as the number of index entries will be smaller
queries will be very fast, since a single fetch includes all the necessary details
document size will be comparable to the page size, which means when we bring this data into RAM, most of the time we are not bringing other data along with the page
a document move will free a whole page, not a tiny chunk of the page which may not be usable for further inserts
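The embedded-comments layout mentioned above might be written like this (a sketch; the schema is illustrative):
import { MongoClient } from "mongodb";

// One document holds the post and its comments, so a single read returns
// everything needed to render the page.
async function createPost(uri: string): Promise<void> {
  const client = await MongoClient.connect(uri);
  try {
    await client.db("blog").collection("posts").insertOne({
      title: "Denormalization in practice",
      body: "Post body goes here",
      comments: [
        { author: "alice", text: "Nice post", createdAt: new Date() },
        { author: "bob", text: "Thanks", createdAt: new Date() },
      ],
    });
  } finally {
    await client.close();
  }
}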
Capped Collections
Capped collections behave like circular buffers. They are a special type of fixed-size collection. These collections can handle very high-speed writes and sequential reads. Being fixed size, once the allocated space is filled, new documents are written by deleting the older ones. However, document updates are only allowed if the updated document fits within the original document size (play with padding for more flexibility).
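Creating one with the Node driver might look like this (the collection name and sizes are placeholders):
import { MongoClient } from "mongodb";

// A capped collection: fixed size, insertion-order reads, and the oldest
// documents are overwritten once the allocated space fills up.
async function createEventLog(uri: string): Promise<void> {
  const client = await MongoClient.connect(uri);
  try {
    await client.db("app").createCollection("eventLog", {
      capped: true,
      size: 64 * 1024 * 1024, // bytes reserved for the collection
      max: 100_000,           // optional cap on the number of documents
    });
  } finally {
    await client.close();
  }
}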