I have 1 collection on Firestore database and there are 2000 test documents (records) in this collection. Firestore gives free 50.000 daily reading quota. When I run my Javascript code to query documents, my reading quota decreases more than I expected. If I count all documents by using one query, is that mean "2000 reading" operation or only "1 reading" operation?
Currently firestore doesn't have any native support for aggregate queries over documents like sum of some fields or even count of documents.
So yes, when you count total number of documents in the collection then you are actually first fetching atleast the references for those docs.
So, having 2000 documents in a collections and using a query to count number of docs in that collection you are actually doing 2000 reads.
You accomplish what you want, you can take a look at the following also https://stackoverflow.com/a/49407570
Firebase Spark free give you
1 GiB total - size of data you can store
10GiB/month - Network egress Egress in the world of
networking implies traffic that exits an entity or a network
boundary, while Ingress is traffic that enters the boundary of a
network In short network bandwidth on database
20K/day Writes
50K/day Reads
20K/day Del
If You reading 2000 documents depends on calling a single call read cause one read if you reading multipal at one consider 1 reads the answer is depends how you reads you calling
Firebase Console also consume some reads writes thats why quota decreases more than you expected
Related
In the docs it says "$0.10/million reads".
I have a collection with a 1 million documents. I was testing some intensive queries just for one day by myself and my account accumulated a charge of $26.
Not sure how I got there.
So I need to know the precise definition of a "read". Is it each time a document is accessed from disk?
At the time of writing https://www.mongodb.com/pricing reads:
Item
Description
Pricing
Read Processing Unit (RPU)
Number of read operations* to the database * Sum of documents read divided by 4KB and index bytes read divided by 256 bytes
$0.10 / million for the first 50 million per day*
If I get it, 1 read is fetching up to 4Kb of a document or up to 256b of index, e.g. querying whole million of documents in the first stage of an aggregation pipeline costs $0.1 if your documents are about 4Kb each.
I would strongly advise to talk to Atlas support tho. Don't rely on my understanding of what's written there.
I am using Google Cloud Functions and read, write operations are performed on the Firestore thru these Cloud Functions. We are seeing unexpectedly high number of read operations on Firestore, the source of which I am unable to figure out.
Not more than 20K documents are generated on a daily basis. But the daily read count is usually more than 25,000,000
What I am looking for, is ways to identify the root cause of these high number of reads in the Cloud Functions.
To start with, I have captured the size of the results of all the Firestore get() methods in Cloud Functions. But the sum total of all the sizes is much much much lower than the read count I mentioned above.
Need suggestions on ways/practices to identify the source from where these high reads are generating.
You can use a SnapshotListener as a workaround, which allows us to listen for changes in real-time.
You will be charged for readings as if we had sent a new query if the listener is disconnected for more than 30 minutes. If the listener is disconnected every 31 minutes in the worst-case scenario, we will be charged 50 reads each time.
As a result, this technique is only practicable when the listener is not frequently disconnected.
According to the documentation, I found you can reduce the number of reads using get().
In each document in the collection, you must add a new property named lastModified of type Date. When you create a new document or edit an existing one, you must use FieldValue.serverTimestamp() to set or update the field's value.
Reviewing Firebase documentation, I found that high read or write rates to lexicographically close documents need to be avoided to avoid contention faults in your application. Hotspotting is the term for this problem, and it can occur if your program does any of the following:
Creates new documents at a rapid rate and assigns its own IDs that
are monotonically rising.
A scatter algorithm is used by Cloud Firestore to assign document
IDs.
If you use automated document IDs to create new documents, you must
not see hotspotting on writes.
In a collection with few documents, creates new documents at a fast
rate.
Creates new documents at a rapid rate with a monotonically growing
field, such as a timestamp.
Deletes a large number of documents from a collection.
Writes to the database at a rapid rate without growing traffic
gradually.
I understand that mongo db does locking on read and write operations.
My Use case:
Only read operations. No write operations.
I have a collection about 10million documents. Storage engine is wiredTiger.
Mongo version is 3.4.
I made a request which should return 30k documents - took 650ms on an average.
When I made concurrent requests - same requests - 100 times - It takes in seconds - few seconds to 2 minutes all requests handled.
I have single node to serve the data.
How do I access the data:
Each document contains 25 to 40 fields. I indexed few fields. I query based on one index field.
API will return all the matching documents in json form.
Other informations: API is written using Spring boot.
Concurrency tested through JMeter shell script from command line on remote machine.
So,
My question:
Am I missing any optimizations? [storage engine level, version]
Can't I achieve all read requests to be served less than a second?
If so, what sla I can keep for this use case?
Any suggestions?
Edit:
I enabled database profiler in mongodb with level 2.
My single query internally converted to 4 queries:
Initial read
getMore
getMore
getMore
These are the queries found through profiler.
Totally, it is taking less than 100ms. Is it true really?
My concurrent queries:
Now, When I hit 100 requests, nearly 150 operations are more than 100ms, 100 operations are more than 200ms, 90 operations are more than 300ms.
As per my single query analysis, 100 requests will be converted to 400 queries internally. It is fixed pattern which I verified by checking the query tag in the profiler output.
I hope this is what affects my request performance.
My single query internally converted to 4 queries:
Initial read
getMore
getMore
getMore
It's the way mongo cursors work. The documents are transferred from the db to the app in batches. IIRC the first batch is around 100 documents + cursor Id, then consecutive getMore calls retrieve next batches by cursor Id.
You can define batch size (number of documents in the batch) from the application. The batch cannot exceed 16MB, e.g. if you set batch size 30,000 it will fit into single batch only if document size is less than 500B.
Your investigation clearly show performance degradation under load. There are too many factors and I believe locking is not one of them. WiredTiger does exclusive locks on document level for regular write operations and you are doing only reads during your tests, aren't you? In any doubts you can compare results of db.serverStatus().locks before and after tests to see how many write locks were acquired. You can also run db.serverStatus().globalLock during the tests to check the queue. More details about locking and concurrency are there: https://docs.mongodb.com/manual/faq/concurrency/#for-wiredtiger
The bottleneck is likely somewhere else. There are few generic things to check:
Query optimisation. Ensure you use indexes. The profiler should have no "COLLSCAN" stage in execStats field.
System load. If your database shares system resources with application it may affect performance of the database. E.g. BSON to JSON conversion in your API is quite CPU hungry and may affect performance of the queries. Check system's LA with top or htop on *nix systems.
Mongodb resources. Use mongostat and mongotop if the server has enough RAM, IO, file descriptors, connections etc.
If you cannot spot anything obvious I'd recommend you to seek professional help. I find the simplest way to get one is by exporting data to Atlas, running your tests against the cluster. Then you can talk to the support team if they could advice any improvements to the queries.
I am planning to use a nested document structure for my MongoDB Schema design as I don't want to go for flat schema design as In my case I will need to fetch my result in one query only.
Since MongoDB has a size limit for a document.
MongoDB Limits and Threshold
A MongoDB document has a size limit of 16MB ( an amount of data). If your subcollection can growth without limits go flat.
I don't need to fetch my nested data but only be needing my nested data for filtering and querying purpose.
I want to know whether I will still be bound by MongoDB size limits even if I use my embedded data only for querying and filter purpose and never for fetching of nested data because as per my understanding, in this case, MongoDB won't load the complete document in memory but only the selected fields?
Nested schema design example
{
clinicName: "XYZ Hopital",
clinicAddress: "ABC place.",
"doctorsWorking":{
"doctorId1":{
"doctorJoined": ISODate("2017-03-15T10:47:47.647Z")
},
"doctorId2":{
"doctorJoined": ISODate("2017-04-15T10:47:47.647Z")
},
"doctorId3":{
"doctorJoined": ISODate("2017-05-15T10:47:47.647Z")
},
...
...
//upto 30000-40000 more records suppose
}
}
I don't think your understanding is correct when you say "because as per my understanding, in this case, MongoDB won't load the complete document in memory but only the selected fields?".
If we see MongoDB Doc. then it reads
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So the clear limit is 16 MB on document size. Mongo should stop you from saving such a document which is greater than this size.
If I agree with your understanding for a while then let's say that it allows to
save any size of document but more than 16 MB in RAM is not allowed. But on other hand, while storing the data it won't know what queries will be run on this data. So ultimately you will be inserting such big documents which can't be used later. (because while inserting we don't tell the query pattern, we can even try to fetch the full document in a single shot later).
If the limit is on transmission (hypothetically assuming) then there are lot of ways (via code) software developers can bring data into RAM in clusters and they won't cross 16 MB limit ever (that's how they do IO ops. on large files). They will make fun of this limit and just leave it useless. I hope MongoDB creators knew it and didn't want it to happen.
Also if limit is on transmission then there won't be any need of separate collection. We can put everything in a single collections and just write smart queries and can fetch data. If fetched data is crossing 16 MB then fetch it in parts and forget the limit. But it doesn't go this way.
So the limit must be on document size else it can create so many issues.
In my opinion if you just need "doctorsWorking" data for filtering or querying purpose (and if you also think that "doctorsWorking" will cause document to cross 16 MB limit) then it's good to keep it in a separate collection.
Ultimately all things depend on query and data pattern. If a doctor can serve in multiple hospitals in shifts then it will be great to keep doctors in separate collection.
I am trying to work through some performance considerations about using MongoDb for a considerable amount of documents to be used in a variety of aggregations.
I have read that a collection has 32TB capcity depending on the sizes of chunk and shard key values.
If I have 65,000 customers who each supply to us (on average) 350 sales transactions per day, that ends up being about 22,750,000 documents getting created daily. When I say a sales transaction, I mean an object which is like an invoice with a header and line items. Each document I have is an average of 2.60kb.
I also have some other data being received by these same customers like account balances and products from a catalogue. I estimate about 1,000 product records active at any one time.
Based upon the above, I approximate 8,392,475,0,00 (8.4 billion) documents in a single year with a total of 20,145,450,000 kb (18.76Tb) of data being stored in a collection.
Based upon the capacity of a MongoDb collection of 32Tb (34,359,738,368 kb) I believe it would be at 58.63% of capacity.
I want to understand how this will perform for different aggregation queries running on it. I want to create a set of staged pipeline aggregations which write to a different collection which are used as source data for business insights analysis.
Across 8.4 billion transactional documents, I aim to create this aggregated data in a different collection by a set of individual services which output using $out to avoid any issues with the 16Mb document size for a single results set.
Am I being overly ambitious here expection MongoDb to be able to:
Store that much data in a collection
Aggregate and output the results of refreshed data to drive business insights in a separate collection for consumption by services which provide discrete aspects of a customer's business
Any feedback welcome, I want to understand where the limit is of using MongoDb as opposed to other technologies for quantity data storage and use.
Thanks in advance
There is no limit on how big collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with maximum collection size after reaching which it cannot be sharded.
MongoDB Docs: Sharding Operational Restrictions
For the amount of data you are planning to have it would make sense to go with a sharded cluster from the beginning.