What is the precise definition of a "read" in MongoDB? - mongodb

In the docs it says "$0.10/million reads".
I have a collection with 1 million documents. I was testing some intensive queries by myself for just one day, and my account accumulated a charge of $26.
Not sure how I got there.
So I need to know the precise definition of a "read". Is it each time a document is accessed from disk?

At the time of writing https://www.mongodb.com/pricing reads:
Item: Read Processing Unit (RPU)
Description: Number of read operations* to the database
Pricing: $0.10 / million for the first 50 million per day*
* Sum of documents read divided by 4 KB and index bytes read divided by 256 bytes
If I read it correctly, 1 read is fetching up to 4 KB of a document or up to 256 bytes of index. For example, scanning a whole million documents in the first stage of an aggregation pipeline costs about $0.10 if your documents are around 4 KB each.
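To make that concrete, here is a rough back-of-the-envelope sketch based on my reading of the table above (per-document rounding up to 4 KB units is my own assumption; Atlas may meter it differently):

// Hypothetical RPU estimate based on the pricing table above.
// Assumption: each document read is billed as ceil(bytes / 4 KB),
// and index reads as ceil(bytes / 256).
function estimateRPUs(docSizesBytes, indexBytesRead) {
  const docUnits = docSizesBytes.reduce(
    (sum, bytes) => sum + Math.ceil(bytes / 4096), 0);
  const indexUnits = Math.ceil(indexBytesRead / 256);
  return docUnits + indexUnits;
}

// Example: a collection scan over 1,000,000 documents of ~4 KB each,
// with no index bytes read, comes to ~1,000,000 RPUs, i.e. about $0.10.
const rpus = estimateRPUs(new Array(1_000_000).fill(4096), 0);
console.log(rpus, "RPUs ≈ $" + ((rpus / 1_000_000) * 0.10).toFixed(2));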
I would strongly advise talking to Atlas support, though. Don't rely on my understanding of what's written there.

Related

When writing a single document to GCP Firestore, are you billed the same amount regardless of document size?

I'm deciding on a NoSQL database. I've noticed a surprising difference between AWS billing and GCP billing for their flagship NoSQL products.
AWS DynamoDB charges $1.25/million "WRUs," or Write Request Units. 1 WRU is billed for storing a document up to 1 KB in size. If you write a document that is larger than 1 KB, DynamoDB bills additional WRUs.
GCP Firestore charges $1.8/million "Document Writes." No mention is made of document size limitations, outside the limits page, which says that each document can be up to 1 MiB in size.
So, if I'm thinking about this correctly, if I stored 1 million 4KiB documents in DynamoDB, it would cost me 4 million WRUs, which adds up to $5. If I did the same in Firestore, it would only cost me 1 million writes, which is $1.8.
If I write 1 million 400KiB documents to DynamoDB, it would cost 400 million WRUs, which adds up to $500. But this same operation in Firestore would still be 1 million writes, which is still only $1.8.
I'm surprised by this. This large disparity in price is, in my experience, not common for cloud compute platforms.
When writing to GCP Firestore, are you billed the same amount regardless of document size?
Firestore charges for:
Storage data
Document write operations
Document read operations
Bandwidth consumed by read operations, aka Network egress
There is (as you noticed) no charge for the size of document writes, which indeed leads to favorable results in the comparison you make with writing relatively large documents.
In my experience, most write operations performed from client-side application code result in documents that are quite small (a few KB at most), though, so you'll want to validate your expected document size first and then compare again.
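To make the comparison concrete, here is a small sketch of the write-cost arithmetic using the prices quoted in the question (treat them as illustrative, not as current list prices):

// Write-cost comparison sketch using the prices quoted in the question.
const DYNAMO_PRICE_PER_MILLION_WRU = 1.25;        // $ per million WRUs
const FIRESTORE_PRICE_PER_MILLION_WRITES = 1.8;   // $ per million document writes

function dynamoWriteCost(docCount, docSizeKiB) {
  // DynamoDB bills 1 WRU per 1 KiB (rounded up) per document written.
  const wrus = docCount * Math.ceil(docSizeKiB);
  return (wrus / 1_000_000) * DYNAMO_PRICE_PER_MILLION_WRU;
}

function firestoreWriteCost(docCount) {
  // Firestore bills per document write, regardless of document size.
  return (docCount / 1_000_000) * FIRESTORE_PRICE_PER_MILLION_WRITES;
}

console.log(dynamoWriteCost(1_000_000, 4));    // 5    -> $5.00
console.log(firestoreWriteCost(1_000_000));    // 1.8  -> $1.80
console.log(dynamoWriteCost(1_000_000, 400));  // 500  -> $500.00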

Firestore reading quota

I have 1 collection in my Firestore database and there are 2000 test documents (records) in this collection. Firestore gives a free daily quota of 50,000 reads. When I run my JavaScript code to query documents, my reading quota decreases more than I expected. If I count all documents using one query, does that count as 2000 read operations or only 1 read operation?
Currently, Firestore doesn't have any native support for aggregate queries over documents, such as the sum of some fields or even a count of documents.
So yes, when you count the total number of documents in the collection, you are actually first fetching at least the references for those docs.
So, having 2000 documents in a collection and using a query to count the number of docs in that collection, you are actually doing 2000 reads.
To accomplish what you want, you can also take a look at this: https://stackoverflow.com/a/49407570
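As an illustration of the difference (a sketch using the modular Firestore JS SDK; the "items" collection and the "counters/items" document are hypothetical names), counting by fetching the collection bills one read per document, while a maintained counter document, as in the linked answer, costs a single read:

// Sketch only: "items" and "counters/items" are placeholder names.
import { getFirestore, collection, getDocs,
         doc, getDoc, updateDoc, increment } from "firebase/firestore";

const db = getFirestore();

// Approach 1: fetch every document and count client-side.
// This bills one read per document (2000 docs => 2000 reads).
async function countByFetching() {
  const snapshot = await getDocs(collection(db, "items"));
  return snapshot.size;
}

// Approach 2: maintain a counter document that you update on every
// write/delete; counting then costs a single document read.
async function bumpCounter(delta) {
  await updateDoc(doc(db, "counters", "items"), { count: increment(delta) });
}

async function countFromCounterDoc() {
  const counterSnap = await getDoc(doc(db, "counters", "items"));
  return counterSnap.data().count;
}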
The Firebase Spark free plan gives you:
1 GiB total - size of data you can store
10 GiB/month - network egress (egress in networking means traffic that exits a network boundary, while ingress is traffic that enters it); in short, the network bandwidth on the database
20K/day writes
50K/day reads
20K/day deletes
Whether reading 2000 documents costs 2000 reads depends on how you read them: each document fetched from the server counts as one read.
The Firebase Console also consumes some reads and writes, which is why your quota decreases more than you expected.

Using nested document structure in mongodb

I am planning to use a nested document structure for my MongoDB schema design, as I don't want to go for a flat schema design; in my case I will need to fetch my result in one query only.
However, MongoDB has a size limit for a document.
MongoDB Limits and Threshold
A MongoDB document has a size limit of 16 MB (an amount of data). If your subcollection can grow without limits, go flat.
I don't need to fetch my nested data; I only need it for filtering and querying purposes.
I want to know whether I will still be bound by MongoDB's size limit even if I use my embedded data only for querying and filtering and never fetch the nested data, because, as per my understanding, in this case MongoDB won't load the complete document into memory but only the selected fields.
Nested schema design example
{
  clinicName: "XYZ Hospital",
  clinicAddress: "ABC place.",
  doctorsWorking: {
    doctorId1: {
      doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
    },
    doctorId2: {
      doctorJoined: ISODate("2017-04-15T10:47:47.647Z")
    },
    doctorId3: {
      doctorJoined: ISODate("2017-05-15T10:47:47.647Z")
    },
    ...
    ...
    // up to 30000-40000 more entries, say
  }
}
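For illustration, this is the kind of query I have in mind (a mongosh sketch; the clinics collection name is just for this example): filter on the embedded doctorsWorking keys, but project only the top-level fields.

// Filter on the nested data, but fetch only the top-level fields.
db.clinics.find(
  { "doctorsWorking.doctorId1": { $exists: true } },   // filtering on embedded data
  { clinicName: 1, clinicAddress: 1 }                  // projection: top-level fields only
)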
I don't think your understanding is correct when you say "because as per my understanding, in this case, MongoDB won't load the complete document in memory but only the selected fields?".
If we look at the MongoDB docs, they read:
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So the clear limit is 16 MB on document size. MongoDB will stop you from saving a document that is greater than this size.
If I go along with your understanding for a while and say that MongoDB allows saving a document of any size but won't load more than 16 MB into RAM, then on the other hand, while storing the data it has no idea what queries will be run on it later. So you could end up inserting big documents that can never be used (because when inserting we don't declare the query pattern; we could even try to fetch the full document in a single shot later).
If the limit were on transmission (hypothetically), then there are lots of ways (via code) for software developers to bring data into RAM in chunks without ever crossing the 16 MB limit (that's how they do I/O operations on large files). They would make a mockery of the limit and render it useless. I expect MongoDB's creators knew this and didn't want it to happen.
Also, if the limit were on transmission, there would be no need for separate collections. We could put everything in a single collection and just write smart queries to fetch the data; if the fetched data crossed 16 MB, we would fetch it in parts and ignore the limit. But it doesn't work that way.
So the limit must be on document size, or else it would create many issues.
In my opinion, if you just need the "doctorsWorking" data for filtering or querying purposes (and if you also think "doctorsWorking" will push the document past the 16 MB limit), then it's better to keep it in a separate collection.
Ultimately, everything depends on the query and data pattern. If a doctor can serve in multiple hospitals in shifts, it would be better to keep doctors in a separate collection.
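If you do split it out, here is a minimal sketch of what the separate collection could look like (collection, field, and id values are only illustrative):

// One small document per doctor/clinic assignment, so the clinic document
// stays small and the 16 MB limit is no longer a concern.
db.doctorAssignments.insertOne({
  doctorId: "doctorId1",
  clinicId: ObjectId("64f0c0a1b2c3d4e5f6a7b8c9"),     // reference to the clinic document
  doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
})

// Filtering/querying is then done against these small documents,
// ideally backed by a compound index.
db.doctorAssignments.createIndex({ clinicId: 1, doctorId: 1 })
db.doctorAssignments.find({
  clinicId: ObjectId("64f0c0a1b2c3d4e5f6a7b8c9"),
  doctorId: "doctorId2"
})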

MongoDB aggregation performance capability

I am trying to work through some performance considerations about using MongoDb for a considerable amount of documents to be used in a variety of aggregations.
I have read that a collection has a 32 TB capacity, depending on the sizes of the chunks and shard key values.
If I have 65,000 customers who each supply us with (on average) 350 sales transactions per day, that ends up being about 22,750,000 documents created daily. When I say a sales transaction, I mean an object which is like an invoice with a header and line items. Each of my documents averages 2.60 KB.
I also receive some other data from these same customers, like account balances and products from a catalogue. I estimate about 1,000 product records active at any one time.
Based upon the above, I approximate 8,392,475,000 (8.4 billion) documents in a single year, with a total of 20,145,450,000 KB (18.76 TB) of data being stored in a collection.
Based upon the capacity of a MongoDB collection of 32 TB (34,359,738,368 KB), I believe it would be at 58.63% of capacity.
I want to understand how this will perform for different aggregation queries running on it. I want to create a set of staged pipeline aggregations which write to a different collection which are used as source data for business insights analysis.
Across 8.4 billion transactional documents, I aim to create this aggregated data in a different collection via a set of individual services which output using $out, to avoid any issues with the 16 MB document size limit for a single result set.
Am I being overly ambitious here expecting MongoDB to be able to:
Store that much data in a collection
Aggregate and output the results of refreshed data to drive business insights in a separate collection for consumption by services which provide discrete aspects of a customer's business
Any feedback is welcome; I want to understand where the limit is in using MongoDB, as opposed to other technologies, for storing and using this quantity of data.
Thanks in advance
There is no limit on how big a collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with the maximum collection size beyond which it can no longer be sharded.
MongoDB Docs: Sharding Operational Restrictions
For the amount of data you are planning to have it would make sense to go with a sharded cluster from the beginning.
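As a starting point, here is a mongosh sketch of the kind of staged pipeline with $out that the question describes (collection, field, and shard-key names are hypothetical, and $dateTrunc needs MongoDB 5.0+):

// Shard the source collection up front, e.g. on a hashed customer key.
sh.shardCollection("sales.salesTransactions", { customerId: "hashed" })

// Roll up daily totals per customer into a separate collection that the
// business-insight services read, so no single 16 MB result document is needed.
db.salesTransactions.aggregate([
  { $match: { transactionDate: { $gte: ISODate("2023-01-01"), $lt: ISODate("2023-02-01") } } },
  { $group: {
      _id: {
        customerId: "$customerId",
        day: { $dateTrunc: { date: "$transactionDate", unit: "day" } }
      },
      totalAmount: { $sum: "$header.totalAmount" },
      transactionCount: { $sum: 1 }
  } },
  { $out: "dailyCustomerSales" }   // write the results to another collection
], { allowDiskUse: true })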

Maximum size of bulk create (or update) for Cloudant?

When using bulk operations with Cloudant, is there a "hard" limit (size of all documents / number of documents)?
Also: is there a best-practice setting (size of all documents / number of documents per request)?
I understand there is a 65 MB limit on the size of individual documents in Cloudant. Having said that, I would try to avoid getting anywhere near that size of document.
A rule of thumb would be: if the size of your documents is over a few tens of kilobytes, you might be better off creating more documents and retrieving them using a view.
In terms of bulk operations, I tend to use batches of 500 documents. Bulk operations are a much more efficient way of transferring data between your client software and Cloudant and a 500 document batch size (as long as your document size is reasonable) is a good rule of thumb.
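As a hedged sketch of what 500-document batches against the standard _bulk_docs endpoint could look like in Node.js (the database URL, credentials, and document shape are placeholders):

// Send documents to Cloudant/CouchDB in batches of 500 via POST /{db}/_bulk_docs.
const BATCH_SIZE = 500;
const DB_URL = "https://ACCOUNT.cloudant.com/mydb";   // placeholder

async function bulkInsert(docs) {
  for (let i = 0; i < docs.length; i += BATCH_SIZE) {
    const batch = docs.slice(i, i + BATCH_SIZE);
    const response = await fetch(`${DB_URL}/_bulk_docs`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // placeholder credentials; use IAM or an API key in practice
        "Authorization": "Basic " + Buffer.from("USER:PASSWORD").toString("base64")
      },
      body: JSON.stringify({ docs: batch })
    });
    // The response is an array with per-document success/conflict entries.
    console.log(await response.json());
  }
}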
There is no set number for how many documents you can update in bulk, but there is a size limit of 1 MB for the whole bulk request object. If the requested data is more than 1 MB, the request will be rejected.
When I tested this myself with JSON objects of 12 fields, it took around 2K documents to reach the 1 MB size, but this can vary depending on how small or large your content is.
See "Rule 14: Use the bulk API" for more information.