Is there a way to test an M10 Atlas cluster on MongoDB Atlas?

We have an M10 cluster and the official page states that we get a max of 100 IOPS.
I can't run mongoperf against the cluster, as we only have mongo shell and Compass access, and mongoperf needs to be run on the instance that has MongoDB installed.
Is there any way to test the maximum requests per second that this cluster can handle and if not, is there any rough estimate available as to how many read/write operations it can handle concurrently?
PS: Assume the queries being run aren't particularly complex and only insert small sets of data such as name, email address, age, etc.
Thanks in advance!

Is there any way to test the maximum requests per second that this cluster can handle and if not, is there any rough estimate available as to how many read/write operations it can handle concurrently?
The answer to this really depends on a lot of variables, e.g. document sizes, index utilisation, network latency between the application and the servers, etc.
To provide a rough estimate, however, assuming your MongoDB cluster is hosted on AWS (GCP and Azure would be different), the specs would be:
M10, 2GB RAM and 10GB included storage.
In addition to this, you can select different EBS storage sizes as well as provisioned IOPS to match your required performance.
See also MongoDB Atlas: FAQ
We have an M10 cluster and the official page states that we get a max of 100 IOPS.
The number of IOPS advertised is what is provided by the cloud provider, i.e. AWS. It does not take into account network latency or your database usage, both of which also affect server resources such as CPU and RAM.
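If you want an empirical number rather than an estimate, one option is to run a small load test from a client machine using your driver of choice. Below is a minimal sketch using Python and pymongo; the connection string, database, and collection names are placeholders, and the measured rate will include the network latency between your client and the Atlas cluster.

```python
# Minimal client-side write-throughput sketch against an Atlas cluster.
# Connection string, database, and collection names are placeholders.
import time
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net/")
coll = client["loadtest"]["people"]

# Small documents, roughly matching the "Name, Email Address, Age" shape.
docs = [{"name": f"user{i}", "email": f"user{i}@example.com", "age": i % 80}
        for i in range(10_000)]

start = time.time()
coll.insert_many(docs, ordered=False)   # bulk insert of small documents
elapsed = time.time() - start
print(f"{len(docs) / elapsed:.0f} inserts/sec (includes network latency)")
```

Running the same loop with find queries (and from several client processes) gives a rough read-side figure; it is a ballpark measurement, not a controlled benchmark.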

Related

Do mongodb databases within a cluster share the same node set under the hood?

The reason I am asking is that I have a resource-intensive collection that degrades the performance of its entire database. I need to decide whether to migrate other collections away to a different database within the same cluster or to a different cluster altogether.
The answer, I think, depends on the under-the-hood implementation. Does a poorly performing collection take resources only from its own database, or from the cluster as a whole?
Hosted on Atlas.
I would suggest first looking at your logical and schema design and trying to optimize it, but if that does not work:
"In MongoDB Atlas, all databases within a cluster share the same set of nodes (servers) and are subject to the same resource limitations. Each database has its own logical namespace and operates independently from the other databases, but they share the same underlying hardware resources, such as CPU, memory, and I/O bandwidth.
So, if you have a resource-intensive collection that is degrading performance for its entire database, migrating other collections to a different database within the same cluster may not significantly improve performance if the resource bottleneck is at the cluster level. In this case, you may need to consider scaling up the cluster or upgrading to a higher-tier plan to increase the available resources and improve overall cluster performance."
Reference: https://www.mongodb.com/community/forums/t/creating-a-new-database-vs-a-new-collection-vs-a-new-cluster/99187/2
The term "cluster" is overloaded. It can refer to a replica set or to a sharded cluster.
A sharded cluster is effectively a group of replica sets with a query router.
If you are using a sharded cluster, you can design a sharding strategy that will put the busy collection on its own shard, the rest of the data on the other shard(s), and still have a common point to query them both.
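As a rough illustration of that strategy (not a drop-in solution), a zone-sharding sketch with pymongo admin commands might look like the following; the shard names, zone names, database, collection, and shard key are all hypothetical.

```python
# Sketch: isolate a busy collection on its own shard using zone sharding.
# Shard names, zone names, namespaces, and the shard key are hypothetical.
from pymongo import MongoClient
from bson.min_key import MinKey
from bson.max_key import MaxKey

admin = MongoClient("mongodb://mongos.example.net:27017/").admin

# Tag shards: one zone for the busy collection, one for everything else.
admin.command("addShardToZone", "shardA", zone="hot")
admin.command("addShardToZone", "shardB", zone="cold")

# Shard the busy collection and pin its entire key range to the "hot" zone.
admin.command("enableSharding", "appdb")
admin.command("shardCollection", "appdb.busy_events", key={"customerId": 1})
admin.command("updateZoneKeyRange", "appdb.busy_events",
              min={"customerId": MinKey()},
              max={"customerId": MaxKey()},
              zone="hot")
```

The other collections can be zoned onto the remaining shard(s) the same way, so the busy workload no longer competes with them for hardware, while mongos still gives you a single point to query everything.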

Should I create a read replica or scale the DB instance?

I'm using AWS's RDS PostgreSQL DB.
I recently traced an application error to the database connections being maxed out. I temporarily scaled the instance to support more connections, since AWS's default behaviour is to use the instance's memory to calculate the maximum number of connections.
Most of the connections are due to clients reading data, so should I create a read replica instead of scaling the server? I'm thinking of this in terms of best practices, costs, and effort
It depends on how high the IOPS are, whether your app is read intensive, and whether you are also hitting the connection limit. With General Purpose (gp2) storage, baseline read IOPS scale directly with the EBS volume size (roughly 3 IOPS per GB).
So, if you are using General Purpose storage, first look at your Read IOPS: if they are on average higher than STORAGE_IN_GB * 3, go for a read replica.
Otherwise, a pure connection-limit issue can be fixed by scaling the instance up vertically.
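As a rough way to check this, you could pull the ReadIOPS metric from CloudWatch and compare it to the gp2 baseline. A minimal sketch with boto3, assuming a hypothetical DB instance identifier and a 100 GB volume:

```python
# Sketch: compare average ReadIOPS against the gp2 baseline (~3 IOPS per GB).
# The DB instance identifier and storage size are hypothetical.
from datetime import datetime, timedelta
import boto3

STORAGE_IN_GB = 100
cw = boto3.client("cloudwatch")

stats = cw.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ReadIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-postgres-instance"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Average"],
)

points = stats["Datapoints"]
avg_read_iops = sum(dp["Average"] for dp in points) / max(len(points), 1)
if avg_read_iops > STORAGE_IN_GB * 3:
    print(f"Avg ReadIOPS {avg_read_iops:.0f} exceeds baseline; consider a read replica.")
else:
    print(f"Avg ReadIOPS {avg_read_iops:.0f} within baseline; vertical scaling may be enough.")
```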

mongoDB architecture for scalable read-heavy app (constant writes)

My app runs a daily job that collects data and feeds it to MongoDB. This data is processed and then exposed via a REST API.
I need to set up a MongoDB cluster in AWS. The requirements:
Data will grow by about the same amount each day (about 50M records), so write throughput doesn't need to scale. Writes are triggered by a cron job at a certain hour. Objects are immutable (they won't grow).
Read throughput will depend on the number of users / traffic, so it should be scalable. Traffic won't be heavy in the beginning.
Data is mostly simple JSON; I need a couple of indices on some of the fields for fast querying / filtering.
What kind of architecture should I use in terms of replica sets, shards, etc.?
What kind of storage volumes should I use for this architecture (EBS, NVMe)?
Is it preferred to use more instances or to use RAID setups?
I'm looking to spend roughly $500 a month.
Thanks in advance
To set up the MongoDB cluster in AWS, I would recommend referring to the latest AWS Quick Start for MongoDB, which covers the architectural aspects and also provides CloudFormation templates.
For the storage volumes, you need to use EC2 instance types that support EBS instead of NVMe instance storage, since NVMe instance store is ephemeral: if you stop and start the EC2 instance, the data on the NVMe volume is lost.
For storage volume throughput, you can start with General Purpose volumes at a reasonable storage size, and only consider Provisioned IOPS if you run into limitations.
For high availability and fault tolerance, the CloudFormation template will create multiple instances (nodes) in the MongoDB cluster.
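For the "couple of indices" requirement, index creation is independent of the storage choice; a minimal sketch with pymongo, using hypothetical field names, could look like this:

```python
# Sketch: secondary indexes for fast filtering on a read-heavy collection.
# Field names ("category", "created_at", "source") are hypothetical examples.
from pymongo import MongoClient, ASCENDING, DESCENDING

coll = MongoClient("mongodb+srv://cluster.example.mongodb.net/")["appdb"]["records"]

# Compound index to support queries that filter by category and sort by recency.
coll.create_index([("category", ASCENDING), ("created_at", DESCENDING)])

# Single-field index for lookups by source.
coll.create_index([("source", ASCENDING)])
```

Since the objects are immutable and inserted in a daily batch, building or rebuilding indexes right after the cron job runs keeps the read path predictable for the rest of the day.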

Running MongoDB on heterogeneous cluster

We're currently trying to find a solution, where we can host tens of millions of images of various sizes on our website. We're evaluating Riak and MongoDB. Currently Riak looks very nice, but all the servers in the cluster should be homogeneous, because Riak treats each node equally.
It is a bit hard to find information about MongoDB regarding this, except for the fact that you can set a priority on the nodes. My questions are:
What is the consequence of creating a MongoDB cluster composed of machines with wildly varying specifications (cpu, diskspace, disk speed, amount and speed of memory, network speed)?
Can the MongoDB cluster detect and compensate for such differences automatically?
I'm also open for ideas for other solutions, where heterogeneous cluster would work nicely.
What is the consequence of creating a MongoDB cluster composed of machines with wildly varying specifications (cpu, diskspace, disk speed, amount and speed of memory, network speed)?
Each shard in MongoDB is its own mongod, an isolated MongoDB in itself.
It isn't tied to the performance of the others, so if you have a shard that requires more power, you can just upgrade its hardware and the job is done. MongoDB works well in environments like this and actively supports them.
Can the MongoDB cluster detect and compensate for such differences automatically?
It doesn't need to detect anything; it will compensate automatically and naturally, without any extra work in the mongos or config servers.
In fact, you will find you can even run different MongoDB versions on different shards.
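If you want to see which hosts back each shard in such a heterogeneous cluster, you can ask the mongos directly; a minimal sketch with pymongo follows (the mongos address is a placeholder):

```python
# Sketch: list the shards behind a mongos to see each shard's member hosts.
# The mongos address is a placeholder.
from pymongo import MongoClient

admin = MongoClient("mongodb://mongos.example.net:27017/").admin

for shard in admin.command("listShards")["shards"]:
    print(shard["_id"], "->", shard["host"])   # e.g. rs0 -> rs0/hostA:27018,hostB:27018
```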

How does mongodb replica compare with amazon ebs?

I am new to mongodb and amazon ec2.
It seems to me that Mongo replicas are there to: 1) avoid data loss and 2) make reads and serving faster.
In Amazon they have this EBS thing. From what I understand it is a global persistent storage, like Dropbox for instance.
So is there a need to have replicas if Amazon abstracts away the need for them with EBS?
Thanks in advance
Thomas
Let me clarify a couple of things.
EBS is essentially a SAN volume, if you are used to working with existing technologies. It can be attached to one instance at a time, but it still has limited I/O throughput. Using RAID can help maximize the I/O; provisioned IOPS can help you maximize the throughput.
Ideally, however, with MongoDB you want to have enough memory that indexes can be accessed entirely in memory; performance drops if the disk needs to be hit.
Mongo can use replicas, which are primarily used for failover and replication (you can send reads to a secondary, but all writes need to hit the primary), and sharding, which is used to split a dataset to increase performance. You will still need to do these things anyway, even if you are using EBS for storage.
Replicas are there not just for storage redundancy but also for server redundancy. What happens if your MongoDB server (which uses an EBS volume) suddenly disappears because, for example, the host on which it sits fails? You would need to do a whole bunch of work: clone a new instance to replace it, attach the volume to that instance, reroute traffic to it, etc. Mongo's replica sets mean you don't have to do that. They keep working even if one member fails, so you have essentially zero downtime.
Additionally, it's one more layer of redundancy. You can only trust EBS so far: what if AWS has a bug that erases your volume or makes it unavailable for an unacceptably long time? With replica sets you can replicate your data across availability zones, or even to a completely different cloud provider.
Replica sets also let you read from multiple nodes, so you can theoretically increase your read throughput once you've maxed out what the EBS connection gives you from one instance.
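Directing reads to secondaries is a client-side setting; a minimal sketch with pymongo is below (the connection string is a placeholder, and note that secondary reads can return slightly stale data):

```python
# Sketch: spread reads across replica-set members with a read preference.
# The connection string is a placeholder; secondary reads may be slightly stale.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://host1.example.net,host2.example.net,host3.example.net/?replicaSet=rs0",
    readPreference="secondaryPreferred",   # read from a secondary when available
)

coll = client["appdb"]["images"]
print(coll.count_documents({}))  # this read may be served by a secondary

# Writes always go to the primary, regardless of read preference.
coll.insert_one({"filename": "example.jpg", "size": 1024})
```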