MongoDB Atlas scaling with AWS regions

I have 6 EC2 instances in different AWS regions and a single MongoDB server on MongoDB Atlas (AWS). I am seeing latency issues when updating data in the database. What is the best way to scale my MongoDB server to serve my 6 EC2 instances?

One option is to set up VPC peering between your EC2 VPCs and MongoDB Atlas (AWS), since Atlas now supports VPC peering.
If you are connecting to MongoDB over the public-facing network, it may be slow.
Instances launched in a different AWS region can connect using inter-region VPC peering.
So if you are following the traditional approach of whitelisting IPs in ACLs, try this instead; it should reduce latency. Peer the VPCs of all regions (make sure their IP ranges do not conflict) and connect over private connections.
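To check for IP conflicts before peering, something like the following sketch may help. It only uses Python's standard `ipaddress` module; the VPC names and CIDR blocks are made up for illustration, so substitute your own.

```python
import ipaddress
from itertools import combinations


def find_cidr_conflicts(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    conflicts = []
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical CIDR blocks, for illustration only.
vpcs = {
    "atlas": "192.168.248.0/21",
    "us-east-1": "10.0.0.0/16",
    "eu-west-1": "10.1.0.0/16",
    "ap-south-1": "10.0.128.0/17",  # deliberately overlaps with us-east-1
}

print(find_cidr_conflicts(vpcs))
```

Any pair it reports would need one side re-addressed before the peering connection can route traffic correctly.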
EDIT : 1
There is no additional penalty specific to MongoDB for using it out of region, but most database protocols are not optimized for high-latency conditions. You might be much better off setting up read replicas in the other regions.
You can read this: https://www.mongodb.com/blog/post/optimizing-fast-responsive-reads-cross-region-replication-mongodb-atlas
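To see why round-trip latency dominates, here is a rough back-of-the-envelope sketch. The round-trip times (2 ms in-region, 150 ms cross-region), the 1 ms of server time per operation and the batch size are assumptions chosen for illustration, not measurements.

```python
def total_time_ms(num_operations, round_trip_ms, server_time_ms=1.0, batch_size=1):
    """Estimate wall-clock time for operations sent one batch at a time."""
    round_trips = -(-num_operations // batch_size)  # ceiling division
    return round_trips * round_trip_ms + num_operations * server_time_ms


# Assumed figures: 1000 updates, 2 ms in-region RTT, 150 ms cross-region RTT.
same_region = total_time_ms(1000, round_trip_ms=2)
cross_region = total_time_ms(1000, round_trip_ms=150)
cross_region_batched = total_time_ms(1000, round_trip_ms=150, batch_size=100)

print(same_region, cross_region, cross_region_batched)
```

Under these assumptions the network wait, not the database work, accounts for nearly all of the cross-region time, which is why cutting the number of round trips or moving the data closer matters more than a bigger server.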
EDIT : 2
If you can't deploy your database to multiple regions (for example, by using read replicas), then consider putting CloudFront in front of your application (website) to cache requests in the different regions.
It won't technically improve the latency between the application and the database, but your users will perceive the site as somewhat faster.

Related

What is the difference between a cluster and a replica set in MongoDB Atlas?

I'm taking the MongoDB University M103 course, where they gave a brief overview of what a cluster and a replica set are.
From my understanding, a cluster is a set of servers or nodes, while a replica set is a set of servers or nodes that all have a replication mechanism built in, providing access during downtime and faster read operations.
From that, it seems that a replica set is a specific type of cluster, but my confusion arises from MongoDB Atlas. In MongoDB Atlas one has to create a cluster; is that a replica set as well?
Are those terms interchangeable in all scenarios?

Replica Set
In MongoDB, a replica set is a group of MongoDB servers working together to provide redundancy (all the servers in the replica set hold the same data) and high availability (if a server goes down, the remaining servers can still fulfil requests). When you create a replica set, the recommended minimum is 3 servers. At any time there is at most one primary (reads and writes); the remaining members are called secondaries (reads only).
MongoDB Atlas Cluster
Atlas is a DBaaS, meaning a database as a service. It removes the burden of maintaining, scaling and monitoring MongoDB servers on premises, so that you can focus on your applications.
An Atlas MongoDB cluster is a set of configuration options you give to Atlas so that it can set up the MongoDB servers for you. Hence, a MongoDB replica set is one of the deployment types available in Atlas.
For example, while creating an Atlas cluster, you will be asked whether you want a replica set, a sharded cluster, etc., as well as which cloud provider to deploy on, your backup policy, the specs of your MongoDB hardware and more.
The keyword here is configuration. At the end of the day, you will have your MongoDB servers (replica set or not) up and ready.
Summary
MongoDB Cluster
A specific configuration of a set of MongoDB servers to provide specific features, such as replication or sharding.
MongoDB Replica Set
A MongoDB cluster set up to provide redundancy and high availability with an odd number of servers, 3 or more (3, 5, 7, etc.).
MongoDB Atlas Cluster
A high-level MongoDB cluster configuration that lets you set up a replica set or another type of MongoDB cluster, along with its location and performance range.
I would suggest playing with their web console; you will definitely see the difference.
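As a side note on the "odd number of servers" point, here is a small sketch of the arithmetic. It assumes every member is a voting member and that a replica set needs a strict majority of voters to elect a primary; the function names are mine.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the voters."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voters can be lost while a majority still remains."""
    return voting_members - majority(voting_members)


for n in range(3, 8):
    print(f"{n} members: majority {majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 3 to 4 members (or from 5 to 6) raises the required majority without raising the number of failures tolerated, which is the usual argument for odd sizes.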

How can I deploy Mongo database on AWS?

I am building my own web app, which requires a huge database. I want to build and manage my own MongoDB database on AWS rather than using MongoDB Atlas. Which would be more cost-effective? Should I go for MongoDB Atlas instead? What would be its advantages over my own database?

There are pros and cons for both approaches:
Running MongoDB on AWS
Pros:
Complete control over how you run the database and how resources are allocated on the server. This could even be together with an application server on the same EC2 instance depending on your traffic and load. This might help with cost saving if your database is huge but isn't likely to see much traffic.
Cons:
You will be responsible for ensuring database availability and applying security patches as and when they are available. You may also have to set up firewalls and protect the EC2 instance and database in other ways that would be trivial to do on a hosted service like Atlas.
Data sharding and clustering can be a real pain to manage by yourself.
Running on Atlas
Pros:
Completely managed service where you don't have to be concerned about performance optimization or scalability. You pay for the service and MongoDB takes care of the rest.
You can focus on building a great application instead of spending your time on administering the database and the EC2 instance on which the database runs.
Cons:
You will be constrained by the options offered by Atlas. For most use cases this should be fine, but if you really want a specific change, it would be difficult to implement if MongoDB doesn't already support it as part of Atlas.
Think of running your application on EC2 vs. buying an on-premises server and running your application on that.
Being a managed service, costs might also be higher if your database does not see much traffic.

Hosting it yourself: you can get one or more AWS EC2 instances (which are VMs), install and run MongoDB on them yourself and manage it however you want, making sure that you spin up more instances when the workload grows and that instances are up and running at all times for high availability.
Cost (high) - Management responsibilities (lots) - Full MongoDB functionality
MongoDB Atlas is a managed service, so you don't need to worry about management tasks such as scaling your database or maintaining high availability when one or more instances die. You pay a relatively low cost for it. It is run by MongoDB themselves on AWS, Azure and Google Cloud.
Cost (low) - Management responsibilities (some) - Full MongoDB functionality
AWS now also has its own MongoDB-compatible database called DocumentDB. This is also a managed database, so you don't need to worry about scalability, high availability, etc. It is only available on AWS, which makes it simple and convenient if you are already there.
Cost (low) - Management responsibilities (minimal) - Limited MongoDB functionality

mongo atlas or aws - Internal or External connection

I am currently working on my next project, which runs 100% on MongoDB.
My past projects used SQL + MongoDB with AWS RDS + AWS EC2, and I could connect the two over AWS internal IPs, which gave me a much faster connection.
Now for MongoDB there are a lot of fancy cloud services like mLab and MongoDB Atlas, which are actually cheaper than AWS.
My concern is that moving back to an external DB connection will be slower and consume more network resources than the internal connection to RDS.
Has anyone experienced such an issue? Maybe the difference isn't as big as I make it out to be, but I need it to be optimized.

This depends on your setup. Many of the "fancy" services also host stuff on AWS, so latency is minimal. Some even offer "private environments" or such, so you can hide your databases from public view.
The only thing left to care about is the amount of network traffic, but this will be your problem regardless of your database host. You can test this relatively easily (e.g. get a trial from one of the providers and test for throughput, or spin up your own MongoDB Docker cluster to use as a test) just to get an idea of the performance range you'll be in.
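A throughput test along those lines could be as simple as the sketch below. It times any callable you hand it; `fake_insert` is a stand-in I made up so the example runs on its own, and in a real test you would replace it with a function that performs one insert or update through your MongoDB driver against each candidate host.

```python
import time


def measure_throughput(operation, iterations=1000):
    """Run `operation(i)` repeatedly; return (ops_per_second, avg_latency_ms)."""
    start = time.perf_counter()
    for i in range(iterations):
        operation(i)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return iterations / elapsed, (elapsed / iterations) * 1000


# Stand-in for a real database call (hypothetical, in-memory only).
fake_store = {}


def fake_insert(i):
    fake_store[i] = {"n": i}


ops_per_sec, avg_ms = measure_throughput(fake_insert, iterations=10_000)
print(f"{ops_per_sec:.0f} ops/s, {avg_ms:.4f} ms per operation")
```

Running the same harness from your EC2 instance against an internal and an external host gives you comparable numbers instead of guesses.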

How to setup a MongoDB replica set in EC2 US-WEST with only two availability zones

We are setting up a MongoDB replica set on Amazon EC2 in the us-west-1 region.
This region only has two availability zones, though. My understanding is that MongoDB must have a majority to work correctly. If we create 2 servers in zone us-west-1b and one server in us-west-1c, this will not provide high availability if the entire us-west-1b zone goes down, right? How can high availability be achieved here? What is the recommended configuration?

Having faced a similar challenge, we looked at a number of possible solutions:
Put an arbiter in another region:
Secure the connection either by using a point-to-point VPN between the regions and routing the traffic across this connection,
or
give each server an Elastic IP and a DNS name and use some combination of AWS security groups, iptables and SSL to ensure connections are secure.
AWS actually has a whitepaper on this, though I'm not sure how old it is: http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
Alternatively, you could allow the application to fall back to a read-only state until your servers come back online (not the nicest of options, though).
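To make the majority reasoning concrete, here is a small sketch that checks which zone failures a given layout survives. It assumes every member (including an arbiter) has one vote and that a strict majority of votes is needed to keep a primary; the layouts and the "other-region" label are illustrative, not a recommendation for your exact topology.

```python
def survives_zone_loss(members_per_zone):
    """Map each zone to True if a voting majority remains when that zone fails."""
    total = sum(members_per_zone.values())
    needed = total // 2 + 1
    return {zone: (total - count) >= needed for zone, count in members_per_zone.items()}


# Layout from the question: 2 members in one zone, 1 in the other.
two_zones = {"us-west-1b": 2, "us-west-1c": 1}

# Hypothetical layout: one data member per zone plus an arbiter elsewhere.
with_arbiter = {"us-west-1b": 1, "us-west-1c": 1, "other-region (arbiter)": 1}

print(survives_zone_loss(two_zones))
print(survives_zone_loss(with_arbiter))
```

With only two locations, one of them always holds at least half of the votes, so its loss leaves no majority; a third location for the tie-breaking vote is what removes that single point of failure.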
Hope this helps

Does the data in MongoDB provisioned on EC2 get replicated during autoscaling?

To deploy a server on Amazon EC2, I want to keep the MongoDB master database on the EC2 instance itself. On average I would have around 5-6 EC2 instances running in parallel, scaled by an Amazon Auto Scaling group.
As the database is updated frequently and all instances are behind an Elastic Load Balancer, it is hard to predict which user's data is in which instance's database. With this approach, am I assured of data consistency in MongoDB across the instances while scaling up and down? If it is not a good approach, please suggest alternative ways of doing it.

When using Amazon autoscaling, new EC2 instances will be created from a root AMI (for example, one with an empty database).
As data is added to your database, that data is not synced back to the AMI. So when a second EC2 instance is launched due to a scaling event, that new EC2 instance will have its own blank database, because it will be based on the same root AMI (with the blank database).
The two databases will not know about each other and no syncing will occur. Also, at any time, any of the EC2 instances may be deleted due to a scale-down event, so any data on that instance may be lost.
Separate your web layer from the database layer: use autoscaling to scale your web layer, but don't use autoscaling for your data layer.
MongoDB has its own form of clustering for load balancing and high availability. Use it rather than rolling your own using autoscaling.
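A toy model of the problem, for illustration only: plain dictionaries stand in for databases and a round-robin iterator stands in for the load balancer. None of this is AWS or MongoDB code, and the class and function names are made up.

```python
from itertools import cycle


class AppInstance:
    """A web instance; `db` is whatever database it reads from and writes to."""

    def __init__(self, db):
        self.db = db

    def save(self, user, value):
        self.db[user] = value

    def load(self, user):
        return self.db.get(user)


def round_robin_demo(instances):
    balancer = cycle(instances)
    next(balancer).save("alice", "cart: 3 items")  # write lands on the first instance
    return next(balancer).load("alice")  # read lands on the second instance


# Each instance starts with its own blank local database (as from the AMI).
local = round_robin_demo([AppInstance({}), AppInstance({})])

# All instances talk to one shared, separately hosted database.
shared_db = {}
shared = round_robin_demo([AppInstance(shared_db), AppInstance(shared_db)])

print(local, "|", shared)
```

In the first case the second instance has never seen the write; in the second case every instance sees it, which is the behaviour you get once the data layer is pulled out of the autoscaled web tier.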

It is not standard practice to couple your web server with the database server. Here is what I would suggest.
Implement load balancing on your web servers as well as your MongoDB instances, so, for the sake of argument, you will have 4 web servers and 4 MongoDB servers.
To load balance your MongoDB servers, it is up to you whether to go with a master-slave tier in which each MongoDB server is a master as well as a slave (so that all instances have their data synced), or to look into sharding. Note that legacy master-slave replication was removed in MongoDB 4.0; replica sets are its replacement.