I'm new to AWS, and I have configured 2 EC2 instances:
one for my MongoDB database and another one for my application.
I'm using pymongo to make the connection. But if I send data between the instances each time, it takes too much time. I would like to know if it's possible to have the MongoDB instance act as localhost for the application one, using groups or some other mechanism I don't know about, to get better performance.
Or whether it is better to put the database on the same instance as my application and get more EBS.
Be sure you know where your performance bottleneck is.
If both instances are in the same Availability Zone, network latency should not be the largest performance issue. In fact if you have instances that are at least large... due to the better NIC... network latency should be a non-issue.
To know for sure, measure your network utilization with a monitoring tool.
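One rough way to check whether the network is the problem is to time a cheap round trip from the application instance to the database instance. The sketch below is only an illustration: `measure_round_trips` is a hypothetical helper, and the address in the comment is made up. It times any callable, so with pymongo you could pass a lambda that runs the `ping` command.

```python
import statistics
import time
from typing import Callable, Dict


def measure_round_trips(op: Callable[[], object], samples: int = 50) -> Dict[str, float]:
    """Call `op` repeatedly and return latency statistics in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        op()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in operation so the sketch runs without a database.
    # With pymongo you would pass something like (hypothetical address):
    #   client = pymongo.MongoClient("mongodb://10.0.0.12:27017")
    #   measure_round_trips(lambda: client.admin.command("ping"))
    print(measure_round_trips(lambda: sum(range(1000)), samples=20))
```

If the median ping is well under a millisecond or two while your queries take far longer, the time is probably being spent in the database (or on disk), not on the network.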
If any of your working set (MongoDB documents that are used with any frequency) cannot fit in RAM of the instance, that means you are touching EBS. EBS is very, very slow compared to what MongoDB needs. I measured a single EBS volume using iozone recently and found the EBS volume to be half as fast as my laptop's rotational hard drive.
You can improve EBS performance substantially by striping multiple EBS volumes into a software RAID configuration.
The bottom line when running MongoDB on AWS is that you need enough RAM to hold the MongoDB documents that you will touch with any frequency.
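As a back-of-the-envelope check of that rule, the sketch below compares an estimated working set against the instance's RAM. The function name and the 25% share reserved for the OS and other processes are assumptions for illustration, not MongoDB recommendations; plug in your own numbers.

```python
def working_set_fits(working_set_gb: float, ram_gb: float,
                     reserved_fraction: float = 0.25) -> bool:
    """Return True if the working set fits in the RAM left over after
    reserving `reserved_fraction` for the OS and other processes."""
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    usable_gb = ram_gb * (1 - reserved_fraction)
    return working_set_gb <= usable_gb


if __name__ == "__main__":
    # Hypothetical numbers: a 5 GB working set on an 8 GB instance.
    print(working_set_fits(5, 8))
    # The same instance with a 7 GB working set would start touching EBS.
    print(working_set_fits(7, 8))
```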
I have an application in production that uses a mongodb instance on the same machine as the web server. Works fine for me but then I don't have need for scalability right now. One instance is enough.
So to answer your question, sure you can run it as localhost.
But if your app picks up and you need multiple instances or sharding or such then you'd have to have instances deployed on other machines as well.
I have multiple web apps that use MongoDB Atlas as their database.
In Atlas, you can create Clusters that hold multiple databases.
For every web app, I usually need one database. However, I am not sure if I should create one cluster for every web app or only one cluster in total holding one database for every web app. Is there a better choice?
If I see right, then MongoDB's business model is to limit the free clusters capacities, which means that it would be better to create a free cluster for every web app, since otherwise the capacity of one cluster is consumed very quickly.
If I see right, then MongoDB's business model is to limit the free clusters capacities, which means that it would be better to create a free cluster for every web app
If this is correct (which seems to me like it is) then creating separate clusters per application is a good idea.
Once you are paying for your databases, it may be cheaper to put multiple databases in the same cluster (since you'll have less overhead per database).
A reason to use separate clusters per application when you are paying for databases is additional security/resilience to accidental database wipes.
I currently have a small website hosted on AWS.
The server is a micro-instance.
On this micro-instance:
I am running nginx to serve static files and error pages
I am running my node server
I am storing my mongoDB
As the website is getting more traffic, I have reached the point where I need to scale, and I am not sure what the best practices are or what the implications of each are.
I would love any referrals to reading materials
I was thinking of having:
2 dedicated micro-instances to run the website
1 micro-instance running nginx
1 micro-instance storing the db
questions:
Would having the db stored on a separate machine make the queries significantly slower?
Should I in fact store the db on S3 instead?
Is it justified to have an entire instance for nginx alone?
How would you go about scaling from 1 machine to multiple ones? I am guessing moving from one to two is harder than moving from two to 50.
Any advice will be greatly appreciated!
Would having the db stored on a separate machine make the queries significantly slower?
No, the speed impact would be very minimal, and this would be needed for scalability anyway. Just make sure you use the private IP addresses of your instances for any inter-instance communication so that the traffic stays inside your VPC (for both security and performance reasons).
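A small guard can catch the mistake of pointing the app at a public address. This is a minimal sketch using only the standard library; `mongo_uri_for_private_host` is a hypothetical helper and the addresses are made up. Note that `is_private` also accepts loopback and link-local addresses.

```python
import ipaddress


def mongo_uri_for_private_host(host: str, port: int = 27017) -> str:
    """Build a MongoDB URI, refusing hosts that are not private IP addresses."""
    address = ipaddress.ip_address(host)  # raises ValueError if not an IP
    if not address.is_private:
        raise ValueError(f"{host} is not a private address")
    return f"mongodb://{host}:{port}/"


if __name__ == "__main__":
    # Hypothetical private address of the database instance.
    print(mongo_uri_for_private_host("10.0.1.25"))
```

The resulting string is what you would hand to your MongoDB driver's connection call.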
Should I in fact store the db on S3 instead?
No, that wouldn't work at all. You can't store a DB on S3, only DB backups.
Is it justified to have an entire instance for nginx alone?
If you are getting enough traffic, then yes absolutely.
How would you go about scaling from 1 machine to multiple ones?
In general you need to move your DB to a separate server, create multiple instances of your web server, and place a load balancer in front of them. If you want automatic scaling based on traffic then you would also place the web servers in an auto-scaling group. If all this sounds difficult then I would recommend looking into moving your web servers into Elastic Beanstalk which will manage much of this for you.
If your database is a bottleneck then you might also need to set up a MongoDB cluster and balance the load across the cluster. You could also move your DB to something like mlab, which would greatly ease the management of that as well.
Currently, I can dynamically increase or decrease the number of app servers with AWS ELB (just by monitoring the CPU load).
However, all of the data is stored in MongoDB on one machine with 2 GB of RAM, and the data is constantly being updated.
It cannot easily be scaled under a burst of incoming traffic.
Vertical scaling won't work because the server will be out of service for a few minutes.
Creating a new DB machine doesn't sound like it would work either, because the newly created machine wouldn't have the up-to-date data.
How could I design the DB infrastructure to handle this kind of dynamic load?
Most of the time, there are only about 20 members on my site. Nevertheless, at some particular moment, there will be about 1500 members on my site.
Thanks
You should look into the topic of replica sets to enable vertical scaling, and sharded deployments to enable horizontal scaling.
These topics are introduced nicely on page 10 of the following document -
https://d0.awsstatic.com/whitepapers/AWS_NoSQL_MongoDB.pdf
Both of these features are somewhat complex and require intimate knowledge of MongoDB to work well. If you want an out-of-the-box solution, you can run your DB on a separate service outside AWS. We are using compose.io for this. It satisfies our needs during peak hours and isn't that expensive.
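If you do go the replica set route, the application side mostly comes down to listing several members in the connection string. The sketch below builds such a URI; `replicaSet` and `readPreference` are standard MongoDB connection string options, while the helper name, host addresses, and set name `rs0` are made up for illustration.

```python
from typing import Sequence
from urllib.parse import quote_plus


def replica_set_uri(hosts: Sequence[str], replica_set: str,
                    read_preference: str = "primary") -> str:
    """Build a MongoDB URI that lists several replica set members."""
    if not hosts:
        raise ValueError("at least one host is required")
    return (
        "mongodb://" + ",".join(hosts)
        + "/?replicaSet=" + quote_plus(replica_set)
        + "&readPreference=" + quote_plus(read_preference)
    )


if __name__ == "__main__":
    # Hypothetical members, one per availability zone.
    print(replica_set_uri(["10.0.1.10:27017", "10.0.2.10:27017", "10.0.3.10:27017"], "rs0"))
```

With a URI like this the driver discovers which member is currently primary, so one member can be taken down for resizing while the others keep serving.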
I have to setup a server environment for a web application. I have to use aws, and so far it looks good for that purpose.
I need:
a scalable Tomcat 7 webapp server
session replication!
a mongodb database cluster(?)
As far as I think it could work with:
The scalable Tomcat 7 I can do easily with elastic beanstalk.
The session replication could work with elasticache
It seems like I have to do the mongodb cluster "manually", so I created some ec2 instances to do so.
I have some problems with that.
the costs would be quite high. The minimum setup would be 2 ec2 instances and one for the elasticache
The only thing that autoscales is Elastic Beanstalk, which means that I have to take care of scaling the rest myself, too. (Well, for the mongodb instances, I could use a balancer, too)
In case of the mongodb ec2 instances, I need to setup each instance by myself
Do you have any idea how to:
lower the costs (Especially in the beginning, it would be a little much, no?)?
make the administration easier?
If you install mongo, have a look at: AWS_NoSQL_MongoDB.pdf. It is very customary to have one member of the replica set outside the availability zone you use, or even outside of the same region, so you can have a hot backup in case of a failure.
About prices - experiment (load test) and find the smallest instance type that fits your needs. Also, remember to shut down unused instances.
About management - there is the AWS console and many 3rd party products. Also, Netflix has released some nice management tools.
Autoscaling for your web server is easily done with Elastic Beanstalk, but you can use it independently. Check out the documentation for autoscaling here: http://aws.amazon.com/autoscaling/
It has a couple of features that will help you save most of your computation costs:
one is the ability to scale out (and most importantly in), by automatically changing the number of web servers you are using, based on your load. You can define the minimum number (for example, 1) and the maximum number. The system can watch a set of metrics (number of requests, CPU load...) and decide when to add or subtract instances.
the second is the ability to change the scaling policy and increase or decrease the size of the machines, based on your usage. You can use medium size instances and switch to large or extra large ones, if you find it more cost effective. You are encouraged to try out different sizes to see what fits you best.
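To make the scale-out/scale-in idea concrete, here is a toy version of the decision such a policy makes. It is not the AWS API; the function name, the CPU thresholds, and the one-instance step are assumptions chosen for illustration.

```python
def desired_instance_count(current: int, avg_cpu_percent: float,
                           min_instances: int = 1, max_instances: int = 4,
                           scale_out_above: float = 70.0,
                           scale_in_below: float = 25.0) -> int:
    """Return the instance count a simple CPU-based policy would ask for."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # busy: add one web server
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # idle: remove one web server
    else:
        target = current       # inside the comfortable band: do nothing
    # Never go below the minimum or above the maximum.
    return max(min_instances, min(max_instances, target))


if __name__ == "__main__":
    print(desired_instance_count(current=2, avg_cpu_percent=85.0))
    print(desired_instance_count(current=1, avg_cpu_percent=10.0))
```

Scaling in is where the savings come from: with a minimum of 1, a quiet site pays for a single web server most of the time.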
Using ElastiCache can help you both with session replication and with lowering the load on your DB machine. You can cache the output of your frequent queries there (front page, main category page...), get better performance, and use fewer DB instances. It supports Memcached clients, which makes it very easy to develop against in almost any programming language.
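The caching idea is usually implemented as "look in the cache first, fall back to the database, then store the result for a while". The sketch below shows that pattern with an in-process dictionary standing in for the cache server; `TTLCache` and `get_or_load` are hypothetical names, and a real setup would talk to Memcached through a client library instead.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a cache server, with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling `loader` only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit
        value = loader()                         # miss: run the expensive query
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    # The lambda stands in for a database query that renders the front page.
    print(cache.get_or_load("front_page", lambda: "<html>front page</html>"))
```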
You should check Couchbase instead of MongoDB (see comparison). It is more robust and more reliable in scale.
Elastic Beanstalk is the best way to reduce the management overhead for web/app servers. However, if you haven't taken the development too far, I would recommend using DynamoDB to ease administration.
Keep an eye on cost as well, though. Performance and manageability are really good in DynamoDB.
Although I have a lot of experience writing code, I don't have much experience deploying things. I am writing a project that uses MongoDB for persistence, Redis for meta-caching, and Play for serving pages. I am deciding between buying a dedicated server and buying multiple small/medium instances from Amazon/Linode (one each for Mongo, Redis, and Play). I have listed the trade-offs below, and I wonder if anyone can add to the list or provide further insights. I am leaning toward (B), buying two sets of instances from Linode and Amazon, so that if one of them has an outage it will fail over to the other provider. Also, if anyone has any tips or tools for deploying a Scala/Maven cluster, they would be much appreciated.
A. put everything in one instance
Pros:
faster speed between database and page servlet (same host).
cheaper.
less end points to secure.
Cons:
harder to manage. (in my opinion)
harder to upgrade a single module. if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between module.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
secure each connection and end point.
I can only comment about the technical aspects (not cost, serviceability, etc ...)
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box as the Play server. Redis will use memory, but little CPU. Play will use CPU, but not much memory. So it seems a good fit. Also, I'm not sure it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can achieve about 50% more throughput for free.
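To show what "unix domain socket instead of TCP loopback" means at the wire level, here is a minimal sketch in Python rather than Play. It encodes a command in the Redis protocol (RESP) and sends it over an `AF_UNIX` socket. It assumes Redis has been configured with a `unixsocket` path; the path in the example is made up, and in practice you would let a Redis client library do this for you.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    chunks = [b"*" + str(len(parts)).encode() + b"\r\n"]
    for part in parts:
        data = part.encode()
        chunks.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(chunks)


def ping_over_unix_socket(path: str) -> bytes:
    """Send PING over a unix domain socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(path)  # a filesystem path, no TCP stack involved
        conn.sendall(encode_command("PING"))
        return conn.recv(64)


if __name__ == "__main__":
    print(encode_command("PING"))
    # Requires a running Redis with a unix socket at this hypothetical path:
    # print(ping_over_unix_socket("/tmp/redis.sock"))
```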