Do I really need a VPC if I can use AWS security groups to secure my MongoDB EC2 instance? - mongodb

I am stuck deciding whether I really need a VPC to deploy my MongoDB instance (and a GraphQL server) on AWS. I'm working on a project that will have a GraphQL server to serve a mobile app, along with a MongoDB instance to store the data. I've read everywhere that you must use a VPC, but why? Can't I use the security groups that AWS provides? They would let me lock down my MongoDB instance, right?
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html
The reason I don't want to use a VPC is purely the extra cost. The project I'm working on has a small budget, and paying the extra money (min $60 a month) for the VPC on AWS just isn't viable. Maybe if I were building a massive application with tens of thousands of users that required scale and added security for peace of mind, I'd consider using a VPC. But since it's not going to be that, and the budget is small, is it okay to use security groups to lock down my MongoDB EC2 instance?
I've looked into other hosting solutions, in particular DigitalOcean, as they provide a free VPC service. However, DigitalOcean does not have data centers in my region (amongst other things), and I've used AWS a fair bit in the past and would love to keep using it.
I would love any suggestions about what I could/should do.

Security groups are a feature of VPCs and are tightly coupled with how EC2 instances are hosted. You need a VPC to define your networking rules including if your instances that host the MongoDB and GraphQL servers are public/private and what their security group rules are.
I'm not sure what costs you are referring to, as VPCs are free and every account comes with a VPC already created for you (the default VPC). You pay only for the data transfer you use, so if you aren't doing anything massive, that cost will be tiny (a few cents per GB) compared to the cost of the instances that host your servers.
To address your comment: a NAT Gateway is only needed if you want your instances on private subnets but still want those subnets to have outbound internet access. It is not required if you are comfortable putting your instances on public subnets and locking them down with security group and NACL rules (this is not the best security practice, but it is a compromise you can make to save on costs).
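As a rough sketch of that kind of lockdown, the Python below builds an ingress rule that admits MongoDB traffic (port 27017) only from the security group of the GraphQL server, plus a small check that a rule set is not open to the whole internet. The group IDs are placeholders, and `apply_rule` assumes the third-party boto3 SDK and valid AWS credentials; treat it as illustrative rather than a complete setup.

```python
MONGODB_PORT = 27017  # default MongoDB port


def mongodb_ingress_rule(app_security_group_id):
    """Build IpPermissions allowing MongoDB traffic only from one security group."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": MONGODB_PORT,
        "ToPort": MONGODB_PORT,
        # Reference the app server's group instead of an IP range.
        "UserIdGroupPairs": [{"GroupId": app_security_group_id}],
    }]


def is_open_to_world(ip_permissions):
    """Return True if any rule admits every IPv4 or IPv6 address."""
    for rule in ip_permissions:
        if any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])):
            return True
        if any(r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", [])):
            return True
    return False


def apply_rule(db_security_group_id, app_security_group_id):
    """Apply the rule with boto3 (assumed installed and configured)."""
    import boto3  # third-party; imported lazily so the helpers above stay stdlib-only

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=db_security_group_id,
        IpPermissions=mongodb_ingress_rule(app_security_group_id),
    )
```

Referencing the GraphQL server's security group rather than its IP address means the rule keeps working if the instance is replaced.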

Related

Cloud Formation Template design

What factors do folk take into account when deciding to write 1 large CF template, or nest many smaller ones? The use case I have in mind is RDS based where I'll need to define RDS instances, VPC Security groups, parameter and option groups as well as execute some custom lambda resources.
My gut feel is that this should be split, perhaps by resource type, but I was wondering if there was generally accepted practice on this.
My current rule of thumb is to split resources by deployment units - what deploys together, goes together.
I want to have the smallest deployable stack, because it's fast to deploy or fail if there's an issue. I don't follow this rule religiously. For example, I often group Lambdas together (even unrelated ones, depends on the size of the project), as they update only if the code/config changed and I tend to push small updates where only one Lambda changed.
I also often have a stack of shared resources that are exported and then used (via Fn::ImportValue) throughout the other stacks, like a KMS key, a shared S3 bucket, etc.
Note that I have a CD process set up for every stack, hence the rule.
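A minimal sketch of the shared-resources pattern described above: one template exports a value, and another imports it with `Fn::ImportValue`. The resource names and the export name are hypothetical, and the templates are built as Python dictionaries only so they can be printed as JSON.

```python
import json


def shared_stack_template(export_name):
    """Template for the shared stack: one S3 bucket whose name is exported."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "SharedBucket": {"Type": "AWS::S3::Bucket"},
        },
        "Outputs": {
            "SharedBucketName": {
                "Value": {"Ref": "SharedBucket"},
                # Export names must be unique within a region and account.
                "Export": {"Name": export_name},
            },
        },
    }


def consumer_stack_template(export_name):
    """Template for a dependent stack that reads the exported bucket name."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "BucketNameParam": {
                "Type": "AWS::SSM::Parameter",
                "Properties": {
                    "Type": "String",
                    "Value": {"Fn::ImportValue": export_name},
                },
            },
        },
    }


if __name__ == "__main__":
    name = "shared-bucket-name"  # hypothetical export name
    print(json.dumps(shared_stack_template(name), indent=2))
    print(json.dumps(consumer_stack_template(name), indent=2))
```

The shared stack has to be deployed first, and CloudFormation will refuse to delete or change an export while another stack still imports it.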
My current setup requires deployment of a VPC (with endpoints), RDS, and an application (API Gateway, Lambdas). I have broken them down as follows:
VPC stack: a shared resource with 1 VPC per region with public & private subnets, VPC endpoints, S3 bucket, NAT gateways, ACLs, security groups.
RDS stack: I can have multiple RDS clusters inside a VPC so made sense to keep it separate. Also, this is created after VPC as it needs VPC resources such as private subnets, security groups. This cluster is shared by multiple application stacks.
Application stack: This deploys API gateway & Lambdas (basically a serverless application) with the above RDS cluster as the DB.
So, in general, I pretty much follow what @Milan Cermak described. But in my case, these deployments are done when needed (not as part of CD), so exported parameters are stored in the Parameter Store of AWS Systems Manager.

How to execute Amazon Lambda functions on dedicated EC2 server?

I am currently developing the backend for my app based on Amazon Web Services. I intended to use DynamoDB to store the user's data, but finally opted for MongoDB, which I have already installed on my EC2 instance.
I have some code written in Python to update and query the DB. When a Cognito event triggers my Lambda function, I want this code to be executed directly on my instance so I can access my DB. Any ideas how I can accomplish this?
As mentioned by Gustavo Tavares, "the whole point of lambda is to run code without the need to deploy EC2 instances". And you do not have to put the EC2 instance that hosts your database in a "public" subnet for Lambda to access it. Actually, you should never do that.
When creating or editing the Lambda configuration, you can choose to run it in any of your VPCs (Configuration -> Advanced Settings -> VPC). Then select the subnet(s) to run your Lambda in. This creates ENIs (Elastic Network Interfaces) for the virtual machines your Lambdas run on.
Your subnets must have routing and ACLs configured to reach the subnets where the database resides. At least one of the security groups associated with the Lambda must also allow outbound traffic to the database subnet on the appropriate port (27017).
Since you mentioned that your Lambdas are "back-end" then you should probably put them in the same "private" subnets as your MongoDB and avoid any access/routing headache.
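One way to confirm that the routing and security group rules above are in place is a Lambda handler that only tests whether a TCP connection to the database port succeeds. This is a sketch using the Python standard library; the `MONGO_HOST` and `MONGO_PORT` environment variables and the fallback address are assumptions, not part of the original setup.

```python
import os
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def handler(event, context):
    """Lambda entry point: report whether the MongoDB port is reachable."""
    host = os.environ.get("MONGO_HOST", "10.0.1.10")  # hypothetical private IP
    port = int(os.environ.get("MONGO_PORT", "27017"))
    return {"host": host, "port": port, "reachable": can_reach(host, port)}
```

If this returns `reachable: False` from inside the VPC, the usual suspects are the Lambda's outbound rule, the database's inbound rule, or the subnet's route table and ACLs.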
One way to accomplish this is to give the Lambda a SAM Template, then use sam local invoke inside of the EC2 instance to execute locally.
OK BUT WHY OH WHY WOULD ANYONE DO THIS?
If your Lambda requires access to both a VPC and the Internet, and doesn't use a lot of memory and doesn't really require scalability, and you already wrote the code (*), it's actually 10x cheaper(**) and higher-performing to launch a t3.nano EC2 Spot Instance on a public subnet than to add a NAT Gateway to the Lambda function.
(*) if you have not written the code yet, don't even bother to make it a Lambda.
(**) 10x cheaper as in $3 vs $30, so this really only applies to hobbyist projects on a shoestring budget. Don't do this at work, because the cost of engineers' time to manage and maintain an EC2 instance will far exceed $30/month over the long term.
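The arithmetic behind that comparison can be sketched as follows. The hourly rates are assumed example figures, not quoted from the answer: prices differ by region and change over time, Spot prices fluctuate, and a NAT Gateway also bills per GB processed, which is left out here.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

# Assumed example hourly rates in USD; check current pricing for your region.
NAT_GATEWAY_HOURLY = 0.045
T3_NANO_ON_DEMAND_HOURLY = 0.0052


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of running something continuously for one month."""
    return hourly_rate * hours


def cost_ratio(expensive_hourly, cheap_hourly):
    """How many times more the first option costs than the second."""
    return monthly_cost(expensive_hourly) / monthly_cost(cheap_hourly)


if __name__ == "__main__":
    nat = monthly_cost(NAT_GATEWAY_HOURLY)
    nano = monthly_cost(T3_NANO_ON_DEMAND_HOURLY)
    print(f"NAT Gateway: ${nat:.2f}/month")
    print(f"t3.nano (on-demand): ${nano:.2f}/month")
    print(f"ratio: {cost_ratio(NAT_GATEWAY_HOURLY, T3_NANO_ON_DEMAND_HOURLY):.1f}x")
```

With a Spot discount on the instance, the gap widens toward the roughly 10x figure mentioned above.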
If you want Lambda to execute code on your ec2-instances you'll need to use the SDK for the language you're writing your lambda in. Then you can simply use the AWS API to run commands on your EC2 instance.
See: http://docs.aws.amazon.com/systems-manager/latest/userguide/run-command.html
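A hedged sketch of that approach with the Python SDK: the Lambda builds a Run Command request for the `AWS-RunShellScript` document and sends it through Systems Manager. The instance ID and script path are placeholders, boto3 is assumed to be available (it is in the Lambda Python runtime), and the instance is assumed to run the SSM agent with an instance profile that permits Systems Manager.

```python
def build_run_command_request(instance_id, commands):
    """Build keyword arguments for the SSM send_command call."""
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",  # AWS-managed document for shell commands
        "Parameters": {"commands": list(commands)},
    }


def handler(event, context):
    """Lambda entry point: ask Systems Manager to run a script on the instance."""
    import boto3  # third-party SDK; imported lazily so the builder stays stdlib-only

    ssm = boto3.client("ssm")
    request = build_run_command_request(
        "i-0123456789abcdef0",  # hypothetical instance ID
        ["python3 /opt/app/update_db.py"],  # hypothetical script path
    )
    response = ssm.send_command(**request)
    # The command runs asynchronously; the ID can be used to look up its result.
    return {"command_id": response["Command"]["CommandId"]}
```

Note that Run Command is asynchronous, so the Lambda returns before the script on the instance has finished.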
I think you misunderstood the idea of AWS lambda.
The whole point of lambda is to run code without the need to deploy EC2 instances. You upload the code and the infrastructure is provisioned on the fly. If your application does not need the infrastructure anymore (after a brief period), it vanishes and you will not be charged for the idle time. If you need it again a new infrastructure is provisioned.
If you have a service, like your MongoDB, running on EC2 instances, your Lambda functions can access it like any other code. You just need to configure your Lambda code to connect to the EC2 instance, as you would if your database were installed on any other internet-facing server.
For example: You can put your MongoDB server in a public subnet of your VPC and assign an elastic IP for your server. In your Python lambda code you configure your driver to connect to this elastic IP and update the database.
It will work as if every service were deployed on a different server across the internet: Cognito connects to the Lambda function across the internet, and then the Python code deployed in Lambda connects to your MongoDB across the internet.
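As an illustration of configuring the driver, the snippet below builds a MongoDB connection string with the Python standard library. The host, credentials, and database name are placeholders; passing the result to a driver such as pymongo's `MongoClient` is assumed and not shown running here.

```python
from urllib.parse import quote_plus


def mongo_uri(host, user, password, database, port=27017):
    """Build a mongodb:// URI, percent-encoding the credentials."""
    # Characters such as '@', ':' and '/' in credentials must be escaped.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address standing in for the Elastic IP.
    uri = mongo_uri("203.0.113.10", "app", "p@ss/word", "mydb")
    print(uri)
    # With the third-party pymongo driver installed, the URI would be used as:
    #   client = pymongo.MongoClient(uri)
```

In a real function, the credentials would come from environment variables or a secrets store rather than being hard-coded.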
If I can give you a piece of advice, try DynamoDB a little more. With DynamoDB it will be even simpler to make all this work, because you will not need to configure a public subnet and request an Elastic IP. And the API for DynamoDB is not very different from the MongoDB API.

Any disadvantages or security issues for having website and databases on separate servers?

We're about to dive into Odoo (OpenERP). We're planning on using Amazon EC2 for the actual installation, and putting the PostgreSQL database server on Amazon RDS (like this guide: http://toolkt.com/site/install-openerp-server-and-postgresql-on-separate-servers/ ).
If the RDS is only allowed to talk to the EC2 server, does this mitigate any security issues compared to a regular Odoo installation (where database and front facing webserver are on the same machine)? Is this an advisable setup?
The information in your post is too vague to give you an exact answer, but you may consider the following:
RDS can talk to EC2 or to any other clients and application servers. Connectivity depends only on your configuration. You can set up a VPC and configure or restrict access to your database and application servers there.
Depending on the size of your system (in terms of I/O, number of users, etc.), you may well want separate database and application servers. At scale this separation is important.
In short: neither any disadvantage nor any security issue.
In detail, for Odoo on AWS EC2:
Our team at SnippetBucket.com has already implemented Odoo with RDS and knows Odoo security well.
RDS is rather expensive.
Make the RDS instance private instead of public in AWS so that it is not exposed to the internet.
AWS security groups add extra protection by restricting inbound and outbound ports.
Note: AWS advertises RDS Aurora PostgreSQL as offering up to three times the throughput of standard PostgreSQL. RDS supports only the specific PostgreSQL versions that AWS offers.

How to setup a MongoDB replica set in EC2 US-WEST with only two availability zones

We are setting up a MongoDB replica set on Amazon EC2 in the us-west-1 region.
This region only has two availability zones, though. My understanding is that MongoDB must have a majority to work correctly. If we create two servers in zone us-west-1b and one server in us-west-1c, this will not provide high availability if the entire us-west-1b zone goes down, right? How can this be made to work? What is the recommended configuration?
Having faced a similar challenge we looked at a number of possible solutions:
Put an Arbiter in another region:
Secure the connection either by using a point-to-point VPN between the regions and routing the traffic across this connection,
or
Give each server an E-IP and DNS name and use some combination of AWS security groups, IPTables and SSL to ensure connections are secure.
AWS actually has a whitepaper on this, though I'm not sure how old it is: http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
Alternatively, you could allow the application to fall back to a read-only state until your servers come back online (not the nicest of options, though).
Hope this helps
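The majority requirement raised in the question can be checked with a few lines of Python. This is a simplified model that only counts voting members per location and ignores priorities and non-voting members; the location names are examples.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def survives_loss_of(layout, failed_location):
    """Return True if a primary can still be elected after one location fails.

    layout maps a location name to the number of voting members placed there.
    """
    total = sum(layout.values())
    remaining = total - layout.get(failed_location, 0)
    return remaining >= majority(total)


if __name__ == "__main__":
    two_zones = {"us-west-1b": 2, "us-west-1c": 1}
    with_arbiter = {"us-west-1b": 1, "us-west-1c": 1, "other-region (arbiter)": 1}
    for name, layout in (("two zones", two_zones), ("with arbiter", with_arbiter)):
        for location in layout:
            print(name, "loses", location, "->", survives_loss_of(layout, location))
```

With three locations holding one vote each, losing any single location still leaves two of three votes, which is why placing an arbiter elsewhere addresses the two-zone limitation.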

cloudformation best practices in AWS

We are at an early stage of running our services on AWS. Our servers are hosted in AWS, in a VPC with private and public subnets, with multiple instances in the private and public subnets and an ELB and autoscaling setup (using AMIs) for the frontend web servers. The whole environment (VPC, security groups, EC2 instances, DB instances, S3 buckets, CloudFront) was initially set up manually using the AWS console.
The application servers host JBoss, and WAR files are deployed on the servers.
As per AWS best practices, we want to create the whole infrastructure using CloudFormation and set up test/stage/prod environments.
- Would it be a good idea to have all the above components (VPC, security groups, EC2 instances, DB instances, S3 buckets, CloudFront, etc.) in one CloudFormation stack/template? Or should we create two stacks: 1) one with the network-related components and 2) one with the EC2-related components?
- Once we have a prod environment running with a CloudFormation stack, and we want to roll out new AMIs on prod in the future, how can we update the live running EC2 instances using CloudFormation without interruptions?
- What are the best practices or options for deploying code to multiple EC2 nodes when a new release is done? We don't use continuous integration at the moment.
It's a very good idea to separate your setup into multiple stacks. One obvious reason is that stacks have certain limits that you may reach eventually. A more practical reason is that you don't really need to update, say, your VPC every time you just want to deploy a new version. The network architecture typically changes less frequently. Another reason to avoid having one huge template, or to make changes to an "important" template needlessly, is that you always run the risk of messing things up. If there's an error in your template and you remove an important resource by accident (e.g. commented out) you'll be very sorry. So separating stacks out of sheer caution is probably a good idea.
If you want to update your application you can simply update the template with the new AMIs and CFN will know what needs to be recreated or updated. You can read about rolling updates here. However, I'd recommend considering using something a bit more straightforward for deploying your actual code, like Ansible or Chef.
I'd also recommend you look into Docker for packaging and deploying your application's nodes. Very handy.
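To illustrate the rolling-update point above, here is a sketch of a CloudFormation fragment, built as a Python dictionary, in which an Auto Scaling group carries an `UpdatePolicy`. All resource and parameter names and the specific values are hypothetical; a real template would also need the load balancer and networking resources.

```python
import json


def web_tier_template():
    """Partial template: launch template plus an ASG with a rolling update policy."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "AmiId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            "WebLaunchTemplate": {
                "Type": "AWS::EC2::LaunchTemplate",
                "Properties": {
                    "LaunchTemplateData": {
                        "ImageId": {"Ref": "AmiId"},
                        "InstanceType": "t3.small",
                    },
                },
            },
            "WebGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": "2",
                    "MaxSize": "4",
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "WebLaunchTemplate"},
                        "Version": {"Fn::GetAtt": ["WebLaunchTemplate", "LatestVersionNumber"]},
                    },
                },
                # UpdatePolicy sits beside Properties, not inside it.
                "UpdatePolicy": {
                    "AutoScalingRollingUpdate": {
                        "MinInstancesInService": 1,  # keep at least one instance serving
                        "MaxBatchSize": 1,           # replace one instance at a time
                        "PauseTime": "PT5M",         # wait five minutes between batches
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(web_tier_template(), indent=2))
```

Updating the stack with a new `AmiId` value changes the launch template, and the policy then governs how the group's instances are replaced in batches.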