Generate OAS3 API from AWS serverless YAML without deploying? - aws-api-gateway

I am developing a moderately complex (for me) serverless infrastructure on AWS that consists of about 50 lambdas and would like to start fleshing out the API documentation but am finding it very tedious. Right now any time I want to make a minor schema change, documentation change, etc. in my YAML, I am re-deploying the whole Cloud Formation and then regenerating the OAS3 API with
sls deploy
aws apigateway get-export --parameters extensions='apigateway' \
--rest-api-id $API_ID --stage-name dev --export-type oas30 \
latest_changes.json
This is obviously pretty time consuming and I feel like there must be a better way. I poked around with with serverless-documentation plugin, but that still seems to require a redeploy (and only works with OAS2), and I've now started investigating serverless-offline (which I wish I knew about in the past), but before I go down that rabbit hole I wanted to see if there's a better way to do this.

I was ultimately able to do what I want using the serverless-openapi plugin (I'm actually using the pvdlg fork), which uses a custom:documentation style very similar to the OAS2 serverless-aws-documentation plugin from serverless.com. With that plugin installed I can then do sls openapi generate -f json -o oas3api.json to output swagger that's pretty close to fully documented, though I did have to write a small script to ingest the swagger, add in per-path security sections (which the plugin doesn't seem to support) and an info block for things like a contact name and TOS, and spit it back out again.
But at least it means no more waiting for AWS to update the CloudFormation just so I can update my docs!

Related

Best practice for running database schema migrations

Build servers are generally detached from the VPC running the instance. Be it Cloud Build on GCP, or utilising one of the many CI tools out there (CircleCI, Codeship etc), thus running DB schema updates is particularly challenging.
So, it makes me wonder.... When's the best place to run database schema migrations?
From my perspective, there are four opportunities to automatically run schema migrations or seeds within a CD pipeline:
Within the build phase
On instance startup
Via a warm-up script (synchronously or asynchronously)
Via an endpoint, either automatically or manually called post deployment
The primary issue with option 1 is security. With Google Cloud Sql/Google Cloud Build, it's been possible for me to run (with much struggle), schema migrations/seeds via a build step and a SQL proxy. To be honest, it was a total ball-ache to set up...but it works.
My latest project is utilising MongoDb, for which I've connected in migrate-mongo if I ever need to move some data around/seed some data. Unfortunately there is no such SQL proxy to securely connect MongoDb (atlas) to Cloud Build (or any other CI tools) as it doesn't run in the instance's VPC. Thus, it's a dead-end in my eyes.
I'm therefore warming (no pun intended) to the warm-up script concept.
With App Engine, the warm-up script is called prior to traffic being served, and on the host which would already have access via the VPC. The warmup script is meant to be used for opening up database connections to speed up connectivity, but assuming there are no outstanding migrations, it'd be doing exactly that - a very light-weight select statement.
Can anyone think of any issues with this approach?
Option 4 is also suitable (it's essentially the same thing). There may be a bit more protection required on these endpoints though - especially if a "down" migration script exists(!)
It's hard to answer you because it's an opinion based question!
Here my thoughts about your propositions
It's the best solution for me. Of course you have to take care to only add field and not to delete or remove existing schema field. Like this, you can update your schema during the Build phase, then deploy. The new deployment will take the new schema and the obsolete field will no longer be used. On the next schema update, you will be able to delete these obsolete field and clean your schema.
This solution will decrease your cold start performance. It's not a suitable solution
Same remark as before, in addition to be sticky to App Engine infrastructure and way of working.
No real advantage compare to the solution 1.
About security, Cloud Build will be able to work with worker pool soon. Still in alpha but I expect in the next month an alpha release of it.

Advantages of Templates ( ie infrastructure as code) over API calls

I am trying to setup a module to deploy resources in the cloud (it could be any cloud provider). I don't see the advantages of using templates (ie. the deploy manager) over direct API calls :
Creation of VM using a template :
# deployment.yaml
resources:
- type: compute.v1.instance
name: quickstart-deployment-vm
properties:
zone: us-central1-f
machineType: f1-micro
...
# bash command to deploy yaml file
gcloud deployment-manager deployments create vm-deploy --config deployment.yaml
Creation of VM using a API call :
def addInstance(http, listOfHeaders):
url = "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances"
body = {
"name": "quickstart-deployment-vm",
"zone": " us-central1-f",
"machineType": "f1-micro",
...
}]
bodyContentURLEncoded = urllib.urlencode(bodyContent)
http.request(uri=url, method="POST", body=body)
Can someone explain to me what benefits I get using templates?
readability\easy of use\authentication handled for you\no need to be a coder\etc. There can be many advantages, it really depends on how you look at it. It depends on your background\tools you use.
It might be more beneficial to use python all the way for you specifically.
It's easier to use templates and you get a lot of builtin functionality such as running a validation on your template to scan for possible security vulnerabilities and similar. You can also easily delete your infra using the same template as you create it. FWIW, I've gone all the way with templates and do as much as I can with templates and in smaller units. It makes it easy to move out a part of the infra or duplicate it to another project, using a pipeline in GitLab to deploy it for example.
The reason to use templates over API calls is that templates can be used in use cases where a deterministic outcome is required.
Both Template and API call has its own benefits. There is always a tradeoff between the two options. If you want more flexibility in the deployment, then the API call suits you better. On the other hand, if the security and complete revision is your priority, then Template should be your choice. Details can be found in this online documentation.
When using a template, orchestration of the deployment is handled by the platform. When using API calls (or other imperative approaches) you need to handle orchestration.

Deploy REST API with dependencies

I want to deploy a trained machine learning model as a REST API. The API would take a file and first decompose it into features. The problem is that this step depends on other libraries (e.g., FFTW). The API would then query the model with the features from the previous step.
Theoretically I can spin up a virtual machine in the cloud, install all the dependencies there, and point the end point to that VM. But this won't scale if we have concurrent requests.
Ideally I'd love to put everything in a API gateway and leverage serverless paradigm so I don't have to worry about scalability.
First of all, you need to decompose your model into different steps. From your question I see preprocessing and model inference steps.
Your preprocessing includes dependencies such as a FFTW.
You didn't specify what kind of model do you have, but I assume that it also requires some sort of environment and/or dependencies.
Having said that, what do you need to do is to implement 2 services for each step.
It's better pack them into docker images in order to keep each container isolated and you will be able to easily deploy them.
Scalability on a docker lever could be achieved by deployment into cloud providers and docker orchestration with AWS ECS or Kubernetes.
There is an open-source project hydro-serving that could help you with this task.
In this case you just need to focus on the models themselves. hydro-serving takes care of the infrastructure.
If preprocessing stage is implemented as Python script -- we can deploy it with all deps from requirements.txt in individual containers.
The same is also true for the model -- it has have out-of-box of Tensorflow and Spark models.
Otherwise it's easy to adapt existing mechanism to satisfy your requirements (other language/toolkit)
Then, assuming that you already have a hydro-serving instance somewhere, you upload your steps with hs upload --host $HOST --port $PORT
and compose an application pipeline with your models.
You can access your application via HTTP api, GRPC api or Kafka topic.
It would be great if you'd specify what the files you are trying to send to REST API.
Possibly you will need to encode them somehow, in order to send them through REST API. On the other hand you could just send them as-is via GRPC api.
Disclosure: I'm a developer of hydro-serving

Convert Terraform Templates to Cloudformation Templates

I want to convert the existing terraform templates(hcl) to aws cloudformation templates(json/yaml).
I basically want to find security issues with these templates through CFN_NAG.
An approach that I have already tried was converting HCL to JSON and then passing the template to CFN_NAG but I received a failure since both the templates have different structure.
Can anyone please provide any suggestions here?
A rather convoluted way of achieving this is to use Terraform to stand-up actual AWS environments, and then to use AWS’s CloudFormer to extract CloudFormation templates (JSON or YAML) from what Terraform has built. At which point you can use cfn-nag.
CloudFormer has some limitations, in that not all AWS resources are currently supported (RDS Security Groups for example) , but it will get you all the basic AWS resources.
Don't forget to remove all the environments, including CloudFormer's, to minimise the cost.
You want to use static code analysis to find security issues in your Terraform setup.
Trying to converting Terraform to CloudFormation to later use cfn-nag is one way. However, there exist tools now that directly operate on the Terraform setup.
I would recommend to take a look at terrascan. It is built on terraform_validate.
https://github.com/bridgecrewio/checkov/ runs security scanning for both terraform and cloudformation

Using Ansible to automatically configure AWS autoscaling group instances

I'm using Amazon Web Services to create an autoscaling group of application instances behind an Elastic Load Balancer. I'm using a CloudFormation template to create the autoscaling group + load balancer and have been using Ansible to configure other instances.
I'm having trouble wrapping my head around how to design things such that when new autoscaling instances come up, they can automatically be provisioned by Ansible (that is, without me needing to find out the new instance's hostname and run Ansible for it). I've looked into Ansible's ansible-pull feature but I'm not quite sure I understand how to use it. It requires a central git repository which it pulls from, but how do you deal with sensitive information which you wouldn't want to commit?
Also, the current way I'm using Ansible with AWS is to create the stack using a CloudFormation template, then I get the hostnames as output from the stack, and then generate a hosts file for Ansible to use. This doesn't feel quite right – is there "best practice" for this?
Yes, another way is just to simply run your playbooks locally once the instance starts. For example you can create an EC2 AMI for your deployment that in the rc.local file (Linux) calls ansible-playbook -i <inventory-only-with-localhost-file> <your-playbook>.yml. rc.local is almost the last script run at startup.
You could just store that sensitive information in your EC2 AMI, but this is a very wide topic and really depends on what kind of sensitive information it is. (You can also use private git repositories to store sensitive data).
If for example your playbooks get updated regularly you can create a cron entry in your AMI that runs every so often and that actually runs your playbook to make sure your instance configuration is always up to date. Thus avoiding having "push" from a remote workstation.
This is just one approach there could be many others and it depends on what kind of service you are running, what kind data you are using, etc.
I don't think you should use Ansible to configure new auto-scaled instances. Instead use Ansible to configure a new image, of which you will create an AMI (Amazon Machine Image), and order AWS autoscaling to launch from that instead.
On top of this, you should also use Ansible to easily update your existing running instances whenever you change your playbook.
Alternatives
There are a few ways to do this. First, I wanted to cover some alternative ways.
One option is to use Ansible Tower. This creates a dependency though: your Ansible Tower server needs to be up and running at the time autoscaling or similar happens.
The other option is to use something like packer.io and build fully-functioning server AMIs. You can install all your code into these using Ansible. This doesn't have any non-AWS dependencies, and has the advantage that it means servers start up fast. Generally speaking building AMIs is the recommended approach for autoscaling.
Ansible Config in S3 Buckets
The alternative route is a bit more complex, but has worked well for us when running a large site (millions of users). It's "serverless" and only depends on AWS services. It also supports multiple Availability Zones well, and doesn't depend on running any central server.
I've put together a GitHub repo that contains a fully-working example with Cloudformation. I also put together a presentation for the London Ansible meetup.
Overall, it works as follows:
Create S3 buckets for storing the pieces that you're going to need to bootstrap your servers.
Save your Ansible playbook and roles etc in one of those S3 buckets.
Have your Autoscaling process run a small shell script. This script fetches things from your S3 buckets and uses it to "bootstrap" Ansible.
Ansible then does everything else.
All secret values such as Database passwords are stored in CloudFormation Parameter values. The 'bootstrap' shell script copies these into an Ansible fact file.
So that you're not dependent on external services being up you also need to save any build dependencies (eg: any .deb files, package install files or similar) in an S3 bucket. You want this because you don't want to require ansible.com or similar to be up and running for your Autoscale bootstrap script to be able to run. Generally speaking I've tried to only depend on Amazon services like S3.
In our case, we then also use AWS CodeDeploy to actually install the Rails application itself.
The key bits of the config relating to the above are:
S3 Bucket Creation
Script that copies things to S3
Script to copy Bootstrap Ansible. This is the core of the process. This also writes the Ansible fact files based on the CloudFormation parameters.
Use the Facts in the template.