Are there any database administration tools that run in AWS Lambda? - postgresql

Are there any tools for database administration that can be deployed in AWS Lambda? My use case: I have Aurora Serverless running inside a VPC, and I want an AWS Lambda function to be able to visualize, clear and delete data so developers do not need to log into bastion hosts every time they need to clear a row.

There is the Data API for Aurora Serverless, which allows you to use the regular AWS SDK (e.g. boto3) to query your databases in Aurora Serverless.
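For example, here is a minimal sketch of a Lambda handler that deletes a row through the Data API; the cluster ARN, secret ARN, database name, table and the shape of the incoming event are placeholders for illustration:

```python
import boto3

# Data API client; no VPC connectivity or DB driver is needed inside the Lambda.
rds_data = boto3.client("rds-data")

# Placeholder identifiers -- replace with your own cluster and Secrets Manager ARNs.
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"
DATABASE = "mydb"

def handler(event, context):
    # Delete a single row identified by the id passed in the invocation event.
    response = rds_data.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database=DATABASE,
        sql="DELETE FROM my_table WHERE id = :id",
        parameters=[{"name": "id", "value": {"longValue": event["id"]}}],
    )
    return {"rowsDeleted": response["numberOfRecordsUpdated"]}
```

The Data API has to be enabled on the Aurora Serverless cluster, and the Lambda role needs rds-data and secretsmanager permissions.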

Related

Alert about DB creation on RDS/Aurora PostgreSQL

I have some Aurora PostgreSQL clusters created on our AWS account. Because of some access issues (which we are already working on), there are several people in other teams who create random DBs on these Aurora clusters, and then we need to work on cleaning them up.
I wanted to check if there is a way to get alerted (via SNS notifications etc.) whenever a new DB is created on these AWS PostgreSQL clusters, using AWS tools themselves.
Thanks
You could do it using AWS Aurora Database Activity Streams: they capture all database activity and push it to an AWS Kinesis Data Stream. You can then create an AWS Lambda function that reads the Kinesis Data Stream, identifies the events you care about (e.g. CREATE DATABASE), and finally publishes a notification to AWS SNS from the Lambda code.
Another option is to enable pgAudit on your AWS Aurora PostgreSQL cluster, send its logs to AWS CloudWatch, and create an AWS Lambda function that reads the events from CloudWatch and publishes an SNS notification.
You can find a step-by-step guide in the AWS blog post below.
Part 2: Audit Aurora PostgreSQL databases using Database Activity Streams and pgAudit
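As a rough sketch of the second (pgAudit) option, assuming the pgAudit log group is wired to the Lambda function through a CloudWatch Logs subscription filter, and with a placeholder SNS topic ARN:

```python
import base64
import gzip
import json
import os

import boto3

sns = boto3.client("sns")
# Placeholder topic ARN -- in practice set via the Lambda environment.
TOPIC_ARN = os.environ.get(
    "TOPIC_ARN", "arn:aws:sns:us-east-1:123456789012:db-created-alerts"
)

def handler(event, context):
    # CloudWatch Logs delivers the payload base64-encoded and gzip-compressed.
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    for log_event in payload["logEvents"]:
        message = log_event["message"]
        # pgAudit records DDL in the log line; look for newly created databases.
        if "CREATE DATABASE" in message.upper():
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="New database created on Aurora PostgreSQL",
                Message=message,
            )
```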

Best way to set up a Jupyter notebook project in AWS

My current project has the following structure:
It starts with a script in a Jupyter notebook which downloads data from a CRM API into a local PostgreSQL database I run with pgAdmin. After that it runs a cluster analysis, returns some scoring values, creates a table in the database with the results, and updates these values in the CRM with another API call. This process takes between 10 and 20 hours (the API only allows 400 requests per minute).
The second notebook reads the database, detects the last update, runs an API call to update the database since that last call, runs a k-means analysis to cluster the data, compares the results with the previous run, and updates the new values and the CRM via the API. This second process takes less than 2 hours by my estimate, and I want this script to run every 24 hours.
After testing, this works fine. Now I'm evaluating how to put it into production on AWS. I understand that for the notebooks I need SageMaker, and from what I have seen it is not that complicated; my only doubt here is whether I can call the API without writing additional code or whether I need some configuration. My second problem is the database. I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3. My goal is to write as little code as possible. I have tried an RDS tutorial like this one: https://www.youtube.com/watch?v=6fDTre5gikg&t=10s, and I understand it connects my local Postgres to AWS, but I can't find the data in the Amazon console; it only creates an instance? And how do I connect to it to analyze this data from SageMaker? My final goal is to run the notebooks in the cloud and connect to my Postgres in the cloud. Some orientation about how to use these tools would be appreciated.
I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3
RDS and Aurora are relational databases fully managed by AWS. "Regular" RDS lets you launch existing popular database engines such as MySQL, PostgreSQL and others, which you could also run at home or at work.
Aurora is AWS's in-house, cloud-native database implementation, compatible with MySQL and PostgreSQL. It can store the same data as RDS MySQL or PostgreSQL, but provides a number of features not available in RDS, such as more read replicas, distributed storage, global databases and more.
S3 is not a database but an object store, where you can keep files such as images, CSVs and Excel sheets, much like you would store them on your computer.
I understand it connects my local Postgres to AWS, but I can't find the data in the Amazon console; it only creates an instance?
You can migrate your data from your local Postgres to RDS or Aurora if you wish, but neither RDS nor Aurora will connect to your existing local database; they are databases themselves.
My final goal is to run the notebooks in the cloud and connect to my postgres in the cloud.
I don't see a reason why you wouldn't be able to connect to the database. You can try to make it work, and if you run into difficulties you can ask a new question on SO with the details of your RDS/Aurora setup.
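For orientation, here is a minimal sketch of reading from an RDS/Aurora PostgreSQL instance inside a SageMaker notebook with psycopg2; the endpoint, database, credentials and table are placeholders, and the notebook needs network access to the instance (same VPC, or a publicly accessible instance with an open security group):

```python
import pandas as pd
import psycopg2  # in a SageMaker notebook: %pip install psycopg2-binary

# Placeholder connection details -- taken from the RDS console "Connectivity" section.
conn = psycopg2.connect(
    host="mydb-instance.abc123xyz.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="crm",
    user="postgres",
    password="replace-me",  # better: fetch this from AWS Secrets Manager
)

# Load a table into a DataFrame for the clustering analysis.
df = pd.read_sql("SELECT * FROM customers LIMIT 1000", conn)
print(df.head())
conn.close()
```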

AWS Aurora RDS PostgreSQL: create global database for existing cluster through CloudFormation script

We already have a cluster and an instance of Aurora PostgreSQL in the abc region. Now, as part of our disaster recovery strategy, we are trying to create a read replica in the xyz region.
I was able to create it manually by clicking on "Add Region" in the AWS web console, as explained here.
As part of that, the following has been created:
1. A global database for the existing cluster
2. A secondary-region cluster
3. A secondary-region instance.
Everything is fine. Now I have to implement this through a CloudFormation script.
My first question is: can we do this through a CloudFormation script without losing data if the primary cluster and instance have already been created?
If possible, please share the AWS docs for such CloudFormation scripts.
Please see the other post on this subject: CloudFormation templates for Global Aurora Database
The resource type required for setting up the GlobalCluster is AWS::RDS::GlobalCluster, and it is currently not listed in the CloudFormation documentation.
I was able to do the same using Terraform; for PostgreSQL that is documented here: Getting Aurora PostgreSQL Global Database setup using Terraform
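Until the resource type is covered by CloudFormation, the underlying RDS API at least shows what such a template would have to do. A hedged sketch with boto3, where the identifiers, regions and engine version are placeholders, promoting an existing cluster into a global database and then adding a secondary cluster and instance in another region:

```python
import boto3

# Primary region -- promote the existing cluster into a global database.
rds_primary = boto3.client("rds", region_name="us-east-1")
rds_primary.create_global_cluster(
    GlobalClusterIdentifier="my-global-cluster",
    # ARN of the already existing Aurora PostgreSQL cluster (placeholder).
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:123456789012:cluster:my-existing-cluster",
)

# Secondary region -- attach a new read-only cluster to the global database.
rds_secondary = boto3.client("rds", region_name="eu-west-1")
rds_secondary.create_db_cluster(
    DBClusterIdentifier="my-secondary-cluster",
    GlobalClusterIdentifier="my-global-cluster",
    Engine="aurora-postgresql",
    EngineVersion="11.7",  # must match the primary cluster's engine version
)
rds_secondary.create_db_instance(
    DBInstanceIdentifier="my-secondary-instance",
    DBClusterIdentifier="my-secondary-cluster",
    DBInstanceClass="db.r5.large",
    Engine="aurora-postgresql",
)
```

Promoting an existing cluster this way does not touch its data; the secondary cluster is populated by Aurora's storage-level replication.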

Is there any tool to migrate RDS PostgreSQL from AWS to Google Cloud SQL?

I want to migrate AWS RDS PostgreSQL to Google Cloud SQL. I can do this with a basic strategy: extract the data from AWS, create a database in GCP, and restore the extracted data there. But I was wondering whether there is a more sophisticated way to do it, such as using Terraform or something similar.
Yes. See https://cloud.google.com/solutions/migrating-postgresql-to-gcp/
For migrating MySQL there are more options available; however, at the time of writing, these only apply to MySQL:
https://cloud.google.com/sql/docs/mysql/migrate-data
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external

How to set up tables of AWS Aurora MySQL using AWS CloudFormation or AWS CDK?

How to set up tables of AWS Aurora MySQL using AWS CloudFormation or AWS CDK?
In my setup I have a serverless app using Lambda for various microservices. The database is a serverless Aurora MySQL database. To provision the AWS infrastructure I use AWS CDK. I would like to set up the database using a migration tool like Liquibase or Sequelize.
For the moment I am using a separate Lambda function. The Lambda function executes Liquibase to apply DB changes, but I have to invoke it manually after the deployment with CDK has succeeded.
An execution triggered after the CloudFormation stack (CDK stack) has run would be optimal. I would like to avoid a CI/CD stack via CodePipeline.
Does anyone have a best practice for setting up the database at provision time?
CloudWatch rules
CloudWatch rules based on CloudFormation events can be used to route those events to a processing Lambda. The CloudWatch rules can be part of the CDK deployment description.
The triggered function can then execute Liquibase, Flyway, Sequelize or something else to spin up or change the DB.
---- or ----
CloudFormation custom resource
An AWS CloudFormation custom resource can execute a Lambda function during the CloudFormation lifecycle.
The triggered function can then execute Liquibase, Flyway, Sequelize or something else to spin up or change the DB; see the CDK sketch below.
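A sketch of the custom-resource wiring in CDK v2 (Python); the migration Lambda's handler code and the migrations_lambda asset directory are assumptions for illustration:

```python
from aws_cdk import CustomResource, Duration, Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import custom_resources as cr
from constructs import Construct

class DbMigrationStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Lambda that runs the migrations (Liquibase, Flyway, plain SQL, ...);
        # the handler code in ./migrations_lambda is an assumption for this sketch.
        migration_fn = _lambda.Function(
            self, "DbMigrationFunction",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="index.handler",
            code=_lambda.Code.from_asset("migrations_lambda"),
            timeout=Duration.minutes(5),
        )

        # The Provider turns the Lambda into the backend of a custom resource.
        provider = cr.Provider(self, "DbMigrationProvider", on_event_handler=migration_fn)

        # The custom resource is created/updated as part of every deployment,
        # which is when the migration function gets invoked.
        CustomResource(self, "DbMigrationResource", service_token=provider.service_token)
```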
I use CloudFormation custom resources for running database migrations and initial database setup scripts at deployment time.
This is the recommended way to run DB migrations for serverless applications if you don't want to rely on a CI/CD pipeline to do it for you.
Here's a well-written blog post by Alex DeBrie about CloudFormation custom resources: https://www.alexdebrie.com/posts/cloudformation-custom-resources/
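For illustration, a minimal sketch of such a migration handler, here assumed to run plain SQL statements against a serverless Aurora cluster through the Data API (the ARNs, database name and statements are placeholders). With a hand-rolled custom resource the result has to be reported back to CloudFormation via cfnresponse; with the CDK Provider construct shown above, the handler could simply return instead:

```python
import boto3
import cfnresponse  # bundled for inline (ZipFile) Lambda code in CloudFormation

rds_data = boto3.client("rds-data")

# Placeholder ARNs for the serverless Aurora cluster and its Secrets Manager secret.
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-serverless-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"

MIGRATIONS = [
    "CREATE TABLE IF NOT EXISTS customers (id INT PRIMARY KEY, name VARCHAR(255))",
    "CREATE TABLE IF NOT EXISTS orders (id INT PRIMARY KEY, customer_id INT)",
]

def handler(event, context):
    try:
        # Only run migrations on stack create/update; nothing to do on delete.
        if event["RequestType"] in ("Create", "Update"):
            for statement in MIGRATIONS:
                rds_data.execute_statement(
                    resourceArn=CLUSTER_ARN,
                    secretArn=SECRET_ARN,
                    database="mydb",
                    sql=statement,
                )
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except Exception as exc:
        # Report failure so the stack deployment fails visibly instead of hanging.
        cfnresponse.send(event, context, cfnresponse.FAILED, {"Error": str(exc)})
```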