Alert about DB creation on RDS/Aurora PostgreSQL

I have some Aurora PostgreSQL clusters in our AWS account. Because of some access issues (which we are already working on), several people in other teams create random databases on these clusters, and we then have to clean them up.
I wanted to check whether there is a way to get alerted (via SNS notifications, etc.) whenever a new database is created on these Aurora PostgreSQL clusters, using AWS tooling itself.
Thanks

You could do it with Aurora Database Activity Streams: they capture all activity on the database and send it to an Amazon Kinesis data stream. You can then create an AWS Lambda function that reads the Kinesis data stream, identifies the events you care about (e.g. CREATE DATABASE), and publishes a notification to Amazon SNS.
Another option is to enable pgAudit on your Aurora PostgreSQL cluster, export the logs to Amazon CloudWatch Logs, and create an AWS Lambda function that reads the events from CloudWatch Logs and publishes an SNS notification.
You can find a step-by-step walkthrough in this AWS blog post:
Part 2: Audit Aurora PostgreSQL databases using Database Activity Streams and pgAudit
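For the pgAudit/CloudWatch route, a minimal sketch of the Lambda side could look like the following. It assumes pgAudit is enabled, the PostgreSQL log group is exported to CloudWatch Logs, a subscription filter on that log group invokes this function, and the SNS topic ARN (hypothetical here) is supplied via an environment variable.

```python
import base64
import gzip
import json
import os

import boto3

sns = boto3.client("sns")
# ALERT_TOPIC_ARN is a hypothetical environment variable holding your SNS topic ARN
TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]


def handler(event, context):
    """Triggered by a CloudWatch Logs subscription filter on the Aurora
    PostgreSQL log group; publishes an SNS alert whenever a CREATE DATABASE
    statement shows up in the pgAudit output."""
    # CloudWatch Logs delivers the batch as base64-encoded, gzipped JSON
    payload = base64.b64decode(event["awslogs"]["data"])
    log_data = json.loads(gzip.decompress(payload))

    for log_event in log_data.get("logEvents", []):
        message = log_event["message"]
        if "CREATE DATABASE" in message.upper():
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="New database created on Aurora PostgreSQL",
                Message=message,
            )
```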

Related

Are there any database administration tools that run in AWS Lambda?

Are there any tools for database administration that can be deployed in AWS Lambda? My use case: I have Aurora Serverless running inside a VPC, and I want an AWS Lambda function to be able to visualize, clear, and delete data so developers do not need to go through a bastion host every time they need to clear a row.
There is the Data API for Aurora Serverless, which lets you use the regular AWS SDK (e.g. boto3) to query your Aurora Serverless databases.
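A minimal sketch of a Data API call with boto3, assuming the Data API is enabled on the cluster and its credentials live in Secrets Manager (the ARNs, database name, and table below are placeholders):

```python
import boto3

# Hypothetical ARNs - replace with your Aurora Serverless cluster ARN and the
# Secrets Manager secret holding its credentials
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-serverless-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"

rds_data = boto3.client("rds-data")

# Delete a row over HTTPS - no VPC connectivity or bastion host required
response = rds_data.execute_statement(
    resourceArn=CLUSTER_ARN,
    secretArn=SECRET_ARN,
    database="mydb",
    sql="DELETE FROM events WHERE id = :id",
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(response["numberOfRecordsUpdated"])
```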

Can I configure AWS RDS to only stream INSERT operations to AWS DMS?

My requirement is to stream only INSERTs on a specific table in my db to a Kinesis data stream.
I have configured this pipeline in my AWS environment:
RDS Postgres 13 -> DMS (Database Migration Service) -> KDS (Kinesis Data Stream)
This setup works correctly but it processes all changes, even UPDATEs and DELETEs, on my source table.
What I've tried:
Looking for config options in the Postgres logical decoding plugin. DMS uses the test_decoding PG plugin, which does not accept options to include/exclude data changes by operation type.
Looking at the DMS selection and filtering rules. I still didn't see anything that might help.
Of course I could simply ignore records originating from non-INSERT operations in my Kinesis consumer, but that doesn't look like a cost-efficient implementation.
Is there any way to meet my requirements using these AWS services (RDS -> DMS -> Kinesis)?
DMS does not have this capability.
If you only want INSERTs sent to Kinesis, you can instead have a Lambda function invoked on every INSERT in RDS: configure a trigger that fires on INSERT, invoke the Lambda function from it, and write to Kinesis directly from the function.
Cost-wise this is also cheaper, since with DMS you pay for the replication instance even when it is not in use.
For a detailed reference, see Stream changes from Amazon RDS for PostgreSQL using Amazon Kinesis Data Streams and AWS Lambda; a sketch of the Lambda side follows below.
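A minimal sketch of that Lambda function, assuming it is invoked on INSERT (for example from an AFTER INSERT trigger via the aws_lambda extension) with the inserted row serialized as the event payload, and that the stream name (hypothetical here) is supplied via an environment variable:

```python
import json
import os

import boto3

kinesis = boto3.client("kinesis")
# STREAM_NAME is a hypothetical environment variable holding the Kinesis stream name
STREAM_NAME = os.environ["STREAM_NAME"]


def handler(event, context):
    """Receives the inserted row as the event payload and forwards it
    to the Kinesis data stream."""
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("id", "default")),
    )
```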

Best way to set up jupyter notebook project in AWS

My current project have the following structure:
It starts with a script in a Jupyter notebook which downloads data from a CRM API into a local PostgreSQL database I run with pgAdmin. After that it runs a cluster analysis, returns some scoring values, creates a table in the database with the results, and updates these values in the CRM with another API call. This process takes between 10 and 20 hours (the API only allows 400 requests per minute).
The second notebook reads the database, detects the last update, runs API calls to update the database since the last call, runs a k-means analysis to cluster the data, compares the results with the previous run, and updates the new values and the CRM via the API. This second process takes less than 2 hours by my estimate, and I want this script to run every 24 hours.
After testing, this works fine. Now I'm evaluating how to put it into production in AWS. I understand that for the notebooks I need SageMaker, and from what I've seen that is not that complicated; my only doubt here is whether I can call the API without implementing additional code, or whether I need some configuration. My second problem is the database. I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3. My goal is to write as little code as possible, but I have tried an RDS tutorial like this one: https://www.youtube.com/watch?v=6fDTre5gikg&t=10s, and I understand it connects my local Postgres to AWS, but I can't find the data in the AWS console, it only creates an instance?? And how do I connect to it to analyze this data from SageMaker? My final goal is to run the notebooks in the cloud and connect to my Postgres in the cloud. Some orientation about how to use these tools would be appreciated.
I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3
RDS and Aurora are relational database services fully managed by AWS. "Regular" RDS lets you launch the existing popular database engines such as MySQL, PostgreSQL, and others that you could also run at home or at work.
Aurora is AWS's in-house, cloud-native database implementation, compatible with MySQL and PostgreSQL. It can store the same data as RDS MySQL or PostgreSQL, but provides a number of features not available in RDS, such as more read replicas, distributed storage, global databases, and more.
S3 is not a database but an object store, where you can keep files such as images, CSVs, and Excel sheets, much as you would store them on your computer.
I understand it connects my local Postgres to AWS, but I can't find the data in the AWS console, it only creates an instance??
You can migrate your data from your local Postgres to RDS or Aurora if you wish, but neither RDS nor Aurora will connect to your existing local database, as they are databases themselves.
My final goal is to run the notebooks in the cloud and connect to my postgres in the cloud.
I don't see a reason why you wouldn't be able to connect to the database. You can try to make it work, and if you run into difficulties you can ask a new question on SO with your RDS/Aurora setup details.
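As a rough illustration of that last point, connecting to RDS PostgreSQL from a SageMaker notebook is an ordinary client connection; this sketch assumes the instance's security group allows access from the notebook, and the endpoint, credentials, database, and table names are all placeholders:

```python
import pandas as pd
import psycopg2  # may need: pip install psycopg2-binary

# Placeholder endpoint and credentials - use your own RDS instance's endpoint
conn = psycopg2.connect(
    host="mydb.abc123xyz0.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="crm",
    user="analytics",
    password="********",
)

# Pull a table into a DataFrame for the clustering step
df = pd.read_sql("SELECT * FROM customers", conn)
conn.close()
```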

AWS Aurora RDS PostgreSQL: create global database for existing cluster through CloudFormation script

We already have an Aurora PostgreSQL cluster and instance in the abc region. Now, as part of our disaster recovery strategy, we are trying to create a read replica in the xyz region.
I was able to create it manually by clicking "Add Region" in the AWS web console, as explained here.
As part of this, the following has been created:
1. A global database to the existing cluster
2. Secondary region cluster
3. Secondary region instance.
Everything is fine. Now I have to implement this through a CloudFormation script.
My first question is: can we do this through a CloudFormation script, without losing data, if the primary cluster and instance are already created?
If possible, please share the AWS docs for the CloudFormation scripts.
Please see the other post on this subject: CloudFormation templates for Global Aurora Database
The resource required for setting up the global cluster is AWS::RDS::GlobalCluster, and at the time of writing it is not listed in the CloudFormation documentation.
I was able to do the same using Terraform; that approach is documented for PostgreSQL here: Getting Aurora PostgreSQL Global Database setup using Terraform
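If template support is the blocker, the same operation can also be scripted directly against the RDS API, for example with boto3. This is only a sketch of that alternative, not the Terraform approach above, and it assumes the existing regional cluster is simply attached as the primary of a newly created global cluster (identifiers are placeholders):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Placeholders - point SourceDBClusterIdentifier at the ARN of the
# existing regional cluster you want to promote into a global cluster
rds.create_global_cluster(
    GlobalClusterIdentifier="my-global-cluster",
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:123456789012:cluster:my-existing-cluster",
)
```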

AWS DMS - Scheduled DB Migration

I have a PostgreSQL database in RDS. I need to fetch data from a bunch of tables in the database and push it to an S3 bucket every hour. I only want the delta changes (any new inserts/updates) to be sent on each hourly run. Is it possible to do this using DMS, or is EMR a better tool for this job?
You can build an automated pipeline that migrates data from RDS to S3 using AWS DMS (Database Migration Service) tasks:
1. Create a source endpoint (reading your RDS database - PostgreSQL, MySQL, Oracle, etc.);
2. Create a target endpoint using S3 as the endpoint engine (see Using Amazon S3 as a Target for AWS Database Migration Service);
3. Create a replication instance, responsible for bridging the source data and the target endpoint (you only pay while it is processing);
4. Create a database migration task with the migration type set to 'Replicate data changes only';
5. Create a scheduled ("cron") Lambda function in Python that starts the DMS task, following the instructions in these articles: Lambda with scheduled events and Start DMS tasks with boto3 in Python (a minimal sketch follows below).
Connecting these resources as above, you should be able to get what you want.
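A minimal sketch of the scheduled Lambda in step 5, assuming an hourly CloudWatch Events / EventBridge rule triggers it and the task ARN (hypothetical here) points at the CDC task from step 4:

```python
import boto3

dms = boto3.client("dms")

# Hypothetical ARN of the 'Replicate data changes only' task created above
TASK_ARN = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLETASKID"


def handler(event, context):
    """Invoked by an hourly schedule; resumes the CDC replication task so
    the latest changes are written to the S3 target."""
    dms.start_replication_task(
        ReplicationTaskArn=TASK_ARN,
        StartReplicationTaskType="resume-processing",
    )
```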