Load data from CSV file to AWS Aurora Serverless (PostgreSQL) db - postgresql

Scenario:
We have a source SQL database table which gets updated every 24 hours. I am designing an automated process that exports that table to a CSV file on an EC2 instance after each update of the source DB.
Problem:
I am trying to figure out the best way to load a CSV file, containing the records of a table exported with the bcp command-line utility, into an Aurora Serverless PostgreSQL database.
My current plan is to generate a bunch of insert statements from that CSV file using a script.
Then use the AWS CLI on the EC2 Linux instance to talk to the Aurora DB through the Data API feature of Aurora Serverless, emptying and repopulating the table inside a single transaction, such as:
$ ID=$(aws rds-data begin-transaction --database users --output json | jq -r .transactionId)
# empty the table
$ aws rds-data execute-statement --transaction-id $ID --database users --sql "delete from mytable"
# populate the table with the latest data
$ aws rds-data execute-statement --transaction-id $ID --database users --sql "insert into mytable values (value1,value2)"
$ aws rds-data execute-statement --transaction-id $ID --database users --sql "insert into mytable values (value1,value2)"
$ ...
$ aws rds-data commit-transaction --transaction-id $ID
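(Note that in practice each rds-data call also needs --resource-arn and --secret-arn arguments, omitted above for brevity.) If the row-by-row inserts turn out to be slow, one possible refinement is to batch them with the Data API's batch-execute-statement so that each call inserts many rows. A rough sketch, assuming a hypothetical two-column table and placeholder ARNs:

# Sketch only: the ARNs, table, and column names below are placeholders.
RESOURCE_ARN="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster"
SECRET_ARN="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"

ID=$(aws rds-data begin-transaction \
      --resource-arn "$RESOURCE_ARN" --secret-arn "$SECRET_ARN" \
      --database users --output json | jq -r .transactionId)

aws rds-data execute-statement \
    --resource-arn "$RESOURCE_ARN" --secret-arn "$SECRET_ARN" \
    --database users --transaction-id "$ID" \
    --sql "delete from mytable"

# one Data API call inserts several rows; each parameter set is one row
aws rds-data batch-execute-statement \
    --resource-arn "$RESOURCE_ARN" --secret-arn "$SECRET_ARN" \
    --database users --transaction-id "$ID" \
    --sql "insert into mytable (col1, col2) values (:c1, :c2)" \
    --parameter-sets '[
      [{"name":"c1","value":{"stringValue":"value1"}},{"name":"c2","value":{"stringValue":"value2"}}],
      [{"name":"c1","value":{"stringValue":"value3"}},{"name":"c2","value":{"stringValue":"value4"}}]
    ]'

aws rds-data commit-transaction \
    --resource-arn "$RESOURCE_ARN" --secret-arn "$SECRET_ARN" \
    --transaction-id "$ID"

A script could chunk the CSV into groups of rows and issue one batch-execute-statement per chunk.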
Is there a better way to load that CSV file into the Aurora DB, or should I stick with the above solution?
Note:
I found the article "Loading data into an Amazon Aurora MySQL DB cluster from text files in an Amazon S3 bucket" in the AWS docs, but it explicitly states that "This feature currently isn't available for Aurora Serverless clusters."

Related

Use Terraform on Google Cloud SQL Postgres to create a Replication Slot

Overall I'm trying to create a Datastream Connection to a Postgres database in Cloud SQL.
As I'm trying to configure it all through Terraform, I'm stuck on how to create a Replication Slot. This guide explains how to do it through the Postgres client by running SQL commands, but I thought there might be a way to do it in the Terraform configuration directly.
Example SQL that I would like to replicate in Terraform:
ALTER USER [CURRENT_USER] WITH REPLICATION;
CREATE PUBLICATION [PUBLICATION_NAME] FOR ALL TABLES;
SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('[REPLICATION_SLOT_NAME]', 'pgoutput');
If not, does anyone know how to run the Postgres SQL commands against the Cloud SQL database through Terraform?
I have set up the Datastream and Postgres connection for all other parts. I'm expecting that there is a Terraform setting I'm missing, or a way to run Postgres commands against the Google Cloud SQL Postgres database.
Unfortunately, there is no Terraform resource for specifying a replication slot on a google_sql_database_instance.
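A common workaround is to run that SQL out of band with psql and, if you want it tied to the Terraform run, invoke the script from a null_resource with a local-exec provisioner. A minimal sketch, assuming the Cloud SQL Auth Proxy is listening locally and using placeholder user, database, publication, and slot names:

# Sketch only: host, user, database, publication, and slot names are placeholders.
DB_HOST=127.0.0.1              # e.g. Cloud SQL Auth Proxy listening locally
DB_USER=replication_user
DB_NAME=mydb

psql "host=$DB_HOST user=$DB_USER dbname=$DB_NAME" <<'SQL'
ALTER USER CURRENT_USER WITH REPLICATION;
CREATE PUBLICATION datastream_publication FOR ALL TABLES;
SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('datastream_slot', 'pgoutput');
SQL

Keep in mind that a local-exec provisioner isn't tracked as real Terraform state, so re-creating the publication/slot or dropping them on destroy has to be handled separately.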

loading one table from RDS / postgres into Redshift

We have a Redshift cluster that needs one table from one of our RDS / postgres databases. I'm not quite sure the best way to export that data and bring it in, what the exact steps should be.
In piecing together various blogs and articles, the consensus appears to be to use pg_dump to copy the table to a CSV file, copy that to an S3 bucket, and from there use the Redshift COPY command to bring it into a new table. That's my high-level understanding, but I'm not sure what the command-line switches should be, or the actual details. Is anyone doing this currently, and if so, is what I have above the 'recommended' way to do a one-off import into Redshift?
It appears that you want to:
Export from Amazon RDS PostgreSQL
Import into Amazon Redshift
From Exporting data from an RDS for PostgreSQL DB instance to Amazon S3 - Amazon Relational Database Service:
You can query data from an RDS for PostgreSQL DB instance and export it directly into files stored in an Amazon S3 bucket. To do this, you use the aws_s3 PostgreSQL extension that Amazon RDS provides.
This will save a CSV file into Amazon S3.
You can then use the Amazon Redshift COPY command to load this CSV file into an existing Redshift table.
You will need some way to orchestrate these operations: run a command against the RDS database, wait for it to finish, then run a command in the Redshift database. This could be done via a Python script that connects to each database in turn (e.g. via psycopg2) and runs the command.
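The answer suggests a Python script; as a rough shell sketch of the same two steps driven by psql (host names, bucket, IAM role, and table are placeholders, and the aws_s3 extension must already be installed on the RDS instance):

# Step 1: export the table from RDS PostgreSQL straight to S3 (aws_s3 extension).
# (The RDS instance must have an IAM role attached that allows writing to the bucket.)
psql "host=my-rds-instance.abc123.us-east-1.rds.amazonaws.com user=exporter dbname=sourcedb" <<'SQL'
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT * FROM mytable',
    aws_commons.create_s3_uri('my-bucket', 'exports/mytable.csv', 'us-east-1'),
    options := 'format csv, header true'
);
SQL

# Step 2: load the CSV from S3 into an existing Redshift table.
psql "host=my-redshift-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 user=loader dbname=analytics" <<'SQL'
COPY mytable
FROM 's3://my-bucket/exports/mytable.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftS3Read'
CSV IGNOREHEADER 1;
SQL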

How to export result set as CSV from Aurora Postgres DB to AWS-S3?

As part of my Flask and Celery application, I'm trying to move data from AWS-Aurora Postgres DB to Redshift.
I'll be running this application in Kubernetes.
My approach is to query the aurora Postgres database and write the result set to a CSV file which is saved on to an attached volume and then upload it to S3 and then import the file into Redshift.
However, I came across another article which lets us directly upload the result set as a CSV file to S3 instead of having an intermediate volume.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
It mentions the usage of the OUTFILE command, but only for MySQL; it doesn't say anything about a Postgres DB.
Is it even possible to use that command on an Aurora Postgres DB and export to S3?
If you can connect to the database with psql, you can use the \copy command to export the output from any select statement to a csv:
https://codeburst.io/two-handy-examples-of-the-psql-copy-meta-command-2feaefd5dd90
https://dba.stackexchange.com/questions/7651/postgres-client-copy-copy-command-doesnt-have-access-to-a-temporary-table
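For example, a rough sketch that exports a query result to CSV with \copy and then pushes it to S3 (connection details, table and column names, and bucket are placeholders):

# Export the result set of an arbitrary SELECT to a local CSV file via psql's \copy,
# then upload it to S3. Host, table, column, and bucket names are placeholders.
psql "host=my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com user=reader dbname=mydb" \
     -c "\copy (SELECT * FROM mytable WHERE updated_at > now() - interval '1 day') TO '/tmp/mytable.csv' WITH CSV HEADER"

aws s3 cp /tmp/mytable.csv s3://my-bucket/exports/mytable.csv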
Note that SELECT ... INTO OUTFILE S3 is an Aurora MySQL feature; it isn't available on Aurora PostgreSQL, so for Postgres you'll need the \copy approach above or the aws_s3 export extension.

Can we load CSV data from S3 to Redshift and RDS using same COPY command?

I'm working on a class that processes CSV files into Redshift and also Postgres DB.
For copying data into Redshift, I've created a function, process_files, that takes a processingID, searches the local directory for files matching a pattern like redshift{processingID}-*.csv, uploads them to S3, and copies the CSV files from S3 into Redshift using the COPY command, as described in https://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html, like:
copy customer
from 's3://mybucket/mydata'
access_key_id '<access-key-id>'
secret_access_key '<secret-access-key>';
I'm using Psycopg2 to connect to Redshift.
But I also need to load the data that is in S3 into another Postgres DB, not Redshift.
The only difference for the new Postgres DB is the file pattern: the files will be named like
prediction{processingID}-*.csv. Here too I'm using Psycopg2 to connect to the new Postgres DB.
Can I use the same COPY command that I used for copying from S3 to Redshift for copying from S3 to the new Postgres DB?
Can I execute the same COPY command on the new Postgres DB connection and just change the file and table names for loading data? Will it work?

how to import data files from s3 to postgresql rds

I am very new to AWS and PostgreSQL.
I have created a PostgreSQL db (using RDS on AWS)
I have uploaded several documents to multiple S3 buckets
I have an EC2 instance (Amazon Linux 64-bit) running
I tried to use Data Pipeline, but no template seems to be available for Postgres. I can't figure out how to connect to my RDS instance and import/export data from Postgres.
I assumed that, with no Data Pipeline template available, I could use the EC2 instance to grab the files from my S3 bucket and import them into Postgres, but I have no idea how. Please advise if this is possible.
S3 -> RDS direct load is now possible for Aurora PostgreSQL and RDS PostgreSQL >= 11.1 via the aws_s3 extension.
Amazon Aurora with PostgreSQL Compatibility Supports Data Import from Amazon S3
Amazon RDS for PostgreSQL Now Supports Data Import from Amazon S3
The parameters are similar to those of the PostgreSQL COPY command:
psql=> SELECT aws_s3.table_import_from_s3(
'table_name', '', '(format csv)',
'BUCKET_NAME', 'path/to/object', 'us-east-2'
);
Be warned that this feature does not work for older versions.
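Before calling the function you also need to install the extension, and the RDS/Aurora instance must have an IAM role with read access to the bucket associated with it for the s3Import feature. A minimal sketch with placeholder names:

# Sketch only: host, database, table, bucket, and object key are placeholders.
psql "host=mydb.abc123.us-east-2.rds.amazonaws.com user=admin dbname=mydb" <<'SQL'
-- one-time setup: installs aws_s3 (and aws_commons via CASCADE)
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- import the CSV object from S3 straight into an existing table
SELECT aws_s3.table_import_from_s3(
    'mytable', '', '(format csv, header true)',
    aws_commons.create_s3_uri('my-bucket', 'path/to/object.csv', 'us-east-2')
);
SQL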
I wish AWS would extend the COPY command in RDS PostgreSQL as they did in Redshift, but for now they haven't, so we have to do it ourselves.
Install awscli on your EC2 box (it might have been installed by default)
Configure your awscli with credentials
Use aws s3 sync or aws s3 cp commands to download from S3 to a local directory
Use the psql \COPY command to load the files into your RDS instance (the backslash form is required to copy from a client-side directory)
Example:
aws s3 cp s3://bucket/file.csv /mydirectory/file.csv
psql -h your_rds.amazonaws.com -U username -d dbname -c "\COPY table FROM '/mydirectory/file.csv' WITH CSV HEADER"
The prior answers have been superseded by more recent events at AWS.
There is now excellent support for S3-to-RDS-database loading via the Data Pipeline service (which can be used for many other data conversion tasks too; this is just one example).
This AWS article is for S3-to-RDS-MySQL; it should be very similar for RDS PostgreSQL:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copys3tords.html
If you can launch the psql client on the EC2 instance and connect to RDS, you should be able to use the following command:
\copy customer_orders from 'myfile.csv' with DELIMITER ','