How to pipe data from AWS Postgres RDS to S3 (then Redshift)? - postgresql

I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely.
However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a workaround?
"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB"

Nowadays you can define a copy-activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface:
Create a data node of the type SqlDataNode. Specify the table name and the select query.
Set up the database connection by specifying the RDS instance ID (the instance ID is the first part of your endpoint URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with the username, password and database name.
Create a data node of the type S3DataNode.
Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output.
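The four steps above correspond roughly to a pipeline definition like the one the following Python snippet prints. This is only a sketch: the object ids, the bucket path, the query and the `MyEC2Resource` reference are placeholders, and the field names (`rdsInstanceId`, `selectQuery`, `directoryPath`, `*password`) are my reading of the Data Pipeline object reference, so check them against the current documentation before relying on them.

```python
import json


def build_pipeline_objects(instance_id, db_name, username, password,
                           table, query, s3_dir):
    """Return pipeline objects for an RDS -> S3 copy as plain dicts.

    Field names are assumptions taken from the Data Pipeline object
    reference; ids such as "SourceDatabase" are made up for this sketch.
    """
    return [
        {   # database connection (step 2)
            "id": "SourceDatabase",
            "type": "RdsDatabase",
            "rdsInstanceId": instance_id,
            "databaseName": db_name,
            "username": username,
            "*password": password,
        },
        {   # SqlDataNode with table name and select query (step 1)
            "id": "SourceTable",
            "type": "SqlDataNode",
            "database": {"ref": "SourceDatabase"},
            "table": table,
            "selectQuery": query,
        },
        {   # S3DataNode (step 3)
            "id": "S3Output",
            "type": "S3DataNode",
            "directoryPath": s3_dir,
        },
        {   # Copy activity wiring input to output (step 4)
            "id": "RdsToS3Copy",
            "type": "CopyActivity",
            "input": {"ref": "SourceTable"},
            "output": {"ref": "S3Output"},
            "runsOn": {"ref": "MyEC2Resource"},  # placeholder EC2 resource
        },
    ]


objects = build_pipeline_objects(
    "your-instance-id", "THE_DB", "USER", "PASSWORD",
    "blahs", "select blah_id from blahs", "s3://my-bucket/exports/",
)
print(json.dumps({"objects": objects}, indent=2))
```

Generating the definition from code keeps the references between objects (`{"ref": ...}`) in one place, which is where hand-edited definitions tend to drift.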

This doesn't work yet. AWS hasn't built or released the functionality to connect nicely to Postgres. You can do it in a ShellCommandActivity, though. You can write a little Ruby or Python code to do it and drop that in a script on S3 using scriptUri. You could also just write a psql command to dump the table to a CSV and redirect that to OUTPUT1_STAGING_DIR, with "stage": "true" set in that activity node.
Something like this:
{
"id": "DumpCommand",
"type": "ShellCommandActivity",
"runsOn": { "ref": "MyEC2Resource" },
"stage": "true",
"output": { "ref": "S3ForRedshiftDataNode" },
"command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}
I didn't run this to verify because it's a pain to spin up a pipeline :( so double-check the escaping in the command.
Pros: super straightforward and requires no additional script files to upload to S3.
Cons: not exactly secure. Your DB password will be transmitted over the wire without encryption.
Look into the new stuff AWS just launched on parameterized templating for data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. It looks like it will allow encryption of arbitrary parameters.
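Since the escaping in the command is the fiddly part, one option is to generate the activity node instead of typing it by hand. The sketch below is not something I ran inside a pipeline either; `build_dump_activity` is a hypothetical helper, and the ids and placeholder values are taken from the example above. It quotes each psql argument with `shlex.quote` and leaves the JSON escaping to `json.dumps`.

```python
import json
import shlex


def build_dump_activity(host, user, database, password, query, filename,
                        port=5432):
    """Build a ShellCommandActivity node that dumps a query to CSV.

    Hypothetical helper: the node layout mirrors the hand-written
    example; only the quoting is automated here.
    """
    psql_args = [
        "psql", "-h", host, "-U", user, "-d", database, "-p", str(port),
        "-t", "-A", "-F", ",", "-c", query,
    ]
    # Shell-quote every argument so spaces and quotes in the query survive.
    quoted = " ".join(shlex.quote(arg) for arg in psql_args)
    command = (
        f"PGPASSWORD={shlex.quote(password)} {quoted} "
        f"> ${{OUTPUT1_STAGING_DIR}}/{shlex.quote(filename)}"
    )
    return {
        "id": "DumpCommand",
        "type": "ShellCommandActivity",
        "runsOn": {"ref": "MyEC2Resource"},
        "stage": "true",
        "output": {"ref": "S3ForRedshiftDataNode"},
        "command": command,
    }


activity = build_dump_activity(
    "HOST", "USER", "DATABASE", "password",
    "select blah_id from blahs", "my_data.csv",
)
# json.dumps takes care of escaping the quotes inside "command".
print(json.dumps(activity, indent=2))
```

The password still ends up in the pipeline definition in clear text, so this only addresses the escaping concern, not the security one.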

AWS now allows partners to do near-real-time RDS -> Redshift inserts.
https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/

Related

AWS migrate data from MongoDB to DynamoDB/S3/Redshift

The issue is that migrating data from MongoDB to DynamoDB/S3/Redshift is, as I understand it, currently not possible for us via AWS DMS, as it does not support all data types. Or maybe I'm wrong.
The problem is that our Mongo objects contain non-scalar fields (arrays, maps).
So when I create a migration task via AWS DMS in table mode, it pulls the data badly. For some reason only selection rules work; transformation rules (I tried renaming and removing) are ignored by DMS.
In document mode everything is OK, but how can I run the migration with a custom transformation script? Data stored this way still needs transformation.
We need modifications such as renaming fields, removing fields, and flattening fields (for example, we have a map object that should be flattened into several scalar fields).
The migration target should be one of: S3, DynamoDB, Redshift.
I would be thankful for any help and suggestions.
Use the following command to take a backup of the MongoDB database:
mongodump -h localhost:27017 -d my_db_name -o $DEST
Use the command below to sync your backup to an S3 bucket:
aws s3 sync ~/db_backups s3://my-bucket-name
Once your data is in S3, you can load it into Redshift using the COPY command.
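For the rename / remove / flatten step the question asks about, a small script between the export and the upload may be enough. The sketch below assumes the documents are available as one JSON object per line (for example from mongoexport rather than mongodump, which writes BSON); `flatten` and `transform` are hypothetical helpers, and storing arrays as JSON strings is just one possible choice.

```python
import json


def flatten(doc, parent_key="", sep="_"):
    """Flatten nested maps into scalar fields, e.g. address.city -> address_city."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Arrays have no natural scalar form; keep them as a JSON string.
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


def transform(doc, rename=None, drop=()):
    """Flatten a document, then drop and rename fields by flattened name."""
    rename = rename or {}
    flat = flatten(doc)
    return {rename.get(k, k): v for k, v in flat.items() if k not in drop}


# Two sample lines standing in for an exported collection.
exported = "\n".join([
    json.dumps({"_id": "abc", "name": "Ann",
                "address": {"city": "Oslo", "zip": "0150"},
                "tags": ["a", "b"], "internal": 1}),
    json.dumps({"_id": "def", "name": "Bob", "address": {"city": "Rome"}}),
])

rows = [
    transform(json.loads(line), rename={"_id": "id"}, drop={"internal"})
    for line in exported.splitlines()
]
for row in rows:
    print(json.dumps(row))
```

Writing the transformed rows back out as JSON lines gives a file that can be synced to S3 in the same way as the backup above.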

loading one table from RDS / postgres into Redshift

We have a Redshift cluster that needs one table from one of our RDS / postgres databases. I'm not quite sure the best way to export that data and bring it in, what the exact steps should be.
In piecing together various blogs and articles the consensus appears to be using pg_dump to copy the table to a csv file, then copying it to an S3 bucket, and from there use the Redshift COPY command to bring it in to a new table-- that's my high level understanding, but am not sure what the command line switches should be, or the actual details. Is anyone doing this currently and if so, is what I have above the 'recommended' way to do a one-off import into Redshift?
It appears that you want to:
Export from Amazon RDS PostgreSQL
Import into Amazon Redshift
From Exporting data from an RDS for PostgreSQL DB instance to Amazon S3 - Amazon Relational Database Service:
You can query data from an RDS for PostgreSQL DB instance and export it directly into files stored in an Amazon S3 bucket. To do this, you use the aws_s3 PostgreSQL extension that Amazon RDS provides.
This will save a CSV file into Amazon S3.
You can then use the Amazon Redshift COPY command to load this CSV file into an existing Redshift table.
You will need some way to orchestrate these operations, which would involve running a command against the RDS database, waiting for it to finish, then running a command in the Redshift database. This could be done via a Python script that connects to each database in turn (e.g. via psycopg2) and runs the command.
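A minimal sketch of such a script follows. It assumes the aws_s3 extension is installed on the RDS instance and that both the instance and the Redshift cluster have IAM roles that can reach the bucket; the bucket, key, role ARN, table name and connection strings are placeholders. The statements are built separately from the part that needs psycopg2, so they can be inspected without a database.

```python
import re

EXPORT_SQL = (
    "SELECT * FROM aws_s3.query_export_to_s3("
    "%s, aws_commons.create_s3_uri(%s, %s, %s), options := 'format csv')"
)


def export_statement(query, bucket, key, region):
    """Statement to run on RDS PostgreSQL: export a query result to S3."""
    return EXPORT_SQL, (query, bucket, key, region)


def copy_statement(table, bucket, key, iam_role):
    """Statement to run on Redshift: load the exported CSV into a table."""
    # The table name cannot be passed as a parameter, so validate it.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    sql = f"COPY {table} FROM %s IAM_ROLE %s FORMAT AS CSV"
    return sql, (f"s3://{bucket}/{key}", iam_role)


def run(dsn, statement):
    """Execute one statement and commit; returns once it has finished."""
    import psycopg2  # third-party; imported here so the builders work without it

    sql, params = statement
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:
            cur.execute(sql, params)
    finally:
        conn.close()


export = export_statement("SELECT * FROM my_table", "my-bucket",
                          "exports/my_table.csv", "us-east-1")
load = copy_statement("my_table", "my-bucket", "exports/my_table.csv",
                      "arn:aws:iam::123456789012:role/placeholder-role")

# Sequential calls give the "wait for it to finish" behaviour:
# run("host=RDS_HOST dbname=DB user=USER password=PASSWORD", export)
# run("host=REDSHIFT_HOST port=5439 dbname=DB user=USER password=PASSWORD", load)
print(export[0])
print(load[0])
```

Because each `run` call blocks until its statement completes, no extra polling is needed for a one-off import.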

How to export result set as CSV from Aurora Postgres DB to AWS-S3?

As part of my Flask and Celery application, I'm trying to move data from AWS-Aurora Postgres DB to Redshift.
I'll be running this application in Kubernetes.
My approach is to query the Aurora Postgres database, write the result set to a CSV file saved on an attached volume, upload that file to S3, and then import it into Redshift.
However, I came across another article which lets us directly upload the result set as a CSV file to S3 instead of having an intermediate volume.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
They mention the OUTFILE command, but only for MySQL; nothing is said about Postgres.
Is it even possible to use that command on an Aurora Postgres DB to export to S3?
If you can connect to the database with psql, you can use the \copy command to export the output from any select statement to a csv:
https://codeburst.io/two-handy-examples-of-the-psql-copy-meta-command-2feaefd5dd90
https://dba.stackexchange.com/questions/7651/postgres-client-copy-copy-command-doesnt-have-access-to-a-temporary-table
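From application code (for instance a Celery task), the same client-side copy can be done without an intermediate volume by streaming the COPY output into memory and uploading that buffer. This is a sketch under assumptions: it expects a psycopg2 cursor and a boto3 S3 client to be passed in, the function names are hypothetical, and the whole result set is held in memory, so it only suits result sets that fit there.

```python
import io


def export_query_to_buffer(cursor, query):
    """Run COPY (query) TO STDOUT and collect the CSV in an in-memory buffer.

    `cursor` is expected to be a psycopg2 cursor (uses cursor.copy_expert).
    """
    query = query.rstrip().rstrip(";")  # COPY ( ... ) must not contain a trailing ';'
    buf = io.BytesIO()
    cursor.copy_expert(f"COPY ({query}) TO STDOUT WITH (FORMAT csv, HEADER)", buf)
    buf.seek(0)  # rewind so the upload starts at the beginning
    return buf


def upload_buffer(s3_client, buf, bucket, key):
    """Upload the buffer; `s3_client` is expected to be boto3.client("s3")."""
    s3_client.upload_fileobj(buf, bucket, key)


# Usage sketch (needs psycopg2, boto3 and real credentials):
# conn = psycopg2.connect("host=AURORA_HOST dbname=DB user=USER password=PASSWORD")
# with conn.cursor() as cur:
#     data = export_query_to_buffer(cur, "SELECT * FROM my_table")
# upload_buffer(boto3.client("s3"), data, "my-bucket", "exports/my_table.csv")
```

For large exports, writing to a temporary file instead of `io.BytesIO` keeps the same structure while avoiding the memory cost.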
Yes, Aurora runs on MySQL so you can use the outfile command. Did you even try running a query with outfile?

Trying to transfer csv file from EC2 to PostgreSQL RDS; get FATAL psql init file error

I'm fairly new to AWS in general. I'm currently trying to replicate work by another group and therefore am attempting to mimic their setup. I've established an EC2 instance (Amazon Linux AMI) and a PostgreSQL 9.3.5 RDS instance. I've uploaded a 4 GB csv file to EC2 and would like to copy it to a table in my RDS db. I used the following code within the EC2 shell (following 2nd set of instructions here):
psql -h XX.us-west-2.rds.amazonaws.com -U username -d DBname -p 5432 -c "\copy tablename from 'data.csv' with DELIMITER ',';"
After giving my password I get the error "psql: FATAL: could not write init file". I think this psql client may be version 9.2, is that something that matters? Or is this the wrong syntax for this type of transfer? Or, could it be related to having free trial size instances, which I believe have a 5 GB limit? I think I should be under that limit, but would it tell me if that were the problem? Any help would be much appreciated.

how to import data files from s3 to postgresql rds

I am very new to AWS and PostgreSQL.
I have created a PostgreSQL DB (using RDS on AWS)
I have uploaded several documents to multiple S3 buckets
I have an EC2 instance (Amazon Linux 64 bit) running
I tried to use a data pipeline, but no template seems to be available for Postgres. I can't figure out how to connect to my RDS instance and import/export data from Postgres.
I assumed that, since no data pipeline template is available, I could use EC2 to grab the files from my S3 bucket and import them into Postgres. If that is possible, I have no idea how. Please advise.
Direct S3 -> RDS loading is now possible for Aurora PostgreSQL and RDS PostgreSQL >= 11.1 via the aws_s3 extension.
Amazon Aurora with PostgreSQL Compatibility Supports Data Import from Amazon S3
Amazon RDS for PostgreSQL Now Supports Data Import from Amazon S3
Parameters are similar to those of the PostgreSQL COPY command:
psql=> SELECT aws_s3.table_import_from_s3(
'table_name', '', '(format csv)',
'BUCKET_NAME', 'path/to/object', 'us-east-2'
);
Be warned that this feature does not work for older versions.
I wish AWS would extend the COPY command in RDS PostgreSQL as they did in Redshift. But for now they haven't, and we have to do it ourselves.
Install awscli on your EC2 box (it might have been installed by default)
Configure your awscli with credentials
Use the aws s3 sync or aws s3 cp commands to download from S3 to your local directory
Use the psql \copy command to load the files into your RDS instance (the backslash form is required to copy from a client-side directory)
Example:
aws s3 cp s3://bucket/file.csv /mydirectory/file.csv
psql -h your_rds.amazonaws.com -U username -d dbname -c "\copy table FROM '/mydirectory/file.csv' CSV HEADER"
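If you would rather script the download and the load, the same two commands can be driven from Python. This is a sketch, assuming awscli and psql are installed and configured on the EC2 box; `build_commands` and `run_all` are hypothetical helpers, and the host, bucket and table names are placeholders. Passing each command as an argument list sidesteps the shell-quoting issues around `\copy`.

```python
import subprocess


def build_commands(bucket, key, local_path, host, user, dbname, table):
    """Return the download and load commands as argument lists."""
    download = ["aws", "s3", "cp", f"s3://{bucket}/{key}", local_path]
    # \copy reads the file on the client side, i.e. on this EC2 box.
    copy_cmd = f"\\copy {table} FROM '{local_path}' CSV HEADER"
    load = ["psql", "-h", host, "-U", user, "-d", dbname, "-c", copy_cmd]
    return download, load


def run_all(commands):
    """Run the commands in order; stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


download, load = build_commands(
    "bucket", "file.csv", "/mydirectory/file.csv",
    "your_rds.amazonaws.com", "username", "dbname", "my_table",
)
print(download)
print(load)
# run_all([download, load])  # uncomment on a box with awscli and psql set up
```

psql will still prompt for the password unless PGPASSWORD or a ~/.pgpass file is set up.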
The prior answers have been superseded by more recent events at AWS.
There is now excellent support for S3-to-RDS-database loading via the Data Pipeline service (which can be used for many other data conversion tasks too, this is just one example).
This AWS article is for S3-to-RDS-MySQL. Should be very similar for RDS-Postgres.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copys3tords.html
If you can launch the psql client on an EC2 instance and connect to RDS, you should be able to use the following command:
\copy customer_orders from 'myfile.csv' with DELIMITER ','