Is there anyway could export redshift table data into a local file as dump via JDBC?
Thanks
The most performant way to extract data is not through JDBC but using the UNLOAD command to dump data into Amazon S3. After the data is extracted, you can then download it directly from S3, as needed. By using JDBC you will be limited by I/O and network performance. UNLOAD benefits from parallelization and very low-level, high-throughput connections between Redshift (EC2, actually) and S3.
Examples:
http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD_command_examples.html
The application you are using to connect to Redshift via JDBC would be responsible for saving the data to a "local file".
For example, it could send a SELECT * FROM <table> command to Redshift. The data is then returned to the application, which could then save it to disk.
Take a look at apps such as SQL Workbench/J, DBVisualizer and Aginity. They should all offer that capability.
Related
I have a big AWS RDS database that needs to be updated with data on a periodic basis. The data is in JSON files stored in S3 buckets.
This is my current flow:
Download all the JSON files locally
Run a ruby script to parse the JSON files to generate a CSV file matching the table in the database
Connect to RDS using psql
Use \copy command to append the data to the table
I would like switch this to an automated approach (maybe using an AWS Lambda). What would be the best practices?
Approach 1:
Run a script (Ruby / JS) that parses all folders in the past period (e.g., week) and within the parsing of each file, connect to the RDS db and execute an INSERT command. I feel this is a very slow process with constant writes to the database and wouldn't be optimal.
Approach 2:
I already have a Ruby script that parses local files to generate a single CSV. I can modify it to parse the S3 folders directly and create a temporary CSV file in S3. The question is - how do I then use this temporary file to do a bulk import?
Are there any other approaches that I have missed and might be better suited for my requirement?
Thanks.
I set up a regular Postgres DB in AWS using the Amazon Relational Database Service (RDS). I would like to ingest this data using data wrangler for inspection and further processing.
Is this possible? I only see S3, Athena, Redshift and SnowFlake as the data ingestion options. Does this mean I must move the data from Postgresql to one of these 4 options before I can use Data Wrangler?
If it's not possible through data wrangler, can I connect to my Postgres through a Jupyter notebook, using a connection string or some kind of option like this? I'm looking to use the data for the SageMaker Feature Store.
Using Amazon RDS with DataWrangler is not possible (yet).
That said, you can connect to Postgres in Jupyter using Python and then use the Feature Store API to ingest data and use it for subsequent tasks.
Source: AWS employee
We have a Redshift cluster that needs one table from one of our RDS / postgres databases. I'm not quite sure the best way to export that data and bring it in, what the exact steps should be.
In piecing together various blogs and articles the consensus appears to be using pg_dump to copy the table to a csv file, then copying it to an S3 bucket, and from there use the Redshift COPY command to bring it in to a new table-- that's my high level understanding, but am not sure what the command line switches should be, or the actual details. Is anyone doing this currently and if so, is what I have above the 'recommended' way to do a one-off import into Redshift?
It appears that you want to:
Export from Amazon RDS PostgreSQL
Import into Amazon Redshift
From Exporting data from an RDS for PostgreSQL DB instance to Amazon S3 - Amazon Relational Database Service:
You can query data from an RDS for PostgreSQL DB instance and export it directly into files stored in an Amazon S3 bucket. To do this, you use the aws_s3 PostgreSQL extension that Amazon RDS provides.
This will save a CSV file into Amazon S3.
You can then use the Amazon Redshift COPY command to load this CSV file into an existing Redshift table.
You will need some way to orchestrate these operations, which would involve running a command against the RDS database, waiting for it to finish, then running a command in the Redshift database. This could be done via a Python script that connects to each database (eg via psycopg2) in turn and runs the command.
I want to backup entire redshift cluster, such that I can use it in other databases like mysql or hadoop in future.
I was looking up and creating a manual screenshot seems to be an option but I guess that wont work for cross database languages.
So what would be the detailed steps to backup the entire cluster of redshift
Cluster backups can be done via the aws console, however these can only be restored to another redshift cluster.
Because Redshift is not the same as postgres in many ways, it will be inpossible / tricky to use standard tools like pg_dump and pg_restore.
I think that your best option is to :
extract the ddl from the Redshift tables that you wish to create elsewhere, most ide's have a simple way to do this.
modify the ddl to work with your target database (e.g. postgres will
be easy, mysql harder)
copy the contents of the Redshift database, one table at a time to s3 using
the unload command
import the data that you unloaded in step 3 to your target tables
Is there a way to backup or restore a specific schema or table on a Cloud SQL server? Backing up the entire set of data, but then being able to restore only certain schemas or tables would be very helpful for multi-tenant systems.
Not via backup/restore. You can export a specific schema or table to Google Cloud Storage, but that's probably not what you're looking for.
The Cloud SQL use MySQL database with some limitations. You can check out the unsupported features and functions at the following link:
https://cloud.google.com/sql/faq#supportmysqlfeatures
With this in mind any backup/restore tool that are being used for MySQL should work for Google Cloud SQL as well. Using mysqldump for import/export of Cloud SQL is covered at this document:
https://cloud.google.com/sql/docs/import-export
You can use mysqldump to backup specific table or backup entire database and restore only specific tables using the solutions offered in these links:
Can I restore a single table from a full mysql mysqldump file?
How to take backup of a single table in a MySQL database?
You can restore the backup to a different instance and then replace the data on the original instance with the data from the backup instance.