How to run Redshift copy command from EC2 - postgresql

My log files are on an EC2 instance and I want to load them into Redshift. Two questions:
Do I have to copy the log files to S3 first, or can I copy directly from my EBS volume?
I can see that I can run the COPY command from SQL Workbench or Data Pipeline, but can I run it from the EC2 instance itself? Which AWS CLI do I need to install? http://docs.aws.amazon.com/cli/latest/reference/redshift/ does not list a copy command.

Not necessarily. Redshift's COPY command can load from a remote host over SSH, which, in your case, would be your EC2 instance. Documentation here.
The link you've referred to provides cluster management commands. To run SQL queries on your cluster, you can use the psql tool. Documentation here.

You can copy the data directly from EC2, but I recommend saving it to S3 first, which also gives you a backup.
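A minimal sketch of the S3 route, run from the EC2 instance through psql. It only builds the COPY statement and the psql argument list. The table name, bucket, IAM role ARN, cluster endpoint, database and user are all hypothetical placeholders, and it assumes the logs are CSV, which may not match your files.

```python
import shlex


def redshift_copy_sql(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement that loads CSV files under an S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV;"
    )


def psql_argv(host, dbname, user, sql, port=5439):
    """Build the psql argument list; 5439 is the default Redshift port."""
    return ["psql", "-h", host, "-p", str(port), "-U", user, "-d", dbname, "-c", sql]


# Hypothetical values, for illustration only.
sql = redshift_copy_sql(
    "access_logs",
    "s3://my-log-bucket/logs/",
    "arn:aws:iam::123456789012:role/MyRedshiftRole",
)
argv = psql_argv("my-cluster.example.us-east-1.redshift.amazonaws.com", "dev", "awsuser", sql)

# Print the command so it can be reviewed before it is run.
print(shlex.join(argv))
```

The printed line can be pasted into a shell on the instance, or the list can be handed to subprocess.run once the values are real.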

All the documentation available online confused me. In the end I wrote a simple Java program that opens a connection with DriverManager.getConnection() and runs the COPY command via stmt.executeUpdate(), and it worked seamlessly. The only catch is that executeUpdate() did not return the number of records inserted.

Related

Can DB2REMOTE be used to point a file from another server?

Using the script below, I was able to load the data to the table with local files.
db2 load from SOME/LOCAL/File.txt of asc modified by reclen=123 method L \(1 11, 12 14\) REPLACE INTO schema.tablename
However, I want to load the file from another server without first transferring it to the Db2 server, which is what the command above would require. I found in this documentation that DB2REMOTE can be used for remote files, but I'm not sure how to run it successfully.
Do I need to do this as well? I don't have the right IAM role or the credentials for it. I would rather skip this step and just connect to the other server.
This is the script I'm trying with DB2REMOTE:
db2 load from 'DB2REMOTE://centos#123.456.789.0:/folders/directory/file.txt' of asc modified by reclen=123 method L \(1 11, 12 14\) REPLACE INTO schema.tablename
Thank you in advance!
DB2REMOTE is for accessing cloud object storage (e.g. Amazon S3 or IBM Cloud Object Storage) from some Db2 commands.
If you are not using cloud object storage, mount the remote directory locally with appropriate permissions and pass the local mount point to the Db2 load command.
You can mount the remote directory with SSHFS or similar, once it is installed and properly configured. This is administration and configuration rather than programming.
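As a rough sketch of the mount-then-load approach, the helper below builds the sshfs command and the same db2 load command as in the question, pointed at the mount point. The user, host, remote directory and mount point are hypothetical, and SSHFS is assumed to be installed with working SSH access.

```python
def sshfs_mount_argv(user, host, remote_dir, mountpoint):
    """Build the sshfs command that mounts remote_dir at a local mountpoint."""
    return ["sshfs", f"{user}@{host}:{remote_dir}", mountpoint]


def db2_load_command(local_path, table):
    """Rebuild the load command from the question against a local path.

    The backslashes before the parentheses are shell escapes, as in the original.
    """
    return (
        f"db2 load from {local_path} of asc modified by reclen=123 "
        f"method L \\(1 11, 12 14\\) REPLACE INTO {table}"
    )


# Hypothetical host and paths, for illustration only.
print(" ".join(sshfs_mount_argv("centos", "remote.example.com", "/folders/directory", "/mnt/remote")))
print(db2_load_command("/mnt/remote/file.txt", "schema.tablename"))
```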

How to import sql file in Google SQL with binary mode enabled?

I have a database that is giving the following error:
ASCII '\0' appeared in the statement, but this is not allowed unless option --binary-mode is enabled and mysql is run in non-interactive mode. Set --binary-mode to 1 if ASCII '\0' is expected.
I'm importing the database through the console with gcloud sql import sql mydb gs://my-path/mydb.sql --database=mydb, but I don't see any flag for binary mode in the documentation. Is it possible at all?
Optionally, is there a way to set this flag when importing through MySQL Workbench? I haven't seen anything about it there either, but maybe I'm missing a setting. If there is a way to set that flag, I can import my database through MySQL Workbench.
Thank you.
Depending on where the source database is hosted (on Cloud SQL or in an on-premise environment), the proper flags must be set during the export so that the dump file is compatible with the target database.
Since you would like to import a file that has been exported from an on-premise environment, mysqldump is the suggested way to perform the export.
First, create a dump file as suggested in the documentation. Make sure to pay attention to the following 2 points:
Do not export customer-created MySQL users, as this will cause the import into the new instance to fail. Instead, manually create the MySQL users you need.
Make sure you have set the appropriate flags so that the dump file contains everything you need, e.g. triggers, stored procedures, etc.
Then, create a Cloud Storage Bucket and upload the dump file to the bucket.
Before proceeding with the import, grant the Storage Object Admin role to the service account of the target Cloud SQL instance. You may do that with the following command:
gsutil iam ch serviceAccount:[SERVICE-ACCOUNT]:objectAdmin gs://[BUCKET-NAME]
You may locate the aforementioned Service Account in the Cloud SQL instance Overview, or by running the following command:
gcloud sql instances describe [INSTANCE_NAME]
The service account will be mentioned at the serviceAccountEmailAddress field.
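The two commands above can be chained. The sketch below assumes the describe output was captured as JSON (for example with gcloud's --format=json flag), reads the serviceAccountEmailAddress field, and builds the gsutil command. The sample JSON, service account address and bucket name are made up for illustration.

```python
import json


def service_account_from_describe(describe_json):
    """Extract the instance's service account from JSON describe output."""
    return json.loads(describe_json)["serviceAccountEmailAddress"]


def grant_object_admin_argv(service_account, bucket):
    """Build the gsutil command that grants objectAdmin on the bucket."""
    return [
        "gsutil", "iam", "ch",
        f"serviceAccount:{service_account}:objectAdmin",
        f"gs://{bucket}",
    ]


# Hypothetical, heavily trimmed describe output.
sample = (
    '{"name": "my-instance", '
    '"serviceAccountEmailAddress": "p123-abc@gcp-sa-cloud-sql.iam.gserviceaccount.com"}'
)
account = service_account_from_describe(sample)
print(" ".join(grant_object_admin_argv(account, "my-bucket")))
```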
You can now run the import from the Console, with the gcloud command, or through the REST API.
More details in Google documentation
Best Practices for importing/exporting data

How to load data from S3 to PostgreSQL RDS

I need to load data from S3 into Postgres RDS (around 50-100 GB). I don't have the option to use AWS Data Pipeline, and I am looking for something similar to the COPY command that loads data from S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set up an EC2 instance with psql installed (see below near end of post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql \copy command to import the files
This last part is really, really important. If you use the SQL COPY command, the entire RDS Postgres role structure will frustrate you to no end. It has a wonky rds_superuser role which is not very super at all. However, if you use the psql \copy command, you apparently can do anything (\copy reads the file on the client and streams it over the connection, so it does not need server-side file access). I have confirmed this to be the case and have started my uploads successfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
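The three steps can be sketched roughly as follows. The code builds the aws s3 cp command and a psql invocation that runs the \copy meta-command (written with a backslash). The bucket, key, local path, RDS endpoint, database, user and table are hypothetical, and the CSV is assumed to have a header row.

```python
def s3_download_argv(s3_uri, local_path):
    """Build the AWS CLI command that copies one object from S3 to local disk."""
    return ["aws", "s3", "cp", s3_uri, local_path]


def copy_meta_command(table, csv_path):
    """Build the psql \\copy meta-command for a client-side CSV import."""
    return f"\\copy {table} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"


def psql_copy_argv(host, dbname, user, table, csv_path, port=5432):
    """Build the psql argument list that runs a single \\copy command."""
    return [
        "psql", "-h", host, "-p", str(port), "-U", user, "-d", dbname,
        "-c", copy_meta_command(table, csv_path),
    ]


# Hypothetical values, for illustration only.
print(s3_download_argv("s3://my-bucket/data/events.csv", "/data/events.csv"))
print(psql_copy_argv("mydb.example.us-east-1.rds.amazonaws.com", "mydb", "loader",
                     "events", "/data/events.csv"))
```

Passing the lists to subprocess.run avoids shell quoting problems with the backslash and the single quotes.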
Caveat Emptor: The post below was all the original work I had done trying to get this implemented. I don't want to bury the lede: despite multiple efforts (including what can only be described as pathetic tech support from AWS), I don't believe that this feature is ready for prime time. Despite a very simple, easy-to-replicate test environment, AWS has not provided an effective way to keep the copy statement from crapping out as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently appears to support only a single file at a time (i.e. no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
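For orientation, here is a small sketch that builds the two SQL statements involved: enabling the extension and calling aws_s3.table_import_from_s3 with an S3 URI from aws_commons.create_s3_uri. The table, bucket, key and region are hypothetical, and the values are interpolated directly into the SQL, so only use it with trusted input.

```python
# CASCADE installs aws_commons together with aws_s3.
ENABLE_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;"


def import_from_s3_sql(table, bucket, key, region, options="(format csv)", columns=""):
    """Build the SELECT that imports one S3 object into a table.

    An empty columns string means all columns of the table.
    """
    return (
        "SELECT aws_s3.table_import_from_s3("
        f"'{table}', '{columns}', '{options}', "
        f"aws_commons.create_s3_uri('{bucket}', '{key}', '{region}'));"
    )


# Hypothetical values, for illustration only.
print(ENABLE_EXTENSION_SQL)
print(import_from_s3_sql("events", "my-bucket", "data/events.csv", "us-east-1"))
```

Both statements can be run from psql once the RDS instance has credentials or an IAM role that can read the bucket.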
I did not find a way to download just psql, so I had to bring a full postgres install down to my mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated it is the quickest way to get psql.
Update: I decided that having psql on my mac was a security hole (port forwarding, etc.). I found that there is a simple Postgres install available for Amazon Linux 2 through Amazon Linux Extras. The install command on your instance is fairly simple:
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use; however, keep in mind that any instructions to psql itself are prefixed with a \. Documentation on psql can be found here. I recommend going through it at least once before executing the AWS recommended scripts.
To the extent you run tight security and have access to your RDS instances seriously restricted (which I do), don't forget to open up the ports from the EC2 instance where you installed Postgres to your RDS instance.
If your preference is a GUI, you can try pgAdmin 4. It is the AWS recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product, it seems that version 4 may not be the most stable of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
Update:
Another option is to mount S3 and pass the direct path to the CSV to the COPY command. I'm not sure it will handle 100 GB effectively, but it is worth trying. Here is a list of software options.
Yet another option would be to "parse" the S3 file part by part into a file, using something like what is described here, and COPY from a named pipe, described here.
The most obvious option, downloading the file to local storage and using COPY, I don't cover at all.
Also worth mentioning is s3_fdw (status: unstable). The readme is very laconic, but I assume you could create a foreign table pointing to the S3 file, which in turn means you can load the data into another relation...

Get big(250Gb) RDS PostgreSQL db dump into my local machine

My problem is getting a big (250 GB) Postgres dump onto my local machine.
It's on AWS RDS. I tried to dump it to my local machine, but it takes too long, around 3+ days.
I'm trying to find a way to dump it to S3 and download it from there safely. Maybe you could suggest a more effective way to do that. I would appreciate any kind of help.
Thanks!
As far as I know, AWS does not provide a way to back up a database to S3.
You can take a look at this question and its answers:
Export huge database from amazon RDS to local mysql
Here is one answer:
If the data is that big I would suggest copying the RDS snapshot on S3, as explained here.
Link to documentation to copy snapshot to s3
This topic is covered in this StackOverflow thread Exporting a AWS Postgres RDS Table to AWS S3
Another solution would be to spin up an EC2 instance and dump the database to a local EBS volume that is large enough for the following steps. Then choose one of the following:
Compress the DB dump into multiple files and copy them to S3 for download. I would use a smart S3 download manager given the size of the database dump.
Export the S3 data using Snowball (Export S3 Data). If your Internet connection is not fast or reliable enough, Snowball will get you the data.
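One way to get a compressed, multi-file dump in a single step is pg_dump's directory format, which writes one compressed file per table and can run several jobs in parallel. The sketch below builds that command and an aws s3 sync upload. It is a swapped-in technique rather than what the answer literally describes, and the endpoint, user, database, output directory and bucket are hypothetical.

```python
def pg_dump_argv(host, user, dbname, out_dir, jobs=4):
    """Build a pg_dump command using directory format (-Fd) with parallel jobs (-j)."""
    return ["pg_dump", "-h", host, "-U", user, "-Fd", "-j", str(jobs), "-f", out_dir, dbname]


def s3_sync_argv(out_dir, s3_uri):
    """Build the AWS CLI command that uploads the dump directory to S3."""
    return ["aws", "s3", "sync", out_dir, s3_uri]


# Hypothetical values, for illustration only.
print(" ".join(pg_dump_argv("mydb.example.us-east-1.rds.amazonaws.com", "admin", "mydb", "/ebs/dump")))
print(" ".join(s3_sync_argv("/ebs/dump", "s3://my-dump-bucket/mydb/")))
```

A directory-format dump is restored with pg_restore, which can also use -j to load tables in parallel.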

is there any way to create directory in data directory location of Amazon RDS PostgreSQL instance

I can connect to my AWS RDS PostgreSQL instance from another PostgreSQL client, but I cannot see the data directory or configuration files. Is there any way to view or edit the data directory and configuration files?
If you want to work with the file system, use an EC2 instance with Postgres installed and configured as you wish. On RDS, neither postgresql.conf nor pg_hba.conf can be edited directly on the file system.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.Parameters
Instead, use the Amazon-provided interface to change supported parameters, or use the SET command where possible...
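A small sketch of both routes, assuming the interface meant here is an RDS DB parameter group changed through the AWS CLI. The parameter group name, parameter names and values are hypothetical examples, and not every parameter can be applied immediately or set per session.

```python
def modify_parameter_argv(group, name, value, apply_method="immediate"):
    """Build the AWS CLI command that changes one parameter in a DB parameter group."""
    spec = f"ParameterName={name},ParameterValue={value},ApplyMethod={apply_method}"
    return [
        "aws", "rds", "modify-db-parameter-group",
        "--db-parameter-group-name", group,
        "--parameters", spec,
    ]


def session_set_sql(name, value):
    """Build a SET statement that changes a parameter for the current session only."""
    return f"SET {name} = '{value}';"


# Hypothetical values, for illustration only.
print(" ".join(modify_parameter_argv("my-pg-params", "log_min_duration_statement", "500")))
print(session_set_sql("work_mem", "64MB"))
```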