We have multiple PostgreSQL Instances in AWS RDS. We need to maintain an on-premise copy of each database to comply with our disaster recovery policy. I have been successful is using pg_dump and pg_restore to export the database schemas and tables to our on-premise server, but I have been unsuccessful in exporting the roles and tablespaces. I have found that this is only possible by using pg_dumpall, but as this requires super_user access, and that is not allowed in RDS, how can I export those aspects of the database to on our on-premise server?
My pg_dump command:
Pg_dump -h {AWS Endpoint} -U {Master Username}-p 5432 -F c -f C:\AWS_Backups\{filename}.dmp {database name}
My pg_restore command:
pg_restore -h {AWS Endpoint} -p 5432 -U {Master Username} -d {database name} {filename}.dmp
I have found multiple examples of people using pg_dump to export their PostgreSQL databases, however, they are not addressing the "Globals" that are ignored using pg_dump. Have I misread the documentation? After performing my pg_restore, my logins were not created on the database.
Any help you can provide on getting the FULL database (including globals) to our offsite location would be greatly appreciated.
UPDATE: My patch is now a part of Postgres v10+.
You can read about how this works here 3.
Earlier, I had also posted a working solution posted to my Github account. Then, you'd need to compile the binary and use that however, with the patch now a part of Postgres v10+, any pg_dumpall since that version now supports this feature.
You can read some more detailed inner workings here.
I haven't been able to find an answer to my question anywhere online. Just in case someone else may be experiencing this problem, I thought I would post a high-level outline of my "solution". I go around my elbow to get to my knee, but this is the option I have come up with:
Create a table (I created 2 - 1 for roles, and one for logins) in each PostgreSQL database within AWS. This table(s) will need to have all columns that you will need to dynamically create the SQL to do CREATE, GRANT, REVOKE, etc.
Insert all roles, logins, privileges, and permissions into this table. These are scattered everywhere, but here are the ones I used:
pg_auth_members (role and login relationships)
pg_roles (role and login permissions ie can login, inherit parent, etc)
information_schema.role_usage_grants (schema privileges)
information_schema.role_table_grants (table privileges)
information_schema.role_routine_grants (function privileges)
To fill in the gaps, there are clever queries on the web page below to use the built in functions to check for access. You will have to loop through the tables and process a row at a time
https://vibhorkumar.wordpress.com/2012/07/29/list-user-privileges-in-postgresqlppas-9-1/
Specifically, I used a variation of database_privs function
Once all of the data is in those tables, you can execute pg_dump, and it will extract that info from each database to your on-premise location. I did this through a Python script.
On your server, use the data in the tables to dynamically create the SQL statements needed to run the CREATE, GRANT, REVOKE, etc. scripts. Save in a .sql file that you can instruct a Python script to execute against the database and recreate the AWS roles and logins.
One thing I forgot to mention - because we are unable to access the pg_auth_id table in AWS, I have found no way to extract the passwords out of AWS. We are going to store these in a password manager, and when I create the CREATE ROLE statements, I'll pass a default to be updated.
I haven't completed the process, but it has taken me several days to track down a viable option to the absence of pg_dumpall's functionality. If anyone sees any flaws in my logic, or has a better solution, I'd love to read about it. Good luck!
Related
I would be happy to get your help / feedback re data load.
Goal:
Load source data from a Postgres database, which is located in GCP project A to another Postgres database, which is located in GCP project B.
Challenge:
Get a connection (I have an IAM account with sufficient rights to run a COPY TO / COPY FROM command) to the Postgres DB in GCP Project A and copy the table either to a CSV or create a dump that can be used in order to be inserted to another Postgres DB in GCP Project B.
How do I connect to the database (e.g. if I create a key, where shall I store the json keyfile and would that approach even be feasible?) with this IAM email account?
Other ways I've researched were to use psycopg2 (thus I could use the function cursor.copy_expert (which doesn’t need any superuser right or Postgres user credentials and copy the data), but I didn’t succeed in connecting to the database with psycopg2 due to challenges with cloud proxy.
Another idea was to use pg_dump or gcloud sql export csv.
I would be curious if some of you were facing a similar challenge and how did you solve it and what might be the best way/practice
You can have a try out database migration service. You can set up a continuous migration configuration and use Cloud SQL for PostgreSQL.
Hello after a lot of searching I've come to these solutions:
If you have continuous copy, you need to use the database migration service, check this documentation.
If you have one shot copy:
you can restore your instance, see the bottom page of this documentation
you can create a bucket and backup your instance on it, then import it from the other project
I have a need to load data from S3 to Postgres RDS (around 50-100 GB) I don't have the option to use AWS Data Pipeline and I am looking for something similar to using the COPY command to load data in S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set-up an EC2 instance with psql installed (see below near end of post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql /copy command to import the files up
This last part is really, really important. If you use the SQL COPY command the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql /copy commany you apparently can do anything. I have confirmed this be the case and have started my uploads succesfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
Caveat Emptor: The post below was all the original work I had done trying to get this implemented. I don't want to bury the lead despite multiple efforts (including what can only be described as pathetic tech support from AWS) I don't believe that this feature is ready for prime time. Despite a very simple test environment, easy to replicate, AWS has not provided an effective way to not get the copy statement to crap out as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently only appears to support a single file at a time (ie no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
I did not find a way to download just psql, so I had to bring down a full postgres install down to my mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated it is the quickest way to get psql.
Update: Decided that having psql on my mac was a security hole, port forwarding, etc. I found that there is a simple Postgres install available for AMI Linux 2 under the AMI Extras rubric. The install command is fairly simple on your ami instance type.
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use, however, important to keep in mind that any instructions to psql itself are escaped by a \. Documentation on psql can be found here. Recommend going through it at least once before executing the AWS recommended scripts.
To the extent you run tight security and have access to your RDS instances seriously restricted (which I do) don't forget to open up the ports from your AMI instance running Postgres to your RDS instance.
If your preference is a GUI then you can try to use PGAdmin4. It is the AWS recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product it seems that version 4 may not be the stablest of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on
Amazon S3. You can specify the files to be loaded by using an Amazon
S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as
follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
update
Another option is to mount s3 and use direct path to the csv with COPY command. I'm not sure If it will hold 100GB effectively, but worth of trying. Here is some list of options on software.
Yet another option would be "parsing" s3 file part by part with something described here to a file and COPY from named pipe, described here
And the most obvious option to just download file to local storage and use COPY I don't cover at all
Also worth of mentioning would be s3_fdw (status unstable). Readme is very laconic, but I assume you could create a foreign table leading to s3 file. Which itself means you can load data to other relation...
I have a site that uses PostgreSQL. All content that I provide in my site is created at a development environment (this happens because it's webcrawler content). The only information created at the production environment is information about the users.
I need to find a good way to update data stored at production. May I restore to production only the tables updated at development environment and PostgreSQL will update this records at production or the best way would be to backup the users information at production, insert them at development and restore the whole database at production?
Thank you
You can use pg_dump to export the data just from the non-user tables in the development environment and pg_restore to bring that into prod.
The -t switch will let you pick specific tables.
pg_dump -d <database_name> -t <table_name>
https://www.postgresql.org/docs/current/static/app-pgdump.html
There are many tips arounds this subject here and here.
I'd suggest you to take a look on these links before everything.
If your data is discarded at each update process then a plain dump will be enough. You can redirect pg_dump output directly to psql connected on production to avoid pg_restore step, something like below:
#Of course you must drop tables to load it again
#so it'll be reasonable to make a full backup before this
pg_dump -Fp -U user -h host_to_dev -T=user your_db | psql -U user -h host_to_production your_db
You might asking yourself "Why he's saying to drop my tables"?
Bulk loading data on a fresh table is faster than deleting old data and inserting again. A quote from the docs:
Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
Ps¹: If you can't connect on both environment at same time then you need to do pg_restore manually.
Ps²: I don't recommend it but you can append --clean option on pg_dump to generate DROP statements automatically. Be extreme careful with this option to avoid dropping unnexpected objects.
I am trying to setup Postgres 9.3 on an Ubuntu 14 server, and I'm feeling pretty demoralised at this point. I've previously used MySQL, so I'm happy with general database concepts, as well as client/server models etc.
I start with two users - 'root' and 'sam' (me). As 'sam' I install postgresql using apt-get. This also creates a third user called 'postgres'.
Fine.
I'm told that to use postgres you must be logged in as the postgres user, so I switch to that account. Apparently this comes with a postgres admin role (I think I'm fine with the concept of roles per se), and apparently all roles have an associated database of the same name (?). So now I have a Linux account called postgres, a role called postgres, and a database called postgres? This all seems needless but I'm assuming it's useful for reasons I don't know about (not meant sarcastically - this is usually the case when things seem overly complicated at first).
So, to create a database, do I login to the server as postgres, start postgres by typing 'psql' (which doesn't ask for a password - why doesn't the postgres account have a password?) and proceed from there? Or should I create a new role? Does that role need its own Linux user? Should the role be the same name as the database I want to create?
I appreciate this is a bit of a jumble, but my confusion is such that I'm not even sure I understand the fundamentals here. I miss MySQL.
I've been mainly using the DigitalOcean tutorial for this - which are usually very good - but it didn't really make any of this clear. I also read the postgres docs (specifically the installation and users/roles sections) which didn't help, and the google results for this are even less helpful.
This is my last hope before I go back to the safety blanket of MySQL. Any suggestions for making this click?
OS usernames and Postgres DB usernames are not related; they live in seperate universes.
one exception: if you connect from the same machine via the unix-domain socket, and you don't explicitely specify a username, your OS name is assumed to be your DB-username, too. (which in most cases is not correct)
second exception: the "postgres" username is used both as an OS-username (owner of the files, uid of the running processes) and as the DBMS superuser.
Note: "root" is a bad name for a DB-user.
Can I connect to a Heroku Postgres database via an web/application without the risk of dropping a table?
I'm building a Heroku application for a third party which uses Heroku Postgres for the backend. The third party are very security sensitive so I'm looking at applying "Layered security" throughout the application. So for example checking for SQL injection attacks at the web/application layer. Applying a "Layered security" approach I should also secure the database in case a potential SQL injection attack is missed, which might drop a database table.
In other systems I have built there would be a minimum of two users in the database. Firstly the database administrator who creates/drops tables, index, triggers, etc and the application user who would run with less privileges than the database administrator who could only insert and update records for example.
Within the Heroku Postgres setup there doesn't appear to be a way to create another user with less privileges (without the “drop table” option). So the application must connect with the default Heroku Postgres user and therefore the risk of a “drop table” might exist.
I'm running the Heroku Postgres Crane add-on.
Has anyone come up against this or got any creative work arounds for this scenario?
With Heroku Postgres you do only have a single account to connect with. One option that does exist for this type of functionality is to create a follower on Heroku Postgres. A follower is asynchronously kept up to date (usually only a second or so behind) and is read only. This would allow you to grant access to the follower to those that need it while not providing them with the details for the leader db.