Protect Data on Postgres with pgcrypto - postgresql

Can someone clarify the differences between the various ways of encrypting a database?
I've seen that a lot of people use pgcrypto, but they say TDE is always the better choice.
Is pgcrypto enough to comply with the GDPR?
I have already installed pgcrypto and tested it. It works fine.
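For reference, this is roughly how I tested it (table name and passphrase are just examples):
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE demo (id serial PRIMARY KEY, secret bytea);
-- encrypt with a symmetric passphrase
INSERT INTO demo (secret) VALUES (pgp_sym_encrypt('sensitive data', 'my-passphrase'));
-- decrypt it again
SELECT pgp_sym_decrypt(secret, 'my-passphrase') FROM demo;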
The only guide I found for TDE on Postgres says it can be enabled by adding these lines to postgresql.conf:
keystore_location
tablespace_encryption_algorithm
And executing these commands:
SELECT pgx_set_master_key('passphrase');
pg_ctl --keystore-passphrase restart 'keystore location'
At the end, you can create a new tablespace.
TDE is never explained in the official Postgres docs.
Thanks

Related

How to setup a password for PostgreSQL in postgreapp?

I am using Postgres.app for PostgreSQL, and I was able to connect to the database and perform operations without a password. I am curious to learn about passwords. I also use Postico as an interface. Open to any suggestions.
The default for Postgres.app is to have no password and to set trust-level authentication in pg_hba.conf. To change this, you need to do the following:
Alter the IP address and mask for host all all 127.0.0.1/32 trust as needed in pg_hba.conf, and change the authentication method from trust to password or md5 (or whatever your requirements are)
Set the password for the desired user(s) with ALTER USER <username> WITH PASSWORD '<password>';
Reload the configuration with SELECT pg_reload_conf();
Note that your pg_hba.conf file is usually located in ~/Library/Application Support/Postgres/var-12 -- the sure-fire way to find it is to run SHOW data_directory; at your psql prompt.
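Putting those steps together, a minimal sketch (username and password are placeholders):
# in pg_hba.conf, change the method on the relevant line from trust to md5:
host    all    all    127.0.0.1/32    md5
-- then, at the psql prompt:
ALTER USER myuser WITH PASSWORD 'mypassword';
SELECT pg_reload_conf();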
Postgres.app is a great way to get Postgres running on macOS in a few minutes. It ships with a default user name and password, and then you're on your own. Postgres.app is a nicely compiled version of Postgres that you can run by double-clicking, but you'll need other tools (and knowledge) to take advantage of Postgres. As you'll have noticed, the UI for Postgres.app is pretty much a few buttons to configure a server, plus shortcuts to the logs, configuration files, and data.
If you want to use psql (as mentioned), or any of the other command line tools, they're embedded in the application's package. Right-click, open the package, open Contents, open Versions, open the version you use, and look in bin.
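For example, on a typical install psql ends up somewhere like:
/Applications/Postgres.app/Contents/Versions/latest/bin/psql
(the exact path depends on the version you have installed).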
If you want a GUI tool, there are many options. Since you mention Postgres.app, I'll assume macOS. You've found Postico; SQLPro for Postgres is good, and TablePlus is also good. Those tools have fairly uncluttered UIs. If you want or need more features, pgAdmin has a whole lot to offer, and it's free. I end up using Navicat a lot, even though it has a UI that screams "Look Ma! I wrote it in Java!" It gets a lot done. I'd say that day-to-day on macOS, I use SQLPro most. But, really, it's largely a matter of taste. psql is quite powerful, and you'll find no shortage of help for it.

How to load data from S3 to PostgreSQL RDS

I need to load data from S3 into Postgres RDS (around 50-100 GB). I don't have the option to use AWS Data Pipeline, and I am looking for something similar to using the COPY command to load data from S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer tried to use the S3-to-Postgres-RDS functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set up an EC2 instance with psql installed (see below, near the end of the post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql \copy command to import the files
This last part is really, really important. If you use the SQL COPY command, the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql \copy command, you apparently can do anything. I have confirmed this to be the case and have started my uploads successfully. I will come back and re-edit this post (time permitting) to add the relevant documentation steps for the above.
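A sketch of that flow (bucket, host, and table names are placeholders):
# on the EC2 instance: pull the CSV down from S3
aws s3 cp s3://my-bucket/my-data.csv /tmp/my-data.csv
# then load it into RDS using psql's client-side \copy
psql -h mydb.xxxx.us-east-1.rds.amazonaws.com -U master -d mydb \
     -c "\copy my_table FROM '/tmp/my-data.csv' WITH (FORMAT csv, HEADER)"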
Caveat Emptor: The post below is all the original work I did trying to get this implemented. I don't want to bury the lede: despite multiple efforts (including what can only be described as pathetic tech support from AWS), I don't believe this feature is ready for prime time. Despite a very simple, easy-to-replicate test environment, AWS has not provided an effective way to keep the copy statement from failing as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently appears to support only a single file at a time (i.e. no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
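To make that summary concrete, a sketch of the import per those docs (bucket, region, and table names are placeholders):
-- run once against the RDS instance; CASCADE also pulls in aws_commons
CREATE EXTENSION aws_s3 CASCADE;
-- import a CSV from S3 into an existing table
SELECT aws_s3.table_import_from_s3(
    'my_table',        -- target table
    '',                -- column list ('' means all columns)
    '(format csv)',    -- COPY-style options
    aws_commons.create_s3_uri('my-bucket', 'my-data.csv', 'us-east-1')
);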
I did not find a way to download just psql, so I had to bring a full Postgres install down to my Mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated, it is the quickest way to get psql.
Update: I decided that having psql on my Mac was a security hole (port forwarding, etc.). I found that there is a simple Postgres install available for Amazon Linux 2 under the Amazon Linux Extras rubric. The install command is fairly simple on your instance:
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use; however, it is important to keep in mind that instructions to psql itself are meta-commands prefixed with a backslash (\). Documentation on psql can be found here. I recommend going through it at least once before executing the AWS-recommended scripts.
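For instance, a few of the backslash meta-commands you will use constantly (table name is a placeholder):
\dt                -- list tables in the current database
\d my_table        -- describe a table
\copy my_table FROM 'my-data.csv' WITH (FORMAT csv)
\q                 -- quit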
If you run tight security and have access to your RDS instances seriously restricted (which I do), don't forget to open up the Postgres port (5432 by default) from your EC2 instance running Postgres to your RDS instance.
If your preference is a GUI, you can try pgAdmin 4. It is the AWS-recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product, it seems that version 4 may not be the most stable of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>' authorization;
Update:
Another option is to mount S3 and use a direct path to the CSV with the COPY command. I'm not sure if it will handle 100 GB effectively, but it's worth trying. Here is a list of software options.
Yet another option would be "parsing" the S3 file part by part with something like what is described here, writing it to a named pipe, and using COPY from that pipe, as described here.
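A rough sketch of that named-pipe variant (bucket and table names are placeholders):
# create a pipe and stream the S3 object into it in the background
mkfifo /tmp/s3_pipe
aws s3 cp s3://my-bucket/my-data.csv - > /tmp/s3_pipe &
# COPY reads from the pipe as if it were a regular file
psql -d mydb -c "\copy my_table FROM '/tmp/s3_pipe' WITH (FORMAT csv)"
rm /tmp/s3_pipe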
And the most obvious option, simply downloading the file to local storage and using COPY, I won't cover at all.
Also worth mentioning is s3_fdw (status: unstable). The readme is very laconic, but I assume you could create a foreign table pointing to an S3 file, which means you could then load the data into another relation...

AWS pg_dump Does Not Include Globals

We have multiple PostgreSQL instances in AWS RDS. We need to maintain an on-premises copy of each database to comply with our disaster recovery policy. I have been successful in using pg_dump and pg_restore to export the database schemas and tables to our on-premises server, but I have been unsuccessful in exporting the roles and tablespaces. I have found that this is only possible by using pg_dumpall, but as this requires superuser access, which is not allowed in RDS, how can I export those aspects of the database to our on-premises server?
My pg_dump command:
pg_dump -h {AWS Endpoint} -U {Master Username} -p 5432 -F c -f C:\AWS_Backups\{filename}.dmp {database name}
My pg_restore command:
pg_restore -h {AWS Endpoint} -p 5432 -U {Master Username} -d {database name} {filename}.dmp
I have found multiple examples of people using pg_dump to export their PostgreSQL databases; however, they do not address the "globals" that pg_dump ignores. Have I misread the documentation? After performing my pg_restore, my logins were not created on the database.
Any help you can provide on getting the FULL database (including globals) to our offsite location would be greatly appreciated.
UPDATE: My patch is now a part of Postgres v10+.
You can read about how this works here.
Earlier, I had also posted a working solution to my GitHub account; then, you'd need to compile the binary and use that. However, with the patch now part of Postgres v10+, any pg_dumpall since that version supports this feature.
You can read some more detailed inner workings here.
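Assuming the feature referenced here is pg_dumpall's --no-role-passwords option (which avoids reading pg_authid, the table RDS locks down), a sketch of dumping just the globals from RDS looks like:
# dump roles and tablespaces only, skipping password hashes
pg_dumpall -h mydb.xxxx.us-east-1.rds.amazonaws.com -U masteruser \
           --globals-only --no-role-passwords -f globals.sql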
I haven't been able to find an answer to my question anywhere online. Just in case someone else is experiencing this problem, I thought I would post a high-level outline of my "solution". I go around my elbow to get to my knee, but this is the option I have come up with:
Create a table (I created two - one for roles, and one for logins) in each PostgreSQL database within AWS. These tables will need to have all the columns required to dynamically create the SQL to do CREATE, GRANT, REVOKE, etc.
Insert all roles, logins, privileges, and permissions into these tables. They are scattered everywhere, but here are the sources I used (see the query sketch after this list):
pg_auth_members (role and login relationships)
pg_roles (role and login permissions ie can login, inherit parent, etc)
information_schema.role_usage_grants (schema privileges)
information_schema.role_table_grants (table privileges)
information_schema.role_routine_grants (function privileges)
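For example, the role/login relationships can be pulled from pg_auth_members with a join like this (a sketch; adapt the column list to your table):
-- who is a member of which role
SELECT r.rolname AS role, m.rolname AS member
FROM pg_auth_members am
JOIN pg_roles r ON r.oid = am.roleid
JOIN pg_roles m ON m.oid = am.member;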
To fill in the gaps, there are clever queries on the web page below that use the built-in functions to check for access. You will have to loop through the tables and process a row at a time.
https://vibhorkumar.wordpress.com/2012/07/29/list-user-privileges-in-postgresqlppas-9-1/
Specifically, I used a variation of the database_privs function.
Once all of the data is in those tables, you can execute pg_dump, and it will extract that info from each database to your on-premises location. I did this through a Python script.
On your server, use the data in the tables to dynamically create the SQL statements needed to run the CREATE, GRANT, REVOKE, etc. scripts. Save them in a .sql file that you can instruct a Python script to execute against the database to recreate the AWS roles and logins.
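Generating the statements can be done in SQL itself; a sketch for the CREATE ROLE portion (the dummy password is deliberate, per the note below):
-- emit one CREATE ROLE statement per login role, with a placeholder password
SELECT format('CREATE ROLE %I LOGIN PASSWORD %L;', rolname, 'CHANGE_ME')
FROM pg_roles
WHERE rolcanlogin
  AND rolname NOT LIKE 'pg\_%'
  AND rolname <> 'rdsadmin';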
One thing I forgot to mention - because we are unable to access the pg_authid table in AWS, I have found no way to extract the passwords out of AWS. We are going to store these in a password manager, and when I create the CREATE ROLE statements, I'll pass a default to be updated.
I haven't completed the process, but it has taken me several days to track down a viable alternative to the absence of pg_dumpall's functionality. If anyone sees any flaws in my logic, or has a better solution, I'd love to read about it. Good luck!

postgresql "createdb" and "CREATE DATABASE" yield a non-empty database. what the fork?

First of all, I apologize if this question turns out to be painfully obvious; I'm not that Postgres-savvy beyond the basics. I use PostgreSQL as a database backend for quite a few Django projects that I'm working on, and that's always worked just fine for me. Recently, I set up PostgreSQL on a new machine, and at one point a coworker tried setting up a new project on that machine. Unfortunately, it's too late to go back into the bash history to figure out what he did, and he won't be available for a while to ask him about it. The issue I'm having now is...
I regularly reset Postgres databases by simply using a dropdb/createdb command. I've noticed that whenever I run the dropdb command, the database does disappear, but when I run the createdb command next, the resulting database is not empty. It contains tables, and those tables do contain data (which appears to be dummy data from the other project). I realise that I'm a bit of a Postgres noob, but is this in some way related to the template features in Postgres? I don't specify anything like that on the command line, and I'm seeing the exact same results if I drop/create from the psql console.
By the way, I can still wipe the db by dropping and recreating the "public" schema in the database. I'll be glad to add any info necessary to help figure this out, but to be honest, I haven't a clue what to look for at this point. Any help would be much appreciated.
Summarizing from the docs: template0 is essentially a clean, virgin system database, whereas template1 serves as a blueprint for any new database created with the createdb command or CREATE DATABASE from a psql prompt (there is no effective difference).
It is probable that you have some tables lurking in template1, which is why they keep reappearing on createdb. You can solve this by dropping template1 and recreating it from template0:
createdb -T template0 template1
The template1 database can be extremely useful. I use PostGIS a lot, so I have all of the functions and tables related to it installed in template1, so any new database I create is immediately spatially enabled.
EDIT: As noted in the docs, but worth emphasizing: to drop template1 you first need to set pg_database.datistemplate = false for it.
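The full sequence, per the docs (run as a superuser; a sketch):
-- template databases cannot be dropped, so clear the flag first
UPDATE pg_database SET datistemplate = false WHERE datname = 'template1';
DROP DATABASE template1;
-- recreate it as a pristine copy of template0, then restore the flag
CREATE DATABASE template1 TEMPLATE template0;
UPDATE pg_database SET datistemplate = true WHERE datname = 'template1';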

backing up coverity PostgreSQL database to file

We have the "coverity" tool set up and are trying to find a way to back up its database to a file; I believe it uses PostgreSQL.
How can we do this? Is it using its own independent installation of PostgreSQL?
An even better answer:
cov-admin-db backup c:/mybackupfile
When you installed Coverity Integrity Manager, it asked you if you want it to install and manage a PostgreSQL instance or if you want to connect to your own existing PostgreSQL instance that you then have to manage.
If you chose the former, then you would use the provided cov-admin-db command.
If you chose the latter, then presumably you already do regular backups of your databases with pg_dump; you should do the same for the Coverity database.
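In that case, a typical invocation would be something along these lines (the host, user, and database name here are assumptions; check your own configuration):
# back up the Coverity database to a compressed custom-format archive
pg_dump -h localhost -p 5432 -U coverity -F c -f coverity_backup.dmp coverity_db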
Without knowing which of the two you chose, it's not clear which of the two answers already given is correct.
You can check which option you chose by looking in the file /config/system.properties - if the first line is embedded_db=true, then use the cov-admin-db command, which is documented in the manual as well as in its own --help option.
If it does use PostgreSQL, then there should be a pg_dump utility somewhere in the PostgreSQL installation.
Taking backups using pg_dump is very well explained in the manual:
http://www.postgresql.org/docs/current/static/backup-dump.html
http://www.postgresql.org/docs/current/static/app-pgdump.html