Copying PostgreSQL data without roles

I have a snapshot of a Postgres database named srcDB in a file named srcDB-cachedump.gz, obtained using pg_dump. The dump is from a production database, so there are a bunch of different roles. What I would like to do is make a development database called devDB, which has only one role, a user by the name of 'development', but with the same schema as the original database.
The way I initially tried to do this was:
gunzip -c srcDB-cachedump.gz | psql -d devDB -U development -W
When I ran this, however, I got many errors that boiled down to, essentially, "role does not exist." I would like to bypass the recreation of the roles in the production database if at all possible, as I have other programmers on my team and I would like the dev environment to be as portable as possible. I am relatively new at Postgres administration, though, so I am at a loss.
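One way to avoid the role errors entirely (a sketch, assuming you can re-run pg_dump on the production side) is to create the dump without ownership and privilege information, since it is the ALTER ... OWNER and GRANT/REVOKE statements in the dump that reference the production roles:
pg_dump --no-owner --no-acl srcDB | gzip > srcDB-cachedump.gz
Restoring as above then leaves every object owned by the restoring user, i.e. 'development':
gunzip -c srcDB-cachedump.gz | psql -d devDB -U development -W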

Related

Dump all databases with a specific prefix with pg_dump

I'm currently using the tool pg_dump to back up my database.
I have a lot of databases. I don't know their full names, but I know that they all have a well-defined prefix.
I would like to automate backing up all of my databases; however, I have not found a way to tell pg_dump to dump multiple databases that share the same prefix.
I say databases, and not tables or schemas, because I tried the commands in the pgsql docs, which offer the -n option for schemas and -t for tables.
But that's not what I want to do; I want to save all my databases with a defined prefix.
Any help in this matter would be greatly appreciated
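pg_dump only handles one database per invocation, so a common approach (a sketch; 'myprefix' and the connection details are placeholders) is to ask the server for the matching names and loop over them:
for db in $(psql -U postgres -At -c "SELECT datname FROM pg_database WHERE datname LIKE 'myprefix%'"); do
  pg_dump -U postgres -Fc -f "${db}.dump" "$db"
done
Here -A and -t make psql print one bare database name per line, and -Fc writes each dump in the custom format so it can later be restored selectively with pg_restore.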

PGAdmin shows excessive amount of database from Heroku

I'm learning some backend stuff. I made a test database locally, filled in some data, and then dumped the database to an SQL file with the following command:
pg_dump -U USERNAME DATABASE --no-owner --no-acl -f backup.sql
And finally restored it to Heroku:
heroku pg:psql --app APPNAME < backup.sql
There is only 1 database I'm deploying; however, when I use PGAdmin to connect to it, it shows more than 2000 databases and crashes my computer.
Where are all these databases coming from?
You don't get a dedicated PostgreSQL server with Heroku Postgres. Your databases are co-located with other users' databases on the same server. You'll be able to see the names of other users' databases, but you won't be able to access them.
I'm not sure what "crashes my computer" means, but make sure you are selecting your database when trying to connect.
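If pgAdmin's database listing itself is the problem, one workaround (a sketch; APPNAME is a placeholder) is to bypass the browser entirely and connect straight to your own database with psql, using the connection URL Heroku stores for the app:
psql "$(heroku config:get DATABASE_URL --app APPNAME)"
pgAdmin also has a "DB restriction" field in the server connection settings that, if your version supports it, limits the listing to the databases you name.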

What's the difference between initdb /usr/local/var/[db] and createdb [db]

I am starting to use PostgreSQL and I am confused about the two ways to create a database. When I installed it the first time, the instructions said I had to create a default database with initdb /usr/local/var/postgres. When I look up my databases, I can see that I have a database called postgres. Now I am able to create a database with two other commands, where the former is a command-line script and the latter is the SQL command. In the case of a database called "postgres" it would be:
createdb postgres
CREATE DATABASE postgres
Both set up a database in my list of databases. When I try to create another database with initdb /usr/local/var/[someDbName], though, it doesn't appear in my list of databases. So what's the difference between initdb and createdb?
initdb is not used to create a "new database".
As documented in the manual, you need it to create a "cluster" (or "data directory"), which then stores the databases created with CREATE DATABASE.
Quote from the manual:
Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (The SQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server.
[...]
In file system terms, a database cluster is a single directory under which all data will be stored. We call this the data directory or data area.
In short: initdb creates the necessary directory layout on the hard disk so that databases can be created and managed there.
It's a necessary part of the installation process of a Postgres server.
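To make the relationship concrete, a minimal first-time sketch (paths are illustrative) looks like this:
initdb -D /usr/local/var/postgres                    # create the cluster, once per server
pg_ctl -D /usr/local/var/postgres -l logfile start   # start a server instance on that cluster
createdb mydb                                        # now create databases inside it
psql -d mydb -c "SELECT version()"                   # and connect to one of them
initdb runs once per cluster; createdb (or CREATE DATABASE) runs once per database within that cluster.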

Restore database from production to Development

We have a database named 'itreport' on the production server and a database named 'itreport_dev' on the development server.
1) On the production server, 52 users are present in the database 'itreport'.
2) On the development server, 60 users are present in the database 'itreport_dev'.
3) I have taken a dump of the production database 'itreport'. The dump file name is backup_12082017.sql.
My questions are:
If I restore the above dump (backup) file to the development server database 'itreport_dev', will the 60 users present in the development database still be there?
If not, what option do we have to give in the restore process?
What are the pre-steps and post-steps to be performed on the development server?
Short answer: No, roles are not part of a single-database backup.
If you dump only the database using pg_dump, it will only restore tables and data, not any roles. Any objects owned by missing roles will end up owned by the user performing the restore (this user should be a superuser).
If you do pg_dumpall, roles and all databases will be backed up.
Roles can be backed up separately using pg_dumpall -r
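For the scenario in the question, a sketch (hostnames are placeholders) would be to copy the production roles over first and then restore the database dump; psql will report harmless errors for any roles that already exist on dev:
pg_dumpall -r -h production.example.com -U postgres > roles.sql
psql -h development.example.com -U postgres -f roles.sql
psql -h development.example.com -U postgres -d itreport_dev -f backup_12082017.sql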
If you do pg_dumpall --clean, the restore will destroy and replace any databases and roles on the dev server that also exist in the dump. Any names that are not in both will be unaffected (the special role "postgres" and the template databases are also untouched).
pg_dumpall backups are SQL backups and should be restored using psql:
su postgres -c psql < all-database-backupfile.sql
or
zcat all-database-backupfile.sql.gz | su postgres -c psql
etc.
(for Windows, use runas instead of su; I'm not sure of the exact syntax needed)

How to copy a PostgreSQL RDS database within an RDS instance

I had so much trouble doing this - I thought I would make a Q/A on StackOverflow to explain the process.
The question is about copying an RDS Postgres database for development usage - especially for testing database migration scripts, etc. That's why the focus is on a "single schema" within a "single database".
In my case, I want to create a test database that's as isolated as possible, while remaining within a single RDS instance (because spinning up entire RDS instances takes anywhere from 5 - 15 minutes and because I'm cheap).
Here is an answer using only the command line.
Pre-requisites:
you must have the Postgres client tools installed (you don't need the actual server)
the client version must be the same as or higher than your Postgres server version
network access to the RDS instance
credentials for accessing the relevant database accounts
Example context:
I have an RDS instance at rds.example.com which has a master user named rds_master.
I have an "application user" named db_dev_user and a database named dev_db that contains the schema app_schema.
note that "user" and "role" in postgres are synonymous
Note: this guide was written in 2017, for postgres version 9.6.
If you find that some steps are no longer working on a recent version of postgres - please do post any fixes to this post as comments or alternative answers.
pg_dump prints out the schema and data of the original database, and it works even while there are active connections to the database. Of course, performance for those connections is likely to be affected, but the resultant copy of the DB is transactionally consistent.
pg_dump --host=rds.example.com --port=5432 \
--format=custom \
--username=db_dev_user --dbname=dev_db \
> pgdumped
The createuser command creates the user that your test application/processes should connect with (for better isolation); note that the created user is not a superuser and cannot create databases or roles.
createuser --host=rds.example.com --port=5432 \
--username=rds_master \
--no-createdb --no-createrole --no-superuser \
--login --pwprompt \
db_test_user
Without this next grant command, the following createdb would fail:
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=postgres \
--command="grant db_test_user TO rds_master"
createdb does what it says on the tin; note that the db_test_user role "owns" the DB.
createdb --host=rds.example.com --port=5432 \
--username=rds_master --owner=db_test_user test_db
The create schema command is next. db_test_user cannot create the schema, but it must be authorized for the schema, or the pg_restore would fail because it would end up trying to restore into the pg_catalog schema (so note that user=rds_master, but dbname=test_db).
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=test_db \
--command="create schema app_schema authorization db_test_user"
Finally, we issue the pg_restore command, to actually create the schema objects (tables, etc.) and load the data into them:
pg_restore --host=rds.example.com --port=5432 \
--verbose --exit-on-error --single-transaction \
--username=db_test_user --schema=app_schema \
--dbname=test_db --no-owner \
./pgdumped
exit-on-error - because otherwise finding out what went wrong involves too much scrolling and scanning (also it's implied by single-transaction anyway)
single-transaction - avoids having to drop or recreate the DB if things go pear-shaped (see the cleanup sketch after this list)
schema - only do the schema we care about (can also supply this to the original pg_dump command)
dbname - to ensure use of the DB we created
no-owner - we're connecting as db_test_user anyway, so everything should be owned by the right user
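If the restore does go pear-shaped, resetting is cheap; a sketch (same placeholder host and users as above) for dropping the copy so the createdb / create schema / pg_restore steps can be repeated:
dropdb --host=rds.example.com --port=5432 \
    --username=rds_master test_db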
For production, you'd be better off just taking an RDS snapshot of your instance and restoring that, which will create an entirely new RDS instance.
On a mostly empty database - it takes a few minutes to create the snapshot and another 5 minutes or so to create the new RDS instance (that's part of why it's a pain during development).
You will be charged for the new RDS instance only while it is running. Staying within the free tier is one of the reasons I wanted to create this DB within the same instance for development purposes, along with not having to deal with a second DNS name; and that effect is multiplied as you start to have multiple small development environments.
Running a second RDS instance is the better option for production because you nearly completely eliminate any risk to your original DB. Also, when you're dealing with real amounts of data - snapshot/DB creation times will be dwarfed by the amount of time spent reading/writing the data. For large amounts of data, it's likely the Amazon RDS snapshot creation/restore process is going to have far better parallelisation than a set of scripts running on a single server somewhere. Additionally, the RDS console gives you visibility into the progress of the restore - which becomes invaluable as the dataset grows larger and more people become involved.