Copy an entire data table (not entire database) from local machine to heroku postgres - postgresql

I have a relatively large table (~4m rows) that has been imported into a locally hosted PostgreSQL database. (As it happens it's a Ruby on Rails app database, but that shouldn't matter for the purposes of the question, unless it helps.)
I want to copy that table into an identical, currently empty table in a Heroku PostgreSQL database.
How would I do that quickly and efficiently?
I found this question: "Copy a table from one database to another in Postgres",
but I'm struggling with the syntax for the Heroku end. How do I connect to both databases at the same time, and which one do I connect to first?

In that answer, you are originally connected to the database "source_db" or "my_db" (depending on which line in the answer you are looking at). Presumably that database is on the instance running locally on port 5432, unless unshown environment variables (or non-default compilation) have changed that. And the destination database is named "target_db", running in the same instance.
pg_dump and psql are independent commands, and each takes all the connection options it would take if run in isolation. So you would probably want something like:
pg_dump -t table_to_copy source_db | psql target_db -h you.heroku.hostname_or_ip
One possible problem: if both commands prompt for a password, the prompts can get mixed up. Which password do you enter first, and will each command read the right one? If both need passwords, it is best to arrange for at least one of them to be supplied by ~/.pgpass.
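The ~/.pgpass suggestion can be sketched as follows. Each line of that file has the form hostname:port:database:username:password, and on Unix-like systems libpq ignores the file unless its permissions are 0600. The helper names and every value in the usage note are placeholders, and the sketch does not escape ':' or '\' inside fields, which a real password might need.

```shell
# Build one password-file line: hostname:port:database:username:password
pgpass_entry() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Append an entry to a password file and restrict its permissions,
# because the file is ignored if group or others can read it.
# $1: password file, $2..$6: host, port, database, user, password
add_pgpass_entry() {
    pgpass_file=$1
    shift
    pgpass_entry "$@" >> "$pgpass_file"
    chmod 600 "$pgpass_file"
}
```

For example, add_pgpass_entry ~/.pgpass you.heroku.hostname_or_ip 5432 target_db some_user some_password would let the psql side of the pipeline authenticate without prompting, leaving at most one prompt for pg_dump.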

Related

I have loaded wrong psql dump into my database, anyway to revert?

Ok, I screwed up.
I dumped one of my PostgreSQL (9.6.18) staging databases with the following command:
pg_dump -U postgres -d <dbname> > db.out
And after doing some testing, I "restored" the data using the following command.
psql -f db.out postgres
Notice the absence of the -d option? Yup. And "postgres" was supposed to be the username.
And as the database happened to have the same name as its user, it overwrote the 'default' database (postgres), which had data that other QAs are using.
I cancelled the operation as soon as I realised my mistake, but the damage was already done. Roughly 1/3 to 1/2 of the database now looks like the staging database, at least in terms of the schema.
Is there any way to revert this? I am still looking for other dumps in case one of these guys made one, but I don't think there are any from the past two to three months. It seems I have no choice but to own up and apologise to them in the morning.
Without a recent dump or some sort of PITR/replication setup, you can't revert this easily. The only option is to go through the log of what was restored and manually remove or alter it in the postgres database. That will work for the schema; the data is another matter.
FYI, the postgres database should not really be used as a 'working' database. It exists as a database to connect to for other operations, such as CREATE DATABASE, or to bootstrap your way into a cluster. Had it been left empty, the above would not have been a problem: you could have run, from another database, DROP DATABASE postgres; and then CREATE DATABASE postgres;
Do you have a capture of the output of the psql -f db.out postgres run?
Since the pg_dump didn't specify --clean or -c, it should not have overwritten anything, just appended. And if your tables have unique or primary keys, most of the data copy operations should have failed with unique key violations and rolled back. Even one overlapping row (per table) would roll back the entire dataset for that table.
Without having the output, it will be hard to figure out what damage has actually been done.
You should also immediately copy the pg_xlog data someplace safe. If it comes down to it, you might be able to use pg_xlogdump to figure out what changes committed and what did not.
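Copying the WAL directory aside could look like the sketch below. It assumes a 9.6-era layout where WAL lives in pg_xlog under the data directory; the function name and both paths are hypothetical, and the copy only captures whatever segments still exist at that moment.

```shell
# Copy <data_dir>/pg_xlog to <safe_dir>/pg_xlog_copy, preserving timestamps.
# $1: cluster data directory
# $2: existing directory, ideally on another disk
backup_xlog() {
    xlog_src="$1/pg_xlog"
    xlog_dest="$2/pg_xlog_copy"
    if [ ! -d "$xlog_src" ]; then
        echo "no pg_xlog directory under $1" >&2
        return 1
    fi
    if [ -e "$xlog_dest" ]; then
        echo "$xlog_dest already exists, refusing to overwrite" >&2
        return 1
    fi
    cp -Rp "$xlog_src" "$xlog_dest"
}
```

The copied segments can then be inspected with pg_xlogdump at leisure, without racing against the server recycling them.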

What is the best way to backfill old database data to an existing Postgres database?

A new docker image was recently stood up to replace an existing postgres database. A dump was taken of the database before the old instance was shut down using the following command:
pg_dump -h localhost -p 5432 -d *dbname* -U postgres > *dbname*.pgdump
We'd like to append this data to the new database in order to "backfill" some older historical data. The database name and schema of the two databases are identical. What is the easiest, safest way to do this? Secondly, does postgres need to be shut down during the process?
If overlapping primary keys or unique columns have been assigned to the new data, then there will be no clean way to merge them without putting in some work to clean that up. Assuming that hasn't happened...
The current dump file will have CREATE statements for all the objects that already exist. If you replay that file into the current database, you will get a bunch of errors for all those objects. As long as you don't run it all in one transaction, you could simply ignore those errors. But you might also load data in the wrong order and get foreign key violations. Those errors will be mixed in with all the others about existing objects, so they would be easy to overlook.
So what I would do is stand up an empty database server and replay your current dump into that. Then retake the pg_dump, but with either -a or --section=data. You should then be able to load that dump into your new database. This has two advantages: it will not dump out CREATE statements, which are not needed and would throw errors that have to be ignored, and it should dump the tables in an order which will not cause foreign key violations.
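The steps above might be scripted roughly like this. The host names, the file name data_only.sql, and the use of the postgres superuser are assumptions. Loading in a single transaction with ON_ERROR_STOP is a design choice of this sketch, so that a key conflict leaves the new database untouched; the approach itself does not require it.

```shell
# $1: scratch (empty) server host, $2: new server host,
# $3: database name, $4: path to the original plain-SQL dump
backfill_data() {
    scratch_host=$1
    target_host=$2
    db_name=$3
    old_dump=$4

    # 1. Replay the full old dump into a fresh database on the scratch server.
    createdb -h "$scratch_host" -U postgres "$db_name" || return 1
    psql -h "$scratch_host" -U postgres -d "$db_name" -f "$old_dump" || return 1

    # 2. Re-dump data only (-a), so no CREATE statements are emitted.
    pg_dump -h "$scratch_host" -U postgres -a -f data_only.sql "$db_name" || return 1

    # 3. Load into the new database; stop and roll back on the first error.
    psql -h "$target_host" -U postgres -d "$db_name" \
        -v ON_ERROR_STOP=1 --single-transaction -f data_only.sql
}
```

Both servers stay up throughout, since pg_dump and psql are ordinary clients that connect to a running server.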

Database restore from a hacked system

A linux VM with postgres 9.4 was hacked into. (Two processes taking 100% cpu, weird files in /tmp, did not reoccur after kill(s) and restart.) It was decided to install the system from scratch on a new machine (with postgres 9.6). The only data needed was in one of postgres databases. A pg_dump of the database was made after the attack.
Regardless of whether the data - the tables/rows/etc. - were modified during the attack: is it safe to restore the database in the new system?
I am considering using pg_restore with the -O option (which skips restoring object ownership).
The two dangers are:
important data could have been modified
back doors could have been installed in your database
With the first, you're on your own in verifying that your data are OK. The safest thing would be to use a backup from before the machine was compromised, but this would mean data loss.
For the second, I would run a pg_dumpall -s and spend a day reading it carefully. Compare it with a dump from a backup made before the breach. Watch out for weird object and column names and functions with SECURITY DEFINER.
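As a starting point for that review (not a substitute for reading the dump), two small helpers can point at the places to look first. They assume the schema has already been written to a plain-SQL file, for example with pg_dumpall -s > schema_after.sql; the function and file names are made up for illustration.

```shell
# Print "line_number:line" for every SECURITY DEFINER occurrence
# in a plain-SQL schema dump (case-insensitive).
list_security_definer() {
    grep -n -i 'SECURITY DEFINER' "$1"
}

# Show what changed between a trusted pre-breach schema dump ($1)
# and the dump taken after the attack ($2).
schema_changes() {
    diff -u "$1" "$2"
}
```

Anything schema_changes reports that nobody on the team remembers creating deserves a closer look, as does every function flagged by list_security_definer.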

Get all database names through JDBC

Is there any way to get all database names out of a postgres server using JDBC? I can get the current one, but that's not what I am looking for...
I have a JUnit rule which creates a database for each test and drops it after the test, but in some special cases, when the JVM dies, the drop never happens. So I'd like the rule to also check for existing databases and clean up those that are no longer used. What I am looking for is something like the \l metacommand (but I can't easily ssh to the machine from unit tests...)
Another solution for me would be some kind of database TTL, like some AMQP queues have, but I suppose that's not in postgres either...
Thanks
Just run:
select datname
from pg_database
through JDBC. It returns all databases on the server you are connected to.
If you know how to get the information you want through a psql meta command (e.g. \l) just run psql with the -E switch - all internal SQL queries for the meta commands are then printed to the console.
\l actually uses a query that is a bit more complicated, but to get only the names, the above is sufficient.
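For the cleanup use case in the question, the same catalog query can drive a small script. This is a shell sketch with psql rather than JDBC, and it assumes leftover test databases share a hypothetical name prefix such as tmpdb; the SQL it prints could just as well be sent through a JDBC Statement.

```shell
# Print a query listing non-template databases whose name starts with $1.
# Note that "_" in the prefix would act as a single-character wildcard in LIKE.
stale_db_query() {
    printf "select datname from pg_database where not datistemplate and datname like '%s%%'\n" "$1"
}

# Read database names on stdin (one per line) and print a DROP statement for each.
drop_statements() {
    while IFS= read -r db_name; do
        if [ -n "$db_name" ]; then
            printf 'DROP DATABASE IF EXISTS "%s";\n' "$db_name"
        fi
    done
}
```

A possible use, with a placeholder host name: psql -At -h db.example -U postgres -c "$(stale_db_query tmpdb)" postgres | drop_statements, then review the printed statements before running them. DROP DATABASE fails for a database that still has open connections, which also protects tests that are currently running.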

Can PostgreSQL be used with an on-disk database?

Currently, I have an application that uses Firebird in embedded mode to connect to a relatively simple database stored as a file on my hard drive. I want to switch to using PostgreSQL to do the same thing (Yes, I know it's overkill). I know that PostgreSQL cannot operate in embedded mode and that is fine - I can leave the server process running and that's OK with me.
I'm trying to figure out a connection string that will achieve this, but have been unsuccessful. I've tried variations on the following:
jdbc:postgresql:C:\myDB.fdb
jdbc:postgresql://C:\myDB.fdb
jdbc:postgresql://localhost:[port]/C:\myDB.fdb
but nothing seems to work. PostgreSQL's directions don't include an example for this case. Is this even possible?
You can trick it. If you are running PostgreSQL on a Unix-like system, you should be able to create a RAM disk and use that for the database storage. Here's a pretty good step-by-step guide for RAM disks on Linux.
In general though, I would suggest using SQLite for an "SQL db in RAM" type of application.
Postgres databases are not a single file. There will be one file for each table and each index in the data directory, inside a directory for the database. All files will be named with the object ID (OID) of db / table / index.
The JDBC urls point to the database name, not any specific file:
jdbc:postgresql:foodb (localhost is implied)
If by "disk that behaves like memory" you mean that the db only exists for the lifetime of your program, there's no reason why you can't create a db at program start and drop it at program exit. Note that this is just DDL to create the db, not creating the data directory via the initdb program. You could connect to the default 'postgres' db, create your db, then connect to it.
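The create-at-start, drop-at-exit idea could be wrapped like this, using the createdb and dropdb client programs that ship with PostgreSQL. The wrapper name and the database name in the usage note are made up, and connection options are left to the usual environment variables such as PGHOST and PGUSER.

```shell
# Create a database, run a command, then drop the database again,
# returning the command's exit status.
# $1: database name, remaining arguments: the command to run
with_temp_db() {
    temp_db=$1
    shift
    createdb "$temp_db" || return 1
    "$@"
    cmd_status=$?
    dropdb "$temp_db"
    return "$cmd_status"
}
```

For example, with_temp_db scratch_db java -jar myapp.jar would give the application a database at jdbc:postgresql:scratch_db for the duration of the run. If the wrapper itself is killed hard, the drop never happens, so a leftover database is still possible.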
Firebird 2.1 onwards supports global temporary tables, which only exist for the duration of the database connection.
Syntax goes something like CREATE GLOBAL TEMPORARY TABLE ... ON COMMIT PRESERVE ROWS