I have dozens of unlogged tables, and the docs say that an unlogged table is automatically truncated after a crash or unclean shutdown.
Based on that, I need to check some tables after the database starts to see if they are "empty" and do something about it.
So, in short, I need to execute a procedure right after the database is started.
What is the best way to do that?
PS: I'm running Postgres 9.1 on Ubuntu 12.04 server.
There is no such feature available (at the time of writing, the latest version was PostgreSQL 9.2). Your only options are:
Start a script from the PostgreSQL init script that polls the database and, when the DB is ready, locks the tables and populates them. There's still a race window between the server accepting connections and your script taking its locks, during which other clients can see the tables empty;
Modify the startup script to use pg_ctl start -w and invoke your script as soon as pg_ctl returns; this has the same race condition but avoids the need to poll;
Teach your application to run a test whenever it opens a new pooled connection to detect this condition, lock the tables, and populate them; or
Don't use unlogged tables for this task if your application can't cope with them being empty when it opens a new connection.
There's been discussion of connect-time hooks on pgsql-hackers but no viable implementation has been posted and merged.
It's possible you could do something like this with PostgreSQL bgworkers, but it'd be a LOT harder than simply polling the DB from a script.
Postgres now has pg_isready for determining if the database is ready.
https://www.postgresql.org/docs/11/app-pg-isready.html
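For example, a minimal polling sketch using pg_isready (the connection options and the check_unlogged.sql file are placeholders you would adapt; that SQL file would lock the unlogged tables, test whether they are empty, and repopulate them in one transaction):

#!/bin/sh
# wait until the server accepts connections
until pg_isready -h localhost -p 5432 -d mydb -q; do
    sleep 1
done
# then lock, check and repopulate the unlogged tables
psql -h localhost -p 5432 -d mydb -f check_unlogged.sql

Note that clients connecting between the server becoming ready and this script taking its locks can still see the empty tables, which is the race condition mentioned above.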
Is there a way to submit a command in datagrip to a database without keeping the connection open / asynchronously? I'm attempting to create indexes concurrently, but I'd also like to close my laptop.
My datagrip workflow:
Select a column in a database, click 'modify column', and eventually run code such as:
create index concurrently batchdisbursements_updated_index
on de_testing.batchdisbursements (updated);
However, these run as background tasks and cancel if I exit datagrip.
What if you close your laptop without exiting datagrip? Datagrip is probably actively sending a cancellation message to PostgreSQL when you exit it. If you just close the laptop, I doubt it will do that. In that case, PostgreSQL won't notice the client has gone away until it tries to send a message, at which point the index creation should already be done and committed.
But this is a fragile plan. I would ssh to the server, run screen (or one of the fancier variants), run psql in that, and create the indexes from there.
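Roughly like this (the host, session and database names are just placeholders):

ssh user@db-server                 # placeholder host
screen -S build_indexes            # detachable terminal session
psql -d my_database                # placeholder database name
# inside psql:
#   create index concurrently batchdisbursements_updated_index
#       on de_testing.batchdisbursements (updated);
# detach with Ctrl-a d, close the laptop, and later reattach with: screen -r build_indexes

Because psql runs on the server inside screen, the session (and the index build) survives your laptop going to sleep.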
I'm attempting to upgrade Heroku PostgreSQL instances from pg11 to pg12 using the copy method, as my testing environments are on hobby instances. At the end of the process it appears to hang for a long time (it does not exit after >30 minutes for a 120 MB database). The datastore view suggests everything is fine and I have the same number of rows, but there are issues.
It appears to be the fault of a materialized view. If I connect to the database and look through the tables and views, only one appears to be empty. Using postico, it waits and waits for the view's structure, but doesn't give the usual warning for an unpopulated view.
I can recreate the stalling behaviour by creating a local pg12 database and attempting to use pg_restore with a recent backup. Along the same lines, I appear to be able to get it working by creating an empty local database, running all the db migrations, truncating all tables and sequences, and then doing a --data-only --disable-triggers load from the same backup. Not a particularly smooth or inspiring migration plan. Using --verbose doesn't show any obvious errors; the last thing I get is that it's creating the problematic materialized view.
I've also set log_statement to all, and the last one I get is that it's refreshing the problematic view. At this point, the postgres command starts using ~100% CPU.
Locally, I'm using this command to restore:
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d database_name database_backup.dump
This is the command we use regularly to restore production backups for local development.
Are there any known gotchas with upgrading from 11 to 12, or ways that I might be able to extract more information about what's going on?
It has probably chosen an appalling plan for doing the materialized view query, due to lack of statistics at the time the refresh was launched.
You could kill the process, then restart the refresh once stats are gathered (which they might already be).
If starting from scratch, you could run pg_restore with --section of pre-data and data, then do an ANALYZE, then do post-data.
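Sketching that with the same flags as your local restore command (database and dump names as above):

pg_restore --no-acl --no-owner -h localhost -d database_name --section=pre-data --section=data database_backup.dump
psql -h localhost -d database_name -c 'ANALYZE;'
pg_restore --no-acl --no-owner -h localhost -d database_name --section=post-data database_backup.dump

The materialized view refresh is part of the post-data section, so by the time it runs the planner has the statistics from the ANALYZE to work with.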
I want to execute a long-running stored procedure on PostgreSQL 9.3. Our database server is (for the sake of this question) guaranteed to be running stably, but the machine calling the stored procedure can be shut down at any second (Heroku dynos get cycled every 24h).
Is there a way to run the stored procedure 'detached' on PostgreSQL? I do not care about its output. Can I run it asynchronously and then let the database server keep working on it while I close my database connection?
We're using Python and the psycopg2 driver, but I don't care so much about the implementation itself. If the possibility exists, I can figure out how to call it.
I found notes on the asynchronous support and the aiopg library and I'm wondering if there's something in those I could possibly use.
No, you can't run a function that keeps on running after the connection you started it from terminates. When the PostgreSQL server notices that the connection has dropped, it will terminate the function and roll back the open transaction.
With PostgreSQL 9.3 or 9.4 it'd be possible to write a simple background worker to run procedures for you via a queue table, but this requires the ability to compile and install new C extensions into the server - something you can't do on Heroku.
Try to reorganize your function into smaller units of work that can be completed individually. Huge, long-running functions are problematic for other reasons, and should be avoided even if unstable connections aren't a problem.
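As a rough illustration of "smaller units of work" (all names here are made up), a queue table lets each batch commit on its own, so a recycled dyno only loses the batch that was in flight:

create table work_queue (
    id      bigserial primary key,
    payload text not null,
    done    boolean not null default false
);

-- process_one_item() is a hypothetical function that does one small piece of
-- the original work and flips done to true for that row; the client simply
-- re-runs this statement, each run committing on its own, until no rows remain
select process_one_item(id)
from work_queue
where not done
order by id
limit 100;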
We do nightly full backups of our db, and I then use that dump to create my own dev-db. The creation of the dev-db takes roughly 10 minutes, so it's scheduled every morning by cron before I get to work. That way I can work with an almost-live db.
But when I'm testing things it would sometimes be convenient to rollback the full db or just some specific tables to the initial backup. Of course I could do the full recreation of the dev-db but that would make me wait for another 10 minutes before I could run the tests again.
So is there an easy way to restore/rewind the database/table to a specific point in time or from a dump?
I have tried to use pg_restore like this to restore specific tables:
pg_restore -d my-dev-db -n stuff -t tableA -t tableB latest-live-db.dump
I have tried with options like -c and --data-only as well. But there seem to be several issues here that I did not foresee:
The old data is not automatically removed when the restored data is copied back.
There are several foreign-key constraints that make this impossible (correct me if I'm wrong) without explicitly removing the FKs before the restore and then adding them back again.
PK sequences that get out of order do not concern me at all at this point, but that might be an issue as well.
Edit: more things I tested/looked into:
pg_basebackup
A more brute-force alternative to pg_basebackup is to stop the db server, copy the db files, then start the db server.
Both of the alternatives above fail because I have several local databases running in the same cluster, and that adds up to a lot of data on disk. There is no way to separate the databases this way! So the file-copy approach will not give me any speed gain.
I'm assuming you are asking about a database, not a cluster. The first thing that comes to my mind is to restore the backup into two different dbs, one with the dev_db name and the other with another name like dev_db_backup. Then, when you need a fresh db, drop dev_db and rename dev_db_backup to dev_db with
drop database if exists dev_db;
alter database dev_db_backup rename to dev_db;
After that, to have another source to rename from, restore the backup into dev_db_backup again. This could be done by a script, so the dropping, renaming and restoring are automated. As dropping and renaming are practically instantaneous, just start the script: the rename is done right away, without waiting for the new restore.
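A rough sketch of such a script (using the dump filename from the question; both the drop and the rename will fail if anyone is still connected to those databases):

#!/bin/sh
# instant swap: throw away the current dev db and promote the spare copy
psql -c "drop database if exists dev_db;"
psql -c "alter database dev_db_backup rename to dev_db;"
# rebuild the spare copy in the background, ready for the next swap
createdb dev_db_backup
pg_restore -d dev_db_backup latest-live-db.dump &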
If it is common to need repeated restores at intervals of less than 10 minutes, I think you can try doing what you are doing inside a transaction:
begin;
-- alter the db
-- test the alterations
commit; -- or ...
-- rollback;
I want to make a script that will run postgres in-memory without durability.
I read this page: http://www.postgresql.org/docs/9.1/static/non-durability.html
But I don't understand how I can set these parameters in a script. Could you please help me?
Thanks for the help!
Most of those parameters, like fsync, can only be set in postgresql.conf. Changes are applied by re-starting PostgreSQL. They apply to the whole database cluster - all the databases in that PostgreSQL install. That's because the databases all share a single postmaster, write-ahead log, and set of shared system tables.
The only parameter listed there that you can set at the SQL level in a script is synchronous_commit. By setting synchronous_commit = 'off' you can say "it's OK to lose this transaction if the database crashes in the next few seconds, just make sure it still applies atomically".
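For example, at the SQL level (this affects only the current session; use set local inside a transaction to scope it to that transaction):

set synchronous_commit = off;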
I wrote more on this topic in a previous answer, Optimise PostgreSQL for fast testing.
If you want to set the other params with a script you can do so but you have to do it by opening and modifying postgresql.conf using the script, then re-starting PostgreSQL. Text-processing tools like sed make this kind of job easier.
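As a rough sketch (assuming the Debian/Ubuntu default config path for 9.1; take a copy of the file first, and remember these settings trade crash safety for speed):

CONF=/etc/postgresql/9.1/main/postgresql.conf
sudo sed -i "s/^#\?fsync.*/fsync = off/" "$CONF"
sudo sed -i "s/^#\?full_page_writes.*/full_page_writes = off/" "$CONF"
sudo sed -i "s/^#\?synchronous_commit.*/synchronous_commit = off/" "$CONF"
sudo service postgresql restart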
If you're running a debian based linux distro, you can just do something like:
pg_createcluster -d /dev/shm/mypgcluster 8.4 ramcluster
to create a ram based cluster. Note that you'll have to do:
pg_dropcluster 8.4 ramcluster
and recreate it on reboot etc.
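Roughly, that recreate-on-boot step could look like this (same version, cluster name and data directory as above):

pg_dropcluster --stop 8.4 ramcluster 2>/dev/null   # clean up the stale cluster config, if any
pg_createcluster -d /dev/shm/mypgcluster 8.4 ramcluster
pg_ctlcluster 8.4 ramcluster start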