Postgres Restore taking ages (days) - postgresql

I've been working on a backup / restore for a Postgres server for quite a while now. It's an Azure Windows Virtual Machine (Windows server 2012).
The database isn't that big (near 5Gb), but the restore takes (literally) days. I've tried (several) times with different settings to restore the database, but all of the times it took days to "finish" (it didn't finish - I killed the process because I didn't see anything happening, that's why I'm running the job verbose this time).
I've now been running the job (verbose one) for 5 days straight and still it isn't finished. It's inserting rows (or at least displaying the rows), but it's still running.
Currently I'm using this command:
pg_restore -Fc -v --jobs=2 --host=localhost [filename]
Jobs is set at 2 because it's a dual core server. Like I said: different settings still very very slow.
What is wrong - should I be "tuning" the database before the restore or what?
This is a test-server setup. When we're doing with the test the current data need to be restored (again) to the new production server: we can't afford to wait days on end before the production environment comes online.
It's not pushing errors into the logs or something - it just keeps running and running and running...
So what am I doing wrong?

Related

Restoring PostgreSQL 11 backup to 12 hangs. How can I debug it?

I'm attempting to upgrade heroku PostgreSQL instances from pg11 to pg12 using the copy method as my testing environments are on hobby instances. At the end of the process it appears to be hanging for a long time (does not exit after >30 minutes for a 120MB database). The datastore view suggests everything is fine, I have the same number of rows, but there are issues.
It appears to be the fault of a materialized view. If I connect to the database and look through the tables and views, only one appears to be empty. Using postico, it waits and waits for the view's structure, but doesn't give the usual warning for an unpopulated view.
I can recreate the stalling behaviour by creating a local pg12 database and attempting to use pg_restore with a recent backup. Along the same lines, I appear to be able to get it working by creating an empty local database, running all the db migrations, truncating all tables and sequences, and then doing a --data-only --disable-triggers load from the same backup. Not a particularly smooth or inspiring migration plan plan. Using --verbose doesn't show up any obvious errors, the last thing I get is that it's creating the problematic materialized view.
I've also set log_statement to all, and the last one I get is that it's refreshing the problematic view. At this point, the postgres command starts using ~100% CPU.
Locally, I'm using this command to restore:
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d database_name database_backup.dump
This is the command we use regularly to restore production backups for local development.
Are there any known gotchas with upgrading from 11 to 12, or ways that I might be able to extract more information about what's going on?
It has probably chosen an appalling plan for doing the materialized view query, due to lack of statistics at the time the refresh was launched.
You could kill the process, then restart the refresh once stats are gathered (which they might already be.)
If starting from scratch, you could run pg_restore with --section of pre-data and data, then do an ANALYZE, then do post-data.

2 processes running under postgres user consuming very high cpu

I have a ubuntu server with postgres installed. It is showing me 2 processes running under postgres user. One of them is a cron command and the other tsm.
I have absolutely no idea how they are being executed. I have been trying to google it for hours and I cannot seem to find any references for them.
Any idea where to start looking and how to switch them off ?

Stymied by idle_in_transaction_session_timeout

Immediate problem: When I do a pgAdmin 4 restore I get "Stymied by idle_in_transaction_session_timeout" error.
I am on a MacBook Pro running macOS Mojave version 10.14.5, using Java and PostgreSQL. I use the pgAdmin 4 GUI, as I am not proficient in psql, bash, etc. I have a test database named pg2. As you can see from the attachment, PostgreSQL servers 9.4 and 10 have the identical databases. If I make a change in a database on one server, it will show also in the other server’s database. There is a third server, 11, in which there is only the postgres database.
I have tried psql and get errors (including timeout errors).
I have tried to Delete/Drop server 11, it will disappear but when I sign out of pgAdmin 4 and then go into pgAdmin 4 again the server 11 will be there again.
See the attachments for screen shots.
I expect the backup/restore to work: backup, then make a change to the database, then correctly restore to previous state.
I would like to have just one server, preferably 11 with only pg1 and the test db tempdb running in it. I thought that I could live with the three, for I am aware of my current capabilities and thus did not want to screw things up further. However, I suspect that the two servers 9.4 and 10 are the source of my current problem: receiving the idle_in_transaction_session_timeout error while doing a restore. Note: I did the backup using the server 10’s pg1 backup. Did it create 2 backups, one for 9.4 and one for 10?
I tried to attach these before. They will help make sense of my problem.
The 2 servers have the same database; is this causing the idle in transaction session timeout?

why my postgres job stop running?

I create a job to clean the database every day at 01:00.
According to statistic run OK from 3 months.
But today i realize the database size was very big so i check the jobs and hasn't run for one month.
Properties say last run was '10/27/2014' and statistics confirm run successful.
Also say next run will be '10/28/2014' but looks like never run and stay frozen since then.
(I'm using dd/mm/yyyy format)
So why stop running?
There is a way to restart the job or should i go and delete and recreate the job?
How can i know a job didn't run?
I guess i can write a code if job isn't successful but what about if never execute?
Windows Server 2008
PostgreSQL 9.3.2, compiled by Visual C++ build 1600, 64-bit
The problem was the pgAgent service wasn't running.
When I Restart the Postgres service:
Postgres service stop
pgAgent stop because is a dependent service
Postgres service start
but pgAgent didn't
Here you can see pgAgent didn't start.

Why is pgsql sometimes not listening for the first few seconds after start even though "service postgres status" returns OK?

I have a web app that uses postgresql 9.0 with some plperl functions that call custom libraries of mine. So, when I want to start fresh as if just released, my build process for my development area does basically this:
dumps data and roles from production
drops dev data and roles
restores production data and roles onto dev
restarts postgresql so that any cached versions of my custom libraries are flushed and newly-changed ones will be picked up
applies my dev delta
vacuums
Since switching my app's stack from win32 to CentOS, I now sometimes (i.e., it seems, only if and only if I haven't run this build process in "a while"--perhaps at least a day) get an error when my build script tries to apply the delta:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
Specifically, what's failing to execute at the shell level is this:
psql --host=$host -U $superuser -p $port -d $db -f "$delta_filename.sql"
If, immediately after seeing this error, I try to connect to the dev database with psql, I can do so with no trouble. Also, if I just re-run the build script, it works fine the second time, every time I've encountered this. Acceptable workaround, but is the underlying cause something to be concerned about?
So far in my attempts to debug this, I inserted a step just after the server restart (which of course reports OK shutdown, OK startup) whereby I check the results of service postgresql-dev status in a loop, waiting 2 seconds between tries if it fails. On my latest build script run, said loop succeeds on the first try--status returns "is running"--but then applying the delta still fails with the above connection error. Again, second try succeeds, as does connecting via psql outside the script just after it fails.
My next debug attempt was to sleep for 5 seconds before the first status check and see what happens. So far this seems to solve the problem.
So why is pgsql not listening on the socket after it starts [OK] and also has status running ok, for up to 5 seconds, unless it has "recently" been restarted?
The status check only checks whether the process is running. It doesn't check whether you can connect. There can be any amount of time between starting the process and the process being ready to accept connections. It's usually a few seconds, but it could be longer. If you need to cope with this, you need to script it so that it checks whether it is possible to connect before proceeding. You could argue that the CentOS package should do this for you, but it doesn't.
Actually, I think in your case there is no reason to do a full restart. Unless you are loading libraries with shared_preload_libraries, it is sufficient to restart the connection to pick up new libraries.