My application uses libpq to write data to Postgres via the COPY API. After more than 900,000 successful COPY+commit actions (each containing a single row, don't ask), one errored out with the following:
ERROR: canceling statement due to user request
CONTEXT: COPY [...]
My code never calls PQcancel or related friends, which I think is precluded anyway by the fact that libpq is being used synchronously and my app is not multi-threaded.
libpq v8.3.0
Postgres v9.2.4
Is there any reasonable explanation for what might have caused the COPY to be cancelled? Will upgrading libpq (as I have done in more recent versions of my application) be expected to improve the situation?
The customer reports that the Postgres server may have been shut down when this error was reported, but I'm not convinced since the error text is pretty specific.
That error will be emitted when you:
Send a PQcancel
Use pg_cancel_backend
Hit control-C in psql (which invokes PQcancel)
Send SIGINT to a backend, e.g. kill -INT or kill -2.
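For example, either of the following, aimed at the backend's process ID (12345 here is just a placeholder), produces exactly that error in the targeted session:
psql -c "SELECT pg_cancel_backend(12345);"   # ask the server to cancel whatever that backend is running
kill -INT 12345                              # same effect, sent straight from the shell on the server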
My initial answer was incorrect: it claimed that the following also produce the same error. They don't; these:
pg_terminate_backend
pg_ctl stop -m fast
will emit a different error: FATAL: terminating connection due to administrator command.
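To see the difference from the shell (placeholder PID and data directory):
psql -c "SELECT pg_terminate_backend(12345);"   # client sees: FATAL: terminating connection due to administrator command
pg_ctl stop -m fast -D /path/to/data            # a fast shutdown terminates every backend the same way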
Is there a way to submit a command in DataGrip to a database without keeping the connection open / asynchronously? I'm attempting to create indexes concurrently, but I'd also like to be able to close my laptop.
My DataGrip workflow:
Select a column in a database, click 'modify column', and eventually run code such as:
create index concurrently batchdisbursements_updated_index
on de_testing.batchdisbursements (updated);
However, these run as background tasks and are cancelled if I exit DataGrip.
What if you close your laptop without exiting DataGrip? DataGrip is probably actively sending a cancellation message to PostgreSQL when you exit it. If you just close the laptop, I doubt it will do that. In that case, PostgreSQL won't notice the client has gone away until it tries to send a message, at which point the index creation should already be done and committed.
But this is a fragile plan. I would ssh to the server, run screen (or one of the fancier variants), run psql in that, and create the indexes from there.
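A rough sketch of that approach, with placeholder host and database names:
ssh me@db-server.example.com
screen -S indexes                  # or tmux; the session survives your laptop disconnecting
psql -d mydb <<'SQL'
create index concurrently batchdisbursements_updated_index
on de_testing.batchdisbursements (updated);
SQL
# detach with Ctrl-a d, close the laptop, reattach later with: screen -r indexes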
When trying to connect to my Amazon PostgreSQL DB, I get the above error. With pgAdmin, I get "error saving properties".
I don't see why merely connecting to a server would involve any write actions.
There are several reasons why you can get this error:
The PostgreSQL cluster is in recovery (or is a streaming replication standby). You can find out if that is the case by running
SELECT pg_is_in_recovery();
The parameter default_transaction_read_only is set to on. Diagnose with
SHOW default_transaction_read_only;
The current transaction has been started with
START TRANSACTION READ ONLY;
You can find out if that is the case with
SHOW transaction_read_only;
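A quick way to run all three checks in one go (the connection parameters are placeholders):
psql -h myinstance.example.rds.amazonaws.com -U myuser -d mydb <<'SQL'
SELECT pg_is_in_recovery();
SHOW default_transaction_read_only;
SHOW transaction_read_only;
SQL
If default_transaction_read_only turns out to be on and that is not intended, SET default_transaction_read_only = off (for the session) or ALTER ROLE myuser SET default_transaction_read_only = off (for the role) will clear it; a server that is in recovery, on the other hand, cannot be made writable this way.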
If you understand all that but still wonder why you are getting this error, because you are not aware of having attempted any data modification, it means that the application you use to connect tries to modify something (but pgAdmin shouldn't do that).
In that case, look at the log file to find out which statement causes the error.
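With the default setting log_min_error_statement = error, the server logs the offending statement right next to the error message, so something along these lines (the log path is installation-specific, and on Amazon RDS you would download the log through the console instead) is usually enough:
grep -B 1 -A 2 "ERROR:" /var/lib/pgsql/data/pg_log/postgresql-*.log | tail -n 40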
This was a bug which has now been fixed; the fix will be available in the next release.
https://redmine.postgresql.org/issues/3973
If you want to try it before then, you can use a nightly build and check: https://www.postgresql.org/ftp/pgadmin/pgadmin4/snapshots/2019-02-17/
pg_dump is failing with the error message:
"pg_dump FATAL: segment too big"
What does that mean?
PostgreSQL 10.4 on Ubuntu 16.04.
It appears that pg_dump passes along the error messages it receives from the queries it runs, and those same messages end up in the logs.
The following line in the logs (maybe buried deeper if you have busy logs) shows the query that failed.
In this case, we had a corrupted sequence. Any query on the sequence, whether interactive, via a column default, or via pg_dump, returned the "segment too big" error and killed the querying process.
I figured out the new start value for the sequence, dropped the dependencies, and created a new sequence starting where the old one left off and then put the dependencies back.
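In SQL terms, the fix looks roughly like this; the table, column, and sequence names are made up, and the START value has to be chosen above the last value the old sequence handed out:
psql -d mydb <<'SQL'
BEGIN;
ALTER TABLE my_table ALTER COLUMN id DROP DEFAULT;          -- drop the dependency on the corrupted sequence
CREATE SEQUENCE my_table_id_seq_new START WITH 1000000;     -- restart above where the old sequence left off
ALTER SEQUENCE my_table_id_seq_new OWNED BY my_table.id;
ALTER TABLE my_table ALTER COLUMN id SET DEFAULT nextval('my_table_id_seq_new');
COMMIT;
SQL
Once everything points at the new sequence, the old one can be dropped if the corruption still allows that.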
pg_dump worked fine after that.
It is not clear why or how a sequence could get so corrupted that you would have a session killing error when it was accessed. We did have a recent database hard-crash though, so it may be related. (Although that sequence is accessed very rarely and it is unlikely we went down in the middle of incrementing it.)
I was debugging a PostgreSQL 9.2 database corruption issue (on Solaris, but I doubt it matters) recently, and I found that we could reproduce it reliably if the client died in the middle of a transaction and then I shut down PostgreSQL by doing pkill postgres (which basically sends SIGTERM to every running postgres process). If instead we did pkill -QUIT postgres to send SIGQUIT, the database would shut down cleanly and no corruption would occur.
Based on the PostgreSQL 9.2 docs, I think that SIGTERM should be 100% expected by the database server, so why is it not safe to shut down like this? Is it a bug in PostgreSQL, or could I be doing something (configuration, etc.) that would allow the corruption to occur?
I don't think SIGTERM is what is causing your problem. Again, I recommend you ask on dba.stackexchange instead.
If the client dies in the middle of a transaction, is the problem then that the network connection hangs? And then when you kill it, you get corruption during WAL replay?
This is a complicated area to troubleshoot but here are some places to begin:
What is going on concurrently when this happens? What sort of transaction commit load?
How often do WAL logs normally get rotated?
It is possible you could be running into a rare, obscure bug in PostgreSQL (possibly somewhere between the db, kernel, and filesystem), but if so, please start by upgrading to the latest 9.2 and trying to reproduce it again. SIGTERM and even SIGKILL are supposed to be 100% safe with PostgreSQL, so if you are seeing database corruption, that is not expected.
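For reference, the supported shutdown modes and the signals they send to the postmaster (PGDATA being your data directory); note that pkill signals every postgres process rather than just the postmaster:
pg_ctl stop -D "$PGDATA" -m smart       # SIGTERM: wait for all clients to disconnect first
pg_ctl stop -D "$PGDATA" -m fast        # SIGINT: roll back open transactions, then shut down cleanly
pg_ctl stop -D "$PGDATA" -m immediate   # SIGQUIT: abort at once; crash recovery runs at the next start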
I have a web app that uses postgresql 9.0 with some plperl functions that call custom libraries of mine. So, when I want to start fresh as if just released, my build process for my development area does basically this:
dumps data and roles from production
drops dev data and roles
restores production data and roles onto dev
restarts postgresql so that any cached versions of my custom libraries are flushed and newly-changed ones will be picked up
applies my dev delta
vacuums
Since switching my app's stack from Win32 to CentOS, I now sometimes (it seems, if and only if I haven't run this build process in "a while", perhaps at least a day) get an error when my build script tries to apply the delta:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
Specifically, what's failing to execute at the shell level is this:
psql --host=$host -U $superuser -p $port -d $db -f "$delta_filename.sql"
If, immediately after seeing this error, I try to connect to the dev database with psql, I can do so with no trouble. Also, if I just re-run the build script, it works fine the second time, every time I've encountered this. Acceptable workaround, but is the underlying cause something to be concerned about?
So far in my attempts to debug this, I inserted a step just after the server restart (which of course reports OK shutdown, OK startup) whereby I check the results of service postgresql-dev status in a loop, waiting 2 seconds between tries if it fails. On my latest build script run, said loop succeeds on the first try--status returns "is running"--but then applying the delta still fails with the above connection error. Again, second try succeeds, as does connecting via psql outside the script just after it fails.
My next debug attempt was to sleep for 5 seconds before the first status check and see what happens. So far this seems to solve the problem.
So why is pgsql not listening on the socket for up to 5 seconds after it starts [OK] and reports a running status, unless it has "recently" been restarted?
The status check only checks whether the process is running. It doesn't check whether you can connect. There can be any amount of time between starting the process and the process being ready to accept connections. It's usually a few seconds, but it could be longer. If you need to cope with this, you need to script it so that it checks whether it is possible to connect before proceeding. You could argue that the CentOS package should do this for you, but it doesn't.
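A minimal sketch of such a check, reusing the same variables as your psql invocation (pg_isready ships with PostgreSQL 9.3 and later; on older installations a psql -c 'SELECT 1' probe does the same job):
# wait up to 30 seconds for the server to actually accept connections
for i in $(seq 1 30); do
    pg_isready -h "$host" -p "$port" -U "$superuser" -d "$db" && break
    sleep 1
done
psql --host=$host -U $superuser -p $port -d $db -f "$delta_filename.sql"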
Actually, I think in your case there is no reason to do a full restart. Unless you are loading libraries with shared_preload_libraries, it is sufficient to open a new connection to pick up the new libraries.