Encountering a SQL Error 40001 in Dbeaver - postgresql

I'm doing a count * query that executes correctly but for some reason when I attempt to export it to CSV, I'm encountering a SQL Error [40001]. Any ideas what the issue could be?

You are running a long query on a standby server, and some modifications that are replicated from the primary server conflict with your query. In particular, VACUUM removed some old row versions that your query still might want to use.
PostgreSQL has to make a choice: either delay applying the changes from the primary, or cancel the query that blocks replication.
How PostgreSQL behaves is determined by the parameter max_standby_streaming_delay. The default value gives the query 30 seconds to finish before it is canceled.
You have three options:
Retry the query and hope it succeeds this time.
Increase max_standby_streaming_delay on the standby.
The risk you are running is that replication will fall behind.
Set the parameter hot_standby_feedback to on on the standby, then the primary won't VACUUM row versions the standby might still need.
The risk you are running is table bloat on the primary, because autovacuum cannot do its job.

Related

Sudden Increase in row exclusive locks and connection exhaustion in PostgreSQL

I have a scenario that repeats itself every few hours. In every few hours, there is a sudden increase in row exclusive locks in PostgreSQL DB. In Meantime there seems that some queries are not responded in time and causes connection exhaustion to happen that PostgreSQL does not accept new clients anymore. After 2-3 minutes locks and connection numbers drops and the system comes back to normal state again.
I wonder if auto vacuum can be the root cause of this? I see analyze and vacuum (NOT FULL VACCUM) take about 20 seconds to complete on one of the tables. I have INSERT,SELECT,UPDATE and DELETE operations going on from my application and I don't have DDL commands (ALTER TABLE, DROP TABLE, CREATE INDEX, ...) going on. Can auto vacuum procedure conflict with queries from my application and cause them to wait until vacuum has completed? Or it's all the applications and my bad design fault? I should say one of my tables has a field of type jsonb that keeps relatively large data for each row (10 MB roughly).
I have attached an image from monitoring application that shows the sudden increase in row exclusive locks.
ROW EXCLUSIVE locks are perfectly harmless; they are taken on tables against which DML statements run. Your graph reveals nothing. You should set log_lock_waits = on and log_min_duration_statement to a reasonable value. Perhaps you can spot something in the logs. Also, watch out for long running transactions.

RDS Postgres "canceling statement due to conflict with recovery" [duplicate]

I'm getting the following error when running a query on a PostgreSQL db in standby mode. The query that causes the error works fine for 1 month but when you query for more than 1 month an error results.
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
Any suggestions on how to resolve? Thanks
No need to touch hot_standby_feedback. As others have mentioned, setting it to on can bloat master. Imagine opening transaction on a slave and not closing it.
Instead, set max_standby_archive_delay and max_standby_streaming_delay to some sane value:
# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
This way queries on slaves with a duration less than 900 seconds won't be cancelled. If your workload requires longer queries, just set these options to a higher value.
Running queries on hot-standby server is somewhat tricky — it can fail, because during querying some needed rows might be updated or deleted on primary. As a primary does not know that a query is started on secondary it thinks it can clean up (vacuum) old versions of its rows. Then secondary has to replay this cleanup, and has to forcibly cancel all queries which can use these rows.
Longer queries will be canceled more often.
You can work around this by starting a repeatable read transaction on primary which does a dummy query and then sits idle while a real query is run on secondary. Its presence will prevent vacuuming of old row versions on primary.
More on this subject and other workarounds are explained in Hot Standby — Handling Query Conflicts section in documentation.
There's no need to start idle transactions on the master. In postgresql-9.1 the
most direct way to solve this problem is by setting
hot_standby_feedback = on
This will make the master aware of long-running queries. From the docs:
The first option is to set the parameter hot_standby_feedback, which prevents
VACUUM from removing recently-dead rows and so cleanup conflicts do not occur.
Why isn't this the default? This parameter was added after the initial
implementation and it's the only way that a standby can affect a master.
As stated here about hot_standby_feedback = on :
Well, the disadvantage of it is that the standby can bloat the master,
which might be surprising to some people, too
And here:
With what setting of max_standby_streaming_delay? I would rather
default that to -1 than default hot_standby_feedback on. That way what
you do on the standby only affects the standby
So I added
max_standby_streaming_delay = -1
And no more pg_dump error for us, nor master bloat :)
For AWS RDS instance, check http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html
The table data on the hot standby slave server is modified while a long running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend the replication and resume after the query:
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); --resume
I'm going to add some updated info and references to #max-malysh's excellent answer above.
In short, if you do something on the master, it needs to be replicated on the slave. Postgres uses WAL records for this, which are sent after every logged action on the master to the slave. The slave then executes the action and the two are again in sync. In one of several scenarios, you can be in conflict on the slave with what's coming in from the master in a WAL action. In most of them, there's a transaction happening on the slave which conflicts with what the WAL action wants to change. In that case, you have two options:
Delay the application of the WAL action for a bit, allowing the slave to finish its conflicting transaction, then apply the action.
Cancel the conflicting query on the slave.
We're concerned with #1, and two values:
max_standby_archive_delay - this is the delay used after a long disconnection between the master and slave, when the data is being read from a WAL archive, which is not current data.
max_standby_streaming_delay - delay used for cancelling queries when WAL entries are received via streaming replication.
Generally, if your server is meant for high availability replication, you want to keep these numbers short. The default setting of 30000 (milliseconds if no units given) is sufficient for this. If, however, you want to set up something like an archive, reporting- or read-replica that might have very long-running queries, then you'll want to set this to something higher to avoid cancelled queries. The recommended 900s setting above seems like a good starting point. I disagree with the official docs on setting an infinite value -1 as being a good idea--that could mask some buggy code and cause lots of issues.
The one caveat about long-running queries and setting these values higher is that other queries running on the slave in parallel with the long-running one which is causing the WAL action to be delayed will see old data until the long query has completed. Developers will need to understand this and serialize queries which shouldn't run simultaneously.
For the full explanation of how max_standby_archive_delay and max_standby_streaming_delay work and why, go here.
It might be too late for the answer but we face the same kind of issue on the production.
Earlier we have only one RDS and as the number of users increases on the app side, we decided to add Read Replica for it. Read replica works properly on the staging but once we moved to the production we start getting the same error.
So we solve this by enabling hot_standby_feedback property in the Postgres properties.
We referred the following link
https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-postgresql-replication/
I hope it will help.
Likewise, here's a 2nd caveat to #Artif3x elaboration of #max-malysh's excellent answer, both above.
With any delayed application of transactions from the master the follower(s) will have an older, stale view of the data. Therefore while providing time for the query on the follower to finish by setting max_standby_archive_delay and max_standby_streaming_delay makes sense, keep both of these caveats in mind:
the value of the follower as a standby / backup diminishes
any other queries running on the follower may return stale data.
If the value of the follower for backup ends up being too much in conflict with hosting queries, one solution would be multiple followers, each optimized for one or the other.
Also, note that several queries in a row can cause the application of wal entries to keep being delayed. So when choosing the new values, it’s not just the time for a single query, but a moving window that starts whenever a conflicting query starts, and ends when the wal entry is finally applied.

postgres autovacuum retry mechanism

In postgres, when autovacuum runs and for some reason say its able to perform autovacuum - for example when hot_standby_feedback is set and there are long running queries on standby. Say for example tab1 has been updated and it triggers autovacuum, meanwhile a long running query on standby is running and sends this info to primary which will skip the vacuum on tab1.
Since the autovacuum got skipped for tab1, When does autovacuum run vacuum on the table again? Or it will not run autovacuum again on it and we would need to manually run vacuum on that table. Basically does autovacuum retry autovacuum on tables that could not be vacuumed for the first time?
Autovacuum does not get skipped due to hot_standby_feedback. It still runs, it just might not accomplish anything if no rows can be removed. If this is the case, then pg_stat_all_tables.n_dead_tup does not get decremented, which means that the table will probably get autovacuumed again the next time the database is assessed for autovacuuming as the stats that make it eligible have not changed. On an otherwise idle system, this will happen about once every however long it takes to scan not-all-visible part of the table, rounded up to the next increment of autovacuum_naptime.
It might be a good idea (although the use case is narrow enough that I doubt it) to suppress repeat autovacuuming of a table until the horizon has advanced far enough to make it worthwhile, but currently there is no code to do this.
Note that this differs from INSERT driven autovacuums. There, n_ins_since_vacuum does get reset, even if none of the tuples were marked all visible. So that vacuum will not get run again until the table cross some other threshold to make it eligible.

PostgreSQL ANALYZE statisticts & Replication

On my primary I ran a VACUUM then an ANALYZE on all databases, then when I check pg_stat_user_tables, the last_analyze column shows a current timestamp which is great.
When I check my replication instance, there are no values in the last_analyze column. I was assuming this timestamp would also eventually populate? Is this known behaviour?
The reason I ask is that after that VACUUM/ANALYZE on the primary, I'm running into some extremely slow queries on the replication instance. I ran an EXPLAIN plan prior to the VACUUM/ANALYZE on a query and it ran in 5 seconds... now it's taking 65 seconds. The EXPLAIN shows it's not using a lot of indexes that it should be.
PostgreSQL has two different stats systems. One records data about the distribution of values in the columns, this is transactional. It propagates to the replica via the WAL.
The other system records data about turn over on the tables and data on when the last vac/an was done. This system is used to determine when to schedule new vac/an (to prevent the first system from getting too out of date). This one is not transactional, and does not propagate to the replica.
So the replica has the latest column value distribution statistics (as soon as the WAL replays, anyway), but it doesn't know how recent they are.

Postgresql: Querying on Hot Stand-by, option to continue ignoring effects of vaccum on any relevant rows?

My long running SELECT queries against Hot stand-by are failing apparently due to replay on the standby leading to vacuum of some of the rows matching my query.
Is there support for an option where I can ask the Hot stand-by server to not bother about such changes to the rows (even the rows that were updated/deleted) and continue with the scan, for my query?
Or, is dropping all queries where a matching row was cleaned up during replay, vaccumm something the server always does and there's no other way supported.
You can use hot_standby_feedback to tell the primary server to not vacuum rows that the standby server is still using. If you are concerned about affecting the primary in this way, you could instead use one of max_standby_streaming_delay or max_standby_archive_delay (depending on if you are streaming or copying log files).
These are all detailed here: https://www.postgresql.org/docs/current/runtime-config-replication.html