Deal with PostgreSQL error "canceling statement due to conflict with recovery" in psycopg2

I'm creating a reporting engine that runs a couple of long queries against a standby server and processes the results with pandas. Everything works fine, but sometimes I have issues with the execution of those queries using a psycopg2 cursor: the query is cancelled with the following message:
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed
I was investigating this issue:
PostgreSQL ERROR: canceling statement due to conflict with recovery
https://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT
but all the solutions suggest fixing the issue by making modifications to the server's configuration. I can't make those modifications (we won the last football game against the IT guys :) ), so I want to know how I can deal with this situation from the perspective of a developer. Can I resolve this issue using Python code? My temporary solution is simple: catch the exception and retry all the failed queries. Maybe it could be done better (I hope so).
Thanks in advance

There is nothing you can do to avoid that error without changing the PostgreSQL configuration (from PostgreSQL 9.1 on, you could e.g. set hot_standby_feedback to on).
You are dealing with the error in the correct fashion – simply retry the failed transaction.
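A minimal retry sketch with psycopg2 along those lines (the DSN, query, number of attempts, and delay are illustrative, not from the original post):

import time
import psycopg2

def run_with_retry(dsn, sql, params=None, attempts=5, delay=10):
    """Run a query on the standby, retrying when it is cancelled by a recovery conflict."""
    for attempt in range(1, attempts + 1):
        conn = psycopg2.connect(dsn)
        try:
            with conn, conn.cursor() as cur:  # `with conn` commits or rolls back the transaction
                cur.execute(sql, params)
                return cur.fetchall()
        except psycopg2.OperationalError as exc:
            # Recovery conflicts are raised as OperationalError (or a subclass);
            # back off and try again, re-raising anything else or the final failure.
            if "conflict with recovery" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay)
        finally:
            conn.close()

# Usage (hypothetical DSN and query):
# rows = run_with_retry("host=standby dbname=reports", "SELECT * FROM big_table")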

The table data on the hot standby (slave) server is modified by replication while your long-running query is running. A solution (PostgreSQL 9.1+) that makes sure the table data is not modified is to suspend replication on the standby and resume it after the query:
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); -- resume
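If you go this route from Python, a sketch might look like this (the DSN and query are illustrative; pausing replay requires appropriate privileges, and on PostgreSQL 10 and later the functions are named pg_wal_replay_pause() and pg_wal_replay_resume()):

import psycopg2

# Illustrative DSN; adjust for your standby.
conn = psycopg2.connect("host=standby dbname=reports")
conn.autocommit = True  # run each statement in its own transaction
cur = conn.cursor()
try:
    cur.execute("select pg_xlog_replay_pause();")   # pg_wal_replay_pause() on PostgreSQL 10+
    cur.execute("select * from foo;")               # your long-running query
    rows = cur.fetchall()
finally:
    # Always resume replay, even if the query fails, or the standby will fall behind.
    cur.execute("select pg_xlog_replay_resume();")  # pg_wal_replay_resume() on PostgreSQL 10+
    cur.close()
    conn.close()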

I recently encountered a similar error and was also in the position of not being a DBA/DevOps person with access to the underlying database settings.
My solution was to reduce the execution time of the query wherever possible. Obviously this requires deep knowledge of your tables and data, but I was able to solve my problem with a combination of a more efficient WHERE filter, a GROUP BY aggregation, and more extensive use of indexes.
By reducing the server-side execution time and the amount of data involved, you reduce the chance of a rollback error occurring.
However, a rollback can still occur during your shortened window, so a comprehensive solution would also make use of some retry logic for when a rollback error occurs.
Update: A colleague implemented said retry logic as well as batching the query to make the data volumes smaller. These three solutions have made the problem go away entirely.

I got the same error. What you CAN do (if the query is simple enough) is divide the data into smaller chunks as a workaround.
I did this within a Python loop that calls the query multiple times with the LIMIT and OFFSET parameters, like:
query_chunk = f"""
SELECT *
FROM {database}.{datatable}
LIMIT {chunk_size} OFFSET {i_chunk * chunk_size}
"""
where database and datatable are the names of your schema and table.
The chunk_size has to be chosen individually, and setting it to a value that is not too high is crucial for each query to finish.
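A sketch of what that loop could look like with psycopg2 and pandas (the DSN, schema, table, id column used for ordering, and chunk size are illustrative; the ORDER BY is added so the chunks neither overlap nor skip rows):

import pandas as pd
import psycopg2

# Illustrative values; adjust to your environment.
# `database` here is really the schema name in PostgreSQL.
database, datatable, chunk_size = "public", "big_table", 100_000

conn = psycopg2.connect("host=standby dbname=reports")
frames = []
i_chunk = 0
while True:
    query_chunk = f"""
    SELECT *
    FROM {database}.{datatable}
    ORDER BY id
    LIMIT {chunk_size} OFFSET {i_chunk * chunk_size}
    """
    df = pd.read_sql_query(query_chunk, conn)
    if df.empty:
        break
    frames.append(df)
    i_chunk += 1
conn.close()

result = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()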

Related

ORA-08177: can't serialize access for the transaction with oracle_FDW

oracle_fdw from PostgreSQL was randomly throwing ORA-08177. I wanted to either do a retry or figure out what was happening.
https://github.com/laurenz/oracle_fdw
Unfortunately, Oracle's implementation of SERIALIZABLE is rather bad and causes serialization errors (ORA-08177) in unexpected situations.
Using READ COMMITTED transactions works around this problem, but there is a risk of inconsistencies. If you want to use it, check your execution plans to see whether the foreign scan could be executed more than once.
alter server oracle options ( set isolation_level 'read_committed');

Does q/kdb+ ever act asynchronously when synchronously instructed to write tables to disk?

I've had a strange bug pop up -- when I write to a partitioned table, then immediately after do a select on that same table, I get an error like:
./2018.04.23/ngbarx/cunadj. OS reports: Operation not permitted.
This error does not appear if, after writing the table, I wait a few seconds. To me this seems to point towards a caching situation, where q responds before an operation is complete, but afaik everything I am doing should be synchronous.
I would love to understand what I am doing wrong / what is causing this error exactly / which commands are executing asynchronously.
The horrible details:
Am writing from Python connected to q synchronously using the qpython3 package
The q session is launched with slaves i.e. -s 4
To write the partitioned table, I am using the unofficial function .Q.dcfgnt which can be found here
I write to a q session that was initialized with a database directory as is usual when dealing with partitioned tables
After writing the table with .Q.dcfgnt, but before doing the select, I also do .Q.chk`:/db/; system"l /db/"; .Q.cn table in that order, just to be sure the table is up and ready to use in the q session. These might be both overkill and in the wrong order, but afaik they are all synchronous calls; please correct me if I am wrong.
The trigger for the error is a 10#select from table; I understand why this is a bad idea to do in general on a partitioned table, but from my understanding it shouldn't be triggering the particular error that I am getting.

JOOQ Cannot get autoCommit to a PostgreSQL database

I have the following setup where a service layer, using jooq, contacts a PostgreSQL database.
In this scenario, whenever multiple requests happen quickly one after another (or even not that quickly), I get the following error message:
Internal error processing createItem: Cannot get autoCommit
My queries all run within transactions (using jooq's transactionResult methods).
Searching has not yielded many results, and I do not see why autoCommit should even be enabled in those cases. Is this most likely a configuration issue, or is there something else I can try to troubleshoot this issue better?
I noticed the same problem and message when running massive batch uploads at the limit of physical memory with a limited number of DB connections (specific to my environment). It would be hard to provide a reproduction case for that, but to me this is a sign of DB performance/memory starvation. Reducing the number of Java execution threads helped in my case.

SQL queries running slowly or stuck after DBCC DBReindex or Alter Index

All,
SQL Server 2005 SP3, database is about 70 GB in size. Once in a while when I reindex all of my indexes in all of my tables, the front end seems to freeze up or run very slowly. These are queries coming from the front end, not stored procedures in SQL Server. The front end is using a jTDS JDBC connection to access the SQL Server. If we stop and restart the web services sending the queries, the problem seems to go away. It is my understanding that we have a connection pool in which we re-use connections and don't establish a new connection each time.
This problem does not happen every time we reindex. I have tried both ways, with DBCC DBREINDEX and with ALTER INDEX using ONLINE = ON and SORT_IN_TEMPDB = ON.
Any insight into why this problem occurs once in a while and how to prevent this problem would be very helpful.
Thanks in advance,
Gary Abbott
When this happens next time, look into sys.dm_exec_requests to see what is blocking the requests from the clients. The blocking_session_id will indicate who is blocking, and the wait_type and wait_resource will indicate what is being blocked on. You can also use the Activity Monitor to the same effect.
On a pre-grown database an online index rebuild will not block normal activity (select/insert/update/delete). The load on the server may increase as a result of the online index rebuild, and this could result in overall slower responses, but it should not cause blocking.
If the database is not pre-grown, though, then the extra allocations of the index rebuild will trigger database growth events, which can be very slow if left at the default 10% increments and without instant file initialisation enabled. During a database growth event all activity is frozen in that database, and this may be your problem even if the indexes are rebuilt online. Again, Activity Monitor and sys.dm_exec_requests would both clearly show this happening.
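If you want to capture that information from code while the slowdown is happening, something like the following sketch will do (the driver and connection string are illustrative; any SQL Server client works):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=master;Trusted_Connection=yes"
)
cur = conn.cursor()
# Show only requests that are currently blocked, and what they are waiting on.
cur.execute("""
    SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_resource, r.command
    FROM sys.dm_exec_requests AS r
    WHERE r.blocking_session_id <> 0
""")
for row in cur.fetchall():
    print(row)
conn.close()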

Transactions: when should they be discarded and rolled back?

I'm trying to debug an application (under PostgreSQL) and came across the following error:
"current transaction is aborted, commands ignored".
As far as I can understand a "transaction" is just a notion related to the underlying database connection.
If the connection has autocommit set to "false", you can keep executing queries through the same Statement as long as none of them fails; if one does fail, you should roll back.
If autocommit is "true" then it doesn't matter, since each of your queries is treated as atomic.
Using autocommit false, I get the aforementioned error from PostgreSQL even when a simple
select * from foo
fails, which makes me ask: under which SQLException(s) is a "transaction" considered invalid and should be rolled back or not used for another query?
using MacOS 10.5, Java 1.5.0_16, PostgreSQL 8.3 with JDBC driver 8.1-407.jdbc3
That error means that one of the queries sent in a transaction has failed, so the rest of the queries are ignored until the end of the current transaction (which will automatically be a rollback). To PostgreSQL the transaction has failed, and it will be rolled back after the error in any case, with one exception. You have to take appropriate measures, one of:
discard the statement and start anew.
use SAVEPOINTs in the transaction to be able to get back to that point in time and try another path. (This is the exception.)
Enable query logging to see which query is the failing one and why.
In any case, the exact answer to your question is that any SQLException should mean a rollback happens when the end-of-transaction command is sent, that is, when a COMMIT or ROLLBACK (or END) is issued. This is how it works; if you use savepoints, you'll still be bound by the same rules, you'll just be able to get back to where you saved and try something else.
It seems to be a characteristic behaviour of PostgreSQL that is not shared by most other DBMS. In general (outside of PostgreSQL), you can have one operation fail because of an error and then, in the same transaction, try alternative actions that will succeed, compensating for the error. One example: consider a merging (insert/update) operation. If you try to INSERT the new record but find that it already exists, you can switch to an UPDATE operation that changes the existing record instead. This works fine in all the main DBMS. I'm not certain whether it works in PostgreSQL, but the descriptions I've seen elsewhere, as well as in this question, suggest that when the attempted INSERT fails, any further activity in the transaction is doomed to fail too. Which is at best draconian and at worst 'unusable'.
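For reference, the SAVEPOINT approach mentioned in the answer above covers exactly this insert-or-update case. A minimal psycopg2 sketch (the table, columns, and values are illustrative; from JDBC the same pattern is available through Connection.setSavepoint() and Connection.rollback(Savepoint)):

import psycopg2

conn = psycopg2.connect("dbname=test")  # illustrative DSN
try:
    with conn.cursor() as cur:
        cur.execute("SAVEPOINT merge_row")
        try:
            cur.execute("INSERT INTO foo (id, val) VALUES (%s, %s)", (1, "new"))
        except psycopg2.IntegrityError:
            # Only the work done since the savepoint is discarded;
            # the surrounding transaction stays usable.
            cur.execute("ROLLBACK TO SAVEPOINT merge_row")
            cur.execute("UPDATE foo SET val = %s WHERE id = %s", ("new", 1))
        else:
            cur.execute("RELEASE SAVEPOINT merge_row")
    conn.commit()
finally:
    conn.close()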