Postgres query idle after running for a few hours - postgresql

I'm doing a huge upgrade on a Database. I have quite a complex query that needs a few hours to run - I tested it on some sample data and the query is fine.
After analyzing my queries, I saw that my query changed from state 'ACTIVE' to 'IDLE' after running for 3:30h.
What exactly does that mean? The PostgreSQL manual indicates that this means the transaction is open (inside BEGIN) and idle.
Will my query end? Should I kill it and find a smarter way to upgrade?

Related

Postgres server-side cursor with LEFT JOIN does not return on Heroku PG

I have a Heroku app that uses a psycopg server-side cursor together with a LEFT JOIN query running on Heroku PG 13.5.
The query basically says “fetch items from one table, that don’t appear in another table”.
My data volume is pretty stable, and this has been working well for some time.
This week these queries stopped returning. In pg_stat_activity they appeared as active indefinitely (17+ hours), similarly in heroku pg:ps. There appeared to be no deadlocks. All the Heroku database metrics and logs appeared healthy.
If I run the same queries directly in the console (without a cursor) they return in a few seconds.
I was able to get it working again in the cursor by making the query a bit more efficient (switching from LEFT JOIN to NOT EXISTS; dropping one of the joins).
My questions are:
Why might the original query perform fine in the console, but not return with a psycopg server-side cursor?
How might I debug this?
What might have changed this week to trigger the issue?
I can say that:
However I write the query (LEFT JOIN, Subquery, NOT EXISTS), the query plan involves a Nested Loop Anti Join
I don’t believe this is related to the Heroku outage the following day (and which didn’t affect Heroku PG)
Having Googled extensively, the closest thing I can find to a hypothesis is a 2003 post on the PG message boards entitled “left join in cursor”, where the response is “Some plan node types don't cope very well with being run backwards.”
Any advice appreciated!
If you are using a cursor, PostgreSQL estimates that only 10% of the query result will be fetched quickly and prefers plans that return the first few rows quickly, at the expense of the total query cost.
You can disable this optimization by setting the PostgreSQL parameter cursor_tuple_fraction to 1.0.
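A minimal sketch of applying that setting from Python, under a few assumptions: `conn` is a psycopg-style connection that is not in autocommit mode, passing `name=` to `conn.cursor()` creates a server-side cursor, and the function name `iter_rows` and cursor name `full_plan_cursor` are made up for illustration. `SET LOCAL` limits the change to the current transaction, so it has to run in the same transaction that declares the cursor.

```python
def iter_rows(conn, query, params=None, batch_size=1000):
    """Yield rows from a server-side cursor planned for full retrieval.

    Assumes `conn` is a psycopg-style connection with autocommit off.
    """
    # Issue the setting on a regular cursor; it shares the transaction
    # with the server-side cursor declared below.
    with conn.cursor() as cur:
        # SET LOCAL lasts only until the current transaction ends.
        cur.execute("SET LOCAL cursor_tuple_fraction = 1.0")

    # Passing a name asks the driver for a server-side cursor.
    with conn.cursor(name="full_plan_cursor") as cur:
        cur.execute(query, params)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                return
            yield from batch
```

Comparing the EXPLAIN output of the query with the setting at 0.1 and at 1.0 should show whether the planner was in fact choosing a different plan for the cursor.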

How to cancel PostgreSQL query?

I am not very familiar with databases or SQL and wanted to make sure that I don't mess anything up. I did:
SELECT pid, state, usename, query FROM pg_stat_activity;
to check whether I had any queries running, and several had the state active. Do I just cancel them by doing:
select pg_cancel_backend(PID);
And this won't affect anything except my queries, correct? I also wanted to figure out why those queries were still in the state active. I have a Python file where I read in my SQL file, but I stopped running the Python file in the middle of reading my SQL file. Is that possibly why it happened and why the states are still active?
Yes, this is what pg_cancel_backend(pid) is for. Why exactly the query is still running depends on a few things: it could be waiting to grab a lock, or the query could just take a long time. But given that the Python processes that started the queries have exited, the connection is technically already closed; the PG backend process just hasn't noticed yet. It won't notice until the query completes and it tries to return the query status to the client, at which point it'll roll back the transaction when it sees the connection is no longer present.
The only effect of calling pg_cancel_backend on the PIDs of those backends should be that PG notices the closed connection immediately, rather than whenever the query completes.
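To limit the cancellation to your own sessions, the two statements from the question can be combined. The following is a sketch only, assuming `conn` is a psycopg-style connection; the function name `cancel_my_active_queries` is hypothetical. It filters on `usename = current_user` and skips the session doing the cancelling via `pg_backend_pid()`.

```python
ACTIVE_SQL = """
    SELECT pid, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND usename = current_user
      AND pid <> pg_backend_pid()
"""

def cancel_my_active_queries(conn):
    """Cancel other active queries of the current user; return their pids.

    Assumes `conn` is a psycopg-style connection.
    """
    cancelled = []
    with conn.cursor() as cur:
        cur.execute(ACTIVE_SQL)
        # fetchall() materialises the list before the cursor is reused.
        for pid, _query in cur.fetchall():
            cur.execute("SELECT pg_cancel_backend(%s)", (pid,))
            (signal_sent,) = cur.fetchone()
            # pg_cancel_backend returns true if the signal was sent.
            if signal_sent:
                cancelled.append(pid)
    return cancelled
```

Reviewing the `query` column before cancelling is a sensible extra check if other jobs run under the same database user.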

Multiple updates performance improvement

I have built an application with Spring Boot and JPA to migrate a Jira postgres database.
Basically, I have 5000 users that I need to migrate. Each user means 67 update queries in different tables.
Each query uses the LOWER function to compare ignoring case.
Some pseudo-code:
for (user : users){
for (query : queries) {
jdbcTemplate.execute(query.replace(user....
I ignore any errors, so if a single query fails, I still go on and execute the other 66.
I am running this in 10 separate threads and each user is taking roughly 120 seconds to migrate. (20 threads resulted in a database deadlock.)
At this pace, it's gonna take more than a day, which is not acceptable (I am running this in a test environment before doing it in production).
The queries look like this:
UPDATE table SET column = 'NEWUSERNAME' where LOWER(column) = LOWER('CURRENTUSERNAME');
Is there anything I can do to try and optimize this migration?
UPDATE:
I changed my approach. First, I select every element with the CURRENTUSERNAME and get its ID. Then I create the UPDATE queries using the ID as the "where" clause.
Other than that, it is still taking a long time (4+ hours) to execute.
I am running millions of UPDATEs, one at a time. I know jdbcTemplate has a bulk method, but if a single UPDATE fails, I believe it rolls back every successful update too. Also, I am not aware of the performance improvement it would have, if any.
So, to update the question, given that I have millions of UPDATE queries to run, what would be the best way to execute them? (bulk, multi-threading, something else)
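For reference, the two-step approach described in the update can be sketched as below. This is an illustration, not the actual migration code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the table `app_user` with columns `id` and `username` is hypothetical. Failures are collected rather than raised, mirroring the "ignore errors and continue" behaviour described above.

```python
import sqlite3

def rename_user(conn, current_name, new_name):
    """Two-step rename: look up matching ids once, then update by id.

    `app_user`, `id` and `username` are hypothetical names, and sqlite3
    stands in for PostgreSQL here.
    """
    # Step 1: the case-insensitive comparison runs once per user.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM app_user WHERE LOWER(username) = LOWER(?)",
        (current_name,),
    )]

    # Step 2: each UPDATE targets a primary key, not LOWER(column).
    failed = []
    for row_id in ids:
        try:
            conn.execute(
                "UPDATE app_user SET username = ? WHERE id = ?",
                (new_name, row_id),
            )
        except sqlite3.Error:
            failed.append(row_id)  # keep going, as in the question
    conn.commit()
    return len(ids) - len(failed), failed
```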

Small UPDATE makes hang issue

PostgreSQL 9.5.
A very small UPDATE statement uses very high CPU for a long time, as if it were hanging.
My Windows console application uses a simple UPDATE statement to update the latest time as follows.
UPDATE META_TABLE SET latest_time = current_timestamp WHERE host = 'MY_HOST'
There are just 2 console applications which issue the above SQL.
No index on META_TABLE.
Only 1 row.
When it is hanging, no lock information.
UNLOGGED Table.
IDLE status in pg_stat_activity
Commit after UPDATE.
During the hang, I can still INSERT or DELETE data in the table above.
The issue happens about 20 minutes after starting the application.
I think it is not a SQL statement or table structure issue; probably something is wrong on the database side.
Can you guess anything to resolve this issue?
Update
There are 2 db connections in the console application. 1 for Select & 1 for DML.
I tried closing the DML DB connection every 2 minutes. Since then, I haven't seen the issue on it!! However, the hang then happened on a SELECT statement (also a very simple SELECT).
It seems that there is some per-session limit.
Now I am also closing the SELECT DB connection every 3 minutes and monitoring.

Deal with Postgresql Error -canceling statement due to conflict with recovery- in psycopg2

I'm creating a reporting engine that runs a couple of long queries against a standby server and processes the results with pandas. Everything works fine, but sometimes I have issues with the execution of those queries using a psycopg2 cursor: the query is cancelled with the following message:
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
I looked into this issue:
PostgreSQL ERROR: canceling statement due to conflict with recovery
https://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT
but all solutions suggest fixing the issue by modifying the server's configuration. I can't make those modifications (We won the last football game against IT guys :) ) so I want to know how I can deal with this situation from the perspective of a developer. Can I resolve this issue using Python code? My temporary solution is simple: catch the exception and retry all the failed queries. Maybe it could be done better (I hope so).
Thanks in advance
There is nothing you can do to avoid that error without changing the PostgreSQL configuration (from PostgreSQL 9.1 on, you could e.g. set hot_standby_feedback to on).
You are dealing with the error in the correct fashion – simply retry the failed transaction.
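A minimal sketch of such retry logic, with assumptions stated up front: `conn` is a psycopg2-style connection, the driver exposes the SQLSTATE on the exception as a `pgcode` attribute, and the recovery-conflict cancellation arrives with SQLSTATE 40001. That last point is worth verifying against the error actually observed, which is why the codes are a parameter. The function name `run_with_retry` is made up for illustration.

```python
import time

def run_with_retry(conn, query, params=None, attempts=3, delay=1.0,
                   retry_codes=("40001",)):
    """Run a read query, retrying when it is cancelled with a retryable code.

    Assumes a psycopg2-style connection whose errors carry `pgcode`.
    """
    for attempt in range(1, attempts + 1):
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
                rows = cur.fetchall()
            conn.commit()
            return rows
        except Exception as exc:
            # Clear the failed transaction before deciding what to do.
            conn.rollback()
            retryable = getattr(exc, "pgcode", None) in retry_codes
            if not retryable or attempt == attempts:
                raise
            # Back off a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

Any other error, or a retryable one on the last attempt, is re-raised unchanged so the caller still sees the original exception.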
The table data on the hot standby slave server is modified while a long-running query is running. One solution (PostgreSQL 9.1+) that makes sure the table data is not modified is to suspend replication on the slave and resume it after the query. In PostgreSQL 10 and later, these functions are named pg_wal_replay_pause() and pg_wal_replay_resume().
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); -- resume
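If replay is paused from application code, it matters that it is resumed even when the query fails, otherwise the standby keeps falling behind. The sketch below wraps the pause and resume calls in a context manager. It assumes `control_conn` is a dedicated psycopg-style connection in autocommit mode, separate from the one running the report, and that the connecting role is allowed to call these functions (they are restricted to superusers by default). The names `replay_paused` and `pg10_or_later` are hypothetical.

```python
from contextlib import contextmanager

@contextmanager
def replay_paused(control_conn, pg10_or_later=False):
    """Pause WAL replay on the standby for the duration of the block.

    Assumes `control_conn` is a dedicated autocommit connection.
    """
    # The functions were renamed from pg_xlog_* to pg_wal_* in PostgreSQL 10.
    prefix = "pg_wal" if pg10_or_later else "pg_xlog"
    with control_conn.cursor() as cur:
        cur.execute(f"SELECT {prefix}_replay_pause()")
    try:
        yield control_conn
    finally:
        # Always resume, even if the wrapped query raised.
        with control_conn.cursor() as cur:
            cur.execute(f"SELECT {prefix}_replay_resume()")
```

The long query would then run on its own connection inside `with replay_paused(control_conn): ...`.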
I recently encountered a similar error and was also in the position of not being a dba/devops person with access to the underlying database settings.
My solution was to reduce the time of the query wherever possible. Obviously this requires deep knowledge of your tables and data, but I was able to solve my problem with a combination of a more efficient WHERE filter, a GROUP BY aggregation, and more extensive use of indexes.
By reducing the amount of server-side execution time and data, you reduce the chance of a rollback error occurring.
However, a rollback can still occur during your shortened window, so a comprehensive solution would also make use of some retry logic for when a rollback error occurs.
Update: A colleague implemented said retry logic as well as batching the query to make the data volumes smaller. These three solutions have made the problem go away entirely.
I got the same error. What you CAN do (if the query is simple enough) is divide the data into smaller chunks as a workaround.
I did this within a Python loop, calling the query multiple times with the LIMIT and OFFSET parameters, like:
query_chunk = f"""
SELECT *
FROM {database}.{datatable}
LIMIT {chunk_size} OFFSET {i_chunk * chunk_size}
"""
where database and datatable are the names of your sources.
The right chunk_size depends on your data; keeping it from being too high is crucial for each query to finish.
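A fuller sketch of that loop is shown here, under stated assumptions: `conn` is a psycopg-style connection, `base_query` and `order_by` are trusted strings written by the developer (identifiers cannot be passed as query parameters), and the function name `iter_chunks` is hypothetical. It adds an ORDER BY, which the snippet above omits, because without a stable ordering LIMIT/OFFSET pages are not guaranteed to be disjoint.

```python
def iter_chunks(conn, base_query, order_by, chunk_size=10000):
    """Yield lists of rows from `base_query`, one LIMIT/OFFSET chunk at a time.

    `base_query` and `order_by` must be trusted text, not user input.
    """
    # ORDER BY on a unique column keeps the chunks from overlapping.
    query_chunk = f"{base_query} ORDER BY {order_by} LIMIT %s OFFSET %s"
    i_chunk = 0
    while True:
        with conn.cursor() as cur:
            cur.execute(query_chunk, (chunk_size, i_chunk * chunk_size))
            rows = cur.fetchall()
        # End the transaction so each chunk is a separate short query.
        conn.commit()
        if not rows:
            return
        yield rows
        if len(rows) < chunk_size:
            return  # a short chunk means the end was reached
        i_chunk += 1
```

Each chunk runs in its own transaction and so sees its own snapshot; if the table changes between chunks, rows can shift across chunk boundaries.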