Cannot kill a Postgres query - postgresql

So, this is a problem I've had a few times – where I'll accidentally do a SELECT * from a giant database. Normally, I just go in, get the pid of the query (SELECT * FROM pg_stat_activity), and then do a SELECT pg_cancel_backend(PID here), and voilà, it ends. But sometimes – especially with queries that will eventually yield a ridiculous number of rows – it just returns this:
db=# select pg_cancel_backend(5246);
pg_cancel_backend
-------------------
t
(1 row)
...and the query lives on! How do I kill these things??

I would try pg_terminate_backend(pid);
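For example, a sketch that reuses the pid lookup from the question (5246 is the example pid from above; pg_cancel_backend only asks the backend to cancel its current query, while pg_terminate_backend kills the whole backend process, which also ends the query):
-- find the offending backend
SELECT pid, state, query_start, query FROM pg_stat_activity WHERE state = 'active';
-- then terminate it
SELECT pg_terminate_backend(5246);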

Related

postgres - how to kill all long running queries that match a particular query?

I have lots of background jobs that queue up the same query.
I've been having a situation where they get stuck, and I'd like to kill the really long-running queries in some simple way that won't take down the entire DB.
How can I do this using a query?
Let's assume my query is the following, and I want to kill any instance that has been running for over 60 minutes:
select * from some_big_table;
This is the best I could come up with. Note I'm running on a Mac.
First I generated an MD5 of my query at the command line. I did this to simplify matching the query text, to ensure I only matched my target query, and to avoid "SQL injecting" myself if I mistyped the query.
# on linux, use 'md5sum' instead.
$ echo -n 'select * from some_big_table;' | md5
65007f37ff78f1e66645105412430b7c
Then I just added conditions to match only queries that had been running for more than 60 minutes.
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE
now() - pg_stat_activity.query_start >= interval '60 minutes' AND
md5(query) = '65007f37ff78f1e66645105412430b7c' AND
state = 'active';
which seemed to work fine:
pg_cancel_backend
-------------------
t
t
(2 rows)
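If any of those backends ignore the cancel request (as in the question above), the same filter works with pg_terminate_backend, which kills the backend process rather than just asking it to cancel. A sketch, reusing the hypothetical MD5 hash from above:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE
now() - pg_stat_activity.query_start >= interval '60 minutes' AND
md5(query) = '65007f37ff78f1e66645105412430b7c' AND
state = 'active';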

Postgres slow distinct query for multiple columns

I have a very simple query that is taking way too long to run.
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
What indexes do I need to add to speed this up? I ran a simple VACUUM; command and added the following index, but neither helped.
CREATE INDEX tbl_idx ON tbl1(col1,col2,col3,col4);
The table has 400k rows. In fact, even counting them is taking extremely long. Running a simple
SELECT count(*) from tbl1;
is taking 8 seconds. So it's possible my problem is with vacuuming or reindexing or something else; I'm not sure.
Here is the EXPLAIN output:
EXPLAIN SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=3259846.80..3449267.51 rows=137830 width=25)
   ->  Sort  (cost=3259846.80..3297730.94 rows=15153657 width=25)
         Sort Key: col1, col2, col3, col4
         ->  Seq Scan on tbl1  (cost=0.00..727403.57 rows=15153657 width=25)
(4 rows)
Edit: I'm currently running VACUUM FULL; which will hopefully fix the issue, and then maybe someone can give me some pointers on where I went wrong. It has been running for several hours and is still going as far as I can tell. I did run
select relname, last_autoanalyze, last_autovacuum, last_vacuum, n_dead_tup from pg_stat_all_tables where n_dead_tup >0;
and the table has nearly 16 million dead tuples (n_dead_tup).
My data doesn't change that frequently so I ended up creating a materialized view
CREATE MATERIALIZED VIEW tbl1_distinct_view AS SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
that I refresh with a cronjob once a day at 6am
0 6 * * * psql -U mydb mydb -c 'REFRESH MATERIALIZED VIEW tbl1_distinct_view;'
Try forcing the database to use your index:
set enable_seqscan=off ;
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
set enable_seqscan=on ;
VACUUM and VACUUM FULL are two commands that sound the same but have very different effects.
VACUUM scans a table for tuples that it no longer needs, so that the space can be overwritten by later INSERT or UPDATE statements. It only looks at dead rows and does not "defragment" the table: it leaves the overall space usage the same, simply marking the space occupied by dead rows as reusable.
VACUUM FULL looks at every row and reclaims the space left by deleted rows and dead tuples, essentially "defragmenting" the table. If this is done on a live table it can take a very long time, and can result in heavyweight locks, increased IO, and index bloat.
I imagine what you need is a VACUUM followed by an ANALYZE, which will rebuild the statistics for each table, improving the planner's decisions. These should be performed reasonably regularly, in low-usage times for the database. Only if you have a lot of space to reclaim (due to lots of DELETE statements) should you use VACUUM FULL.
Anyhow, since you've run a VACUUM FULL: once that is complete, you should run an ANALYZE on the database, followed by a REINDEX (on the database), and then an EXPLAIN on your query again; you should notice an improvement.
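A sketch of that sequence (mydb is a placeholder database name):
ANALYZE;                 -- rebuild planner statistics after the VACUUM FULL
REINDEX DATABASE mydb;   -- rebuild the indexes
EXPLAIN SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
For the routine case, a plain VACUUM ANALYZE tbl1; covers both maintenance steps without the heavy locking of VACUUM FULL.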

Postgresql Check if query is still running

At my work, I needed to build a new join table in a postgresql database that involved doing a lot of computations on two existing tables. The process was supposed to take a long time so I set it up to run over the weekend before I left on Friday. Now, I want to check to see if the query finished or not.
How can I check whether an INSERT command has finished while not being at the computer I ran it on? (No, I don't know how many rows it was supposed to add.)
SELECT * FROM pg_stat_activity WHERE state NOT ILIKE 'idle%' AND query ILIKE 'insert%';
This will return all non-idle sessions whose query begins with insert; if your query does not show up in this list, it is no longer running.
pg_stat_activity doc
You can have a look at the view pg_stat_activity, which contains all database connections, including the active query, owner, etc.
At https://gist.github.com/rgreenjr/3637525 there is a copyable example of what such a query could look like.
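For example, a sketch of such a query (all columns are standard pg_stat_activity columns):
SELECT pid, state, query_start, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;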

select from final table(update table) concurrent select i.e select from two threads

I have a concurrency issue in my project: two threads come in together and do a select at the same time, and both receive the same values, which ideally should not happen.
After selecting a value, a thread should perform an update, and the second thread should then select the updated value.
I am using DB2.
I thought of using this approach:
select number from final table(update tablename set columnname='' where ...)
My question is: would this approach lock the row when the other thread comes in to select the value, since there is an update within the select, and so solve my concurrency issue?
OR
I was browsing and found another approach:
update table (.....) select col from table where wait for outcome
Would this select wait until the first thread finishes the select?
One thing you can certainly do to avoid multiple reads of the same value before it gets updated by one of the readers:
LOCK TABLE tablename IN EXCLUSIVE MODE;
SELECT id, ... FROM tablename WHERE ...;
UPDATE tablename SET id=newval WHERE ...;
COMMIT;
This will of course lock the whole table, which is maybe not what you want!
Alternative approach (relatively standard, but with somewhat more involved programming logic):
1. SELECT id, ... FROM tablename WHERE ...
2. SELECT count(1) FROM FINAL TABLE (UPDATE tablename SET id=newval WHERE ... AND id=oldval);
3. While this count(1) is zero (meaning: someone else updated the row in the meantime), repeat from 1.
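Putting those steps together (a sketch; tablename, id, newval, and oldval are placeholders, and the retry loop itself lives in the application):
-- 1) read the current value; no lock is held after this statement
SELECT id FROM tablename WHERE ...;
-- 2) conditional update: FINAL TABLE returns the rows as updated, so
--    count(1) is 1 on success and 0 if another session changed the row first
SELECT count(1) FROM FINAL TABLE
  (UPDATE tablename SET id = newval WHERE ... AND id = oldval);
-- 3) if the count is 0, go back to step 1 and retry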
-- Peter Vanroose, ABIS Training & Consulting, Leuven, Belgium.

lock the rows until next select postgres

Is there a way in Postgres to lock rows so that they stay locked until the next select query execution from the same system? One more thing: there will be no update process on the locked rows.
The scenario is something like this:
If the table1 contains data like
id | txt
-------------------
1 | World
2 | Text
3 | Crawler
4 | Solution
5 | Nation
6 | Under
7 | Padding
8 | Settle
9 | Begin
10 | Large
11 | Someone
12 | Dance
If sys1 executes
select * from table1 order by id limit 5;
then it should lock the rows with id 1 to 5 against other systems that are executing select statements concurrently.
Later, if sys1 executes another select query like
select * from table1 where id>10 order by id limit 5;
then the previously locked rows should be released.
I don't think this is possible. You cannot block read-only access to a table (unless the select is done FOR UPDATE).
As far as I can tell, the only chance you have is to use the pg_advisory_lock() function.
http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
But this requires a "manual" release of the locks obtained through it. You won't get an automatic unlocking with that.
To lock the rows you would need something like this:
select pg_advisory_lock(id), t.*
from
(
  select * from table1 order by id limit 5
) t;
(Note the use of the derived table for the LIMIT part. See the manual link I posted for an explanation)
Then you need to store the retrieved IDs and later call pg_advisory_unlock() for each ID.
If each process is always releasing all IDs at once, you could simply use pg_advisory_unlock_all() instead. Then you will not need to store the retrieved IDs.
Note that this will not prevent others from reading the rows using "normal" selects. It will only work if every process that accesses that table uses the same pattern of obtaining the locks.
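For example, the release step might look like this (a sketch; it assumes the application remembered which ids it locked, here 1 through 5):
SELECT pg_advisory_unlock(id) FROM table1 WHERE id IN (1,2,3,4,5);
-- or, if the session holds no other advisory locks it wants to keep:
SELECT pg_advisory_unlock_all();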
It looks like you really have a transaction which transcends the borders of your database, and all the changes happen in another system.
My idea is SELECT ... FOR UPDATE NOWAIT to lock the relevant rows, then offload the data into the other system, then ROLLBACK to unlock the rows. No two SELECT ... FOR UPDATE queries will select the same rows, and the second select will fail immediately rather than wait and proceed.
But you don't seem to mark offloaded records in any way, so I don't see why two non-consecutive selects wouldn't happily select overlapping ranges. So I'd still update the records with a flag and/or a target user name, and would only select records with the flag unset.
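A sketch of the SELECT ... FOR UPDATE NOWAIT idea (the offload step is a placeholder):
BEGIN;
-- lock five rows; error out immediately if another session already holds them
SELECT * FROM table1 ORDER BY id LIMIT 5 FOR UPDATE NOWAIT;
-- ... offload the selected rows to the other system ...
ROLLBACK;  -- releases the row locks; nothing was changed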
I tried both select ... for update and pg_try_advisory_lock and got close to my requirement.
/* rows are locked, but the limit is the problem */
select * from table1 where pg_try_advisory_lock(id) limit 5;
.
.
$_SESSION['rows'] = $rowcount; // no. of rows to process
.
.
/* after each word is processed */
$_SESSION['rows'] -= 1;
.
.
/* and finally unlock the locked rows */
if ($_SESSION['rows'] === 0)
    select pg_advisory_unlock_all();
But there are two problems with this:
1. As the limit is applied before the lock, every instance tries to lock the same rows.
2. I'm not sure whether pg_advisory_unlock_all will unlock only the rows locked by the current instance, or those locked by all instances.