POSTGRES: Find out how long a lock has been held

I'm currently using Postgres as the database engine for an application.
I currently have a situation where a lot of READ locks (AccessShareLock) are present. I run the following query to check for the locks:
SELECT t.schemaname,
t.relname,
l.locktype,
l.page,
l.virtualtransaction,
l.pid,
l.mode,
l.granted
FROM pg_locks l
JOIN pg_stat_all_tables t ON l.relation = t.relid
WHERE t.schemaname <> 'pg_toast'::name AND t.schemaname <> 'pg_catalog'::name
What I would like to know is how long ago each lock on a table was acquired. Is there any way to retrieve this information?
Thank you in advance.

The time when a lock has been taken is not available in PostgreSQL.
The best you can do is to take the transaction start time, xact_start, from pg_stat_activity; that is a lower bound for the age of the lock.
Transactions should always be short, because long transactions hold locks and keep autovacuum from doing its job.
If you have any long-running transactions, they might be the problem you have to fix. Once that is done, the locks won't be such a problem.
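There is no built-in per-lock timestamp, but a minimal sketch along these lines (joining pg_locks to pg_stat_activity on pid and taking the transaction start as the lower bound just described) shows how long each locking transaction has been running:
SELECT l.pid,
       l.relation::regclass AS table_name,
       l.mode,
       l.granted,
       a.xact_start,
       now() - a.xact_start AS xact_age   -- lower bound for how long the lock can have been held
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'relation'
ORDER BY a.xact_start;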

Related

DBT Model hanging

I use macOS and I am having an issue with a full refresh on a large table. During the run it appears to hang, and there is no query running in Redshift. It does not happen with smaller tables, and it does not happen if I run an incremental build. This table used to be smaller and I was able to run a full refresh as long as I specified the table. Now that it is bigger I seem to be running into this issue. There are 6 tables that this model depends on. It is almost as if the command isn't being sent. Any suggestions?
There is no error because it just doesn't run. Other team members running this on Windows and macOS expect it to finish in 10 minutes. It has currently been 30 minutes, but I have let it sit a lot longer than that.
My command is
dbt run --models +fct_mymodel --full-refresh --vars "run_date_start: 2020-06-01"
Thank you
The Redshift UI usually shows only the long-running queries. I ran into similar problems and they were caused by locks on some tables - in our case caused by uncommitted explicit transactions (BEGIN without COMMIT or ROLLBACK).
Run this query to see the current transactions and their locks:
select a.txn_owner, a.txn_db, a.xid, a.pid, a.txn_start, a.lock_mode, a.relation as table_id,
nvl(trim(c."name"),d.relname) as tablename, a.granted, b.pid as blocking_pid,
datediff(s,a.txn_start,getdate())/86400||' days '||datediff(s,a.txn_start,getdate())%86400/3600||' hrs '||datediff(s,a.txn_start,getdate())%3600/60||' mins '||datediff(s,a.txn_start,getdate())%60||' secs' as txn_duration
from svv_transactions a
left join (select pid,relation,granted from pg_locks group by 1,2,3) b
on a.relation=b.relation and a.granted='f' and b.granted='t'
left join (select * from stv_tbl_perm where slice=0) c
on a.relation=c.id
left join pg_class d on a.relation=d.oid
where a.relation is not null;
Read the AWS knowledge base entry for more details: https://aws.amazon.com/premiumsupport/knowledge-center/prevent-locks-blocking-queries-redshift/
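Once you know the blocking PID, you can end that session from SQL; a sketch (PG_TERMINATE_BACKEND is available in Redshift, and 8219 below is just a placeholder for the blocking_pid returned by the query above):
-- terminate the session holding the lock; replace 8219 with the actual blocking_pid
SELECT pg_terminate_backend(8219);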

Why is PostgreSQL waiting while executing VACUUM FULL on a table? (4 TB of table data)

I have a bloated table; its name is "role_info".
There are about 20K insert operations and a lot of update operations per day; there are no delete operations.
The table is about 4063 GB now.
We have migrated the table to another database using dump, and the new table is about 62 GB, so the table in the old database is very seriously bloated.
PostgreSQL version: 9.5.4
The table schema is below:
CREATE TABLE "role_info" (
"roleId" bigint NOT NULL,
"playerId" bigint NOT NULL,
"serverId" int NOT NULL,
"status" int NOT NULL,
"baseData" bytea NOT NULL,
"detailData" bytea NOT NULL,
PRIMARY KEY ("roleId")
);
CREATE INDEX "idx_role_info_serverId_playerId_roleId" ON "role_info" ("serverId", "playerId", "roleId");
The average size of the 'detailData' field is about 13 KB per row.
There are some SQL execution results below:
1)
SELECT
relname AS name,
pg_stat_get_live_tuples(c.oid) AS lives,
pg_stat_get_dead_tuples(c.oid) AS deads
FROM pg_class c
ORDER BY deads DESC;
Execution Result: (screenshot omitted)
2)
SELECT *,
Pg_size_pretty(total_bytes) AS total,
Pg_size_pretty(index_bytes) AS INDEX,
Pg_size_pretty(toast_bytes) AS toast,
Pg_size_pretty(table_bytes) AS TABLE
FROM (SELECT *,
total_bytes - index_bytes - Coalesce(toast_bytes, 0) AS
table_bytes
FROM (SELECT c.oid,
nspname AS table_schema,
relname AS TABLE_NAME,
c.reltuples AS row_estimate,
Pg_total_relation_size(c.oid) AS total_bytes,
Pg_indexes_size(c.oid) AS index_bytes,
Pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE relkind = 'r') a
WHERE table_schema = 'public'
ORDER BY total_bytes DESC) a;
Execution Result: (screenshot omitted)
3)
I have tried to VACUUM FULL the table "role_info", but it seemed to be blocked by some other process and didn't execute at all.
select * from pg_stat_activity where query like '%VACUUM%' and query not like '%pg_stat_activity%';
Execution Result: (screenshot omitted)
select * from pg_locks;
Execution Result: (screenshot omitted)
These are the vacuum-related parameters: (screenshot omitted)
I have two questions:
How do I deal with the table bloat? autovacuum seems not to be working.
Why is the VACUUM FULL blocked?
With your autovacuum settings, it will sleep for 20 ms once for every 10 pages (200 cost_limit / 20 cost_dirty) it dirties. Even more, because there will also be cost_hit and cost_miss charges as well. At that rate it would take over 12 days to autovacuum a 4063 GB table that mostly needs its pages dirtied. That is just the throttling time, not counting the actual work time, nor the repeated scanning of the indexes. So the actual run time could be months. The chances of autovacuum running to completion in one sitting without being interrupted by something are pretty low. Does your database get restarted often? Do you build and drop indexes on this table a lot, or add and drop partitions, or run ALTER TABLE?
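To make that arithmetic concrete, a rough sketch (assuming the default 8 kB page size and that most pages need dirtying):
-- ~532 million 8 kB pages in 4063 GB; at 10 dirtied pages per 20 ms sleep that is ~12.3 days of pure sleep time
SELECT (4063.0 * 1024 * 1024 / 8)                        AS pages,
       (4063.0 * 1024 * 1024 / 8) / 10 * 0.020 / 86400   AS throttle_days;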
Note that in v12 the default setting of autovacuum_vacuum_cost_delay was lowered by a factor of 10. This is not just because of some change to the code in v12; it was because we realized the default setting was just not sensible on modern hardware. So it would probably make sense to backport this change to your existing database, if not go even further. Before v12 you can't lower it to less than 1 ms, but you could lower it to 1 ms and also either increase autovacuum_vacuum_cost_limit or lower the vacuum_cost_page_* settings.
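For example, a sketch only (exact values depend on your hardware; these are the standard GUC and per-table storage parameters, nothing specific to this installation):
-- instance-wide: needs a configuration reload, not a restart
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
SELECT pg_reload_conf();
-- or per table, also raising the cost limit so more work is done between sleeps
ALTER TABLE role_info SET (autovacuum_vacuum_cost_delay = 1, autovacuum_vacuum_cost_limit = 2000);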
Now, this analysis is based on the table already being extremely bloated. Why didn't autovacuum prevent it from getting this bloated in the first place, back when the table was small enough to be autovacuumed in a reasonable time? That is hard to say. We really have no evidence as to what happened back then. Maybe your settings were even more throttled than they are now (although that seems unlikely, as it looks like you just accepted the defaults), or maybe it was constantly interrupted by something. What is the "autovacuum_count" from pg_stat_all_tables for the table and its TOAST table?
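A sketch for pulling those counters for the table and its TOAST table from pg_stat_all_tables (assuming role_info is unique across schemas):
SELECT s.relname, s.last_autovacuum, s.autovacuum_count, s.n_dead_tup
FROM pg_stat_all_tables s
WHERE s.relname = 'role_info'
   OR s.relid = (SELECT reltoastrelid FROM pg_class WHERE relname = 'role_info');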
Why is the VACUUM FULL blocked?
Because that is how it works, as documented. That is why it is important to avoid getting into this situation in the first place. VACUUM FULL needs to swap around filenodes at the end, and needs an AccessExclusive lock to do that. It could take a weaker lock at first and then try to upgrade to AccessExclusive later, but lock upgrades have a strong deadlock risk, so it takes the strongest lock it needs up front.
You need a maintenance window where no one else is using the table. If you think you are already in such window, then you should look at the query text for the process doing the blocking. Because the lock already held is ShareUpdateExclusive, the thing holding it is not a normal query/DML, but some kind of DDL or maintenance operation.
If you can't take a maintenance window now, then you can at least do a manual VACUUM without the FULL. This takes a much weaker lock. It probably won't shrink the table dramatically, but should at least free up space for internal reuse so it stops getting even bigger while you figure out when you can schedule a maintenance window or what your other next steps are.
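A sketch of that interim step (the parenthesized option syntax works on 9.5; VERBOSE just reports what was removed):
VACUUM (VERBOSE, ANALYZE) role_info;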

Postgresql Check if query is still running

At my work, I needed to build a new join table in a postgresql database that involved doing a lot of computations on two existing tables. The process was supposed to take a long time so I set it up to run over the weekend before I left on Friday. Now, I want to check to see if the query finished or not.
How can I check whether an INSERT command has finished while not being at the computer I ran it on? (No, I don't know how many rows it was supposed to add.)
Select * from pg_stat_activity where state not ilike 'idle%' and query ilike 'insert%'
This will return all non-idle sessions where the query begins with INSERT; if your query does not show up in this list, then it is no longer running.
pg_stat_activity doc
You can have a look at the pg_stat_activity view, which contains all database connections, including the active query, owner, etc.
At https://gist.github.com/rgreenjr/3637525 there is a copy-able example of what such a query could look like.
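Combining both answers, a minimal sketch that also shows how long each candidate INSERT has been running:
SELECT pid,
       now() - query_start AS runtime,
       state,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE 'insert%'
ORDER BY query_start;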

How to list all locked rows of a table?

My application uses pessimistic locking. When a user opens the form to update a record, the application executes this query (table names are exemplary):
begin;
select *
from master m
natural join detail d
where m.master_id = 123456
for update nowait;
The query locks one master row and several (up to several dozen) detail rows. The transaction stays open until the user confirms or cancels the updates.
I need to know which rows (at least which master rows) are locked. I have dug through the documentation and the Postgres wiki without success.
Is it possible to list all locked rows?
PostgreSQL 9.5 added a new option to FOR UPDATE that provides a straightforward way to do this.
SELECT master_id
FROM master
WHERE master_id NOT IN (
SELECT master_id
FROM master
FOR UPDATE SKIP LOCKED);
This acquires locks on all the not-currently-locked rows, so think through whether that's a problem for you, especially if your table is large. If nothing else, you'll want to avoid doing this in an open transaction. If your table is huge you can apply additional WHERE conditions and step through it in chunks to avoid locking everything at once.
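A sketch of the chunked variant mentioned above, assuming master_id is a numeric key you can range over (adjust the bounds for each batch):
SELECT master_id
FROM master
WHERE master_id BETWEEN 1 AND 100000
  AND master_id NOT IN (
      SELECT master_id
      FROM master
      WHERE master_id BETWEEN 1 AND 100000
      FOR UPDATE SKIP LOCKED);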
Is it possible? Probably yes, but it is the Greatest Mystery of Postgres. I think you would need to write your own extension for it (*).
However, there is an easy way to work around the problem. You can use a very nice Postgres feature, advisory locks. The two arguments of the function pg_try_advisory_lock(key1 int, key2 int) can be interpreted as a table oid (key1) and a row id (key2). Then
select pg_try_advisory_lock(('master'::regclass)::integer, 123456)
locks row 123456 of table master, if it was not locked earlier. The function returns a boolean.
After the update, the lock has to be freed:
select pg_advisory_unlock(('master'::regclass)::integer, 123456)
And the nicest thing, the list of locked rows:
select classid::regclass, objid
from pg_locks
where locktype = 'advisory'
Advisory locks may be complementary to regular locks, or you can use them independently. The second option is very tempting, as it can significantly simplify the code. But it should be applied with caution, because you have to make sure that all updates (and deletes) on the table in all applications are performed with this locking.
(*) Mr. Tatsuo Ishii did it (I did not know about it; I have just found it).

Postgresql - Why is DROP VIEW command hanging?

I want to perform a simple DROP VIEW ... but it hangs.
I have run the query SELECT * FROM pg_locks WHERE NOT granted, taken from this page on Lock Monitoring.
However, the following query they suggest returns no results:
SELECT bl.pid AS blocked_pid,
a.usename AS blocked_user,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_statement
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid
JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;
Where should I look now?
Finally I figured out what was wrong. Here are the steps to find the root cause:
Solution
Step 1: List requested locks not granted
select * from pg_locks where not granted;
In my case, an attempt to lock the view I want to drop with mode AccessExclusiveLock was not granted. This is why my DROP VIEW ... hangs.
Step 2: Find which other process(es) held a conflicting lock
select * from pg_locks where relation = <oid_of_view>
Here I list all processes locking or trying to lock my view. I found two processes: the one that wants to drop the view and... another one.
Step 3: Find out what the other process(es) is/are doing now
select xact_start,query_start,backend_start,state_change,state from pg_stat_activity where pid in (<list_of_other_process(es)_pid>);
I had only one process holding a lock in my case. Surprisingly, its state was: idle in transaction
I was not able to drop the view because another process was idle in transaction. I simply killed it to solve my issue. For example, if the pid was 8484 and my PostgreSQL server runs on a Linux box, then in the shell I execute the following command:
$ kill -9 8484
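If you prefer to stay inside the database (or lack shell access), pg_terminate_backend ends the blocking session more gently (SIGTERM rather than SIGKILL); a sketch with the same placeholder pid:
SELECT pg_terminate_backend(8484);  -- 8484 is the pid of the "idle in transaction" session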
Discussion
If you face a similar issue, you can quickly find out what's going on by reproducing steps 1, 2 and 3. You may need to customize step 2 in order to find the other conflicting process(es).
References
Lock Monitoring
Lock Dependency Information
View Postgresql Locks
I had a similar problem, but the accepted answer didn't work for me, as I do not have admin access to kill any process. Instead, this is how I managed to solve the problem:
Issue SELECT * FROM pg_stat_activity; to get the stats about the PostgreSQL activities.
In the query column, look for the queries that read from that view. You may choose to narrow down your search by only looking at the rows related to your user (using the usename column) or at query_start if you know when the issue emerged. There could be more than one row associated with your unwanted view.
Identify all the pids from the rows in the step above, plug each one into SELECT pg_terminate_backend(<pid>); (in place of <pid>), and run them one by one.
Now you should be able to drop your view.
Please note that as you terminate the backend processes using pg_terminate_backend(), you may face some errors. The reason is that terminating one process may automatically end others. Therefore, some of the identified PIDs might no longer be valid by the time you get to them.
As a summary, this solution from the comments worked for me:
Step 1: Find the pid:
select * from pg_stat_activity
where pid in
(select pid from pg_locks
where relation =
(select relation from pg_locks where not granted));
Step 2: kill pid:
kill -9 pid