Why SELECT queries on partitioned tables take locks and get stuck - PostgreSQL

We designed a database so that it can accept lots of data.
For that we use partitioned tables quite a lot (the way we handle traffic information in the database can take advantage of the partitioning system).
To be more precise, we have tables with partitions, and partitions that themselves have partitions (4 levels deep):
main table
-> sub-tables partitioned by column 1 (list)
-> sub-tables partitioned by column 2 (list)
...
There are 4 main partitioned tables. Each one has from 40 to 120 (sub-)partitions behind it.
The query that takes locks and is blocked by others is a SELECT that joins these 4 tables (so, counting partitions, it works over about 250 tables).
We had no problem until now; maybe it is due to a traffic increase. Now SELECT queries that use these tables, which normally execute in 20 ms, can wait up to 10 seconds, locked and waiting.
When I look at pg_stat_activity, I see that these queries show:
wait_event_type: LWLock
wait_event: lock_manager
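For reference, a query along these lines shows which backends are waiting and on what (the wait-event columns exist in PostgreSQL 9.6 and later; the filter is just to narrow the output):

select pid, state, wait_event_type, wait_event, query
from pg_stat_activity
where wait_event is not null;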
I asked the dev team and also confirmed by reading the logs (primary and replica): there was nothing else running except SELECT and INSERT/UPDATE queries on these tables.
These SELECT queries run on the replica servers.
I searched the internet first, but everything I found says: yes, there are exclusive locks on partitioned tables, but only during operations like DROP or ATTACH/DETACH PARTITION. Nothing like that is happening on my server while the problem occurs.
The server is version 12.4, running on AWS Aurora.
What can make these queries locked and waiting on this LWLock?
What are my options to improve this behaviour (i.e., keep my queries from being blocked)?
EDIT:
Adding some details I've been asked for or that could be interesting.
Number of connections:
usually: about 10 connections opened per second
at peak (when the problem appears): from 40 to 100 connections opened per second
during the problem, the number of open connections varies from 100 to 200
Size of the database: 30 GB (currently lots of partitions are empty).
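These figures come from sampling pg_stat_activity; something like this, run periodically, gives the breakdown by state (the grouping is one possible choice):

select state, count(*) as connections
from pg_stat_activity
group by state
order by connections desc;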

You are probably suffering from internal contention on database resources, caused by too many connections that all compete for the same shared data structures. It is very hard to pinpoint the exact resource with the little information provided, but the high number of connections is a strong indication.
What you need is a connection pool that maintains a small number of persistent database connections. That will reduce the problematic contention and, at the same time, do away with the performance wasted on opening lots of short-lived database connections. Your overall throughput will increase.
If your application has no connection pool built in, use pgBouncer.
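A minimal pgBouncer sketch in transaction-pooling mode could look like this; the host, database name and pool sizes below are placeholders to adapt:

[databases]
mydb = host=replica-endpoint port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20

The application then connects to port 6432, and pgBouncer keeps only default_pool_size server connections open per database/user pair, no matter how many clients show up.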

Related

Sudden Increase in row exclusive locks and connection exhaustion in PostgreSQL

I have a scenario that repeats itself every few hours: there is a sudden increase in row-exclusive locks in the PostgreSQL DB. In the meantime, some queries are not answered in time, which causes connection exhaustion, so PostgreSQL does not accept new clients anymore. After 2-3 minutes the locks and connection numbers drop and the system returns to its normal state.
I wonder if autovacuum can be the root cause of this? I see that ANALYZE and VACUUM (not VACUUM FULL) take about 20 seconds to complete on one of the tables. I have INSERT, SELECT, UPDATE and DELETE operations going on from my application, and no DDL commands (ALTER TABLE, DROP TABLE, CREATE INDEX, ...). Can autovacuum conflict with queries from my application and cause them to wait until the vacuum has completed? Or is it all the application's and my bad design's fault? I should say that one of my tables has a column of type jsonb that keeps relatively large data in each row (roughly 10 MB).
I have attached an image from the monitoring application that shows the sudden increase in row-exclusive locks.
ROW EXCLUSIVE locks are perfectly harmless; they are taken on tables against which DML statements run. Your graph reveals nothing. You should set log_lock_waits = on and log_min_duration_statement to a reasonable value; perhaps you can then spot something in the logs. Also, watch out for long-running transactions.
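Both settings can be changed with ALTER SYSTEM followed by a configuration reload; the one-second threshold below is just a starting point:

alter system set log_lock_waits = on;
alter system set log_min_duration_statement = '1s';
select pg_reload_conf();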

PostgreSQL ANALYZE statistics & Replication

On my primary I ran a VACUUM and then an ANALYZE on all databases; when I check pg_stat_user_tables, the last_analyze column shows a current timestamp, which is great.
When I check my replication instance, there are no values in the last_analyze column. I assumed this timestamp would eventually populate there too. Is this known behaviour?
The reason I ask is that after that VACUUM/ANALYZE on the primary, I'm running into some extremely slow queries on the replication instance. I ran an EXPLAIN plan on a query prior to the VACUUM/ANALYZE and it ran in 5 seconds; now it's taking 65 seconds. The EXPLAIN shows it's not using a lot of indexes that it should be.
PostgreSQL has two different statistics systems. One records data about the distribution of values in the columns; it is transactional and propagates to the replica via the WAL.
The other system records data about turnover on the tables and when the last vacuum/analyze was done. It is used to determine when to schedule a new vacuum/analyze (to prevent the first system from getting too far out of date). This one is not transactional and does not propagate to the replica.
So the replica has the latest column value distribution statistics (as soon as the WAL replays, anyway), but it doesn't know how recent they are.
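You can verify this on the replica; the table name below is just an example:

-- distribution stats arrive via the WAL, so these are current
select tablename, attname, n_distinct
from pg_stats
where tablename = 'my_table';

-- the activity counters are per-server, so last_analyze stays empty here
select relname, last_analyze, last_autoanalyze
from pg_stat_user_tables
where relname = 'my_table';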

Slow transaction processing in PostgreSQL

I have been noticing bad behaviour in my Postgres database, but I still can't find any solution or improvement to apply.
The context is simple. Let's say I have two tables: CUSTOMERS and ITEMS. On certain days, the number of concurrent customers increases, and with it the requests for items; customers can view items and add to or remove from their quantities. However, in the APM I can see that each new request is slower than the previous one, and it points at the query response from those tables as the biggest time consumer.
If the normal execution of the query is about 200 milliseconds, a few moments later it can be about 20 seconds.
I understand the locking process in PostgreSQL: many users can be checking the same item, and they could even be changing its values, but the response from the database is far too slow.
So I would like to know if there are ways to improve the performance of the database.
I initially used PGTune to get the starting settings and it worked well. I am on version 11 with 20 GB of RAM, 4 vCPUs and SAN storage; the number of simultaneous customers (not sessions) can exceed 500.
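A query along these lines, run during a slowdown, shows who is blocking whom (pg_blocking_pids() is available since PostgreSQL 9.6):

select pid,
       pg_blocking_pids(pid) as blocked_by,
       now() - query_start as wait_time,
       query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;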

Measuring load per database in Postgres using 'active' processes in pg_stat_activity?

I am trying to measure the load that the various databases living on the same Postgres server incur, in order to determine how best to split them up across multiple servers. I devised this query:
select
    now() as now,
    datname as database,
    usename as user,
    count(*) as processes
from pg_stat_activity
where state = 'active'
  -- "waiting" only exists up to PostgreSQL 9.5; 9.6 and later
  -- replaced it with the wait_event_type/wait_event columns
  and waiting = 'f'
  -- exclude this monitoring query itself
  and query not like '%from pg_stat_activity%'
group by
    datname,
    usename;
But there were surprisingly few active processes!
Digging deeper, I ran a simple query that returns 20k rows and took 5 seconds to complete, according to the client I ran it from. When I queried pg_stat_activity during that time, the process was idle! I repeated this experiment several times.
The Postgres documentation says active means
The backend is executing a query.
and idle means
The backend is waiting for a new client command.
Is it really more nuanced than that? Why was the process running my query not active when I checked in?
If this approach is flawed, what alternatives are there for measuring load at database granularity, other than periodically sampling the number of active processes?
Your expectations regarding active, idle and idle in transaction are quite right. The only explanation I can think of is a huge delay in displaying the data client-side: the query had indeed finished on the server and the session was idle, yet you didn't see the result in the client yet.
Regarding the load measurement: I would not rely much on the number of active sessions. It is pure luck to catch a fast query in the active state. For example, you could check pg_stat_activity every second and see one active session, while between two measurements one database was queried 10 times and another once; none of those executions would be seen, because they happened between samples. Those 10+1 active states (although they would mean one database is queried 10 times more often) say little about load, because the cluster is so lightly loaded that you can't even catch the executions. Conversely, you can catch many active sessions and it still would not mean the server is actually loaded.
So at least add now() - query_start to your query to catch longer-running queries. Even better, save the execution time of some frequent queries and measure whether it degrades over time. Better still, select the pid and check the resources consumed by that pid.
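For example, a variant of your query that keeps only statements that have been running for a while (the one-second threshold is arbitrary):

select datname as database,
       usename as user,
       count(*) as processes,
       max(now() - query_start) as longest_running
from pg_stat_activity
where state = 'active'
  and now() - query_start > interval '1 second'
  and query not like '%from pg_stat_activity%'
group by datname, usename;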
By the way, for longer queries look into pg_stat_statements; watching how its numbers change over time can give you an idea of how the load changes.
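Something like this lists the top statements by cumulative time (it requires the pg_stat_statements extension; in PostgreSQL 13 and later the columns are named total_exec_time and mean_exec_time instead):

select query, calls, total_time, mean_time
from pg_stat_statements
order by total_time desc
limit 10;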

Sqoop PSQLException "Sorry, too many clients already" on large exports

I'm seeing Sqoop throw a PSQLException ("Sorry, too many clients already") when exporting large (200+ million row) tables from HDFS to Postgres. I have a few smaller tables (~3 million rows) that seem to be working fine.
Even when the large tables fail, it seems I still get ~2 million rows into my Postgres table, but I'm guessing that's just from the workers that didn't die because they got one of the connections first. My Postgres server is configured to allow 300 max_connections, and there are about 70 connections always active from other applications, so Sqoop should have ~230 to use.
I've tried varying --num-mappers between 2 and 8 in my Sqoop export command, but that hasn't seemed to make much of a difference. Looking at the failed Hadoop job in the job tracker shows "Num tasks" as 3,660 in the map stage, and "Failed/Killed Task Attempts" shows 184/273, if that helps at all.
Is there any way to set a maximum number of connections to use? Anything else I can do here? Happy to provide additional info if it's needed.
Thanks.
Figured it out for my specific scenario. Thought I'd share my findings:
The core of the problem was the number of map tasks running simultaneously at any one time:
- The large tables had 280 map tasks running simultaneously (3,660 total).
- The small tables had 180 map tasks running simultaneously (180 total).
Due to this, the tasks were running out of connections: each of the 280 would try to open a connection, and 280 plus the existing 70 is more than 300. So I had two options: (1) raise the Postgres max_connections limit a bit, or (2) reduce the number of map tasks running at a single time.
I went with (1), since I control the database: I raised max_connections to 400 and moved on with life.
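For completeness, the change needs a server restart to take effect (on versions before 9.4, which lack ALTER SYSTEM, edit postgresql.conf instead):

alter system set max_connections = 400;
-- restart PostgreSQL, then verify:
show max_connections;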
FWIW, it looks like (2) is doable with the following setting, but I couldn't test it since I don't control the HDFS cluster:
https://hadoop.apache.org/docs/r1.0.4/mapred-default.html
mapred.jobtracker.maxtasks.per.job