Postgres EXPLAIN ANALYZE: Total time greatly exceeds sum of parts

I have a Python script that receives external events and writes them to a Postgres database.
Most of the time the INSERT runs quite fast (< 0.3 sec). But sometimes query execution time exceeds 10-15 seconds! There are about 500 events per minute, and such slow behavior is unacceptable.
The query is simple:
INSERT INTO tests (...) VALUES (...)
The table (about 5 million records) is quite simple too.
I've added EXPLAIN ANALYZE before the INSERT in my script, and it gives me this:
Insert on tests (cost=0.00..0.01 rows=1 width=94) (actual time=0.051..0.051 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=94) (actual time=0.010..0.010 rows=1 loops=1)
Planning time: 0.014 ms
Execution time: 15624.374 ms
How can this be possible? How can I find out what it is doing for these 15 seconds?
I'm using Windows Server and Postgres 9.6. The script and Postgres are on the same machine.
Additionally, I collected Windows Performance Counter data during this (disk queue length, processor time) and it showed nothing unusual.
The server is a virtual machine on VMware ESXi, but I don't know what I could examine there about this situation.
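One low-overhead way to catch such outliers is server-side logging. A minimal sketch, assuming superuser access; the 1-second threshold is illustrative:
-- Log every statement that runs longer than 1 second, plus any lock wait
-- longer than deadlock_timeout (1 s by default), then reload the config.
ALTER SYSTEM SET log_min_duration_statement = '1s';
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();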
Added
These queries are running in multiple threads (several parallel scripts do that).
The INSERT is done without an explicit transaction.
There are no triggers (there was a trigger, but I removed it and nothing changed) and no foreign keys.
What if you execute it a second time?
This query is executed in several scripts, about 400 times a minute in total, and most of the time it executes quickly. I cannot catch this long execution in a query tool.
I will definitely try to look into pg_stat_activity next time I see this, thanks.
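For reference, a minimal snapshot query against pg_stat_activity; on 9.6 the wait_event columns are available and will show, for example, lock or I/O waits for the stuck INSERT:
-- Show statements that have been running for more than one second
-- and what they are currently waiting on.
SELECT pid,
       now() - query_start AS runtime,
       state,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 second'
ORDER BY runtime DESC;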

Related

Postgres 9.6 Gather node finishing after aggregate when parallel workers are present

I tried running EXPLAIN ANALYZE on a SELECT statement with parallel workers.
The query plan it returns has the Gather node taking more time than its parent, the Aggregate node.
Can anyone help me understand why this happens?

client_backend vs parallel_worker?

I'm running:
select *
from pg_stat_activity
and it shows 2 rows with the same query content (in the query field), both in active state, but one row shows client_backend (backend_type) and the other shows parallel_worker (backend_type).
Why do I have 2 instances of the same query? (I ran just one query in my app.)
What is the difference between client_backend and parallel_worker?
Parallel query processing was introduced in PostgreSQL 9.6 and has been enabled by default since v10:
If the optimizer decides it is a good idea and there are enough resources, PostgreSQL will start parallel worker processes that execute the query together with your client backend. Eventually, the client backend will gather all the information from the parallel workers and finish query processing.
This speeds up query processing, but uses more resources on the database server.
The parameters that govern this include max_parallel_workers, which caps the total number of parallel worker processes, and max_parallel_workers_per_gather, which limits the number of parallel workers for a single query.
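A minimal sketch of how to see this yourself, assuming v10 or later (where pg_stat_activity has the backend_type column):
-- A parallel query shows up as one client backend (the leader)
-- plus one row per parallel worker, all with the same query text.
SELECT query, backend_type, count(*) AS processes
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY query, backend_type
ORDER BY query;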

postgres performance - query hanging - query analysis tools and configuration question

We have a query we have been using for several months. It recently started hanging. The query is a join of 4 tables: one has only a few thousand records, one a hundred thousand, and 2 have about 2 million. It had been running in about 20 seconds for several months.
After several attempts to identify the issue by adding indexes to unindexed fields, to no avail, we replaced one of the large tables with a subquery that yields 100,000 records instead of 2 million. The query now runs in about 20 seconds.
Explain of the query that hangs produces:
Limit (cost=1714850.81..1714850.81 rows=1 width=79)
While explain of the query that runs in 20 seconds produces:
Limit (cost=1389451.40..1389451.40 rows=1 width=79)
The cost of the query that hangs is larger, but the difference is not significant.
Questions:
1. Are there thresholds in Postgres that cause it to use system resources differently, i.e. disk buffering versus memory buffering? The query that hangs shows one CPU at 100% usage. The system is Linux; iotop does not show extraordinary I/O usage. The system has 32 GB RAM and 8 processors. Postgres does not appear to be loading the system heavily.
2. Are there other tools that can be applied? The subquery worked in this case, but we may not be able to reduce a join's dimensions this way in the future (one candidate is sketched below).
As a note, the full explain does not show a markedly different execution plan.
Thanks, Dan
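One such tool is EXPLAIN with the BUFFERS option, which reports per-node shared-buffer hits versus reads and so speaks directly to the memory-versus-disk question. A minimal sketch; the table and column names are placeholders, not Dan's schema:
-- "shared hit" counters are pages served from shared_buffers;
-- "read" counters are pages fetched from the OS cache or disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT small.name, big2.value
FROM small
JOIN mid ON mid.small_id = small.id
JOIN big1 ON big1.mid_id = mid.id
JOIN big2 ON big2.big1_id = big1.id
LIMIT 1;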

Difference between total and execution time in postgresql?

When I run any SQL in PostgreSQL manager I see "execution time: 328 ms; total time: 391 ms". I'm wondering what these two times, execution time and total time, are.
Not sure what PostgreSQL manager is, but the two numbers are most likely a combination of these:
Planning time: 0.430 ms
Execution time: 150.225 ms
Planning is how long it takes Postgres to decide how to get your data. You send a query, and the server tries to optimize it; that takes time.
Execution is how long it took to actually run that plan.
You can verify this yourself if you send your query like this (note that EXPLAIN (ANALYZE) actually executes the statement):
EXPLAIN (ANALYZE)
SELECT something FROM table WHERE whatever = 5;

Postgres 9.6: Parallel query does not take max_parallel_workers_per_gather setting

Postgres 9.6; CentOS 6.7; 24 cores
BigTable1 contains 1,500,000,000 rows and weighs about 180 GB.
max_worker_processes = 20
max_parallel_workers_per_gather = 12
1)
When running
EXPLAIN
SELECT
date_id, id1, id2, id3, id4, topdomain, ftype, SUM(imps), SUM(cls)
FROM BigTable1
WHERE
date_id BETWEEN 2017021200 AND 2017022400
AND date_id BETWEEN 2017020000 AND 2017029999
GROUP BY
date_id, id1, id2, id3, id4, topdomain, ftype;
No “Workers Planned:” appears in the plan at all. Why?
2)
When running the same query after setting, in the session,
set max_parallel_workers_per_gather = 5;
“Workers Planned: 5” appears. The execution time improved by only 25%.
2.1) Why does “Workers Planned:” appear only after this setting?
2.2) Why didn't we see a much better improvement when running with max_parallel_workers_per_gather = 5?
Thank you!
When PostgreSQL considers a parallel sequential scan, it decides how many workers should be used based on the relation size (or the parallel_workers storage parameter for the driving table) and computes the cost of a parallel plan with that number of workers. This is compared to the cost of a serial plan, and the cheaper plan wins. Plans with other numbers of workers are not considered, so it can happen that the cost of the serial plan is less than the cost of the plan considered but more than the cost of some plan with a different number of workers. This probably happened in your case.
Since you didn't post the EXPLAIN ANALYZE output, we can't see how many groups your query is producing, but my guess is that it's a fairly large number. In PostgreSQL 9.6, a parallel aggregate must be performed by aggregating a portion of the data in each worker (a PartialAggregate) and then merging groups with the same keys in the leader (a FinalizeAggregate). Between these two steps, a Gather node is required to transmit the partially grouped data from the workers to the leader. This Gather node is somewhat expensive, so the most likely reason why you saw only limited speedup is that the number of groups being transferred was large. The cost of sending all of those groups, and of merging groups that occurred in more than one worker, may have looked too high to justify parallelism with a higher number of workers, but may have looked like a win with a lesser number of workers. These same costs probably account for the fact that even when parallel query was used, you saw only a 25% speedup.
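Schematically, the 9.6 parallel aggregate described above has this plan shape (an illustrative outline, not actual output from this table):
Finalize HashAggregate
  -> Gather
       Workers Planned: 5
       -> Partial HashAggregate
            -> Parallel Seq Scan on bigtable1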
If you post the EXPLAIN ANALYZE output with and without parallel query (that is, with "Workers Planned: 5" and with no parallelism), it might be possible to understand more clearly what is happening in your case.
(Source: I am one of the principal authors of PostgreSQL's parallel query support.)
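If you want to experiment with a worker count other than the size-based default, the parallel_workers storage parameter mentioned above can override it per table. A sketch; 8 is an arbitrary value to test with:
-- Pin the planner's worker count for scans of this table
-- (still capped by max_parallel_workers_per_gather):
ALTER TABLE BigTable1 SET (parallel_workers = 8);
-- Revert to the size-based default:
ALTER TABLE BigTable1 RESET (parallel_workers);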
If you are just looking to test the parallel query bits, you can have a look at force_parallel_mode and set it to on.
force_parallel_mode (enum)
Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected. The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance), on (force parallel query for all queries for which it is thought to be safe), and regress (like on, but with additional behavior changes as explained below).
And, just as robert-haas mentioned above, without force_parallel_mode the optimizer will potentially decide parallel query isn't the quickest; see the parameters below:
select
name,
setting,
unit,
short_desc
from pg_settings
where name in (
'force_parallel_mode',
'min_parallel_relation_size',
'parallel_setup_cost',
'parallel_tuple_cost',
'max_parallel_workers_per_gather')
limit 10;
postgres=>
              name               | setting | unit |                                          short_desc
---------------------------------+---------+------+---------------------------------------------------------------------------------------------
force_parallel_mode              | off     |      | Forces use of parallel query facilities.
max_parallel_workers_per_gather  | 0       |      | Sets the maximum number of parallel processes per executor node.
min_parallel_relation_size       | 1024    | 8kB  | Sets the minimum size of relations to be considered for parallel scan.
parallel_setup_cost              | 1000    |      | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
parallel_tuple_cost              | 0.1     |      | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
(5 rows)
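A quick way to try it in a session (a sketch; note that on pushes the whole plan under a Gather with a single worker when it is safe, which is meant for testing parallel safety rather than for speed):
-- Assumes max_parallel_workers_per_gather > 0; at 0, parallelism stays disabled.
SET force_parallel_mode = on;
EXPLAIN
SELECT date_id, SUM(imps)
FROM BigTable1
GROUP BY date_id;
SET force_parallel_mode = off;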