In Postgres 14 in pg_stat_statements, why is there a huge time difference between max_exec_time and mean_exec_time?
Could you please help?
That means that the execution time varies considerably. To see if that is just a single outlier or a regular occurrence, see if stddev_exec_time is high or not.
It can mean several things:
Perhaps execution sometimes took long because the statement was stuck behind a lock. If that is a rare occurrence, it may be the odd ALTER TABLE statement.
Perhaps the execution time varies depending on a query parameter. It may be fast for rare values and slow for frequent ones. Test with different parameters!
Perhaps execution time varies depending on how much of the data happens to be cached in RAM. Look for a large "buffers" footprint in the EXPLAIN (ANALYZE, BUFFERS) output.
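To illustrate how the standard deviation separates a single outlier from a regular occurrence, here is a minimal Python sketch. The two workloads and the summarize helper are made up for the example, and the use of the population standard deviation is an assumption about how stddev_exec_time is calculated.

```python
import statistics

def summarize(timings_ms):
    """Return (mean, max, population stddev) for a list of execution times in ms."""
    return (
        statistics.fmean(timings_ms),
        max(timings_ms),
        statistics.pstdev(timings_ms),
    )

# Hypothetical workload A: 999 fast calls and one call that was stuck behind a lock.
rare_outlier = [2.0] * 999 + [5000.0]
# Hypothetical workload B: the statement is regularly either fast or slow.
regular_spread = [2.0, 5000.0] * 500

for name, timings in (("rare outlier", rare_outlier), ("regular spread", regular_spread)):
    mean, worst, stddev = summarize(timings)
    print(f"{name}: mean={mean:.1f} ms  max={worst:.1f} ms  stddev={stddev:.1f} ms")
```

Both workloads have the same maximum, but in the first one the standard deviation stays small compared to the maximum, while in the second one it is of the same order as the maximum.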
How can I benchmark SQL performance in PostgreSQL? I tried using EXPLAIN ANALYZE, but that gives a different execution time every time I repeat the same query.
I am applying some tuning techniques to my query and trying to see whether they improve the query performance. EXPLAIN ANALYZE reports varying execution times, so I can't benchmark and compare. The tuning has an impact of milliseconds, so I am looking for a benchmark that gives fixed values to compare against.
There will always be variations in the time it takes a statement to complete:
Pages may be cached in memory or have to be read from disk. This is usually the source of the greatest deviations.
Concurrent processes may need CPU time.
The statement may have to wait for short-lived internal locks (latches) to access a data structure.
These are just the first three things that come to my mind.
In short, execution time is always subject to small variations.
Run the query several times and take the median of the execution times. That is as good as it gets.
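A minimal Python sketch of that approach, under some assumptions: median_runtime_ms and fake_query are hypothetical names, the untimed warm-up run is my addition (so that the cache is warm before measuring), and the timing is done on the client side, so it includes any network round trip.

```python
import statistics
import time

def median_runtime_ms(run_query, repeats=11, warmup=1):
    """Call run_query() repeatedly and return the median wall-clock time in ms."""
    for _ in range(warmup):
        run_query()  # untimed runs, so that the cache is warm before measuring
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

# Stand-in for a real query; replace it with a function that executes your statement.
def fake_query():
    sum(range(10_000))

print(f"median: {median_runtime_ms(fake_query):.3f} ms")
```

An odd number of repetitions makes the median an actually measured value rather than the average of the two middle ones.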
Tuning for milliseconds only makes sense if it is a query that is executed a lot.
Also, tuning only makes sense if you have realistic test data. Don't make the mistake of examining and tuning a query with only a few rows of test data when it will have to perform with millions of rows.
I want to know how long my queries take to execute, so that I can see whether my changes improve the runtime or not.
Simply timing the execution of the whole query is unsuitable, since this also takes into account the (highly variable) time spent waiting in an execution queue.
Redshift provides the STL_WLM_QUERY table that contains separate columns for queue wait time and execution time. However, my queries do not reliably show up in this table. For example if I execute the same query multiple times the number of corresponding rows in STL_WLM_QUERY is often much smaller than the number of repetitions. Sometimes, but not always, only one row is generated no matter how often I run the query. I suspect some caching is going on.
Is there a better way to find the actual execution time of a Redshift query, or can someone at least explain under what circumstances exactly a row in STL_WLM_QUERY is generated?
My tips
If possible, make sure that your query did not wait in the queue at all; if it did, there should be a row in stl_wlm_query. If it did wait, rerun it.
Run the query once to compile it, then a second time to benchmark it. Compile time can be significant.
Disable the new query result caching feature (if you have it yet; you probably don't): https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-redshift-introduces-result-caching-for-sub-second-response-for-repeat-queries/
How does the DBMS (Postgres in my case) deal with execution plans and prepared statements?
The parameters of a query can have a huge impact on the execution plan, mainly due to data statistics.
In certain cases it might prefer to use an index if the data is well distributed, but for a particular value it might prefer a sequential scan because the parameter is not selective (usually when the parameter matches > 10% of the table rows).
I am wondering if prepared statements are always a good way to improve performance, or if it is more a kind of "best effort".
Thanks in advance
Edit: my concern is about frequently running the same query, but with different parameters that could require a different execution plan. It is quite hard to measure the performance gain of a prepared statement against always having the most accurate execution plan.
A prepared statement is a GREAT way to make the same simple query run over and over faster. For instance, something like
insert into my_table values ($1, $2, $3);
OTOH it is NOT a great way to make big ugly complex reporting queries run faster, where the data set may change based on what's in the where clause.
The whole point of prepared queries is to avoid repeating the somewhat expensive step of query planning over and over. For the simple insert listed above, run 1,000 times, the cost of planning adds up.
OTOH for a big complex reporting query, the planning time is inconsequential. Most big reporting queries etc take seconds to minutes to even hours to run. The planning time, measured in milliseconds, is not worth worrying about here.
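A back-of-the-envelope sketch in Python of that arithmetic. All timings are invented for illustration, and the model is deliberately simplified: it assumes that a prepared statement is planned exactly once and that the plan does not change with the parameters, which is not guaranteed in practice.

```python
def total_ms(executions, plan_ms, exec_ms, prepared):
    """Total time if planning happens once (prepared) or on every execution."""
    planning = plan_ms if prepared else plan_ms * executions
    return planning + exec_ms * executions

def planning_share(plan_ms, exec_ms):
    """Fraction of a single unprepared execution that is spent on planning."""
    return plan_ms / (plan_ms + exec_ms)

# Hypothetical simple insert: planning costs about as much as executing.
print("insert x1000, unprepared:", total_ms(1000, 0.1, 0.1, prepared=False), "ms")
print("insert x1000, prepared:  ", total_ms(1000, 0.1, 0.1, prepared=True), "ms")
print(f"planning share per insert: {planning_share(0.1, 0.1):.0%}")

# Hypothetical reporting query: 5 ms of planning against one minute of execution.
print(f"planning share per report: {planning_share(5.0, 60_000.0):.4%}")
```

With these made-up numbers, preparing the insert roughly halves the total time, while for the reporting query the planning share is far below one percent.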
I am trying to understand EXPLAIN. I have two queries: the first one is optimised and runs in 600 ms (I have 100k rows), and the second one runs in 900 ms.
But when I run EXPLAIN ANALYZE, the first query, which runs quickly, shows a cost of 64296, and the second query, which runs slowly, shows a cost of 20873.
I can't understand why the faster query has the bigger cost and the slower query has the smaller cost.
Could someone give me a hint?
PostgreSQL EXPLAIN is an animal that really has a lot of arms & legs, each of which can cause it to work in a way that isn't easy to understand at first.
As I understand your question, the first query (Q1) runs faster than the second (Q2) when you execute it directly (not its EXPLAIN), but EXPLAIN ANALYZE reports a higher cost for Q1.
Two possible reasons come to mind at the moment:
If the queries are LIMIT queries, it's possible for Q1 to execute faster and still have a higher 'cost', since the PostgreSQL planner (intentionally) does not minimize the total cost, but the cost of producing the required result (in this case, a smaller number of rows).
Another reason could be that caching is playing havoc with your timings. Could you confirm that the observation persists over multiple (3+) runs?
Besides these hunches, if you really want to get deep into understanding EXPLAIN, I recommend that you refer to the following articles: here, here and here.
Cost is the planner's estimate of how many resources (I/O and CPU time) it will take to perform the query. It's just an estimate, calculated by a mathematical model.
In your case the planner was wrong: it chose a suboptimal plan. That happens sometimes.
Why? There could be many reasons. Maybe the statistics are inadequate (first of all, try running ANALYZE on your tables). Maybe the statistics are OK, but the planner uses the wrong model (for example, you may have correlated predicates in your query, which are known to be problematic). Maybe your query spans several dozen tables and the planner just can't go through all possible plans. And so on.
I just want to know why the same query takes a different amount of time on each execution in PostgreSQL.
For example: select * from datas;
The first time it takes 45 ms.
The second time the same query takes 55 ms, and the next time it takes yet another amount of time. Can anyone tell me why the time is not constant?
Simple: every time, the database has to read the whole table and retrieve the rows. There might be 100 different things happening in the database which could cause a difference of a few milliseconds. There is no need to panic. This is bound to happen. You can expect the operation to take the same time to within a few milliseconds. If there is a huge difference, then that is something which has to be looked into.
Have you applied indexing to your table? It can also increase speed a great deal!
Compiling the explanation from
Reference by matt b
The EXPLAIN statement displays the execution plan that the PostgreSQL planner generates for the supplied statement.
The execution plan shows how the table(s) referenced by the statement will be scanned (by plain sequential scan, index scan, etc.) and, if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.
And Reference by Pablo Santa Cruz
You need to change your PostgreSQL configuration file.
Enable this setting (the value -1 shown here disables it; set it to 0 to log all statements and their durations):
log_min_duration_statement = -1  # -1 is disabled, 0 logs all statements
                                 # and their durations, > 0 logs only
                                 # statements running at least this number
                                 # of milliseconds
After that, execution time will be logged and you will be able to figure out exactly how badly (or well) your queries are performing.
Well that's about the case with every app on every computer. Sometimes the operating system is busier than other times, so it takes more time to get the memory you ask it for or your app gets fewer CPU time slices or whatever.