I just want to know why the execution time differs each time I run the same query in PostgreSQL.
For example: select * from datas;
The first time it takes 45 ms.
The second time the same query takes 55 ms, and the next time it takes some other value again. Can anyone explain why the time is not constant?
Simple: every time, the database has to read the whole table and retrieve the rows. There might be a hundred different things happening in the database that cause a difference of a few milliseconds. There is no need to panic; this is bound to happen. You can expect the operation to take roughly the same time, give or take a few milliseconds. If there is a huge difference, then it is something that has to be looked into.
Have you applied indexing to your table? It can also increase speed a great deal.
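For instance (the column name below is just a placeholder, pick one your queries actually filter or sort on; note that a plain "select * from datas" with no WHERE clause will still scan the whole table regardless):
-- hypothetical column; an index only helps queries that filter or sort on it
CREATE INDEX datas_created_at_idx ON datas (created_at);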
Compiling the explanations from the references:
From matt b:
The EXPLAIN statement displays the execution plan that the PostgreSQL planner generates for the supplied statement.
The execution plan shows how the
table(s) referenced by the statement will be scanned — by plain
sequential scan, index scan, etc. — and if multiple tables are
referenced, what join algorithms will be used to bring together the
required rows from each input table.
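For example, on the query above (the plan and numbers shown are only illustrative and depend entirely on your data):
EXPLAIN select * from datas;
--                        QUERY PLAN
-- ---------------------------------------------------------
--  Seq Scan on datas  (cost=0.00..35.50 rows=2550 width=4)

-- EXPLAIN ANALYZE actually runs the statement and adds real timings:
EXPLAIN ANALYZE select * from datas;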
And from Pablo Santa Cruz:
You need to change your PostgreSQL configuration file.
Enable this property:
log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this number
# of milliseconds
After that, execution times will be logged and you will be able to figure out exactly how badly (or well) your queries are performing.
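For example, you can either edit postgresql.conf and reload, or (as a superuser, on PostgreSQL 9.4 or later) set it from a session:
-- log every statement together with its duration
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();   -- pick up the change without restarting the server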
Well, that's the case with just about every app on every computer. Sometimes the operating system is busier than other times, so it takes more time to get the memory you ask for, or your app gets fewer CPU time slices, or whatever.
Related
In Postgres 14 in pg_stat_statements, why is there a huge time difference between max_exec_time and mean_exec_time?
Could you please help?
That means that the execution time varies considerably. To see whether that is just a single outlier or a regular occurrence, check whether stddev_exec_time is high.
It can mean several things:
Perhaps execution sometimes took long because the statement was stuck behind a lock. If that is a rare occurrence, it may be the odd ALTER TABLE statement.
Perhaps the execution time varies depending on a query parameter. It may be fast for rare values and slow for frequent ones. Test with different parameters!
Perhaps execution time varies depending on how much of the data happens to be cached in RAM. Look for a high "buffers" footprint in the EXPLAIN (ANALYZE, BUFFERS) output.
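To find such statements, a query along these lines works (column names as in the Postgres 13+ version of pg_stat_statements):
SELECT query, calls, mean_exec_time, max_exec_time, stddev_exec_time
FROM pg_stat_statements
ORDER BY stddev_exec_time DESC   -- statements with the widest spread first
LIMIT 10;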
I have a processing pipeline that generates two streams of data, then joins the two streams to produce a third stream. Each stream of data is a timeseries over 1 year of 30-minute intervals (so 17520 rows). Both the generated streams and the joined stream are written into a single table keyed by a unique stream id and the timestamp of each point in the timeseries.
In abstract terms, the c and g series are generated by plpgsql functions which insert into the timeseries table from data stored elsewhere in the database (e.g. with a select) and then return the unique identifiers of the newly created series. The n series is generated with a join between the timeseries identified by c_id and g_id by the calculate_n() function which returns the id of the new n series.
To illustrate with some pseudo code:
-- generation transaction
begin;
c_id = select generate_c(p_c);
g_id = select generate_g(p_g);
end transaction;
-- calculation transaction
begin;
n_id = select calculate_n(c_id, g_id);
end transaction;
I observe that generate_c() and generate_g() typically run in a lot less than a second; however, the first time calculate_n() runs, it typically takes 1 minute.
However if I run calculate_n() a second time with exactly the same parameters as the first run, it runs in less than a second. (calculate_n() generates a completely new series each time it runs - it is not reading or re-writing any data calculated by the first execution)
If I stop the database server, restart it, then run calculate_n() on c_id and g_id calculated previously, the execution of calculate_n() also takes less than a second.
This is very confusing to me. I could understand the second run of calculate_n() taking only a second if, somehow, the first run had warmed a cache; but if that is so, why does the third run (after a server restart) still run quickly, when any such cache would have been cleared?
It appears to me that perhaps some kind of write cache, generated by the first generation transaction, is (unexpectedly) impeding the first execution of calculate_n(), but once calculate_n() completes, that cache is purged so that it doesn't get in the way of subsequent executions of calculate_n() when they occur. I have had a look at the activity of the shared buffer cache via pg_buffercache but didn't see any strong evidence that this was happening, although there was certainly evidence of cache activity across executions of calculate_n().
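(For reference, the kind of inspection query I mean is roughly the example from the pg_buffercache documentation, counting how many shared buffers each relation currently occupies:)
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;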
I may be completely off-base about this being the result of an interaction with a write-cache that was populated by the first transaction, but I am struggling to understand why the performance of calculate_n() is so poor immediately after the first transaction completes but not at other times, such as immediately after the first attempt or after the database server is restarted.
I am using postgres 11.6.
What explains this behaviour?
update:
So, further on this. Running VACUUM ANALYZE between the two generate steps and the calculate step did improve the performance of the calculate step, but I found that if I repeated the steps, I needed to run VACUUM ANALYZE between the generate steps and the calculate step every time I executed the sequence, which doesn't seem particularly practical (since you can't call VACUUM ANALYZE in a function or a procedure). I understand the need to run VACUUM ANALYZE at least once with a reasonable number of rows in the table. But do I really need to do it every time I insert 34000 more rows?
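If it does turn out to be needed every time, a cheaper variant might be a plain ANALYZE of just the affected table, issued from the application between the generation transaction and the calculation transaction (the table name below is the hypothetical one from the pseudo code above):
-- refresh planner statistics for the freshly inserted rows only
ANALYZE timeseries;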
How can I benchmark SQL performance in PostgreSQL? I tried using EXPLAIN ANALYZE, but it gives a different execution time every time I repeat the same query.
I am applying some tuning techniques to my query and trying to see whether they improve the query performance. EXPLAIN ANALYZE shows varying execution times that I can't benchmark and compare. The tuning has an impact in the range of milliseconds, so I am looking for a benchmark that gives fixed values to compare against.
There will always be variations in the time it takes a statement to complete:
Pages may be cached in memory or have to be read from disk. This is usually the source of the greatest deviations.
Concurrent processes may need CPU time
You have to wait for internal short time locks (latches) to access a data structure
These are just the first three things that come to my mind.
In short, execution time is always subject to small variations.
Run the query several times and take the median of the execution times. That is as good as it gets.
Tuning for milliseconds only makes sense if it is a query that is executed a lot.
Also, tuning only makes sense if you have realistic test data. Don't make the mistake of examining and tuning a query with only a little test data when it will have to perform with millions of rows.
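One convenient way to run a statement many times and get a latency summary is pgbench with a custom script: put the query you are tuning into a file, e.g. query.sql, and run it against your database (it reports average latency rather than the median, but over enough repetitions it serves the same purpose; the repetition count and database name below are just examples):
$ pgbench -n -f query.sql -t 50 mydb    # -n: skip pgbench's own vacuum, -t: run the script 50 times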
I want to know how long my queries take to execute, so that I can see whether my changes improve the runtime or not.
Simply timing the execution of the whole query is unsuitable, since this also takes into account the (highly variable) time spent waiting in an execution queue.
Redshift provides the STL_WLM_QUERY table that contains separate columns for queue wait time and execution time. However, my queries do not reliably show up in this table. For example if I execute the same query multiple times the number of corresponding rows in STL_WLM_QUERY is often much smaller than the number of repetitions. Sometimes, but not always, only one row is generated no matter how often I run the query. I suspect some caching is going on.
Is there a better way to find the actual execution time of a Redshift query, or can someone at least explain under what circumstances exactly a row in STL_WLM_QUERY is generated?
My tips:
If possible, ensure that your query has not waited at all; if it has, there should be a row in stl_wlm_query. If it did wait, then rerun it.
Run the query once to compile it, then a second time to benchmark it. Compile time can be significant.
Disable the new query result caching feature (if you have it yet - you probably don't):
https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-redshift-introduces-result-caching-for-sub-second-response-for-repeat-queries/
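For example (I believe these are the relevant setting and columns; the times in stl_wlm_query are in microseconds):
-- make sure repeats actually execute instead of hitting the result cache
set enable_result_cache_for_session to off;

-- queue wait vs. execution time for recent queries
select query, service_class, total_queue_time, total_exec_time
from stl_wlm_query
order by query desc
limit 20;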
I have an application written on Play Framework 1.2.4 with Hibernate(default C3P0 connection pooling) and PostgreSQL database (9.1).
Recently I turned on slow query logging (>= 100 ms) in postgresql.conf and found some issues.
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5 - 1 ms) in comparison to 200-250 ms in the log. The same thing happened with the other queries.
The application and database server is running on the same machine and communicating using localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because the query duration in the log is calculated from database processing time only, excluding external factors such as network round trips.
Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that're taking a long time or happening too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This isn't likely to be the cause if the same statements are always slow and others always perform well.
Possibility 2: Your query plans are different because you're running them directly in psql, while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level, so you won't see the actual statements). To check this, compare query performance with PREPARE test_query(...) AS SELECT ..., then EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1, $2, etc.); the parameters in the EXECUTE are values.
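A sketch of what that looks like (table, column, and parameter type here are placeholders for your real statement); the EXPLAIN ANALYZE EXECUTE shows the plan actually used for the prepared statement:
PREPARE test_query(integer) AS
    SELECT * FROM some_table WHERE some_column = $1;

EXPLAIN ANALYZE EXECUTE test_query(42);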
If the prepared plan is different to the one-off plan, you can set PgJDBC's prepare threshold via connection parameters to tell it never to use server-side prepared statements.
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It's been a long-standing wart, but Tom Lane dealt with it for the upcoming release.
It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The data is cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly on the second pass. PostgreSQL keeps recently read data pages in memory (and so does the operating system) for just this purpose. If you are pulling the queries from the tail of your log and executing them immediately, this could be what's happening.
Other processes are interfering. The execution time for a query varies depending on what else is going on in the system. If the queries are taking 100ms during peak hour on your website when a lot of users are connected but only 1ms when you try them again late at night this could be what's happening.
The point is you are correct that the query duration isn't affected by which library or application is calling it, so the difference must be coming from something else. Keep looking, good luck!
There are several possible reasons. First, if the database was very busy when the slow queries executed, they may run more slowly. So you may need to observe the OS load at that moment for future analysis.
Second, the plan used when the statement was logged as slow may differ from the plan you get in your current session. So you may need to install auto_explain to see the actual plan of the slow query.
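A minimal auto_explain setup in postgresql.conf might look like this (the threshold is just an example value):
shared_preload_libraries = 'auto_explain'   # requires a server restart
auto_explain.log_min_duration = '100ms'     # log plans of statements slower than this
auto_explain.log_analyze = on               # include actual times and row counts in the logged plan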