PostgreSQL: Slow queries in log file are fast in psql

I have an application written with Play Framework 1.2.4 and Hibernate (default C3P0 connection pooling), backed by a PostgreSQL 9.1 database.
Recently I turned on slow query logging (>= 100 ms) in postgresql.conf and found some issues.
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5-1 ms) compared to the 200-250 ms reported in the log. The same thing happened with other queries.
The application and database server are running on the same machine and communicate over the localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because the query duration in the log reflects only database processing time and excludes external factors such as network round trips.

Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that are taking a long time or happening too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This is unlikely to be the cause if the same statements are always slow while others always perform well.
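For reference, a minimal sketch of those postgresql.conf settings (the values are the ones mentioned above):
log_checkpoints = on       # log timing and buffer statistics for every checkpoint
log_min_messages = info    # 'info' or lower so the checkpoint messages appear in the log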
Possibility 2: Your query plans are different because you're running the queries directly in psql, while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level, so you won't see the actual statements). To test this, compare query performance with PREPARE test_query(...) AS SELECT ... followed by EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1, $2, etc.); the parameters in the EXECUTE are values.
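For example, a minimal sketch (table, column, and value are illustrative):
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 42;            -- one-off plan
PREPARE test_query(int) AS SELECT * FROM users WHERE id = $1;
EXPLAIN ANALYZE EXECUTE test_query(42);                       -- prepared-statement plan
DEALLOCATE test_query;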
If the prepared plan is different from the one-off plan, you can set PgJDBC's prepare threshold via connection parameters to tell it never to use server-side prepared statements.
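With PgJDBC that should look something like the following connection URL (host and database name are illustrative); a prepareThreshold of 0 disables server-side prepared statements:
jdbc:postgresql://localhost/mydb?prepareThreshold=0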
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It has been a long-standing wart, but Tom Lane dealt with it for the upcoming release.

It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The data is cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly on the second pass, because PostgreSQL keeps recently retrieved data in its shared buffers for just this purpose (and the OS caches it as well). If you are pulling the queries from the tail of your log and executing them immediately, this could be what's happening.
Other processes are interfering. The execution time of a query varies depending on what else is going on in the system. If the queries take 100 ms during peak hours, when lots of users are connected to your website, but only 1 ms when you try them again late at night, this could be what's happening.
The point is that you are correct: the query duration isn't affected by which library or application calls the query, so the difference must come from something else. Keep looking, and good luck!

There are several possible reasons. First, if the database was very busy when the slow queries executed, the queries may simply have been slower; observe the OS load at such moments for future analysis.
Second, the plan used at the time may differ from the plan your current session produces, so you may need to install auto_explain to see the actual plan of each slow query.
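A minimal auto_explain setup might look like this (the 100 ms threshold mirrors the question's logging threshold):
LOAD 'auto_explain';                          -- per session; or preload it server-wide
SET auto_explain.log_min_duration = '100ms';  -- log the plan of every statement slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts and timings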

Related

When and why should I trigger pg_stat_reset()?

I am trying to understand how to monitor and tune PostgreSQL performance. I started by exploring the pg_stat_all_tables and pg_stat_statements views to gather information about live tuples, dead tuples, last autovacuum time, etc. There was some useful information in n_live_tup (close to the real row count of a table) and n_dead_tup, until I ran pg_stat_reset(). After that I got some strange results: there are fewer n_live_tup than n_dead_tup. I can't find any articles/docs about why and when (some use cases) I should run pg_stat_reset(). Can somebody explain that to me or point me to some useful resources?
It is OK to run pg_stat_reset() occasionally, like once per month, to get a fairly up-to-date view of what is going on in your database.
But don't do it too often, as there is a downside: the autovacuum process relies on these statistics, so you will miss a couple of autovacuum (and autoanalyze) runs if you do. That may or may not be a problem in your database, but at any rate I wouldn't do it too often. If you can, manually VACUUM and ANALYZE the database after calling pg_stat_reset().
There is no such problem with pg_stat_statements_reset(), so run that as often as you please.
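A sketch of that reset-and-refresh sequence:
SELECT pg_stat_reset();             -- clears cumulative statistics for the current database
VACUUM ANALYZE;                     -- rebuilds the statistics that autovacuum relies on
SELECT pg_stat_statements_reset();  -- harmless; run as often as you please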
The best thing for you would be to have monitoring software that checks the statistics values regularly and shows you the trend (differences from the previous run). Then you never have to reset the statistics and still have a good overview of what is going on.

PostgreSQL: Backend processes are active for a long time

Now I am hitting a very big roadblock.
I use PostgreSQL 10 and its new table partitioning.
Sometimes many queries don't return, and at those times many backend processes are active when I check pg_stat_activity.
At first I thought these processes were just waiting for locks, but the transactions contain only SELECT statements, and no other backend runs any query that requires an ACCESS EXCLUSIVE lock. The queries are no problem in terms of plan, and usually they work well. Computer resources (CPU, memory, IO, network) are no problem either, so these transactions should never conflict. I thoroughly checked the locks of these transactions with pg_locks and pg_blocking_pids(), and finally I could not find any lock that would make the queries much slower. Many of the active backends hold only ACCESS SHARE locks because they run only SELECTs.
Now I think this phenomenon is not caused by locking but by something related to the new table partitioning.
So, why are many backends active?
Could anyone help me?
Any comments are highly appreciated.
The figure below shows part of the result of pg_stat_activity.
If you want any additional information, please tell me.
EDIT
My query doesn't handle large data. The return type is like this:
uuid UUID
,number BIGINT
,title TEXT
,type1 TEXT
,data_json JSONB
,type2 TEXT
,uuid_array UUID[]
,count BIGINT
Because it has a JSONB column I cannot calculate the exact size, but the JSON values are not large.
Normally these queries are moderately fast (around 1.5 s), so there is absolutely no problem; the phenomenon only happens when other processes are working.
If the statistics were wrong, the queries would always be slow.
EDIT2
This is the stat. There are almost 100 connections, so I couldn't show all of them.
To me it looks like an application problem, not a PostgreSQL one. The active status means that your transaction has not been committed yet.
So why might your application not be sending a commit to the database?
Try to review where your application code opens a transaction, reads data, commits the transaction, and rolls back the transaction.
EDIT:
By the way, to be sure, check resource usage before the problem appears and while your queries are hanging. Run top and iotop to check whether postgres really starts eating your CPU or disk like crazy when the problem appears. If not, I would suggest looking for the problem in your application.
Thank you everyone.
I finally solved this problem.
I noticed that one backend process held far too many locks: when I executed the query SELECT COUNT(*) FROM pg_locks WHERE pid = <pid>, the result was about 10000.
The parameter max_locks_per_transaction is 64 and max_connections is about 800.
So, if many queries each hold many locks, a shortage of the shared memory reserved for the lock table occurs (see the shared-memory size calculation inside PostgreSQL if you are interested).
The excessive locks were taken when I executed a query like SELECT * FROM (partitioned table). Imagine you have a partitioned table foo with 1000 partitions. When you execute SELECT * FROM foo WHERE partition_id = <id>, the backend process takes about 1000 table locks (plus index locks). So I changed the query from SELECT * FROM foo WHERE partition_id = <id> to SELECT * FROM foo_(partition_id), addressing the partition directly. As a result, the problem looks solved.
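A sketch of the difference, assuming a hypothetical table foo with 1000 partitions and a pid taken from pg_stat_activity:
SELECT count(*) FROM pg_locks WHERE pid = 12345;  -- how many locks one backend holds
SELECT * FROM foo WHERE partition_id = 42;        -- via the parent: ~1000 ACCESS SHARE locks
SELECT * FROM foo_42;                             -- partition directly: only a handful of locks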
You say
Sometimes many queries don't return
...however when other processes work, the phenomenon happens. If the statistics
were wrong, the queries would always be slow.
Do they not return / run slowly when you connect directly to the Postgres instance and run the queries, or only when they are run from the application? Are you able to kill the running backend processes successfully with pg_terminate_backend($PID), or does that have issues? To rule out issues with the statement itself, make sure statement_timeout is set to a reasonable amount to kill off long-running queries. Once that is ruled out, perhaps you are running into a case where the application hangs and never allows PostgreSQL's send calls to finish. To avoid a situation like that, if you are able to (depending on the OS), you can tune the keep-alive time: https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE (the default is 2 hours).
Let us know if playing with any of that gives any more insight into your issue.
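A minimal sketch of those two safeguards (the timeout value and pid are illustrative):
SET statement_timeout = '10min';     -- abort any statement running longer than this
SELECT pg_terminate_backend(12345);  -- terminate one stuck backend by its pid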
Sorry for the late post. As @Konstantin pointed out, this might be because of your application (which is why I asked for your EDIT2). Adding a few excerpts:
Table partitioning has no effect on these locks; that is a totally different concept and does not hold up locks in your case.
In your application, check that the connection is properly closed after reading, in a finally block (from a Java perspective; I am not sure of your application tier).
Check whether a SELECT ... FOR UPDATE or any similar statement was written erroneously recently and is causing this.
Check whether any table has grown in size recently and the filtered column is not indexed. This is a very important and frequent cause of SELECT statements running for minutes. I'd also suggest using timeouts for SELECT statements in your application. https://www.postgresql.org/docs/9.5/gin-intro.html can give you a head start (a sketch of such an index follows this list).
Another thing that looks fishy to me is the JSONB column: maybe your JSONB values are pretty long, or the queries are unnecessarily selecting the JSONB value even when it is not required?
Finally, if you don't need the special features of the JSONB data type, you can use the JSON data type instead, which is faster (sometimes dramatically, up to 50x!).
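As a sketch of the indexing point above (the table name is hypothetical; the column is the question's JSONB field):
CREATE INDEX idx_foo_data_json ON foo USING gin (data_json);  -- GIN index over a JSONB column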
It looks like the pooled connections are not getting closed properly, and a few queries might be taking a huge time to respond. As pointed out in other answers, this is a problem with the application and could be a connection leak: transactions piling up on top of already pending, unresolved transactions leads to a growing number of unclosed transactions.
In addition, PostgreSQL generally has one or more "helper" processes like the stats collector, background writer, autovacuum daemon, walsender, etc., all of which show up as "postgres" instances.
One thing I would suggest is to check in which part of the code you initiate the queries. Try a dry run of your queries outside the application and benchmark their performance.
Secondly, you can set a timeout for certain queries, if not all.
Thirdly, you can kill idle transactions after a certain timeout by using:
SET SESSION idle_in_transaction_session_timeout = '5min';
I hope it might work. Cheers!

EntityFramework taking excessive time to return records for a simple SQL query

I have already combed through this old article:
Why is Entity Framework taking 30 seconds to load records when the generated query only takes 1/2 of a second?
but no success.
I have tested the query:
without lazy loading (not using .Include of related entities) and
without merge tracking (using AsNoTracking)
I do not think I can easily switch to compiled queries in general due to the complexity of the queries and the use of a Code First model, but let me know if your experience says otherwise...
Setup
Entity Framework '4.4' (.Net 4.0 with EF 5 install)
Code First model and DbContext
Testing directly on the SQL Server 2008 machine hosting the database
Query
It's just returning simple fields from one table:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Active] AS [Active],
[Extent1].[ChangeUrl] AS [ChangeUrl],
[Extent1].[MatchValueSetId] AS [MatchValueSetId],
[Extent1].[ConfigValueSetId] AS [ConfigValueSetId],
[Extent1].[HashValue] AS [HashValue],
[Extent1].[Creator] AS [Creator],
[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[MatchActivations] AS [Extent1]
The MatchActivations table has relationships with other tables, but for this purpose I am using explicit loading of related entities as needed.
Results (from SQL Server Profiler)
For Microsoft SQL Server Management Studio Query: CPU = 78 msec., Duration = 587 msec.
For EntityFrameworkMUE: CPU = 31 msec., Duration = 8216 msec.!
Does anyone know, besides suggesting the use of compiled queries, if there is anything else to be aware of when using Entity Framework for such a simple query?
A number of people have run into problems where cached query execution plans due to parameter sniffing cause SQL Server to produce a very inefficient execution plan when running a query through ADO.NET, while running the exact same query directly from SQL Server Management Studio uses a different execution plan because some flags on the query are set differently by default.
Some people have reported success in forcing a refresh of the query execution plans by running one or both of the following commands:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
But a more long-term, targeted solution to this problem would be to use query hints like OPTIMIZE FOR and OPTION (RECOMPILE), as described in this article, to help ensure that good execution plans are chosen more consistently in the first place.
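A sketch of the hint applied to the posted query (only the OPTION clause is new):
SELECT [Extent1].[Id], [Extent1].[Active]
FROM [dbo].[MatchActivations] AS [Extent1]
OPTION (RECOMPILE);  -- compile a fresh plan for this statement instead of reusing the sniffed one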
I think the framework is doing something funky if what you say is true, i.e. running the query in Management Studio takes half a second while Entity Framework takes 8.2 seconds. My hunch is that it's trying to do something with that 25K+ record set (perhaps bind it to something else).
Can you download NP .NET Profiler and profile your app once? http://www.microsoft.com/en-in/download/details.aspx?id=35370
This nifty little program records every method call and its execution time, basically giving you info from under the hood on where those 7+ seconds are being spent. If that does not help, I also recommend trying out the JetBrains .NET profiler: https://www.jetbrains.com/profiler/
The previous answer suggests that the execution plan can be off, and that's true in many cases, but it's also worth looking under the hood sometimes to determine the cause.
My thanks to Kalagen and others who responded to this - I did come to a conclusion, but forgot about this post.
It turns out it is the number of records being returned times the processing time (LINQ/EF, I presume) needed to turn the raw SQL data back into objects on the client side. I set up Wireshark on the SQL server to monitor the network traffic between it and client machines post-query and discovered:
- there is a constant stream of network traffic between the SQL server and the client
- the rate of packet processing varies greatly between different client machines (by a factor of 8)
- while that is occurring, the SQL server's CPU utilization is < 25% and no resource starvation seems to be happening (working set, virtual memory, thread and handle counts, etc.)
so it is basically the constant conversion of the results back into EF objects.
The query in question, by the way, was part of a 'performance' unit test, so we ended up culling it down to a more reasonable, typical web-page load of 100 records in under 1 second, which passes easily.
If anyone wants to chime in on the details of how Entity Framework processes records post-query, I'm sure that would be useful to know.
It was an interesting discovery that the processing time depended more heavily on the client machine than on the SQL server machine (this is an intranet application).

Does PostgreSQL cache Prepared Statements like Oracle

I have just moved to PostgreSQL after having worked with Oracle for a few years.
I have been looking into some performance issues with prepared statements in the application (Java, JDBC) with the PostgreSQL database.
Oracle caches prepared statements in its SGA - the pool of prepared statements is shared across database connections.
PostgreSQL documentation does not seem to indicate this. Here's the snippet from the documentation (https://www.postgresql.org/docs/current/static/sql-prepare.html) -
Prepared statements only last for the duration of the current database
session. When the session ends, the prepared statement is forgotten,
so it must be recreated before being used again. This also means that
a single prepared statement cannot be used by multiple simultaneous
database clients; however, each client can create their own prepared
statement to use.
I just want to make sure that I am understanding this right, because it seems so basic for a database to implement some sort of common pool of commonly executed prepared statements.
If PostgreSQL does not cache these that would mean every application that expects a lot of database transactions needs to develop some sort of prepared statement pool that can be re-used across connections.
If you have worked with PostgreSQL before, I would appreciate any insight into this.
Yes, your understanding is correct. Typically, if you had a set of prepared queries that critical, you'd have the application call a custom setup function on connection to create them.
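For example, such a per-connection setup might run something like this (statement names and queries are purely illustrative):
PREPARE find_item(bigint) AS
    SELECT id, name FROM items WHERE id = $1;
PREPARE recent_items(int) AS
    SELECT id, name FROM items ORDER BY created_at DESC LIMIT $1;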
There are three key reasons for this afaik:
There's a long todo list and they get done when a developer is interested/paid to tackle them. Presumably no-one has thought it worth funding yet or come up with an efficient way of doing it.
PostgreSQL runs in a much wider range of environments than Oracle. I would guess that 99% of installed systems wouldn't see much benefit from this. There are an awful lot of setups without high-transaction performance requirement, or for that matter a DBA to notice whether it's needed or not.
Planned queries don't always provide a win. There's been considerable work done on delaying planning/invalidating caches to provide as good a fit as possible to the actual data and query parameters.
I'd suspect the best place to add something like this would be in one of the connection pools (pgbouncer/pgpool) but last time I checked such a feature wasn't there.
HTH

SQL Server & ADO.NET: how to automatically cancel a long-running user query?

I have a .NET Core 2.1 application that allows users to search a large database, with the possibility of using lots of parameters. The data access is done through ADO.NET. Some of the queries generated result in long running queries (several hours). Obviously, the user gives up on waiting, but the query chugs along in SQL Server.
I realize that the root cause is the design of the app, but I would like a quick solution for now, if possible.
I have tried many solutions, but none seem to work as expected.
What I have tried:
CommandTimeout
CommandTimeout works as expected with ExecuteNonQuery but does not work with ExecuteReader, as discussed in this forum
When you execute command.ExecuteReader(), you don't get this exception because the server responds on time. The application doesn't respond because it reads data to the memory, and the ExecuteReader() method doesn't return control until all the data is read.
I have also tried using SqlDataAdapter, but this does not work either.
SQL Server query governor
SQL Server's query governor works off of the estimated execution plan, and while it does work sometimes, it does not always catch inefficient queries.
SQL Server execution time-out
Tools > Options > Query Execution > SQL Server > General
I'm not sure what this does, but after entering a value of 1, SQL Server still allows queries to run as long as they need. I tried restarting the server instance, but that did not make any difference.
Again, I realize that the cause of this problem is the way that the queries are generated, but with so many parameters and so much data, fine tuning a solution in the design of the application may take some time. As of now, we are manually killing any spid associated with this app that has run over 10 or so minutes.
EDIT:
I abandoned the hope of finding a simple solution. If you're having a similar issue, here is what we did to address it:
We created a .NET Core console app that polls the database for queries running over a certain allotted time. The app looks at the login name and how long the query has been running and decides whether to kill the process.
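For reference, the polling query is along these lines (the login name and threshold are illustrative; the app then issues KILL for each offending session_id):
SELECT r.session_id, s.login_name, r.start_time, r.total_elapsed_time
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
WHERE s.login_name = 'search_app'      -- hypothetical application login
  AND r.total_elapsed_time > 600000;   -- milliseconds, i.e. over 10 minutes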
https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.cancel?view=netframework-4.7.2
Looking through the documentation on SqlCommand.Cancel, I think it might solve your issue.
If you were to create and start a Timer before calling ExecuteReader(), you could keep track of how long the query has been running and eventually call the Cancel method yourself.
(Note: I wanted to add this as a comment but I don't have the reputation to be allowed to yet)