PostgreSQL has three keepalive settings for managing dropped connections (in postgresql.conf):
tcp_keepalives_count
tcp_keepalives_idle
tcp_keepalives_interval
By default these are all 0, which means the operating system's default values are used.
The behavior I would like is for PostgreSQL to drop client connections after a period of time, should the client lose its network connection or go to sleep.
I am currently using these values:
tcp_keepalives_count = 1
tcp_keepalives_idle = 60
tcp_keepalives_interval = 60
I am running PostgreSQL 8.4 on Mac OS X, but these settings don't seem to have any effect. My test is to lock a row in a table (using SELECT FOR UPDATE) and disconnect the workstation from the network. But in PostgreSQL I still see that workstation holding the lock.
I would expect that after the time has passed (60 seconds in this case) the connection would be terminated and the lock would be released.
Either I am doing something wrong or I am completely misunderstanding how this is supposed to work.
Any advice?
I think you need to configure your operating system instead. Support for changing keepalive parameters from within applications is not yet widespread. This should help you:
Using TCP keepalive to Detect Network Errors
Also, your parameters are badly chosen. If tcp_keepalives_count = 1 worked, then even a single lost keepalive packet would drop your connection, and single packets get lost fairly often. I'd use the following in /etc/sysctl.conf on Mac OS X/FreeBSD:
net.inet.tcp.keepidle = 60000
net.inet.tcp.keepintvl = 10000
The OS will then drop connections at most 140 seconds after connectivity is lost (60 seconds of idle time plus 8 keepalive packets at 10-second intervals).
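If you want to experiment without editing /etc/sysctl.conf and rebooting, the same values can be set at runtime; a minimal sketch, assuming a BSD-style sysctl that exposes these OIDs:
sysctl -w net.inet.tcp.keepidle=60000     # ms of idle time before the first probe
sysctl -w net.inet.tcp.keepintvl=10000    # ms between subsequent probes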
Related
After several hours of uptime on a new dedicated server, my Minecraft server would lose connection to the database and throw the following error: "org.postgresql.util.PSQLException: This connection has been closed."
After restarting the MC server it would be fine again, but the issue would recur after several more hours. My guess is that it's somehow losing the connection over time. The ping between the dedicated host and the database host currently ranges between 0.3 and 0.5 ms.
I create the connection instance with a PGSimpleDataSource after setting the values from a config.
import java.sql.Connection;
import org.postgresql.ds.PGSimpleDataSource;

PGSimpleDataSource ds = new PGSimpleDataSource();
// Set serverName, databaseName, user, password, etc. from the config here...
Connection c = ds.getConnection();
Does anyone know why this happens and/or if there is a fix to this? Thank you!
If it is not a bug in the client software, it could be a firewall or router misconfiguration.
Set tcp_keepalives_idle in postgresql.conf to a value less than the default of 2 hours, for example 300 for 5 minutes. After you reload PostgreSQL, the server will send keepalive packets on connections that have been idle for 5 minutes, which keeps the connection from looking idle to the network.
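For example, a minimal postgresql.conf sketch (afterwards, reload with pg_ctl reload or SELECT pg_reload_conf()):
tcp_keepalives_idle = 300    # seconds of inactivity before the first keepalive probe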
If that does not solve the problem, then it must be a bug in the client application.
When I run a query that takes a long time on my Postgres server (maybe 30 minutes), I get the same "connection has been closed" error. I've verified that the query is running with active status on the server using pgAdmin. I've also verified the correctness of the query, as it runs successfully on a smaller dataset. The server configuration is all defaults; I haven't changed anything. Please help!
Look into the PostgreSQL server log.
Either you'll find a crash report there, which would explain the broken connection, or there is something in your network that cuts connections with no activity after a while.
Investigate your firewalls!
One possible solution is to set the configuration parameter tcp_keepalives_idle to a value shorter than the interval after which the connection gets cut. That will cause the server's operating system to send keepalive messages on idle connections, which may be enough to keep the overzealous connection reaper in your environment from disrupting your work.
The situation:
Postgres 9.1 on a Debian server
Scala (Java) application using the LISTEN/NOTIFY mechanism to receive notifications through JDBC
As there can be very long pauses (multiple days) between notifications, I ran into the problem that the underlying TCP connection silently got terminated after some time and my application stopped receiving the notifications.
While googling for a solution I found that there is a parameter, tcpKeepAlive, that you can set on the connection. So I set it to true and was happy, until the next day, when I saw that my connection was dead again.
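For reference, this is roughly how that property can be set with the PostgreSQL JDBC driver (host, database, and credentials here are hypothetical):
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", "app");
props.setProperty("password", "secret");
props.setProperty("tcpKeepAlive", "true"); // enable SO_KEEPALIVE on the driver's socket
Connection c = DriverManager.getConnection("jdbc:postgresql://dbhost/mydb", props);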
As I had been suspicious, I had a Wireshark capture running in parallel, which now turns out to be very useful. Almost exactly two hours after the last successful communication on the connection of interest, my application sends a keepalive packet to the database server. However, the server responds with RST; it seems it has already closed the connection.
The net.ipv4.tcp_keepalive_time on the server is set to 7200, which is 2 hours.
Do I need to somehow enable keepalive on the server or increase the keepalive_time?
Is this the way to go about keeping my application connected?
TL;DR: My database connection gets terminated after long inactivity. Setting tcpKeepAlive didn't fix it, as the server responds with RST. What to do?
As Craig suggested in the comments, the problem was very likely related to some piece of network hardware between the server and the application. The fix was to increase the frequency of the keepalive messages.
In my case the OS was Windows, where you have to create a registry value holding the idle time in milliseconds after which the keepalive should be sent. Info on that here.
I have set it to 15 minutes, which seems to have solved the issue.
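For reference, the value in question is KeepAliveTime under the TCP/IP parameters key (shown with 15 minutes = 900000 ms; take this as a sketch of the setting, not exact guidance):
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime = 900000 (DWORD, milliseconds)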
UPDATE:
It only seemed like it solved the issue. After about two days of program run time, my connection was gone again. I switched to checking the validity of my connection every time I use it. This does not seem like it is the solution, but it is a solution nonetheless.
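What that per-use check can look like with plain JDBC; a sketch, where reconnect() stands in for whatever application-specific logic opens a fresh connection:
import java.sql.Connection;
import java.sql.SQLException;

// Called before every use of the connection. isValid() sends a lightweight
// request to the server and waits up to the given number of seconds for a reply.
static Connection ensureValid(Connection c) throws SQLException {
    if (c == null || !c.isValid(5)) {
        c = reconnect(); // hypothetical helper that opens a fresh connection
    }
    return c;
}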
Happy Spring Festival - the Chinese New Year.
I'm working on server programming, and I am stuck on error 10055 (WSAENOBUFS, "No buffer space available").
I have a TCP client application that can simulate a huge number of clients.
Having heard that 65534 is the maximum number of TCP client connections from one computer,
I used Asio to implement a simulation client that starts 50000 asynchronous TCP connects.
pseudocode:
for (int i = 0; i < 50000; ++i)
    async_connect(...);
Development environment:
Windows XP, x86, 4 GB memory, 4-core CPU
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort=65000
The result is:
when the connection count reaches about 17000, error 10055 occurs.
I tried another computer; there the error occurred at 30000 connections, which is better but still not good enough.
(The server app runs on another computer, also using Asio.)
The question is:
How can I successfully start 50000 client connections from one computer?
You could try to do it more blockwise:
E.g., start with 10000 connections. As soon as 5000 connections have succeeded, start the next 5000 async_connect calls, and repeat until you have reached your target. That would at least put less stress on the I/O completion port; see the sketch below. If it doesn't work, I would try even smaller blocks.
However, depending on where the OS runs out of memory, that still might not help.
Do you start asynchronous reads directly after the connect succeeds? These will also drain the memory resources.
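A minimal sketch of that blockwise idea, written in Java NIO rather than Asio for illustration (the target address is hypothetical); a semaphore caps how many connects are pending at once:
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class BatchedConnects {
    public static void main(String[] args) throws Exception {
        InetSocketAddress target = new InetSocketAddress("127.0.0.1", 9000); // hypothetical server
        Semaphore inFlight = new Semaphore(5000);                  // at most 5000 pending connects
        List<AsynchronousSocketChannel> open = new ArrayList<>(); // keep connections alive
        for (int i = 0; i < 50000; i++) {
            inFlight.acquire();                                    // wait for a free slot
            AsynchronousSocketChannel ch = AsynchronousSocketChannel.open();
            open.add(ch);
            ch.connect(target, null, new CompletionHandler<Void, Void>() {
                public void completed(Void result, Void att) { inFlight.release(); }
                public void failed(Throwable exc, Void att)  { inFlight.release(); }
            });
        }
        Thread.sleep(Long.MAX_VALUE); // keep the simulated clients connected
    }
}
The same pattern maps directly onto Asio: keep a counter of outstanding async_connect calls and issue the next batch from the completion handlers.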
I couldn't find any documentation for the -oConnectTimeout option, so I was wondering: is it just for establishing the SSH connection, or for the total connection? For example, if I have a one-line command in which I connect to a SQL Server from the command line and execute a query that takes 20 seconds, should I set the timeout to 5 or 10 seconds for the connection to be established, or to the whole 30-35 seconds so the query can complete as well?
It's for the time it takes to connect to the server. That said, I would not recommend relying on -oConnectTimeout to bound the whole job; I'd rather close the connection when the job is done, regardless of how long it takes. Use this option when you need a connection timeout larger than the default TCP timeout.
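For example (hypothetical host and command):
ssh -oConnectTimeout=10 user@dbhost ./run_query.sh
The 10 seconds apply only to establishing the SSH connection; once the session is up, the remote command may run as long as it needs.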