I couldn't find any documentation for the -oConnectTimeout option, so I was wondering: does it apply only to establishing the ssh connection, or to the total connection time? For example, if I have a one-line command that connects to a SQL Server from the command line and executes a query that takes 20 seconds, should I set the timeout to 5 or 10 seconds for the connection to be established, or to the whole 30-35 seconds so the query can complete as well?
It's for the time it takes to connect to the server. That said, I would not recommend relying on -oConnectTimeout to limit the whole job; rather, close the connection when the job is done, regardless of how long it takes. You would use this value to make the connection timeout longer than the default TCP timeout.
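To illustrate, a hedged sketch (the host, user, and remote sqlcmd invocation are placeholders, not from the original question):

ssh -oConnectTimeout=10 user@dbhost "sqlcmd -S localhost -Q 'SELECT ...'"

Here the ssh connection must come up within 10 seconds; once established, the remote query may run for its full 20-30 seconds without tripping the timeout.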
In pgjdbc we have:
loginTimeout
connectTimeout
socketTimeout
cancelSignalTimeout
But it isn't clear to me what the difference is between loginTimeout, connectTimeout, and socketTimeout, i.e. when each of them is applied.
As documented in the PostgreSQL JDBC documentation:
loginTimeout = int
Specify how long to wait for establishment of a database connection.
The timeout is specified in seconds.
connectTimeout = int
The timeout value used for socket connect operations. If connecting
to the server takes longer than this value, the connection is broken.
The timeout is specified in seconds and a value of zero means that it
is disabled.
socketTimeout = int
The timeout value used for socket read operations. If reading from
the server takes longer than this value, the connection is closed.
This can be used as both a brute force global query timeout and a
method of detecting network problems. The timeout is specified in
seconds and a value of zero means that it is disabled.
cancelSignalTimeout = int
Cancel command is sent out of band over its own connection, so
cancel message can itself get stuck. This property controls "connect
timeout" and "socket timeout" used for cancel commands. The timeout is
specified in seconds. Default value is 10 seconds.
The connectTimeout and socketTimeout are timeouts on low-level socket operations. The connectTimeout governs the time needed to establish a TCP socket connection. Establishing a TCP connection doesn't guarantee a login (it doesn't even guarantee that you're connecting to a PostgreSQL server, just that you connected to something that accepted your TCP connection). The socketTimeout governs the time a caller can be blocked waiting to read from the socket. This covers all reads from the server, not just during connect, but also during subsequent interaction with the server (e.g. while executing queries).
On the other hand, loginTimeout governs the PostgreSQL protocol operation of connecting and authenticating to the PostgreSQL server. This involves establishing a TCP connection followed by one or more exchanges of packets for the handshake and authentication (I'm not familiar with the details of the PostgreSQL protocol, so I can't be more specific).
Exchanging these packets can take additional time, and if you connected to something that isn't a PostgreSQL server, the packet exchange may stall. It might be possible to approximate a login timeout with careful control of both connectTimeout and socketTimeout, but there are no guarantees (e.g. data keeps being exchanged, yet the login never completes). In addition, as socketTimeout also governs all other operations on the connection, you may want to set it higher than you are willing to wait for the login to complete (e.g. to accommodate other operations that take a long time to return a response).
The cancelSignalTimeout is used as the connect and socket timeout of the separate TCP connection used for cancel commands.
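To make this concrete, here is a minimal sketch of setting all four properties on a pgjdbc connection (the URL, credentials, and the particular values are assumptions for illustration, not recommendations):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", "myuser");             // placeholder credentials
props.setProperty("password", "secret");
props.setProperty("connectTimeout", "5");        // TCP connect must finish within 5 s
props.setProperty("loginTimeout", "10");         // connect + handshake + auth within 10 s
props.setProperty("socketTimeout", "300");       // any single read may block at most 5 min
props.setProperty("cancelSignalTimeout", "10");  // connect/read limits for the cancel connection
Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://db.example.com/mydb", props);  // placeholder URL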
After reading the source, I'd say it is like this:
connectTimeout specifies how long to wait for a TCP network connection to get established
loginTimeout specifies how long the whole process of logging into the database is allowed to take
socketTimeout specifies how long the client will wait for a response to a command from the server before throwing an error
The first two are related to establishing a connection; the third is relevant for the whole database session.
Establishing a TCP connection is part of establishing a database connection.
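To see the session-wide effect of socketTimeout in practice, here is a minimal sketch (URL and credentials are placeholders, assuming a reachable test database): connect and login succeed quickly, but a query whose response takes twenty seconds trips a ten-second socketTimeout mid-session.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// socketTimeout is given in seconds; here any single read may block at most 10 s.
try (Connection conn = DriverManager.getConnection(
         "jdbc:postgresql://localhost/test?socketTimeout=10", "user", "pass");
     Statement st = conn.createStatement()) {
    st.executeQuery("SELECT pg_sleep(20)");  // server replies only after 20 s
} catch (SQLException e) {
    // pgjdbc aborts the blocked read and closes the connection
    System.out.println("timed out: " + e.getMessage());
}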
When I run a query that takes a long time (maybe 30 minutes) against my Postgres server, the query fails with a connection error. I've verified that the query is running with active status on the server using pgAdmin. I've also verified the correctness of the query, as it runs successfully on a smaller dataset. The server configuration is the default; I haven't changed anything. Please help!
Look into the PostgreSQL server log.
Either you'll find a crash report there, which would explain the broken connection, or there is something in your network that cuts connections with no activity after a while.
Investigate your firewalls!
One possible solution is to set the configuration parameter tcp_keepalives_idle to a value shorter than the time after which the connection is cut. That will cause the server's operating system to send keepalive messages on idle connections, which may be enough to prevent the overzealous connection reaper in your environment from disrupting your work; a sketch follows below.
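A hedged sketch of what that could look like in postgresql.conf (the values are assumptions; pick an idle time shorter than the interval after which your network cuts the connection):

tcp_keepalives_idle = 300        # start sending keepalives after 5 minutes of idle
tcp_keepalives_interval = 30     # then probe every 30 seconds
tcp_keepalives_count = 3         # declare the peer dead after 3 unanswered probes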
The situation:
Postgres 9.1 on Debian Server
Scala (Java) application using the LISTEN/NOTIFY mechanism to get notified through JDBC
As there can be very long pauses (multiple days) between notifications, I ran into the problem that the underlying TCP connection silently got terminated after some time and my application stopped receiving notifications.
When googling for a solution I found that there is a parameter tcpKeepAlive that you can set on the connection. So I set it to true and was happy, until the next day, when I saw that my connection was dead again.
Because I had been suspicious, I had a Wireshark capture running in parallel, which turned out to be very useful. Almost exactly two hours after the last successful communication on the connection of interest, my application sends a keepalive packet to the database server. However, the server responds with RST; it seems it has already closed the connection.
The net.ipv4.tcp_keepalive_time on the server is set to 7200 which is 2 hours.
Do I need to somehow enable keepalive on the server or increase the keepalive_time?
Is this the way to go about keeping my application connected?
TL;DR: My database connection gets terminated after long inactivity. Setting tcpKeepAlive didn't fix it, as the server responds with RST. What should I do?
As Craig suggested in the comments, the problem was very likely related to some piece of network hardware between the server and the application. The fix was to increase the frequency of the keepalive messages.
In my case the OS was Windows, where you have to create a registry value holding the idle time in milliseconds after which the keepalive message should be sent; see the sketch below.
I have set it to 15 minutes, which seems to have solved the issue.
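For reference, a sketch of that registry setting, assuming the standard TCP/IP parameters key; 900000 ms corresponds to the 15 minutes mentioned above:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    KeepAliveTime (REG_DWORD) = 900000   ; idle milliseconds before the first keepalive probe

The change typically requires a reboot to take effect.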
UPDATE:
It only seemed like it solved the issue. After about two days of program run time, my connection was gone again. I switched to checking the validity of my connection every time I use it. This does not seem like the solution, but it is a solution nonetheless.
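That per-use check can be as simple as JDBC's built-in Connection.isValid(). A sketch, where reconnect() is a hypothetical helper that opens a fresh connection:

import java.sql.Connection;
import java.sql.SQLException;

Connection ensureAlive(Connection conn) throws SQLException {
    // isValid() round-trips a trivial request; the argument is a timeout in seconds.
    if (conn == null || !conn.isValid(5)) {
        conn = reconnect();  // hypothetical helper, not part of JDBC
    }
    return conn;
}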
I'm using LWP::UserAgent to communicate with webservices on several servers; the servers are contacted one at a time. Each response might take up to 30 minutes to finish, so I set the LWP timeout to 30 minutes.
Unfortunately, the same timeout also applies if the server is not reachable at all (e.g. the webserver is down), so my application waits 30 minutes for a server that is not running.
Is it feasible to set two separate timeouts?
a short one, which waits for the connection to be established.
a longer one, which waits for the response, once the connection has been established.
The same timeout doesn't "also apply" if the server is not reachable. The timeout option works in a very specific way:
The request is aborted if no activity on the connection to the server is
observed for timeout seconds. This means that the time it takes for the
complete transaction and the request() method to actually return might be
longer.
As long as data is being passed, the timeout won't be triggered. You can use callback functions (see the REQUEST METHODS section of the docs) to check how long data transfer has been going on, and to exit entirely if desired.
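For comparison only (the question is about Perl's LWP, but the split it asks for is easier to see elsewhere): some other HTTP clients expose exactly this connect-versus-read separation. A sketch in Java with HttpURLConnection, where the URL is a placeholder:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

int fetchWithSplitTimeouts() throws IOException {
    HttpURLConnection c = (HttpURLConnection)
            new URL("http://service.example.com/ws").openConnection();  // placeholder URL
    c.setConnectTimeout(10_000);       // short: fail fast if the server is unreachable (10 s)
    c.setReadTimeout(30 * 60 * 1000);  // long: the response itself may take up to 30 min
    return c.getResponseCode();
}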
PostgreSQL has 3 keepalive settings for managing dropped connections (in postgresql.conf):
tcp_keepalives_count
tcp_keepalives_idle
tcp_keepalives_interval
By default these are 0.
The behavior I would like is for Postgresql to drop client connections after a period of time, should the client lose its network connection or go to sleep.
I am currently using these values:
tcp_keepalives_count = 1
tcp_keepalives_idle = 60
tcp_keepalives_interval = 60
I am running PostgreSQL 8.4 on Mac OS X, but the settings don't seem to have any effect. My test is that I lock a row in a table (using SELECT FOR UPDATE) and disconnect the workstation from the network, but in PostgreSQL I still see that workstation holding the lock.
I would expect that after the time has passed (60 seconds in this case) the connection would be terminated and the lock would be released.
Either I am doing something wrong or I am completely misunderstanding how this is supposed to work.
Any advice?
I think you need to configure your operating system instead. Setting keepalive parameters from within programs is not widely supported yet. This should help you:
Using TCP keepalive to Detect Network Errors
Also, your parameters are badly chosen. If tcp_keepalives_count = 1 worked, even one lost keepalive packet would drop your connection, and single packets get lost often. I'd use the following in /etc/sysctl.conf on Mac OS X/FreeBSD:
net.inet.tcp.keepidle = 60000
net.inet.tcp.keepintvl = 10000
The OS will then drop connections at most 140 seconds (60 seconds of idle plus 8 keepalive probes at 10-second intervals) after losing connectivity.