getsockopt: connection timed out - postgresql

I rewrote my project from Python Tornado to Go (using the Iris framework). The basic functionality tested OK, but when I test under high concurrency, the app always stalls for a while and then produces these errors:
(dial tcp 192.168.1.229:6543: getsockopt: connection timed out)
Port 6543 is the PostgreSQL port used with PgBouncer... the PgBouncer and PostgreSQL processes are running OK.
Also, I find that the memcache connection sometimes times out (the memcache process is still working).
Does this happen because of too many connections? Or because some connections are not closed in time?
How can I avoid this problem?

Check your PgBouncer config. Try increasing the max_client_conn option, then experiment with the concurrency level and request count during your stress test. Another possible issue is that you don't return connections to the pool.
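If the problem is connections not being returned, capping the client-side pool usually exposes it quickly. Here is a minimal sketch using Go's database/sql, assuming the lib/pq driver and a placeholder DSN pointing at the PgBouncer port from the question:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // assumed driver choice
)

func main() {
	// Placeholder DSN aimed at the PgBouncer port from the question.
	db, err := sql.Open("postgres",
		"host=192.168.1.229 port=6543 dbname=app sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Keep the client-side pool below PgBouncer's max_client_conn so a
	// burst of goroutines queues here instead of exhausting PgBouncer.
	db.SetMaxOpenConns(50)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(5 * time.Minute)

	rows, err := db.Query("SELECT 1")
	if err != nil {
		log.Fatal(err)
	}
	// Forgetting rows.Close() leaves the underlying connection checked
	// out of the pool forever, a common way pools get exhausted.
	defer rows.Close()
}
```

With a cap like this, a stress test that previously produced "connection timed out" tends to degrade into queueing latency instead, which is much easier to diagnose.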

Related

What exactly does a connection pool for databases like PostgreSQL do?

I know the general idea that a connection pool is a pool of reusable connections that speeds up traffic to the database because it can reuse connections instead of constantly creating new ones.
But this is a very high-level explanation. It doesn't explain what is actually meant by a connection or why the connection pool helps, since even with a connection pool such as, for example, client -> PgBouncer -> PostgreSQL, the client still has to create a connection to the proxy, even though it no longer has to create one directly to the database.
So what does the connection from client -> PgBouncer consist of, and why is creating this connection faster than creating the connection from PgBouncer -> PostgreSQL?
There are two uses of a connection pool:
1. It prevents opening and closing database connections all the time.
There is certainly some overhead in establishing a TCP connection to PgBouncer, but that is way cheaper than establishing a database connection. When you start a database connection, additional work is done:
- a server process is started, which is way more expensive than a TCP connection
- PostgreSQL loads cached metadata tables
2. It puts a limit on the number of client connections, thereby preventing database overload.
The advantage over limiting max_connections is that connections in excess of the limit won't receive an error, but will be queued, waiting for a connection to become free.
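To make the cost difference concrete, here is a rough sketch in Go (assuming the lib/pq driver and a placeholder DSN) that compares opening a fresh database connection for each round trip with reusing a pooled one; on a typical setup the pooled loop is dramatically faster:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // assumed driver choice
)

const dsn = "host=localhost dbname=app sslmode=disable" // placeholder DSN

// freshConnection pays the full backend-startup cost for one round trip:
// a new server process plus metadata loading, then teardown.
func freshConnection() {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := db.Ping(); err != nil { // forces the physical connection
		log.Fatal(err)
	}
}

func main() {
	// One pooled *sql.DB reused for every round trip.
	pool, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	start := time.Now()
	for i := 0; i < 100; i++ {
		freshConnection()
	}
	fmt.Println("100 fresh connections:", time.Since(start))

	start = time.Now()
	for i := 0; i < 100; i++ {
		if err := pool.Ping(); err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("100 pooled round trips:", time.Since(start))
}
```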

PostgreSQL Connection to the server has been lost

I installed a PostgreSQL server on a Raspberry Pi 4 with Raspbian Buster. When I connect from the local network, I have no problems with idle time. When I connect from my static public IP, I can send commands, but if I don't send anything for more than 3 minutes, this message appears: "Connection to the server has been lost".
I tried installing ufw and disabling it, I used DMZ, and I tried changing keepalive_idle, but I always have the same problem. Please help me.
Sometimes the error is
"ERROR: SSL SYSCALL error: Operation timed out"
(Note: only when I am connected from the public IP.)
If you don't have the same issue from within your local network, I assume the connection is being terminated by a network device sitting between the client and the server (most likely a router).
There are routers with small TCP timeout settings (such as 300 seconds), which is close to what you're experiencing.
Try checking (and increasing, if needed) the TCP timeout settings on your router (and on any other devices you might have in between).
Edit:
I tried to find some info on that device (it seems to be a Sercomm VD625), and it does not look like you can easily change its TCP timeout settings (maybe via telnet/ssh, if it supports that).
However, a simpler solution might be to avoid keeping an open connection to PostgreSQL if you will have long idle intervals; just connect when you need to and close the connection afterwards.
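If your client is an application rather than an interactive psql session, the connect-per-use pattern looks like this in Go; the driver choice and DSN below are assumptions:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed driver choice
)

// runQuery opens a connection only for the duration of one piece of
// work, so there is never a long-idle TCP session for the router to drop.
func runQuery(query string) error {
	db, err := sql.Open("postgres",
		"host=203.0.113.5 dbname=app sslmode=require") // placeholder public address
	if err != nil {
		return err
	}
	defer db.Close() // tears the physical connection down when done

	_, err = db.Exec(query)
	return err
}

func main() {
	if err := runQuery("SELECT 1"); err != nil {
		log.Fatal(err)
	}
}
```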

ill effects of reducing TCP socket connection retries

I have a TCP client on an embedded Linux device that establishes a connection with the server while the device is in running mode.
We have a program mode, where all activities have to cease, as the system parameters will be changed.
The way I designed it was to create a socket at boot, close the connection when entering program mode, and reopen it after coming out of program mode.
My problem is that 'connect' blocks for more than 2 minutes during boot-up, and the blocking time keeps increasing as time goes on, making the system sluggish.
Someone told me that changing 'tcp_syn_retries' would reduce the hang time; I tried it and found that it reduces the blocking time to under 1 ms.
Can anyone tell me about the possible implications of this change?
Also, can you suggest how to implement the connect in a non-blocking mode? The one I tried didn't establish the connection.
Any comments or responses will be helpful.
Edit: Since TCP uses a 3-way handshake, lowering this value reduces the number of SYN retransmissions sent to the TCP server during the handshake. As a result, connecting to remote TCP servers over a slow or sluggish link will be less reliable.
This is the info I got from googling. How much is too much? Any suggestions are welcome.
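The question is about C on embedded Linux, but the application-level alternative to touching the system-wide tcp_syn_retries is to bound each connect attempt yourself. Here is a sketch of that idea in Go, with a placeholder address, timing out the dial after 3 seconds:

```go
package main

import (
	"context"
	"log"
	"net"
	"time"
)

func main() {
	// Bound this one connect attempt to 3 seconds, instead of tuning
	// tcp_syn_retries, which changes the SYN retry schedule for every
	// connection on the system.
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", "192.0.2.10:9000") // placeholder address
	if err != nil {
		// Timed out or refused: log it and retry later rather than
		// letting the rest of the system block behind the connect.
		log.Printf("connect failed: %v", err)
		return
	}
	defer conn.Close()
	log.Println("connected")
}
```

This leaves the kernel's retry behavior alone for every other connection on the box, which avoids the reliability concern raised in the edit.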

How to handle TCP keepalive in application

I have a TCP application running on VxWorks. I have the SO_KEEPALIVE option set for my TCP connections. My application keeps track of all TCP connections and puts them into a linked list.
If the client is idle for a long time, we see that the connection is closing down. The connection is not listed in the netstat output.
As the connection is closed by the TCP stack, the resources allocated for that connection are not cleaned up. Can you please help me figure out how the application gets notified if a connection is closed due to keepalive failures?
TCP keepalive is intended primarily to prevent network routers from shutting the TCP connection down during long periods of inactivity, not to prevent your OS or application from shutting down the connection when it deems appropriate.
In most TCP/IP implementations, you can determine if a connection has been closed by attempting to read from it.
From this reference, http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html, I quote:
This procedure is useful because if the other peers lose their connection (for example by rebooting) you will notice that the connection is broken, even if you don't have traffic on it. If the keepalive probes are not replied to by your peer, you can assert that the connection cannot be considered valid and then take the correct action.
If you have, for instance, a server that a lot of clients connect to without sending data regularly, you might end up in a situation with clients that are no longer there. A client may have rebooted, and this goes undetected because a FIN is never sent in that case.
Keepalive exists for cases like this.
From TCP's point of view there is nothing special about a keepalive, so if the peer fails to ACK a keepalive, you will receive 0 bytes on your socket and will have to close your end of the socket, which is the only corrective action you can take at that moment.
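The original code is VxWorks C, but the pattern (enable keepalive, then treat a failed read as the notification to clean up) can be sketched like this in Go, with a placeholder address:

```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "192.0.2.10:9000") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	tcp := conn.(*net.TCPConn)

	// Equivalent of SO_KEEPALIVE, plus the probe interval.
	if err := tcp.SetKeepAlive(true); err != nil {
		log.Fatal(err)
	}
	if err := tcp.SetKeepAlivePeriod(30 * time.Second); err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 4096)
	for {
		// When keepalive probes go unanswered, the stack aborts the
		// connection and this Read fails (EOF or a reset). That error
		// is the application's notification to clean up its state.
		n, err := tcp.Read(buf)
		if err != nil {
			log.Printf("connection dead, cleaning up: %v", err)
			tcp.Close()
			return
		}
		log.Printf("read %d bytes", n)
	}
}
```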
As the connection is closed by the TCP stack, the resources allocated for that connection are not cleaned up.
Only if you never use the connection again.
If the client is idle for a long time, we see that the connection is closing down. The connection is not listed in the netstat output.
Make up your mind. Either you see it or you don't. What you will see is the port in CLOSE_WAIT in netstat.
Can you please help me figure out how the application gets notified if a connection is closed due to keepalive failures?
The next time you use the connection for a read or a write, you will get an ECONNRESET.

Timeouts connecting to a Postgres database on Amazon RDS from Azure

I get the following exception in my application after leaving a database connection idle for some amount of time:
... An I/O error occured while sending to the backend.; nested exception is org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.] with root cause
java.net.SocketException: Operation timed out
at java.net.SocketInputStream.socketRead0(Native Method)
The same issue happens in psql, AND I don't have issues connecting to a local database, so I'm pretty sure the problem is on RDS.
psql=> select 'ok';
SSL SYSCALL error: Operation timed out
psql=> select 'ok';
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Succeeded.
I found this other question, which suggests a workaround that improved the situation (timeouts now take a lot longer) but didn't fix it.
I'm using Spring Boot with JDBC (Tomcat connection pooling) and JdbcTemplate.
Is there a workaround or a fix?
Perhaps forcing the connection pool to test and reconnect?
How do I do that in this environment?
EDIT:
This is my connection string
jdbc:postgresql://myhost.c2estvxozjm3.eu-west-1.rds.amazonaws.com/dashboard?tcpKeepAlive=true
SOLUTION:
Edited the server-side TCP keepalive parameters on RDS as suggested in the selected answer. The parameters I'm using are:
tcp_keepalives_count 5
tcp_keepalives_idle 200
tcp_keepalives_interval 200
It looks like something (maybe a NAT router on your end, maybe something on AWS's end) is doing connection tracking and forgetting about connections after a while.
I suggest enabling TCP keepalives. You might be able to enable them server side in the AWS RDS configuration; if not, you can request them client-side in the JDBC driver.
TCP keepalives are a lot better than a validation/test query, because they're much lower overhead, and they don't result in unnecessary log spam in the server query logs.
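The question is Java/JDBC, where tcpKeepAlive=true in the URL (as shown above) is the client-side switch. As an illustration of what that option turns on at the socket level, here is a sketch using Go's standard dialer; the port and keepalive interval are assumptions echoing the values above:

```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	// Ask the OS to send keepalive probes on this connection every
	// 200 seconds, echoing the tcp_keepalives_* values shown above.
	d := net.Dialer{KeepAlive: 200 * time.Second}

	// Endpoint from the question; the default PostgreSQL port 5432 is
	// an assumption here.
	conn, err := d.Dial("tcp",
		"myhost.c2estvxozjm3.eu-west-1.rds.amazonaws.com:5432")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// The periodic probes keep NAT/connection-tracking entries alive,
	// so an idle connection is not silently dropped along the path.
	log.Println("connected with keepalive enabled")
}
```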
Maybe try
spring.datasource.validation-query=SELECT 1
spring.datasource.test-on-borrow=true
(See AbstractDataSourceConfiguration for other options.)
In your connection string, are you including the port or just the endpoint? Try using the entire endpoint in your connection string. Also, check that the security group assigned to the RDS instance has the proper ports and inbound CIDR ranges defined.