Timeouts connecting to a Postgres database on Amazon RDS from Azure

I get the following exception in my application after leaving a database connection idle for some amount of time:
... An I/O error occured while sending to the backend.; nested exception is org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.] with root cause
java.net.SocketException: Operation timed out
at java.net.SocketInputStream.socketRead0(Native Method)
The same issue happens in psql, and I don't have issues connecting to a local database, so I'm pretty sure the problem is on the RDS side.
psql=> select 'ok';
SSL SYSCALL error: Operation timed out
psql=> select 'ok';
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Succeeded.
I found this other question, which suggests a workaround that improved the situation (the timeouts now take much longer to happen) but didn't fix it.
I'm using Spring Boot with JDBC (Tomcat connection pooling) and JdbcTemplate.
Is there a workaround or a fix?
Perhaps forcing the connection pool to test and reconnect?
How do I do that in this environment?
EDIT:
This is my connection string
jdbc:postgresql://myhost.c2estvxozjm3.eu-west-1.rds.amazonaws.com/dashboard?tcpKeepAlive=true
SOLUTION:
I edited the RDS server-side TCP keepalive parameters, as suggested in the accepted answer. The parameters I'm using are:
tcp_keepalives_count 5
tcp_keepalives_idle 200
tcp_keepalives_interval 200
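For a rough feel of why these values help, here is the keepalive arithmetic as a small Java sketch. It assumes Linux-style semantics (the first probe goes out after tcp_keepalives_idle seconds of silence, then up to tcp_keepalives_count probes are sent tcp_keepalives_interval seconds apart) and a 240-second idle timeout on some NAT device or load balancer in the path. That timeout and the method names are illustrative assumptions, not something measured in this setup.

```java
public class KeepAliveMath {
    /**
     * True if the first keepalive probe is sent before a middlebox with the
     * given idle timeout (seconds) forgets about the connection.
     */
    static boolean refreshesBeforeIdleTimeout(int keepalivesIdle, int middleboxIdleTimeout) {
        return keepalivesIdle < middleboxIdleTimeout;
    }

    /**
     * Worst-case seconds until the OS declares a silent peer dead:
     * wait `idle`, then send `count` unanswered probes spaced `interval` apart.
     */
    static int secondsToDetectDeadPeer(int idle, int interval, int count) {
        return idle + count * interval;
    }

    public static void main(String[] args) {
        // The parameters listed in the SOLUTION above.
        int idle = 200, interval = 200, count = 5;
        // ASSUMPTION: a 4-minute idle timeout somewhere in the network path.
        int assumedIdleTimeout = 240;
        System.out.println("refreshed in time: " + refreshesBeforeIdleTimeout(idle, assumedIdleTimeout));
        System.out.println("dead peer detected after ~" + secondsToDetectDeadPeer(idle, interval, count) + " s");
    }
}
```

With these settings an idle connection gets traffic every 200 seconds, while a peer that has really gone away is only declared dead after about 200 + 5 * 200 = 1200 seconds.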

It looks like something (maybe a NAT router on your end, maybe something on AWS's end) is tracking connections and forgetting about them after a while.
I suggest enabling TCP keepalives. You might be able to enable them server-side in the AWS RDS configuration; if not, you can request them client-side in the JDBC driver.
TCP keepalives are a lot better than a validation/test query, because they have much lower overhead and they don't fill the server query logs with unnecessary entries.
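A minimal client-side sketch in Java, assuming the PgJDBC driver is on the classpath and that the hypothetical environment variables PG_URL, PG_USER and PG_PASSWORD hold the connection details. tcpKeepAlive is the same driver option the question already puts in its URL. One caveat: it only switches on SO_KEEPALIVE for the driver's socket, and the client operating system decides when the first probe is sent (on Linux the default net.ipv4.tcp_keepalive_time is 7200 seconds), which is likely much longer than a NAT device's idle timeout. That would explain why the URL option alone didn't fix the problem while the server-side parameters did.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class ClientKeepAlive {
    /** PgJDBC connection properties; "tcpKeepAlive" matches ?tcpKeepAlive=true in the URL. */
    static Properties keepAliveProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Enables SO_KEEPALIVE on the driver's socket. How soon probes start is
        // decided by the client OS (on Linux: net.ipv4.tcp_keepalive_time).
        props.setProperty("tcpKeepAlive", "true");
        return props;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical environment variables; requires the PgJDBC driver at runtime.
        String url = System.getenv("PG_URL"); // e.g. jdbc:postgresql://host:5432/dashboard
        if (url == null) {
            System.out.println("PG_URL not set; skipping connection");
            return;
        }
        Properties props = keepAliveProps(
                System.getenv().getOrDefault("PG_USER", "postgres"),
                System.getenv().getOrDefault("PG_PASSWORD", ""));
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select 'ok'")) {
            rs.next();
            System.out.println(rs.getString(1));
        }
    }
}
```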

Maybe try
spring.datasource.validation-query=SELECT 1
spring.datasource.test-on-borrow=true
(See AbstractDataSourceConfiguration for other options.)
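Test-on-borrow means the pool checks a connection before handing it out and replaces it if the check fails. The sketch below shows that idea with a hypothetical, minimal generic pool in plain Java; it is not Tomcat's implementation, and the class and method names are made up. For JDBC the validator could run SELECT 1, or call Connection.isValid as isUsable does here.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical minimal pool illustrating "test on borrow"; not Tomcat's implementation. */
public class TestOnBorrowPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;    // opens a brand-new resource
    private final Predicate<T> validator; // plays the role of the validation query

    public TestOnBorrowPool(Supplier<T> factory, Predicate<T> validator) {
        this.factory = factory;
        this.validator = validator;
    }

    /** Returns an idle resource that passes validation, or a fresh one if none does. */
    public synchronized T borrow() {
        while (!idle.isEmpty()) {
            T candidate = idle.pop();
            if (validator.test(candidate)) {
                return candidate;
            }
            // Failed validation (e.g. the socket was silently dropped): discard it.
            // A real pool would also close it here.
        }
        return factory.get();
    }

    /** Puts a resource back into the idle set for reuse. */
    public synchronized void giveBack(T resource) {
        idle.push(resource);
    }

    /** A possible JDBC validator: Connection.isValid checks the connection with a timeout. */
    public static boolean isUsable(Connection conn) {
        try {
            return conn.isValid(2); // 2-second timeout is an arbitrary choice
        } catch (SQLException e) {
            return false;
        }
    }
}
```

The trade-off is that every borrow costs an extra round trip to the server, which is the overhead the keepalive answer above is pointing at.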

Does your connection string include the port, or just the endpoint? Try using the full endpoint, including the port, in your connection string. Also make sure the security group assigned to the RDS instance allows the right port and inbound CIDR range.

Related

What exactly does a connection pool for databases like PostgreSQL do?

I know the general idea that a connection pool is a pool of reusable connections that speeds up traffic to the database because it can reuse connections instead of constantly creating new ones.
But this is a very high-level explanation. It doesn't explain what is meant by a connection or why the connection pool works: even with a connection pool, for example client -> PgBouncer -> PostgreSQL, the client does not have to create a connection to the database, but it still has to create a connection to the proxy.
So what is the connection from (e.g.) the client to PgBouncer, and why is creating it faster than creating the connection from PgBouncer to PostgreSQL?
There are two uses of a connection pool:
1. It prevents opening and closing database connections all the time.
   There is some overhead in establishing a TCP connection to PgBouncer, but it is far cheaper than establishing a database connection, because starting a database connection involves additional work:
   - a server process is started, which is much more expensive than a TCP connection
   - PostgreSQL loads metadata from the system catalogs into the new process's caches
2. It puts a limit on the number of client connections, thereby preventing database overload.
   The advantage over limiting max_connections is that connections in excess of the limit won't receive an error, but will be queued until a connection becomes free.
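The queueing behavior described above (excess clients wait instead of getting an error) can be sketched with a fair java.util.concurrent.Semaphore: callers beyond the limit block until a permit is free. This is a hypothetical illustration of the idea, not how PgBouncer is implemented; the class name, the pool size of 2 and the 20 ms fake query are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueingLimiter {
    private final Semaphore permits;                      // one permit per server connection
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    public QueueingLimiter(int limit) {
        // fair = true: waiting callers are served in FIFO order, like a queue
        this.permits = new Semaphore(limit, true);
    }

    /** Runs work on a "server connection", blocking (not failing) while none is free. */
    public void runQuery(Runnable work) throws InterruptedException {
        permits.acquire(); // waits in line, unlike hitting max_connections
        try {
            int now = inUse.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            work.run();
        } finally {
            inUse.decrementAndGet();
            permits.release();
        }
    }

    public int peakInUse() {
        return peak.get();
    }

    /** Starts `clients` concurrent clients against the pool; returns how many finished. */
    public static int simulate(QueueingLimiter pool, int clients) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < clients; i++) {
            exec.submit(() -> {
                try {
                    pool.runQuery(() -> sleepQuietly(20)); // stand-in for a short query
                    done.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        QueueingLimiter pool = new QueueingLimiter(2);
        int completed = simulate(pool, 5);
        System.out.println("completed=" + completed + " peakInUse=" + pool.peakInUse());
    }
}
```

All five clients finish, but never more than two run at the same time; the other three simply wait their turn.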

PostgreSQL 10 error: server closed the connection unexpectedly

When I run a query that takes a long time on my Postgres server (around 30 minutes), I get this error. I've verified with pgAdmin that the query is running with active status on the server. I've also verified that the query is correct, as it runs successfully on a smaller dataset. The server configuration is the default; I haven't changed anything. Please help!
Look into the PostgreSQL server log.
Either you'll find a crash report there, which would explain the broken connection, or there is something in your network that cuts connections with no activity after a while.
Investigate your firewalls!
One possible solution is to set the configuration parameter tcp_keepalives_idle to a value shorter than the idle time after which the connection is cut. That will cause the server operating system to send keepalive messages on idle connections, which may be enough to prevent the overzealous connection reaper in your environment from disrupting your work.
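tcp_keepalives_idle can also be set for a single session with SET, so a client can shorten the keepalive idle time just for the connection that runs the long query. A sketch in Java, assuming the PgJDBC driver on the classpath, hypothetical PG_URL/PG_USER/PG_PASSWORD environment variables, and a firewall that cuts connections after more than 60 seconds of silence (adjust to your environment). The server-wide alternative is to set the same parameter in postgresql.conf.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SessionKeepAlive {
    /**
     * Builds the SET statement (value in seconds). PostgreSQL treats 0 as
     * "use the OS default", so it is rejected here: the point is to override it.
     */
    static String setKeepAliveIdle(int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("idle time must be positive, got " + seconds);
        }
        return "SET tcp_keepalives_idle = " + seconds;
    }

    public static void main(String[] args) throws SQLException {
        String url = System.getenv("PG_URL"); // hypothetical; requires the PgJDBC driver
        if (url == null) {
            System.out.println(setKeepAliveIdle(60));
            return;
        }
        try (Connection conn = DriverManager.getConnection(url,
                System.getenv().getOrDefault("PG_USER", "postgres"),
                System.getenv().getOrDefault("PG_PASSWORD", ""));
             Statement st = conn.createStatement()) {
            st.execute(setKeepAliveIdle(60)); // ASSUMPTION: 60 s is below the firewall's cutoff
            try (ResultSet rs = st.executeQuery("SHOW tcp_keepalives_idle")) {
                rs.next();
                System.out.println("tcp_keepalives_idle = " + rs.getString(1));
            }
            // ... run the long query on this same connection ...
        }
    }
}
```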

How are read and write socket errors defined in the wrk HTTP benchmarking tool?

I am using the wrk HTTP benchmarking tool to test a server, and I am getting READ and WRITE as well as CONNECTION and TIMEOUT errors.
What I understand is:
CONNECTION errors are caused by the refusal of a TCP connection, which could involve any element in the connection chain (client, ISP and server).
TIMEOUT errors are caused by the host failing to respond to a request within a certain time.
But what about READ and WRITE errors?
I would really appreciate it if someone could point me to a good resource.
From this code in the wrk repository, what I understood is:
WRITE errors happen when writing to a connection fails because the server has closed the socket.
READ errors happen when reading from a connection fails because the server has closed the socket.
I'd be happy if anybody can confirm or refute that.

Connection timeout to MongoDB on Azure VM

I have some timeout problems when connecting my Azure Web App to MongoDB hosted on an Azure VM.
2015-12-19T15:57:47.330+0100 I NETWORK Socket recv() errno:10060 A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2015-12-19T15:57:47.343+0100 I NETWORK SocketException: remote: 104.45.x.x:27017 error: 9001 socket exception [RECV_ERROR] server [104.45.x.x:27017]
2015-12-19T15:57:47.350+0100 I NETWORK DBClientCursor::init call() failed
Currently MongoDB is configured on a single server (just for dev) and is exposed through a public IP. The website connects to it using an Azure domain name (*.westeurope.cloudapp.azure.com), without a Virtual Network.
Usually everything works well, but after a few minutes of inactivity I get that timeout exception. The same happens when using the MongoDB shell from my PC, so I'm quite sure it is a problem on the MongoDB side.
Am I missing some configuration?
After some searching, here are my conclusions:
It is usually good practice to implement some sort of retry logic for every resource that you access on Azure (database, VM, ...). For MongoDB there is only a partial implementation, so you may need to write your own. See also this issue and this.
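A generic retry helper with exponential backoff, as a sketch of the kind of retry logic meant here. It is not part of any MongoDB driver; the names and delays are illustrative. Only wrap operations that are safe to repeat (reads, idempotent writes), since a write that timed out may already have been applied.

```java
import java.util.concurrent.Callable;

public class Retry {
    /**
     * Calls `action` up to `maxAttempts` times, doubling the delay after each
     * failure. Rethrows the last exception if every attempt fails.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated flaky operation: fails twice (like a dropped socket), then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.net.SocketTimeoutException("simulated timeout");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```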
If possible, all resources on Azure should be in the same Azure Virtual Network (this way all connections use Azure private IPs instead of public IPs). This is also useful for security, because you don't need to expose an endpoint to the public.
When deploying MongoDB on Azure, try to follow the official MongoDB guidelines.
In this particular case you should set net.ipv4.tcp_keepalive_time to a value lower than Azure's TCP idle timeout, which is 240 seconds by default. This way the connection is closed on your side, where the MongoDB driver can detect the condition and open a new connection; if the connection is closed by Azure, the driver cannot detect it. If you want to change the timeout on Azure instead (not recommended), you can find it in the Public IP configuration.
In my development environment I have set net.ipv4.tcp_keepalive_time to 120, and now everything seems to work fine. Note that if you host MongoDB inside a Docker container, you should change this setting on the Docker host.
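To check the current value on a Linux host (for Docker, the host rather than the container, as noted above), a small Java sketch can read it from /proc, which is where Linux exposes net.ipv4.tcp_keepalive_time. The 240-second constant is the Azure default quoted in this answer; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class KeepAliveCheck {
    // Linux exposes the sysctl net.ipv4.tcp_keepalive_time (in seconds) at this path.
    static final Path KEEPALIVE_TIME = Paths.get("/proc/sys/net/ipv4/tcp_keepalive_time");
    // Default Azure idle timeout in seconds, as quoted in the answer above.
    static final int AZURE_DEFAULT_IDLE_TIMEOUT = 240;

    static boolean isBelowIdleTimeout(int keepaliveSeconds, int idleTimeoutSeconds) {
        return keepaliveSeconds < idleTimeoutSeconds;
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(KEEPALIVE_TIME)) {
            System.out.println("Not a Linux host (or /proc unavailable); nothing to check");
            return;
        }
        String raw = new String(Files.readAllBytes(KEEPALIVE_TIME), StandardCharsets.UTF_8).trim();
        int keepalive = Integer.parseInt(raw);
        System.out.println("tcp_keepalive_time = " + keepalive + " s, below Azure idle timeout: "
                + isBelowIdleTimeout(keepalive, AZURE_DEFAULT_IDLE_TIMEOUT));
    }
}
```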
Here are some other useful links:
http://focusmatic.tumblr.com/post/39569711018/solving-mongodb-connection-losses-on-windows-azure
https://docs.mongodb.org/ecosystem/platforms/windows-azure/
https://michaelmckeownblog.wordpress.com/2013/12/04/resolving-internal-ips-vs-dns-names-between-vms/
https://gist.github.com/davideicardi/f2094c4c3f3e00fbd490
MongoDB connection problems on Azure
MongoDB connection timeouts (Azure)
When using the C# Mongo driver, we resolved this by setting the following:
MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromMinutes(1);
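For the Java driver, the same one-minute limit can be expressed with the standard maxIdleTimeMS connection-string option (to my knowledge the counterpart of MaxConnectionIdleTime). The helper below is a hypothetical sketch that appends it to an existing connection string; the host name in main is made up.

```java
import java.time.Duration;

public class MongoUriIdle {
    /** Appends maxIdleTimeMS to a MongoDB connection string, keeping any existing options. */
    static String withMaxIdle(String uri, Duration maxIdle) {
        if (!uri.startsWith("mongodb://") && !uri.startsWith("mongodb+srv://")) {
            throw new IllegalArgumentException("not a MongoDB connection string: " + uri);
        }
        String option = "maxIdleTimeMS=" + maxIdle.toMillis();
        if (uri.contains("?")) {
            // Options already present: join with '&' unless the string already ends with a separator.
            return uri + (uri.endsWith("?") || uri.endsWith("&") ? "" : "&") + option;
        }
        // Options must follow the '/' that starts the (optional) database segment.
        int hostsStart = uri.indexOf("://") + 3;
        boolean hasSlash = uri.indexOf('/', hostsStart) >= 0;
        return uri + (hasSlash ? "?" : "/?") + option;
    }

    public static void main(String[] args) {
        // Hypothetical host name; one minute, like the C# setting above.
        System.out.println(withMaxIdle("mongodb://myvm.westeurope.cloudapp.azure.com:27017",
                Duration.ofMinutes(1)));
    }
}
```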

How to get the local port of a jdbc connection?

As far as I know, when one establishes multiple Connection objects via JDBC to one database, each connection occupies a separate port on the machine where the Connection is established (they all connect to one port on the server where the DBMS is running).
I tried to extract the port that corresponds to the Connection objects. Unfortunately, I did not find any way to do so.
Background: I'm doing performance analysis where I set up multiple clients which issue queries on the db. I'm logging the execution time of queries on the database server. In the resulting log I have, among other things, information about the connection that initiated the query, e.g. localhost.localdomain:44760. I hope it is possible to use this information to map each query to the client, or more precisely to the Connection object that initiated the query (which is my ultimate goal and serves analysis purposes).
Just run this select through the JDBC connection:
select inet_client_port()
More functions like that are in the manual:
http://www.postgresql.org/docs/current/static/functions-info.html
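A sketch of the mapping in Java, assuming the PgJDBC driver and hypothetical PG_URL/PG_USER/PG_PASSWORD environment variables: record inet_client_port() once per Connection when it is opened, and parse the port out of the host:port entries in the server log so the two can be joined. Note that inet_client_port() returns NULL for Unix-socket connections, which getInt reports as 0.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ClientPortTag {
    /** Asks the server which client port it sees for this connection. */
    static int serverSeenClientPort(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select inet_client_port()")) {
            rs.next();
            return rs.getInt(1); // SQL NULL (e.g. a Unix-socket connection) comes back as 0
        }
    }

    /** Extracts the port from a log entry such as "localhost.localdomain:44760". */
    static int portFromLogEntry(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no port in: " + hostAndPort);
        }
        return Integer.parseInt(hostAndPort.substring(colon + 1).trim());
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(portFromLogEntry("localhost.localdomain:44760"));
        String url = System.getenv("PG_URL"); // hypothetical; requires the PgJDBC driver
        if (url == null) {
            return;
        }
        try (Connection conn = DriverManager.getConnection(url,
                System.getenv().getOrDefault("PG_USER", "postgres"),
                System.getenv().getOrDefault("PG_PASSWORD", ""))) {
            // Record this once per Connection, then join it against the port in the server log.
            System.out.println("this Connection appears in the server log as port "
                    + serverSeenClientPort(conn));
        }
    }
}
```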