TLDR:
Lots of TCP connections stuck in CLOSE_WAIT status are exhausting ports and bringing down the server
Setup:
riak_1.2.0-1_amd64.deb installed on Ubuntu 12
Spring MVC 3.2.5
riak-client-1.1.0.jar
Tomcat 7.0.51 hosted on Windows Server 2008 R2
JRE 6u45
Full Description:
How do I ensure that the Java Riak client is properly cleaning up its connections so that I'm not left with an abundance of CLOSE_WAIT TCP connections?
I have a Spring MVC application which uses the Riak java client to connect to the remote instance/cluster.
We are seeing a lot of TCP Connections on the server hosting the Spring MVC application, which continue to build up until the server can no longer connect to anything because there are no ports available.
Restarting the Riak cluster does not clean the connections up.
Restarting the webapp does clean up the extra connections.
We are using the HTTPClientAdapter and the REST API.
When connecting to a relational database, I would normally clean up connections by either explicitly calling close on the connection, or by registering the datasource with a pool and transaction manager and then annotating my services with @Transactional.
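For comparison, the kind of setup I mean looks roughly like this (a hypothetical Spring service; the class, bean, and table names are made up):

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ArticleLookupService {

    // backed by a pooled DataSource registered with the transaction manager
    @Autowired
    private JdbcTemplate jdbcTemplate;

    // the connection is borrowed from the pool and returned automatically
    // when the transaction completes, so nothing leaks
    @Transactional(readOnly = true)
    public List<String> articleTitles() {
        return jdbcTemplate.queryForList("select title from articles", String.class);
    }
}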
But since we are using the HTTPClientAdapter, I would have expected this to behave more like an HttpClient.
With an HttpClient, I would consume the response entity, with EntityUtils.consume(...), to ensure that everything is properly cleaned up.
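For example, with a plain Apache HttpClient I'd write something like this (simplified; the URL is made up):

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

HttpClient httpClient = new DefaultHttpClient();
HttpGet get = new HttpGet("http://riakhost:8098/riak/bucket/key"); // made-up URL
HttpResponse response = httpClient.execute(get);
HttpEntity entity = response.getEntity();
try {
    // ... read whatever we need from the entity ...
} finally {
    // fully consuming the entity releases the connection back to the connection manager
    EntityUtils.consume(entity);
}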
HTTPClientAdapter does have a shutdown method, and I see it being called in the online examples.
When I traced the method call through to the actual RiakClient, the method was empty.
Also, when I dug through the source code, nowhere does it ever close the stream on the HttpResponse or consume any response entity (as with the standard Apache EntityUtils example).
Here is an example of how the calls are being made.
private RawClient getRiakClientFromUrl(String riakUrl) {
    return new HTTPClientAdapter(riakUrl);
}

public IRiakObject fetchRiakObject(String bucket, String key, boolean useCache) {
    try {
        MethodTimer timer = MethodTimer.start("Fetch Riak Object Operation");
        //logger.debug("Fetching Riak Object {}/{}", bucket, key);
        RiakResponse riakResponse = riak.fetch(bucket, key);
        if (!riakResponse.hasValue()) {
            //logger.debug("Object {}/{} not found in riak data store", bucket, key);
            return null;
        }
        IRiakObject[] riakObjects = riakResponse.getRiakObjects();
        if (riakObjects.length > 1) {
            String error = "Got multiple riak objects for " + bucket + "/" + key;
            logger.error(error);
            throw new RuntimeException(error);
        }
        //logger.debug("{}", timer);
        return riakObjects[0];
    }
    catch (Exception e) {
        logger.error("Error fetching " + bucket + "/" + key, e);
        throw new RuntimeException(e);
    }
}
The only option I can think of is to create the RiakClient separately from the adapter so I can access the HttpClient and then the ConnectionManager.
I am currently working on switching over to the PBClientAdapter to see if that might help, but for the purposes of this question (and because the rest of the team may not like me switching for whatever reason), let's assume that I must continue to connect over HTTP.
It's been almost a year, so I thought I would go ahead and post how I solved this problem.
The solution was to change how we construct the client: build the RiakClient from a RiakConfig that sets up connection pooling and a maximum number of connections, and then wrap it in the HTTPClientAdapter provided by the Java client. Here's a code example of how to do it.
First, we are on an older version of Riak, so here's the Maven dependency:
<dependency>
    <groupId>com.basho.riak</groupId>
    <artifactId>riak-client</artifactId>
    <version>1.1.4</version>
</dependency>
And here's the example:
public RawClient riakClient() {
    RiakConfig config = new RiakConfig(riakUrl);
    // httpConnectionsTimetolive is in seconds, but the timeout is in milliseconds
    config.setTimeout(30000);
    config.setUrl("http://myriakurl/");
    config.setMaxConnections(100); // or whatever value you need
    RiakClient client = new RiakClient(config);
    return new HTTPClientAdapter(client);
}
I actually broke that up a bit in my implementation and used Spring to inject values; I just wanted to show a simplified example for it.
By setting the timeout to something less than the standard five minutes, the system will not hang on to the connections for too long (so, 5 minutes plus whatever you set the timeout to), which causes the connections to enter the CLOSE_WAIT state sooner.
And of course setting the max connections in the pool prevents the application from opening up tens of thousands of connections.
Related
I have an embedded ActiveMQ Artemis 2.17.0 broker that has no pre-configured queues. I'd like the client to connect to the broker and have its queue created automatically if it doesn't exist. This works just fine the first time I run the client, but the second time I connect to the broker the client throws an error:
errorType=QUEUE_EXISTS message=AMQ229019: Queue hornetq already exists on address router
Is it possible to configure the client or the server in such a way that if a queue already exists, the client does not attempt to recreate it?
Below are the programs I am running:
Server
try {
    ActiveMQServer server = new ActiveMQServerImpl(new ConfigurationImpl()
            .setPersistenceEnabled(true)
            .setBindingsDirectory("./router/data/bindings")
            .setLargeMessagesDirectory("./router/data/large")
            .setPagingDirectory("./router/data/paging")
            .setJournalDirectory("./router/data/journal")
            .setSecurityEnabled(false)
            .addAcceptorConfiguration("tcp", "tcp://0.0.0.0:61617?protocols=CORE,AMQP"));
    server.start();
} catch (Exception ex) {
    System.err.println(ex);
}
Client
ServerLocator serverLocator = ActiveMQClient.createServerLocator("tcp://127.0.0.1:61617");
ClientSessionFactory factory = serverLocator.createSessionFactory();
ClientSession session = factory.createSession();

session.createQueue(new QueueConfiguration("router::hornetq")
        .setAutoCreateAddress(Boolean.FALSE)
        .setAutoCreated(Boolean.FALSE)
        .setRoutingType(RoutingType.ANYCAST));

ClientProducer producer = session.createProducer("router::hornetq");
ClientMessage message = session.createMessage(true);
message.getBodyBuffer().writeString("Core Queue Message");
producer.send(message);

session.start();
ClientConsumer consumer = session.createConsumer("router::hornetq");
ClientMessage msgReceived = consumer.receive();
System.out.println("message = " + msgReceived.getBodyBuffer().readString());
session.close();
I'm using the fully qualified queue name here (i.e. router::hornetq) because I have multiple queues on the router address.
Your client is using the core API, a low-level API that does not support auto-queue creation. Your client is creating (or attempting to create) the queue manually, e.g.:
session.createQueue(new QueueConfiguration("router::hornetq")
        .setAutoCreateAddress(Boolean.FALSE)
        .setAutoCreated(Boolean.FALSE)
        .setRoutingType(RoutingType.ANYCAST));
The exception you're seeing is no doubt being thrown here if the queue already exists. Creating a queue is not idempotent, so you have three options from what I can see:
Simply catch the ActiveMQQueueExistsException that is being thrown here and ignore it.
Use ClientSession.queueQuery to see if the queue exists before you attempt to create it. If it exists, don't try to create it again. If it doesn't exist, create it. That said, if you have lots of clients like this running concurrently it's still possible you could get an ActiveMQQueueExistsException due to race conditions between the clients (a sketch combining options 1 and 2 follows this list).
Use a client/protocol that supports auto-creation like the core JMS client or AMQP.
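For example, a minimal sketch combining options 1 and 2 might look like this (it assumes the same session, address, and queue names as your client code):

import org.apache.activemq.artemis.api.core.ActiveMQQueueExistsException;
import org.apache.activemq.artemis.api.core.QueueConfiguration;
import org.apache.activemq.artemis.api.core.RoutingType;
import org.apache.activemq.artemis.api.core.SimpleString;
import org.apache.activemq.artemis.api.core.client.ClientSession;

// check whether the "hornetq" queue already exists before trying to create it
ClientSession.QueueQuery query = session.queueQuery(SimpleString.toSimpleString("hornetq"));
if (!query.isExists()) {
    try {
        session.createQueue(new QueueConfiguration("router::hornetq")
                .setAutoCreateAddress(Boolean.TRUE) // see the note below about createAddress()
                .setAutoCreated(Boolean.FALSE)
                .setRoutingType(RoutingType.ANYCAST));
    } catch (ActiveMQQueueExistsException e) {
        // another client created the queue between the query and the create; safe to ignore
    }
}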
A few other things are worth mentioning:
Since you're not explicitly calling ClientSession.createAddress() you probably want to use setAutoCreateAddress(Boolean.TRUE).
Your consumer doesn't have to use router::hornetq. It can just use hornetq and it will receive any messages sent to the hornetq queue. Queue names are universally unique in the broker.
I am writing an apache-camel RabbitMQ consumer. I would like to react somehow to connection problems (i.e. try to reconnect). Is it possible to configure apache-camel to automatically reconnect?
If not, how can I find out that a connection to the queue was interrupted? I've done the following test:
start the queue (and some producer)
start my consumer (it was getting messages as expected)
stop the queue (the messages stopped arriving, as expected, but no exception was thrown)
start the queue (no new messages were received)
I am using Camel in Scala (via akka-camel), but a Java solution would probably also be OK.
You can pass in the flag automaticRecoveryEnabled=true in the endpoint URI, and Camel will reconnect if the connection is lost.
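For example (a minimal sketch in Java; the exchange and queue names are made up, the key part is the option appended to the endpoint URI):

import org.apache.camel.builder.RouteBuilder;

public class RabbitConsumerRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // automaticRecoveryEnabled=true tells the underlying RabbitMQ client
        // to re-establish the connection if it is lost
        from("rabbitmq://localhost:5672/myExchange?queue=myQueue&automaticRecoveryEnabled=true")
            .to("log:messages");
    }
}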
For automatic RabbitMQ resource recovery (Connections/Channels/Consumers/Queues/Exchanges/Bindings) when failures occur, check out Lyra (which I authored). Example usage:
Config config = new Config()
    .withRecoveryPolicy(new RecoveryPolicy()
        .withMaxAttempts(20)
        .withInterval(Duration.seconds(1))
        .withMaxDuration(Duration.minutes(5)));

ConnectionOptions options = new ConnectionOptions().withHost("localhost");
Connection connection = Connections.create(options, config);
The rest of the API is just the amqp-client API, except your resources are automatically recovered when failures occur.
I'm not sure about camel-rabbitmq specifically, but hopefully there's a way you can swap in your own resource creation via Lyra.
The current camel-rabbitmq just creates a connection and the channel when the consumer or producer is started, so it doesn't have a chance to catch the connection exception :(.
Following the recommended transaction setup for Squeryl, in my Boot.scala:
import java.sql.DriverManager
import net.liftweb.squerylrecord.SquerylRecord
import org.squeryl.Session
import org.squeryl.adapters.H2Adapter

SquerylRecord.initWithSquerylSession(Session.create(
  DriverManager.getConnection("jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1", "sa", ""),
  new H2Adapter
))
The first startup works fine. I can connect via H2's web-interface and if I use my app, it updates the database appropriately. However if I restart jetty without restarting the JVM, I get:
java.sql.SQLException: No suitable driver found for jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1
I get the same result if I replace "DB_CLOSE_DELAY=-1" with "AUTO_SERVER=TRUE", or remove it entirely.
Following the recommendations on the Squeryl list, I tried C3P0:
import com.mchange.v2.c3p0.ComboPooledDataSource

val cpds = new ComboPooledDataSource
cpds.setDriverClass("org.h2.Driver")
cpds.setJdbcUrl("jdbc:h2:lift_proto")
cpds.setUser("sa")
cpds.setPassword("")

org.squeryl.SessionFactory.concreteFactory =
  Some(() => Session.create(cpds.getConnection, new H2Adapter()))
This produces similar behavior:
WARNING: A C3P0Registry mbean is already registered. This probably means that an application using c3p0 was undeployed, but not all PooledDataSources were closed prior to undeployment. This may lead to resource leaks over time. Please take care to close all PooledDataSources.
To be sure it wasn't anything I was doing which was causing this, I started and stopped the server without calling a transaction { } block. No exceptions were thrown. I then added to my Boot.scala:
transaction { /* Do nothing */ }
And the exception was once again thrown (I'm assuming because connections are lazy). So I moved the db initialization code to its own file away from Lift:
SessionFactory.concreteFactory = Some(() =>
  Session.create(
    java.sql.DriverManager.getConnection("jdbc:h2:mem:test", "sa", ""),
    new H2Adapter
  ))
transaction {}
Results were unchanged. What am I doing wrong? I cannot find any mention of needing to explicitly close connections or sessions in the Squeryl documentation, and this is my first time using JDBC.
I found mention of the same issue here on the Lift google group, but no resolution.
Thanks for any help.
When you say you are restarting Jetty, I think what you're actually doing is reloading your webapp within Jetty. Neither the h2 database nor C3P0 will automatically shut down when your app reloads, which explains the errors you are receiving when Lift tries to initialize them a second time. You don't see the error when you don't create a transaction block because both h2 and C3P0 are initialized when the first DB connection is retrieved.
I tend to use BoneCP as a connection pool myself. You can configure the minimum number of pooled connections to be > 1, which will stop h2 from shutting down without the need for DB_CLOSE_DELAY=-1. Then you can use:
LiftRules.unloadHooks append { () =>
  yourPool.close() // should destroy the pool and its associated threads
}
That will close all of the connections when Lift is shutdown, which should properly shutdown the h2 database as well.
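If it helps, a rough sketch of the BoneCP configuration I have in mind looks like this (BoneCP is a Java library, so the sketch is in Java; the JDBC URL and pool sizes are just examples, and the same calls work from your Scala Boot.scala):

import com.jolbox.bonecp.BoneCPConfig;
import com.jolbox.bonecp.BoneCPDataSource;

Class.forName("org.h2.Driver"); // make sure the h2 driver is loaded

BoneCPConfig config = new BoneCPConfig();
config.setJdbcUrl("jdbc:h2:lift_proto"); // no DB_CLOSE_DELAY needed
config.setUsername("sa");
config.setPassword("");
// keeping at least one pooled connection open stops h2 from shutting down
config.setMinConnectionsPerPartition(1);
config.setMaxConnectionsPerPartition(5);
config.setPartitionCount(1);

BoneCPDataSource pool = new BoneCPDataSource(config);
// hand pool.getConnection() to Session.create(...), and call pool.close()
// from the LiftRules.unloadHooks shown above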
For the relevant part of our server stack, we're running:
NGINX 1.2.3
PHP-FPM 5.3.10 with PECL mongo 1.2.12
MongoDB 2.0.7
CentOS 6.2
We're getting some strange but predictable behavior when the MongoDB server goes away (crashes, gets killed, etc.). Even with a try/catch block around the connection code, i.e.:
try
{
    $mdb = new Mongo('mongodb://localhost:27017');
}
catch (MongoConnectionException $e)
{
    die( $e->getMessage() );
}

$db = $mdb->selectDB('collection_name');
Depending on which PHP-FPM workers have connected to mongo already, the connection state is cached, causing further exceptions to go unhandled, because the $mdb connection handler can't be used. The troubling thing is that the try does not consistently fail for a considerable amount of time, up to 15 minutes later, when -- I assume -- the php-fpm processes die/respawn.
Essentially, the behavior is that when you hit a worker that hasn't connected to mongo yet, you get the die message above, and when you connect to a worker that has, you get an unhandled exception from $mdb->selectDB('collection_name'); because catch does not run.
When PHP is a single process, i.e. via Apache with mod_php, this behavior does not occur. Just for posterity, going back to Apache/mod_php is not an option for us at this time.
Is there a way to fix this behavior? I don't want the connection state to be inconsistent between different php-fpm processes.
Edit:
While I wait for the driver to be fixed in this regard, my current workaround is to do a quick poll to determine whether the driver can handle requests, and then skip loading the MongoDB library and running queries if it can't connect or query:
try
{
    // connect
    $mongo = new Mongo("mongodb://localhost:27017");

    // try to do anything with connection handle
    try
    {
        $mongo->YOUR_DB->YOUR_COLLECTION->findOne();
        $mongo->close();
        define('MONGO_STATE', TRUE);
    }
    catch(MongoCursorException $e)
    {
        $mongo->close();
        error_log('Error connecting to MongoDB: ' . $e->getMessage());
        define('MONGO_STATE', FALSE);
    }
}
catch(MongoConnectionException $e)
{
    error_log('Error connecting to MongoDB: ' . $e->getMessage());
    define('MONGO_STATE', FALSE);
}
The PHP mongo driver connectivity code is getting a big overhaul in the 1.3 release, currently in beta2 as of this writing. Based on your description, your issues may be resolved by the fixes for:
https://jira.mongodb.org/browse/PHP-158
https://jira.mongodb.org/browse/PHP-465
Once it is released you will be able to see the full list of fixes here:
https://jira.mongodb.org/browse/PHP/fixforversion/10499
Or, alternatively, on the PECL site. If you can test 1.3 and confirm that your issues are still present, then I'm sure the driver devs would love to hear from you before the 1.3.0 release, especially if it is easily reproducible.
Okay, this is a new one. I'm trying to debug my project, which I've done many times in the past, and I'm now getting this exception in one of my repositories. I haven't seen it before now. I haven't touched my repos in days, and my connection string is the same as it's always been. The inner exception states:
{"A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)"}
And the code it's choking on is:
public class HGArticleRepository : IArticleRepository
{
    private HGEntities _siteDB = new HGEntities();

    public List<Article> Articles
    {
        get { return _siteDB.Articles.ToList(); } // <-- this is the line
    }

    // more repo code
}
Again, like I said, I've never encountered this exception before, and I haven't touched my domain code in days.
This error usually means one of the following:
The connection string points to a nonexistent SQL Server instance.
The connection string points to a SQL Server instance that was shut down or never started.
The Named Pipes transport was disabled in the SQL Server settings.
Check them carefully one by one. In your case I guess it is number 2.
A second possible solution:
Check that IIS is running.
In my case it was stopped, so I got the same error.