For the relevant part of our server stack, we're running:
NGINX 1.2.3
PHP-FPM 5.3.10 with PECL mongo 1.2.12
MongoDB 2.0.7
CentOS 6.2
We're getting some strange but predictable behavior when the MongoDB server goes away (crashes, gets killed, etc.), even with a try/catch block around the connection code, i.e.:
try
{
    $mdb = new Mongo('mongodb://localhost:27017');
}
catch (MongoConnectionException $e)
{
    die( $e->getMessage() );
}

$db = $mdb->selectDB('collection_name');
Depending on which PHP-FPM workers have already connected to mongo, the connection state is cached, causing further exceptions to go unhandled because the $mdb connection handle can't be used. The troubling thing is that the try block does not fail consistently for a considerable amount of time, sometimes up to 15 minutes, at which point, I assume, the php-fpm processes die and respawn.
Essentially, the behavior is that when you hit a worker that hasn't connected to mongo yet, you get the die message above, and when you hit a worker that has, you get an unhandled exception from $mdb->selectDB('collection_name') because the catch block never runs.
When PHP is a single process, i.e. via Apache with mod_php, this behavior does not occur. Just for posterity, going back to Apache/mod_php is not an option for us at this time.
Is there a way to fix this behavior? I don't want the connection state to be inconsistent between different php-fpm processes.
Edit:
While I wait for the driver to be fixed in this regard, my current workaround is a quick poll to determine whether the driver can handle requests, and then to skip loading the MongoDB library and running queries if it can't connect or query (a small usage sketch follows the snippet):
try
{
    // connect
    $mongo = new Mongo("mongodb://localhost:27017");

    // try to do anything with connection handle
    try
    {
        $mongo->YOUR_DB->YOUR_COLLECTION->findOne();
        $mongo->close();
        define('MONGO_STATE', TRUE);
    }
    catch (MongoCursorException $e)
    {
        $mongo->close();
        error_log('Error connecting to MongoDB: ' . $e->getMessage());
        define('MONGO_STATE', FALSE);
    }
}
catch (MongoConnectionException $e)
{
    error_log('Error connecting to MongoDB: ' . $e->getMessage());
    define('MONGO_STATE', FALSE);
}
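For completeness, the rest of the application can then branch on that constant; a minimal usage sketch (the fallback path here is hypothetical, not part of the original setup):

// Hypothetical usage of the MONGO_STATE flag defined above
if (MONGO_STATE) {
    // Safe to connect and run queries
    $mongo = new Mongo('mongodb://localhost:27017');
    $doc = $mongo->YOUR_DB->YOUR_COLLECTION->findOne();
} else {
    // Degrade gracefully instead of triggering unhandled driver exceptions
    error_log('MongoDB unavailable, serving fallback content');
}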
The PHP mongo driver's connectivity code is getting a big overhaul in the 1.3 release, currently in beta2 as of this writing. Based on your description, your issues may be resolved by the fixes for:
https://jira.mongodb.org/browse/PHP-158
https://jira.mongodb.org/browse/PHP-465
Once it is released you will be able to see the full list of fixes here:
https://jira.mongodb.org/browse/PHP/fixforversion/10499
Or, alternatively, on the PECL site. If you can test 1.3 and confirm that your issues are still present, I'm sure the driver devs would love to hear from you before the 1.3.0 release, especially if the problem is easily reproducible.
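For what it's worth, the beta can be installed straight from PECL; a sketch assuming the standard pecl tool (check pecl.php.net/package/mongo for the exact version string):

# Installs the latest beta-state release of the mongo extension
pecl install mongo-beta
# Or pin an explicit beta release (version string assumed):
pecl install mongo-1.3.0beta2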
I'm using the official Perl driver for working with MongoDB. To catch and handle errors, the Try::Tiny and Safe::Isa modules are recommended. However, it doesn't work as expected. Please check the code below, which should work according to the documentation but in fact doesn't:
use MongoDB;
use Try::Tiny;
use Safe::Isa;

my $client;
try {
    $client = MongoDB->connect('mongodb://localhost');
    $client->connect;
} catch {
    warn "caught error: $_";
};

my $collection = $client->ns('foo.bar');
try {
    my $all = $collection->find;
} catch {
    warn "2 - caught error: $_";
};
Since connections are established automatically, according to the documentation there will be no exception on connect(). But there is no exception on the query either! I also added the $client->connect call to force a connection, but again no exception. I'm running this script on a machine where there is no MongoDB installed and no MongoDB Docker container running, so an exception definitely should appear.
Could someone explain what I am doing wrong?
This is a subtle issue. find returns a cursor object but doesn't issue the query immediately. From the documentation for MongoDB::Collection:
Note, a MongoDB::Cursor object holds the query and does not issue the
query to the server until the `result` method is called on it or until
an iterator method like `next` is called.
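So, to actually surface the connection failure inside the try block, force the query by calling result (or all, or an iterator method like next) on the cursor. A minimal sketch based on the code above:

use MongoDB;
use Try::Tiny;

my $client = MongoDB->connect('mongodb://localhost');
my $collection = $client->ns('foo.bar');

try {
    # Calling ->result (or ->all, or iterating with ->next) actually sends the
    # query to the server, so a connection failure is raised and caught here.
    my $result = $collection->find->result;
} catch {
    warn "caught error: $_";
};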
Why does the error
`Error: The hook 'orm' is taking too long to load.`
occur so often when Sails is lifting? Even with the ORM hook timeout already set to 100000, it still occurs sometimes (not always). It usually happens after the PC has been restarted and Sails is run for the first time.
It also often occurs on the cloud server and on my laptop as well as my PC, so it occurs in every environment I have tested:
Windows 8.1 (PC and laptop) and Linux (Ubuntu 14.04)
node.js version 0.10.38
sails version 0.11
MongoDB version 3
The complete error report looks like this:
error: Error: The hook `orm` is taking too long to load.
Make sure it is triggering its `initialize()` callback, or else set `sails.config.orm._hookTimeout to a higher value (currently 20000)
at [object Object].tooLong [as _onTimeout] (D:\Workspace\Hellowin\cannes\node_modules\sails\lib\app\private\loadHooks.js:92:21)
at Timer.listOnTimeout [as ontimeout] (timers.js:112:15) { [Error: The hook `orm` is taking too long to load.
Make sure it is triggering its `initialize()` callback, or else set `sails.config.orm._hookTimeout to a higher value (currently 20000)] code: 'E_HOOK_TIMEOUT' }
D:\Workspace\Hellowin\cannes\node_modules\sails\node_modules\async\lib\async.js:30
if (called) throw new Error("Callback was already called.");
^
Error: Callback was already called.
at D:\Workspace\Hellowin\cannes\node_modules\sails\node_modules\async\lib\async.js:30:31
at process._tickDomainCallback (node.js:492:13)
Is it a MongoDB adapter bug?
I encountered the same error recently and found a syntax error in one of my models; now it is working fine. So basically this error occurs because of some problem related to the database setup: check your models, connection config, etc.
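For example, a sails-mongo connection block typically looks something like this (the connection key and database name are placeholders, not taken from the question):

// config/connections.js (Sails 0.11) -- values are placeholders
module.exports.connections = {
  mongoConn: {
    adapter: 'sails-mongo',
    host: 'localhost',
    port: 27017,
    database: 'myAppDb'
  }
};

// config/models.js -- point the models at that connection
module.exports.models = {
  connection: 'mongoConn'
};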
TLDR:
Lots of TCP connections in CLOSE_WAIT status shutting down the server
Setup:
riak_1.2.0-1_amd64.deb installed on Ubuntu 12
Spring MVC 3.2.5
riak-client-1.1.0.jar
Tomcat 7.0.51 hosted on Windows Server 2008 R2
JRE 6u45
Full Description:
How do I ensure that the Java RiakClient is properly cleaning up its connections so that I'm not left with an abundance of CLOSE_WAIT TCP connections?
I have a Spring MVC application which uses the Riak java client to connect to the remote instance/cluster.
We are seeing a lot of TCP Connections on the server hosting the Spring MVC application, which continue to build up until the server can no longer connect to anything because there are no ports available.
Restarting the Riak cluster does not clean the connections up.
Restarting the webapp does clean up the extra connections.
We are using the HTTPClientAdapter and REST api.
When connecting to a relational database, I would normally clean up connections by either explicitly calling close on the connection, or by registering the datasource with a pool and transaction manager and then annotating my services with @Transactional.
But since using the HTTPClientAdapter, I would have expected this to be more like an HttpClient.
With an HttpClient, I would consume the response entity with EntityUtils.consume(...) to ensure that everything is properly cleaned up.
HTTPClientAdapter does have a shutdown method, and I see it being called in the online examples.
When I traced the method call through to the actual RiakClient, the method is empty.
Also, when I dig through the source code, nowhere in it does it ever close the Stream on the HttpResponse or consume any response entity (as with the standard Apache EntityUtils example).
Here is an example of how the calls are being made.
private RawClient getRiakClientFromUrl(String riakUrl) {
    return new HTTPClientAdapter(riakUrl);
}

public IRiakObject fetchRiakObject(String bucket, String key, boolean useCache) {
    try {
        MethodTimer timer = MethodTimer.start("Fetch Riak Object Operation");
        //logger.debug("Fetching Riak Object {}/{}", bucket, key);
        RiakResponse riakResponse;

        riakResponse = riak.fetch(bucket, key);
        if (!riakResponse.hasValue()) {
            //logger.debug("Object {}/{} not found in riak data store", bucket, key);
            return null;
        }

        IRiakObject[] riakObjects = riakResponse.getRiakObjects();
        if (riakObjects.length > 1) {
            String error = "Got multiple riak objects for " + bucket + "/" + key;
            logger.error(error);
            throw new RuntimeException(error);
        }

        //logger.debug("{}", timer);
        return riakObjects[0];
    }
    catch (Exception e) {
        logger.error("Error fetching " + bucket + "/" + key, e);
        throw new RuntimeException(e);
    }
}
The only option I can think of is to create the RiakClient separately from the adapter so I can access the HttpClient and then the ConnectionManager.
I am currently working on switching over to the PBClientAdapter to see if that might help, but for the purposes of this question (and because the rest of the team may not like me switching for whatever reason), let's assume that I must continue to connect over HTTP.
It's been almost a year, so I thought I would go ahead and post how I solved this problem.
The solution was to change the client implementation we were using to the HTTPClientAdapter provided by the Java client, passing in a configuration that sets up pooling and max connections. Here's a code example of how to do it.
First, we are on an older version of Riak, so here's the Maven dependency:
<dependency>
    <groupId>com.basho.riak</groupId>
    <artifactId>riak-client</artifactId>
    <version>1.1.4</version>
</dependency>
And here's the example:
public RawClient riakClient() {
    RiakConfig config = new RiakConfig(riakUrl);
    // httpConnectionsTimetolive is in seconds, but timeout is in milliseconds
    config.setTimeout(30000);
    config.setUrl("http://myriakurl/");
    config.setMaxConnections(100); // Or whatever value you need
    RiakClient client = new RiakClient(config);
    return new HTTPClientAdapter(client);
}
I actually broke that up a bit in my implementation and used Spring to inject values; I just wanted to show a simplified example for it.
By setting the timeout to something less than the standard five minutes, the system will not hang on to the connections for too long (so, 5 minutes plus whatever you set the timeout to), which causes the connections to enter the CLOSE_WAIT status sooner.
And of course setting the max connections in the pool prevents the application from opening up tens of thousands of connections.
I have a replica set consisting of four nodes (ux002, ux009, ux019, ux020). I have a program that I'd like to run in parallel on each of the same four nodes which connects to this replica set using the Mongo Java driver.
Examining the status of the replica set shows that all four nodes are operating fine, however the program throws the following warning message on all four nodes:
Nov 12, 2014 2:34:40 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: ux009/127.0.1.1:27017 - java.io.IOException - message: couldn't connect to [ux009/127.0.1.1:27017] bc:java.net.ConnectException: Connection refused
However, on each node, the server which is seen down is the one the program is running on. I.e. I run the program on ux009, and it tells me ux009 is down. I run it on ux002, it tells me ux002 is down.
I made a stupidly simple program to test whether there was something wrong with my original code, but the same warning persists:
public static void main(String[] args) throws Exception {
    List<ServerAddress> addrs = new ArrayList<>();
    if (args.length == 0) {
        addrs.add(new ServerAddress("localhost", 27017));
    } else {
        for (String a : args) {
            String[] host = a.split(":");
            addrs.add(new ServerAddress(host[0], Integer.valueOf(host[1])));
        }
    }
    Mongo mongo = new Mongo(addrs);
    Thread.sleep(5000); // Sleep to give it time to print messages
    mongo.close();
}
And I run it as follows:
java -jar mongo-test.jar ux002:27017 ux009:27017 ux019:27017 ux020:27017
Could it be that mongod isn't configured correctly? Or perhaps I am misusing the Java API?
The Mongo Java driver is version 2.9.3, and mongod is version 2.6.5.
Many thanks in advance!
-Jim
The IP is a little strange for the local host:
ux009/127.0.1.1:27017
I would have expected that to be:
ux009/127.0.0.1:27017
Most likely someone fat-fingered the IP address in /etc/hosts on each machine.
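For illustration only (the 192.168.x address below is made up; use each host's real interface address), the suspect and corrected /etc/hosts entries would look something like:

# /etc/hosts entry that makes the driver advertise a loopback address
127.0.1.1    ux009

# Corrected entry, mapping the hostname to an address the other nodes can reach
192.168.1.9  ux009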
Posting the answer here for completeness. The issue was that the bind_ip parameter in the mongod configuration file had been set to the IP address of only one of the nodes. Thanks to helmy for spotting that.
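For reference, in the classic INI-style mongod config file that setting looks roughly like the sketch below; binding to 0.0.0.0, or listing every required address, lets all replica-set members and clients connect. The values are illustrative, not the poster's actual config:

# /etc/mongod.conf (classic INI-style format)
# Listen on all interfaces, or list specific ones, e.g. bind_ip = 127.0.0.1,192.168.1.9
bind_ip = 0.0.0.0
port = 27017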
Following the recommended transaction setup for Squeryl, in my Boot.scala:
import java.sql.DriverManager
import net.liftweb.squerylrecord.SquerylRecord
import org.squeryl.Session
import org.squeryl.adapters.H2Adapter

SquerylRecord.initWithSquerylSession(Session.create(
  DriverManager.getConnection("jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1", "sa", ""),
  new H2Adapter
))
The first startup works fine. I can connect via H2's web interface, and if I use my app, it updates the database appropriately. However, if I restart Jetty without restarting the JVM, I get:
java.sql.SQLException: No suitable driver found for jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1
I get the same result if I replace "DB_CLOSE_DELAY=-1" with "AUTO_SERVER=TRUE", or remove it entirely.
Following the recommendations on the Squeryl list, I tried C3P0:
import com.mchange.v2.c3p0.ComboPooledDataSource

val cpds = new ComboPooledDataSource
cpds.setDriverClass("org.h2.Driver")
cpds.setJdbcUrl("jdbc:h2:lift_proto")
cpds.setUser("sa")
cpds.setPassword("")

org.squeryl.SessionFactory.concreteFactory =
  Some(() => Session.create(
    cpds.getConnection, new H2Adapter())
  )
This produces similar behavior:
WARNING: A C3P0Registry mbean is already registered. This probably means that an application using c3p0 was undeployed, but not all PooledDataSources were closed prior to undeployment. This may lead to resource leaks over time. Please take care to close all PooledDataSources.
To be sure it wasn't anything I was doing which was causing this, I started and stopped the server without calling a transaction { } block. No exceptions were thrown. I then added to my Boot.scala:
transaction { /* Do nothing */ }
And the exception was once again thrown (I'm assuming because connections are lazy). So I moved the db initialization code to its own file away from Lift:
SessionFactory.concreteFactory = Some(() =>
  Session.create(
    java.sql.DriverManager.getConnection("jdbc:h2:mem:test", "sa", ""),
    new H2Adapter
  ))

transaction {}
Results were unchanged. What am I doing wrong? I cannot find any mention of needing to explicitly close connections or sessions in the Squeryl documentation, and this is my first time using JDBC.
I found mention of the same issue here on the Lift google group, but no resolution.
Thanks for any help.
When you say you are restarting Jetty, I think what you're actually doing is reloading your webapp within Jetty. Neither the H2 database nor C3P0 will automatically shut down when your app reloads, which explains the errors you are receiving when Lift tries to initialize them a second time. You don't see the error when you don't create a transaction block because both H2 and C3P0 are only initialized when the first DB connection is retrieved.
I tend to use BoneCP as a connection pool myself. You can configure the minimum number of pooled connections to be > 1, which will stop H2 from shutting down without the need for DB_CLOSE_DELAY=-1 (see the sketch at the end of this answer). Then you can use:
LiftRules.unloadHooks append { () =>
  yourPool.close() // should destroy the pool and its associated threads
}
That will close all of the connections when Lift is shut down, which should properly shut down the H2 database as well.
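A minimal sketch of that setup, assuming BoneCP's BoneCPDataSource and the same H2 URL as the question (the pool sizes and the yourPool name are arbitrary):

import com.jolbox.bonecp.BoneCPDataSource
import net.liftweb.http.LiftRules
import org.squeryl.{Session, SessionFactory}
import org.squeryl.adapters.H2Adapter

// Keep at least one pooled connection open so the embedded H2 database is never closed
val yourPool = new BoneCPDataSource
yourPool.setDriverClass("org.h2.Driver")
yourPool.setJdbcUrl("jdbc:h2:lift_proto")
yourPool.setUsername("sa")
yourPool.setPassword("")
yourPool.setMinConnectionsPerPartition(2)
yourPool.setMaxConnectionsPerPartition(10)
yourPool.setPartitionCount(1)

SessionFactory.concreteFactory = Some(() =>
  Session.create(yourPool.getConnection, new H2Adapter))

// On Lift shutdown, close the pool (and with it H2), as in the hook above
LiftRules.unloadHooks append { () => yourPool.close() }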