I have an ASP.NET Core application with Npgsql that is set up to retry when a database connection fails:
services.AddDbContext<DatabaseContext>(
    opt => opt.UseNpgsql(Configuration.GetConnectionString("DefaultConnection"),
        (options) =>
        {
            options.EnableRetryOnFailure(10);
        })
);
However, when I restart the database during a request, it first throws an exception:
Exception thrown: 'Npgsql.PostgresException' in System.Private.CoreLib.dll: '57P01: terminating connection due to administrator command'
That propagates into my code and fails the request without any retries.
Looking into the Npgsql documentation, it seems that some exceptions are marked as IsTransient, which means they will be retried, but this one is not. I assume there's a reason for that, but I'd still like to be able to restart the Postgres database without interrupting requests, simply by having them retry after a while. I could wrap all the requests in try/catch blocks and check for this particular exception, but that seems like a clunky solution.
How do I handle this situation gracefully?
Npgsql package used:
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="5.0.10" />
I managed to find an answer myself: you can add the error code in EnableRetryOnFailure:
services.AddDbContext<DatabaseContext>(
    opt => opt.UseNpgsql(Configuration.GetConnectionString("DefaultConnection"),
        (options) =>
        {
            options.EnableRetryOnFailure(10, TimeSpan.FromSeconds(30), new string[] { "57P01" });
        })
);
Now the only question that remains is whether that is a good idea or not.
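One related caveat, which applies to EF Core's retrying strategies in general rather than to Npgsql specifically: once EnableRetryOnFailure is active, any unit of work that opens its own transaction has to be routed through the execution strategy so the whole block can be replayed on a retry. A minimal sketch, assuming an injected DatabaseContext field named _context (the work inside is just a placeholder):

var strategy = _context.Database.CreateExecutionStrategy();
await strategy.ExecuteAsync(async () =>
{
    // the whole unit of work is retried together if a transient error (now including 57P01) occurs
    await using var tx = await _context.Database.BeginTransactionAsync();
    // ... perform your changes on _context here ...
    await _context.SaveChangesAsync();
    await tx.CommitAsync();
});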
I was trying to implement the failover described here:
http://www.andreanolanusse.com/en/implementing-failover-and-load-balancing-in-datasnap-2010/
but if I add a database connection and kill the first server, then use some of the remote functions or .Open a TClientDataSet twice, it throws an exception and I need to reconnect to the database again.
Expected: '{' found: '-+0-9.' at position: 84
{"result":[{"rows":[0]},{"data":[44,#192#16#0(nabas - (Server 213) pacote36-trafegusgr]}]}#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0#0
(the error message contains about 23000 of those '#0')
Reconnecting works, but in a system that has hundreds of .Open calls and remote method calls, that isn't the best option.
And since the error is only thrown the second time I click the button and call the method, it looks like it shouldn't need to reconnect at all; it must be something I'm doing wrong.
Here's the project:
https://drive.google.com/file/d/0B6YhWGZN7O24Zl9pYkx6d3hrdDA/view?usp=sharing
It uses the PostgreSQL connection from Devart (DevartPostgreSQL).
Why does the error
`Error: The hook 'orm' is taking too long to load.`
occur so often when Sails is lifting? Even with the ORM timeout already set to 100000, it still occurs sometimes (not always). It usually happens after the PC has been restarted and Sails is run for the first time.
It also often occurs on a cloud server and on my laptop besides my PC, so it occurs across my test environments:
Windows 8.1 (PC and laptop) and Linux (Ubuntu 14.04)
Node.js version 0.10.38
Sails version 0.11
MongoDB version 3
The complete error report looks like this:
error: Error: The hook `orm` is taking too long to load.
Make sure it is triggering its `initialize()` callback, or else set `sails.config.orm._hookTimeout` to a higher value (currently 20000)
at [object Object].tooLong [as _onTimeout] (D:\Workspace\Hellowin\cannes\node_modules\sails\lib\app\private\loadHooks.js:92:21)
at Timer.listOnTimeout [as ontimeout] (timers.js:112:15) { [Error: The hook `orm` is taking too long to load.
Make sure it is triggering its `initialize()` callback, or else set `sails.config.orm._hookTimeout` to a higher value (currently 20000)] code: 'E_HOOK_TIMEOUT' }
D:\Workspace\Hellowin\cannes\node_modules\sails\node_modules\async\lib\async.js:30
if (called) throw new Error("Callback was already called.");
^
Error: Callback was already called.
at D:\Workspace\Hellowin\cannes\node_modules\sails\node_modules\async\lib\async.js:30:31
at process._tickDomainCallback (node.js:492:13)
Is it a MongoDB adapter bug?
I encountered the same error recently and found a syntax error in one of my models. Now it is working fine. So basically this error occurs because of some problem related to the database layer; check your models, connections, etc.
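For completeness, the two things usually worth double-checking are the hook timeout itself and the MongoDB connection definition. A rough sketch of what those files can look like in Sails 0.11 (the host/port/database values are placeholders, not taken from the question):

// config/orm.js -- raise the ORM hook timeout; the error message itself points at this setting
module.exports.orm = {
  _hookTimeout: 60000
};

// config/connections.js -- make sure the sails-mongo connection points at a reachable MongoDB instance
module.exports.connections = {
  mongodb: {
    adapter: 'sails-mongo',
    host: 'localhost',
    port: 27017,
    database: 'your_db_name'
  }
};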
Following the recommended transaction setup for Squeryl, in my Boot.scala:
import java.sql.DriverManager
import net.liftweb.squerylrecord.SquerylRecord
import org.squeryl.Session
import org.squeryl.adapters.H2Adapter

SquerylRecord.initWithSquerylSession(Session.create(
  DriverManager.getConnection("jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1", "sa", ""),
  new H2Adapter
))
The first startup works fine. I can connect via H2's web interface, and if I use my app it updates the database appropriately. However, if I restart Jetty without restarting the JVM, I get:
java.sql.SQLException: No suitable driver found for jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1
I get the same result if I replace "DB_CLOSE_DELAY=-1" with "AUTO_SERVER=TRUE", or remove it entirely.
Following the recommendations on the Squeryl list, I tried C3P0:
import com.mchange.v2.c3p0.ComboPooledDataSource

val cpds = new ComboPooledDataSource
cpds.setDriverClass("org.h2.Driver")
cpds.setJdbcUrl("jdbc:h2:lift_proto")
cpds.setUser("sa")
cpds.setPassword("")

org.squeryl.SessionFactory.concreteFactory =
  Some(() => Session.create(
    cpds.getConnection, new H2Adapter())
  )
This produces similar behavior:
WARNING: A C3P0Registry mbean is already registered. This probably means that an application using c3p0 was undeployed, but not all PooledDataSources were closed prior to undeployment. This may lead to resource leaks over time. Please take care to close all PooledDataSources.
To be sure it wasn't something I was doing that caused this, I started and stopped the server without calling a transaction { } block. No exceptions were thrown. I then added to my Boot.scala:
transaction { /* Do nothing */ }
And the exception was once again thrown (I'm assuming because connections are lazy). So I moved the db initialization code to its own file away from Lift:
SessionFactory.concreteFactory = Some(() =>
  Session.create(
    java.sql.DriverManager.getConnection("jdbc:h2:mem:test", "sa", ""),
    new H2Adapter
  ))
transaction {}
Results were unchanged. What am I doing wrong? I cannot find any mention of needing to explicitly close connections or sessions in the Squeryl documentation, and this is my first time using JDBC.
I found mention of the same issue here on the Lift google group, but no resolution.
Thanks for any help.
When you say you are restarting Jetty, I think what you're actually doing is reloading your webapp within Jetty. Neither the H2 database nor C3P0 will automatically shut down when your app reloads, which explains the errors you are receiving when Lift tries to initialize them a second time. You don't see the error when you don't create a transaction block because both H2 and C3P0 are initialized when the first DB connection is retrieved.
I tend to use BoneCP as a connection pool myself. You can configure the minimum number of pooled connections to be > 1, which will stop H2 from shutting down without the need for DB_CLOSE_DELAY=-1 (see the pool-setup sketch at the end of this answer). Then you can use:
LiftRules.unloadHooks append { () =>
  yourPool.close() // should destroy the pool and its associated threads
}
That will close all of the connections when Lift is shut down, which should properly shut down the H2 database as well.
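A minimal sketch of the pool setup mentioned above (BoneCP class and setter names as of its 0.7/0.8 releases; yourPool is the value the unloadHooks block closes):

import com.jolbox.bonecp.{BoneCP, BoneCPConfig}
import org.squeryl.{Session, SessionFactory}
import org.squeryl.adapters.H2Adapter

Class.forName("org.h2.Driver") // make sure the H2 driver is registered

val config = new BoneCPConfig
config.setJdbcUrl("jdbc:h2:lift_proto")
config.setUsername("sa")
config.setPassword("")
config.setMinConnectionsPerPartition(2) // keep connections open so H2 doesn't shut down
config.setMaxConnectionsPerPartition(10)

val yourPool = new BoneCP(config)

SessionFactory.concreteFactory =
  Some(() => Session.create(yourPool.getConnection, new H2Adapter))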
For the relevant part of our server stack, we're running:
NGINX 1.2.3
PHP-FPM 5.3.10 with PECL mongo 1.2.12
MongoDB 2.0.7
CentOS 6.2
We're getting some strange but predictable behavior when the MongoDB server goes away (crashes, gets killed, etc.). Even with a try/catch block around the connection code, i.e.:
try
{
    $mdb = new Mongo('mongodb://localhost:27017');
}
catch (MongoConnectionException $e)
{
    die( $e->getMessage() );
}

$db = $mdb->selectDB('collection_name');
Depending on which PHP-FPM workers have connected to mongo already, the connection state is cached, causing further exceptions to go unhandled, because the $mdb connection handler can't be used. The troubling thing is that the try does not consistently fail for a considerable amount of time, up to 15 minutes later, when -- I assume -- the php-fpm processes die/respawn.
Essentially, the behavior is that when you hit a worker that hasn't connected to mongo yet, you get the die message above, and when you connect to a worker that has, you get an unhandled exception from $mdb->selectDB('collection_name'); because catch does not run.
When PHP is a single process, i.e. via Apache with mod_php, this behavior does not occur. Just for posterity, going back to Apache/mod_php is not an option for us at this time.
Is there a way to fix this behavior? I don't want the connection state to be inconsistent between different php-fpm processes.
Edit:
While I wait for the driver to be fixed in this regard, my current workaround is to do a quick poll to determine whether the driver can handle requests, and then to skip loading the MongoDB library/running queries if it can't connect/query:
try
{
    // connect
    $mongo = new Mongo("mongodb://localhost:27017");

    // try to do anything with connection handle
    try
    {
        $mongo->YOUR_DB->YOUR_COLLECTION->findOne();
        $mongo->close();
        define('MONGO_STATE', TRUE);
    }
    catch (MongoCursorException $e)
    {
        $mongo->close();
        error_log('Error connecting to MongoDB: ' . $e->getMessage());
        define('MONGO_STATE', FALSE);
    }
}
catch (MongoConnectionException $e)
{
    error_log('Error connecting to MongoDB: ' . $e->getMessage());
    define('MONGO_STATE', FALSE);
}
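The idea is then to gate any Mongo work elsewhere in the code on that constant; roughly like this (YOUR_DB/YOUR_COLLECTION are placeholders, as above):

if (MONGO_STATE) {
    $mongo = new Mongo('mongodb://localhost:27017');
    $doc = $mongo->YOUR_DB->YOUR_COLLECTION->findOne();
    // ... run queries as usual ...
    $mongo->close();
} else {
    // skip Mongo-backed features or serve a cached/degraded response
}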
The PHP mongo driver connectivity code is getting a big overhaul in the 1.3 release, currently in beta2 as of this writing. Based on your description, your issues may be resolved by the fixes for:
https://jira.mongodb.org/browse/PHP-158
https://jira.mongodb.org/browse/PHP-465
Once it is released you will be able to see the full list of fixes here:
https://jira.mongodb.org/browse/PHP/fixforversion/10499
Or, alternatively, on the PECL site. If you can test 1.3 and confirm that your issues are still present, then I'm sure the driver devs would love to hear from you before the 1.3.0 release, especially if it is easily reproducible.
Okay, this is a new one. I'm trying to debug my project, which I've done many times in the past, and I'm now getting this exception in one of my repositories. I haven't seen it before now. I haven't touched my repos in days, and my connection string is the same as it's always been. The inner exception states:
{"A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)"}
And the code it's choking on is:
public class HGArticleRepository : IArticleRepository
{
    private HGEntities _siteDB = new HGEntities();

    public List<Article> Articles
    {
        get { return _siteDB.Articles.ToList(); } // <-- this is the line
    }

    // more repo code
}
Again, like I said, I've never encountered this exception before, and I haven't touched my domain code in days.
This error usually means one of the following:
1. The connection string points to a nonexistent SQL Server instance.
2. The connection string points to a SQL Server instance that was shut down or not started.
3. The Named Pipes transport was disabled in the SQL Server settings.
Check them carefully one by one. In your case I guess it is 2.
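If it helps to narrow things down, a quick way to test points 1 and 2 outside of EF is to open a plain ADO.NET connection with the same server and database values (the connection string below is a placeholder; substitute the provider connection string from your config):

using System;
using System.Data.SqlClient;

class ConnectivityCheck
{
    static void Main()
    {
        // Placeholder connection string -- use your own server/instance and database names.
        var connStr = @"Data Source=.\SQLEXPRESS;Initial Catalog=YourDb;Integrated Security=True";
        using (var conn = new SqlConnection(connStr))
        {
            conn.Open(); // fails with the same "error: 40" if the instance is unreachable
            Console.WriteLine("Connected: " + conn.ServerVersion);
        }
    }
}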
A second possible solution:
Check that IIS is running.
In my case it was stopped, so I got the same error.