Partition is below target replica or instance count - azure-service-fabric

When attempting to publish a Service Fabric application to a local cluster, the cluster fails to load the application, reporting the error in the title. The stack trace points to an exception in OwinCommunicationListener.cs:
try
{
    this.eventSource.LogInfo("Starting web server on " + this.listeningAddress);
    this.webApp = WebApp.Start(this.listeningAddress, appBuilder => this.startup.Invoke(appBuilder));
    this.eventSource.LogInfo("Listening on " + this.publishAddress);
    return Task.FromResult(this.publishAddress);
}
catch (Exception ex)
{
    var logString = $"Web server failed to open endpoint {endpointName}. {ex.ToString()}";
    this.eventSource.LogFatal(logString);
    this.StopWebServer();
    throw ex; // points to this line from cluster manager
}
I am unable to inspect the thrown exception; the only information I can see is a TargetInvocationException with a stack trace pointing to the line noted above. Why won't this application load on my local cluster?

It's hard to say without an actual exception message or stack trace, but judging by where the exception was thrown and by the fact that the problem resolved itself the next morning, the most likely and most common cause is that the port you were trying to open the web listener on was taken by some other process at the time, and by the next morning the port was free again. This, by the way, isn't really specific to Service Fabric: you were simply trying to open a socket on a port that someone else was already using.
I'm honestly more curious about why you couldn't inspect the exception. I can think of three things off the top of my head to help with that:
Use "throw" instead of "throw ex" so you don't reset the stack trace.
Look at your logs. It looks like you're writing out an ETW event in your catch block. What did it say?
Use the Visual Studio debugger: Simply set a breakpoint in the catch block and start the application with debugging by pressing F5.

Related

Illegal access error when deleting Google Pub Sub subscription upon JVM shutdown

I'm trying to delete a Google Pub Sub subscription in a JVM shutdown hook, but I'm encountering an illegal access error from the Google Pub Sub subscription admin client when the shutdown hook runs. I've tried both sys.addShutdownHook and Runtime.getRuntime().addShutdownHook, but I get the same error either way.
val deleteInstanceCacheSubscriptionThread = new Thread {
  override def run: Unit = {
    cacheUpdateService.deleteInstanceCacheUpdateSubscription()
  }
}

sys.addShutdownHook(deleteInstanceCacheSubscriptionThread.run)
// Runtime.getRuntime().addShutdownHook(deleteInstanceCacheSubscriptionThread)
This is the stack trace:
Exception in thread "shutdownHook1" java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [META-INF/services/com.google.auth.http.HttpTransportFactory]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1385)
at org.apache.catalina.loader.WebappClassLoaderBase.findResources(WebappClassLoaderBase.java:985)
at org.apache.catalina.loader.WebappClassLoaderBase.getResources(WebappClassLoaderBase.java:1086)
at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
at com.google.common.collect.Iterators.getNext(Iterators.java:845)
at com.google.common.collect.Iterables.getFirst(Iterables.java:779)
at com.google.auth.oauth2.OAuth2Credentials.getFromServiceLoader(OAuth2Credentials.java:318)
at com.google.auth.oauth2.ServiceAccountCredentials.<init>(ServiceAccountCredentials.java:145)
at com.google.auth.oauth2.ServiceAccountCredentials.createScoped(ServiceAccountCredentials.java:505)
at com.google.api.gax.core.GoogleCredentialsProvider.getCredentials(GoogleCredentialsProvider.java:92)
at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:142)
at com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub.create(GrpcSubscriberStub.java:263)
at com.google.cloud.pubsub.v1.stub.SubscriberStubSettings.createStub(SubscriberStubSettings.java:242)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.<init>(SubscriptionAdminClient.java:178)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.create(SubscriptionAdminClient.java:159)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.create(SubscriptionAdminClient.java:150)
at com.company.pubsub.services.GooglePubSubService.$anonfun$deleteSubscription$2(GooglePubSubService.scala:384)
at com.company.utils.TryWithResources$.withResources(TryWithResources.scala:21)
at com.company.pubsub.services.GooglePubSubService.$anonfun$deleteSubscription$1(GooglePubSubService.scala:384)
at com.company.scalalogging.Logging.time(Logging.scala:43)
at com.company.scalalogging.Logging.time$(Logging.scala:35)
at com.company.pubsub.services.GooglePubSubService.time(GooglePubSubService.scala:30)
at com.company.pubsub.services.GooglePubSubService.deleteSubscription(GooglePubSubService.scala:382)
at com.company.cache.services.CacheUpdateService.deleteInstanceCacheUpdateSubscription(CacheUpdateService.scala:109)
at com.company.cache.services.CacheUpdateHandlerService$$anon$1.run(CacheUpdateHandlerService.scala:132)
at com.company.cache.services.CacheUpdateHandlerService$.$anonfun$addSubscriptionShutdownHook$1(CacheUpdateHandlerService.scala:135)
at scala.sys.ShutdownHookThread$$anon$1.run(ShutdownHookThread.scala:37)
It seems that by the time the shutdown hook runs, the web application (and its class loader) has already been stopped, so the Pub Sub subscription admin client can no longer be created. But I was wondering if there is any way to delete the subscription before this happens.
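One way to avoid the problem, sketched below in plain Java for illustration (the class name, project ID, and subscription ID are placeholders, and this assumes the subscription name is known at startup), is to create the SubscriptionAdminClient eagerly while the web application's class loader is still active. The ServiceLoader lookup in the stack trace happens during client creation, so a client built at startup can still be used inside the hook:
import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.pubsub.v1.ProjectSubscriptionName;

public class SubscriptionCleanup {
    public static void registerShutdownHook() throws Exception {
        // Created at startup, while META-INF/services entries can still be loaded.
        SubscriptionAdminClient adminClient = SubscriptionAdminClient.create();
        // Placeholder project and subscription IDs.
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "instance-cache-updates");

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // No class loading happens here; the client is already fully built.
            adminClient.deleteSubscription(subscription);
            adminClient.close();
        }));
    }
}
Calling something like registerShutdownHook() during application startup would move all class loading out of the shutdown path; whether that fits your lifecycle depends on when the subscription is created.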

How to properly handle exceptions in MongoClient for VertX

In the startup method of my application I want to check that the MongoDB credentials provided to the application are OK. If they are OK, I continue the startup; if not, the application is supposed to exit because it cannot connect to the DB. The code snippet is below:
// Create the client
MongoClient mongodb = null;
try {
    mongodb = MongoClient.createShared(vertx, mongo_cnf, mongo_cnf.getString("pool_name"));
}
catch(Exception e) {
    log.error("Unable to create MongoDB client. Cause: '{}'. Bailing out", e.getMessage());
    System.exit(-1);
}
If I provide wrong credentials, the catch block is not called. Yet I get the following on the console:
19:35:43.017 WARN org.mongodb.driver.connection - Exception thrown during connection pool background maintenance task
com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=null, userName='user', source='admin', password=<hidden>, mechanismProperties={}}
at com.mongodb.connection.SaslAuthenticator.wrapException(SaslAuthenticator.java:162)
at com.mongodb.connection.SaslAuthenticator.access$200(SaslAuthenticator.java:39)
... many lines
The question is: how do I intercept this exception in my code and handle it properly?
The exception is thrown in the MongoDB Java driver's daemon thread, so you cannot catch it in your own code.
The Vert.x MongoClient abstracts away direct interaction with the MongoDB Java driver, so you can't modify anything related to the underlying client.
You could reach the underlying Mongo client instance via reflection, but since it's already created you cannot pass additional configuration to it.
If you used com.mongodb.async.client.MongoClient you could pass a ServerListener, which could access the exception so you could examine it (please see this answer for more details - https://stackoverflow.com/a/46526000/1126831).
But a ServerListener can only be specified at the moment the Mongo client is constructed, which happens inside the Vert.x MongoClient wrapper, and there's no way to pass this additional configuration through.
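For illustration only, here is a rough sketch of what such a listener could look like if you did control client construction. It uses the newer MongoClientSettings builder API (driver 3.7+) rather than the exact com.mongodb.async.client API referenced above, the class name is hypothetical, and, as noted, the Vert.x wrapper does not currently let you plug this in:
import com.mongodb.MongoClientSettings;
import com.mongodb.event.ServerClosedEvent;
import com.mongodb.event.ServerDescriptionChangedEvent;
import com.mongodb.event.ServerListener;
import com.mongodb.event.ServerOpeningEvent;

public class ListeningSettings {
    public static MongoClientSettings build() {
        return MongoClientSettings.builder()
            .applyToServerSettings(server -> server.addServerListener(new ServerListener() {
                @Override
                public void serverOpening(ServerOpeningEvent event) { }

                @Override
                public void serverClosed(ServerClosedEvent event) { }

                @Override
                public void serverDescriptionChanged(ServerDescriptionChangedEvent event) {
                    // Connection/authentication failures surface as the new description's exception.
                    if (event.getNewDescription().getException() != null) {
                        // React here: log the failure, flag the client as unusable, etc.
                    }
                }
            }))
            .build();
    }
}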
Currently the exception is not thrown, which in my opinion is a design mistake, since you receive an object you cannot work with. Feel free to open a bug: https://github.com/vert-x3/vertx-mongo-client/issues
What you can do to detect that your client is "dead on arrival" is to wait for the connection timeout:
// Default is 30s, which is quite long
JsonObject config = new JsonObject().put("serverSelectionTimeoutMS", 5_000);
MongoClient client = MongoClient.createShared(vertx, config, "pool_name");
client.findOne("some_collection", json1, json2, (h) -> {
    if (h.succeeded()) {
        //...
    }
    else {
        // Notify that the client is dead
    }
});

Handling connection failures in apache-camel

I am writing an apache-camel RabbitMQ consumer. I would like to react somehow to connection problems (i.e. try to reconnect). Is it possible to configure apache-camel to automatically reconnect?
If not, how can I find out that a connection to the queue was interrupted? I've done the following test:
start the queue (and some producer)
start my consumer (it was getting messages as expected)
stop the queue (the messages stopped arriving, as expected, but no exception was thrown)
start the queue (no new messages were received)
I am using Camel in Scala (via akka-camel), but a Java solution would probably also be OK.
You can pass the flag automaticRecoveryEnabled=true in the endpoint URI, and Camel will reconnect if the connection is lost.
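For example, a minimal route sketch in Java (the class name, host, exchange, and queue names are placeholders, and option names may differ slightly between camel-rabbitmq versions):
import org.apache.camel.builder.RouteBuilder;

public class RabbitConsumerRoute extends RouteBuilder {
    @Override
    public void configure() {
        // automaticRecoveryEnabled tells the underlying RabbitMQ client to
        // re-establish the connection if it is lost.
        from("rabbitmq://localhost:5672/my.exchange"
                + "?queue=my.queue"
                + "&automaticRecoveryEnabled=true")
            .to("log:incoming");
    }
}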
For automatic RabbitMQ resource recovery (Connections/Channels/Consumers/Queues/Exchanges/Bindings) when failures occur, check out Lyra (which I authored). Example usage:
Config config = new Config()
    .withRecoveryPolicy(new RecoveryPolicy()
        .withMaxAttempts(20)
        .withInterval(Duration.seconds(1))
        .withMaxDuration(Duration.minutes(5)));

ConnectionOptions options = new ConnectionOptions().withHost("localhost");
Connection connection = Connections.create(options, config);
The rest of the API is just the amqp-client API, except your resources are automatically recovered when failures occur.
I'm not sure about camel-rabbitmq specifically, but hopefully there's a way you can swap in your own resource creation via Lyra.
The current camel-rabbitmq just creates a connection and channel when the consumer or producer is started, so it doesn't get a chance to catch the connection exception :(.

PHP-FPM processes holding onto MongoDB connection states

For the relevant part of our server stack, we're running:
NGINX 1.2.3
PHP-FPM 5.3.10 with PECL mongo 1.2.12
MongoDB 2.0.7
CentOS 6.2
We're getting some strange but predictable behavior when the MongoDB server goes away (crashes, gets killed, etc.). Even with a try/catch block around the connection code, i.e.:
try
{
    $mdb = new Mongo('mongodb://localhost:27017');
}
catch (MongoConnectionException $e)
{
    die( $e->getMessage() );
}

$db = $mdb->selectDB('collection_name');
Depending on which PHP-FPM workers have already connected to Mongo, the connection state is cached, causing further exceptions to go unhandled because the $mdb connection handle can't be used. The troubling thing is that the try block does not fail consistently for a considerable amount of time, up to 15 minutes later, when -- I assume -- the PHP-FPM processes die and respawn.
Essentially, the behavior is that when you hit a worker that hasn't connected to mongo yet, you get the die message above, and when you connect to a worker that has, you get an unhandled exception from $mdb->selectDB('collection_name'); because catch does not run.
When PHP is a single process, i.e. via Apache with mod_php, this behavior does not occur. Just for posterity, going back to Apache/mod_php is not an option for us at this time.
Is there a way to fix this behavior? I don't want the connection state to be inconsistent between different php-fpm processes.
Edit:
While I wait for the driver to be fixed in this regard, my current workaround is to do a quick poll to determine whether the driver can handle requests, and then skip loading the MongoDB library and running queries if it can't connect or query:
try
{
    // connect
    $mongo = new Mongo("mongodb://localhost:27017");

    // try to do anything with connection handle
    try
    {
        $mongo->YOUR_DB->YOUR_COLLECTION->findOne();
        $mongo->close();
        define('MONGO_STATE', TRUE);
    }
    catch(MongoCursorException $e)
    {
        $mongo->close();
        error_log('Error connecting to MongoDB: ' . $e->getMessage() );
        define('MONGO_STATE', FALSE);
    }
}
catch(MongoConnectionException $e)
{
    error_log('Error connecting to MongoDB: ' . $e->getMessage() );
    define('MONGO_STATE', FALSE);
}
The PHP mongo driver's connectivity code is getting a big overhaul in the 1.3 release, currently in beta2 as of this writing. Based on your description, your issues may be resolved by the fixes for:
https://jira.mongodb.org/browse/PHP-158
https://jira.mongodb.org/browse/PHP-465
Once it is released you will be able to see the full list of fixes here:
https://jira.mongodb.org/browse/PHP/fixforversion/10499
Or, alternatively, on the PECL site. If you can test 1.3 and confirm that your issues are still present, I'm sure the driver devs would love to hear from you before the 1.3.0 release, especially if the problem is easily reproducible.

EF4 EntityException - The underlying provider failed on Open

Okay, this is a new one. I'm trying to debug my project, which I've done many times in the past, and I'm now getting this exception in one of my repositories. I haven't seen it before now. I haven't touched my repos in days, and my connection string is the same as it's always been. The inner exception states:
{"A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)"}
And the code it's choking on is:
public class HGArticleRepository : IArticleRepository
{
    private HGEntities _siteDB = new HGEntities();

    public List<Article> Articles
    {
        get { return _siteDB.Articles.ToList(); } // <-- this is the line
    }

    // more repo code
}
Again, like I said, I've never encountered this exception before, and I haven't touched my domain code in days.
This error usually means one of the following:
The connection string points to a nonexistent SQL Server instance.
The connection string points to a SQL Server instance that was shut down or never started.
The Named Pipes transport is disabled in the SQL Server settings.
Check them carefully one by one. In your case, my guess is the second one.
A second possible solution:
Check that IIS is running.
In my case it was stopped, so I got the same error.