Amazon.Runtime.ClientConfig.get_RetryMode() deadlock - aws-sdk-net

I'm running a .NET Framework ASP.NET Web API application on Elastic Beanstalk, and it occasionally becomes unresponsive.
We have captured process dumps of w3wp.exe, and in them the blocking thread is stuck in a call to Amazon.Runtime.ClientConfig.get_RetryMode(). Because that call never returns, the thread never releases the lock it obtained in Amazon.Runtime.Internal.Util.Logger.GetLogger, so subsequent calls block in GetLogger() waiting for a lock that is never released.
Any idea why Amazon.Runtime.ClientConfig.get_RetryMode() doesn't return?
Blocking Call Stack
Amazon.Runtime.ClientConfig.get_RetryMode()+2f
Amazon.Runtime.AmazonServiceClient.BuildRuntimePipeline()+1cd
AWS.Logger.Core.AWSLoggerCore..ctor(AWS.Logger.AWSLoggerConfig, System.String)+1b0
AWS.Logger.Log4net.AWSAppender.ActivateOptions()+131
log4net.Repository.Hierarchy.XmlHierarchyConfigurator.ParseAppender(System.Xml.XmlElement)+47b
log4net.Repository.Hierarchy.XmlHierarchyConfigurator.FindAppenderByReference(System.Xml.XmlElement)+1dc
log4net.Repository.Hierarchy.XmlHierarchyConfigurator.ParseChildrenOfLoggerElement(System.Xml.XmlElement, log4net.Repository.Hierarchy.Logger, Boolean)+110
log4net.Repository.Hierarchy.XmlHierarchyConfigurator.ParseRoot(System.Xml.XmlElement)+5f
log4net.Repository.Hierarchy.XmlHierarchyConfigurator.Configure(System.Xml.XmlElement)+554
log4net.Repository.Hierarchy.Hierarchy.XmlRepositoryConfigure(System.Xml.XmlElement)+c9
log4net.Config.XmlConfigurator.InternalConfigure(log4net.Repository.ILoggerRepository, System.IO.Stream)+2ad
log4net.Config.XmlConfigurator.InternalConfigure(log4net.Repository.ILoggerRepository, System.IO.FileInfo)+18f
log4net.Config.XmlConfigurator.Configure(log4net.Repository.ILoggerRepository, System.Uri)+77
log4net.Core.DefaultRepositorySelector.ConfigureRepository(System.Reflection.Assembly, log4net.Repository.ILoggerRepository)+2d1
log4net.Core.DefaultRepositorySelector.CreateRepository(System.Reflection.Assembly, System.Type, System.String, Boolean)+2bf
log4net.Core.DefaultRepositorySelector.GetRepository(System.Reflection.Assembly)+3e
log4net.Core.LoggerManager.GetLogger(System.Reflection.Assembly, System.Type)+45
[[DebuggerU2MCatchHandlerFrame]]
[[HelperMethodFrame_PROTECTOBJ] (System.RuntimeMethodHandle.InvokeMethod)] System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean)
mscorlib_ni!System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(System.Object, System.Object[], System.Object[])+84
mscorlib_ni!System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)+92
Amazon.Runtime.Internal.Util.InternalLog4netLogger..ctor(System.Type)+d8
Amazon.Runtime.Internal.Util.Logger..ctor(System.Type)+5f
Amazon.Runtime.Internal.Util.Logger.GetLogger(System.Type)+af
Amazon.Runtime.AppConfigAWSCredentials..ctor()+33
Amazon.Runtime.FallbackCredentialsFactory+<>c.b__10_0()+1f
Amazon.Runtime.FallbackCredentialsFactory.GetCredentials(Boolean)+c7
Amazon.SimpleSystemsManagement.AmazonSimpleSystemsManagementClient..ctor(Amazon.RegionEndpoint)+3b
// UserCode new Amazon.SimpleSystemsManagement.AmazonSimpleSystemsManagementClient(region)
Blocked Call Stack
System.Threading.Monitor.Enter(System.Object)
Amazon.Runtime.Internal.Util.Logger.GetLogger(System.Type)+68
Amazon.Runtime.AmazonServiceClient..ctor(Amazon.Runtime.AWSCredentials, Amazon.Runtime.ClientConfig)+8d
// UserCode new Amazon.SimpleSystemsManagement.AmazonSimpleSystemsManagementClient(region)

I think the deadlock was caused by the fallback process:
AWS appender
-> obtains the Logger lock
-> would release the Logger lock after setup finishes
-> setup is missing some info, so it calls the fallback to figure it out
-> the connection fails, get_RetryMode()
-> tries to obtain the Logger lock (deadlock)
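The dumps only show that the blocked thread waits for the GetLogger lock while the blocking thread, which holds it, never returns from get_RetryMode(). For that to be a permanent deadlock, the blocking thread must itself be waiting on something another blocked thread holds. One plausible shape, an assumption that the stacks above do not confirm, is a lock-order inversion. The sketch below is a Java analogue (the SDK itself is C#): LOGGER_LOCK stands in for the GetLogger lock, and CONFIG_LOCK is a hypothetical lock that get_RetryMode() might wait on. Two threads take them in opposite orders, and a timed tryLock is used so the demo reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Java analogue of a lock-order inversion. LOGGER_LOCK stands in for the lock taken in
// Logger.GetLogger; CONFIG_LOCK is a HYPOTHETICAL lock that get_RetryMode() might wait on.
class LockOrderInversionDemo {
    static final ReentrantLock LOGGER_LOCK = new ReentrantLock();
    static final ReentrantLock CONFIG_LOCK = new ReentrantLock();

    /** Returns true if neither thread could take its second lock, i.e. plain
     *  lock() calls would have deadlocked. */
    static boolean demonstrateInversion() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean aGotSecond = new AtomicBoolean();
        AtomicBoolean bGotSecond = new AtomicBoolean();

        // Thread A: logger lock first (like the appender setup), then the config lock.
        Thread a = new Thread(() -> lockInOrder(LOGGER_LOCK, CONFIG_LOCK, bothHoldFirst, bothTried, aGotSecond));
        // Thread B: config lock first, then the logger lock.
        Thread b = new Thread(() -> lockInOrder(CONFIG_LOCK, LOGGER_LOCK, bothHoldFirst, bothTried, bGotSecond));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !aGotSecond.get() && !bGotSecond.get();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                    AtomicBoolean gotSecond) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();   // wait until the other thread holds its first lock
            // A plain second.lock() would block forever here; a timed tryLock exposes the cycle.
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            gotSecond.set(acquired);
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();       // keep holding the first lock until both threads have tried
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

demonstrateInversion() returns true because each thread holds the lock the other one needs. If this is the cause, the usual remedy is to acquire shared locks in one consistent order, or to finish the nested initialization (here, the log4net/AWS appender setup) before other threads can race with it.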

Related

Spring Batch Job Stop Using jobOperator

I have started my job using jobLauncher.run(processJob, jobParameters); and when I try to stop the job from another request using jobOperator.stop(jobExecution.getId()); I get this exception:
org.springframework.batch.core.launch.JobExecutionNotRunningException:
JobExecution must be running so that it can be stopped
Set<JobExecution> jobExecutionsSet = jobExplorer.findRunningJobExecutions("processJob");
for (JobExecution jobExecution : jobExecutionsSet) {
    System.err.println("job status : " + jobExecution.getStatus());
    if (jobExecution.getStatus() == BatchStatus.STARTED || jobExecution.getStatus() == BatchStatus.STARTING || jobExecution.getStatus() == BatchStatus.STOPPING) {
        jobOperator.stop(jobExecution.getId());
        System.out.println("###########Stopped#########");
    }
}
When I print the job status I always get job status : STOPPING, but the batch job keeps running.
It's a web app: the user first uploads a CSV file, which starts an operation using Spring Batch. If the user needs to stop it during this execution, a stop request arrives at another controller method, which tries to stop the running job.
Please help me stop the running job.
If you stop a job while it is running (typically in a STARTED state), you should not get this exception. If you have this exception, it means you have stopped your job while it is currently stopping (that is what the STOPPING status means).
jobExplorer.findRunningJobExecutions returns only running executions, so if a job is in STOPPING status on the very next line, its status changed right after the call to jobExplorer.findRunningJobExecutions. You need to be aware that this is possible, and your controller should handle this case.
When you tell Spring Batch to stop a job, it goes into STOPPING mode. This means it will attempt to complete the chunk (unit of work) it is currently processing and then stop. What is likely happening is that you are working on a long-running task that does not finish its unit of work (is it hung?), so the job can't move from STOPPING to STOPPED.
Doing it twice rightly leads to an exception, because your job is already STOPPING after the first call.
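The status rules described in these answers can be modeled in a few lines. SimulatedJobExecution is a hypothetical, simplified model, not Spring Batch code: IllegalStateException stands in for JobExecutionNotRunningException, and only STARTED and STARTING are treated as stoppable, which appears to match the check SimpleJobOperator.stop() performs. tryStop shows one way a controller could tolerate the race in which another request already stopped the job.

```java
import java.util.EnumSet;
import java.util.Set;

// HYPOTHETICAL model of the status transitions described above; not Spring Batch code.
class SimulatedJobExecution {
    enum Status { STARTING, STARTED, STOPPING, STOPPED, COMPLETED }

    private static final Set<Status> STOPPABLE = EnumSet.of(Status.STARTING, Status.STARTED);
    private Status status = Status.STARTED;

    Status getStatus() { return status; }

    /** Stop request: only legal while the job is actually running. */
    void stop() {
        if (!STOPPABLE.contains(status)) {
            // Stand-in for JobExecutionNotRunningException.
            throw new IllegalStateException("JobExecution must be running so that it can be stopped");
        }
        status = Status.STOPPING;  // the current chunk keeps running
    }

    /** Called when the current chunk (unit of work) finishes. */
    void finishCurrentChunk() {
        if (status == Status.STOPPING) {
            status = Status.STOPPED;  // the stop only takes effect at a chunk boundary
        }
    }

    /** Controller-side helper: returns false instead of failing if the job is no longer stoppable. */
    static boolean tryStop(SimulatedJobExecution execution) {
        try {
            execution.stop();
            return true;
        } catch (IllegalStateException alreadyStopping) {
            return false;
        }
    }
}
```

In this model the first tryStop returns true and leaves the job in STOPPING; a second tryStop returns false rather than throwing; and the job only reaches STOPPED once finishCurrentChunk() runs, which is why a hung chunk keeps it in STOPPING indefinitely.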

Illegal access error when deleting Google Pub Sub subscription upon JVM shutdown

I'm trying to delete a Google Pub Sub subscription in a JVM shutdown hook, but I'm encountering an illegal access error with the Google Pub Sub subscription admin client when the shutdown hook runs. I've tried using both sys.addShutdownHook as well as Runtime.getRuntime().addShutdownHook, but I get the same error either way.
val deleteInstanceCacheSubscriptionThread = new Thread {
  override def run(): Unit = {
    cacheUpdateService.deleteInstanceCacheUpdateSubscription()
  }
}
sys.addShutdownHook(deleteInstanceCacheSubscriptionThread.run())
// Runtime.getRuntime().addShutdownHook(deleteInstanceCacheSubscriptionThread)
This is the stack trace:
Exception in thread "shutdownHook1" java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [META-INF/services/com.google.auth.http.HttpTransportFactory]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1385)
at org.apache.catalina.loader.WebappClassLoaderBase.findResources(WebappClassLoaderBase.java:985)
at org.apache.catalina.loader.WebappClassLoaderBase.getResources(WebappClassLoaderBase.java:1086)
at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
at com.google.common.collect.Iterators.getNext(Iterators.java:845)
at com.google.common.collect.Iterables.getFirst(Iterables.java:779)
at com.google.auth.oauth2.OAuth2Credentials.getFromServiceLoader(OAuth2Credentials.java:318)
at com.google.auth.oauth2.ServiceAccountCredentials.<init>(ServiceAccountCredentials.java:145)
at com.google.auth.oauth2.ServiceAccountCredentials.createScoped(ServiceAccountCredentials.java:505)
at com.google.api.gax.core.GoogleCredentialsProvider.getCredentials(GoogleCredentialsProvider.java:92)
at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:142)
at com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub.create(GrpcSubscriberStub.java:263)
at com.google.cloud.pubsub.v1.stub.SubscriberStubSettings.createStub(SubscriberStubSettings.java:242)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.<init>(SubscriptionAdminClient.java:178)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.create(SubscriptionAdminClient.java:159)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.create(SubscriptionAdminClient.java:150)
at com.company.pubsub.services.GooglePubSubService.$anonfun$deleteSubscription$2(GooglePubSubService.scala:384)
at com.company.utils.TryWithResources$.withResources(TryWithResources.scala:21)
at com.company.pubsub.services.GooglePubSubService.$anonfun$deleteSubscription$1(GooglePubSubService.scala:384)
at com.company.scalalogging.Logging.time(Logging.scala:43)
at com.company.scalalogging.Logging.time$(Logging.scala:35)
at com.company.pubsub.services.GooglePubSubService.time(GooglePubSubService.scala:30)
at com.company.pubsub.services.GooglePubSubService.deleteSubscription(GooglePubSubService.scala:382)
at com.company.cache.services.CacheUpdateService.deleteInstanceCacheUpdateSubscription(CacheUpdateService.scala:109)
at com.company.cache.services.CacheUpdateHandlerService$$anon$1.run(CacheUpdateHandlerService.scala:132)
at com.company.cache.services.CacheUpdateHandlerService$.$anonfun$addSubscriptionShutdownHook$1(CacheUpdateHandlerService.scala:135)
at scala.sys.ShutdownHookThread$$anon$1.run(ShutdownHookThread.scala:37)
It seems that by the time the shutdown hook runs, the Pub Sub library has already shut down, so the subscription admin client can no longer be used. Is there any way to delete the subscription before this happens?

Handling connection failures in apache-camel

I am writing an apache-camel RabbitMQ consumer. I would like to react somehow to connection problems (i.e. try to reconnect). Is it possible to configure apache-camel to automatically reconnect?
If not, how can I find out that a connection to the queue was interrupted? I've done the following test:
start the queue (and some producer)
start my consumer (it was getting messages as expected)
stop the queue (the messages stopped arriving, as expected, but no exception was thrown)
start the queue (no new messages were received)
I am using Camel in Scala (via akka-camel), but a Java solution would probably also be OK.
You can pass the flag automaticRecoveryEnabled=true in the URI; Camel will then reconnect if the connection is lost.
For automatic RabbitMQ resource recovery (Connections/Channels/Consumers/Queues/Exchanges/Bindings) when failures occur, check out Lyra (which I authored). Example usage:
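import com.rabbitmq.client.Connection;
import net.jodah.lyra.ConnectionOptions;
import net.jodah.lyra.Connections;
import net.jodah.lyra.config.Config;
import net.jodah.lyra.config.RecoveryPolicy;
import net.jodah.lyra.util.Duration;

Config config = new Config()
    .withRecoveryPolicy(new RecoveryPolicy()
        .withMaxAttempts(20)
        .withInterval(Duration.seconds(1))
        .withMaxDuration(Duration.minutes(5)));
ConnectionOptions options = new ConnectionOptions().withHost("localhost");
Connection connection = Connections.create(options, config);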
The rest of the API is just the amqp-client API, except your resources are automatically recovered when failures occur.
I'm not sure about camel-rabbitmq specifically, but hopefully there's a way you can swap in your own resource creation via Lyra.
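To make the three policy settings in the Lyra example concrete (maximum attempts, interval, maximum duration), here is a generic fixed-interval retry loop bounded by an attempt count and an overall time budget. It is not Lyra's implementation, and RetryingConnector is a hypothetical name; the connect action would be whatever opens your AMQP connection.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Generic fixed-interval retry, loosely mirroring the Lyra RecoveryPolicy settings.
// NOT Lyra's implementation; RetryingConnector is a hypothetical helper.
class RetryingConnector {
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts,
                                  Duration interval, Duration maxDuration) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long deadline = System.nanoTime() + maxDuration.toNanos();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;
                // Stop after the last attempt, or if waiting again would exceed the time budget.
                if (attempt == maxAttempts || System.nanoTime() + interval.toNanos() > deadline) {
                    break;
                }
                Thread.sleep(interval.toMillis());
            }
        }
        throw last;  // rethrow the most recent failure
    }
}
```

For example, with maxAttempts = 20, interval = Duration.ofSeconds(1) and maxDuration = Duration.ofMinutes(5), an action that fails three times and then succeeds returns on its fourth call; an action that keeps failing surfaces its last exception to the caller.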
The current camel-rabbitmq component just creates the connection and the channel when the consumer or producer is started, so it has no chance to catch a later connection exception.

Squeryl 0.9.5 (with Lift 2.4) not releasing database connections/pools

Following the recommended transaction setup for Squeryl, in my Boot.scala:
import java.sql.DriverManager
import net.liftweb.squerylrecord.SquerylRecord
import org.squeryl.Session
import org.squeryl.adapters.H2Adapter
SquerylRecord.initWithSquerylSession(Session.create(
DriverManager.getConnection("jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1", "sa", ""),
new H2Adapter
))
The first startup works fine. I can connect via H2's web interface, and when I use my app it updates the database appropriately. However, if I restart Jetty without restarting the JVM, I get:
java.sql.SQLException: No suitable driver found for jdbc:h2:lift_proto.db;DB_CLOSE_DELAY=-1
The same result is had if I replace "DB_CLOSE_DELAY=-1" with "AUTO_SERVER=TRUE", or remove it entirely.
Following the recommendations on the Squeryl list, I tried C3P0:
import com.mchange.v2.c3p0.ComboPooledDataSource
val cpds = new ComboPooledDataSource
cpds.setDriverClass("org.h2.Driver")
cpds.setJdbcUrl("jdbc:h2:lift_proto")
cpds.setUser("sa")
cpds.setPassword("")
org.squeryl.SessionFactory.concreteFactory =
Some(() => Session.create(
cpds.getConnection, new H2Adapter())
)
This produces similar behavior:
WARNING: A C3P0Registry mbean is already registered. This probably means that an application using c3p0 was undeployed, but not all PooledDataSources were closed prior to undeployment. This may lead to resource leaks over time. Please take care to close all PooledDataSources.
To be sure it wasn't anything I was doing which was causing this, I started and stopped the server without calling a transaction { } block. No exceptions were thrown. I then added to my Boot.scala:
transaction { /* Do nothing */ }
And the exception was once again thrown (I'm assuming because connections are lazy). So I moved the db initialization code to its own file away from Lift:
SessionFactory.concreteFactory = Some(()=>
Session.create(
java.sql.DriverManager.getConnection("jdbc:h2:mem:test", "sa", ""),
new H2Adapter
))
transaction {}
Results were unchanged. What am I doing wrong? I cannot find any mention of needing to explicitly close connections or sessions in the Squeryl documentation, and this is my first time using JDBC.
I found mention of the same issue here on the Lift google group, but no resolution.
Thanks for any help.
When you say you are restarting Jetty, I think what you're actually doing is reloading your webapp within Jetty. Neither the h2 database nor C3P0 automatically shuts down when your app reloads, which explains the errors you receive when Lift tries to initialize them a second time. You don't see the error when you don't create a transaction block because both h2 and C3P0 are only initialized when the first DB connection is retrieved.
I tend to use BoneCP as a connection pool myself. You can configure the minimum number of pooled connections to be > 1, which will stop h2 from shutting down without the need for DB_CLOSE_DELAY=-1. Then you can use:
LiftRules.unloadHooks append { () =>
  yourPool.close() // should destroy the pool and its associated threads
}
That will close all of the connections when Lift is shut down, which should properly shut down the h2 database as well.
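The unload-hook pattern above can be sketched in plain Java. UnloadHooks and HypotheticalPool are invented stand-ins for LiftRules.unloadHooks and a real pool such as BoneCP or C3P0; the point is only that the pool is closed when the webapp unloads, so a reload starts from a clean state, and that one failing hook should not prevent the others from running.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL stand-in for LiftRules.unloadHooks: cleanup tasks run when the webapp unloads.
class UnloadHooks {
    private final List<Runnable> hooks = new ArrayList<>();

    void append(Runnable hook) { hooks.add(hook); }

    /** Called by the container when the webapp is unloaded or reloaded. */
    void runAll() {
        for (Runnable hook : hooks) {
            try {
                hook.run();
            } catch (RuntimeException e) {
                // Keep going so one failing hook cannot leave other resources open.
                System.err.println("unload hook failed: " + e);
            }
        }
    }
}

// HYPOTHETICAL pool; a real one would close its connections and stop its helper threads.
class HypotheticalPool implements AutoCloseable {
    private boolean closed;

    boolean isClosed() { return closed; }

    @Override
    public void close() {
        closed = true;
    }
}
```

Registering the pool is then a one-liner, hooks.append(pool::close), mirroring the yourPool.close() call in the Lift hook.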

Security Exception in self signed sound recorder Applet

I have created an applet for recording sound. It throws an exception when I try to open a dataline.
TargetDataLine.open()
java.security.AccessControlException: access denied (javax.sound.sampled.AudioPermission record)
My applet is self-signed, and all the other JAR files are self-signed as well.
Previously I used a separate thread to start and close the TargetDataLine. Afterwards, instead of creating another thread, I switched to an ExecutorService. It works fine with the thread but throws the above exception with the ExecutorService.
Since the executor service starts a new thread when there is a call from JavaScript, the security level of that thread is set to that of the JavaScript thread.
So using AccessController.doPrivileged helps to solve the problem. It is explained here how to do it.
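A minimal sketch of the doPrivileged fix, assuming the recording code is submitted to an ExecutorService. PrivilegedRecorder and submitPrivileged are hypothetical names; in the applet, the PrivilegedAction would open the TargetDataLine. Note that AccessController has been deprecated for removal in recent JDKs along with the Security Manager, so this only matters on applet-era runtimes.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// HYPOTHETICAL helper: runs the permission-sensitive action inside doPrivileged on the
// executor's worker thread, so the permission check stops at this (signed) code instead
// of also considering the less-privileged JavaScript caller that caused the thread to exist.
class PrivilegedRecorder {
    @SuppressWarnings("removal")  // AccessController is deprecated for removal in newer JDKs
    static <T> Future<T> submitPrivileged(ExecutorService executor, PrivilegedAction<T> action) {
        return executor.submit(() -> AccessController.doPrivileged(action));
    }
}
```

In the applet, the action passed in would be something like a lambda that calls line.open(format) and returns the line; the caller can then wait on the returned Future.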