I have a Camel application deployed on JBoss in a WAR file with a spring configuration for starting the Camel context.
It deploys and runs very nicely on a JBoss EAP 7.0.0.GA.
When I change values in a property file that my application depends on and touch the WAR file, JBoss normally redeploys the application. But in some cases the redeployment fails.
I get the following in the server.log:
2017-07-25 12:05:26.671 INFO class=org.apache.camel.impl.DefaultShutdownStrategy thread="ServerService Thread Pool -- 74" Starting to graceful shutdown 12 routes (timeout 300 seconds)
2017-07-25 12:05:26.725 INFO class=org.apache.camel.impl.DefaultShutdownStrategy thread="Camel (interfacedb) thread #2 - ShutdownTask" Waiting as there are still 4 inflight and pending exchanges to complete, timeout in 300 seconds. Inflights per route: [interfacePersistDirect = 1, route1 = 1, pullFromTransferEntityTable = 1, lastScheduledRun = 1]
...
2017-07-25 12:10:26.691 WARN class=org.apache.camel.impl.DefaultShutdownStrategy thread="ServerService Thread Pool -- 74" Timeout occurred during graceful shutdown. Forcing the routes to be shutdown now. Notice: some resources may still be running as graceful shutdown did not complete successfully.
2017-07-25 12:10:26.691 WARN class=org.apache.camel.impl.DefaultShutdownStrategy thread="Camel (interfacedb) thread #2 - ShutdownTask" Interrupted while waiting during graceful shutdown, will force shutdown now.
2017-07-25 12:10:26.694 INFO class=org.apache.camel.impl.DefaultShutdownStrategy thread="ServerService Thread Pool -- 74" Graceful shutdown of 12 routes completed in 300 seconds
After this the application will not start again. JBoss reports the following in the myApp.war.failed file in the deployments folder.
"WFLYDS0022: Did not receive a response to the deployment operation within the allowed timeout period [600 seconds]. Check the server configuration file and the server logs to find more about the status of the deployment."
The application normally deploys a lot quicker than 600 seconds. I can touch the war file or delete the .failed file, which normally triggers a redeployment, but JBoss keeps giving me the error above in the .failed file.
The application starts normally if I restart the JBoss VM, but I would like to avoid restarting the other applications running on the JBoss instance.
Any suggestions?
Related
I have been running PAM 7.9/jBPM 7.48 for about a year under JBoss EAP 7.3. My jBPM KieServer persists using SQL Server. I deployed the KieServer repeatedly yesterday without problems, but deploying today fails.
2021-12-16 15:25:53,645 ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 1) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'full-replace-deployment' at address '[]'
2021-12-16 15:26:03,649 ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 1) WFLYCTL0190: Step handler org.jboss.as.server.deployment.DeploymentHandlerUtil$4@74e289e9 for operation full-replace-deployment at address [] failed handling operation rollback -- java.util.concurrent.TimeoutException: java.util.concurrent.TimeoutException
at org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:523)
at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1518)
I have already set the property to increase the timeout for the deployment, but it still complains about a 5 second timeout, which must be controlled by another property:
2021-12-16 13:40:47,039 ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 1) WFLYCTL0349: Timeout after [5] seconds waiting for service container stability while finalizing an operation. Process must be restarted. Step that first updated the service container was 'deploy' at address '[("deployment" => "kie-server.war")]'
I have changed the logging level to trace in order to gain all the information I can. How else can I debug / solve this issue?
There are two factors that may be contributing to this, but I don't have a good approach for addressing them.
There was a Windows Update yesterday (likely due to the recent Log4j exploit).
Some people at my company are having problems connecting to the SQL Server database. I am not seeing log messages about KieServer being unable to connect to the DB, but when it cannot reach the DB, the KieServer fails to start.
We are facing a major incident in our Camunda Orchestrator. When we hit 100 running process instances, Camunda Cockpit takes an eternity and never responds.
We have the same issue when calling /app/engine/.
A few messages are consumed from RabbitMQ, and then everything stops.
The application, however, is not down.
I suspect a process engine configuration issue, because I am unable to get the job executor log.
When I set JobExecutorActivate to false, everything works for Cockpit and queue consumption, but processes stop at the end of the first subprocess.
This log output loops non-stop:
2018/11/17 14:47:33.258 DEBUG ENGINE-14012 Job acquisition thread woke up
2018/11/17 14:47:33.258 DEBUG ENGINE-14022 Acquired 0 jobs for process engine 'default': []
2018/11/17 14:47:33.258 DEBUG ENGINE-14023 Execute jobs for process engine 'default': [8338]
2018/11/17 14:47:33.258 DEBUG ENGINE-14023 Execute jobs for process engine 'default': [8217]
2018/11/17 14:47:33.258 DEBUG ENGINE-14023 Execute jobs for process engine 'default': [8256]
2018/11/17 14:47:33.258 DEBUG ENGINE-14011 Job acquisition thread sleeping for 100 millis
2018/11/17 14:47:33.359 DEBUG ENGINE-14012 Job acquisition thread woke up
And this log too (for queue consumption):
2018/11/17 15:04:19.582 DEBUG Waiting for message from consumer. {"null":null}
2018/11/17 15:04:19.582 DEBUG Retrieving delivery for Consumer@5d05f453: tags=[{amq.ctag-0ivcbc2QL7g-Duyu2Rcbow=queue_response}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@127.0.0.1:5672/,4), conn: Proxy@77a5983d Shared Rabbit Connection: SimpleConnection@17a1dd78 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 49812], acknowledgeMode=AUTO local queue size=0 {"null":null}
Environment :
Spring Boot 2.0.3.RELEASE, Camunda v7.9.0 with PostgreSQL, RabbitMQ
Camunda BPM listens on and pushes to 165 RabbitMQ queues.
Configuration :
# Data source (PostgreSql)
com.campDo.fr.camunda.datasource.url=jdbc:postgresql://localhost:5432/campDo
com.campDo.fr.camunda.datasource.username=campDo
com.campDo.fr.camunda.datasource.password=password
com.campDo.fr.camunda.datasource.driver-class-name=org.postgresql.Driver
com.campDo.fr.camunda.bpm.database.jdbc-batch-processing=false
oms.camunda.retry.timer=1
oms.camunda.retry.nb-max=2
SpringProcessEngineConfiguration :
@Bean
public SpringProcessEngineConfiguration processEngineConfiguration() throws IOException {
    final SpringProcessEngineConfiguration config = new SpringProcessEngineConfiguration();
    config.setDataSource(camundaDataSource);
    config.setDatabaseSchemaUpdate("true");
    config.setTransactionManager(transactionManager());
    config.setHistory("audit");
    config.setJobExecutorActivate(true);
    config.setMetricsEnabled(false);
    final Resource[] resources = resourceLoader.getResources(CLASSPATH_ALL_URL_PREFIX + "/processes/*.bpmn");
    config.setDeploymentResources(resources);
    return config;
}
Pom dependencies :
<dependency>
    <groupId>org.camunda.bpm.springboot</groupId>
    <artifactId>camunda-bpm-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.camunda.bpm.springboot</groupId>
    <artifactId>camunda-bpm-spring-boot-starter-webapp</artifactId>
</dependency>
<dependency>
    <groupId>org.camunda.bpm.springboot</groupId>
    <artifactId>camunda-bpm-spring-boot-starter-rest</artifactId>
</dependency>
I am quite sure that my job executor config is wrong.
Update :
I can start Cockpit and make Camunda consume messages by setting JobExecutorActivate to false, but processes still stop at the first step that requires the job executor:
config.setJobExecutorActivate(false);
Thanks for your help.
First: if your process contains async steps (jobs), then it will pause at them. Activating the job executor just tells Camunda to manage how these jobs are worked on. If you disable the executor, your processes will still stop, and since no one will execute the jobs, they remain stopped.
Disabling job-execution is only sensible during testing or when you have multiple nodes and only some of them should do processing.
To your main issue: the job executor works with a thread pool. From what you describe, it is very likely that all threads in the pool block forever, so they never finish and never return, meaning your system is stuck.
This happened to us a while ago when working with an SMTP server: there was an infinite timeout on the connection, so the threads kept waiting although the machine was not available.
Since job execution in Camunda is highly reliable and well tested per se, I would suggest that you double-check everything you do in your delegates. If you are lucky (and I am right), you will find the spot where you just wait forever ...
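To illustrate the kind of fix this points to, here is a minimal sketch in plain Java (no Camunda API involved) of bounding a blocking call with a timeout so that a worker thread is always released. The class name BoundedCall, the helper callWithTimeout and the timeout values are made up for illustration. In a real delegate you would more likely set the connect and read timeouts on the client library itself, if it offers them.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {

    // Hypothetical helper: runs the task on the given pool and waits at most
    // timeoutMillis for the result instead of waiting forever.
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMillis)
            throws Exception {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker so it does not stay blocked in the pool.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A call that returns quickly completes normally.
            String first = callWithTimeout(pool, () -> "done", 1000);
            System.out.println(first);

            // Simulates a remote call that never answers (for example a dead SMTP host).
            try {
                callWithTimeout(pool, () -> {
                    Thread.sleep(60_000);
                    return "never";
                }, 200);
            } catch (TimeoutException e) {
                System.out.println("timed out");
            }

            // The single worker was released, so the pool is usable again.
            String third = callWithTimeout(pool, () -> "still alive", 1000);
            System.out.println(third);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that cancelling only helps if the blocked code reacts to interruption. A socket read with no timeout typically does not, which is why a timeout on the connection itself is the more robust choice.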
I am new to JBoss. When I try to deploy a .war file on the server, the following exception is printed on the console:
6:38:04,388 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
("core-service" => "management"),
("management-interface" => "http-interface")
]'
16:38:05,642 INFO [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-4) WFLYJCA0019: Stopped Driver service with driver-name = Aerobay.war_com.mysql.jdbc.Driver_5_1
16:38:09,548 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0190: Step handler org.jboss.as.server.DeployerChainAddHandler$FinalRuntimeStepHandler@5f88823f for operation {"operation" => "add-deployer-chains","address" => []} at address [] failed handling operation rollback -- java.util.concurrent.TimeoutException: java.util.concurrent.TimeoutException
at org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:396)
at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1384)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1332)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1292)
at org.jboss.as.controller.AbstractOperationContext$Step.access$300(AbstractOperationContext.java:1180)
at org.jboss.as.controller.AbstractOperationContext.handleContainerStabilityFailure(AbstractOperationContext.java:964)
at org.jboss.as.controller.AbstractOperationContext.doCompleteStep(AbstractOperationContext.java:590)
at org.jboss.as.controller.AbstractOperationContext.completeStepInternal(AbstractOperationContext.java:354)
at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:330)
at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1183)
at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:453)
at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:327)
at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:313)
at org.jboss.as.server.ServerService.boot(ServerService.java:384)
at org.jboss.as.server.ServerService.boot(ServerService.java:359)
at org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:271)
at java.lang.Thread.run(Thread.java:745)
Thanks in advance for the help !
I had the same problem when I tried to deploy the WAR file on my Red Hat JBoss EAP 7.0.
But the server was integrated into my IDE (Eclipse Neon) and the problem only occurred in debug mode.
I was able to solve the problem by removing all breakpoints and then starting the server again.
Try increasing the timeout by adding the Java option "blocking.timeout". You can do it in bin/standalone.conf.bat (depending on how you configure WildFly) by adding this line:
set "JAVA_OPTS=%JAVA_OPTS% -Djboss.as.management.blocking.timeout=600"
Change the number if it's not enough.
Increasing the timeout doesn't solve the root cause of the problem. You need to find what is causing the block and fix that. Maybe in some cases the solution is to increase the timeout.
In most cases, increasing resources is a bad way to solve issues. In my case, WildFly took a long time to boot. I increased the timeout to 600 and the error went away, but the slow boot time remained, which was very annoying.
2018-03-26 07:50:36,523 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[("path" => "xxxxxxxxxxxxxxxx")]'
Finally I investigated the cause of the block and found that it was due to network host name resolution (a NAS storage location defined as a path in WildFly).
I went to the network settings and found that my local DNS was not set properly. I configured the local DNS instead of the public DNS and the blocking issue was gone. Hope this helps.
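If you suspect the same cause, one rough way to check it is to time a host name lookup from Java on the machine that runs the server. This is only a sketch: the class name ResolveTimer and the default host "localhost" are placeholders, and the JVM caches lookups, so only the first lookup of a name in a fresh JVM is representative.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveTimer {

    // Returns how many milliseconds the JVM needed to resolve the host name,
    // or -1 if the name could not be resolved at all.
    static long resolveMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // "localhost" is only a placeholder; pass the NAS or database host instead.
        String host = args.length > 0 ? args[0] : "localhost";
        long millis = resolveMillis(host);
        System.out.println(millis < 0
                ? host + " could not be resolved"
                : host + " resolved in " + millis + " ms");
    }
}
```

A lookup that takes several seconds, or that fails only after a long wait, would be consistent with the DNS problem described above.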
Regards
Sleem
When I tried to debug and started the server in debug mode, I got the following error:
16:19:50,096 ERROR [org.jboss.as.controller.management-operation] (management-handler-thread - 1) JBAS013412: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'deploy' at address '[("deployment" => "ViprWeb.war")]'
16:19:50,096 ERROR [org.jboss.as.server] (management-handler-thread - 1) JBAS015870
16:20:00,117 ERROR [org.jboss.as.controller.management-operation] (management-handler-thread - 1) JBAS013413: Timeout after [5000] seconds waiting for service container stability while finalizing an operation.
I removed all my breakpoints and restarted my JBoss server, and that resolved the issue.
Just increase the timeout in standalone.conf.bat. Set it as:
set "JAVA_OPTS=%JAVA_OPTS% -Djboss.as.management.blocking.timeout=600"
It worked for me.
I had the same problem running a "dockerized" application locally - turns out increasing the resources fixed the issue. What I finally settled on:
CPUs: 4
Memory: 8GB
Swap: 2GB
Same problem, with NetBeans,
but I had no breakpoints.
Running JBoss from the command line helped me:
Stop JBoss
Close NetBeans
Open a command line
Go to the JBoss folder > bin
Type: standalone.bat (this starts JBoss)
Open NetBeans
Worked fine!
Hope it'll help someone else.
I've been facing the same problem recently with WildFly 18 and 21, trying to run a WAR file containing JSR-352 batch jobs that worked fine on WildFly 14.
Increasing the timeout did not solve the situation; it only prolonged the time before the TimeoutException was thrown, no matter the value (e.g. 5, 10 or 20 minutes).
I've just found that turning off the microprofile-metrics-smallrye subsystem seems to be a possible solution.
After commenting out this line from the standalone.xml file, the war deploy was successful and much faster (about 2 minutes):
<subsystem xmlns="urn:wildfly:microprofile-metrics-smallrye:2.0" security-enabled="false" exposed-subsystems="*" prefix="${wildfly.metrics.prefix:wildfly}"/>
I am having a problem with Keycloak server 15.0.2.
WFLYCTL0190: Step handler org.jboss.as.server.DeployerChainAddHandler$FinalRuntimeStepHandler@410c55ac for operation add-deployer-chains at address [] failed
I am using MySQL 5.7 with the jconnect8.0 jar.
I had the same problem. Then I killed the Kaspersky process and it helped!
I tackled a similar problem and only succeeded by undeploying the apps. This gave WildFly a clean environment to restart and start the management and HTTP services. Then deploy the apps/WARs again and identify what got you into this state.
In my case it was transactions that wanted to recover; deleting those from the DB stopped the problem from recurring.
I am calling a script from within my main script to start the JBoss server after releasing the build on the server. It starts JBoss successfully, but it shows the output below in the server/log/server.log file and on the console, and the console then hangs.
To run the next build I need to kill this manually, which is not appropriate.
05:04:17,373 INFO [AjpProtocol] Starting Coyote AJP/1.3 on ajp-0.0.0.0-8209
05:04:17,451 INFO [ServerImpl] JBoss (Microcontainer) [5.1.0.GA (build: SVNTag=JBoss_5_1_0_GA date=200905221053)] Started in 2m:38s:444ms
05:04:20,912 WARN [PropertyMessageResources] Resource MessageResources_en_US.properties Not Found.
05:04:20,913 WARN [PropertyMessageResources] Resource MessageResources_en.properties Not Found.
Any help would be highly appreciated.
Thanks.
By default, when you start your JBoss server, it is not started as a background process, and the console just sits there showing the logs once the server is started. That must be the reason why your script seems to hang; in reality it is just waiting for console output from the server.
To start JBoss as a background process, replace the lines where you invoke run.sh in the startup script with:
nohup /path/to/jboss_home/jboss/bin/run.sh -b0.0.0.0 > /tmp/logs/jbosslogs.log &
This should start jboss in the background and redirect all startup logs to jbosslogs.log file. Since it is a background process, it will not hang at all.
I have a web application running on JBoss AS 4.2.2.
I observed that the JBoss server shuts down automatically, and the following messages appear in server.log:
14:20:38,048 INFO [Server] Runtime shutdown hook called, forceHalt: true
14:20:38,049 INFO [Server] JBoss SHUTDOWN: Undeploying all packages
I want to enable TRACE for org.jboss.system.server.Server in jboss-log4j.xml, to hopefully get some more info when the server shuts down.
Please let me know how to enable TRACE for org.jboss.system.server.Server in jboss-log4j.xml.
I was able to enable trace for the server log, and I could see the following output when JBoss AS shut down automatically:
2010-06-09 19:07:46,631 DEBUG [org.jboss.wsf.stack.jbws.RequestHandlerImpl] END handleRequest: jboss.ws:context=hpnp_lqs,endpoint=APIWebService
2010-06-09 19:07:46,631 DEBUG [org.jboss.ws.core.soap.MessageContextAssociation] popMessageContext: org.jboss.ws.core.jaxws.handler.SOAPMessageContextJAXWS@3290a11e (Thread http-0.0.0.0-8080-1)
2010-06-09 19:07:55,895 INFO [org.jboss.system.server.Server] Runtime shutdown hook called, forceHalt: true
2010-06-09 19:07:55,895 TRACE [org.jboss.system.server.Server] Shutdown caller:
java.lang.Throwable: Here
at org.jboss.system.server.ServerImpl$ShutdownHook.shutdown(ServerImpl.java:1017)
at org.jboss.system.server.ServerImpl$ShutdownHook.run(ServerImpl.java:996)
2010-06-09 19:07:55,895 INFO [org.jboss.system.server.Server] JBoss SHUTDOWN: Undeploying all packages
If anybody has any clue about what might be the cause of the automatic shutdown, please help me.
Thanks!
There's a JBoss wiki page listing log output for various shutdown causes. It looks like yours was caused by a Ctrl-C. I assume you would have known if you hit Ctrl-C, though.
On Unix-type servers, Ctrl-C sends an INT signal. A TERM signal triggers the same shutdown hook, and it could come from someone or some script running as your jboss user or as root executing "kill <jboss pid>". If you're on Linux, I'd take a look at this question about the OOM killer.
One possible cause for this behaviour is console logout. We have observed this with our own server.
In brief, by default the Sun JVM listens to the event of the console user logging out, and shuts itself down automatically when that happens. To disable this, start the JVM with the -Xrs parameter.
See here for more details (look for Mysterious shutdowns).
One possible cause for a forced shutdown is the virtual machine running out of memory.
I had this problem several years ago when a colleague implemented some very nasty bulk loading of objects from a database, which caused JBoss to shut down on certain requests.
Try searching for "memory" or similar keywords in the log file and/or monitor the memory usage of the server.
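For the monitoring part, a small sketch using the standard java.lang.management API can report how full the heap is. It measures the JVM it runs in, so to be useful it would have to be called from inside the server (for example from a periodic task in your application). The class name HeapCheck and the method name heapUsedPercent are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {

    // Percentage of the maximum heap currently in use, or -1 if the JVM
    // does not report a maximum heap size.
    static long heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max <= 0 ? -1 : heap.getUsed() * 100 / max;
    }

    public static void main(String[] args) {
        long percent = heapUsedPercent();
        System.out.println(percent < 0
                ? "maximum heap size is not defined"
                : "heap used: " + percent + "%");
    }
}
```

Logging this value regularly shows whether heap usage climbs towards 100% shortly before the shutdowns.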