Camunda Cockpit and REST API down but application up / JobExecutor config - PostgreSQL

We are facing a major incident in our Camunda orchestrator: when we hit 100 running process instances, Camunda Cockpit takes an eternity to load and never responds.
We have the same issue when calling /app/engine/.
A few messages are consumed from RabbitMQ, and then everything stops.
The application itself, however, is not down.
I suspect a process engine configuration issue, because I am unable to get anything useful out of the job executor log.
When I set JobExecutorActivate to false, Cockpit and queue consumption work fine again, but processes stop at the end of the first subprocess.
We have this log looping non-stop:
2018/11/17 14:47:33.258 DEBUG ENGINE-14012 Job acquisition thread woke up
2018/11/17 14:47:33.258 DEBUG ENGINE-14022 Acquired 0 jobs for process engine 'default': []
2018/11/17 14:47:33.258 DEBUG ENGINE-14023 Execute jobs for process engine 'default': [8338]
2018/11/17 14:47:33.258 DEBUG ENGINE-14023 Execute jobs for process engine 'default': [8217]
2018/11/17 14:47:33.258 DEBUG ENGINE-14023 Execute jobs for process engine 'default': [8256]
2018/11/17 14:47:33.258 DEBUG ENGINE-14011 Job acquisition thread sleeping for 100 millis
2018/11/17 14:47:33.359 DEBUG ENGINE-14012 Job acquisition thread woke up
And this log too (for queue consumption):
2018/11/17 15:04:19.582 DEBUG Waiting for message from consumer. {"null":null}
2018/11/17 15:04:19.582 DEBUG Retrieving delivery for Consumer@5d05f453: tags=[{amq.ctag-0ivcbc2QL7g-Duyu2Rcbow=queue_response}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@127.0.0.1:5672/,4), conn: Proxy@77a5983d Shared Rabbit Connection: SimpleConnection@17a1dd78 [delegate=amqp://guest@127.0.0.1:5672/, localPort=49812], acknowledgeMode=AUTO local queue size=0 {"null":null}
Environment:
Spring Boot 2.0.3.RELEASE, Camunda 7.9.0 with PostgreSQL, RabbitMQ
Camunda BPM listens to and pushes to 165 RabbitMQ queues.
Configuration:
# Data source (PostgreSQL)
com.campDo.fr.camunda.datasource.url=jdbc:postgresql://localhost:5432/campDo
com.campDo.fr.camunda.datasource.username=campDo
com.campDo.fr.camunda.datasource.password=password
com.campDo.fr.camunda.datasource.driver-class-name=org.postgresql.Driver
com.campDo.fr.camunda.bpm.database.jdbc-batch-processing=false
oms.camunda.retry.timer=1
oms.camunda.retry.nb-max=2
SpringProcessEngineConfiguration:
@Bean
public SpringProcessEngineConfiguration processEngineConfiguration() throws IOException {
    final SpringProcessEngineConfiguration config = new SpringProcessEngineConfiguration();
    config.setDataSource(camundaDataSource);
    config.setDatabaseSchemaUpdate("true");
    config.setTransactionManager(transactionManager());
    config.setHistory("audit");
    config.setJobExecutorActivate(true);
    config.setMetricsEnabled(false);
    final Resource[] resources = resourceLoader.getResources(CLASSPATH_ALL_URL_PREFIX + "/processes/*.bpmn");
    config.setDeploymentResources(resources);
    return config;
}
POM dependencies:
<dependency>
    <groupId>org.camunda.bpm.springboot</groupId>
    <artifactId>camunda-bpm-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.camunda.bpm.springboot</groupId>
    <artifactId>camunda-bpm-spring-boot-starter-webapp</artifactId>
</dependency>
<dependency>
    <groupId>org.camunda.bpm.springboot</groupId>
    <artifactId>camunda-bpm-spring-boot-starter-rest</artifactId>
</dependency>
I am quite sure that my job executor config is wrong.
Update:
I can start Cockpit and make Camunda consume messages by setting JobExecutorActivate to false, but processes still stop at the first step that requires the job executor:
config.setJobExecutorActivate(false);
Thanks for your help.

First: if your process contains async steps (jobs), then it will pause there. Activating the job executor just tells Camunda to manage how these jobs are worked on. If you disable the executor, your processes will still stop at those steps, and since no one executes the jobs, they remain stopped.
Disabling job execution is only sensible during testing, or when you have multiple nodes and only some of them should do the processing.
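For the multi-node case, the Spring Boot starter exposes a switch for this; a minimal sketch, assuming the camunda-bpm-spring-boot-starter property name:
# on nodes that must not process jobs
camunda.bpm.job-execution.enabled=false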
To your main issue: the job executor works with a thread pool. From what you describe, it is very likely that all threads in the pool are blocked forever, so they never finish and never return, meaning your system is stuck.
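One way to make such hangs visible is to give the engine an explicitly bounded, named thread pool. A sketch, assuming camunda-engine-spring's SpringJobExecutor is on the classpath (pool sizes and names are illustrative):

import org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean
public TaskExecutor camundaTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(3);                  // threads kept alive
    executor.setMaxPoolSize(10);                  // hard upper bound
    executor.setQueueCapacity(3);                 // small queue so saturation shows up early
    executor.setThreadNamePrefix("camunda-job-"); // easy to spot in a thread dump
    return executor;
}

// wired into the process engine configuration:
SpringJobExecutor jobExecutor = new SpringJobExecutor();
jobExecutor.setTaskExecutor(camundaTaskExecutor());
config.setJobExecutor(jobExecutor);

With that prefix, a jstack thread dump shows at a glance whether all camunda-job-* threads are parked in the same delegate call.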
That kind of blocking happened to us a while ago when working with an SMTP server: there was an infinite timeout on the connection, so the threads kept waiting even though the machine was not available.
Since job execution in Camunda is highly reliable and well tested per se, I would suggest that you double-check everything you do in your delegates; if you are lucky (and I am right), you will find the spot where you just wait forever...
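For illustration, a hypothetical delegate that bounds its blocking call; the endpoint and variable names are made up, the point is the explicit timeouts:

import java.net.HttpURLConnection;
import java.net.URL;
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

public class CallExternalServiceDelegate implements JavaDelegate {
    @Override
    public void execute(DelegateExecution execution) throws Exception {
        // placeholder endpoint; stands in for any external system called from the process
        URL url = new URL("http://example.org/service");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5_000);  // fail fast if the host is unreachable
        conn.setReadTimeout(10_000);    // never wait forever for a response
        execution.setVariable("serviceStatus", conn.getResponseCode());
    }
}

Without the two timeout calls, an unreachable host can park a job executor thread indefinitely, which matches the symptom described in the question.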

Related

Unable to start Tomcat 9 with Flowable WAR - PUBLIC.ACT_DE_DATABASECHANGELOGLOCK error

I have downloaded Flowable from flowable.com/open-source and placed flowable-ui.war and flowable-rest.war in the Tomcat 9.0.52 webapps folder.
When I start the server, after some time I see the lines below repeating in the console, and then the server stops.
SELECT LOCKED FROM PUBLIC.ACT_DE_DATABASECHANGELOGLOCK WHERE ID=1
2021-08-13 20:45:05.818 INFO 8316 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock.
Why is this issue occurring? I have not made any changes.
The message
l.lockservice.StandardLockService : Waiting for changelog lock.
occurs when Flowable is waiting for the database changelog lock to be released.
If that never happens, it means some other node (or an aborted startup) picked up the lock and did not release it properly. I would suggest manually clearing the lock in that table (ACT_DE_DATABASECHANGELOGLOCK).
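If the lock row is indeed stale, clearing it usually looks like this (a sketch against Liquibase's standard changelog lock columns; make sure no other node is actually running migrations first):
UPDATE PUBLIC.ACT_DE_DATABASECHANGELOGLOCK
SET LOCKED = FALSE, LOCKGRANTED = NULL, LOCKEDBY = NULL
WHERE ID = 1;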
In addition to that, there is no need to run both flowable-ui.war and flowable-rest.war. flowable-rest.war is a subset of flowable-ui.war.

Stopping a Spring Batch job using JobOperator

I started my job using jobLauncher.run(processJob, jobParameters); and when I try to stop it from another request using jobOperator.stop(jobExecution.getId()); I get this exception:
org.springframework.batch.core.launch.JobExecutionNotRunningException:
JobExecution must be running so that it can be stopped
Set<JobExecution> jobExecutionsSet = jobExplorer.findRunningJobExecutions("processJob");
for (JobExecution jobExecution : jobExecutionsSet) {
    System.err.println("job status : " + jobExecution.getStatus());
    if (jobExecution.getStatus() == BatchStatus.STARTED
            || jobExecution.getStatus() == BatchStatus.STARTING
            || jobExecution.getStatus() == BatchStatus.STOPPING) {
        jobOperator.stop(jobExecution.getId());
        System.out.println("###########Stopped#########");
    }
}
When I print the job status, I always get job status : STOPPING, but the batch job is still running.
It is a web app: the user uploads a CSV file, which starts an operation via Spring Batch, and if the user wants to stop it mid-execution, a stop request arrives from another controller method and tries to stop the running job.
Please help me stop the running job.
If you stop a job while it is running (typically in the STARTED state), you should not get this exception. If you do get it, it means you stopped your job while it was already stopping (that is what the STOPPING status means).
jobExplorer.findRunningJobExecutions returns only running executions, so if on the very next line you see a job in STOPPING status, the status changed right after the call to jobExplorer.findRunningJobExecutions. You need to be aware that this is possible, and your controller should handle this case, e.g. as sketched below.
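A minimal sketch of handling that race in the controller (names as in the question):
try {
    jobOperator.stop(jobExecution.getId());
} catch (JobExecutionNotRunningException e) {
    // the execution finished (or moved past STOPPING) between
    // findRunningJobExecutions and this call; treat it as already stopped
}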
When you tell Spring Batch to stop a job, it goes into STOPPING mode. This means it will attempt to complete the chunk (unit of work) it is currently processing, but then stop working. Likely what's happening is that you are working on a long-running task that never finishes its current unit of work (is it hung?), so it cannot move from STOPPING to STOPPED.
Stopping it twice rightly leads to an exception, because your job was already STOPPING by the time you did it the first time.

WAR with Spring-configured Camel context will not redeploy on JBoss

I have a Camel application deployed on JBoss in a WAR file, with a Spring configuration for starting the Camel context.
It deploys and runs very nicely on JBoss EAP 7.0.0.GA.
When I change values in a property file that my application depends on and touch the WAR file, JBoss normally redeploys the application, but in some cases it fails.
I get the following in the server.log:
2017-07-25 12:05:26.671 INFO class=org.apache.camel.impl.DefaultShutdownStrategy thread="ServerService Thread Pool -- 74" Starting to graceful shutdown 12 routes (timeout 300 seconds)
2017-07-25 12:05:26.725 INFO class=org.apache.camel.impl.DefaultShutdownStrategy thread="Camel (interfacedb) thread #2 - ShutdownTask" Waiting as there are still 4 inflight and pending exchanges to complete, timeout in 300 seconds. Inflights per route: [interfacePersistDirect = 1, route1 = 1, pullFromTransferEntityTable = 1, lastScheduledRun = 1]
...
2017-07-25 12:10:26.691 WARN class=org.apache.camel.impl.DefaultShutdownStrategy thread="ServerService Thread Pool -- 74" Timeout occurred during graceful shutdown. Forcing the routes to be shutdown now. Notice: some resources may still be running as graceful shutdown did not complete successfully.
2017-07-25 12:10:26.691 WARN class=org.apache.camel.impl.DefaultShutdownStrategy thread="Camel (interfacedb) thread #2 - ShutdownTask" Interrupted while waiting during graceful shutdown, will force shutdown now.
2017-07-25 12:10:26.694 INFO class=org.apache.camel.impl.DefaultShutdownStrategy thread="ServerService Thread Pool -- 74" Graceful shutdown of 12 routes completed in 300 seconds
After this the application will not start again. JBoss reports the following in the myApp.war.failed file in the deployments folder.
"WFLYDS0022: Did not receive a response to the deployment operation within the allowed timeout period [600 seconds]. Check the server configuration file and the server logs to find more about the status of the deployment."
The application normally deploys a lot quicker than 600 seconds. I can touch the war file or delete the .failed file, which normally triggers a redeployment, but JBoss keeps giving me the error above in the .failed file.
The application starts normally if I restart the JBoss VM, but I would like to avoid restarting the other applications running on the JBoss instance.
Any suggestions?
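One Camel knob is directly relevant here: the 300 seconds in the log are Camel's default graceful shutdown timeout, while JBoss's deployment operation gives up after 600 seconds. Shortening the Camel timeout makes a hung route give up quickly instead of stalling the redeploy; a sketch (the value is illustrative):
// wherever the CamelContext is configured
camelContext.getShutdownStrategy().setTimeout(30); // seconds, default is 300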

Handling connection failures in apache-camel

I am writing an apache-camel RabbitMQ consumer. I would like to react somehow to connection problems (e.g. try to reconnect). Is it possible to configure apache-camel to automatically reconnect?
If not, how can I find out that a connection to the queue was interrupted? I've done the following test:
start the queue (and some producer)
start my consumer (it was getting messages as expected)
stop the queue (the messages stopped arriving, as expected, but no exception was thrown)
start the queue (no new messages were received)
I am using Camel from Scala (via akka-camel), but a Java solution would probably also be OK.
You can pass the flag automaticRecoveryEnabled=true in the endpoint URI; Camel will then reconnect if the connection is lost.
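For example, a minimal consumer route sketch (exchange and queue names are placeholders):

import org.apache.camel.builder.RouteBuilder;

public class RabbitConsumerRoute extends RouteBuilder {
    @Override
    public void configure() {
        // automaticRecoveryEnabled is handed down to the RabbitMQ Java client,
        // which then re-establishes the connection after a failure
        from("rabbitmq://localhost:5672/myExchange?queue=myQueue&automaticRecoveryEnabled=true")
            .to("log:incoming");
    }
}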
For automatic RabbitMQ resource recovery (connections/channels/consumers/queues/exchanges/bindings) when failures occur, check out Lyra (which I authored). Example usage:
Config config = new Config()
    .withRecoveryPolicy(new RecoveryPolicy()
        .withMaxAttempts(20)
        .withInterval(Duration.seconds(1))
        .withMaxDuration(Duration.minutes(5)));
ConnectionOptions options = new ConnectionOptions().withHost("localhost");
Connection connection = Connections.create(options, config);
The rest of the API is just the amqp-client API, except your resources are automatically recovered when failures occur.
I'm not sure about camel-rabbitmq specifically, but hopefully there's a way you can swap in your own resource creation via Lyra.
The current camel-rabbitmq just creates the connection and channel when the consumer or producer is started, so it doesn't get a chance to catch the connection exception :(.

How to enable JBoss server TRACE log?

I have a web application running on JBoss AS 4.2.2.
I observed that the JBoss server automatically shuts down, with the following output in server.log:
14:20:38,048 INFO [Server] Runtime shutdown hook called, forceHalt: true
14:20:38,049 INFO [Server] JBoss SHUTDOWN: Undeploying all packages
I want to enable TRACE for org.jboss.system.server.Server in jboss-log4j.xml, to hopefully get some more info about why the server shuts down.
Please let me know how to enable it.
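For reference, the entry would look roughly like this in jboss-log4j.xml (a sketch based on the commented examples shipped with JBoss AS 4.x, where TRACE is the custom XLevel):
<category name="org.jboss.system.server.Server">
    <priority value="TRACE" class="org.jboss.logging.XLevel"/>
</category>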
I was able to add TRACE for the server logger, and I could see the following output when JBoss AS shuts down automatically:
2010-06-09 19:07:46,631 DEBUG [org.jboss.wsf.stack.jbws.RequestHandlerImpl] END handleRequest: jboss.ws:context=hpnp_lqs,endpoint=APIWebService
2010-06-09 19:07:46,631 DEBUG [org.jboss.ws.core.soap.MessageContextAssociation] popMessageContext: org.jboss.ws.core.jaxws.handler.SOAPMessageContextJAXWS@3290a11e (Thread http-0.0.0.0-8080-1)
2010-06-09 19:07:55,895 INFO [org.jboss.system.server.Server] Runtime shutdown hook called, forceHalt: true
2010-06-09 19:07:55,895 TRACE [org.jboss.system.server.Server] Shutdown caller:
java.lang.Throwable: Here
at org.jboss.system.server.ServerImpl$ShutdownHook.shutdown(ServerImpl.java:1017)
at org.jboss.system.server.ServerImpl$ShutdownHook.run(ServerImpl.java:996)
2010-06-09 19:07:55,895 INFO [org.jboss.system.server.Server] JBoss SHUTDOWN: Undeploying all packages
If anybody has any clue about what might be the cause of the automatic shutdown, please help me.
Thanks!
There's a JBoss wiki page listing log output for various shutdown causes. It looks like yours was caused by a Ctrl-C. I assume you would have known if you hit Ctrl-C, though.
On Unix-type servers, Ctrl-C generates a TERM signal, which could also come from someone or some script running as your jboss user or as root executing "kill <jboss pid>". If you're on Linux, I'd take a look at this question about the OOM killer.
One possible cause for this behaviour is console logout. We have observed this with our own server.
In brief, by default the Sun JVM listens to the event of the console user logging out, and shuts itself down automatically when that happens. To disable this, start the JVM with the -Xrs parameter.
See here for more details (look for Mysterious shutdowns).
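With JBoss AS 4.x, that parameter would typically be added to JAVA_OPTS, e.g. in bin/run.conf (a sketch; the exact file depends on the installation):
# keep the JVM from reacting to console logoff signals
JAVA_OPTS="$JAVA_OPTS -Xrs"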
One possible cause for a forced shutdown is if the virtual machine is out of memory.
I had this problem several years ago when a colleague implemented some very nasty bulk loading of objects from a database, which caused JBoss to shut down on certain requests.
Try searching for "memory" or similar keywords in the log file and/or monitor the memory usage of the server.
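If memory is the suspect, the standard HotSpot flags below make the JVM record the evidence itself by writing a heap dump on OutOfMemoryError (a sketch; the dump path is illustrative):
# e.g. appended to JAVA_OPTS in bin/run.conf
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/jboss-heap.hprof"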