I'm looking for a standard pattern for automatically retrying failed jobs within Spring XD for a configured number of times and after a specified delay. Specifically, I have an HTTP item reader job that is triggered periodically from a cron stream. Occasionally we see the HTTP item reader fail due to network blips, so we want the job to automatically try again.
I've tried with a JobExecutionListener which picks up when a job has failed, but the tricky bit is actually retrying the failed job. I can do it by triggering an HTTP PUT to the XD admin controller (e.g. http://xd-server:9393/jobs/executions/2?restart=true), which successfully retries the job. However, I want to be able to:
Specify a delay before retrying
Have some sort of audit within XD to indicate the job will be retried in X seconds.
Adding the delay can be done within the JobExecutionListener, but it involves spinning off a thread with a delay, which isn't really traceable from the XD container, so it's difficult to see whether a job is about to be retried or not.
It appears that you need a specific job definition that performs the delayed job retries in order to get any trace of them from the XD container.
Can anyone suggest a pattern for this?
So here's the solution I went for in the end:
Created a job execution listener
public class RestartableBatchJobExecutionListener extends JobExecutionListenerSupport {
private Logger logger = LoggerFactory.getLogger(this.getClass());
public final static String JOB_RESTARTER_NAME = "jobRestarter";
/**
* A list of valid exceptions that are permissible to restart the job on
*/
private List<Class<? extends Throwable>> exceptionsToRestartOn = new ArrayList<>();
/**
* The maximum number of times the job can be re-launched before failing
*/
private int maxRestartAttempts = 0;
/**
* The amount of time to wait in milliseconds before restarting a job
*/
private long restartDelayMs = 0;
/**
* Map of all the jobs against how many times they have been attempted to restart
*/
private HashMap<Long,Integer> jobInstanceRestartCount = new HashMap<Long,Integer>();
@Autowired(required=false)
@Qualifier("asyncJobLauncher")
JobLauncher asyncJobLauncher;
@Autowired(required=false)
@Qualifier("jobRegistry")
JobLocator jobLocator;
/*
* (non-Javadoc)
* @see org.springframework.batch.core.JobExecutionListener#afterJob(org.springframework.batch.core.JobExecution)
*/
@Override
public void afterJob(JobExecution jobExecution) {
super.afterJob(jobExecution);
// Check if we can restart if the job has failed
// Compare exit codes: ExitStatus.equals() also compares the exit description,
// which is typically populated on failure
if( ExitStatus.FAILED.getExitCode().equals(jobExecution.getExitStatus().getExitCode()) )
{
applyRetryPolicy(jobExecution);
}
}
/**
* Executes the restart policy if one has been specified
*/
private void applyRetryPolicy(JobExecution jobExecution)
{
String jobName = jobExecution.getJobInstance().getJobName();
Long instanceId = jobExecution.getJobInstance().getInstanceId();
if( exceptionsToRestartOn.size() > 0 && maxRestartAttempts > 0 )
{
// Check if the job has failed for a restartable exception
List<Throwable> failedOnExceptions = jobExecution.getAllFailureExceptions();
for( Throwable reason : failedOnExceptions )
{
if( exceptionsToRestartOn.contains(reason.getClass()) ||
(reason.getCause() != null && exceptionsToRestartOn.contains(reason.getCause().getClass())) )
{
// Get our restart count for this job instance
Integer restartCount = jobInstanceRestartCount.get(instanceId);
if( restartCount == null )
{
restartCount = 0;
}
// Only restart if we haven't reached our limit
if( ++restartCount < maxRestartAttempts )
{
try
{
reLaunchJob(jobExecution, reason, restartCount);
jobInstanceRestartCount.put(instanceId, restartCount);
}
catch (Exception e)
{
String message = "The following error occurred while attempting to re-run job " + jobName + ":" + e.getMessage();
logger.error(message,e);
throw new RuntimeException( message,e);
}
}
else
{
logger.error("Failed to successfully execute jobInstanceId {} of job {} after reaching the maximum restart limit of {}. Abandoning job",instanceId,jobName,maxRestartAttempts );
try
{
jobExecution.setStatus(BatchStatus.ABANDONED);
}
catch (Exception e)
{
throw new RuntimeException( "The following error occurred while attempting to abandon job " + jobName + ":" + e.getMessage(),e);
}
}
break;
}
}
}
}
/**
* Re-launches the configured job with the current job execution details
* @param jobExecution
* @param reason
* @throws JobParametersInvalidException
* @throws JobInstanceAlreadyCompleteException
* @throws JobRestartException
* @throws JobExecutionAlreadyRunningException
*/
private void reLaunchJob( JobExecution jobExecution, Throwable reason, int restartCount ) throws JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException
{
try
{
Job jobRestarter = jobLocator.getJob(JOB_RESTARTER_NAME);
JobParameters jobParameters = new JobParametersBuilder()
.addLong("delay", restartDelayMs)
.addLong("jobExecutionId", jobExecution.getId())
.addString("jobName", jobExecution.getJobInstance().getJobName())
.toJobParameters();
logger.info("Re-launching job with name {} due to exception {}. Attempt {} of {}", jobExecution.getJobInstance().getJobName(), reason, restartCount, maxRestartAttempts);
asyncJobLauncher.run(jobRestarter, jobParameters);
}
catch (NoSuchJobException e)
{
throw new RuntimeException("Failed to find the job restarter with name=" + JOB_RESTARTER_NAME + " in container context",e);
}
}
// Setters required for the XML property injection below
public void setExceptionsToRestartOn(List<Class<? extends Throwable>> exceptionsToRestartOn) {
this.exceptionsToRestartOn = exceptionsToRestartOn;
}
public void setMaxRestartAttempts(int maxRestartAttempts) {
this.maxRestartAttempts = maxRestartAttempts;
}
public void setRestartDelayMs(long restartDelayMs) {
this.restartDelayMs = restartDelayMs;
}
}
Then in the module definition, I add this job listener to the job:
<batch:job id="job">
<batch:listeners>
<batch:listener ref="jobExecutionListener" />
</batch:listeners>
<batch:step id="doReadWriteStuff" >
<batch:tasklet>
<batch:chunk reader="itemReader" writer="itemWriter"
commit-interval="3">
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<!-- Specific job execution listener that attempts to restart failed jobs -->
<bean id="jobExecutionListener"
class="com.mycorp.RestartableBatchJobExecutionListener">
<property name="maxRestartAttempts" value="3"></property>
<property name="restartDelayMs" value="60000"></property>
<property name="exceptionsToRestartOn">
<list>
<value>com.mycorp.ExceptionIWantToRestartOn</value>
</list>
</property>
</bean>
<!--
Specific job launcher that restarts jobs in a separate thread. This is important as the delayedRestartJob
fails on the HTTP call otherwise!
-->
<bean id="executor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="maxPoolSize" value="10"></property>
</bean>
<bean id="aynchJobLauncher"
class="com.mycorp.AsyncJobLauncher">
<property name="jobRepository" ref="jobRepository" />
<property name="taskExecutor" ref="executor" />
</bean>
AsyncJobLauncher:
public class AsyncJobLauncher extends SimpleJobLauncher
{
@Override
@Async
public JobExecution run(final Job job, final JobParameters jobParameters)
throws JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException,
JobParametersInvalidException
{
return super.run(job, jobParameters);
}
}
I then have a separate processor module purely for restarting jobs after a delay (this allows us to audit restarts from the Spring XD UI or the database):
delayedJobRestart.xml:
<batch:job id="delayedRestartJob">
<batch:step id="sleep" next="restartJob">
<batch:tasklet ref="sleepTasklet" />
</batch:step>
<batch:step id="restartJob">
<batch:tasklet ref="jobRestarter" />
</batch:step>
</batch:job>
<bean id="sleepTasklet" class="com.mycorp.SleepTasklet" scope="step">
<property name="delayMs" value="#{jobParameters['delay'] != null ? jobParameters['delay'] : '${delay}'}" />
</bean>
<bean id="jobRestarter" class="com.mycorp.HttpRequestTasklet" init-method="init" scope="step">
<property name="uri" value="http://${xd.admin.ui.host}:${xd.admin.ui.port}/jobs/executions/#{jobParameters['jobExecutionId'] != null ? jobParameters['jobExecutionId'] : '${jobExecutionId}'}?restart=true" />
<property name="method" value="PUT" />
</bean>
delayedJobProperties:
# Job execution ID
options.jobExecutionId.type=Long
options.jobExecutionId.description=The job execution ID of the job to be restarted
# Job execution name
options.jobName.type=String
options.jobName.description=The name of the job to be restarted. This is more for monitoring purposes
# Delay
options.delay.type=Long
options.delay.description=The delay in milliseconds this job will wait until triggering the restart
options.delay.default=10000
and accompanying helper beans:
SleepTasklet:
public class SleepTasklet implements Tasklet
{
private static Logger logger = LoggerFactory.getLogger(SleepTasklet.class);
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception
{
logger.debug("Pausing current job for {}ms",delayMs);
Thread.sleep( delayMs );
return RepeatStatus.FINISHED;
}
private long delayMs;
public long getDelayMs()
{
return delayMs;
}
public void setDelayMs(long delayMs)
{
this.delayMs = delayMs;
}
}
HttpRequestTasklet:
public class HttpRequestTasklet implements Tasklet
{
private HttpClient httpClient = null;
private static final Logger LOGGER = LoggerFactory.getLogger(HttpRequestTasklet.class);
private String uri;
private String method;
/**
* Initialise HTTP connection.
* @throws Exception
*/
public void init() throws Exception
{
// Create client
RequestConfig config = RequestConfig.custom()
.setCircularRedirectsAllowed(true)
.setRedirectsEnabled(true)
.setExpectContinueEnabled(true)
.setRelativeRedirectsAllowed(true)
.build();
httpClient = HttpClientBuilder.create()
.setRedirectStrategy(new LaxRedirectStrategy())
.setDefaultRequestConfig(config)
.setMaxConnTotal(1)
.build();
}
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception
{
if (LOGGER.isDebugEnabled()) LOGGER.debug("Attempt HTTP {} from '" + uri + "'...",method);
HttpUriRequest request = null;
switch( method.toUpperCase() )
{
case "GET":
request = new HttpGet(uri);
break;
case "POST":
request = new HttpPost(uri);
break;
case "PUT":
request = new HttpPut(uri);
break;
default:
throw new RuntimeException("Http request method " + method + " not supported");
}
HttpResponse response = httpClient.execute(request);
// Check the response status
StatusLine status = response.getStatusLine();
if (status.getStatusCode() != HttpStatus.SC_OK)
{
throw new Exception("Failed to get data from '" + uri + "': " + status.getReasonPhrase());
}
if (LOGGER.isDebugEnabled()) LOGGER.debug("Successfully issued request");
return RepeatStatus.FINISHED;
}
public String getUri()
{
return uri;
}
public void setUri(String uri)
{
this.uri = uri;
}
public String getMethod()
{
return method;
}
public void setMethod(String method)
{
this.method = method;
}
public HttpClient getHttpClient()
{
return httpClient;
}
public void setHttpClient(HttpClient httpClient)
{
this.httpClient = httpClient;
}
}
And finally when all is built and deployed, create your jobs as a pair (note, the restarter should be defined as "jobRestarter"):
job create --name myJob --definition "MyJobModule " --deploy true
job create --name jobRestarter --definition "delayedRestartJob" --deploy true
A little convoluted, but it seems to work.
Related
I have a scheduled job implemented with Spring Batch. Right now, when it finishes, it doesn't start again because it is detected as completed. Is it possible to reset its state after completion?
@Component
class JobScheduler {
@Autowired
private Job job1;
@Autowired
private JobLauncher jobLauncher;
@Scheduled(cron = "0 0/15 * * * ?")
public void launchJob1() throws Exception {
this.jobLauncher.run(this.job1, new JobParameters());
}
}
@Configuration
public class Job1Configuration{
@Autowired
private JobBuilderFactory jobBuilderFactory;
@Autowired
private StepBuilderFactory stepBuilderFactory;
@Bean
public Job job1() {
return this.jobBuilderFactory.get("job1")
.start(this.step1()).on(STEP1_STATUS.NOT_READY.get()).end()
.from(this.step1()).on(STEP1_STATUS.READY.get()).to(this.step2())
.next(this.step3())
.end()
.build();
}
}
I know I can set a job parameter with the time or the id, but this will launch a new execution every 15 minutes. I want to repeat the same execution until it is completed without errors, and then execute a new one.
You can't restart your job because you're setting the job status to COMPLETE by calling end() in .start(this.step1()).on(STEP1_STATUS.NOT_READY.get()).end().
You should instead either fail the job by calling .start(this.step1()).on(STEP1_STATUS.NOT_READY.get()).fail()
or stop the job by calling .start(this.step1()).on(STEP1_STATUS.NOT_READY.get()).stopAndRestart(step1())
Those options will mean the job status is either FAILED or STOPPED instead of COMPLETE, which means that if you launch the job with the same JobParameters, it will restart the previous job execution.
See https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#configuringForStop
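For example, a sketch of the first option, swapping fail() in for end() in the job definition from the question:

@Bean
public Job job1() {
    return this.jobBuilderFactory.get("job1")
            // FAILED instead of COMPLETED when step1 reports NOT_READY,
            // so relaunching with the same JobParameters restarts this execution
            .start(this.step1()).on(STEP1_STATUS.NOT_READY.get()).fail()
            .from(this.step1()).on(STEP1_STATUS.READY.get()).to(this.step2())
            .next(this.step3())
            .end()
            .build();
}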
To launch the job in a way that handles restarting previous instances or starting a new instance, you could look at how the SimpleJobService in spring-batch-admin does it and modify the launch method slightly for your purposes. This requires you to specify an incremental job parameter that is used to launch new instances of your job.
https://github.com/spring-attic/spring-batch-admin/blob/master/spring-batch-admin-manager/src/main/java/org/springframework/batch/admin/service/SimpleJobService.java#L250
@Override
public JobExecution launch(String jobName, JobParameters jobParameters) throws NoSuchJobException,
JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException,
JobParametersInvalidException {
JobExecution jobExecution = null;
if (jobLocator.getJobNames().contains(jobName)) {
Job job = jobLocator.getJob(jobName);
JobExecution lastJobExecution = jobRepository.getLastJobExecution(jobName, jobParameters);
boolean restart = false;
if (lastJobExecution != null) {
BatchStatus status = lastJobExecution.getStatus();
if (status.isUnsuccessful() && status != BatchStatus.ABANDONED) {
restart = true;
}
}
if (job.getJobParametersIncrementer() != null && !restart) {
jobParameters = job.getJobParametersIncrementer().getNext(jobParameters);
}
jobExecution = jobLauncher.run(job, jobParameters);
if (jobExecution.isRunning()) {
activeExecutions.add(jobExecution);
}
} else {
if (jsrJobOperator != null) {
// jobExecution = this.jobExecutionDao
// .getJobExecution(jsrJobOperator.start(jobName, jobParameters.toProperties()));
jobExecution = new JobExecution(jsrJobOperator.start(jobName, jobParameters.toProperties()));
} else {
throw new NoSuchJobException(String.format("Unable to find job %s to launch",
String.valueOf(jobName)));
}
}
return jobExecution;
}
I think the difficulty here comes from mixing scheduling with restartability. I would make each schedule execute a distinct job instance (for example by adding the run time as an identifying job parameter).
Now if a given schedule fails, it could be restarted separately until completion without affecting subsequent schedules. This can be done manually or programmatically in another scheduled method.
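For instance, a minimal sketch of the scheduler from the question, making each run a distinct job instance by adding the launch time as an identifying parameter:

@Scheduled(cron = "0 0/15 * * * ?")
public void launchJob1() throws Exception {
    // each schedule gets its own JobInstance; a failed one can still be
    // restarted later with the same parameters without blocking new runs
    JobParameters jobParameters = new JobParametersBuilder()
            .addDate("runTime", new Date())
            .toJobParameters();
    this.jobLauncher.run(this.job1, jobParameters);
}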
This is the solution I came up with after all the comments:
@Component
class JobScheduler extends JobSchedulerLauncher {
@Autowired
private Job job1;
@Scheduled(cron = "0 0/15 * * * ?")
public void launchJob1() throws Exception {
this.launch(this.job1);
}
}
public abstract class JobSchedulerLauncher {
@Autowired
private JobOperator jobOperator;
@Autowired
private JobExplorer jobExplorer;
public void launch(Job job) throws JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException,
JobParametersInvalidException, NoSuchJobException, NoSuchJobExecutionException, JobExecutionNotRunningException, JobParametersNotFoundException, UnexpectedJobExecutionException {
// Get the last instance
final List<JobInstance> jobInstances = this.jobExplorer.findJobInstancesByJobName(job.getName(), 0, 1);
if (CollectionUtils.isNotEmpty(jobInstances)) {
// Get the last executions
final List<JobExecution> jobExecutions = this.jobExplorer.getJobExecutions(jobInstances.get(0));
if (CollectionUtils.isNotEmpty(jobExecutions)) {
final JobExecution lastJobExecution = jobExecutions.get(0);
if (lastJobExecution.isRunning()) {
this.jobOperator.stop(lastJobExecution.getId().longValue());
this.jobOperator.abandon(lastJobExecution.getId().longValue());
} else if (ExitStatus.FAILED.getExitCode().equals(lastJobExecution.getExitStatus().getExitCode()) || ExitStatus.STOPPED.getExitCode().equals(lastJobExecution.getExitStatus().getExitCode())) {
this.jobOperator.restart(lastJobExecution.getId().longValue());
return;
}
}
}
this.jobOperator.startNextInstance(job.getName());
}
}
My job now uses an incrementer, based on this one https://docs.spring.io/spring-batch/docs/current/reference/html/job.html#JobParametersIncrementer:
@Bean
public Job job1() {
return this.jobBuilderFactory.get("job1")
.incrementer(new CustomJobParameterIncrementor())
.start(this.step1()).on(STEP1_STATUS.NOT_READY.get()).end()
.from(this.step1()).on(STEP1_STATUS.READY.get()).to(this.step2())
.next(this.step3())
.end()
.build();
}
In my case, my scheduler won't start two instances of the same job at the same time, so if I detect a running job in this code it means the server restarted and left the job with status STARTED; that's why I stop it and abandon it.
Scenario: I have two server nodes to begin with, and when we try to connect client nodes it takes 15+ minutes for a client to start. Please find the server configuration below; the only change on the other server is the IP address. On the console I am getting the error below. Thanks in advance.
[12:42:10] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1600672317715]]]
[12:42:40,486][SEVERE][tcp-disco-msg-worker-[5023dc59 172.16.0.189:48510]-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1, blockedFor=18s]
[12:42:40] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1600672341604]]]
[12:42:49,498][SEVERE][tcp-disco-msg-worker-[5023dc59 172.16.0.189:48510]-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1, blockedFor=27s]
[12:42:49] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1600672341604]]]
[12:43:01,603][SEVERE][tcp-disco-msg-worker-[5023dc59 172.16.0.189:48510]-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1, blockedFor=39s]
Server configuration (fragment):
<!-- <property name="consistentId" value="#{ systemEnvironment['IGNITE_CONSISTENT_ID'] }" /> -->
<!-- Enable task execution events for examples. -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true" />
<property name="maxSize" value="#{4L * 1024 * 1024 * 1024}"/>
<property name="initialSize" value="#{1L * 1024 * 1024 * 1024}"/>
</bean>
</property>
</bean>
</property>
<!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="localPort" value="48510"/>
<property name="ipFinder">
<!--
Ignite provides several options for automatic discovery that can be used
instead os static IP based discovery. For information on all options refer
to our documentation: http://apacheignite.readme.io/docs/cluster-config
-->
<!-- Uncomment static IP finder to enable static-based discovery of initial nodes. -->
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<!-- <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder"> -->
<property name="addresses">
<list>
<!-- In distributed environment, replace with actual host IP address. -->
<value>127.0.0.1:48510..48512</value>
<value>X.16.0.X:48510..48512</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
<property name="communicationSpi">
<bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
<property name="localPort" value="48110"/>
<!-- <property name="localPortRange" value="1000"/> -->
</bean>
</property>
<property name="clientConnectorConfiguration">
<bean class="org.apache.ignite.configuration.ClientConnectorConfiguration">
<property name="port" value="10801"/>
</bean>
</property>
<property name="userAttributes">
<map>
<entry key="ROLE" value="SecindNode" />
</map>
</property>
</bean>
Client code:
public final class IgniteConnectionUtil {
private static final Logger logger = Logger.getLogger(IgniteConnectionUtil.class);
private static IgniteConnectionUtil instance;
private static Ignite ignite;
private static String CACHE_NAME = "CollectionCache";
private static String jdbcThinHost = null;
private IgniteConnectionUtil() {
if(ignite == null)
init();
try {
boolean clearRedisMap = ConfigurationManager.getInstance().getPropertyAsBoolean("CLEAR_REDIS_MAP",
"IN_MEMORY_DB", "CONFIG");
if (clearRedisMap)
InMemoryTableStore.getInstance().clearStore();
} catch (Exception e) {
logger.info("Unable to clear ignite-redis map");
}
}
public static synchronized void init() {
try {
if(!isIgniteEnabled() || ignite != null)
return;
logger.info("Ignite Client starting");
Ignition.setClientMode(true);
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.setWalMode(WALMode.BACKGROUND);
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDataStorageConfiguration(storageCfg);
cfg.setPeerClassLoadingEnabled(true);
TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
String serverIp = ConfigurationManager.getInstance()
.getPropertyAsString("SERVER_ADDRESS", "IN_MEMORY_DB", "CONFIG");
//ipFinder.setAddresses(Arrays.asList(serverIp));
ipFinder.setAddresses(
Arrays.asList("127.0.0.1:48510","127.0.0.1:48511","127.0.0.1:48512",
"X.16.0.189:48510","X.16.0.X:48511","X.16.0.X:48512"
));
discoverySpi.setLocalPort(48510);
// timeout for which client node will try to connect to ignite servers
// it will throw exception and exit if server can not be found
long discoveryTimeout = ConfigurationManager.getInstance()
.getPropertyAsLong("DISCOVERY_TIMEOUT", "IN_MEMORY_DB", "CONFIG");
discoverySpi.setIpFinder(ipFinder).setJoinTimeout(discoveryTimeout);
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
long communicationTimeout = ConfigurationManager.getInstance()
.getPropertyAsLong("COMMUNICATION_TIMEOUT", "IN_MEMORY_DB", "CONFIG");
commSpi.setConnectTimeout(communicationTimeout).setLocalPort(48110);
// this timeout is used to reconnect client to server if server has failed/restarted
long clientFailureDetectionTimeout = ConfigurationManager.getInstance()
.getPropertyAsLong("CLIENT_FAILURE_DETECTION_TIMEOUT", "IN_MEMORY_DB", "CONFIG");
cfg.setClientFailureDetectionTimeout(clientFailureDetectionTimeout);
cfg.setDiscoverySpi(discoverySpi);
cfg.setCommunicationSpi(commSpi);
//cfg.setIncludeEventTypes(EventType.EVT_NODE_JOINED);
ignite = Ignition.start(cfg);
ignite.cluster().active(true);
ignite.cluster().baselineAutoAdjustEnabled(true);
ignite.cluster().baselineAutoAdjustTimeout(30000);
initializeJDBCThinDriver();
//igniteEventListen();
logger.info("Ignite Client started");
} catch (Exception e) {
logger.error("Error in starting ignite cluster", e);
}
}
public static synchronized IgniteConnectionUtil getInstance() {
if (instance == null) {
instance = new IgniteConnectionUtil();
} else {
try {
if(ignite == null || ignite.cluster() == null) {
logger.error("Illegal Ignite state. Will try to restart ignite clinet.");
init();
} else if(Ignition.state().equals(IgniteState.STOPPED_ON_SEGMENTATION)) {
logger.error("Reconnecting to Ignite");
ignite = null;
init();
}else if(!ignite.cluster().active())
ignite.cluster().active(true);
} catch(Exception e) {
logger.error("Ignite Exception. Please restart ignite server.");
}
}
return instance;
}
public static void initializeJDBCThinDriver() {
try {
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
jdbcThinHost = ConfigurationManager.getInstance()
.getPropertyAsString("JDBC_THIN_HOST", "IN_MEMORY_DB", "CONFIG");
} catch (ClassNotFoundException e) {
logger.error("Error in loading IgniteJdbcThinDriver class", e);
}
}
public Connection getJDBCConnection() {
Connection conn = null;
try {
conn = DriverManager.getConnection("jdbc:ignite:thin://"+jdbcThinHost+"/");
if(conn == null )
{
conn = DriverManager.getConnection("jdbc:ignite:thin://172.16.0.189:10801/");
}
} catch (SQLException e) {
logger.error("Error in getting Ignite JDBC connection", e);
}
return conn;
}
public IgniteCache<?, ?> getOrCreateCache(String cacheName) {
CacheConfiguration<?, ?> cacheConfig = new CacheConfiguration<>(cacheName);
//cacheConfig.setDataRegionName("500MB_Region");
cacheConfig.setCacheMode(CacheMode.PARTITIONED);
cacheConfig.setBackups(1);
cacheConfig.setRebalanceMode(CacheRebalanceMode.ASYNC);
cacheConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);
cacheConfig.setReadFromBackup(true);
cacheConfig.setCopyOnRead(true);
cacheConfig.setOnheapCacheEnabled(true);
cacheConfig.setSqlSchema("PUBLIC");
if(ignite != null) {
return ignite.getOrCreateCache(cacheConfig);
}else {
throw new IgniteSQLException("Internal Server Error Please contact support");
}
}
public IgniteCache<?, ?> getOrCreateCache() {
CacheConfiguration<?, ?> cacheConfig = new CacheConfiguration<>(CACHE_NAME);
//cacheConfig.setDataRegionName("500MB_Region");
cacheConfig.setCacheMode(CacheMode.PARTITIONED);
cacheConfig.setBackups(1);
cacheConfig.setRebalanceMode(CacheRebalanceMode.ASYNC);
cacheConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);
cacheConfig.setReadFromBackup(true);
cacheConfig.setCopyOnRead(true);
cacheConfig.setOnheapCacheEnabled(true);
cacheConfig.setSqlSchema("PUBLIC");
if(ignite != null) {
return ignite.getOrCreateCache(cacheConfig);
}else {
throw new IgniteSQLException("Internal Server Error Please contact support");
}
}
public static synchronized void shutdown() throws Exception {
try {
if(ignite != null) {
ignite.close();
}
} catch(IgniteException ie) {
throw new Exception(ie);
} finally {
ignite = null;
}
}
public static boolean isIgniteEnabled() throws Exception {
return ConfigurationManager.getInstance().getPropertyAsBoolean("ENABLED",
"IN_MEMORY_DB");
}
}
Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1, blockedFor=18s]
This most likely means that the server node cannot connect to the client's communication port (47100 by default), or vice versa. In 2.8.1 or earlier, the port needs to be traversable in both directions. In 2.9, a new operation mode will be introduced in which the server never tries to connect to the client, only the traditional way around.
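As a sketch (using the non-default ports from the client code above, which moves the communication port to 48110), you can pin the client's communication port and a small range so the firewall between the hosts can be opened explicitly:

TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
// on 2.8.x or earlier this port must also be reachable from the server nodes
commSpi.setLocalPort(48110);
// allow a few spare ports for multiple JVMs on the same host
commSpi.setLocalPortRange(10);
cfg.setCommunicationSpi(commSpi);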
I need to publish notification events to external systems over JMS when data is updated. I'd like this to be done within the same transaction as the objects are committed to the database, to ensure integrity.
The ApplicationLifecycle events that spring-data-rest emits seemed like the logical place to implement this logic.
@org.springframework.transaction.annotation.Transactional
public class TestEventListener extends AbstractRepositoryEventListener<Object> {
private static final Logger LOG = LoggerFactory.getLogger(TestEventListener.class);
@Override
protected void onBeforeCreate(Object entity) {
LOG.info("XXX before create");
}
@Override
protected void onBeforeSave(Object entity) {
LOG.info("XXX before save");
}
@Override
protected void onAfterCreate(Object entity) {
LOG.info("XXX after create");
}
@Override
protected void onAfterSave(Object entity) {
LOG.info("XXX after save");
}
}
However, these events fire before the transaction starts and after it commits:
08 15:32:37.119 [http-nio-9000-exec-1] INFO n.c.v.vcidb.TestEventListener - XXX before create
08 15:32:37.135 [http-nio-9000-exec-1] TRACE o.s.t.i.TransactionInterceptor - Getting transaction for [org.springframework.data.jpa.repository.support.SimpleJpaRepository.save]
08 15:32:37.432 [http-nio-9000-exec-1] TRACE o.s.t.i.TransactionInterceptor - Completing transaction for [org.springframework.data.jpa.repository.support.SimpleJpaRepository.save]
08 15:32:37.479 [http-nio-9000-exec-1] INFO n.c.v.vcidb.TestEventListener - XXX after create
What extension point does spring-data-rest have for adding behaviour that will execute within the spring managed transaction?
I use AOP (a pointcut and transaction advice) to solve this problem:
@Configuration
@ImportResource("classpath:/aop-config.xml")
public class AopConfig { ...
and aop-config.xml:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:aop="http://www.springframework.org/schema/aop" xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop.xsd
http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd"
default-autowire="byName">
<aop:config>
<aop:pointcut id="restRepositoryTx"
expression="execution(* org.springframework.data.rest.webmvc.RepositoryEntityController.*(..))" />
<aop:advisor id="managerTx" advice-ref="txAdvice" pointcut-ref="restRepositoryTx" order="20" />
</aop:config>
<tx:advice id="txAdvice" transaction-manager="transactionManager">
<tx:attributes>
<tx:method name="postCollectionResource*" propagation="REQUIRES_NEW" rollback-for="Exception" />
<tx:method name="putItemResource*" propagation="REQUIRES_NEW" rollback-for="Exception" />
<tx:method name="patchItemResource*" propagation="REQUIRES_NEW" rollback-for="Exception" />
<tx:method name="deleteItemResource*" propagation="REQUIRES_NEW" rollback-for="Exception" />
<!-- <tx:method name="*" rollback-for="Exception" /> -->
</tx:attributes>
</tx:advice>
</beans>
This is the same as having controller methods annotated with @Transactional.
The solution described by phlebas works. I also think "run an event handler within the same transaction" should be a feature provided by Spring Data REST; there are many common use cases that need logic split out into separate event handlers, just like triggers in a database. The version shown below is the same as phlebas' solution.
@Aspect
@Component
public class SpringDataRestTransactionAspect {
private TransactionTemplate transactionTemplate;
public SpringDataRestTransactionAspect(PlatformTransactionManager transactionManager) {
this.transactionTemplate = new TransactionTemplate(transactionManager);
this.transactionTemplate.setName("around-data-rest-transaction");
}
#Pointcut("execution(* org.springframework.data.rest.webmvc.*Controller.*(..))")
public void aroundDataRestCall(){}
#Around("aroundDataRestCall()")
public Object aroundDataRestCall(ProceedingJoinPoint joinPoint) throws Throwable {
return transactionTemplate.execute(transactionStatus -> {
try {
return joinPoint.proceed();
} catch (Throwable e) {
transactionStatus.setRollbackOnly();
if(e instanceof RuntimeException) {
throw (RuntimeException)e;
} else {
throw new RuntimeException(e);
}
}
});
}
}
I have not worked on spring-data-rest, but with Spring this can be handled the following way.
1) Define a custom TransactionSynchronizationAdapter and register the bean with the TransactionSynchronizationManager.
Usually, I have a method registerSynchronization with a @Before pointcut for this.
@SuppressWarnings("rawtypes")
@Before("@annotation(org.springframework.transaction.annotation.Transactional)")
public void registerSynchronization() {
// TransactionStatus transStatus = TransactionAspectSupport.currentTransactionStatus();
TransactionSynchronizationManager.registerSynchronization(this);
final String transId = UUID.randomUUID().toString();
TransactionSynchronizationManager.setCurrentTransactionName(transId);
transactionIds.get().push(transId);
if (TransactionSynchronizationManager.isActualTransactionActive() && TransactionSynchronizationManager
.isSynchronizationActive() && !TransactionSynchronizationManager.isCurrentTransactionReadOnly()) {
if (!TransactionSynchronizationManager.hasResource(KEY)) {
final List<NotificationPayload> notifications = new ArrayList<NotificationPayload>();
TransactionSynchronizationManager.bindResource(KEY, notifications);
}
}
}
2) Then implement the afterCompletion override as follows:
@Override
public void afterCompletion(final int status) {
CurrentContext context = null;
try {
context = ExecutionContext.get().getContext();
} catch (final ContextNotFoundException ex) {
logger.debug("Current Context is not available");
return;
}
if (status == STATUS_COMMITTED) {
transactionIds.get().removeAllElements();
publishedEventStorage.sendAllStoredNotifications();
// customize here for commit actions
} else if ((status == STATUS_ROLLED_BACK) || (status == STATUS_UNKNOWN)) {
// you can write your code for rollback actions
}
}
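For reference, the core idea reduces to something like this minimal sketch, where notificationPublisher.send(payload) is a hypothetical stand-in for your JMS-sending code:

// inside transactional code, after the notification payload is prepared
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronizationAdapter() {
    @Override
    public void afterCommit() {
        // only runs if the surrounding transaction committed successfully
        notificationPublisher.send(payload); // hypothetical helper
    }
});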
I'm using Quartz and want to change its thread pool size via a remote JMX call, but unfortunately I couldn't find a proper solution. Is it possible to change the configuration of a running job programmatically?
I used Quartz with Spring. In my web.xml I created a Spring ContextListener. My app starts the Quartz job and exposes 2 JMX methods to start and stop it on demand.
<listener>
<listener-class>za.co.lance.admin.infrastructure.ui.util.MBeanContextListener</listener-class>
</listener>
The MBeanContextListener class looks like this.
public class MBeanContextListener extends ContextLoaderListener {
private ObjectName objectName;
private static Logger logger = LoggerFactory.getLogger(MBeanContextListener.class);
@Override
public void contextDestroyed(final ServletContextEvent sce) {
super.contextDestroyed(sce);
logger.debug("=============> bean context listener destroy");
final MBeanServer mbeanServer = ManagementFactory.getPlatformMBeanServer();
try {
mbeanServer.unregisterMBean(objectName);
logger.info("=============> QuartzJmx unregisterMBean ok");
} catch (final Exception e) {
e.printStackTrace();
}
}
@Override
public void contextInitialized(final ServletContextEvent sce) {
super.contextInitialized(sce);
logger.debug("=============> bean context listener started");
final MBeanServer mbeanServer = ManagementFactory.getPlatformMBeanServer();
try {
final QuartzJmx processLatestFailedDocumentsMbean = new QuartzJmx();
Scheduler scheduler = (Scheduler) ContextLoader.getCurrentWebApplicationContext().getBean("runProcessLatestFailedDocumentsScheduler");
processLatestFailedDocumentsMbean.setScheduler(scheduler);
objectName = new ObjectName("za.co.lance.admin.infrastructure.jmx.mbeans:type=QuartzJmxMBean");
mbeanServer.registerMBean(processLatestFailedDocumentsMbean, objectName);
logger.info("=============> QuartzJmx registerMBean ok");
} catch (final Exception e) {
e.printStackTrace();
}
}
}
The QuartzJmx class. PLEASE NOTE: any MBean class (QuartzJmx) must have an interface whose name ends with MBean (QuartzJmxMBean).
@Component
public class QuartzJmx implements QuartzJmxMBean {
private Scheduler scheduler;
private static Logger LOG = LoggerFactory.getLogger(QuartzJmx.class);
@Override
public synchronized void suspendRunProcessLatestFailedDocumentsJob() {
LOG.info("Suspending RunProcessLatestFailedDocumentsJob");
if (scheduler != null) {
try {
if (scheduler.isStarted()) {
scheduler.standby();
LOG.info("RunProcessLatestFailedDocumentsJob suspended");
} else {
LOG.info("RunProcessLatestFailedDocumentsJob already suspended");
throw new SchedulerException("RunProcessLatestFailedDocumentsJob already suspended");
}
} catch (SchedulerException e) {
LOG.error(e.getMessage());
}
} else {
LOG.error("Cannot suspend RunProcessLatestFailedDocumentsJob. Scheduler = null");
throw new IllegalArgumentException("Cannot suspend RunProcessLatestFailedDocumentsJob. Scheduler = null");
}
}
@Override
public synchronized void startRunProcessLatestFailedDocumentsJob() {
LOG.info("Starting RunProcessLatestFailedDocumentsJob");
if (scheduler != null) {
try {
if (scheduler.isInStandbyMode()) {
scheduler.start();
LOG.info("RunProcessLatestFailedDocumentsJob started");
} else {
LOG.info("RunProcessLatestFailedDocumentsJob already started");
throw new SchedulerException("scheduler already started");
}
} catch (SchedulerException e) {
LOG.error(e.getMessage());
}
} else {
LOG.error("Cannot start RunProcessLatestFailedDocumentsJob. Scheduler = null");
throw new IllegalArgumentException("Cannot start RunProcessLatestFailedDocumentsJob. Scheduler = null");
}
}
@Override
public void setScheduler(Scheduler scheduler) {
this.scheduler = scheduler;
}
}
And lastly, the Spring context:
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
<bean id="runProcessLatestFailedDocumentsTask"
class="za.co.lance.admin.infrastructure.service.vbs.process.ProcessDocumentServiceImpl" />
<!-- Spring Quartz -->
<bean name="runProcessLatestFailedDocumentsJob" class="org.springframework.scheduling.quartz.JobDetailBean">
<property name="jobClass"
value="za.co.lance.admin.infrastructure.service.quartz.RunProcessLatestFailedDocuments" />
<property name="jobDataAsMap">
<map>
<entry key="processDocumentService" value-ref="runProcessLatestFailedDocumentsTask" />
</map>
</property>
</bean>
<!-- Cron Trigger -->
<bean id="processLatestFailedDocumentsTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
<property name="jobDetail" ref="runProcessLatestFailedDocumentsJob" />
<!-- Cron-Expressions (seperated with a space) fields are -->
<!-- Seconds Minutes Hours Day-of-Month Month Day-of-Week Year(optional) -->
<!-- Run every morning hour from 9am to 6pm from Monday to Saturday -->
<property name="cronExpression" value="0 0 9-18 ? * MON-SAT" />
</bean>
<!-- Scheduler -->
<bean id="runProcessLatestFailedDocumentsScheduler"
class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
<property name="jobDetails">
<list>
<ref bean="runProcessLatestFailedDocumentsJob" />
</list>
</property>
<property name="triggers">
<list>
<ref bean="processLatestFailedDocumentsTrigger" />
</list>
</property>
</bean>
</beans>
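Once the MBean is registered, its operations can be invoked remotely. A minimal sketch of a JMX client call (the service URL host and port are placeholders for your own JMX remote settings):

JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://your-host:9999/jmxrmi");
try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
    MBeanServerConnection connection = connector.getMBeanServerConnection();
    ObjectName name = new ObjectName("za.co.lance.admin.infrastructure.jmx.mbeans:type=QuartzJmxMBean");
    // pause the scheduler via the operation exposed by QuartzJmxMBean
    connection.invoke(name, "suspendRunProcessLatestFailedDocumentsJob", new Object[0], new String[0]);
}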
I am trying to launch a job in Spring Batch 2, and I need to pass some information in the job parameters, but I do not want it to count towards the uniqueness of the job instance. For example, I'd want these two sets of parameters to be treated as the same job instance:
file=/my/file/path,session=1234
file=/my/file/path,session=5678
The idea is that there will be two different servers trying to start the same job, but with different sessions attached to them. I need that session number in both cases. Any ideas?
Thanks!
So, if 'file' is the only attribute that's supposed to be unique and 'session' is used by downstream code, then your problem matches almost exactly what I had. I had a JMSCorrelationId that I needed to store in the execution context for later use, and I didn't want it to play into the job parameters' uniqueness. Per Dave Syer, this really wasn't possible, so I took the route of creating the job with the identifying parameters (not the 'session' in your case) and then adding the 'session' attribute to the execution context before anything actually runs.
This gave me access to 'session' downstream but it was not in the job parameters so it didn't affect uniqueness.
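One way to get a value into the execution context before the steps run is a JobExecutionListener; a minimal sketch of that idea, where resolveSession() is a hypothetical stand-in for wherever your session value comes from:

public class SessionContextListener implements JobExecutionListener {
    @Override
    public void beforeJob(JobExecution jobExecution) {
        // stash the non-identifying value where downstream steps can read it;
        // it is not part of the JobParameters, so it does not affect uniqueness
        jobExecution.getExecutionContext().putString("session", resolveSession());
    }
    @Override
    public void afterJob(JobExecution jobExecution) {
        // no-op
    }
    private String resolveSession() {
        return System.getProperty("session", "1234"); // hypothetical source
    }
}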
References
https://jira.springsource.org/browse/BATCH-1412
http://forum.springsource.org/showthread.php?104440-Non-Identity-Job-Parameters&highlight=
You'll see from this forum thread that there's no good way to do it (per Dave Syer), but I wrote my own launcher based on the SimpleJobLauncher (in fact I delegate to the SimpleJobLauncher if a non-overloaded method is called). It has an overloaded method for starting a job that takes a callback interface, which allows contributing values to the execution context while they are not 'true' job parameters. You could do something very similar.
I think the applicable LOC for you is right here:
jobExecution = jobRepository.createJobExecution(job.getName(),
jobParameters);
if (contributor != null) {
if (contributor.contributeTo(jobExecution.getExecutionContext())) {
jobRepository.updateExecutionContext(jobExecution);
}
}
which is where the execution context is contributed to, right after it is created. Hopefully this helps you in your implementation.
public class ControlMJobLauncher implements JobLauncher, InitializingBean {
private static final Logger logger = LoggerFactory.getLogger(ControlMJobLauncher.class);
private JobRepository jobRepository;
private TaskExecutor taskExecutor;
private SimpleJobLauncher simpleLauncher;
private JobFilter jobFilter;
public void setJobRepository(JobRepository jobRepository) {
this.jobRepository = jobRepository;
}
public void setTaskExecutor(TaskExecutor taskExecutor) {
this.taskExecutor = taskExecutor;
}
/**
* Optional filter to prevent job launching based on some specific criteria.
* Jobs that are filtered out will return success to ControlM, but will not run
*/
public void setJobFilter(JobFilter jobFilter) {
this.jobFilter = jobFilter;
}
public JobExecution run(final Job job, final JobParameters jobParameters, ExecutionContextContributor contributor)
throws JobExecutionAlreadyRunningException, JobRestartException,
JobInstanceAlreadyCompleteException, JobParametersInvalidException, JobFilteredException {
Assert.notNull(job, "The Job must not be null.");
Assert.notNull(jobParameters, "The JobParameters must not be null.");
//See if job is filtered
if(this.jobFilter != null && !jobFilter.launchJob(job, jobParameters)) {
throw new JobFilteredException(String.format("Job has been filtered by the filter: %s", jobFilter.getFilterName()));
}
final JobExecution jobExecution;
JobExecution lastExecution = jobRepository.getLastJobExecution(job.getName(), jobParameters);
if (lastExecution != null) {
if (!job.isRestartable()) {
throw new JobRestartException("JobInstance already exists and is not restartable");
}
logger.info(String.format("Restarting job %s instance %d", job.getName(), lastExecution.getId()));
}
// Check the validity of the parameters before creating anything
// in the repository...
job.getJobParametersValidator().validate(jobParameters);
/*
* There is a very small probability that a non-restartable job can be
* restarted, but only if another process or thread manages to launch
* <i>and</i> fail a job execution for this instance between the last
* assertion and the next method returning successfully.
*/
jobExecution = jobRepository.createJobExecution(job.getName(),
jobParameters);
if (contributor != null) {
if (contributor.contributeTo(jobExecution.getExecutionContext())) {
jobRepository.updateExecutionContext(jobExecution);
}
}
try {
taskExecutor.execute(new Runnable() {
public void run() {
try {
logger.info("Job: [" + job
+ "] launched with the following parameters: ["
+ jobParameters + "]");
job.execute(jobExecution);
logger.info("Job: ["
+ job
+ "] completed with the following parameters: ["
+ jobParameters
+ "] and the following status: ["
+ jobExecution.getStatus() + "]");
} catch (Throwable t) {
logger.warn(
"Job: ["
+ job
+ "] failed unexpectedly and fatally with the following parameters: ["
+ jobParameters + "]", t);
rethrow(t);
}
}
private void rethrow(Throwable t) {
if (t instanceof RuntimeException) {
throw (RuntimeException) t;
} else if (t instanceof Error) {
throw (Error) t;
}
throw new IllegalStateException(t);
}
});
} catch (TaskRejectedException e) {
jobExecution.upgradeStatus(BatchStatus.FAILED);
if (jobExecution.getExitStatus().equals(ExitStatus.UNKNOWN)) {
jobExecution.setExitStatus(ExitStatus.FAILED
.addExitDescription(e));
}
jobRepository.update(jobExecution);
}
return jobExecution;
}
static interface ExecutionContextContributor {
boolean CONTRIBUTED_SOMETHING = true;
boolean CONTRIBUTED_NOTHING = false;
/**
*
* @param executionContext
* @return true if the execution context was contributed to
*/
public boolean contributeTo(ExecutionContext executionContext);
}
@Override
public void afterPropertiesSet() throws Exception {
Assert.state(jobRepository != null, "A JobRepository has not been set.");
if (taskExecutor == null) {
logger.info("No TaskExecutor has been set, defaulting to synchronous executor.");
taskExecutor = new SyncTaskExecutor();
}
this.simpleLauncher = new SimpleJobLauncher();
this.simpleLauncher.setJobRepository(jobRepository);
this.simpleLauncher.setTaskExecutor(taskExecutor);
this.simpleLauncher.afterPropertiesSet();
}
@Override
public JobExecution run(Job job, JobParameters jobParameters)
throws JobExecutionAlreadyRunningException, JobRestartException,
JobInstanceAlreadyCompleteException, JobParametersInvalidException {
return simpleLauncher.run(job, jobParameters);
}
}
Starting from Spring Batch 2.2.x, there is support for non-identifying parameters. If you are using CommandLineJobRunner, you can specify non-identifying parameters with a '-' prefix.
For example:
java org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> file=/my/file/path -session=5678
If you are using an older version of Spring Batch, you need to migrate your database schema. See the 'Migrating to 2.x.x' section at http://docs.spring.io/spring-batch/getting-started.html.
This is the Jira page of the feature: https://jira.springsource.org/browse/BATCH-1412, and here are the changes that implement it: https://fisheye.springsource.org/changelog/spring-batch?cs=557515df45c0f596588418d53c3f2bae3781c1c3
In more recent versions of Spring Batch (I am using spring-batch-core:4.3.3), you can use the JobParametersBuilder to specify whether a parameter is identifying or not. For example:
new JobParametersBuilder()
.addString("identifying-param-name", paramValue1)
.addString("non-identifying-param-name", paramValue2, false)
.toJobParameters();
The 'false' in the third argument makes the parameter non-identifying.