In my Spring Batch project, I used a JdbcCursorItemReader to read data and process it in parallel. I can run the batch locally without any problem.
I also read that JdbcPagingItemReader is recommended over JdbcCursorItemReader for parallel processing, because the cursor reader holds the connection for the whole read, while the paging reader can release the connection once a page has been read.
I then switched to JdbcPagingItemReader in step2, but to my surprise I got the exception below when running locally.
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 -
Connection is not available, request timed out after 300001ms.
However, the exception seems to occur in step1, before the paging reader in step2 is even executed, and that is the only change I made. Please shed some light on why the exception is thrown and whether it is good practice to use a paging reader instead of a cursor reader for parallel processing. Your help is much appreciated!
The code snippet is pasted below:
@Bean
@StepScope
public Flow createParallelSubFlow() {
List<Flow> subFlowList = new ArrayList<>();
List<Stream> streamList;
try {
streamList = dataSourceConfig.streamMapper()
.getStreamListByStatus(Constants.PENDING_STATUS_CD);
} catch (Exception e) {
// Fail fast rather than swallowing the exception and leaving streamList unassigned
throw new IllegalStateException("Could not load pending streams", e);
}
streamList.forEach(stream -> {
long id = stream.getStreamId();
String flowName = "stream" + id + "_flow";
Flow subFlow = new FlowBuilder<Flow>(flowName)
.start(step1(id))
.next(step2(id))
.end();
subFlowList.add(subFlow);
});
return new FlowBuilder<Flow>("splitFlow").split(new SimpleAsyncTaskExecutor())
.add(subFlowList.toArray(new Flow[0])).build();
}
public Step step1(long id) {
return stepBuilderFactory.get("step1")
.<Domain, Domain>chunk(100)
.reader(reader1(id))
.writer(writer1())
.build();
}
//@StepScope
//@Bean
public Step step2(long id) {
return stepBuilderFactory.get("step2")
.<Domain, Domain>chunk(100)
.reader(cursorReader2(id))
.processor(processor2)
.writer(writer2())
.build();
}
public JdbcCursorItemReader<Domain> cursorReader2(Long id) {
return new JdbcCursorItemReaderBuilder<Domain>()
.dataSource(dataSourceConfig.dataSource())
.name("cursorReader")
.sql(Constants.QUERY_SQL)
.preparedStatementSetter(new PreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps) throws SQLException {
ps.setLong(1, id);
}})
.rowMapper(new RowMapper())
.build();
}
//Switch from cursorReader2 to pagingReader2 in step2
public JdbcPagingItemReader<Domain> pagingReader2(Long id) {
return new JdbcPagingItemReaderBuilder<Domain>()
.dataSource(dataSourceConfig.dataSource())
.name("pagingReader")
.queryProvider(queryProvider())
.parameterValues(parameterValues(id))
.rowMapper(new RowMapper())
.pageSize(100)
.build();
}
@Bean
public PagingQueryProvider queryProvider() {
SqlPagingQueryProviderFactoryBean providerFactory = new SqlPagingQueryProviderFactoryBean();
Map<String, Order> sortKeys = new HashMap<>(2);
sortKeys.put("ID", Order.ASCENDING);
providerFactory.setDataSource(dataSourceConfig.dataSource());
providerFactory.setSelectClause("SELECT Clause");
providerFactory.setFromClause("FROM Clause");
providerFactory.setWhereClause("WHERE Clause");
providerFactory.setSortKeys(sortKeys);
PagingQueryProvider pagingQueryProvider = null;
try {
pagingQueryProvider = providerFactory.getObject();
} catch (Exception e) {
logger.error("Failed to get PagingQueryProvider", e);
throw new RuntimeException("Failed to get PagingQueryProvider", e);
}
return pagingQueryProvider;
}
private Map<String, Object> parameterValues(Long id) {
Map<String, Object> parameterValues = new HashMap<>();
parameterValues.put("1", id);
return parameterValues;
}
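For reference, here is a minimal sketch (not the project's actual wiring) of how the paging reader could be declared as a step-scoped bean so each parallel sub-flow gets its own late-bound instance; the stepExecutionContext key streamId is an assumption for illustration only:
@Bean
@StepScope
public JdbcPagingItemReader<Domain> scopedPagingReader(@Value("#{stepExecutionContext['streamId']}") Long id) {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("1", id); // positional parameter, mirroring parameterValues(id) above
    return new JdbcPagingItemReaderBuilder<Domain>()
            .dataSource(dataSourceConfig.dataSource())
            .name("pagingReader" + id) // unique reader name per stream so saved state does not collide
            .queryProvider(queryProvider())
            .parameterValues(parameterValues)
            .rowMapper(new RowMapper())
            .pageSize(100)
            .build();
}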
I'm using Spring Batch with Spring Cloud Task for remote partitioning, but each new job execution is created with the same task execution id. Is there any way to create a new task execution id for each new job execution?
In the Task Execution table, every job runs under the same parent execution id.
Each new job execution starts within the same task execution. The batch configuration code is as follows:
@Bean
public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer, Environment environment, DelegatingResourceLoader delegatingResourceLoader, TaskRepository taskRepository) {
Resource resource = delegatingResourceLoader.getResource(jarLocation);
DeployerPartitionHandler partitionHandler = new DeployerPartitionHandler(taskLauncher, jobExplorer, resource, "workerStep", taskRepository);
List<String> commandLineArguments = new ArrayList<>(5);
commandLineArguments.add("--spring.profiles.active=worker");
commandLineArguments.add("--spring.cloud.task.initialize.enable=false");
commandLineArguments.add("--spring.batch.initializer.enabled=false");
commandLineArguments.add("--spring.cloud.task.closecontext_enabled=true");
commandLineArguments.add("--logging.level=DEBUG");
partitionHandler.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(commandLineArguments));
partitionHandler.setEnvironmentVariablesProvider(new SimpleEnvironmentVariablesProvider(environment));
partitionHandler.setMaxWorkers(2);
partitionHandler.setApplicationName("BatchApplicationWorker");
return partitionHandler;
}
@Bean
@StepScope
public Partitioner partitioner(@Value("#{jobParameters['inputFiles']}") String file, @Value("#{jobParameters['partitionSize']}") String partitionSize1){
int partitionSize = Integer.parseInt(partitionSize1);
return new Partitioner() {
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> partitions = new HashMap<>();
String[] ids = fetchAllPrimaryKeys(file);
List<List<String>> partitionPayloads = splitPayLoad(ids, partitionSize);
int size = partitionPayloads.size();
for(int i = 0 ; i < size ; i++) {
ExecutionContext executionContext = new ExecutionContext();
executionContext.put("partitionNumber", i);
executionContext.put("partitionPayLoad", new ArrayList<>(partitionPayloads.get(i)));
partitions.put("partition" + i, executionContext);
}
return partitions;
}
};
}
@Bean
public Step masterStep(Step workerStep, PartitionHandler partitionHandler) {
return this.stepBuilderFactory.get("masterStep")
.partitioner(workerStep.getName(), partitioner(null, null))
.step(workerStep)
.partitionHandler(partitionHandler)
.build();
}
@Bean
public Step workerStep(CustomWriter customWriter, CustomProcessor customProcessor) {
return this.stepBuilderFactory.get("workerStep")
.<User,User>chunk(10000)
.reader(reader(null))
.processor(customProcessor)
.writer(customWriter)
.build();
}
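For context, a hypothetical sketch (not part of the original configuration) of what the step-scoped reader behind reader(null) could look like: it receives the partitionPayLoad list that partition() stored in each worker's step execution context. fetchUsersByIds is a placeholder for however the worker actually loads its slice of User records:
@Bean
@StepScope
public ItemReader<User> reader(@Value("#{stepExecutionContext['partitionPayLoad']}") List<String> ids) {
    // Each worker receives only the primary keys placed in its own partition's execution context.
    // fetchUsersByIds(...) is a hypothetical helper that loads the User records for those keys.
    return new ListItemReader<>(fetchUsersByIds(ids));
}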
@Bean
public Job batchJob(Step masterStep, JobExecutionListnerClass jobExecutionListnerClass, JobBuilderFactory jobBuilderFactory) {
return jobBuilderFactory.get("batchJob")
.incrementer(new RunIdIncrementer())
.start(masterStep)
.listener(jobExecutionListnerClass)
.build();
}
public Long jobRunner(JobParams jobParams) throws BatchException {
Map<String, JobParameter> maps = new HashMap<>();
maps.put(Constants.TIME, new JobParameter(System.currentTimeMillis()));
maps.put(Constants.INPUT_FILES, new JobParameter(jobParams.getInputSource()));
maps.put(Constants.PARTITION_SIZE, new JobParameter(Integer.toString(jobParams.getPartitionSize())));
maps.put(Constants.MAIL_RECIPIENTS, new JobParameter(jobParams.getMailRecipients()));
maps.put(Constants.JOB_NAME, new JobParameter(jobParams.getJobName()));
maps.put(Constants.JOB_DESCRIPTION, new JobParameter(jobParams.getJobDescription()));
maps.put(Constants.JOB_RESTART, new JobParameter(Boolean.toString(jobParams.getRestart())));
JobParameters jobParameters = new JobParameters(maps);
JobExecution jobExecution;
try {
jobExecution = jobLauncher.run(job, jobParameters);
} catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException
| JobParametersInvalidException e) {
throw new BatchException(e.getMessage());
}
return jobExecution.getId();
}
I am trying to use the process indicator pattern to make my job idempotent. I tried to use a write listener (afterWrite) to update the Mongo documents by setting a field PROCESSED: true. However, there are issues when there is a large number of chunks.
MongoDB item reader (10000 docs) ---chunk(1000)--> JDBC batch item writer (only 5000 rows are saved in the table after the step completes)
The following code defines the step:
@Bean
public MongoItemReader<X> Reader() throws Exception {
MongoItemReader<X> reader = new MongoItemReader<>();
reader.setTemplate(mongoTemplate);
reader.setCollection("MY_COLLECTION");
reader.setTargetType(X.class);
reader.setQuery("{PROCESSED: {$exists: false}}");
reader.setSort(new HashMap<String, Sort.Direction>() {{
put("_id", Sort.Direction.ASC);
}});
reader.afterPropertiesSet();
return reader;
}
@Bean
public XItemProcessor x_item_processor() {
return new XItemProcessor();
}
@Bean
public X_Item_Listener item_listener() {
return new X_Item_Listener();
}
@Bean
public X_Step_Listener step_listener() {
return new X_Step_Listener();
}
@Bean
public JdbcBatchItemWriter<Y> YWriter() {
JdbcBatchItemWriter<Y> Y_Writer = new JdbcBatchItemWriter<>();
Y_Writer.setDataSource(dataSource);
Y_Writer.setAssertUpdates(true);
Y_Writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
Y_Writer.setSql("INSERT INTO Y (Y1,Y2,Y3,Y4) VALUES (:y1, :y2, :y3, :y4)");
Y_Writer.afterPropertiesSet();
return Y_Writer;
}
@Bean
public Step XY_Step() throws Exception {
return stepBuilderFactory.get("XY")
.<X, Y>chunk(1000)
.reader(Reader())
.processor(x_item_processor())
.writer(YWriter())
.faultTolerant()
.skipLimit(Integer.MAX_VALUE)
.skip(Exception.class)
.listener((ItemProcessListener<? super X, ? super Y>) item_listener())
.listener(step_listener())
.build();
}
Here is a snippet of the code used in the afterWrite listener to update the Mongo documents:
@Autowired
private MongoTemplate mongoTemplate;
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void afterWrite(List<? extends Y> items) {
BulkOperations ops=mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED,"MY_COLLECTION");
for (Y item : items) {
Update update = new Update().set("PROCESSED", true);
ops.updateOne(new Query(Criteria.where("_id").is(item.getID())), update);
}
ops.execute();
}
Our writer is designed to write records to a relational database.
If an exception occurs on any of the records, Spring Batch performs a rollback and retries the write operation for each record in the chunk individually. This causes an SQL duplicate key exception, since records earlier in the chunk were already written to the database successfully.
We have tried using noRetry() and noRollback(), explicitly specifying the exceptions that should not trigger a retry or rollback.
According to the Spring Batch reference documentation, noRollback() can be used to prevent a rollback when an error occurs in the ItemWriter: https://docs.spring.io/spring-batch/4.1.x/reference/html/step.html#controllingRollback
However, this contradicts the Javadoc in the source code, which says that FaultTolerantStepBuilder.noRollback() is ignored during write:
https://docs.spring.io/spring-batch/4.1.x/api/index.html?org/springframework/batch/core/step/builder/FaultTolerantStepBuilder.html
Here is a sample of our Job definition:
#Bean("my-job")
public Job job(Step step) {
return jobBuilderFactory.get("my-job")
.start(step)
.build();
}
@Bean
public Step step() {
return stepBuilderFactory.get("skip-step")
.<String, String>chunk(3)
.reader(reader())
.processor(myprocessor())
.writer(this::write)
.faultTolerant()
.skipLimit(1)
.skip(JobSkippableException.class)
.noRollback(JobSkippableException.class)
.noRetry(JobSkippableException.class)
.processorNonTransactional()
.build();
}
public ItemReader<String> reader() {
return new ItemReader<String> () {
@Override
public String read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
String s = randomUUID().toString();
logger.debug("READ STRING {}", s);
return s;
}
};
}
public void write(List<? extends String> items) {
for(String s : items) {
logger.debug("WRITE STRING {}", s);
throw new JobSkippableException("My skippable exception");
}
}
public ItemProcessor <String, String> myprocessor() {
return new ItemProcessor<String, String>() {
@Override
public String process(String item) throws Exception {
logger.debug("PROCESS STRING {}", item);
return item;
}
};
}
Our expected behavior is that exceptions thrown in write() do not trigger a retry or rollback. That would prevent the repeated database calls and hence avoid the SQL duplicate key exception.
Not a solution, but at least an explanation of why the framework does not behave as you expect: it can be found in lines 335-350 of FaultTolerantChunkProcessor:
try {
doWrite(outputs.getItems());
}
catch (Exception e) {
if (rollbackClassifier.classify(e)) {
throw e;
}
/*
* If the exception is marked as no-rollback, we need to
* override that, otherwise there's no way to write the
* rest of the chunk or to honour the skip listener
* contract.
*/
throw new ForceRollbackForWriteSkipException(
"Force rollback on skippable exception so that skipped item can be located.", e);
}
I'm desperately looking for example code for the Esper CEP Kafka adapter. I've already installed Kafka and written data to a Kafka topic using a producer, and now I want to process it with Esper CEP. Unfortunately the Esper documentation for the Kafka adapter is not very helpful. Does anyone have a very simple example?
Edit:
So far I have added an adapter and it seems to work. However, I don't know how to read from the adapter, nor how to link a CEP pattern to it. This is my code so far:
config.addImport(KafkaOutputDefault.class);
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group.id");
props.put(EsperIOKafkaConfig.INPUT_SUBSCRIBER_CONFIG, EsperIOKafkaInputSubscriberByTopicList.class.getName());
props.put(EsperIOKafkaConfig.TOPICS_CONFIG, "test123");
props.put(EsperIOKafkaConfig.INPUT_PROCESSOR_CONFIG, EsperIOKafkaInputProcessorDefault.class.getName());
props.put(EsperIOKafkaConfig.INPUT_TIMESTAMPEXTRACTOR_CONFIG, EsperIOKafkaInputTimestampExtractorConsumerRecord.class.getName());
Configuration config2 = new Configuration();
config2.addPluginLoader("KafkaInput", EsperIOKafkaInputAdapterPlugin.class.getName(), props, null);
EsperIOKafkaInputAdapter adapter = new EsperIOKafkaInputAdapter(props, "default");
adapter.start();
I've had the same problem. I created a sample project you can have a look at, especially the plain-esper branch.
An even more simplified version would be:
public class KafkaExample implements Runnable {
private String runtimeURI;
public KafkaExample(String runtimeURI) {
this.runtimeURI = runtimeURI;
}
public static void main(String[] args){
new KafkaExample("KafkaExample").run();
}
@Override
public void run() {
Configuration configuration = new Configuration();
configuration.getCommon().addImport(KafkaOutputDefault.class);
configuration.getCommon().addEventType(String.class);
Properties consumerProps = new Properties();
// Kafka Consumer Properties
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, OffsetResetStrategy.EARLIEST.toString().toLowerCase());
// EsperIO Kafka Input Adapter Properties
consumerProps.put(EsperIOKafkaConfig.INPUT_SUBSCRIBER_CONFIG, Consumer.class.getName());
consumerProps.put(EsperIOKafkaConfig.INPUT_PROCESSOR_CONFIG, InputProcessor.class.getName());
consumerProps.put(EsperIOKafkaConfig.INPUT_TIMESTAMPEXTRACTOR_CONFIG, EsperIOKafkaInputTimestampExtractorConsumerRecord.class.getName());
configuration.getRuntime().addPluginLoader("KafkaInput", EsperIOKafkaInputAdapterPlugin.class.getName(), consumerProps, null);
String stmt = "#name('sampleQuery') select * from String";
EPCompiled compiled;
try {
compiled = EPCompilerProvider.getCompiler().compile(stmt, new CompilerArguments(configuration));
} catch (EPCompileException ex) {
throw new RuntimeException(ex);
}
EPRuntime runtime = EPRuntimeProvider.getRuntime(runtimeURI, configuration);
EPDeployment deployment;
try {
deployment = runtime.getDeploymentService().deploy(compiled, new DeploymentOptions().setDeploymentId(UUID.randomUUID().toString()));
} catch (EPDeployException ex) {
throw new RuntimeException(ex);
}
EPStatement statement = runtime.getDeploymentService().getStatement(deployment.getDeploymentId(), "sampleQuery");
statement.addListener((newData, oldData, sta, run) -> {
for (EventBean nd : newData) {
System.out.println(nd.getUnderlying());
}
});
while (true) {}
}
}
public class Consumer implements EsperIOKafkaInputSubscriber {
@Override
public void subscribe(EsperIOKafkaInputSubscriberContext context) {
Collection<String> collection = new ArrayList<String>();
collection.add("input");
context.getConsumer().subscribe(collection);
}
}
public class InputProcessor implements EsperIOKafkaInputProcessor {
private EPRuntime runtime;
@Override
public void init(EsperIOKafkaInputProcessorContext context) {
this.runtime = context.getRuntime();
}
@Override
public void process(ConsumerRecords<Object, Object> records) {
for (ConsumerRecord record : records) {
if (record.value() != null) {
try {
runtime.getEventService().sendEventBean(record.value().toString(), "String");
} catch (Exception e) {
throw e;
}
}
}
}
public void close() {}
}
Sample code follows. This code assumes there are already some messages in the topic. This does not loop and wait for more messages.
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, ip);
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(Collections.singletonList("test123")); // subscribe to the topic before polling
ConsumerRecords<String, String> rows = consumer.poll(1000);
Iterator<ConsumerRecord<String, String>> it = rows.iterator();
while (it.hasNext()) {
ConsumerRecord<String, String> row = it.next();
MyEvent event = new MyEvent(row.value()); // transform string to event
// process event
runtime.sendEvent(event);
}
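If the consumer should keep waiting for new messages, the same loop can be wrapped in a continuous poll; a minimal sketch, assuming a Kafka client version that supports the Duration-based poll:
while (true) {
    // Block for up to one second waiting for new records, then process whatever arrived
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    for (ConsumerRecord<String, String> record : records) {
        MyEvent event = new MyEvent(record.value()); // transform string to event, as above
        runtime.sendEvent(event);
    }
}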
I am trying to create a Spring Batch POC with Java configuration and PostgreSQL.
I have successfully created the beans that would otherwise have been provided via the in-memory DB using @EnableBatchProcessing and @EnableAutoConfiguration.
I am not able to get the JobExplorer bean to return a list of JobExecutions for a JobInstance obtained from that same JobExplorer.
The error I am getting is "Unable to deserialize the execution context", which seems to come from the method trying to deserialize the SHORT_CONTEXT field of the JOB_EXECUTION_CONTEXT table.
I passed the JobExplorer factory bean a DefaultExecutionContextSerializer, and later also a DefaultLobHandler with wrapAsLob set to true, but I was still getting the error.
@Bean
public JobRegistry jobRegistry() {
JobRegistry jr = new MapJobRegistry();
return jr;
}
@Bean
public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor() {
JobRegistryBeanPostProcessor jrbpp = new JobRegistryBeanPostProcessor();
jrbpp.setJobRegistry(jobRegistry());
return jrbpp;
}
@Bean
public JobOperator jobOperator() {
SimpleJobOperator sjo = new SimpleJobOperator();
sjo.setJobExplorer(jobExplorer());
sjo.setJobLauncher(jobLauncher());
sjo.setJobRegistry(jobRegistry());
sjo.setJobRepository(jobRepository());
return sjo;
}
@Bean
public JobExplorer jobExplorer() {
JobExplorerFactoryBean jefb = new JobExplorerFactoryBean();
jefb.setDataSource(dataSource());
jefb.setJdbcOperations(jdbcTemplate);
jefb.setTablePrefix("batch_");
jefb.setSerializer(new DefaultExecutionContextSerializer());
DefaultLobHandler lh = new DefaultLobHandler();
lh.setWrapAsLob(true);
jefb.setLobHandler(lh);
JobExplorer je = null;
try {
je = jefb.getObject();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return je;
}
@ConfigurationProperties(prefix = "spring.datasource")
@Bean
@Primary
public DataSource dataSource() {
return DataSourceBuilder.create().build();
}
@Bean
public JobRepository jobRepository() {
JobRepositoryFactoryBean jrfb = new JobRepositoryFactoryBean();
jrfb.setDataSource(dataSource());
jrfb.setDatabaseType("POSTGRES");
jrfb.setTransactionManager(new ResourcelessTransactionManager());
jrfb.setSerializer(new DefaultExecutionContextSerializer());
jrfb.setTablePrefix("batch_");
JobRepository jr = null;
try {
jr = (JobRepository)jrfb.getObject();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return jr;
}
Below is the GET method in my REST controller where I am trying to generate a list of failed job executions:
@Autowired
JobLauncher jobLauncher;
@Autowired
JobRegistry jobRegistry;
@Autowired
JobOperator jobOperator;
@Autowired
JobExplorer jobExplorer;
@SuppressWarnings("unchecked")
@GetMapping("batch/failedJobs")
public Map<String, List<JobExecution>> getFailedJobs() {
try {
if (jobRegistry == null || jobOperator == null || jobExplorer == null) {
System.out.println("job registry, operator or explorer is null");
} else {
Map<String, List<JobExecution>> allJobInstances = new HashMap<String, List<JobExecution>>();
// Get all jobs
jobRegistry.getJobNames().stream().forEach(jobName -> {
jobExplorer.getJobInstances(jobName, 1, 1000).forEach(l -> {
System.out.println("jobName: " + jobName + " instance: " + l);
});
jobExplorer.getJobInstances(jobName, 1, 1000).stream().forEach(jobInstance -> {
List<JobExecution> execultionList = jobExplorer.getJobExecutions(jobInstance); //Failing here
if (execultionList != null) {
System.out.println("" + execultionList);
execultionList.stream().forEach(l2 -> {
System.out.println("jobName: " + jobName + " instance: " + jobInstance
+ " jobExecution: " + l2);
});
if(allJobInstances.get(jobName) == null) {
allJobInstances.put(jobName, new ArrayList<JobExecution>());
}
allJobInstances.get(jobName).addAll(jobExplorer.getJobExecutions(jobInstance).stream().filter(e -> e.getStatus().equals(BatchStatus.FAILED)).collect(Collectors.toList()));
}else {
System.out.println("Could not get jobExecution for jobName " + jobName + " jobInstance: " + jobInstance);
}
});
});
return allJobInstances;
}
}catch (Exception e) {
System.out.println(e.getMessage());
logger.info(e.getMessage());
}
return null;
}
I fixed a similar issue by changing to the Jackson2 serializer:
jefb.setSerializer(new Jackson2ExecutionContextStringSerializer());
You may try it.
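If you go that route, note that the execution context is both written (by the JobRepository) and read (by the JobExplorer), so the same serializer usually has to be set on both factory beans; rows already stored with the old serializer may still fail to deserialize. A minimal sketch against the beans shown above:
// In jobExplorer(): read execution contexts with the Jackson2-based serializer
jefb.setSerializer(new Jackson2ExecutionContextStringSerializer());
// In jobRepository(): write execution contexts with the same serializer
jrfb.setSerializer(new Jackson2ExecutionContextStringSerializer());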