How to commit the offsets when using KafkaItemReader in a Spring Batch job, once all the messages are processed and written to the .dat file? - apache-kafka

I have developed a Spring Batch job which reads from a Kafka topic using the KafkaItemReader class. I want to commit the offsets only when the messages read in a defined chunk are processed and written successfully to an output .dat file.
@Bean
public Job kafkaEventReformatjob(
        @Qualifier("MainStep") Step MainStep,
        @Qualifier("moveFileToFolder") Step moveFileToFolder,
        @Qualifier("compressFile") Step compressFile,
        JobExecutionListener listener) {
    return jobBuilderFactory.get("kafkaEventReformatJob")
            .listener(listener)
            .incrementer(new RunIdIncrementer())
            .flow(MainStep)
            .next(moveFileToFolder)
            .next(compressFile)
            .end()
            .build();
}

@Bean
Step MainStep(
        ItemProcessor<IncomingRecord, List<Record>> flatFileItemProcessor,
        ItemWriter<List<Record>> flatFileWriter) {
    return stepBuilderFactory.get("mainStep")
            .<IncomingRecord, List<Record>>chunk(5000)
            .reader(kafkaItemReader())
            .processor(flatFileItemProcessor)
            .writer(writer())
            .listener(basicStepListener)
            .build();
}
// The reader reads all the messages from the Kafka topic and returns them as IncomingRecord objects.
@Bean
KafkaItemReader<String, IncomingRecord> kafkaItemReader() {
    Properties props = new Properties();
    props.putAll(this.properties.buildConsumerProperties());
    List<Integer> partitions = new ArrayList<>();
    partitions.add(0);
    partitions.add(1);
    return new KafkaItemReaderBuilder<String, IncomingRecord>()
            .partitions(partitions)
            .consumerProperties(props)
            .name("records")
            .saveState(true)
            .topic(topic)
            .pollTimeout(Duration.ofSeconds(40L))
            .build();
}
@Bean
public ItemWriter<List<Record>> writer() {
    ListUnpackingItemWriter<Record> listUnpackingItemWriter = new ListUnpackingItemWriter<>();
    listUnpackingItemWriter.setDelegate(flatWriter());
    return listUnpackingItemWriter;
}

public ItemWriter<Record> flatWriter() {
    FlatFileItemWriter<Record> fileWriter = new FlatFileItemWriter<>();
    String tempFileName = "abc";
    LOGGER.info("Output File name " + tempFileName + " is in working directory ");
    String workingDir = service.getWorkingDir().toAbsolutePath().toString();
    Path outputFile = Paths.get(workingDir, tempFileName);
    fileWriter.setName("fileWriter");
    fileWriter.setResource(new FileSystemResource(outputFile.toString()));
    fileWriter.setLineAggregator(lineAggregator());
    fileWriter.setForceSync(true);
    fileWriter.setFooterCallback(customFooterCallback());
    fileWriter.close();
    LOGGER.info("Successfully created the file writer");
    return fileWriter;
}
@StepScope
@Bean
public TransformProcessor processor() {
    return new TransformProcessor();
}
==============================================================================
Writer Class
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    this.stepExecution = stepExecution;
}

@AfterStep
public void afterStep(StepExecution stepExecution) {
    this.stepExecution.setWriteCount(count);
}

@Override
public void write(final List<? extends List<Record>> lists) throws Exception {
    List<Record> consolidatedList = new ArrayList<>();
    for (List<Record> list : lists) {
        if (null != list && !list.isEmpty()) {
            consolidatedList.addAll(list);
        }
    }
    delegate.write(consolidatedList);
    count += consolidatedList.size(); // to count Trailer record count
}
===============================================================
Item Processor
@Override
public List<Record> process(IncomingRecord record) {
    List<Record> recordList = new ArrayList<>();
    if (null != record.getEventName() /* and a few other conditions inside this section */) {
        // set the values of the Record class by extracting them from the IncomingRecord,
        // then add the valid records matching the condition:
        // recordList.add(...);
        return recordList;
    } else {
        return null;
    }
}

Synchronizing a read operation and a write operation between two transactional resources (a queue and a database, for instance) is possible by using a JTA transaction manager that coordinates both transaction managers (2PC protocol).
However, this approach is not possible if one of the resources is not transactional (like the majority of file systems). So unless you use a transactional file system and a JTA transaction manager that coordinates a Kafka transaction manager and a file-system transaction manager, you need another approach, like the Compensating Transaction pattern. In your case, the "undo" operation (the compensating action) would be rewinding the offset to where it was before the failed chunk.
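For illustration, here is a minimal sketch of that compensating action as a Spring Batch ChunkListener. It assumes you drive your own KafkaConsumer (the KafkaItemReader does not expose its internal consumer) with enable.auto.commit disabled; the class name OffsetRewindChunkListener is made up for this example:
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

/**
 * Compensating-action sketch: remember the consumer position before each chunk
 * and seek back to it if the chunk fails, so the messages are re-read instead
 * of being skipped. Offsets are committed only after a successful chunk.
 */
public class OffsetRewindChunkListener implements ChunkListener {

    private final KafkaConsumer<String, IncomingRecord> consumer; // your own consumer, not the one inside KafkaItemReader
    private final Map<TopicPartition, Long> positionsBeforeChunk = new HashMap<>();

    public OffsetRewindChunkListener(KafkaConsumer<String, IncomingRecord> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void beforeChunk(ChunkContext context) {
        positionsBeforeChunk.clear();
        for (TopicPartition tp : consumer.assignment()) {
            positionsBeforeChunk.put(tp, consumer.position(tp));
        }
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // the chunk (processing + file write) succeeded, so it is now safe to commit
        Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            toCommit.put(tp, new OffsetAndMetadata(consumer.position(tp)));
        }
        consumer.commitSync(toCommit);
    }

    @Override
    public void afterChunkError(ChunkContext context) {
        // "undo": rewind to the positions recorded before the failed chunk
        positionsBeforeChunk.forEach(consumer::seek);
    }
}
You would register such a listener on the step with .listener(...) next to the reader and writer.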

Related

Pass filenames dynamically to FlatFileItemWriter through StepBuilderFactory stream() when using ClassifierCompositeItemProcessor in SpringBatch

I'm processing multiple input files with multi-format lines using ClassifierCompositeItemProcessor. But when registering the writers as streams on the StepBuilderFactory, I'm unable to pass the Resource filename dynamically. The filename should be the respective input file name. Any help would be much appreciated.
Input File 1 (data-111111-12323.txt)
1#9999999#00001#2#RecordType1
2#00002#June#Statement#2020#9#RecordType2
3#7777777#RecordType3
Input File 2 (data-22222-23244.txt)
1#435435#00002#2#RecordType1
2#345435#July#Statement#2021#9#RecordType2
3#645456#RecordType3
Expected output file 1 (data-111111-12323.txt)
1#9999999#00001#2#RecordType1#mobilenumber1
2#00002#June#Statement#2020#9#RecordType2#mobilenumber2
3#7777777#RecordType3#mobilenumber3
Expected output file 2 (data-22222-23244.txt)
1#9999999#00001#2#RecordType1#mobilenumber1
2#00002#June#Statement#2020#9#RecordType2#mobilenumber2
3#7777777#RecordType3#mobilenumber3
Step
public Step partitionStep() throws Exception {
    ItemReader reader = context.getBean(FlatFileItemReader.class);
    ClassifierCompositeItemWriter writer = context.getBean(ClassifierCompositeItemWriter.class);
    return stepBuilderFactory.get("statementProcessingStep.slave").<String, String>chunk(12)
            .reader(reader)
            .processor(processor())
            .writer(writer)
            .stream(recordType0FlatFileItemWriter())
            .stream(recordType1FlatFileItemWriter())
            .build();
}
Processor
@Bean
@StepScope
public ItemProcessor processor() {
    ClassifierCompositeItemProcessor<? extends RecordType, ? extends RecordType> processor = new ClassifierCompositeItemProcessor<>();
    SubclassClassifier classifier = new SubclassClassifier();
    Map typeMap = new HashMap();
    typeMap.put(RecordType0.class, recordType0Processor);
    typeMap.put(RecordType1.class, recordType1Processor);
    classifier.setTypeMap(typeMap);
    processor.setClassifier(classifier);
    return processor;
}
Writer
@Bean
public FlatFileItemWriter<RecordType1> recordType1FlatFileItemWriter() throws Exception {
    FlatFileItemWriter<RecordType1> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("record1.txt")); // This filename should be dynamic
    writer.setAppendAllowed(true);
    writer.setLineAggregator(new DelimitedLineAggregator<RecordType1>() {{
        setDelimiter("#");
        setFieldExtractor(new BeanWrapperFieldExtractor<RecordType1>() {{
            setNames(new String[] { "RecordType", "ID1", "ID2", "ID3" });
        }});
    }});
    return writer;
}
You can make your item reader/writer step-scoped and inject values from job parameters or step/job execution context using late-binding. For example:
@StepScope
@Bean
public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters['input.file.name']}") String name) {
    return new FlatFileItemReaderBuilder<Foo>()
            .name("flatFileItemReader")
            .resource(new FileSystemResource(name))
            .build();
}
You can find more details in the Late Binding of Job and Step Attributes section of the reference documentation.
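Applied to the writer in the question, a step-scoped variant could look like the sketch below. The 'output.file.name' key is an assumption: you would have to put the desired file name into the step execution context (or pass it as a job parameter) yourself.
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
@StepScope
public FlatFileItemWriter<RecordType1> recordType1FlatFileItemWriter(
        @Value("#{stepExecutionContext['output.file.name']}") String outputFileName) {
    FlatFileItemWriter<RecordType1> writer = new FlatFileItemWriter<>();
    writer.setName("recordType1FlatFileItemWriter");
    // The resource is resolved at step execution time, so each step execution can write its own file.
    writer.setResource(new FileSystemResource(outputFileName));
    writer.setAppendAllowed(true);
    writer.setLineAggregator(new DelimitedLineAggregator<RecordType1>() {{
        setDelimiter("#");
        setFieldExtractor(new BeanWrapperFieldExtractor<RecordType1>() {{
            setNames(new String[] { "RecordType", "ID1", "ID2", "ID3" });
        }});
    }});
    return writer;
}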

Kafka: Consumer api: Regression test fails if runs in a group (sequentially)

I have implemented a Kafka application using the consumer API. And I have 2 regression tests implemented with the stream API:
To test the happy path: the test produces data (into the input topic that the application is listening to), the application consumes it and produces data (into the output topic), and the test consumes that and validates it against the expected output data.
To test the error path: the behavior is the same as above, except this time the application produces data into the output topic, and the test consumes from the application's error topic and validates it against the expected error output.
My code and the regression-test code reside in the same project under the expected directory structure. Both times (for both tests), data should have been picked up by the same listener on the application side.
The problem is :
When I execute the tests individually (manually), each test passes. However, if I execute them together but sequentially (for example: gradle clean build), only the first test passes. The 2nd test fails after the test-side consumer polls for data and, after some time, gives up without finding any data.
Observation:
From debugging, it looks like the 1st time everything works perfectly (test-side and application-side producers and consumers). However, during the 2nd test it seems that the application-side consumer is not receiving any data (it seems that the test-side producer is producing data, but I can't say that for sure), and hence no data is being produced into the error topic.
What I have tried so far:
After investigation, my understanding is that we are getting into race conditions, and to avoid that I found suggestions like:
use @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
tear down the broker after each test (please see the .destroy() on the brokers)
use different topic names for each test
I applied all of them and still could not recover from my issue.
I am providing the code here for perusal. Any insight is appreciated.
Code for 1st test (Testing error path):
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
@EmbeddedKafka(
        partitions = 1,
        controlledShutdown = false,
        topics = {
                AdapterStreamProperties.Constants.INPUT_TOPIC,
                AdapterStreamProperties.Constants.ERROR_TOPIC
        },
        brokerProperties = {
                "listeners=PLAINTEXT://localhost:9092",
                "port=9092",
                "log.dir=/tmp/data/logs",
                "auto.create.topics.enable=true",
                "delete.topic.enable=true"
        }
)
public class AbstractIntegrationFailurePathTest {

    private final int retryLimit = 0;

    @Autowired
    protected EmbeddedKafkaBroker embeddedFailurePathKafkaBroker;

    // To produce data
    @Autowired
    protected KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate;

    // To read from the error output
    @Autowired
    protected Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer;

    // Service to execute notification-preference
    @Autowired
    protected AdapterStreamProperties projectProerties;

    protected void subscribe(Consumer consumer, String topic, int attempt) {
        try {
            embeddedFailurePathKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
        } catch (ComparisonFailure ex) {
            if (attempt < retryLimit) {
                subscribe(consumer, topic, attempt + 1);
            }
        }
    }
}
.
@TestConfiguration
public class AdapterStreamFailurePathTestConfig {

    @Autowired
    private EmbeddedKafkaBroker embeddedKafkaBroker;

    @Value("${spring.kafka.adapter.application-id}")
    private String applicationId;

    @Value("${spring.kafka.adapter.group-id}")
    private String groupId;

    // Producer of the records that the program consumes
    @Bean
    public Map<String, Object> sendEmailCmdProducerConfigs() {
        Map<String, Object> results = KafkaTestUtils.producerProps(embeddedKafkaBroker);
        results.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                AdapterStreamProperties.Constants.KEY_SERDE.serializer().getClass());
        results.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                AdapterStreamProperties.Constants.INPUT_VALUE_SERDE.serializer().getClass());
        return results;
    }

    @Bean
    public ProducerFactory<PreferredMediaMsgKey, SendEmailCmd> inputProducerFactory() {
        return new DefaultKafkaProducerFactory<>(sendEmailCmdProducerConfigs());
    }

    @Bean
    public KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate() {
        return new KafkaTemplate<>(inputProducerFactory());
    }

    // Consumer of the error output, generated by the program
    @Bean
    public Map<String, Object> outputErrorConsumerConfig() {
        Map<String, Object> props = KafkaTestUtils.consumerProps(
                applicationId, Boolean.TRUE.toString(), embeddedKafkaBroker);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                AdapterStreamProperties.Constants.KEY_SERDE.deserializer().getClass().getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                AdapterStreamProperties.Constants.ERROR_VALUE_SERDE.deserializer().getClass().getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return props;
    }

    @Bean
    public Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer() {
        DefaultKafkaConsumerFactory<PreferredMediaMsgKey, ErrorCmd> rpf =
                new DefaultKafkaConsumerFactory<>(outputErrorConsumerConfig());
        return rpf.createConsumer(groupId, "notification-failure");
    }
}
.
@RunWith(SpringRunner.class)
@SpringBootTest(classes = AdapterStreamFailurePathTestConfig.class)
@ActiveProfiles(profiles = "errtest")
public class ErrorPath400Test extends AbstractIntegrationFailurePathTest {

    @Autowired
    private DataGenaratorForErrorPath400Test datagen;

    @Mock
    private AdapterHttpClient httpClient;

    @Autowired
    private ErroredEmailCmdDeserializer erroredEmailCmdDeserializer;

    @Before
    public void setup() throws InterruptedException {
        Mockito.when(httpClient.callApi(Mockito.any()))
                .thenReturn(new GenericResponse(400, TestConstants.ERROR_MSG_TO_CHK));
        Mockito.when(httpClient.createURI(Mockito.any(), Mockito.any(), Mockito.any())).thenCallRealMethod();
        inputProducerTemplate.send(
                projectProerties.getInputTopic(),
                datagen.getKey(),
                datagen.getEmailCmdToProduce());
        System.out.println("producer: " + projectProerties.getInputTopic());
        subscribe(outputErrorConsumer, projectProerties.getErrorTopic(), 0);
    }

    @Test
    public void testWithError() throws InterruptedException, InvalidProtocolBufferException, TextFormat.ParseException {
        ConsumerRecords<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd> records;
        List<ConsumerRecord<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd>> outputListOfErrors = new ArrayList<>();
        int attempt = 0;
        int expectedRecords = 1;
        do {
            records = KafkaTestUtils.getRecords(outputErrorConsumer);
            records.forEach(outputListOfErrors::add);
            attempt++;
        } while (attempt < expectedRecords && outputListOfErrors.size() < expectedRecords);

        // Verify the recipient event stream size
        Assert.assertEquals(expectedRecords, outputListOfErrors.size());
        // Validate output
    }

    @After
    public void tearDown() {
        outputErrorConsumer.close();
        embeddedFailurePathKafkaBroker.destroy();
    }
}
The 2nd test has almost the same structure, although this time the test-side consumer consumes from the application-side output topic (instead of the error topic). And I named the consumers, broker, producer, and topics differently. Like:
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
@EmbeddedKafka(
        partitions = 1,
        controlledShutdown = false,
        topics = {
                AdapterStreamProperties.Constants.INPUT_TOPIC,
                AdapterStreamProperties.Constants.OUTPUT_TOPIC
        },
        brokerProperties = {
                "listeners=PLAINTEXT://localhost:9092",
                "port=9092",
                "log.dir=/tmp/data/logs",
                "auto.create.topics.enable=true",
                "delete.topic.enable=true"
        }
)
public class AbstractIntegrationSuccessPathTest {

    private final int retryLimit = 0;

    @Autowired
    protected EmbeddedKafkaBroker embeddedKafkaBroker;

    // To produce data
    @Autowired
    protected KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> sendEmailCmdProducerTemplate;

    // To read from the regular output topic
    @Autowired
    protected Consumer<PreferredMediaMsgKey, NotifiedEmailCmd> ouputConsumer;

    // Service to execute notification-preference
    @Autowired
    protected AdapterStreamProperties projectProerties;

    protected void subscribe(Consumer consumer, String topic, int attempt) {
        try {
            embeddedKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
        } catch (ComparisonFailure ex) {
            if (attempt < retryLimit) {
                subscribe(consumer, topic, attempt + 1);
            }
        }
    }
}
Please let me know if I should provide any more information.
"port=9092"
Don't use a fixed port; leave that out and the embedded broker will use a random port; the consumer configs are set up in KafkaTestUtils to point to the random port.
You shouldn't need to dirty the context after each test method - use a different group.id for each test and a different topic.
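Applied to the failure-path test above, that advice could look like this sketch (the rest of the class stays as shown earlier):
// Sketch: no brokerProperties pinning the listener/port, so the embedded broker picks a
// free port and KafkaTestUtils.producerProps(...)/consumerProps(...) resolve it automatically
// from the EmbeddedKafkaBroker; no @DirtiesContext, relying instead on per-test topics and group ids.
@EmbeddedKafka(
        partitions = 1,
        controlledShutdown = false,
        topics = {
                AdapterStreamProperties.Constants.INPUT_TOPIC,
                AdapterStreamProperties.Constants.ERROR_TOPIC
        }
)
public class AbstractIntegrationFailurePathTest {
    // ... body unchanged from the class shown above ...
}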
In my case the consumer was not closed properly. I had to do:
@After
public void tearDown() {
    // shutdown hook to correctly close the streams application
    Runtime.getRuntime().addShutdownHook(new Thread(ouputConsumer::close));
}
to resolve it.

Custom ItemReader for reading each iteration of a for loop

I recently started working with Spring Batch chunk-based processing. I need to create a batch job for generating 3 million random strings of different types and counts, like a million strings starting with A and the next 1.5 million ending with GH. I cannot do it with a for loop as it will block a thread. I also have to write them to a db. I have no idea how to make my ItemReader read each iteration.
I have understood the custom ItemReader through this,
but I am not getting what the "item" should be here.
If I create a chunk of 1000, how will the counters be handled, i.e. the chunk entry count and the generated-string count?
The ItemReader interface is quite similar to an Iterator and, like one, has to maintain its iteration state internally. The Spring Batch framework even provides an IteratorItemReader.
Since Java 8 there are Streams, which I find quite versatile for generating data as well, and which can be converted into an Iterator.
Here is a possible solution along the lines of what you described:
@RunWith(SpringRunner.class)
@SpringBootTest(classes = Java8StreamReaderTest.TestConfig.class, properties = {"spring.batch.job.enabled=false"})
@EnableBatchProcessing
public class Java8StreamReaderTest {

    @Configuration
    static class TestConfig {

        @Bean
        JobBuilderFactory jobBuilderFactory(final JobRepository jobRepository) {
            return new JobBuilderFactory(jobRepository);
        }

        @Bean
        StepBuilderFactory stepBuilderFactory(final JobRepository jobRepository, final PlatformTransactionManager transactionManager) {
            return new StepBuilderFactory(jobRepository, transactionManager);
        }

        @Bean
        Job streamJob() {
            return jobBuilderFactory(null).get("streamJob")
                    .start(stepBuilderFactory(null, null).get("streamStep")
                            .<String, String>chunk(1000)
                            .reader(streamReader())
                            .writer(listWriter())
                            .build())
                    .build();
        }

        @Bean
        ListItemWriter<String> listWriter() {
            return new ListItemWriter<>();
        }

        @Bean
        ItemReader<String> streamReader() {
            return new IteratorItemReader<String>(stream().iterator());
        }

        @Bean
        Stream<String> stream() {
            return Stream.of(
                    IntStream.range(0, 100000).boxed().map(i -> "A" + RandomStringUtils.random(10, true, false)),
                    IntStream.range(0, 100000).boxed().map(i -> "B" + RandomStringUtils.random(10, true, false)),
                    IntStream.range(0, 100000).boxed().map(i -> RandomStringUtils.random(10, true, false) + "GH")
            )
            .flatMap(s -> s);
        }
    }

    @Autowired
    Job streamJob;

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    ListItemWriter<String> listWriter;

    @Test
    public void shouldExecuteTestJob() throws Exception {
        JobExecution execution = jobLauncher.run(streamJob, new JobParametersBuilder().toJobParameters());
        assertEquals(ExitStatus.COMPLETED, execution.getExitStatus());
        assertThat(listWriter.getWrittenItems(), hasSize(300000));
    }
}

Kafka + Spring Batch Listener Flush Batch

Using Kafka Broker: 1.0.1
spring-kafka: 2.1.6.RELEASE
I'm using a batched consumer with the following settings:
// Other settings are not shown..
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
I use a Spring listener in the following way:
@KafkaListener(topics = "${topics}", groupId = "${consumer.group.id}")
public void receive(final List<String> data,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) final List<Integer> partitions,
        @Header(KafkaHeaders.RECEIVED_TOPIC) Set<String> topics,
        @Header(KafkaHeaders.OFFSET) final List<Long> offsets) { // ......code... }
I always find that a few messages remain in the batch and are not received in my listener. It appears that if the remaining messages are fewer than the batch size, they aren't consumed (maybe held in memory and not published to my listener). Is there any way to have a setting to auto-flush the batch after a time interval so as to avoid messages not being flushed?
What's the best way to deal with such kind of situation with a batch consumer?
I just ran a test without any problems...
@SpringBootApplication
public class So50370851Application {

    public static void main(String[] args) {
        SpringApplication.run(So50370851Application.class, args);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        return args -> {
            for (int i = 0; i < 230; i++) {
                template.send("so50370851", "foo" + i);
            }
        };
    }

    @KafkaListener(id = "foo", topics = "so50370851")
    public void listen(List<String> in) {
        System.out.println(in.size());
    }

    @Bean
    public NewTopic topic() {
        return new NewTopic("so50370851", 1, (short) 1);
    }
}
and
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.max-poll-records=100
spring.kafka.listener.type=batch
and
100
100
30
Also, the debug logs show after a while that it is polling and fetching 0 records (and this gets repeated over and over).
That implies the problem is on the sending side.
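One way to verify that (a sketch, not part of the original answer) is to block on each send's future in the runner above, so a failed or timed-out send throws instead of passing silently:
@Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template) {
    return args -> {
        for (int i = 0; i < 230; i++) {
            // .get(...) surfaces any send failure (broker unreachable, serialization error, timeout)
            // immediately instead of letting it fail silently in the background
            template.send("so50370851", "foo" + i).get(10, java.util.concurrent.TimeUnit.SECONDS);
        }
    };
}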

Spring batch partitioning is not working

I am using Spring Batch partitioning to merge data from a group of related flat files into a single file. The batch is failing with the two issues below:
The first slave step thread is failing because data is written to the file writer before it is opened. The value of the variable inputFileNames (step context data provided by the partitioner) for this thread is [20002", 20003]
The second slave step thread is failing because the partitioning data is missing from the step context. The value of the variable inputFileNames for this thread is null
Please let me know if I am missing something in the configuration.
// log with Error info
2015-12-26 17:59:14,165 DEBUG [SimpleAsyncTaskExecutor-1] c.d.d.b.r.ReaderConfiguration [ReaderBatchConfiguration.java:473] inputFileNames ----[20002", 20003]
2015-12-26 17:59:14,165 DEBUG [SimpleAsyncTaskExecutor-1] c.d.d.b.r.BatchConfiguration [BatchConfiguration.java:389] consumer ----p2
2015-12-26 17:59:14,275 ERROR [SimpleAsyncTaskExecutor-1] o.s.b.c.s.AbstractStep [AbstractStep.java:225] Encountered an error executing step testConsumersInputFileMergeStep in job testFileForInputJob
org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to
at org.springframework.batch.item.file.FlatFileItemWriter.write(FlatFileItemWriter.java:255) ~[spring-batch-infrastructure-3.0.3.RELEASE.jar:3.0.3.RELEASE]
2015-12-26 18:00:14,421 DEBUG [SimpleAsyncTaskExecutor-2] c.d.d.b.r.ReaderBatchConfiguration [ReaderConfiguration.java:474] inputFileNames ----null
// Partitioner
public class ProvisioningInputFilePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> filesToProcess = getFilesToProcess(outboundSourceFolder);
        Map<String, ExecutionContext> execCtxs = new HashMap<>();
        for (Entry<String, ExecutionContext> entry : filesToProcess.entrySet()) {
            execCtxs.put(entry.getKey(), entry.getValue());
        }
        return execCtxs;
    }

    private Map<String, ExecutionContext> getFilesToProcess(String outboundSourceFolder2) {
        Map<String, ExecutionContext> contexts = new HashMap<>();
        ExecutionContext execCtx1 = new ExecutionContext();
        List<String> inputFileNames1 = Arrays.asList("20001", "22222");
        execCtx1.put("consumer", "p1");
        execCtx1.put("inputFileNames", inputFileNames1);
        contexts.put("p1", execCtx1);

        ExecutionContext execCtx2 = new ExecutionContext();
        List<String> inputFileNames2 = Arrays.asList("20002", "20003");
        execCtx1.put("consumer", "p2");
        execCtx1.put("inputFileNames", inputFileNames2);
        contexts.put("p2", execCtx2);

        return contexts;
    }
}
// Writer
@Bean
@StepScope
public ItemWriter<String> testConsumerFileItemWriter(@Value("#{stepExecutionContext[consumer]}") String consumer) {
    logger.debug("consumer ----" + consumer);
    FileSystemResource fileSystemResource = new FileSystemResource(new File(outboundSourceFolder, consumer + ".txt"));
    FlatFileItemWriter<String> fileItemWriter = new FlatFileItemWriter<>();
    fileItemWriter.setResource(fileSystemResource);
    fileItemWriter.setLineAggregator(new PassThroughLineAggregator<String>());
    return fileItemWriter;
}

@Bean
public Partitioner provisioningInputFilePartitioner() {
    return new ProvisioningInputFilePartitioner();
}

@Bean
public TaskExecutor taskExecutor() {
    return new SimpleAsyncTaskExecutor();
}
// Reader
@Bean
@StepScope
public ItemReader<String> testInputFilesReader(@Value("#{stepExecutionContext[inputFileNames]}") List<String> inputFileNames) {
    logger.debug("inputFileNames ----" + inputFileNames);
    MultiResourceItemReader<String> multiResourceItemReader = new MultiResourceItemReader<String>();
    ...
    return multiResourceItemReader;
}
// Slave step
@Bean
public Step testConsumersInputFileMergeStep(StepBuilderFactory stepBuilder, ItemReader<String> testInputFilesReader,
        ItemWriter<String> testConsumerFileItemWriter) {
    return stepBuilder.get("testConsumersInputFileMergeStep").<String, String>chunk(1)
            .reader(testInputFilesReader)
            .writer(testConsumerFileItemWriter)
            .build();
}

// Master step
@Bean
public Step testConsumersFilePartitionerStep(StepBuilderFactory stepBuilder, Step testConsumersInputFileMergeStep,
        Partitioner provisioningInputFilePartitioner, TaskExecutor taskExecutor) {
    return stepBuilder.get("testConsumersFilePartitionerStep").partitioner(testConsumersInputFileMergeStep)
            .partitioner("testConsumersInputFileMergeStep", provisioningInputFilePartitioner)
            .taskExecutor(taskExecutor)
            .build();
}
// Job
@Bean
public Job testFileForInputJob(JobBuilderFactory factory, Step testFileForInputStep, Step testConsumersFilePartitionerStep) {
    return factory.get("testFileForInputJob").incrementer(new RunIdIncrementer()).start(testConsumersFilePartitionerStep).build();
}