Insert & Update from single Spring Batch ItemReader - spring-batch

My process transforms the data into the SCD2 pattern. Thus, any update in the source data results in updating the end_date & active_ind in the dimension table and inserting a new record.
I have configured the SQL in an ItemReader implementation that identifies the records which changed in the source data.
I need help/suggestions on how to route the data to 2 writers, one each for update & insert.

There is a general Spring pattern for this type of use case, not specific to Spring Batch: the Classifier interface.
You can use the BackToBackPatternClassifier implementation of this interface.
Additionally, you need to use the ClassifierCompositeItemWriter provided by Spring Batch.
Here is a summary of the steps:
The POJO/Java Bean that is passed on to the writer should have some kind of String field that identifies the target ItemWriter for that POJO.
Then you write a Classifier that returns that String for each POJO, like this:
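import org.springframework.classify.annotation.Classifier;

public class UpdateOrInsertClassifier {
    // Marks the method that BackToBackPatternClassifier calls to obtain the routing key.
    @Classifier
    public String classify(WrittenMasterBean writtenBean) {
        return writtenBean.getType();
    }
}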
and
@Bean
public UpdateOrInsertClassifier router() {
    return new UpdateOrInsertClassifier();
}
I assume that WrittenMasterBean is the POJO that you send to either of the writers and that it has a private String type; field. This Classifier is your router.
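Then you configure a BackToBackPatternClassifier like this:
@Bean
public Classifier<WrittenMasterBean, ItemWriter<? super WrittenMasterBean>> classifier() {
    BackToBackPatternClassifier<WrittenMasterBean, ItemWriter<? super WrittenMasterBean>> classifier =
            new BackToBackPatternClassifier<>();
    // router() supplies the String key for each item
    classifier.setRouterDelegate(router());
    // the matcher map turns that key into the target writer
    Map<String, ItemWriter<? super WrittenMasterBean>> writerMap = new HashMap<>();
    writerMap.put("Writer1", writer1());
    writerMap.put("Writer2", writer2());
    classifier.setMatcherMap(writerMap);
    return classifier;
}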
i.e. I assume that the keys Writer1 and Writer2, as returned by getType(), identify the writer for that particular bean.
writer1() and writer2() return the actual ItemWriter beans.
BackToBackPatternClassifier needs two fields: a router classifier and a matcher map.
The restriction is that the keys in this classifier are Strings. You can't use any other key type.
Pass the BackToBackPatternClassifier on to a ClassifierCompositeItemWriter:
@Bean
public ItemWriter<WrittenMasterBean> classifierWriter() {
    ClassifierCompositeItemWriter<WrittenMasterBean> writer = new ClassifierCompositeItemWriter<>();
    writer.setClassifier(classifier());
    return writer;
}
You configure this classifierWriter() as the writer of your Step.
Then you are good to go.
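To make the routing concrete, here is a plain-Java sketch of roughly what the composite writer does with a chunk, as I understand it: group the items by the key the router returns, then hand each group to its writer once. It deliberately avoids Spring so it runs on its own; RoutingWriter, the Function and the Consumer are stand-ins I made up for illustration, not Spring Batch types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical stand-in for the routing step; not Spring Batch code. */
class RoutingWriter<T> {
    private final Function<T, String> router;              // plays the role of the router classifier
    private final Map<String, Consumer<List<T>>> writers;  // plays the role of the matcher map

    RoutingWriter(Function<T, String> router, Map<String, Consumer<List<T>>> writers) {
        this.router = router;
        this.writers = writers;
    }

    /** Groups the chunk by routing key, then calls each target writer once with its group. */
    void write(List<T> chunk) {
        Map<String, List<T>> groups = new LinkedHashMap<>();
        for (T item : chunk) {
            groups.computeIfAbsent(router.apply(item), key -> new ArrayList<>()).add(item);
        }
        for (Map.Entry<String, List<T>> group : groups.entrySet()) {
            Consumer<List<T>> writer = writers.get(group.getKey());
            if (writer == null) {
                // Fail loudly when an item carries a key that no writer is registered for.
                throw new IllegalStateException("No writer registered for key " + group.getKey());
            }
            writer.accept(group.getValue());
        }
    }
}
```

In the SCD2 case the two keys would correspond to "update" and "insert", and every record in a chunk ends up in exactly one of the two writers.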

Related

Axon Framework - Configuring Multiple EventStores in Axon Configuration

We have a use case wherein each aggregate root should have a different event store. We have used the following configuration, where currently only one event store is configured:
@Configuration
@EnableDiscoveryClient
public class AxonConfig {
    private static final String DOMAIN_EVENTS_COLLECTION_NAME = "coll-capture.domainEvents";
    //private static final String DOMAIN_EVENTS_COLLECTION_NAME_TEST =
    //"coll-capture.domainEvents-test";

    @Value("${mongodb.database}")
    private String databaseName;

    @Value("${spring.application.name}")
    private String appName;

    @Bean
    public RestTemplate restTemplate() {
        CloseableHttpClient httpClient = HttpClientBuilder.create().build();
        HttpComponentsClientHttpRequestFactory clientHttpRequestFactory =
                new HttpComponentsClientHttpRequestFactory(httpClient);
        return new RestTemplate(clientHttpRequestFactory);
    }

    @Bean
    @Profile({"uat", "prod"})
    public CommandRouter springCloudHttpBackupCommandRouter(DiscoveryClient discoveryClient,
            Registration localInstance,
            RestTemplate restTemplate,
            @Value("${axon.distributed.spring-cloud.fallback-url}") String messageRoutingInformationEndpoint) {
        return new SpringCloudHttpBackupCommandRouter(discoveryClient,
                localInstance,
                new AnnotationRoutingStrategy(),
                serviceInstance -> appName.equalsIgnoreCase(serviceInstance.getServiceId()),
                restTemplate,
                messageRoutingInformationEndpoint);
    }

    @Bean
    public Repository<TestEnquiry> testEnquiryRepository(EventStore eventStore) {
        return new EventSourcingRepository<>(TestEnquiry.class, eventStore);
    }

    @Bean
    public Repository<Test2Enquiry> test2enquiryRepository(EventStore eventStore) {
        return new EventSourcingRepository<>(Test2Enquiry.class, eventStore);
    }

    @Bean
    public EventStorageEngine eventStorageEngine(MongoClient client) {
        MongoTemplate mongoTemplate = new DefaultMongoTemplate(client, databaseName)
                .withDomainEventsCollection(DOMAIN_EVENTS_COLLECTION_NAME);
        return new MongoEventStorageEngine(mongoTemplate);
    }
}
Now we want to configure "DOMAIN_EVENTS_COLLECTION_NAME_TEST" (just as an example) in an EventStorageEngine as well. How can we support multiple event stores and select which collection each tracking processor should read from?
If you are going the route of segregating the event streams, then combining them from an event handling perspective could become a necessity indeed. Especially when having several bounded contexts, segregating the event streams into distinct storage solutions is reasonable.
If you want to define which [message source / event store] is used by a TrackingEventProcessor, you will have to deal with the EventProcessingConfigurer. More specifically, you should invoke the EventProcessingConfigurer#registerTrackingEventProcessor(String, Function<Configuration, StreamableMessageSource<TrackedEventMessage<?>>>) method. The first String parameter is the name of the processor you want to configure as being "tracking". The second parameter defines a Function which gives you the message source to be used by this TrackingEventProcessor (TEP). It is here where you should provide the event store you want this TEP to ingest events from.
Pairing them up at a later stage could also occur of course, which is also supported by Axon Framework. This boils down to a specific form of StreamableMessageSource implementation.
More specifically, you can use the MultiStreamableMessageSource, where you can connect any number of StreamableMessageSources together.
Note that Axon's EmbeddedEventStore is in essence an implementation of a StreamableMessageSource. Once the MultiStreamableMessageSource is built, you will have to specify it as the messageSource for your TrackingEventProcessors of course.
Last note: this solution can only be used with TrackingEventProcessors, as those are the only Event Processors provided by Axon that ingest a StreamableMessageSource as the source for their events.
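As a rough configuration sketch of the two options described above, assuming an Axon 4.x version that ships MultiStreamableMessageSource and assuming you have already defined two EventStore beans yourself: the bean names "mainEventStore" and "testEventStore", the processor names, and the class name below are all hypothetical, so adapt them to your setup and check the calls against the Axon version you run.

```java
import org.axonframework.config.EventProcessingConfigurer;
import org.axonframework.eventhandling.MultiStreamableMessageSource;
import org.axonframework.eventsourcing.eventstore.EventStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MultiStoreProcessingConfig {

    @Autowired
    public void configureProcessors(EventProcessingConfigurer configurer,
                                    @Qualifier("mainEventStore") EventStore mainEventStore,
                                    @Qualifier("testEventStore") EventStore testEventStore) {
        // Option 1: a tracking processor that reads from one specific event store.
        // "test-projections" is a hypothetical processing group name.
        configurer.registerTrackingEventProcessor("test-projections", config -> testEventStore);

        // Option 2: a tracking processor that reads from both stores combined.
        MultiStreamableMessageSource combined = MultiStreamableMessageSource.builder()
                .addMessageSource("main", mainEventStore)
                .addMessageSource("test", testEventStore)
                .build();
        configurer.registerTrackingEventProcessor("combined-projections", config -> combined);
    }
}
```

Event handlers then join one of these processors through their processing group name, so which store a handler sees follows from the group it is assigned to.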

Writing object fields with fixed length to file with Spring batch

Spring Batch provides FixedLengthTokenizer to read data, but I do not see a FixedLengthLineAggregator. How do I write an object into a flat file so that the different fields are written with a fixed length?
You can do this with FormatterLineAggregator. Just set your fields and set your format using the String.format() syntax.
@Bean
public FormatterLineAggregator<MyObject> myLineAggregator() {
    FormatterLineAggregator<MyObject> lineAggregator = new FormatterLineAggregator<>();
    lineAggregator.setFieldExtractor(myBeanWrapperFieldExtractor());
    // The '-' and '0' flags cannot be combined in one conversion, so the number is zero-padded to 9 digits.
    lineAggregator.setFormat("%-5s%09d%20s");
    return lineAggregator;
}

@Bean
public BeanWrapperFieldExtractor<MyObject> myBeanWrapperFieldExtractor() {
    BeanWrapperFieldExtractor<MyObject> fieldExtractor = new BeanWrapperFieldExtractor<>();
    // Order must match the order of the conversions in the format string.
    fieldExtractor.setNames(new String[]{"fieldOne", "fieldTwo", "fieldThree"});
    return fieldExtractor;
}
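Since the aggregator relies on String.format() syntax, the layout can be tried out with the JDK alone before wiring it into a job. The sketch below assumes the intended record is a 5-character left-aligned text field, a 9-digit zero-padded number and a 20-character right-aligned text field; FixedWidthDemo and the sample values are made up for illustration. Note that java.util.Formatter rejects a conversion that combines the `-` and `0` flags (it throws IllegalFormatFlagsException), which is why the number uses `%09d` here.

```java
class FixedWidthDemo {
    // 5 + 9 + 20 = 34 characters per record, as long as no value exceeds its width.
    static final String FORMAT = "%-5s%09d%20s";

    static String formatLine(String fieldOne, long fieldTwo, String fieldThree) {
        return String.format(FORMAT, fieldOne, fieldTwo, fieldThree);
    }

    public static void main(String[] args) {
        // Brackets make the padding visible on the console.
        System.out.println("[" + formatLine("AB", 42, "xyz") + "]");
    }
}
```

A width only pads; it does not cut. If a value may be longer than its column, add a precision to the string conversions (for example `%-5.5s`) so the text is truncated to the column width.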

Accessing Pipeline within DoFn

I'm writing a pipeline to replicate data from one source to another. Info about the data sources is stored in a db (BQ). How can I use this data to build read/write endpoints dynamically?
I tried to pass Pipeline object to my custom DoFn but it can't be serialized. Later I tried to call method getPipeline() on a passed view but it doesn't work as well. -- which is actually expected
I can't know all tables I need to serialize in advance so I have to read all data from db (or any other source).
// builds some random view
PCollectionView<IdWrapper> idView = ...;
// reads tables meta and replicates data per each table
pipeline.apply(getTableMetaEndpont().read())
    .apply(ParDo.of(new MyCustomReplicator(idView)).withSideInputs(idView));

private static class MyCustomReplicator extends DoFn<TableMeta, TableMeta> {
    private final PCollectionView<IdWrapper> idView;

    private MyCustomReplicator(PCollectionView<IdWrapper> idView) {
        this.idView = idView;
    }

    // TableMeta {string: sourceTable, string: destTable}
    @ProcessElement
    public void processElement(@Element TableMeta tableMeta, ProcessContext ctx) {
        long id = ctx.sideInput(idView).getValue();
        // builds read endpoint which depends on table meta
        // updates entities
        // stores entities using another endpoint
        // (this is the part that fails: the pipeline is not available inside a DoFn)
        idView
            .getPipeline()
            .apply(createReadEndpoint(tableMeta).read())
            .apply(ParDo.of(new SomeFunction(tableMeta, id)))
            .apply(createWriteEndpoint(tableMeta).insert());
        ctx.output(tableMeta);
    }
}
I expect it to replicate data specified by TableMeta but I can't use pipeline within DoFn object because it can't be serialized/deserialized.
Is there any way to implement the intended behavior?

Any way I can change in runtime mongo document name

In the project we need to change the collection name suffix every day based on the date.
So one day collection is named:
samples_22032019
and in the next day it is
samples_23032019
Every day I need to change the suffix and recompile the spring-boot application because of this. Is there any way I can change this so the collection/table name is calculated dynamically based on the current date? Any advice for MongoRepository?
Considering the below is your bean, you can use the @Document annotation with Spring Expression Language to resolve the suffix at runtime, as shown below:
@Document(collection = "samples_#{T(com.yourpackage.Utility).getDateSuffix()}")
public class Samples {
private String id;
private String name;
}
Now have your date change function in a Utility method which spring can resolve at runtime. SpEL is handy in such scenarios.
package com.yourpackage;
public class Utility {
public static final String getDateSuffix() {
//Add your real logic here, below is for representational purpose only.
return DateTime.now().toDate().toString();
}
}
HTH!
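For the "real logic" placeholder, a minimal sketch using java.time could look like the following. It assumes the suffix is day, month, year without separators, which is what samples_22032019 suggests; CollectionNames and samplesCollectionFor are hypothetical names, and only getDateSuffix() corresponds to the method referenced from the SpEL expression.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class CollectionNames {
    // ddMMyyyy matches the samples_22032019 naming used in the question.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("ddMMyyyy");

    /** Suffix for today's date in the JVM's default time zone. */
    static String getDateSuffix() {
        return LocalDate.now().format(SUFFIX);
    }

    /** Full collection name for an arbitrary date; taking the date as a parameter keeps it testable. */
    static String samplesCollectionFor(LocalDate date) {
        return "samples_" + date.format(SUFFIX);
    }
}
```

Keep in mind that LocalDate.now() uses the server's default time zone, so the switch to the next collection happens at local midnight of the machine running the application.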
Make a cron job that runs daily, generates the new name for your collection and executes the code below. Here I get the collection from the MongoDatabase and then rename it using a MongoNamespace.
To get the old/new collection name you can write a separate method.
@Component
public class RenameCollectionTask {
    @Scheduled(cron = "${cron}")
    public void renameCollection() {
        // creating mongo client object
        final MongoClient client = new MongoClient(HOST_NAME, PORT);
        // selecting the mongo database
        final MongoDatabase database = client.getDatabase("databaseName");
        // selecting the mongo collection
        final MongoCollection<Document> collection = database.getCollection("oldCollectionName");
        // creating namespace
        final MongoNamespace newName = new MongoNamespace("databaseName", "newCollectionName");
        // renaming the collection
        collection.renameCollection(newName);
        System.out.println("Collection has been renamed");
        // closing the client
        client.close();
    }
}
To assign the name of the collection you can refer to this, so that a restart is not required every time.
The renameCollection() method has the following limitations:
1) It cannot move a collection between databases.
2) It is not supported on sharded collections.
3) You cannot rename the views.
Refer to this for details.

Spring Batch - Delegated Item Writer

I currently have the following Spring Batch (v2.2.4) job which reads from a single datasource and then creates three distinct output formats. I have three steps, each with a standard reader, processor and writer bean. I'm using brackets below to denote which format each processor or writer is using. Each of the processors in this example returns a SqlParameterSource object to the writer bean.
Job
Step 1 - PolicyDetails
R -> P(pd) -> W(pd)
Step 2 - PolicyCharge
R -> P(pc) -> W(pc)
Step 3 - PolicyFund
R -> P(pf) -> W(pf)
I don't like the fact that I'm reading the same data three times, so I'm planning on using a Composite Processor in a new job. The square brackets denote the composite processors. I'm unsure what interface my new Writer (Wn) should implement, since it will have to handle or delegate the writing of three different object types.
Option 1 Job
Step 1
R -> [P(pd) -> P(pc) -> P(pf)] -> Wn(pd,pc,pf)
I'm wondering is there an existing spring batch writer class that supports delegating based on different input types?
EDIT
I've defined this wrapper interface, which each of the processors in Option 1 will return.
/**
* This interface allows us to wrap a SqlParameterSource and link it to a specific HedgingTable.
* The ClassifierCompositeItemWriter will use the HedgingTable to route the SqlParameterSource to the correct writer.
*/
public interface HedgingDataSqlParameterSource {
/**
* The specific table that the SqlParameterSource data should be written to.
*/
HedgingTable getHedgingTable();
/**
* The name value data for the insertion to the database table.
*/
SqlParameterSource getSQLParameterSource();
}
I've read up on the ClassifierCompositeItemWriter, but I'm still unsure how I can filter on the getHedgingTable() value. Do I reuse the existing SubclassClassifier class or define my own custom classifier?
EDIT 2
My first attempt at a custom Classifier implementation, which wraps the SubclassClassifier.
/**
* Return an ItemWriter for the provided HedgingDataSqlParameterSource based on reusing a SubclassClassifier
* which maps the specific HedgingTable type to a ItemWriter.
*/
public class HedgingTableClassifier implements Classifier<HedgingDataSqlParameterSource, ItemWriter<HedgingDataSqlParameterSource>> {
private SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> subclassClassifier = null;
public ItemWriter<HedgingDataSqlParameterSource> classify(HedgingDataSqlParameterSource classifiable) {
HedgingTable table = classifiable.getHedgingTable();
return subclassClassifier.classify(table);
}
public SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> getSubclassClassifier() {
return subclassClassifier;
}
public void setSubclassClassifier(
SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> subclassClassifier) {
this.subclassClassifier = subclassClassifier;
}
}
Take a look at the ClassifierCompositeItemWriter (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html). This allows you to provide a Classifier which will delegate to the appropriate ItemWriter based on your logic.
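As for the "your logic" part, one possibility is a plain map lookup keyed on the table. The sketch below assumes HedgingTable is an enum, which the post does not actually show; the constants, the TableRouter name and the type parameter W (standing in for ItemWriter<HedgingDataSqlParameterSource>) are all hypothetical. If HedgingTable is an enum without constant-specific bodies, every constant has the same runtime class, so a SubclassClassifier, which picks its result by the class of the input, would not be able to tell the tables apart, whereas a map keyed on the constant can.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum; the post never shows HedgingTable's definition.
enum HedgingTable { POLICY_DETAILS, POLICY_CHARGE, POLICY_FUND }

/** Looks up a delegate by table; W stands in for the ItemWriter type. */
class TableRouter<W> {
    private final Map<HedgingTable, W> delegates = new EnumMap<>(HedgingTable.class);

    /** Registers the delegate for one table and returns this router for chaining. */
    TableRouter<W> register(HedgingTable table, W delegate) {
        delegates.put(table, delegate);
        return this;
    }

    /** Returns the delegate for the table, failing fast when none was registered. */
    W classify(HedgingTable table) {
        W delegate = delegates.get(table);
        if (delegate == null) {
            throw new IllegalArgumentException("No writer registered for " + table);
        }
        return delegate;
    }
}
```

In a real Classifier implementation the classify method would take the HedgingDataSqlParameterSource, call getHedgingTable() on it and perform this lookup, and the result would be handed to the ClassifierCompositeItemWriter.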