I currently have the following Spring Batch (v2.2.4) job, which reads from a single datasource and then creates three distinct output formats. I have three steps, each with a standard reader, processor and writer bean. I'm using brackets below to denote which format each processor or writer is using. Each of the processors in this example returns a SqlParameterSource object to the writer bean.
Job
Step 1 - PolicyDetails
R -> P(pd) -> W(pd)
Step 2 - PolicyCharge
R -> P(pc) -> W(pc)
Step 3 - PolicyFund
R -> P(pf) -> W(pf)
I don't like the fact that I'm reading the same data three times, so I'm planning to use a composite processor in a new job. The square brackets denote the composite processor. I'm unsure what interface my new writer (Wn) should implement, since it will have to handle or delegate the writing of three different object types.
Option 1 Job
Step 1
R -> [P(pd) -> P(pc) -> P(pf)] -> Wn(pd,pc,pf)
Is there an existing Spring Batch writer class that supports delegating based on different input types?
EDIT
I've defined this wrapper interface, which each of the processors in Option 1 will return.
/**
 * This interface allows us to wrap a SqlParameterSource and link it to a specific HedgingTable.
 * The ClassifierCompositeItemWriter will use the HedgingTable to route the SqlParameterSource to the correct writer.
 */
public interface HedgingDataSqlParameterSource {

    /**
     * The specific table that the SqlParameterSource data should be written to.
     */
    HedgingTable getHedgingTable();

    /**
     * The name value data for the insertion to the database table.
     */
    SqlParameterSource getSQLParameterSource();
}
I've read up on the ClassifierCompositeItemWriter, but I'm still unsure how I can filter on the getHedgingTable() value. Do I reuse the existing SubclassClassifier class or define my own custom Classifier?
EDIT 2
My first attempt at a custom Classifier implementation, which wraps a SubclassClassifier:
/**
 * Returns an ItemWriter for the provided HedgingDataSqlParameterSource by reusing a SubclassClassifier
 * which maps the specific HedgingTable type to an ItemWriter.
 */
public class HedgingTableClassifier implements Classifier<HedgingDataSqlParameterSource, ItemWriter<HedgingDataSqlParameterSource>> {

    private SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> subclassClassifier = null;

    public ItemWriter<HedgingDataSqlParameterSource> classify(HedgingDataSqlParameterSource classifiable) {
        HedgingTable table = classifiable.getHedgingTable();
        return subclassClassifier.classify(table);
    }

    public SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> getSubclassClassifier() {
        return subclassClassifier;
    }

    public void setSubclassClassifier(
            SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> subclassClassifier) {
        this.subclassClassifier = subclassClassifier;
    }
}
Take a look at the ClassifierCompositeItemWriter (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html). This allows you to provide a Classifier which will delegate to the appropriate ItemWriter based on your logic.
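For illustration, wiring the custom classifier from EDIT 2 into a ClassifierCompositeItemWriter could look roughly like this. This is a sketch, not the actual configuration: the three table subtypes and delegate writers (policyDetailsWriter and so on) are assumptions, and it presumes HedgingTable is a class hierarchy with one subtype per target table.

SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>> subclassClassifier =
        new SubclassClassifier<HedgingTable, ItemWriter<HedgingDataSqlParameterSource>>();
Map<Class<? extends HedgingTable>, ItemWriter<HedgingDataSqlParameterSource>> typeMap =
        new HashMap<Class<? extends HedgingTable>, ItemWriter<HedgingDataSqlParameterSource>>();
typeMap.put(PolicyDetailsTable.class, policyDetailsWriter); // hypothetical subtypes and writers
typeMap.put(PolicyChargeTable.class, policyChargeWriter);
typeMap.put(PolicyFundTable.class, policyFundWriter);
subclassClassifier.setTypeMap(typeMap);

HedgingTableClassifier classifier = new HedgingTableClassifier();
classifier.setSubclassClassifier(subclassClassifier);

// Note: setClassifier expects Classifier<C, ItemWriter<? super C>>, so the custom classifier's
// second type parameter may need to be ItemWriter<? super HedgingDataSqlParameterSource>.
ClassifierCompositeItemWriter<HedgingDataSqlParameterSource> writer =
        new ClassifierCompositeItemWriter<HedgingDataSqlParameterSource>();
writer.setClassifier(classifier);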
I need to run multiple queries from a single .sql file, but with different params.
I've tried something like this, but it does not work, since BigQueryIO.Read consumes only PBegin.
public PCollection<KV<String, TestDitoDto>> expand(PCollection<QueryParamsBatch> input) {
    PCollection<KV<String, Section1Dto>> section1 = input.apply("Read Section1 from BQ",
            BigQueryIO
                    .readTableRows()
                    .fromQuery(ResourceRetriever.getResourceFile("query/test/section1.sql"))
                    .usingStandardSql()
                    .withoutValidation())
            .apply("Convert section1 to Dto", ParDo.of(new TableRowToSection1DtoFunction()));
}
Are there any other ways to pass params from an existing PCollection into my BigQueryIO.read() invocation?
Are the different queries/parameters available at pipeline construction time? If so, you could just create multiple read transforms and combine the results, for example using a Flatten transform.
The Beam Java BigQuery source does not currently support reading a PCollection of queries. The Python BQ source does, though.
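If the queries are indeed fixed at construction time, a rough sketch of the Flatten approach follows; the second query file and the merged output handling are assumptions, not part of the original pipeline.

// Both reads are built when the pipeline is constructed, then merged.
PCollection<TableRow> section1 = pipeline.apply("Read Section1 from BQ",
        BigQueryIO.readTableRows()
                .fromQuery(ResourceRetriever.getResourceFile("query/test/section1.sql"))
                .usingStandardSql()
                .withoutValidation());

PCollection<TableRow> section2 = pipeline.apply("Read Section2 from BQ",
        BigQueryIO.readTableRows()
                .fromQuery(ResourceRetriever.getResourceFile("query/test/section2.sql")) // hypothetical second query
                .usingStandardSql()
                .withoutValidation());

// Combine the two reads into a single PCollection.
PCollection<TableRow> combined = PCollectionList.of(section1).and(section2)
        .apply("Flatten reads", Flatten.pCollections());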
I've come up with the following solution: instead of BigQueryIO, use the regular GCP client library for accessing BigQuery, mark the client as transient (it is not Serializable) and initialize it each time in a method annotated with @Setup.
public class DenormalizedCase1Fn extends DoFn<*> {

    private transient BigQuery bigQuery;

    @Setup
    public void initialize() {
        this.bigQuery = BigQueryOptions.newBuilder()
                .setProjectId(bqProjectId.get())
                .setLocation(LOCATION)
                .setRetrySettings(RetrySettings.newBuilder()
                        .setRpcTimeoutMultiplier(1.5)
                        .setInitialRpcTimeout(Duration.ofSeconds(5))
                        .setMaxRpcTimeout(Duration.ofSeconds(30))
                        .setMaxAttempts(3).build())
                .build().getService();
    }

    @ProcessElement
    ...
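The elided @ProcessElement might look roughly like the sketch below; buildQuery and convertRow are hypothetical helpers, and the output type is assumed.

@ProcessElement
public void processElement(ProcessContext ctx) throws InterruptedException {
    // Build a per-element query string and run it through the transient client.
    QueryJobConfiguration query = QueryJobConfiguration
            .newBuilder(buildQuery(ctx.element())) // buildQuery is hypothetical
            .setUseLegacySql(false)
            .build();
    for (FieldValueList row : bigQuery.query(query).iterateAll()) {
        ctx.output(convertRow(row)); // convertRow is hypothetical
    }
}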
I'm writing a pipeline to replicate data from one source to another. Info about the data sources is stored in a db (BQ). How can I use this data to build read/write endpoints dynamically?
I tried to pass the Pipeline object to my custom DoFn, but it can't be serialized. Later I tried to call getPipeline() on a passed view, but that doesn't work either -- which is actually expected.
I can't know in advance all the tables I need to replicate, so I have to read them from the db (or any other source).
// builds some random view
PCollectionView<IdWrapper> idView = ...;

// reads table meta and replicates data per each table
pipeline.apply(getTableMetaEndpont().read())
        .apply(ParDo.of(new MyCustomReplicator(idView)).withSideInputs(idView));

private static class MyCustomReplicator extends DoFn<TableMeta, TableMeta> {

    private final PCollectionView<IdWrapper> idView;

    private MyCustomReplicator(PCollectionView<IdWrapper> idView) {
        this.idView = idView;
    }

    // TableMeta {string: sourceTable, string: destTable}
    @ProcessElement
    public void processElement(@Element TableMeta tableMeta, ProcessContext ctx) {
        long id = ctx.sideInput(idView).getValue();
        // builds a read endpoint which depends on the table meta
        // updates entities
        // stores entities using another endpoint
        idView
                .getPipeline()
                .apply(createReadEndpoint(tableMeta).read())
                .apply(ParDo.of(new SomeFunction(tableMeta, id)))
                .apply(createWriteEndpoint(tableMeta).insert());
        ctx.output(tableMeta);
    }
}
I expect it to replicate the data specified by each TableMeta, but I can't use the pipeline within the DoFn object because it can't be serialized/deserialized.
Is there any way to implement the intended behavior?
My process transforms the data into an SCD2 pattern. Thus, any update in the source data results in updating the end_date & active_ind in the dimension table and inserting a new record.
I have configured the SQL in an ItemReader implementation which identifies the records that changed in the source data.
I need a suggestion on how to route the data to two writers, one each for update & insert.
There is a general pattern in Spring for this type of use case, not specific to Spring Batch: the Classifier interface.
You can use the BackToBackPatternClassifier implementation of this interface.
Additionally, you need to use the Spring Batch provided ClassifierCompositeItemWriter.
Here is a summary of steps:
The POJO/Java Bean that is passed on to the writer should have some kind of String field that can identify the target ItemWriter for that POJO.
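For instance, the routed bean might look like this minimal, hypothetical sketch:

// Hypothetical POJO: 'type' identifies which writer should receive the bean,
// e.g. "Writer1" for updates and "Writer2" for inserts.
public class WrittenMasterBean {

    private String type;
    // ... other dimension fields ...

    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }
}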
Then you write a Classifier that returns that String for each POJO, like this:
public class UpdateOrInsertClassifier {

    @Classifier
    public String classify(WrittenMasterBean writtenBean) {
        return writtenBean.getType();
    }
}
and
@Bean
public UpdateOrInsertClassifier router() {
    return new UpdateOrInsertClassifier();
}
I assume that WrittenMasterBean is the POJO you send to either of the writers and that it has a private String type; field. This Classifier is your router.
Then you set up the BackToBackPatternClassifier like this:
@Bean
public Classifier classifier() {
    BackToBackPatternClassifier classifier = new BackToBackPatternClassifier();
    classifier.setRouterDelegate(router());
    Map<String, ItemWriter<WrittenMasterBean>> writerMap = new HashMap<>();
    writerMap.put("Writer1", writer1());
    writerMap.put("Writer2", writer2());
    classifier.setMatcherMap(writerMap);
    return classifier;
}
That is, I assume the keys Writer1 and Writer2 identify your writers for that particular bean.
writer1() and writer2() return the actual ItemWriter beans.
BackToBackPatternClassifier needs two fields: a router classifier and a matcher map.
The restriction is that the keys in this classifier are Strings; you can't use any other key type.
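For illustration, the two delegate writers could be JDBC based, roughly as below; the table name, SQL and dataSource wiring are assumptions, not from the question.

// Hypothetical delegates: writer1 closes out the current SCD2 row, writer2 inserts the new version.
@Bean
public ItemWriter<WrittenMasterBean> writer1() {
    JdbcBatchItemWriter<WrittenMasterBean> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource); // assumed injected DataSource
    writer.setSql("UPDATE dim_table SET end_date = :endDate, active_ind = 'N' WHERE id = :id");
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
    writer.afterPropertiesSet();
    return writer;
}

@Bean
public ItemWriter<WrittenMasterBean> writer2() {
    JdbcBatchItemWriter<WrittenMasterBean> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource);
    writer.setSql("INSERT INTO dim_table (id, start_date, active_ind) VALUES (:id, :startDate, 'Y')");
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
    writer.afterPropertiesSet();
    return writer;
}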
Pass the BackToBackPatternClassifier on to the Spring Batch provided ClassifierCompositeItemWriter:
@Bean
public ItemWriter<WrittenMasterBean> classifierWriter() {
    ClassifierCompositeItemWriter<WrittenMasterBean> writer = new ClassifierCompositeItemWriter<>();
    writer.setClassifier(classifier());
    return writer;
}
You configure this classifierWriter() into your Step, and then you are good to go.
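As a rough sketch, the step wiring could look like this; the step name, chunk size and reader bean are assumptions.

@Bean
public Step scd2Step(StepBuilderFactory stepBuilderFactory) {
    return stepBuilderFactory.get("scd2Step")
            .<WrittenMasterBean, WrittenMasterBean>chunk(100)
            .reader(changedRecordsReader()) // your ItemReader that identifies changed records
            .writer(classifierWriter())
            .build();
}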
I am working on implementing a state machine for a workflow management system based on the Stateless4j API. However, I am not able to find an effective way to persist the states and transitions in Stateless4j.
As part of our use cases, we have a requirement to keep states alive for more than 3-4 days, until the user returns to the workflow. And we will have more than one workflow running concurrently.
Can you please share your insights on best practices for persisting state in a Stateless4j based state machine implementation?
It looks like what you need to do is construct your StateMachine with a custom accessor and mutator, something like this:
public class PersistentMutator<S> implements Action1<S> {

    Foo foo = null;

    @Inject
    FooRepository fooRepository;

    public PersistentMutator(Foo foo) {
        this.foo = foo;
    }

    @Override
    public void doIt(S s) {
        foo.setState(s);
        fooRepository.save(foo);
    }
}
Then you want to call the constructor with your accessors and mutators:
/**
 * Construct a state machine with external state storage.
 *
 * @param initialState  The initial state
 * @param stateAccessor State accessor
 * @param stateMutator  State mutator
 */
public StateMachine(S initialState, Func<S> stateAccessor, Action1<S> stateMutator, StateMachineConfig<S, T> config) {
    this.config = config;
    this.stateAccessor = stateAccessor;
    this.stateMutator = stateMutator;
    stateMutator.doIt(initialState);
}
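Putting the two together, construction might look roughly like this; Foo, FooRepository, the State/Trigger types and the lookup method are placeholders, and in practice the repository would be injected rather than referenced directly.

// Load the persisted state, then let the mutator write every transition back.
Foo foo = fooRepository.findByWorkflowId(workflowId); // hypothetical lookup
StateMachineConfig<State, Trigger> config = new StateMachineConfig<>();
// ... configure permitted transitions on config ...

StateMachine<State, Trigger> stateMachine = new StateMachine<>(
        foo.getState(),               // initial state from storage
        () -> foo.getState(),         // accessor
        new PersistentMutator<>(foo), // mutator persists on each transition
        config);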
Alternatively, you may want to look at StatefulJ. It has built-in support for atomically updating state in both JPA and Mongo out of the box. This may save you some time.
Disclaimer: I'm the author of StatefulJ
I am trying to use Apache Commons Pool to create a pool of objects. Since I already have an object factory which takes a String type argument and creates the right type of object, I want to use this factory.
But the problem is that none of the signatures of the generic pool object allow me to pass a factory which takes arguments.
// This is a wrapper class that holds an object pool.
class INService {

    private ObjectPool<INConnection> pool_ = null;

    /**
     * Constructs an instance of INService, given a pool size
     * and a type name identifying which INHandler implementation to create.
     * @param poolSize - size of the service pool
     * @param objectType - the type of object the factory should create
     */
    public INService(int poolSize, String objectType) {
        pool_ = new GenericObjectPool<INConnection>(factory, objectType); // won't compile.
    }
    ...
}
The PoolableObjectFactory interface defines methods like makeObject, destroyObject, validateObject, activateObject and passivateObject, but no makeObject() method which takes parameters.
It seems that the only way I can do this is to write a factory class for each type of object and select between them with if-else, like:
public INService(int poolSize, String objectType) {
    if (objectType.equals("scap"))
        pool_ = new GenericObjectPool<INConnection>(scapFactory);
    else if (objectType.equals("ucip"))
        pool_ = new GenericObjectPool<INConnection>(ucipFactory);
    ...
}
Or is there a more elegant way, instead of creating several near-duplicate factory classes just for this?
You should read up on the KeyedObjectPool<K,V> interface, which can also be found in commons-pool.
From its javadoc:
A keyed pool pools instances of multiple types. Each type may be accessed using an arbitrary key.
You could then implement a KeyedPoolableObjectFactory<K,V> that makes instances based on the key parameter; it has the makeObject(K key) method you are looking for.
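A minimal sketch of that approach, assuming commons-pool 1.6 generics and delegating to your existing string-driven factory (INConnectionKeyedFactory and INConnectionFactory.create are hypothetical names standing in for it):

// The pool key ("scap", "ucip", ...) selects which INConnection type gets created.
public class INConnectionKeyedFactory extends BaseKeyedPoolableObjectFactory<String, INConnection> {

    @Override
    public INConnection makeObject(String key) throws Exception {
        return INConnectionFactory.create(key); // delegate to the existing factory
    }
}

// Usage: one pool handles every type, keyed by the type name.
KeyedObjectPool<String, INConnection> pool =
        new GenericKeyedObjectPool<String, INConnection>(new INConnectionKeyedFactory());
INConnection conn = pool.borrowObject("scap");
// ... use the connection ...
pool.returnObject("scap", conn);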
PS: It appears you haven't marked any answers to your questions as "accepted", you might want to work on that.