Spring batch writing to different tables conditionally with ItemWriter (ItemProcessor output to Table_A if success or Table_B if fail) - spring-batch

I am reading from a source table (using JpaPagingItemReader) and passing the items to an ItemProcessor.
My requirement is: if an item is processed successfully, it should be written to TABLE_A; if processing fails, it should be written to TABLE_B.
I got it working, but it doesn't feel like a clean way to do it.
My current implementation is:
// my processor
public class MyItemProcessor implements ItemProcessor<SourceEntity, BaseOutputEntity> {
    @Override
    public BaseOutputEntity process(SourceEntity input) {
        // NOTE: EntityA, EntityB both extend BaseOutputEntity
        try {
            EntityA a = callMyBusiness.method(input);
            return a;
        } catch (MyBusinessException e) {
            EntityB b = createMyFailureObj(input);
            return b;
        }
    }
}
// my item writer
public class MyItemWriter extends JpaItemWriter<MyBaseOutputEntity> {
    // do nothing here; the inherited JpaItemWriter methods take care of the writing
}
Functionally, it does exactly what I want.
One drawback is that when I look at the job/step execution history, I can't tell how many items succeeded and how many failed; for example, 100 reads simply show up as 100 writes.
Can anyone suggest a better approach? Are conditional steps useful here?

You can throw an exception from your processor and declare that exception as skippable (if you don't, the chunk will be broken).
If you implement an ItemProcessListener, you can catch the invalid item in the onProcessError(Entry item, Exception t) callback and write it to table B.
(Read the documentation carefully: some listener methods run inside the transaction, others do not.)
At the end of the batch, the write count is the number of valid items and the skip count is the number of invalid items.
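A minimal sketch of that wiring with Java config, assuming Spring Batch 4.x style builders; with this approach the processor only returns EntityA and throws MyBusinessException on failure, and MyProcessListener (an ItemProcessListener whose onProcessError writes the failure to TABLE_B) is a hypothetical bean:
@Bean
public Step myStep(StepBuilderFactory steps,
                   JpaPagingItemReader<SourceEntity> reader,
                   ItemProcessor<SourceEntity, EntityA> processor, // throws MyBusinessException on failure
                   JpaItemWriter<EntityA> successWriter,
                   MyProcessListener failureListener) {            // hypothetical listener that writes to TABLE_B
    return steps.get("myStep")
            .<SourceEntity, EntityA>chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(successWriter)
            .faultTolerant()
            .skip(MyBusinessException.class)  // failed items are skipped instead of breaking the chunk
            .skipLimit(Integer.MAX_VALUE)
            .listener(failureListener)
            .build();
}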
Another way to write to different tables is to use a ClassifierCompositeItemWriter with a BackToBackPatternClassifier, but then you lose the count of invalid items.
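For completeness, a minimal sketch of the classifier route, here with a plain lambda Classifier instead of the BackToBackPatternClassifier; successWriter and failureWriter are assumed to be ItemWriter beans targeting TABLE_A and TABLE_B:
@Bean
public ClassifierCompositeItemWriter<BaseOutputEntity> routingWriter(
        ItemWriter<BaseOutputEntity> successWriter,   // writes EntityA rows to TABLE_A
        ItemWriter<BaseOutputEntity> failureWriter) { // writes EntityB rows to TABLE_B
    ClassifierCompositeItemWriter<BaseOutputEntity> writer = new ClassifierCompositeItemWriter<>();
    // route each item by the concrete type produced by MyItemProcessor
    writer.setClassifier(item -> item instanceof EntityA ? successWriter : failureWriter);
    return writer;
}
Note that ClassifierCompositeItemWriter does not implement ItemStream itself, so if the delegate writers are streams they should also be registered on the step via stream(...).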

Related

Split incoherent aggregates

The most common way to implement aggregates is to create a god class with an enum status and an "if-ladder", like below:
public class Order {
    private OrderId id;
    private PropertyA a;
    private PropertyB b;
    private OrderStatus status;

    public void doSthWithA() {
        if (status != OrderStatus.A) {
            // throw illegal argument
        }
        // do sth with PropertyA
        status = OrderStatus.B;
    }

    public void doSthWithB() {
        if (status != OrderStatus.B) {
            // throw illegal argument
        }
        // do sth with PropertyB
    }
}
The Order class is not coherent, because doSthWithA uses only PropertyA and doSthWithB uses only PropertyB.
Isn't it better to do it this way:
public class OrderA {
    private OrderId id;
    private PropertyA a;

    public OrderB doSthWithA() {
        // do sth with PropertyA
        return new OrderB(id);
    }
}

public class OrderB {
    private OrderId id;
    private PropertyB b;

    public void doSthWithB() {
        // do sth with PropertyB
    }
}
?
Anyway, I have a question. We could persist both aggregates in one ORDER table or in two tables: ORDER_A and ORDER_B.
But what are the strategies for determining which order state is the newest?
Let's assume someone saves OrderA to the DB, then executes doSthWithA and saves OrderB to the DB.
Later, when we run a query, how do we resolve the newest state? Should we add some version or timestamp to the aggregates?
And what about REST services?
With one god class Order, the REST services could look like:
/orders/{id}/do-sth-with-a and
/orders/{id}/do-sth-with-b
With the second solution, could we have:
/a-orders/{id}/do-sth-with-a and
/b-orders/{id}/do-sth-with-b
?
Isn't it better to do it this way
Not necessarily better, because there are trade-offs. However, it is common for the benefits of a design with smaller aggregates to outweigh the costs.
Vaughn Vernon, in Implementing Domain Driven Design, proposes the rule "Design Small Aggregates".
Roughly, each aggregate "should" enclose coupled information; typically graphs of values that must be internally "consistent". If you find that your values form two discrete sets that have only a single identifier in common, that's a good sign that there is an opportunity to reduce the aggregate further.
We could persist both aggregates in one ORDER table or in two tables: ORDER_A and ORDER_B. But what are the strategies for determining which order state is the newest?
Real answer: if you actually care about "newest", then you should be modeling time in your domain logic.
It's not uncommon to throw general-purpose timing information into a design, but you want to be careful not to entangle the general-purpose timing used for operations and analysis with your domain timing.
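As a rough illustration only (the Clock parameter, the field names, and the extended OrderB constructor are mine, not part of the question), modeling time in the domain could mean carrying a domain timestamp or version forward on each state transition:
public class OrderA {
    private OrderId id;
    private PropertyA a;
    private Instant stateChangedAt; // domain time: when this state became current
    private long version;           // or any monotonically increasing sequence

    public OrderB doSthWithA(Clock clock) {
        // do sth with PropertyA
        return new OrderB(id, clock.instant(), version + 1);
    }
}
A query for the current state can then compare the version (or timestamp) of the ORDER_A and ORDER_B rows that share the same id.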
And what about REST services?
Your resource model is not your domain model. Having a single "god" resource in your resource model comes with a completely different set of trade-offs than "god" aggregates in your domain model.
It's completely normal to have one web resource that renders information from multiple aggregates.
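Purely as an illustration (the DTO and its field names are mine), a single /orders/{id} representation could be assembled from whichever aggregate currently holds the state:
// Representation returned by GET /orders/{id}; it is not a domain object.
public class OrderResource {
    public String id;
    public PropertyA a;   // populated from OrderA while the order is in its first stage
    public PropertyB b;   // populated from OrderB once doSthWithA has been executed
    public String state;  // e.g. "A" or "B", derived from which aggregate exists
}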

Spring Batch: How can we preload values from the DB to be used in the ItemProcessor section?

I have a requirement where I need to look up a few tables in the ItemProcessor section. I don't want to make multiple JDBC calls for each row in the ItemProcessor, as this might lead to performance issues once the batch starts processing a large number of records. What are the workarounds to avoid this situation? Is there a way to preload these objects before the ItemProcessor runs, or before the batch starts, and refer to them in the ItemProcessor?
You can annotate a method with @PostConstruct to read the data during Spring application context initialization. Make your ItemReader's read method return values from that list; when the entire list has been consumed, return null. That stops the reading.
@Service
public class YourItemReader implements ItemReader<DomainObject> {

    private int index;
    private List<DomainObject> dbRows;

    @PostConstruct
    public void init() {
        dbRows = ...; // read from the database
    }

    @Override
    public DomainObject read() {
        if (null != dbRows && index < dbRows.size()) {
            return dbRows.get(index++);
        }
        return null;
    }
}
If the number of records is in the millions, I would suggest reading from the database in chunks instead of loading all the records at once, which might cause an out-of-memory error.
This can be done easily by adding a STATUS column to your table to track the state of the records being processed. Initially, when you load data into the table, set the status to 'NOT PROCESSED'; when your ItemReader reads a chunk of records, set their status to 'IN PROGRESS'. Once your ItemProcessor or ItemWriter completes its processing, change the status from 'IN PROGRESS' to 'PROCESSED'. Make sure the method that fetches the data from the database is synchronized; this ensures that multiple threads do not fetch the same data from the database.
public List<DomainObject> read() {
    return fetchDataFromDb();
}

private synchronized List<DomainObject> fetchDataFromDb() {
    // read your chunk-size of records from the database which have status 'NOT PROCESSED'
    // and change the status of the records that were read to 'IN PROGRESS'
    return list;
}
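A hedged sketch of what fetchDataFromDb() could look like with JPA; the entityManager and chunkSize fields and the status property on DomainObject are assumptions made for illustration:
private synchronized List<DomainObject> fetchDataFromDb() {
    // read one chunk of unprocessed rows
    List<DomainObject> chunk = entityManager.createQuery(
            "select d from DomainObject d where d.status = 'NOT PROCESSED'", DomainObject.class)
            .setMaxResults(chunkSize)
            .getResultList();
    // flag them so other threads do not pick up the same rows again
    chunk.forEach(d -> d.setStatus("IN PROGRESS"));
    return chunk;
}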

Does a FlowableOperator inherently support backpressure?

I've implemented a FlowableOperator as described in the RxJava 2 wiki (https://github.com/ReactiveX/RxJava/wiki/Writing-operators-for-2.0#operator-targeting-lift), except that I perform a check in the onNext() method, something like this:
public final class MyOperator implements FlowableOperator<Integer, Integer> {
    ...

    static final class Op implements FlowableSubscriber<Integer>, Subscription {
        @Override
        public void onNext(Integer v) {
            if (v % 2 == 0) {
                child.onNext(v * v);
            }
        }
        ...
    }
}
This operator is part of a chain where I have a Flowable created with the DROP backpressure strategy. In essence, it looks almost like this:
Flowable.<Integer>create(emitter -> myAction(), DROP)
    .filter(v -> v > 2)
    .lift(new MyOperator())
    .subscribe(n -> doSomething(n));
I've run into the following issue:
backpressure occurs, so doSomething(n) cannot keep up with the upstream;
items are dropped due to the chosen backpressure strategy;
but doSomething(n) never receives any new items after the drop, even though it was ready to handle them.
Reading back David Karnok's excellent blog post http://akarnokd.blogspot.fr/2015/05/pitfalls-of-operator-implementations.html, it seems that I need to add a request(1) in the onNext() method. But that was for RxJava 1...
So my question is: is this fix enough in RxJava 2 to deal with my backpressure issue, or does my operator have to implement all the atomics and drain machinery described in https://github.com/ReactiveX/RxJava/wiki/Writing-operators-for-2.0#atomics-serialization-deferred-actions to handle backpressure properly?
Note: I've added the request(1) and it seems to work, but I can't figure out whether it's enough or whether my operator needs the tricky queue-drain and atomics machinery.
Thanks in advance!
Does a FlowableOperator inherently support backpressure?
FlowableOperator is an interface that is called for a given downstream Subscriber and should return a new Subscriber that wraps the downstream and modulates the Reactive Streams events passing in one or both directions. Backpressure support is the responsibility of the Subscriber implementation, not this particular functional interface. It could have been Function<Subscriber, Subscriber> but a separate named interface was deemed more usable and less prone to overload conflicts.
need to add a request(1) in the onNext() [...]
But I can't figure out whether it's enough or whether my operator needs the tricky stuff of queue-drain and atomics.
Yes, you have to do that in RxJava 2 as well. Since RxJava 2's Subscriber is not a class, it doesn't have v1's convenience request method. You have to save the Subscription in onSubscribe and call upstream.request(1) on the appropriate path in onNext. For your case, it should be quite enough.
I've updated the wiki with a new section explaining this case explicitly:
https://github.com/ReactiveX/RxJava/wiki/Writing-operators-for-2.0#replenishing
final class FilterOddSubscriber implements FlowableSubscriber<Integer>, Subscription {

    final Subscriber<? super Integer> downstream;

    Subscription upstream;

    // ...

    @Override
    public void onSubscribe(Subscription s) {
        if (upstream != null) {
            s.cancel();
        } else {
            upstream = s;                  // <-------------------------
            downstream.onSubscribe(this);
        }
    }

    @Override
    public void onNext(Integer item) {
        if (item % 2 != 0) {
            downstream.onNext(item);
        } else {
            upstream.request(1);           // <-------------------------
        }
    }

    @Override
    public void request(long n) {
        upstream.request(n);
    }

    // the rest omitted for brevity
}
Yes, you have to do the tricky stuff...
I would avoid writing operators unless you are very sure about what you are doing; nearly everything can be achieved with the default operators (see the sketch at the end of this answer).
Writing operators, source-like (fromEmitter) or intermediate-like (flatMap) has always been a hard task to do in RxJava. There are many rules to obey, many cases to consider but at the same time, many (legal) shortcuts to take to build a well performing code. Now writing an operator specifically for 2.x is 10 times harder than for 1.x. If you want to exploit all the advanced, 4th generation features, that's even 2-3 times harder on top (so 30 times harder in total).
The tricky stuff is explained here: https://github.com/ReactiveX/RxJava/wiki/Writing-operators-for-2.0
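For example, the behaviour of MyOperator from the question (keep even values and square them) can be composed from stock operators, which already handle backpressure and request replenishment; this is only a sketch based on the pipeline shown in the question:
Flowable.<Integer>create(emitter -> myAction(), BackpressureStrategy.DROP)
        .filter(v -> v > 2)
        .filter(v -> v % 2 == 0)   // what MyOperator's onNext checked
        .map(v -> v * v)           // what MyOperator emitted downstream
        .subscribe(n -> doSomething(n));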

Proper way to write a spring-batch ItemReader

I'm constructing a spring-batch job that modifies a given number of records. The list of record IDs is an input parameter of the job. For example, one job might be: modify the records with IDs {1,2,3,4} and set parameters X and Y on related tables.
Since I'm unable to pass a potentially very long input list (typical case: 50K records) to my ItemReader, I only pass a MyJobID, which the ItemReader then uses to load the target ID list.
The problem is that the resulting code appears "wrong" (although it works) and not in the spirit of spring-batch. Here's the reader:
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
@Component
public class MyItemReader implements ItemReader<Integer> {

    @Autowired
    private JobService jobService;

    private List<Integer> itemsList;
    private Long jobId;

    @Autowired
    public MyItemReader(@Value("#{jobParameters['jobId']}") final Long jobId) {
        this.jobId = jobId;
        this.itemsList = null;
    }

    @Override
    public Integer read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        // First pass: load the list.
        if (itemsList == null) {
            itemsList = new ArrayList<Integer>();
            MyJob myJob = (MyJob) jobService.loadById(jobId);
            for (Integer i : myJob.getTargedIdList()) {
                itemsList.add(i);
            }
        }
        // Serve one item at a time:
        if (itemsList.isEmpty()) {
            return null;
        } else {
            return itemsList.remove(0);
        }
    }
}
I tried to move the first part of the read() method to the constructor, but the @Autowired reference is null at that point; it is only initialized later, by the time read() is called.
Is there a better way to write the ItemReader? I would like to move the "load" part out of read(). Or is this the best solution for this scenario?
Thank you.
Generally, your approach is not "wrong", but it is probably not ideal.
Firstly, you could move the initialisation to an init method annotated with @PostConstruct. This method is called after all @Autowired fields have been injected:
@PostConstruct
public void afterPropertiesSet() throws Exception {
    itemsList = new ArrayList<Integer>();
    MyJob myJob = (MyJob) jobService.loadById(jobId);
    for (Integer i : myJob.getTargedIdList()) {
        itemsList.add(i);
    }
}
But there is still the problem that you load all the data at once. If you have a billion records to process, this could blow up the memory.
So what you should do is load only a chunk of your data into memory, then return its items one by one from your read method. When all entries of a chunk have been returned, load the next chunk and again return its items one by one. If there is no further chunk to load, return null from the read method.
This ensures that you have a constant memory footprint regardless of how many records you have to process.
(If you take a look at FlatFileItemReader, you will see that it uses a BufferedReader to read the data from disk. While that has nothing to do with Spring Batch, it is the same principle: it reads a chunk of data from disk, returns that, and reads the next chunk when more data is needed.)
The next problem is restartability. What happens if the job crashes after doing 90% of the work? How can the job be restarted so that it only processes the missing 10%?
This is actually a feature that Spring Batch provides; all you have to do is implement the ItemStream interface with its methods open(), update(), and close().
If you address these two points - load data in chunks instead of all at once and implement the ItemStream interface - you will end up with a reader that is in the spirit of Spring Batch.
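A rough sketch of a reader along those lines (not part of the original answer); jobService.loadTargetIdPage(...) is a hypothetical paging method, and the context key and page size are arbitrary:
import java.util.Collections;
import java.util.List;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;

public class ChunkedIdItemReader implements ItemReader<Integer>, ItemStream {

    private static final String CONTEXT_KEY = "chunked.id.reader.index";
    private static final int PAGE_SIZE = 1000;

    private final JobService jobService;
    private final Long jobId;

    private List<Integer> currentPage = Collections.emptyList();
    private int pageOffset;   // position inside the current page
    private int overallIndex; // position inside the whole target-id list

    public ChunkedIdItemReader(JobService jobService, Long jobId) {
        this.jobService = jobService;
        this.jobId = jobId;
    }

    @Override
    public Integer read() {
        if (pageOffset >= currentPage.size()) {
            // hypothetical paging method: loads at most PAGE_SIZE ids starting at overallIndex
            currentPage = jobService.loadTargetIdPage(jobId, overallIndex, PAGE_SIZE);
            pageOffset = 0;
            if (currentPage.isEmpty()) {
                return null; // no more data: end of step
            }
        }
        overallIndex++;
        return currentPage.get(pageOffset++);
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // on a restart, resume from the previously committed position
        overallIndex = executionContext.getInt(CONTEXT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // called before each chunk commit; saves the restart position
        executionContext.putInt(CONTEXT_KEY, overallIndex);
    }

    @Override
    public void close() {
        currentPage = Collections.emptyList();
    }
}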

Generic way to initialize a JPA 2 lazy association

So, the question at hand is about initializing the lazy collections of an "unknown" entity, as long as they are known at least by name. This is part of a wider effort of mine to build a generic DataTable -> RecordDetails mini-framework in JSF + PrimeFaces.
The associations are usually lazy, and the only moment I need them loaded is when someone accesses one record of the many in the datatable in order to view/edit it. The issue here is that the controllers are generic, and for this I also use just one service class backing the whole lazy loading for the datatable and the loading/saving of the record from the details section.
What I have come up with so far is the following piece of code:
public <T> T loadWithDetails(T record, String... associationsToInitialize) {
    final PersistenceUnitUtil pu = em.getEntityManagerFactory().getPersistenceUnitUtil();
    record = (T) em.find(record.getClass(), pu.getIdentifier(record));
    for (String association : associationsToInitialize) {
        try {
            if (!pu.isLoaded(record, association)) {
                loadAssociation(record, association);
            }
        } catch (..... non significant) {
            e.printStackTrace(); // Nothing else to do
        }
    }
    return record;
}

private <T> void loadAssociation(T record, String associationName) throws IntrospectionException, InvocationTargetException, IllegalAccessException, NoSuchFieldException {
    BeanInfo info = Introspector.getBeanInfo(record.getClass(), Object.class);
    PropertyDescriptor[] props = info.getPropertyDescriptors();
    for (PropertyDescriptor pd : props) {
        if (pd.getName().equals(associationName)) {
            Method getter = pd.getReadMethod();
            ((Collection) getter.invoke(record)).size(); // touch the collection to trigger initialization
            return; // found and initialized; don't fall through to the exception below
        }
    }
    throw new NoSuchFieldException(associationName);
}
And the question is: has anyone started a similar endeavor, or does anyone know of a more pleasant way to initialize collections in a JPA way (not Hibernate/EclipseLink specific) without involving reflection?
Another alternative I could think of is forcing all entities to implement some interface with
Object getId();
void loadAssociations();
but I don't like the idea of forcing my POJOs to implement an interface just for this.
With the reflection solution you would suffer from the N+1 effect detailed here: Solve Hibernate Lazy-Init issue with hibernate.enable_lazy_load_no_trans
You could use the Open Session in View pattern instead; you would still be affected by N+1, but you would not need to use reflection. With this pattern the persistence context remains open until the end of the request, so all the LAZY relationships can be loaded without a problem.
For this pattern you will need to write a WebFilter that opens and closes the transaction.
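A rough sketch of the kind of filter the answer describes, assuming plain JPA in a servlet container; the persistence unit name and the EntityManagerHolder thread-local (from which your generic service would obtain the EntityManager) are hypothetical:
import java.io.IOException;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.servlet.*;
import javax.servlet.annotation.WebFilter;

@WebFilter("/*")
public class OpenEntityManagerInViewFilter implements Filter {

    private EntityManagerFactory emf;

    @Override
    public void init(FilterConfig filterConfig) {
        emf = Persistence.createEntityManagerFactory("myPU"); // hypothetical persistence unit name
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        EntityManager em = emf.createEntityManager();
        EntityManagerHolder.set(em); // hypothetical thread-local holder used by the generic service
        em.getTransaction().begin();
        try {
            chain.doFilter(request, response);
            em.getTransaction().commit();
        } catch (Exception e) {
            if (em.getTransaction().isActive()) {
                em.getTransaction().rollback();
            }
            throw new ServletException(e);
        } finally {
            EntityManagerHolder.clear(); // hypothetical
            em.close();
        }
    }

    @Override
    public void destroy() {
        emf.close();
    }
}
Keeping the transaction open across the whole request is a trade-off: it makes lazy loading painless, but it ties database work to view rendering, which is the usual criticism of Open Session in View.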