How to set max no of records read in flatfileItemReader? - spring-batch

My application needs to read and process only a fixed number of records.
How can I enforce this limit when using a FlatFileItemReader?
With a DB-based ItemReader, I return null (or an empty list) once max_limit is reached.
How can I achieve the same with org.springframework.batch.item.file.FlatFileItemReader?

For the FlatFileItemReader as well as any other ItemReader that extends AbstractItemCountingItemStreamItemReader, there is a maxItemCount property. By configuring this property, the ItemReader will continue to read until either one of the following conditions has been met:
The input has been exhausted.
The number of items read equals the maxItemCount.
In either of the two above conditions, null will be returned by the reader, indicating to Spring Batch that the input is complete.
If you have any custom ItemReader implementations that need to satisfy this requirement, I'd recommend extending the AbstractItemCountingItemStreamItemReader and going from there.
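As a rough illustration of those two stop conditions, here is a minimal sketch in plain Java. It deliberately does not use Spring Batch: CappedReader and its drain helper are hypothetical stand-ins for an ItemReader and the step loop, so the example runs on its own. With the real FlatFileItemReader the limit is set by calling setMaxItemCount(int).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an ItemReader: read() returns null when input is complete.
class CappedReader<T> {
    private final Iterator<T> source;
    private final int maxItemCount;
    private int itemCount = 0;

    CappedReader(Iterable<T> source, int maxItemCount) {
        this.source = source.iterator();
        this.maxItemCount = maxItemCount;
    }

    T read() {
        // Condition 1: the input has been exhausted.
        // Condition 2: the number of items read equals maxItemCount.
        if (!source.hasNext() || itemCount >= maxItemCount) {
            return null;
        }
        itemCount++;
        return source.next();
    }

    // Mimics the step loop: keep calling read() until it returns null.
    static <T> List<T> drain(CappedReader<T> reader) {
        List<T> items = new ArrayList<>();
        for (T item = reader.read(); item != null; item = reader.read()) {
            items.add(item);
        }
        return items;
    }
}
```

Draining a four-element source with a limit of 2 yields the first two elements; with a limit of 10 it yields all four, because the input runs out first.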

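The best approach is to write a delegating reader that keeps track of the number of records read and stops after a fixed count. The component should also save its state in the execution context to allow restartability.
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
public class CountMaxReader<T> implements ItemReader<T>, ItemStream {
    private static final String COUNT_KEY = "count";
    private int count = 0;
    private int max = 0;
    // The delegate must also implement ItemStream (FlatFileItemReader does).
    private ItemReader<T> delegate;
    public void setMax(int max) {
        this.max = max;
    }
    public void setDelegate(ItemReader<T> delegate) {
        this.delegate = delegate;
    }
    @Override
    public T read() throws Exception {
        T next = null;
        if (count < max) {
            next = delegate.read();
            ++count;
        }
        return next; // null tells Spring Batch that the input is complete
    }
    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        ((ItemStream) delegate).open(executionContext);
        // Restore the count saved by a previous, interrupted execution.
        count = executionContext.getInt(COUNT_KEY, 0);
    }
    @Override
    public void close() throws ItemStreamException {
        ((ItemStream) delegate).close();
    }
    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        ((ItemStream) delegate).update(executionContext);
        executionContext.putInt(COUNT_KEY, count);
    }
}
This works with any reader that also implements ItemStream.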

public class CountMaxFlatFileItemReader<T> extends FlatFileItemReader<T> {
    private int counter;
    private int maxCount;
    public void setMaxCount(int maxCount) {
        this.maxCount = maxCount;
    }
    @Override
    public T read() throws Exception {
        // Check before incrementing so that exactly maxCount items are returned.
        if (counter >= maxCount) {
            return null; // this will stop reading
        }
        counter++;
        return super.read();
    }
}
Something like this should work. The reader stops reading as soon as null is returned.

Related

Solr custom query component does not return correct facet counts

I have a simple Solr query component as follows:
public class QueryPreprocessingComponent extends QueryComponent implements PluginInfoInitialized {
private static final Logger LOG = LoggerFactory.getLogger( QueryPreprocessingComponent.class );
private ExactMatchQueryProcessor exactMatchQueryProcessor;
public void init( PluginInfo info ) {
initializeProcessors(info);
}
private void initializeProcessors(PluginInfo info) {
List<PluginInfo> queryPreProcessors = info.getChildren("queryPreProcessors")
.get(0).getChildren("queryPreProcessor");
for (PluginInfo queryProcessor : queryPreProcessors) {
initializeProcessor(queryProcessor);
}
}
private void initializeProcessor(PluginInfo queryProcessor) {
QueryProcessorParam processorName = QueryProcessorParam.valueOf(queryProcessor.name);
switch(processorName) {
case ExactMatchQueryProcessor:
exactMatchQueryProcessor = new ExactMatchQueryProcessor(queryProcessor.initArgs);
LOG.info("ExactMatchQueryProcessor initialized...");
break;
default: throw new AssertionError();
}
}
@Override
public void prepare( ResponseBuilder rb ) throws IOException
{
if (exactMatchQueryProcessor != null) {
exactMatchQueryProcessor.modifyForExactMatch(rb);
}
}
@Override
public void process(ResponseBuilder rb) throws IOException
{
// do nothing - needed so we don't execute the query here.
return;
}
}
Functionally this works as expected, except that in a distributed request the facet counts come back doubled.
Note that I am not doing anything related to faceting in the plugin. exactMatchQueryProcessor.modifyForExactMatch(rb); does minimal processing if the query is quoted and nothing otherwise. The facet count issue occurs even when the incoming query is not quoted, and it persists even if I comment out everything inside the prepare function.
Note that this component is declared under first-components in solrconfig.xml.
I resolved this issue by extending SearchComponent instead of QueryComponent. It seems that SearchComponent sits at a higher level of abstraction than QueryComponent and is useful when you want to work on a layer above the shards.

To return a list from Database using a custom Item Reader in spring batch

I have a custom item reader that returns a list of records from a table. My job runs in an infinite loop because the reader contract is not met. Any suggestions, please?
public class customReader implements ItemReader<List<T>> {
    @Autowired
    customDao customDao;
    static List<T> CCTransDlyLg = null;
    @Override
    public List<T> read() throws Exception {
        if (CCTransDlyLg == null || (CCTransDlyLg != null && CCTransDlyLg.size() == 0)) {
            CCTransDlyLg = customDao.getList();
        }
        log.info("CCTransDlyLg List:" + CCTransDlyLg.size());
        return CCTransDlyLg.size() == 0 ? null : CCTransDlyLg;
    }
}
Your list never changes. Assuming you read a list of size 5, your return statement will always return that same list, so read() never returns null and the step never ends. The logic of your ItemReader looks like you only want to return a single list (that is, only one call to the read() method should return data).
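To make the contract concrete, here is a small sketch in plain Java. OneShotListReader and countLists are hypothetical names rather than Spring Batch types, and the Supplier stands in for customDao.getList(); the loop plays the role of the step, which keeps calling read() until it gets null.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for ItemReader<List<T>>: hands out the list once, then null.
class OneShotListReader<T> {
    private final Supplier<List<T>> dao; // stands in for customDao.getList()
    private boolean alreadyRead = false;

    OneShotListReader(Supplier<List<T>> dao) {
        this.dao = dao;
    }

    List<T> read() {
        if (alreadyRead) {
            return null; // second and later calls: end of input
        }
        alreadyRead = true;
        List<T> list = dao.get();
        return list.isEmpty() ? null : list;
    }

    // Mimics the step: call read() until null, counting the non-null results.
    static <T> int countLists(OneShotListReader<T> reader) {
        int lists = 0;
        while (reader.read() != null) {
            lists++;
        }
        return lists;
    }
}
```

With a non-empty list the loop sees exactly one result and then stops; with an empty list it stops immediately.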
As per the Spring Batch reader contract, your read() method is called again and again until it returns null. In your code, if customDao succeeds, the list always has the same size and never becomes empty. You need some condition to break out of that loop and return null. One possible solution is to use a variable called index to break out of that loop.
On another note, I see Mike answered your question; I learned Spring Batch from his book and videos. :)
public class customReader implements ItemReader<List<T>> {
    @Autowired
    customDao customDao;
    // Number of lists already handed to the step; this reader returns exactly one.
    private int index = 0;
    @Override
    public List<T> read() throws Exception {
        if (index > 0) {
            return null; // the single list was already returned: end of input
        }
        List<T> ccTransDlyLg = customDao.getList();
        index++;
        log.info("CCTransDlyLg List:" + ccTransDlyLg.size());
        return ccTransDlyLg.isEmpty() ? null : ccTransDlyLg;
    }
}

Update facts in Decision Table : Drools

I have a Drools decision table in an Excel spreadsheet with two rules. (This example has been greatly simplified; the one I am working with has a lot more rules.)
The first rule checks if amount is more than or equal to 500. If it is, then it sets status to 400.
The second rule checks if status is 400. If it is, then it sets the message variable.
The problem is, I am unable to get the second rule to fire, even though sequential is set. I also have to use no-loop and lock-on-active to prevent infinite looping.
My goal is to get the rules to fire top down, and the rules that come after might depend on changes made to the fact/object by earlier rules.
Is there a solution to this problem?
Any help would be appreciated, thanks!
package com.example;
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;
public class SalaryTest {
public static final void main(String[] args) {
try {
// load up the knowledge base
KieServices ks = KieServices.Factory.get();
KieContainer kContainer = ks.getKieClasspathContainer();
KieSession kSession = kContainer.newKieSession("ksession-dtables");
Salary a = new Salary();
a.setAmount(600);
kSession.insert(a);
kSession.fireAllRules();
} catch (Throwable t) {
t.printStackTrace();
}
}
public static class Salary{
private String message;
private int amount;
private int status;
public String getMessage() {
return message;
}
public void setMessage(String message) {
this.message = message;
}
public int getAmount() {
return amount;
}
public void setAmount(int amount) {
this.amount = amount;
}
public int getStatus() {
return status;
}
public void setStatus(int status) {
this.status = status;
}
}
}
The attribute lock-on-active countermands any firings after the first from the group of rules with the same agenda group. Remove this column.
Don't plan to have rules firing in a certain order. Write logic that describes exactly the state of a fact as it should trigger the rule. Possibly you'll have to write
rule "set status"
when
$s: Salary( amount >= 500.0 && < 600.0, status == 0 )
then
modify( $s ){ setStatus( 400 ) }
end
to prevent more than one status from being set, or to make sure just the right one is set. But you'll find that your rules become more explicit and easier to read.
Think of rule attributes as a last resort.
Please replace the action in the column H in the following way:
Current solution:
a.setStatus($param);update(a);
New solution:
modify(a) {
setStatus($param)
}

Concatenating ImmutableLists

I have a List<ImmutableList<T>>. I want to flatten it into a single ImmutableList<T> that is a concatenation of all the internal ImmutableLists. These lists can be very long so I do not want this operation to perform a copy of all the elements. The number of ImmutableLists to flatten will be relatively small, so it is fine that lookup will be linear in the number of ImmutableLists. I would strongly prefer that the concatenation will return an Immutable collection. And I need it to return a List that can be accessed in a random location.
Is there a way to do this in Guava?
There is Iterables.concat but that returns an Iterable. To convert this into an ImmutableList again will be linear in the size of the lists IIUC.
By design Guava does not allow you to define your own ImmutableList implementations (if it did, there'd be no way to enforce that it was immutable). Working around this by defining your own class in the com.google.common.collect package is a terrible idea. You break the promises of the Guava library and are running firmly in "undefined behavior" territory, for no benefit.
Looking at your requirements:
You need to concatenate the elements of n ImmutableList instances in sub-linear time.
You would like the result to also be immutable.
You need the result to implement List, and possibly be an ImmutableList.
As you know you can get the first two bullets with a call to Iterables.concat(), but if you need an O(1) random-access List this won't cut it. There isn't a standard List implementation (in Java or Guava) that is backed by a sequence of Lists, but it's straightforward to create one yourself:
import static com.google.common.base.Preconditions.checkElementIndex;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Iterables;
import java.util.AbstractList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.RandomAccess;
/**
 * A constant-time view into several {@link ImmutableList} instances, as if they were
 * concatenated together. Since the backing lists are immutable this class is also
 * immutable and therefore thread-safe.
 *
 * More precisely, this class provides O(log n) element access where n is the number of
 * input lists. Assuming the number of lists is small relative to the total number of
 * elements this is effectively constant time.
 */
public class MultiListView<E> extends AbstractList<E> implements RandomAccess {
    private final ImmutableList<ImmutableList<E>> elements;
    private final int size;
    private final int[] startIndexes;
    private MultiListView(Iterable<ImmutableList<E>> elements) {
        this.elements = ImmutableList.copyOf(elements);
        startIndexes = new int[this.elements.size()];
        int currentSize = 0;
        for (int i = 0; i < this.elements.size(); i++) {
            // Offset of the first element of list i within the concatenation.
            // Note: empty backing lists produce duplicate offsets, which get() does not handle.
            startIndexes[i] = currentSize;
            currentSize += this.elements.get(i).size();
        }
        size = currentSize;
    }
    @Override
    public E get(int index) {
        checkElementIndex(index, size);
        int location = Arrays.binarySearch(startIndexes, index);
        if (location >= 0) {
            // index is exactly the start of a backing list
            return elements.get(location).get(0);
        }
        // binarySearch returned -(insertionPoint) - 1; step back to the containing list
        location = (~location) - 1;
        return elements.get(location).get(index - startIndexes[location]);
    }
    @Override
    public int size() {
        return size;
    }
    // The default iterator returned by AbstractList.iterator() calls .get()
    // which is likely slower than just concatenating the backing lists' iterators
    @Override
    public Iterator<E> iterator() {
        return Iterables.concat(elements).iterator();
    }
    public static <E> MultiListView<E> of(Iterable<ImmutableList<E>> lists) {
        return new MultiListView<>(lists);
    }
    @SafeVarargs
    public static <E> MultiListView<E> of(ImmutableList<E>... lists) {
        return of(Arrays.asList(lists));
    }
}
This class is immutable even though it doesn't extend ImmutableList or ImmutableCollection, therefore there's no need for it to actually extend ImmutableList.
As to whether such a class should be provided by Guava; you can make your case in the associated issue, but the reason this doesn't already exist is that surprisingly few users actually need it. Be sure there isn't a reasonable way to solve your problem with an Iterable before using MultiListView.
Firstly, @dimo414's answer is right on the mark, with a clean wrapper view implementation and advice.
Still, I would like to emphasise that since Java 8, you probably just want to do:
listOfList.stream()
.flatMap(ImmutableList::stream)
.collect(ImmutableList.toImmutableList());
The Guava issue has since been closed as working-as-intended, with the remark:
We are more down on lazy view collections than we used to be (especially now that Stream exists) (...)
At least, profile your own use case before trying the view-collection approach.
Under the hood, the stream version populates a new backing array with references to the elements; the elements themselves are not deep-copied. So very few objects are created (low GC cost), and linear copies from backing array to backing array usually proceed faster than you might expect, even with large inner lists. (They work well with CPU cache prefetch.)
Depending on how much you do with the result, the stream version might work out faster than the wrapper version, which pays for an extra indirection on every access.
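For readers who want to check the "references, not deep copies" point, here is a small JDK-only sketch of the same idea. It uses java.util.List and Collectors.toUnmodifiableList() (Java 10+) in place of Guava's ImmutableList and toImmutableList() so that it runs without dependencies; FlattenDemo and flatten are names made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class FlattenDemo {
    // Copies the element references of every inner list into one unmodifiable list.
    // The elements themselves are shared with the inner lists, not cloned.
    static <T> List<T> flatten(List<List<T>> listOfLists) {
        return listOfLists.stream()
                .flatMap(List::stream)
                .collect(Collectors.toUnmodifiableList());
    }
}
```

The flattened list holds the very same object instances as the inner lists, so the cost is one reference per element plus the new backing array.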
Here is a probably slightly more readable version of dimo414's implementation, which processes empty lists correctly and populates startIndexes correctly:
public class ImmutableMultiListView<E> extends AbstractList<E> implements RandomAccess {
private final ImmutableList<ImmutableList<E>> listOfLists;
private final int[] startIndexes;
private final int size;
private ImmutableMultiListView(List<ImmutableList<E>> originalListOfLists) {
this.listOfLists =
originalListOfLists.stream().filter(l -> !l.isEmpty()).collect(toImmutableList());
startIndexes = new int[listOfLists.size()];
int sumSize = 0;
for (int i = 0; i < listOfLists.size(); i++) {
List<E> list = listOfLists.get(i);
sumSize += list.size();
if (i < startIndexes.length - 1) {
startIndexes[i + 1] = sumSize;
}
}
this.size = sumSize;
}
@Override
public E get(int index) {
checkElementIndex(index, size);
int location = Arrays.binarySearch(startIndexes, index);
if (location >= 0) {
return listOfLists.get(location).get(0);
} else {
// See Arrays#binarySearch Javadoc:
int insertionPoint = -location - 1;
int listIndex = insertionPoint - 1;
return listOfLists.get(listIndex).get(index - startIndexes[listIndex]);
}
}
@Override
public int size() {
return size;
}
// AbstractList.iterator() calls .get(), which is slower than just concatenating
// the backing lists' iterators
@Override
public Iterator<E> iterator() {
return Iterables.concat(listOfLists).iterator();
}
public static <E> ImmutableMultiListView<E> of(List<ImmutableList<E>> lists) {
return new ImmutableMultiListView<>(lists);
}
}
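The index arithmetic in get() is the subtle part, so here it is isolated as a JDK-only sketch. OffsetLookup and locate are hypothetical names; the method assumes startIndexes holds the cumulative start offset of each non-empty backing list, beginning with 0, and that index has already been range-checked.

```java
import java.util.Arrays;

class OffsetLookup {
    // Returns {listIndex, indexWithinList} for a global index, given the
    // cumulative start offset of each (non-empty) backing list.
    static int[] locate(int[] startIndexes, int index) {
        int location = Arrays.binarySearch(startIndexes, index);
        if (location >= 0) {
            // Exact hit: index is the first element of list `location`.
            return new int[] {location, 0};
        }
        // Miss: binarySearch returned -(insertionPoint) - 1.
        int insertionPoint = -location - 1;
        int listIndex = insertionPoint - 1;
        return new int[] {listIndex, index - startIndexes[listIndex]};
    }
}
```

For backing lists of sizes 3, 2 and 4 the start offsets are {0, 3, 5}. Global index 4 is not an offset, so binarySearch reports insertion point 2, which gives list 1 and position 4 - 3 = 1 within it.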
Not sure if it is possible just with Guava classes, but it seems not difficult to implement, how about something like the following:
package com.google.common.collect;
import java.util.List;
public class ConcatenatedList<T> extends ImmutableList<T> {
private final List<ImmutableList<T>> underlyingLists;
public ConcatenatedList(List<ImmutableList<T>> underlyingLists) {
this.underlyingLists = underlyingLists;
}
@Override
public T get(int index) {
for (ImmutableList<T> list : underlyingLists) {
if (index < list.size()) return list.get(index);
index -= list.size();
}
throw new IndexOutOfBoundsException();
}
@Override
boolean isPartialView() {
for (ImmutableList<T> list : underlyingLists) {
if (list.isPartialView()) return true;
}
return false;
}
@Override
public int size() {
int result = 0;
for (ImmutableList<T> list : underlyingLists) {
result += list.size();
}
return result;
}
}
Note the package declaration: it needs to be like that to access ImmutableList's package-private constructor. Be aware that this implementation might break with a future version of Guava, since the constructor is not part of the API. Also, as mentioned in the Javadoc of ImmutableList and in the comments, the library authors did not intend this class to be subclassed. However, there is no good reason not to use it in an application you control, and it has the additional benefit of expressing immutability in the type signature, compared to the MultiListView suggested in the other answer.

Spring Batch Footer Validation

I am using Spring batch for processing a file with a header, detail and footer records.
The footer contains the total number of records in the file.
If the detail record count doesn't match the count in the footer, the file should not be processed.
I am using a custom line tokenizer that processes the header, detail and footer records. When the footer record is encountered and its count doesn't match the detail record count, I throw an exception.
The problem is that if the chunk size is small, say 10, and the file has 20 records, the first 10 detail records are persisted to the DB even though the footer count doesn't match the total number of records.
Is there a way to validate the footer count with the number of records in the file before the call to the Writer?
Thanks.
What you need is a reader with a footer callback handler defined. I had faced a similar problem and this link helped me a lot!
See the last post by Atefeh Zareh. He has also included the xml configuration.
And regarding the first ten being persisted, you can have another validation step before the main processing step which will just check the header and trailer counts. Do not write any persisting logic in the writer. If the count fails, stop the job so that it does not go into the processing step.
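A validation pre-pass of that kind could look roughly like the sketch below. It is plain Java with no Spring Batch types, and the record layout is an assumption made for the example: header lines start with "H,", detail lines with "D,", and the footer is "T,<count>". FooterCountValidator and isValid are hypothetical names; in a real job the check would run in a step of its own, and the job would stop when it returns false.

```java
import java.util.List;

class FooterCountValidator {
    // Returns true when the number of detail lines equals the count in the footer.
    // Assumed layout (hypothetical): "H,..." header, "D,..." detail, "T,<count>" footer.
    static boolean isValid(List<String> lines) {
        int detailCount = 0;
        Integer footerCount = null;
        for (String line : lines) {
            if (line.startsWith("D,")) {
                detailCount++;
            } else if (line.startsWith("T,")) {
                footerCount = Integer.parseInt(line.split(",")[1].trim());
            }
        }
        // A file without a footer cannot be validated, so treat it as invalid.
        return footerCount != null && footerCount == detailCount;
    }
}
```

Because the whole file is scanned before anything is written, a mismatch is detected before the first chunk reaches the writer.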
Another option is to write your own ItemReader and item classes that handle the header, footer and data records and keep a count of each.
ItemReader class:
public class AggregateItemReader<T> implements ItemStreamReader<ResultHolder> {
private ItemStreamReader<AggregateItem<T>> itemReader;
@Override
public ResultHolder read() throws Exception {
ResultHolder holder = new ResultHolder();
while (process(itemReader.read(), holder)) {
continue;
}
if (!holder.isExhausted()) {
return holder;
}
else {
return null;
}
}
private boolean process(AggregateItem<T> value, ResultHolder holder) {
// finish processing if we hit the end of file
if (value == null) {
LOG.debug("Exhausted ItemReader");
holder.setExhausted(true);
return false;
}
// start a new collection
if (value.isHeader()) {
LOG.debug("Header Record detected");
holder.addHeaderRecordCount();
return true;
}
// mark we are finished with current collection
if (value.isFooter()) {
LOG.debug("Tailer Record detected");
holder.addTailerRecordCount();
holder.setFiledRecordCount(value.getFieldSet().readInt(3));
System.out.println("###########################################"+holder.getDataRecordCount()+"############################################");
return false;
}
// add a simple record to the current collection
holder.addDataRecordCount();
return true;
}
}
And Item Class is
public class AggregateItem<T> {
@SuppressWarnings("unchecked")
public static <T> AggregateItem<T> getData(FieldSet fs) {
return new AggregateItem(fs, false, false, true);
}
@SuppressWarnings("unchecked")
public static <T> AggregateItem<T> getFooter(FieldSet fs) {
return new AggregateItem(fs, false, true, false);
}
@SuppressWarnings("unchecked")
public static <T> AggregateItem<T> getHeader(FieldSet fs) {
return new AggregateItem(fs, true, false, false);
}
private boolean data = false;
private FieldSet fieldSet;
private boolean footer = false;
private boolean header = false;
private T item;
public AggregateItem(FieldSet fs, boolean header, boolean footer, boolean data) {
this(null);
this.header = header;
this.footer = footer;
this.data = data;
this.fieldSet = fs;
}
public AggregateItem(T item) {
super();
this.item = item;
}
public FieldSet getFieldSet() {
return fieldSet;
}
public T getItem() {
return item;
}
public boolean isData() {
return data;
}
public boolean isFooter() {
return footer;
}
public boolean isHeader() {
return header;
}
}
And ResultHolder class is
public class ResultHolder {
    private Integer headerRecordCount = 0;
    private Integer dataRecordCount = 0;
    private Integer tailerRecordCount = 0;
    private Integer filedRecordCount; // record count given in the source file
    private boolean exhausted = false;
    // setters & getters
}
If you have any doubts, feel free to mail sk.baji6@gmail.com.