Right way to handle multi-line records - spring-batch

I am currently following the spring-batch docs on how to handle the multi-line records situation.
If all lines share the same format, I can use FlatFileItemReader and FlatFileItemWriter to read and write the CSV.
Something like this:
@Bean
public FlatFileItemReader<AdaptNeko> reader() {
    return new FlatFileItemReaderBuilder<AdaptNeko>()
            .name("itemReader")
            .resource(new ClassPathResource(INPUT_FILE))
            .delimited()
            .names(new String[]{"name", "age", "potentialLevel"})
            .fieldSetMapper(fieldSet -> NormalNeko.builder()
                    .name(fieldSet.readString("name"))
                    .age(fieldSet.readInt("age"))
                    .potentialLevel(fieldSet.readInt("potentialLevel"))
                    .build())
            .build();
}
However, my file format is a bit different.
My current CSV format:
START, day, play
123, 456, 899, abc, xyz
END
START, day1, play23
789, 456, 899, koq, koq
END
After doing some research I found that spring-batch has documentation covering this case, but I'm very new to spring-batch and barely understand it. Note also that the second line, which holds data only, does not start with a marker attribute that has a default value.
The documentation example uses orderFileTokenizer() and then a fieldSetMapper that maps everything back into a single class, just like the plain FlatFileItemReader usage:
@Bean
public FlatFileItemReader flatFileItemReader() {
    FlatFileItemReader<Trade> reader = new FlatFileItemReaderBuilder<>()
            .name("flatFileItemReader")
            .resource(new ClassPathResource("data/iosample/input/multiLine.txt"))
            .lineTokenizer(orderFileTokenizer())
            .fieldSetMapper(orderFieldSetMapper())
            .build();
    return reader;
}
and
@Bean
public PatternMatchingCompositeLineTokenizer orderFileTokenizer() {
    PatternMatchingCompositeLineTokenizer tokenizer =
            new PatternMatchingCompositeLineTokenizer();
    Map<String, LineTokenizer> tokenizers = new HashMap<>(4);
    tokenizers.put("HEA*", headerRecordTokenizer());
    tokenizers.put("FOT*", footerRecordTokenizer());
    tokenizers.put("NCU*", customerLineTokenizer());
    tokenizers.put("BAD*", billingAddressLineTokenizer());
    tokenizer.setTokenizers(tokenizers);
    return tokenizer;
}
So may I ask what these tokenizer functions such as headerRecordTokenizer() mean, and does the mapping with new HashMap<>(4) here mean that 4 lines are read from the file each time?
I don't have marker values like FOT* and NCU* in my file as in the example, so how do I let spring-batch know that I want to stop at END and begin a new record?
I'm quite new to spring-batch, so I'm a bit confused here, because as the documentation is written I see no difference from how FlatFileItemReader works when all attributes sit on a single, stable line, as in my first example.

So may I ask what these tokenizer functions such as headerRecordTokenizer() mean, and does the mapping with new HashMap<>(4) here mean that 4 lines are read from the file each time?
A logical record in the input file of the example you shared spans 4 physical lines, where each line has to be tokenized differently. This is where the composite line tokenizer comes into play: it delegates line tokenization to 4 pattern-matching delegate tokenizers, one for each line format. (The 4 in new HashMap<>(4) is just the map's initial capacity, not a number of lines to read.)
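If it helps, here is a minimal sketch of how the same idea could map onto your file: route lines on the START and END prefixes, and let a catch-all "*" pattern handle the unprefixed data line. The field names for the data line are hypothetical, since your question doesn't name them:
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.batch.item.file.transform.PatternMatchingCompositeLineTokenizer;
import org.springframework.context.annotation.Bean;

@Bean
public PatternMatchingCompositeLineTokenizer fileTokenizer() {
    PatternMatchingCompositeLineTokenizer tokenizer = new PatternMatchingCompositeLineTokenizer();

    // "START, day, play" -> three comma-delimited fields
    DelimitedLineTokenizer headerTokenizer = new DelimitedLineTokenizer();
    headerTokenizer.setNames("marker", "day", "play");

    // "123, 456, 899, abc, xyz" -> five comma-delimited fields (hypothetical names)
    DelimitedLineTokenizer dataTokenizer = new DelimitedLineTokenizer();
    dataTokenizer.setNames("f1", "f2", "f3", "f4", "f5");

    Map<String, LineTokenizer> tokenizers = new HashMap<>();
    tokenizers.put("START*", headerTokenizer);             // the record header
    tokenizers.put("END*", new DelimitedLineTokenizer());  // END carries no data
    tokenizers.put("*", dataTokenizer);                    // everything else is a data line
    tokenizer.setTokenizers(tokenizers);
    return tokenizer;
}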
There is another example that is closer to your requirement, which you can find here: https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/test/java/org/springframework/batch/sample/iosample/internal/MultiLineTradeItemReader.java
This one reads items between "BEGIN" and "END", similar to what you have. The complete example can be found here: https://github.com/spring-projects/spring-batch/tree/main/spring-batch-samples#multiline
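Transposed to your START/END markers, that sample's idea is a custom reader that wraps a FlatFileItemReader<FieldSet> delegate (for example one built with a PassThroughFieldSetMapper) and keeps consuming tokenized lines until it sees END. A rough sketch, with a hypothetical NekoRecord class standing in for your aggregated record:
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.transform.FieldSet;

// Hypothetical aggregate for one START..END block
class NekoRecord {
    private String day;
    private String play;
    private String[] data;
    public void setDay(String day) { this.day = day; }
    public void setPlay(String play) { this.play = play; }
    public void setData(String[] data) { this.data = data; }
}

public class MultiLineNekoItemReader implements ItemReader<NekoRecord> {

    // Delegate that returns one tokenized physical line per read()
    private FlatFileItemReader<FieldSet> delegate;

    @Override
    public NekoRecord read() throws Exception {
        NekoRecord record = null;
        FieldSet line;
        while ((line = delegate.read()) != null) {
            String first = line.readString(0);
            if ("START".equals(first)) {
                record = new NekoRecord();            // begin a new logical record
                record.setDay(line.readString(1));
                record.setPlay(line.readString(2));
            } else if ("END".equals(first)) {
                return record;                        // logical record complete
            } else if (record != null) {
                record.setData(line.getValues());     // the unprefixed data line
            }
        }
        return null; // no more input
    }

    public void setDelegate(FlatFileItemReader<FieldSet> delegate) {
        this.delegate = delegate;
    }
}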

Related

Can I use AtomicReference to get the value of a Mono and have the code still remain reactive?

Sorry, I am new to the reactive paradigm. Is it possible to use AtomicReference to get the value of a Mono, given that reactive code can run asynchronously and different events run on different threads? Please see the sample below. I am also not sure whether this piece of code is considered reactive.
Sample code:
public static void main(String[] a) {
    AtomicReference<UserDTO> dto = new AtomicReference<>();
    Mono.just(new UserDTO())
        .doOnNext(d -> d.setUserId(123L))
        .subscribe(d -> dto.set(d));
    UserDTO result = dto.get();
    dto.set(null);
    System.out.println(result);    // produces UserDTO(userId=123)
    System.out.println(dto.get()); // produces null
}
The code snippet you have shared is not guaranteed to always work. There is no way to guarantee that the function inside doOnNext will run before dto.get(): you have created a race condition.
You can run the following code to simulate this:
AtomicReference<UserDTO> dto = new AtomicReference<>();
Mono.just(new UserDTO())
    .delayElement(Duration.ofSeconds(1))
    .doOnNext(d -> d.setUserId(123L))
    .subscribe(dto::set);
UserDTO result = dto.get();
System.out.println(result); // produces null
To make this example fully reactive, you should print inside the subscribe operator:
Mono.just(new UserDTO())
    .doOnNext(d -> d.setUserId(123L))
    .subscribe(System.out::println);
In a more "real world" example, your method would return a Mono<UserDTO> and you would then perform transformations on it using the map or flatMap operators.
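For instance, a small sketch of that shape (the loadUser method name and the getUserId accessor are assumptions, not taken from your snippet):
public Mono<UserDTO> loadUser() {
    return Mono.just(new UserDTO())
               .doOnNext(d -> d.setUserId(123L)); // mutation stays inside the pipeline
}

// Callers compose on the returned Mono instead of extracting the value:
loadUser()
    .map(UserDTO::getUserId)         // transform reactively
    .subscribe(System.out::println); // prints 123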
EDIT:
If you are looking to make a blocking call within a reactive stream, this previous Stack Overflow question contains a good answer.
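For completeness, a minimal sketch of the blocking variant using Mono#block(), which is fine in a plain main method or a test but must not be called from a reactive scheduler thread:
UserDTO result = Mono.just(new UserDTO())
        .doOnNext(d -> d.setUserId(123L))
        .block();           // waits for the Mono to emit, then returns the value
System.out.println(result); // prints UserDTO(userId=123)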

What is the most effective way in SystemVerilog to know how many words a string has?

I have strings with the following structure:
cmd, addr, data, data, data, data, ……., \n
For example:
"write,A0001000,00000000, \n"
I have to know how many words the string has.
I know that I can go over the string and search for the number of commas, but is there a more effective way to do it?
UVM provides a facility to do regexp matching using the DPI, in case you're already using it. Have a look at the functions in uvm_svcmd_dpi.svh.
Verilab also provides svlib, a package containing string-matching functions.
A simpler option would be to change the commas (,) to spaces; then you can use $sscanf (or $fscanf to skip the intermediate string and read directly from a file), assuming each command has a maximum number of words:
int code; // returns the number of words read
string str, word[5];
code = $sscanf(str, "%s %s %s %s %s", word[0], word[1], word[2], word[3], word[4]);
You can use %h if you know a word is in hex and translate it directly to a numeric value instead of a string.
The first step is to define extremely clearly what a word actually is, i.e. what constitutes the start of a word and what constitutes the end of a word; once you understand this, it should become obvious how to parse the string correctly.
In Java, StringTokenizer is an easy way to find the count of words in a string:
import java.util.StringTokenizer;

String sampleString = "cmd addr data data data data....";
StringTokenizer st = new StringTokenizer(sampleString);
int wordCount = st.countTokens();
Hope this will help you :)
In Java you can use the following code to count the words in a string:
public class WordCounts {
    public static void main(String[] args) {
        String text = "cmd, addr, data, data, data, data";
        String trimmed = text.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        System.out.println(words);
    }
}

itextsharp: unable to cast from FilteredRenderListener to ITextExtractionStrategy

I want to extract text from a specified PDF area with itextsharp. I know there is an example: http://sourceforge.net/p/itextsharp/code/HEAD/tree/book/iTextExamplesWeb/iTextExamplesWeb/iTextInAction2Ed/Chapter15/ExtractPageContentArea.cs#l35. The core code is like this:
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy = new FilteredRenderListener(new LocationTextExtractionStrategy(), filter);
string text = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
However, VS2012 tells me "cannot implicitly convert from FilteredRenderListener to ITextExtractionStrategy". I tried to do an explicit conversion, but that failed. Could anyone help me? Am I using the wrong itextsharp version? Thanks so much!
Your line looks like this:
ITextExtractionStrategy strategy = new FilteredRenderListener(new LocationTextExtractionStrategy(), filter);
but the corresponding lines from the sample you reference look like this:
ITextExtractionStrategy strategy;
...
strategy = new FilteredTextRenderListener(
    new LocationTextExtractionStrategy(), filter
);
I.e. the sample uses a FilteredTextRenderListener while you use a plain FilteredRenderListener. The former extends the latter and additionally implements ITextExtractionStrategy.
Thus, simply use FilteredTextRenderListener instead of FilteredRenderListener.
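Applied to the snippet from the question, only the class name changes:
RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
string text = PdfTextExtractor.GetTextFromPage(reader, i, strategy);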

Drools does not sort numbers correctly

I am new to Drools and am trying to get the sample program to work.
This sample is given in the Drools documentation: http://docs.jboss.org/drools/release/5.5.0.Final/drools-expert-docs/html_single/index.html#d0e9542.
This Drools rule is expected to sort integers. I just changed the numbers from those given in the sample, and they do not get sorted as expected.
I tried Drools versions 5.5.0, 5.5.1, and the 6.0.0 master, but got the same wrong results.
Following is the main code:
package com.sample;

public class Example2 {
    public static void main(String[] args) throws Exception {
        Number[] numbers = new Number[] { wrap(5), wrap(6), wrap(4), wrap(1), wrap(2) };
        new RuleRunner().runRules(new String[] { "Example3.drl" }, numbers);
    }

    private static Integer wrap(int i) {
        return new Integer(i);
    }
}
The RuleRunner class is the same as in the example, and I do not think I should reproduce it here, since it would clutter the question. It simply creates the KnowledgeBase and a stateful session, inserts the facts from the 'numbers' array above, and then calls fireAllRules on the session.
The rule file (Example3.drl) is:
rule "Rule 04"
dialect "mvel"
when
$number : Number()
not Number(intValue < $number.intValue)
then
System.out.println("Number found with value: " + $number.intValue());
retract($number);
end
The output I get is as follows:
Loading file: Example3.drl
Inserting fact: 5
Inserting fact: 6
Inserting fact: 4
Inserting fact: 1
Inserting fact: 2
Number found with value: 1
Number found with value: 4
Number found with value: 2
Number found with value: 5
Number found with value: 6
This is not the expected ascending sorted order.
What might I be doing wrong? I cannot imagine that the Drools rule engine would be broken at this basic level.
This seems to be a bug that was introduced in 5.5.0 and still exists.
The sorting code works fine with 5.4.0.
Workaround:
Instead of "not Number(intValue < $number.intValue)", use "not Number(intValue() < $number.intValue)". Then it works.
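For clarity, this is the full rule from the question with the workaround applied; the only change is the parentheses on intValue() inside the not pattern:
rule "Rule 04"
    dialect "mvel"
when
    $number : Number()
    not Number(intValue() < $number.intValue)
then
    System.out.println("Number found with value: " + $number.intValue());
    retract($number);
end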
Referencing a non-getter method without the parentheses that mark it as a function call seems to create the problem.
A debilitating problem like this reduces confidence in the product during the evaluation phase.

ADO.NET Mapping From SQLDataReader to Domain Object?

I have a very simple mapping function called "BuildEntity" that does the usual boring "left/right" coding required to dump my reader data into my domain object (shown below). My question is this: if I don't bring back every column in this mapping as-is, I get a System.IndexOutOfRangeException, and I wanted to know if ADO.NET has anything to correct this so I don't need to bring back every column with each call into SQL...
What I'm really looking for is something like "IsValidColumn" so I can keep this one mapping function throughout my DataAccess class with all the left/right mappings defined, and have it work even when a sproc doesn't return every column listed...
Using reader As SqlDataReader = cmd.ExecuteReader()
    Dim product As Product
    While reader.Read()
        product = New Product()
        product.ID = Convert.ToInt32(reader("ProductID"))
        product.SupplierID = Convert.ToInt32(reader("SupplierID"))
        product.CategoryID = Convert.ToInt32(reader("CategoryID"))
        product.ProductName = Convert.ToString(reader("ProductName"))
        product.QuantityPerUnit = Convert.ToString(reader("QuantityPerUnit"))
        product.UnitPrice = Convert.ToDouble(reader("UnitPrice"))
        product.UnitsInStock = Convert.ToInt32(reader("UnitsInStock"))
        product.UnitsOnOrder = Convert.ToInt32(reader("UnitsOnOrder"))
        product.ReorderLevel = Convert.ToInt32(reader("ReorderLevel"))
        productList.Add(product)
    End While
Also check out this extension method I wrote for use on data commands:
public static void Fill<T>(this IDbCommand cmd,
    IList<T> list, Func<IDataReader, T> rowConverter)
{
    using (var rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            list.Add(rowConverter(rdr));
        }
    }
}
You can use it like this:
cmd.Fill(products, r => r.GetProduct());
Where "products" is the IList<Product> you want to populate, and "GetProduct" contains the logic to create a Product instance from a data reader. It won't help with this specific problem of not having all the fields present, but if you're doing a lot of old-fashioned ADO.NET like this it can be quite handy.
Although connection.GetSchema("Tables") does return metadata about the tables in your database, it won't return everything in your sproc if you define any custom columns.
For example, if you throw in some random ad-hoc column like "SELECT ProductName, 'Testing' As ProductTestName FROM dbo.Products", you won't see 'ProductTestName' as a column, because it's not in the schema of the Products table. To solve this and ask for every column available in the returned data, leverage a method on the SqlDataReader object: GetSchemaTable().
If I add this to the existing code sample you listed in your original question, you will notice that just after the reader is declared I add a DataTable to capture the metadata from the reader itself. Next I loop through this metadata and add each column name to another table that I use in the left/right code to check whether each column exists.
Updated Source Code
Using reader As SqlDataReader = cmd.ExecuteReader()
    Dim table As DataTable = reader.GetSchemaTable()
    Dim colNames As New DataTable()
    For Each row As DataRow In table.Rows
        colNames.Columns.Add(row.ItemArray(0))
    Next
    Dim product As Product
    While reader.Read()
        product = New Product()
        If Not colNames.Columns("ProductID") Is Nothing Then
            product.ID = Convert.ToInt32(reader("ProductID"))
        End If
        product.SupplierID = Convert.ToInt32(reader("SupplierID"))
        product.CategoryID = Convert.ToInt32(reader("CategoryID"))
        product.ProductName = Convert.ToString(reader("ProductName"))
        product.QuantityPerUnit = Convert.ToString(reader("QuantityPerUnit"))
        product.UnitPrice = Convert.ToDouble(reader("UnitPrice"))
        product.UnitsInStock = Convert.ToInt32(reader("UnitsInStock"))
        product.UnitsOnOrder = Convert.ToInt32(reader("UnitsOnOrder"))
        product.ReorderLevel = Convert.ToInt32(reader("ReorderLevel"))
        productList.Add(product)
    End While
This is a hack to be honest, as you should return every column to hydrate your object correctly. But I thought to include this reader method, since it actually grabs all the columns, even if they are not defined in your table schema.
This approach to mapping your relational data into your domain model might cause some issues when you get into a lazy-loading scenario.
Why not just have each sproc return the complete column set, using null, -1, or other acceptable values where you don't have the data? That avoids having to catch IndexOutOfRangeException or rewrite everything in LINQ to SQL.
Use the GetSchemaTable() method to retrieve the metadata of the DataReader. The DataTable that is returned can be used to check if a specific column is present or not.
Why don't you use LINQ to SQL? Everything you need is done automatically. For the sake of generality, you can use any other ORM tool for .NET.
If you don't want to use an ORM, you can also use reflection for things like this (though in this case, because ProductID is not named the same on both sides, you couldn't do it in the simplistic fashion demonstrated here):
List Provider in C#
I would call reader.GetOrdinal for each field name before starting the while loop. Unfortunately, GetOrdinal throws an IndexOutOfRangeException if the field doesn't exist, so it won't be very performant.
You could instead store the results in a Dictionary<string, int> and use its ContainsKey method to determine whether the field was supplied.
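A minimal sketch of that idea (the GetOrdinals extension-method name is mine, not from any library):
using System;
using System.Collections.Generic;
using System.Data;

static class DataReaderExtensions
{
    // Build a column-name -> ordinal map once per reader, so callers can
    // test ContainsKey instead of letting GetOrdinal throw on missing fields.
    public static Dictionary<string, int> GetOrdinals(this IDataReader reader)
    {
        var ordinals = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < reader.FieldCount; i++)
            ordinals[reader.GetName(i)] = i;
        return ordinals;
    }
}

// Usage inside the mapping loop, with a column name from the question:
// var cols = reader.GetOrdinals();
// if (cols.ContainsKey("ProductID"))
//     product.ID = reader.GetInt32(cols["ProductID"]);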
I ended up writing my own, but this mapper is pretty good (and simple): https://code.google.com/p/dapper-dot-net/