I have a spreadsheet that looks like so...
      US LC    US MC
40    55.39     3.26
39    54.39     1.26
This is its POJO:
@Data
public class ExcelObject {
    private BigDecimal timePeriod;
    private BigDecimal usLc;
    private BigDecimal usMc;
}
I need to transform this sheet and save it to the database like so...
TIME_PERIOD, LABEL, ALLOCATION
40, "US LC", 55.39
40, "US MC", 3.26
39, "US LC", 54.39
39, "US MC", 1.26
This is the POJO for the transformed ExcelObject:
@Data
public class ExcelItem {
    private BigDecimal timePeriod;
    private String label;
    private BigDecimal allocation;
}
What's the best Spring Batch strategy to accomplish this transformation? I have a row mapper to map the data as-is from the spreadsheet, and I was thinking I'd do the transformation in my processor. But how do I return 4 results from the processor, and write 4 rows? Thanks.
Implement an ItemReader to read ExcelObject:
public class MyItemReader implements ItemReader<ExcelObject> {

    @Override
    public ExcelObject read() {
        // return the next ExcelObject, or null when the sheet is exhausted
    }
}
Then an ItemProcessor to convert ExcelObject to a List<ExcelItem>:
public class MyItemProcessor implements ItemProcessor<ExcelObject, List<ExcelItem>> {

    @Override
    public List<ExcelItem> process(ExcelObject item) {
        // map one ExcelObject to one ExcelItem per column
    }
}
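For reference, here is a minimal sketch of what the body of that process method could look like for the two columns above, assuming ExcelItem has an all-args constructor (for example via Lombok's @AllArgsConstructor); the names are only illustrative:
@Override
public List<ExcelItem> process(ExcelObject excelObject) {
    List<ExcelItem> items = new ArrayList<>();
    // one output row per spreadsheet column
    items.add(new ExcelItem(excelObject.getTimePeriod(), "US LC", excelObject.getUsLc()));
    items.add(new ExcelItem(excelObject.getTimePeriod(), "US MC", excelObject.getUsMc()));
    return items;
}
This is how two input rows become the four output rows shown above.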
Finally, an ItemWriter to write List<ExcelItem>. In the writer, you can loop over the items and delegate to an existing ItemWriter provided by Spring Batch, such as:
public class MyItemWriter implements ItemWriter<List<ExcelItem>> {

    @Autowired
    private JdbcBatchItemWriter<ExcelItem> jdbcWriter;

    @Override
    public void write(List<? extends List<ExcelItem>> items) throws Exception {
        for (List<ExcelItem> item : items) {
            jdbcWriter.write(item);
        }
    }
}
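To tie it together, the step then declares ExcelObject as the input type and List<ExcelItem> as the output type. A rough sketch of the step definition (the bean names and chunk size here are placeholders):
@Bean
public Step transformStep(StepBuilderFactory stepBuilderFactory,
                          MyItemReader reader,
                          MyItemProcessor processor,
                          MyItemWriter writer) {
    return stepBuilderFactory.get("transformStep")
            .<ExcelObject, List<ExcelItem>>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}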
I have a Spring Batch job with a step consisting of a reader (reading from Elasticsearch), a processor (flattening and giving me a list of items) and a writer (writing the list of items to the database), running as follows:
@Bean
public Step userAuditStep(
        StepBuilderFactory stepBuilderFactory,
        ItemReader<User> esUserReader,
        ItemProcessor<User, List<UserAuditFact>> userAuditFactProcessor,
        ItemWriter<List<UserAuditFact>> userAuditFactListItemWriter) {
    return stepBuilderFactory
            .get(stepName)
            .<User, List<UserAuditFact>>chunk(chunkSize)
            .reader(esUserReader)
            .processor(userAuditFactProcessor)
            .writer(userAuditFactListItemWriter)
            .listener(listener)
            .build();
}
As you can see above, the user reader (I just kept the reader batch size at 2) produces lists of UserAuditFact items (usually a few thousand), which are then written using a list writer that looks like this:
@Bean
@StepScope
public ItemWriter<List<UserAuditFact>> userAuditFactListItemWriter(
        ItemWriter<UserAuditFact> userAuditFactItemWriter) {
    return new UserAuditFactListUnwrapWriter(userAuditFactItemWriter);
}

@Bean
@StepScope
public ItemWriter<UserAuditFact> userAuditFactItemWriter(
        @Qualifier("dbTemplate") NamedParameterJdbcTemplate jdbcTemplate) {
    return new JdbcBatchItemWriterBuilder<UserAuditFact>()
            .itemSqlParameterSourceProvider(UserAuditFactQueryProvider.getUserAuditFactInsertParams())
            .assertUpdates(false)
            .sql(UserAuditFactQueryProvider.UPSERT)
            .namedParametersJdbcTemplate(jdbcTemplate)
            .build();
}
Since I have a list of items, I unwrap them before writing to the database, like below:
public class UserAuditFactListUnwrapWriter implements ItemWriter<List<UserAuditFact>> {

    private final ItemWriter<UserAuditFact> delegate; // set via constructor (omitted here)

    @Override
    public void write(List<? extends List<UserAuditFact>> lists) throws Exception {
        final List<UserAuditFact> consolidatedList = new ArrayList<>();
        for (final List<UserAuditFact> list : lists) {
            consolidatedList.addAll(list);
        }
        delegate.write(consolidatedList);
    }
}
Now the write operation takes a lot of time, especially when there are a lot of items in consolidatedList.
One option is to add some chunking logic, where I split the list into chunks and hand each chunk to the delegate, as shown below:
public class UserAuditFactListUnwrapWriter implements ItemWriter<List<UserAuditFact>> {

    private final ItemWriter<UserAuditFact> delegate;

    @Setter
    private int chunkSize = 0; // e.g. a chunk size of 200

    @Override
    public void write(List<? extends List<UserAuditFact>> lists) throws Exception {
        final List<UserAuditFact> consolidatedList = new ArrayList<>();
        for (final List<UserAuditFact> list : lists) {
            consolidatedList.addAll(list);
        }
        List<List<UserAuditFact>> partitions = ListUtils.partition(consolidatedList, chunkSize);
        for (List<UserAuditFact> partition : partitions) {
            delegate.write(partition);
        }
    }
}
However, this too does not give me the needed performance: 60,000 records (300 partitions of 200 records each) still take a long time to write.
I was wondering if I can improve this further (maybe some way to have the partitions written in parallel).
Some additional info: the database is AWS RDS PostgreSQL.
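For what it's worth, one direction for the parallel idea above (purely a sketch, not from the original post) is to hand each partition to a small thread pool inside write(...) instead of the sequential loop. Note that the delegate must be thread-safe, and these writes then run on their own connections, outside the step's chunk transaction:
// Replaces the sequential "for (partition : partitions) delegate.write(partition)" loop.
// The pool size of 4 is arbitrary; tune it against what the database can handle.
ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<?>> futures = new ArrayList<>();
for (List<UserAuditFact> partition : partitions) {
    futures.add(executor.submit(() -> {
        delegate.write(partition);
        return null;
    }));
}
for (Future<?> future : futures) {
    future.get(); // propagates any write failure
}
executor.shutdown();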
Is there a way to query for multiple values of the same property with Spring Data REST, JPA and Querydsl? I am not sure what the format of the query URL should be, and whether I need extra customization in my bindings; I couldn't find anything in the documentation. If I have a "student" table in my database with a "major" column and a corresponding Student entity, I would assume that querying for all students who have "math" and "science" majors would look like http://localhost:8080/students?major=math&major=science. However, in this query only the first part is taken into account and major=science is ignored.
The example below customizes the Querydsl web support to perform a collection in operation: the URI /students?major=sword&major=magic searches for students with major in ["sword", "magic"].
Entity and repository
@Entity
public class Student {

    @Id
    @GeneratedValue
    private Long id;

    private String name;
    private String major;

    // constructors, getters and setters omitted
}
public interface StudentRepos extends PagingAndSortingRepository<Student, Long>,
        QuerydslPredicateExecutor<Student>,
        QuerydslBinderCustomizer<QStudent> {

    @Override
    default void customize(QuerydslBindings bindings, QStudent root) {
        bindings.bind(root.major)
                .all((path, value) -> Optional.of(path.in(value)));
    }
}
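With that binding, the multi-valued request parameter is collected into an in predicate. For intuition only, calling the repository directly with an equivalent hand-built predicate would look like this (QStudent is the Querydsl-generated query type):
// Equivalent of GET /students?major=sword&major=magic under the binding above
Predicate predicate = QStudent.student.major.in("sword", "magic");
Iterable<Student> result = studentRepos.findAll(predicate);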
Test data
new Student("Arthur", "sword");
new Student("Merlin", "magic");
new Student("Lancelot", "lance");
Controller
@RestController
@RequestMapping("/students")
@RequiredArgsConstructor
public class StudentController {

    private final StudentRepos studentRepos;

    @GetMapping
    ResponseEntity<List<Student>> getAll(Predicate predicate) {
        Iterable<Student> students = studentRepos.findAll(predicate);
        return ResponseEntity.ok(StreamSupport.stream(students.spliterator(), false)
                .collect(Collectors.toList()));
    }
}
Test case
@Test
@SneakyThrows
public void queryAll() {
    mockMvc.perform(get("/students"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$").isArray())
            .andExpect(jsonPath("$", hasSize(3)))
            .andDo(print());
}
@Test
@SneakyThrows
void querySingleValue() {
    mockMvc.perform(get("/students?major=sword"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$").isArray())
            .andExpect(jsonPath("$", hasSize(1)))
            .andExpect(jsonPath("$[0].name").value("Arthur"))
            .andExpect(jsonPath("$[0].major").value("sword"))
            .andDo(print());
}
@Test
@SneakyThrows
void queryMultiValue() {
    mockMvc.perform(get("/students?major=sword&major=magic"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$").isArray())
            .andExpect(jsonPath("$", hasSize(2)))
            .andExpect(jsonPath("$[0].name").value("Arthur"))
            .andExpect(jsonPath("$[0].major").value("sword"))
            .andExpect(jsonPath("$[1].name").value("Merlin"))
            .andExpect(jsonPath("$[1].major").value("magic"))
            .andDo(print());
}
The full Spring Boot application is on GitHub.
So I'm new to Spring and I'm basically trying to make a REST service for the first time. Some of the data I'd like to return comes from a properties file.
This is my configuration bean:
@Configuration
@PropertySource("classpath:client.properties")
public class PropertyConfig {

    @Bean
    public static PropertySourcesPlaceholderConfigurer propertySourcesPlaceholderConfigurer() {
        return new PropertySourcesPlaceholderConfigurer();
    }
}
This is the class containing the info I want to return from the API. When I hover over the values, I can see that the property is being injected.
public class ProviderInfo {

    @Value("${op.iss}") private String issuer;
    @Value("${op.jwks_uri}") private String jwksURI;
    @Value("${op.authz_uri}") private String authzURI;
    @Value("${op.token_uri}") private String tokenURI;
    @Value("${op.userinfo_uri}") private String userInfoURI;

    // Getter methods
}
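For context, the client.properties file on the classpath needs entries for exactly those keys; the values below are only placeholders:
op.iss=https://provider.example.com/
op.jwks_uri=https://provider.example.com/jwks
op.authz_uri=https://provider.example.com/authorize
op.token_uri=https://provider.example.com/token
op.userinfo_uri=https://provider.example.com/userinfo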
And this is the RestController
@RestController
public class ProviderInfoController {

    @RequestMapping(value = "/provider-info", method = RequestMethod.GET)
    public ProviderInfo providerInfo() {
        return new ProviderInfo();
    }
}
When I navigate to that endpoint, everything is null:
{"issuer":null,"jwksURI":null,"authzURI":null,"tokenURI":null,"userInfoURI":null}
Can anybody see what I'm doing wrong? Or if there is a better way to accomplish this in general?
Thanks!
The processing of the @Value annotations is done by Spring, so you need to get the ProviderInfo instance from Spring for the values to actually be set.
@RestController
public class ProviderInfoController {

    @Autowired
    private ProviderInfo providerInfo;

    @RequestMapping(value = "/provider-info", method = RequestMethod.GET)
    public ProviderInfo providerInfo() {
        return providerInfo;
    }
}
This also requires that Spring picks up and processes the ProviderInfo class.
Also, you need to add the ProviderInfo class to the Spring bean life cycle using either @Component or @Service, as follows:
@Component
public class ProviderInfo {

    @Value("${op.iss}") private String issuer;
    @Value("${op.jwks_uri}") private String jwksURI;
    @Value("${op.authz_uri}") private String authzURI;
    @Value("${op.token_uri}") private String tokenURI;
    @Value("${op.userinfo_uri}") private String userInfoURI;

    // Getter methods
}
Only then can you use @Autowired for ProviderInfo inside the ProviderInfoController class.
Based on my research, I know that Spring Batch provides APIs for handling many different kinds of data file formats.
But I need clarification on how to supply multiple files of different formats in one chunk / Tasklet.
I know that MultiResourceItemReader can process multiple files, but AFAIK all the files have to be of the same format and data structure.
So, the question is: how can we supply multiple files of different data formats as input to a Tasklet?
Asoub is right: there is no out-of-the-box Spring Batch reader that "reads it all". However, with just a handful of fairly simple and straightforward classes you can make a Java-config Spring Batch application that goes through different files with different file formats.
For one of my applications I had a similar use case, and I wrote a handful of straightforward implementations and extensions of the Spring Batch framework to create what I call a "generic" reader. So, to answer your question: below is the code I used to go through different kinds of file formats using Spring Batch. Obviously it is a stripped-down implementation, but it should get you going in the right direction.
One line is represented by a Record:
public class Record {

    private Object[] columns;

    public void setColumnByIndex(Object candidate, int index) {
        columns[index] = candidate;
    }

    public Object getColumnByIndex(int index) {
        return columns[index];
    }

    public void setColumns(Object[] columns) {
        this.columns = columns;
    }
}
Each line contains multiple columns and the columns are separated by a delimiter. It does not matter if file1 contains 10 columns and/or if file2 only contains 3 columns.
The following reader simply maps each line to a record:
@Component
public class GenericReader {

    @Autowired
    private GenericLineMapper genericLineMapper;

    @SuppressWarnings({ "unchecked", "rawtypes" })
    public FlatFileItemReader reader(File file) {
        FlatFileItemReader<Record> reader = new FlatFileItemReader();
        reader.setResource(new FileSystemResource(file));
        reader.setLineMapper((LineMapper) genericLineMapper.defaultLineMapper());
        return reader;
    }
}
The mapper takes a line and converts it to an array of objects:
@Component
public class GenericLineMapper {

    @Autowired
    private ApplicationConfiguration applicationConfiguration;

    @SuppressWarnings({ "unchecked", "rawtypes" })
    public DefaultLineMapper defaultLineMapper() {
        DefaultLineMapper lineMapper = new DefaultLineMapper();
        lineMapper.setLineTokenizer(tokenizer());
        lineMapper.setFieldSetMapper(new CustomFieldSetMapper());
        return lineMapper;
    }

    private DelimitedLineTokenizer tokenizer() {
        DelimitedLineTokenizer tokenize = new DelimitedLineTokenizer();
        tokenize.setDelimiter(Character.toString(applicationConfiguration.getDelimiter()));
        tokenize.setQuoteCharacter(applicationConfiguration.getQuote());
        return tokenize;
    }
}
The "magic" of converting the columns to the record happens in the FieldSetMapper:
@Component
public class CustomFieldSetMapper implements FieldSetMapper<Record> {

    @Override
    public Record mapFieldSet(FieldSet fieldSet) throws BindException {
        Record record = new Record();
        Object[] row = new Object[fieldSet.getValues().length];
        for (int i = 0; i < fieldSet.getValues().length; i++) {
            row[i] = fieldSet.getValues()[i];
        }
        record.setColumns(row);
        return record;
    }
}
Using YAML configuration, the user provides an input directory, a list of file names and, of course, the appropriate delimiter and the character used to quote a column if the column contains the delimiter. The configuration properties class looks like this:
@Component
@ConfigurationProperties
public class ApplicationConfiguration {

    private String inputDir;
    private List<String> fileNames;
    private char delimiter;
    private char quote;

    // getters and setters omitted
}
And then the application.yml:
input-dir: src/main/resources/
file-names: [yourfile1.csv, yourfile2.csv, yourfile3.csv]
delimiter: "|"
quote: "\""
And last but not least, putting it all together:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    private GenericReader genericReader;

    @Autowired
    private NoOpWriter noOpWriter;

    @Autowired
    private ApplicationConfiguration applicationConfiguration;

    @Bean
    public Job yourJobName() {
        List<Step> steps = new ArrayList<>();
        applicationConfiguration.getFileNames().forEach(
                f -> steps.add(loadStep(new File(applicationConfiguration.getInputDir() + f))));
        return jobBuilderFactory.get("yourjobName")
                .start(createParallelFlow(steps))
                .end()
                .build();
    }

    @SuppressWarnings("unchecked")
    public Step loadStep(File file) {
        return stepBuilderFactory.get("step-" + file.getName())
                .<Record, Record> chunk(10)
                .reader(genericReader.reader(file))
                .writer(noOpWriter)
                .build();
    }

    private Flow createParallelFlow(List<Step> steps) {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        // max multithreading = -1, no multithreading = 1, smart size = steps.size()
        taskExecutor.setConcurrencyLimit(1);
        List<Flow> flows = steps.stream()
                .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                .collect(Collectors.toList());
        return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
                .split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }
}
For demonstration purposes you can just put all the classes in one package. The NoOpWriter simply logs the 2nd column of my test files.
@Component
public class NoOpWriter implements ItemWriter<Record> {

    @Override
    public void write(List<? extends Record> items) throws Exception {
        items.forEach(i -> System.out.println(i.getColumnByIndex(1)));
        // NO - OP
    }
}
Good luck :-)
I don't think there is an out-of-the-box Spring Batch reader for multiple input formats.
You'll have to build your own. Of course, you can reuse existing file item readers (such as FlatFileItemReader) as delegates in your custom file reader and, for each file type/format, use the right one.
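As an illustration of that idea (not part of the original answer), a custom reader could hold one pre-configured delegate per file format and switch between them, e.g. by file extension. A minimal sketch, where the delegate map, the extension-based lookup and the Record item type are all assumptions:
// Sketch of a delegating reader: picks a pre-configured delegate per file format.
// Assumes every extension in the input files has a matching delegate in the map,
// and ignores Spring Batch's ItemStream restart handling for brevity.
public class MultiFormatItemReader implements ItemReader<Record> {

    private final Map<String, FlatFileItemReader<Record>> delegatesByExtension;
    private final Iterator<File> files;
    private FlatFileItemReader<Record> current;

    public MultiFormatItemReader(Map<String, FlatFileItemReader<Record>> delegatesByExtension,
                                 List<File> inputFiles) {
        this.delegatesByExtension = delegatesByExtension;
        this.files = inputFiles.iterator();
    }

    @Override
    public Record read() throws Exception {
        while (true) {
            if (current == null) {
                if (!files.hasNext()) {
                    return null; // no more files, end of input
                }
                File next = files.next();
                String extension = next.getName().substring(next.getName().lastIndexOf('.') + 1);
                current = delegatesByExtension.get(extension);
                current.setResource(new FileSystemResource(next));
                current.open(new ExecutionContext());
            }
            Record record = current.read();
            if (record != null) {
                return record;
            }
            current.close(); // current file exhausted, move on to the next one
            current = null;
        }
    }
}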
The query in my reader takes a really long time to fetch results due to multiple table joins. I am considering the option of splitting my query joins, using temp tables if possible. Is this a feasible solution? Can Spring Batch support the use of temp tables between the reader, processor and writer?
Yes, it is possible. You should use the same DataSource instance for your reader, processor and writer.
Example:
@Component
public class DataSourceDao {

    private DataSource dataSource;

    public DataSource getDataSource() {
        return dataSource;
    }

    @Autowired
    public void setDataSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }
}
Reader:
public class MyReader implements ItemReader<POJO_CLASS> {

    @Autowired
    private DataSourceDao dataSource;

    private JdbcCursorItemReader<POJO_CLASS> reader = new JdbcCursorItemReader<>();

    @Override
    public POJO_CLASS read() throws Exception, UnexpectedInputException,
            ParseException, NonTransientResourceException {
        reader.setDataSource(dataSource.getDataSource());
        // Implement your read logic
    }
}
Writer:
public class YourWriter implements ItemWriter<POJO_CLASS> {

    @Autowired
    private DataSourceDao dataSource;

    private JdbcBatchItemWriter<POJO_CLASS> writer = new JdbcBatchItemWriter<>();

    @Override
    public void write(List<? extends POJO_CLASS> items) throws Exception {
        writer.setDataSource(dataSource.getDataSource());
        // <Your logic...>
    }
}
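As a rough illustration of how an intermediate table can fit into a job (this is an assumption, not part of the original answer, and it uses a plain staging table because session-scoped temp tables do not play well with a pooled DataSource): a tasklet step on the same DataSource can populate the table before the chunk-oriented step reads from it.
// Hypothetical tasklet that fills a staging table before the chunk-oriented step runs.
// The table and column names are placeholders.
public class PopulateStagingTableTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;

    public PopulateStagingTableTasklet(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource); // same DataSource as the reader and writer
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        jdbcTemplate.update("DELETE FROM staging_results");
        jdbcTemplate.update(
            "INSERT INTO staging_results (id, joined_value) "
            + "SELECT a.id, b.value FROM table_a a JOIN table_b b ON a.id = b.a_id");
        return RepeatStatus.FINISHED;
    }
}
The reader of the next step can then run a simple SELECT against staging_results instead of the original multi-join query.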