I'm working on a batch using Spring Batch with one reader, one writer, and one processor. I have one CSV file as the input to my reader.
I wanted to use OpenCSV to convert one line to one bean, but what I see from the documentation is that OpenCSV takes a whole file and uses the CsvToBeanBuilder object to map all the lines of that file to a list of objects.
I saw this post: Configuring openCSV instead of FlatFileItemReader in spring batch step
but there is no explanation of how to map one String line to a bean object using OpenCSV. Does someone know if it's possible? Thanks.
The explanation is in the comments. OpenCSV does the reading and the mapping. If you want to use OpenCSV in your Spring Batch app with a FlatFileItemReader, you only need the mapping part, i.e. a LineMapper implementation based on OpenCSV.
Now if OpenCSV does not provide a way to map a single line to a POJO, then it is probably not suitable to be used in Spring Batch in that way. In that case, you need to implement a custom ItemReader based on OpenCSV that does the reading and the mapping.
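For illustration, here is a minimal sketch of what such an OpenCSV-based LineMapper could look like. It assumes the target bean is annotated with OpenCSV's @CsvBindByPosition (a single line carries no header to bind column names against); the class name and error handling are purely illustrative.

```java
import java.io.StringReader;
import java.util.List;

import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import org.springframework.batch.item.file.LineMapper;

// Maps a single CSV line to a bean by wrapping the line in a StringReader
// and delegating the parsing and mapping to OpenCSV.
public class OpenCsvLineMapper<T> implements LineMapper<T> {

    private final Class<T> targetType;

    public OpenCsvLineMapper(Class<T> targetType) {
        this.targetType = targetType;
    }

    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        CsvToBean<T> csvToBean = new CsvToBeanBuilder<T>(new StringReader(line))
                .withType(targetType)
                .build();
        List<T> beans = csvToBean.parse();
        if (beans.isEmpty()) {
            throw new IllegalStateException("Could not map line " + lineNumber + ": " + line);
        }
        return beans.get(0);
    }
}
```

This keeps the FlatFileItemReader in charge of reading (and of restartability), while OpenCSV only does the per-line mapping.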
Is it possible to generate random Avro data for a specified schema using the org.apache.avro library?
I need to produce this data to Kafka.
I tried to find some kind of random data generator for tests; however, I only stumbled upon tools for such data generation or GenericRecord usage. The tools are not very suitable for me as there is a specific file dependency (like reading the file and so on), and GenericRecords should be generated one by one, as I've understood.
Are there any other solutions for Java/Scala?
UPDATE: I have found this class, but it does not seem to be accessible from org.apache.avro version 1.8.2.
The reason you need to read a file is that it provides a Schema, which defines the fields that need to be created and their types.
That is not a hard requirement, and there would be nothing preventing the creation of random Generic or Specific Records built in code via Avro's SchemaBuilder class.
See this repo for an example; it generates a Java class (POJO) from an AVSC schema, which, again, could be done with SchemaBuilder instead.
Even the class you linked to uses a schema file.
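For example, a schema and matching records can be built entirely in code; the record name, fields, and random values below are made up purely for illustration.

```java
import java.util.Random;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class RandomAvroData {

    public static void main(String[] args) {
        // Build the schema in code instead of reading an .avsc file
        Schema schema = SchemaBuilder.record("User")
                .fields()
                .requiredString("name")
                .requiredInt("age")
                .endRecord();

        Random random = new Random();

        // Fill a GenericRecord with random values matching that schema
        GenericRecord record = new GenericData.Record(schema);
        record.put("name", "user-" + random.nextInt(1000));
        record.put("age", random.nextInt(100));

        System.out.println(record);
    }
}
```

Records generated this way can then be produced to Kafka one by one.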
So I personally would probably use Avro4s (https://github.com/sksamuel/avro4s) in conjunction with ScalaCheck's (https://www.scalacheck.org) Gen to model such tests.
You could use ScalaCheck to generate random instances of case classes and Avro4s to convert them to generic records, extract their schema, etc.
There's also avro-mocker https://github.com/speedment/avro-mocker though I don't know how easy it is to hook into the code.
I'd just use Podam (http://mtedone.github.io/podam/) to generate POJOs and then output them to Avro using the Java Avro library (https://avro.apache.org/docs/1.8.1/gettingstartedjava.html#Serializing).
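A rough sketch of that combination follows; the User POJO is made up, and the reflection-based writer is just one way to serialize an arbitrary POJO (the linked getting-started guide uses classes generated from a schema instead).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectDatumWriter;

import uk.co.jemos.podam.api.PodamFactory;
import uk.co.jemos.podam.api.PodamFactoryImpl;

public class PodamToAvro {

    // Hypothetical POJO; in practice this would be your own domain class.
    public static class User {
        private String name;
        private int age;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    public static void main(String[] args) throws IOException {
        // Podam fills the POJO with random data
        PodamFactory factory = new PodamFactoryImpl();
        User user = factory.manufacturePojo(User.class);

        // Serialize the POJO to Avro binary using reflection-based writing
        ReflectDatumWriter<User> writer = new ReflectDatumWriter<>(User.class);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(user, encoder);
        encoder.flush();

        System.out.println("Serialized " + out.size() + " bytes for " + user.getName());
    }
}
```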
I have a requirement where the CSV input file is created dynamically, and hence specifying a mapper class is not possible.
Is there a way to avoid setting the class and still be able to read and write in Spring Batch?
BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper();
fieldSetMapper.setTargetType(Target.class);//I want to avoid this.
Additional information:
1. Run some logic and create the CSV (comma delimited).
2. Columns are in order, and I store that information statically in a properties file (c1,c2,c3), which I also use to pass to lineTokenizer.setNames(properties.get(jobName.columnValues)).
3. The same code executes for different jobNames, and all the information required is fetched from the properties.
4. Now the problem: for the FieldSetMapper:
Class classInstance = Class.forName(getClassProperty(jobName));
fieldSetMapper.setTargetType(classInstance);
For point 4 I have to maintain a class for each job, which I want to avoid.
Alternatively, the question is: I have a requirement where I am not sure how many fields will be in the input file.
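For context, here is a minimal sketch of the setup described in the points above, with illustrative names; it is the arrangement where the target type still has to be resolved per job via Class.forName, i.e. the part the question wants to avoid.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.core.io.FileSystemResource;

public class DynamicCsvReaderFactory {

    // Column names come from the properties file; the target type is resolved
    // per job name, which forces one class to be maintained per job.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static FlatFileItemReader<Object> createReader(String csvPath,
                                                          String[] columnNames,
                                                          String targetClassName) throws ClassNotFoundException {
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setNames(columnNames);

        BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper();
        fieldSetMapper.setTargetType(Class.forName(targetClassName));

        DefaultLineMapper lineMapper = new DefaultLineMapper();
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<Object> reader = new FlatFileItemReader<>();
        reader.setName("dynamicCsvReader");
        reader.setResource(new FileSystemResource(csvPath));
        reader.setLineMapper(lineMapper);
        return reader;
    }
}
```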
I am just wondering whether it is possible to use a OneToManyResultSetExtractor or a ResultSetExtractor with Spring Batch's JdbcCursorItemReader.
The issue I have is that the expected RowMapper only deals with one object per row, and I have a join SQL query that returns many rows per object.
Out of the box, it does not support the use of a ResultSetExtractor. The reason for this is that the wrapping ItemReader is stateful and needs to be able to keep track of how many rows have been consumed (it wouldn't know otherwise). The way that type of functionality is typically done in Spring Batch is by using an ItemProcessor to enrich the object. Your ItemReader would return the one (of the one to many) and then the ItemProcessor would enrich the object with the many. This is a common pattern in batch processing called the driving query pattern. You can read more about it in the Spring Batch documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/patterns.html
That being said, you could also wrap the JdbcCursorItemReader with your own implementation that performs the logic of aggregation for you.
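As an illustration of the enrichment half of that driving query pattern, here is a minimal sketch; the Order type, table, and column names are made up.

```java
import java.util.List;

import javax.sql.DataSource;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical "one" side of the one-to-many relationship.
class Order {
    long id;
    List<String> lines;
}

// Enriches each Order (returned by the reader's driving query, one row per item)
// with its "many" side by running a second query per item.
public class OrderEnrichmentProcessor implements ItemProcessor<Order, Order> {

    private final JdbcTemplate jdbcTemplate;

    public OrderEnrichmentProcessor(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public Order process(Order order) {
        order.lines = jdbcTemplate.queryForList(
                "SELECT description FROM order_line WHERE order_id = ?",
                String.class,
                order.id);
        return order;
    }
}
```

The JdbcCursorItemReader's RowMapper then stays one-object-per-row for the driving query, and the processor fills in the one-to-many part.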
I have a requirement to implement in Spring Batch: I need to read from a file and from a DB; the data needs to be processed and written to an email.
I have gone through the Spring Batch documentation but was unable to find a chunk tasklet which would read data from multiple readers.
So essentially I have to read from 2 different sources of data (one from a file and another from a DB; each will need to have its own mapper).
Regards
Tar
I see two options depending on how the data is structured:
1. Spring Batch relies heavily on composition when building batch components. One option would be to create a custom composite ItemReader (sketched below) that delegates to other readers (ones Spring Batch provides or otherwise) and provides the logic to assemble a single object based on the results of those delegated ItemReaders.
2. You could use an ItemReader to provide the base information (say from a database) and use an ItemProcessor to enrich the item (say reading from a file).
Either of the above is a normal way to handle this type of input scenario.
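A minimal sketch of option 1, with made-up item types; it assumes both sources yield matching items in the same order, which is the simplest case.

```java
import org.springframework.batch.item.ItemReader;

// Hypothetical item types; in a real job these would be your domain objects.
class FileRecord { }
class DbRecord { }
class CombinedRecord {
    final FileRecord fromFile;
    final DbRecord fromDb;
    CombinedRecord(FileRecord fromFile, DbRecord fromDb) {
        this.fromFile = fromFile;
        this.fromDb = fromDb;
    }
}

// A composite reader that pulls one item from each delegate per read() call
// and assembles them into a single combined object.
public class CompositeItemReader implements ItemReader<CombinedRecord> {

    private final ItemReader<FileRecord> fileReader;
    private final ItemReader<DbRecord> dbReader;

    public CompositeItemReader(ItemReader<FileRecord> fileReader, ItemReader<DbRecord> dbReader) {
        this.fileReader = fileReader;
        this.dbReader = dbReader;
    }

    @Override
    public CombinedRecord read() throws Exception {
        FileRecord fromFile = fileReader.read();
        DbRecord fromDb = dbReader.read();
        if (fromFile == null && fromDb == null) {
            return null; // both sources exhausted -> end of the step
        }
        return new CombinedRecord(fromFile, fromDb);
    }
}
```

If the delegates are ItemStreams (for example a FlatFileItemReader), they should still be registered as streams on the step so they are opened and closed properly.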
I need to access an object in both itemProcessor and itemWriter but I don't want to persist it in the executionContext. I would read this object in a pre-processing step.
What is the best way to do that?
So far what I have is this: I put the object in the jobExecutionContext, then I set the scope of my itemProcessor to "step" and bind a property of the itemProcessor to "#{stepExecution.jobExecution.executionContext}". This does give me access to my object, but I am stuck on two issues with this solution:
1. When do I remove the object from the context so that it doesn't stay persisted? It has to be after all the items are done.
2. My object could be huge, and it seems the column for the context is of size 2500.
Is this a good solution, and if it is, how do I solve the two concerns mentioned above? And if not, is there a good way to do this in Spring Batch, or is caching the best way to go?
Thanks.
The execution/job/step contexts used by Spring Batch are meant to be persisted in Spring Batch's metadata, to support the restartability feature, to name one!
What I have done previously is create a normal Spring bean holding the object you need and simply @Autowired it into your processor and writer!
Job Done.
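A minimal sketch of that approach, with illustrative names: the holder is a plain singleton bean that is populated in the pre-processing step and injected wherever it is needed.

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Plain singleton bean holding the shared object in memory,
// so nothing is written to the execution context tables.
@Component
public class SharedDataHolder {

    private Object data;

    public Object getData() {
        return data;
    }

    public void setData(Object data) {
        this.data = data;
    }
}

// Example consumer: the processor simply has the holder injected.
@Component
class SharedDataAwareProcessor implements ItemProcessor<String, String> {

    @Autowired
    private SharedDataHolder sharedDataHolder;

    @Override
    public String process(String item) {
        // Use the shared object without touching the execution context
        Object shared = sharedDataHolder.getData();
        return item + " (processed with " + shared + ")";
    }
}
```

The trade-off is that, unlike the execution context, an in-memory bean is not persisted, so its content will not survive a restart of a failed job.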