Import data from several CSV files into a database in Spring Batch

What is the best way to import data from several CSV files in Spring Batch? Each CSV file corresponds to one table in the database.
I created one batch configuration class for each table, and every table has its own job and step.
Is there a more elegant way to do this?

There's a variety of ways you could tackle the problem, but the simplest job would look something like:
FlatFileItemReader with a DelimitedLineTokenizer and BeanWrapperFieldSetMapper to read the file
Processor if you need to do any additional validation/filtering/transformation
JdbcBatchItemWriter to insert/update the target table
Here's an example that includes more information around specific dependencies, config, etc. The example uses context file config rather than annotation-based, but it should be sufficient to show you the way.
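For comparison, an annotation-based version of the reader/writer outline above might look roughly like the sketch below. It assumes the Spring Batch 5 builder API (StepBuilder and JobBuilder taking a JobRepository). The Customer and Order classes, the file paths, and the table and column names are hypothetical placeholders, not taken from the question. One private helper builds a step for any CSV/table pair, so a single configuration class and a single job can cover all tables.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CsvImportConfig {

    private final JobRepository jobRepository;
    private final PlatformTransactionManager txManager;
    private final DataSource dataSource;

    public CsvImportConfig(JobRepository jobRepository,
                           PlatformTransactionManager txManager,
                           DataSource dataSource) {
        this.jobRepository = jobRepository;
        this.txManager = txManager;
        this.dataSource = dataSource;
    }

    // One job; each CSV file / table pair becomes one step.
    @Bean
    public Job csvImportJob() {
        return new JobBuilder("csvImportJob", jobRepository)
                .start(importStep("customerStep", "input/customer.csv", Customer.class,
                        "INSERT INTO customer (id, name) VALUES (:id, :name)",
                        "id", "name"))
                .next(importStep("orderStep", "input/orders.csv", Order.class,
                        "INSERT INTO orders (id, customer_id) VALUES (:id, :customerId)",
                        "id", "customerId"))
                .build();
    }

    // Generic step factory: delimited reader -> bean-mapped JDBC batch writer.
    private <T> Step importStep(String stepName, String csvPath, Class<T> type,
                                String insertSql, String... columns) {
        FlatFileItemReader<T> reader = new FlatFileItemReaderBuilder<T>()
                .name(stepName + "Reader")
                .resource(new FileSystemResource(csvPath))
                .linesToSkip(1)          // assumes each file starts with a header row
                .delimited()             // DelimitedLineTokenizer, comma by default
                .names(columns)          // CSV column order, matched to bean properties
                .targetType(type)        // bean-style field set mapping
                .build();

        JdbcBatchItemWriter<T> writer = new JdbcBatchItemWriterBuilder<T>()
                .dataSource(dataSource)
                .sql(insertSql)
                .beanMapped()            // :name placeholders resolved from bean getters
                .build();
        // The writer is not a Spring bean here, so initialize it explicitly.
        writer.afterPropertiesSet();

        return new StepBuilder(stepName, jobRepository)
                .<T, T>chunk(100, txManager)
                .reader(reader)
                .writer(writer)
                .build();
    }

    // Hypothetical row types; replace with classes matching your tables.
    public static class Customer {
        private long id;
        private String name;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static class Order {
        private long id;
        private long customerId;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public long getCustomerId() { return customerId; }
        public void setCustomerId(long customerId) { this.customerId = customerId; }
    }
}
```

Adding another table then means one more row class and one more importStep call, rather than another configuration class and job.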
A more complex solution might be a single job with a partitioned step that scans the input folder for files and, leveraging reference table/schema information, creates a reader/writer step for each file that it finds.
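The folder-scanning part of that idea can be sketched in plain Java. The class below is only an illustration: the name CsvFolderScanner and the convention that the table name equals the lower-cased file name without its extension are assumptions, not something from the question. For the partitioning itself, Spring Batch ships a MultiResourcePartitioner that creates one partition per resource, which may save you from writing this by hand.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CsvFolderScanner {

    private static final String EXTENSION = ".csv";

    /**
     * Finds every *.csv file directly inside inputDir and maps a table name
     * (assumed here to be the lower-cased file name without extension) to the file.
     */
    public static Map<String, Path> tablesToFiles(Path inputDir) throws IOException {
        Map<String, Path> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir, "*" + EXTENSION)) {
            for (Path file : files) {
                String fileName = file.getFileName().toString();
                String table = fileName
                        .substring(0, fileName.length() - EXTENSION.length())
                        .toLowerCase(Locale.ROOT);
                result.put(table, file);
            }
        }
        return result;
    }
}
```

Each entry of the returned map could then drive one reader/writer pair, with the column list for the table looked up from your reference or schema information.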
You also may want to consider what to do with the files once you're done... Delete them? Compress them?

Related

Reading different CSV files using Spring Batch

Hi, I need help reading different CSV files using Spring Batch. The files are of different types, and I can't work out how to read them. Can somebody help me with this issue?
I'm using FlatFileItemReader to read one file. I need to read multiple files.
Example:
I need to read all the files, process them, and insert the results into the database.
The files are of different types, so I would keep it simple and use a different item reader for each type.
If those files can be processed independently, you can process them concurrently in different steps.
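A rough sketch of running independent steps concurrently with a split flow, assuming the Spring Batch 5 builder API. The two steps, customerStep and orderStep, are hypothetical beans assumed to be defined elsewhere, each with its own item reader for its file type.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class ParallelCsvJobConfig {

    @Bean
    public Job parallelImportJob(JobRepository jobRepository,
                                 @Qualifier("customerStep") Step customerStep,
                                 @Qualifier("orderStep") Step orderStep) {
        // Wrap each independent step in its own flow.
        Flow customerFlow = new FlowBuilder<SimpleFlow>("customerFlow")
                .start(customerStep)
                .build();
        Flow orderFlow = new FlowBuilder<SimpleFlow>("orderFlow")
                .start(orderStep)
                .build();

        // The split runs the flows in parallel on the given task executor.
        Flow splitFlow = new FlowBuilder<SimpleFlow>("splitFlow")
                .split(new SimpleAsyncTaskExecutor("csv-import-"))
                .add(customerFlow, orderFlow)
                .build();

        return new JobBuilder("parallelImportJob", jobRepository)
                .start(splitFlow)
                .end()
                .build();
    }
}
```

If the files depend on each other (for example, one table references another), chaining the steps sequentially with next(...) is the safer choice.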

Working with PowerShell and file based DB operations

I have a scenario where a CSV file lists a lot of files that I need to perform operations on. The script needs to cope with being stopped or failing: it should then continue from where it stopped. In a database scenario this would be fairly simple. I would have an "updated" column and set it when the operation for the line has completed. I have looked into whether I could somehow update the CSV on the fly, but I don't think that is possible. I could start using multiple files, but that is not very elegant. Can anyone recommend some kind of simple file-based, DB-like framework, where from PowerShell I could create a new database file (maybe JSON), read from it, and update it on the fly?
If your problem is really so complex that you actually need something like a local database, then consider SQLite, which was built for such scenarios.
In your case, since you process a CSV row by row, I assume storing the info for the current row only will be enough (line number, status, etc.).

Spark Structured Streaming Processing Previous Files

I am implementing the file source in Spark Structured Streaming and want to process the same file name again if the file has been modified (basically, an update to the file). Currently, Spark will not process the same file name again once it has been processed. This seems limited compared to Spark Streaming with DStream. Is there a way to do this? The Spark Structured Streaming documentation doesn't cover this anywhere; it only processes new files with different names.
I believe this is somewhat of an anti pattern, but you may be able to dig through the checkpoint data and remove the entry for that original file.
Try looking for the original file name in the /checkpoint/sources// files and delete the file or entry. That might cause the stream to pick up the file name again. I haven't tried this myself.
If this is a one time manual update, I would just change the file name to something new and drop it in the source directory. This approach won't be maintainable or automated.

Spring batch job to read multiple files and write to multiple tables

I need to create a Spring Batch job that takes multiple files and writes to multiple tables. I tried to use MultiResourceItemWriter, but my files are located in different folders and have no common name. I'm looking for examples using ListItemReader and ListItemWriter. Any references would be very helpful. Thank you.
You can try the example here:
https://bigzidane.wordpress.com/2016/09/12/spring-batch-partitionerreaderprocesorwriterhibernateintellij/
I believe it fits your question. If you have more questions, please let me know.

Write to different sheets in a single Excel file from multiple tables

Can someone please give me a technical design overview of how I should implement this scenario :
I am using Spring Batch to import data from CSV files into different tables. Once they are imported, I run some validations on these tables, and now I need to write all the data from three different tables into three different sheets of a single Excel file. Can someone please help me with how I should use ItemReaders and ItemWriters to solve this problem?
I would implement it as follows. Create the XLS file from your code, or in a first step that is a method invoker, and pass the file through the job parameters.
Step 1 would do chunk reading from table 1, with a custom item writer that uses POI to write to the first sheet.
Step 2 would do chunk reading from table 2, and its item writer would write to the second sheet.
Since you have a single file, you cannot take advantage of Spring Batch performance features such as multithreading and partitioning. It would be better to write to different files with independent tasks.
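A custom item writer along those lines could look like the sketch below. It assumes Apache POI and the Spring Batch 5 ItemWriter signature (write(Chunk)). The class name SheetItemWriter and the choice of String[] as the row type are hypothetical. The workbook is assumed to be created once, shared by the writers of all steps, and saved with workbook.write(OutputStream) after the last step, for example in a final tasklet.

```java
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

// Appends each item as one row of a dedicated sheet in a shared workbook.
public class SheetItemWriter implements ItemWriter<String[]> {

    private final Sheet sheet;
    private int nextRow = 0;

    public SheetItemWriter(Workbook workbook, String sheetName) {
        // One writer instance per step, each owning its own sheet.
        this.sheet = workbook.createSheet(sheetName);
    }

    @Override
    public void write(Chunk<? extends String[]> chunk) {
        for (String[] values : chunk) {
            Row row = sheet.createRow(nextRow++);
            for (int col = 0; col < values.length; col++) {
                row.createCell(col).setCellValue(values[col]);
            }
        }
    }
}
```

Because the workbook lives in memory and the row counter is not stored in the execution context, this sketch is neither restartable nor safe for multithreaded steps, which matches the limitation described above.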