Write to different Sheets in a single excel file from multiple tables - spring-batch

Can someone please give me a technical design overview of how I should implement this scenario :
I am using spring batch to import data from CSV files to different tables and once they are imported I run some validations on these tables and now I need to write all those data from 3 different tables into three different Sheets of a single Excel file. Can someone please help me how I should use ItemReaders and Itemwriters to solve this problem ?

If I'm asked I would implement as follows. create xls file from your code or first step which would be method invoker. which would create the file. and pass the file job parameters.
Step 1/2 would do a chunk reading from table 1 and in the itemwriter I would use the custom Item writer which would use POI and I would write to first sheet.
Step 2 would do a chunk reading from table 2 and in the itemwriter would read second sheet.
Since you have single file you can never achieve the advantage of spring batch performance like multithread, partitioning etc. Rather than its better to write to different file with independent task

Related

Reading Different csv's using springbatch

Hi I need help with reading different csv files using spring batch. Files are different types not able to get idea how to read. can somebody help me with this issue.
I'm using FlatFileItemReader to read one file. I need to read multiple files
example:
i need to read all files process and insert in to db.
Files are of different types, so I would keep it simple use different items readers.
If those files can be processed independently, you can process them concurrently in different steps.

Talend Open Studio Big Data - Iterate and load multiple files in DB

I am new to talend and need guidance on below scenario:
We have set of 10 Json files with different structure/schema and needs to be loaded into 10 different tables in Redshift db.
Is there a way we can write generic script/job which can iterate through each file and load it into database?
For e.g.:
File Name: abc_< date >.json
Table Name: t_abc
File Name: xyz< date >.json
Table Name: t_xyz
and so on..
Thanks in advance
With Talend Enterprise version one can benefit of dynamic schema. However based on my experiences with json-s they are somewhat nested structures usually. So you'd have to figure out how to flatten them, once thats done it becomes a 1:1 load. However with open studio this will not work due to the missing dynamic schema.
Basically what you could do is: write some java code that transforms your JSON into CSV. Use either psql from commandline or if your Talend contains new enough PostgreSQL JDBC driver then invoke the client side \COPY from it to load the data. If your file and the database table column order matches it should work without needing to specify how many columns you have, so its dynamic, but the data newer "flows" through talend.
Really not cool but also theoretically possible solution: If Redshift supports JSON (Postgres does) then one can create a staging table, with 2 columns: filename, content. Once the whole content is in this staging table, INSERT-SELECT SQL could be created that transforms the JSON into tabular format that can be inserted into the final table.
However, with your toolset you probably have no other choice than to load these files with 1 job per file. And I'd suggest 1 dedicated job to each file. They would each look for their own files and triggered / scheduled individually or be part of a bigger job where you scan the folders and trigger the right job for the right file.

Write dataframe to multiple tabs in a excel sheet using Spark

I have been using Spark-excel (https://github.com/crealytics/spark-excel) to write the output to a single sheet of an Excel sheet. However, I am unable to write the output to different sheets (tabs).
Can anyone suggest any alternative?
Thanks,
Sai
I would suggest to split the problem into two phases:
save the data into multiple csv using multiple Spark flows
write an application, that converts multiple csv files to a single excel sheet, using e.g. this Java library: http://poi.apache.org/

Import data from few csv files to database in Spring Batch

What is the best way to import data from few csv files in Spring Batch? I mean one csv file responds to one table in database.
I created one batch configuration class for each table and every table has its own job and step.
Is there any solution to do this in more elegant way?
There's a variety of ways you could tackle the problem, but the simplest job would look something like:
FlatFileItemWriter reader with a DelmitedLineTokenizer and BeanWrapperFieldSetMapper to read the file
Processor if you need to do any additional validation/filtering/transformation
JDBCBatchItemWriter to insert/update the target table
Here's an example that includes more information around specific dependencies, config, etc. The example uses context file config rather than annotation-based, but it should be sufficient to show you the way.
A more complex solution might be a single job with a partitioned step that scans the input folder for files and, leveraging reference table/schema information, creates a reader/writer step for each file that it finds.
You also may want to consider what to do with the files once you're done... Delete them? Compress them?

getting data from DB in spring batch and store in memory

In the spring batch program, I am reading the records from a file and comparing with the DB if the data say column1 from file is already exists in table1.
Table1 is fairly small and static. Is there a way I can get all the data from table1 and store it in memory in the spring batch code? Right now for every record in the file, the select query is hitting the DB.
The file is having 3 columns delimited with "|".
The file I am reading is having on an average 12 million records and it is taking around 5 hours to complete the job.
Preload in memory using a StepExecutionListener.beforeStep (or #BeforeStep).
Using this trick data will be loaded once before step execution.
This also works for step restarting.
I'd use caching like a standard web app. Add service caching using Spring's caching abstractions and that should take care of it IMHO.
Load static table in JobExecutionListener.beforeJob(-) and keep this in jobContext and you can access through multiple steps using 'Late Binding of Job and Step Attributes'.
You may refer 5.4 section of this link http://docs.spring.io/spring-batch/reference/html/configureStep.html