I have a use case where I need to read two CSV files that share one unique column and create one output CSV. Is there any out-of-the-box solution for this in Spring Batch?
Related
I need to do a BULK INSERT into a PostgreSQL table in Apache NiFi without specifying a physical CSV file in the COPY command. I cannot store the CSV files on disk and would like to do the BULK INSERT using a flow file that comes from previous processors and is already in CSV format (or I can change it to JSON, that's not an issue).
Please advise: what is the best way to do this in Apache NiFi?
PostgreSQL's COPY operation requires a path to a file. I'd recommend looking at PutDatabaseRecord, which generates a PreparedStatement based on your CSV data and executes the statement in a single batch (unless Maximum Batch Size is set).
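For example (the property names are PutDatabaseRecord's standard ones, but the exact values are assumptions about your flow): configure PutDatabaseRecord with a CSVReader controller service as the Record Reader, Statement Type set to INSERT, your PostgreSQL DBCPConnectionPool as the Database Connection Pooling Service, and the target Table Name; leaving Maximum Batch Size at its default sends everything in one batch.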
I am working on a Spring Boot + Spring Batch example where I want to read employee data from an Oracle datasource and department data from a CSV file, and load them into MongoDB, as the Employee schema holds embedded Department details.
I am not sure whether AbstractItemStreamItemReader is the right choice here?
An item reader is designed to read from a single source; you can't read from two sources at the same time unless you are ready to write a custom reader. What you can do is create a staging table into which you load the content of the file, then join the data between the employee and department tables and write the result to MongoDB.
A similar question but for two files here: Combine rows from 2 files and write to DB using Spring Batch.
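A rough sketch of the second step of that approach in Java config, assuming the CSV has already been loaded into a DEPARTMENT_STAGING table by an ordinary FlatFileItemReader/JdbcBatchItemWriter step (the EmployeeDocument type, the table/column names and the row mapper are illustrative assumptions, not anything prescribed by Spring Batch):

@Bean
public JdbcCursorItemReader<EmployeeDocument> employeeJoinReader(DataSource dataSource) {
    JdbcCursorItemReader<EmployeeDocument> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(dataSource);
    reader.setSql("SELECT e.ID, e.NAME, d.ID AS DEPT_ID, d.NAME AS DEPT_NAME "
            + "FROM EMPLOYEE e JOIN DEPARTMENT_STAGING d ON e.DEPT_ID = d.ID");
    // Maps each joined row to a document with an embedded Department
    reader.setRowMapper(new EmployeeDocumentRowMapper());
    return reader;
}

@Bean
public MongoItemWriter<EmployeeDocument> employeeMongoWriter(MongoTemplate mongoTemplate) {
    MongoItemWriter<EmployeeDocument> writer = new MongoItemWriter<>();
    writer.setTemplate(mongoTemplate);
    writer.setCollection("employees");
    return writer;
}

@Bean
public Step loadEmployeesStep(StepBuilderFactory steps,
                              JdbcCursorItemReader<EmployeeDocument> reader,
                              MongoItemWriter<EmployeeDocument> writer) {
    return steps.get("loadEmployeesStep")
            .<EmployeeDocument, EmployeeDocument>chunk(100)
            .reader(reader)
            .writer(writer)
            .build();
}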
I have multiple files stored in HDFS, and I need to merge them into one file using Spark. However, because this operation is done frequently (every hour), I need to append those multiple files to the source file.
I found that FileUtil provides the 'copyMerge' function, but it doesn't allow appending two files.
Thank you for your help
You can do this in two ways:
sc.textFile("path/source,path/file1,path/file2").coalesce(1).saveAsTextFile("path/newSource")
Or, as @Pushkr has proposed:
new UnionRDD(sc, Seq(sc.textFile("path/source"), sc.textFile("path/file1"), ...)).coalesce(1).saveAsTextFile("path/newSource")
If you don't want to create a new source and instead want to overwrite the same source every hour, you can use a DataFrame with save mode Overwrite (see How to overwrite the output directory in Spark).
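A rough Java sketch of the DataFrame variant (the paths are illustrative; note that Spark can't reliably overwrite a directory it is still reading from in the same job, so here the merged result goes to a separate target directory):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("hourly-merge").getOrCreate();

// Read the existing source plus the new files and merge them into a single partition
Dataset<Row> merged = spark.read()
        .text("path/source", "path/file1", "path/file2")
        .coalesce(1);

// Overwrite the target directory instead of creating a new directory every hour
merged.write().mode(SaveMode.Overwrite).text("path/merged");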
What is the best way to import data from a few CSV files in Spring Batch? I mean, each CSV file corresponds to one table in the database.
I created one batch configuration class for each table, and every table has its own job and step.
Is there any way to do this more elegantly?
There's a variety of ways you could tackle the problem, but the simplest job would look something like:
A FlatFileItemReader with a DelimitedLineTokenizer and BeanWrapperFieldSetMapper to read the file
A processor, if you need to do any additional validation/filtering/transformation
A JdbcBatchItemWriter to insert/update the target table (a sketch of the reader/writer pair follows below)
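A minimal sketch of that reader/writer pair in Java config (the Person type, people.csv and the PERSON table are illustrative assumptions; the beans would live in a @Configuration class):

@Bean
public FlatFileItemReader<Person> reader() {
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames("firstName", "lastName");

    BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Person.class);

    DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);

    FlatFileItemReader<Person> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("people.csv"));
    reader.setLineMapper(lineMapper);
    return reader;
}

@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource);
    writer.setSql("INSERT INTO PERSON (FIRST_NAME, LAST_NAME) VALUES (:firstName, :lastName)");
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
    return writer;
}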
Here's an example that includes more information around specific dependencies, config, etc. The example uses context file config rather than annotation-based, but it should be sufficient to show you the way.
A more complex solution might be a single job with a partitioned step that scans the input folder for files and, leveraging reference table/schema information, creates a reader/writer step for each file that it finds.
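A rough sketch of that partitioned variant (the folder pattern, grid size and the workerStep bean are illustrative assumptions; MultiResourcePartitioner exposes each file to the worker step under the fileName key):

@Bean
public Partitioner filePartitioner() throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(
            new PathMatchingResourcePatternResolver().getResources("file:input/*.csv"));
    return partitioner;
}

@Bean
public Step masterStep(StepBuilderFactory steps, Partitioner filePartitioner, Step workerStep) {
    return steps.get("masterStep")
            .partitioner("workerStep", filePartitioner)
            .step(workerStep)
            .gridSize(4)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}

@Bean
@StepScope
public FlatFileItemReader<Person> workerReader(
        @Value("#{stepExecutionContext['fileName']}") Resource file) {
    BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Person.class);

    // Each partition gets its own file to read
    return new FlatFileItemReaderBuilder<Person>()
            .name("workerReader")
            .resource(file)
            .delimited()
            .names("firstName", "lastName")
            .fieldSetMapper(fieldSetMapper)
            .build();
}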
You also may want to consider what to do with the files once you're done... Delete them? Compress them?
Can someone please give me a technical design overview of how I should implement this scenario:
I am using Spring Batch to import data from CSV files into different tables, and once they are imported I run some validations on these tables. Now I need to write all the data from three different tables into three different sheets of a single Excel file. Can someone please help me with how I should use ItemReaders and ItemWriters to solve this problem?
If I were asked, I would implement it as follows: create the XLS file from your code, or in a first step that acts as a method invoker and creates the file, and pass the file as a job parameter.
The next step would do chunk-oriented reading from table 1, and for the writer I would use a custom ItemWriter that uses POI to write to the first sheet.
The following step would do the same for table 2, writing to the second sheet. A sketch of such a writer is shown below.
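A minimal sketch of such a POI-based writer (the RowData type, column mapping and file handling are illustrative assumptions; this uses the List-based ItemWriter signature from Spring Batch 4):

public class ExcelSheetItemWriter implements ItemWriter<RowData>, ItemStream {

    private final File outputFile;
    private final String sheetName;
    private Workbook workbook;
    private Sheet sheet;
    private int currentRow;

    public ExcelSheetItemWriter(File outputFile, String sheetName) {
        this.outputFile = outputFile;
        this.sheetName = sheetName;
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        try {
            // Reuse the workbook created by an earlier step so every step adds its own sheet
            workbook = outputFile.exists()
                    ? WorkbookFactory.create(new FileInputStream(outputFile))
                    : new XSSFWorkbook();
            sheet = workbook.createSheet(sheetName);
            currentRow = 0;
        } catch (Exception e) {
            throw new ItemStreamException("Could not open workbook", e);
        }
    }

    @Override
    public void write(List<? extends RowData> items) {
        for (RowData item : items) {
            Row row = sheet.createRow(currentRow++);
            row.createCell(0).setCellValue(item.getId());
            row.createCell(1).setCellValue(item.getName());
        }
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // nothing to checkpoint in this simple sketch
    }

    @Override
    public void close() throws ItemStreamException {
        try (FileOutputStream out = new FileOutputStream(outputFile)) {
            workbook.write(out);
            workbook.close();
        } catch (IOException e) {
            throw new ItemStreamException("Could not write workbook", e);
        }
    }
}

The other steps would reuse the same writer class, each with its own table reader and sheet name.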
Since you have a single file, you can never get the performance advantages of Spring Batch such as multithreading, partitioning, etc. It would be better to write to different files with independent tasks.