As per the Spring Batch docs, using MultiResourceItemReader is not recommended because of restart issues; instead, the docs recommend using one file in each folder:
"It should be noted that, as with any ItemReader, adding extra input
(in this case a file) could cause potential issues when restarting. It
is recommended that batch jobs work with their own individual
directories until completed successfully."
If I have a folder with the following structure: dest/<timestamp>/file1.txt, file2.txt,
how do I configure a FlatFileItemReader to read files matching a pattern in each folder under a path?
I would prefer the Spring Integration project for reading files from a directory, since polling a directory is not the Spring Batch framework's business.
In the most basic scenario, Spring Integration polls the directory and, for each file it finds, launches a job with the filename as a parameter. This keeps the file-polling logic out of your batch jobs.
I suggest the excellent article by Dave Syer on the basic concepts of integrating these two technologies. Take a close look at the sections dealing with FileToJobLaunchRequestAdapter.
The source code of that adapter will also help you understand the internals.
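Not that adapter verbatim, but a minimal sketch of the same pattern using the Spring Batch Integration classes (JobLaunchRequest, JobLaunchingGateway) and the Spring Integration Java DSL; the directory, poll interval, job bean, and parameter name are assumptions:

    import java.io.File;

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.batch.integration.launch.JobLaunchRequest;
    import org.springframework.batch.integration.launch.JobLaunchingGateway;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.dsl.IntegrationFlow;
    import org.springframework.integration.dsl.IntegrationFlows;
    import org.springframework.integration.dsl.Pollers;
    import org.springframework.integration.file.dsl.Files;

    @Configuration
    public class FilePollingJobLaunchConfig {

        @Bean
        public IntegrationFlow filePollingFlow(Job fileProcessingJob, JobLauncher jobLauncher) {
            return IntegrationFlows
                    // Poll the (assumed) input directory every 10 seconds.
                    .from(Files.inboundAdapter(new File("/data/input")),
                            e -> e.poller(Pollers.fixedDelay(10000)))
                    // Turn each File message into a JobLaunchRequest, passing the
                    // file path as a job parameter so each file is its own job instance.
                    .transform(File.class, file -> new JobLaunchRequest(
                            fileProcessingJob,
                            new JobParametersBuilder()
                                    .addString("input.file", file.getAbsolutePath())
                                    .toJobParameters()))
                    // JobLaunchingGateway starts the job for every request it receives;
                    // the JobExecution reply is discarded via the built-in nullChannel.
                    .handle(new JobLaunchingGateway(jobLauncher))
                    .channel("nullChannel")
                    .get();
        }
    }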
I also had a similar requirement to read multiple text/CSV files, and I achieved it using org.springframework.batch.item.file.MultiResourceItemReader.
A detailed implementation is provided at the link below.
http://parameshk.blogspot.in/2013/11/spring-batch-flat-file-reader-reads.html
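For reference, a minimal sketch of that approach: a MultiResourceItemReader with a FlatFileItemReader delegate, using a wildcard so every dest/<timestamp>/ folder is covered. The paths and the pass-through line mapping are assumptions:

    import java.io.IOException;

    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.MultiResourceItemReader;
    import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.Resource;
    import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

    @Configuration
    public class MultiFileReaderConfig {

        @Bean
        public MultiResourceItemReader<String> multiResourceReader() throws IOException {
            // The wildcard matches file1.txt, file2.txt, ... in every dest/<timestamp>/ folder.
            Resource[] resources = new PathMatchingResourcePatternResolver()
                    .getResources("file:dest/*/file*.txt");

            // Delegate that reads one flat file at a time; each line is passed
            // through as a plain String here for simplicity.
            FlatFileItemReader<String> delegate = new FlatFileItemReader<>();
            delegate.setLineMapper(new PassThroughLineMapper());

            MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
            reader.setResources(resources);
            reader.setDelegate(delegate);
            return reader;
        }
    }

Keep the docs' restart caveat in mind: the resource list is resolved again on restart, so files should not be added to the matched directories while the job can still be restarted.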
Related
I have a Spark/Scala application, and my requirement is to look for a file in a directory, process it, and finally clean up that directory.
Is it possible to do this within the Spark application itself, like:
- Watching for a file in a directory
- Continuing the process when it finds the file
- Cleaning up the directory before ending the app
- Repeating the above for the next new run, and so on...
We currently do this file-watching with an external application, so to remove the dependency on that third-party application we would like to do it within our Spark/Scala application itself.
Is there a feasible solution for a file watcher using just Scala/Spark?
Please guide me.
File streams in Spark Streaming?
https://spark.apache.org/docs/latest/streaming-programming-guide.html#file-streams
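If that fits your case, a minimal sketch with the Java API (the watched path and batch interval are assumptions); note that textFileStream only picks up files created after the app starts, and the directory cleanup would still be your own code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class DirectoryWatcher {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("directory-watcher");
            // Check the watched directory for new files every 30 seconds.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            // Hypothetical path; every file that appears here becomes part of a batch.
            JavaDStream<String> lines = jssc.textFileStream("hdfs:///data/incoming");

            lines.foreachRDD(rdd -> {
                if (!rdd.isEmpty()) {
                    // Process the new file's contents; deleting the processed files
                    // (e.g. with the Hadoop FileSystem API) would also go here.
                    System.out.println("Processed " + rdd.count() + " lines");
                }
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }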
Using JDeveloper to create and manage Oracle Service Bus 12c resources, I am able to export the required resources into a .jar file using JDeveloper's Resources Export Wizard, selecting the needed ones one by one under each project's tree.
What I want to do, though, is find a way to export a .jar file based on a resources list given in a file of a commonly used format (JSON, CSV, etc.), as this can save time for a large number of resources. My first thought was to check whether JDeveloper provides such a way, or to attempt this programmatically, yet my search has not given me any information on how to do it.
Is there an alternative way of doing this?
If you have Oracle OSB 11.1.1.7.0 or higher, you can automate the compilation process for OSB at the project level using configjar; here's a whole example of an implementation which includes compilation using configjar, automating the task by retrieving the code from Git using Jenkins, and a Python script.
You can also do it using ANT; here's a good Oracle document explaining that. (I've tried it, but found configjar easier to use; ANT is the only option for versions below 11.1.1.7.0.)
After setting up either of those compilation methods, you can create a CSV file, parse it with Python, and loop the compilation.
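As an illustration of that last step, a sketch of such a loop (shown in Java with ProcessBuilder; the Python version is analogous). The CSV layout, the configjar.sh location, and the -settingsfile flag are assumptions; check them against your OSB installation:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class ConfigjarBatch {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Assumed CSV layout: one configjar settings-file path per line.
            List<String> settingsFiles = Files.readAllLines(Paths.get("projects.csv"));

            for (String settingsFile : settingsFiles) {
                // Hypothetical install path and flag; adjust to your environment.
                Process p = new ProcessBuilder(
                        "/u01/oracle/osb/tools/configjar/configjar.sh",
                        "-settingsfile", settingsFile.trim())
                        .inheritIO()
                        .start();
                if (p.waitFor() != 0) {
                    System.err.println("configjar failed for " + settingsFile);
                }
            }
        }
    }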
The Analytics for Apache Hadoop documentation lists the following steps for analysing data with Oozie:
Analyzing data with Oozie
1. Install required drivers.
2. Use webHDFS to upload the workflow-related files to HDFS.
   For example, upload the files to /user/biblumix/apps/oozie
...
Source: https://www.ng.bluemix.net/docs/services/AnalyticsforHadoop/index.html
Question: What files are typically uploaded in step 2? The wording suggests that the files are Oozie files (e.g. XML files). However, the link takes you to the section "Upload your data".
I performed some testing, and I had to upload a workflow.xml in addition to the data files that my Oozie job processes.
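For anyone repeating this, a minimal sketch of the step 2 upload via the WebHDFS REST API (the namenode host, port, and user are assumptions; the target path follows the docs' example). WebHDFS creates files in two steps: the namenode answers the first PUT with a 307 redirect to a datanode, and the data goes in a second PUT:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class WebHdfsUpload {
        public static void main(String[] args) throws IOException {
            // Hypothetical namenode host/port; the path matches the docs' example.
            String createUrl = "http://namenode:50070/webhdfs/v1/user/biblumix/apps/oozie/workflow.xml"
                    + "?op=CREATE&overwrite=true&user.name=biblumix";

            // Step 1: the namenode replies with a 307 redirect to a datanode.
            HttpURLConnection nn = (HttpURLConnection) new URL(createUrl).openConnection();
            nn.setRequestMethod("PUT");
            nn.setInstanceFollowRedirects(false);
            String dataNodeUrl = nn.getHeaderField("Location");
            nn.disconnect();

            // Step 2: PUT the file content to the datanode URL.
            HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
            dn.setRequestMethod("PUT");
            dn.setDoOutput(true);
            try (OutputStream out = dn.getOutputStream()) {
                out.write(Files.readAllBytes(Paths.get("workflow.xml")));
            }
            System.out.println("HTTP " + dn.getResponseCode()); // 201 Created on success
            dn.disconnect();
        }
    }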
How do you check whether the content of a file has changed before processing it in a job using the Spring Batch framework? My idea is to compare it against the existing database where I wrote that file's content (the previous content of the file), to avoid processing the file again if its content has not changed. I am new to the Spring Batch framework. Can you give me some ideas or sample code to do that?
See the Spring Integration Documentation.
You can use a file inbound channel adapter, configured with a FileSystemPersistentAcceptOnceFileListFilter. If the modified time on the file changes, the file will be resent to the message channel.
Then use the Spring Batch Integration components (e.g. JobLaunchingGateway) to launch your batch job to process the file.
You need to be careful, though, not to pick up the file while it is still being modified. It's generally better to remove or rename the file after processing, and to have the writer create a temporary file and rename it to the final file name after writing. This avoids the problem of the adapter "seeing" a partially updated file.
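A minimal sketch of that wiring with the Java DSL; the directory, poll interval, and job bean are assumptions, and the in-memory SimpleMetadataStore should be swapped for a persistent ConcurrentMetadataStore so the "already seen" state survives restarts:

    import java.io.File;

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.batch.integration.launch.JobLaunchRequest;
    import org.springframework.batch.integration.launch.JobLaunchingGateway;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.dsl.IntegrationFlow;
    import org.springframework.integration.dsl.IntegrationFlows;
    import org.springframework.integration.dsl.Pollers;
    import org.springframework.integration.file.dsl.Files;
    import org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter;
    import org.springframework.integration.metadata.SimpleMetadataStore;

    @Configuration
    public class ChangedFileJobConfig {

        @Bean
        public IntegrationFlow changedFileFlow(Job fileJob, JobLauncher jobLauncher) {
            // Accepts each file once per (name, modified-time) pair, so a file whose
            // modified time changes is emitted again. In-memory store for brevity.
            FileSystemPersistentAcceptOnceFileListFilter filter =
                    new FileSystemPersistentAcceptOnceFileListFilter(
                            new SimpleMetadataStore(), "file-job-");

            return IntegrationFlows
                    .from(Files.inboundAdapter(new File("/data/in")).filter(filter),
                            e -> e.poller(Pollers.fixedDelay(5000)))
                    .transform(File.class, f -> new JobLaunchRequest(
                            fileJob,
                            new JobParametersBuilder()
                                    .addString("input.file", f.getAbsolutePath())
                                    // New modified time => new job parameters => new job instance.
                                    .addLong("file.modified", f.lastModified())
                                    .toJobParameters()))
                    .handle(new JobLaunchingGateway(jobLauncher))
                    .channel("nullChannel")
                    .get();
        }
    }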
I have a problem with my development workflow and Sphinx. I want to keep the configuration file for Sphinx in version control so it's easier to manage; this makes it easier to link the file to code updates, etc. However, the configuration file is stored in /usr/local/etc.
There are two solutions I can think of: store the file in the repository and move it to the correct folder on deployment, or recompile Sphinx to look for the file in my repository. Someone suggested using a symlink, but that still requires a change on deployment.
Is there an elegant solution in Sphinx I'm missing?
Perhaps have the /usr/local/etc/sphinx.conf file be a script that pulls the actual Sphinx config from the file in your repo.
See http://sphinxsearch.com/docs/current.html#rel098 and scroll down to "General"; you'll see:
"added scripting (shebang syntax) support to config files (example: #!/usr/bin/php in the first line)"