Spring Batch and Executors Framework - spring-batch

Are these two frameworks used for the same purpose? If not, why not, and how are they used in real-world applications? Are there any tutorials to learn them?

Spring Batch is meant for batch processing of files by executing them in a series of jobs. Batch processing could be reading from CSV, XML, or any flat file and writing it to a DB. Spring Batch provides many ready-made classes to read and write CSV, XML, and databases.
http://www.mkyong.com/tutorials/spring-batch-tutorial/
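To give a flavor of those ready-made classes, here is a minimal sketch of a CSV reader built with the FlatFileItemReaderBuilder; the Person bean and its field names are assumptions made up for this example:

    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
    import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.FileSystemResource;

    @Configuration
    public class CsvReaderConfig {

        @Bean
        public FlatFileItemReader<Person> reader() {
            // Person is a hypothetical POJO with firstName/lastName properties.
            return new FlatFileItemReaderBuilder<Person>()
                    .name("personItemReader")
                    .resource(new FileSystemResource("people.csv"))
                    .delimited()
                    .names("firstName", "lastName")
                    .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                        setTargetType(Person.class);
                    }})
                    .build();
        }
    }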
The Java ExecutorService, on the other hand, is all about spawning multiple threads from a thread pool and executing tasks on them for any purpose, batch or otherwise. It gives you low-level control over the threads themselves, but none of the conveniences, such as transaction management, that Spring Batch provides. It has been available since Java 5. There are also several submit methods, depending on whether or not you want to obtain a result from a Future object.
http://tutorials.jenkov.com/java-util-concurrent/executorservice.html
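As a minimal, self-contained sketch of the ExecutorService (pool size and task bodies are arbitrary choices for the example):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ExecutorDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // submit(Runnable) when no result is needed...
            pool.submit(() -> System.out.println("fire and forget"));

            // ...or submit(Callable) and read the result from the Future.
            Future<Integer> result = pool.submit(() -> 21 + 21);
            System.out.println("computed: " + result.get()); // blocks until done

            pool.shutdown(); // accept no new tasks; let submitted ones finish
        }
    }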

Related

JMeter vs ReadyAPI SOAP API Performance Testing

Scenario: read data from a CSV file with an unknown number of records, use the data to create a SOAP XML message, and send it with a POST method. Continue until all the records have been read.
Problem: I used ReadyAPI to perform these actions and was able to achieve the intended TPS at the server with whatever I provided in the VUs option. I tried 150 VUs and observed a constant load of 150 requests at the server. But when I try to do the same in JMeter, I cannot achieve more than 70 TPS, and the load isn't evenly distributed either, no matter how many threads I use. I am using a Thread Group, a CSV Data Set Config, User Defined Parameters to create a unique request ID, and a JSR223 PreProcessor with a Groovy script as a child of the HTTP Request to remove empty XML tags.
I read some posts saying that JMeter throughput stagnates based on the server's response capability, but that is not the case here, since I can generate 150 TPS with ReadyAPI. The annual licensing and renewal costs associated with ReadyAPI are the reason I am looking for a solution with JMeter.
Not only does the server need to respond fast enough; JMeter must be able to send requests fast enough as well.
The default JMeter configuration is suitable for test development, debugging, and creating some (rather limited) load. You need to properly tune your JMeter instance in order to fully utilize your machine's resources, so make sure to follow:
JMeter Best Practices
9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure
If a single machine is not powerful enough to generate the required load, it is possible to run JMeter in distributed mode, using as many load generators as needed to create the necessary number of virtual users/requests per second.
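For reference, a typical non-GUI distributed run is started like this (host and file names are placeholders):

    jmeter -n -t testplan.jmx -R loadgen1,loadgen2 -l results.jtl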

Multithreading in Apache Beam: Reading Files in Separate Threads

We have a requirement to create separate threads for reading multiple files.
Thread 1 could read file 1 and create a PCollection<String>. Can I execute a ParDo operation in a multithreaded environment and create a PCollection<KV<String, String>> from the PCollection<String>?
Thread 2 would complete the same operation as Thread 1, but on a different file, file 2.
The outputs of file 1 and file 2 would be joined in the main thread after the Thread 1 and Thread 2 operations are completed.
Could you please tell me whether this is possible and whether it is a recommended approach?
It sounds like what you want can be done with Beam. In the Beam model, you do not define how you want your operations to run, but rather what operations you want to perform; then Beam and the underlying runner take care of managing threads.
That's why you generally shouldn't manage your own threads to read files in Beam. You should use TextIO to read from plain text files, and the TextIO module should read the files in parallel.
There are a few cases when your files will not be able to be read in parallel:
Your files are compressed. This means the file must be decompressed sequentially as it is read, so it cannot be read from different offsets in parallel.
You have too many files (1000s). If you have thousands or tens of thousands of files, you may want to use TextIO.readAll instead of the normal TextIO implementation, because keeping track of thousands of files that are being read in parallel can overwhelm the system.
Let me know if you are using non-plain-text files or another kind of source.
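As a rough sketch of that recommended approach (the file pattern and class name are placeholders), a single TextIO transform reads all matching files and the runner parallelizes the work:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class ReadFilesInParallel {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            // One glob matches both files; the runner decides how to
            // parallelize the reads, so no hand-managed threads are needed.
            PCollection<String> lines = p.apply(TextIO.read().from("/data/file*.txt"));

            p.run().waitUntilFinish();
        }
    }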

Can Spring Batch be used as a job framework for non-batch jobs (regular jobs)

Is it possible to use Spring Batch as a regular job framework?
I want to create a device service (microservice) whose responsibility is to receive events and trigger jobs on devices. The devices are remote, so it will take time for a job to complete, but it is not a batch job (it does not run periodically or partition a large data set).
I am wondering whether Spring Batch can still be used as a job framework, or if it is only for batch processing. If the answer is no, which job frameworks (besides writing my own) are well known?
Job Description:
I need to execute, against a specific device, a job that contains several steps. Each step communicates with the device and waits for it to confirm that it executed the previous command.
I need retry, recovery, and scheduling features (I thought of combining Spring Batch with Quartz).
Regarding read-process-write: I basically get a command request for a device, do a few DB reads, and then start long waiting periods that all need to pass in order for the job/task to be successful.
Also, I can choose (and justify) a relevant IMDG/DB. Concurrency is out of scope (it will be handled outside the job mechanism). An alternative that came to mind was Akka actors (a job for a device would create child actors as steps).
As far as I know, running periodically or partitioning a large data set are not prerequisites for using Spring Batch.
Spring Batch is basically a read-process-write framework where reading and processing happen item by item and writing happens in chunks (for chunk-oriented processing).
So you can use Spring Batch if your job logic fits into the read-process-write paradigm; the rest of your requirements seem secondary to me.
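To make that concrete, here is a minimal sketch of a chunk-oriented step with the retry feature you asked about; Command, Result, and DeviceTimeoutException are hypothetical types invented for this example, not Spring Batch API:

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class DeviceJobConfig {

        @Bean
        public Step deviceStep(StepBuilderFactory steps,
                               ItemReader<Command> reader,
                               ItemProcessor<Command, Result> processor,
                               ItemWriter<Result> writer) {
            return steps.get("deviceStep")
                    .<Command, Result>chunk(10)   // read/process item by item, write in chunks of 10
                    .reader(reader)
                    .processor(processor)
                    .writer(writer)
                    .faultTolerant()
                    .retryLimit(3)                // retry a failed item up to 3 times
                    .retry(DeviceTimeoutException.class)
                    .build();
        }
    }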
Also, with Spring Batch, you should evaluate the Job Repository part. Spring Batch needs a database (either in-memory or on disk) to store job metadata; it is not optional.
I also think you should explain more about why you need a job framework and what kind of logic you are running that you call a job, and I will revise my answer accordingly.

Spring Batch job to process a dynamic set of files in parallel

I am trying to design a Spring Batch job that processes a dynamic set of files in parallel. When the batch job itself is started, the number of files to process is not known; the files become available dynamically. The job should run and continue to process files in parallel as new files arrive, until it has finished processing all of them.
I have gone through the Spring Batch project page, and from my understanding it looks like a multi-threaded step is suitable for my case. But what I am not sure of is whether it can support dynamic availability of the files to be processed.
Any inputs will be highly appreciated.
Thanks and regards,
Priya
You have a couple options here:
MultiResourceItemReader - This ItemReader wraps a delegate ItemReader, such as the FlatFileItemReader, and loops through the resources provided via an expression.
Partitioning - This option is better for parallel processing of files. Using the MultiResourcePartitioner, you can process files in parallel with all the restartability and other features you'd normally get with Spring Batch.
You can read more about partitioning in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/scalability.html
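As a hedged sketch of the partitioning option (the file pattern is a placeholder), each matching file becomes one partition that a worker step can then process in parallel:

    import java.io.IOException;
    import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

    @Configuration
    public class PartitionConfig {

        @Bean
        public MultiResourcePartitioner partitioner() throws IOException {
            MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
            // One partition per matching file; each partition exposes its file
            // under the "fileName" key of the step execution context.
            partitioner.setResources(new PathMatchingResourcePatternResolver()
                    .getResources("file:/input/*.csv"));
            return partitioner;
        }
    }

Note that the partitioner snapshots the matching files when the step starts, so files arriving later would require launching a new job instance rather than being picked up mid-run.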

Retry failed writing operations without delaying other steps in Spring Batch application

I am maintaining a legacy application written with Spring Batch and need to tweak it so that it never loses data.
I have to read from various web services (one per step) and then write to a remote database. Things go bad when the connection to the DB drops, because all items read from the web service are discarded (the same item can't be read twice) and the data is lost because it cannot be written.
I need to set up Spring Batch so that a step keeps the data it has already read and retries the write operation the next time the step runs. The same step must not read more data until the write operation has completed successfully.
When it is unable to write, the step should keep the read data and pass execution to the next step; after a while, when it is time for the failed step to run again, it should not read another item but retry the failed write operation instead.
The batch application should run in an infinite loop, and each step should gather data from a different source. Failed write operations should be momentarily skipped (keeping the read data) so as not to delay other steps, but should resume from the write operation the next time the step is called.
I have been researching various web sources aside from the official docs, but Spring Batch does not have the most intuitive documentation I have come across.
Can this be achieved? If yes, how?
You can write the data that you need to persist in case the job fails to the batch step's ExecutionContext, and restart the job with this data:
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution, and transaction-related data such as commit and rollback count and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart.
More from: http://static.springsource.org/spring-batch/reference/html/domain.html#domainStepExecution
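For instance, here is a hedged sketch of stashing unwritten items in the ExecutionContext; the "pendingItems" key and the listener wiring are assumptions for the example, and stored values must be serializable so they can be persisted in the job repository:

    import java.util.ArrayList;
    import java.util.List;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.item.ExecutionContext;

    public class PendingItemsSupport {

        // After a failed write: stash the unwritten items so a later
        // execution can retry them instead of re-reading.
        public void stashUnwritten(StepExecution stepExecution, List<String> unwritten) {
            ExecutionContext ctx = stepExecution.getExecutionContext();
            ctx.put("pendingItems", new ArrayList<>(unwritten)); // ArrayList is serializable
        }

        // At the start of the next execution: retry these before reading new items.
        @SuppressWarnings("unchecked")
        public List<String> loadPending(StepExecution stepExecution) {
            ExecutionContext ctx = stepExecution.getExecutionContext();
            return ctx.containsKey("pendingItems")
                    ? (List<String>) ctx.get("pendingItems")
                    : new ArrayList<>();
        }
    }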
I do not know if this will work for you, but here are my thoughts on your configuration.
Since you have two remote sources that are open to failure, let us partition the overall system into two jobs (not two steps).
JOB A
Step 1: Tasklet
Check a shared folder for files. If files exist, do not proceed to the next step. (This will make more sense in the description of JOB B below.)
Step 2: Webservice to files
Read from your web service and write the results to flat files in the shared folder. Since you would be using flat files for the output, you solve your problem of "all items read from the web service are discarded and the data is lost because it cannot be written".
Use Quartz or equivalent for the scheduling of this job.
JOB B
Poll the shared folder for generated files and launch a job with each file (the file path as a job parameter); a sketch of such a launcher follows the step description below. The Spring Integration project may help with this polling.
Step 1:
Read from the file, write the items to the remote DB, and move/delete the file if the write to the DB succeeds.
No scheduling is needed, since job launching originates from the polled files.
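A hedged sketch of what that launching component might look like (the "input.file" parameter name is an arbitrary choice for the example):

    import java.io.File;
    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;

    public class FileJobLauncher {

        // Called by the polling component whenever a new file appears in the
        // shared folder; the file path becomes an identifying job parameter.
        public void launchForFile(JobLauncher jobLauncher, Job jobB, File file) throws Exception {
            JobParameters params = new JobParametersBuilder()
                    .addString("input.file", file.getAbsolutePath())
                    .toJobParameters();
            jobLauncher.run(jobB, params);
        }
    }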
Sample Execution
Time 0: No file in the shared folder
Time 1: Read from web service and write to shared folder
Time 2: JOB B's file polling occurs, and it tries to write to the DB.
If successful, the system continues to execute.
If not, when JOB A tries to execute at its scheduled time, it will skip reading from the web service, since files still exist in the shared folder. It will keep skipping until JOB B consumes the files.
I did not want to go into implementation specifics, but Spring Batch can handle all of these situations. I hope this helps.