spring batch add job parameters in a step - spring-batch

I have a job with two steps. The first step creates a file in a folder with the following structure:
src/<timestamp>/file.zip
The next step needs to retrieve this file and process it.
I want to add the timestamp to the job parameters. Each job instance is differentiated by the timestamp, but I won't know the timestamp before the first step completes. If I add a timestamp to the job parameters at the beginning of the job, a new job instance is started every time, and any incomplete job instance is ignored instead of being restarted.

I think you can make use of the JobExecutionContext instead.
Step 1 gets the current timestamp, uses it to generate the file, and puts it into the JobExecutionContext. Step 2 reads the timestamp from the JobExecutionContext and uses it to construct the input path for its processing.
One more point about splitting the work into two steps like this: think twice about whether it is really what you want. If Step 1 finished and Step 2 failed, then when the job instance is rerun it will start from Step 2. That means the file is not regenerated by Step 1 (because that step has already completed). If that is what you are looking for, fine. If not, consider putting Step 1 and Step 2 into a single step instead.
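The hand-off described above can be sketched in plain Java. This is only a model, not Spring Batch code: a Map stands in for the job execution context, and the class name, method names and the key "file.timestamp" are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the hand-off; the Map stands in for the job execution context.
class TimestampHandOff {
    // Hypothetical key name, chosen for this sketch only.
    static final String KEY = "file.timestamp";

    // Step 1: take the timestamp, remember it in the context, return the path it writes to.
    static String createFileStep(Map<String, Object> jobContext, long now) {
        String timestamp = Long.toString(now);
        jobContext.put(KEY, timestamp);
        return "src/" + timestamp + "/file.zip";
    }

    // Step 2: rebuild the same path from the stored timestamp.
    static String processFileStep(Map<String, Object> jobContext) {
        Object timestamp = jobContext.get(KEY);
        if (timestamp == null) {
            throw new IllegalStateException("Step 1 has not stored a timestamp yet");
        }
        return "src/" + timestamp + "/file.zip";
    }

    public static void main(String[] args) {
        Map<String, Object> jobContext = new HashMap<>();
        createFileStep(jobContext, System.currentTimeMillis());
        System.out.println(processFileStep(jobContext));
    }
}
```

In a real job the value would live in the job-level ExecutionContext, which Spring Batch persists with the job execution, so a restarted Step 2 still sees the timestamp that Step 1 stored.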

Related

Export database to multiple files in same job Spring Batch

I need to export a database of around 180k objects to JSON files so that I can retain the data structure in a way that suits me for a later import into another database. However, because of the amount of data, I want to separate and group the data based on an attribute value from the database records themselves. So all records that have attribute1=value1 should go to value1.json, those with value2 to value2.json, and so on.
However, I still haven't figured out how to do this kind of job. I am using RepositoryItemReader and JsonFileWriter.
I started by filtering the data on that attribute and running separate exports, just to verify that it works, but I need a single job so that I can automate the whole process and let it run unattended.
Can this be done?
There are several ways to do that. Here are a couple of options:
Option 1: parallel steps
You start by creating a tasklet that calculates the distinct values of the attribute you want to group items by, and you put this information in the job execution context.
After that, you create a flow with a chunk-oriented step for each value. Each chunk-oriented step would process one distinct value and generate one output file. The item reader and writer would be step-scoped beans, dynamically configured with the information from the job execution context.
Option 2: partitioned step
Here, you would implement a Partitioner that creates a partition for each distinct value. Each worker step would then process a distinct value and generate an output file.
Both options should perform equally in your use-case. However, option 2 is easier to implement and configure in my opinion.
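To make Option 2 more concrete, here is a minimal plain-Java sketch of the partitioning logic. It does not use Spring Batch types: a nested Map stands in for the per-partition execution contexts, and the names AttributePartitions, attributeValue and outputFile are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds one partition per distinct attribute value (stand-in for a Partitioner).
class AttributePartitions {
    static Map<String, Map<String, String>> partition(List<String> distinctValues) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        for (String value : distinctValues) {
            // Per-partition context: what to read and where to write (hypothetical keys).
            Map<String, String> context = new LinkedHashMap<>();
            context.put("attributeValue", value);
            context.put("outputFile", value + ".json");
            partitions.put("partition-" + value, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of("value1", "value2")));
    }
}
```

In Spring Batch itself, the Partitioner interface returns a Map of partition names to ExecutionContext objects, and a step-scoped reader and writer in the worker step would pick up the attribute value and the output file name from their own context.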

How to call a parallel data load job from a sequence loop job in datastage

I am new to DataStage and working on my first DataStage job. I have prepared a data load job which needs to take its input from a sequence job. The sequence job has a table list, and I need to pass the table names from that list to the load job in a loop. It should pass one table name to the load job and, once that load is complete, pass the next table name.
However, there is an error while passing the parameter between the two jobs. Can someone please suggest the steps to pass a parameter from one job to another so that the load job picks up the table name?
Examples of this are described in the Knowledge Center documentation and in a video.
Put your job inside the loop in a sequence and use $Counter as the value for the table name job parameter (as described in the documentation).

How to pass output from a Datastage Parallel job to input as another job?

My requirement is
Parallel Job1 --I extract data from a table, when row count is more than 0
Parallel job 2 should be triggered in the sequencer only when the row count from source query in Job1 is greater than 0
I want to achieve this without creating any intermediate file in job1.
So basically you want to take information from a data stream (of your Job1) and use it as a parameter in the sequence "above" it.
In your case you want to decide at sequence level whether to run the subsequent jobs (if more than 0 rows are returned) or not.
Two options for that:
Job1 writes the information to a file which is a value file of a parameter set. These files are stored in a fixed directory. The parameter from the value file can then be used in your sequence to decide on further processing. Details for parameter sets can be found here.
You could use a server job for Job1 and set a user status (BASIC function DSSetUserStatus) in a transformer. This is also passed back to the sequence and can be referenced in subsequent stages of the sequence. See the documentation; you will also find plenty of other information on this topic on the internet.
There are more solutions to this problem, or let us call it a challenge. Another way may be a script called at sequence level which queries the database and avoids Job1 altogether...

Spring Batch - execute a set of steps 'x' times based on a condition

I need to execute a sequence of steps a specific number of times. Any pointers on the best way to do this in Spring Batch? I am able to execute a single step 'x' times, but my requirement is to execute a set of steps 'x' times, based on a condition. Any pointers will help.
Thanks
Lakshmi
You could put all the steps in one job and start the whole job several times. There are different ways to launch a job in Spring Batch. Have a look at JobOperator and JobLauncher, and then simply implement a loop around the launching of the job.
You can do this after the whole Spring context is initialized, so there is no overhead from that. But you must pay attention to the scope of your beans, especially the readers and writers.
Depending on your needs concerning failure handling and restart, you also have to pay attention to how you manage the execution context of your job and steps.
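A loop around the launch could look roughly like the following plain-Java sketch. The launcher is passed in as a callback so the example stays self-contained; RepeatedLaunch, launchTimes and the run id format are hypothetical names, not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Launches "the job" a fixed number of times, giving every run its own id.
class RepeatedLaunch {
    static List<String> launchTimes(int times, String batchId, Consumer<String> launcher) {
        List<String> runIds = new ArrayList<>();
        for (int i = 1; i <= times; i++) {
            // A distinct value per iteration, meant to be used as a job parameter.
            String runId = batchId + "-" + i;
            launcher.accept(runId);
            runIds.add(runId);
        }
        return runIds;
    }

    public static void main(String[] args) {
        launchTimes(3, "nightly", runId -> System.out.println("launching with run.id=" + runId));
    }
}
```

The distinct run id matters because Spring Batch identifies a job instance by its identifying job parameters, and an instance that has already completed cannot be launched again with the same parameters.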
You can simulate a loop in Spring Batch using a JobExecutionDecider:
Put it in front of all the steps.
Store x in the job execution context and check its value in the decider: move to 'END' if x equals the desired value; otherwise increment it and move to the first step of the set.
After the last step, move back to the start (the decider).
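The decider loop above can be modelled in plain Java as follows. This is a sketch of the control flow only: a Map stands in for the job execution context, the strings "CONTINUE" and "END" stand in for flow statuses, and the key "loop.counter" is an assumed name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Models a decider that lets a set of steps run a desired number of times.
class LoopDecider {
    // Hypothetical context key holding how many passes have been started.
    static final String COUNTER = "loop.counter";

    // Returns "END" once the counter has reached the desired value,
    // otherwise increments the counter and returns "CONTINUE".
    static String decide(Map<String, Object> jobContext, int desired) {
        int started = (Integer) jobContext.getOrDefault(COUNTER, 0);
        if (started >= desired) {
            return "END";
        }
        jobContext.put(COUNTER, started + 1);
        return "CONTINUE";
    }

    // Decider first, then the whole set of steps, then back to the decider.
    static List<String> run(int desired, List<String> steps) {
        Map<String, Object> jobContext = new HashMap<>();
        List<String> trace = new ArrayList<>();
        while ("CONTINUE".equals(decide(jobContext, desired))) {
            trace.addAll(steps);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(2, List.of("stepA", "stepB")));
    }
}
```

In an actual job, the decide method of a JobExecutionDecider would return a FlowExecutionStatus, and the job flow would route one status to the first step of the set and the other to the end of the job.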

tFileList catches only one of the 6 files

I tried to display some results from several files in a directory. I use a tFileList and 2 tFileInputDelimited components which are both linked to the tFileList. I don't know why, but at the end of the processing my results come from just one of the 6 files I want. It appears that the results come from the last file of the directory.
Each tFileInputDelimited has ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as name of the flow.
Here is my TMap:
Your job is set up so that your lookup is iterative, which causes problems: Talend only seems to use the last iteration, rather than doing what you might expect and iterating through every step for everything it needs (although that might be more complicated than it first appears).
One option is to rework the job so that the iterate part of the job is the main input to the tMap rather than the lookup.
Alternatively, you could iterate the data into a tBufferOutput component and then, with an OnSubjobOk trigger, link the job as before but replace the iterative part with a tBufferInput component, since the buffer will hold all of the data from all of the files iterated through.