How to call a parallel data load job from a sequence loop job in DataStage

I am new to DataStage and working on my first DataStage job. I have prepared a data load job which needs to take its input from a sequence job. The sequence job has a table list, and I need to pass the table names from that list to the load job in a loop: it should pass one table name from the list to the load job, and once that load is complete, the next table name should be passed.
However, I get an error while passing the parameter between the two jobs. Can someone please suggest the steps to pass a parameter from one job to another to pick up the table name?

Examples of this are described in the Knowledge Center documentation and as a video.
Put your load job within a loop in the sequence (between a StartLoop and an EndLoop activity), define the loop over your list of table names, and use the loop counter ($Counter of the StartLoop activity) as the value of the table-name job parameter (as described in the documentation).

Related

Export database to multiple files in the same job with Spring Batch

I need to export a database of around 180k objects to JSON files so that I can retain the data structure in a way that suits me for a later import into another database. However, because of the amount of data, I want to separate and group the data based on some attribute value from the database records themselves. So all records that have attribute1=value1 should go to value1.json, those with attribute1=value2 to value2.json, and so on.
However, I still haven't figured out how to do this kind of job. I am using RepositoryItemReader and JsonFileWriter.
I started by filtering the data on that attribute and running separate exports, just to verify that this works, but I need to automate the whole process and let it run on its own.
Can this be done?
There are several ways to do that. Here are a couple of options:
Option 1: parallel steps
You start by creating a tasklet that calculates the distinct values of the attribute you want to group items by, and you put this information in the job execution context.
After that, you create a flow with a chunk-oriented step for each value. Each chunk-oriented step would process a distinct value and generate an output file. The item reader and writer would be step-scoped beans, dynamically configured with the information from the job execution context.
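For instance, a minimal sketch of that first tasklet (the table name my_table, the column attribute1 and the context key attributeValues are placeholders made up for illustration, not names from your project):

    import java.util.List;

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Tasklet for the first step: collect the distinct grouping values and
    // share them through the job execution context.
    public class DistinctValuesTasklet implements Tasklet {

        private final JdbcTemplate jdbcTemplate;

        public DistinctValuesTasklet(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            List<String> values = jdbcTemplate.queryForList(
                    "select distinct attribute1 from my_table", String.class);

            // Put the list into the *job* execution context so that the later
            // steps (or their step-scoped readers/writers) can pick it up.
            chunkContext.getStepContext().getStepExecution()
                    .getJobExecution().getExecutionContext()
                    .put("attributeValues", values);

            return RepeatStatus.FINISHED;
        }
    }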
Option 2: partitioned step
Here, you would implement a Partitioner that creates a partition for each distinct value. Each worker step would then process a distinct value and generate an output file.
Both options should perform equally in your use-case. However, option 2 is easier to implement and configure in my opinion.
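For illustration, a minimal sketch of such a Partitioner for option 2, using the same placeholder names (my_table, attribute1) as above:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.jdbc.core.JdbcTemplate;

    // One partition per distinct attribute value; each worker step processes
    // only the records for "its" value and writes <value>.json.
    public class AttributeValuePartitioner implements Partitioner {

        private final JdbcTemplate jdbcTemplate;

        public AttributeValuePartitioner(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        @Override
        public Map<String, ExecutionContext> partition(int gridSize) {
            List<String> values = jdbcTemplate.queryForList(
                    "select distinct attribute1 from my_table", String.class);
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (String value : values) {
                ExecutionContext context = new ExecutionContext();
                context.putString("attributeValue", value);
                partitions.put("partition-" + value, context);
            }
            return partitions;
        }
    }

The worker step's reader and writer would then be step-scoped beans that pick the value up with #{stepExecutionContext['attributeValue']}, for example to restrict the query and to build the output file name <value>.json.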

AZURE DATA FACTORY - Can I set a variable from within a CopyData task or by using the output?

I have a simple pipeline that has a Copy activity to populate a table. That task is based on a query and will only ever return 1 row.
The problem I am having is that I want to reuse the value from one of the columns (batch number) to set a variable, so that at the end of the pipeline I can use a Stored Procedure to log that the batch was processed. I would rather avoid running the query a second time in a Lookup task, so can I make use of the data already being returned?
I have tried duplicating the column in the Copy activity and then mapping it to something like @BatchNo, but that fails. I have even tried to add a Set Variable task, but I can't figure out how to take a single column from it. @{activity('Populate Aleprstw').output} does not error, but I am not sure what that will actually do in this case.
Thanks, and sorry if it's a silly question.
Cheers
Mark
I always do it like this:
Generate a batch number (usually with a proc)
Use a Lookup to grab it into a variable (with "First row only" enabled, the value can be referenced as @activity('<lookup name>').output.firstRow.BatchNo in a Set Variable activity)
Use the batch number in all activities (there might be multiple copies, procs, etc.)
Write the batch completion
From your description it seems you have the batch number embedded in the data copy from the start, which is not typical.
If you must do it this way, is there really an issue with running a lookup again?
Copy activity doesn't return data like that, so you won't be able to capture the results that way. With this design, running the query again in a Lookup is the best option.
Is the query in the Source running on the same Server as the Sink? If so, you could collapse the entire operation into a Stored Procedure that returns the data point you are trying to capture.

Quartz capabilities

I am trying to create a Quartz scheduler using Java which will be able to call an API and pass in data.
I am totally new to Quartz, but I now understand the Job concept and how to create one, the Trigger concept and how to fire one, and how the Scheduler works.
What I am having difficulty with is how to pass in the information that needs to be sent to the API. I have an example where an API is called and the data is entered into the DB, but the information has been hard-coded into the class instead of being passed in via the JobDetail.
I.e. the user passes a message to the system which needs to be sent back to the user in 12 hours and not before, so what I was planning was to create a Job and a Trigger that sets the execution time to 12 hours from now. How do I pass the message into the scheduler? Where should this message be stored? Is what I am trying to do possible? Have I misunderstood what Quartz is capable of doing?
Thank you for your time. Any assistance would be greatly appreciated.
Take a look at JobDataMap. If you are creating a new job for each user action, you can store the message in there and it will be available during execution.
JobDataMap holds state information for Job instances.
JobDataMap instances are stored once when the Job is added to a scheduler. They are also re-persisted after every execution of jobs annotated with @PersistJobDataAfterExecution.
JobDataMap instances can also be stored with a Trigger. This can be useful in the case where you have a Job that is stored in the scheduler for regular/repeated use by multiple Triggers, yet with each independent triggering, you want to supply the Job with different data inputs.
The JobExecutionContext passed to a Job at execution time also contains a convenience JobDataMap that is the result of merging the contents of the trigger's JobDataMap (if any) over the Job's JobDataMap (if any).
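A minimal sketch of that approach (the job class, group names, and the printed-out API call are illustrative placeholders, not an existing API of yours):

    import org.quartz.DateBuilder;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class SendMessageJob implements Job {

        @Override
        public void execute(JobExecutionContext context) {
            // The merged JobDataMap contains the job's data plus any trigger data.
            String message = context.getMergedJobDataMap().getString("message");
            String userId = context.getMergedJobDataMap().getString("userId");
            // Placeholder: replace with your actual API call.
            System.out.println("Calling API for " + userId + " with message: " + message);
        }

        // Schedules the job to run once, 12 hours from now, carrying the message.
        public static void scheduleIn12Hours(Scheduler scheduler, String userId, String message)
                throws Exception {
            JobDetail job = JobBuilder.newJob(SendMessageJob.class)
                    .withIdentity("sendMessage-" + userId, "notifications")
                    .usingJobData("message", message)
                    .usingJobData("userId", userId)
                    .build();

            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("sendMessageTrigger-" + userId, "notifications")
                    .startAt(DateBuilder.futureDate(12, DateBuilder.IntervalUnit.HOUR))
                    .build();

            scheduler.scheduleJob(job, trigger);
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            scheduler.start();
            scheduleIn12Hours(scheduler, "user-42", "Hello in 12 hours!");
        }
    }

If the message must survive an application restart, configure Quartz with a JDBC job store so that jobs, triggers and their JobDataMaps are persisted to a database rather than kept in memory.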
In case you have a single job but are creating a new trigger for each user action, you can follow the solution given here.
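Sticking with the purely illustrative names from the sketch above, that trigger-per-action variant looks roughly like this: the job is stored once as a durable job, and every user action only adds a new trigger carrying its own JobDataMap.

    import org.quartz.DateBuilder;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.Scheduler;
    import org.quartz.SchedulerException;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;

    public class SharedJobScheduling {

        // Register the job once; durable means it may exist without a trigger.
        public static void registerJob(Scheduler scheduler) throws SchedulerException {
            JobDetail job = JobBuilder.newJob(SendMessageJob.class)
                    .withIdentity("sendMessage", "notifications")
                    .storeDurably()
                    .build();
            scheduler.addJob(job, true);
        }

        // Per user action: a new trigger whose own JobDataMap carries the message;
        // it is merged into the map the job sees at execution time.
        public static void scheduleMessage(Scheduler scheduler, String userId, String message)
                throws SchedulerException {
            Trigger trigger = TriggerBuilder.newTrigger()
                    .forJob("sendMessage", "notifications")
                    .withIdentity("sendMessage-" + userId + "-" + System.currentTimeMillis(),
                            "notifications")
                    .usingJobData("message", message)
                    .usingJobData("userId", userId)
                    .startAt(DateBuilder.futureDate(12, DateBuilder.IntervalUnit.HOUR))
                    .build();
            scheduler.scheduleJob(trigger);
        }
    }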
A third option would be: for each user action, persist the message and the time to send the email to the database. Have a job that runs periodically and scans the database for eligible records for which the email has to be sent.

How to pass output from a DataStage parallel job as input to another job?

My requirement is:
Parallel Job 1 -- extracts data from a table.
Parallel Job 2 -- should be triggered in the sequence only when the row count from the source query in Job 1 is greater than 0.
I want to achieve this without creating any intermediate file in Job 1.
So basically what you want to do is take information from a data stream (of your Job 1) and use it at the "above" sequence level as a parameter.
In your case you want to decide at sequence level whether or not to run subsequent jobs (i.e. only if more than 0 rows get returned).
Two options for that:
Job 1 writes the information to a file which is a value file of a parameter set. These files are stored in a fixed directory. The parameter from the value file can then be used in your sequence to decide on the further processing. Details on parameter sets can be found here.
You could use a server job for Job 1 and set a user status (BASIC function DSSetUserStatus) in a transformer. This status is passed back to the sequence and can be referenced in subsequent activities of the sequence. See the documentation, but you will find plenty of other information on the internet regarding this topic as well.
There are more solutions to this problem - or let us call it a challenge. Another way might be a script called at sequence level which queries the database and avoids Job 1...

spring batch add job parameters in a step

I have a job with two steps. The first step creates a file in a folder with the following structure:
src/<timestamp>/file.zip
The next step needs to retrieve this file and process it.
I want to add the timestamp to the job parameters. Each job instance is differentiated by the timestamp, but I won't know the timestamp before the first step completes. If I add a timestamp to the job parameters at the beginning of the job, then each run will start a new job instance and any incomplete job instance will be ignored.
I think you can make use of the JobExecutionContext instead.
Step 1 gets the current timestamp, uses it to generate the file, and puts it into the JobExecutionContext. Step 2 reads the timestamp from the JobExecutionContext and uses it to construct the input path for its processing.
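A minimal sketch of that idea with two tasklet-based steps (the bean names, the "timestamp" key, and the elided zip writing/processing are placeholders):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.springframework.batch.core.configuration.annotation.StepScope;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class TimestampSharingConfig {

        // Step 1: create src/<timestamp>/file.zip and store the timestamp in the
        // job execution context so later steps can find the file again.
        @Bean
        public Tasklet createFileTasklet() {
            return (contribution, chunkContext) -> {
                String timestamp = String.valueOf(System.currentTimeMillis());
                Path target = Paths.get("src", timestamp, "file.zip");
                Files.createDirectories(target.getParent());
                // ... write the actual zip archive to 'target' here ...
                chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().getExecutionContext()
                        .putString("timestamp", timestamp);
                return RepeatStatus.FINISHED;
            };
        }

        // Step 2: a step-scoped bean gets the timestamp injected back via late binding.
        @Bean
        @StepScope
        public Tasklet processFileTasklet(
                @Value("#{jobExecutionContext['timestamp']}") String timestamp) {
            return (contribution, chunkContext) -> {
                Path source = Paths.get("src", timestamp, "file.zip");
                System.out.println("Processing " + source); // placeholder for the real processing
                return RepeatStatus.FINISHED;
            };
        }
    }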
Just to add something on top of your approach of splitting the steps like this: you have to think twice about whether this is really what you want. If Step 1 finishes and Step 2 fails, then when the job instance is re-run it will start from Step 2, which means the file is not going to be regenerated in Step 1 (because it is already completed). If that is what you are looking for, that's fine. If not, you may want to consider putting Step 1 and Step 2 into one step instead.