IBM DataStage: Job activity does not continue in sequence

I have 16 job activities in a sequence, and I have already defined each trigger as OK, so they are all connected and run automatically when the previous job finishes. I have already recompiled and run each job activity on its own, but when I recompile and re-run the sequence, only the first job activity runs and finishes OK; it does not trigger the next job. Here's the log:
job_spi_februari..JobControl (#Coordinator): Summary of sequence run
19:18:01: Sequence started
19:18:01: jenis_kredit (JOB job_jenis_kredit) started
19:18:16: jenis_kredit (JOB job_jenis_kredit) finished, status=2 [Finished with warnings]
19:18:16: Sequence finished OK
I'm very confused about why this happens: the log suggests everything went well without any problem or warning, yet it does not trigger the next job as it should, as if something is wrong. What is actually happening, and how can I fix it?
In case you're curious about my job activities, they all look like this

If you connect all job activities with an OK trigger, the sequence will end as soon as a single activity does not finish with OK (for example "Finished with warnings"), because nothing is left to execute.
If you want the sequence to continue, I suggest defining a custom trigger that fires on both RunOK and RunWarn (a sketch follows below).
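A minimal sketch of such a custom trigger expression, assuming the first activity is named jenis_kredit as in the log above (use each activity's own name on its link):

jenis_kredit.$JobStatus = DSJS.RUNOK OR jenis_kredit.$JobStatus = DSJS.RUNWARN

Note that status=2 in your log is DSJS.RUNWARN ("Finished with warnings"), which is exactly the case an OK-only trigger never fires on.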

Related

What is Replay in Cadence/Temporal workflow?

What is Replay in Cadence/Temporal workflow? Is it the same as "retry"?
Why can't I simply use my own logger in workflow code due to replay?
"Retry" and "replay" are completely different.
Replay is for rebuilding the workflow thread state (process/thread stack).
Imagine this workflow code (in Java):
activityStub.doA();
LOG.info("first log");
activityStub.doB();
LOG.info("second log");
If LOG is not obtained from Workflow.getLogger and not guarded by Workflow.isReplaying, the first log will be printed more than once, depending on how many times the code gets replayed.
The timeline that causes the duplicated logs:
After doA is completed, the first log is printed.
Then doB is executed; let's say doB takes 1 minute.
During that minute, the worker crashes or gets restarted.
Then doB completes.
There will then be a new workflow task to process the completion of doB.
That workflow task is executed on a new worker host, which requires a replay to rebuild the stack up to the doB call. During the replay, assuming LOG is your own logger not guarded by Workflow.isReplaying(), the first log is printed again.
Then doB's completion is processed and the second log is printed.
So at the end, you will see the logs:
first log
first log
second log
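For reference, here is a minimal sketch of the safe pattern with the Temporal Java SDK, using the replay-aware logger returned by Workflow.getLogger. The interface and method names (MyWorkflow, MyActivities, doA, doB) follow the snippet above and are purely illustrative:

import java.time.Duration;
import org.slf4j.Logger;
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

@WorkflowInterface
interface MyWorkflow {
    @WorkflowMethod
    void run();
}

@ActivityInterface
interface MyActivities {
    void doA();
    void doB();
}

public class MyWorkflowImpl implements MyWorkflow {

    // Replay-aware logger: output is suppressed while the code is being replayed.
    private static final Logger LOG = Workflow.getLogger(MyWorkflowImpl.class);

    private final MyActivities activityStub = Workflow.newActivityStub(
            MyActivities.class,
            ActivityOptions.newBuilder()
                    .setStartToCloseTimeout(Duration.ofMinutes(5))
                    .build());

    @Override
    public void run() {
        activityStub.doA();
        LOG.info("first log");  // printed once, even if the history is replayed
        activityStub.doB();
        LOG.info("second log");
    }
}

If you need to keep your own logger, you can instead wrap each call in if (!Workflow.isReplaying()) { ... } to get the same effect.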

Give input to a GitHub Actions job after the first job is successful

I have a requirement in GitHub Actions where I need to feed an input after the first job is finished.
That is, the first job (Build) starts automatically after each commit without taking any input, and it runs successfully.
Then, for the second job (Deploy) to start, it should begin executing based on an input (Environment) that I select.
I have figured out manual job execution from https://stackoverflow.com/a/73708545/2728619
But I need help with taking the input after the first job is finished.
For the manual trigger, I tried the solution at https://stackoverflow.com/a/73708545/2728619
I am expecting to read the input after the first job has run (not at the beginning of the workflow).

Stop running Azure Data Factory Pipeline when it is still running

I have an Azure Data Factory pipeline. My trigger is set to run every 5 minutes.
Sometimes my pipeline takes more than 5 minutes to finish its jobs. In that case the trigger runs again and creates another instance of my pipeline, and two instances of the same pipeline cause problems in my ETL.
How can I be sure that just one instance of my pipeline runs at a time?
As you can see, there are several instances of my pipeline running.
A few options I could think of:
OPT 1
Specify a 5-minute timeout on your pipeline activities:
https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities#activity-policy
OPT 2
1) Create a one-row, one-column SQL RunStatus table: 1 will be our "completed" status, 0 "running".
2) At the end of your pipeline, add a Stored Procedure activity that sets the bit to 1.
3) At the start of your pipeline, add a Lookup activity to read that bit.
4) The output of this lookup is then used in an If Condition activity:
if 1 - start the pipeline's work, but before that add another Stored Procedure activity to set our status bit to 0.
if 0 - depending on the details of your project: do nothing, add a Wait activity, send an email, etc.
To make full use of this option, you can turn the table into a log where a new line with start and end times is added after each successful run (before initiating a new run, you can check whether the previous run has an end time). Having this log might help you gather data on how long your pipeline takes to run, so you can either add more resources or increase the interval between runs. A rough sketch of the table and procedures follows below.
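Purely for illustration, a minimal T-SQL sketch of OPT 2; the table, column, and procedure names are assumptions, not part of the original answer:

-- One-row, one-column status table: 1 = completed, 0 = running
CREATE TABLE dbo.RunStatus (IsCompleted BIT NOT NULL);
INSERT INTO dbo.RunStatus (IsCompleted) VALUES (1);
GO

-- Called from a Stored Procedure activity just before the pipeline's main work starts
CREATE PROCEDURE dbo.usp_MarkPipelineRunning AS
    UPDATE dbo.RunStatus SET IsCompleted = 0;
GO

-- Called from a Stored Procedure activity at the end of the pipeline
CREATE PROCEDURE dbo.usp_MarkPipelineCompleted AS
    UPDATE dbo.RunStatus SET IsCompleted = 1;
GO

-- Query used by the Lookup activity; its result drives the If Condition activity
SELECT IsCompleted FROM dbo.RunStatus;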
OPT 3
Monitor the pipeline run with the SDKs (I have not tried this, so it is just a possible direction):
https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically
Hopefully you can use at least one of them
It sounds like you're trying to run a process more or less constantly, which is a good fit for tumbling window triggers. You can create a dependency such that the trigger is dependent on itself - so it won't run until the previous run has completed.
Start by creating a trigger that runs a pipeline on a tumbling window, then create a tumbling window trigger dependency. The section at the bottom of that article discusses "tumbling window self-dependency properties", which shows you what the code should look like once you've successfully set this up.
Try changing the concurrency of the pipeline to 1.
Link: https://www.datastackpros.com/2020/05/prevent-azure-data-factory-from-running.html
My first thought is that the recurrence is too frequent under these circumstances. If the graph you shared is all for the same pipeline, then most runs take close to 5 minutes, but some take 30, 40, even 60 minutes. Situations like this are when a simple recurrence trigger probably isn't sufficient. What is supposed to happen while the 60-minute run is in progress? There will be 10-12 runs that won't start: do they still need to run, or can they be ignored?
To make sure all the pipelines run, and to manage concurrency, you're going to need to build a queue manager of some kind. ADF cannot handle this itself, so I have built such a system internally and rely on it extensively. I use a combination of Logic Apps, Stored Procedures (Azure SQL), and Azure Functions to queue, execute, and monitor pipeline executions. Here is a high-level breakdown of what you probably need:
Logic App 1: runs every 5 minutes and queues an ADF job in the SQL database.
Logic App 2: runs every 2-3 minutes and checks the queue to see whether a) no job is currently running (status = 'InProgress') and b) there is a job in the queue waiting to run (I do this with a stored procedure). If this state is met: execute the next ADF pipeline and update its status to 'InProgress'.
I use an Azure Function to submit jobs instead of the built-in Logic App activity because I have better control over variable parameters. Also, it can return the newly created ADF RunId, which I rely on in step 3.
Logic App 3: runs every minute and updates the status of any 'InProgress' jobs.
I use an Azure Function to check the status of the ADF pipeline based on RunId.

Autosys trigger same DataStage job multiple times with different invocation IDs

Here is what I am trying to do, not sure if it is possible:
Autosys gets a file at 1:10pm and starts DataStage Job1 as Job1.1:10pm.
Job1.1:10pm is still running.
Autosys gets another file at 1:20pm; it needs to start the same Job1 but run it as Job1.1:20pm, and not wait for Job1.1:10pm to finish - it should go ahead and run.
Can Autosys call the same DataStage job every time it gets a new file and run it with the new timestamp as the invocation ID, without waiting for the previous job to finish?
Thanks, y'all.
Yes - absolutely - this is possible. To enable different invocation IDs you have to check the "multiple instance" property in the job's properties. With this you allow multiple simultaneous runs of the job.
The invocation ID can also be a parameter when calling the job from a sequence.
When your (multiple-instance) job writes to a file, make sure that each filename is unique to avoid side effects from multiple runs at the same time. This can be done by specifying DSJobInvocationId as part of the filename. Note that it is a parameter provided by DataStage and must be written exactly as shown, with the same upper- and lower-case letters. DataStage will then replace it with the content of your job's invocation ID at runtime.
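For illustration only, assuming the Autosys job ultimately shells out to the dsjob command line, starting a new instance per file could look roughly like this (the project name, parameter, and invocation IDs are made up):

dsjob -run -param SourceFile=/landing/file_1310.dat MyProject Job1.1310
dsjob -run -param SourceFile=/landing/file_1320.dat MyProject Job1.1320

Because the job allows multiple instances, the second command starts Job1.1320 immediately, even while Job1.1310 is still running.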

Run scheduler to execute jobs at an interval from the completion of the previous job

I need to create schedulers to execute jobs (class files) at specified intervals. For now, I'm using Quartz Scheduler, which triggers the jobs at defined intervals from the time they are triggered.
For example: suppose I give a cron expression to run every hour starting at 9 in the morning. My first run will be at 9, my second run at 10, and so on.
If my job takes 20 minutes to execute, this method is not very efficient.
What I need is to schedule a job every hour from the completion time of the previously run job.
For example: my hourly job is triggered at 9 and the first run takes 20 minutes, so the next run should be triggered only at 10:20 instead of 10 (i.e., one hour from the completion of the previous run).
I need to know whether there are any methods in Quartz Scheduler to achieve this, or what other logic I would need.
If anyone could help me out on this, it would be very helpful.
You can easily achieve this by job-chaining your job executions. There are various approaches you can choose from:
(1) Implement a Quartz JobListener and, in its jobWasExecuted method, which Quartz invokes whenever a job finishes executing, re-fire your job (see the sketch after this list).
(2) Look at the Quartz JobChainingJobListener that you can use to implement simple job chaining scenarios. Please note that the functionality of this listener is very limited: it does not allow you to insert delays between job executions, there is no support for conditions that must be met before target jobs are executed, etc. But you can use it as a good starting point to implement (1).
(3) Use QuartzDesk (our commercial product) or any other product that allows you to create job chains while externalizing and managing all job dependencies outside of your application. A job chain can have multiple target jobs that can be executed immediately, with a fixed delay, or at an arbitrary time in the future produced by a JavaScript expression. It also allows you to implement somewhat more sophisticated workflows, such as firing a target job when multiple source jobs complete their execution, etc. I am attaching screenshots showing what a simple job chain that re-executes Job1 with a 1-minute delay upon Job1's completion (with any job execution status) looks like:
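A minimal sketch of approach (1): a listener that schedules the next run one hour after the current run finishes. The class name and listener name are illustrative, and it assumes the first 9:00 run is started by a one-shot trigger rather than an hourly cron:

import org.quartz.DateBuilder;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.listeners.JobListenerSupport;

public class RescheduleOneHourAfterCompletionListener extends JobListenerSupport {

    @Override
    public String getName() {
        return "rescheduleOneHourAfterCompletion";
    }

    // Invoked by Quartz whenever a job this listener is attached to finishes executing.
    @Override
    public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
        // Build a one-shot trigger that fires one hour from now (i.e. from completion time)
        // and points at the job that just finished (assumes the job is stored as durable).
        Trigger nextRun = TriggerBuilder.newTrigger()
                .forJob(context.getJobDetail())
                .startAt(DateBuilder.futureDate(1, DateBuilder.IntervalUnit.HOUR))
                .build();
        try {
            context.getScheduler().scheduleJob(nextRun);
        } catch (SchedulerException e) {
            throw new RuntimeException("Could not schedule the next run", e);
        }
    }
}

Register it against your job with something like scheduler.getListenerManager().addJobListener(new RescheduleOneHourAfterCompletionListener(), KeyMatcher.keyEquals(jobKey)); after that, each completed run schedules the next one an hour after it finishes.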