ADF Scheduling when existing Job not yet finished - azure-data-factory

Having read https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution, it is unclear to me if:
If a job is scheduled to run every hour,
can we prevent the next run at hr+1 from executing concurrently if the run for hr+0 is still going?
It looks as if concurrency = 1 means this.
But does that invocation simply not start until the current execution has finished?
Or will it be discarded?

When concurrency is set to 1, only one instance of the pipeline is allowed to run at a time. If the schedule trigger fires again while the pipeline is still running, the next invocation is queued. It starts after the current instance finishes.
So in your case, the hr+1 invocation will be queued, not discarded. After the first run finishes, the next run will start.
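
To make the queuing behaviour described above concrete, here is a minimal sketch in Python. It is only a model of the semantics, not Data Factory code: the function name `simulate_queued_runs`, the fixed run duration, and the first-in-first-out order of queued invocations are assumptions made for illustration.

```python
import heapq

def simulate_queued_runs(trigger_times, run_duration, concurrency=1):
    """Model queued invocations: return (triggered, started, finished) per run.

    Assumption: queued invocations start in trigger order (FIFO) and every
    run takes the same time. Times are in minutes.
    """
    slot_ends = []  # min-heap of finish times of the runs holding a slot
    results = []
    for triggered in sorted(trigger_times):
        if len(slot_ends) >= concurrency:
            # All slots are taken: wait for the earliest one to free up.
            earliest_free = heapq.heappop(slot_ends)
            started = max(triggered, earliest_free)
        else:
            started = triggered
        finished = started + run_duration
        heapq.heappush(slot_ends, finished)
        results.append((triggered, started, finished))
    return results

# Hourly trigger, each run takes 90 minutes, concurrency = 1.
for triggered, started, finished in simulate_queued_runs([0, 60, 120], 90):
    print(f"triggered at {triggered}, started at {started}, finished at {finished}")
```

In this model the hr+1 invocation is triggered at minute 60 but only starts at minute 90, when the hr+0 run ends. Note that if runs consistently take longer than the trigger interval, the backlog keeps growing.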

Related

Impact of unscheduling over the running job using Quartz?

There are some jobs scheduled using a trigger, either SimpleTrigger or CronTrigger, and I now want to unschedule and delete them. A job may be running or may already have completed its execution. If an unscheduled or already executed job is deleted, there shouldn't be any bad impact, but what happens to a running job if it is unscheduled using unscheduleJob() or deleted directly using deleteJob() in Quartz?
And if the running job is halted midway when unscheduleJob() or deleteJob() is called, is there any way to let the job complete its current execution before unscheduling or deleting it, to avoid any malfunction or bad data?
I tried checking for conflicting jobs and also tried using SchedulerListener, but didn't get any information.
Thanks in Advance!!!

How to queue multiple runs of same azure pipeline on one agent

My pipeline triggers on resources, schedule and merges. Sometimes these can happen almost at the same time and many pipeline runs can be created. I've noticed that the jobs that run don't always belong to the same run.
Example
one pipeline A includes 2 jobs j.1 and j.2
a resource triggers A.1 and starts j.1
another resource triggers A.2 also and queues j.1.
A.1 finishes a job and instead of starting j.2 it is A.2 j.1 that starts.
How do I lock the run so that A.1 j.1 and j.2 run to completion before A.2 starts?
On the agent, the queue is at the job level, not the pipeline level. So, normally the agent will be allocated to the higher-priority jobs in the pipelines, regardless of whether the jobs belong to the same pipeline run.
Currently, there is no method or setting to manage the order of the queued jobs.
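
The interleaving in the question can be reproduced with a small model of a job-level queue. This is only a sketch under assumptions: it treats the agent's queue as plain first-in-first-out and assumes a run's next job is queued only when its previous job finishes. The answer above says the real ordering is by priority, and the function name `agent_execution_order` is made up for illustration.

```python
from collections import deque

def agent_execution_order(runs, jobs):
    """Return the order in which a single agent executes jobs.

    Assumptions: the agent queue holds jobs (not whole runs), it is FIFO,
    and each run queues its next job only after the previous one finishes.
    """
    queue = deque((run, 0) for run in runs)  # every run queues its first job
    order = []
    while queue:
        run, job_index = queue.popleft()
        order.append(f"{run} {jobs[job_index]}")
        if job_index + 1 < len(jobs):
            # The run's next job goes to the back of the queue, behind
            # jobs that other runs queued in the meantime.
            queue.append((run, job_index + 1))
    return order

print(agent_execution_order(["A.1", "A.2"], ["j.1", "j.2"]))
```

Under these assumptions A.2 j.1 runs before A.1 j.2, which matches what the question observed.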

How can I kill (not cancel) an errant Azure Pipeline run, stage, job, or task?

I want to know how to kill an Azure Pipeline task (or any level of execution - run, stage, job, etc.), so that I am not blocked waiting for an errant pipeline to finish executing or timeout.
For example, canceling the pipeline does not stop it immediately if a condition is configured incorrectly. If the condition resolves to true the task will execute even if the pipeline is cancelled.
This is really painful if your org/project only has 1 agent. :(
How can I kill (not cancel) an errant Azure Pipeline run, stage, job, or task?
For a hosted agent, we cannot kill the Azure pipeline directly, since we cannot access the machine it is running on.
As a workaround, we can reduce the time that the task continues to run after the job is cancelled by setting a shorter Build job cancel timeout in minutes:
For example, I created a pipeline with a task that would still run for 60 minutes after the job is cancelled. But if I set Build job cancel timeout in minutes to 2, the pipeline is cancelled completely.
For a private agent, we can run services.msc and look for "VSTS Agent (name of your agent)". Right-click the entry and then choose Restart.

Abort a Datastage job at a specified time

I have a scheduled parallel Datastage (11.7) job.
This job has a Hive Connector with a Before and After Statement.
The Before statement runs OK, but the After statement remains in a running state for several hours (in the Hue log I see that this job finished in 1 hour) and I have to abort it manually in Datastage Director.
Is there a way to "program an abort"?
For example, I want to schedule the interruption of the running job every morning at 6.
I hope I was clear :)
Even though you can stop the job using dsjob, as other responses suggest, this may have no effect because the After statement has been issued synchronously: the job is waiting for it to finish and is (probably) not processing kill signals and the like in the meantime. You would be better advised to work out why the After command is taking too long and address that.

Datastage: How to keep continuous mode job running after a unexpected termination

I have a job that uses the Kafka Connector stage to read a Kafka queue and then load the data into a database. The job runs in Continuous Mode, so it never finishes on its own: it keeps monitoring the Kafka queue in real time.
For unexpected reasons (say, server issues, job issues, etc.) the job may terminate with a failure. In general, that happens after 300 hours of running. So, to keep the job alive, I have to check the job status manually and then do a Reset and Run.
The problem is that several hours can pass between the job termination and my manual Reset and Run, which is critical. So I'm looking for a way to eliminate the manual intervention and reduce that gap by automating the job invocation.
I tried using Control-M to run the job daily, but with no success. On the first day, Control-M called the job and it ran fine. But on the next day, when Control-M attempted to start the job again, it failed (since the job was already running). Besides, Datastage will never report back to Control-M that the job completed successfully, since the nature of the job won't allow that.
That said, I would like to hear any ideas that could help.
The first thing that came to mind is to create an intermediate Sequence and schedule it in Control-M. This new Sequence would then call the continuous job asynchronously by using a command line stage.
For the case where just this one job terminates unexpectedly and you want it to be restarted as soon as possible, have you considered calling this job from a sequence? The sequence could be set up to loop, running this job each time.
The sequence starts the job and waits for it to finish. When the job finishes, the sequence loops and starts the job again. You could add conditions on job exit (for example, if the job aborted, you could reset the job before re-running it, based on its end status).
This would not handle the case where the DataStage engine itself was shut down (such as for maintenance or possibly an error), in which case all jobs end, including your new sequence. The same applies to a server reboot or other situations where someone may have inadvertently stopped your sequence. For those cases (such as a DataStage engine stop) your team would need to have a process in place for jobs/sequences that need to be started up following a DataStage or system outage.
For the outage scenario, you could create a monitor script (regardless of whether you run the job solo or from a sequence) that sleeps/loops at 5-10 minute intervals, checks the status of your job using the dsjob command, and, if the job is not running, starts that job/sequence (also via the dsjob command). You can decide whether that script is started at DataStage startup, at machine startup, or from Control-M or another scheduler.
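
A monitor script along those lines could look roughly like the Python sketch below. Treat it as a starting point under assumptions, not a tested solution: it assumes `dsjob` is on the PATH and already authenticated, that `dsjob -jobinfo` prints a line such as `Job Status : RUNNING (0)`, and that status codes 3, 96 and 97 (run failed, crashed, stopped) mean the job needs a reset before it can run again. The helper names and the project/job names are made up; check the output format and status codes against your own installation.

```python
import re
import subprocess
import time

RUNNING = 0
# Assumed status codes that need a reset first: 3 failed, 96 crashed, 97 stopped.
NEEDS_RESET = {3, 96, 97}

def dsjob(*args):
    """Run the dsjob CLI and return its stdout (assumes dsjob is on PATH)."""
    return subprocess.run(["dsjob", *args], capture_output=True, text=True).stdout

def parse_status(jobinfo_output):
    """Extract the numeric code from a 'Job Status : NAME (code)' line, or None."""
    match = re.search(r"Job Status\s*:\s*.*\((\d+)\)", jobinfo_output)
    return int(match.group(1)) if match else None

def ensure_running(project, job, run_cmd=dsjob):
    """Start the job if it is not running, resetting it first when needed."""
    status = parse_status(run_cmd("-jobinfo", project, job))
    if status == RUNNING:
        return "already running"
    if status in NEEDS_RESET:
        # Wait for the reset to complete before starting the job again.
        run_cmd("-run", "-mode", "RESET", "-wait", project, job)
    run_cmd("-run", project, job)
    return "restarted"

def monitor(project, job, interval_seconds=300):
    """Check the job every interval_seconds (5 minutes by default), forever."""
    while True:
        print(ensure_running(project, job))
        time.sleep(interval_seconds)

# Example (hypothetical names): monitor("MyProject", "KafkaContinuousJob")
```

The command runner is passed in as `run_cmd` so the restart logic can be exercised without a DataStage installation. If the job is started from a looping sequence, point the script at the sequence instead of the job.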