Concourse - fly CLI - Limit to specific job name

Is it possible in Concourse to limit a run to a single job inside the pipeline? Let's say I have a pipeline with three jobs, but I want to test just job #2, not 1 and 3. I tried to trigger a job by pointing to pipeline/job-name and it kind of worked (i.e., fly -t lab tj -j bbr-backup-bosh/export-om-installation). 'Kind of' because it did start from this job, but it then fired off other jobs that I didn't want to test. Wondering if there is an Ansible-like option (i.e., --tags).
Thanks!!

You cannot "limit" a triggered job to itself, since a job is part of a pipeline. Each time you trigger a job, it will keep putting all the resources it uses. These resources, if marked as trigger: true downstream, well, they will trigger the downstream jobs.
You have two possibilities:
do not mark any resource in the pipeline as trigger: true. This obviously also means that your pipeline will never advance automatically; you will need to manually trigger each job. Not ideal, but maybe good enough while troubleshooting the pipeline itself.
Think in terms of tasks. A job is made of one or more tasks, and tasks can be run independently from the pipeline. See the documentation for fly execute and, for example, https://concoursetutorial.com/ where they explain tasks and fly execute. Note that fly execute also supports --input and --output, so it is possible to emulate the task's inputs and outputs as if it were running in the pipeline.
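As a rough sketch of what running a single task could look like (the task file path and the input/output names below are made up for illustration, not taken from the actual pipeline):

# run one task definition directly against the target, without triggering the pipeline
fly -t lab execute -c ci/tasks/export-om-installation.yml \
  --input om-installation=./local/om-installation \
  --output exported-installation=./local/exported-installation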

Marco is pretty dead on, but there's one other option. You could pause the other jobs and abort any builds that would be triggered once they're unpaused.

Related

Combine multiple queued Azure DevOps build pipeline jobs into one run

I have a custom Agent Pool with multiple Agents, each with the same capabilities. This Agent Pool is used to run many YAML build pipeline jobs; call them A1, A2, A3, etc. Each of those A* jobs triggers a different YAML build pipeline job called B. In this scheme, multiple simultaneous completions of A* jobs will trigger multiple simultaneous B jobs. However, the B job is set up to self-interlock, so that only one instance can run at a time. The nice thing is that when the B job runs, it consumes all of the existing A* outputs (for safety reasons, A* and B are also interlocked).
Unfortunately, this means that of the multiple simultaneous B jobs, most will be stuck waiting for the first to finish after it has processed all of the outputs of the completed A* jobs. Only then can the rest of the queued (or running but blocked on the interlock) instances of the B job continue, one at a time, each having nothing to consume because all of the A* outputs have already been processed.
Is there a way to make Azure DevOps batch multiple instances of job B together? In other words, if there is already one B job instance running or queued, don't add another one?
Sorry for any inconvenience.
This behavior is by design. AFAIK, there is no way/feature to combine multiple queued build pipeline runs into one.
That said, I personally think your request is reasonable. You could add a request for this feature on our UserVoice site (https://developercommunity.visualstudio.com/content/idea/post.html?space=21), which is our main forum for product suggestions. Thank you for helping us build a better Azure DevOps.
Hope this helps.

Stop running Azure Data Factory Pipeline when it is still running

I have an Azure Data Factory pipeline. My trigger has been set to run every 5 minutes.
Sometimes my pipeline takes more than 5 minutes to finish its jobs. In that case, the trigger fires again and creates another instance of my pipeline, and two instances of the same pipeline running at once cause problems in my ETL.
How can I be sure that just one instance of my pipeline runs at a time?
As you can see, there are several running instances of my pipeline.
A few options I can think of:
OPT 1
Specify a 5-minute timeout on your pipeline activities:
https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities#activity-policy
OPT 2
1) Create a 1-row, 1-column SQL RunStatus table: 1 will be our "completed" status, 0 "running".
2) At the end of your pipeline add a stored procedure activity that would set the bit to 1.
3) At the start of your pipeline add a lookup activity to read that bit.
4) The output of this lookup will then be used in an If Condition activity:
If 1 - start the pipeline's job, but before that add another stored procedure activity to set our status bit to 0.
If 0 - depending on the details of your project: do nothing, add a wait activity, send an email, etc.
To make full use of this option, you can turn the table into a log, where a new line with start and end times is added after each successful run (before initiating a new run, you can check whether the previous run has an end time). Having this log might help you gather data on how long your pipeline takes to run, so you can either add more resources or increase the interval between runs.
OPT 3
Monitor the pipeline run with SDKs (I have not tried that, so this is just to point you in a possible direction):
https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically
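The linked page describes the SDK and REST options; as a quick command-line sketch in the same spirit (this assumes the az datafactory CLI extension and uses placeholder resource names):

# install the Data Factory extension for the Azure CLI (one-time)
az extension add --name datafactory
# check the status of a specific pipeline run by its run ID
az datafactory pipeline-run show --resource-group my-rg --factory-name my-adf --run-id <run-id>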
Hopefully you can use at least one of them.
It sounds like you're trying to run a process more or less constantly, which is a good fit for tumbling window triggers. You can create a dependency such that the trigger is dependent on itself - so it won't run until the previous run has completed.
Start by creating a trigger that runs a pipeline on a tumbling window, then create a tumbling window trigger dependency. The section at the bottom of that article discusses "tumbling window self-dependency properties", which shows you what the code should look like once you've successfully set this up.
Try changing the concurrency of the pipeline to 1.
Link: https://www.datastackpros.com/2020/05/prevent-azure-data-factory-from-running.html
My first thought is that the recurrence is too frequent under these circumstances. If the graph you shared is all for the same pipeline, then most runs take close to 5 minutes, but you have some that take 30, 40, even 60 minutes. Situations like this are where a simple recurrence trigger probably isn't sufficient. What is supposed to happen while the 60-minute run is in progress? There will be 10-12 runs that won't start: do they still need to run, or can they be ignored?
To make sure all the pipeline runs happen, and to manage concurrency, you're going to need to build a queue manager of some kind. ADF cannot handle this itself, so I have built such a system internally and rely on it extensively. I use a combination of Logic Apps, Stored Procedures (Azure SQL), and Azure Functions to queue, execute, and monitor pipeline executions. Here is a high-level breakdown of what you probably need:
Logic App 1: runs every 5 minutes and queues an ADF job in the SQL database.
Logic App 2: runs every 2-3 minutes and checks the queue to see whether a) there is no job currently running (status = 'InProgress') and b) there is a job in the queue waiting to run (I do this with a Stored Procedure). If both conditions are met: execute the next ADF pipeline and update its status to 'InProgress'.
I use an Azure Function to submit jobs instead of the built-in Logic App activity because I have better control over variable parameters. Also, it can return the newly created ADF RunId, which I rely on in Logic App 3.
Logic App 3: runs every minute and updates the status of any 'InProgress' jobs.
I use an Azure Function to check the status of the ADF pipeline based on RunId.

Wait for all LSF jobs with given name, overriding JOB_DEP_LAST_SUB = 1

I've got a large computational task, consisting of several steps, that I run on a PC cluster, managed by LSF.
Part of this task involves launching several parallel jobs with identical names. The jobs are somewhat different, so it is hard to transform them into a job array.
The next step of this computation, following these jobs, summarizes their results, therefore it must wait until all of them are finished.
I'm trying to use the -w ended(job-name) command-line switch for bsub, as usual, to specify job dependencies.
However, admins of the cluster have set JOB_DEP_LAST_SUB = 1 in lsb.params.
According to the LSF manual, this makes LSF wait only for the single most recent job with the supplied name to complete, instead of all such jobs.
Is it possible to override this behavior for my task only, without asking the admins to reconfigure the whole cluster (this cluster is used by many people, so it is very unlikely that they would agree)?
I cannot find any clues in the manual.
Looks like it cannot be overridden.
I've changed the job names to make them unique by appending a random value, then changed the dependency condition to -w "ended(job-name-*)".
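A rough sketch of that workaround (the job names and scripts below are invented; the random suffix and the wildcard pattern are the key parts):

# give every parallel job a unique name that shares a common prefix
bsub -J "mystep-$RANDOM" ./run_variant_a.sh
bsub -J "mystep-$RANDOM" ./run_variant_b.sh
# the wildcard in the dependency expression matches all jobs with that prefix,
# so the summary job waits for every one of them, not just the most recently submitted
bsub -J "summary" -w "ended(mystep-*)" ./summarize.sh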

How to run five agent jobs simultaneously in VSTS (Azure DevOps)?

I have created a release pipeline with five agent jobs and I want to start all five jobs at the same time.
example:
In the example I need to start all agent jobs simultaneously, each executing its own task (wait 10 seconds) at the same time.
Does VSTS (Azure DevOps) have an option to do this?
You could also just use 5 different stages (depending on what exactly it is you're doing). Then you can leverage the full power of the pipeline model: have pre and post stages, whatever you wish. As mentioned in the other answers, this is also possible with different agent jobs, but stages are more straightforward. Also, you can easily clone stages.
I'm not sure what you're trying to achieve with waiting for 10 seconds, but this is very easy to do with a PowerShell step. Select the radio button "Inline" and type this:
Start-Sleep -Seconds 10
Here is an example of a pipeline that might do the simultaneous work you want, but keep in mind that each agent job (whether it's multiple jobs in one stage or multiple single-job stages) has to find an agent that is capable, available, and idle; otherwise the job(s) will wait in a queue!
In the release pipeline click on "Agent job", then expand the "Execution plan" and click on "Multi-agent".
I think you need to create 5 stages, since for release pipelines in Azure DevOps, jobs within one stage cannot be run in parallel. See the documentation from Microsoft.
Or if you want to run the same set of tasks on multiple agents, you could use the Multi-agent option as shown below.
ADO Multi-agent option
If you want a job to be executed in parallel, then choose the multi-agent configuration; but if you have 5 (very) different jobs, then you can choose "Even if a previous job has failed" from the "Run this job" dropdown.
This is by default set to "Only when all previous jobs have succeeded", which means that:
All of your 5 jobs will be executed sequentially in the order that you've set them up
The chain of jobs will come to a stop as soon as one of the jobs fails
Take note that you can specify individually which agent queue each job will execute on; by default they all go to the same queue. If you run 5 jobs in parallel on a single queue, then that queue should have 5 agents available and idle to get what you're expecting.

Setting up a Job Schedule

I currently have a setup that creates a job and then collects some metrics about the tasks in the job. I want to do something similar, but by setting up a job schedule instead. In particular, I want to set up a job schedule that wakes up at a recurrence interval that I specify and runs the same code that I was running when creating a job. What's the best way to go about doing that?
It seems that there is a CloudJobSchedule that I could use to set up my job schedule, but this only lets me create, say, a job manager task and specify a few properties. How can I run external code on the jobs created by the job schedule?
It would also help to clarify how the CloudJobSchedule works. Specifically, after I commit my job schedule, what happens programmatically? Does execution just move on sequentially and run the rest of the code? In that case, does it make sense to get a reference to the last job created by the job schedule and run code on the job returned?
You'll want to create a CloudJobSchedule. You can specify the recurrence in the Schedule.
If you only need to run a single task per recurrence, your job manager task can simply be the task you need to run. If you need to run multiple tasks per job recurrence, your job manager needs to have logic to submit tasks to Batch and monitor for completion (if necessary).
When you submit a job schedule to Batch, your client side code will continue running. The behavior is no different than if you were submitting a regular job. You can retrieve the last job run via JobScheduleExecutionInformation and the RecentJob property.
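If it helps to see the overall shape, here is a rough sketch using the az batch CLI rather than the .NET SDK classes the answer refers to. The schedule ID, pool name, and command line are placeholders, and the JSON is an assumption based on the Batch "job schedule add" request body; it assumes you are already authenticated against your Batch account.

# write a hypothetical job schedule spec that recurs every 15 minutes
cat > schedule.json <<'EOF'
{
  "id": "metrics-schedule",
  "schedule": { "recurrenceInterval": "PT15M" },
  "jobSpecification": {
    "poolInfo": { "poolId": "my-pool" },
    "jobManagerTask": { "id": "job-manager", "commandLine": "python collect_metrics.py" }
  }
}
EOF
# create the schedule; Batch will then create a new job on every recurrence
az batch job-schedule create --json-file schedule.json
# later, inspect the most recently created job for this schedule
az batch job-schedule show --job-schedule-id metrics-schedule --query executionInfo.recentJob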