I have a custom Agent Pool with multiple Agents, each with the same capabilities. This Agent Pool is used to run many YAML build pipeline jobs, call them A1, A2, A3, etc. Each of those A* jobs triggers a different YAML build pipeline job called B. In this scheme, multiple simultaneous completions of A* jobs will trigger multiple simultaneous B jobs. However, the B job is set up to self-interlock, so that only one instance can run at a time. The nice thing is that when a B job runs, it consumes all of the existing A* outputs (for safety reasons, A* and B are also interlocked).
Unfortunately, this means that of the multiple simultaneous B jobs, most will be stuck waiting for the first one to finish processing all of the outputs of the completed A* jobs. Only then can the remaining B instances (queued, or running but blocked on the interlock) continue one at a time, each with nothing to consume because all of the A* outputs have already been processed.
Is there a way to make Azure DevOps batch together multiple instances of job B? In other words, if there is already one B job instance running or queued, don't add another one?
Sorry for any inconvenience.
This behavior is by design. AFAIK, there is no way/feature to combine multiple queued builds of a pipeline into one.
That said, I personally think that your request is reasonable. You could add your request for this feature on our UserVoice site (https://developercommunity.visualstudio.com/content/idea/post.html?space=21 ), which is our main forum for product suggestions. Thank you for helping us build a better Azure DevOps.
Hope this helps.
I have some Teamcity jobs created.
One of those jobs, let's call it job A, has a schedule trigger, running daily at 7:00 am.
Now I have another one, job B, that I want to run once per week, but only after job A ran.
Given that job A takes about 30 seconds to run, I know I could create a schedule trigger for job B that runs every Monday at 07:10 am.
I also know I can create a Finish Build Trigger, making sure that job B runs after job A, but then it would run every day (because job A needs to run every day).
I'm trying to find a way to combine these, and come up with some sort of trigger that does something like this:
runs job B once per week (say, Monday morning), after job A has run.
Could someone nudge me in the right direction? Or explain to me if/why what I'd like to do is a no-no. Thanks
It looks like the feature called Snapshot Dependency fits well into your scenario.
In short, you can link the two jobs with a snapshot dependency. In your case, job B will "snapshot-depend" on job A. In my experience, it works best if both jobs use the same VCS root, that is, work with the same repository.
Job A is configured to run daily and job B is configured to run weekly (via regular schedule triggers). When job A is triggered, it doesn't affect job B at all. On the other hand, when job B is triggered, it checks whether there is a suitable build of A by that time. If the two jobs work with the same repo and the Enforce revision synchronization flag is ON, this means it will look for a build of A from that same source code revision.
If there's a suitable build of A, it won't trigger a new one and will just build B. If there's no suitable build of A, it will first trigger A, and then trigger the build of B.
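If you configure your builds through TeamCity's versioned-settings Kotlin DSL instead of the UI, the wiring might look roughly like this. Treat it as a sketch: the object names, the DSL package version, and the empty schedule block are assumptions to replace with your own settings.

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.schedule

object JobA : BuildType({
    name = "Job A"
    // the existing daily schedule trigger stays as it is
})

object JobB : BuildType({
    name = "Job B"
    triggers {
        schedule {
            // weekly policy, e.g. Monday 07:10 -- fill in your own schedule
        }
    }
    dependencies {
        // Snapshot dependency: when B is triggered, TeamCity reuses a suitable
        // finished build of A (same revision when revision sync is enforced),
        // or queues A first and then runs B.
        snapshot(JobA) {
        }
    }
})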
Activity tasks are pretty easy to understand since it's executing an activity...but what is a decision task? Does the worker run through the workflow from beginning (using records of completed activities) until it hits the next "meaningful" thing it needs to do while making a "decision" on what needs to be done next?
My Opinions
Ideally users don't need to understand it!
However, the decision/workflow task is a leaked technical detail of the Cadence/Temporal API.
Unfortunately, you won't be able to use Cadence/Temporal well if you don't fully understand it.
Fortunately, using iWF will keep you away from that leakage. iWF provides a nice abstraction on top of Cadence/Temporal while keeping the same power.
TL;DR
Decision is short for workflow decision.
A decision is a movement from one state to another in a workflow state machine. Essentially, your workflow code defines a state machine. This state machine must be a deterministic state machine for replay, so workflow code must be deterministic.
A decision task is a task for a worker to execute workflow code and generate decisions.
NOTE: in Temporal, a decision is called a "command", and the decision task is called a "workflow task", which generates the commands.
Example
Let's say we have this workflow code:
public String sampleWorkflowMethod(...) {
    // Call activityA and branch on its result.
    var result = activityStubs.activityA(...);
    if (result.startsWith("x")) {
        // Durable timer provided by the SDK (not Thread.sleep).
        Workflow.sleep(...);
    } else {
        result = activityStubs.activityB(...);
    }
    return result;
}
From the Cadence/Temporal SDK's point of view, the code is a state machine.
Assume we have an execution where the result of activityA is xyz, so the execution goes into the sleep branch.
Then the workflow execution flow is like this graph.
Workflow code defines the state machine, and it's static.
Workflow execution decides how to move from one state to another at run time, based on the inputs, results, and code logic.
A decision is an abstraction internal to Cadence. During workflow execution, when the workflow moves from one state to another, the decision is the result of that movement.
The abstraction basically defines what needs to be done when execution moves from one state to another: schedule an activity, a timer, a child workflow, etc.
The decision needs to be deterministic: with the same input/result, the workflow code must make the same decision (e.g., whether to schedule activityA or activityB must be the same).
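To make the determinism requirement concrete, here is a hypothetical sketch in the style of the example above (activityStubs and the branch condition are made up for illustration; Workflow.currentTimeMillis comes from the Cadence/Temporal Java SDK's Workflow class and is the replay-safe clock):

public String deterministicBranchExample(String input) {
    // BAD: System.currentTimeMillis() returns a different value on replay,
    // so the generated decision (schedule activityA vs activityB) could change:
    //   boolean flag = System.currentTimeMillis() % 2 == 0;

    // OK: Workflow.currentTimeMillis() is recorded in the workflow history and
    // returns the same value when the decision task is replayed.
    boolean flag = Workflow.currentTimeMillis() % 2 == 0;
    return flag ? activityStubs.activityA(input)
                : activityStubs.activityB(input);
}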
Timeline in the example
What happens during the above workflow execution:
The Cadence service schedules the very first decision task and dispatches it to a workflow worker.
The worker executes the first decision task and returns the decision result of scheduling activityA to the Cadence service. The workflow then stays there waiting.
As a result of scheduling activityA, an activity task is generated by the Cadence service and dispatched to an activity worker.
The activity worker executes the activity and returns the result xyz to the Cadence service.
On receiving the activity result, the Cadence service schedules the second decision task and dispatches it to a workflow worker.
The workflow worker executes the second decision task and responds with the decision result of scheduling a timer.
On receiving the decision task response, the Cadence service schedules a timer.
When the timer fires, the Cadence service schedules the third decision task and dispatches it to a workflow worker again.
The workflow worker executes the third decision task and responds with the result of completing the workflow execution successfully with result xyz.
Some more facts about decision
A workflow decision orchestrates the other entities: activities, child workflows, timers, etc.
A decision (workflow) task is how the worker communicates with the Cadence service, telling it what to do next: for example, start/cancel some activities, or complete/fail/continueAsNew a workflow.
There is always at most one outstanding (running/pending) decision task for each workflow execution. It's impossible to start one while another has started but not yet finished.
The nature of the decision task leads to some non-determinism issues when writing Cadence workflows. For more details you can refer to the article.
On each decision task, the Cadence client SDK can start from the very beginning and "replay" the code, for example executing activityA. However, this replay mode won't generate the decision of scheduling activityA again, because the client knows that activityA has already been scheduled.
However, a worker doesn't have to run the code from the very beginning. The Cadence SDK is smart enough to keep the state in memory and wake up later to continue from the previous state. This is called the "workflow sticky cache", because a workflow sticks to a worker host for a period.
History events of the example:
1. WorkflowStarted
2. DecisionTaskScheduled
3. DecisionTaskStarted
4. DecisionTaskCompleted
5. ActivityTaskScheduled <this schedules activityA>
6. ActivityTaskStarted
7. ActivityTaskCompleted <this records the results of activityA>
8. DecisionTaskScheduled
9. DecisionTaskStarted
10. DecisionTaskCompleted
11. TimerStarted <this schedules the timer>
12. TimerFired
13. DecisionTaskScheduled
14. DecisionTaskStarted
15. DecisionTaskCompleted
16. WorkflowCompleted
TL;DR: When a new external event is received, a workflow task is responsible for determining which commands to execute next.
Temporal/Cadence workflows are executed by an external worker, so the only way to learn which next steps a workflow has to take is to ask it every time new information is available. The only way to dispatch such a request to a worker is to put a workflow task into a task queue. The workflow worker picks it up, gets the workflow out of its cache, and applies the new events to it. After the new events are applied, the workflow executes, producing a new set of commands. Once the workflow code is blocked and cannot make any forward progress, the workflow task is reported as completed back to the service. The list of commands to execute is included in the completion request.
Does the worker run through the workflow from beginning (using records of completed activities) until it hits the next "meaningful" thing it needs to do while making a "decision" on what needs to be done next?
This depends on whether the worker has the workflow object in its LRU cache. If the workflow is in the cache, no recovery is needed and only the new events are included in the workflow task. If the object is not cached, the whole event history is shipped and the worker has to execute the workflow code from the beginning to get it to its current state. All commands produced while replaying past events are duplicates of previously produced commands and are ignored.
The above means that during the lifetime of a workflow, multiple workflow tasks have to be executed. For example, for a workflow that calls two activities in sequence:
a();
b();
The tasks will be executed for every state transition:
-> workflow task at the beginning: command is ScheduleActivity "a"
a();
-> workflow task when "a" completes: command is ScheduleActivity "b"
b();
-> workflow task when "b" completes: command is CompleteWorkflowExecution
In the answer, I used the terminology adopted by the temporal.io fork of Cadence. Here is how the Cadence concepts map to the Temporal ones:
decision task -> workflow task
decision -> command, but it can also mean workflow task in some contexts
task list -> task queue
I have created a release pipeline with five agent jobs and I want to start all five jobs at the same time.
Example:
In this example I need to start all agent jobs simultaneously, each executing its unique task (wait 10 seconds) at the same time.
Does VSTS (Azure DevOps) have an option to do this?
You could also just use 5 different stages (depending on what exactly it is you're doing). Then you can leverage the full power of the pipeline model and have pre and post stages, or whatever you wish. As mentioned in the other answers, this is also possible with different agent jobs, but stages are more straightforward. Also, you can easily clone stages.
I'm not sure what you're trying to achieve by waiting for 10 seconds, but this is very easy to do with a PowerShell step. Select the "Inline" radio button and type this:
Start-Sleep -Seconds 10
Here is an example of a pipeline that might do the simultaneous work you want. Keep in mind that each agent job (whether multiple jobs in one stage or multiple single-job stages) has to find an agent that is capable, available, and idle; otherwise the job(s) will wait in a queue.
In the release pipeline click on "Agent job", then expand the "Execution plan" and click on "Multi-agent".
I think you need to create 5 stages, since for release pipelines in Azure DevOps, jobs in one stage cannot be run in parallel. See the documentation from Microsoft.
Or, if you want to run the same set of tasks on multiple agents, you could use the Multi-agent option as shown below.
(Screenshot: ADO Multi-agent option)
If you want a job to be executed in parallel then choose multi-agent configuration, but if you have 5 (very) different jobs then you can choose "Even if a previous job has failed" from the dropdown "Run this job".
This is by default set to "Only when all previous jobs have succeeded" which means that:
All of your 5 jobs will be executed sequentially in the order that you've set them up
The chain of jobs will come to a stop as soon as one of the jobs fails
Note that you can specify individually which agent queue each job will run on; by default they all go to the same queue. If you run 5 jobs in parallel on a single queue, that queue should have 5 agents available and idle to get what you're expecting.
Is it possible in Concourse to limit a run to a single job inside the pipeline? Let's say I have a pipeline with three jobs, but I want to test just job #2, not 1 and 3. I tried to trigger the job by pointing to a pipeline/job-name and it kind of worked (i.e., fly -t lab tj -j bbr-backup-bosh/export-om-installation). 'Kind of' because it did start from this job, but then it fired off other jobs that I didn't want to test anyway. Wondering if there is an Ansible-like option (i.e., --tag).
Thanks!!
You cannot "limit" a triggered job to itself, since a job is part of a pipeline. Each time you trigger a job, it will still put all the resources it uses, and these resources, if marked as trigger: true downstream, will trigger the downstream jobs.
You have two possibilities:
Do not mark any resource in the pipeline as trigger: true. This obviously also means that your pipeline will never advance automatically; you will need to manually trigger each job. Not ideal, but maybe good enough while troubleshooting the pipeline itself.
Think in terms of tasks. A job is made of one or more tasks, and tasks can be run independently from the pipeline. See the documentation for fly execute and, for example, https://concoursetutorial.com/ where they explain tasks and fly execute. Note that fly execute also supports --input and --output, so it is possible to emulate the task's inputs and outputs as if it were in the pipeline (see the sketch below).
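For instance, a rough sketch of running just the task from job #2 locally (the task config path and the input/output names are made up; substitute your own):

fly -t lab execute \
  -c ci/tasks/export-om-installation.yml \
  -i some-input=./local-input \
  -o some-output=./local-output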
Marco is pretty dead on, but there's one other option: you could pause the other jobs and abort any builds that get triggered after they're unpaused.
I have a computation task which is split into several individual program executions, with dependencies. I'm using Condor 7 as the task scheduler (with the Vanilla Universe; due to constraints on the programs beyond my reach, no checkpointing is involved), so a DAG looks like a natural solution. However, some of the programs need to run on the same host. I could not find a reference on how to do this in the Condor manuals.
Example DAG file:
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
I need to express that B and D need to be run on the same computer node, without breaking the parallel execution of B and C.
Thanks for your help.
Condor doesn't have any simple solutions, but there is at least one kludge that should work:
Have B leave some state behind on the execute node, probably in the form of a file, that says something like MyJobRanHere="UniqueIdentifier". Use the STARTD_CRON support to detect this and advertise it in the machine ClassAd. Have D use Requirements = MyJobRanHere == "UniqueIdentifier". As part of D's final cleanup (or perhaps a new node E), it removes the state. If you're running large numbers of jobs through, you'll probably need to clean out left-over state occasionally.
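A rough sketch of the pieces involved, assuming a marker file dropped by B and a small script that prints the ClassAd attribute (the hook name, script path, and period are assumptions, not a tested recipe):

# condor_config on the execute nodes: a startd cron hook that reads B's marker
# file and prints, e.g., MyJobRanHere = "UniqueIdentifier" so the startd
# advertises it in the machine ClassAd.
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) MARKER
STARTD_CRON_MARKER_EXECUTABLE = /usr/local/bin/advertise_marker.sh
STARTD_CRON_MARKER_PERIOD = 5m

# In D.condor: only match a machine that advertises the marker
# (=?= is the ClassAd meta-equals, which also handles the attribute being undefined).
Requirements = (MyJobRanHere =?= "UniqueIdentifier")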
I don't know the answer but you should ask this question on the Condor Users mailing list. The folks who support the DAG functionality in Condor monitor it and will respond. See this page for subscription information. It's fairly low traffic.
It's generally fairly difficult to keep two jobs together on the same host in Condor without locking them to a specific host in advance, DAG or no DAG. I actually can't think of a really viable way to do this that would let B start before C or C start before B. If you were willing to enforce that B must always start before C, you could make part of the work that Job B does when it starts running be to modify the Requirements portion of Job C's ClassAd so that it has a "Machine == <hostname>" string, where <hostname> is the name of the machine B landed on. This would also require that Job C be submitted held, or not submitted at all until B was running; B would also have to release it as part of its start-up work.
That's pretty complicated...
So I just had a thought: you could use Condor's dynamic startd/slots features and collapse your DAG to achieve what you want. In your DAG where you currently have two separate nodes, B and C, you would collapse this down into one node B' that would run both B and C in parallel when it starts on a machine. As part of the job requirements you note that it needs 2 CPUs on a machine. Switch your startd's to use the dynamic slot configuration so machines advertise all of their resources and not just statically allocated slots. Now you have B and C running concurrently on one machine always. There are some starvation issues with dynamic slots when you have a few multi-CPU jobs in a queue with lots of single-CPU jobs, but it's at least a more readily solved problem.
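A rough sketch of that setup (knob names from HTCondor's partitionable-slot support; check them against your Condor 7.x version, and treat the B' submit line as an example):

# condor_config on the execute nodes: one partitionable slot owning all resources
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE

# In the B'.condor submit file: ask for two CPUs so B and C can run
# side by side on the same machine.
request_cpus = 2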
Another option is to tag B' with a special job attribute:
MultiCPUJob = True
And target it just at slot 1 on machines:
Requirements = Slot == 1 && ...your other requirements...
And have a static slot startd policy that says, "If a job with MultiCPUJob=True tries to run on slot 1 on me, preempt any job that happens to be in slot 2 on this machine, because I know this job will need 2 cores/CPUs".
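On the submit side that might look roughly like this (SlotID is the attribute name in newer Condor versions; verify it against your installation, and the preemption expression itself lives in the startd configuration and is omitted here):

# B'.condor: tag the job and target slot 1 only
+MultiCPUJob = True
Requirements = (SlotID == 1) && ...your other requirements...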
This is inefficient but can be done with any version of Condor past 6.8.x. I actually use this type of setup in my own statically partitioned farms so if a job needs a machine all to itself for benchmarking it can happen without reconfiguring machines.
If you're interested in knowing more about that preemption option let me know and I can point you to some further configuration reading in the condor-user list archives.
The solution here is to use the fact that you can modify submit descriptions even while DAGMan is running, as long as DAGMan has not yet submitted the node. Assume a simple DAG of A -> B -> C. If you want all nodes to run on the same host, you can do the following:
1. Define a POST script on node A.
2. The POST script searches condor_history for the ClusterId of the completed node A. Something like condor_history -l -attribute LastRemoteHost -m1 $JOB_ID ... You'll need to clean up the output and whatnot, but you'll be left with the host that ran node A.
3. The POST script then searches for and modifies the dependent jobs' submit files, inserting a job requirement at the top of each submit file. Just make sure you build your job requirements incrementally so that they pick up this new requirement if it is present.
4. When the POST script completes, DAGMan will then look to submit ready nodes, of which in this example we have one: B. The submission of B will now be done with the new requirement you added in step 3, so that it will run on the same execute host as A.
I do this currently with numerous jobs. It works great.
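For concreteness, the DAG side of this approach might look roughly like the following; the POST script name is made up, and it would do the condor_history lookup and submit-file edits described in steps 2 and 3 above.

# A -> B -> C, all pinned to the host that ran A
JOB A A.condor
JOB B B.condor
JOB C C.condor
PARENT A CHILD B
PARENT B CHILD C
# After A completes, pin_same_host.sh looks up A's LastRemoteHost via
# condor_history and appends a matching Requirements clause to B.condor and C.condor.
SCRIPT POST A pin_same_host.sh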