Is there any way to trigger a monthly Glue job after a daily Glue job? - workflow

Let's say there are three Glue jobs: A, B, and C.
A and B are daily jobs, and C is a monthly job.
I want these jobs to be executed in the following order:
A (daily) -> B (daily, but only when A succeeds) -> C (monthly, but only when A and B succeed)
C should not run when B has failed.
Is there any way to do this easily and safely?

You can use Glue Workflows. Just create two workflows: one with jobs A and B, running every day, and a second with jobs A, B, and C, running monthly.
While creating the workflows, you can configure job B or C to run only when the previous job succeeds.
For more information, read this.
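For example, using boto3, the conditional trigger that gates job B on job A inside the daily workflow might look roughly like this (a sketch; the workflow, job, and trigger names are made up, and the monthly workflow would get a similar trigger gating C on both A and B):

import boto3

glue = boto3.client("glue")

# Conditional trigger inside the daily workflow: start job B only after job A succeeds
glue.create_trigger(
    Name="run-B-after-A",
    WorkflowName="daily-workflow",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Logical": "AND",
        "Conditions": [
            {"LogicalOperator": "EQUALS", "JobName": "A", "State": "SUCCEEDED"}
        ],
    },
    Actions=[{"JobName": "B"}],
)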

Related

How to create a Teamcity build trigger that will run job B once per week, after job A finished, where A runs daily

I have some Teamcity jobs created.
One of those jobs, let's call it job A, has a schedule trigger, running daily at 7:00 am.
Now I have another one, job B, that I want to run once per week, but only after job A ran.
Given that job A takes about 30 seconds to run, I know I can create a schedule trigger for job B that will run every Monday at 07:10 am.
I also know I can create a Finish Build Trigger, making sure that job B runs after job A, but then B will run every day (because job A needs to run every day).
I'm trying to find a way to combine these and come up with some sort of trigger that does something like this:
run job B once per week (say Monday morning), after job A has run.
Could someone nudge me in the right direction? Or explain to me if/why what I'd like to do is a no-no. Thanks
It looks like the feature called Snapshot Dependency fits well into your scenario.
In short, you can link the two jobs with a snapshot dependency. In your case, job B will "snapshot-depend" on job A. In my experience, it works best if both jobs use the same VCS root, that is, work with the same repository.
Job A is configured to run daily and job B is configured to run weekly (via regular scheduled triggers). When job A is triggered, it doesn't affect job B at all. On the other hand, when job B is triggered, it checks whether there's a suitable build of A by that time. If the two jobs work with the same repo and the Enforce revision synchronization flag is ON, this means it will look for a build of A of that same source code revision.
If there's a suitable build of A, it won't trigger a new one and will just build B. If there's no suitable build of A, it will first trigger A, and then trigger the build of B.
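If you keep your TeamCity configuration in the Kotlin DSL, the dependency part of job B might look roughly like this (a sketch; JobA and JobB are illustrative object names, and the weekly schedule trigger on B stays as you already have it):

import jetbrains.buildServer.configs.kotlin.v2019_2.*

object JobB : BuildType({
    name = "Job B"
    dependencies {
        // Snapshot-depend on job A; a suitable finished build of A is reused if one exists
        snapshot(JobA) {
            // Don't start B at all if the matching build of A failed
            onDependencyFailure = FailureAction.FAIL_TO_START
        }
    }
})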

Combine multiple queued Azure DevOps build pipeline jobs into one run

I have a custom Agent Pool with multiple Agents, each with the same capabilities. This Agent Pool is used to run many YAML build pipeline jobs, call them A1, A2, A3, etc. Each of those A* jobs triggers a different YAML build pipeline job called B. In this scheme, multiple simultaneous completions of A* jobs will trigger multiple simultaneous B jobs. However, the B job is set up to self-interlock, so that only one instance can run at a time. The nice thing is that when the B job runs, it consumes all of the existing A* outputs (for safety reasons, A* and B are also interlocked).
Unfortunately, this means that of the multiple simultaneous B jobs, most will be stuck waiting for the first to finish after it has processed all of the outputs of the completed A* jobs; only then can the rest of the queued (or running but blocked on the interlock) instances of the B job continue, one at a time, each having nothing to consume because all of the A* outputs have already been processed.
Is there a way to make Azure DevOps batch multiple instances of job B together? In other words, if there is already one B job instance running or queued, don't add another one?
Sorry for any inconvenience.
This behavior is by design. AFAIK, there is no way/feature to combine multiple queued runs of a build pipeline into one.
That said, I personally think your request is reasonable. You could add a request for this feature on our UserVoice site (https://developercommunity.visualstudio.com/content/idea/post.html?space=21), which is our main forum for product suggestions. Thank you for helping us build a better Azure DevOps.
Hope this helps.

Running two instances of the ScheduledThreadPoolExecutor

I have a number of asynchronous tasks to run in parallel. The tasks can be divided into two types: let's call the time-consuming ones type A, and everything else (the faster, quick-to-execute ones) type B.
With a single ScheduledThreadPoolExecutor of pool size x, eventually all threads end up busy executing type A tasks; as a result, type B tasks get blocked and delayed.
What I'm trying to accomplish is to run type A tasks in parallel with type B tasks, and I want tasks within each group to run in parallel as well, for performance.
Do you think it's prudent to have two instances of ScheduledThreadPoolExecutor, one for type A and one for type B, each with its own thread pool? Do you see any issues with this approach?
No, that seems reasonable.
I am doing something similar, i.e. I need to execute tasks serially depending on some id, e.g. all the tasks for the component with id="1" need to be executed serially with respect to each other, and in parallel with all other tasks for components with different ids.
So basically I need a separate queue of tasks for each component; the tasks are pulled one after another from each specific queue.
In order to achieve that I use
Executors.newSingleThreadExecutor(new JobThreadFactory(componentId));
for each component.
Additionally, I need an ExecutorService for a different type of task that is not bound to componentIds; for that I create an additional ExecutorService instance:
Executors.newFixedThreadPool(DEFAULT_THREAD_POOL_SIZE, new JobThreadFactory());
This works fine for my case at least.
The only problem I can think of is if there is a need for ordered execution of the tasks, i.e.
task2 NEEDS to be executed after task1, and so on... But I doubt that's the case here.
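For the original question's setup, a minimal sketch of the two-pool approach (pool sizes, rates, and task bodies are placeholders):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TwoPoolScheduler {

    // Dedicated pool for the slow type A tasks
    private static final ScheduledExecutorService typeAPool =
            Executors.newScheduledThreadPool(4);

    // Separate pool for the quick type B tasks, so type A can never starve them
    private static final ScheduledExecutorService typeBPool =
            Executors.newScheduledThreadPool(2);

    public static void main(String[] args) {
        typeAPool.scheduleAtFixedRate(
                () -> System.out.println("slow type A task"), 0, 30, TimeUnit.SECONDS);
        typeBPool.scheduleAtFixedRate(
                () -> System.out.println("quick type B task"), 0, 1, TimeUnit.SECONDS);
    }
}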

How to synchronize tasks in different dispatch queues?

I'm new to queues and I'm having some trouble setting up the following scheme.
I have three tasks that need doing.
Task A: Can only run on the main queue, can run asynchronously with task B, cannot run asynchronously with task C. Runs a lot but runs fairly quickly.
Task B: Can run on any queue, can run asynchronously with task A, cannot run asynchronously with task C. Runs rarely, but takes a long time to run. Needs Task C to run afterwards, but once again task C cannot run asynchronously with task A.
Task C: Can run on any queue. Cannot run asynchronously with either task A or task B. Runs rarely and runs quickly.
Right now I have it like this:
Task A is submitted to the main queue by a Serial Queue X (a task is submitted to Serial Queue X to submit task A to the main queue).
Task B is submitted to Serial Queue X.
Task C is submitted to the main queue by Serial Queue X, just like task A.
The problem here is that task C sometimes runs at the same time as task B. The main queue sometimes runs task C at the same time that the serial queue runs task B.
So, how can I ensure that task B and task C never run at the same time while still allowing A and B to run at the same time and preventing A and C from running at the same time? Further, is there any easy way to make sure they run the same number of times? (alternating back and forth)
You know, I think I had this problem on my GRE, only A, B, and C were Bob, Larry, and Sue and they all worked at the same office.
I believe that this can be solved with a combination of a serial queue and a dispatch semaphore. If you set up a single-wide serial dispatch queue and submit tasks B and C to that, you'll guarantee that they won't run at the same time. You can then use a dispatch semaphore with a count set to 1 that is shared between tasks A and C to guarantee that only one of them will run at a time. I describe how such a semaphore works in my answer here. You might need to alter that code to use DISPATCH_TIME_FOREVER so that task A is held up before submission rather than just tossed aside if C is running (likewise for submission of C).
This way, A and B will be running on different queues (the main queue and your serial queue) so they can execute in parallel, but B and C cannot run at the same time due to their shared queue, nor can A and C because of the semaphore.
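Concretely, a minimal Swift sketch of that arrangement (the original answer predates the Swift Dispatch overlay; queue labels and task bodies are placeholders):

import Dispatch

let serialQueueBC = DispatchQueue(label: "com.example.tasks.bc") // B and C share this serial queue
let semaphoreAC = DispatchSemaphore(value: 1)                    // A and C share this semaphore

func runTaskA() {
    DispatchQueue.main.async {
        semaphoreAC.wait()                // held up (not skipped) while C is running
        defer { semaphoreAC.signal() }
        // ... task A's work ...
    }
}

func runTaskB() {
    serialQueueBC.async {
        // ... task B's work ...
        runTaskC()                        // B needs C to run afterwards
    }
}

func runTaskC() {
    serialQueueBC.async {
        semaphoreAC.wait()                // exclude A while C runs
        defer { semaphoreAC.signal() }
        // ... task C's work ...
    }
}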
As far as load balancing on A and C goes (what I assume you want to balance), that's probably going to be fairly application-specific, and might require some experimentation on your part to see how to interleave actions properly without wasting cycles. I'd also make sure that you really need them to alternate evenly, or if you can get by with one running slightly more than another.
Did you check out NSOperation to synchronize your operations? You can handle dependencies there.
There's a much simpler way, of course, assuming that C must always follow A and B, which is to have A and B schedule C as a completion callback for their own operations (and have C check to make sure it's not already running, in case A and B both ask for it to happen simultaneously). The completion callback pattern (described in the dispatch_async man page) is very powerful and a great way of serializing async operations that nonetheless need to be coupled.
Where the problem is A, B, C, D and E where A-D can run async and E must always run at the end, dispatch groups are a better solution since you can set E to run as the completion callback for the entire group and then simply put A-E in that group.

Condor job using DAG with some jobs needing to run on the same host

I have a computation task which is split into several individual program executions, with dependencies. I'm using Condor 7 as the task scheduler (with the Vanilla Universe, due to constraints on the programs beyond my reach, so no checkpointing is involved), so a DAG looks like a natural solution. However, some of the programs need to run on the same host. I could not find a reference on how to do this in the Condor manuals.
Example DAG file:
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
I need to express that B and D need to be run on the same computer node, without breaking the parallel execution of B and C.
Thanks for your help.
Condor doesn't have any simple solutions, but there is at least one kludge that should work:
Have B leave some state behind on the execute node, probably in the form of a file, that says something like "MyJobRanHere=UniqueIdentifier". Use the STARTD_CRON support to detect this and advertise it in the machine ClassAd. Have D use Requirements = (MyJobRanHere == "UniqueIdentifier"). As part of D's final cleanup (or perhaps in a new node E), remove the state. If you're running large numbers of jobs through, you'll probably need to clean out left-over state occasionally.
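A rough sketch of the startd cron piece of that kludge, assuming a small helper script that checks for B's state file and prints the attribute (the tag, script path, and period are illustrative; see the STARTD_CRON documentation for the exact output format expected from the script):

# condor_config on the execute nodes (sketch)
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) RANHERE
STARTD_CRON_RANHERE_EXECUTABLE = /usr/local/bin/ranhere_probe.sh
STARTD_CRON_RANHERE_PERIOD = 5m
# ranhere_probe.sh prints something like:  MyJobRanHere = "UniqueIdentifier"

# and D's submit description then carries:
Requirements = (MyJobRanHere == "UniqueIdentifier")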
I don't know the answer but you should ask this question on the Condor Users mailing list. The folks who support the DAG functionality in Condor monitor it and will respond. See this page for subscription information. It's fairly low traffic.
It's generally fairly difficult to keep two jobs together on the same host in Condor without locking them to a specific host in advance, DAG or no DAG. I actually can't think of a really viable way to do this that would let B start before C or C start before B. If you were willing to enforce that B must always start before C, you could make part of the work that Job B does when it starts running be to modify the Requirements portion of Job C's ClassAd so that it has a "Machine == <hostname>" string, where <hostname> is the name of the machine B landed on. This would also require that Job C be submitted held, or not submitted at all until B was running; B would also have to release it as part of its startup work.
That's pretty complicated...
So I just had a thought: you could use Condor's dynamic startd/slots features and collapse your DAG to achieve what you want. In your DAG where you currently have two separate nodes, B and C, you would collapse this down into one node B' that would run both B and C in parallel when it starts on a machine. As part of the job requirements you note that it needs 2 CPUs on a machine. Switch your startd's to use the dynamic slot configuration so machines advertise all of their resources and not just statically allocated slots. Now you have B and C running concurrently on one machine always. There are some starvation issues with dynamic slots when you have a few multi-CPU jobs in a queue with lots of single-CPU jobs, but it's at least a more readily solved problem.
Another option is to tag B' with a special job attribute:
MultiCPUJob = True
And target it just at slot 1 on machines:
Requirements = Slot == 1 && ...your other requirements...
And have a static slot startd policy that says, "If a job with MultiCPUJob=True tries to run on slot 1 on me, preempt any job that happens to be in slot 2 on this machine, because I know this job will need 2 cores/CPUs."
This is inefficient but can be done with any version of Condor past 6.8.x. I actually use this type of setup in my own statically partitioned farms so if a job needs a machine all to itself for benchmarking it can happen without reconfiguring machines.
If you're interested in knowing more about that preemption option let me know and I can point you to some further configuration reading in the condor-user list archives.
The solution here is to use the fact that you can modify submit descriptions even while DAGMan is running as long as DAGMan has not yet submitted the node. Assume a simple DAG of A -> B -> C. If you want all nodes to run on the same host you can do the following:
1. Define a POST script on node A.
2. The post script searches condor_history for the ClusterId of the completed node A. Something like condor_history -l -attribute LastRemoteHost -m1 $JOB_ID ... You'll need to clean up the output and whatnot, but you'll be left with the host that ran node A.
3. The post script then searches for and modifies the dependent jobs' submit files, inserting a job requirement at the top of each submit file (sketched below). Just make sure you build your job requirements incrementally so that they pick up this new requirement if it is present.
4. When the post script completes, DAGMan will then look to submit ready nodes, of which in this example we have one: B. The submission of B will now be done with the new requirement you added in step 3, so that it will run on the same execute host as A.
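As a sketch, what step 3 inserts at the top of B.condor might look like this (the hostname is illustrative; it is whatever condor_history reported for node A):

# inserted by A's POST script
Requirements = (Machine == "exec-node-07.example.com")
# B's remaining requirements are then ANDed on incrementally, e.g.:
Requirements = $(Requirements) && (OpSys == "LINUX")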
I do this currently with numerous jobs. It works great.