Wait for all LSF jobs with given name, overriding JOB_DEP_LAST_SUB = 1 - distributed-computing

I've got a large computational task, consisting of several steps, that I run on a PC cluster, managed by LSF.
Part of this task includes launching several parallel jobs with identical names. Jobs are somewhat different, therefore it is hard to transform them to a job array.
The next step of this computation, following these jobs, summarizes their results, therefore it must wait until all of them are finished.
I'm trying to use -w ended(job-name) command line switch for bsub, as usual, to specify job dependencies.
However, admins of the cluster have set JOB_DEP_LAST_SUB = 1 in lsb.params.
According to the LSF manual, this makes LSF to wait for only one most recent job with supplied name to complete, instead of all jobs.
Is it possible to override this behavior for my task only without asking admins to reconfigure the whole cluster (this cluster is used by many people, it is very unlikely that they agree)?
I cannot find any clues in the manual.

Looks like it cannot be overridden.
I've changed job names to make them unique by appending random value, then I've changed condition to -w ended(job-name-*)

Related

How to configure channels and AMQ for spring-batch-integration where all steps are run as slaves on another cluster member

Followup to Configuration of MessageChannelPartitionHandler for assortment of remote steps
Even though the first question was answered (I think well), I think I'm confused enough that I'm not able to ask the right questions. Please direct me if you can.
Here is a sketch of the architecture I am attempting to build. Today, we have a job that runs a step across the cluster that works. We want to extend the architecture to run n (unbounded and different) jobs with n (unbounded and different) remote steps across the cluster.
I am not confusing job executions and job instances with jobs. We already run multiple job instances across the cluster. We need to be able to run other processes that are scalable in hte same way as the one we have defined already.
The source data is all coming from database which are known to the steps. The partitioner is defining the range of data for the "where" clause in the source database and putting that in the stepContext. All of the actual work happens in the stepContext. The jobContext simply serves to spawn steps, wait for completion, and provide the control API.
There will be 0 to n jobs running concurrently, with 0 to n steps from however many jobs running on the slave VM's concurrently.
Does each master job (or step?) require its own request and reply channel, and by extension its own OutboundChannelAdapter? Or are the request and reply channels shared?
Does each master job (or step?) require its own aggregator? By implication this means each job (or step) will have its own partition handler (which may be supported by the existing codebase)
The StepLocator on the slave appears to require a single shared replyChannel across all steps, but it appears to me that the messageChannelpartitionHandler requires a separate reply channel per step.
What I think is unclear (but I can't tell since it's unclear) is how the single reply channel is picked up by the aggregatedReplyChannel and then returned to the correct step. Of course I could be so lost I'm asking the wrong questions.
Thank you in advance
Does each master job (or step?) require its own request and reply channel, and by extension its own OutboundChannelAdapter? Or are the request and reply channels shared?
No, there is no need for that. StepExecutionRequests are identified with a correlation Id that makes it possible to distinguish them.
Does each master job (or step?) require its own aggregator? By implication this means each job (or step) will have its own partition handler (which may be supported by the existing codebase)
That should not be the case, as requests are uniquely identified with a correlation ID (similar to the previous point).
The StepLocator on the slave appears to require a single shared replyChannel across all steps, but it appears to me that the messageChannelpartitionHandler requires a separate reply channel per step.
The messageChannelpartitionHandler should be step or job scoped, as mentioned in the Javadoc (see recommendation in the last note). As a side note, there was an issue with message crossing in a previous version due to the reply channel being instance based, but it was fixed here.

Autosys trigger same DataStage job multiple times with different inovacation IDs

Here is what I am trying to do, not sure if it is possible:
Autosys gets File1:10pm starts DataStage Job 1.1:10pm
Job1.1:10pm is still running
Autosys gets File1:20pm, it needs to start the same Job1 but run it as Job1.1:20pm, even though Job1.1:10pm is still running & not wait for it to finish, go ahead & run.
Can Autosys call the same DataStage job every time it gets a new file & run it with the new timestamp as the invocation id. Without waiting for the previous job to finish.
Thanks ya'll
Yes - absolutely - this is possible. To enable different InvocationIds you have to check the "multiple instance" property in the jobs properties. With this you allow multiple simultaneous runs of the job.
The invocationID can be a parameter as well when calling it from a sequence.
When your (multiple intance) job writes to a file make sure that each filename is unique to avoid side effects due to the multiple runs at the same time. This can be done by specifying DSJobInvocationId as part of the filename. Note that it is a parameter provided by DataStage which needs to be written exactly as shown with the upper and lower case letters. DataStage will the replace it with the content of your job invocationid at runtime.

Give an entire Celery chain priority over new tasks

I want to launch a chain of Celery tasks, and have them all execute before any newly arriving tasks do. I'll have a single worker process handling all tasks.
I guess the easiest thing to do would be to not make them a chain at all, but instead launch a single task that synchronously calls a sequence of functions. But I'd like to take advantage of Celery retries, allowing each task to be retried a different number of times.
What's the best way to do this?
If you have a single worker running a single process then as far as I can tell from working with celery (this is not explicitly documented) you should get the behavior you want.
If you want to use multiple worker processes then you may need to set CELERYD_PREFETCH_MULTIPLIER to 1.

Select node in Quartz cluster to execute a job

I have some questions about Quartz clustering, specifically about how triggers fire / jobs execute within the cluster.
Does quartz give any preference to nodes when executing jobs? Such as always or never the node that executed the same job the last time, or is it simply whichever node that gets to the job first?
Is it possible to specify the node which should execute the job?
The answer to this will be something of a "it depends".
For quartz 1.x, the answer is that the execution of the job is always (only) on a more-or-less random node. Where "randomness" is really based on whichever node gets to it first. For "busy" schedulers (where there are always a lot of jobs to run) this ends up giving a pretty balanced load across the cluster nodes. For non-busy scheduler (only an occasional job to fire) it may sometimes look like a single node is firing all the jobs (because the scheduler looks for the next job to fire whenever a job execution completes - so the node just finishing an execution tends to find the next job to execute).
With quartz 2.0 (which is in beta) the answer is the same as above, for standard quartz. But the Terracotta folks have built an Enterprise Edition of their TerracottaJobStore which offers more complex clustering control - as you schedule jobs you can specify which nodes of the cluster are valid for the execution of the job, or you can specify node characteristics/requisites, such as "a node with at least 100 MB RAM available". This also works along with ehcache, such that you can specify the job to run "on the node where the data keyed by X is local".
I solved this question for my web application using Spring + AOP + memcached. My jobs do know from the data they traverse if the job has already been executed, so the only thing I need to avoid is two or more nodes running at the same time.
You can read it here:
http://blog.oio.de/2013/07/03/cluster-job-synchronization-with-spring-aop-and-memcached/

Condor job using DAG with some jobs needing to run the same host

I have a computation task which is split in several individual program executions, with dependencies. I'm using Condor 7 as task scheduler (with the Vanilla Universe, due do constraints on the programs beyond my reach, so no checkpointing is involved), so DAG looks like a natural solution. However some of the programs need to run on the same host. I could not find a reference on how to do this in the Condor manuals.
Example DAG file:
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
I need to express that B and D need to be run on the same computer node, without breaking the parallel execution of B and C.
Thanks for your help.
Condor doesn't have any simple solutions, but there is at least one kludge that should work:
Have B leave some state behind on the execute node, probably in the form of a file, that says something like MyJobRanHere=UniqueIdentifier". Use the STARTD_CRON support to detect this an advertise it in the machine ClassAd. Have D use Requirements=MyJobRanHere=="UniqueIdentifier". A part of D's final cleanup, or perhaps a new node E, it removes the state. If you're running large numbers of jobs through, you'll probably need to clean out left-over state occasionally.
I don't know the answer but you should ask this question on the Condor Users mailing list. The folks who support the DAG functionality in Condor monitor it and will respond. See this page for subscription information. It's fairly low traffic.
It's generally fairly difficult to keep two jobs together on the same host in Condor without locking them to a specific host in advance, DAG or no DAG. I actually can't think of a really viable way to do this that would let B start before C or C start before B. If you were willing to enforce that B must always start before C you could make part of the work that Job B does when it starts running be modify the Requirements portion of Job C's ClassAd so that it has a "Machine == " string where is the name of the machine B landed on. This would also require that Job C be submitted held or not submitted at all until B was running, B would also have to release it as part of its start up work.
That's pretty complicated...
So I just had a thought: you could use Condor's dynamic startd/slots features and collapse your DAG to achieve what you want. In your DAG where you currently have two separate nodes, B and C, you would collapse this down into one node B' that would run both B and C in parallel when it starts on a machine. As part of the job requirements you note that it needs 2 CPUs on a machine. Switch your startd's to use the dynamic slot configuration so machines advertise all of their resources and not just statically allocated slots. Now you have B and C running concurrently on one machine always. There are some starvation issues with dynamic slots when you have a few multi-CPU jobs in a queue with lots of single-CPU jobs, but it's at least a more readily solved problem.
Another option is to tag B' with a special job attribute:
MultiCPUJob = True
And target it just at slot 1 on machines:
Requirements = Slot == 1 && ...your other requirements...
And have a static slot startd policy that says, "If a job with MultiCPUJob=True tries to run on slot 1 on me preempt any job that happens to be in slot 2 on this machine because I know this job will need 2 cores/CPUs".
This is inefficient but can be done with any version of Condor past 6.8.x. I actually use this type of setup in my own statically partitioned farms so if a job needs a machine all to itself for benchmarking it can happen without reconfiguring machines.
If you're interested in knowing more about that preemption option let me know and I can point you to some further configuration reading in the condor-user list archives.
The solution here is to use the fact that you can modify submit descriptions even while DAGMan is running as long as DAGMan has not yet submitted the node. Assume a simple DAG of A -> B -> C. If you want all nodes to run on the same host you can do the following:
Define a POST script on node A.
The post script searches condor_history for the ClusterId of the completed node A. Something like condor_history -l -attribute LastRemoteHost -m1 $JOB_ID ... You'll need to clean up the output and what not, but you'll be left with the host that ran node A.
The post script then searches for and modifies dependent job submit files, inserting into them a job job requirement at the top of the submit file. Just make sure you build your job requirements incrementally so that they pick up this new requirement if it is present.
When the post script completes, DAGMan will then look to submit ready nodes, of which in this example we have one: B. The submission of B will now be done with the new requirement you added in step 3, so that it will run on the same execute host as A.
I do this currently with numerous jobs. It works great.