Alternatives to bsub (IBM LSF) for submitting jobs (Python, R, SAS)

Currently I am using IBM LSF to submit jobs. The jobs are Python, R, or SAS programs: I have a few such programs written, and I submit them for execution using the bsub command, which returns a job ID. To kill a job I use bkill, and to get a job's status, bjobs.
What open-source alternatives to IBM LSF could I use to submit these programs as jobs, and to get their status or kill them?
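For reference, the submit/status/kill cycle I use today looks like this (the job name, script, and job ID are placeholders):

bsub -J myjob "python my_script.py"   # prints the job ID, e.g. Job <1234> is submitted
bjobs 1234                            # check the job's status
bkill 1234                            # kill the job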

Related

Talend Automation Job taking too much time

I developed a job in Talend, built it, and automated it to run via the Windows batch file from that build.
On execution, the Windows batch file invokes the dimtableinsert job and, after it finishes, invokes fact_dim_combine. The run takes just minutes in Talend Open Studio, but when I invoke the batch file via the Task Scheduler it takes hours to finish.
Time taken:
Manual -- 5 minutes
Automation -- 4 hours (invoking the Windows batch file)
Can someone please tell me what is wrong with this automation process?
The delay in execution is probably a latency issue. Talend might be installed on the same server as the database instance, so whenever you execute the job from Talend it completes as expected. But if the scheduler is installed on a different server, then when you call the job through the scheduler the inserts have to cross the network, which takes more time.
Make sure your scheduler and database instance are on the same server.
Execute the job directly in a Windows terminal and check whether you have the same issue.
The easiest way to find out what is taking so much time is to add some logs to your job.
First, add some tWarn components at the start and end of each subjob (dimtableinsert and fact_dim_combine) to see which one is the longest.
Then add more logs before/after the components inside the jobs.
This way you should have a better idea of what is responsible for the slowdown (DB access, writing of some files, etc.).
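If changing the Talend job itself is awkward, a quick alternative is to bracket each launcher call with timestamps. A sketch, assuming the job was built with shell launchers (Talend builds ship both .sh and .bat scripts); the script names are placeholders:

#!/bin/bash
# Sketch only: launcher names are placeholders for what the Talend build generates.
log=run_times.log
echo "dimtableinsert start: $(date)"    >> "$log"
./dimtableinsert/dimtableinsert_run.sh  >> "$log" 2>&1
echo "dimtableinsert end:   $(date)"    >> "$log"
echo "fact_dim_combine start: $(date)"  >> "$log"
./fact_dim_combine/fact_dim_combine_run.sh >> "$log" 2>&1
echo "fact_dim_combine end:   $(date)"  >> "$log"

The same bracketing works in the .bat file with echo %time%, which would show whether the slowdown is in the first subjob, the second, or the scheduler's startup.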

Need solution to schedule Spark jobs

I am new to Spark.
In our project, we have converted seven PL/SQL scripts into Scala-Spark.
The existing PL/SQL scripts are scheduled as jobs in Talend. Each script is scheduled as a separate job, and the seven jobs run in sequence: only after the first job completes successfully does the second one start, and so on until the last (seventh) job.
My team is exploring other ways to schedule the Scala-Spark programs as jobs. One suggestion was to convert/rewrite the orchestration job that runs on Talend into Scala. I have no idea if that is possible.
So, could anyone let me know whether it is possible to do the same in Scala?
You can submit your Spark job from Talend using a tSystem or tSSH component and get the return code (exit code) from that component. If the exit code is 0 (success), you can submit the next Spark job. We did the same in our project.
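A minimal shell sketch of the exit-code chaining that a tSystem/tSSH command would perform; the jar path and class names are placeholders:

spark-submit --class com.example.Job1 /path/to/jobs.jar   # first job
if [ $? -eq 0 ]; then                                     # 0 means the first job succeeded
    spark-submit --class com.example.Job2 /path/to/jobs.jar
fi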

Make job submitted with bsub run in parallel with job that was submitted before and still running

I would like a newly submitted bsub job not to PEND but to start running immediately.
If possible I would like to limit this to N jobs.
If you want two jobs to run in parallel, it's probably best to submit a single parallel job (bsub -n) that runs two different processes, potentially on two different hosts.
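A hedged sketch of that approach; worker.sh is a placeholder, while LSB_HOSTS and blaunch are standard LSF facilities for parallel jobs:

#!/bin/bash
# run_pair.sh -- submit with: bsub -n 2 -R "span[ptile=1]" ./run_pair.sh
# LSB_HOSTS lists the hosts LSF allocated to this parallel job, one per slot.
for host in $LSB_HOSTS; do
    blaunch -z "$host" ./worker.sh &   # start one worker per allocated host
done
wait                                   # the job ends when both workers finish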
The LSF admin can force a PENDing job to run with the brun command. However, this will cause the execution host to be temporarily overloaded.
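If you do go the admin route, the invocation looks roughly like this (the host name and job ID are placeholders):

brun -m host01 12345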

How to integrate spring-xd batch jobs with Control-M scheduler

I'm trying to integrate the Control-M scheduler with batch jobs running within spring-xd.
In our existing environment, Control-M agents run on the host, and batch jobs are triggered via bash scripts from Control-M.
In the spring-xd architecture, a batch job is pushed out into the XD container cluster and runs on whichever container is available. This means, however, that I don't know which XD container the job will run on. I could pin it to a single container with a deployment manifest, but that goes against the whole point of the cluster.
One potential solution:
Run a VM outside the XD container cluster with the Control-M agent, and trigger jobs through the XD API via a bash script. The script would need to wait for the job to complete, either by polling for job completion via the XD API or by waiting for an event that signals completion.
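A rough bash sketch of such a script; the admin URL, port, endpoint paths, and job name are all assumptions and may differ in your XD deployment:

#!/bin/bash
# Sketch only: assumes the XD admin REST API on port 9393 and a deployed job "myjob".
XD_ADMIN=http://xd-admin:9393

# Launch the batch job by name.
curl -s -X POST "$XD_ADMIN/jobs/executions?jobname=myjob"

# Poll for completion; crude JSON scraping keeps the sketch dependency-free.
while true; do
    status=$(curl -s "$XD_ADMIN/jobs/executions" | grep -o '"status":"[A-Z]*"' | head -n 1)
    case "$status" in
        *COMPLETED*) exit 0 ;;   # success -- Control-M sees exit code 0
        *FAILED*)    exit 1 ;;   # failure -- Control-M sees a non-zero exit
    esac
    sleep 30
done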
Thinking further ahead this could be a solution to triggering batch jobs deployed in PCF.
In a previous life, I had the enterprise scheduler use Perl scripts to interact with the old Spring Batch Admin REST API to start jobs and poll for completion.
So, yes, the same technique should work fine with XD.
You can also tap into the job events.

Detect errors with torque and grid engine and prevent execution of dependent tasks

I have a shell script that queues multiple tasks for execution on an HPC cluster. The same job submission script works for either Torque or Grid Engine with some minor conditional logic. This is a pipeline where the output of earlier tasks is fed to later tasks for further processing. I'm using qsub to define job dependencies, so later tasks wait for earlier tasks to complete before starting execution. So far so good.
Sometimes a task fails. When that happens, I don't want any of the dependent tasks to attempt processing the output of the failed task. However, the dependent tasks have already been queued for execution long before the failure occurred. What is a good way to prevent the unwanted processing?
You can use the afterok dependency argument. For example, the qsub command may look like:
qsub -W depend=afterok:<jobid> submit.pbs
Torque will only start the next job if the jobid exits without errors. See documentation on the Adaptive Computing page.
Here is what I eventually implemented. The key to making this work is returning error code 100 on error. Sun Grid Engine stops execution of subsequent jobs upon seeing error code 100. Torque stops execution of subsequent jobs upon seeing any non-zero error code.
qsub starts a sequence of bash scripts. Each of those bash scripts has this code:
handleTrappedErrors()
{
    errorCode=$?                    # exit status of the command that failed
    bashCommand="$BASH_COMMAND"     # the command that triggered the ERR trap
    scriptName=$(basename "$0")
    lineNumber=${BASH_LINENO[0]}
    # log an error message (using the variables above) to a log file here -- not shown
    exit 100                        # 100 halts dependent jobs on both SGE and Torque
}
trap handleTrappedErrors ERR        # the trap must name the function defined above
Torque (as Derek mentioned):
qsub -W depend=afterok:<jobid> ...
Sun Grid Engine:
qsub -hold_jid <jobid> ...
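Putting it together, a hedged sketch of chaining two submissions; the script names are placeholders:

# Torque: qsub prints the new job's ID (e.g. 1234.server) on stdout.
jobid=$(qsub step1.pbs)
qsub -W depend=afterok:$jobid step2.pbs

# Sun Grid Engine: -terse makes qsub print just the job ID.
jobid=$(qsub -terse step1.sh)
qsub -hold_jid $jobid step2.sh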