How to set the maximum number of running jobs in LSF

I use IBM LSF and want to configure the maximum number of running jobs.
For example, each user can submit 100000 jobs, but only 10 jobs can run at a time; the other jobs stay pending.
I have tried setting UJOB_LIMIT in lsb.queues
UJOB_LIMIT: specifies the per-user job slot limit for the queue
and MAX_JOBS in lsb.users
MAX_JOBS: per-user or per-group job slot limit for the cluster. The total number of job slots that each user or user group can use in the cluster.
and MAX_PEND_JOBS in lsb.users
MAX_PEND_JOBS: per-user or per-group pending job limit.
But none of these gave me the behaviour I want. How should they be set? Can anyone help?

To allow a user to submit 100000 jobs but only run 10 of them at a time, set this in lsb.users:
Begin User
USER_NAME MAX_JOBS MAX_PEND_JOBS
myuser 10 99990
End User
(99990 = 100000 - 10).
These limits will apply to the whole cluster.
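After editing lsb.users, the new limits only take effect once the configuration is reloaded. As a minimal sketch, assuming you have LSF administrator privileges and that myuser is the placeholder user from the example above, a reload-and-verify step might look like:
badmin reconfig        # reload the lsb.* configuration files
busers myuser          # show the user's MAX/NJOBS/PEND/RUN counters and confirm the new limits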

Related

How can I increase the max num of concurrent jobs in Dataproc?

I need to run hundreds of concurrent jobs in a Dataproc cluster; each job is pretty lightweight (e.g., a Hive query that fetches table metadata) and doesn't take many resources. But there seem to be some unknown factors that limit the maximum number of concurrent jobs. What can I do if I want to increase the max concurrency limit?
If you are submitting the jobs through the Dataproc API / CLI, these are the factors which affect the max number of concurrent jobs:
The number of master nodes;
The master memory size;
The cluster properties dataproc:agent.process.threads.job.max and dataproc:dataproc.scheduler.driver-size-mb; see Dataproc Properties for more details (an example of setting these at cluster creation is sketched at the end of this answer).
For debugging, when submitting jobs with gcloud, SSH into the master node and run ps aux | grep dataproc-launcher.py | wc -l every few seconds to see how many concurrent jobs are running. At the same time, you can run tail -f /var/log/google-dataproc-agent.0.log to monitor how the agent launches the jobs. You can then tune the parameters above to achieve higher concurrency.
You can also try submitting the jobs directly from the master node through spark-submit or Hive beeline, which will bypass the Dataproc job concurrency control mechanism. This can help you identify where the bottleneck is.
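As a sketch of how the two properties mentioned above might be set at cluster creation time (the cluster name and the values here are only placeholders, not recommendations):
gcloud dataproc clusters create my-cluster \
    --properties dataproc:agent.process.threads.job.max=100,dataproc:dataproc.scheduler.driver-size-mb=512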

Does Dataproc have a resource allocation limit per job

Let's say I have a Dataproc cluster of 100 worker nodes with a certain spec.
When I submit a job to Dataproc, is there a usage allocation limit for each job,
e.g., that job A cannot use more than 50% of all nodes?
Is there this kind of limit, or can any job allocate all the resources of the cluster?
There is no such per-job limit on Dataproc. One job can use all of YARN's resources, and that is usually the default configuration for the various job types on Dataproc. But users can set per-job limits as they want, e.g., for Spark, disable dynamic allocation and set the number of executors and the memory size of each executor.
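For example, a Spark job submission that caps its own footprint might look like the following; the cluster name, class, jar, and values are placeholders, and the properties shown are the standard spark.* settings rather than anything Dataproc-specific:
gcloud dataproc jobs submit spark \
    --cluster my-cluster \
    --class org.example.MyJob \
    --jars my-job.jar \
    --properties spark.dynamicAllocation.enabled=false,spark.executor.instances=50,spark.executor.memory=4g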

Can we limit the number of DAGs running at any time in Apache Airflow

Can we limit the number of DAGs running at any time in Apache Airflow?
We have a limit on resources in the environment. Is there a configuration to limit the number of DAGs running in Airflow as a whole at a point in time?
The max_active_runs parameter only limits runs within a single DAG.
Is it possible that, if one DAG is running, all other scheduled DAGs wait for the first DAG to complete and then trigger sequentially?
By setting the parallelism configuration option in airflow.cfg, you can limit the total maximum number of tasks (not DAGs) allowed to run in parallel. Then, by setting the dag_concurrency configuration option, you can specify how many tasks a single DAG can run in parallel.
For example, setting parallelism=8 and dag_concurrency=1 gives you at most 8 DAGs running in parallel (with 1 running task each) at any time.
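In airflow.cfg that would look roughly like the snippet below; this assumes an Airflow version where the option is still named dag_concurrency (newer releases rename it), and the values are just the ones from the example above:
[core]
parallelism = 8
dag_concurrency = 1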

Total Datastage Jobs Greater Than My Max Jobs I Set

I have the below system policies defined in the InfoSphere DataStage Operations Console under "Work Load Management (WLM)".
Sometimes the total number of currently running jobs shoots up to 150, although I have defined the maximum running job count as 40 in WLM.
Whenever the currently running job count goes beyond 100, most of the DataStage jobs start showing increased startup time in the Director log and take a long time to run; if the job concurrency is less than 100, the same set of jobs runs fine with startup times in seconds. Please suggest how to address this issue and how to enforce that the number of currently running jobs does not exceed, e.g., 100 at any point in time. Thanks a lot!
This is working as designed. Generally, the WLM system is used to control the start of parallel and server jobs. It uses a set of user-defined queues, and when a job is started, it is submitted to a designated queue. In the figure above, the parallel jobs are in the queue named 'MediumPriorityJobs'.
Note that the sequence job is not in a queue, so it is not counted toward the total running workload controlled by the WLM Job Count System Policy.
Source: https://www.ibm.com/support/pages/how-interpret-job-count-maximum-running-jobs-system-policy-ibm-infosphere-information-server-workload-management-wlm

Job and Resource Scheduler

I am looking for some suggestions on a job scheduler for a stack of machines virtualized into one operating system (SUSE) with a synchronized cache (so any process can access any memory) and a few hundred processors. Ultimately, what I would like to do is have multiple users submit jobs with a predetermined/expected execution time. I should be able to run something like
submit_job [-r <max_run_time_limit>] [-o <output_file>] [-i <input_file>] [-n <cpus_per_process>] [-t <threads_per_process>] [-j <jobname>] <my_program> [-arg <my_prog_arg>]
from the shell, and have the CPUs run the job uninterrupted until the end of the program or until max_run_time_limit is reached. Conversely, if other jobs from multiple users are occupying all CPUs, the job you submit must wait until the other jobs have completed.
Ideally, in order to avoid users submitting jobs lasting a month on all CPUs, I would have a maximum allocated total of run-time hours per user per month, which the user cannot renew in that month without requesting additional hours from the administrator.
I am not very well versed in this subject so I'll provide more details if they are needed.
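For comparison only, the LSF bsub command discussed at the top of this page exposes similar knobs, so a rough analogue of the hypothetical submit_job invocation above might look like the sketch below (the option mapping is by analogy, and the threads-per-process option has no single direct bsub equivalent):
bsub -W <max_run_time_limit> \     # run-time limit in minutes
     -o <output_file> \            # standard output file
     -i <input_file> \             # standard input file
     -n <cpus_per_process> \       # number of processors requested
     -J <jobname> \                # job name
     <my_program> <my_prog_arg>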