Is it possible to change the order of already queued jobs in Windows HPC 2012?
I need to move some files from the head node before running another queued job to free space for that job.
I found this statement in Microsoft TechNet:
The order of the job queue is based on job priority level and submit time. Jobs with higher priority levels run before lower priority jobs. The job submit time determines the order within each priority level.
So, since my already queued jobs all have "Normal" priority, I can give my move job a higher priority, such as "Highest", so that it runs before them.
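To illustrate the ordering rule quoted above, here is a minimal Python sketch. It is not HPC Pack code: the `Job` class, the numeric ranks in `PRIORITY_RANK`, and the job names are assumptions made up for the example. It only shows that sorting by priority level first and submit time second puts a later-submitted "Highest" job ahead of earlier "Normal" jobs.

```python
from dataclasses import dataclass

# Illustrative ranks only (smaller = runs earlier); the numbers are an assumption.
PRIORITY_RANK = {
    "Highest": 0,
    "AboveNormal": 1,
    "Normal": 2,
    "BelowNormal": 3,
    "Lowest": 4,
}


@dataclass(frozen=True)
class Job:
    name: str
    priority: str
    submit_time: int  # any increasing stamp; a real scheduler would use a datetime


def queue_order(jobs):
    """Order jobs by priority level first, then by submit time within a level."""
    return sorted(jobs, key=lambda j: (PRIORITY_RANK[j.priority], j.submit_time))


# Hypothetical queue: two earlier "Normal" jobs and a later "Highest" move job.
jobs = [
    Job("sim-1", "Normal", 1),
    Job("sim-2", "Normal", 2),
    Job("move-files", "Highest", 3),
]
order = [j.name for j in queue_order(jobs)]
print(order)
```

In this toy model the move job comes first even though it was submitted last, and the two "Normal" jobs keep their submit order.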
Related
My pipeline triggers on resources, schedule and merges. Sometimes these can happen almost at the same time and many pipeline runs can be created. I've noticed that the jobs that run don't always belong to the same run.
Example
one pipeline A includes 2 jobs j.1 and j.2
a resource triggers A.1 and starts j.1
another resource triggers A.2 also and queues j.1.
A.1 finishes a job and instead of starting j.2 it is A.2 j.1 that starts.
How do I lock the run so that A.1 j.1 and j.2 run to completion before A.2 starts?
On the agent, the queue works at the job level, not the pipeline level. So the agent is normally allocated to the higher-priority jobs, regardless of whether those jobs belong to the same pipeline run.
Currently, there is no method or setting to manage the order of the queued jobs.
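The interleaving described in the question can be reproduced with a small Python simulation. This is a sketch under two assumptions that do not come from the product: there is a single agent, and queued jobs of equal priority are served first come, first served. The function name `run_on_single_agent` is hypothetical.

```python
from collections import deque


def run_on_single_agent(runs):
    """Simulate one agent serving a job-level queue.

    runs maps a run id to its ordered list of job names. A run enqueues its
    next job only after its previous job has finished, so that job lands
    behind whatever other runs have already queued.
    """
    queue = deque((run_id, jobs[0]) for run_id, jobs in runs.items())
    executed = []
    while queue:
        run_id, job = queue.popleft()
        executed.append(f"{run_id} {job}")
        jobs = runs[run_id]
        next_index = jobs.index(job) + 1  # job names are unique within a run here
        if next_index < len(jobs):
            queue.append((run_id, jobs[next_index]))
    return executed


trace = run_on_single_agent({"A.1": ["j.1", "j.2"], "A.2": ["j.1", "j.2"]})
print(trace)
```

Because the queue holds jobs rather than whole runs, A.2 j.1 is already waiting when A.1 j.1 finishes, so it runs before A.1 j.2.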
I have the system policies below defined in the InfoSphere DataStage Operations Console under "Workload Management (WLM)".
Sometimes the total number of currently running jobs shoots up to 150, although I have set the maximum running job count to 40 in WLM.
Whenever the number of currently running jobs rises above 100, most of the DataStage jobs show an increased startup time in the Director log and take a long time to run. When job concurrency is below 100, the same set of jobs runs fine, with startup times in seconds. Please suggest how to address this issue and how to enforce that the number of currently running jobs never exceeds a limit, e.g. 100, at any point in time. Thanks a lot!
This is working as designed. Generally, the WLM system is used to control the start of parallel and server jobs. It uses a set of user-defined queues, and when a job is started, it is submitted to a designated queue. In the figure above, the parallel jobs are in the queue named 'MediumPriorityJobs'.
Note that a sequence job is not placed in a queue, so it is not counted toward the total running workloads controlled by the WLM Job Count system policy.
Source: https://www.ibm.com/support/pages/how-interpret-job-count-maximum-running-jobs-system-policy-ibm-infosphere-information-server-workload-management-wlm
I'm working on a project where we need to execute a lot of jobs (say 60000 jobs) each time on an HPC cluster.
From the HPC documentation, I noticed that HPC has two modes:
- Queued mode: Start jobs in queue order, and attempt to allocate the maximum requested resources to running jobs.
- Balanced mode: Attempt to start all incoming jobs as soon as possible at their minimum resource requirements.
https://learn.microsoft.com/en-us/powershell/high-performance-computing/understanding-policy-configuration?view=hpc16-ps
But I'm not sure about the limits of Balanced mode in HPC. Can it scale like other queue services such as SQS in AWS or Queue Storage in Azure?
Queued mode and Balanced mode refer to the scheduling policy. There is no real queue involved.
Basically, in Queued mode, your jobs run in a FIFO manner. If the next job needs more cores than are currently available, it waits even though some resources are free. In Balanced mode, HPC Pack tries to run as many jobs as possible at the same time and makes a best effort to give them the same amount of resources, to maximize resource usage.
Neither policy affects the scale target of the cluster.
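The difference between the two policies can be sketched with a toy allocator in Python. This is not HPC Pack's actual algorithm: the functions `queued_mode` and `balanced_mode`, the `(name, min_cores, max_cores)` job tuples, and the one-core-at-a-time growth rule are all simplifying assumptions chosen to mirror the descriptions above.

```python
def queued_mode(jobs, cores):
    """FIFO sketch: give each job up to its maximum, in queue order.

    jobs is a list of (name, min_cores, max_cores). The first job whose
    minimum cannot be met waits, and so does everything behind it.
    """
    alloc = {}
    for name, lo, hi in jobs:
        if lo > cores:
            break  # head of the queue blocks the jobs behind it
        give = min(hi, cores)
        alloc[name] = give
        cores -= give
    return alloc


def balanced_mode(jobs, cores):
    """Balanced sketch: start every job that fits at its minimum, then
    hand out leftover cores one at a time, round-robin, up to each maximum."""
    alloc = {}
    for name, lo, hi in jobs:
        if lo <= cores:
            alloc[name] = lo
            cores -= lo
    limits = {name: hi for name, lo, hi in jobs}
    grew = True
    while cores > 0 and grew:
        grew = False
        for name in alloc:
            if cores > 0 and alloc[name] < limits[name]:
                alloc[name] += 1
                cores -= 1
                grew = True
    return alloc


# Hypothetical workload: three jobs that each need 2 to 8 cores, on 8 free cores.
jobs = [("a", 2, 8), ("b", 2, 8), ("c", 2, 8)]
print(queued_mode(jobs, 8))
print(balanced_mode(jobs, 8))
```

In this model, Queued mode gives all 8 cores to the first job and leaves the other two waiting, while Balanced mode starts all three jobs and spreads the cores roughly evenly.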
I have a few questions about running tasks in parallel in Azure Batch. Per the official documentation, "Azure Batch allows you to set maximum tasks per node up to four times (4x) the number of node cores."
Is there a setup other than specifying the max tasks per node when creating a pool, that needs to be done (to the code) to be able to run parallel tasks with batch?
So if I am understanding this correctly, if I have a Standard_D1_v2 machine with 1 core, I can run up to 4 tasks concurrently on it. Is that right? If yes, I ran some tests and I am not quite sure about the behavior that I got. In a pool of D1_v2 machines set up to run 1 task per node, I get about 16 min for my job execution time. Then, using the same applications and same parameters with the only change being a new pool with the same setup, also D1_v2, except running 4 tasks per node, I still get a job execution time of about 15 min. There wasn't any improvement in the job execution time from running tasks in parallel. What could be happening? What am I missing here?
I ran a test with a pool of D3_v2 machines with 4 cores, set up to run 2 tasks per core for a total of 8 tasks per node, and another test with a pool (same number of machines as the previous one) of D2_v2 machines with 2 cores, set up to run 2 tasks per core for a total of 4 parallel tasks per node. The job execution time was the same for both tests. Isn't there supposed to be an improvement, considering that 8 tasks are running per node in the first test versus 4 tasks per node in the second test? If yes, what could be the reason why I'm not getting this improvement?
No, although you may want to look into the task scheduling policy (the compute node fill type) to control how your tasks are distributed among the nodes in your pool.
How many tasks are in your job? Are your tasks compute-bound? If so, you won't see any improvement (perhaps even end-to-end performance degradation).
Batch merely schedules the tasks concurrently on the node. If the command/process that you're running utilizes all of the cores on the machine and is compute-bound, you won't see an improvement. You should double-check your tasks' start and end times within the job, along with the node execution info, to see whether they are actually being scheduled concurrently on the same node.
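One way to do that check, once the start time, end time, and node of each task have been collected, is to compute the peak number of overlapping tasks per node. The Python sketch below makes no Azure Batch SDK calls: the function `max_concurrency_per_node`, the `(node_id, start, end)` tuple layout, and the sample data are assumptions for illustration, and you would fill the list from your own task execution info.

```python
from collections import defaultdict


def max_concurrency_per_node(tasks):
    """Return {node_id: peak number of tasks running at the same time}.

    tasks is an iterable of (node_id, start, end) with comparable timestamps
    (numbers or datetimes).
    """
    events = defaultdict(list)
    for node, start, end in tasks:
        events[node].append((start, 1))   # a task starts
        events[node].append((end, -1))    # a task ends
    peaks = {}
    for node, node_events in events.items():
        running = peak = 0
        # Tuples sort ends (-1) before starts (+1) at equal timestamps,
        # so back-to-back tasks are not counted as overlapping.
        for _, delta in sorted(node_events):
            running += delta
            peak = max(peak, running)
        peaks[node] = peak
    return peaks


# Hypothetical timings in minutes since the job started.
tasks = [
    ("node-0", 0, 10),
    ("node-0", 10, 20),
    ("node-1", 0, 10),
    ("node-1", 2, 12),
    ("node-1", 5, 9),
]
peaks = max_concurrency_per_node(tasks)
print(peaks)
```

A peak of 1 on a node configured for several tasks per node suggests the tasks ran one after another there, for example because the job had too few tasks or they were spread across nodes.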
I have many jobs running and pending. I would like to indicate the relative priority of jobs that I have submitted to the queue, that are pending, but not yet running. Is it possible to set this priority after submission? Is it possible to set this priority before submission?
You can move jobs that are pending with the btop command.
btop job_ID | "job_ID[index_list]" [position]
If you specify [position], the job is placed at that position in the queue.
By default, LSF dispatches jobs in a queue in the order of their
arrival (that is, first come, first served), subject to availability
of suitable server hosts.
Having said this, the priority of the job is unchanged. So you will only be able to change the order if the jobs have the same priority.
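The effect of that restriction can be pictured with a small Python model. It does not call LSF and is not how btop is implemented: the function `btop_like`, the `(job_id, priority)` tuples, and the job IDs are hypothetical. The sketch only shows a job being repositioned among the pending jobs that share its priority, while jobs at other priorities keep their places.

```python
def btop_like(queue, job_id, position=1):
    """Move job_id to `position` (1-based) among jobs of the same priority.

    queue is a list of (job_id, priority) in dispatch order. Jobs with a
    different priority keep their slots; the job's own priority is unchanged.
    """
    priority = dict(queue)[job_id]
    # Slots in the queue currently held by jobs of this priority.
    slots = [i for i, (_, p) in enumerate(queue) if p == priority]
    group = [queue[i][0] for i in slots if queue[i][0] != job_id]
    group.insert(position - 1, job_id)
    result = list(queue)
    for slot, jid in zip(slots, group):
        result[slot] = (jid, priority)
    return result


# Hypothetical pending queue: job 103 has a higher priority than the rest.
queue = [(103, 70), (101, 50), (102, 50), (104, 50)]
print(btop_like(queue, 104))
print(btop_like(queue, 104, 2))
```

In both calls job 103 stays first, because moving job 104 does not change its priority relative to 103.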
Depending on your LSF version, see the following links for details about btop
LSF 10.1.0 > Command Reference > btop
LSF 9.1.3 > Command Reference > btop