How to set sun grid engine scheduling policy to satisfy this? - scheduler

We use sun grid engine(actually open scheduler grid) as drms. Suppose we have 3 users: uA, uB, uC.
uA submit 100000 jobs then uB submit 10 jobs then uC submit 1 job. With default scheduling policy, grid engine will run uA's 100000 jobs and then uB's 10 and then uC's 1 job, thus uB and uC need to wait a long time.
We hope the scheduler can select jobs to run like this:
first, select 1 uA's job, 1 uB's job, 1 uC's job
then, select 19 uA's jobs, 19 uB's jobs
then, select uA's other jobs
How to set the policy to fit this?

I have done this by setting up a share tree policy with a single user = default. You also need to set a rapid half-life decay on prioritization (I used 1 hour). Also, place 0 priority weight on job waiting time. (Put 100% on the share tree policy.) I did this poking around in qmon and experimenting with different values.

Related

Is using "pg_sleep" in plpgsql Procedures/Functions while using multiple worker background processes concurrently bad practice?

I am running multiple background worker processes in my Postgres database. I am doing this using Pg_Cron extension. I cannot unfortunately use Pg_Timetables, as suggested by another user here.
Thus, I have 5 dependent "Jobs" that need 1 other independent Procedure/Function to execute and complete before they can start. I originally having my Cron jobs simply check every 30minutes or-so some "job_log" table I created to see if the independent Job completed (i.e. if yes, execute Procedure, if not, return out of Procedure and check at next Cron interval)
However, I believe I could simplify the way I am triggering/orchestrating all these Jobs/Procedures greatly if I utilize pg_sleep and start all the Jobs at one -time (so no more checking every 30minutes). I would be running these Jobs in the night time concurrently so I believe it shouldn't effect my actual traffic that much.
i.e.
WHILE some_variable != some_condition LOOP
PERFORM pg_sleep(1);
some_variable := some_value; -- update variable here
END LOOP;
My question is
Would starting all these Jobs at one time (i.e. Setting a concrete time in the Cron expression e.g. 15 18 * * *), and utilizing pg_sleep be bad practice/inefficient as I would be idling 5 background workers while 1 Job completes. The 1 job these are dependent on could take any amount of time to finish i.e. 15 min, 30 min, 1hr (should be < 1 hr though).
Or is better to simply just use a Cron expression to check every 5min or so if the main/independent Job is done, so my other Jobs that are dependent can then run?
Running two schedulers, one of them home-built, seems more complex than just running one scheduler that does 2 (or 6, 1+5, however you count it) different things. If your goal is to make things simpler, does your proposal really achieve that?
I wouldn't worry about 5 backends sleeping on pg_sleep at the same time. But you might worry about them holding back the xid horizon while they do so, which would make vacuuming and HOT pruning less effective. But if you already have one long-running task in one snapshot (the thing they are waiting for) then more of them aren't going to make the matter worse.

is it possible to define Must Start Times for dependent jobs in AutoSys

I'm looking for an option in autosys to send an alert if the dependant jobs don't start within 10 min of the avg run time, for example, the last 30 days. The dependant jobs, do not have a fixed starting condition. They might have run at varying times in the last 30 days based on the completion of starting jobs. Would it be possible in autosys, to dynamically set the must start time for the jobs which don't have starting time rather they are dependant on starting conditions?
I use something like this in my critical jobs. You can try them too, if you want :
must_start_times: "19:01"
days_of_week: mo,tu,we,th,fr,sa,su
start_times: "19:00"

Parallel activities for one agent

Is there a way to have parallel services and/or delays to occur per agent and move on with the activity that takes the longest. For example if I have an agent that can be painted and serviced at the same time, each requiring a different resource pool with different processing times but the agent will move forward when the process that took the longest is over.
Use a Split block to split your agent into two for the two parallel tasks (Service blocks) and then use a Combine to combine them back again afterwards (with the Combine block outputting the original agent 1).
You can also use RestrictedAreaStart and RestrictedAreaEnd blocks (capacity 1) around the split/combine area to ensure that other agents can't 'jump in' whilst the longest parallel process is still running (but the shorter one has already finished).
Something like the below (with resource pools).
Probably easiest to dynamically setup delay duration and resources needed from within 1 Service element:
calculate the duration your agent will use in each case (painting = 5 mins, servicing = 10 mins) --> use the longer value as the service delay
Also, make the agent require 1 painter and 1 service-engineer as resources.
Only drawback: Your painter will stay for 10 mins as well.
Alternative approach would likely involve creating your own, purely agent-based setup with seizing& releasing

How to put a rate limit on a celery queue?

I read this in the celery documentation for Task.rate_limit:
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
How do I put a rate limit on a celery queue?
Turns out it cant be done at queue level for multiple workers.
IT can be done at queue level for 1 worker. Or at queue level for each worker.
So if u say 10 jobs/ minute on 5 workers. Your workers will process upto 50 jobs per minute collectively.
So to have only 10 jobs running at a time you either chose one worker. Or chose 5 workers with a limit of 2/minute.
Update: How to exactly put the limit in settings/configuration:
task_annotations = {'tasks.<task_name>': {'rate_limit': '10/m'}}
or change the same for all tasks:
task_annotations = {'*': {'rate_limit': '10/m'}}
10/m means 10 tasks per minute, /s would mean per second. More details here: Task annotations setting
hey I am trying to find a way to do rate limit on queue, and I find out Celery can't do that, however Celery can control the rate per tasks, see this:
http://docs.celeryproject.org/en/latest/userguide/workers.html#rate-limits
so for a workaround, maybe you can set up one tasks per queue(which makes sense in a lot of situations), and put the limit on task.
You can set this limit in the flower > worker pane.
there is a specified blank space for entering your limit there.
The format that is suggested to be used is also like the below:
The rate limits can be specified in seconds, minutes or hours by appending “/s”, >“/m” or “/h” to the value. Tasks will be evenly distributed over the specified >time frame.
Example: “100/m” (hundred tasks a minute). This will enforce a minimum delay of >600ms between starting two tasks on the same worker instance.

Is this an intelligent use case for optaPlanner?

I'm trying to clean up an enterprise BI system that currently is using a prioritized FIFO scheduling algorithm (so a priority 4 report from Tuesday will be executed before priority 4 reports from Thursday and priority 3 reports from Monday.) Additional details:
The queue is never empty, jobs are always being added
Jobs range in execution time from under a minute to upwards of 24 hours
There are 40 some odd identical app servers used to execute jobs
I think I could get optaPlanner up and running for this scenario, with hard rules around priority and some soft rules around average time in the queue. I'm new to scheduling optimization so I guess my question is what should I be looking for in this situation to decide if optaPlanner is going to help me or not?
The problem looks like a form of bin packing (and possibly job shop scheduling), which are NP-complete, so OptaPlanner will do better than a FIFO algorithm.
But is it really NP-complete? If all of these conditions are met, it might not be:
All 40 servers are identical. So running a priority report on server A instead of server B won't deliver a report faster.
All 40 servers are identical. So total duration (for a specific input set) is a constant.
Total makespan doesn't matter. So given 20 small jobs of 1 hour and 1 big job of 20 hours and 2 machines, it's fine that it takes all small jobs are done after 10 hours before the big job starts, given a total makespan of 30 hours. There's no desire to reduce the makespan to 20 hours.
"the average time in the queue" is debatable: do you care about how long the jobs are in the queue until they are started or until they are finished? If the total duration is a constant, this can be done by merely FIFO'ing the small jobs first or last (while still respecting priority of course).
There are no dependencies between jobs.
If all these conditions are met, OptaPlanner won't be able to do better than a correctly written greedy algorithm (which schedules the highest priority job that is the smallest/largest first). If any of these conditions aren't met (for example you buy 10 new servers which are faster), then OptaPlanner can do better. You just have to evaluate if it's worth spending 1 thread to figure that out.
If you use OptaPlanner, definitely take a look at real-time scheduling and daemon mode, to replan as new reports enter the system.