LSF different resource request for multiple slots

Is it possible to submit a job using LSF in which the initial execution slot has one value for rusage[mem=] and the other slots have different rusage values? The job is master-slave: the master needs lots of memory, the slaves not so much.
I tried various select specifications, e.g., select[ 1*rusage[mem=6000] + 2*rusage[mem=1000]], but only got "invalid resource specification" errors.

The general format of compound resource requirements is: <# of slots on headnode>*{<headnode resreq>}+<# of slots on compute nodes>*{<computenode resreq>}. Try this:
bsub -R "1*{rusage[mem=6000]}+2*{rusage[mem=1000]}" ...
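When the slot counts or memory values change between submissions, the compound string can be assembled rather than typed by hand. The following is a minimal Python sketch, assuming a hypothetical helper named compound_resreq; it only builds the text passed to bsub -R and does not call LSF itself.

```python
def compound_resreq(terms):
    """Build an LSF compound resource requirement string.

    `terms` is a list of (slot_count, simple_resreq) pairs, head node first,
    then compute nodes. Hypothetical helper for illustration, not part of LSF.
    """
    if not terms:
        raise ValueError("at least one term is required")
    parts = []
    for slots, resreq in terms:
        if slots < 1:
            raise ValueError("slot count must be positive")
        # Each term has the form <slots>*{<resreq>}
        parts.append(f"{slots}*{{{resreq}}}")
    return "+".join(parts)


resreq = compound_resreq([(1, "rusage[mem=6000]"), (2, "rusage[mem=1000]")])
print(f'bsub -R "{resreq}" ...')
```

The printed line matches the command shown above, with the rest of the bsub arguments left elided.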

Related

Routing agents through specific resources in anylogic

I am solving a job shop scheduling problem using AnyLogic. I have 20 jobs (agents) and 5 machines (resources), and each job has a specific order in which to visit the machines. My question is: how can I make sure that each job follows its order?
This is what I have done. One agent called 'jobs' and 5 agents, each one corresponding to a machine. One resource pool associated with each one of the service blocks. In the collection enterblocks I selected the 5 enter blocks.
In the agent 'jobs' I have this: the parameters associated with each job, read from the database file; the collection 'enternames', where I selected the machine (1,2,3,4,5) parameters; and the collection 'ptimes', where I put the processing times of the job. (These two collections are where I am not sure I have done it correctly.)
My database file
I am not sure how to use the counter used in "How to store routings in job shop production in Anylogic". In that question the getNextService function is used in the exit blocks, but I am also not sure how to use it in my case because of the counter.
Firstly, to confirm: based on the Job agent and database view, the first line in the database will result in a Job agent with values such as:
machine1 = 1 and process1=23
machine2 = 0 and process2=82 and so on
If that is the intent, then a better way is to restructure the database, so there are two tables:
Table of jobs to machine sequence looking something like this:
job  op1       op2       op3       op4       op5
1    machine2  machine1  machine4  machine5  machine3
2    machine4  machine3  machine5  machine1  machine2
3    ...       ...       ...       ...       ...
Table of jobs to processing time
Then, add a collection of type ArrayList of String to Job (let's call this collection col_machineSequence). When the Job agents get created, their On startup code should be:
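// Copy the op1..op5 parameters into the sequence, in visiting order.
// The cast assumes the op parameters are of type String.
for (String param : List.of("op1", "op2", "op3", "op4", "op5")) {
    col_machineSequence.add((String) getParameter(param));
}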
As a result, col_machineSequence will contain sequence of machines each job should visit in the order defined in the database.
NOTE: Please see help on getParameter() here.
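The counter asked about in the question is then just an index into this sequence: each time a job leaves a machine, the index advances and the next entry says where to go. The following is a minimal Python sketch of that logic only, not AnyLogic code; Job, next_machine and the sample data are hypothetical names chosen for illustration.

```python
class Job:
    """Holds one job's machine sequence and a counter of completed operations."""

    def __init__(self, job_id, machine_sequence):
        self.job_id = job_id
        self.machine_sequence = list(machine_sequence)
        self.counter = 0  # index of the next operation to perform

    def next_machine(self):
        """Return the next machine to visit, or None when the job is finished."""
        if self.counter >= len(self.machine_sequence):
            return None
        machine = self.machine_sequence[self.counter]
        self.counter += 1
        return machine


# Sequence taken from job 1 in the table of jobs to machine sequence.
job = Job(1, ["machine2", "machine1", "machine4", "machine5", "machine3"])
visited = []
while (machine := job.next_machine()) is not None:
    visited.append(machine)
print(visited)
```

In a model, the equivalent of next_machine would be called when a job exits a machine, and a None result would mean the job can leave the system.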
Also:
Putting a Queue in front of the Service isn't necessary
Repeating Enter-Queue-Service-Exit isn't necessary, this can be simplified using this method
Follow-up clarifications:
Collections - these will be enclosed in each Job agent
Queue sorting - Service block has Priorities / preemption which governs the ordering on the queue
Create another agent for the second table (call the agent ProcessingTime and table processing_time) and add it to the Job agent and then load it from database filtering on p_jobid as shown in the picture

Architecture/Optimization of a scheduling problem on OptaPlanner

I'd like to generate a schedule with OptaPlanner for the following problem:
"Task" is the planning entity
It has a fixed duration
It has an employee (planning variable) which must match required skills
It has a workstation (planning variable) which must match required attributes
It has dependencies: some tasks must start after some others
Some tasks have a deadline
At first, I tried chained/shadow variables, based on the OptaPlanner Task Assignment example. On my first attempt, I kept the employee as the anchor and left workstation management to the solver:
I quickly saw that there was a problem with this approach. Here, Task A and Task B (and all the others) influence one another even though they are not on the same chain, and the previous element of the chain does not carry enough information to determine the start time of the task. Worse, because workstation changes are not tracked in this model, the solutions are nonsense: a workstation can be in use by several tasks at once.
To fix the problem of workstation tracking, I added workstations to the anchor and made all employee/workstation combinations
This way I know the employee and workstation on each chain. However, this does not solve the problem of the start time of tasks on a chain depending on tasks on other chains (e.g., Task A and Task D share the same workstation). Task insertions/removals would therefore have repercussions on other chains, which is not in the spirit of chained variables.
I ended up giving up on chained variables, as they do not seem to fit my problem.
So I modified my classes, and the start time of tasks is now resolved with all intelligence left to OptaPlanner's solver, driven by pure Drools rules and a ValueRangeProvider.
When I have only the following rules:
No employee overlap (Hard)
No workstation overlap (Hard)
Skills and attributes requirements (Hard)
Tasks with deadlines ending before deadline (Soft, could be Medium)
Tasks ending as soon as possible (sum of squared end times, Soft)
I quickly get a solution that seems to be the best.
However, when I add dependencies between tasks (with a hard rule that walks down a task's dependencies to check that no dependency ends after the task starts), the complexity seems to increase dramatically. For a few dozen tasks, with only 2 operators and 3 workstations, a satisfying solution (without unexpected holes between tasks) can take an hour to appear, with the following parameters:
Experiment length: 10,000 min
Granularity: 1 min
I also have a TaskDifficultyComparator to help the solver place the hardest tasks first.
This is very long for so few tasks, and it will be far worse once I add a notion of user availability. I even suspect it would never converge, as task end times will "jump" depending on availabilities.
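To make the dependency rule and the squared-end-time objective concrete, here is a minimal Python sketch of how such a score could be computed for a fixed set of start times. It is not OptaPlanner or Drools code; the Task fields, the score function and the sample tasks are assumptions made for illustration, and only direct dependencies are checked.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    start: int      # minutes from the start of the planning window
    duration: int   # fixed duration in minutes
    depends_on: list = field(default_factory=list)  # names of prerequisite tasks

    @property
    def end(self):
        return self.start + self.duration


def score(tasks):
    """Return (hard, soft): hard counts broken dependencies, soft penalises late ends."""
    by_name = {t.name: t for t in tasks}
    hard = 0
    for task in tasks:
        for dep_name in task.depends_on:
            if by_name[dep_name].end > task.start:
                hard -= 1  # a prerequisite ends after this task starts
    soft = -sum(t.end ** 2 for t in tasks)  # sum of squared end times
    return hard, soft


tasks = [
    Task("A", start=0, duration=30),
    Task("B", start=20, duration=10, depends_on=["A"]),  # starts before A ends
    Task("C", start=30, duration=15, depends_on=["A"]),
]
print(score(tasks))
```

Checking only direct dependencies is enough to enforce the whole chain when every task in the chain is checked against its own prerequisites, which may be cheaper than a rule that walks the full dependency tree for each task.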
So my questions are:
Are there solutions leveraging chained variables that would suit my problem?
Are there any optimizations on the solver, the rules, or something else that could give me a significant speed-up?

Is a JOB card necessary?

I guess job cards are something like the global attributes of a Java class. In my work, I have never used these job card attributes. So is a job card necessary in a job? Could you please look at the job card below and tell me whether it is required and why I need it?
Best Regards
//BJ03H03 JOB (BBO09272,0000),
// 'NHS-STAT $',
// USER=BPB,
// SCHENV=HDZ2PO,
// CLASS=E,
// TIME=270,
// MSGCLASS=2
What is and isn't required on a job card is system/installation dependent. The minimum requirement is that a JOB statement with a JOBNAME exists, i.e. //JOBNAME JOB (an EXEC statement is also required).
However, your installation will likely require other parameters, and it may implement defaults for some of them. In short, you need to either speak to the system programmers or experiment by omitting parameters (the latter method could end up in discussions with the system programmers, perhaps angry ones).
The system is designed to enable users to perform many types of job
control in many ways. To allow this flexibility, only two job entry
tasks are required:
Identification: The job must be identified in the jobname field of a JOB statement.
Execution: The program or procedure to be executed must be named in a PGM or PROC parameter on an EXEC statement.
Therefore, the following statements are the minimum needed to perform
a job control task:
//jobname JOB
//        EXEC {PGM=program-name   }
               {PROC=procedure-name}
               {procedure-name     }
Quoted from Task Charts, z/OS MVS JCL Reference SA23-1385-00, which wouldn't be the worst starting place to find out more.

How to pass output from a Datastage Parallel job to input as another job?

My requirement is:
Parallel Job1 -- I extract data from a table, when the row count is more than 0
Parallel Job2 should be triggered in the sequencer only when the row count from the source query in Job1 is greater than 0
I want to achieve this without creating any intermediate file in Job1.
So basically what you want to do is take information from a data stream (of your Job1) and use it in the "above" sequence as a parameter.
In your case you want to decide at sequence level whether to run subsequent jobs (if more than 0 rows are returned) or not.
Two options for that:
Job1 writes information to a file which is a value file of a parameter set. These files are stored in a fixed directory. The parameter of the value file could then be used in your sequence to decide your further processing. Details for parameter sets can be found here.
You could use a server job for Job1 and set a user status (BASIC function DSSetUserStatus) in a transformer. This is also passed back to the sequence and can be referenced in subsequent stages of the sequence. See the documentation; you will also find plenty of other information on the internet regarding this topic.
There are more solutions to this problem, or let us call it a challenge. Another way may be a script called at sequence level which queries the database directly and avoids Job1 altogether...

Billing by tag in Google Compute Engine

Google Compute Engine allows for a daily export of a project's itemized bill to a storage bucket (.csv or .json). In the daily file I can see X-number of seconds of N1-Highmem-8 VM usage. Is there a mechanism for further identifying costs, such as per tag or instance group, when a project has many of the same resource type deployed for different functional operations?
As an example, Qty:10 N1-Highmem-8 VM's are deployed to a region in a project. In the daily bill they just display as X-seconds of N1-Highmem-8.
Functionally:
2 VM's might run a database 24x7
3 VM's might run batch analytics operation averaging 2-5 hrs each night
5 VM's might perform a batch operation which runs in sporadic 10 minute intervals through the day
A final operation writes data to a specific GS bucket; other operations read/write to different buckets.
How might costs be broken out across these four operations each day?
The usage logs do not provide 'per-tag' granularity at this time, and they can be a little tricky to work with, but here is what I recommend.
To break down the usage logs further and get better information out of them, I'd recommend working like this:
Your usage logs provide the following fields:
Report Date
MeasurementId
Quantity
Unit
Resource URI
ResourceId
Location
If you look at the MeasurementID, you can choose to filter by the type of image you want to verify. For example VmimageN1Standard_1 is used to represent an n1-standard-1 machine type.
You can then use the MeasurementID in combination with the Resource URI to find out what your usage is on a more granular (per instance) scale. For example, the Resource URI for my test machine would be:
https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/boyan-test-instance
*Note: I've replaced the project and zone with "MY_PROJECT" and "ZONE" here; those, along with the name of the instance, would be specific to your output.
If you look at the end of the URI, you can clearly see which instance that is for. You could then use this to look for a specific instance you're checking.
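The same filtering can be scripted instead of done by hand. The following is a minimal Python sketch, assuming the export is a CSV whose header uses the field names listed above; the sample rows, the instance names and the helper functions are made up for illustration and do not come from a real export.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using the field names listed above.
SAMPLE = """Report Date,MeasurementId,Quantity,Unit,Resource URI,ResourceId,Location
2017-01-01,VmimageN1Highmem_8,86400,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/db-1,111,ZONE
2017-01-01,VmimageN1Highmem_8,7200,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/batch-1,222,ZONE
2017-01-02,VmimageN1Highmem_8,10800,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/batch-1,222,ZONE
2017-01-01,VmimageN1Standard_1,3600,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/web-1,333,ZONE
"""


def instance_name(resource_uri):
    """Return the last path segment of a Resource URI (the instance name)."""
    return resource_uri.rstrip("/").rsplit("/", 1)[-1]


def usage_per_instance(csv_text, measurement_id):
    """Sum Quantity per instance for rows matching one MeasurementId."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["MeasurementId"] == measurement_id:
            totals[instance_name(row["Resource URI"])] += int(row["Quantity"])
    return dict(totals)


highmem_usage = usage_per_instance(SAMPLE, "VmimageN1Highmem_8")
print(highmem_usage)
```

With per-instance totals in hand, instances can be grouped by whatever naming convention identifies the operation they serve, for example a shared name prefix.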
If you are better skilled with Excel or other spreadsheet/analysis software, you may be able to do even better as this is just an idea on how you could use the logs. At that point it becomes somewhat a question of creativity. I am sure you could find good ways to work with the data you gain from an export.
9/2017 update.
It is now possible to add user defined labels, then track usage and billing by these labels for Compute and GCS.
Additionally, by enabling the billing export to Big Query, it is then possible to create custom views or hit Big Query in a tool more friendly to finance people such as Google Docs, Data Studio, or anything which can connect to Big Query. Here is a great example of labels across multiple projects to split costs into something friendlier to organizations, in this case a Data Studio report.