I have a problem, which is similar to Assignment Problem, described as follows:
The problem instance has a number of workers and a number of tasks. Each task can only be assigned to a subset of the workers, and each task must be assigned to exactly one worker. The tasks have different difficulties, so a worker will spend a different amount of time on each task. The goal is to assign the tasks as uniformly as possible: all workers should spend almost the same total time finishing their assigned tasks.
I can translate this problem into an integer programming problem whose cost function makes the workers' total times as equal as possible. (Strictly speaking, minimizing the variance is a quadratic objective; to stay within integer *linear* programming one can instead minimize the maximum total load, or the spread between the largest and smallest load.)
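To make that concrete, here is a minimal MATLAB sketch of the kind of model I mean, using intlinprog from the Optimization Toolbox (all data is made up; since the variance is quadratic, the sketch minimizes the maximum load instead, a common linear surrogate for balancing):

nW = 3; nT = 6;                          % example sizes
t = [4 2 7 3 5 1];                       % made-up task times
eligible = true(nW, nT);                 % eligible(i,j): may worker i do task j?
nX = nW * nT;                            % x(i,j) flattened column-wise, plus T at the end
f = [zeros(nX, 1); 1];                   % objective: minimize T (the max load)
intcon = 1:nX;                           % x binary (via the bounds), T continuous
% each task is assigned to exactly one worker
Aeq = zeros(nT, nX + 1);  beq = ones(nT, 1);
for j = 1:nT
    Aeq(j, (j-1)*nW + (1:nW)) = 1;
end
% every worker's load stays below T: sum_j t(j)*x(i,j) - T <= 0
A = zeros(nW, nX + 1);  b = zeros(nW, 1);
for i = 1:nW
    A(i, i:nW:nX) = t;
    A(i, end) = -1;
end
lb = zeros(nX + 1, 1);
ub = [reshape(double(eligible), nX, 1); Inf];   % forces x(i,j) = 0 if ineligible
sol = intlinprog(f, intcon, A, b, Aeq, beq, lb, ub);
assignment = reshape(sol(1:nX), nW, nT) > 0.5;  % assignment(i,j): task j -> worker i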
Is this problem well studied? Are there any algorithms that solve it more efficiently? (An approximation is acceptable.)
Thanks in advance!
We're generating the kind of data we might get from a shop floor, in order to run, test, and validate our machine learning models. We start with a discrete event simulation model of our manufacturing system. Each production order is modelled as an agent, which goes through different processes, each with a queue (waiting time) and delays (first production time, then logistics time).
But sometimes we have one process, for example printing (code 5A, after the second Select5Output), with three different machines that do not have a fixed capacity. At that point we divide the order into parts and send them to those machines (rather randomly and subjectively).
The data we use comes from flowchart_process_states_log in the database.
My questions here are:
How can we define the number of products in each order? E.g. we're printing cards: one order may be 10k, another 8k or 33k. Can we define it as a parameter of the agent? And how can we vary it stochastically (no exact numbers needed)?
How can we split those 10k cards across three different machines, and afterwards get back one complete agent with 10k? The agent ID should remain the same, since we trace and analyse the agents in the ML model. Is it reasonable to model an order as an agent?
How can we multiply the number of pieces an agent carries after a process? E.g. after cutting 10k pieces we have 20k.
We have a distribution for each delay, e.g. a triangular distribution. But we also want occasional disturbances, where the delay suddenly takes 2 days instead of the usual 3-4 hours. How can we model that? (A sketch follows below.)
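On the last point, one option is a two-branch mixture: with a small probability, draw from a "disturbance" distribution instead of the normal one. A minimal MATLAB sketch of the sampling logic (all numbers are made up; makedist requires the Statistics and Machine Learning Toolbox, and the same two-branch expression would go into the simulation tool's delay block):

pdNormal = makedist('Triangular', 'A', 3, 'B', 3.5, 'C', 4);  % normal delay, hours
pDisturb = 0.02;                              % assumed disturbance probability
nSamples = 1000;
delays = zeros(nSamples, 1);
for k = 1:nSamples
    if rand() < pDisturb
        delays(k) = 48;                       % rare disturbance: about 2 days
    else
        delays(k) = random(pdNormal);         % usual 3-4 hour delay
    end
end
histogram(delays)                             % heavy right tail from the disturbances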
Thank you in advance for your effort. Every bit of help is highly appreciated, because we're all here learning together. Thank you!!
I am struggling to find a way to implement an asynchronous evolutionary process in MATLAB.
I've already implemented a "standard" GA that evaluates the fitness of individuals in parallel using MATLAB's parfor routine. However, this means that the workers in my parallel pool have to wait for the last worker to finish its assessment before I can create a new generation and continue with the parallel fitness evaluation. When fitness assessment times are very heterogeneous, this approach can involve significant idle time.
The solution suggested in the literature is sometimes called asynchronous evolution. It means there are no generations anymore; instead, every time a worker becomes idle, we immediately let it breed a new individual and evaluate its fitness. As such there is hardly any idle time, since no worker has to wait for the others to complete their fitness assessments. The only problem is that the parallel threads/workers need to work on the same population, i.e. they need to communicate with each other whenever they breed a new individual (because they must select parents from a population that is constantly being altered by the workers).
Now ideally you would control each worker separately, letting it access the joint population to select parents, evaluate the offspring's fitness, and then replace a parent in the joint population. That is an ongoing iterative process in which parallel workers exchange and alter joint information while performing independent fitness evaluations.
My big question is: How can this be achieved in MATLAB?
Things I've already tried:
MATLAB's spmd functionality with the labSend and labReceive functions, but this also seems to involve waiting for other workers.
Creating jobs and assigning them tasks. This does not seem to work, as multiple jobs are processed sequentially on the cluster. Only the tasks within a job use multiple workers, but that fails because you can't assign tasks dynamically (you always have to submit the whole job anew to the queue).
parfeval in a recursive while statement. I do not see how this would reduce waiting time.
So does anyone know a way how to implement something like this in MATLAB?
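To make the target pattern concrete, here is a rough sketch of the steady-state loop I have in mind, based on parfeval and fetchNext (fitnessFcn, breed, and replaceWorst are placeholders for my own routines); my doubt is whether this genuinely keeps all workers busy:

pool = gcp();                                     % current parallel pool
nGenes = 10; popSize = 50; maxEvals = 1000;       % made-up sizes
population = rand(popSize, nGenes);               % joint population, kept on the client
fitness = inf(popSize, 1);
pending = cell(pool.NumWorkers, 1);
% seed one evaluation per worker
for k = 1:pool.NumWorkers
    pending{k} = breed(population, fitness);      % select parents, crossover, mutate
    futures(k) = parfeval(@fitnessFcn, 1, pending{k});
end
% steady-state loop: whenever ANY worker finishes, fold its result into the
% joint population and immediately hand that worker a new child
for evalCount = 1:maxEvals
    [idx, f] = fetchNext(futures);                % returns as soon as one is done
    [population, fitness] = replaceWorst(population, fitness, pending{idx}, f);
    pending{idx} = breed(population, fitness);
    futures(idx) = parfeval(@fitnessFcn, 1, pending{idx});
end
cancel(futures);                                  % drop the evaluations still in flight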
I have implemented a combinatorial search algorithm (for comparison to a more efficient optimization technique) and tried to improve its runtime with parfor.
Unfortunately, the work assignments appear to be very badly unbalanced.
Each subitem i has a complexity of approximately nCr(N - i, 3). As you can see, the tasks with i < N/4 involve significantly more work than those with i > 3N/4, yet MATLAB seems to be assigning all of i < N/4 to a single worker. A quick check below illustrates how skewed this is.
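(Rough estimate of the imbalance, with a made-up N; the max(...) only guards the last few iterations where N - i < 3:)

N = 200;                                          % example problem size
work = arrayfun(@(i) nchoosek(max(N - i, 3), 3), 1:N);
fprintf('work ratio, first vs last quarter: %.0f\n', ...
    sum(work(1:floor(N/4))) / sum(work(ceil(3*N/4):N)));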
Is it true that MATLAB divides the work based on equally sized subsets of the loop range?
No, this question cites the documentation saying it does not.
Is there a convenient way to rebalance this without hardcoding the number of workers (e.g. if I require exactly 4 workers in the pool, then I could swap the two lowest bits of i with two higher bits in order to ensure each worker received some mix of easy and hard tasks)?
I don't think a full "work-stealing" implementation is necessary; it would be enough to assign items 1, 2, 3, 4 to the workers, and when, say, item 4 completes first, its worker begins on item 5, and so on. Each item involves enough work that I'm not too worried about the increased communication overhead.
If the loop iterations are indeed distributed ahead of time (which would mean that in the end there could be a single worker still having to complete several iterations while the other workers are idle - is this really the case?), the easiest way to ensure a mix is to randomly permute the loop iterations:
permutedIterations = randperm(nIterations);
permutedResults = cell(nIterations,1); %# or whatever else you use for storing results
%# run the parfor loop, completing iterations in permuted order
parfor iIter = 1:nIterations
    permutedResults{iIter} = f(permutedIterations(iIter));
end
%# undo the permutation so that results are indexed by the original iteration
results = cell(nIterations,1);
results(permutedIterations) = permutedResults;
I'm working with a long-running parfor loop in MATLAB.
parfor iter = 1:1000
    chunk_of_work(iter);
end
There are generally about 2-3 timing outliers per run. That is to say for every 1000 chunks of work performed there are 2-3 that take about 100 times longer than the rest. As the loop nears completion, the workers that evaluated the outliers continue to run while the rest of the workers have no computational load.
This is consistent with the parfor loop distributing work statically. It contradicts the documentation for the Parallel Computing Toolbox found here:
"Work distribution is dynamic. Instead of being allocated a fixed
iteration range, the workers are allocated a new iteration only after
they finish processing their current iteration, which results in an
even work load distribution."
Any ideas about what's going on?
I think the doc you quote gives a pretty good description of what would be considered a static allocation of work: each worker "being allocated a fixed iteration range". For 4 workers, this would mean the first being assigned iter 1:250, the second iter 251:500, and so on; or 1:4:1000 for the first, 2:4:1000 for the second, and so on.
You did not say exactly what you observe, but what you describe is entirely consistent with dynamic workload distribution: first, the four (example) workers each work on one iteration; the first one to finish works on a fifth, the next one done (which may well be the same worker, if three of the first four take somewhat longer) works on a sixth, and so on. Now if your outliers are, say, numbers 20, 850 and 900 in the order in which MATLAB chooses to process the loop iterations, and each takes 100 times as long, this only means that iterations 21 to 320 will be handled by three of the four workers while one is busy with number 20 (assuming a roughly even distribution of non-outlier calculation time, it will be done by around iteration 320). The worker assigned the 850th iteration will, however, continue to run even after another worker has finished #1000, and the same holds for #900. In fact, if there were about 1100 iterations, the worker on #900 would finish at roughly the same time as the others.
[edited, as the original wording implied MATLAB would still assign the iterations of the parfor loop in order from 1 to 1000, which should not be assumed]
So long story short, unless you find a way to process your outliers first (which of course requires you to know a priori which ones are the outliers, and to find a way to make MATLAB start the parfor loop processing with these), dynamic workload distribution alone cannot avoid the effect you observe.
Addition: I think, however, that your observation that as "the loop nears completion, the *workers* that evaluated the outliers continue to run" seems to imply at least one of the following:
The outliers somehow are among the last iterations MATLAB starts to process
You have many workers, in the order of magnitude of the number of iterations
Your estimate of the number of outliers (2-3) or your estimate of their computation time penalty (factor 100) is too low
The work distribution in PARFOR is somewhat deterministic. You can observe precisely what's going on by having each worker log to disk how things go, but basically it turns out that PARFOR divides your loop up into chunks in a deterministic way, but farms them out dynamically. Unfortunately, there's currently no way to control that chunking.
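For instance, a quick way to see which worker handles which iteration is to print the worker identity from inside the loop (a sketch; this prints to the client command window rather than logging to disk, and getCurrentTask returns empty if the loop runs serially):

parfor iter = 1:1000
    t = getCurrentTask();                        % identifies the executing worker
    fprintf('iteration %4d ran on worker %d\n', iter, t.ID);
    chunk_of_work(iter);
end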
However, if you cannot predict which of your 1000 cases are going to be outliers, it's hard to imagine an efficient scheme for distributing the work.
If you can predict your outliers, you might be able to take advantage of the fact that roughly speaking, PARFOR executes loop iterations in reverse order, so you could put them at the "end" of the loop so work starts on them immediately.
The problem you face is well described in @arne.b's answer; I have nothing to add to that.
But the Parallel Computing Toolbox does contain functions for decomposing a job into tasks for independent execution. From your question it is not possible to conclude whether this is suitable for your application. If it is, the general strategy is to break the job into tasks of some size, have each processor tackle a task, and when finished go back to the stack of unfinished tasks and start on another.
You might be able to decompose your problem so that one task replaces one loop iteration (many tasks, lots of overhead in managing the computation, but the best load balancing), or so that one task replaces N loop iterations (fewer tasks, less overhead, poorer load balancing). Jobs and tasks are a little trickier to implement than parfor, too; a minimal sketch follows.
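(For illustration, the one-task-per-iteration variant, assuming the default cluster profile and that chunk_of_work returns one output:)

c = parcluster();                                % default cluster profile
job = createJob(c);
for idx = 1:1000
    createTask(job, @chunk_of_work, 1, {idx});   % one task per loop iteration
end
submit(job);
wait(job);
results = fetchOutputs(job);                     % 1000-by-1 cell array of outputs
delete(job);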
As an alternative to PARFOR, in R2013b and later, you can use PARFEVAL and divide up the work any way you see fit. You could even cancel the 'timing outliers' once you've got sufficient results, if that's appropriate. There is, of course, overhead when dividing up your existing loop into 1000 individual remote PARFEVAL calls. Perhaps that's a problem, perhaps not. Here's the sort of thing I'm imagining:
results = cell(1000, 1);
for idx = 1:1000
    futures(idx) = parfeval(@chunk_of_work, 1, idx);
end
done = false; numComplete = 0;
timer = tic();
while ~done
    [idx, result] = fetchNext(futures, 10); % wait up to 10 seconds
    if ~isempty(idx)
        numComplete = numComplete + 1;
        results{idx} = result; % stash result
    end
    done = (numComplete == 1000) || (toc(timer) > 100);
end
% cancel outstanding work, has no effect on completed futures
cancel(futures);