I've got some code in which the amount of data to be processed grows by some percentage on each iteration of a loop. The first few iterations take 4 seconds, but by the 100th they're taking minutes, and this is for a light selection of parameters; I intend to do 350 iterations. Doing serious research with this would take an enormous amount of time, and it's really inconvenient that simply running a script ties Matlab's hands behind its back until it's all done. On top of that, it hardly ever uses more than one core at a time.
I understand that turning on a Parallel Pool will enable parallel processing. Even if I can't convert any of the for loops into parfor loops, I understand that running a script as a batch job sends that process into the background, so I can do other things with the Matlab interface and the other 7 cores while I wait for this one to finish.
However, although I have the local parallel pool up and running and I've checked the syntax for starting a batch job, the job never leaves the "queued" state. The first time, I typed batch('Script4') and hit Enter, then realized I needed a variable to hold the job, so I ran run1 = batch('Script4'). I typed get(run1,'State') and also checked the Job Monitor, and both told me that its state was "queued".
I did some googling before I came here, and while I found some Q&As of similar experiences, they seemed to be solved by things like waiting for the pool to stop using the whole CPU as it starts up. But I started my pool up a long time ago (and it is still running at this moment!), and when I entered the first batch command, my first clue that something was wrong was that Windows Task Manager said all 8 cores were at 0%.
Is there something I need to call or maybe adjust before it will start executing the queued jobs?
I'm using Matlab R2015a on Windows 7 Enterprise.
I think the problem here is that you're trying to run batch jobs while the parallel pool is open. (Unfortunately, this is a common misunderstanding.) The parallel pool and your batch job both need local workers, and because you opened the parallel pool first, it is holding all of them, so the batch job cannot start. You should have seen a warning when you submitted the batch job, like this:
>> parpool('local');
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
>> j = batch(@rand, 1, {});
Warning: This job will remain queued until the Parallel Pool is closed.
There are two possible fixes. The first is simple:
delete(gcp('nocreate'))
will ensure no parallel pool is open, and your batch submissions should then proceed. The second is more appropriate if your tasks are relatively short-lived: you can use parfeval to submit work to an open parallel pool:
f = parfeval(@rand, 1); % run 'rand' asynchronously on a worker in the current pool
r = fetchOutputs(f);    % wait for completion and retrieve the result
This is purely for a non-eager pytest mode of operation. I want to know when Celery has "caught up" with all the outstanding work. Is there any way to find that out? My test config has a celery_session_app and a single celery_session_worker in its own thread. Approaches I've considered:
Check the number of entries in the RabbitMQ queue. This has problems because of prefetch. I could set prefetch to 1 and maybe solve it that way, but I worry about race conditions. (I'm testing chords, and some Celery tasks queue other Celery tasks.)
Add a task to the "end" of the queue and then .wait() on it to finish. This has problems for tasks that queue other tasks: the queue is being extended in the other thread, so my task may be at the end when it's queued but quickly stops being the end as more tasks are queued behind it. I can work around this with .apply_async(countdown=3), but that is pretty much the definition of a race condition: I might need countdown=4, or I might need nothing, and either way some seconds are wasted on every test.
Use signals (somehow). But what I really need is a worker_is_bored signal, which does not exist and would suffer from the same kind of race condition described above: tasks queueing tasks could make it flash "bored" and then go right back to "busy".
time.sleep(N), but what should N be? (I'm running pytest -n 10, so how busy the machine is during tests is hard to predict.) And this wastes time, just like countdown= above.
I have a memory leak in an app that I cannot fix, so I have worked around it by setting CELERY_WORKER_MAX_MEMORY_PER_CHILD in my Django app settings. It appears to be working, in that workers that reach the memory limit are reset, but those workers are processing tasks that belong to a group within a chain that looks like:
chain(setup | group(job1, job2) | call_back)()
After a worker hits the memory limit while processing one of the jobs in the group, the call_back never seems to get called, because celery.chord_unlock loops indefinitely. Does CELERY_WORKER_MAX_MEMORY_PER_CHILD only work with individual tasks (and not within chains or chords)?
The max-memory-per-child setting most likely works.
I suspect one of two possibilities:
A task or tasks executed as part of your chord reached the maximum number of retries.
There is an issue with chords (a bug) that prevents the callback from being called.
Run your workers with the DEBUG or INFO log level and see what is going on.
When a MATLAB parallel pool is started for the first time, it typically takes a few seconds. In a user-interactive application there is therefore an incentive to make sure a parallel pool is running before the first demand for computational tasks arrives, so that starting the pool isn't added to the time it takes to respond to the request.
However, every programmatic action I've seen that starts a parallel pool, such as parpool, blocks execution until the pool has finished starting up. This means that even if the user has no need for the parallel pool for some time, they cannot do anything else, such as begin setting up their computationally expensive request (filling in a user interface, for instance), until the pool is ready.
This is very frustrating! For any other time-consuming preparatory action, once a parallel pool was in place, the work could be done in the background using parfeval and wouldn't obstruct the user's workflow until a request actually depended on it. But because this task is what creates the parallel pool in the first place, it seems users must wait for something they may not actually need until long after it completes.
Is there any way around this apparent limitation on usability?
There is currently no way to launch a parallel pool in the background. There are a couple of potential mitigations that might help:
Don't ever call parpool explicitly; instead, rely on automatic pool creation, so that the pool is only created when you first hit a parallel language construct such as parfor, parfeval, or spmd.
If you're using a cluster that might not be able to service your request for workers for some time, you could use batch to launch the whole computation in the background. (This is probably only appropriate if you've got a fairly long-running computation.)
Say you have a message queue that needs to be polled every x seconds. What are the usual ways to poll it and execute HTTP/REST-based jobs? Do you simply create a cron job that calls the worker script every x seconds?
Note: This is for a web application
I would write a Windows service that constantly polls or waits for new messages.
Scheduling a program to run every x minutes has a number of problems:
If your interval is too small, the program will still be running when the next run is triggered.
If your interval is too big, the queue will fill up between runs.
Generally you expect a constant stream of messages, so there is no problem keeping the program running 24/7.
One common feature of the message queue systems I've worked with is that you don't poll but use a blocking read. If you have more than one waiting worker, the queue system will pick which one gets to process the message.
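A minimal sketch of the blocking-read pattern described above, assuming Python's standard-library queue.Queue as a stand-in for a real message broker (real brokers expose their own blocking consume calls). Each long-running worker blocks on get() instead of waking up on a timer, and the queue hands each message to exactly one waiting worker. The handle_message function and the STOP shutdown sentinel are hypothetical placeholders.

```python
import queue
import threading

STOP = None  # hypothetical sentinel telling a worker to shut down


def handle_message(msg):
    # Placeholder for the real HTTP/REST job; here it just uppercases the text.
    return msg.upper()


def worker(inbox, results, lock):
    while True:
        msg = inbox.get()  # blocks until a message arrives; no polling interval
        try:
            if msg is STOP:
                return
            out = handle_message(msg)
            with lock:
                results.append(out)
        finally:
            inbox.task_done()  # runs for the sentinel too, before returning


def run(messages, n_workers=3):
    inbox = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(inbox, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for m in messages:
        inbox.put(m)
    for _ in threads:
        inbox.put(STOP)  # one sentinel per worker so every thread exits
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run(["a", "b", "c", "d"])))
```

Because the workers stay alive and block, there is no interval to tune: a message is picked up as soon as it arrives, and runs never overlap the way scheduled invocations can.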
I'm working on a system that uses several hundreds of workers in parallel (physical devices evaluating small tasks). Some workers are faster than others so I was wondering what the easiest way to load balance tasks on them without a priori knowledge of their speed.
I was thinking about tracking the number of tasks each worker is currently working on with a simple counter, and then sorting the list to get the worker with the lowest active task count. This way slow workers would still get some tasks but would not slow down the whole system. The reason I'm asking is that the current round-robin method is causing holdups: some really slow workers (100 times slower than others) keep accumulating tasks and blocking new tasks.
It should be a simple matter of sorting the list by the current number of active tasks, but since I would be sorting it several times a second (the average work time per task is below 25 ms), I fear this might become a major bottleneck. Is there a simple way to get the worker with the lowest task count without sorting over and over again?
EDIT: The tasks are pushed to the workers over an open TCP connection. Since the dependencies between the tasks are rather complex (exclusive resource usage), let's say that all tasks are assigned to start with. As soon as a task returns from a worker, all tasks that are no longer blocked are queued, and a new task is pushed to that worker. The work queue will never be empty.
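One way to get the least-loaded worker without re-sorting the whole list is a min-heap keyed on active task count. The sketch below is a hypothetical Python illustration (the LeastLoaded class and its methods are not part of any existing system). It uses lazy deletion: whenever a worker's count changes, a fresh (count, worker) entry is pushed, and stale entries are discarded when they reach the top, so each assignment or completion costs O(log n) instead of a full sort. Worker ids are assumed to be comparable (e.g. ints or strings) so that ties can be broken.

```python
import heapq


class LeastLoaded:
    """Hands out the worker with the fewest active tasks using a lazy min-heap."""

    def __init__(self, worker_ids):
        self.active = {w: 0 for w in worker_ids}  # current active-task count per worker
        self.heap = [(0, w) for w in worker_ids]
        heapq.heapify(self.heap)

    def _push(self, worker):
        heapq.heappush(self.heap, (self.active[worker], worker))
        if len(self.heap) > 4 * len(self.active):
            # Too many stale entries have piled up: rebuild from current counts.
            self.heap = [(c, w) for w, c in self.active.items()]
            heapq.heapify(self.heap)

    def assign(self):
        """Return the least-loaded worker and count one more active task for it."""
        # Drop stale entries until the top matches that worker's current count.
        while self.heap[0][0] != self.active[self.heap[0][1]]:
            heapq.heappop(self.heap)
        _, worker = heapq.heappop(self.heap)
        self.active[worker] += 1
        self._push(worker)
        return worker

    def complete(self, worker):
        """Record that one of this worker's tasks has returned."""
        self.active[worker] -= 1
        self._push(worker)
```

Every worker always has at least one heap entry matching its current count, so after the stale entries are skipped, the top of the heap is guaranteed to be a worker with the minimum count.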
How about this system:
Worker reaches the end of its task queue
Worker requests more tasks from load balancer
Load balancer assigns N tasks (where N is probably more than 1, perhaps 20 - 50 if these tasks are very small).
In this system, since you are assigning new tasks when the workers are actually done, you don't have to guess at how long the remaining tasks will take.
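The pull-based scheme above can be illustrated with a small discrete-event simulation. This is a sketch under stated assumptions, not a model of the real devices: each worker has a fixed, hypothetical time per task (in integer time units), and whenever its local queue runs dry it asks the balancer for up to batch_size more tasks. No speeds are known in advance; faster workers simply come back for more work more often.

```python
import heapq
from collections import deque


def simulate_pull(task_count, task_time, batch_size):
    """Simulate workers that pull `batch_size` tasks at a time when they run dry.

    task_time maps worker id -> time units per task (hypothetical speeds).
    Returns (tasks handled per worker, time at which the last task finishes).
    """
    pending = deque(range(task_count))  # the load balancer's work queue
    done = {w: 0 for w in task_time}
    # Event heap of (time the worker's local queue runs dry, worker id).
    events = [(0, w) for w in task_time]
    heapq.heapify(events)
    finish = 0
    while events:
        now, w = heapq.heappop(events)
        if not pending:
            continue  # nothing left to hand out; this worker stays idle
        # The worker requests more work; the balancer hands over up to batch_size tasks.
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        done[w] += len(batch)
        end = now + len(batch) * task_time[w]
        finish = max(finish, end)
        heapq.heappush(events, (end, w))
    return done, finish


if __name__ == "__main__":
    speeds = {"fast1": 1, "fast2": 1, "slow": 100}  # hypothetical: one device is 100x slower
    for n in (1, 20):
        print(n, simulate_pull(1000, speeds, n))
```

In this toy run the slow device handles far fewer tasks than the fast ones without any explicit speed tracking. It also shows the trade-off in choosing N: larger batches mean fewer requests to the balancer, but a very slow device can end up holding its last batch long after the faster devices have drained the queue.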
I think that you need to provide more information about the system:
How do you get a task to a worker? Does the worker request it or does it get pushed?
How do you know if a worker is out of work, or even how much work is it doing?
How are the physical devices modeled?
What you want to do is avoid tracking anything and find a more passive way to distribute the work.