How to ensure distribution of a heavy task to other nodes using dispy? - distributed-computing

I'm currently computing the factorial of 10 random numbers using dispy, which "distributes" the tasks to various nodes.
However, if one of the computations is the factorial of a large number, say factorial(100), that task takes a very long time, yet dispy runs it on only a single node.
How do I make sure that dispy breaks this task down and distributes it across other nodes, so that it doesn't take so long?
Here's the code I have come up with so far, where the factorial of 10 random numbers is calculated and the 5th computation is always factorial(100):
# 'compute' is distributed to each node running 'dispynode'
def compute(n):
    import time, socket
    ans = 1
    for i in range(1, n + 1):
        ans = ans * i
    time.sleep(n)
    host = socket.gethostname()
    return (host, n, ans)

if __name__ == '__main__':
    import dispy, random
    cluster = dispy.JobCluster(compute)
    jobs = []
    for i in range(10):
        # schedule execution of 'compute' on a node (running 'dispynode')
        # with a parameter (random number in this case)
        if i == 5:
            job = cluster.submit(100)
        else:
            job = cluster.submit(random.randint(5, 20))
        job.id = i  # optionally associate an ID to job (if needed later)
        jobs.append(job)
    # cluster.wait()  # waits for all scheduled jobs to finish
    for job in jobs:
        host, n, ans = job()  # waits for job to finish and returns results
        print('%s executed job %s at %s with %s as input and %s as output'
              % (host, job.id, job.start_time, n, ans))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
    cluster.print_status()

Dispy distributes the tasks as you define them - it doesn't make the tasks more granular for you.
You could write your own logic to split a task into smaller pieces first; that's probably pretty easy to do for a factorial. However, I wonder whether in your case the performance problem is due to this line:
time.sleep(n)
For factorial(100), why do you want to sleep 100 seconds?
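If you do want to split the big job yourself, here is a rough sketch of the idea (an illustration only; the product_range helper and the chunk size of 25 are my own choices, not anything dispy provides): compute the factorial as a product of smaller ranges, submit each range as a separate job, and combine the partial products on the client.
def product_range(lo, hi):
    # product lo * (lo+1) * ... * hi; factorial(n) == product_range(1, n)
    ans = 1
    for i in range(lo, hi + 1):
        ans *= i
    return ans

if __name__ == '__main__':
    import dispy
    n, chunk = 100, 25
    cluster = dispy.JobCluster(product_range)
    # one job per range chunk, so several nodes can work on the same factorial
    jobs = [cluster.submit(lo, min(lo + chunk - 1, n))
            for lo in range(1, n + 1, chunk)]
    result = 1
    for job in jobs:
        result *= job()  # wait for each partial product and combine
    print('factorial(%d) has %d digits' % (n, len(str(result))))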

Related

Parallel (Aff) execution with concurrent limit?

What is the way (or ways) of implementing parallel execution with a limit on concurrent processes in terms of Aff? I believe there is no such method in the standard libraries, and I didn't find a good, full answer on this.
parSequenceWithLimit :: Array (Aff X) -> Int -> Aff (Array X)
The Aff X computations should run in parallel, but with no more than the given N concurrent computations. So it starts N of them, and when one completes, the next one (of those left) is started.
For this sort of thing a good mechanism is AVar, which is a blocking mutable cell. It can be conceptually thought of as a one-element blocking queue.
First, an AVar may be either empty or full. You can create an empty one with empty, and then you can "fill" it with a value using put. The useful bit here is that, when you call put and the AVar is already "full", put will block until it's empty again.
Second, you can read the value using take, which will return you the value, but leave the AVar empty at the same time. Similarly to put, if the AVar is empty, take will block until it's full.
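If it helps to map this onto something more familiar, the behaviour described above is roughly that of a one-element blocking queue. Here is a minimal Python analogue, purely for intuition (this is not part of the PureScript answer), using queue.Queue(maxsize=1), where put blocks while the cell is full and get blocks while it is empty:
import queue
import threading
import time

cell = queue.Queue(maxsize=1)   # a one-element blocking cell, like an AVar

def consumer():
    time.sleep(1)               # keep the cell full for a moment
    print('took', cell.get())   # like take: returns the value and empties the cell

threading.Thread(target=consumer).start()
cell.put('first')               # like put: fills the empty cell immediately
cell.put('second')              # blocks until the consumer empties the cell
print('both values accepted')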
So what you can do with it is the following:
Create a single AVar.
Fork off N processes, each of which will take a value from that AVar and process it, then loop. Forever.
Have an orchestrator process, which will iterate over the whole sequence of work and put work items into the AVar.
When all work processes are busy, the orchestrator process will push another value into the AVar, and then will try to push the next one, but will become blocked at this point, because AVar is already full. It will remain blocked until one of the work processes finishes its work and calls take to get the next work item, leaving the AVar empty. This will unblock the orchestrator process, which will immediately push the next work item into AVar, and so on.
The missing bit here is how to stop. If the work processes just do an infinite loop, they will never quit. When the orchestrator process eventually runs out of work and stops filling the AVar, the work processes will just block forever on the take calls. Not good.
So to fight this, have two kinds of work items - (1) actual work and (2) command to stop processing. Then have the orchestrator process first push all the work items, and once that is done, push N commands to stop. Optionally you can push N+1 commands to stop: this will guarantee that the orchestrator process blocks until the last worker has finished.
Putting all of this together, here's a demo program:
module Main where

import Prelude

import Data.Array ((..))
import Data.Foldable (for_)
import Data.Int (toNumber)
import Effect (Effect)
import Effect.AVar (AVar)
import Effect.Aff (Aff, Milliseconds(..), delay, forkAff, launchAff_)
import Effect.Aff.AVar as AVar
import Effect.Class (liftEffect)
import Effect.Console (log)

data Work a = Work a | Done

process :: Int -> AVar (Work Int) -> Aff Unit
process myIndex v = do
  w <- AVar.take v
  case w of
    Done ->
      pure unit
    Work i -> do
      liftEffect $ log $ "Worker " <> show myIndex <> ": Processing " <> show i
      delay $ Milliseconds $ toNumber i
      liftEffect $ log $ "Worker " <> show myIndex <> ": Processed " <> show i
      process myIndex v

main :: Effect Unit
main = launchAff_ do
  var <- AVar.empty
  for_ (1..5) \idx -> forkAff $ process idx var
  let inputs = [100,200,300,300,400,1000,2000,101,102,103,104]
  for_ inputs \i -> AVar.put (Work i) var
  for_ (1..6) \_ -> AVar.put Done var
In this program my work items are just numbers, which signify the number of milliseconds to sleep. I'm using this as a model of how "expensive" each work item is to process. The program output will be something like this:
Worker 1: Processing 100
Worker 2: Processing 200
Worker 3: Processing 300
Worker 4: Processing 300
Worker 5: Processing 400
Worker 1: Processed 100
Worker 1: Processing 1000
Worker 2: Processed 200
Worker 2: Processing 2000
Worker 3: Processed 300
Worker 3: Processing 101
Worker 4: Processed 300
Worker 4: Processing 102
Worker 5: Processed 400
Worker 5: Processing 103
Worker 3: Processed 101
Worker 3: Processing 104
Worker 4: Processed 102
Worker 5: Processed 103
Worker 3: Processed 104
Worker 1: Processed 1000
Worker 2: Processed 2000

flink job is not distributed across machines

I have a small use case in Apache Flink, a batch processing system. I need to process a collection of files, and the processing of each file must be handled by one machine. I have the code below. Only one task slot is ever occupied, and the files are processed one after the other. I have 6 nodes (so 6 task managers) and have configured 4 task slots on each node, so I expect 24 files to be processed at a time.
class MyMapPartitionFunction extends RichMapPartitionFunction[java.io.File, Int] {
  override def mapPartition(
      myfiles: java.lang.Iterable[java.io.File],
      out: org.apache.flink.util.Collector[Int])
    : Unit = {
    var temp = myfiles.iterator()
    while (temp.hasNext()) {
      val fp1 = getRuntimeContext.getDistributedCache.getFile("hadoopRun.sh")
      val file = new File(temp.next().toURI)
      Process(
        "/bin/bash ./run.sh " + argumentsList(3) + "/" + file.getName + " " + argumentsList(7) + "/" + file.getName + ".csv",
        new File(fp1.getAbsoluteFile.getParent))
        .lines
        .foreach { println }
      out.collect(1)
    }
  }
}
I launched Flink with the ./bin/start-cluster.sh command, and the web user interface shows 6 task managers and 24 task slots.
The folders contain about 49 files. When I run mapPartition on this collection, I expect 49 parallel processes to be spawned. But in my infrastructure they are all processed one after the other, which means that only one machine (one task manager) handles all 49 filenames. With the task slots configured as above, I expect 24 files to be processed simultaneously.
Any pointers would surely help here. I have these parameters in the flink-conf.yaml file:
jobmanager.heap.mb: 2048
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 4
taskmanager.memory.preallocate: false
parallelism.default: 24
Thanks in advance. Can someone shed some light on where I am going wrong?
As David described, the problem is that env.fromCollection(Iterable[T]) creates a DataSource with a non-parallel InputFormat. Therefore, the DataSource is executed with a parallelism of 1. The subsequent operators (mapPartition) inherit this parallelism from the source so that they can be chained (this saves one network shuffle).
The way to solve this problem is to either explicitly rebalance the source DataSet via
env.fromCollection(folders).rebalance()
or to explicitly set the desired parallelism on the subsequent operator (mapPartition):
env.fromCollection(folders).mapPartition(...).setParallelism(49)

How to implement 'time' to call functions in Matlab?

I'm trying to simulate in Matlab the traffic of a network of 14 nodes and 21 links. I have a function called "New_Connection" and another called "Close_Connection", among others.
For now I have implemented the traffic using a 'for' loop. In each iteration, "New_Connection" is called; it randomly chooses a source node, a destination node, and a random duration (currently an integer value). The connection may or may not be established (blocked).
Afterwards, the "Close_Connection" function is called, which checks all connection times (stored in an array) and closes any connection whose time has reached 0.
Finally, before the end of the loop, one time unit is subtracted from all established connections except the last one.
What I would like is to perform this simulation using a system that implements a clock (e.g. over 1 minute), where at any instant any node can establish a new connection. For example:
t=0.000134 s ---- Node1 to Node8
t=0.003024 s ---- Node12 to Node11
t=0.003799 s ---- Node6 to Node3
.
.
.
t=59.341432 s ---- Node1 to Node4
And the "Close_Connection" function considers these time to close connections.
I have searched for information on Simulink, SimEvents, Parallel computing, Discrete event simulation... but I can not really understand the functioning.
Thank you very much in advance and apologies for my English.
You don't need to use a complex framework like SimEvents. For simple tasks you can write your own event queue. The following code implements a simple scenario: create a new connection every T = Uniform(0,10) seconds, and release each connection 20 s after it is created.
% max duration
SIMTIME = 60;
T = 0;
NODES = [1:20];

% Constructor for new events. 'command' is a string, 'data' gives the parameters
MAKEEVENT = @(t,c,d)(struct('time',t,'command',c,'data',{d}));

% create event to end simulation
QUEUE(1) = MAKEEVENT(SIMTIME,'ENDSIM',[]);
% create initial event to create the first connection
QUEUE(end+1) = MAKEEVENT(0,'PRODUCECONNECTION',[]);

RUN = true;
while RUN
    [nT,cevent] = min([QUEUE.time]);
    assert(nT >= T,'event was created for the past')
    T = nT;
    EVENT = QUEUE(cevent);
    QUEUE(cevent) = [];
    fprintf('T=%f\n',T)
    switch (EVENT.command)
        case 'ENDSIM'
            % maybe collect data here
            RUN = false;
        case 'PRODUCECONNECTION'
            % standard producer pattern:
            % create a new connection between two random nodes after Uniform(0,10) seconds
            next = rand*10;
            QUEUE(end+1) = MAKEEVENT(T+next,'PRODUCECONNECTION',[]);
            R = randperm(size(NODES,2));
            first = NODES(R(1));
            second = NODES(R(2));
            fprintf('CONNECT NODE %d and %d\n',first,second)
            % connection will last for 20s
            QUEUE(end+1) = MAKEEVENT(T+20,'RELEASECONNECTION',{first,second});
        case 'RELEASECONNECTION'
            first = EVENT.data{1};
            second = EVENT.data{2};
            fprintf('DISCONNECT NODE %d and %d\n',first,second)
    end
end

Python-Multithreading Time Sensitive Task

from random import randrange
from time import sleep
#import thread
from threading import Thread
from Queue import Queue

'''The idea is that there is a Seeker method that would search a location
for tasks. I have no idea how many tasks there will be, could be 1 could be 100.
Each task needs to be put into a thread, does its thing and finishes. I have
stripped down a lot of what this is really supposed to do just to focus on the
correct queuing and threading aspect of the program. The locking was just
me experimenting with locking.'''

class Runner(Thread):
    current_queue_size = 0

    def __init__(self, queue):
        self.queue = queue
        data = queue.get()
        self.ID = data[0]
        self.timer = data[1]
        #self.lock = data[2]
        Runner.current_queue_size += 1
        Thread.__init__(self)

    def run(self):
        #self.lock.acquire()
        print "running {ID}, will run for: {t} seconds.".format(ID = self.ID,
                                                                t = self.timer)
        print "Queue size: {s}".format(s = Runner.current_queue_size)
        sleep(self.timer)
        Runner.current_queue_size -= 1
        print "{ID} done, terminating, ran for {t}".format(ID = self.ID,
                                                           t = self.timer)
        print "Queue size: {s}".format(s = Runner.current_queue_size)
        #self.lock.release()
        sleep(1)
        self.queue.task_done()

def seeker():
    '''Gathers data that would need to enter its own thread.
    For now it just uses a count and random numbers to assign
    both a task ID and a time for each task'''
    queue = Queue()
    queue_item = {}
    count = 1
    #lock = thread.allocate_lock()
    while (count <= 40):
        random_number = randrange(1,350)
        queue_item[count] = random_number
        print "{count} dict ID {key}: value {val}".format(count = count, key = random_number,
                                                          val = random_number)
        count += 1

    for n in queue_item:
        #queue.put((n,queue_item[n],lock))
        queue.put((n,queue_item[n]))
        '''I assume it is OK to put a tuple in and pull it out later'''
        worker = Runner(queue)
        worker.setDaemon(True)
        worker.start()

    worker.join()
    '''Which one of these is necessary and why? The queue object
    joining or the thread object'''
    #queue.join()

if __name__ == '__main__':
    seeker()
I have put most of my questions in the code itself, but to go over the main points (Python 2.7):
I want to make sure I am not creating some massive memory leak for myself later.
I have noticed that when I run it with a count of 40 in PuTTY or VNC on my Linux box, I don't always get all of the output, but when I use IDLE and Aptana on Windows, I do.
Yes, I understand that the point of Queue is to stagger out your threads so you are not flooding your system's memory, but the tasks at hand are time sensitive, so they need to be processed as soon as they are detected, regardless of how many or how few there are; I have found that with a Queue I can clearly dictate when a task has finished, as opposed to letting the garbage collector guess.
I still don't know why I am able to get away with using either .join() on the thread or on the queue object.
Tips, tricks, general help.
Thanks for reading.
If I understand you correctly, you need a thread to monitor something to see if there are tasks that need to be done. If a task is found, you want it to run in parallel with the seeker and the other currently running tasks.
If this is the case then I think you might be going about this wrong. Take a look at how the GIL works in Python. I think what you might really want here is multiprocessing.
Take a look at this from the pydocs:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
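To make that concrete, here is a minimal sketch (my own illustration, not part of the quoted docs) of running the time-sensitive tasks with a multiprocessing.Pool, so each worker is a separate process with its own interpreter and GIL; the run_task function and the pool size of 4 are assumptions for the example:
from multiprocessing import Pool
from random import randrange
from time import sleep

def run_task(item):
    # stand-in for the real work: item is an (ID, duration) pair as in the question
    task_id, duration = item
    sleep(duration)  # simulate a time-consuming task
    return task_id, duration

if __name__ == '__main__':
    tasks = [(n, randrange(1, 5)) for n in range(1, 11)]
    pool = Pool(processes=4)  # four worker processes, each with its own GIL
    for task_id, duration in pool.imap_unordered(run_task, tasks):
        print('task %d done, ran for %d seconds' % (task_id, duration))
    pool.close()
    pool.join()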

tasknumber in MATLAB distributed jobs?

I'm running a distributed job on a cluster. I need to execute a script that sends me an email when the last task finishes (rather, when all the tasks are complete). I have my script ready, but I'm not sure how to go about detecting task completion. Is there a task ID analogous to labindex?
The reason I want to build this email feature into the job is so that I can just quit MATLAB after submission and collect my data when it's done. That way I won't waste resources pinging it often to get its state.
jobMgr = findResource(parameters for your cluster's job manager...);
job = createJob(jobMgr);
set(job, 'JobData', yourdata);
set(job, 'MaximumNumberOfWorkers', yourmaxworkers);
set(job, 'PathDependencies', yourpathdeps);
set(job, 'FileDependencies', yourfiledeps);
set(job, 'Timeout', yourtimeout);

for m = 1:numjobs
    task(m) = createTask(job, @parallelfoo, 1, {m});
    % Calls taskFinish when the task completes
    set(task(m), 'FinishedFcn', {@taskFinish, m});
end
Elsewhere, you'll have defined a function taskFinish that gets a callback when each task completes.
function taskFinish(taskObj, eventData, tasknum)
    disp(['Task ' num2str(tasknum) ' completed']);
end
Note, this code was written for the original release of the Distributed Computing Toolbox (which was subsequently renamed the Parallel Computing Toolbox), so it's possible that there are more elegant ways of accomplishing what you're trying to do. This gets the job done though, with one caveat -- my understanding is that this callback functionality only works if you're running the MATLAB job manager on your cluster (not one of the third party MPI job managers such as TORQUE).
I'm not sure if this answers much of your question, but you can get at the Task ID when running on the workers like so:
t = getCurrentTask();
tid = t.ID;
However, note that most schedulers execute tasks in an arbitrary order...