How to make dworkers for multiprocess? - ipython

I am working on distributed cluster computing. To implement such a system I am trying to use the Python library dask.distributed. But there is a problem: the dworkers don't behave like multiprocessing workers. Two or three dworkers work together, but they don't support multiple concurrent executions the way the multiprocessing library does.
For example:
def testFun():
    while True:
        time.sleep(3)
        print('looping')
If I execute this function with client.submit(testFun), it will run forever and never get to the next step. For example, in this program:
client.submit(testFun)
client.submit(testFun)
Here, until the first line finishes, execution never reaches the second line.
I want to make the dworkers handle this kind of multiprocessing. How can I do this?

That's because both calls submit the same function with the same arguments, so they hash to the same task and it only runs one time.
You can tell by the key that is generated. See:
In [5]: client.submit(testFun)
<Future: status: pending, key: testFun-a4102f4653c498f9fafc90003d87bd08>
In [6]: client.submit(testFun)
<Future: status: pending, key: testFun-a4102f4653c498f9fafc90003d87bd08>
Try this
def testFun(x):
    while True:
        time.sleep(3)
        print('looping', x)
In [13]: client.submit(testFun, 1)
<Future: status: pending, key: testFun-afa640a088a357e5f8dd46c1937af3a7>
In [14]: client.submit(testFun, 2)
<Future: status: pending, key: testFun-98309530cb5b26d69131e54a521b8b40>
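If you really do want to run the identical call more than once, another option (a minimal sketch, assuming a running Client) is to pass pure=False so client.submit generates a fresh key for every submission instead of deduplicating:
import time
from dask.distributed import Client

client = Client()  # assumes a scheduler and workers are already available

def testFun():
    while True:
        time.sleep(3)
        print('looping')

# pure=False forces a unique key per call, so both tasks get scheduled
f1 = client.submit(testFun, pure=False)
f2 = client.submit(testFun, pure=False)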

Parallel (Aff) execution with concurrent limit?

What is the way (or ways) of implementing parallel execution with a limit on the number of concurrent computations, in terms of Aff? I believe there is no method for this in the standard libraries, and I didn't find a good, full answer on this.
parSequenceWithLmit :: Array (Aff X) -> Int -> Aff (Array X)
The Aff X computations should run in parallel, but with no more than the given N running concurrently. So it starts N computations, and whenever one finishes, the next one (of those left) is started.
For this sort of thing a good mechanism is AVar, which is a blocking mutable cell. It can be conceptually thought of as a one-element blocking queue.
First, an AVar may be either empty or full. You can create an empty one with empty, and then you can "fill" it with a value using put. The useful bit here is that, when you call put and the AVar is already "full", put will block until it's empty again.
Second, you can read the value using take, which will return you the value, but leave the AVar empty at the same time. Similarly to put, if the AVar is empty, take will block until it's full.
So what you can do with it is the following:
Create a single AVar.
Fork off N processes, each of which will take a value from that AVar and process it, then loop. Forever.
Have an orchestrator process, which will iterate over the whole sequence of work and put work items into the AVar.
When all work processes are busy, the orchestrator process will push another value into the AVar, and then will try to push the next one, but will become blocked at this point, because AVar is already full. It will remain blocked until one of the work processes finishes its work and calls take to get the next work item, leaving the AVar empty. This will unblock the orchestrator process, which will immediately push the next work item into AVar, and so on.
The missing bit here is how to stop. If the work processes just do an infinite loop, they will never quit. When the orchestrator process eventually runs out of work and stops filling the AVar, the work processes will just block forever on the take calls. Not good.
So to fight this, have two kinds of work items - (1) actual work and (2) command to stop processing. Then have the orchestrator process first push all the work items, and once that is done, push N commands to stop. Optionally you can push N+1 commands to stop: this will guarantee that the orchestrator process blocks until the last worker has finished.
Putting all of this together, here's a demo program:
module Main where

import Prelude
import Data.Array ((..))
import Data.Foldable (for_)
import Data.Int (toNumber)
import Effect (Effect)
import Effect.AVar (AVar)
import Effect.Aff (Aff, Milliseconds(..), delay, forkAff, launchAff_)
import Effect.Aff.AVar as AVar
import Effect.Class (liftEffect)
import Effect.Console (log)

data Work a = Work a | Done

process :: Int -> AVar (Work Int) -> Aff Unit
process myIndex v = do
  w <- AVar.take v
  case w of
    Done ->
      pure unit
    Work i -> do
      liftEffect $ log $ "Worker " <> show myIndex <> ": Processing " <> show i
      delay $ Milliseconds $ toNumber i
      liftEffect $ log $ "Worker " <> show myIndex <> ": Processed " <> show i
      process myIndex v

main :: Effect Unit
main = launchAff_ do
  var <- AVar.empty
  for_ (1..5) \idx -> forkAff $ process idx var
  let inputs = [100,200,300,300,400,1000,2000,101,102,103,104]
  for_ inputs \i -> AVar.put (Work i) var
  for_ (1..6) \_ -> AVar.put Done var
In this program my work items are just numbers, which signify the number of milliseconds to sleep. I'm using this as a model of how "expensive" each work item is to process. The program output will be something like this:
Worker 1: Processing 100
Worker 2: Processing 200
Worker 3: Processing 300
Worker 4: Processing 300
Worker 5: Processing 400
Worker 1: Processed 100
Worker 1: Processing 1000
Worker 2: Processed 200
Worker 2: Processing 2000
Worker 3: Processed 300
Worker 3: Processing 101
Worker 4: Processed 300
Worker 4: Processing 102
Worker 5: Processed 400
Worker 5: Processing 103
Worker 3: Processed 101
Worker 3: Processing 104
Worker 4: Processed 102
Worker 5: Processed 103
Worker 3: Processed 104
Worker 1: Processed 1000
Worker 2: Processed 2000

Abort second job activity if first one aborts

I want to abort the second job activity if the first job activity aborts due to an environment issue or a manual abort.
I went through the triggering options but couldn't get it to work.
Can anyone help me?
OK, you can't abort a job without running it. So you would need to be able to invoke Job 2 with knowledge of Job 1's status (e.g. from the activity variable Job1.$JobStatus). This could be used to cause Job 2 to abort.
The cleanest solution would be a before-job subroutine, which sets its return code to 0 if Job 1's status was DSJS.RUNOK or DSJS.RUNWARN (these are DataStage constants), or to a non-zero value if Job 1's status was DSJS.RUNFATAL or any other value. This is the cleanest approach because the before-job subroutine could write a message to the job log indicating precisely why the job was being aborted.
A less clean way would be to have a parameter in Job 2 of type, say, Integer (anything other than string), and set its value in the Job Activity using an expression such as If Job1.$JobStatus = DSJS.RUNFATAL Then "" Else 1 - setting a non-string parameter to "" will cause the job to abort with DSJE.PARAMBADVALUE error.
You will need to show us exactly what you did. The use of a trigger, which I explained earlier, is the correct solution. If you did it correctly, the arrow on the design canvas should be red.
Alternatively you could change your sequence to bypass Job 2 entirely, and instead simply log a message that Job 2 was being bypassed because Job 1 aborted.

Pulling ipywidget value manually

I have this IPython code:
import ipywidgets as widgets
from IPython.display import display
import time

w = widgets.Dropdown(
    options=['Addition', 'Multiplication', 'Subtraction'],
    value='Addition',
    description='Task:',
)

def on_change(change):
    print("changed to %s" % change['new'])

w.observe(on_change)
display(w)
It works as expected. When the value of the widget changes, the on_change function gets triggered. However, I want to run a long computation and periodically check for updates to the widget. For example:
for i in range(100):
    time.sleep(1)
    # pull for changes to w here.
    # if w.has_changed:
    #     print(w.value)
How can I achieve this?
For reference, I seem to be able to do the desired polling with
import IPython
ipython = IPython.get_ipython()
ipython.kernel.do_one_iteration()
(I'd still love to have some feedback on whether this works by accident or design.)
I think you need to use threads and hook into the ZMQ event loop. This gist illustrates an example:
https://gist.github.com/maartenbreddels/3378e8257bf0ee18cfcbdacce6e6a77e
Also see https://github.com/jupyter-widgets/ipywidgets/issues/642.
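A minimal sketch of that idea, assuming the w Dropdown from the question already exists: run the long computation in a background thread so the kernel's main loop stays free to process widget messages, and poll w.value from the thread. This only illustrates the approach; it is not the gist's exact code:
import threading
import time

def long_computation():
    # Runs in a background thread, so the kernel keeps processing
    # comm messages and w.value stays up to date.
    last = w.value
    for i in range(100):
        time.sleep(1)
        if w.value != last:
            print("changed to %s" % w.value)
            last = w.value

threading.Thread(target=long_computation, daemon=True).start()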
To elaborate on the OP's self-answer, this does work. It forces the widgets to sync with the kernel at an arbitrary point in the loop. This can be done right before accessing widget.value.
So the full solution would be:
import time
import IPython

ipython = IPython.get_ipython()

last_val = w.value
for i in range(100):
    time.sleep(1)
    ipython.kernel.do_one_iteration()  # let the kernel process pending widget messages
    new_val = w.value
    if new_val != last_val:
        print(new_val)
        last_val = new_val
A slight improvement to the ipython.kernel.do_one_iteration call used above:
# Max iteration limit, in case I don't know what I'm doing here...
for _ in range(100):
    ipython.kernel.do_one_iteration()
    if ipython.kernel.msg_queue.empty():
        break
In my case, I had a number of UI elements that could be clicked multiple times between do_one_iteration calls, which processes them one at a time, and with a 1 second delay that could get annoying. This drains up to 100 pending messages at once. I tested it by mashing a button repeatedly, and now they all get processed as soon as the sleep(1) ends.

Simpy 3.0.4, setting resource priority

I am having trouble with resource priority in simpy. Consider the following code:
import simpy

env = simpy.Environment()
res = simpy.PriorityResource(env, capacity = 1)

def go(id):
    with res.request(priority = id) as req:
        yield req
        print id, res

env.process(go(3))
env.process(go(2))
env.process(go(4))
env.process(go(5))
env.process(go(1))
env.run()
Lower number means higher priority, so I should get 1, 2, 3, 4, 5. But instead I am getting 3, 1, 2, 4, 5. So the first output is wrong; after that it's sorted!
Thanks in advance for your help.
This is correct. When "3" requests the resource, it is empty so it gets the
slot. The remaining processes have to queue and will get the resource in the
order 1, 2, 4, 5.
If you use the PreemptiveResource instead (like request(priority=id,
preempt=True)), 3 will still get the resource first but will be preempted by
2. 2 will then get preempted by 1. 2 and 3 would then have to request the
resource again to gain access to it.
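A minimal sketch of that preemptive variant (assuming SimPy 3; the exact output depends on timing, and a preempted process receives an Interrupt it has to handle):
import simpy

env = simpy.Environment()
res = simpy.PreemptiveResource(env, capacity=1)

def go(env, id):
    with res.request(priority=id, preempt=True) as req:
        try:
            yield req
            yield env.timeout(1)        # hold the resource for a while
            print(id, 'finished')
        except simpy.Interrupt:
            print(id, 'was preempted')  # would have to request again to finish

for i in (3, 2, 4, 5, 1):
    env.process(go(env, i))
env.run()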
I had the same problem when I had to make a factory FIFO. At that time I assigned a reaction time to each part and made it follow the previous part: only once the previous part had got into service at the resource did I make the next part issue its request. It solved the problem, but it seemed to slow the simulation down a little and also added a reaction time to the part. It was basically a revamp of the factory process. But I would love to see a feature where the part doesn't have to request again.
Can it be done in the present version?

pySerial buffer won't flush

I'm having a problem with serial IO under both Windows and Linux using pySerial. With this code the device never receives the command and the read times out:
import serial
ser = serial.Serial('/dev/ttyUSB0',9600,timeout=5)
ser.write("get")
ser.flush()
print ser.read()
This code times out the first time through, but subsequent iterations succeed:
import serial

ser = serial.Serial('/dev/ttyUSB0',9600,timeout=5)
while True:
    ser.write("get")
    ser.flush()
    print ser.read()
Can anyone tell what's going on? I tried to add a call to sync() but it wouldn't take a serial object as its argument.
Thanks,
Robert
Put some delay in between the write and the read, e.g.:
import serial
from time import sleep

ser = serial.Serial('/dev/ttyUSB0',9600,timeout=5)

ser.flushInput()
ser.flushOutput()
ser.write("get")

# 100 ms delay between write and read
sleep(.1)

print ser.read()
The question is really old, but I feel this might be a relevant addition.
Some devices (such as the Agilent E3631, for example) rely on DTR. Some ultra-cheap adapters do not have a DTR line (or do not have it broken out), and with those, such devices may never behave in the expected manner (delays between reads and writes get ridiculously long).
If you find yourself wrestling with such a device, my recommendation is to get an adapter with DTR.
This is because pySerial returns from opening the port before it is actually ready. I've noticed that things like flushInput() don't actually clear the input buffer, for example, if called immediately after open(). Here is code to demonstrate:
import unittest
import serial
import time

"""
1) create a virtual or real connection between COM12 and COM13
2) in a terminal connected to COM12 (at 9600, N81), enter some junk text (e.g. 'sdgfdsgasdg')
3) then execute this unit test
"""

class Test_test1(unittest.TestCase):
    def test_A(self):
        with serial.Serial(port='COM13', baudrate=9600) as s:  # open serial port
            print("Read ASAP: {}".format(s.read(s.in_waiting)))
            time.sleep(0.1)  # wait 100 ms for the pyserial port to actually be ready
            print("Read after delay: {}".format(s.read(s.in_waiting)))

if __name__ == '__main__':
    unittest.main()

"""
output will be:
Read ASAP: b''
Read after delay: b'sdgfdsgasdg'
.
----------------------------------------------------------------------
Ran 1 test in 0.101s
"""
My workaround has been to implement a 100ms delay after opening before doing anything.
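Applied to the code from the question, that workaround might look like the following sketch (reset_input_buffer() is the pySerial 3.x name for flushInput(); adjust the port name to your setup):
import time
import serial

ser = serial.Serial('/dev/ttyUSB0', 9600, timeout=5)
time.sleep(0.1)            # give the port time to actually be ready
ser.reset_input_buffer()   # discard anything received before we were listening
ser.write(b"get")
ser.flush()
print(ser.read())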
Sorry that this is old and obvious to some, but I didn't see this option mentioned here. I ended up calling a read_all() when flush wasn't doing anything with my hardware.
# Stopped reading for a while on the connection, so things build up.
# Neither of these was working:
conn.flush()
conn.flushInput()

# This did the trick; the return value is ignored
conn.read_all()

# Waits for the next line
conn.readline()