I am creating multiple futures and I am expecting only one to achieve the desired goal.
How can I cancel all other futures from within a future?
This is how I create futures:
jobs = days_to_scan.map { |day|
  Concurrent::Future.execute do
    sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
    sleep(sleep_time)
    if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
      # How to cancel other futures here?
    end
  end
}
I might be late to the party, but I'll reply anyway since other people might stumble upon this question.
So what you probably want is to force-shutdown the thread pool as soon as one Future finishes:
class DailyJobs
  def call
    thread_pool = ::Concurrent::CachedThreadPool.new
    jobs = days_to_scan.map { |day|
      Concurrent::Future.execute(executor: thread_pool) do
        sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
        sleep(sleep_time)
        if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
          thread_pool.kill # Force-stop every other Future running on this pool
        end
      end
    }
  end
end
The thing is: killing a thread pool is not really recommended and might have unpredictable results.
A better approach is to track when one Future is done and have the other Futures bail out:
class DailyJobs
  def call
    status = ::Concurrent::AtomicBoolean.new(false)
    days_to_scan.map { |day|
      Concurrent::Future.execute do
        next if status.true? # Bail out early so this Future does nothing
        sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
        sleep(sleep_time)
        if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
          # Do your thing
          status.value = true # Signal that at least one Future completed
        end
      end
    }
  end
end
It is worth noting that if this is a Rails application, you probably want to wrap your Future in the Rails executor to avoid autoloading and deadlock issues. I wrote about it here
Okay, I could implement it as:
# Wait until one job has achieved the goal, or until all jobs have finished
# (job.value(0) polls with a zero timeout so the check doesn't block)
while jobs.select { |job| job.value(0) == success_value }.count == 0 &&
      jobs.select { |job| [:rejected, :fulfilled].include?(job.state) }.count != jobs.count
  sleep(0.1)
end

# Cancel the other jobs
jobs.each { |job| job.cancel unless job.state == :fulfilled && job.value(0) == success_value }
I have a load of tests which I want to rerun if there is a particular exception. The reason is that I am running real API calls against a server, and sometimes I hit the API's rate limit, in which case I want to wait and try again.
However, I am also using a pytest fixture to make each test run several times, because I am sending requests to different servers (the actual use case is different cryptocurrency exchanges).
Using pytest-rerunfailures comes very close to what I need, except that I can't see how to inspect the exception of the last test run in its condition.
Below is some code which shows what I am trying to achieve, but obviously I don't want to write code like this for every test.
@pytest_asyncio.fixture(
    params=EXCHANGE_NAMES,
)
async def client(request):
    exchange_name = request.param
    exchange_client = get_exchange_client(exchange_name)
    return exchange_client
def test_something(client):
    test_something.count += 1
    ### This block is the code I don't want to repeat in every test
    try:
        result = client.do_something()
    except RateLimitException:
        if test_something.count <= 3:
            sleep_duration = get_sleep_duration(client)
            time.sleep(sleep_duration)
            # run the same test again
            test_something(client)
            return
        else:
            raise
    expected = [1, 2, 3]
    assert result == expected
You can use the retry library to wrap your actual test code:
@pytest_asyncio.fixture(
    params=EXCHANGE_NAMES,
    autouse=True,
)
async def client(request):
    exchange_name = request.param
    exchange_client = get_exchange_client(exchange_name)
    return exchange_client

def test_something(client):
    actual_test_something(client)

@retry(RateLimitException, tries=3, delay=2)
def actual_test_something(client):
    '''Retry on RateLimitException, raise error after 3 attempts, sleep 2 seconds between attempts.'''
    result = client.do_something()
    expected = [1, 2, 3]
    assert result == expected
The code looks much cleaner this way.
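If you would rather avoid the extra dependency, the same pattern is easy to hand-roll. Here is a minimal sketch; the decorator name retry_on and its defaults are my own, not part of any library:

import functools
import time

def retry_on(exception, tries=3, delay=2):
    """Re-run the wrapped function when `exception` is raised,
    sleeping `delay` seconds between attempts and re-raising
    after `tries` failed attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exception:
                    if attempt == tries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

With that in place, @retry_on(RateLimitException, tries=3, delay=2) is a drop-in replacement for the @retry line above.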
I want to accomplish something like this:
results = []
for i in range(N):
    data = generate_data_slowly()
    res = tasks.process_data.apply_async((data,))
    results.append(res)

celery.collect(results).then(tasks.combine_processed_data())
i.e. launch asynchronous tasks over a long period of time, then schedule a dependent task that will only be executed once all the earlier tasks are complete.
I've looked at things like chain and chord, but it seems they only work if you can construct your task graph completely upfront.
For anyone interested, I ended up using this snippet:
@app.task(bind=True, max_retries=None)
def wait_for(self, task_id_or_ids):
    try:
        # A single task id was passed
        ready = app.AsyncResult(task_id_or_ids).ready()
    except TypeError:
        # A list of task ids was passed
        ready = all(app.AsyncResult(task_id).ready()
                    for task_id in task_id_or_ids)
    if not ready:
        # Poll again with exponential backoff: 1s, 2s, 4s, ...
        self.retry(countdown=2 ** self.request.retries)
And writing the workflow something like this:
task_ids = []
for i in range(N):
    task = (generate_data_slowly.si(i) |
            process_data.si(i))
    task_id = task.delay().task_id
    task_ids.append(task_id)

final_task = (wait_for.si(task_ids) |
              combine_processed_data.si())
final_task.delay()
That way you would be running your tasks synchronously.
The solution depends entirely on how and where the data are collected. Roughly, given that generate_data_slowly and tasks.process_data run back to back anyway, a better approach would be to join both in one task (or a chain) and to group those tasks.
chord then lets you attach a callback to that group.
The simplest example would be:
from celery import chord

@app.task
def getnprocess_data():
    data = generate_data_slowly()
    return whatever_process_data_does(data)

header = [getnprocess_data.s() for i in range(N)]
callback = combine_processed_data.s()

chord(header)(callback).get()
How mature is Chronos? Is it a viable alternative to a scheduler like celery-beat?
Right now our scheduling implements a periodic "heartbeat" task that checks for "outstanding" events and fires them if they are overdue. We are using python-dateutil's rrule to define the recurrences.
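Sketched out, the heartbeat looks roughly like this (the Event fields and fire() are illustrative placeholders, not our real code):

from datetime import datetime

from dateutil.rrule import rrulestr

def heartbeat(events):
    """Fire every event whose next occurrence is overdue.

    Assumes each event has a `rule` (an RFC 5545 RRULE string),
    a `last_run` datetime, and a `fire()` method.
    """
    now = datetime.utcnow()
    for event in events:
        rule = rrulestr(event.rule, dtstart=event.last_run)
        next_run = rule.after(event.last_run)
        if next_run is not None and next_run <= now:
            event.fire()
            event.last_run = now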
We are looking at alternatives to this approach, and Chronos seems a very attractive one: 1) it would remove the need for a heartbeat task, 2) it supports RESTful submission of events in ISO 8601 format, 3) it has a useful management interface, and 4) it scales.
The crucial requirement is that scheduling needs to be configurable on the fly from the web interface. This is why we can't use celery-beat's built-in scheduling out of the box.
Are we going to shoot ourselves in the foot by switching over to Chronos?
This SO question has solutions to your dynamic periodic task problem; note that it's not the accepted answer at the moment:
from datetime import datetime

from django.db import models
from djcelery.models import IntervalSchedule, PeriodicTask

class TaskScheduler(models.Model):
    periodic_task = models.ForeignKey(PeriodicTask)

    @staticmethod
    def schedule_every(task_name, period, every, args=None, kwargs=None):
        """Schedules a task by name every "every" "period". So an example call would be:
        TaskScheduler.schedule_every('mycustomtask', 'seconds', 30, [1, 2, 3])
        which would schedule your custom task to run every 30 seconds with the arguments 1, 2 and 3 passed to the actual task.
        """
        permissible_periods = ['days', 'hours', 'minutes', 'seconds']
        if period not in permissible_periods:
            raise Exception('Invalid period specified')
        # create the periodic task and the interval
        ptask_name = "%s_%s" % (task_name, datetime.now())  # create some name for the periodic task
        interval_schedules = IntervalSchedule.objects.filter(period=period, every=every)
        if interval_schedules:  # check if a matching interval schedule already exists and reuse it
            interval_schedule = interval_schedules[0]
        else:  # create a brand new interval schedule
            interval_schedule = IntervalSchedule()
            interval_schedule.every = every  # should check to make sure this is a positive int
            interval_schedule.period = period
            interval_schedule.save()
        ptask = PeriodicTask(name=ptask_name, task=task_name, interval=interval_schedule)
        if args:
            ptask.args = args
        if kwargs:
            ptask.kwargs = kwargs
        ptask.save()
        return TaskScheduler.objects.create(periodic_task=ptask)

    def stop(self):
        """Pauses the task."""
        ptask = self.periodic_task
        ptask.enabled = False
        ptask.save()

    def start(self):
        """Starts the task."""
        ptask = self.periodic_task
        ptask.enabled = True
        ptask.save()

    def terminate(self):
        self.stop()
        ptask = self.periodic_task
        self.delete()
        ptask.delete()
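Usage would then look something like this (assuming 'mycustomtask' is a registered Celery task):

# Schedule 'mycustomtask' to run every 30 seconds with args 1, 2, 3
scheduler = TaskScheduler.schedule_every('mycustomtask', 'seconds', 30, [1, 2, 3])

scheduler.stop()       # pause the periodic task
scheduler.start()      # resume it
scheduler.terminate()  # delete the schedule entirely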
I haven't used djcelery yet, but it supposedly has an admin interface for dynamic periodic tasks.
I have a few blocks of code, inside a function of some object, that can run in parallel and speed things up for me.
I tried using subs::parallel in the following way (all of this is in the body of a function):
my $is_a_done = parallelize {
    # block a, do some work
    return 1;
};

my $is_b_done = parallelize {
    # block b, do some work
    return 1;
};

my $is_c_done = parallelize {
    # block c depends on a, so let's wait (block)
    if ($is_a_done) {
        # do some work
    }
    return 1;
};

my $is_d_done = parallelize {
    # block d, do some work
    return 1;
};

if ($is_a_done && $is_b_done && $is_c_done && $is_d_done) {
    # just wait for all to finish before the function returns
}
First, notice that I use the ifs to block and wait for a previous thread to finish where needed (is there a better way? the ifs are quite ugly...).
Second, I get an error:
Thread already joined at /usr/local/share/perl/5.10.1/subs/parallel.pm line 259.
Perl exited with active threads:
1 running and unjoined
-1 finished and unjoined
3 running and detached
I haven't seen subs::parallel before, but given that it's doing all of the thread handling for you, and it seems to be doing it wrong, based on the error message, I think it's a bit suspect.
Normally I wouldn't just suggest throwing it out like that, but what you're doing really isn't any harder with the plain threads interface, so why not give that a shot, and simplify the problem a bit? At the same time, I'll give you an answer to the other part of your question.
use threads;

my @jobs;

push @jobs, threads->create(sub {
    # do some work
});

push @jobs, threads->create(sub {
    # do some other work
});

# Repeat as necessary :)

$_->join for @jobs; # Wait for everything to finish.
You need something a little more intricate if you're using the return values from those subs (simply switching to a hash would help a good deal), but in the code sample you provided you're ignoring them, which makes things easy.
process P0:                  process P1:
while (true)                 while (true)
{                            {
    flag[0] = true;              flag[1] = true;
    while (flag[1])              while (flag[0])
    {                            {
        flag[0] = false;             flag[1] = false;
        flag[0] = true;              flag[1] = true;
    }                            }
    crit0();                     crit1();
    flag[0] = false;             flag[1] = false;
    rem0();                      rem1();
}                            }
Could someone give me a scenario with context switches that shows whether the above code meets the requirements of progress and bounded waiting?
Also, can anyone give me some tips on how to check whether code meets the requirements of progress or bounded waiting (and perhaps also starvation, deadlock, and after-you-after-you)?
The two processes are running at the same time.
The trick here is that, since nothing truly synchronizes the two programs, a context switch can happen between any two lines; equally, two operations can happen at exactly the same time.
To see how this can be an issue, think about this situation...
What would happen if the first flag[0] = true and the first flag[1] = true happened on P0/P1 at exactly the same time?
Both P0 and P1 would be stuck in their while loops. How would they exit? One process would have to check while (flag[other]) in the narrow window after the other process has set its flag to false and before it sets it back to true. This is a very narrow time span. It's the equivalent of rolling dice over and over and not continuing until you hit a certain number.
This is why we need something at a higher level to handle the synchronization for us: real locks and the like.
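For example, in Python (just an illustration of the idea, not a translation of the pseudocode above), a real lock closes that window because acquiring it is atomic:

import threading

lock = threading.Lock()
counter = 0

def worker():
    global counter
    for _ in range(100_000):
        with lock:  # acquire/release is atomic; no flag dance required
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 200000; without the lock, updates can be lost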
edit: Oh, one other thing. You may want to check whether the read/write operations on the flags are thread safe. What happens if the system tries to write to the bit at the same time it tries to read it?
edit2: FYI - http://msdn.microsoft.com/en-us/library/aa645755(v=VS.71).aspx