Coming from a Node.js background, I am new to Scala and tried using Twitter's Future.collect to perform some simple concurrent operations. But my code runs sequentially rather than concurrently. What am I doing wrong?
Here's my code,
import com.twitter.util.Future

def waitForSeconds(seconds: Int, container: String): Future[String] = Future[String] {
  Thread.sleep(seconds * 1000)
  println(container + ": done waiting for " + seconds + " seconds")
  container + " :done waiting for " + seconds + " seconds"
}

def mainFunction: String = {
  val allTasks = Future.collect(Seq(waitForSeconds(1, "All"), waitForSeconds(3, "All"), waitForSeconds(2, "All")))
  val singleTask = waitForSeconds(1, "Single")
  allTasks onSuccess { res =>
    println("All tasks succeeded with result " + res)
  }
  singleTask onSuccess { res =>
    println("Single task succeeded with result " + res)
  }
  "Function Complete"
}

println(mainFunction)
and this is the output I get,
All: done waiting for 1 seconds
All: done waiting for 3 seconds
All: done waiting for 2 seconds
Single: done waiting for 1 seconds
All tasks succeeded with result ArraySeq(All :done waiting for 1 seconds, All :done waiting for 3 seconds, All :done waiting for 2 seconds)
Single task succeeded with result Single :done waiting for 1 seconds
Function Complete
The output I expect is,
All: done waiting for 1 seconds
Single: done waiting for 1 seconds
All: done waiting for 2 seconds
All: done waiting for 3 seconds
All tasks succeeded with result ArraySeq(All :done waiting for 1 seconds, All :done waiting for 3 seconds, All :done waiting for 2 seconds)
Single task succeeded with result Single :done waiting for 1 seconds
Function Complete
Twitter's futures are more explicit about where computations are executed than the Scala standard library futures. In particular, Future.apply will capture exceptions safely (like s.c.Future), but it doesn't say anything about which thread the computation will run in. In your case the computations are running in the main thread, which is why you're seeing the results you're seeing.
This approach has several advantages over the standard library's future API. For one thing it keeps method signatures simpler, since there's not an implicit ExecutionContext that has to be passed around everywhere. More importantly it makes it easier to avoid context switches (here's a classic explanation by Brian Degenhardt). In this respect Twitter's Future is more like Scalaz's Task, and has essentially the same performance benefits (described for example in this blog post).
The downside of being more explicit about where computations run is that you have to be more explicit about where computations run. In your case you could write something like this:
import com.twitter.util.{ Future, FuturePool }

val pool = FuturePool.unboundedPool

def waitForSeconds(seconds: Int, container: String): Future[String] = pool {
  Thread.sleep(seconds * 1000)
  println(container + ": done waiting for " + seconds + " seconds")
  container + " :done waiting for " + seconds + " seconds"
}
This won't produce exactly the output you're asking for ("Function complete" will be printed first, and allTasks and singleTask aren't sequenced with respect to each other), but it will run the tasks in parallel on separate threads.
(As a footnote: the FuturePool.unboundedPool in my example above is an easy way to create a future pool for a demo, and is often just fine, but it isn't appropriate for CPU-intensive computations—see the FuturePool API docs for other ways to create a future pool that will use an ExecutorService that you provide and can manage yourself.)
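The difference between running a blocking body on the calling thread and handing it to a pool can also be sketched outside Scala. The snippet below is only an analogy in Python: it uses the standard library's ThreadPoolExecutor rather than Twitter's FuturePool, and the names wait_for and completion_order are made up for illustration. The delays are scaled down so the example finishes quickly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

completion_order = []   # labels in the order the tasks actually finish
lock = threading.Lock()

def wait_for(seconds, label):
    """Blocking task, loosely analogous to waitForSeconds in the question."""
    time.sleep(seconds)
    with lock:
        completion_order.append(label)
    return f"{label}: done waiting for {seconds} seconds"

# Submitting to a pool hands each blocking call to a worker thread,
# so the three sleeps overlap instead of running back to back.
with ThreadPoolExecutor(max_workers=3) as pool:
    start = time.monotonic()
    futures = [pool.submit(wait_for, seconds, label)
               for seconds, label in [(0.2, "first"), (0.6, "second"), (0.4, "third")]]
    results = [f.result() for f in futures]   # gathered in submission order
    elapsed = time.monotonic() - start

print(completion_order)   # finish order follows the delays, not the submission order
print(results)
```

As with Future.collect, the gathered results keep the order in which the tasks were submitted, while the side effects happen in completion order.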
The Java docs say the following:
Emit the last value from this Flux only if there were no new values emitted during the time window provided by a publisher for that particular last value.
However, I found the above description confusing. I read in the Gitter chat that it's similar to debounce in RxJava. Can someone please illustrate it with an example? I could not find one anywhere after a thorough search.
sampleTimeout lets you associate a companion Flux X' to each incoming value x in the source. If X' completes before the next value is emitted in the source, then value x is emitted. If not, x is dropped.
The same processing is applied to subsequent values.
Think of it as splitting the original sequence into windows delimited by the start and completion of each companion flux. If two windows overlap, the value that triggered the first one is dropped.
On the other hand, you have sample(Duration), which only deals with a single companion Flux. It splits the sequence into contiguous windows of a regular time period, and drops all but the last element emitted during each window.
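To make the two window models concrete, here is a small simulation in Python. It is not Reactor code: it only replays a list of timestamped values and applies the rules described above. The function names, the event list, and the handling of the final value (assumed to be emitted once its window closes) are assumptions made for the sketch.

```python
def sample_timeout(events, window_for):
    """Simulate the per-value window rule.

    events: list of (time_ms, value) pairs sorted by time.
    window_for(value): how long the companion for that value takes to complete.
    """
    emitted = []
    for i, (t, value) in enumerate(events):
        window_end = t + window_for(value)
        is_last = i + 1 == len(events)
        # Assumption: the final value is emitted once its window closes.
        if is_last or window_end < events[i + 1][0]:
            emitted.append(value)
    return emitted

def sample(events, period_ms):
    """Simulate contiguous fixed-size windows, keeping the last value of each."""
    last_in_window = {}
    for t, value in events:
        last_in_window[t // period_ms] = value   # later values overwrite earlier ones
    return [last_in_window[w] for w in sorted(last_in_window)]

events = [(0, "a"), (150, "b"), (180, "c"), (450, "d")]
print(sample_timeout(events, lambda _value: 100))  # ['a', 'c', 'd']
print(sample(events, 200))                         # ['c', 'd']
```

With a 100 ms companion per value, "b" is dropped because "c" arrives 30 ms later, inside b's window. With fixed 200 ms windows, "a" and "b" are dropped because "c" is the last value of the first window.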
(edit): about your use case
If I understand correctly, it looks like you have processing of varying duration that you want to schedule periodically, but you also don't want to consider values for which processing takes more than one period?
If so, it sounds like you want to 1) isolate your processing in its own thread using publishOn and 2) use sample(Duration) for the second part of the requirement (the delay allocated to a task does not change).
Something like this:
List<Long> passed =
    //regular scheduling:
    Flux.interval(Duration.ofMillis(200))
        //this is only to show that processing is indeed started regularly
        .elapsed()
        //this is to isolate the blocking processing
        .publishOn(Schedulers.elastic())
        //blocking processing itself
        .map(tuple -> {
            long l = tuple.getT2();
            int sleep = l % 2 == 0 || l % 5 == 0 ? 100 : 210;
            System.out.println(tuple.getT1() + "ms later - " + tuple.getT2() + ": sleeping for " + sleep + "ms");
            try {
                Thread.sleep(sleep);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return l;
        })
        //this is where we say "drop if too long"
        .sample(Duration.ofMillis(200))
        //the rest is to make it finite and print the processed values that passed
        .take(10)
        .collectList()
        .block();
System.out.println(passed);
Which outputs:
205ms later - 0: sleeping for 100ms
201ms later - 1: sleeping for 210ms
200ms later - 2: sleeping for 100ms
199ms later - 3: sleeping for 210ms
201ms later - 4: sleeping for 100ms
200ms later - 5: sleeping for 100ms
201ms later - 6: sleeping for 100ms
196ms later - 7: sleeping for 210ms
204ms later - 8: sleeping for 100ms
198ms later - 9: sleeping for 210ms
201ms later - 10: sleeping for 100ms
196ms later - 11: sleeping for 210ms
200ms later - 12: sleeping for 100ms
202ms later - 13: sleeping for 210ms
202ms later - 14: sleeping for 100ms
200ms later - 15: sleeping for 100ms
[0, 2, 4, 5, 6, 8, 10, 12, 14, 15]
So the blocking processing is triggered approximately every 200ms, and only values that were processed within 200ms are kept.
I'm running a task every second, and it seems celery doesn't actually perform the task every second.
I guess Celery might be a good scheduler for a task that runs every minute, but might not be adequate for a task that runs every second.
Here's the picture which illustrates what I mean.
I'm using the following options
'schedule': 1.0,
'args': [],
'options': {
'expires': 3
}
And I'm using celery 4.0.0
Yes, Celery actually handles times as low as 1 second, and possibly lower since it takes a float. See this entry of periodic tasks in the docs http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html:
from celery import Celery
from celery.schedules import crontab

app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')
    # Calls test('world') every 30 seconds
    sender.add_periodic_task(30.0, test.s('world'), expires=10)
    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )

@app.task
def test(arg):
    print(arg)
A better written example can be found 1/3 the way down https://github.com/celery/celery/issues/3589:
# file: tasks.py
from celery import Celery

celery = Celery('tasks', broker='pyamqp://guest@localhost//')

@celery.task
def add(x, y):
    return x + y

@celery.on_after_configure.connect
def add_periodic(**kwargs):
    celery.add_periodic_task(10.0, add.s(2,3), name='add every 10')
So sender is the actual Celery app instance, i.e. app = Celery()
Celery tasks have the limitation that they cannot call subprocesses. Callbacks and other canvas functionality are handled by how the tasks are called and related through the various canvas functions.
But when you schedule tasks through celery beat, it appears that the only option is to call a single task without any of the canvas functions.
I need to schedule a task that has callbacks. Is there any way to do that in Celery?
It's possible I could accomplish what I need in a single task, but it would involve significant memory overhead and take a long time to execute, with many db hits. Is that a good idea?
You can just generate the signature for whatever you need and pass it in the options dictionary. Here is a working tasks.py for a simple chain:
from datetime import timedelta
from celery import Celery, signature

app = Celery('tasks', broker='redis://localhost')

app.conf.update(
    beat_schedule={
        'add_divide': {
            'task': 'tasks.add',
            'args': (5, 7),
            'schedule': timedelta(seconds=10),
            'options': {
                'queue': 'testq',
                'link': signature('tasks.divide',
                                  args=(4, ),
                                  queue='testq'),
            },
        },
    }
)

@app.task
def add(x, y):
    z = x + y
    print('sum: {0}'.format(z))
    return z

@app.task
def divide(x, y):
    z = x / y
    print('divide: {0}'.format(z))
    return z
Running celery worker -A tasks -Q testq --beat -c1 will output something like:
[2017-04-07 00:00:00,000: WARNING/PoolWorker-2] sum: 12
[2017-04-07 00:00:00,050: WARNING/PoolWorker-2] divide: 3
I want to process several tasks in parallel inside an Action, and push back each task's result as soon as it completes, in first-completed order.
For example, if task A completes in 5 secs, task B completes in 3 secs and task C completes in 1 sec, the output should be "C", "B", "A".
The following code seems to output the wrong order, and it waits for all the tasks to complete before outputting any result.
def lookup = Action { implicit req =>
  val a = Enumerator( Await.result(Promise.timeout("A", 5 seconds), 1 minute))
  val b = Enumerator( Await.result(Promise.timeout("B", 3 seconds), 1 minute))
  val c = Enumerator( Await.result(Promise.timeout("C", 1 second), 1 minute))
  val d = a >- b >- c
  Ok.chunked(d &> Comet(callback = "console.log"))
}
Your code is broken because of how you are using Await.result. The line that defines a doesn't complete until Await.result returns, and so the promise for b never starts until after the one for a has finished. If you use something like:
val a = Enumerator.flatten(Future.firstCompletedOf(List(
  Promise.timeout("A", 5 seconds),
  Promise.timeout(throw new Exception("A timed out"), 1 minute)
)).map(Enumerator(_)))
You will get correct behavior.
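The underlying idea (start every task first, then consume results as they finish, instead of blocking on each one in turn) is not specific to Play. As a rough analogy only, here is how it looks with Python's asyncio; the helper names timeout and results_in_completion_order are hypothetical, and the delays are scaled down from the 5, 3 and 1 seconds in the question.

```python
import asyncio

async def timeout(value, delay):
    """Return value after delay seconds without blocking the event loop."""
    await asyncio.sleep(delay)
    return value

async def results_in_completion_order():
    # Start all three tasks before waiting on any of them. Waiting on the
    # first one before creating the second is what serialized the original code.
    tasks = [
        asyncio.create_task(timeout("A", 0.3)),
        asyncio.create_task(timeout("B", 0.2)),
        asyncio.create_task(timeout("C", 0.1)),
    ]
    ordered = []
    for next_done in asyncio.as_completed(tasks):
        ordered.append(await next_done)   # yields whichever task finishes next
    return ordered

ordered = asyncio.run(results_in_completion_order())
print(ordered)  # ['C', 'B', 'A']
```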
I want to run a cron job every hour after a certain start time. Currently my cron job expression is
cronExpression = seconds + " " + minutes + " " + hours +"/1" + " " + " * * ? *" ;
(seconds, minutes, hours are passed in by the user selection)
The job starts at the right time and runs every hour until midnight but then stops until the hour on the next day and then resumes. How do I get the job to continuously run and not stop at midnight?
I understand I can change the expression to
cronExpression = seconds + " " + minutes + " * * * ? *";
but then it will not take into account the start time. It will just run at every hour.
Thanks in advance,
Rich
Do you mean you want the job to start at the given time and then run once hourly forever? If so, I don't think a cron expression is the right approach.
If you're using a scheduler it should be straightforward to start the job and run forever at a given interval. For example, here's a snippet from the Quartz scheduler docs for JobBuilder:
JobDetail job = newJob(MyJob.class)
    .withIdentity("myJob")
    .build();

Trigger trigger = newTrigger()
    .withIdentity(triggerKey("myTrigger", "myTriggerGroup"))
    .withSchedule(simpleSchedule()
        .withIntervalInHours(1)
        .repeatForever())
    .startAt(futureDate(10, MINUTES))
    .build();

scheduler.scheduleJob(job, trigger);
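To see why the hours + "/1" expression pauses at midnight while a fixed interval does not, the two schedules can be compared with plain date arithmetic. This is a sketch in Python, not Quartz code; the function names and the 22:15 start time are made up, and it assumes the usual cron reading of "h/1" in the hours field (start at hour h, then step within the 0-23 range of that field).

```python
from datetime import datetime, timedelta

def hours_matched_by_cron(start_hour, step=1):
    # "h/1" in the hours field: start at h and step within the field's 0-23 range,
    # so nothing before hour h ever matches, on any day.
    return list(range(start_hour, 24, step))

def fixed_interval_fire_times(start, count, interval=timedelta(hours=1)):
    # A simple repeating trigger: start time plus a whole number of intervals.
    return [start + i * interval for i in range(count)]

start = datetime(2024, 1, 1, 22, 15, 0)   # hypothetical user-selected start time

print(hours_matched_by_cron(start.hour))  # [22, 23]
for fire_time in fixed_interval_fire_times(start, 4):
    print(fire_time.isoformat())
# 2024-01-01T22:15:00
# 2024-01-01T23:15:00
# 2024-01-02T00:15:00
# 2024-01-02T01:15:00
```

The cron-style hour list never includes 00 through 21, which matches the gap described in the question, whereas the interval schedule simply rolls over into the next day.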
Run a wrapper every hour which runs your job if the current time is after your start time.
For example,
case $(date +%H) in
09 |1[0-6] ) pingcheck ;;
esac
would run pingcheck (whatever that is :-) between 09:00 and 16:59.
Use your own expression
cronExpression = seconds + " " + minutes + " * * * ? *";
together with the trigger's startAt(your start time) method to set the trigger's start time. The start time implicitly tells Quartz when the trigger comes into effect.