Akka Stream - Timer or Scheduler like CRON - scala

I use Akka Streams with Scala. I'd like to set up a scheduler that runs every day at 24:00. I searched but couldn't find how to do this. Could you tell me how to write the code?

This is mentioned in a comment but should really be the preferred solution using only akka-streams:
Source.tick(0.seconds, 24.hours, Done).runForeach { _ =>
  // do something (the first tick fires immediately, then every 24 hours)
}

Use the built-in Akka scheduler, see:
http://doc.akka.io/docs/akka/current/scala/scheduler.html
You can use the scheduler like:
system.scheduler.schedule(
  initialDelay = FiniteDuration(/*offset to next 24:00*/),
  interval = FiniteDuration(24, TimeUnit.HOURS),
  receiver = self,
  message = ScheduleAkkaStream
)
Then, in the actor, run the job when the ScheduleAkkaStream message is received.
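The answer leaves the /*offset to next 24:00*/ placeholder open. One possible way to compute it is sketched below in plain Java (java.time can be called directly from Scala). The class and method names are hypothetical, not part of Akka, and using the time zone of the supplied timestamp is an assumption:

```java
import java.time.Duration;
import java.time.ZonedDateTime;

// Hypothetical helper, not part of Akka: computes the initial delay for a daily job.
final class MidnightDelay {
    private MidnightDelay() {}

    // Time remaining from `now` until the next 00:00 in the time zone carried by `now`.
    // If `now` is exactly midnight, this returns the full length of the coming day.
    static Duration untilNextMidnight(ZonedDateTime now) {
        ZonedDateTime nextMidnight = now.toLocalDate()
                .plusDays(1)
                .atStartOfDay(now.getZone());
        return Duration.between(now, nextMidnight);
    }
}
```

From Scala, the result could be converted with FiniteDuration(d.toMillis, TimeUnit.MILLISECONDS) and passed as initialDelay. Note that a fixed 24-hour interval will drift by an hour across daylight-saving changes.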

The most commonly used one is akka quartz scheduler:
https://github.com/enragedginger/akka-quartz-scheduler
This one is written by me and has no additional dependencies; it is a bit more lightweight than Quartz, with fewer bells and whistles:
https://github.com/johanandren/akron

I used:
system.scheduler.scheduleWithFixedDelay(10.seconds, 30.seconds)(
  () => {
    println("Action")
  }
)

Related

How to run periodic tasks in an Apache Storm topology?

I have an Apache Storm topology and would like to perform a certain action every once in a while. I'm not sure how to approach this in a way which would be natural and elegant.
Should it be a Bolt or a Spout using ScheduledExecutorService, or something else?
Tick tuples are a decent option https://kitmenke.com/blog/2014/08/04/tick-tuples-within-storm/
Edit: Here's the essential code for your bolt
@Override
public Map<String, Object> getComponentConfiguration() {
    // configure how often a tick tuple will be sent to our bolt (every 300 seconds)
    Config conf = new Config();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
    return conf;
}
Then you can use TupleUtils.isTick(tuple) in execute to check whether the received tuple is a tick tuple.
I don't know if this is a correct approach, but it seems to be working fine:
At the end of the prepare method of a Bolt, I added a call to initScheduler(), which contains the following code:
Calendar calendar = Calendar.getInstance();
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
// PeriodicAction implements Runnable; start at the top of the hour, then run every hour
scheduler.scheduleAtFixedRate(new PeriodicAction(), millisToFullHour(calendar),
        60 * 60 * 1000, TimeUnit.MILLISECONDS);
This needs to be used with caution though, because the bolt can have multiple instances depending on your setup.
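The answer does not show millisToFullHour. A minimal sketch of what such a helper might look like follows. It is an assumption, not the author's code: it uses java.time instead of the Calendar passed in above, and the class name is made up for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: the original answer does not include its implementation.
final class HourAlignment {
    private HourAlignment() {}

    // Milliseconds from `now` until the start of the next full hour.
    // Exactly on the hour, this returns a full hour (3,600,000 ms).
    static long millisToFullHour(LocalDateTime now) {
        LocalDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour).toMillis();
    }
}
```

Calling HourAlignment.millisToFullHour(LocalDateTime.now()) would give the initial delay to pass to scheduleAtFixedRate.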

Proper way to stop Akka Streams on condition

I have been successfully using FileIO to stream the contents of a file, compute some transformations for each line and aggregate/reduce the results.
Now I have a pretty specific use case, where I would like to stop the stream when a condition is reached, so that it is not necessary to read the whole file but the process finishes as soon as possible. What is the recommended way to achieve this?
If the stop condition is "on the outside of the stream"
There is an advanced building block called KillSwitch that you can use for this: http://doc.akka.io/japi/akka/2.4.7/akka/stream/KillSwitches.html The stream is shut down once the kill switch is triggered.
It has methods such as abort(reason) and shutdown(); see its API here: http://doc.akka.io/japi/akka/2.4.7/akka/stream/SharedKillSwitch.html
Reference documentation is here: http://doc.akka.io/docs/akka/2.4.8/scala/stream/stream-dynamic.html#kill-switch-scala
Example usage would be:
val countingSrc = Source(Stream.from(1))
  .delay(1.second, DelayOverflowStrategy.backpressure)
val lastSnk = Sink.last[Int]
val (killSwitch, last) = countingSrc
  .viaMat(KillSwitches.single)(Keep.right)
  .toMat(lastSnk)(Keep.both)
  .run()
doSomethingElse()
killSwitch.shutdown()
Await.result(last, 1.second) shouldBe 2
If the stop condition is inside the stream
You can use takeWhile to express almost any condition, though sometimes take or limit is enough (e.g. "take 10 lines").
If your logic is very advanced, you could build a special stage that handles it using statefulMapConcat, which lets you express almost anything, so you could complete the stream whenever you want "from the inside".

Performance issue in play application

I have a Play (2.3.0) application that does some database lookups. When there are more than 6 users, the application runs into performance problems.
I have narrowed down the problem to a controller with an action that does a sleep of 4 seconds.
A test client calls this action every 500 ms. I can see that the first 6 requests are processed, then it stops for a few seconds (until the 4-second sleep has passed) and reads the next 6.
Also, when I open 7 browser windows, the 7th will not load (it waits for a connection).
Looking at the documentation, it seems my problem is blocking I/O, and using the highly synchronous profile should solve it.
Therefore I added this profile to my application.conf, but nothing changes.
My application.conf looks like this:
application.context=/appname/
# Secret key
# ~~~~~
# The secret key is used to secure cryptographics functions.
# If you deploy your application to several instances be sure to use the same key!
application.secret="xxxxx"
play {
  akka {
    akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-min = 300
          parallelism-max = 300
        }
      }
    }
  }
}
and the action
def performancetestSleep() = Action { request =>
  Thread.sleep(4000)
  Ok("hmmm good sleep")
}
It seems to me the threadpool configuration is ignored. What am I missing here?
What you need for this is really just one thread which handles the 4 second delay - a scheduler. Spawning that many threads defeats the whole point of the architecture that Play has, IMHO. You could then use the scheduler to create a Future[Result] which you'd feed into an Action.async block.
Now, you don't really need to implement your own scheduler since Play depends on Akka for its concurrency; and Akka has a scheduler which will do the job.
import scala.concurrent.Promise
import scala.concurrent.duration._
import play.api.mvc._
import play.libs.Akka
val system = Akka.system()
def delayedResponse = Action.async {
  import system.dispatcher
  val promise = Promise[Result]()
  // Complete the promise after 4 seconds without blocking a request thread
  system.scheduler.scheduleOnce(4.seconds) {
    promise.success(Ok("Sorry for the wait!"))
  }
  promise.future
}
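The same idea, one scheduler thread completing a promise instead of one sleeping thread per request, can be sketched without Play or Akka. This is only an illustration in plain Java under that assumption; the class and method names are hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a non-blocking delay; not Play's or Akka's API.
final class DelayedResponse {
    private DelayedResponse() {}

    // Returns immediately. The future completes with `body` once the delay has elapsed,
    // so no caller thread is parked the way it would be with Thread.sleep.
    static CompletableFuture<String> after(ScheduledExecutorService scheduler,
                                           String body, long delayMillis) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        scheduler.schedule(() -> { promise.complete(body); }, delayMillis, TimeUnit.MILLISECONDS);
        return promise;
    }
}
```

With a single-threaded scheduler from Executors.newSingleThreadScheduledExecutor(), many such delayed responses can be pending at once, because the thread is only used for the brief moment each promise is completed.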
I used
activator run
to start the server, but that does not seem to pick up the thread pool profile. Using
activator start
does, and now the profile seems to be used. I now need to test if this solves my problem. Will also have a look at the async call.

django-celery PeriodicTask and eta field

I have a Django project that uses Celery, and I need to schedule tasks dynamically at some point in the future, with or without recurrence. I also need to be able to delete or edit tasks that are already scheduled.
To achieve this, I started using django-celery with DatabaseScheduler to store some PeriodicTasks (with expiration) in the database, more or less as described here.
This way, if I close my app and start it again, my schedules are still there.
My problem remains, though: I cannot use eta to schedule a task at some point in the future. Is it possible to dynamically schedule a task with eta?
My second question is whether I can schedule a one-off task, for example to run at 2015-05-15 15:50:00 (that is why I'm trying to use eta).
Finally, I will be scheduling some thousands of notifications, some of them one-off and others periodic. Is celery beat capable of handling this number of scheduled tasks, or do I have to go with a more advanced solution such as APScheduler?
Thank you
I faced the same problem yesterday. My ugly temporary solution is:
# tasks.py
import json
from djcelery.models import PeriodicTask, IntervalSchedule
from datetime import timedelta, datetime
from django.utils.timezone import now
...
@app.task
def schedule_periodic_task(task='app.tasks.task', task_args=[], task_kwargs={},
                           interval=(1, 'minutes'), expires=now()+timedelta(days=365*100)):
    name = task + str(task_args) + str(task_kwargs)
    # Replace any existing periodic task with the same name
    PeriodicTask.objects.filter(name=name).delete()
    PeriodicTask.objects.create(
        name=name, task=task,
        # PeriodicTask stores args and kwargs as JSON-encoded strings
        args=json.dumps(task_args),
        kwargs=json.dumps(task_kwargs),
        interval=IntervalSchedule.objects.get_or_create(
            every=interval[0],
            period=interval[1])[0],
        expires=expires,
    )
So, if you want to schedule a periodic task with eta, you should
# anywhere.py
schedule_periodic_task.apply_async(
    kwargs={'task': 'grabber.tasks.grab_events',
            'task_args': [instance.xbet_id], 'task_kwargs': {},
            'interval': (10, 'seconds'),
            'expires': instance.start + timedelta(hours=3)},
    eta=instance.start,
)
schedule a task with eta, which in turn creates the periodic task. The ugly parts:
dealing with the raw task name
the strange period format (n, 'interval')
Please let me know if you have designed a prettier solution.

Quartz.Net - delay a simple trigger to start

I have a few jobs set up in Quartz to run at set intervals. The problem is that when the service starts, it tries to start all the jobs at once. Is there a way to add a delay to each job using the .xml config?
Here are 2 job trigger examples:
<simple>
  <name>ProductSaleInTrigger</name>
  <group>Jobs</group>
  <description>Triggers the ProductSaleIn job</description>
  <misfire-instruction>SmartPolicy</misfire-instruction>
  <volatile>false</volatile>
  <job-name>ProductSaleIn</job-name>
  <job-group>Jobs</job-group>
  <repeat-count>RepeatIndefinitely</repeat-count>
  <repeat-interval>86400000</repeat-interval>
</simple>
<simple>
  <name>CustomersOutTrigger</name>
  <group>Jobs</group>
  <description>Triggers the CustomersOut job</description>
  <misfire-instruction>SmartPolicy</misfire-instruction>
  <volatile>false</volatile>
  <job-name>CustomersOut</job-name>
  <job-group>Jobs</job-group>
  <repeat-count>RepeatIndefinitely</repeat-count>
  <repeat-interval>43200000</repeat-interval>
</simple>
As you see there are 2 triggers, the first repeats every day, the next repeats twice a day.
My issue is that I want either the first or second job to start a few minutes after the other... (because they are both in the end, accessing the same API and I don't want to overload the request)
Is there a repeat-delay or priority property? I can't find any documentation saying so..
I know you are doing this via XML but in code you can set the StartTimeUtc to delay say 30 seconds like this...
trigger.StartTimeUtc = DateTime.UtcNow.AddSeconds(30);
This isn't exactly a perfect answer for your XML file - but via code you can use the StartAt extension method when building your trigger.
/* calculate the next time you want your job to run - in this case top of the next hour */
var hourFromNow = DateTime.UtcNow.AddHours(1);
/* mark the value as UTC; otherwise DateTimeOffset would treat it as local time */
var topOfNextHour = new DateTime(hourFromNow.Year, hourFromNow.Month, hourFromNow.Day, hourFromNow.Hour, 0, 0, DateTimeKind.Utc);
/* build your trigger and call 'StartAt' */
var trigger = TriggerBuilder.Create()
    .WithIdentity("Delayed Job")
    .WithSimpleSchedule(x => x.WithIntervalInSeconds(60).RepeatForever())
    .StartAt(new DateTimeOffset(topOfNextHour))
    .Build();
You've probably already seen this by now, but it's possible to chain jobs, though it's not supported out of the box.
http://quartznet.sourceforge.net/faq.html#howtochainjobs