Akka scheduler stops on exception; is it expected? - scala

We run this code:
scheduler.schedule(1 minute, 1 minute) { triggerOperations.tick() }
when starting our application, where scheduler is an Akka actorSystem.scheduler.
If tick() throws an exception, then it is never called again!
I checked the documentation, but cannot find any statement that this is expected. Mostly the description is "Schedules a function to be run repeatedly with an initial delay and a frequency", with no mention that if the function throws an excpetion the task will stop firing.
Our akka version is 2.3.2.
http://doc.akka.io/docs/akka/2.3.4/scala/scheduler.html
http://doc.akka.io/api/akka/2.0/akka/actor/Scheduler.html
Is this behavior expected? Is it documented anywhere?

When in doubt, go to source. The code is a bit terse, but this fragment:
override def run(): Unit = {
try {
runnable.run()
val driftNanos = clock() - getAndAdd(delay.toNanos)
if (self.get != null)
swap(schedule(executor, this, Duration.fromNanos(Math.max(delay.toNanos - driftNanos, 1))))
} catch {
case _: SchedulerException ⇒ // ignore failure to enqueue or terminated target actor
}
}
shows that if your runnable throws, scheduler does not reschedule the next execution (which happens inside swap as far as I understand).

Related

Stop Flink Kafka consumer task programmatically

I'm using Kafka consumer with Flink 1.9 (in Scala 2.12), and facing the following problem (similar to this question): the consumer should stop fetching data (and finish the task) when no new messages are received for a specific amount of time (since the stream is potentially infinite, so there is no "end-of-stream" message in the topic itself).
I've tried to use ProcessFunction which calls consumer.close(), but this did not help (consumer continues to run). Throwing an exception in ProcessFunction kills the job completely, which is not what I want (since the job consists of several stages, which are canceled after throwing an exception). Here is my ProcessFunction:
class TimeOutFunction( // delay after which an alert flag is thrown
val timeOut: Long, consumer: FlinkKafkaConsumer[Row]
) extends ProcessFunction[Row, Row] {
// state to remember the last timer set
private var lastTimer: ValueState[Long] = _
override def open(conf: Configuration): Unit = { // setup timer state
val lastTimerDesc = new ValueStateDescriptor[Long]("lastTimer", classOf[Long])
lastTimer = getRuntimeContext.getState(lastTimerDesc)
}
override def processElement(value: Row, ctx: ProcessFunction[Row, Row]#Context, out: Collector[Row]): Unit = { // get current time and compute timeout time
val currentTime = ctx.timerService.currentProcessingTime
val timeoutTime = currentTime + timeOut
// register timer for timeout time
ctx.timerService.registerProcessingTimeTimer(timeoutTime)
// remember timeout time
lastTimer.update(timeoutTime)
// throughput the event
out.collect(value)
}
override def onTimer(timestamp: Long, ctx: ProcessFunction[Row, Row]#OnTimerContext, out: Collector[Row]): Unit = {
// check if this was the last timer we registered
if (timestamp == lastTimer.value) {
// it was, so no data was received afterwards.
// stop the consumer.
consumer.close()
}
}
}
The isEndOfStream() method on a deserialization schema is also no good, since it requires nextElement (and my case is kind of vice-versa, since the stream should stop when there is no next element for some time).
So, there is a way to do this (preferably without subclassing FlinkKafkaConsumer and/or using reflection)?

Strange timeout with ScalaTest's Selenium DSL

I'm writing Selenium tests with ScalaTest's Selenium DSL and I'm running into timeouts I can't explain. To make matters more complicated, they only seem to happen some of the time.
The problem occurs whenever I access an Element after a page load or some Javascript rendering. It looks like this:
click on "editEmployee"
eventually {
textField(name("firstName")).value = "Steve"
}
My PatienceConfig is configured like this:
override implicit val patienceConfig: PatienceConfig =
PatienceConfig(timeout = Span(5, Seconds), interval = Span(50, Millis))
The test fails with the following error:
- should not display the old data after an employee was edited *** FAILED ***
The code passed to eventually never returned normally. Attempted 1 times over 10.023253653000001 seconds.
Last failure message: WebElement 'firstName' not found.. (EditOwnerTest.scala:24)
It makes sense that it doesn't succeed immediately, because the click causes some rendering, and the textfield may not be available right away. However, it shouldn't take 10 seconds to make an attempt to find it, right?
Also, I find it very interesting that the eventually block tried it only once, and that it took almost precisely 10 seconds. This smells like a timeout occurred somewhere, and it's not my PatienceConfig, because that was set to time out after 5 seconds.
With this workaround, it does work:
click on "editEmployee"
eventually {
find(name("firstName")).value // from ScalaTest's `OptionValues`
}
textField(name("firstName")).value = "Steve"
I did some digging in the ScalaTest source, and I've noticed that all calls that have this problem (it's not just textField), eventually call webElement at some point. The reason why the workaround works, is because it doesn't call webElement. webElement is defined like this:
def webElement(implicit driver: WebDriver, pos: source.Position = implicitly[source.Position]): WebElement = {
try {
driver.findElement(by)
}
catch {
case e: org.openqa.selenium.NoSuchElementException =>
// the following is avoid the suite instance to be bound/dragged into the messageFun, which can cause serialization problem.
val queryStringValue = queryString
throw new TestFailedException(
(_: StackDepthException) => Some("WebElement '" + queryStringValue + "' not found."),
Some(e),
pos
)
}
}
I've copied that code into my project and played around with it, and it looks like constructing and/or throwing the exception is where most of the 10 seconds are spent.
(EDIT Clarification: I've actually seen the code actually spend its 10 seconds inside the catch block. The implicit wait is set to 0, and besides, if I remove the catch block everything simply works as expected.)
So my question is, what can I do to avoid this strange behaviour? I don't want to have to insert superfluous calls to find all the time, because it's easily forgotten, especially since, as I said, the error occurs only some of the time. (I haven't been able to determine when the behaviour occurs and when it doesn't.)
It is clear that the textField(name("firstName")).value = "Steve" ends up calling the WebElement as you have found out.
Since the issue in the op is happening where ever web elements are involved (which in turn implies that webdriver is involved), I think it is safe to assume that the issue is related to the implicit wait on the Web driver.
implicitlyWait(Span(0, Seconds))
The above should ideally fix the issue. Also, making implicit wait to be 0 is a bad practice. Any web page might have some loading issues. The page load is handled by Selenium outside its wait conditions. But slow element load (may be due to ajax calls) could result in failure. I usually keep 10 seconds as my standard implicit wait. For scenarios which require more wait, explicit waits can be used.
def implicitlyWait(timeout: Span)(implicit driver: WebDriver): Unit = {
driver.manage.timeouts.implicitlyWait(timeout.totalNanos, TimeUnit.NANOSECONDS)
}
Execution Flow:
name("firstName") ends up having value as Query {Val by = By.className("firstName") }.
def name(elementName: String): NameQuery = new NameQuery(elementName)
case class NameQuery(queryString: String) extends Query { val by = By.name(queryString) }
Query is fed to the textField method which calls the Query.webElement as below.
def textField(query: Query)(implicit driver: WebDriver, pos: source.Position): TextField = new TextField(query.webElement)(pos)
sealed trait Query extends Product with Serializable {
val by: By
val queryString: String
def webElement(implicit driver: WebDriver, pos: source.Position = implicitly[source.Position]): WebElement = {
try {
driver.findElement(by)
}
catch {
case e: org.openqa.selenium.NoSuchElementException =>
// the following is avoid the suite instance to be bound/dragged into the messageFun, which can cause serialization problem.
val queryStringValue = queryString
throw new TestFailedException(
(_: StackDepthException) => Some("WebElement '" + queryStringValue + "' not found."),
Some(e),
pos
)
}
}
}
I don't know ScalaTest's specifics, but such strange timeouts usually occur when you're mixing up implicit and explicit waits together.
driver.findElement uses implicit waits internally. And depending on specified explicit waits timeout, you may face with summing both together.
Ideally, implicit waits should be set to 0 to avoid such issues.

Apache Spark: how to cancel job in code and kill running tasks?

I am running a Spark application (version 1.6.0) on a Hadoop cluster with Yarn (version 2.6.0) in client mode. I have a piece of code that runs a long computation, and I want to kill it if it takes too long (and then run some other function instead).
Here is an example:
val conf = new SparkConf().setAppName("TIMEOUT_TEST")
val sc = new SparkContext(conf)
val lst = List(1,2,3)
// setting up an infite action
val future = sc.parallelize(lst).map(while (true) _).collectAsync()
try {
Await.result(future, Duration(30, TimeUnit.SECONDS))
println("success!")
} catch {
case _:Throwable =>
future.cancel()
println("timeout")
}
// sleep for 1 hour to allow inspecting the application in yarn
Thread.sleep(60*60*1000)
sc.stop()
The timeout is set for 30 seconds, but of course the computation is infinite, and so Awaiting on the result of the future will throw an Exception, which will be caught and then the future will be canceled and the backup function will execute.
This all works perfectly well, except that the canceled job doesn't terminate completely: when looking at the web UI for the application, the job is marked as failed, but I can see there are still running tasks inside.
The same thing happens when I use SparkContext.cancelAllJobs or SparkContext.cancelJobGroup. The problem is that even though I manage to get on with my program, the running tasks of the canceled job are still hogging valuable resources (which will eventually slow me down to a near stop).
To sum things up: How do I kill a Spark job in a way that will also terminate all running tasks of that job? (as opposed to what happens now, which is stopping the job from running new tasks, but letting the currently running tasks finish)
UPDATE:
After a long time ignoring this problem, we found a messy but efficient little workaround. Instead of trying to kill the appropriate Spark Job/Stage from within the Spark application, we simply logged the stage ID of all active stages when the timeout occurred, and issued an HTTP GET request to the URL presented by the Spark Web UI used for killing said stages.
I don't know it this answers your question.
My need was to kill jobs hanging for too much time (my jobs extract data from Oracle tables, but for some unknonw reason, seldom the connection hangs forever).
After some study, I came to this solution:
val MAX_JOB_SECONDS = 100
val statusTracker = sc.statusTracker;
val sparkListener = new SparkListener()
{
override def onJobStart(jobStart : SparkListenerJobStart)
{
val jobId = jobStart.jobId
val f = Future
{
var c = MAX_JOB_SECONDS;
var mustCancel = false;
var running = true;
while(!mustCancel && running)
{
Thread.sleep(1000);
c = c - 1;
mustCancel = c <= 0;
val jobInfo = statusTracker.getJobInfo(jobId);
if(jobInfo!=null)
{
val v = jobInfo.get.status()
running = v == JobExecutionStatus.RUNNING
}
else
running = false;
}
if(mustCancel)
{
sc.cancelJob(jobId)
}
}
}
}
sc.addSparkListener(sparkListener)
try
{
val df = spark.sql("SELECT * FROM VERY_BIG_TABLE") //just an example of long-running-job
println(df.count)
}
catch
{
case exc: org.apache.spark.SparkException =>
{
if(exc.getMessage.contains("cancelled"))
throw new Exception("Job forcibly cancelled")
else
throw exc
}
case ex : Throwable =>
{
println(s"Another exception: $ex")
}
}
finally
{
sc.removeSparkListener(sparkListener)
}
For the sake of future visitors, Spark introduced the Spark task reaper since 2.0.3, which does address this scenario (more or less) and is a built-in solution.
Note that is can kill an Executor eventually, if the task is not responsive.
Moreover, some built-in Spark sources of data have been refactored to be more responsive to spark:
For the 1.6.0 version, Zohar's solution is a "messy but efficient" one.
According to setJobGroup:
"If interruptOnCancel is set to true for the job group, then job cancellation will result in Thread.interrupt() being called on the job's executor threads."
So the anno function in your map must be interruptible like this:
val future = sc.parallelize(lst).map(while (!Thread.interrupted) _).collectAsync()

shutdown hook won't start upon ^C (scala)

i'm trying to get a clean and gracefull shutdown, and for some reason, it wont execute. iv'e tried:
sys addShutdownHook{
logger.warn("SHUTTING DOWN...")
// irrelevant logic here...
}
and also:
Runtime.getRuntime.addShutdownHook(ThreadOperations.delayOnThread{
logger.warn("SHUTTING DOWN...")
// irrelevant logic here...
}
)
where ThreadOperations.delayOnThread definition is:
object ThreadOperations {
def startOnThread(body: =>Unit) : Thread = {
onThread(true, body)
}
def delayOnThread(body: =>Unit) : Thread = {
onThread(false, body)
}
private def onThread(runNow : Boolean, body: =>Unit) : Thread = {
val t=new Thread {
override def run=body
}
if(runNow){t.start}
t
}
// more irrelevant operations...
}
but when i run my program (executable jar, double activation), the hook does not start. so what am i doing wrong? what is the right way to add a shutdown hook in scala? is it in any way related to the fact i'm using double activation?
double activation is done like that:
object Gate extends App {
val givenArgs = if(args.isEmpty){
Array("run")
}else{
args
}
val jar = Main.getClass.getProtectionDomain().getCodeSource().getLocation().getFile;
val dir = jar.dropRight(jar.split(System.getProperty("file.separator")).last.length + 1)
val arguments = Seq("java", "-cp", jar, "boot.Main") ++ givenArgs.toSeq
Process(arguments, new java.io.File(dir)).run();
}
(scala version: 2.9.2 )
thanks.
In your second attempt, your shutdown hook you seems to just create a thread and never start it (so it just gets garbage collected and does nothing). Did I miss something? (EDIT: yes I did, see comment. My bad).
In the first attempt, the problem might just be that the underlying log has some caching, and the application exits before the log is flushed.
Solved it.
For some reason, I thought that run as opposed to ! would detach the process. It actually hangs on because there are open streams left to the Process, which is returned from run (or maybe it just hangs for another reason, 'cause exec doesn't hang, but returns a Process with open streams to and from the child process, much like run). For this reason, the original process was still alive, and I accidentally sent the signals to it. Of course, it did not contain a handler, or a shutdown hook, so nothing happened.
The solution was to use Runtime.getRuntime.exec(arguments.toArray) instead of Process(arguments, new java.io.File(dir)).run();, close the streams in the Gate object, and send the ^C signal to the right process.

Scala program exiting before the execution and completion of all Scala Actor messages being sent. How to stop this?

I am sending my Scala Actor its messages from a for loop. The scala actor is receiving the
messages and getting to the job of processing them. The actors are processing cpu and disk intensive tasks such as unzipping and storing files. I deduced that the Actor part is working fine by putting in a delay Thread.sleep(200) in my message passing code in the for loop.
for ( val e <- entries ) {
MyActor ! new MyJob(e)
Thread.sleep(100)
}
Now, my problem is that the program exits with a code 0 as soon as the for loop finishes execution. Thus preventing my Actors to finish there jobs. How do I get over this? This may be really a n00b question. Any help is highly appreciated!
Edit 1:
This solved my problem for now:
while(MyActor.getState != Actor.State.Terminated)
Thread.sleep(3000)
Is this the best I can do?
Assume you have one actor you're want to finish its work. To avoid sleep you can create a SyncVar and wait for it to be initialized in the main thread:
val sv = new SyncVar[Boolean]
// start the actor
actor {
// do something
sv.set(true)
}
sv.take
The main thread will wait until some value is assigned to sv, and then be woken up.
If there are multiple actors, then you can either have multiple SyncVars, or do something like this:
class Ref(var count: Int)
val numactors = 50
val cond = new Ref(numactors)
// start your actors
for (i <- 0 until 50) actor {
// do something
cond.synchronized {
cond.count -= 1
cond.notify()
}
}
cond.synchronized {
while (cond.count != 0) cond.wait
}