Akka test: how to use multiple ExecutionContexts - Scala

My actor runs on Akka's default dispatcher and calls a method that returns a Future. I have configured a different ExecutionContext for all such futures to run on, since they block (due to DB calls) and I want to keep the actor dispatcher dedicated to non-blocking actors only. Can this code be tested (continuing to use two execution contexts) with Akka TestKit? If so, how would I configure the test so that the actor runs on the default dispatcher while the futures can also find "custom-dispatcher" to run on? Currently the test throws the following:
Caused by: akka.ConfigurationException: Dispatcher [custom-dispatcher] not configured

When you create an Akka TestKit TestActorRef for an actor, it runs on the CallingThreadDispatcher unless you specify a different dispatcher in the actor's Props and pass those Props when creating the TestActorRef.
The exception "Dispatcher [custom-dispatcher] not configured" most likely means that your tests use a different Akka configuration, one in which no dispatcher named custom-dispatcher is defined.

Create a file application.conf in your test/resources directory:
my-custom-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
Then, in your test, create the actor with that dispatcher:
val boothWorker = system.actorOf(
  Props(classOf[WorkerTest]).withDispatcher("my-custom-dispatcher"))
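For the futures, the same test configuration also lets you obtain an ExecutionContext by looking the dispatcher up from the test ActorSystem, so the blocking DB calls find it just as they do in production. A minimal sketch, assuming ScalaTest and a test application.conf that defines custom-dispatcher (the name the production code looks up); WorkerTest and the "do-work" message are placeholders from the question's setup:

import akka.actor.{ActorSystem, Props}
import akka.testkit.{ImplicitSender, TestKit}
import com.typesafe.config.ConfigFactory
import org.scalatest.BeforeAndAfterAll
import org.scalatest.wordspec.AnyWordSpecLike

import scala.concurrent.ExecutionContext

class WorkerTestSpec
    extends TestKit(ActorSystem("test", ConfigFactory.load())) // picks up test/resources/application.conf
    with ImplicitSender
    with AnyWordSpecLike
    with BeforeAndAfterAll {

  // The same lookup the production code performs; it now resolves because
  // the test config defines the dispatcher.
  val blockingEc: ExecutionContext = system.dispatchers.lookup("custom-dispatcher")

  "WorkerTest" should {
    "reply while its futures run on the custom dispatcher" in {
      val worker = system.actorOf(Props(classOf[WorkerTest])) // default dispatcher
      worker ! "do-work"
      expectMsgType[String]
    }
  }

  override def afterAll(): Unit = TestKit.shutdownActorSystem(system)
}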

Related

Flink job can't use a savepoint in a batch job

Let me start generically, in case I have missed some concept: I have a streaming Flink job from which I created a savepoint. A simplified version of the job looks like this.
Pseudo-code:
val flink = StreamExecutionEnvironment.getExecutionEnvironment
val stream = if (batchMode) {
  flink.readFile(path)
} else {
  flink.addKafkaSource(topicName)
}
val processed = stream
  .keyBy(key)
  .process(new ProcessorWithKeyedState())
CassandraSink.addSink(processed)
This works fine as long as I run the job without a savepoint. If I start the job from a savepoint I get an exception which looks like this
Caused by: java.lang.UnsupportedOperationException: Checkpoints are not supported in a single key state backend
at org.apache.flink.streaming.api.operators.sorted.state.NonCheckpointingStorageAccess.resolveCheckpoint(NonCheckpointingStorageAccess.java:43)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1623)
at org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:362)
at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:292)
at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:249)
I could work around this if I set the option:
execution.batch-state-backend.enabled: false
but this eventually results in another error:
Caused by: java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0. Please make sure that all types of managed memory consumers contained in the job are configured with a non-negative weight via `taskmanager.memory.managed.consumer-weights`.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
at org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:673)
at org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:526)
Of course I tried to set the config key taskmanager.memory.managed.consumer-weights (used DATAPROC:70,PYTHON:30) but this doesn't seem to have any effect.
So I wonder if I have a conceptual error and can't reuse savepoints from a streaming job in a batch job or if I simply have a problem in my configuration. Any hints?
After a hint from the Flink user group it turned out that it is NOT possible to reuse a savepoint from the streaming job in batch mode (https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/execution_mode/#state-backends--state). So instead of running the job in batch mode (flink.setRuntimeMode(RuntimeExecutionMode.BATCH)) I simply run it in the default execution mode (STREAMING). This has the minor downside that the job runs forever and has to be stopped by someone once all the data has been processed. A sketch of the resulting setup is shown below.
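A minimal sketch of that workaround, keeping everything else from the pseudo-code above unchanged (the job name and the jar name in the submit command are placeholders):

import org.apache.flink.api.common.RuntimeExecutionMode
import org.apache.flink.streaming.api.scala._

object ReprocessJob {
  def main(args: Array[String]): Unit = {
    val flink = StreamExecutionEnvironment.getExecutionEnvironment

    // Stay in STREAMING mode (the default) instead of BATCH, so the
    // savepoint can be restored by a checkpoint-capable state backend.
    flink.setRuntimeMode(RuntimeExecutionMode.STREAMING)

    // source / keyBy / process / Cassandra sink exactly as in the
    // pseudo-code above ...

    // Submit with the savepoint, e.g.: flink run -s <savepointPath> reprocess-job.jar
    flink.execute("reprocess-with-savepoint")
  }
}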

Shutdown Hook for spark batch application

I have a Spark Scala batch application. It commits its run status to MariaDB when it completes or fails.
I want to handle the edge case where the application is killed, e.g. by "yarn application -kill [appid]": in that case I want to update the status to failed in the MariaDB table.
I planned to use "ShutdownHookManager" for this, but it is private in Spark, and Scala's sys.ShutdownHookThread does not work either.
Can somebody guide me on shutdown hook handling when a Spark batch application is killed? There aren't many resources on this.
You can create a custom SparkListener that reacts to the onApplicationEnd event:
class MyListener extends SparkListener {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    println("Shutting down...")
  }
}
This listener can then be added to the SparkContext:
spark.sparkContext.addSparkListener(new MyListener())
When the Spark application terminates, the string Shutting down... is printed on the console.
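Applied to the question, the same hook could update the run status in MariaDB, with the caveat that a hard "yarn application -kill" may terminate the JVM before the listener (or any shutdown hook) gets to run. A rough sketch; the JDBC URL, credentials, table and column names are placeholders, and only rows still marked RUNNING are flipped to FAILED, so a run that already committed its final status through the normal path is left untouched:

import java.sql.DriverManager

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

class StatusListener(runId: String) extends SparkListener {
  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
    // Placeholder connection details.
    val conn = DriverManager.getConnection(
      "jdbc:mariadb://db-host:3306/jobs", "user", "password")
    try {
      val stmt = conn.prepareStatement(
        "UPDATE run_status SET status = 'FAILED', ended_at = ? " +
          "WHERE run_id = ? AND status = 'RUNNING'")
      stmt.setLong(1, end.time) // SparkListenerApplicationEnd carries the end timestamp
      stmt.setString(2, runId)
      stmt.executeUpdate()
    } finally conn.close()
  }
}

It is registered the same way: spark.sparkContext.addSparkListener(new StatusListener(runId)).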

Is FAIR available for Spark Standalone cluster mode?

I have a 2-node cluster with the Spark standalone cluster manager. I'm triggering more than one job using the same sc with Scala multithreading. What I found is that my jobs are scheduled one after another because of the FIFO nature, so I tried to use FAIR scheduling:
conf.set("spark.scheduler.mode", "FAIR")
conf.set("spark.scheduler.allocation.file", sys.env("SPARK_HOME") + "/conf/fairscheduler.xml")
val job1 = Future {
val job = new Job1()
job.run()
}
val job2 =Future {
val job = new Job2()
job.run()
}
class Job1{
def run()
sc.setLocalProperty("spark.scheduler.pool", "mypool1")
}
}
class Job2{
def run()
sc.setLocalProperty("spark.scheduler.pool", "mypool2")
}
}
<pool name="mypool1">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
<pool name="mypool2">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
Job1 and Job2 are triggered from a launcher class. Even after setting these properties, my jobs are handled in FIFO order.
Is FAIR available for Spark standalone cluster mode? Is there a page where it's described in more detail? I can't seem to find much about FAIR and standalone in Job Scheduling. I'm following this SO question. Am I missing anything here?
I don't think standalone is the problem. You described creating only one pool, so I think your problem is that you need at least one more pool and to assign each job to a different pool.
FAIR scheduling is done across pools; anything within the same pool will run in FIFO mode anyway.
This is based on the documentation here:
https://spark.apache.org/docs/latest/job-scheduling.html#default-behavior-of-pools
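As a concrete illustration, here is one way to pin each concurrently submitted job to its own pool from the thread that submits it. This is only a sketch: the helper name is made up, the pool names come from the question, and spark.scheduler.pool is a thread-local property, so it has to be set in the thread that actually triggers the Spark actions.

import org.apache.spark.SparkContext

import scala.concurrent.{ExecutionContext, Future}

object PoolRunner {
  def runInPool(sc: SparkContext, pool: String)(body: => Unit)
               (implicit ec: ExecutionContext): Future[Unit] = Future {
    sc.setLocalProperty("spark.scheduler.pool", pool)
    try body
    finally sc.setLocalProperty("spark.scheduler.pool", null) // reset to the default pool
  }
}

// From the launcher:
// val job1 = PoolRunner.runInPool(sc, "mypool1") { new Job1().run() }
// val job2 = PoolRunner.runInPool(sc, "mypool2") { new Job2().run() }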

Using Akka Dispatchers for Handling Futures

I have a Spray-based HTTP service with a stream that runs inside it. Since this stream does a lot of I/O, I decided to use a separate thread pool for it. I looked at the Akka documentation to see how I could make this thread pool configurable and came across Akka's Dispatcher concept. So I tried to use it as below in my application.conf:
akka {
  io-dispatcher {
    # Dispatcher is the name of the event-based dispatcher
    type = Dispatcher
    # What kind of ExecutionService to use
    executor = "fork-join-executor"
    # Configuration for the fork join pool
    fork-join-executor {
      # Min number of threads to cap factor-based parallelism number to
      parallelism-min = 2
      # Parallelism (threads) ... ceil(available processors * factor)
      parallelism-factor = 2.0
      # Max number of threads to cap factor-based parallelism number to
      parallelism-max = 10
    }
    # Throughput defines the maximum number of messages to be
    # processed per actor before the thread jumps to the next actor.
    # Set to 1 for as fair as possible.
    throughput = 20
  }
}
In my actor, I tried to look up this configuration as:
context.system.dispatchers.lookup("akka.io-dispatcher")
When I run my service, I get the following error:
[ERROR] [05/03/2016 12:59:08.673] [my-app-akka.actor.default-dispatcher-2] [akka://my-app/user/myAppSupervisorActor] Dispatcher [akka.io-dispatcher] not configured
akka.ConfigurationException: Dispatcher [akka.io-dispatcher] not configured
at akka.dispatch.Dispatchers.lookupConfigurator(Dispatchers.scala:99)
at akka.dispatch.Dispatchers.lookup(Dispatchers.scala:81)
My questions are:
1. Is the io-dispatcher thread pool that I create meant to be used only for actors? My intention was to use this thread pool for my stream, which is instantiated by one of the actors; I then pass the thread pool to the stream.
2. How can I create an ExecutionContext just by loading the dispatcher from application.conf? Should I use a specific library that reads my thread-pool configuration and gives me an ExecutionContext?
The configuration is correct. All you need to do is pass the loaded configuration to the Akka ActorSystem, like:
ActorSystem("yourActorSystem", ConfigFactory.load())

Akka disable logging to sysout

Our application uses Akka actors (v. 1.1.2) that throw exceptions from time to time. We would like to have those exceptions logged and are using event-handlers = ["akka.event.slf4j.Slf4jEventHandler"] to turn on logging via slf4j/logback.
However, error log messages are still also written to stdout.
Is there any setting in Akka to disable the EventHandler$DefaultListener, which seems to cause this behavior?
Thanks in advance!