I have a pretty simple application that has an Akka HTTP endpoint, does some processing and writes results to one of two output files. The code to ensure graceful shutdown looks a bit complicated. Is there a way to make it more succinct?
val bindingFuture = Http().newServerAt("localhost", config.port).bind(route)
val validQueue: BoundedSourceQueue[ByteString] = ???
val invalidQueue: BoundedSourceQueue[ByteString] = ???
val validDone: Future[Done] = ???
val invalidDone: Future[Done] = ???
val allDone = Future.sequence(Seq(validDone, invalidDone)) // Future.sequence takes a collection of futures
bindingFuture.onComplete {
case Success(binding) =>
logger.info("Server started on port {}", config.port)
binding.addToCoordinatedShutdown(5.seconds)
case Failure(ex) =>
logger.error("Can't start server", ex)
system.terminate()
}
allDone.onComplete { result =>
result match {
case Failure(ex) =>
logger.error("Streams completed with error", ex)
case Success(_) =>
logger.info("Streams completed successfully")
}
system.terminate()
}
sys.addShutdownHook {
logger.info("Shutting down...")
validQueue.complete()
invalidQueue.complete()
}
Akka installs JVM shutdown hooks by default, so there is no need to add a shutdown hook yourself; you can remove sys.addShutdownHook { ... }.
Calling ActorSystem.terminate will also terminate all streams. (Streams can be terminated abruptly, in most applications this is not a problem though, and it sounds like this would also not be a problem in your case.)
As a minor cleanup, you could consider using map, recover and andThen:
allDone.map { _ =>
logger.info("Streams completed successfully")
}.recover {
case ex =>
logger.error("Streams completed with error", ex)
}.andThen {
case _ => system.terminate()
}
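To illustrate how these combinators chain, here is a minimal sketch that uses only the Scala standard library. The names `AndThenSketch`, `run` and `cleanedUp` are made up for this example; the flag merely stands in for the system.terminate() call above.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AndThenSketch {
  // Returns the logged outcome and whether the cleanup step ran.
  def run(f: Future[Int]): (String, Boolean) = {
    val cleanedUp = new AtomicBoolean(false) // hypothetical stand-in for system.terminate()
    val outcome = f
      .map(_ => "ok")                         // success branch
      .recover { case _ => "failed" }         // failure branch, turned into a value
      .andThen { case _ => cleanedUp.set(true) } // runs in both cases, result is passed through
    (Await.result(outcome, 2.seconds), cleanedUp.get)
  }
}
```

Because andThen passes the original result through unchanged, it is a natural place for cleanup that must happen whether the streams succeeded or failed.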
You could use CoordinatedShutdown:
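CoordinatedShutdown(system).addTask(CoordinatedShutdown.PhaseServiceRequestsDone, "complete hdfs sinks") { () =>
validQueue.complete()
invalidQueue.complete()
allDone.map(_ => Done) // the task must return a Future[Done]; this waits for both sinks (needs an implicit ExecutionContext)
}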
You could also use a shared kill switch (https://doc.akka.io/docs/akka/current/stream/stream-dynamic.html#sharedkillswitch), which you can put in your flows with .via(sharedKillSwitch.flow). You could then shut down the switch from CoordinatedShutdown:
// create it somewhere, use in your flows
val sharedKillSwitch = KillSwitches.shared("hdfs-switch")
// use switch in CoordinatedShutdown
CoordinatedShutdown(system).addTask(CoordinatedShutdown.PhaseServiceRequestsDone, "complete hdfs sinks") { () =>
sharedKillSwitch.shutdown()
Future.successful(Done) // the task must return a Future[Done]
}
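A small self-contained sketch of the kill switch idea, assuming Akka 2.6 or later (where an implicit ActorSystem provides the materializer). The object name, the switch name and the tick source are placeholders for illustration, not part of the original application.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.KillSwitches
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object KillSwitchSketch {
  def run(): Done = {
    implicit val system: ActorSystem = ActorSystem("kill-switch-sketch")
    try {
      val switch = KillSwitches.shared("sketch-switch")
      // An endless tick source; it only completes once the switch is shut down.
      val done: Future[Done] =
        Source.tick(0.millis, 10.millis, "tick")
          .via(switch.flow)
          .runWith(Sink.ignore)
      switch.shutdown() // completes every stream that includes switch.flow
      Await.result(done, 5.seconds)
    } finally Await.ready(system.terminate(), 10.seconds)
  }
}
```

The same switch instance can be placed in several streams, so one shutdown() call completes all of them.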
I am using Akka HTTP (Scala) to send multiple HTTP batch requests asynchronously to an API, and I am wondering what the right way is to handle exceptions when the response code is anything other than 200 OK.
Below is some pseudo-code to demonstrate my point.
/* Using For comprehension here because the server API has restriction on the amount of data we can send and the time it takes them to process each request. So they require us to send multiple mini requests instead. If one of those fails, then our entire job should fail.*/
val eventuallyResponses = for {
batches <- postBatch(payload)
} yield batches
val eventualResponses = Future.sequence(eventuallyResponses)
/* Do I need to recover here? If I don't, will the actor system terminate? */
eventualResponses.recover { case es =>
log.warn("some message")
List()
}
/* As I said I need to wait for all mini batch requests to complete. If one response is different than 200, then the entire job should fail. */
val result = Await.result(eventualResponses, 10.minutes)
actorSystem.terminate().onComplete {
case Success(_) =>
if (result.isEmpty) {
/* This doesn't seem to interrupt the program */
throw new RuntimeException("POST failed")
} else {
log.info("POST Successful")
}
case Failure(ex) =>
log.error(s"error message $ex")
throw ex
}
def postBatch(payload) = {
val responseFuture: Future[HttpResponse] = httpClient.post(payload)
responseFuture.flatMap{ res =>
res.status match {
case StatusCodes.OK => Future.successful(res)
case _ => Future.failed(new RuntimeException("error message"))
}
}
}
The above code throws an exception when we receive a status code other than OK. It does go through the branch where result.isEmpty is true, but it doesn't seem to stop or interrupt the execution of the program. I need it to do that, because this is scheduled as an Autosys job, and the job must fail if at least one of the batch requests returns a response other than 200 OK.
If I don't recover and let the exception be thrown when we receive a non-200 status code, will the actor system be terminated properly?
Do you know of a good way to do the above?
Thanks :)
As far as I understand your question, you need to throw an exception from the main body if any response has a status other than 200.
def postBatch(payload: HttpRequest)(implicit system: ActorSystem, ec: ExecutionContext): Future[HttpResponse] = {
Http().singleRequest(payload).flatMap(response => response.status match {
case StatusCodes.OK => Future.successful(response)
case _ => Future.failed(new RuntimeException("error message"))
})
}
val requests: List[HttpRequest] = List(...)
/*
You don't need a for comprehension here, because
val eventuallyResponses = for {
batches <- postBatch(payload)
} yield batches
is equivalent to
val eventuallyResponses = postBatch(payload)
A for comprehension doesn't send requests recursively. If you need that, write it yourself by calling flatMap on each request future.
*/
val eventualResponses: Future[List[HttpResponse]] =
Future.sequence(requests.map(postBatch(_))) // also, it's better to add some throttling logic here
// As far as I understand, you need to wait for all responses and stop the actor system after that.
// Await.ready throws a TimeoutException if the future is still incomplete, so wrap it in Try (scala.util.Try)
Try(Await.ready(eventualResponses, 10.minutes)) // wait for all responses
Await.ready(actorSystem.terminate(), Duration.Inf) // wait for actor system termination
// After waiting, match on the future's value; it is None only if the wait above timed out
eventualResponses.value match {
case Some(Success(responses)) =>
log.info("All requests completed")
case Some(Failure(exception)) =>
log.error("Some request failed")
throw exception // rethrow this exception
case None =>
log.error("Requests timed out")
throw new RuntimeException("Requests timed out")
}
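An exception thrown inside an onComplete callback is reported to the execution context rather than to the main thread, which is likely why the original throw did not stop the program. Below is a minimal sketch of the alternative: wait on the combined future in the main thread and turn the outcome into an exit code. It uses only the standard library; `BatchOutcome`, `fakePost`, `runAll` and `exitCode` are hypothetical names, and `fakePost` stands in for the real HTTP call.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object BatchOutcome {
  // Hypothetical stand-in for an HTTP call: fails for any status other than 200.
  def fakePost(status: Int): Future[Int] =
    if (status == 200) Future.successful(status)
    else Future.failed(new RuntimeException(s"unexpected status $status"))

  // The future returned by Future.sequence fails if any element fails,
  // so a single bad batch fails the whole job.
  def runAll(statuses: List[Int]): Try[List[Int]] =
    Try(Await.result(Future.sequence(statuses.map(fakePost)), 5.seconds))

  // Map the outcome to a process exit code that a scheduler can inspect.
  def exitCode(outcome: Try[List[Int]]): Int = outcome match {
    case Success(_) => 0
    case Failure(_) => 1
  }
}
```

In a real job you would presumably terminate the actor system first and then call sys.exit with the computed code, assuming the scheduler treats a non-zero exit code as a failed run.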
I am toying around with using a Source.queue from an Actor. I am stuck on pattern matching the result of an offer operation.
class MarcReaderActor(file: File, sourceQueue: SourceQueueWithComplete[Record]) extends Actor {
val inStream = file.newInputStream
val reader = new MarcStreamReader(inStream)
override def receive: Receive = {
case Process => {
if (reader.hasNext()) {
val record = reader.next()
pipe(sourceQueue.offer(record)) to self
}
}
case f:Future[QueueOfferResult] =>
}
}
I don't know how to check whether the result was Enqueued, Dropped or Failure.
If I write f: Future[QueueOfferResult.Enqueued], the compiler complains.
Since you use pipeTo, you do not need to match on futures: the contents of the future are sent to the actor when the future completes, not the future itself. Do this:
override def receive: Receive = {
case Process =>
if (reader.hasNext()) {
val record = reader.next()
pipe(sourceQueue.offer(record)) to self
}
case r: QueueOfferResult =>
r match {
case QueueOfferResult.Enqueued => // element has been consumed
case QueueOfferResult.Dropped => // element has been ignored because of backpressure
case QueueOfferResult.QueueClosed => // the queue upstream has terminated
case QueueOfferResult.Failure(e) => // the queue upstream has failed with an exception
}
case Status.Failure(e) => // future has failed, e.g. because of invalid usage of `offer()`
}
I can't find a lifecycle description for the high-level consumer. I'm on 0.8.2.2 and I can't use the "modern" consumer from kafka-clients. Here is my code:
def consume(numberOfEvents: Int, await: Duration = 100.millis): List[MessageEnvelope] = {
val consumerProperties = new Properties()
consumerProperties.put("zookeeper.connect", kafkaConfig.zooKeeperConnectString)
consumerProperties.put("group.id", consumerGroup)
consumerProperties.put("auto.offset.reset", "smallest")
val consumer = Consumer.create(new ConsumerConfig(consumerProperties))
try {
val messageStreams = consumer.createMessageStreams(
Predef.Map(kafkaConfig.topic -> 1),
new DefaultDecoder,
new MessageEnvelopeDecoder)
val receiveMessageFuture = Future[List[MessageEnvelope]] {
messageStreams(kafkaConfig.topic)
.flatMap(stream => stream.take(numberOfEvents).map(_.message()))
}
Await.result(receiveMessageFuture, await)
} finally {
consumer.shutdown()
}
}
It's not clear to me: should I shut down the consumer after each message retrieval, or can I keep the instance and reuse it for fetching messages? I suppose reusing the instance is the right way, but I can't find any articles or best practices.
I'm trying to reuse consumer and / or messageStreams. It doesn't work well for me and I can't find the reason for it.
If I try to reuse messageStreams, I get an exception:
2017-04-17_19:57:57.088 ERROR MessageEnvelopeConsumer - Error while awaiting for messages java.lang.IllegalStateException: Iterator is in failed state
java.lang.IllegalStateException: Iterator is in failed state
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:54)
at scala.collection.IterableLike$class.take(IterableLike.scala:134)
at kafka.consumer.KafkaStream.take(KafkaStream.scala:25)
Happens here:
def consume(numberOfEvents: Int, await: Duration = 100.millis): List[MessageEnvelope] = {
try {
val receiveMessageFuture = Future[List[MessageEnvelope]] {
messageStreams(kafkaConfig.topic)
.flatMap(stream => stream.take(numberOfEvents).map(_.message()))
}
Try(Await.result(receiveMessageFuture, await)) match {
case Success(result) => result
case Failure(_: TimeoutException) => List.empty
case Failure(e) =>
// ===> never got any message from topic
logger.error(s"Error while awaiting for messages ${e.getClass.getName}: ${e.getMessage}", e)
List.empty
}
} catch {
case e: Exception =>
logger.warn(s"Error while consuming messages", e)
List.empty
}
}
I tried to create messageStreams each time, with no luck:
2017-04-17_20:02:44.236 WARN MessageEnvelopeConsumer - Error while consuming messages
kafka.common.MessageStreamsExistException: ZookeeperConsumerConnector can create message streams at most once
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:151)
at MessageEnvelopeConsumer.consume(MessageEnvelopeConsumer.scala:47)
Happens here:
def consume(numberOfEvents: Int, await: Duration = 100.millis): List[MessageEnvelope] = {
try {
val messageStreams = consumer.createMessageStreams(
Predef.Map(kafkaConfig.topic -> 1),
new DefaultDecoder,
new MessageEnvelopeDecoder)
val receiveMessageFuture = Future[List[MessageEnvelope]] {
messageStreams(kafkaConfig.topic)
.flatMap(stream => stream.take(numberOfEvents).map(_.message()))
}
Try(Await.result(receiveMessageFuture, await)) match {
case Success(result) => result
case Failure(_: TimeoutException) => List.empty
case Failure(e) =>
logger.error(s"Error while awaiting for messages ${e.getClass.getName}: ${e.getMessage}", e)
List.empty
}
} catch {
case e: Exception =>
// ===> now exception raised here
logger.warn(s"Error while consuming messages", e)
List.empty
}
}
UPD
I switched to an iterator-based approach. It looks like this:
// consumerProperties.put("consumer.timeout.ms", "100")
private lazy val consumer: ConsumerConnector = Consumer.create(new ConsumerConfig(consumerProperties))
private lazy val messageStreams: Seq[KafkaStream[Array[Byte], MessageEnvelope]] =
consumer.createMessageStreamsByFilter(Whitelist(kafkaConfig.topic), 1, new DefaultDecoder, new MessageEnvelopeDecoder)
private lazy val iterator: ConsumerIterator[Array[Byte], MessageEnvelope] = {
val stream = messageStreams.head
stream.iterator()
}
def consume(): List[MessageEnvelope] = {
try {
if (iterator.hasNext) {
val fromKafka: MessageAndMetadata[Array[Byte], MessageEnvelope] = iterator.next
List(fromKafka.message())
} else {
List.empty
}
} catch {
case _: ConsumerTimeoutException =>
List.empty
case e: Exception =>
logger.warn(s"Error while consuming messages", e)
List.empty
}
}
Now I'm trying to figure out if it automatically commits offsets to ZK...
Constantly shutting down the consumer causes unnecessary consumer group rebalances, which hurt performance a lot. See this article for best practices: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
My answer is the latest update to the question: the iterator approach works for me as expected.
This question is based on a pet project that I did and this SO thread. Inside an Akka HTTP route definition, I start a long-running process, and naturally I want to do that without blocking the user. I'm able to achieve this with the code snippet below:
blocking-io-dispatcher {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
fixed-pool-size = 16
}
throughput = 1
}
complete {
Try(new URL(url)) match {
case scala.util.Success(u) => {
val src = Source.fromIterator(() => parseMovies(u).iterator)
src
.via(findMovieByTitleAndYear)
.via(persistMovies)
.toMat(Sink.fold(Future(0))((acc, elem) => Applicative[Future].map2(acc, elem)(_ + _)))(Keep.right)
// run the whole graph on a separate dispatcher
.withAttributes(ActorAttributes.dispatcher("blocking-io-dispatcher"))
.run.flatten
.onComplete {
_ match {
case scala.util.Success(n) => logger.info(s"Created $n movies")
case Failure(t) => logger.error(t, "Failed to process movies")
}
}
Accepted
}
case Failure(t) => logger.error(t, "Bad URL"); BadRequest -> "Bad URL"
}
}
What's the problem then, if I've already solved it? The problem is that I'm not sure how to set a timeout. Running the graph creates a Future that executes until completion on the dedicated blocking-io-dispatcher. If I add an Await call, the code blocks. Is there a way to set a timeout?
The completionTimeout stage should help here. Example below:
src
.completionTimeout(5.seconds)
...
.run.flatten
.onComplete {
case scala.util.Success(n) => logger.info(s"Created $n movies")
case Failure(t: TimeoutException) => logger.error(t, "Timed out")
case Failure(t) => logger.error(t, "Failed to process movies")
}
Docs reference here.
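As a self-contained illustration of that stage, here is a minimal sketch assuming Akka 2.6 or later (where an implicit ActorSystem provides the materializer). `CompletionTimeoutSketch` and `timedOut` are made-up names, and Source.maybe is used only as a source that never completes on its own.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, TimeoutException}
import scala.concurrent.duration._
import scala.util.{Failure, Try}

object CompletionTimeoutSketch {
  // Returns true if the stream was failed with a TimeoutException.
  def timedOut(): Boolean = {
    implicit val system: ActorSystem = ActorSystem("timeout-sketch")
    try {
      // Source.maybe never emits or completes unless its promise is completed,
      // so completionTimeout fails the stream after 200 ms.
      val done = Source.maybe[Int]
        .completionTimeout(200.millis)
        .runWith(Sink.ignore)
      Try(Await.result(done, 5.seconds)) match {
        case Failure(_: TimeoutException) => true
        case _                            => false
      }
    } finally Await.ready(system.terminate(), 10.seconds)
  }
}
```

The Await here exists only so the sketch can report a result; in the route above the failure would simply arrive in the onComplete callback without blocking.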
When I run the code below, it terminates and nothing happens. Is there a way to catch the exception in future2.map?
object TestFutures1 {
val future1 = Future {
throw new Exception("error")
}
}
object TestFutures2 extends App {
val future2 = TestFutures1.future1
future2.map { result => println(result) }
Thread.sleep(5000)
}
Generally speaking, there are two options (with some sub-options) to handle future exceptions.
1. You can add a callback that will be called when the future completes with an exception:
future2
.map { result => println(result) }
.onFailure { case exc =>
// onFailure was deprecated in Scala 2.12 and removed in 2.13; use .failed.foreach there
exc.printStackTrace()
}
1a. .onFailure is only useful for side effects; you cannot use it to, for example, handle the exception. If you want to handle it and rescue the future, you can use .recover or .recoverWith. The only difference between the two is that the latter takes a function that returns a Future, while the former deals with the actual result (the same idea as map vs. flatMap):
future2
.map(println)
.recover { case e =>
e.printStackTrace
"recovered"
}.map(println)
2. The other option is to wait for the future to complete, and access its result:
try {
// Instead of Thread.sleep - you don't need that:
Await.result(future2, 2.seconds)
} catch { case exc: Exception =>
exc.printStackTrace()
}
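The difference between recover and recoverWith described above can be seen in a short sketch that uses only the standard library. `RecoverSketch` and its members are illustrative names, and the fallback values are arbitrary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RecoverSketch {
  val failing: Future[Int] = Future.failed(new Exception("error"))

  // recover: the handler returns a plain value (like map)
  val viaRecover: Future[Int] =
    failing.recover { case _: Exception => -1 }

  // recoverWith: the handler returns another Future (like flatMap)
  val viaRecoverWith: Future[Int] =
    failing.recoverWith { case _: Exception => Future.successful(-2) }

  // Blocks briefly to collect both recovered values.
  def results: (Int, Int) =
    (Await.result(viaRecover, 2.seconds), Await.result(viaRecoverWith, 2.seconds))
}
```

recoverWith is the one to reach for when the fallback is itself asynchronous, for example a retry that returns a Future.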
Besides @Dima's answer, you can also use onComplete to catch the failure, like:
val re = future2.map { result => println(result) }
re onComplete {
case Success(s) => ...
case Failure(e) => e.printStackTrace()
}