Gracefully restart a Reactive-Kafka Consumer Stream on failure - scala

Problem
When I restart/complete/stop the stream, the old consumer does not die/shut down:
[INFO ] a.a.RepointableActorRef -
Message [akka.kafka.KafkaConsumerActor$Internal$Stop$]
from Actor[akka://ufo-sightings/deadLetters]
to Actor[akka://ufo-sightings/system/kafka-consumer-1#1896610594]
was not delivered. [1] dead letters encountered.
Description
I'm building a service that receives a message from a Kafka topic and sends it to an external service via an HTTP request.
The connection to the external service can break, and my service needs to retry the request.
Additionally, if there is an error in the stream, the entire stream needs to restart.
Finally, sometimes I don't need the stream and its corresponding Kafka consumer at all, and I would like to shut the entire stream down.
So I have a stream:
Consumer.committableSource(customizedSettings, subscriptions)
  .flatMapConcat(sourceFunction)
  .toMat(Sink.ignore)(Keep.left)
  .run()
The HTTP request is sent in sourceFunction.
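For illustration only, here is a purely hypothetical sketch of what a sourceFunction of this shape could look like: each committable Kafka message is POSTed to the external service with per-request retries, and its offset is committed once the request succeeds. externalServiceUri and the implicit ActorSystem/Materializer are assumed to exist in the enclosing service; none of this is taken from the original code.

import akka.{Done, NotUsed}
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpEntity, HttpMethods, HttpRequest}
import akka.kafka.ConsumerMessage.CommittableMessage
import akka.stream.scaladsl.{RestartSource, Source}
import scala.concurrent.duration._

// Hypothetical sketch: POST the record value to the external service, retrying the
// HTTP call with backoff, then commit the message's offset once the call succeeds.
def sourceFunction(msg: CommittableMessage[String, String]): Source[Done, NotUsed] =
  RestartSource.onFailuresWithBackoff(
    minBackoff = 1.second, maxBackoff = 30.seconds, randomFactor = 0.2) { () =>
    Source
      .fromFuture(Http().singleRequest(
        HttpRequest(HttpMethods.POST, uri = externalServiceUri, entity = HttpEntity(msg.record.value))))
      .mapAsync(1)(_ => msg.committableOffset.commitScaladsl())
  }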
I followed the new Kafka consumer restart instructions in the new documentation:
RestartSource.withBackoff(
  minBackoff = 20.seconds,
  maxBackoff = 5.minutes,
  randomFactor = 0.2) { () =>
    Consumer.committableSource(customizedSettings, subscriptions)
      .watchTermination() {
        case (consumerControl, streamComplete) =>
          logger.info(s"Started Watching Kafka consumer id = ${consumer.id} termination: is shutdown: ${consumerControl.isShutdown}, is f completed: ${streamComplete.isCompleted}")
          consumerControl.isShutdown.map(_ => logger.info(s"Shutdown of consumer finally happened id = ${consumer.id} at ${DateTime.now}"))
          streamComplete
            .flatMap { _ =>
              consumerControl.shutdown().map(_ -> logger.info(s"3.consumer id = ${consumer.id} SHUTDOWN at ${DateTime.now} GRACEFULLY:CLOSED FROM UPSTREAM"))
            }
            .recoverWith {
              case _ =>
                consumerControl.shutdown().map(_ -> logger.info(s"3.consumer id = ${consumer.id} SHUTDOWN at ${DateTime.now} ERROR:CLOSED FROM UPSTREAM"))
            }
      }
      .flatMapConcat(sourceFunction)
  }
  .viaMat(KillSwitches.single)(Keep.right)
  .toMat(Sink.ignore)(Keep.left)
  .run()
There is an open issue that discusses this non-terminating consumer in a complex Akka stream, but there is no solution yet.
Is there a workaround that forces the Kafka consumer to terminate?

How about wrapping the consumer in an Actor and registering a KillSwitch? See: https://doc.akka.io/docs/akka/2.5/stream/stream-dynamic.html#dynamic-stream-handling
Then, in the Actor's postStop method, you can terminate the stream.
By wrapping the Actor in a BackoffSupervisor, you get the exponential backoff.
Example actor: https://github.com/tradecloud/kafka-akka-extension/blob/master/src/main/scala/nl/tradecloud/kafka/KafkaSubscriberActor.scala#L27
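A rough sketch of that approach (classic actors, Akka 2.5-era APIs), reusing customizedSettings, subscriptions and sourceFunction from the question; the actor and supervisor names are illustrative, not a definitive implementation:

import akka.actor.{Actor, ActorLogging, Props}
import akka.kafka.scaladsl.Consumer
import akka.pattern.{Backoff, BackoffSupervisor}
import akka.stream.{ActorMaterializer, KillSwitches, UniqueKillSwitch}
import akka.stream.scaladsl.{Keep, Sink}
import scala.concurrent.duration._

class KafkaStreamWrapper extends Actor with ActorLogging {
  // Materializer tied to this actor's context, so stream stages stop with the actor.
  private implicit val mat = ActorMaterializer()(context)
  import context.dispatcher

  // Materialize the stream, keeping both the kill switch and the completion future.
  private val (killSwitch: UniqueKillSwitch, streamDone) =
    Consumer.committableSource(customizedSettings, subscriptions)
      .flatMapConcat(sourceFunction)
      .viaMat(KillSwitches.single)(Keep.right)
      .toMat(Sink.ignore)(Keep.both)
      .run()

  // Turn a stream failure into an actor failure so the BackoffSupervisor kicks in.
  streamDone.failed.foreach(e => self ! e)

  override def receive: Receive = {
    case e: Throwable => throw e
  }

  // Stopping the actor (normally or via supervision) tears the stream down too.
  override def postStop(): Unit = killSwitch.shutdown()
}

object KafkaStreamWrapper {
  // Wrap the actor in a BackoffSupervisor to restart the stream with exponential backoff.
  val supervisedProps: Props = BackoffSupervisor.props(
    Backoff.onFailure(
      Props(new KafkaStreamWrapper),
      childName = "kafka-stream",
      minBackoff = 20.seconds,
      maxBackoff = 5.minutes,
      randomFactor = 0.2))
}

Stopping the supervisor actor (for example via context.stop or a PoisonPill) then shuts down the stream and its consumer through postStop, which is the "I don't need the stream anymore" case from the question.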

Related

akka streams kafka consumer stops in few seconds

I am reading from a Kafka consumer, then persisting the message to a DB and sending it to a producer on a different topic. My Akka Streams app stops running a few seconds after launch.
Here is what my stream looks like:
Consumer.committableSource(consumerSettings, Subscriptions.topics(config.getString("topic")))
  .mapAsync(8) { msg =>
    dbPersistActor.ask(msg.record.value()).map(_ => msg)
  }.async
  .map { msg =>
    ProducerMessage.Message(
      new ProducerRecord("test-output", msg.record.key(), msg.record.value()),
      passThrough = msg.committableOffset)
  }
  .via(Producer.flexiFlow(producerSettings))
  .map(_.passThrough)
  .via(Committer.flow(committerSettings))
  .runWith(Sink.ignore)

Grpc parallel Stream communication leads to error:AkkaNettyGrpcClientGraphStage

I have two services: one that sends stream data and a second one that receives it, using akka-grpc for communication. When source data is provided, service one is called to process it and send it to service two via the gRPC client. It's possible that multiple instances of service one run at the same time when multiple sets of source data are provided simultaneously. In a long-running test of my application, I see the error below in service one:
ERROR i.a.g.application.actors.DbActor - GraphStage [akka.grpc.internal.AkkaNettyGrpcClientGraphStage$$anon$1#59d40805] terminated abruptly, caused by for example materializer or act
akka.stream.AbruptStageTerminationException: GraphStage [akka.grpc.internal.AkkaNettyGrpcClientGraphStage$$anon$1#59d40805] terminated abruptly, caused by for example materializer or actor system termination.
I have never shut down the actor system; I only stop actors after they finish their job. I also use proto3 and HTTP/2 for request binding. Here is a piece of my code from service one:
//////////////////// server http binding /////////
val service: HttpRequest => Future[HttpResponse] =
  ServiceOneServiceHandler(new ServiceOneServiceImpl(system))

val bound = Http().bindAndHandleAsync(
  service,
  interface = config.getString("akka.grpc.server.interface"),
  port = config.getString("akka.grpc.server.default-http-port").toInt,
  connectionContext = HttpConnectionContext(http2 = Always))

bound.foreach { binding =>
  logger.info(s"gRPC server bound to: ${binding.localAddress}")
}

//////////////////// client /////////
def send2Server[A](data: ListBuffer[A]): Future[ResponseDTO] = {
  val reply = {
    val thisClient = interface.initialize()
    interface.call(client = thisClient, req = data.asInstanceOf[ListBuffer[StoreRequest]].toList)
  }
  reply
}
///////////////// grpc communication //////////
def send2GrpcServer[A](data: ListBuffer[A]): Unit = {
  val reply = send2Server(data)
  Await.ready(reply, Duration.Inf).onComplete {
    case util.Success(response: ResponseDTO) =>
      logger.info(s"got reply message: ${response.description}")
      ////// check response content and stop application if desired result not found in response
    case util.Failure(exp) =>
      ////// stop application
      throw exp.getCause
  }
}
The error occurs exactly after waiting for service two's response:
Await.ready(reply, Duration.Inf)
I can't catch the cause of the error.
UPDATE
I found that a stream gets lost somewhere: service one sends a stream and waits indefinitely for the response, while service two never receives anything to reply to. I still don't know why the stream is lost.
I also updated the akka-grpc plugin, but it didn't help:
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "0.6.1")
addSbtPlugin("com.lightbend.sbt" % "sbt-javaagent" % "0.1.4")

Producer lost some message on kafka restart

Kafka Client : 0.11.0.0-cp1
Kafka Broker :
On a Kafka broker rolling restart, our application lost some messages while sending to the broker. I believe there should not be any message loss with a rolling restart. These are the producer settings we are using (producer with asynchronous send() and no callback/future):
val acksConfig: String = "all",
val retriesConfig: Int = Int.MAX_VALUE,
val retriesBackOffConfig: Int = 1000,
val batchSize: Int = 32768,
val lingerTime: Int = 1,
val maxBlockTime: Int = Int.MAX_VALUE,
val requestTimeOut: Int = 420000,
val bufferMemory: Int = 33_554_432,
val compressionType: String = "gzip",
val keySerializer: Class<StringSerializer> = StringSerializer::class.java,
val valueSerializer: Class<ByteArraySerializer> = ByteArraySerializer::class.java
I am seeing these exceptions in the logs:
2019-03-19 17:30:59,224 [org.apache.kafka.clients.producer.internals.Sender] [kafka-producer-network-thread | producer-1] (Sender.java:511) WARN org.apache.kafka.clients.producer.internals.Sender - Got error produce response with correlation id 1105790 on topic-partition catapult_on_entitlement_updates_prod-67, retrying (2147483643 attempts left). Error: NOT_LEADER_FOR_PARTITION
But the log says there are retry attempts left, so I am curious why it didn't retry. Let me know if anyone has any idea.
Two things to note:
What is the replication factor of the topic you are producing to, and what is the required min.insync.replicas?
What do you mean by "producer lost some messages"? If the producer cannot successfully produce to min.insync.replicas brokers, it will throw an exception and fail (for synchronous production). It is up to the producer/client to retry in case of failure (synchronous or asynchronous production).
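As a side note, since the question mentions using asynchronous send() without a callback: below is a minimal sketch (in Scala, against the plain Kafka producer API) of attaching a callback so failed sends surface to the application instead of going unnoticed. The bootstrap server and topic/key/value names are placeholders, not taken from the question.

import java.util.Properties
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerConfig, ProducerRecord, RecordMetadata}
import org.apache.kafka.common.serialization.{ByteArraySerializer, StringSerializer}

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
props.put(ProducerConfig.ACKS_CONFIG, "all")
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE.toString)
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[ByteArraySerializer].getName)

val producer = new KafkaProducer[String, Array[Byte]](props)

def sendWithCallback(topic: String, key: String, value: Array[Byte]): Unit =
  producer.send(new ProducerRecord(topic, key, value), new Callback {
    // Called once the broker acknowledges the record or the internal retries are exhausted.
    override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
      if (exception != null) {
        // Retriable errors (e.g. NOT_LEADER_FOR_PARTITION) are retried internally up to
        // `retries`; if this fires, the send ultimately failed and the application must
        // decide whether to re-send, log, or alert.
        println(s"Send failed for key=$key: ${exception.getMessage}")
      }
  })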

Akka streams with gilt aws kinesis exception: Stream is terminated. SourceQueue is detached

I'm using the Gilt AWS Kinesis stream consumer library to connect to a single-shard Kinesis stream.
Specifically:
...
val streamConfig = KinesisStreamConsumerConfig[String](
  streamName = queueName,
  applicationName = kinesisConsumerApp,
  regionName = Some(awsRegion),
  checkPointInterval = 5.minutes,
  retryConfig = RetryConfig(initialDelay = 1.second, retryDelay = 1.second, maxRetries = 3),
  initialPositionInStream = InitialPositionInStream.LATEST
)

implicit val mat = ActorMaterializer()

val flow = Source.queue[String](0, OverflowStrategy.backpressure)
  .to(Sink.foreach { msgBody =>
    log.info(s"Flow got message: $msgBody")
    try {
      val workAsJson = parse(msgBody)
      frontEnd ! workAsJson
    } catch {
      case th: Throwable =>
        log.error(s"Exception thrown trying to parse message from Kinesis stream, e.cause: ${th.getCause}, e.message: ${th.getMessage}")
    }
  })
  .run()

val consumer = new KinesisStreamConsumer[String](
  streamConfig,
  KinesisStreamHandler(
    KinesisStreamSource.pumpKinesisStreamTo(flow, 10.second)
  )
)

val ec = Executors.newSingleThreadExecutor()
ec.submit(new Runnable {
  override def run(): Unit = consumer.run()
})
The application runs fine for about 24 hours; I verify this occasionally by pushing records with the aws kinesis put-record command line and watching them get consumed by my application. But then the application suddenly starts receiving exceptions every time a new record is pushed to the stream.
Here is the console logging when that happens:
INFO: Sleeping ... [863/1962]
DEBUG[RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Processing 1 records from shard shardId-000000000000
WARN [RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
WARN [RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
WARN [RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
ERROR[RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - SKIPPING 1 records from shard shardId-000000000000 :: Kinesis shard: shardId-000000000000 :: Stream is termi
nated. SourceQueue is detached
com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$KCLProcessorException: Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.$anonfun$doRetry$2(KCLRecordProcessorFactory.scala:156)
at com.gilt.gfc.util.Retry$.retryWithExponentialDelay(Retry.scala:67)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.doRetry(KCLRecordProcessorFactory.scala:151)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.processRecords(KCLRecordProcessorFactory.scala:120)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.V1ToV2RecordProcessorAdapter.processRecords(V1ToV2RecordProcessorAdapter.java:42)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:176)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Stream is terminated. SourceQueue is detached
at akka.stream.impl.QueueSource$$anon$1.$anonfun$postStop$1(Sources.scala:57)
at akka.stream.impl.QueueSource$$anon$1.$anonfun$postStop$1$adapted(Sources.scala:56)
at akka.stream.stage.CallbackWrapper.$anonfun$invoke$1(GraphStage.scala:1373)
at akka.stream.stage.CallbackWrapper.locked(GraphStage.scala:1379)
at akka.stream.stage.CallbackWrapper.invoke(GraphStage.scala:1370)
at akka.stream.stage.CallbackWrapper.invoke$(GraphStage.scala:1369)
at akka.stream.impl.QueueSource$$anon$1.invoke(Sources.scala:47)
at akka.stream.impl.QueueSource$$anon$2.offer(Sources.scala:180)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamSource$.$anonfun$pumpKinesisStreamTo$1(KinesisStreamSource.scala:20)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamSource$.$anonfun$pumpKinesisStreamTo$1$adapted(KinesisStreamSource.scala:20)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamHandler$$anon$1.onRecord(KinesisStreamHandler.scala:29)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamConsumer.$anonfun$run$1(KinesisStreamConsumer.scala:40)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamConsumer.$anonfun$run$1$adapted(KinesisStreamConsumer.scala:40)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$2(KCLWorkerRunner.scala:159)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$2$adapted(KCLWorkerRunner.scala:159)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$1(KCLWorkerRunner.scala:159)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$1$adapted(KCLWorkerRunner.scala:159)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runBatchProcessor$1(KCLWorkerRunner.scala:121)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runBatchProcessor$1$adapted(KCLWorkerRunner.scala:116)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.$anonfun$processRecords$2(KCLRecordProcessorFactory.scala:120)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.$anonfun$doRetry$2(KCLRecordProcessorFactory.scala:153)
... 11 common frames omitted
I'm wondering if that answer might be related. If so, I'd appreciate a simpler explanation/how-to-fix that suits a newbie like myself.
Notes:
This is still in the testing/staging phase, so there is no real load on the stream except for the occasional manual pushes I'm making.
The 24h duration in which the application runs fine was not accurately measured; it was just an observation.
I'm running the test a third time (started at 8:42 UTC), this time with the Source.queue buffer size increased to 100.
If 24h turns out to be accurate, could that be related to Kinesis' default 24h retention period for stream records?
Update:
The application is still working fine after 24+ hours of operation.
Update 2:
The application has now been running fine for the past 48+ hours; again, the only difference is increasing the stream's Source.queue buffer size to 100.
Could that be the proper fix for the issue?
Will I face a similar issue with increased load once we go to production?
Is 100 enough/too much/too few?
Can someone please explain how this change fixed/suppressed/mitigated the error?
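For reference, the only change mentioned in the updates above is the buffer size passed to Source.queue; a minimal sketch of that adjusted part of the pipeline, reusing the names from the snippet earlier in the question:

// Same pipeline as in the snippet above, but with a non-zero buffer (100 is the
// value used in the later test runs) so offered records can be queued while the
// downstream Sink is busy, instead of relying on a zero-sized queue.
val flow = Source.queue[String](100, OverflowStrategy.backpressure)
  .to(Sink.foreach { msgBody =>
    log.info(s"Flow got message: $msgBody")
    // ... parse and forward to frontEnd as in the original Sink ...
  })
  .run()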

How to shutdown a Kafka ConsumerConnector

I have a system that pulls messages from a Kafka topic, and when it's unable to process messages because some external resource is unavailable, it shuts down the consumer, returns the message to the topic, and waits some time before starting the consumer again. The only problem is, shutting down doesn't work. Here's what I see in my logs:
2014-09-30 08:24:10,918 - com.example.kafka.KafkaConsumer [info] - [application-akka.actor.workflow-context-8] Shutting down kafka consumer for topic new-problem-reports
2014-09-30 08:24:10,927 - clients.kafka.ProblemReportObserver [info] - [application-akka.actor.workflow-context-8] Consumer shutdown
2014-09-30 08:24:11,946 - clients.kafka.ProblemReportObserver [warn] - [application-akka.actor.workflow-context-8] Sending 7410-1412090624000 back to the queue
2014-09-30 08:24:12,021 - clients.kafka.ProblemReportObserver [debug] - [kafka-akka.actor.kafka-consumer-worker-context-9] Message from partition 0: key=7410-1412090624000, msg=7410-1412090624000
There are a few layers at work here, but the important code is:
In KafkaConsumer.scala:
protected def consumer: ConsumerConnector = Consumer.create(config.asKafkaConfig)
def shutdown() = {
  logger.info(s"Shutting down kafka consumer for topic ${config.topic}")
  consumer.shutdown()
}
In the routine that observes messages:
(processor ? ProblemReportRequest(problemReportKey)).map {
  case e: ConnectivityInterruption =>
    val backoff = 10.seconds
    logger.warn(s"Can't connect to essential services, pausing for $backoff", e)
    stop()
    // XXX: Shutdown isn't instantaneous, so returning has to happen after a delay.
    // Unfortunately, there's still a race condition here, plus there's a chance the
    // system will be shut down before the message has been returned.
    system.scheduler.scheduleOnce(100.millis) { returnMessage(message) }
    system.scheduler.scheduleOnce(backoff) { start() }
    false
  case e: Exception => returnMessage(message, e)
  case _ => true
}.recover { case e => returnMessage(message, e) }
And the stop method:
def stop() = {
  if (consumerRunning.get()) {
    consumer.shutdown()
    consumerRunning.compareAndSet(true, false)
    logger.info("Consumer shutdown")
  } else {
    logger.info("Consumer is already shutdown")
  }
  !consumerRunning.get()
}
Is this a bug, or am I doing it wrong?
Because your consumer is a def, every call creates a new Kafka ConsumerConnector instance, so consumer.shutdown() shuts down that newly created instance rather than the one that is actually consuming. Make consumer a val instead.
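A minimal sketch of that change, keeping the names from the question (config.asKafkaConfig and logger are assumed to exist as in the original code):

// Before: `consumer` is a def, so each call creates a brand-new ConsumerConnector;
// shutdown() therefore shuts down a fresh instance, not the one actually consuming.
// protected def consumer: ConsumerConnector = Consumer.create(config.asKafkaConfig)

// After: a single connector is created once and reused, so shutdown() stops the
// same instance that is actually consuming.
protected val consumer: ConsumerConnector = Consumer.create(config.asKafkaConfig)

def shutdown(): Unit = {
  logger.info(s"Shutting down kafka consumer for topic ${config.topic}")
  consumer.shutdown()
}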