Reactive-Kafka Stream Consumer: Dead letters occurred - scala

I am trying to consume messages from Kafka using Akka's Reactive Kafka library. One message gets printed, and after that I get:
[INFO] [01/24/2017 10:36:52.934] [CommittableSourceConsumerMain-akka.actor.default-dispatcher-5] [akka://CommittableSourceConsumerMain/system/kafka-consumer-1] Message [akka.kafka.KafkaConsumerActor$Internal$Stop$] from Actor[akka://CommittableSourceConsumerMain/deadLetters] to Actor[akka://CommittableSourceConsumerMain/system/kafka-consumer-1#-1726905274] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
This is the code I am executing:
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.clients.consumer.ConsumerConfig
import play.api.libs.json._
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
object CommittableSourceConsumerMain extends App {
  implicit val system = ActorSystem("CommittableSourceConsumerMain")
  implicit val materializer = ActorMaterializer()

  val consumerSettings = ConsumerSettings(system, new ByteArrayDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("CommittableSourceConsumer")
    .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  val done =
    Consumer.committableSource(consumerSettings, Subscriptions.topics("topic1"))
      .mapAsync(1) { msg =>
        val record = msg.record.value()
        val data = Json.parse(record)
        val recordType = data \ "data" \ "event" \ "type"
        val actualData = data \ "data" \ "row"
        if (recordType.as[String] == "created") {
          "Some saving logic"
        } else {
          "Some logic"
        }
        msg.committableOffset.commitScaladsl()
      }
      .runWith(Sink.ignore)
}

I finally figured out the solution. A runtime exception inside the stream completes the materialized Future with a failure, which terminates the stream immediately. Akka Streams does not log the underlying exception by default, so to see it you can hook into the materialized Future (this needs scala.util.control.NonFatal and an implicit ExecutionContext in scope):
done.onFailure {
  case NonFatal(e) => println(e)
}
In my case the exception was thrown in the if-else block. Alternatively, a supervision strategy can be used to resume the stream when an exception occurs.
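For example, a minimal sketch of a resuming strategy, assuming it replaces the plain ActorMaterializer() in the consumer above (Supervision.Resume drops the element that caused the failure and keeps the stream running):
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}
import scala.util.control.NonFatal

// Resume on non-fatal exceptions (the offending message is dropped), stop on anything else.
val decider: Supervision.Decider = {
  case NonFatal(_) => Supervision.Resume
  case _           => Supervision.Stop
}

implicit val materializer = ActorMaterializer(
  ActorMaterializerSettings(system).withSupervisionStrategy(decider)
)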

Related

param message.send.max.retries does not work in kafka producer

I have a Scala project built with sbt. I have a producer, and I am trying to retry messages when Kafka is unreachable.
package com.example
import akka.actor.ActorSystem
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.scaladsl.Source
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}
import org.apache.kafka.clients.producer.{ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.control.NonFatal
object producer extends App {
  private val decider: Supervision.Decider = {
    case NonFatal(ex) =>
      println("Non fatal exception in flow. Skip message and resuming flow.", ex)
      Supervision.Restart
    case ex: Throwable =>
      println("Other exception in flow. Stopping flow.", ex)
      Supervision.Stop
  }

  implicit val system = ActorSystem("QuickStart")
  private val strategy =
    ActorMaterializerSettings(system).withSupervisionStrategy(decider)
  implicit val materializer = ActorMaterializer(strategy)

  val config = system.settings.config.getConfig("akka.kafka.producer")
  val producerSettings =
    ProducerSettings(system, new StringSerializer, new StringSerializer)
      .withBootstrapServers("10.20.10.193:9092")
      .withProperty("message.send.max.retries", "3")
      .withProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "5000")
      //.withProperty(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG, "5000")

  val done =
    Source
      .single("11")
      .map(value => new ProducerRecord[String, String]("example", value))
      .runWith(Producer.plainSink(producerSettings))

  Await.result(done, 1000 seconds)
}
I defined the property:
.withProperty("message.send.max.retries", "3")
but it does not work. When I run the producer with a bad Kafka host, the output is:
[INFO ] - 2018-07-30 23:00:36,951 - suppression - akka.event.slf4j.Slf4jLogger - Slf4jLogger started
(Non fatal exception in flow. Skip message and resuming flow.,org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.)
(Non fatal exception in flow. Skip message and resuming flow.,org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.)
There are only two retries in the log instead of three.
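As a side note, message.send.max.retries is a setting of the old Scala producer; the Java producer wrapped by the Akka connector reads the retries and retry.backoff.ms settings instead, exposed as constants on ProducerConfig. A minimal sketch with those names (the values are placeholders, and whether this addresses the metadata TimeoutException above is a separate question):
val producerSettings =
  ProducerSettings(system, new StringSerializer, new StringSerializer)
    .withBootstrapServers("10.20.10.193:9092")
    .withProperty(ProducerConfig.RETRIES_CONFIG, "3")             // "retries"
    .withProperty(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000") // "retry.backoff.ms"
    .withProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "5000")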

How to make it pure?

I have the following Scala code:
import akka.Done
import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage.CommittableOffsetBatch
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import scala.concurrent.Future
object TestConsumer {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("KafkaConsumer")
    implicit val materializer = ActorMaterializer()

    val consumerSettings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("group1")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

    val result = Consumer
      .committableSource(consumerSettings, Subscriptions.topics("test"))
      .mapAsync(2)(rec => Future.successful(rec.record.value()))
      .runWith(Sink.foreach(ele => {
        print(ele)
        system.terminate()
      }))
  }
}
As you can see, the application consumes messages from Kafka and prints them to the shell. runWith is not pure: it produces side effects, printing the received message and shutting down the actor system. The question is, how can I make this pure with cats IO effects? Is it possible?
You don't need cats IO to make it pure. Note that your sink is already pure, because it's just the value that describes what will happen when it's used (in this case using means "connecting to the Source and running the stream").
val sink: Sink[String, Future[Done]] = Sink.foreach(ele => {
  print(ele)
  // system.terminate() // PROBLEM: terminating the system before the stream completes!
})
The problem you described has nothing to do with purity. The problem is that the sink above closes over the value of system, and then tries to terminate it when processing each element of the source.
Terminating the system means that you are destroying the whole runtime environment (used by ActorMaterializer) that is used to run the stream. This should only be done when your stream completes.
val result: Future[Done] = Consumer
  .committableSource(consumerSettings, Subscriptions.topics("test"))
  .mapAsync(2)(rec => Future.successful(rec.record.value()))
  .runWith(sink)

import system.dispatcher // ExecutionContext for the onComplete callback
result.onComplete(_ => system.terminate())
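If you do still want to put the whole thing behind cats-effect, a minimal sketch is to suspend the materialization in IO.fromFuture so that nothing runs until the IO itself is executed (this assumes cats-effect on the classpath; on cats-effect 2 an implicit ContextShift[IO] is also required):
import cats.effect.IO

// Nothing happens until this IO is run (e.g. from an IOApp):
// both the stream materialization and the system shutdown are suspended.
val program: IO[Done] =
  for {
    done <- IO.fromFuture(IO(
              Consumer
                .committableSource(consumerSettings, Subscriptions.topics("test"))
                .mapAsync(2)(rec => Future.successful(rec.record.value()))
                .runWith(sink)
            ))
    _ <- IO.fromFuture(IO(system.terminate()))
  } yield done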

Stopping Spark Streaming: exception in the cleaner thread but it will continue to run

I'm working on a Spark-Streaming application, I'm just trying to get a simple example of a Kafka Direct Stream working:
package com.username
import _root_.kafka.serializer.StringDecoder
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.kafka._
import org.apache.spark.streaming.{Seconds, StreamingContext}
object MyApp extends App {
  val topic = args(0)   // 1 topic
  val brokers = args(1) // localhost:9092

  val spark = SparkSession.builder().master("local[2]").getOrCreate()
  val sc = spark.sparkContext
  val ssc = new StreamingContext(sc, Seconds(1))

  val topicSet = topic.split(",").toSet
  val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
  val directKafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet)

  // Just print out the data within the topic
  val parsers = directKafkaStream.map(v => v)
  parsers.print()

  ssc.start()
  val endTime = System.currentTimeMillis() + (5 * 1000) // 5 second loop
  while (System.currentTimeMillis() < endTime) {
    // write something to the topic
    Thread.sleep(1000) // 1 second pause between iterations
  }
  ssc.stop()
}
This mostly works: whatever I write into the Kafka topic gets included in the streaming batch and printed out. My only concern is what happens at ssc.stop():
dd/mm/yy hh:mm:ss WARN FileSystem: exception in the cleaner thread but it will continue to run
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ReferenceQueue.remove(ReferenceQueue.java:143)
at java.lang.ReferenceQueue.remove(ReferenceQueue.java:164)
at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:2989)
at java.lang.Thread.run(Thread.java:748)
This exception doesn't cause my app to fail or exit, though. I know I could wrap ssc.stop() in a try/catch block to suppress it, but looking at the API docs leads me to believe this is not the intended behavior. I've been looking around online for a solution, but nothing involving Spark mentions this exception. Is there any way for me to properly fix this?
I encountered the same problem when starting the process directly with sbt run. But if I package the project and start it with YOUR_SPARK_PATH/bin/spark-submit --class [classname] --master local[4] [package_path], it works correctly. Hope this helps.
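Separately from the packaging workaround, if the worry is what ssc.stop() does to in-flight work, the StreamingContext API also offers a graceful stop. A minimal sketch follows; whether it silences the cleaner-thread warning depends on the Hadoop version, so treat it as an option to try rather than a guaranteed fix:
// Stop the StreamingContext and the underlying SparkContext,
// letting any in-flight batches finish instead of being interrupted.
ssc.stop(stopSparkContext = true, stopGracefully = true)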

Why is AssertionError: assertion failed: Executor has not been attached to this receiver?

I'm trying to build a simple spark streaming custom receiver where messages are stored directly in the spark stream in order to be processed. However I get:
java.lang.AssertionError: assertion failed: Executor has not been attached to this receiver
I'm integrating with a third-party Java library that generates method calls based on a socket it is listening to. By implementing an interface from that library, I plan to call the store method on the custom Spark receiver.
I've created a simple cut-down example that reproduces the error and does not reference the third-party Java library.
package com.custom.spark
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.storage.StorageLevel
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Seconds
object CustomSparkReceiver {
  def main(args: Array[String]) {
    // create stream with custom receiver
    val conf = new SparkConf().setMaster("local[*]").setAppName("CustomSparkReceiver")
    val ssc = new StreamingContext(conf, Seconds(1))
    val customReceiver = new CustomReceiver
    val stream = ssc.receiverStream(customReceiver)

    // print values in spark stream
    stream.print()

    // start pushing data into stream
    ssc.start()
    Thread.sleep(1000)
    List.range(1, 10)
      .foreach { number => customReceiver.store(number) }
    Thread.sleep(1000)
    ssc.stop()
  }

  class CustomReceiver extends Receiver[Integer](StorageLevel.MEMORY_AND_DISK_2) {
    def onStart() = {}
    def onStop() = {}
  }
}
I think it is some sort of threading issue, but I'm not sure how to fix it. Any pointers on the above would be great.
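For comparison, the custom receiver examples in the Spark Streaming programming guide push data from a thread started inside onStart, so store is only called once the receiver has been attached to an executor. A minimal sketch of that pattern (the counter loop is just a hypothetical stand-in for the third-party callbacks):
class ThreadedReceiver extends Receiver[Integer](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // The thread is owned by the receiver, so it only runs after Spark has
    // attached the receiver to an executor and store() is safe to call.
    new Thread("ThreadedReceiver") {
      override def run(): Unit = {
        var i = 1
        while (!isStopped()) {
          store(i) // hypothetical data: a simple counter
          i += 1
          Thread.sleep(100)
        }
      }
    }.start()
  }

  def onStop(): Unit = {} // the loop above exits once isStopped() returns true
}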

Deadletters encountered when communicating between spark clusters with akka actor

Since spark is built on top of Akka, I want to use Akka to send and receive messages between spark clusters.
According to this tutorial, https://github.com/jaceklaskowski/spark-activator/blob/master/src/main/scala/StreamingApp.scala, I can run StreamingApp locally and send messages to the actorStream itself.
Then I tried to attach the sender part to another of my Spark masters and send messages from that Spark master to the remote actor in StreamingApp. The code is as follows:
import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

object SenderApp extends Serializable {
  def main(args: Array[String]) {
    val driverPort = 12345
    val driverHost = "xxxx"
    val conf = new SparkConf(false)
      .setMaster("spark://localhost:8888") // Connecting to my spark master
      .setAppName("Spark Akka Streaming Sender")
      .set("spark.logConf", "true")
      .set("spark.akka.logLifecycleEvents", "true")

    val actorName = "helloer"
    val sc = new SparkContext(conf)
    val actorSystem = SparkEnv.get.actorSystem

    val url = s"akka.tcp://sparkDriver@$driverHost:$driverPort/user/Supervisor0/$actorName"
    val helloer = actorSystem.actorSelection(url)

    helloer ! "Hello"
    helloer ! "from"
    helloer ! "Spark Streaming"
    helloer ! "with"
    helloer ! "Scala"
    helloer ! "and"
    helloer ! "Akka"
  }
}
Then I got messages from StreamingApp saying it encountered DeadLetters.
The detailed messages are:
INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkDriver/deadLetters] to Actor[akka://sparkDriver/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkDriver%40111.22.33.444%3A56840-4#-2094758237] was not delivered. [5] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
According to this article:
http://typesafe.com/activator/template/spark-streaming-scala-akka
I changed the helloer as follows, and it works now. Resolving the selection to a concrete ActorRef (and awaiting the result) ensures the remote actor is actually reachable before the messages are sent:
import scala.concurrent.Await
import scala.concurrent.duration._

val timeout = 100 seconds
val helloer = Await.result(
  actorSystem.actorSelection(url).resolveOne(timeout),
  timeout)