Alpakka Kafka Stream not restarting after connection errors despite using RestartSource

I have a simple committable Kafka source wrapped in a RestartSource. It works fine on the happy path, but if I deliberately sever the connection to the Kafka cluster, the underlying Kafka client throws a connection exception and reports that the Kafka consumer has shut down. My expectation was that the stream would restart after ~150 seconds, but it doesn't. Is my understanding/usage of RestartSource below incorrect?
val atomicControl = new AtomicReference[Consumer.Control](NoopControl)

val restartablekafkaSourceWithFlow = {
  RestartSource.withBackoff(30.seconds, 120.seconds, 0.2) { () =>
    Consumer.committableSource(consumerSettings.withClientId("clientId"), Subscriptions.topics(Set("someTopic")))
      .mapMaterializedValue(c => atomicControl.set(c))
      .via(someFlow)
      .via(httpFlow)
  }
}

val committerSink: Sink[(Any, ConsumerMessage.CommittableOffset), Future[Done]] =
  Committer.sinkWithOffsetContext(CommitterSettings(actorSystem))

val runnableGraph = restartablekafkaSourceWithFlow.toMat(committerSink)(Keep.both)

val control = runnableGraph
  .mapMaterializedValue(x => Consumer.DrainingControl.apply(atomicControl.get, x._2))
  .run()

Maybe you are getting the error outside of the RestartSource (for example in the committer sink, which is not wrapped by it).
You can add a recover stage to see the error, and/or create a decider like the one below and use it on the runnableGraph.
import akka.stream.ActorAttributes.supervisionStrategy

private val decider: Supervision.Decider = { e =>
  logger.error("Unhandled exception in stream.", e)
  Supervision.Resume
}

runnableGraph.withAttributes(supervisionStrategy(decider))
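For instance, a minimal sketch (reusing consumerSettings, atomicControl, someFlow and httpFlow from the question) that logs failures inside the restartable section without swallowing them, so the backoff and restart behaviour is preserved:

RestartSource.withBackoff(30.seconds, 120.seconds, 0.2) { () =>
  Consumer
    .committableSource(consumerSettings.withClientId("clientId"), Subscriptions.topics(Set("someTopic")))
    .mapMaterializedValue(c => atomicControl.set(c))
    .via(someFlow)
    .via(httpFlow)
    // log the failure and pass it on unchanged, so RestartSource still restarts with backoff
    .mapError { case e =>
      logger.error("Restartable section failed", e)
      e
    }
}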

Related

Akka streams websocket stream things to a Sink.seq ends with exception SubscriptionWithCancelException$StageWasCompleted

I'm failing to materialize the Sink.seq; when it comes time to materialize, it fails with this exception:
akka.stream.SubscriptionWithCancelException$StageWasCompleted$:
Here is the full source code on github: https://github.com/Christewart/bitcoin-s-core/blob/aaecc7c180e5cc36ec46d73d6b2b0b0da87ab260/app/server-test/src/test/scala/org/bitcoins/server/WebsocketTests.scala#L51
I'm attempting to aggregate all elements being pushed out of a websocket into a Sink.seq. I have to do a bit of JSON transformation before I aggregate things inside of the Sink.seq.
val endSink: Sink[WalletNotification[_], Future[Seq[WalletNotification[_]]]] =
  Sink.seq[WalletNotification[_]]

val sink: Sink[Message, Future[Seq[WalletNotification[_]]]] = Flow[Message]
  .map {
    case message: TextMessage.Strict =>
      //we should be able to parse the address message
      val text = message.text
      val notification: WalletNotification[_] = {
        upickle.default.read[WalletNotification[_]](text)(
          WsPicklers.walletNotificationPickler)
      }
      logger.info(s"Notification=$notification")
      notification
    case msg =>
      logger.error(s"msg=$msg")
      sys.error("")
  }
  .log(s"### endSink ###")
  .toMat(endSink)(Keep.right)
val f: Flow[
  Message,
  Message,
  (Future[Seq[WalletNotification[_]]], Promise[Option[Message]])] = {
  Flow
    .fromSinkAndSourceMat(sink, Source.maybe[Message])(Keep.both)
}

val tuple: (
    Future[WebSocketUpgradeResponse],
    (Future[Seq[WalletNotification[_]]], Promise[Option[Message]])) = {
  Http()
    .singleWebSocketRequest(req, f)
}
val walletNotificationsF: Future[Seq[WalletNotification[_]]] =
  tuple._2._1
val promise: Promise[Option[Message]] = tuple._2._2

logger.info(s"Requesting new address for expectedAddrStr")

val expectedAddressStr = ConsoleCli
  .exec(CliCommand.GetNewAddress(labelOpt = None), cliConfig)
  .get
val expectedAddress = BitcoinAddress.fromString(expectedAddressStr)

promise.success(None)

logger.info(s"before notificationsF")

//hangs here, as the future never gets completed, fails with an exception
for {
  notifications <- walletNotificationsF
  _ = logger.info(s"after notificationsF")
} yield {
  //assertions in here...
}
What am I doing wrong?
To keep the client connection open you need "more code", something like this:
val sourceKickOff = Source
  .single(TextMessage("kick off msg"))
  // Keeps the connection open
  .concatMat(Source.maybe[Message])(Keep.right)
See the full working example, which consumes messages from a server:
https://github.com/pbernet/akka_streams_tutorial/blob/b6d4c89a14bdc5d72c557d8cede59985ca8e525f/src/main/scala/akkahttp/WebsocketEcho.scala#L280
The problem is this line:
Flow.fromSinkAndSourceMat(sink, Source.maybe[Message])(Keep.both)
It needs to be:
Flow.fromSinkAndSourceCoupledMat(sink, Source.maybe[Message])(Keep.both)
When the stream is terminated, the coupled version of the materialized flow makes sure the Sink is terminated as well.
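Plugged into the question's code, only the factory method changes:

val f: Flow[
  Message,
  Message,
  (Future[Seq[WalletNotification[_]]], Promise[Option[Message]])] =
  // Coupled: when the Source.maybe side completes, the sink side is completed
  // too, so Sink.seq's Future[Seq[...]] can finish.
  Flow.fromSinkAndSourceCoupledMat(sink, Source.maybe[Message])(Keep.both)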

Kafka Ensure At Least Once

First project with Kafka, trying to prove that an event will get processed at least once. So far, not seeing evidence that processing is retried.
Structure of dummy app is simple: subscribe, process, publish, commit; if exception, abort transaction and hope it gets retried. I am logging every message.
I expect to see (1) "process messageX", (2) "error for messageX", (3) "process messageX" again. Instead, processing continues past messageX, i.e. it does not get re-processed: what I actually see is (1) "process messageX", (2) "error for messageX", (3) "process someOtherMessage".
Using Kafka 2.7.0, Scala 2.12.
What am I missing? Showing relevant parts of dummy app below.
I also tried by removing the producer from the code (and all references to it).
UPDATE 1: I managed to get records re-processed by using the offsets with consumer.seek(), i.e. sending the consumer back to the start of the batch of records. Not sure why simply NOT reaching consumer.commitSync() (because of an exception) does not do this already.
import com.myco.somepackage.{MyEvent, KafkaConfigTxn}
import org.apache.kafka.clients.consumer.{ConsumerRecords, KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.common.{KafkaException, TopicPartition}
import org.slf4j.LoggerFactory
import java.util
import scala.collection.JavaConverters._
import scala.util.control.NonFatal

// Prove that a message can be re-processed if there is an exception
object TopicDrainApp {
  private val logger = LoggerFactory.getLogger(this.getClass)
  private val subTopic = "input.topic"
  private val pubTopic = "output.topic"

  val producer = new KafkaProducer[String, String](KafkaConfigTxn.producerProps)
  producer.initTransactions()
  val consumer = new KafkaConsumer[String, String](KafkaConfigTxn.consumerProps)

  private var lastEventMillis = System.currentTimeMillis
  private val pollIntervalMillis = 1000
  private val pollDuration = java.time.Duration.ofMillis(pollIntervalMillis)

  def main(args: Array[String]): Unit = {
    subscribe(subTopic)
  }

  def subscribe(subTopic: String): Unit = {
    consumer.subscribe(util.Arrays.asList(subTopic))
    while (System.currentTimeMillis - lastEventMillis < 5000L) {
      try {
        val records: ConsumerRecords[String, String] = consumer.poll(pollDuration)
        records.asScala.foreach { record =>
          try {
            lastEventMillis = System.currentTimeMillis
            val event = MyEvent.deserialize(record.value())
            logger.info("ReceivedMyEvent:" + record.value())
            producer.beginTransaction()
            simulateProcessing(event) // [not shown] throw exception to test re-processing
            producer.flush()
            val offsetsToCommit = getOffsetsToCommit(records)
            //consumer.commitSync() // tried this; does not work
            //producer.sendOffsetsToTransaction(offsetsToCommit, "group1") // tried this; does not work
            producer.commitTransaction()
          } catch {
            case e: KafkaException =>
              logger.error(s"rollback ${record.value()}", e)
              producer.abortTransaction()
          }
        }
      } catch {
        case NonFatal(e) => logger.error(e.getMessage, e)
      }
    }
  }

  private def getOffsetsToCommit(records: ConsumerRecords[String, String]): util.Map[TopicPartition, OffsetAndMetadata] = {
    records.partitions().asScala.map { partition =>
      val partitionedRecords = records.records(partition)
      val offset = partitionedRecords.get(partitionedRecords.size - 1).offset
      (partition, new OffsetAndMetadata(offset + 1))
    }.toMap.asJava
  }
}
import java.util.Properties
import org.apache.kafka.clients.CommonClientConfigs
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.producer.ProducerConfig
import scala.collection.JavaConverters._

object KafkaConfigTxn {
  // Only relevant properties are shown
  def commonProperties: Properties = {
    val props = new Properties()
    props.put(CommonClientConfigs.CLIENT_ID_CONFIG, "...")
    props.put(CommonClientConfigs.GROUP_ID_CONFIG, "...")
    props
  }

  def producerProps: Properties = {
    val props = new Properties()
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true") // "enable.idempotence"
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "...") // "transactional.id"
    props.put(ProducerConfig.ACKS_CONFIG, "all")
    props.put(ProducerConfig.RETRIES_CONFIG, "3")
    commonProperties.asScala.foreach { case (k, v) => props.put(k, v) }
    props
  }

  def consumerProps: Properties = {
    val props = new Properties()
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
    props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed") // "isolation.level"
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    commonProperties.asScala.foreach { case (k, v) => props.put(k, v) }
    props
  }
}
According to the reference I gave you, you need to use sendOffsetsToTransaction in the process, but again your consumer won't get the messages of the aborted transaction, as you are reading only committed transactions.
Transactions were introduced in order to allow exactly-once processing from Kafka to Kafka; that said, Kafka has supported the delivery semantics of at least once and at most once from day one.
To get at-least-once behavior you disable auto commit and commit only when processing has finished successfully. That way, if you hit an exception before the commit, the next time you call poll() you will read the records again from the last committed offset (see the sketch after this answer).
To get at-most-once behavior you commit before processing starts. That way, if an exception happens, the next time you call poll() you get new messages (but you lose the failed ones).
Exactly once is the hardest to achieve in plain Java (not talking about the Spring framework, which makes everything easier) - it involves saving offsets to an external DB (usually where your processing is done) and reading them back from there on startup/rebalance.
For a transaction usage example in Java you might read this excellent guide by Baeldung:
https://www.baeldung.com/kafka-exactly-once
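As a minimal illustration of the at-least-once pattern described above (auto commit disabled, commit only after successful processing), here is a hedged sketch with the plain consumer API; process() is a hypothetical stand-in for your processing step:

import java.time.Duration
import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import scala.collection.JavaConverters._

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1")
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false") // manual commits only
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Arrays.asList("input.topic"))

while (true) {
  val records = consumer.poll(Duration.ofMillis(1000))
  records.asScala.foreach(record => process(record.value())) // process() is hypothetical and may throw
  // Reached only if the whole batch was processed. If process() threw, the
  // offset stays uncommitted, so after a restart, rebalance or an explicit
  // seek() (as in the UPDATE above) the records are delivered again.
  consumer.commitSync()
}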
Figured out the correct combination of method calls (subscribe, beginTransaction, process, commit/abortTransaction, etc.) for a demo app. The core of the code is:
def readProcessWrite(subTopic: String, pubTopic: String): Int = {
  var lastEventMillis = System.currentTimeMillis
  val consumer = createConsumer(subTopic)
  val producer = createProducer()
  val groupMetadata = consumer.groupMetadata()
  var numRecords = 0

  while (System.currentTimeMillis - lastEventMillis < 10000L) {
    try {
      val records: ConsumerRecords[String, String] = consumer.poll(pollDuration)
      val offsetsToCommit = getOffsetsToCommit(records)
      // println(s">>> PollRecords: ${records.count()}")
      records.asScala.foreach { record =>
        val currentOffset = record.offset()
        try {
          numRecords += 1
          lastEventMillis = System.currentTimeMillis
          println(s">>> Topic: $subTopic, ReceivedEvent: offset=${record.offset()}, key=${record.key()}, value=${record.value()}")
          producer.beginTransaction()
          val eventOut = simulateProcessing(record.value()) // may throw
          publish(producer, pubTopic, eventOut)
          producer.sendOffsetsToTransaction(offsetsToCommit, groupMetadata)
          consumer.commitSync()
          producer.commitTransaction()
        } catch {
          case e: KafkaException =>
            println(s"---------- rollback ${record.value()}: $e")
            producer.abortTransaction()
            offsetsToCommit.forEach { case (topicPartition, _) =>
              consumer.seek(topicPartition, currentOffset)
            }
        }
      }
    } catch {
      case NonFatal(e) => logger.error(e.getMessage, e)
    }
  }

  consumer.close()
  producer.close()
  numRecords
}
// Consumer created with props.put("max.poll.records", "1")
I was able to prove that this will process each event exactly once, even when simulateProcessing() throws an exception. To be precise: when processing works fine, each event is processed exactly once. If there is an exception, the event is re-processed until success. In my case, there is no real reason for the exceptions, so re-processing will always end in success.

Akka RestartSource does not restart

import akka.actor.ActorSystem
import akka.stream.scaladsl.{RestartSource, Sink, Source}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object TestSource {
  implicit val ec = ExecutionContext.global

  def main(args: Array[String]): Unit = {
    def buildSource = {
      println("fresh")
      Source(List(() => 1, () => 2, () => 3, () => {
        println("crash")
        throw new RuntimeException(":(((")
      }))
    }

    val restarting = RestartSource.onFailuresWithBackoff(
      minBackoff = Duration(1, SECONDS),
      maxBackoff = Duration(1, SECONDS),
      randomFactor = 0.0,
      maxRestarts = 10
    )(() => {
      buildSource
    })

    implicit val actorSystem: ActorSystem = ActorSystem()
    implicit val executionContext: ExecutionContext = actorSystem.dispatcher

    restarting.runWith(Sink.foreach(e => println(e())))
  }
}
The code above prints 1, 2, 3, crash.
Why does my source not restart?
This is pretty much a 1:1 copy of the official documentation.
Edit:
I also tried:
val rs = RestartSink.withBackoff[() => Int](
  Duration(1, SECONDS),
  Duration(1, SECONDS),
  0.0,
  10
)(_)

val rsDone = rs(() => {
  println("???")
  Sink.foreach(e => println(e()))
})

restarting.runWith(rsDone)
but I still get no restarts.
This is because the exception is triggered outside of the buildSource Source, in the Sink.foreach, when you call the functions emitted by the Source.
Try this:
val restarting = RestartSource.onFailuresWithBackoff(
  minBackoff = Duration(1, SECONDS),
  maxBackoff = Duration(1, SECONDS),
  randomFactor = 0.0,
  maxRestarts = 10
)(() => {
  buildSource
    .map(e => e()) //call the functions inside the RestartSource
})
That way your exception will happen inside the inner Source wrapped by RestartSource and the restarting mechanism will kick in.
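With that change you should see "fresh", then 1, 2, 3 and "crash" printed; after the one-second backoff the factory is invoked again (printing "fresh" once more) and the cycle repeats, up to the configured maxRestarts of 10.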
The source doesn't restart because your source never fails, and therefore never needs to restart.
The exception gets thrown when Sink.foreach evaluates the function it received.
As artur noted, if you can move the failing bit into the source, you can wrap everything up to the sink in the RestartSource.
While it won't help with this contrived example (restarting a sink doesn't resend previously sent messages), wrapping the sink in a RestartSink may be useful in real-world cases where this sort of thing can happen. Off the top of my head, a stream from Kafka blowing up because the offset commit in a sink failed (e.g. after a rebalance) should be an example of such a case.
Alternatively, if you want to restart the whole stream when any part of it fails and the stream materializes a Future, you can implement retry-with-backoff on the failed future (see the sketch below).
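For example, here is a hedged sketch of that last approach using akka.pattern.retry; runStream() is a hypothetical method (not from the code above) that materializes the whole stream and returns its completion future:

import akka.Done
import akka.actor.ActorSystem
import akka.pattern.retry
import scala.concurrent.Future
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem()
import system.dispatcher
implicit val scheduler: akka.actor.Scheduler = system.scheduler

// Hypothetical: runs the stream once and completes when it finishes or fails.
def runStream(): Future[Done] = ???

// Re-run the whole stream up to 10 times, waiting 1 second between attempts
// (fixed delay here; newer Akka versions also offer a min/max backoff overload).
val eventuallyDone: Future[Done] =
  retry(() => runStream(), attempts = 10, delay = 1.second)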
The source just never crashes, as already said here.
You are actually crashing your sink, not the source, with this statement: e => e()
This happens when the lambda above is applied to the last element of the source:
java.lang.RuntimeException: :(((
Here's the same stream without an unhandled exception in the sink:
...
RestartSource.withBackoff(
...
restarting.runWith(
  Sink.foreach(e => {
    def i: Int = try { e() } catch {
      case t: Throwable =>
        println(t)
        -1
    }
    println(i)
  })
)
Works perfectly.

Azure Databricks Concurrent Job -Avoid Consuming the same eventhub messages in all Jobs

Please help us implement partitioning/grouping when receiving Event Hubs messages in Azure Databricks concurrent jobs, and the right approach to consume Event Hubs messages in a concurrent job.
We created 3 concurrent jobs in Azure Databricks, uploading the consumer code written in Scala as JAR files, and all 3 concurrent jobs receive the same messages. To overcome this we tried to consume the events by partition, but we still receive the same messages from all 3 partitions.
We also tried sending messages based on a partition key, and tried creating consumer groups in Event Hubs, but we still receive the same messages in all the groups. We are not sure how to handle the Event Hubs messages in the concurrent jobs.
EventHub Configuration:
The number of partitions is 3 and message retention is 1.
EventHub Producer: Sending messages to Eventhub using .NET (C#) is working fine.
EventHub Consumer: Able to receive messages through Scala Program without any issues.
Producer C# Code:
string eventHubName = ConfigurationManager.AppSettings["eventHubname"];
string connectionString = ConfigurationManager.AppSettings["eventHubconnectionstring"];
eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);
for (var i = 0; i < 100; i++)
{
var sender = "event hub message 1" + i;
var data = new EventData(Encoding.UTF8.GetBytes(sender));
Console.WriteLine($"Sending message: {sender}");
eventHubClient.SendAsync(data);
}
eventHubClient.CloseAsync();
Console.WriteLine("Press ENTER to exit.");
Console.ReadLine();
Consumer Scala Code:
object ReadEvents {
val spark = SparkSession.builder()
.appName("eventhub")
.getOrCreate()
val sc = spark.sparkContext
val ssc = new StreamingContext(sc, Seconds(5))
def main(args : Array[String]) : Unit = {
val connectionString = ConnectionStringBuilder("ConnectionString").setEventHubName("eventhub1").build
val positions = Map(new NameAndPartition("eventhub1", 0) -> EventPosition.fromStartOfStream)
val position2 = Map(new NameAndPartition("eventhub1", 1) -> EventPosition.fromEnqueuedTime(Instant.now()))
val position3 = Map(new NameAndPartition("eventhub1", 2) -> EventPosition.fromEnqueuedTime(Instant.now()))
val ehConf = EventHubsConf(connectionString).setStartingPositions(positions)
val ehConf2 = EventHubsConf(connectionString).setStartingPositions(position2)
val ehConf3 = EventHubsConf(connectionString).setStartingPositions(position3)
val stream = org.apache.spark.eventhubs.EventHubsUtils.createDirectStream(ssc, ehConf)
println("Before the loop")
stream.foreachRDD(rdd => {
rdd.collect().foreach(rec => {
println(String.format("Message is first stream ===>: %s", new String(rec.getBytes(), Charset.defaultCharset())))
})
})
val stream2 = org.apache.spark.eventhubs.EventHubsUtils.createDirectStream(ssc, ehConf2)
stream2.foreachRDD(rdd2 => {
rdd2.collect().foreach(rec2 => {
println(String.format("Message second stream is ===>: %s", new String(rec2.getBytes(), Charset.defaultCharset())))
})
})
val stream3 = org.apache.spark.eventhubs.EventHubsUtils.createDirectStream(ssc, ehConf)
stream3.foreachRDD(rdd3 => {
println("Inside 3rd stream foreach loop")
rdd3.collect().foreach(rec3 => {
println(String.format("Message is third stream ===>: %s", new String(rec3.getBytes(), Charset.defaultCharset())))
})
})
ssc.start()
ssc.awaitTermination()
}
}
We expect the Event Hubs messages to be partitioned properly when they are received by the concurrent jobs running the Scala program.
The code below helps to iterate through all partitions:
eventHubsStream.foreachRDD { rdd =>
  rdd.foreach { message =>
    if (message != null) {
      callYouMethod(new String(message.getBytes()))
    }
  }
}

Kafka topic to websocket

I am trying to implement a setup where I have multiple web browsers open a websocket connection to my akka-http server in order to read all messages posted to a kafka topic.
So the stream of messages should go this way:
kafka topic -> akka-http -> websocket connection 1
-> websocket connection 2
-> websocket connection 3
For now I have created a path for the websocket:
val route: Route =
  path("ws") {
    handleWebSocketMessages(notificationWs)
  }
Then I have created a consumer for my kafka topic:
val consumerSettings = ConsumerSettings(system, new ByteArrayDeserializer, new StringDeserializer)
  .withBootstrapServers("localhost:9092")
  .withGroupId("group1")
  .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val source = Consumer
  .plainSource(consumerSettings, Subscriptions.topics("topic1"))
And then finally I want to connect this source to the websocket in handleWebSocketMessages
def handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow[Message].mapConcat {
    case tm: TextMessage =>
      TextMessage(source) :: Nil
    case bm: BinaryMessage =>
      // ignore binary messages but drain content to avoid the stream being clogged
      bm.dataStream.runWith(Sink.ignore)
      Nil
  }
Here is the error I get when I try to use source in the TextMessage:
Error:(77, 9) overloaded method value apply with alternatives:
(textStream: akka.stream.scaladsl.Source[String,Any])akka.http.scaladsl.model.ws.TextMessage
(text: String)akka.http.scaladsl.model.ws.TextMessage.Strict
cannot be applied to (akka.stream.scaladsl.Source[org.apache.kafka.clients.consumer.ConsumerRecord[Array[Byte],String],akka.kafka.scaladsl.Consumer.Control])
TextMessage(source)::Nil
I think I'm making numerous mistakes along the way but I would say that the most blocking part is the handleWebSocketMessages.
The first thing is to understand that source is of type Source[ConsumerRecord[K, V], Control].
So it's not something you can pass as an argument to a TextMessage.
Now, let's take the websocket's point of view:
An outgoing message is built for each message in the Kafka source. The message will be a TextMessage from a String transformation of the Kafka message.
For each incoming message, just println() it.
So the Flow can be seen as two components: the Source and the Sink.
val incomingMessages: Sink[Message, Future[Done]] =
  Sink.foreach(println(_))

val outgoingMessages: Source[Message, Consumer.Control] =
  source
    .map { consumerRecord => TextMessage(consumerRecord.value) }

val handleWebSocketMessages: Flow[Message, Message, Any] =
  Flow.fromSinkAndSource(incomingMessages, outgoingMessages)
Hope it helps.
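One caveat the answer above does not cover: handleWebSocketMessages materializes the Kafka source once per websocket connection, and with a shared group id each connection would only see a subset of a multi-partition topic. If every browser should receive all messages, a hedged sketch of one option is to run the consumer once and fan it out with a BroadcastHub (assumes an implicit Materializer / ActorSystem in scope):

import akka.NotUsed
import akka.http.scaladsl.model.ws.{Message, TextMessage}
import akka.stream.scaladsl.{BroadcastHub, Flow, Keep, Sink, Source}

// Run the Kafka consumer once; every websocket connection attaches to the hub
// and receives all messages produced by this single consumer.
val broadcastSource: Source[Message, NotUsed] =
  source
    .map(record => TextMessage(record.value))
    .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)
    .run()

def handleWebSocketMessages: Flow[Message, Message, Any] =
  // incoming client messages are ignored; outgoing messages come from the hub
  Flow.fromSinkAndSource(Sink.ignore, broadcastSource)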