Can I use a ZSink to commit offsets in Zio-Kafka? - scala

I am learning ZIO's integration with Apache Kafka, using the zio-kafka library. In the example on the project's GitHub main page, they use the mapM function to commit the offsets of each batch:
Consumer.subscribeAnd(Subscription.topics("topic150"))
.plainStream(Serde.string, Serde.string)
.tap(cr => putStrLn(s"key: ${cr.record.key}, value: ${cr.record.value}"))
.map(_.offset)
.aggregateAsync(Consumer.offsetBatches)
.mapM(_.commit)
.runDrain
However, in my opinion, committing the offset is a terminal operation on the stream. What is the difference if I use a ZSink instead?
Consumer.subscribeAnd(Subscription.topics("topic150"))
.plainStream(Serde.string, Serde.string)
.tap(cr => putStrLn(s"key: ${cr.record.key}, value: ${cr.record.value}"))
.map(_.offset)
.aggregateAsync(Consumer.offsetBatches)
.run(ZSink.foreach(_.commit))
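One way to compare the two forms is to run both against the same effect and look at what happens. runDrain simply consumes the stream and discards its elements, while ZSink.foreach runs the effect for each element inside the sink. The following is a minimal sketch assuming ZIO 1.0.x (where mapM, runDrain and ZSink.foreach exist). To keep it runnable without a broker, the Kafka offset batches are replaced by plain integers, and _.commit is replaced by a hypothetical commit function that records each batch in a Ref.

```scala
import zio._
import zio.stream._

object CommitStyles {
  // Stand-in for the offset batches: plain integers instead of OffsetBatch values.
  val batches: ZStream[Any, Nothing, Int] = ZStream(1, 2, 3)

  // Variant 1: run the effect per element with mapM, then drain the stream.
  val viaMapM: UIO[List[Int]] =
    for {
      log <- Ref.make(List.empty[Int])
      commit = (b: Int) => log.update(b :: _) // hypothetical stand-in for `_.commit`
      _   <- batches.mapM(commit).runDrain
      out <- log.get
    } yield out.reverse

  // Variant 2: run the same effect inside ZSink.foreach.
  val viaSink: UIO[List[Int]] =
    for {
      log <- Ref.make(List.empty[Int])
      commit = (b: Int) => log.update(b :: _) // hypothetical stand-in for `_.commit`
      _   <- batches.run(ZSink.foreach(commit))
      out <- log.get
    } yield out.reverse
}

object CommitStylesApp extends zio.App {
  def run(args: List[String]): URIO[ZEnv, ExitCode] =
    CommitStyles.viaMapM
      .zip(CommitStyles.viaSink)
      .flatMap { case (a, b) => console.putStrLn(s"mapM: $a, sink: $b") }
      .exitCode
}
```

In this sketch both variants record the same batches in the same order, one at a time. Based on this, the choice between them looks largely stylistic: it comes down to whether you want the commit to appear as a stream stage or as the sink that terminates the stream.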

Related

Getting "java.lang.IllegalStateException: Tried to lookup lag for unknown task 3_0" after upgrading Kafka Stream from 2.5.1 to 2.6.2

I just upgraded our Kafka Stream application from 2.5.1 to 2.6.2. It used to work, now it doesn't.
Here is the troublesome topology (I have omitted the irrelevant Serdes):
val builder = new StreamsBuilder()
val contractEventStream: KStream[TariffId, ContractEvent] =
builder.stream[String, ContractUpsertAvro](settings.contractsTopicName)
.flatMap { (_, contractAvro) =>
ContractEvent.from(contractAvro)
.map(contractEvent => (contractEvent.tariffId, contractEvent))
}
val tariffsTable: KTable[TariffId, Tariff] =
builder.stream[String, TariffUpdateEventAvro](settings.tariffTopicName)
.flatMapValues(Tariff.fromAvro(_))
.selectKey((_, tariff) => tariff.tariffId)
.toTable(Materialized.`with`(tariffIdSerde, tariffSerde)) // Materialized.as also throws the same IllegalStateExceptions
contractEventStream
.join(tariffsTable)(JourneyStep.from(_, _).asInstanceOf[ContractCreated])(Joined.`with`(tariffIdSerde, contractEventSerde, tariffSerde))
.selectKey((_, contractUpdated) => contractUpdated.accountId)
.foreach((_, journeyStep) => println(journeyStep))
The join gives the following exception:
java.lang.IllegalStateException: Tried to lookup lag for unknown task 3_0
at org.apache.kafka.streams.processor.internals.assignment.ClientState.lagFor(ClientState.java:306)
at java.util.Comparator.lambda$comparingLong$6043328a$1(Comparator.java:511)
at java.util.Comparator.lambda$thenComparing$36697e65$1(Comparator.java:216)
at java.util.TreeMap.compare(TreeMap.java:1295)
at java.util.TreeMap.put(TreeMap.java:538)
at java.util.TreeSet.add(TreeSet.java:255)
at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
at java.util.TreeSet.addAll(TreeSet.java:312)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.getPreviousTasksByLag(StreamsPartitionAssignor.java:1275)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assignTasksToThreads(StreamsPartitionAssignor.java:1189)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.computeNewAssignment(StreamsPartitionAssignor.java:940)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:399)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:589)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:684)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1000(AbstractCoordinator.java:111)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:597)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:560)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1160)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1135)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1296)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
I can't see what I am doing wrong. The code above works with Kafka 2.5.1. Does anyone have an idea what is going on?
The problem is caused by the Kafka Streams cache, that is, the local state Kafka Streams keeps on disk. This state is specific to the Kafka version and to the Kafka Streams topology you use (i.e. a change in your topology could also lead to this error).
The state is usually found in /tmp, or elsewhere if you passed the "state.dir" property to Kafka Streams. Clear that directory and you should be able to start cleanly again.
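Instead of deleting the directory by hand, you can ask Kafka Streams to do it. KafkaStreams#cleanUp() deletes the local state directory for the configured application.id, and it may only be called before start() or after close(). The following is a sketch assuming the Topology comes from the question's builder via builder.build(). The application id, bootstrap servers and state directory are placeholders, and CleanStart is a hypothetical helper name.

```scala
import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig, Topology}

object CleanStart {
  // Placeholder settings: application id, brokers and directory are assumptions.
  def streamsProps(stateDir: String): Properties = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "contract-journey-app")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // Directory where Kafka Streams keeps its local state ("state.dir").
    props.put(StreamsConfig.STATE_DIR_CONFIG, stateDir)
    props
  }

  // Wipes this application's local state, then starts the topology.
  def startClean(topology: Topology, props: Properties): KafkaStreams = {
    val streams = new KafkaStreams(topology, props)
    // cleanUp() is only allowed before start() or after close().
    streams.cleanUp()
    streams.start()
    sys.addShutdownHook(streams.close())
    streams
  }
}
```

Usage would be something like CleanStart.startClean(builder.build(), CleanStart.streamsProps("/var/lib/kafka-streams")). Calling cleanUp() on every start forces each restart to rebuild its state from the changelog topics. You may therefore prefer to run it only once after an upgrade or topology change, for example behind a flag.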

node-rdkafka - debug set to all but I only see broker transport failure

I am trying to connect to a Kafka server. Authentication is based on GSSAPI (Kerberos).
/opt/app-root/src/server/node_modules/node-rdkafka/lib/error.js:411
return new LibrdKafkaError(e);
^
Error: broker transport failure
at Function.createLibrdkafkaError (/opt/app-root/src/server/node_modules/node-rdkafka/lib/error.js:411:10)
at /opt/app-root/src/server/node_modules/node-rdkafka/lib/client.js:350:28
This is my test_kafka.js:
const Kafka = require('node-rdkafka');
const kafkaConf = {
'group.id': 'espdev2',
'enable.auto.commit': true,
'metadata.broker.list': 'br01',
'security.protocol': 'SASL_SSL',
'sasl.kerberos.service.name': 'kafka',
'sasl.kerberos.keytab': 'svc_esp_kafka_nonprod.keytab',
'sasl.kerberos.principal': 'svc_esp_kafka_nonprod#INT.LOCAL',
'debug': 'all',
'enable.ssl.certificate.verification': true,
//'ssl.certificate.location': 'some-root-ca.cer',
'ssl.ca.location': 'some-root-ca.cer',
//'ssl.key.location': 'svc_esp_kafka_nonprod.keytab',
};
const topics = 'hello1';
console.log(Kafka.features);
const readStream = Kafka.KafkaConsumer.createReadStream(kafkaConf, { "auto.offset.reset": "earliest" }, { topics });
readStream.on('data', function (message) {
const messageString = message.value.toString();
console.log(`Consumed message on Stream: ${messageString}`);
});
You can look at this issue for the explanation of this error:
https://github.com/edenhill/librdkafka/issues/1987
Quoting @edenhill from that issue:
As a general rule for librdkafka-based clients: given that the cluster and client are correctly configured, all errors can be ignored as they are most likely temporary and librdkafka will attempt to recover automatically. In this specific case; if a group coordinator request fails it will be retried (using any broker in state Up) within 500ms. The current assignment and group membership will not be affected, if a new coordinator is found before the missing heartbeats times out the membership (session.timeout.ms).
Auto offset commits will be stalled until a new coordinator is found. In a future version we'll extend the error type to include a severity, allowing applications to happily ignore non-terminal errors. At this time an application should consider all errors informational, and not terminal.

How to continually consume messages from Apache Pulsar?

How do you continually consume messages from Apache Pulsar using Akka Streams and print each message?
Below is sample code I found from the pulsar4s library. Instead of publishing the messages to another topic, how do you print the consumed messages?
val consumerFn = () => client.consumer(ConsumerConfig(Seq(intopic), Subscription("mysub")))
val producerFn = () => client.producer(ProducerConfig(outtopic))
val control = source(consumerFn, Some(MessageId.earliest))
.map { consumerMessage => ProducerMessage(consumerMessage.data) }
.to(sink(producerFn)).run()
You can simply use Sink.foreach(println). For example:
source(consumerFn, Some(MessageId.earliest))
.runWith(Sink.foreach(println))
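Sink.foreach(println) prints whatever toString the message type provides. If the payload is text, you can decode the data bytes first. This is the same field the question's code forwards to the producer. The following is a minimal sketch assuming Akka 2.6 (where an implicit ActorSystem provides the materializer) and UTF-8 payloads. To keep it runnable without a Pulsar broker, a Source of byte arrays stands in for source(consumerFn, Some(MessageId.earliest)).map(_.data). PrintPayloads is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets
import scala.concurrent.Await
import scala.concurrent.duration._
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}

object PrintPayloads {
  // Decodes each message payload as UTF-8 text (assumes text payloads).
  val decode: Flow[Array[Byte], String, NotUsed] =
    Flow[Array[Byte]].map(bytes => new String(bytes, StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("print-payloads")
    // Hypothetical stand-in for source(consumerFn, Some(MessageId.earliest)).map(_.data)
    val payloads = Source(List("first", "second").map(_.getBytes(StandardCharsets.UTF_8)))
    val done = payloads.via(decode).runWith(Sink.foreach[String](println))
    Await.result(done, 10.seconds)
    system.terminate()
  }
}
```

With the real Pulsar source the stream never completes on its own, so in a service you would keep the returned Future[Done] rather than wait on it.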

Gracefully restart a Reactive-Kafka Consumer Stream on failure

Problem
When I restart, complete or stop the stream, the old consumer does not die or shut down:
[INFO ] a.a.RepointableActorRef -
Message [akka.kafka.KafkaConsumerActor$Internal$Stop$]
from Actor[akka://ufo-sightings/deadLetters]
to Actor[akka://ufo-sightings/system/kafka-consumer-1#1896610594]
was not delivered. [1] dead letters encountered.
Description
I'm building a service that receives messages from a Kafka topic and sends each message to an external service via an HTTP request.
The connection to the external service can break, and my service needs to retry the request.
Additionally, if there is an error in the stream, the entire stream needs to restart.
Finally, sometimes I don't need the stream and its corresponding Kafka consumer, and I would like to shut down the entire stream.
So I have a stream:
Consumer.committableSource(customizedSettings, subscriptions)
.flatMapConcat(sourceFunction)
.toMat(Sink.ignore)(Keep.both)
.run()
The HTTP request is sent in sourceFunction.
I followed the new Kafka consumer restart instructions in the documentation:
RestartSource.withBackoff(
minBackoff = 20.seconds,
maxBackoff = 5.minutes,
randomFactor = 0.2 ) { () =>
Consumer.committableSource(customizedSettings, subscriptions)
.watchTermination() {
case (consumerControl, streamComplete) =>
logger.info(s" Started Watching Kafka consumer id = ${consumer.id} termination: is shutdown: ${consumerControl.isShutdown}, is f completed: ${streamComplete.isCompleted}")
consumerControl.isShutdown.map(_ => logger.info(s"Shutdown of consumer finally happened id = ${consumer.id} at ${DateTime.now}"))
streamComplete
.flatMap { _ =>
consumerControl.shutdown().map(_ => logger.info(s"3.consumer id = ${consumer.id} SHUTDOWN at ${DateTime.now} GRACEFULLY:CLOSED FROM UPSTREAM"))
}
.recoverWith {
case _ =>
consumerControl.shutdown().map(_ => logger.info(s"3.consumer id = ${consumer.id} SHUTDOWN at ${DateTime.now} ERROR:CLOSED FROM UPSTREAM"))
}
}
.flatMapConcat(sourceFunction)
}
.viaMat(KillSwitches.single)(Keep.right)
.toMat(Sink.ignore)(Keep.left)
.run
There is an issue opened that discusses this non-terminating Consumer in a complex Akka-stream, but there is no solution yet.
Is there a workaround that forces the Kafka consumer to terminate?
How about wrapping the consumer in an Actor and registering a KillSwitch, see: https://doc.akka.io/docs/akka/2.5/stream/stream-dynamic.html#dynamic-stream-handling
Then in the Actor postStop method you can terminate the stream.
By wrapping the Actor in a BackoffSupervisor, you get the exponential backoff.
Example actor: https://github.com/tradecloud/kafka-akka-extension/blob/master/src/main/scala/nl/tradecloud/kafka/KafkaSubscriberActor.scala#L27
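The following is a rough sketch of that approach. It assumes Akka 2.6 classic actors, using BackoffOpts.onStop and BackoffSupervisor.props; on Akka 2.5, the older Backoff.onStop takes the same arguments. StreamOwner and RestartingStream are hypothetical names. So that the sketch runs without Kafka, Source.tick stands in for Consumer.committableSource(...) and println stands in for the HTTP call. The backoff values are the ones from the question.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.{BackoffOpts, BackoffSupervisor}
import akka.stream.{KillSwitches, UniqueKillSwitch}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.duration._

object StreamOwner {
  case object StreamEnded
}

// Hypothetical actor owning one run of the stream. Whenever the actor stops,
// postStop fires the kill switch, so the stream cannot outlive its owner.
class StreamOwner(makeSource: () => Source[String, Any]) extends Actor {
  import context.dispatcher
  private var killSwitch: Option[UniqueKillSwitch] = None

  override def preStart(): Unit = {
    implicit val system: ActorSystem = context.system
    val (ks, done) = makeSource()
      .viaMat(KillSwitches.single)(Keep.right)
      .toMat(Sink.foreach[String](println))(Keep.both) // println stands in for the HTTP call
      .run()
    killSwitch = Some(ks)
    // When the stream completes or fails, stop this actor; the
    // BackoffSupervisor then starts a fresh one after a backoff delay.
    done.onComplete(_ => self ! StreamOwner.StreamEnded)
  }

  override def receive: Receive = {
    case StreamOwner.StreamEnded => context.stop(self)
  }

  override def postStop(): Unit = killSwitch.foreach(_.shutdown())
}

object RestartingStream extends App {
  val system = ActorSystem("restarting-stream")

  // Stand-in source; in the question this would be Consumer.committableSource(...).
  def makeSource(): Source[String, Any] =
    Source.tick(0.seconds, 1.second, "tick").take(3)

  val supervisorProps = BackoffSupervisor.props(
    BackoffOpts.onStop(
      Props(new StreamOwner(() => makeSource())),
      "stream-owner", // child name
      20.seconds,     // minBackoff
      5.minutes,      // maxBackoff
      0.2             // randomFactor
    ))

  val supervisor = system.actorOf(supervisorProps, "stream-supervisor")
  // To shut everything down for good: system.stop(supervisor)
}
```

The owner can stop because its stream ended or because the supervisor was stopped. In both cases it passes through postStop, so the kill switch always fires. This sketch does not show whether cancelling the stream this way also makes the underlying Kafka consumer shut down cleanly.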

How do I use Play2 Iteratees to consume streaming HTTP with different event names?

I want a functional way of consuming server-sent events (SSE) over HTTP (or streaming HTTP as some call it). Through examples (Scala: Receiving Server-Sent-Events) I've found that Play2 Iteratees work well with its WS client when the event name is set to "message." Here is what the "message" stream looks like:
GET http://streaming.server.com/temperature
event: message
data: {"room":"room1","temp":71,"time":"2015-05-06T00:23:10.203+02:00"}
event: message
data: {"room":"room1","temp":70,"time":"2015-05-06T00:31:18.873+02:00"}
...
And here's what my web client looks like:
import com.ning.http.client.AsyncHttpClientConfig.Builder
import play.api.libs.iteratee.Iteratee
import play.api.libs.iteratee.Execution.Implicits.defaultExecutionContext
import play.api.libs.ws.ning.NingWSClient
object Client extends App {
val client = new NingWSClient(new Builder().build())
def print = Iteratee.foreach { chunk: Array[Byte] => println(new String(chunk)) }
client.url("http://streaming.server.com/temperature").get(_ => print)
}
with some output that it printed to my console:
$ sbt run
[info] Running Client
data: {"room":"room1","temp": 70, "time":"2015-05-06T00:31:14.193+02:00"}
data: {"room":"room1","temp": 70, "time":"2015-05-06T00:31:18.873+02:00"}
...
But when "event" is set to any value other than "message", the Iteratee returns the Done signal right after reading the first value, and the stream stops. The spec I'm required to satisfy uses "event":"put". Below is an example of what the "put" stream looks like:
GET http://streaming.server.com/temperature
event: put
data: {"room":"room1","temp":71,"time":"2015-05-06T00:39:14.281+02:00"}
event: put
data: {"room":"room1","temp":70,"time":"2015-05-06T00:39:18.778+02:00"}
...
I discovered this when I added an onComplete() handler at the end and matched on a Success case like so:
import scala.util.{Failure, Success}
client.url("http://streaming.server.com/temperature").get(_ => print).onComplete {
case Success(s) => println(s)
case Failure(f) => println(f.getMessage)
}
This code now prints:
$ sbt run
[info] Running Client
data: {"room":"room1","temp": 71, "time":"2015-05-06T00:39:14.281+02:00"}
Done((),Empty)
So far, I've only had success with the Jersey library for Java, whose semantics are very similar to those of the JavaScript EventSource client. However, it doesn't compose and appears to support only single-threaded consumption of SSEs. I would much rather use the Play2 WS+Iteratee libraries. How can I achieve this?
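Whatever makes the Iteratee finish early, the parsing side can be made independent of the event name. The following is a stdlib-only sketch of a parser for the SSE wire format shown above. SseEvent and SseParser are hypothetical names, not part of Play. In this format, events are separated by blank lines, the event field sets the name (defaulting to "message"), and multiple data lines are joined with newlines. The parser works on a complete string. When fed from Iteratee.foreach, you would need to buffer chunks until a blank line arrives, because chunk boundaries need not coincide with event boundaries.

```scala
import scala.collection.mutable.ListBuffer

// A parsed server-sent event; the event name defaults to "message".
final case class SseEvent(event: String, data: String)

object SseParser {
  // Parses one block of lines (one event). Simplified: only the "event" and
  // "data" fields are used; comment lines (starting with ':') and other
  // fields such as "id" or "retry" are ignored.
  def parseBlock(lines: Seq[String]): Option[SseEvent] = {
    val fields = lines.filterNot(_.startsWith(":")).map { line =>
      line.indexOf(':') match {
        case -1 => (line, "")
        case i =>
          val value = line.substring(i + 1)
          // A single space after the colon is not part of the value.
          (line.substring(0, i), if (value.startsWith(" ")) value.drop(1) else value)
      }
    }
    val dataLines = fields.collect { case ("data", v) => v }
    if (dataLines.isEmpty) None
    else {
      val name = fields.collect { case ("event", v) => v }.lastOption.filter(_.nonEmpty).getOrElse("message")
      Some(SseEvent(name, dataLines.mkString("\n")))
    }
  }

  // Splits a complete text body on blank lines and parses each block.
  def parse(body: String): List[SseEvent] = {
    val events = ListBuffer.empty[SseEvent]
    val block = ListBuffer.empty[String]
    for (line <- body.linesIterator) {
      if (line.isEmpty) {
        events ++= parseBlock(block.toList)
        block.clear()
      } else block += line
    }
    events ++= parseBlock(block.toList) // last block if the body lacks a trailing blank line
    events.toList
  }
}
```

Applied to the "put" stream above, each event comes back with event set to "put" and data set to the JSON string. The same code path handles the default "message" events.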