How do you continually consume messages from Apache Pulsar using Akka Streams and print each message?
Below is sample code I found for the pulsar4s library. Instead of publishing the messages to another topic, how do you print the consumed messages?
val consumerFn = () => client.consumer(ConsumerConfig(Seq(intopic), Subscription("mysub")))
val producerFn = () => client.producer(ProducerConfig(outtopic))
val control = source(consumerFn, Some(MessageId.earliest))
  .map { consumerMessage => ProducerMessage(consumerMessage.data) }
  .to(sink(producerFn)).run()
You can simply use Sink.foreach(println).
For example:
source(consumerFn, Some(MessageId.earliest))
  .runWith(Sink.foreach(println))
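For reference, here is a fuller, self-contained sketch of a print-only consumer. It assumes the pulsar4s akka-streams module; the imports, broker URL, topic name, and the String schema are assumptions added for completeness rather than details from the original snippet, and running the stream this way assumes Akka 2.6, where an implicit ActorSystem is enough to materialize it.

import akka.actor.ActorSystem
import akka.stream.scaladsl.Sink
import com.sksamuel.pulsar4s.akka.streams.source
import com.sksamuel.pulsar4s.{ConsumerConfig, MessageId, PulsarClient, Subscription, Topic}
import org.apache.pulsar.client.api.Schema

object PrintConsumer extends App {
  implicit val system: ActorSystem = ActorSystem("pulsar-print")
  implicit val schema: Schema[String] = Schema.STRING // decode payloads as UTF-8 strings

  val client  = PulsarClient("pulsar://localhost:6650")      // placeholder broker URL
  val intopic = Topic("persistent://public/default/mytopic") // placeholder topic

  val consumerFn = () => client.consumer(ConsumerConfig(Seq(intopic), Subscription("mysub")))

  // Runs until the application is stopped; every consumed message is printed.
  source(consumerFn, Some(MessageId.earliest))
    .map(_.data)                    // take the payload out of the ConsumerMessage
    .runWith(Sink.foreach(println))
}

If you need to stop consuming later, keep the materialized control value (as the control val in the question does) so you can shut the stream down.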
Related
I am reading from a Kafka consumer, persisting each message to a database, and then sending it to a producer on a different topic. My Akka Streams app stops running a few seconds after launch.
Here is what my stream looks like:
Consumer.committableSource(consumerSettings, Subscriptions.topics(config.getString("topic")))
  .mapAsync(8) { msg =>
    dbPersistActor.ask(msg.record.value()).map(_ => msg)
  }.async
  .map { msg =>
    ProducerMessage.Message(
      new ProducerRecord("test-output", msg.record.key(), msg.record.value()),
      passThrough = msg.committableOffset)
  }
  .via(Producer.flexiFlow(producerSettings))
  .map(_.passThrough)
  .via(Committer.flow(committerSettings))
  .runWith(Sink.ignore)
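One way to see why a stream like this stops (a diagnostic sketch, not part of the original post): keep the Future[Done] that runWith(Sink.ignore) returns and log its outcome. A failed ask (for example an ask timeout from dbPersistActor), a producer error, or a commit failure will fail the stream, and that failure is invisible if nobody observes the completion Future. In the sketch below, pipeline is a placeholder for the chain above, up to but not including runWith(Sink.ignore), and an implicit ExecutionContext is assumed to be in scope.

import akka.Done
import akka.stream.scaladsl.Sink
import scala.concurrent.Future
import scala.util.{Failure, Success}

// `pipeline` stands for the committableSource ... Committer.flow chain shown above.
val done: Future[Done] = pipeline.runWith(Sink.ignore)

done.onComplete {
  case Failure(ex) => println(s"Stream failed: $ex") // e.g. an AskTimeoutException from dbPersistActor
  case Success(_)  => println("Stream completed")
}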
I am learning ZIO integration with Apache Kafka, using the zio-kafka library. In the example on the project's GitHub page, they use the mapM function to commit the offsets of a chunk:
Consumer.subscribeAnd(Subscription.topics("topic150"))
  .plainStream(Serde.string, Serde.string)
  .tap(cr => putStrLn(s"key: ${cr.record.key}, value: ${cr.record.value}"))
  .map(_.offset)
  .aggregateAsync(Consumer.offsetBatches)
  .mapM(_.commit)
  .runDrain
However, IMHO, committing the offset is a terminal operation on the stream. What is the difference when using a ZSink instead?
Consumer.subscribeAnd(Subscription.topics("topic150"))
  .plainStream(Serde.string, Serde.string)
  .tap(cr => putStrLn(s"key: ${cr.record.key}, value: ${cr.record.value}"))
  .map(_.offset)
  .aggregateAsync(Consumer.offsetBatches)
  .run(ZSink.foreach(_.commit))
I just upgraded our Kafka Streams application from 2.5.1 to 2.6.2. It used to work; now it doesn't.
Here is the troublesome topology (I have omitted the irrelevant Serdes):
val builder = new StreamsBuilder()

val contractEventStream: KStream[TariffId, ContractEvent] =
  builder.stream[String, ContractUpsertAvro](settings.contractsTopicName)
    .flatMap { (_, contractAvro) =>
      ContractEvent.from(contractAvro)
        .map(contractEvent => (contractEvent.tariffId, contractEvent))
    }

val tariffsTable: KTable[TariffId, Tariff] =
  builder.stream[String, TariffUpdateEventAvro](settings.tariffTopicName)
    .flatMapValues(Tariff.fromAvro(_))
    .selectKey((_, tariff) => tariff.tariffId)
    .toTable(Materialized.`with`(tariffIdSerde, tariffSerde)) // Materialized.as also throws the same IllegalStateExceptions

contractEventStream
  .join(tariffsTable)(JourneyStep.from(_, _).asInstanceOf[ContractCreated])(Joined.`with`(tariffIdSerde, contractEventSerde, tariffSerde))
  .selectKey((_, contractUpdated) => contractUpdated.accountId)
  .foreach((_, journeyStep) => println(journeyStep))
The join gives the following exception:
java.lang.IllegalStateException: Tried to lookup lag for unknown task 3_0
at org.apache.kafka.streams.processor.internals.assignment.ClientState.lagFor(ClientState.java:306)
at java.util.Comparator.lambda$comparingLong$6043328a$1(Comparator.java:511)
at java.util.Comparator.lambda$thenComparing$36697e65$1(Comparator.java:216)
at java.util.TreeMap.compare(TreeMap.java:1295)
at java.util.TreeMap.put(TreeMap.java:538)
at java.util.TreeSet.add(TreeSet.java:255)
at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
at java.util.TreeSet.addAll(TreeSet.java:312)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.getPreviousTasksByLag(StreamsPartitionAssignor.java:1275)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assignTasksToThreads(StreamsPartitionAssignor.java:1189)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.computeNewAssignment(StreamsPartitionAssignor.java:940)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:399)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:589)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:684)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1000(AbstractCoordinator.java:111)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:597)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:560)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1160)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1135)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1296)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
I can't see what I am doing wrong. The code above works with Kafka 2.5.1. Does anyone have an idea what is going on?
The problem is caused by the cache that Kafka Streams keeps on disk. This cache is specific to the Kafka version and to the Kafka Streams topology you use (i.e. a change in your topology could also lead to this error).
The cache usually lives in /tmp, or wherever the "state.dir" property you passed to Kafka Streams points. Clear the cache directory and you should be able to start cleanly again.
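As a hedged illustration (the application id, bootstrap servers, and state directory below are placeholders, not values from the question), the state directory can be set explicitly via StreamsConfig.STATE_DIR_CONFIG, and KafkaStreams#cleanUp() wipes the application's local state while the instance is stopped:

import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app")    // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams") // the default is /tmp/kafka-streams

val streams = new KafkaStreams(builder.build(), props) // `builder` is the StreamsBuilder from the question
streams.cleanUp() // deletes this application's local state directory; only call while the instance is stopped
streams.start()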
Problem
When I restart, complete, or stop the stream, the old consumer does not die/shut down:
[INFO ] a.a.RepointableActorRef -
Message [akka.kafka.KafkaConsumerActor$Internal$Stop$]
from Actor[akka://ufo-sightings/deadLetters]
to Actor[akka://ufo-sightings/system/kafka-consumer-1#1896610594]
was not delivered. [1] dead letters encountered.
Description
I'm building a service that receives a message from a Kafka topic and sends it to an external service via an HTTP request.
The connection to the external service can break, and my service needs to retry the request.
Additionally, if there is an error in the stream, the entire stream needs to restart.
Finally, sometimes I don't need the stream and its corresponding Kafka consumer at all, and I would like to shut the whole stream down.
So I have a stream:
Consumer.committableSource(customizedSettings, subscriptions)
  .flatMapConcat(sourceFunction)
  .toMat(Sink.ignore)(Keep.left)
  .run()
The HTTP request is sent inside sourceFunction.
I followed the new Kafka consumer restart instructions in the new documentation:
RestartSource.withBackoff(
  minBackoff = 20.seconds,
  maxBackoff = 5.minutes,
  randomFactor = 0.2) { () =>
    Consumer.committableSource(customizedSettings, subscriptions)
      .watchTermination() {
        case (consumerControl, streamComplete) =>
          logger.info(s" Started Watching Kafka consumer id = ${consumer.id} termination: is shutdown: ${consumerControl.isShutdown}, is f completed: ${streamComplete.isCompleted}")
          consumerControl.isShutdown.map(_ => logger.info(s"Shutdown of consumer finally happened id = ${consumer.id} at ${DateTime.now}"))
          streamComplete
            .flatMap { _ =>
              consumerControl.shutdown().map(_ -> logger.info(s"3.consumer id = ${consumer.id} SHUTDOWN at ${DateTime.now} GRACEFULLY:CLOSED FROM UPSTREAM"))
            }
            .recoverWith {
              case _ =>
                consumerControl.shutdown().map(_ -> logger.info(s"3.consumer id = ${consumer.id} SHUTDOWN at ${DateTime.now} ERROR:CLOSED FROM UPSTREAM"))
            }
      }
      .flatMapConcat(sourceFunction)
  }
  .viaMat(KillSwitches.single)(Keep.right)
  .toMat(Sink.ignore)(Keep.left)
  .run
There is an open issue that discusses this non-terminating consumer in a complex Akka stream, but there is no solution yet.
Is there a workaround that forces the Kafka consumer to terminate?
How about wrapping the consumer in an Actor and registering a KillSwitch? See: https://doc.akka.io/docs/akka/2.5/stream/stream-dynamic.html#dynamic-stream-handling
Then, in the Actor's postStop method, you can terminate the stream.
By wrapping the Actor in a BackoffSupervisor, you get the exponential backoff.
Example actor: https://github.com/tradecloud/kafka-akka-extension/blob/master/src/main/scala/nl/tradecloud/kafka/KafkaSubscriberActor.scala#L27
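A minimal sketch of that idea (not the linked project's code): the actor materializes the stream together with a KillSwitch and shuts both the switch and the consumer down in postStop. customizedSettings, subscriptions and sourceFunction are the names from the question above; binding the materializer to the actor via Materializer(context) assumes Akka 2.6.

import akka.actor.Actor
import akka.kafka.scaladsl.Consumer
import akka.stream.scaladsl.{Keep, Sink}
import akka.stream.{KillSwitches, Materializer, UniqueKillSwitch}

class KafkaSubscriberActor extends Actor {
  // Tie the materializer to this actor so the stream's lifecycle follows the actor's.
  private implicit val mat: Materializer = Materializer(context)

  private val (control: Consumer.Control, killSwitch: UniqueKillSwitch) =
    Consumer.committableSource(customizedSettings, subscriptions)
      .flatMapConcat(sourceFunction)
      .viaMat(KillSwitches.single)(Keep.both)
      .toMat(Sink.ignore)(Keep.left)
      .run()

  override def postStop(): Unit = {
    killSwitch.shutdown() // complete the stream
    control.shutdown()    // stop the underlying KafkaConsumerActor
    super.postStop()
  }

  override def receive: Receive = Actor.emptyBehavior
}

Supervising this actor with a BackoffSupervisor then gives the exponential-backoff restart mentioned above.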
I want a functional way of consuming server-sent events (SSE) over HTTP (or streaming HTTP, as some call it). Through examples (Scala: Receiving Server-Sent-Events) I've found that Play2 Iteratees work well with its WS client when the event name is set to "message". Here is what the "message" stream looks like:
GET http://streaming.server.com/temperature
event: message
data: {"room":"room1","temp":71,"time":"2015-05-06T00:23:10.203+02:00"}
event: message
data: {"room":"room1","temp":70,"time":"2015-05-06T00:31:18.873+02:00"}
...
And here's what my web client looks like:
import com.ning.http.client.AsyncHttpClientConfig.Builder
import play.api.libs.iteratee.Iteratee
import play.api.libs.iteratee.Execution.Implicits.defaultExecutionContext
import play.api.libs.ws.ning.NingWSClient

object Client extends App {
  val client = new NingWSClient(new Builder().build())

  def print = Iteratee.foreach { chunk: Array[Byte] => println(new String(chunk)) }

  client.url("http://streaming.server.com/temperature").get(_ => print)
}
with some output that it printed to my console:
$ sbt run
[info] Running Client
data: {"room":"room1","temp": 70, "time":"2015-05-06T00:31:14.193+02:00"}
data: {"room":"room1","temp": 70, "time":"2015-05-06T00:31:18.873+02:00"}
...
But when I set "event" to a value other than "message", the Iteratee immediately returns the Done signal just after reading the first value and then stops the stream. The spec I'm required to satisfy uses "event":"put". Below is an example of what the "put" stream looks like:
GET http://streaming.server.com/temperature
event: put
data: {"room":"room1","temp":71,"time":"2015-05-06T00:39:14.281+02:00"}
event: put
data: {"room":"room1","temp":70,"time":"2015-05-06T00:39:18.778+02:00"}
...
I discovered this when I added an onComplete() handler at the end and matched on a Success case like so:
client.url("http://streaming.server.com/temperature").get(_ => print).onComplete {
case Success(s) => println(s)
case Failure(s) => println(f.getMessage)
}
This code now prints:
$ sbt run
[info] Running Client
data: {"room":"room1","temp": 71, "time":"2015-05-06T00:39:14.281+02:00"}
Done((),Empty)
So far, I've only had success with the Jersey library for Java, whose semantics are very similar to those of the EventSource JavaScript client; however, it doesn't compose and appears to only support single-threaded consumption of SSEs. I would much rather use the Play2 WS and Iteratee libraries. How can I achieve this?