alpakka jms client acknowledgement mode delivery guarantee

I have an Alpakka JMS source -> Kafka sink kind of flow. I'm looking at the Alpakka JMS consumer documentation and trying to figure out what kind of delivery guarantees this gives me.
From https://doc.akka.io/docs/alpakka/current/jms/consumer.html
val result: Future[immutable.Seq[javax.jms.Message]] =
  jmsSource
    .take(msgsIn.size)
    .map { ackEnvelope =>
      ackEnvelope.acknowledge()
      ackEnvelope.message
    }
    .runWith(Sink.seq)
I'm hoping that the way this actually works is that messages will only be ack'ed once the sinking succeeds (for at-least-once delivery guarantees), but I can't rely on wishful thinking.
Given that Alpakka does not seem to keep any state of its own that persists across restarts, I can't see how I'd get exactly-once guarantees à la Flink here, but can I at least count on at-least-once, or would I have to (somehow) ack in a map stage after a Kafka producer flexiFlow (https://doc.akka.io/docs/alpakka-kafka/current/producer.html#producer-as-a-flow)?
Thanks,
Fil

In that stream, the ack will happen before messages are added to the materialized sequence and before result becomes available for you to do anything with it (i.e. before the Future completes). The guarantee is therefore at-most-once.
To delay the ack until some processing has succeeded, the easiest approach is to keep what you're doing with the messages inside the flow rather than materializing a future. The Alpakka Kafka producer supports a pass-through element, which could be the JMS message:
val topic: String = ???

def javaxMessageToKafkaMessage[Key, Value](
    ae: AckEnvelope,
    kafkaKeyFor: javax.jms.Message => Key,
    kafkaValueFor: javax.jms.Message => Value
): ProducerMessage.Envelope[Key, Value, AckEnvelope] = {
  val key = kafkaKeyFor(ae.message)
  val value = kafkaValueFor(ae.message)
  // the AckEnvelope travels along as the pass-through element
  ProducerMessage.single(new ProducerRecord(topic, key, value), ae)
}
// types K and V are unspecified...
jmsSource
  .map(
    javaxMessageToKafkaMessage[K, V](
      _,
      { _ => ??? },
      { _ => ??? }
    )
  )
  .via(Producer.flexiFlow(producerSettings))
  .toMat(
    Sink.foreach { results =>
      // only ack the JMS message once the corresponding record has been produced to Kafka
      val ackEnvelope = results.passThrough
      ackEnvelope.acknowledge()
    }
  )(Keep.both)
Running this stream will materialize a tuple of a JmsConsumerControl and a Future[Done]. Not being familiar with JMS, I don't know how a shutdown of the consumer control would interact with the acks.
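A hedged sketch of binding those materialized values (runnableGraph is just an assumed name for the jmsSource ... .toMat(...)(Keep.both) expression above):
// Sketch only, not from the original answer.
val (consumerControl: JmsConsumerControl, streamCompletion: Future[Done]) =
  runnableGraph.run()
// consumerControl.shutdown() would stop the consumer; as noted above, how that
// interacts with outstanding acks is unclear.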

Related

Akka streams RestartSink doesn't seem to be restarting during failures

I am playing around with handling errors in akka streams with restartable sources & sinks.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.RestartSettings
import akka.stream.scaladsl.{RestartSink, RestartSource, Sink, Source}

import scala.concurrent.duration._

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("akka-streams-system")

  val restartSettings =
    RestartSettings(1.seconds, 10.seconds, 0.2d)

  val restartableSource = RestartSource.onFailuresWithBackoff(restartSettings) { () =>
    Source(0 to 10)
      .map(n =>
        if (n < 5) n.toString
        else throw new RuntimeException("Boom!"))
  }

  val restartableSink: Sink[String, NotUsed] = RestartSink.withBackoff(restartSettings) { () =>
    Sink.fold("")((_, newVal) => {
      if (newVal == "3") {
        println(newVal + " Exception")
        throw new RuntimeException("Kabooom!!!") // triggering a failure, expecting the stream to restart just the sink
      } else {
        println(newVal + " sink")
      }
      newVal
    })
  }

  restartableSource.runWith(restartableSink)
}
I am breaking the source and the sink separately in different scenarios. I break the sink first, expecting it to restart and reprocess the newVal == 3 message over and over again.
But it seems like the error in the sink is just thrown away and only the source failure is retried, so the source ends up being restarted and reprocesses the events starting from 0.
I am mimicking a scenario where I want to read from a source (say, a file) and have an HTTP sink that retries failed HTTP requests independently, without restarting the whole stream pipeline.
The output I get with the above code is as follows.
0 sink
1 sink
2 sink
3 Exception
4 sink
[WARN] [01/10/2022 09:13:14.647] [akka-streams-system-akka.actor.default-dispatcher-6] [RestartWithBackoffSource(akka://akka-streams-system)] Restarting stream due to failure [1]: java.lang.RuntimeException: Boom!
java.lang.RuntimeException: Boom!
at Main$.$anonfun$restartableSource$2(Main.scala:18)
at Main$.$anonfun$restartableSource$2$adapted(Main.scala:16)
at akka.stream.impl.fusing.Map$$anon$1.onPush(Ops.scala:52)
at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:542)
at akka.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:496)
at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:390)
at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:650)
at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:521)
at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:625)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:800)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$shortCircuitBatch(ActorGraphInterpreter.scala:787)
at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:819)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:716)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
I would appreciate any help on reasoning about why this is happening and how to restart the sink independently of the source.
Your RestartSink is restarting (and not in the process restarting anything else): if it wasn't, you would never have gotten 4 sink as output right after 3 Exception. For some reason it's not logging, but that might be due to stream attributes (there have also been some behavioral changes around logging of stream restarts in recent months, so logging may differ depending on what version you're running).
From the docs for RestartSink:
The restart process is inherently lossy, since there is no coordination between cancelling and the sending of messages. When the wrapped Sink does cancel, this Sink will backpressure, however any elements already sent may have been lost.
This is fundamentally because in the general case stream stages are memoryless. In your Sink.fold example, it will restart with clean state (viz. ""). This does, in my experience, make the RestartSink and RestartFlow somewhat less useful than the RestartSource.
For the use-case you describe, I would tend to use a mapAsync stage with akka.pattern.RetrySupport to send HTTP requests via a Future-based API and retry requests on failures:
val restartingSource: Source[Element, _] = ???

restartingSource.mapAsync(1) { elem =>
  import akka.pattern.RetrySupport._

  // will need an implicit ExecutionContext and an implicit Scheduler
  // (both are probably best obtained from the ActorSystem)
  val sendRequest = () => {
    // Future-based HTTP call
    ???
  }

  retry(
    attempt = sendRequest,
    attempts = Int.MaxValue,
    minBackoff = 1.seconds,
    maxBackoff = 10.seconds,
    randomFactor = 0.2
  )
}.runWith(Sink.ignore)

Akka Stream continuously consume websocket

I'm kinda new to Scala and Akka Streams and I'm trying to get JSON string messages from a websocket and push them to a Kafka topic.
For now I am only working on the "get messages from a WS" part.
Messages coming from the websocket look like this:
{
  "bitcoin":"6389.06534240",
  "ethereum":"192.93111286",
  "monero":"108.90302506",
  "litecoin":"52.25484165"
}
I want to split this JSON message into multiple messages:
{"coin": "bitcoin", "price": "6389.06534240"}
{"coin": "ethereum", "price": "192.93111286"}
{"coin": "monero", "price": "108.90302506"}
{"coin": "litecoin", "price": "52.25484165"}
And then push each of these messages to a Kafka topic.
Here's what I achieved so far:
val message_decomposition: Flow[Message, String, NotUsed] = Flow[Message]
  .mapConcat(msg => msg.toString.replaceAll("[{})(]", "").split(",").toList)
  .map { msg =>
    val splitted = msg.split(":")
    s"{'coin': ${splitted(0)}, 'price': ${splitted(1)}}"
  }
val sink: Sink[String, Future[Done]] = Sink.foreach[String](println)

val flow: Flow[Message, Message, Promise[Option[Message]]] =
  Flow.fromSinkAndSourceMat(
    message_decomposition.to(sink),
    Source.maybe[Message])(Keep.right)

val (upgradeResponse, promise) = Http().singleWebSocketRequest(
  WebSocketRequest("wss://ws.coincap.io/prices?assets=ALL"),
  flow)
It's working, I'm getting the expected output JSON messages, but I was wondering if I could write this producer in a more "Akka-ish" style, like using the GraphDSL. So I have a few questions:
Is it possible to continuously consume a WebSocket using the GraphDSL? If yes, can you show me an example please?
Is it a good idea to consume the WS using the GraphDSL?
Should I decompose the received JSON message like I'm doing before sending it to Kafka? Or is it better to send it as-is for lower latency?
After producing the message to Kafka, I am planning to consume it using Apache Storm. Is that a good idea, or should I stick with Akka?
Thanks for reading me, Regards,
Arès
That code is plenty Akka-ish: scaladsl is just as Akka as the GraphDSL or implementing a custom GraphStage. The only reason, IMO/E, to go to the GraphDSL is if the actual shape of the graph isn't readily expressible in the scaladsl.
I would personally go the route of defining a CoinPrice class to make the model explicit
case class CoinPrice(coin: String, price: BigDecimal)
And then have a Flow[Message, CoinPrice, NotUsed] which parses 1 incoming message into zero or more CoinPrices. Something (using Play JSON here) like:
val toCoinPrices =
  Flow[Message]
    .mapConcat { msg =>
      Json.parse(msg.toString)
        .asOpt[JsObject]
        .toList
        .flatMap { json =>
          json.underlying.flatMap { kv =>
            import scala.util.Try
            kv match {
              case (coin, JsString(priceStr)) =>
                Try(BigDecimal(priceStr)).toOption
                  .map(p => CoinPrice(coin, p))
              case (coin, JsNumber(price)) => Some(CoinPrice(coin, price))
              case _ => None
            }
          }
        }
    }
You might, depending on the size of the JSON objects in the messages, want to break that into different stream stages to allow for an async boundary between the JSON parse and the extraction into CoinPrices. For example,
Flow[Message]
  .mapConcat { msg =>
    Json.parse(msg.toString).asOpt[JsObject].toList
  }
  .async
  .mapConcat { json =>
    json.underlying.flatMap { kv =>
      import scala.util.Try
      kv match {
        case (coin, JsString(priceStr)) =>
          Try(BigDecimal(priceStr)).toOption
            .map(p => CoinPrice(coin, p))
        case (coin, JsNumber(price)) => Some(CoinPrice(coin, price))
        case _ => None
      }
    }
  }
In the above, the stages on either side of the async boundary will execute in separate actors and thus, possibly concurrently (if there's enough CPU cores available etc.), at the cost of extra overhead for the actors to coordinate and exchange messages. That extra coordination/communication overhead (cf. Gunther's Universal Scalability Law) is only going to be worth it if the JSON objects are sufficiently large and coming in sufficiently quickly (consistently coming in before the previous one has finished processing).
If your intention is to consume the websocket until the program stops, you might find it clearer to just use Source.never[Message].
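For illustration, here is roughly how that substitution could look against the question's code (the materialized value then no longer carries the completion Promise):
// Sketch: Source.never emits nothing and never completes, so the connection stays open
// until the program stops; message_decomposition and sink are from the question above.
val flow: Flow[Message, Message, NotUsed] =
  Flow.fromSinkAndSource(
    message_decomposition.to(sink),
    Source.never[Message])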

How to use Futures within Kafka Streams

When using the org.apache.kafka.streams.KafkaStreams library in Scala, I have been trying to read in an input stream, pass that information over to a method validateAll(infoToValidate) that returns a Future, resolve that, and then send the result to an output stream.
Example:
builder.stream[String, Object](REQUEST_TOPIC)
  .mapValues(v => ValidateFormat.from(v.asInstanceOf[GenericRecord]))
  .mapValues(infoToValidate => {
    SuccessFailFormat.to(validateAll(infoToValidate))
  })
Is there any documentation on performing this? I have looked into filter() and transform() but I'm still not sure how to deal with Futures in KStreams.
The answer depends on whether you need to preserve the original order of your messages. If yes, then you will have to block in one way or the other. For example:
val duration = 10.seconds // whatever your timeout should be, or Duration.Inf

sourceStream
  .mapValues(x => Await.result(validate(x), duration))
  .to(outputTopic)
If however the order is not important, you can simply use a Kafka producer:
sourceStream
  .mapValues(x => validate(x)) // now you have KStream[..., Future[...]]
  .foreach { (key, future) =>
    // needs an implicit ExecutionContext in scope for Future.foreach
    future.foreach { item =>
      val record = new ProducerRecord(outputTopic, key, item)
      producer.send(record) // provided you have the matching serializers
    }
  }
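The producer used above lives outside the Streams topology; a minimal sketch of creating one (broker address and String serializers are assumptions for illustration):
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}
import org.apache.kafka.common.serialization.StringSerializer

// Sketch only: a plain Kafka producer for the fire-and-forget sends above.
val producerProps = new Properties()
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed broker address
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](producerProps)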

How to 'Chunk and Re-assemble' large messages in Reactive Kafka using Akka-Stream

When sending a large file using Kafka, is it possible to distribute it across partitions and then re-assemble it using Akka-Stream, as described in this presentation?
http://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297
The "chunking" side, i.e. the producer, is easy enough to write using something like reactive kafka:
case class LargeMessage(bytes: Seq[Byte], topic: String)

def messageToKafka(message: LargeMessage, maxMessageSize: Int) =
  Source.fromIterator(() => message.bytes.toIterator)
    .via(Flow[Byte].grouped(maxMessageSize))
    .via(Flow[Seq[Byte]].map(seq => new ProducerRecord(message.topic, seq)))
    .runWith(Producer.plainSink(producerSettings))
The "re-assembling", i.e. the consumer, can be implemented in a manner similar to the documentation:
val messageFut: Future[LargeMessage] =
  for {
    bytes <- Consumer.map(_._1).runWith(Sink.seq[Byte])
  } yield LargeMessage(bytes, topic)
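For a more concrete take on that sketch with the current Alpakka Kafka API, something like the following could work; consumerSettings, numberOfChunks, and the chunk value type are assumptions, not part of the original answer:
// Sketch only: re-assembling chunks with Alpakka Kafka's plain consumer source.
// Assumes record values are the Seq[Byte] chunks and that numberOfChunks is known up front;
// needs an implicit Materializer/ActorSystem and ExecutionContext in scope.
val messageFut: Future[LargeMessage] =
  Consumer.plainSource(consumerSettings, Subscriptions.topics(topic))
    .map(_.value()) // each record value is one chunk of the original payload
    .take(numberOfChunks)
    .runWith(Sink.seq)
    .map(chunks => LargeMessage(chunks.flatten, topic))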

How to use an Akka Streams SourceQueue with PlayFramework

I would like to use a SourceQueue to push elements dynamically into an Akka Stream source.
A Play controller needs a Source to be able to stream a result using the chunked method.
As Play uses its own Akka Streams Sink under the hood, I can't materialize the source queue myself using a Sink, because the source would be consumed before it's used by the chunked method (except if I use the following hack).
I'm able to make it work if I pre-materialize the source queue using a Reactive Streams publisher, but it's kind of a 'dirty hack':
def sourceQueueAction = Action {
  val (queue, pub) = Source.queue[String](10, OverflowStrategy.fail)
    .toMat(Sink.asPublisher(false))(Keep.both)
    .run()

  // stupid example to push elements dynamically
  val tick = Source.tick(0.second, 1.second, "tick")
  tick.runForeach(t => queue.offer(t))

  Ok.chunked(Source.fromPublisher(pub))
}
Is there a simpler way to use an Akka Streams SourceQueue with PlayFramework?
Thanks
The solution is to use mapMaterializedValue on the source to get a future of its queue materialization:
def sourceQueueAction = Action {
  val (queueSource, futureQueue) = peekMatValue(Source.queue[String](10, OverflowStrategy.fail))

  futureQueue.map { queue =>
    Source.tick(0.second, 1.second, "tick")
      .runForeach(t => queue.offer(t))
  }

  Ok.chunked(queueSource)
}

// T is the source type, here String
// M is the materialization type, here a SourceQueue[String]
def peekMatValue[T, M](src: Source[T, M]): (Source[T, M], Future[M]) = {
  val p = Promise[M]
  val s = src.mapMaterializedValue { m =>
    p.trySuccess(m)
    m
  }
  (s, p.future)
}
I'd like to share an insight I got today, though it may not be applicable to your case with Play.
Instead of thinking of a Source to trigger, one can often turn the problem upside down and provide a Sink to the function that does the sourcing.
In such a case, the Sink is the non-materialized "recipe" stage, and we can use Source.queue and materialize it right away: we immediately get the queue and the running flow.
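A minimal sketch of that idea, assuming the consuming side hands us a Sink (here just Sink.foreach(println)):
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source, SourceQueueWithComplete}

implicit val system: ActorSystem = ActorSystem("queue-sketch")

// Because we control the Sink, the queue materializes immediately; no Promise is needed.
val queue: SourceQueueWithComplete[String] =
  Source.queue[String](10, OverflowStrategy.fail)
    .to(Sink.foreach(println)) // whatever Sink the consuming side provides
    .run()

queue.offer("tick") // push elements dynamically, as in the question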