Akka streams RestartSink doesn't seem to be restarting during failures - scala

I am playing around with handling errors in akka streams with restartable sources & sinks.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.RestartSettings
import akka.stream.scaladsl.{RestartSink, RestartSource, Sink, Source}

import scala.concurrent.duration._

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("akka-streams-system")

  val restartSettings = RestartSettings(1.seconds, 10.seconds, 0.2d)

  val restartableSource = RestartSource.onFailuresWithBackoff(restartSettings) { () =>
    Source(0 to 10)
      .map(n =>
        if (n < 5) n.toString
        else throw new RuntimeException("Boom!"))
  }

  val restartableSink: Sink[String, NotUsed] = RestartSink.withBackoff(restartSettings) { () =>
    Sink.fold("")((_, newVal) => {
      if (newVal == "3") {
        println(newVal + " Exception")
        throw new RuntimeException("Kabooom!!!") // TRIGGERING A FAILURE, expecting the stream to restart just the sink
      } else {
        println(newVal + " sink")
      }
      newVal
    })
  }

  restartableSource.runWith(restartableSink)
}
I am breaking the source and the sink separately, in different scenarios. I am breaking the sink first, expecting the sink to restart and reprocess the newVal == "3" message over and over again.
But it seems like the error in the sink is just thrown away; only the source failure is retried, so the source ends up being restarted and reprocesses the events starting from 0.
I am mimicking a scenario where I want to read from a source (let's say a file) and have an HTTP sink that retries failed HTTP requests independently, without restarting the whole stream pipeline.
The output I get with the code shared above is as follows.
0 sink
1 sink
2 sink
3 Exception
4 sink
[WARN] [01/10/2022 09:13:14.647] [akka-streams-system-akka.actor.default-dispatcher-6] [RestartWithBackoffSource(akka://akka-streams-system)] Restarting stream due to failure [1]: java.lang.RuntimeException: Boom!
java.lang.RuntimeException: Boom!
at Main$.$anonfun$restartableSource$2(Main.scala:18)
at Main$.$anonfun$restartableSource$2$adapted(Main.scala:16)
at akka.stream.impl.fusing.Map$$anon$1.onPush(Ops.scala:52)
at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:542)
at akka.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:496)
at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:390)
at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:650)
at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:521)
at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:625)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:800)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$shortCircuitBatch(ActorGraphInterpreter.scala:787)
at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:819)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:716)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
I would appreciate any help on reasoning about why this is happening and how to restart the sink independently of the source.

Your RestartSink is restarting (and not in the process restarting anything else): if it wasn't, you would never have gotten 4 sink as output right after 3 Exception. For some reason it's not logging, but that might be due to stream attributes (there have also been some behavioral changes around logging in stream restarts in recent months, so logging may differ depending on which version you're running).
From the docs for RestartSink:
The restart process is inherently lossy, since there is no coordination between cancelling and the sending of messages. When the wrapped Sink does cancel, this Sink will backpressure, however any elements already sent may have been lost.
This is fundamentally because in the general case stream stages are memoryless. In your Sink.fold example, it will restart with clean state (viz. ""). This does, in my experience, make the RestartSink and RestartFlow somewhat less useful than the RestartSource.
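One way to see that the inner sink really is restarting (a minimal sketch, not from the original post): print from the sink factory, which runs once per (re)start, and use a stateless Sink.foreach since the fold state would be reset anyway:
val observableSink: Sink[String, NotUsed] =
  RestartSink.withBackoff(restartSettings) { () =>
    println("sink (re)starting") // printed on the initial start and on every restart
    Sink.foreach[String] { newVal =>
      if (newVal == "3") {
        println(newVal + " Exception")
        throw new RuntimeException("Kabooom!!!")
      } else println(newVal + " sink")
    }
  }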
For the use-case you describe, I would tend to use a mapAsync stage with akka.pattern.RetrySupport to send HTTP requests via a Future-based API and retry requests on failures:
val restartingSource: Source[Element, _] = ???

restartingSource.mapAsync(1) { elem =>
  import akka.pattern.RetrySupport._

  // will need an implicit ExecutionContext and an implicit Scheduler
  // (both are probably best obtained from the ActorSystem)
  val sendRequest = () => {
    // Future-based HTTP call
    ???
  }

  retry(
    attempt = sendRequest,
    attempts = Int.MaxValue,
    minBackoff = 1.seconds,
    maxBackoff = 10.seconds,
    randomFactor = 0.2
  )
}.runWith(Sink.ignore)
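For a more concrete (hypothetical) version of the above, assuming akka-http as the Future-based client and a Source[String, _] of elements; the URI and the element-to-request mapping are placeholders:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.pattern.RetrySupport.retry
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("http-retry")
import system.dispatcher                                        // ExecutionContext
implicit val scheduler: akka.actor.Scheduler = system.scheduler // Scheduler required by retry

val restartingSource: Source[String, _] = ???

def send(elem: String): Future[HttpResponse] =
  retry(
    attempt = () =>
      Http().singleRequest(HttpRequest(uri = s"https://example.com/ingest?value=$elem")).map { resp =>
        resp.discardEntityBytes() // always consume or discard the response entity
        // note: a non-2xx response still completes the Future successfully;
        // map such responses to a failure if they should be retried as well
        resp
      },
    attempts = Int.MaxValue,
    minBackoff = 1.second,
    maxBackoff = 10.seconds,
    randomFactor = 0.2
  )

restartingSource.mapAsync(1)(send).runWith(Sink.ignore)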

Related

alpakka jms client acknowledgement mode delivery guarantee

I have an alpakka JMS source -> kafka sink kind of a flow. I'm looking at the alpakka jms consumer documentation and trying to figure out what kind of delivery guarantees this gives me.
From https://doc.akka.io/docs/alpakka/current/jms/consumer.html
val result: Future[immutable.Seq[javax.jms.Message]] =
  jmsSource
    .take(msgsIn.size)
    .map { ackEnvelope =>
      ackEnvelope.acknowledge()
      ackEnvelope.message
    }
    .runWith(Sink.seq)
I'm hoping that the way this actually works is that messages will only be ack'ed once the sinking succeeds (for at-least-once delivery guarantees), but I can't rely on wishful thinking.
Given that Alpakka does not seem to keep any state of its own that persists across restarts, I can't see how I'd get exactly-once guarantees à la Flink here. But can I at least count on at-least-once, or would I have to (somehow) ack in a map of a Kafka producer flexiFlow (https://doc.akka.io/docs/alpakka-kafka/current/producer.html#producer-as-a-flow)?
Thanks,
Fil
In that stream, the ack happens before messages are added to the materialized sequence and before result becomes available for you to do anything (i.e. before the Future completes). It would therefore be at-most-once.
To delay the ack until some processing has succeeded, the easiest approach is to keep what you're doing with the messages in the flow rather than materialize a future. The Alpakka Kafka producer supports a pass-through element which could be the JMS message:
val topic: String = ???

def javaxMessageToKafkaMessage[Key, Value](
    ae: AckEnvelope,
    kafkaKeyFor: javax.jms.Message => Key,
    kafkaValueFor: javax.jms.Message => Value
): ProducerMessage.Envelope[Key, Value, AckEnvelope] = {
  val key = kafkaKeyFor(ae.message)
  val value = kafkaValueFor(ae.message)
  // the AckEnvelope rides along as the pass-through element
  ProducerMessage.single(new ProducerRecord(topic, key, value), ae)
}

// types K and V are unspecified...
val graph =
  jmsSource
    .map(
      javaxMessageToKafkaMessage[K, V](
        _,
        { _ => ??? },
        { _ => ??? }
      )
    )
    .via(Producer.flexiFlow(producerSettings))
    .toMat(
      Sink.foreach { results =>
        // acknowledge only after Kafka has confirmed the write
        results.passThrough.acknowledge()
      }
    )(Keep.both)
Running this stream will materialize it as a tuple of a JmsConsumerControl and a Future[Done]. Not being familiar with JMS, I don't know how a shutdown of the consumer control would interact with the acks.
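For completeness, a short sketch of materializing the graph value above (it assumes an implicit Materializer and an ExecutionContext are in scope):
import scala.util.{Failure, Success}

val (control, done) = graph.run() // (JmsConsumerControl, Future[Done])

done.onComplete {
  case Success(_)  => // stream finished, e.g. after the consumer control was shut down
  case Failure(ex) => // handle the failure, possibly by restarting the stream
}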

akka-stream pipeline backpressures despite inserted buffer

I have an akka-stream pipeline that fans out events (via BroadcastHub) that are pushed into the stream via a SourceQueueWithComplete.
Despite all downstream consumers having a .buffer() inserted (which I'd expect to keep the upstream buffers of the hub and the queue drained), I still observe backpressure kicking in after the system has been running for a while.
Here's a (simplified) snippet:
class NotificationHub[Event](
    implicit materializer: Materializer,
    ecForLogging: ExecutionContext
) {

  // a SourceQueue to enqueue events and a BroadcastHub to allow multiple subscribers
  private val (queue, broadCastSource) =
    Source.queue[Event](
      bufferSize = 64,
      // we expect the buffer to never run full and if it does, we want
      // to log that asap, so we use OverflowStrategy.backpressure
      OverflowStrategy.backpressure
    ).toMat(BroadcastHub.sink)(Keep.both).run()

  // This keeps the BroadcastHub drained while there are no subscribers
  // (see https://doc.akka.io/docs/akka/current/stream/stream-dynamic.html ):
  broadCastSource.to(Sink.ignore).run()

  def notificationSource(p: Event => Boolean): Source[Unit, NotUsed] = {
    broadCastSource
      .collect { case event if p(event) => () }
      // this buffer is intended to keep the upstream buffers of
      // queue and hub drained:
      .buffer(
        // if a downstream consumer ever becomes too slow to consume,
        // only the latest two notifications are relevant
        size = 2,
        // doesn't really matter whether we drop head or tail
        // as all elements are the same (), it's just important not
        // to backpressure in case of overflow:
        OverflowStrategy.dropHead
      )
  }

  def propagateEvent(
      event: Event
  ): Unit = {
    queue.offer(event).onComplete {
      case Failure(e) =>
        // unexpected backpressure occurred!
        println(e.getMessage)
        e.printStackTrace()
      case _ =>
        ()
    }
  }
}
Since the doc for buffer() says that for DropHead it never backpressures, I would have expected that the upstream buffers remain drained. Yet still I end up with calls to queue.offer() failing because of backpressure.
Reasons that I could think of:
Evaluation of predicate p in .collect causes a lot of load and hence backpressure. This seems very unlikely because those are very simple non-blocking ops.
Overall system is totally overloaded. Also rather unlikely.
I have a feeling I am missing something. Do I maybe need to add an async boundary via .async before or after buffer() to fully decouple the "hub" from possible heavy load that may occur somewhere further downstream?
So after more reading of akka docs and some experiments, I think I found the solution (sorry for maybe asking here too early).
To fully detach my code from any heavy load that may occur somewhere downstream, I need to ensure that any downstream code is not executed by the same actor as the .buffer() (e.g. by inserting .async).
For example, this code would eventually lead to the SourceQueue running full and then backpressuring:
val hub: NotificationHub[Int] = // ...
hub.notificationSource(_ => true)
  .map { x =>
    Thread.sleep(250)
    x
  }
Further inspections showed that this .map() would be executed on the same thread (of the underlying actor) as the upstream .collect() (and .buffer()).
When inserting .async as shown below, the .buffer() would drop elements (as I had intended it to) and the upstream SourceQueue would remain drained:
val hub: NotificationHub[Int] = // ...
hub.notificationSource(_ => true)
  .async
  .map { x =>
    Thread.sleep(250)
    x
  }
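An alternative is to bake the boundary into NotificationHub itself so that callers cannot forget it, by appending .async right after the buffer (a sketch of the adjusted method):
def notificationSource(p: Event => Boolean): Source[Unit, NotUsed] =
  broadCastSource
    .collect { case event if p(event) => () }
    .buffer(size = 2, OverflowStrategy.dropHead)
    .async // async boundary: downstream consumers run detached from the hub's stage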

How to wait for file upload stream to complete in Akka actor

Recently I started using Akka, and I am using it to create a REST API with Akka HTTP to upload a file. The file can have millions of records, and for each record I need to perform some validation and business logic. The way I have modeled my actors is: the root actor receives the file stream, converts the bytes to String, and splits the records on the line separator. It then sends the stream (record by record) to another actor for processing, which in turn distributes the records to other actors based on some grouping. To send the stream from the root actor to the processing actor I am using Sink.actorRefWithAck.
This works fine for a small file, but for a large file I observe that only the first chunk gets processed. If I add a Thread.sleep of a few seconds (sized to the load), then the whole file is processed. I am wondering whether there is any way to know when the stream has been completely consumed by the processing actor, so that I don't have to rely on Thread.sleep. Here is the code snippet I have used:
val AckMessage = DefaultFileUploadProcessActor.Ack

val receiver = context.system.actorOf(
  Props(new DefaultFileUploadProcessActor(uuid, sourceId)(self, ackWith = AckMessage)))

// sent from stream to actor to indicate start, end or failure of stream:
val InitMessage = DefaultFileUploadProcessActor.StreamInitialized
val OnCompleteMessage = DefaultFileUploadProcessActor.StreamCompleted
val onErrorMessage = (ex: Throwable) => DefaultFileUploadProcessActor.StreamFailure(ex)

val actorSink = Sink.actorRefWithAck(
  receiver,
  onInitMessage = InitMessage,
  ackMessage = AckMessage,
  onCompleteMessage = OnCompleteMessage,
  onFailureMessage = onErrorMessage
)

val processStream =
  fileStream
    .map(byte => byte.utf8String.split(System.lineSeparator()))
    .runWith(actorSink)

Thread.sleep(9000)
log.info(s"completed distribution of data to the actors")
sender() ! ActionPerformed(uuid, "Done")
Any expert advice on the approach I have taken will be highly appreciated.
If your Source contains only one file, you can await stream completion by awaiting the Future returned from the runWith method.
If you have Source of multiple files, you should write something like:
filesSource
  .mapAsync(1)(data => (receiver ? data).mapTo[ProcessingResult])
  .mapAsync(1)(processingResult => (resultListener ? processingResult).mapTo[ListenerResponse])
  .runWith(Sink.ignore)
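Note that the ask (?) calls above need akka.pattern.ask imported and an implicit akka.util.Timeout in scope, for example (the value is arbitrary):
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

implicit val timeout: Timeout = Timeout(30.seconds)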
Assuming that fileStream is a Source[ByteString, Future[IOResult]], one idea is to retain the materialized value of the source, then fire off the reply to the sender once this materialized value has completed:
val processStream: Future[IOResult] =
  fileStream
    .map(_.utf8String.split(System.lineSeparator()))
    .to(actorSink)
    .run()

val replyTo = sender() // capture the sender: sender() is not safe to call inside a Future callback

processStream.onComplete {
  case Success(_) =>
    log.info("completed distribution of data to the actors")
    replyTo ! ActionPerformed(uuid, "Done")
  case Failure(t) =>
    // ...
}
The above approach ensures that the entire file is consumed before the sender is notified.
Note that Akka Streams has a Framing object that can parse lines from a ByteString stream:
val processStream: Future[IOResult] =
  fileStream
    .via(Framing.delimiter(
      ByteString(System.lineSeparator()),
      maximumFrameLength = 256,
      allowTruncation = true))
    .map(_.utf8String)
    .to(actorSink) // the actor will have to expect String, not Array[String], messages
    .run()
The receiver actor will receive the OnCompleteMessage or the onErrorMessage result when the stream completes successfully or fails, so you should handle those messages in the receive block of the DefaultFileUploadProcessActor.
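For reference, a hedged sketch of what that receive block could look like, assuming the protocol messages are defined on the actor's companion object as used in the question and that the stream emits one String per line (the Framing variant above):
class DefaultFileUploadProcessActor( /* ... */ ) extends Actor with ActorLogging {
  import DefaultFileUploadProcessActor._

  def receive: Receive = {
    case StreamInitialized =>
      sender() ! Ack // signal readiness for the first element
    case line: String =>
      // per-record validation and business logic goes here
      sender() ! Ack // request the next element
    case StreamCompleted =>
      log.info("upload stream completed")
    case StreamFailure(ex) =>
      log.error(ex, "upload stream failed")
  }
}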

How can Akka streams be materialized continually?

I am using Akka Streams in Scala to poll from an AWS SQS queue using the AWS Java SDK. I created an ActorPublisher which dequeues messages on a two second interval:
class SQSSubscriber(name: String) extends ActorPublisher[Message] with ActorLogging {
  implicit val materializer = ActorMaterializer()

  val schedule = context.system.scheduler.schedule(0 seconds, 2 seconds, self, "dequeue")

  val client = new AmazonSQSClient()
  client.setRegion(RegionUtils.getRegion("us-east-1"))
  val url = client.getQueueUrl(name).getQueueUrl

  val MaxBufferSize = 100
  var buf = Vector.empty[Message]

  override def receive: Receive = {
    case "dequeue" =>
      val messages = iterableAsScalaIterable(
        client.receiveMessage(new ReceiveMessageRequest(url)).getMessages).toList
      messages.foreach(self ! _)
    case message: Message if buf.size == MaxBufferSize =>
      log.error("The buffer is full")
    case message: Message =>
      if (buf.isEmpty && totalDemand > 0)
        onNext(message)
      else {
        buf :+= message
        deliverBuf()
      }
    case Request(_) =>
      deliverBuf()
    case Cancel =>
      context.stop(self)
  }

  @tailrec final def deliverBuf(): Unit =
    if (totalDemand > 0) {
      if (totalDemand <= Int.MaxValue) {
        val (use, keep) = buf.splitAt(totalDemand.toInt)
        buf = keep
        use foreach onNext
      } else {
        val (use, keep) = buf.splitAt(Int.MaxValue)
        buf = keep
        use foreach onNext
        deliverBuf()
      }
    }
}
In my application, I am attempting to run the flow at a 2 second interval as well:
val system = ActorSystem("system")
val sqsSource = Source.actorPublisher[Message](SQSSubscriber.props("queue-name"))
val flow = Flow[Message]
  .map { elem => system.log.debug(s"${elem.getBody} (${elem.getMessageId})"); elem }
  .to(Sink.ignore)

system.scheduler.schedule(0 seconds, 2 seconds) {
  flow.runWith(sqsSource)(ActorMaterializer()(system))
}
However, when I run my application I receive java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds] and subsequent dead-letter notices, which are caused by the ActorMaterializer.
Is there a recommended approach for continually materializing an Akka Stream?
I don't think you need to create a new ActorPublisher every 2 seconds. This seems redundant and wasteful of memory. Also, I don't think an ActorPublisher is necessary. From what I can tell of the code, your implementation will have an ever growing number of Streams all querying the same data. Each Message from the client will be processed by N different akka Streams and, even worse, N will grow over time.
Iterator For Infinite Loop Querying
You can get the same behavior from your ActorPublisher by using scala's Iterator. It is possible to create an Iterator which continuously queries the client:
//setup the client
val client = {
  val sqsClient = new AmazonSQSClient()
  sqsClient setRegion (RegionUtils getRegion "us-east-1")
  sqsClient
}
val url = client.getQueueUrl(name).getQueueUrl

//single query
def queryClientForMessages: Iterable[Message] = iterableAsScalaIterable {
  client.receiveMessage(new ReceiveMessageRequest(url)).getMessages
}

def messageListIterator: Iterator[Iterable[Message]] =
  Iterator continually queryClientForMessages

//messages one-at-a-time "on demand", no timer pushing you around
def messageIterator(): Iterator[Message] = messageListIterator flatMap identity
This implementation only queries the client when all previous Messages have been consumed and is therefore truly reactive. No need to keep track of a buffer with fixed size. Your solution needs a buffer because the creation of Messages (via a timer) is de-coupled from the consumption of Messages (via println). In my implementation, creation & consumption are tightly coupled via back-pressure.
Akka Stream Source
You can then use this Iterator generator-function to feed an akka stream Source:
def messageSource: Source[Message, _] = Source.fromIterator(() => messageIterator())
Flow Formation
And finally you can use this Source to perform the println (As a side note: your flow value is actually a Sink since Flow + Sink = Sink). Using your flow value from the question:
messageSource runWith flow
One akka Stream processing all messages.
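If you still want to pace the polling roughly the way the original 2-second timer did, a throttle stage can be inserted before running the stream (a sketch; the numbers are illustrative):
import akka.stream.ThrottleMode
import scala.concurrent.duration._

messageSource
  .throttle(10, 2.seconds, 10, ThrottleMode.Shaping) // at most 10 messages per 2 seconds
  .runWith(flow)                                     // `flow` is the Sink defined in the question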

Iteratees Error/Exception Handling vs Reactive Streams/akka-stream

Surprise, surprise: I'm having a few problems with Iteratees and error handling.
The problem:
Read some bytes from an InputStream (from the network, it must be an InputStream), do some chunking/grouping on this InputStream (for work distribution), followed by a transformation into a case class DataBlock(blockNum: Int, data: ByteString) for sending to actors (Array[Byte] converted to a CompactByteString).
The flow:
InputStream.read -- bytes --> Group -- 1000 byte blocks --> Transform -- DataBlock --> Actors
The code:
class IterateeTest {
  val actor: ActorRef = myDataBlockRxActor(...)
  val is: InputStream = new InputStream(fromBytes...)

  val source = Enumerator.fromStream(is)
  val chunker = Traversable.takeUpTo[Array[Byte]](1000)

  val transform: Iteratee[Array[Byte], Int] = Iteratee.fold[Array[Byte], Int](0) {
    (bNum, bytes) =>
      actor ! DataBlock(bNum, CompactByteString(bytes))
      bNum + 1
  }

  val fut = source &> chunker |>> transform
}

case class DataBlock(blockNum: Int, data: CompactByteString)
The question:
My current Iteratee code works well. However, I want to be able to handle failures on either side:
1. When the InputStream read method fails, I want to know how many bytes/blocks have been processed successfully and resume reading the stream from that point. When the read in the Enumerator throws an error, fut just returns the exception; there is no state, so I don't know which block I am up to unless I pass it to the receiving actor (which I don't want to do).
2. If the output side fails or can no longer receive DataBlock messages because the Actor's buffer is full, hold off reading from the input stream.
How should I do this?
Would I be better off trying this with Reactive Streams / akka-stream (experimental) or scalaz iteratees instead of Play's Iteratees, given that I need well-defined error handling?
(1) can be implemented with an Enumeratee.
val (counter, getCount) = {
  var count = 0
  (Enumeratee.map { x => count += 1; x },
   () => count)
}

val fut = source &> counter &> chunker |>> transform
You can then use getCount inside a recover or such on fut.
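For example (a sketch; it assumes an implicit ExecutionContext is in scope):
val result: Future[Int] = fut.flatMap(_.run) // run the final Iteratee to obtain the block count

result.recover {
  case e: Throwable =>
    val processed = getCount()
    // log or persist `processed` so a later run can resume from that block
    processed
}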
You get (2) for free with Play Iteratees. No further reads will happen until the Iteratee is ready for more data, and if it fails no more reads will occur. The InputStream is automatically closed when the Enumerator is done, whether from failure or normal termination.