object TestSource {
implicit val ec = ExecutionContext.global
def main(args: Array[String]): Unit = {
def buildSource = {
println("fresh")
Source(List(() => 1,() => 2,() => 3,() => {
println("crash")
throw new RuntimeException(":(((")
}))
}
val restarting = RestartSource.onFailuresWithBackoff(
minBackoff = Duration(1, SECONDS) ,
maxBackoff = Duration(1, SECONDS),
randomFactor = 0.0,
maxRestarts = 10
)(() => {
buildSource
})
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val executionContext: ExecutionContext = actorSystem.dispatcher
restarting.runWith(Sink.foreach(e => println(e())))
}
}
The code above prints: 1,2,3, crash
Why does my source not restart?
This is pretty much a 1:1 copy of the official documentation.
edit:
I also tried
val rs = RestartSink.withBackoff[() => Int](
Duration(1, SECONDS),
Duration(1, SECONDS),
0.0,
10
)(_)
val rsDone = rs(() => {
println("???")
Sink.foreach(e => println(e()))
})
restarting.runWith(rsDone)
but still get no restarts
This is because the exception is thrown outside the Source returned by buildSource: it happens in Sink.foreach, when you call the functions emitted by the source.
Try this:
val restarting = RestartSource.onFailuresWithBackoff(
minBackoff = Duration(1, SECONDS) ,
maxBackoff = Duration(1, SECONDS),
randomFactor = 0.0,
maxRestarts = 10
)(() => {
buildSource
.map(e => e()) //call the functions inside the RestartSource
})
That way your exception will happen inside the inner Source wrapped by RestartSource and the restarting mechanism will kick in.
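To make the distinction concrete without Akka, here is a minimal sketch using only the Scala standard library. The names DeferredFailure, emitAll and callAll are made up for illustration; the list plays the role of the source's elements.

```scala
import scala.util.Try

object DeferredFailure {
  // Same shape as the elements in the question: functions that run only when called.
  val thunks: List[() => Int] =
    List(() => 1, () => 2, () => 3, () => throw new RuntimeException(":((("))

  // "Emitting" the elements only passes the function values along, so nothing throws here.
  def emitAll(): Int = thunks.map(identity).size

  // Calling each function is the step that can throw; Try records where it happens.
  def callAll(): List[Try[Int]] = thunks.map(f => Try(f()))
}
```

Passing the functions downstream never fails; only the call does. Whichever stage makes that call is the stage that fails, which is why moving the call inside the RestartSource factory matters.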
The source doesn't restart because your source never fails, therefore never needs to restart.
The exception gets thrown when Sink.foreach evaluates the function it received.
As artur noted, if you can move the failing bit into the source, you can wrap everything up to the sink in the RestartSource.
While it won't help in this contrived example (restarting a sink doesn't resend previously sent messages), wrapping the sink in a RestartSink may be useful in real-world cases where this sort of failure can happen. Off the top of my head, a Kafka stream that blows up because the offset commit in a sink failed (e.g. after a rebalance) should be one such case.
Alternatively, if you want to restart the whole stream when any part fails, and the stream materializes as a Future, you can implement retry-with-backoff on the failed future.
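A minimal sketch of retry-with-backoff on a Future, using only the Scala standard library. FutureRetry, retryWithBackoff and its parameters are hypothetical names chosen for illustration, not an Akka API, and doubling the delay on each attempt is just one possible backoff policy.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FutureRetry {
  // Daemon scheduler thread, so a pending retry never keeps the JVM alive.
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "future-retry-scheduler")
      t.setDaemon(true)
      t
    }
  })

  // Starts `body` after `delay` and returns a Future of its eventual result.
  private def after[T](delay: FiniteDuration)(body: => Future[T]): Future[T] = {
    val promise = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit =
        promise.completeWith(try body catch { case NonFatal(e) => Future.failed(e) })
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    promise.future
  }

  // Runs `runStream`; on failure waits `backoff`, then tries again with the delay
  // doubled (capped at `maxBackoff`) until `maxRetries` retries are used up.
  def retryWithBackoff[T](maxRetries: Int, backoff: FiniteDuration, maxBackoff: FiniteDuration)(
      runStream: () => Future[T])(implicit ec: ExecutionContext): Future[T] =
    Future.unit.flatMap(_ => runStream()).recoverWith {
      case NonFatal(_) if maxRetries > 0 =>
        after(backoff)(
          retryWithBackoff(maxRetries - 1, (backoff * 2L).min(maxBackoff), maxBackoff)(runStream))
    }
}
```

With the stream from the question, the call would look something like FutureRetry.retryWithBackoff(10, 1.second, 1.second)(() => source.runWith(sink)). Each attempt materializes the whole graph again, so the source starts from scratch every time.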
The source just never crashes, as already said here.
You are actually crashing your sink, not the source, with this statement: e => e()
This is what happens when the lambda above is applied to the last element of the source:
java.lang.RuntimeException: :(((
Here's the same stream without an unhandled exception in the sink:
...
RestartSource.withBackoff(
...
restarting.runWith(
Sink.foreach(e => {
def i: Int = try{ e() } catch {
case t: Throwable =>
println(t)
-1
}
println(i)
})
)
Works perfectly.
Related
I am working on the stream processing system below, which grabs frames from one source, processes them, and sends them to another. I'm using a combination of akka-streams and akka-http through their Scala API. The pipeline is very short, but I can't seem to locate where the system decides to stop after precisely 100 requests to the endpoint.
object frameProcessor extends App {
implicit val system: ActorSystem = ActorSystem("VideoStreamProcessor")
val decider: Supervision.Decider = _ => Supervision.Restart
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val dispatcher: ExecutionContextExecutor = system.dispatcher
val http = Http(system)
val sourceConnectionFlow: Flow[HttpRequest, HttpResponse, Future[Http.OutgoingConnection]] = http.outgoingConnection(sourceUri)
val byteFlow: Flow[HttpResponse, Future[ByteString], NotUsed] =
Flow[HttpResponse].map(_.entity.dataBytes.runFold(ByteString.empty)(_ ++ _))
Source.repeat(HttpRequest(uri = sourceUri))
.via(sourceConnectionFlow)
.via(byteFlow)
.map(postFrame)
.runWith(Sink.ignore)
.onComplete(_ => system.terminate())
def postFrame(imageBytes: Future[ByteString]): Unit = {
imageBytes.onComplete{
case Success(res) => system.log.info(s"post frame. ${res.length} bytes")
case Failure(_) => system.log.error("failed to post image!")
}
}
}
For reference, I'm using akka-streams version 2.5.19 and akka-http version 10.1.7. No error is thrown, there are no error codes on the source server where the frames come from, and the program exits with error code 0.
My application.conf is as follows:
logging = "DEBUG"
Always 100 units processed.
Thanks!
Edit
Added logging to the stream like so
.onComplete{
case Success(res) => {
system.log.info(res.toString)
system.terminate()
}
case Failure(res) => {
system.log.error(res.getMessage)
system.terminate()
}
}
Received a connection reset exception but this is inconsistent. The stream completes with Done.
Edit 2
Using .mapAsync(1)(postFrame) I get the same Success(Done) after precisely 100 requests. Additionally, when I check the nginx server access.log and error.log there are only 200 responses.
I had to modify postFrame as follows to run mapAsync
def postFrame(imageBytes: Future[ByteString]): Future[Unit] = {
imageBytes.onComplete{
case Success(res) => system.log.info(s"post frame. ${res.length} bytes")
case Failure(_) => system.log.error("failed to post image!")
}
Future.unit // already completed, so mapAsync does not wait for imageBytes
}
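I believe I have found the answer in the Akka docs, in the section on delayed restarts with a backoff operator. Instead of sourcing directly from an unstable remote connection, I use RestartSource.withBackoff rather than RestartSource.onFailuresWithBackoff: withBackoff restarts the wrapped source when it completes as well as when it fails, and my stream completes normally with Done. The modified stream looks like this: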
val restartSource = RestartSource.withBackoff(
minBackoff = 100.milliseconds,
maxBackoff = 1.seconds,
randomFactor = 0.2
){ () =>
Source.single(HttpRequest(uri = sourceUri))
.via(sourceConnectionFlow)
.via(byteFlow)
.mapAsync(1)(postFrame)
}
restartSource
.runWith(Sink.ignore)
.onComplete{
x => {
println(x)
system.terminate()
}
}
I was not able to find the source of the problem but it seems this will work.
I am modifying an existing stream graph by adding some retry logic around various functionality. One of those pieces is the source, which in this case happens to be a kafka Consumer.committableSource from the alpakka kafka connector. Downstream, the graph is expecting a type of Source[ConsumerMessage.CommittableMessage[String, AnyRef], Control], but when I wrap the committable source in a RestartSource I end up with Source[ConsumerMessage.CommittableMessage[String, AnyRef], NotUsed]
I tried adding (Keep.both) on the end, but ended up with a compile time error. Here are the two examples for reference:
val restartSource: Source[ConsumerMessage.CommittableMessage[String, AnyRef], NotUsed] = RestartSource.onFailuresWithBackoff(
minBackoff = 3.seconds,
maxBackoff = 60.seconds,
randomFactor = .2
) {() => Consumer.committableSource(consumerSettings, subscription)}
val s: Source[ConsumerMessage.CommittableMessage[String, AnyRef], Control] = Consumer.committableSource(consumerSettings, subscription)
As you have observed, and as discussed in this currently open ticket, the materialized value of the original Source is not exposed in the return value of the wrapping RestartSource. To get around this, try using mapMaterializedValue (disclaimer: I didn't test the following):
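// Requires: import java.util.concurrent.atomic.AtomicReference
// The inner source is materialized again on every restart, so keep the latest Control in a reference.
val control = new AtomicReference[Consumer.Control](Consumer.NoopControl)
val restartSource: Source[ConsumerMessage.CommittableMessage[String, AnyRef], NotUsed] =
RestartSource.onFailuresWithBackoff(
minBackoff = 3.seconds,
maxBackoff = 60.seconds,
randomFactor = .2
) { () =>
Consumer
.committableSource(consumerSettings, subscription)
.mapMaterializedValue(c => control.set(c)) // capture the Control of the current incarnation
}
// Once the stream is running, use control.get() wherever a Control is needed, e.g. control.get().shutdown()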
You could preMaterialize the Source which will yield the Control like so:
Pair<Consumer.Control, Source<ConsumerMessage.CommittableOffset, NotUsed>> controlSourcePair =
origSrc.preMaterialize(materializer);
Source<ConsumerMessage.CommittableOffset, NotUsed> source =
RestartSource.withBackoff(
Duration.ofSeconds(1),
Duration.ofSeconds(10),
0.2,
20,
controlSourcePair::second);
source
.toMat(Committer.sink(CommitterSettings.create(system)
.withMaxBatch(1)), Keep.both())
.mapMaterializedValue(pair ->
Consumer.createDrainingControl(
new Pair<>(controlSourcePair.first(), pair.second())))
.run(materializer);
Apologies for not providing you with the Scala equivalent.
I have a very simple Akka Streams flow which reads messages from Kafka using Alpakka, performs some manipulation on each message, and indexes it into Elasticsearch.
I'm using a committableSource, so I have at-least-once semantics. I commit the offset only when indexing to ES succeeds; if it fails, I will read the message again from the latest known offset.
val decider: Supervision.Decider = {
case _:Throwable => Supervision.Restart
case _ => Supervision.Restart
}
val config: Config = context.system.settings.config.getConfig("akka.kafka.consumer")
val flow: Flow[CommittableMessage[String, String], Done, NotUsed] =
Flow[CommittableMessage[String,String]].
map(msg => Event(msg.committableOffset,Success(Json.parse(msg.record.value()))))
.mapAsync(10) { event => indexEvent(event.json.get).map(f=> event.copy(json = f))}
.mapAsync(10)(f => {
f.json match {
case Success(_)=> f.committableOffset.commitScaladsl()
case Failure(ex) => throw new StreamFailedException(ex.getMessage,ex)
}
})
val r: Flow[CommittableMessage[String, String], Done, NotUsed] = RestartFlow.onFailuresWithBackoff(
minBackoff = 3.seconds,
maxBackoff = 3.seconds,
randomFactor = 0.2, // adds 20% "noise" to vary the intervals slightly
maxRestarts = 20 // limits the amount of restarts to 20
)(() => {
println("Creating flow")
flow
})
val consumerSettings: ConsumerSettings[String, String] =
ConsumerSettings(config, new StringDeserializer, new StringDeserializer)
.withBootstrapServers("localhost:9092")
.withGroupId("group1")
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
val restartSource: Source[CommittableMessage[String, String], NotUsed] = RestartSource.withBackoff(
minBackoff = 3.seconds,
maxBackoff = 30.seconds,
randomFactor = 0.2, // adds 20% "noise" to vary the intervals slightly
maxRestarts = 20 // limits the amount of restarts to 20
) {() =>
Consumer.committableSource(consumerSettings, Subscriptions.topics("test"))
}
implicit val mat: ActorMaterializer = ActorMaterializer(ActorMaterializerSettings(context.system).withSupervisionStrategy(decider))
restartSource
.via(flow)
.toMat(Sink.ignore)(Keep.both).run()
What I would like to achieve is to restart the entire stream (Source -> Flow -> Sink) if for any reason I was not able to index a message in Elasticsearch.
I tried the following:
Supervision.Decider - It looks like the flow was recreated, but no message was pulled from Kafka, obviously because the consumer remembers its offset.
RestartSource - Doesn't help either, because the exception happens in the flow stage.
RestartFlow - Doesn't help either, because it restarts only the Flow, but I need to restart the Source from the last successful offset.
Is there any elegant way to do that?
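You can combine restartable sources, flows, and sinks; nothing prevents you from wrapping each part of the graph in its own restartable source/flow/sink. For example, if the source and the flow are built together inside the RestartSource factory, a failure in the flow restarts both.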
Update:
code example
val sourceFactory = () => Source(1 to 10).via(Flow.fromFunction((x: Int) => { println("problematic flow"); x }))
RestartSource.withBackoff(4.seconds, 4.seconds, 0.2)(sourceFactory)
I have a very simple Akka WebSocket server that pushes lines from a file to a connected client with an interval of 400ms per line. Everything works fine, except for the fact that the web server seems to buffer messages for about a minute before broadcasting them.
So when a client connects, I see at the server end that every 400ms a line is read and pushed to the Sink, but on the client side I get nothing for a minute and then a burst of about 150 messages (corresponding to a minute of messages).
Is there a setting that I'm overlooking?
object WebsocketServer extends App {
implicit val actorSystem = ActorSystem("WebsocketServer")
implicit val materializer = ActorMaterializer()
implicit val executionContext = actorSystem.dispatcher
val file = Paths.get("websocket-server/src/main/resources/EURUSD.txt")
val fileSource =
FileIO.fromPath(file)
.via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
val delayedSource: Source[Strict, Future[IOResult]] =
fileSource
.map { line =>
Thread.sleep(400)
println(line.utf8String)
TextMessage(line.utf8String)
}
def route = path("") {
extractUpgradeToWebSocket { upgrade =>
complete(upgrade.handleMessagesWithSinkSource(
Sink.ignore,
delayedSource)
)
}
}
val bindingFuture = Http().bindAndHandle(route, "localhost", 8080)
bindingFuture.onComplete {
case Success(binding) ⇒
println(s"Server is listening on ws://localhost:8080")
case Failure(e) ⇒
println(s"Binding failed with ${e.getMessage}")
actorSystem.terminate()
}
}
So the approach with Thread.sleep(400) was wrong, since it blocks the thread the stream runs on. I should have used throttle on the source instead:
val delayedSource: Source[Strict, Future[IOResult]] =
fileSource
.throttle(elements = 1, per = 400.millis)
.map { line =>
println(line.utf8String)
TextMessage(line.utf8String)
}
This fixed the issue.
I read this article on akka streams error handling
http://doc.akka.io/docs/akka/2.5.4/scala/stream/stream-error.html
and wrote this code.
val decider: Supervision.Decider = {
case _: Exception => Supervision.Restart
case _ => Supervision.Stop
}
implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer(ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))
val source = Source(1 to 10)
val flow = Flow[Int].map{x => if (x != 9) 2 * x else throw new Exception("9!")}
val sink : Sink[Int, Future[Done]] = Sink.foreach[Int](x => println(x))
val graph = RunnableGraph.fromGraph(GraphDSL.create(sink){implicit builder => s =>
import GraphDSL.Implicits._
source ~> flow ~> s.in
ClosedShape
})
val future = graph.run()
future.onComplete{ _ =>
actorSystem.terminate()
}
Await.result(actorSystem.whenTerminated, Duration.Inf)
This works very well, except that I need to scan the output to see which row did not get processed. Is there a way for me to print/log the row which failed, without putting explicit try/catch blocks in each and every flow that I write?
For example, if I were using actors (as opposed to streams), I could have hooked into an actor life cycle event and logged when the actor restarted, along with the message that was being processed at the time of the restart.
But here I am not using actors explicitly (although they are used internally). Are there life cycle events for a Flow / Source / Sink?
Just a small modification to your code:
val decider: Supervision.Decider = {
case e: Exception =>
println("Exception handled, recovering stream:" + e.getMessage)
Supervision.Restart
case _ => Supervision.Stop
}
If you put meaningful information in the exception messages thrown in the stream (the failing row, for example), you can print it in the supervision decider.
I used println to keep the answer short, but I strongly recommend using a logging library such as scala-logging.
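One way to get the failing row into the exception without a try/catch in every stage is a small reusable wrapper. The sketch below uses only the Scala standard library; ElementFailed and ElementContext.withElement are hypothetical names introduced for illustration, not part of Akka.

```scala
import scala.util.control.NonFatal

// Hypothetical wrapper exception that remembers which element was being processed.
final case class ElementFailed(element: Any, cause: Throwable)
    extends RuntimeException(s"element [$element] failed: ${cause.getMessage}", cause)

object ElementContext {
  // Wraps `f` once, so individual stages need no try/catch of their own.
  def withElement[A, B](f: A => B): A => B = { a =>
    try f(a)
    catch { case NonFatal(e) => throw ElementFailed(a, e) }
  }
}
```

In the question's code this would be used as Flow[Int].map(ElementContext.withElement((x: Int) => if (x != 9) 2 * x else throw new Exception("9!"))), and the decider could then match case ElementFailed(element, cause) to log the element before returning Supervision.Restart.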