RestartSource masking the materialized value of the wrapped source?

I am modifying an existing stream graph by adding some retry logic around various pieces of functionality. One of those pieces is the source, which in this case happens to be a Kafka Consumer.committableSource from the Alpakka Kafka connector. Downstream, the graph expects a Source[ConsumerMessage.CommittableMessage[String, AnyRef], Control], but when I wrap the committable source in a RestartSource I end up with a Source[ConsumerMessage.CommittableMessage[String, AnyRef], NotUsed].
I tried adding (Keep.both) on the end, but ended up with a compile-time error. Here are the two examples for reference:
val restartSource: Source[ConsumerMessage.CommittableMessage[String, AnyRef], NotUsed] =
  RestartSource.onFailuresWithBackoff(
    minBackoff = 3.seconds,
    maxBackoff = 60.seconds,
    randomFactor = 0.2
  ) { () => Consumer.committableSource(consumerSettings, subscription) }

val s: Source[ConsumerMessage.CommittableMessage[String, AnyRef], Control] =
  Consumer.committableSource(consumerSettings, subscription)

As you have observed, and as discussed in this currently open ticket, the materialized value of the original Source is not exposed in the materialized value of the wrapping RestartSource. To get around this, try using mapMaterializedValue to capture the inner Control (disclaimer: I didn't test the following; note that the outer materialized value can only be an Option[Control], since no Control exists until the inner source has first materialized):
val restartSource: Source[ConsumerMessage.CommittableMessage[String, AnyRef], Option[Control]] = {
  @volatile var control: Option[Control] = None

  RestartSource.onFailuresWithBackoff(
    minBackoff = 3.seconds,
    maxBackoff = 60.seconds,
    randomFactor = 0.2
  ) { () =>
    Consumer
      .committableSource(consumerSettings, subscription)
      .mapMaterializedValue { c =>
        // capture the Control of the current (re)materialization
        control = Some(c)
        c
      }
  }
  .mapMaterializedValue(_ => control) // may still be None if read before the inner source has started
}

You could preMaterialize the Source, which will yield the Control, like so (the example below is in Java):
Pair<Consumer.Control, Source<ConsumerMessage.CommittableOffset, NotUsed>> controlSourcePair =
    origSrc.preMaterialize(materializer);

Source<ConsumerMessage.CommittableOffset, NotUsed> source =
    RestartSource.withBackoff(
        Duration.ofSeconds(1),
        Duration.ofSeconds(10),
        0.2,
        20,
        controlSourcePair::second);

source
    .toMat(Committer.sink(CommitterSettings.create(system).withMaxBatch(1)), Keep.both())
    .mapMaterializedValue(pair ->
        Consumer.createDrainingControl(
            new Pair<>(controlSourcePair.first(), pair.second())))
    .run(materializer);
Apologies for not providing you with the Scala equivalent.
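For reference, a rough Scala sketch of the same approach (untested; it assumes the consumerSettings and subscription values from the question, an ActorSystem named system, and an implicit Materializer in scope):

import akka.kafka.CommitterSettings
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.stream.scaladsl.{Keep, RestartSource}
import scala.concurrent.duration._

// Pre-materialize the Kafka source so the Control is available up front.
val (control, preMaterialized) =
  Consumer
    .committableSource(consumerSettings, subscription)
    .preMaterialize()

// Wrap the already-materialized source in a RestartSource.
val restartable =
  RestartSource.withBackoff(1.second, 10.seconds, 0.2, maxRestarts = 20)(() => preMaterialized)

// Combine the pre-materialized Control with the sink's completion future.
val drainingControl =
  restartable
    .map(_.committableOffset)
    .toMat(Committer.sink(CommitterSettings(system).withMaxBatch(1)))(Keep.right)
    .mapMaterializedValue(done => Consumer.DrainingControl((control, done)))
    .run()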

Related

Akka RestartSource does not restart

import akka.actor.ActorSystem
import akka.stream.scaladsl.{RestartSource, Sink, Source}

import scala.concurrent.ExecutionContext
import scala.concurrent.duration.{Duration, SECONDS}

object TestSource {
  implicit val ec = ExecutionContext.global

  def main(args: Array[String]): Unit = {
    def buildSource = {
      println("fresh")
      Source(List(() => 1, () => 2, () => 3, () => {
        println("crash")
        throw new RuntimeException(":(((")
      }))
    }

    val restarting = RestartSource.onFailuresWithBackoff(
      minBackoff = Duration(1, SECONDS),
      maxBackoff = Duration(1, SECONDS),
      randomFactor = 0.0,
      maxRestarts = 10
    )(() => buildSource)

    implicit val actorSystem: ActorSystem = ActorSystem()
    implicit val executionContext: ExecutionContext = actorSystem.dispatcher
    restarting.runWith(Sink.foreach(e => println(e())))
  }
}
The code above prints: 1, 2, 3, crash.
Why does my source not restart?
This is pretty much a 1:1 copy of the official documentation.
edit:
I also tried
val rs = RestartSink.withBackoff[() => Int](
  Duration(1, SECONDS),
  Duration(1, SECONDS),
  0.0,
  10
)(_)

val rsDone = rs(() => {
  println("???")
  Sink.foreach(e => println(e()))
})

restarting.runWith(rsDone)
but still get no restarts
This is because the exception is triggered outside of the Source created by buildSource: it happens in the Sink.foreach, when you call the functions emitted by the Source.
Try this:
val restarting = RestartSource.onFailuresWithBackoff(
  minBackoff = Duration(1, SECONDS),
  maxBackoff = Duration(1, SECONDS),
  randomFactor = 0.0,
  maxRestarts = 10
)(() => {
  buildSource
    .map(e => e()) // call the functions inside the RestartSource
})
That way your exception will happen inside the inner Source wrapped by RestartSource and the restarting mechanism will kick in.
The source doesn't restart because your source never fails, and therefore never needs to restart: the exception is thrown when Sink.foreach evaluates the function it receives.
As artur noted, if you can move the failing bit into the source, you can wrap everything up to the sink in the RestartSource.
While it won't help for this contrived example (restarting a sink doesn't resend previously sent messages), wrapping the sink in a RestartSink may be useful in real-world cases where this sort of thing can happen (off the top of my head, a stream from Kafka blowing up because an offset commit in the sink failed, e.g. after a rebalance, would be an example of such a case).
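A minimal sketch of that idea, assuming Alpakka Kafka's Committer.sink as the committing sink (untested; system is an assumed ActorSystem):

// Wrap the committing sink in a RestartSink so a failed commit rebuilds the
// sink with backoff. Caveat: elements in flight when the sink fails may be lost.
val restartingCommitSink: Sink[ConsumerMessage.Committable, NotUsed] =
  RestartSink.withBackoff(
    minBackoff = 1.second,
    maxBackoff = 10.seconds,
    randomFactor = 0.2
  )(() => Committer.sink(CommitterSettings(system)))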
An alternative: if you want to restart the whole stream when any part fails, and the stream materializes as a Future, you can implement retry-with-backoff on the failed future.
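Something along these lines (a sketch of that idea; retryWithBackoff and runStream are names I made up):

import akka.actor.ActorSystem
import akka.pattern.after
import scala.concurrent.Future
import scala.concurrent.duration._

// Rerun the whole stream after a failure, doubling the delay each time.
// runStream must materialize a fresh stream on every invocation.
def retryWithBackoff[T](attempts: Int, backoff: FiniteDuration)(runStream: () => Future[T])(
    implicit system: ActorSystem): Future[T] = {
  import system.dispatcher
  runStream().recoverWith {
    case _ if attempts > 0 =>
      after(backoff, system.scheduler)(retryWithBackoff(attempts - 1, backoff * 2)(runStream))
  }
}

// usage: retryWithBackoff(5, 1.second)(() => source.runWith(Sink.ignore))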
The Source just never crashes, as already said here.
You are actually crashing your sink, not the source, with the statement e => e().
This happens when the lambda above is applied to the last element of the source:

java.lang.RuntimeException: :(((
Here's the same stream without an unhandled exception in the sink:

...
RestartSource.withBackoff(
...

restarting.runWith(
  Sink.foreach(e => {
    def i: Int =
      try { e() }
      catch {
        case t: Throwable =>
          println(t)
          -1
      }
    println(i)
  })
)
Works perfectly.

Akka Streams recreate stream in case of stage failure

I have a very simple Akka Streams flow which reads a message from Kafka using Alpakka, performs some manipulation on the message and indexes it into Elasticsearch.
I'm using a CommittableSource, so I'm operating with an at-least-once strategy. I commit my offset only when indexing into ES succeeds; if it fails, I will read the message again, since the consumer resumes from the latest known offset.
val decider: Supervision.Decider = {
  case _: Throwable => Supervision.Restart
  case _ => Supervision.Restart
}
val config: Config = context.system.settings.config.getConfig("akka.kafka.consumer")

val flow: Flow[CommittableMessage[String, String], Done, NotUsed] =
  Flow[CommittableMessage[String, String]]
    .map(msg => Event(msg.committableOffset, Success(Json.parse(msg.record.value()))))
    .mapAsync(10) { event => indexEvent(event.json.get).map(f => event.copy(json = f)) }
    .mapAsync(10)(f => {
      f.json match {
        case Success(_) => f.committableOffset.commitScaladsl()
        case Failure(ex) => throw new StreamFailedException(ex.getMessage, ex)
      }
    })
val r: Flow[CommittableMessage[String, String], Done, NotUsed] = RestartFlow.onFailuresWithBackoff(
  minBackoff = 3.seconds,
  maxBackoff = 3.seconds,
  randomFactor = 0.2, // adds 20% "noise" to vary the intervals slightly
  maxRestarts = 20 // limits the amount of restarts to 20
)(() => {
  println("Creating flow")
  flow
})

val consumerSettings: ConsumerSettings[String, String] =
  ConsumerSettings(config, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("group1")
    .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val restartSource: Source[CommittableMessage[String, String], NotUsed] = RestartSource.withBackoff(
  minBackoff = 3.seconds,
  maxBackoff = 30.seconds,
  randomFactor = 0.2, // adds 20% "noise" to vary the intervals slightly
  maxRestarts = 20 // limits the amount of restarts to 20
) { () =>
  Consumer.committableSource(consumerSettings, Subscriptions.topics("test"))
}

implicit val mat: ActorMaterializer =
  ActorMaterializer(ActorMaterializerSettings(context.system).withSupervisionStrategy(decider))

restartSource
  .via(flow)
  .toMat(Sink.ignore)(Keep.both).run()
What I would like to achieve is to restart the entire stream (Source -> Flow -> Sink) if, for any reason, I was not able to index the message in Elasticsearch.
I tried the following:

Supervision.Decider - it looks like the flow was recreated, but no message was pulled from Kafka, apparently because the consumer remembers its offset.
RestartSource - doesn't help either, because the exception happens in the flow stage.
RestartFlow - doesn't help as well, because it restarts only the Flow, but I need to restart the Source from the last successful offset.

Is there any elegant way to do that?
You can combine a restartable source, flow & sink. Nothing prevents you from making each part of the graph restartable, and the source factory can wrap more than just the source itself.
Update: code example

val sourceFactory = () =>
  Source(1 to 10).via(Flow.fromFunction(x => { println("problematic flow"); x }))

RestartSource.withBackoff(4.seconds, 4.seconds, 0.2)(sourceFactory)
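Applied to the stream in the question (a sketch built from the question's own vals, untested): move the indexing flow inside the RestartSource factory, so that a failure while indexing tears down the Kafka consumer and re-creates it from the last committed offset:

// Wrap source *and* flow together: when `flow` throws, the whole wrapped
// section, including the Kafka consumer, is rebuilt with backoff.
val restartableWhole: Source[Done, NotUsed] =
  RestartSource.withBackoff(
    minBackoff = 3.seconds,
    maxBackoff = 30.seconds,
    randomFactor = 0.2,
    maxRestarts = 20
  ) { () =>
    Consumer
      .committableSource(consumerSettings, Subscriptions.topics("test"))
      .via(flow) // the failing stage now lives inside the restartable section
  }

restartableWhole.runWith(Sink.ignore)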

Akka Streams: Concatenating streams that initialise resources

I have two streams: A and B.
A is writing a file to disk and appends its elements during its execution.
B is reading that file and does some processing with the data.
Only AFTER A is done processing and writing its data to the file do I want to start B.
I tried to concat the two streams with:
A.concat(
  Source.lazily { () =>
    println("B is getting initialised")
    getStreamForB()
  }
)
But this is already initialising B BEFORE A has finished.
There is a ticket tracking the fact that Source#concat does not support lazy materialization. That ticket mentions the following work-around:
implicit class SourceLazyOps[E, M](val src: Source[E, M]) {
  def concatLazy[M1](src2: => Source[E, M1]): Source[E, NotUsed] =
    Source(List(() => src, () => src2)).flatMapConcat(_())
}
Applying the above implicit class to your case:
A.concatLazy(
  Source.lazily { () =>
    println("B is getting initialised")
    getStreamForB()
  }
)
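To convince yourself of the laziness, here is a self-contained toy example (my addition; it assumes the SourceLazyOps implicit class above and an implicit Materializer in scope). Nothing should be printed for the second source until the first one completes:

// The by-name parameter of concatLazy defers building the second source until
// flatMapConcat reaches it, i.e. after the first source has completed.
Source(1 to 3)
  .concatLazy {
    println("building the second source")
    Source(4 to 6)
  }
  .runForeach(println)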
The FileIO.toPath method will materialize the stream into a Future[IOResult]. If you are working with stream A that is writing to a file:
val someDataSource: Source[ByteString, _] = ???
val filePath: Path = ???
val fileWriteOptions: Set[OpenOption] = ???

val A: Future[IOResult] =
  someDataSource
    .runWith(FileIO.toPath(filePath, fileWriteOptions)) // runWith keeps the sink's materialized value
You can use the materialized Future to kick off your stream B once the writing is completed:
val fileReadOptions: Set[OpenOption] = ???
val someProcessingWithTheDataOfB: Sink[ByteString, _] = ???

A.foreach { _ =>
  val B: Future[IOResult] =
    FileIO
      .fromPath(filePath, fileReadOptions)
      .to(someProcessingWithTheDataOfB)
      .run()
}
Similarly, you could test the IOResult before doing the reading, to make sure there were no failures during the writing process:

A.filter(ioResult => ioResult.status.isSuccess)
  .foreach { _ =>
    val B: Future[IOResult] =
      FileIO
        .fromPath(filePath, fileReadOptions)
        .to(someProcessingWithTheDataOfB)
        .run()
  }

File Upload and processing using akka-http websockets

I'm using some sample Scala code to make a server that receives a file over a websocket, stores the file temporarily, runs a bash script on it, and then returns the script's stdout as a TextMessage.
Sample code was taken from this github project.
I edited the code slightly within echoService so that it runs another function that processes the temporary file.
object WebServer {
  def main(args: Array[String]) {
    implicit val actorSystem = ActorSystem("akka-system")
    implicit val flowMaterializer = ActorMaterializer()

    val interface = "localhost"
    val port = 3000

    import Directives._

    val route = get {
      pathEndOrSingleSlash {
        complete("Welcome to websocket server")
      }
    } ~
      path("upload") {
        handleWebSocketMessages(echoService)
      }

    val binding = Http().bindAndHandle(route, interface, port)
    println(s"Server is now online at http://$interface:$port\nPress RETURN to stop...")
    StdIn.readLine()

    binding.flatMap(_.unbind()).onComplete(_ => actorSystem.shutdown())
    println("Server is down...")
  }

  implicit val actorSystem = ActorSystem("akka-system")
  implicit val flowMaterializer = ActorMaterializer()

  val echoService: Flow[Message, Message, _] = Flow[Message].mapConcat {
    case BinaryMessage.Strict(msg) => {
      val decoded: Array[Byte] = msg.toArray
      val imgOutFile = new File("/tmp/" + "filename")
      val fileOuputStream = new FileOutputStream(imgOutFile)
      fileOuputStream.write(decoded)
      fileOuputStream.close()
      TextMessage(analyze(imgOutFile))
    }
    case BinaryMessage.Streamed(stream) => {
      stream
        .limit(Int.MaxValue) // Max frames we are willing to wait for
        .completionTimeout(50 seconds) // Max time until last frame
        .runFold(ByteString(""))(_ ++ _) // Merges the frames
        .flatMap { (msg: ByteString) =>
          val decoded: Array[Byte] = msg.toArray
          val imgOutFile = new File("/tmp/" + "filename")
          val fileOuputStream = new FileOutputStream(imgOutFile)
          fileOuputStream.write(decoded)
          fileOuputStream.close()
          Future(Source.single(""))
        }
      TextMessage(analyze(imgOutFile))
    }
  }

  private def analyze(imgfile: File): String = {
    val p = Runtime.getRuntime.exec(Array("./run-vision.sh", imgfile.toString))
    val br = new BufferedReader(new InputStreamReader(p.getInputStream, StandardCharsets.UTF_8))
    try {
      val result = Stream
        .continually(br.readLine())
        .takeWhile(_ ne null)
        .mkString
      result
    } finally {
      br.close()
    }
  }
}
During testing with Dark WebSocket Terminal, the BinaryMessage.Strict case works fine.
Problem: However, the BinaryMessage.Streamed case doesn't finish writing the file before running the analyze function, resulting in a blank response from the server.
I'm trying to wrap my head around how Futures are being used here with the Flows in Akka HTTP, but I'm not having much luck beyond trying to get through all the official documentation.
Currently, .mapAsync seems promising, or basically finding a way to chain futures.
I'd really appreciate some insight.
Yes, mapAsync will help you on this occasion. It is a combinator that executes Futures (potentially in parallel) in your stream and presents their results on the output side.
In your case, to make things homogeneous and keep the type checker happy, you'll need to wrap the result of the Strict case in a Future.successful.
A quick fix for your code could be:
val echoService: Flow[Message, Message, _] = Flow[Message].mapAsync(parallelism = 5) {
  case BinaryMessage.Strict(msg) => {
    val decoded: Array[Byte] = msg.toArray
    val imgOutFile = new File("/tmp/" + "filename")
    val fileOuputStream = new FileOutputStream(imgOutFile)
    fileOuputStream.write(decoded)
    fileOuputStream.close()
    Future.successful(TextMessage(analyze(imgOutFile)))
  }
  case BinaryMessage.Streamed(stream) =>
    stream
      .limit(Int.MaxValue) // Max frames we are willing to wait for
      .completionTimeout(50 seconds) // Max time until last frame
      .runFold(ByteString(""))(_ ++ _) // Merges the frames
      .flatMap { (msg: ByteString) =>
        val decoded: Array[Byte] = msg.toArray
        val imgOutFile = new File("/tmp/" + "filename")
        val fileOuputStream = new FileOutputStream(imgOutFile)
        fileOuputStream.write(decoded)
        fileOuputStream.close()
        Future.successful(TextMessage(analyze(imgOutFile)))
      }
}
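One further thought (my addition, not part of the original answer): analyze blocks on an external process, so it may be worth running it on a dedicated blocking dispatcher rather than completing it inline, e.g.:

// Sketch: push the blocking analyze call onto Akka's default blocking IO
// dispatcher so the stream's threads are not tied up by the external process.
val blockingEc: ExecutionContext =
  actorSystem.dispatchers.lookup("akka.actor.default-blocking-io-dispatcher")

// ...then in both cases of the flow:
// Future(TextMessage(analyze(imgOutFile)))(blockingEc)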

How to switch between multiple Sources?

Suppose I have two infinite Sources of the same type which can be connected to one Graph. I want to switch between them from outside the already materialized graph, in the same way as it is possible to shut one of them down with a KillSwitch.
val source1: Source[ByteString, NotUsed] = ???
val source2: Source[ByteString, NotUsed] = ???

val (switcher: Switcher, source: Source[ByteString, NotUsed]) =
  Source.combine(source1, source2).withSwitcher.run()

switcher.switch()
By default I want to use source1, and after the switch I want to consume data from source2:
source1
\
switcher ~> source
source2
Is it possible to implement this logic with Akka Streams?
Ok, after some time I found the solution.
Here I can use the same principle as we have in VLANs: I just need to tag my sources and then pass them through a MergeHub. After that, it's easy to filter those sources by tag and produce the right result as a Source.
All I need to do to switch from one Source to another is change the filter condition.
source1.map(s => (tag1, s))
\
MergeHub.filter(_._1 == tagX).map(_._2) -> Source
/
source2.map(s => (tag2, s))
Here is an example:

object SomeSource {
  @volatile private var current = "tag1" // read from stream threads, hence @volatile

  val source1: Source[ByteString, NotUsed] = ???
  val source2: Source[ByteString, NotUsed] = ???

  def switch(): Unit = {
    current = if (current == "tag1") "tag2" else "tag1"
  }

  val (sink: Sink[(String, ByteString), NotUsed],
       source: Source[ByteString, NotUsed]) =
    MergeHub.source[(String, ByteString)]
      .filter(_._1 == current)
      .via(Flow[(String, ByteString)].map(_._2))
      .toMat(BroadcastHub.sink[ByteString])(Keep.both).run()

  source1.map(s => ("tag1", s)).runWith(sink)
  source2.map(s => ("tag2", s)).runWith(sink)
}
SomeSource.source // do something with Source
SomeSource.switch() // then switch
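A note on thread safety (my addition): current is mutated from outside the stream and read inside it. The @volatile above covers visibility; an AtomicReference is a sturdier alternative that also makes the toggle explicit:

import java.util.concurrent.atomic.AtomicReference

// Same switching idea, with an AtomicReference instead of a plain var.
val current = new AtomicReference("tag1")

def switch(): Unit =
  current.set(if (current.get == "tag1") "tag2" else "tag1")

// ...and filter in the hub with: .filter(_._1 == current.get)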