I have the following stream, which never reaches the map after flatMapConcat.
private def stream[A](ref: ActorRef[ServerHealthStreamer])(implicit system: ActorSystem[A])
: KillSwitch = {
implicit val materializer = ActorMaterializer()
implicit val dispatcher = materializer.executionContext
system.log.info("=============> Start KafkaDetectorStream <=============")
val addr = system
.settings
.config
.getConfig("kafka")
.getString("servers")
val sink: Sink[ServerHealthEvent, NotUsed] =
ActorSink.actorRefWithAck[ServerHealthEvent, ServerHealthStreamer, Ack](
ref = ref,
onCompleteMessage = Complete,
onFailureMessage = Fail.apply,
messageAdapter = Message.apply,
onInitMessage = Init.apply,
ackMessage = Ack)
Source.tick(1.seconds, 5.seconds, NotUsed)
.flatMapConcat(_ => Source.fromFuture(health(addr)))
.map {
case true =>
KafkaActiveConfirmed
case false =>
KafkaInactiveConfirmed
}
.viaMat(KillSwitches.single)(Keep.right)
.to(sink)
.run()
}
private def health(server: String)(implicit executor: ExecutionContext): Future[Boolean] = {
val props = new Properties
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, server)
props.put(AdminClientConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "10000")
props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000")
Future {
AdminClient
.create(props)
.listTopics()
.names()
.get()
}
.map(_ => true)
.recover {
case _: Throwable => false
}
}
What I mean is that this part:
.map {
case true =>
KafkaActiveConfirmed
case false =>
KafkaInactiveConfirmed
}
never gets executed, and I do not know why. The health method executes as expected.
Try adding .log between flatMapConcat and map to see the emitted elements. log can also log errors and stream cancellation.
https://doc.akka.io/docs/akka/current/stream/operators/Source-or-Flow/log.html
Note that .log uses an implicit logger.
Also, your .flatMapConcat(_ => Source.fromFuture(health(addr))) seems tricky;
try .mapAsyncUnordered(1)(_ => health(addr)) instead.
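Put together, the suggested change would look roughly like this (a sketch only; the stage name "kafka-health" and the log levels are arbitrary choices):
// requires: import akka.stream.Attributes
Source.tick(1.seconds, 5.seconds, NotUsed)
  .mapAsyncUnordered(1)(_ => health(addr))      // replaces flatMapConcat + Source.fromFuture
  .log("kafka-health")                          // logs each element, plus errors and completion
  .withAttributes(Attributes.logLevels(
    onElement = Attributes.LogLevels.Info,
    onFinish = Attributes.LogLevels.Info,
    onFailure = Attributes.LogLevels.Error))
  .map {
    case true => KafkaActiveConfirmed
    case false => KafkaInactiveConfirmed
  }
  .viaMat(KillSwitches.single)(Keep.right)
  .to(sink)
  .run()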
Related
I am a beginner in Akka and have a problem statement to work with. I have an Akka flow that reads Kafka events from a topic and does some transformation before creating a committable offset for each message.
I am not sure of the best way to add an Akka sink on top of this code to store the transformed events in some DB.
def eventTransform : Flow[KafkaMessage,CommittableRecord[Either[Throwable,SomeEvent]],NotUsed]
def processEvents
: Flow[KafkaMessage, ConsumerMessage.CommittableOffset, NotUsed] =
Flow[KafkaMessage]
.via(eventTransform)
.filter({ x =>
x.value match {
case Right(event: SomeEvent) =>
event.status != "running"
case Left(_) => false
}
})
.map(_.message.committableOffset)
This is my Akka source calling the flow:
private val consumerSettings: ConsumerSettings[String, String] = ConsumerSettings(
system,
new StringDeserializer,
new StringDeserializer,
)
.withGroupId(groupId)
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
private val committerSettings: CommitterSettings = CommitterSettings(system)
private val control = new AtomicReference[Consumer.Control](Consumer.NoopControl)
private val restartableSource = RestartSource
.withBackoff(restartSettings) { () =>
Consumer
.committableSource(consumerSettings, Subscriptions.topics(topicIn))
.mapMaterializedValue(control.set)
.via(processEvents) // calling the flow here
}
restartableSource
.toMat(Committer.sink(committerSettings))(Keep.both)
.run()
def api(): Behavior[Message] =
Behaviors.receive[Message] { (_, message) =>
message match {
case Stop =>
context.pipeToSelf(control.get().shutdown())(_ => Stopped)
Behaviors.same
case Stopped =>
Behaviors.stopped
}
}
.receiveSignal {
case (_, PostStop | PreRestart) =>
control.get().shutdown()
Behaviors.same
}
}
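One possible way to add the DB step (a sketch only; writeToDb is a hypothetical function that persists an event and returns a Future, and an implicit ExecutionContext is assumed in scope) is to persist each transformed event with mapAsync before emitting its committable offset, rather than branching off to a separate sink. That way an offset is only committed after the corresponding write has succeeded:
// Hypothetical DB writer; it must return a Future that completes when the write is done.
def writeToDb(event: SomeEvent): Future[Done] = ???

def processEvents: Flow[KafkaMessage, ConsumerMessage.CommittableOffset, NotUsed] =
  Flow[KafkaMessage]
    .via(eventTransform)
    .filter { x =>
      x.value match {
        case Right(event: SomeEvent) => event.status != "running"
        case Left(_) => false
      }
    }
    .mapAsync(parallelism = 1) { record =>       // parallelism 1 keeps offsets in order
      record.value match {
        case Right(event: SomeEvent) =>
          writeToDb(event).map(_ => record.message.committableOffset)
        case Left(_) =>
          Future.successful(record.message.committableOffset)
      }
    }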
Consider the following class:
class MongoDumpService @Inject()(eventsDao: EventDAO)(implicit val ec: ExecutionContext, mat: Materializer) extends LazyLogging {
private[services] def toAssetsWriterSink: Sink[List[Asset], FileDetails] = ParquetService.toParquetSingleFile[List[Asset]](AppConfig.AssetsFileName)
private[services] def toExpenseWriterSink: Sink[List[Expense], FileDetails] = ParquetService.toParquetSingleFile[List[Expense]](AppConfig.ExpensesFileName)
private[services] def toReportsWriterSink: Sink[List[Report], FileDetails] = ParquetService.toParquetSingleFile[List[Report]](AppConfig.ReportsFileName)
private[services] def toTransactionsWriterSink: Sink[List[Transaction], FileDetails] = ParquetService.toParquetSingleFile[List[Transaction]](AppConfig.TransactionsFileName)
private[services] def toEventsWriterSink: Sink[PacificOriginalEvent, FileDetails] = ParquetService.toParquetSingleFile[PacificOriginalEvent](AppConfig.PacificOriginalEventFileName)
def createMongoDump(recordingId: BSONObjectID, maxDocs: Option[Int] = None): List[FileDetails] = RunnableGraph.fromGraph(
GraphDSL.create(toAssetsWriterSink, toExpenseWriterSink, toReportsWriterSink, toTransactionsWriterSink, toEventsWriterSink, sharedKillSwitch.flow[Event])((f1,f2,f3,f4,f5,_) => List(f1,f2,f3,f4,f5)) {
import GraphDSL.Implicits._
implicit builder =>
(writeAssets, writeExpenses, writeReports, writeTransactions, writerEvents, sw) =>
val source = builder.add(eventsDao.getEventsSource(recordingId.stringify, maxDocs))
val broadcast = builder.add(Broadcast[Event](5))
source ~> sw ~> broadcast
broadcast.out(Write.PacificEvents).map(_.pacificEvent) ~> writerEvents
broadcast.out(Write.Expenses).filter(_.expenses.isDefined).map(_.expenses.get) ~> writeExpenses
broadcast.out(Write.Assets).filter(_.assets.isDefined).map(_.assets.get) ~> writeAssets
broadcast.out(Write.Reports).filter(_.reports.isDefined).map(_.reports.get) ~> writeReports
broadcast.out(Write.Transactions).filter(_.transactions.isDefined).map(_.transactions.get) ~> writeTransactions
ClosedShape
}).run()
}
This code returns a List[FileDetails]; it actually writes the Event object, which includes some Option[List[T]] fields, to the files they are supposed to be written to, for example fieldA ~> writerFieldA and so on.
The problem is as follows:
I want to wait until this operation has finished, because otherwise it uploads 0 KB files to S3:
private[actors] def uploadDataToS3(recording: Recording) = {
logger.info(s"Uploading data to S3 with recordingId: ${recording._id.stringify}")
val details = mongoDumpService.createMongoDump(recording._id, recording.limit)
s3Service.uploadFiles(recording._id.stringify, details)
}
Without the GraphDSL I can do runWith, which returns a Future[..].
How can I achieve this with the GraphDSL? (I want to return a Future[List[FileDetails]].)
Edit :
Added toParquetSingleFile
def toParquetSingleFile[In](fileName: String)(implicit
ec: ExecutionContext,
mat: Materializer,
writes: Writes[In]): Sink[In, FileDetails] = {
val absolutePath = TEMP_DIRECTORY + File.separator + s"$fileName.${FileExtension.PARQUET.toSuffix}"
toJsString[In]
.log(s"ParquetService", _ => s"[✍️] - Writing element toParquetSingleFile for path: $absolutePath ...")
.withAttributes(Attributes.logLevels(onFailure = LogLevels.Error, onFinish = LogLevels.Off, onElement = LogLevels.Info))
.to(
ParquetStreams.toParquetSingleFile(
path = absolutePath,
options = ParquetWriter.Options(
writeMode = ParquetFileWriter.Mode.OVERWRITE,
compressionCodecName = CompressionCodecName.GZIP))
).mapMaterializedValue(_ => FileDetails(absolutePath, FileExtension.PARQUET))
}
Solution:
def toParquetSingleFile[In](fileName: String)(implicit ec: ExecutionContext, mat: Materializer, writes: Writes[In]): Sink[In, Future[Option[FileDetails]]] = {
val absolutePath = TEMP_DIRECTORY + File.separator + s"$fileName.${FileExtension.PARQUET.toSuffix}"
toJsString[In]
.toMat(
Sink.lazySink(() => ParquetStreams.toParquetSingleFile(
path = absolutePath,
options = ParquetWriter.Options(
writeMode = ParquetFileWriter.Mode.OVERWRITE,
compressionCodecName = CompressionCodecName.GZIP))
)
)(Keep.right)
.mapMaterializedValue(_.flatten
.map { _ =>
logger.info(s"[ParquetService] - [✍️] Writing file: [$absolutePath] Finished!")
Some(FileDetails(absolutePath, FileExtension.PARQUET))
}
.recover {
case _: NeverMaterializedException => Option.empty[FileDetails]
}
)
}
As far as I can see, this toParquetSingleFile creates a Sink with a Future[Done] as its materialized value, but in your function you return a plain FileDetails instance via mapMaterializedValue. The mapMaterializedValue you want accepts a function of:
mapMaterializedValue(mat: Future[Done] => Mat2)
So if you map the Future[Done] to a Future[FileDetails], the graph will materialize a List[Future[FileDetails]], which you can turn into a Future[List[FileDetails]] with Future.sequence (or another approach).
To simulate your scenario: you have a function that creates a Sink that writes a file and materializes a Future[Done]:
case class FileDetails(absPath: String, fileExtension: Int)
def sink[In]: Sink[In, Future[Done]] = ???
Remove the mapMaterializedValue from your function and you will have something like the above.
Then, create a function that maps that materialized value:
def mapMatValue[In](in: Sink[In, Future[Done]])(implicit ec: ExecutionContext): Sink[In, Future[FileDetails]] =
  in.mapMaterializedValue(result => result.map(_ => FileDetails("path", 0)))
Using that, your createMongoDump will materialize a List[Future[FileDetails]].
And finally, use Future.sequence(list) to obtain a Future[List[FileDetails]]. You could use Future.traverse too.
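Putting the pieces together, a minimal sketch of that approach (Sink.ignore stands in for the real Parquet sink, and the names rawSink, fileSink and toSingleFuture are illustrative):
import akka.Done
import akka.stream.scaladsl.Sink
import scala.concurrent.{ExecutionContext, Future}

case class FileDetails(absPath: String, fileExtension: Int)

// Stand-in for ParquetStreams.toParquetSingleFile: a sink materializing a Future[Done].
def rawSink[In]: Sink[In, Future[Done]] = Sink.ignore

// Map the materialized Future[Done] to a Future[FileDetails].
def fileSink[In](path: String)(implicit ec: ExecutionContext): Sink[In, Future[FileDetails]] =
  rawSink[In].mapMaterializedValue(_.map(_ => FileDetails(path, 0)))

// With five such sinks, GraphDSL.create(...)((f1, ..., f5, _) => List(f1, f2, f3, f4, f5))
// materializes a List[Future[FileDetails]]; Future.sequence flips it into the desired shape.
def toSingleFuture(mats: List[Future[FileDetails]])(implicit ec: ExecutionContext): Future[List[FileDetails]] =
  Future.sequence(mats)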
I'm new to Akka Streams. I used the following code for CSV parsing.
class CsvParser(config: Config)(implicit system: ActorSystem) extends LazyLogging with NumberValidation {
import system.dispatcher
private val importDirectory = Paths.get(config.getString("importer.import-directory")).toFile
private val linesToSkip = config.getInt("importer.lines-to-skip")
private val concurrentFiles = config.getInt("importer.concurrent-files")
private val concurrentWrites = config.getInt("importer.concurrent-writes")
private val nonIOParallelism = config.getInt("importer.non-io-parallelism")
def save(r: ValidReading): Future[Unit] = {
Future()
}
def parseLine(filePath: String)(line: String): Future[Reading] = Future {
val fields = line.split(";")
val id = fields(0).toInt
try {
val value = fields(1).toDouble
ValidReading(id, value)
} catch {
case t: Throwable =>
logger.error(s"Unable to parse line in $filePath:\n$line: ${t.getMessage}")
InvalidReading(id)
}
}
val lineDelimiter: Flow[ByteString, ByteString, NotUsed] =
Framing.delimiter(ByteString("\n"), 128, allowTruncation = true)
val parseFile: Flow[File, Reading, NotUsed] =
Flow[File].flatMapConcat { file =>
val src = FileSource.fromFile(file).getLines()
val source : Source[String, NotUsed] = Source.fromIterator(() => src)
// val gzipInputStream = new GZIPInputStream(new FileInputStream(file))
source
.mapAsync(parallelism = nonIOParallelism)(parseLine(file.getPath))
}
val computeAverage: Flow[Reading, ValidReading, NotUsed] =
Flow[Reading].grouped(2).mapAsyncUnordered(parallelism = nonIOParallelism) { readings =>
Future {
val validReadings = readings.collect { case r: ValidReading => r }
val average = if (validReadings.nonEmpty) validReadings.map(_.value).sum / validReadings.size else -1
ValidReading(readings.head.id, average)
}
}
val storeReadings: Sink[ValidReading, Future[Done]] =
Flow[ValidReading]
.mapAsyncUnordered(concurrentWrites)(save)
.toMat(Sink.ignore)(Keep.right)
val processSingleFile: Flow[File, ValidReading, NotUsed] =
Flow[File]
.via(parseFile)
.via(computeAverage)
def importFromFiles = {
implicit val materializer = ActorMaterializer()
val files = importDirectory.listFiles.toList
logger.info(s"Starting import of ${files.size} files from ${importDirectory.getPath}")
val startTime = System.currentTimeMillis()
val balancer = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val balance = builder.add(Balance[File](concurrentFiles))
val merge = builder.add(Merge[ValidReading](concurrentFiles))
(1 to concurrentFiles).foreach { _ =>
balance ~> processSingleFile ~> merge
}
FlowShape(balance.in, merge.out)
}
Source(files)
.via(balancer)
.withAttributes(ActorAttributes.supervisionStrategy { e =>
logger.error("Exception thrown during stream processing", e)
Supervision.Resume
})
.runWith(storeReadings)
.andThen {
case Success(_) =>
val elapsedTime = (System.currentTimeMillis() - startTime) / 1000.0
logger.info(s"Import finished in ${elapsedTime}s")
case Failure(e) => logger.error("Import failed", e)
}
}
}
I wanted to use Akka HTTP to return all the ValidReading entities parsed from the CSV, but I couldn't figure out how to do that.
The above code fetches a file from the server and parses each line to generate a ValidReading.
How can I pass/upload a CSV via akka-http, parse the file, and stream the resulting response back to the endpoint?
The "essence" of the solution is something like this:
import akka.http.scaladsl.server.Directives._
val route = fileUpload("csv") {
case (metadata, byteSource) =>
val source = byteSource.map(x => x)
complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, source)))
}
You detect that the uploaded thing is multipart/form-data with a part named "csv". You get the byteSource from that. Do the calculation (insert your logic into the .map(x => x) part). Convert your data back to ByteString. Complete the request with the new source. This makes your endpoint act like a proxy.
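Filling in the .map(x => x) part for your CSV case could look roughly like this (a sketch only, reusing your parseLine and ValidReading; the frame length and parallelism are arbitrary):
// additionally needs: import akka.http.scaladsl.model._, akka.stream.scaladsl.{Framing, Source}, akka.util.ByteString
val route = fileUpload("csv") {
  case (metadata, byteSource) =>
    val readings: Source[ByteString, Any] =
      byteSource
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
        .map(_.utf8String)
        .mapAsync(parallelism = 4)(parseLine(metadata.fileName))
        .collect { case v: ValidReading => v }          // drop InvalidReading
        .map(v => ByteString(s"${v.id};${v.value}\n"))  // render back to CSV lines
    complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, readings)))
}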
I am new to Akka and developed a sample Akka WebSocket server that streams a file's contents to clients using BroadcastHub (based on a sample from the Akka docs).
How can I measure the throughput (messages/second), assuming the clients are consuming as fast as the server?
// file source
val fileSource = FileIO.fromPath(Paths.get(path))
// Akka file source
val theFileSource = fileSource
.toMat(BroadcastHub.sink)(Keep.right)
.run
//Akka kafka file source
lazy val kafkaSourceActorStream = {
val (kafkaSourceActorRef, kafkaSource) = Source.actorRef[String](Int.MaxValue, OverflowStrategy.fail)
.toMat(BroadcastHub.sink)(Keep.both).run()
Consumer.plainSource(consumerSettings, Subscriptions.topics("perf-test-topic"))
.runForeach(record => kafkaSourceActorRef ! record.value().toString)
}
def logicFlow: Flow[String, String, NotUsed] = Flow.fromSinkAndSource(Sink.ignore, theFileSource)
val websocketFlow: Flow[Message, Message, Any] = {
Flow[Message]
.collect {
case TextMessage.Strict(msg) => Future.successful(msg)
case _ => println("ignore streamed message")
}
.mapAsync(parallelism = 2)(identity)
.via(logicFlow)
.map { msg: String => TextMessage.Strict(msg) }
}
val fileRoute =
path("file") {
handleWebSocketMessages(websocketFlow)
}
}
def startServer(): Unit = {
bindingFuture = Http().bindAndHandle(wsRoutes, HOST, PORT)
log.info(s"Server online at http://localhost:9000/")
}
def stopServer(): Unit = {
bindingFuture
.flatMap(_.unbind())
.onComplete{
_ => system.terminate()
log.info("terminated")
}
}
//ws client
def connectToWebSocket(url: String) = {
println("Connecting to websocket: " + url)
val (upgradeResponse, closed) = Http().singleWebSocketRequest(WebSocketRequest(url), websocketFlow)
val connected = upgradeResponse.flatMap{ upgrade =>
if(upgrade.response.status == StatusCodes.SwitchingProtocols )
{
println("Web socket connection success")
Future.successful(Done)
}else {
println("Web socket connection failed with error: {}", upgrade.response.status)
throw new RuntimeException(s"Web socket connection failed: ${upgrade.response.status}")
}
}
connected.onComplete { msg =>
println(msg)
}
}
def websocketFlow: Flow[Message, Message, _] = {
Flow.fromSinkAndSource(printFlowRate, Source.maybe)
}
lazy val printFlowRate =
Flow[Message]
.alsoTo(fileSink("output.txt"))
.via(flowRate(1.seconds))
.to(Sink.foreach(rate => println(s"$rate")))
def flowRate(sampleTime: FiniteDuration) =
Flow[Message]
.conflateWithSeed(_ ⇒ 1){ case (acc, _) ⇒ acc + 1 }
.zip(Source.tick(sampleTime, sampleTime, NotUsed))
.map(_._1.toDouble / sampleTime.toUnit(SECONDS))
def fileSink(file: String): Sink[Message, Future[IOResult]] = {
Flow[Message]
.map{
case TextMessage.Strict(msg) => msg
case TextMessage.Streamed(stream) => stream.runFold("")(_ + _).flatMap(msg => Future.successful(msg))
}
.map(s => ByteString(s + "\n"))
.toMat(FileIO.toFile(new File(file)))(Keep.right)
}
You could attach a throughput-measuring stream to your existing stream. Here is an example, inspired by this answer, that prints the number of integers that are emitted from the upstream source every second:
val rateSink = Flow[Int]
.conflateWithSeed(_ => 0){ case (acc, _) => acc + 1 }
.zip(Source.tick(1.second, 1.second, NotUsed))
.map(_._1)
.toMat(Sink.foreach(i => println(s"$i elements/second")))(Keep.right)
In the following example, we attach the above sink to a source that emits the integers 1 to 10 million. To prevent the rate-measuring stream from interfering with the main stream (which, in this case, simply converts every integer to a string and returns the last string processed as part of the materialized value), we use wireTapMat:
val (rateFut, mainFut) = Source(1 to 10000000)
.wireTapMat(rateSink)(Keep.right)
.map(_.toString)
.toMat(Sink.last[String])(Keep.both)
.run() // (Future[Done], Future[String])
rateFut onComplete {
case Success(x) => println(s"rateFut completed: $x")
case Failure(_) =>
}
mainFut onComplete {
case Success(s) => println(s"mainFut completed: $s")
case Failure(_) =>
}
Running the above sample prints something like the following:
0 elements/second
2597548 elements/second
3279052 elements/second
mainFut completed: 10000000
3516141 elements/second
607254 elements/second
rateFut completed: Done
If you don't need a reference to the materialized value of rateSink, use wireTap instead of wireTapMat. For example, attaching rateSink to your WebSocket flow could look like the following:
val websocketFlow: Flow[Message, Message, Any] = {
Flow[Message]
.wireTap(rateSink) // <---
.collect {
case TextMessage.Strict(msg) => Future.successful(msg)
case _ => println("ignore streamed message")
}
.mapAsync(parallelism = 2)(identity)
.via(logicFlow)
.map { msg: String => TextMessage.Strict(msg) }
}
wireTap is defined on both Source and Flow.
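For example, tapping a plain Source with the same rateSink (whose materialized Future[Done] is simply discarded here):
Source(1 to 10000000)
  .wireTap(rateSink)      // rateSink's materialized value is ignored
  .runWith(Sink.ignore)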
Where I last worked I implemented a performance benchmark of this nature.
Basically, it meant creating a simple client app that consumes messages from the websocket and outputs some metrics. The natural choice was to implement the client using akka-http client-side support for websockets. See:
https://doc.akka.io/docs/akka-http/current/client-side/websocket-support.html#singlewebsocketrequest
Then we used the micrometer library to expose metrics to Prometheus, which was our tool of choice for reporting and charting.
https://github.com/micrometer-metrics
https://micrometer.io/docs/concepts#_meters
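A minimal sketch of such a client (illustrative only, not the benchmark we actually ran; the endpoint URL and meter name are assumptions, and Akka 2.6+ is assumed so the materializer is derived from the system):
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.ws.{Message, WebSocketRequest}
import akka.stream.scaladsl.{Flow, Sink, Source}
import io.micrometer.prometheus.{PrometheusConfig, PrometheusMeterRegistry}

object WsBenchmarkClient extends App {
  implicit val system: ActorSystem = ActorSystem("ws-benchmark")

  val registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT)
  val received = registry.counter("websocket.messages.received")

  // Count every incoming message; send nothing back.
  val countingFlow: Flow[Message, Message, Any] =
    Flow.fromSinkAndSource(
      Sink.foreach[Message](_ => received.increment()),
      Source.maybe[Message])

  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost:9000/file"), countingFlow)

  // registry.scrape() returns the Prometheus exposition text;
  // serve it from an HTTP /metrics endpoint for Prometheus to pull.
}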
I am using the Cake Solutions Akka client for Scala and Kafka. I create a KafkaProducerActor, send it a message using the ask pattern, and try to perform some operations on the returned future, but every time I get an ask timeout exception. Below is my code:
class SimpleAkkaProducer (config: Config, system: ActorSystem) {
private val producerConf = KafkaProducer.
Conf(config,
keySerializer = new StringSerializer,
valueSerializer = new StringSerializer)
val actorRef = system.actorOf(KafkaProducerActor.props(producerConf))
def sendMessageWayOne(record: ProducerRecords[String, String]) = {
actorRef ! record
}
def sendMessageWayTwo(record: ProducerRecords[String, String]) = {
implicit val timeout = Timeout(100.seconds)
val future = (actorRef ? record).mapTo[String]
future onComplete {
case Success(data) => println(s" >>>>>>>>>>>> ${data}")
case Failure(ex) => ex.printStackTrace()
}
}
}
object SimpleAkkaProducer {
def main(args: Array[String]): Unit = {
val system = ActorSystem("KafkaProducerActor")
val config = ConfigFactory.defaultApplication()
val simpleAkkaProducer = new SimpleAkkaProducer(config, system)
val topic = config.getString("akka.topic")
val messageOne = ProducerRecords.fromKeyValues[String, String](topic,
Seq((Some("Topics"), "First Message")), None, None)
simpleAkkaProducer.sendMessageWayOne(messageOne)
simpleAkkaProducer.sendMessageWayTwo(messageOne)
}
}
Following is the exception:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://KafkaProducerActor/user/$a#-1520717141]] after [100000 ms]. Sender[null] sent message of type "cakesolutions.kafka.akka.ProducerRecords".
at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:864)
at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)
at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:862)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:745)
The producer actor only responds to the sender if you set the successResponse and failureResponse values in the ProducerRecords to something other than None. The successResponse value is sent back to the sender when the Kafka write succeeds, and the failureResponse value is sent back when the Kafka write fails.
Example:
val record = ProducerRecords.fromKeyValues[String, String](
topic = topic,
keyValues = Seq((Some("Topics"), "First Message")),
successResponse = Some("success"),
failureResponse = Some("failure")
)
val future = (actorRef ? record).mapTo[String]
future onComplete {
case Success("success") => println("Send succeeded!")
case Success("failure") => println("Send failed!")
case Success(data) => println(s"Send result: $data")
case Failure(ex) => ex.printStackTrace()
}