I have the following code sample that works fine. I want to make some changes to preserve the relation between the request and the response. How can I achieve that?
The REST API flow's materialized value is NotUsed. Is it possible to somehow use Keep.both for that?
// this flow is provided by some third party library that I can't change in place
val someRestApiFlow: Flow[Int, Int, NotUsed] = Flow[Int].mapAsync(10)(x => Future(x + 1))
val digits: Source[Int, NotUsed] = Source(List(1, 2, 3))
val r = digits.via(someRestApiFlow).runForeach(println)
The result is
2
3
4
I want the result to be like
1 -> 2
2 -> 3
3 -> 4
You can use a Broadcast element to split the stream into two branches. The first output of the broadcast goes through someRestApiFlow; the second output passes through unmodified. You then zip the output of someRestApiFlow with the second output of the broadcast. That way you have both the input element and the result of its transformation through someRestApiFlow.
digits ---> broadcast --> someRestApiFlow ---> zip --> result
\----------------------/
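If you want to keep a linear pipeline rather than a fully custom graph, one option is to wrap this wiring in a reusable Flow. A minimal sketch, reusing digits and someRestApiFlow from your question:
import akka.NotUsed
import akka.stream.FlowShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Zip}
// pairs every request with the response produced by someRestApiFlow
val withRequest: Flow[Int, (Int, Int), NotUsed] =
  Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val bcast = b.add(Broadcast[Int](2))
    val zip = b.add(Zip[Int, Int])
    bcast.out(0) ~> zip.in0                    // the original request
    bcast.out(1) ~> someRestApiFlow ~> zip.in1 // its response
    FlowShape(bcast.in, zip.out)
  })
digits.via(withRequest).runForeach { case (req, res) => println(s"$req -> $res") }
This relies on mapAsync preserving element order, so each request is paired with its own response.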
I have also encountered this kind of case a couple of times. The only solution I have found is to create a graph using the GraphDSL, making use of Broadcast and Zip stages.
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink, Source, Zip}
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.{Done, NotUsed}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
object Main extends App {
implicit val system: ActorSystem = ActorSystem("my-system")
implicit val materializer: ActorMaterializer = ActorMaterializer()
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val src: Source[Int, NotUsed] = Source(List(1, 2, 3))
val someRestApiFlow: Flow[Int, Int, NotUsed] = Flow[Int].mapAsync(10)(x => Future(x + 1))
val out: Sink[(Int, Int), Future[Done]] = Sink.foreach[(Int, Int)](println)
val bcast = builder.add(Broadcast[Int](2))
val zip = builder.add(Zip[Int, Int])
src ~> bcast ~> zip.in0
bcast ~> someRestApiFlow ~> zip.in1
zip.out ~> out
ClosedShape
})
graph.run()
}
What happens here is that we broadcast the input both to the Zip stage and to the custom flow; the Zip waits for the result of the custom flow and finally pairs the two before sending them to the sink.
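Running this graph prints each input paired with its response:
(1,2)
(2,3)
(3,4)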
I'm coding a small Akka Streams sample where I want to write the elements of a List to a local text file:
implicit val ec = context.dispatcher
implicit val actorSystem = context.system
implicit val materializer = ActorMaterializer()
val source = Source(List("a", "b", "c"))
.map(char => ByteString(s"${char} \n"))
val runnableGraph = source.toMat(FileIO.toPath(Paths.get("~/Downloads/results.txt")))(Keep.right)
runnableGraph.run()
The file already exists at the location I set in the code.
I do not terminate the actor system, so it definitely has enough time to write all of the List's elements to the file.
But unfortunately, nothing happens.
Use the expanded path to your home directory instead of the tilde (~); Paths.get does not expand the tilde, it treats it as a literal directory name. For example:
val runnableGraph =
source.toMat(
FileIO.toPath(Paths.get("/home/YourUserName/Downloads/results.txt")))(Keep.right)
runnableGraph.run()
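If you'd rather not hardcode the user name, a small variant (not Akka-specific) is to build the path from the user.home system property:
// resolves to e.g. /home/YourUserName/Downloads/results.txt
val resultsPath = Paths.get(System.getProperty("user.home"), "Downloads", "results.txt")
val runnableGraph = source.toMat(FileIO.toPath(resultsPath))(Keep.right)
runnableGraph.run()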
I want to upload a file into S3 using Alpakka and at the same time parse it with Tika to obtain its MIME type.
I have 3 parts of the graph at the moment:
val fileSource: Source[ByteString, Any] // comes from Akka-HTTP
val fileUpload: Sink[ByteString, Future[MultipartUploadResult]] // created by S3Client from Alpakka
val mimeTypeDetection: Sink[ByteString, Future[MediaType.Binary]] // my implementation using Apache Tika
I would like to obtain both results in one place, something like:
Future[(MultipartUploadResult, MediaType.Binary)]
I have no issue with the broadcasting part:
val broadcast = builder.add(Broadcast[ByteString](2))
source ~> broadcast ~> fileUpload
broadcast ~> mimeTypeDetection
However, I have trouble composing the Sinks. The methods I found in the API and documentation assume either that the combined sinks are of the same type or that I am zipping Flows, not Sinks.
What is the suggested approach in such a case?
Two ways:
1) using alsoToMat (easier, no GraphDSL, enough for your example)
val mat1: (Future[MultipartUploadResult], Future[Binary]) =
fileSource
.alsoToMat(fileUpload)(Keep.right)
.toMat(mimeTypeDetection)(Keep.both)
.run()
2) using GraphDSL with custom materialized values (more verbose, more flexible; more info on this in the docs):
val mat2: (Future[MultipartUploadResult], Future[Binary]) =
RunnableGraph.fromGraph(GraphDSL.create(fileUpload, mimeTypeDetection)((_, _)) { implicit builder =>
(fileUpload, mimeTypeDetection) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[ByteString](2))
fileSource ~> broadcast ~> fileUpload
broadcast ~> mimeTypeDetection
ClosedShape
}).run()
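In both cases run() gives you the two futures separately; if you want the single Future[(MultipartUploadResult, MediaType.Binary)] from your question, you can zip them:
val (uploadResult, mimeType) = mat1
val both: Future[(MultipartUploadResult, MediaType.Binary)] = uploadResult zip mimeType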
I am playing with Akka Streams and streaming content from a file using Alpakka. I need to stop the stream after some time, so I want to use a KillSwitch. But I don't know how to use it because I am using the graph DSL.
My graph looks like this:
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
source ~> mainFlow ~> sink
ClosedShape
})
graph.run()
I found a solution here: How to abruptly stop an akka stream Runnable Graph?
However, I don't know how to apply it if I'm using the graph DSL. Can you give me some advice?
To surface a materialized value in the GraphDSL, you can pass the stage that materializes that value to the create method. It is easier to explain with an example. In your case:
val switch = KillSwitches.single[Int]
val graph: RunnableGraph[UniqueKillSwitch] =
RunnableGraph.fromGraph(GraphDSL.create(switch) { implicit builder: GraphDSL.Builder[UniqueKillSwitch] => sw =>
import GraphDSL.Implicits._
source ~> mainFlow ~> sw ~> sink
ClosedShape
})
val ks = graph.run()
ks.shutdown()
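Since you want to stop the stream after some time, one way to trigger the shutdown is to schedule it. A sketch, assuming an implicit ActorSystem named system is in scope and the 10-second delay is just an example:
import scala.concurrent.duration._
// stop the stream 10 seconds after it starts
system.scheduler.scheduleOnce(10.seconds)(ks.shutdown())(system.dispatcher)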
I'm reading a csv file. I am using Akka Streams to do this so that I can create a graph of actions to perform on each line. I've got the following toy example up and running.
def main(args: Array[String]): Unit = {
implicit val system = ActorSystem("MyAkkaSystem")
implicit val materializer = ActorMaterializer()
val source = akka.stream.scaladsl.Source.fromIterator(Source.fromFile("a.csv").getLines)
val sink = Sink.foreach(println)
source.runWith(sink)
}
The two Source types don't sit well with me. Is this idiomatic, or is there a better way to write this?
Actually, akka-streams provides a function to directly read from a file.
FileIO.fromPath(Paths.get("a.csv"))
.via(Framing.delimiter(ByteString("\n"), 256, true).map(_.utf8String))
.runForeach(println)
Here, the runForeach method prints the lines. If you have a proper Sink to process these lines, use it instead. For example, if you want to split each line on ',' and print the number of fields in it:
val sink: Sink[String, Future[Done]] = Sink.foreach[String](x => println(x.split(",").length))
FileIO.fromPath(Paths.get("a.csv"))
.via(Framing.delimiter(ByteString("\n"), 256, true).map(_.utf8String))
.to(sink)
.run()
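A variant sketch, if you instead want a single total across the whole file rather than one count per line:
val totalFields: Future[Int] =
  FileIO.fromPath(Paths.get("a.csv"))
    .via(Framing.delimiter(ByteString("\n"), 256, allowTruncation = true))
    .map(_.utf8String)
    .map(_.split(",").length)
    .runWith(Sink.fold[Int, Int](0)(_ + _))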
The idiomatic way to read a CSV file with Akka Streams is to use the Alpakka CSV connector. The following example reads a CSV file, converts it to a map of column names (assumed to be the first line in the file) and ByteString values, transforms the ByteString values to String values, and prints each line:
import akka.stream.alpakka.csv.scaladsl.{CsvParsing, CsvToMap}
FileIO.fromPath(Paths.get("a.csv"))
  .via(CsvParsing.lineScanner())
  .via(CsvToMap.toMap())
  .map(_.mapValues(_.utf8String))
  .runForeach(println)
Try this:
import java.nio.file.Paths
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import akka.util.ByteString
import scala.concurrent.Await
import scala.concurrent.duration._
object ReadStreamApp extends App {
implicit val actorSystem = ActorSystem()
import actorSystem.dispatcher
implicit val flowMaterializer = ActorMaterializer()
val logFile = Paths.get("src/main/resources/a.csv")
val source = FileIO.fromPath(logFile)
val flow = Framing
.delimiter(ByteString(System.lineSeparator()), maximumFrameLength = 512, allowTruncation = true)
.map(_.utf8String)
val sink = Sink.foreach(println)
source
.via(flow)
.runWith(sink)
.andThen {
case _ =>
actorSystem.terminate()
Await.ready(actorSystem.whenTerminated, 1.minute)
}
}
Yeah, it's OK, because these are different Sources. But if you don't like scala.io.Source you can read the file yourself (which sometimes we have to do, e.g. when the source CSV file is zipped) and then parse it from the resulting InputStream, like this:
StreamConverters.fromInputStream(() => input)
.via(Framing.delimiter(ByteString("\n"), 4096))
.map(_.utf8String)
.collect { case line => line } // no-op pass-through; put your per-line processing here
Having said that, consider using Apache Commons CSV with akka-stream. You may end up writing less code. :)
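For the zipped-file case mentioned above, a rough sketch (the file name is hypothetical):
import java.io.FileInputStream
import java.util.zip.GZIPInputStream
// stream a gzipped CSV without unpacking it to disk first
StreamConverters.fromInputStream(() => new GZIPInputStream(new FileInputStream("a.csv.gz")))
  .via(Framing.delimiter(ByteString("\n"), 4096))
  .map(_.utf8String)
  .runForeach(println)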
Hi, I am new to Apache Spark. In my use case I will have 3 inputs, all present in HDFS. I need to extract the data from the files in HDFS, add two of the values, and divide the result by the third. How can I proceed?
Something like this should work:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc = new SparkContext(conf)
// each file is CSV; pull out one integer column from each input
val A = sc.textFile("/user/root/spark/cc.dat").map(_.split(",")).map(fc => fc(3).toInt)
val B = sc.textFile("/user/root/spark/aci.dat").map(_.split(",")).map(fc => fc(4).toInt)
val C = sc.textFile("/user/root/spark/bta.dat").map(_.split(",")).map(fc => fc(5).toInt)
// add the first two values and divide by the third, as described in the question
val calc = { r: ((Int, Int), Int) =>
  val ((a, b), c) = r
  (a + b).toDouble / c
}
val result = (A zip B zip C).map(calc)
(It compiles, but I didn't test it. Note that RDD.zip assumes the zipped RDDs have the same number of partitions and the same number of elements in each partition, which holds here only if the three files line up row for row.)
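To actually see or persist the result you still need an action, for example (the output path is hypothetical):
result.collect().foreach(println) // bring the values to the driver and print them
// or: result.saveAsTextFile("/user/root/spark/output")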