Convert a Source to Flow in Scala

How to convert a Source to Flow?
Input: Source[ByteString,NotUsed]
Intermediary Step: Call an API which returns an InputStream
Output: Flow[ByteString,ByteString,NotUsed]
I am doing it as:
Type of input is = Source[ByteString,NotUsed]
val sink: Sink[ByteString,InputStream] = StreamConverters.asInputStream()
val output: InputStream = <API CALL>
val mySource: Source[ByteString,Future[IOResult]] = StreamConverters.fromInputStream(() => output)
val myFlow: Flow[ByteString,ByteString,NotUsed] = Flow.fromSinkAndSource(sink,mySource)
When I use the above Flow on the source, it returns an empty result. Can someone help me figure out if I am doing it right?

I'm not sure I fully grasp what you want to achieve, but maybe this is a use case for flatMapConcat:
def readInputstream(bs: ByteString): Source[ByteString, Future[IOResult]] =
  // Get some InputStream from the ByteString
  StreamConverters.fromInputStream(() => ???)
val myFlow: Flow[ByteString, ByteString, NotUsed] =
  Flow[ByteString].flatMapConcat(bs => readInputstream(bs))
// And use it like this:
val source: Source[ByteString, NotUsed] = ???
source
  .via(myFlow)
  .to(???)
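As a runnable sketch of the flatMapConcat idea, assuming Akka 2.6 or later (where an implicit ActorSystem is enough to run a stream): callApi below is a hypothetical stand-in for the real API call and simply returns an InputStream over the upper-cased input, and the object and method names are illustrative only.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Source, StreamConverters}
import akka.util.ByteString
import scala.concurrent.Await
import scala.concurrent.duration._

object FlatMapConcatSketch {
  // Hypothetical stand-in for the real API call: returns an InputStream
  // over the upper-cased input bytes.
  def callApi(bs: ByteString): InputStream =
    new ByteArrayInputStream(bs.utf8String.toUpperCase.getBytes("UTF-8"))

  // For every incoming element, open the InputStream lazily and emit its
  // contents before moving on to the next element.
  val viaApi: Flow[ByteString, ByteString, NotUsed] =
    Flow[ByteString].flatMapConcat { bs =>
      StreamConverters.fromInputStream(() => callApi(bs))
    }

  // Runs the flow over some in-memory input and collects the output.
  def run(inputs: List[String]): String = {
    implicit val system: ActorSystem = ActorSystem("flatmapconcat-sketch")
    try {
      val result = Source(inputs.map(ByteString(_)))
        .via(viaApi)
        .runFold(ByteString.empty)(_ ++ _)
      Await.result(result, 5.seconds).utf8String
    } finally system.terminate()
  }
}
```

Because flatMapConcat fully consumes each inner source before starting the next, the output order follows the input order.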

Related

How to send multiple files to kafka producer using akka stream in Scala

I am trying to send the contents of multiple files to a Kafka producer using Akka Streams. I have written the producer itself, but I am struggling with how to use the Akka Streams file IO to read the files whose data I want to send. This is my code:
object App {
def main(args: Array[String]): Unit = {
val file = Paths.get("233339.8.1231731728115136.1722327129833578.log")
// val file = Paths.get("example.csv")
//
// val foreach: Future[IOResult] = FileIO.fromPath(file)
// .to(Sink.ignore)
// .run()
println("Hello from producer")
implicit val system:ActorSystem = ActorSystem("producer-example")
implicit val materializer:Materializer = ActorMaterializer()
val producerSettings = ProducerSettings(system,new StringSerializer,new StringSerializer)
val done: Future[Done] =
Source(1 to 955)
.map(value => new ProducerRecord[String, String]("test-topic", s"$file : $value"))
.runWith(Producer.plainSink(producerSettings))
implicit val ec: ExecutionContextExecutor = system.dispatcher
done onComplete {
case Success(_) => println("Done"); system.terminate()
case Failure(err) => println(err.toString); system.terminate()
}
}
}
Given multiple file names:
// Source.apply requires an immutable Iterable
val fileNames : scala.collection.immutable.Iterable[String] = ???
It is possible to create a Source that emits the contents of the files concatenated together using flatMapConcat:
val chunkSize = 8192
val chunkSource : Source[ByteString, _] =
Source.apply(fileNames)
.map(fileName => Paths get fileName)
.flatMapConcat(path => FileIO.fromPath(path, chunkSize))
This emits ByteString values of exactly chunkSize bytes, except for the last chunk of each file, which may be smaller.
If you want to break the stream up by some delimiter (for example, into lines), you can use Framing:
val delimiter : ByteString = ???
val maxFrameLength : Int = ???
val framingSource : Source[ByteString, _] =
chunkSource.via(Framing.delimiter(delimiter, maxFrameLength))
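To illustrate how Framing re-assembles records that straddle chunk boundaries, here is a small sketch that feeds in-memory chunks instead of files. It assumes Akka 2.6 or later, a newline delimiter, and a maximum frame length of 1024 bytes; the object and method names are made up for the example.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Framing, Sink, Source}
import akka.util.ByteString
import scala.concurrent.Await
import scala.concurrent.duration._

object FramingSketch {
  // Splits a sequence of arbitrary chunks into newline-delimited lines.
  def lines(chunks: List[String]): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("framing-sketch")
    try {
      val result = Source(chunks.map(ByteString(_)))
        // allowTruncation = true also emits a final line that has no
        // trailing delimiter instead of failing the stream.
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
        .map(_.utf8String)
        .runWith(Sink.seq)
      Await.result(result, 5.seconds)
    } finally system.terminate()
  }
}
```

Each resulting line could then be mapped to a ProducerRecord in the same way the question maps its integer values.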

Streaming CSV Source with AKKA-HTTP

I am trying to stream data from MongoDB using reactivemongo-akkastream 0.12.1 and return the result as a CSV stream in one of the routes (using Akka HTTP).
I implemented that following the example here:
http://doc.akka.io/docs/akka-http/10.0.0/scala/http/routing-dsl/source-streaming-support.html#simple-csv-streaming-example
and it seems to work fine.
The only problem I am facing now is how to add the headers to the output CSV file. Any ideas?
Thanks
Aside from the fact that that example isn't really a robust method of generating CSV (it doesn't provide proper escaping), you'll need to rework it a bit to add headers. Here's what I would do:
make a Flow to convert a Source[Tweet] to a source of CSV rows, e.g. a Source[List[String]]
concatenate it to a source containing your headers as a single List[String]
adapt the marshaller to render a source of rows rather than tweets
Here's some example code:
case class Tweet(uid: String, txt: String)
def getTweets: Source[Tweet, NotUsed] = ???
val tweetToRow: Flow[Tweet, List[String], NotUsed] =
Flow[Tweet].map { t =>
List(
t.uid,
t.txt.replaceAll(",", "."))
}
// provide a marshaller from a row (List[String]) to a ByteString
implicit val tweetAsCsv = Marshaller.strict[List[String], ByteString] { row =>
Marshalling.WithFixedContentType(ContentTypes.`text/csv(UTF-8)`, () =>
ByteString(row.mkString(","))
)
}
// enable csv streaming
implicit val csvStreaming = EntityStreamingSupport.csv()
val route = path("tweets") {
val headers = Source.single(List("uid", "text"))
val tweets: Source[List[String], NotUsed] = getTweets.via(tweetToRow)
complete(headers.concat(tweets))
}
Update: if your getTweets method returns a Future you can just map over its source value and prepend the headers that way, e.g:
val route = path("tweets") {
val headers = Source.single(List("uid", "text"))
val rows: Future[Source[List[String], NotUsed]] = getTweets
.map(tweets => headers.concat(tweets.via(tweetToRow)))
complete(rows)
}
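On the escaping point: rather than replacing commas with periods, a field can be quoted in the style of RFC 4180. The following is a minimal sketch in plain Scala, not part of Akka HTTP; escapeField and renderRow are hypothetical helper names, and renderRow could stand in for row.mkString(",") inside the marshaller.

```scala
object CsvEscapeSketch {
  // Quote a field if it contains a comma, a double quote, or a line break,
  // doubling any embedded double quotes.
  def escapeField(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  // Render one CSV row without a trailing line break.
  def renderRow(row: List[String]): String =
    row.map(escapeField).mkString(",")
}
```

With this in place the tweet text can be passed through unchanged, since commas inside a quoted field no longer split the column.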

Akka Streams: How do I get Materialized Sink output from GraphDSL API?

This is a really simple, newbie question using the GraphDSL API. I read several related SO threads and I don't see the answer:
val actorSystem = ActorSystem("QuickStart")
val executor = actorSystem.dispatcher
val materializer = ActorMaterializer()(actorSystem)
val source: Source[Int, NotUsed] = Source(1 to 5)
val throttledSource = source.throttle(1, 1.second, 1, ThrottleMode.shaping)
val intDoublerFlow = Flow.fromFunction[Int, Int](i => i * 2)
val sink = Sink.foreach(println)
val graphModel = GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> sink
// I presume I want to change this shape to something else
// but I can't figure out what it is.
ClosedShape
}
// TODO: This is RunnableGraph[NotUsed], I want RunnableGraph[Future[Done]] that gives the
// materialized Future[Done] from the sink. I presume I need to use a GraphDSL SourceShape
// but I can't get that working.
val graph = RunnableGraph.fromGraph(graphModel)
// This works and gives me the materialized sink output using the simpler API.
// But I want to use the GraphDSL so that I can add branches or junctures.
val graphThatIWantFromDslAPI = throttledSource.toMat(sink)(Keep.right)
The trick is to pass the stage you want the materialized value of (in your case, sink) to the GraphDSL.create. The function you pass as a second parameter changes as well, needing a Shape input parameter (s in the example below) which you can use in your graph.
val graphModel: Graph[ClosedShape, Future[Done]] = GraphDSL.create(sink) { implicit b => s =>
import GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> s
// ClosedShape is just fine - it is always the shape of a RunnableGraph
ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
More info can be found in the docs.
val graphModel = GraphDSL.create(sink) { implicit b: Builder[Future[Done]] => sink =>
import akka.stream.scaladsl.GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> sink
ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
val graphThatIWantFromDslAPI: RunnableGraph[Future[Done]] = throttledSource.toMat(sink)(Keep.right)
The problem with the GraphDSL API is that the implicit Builder is heavily overloaded. You need to pass your sink to create, which turns the Builder[NotUsed] into a Builder[Future[Done]]; the build block then becomes a function builder => sink => shape instead of builder => shape.
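Since the stated motivation for the GraphDSL was adding branches, here is a sketch of the same technique with a fan-out. It assumes Akka 2.6 or later and uses a Sink.fold (instead of Sink.foreach) so the materialized value is easy to inspect; the second branch going to Sink.ignore and all names are illustrative only.

```scala
import akka.actor.ActorSystem
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object GraphDslMatSketch {
  // The sink whose materialized value we want to keep.
  val sumSink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

  // Passing sumSink to create makes its Future[Int] the graph's materialized value.
  val graph: RunnableGraph[Future[Int]] =
    RunnableGraph.fromGraph(GraphDSL.create(sumSink) { implicit b => s =>
      import GraphDSL.Implicits._
      val bcast = b.add(Broadcast[Int](2))
      Source(1 to 5) ~> bcast
      bcast ~> Flow[Int].map(_ * 2) ~> s // branch that feeds the imported sink
      bcast ~> Sink.ignore               // second branch, discarded here
      ClosedShape
    })

  def run(): Int = {
    implicit val system: ActorSystem = ActorSystem("graphdsl-sketch")
    try Await.result(graph.run(), 5.seconds)
    finally system.terminate()
  }
}
```

Newer Akka releases deprecate this overload of create in favour of createGraph, which takes the same arguments.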

How do you deal with futures in Akka Flow?

I have built an Akka graph that defines a flow. My objective is to reformat my future response and save it to a file. The flow is outlined below:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val balancer = builder.add(Balance[(HttpRequest, String)](6, waitForAllDownstreams = false))
val merger = builder.add(Merge[Future[Map[String, String]]](6))
val fileSink = FileIO.toPath(outputPath, options)
val ignoreSink = Sink.ignore
val in = Source(seeds)
in ~> balancer.in
for (i <- Range(0,6)) {
balancer.out(i) ~>
wikiFlow.async ~>
// This maps to a Future[Map[String, String]]
Flow[(Try[HttpResponse], String)].map(parseHtml) ~>
merger
}
merger.out ~>
// When we merge we need to map our Map to a file
Flow[Future[Map[String, String]]].map((d) => {
// What is the proper way of serializing future map
// so I can work with it like a normal stream into fileSink?
// I could manually do ->
// d.foreach(someWriteToFileProcess(_))
// with ignoreSink, but this defeats the nice
// akka flow
}) ~>
fileSink
ClosedShape
})
I can hack this workflow to write my future map to a file via foreach, but I'm afraid this could somehow lead to concurrency issues with FileIO and it just doesn't feel right. What is the proper way to handle futures with our akka flow?
The easiest way to create a Flow which involves an asynchronous computation is by using mapAsync.
So, let's say you want to create a Flow that consumes Int and produces String using an asynchronous computation mapper: Int => Future[String], with a parallelism of 5.
val mapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](5)(mapper)
Now, you can use this flow in your graph however you want.
An example usage:
val graph = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val intSource = Source(1 to 10)
val printSink = Sink.foreach[String](s => println(s))
val yourMapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](2)(yourMapper)
intSource ~> yourFlow ~> printSink
ClosedShape
}
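Applied to the question's stream of Future[Map[String, String]], mapAsync with a pass-through function unwraps each future, after which the map can be serialized to a ByteString for FileIO. This is a sketch assuming Akka 2.6 or later; the key=value line format in render is a made-up example, and the result is folded into a String here only to keep the sketch self-contained.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Source}
import akka.util.ByteString
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object UnwrapFutureSketch {
  // Hypothetical line format: sorted key=value pairs separated by tabs,
  // one map per line.
  def render(m: Map[String, String]): ByteString =
    ByteString(m.toSeq.sorted.map { case (k, v) => s"$k=$v" }.mkString("", "\t", "\n"))

  // mapAsync waits for each future and emits its value, preserving
  // upstream order; up to 4 futures may be pending at once.
  val unwrap: Flow[Future[Map[String, String]], ByteString, NotUsed] =
    Flow[Future[Map[String, String]]]
      .mapAsync(4)(future => future)
      .map(render)

  def run(inputs: List[Map[String, String]]): String = {
    implicit val system: ActorSystem = ActorSystem("unwrap-future-sketch")
    try {
      val result = Source(inputs.map(m => Future.successful(m)))
        .via(unwrap)
        .runFold(ByteString.empty)(_ ++ _)
      Await.result(result, 5.seconds).utf8String
    } finally system.terminate()
  }
}
```

In the question's graph, a flow like unwrap would sit between merger.out and fileSink, so the file is written by the stream itself rather than by a side effect inside foreach.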

How to switch between multiple Sources?

Suppose I have two infinite sources of the same type which could be connected to one graph. I want to switch between them from outside the already materialized graph, perhaps in the same way that it is possible to shut one of them down with a KillSwitch.
val source1: Source[ByteString, NotUsed] = ???
val source2: Source[ByteString, NotUsed] = ???
val (switcher: Switcher, source: Source[ByteString, NotUsed]) =
Source.combine(source1,source2).withSwitcher.run()
switcher.switch()
By default I want to use source1 and after switch I want to consume data from source2
source1
       \
        switcher ~> source
       /
source2
Is it possible to implement this logic with Akka Streams?
OK, after some time I found a solution.
I can use the same principle as in VLANs: tag each source and pass them all through a MergeHub. After that it is easy to filter the elements by tag and expose the result as a single Source.
All that is needed to switch from one Source to the other is a change of the filter condition.
source1.map(s => (tag1, s))
\
MergeHub.filter(_._1 == tagX).map(_._2) -> Source
/
source2.map(s => (tag2, s))
Here is an example:
object SomeSource {
  // Read by the stream and written by the caller, so it must be visible across threads
  @volatile private var current = "tag1"
  val source1: Source[ByteString, NotUsed] = ???
  val source2: Source[ByteString, NotUsed] = ???
  def switch(): Unit = {
    current = if (current == "tag1") "tag2" else "tag1"
  }
  // Requires an implicit Materializer (or, on Akka 2.6+, an implicit ActorSystem) in scope
  val (sink: Sink[(String, ByteString), NotUsed],
       source: Source[ByteString, NotUsed]) =
    MergeHub.source[(String, ByteString)]
      .filter(_._1 == current)
      .map(_._2)
      .toMat(BroadcastHub.sink[ByteString])(Keep.both).run()
  source1.map(s => ("tag1", s)).runWith(sink)
  source2.map(s => ("tag2", s)).runWith(sink)
}
SomeSource.source // do something with Source
SomeSource.switch() // then switch