Akka Streams, source items as another source? - scala

I am using Alpakka-FTP, but maybe I'm looking for a general akka-stream pattern. The FTP connector can list files or retrieve them:
def ls(host: String): Source[FtpFile, NotUsed]
def fromPath(host: String, path: Path): Source[ByteString, Future[IOResult]]
Ideally, I would like to create a stream like this:
LIST
.FETCH_ITEM
.FOREACH(do something)
But I'm unable to create such a stream with the two functions I wrote above. I feel like I should be able to get there using a Flow, something like
Ftp.ls
.via(some flow that uses the Ftp.fromPath above)
.runWith(Sink.foreach(do something))
Is this possible, given only the ls and fromPath functions above?
EDIT:
I am able to work it out using one actor and mapAsync, but I still feel it should be more straightforward.
class Downloader extends Actor {
override def receive = {
case ftpFile: FtpFile =>
Ftp.fromPath(Paths.get(ftpFile.path), settings)
.toMat(FileIO.toPath(Paths.get("testHDF.txt")))(Keep.right)
.run() pipeTo sender
}
}
val downloader = as.actorOf(Props(new Downloader))
Ftp.ls("test_path", settings)
.mapAsync(1)(ftpFile => (downloader ? ftpFile) (3.seconds).mapTo[IOResult])
.runWith(Sink.foreach(res => println("got it!" + res)))

You should be able to use flatMapConcat for this purpose. Your specific example could be re-written to
Ftp.ls("test_path", settings).flatMapConcat{ ftpFile =>
Ftp.fromPath(Paths.get(ftpFile.path), settings)
}.runWith(FileIO.toPath(Paths.get("testHDF.txt")))
Documentation here.

Related

Akka Streams, break tuple item apart?

Using the superPool from akka-http, I have a stream that passes down a tuple. I would like to pipeline it to the Alpakka Google Pub/Sub connector. At the end of the HTTP processing, I encode everything for the pub/sub connector and end up with
(PublishRequest, Long) // long is a timestamp
but the interface of the connector is
Flow[PublishRequest, Seq[String], NotUsed]
One first approach is to kill one part:
.map{ case(publishRequest, timestamp) => publishRequest }
.via(publishFlow)
Is there an elegant way to create this pipeline while keeping the Long information?
EDIT: added my not-so-elegant solution in the answers. More answers welcome.
I don't see anything inelegant about your solution using GraphDSL.create(), which I think has an advantage of visualizing the stream structure via the diagrammatic ~> clauses. I do see problem in your code. For example, I don't think publisher should be defined by add-ing a flow to the builder.
Below is a skeletal version (briefly tested) of what I believe publishAndRecombine should look like:
val publishFlow: Flow[PublishRequest, Seq[String], NotUsed] = ???
val publishAndRecombine = Flow.fromGraph(GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val bcast = b.add(Broadcast[(PublishRequest, Long)](2))
val zipper = b.add(Zip[Seq[String], Long])
val publisher = Flow[(PublishRequest, Long)].
map{ case (pr, _) => pr }.
via(publishFlow)
val timestamp = Flow[(PublishRequest, Long)].
map{ case (_, ts) => ts }
bcast.out(0) ~> publisher ~> zipper.in0
bcast.out(1) ~> timestamp ~> zipper.in1
FlowShape(bcast.in, zipper.out)
})
There is now a much nicer solution for this which will be released in Akka 2.6.19 (see https://github.com/akka/akka/pull/31123).
In order to use the aformentioned unsafeViaData you would first have to represent (PublishRequest, Long) using FlowWithContext/SourceWithContext. FlowWithContext/SourceWithContext is an abstraction that was specifically designed to solve this problem (see https://doc.akka.io/docs/akka/current/stream/stream-context.html). The problem being you have a stream with the data part that is typically what you want to operate on (in your case the ByteString) and then you have the context (aka metadata) part which you typically just pass along unmodified (in your case the Long).
So in the end you would have something like this
val myFlow: FlowWithContext[PublishRequest, Long, PublishRequest, Long, NotUsed] =
FlowWithContext.fromTuples(originalFlowAsTuple) // Original flow that has `(PublishRequest, Long)` as an output
myFlow.unsafeViaData(publishFlow)
In contrast to Akka Streams, break tuple item apart?, not only is this solution involve much less boilerplate since its part of akka but it also retains the materialized value rather than losing it and always ending up with a NotUsed.
For the people wondering why the method unsafeViaData has unsafe in the name, its because the Flow that you pass into this method cannot add,drop or reorder any of the elements in the stream (doing so would mean that the context no longer properly corresponds to the data part of the stream). Ideally we would use Scala's type system to catch such errors at compile time but doing so would require a lot of changes to akka-stream especially if the changes need to remain backwards compatibility (which when dealing with akka we do). More details are in the PR mentioned earlier.
My not-so-elegant solution is using a custom flows that recombine things:
val publishAndRecombine = Flow.fromGraph(GraphDSL.create() { implicit b =>
val bc = b.add(Broadcast[(PublishRequest, Long)](2))
val publisher = b.add(Flow[(PublishRequest, Long)]
.map { case (pr, _) => pr }
.via(publishFlow))
val zipper = b.add(Zip[Seq[String], Long]).
bc.out(0) ~> publisher ~> zipper.in0
bc.out(1).map { case (pr, long) => long } ~> zipper.in1
FlowShape(bc.in, zipper.out)
})

How to create an Akka flow with backpressure and Control

I need to create a function with the following Interface:
import akka.kafka.scaladsl.Consumer.Control
object ItemConversionFlow {
def build(config: StreamConfig): Flow[Item, OtherItem, Control] = {
// Implementation goes here
}
My problem is that I don't know how to define the flow in a way that it fits the interface above.
When I am doing something like this
val flow = Flow[Item]
.map(item => doConversion(item)
.filter(_.isDefined)
.map(_.get)
the resulting type is Flow[Item, OtherItem, NotUsed]. I haven't found something in the Akka documentation so far. Also the functions on akka.stream.scaladsl.Flow only offer a "NotUsed" instead of Control. Would be great if someone could point me into the right direction.
Some background: I need to setup several pipelines which only distinguish in the conversion part. These pipelines are sub streams to a main stream which might be stopped for some reason (a corresponding message arrives in some kafka topic). Therefor I need the Control part. The idea would be to create a Graph template where I just insert the mentioned flow as argument (a factory returning it). For a specific case we have a solution which works. To generalize it I need this kind of flow.
You actually have backpressure. However, think about what do you really need about backpressure... you are not using asynchronous stages to increase your throughput... for example. Backpressure avoids fast producers overgrowing susbscribers https://doc.akka.io/docs/akka/2.5/stream/stream-rate.html. In your sample donĀ“t worry about it, your stream will ask for new elements to he publisher depending on how long doConversion takes to complete.
In case that you want to obtain the result of the stream use toMat or viaMat. For example, if your stream emits Item and transform these into OtherItem:
val str = Source.fromIterator(() => List(Item(Some(1))).toIterator)
.map(item => doConversion(item))
.filter(_.isDefined)
.map(_.get)
.toMat(Sink.fold(List[OtherItem]())((a, b) => {
// Examine the result of your stream
b :: a
}))(Keep.right)
.run()
str will be Future[List[OtherItem]]. Try to extrapolate this to your case.
Or using toMat with KillSwitches, "Creates a new [[Graph]] of [[FlowShape]] that materializes to an external switch that allows external completion
* of that unique materialization. Different materializations result in different, independent switches."
def build(config: StreamConfig): Flow[Item, OtherItem, UniqueKillSwitch] = {
Flow[Item]
.map(item => doConversion(item))
.filter(_.isDefined)
.map(_.get)
.viaMat(KillSwitches.single)(Keep.right)
}
val stream =
Source.fromIterator(() => List(Item(Some(1))).toIterator)
.viaMat(build(StreamConfig(1)))(Keep.right)
.toMat(Sink.ignore)(Keep.both).run
// This stops the stream
stream._1.shutdown()
// When it finishes
stream._2 onComplete(_ => println("Done"))

multipart form data in Lagom

I want to have a service which receives an item object, the object contains; name, description, price and picture.
the other attributes are strings which easily can be sent as Json object but for including picture what is the best solution?
if multipart formdata is the best solution how it is handled in Lagom?
You may want to check the file upload example in the lagom-recipes repository on GitHub.
Basically, the idea is to create an additional Play router. After that, we have to tell Lagom to use it as noted in the reference documentation (this feature is available since 1.5.0). Here is how the router might look like:
class FileUploadRouter(action: DefaultActionBuilder,
parser: PlayBodyParsers,
implicit val exCtx: ExecutionContext) {
private def fileHandler: FilePartHandler[File] = {
case FileInfo(partName, filename, contentType, _) =>
val tempFile = {
val f = new java.io.File("./target/file-upload-data/uploads", UUID.randomUUID().toString).getAbsoluteFile
f.getParentFile.mkdirs()
f
}
val sink: Sink[ByteString, Future[IOResult]] = FileIO.toPath(tempFile.toPath)
val acc: Accumulator[ByteString, IOResult] = Accumulator(sink)
acc.map {
case akka.stream.IOResult(_, _) =>
FilePart(partName, filename, contentType, tempFile)
}
}
val router = Router.from {
case POST(p"/api/files") =>
action(parser.multipartFormData(fileHandler)) { request =>
val files = request.body.files.map(_.ref.getAbsolutePath)
Results.Ok(files.mkString("Uploaded[", ", ", "]"))
}
}
}
And then, we simply tell Lagom to use it
override lazy val lagomServer =
serverFor[FileUploadService](wire[FileUploadServiceImpl])
.additionalRouter(wire[FileUploadRouter].router)
Alternatively, we can make use of the PlayServiceCall class. Here is a simple sketch on how to do that provided by James Roper from the Lightbend team:
// The type of the service call is NotUsed because we are handling it out of band
def myServiceCall: ServiceCall[NotUsed, Result] = PlayServiceCall { wrapCall =>
// Create a Play action to handle the request
EssentialAction { requestHeader =>
// Now we create the sink for where we want to stream the request to - eg it could
// go to a file, a database, some other service. The way Play gives you a request
// body is that you need to return a sink from EssentialAction, and when it gets
// that sink, it stream the request body into that sink.
val sink: Sink[ByteString, Future[Done]] = ...
// Play wraps sinks in an abstraction called accumulator, which makes it easy to
// work with the result of handling the sink. An accumulator is like a future, but
// but rather than just being a value that will be available in future, it is a
// value that will be available once you have passed a stream of data into it.
// We wrap the sink in an accumulator here.
val accumulator: Accumulator[ByteString, Done] = Accumulator.forSink(sink)
// Now we have an accumulator, but we need the accumulator to, when it's done,
// produce an HTTP response. Right now, it's just producing akka.Done (or whatever
// your sink materialized to). So we flatMap it, to handle the result.
accumulator.flatMap { done =>
// At this point we create the ServiceCall, the reason we do that here is it means
// we can access the result of the accumulator (in this example, it's just Done so
// not very interesting, but it could be something else).
val wrappedAction = wrapCall(ServiceCall { notUsed =>
// Here is where we can do any of the actual business logic, and generate the
// result that can be returned to Lagom to be serialized like normal
...
})
// Now we invoke the wrapped action, and run it with no body (since we've already
// handled the request body with our sink/accumulator.
wrappedAction(request).run()
}
}
}
Generally speaking, it probably isn't a good idea to use Lagom for that purpose. As noted on the GitHub issue on PlayServiceCall documentation:
Many use cases where we fallback to PlayServiceCall are related to presentation or HTTP-specific use (I18N, file upload, ...) which indicate: coupling of the lagom service to the presentation layer or coupling of the lagom service to the transport.
Quoting James Roper again (a few years back):
So currently, multipart/form-data is not supported in Lagom, at least not out of the box. You can drop down to a lower level Play API to handle it, but perhaps it would be better to handle it in a web gateway, where any files handled are uploaded directly to a storage service such as S3, and then a Lagom service might store the meta data associated with it.
You can also check the discussion here, which provides some more insight.

Show static content with parameters with Spray

I'm trying to serve an HTML page using Spray. It's fairly easy using getFromResource and getFromResourceDirectory, but I additionally need to pass some query parameters so that some Javascript on the page knows what to do. Is that possible ? All my prior attempts consisted in this kind of things
val route = path("show-repo") { serveResourceWithParams(SHOW_REPO) } ~ getFromResourceDirectory("web")
def serveWithParams(page: String, params: (String, String)*) = {
val url = page + (if (params.isEmpty) "" else "?") + params.map { case (k, v) => s"$k=$v" }.mkString("&")
getFromResource(url)
}
but I now realize it was a bit naive
I would create a new case class with members of type Array[Byte] and List[(String, String)].
case class FileWithParams(file: Array[Byte], params: List[(String, String)])
If you know the file won't be huge, you can read it in one shot with
FileUtils.readAllBytes
Assuming you're using(which you need to run Spray) https://github.com/sirthias/parboiled/blob/master/parboiled-core/src/main/java/org/parboiled/common/FileUtils.java
Then manually read the file contents, create the case class, and marshall it as you would normally do.
Of course, my preferred way would be to split this up into two async requests and wait for both to complete. You'll probably be waiting on the file route to complete anyway.

Parallel file processing in Scala

Suppose I need to process files in a given folder in parallel. In Java I would create a FolderReader thread to read file names from the folder and a pool of FileProcessor threads. FolderReader reads file names and submits the file processing function (Runnable) to the pool executor.
In Scala I see two options:
create a pool of FileProcessor actors and schedule a file processing function with Actors.Scheduler.
create an actor for each file name while reading the file names.
Does it make sense? What is the best option?
Depending on what you're doing, it may be as simple as
for(file<-files.par){
//process the file
}
I suggest with all my energies to keep as far as you can from the threads. Luckily we have better abstractions which take care of what's happening below, and in your case it appears to me that you do not need to use actors (while you can) but you can use a simpler abstraction, called Futures. They are a part of Akka open source library, and I think in the future will be a part of the Scala standard library as well.
A Future[T] is simply something that will return a T in the future.
All you need to run a future, is to have an implicit ExecutionContext, which you can derive from a java executor service. Then you will be able to enjoy the elegant API and the fact that a future is a monad to transform collections into collections of futures, collect the result and so on. I suggest you to give a look to http://doc.akka.io/docs/akka/2.0.1/scala/futures.html
object TestingFutures {
implicit val executorService = Executors.newFixedThreadPool(20)
implicit val executorContext = ExecutionContext.fromExecutorService(executorService)
def testFutures(myList:List[String]):List[String]= {
val listOfFutures : Future[List[String]] = Future.traverse(myList){
aString => Future{
aString.reverse
}
}
val result:List[String] = Await.result(listOfFutures,1 minute)
result
}
}
There's a lot going on here:
I am using Future.traverse which receives as a first parameter which is M[T]<:Traversable[T] and as second parameter a T => Future[T] or if you prefer a Function1[T,Future[T]] and returns Future[M[T]]
I am using the Future.apply method to create an anonymous class of type Future[T]
There are many other reasons to look at Akka futures.
Futures can be mapped because they are monad, i.e. you can chain Futures execution :
Future { 3 }.map { _ * 2 }.map { _.toString }
Futures have callback: future.onComplete, onSuccess, onFailure, andThen etc.
Futures support not only traverse, but also for comprehension
Ideally you should use two actors. One for reading the list of files, and one for actually reading the file.
You start the process by simply sending a single "start" message to the first actor. The actor can then read the list of files, and send a message to the second actor. The second actor then reads the file and processes the contents.
Having multiple actors, which might seem complicated, is actually a good thing in the sense that you have a bunch of objects communicating with eachother, like in a theoretical OO system.
Edit: you REALLY shouldn't be doing doing concurrent reading of a single file.
I was going to write up exactly what #Edmondo1984 did except he beat me to it. :) I second his suggestion in a big way. I'll also suggest that you read the documentation for Akka 2.0.2. As well, I'll give you a slightly more concrete example:
import akka.dispatch.{ExecutionContext, Future, Await}
import akka.util.duration._
import java.util.concurrent.Executors
import java.io.File
val execService = Executors.newCachedThreadPool()
implicit val execContext = ExecutionContext.fromExecutorService(execService)
val tmp = new File("/tmp/")
val files = tmp.listFiles()
val workers = files.map { f =>
Future {
f.getAbsolutePath()
}
}.toSeq
val result = Future.sequence(workers)
result.onSuccess {
case filenames =>
filenames.foreach { fn =>
println(fn)
}
}
// Artificial just to make things work for the example
Thread.sleep(100)
execContext.shutdown()
Here I use sequence instead of traverse, but the difference is going to depend on your needs.
Go with the Future, my friend; the Actor is just a more painful approach in this instance.
But if use actors, what's wrong with that?
If we have to read / write to some property file. There is my Java example. But still with Akka Actors.
Lest's say we have an actor ActorFile represents one file. Hm.. Probably it can not represent One file. Right? (would be nice it could). So then it represents several files like PropertyFilesActor then:
Why would not use something like this:
public class PropertyFilesActor extends UntypedActor {
Map<String, String> filesContent = new LinkedHashMap<String, String>();
{ // here we should use real files of cource
filesContent.put("file1.xml", "");
filesContent.put("file2.xml", "");
}
#Override
public void onReceive(Object message) throws Exception {
if (message instanceof WriteMessage) {
WriteMessage writeMessage = (WriteMessage) message;
String content = filesContent.get(writeMessage.fileName);
String newContent = content + writeMessage.stringToWrite;
filesContent.put(writeMessage.fileName, newContent);
}
else if (message instanceof ReadMessage) {
ReadMessage readMessage = (ReadMessage) message;
String currentContent = filesContent.get(readMessage.fileName);
// Send the current content back to the sender
getSender().tell(new ReadMessage(readMessage.fileName, currentContent), getSelf());
}
else unhandled(message);
}
}
...a message will go with parameter (fileName)
It has its own in-box, accepting messages like:
WriteLine(fileName, string)
ReadLine(fileName, string)
Those messages will be storing into to the in-box in the order, one after antoher. The actor would do its work by receiving messages from the box - storing/reading, and meanwhile sending feedback sender ! message back.
Thus, let's say if we write to the property file, and send showing the content on the web page. We can start showing page (right after we sent message to store a data to the file) and as soon as we received the feedback, update part of the page with a data from just updated file (by ajax).
Well, grab your files and stick them in a parallel structure
scala> new java.io.File("/tmp").listFiles.par
res0: scala.collection.parallel.mutable.ParArray[java.io.File] = ParArray( ... )
Then...
scala> res0 map (_.length)
res1: scala.collection.parallel.mutable.ParArray[Long] = ParArray(4943, 1960, 4208, 103266, 363 ... )