I have an Akka server that asks an actor for the mallet file (some output).
In the mallet actor, however, several steps are performed: files are read and modified, and new files are created and saved to the resources directory a couple of times.
I need these steps to run sequentially, so I am chaining the futures with map/flatMap.
However, the job throws a NullPointerException because the file needed by the next future does not exist yet, and as soon as I stop the server all the files appear in the resources directory.
I need each file to be written to the resources directory as soon as the corresponding future completes. Please suggest a solution.
Below is the code of my server
lazy val routes: Route = apiRoutes
Configuration.parser.parse(args,Configuration.ConfigurationOptions()) match {
case Some(config) =>
val serverBinding: Future[Http.ServerBinding] = Http().bindAndHandle(routes, config.interface, config.port)
serverBinding.onComplete {
case Success(bound) =>
println(s"Server online at http://${bound.localAddress.getHostString}:${bound.localAddress.getPort}/")
case Failure(e) =>
log.error("Server could not start: ", e)
system.terminate()
}
case None =>
system.terminate()
}
Await.result(system.whenTerminated, Duration.Inf)
}
Below is the code of the receive function:
def receive: Receive = {
case GetMalletOutput(malletFile) => createMalletResult(malletFile).pipeTo(sender())
}
def createMalletResult(malletFile: String): Future[MalletModel] = {
//sample malletResult
val topics = Array(Topic("1", "2").toJson)
var mM: Future[MalletModel] = Future{MalletModel("contentID", topics)}
//first Future to save the file in resources
def saveFile(malletFile: String): Future[String] = Future {
val res = MalletResult(malletFile)
val converted = res.Score.parseJson.convertTo[MalletRepo]
val fileName = converted.ContentId
val fileTemp = new File("src/main/resources/new_corpus/" + fileName)
val output = new BufferedWriter(new FileWriter("src/main/resources/new_corpus/" + fileName))
output.write(converted.ContentText)
//output.close()
malletFile
}
//second Future to use the resource file and create a new one
def t2v(malletFile: String): Future[String] = Future{
val tmpDir = "src/main/resources/"
logger.debug(tmpDir.toString)
logger.debug("t2v Started")
Text2Vectors.main(("--input " + tmpDir + "new_corpus/ --keep-sequence --remove-stopwords " + "--output " + tmpDir + "new_corpus.mallet --use-pipe-from " + tmpDir + "corpus.mallet").split(" "))
logger.debug("text2Vector Completed")
malletFile
}
//another Future to take the file from resources and save the new file back in resources
def infer(malletFile: String): Future[String] = Future {
val tmpDir = "src/main/resources/"
val tmpDirNew = "src/main/resources/inferResult/"
logger.debug("infer started")
InferTopics.main(("--input " + tmpDir + "new_corpus.mallet --inferencer " + tmpDir + "inferencer " + "--output-doc-topics " + tmpDirNew + "doc-topics-new.txt --num-iterations 1000").split(" "))
logger.debug("infer Completed")
malletFile
}
//final Future to return the requested output using the saved file
def response(malletFile: String): Future[MalletModel] = Future{
logger.debug("response Started")
val lines = Source.fromResource("src/main/resources/inferResult/doc-topics-new.txt")
.getLines
.toList
.drop(1) match {
case Nil => List.empty
case x :: xs => x.split(" ").drop(2).mkString(" ") :: xs
}
logger.debug("response On")
val result = MalletResult(malletFile)
val convert = result.Score.parseJson.convertTo[MalletRepo]
val contentID = convert.ContentId
val inFile = lines.mkString(" ")
val a = inFile.split(" ").zipWithIndex.collect { case (v, i) if (i % 2 == 0) =>
(v, i)
}.map(_._1)
val b = inFile.split(" ").zipWithIndex.collect { case (v, i) if (i % 2 != 0) =>
(v, i)
}.map(_._1)
val paired = a.zip(b) // [(s,t),(s,t)]
val topics = paired.map(x => Topic(x._2, x._1).toJson)
logger.debug("validating")
logger.debug("mallet results...")
logger.debug("response Done")
MalletModel(contentID, topics)
}
//calling one Future after another to run them sequentially
val result: Future[MalletModel] =
saveFile(malletFile).flatMap(malletFile =>
t2v(malletFile).flatMap(mf =>
infer(mf).flatMap(mf =>
response(mf))))
result
}
}
It seems I have found the answer myself.
The problem was that I was writing the file into the resources directory and then reading it back from there. To resolve the issue I wrote the file to a temp directory instead and read it back with a BufferedReader rather than Source, and it worked.
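For reference, here is a minimal sketch of that fix, keeping the MalletResult/MalletRepo types and JSON conversions from the question and assuming an implicit ExecutionContext is in scope; the temp-directory layout is illustrative. The two key points are closing the writer so the contents are flushed to disk before the next Future runs (the original code had output.close() commented out), and reading the result back from disk with a plain BufferedReader rather than Source.fromResource, which looks the file up on the classpath rather than on disk:
import java.io.{BufferedReader, BufferedWriter, FileReader, FileWriter}
import scala.concurrent.Future

// illustrative scratch directory instead of src/main/resources
val tmpDir = System.getProperty("java.io.tmpdir") + "/mallet/"

def saveFile(malletFile: String): Future[String] = Future {
  val converted = MalletResult(malletFile).Score.parseJson.convertTo[MalletRepo]
  val output = new BufferedWriter(new FileWriter(tmpDir + "new_corpus/" + converted.ContentId))
  try output.write(converted.ContentText)
  finally output.close() // flush + close so the file exists before the next Future starts
  malletFile
}

// used in the final `response` step instead of Source.fromResource
def readDocTopics(): List[String] = {
  val reader = new BufferedReader(new FileReader(tmpDir + "inferResult/doc-topics-new.txt"))
  try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
  finally reader.close()
}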
Related
I am new to Scala and am using the method def process below, but at build time I am getting deprecation warnings like "method wasSuccessful in class IOResult is deprecated (since 2.6.0): status is always set to Success(Done)" on the line case Success((sftpResult, updateList)) if sftpResult.wasSuccessful =>. My question is: how should I adapt my code to this? I apologise if the question seems stupid.
override def process(changedAfter: Option[ZonedDateTime], changedBefore: Option[ZonedDateTime])
(implicit fc: FlowContext): Future[(IOResult, Seq[Operation.Value])] = {
val now = nowUtc
val accountUpdatesFromDeletions: Source[AccountUpdate, NotUsed] =
dbService
.getDeleteActions(before = now)
.map(deletionEventToAccountUpdate)
val result = getAccountUpdates(changedAfter, changedBefore)
.merge(getErrorUpdates)
.merge(accountUpdatesFromDeletions)
.mapAsync(4)(writer.writePSVString)
.viaMat(creditBureauService.sendUpdate)(Keep.right)
.mapAsync(4)(au =>
for {
_ <- dbService.performUpdate(au)
_ <- performActionsDelete(now, au)
} yield au.operation
)
.toMat(Sink.seq)(Keep.both)
.withAttributes(ActorAttributes.supervisionStrategy(decider))
.run()
tupleFutureToFutureTuple(result) andThen {
case Success((sftpResult, updateList)) if sftpResult.wasSuccessful =>
val total = updateList.size
val deleted = updateList.count(_ == Operation.DELETED)
val updated = updateList.count(_ == Operation.UPDATED)
val inserted = updateList.count(_ == Operation.INSERTED)
log.info(s"SUCCESS! Uploaded $total accounts to Equifax.")
log.info(s"There were $deleted deletions, " +
s"$updated updates and " +
s"$inserted insertions to the database")
monitor.gauge("upload.process.batch.successful.total", total)
monitor.gauge("upload.process.batch.successful.deleted", deleted)
monitor.gauge("upload.process.batch.successful.updated", updated)
monitor.gauge("upload.process.batch.successful.inserted", inserted)
case Success((sftpResult, _)) if !sftpResult.wasSuccessful =>
val sw = new StringWriter
sftpResult.getError.printStackTrace(new PrintWriter(sw))
log.error("Failed to send data: " + sw.toString)
monitor.gauge("upload.process.batch.sftp.failed", 1)
case Failure(e: Throwable) =>
val sw = new StringWriter
e.printStackTrace(new PrintWriter(sw))
log.error(sw.toString)
monitor.gauge("upload.process.batch.failed", 1)
}
}
Both wasSuccessful and getError have been deprecated since Akka 2.6. Since an IOResult is in general wrapped in a Future, which by itself already tells you whether the I/O operation was a Success or a Failure, the status field of a successfully returned IOResult is essentially redundant and is now always set to Success(Done).
Given that, you could simplify/refactor your case matching snippet to something like below:
tupleFutureToFutureTuple(result) andThen {
case Success((sftpResult, updateList)) =>
val total = updateList.size
val deleted = updateList.count(_ == Operation.DELETED)
val updated = updateList.count(_ == Operation.UPDATED)
val inserted = updateList.count(_ == Operation.INSERTED)
log.info(s"SUCCESS! Uploaded $total accounts to Equifax.")
log.info(s"There were $deleted deletions, " +
s"$updated updates and " +
s"$inserted insertions to the database")
monitor.gauge("upload.process.batch.successful.total", total)
monitor.gauge("upload.process.batch.successful.deleted", deleted)
monitor.gauge("upload.process.batch.successful.updated", updated)
monitor.gauge("upload.process.batch.successful.inserted", inserted)
case Failure(e: Throwable) =>
val sw = new StringWriter
e.printStackTrace(new PrintWriter(sw))
log.error(sw.toString)
monitor.gauge("upload.process.batch.failed", 1)
}
Please pardon me if the question sounds naive, as I am pretty new to Akka.
I am trying to use the Mallet API from a Scala/Akka REST API, but I am getting the error "Request timeout encountered".
Below is a snapshot of my server:
Configuration.parser.parse(args,Configuration.ConfigurationOptions()) match {
case Some(config) =>
val serverBinding: Future[Http.ServerBinding] = Http().bindAndHandle(routes, config.interface, config.port)
serverBinding.onComplete {
case Success(bound) =>
println(s"Server online at http://${bound.localAddress.getHostString}:${bound.localAddress.getPort}/")
case Failure(e) =>
log.error("Server could not start: ", e)
system.terminate()
}
case None =>
system.terminate()
}
Await.result(system.whenTerminated, Duration.Inf)
Snapshot of the router:
lazy val apiRoutes: Route =
ignoreTrailingSlash {
pathSingleSlash {
complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, "<html><body>Hello world!</body></html>"))
} ~
path("health") {
complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, "ok"))
} ~
pathPrefix("mallet") {
parameter('malletFile) { malletFile =>
{val modelFile: Future[MalletModel] =
(malletActor ? GetMalletOutput(malletFile)).mapTo[MalletModel]
complete(modelFile)
}
}
}
}
And finally, the snapshot of MalletActor:
class MalletActor(implicit val uaCache: Cache[String, MalletModel],
implicit val executionContext: ExecutionContext)
extends Actor with ActorLogging with JsonSupport {
import MalletActor._
def receive: Receive = {
case GetMalletOutput(malletFile) => sender() ! createMalletResult2(malletFile)
}
def createMalletResult2(malletFile: String): MalletModel = {
logger.debug("run count...")
val res = MalletResult(malletFile)
val converted = res.Score.parseJson.convertTo[MalletRepo]
val fileName = converted.ContentId
val fileTemp = new File("src/main/resources/new_corpus/" + fileName)
if (fileTemp.exists) {
fileTemp.delete()
}
val output = new BufferedWriter(new FileWriter("src/main/resources/new_corpus/" + fileName))
output.write(converted.ContentText)
output.close()
//runMalletInferring()
val tmpDir = "src/main/resources/"
logger.debug("Import all documents to mallet...")
Text2Vectors.main(("--input " + tmpDir + "new_corpus/ --keep-sequence --remove-stopwords " + "--output " + tmpDir + "new_corpus.mallet --use-pipe-from " + tmpDir + "corpus.mallet").split(" "))
logger.debug("Run training process...")
InferTopics.main(("--input " + tmpDir + "new_corpus.mallet --inferencer " + tmpDir + "inferencer " + "--output-doc-topics " + tmpDir + "doc-topics-new.txt --num-iterations 1000").split(" "))
logger.debug("Inferring process finished.")
I am getting the error while calling Text2Vectors.main. The new vectorized file is created in the new_corpus directory and new_corpus is generated as well; however, after that I get the error below:
Server online at http://127.0.0.1:9000/
12:45:38.622 [Sophi-Mallet-Api-akka.actor.default-dispatcher-3] DEBUG io.sophi.api.mallet.actors.MalletActor$ - run count...
12:45:38.634 [Sophi-Mallet-Api-akka.actor.default-dispatcher-3] DEBUG io.sophi.api.mallet.actors.MalletActor$ - Import all documents to mallet...
Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file.
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.
May 26, 2019 12:45:38 PM cc.mallet.classify.tui.Text2Vectors main
INFO: Labels =
May 26, 2019 12:45:38 PM cc.mallet.classify.tui.Text2Vectors main
INFO: src/main/resources/new_corpus/
May 26, 2019 12:45:46 PM cc.mallet.classify.tui.Text2Vectors main
INFO: rewriting previous instance list, with ID = 4e0a5a65540221c3:d508579:14b2ca15a26:-7ff7
[INFO] [05/26/2019 12:45:58.476] [Sophi-Mallet-Api-akka.actor.default-dispatcher-4] [akka.actor.ActorSystemImpl(Sophi-Mallet-Api)] Request timeout encountered for request [GET /mallet Empty]
In the web browser I am also getting the error:
The server was not able to produce a timely response to your request.
Please try again in a short while!
I figured out the solution myself. Please correct me if it is still vague.
There were a couple of changes needed.
First, the application configuration file (application.conf) needed to be updated with the settings below:
server{
idle-timeout = infinite
request-timeout = infinite
}
settings {
akka-workers-count = 100
akka-workers-count = ${?AKKA_WORKERS_COUNT}
actor-timeout = 100
actor-timeout = ${?ACTOR_TIMEOUT}
}
And the issue needed to be handled in the server code by wrapping the API call in withRequestTimeout with a custom timeout response:
val timeoutResponse = HttpResponse(
StatusCodes.EnhanceYourCalm,
entity = "Running Mallet modelling.")
lazy val apiRoutes: Route =
ignoreTrailingSlash {
pathSingleSlash {
complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, "<html><body>Hello world!</body></html>"))
} ~
path("health") {
complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, "ok"))
} ~
pathPrefix("mallet") {
withRequestTimeout(5.minute, request => timeoutResponse) {
parameter('malletFile) { malletFile => {
val modelFile: Future[MalletModel] =
(malletActor ? GetMalletOutput(malletFile)).mapTo[MalletModel]
complete(modelFile)
}
}
}
}
}
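For completeness, the ask (?) used in the route also needs an implicit akka.util.Timeout that is at least as generous as the request timeout, otherwise the ask itself can fail before Mallet finishes. Below is a rough sketch of wiring the actor-timeout setting above into that implicit; the config path and value names here are illustrative:
import scala.concurrent.duration._
import akka.util.Timeout
import com.typesafe.config.ConfigFactory

// illustrative: read the actor-timeout value (in seconds) from application.conf
val config = ConfigFactory.load()
implicit val askTimeout: Timeout = Timeout(config.getInt("settings.actor-timeout").seconds)

// the implicit is then picked up by the ask in the route:
// (malletActor ? GetMalletOutput(malletFile)).mapTo[MalletModel]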
I'm new to Akka Streams. I used the following code for CSV parsing.
class CsvParser(config: Config)(implicit system: ActorSystem) extends LazyLogging with NumberValidation {
import system.dispatcher
private val importDirectory = Paths.get(config.getString("importer.import-directory")).toFile
private val linesToSkip = config.getInt("importer.lines-to-skip")
private val concurrentFiles = config.getInt("importer.concurrent-files")
private val concurrentWrites = config.getInt("importer.concurrent-writes")
private val nonIOParallelism = config.getInt("importer.non-io-parallelism")
def save(r: ValidReading): Future[Unit] = {
Future()
}
def parseLine(filePath: String)(line: String): Future[Reading] = Future {
val fields = line.split(";")
val id = fields(0).toInt
try {
val value = fields(1).toDouble
ValidReading(id, value)
} catch {
case t: Throwable =>
logger.error(s"Unable to parse line in $filePath:\n$line: ${t.getMessage}")
InvalidReading(id)
}
}
val lineDelimiter: Flow[ByteString, ByteString, NotUsed] =
Framing.delimiter(ByteString("\n"), 128, allowTruncation = true)
val parseFile: Flow[File, Reading, NotUsed] =
Flow[File].flatMapConcat { file =>
val src = FileSource.fromFile(file).getLines()
val source : Source[String, NotUsed] = Source.fromIterator(() => src)
// val gzipInputStream = new GZIPInputStream(new FileInputStream(file))
source
.mapAsync(parallelism = nonIOParallelism)(parseLine(file.getPath))
}
val computeAverage: Flow[Reading, ValidReading, NotUsed] =
Flow[Reading].grouped(2).mapAsyncUnordered(parallelism = nonIOParallelism) { readings =>
Future {
val validReadings = readings.collect { case r: ValidReading => r }
val average = if (validReadings.nonEmpty) validReadings.map(_.value).sum / validReadings.size else -1
ValidReading(readings.head.id, average)
}
}
val storeReadings: Sink[ValidReading, Future[Done]] =
Flow[ValidReading]
.mapAsyncUnordered(concurrentWrites)(save)
.toMat(Sink.ignore)(Keep.right)
val processSingleFile: Flow[File, ValidReading, NotUsed] =
Flow[File]
.via(parseFile)
.via(computeAverage)
def importFromFiles = {
implicit val materializer = ActorMaterializer()
val files = importDirectory.listFiles.toList
logger.info(s"Starting import of ${files.size} files from ${importDirectory.getPath}")
val startTime = System.currentTimeMillis()
val balancer = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val balance = builder.add(Balance[File](concurrentFiles))
val merge = builder.add(Merge[ValidReading](concurrentFiles))
(1 to concurrentFiles).foreach { _ =>
balance ~> processSingleFile ~> merge
}
FlowShape(balance.in, merge.out)
}
Source(files)
.via(balancer)
.withAttributes(ActorAttributes.supervisionStrategy { e =>
logger.error("Exception thrown during stream processing", e)
Supervision.Resume
})
.runWith(storeReadings)
.andThen {
case Success(_) =>
val elapsedTime = (System.currentTimeMillis() - startTime) / 1000.0
logger.info(s"Import finished in ${elapsedTime}s")
case Failure(e) => logger.error("Import failed", e)
}
}
}
I wanted to use Akka HTTP to return all the ValidReading entities parsed from the CSV, but I couldn't understand how to do that.
The above code fetches a file from the server and parses each line to generate a ValidReading.
How can I pass/upload a CSV via akka-http, parse the file, and stream the resulting response back to the endpoint?
The "essence" of the solution is something like this:
import akka.http.scaladsl.server.Directives._
val route = fileUpload("csv") {
case (metadata, byteSource) =>
val source = byteSource.map(x => x)
complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, source)))
}
You detect that the uploaded entity is multipart/form-data with a part named "csv", and you get the byteSource from it. Do the calculation (insert your logic into the .map(x => x) part), convert your data back to ByteString, and complete the request with the new source. This makes your endpoint act like a proxy.
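Putting that together with the parsing logic from the question, a rough sketch might look like the following; parseLine and ValidReading are taken from the question, while the framing, the parallelism, and the way readings are rendered back to CSV are assumptions:
import akka.http.scaladsl.model.{ContentTypes, HttpEntity, HttpResponse}
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.Framing
import akka.util.ByteString

val route = fileUpload("csv") {
  case (metadata, byteSource) =>
    val parsed = byteSource
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
      .map(_.utf8String)
      .mapAsync(parallelism = 4)(parseLine(metadata.fileName)) // reuse the question's parser
      .collect { case r: ValidReading => r }                   // drop InvalidReading elements
      .map(r => ByteString(s"${r.id};${r.value}\n"))           // render each reading back as a CSV line
    complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, parsed)))
}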
So, I have this code in Scala which I am converting to use managed resources.
val file_out = new FileOutputStream(new java.io.File(filePath, resultFile + ".tar.gz"));
val buffer_out = new BufferedOutputStream(file_out);
val gzip_out = new GzipCompressorOutputStream(buffer_out);
val tar_out = new TarArchiveOutputStream(gzip_out);
try {
addFileToTarGz(tar_out, filePath + "/" + resultFolder, "");
} finally {
tar_out.finish();
tar_out.close();
gzip_out.close();
buffer_out.close();
file_out.close();
}
First attempt is:
val file = new java.io.File(filePath, s"$resultFile.tar.gz")
managed(new FileOutputStream(file))
.acquireAndGet(stream => managed(new BufferedOutputStream(stream))
.acquireAndGet(buffer => managed(new GzipCompressorOutputStream(buffer))
.acquireAndGet(gzip => managed(new TarArchiveOutputStream(gzip))
.acquireAndGet(tar => {
try {
addFileToTarGz(tar, filePath + "/" + resultFolder, "")
} finally {
tar.finish()
}
}))))
However, it doesn't look very readable. Is there a better way to make it managed but also readable?
Have you considered the loan pattern?
def withResource[T](block: Resource => T): T = {
val resource = new Resource
try {
block(resource)
} finally {
resource.close()
}
}
Then you would use it like this:
withResource { resource =>
// do something with resource
}
If you used a separate loan method for each of those files you would end up with nested blocks (which under some circumstances might be the most reasonable choice), but here I guess it will be enough to do:
def withTarStream[T](filePath: String, resultFile: String)(block: TarArchiveOutputStream => T): T = {
val fileOut = new FileOutputStream(new java.io.File(filePath, resultFile))
val bufferOut = new BufferedOutputStream(fileOut)
val gzipOut = new GzipCompressorOutputStream(bufferOut)
val tarOut = new TarArchiveOutputStream(gzipOut)
try {
block(tarOut)
} finally {
tarOut.finish()
tarOut.close()
gzipOut.close()
bufferOut.close()
fileOut.close()
}
}
used like:
withTarStream(filePath, s"$resultFile.tar.gz") { tarStream =>
addFileToTarGz(tarStream, filePath + "/" + resultFolder, "")
}
Based on @Mateusz Kubuszok's suggestion, I tried these variations:
private def withResource[T: Resource : Manifest, X](t: T, block: T => X): X = managed(t).acquireAndGet(x => block(x))
withResource(new FileOutputStream(file),
(x:FileOutputStream) => withResource(new BufferedOutputStream(x),
(y: BufferedOutputStream) => withResource(new GzipCompressorOutputStream(y),
(z: GzipCompressorOutputStream) => withResource(new TarArchiveOutputStream(z),
(tar: TarArchiveOutputStream) => writeTar(tar, filePath, resultFolder)))));
And then I also refactored the above into the following form:
private def withResource[T: Resource : Manifest, X](t: T, block: T => X): X = managed(t).acquireAndGet(x => block(x))
def writeInFile(x: FileOutputStream): Try[Unit] = withResource(new BufferedOutputStream(x), writeInBuffer)
def writeInBuffer(y: BufferedOutputStream): Try[Unit] = withResource(new GzipCompressorOutputStream(y), writeGzipStream)
def writeGzipStream(z: GzipCompressorOutputStream): Try[Unit] = withResource(new TarArchiveOutputStream(z), writeTarStream)
val file = new File(filePath, s"$resultFile.tar.gz")
withResource(new FileOutputStream(file), writeInFile);
The next day, a colleague mentioned this, which looks better than both of the above. I am still exploring how to propagate the result/error out of this block:
val file = new File(filePath, s"$resultFile.tar.gz")
for {
stream <- managed(new FileOutputStream(file))
buffer <- managed(new BufferedOutputStream(stream))
gzip <- managed(new GzipCompressorOutputStream(buffer))
tar <- managed(new TarArchiveOutputStream(gzip))
} {
writeTar(tar)
}
Similar to what @cchantep suggested, I ended up doing this:
val tarOutputStream: ManagedResource[TarArchiveOutputStream] = (for {
stream <- managed(new FileOutputStream(file))
buffer <- managed(new BufferedOutputStream(stream))
gzip <- managed(new GzipCompressorOutputStream(buffer))
tar <- managed(new TarArchiveOutputStream(gzip))
} yield tar)
Try (tarOutputStream.acquireAndGet(writeTarStream(_))) match {
case Failure(e) => Failure(e)
case Success(_) => Success(new File(s"$filePath/$runLabel.tar.gz"))
}
Over the past few days I have been trying to figure out the best way to download an HTTP resource to a file using Akka Streams and HTTP.
Initially I started with the Future-Based Variant and that looked something like this:
def downloadViaFutures(uri: Uri, file: File): Future[Long] = {
val request = Get(uri)
val responseFuture = Http().singleRequest(request)
responseFuture.flatMap { response =>
val source = response.entity.dataBytes
source.runWith(FileIO.toFile(file))
}
}
That was kind of okay but once I learnt more about pure Akka Streams I wanted to try and use the Flow-Based Variant to create a stream starting from a Source[HttpRequest]. At first this completely stumped me until I stumbled upon the flatMapConcat flow transformation. This ended up a little more verbose:
def responseOrFail[T](in: (Try[HttpResponse], T)): (HttpResponse, T) = in match {
case (responseTry, context) => (responseTry.get, context)
}
def responseToByteSource[T](in: (HttpResponse, T)): Source[ByteString, Any] = in match {
case (response, _) => response.entity.dataBytes
}
def downloadViaFlow(uri: Uri, file: File): Future[Long] = {
val request = Get(uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
source.
via(requestResponseFlow).
map(responseOrFail).
flatMapConcat(responseToByteSource).
runWith(FileIO.toFile(file))
}
Then I wanted to get a little tricky and use the Content-Disposition header.
Going back to the Future-Based Variant:
def destinationFile(downloadDir: File, response: HttpResponse): File = {
val fileName = response.header[ContentDisposition].get.value
val file = new File(downloadDir, fileName)
file.createNewFile()
file
}
def downloadViaFutures2(uri: Uri, downloadDir: File): Future[Long] = {
val request = Get(uri)
val responseFuture = Http().singleRequest(request)
responseFuture.flatMap { response =>
val file = destinationFile(downloadDir, response)
val source = response.entity.dataBytes
source.runWith(FileIO.toFile(file))
}
}
But now I have no idea how to do this with the Flow-Based Variant. This is as far as I got:
def responseToByteSourceWithDest[T](in: (HttpResponse, T), downloadDir: File): Source[(ByteString, File), Any] = in match {
case (response, _) =>
val source = responseToByteSource(in)
val file = destinationFile(downloadDir, response)
source.map((_, file))
}
def downloadViaFlow2(uri: Uri, downloadDir: File): Future[Long] = {
val request = Get(uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
val sourceWithDest: Source[(ByteString, File), Unit] = source.
via(requestResponseFlow).
map(responseOrFail).
flatMapConcat(responseToByteSourceWithDest(_, downloadDir))
sourceWithDest.runWith(???)
}
So now I have a Source that will emit one or more (ByteString, File) elements for each File (I say each File since there is no reason the original Source has to be a single HttpRequest).
Is there anyway to take these and route them to a dynamic Sink?
I'm thinking something like flatMapConcat, such as:
def runWithMap[T, Mat2](f: T => Graph[SinkShape[Out], Mat2])(implicit materializer: Materializer): Mat2 = ???
So that I could complete downloadViaFlow2 with:
def destToSink(destination: File): Sink[(ByteString, File), Future[Long]] = {
val sink = FileIO.toFile(destination, true)
Flow[(ByteString, File)].map(_._1).toMat(sink)(Keep.right)
}
sourceWithDest.runWithMap {
case (_, file) => destToSink(file)
}
The solution does not require a flatMapConcat. If you don't need any return values from the file writing then you can use Sink.foreach:
def writeFile(downloadDir : File)(httpResponse : HttpResponse) : Future[Long] = {
val file = destinationFile(downloadDir, httpResponse)
httpResponse.entity.dataBytes.runWith(FileIO.toFile(file))
}
def downloadViaFlow2(uri: Uri, downloadDir: File) : Future[Unit] = {
val request = HttpRequest(uri=uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.runWith(Sink.foreach(writeFile(downloadDir)))
}
Note that the Sink.foreach creates Futures from the writeFile function. Therefore there's not much back-pressure involved. The writeFile could be slowed down by the hard drive but the stream would keep generating Futures. To control this you can use Flow.mapAsyncUnordered (or Flow.mapAsync):
val parallelism = 10
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.mapAsyncUnordered(parallelism)(writeFile(downloadDir))
.runWith(Sink.ignore)
If you want to accumulate the Long values for a total count you need to combine with a Sink.fold:
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.mapAsyncUnordered(parallelism)(writeFile(downloadDir))
.runWith(Sink.fold(0L)(_ + _))
The fold will keep a running sum and emit the final value when the source of requests has dried up.
Using the Play Web Services client injected as ws, remembering to import scala.concurrent.duration._:
def downloadFromUrl(url: String)(ws: WSClient): Future[Try[File]] = {
val file = File.createTempFile("my-prefix", ".tmp", new File("/tmp"))
file.deleteOnExit()
val futureResponse: Future[WSResponse] =
ws.url(url).withMethod("GET").withRequestTimeout(5 minutes).stream()
futureResponse.flatMap { res =>
res.status match {
case 200 =>
val outputStream = java.nio.file.Files.newOutputStream(file.toPath)
val sink = Sink.foreach[ByteString] { bytes => outputStream.write(bytes.toArray) }
res.bodyAsSource.runWith(sink).andThen {
case result =>
outputStream.close()
result.get
} map (_ => Success(file))
case other => Future(Failure[File](new Exception("HTTP Failure, response code " + other + " : " + res.statusText)))
}
}
}
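As a side note, the manual OutputStream plus Sink.foreach above could probably be replaced with Akka's built-in file sink, which closes the file for you. Below is a hedged sketch of that variant, assuming an implicit Materializer and ExecutionContext are in scope (as they already must be for the runWith and flatMap calls above):
import java.io.File
import akka.stream.scaladsl.FileIO
import play.api.libs.ws.WSResponse
import scala.concurrent.Future
import scala.util.{Failure, Success, Try}

// stream the response body straight into the target file
def streamToFile(res: WSResponse, file: File): Future[Try[File]] =
  res.bodyAsSource
    .runWith(FileIO.toPath(file.toPath))
    .map(_ => Success(file): Try[File]) // the IOResult's status is redundant here
    .recover { case e => Failure(e) }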