managed (ARM) in Scala for nested resources

So, I have this code in Scala which I am converting to scala-arm's managed.
val file_out = new FileOutputStream(new java.io.File(filePath, resultFile + ".tar.gz"));
val buffer_out = new BufferedOutputStream(file_out);
val gzip_out = new GzipCompressorOutputStream(buffer_out);
val tar_out = new TarArchiveOutputStream(gzip_out);
try {
  addFileToTarGz(tar_out, filePath + "/" + resultFolder, "");
} finally {
  tar_out.finish();
  tar_out.close();
  gzip_out.close();
  buffer_out.close();
  file_out.close();
}
First attempt is:
val file = new java.io.File(filePath, s"$resultFile.tar.gz")
managed(new FileOutputStream(file))
  .acquireAndGet(stream => managed(new BufferedOutputStream(stream))
    .acquireAndGet(buffer => managed(new GzipCompressorOutputStream(buffer))
      .acquireAndGet(gzip => managed(new TarArchiveOutputStream(gzip))
        .acquireAndGet(tar => {
          try {
            addFileToTarGz(tar, filePath + "/" + resultFolder, "")
          } finally {
            tar.finish()
          }
        }))))
However, it doesn't look very readable. Is there a better way to make it managed but also readable?

Have you considered the loan pattern?
def withResource[T](block: Resource => T): T = {
  val resource = new Resource
  try {
    block(resource)
  } finally {
    resource.close()
  }
}
then you would use it like:
withResource { resource =>
  // do something with resource
}
If you used a separate loan for each of those streams you would end up with nested blocks... (which under some circumstances might be the most reasonable choice), but here I guess it will be enough to do:
def withTarStream[T](filePath: String, resultFile: String)(block: TarArchiveOutputStream => T): T = {
  val fileOut = new FileOutputStream(new java.io.File(filePath, resultFile))
  val bufferOut = new BufferedOutputStream(fileOut)
  val gzipOut = new GzipCompressorOutputStream(bufferOut)
  val tarOut = new TarArchiveOutputStream(gzipOut)
  try {
    block(tarOut)
  } finally {
    tarOut.finish()
    tarOut.close()
    gzipOut.close()
    bufferOut.close()
    fileOut.close()
  }
}
used like:
withTarStream(filePath, s"$resultFile.tar.gz") { tarStream =>
  addFileToTarGz(tarStream, filePath + "/" + resultFolder, "")
}

Based on @Mateusz Kubuszok's suggestion, I tried these variations:
private def withResource[T: Resource : Manifest, X](t: T, block: T => X): X =
  managed(t).acquireAndGet(x => block(x))

withResource(new FileOutputStream(file),
  (x: FileOutputStream) => withResource(new BufferedOutputStream(x),
    (y: BufferedOutputStream) => withResource(new GzipCompressorOutputStream(y),
      (z: GzipCompressorOutputStream) => withResource(new TarArchiveOutputStream(z),
        (tar: TarArchiveOutputStream) => writeTar(tar, filePath, resultFolder)))));
And then I also refactored the above into the following form:
private def withResource[T: Resource : Manifest, X](t: T, block: T => X): X =
  managed(t).acquireAndGet(x => block(x))

def writeInFile(x: FileOutputStream): Try[Unit] = withResource(new BufferedOutputStream(x), writeInBuffer)
def writeInBuffer(y: BufferedOutputStream): Try[Unit] = withResource(new GzipCompressorOutputStream(y), writeGzipStream)
def writeGzipStream(z: GzipCompressorOutputStream): Try[Unit] = withResource(new TarArchiveOutputStream(z), writeTarStream)

val file = new File(filePath, s"$resultFile.tar.gz")
withResource(new FileOutputStream(file), writeInFile);
The next day, a colleague mentioned this, which looks better than both of the above (I am still exploring how to propagate the result/error out of this block):
val file = new File(filePath, s"$resultFile.tar.gz")
for {
  stream <- managed(new FileOutputStream(file))
  buffer <- managed(new BufferedOutputStream(stream))
  gzip   <- managed(new GzipCompressorOutputStream(buffer))
  tar    <- managed(new TarArchiveOutputStream(gzip))
} {
  writeTar(tar)
}
Similar to what @cchantep suggested, I ended up doing this:
val tarOutputStream: ManagedResource[TarArchiveOutputStream] = (for {
  stream <- managed(new FileOutputStream(file))
  buffer <- managed(new BufferedOutputStream(stream))
  gzip   <- managed(new GzipCompressorOutputStream(buffer))
  tar    <- managed(new TarArchiveOutputStream(gzip))
} yield tar)

Try(tarOutputStream.acquireAndGet(writeTarStream(_))) match {
  case Failure(e) => Failure(e)
  case Success(_) => Success(new File(s"$filePath/$runLabel.tar.gz"))
}
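As a small follow-up, the Failure/Success match above only re-wraps the values it already has, so it can be collapsed with map (same writeTarStream, filePath and runLabel as above):

Try(tarOutputStream.acquireAndGet(writeTarStream(_)))
  .map(_ => new File(s"$filePath/$runLabel.tar.gz"))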

Related

Scala future not writing to Resource directory unless terminated

I have an Akka server that asks an actor for the Mallet file (some output).
Inside the Mallet actor, several steps are performed: files are read, modified, and new files are created and saved in the resources directory a couple of times.
I need the actor's work to run sequentially, so I chain the futures with map/flatMap.
However, the job gets a NullPointerException because the file the next future needs does not exist yet.
And as soon as I stop the server, all the files get generated in the resources directory.
I need the files in the resources directory as soon as each individual future completes. Please suggest a fix.
Below is the code of my server:
lazy val routes: Route = apiRoutes

Configuration.parser.parse(args, Configuration.ConfigurationOptions()) match {
  case Some(config) =>
    val serverBinding: Future[Http.ServerBinding] = Http().bindAndHandle(routes, config.interface, config.port)
    serverBinding.onComplete {
      case Success(bound) =>
        println(s"Server online at http://${bound.localAddress.getHostString}:${bound.localAddress.getPort}/")
      case Failure(e) =>
        log.error("Server could not start: ", e)
        system.terminate()
    }
  case None =>
    system.terminate()
}
Await.result(system.whenTerminated, Duration.Inf)
}
Below is the code of the receive function:
def receive: Receive = {
  case GetMalletOutput(malletFile) => createMalletResult(malletFile).pipeTo(sender())
}

def createMalletResult(malletFile: String): Future[MalletModel] = {
  // sample malletResult
  val topics = Array(Topic("1", "2").toJson)
  var mM: Future[MalletModel] = Future { MalletModel("contentID", topics) }

  // first Future: save the file in resources
  def saveFile(malletFile: String): Future[String] = Future {
    val res = MalletResult(malletFile)
    val converted = res.Score.parseJson.convertTo[MalletRepo]
    val fileName = converted.ContentId
    val fileTemp = new File("src/main/resources/new_corpus/" + fileName)
    val output = new BufferedWriter(new FileWriter("src/main/resources/new_corpus/" + fileName))
    output.write(converted.ContentText)
    //output.close()
    malletFile
  }

  // second Future: use the resource file and create a new one
  def t2v(malletFile: String): Future[String] = Future {
    val tmpDir = "src/main/resources/"
    logger.debug(tmpDir.toString)
    logger.debug("t2v Started")
    Text2Vectors.main(("--input " + tmpDir + "new_corpus/ --keep-sequence --remove-stopwords " + "--output " + tmpDir + "new_corpus.mallet --use-pipe-from " + tmpDir + "corpus.mallet").split(" "))
    logger.debug("text2Vector Completed")
    malletFile
  }

  // another Future: take the file from resources and save a new file back into resources
  def infer(malletFile: String): Future[String] = Future {
    val tmpDir = "src/main/resources/"
    val tmpDirNew = "src/main/resources/inferResult/"
    logger.debug("infer started")
    InferTopics.main(("--input " + tmpDir + "new_corpus.mallet --inferencer " + tmpDir + "inferencer " + "--output-doc-topics " + tmpDirNew + "doc-topics-new.txt --num-iterations 1000").split(" "))
    logger.debug("infer Completed")
    malletFile
  }

  // final Future: return the requested output using the saved file
  def response(malletFile: String): Future[MalletModel] = Future {
    logger.debug("response Started")
    val lines = Source.fromResource("src/main/resources/inferResult/doc-topics-new.txt")
      .getLines
      .toList
      .drop(1) match {
        case Nil => List.empty
        case x :: xs => x.split(" ").drop(2).mkString(" ") :: xs
      }
    logger.debug("response On")
    val result = MalletResult(malletFile)
    val convert = result.Score.parseJson.convertTo[MalletRepo]
    val contentID = convert.ContentId
    val inFile = lines.mkString(" ")
    val a = inFile.split(" ").zipWithIndex.collect { case (v, i) if (i % 2 == 0) => (v, i) }.map(_._1)
    val b = inFile.split(" ").zipWithIndex.collect { case (v, i) if (i % 2 != 0) => (v, i) }.map(_._1)
    val paired = a.zip(b) // [(s,t),(s,t)]
    val topics = paired.map(x => Topic(x._2, x._1).toJson)
    logger.debug("validating")
    logger.debug("mallet results...")
    logger.debug("response Done")
    MalletModel(contentID, topics)
  }

  // call one Future after another to run them sequentially
  val result: Future[MalletModel] =
    saveFile(malletFile).flatMap(malletFile =>
      t2v(malletFile).flatMap(mf =>
        infer(mf).flatMap(mf =>
          response(mf))))
  result
}
}
It seems I have found the answer myself.
The problem was that I was trying to write the file into the resources directory and read it back from there. To resolve the issue I wrote the file into a temp directory instead and read it back with a BufferedReader rather than Source, and it worked.
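For reference, a minimal sketch of that change (the file name and content here are placeholders standing in for converted.ContentId and converted.ContentText from the code above; closing the writer also guarantees the content is flushed to disk before the next future reads it):

import java.io.{BufferedReader, BufferedWriter, File, FileReader, FileWriter}

val tmpDir = System.getProperty("java.io.tmpdir")   // write outside src/main/resources
val outFile = new File(tmpDir, "contentId.txt")     // placeholder; the real code uses converted.ContentId

// Write and close the writer so the data hits the disk immediately.
val output = new BufferedWriter(new FileWriter(outFile))
try output.write("contentText") finally output.close()   // placeholder for converted.ContentText

// Read it back with a BufferedReader instead of Source.fromResource.
val reader = new BufferedReader(new FileReader(outFile))
val lines =
  try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
  finally reader.close()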

How to use Akka Streams with Akka HTTP to stream the response

I'm new to Akka Streams. I used the following code for CSV parsing.
class CsvParser(config: Config)(implicit system: ActorSystem) extends LazyLogging with NumberValidation {

  import system.dispatcher

  private val importDirectory = Paths.get(config.getString("importer.import-directory")).toFile
  private val linesToSkip = config.getInt("importer.lines-to-skip")
  private val concurrentFiles = config.getInt("importer.concurrent-files")
  private val concurrentWrites = config.getInt("importer.concurrent-writes")
  private val nonIOParallelism = config.getInt("importer.non-io-parallelism")

  def save(r: ValidReading): Future[Unit] = {
    Future()
  }

  def parseLine(filePath: String)(line: String): Future[Reading] = Future {
    val fields = line.split(";")
    val id = fields(0).toInt
    try {
      val value = fields(1).toDouble
      ValidReading(id, value)
    } catch {
      case t: Throwable =>
        logger.error(s"Unable to parse line in $filePath:\n$line: ${t.getMessage}")
        InvalidReading(id)
    }
  }

  val lineDelimiter: Flow[ByteString, ByteString, NotUsed] =
    Framing.delimiter(ByteString("\n"), 128, allowTruncation = true)

  val parseFile: Flow[File, Reading, NotUsed] =
    Flow[File].flatMapConcat { file =>
      val src = FileSource.fromFile(file).getLines()
      val source: Source[String, NotUsed] = Source.fromIterator(() => src)
      // val gzipInputStream = new GZIPInputStream(new FileInputStream(file))
      source
        .mapAsync(parallelism = nonIOParallelism)(parseLine(file.getPath))
    }

  val computeAverage: Flow[Reading, ValidReading, NotUsed] =
    Flow[Reading].grouped(2).mapAsyncUnordered(parallelism = nonIOParallelism) { readings =>
      Future {
        val validReadings = readings.collect { case r: ValidReading => r }
        val average = if (validReadings.nonEmpty) validReadings.map(_.value).sum / validReadings.size else -1
        ValidReading(readings.head.id, average)
      }
    }

  val storeReadings: Sink[ValidReading, Future[Done]] =
    Flow[ValidReading]
      .mapAsyncUnordered(concurrentWrites)(save)
      .toMat(Sink.ignore)(Keep.right)

  val processSingleFile: Flow[File, ValidReading, NotUsed] =
    Flow[File]
      .via(parseFile)
      .via(computeAverage)

  def importFromFiles = {
    implicit val materializer = ActorMaterializer()

    val files = importDirectory.listFiles.toList
    logger.info(s"Starting import of ${files.size} files from ${importDirectory.getPath}")

    val startTime = System.currentTimeMillis()

    val balancer = GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._

      val balance = builder.add(Balance[File](concurrentFiles))
      val merge = builder.add(Merge[ValidReading](concurrentFiles))

      (1 to concurrentFiles).foreach { _ =>
        balance ~> processSingleFile ~> merge
      }

      FlowShape(balance.in, merge.out)
    }

    Source(files)
      .via(balancer)
      .withAttributes(ActorAttributes.supervisionStrategy { e =>
        logger.error("Exception thrown during stream processing", e)
        Supervision.Resume
      })
      .runWith(storeReadings)
      .andThen {
        case Success(_) =>
          val elapsedTime = (System.currentTimeMillis() - startTime) / 1000.0
          logger.info(s"Import finished in ${elapsedTime}s")
        case Failure(e) => logger.error("Import failed", e)
      }
  }
}
I wanted to use Akka HTTP to return all the ValidReading entities parsed from the CSV, but I couldn't understand how to do that.
The above code fetches a file from the server and parses each line to generate a ValidReading.
How can I pass/upload a CSV via akka-http, parse the file, and stream the resulting response back to the endpoint?
The "essence" of the solution is something like this:
import akka.http.scaladsl.server.Directives._

val route = fileUpload("csv") {
  case (metadata, byteSource) =>
    val source = byteSource.map(x => x)
    complete(HttpResponse(entity = HttpEntity(ContentTypes.`text/csv(UTF-8)`, source)))
}
You detect that the upload is multipart/form-data with a part named "csv". You get the byteSource from that. Do the calculation (insert your logic into the .map(x => x) part). Convert your data back to ByteString. Complete the request with the new source. This will make your endpoint act like a proxy.
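For example, the .map(x => x) placeholder could become a small line-by-line transformation; here is a hedged sketch (the per-line logic is illustrative, not the parser from the question):

import akka.stream.scaladsl.{Framing, Source}
import akka.util.ByteString

// Split the uploaded bytes into lines, transform each line, and re-emit ByteStrings
// so the same source can back the streamed HTTP response.
val source: Source[ByteString, Any] =
  byteSource
    .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024, allowTruncation = true))
    .map(_.utf8String)
    .map(line => line.split(";").mkString(","))   // illustrative per-line logic
    .map(line => ByteString(line + "\n"))

Because complete is still fed a Source rather than a strict entity, the response stays streaming end to end.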

How to emulate Sink in akka streams?

I have a simple "save" function that uses akka-stream-alpakka's multipartUpload; it looks like this:
def save(fileName: String): Future[AWSLocation] = {
  val uuid: String = s"${UUID.randomUUID()}"
  val s3Sink: Sink[ByteString, Future[MultipartUploadResult]] = s3Client.multipartUpload(s"$bucketName", s"$uuid/$fileName")
  val file = Paths.get(s"/tmp/$fileName")
  FileIO.fromPath(file).runWith(s3Sink).map(res => {
    AWSLocation(uuid, fileName, res.key)
  }).recover {
    case ex: S3Exception =>
      logger.error("Upload to S3 failed with s3 exception", ex)
      throw ex
    case ex: Throwable =>
      logger.error("Upload to S3 failed with an unknown exception", ex)
      throw ex
  }
}
I want to test this function for two cases:
1. multipartUpload succeeds and I get an AWSLocation (my case class) back.
2. multipartUpload fails and I get an S3Exception.
So I thought to spy on multipartUpload and return my own sink, like this:
val mockAmazonS3ProxyService: S3ClientProxy = mock[S3ClientProxy]
val s3serviceMock: S3Service = mock[S3Service]

override val fakeApplication: Application = GuiceApplicationBuilder()
  .overrides(bind[S3ClientProxy].toInstance(mockAmazonS3ProxyService))
  .router(Router.empty).build()

"test" in {
  when(mockAmazonS3ProxyService.multipartUpload(anyString(), anyString())) thenReturn Sink(ByteString.empty, Future.successful(MultipartUploadResult(Uri(""), "", "myKey123", "", Some(""))))

  val res = s3serviceMock.save("someFileName").futureValue
  res.key shouldBe "myKey123"
}
The issue is that I get Error:(47, 93) akka.stream.scaladsl.Sink.type does not take parameters. I understand I can't create a Sink like this, but how can I?
Or what could be a better way of testing this?
Consider redesigning your save method so it becomes more testable and so you can inject a specific sink that produces different outcomes for different tests (as mentioned by Bennie Krijger).
def save(fileName: String): Future[AWSLocation] = {
  val uuid: String = s"${UUID.randomUUID()}"
  save(fileName, uuid)(() => s3Client.multipartUpload(s"$bucketName", s"$uuid/$fileName"))
}

def save(fileName: String, uuid: String)(
    createS3UploadSink: () => Sink[ByteString, Future[MultipartUploadResult]]
): Future[AWSLocation] = {
  val s3Sink: Sink[ByteString, Future[MultipartUploadResult]] = createS3UploadSink()
  val file = Paths.get(s"/tmp/$fileName")
  FileIO
    .fromPath(file)
    .runWith(s3Sink)
    .map(res => {
      AWSLocation(uuid, fileName, res.key)
    })
    .recover {
      case ex: S3Exception =>
        logger.error("Upload to S3 failed with s3 exception", ex)
        throw ex
      case ex: Throwable =>
        logger.error("Upload to S3 failed with an unknown exception", ex)
        throw ex
    }
}
The test can look like this:
class MultipartUploadSpec extends TestKit(ActorSystem("multipartUpload")) with FunSpecLike {
  implicit val mat: Materializer = ActorMaterializer()

  describe("multipartUpload") {
    it("should pass failure") {
      val result = save("someFileName", "some-uuid")(() =>
        Sink.ignore.mapMaterializedValue(_ => Future.failed(new RuntimeException)))
      // assert result
    }

    it("should pass successfully") {
      val result = save("someFileName", "some-uuid")(() =>
        Sink.ignore.mapMaterializedValue(_ => Future.successful(new MultipartUploadResult(???))))
      // assert result
    }
  }
}
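The "// assert result" placeholders could be filled in roughly like this (a sketch assuming ScalaTest's ScalaFutures is mixed in; not part of the original answer):

// failure case: the injected sink's materialized future failed, so the save should fail too
whenReady(result.failed) { e =>
  assert(e.isInstanceOf[RuntimeException])
}

// success case: once the ??? placeholder above is filled in, an AWSLocation should come back
whenReady(result) { location =>
  assert(location.key.nonEmpty)
}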

Close InputStream wrapped into IO

I'm using IO (cats or scalaz, it does not matter), and I want to use bracket to close the InputStream after I'm done with it. The problem is that I'm reading gzipped files. Here is what I tried:
I (Incorrect).
val io1 = IO(Files.newInputStream(Paths.get("/tmp/file")))
val io2 = io1.map(is => new GZIPInputStream(is))
val io3 = io2.bracket { _ =>
  IO(println("use"))
  //empty usage
} { is =>
  println("close")
  IO(is.close())
}
This is incorrect because, if /tmp/file is a broken gzip file with an invalid magic number, we never reach the "resource release" part of the bracket.
II (Incorrect).
val io1 = IO(Files.newInputStream(Paths.get("/tmp/file")))
val io3 = io1.bracket { is =>
  val gzis = new GZIPInputStream(is)
  IO(println("use"))
  //empty usage
} { is =>
  println("close")
  IO(is.close())
}
This is incorrect because we are closing the underlying stream but not the GZIPInputStream, so we may end up losing some data buffered inside it.
In Java I could simply do this, without worrying about flushing:
var is: InputStream = null
try {
  is = Files.newInputStream(Paths.get("/tmp/file"))
  is = new GZIPInputStream(is)
  //use
} finally {
  if (is ne null)
    is.close()
}
Can you suggest some approach for dealing with GzipInputStream?
It is not a problem to call close on an input stream several times, so you can close the InputStream and the GZIPInputStream separately.
In Java it is common to let try-with-resources handle both streams:
try (InputStream is = Files.newInputStream(Paths.get("/tmp/file"));
     GZIPInputStream gzis = new GZIPInputStream(is)) {
  //use gzis
}
// both streams are closed in the implicit finally clause
You can translate this approach to IO brackets
val io1 = IO(Files.newInputStream(Paths.get("/tmp/file")))
val io2 = io1.bracket { is =>
  IO(new GZIPInputStream(is)).bracket { gzis =>
    IO(println("using gzis"))
  }(gzis => IO(gzis.close()))
}(is => IO(is.close()))
To avoid nested brackets you can use Resource:
def openFile(path: Path) = Resource(IO {
  val is = Files.newInputStream(path)
  (is, IO(is.close()))
})

def openGZIP(is: InputStream) = Resource(IO {
  val gzis = new GZIPInputStream(is)
  (gzis, IO(gzis.close()))
})

val gzip: Resource[IO, GZIPInputStream] = for {
  is   <- openFile(Paths.get("/tmp/file"))
  gzis <- openGZIP(is)
} yield gzis

gzip.use { gzis =>
  IO(println("using gzis"))
}
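Since both streams are AutoCloseable, the two helpers can also be expressed with cats-effect's Resource.fromAutoCloseable, which builds the release action for you; a brief sketch of the same composition:

import java.nio.file.{Files, Paths}
import java.util.zip.GZIPInputStream
import cats.effect.{IO, Resource}

// Each fromAutoCloseable call registers close() as the finalizer for that stream,
// so the gzip stream is closed before the underlying file stream.
val gzip2: Resource[IO, GZIPInputStream] = for {
  is   <- Resource.fromAutoCloseable(IO(Files.newInputStream(Paths.get("/tmp/file"))))
  gzis <- Resource.fromAutoCloseable(IO(new GZIPInputStream(is)))
} yield gzis

gzip2.use(gzis => IO(println("using gzis")))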

How to download a HTTP resource to a file with Akka Streams and HTTP?

Over the past few days I have been trying to figure out the best way to download a HTTP resource to a file using Akka Streams and HTTP.
Initially I started with the Future-Based Variant and that looked something like this:
def downloadViaFutures(uri: Uri, file: File): Future[Long] = {
  val request = Get(uri)
  val responseFuture = Http().singleRequest(request)
  responseFuture.flatMap { response =>
    val source = response.entity.dataBytes
    source.runWith(FileIO.toFile(file))
  }
}
That was kind of okay but once I learnt more about pure Akka Streams I wanted to try and use the Flow-Based Variant to create a stream starting from a Source[HttpRequest]. At first this completely stumped me until I stumbled upon the flatMapConcat flow transformation. This ended up a little more verbose:
def responseOrFail[T](in: (Try[HttpResponse], T)): (HttpResponse, T) = in match {
  case (responseTry, context) => (responseTry.get, context)
}

def responseToByteSource[T](in: (HttpResponse, T)): Source[ByteString, Any] = in match {
  case (response, _) => response.entity.dataBytes
}

def downloadViaFlow(uri: Uri, file: File): Future[Long] = {
  val request = Get(uri)
  val source = Source.single((request, ()))
  val requestResponseFlow = Http().superPool[Unit]()
  source.
    via(requestResponseFlow).
    map(responseOrFail).
    flatMapConcat(responseToByteSource).
    runWith(FileIO.toFile(file))
}
Then I wanted to get a little tricky and use the Content-Disposition header.
Going back to the Future-Based Variant:
def destinationFile(downloadDir: File, response: HttpResponse): File = {
  val fileName = response.header[ContentDisposition].get.value
  val file = new File(downloadDir, fileName)
  file.createNewFile()
  file
}

def downloadViaFutures2(uri: Uri, downloadDir: File): Future[Long] = {
  val request = Get(uri)
  val responseFuture = Http().singleRequest(request)
  responseFuture.flatMap { response =>
    val file = destinationFile(downloadDir, response)
    val source = response.entity.dataBytes
    source.runWith(FileIO.toFile(file))
  }
}
But now I have no idea how to do this with the Flow-Based Variant. This is as far as I got:
def responseToByteSourceWithDest[T](in: (HttpResponse, T), downloadDir: File): Source[(ByteString, File), Any] = in match {
  case (response, _) =>
    val source = responseToByteSource(in)
    val file = destinationFile(downloadDir, response)
    source.map((_, file))
}

def downloadViaFlow2(uri: Uri, downloadDir: File): Future[Long] = {
  val request = Get(uri)
  val source = Source.single((request, ()))
  val requestResponseFlow = Http().superPool[Unit]()
  val sourceWithDest: Source[(ByteString, File), Unit] = source.
    via(requestResponseFlow).
    map(responseOrFail).
    flatMapConcat(responseToByteSourceWithDest(_, downloadDir))
  sourceWithDest.runWith(???)
}
So now I have a Source that will emit one or more (ByteString, File) elements for each File (I say each File since there is no reason the original Source has to be a single HttpRequest).
Is there any way to take these and route them to a dynamic Sink?
I'm thinking something like flatMapConcat, such as:
def runWithMap[T, Mat2](f: T => Graph[SinkShape[T], Mat2])(implicit materializer: Materializer): Mat2 = ???
So that I could complete downloadViaFlow2 with:
def destToSink(destination: File): Sink[(ByteString, File), Future[Long]] = {
  val sink = FileIO.toFile(destination, true)
  Flow[(ByteString, File)].map(_._1).toMat(sink)(Keep.right)
}

sourceWithDest.runWithMap {
  case (_, file) => destToSink(file)
}
The solution does not require a flatMapConcat. If you don't need any return values from the file writing then you can use Sink.foreach:
def writeFile(downloadDir: File)(httpResponse: HttpResponse): Future[Long] = {
  val file = destinationFile(downloadDir, httpResponse)
  httpResponse.entity.dataBytes.runWith(FileIO.toFile(file))
}

def downloadViaFlow2(uri: Uri, downloadDir: File): Future[Unit] = {
  val request = HttpRequest(uri = uri)
  val source = Source.single((request, ()))
  val requestResponseFlow = Http().superPool[Unit]()

  source.via(requestResponseFlow)
    .map(responseOrFail)
    .map(_._1)
    .runWith(Sink.foreach(writeFile(downloadDir)))
}
Note that Sink.foreach creates Futures from the writeFile function. Therefore there's not much back-pressure involved. The writeFile could be slowed down by the hard drive but the stream would keep generating Futures. To control this you can use Flow.mapAsyncUnordered (or Flow.mapAsync):
val parallelism = 10

source.via(requestResponseFlow)
  .map(responseOrFail)
  .map(_._1)
  .mapAsyncUnordered(parallelism)(writeFile(downloadDir))
  .runWith(Sink.ignore)
If you want to accumulate the Long values for a total count you need to combine with a Sink.fold:
source.via(requestResponseFlow)
  .map(responseOrFail)
  .map(_._1)
  .mapAsyncUnordered(parallelism)(writeFile(downloadDir))
  .runWith(Sink.fold(0L)(_ + _))
The fold will keep a running sum and emit the final value when the source of requests has dried up.
Using the Play Web Services client injected as ws, and remembering to import scala.concurrent.duration._:
def downloadFromUrl(url: String)(ws: WSClient): Future[Try[File]] = {
  val file = File.createTempFile("my-prefix", ".tmp", new File("/tmp"))
  file.deleteOnExit()
  val futureResponse: Future[WSResponse] =
    ws.url(url).withMethod("GET").withRequestTimeout(5 minutes).stream()
  futureResponse.flatMap { res =>
    res.status match {
      case 200 =>
        val outputStream = java.nio.file.Files.newOutputStream(file.toPath)
        val sink = Sink.foreach[ByteString] { bytes => outputStream.write(bytes.toArray) }
        res.bodyAsSource.runWith(sink).andThen {
          case result =>
            outputStream.close()
            result.get
        } map (_ => Success(file))
      case other =>
        Future(Failure[File](new Exception("HTTP Failure, response code " + other + " : " + res.statusText)))
    }
  }
}
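Alternatively, instead of writing through a java.io OutputStream inside Sink.foreach, the response body can be streamed straight into the file with Akka's FileIO sink, which keeps backpressure intact end to end. A sketch reusing res and file from the snippet above (assuming an implicit Materializer and ExecutionContext are in scope):

import akka.stream.scaladsl.FileIO

// FileIO.toPath materializes a Future[IOResult]; map it to the same Future[Try[File]] shape.
res.bodyAsSource
  .runWith(FileIO.toPath(file.toPath))
  .map(_ => Success(file))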