I want to process a large local file with Play.
The file should be deleted from the filesystem right after it's been processed. It would be easy using sendFile method like this:
def index = Action {
val fileToServe = TemporaryFile(new java.io.File("/tmp/fileToServe.pdf"))
Ok.sendFile(content = fileToServe, onClose = () => fileToServe.clean)
}
But I'd like to process the file in a streaming way in order to reduce the memory footprint:
def index = Action {
val file = new java.io.File("/tmp/fileToServe.pdf")
val path: java.nio.file.Path = file.toPath
val source: Source[ByteString, _] = FileIO.fromPath(path)
Ok.sendEntity(HttpEntity.Streamed(source, Some(file.length()), Some("application/pdf")))
.withHeaders("Content-Disposition" → "attachment; filename=file.pdf")
}
And in the latter case I can't figure out the moment when the stream would be finished and I will be able to remove the file from filesystem.
You could use watchTermination on the Source to delete the file once the stream has completed. For example:
val source: Source[ByteString, _] =
FileIO.fromPath(path)
.watchTermination()((_, futDone) => futDone.onComplete {
case Success(_) =>
println("deleting the file")
java.nio.file.Files.delete(path)
case Failure(t) => println(s"stream failed: ${t.getMessage}")
})
Related
I am trying to download pdf file from S3 using the akka-stream-alpakka connector. I have the s3 path and try to download the pdf using a wrapper method over the alpakka s3Client.
def getSource(s3Path: String): Source[ByteString, NotUsed] = {
val (source, _) = s3Client.download(s3Bucket, s3Path)
source
}
From my main code, I call the above method and try to convert it to a file
val file = new File("certificate.pdf")
val res: Future[IOResult] = getSource(data.s3PdfPath)
.runWith(FileIO.toFile(file))
However, instead of it getting converted to a file, I am stuck with a type of IOResult. Can someone please guide as to where I am going wrong regarding this ?
def download(bucket: String, bucketKey: String, filePath: String) = {
val (s3Source: Source[ByteString, _], _) = s3Client.download(bucket, bucketKey)
val result = s3Source.toMat(FileIO.toPath(Paths.get(filePath)))(Keep.right)
.run()
result
}
download(s3Bucket, key, newSigFilepath).onComplete {
}
Inspect the IOResult, and if successful you can use your file:
res.foreach {
case IOResult(bytes, Success(_)) =>
println(s"$bytes bytes written to $file")
... // do whatever you want with your file
case _ =>
println("some error occurred.")
}
Upload a file in chunk to a server including additional fields
def readFile(): Seq[ExcelFile] = {
logger.info(" readSales method initiated: ")
val source_test = source("E:/dd.xlsx")
println( " source_test "+source_test)
val source_test2 = Source.fromFile(source_test)
println( " source_test2 "+source_test)
//logger.info(" source: "+source)
for {
line <- source_test2.getLines().drop(1).toVector
values = line.split(",").map(_.trim)
// logger.info(" values are the: "+values)
} yield ExcelFile(Option(values(0)), Option(values(1)), Option(values(2)), Option(values(3)))
}
def source(filePath: String): String = {
implicit val codec = Codec("UTF-8")
codec.onMalformedInput(CodingErrorAction.REPLACE)
codec.onUnmappableCharacter(CodingErrorAction.REPLACE)
Source.fromFile(filePath).mkString
}
upload route,
path("upload"){
(post & extractRequestContext) { ctx => {
implicit val materializer = ctx.materializer
implicit val ec = ctx.executionContext
fileUpload("fileUploads") {
case (fileInfo, fileStream) =>
val path = "E:\\"
val sink = FileIO.toPath(Paths.get(path).resolve(fileInfo.fileName))
val wResult = fileStream.runWith(sink)
onSuccess(wResult) { rep => rep.status match {
case Success(_) =>
var ePath = path + File.separator + fileInfo.fileName
readFile(ePath)
_success message_
case Failure(e) => _faillure message_
} }
}
} }
}
am using above code, is it possible in scala or Akka can I read the excel file like chunk file
After looking at your code, it like you are having an issue with the post-processing (after upload) of the file.
If uploading a 3GB file is working even for 1 user then I assume that it is already chunked or multipart.
The first problem is here - source_test2.getLines().drop(1).toVector which create a Vector ( > 3GB ) with all line in file.
The other problem is that you are keeping the whole Seq[ExcelFile] in memory which should be bigger than 3 GB (because of Java object overhead).
So whenever you are calling this readFile function, you are using more than 6 GB memory.
You should try to avoid creating such large object in your application and use things like Iterator instead of Seq
def readFile(): Iterator[ExcelFile] = {
val lineIterator = Source.fromFile("your_file_path").getLines
lineIterator.drop(1).map(line => {
val values = line.split(",").map(_.trim)
ExcelFile(
Option(values(0)),
Option(values(1)),
Option(values(2)),
Option(values(3))
)
})
}
The advantage with Iterator is that it will not load all the things in memory at once. And you can keep using Iterators for further steps.
I use Play Framework 2.6 (Scala) and Alpakka AWS S3 Connector to upload files asynchronously to S3 bucket. My code looks like this:
def richUpload(extension: String, checkFunction: (String, Option[String]) => Boolean, cannedAcl: CannedAcl, bucket: String) = userAction(parse.multipartFormData(handleFilePartAsFile)).async { implicit request =>
val s3Filename = request.user.get.id + "/" + java.util.UUID.randomUUID.toString + "." + extension
val fileOption = request.body.file("file").map {
case FilePart(key, filename, contentType, file) =>
Logger.info(s"key = ${key}, filename = ${filename}, contentType = ${contentType}, file = $file")
if(checkFunction(filename, contentType)) {
s3Service.uploadSink(s3Filename, cannedAcl, bucket).runWith(FileIO.fromPath(file.toPath))
} else {
throw new Exception("Upload failed")
}
}
fileOption match {
case Some(opt) => opt.map(o => Ok(s3Filename))
case _ => Future.successful(BadRequest("ERROR"))
}
}
It works, but it returns filename before it uploads to S3. But I want to return value after it uploads to S3. Is there any solution?
Also, is it possible to stream file upload directly to S3, to show progress correctly and to not use temporary disk file?
You need to flip around your source and sink to obtain the materialized value you are interested in.
You have:
a source that reads from your local files, and materializes to a Future[IOResult] upon completion of reading the file.
a sink that writes to S3 and materializes to Future[MultipartUploadResult] upon completion of writing to S3.
You are interested in the latter, but in your code you are using the former. This is because the runWith function always keeps the materialized value of stage passed as parameter.
The types in the sample snippet below should clarify this:
val fileSource: Source[ByteString, Future[IOResult]] = ???
val s3Sink : Sink [ByteString, Future[MultipartUploadResult]] = ???
val m1: Future[IOResult] = s3Sink.runWith(fileSource)
val m2: Future[MultipartUploadResult] = fileSource.runWith(s3Sink)
After you have obtained a Future[MultipartUploadResult] you can map on it the same way and access the location field to get a file's URI, e.g.:
val location: URI = fileSource.runWith(s3Sink).map(_.location)
I am trying to implement file upload functionality in my application using Akka HTTP. I am using akka-stream version 2.4.4.
Here is the code (modified from akka-doc)
path("fileupload") {
post {
extractRequestContext {
ctx => {
implicit val materializer = ctx.materializer
implicit val ec = ctx.executionContext
fileUpload("fileUpload") {
case (metadata, byteSource) =>
val location = FileUtil.getUploadPath(metadata)
val updatedFileName = metadata.fileName.replaceAll(" ", "").replaceAll("\"", "")
val uniqFileName = uniqueFileId.concat(updatedFileName)
val fullPath = location + File.separator + uniqFileName
val writer = new FileOutputStream(fullPath)
val bufferedWriter = new BufferedOutputStream(writer)
val result = byteSource.map(s => {
bufferedWriter.write(s.toArray)
}).runWith(Sink.ignore)
val result1 = byteSource.runWith(Sink.foreach(s=>bufferedWriter.write(s.toArray)))
Await.result(result1, 5.seconds)
bufferedWriter.flush()
bufferedWriter.close()
complete(uniqFileName)
/*onSuccess(result) { x =>
bufferedWriter.flush()
bufferedWriter.close()
complete("hello world")
}*/
}
}
}
}
}
This code is working fine and is uploading the file to the given path. I am generating new file names by appending UUID to make sure that the file names are unique. So I need to return the new file name to the caller. However, this method is not returning the filename always. Sometimes, it is finishing with Response has no content.
Can anyone let me know what I am doing wrong here?
There is no need to use the standard blocking streams when you have reactive streams for that purpose:
path("fileUpload") {
post {
fileUpload("fileUpload") {
case (fileInfo, fileStream) =>
val sink = FileIO.toPath(Paths.get("/tmp") resolve fileInfo.fileName)
val writeResult = fileStream.runWith(sink)
onSuccess(writeResult) { result =>
result.status match {
case Success(_) => complete(s"Successfully written ${result.count} bytes")
case Failure(e) => throw e
}
}
}
}
}
This code will upload fileUpload multipart field to a file inside /tmp directory. It just dumps the content of the input source to the respective file sink, returning a message upon the completion of the write operation.
You may also want to tweak the dispatcher used for FileIO sources and sinks, as described in their scaladocs.
If you need only uploading a file but not doing anything until upload finishes in the file stream, then there is much simpler way:
def tempDestination(fileInfo: FileInfo): File =
File.createTempFile(fileInfo.fileName, ".tmp")
val route =
storeUploadedFile("csv", tempDestination) {
case (metadata, file) =>
// do something with the file and file metadata ...
file.delete()
complete(StatusCodes.OK)
}
See docs: https://doc.akka.io/docs/akka-http/current/routing-dsl/directives/file-upload-directives/storeUploadedFile.html
Over the past few days I have been trying to figure out the best way to download a HTTP resource to a file using Akka Streams and HTTP.
Initially I started with the Future-Based Variant and that looked something like this:
def downloadViaFutures(uri: Uri, file: File): Future[Long] = {
val request = Get(uri)
val responseFuture = Http().singleRequest(request)
responseFuture.flatMap { response =>
val source = response.entity.dataBytes
source.runWith(FileIO.toFile(file))
}
}
That was kind of okay but once I learnt more about pure Akka Streams I wanted to try and use the Flow-Based Variant to create a stream starting from a Source[HttpRequest]. At first this completely stumped me until I stumbled upon the flatMapConcat flow transformation. This ended up a little more verbose:
def responseOrFail[T](in: (Try[HttpResponse], T)): (HttpResponse, T) = in match {
case (responseTry, context) => (responseTry.get, context)
}
def responseToByteSource[T](in: (HttpResponse, T)): Source[ByteString, Any] = in match {
case (response, _) => response.entity.dataBytes
}
def downloadViaFlow(uri: Uri, file: File): Future[Long] = {
val request = Get(uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
source.
via(requestResponseFlow).
map(responseOrFail).
flatMapConcat(responseToByteSource).
runWith(FileIO.toFile(file))
}
Then I wanted to get a little tricky and use the Content-Disposition header.
Going back to the Future-Based Variant:
def destinationFile(downloadDir: File, response: HttpResponse): File = {
val fileName = response.header[ContentDisposition].get.value
val file = new File(downloadDir, fileName)
file.createNewFile()
file
}
def downloadViaFutures2(uri: Uri, downloadDir: File): Future[Long] = {
val request = Get(uri)
val responseFuture = Http().singleRequest(request)
responseFuture.flatMap { response =>
val file = destinationFile(downloadDir, response)
val source = response.entity.dataBytes
source.runWith(FileIO.toFile(file))
}
}
But now I have no idea how to do this with the Future-Based Variant. This is as far as I got:
def responseToByteSourceWithDest[T](in: (HttpResponse, T), downloadDir: File): Source[(ByteString, File), Any] = in match {
case (response, _) =>
val source = responseToByteSource(in)
val file = destinationFile(downloadDir, response)
source.map((_, file))
}
def downloadViaFlow2(uri: Uri, downloadDir: File): Future[Long] = {
val request = Get(uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
val sourceWithDest: Source[(ByteString, File), Unit] = source.
via(requestResponseFlow).
map(responseOrFail).
flatMapConcat(responseToByteSourceWithDest(_, downloadDir))
sourceWithDest.runWith(???)
}
So now I have a Source that will emit one or more (ByteString, File) elements for each File (I say each File since there is no reason the original Source has to be a single HttpRequest).
Is there anyway to take these and route them to a dynamic Sink?
I'm thinking something like flatMapConcat, such as:
def runWithMap[T, Mat2](f: T => Graph[SinkShape[Out], Mat2])(implicit materializer: Materializer): Mat2 = ???
So that I could complete downloadViaFlow2 with:
def destToSink(destination: File): Sink[(ByteString, File), Future[Long]] = {
val sink = FileIO.toFile(destination, true)
Flow[(ByteString, File)].map(_._1).toMat(sink)(Keep.right)
}
sourceWithDest.runWithMap {
case (_, file) => destToSink(file)
}
The solution does not require a flatMapConcat. If you don't need any return values from the file writing then you can use Sink.foreach:
def writeFile(downloadDir : File)(httpResponse : HttpResponse) : Future[Long] = {
val file = destinationFile(downloadDir, httpResponse)
httpResponse.entity.dataBytes.runWith(FileIO.toFile(file))
}
def downloadViaFlow2(uri: Uri, downloadDir: File) : Future[Unit] = {
val request = HttpRequest(uri=uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.runWith(Sink.foreach(writeFile(downloadDir)))
}
Note that the Sink.foreach creates Futures from the writeFile function. Therefore there's not much back-pressure involved. The writeFile could be slowed down by the hard drive but the stream would keep generating Futures. To control this you can use Flow.mapAsyncUnordered (or Flow.mapAsync) :
val parallelism = 10
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.mapAsyncUnordered(parallelism)(writeFile(downloadDir))
.runWith(Sink.ignore)
If you want to accumulate the Long values for a total count you need to combine with a Sink.fold:
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.mapAsyncUnordered(parallelism)(writeFile(downloadDir))
.runWith(Sink.fold(0L)(_ + _))
The fold will keep a running sum and emit the final value when the source of requests has dried up.
Using the play Web Services client injected in ws, remmebering to import scala.concurrent.duration._:
def downloadFromUrl(url: String)(ws: WSClient): Future[Try[File]] = {
val file = File.createTempFile("my-prefix", new File("/tmp"))
file.deleteOnExit()
val futureResponse: Future[WSResponse] =
ws.url(url).withMethod("GET").withRequestTimeout(5 minutes).stream()
futureResponse.flatMap { res =>
res.status match {
case 200 =>
val outputStream = java.nio.file.Files.newOutputStream(file.toPath)
val sink = Sink.foreach[ByteString] { bytes => outputStream.write(bytes.toArray) }
res.bodyAsSource.runWith(sink).andThen {
case result =>
outputStream.close()
result.get
} map (_ => Success(file))
case other => Future(Failure[File](new Exception("HTTP Failure, response code " + other + " : " + res.statusText)))
}
}
}