Download pdf file from s3 using akka-stream-alpakka - scala

I am trying to download a PDF file from S3 using the akka-stream-alpakka connector. I have the S3 path and try to download the PDF using a wrapper method over the Alpakka s3Client.
def getSource(s3Path: String): Source[ByteString, NotUsed] = {
  val (source, _) = s3Client.download(s3Bucket, s3Path)
  source
}
From my main code, I call the above method and try to write the stream to a file:
val file = new File("certificate.pdf")
val res: Future[IOResult] = getSource(data.s3PdfPath)
  .runWith(FileIO.toFile(file))
However, instead of ending up with a file, I am stuck with a value of type Future[IOResult]. Can someone please point out where I am going wrong?

def download(bucket: String, bucketKey: String, filePath: String): Future[IOResult] = {
  val (s3Source: Source[ByteString, _], _) = s3Client.download(bucket, bucketKey)
  val result = s3Source
    .toMat(FileIO.toPath(Paths.get(filePath)))(Keep.right)
    .run()
  result
}

download(s3Bucket, key, newSigFilepath).onComplete {
  case Success(ioResult) => // the file has been written to newSigFilepath
  case Failure(t)        => // the download or the write failed
}

Inspect the IOResult, and if successful you can use your file:
res.foreach {
  case IOResult(bytes, Success(_)) =>
    println(s"$bytes bytes written to $file")
    ... // do whatever you want with your file
  case _ =>
    println("some error occurred.")
}

Related

Getting a 0 KB file in S3 when I write the zipInputStream

I have the following piece of code, which is producing 0 KB files in S3. What is wrong with it?
def extractFilesFromZipStream(zipInputStream: ZipArchiveInputStream,
                              tgtPath: String, storageType: String): scala.collection.mutable.Map[String, (String, Long)] = {
  var filePathMap = scala.collection.mutable.Map[String, (String, Long)]()
  Try {
    storageWrapper.mkDirs(tgtPath)
    Stream.continually(zipInputStream.getNextZipEntry).takeWhile(_ != null).foreach { file =>
      val storagePathFilePath = s"$tgtPath/${file.getName}"
      storageWrapper.write(zipInputStream, storagePathFilePath)
      LOGGER.info(storagePathFilePath)
      val lineCount = Source.fromInputStream(storageWrapper.read(storagePathFilePath)).getLines().count(s => s != null)
There is nothing wrong with the storage wrapper; it takes an input stream and a path and has been working well so far. Can anyone suggest what is wrong with my use of ZipArchiveInputStream?
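For reference, the iteration pattern itself is sound. Here is a minimal, self-contained sketch of it using the JDK's java.util.zip as a stand-in for commons-compress: getNextEntry positions the shared stream at one entry at a time, and reading the stream yields only that entry's bytes. If the extracted files come out empty, it is worth verifying that storageWrapper.write reads only up to the current entry's end and does not close the shared stream.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

// Build a small zip archive in memory so the example is self-contained.
val zipBytes: Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val zos = new ZipOutputStream(bos)
  zos.putNextEntry(new ZipEntry("a.txt")); zos.write("line1\nline2\n".getBytes("UTF-8")); zos.closeEntry()
  zos.putNextEntry(new ZipEntry("b.txt")); zos.write("hello\n".getBytes("UTF-8")); zos.closeEntry()
  zos.close()
  bos.toByteArray
}

// Same pattern as the question: iterate the entries of one shared stream,
// reading each entry's bytes before moving to the next.
val zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))
val extracted: Map[String, Int] =
  Iterator.continually(zis.getNextEntry).takeWhile(_ != null).map { entry =>
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    // read returns -1 at the end of the *current entry*, not the whole stream
    Iterator.continually(zis.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))
    entry.getName -> out.size()
  }.toMap
```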

How to delete file right after processing it with Play Framework

I want to process a large local file with Play.
The file should be deleted from the filesystem right after it has been processed. It would be easy using the sendFile method, like this:
def index = Action {
  val fileToServe = TemporaryFile(new java.io.File("/tmp/fileToServe.pdf"))
  Ok.sendFile(content = fileToServe, onClose = () => fileToServe.clean)
}
But I'd like to process the file in a streaming way in order to reduce the memory footprint:
def index = Action {
  val file = new java.io.File("/tmp/fileToServe.pdf")
  val path: java.nio.file.Path = file.toPath
  val source: Source[ByteString, _] = FileIO.fromPath(path)
  Ok.sendEntity(HttpEntity.Streamed(source, Some(file.length()), Some("application/pdf")))
    .withHeaders("Content-Disposition" -> "attachment; filename=file.pdf")
}
In the latter case, however, I cannot figure out when the stream has finished so that I can remove the file from the filesystem.
You could use watchTermination on the Source to delete the file once the stream has completed. For example:
val source: Source[ByteString, _] =
  FileIO.fromPath(path)
    .watchTermination() { (_, futDone) =>
      futDone.onComplete { // needs an implicit ExecutionContext in scope
        case Success(_) =>
          println("deleting the file")
          java.nio.file.Files.delete(path)
        case Failure(t) =>
          println(s"stream failed: ${t.getMessage}")
      }
    }
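The cleanup itself is ordinary Future chaining, which can be sketched with the standard library alone. In the sketch below, streamDone stands in for the Future[Done] that watchTermination hands to the callback:

```scala
import java.nio.file.Files
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// A real file to serve, then delete.
val path = Files.createTempFile("fileToServe", ".pdf")

// Stand-in for the Future[Done] that watchTermination exposes.
val streamDone: Future[Unit] = Future.successful(())

// The same cleanup the watchTermination callback performs.
val cleanedUp: Future[Unit] = streamDone.map(_ => Files.delete(path))

Await.result(cleanedUp, 1.second)
```

The Await here is only to make the sketch observable; in the Play action the deletion simply happens asynchronously once the response body has been streamed.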

Play Framework 2.6 Alpakka S3 File Upload

I use Play Framework 2.6 (Scala) and Alpakka AWS S3 Connector to upload files asynchronously to S3 bucket. My code looks like this:
def richUpload(extension: String, checkFunction: (String, Option[String]) => Boolean, cannedAcl: CannedAcl, bucket: String) =
  userAction(parse.multipartFormData(handleFilePartAsFile)).async { implicit request =>
    val s3Filename = request.user.get.id + "/" + java.util.UUID.randomUUID.toString + "." + extension
    val fileOption = request.body.file("file").map {
      case FilePart(key, filename, contentType, file) =>
        Logger.info(s"key = ${key}, filename = $filename, contentType = ${contentType}, file = $file")
        if (checkFunction(filename, contentType)) {
          s3Service.uploadSink(s3Filename, cannedAcl, bucket).runWith(FileIO.fromPath(file.toPath))
        } else {
          throw new Exception("Upload failed")
        }
    }
    fileOption match {
      case Some(opt) => opt.map(o => Ok(s3Filename))
      case _         => Future.successful(BadRequest("ERROR"))
    }
  }
It works, but it returns the filename before the upload to S3 has finished. I want to return the value only after the upload completes. Is there any solution?
Also, is it possible to stream the file upload directly to S3, so that progress is shown correctly and no temporary disk file is used?
You need to flip around your source and sink to obtain the materialized value you are interested in.
You have:
a source that reads from your local files, and materializes to a Future[IOResult] upon completion of reading the file.
a sink that writes to S3 and materializes to Future[MultipartUploadResult] upon completion of writing to S3.
You are interested in the latter, but in your code you are using the former. This is because the runWith method always keeps the materialized value of the stage passed as its parameter.
The types in the sample snippet below should clarify this:
val fileSource: Source[ByteString, Future[IOResult]] = ???
val s3Sink: Sink[ByteString, Future[MultipartUploadResult]] = ???

val m1: Future[IOResult] = s3Sink.runWith(fileSource)
val m2: Future[MultipartUploadResult] = fileSource.runWith(s3Sink)
Once you have obtained the Future[MultipartUploadResult], you can map over it and access its location field to get the file's URI, e.g.:
val location: Future[Uri] = fileSource.runWith(s3Sink).map(_.location) // needs an implicit ExecutionContext
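The map step is plain Future composition and can be sketched without Akka at all. Below, UploadResult is a hypothetical stand-in for Alpakka's MultipartUploadResult (whose real location field is an akka.http Uri), and the URL is a made-up placeholder:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-in for alpakka's MultipartUploadResult.
final case class UploadResult(location: String)

val uploadDone: Future[UploadResult] =
  Future.successful(UploadResult("https://bucket.s3.amazonaws.com/some/key"))

// map keeps everything asynchronous: you get a Future of the location,
// which completes only once the upload itself has completed.
val location: Future[String] = uploadDone.map(_.location)
```

Mapping the same future into a Play Result (e.g. Ok(result.location.toString)) is what lets the action respond only after the upload has finished.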

File upload using Akka HTTP

I am trying to implement file upload functionality in my application using Akka HTTP. I am using akka-stream version 2.4.4.
Here is the code (adapted from the Akka documentation):
path("fileupload") {
  post {
    extractRequestContext { ctx =>
      implicit val materializer = ctx.materializer
      implicit val ec = ctx.executionContext
      fileUpload("fileUpload") {
        case (metadata, byteSource) =>
          val location = FileUtil.getUploadPath(metadata)
          val updatedFileName = metadata.fileName.replaceAll(" ", "").replaceAll("\"", "")
          val uniqFileName = uniqueFileId.concat(updatedFileName)
          val fullPath = location + File.separator + uniqFileName
          val writer = new FileOutputStream(fullPath)
          val bufferedWriter = new BufferedOutputStream(writer)
          val result = byteSource.map { s =>
            bufferedWriter.write(s.toArray)
          }.runWith(Sink.ignore)
          val result1 = byteSource.runWith(Sink.foreach(s => bufferedWriter.write(s.toArray)))
          Await.result(result1, 5.seconds)
          bufferedWriter.flush()
          bufferedWriter.close()
          complete(uniqFileName)
          /*onSuccess(result) { x =>
            bufferedWriter.flush()
            bufferedWriter.close()
            complete("hello world")
          }*/
      }
    }
  }
}
This code works and uploads the file to the given path. I generate new file names by appending a UUID to make sure the file names are unique, so I need to return the new file name to the caller. However, this method does not always return the filename; sometimes the request finishes with "Response has no content".
Can anyone tell me what I am doing wrong here?
There is no need to use the standard blocking streams when you have reactive streams for that purpose:
path("fileUpload") {
  post {
    fileUpload("fileUpload") {
      case (fileInfo, fileStream) =>
        val sink = FileIO.toPath(Paths.get("/tmp") resolve fileInfo.fileName)
        val writeResult = fileStream.runWith(sink)
        onSuccess(writeResult) { result =>
          result.status match {
            case Success(_) => complete(s"Successfully written ${result.count} bytes")
            case Failure(e) => throw e
          }
        }
    }
  }
}
This code uploads the fileUpload multipart field to a file inside the /tmp directory. It simply dumps the content of the input source into the corresponding file sink, returning a message when the write operation completes.
You may also want to tweak the dispatcher used for FileIO sources and sinks, as described in their scaladocs.
If you only need to upload the file and do not need to do anything with the stream until the upload finishes, there is a much simpler way:
def tempDestination(fileInfo: FileInfo): File =
  File.createTempFile(fileInfo.fileName, ".tmp")

val route =
  storeUploadedFile("csv", tempDestination) {
    case (metadata, file) =>
      // do something with the file and file metadata ...
      file.delete()
      complete(StatusCodes.OK)
  }
See docs: https://doc.akka.io/docs/akka-http/current/routing-dsl/directives/file-upload-directives/storeUploadedFile.html
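The tempDestination helper above is ordinary java.io.File.createTempFile, and its lifecycle can be checked in isolation (the data.csv name below is just a placeholder for an uploaded file name):

```scala
import java.io.File

// Mimic tempDestination: createTempFile appends a random component,
// so concurrent uploads with the same name never collide.
val tmp: File = File.createTempFile("data.csv", ".tmp")
val existsBefore = tmp.exists()

// After processing, delete the file so temp files do not accumulate.
val deleted = tmp.delete()
```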

Play! Upload file and save to AWS S3

I am using Play 2.3 and want to store the uploaded files to S3, so I use the Play-S3 module.
However, I am stuck because this module requires a BucketFile, which is created from an Array[Byte] of the file's contents in memory, while the Play body parser gives me a temporary file on disk. How can I turn this file into a BucketFile?
Here is my controller Action:
def upload = Action.async(parse.multipartFormData) { implicit request =>
  request.body.file("file").map { file =>
    implicit val credential = AwsCredentials.fromConfiguration
    val bucket = S3("bucketName")
    val result = bucket + BucketFile(file.filename, file.contentType.get, file.ref.file.toString.getBytes)
    result.map { unit =>
      Ok("File uploaded")
    }
  }.getOrElse {
    Future.successful {
      Redirect(routes.Application.index).flashing(
        "error" -> "Missing file"
      )
    }
  }
}
This code does not work because file.ref.file.toString returns the file's path as a string, not the file's contents.
Import the following:
import java.nio.file.{Paths, Files}
To create the Array[Byte] do:
val byteArray = Files.readAllBytes(Paths.get(file.ref.file.getPath))
Then upload with:
BucketFile(file.filename, file.contentType.get, byteArray, None, None)
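The difference between the two approaches can be reproduced with a throwaway file (upload.bin below is just a placeholder for Play's temporary upload file):

```scala
import java.nio.file.Files

// Stand-in for the temporary file Play's body parser produces.
val path = Files.createTempFile("upload", ".bin")
Files.write(path, "file contents".getBytes("UTF-8"))

// readAllBytes returns the file's contents...
val byteArray: Array[Byte] = Files.readAllBytes(path)

// ...whereas toString.getBytes returns the bytes of the *path* string,
// which was the bug in the original controller.
val pathBytes: Array[Byte] = path.toString.getBytes("UTF-8")
```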