I use Play Framework 2.6 (Scala) and the Alpakka AWS S3 connector to upload files asynchronously to an S3 bucket. My code looks like this:
def richUpload(extension: String, checkFunction: (String, Option[String]) => Boolean, cannedAcl: CannedAcl, bucket: String) = userAction(parse.multipartFormData(handleFilePartAsFile)).async { implicit request =>
  val s3Filename = request.user.get.id + "/" + java.util.UUID.randomUUID.toString + "." + extension
  val fileOption = request.body.file("file").map {
    case FilePart(key, filename, contentType, file) =>
      Logger.info(s"key = ${key}, filename = ${filename}, contentType = ${contentType}, file = $file")
      if (checkFunction(filename, contentType)) {
        s3Service.uploadSink(s3Filename, cannedAcl, bucket).runWith(FileIO.fromPath(file.toPath))
      } else {
        throw new Exception("Upload failed")
      }
  }
  fileOption match {
    case Some(opt) => opt.map(o => Ok(s3Filename))
    case _ => Future.successful(BadRequest("ERROR"))
  }
}
It works, but it returns the filename before the upload to S3 has finished. I want to return the value only after the file has been uploaded to S3. Is there a solution?
Also, is it possible to stream the upload directly to S3, so that progress is reported correctly and no temporary file on disk is needed?
You need to flip around your source and sink to obtain the materialized value you are interested in.
You have:
- a source that reads from your local file and materializes to a Future[IOResult] upon completion of reading the file, and
- a sink that writes to S3 and materializes to a Future[MultipartUploadResult] upon completion of writing to S3.
You are interested in the latter, but in your code you are using the former. This is because the runWith function always keeps the materialized value of the stage passed as its parameter.
The types in the sample snippet below should clarify this:
val fileSource: Source[ByteString, Future[IOResult]] = ???
val s3Sink: Sink[ByteString, Future[MultipartUploadResult]] = ???

val m1: Future[IOResult] = s3Sink.runWith(fileSource)
val m2: Future[MultipartUploadResult] = fileSource.runWith(s3Sink)
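If both materialized values were needed, the stages could instead be connected with toMat and Keep.both; a minimal sketch (not part of the original answer):

val (ioResult, uploadResult): (Future[IOResult], Future[MultipartUploadResult]) =
  fileSource.toMat(s3Sink)(Keep.both).run() // Keep.both retains the materialized values of both stages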
Once you have obtained a Future[MultipartUploadResult] you can map over it in the same way and access its location field to get the uploaded file's URI, e.g.:
val location: Future[Uri] = fileSource.runWith(s3Sink).map(_.location)
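Applied to the controller in the question, a minimal sketch (assuming s3Service.uploadSink materializes to a Future[MultipartUploadResult], as Alpakka's multipart upload sink does, and that an implicit ExecutionContext is in scope) would run the file source into the S3 sink and only complete the request once that future finishes:

val fileOption = request.body.file("file").map {
  case FilePart(key, filename, contentType, file) =>
    if (checkFunction(filename, contentType)) {
      FileIO.fromPath(file.toPath)
        .runWith(s3Service.uploadSink(s3Filename, cannedAcl, bucket)) // Future[MultipartUploadResult]
        .map(_ => Ok(s3Filename))                                     // reply only after the upload has completed
    } else {
      Future.successful(BadRequest("Upload check failed"))
    }
}
fileOption.getOrElse(Future.successful(BadRequest("ERROR")))

As for the second question, streaming the upload straight to S3 without a temporary file: one possibility (a sketch under the same assumption, not something covered by the answer above) is a custom FilePartHandler that feeds the part's bytes into the S3 sink through an Accumulator, mirroring the handleFilePartAsFile handler from the Play documentation:

import akka.util.ByteString
import play.api.libs.streams.Accumulator
import play.api.mvc.MultipartFormData.FilePart
import play.core.parsers.Multipart.FileInfo
// the import path of MultipartUploadResult depends on your Alpakka version

type FilePartHandler[A] = FileInfo => Accumulator[ByteString, FilePart[A]]

def handleFilePartToS3(s3Filename: String, cannedAcl: CannedAcl, bucket: String): FilePartHandler[MultipartUploadResult] = {
  case FileInfo(partName, filename, contentType) =>
    // Accumulator wraps the S3 sink; .map needs an implicit ExecutionContext
    Accumulator(s3Service.uploadSink(s3Filename, cannedAcl, bucket)).map { result =>
      FilePart(partName, filename, contentType, result)
    }
}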
I am trying to download a PDF file from S3 using the akka-stream-alpakka connector. I have the S3 path and try to download the PDF using a wrapper method over the Alpakka s3Client.
def getSource(s3Path: String): Source[ByteString, NotUsed] = {
  val (source, _) = s3Client.download(s3Bucket, s3Path)
  source
}
From my main code, I call the above method and try to write the result to a file:
val file = new File("certificate.pdf")
val res: Future[IOResult] = getSource(data.s3PdfPath)
  .runWith(FileIO.toFile(file))
However, instead of getting the file's contents, what I end up with is just a Future[IOResult]. Can someone please point out where I am going wrong?
The file is in fact written by the FileIO sink; the Future[IOResult] is just the materialized value describing that write. For example:

def download(bucket: String, bucketKey: String, filePath: String) = {
  val (s3Source: Source[ByteString, _], _) = s3Client.download(bucket, bucketKey)
  val result = s3Source.toMat(FileIO.toPath(Paths.get(filePath)))(Keep.right)
    .run()
  result
}

download(s3Bucket, key, newSigFilepath).onComplete {
  ... // handle the Success / Failure of the write here
}
Inspect the IOResult, and if successful you can use your file:
res.foreach {
  case IOResult(bytes, Success(_)) =>
    println(s"$bytes bytes written to $file")
    ... // do whatever you want with your file
  case _ =>
    println("some error occurred.")
}
Using Spark on GCP Dataproc, I successfully write an entire RDD to GCS like so:
rdd.saveAsTextFile(s"gs://$path")
This produces one file per partition, all under the same path.
How do I write a separate file for each partition, with a unique path based on information from that partition?
Below is an invented, non-working example of what I wish I could do:
rdd.mapPartitionsWithIndex(
  (i, partition) => {
    partition.write(path = s"gs://partition_$i", data = partition_specific_data)
  }
)
When I call the function below from within a partition, it writes to local disk on my Mac, but on Dataproc I get an error saying that gs is not recognized as a valid path.
def writeLocally(filePath: String, data: Array[Byte], errorMessage: String): Unit = {
  println("Juicy Platform")
  val path = new Path(filePath)
  var ofos: Option[FSDataOutputStream] = None
  try {
    println(s"\nTrying to write to $filePath\n")
    val conf = new Configuration()
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    // conf.addResource(new Path("/home/hadoop/conf/core-site.xml"))
    println(conf.toString)
    val fs = FileSystem.get(conf)
    val fos = fs.create(path)
    ofos = Option(fos)
    fos.write(data)
    println(s"\nWrote to $filePath\n")
  } catch {
    case e: Exception =>
      logError(errorMessage, s"Exception occurred writing to GCS:\n${ExceptionUtils.getStackTrace(e)}")
  } finally {
    ofos match {
      case Some(i) => i.close()
      case _ =>
    }
  }
}
This is the error:
java.lang.IllegalArgumentException: Wrong FS: gs://path/myFile.json, expected: hdfs://cluster-95cf-m
If running on a Dataproc cluster, you shouldn't need to explicitly populate "fs.gs.impl" in the Configuration; a new Configuration() should already contain the necessary mappings.
The main problem here is that val fs = FileSystem.get(conf) uses the fs.defaultFS property of the conf; it has no way of knowing whether you wanted a FileSystem instance specific to HDFS or to GCS. In general, in Hadoop and Spark, a FileSystem instance is fundamentally tied to a single URL scheme; you need to fetch a scheme-specific instance for each scheme you use, such as hdfs://, gs://, or s3://.
The simplest fix is to always use Path.getFileSystem(Configuration) rather than FileSystem.get(Configuration), and to make sure your path is fully qualified with the scheme:
...
val path = new Path("gs://bucket/foo/data")
val fs = path.getFileSystem(conf)
val fos = fs.create(path)
ofos = Option(fos)
fos.write(data)
...
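To get back to the original goal of writing one output per partition, here is a hedged sketch (the bucket name, layout and record type are placeholders; the GCS connector is assumed to be on the classpath, as it is on Dataproc, and rdd is assumed to be an RDD[String]):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

rdd.mapPartitionsWithIndex { (i, partition) =>
  val conf = new Configuration()                                // built on the executor, not shipped from the driver
  val path = new Path(s"gs://my-bucket/partition_$i/part.txt")  // hypothetical bucket and layout
  val fs   = path.getFileSystem(conf)                           // scheme-specific FileSystem (GCS here)
  val out  = fs.create(path)
  try partition.foreach(line => out.write((line + "\n").getBytes("UTF-8")))
  finally out.close()
  Iterator.single(i)                                            // yield something so Spark has output to materialize
}.count()                                                       // force the job to run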
I want to process a large local file with Play.
The file should be deleted from the filesystem right after it has been processed. It would be easy using the sendFile method like this:
def index = Action {
  val fileToServe = TemporaryFile(new java.io.File("/tmp/fileToServe.pdf"))
  Ok.sendFile(content = fileToServe, onClose = () => fileToServe.clean)
}
But I'd like to process the file in a streaming way in order to reduce the memory footprint:
def index = Action {
  val file = new java.io.File("/tmp/fileToServe.pdf")
  val path: java.nio.file.Path = file.toPath
  val source: Source[ByteString, _] = FileIO.fromPath(path)
  Ok.sendEntity(HttpEntity.Streamed(source, Some(file.length()), Some("application/pdf")))
    .withHeaders("Content-Disposition" → "attachment; filename=file.pdf")
}
In the latter case, though, I can't figure out at which moment the stream is finished so that I can remove the file from the filesystem.
You could use watchTermination on the Source to delete the file once the stream has completed. For example:
val source: Source[ByteString, _] =
  FileIO.fromPath(path)
    .watchTermination() { (_, futDone) =>
      futDone.onComplete { // needs an implicit ExecutionContext in scope
        case Success(_) =>
          println("deleting the file")
          java.nio.file.Files.delete(path)
        case Failure(t) => println(s"stream failed: ${t.getMessage}")
      }
    }
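Putting this together with the streaming action from the question, a minimal sketch (again assuming an implicit ExecutionContext is in scope for onComplete) could look like:

def index = Action {
  val path = java.nio.file.Paths.get("/tmp/fileToServe.pdf")
  val source: Source[ByteString, _] =
    FileIO.fromPath(path)
      .watchTermination() { (_, futDone) =>
        futDone.onComplete {
          case Success(_) => java.nio.file.Files.delete(path) // remove the file once the response has been streamed
          case Failure(t) => println(s"stream failed: ${t.getMessage}")
        }
      }
  Ok.sendEntity(HttpEntity.Streamed(source, Some(java.nio.file.Files.size(path)), Some("application/pdf")))
    .withHeaders("Content-Disposition" -> "attachment; filename=file.pdf")
}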
I am trying to implement file upload functionality in my application using Akka HTTP. I am using akka-stream version 2.4.4.
Here is the code (modified from the Akka docs):
path("fileupload") {
post {
extractRequestContext {
ctx => {
implicit val materializer = ctx.materializer
implicit val ec = ctx.executionContext
fileUpload("fileUpload") {
case (metadata, byteSource) =>
val location = FileUtil.getUploadPath(metadata)
val updatedFileName = metadata.fileName.replaceAll(" ", "").replaceAll("\"", "")
val uniqFileName = uniqueFileId.concat(updatedFileName)
val fullPath = location + File.separator + uniqFileName
val writer = new FileOutputStream(fullPath)
val bufferedWriter = new BufferedOutputStream(writer)
val result = byteSource.map(s => {
bufferedWriter.write(s.toArray)
}).runWith(Sink.ignore)
val result1 = byteSource.runWith(Sink.foreach(s=>bufferedWriter.write(s.toArray)))
Await.result(result1, 5.seconds)
bufferedWriter.flush()
bufferedWriter.close()
complete(uniqFileName)
/*onSuccess(result) { x =>
bufferedWriter.flush()
bufferedWriter.close()
complete("hello world")
}*/
}
}
}
}
}
This code works and uploads the file to the given path. I generate new file names by appending a UUID so that the names are unique, and I need to return the new file name to the caller. However, this method does not always return the filename; sometimes it finishes with "Response has no content".
Can anyone let me know what I am doing wrong here?
There is no need to use the standard blocking streams when you have reactive streams for that purpose:
path("fileUpload") {
post {
fileUpload("fileUpload") {
case (fileInfo, fileStream) =>
val sink = FileIO.toPath(Paths.get("/tmp") resolve fileInfo.fileName)
val writeResult = fileStream.runWith(sink)
onSuccess(writeResult) { result =>
result.status match {
case Success(_) => complete(s"Successfully written ${result.count} bytes")
case Failure(e) => throw e
}
}
}
}
}
This code uploads the fileUpload multipart field to a file inside the /tmp directory. It simply dumps the content of the input source into the respective file sink, returning a message upon completion of the write operation.
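Since the question also needs the generated file name returned to the caller, a hedged variation of the route above could complete with that name instead of the byte count (the UUID-based naming below mirrors the question's uniqueFileId idea and is only illustrative):

fileUpload("fileUpload") {
  case (fileInfo, fileStream) =>
    val uniqFileName = java.util.UUID.randomUUID().toString +
      fileInfo.fileName.replaceAll(" ", "").replaceAll("\"", "")
    val sink = FileIO.toPath(Paths.get("/tmp") resolve uniqFileName)
    onSuccess(fileStream.runWith(sink)) { result =>
      result.status match {
        case Success(_) => complete(uniqFileName) // return the new name once the write has finished
        case Failure(e) => throw e
      }
    }
}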
You may also want to tweak the dispatcher used for FileIO sources and sinks, as described in their scaladocs.
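For instance, a file sink can be pinned to a dedicated dispatcher through stream attributes; the dispatcher name below is a placeholder that would have to be defined in your configuration:

import akka.stream.ActorAttributes

val sink = FileIO.toPath(Paths.get("/tmp/upload.tmp"))
  .withAttributes(ActorAttributes.dispatcher("my-blocking-io-dispatcher")) // hypothetical dispatcher name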
If you only need to upload the file and don't need to work with the stream while it is uploading, there is a much simpler way:
def tempDestination(fileInfo: FileInfo): File =
  File.createTempFile(fileInfo.fileName, ".tmp")

val route =
  storeUploadedFile("csv", tempDestination) {
    case (metadata, file) =>
      // do something with the file and file metadata ...
      file.delete()
      complete(StatusCodes.OK)
  }
See docs: https://doc.akka.io/docs/akka-http/current/routing-dsl/directives/file-upload-directives/storeUploadedFile.html
I am using Play 2.3 and want to store uploaded files to S3, so I use the Play-S3 module.
However, I got stuck because I need to create a BucketFile to upload to S3 with this module, and a BucketFile is created from an in-memory Array[Byte] of the file's contents. The Play body parser gives me a temporary file on disk. How can I get this file into a BucketFile?
Here is my controller Action:
def upload = Action.async(parse.multipartFormData) { implicit request =>
  request.body.file("file").map { file =>
    implicit val credential = AwsCredentials.fromConfiguration
    val bucket = S3("bucketName")
    val result = bucket + BucketFile(file.filename, file.contentType.get, file.ref.file.toString.getBytes)
    result.map { unit =>
      Ok("File uploaded")
    }
  }.getOrElse {
    Future.successful {
      Redirect(routes.Application.index).flashing(
        "error" -> "Missing file"
      )
    }
  }
}
This code does not work because file.ref.file.toString does not return the file's contents, only its path, so calling getBytes on it does not give the bytes of the file.
Import the following:
import java.nio.file.{Paths, Files}
To create the Array[Byte] do:
val byteArray = Files.readAllBytes(Paths.get(file.ref.file.getPath))
Then upload with:
BucketFile(file.filename, file.contentType.get, byteArray, None, None)
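For reference, a sketch of the original action with this fix applied (same hypothetical bucket and credentials as in the question):

def upload = Action.async(parse.multipartFormData) { implicit request =>
  request.body.file("file").map { file =>
    implicit val credential = AwsCredentials.fromConfiguration
    val bucket = S3("bucketName")
    val byteArray = Files.readAllBytes(Paths.get(file.ref.file.getPath)) // read the temporary file's bytes
    val result = bucket + BucketFile(file.filename, file.contentType.get, byteArray, None, None)
    result.map { _ =>
      Ok("File uploaded")
    }
  }.getOrElse {
    Future.successful {
      Redirect(routes.Application.index).flashing("error" -> "Missing file")
    }
  }
}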