Play! Upload file and save to AWS S3 - scala

I am using Play 2.3 and want to store uploaded files in S3, so I am using the Play-S3 module.
However, I am stuck: with this module I need to create a BucketFile to upload to S3, and a BucketFile is built from an in-memory Array[Byte] holding the file's contents. The Play! body parser gives me a temporary file on disk. How can I get this file into a BucketFile?
Here is my controller Action:
def upload = Action.async(parse.multipartFormData) { implicit request =>
  request.body.file("file").map { file =>
    implicit val credential = AwsCredentials.fromConfiguration
    val bucket = S3("bucketName")
    val result = bucket + BucketFile(file.filename, file.contentType.get, file.ref.file.toString.getBytes)
    result.map { unit =>
      Ok("File uploaded")
    }
  }.getOrElse {
    Future.successful {
      Redirect(routes.Application.index).flashing(
        "error" -> "Missing file"
      )
    }
  }
}
This code does not work because file.ref.file.toString returns the path of the temporary file, not its contents, so getBytes only encodes the path string.

Import the following:
import java.nio.file.{Paths, Files}
To create the Array[Byte] do:
val byteArray = Files.readAllBytes(Paths.get(file.ref.file.getPath))
Then upload with:
BucketFile(file.filename, file.contentType.get, byteArray, None, None)
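As a minimal, self-contained sketch of the difference: calling toString on a java.io.File yields its path, while Files.readAllBytes returns the file's contents. The object name TempFileBytes and the helper readUploadedBytes are hypothetical names used only for illustration, and the temporary file created in main stands in for Play's file.ref.file.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object TempFileBytes {
  // Hypothetical helper: loads the whole file into memory.
  // Reasonable for small uploads; large files would need streaming instead.
  def readUploadedBytes(file: File): Array[Byte] =
    Files.readAllBytes(file.toPath)

  def main(args: Array[String]): Unit = {
    val tmp = File.createTempFile("upload", ".txt")
    try {
      Files.write(tmp.toPath, "hello".getBytes(StandardCharsets.UTF_8))
      // Bytes of the path string, which is what the original code uploaded
      val pathBytes = tmp.toString.getBytes(StandardCharsets.UTF_8)
      // Bytes of the actual file contents
      val contentBytes = readUploadedBytes(tmp)
      println(s"path bytes: ${pathBytes.length}, content bytes: ${contentBytes.length}")
    } finally {
      tmp.delete()
    }
  }
}
```

Because the whole file ends up in memory, this approach is best kept for uploads of modest size.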

Related

Decompressing .Z file stored in Azure ADLS Gen2

I have a .Z file stored in Azure ADLS Gen2 and want to decompress it in place. I tried ADF and C#, but found that .Z is not supported. I also tried the Apache Commons Compress library, but was unable to read the file into an InputStream.
Does anyone know how to decompress the file using the Apache library in Scala?
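.Z files are normally produced by the Unix compress utility (LZW), which is a different format from gzip, and java.util.zip.GZIPInputStream cannot read it; for true .Z files, Apache Commons Compress provides ZCompressorInputStream. If your file is really gzip data with a .Z extension, you could try this approach: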
import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import java.util.zip.GZIPInputStream

object UnzipFiles {
  def decompressGzipOrZFiles(file: File, encode: String): BufferedReader = {
    val fis = new FileInputStream(file)
    val gzis = new GZIPInputStream(fis)
    val isr = new InputStreamReader(gzis, encode)
    new BufferedReader(isr)
  }

  def main(args: Array[String]): Unit = {
    val path = new File("/home/cloudera/files/my_file.Z")
    // print to the console
    decompressGzipOrZFiles(path, "UTF-8").lines().toArray.foreach(println)
  }
}
Or you could read the stream in chunks:
def uncompressGzip(myFileDotZorGzip: String): Unit = {
  import java.io.FileInputStream
  import java.util.zip.GZIPInputStream
  val gzipInputStream = new GZIPInputStream(new FileInputStream(myFileDotZorGzip))
  try {
    val buffer = new Array[Byte](128)
    // read returns the number of bytes placed in the buffer, or -1 at end of stream
    var bytesRead = gzipInputStream.read(buffer)
    while (bytesRead != -1) {
      // do something with the data; here each byte is printed as a character
      buffer.take(bytesRead).foreach(b => print(b.toChar))
      bytesRead = gzipInputStream.read(buffer)
    }
  } finally {
    gzipInputStream.close()
  }
}
I hope this helps.
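Since a .Z extension does not guarantee a particular format, one option is to look at the first two bytes of the file before choosing a decompressor. The sketch below assumes the usual magic numbers (0x1f 0x8b for gzip, 0x1f 0x9d for Unix compress); the object name CompressionSniffer and its members are hypothetical names for illustration, not part of any library.

```scala
import java.io.BufferedInputStream

object CompressionSniffer {
  sealed trait Format
  case object Gzip extends Format          // magic bytes 0x1f 0x8b
  case object UnixCompress extends Format  // magic bytes 0x1f 0x9d (.Z)
  case object Unknown extends Format

  // Peeks at the first two bytes, then rewinds so the caller can
  // hand the same stream to whichever decompressor matches.
  def detect(in: BufferedInputStream): Format = {
    in.mark(2)
    val b0 = in.read()
    val b1 = in.read()
    in.reset()
    (b0, b1) match {
      case (0x1f, 0x8b) => Gzip
      case (0x1f, 0x9d) => UnixCompress
      case _            => Unknown
    }
  }
}
```

With Gzip, the stream can be wrapped in java.util.zip.GZIPInputStream as above; with UnixCompress, wrapping it in Apache Commons Compress's ZCompressorInputStream is the likely route, assuming that dependency is on the classpath.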

Download pdf file from s3 using akka-stream-alpakka

I am trying to download pdf file from S3 using the akka-stream-alpakka connector. I have the s3 path and try to download the pdf using a wrapper method over the alpakka s3Client.
def getSource(s3Path: String): Source[ByteString, NotUsed] = {
  val (source, _) = s3Client.download(s3Bucket, s3Path)
  source
}
From my main code, I call the above method and try to write the result to a file:
val file = new File("certificate.pdf")
val res: Future[IOResult] = getSource(data.s3PdfPath)
  .runWith(FileIO.toFile(file))
However, instead of ending up with a file, I am left with a value of type Future[IOResult]. Can someone please tell me where I am going wrong?
def download(bucket: String, bucketKey: String, filePath: String) = {
  val (s3Source: Source[ByteString, _], _) = s3Client.download(bucket, bucketKey)
  val result = s3Source.toMat(FileIO.toPath(Paths.get(filePath)))(Keep.right)
    .run()
  result
}
download(s3Bucket, key, newSigFilepath).onComplete {
}
Inspect the IOResult, and if successful you can use your file:
res.foreach {
  case IOResult(bytes, Success(_)) =>
    println(s"$bytes bytes written to $file")
    ... // do whatever you want with your file
  case _ =>
    println("some error occurred.")
}

Play Framework 2.6 Alpakka S3 File Upload

I use Play Framework 2.6 (Scala) and Alpakka AWS S3 Connector to upload files asynchronously to S3 bucket. My code looks like this:
def richUpload(extension: String, checkFunction: (String, Option[String]) => Boolean, cannedAcl: CannedAcl, bucket: String) = userAction(parse.multipartFormData(handleFilePartAsFile)).async { implicit request =>
  val s3Filename = request.user.get.id + "/" + java.util.UUID.randomUUID.toString + "." + extension
  val fileOption = request.body.file("file").map {
    case FilePart(key, filename, contentType, file) =>
      Logger.info(s"key = ${key}, filename = ${filename}, contentType = ${contentType}, file = $file")
      if (checkFunction(filename, contentType)) {
        s3Service.uploadSink(s3Filename, cannedAcl, bucket).runWith(FileIO.fromPath(file.toPath))
      } else {
        throw new Exception("Upload failed")
      }
  }
  fileOption match {
    case Some(opt) => opt.map(o => Ok(s3Filename))
    case _ => Future.successful(BadRequest("ERROR"))
  }
}
It works, but it returns the filename before the upload to S3 has finished. I want to return the value only after the upload completes. Is there a solution?
Also, is it possible to stream the upload directly to S3, so that progress is reported correctly and no temporary file on disk is needed?
You need to flip around your source and sink to obtain the materialized value you are interested in.
You have:
- a source that reads from your local file and materializes to a Future[IOResult] upon completion of reading the file.
- a sink that writes to S3 and materializes to a Future[MultipartUploadResult] upon completion of writing to S3.
You are interested in the latter, but your code uses the former. This is because runWith always keeps the materialized value of the stage passed as its parameter.
The types in the sample snippet below should clarify this:
val fileSource: Source[ByteString, Future[IOResult]] = ???
val s3Sink : Sink [ByteString, Future[MultipartUploadResult]] = ???
val m1: Future[IOResult] = s3Sink.runWith(fileSource)
val m2: Future[MultipartUploadResult] = fileSource.runWith(s3Sink)
After you have obtained a Future[MultipartUploadResult], you can map over it in the same way and access the location field to get the file's URI. Note that the result is still wrapped in a Future, e.g.:
val location: Future[Uri] = fileSource.runWith(s3Sink).map(_.location)

reading zip file from s3 bucket using scala spark

I am trying to fetch and read text files inside a zip file uploaded to an AWS S3 bucket.
The code I tried:
var ZipFileList = spark.sparkContext.binaryFiles(/path/);
var unit = ZipFileList.flatMap {
  case (zipFilePath, zipContent) =>
    {
      val zipInputStream = new ZipInputStream(zipContent.open())
      val zipEntry = zipInputStream.getNextEntry()
      println(zipEntry.getName)
    }
}
But it gives the error "found: Unit, required: TraversableOnce". I also tried:
val files = spark.sparkContext.wholeTextFiles(/path/)
files.flatMap({case (name, content) =>
  unzip(content) //gives error "type mismatch; found : Unit required: scala.collection.GenTraversableOnce[?]"
})
Is there any other way to read the contents of files inside a zip file?
The zip file contains .json files, and what I want to achieve is to read and parse all of them.
You aren't actually returning the data from unzip(), are you? I think that's part of the problem: flatMap needs each call to return a collection, but your function returns Unit.
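A minimal sketch of a function that does return the data, using only java.util.zip: it reads every entry from an InputStream and returns a List of (entry name, entry text) pairs, which is a shape flatMap accepts. The names ZipEntries and readEntries are hypothetical, and UTF-8 content is an assumption.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.ZipInputStream

object ZipEntries {
  // Returns (entry name, entry text) for every non-directory entry.
  def readEntries(in: InputStream): List[(String, String)] = {
    val zis = new ZipInputStream(in)
    try {
      Iterator
        .continually(zis.getNextEntry)
        .takeWhile(_ != null)
        .filterNot(_.isDirectory)
        .map { entry =>
          // read returns -1 at the end of the current entry, not of the whole archive
          val out = new ByteArrayOutputStream()
          val buf = new Array[Byte](8192)
          var n = zis.read(buf)
          while (n != -1) {
            out.write(buf, 0, n)
            n = zis.read(buf)
          }
          (entry.getName, new String(out.toByteArray, StandardCharsets.UTF_8))
        }
        .toList // force evaluation before the stream is closed
    } finally {
      zis.close()
    }
  }
}
```

In the Spark code from the question, this could presumably be called as ZipFileList.flatMap { case (_, zipContent) => ZipEntries.readEntries(zipContent.open()) }, giving an RDD of (name, json text) pairs to parse afterwards.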

File upload using Akka HTTP

I am trying to implement file upload functionality in my application using Akka HTTP. I am using akka-stream version 2.4.4.
Here is the code (modified from akka-doc)
path("fileupload") {
  post {
    extractRequestContext {
      ctx => {
        implicit val materializer = ctx.materializer
        implicit val ec = ctx.executionContext
        fileUpload("fileUpload") {
          case (metadata, byteSource) =>
            val location = FileUtil.getUploadPath(metadata)
            val updatedFileName = metadata.fileName.replaceAll(" ", "").replaceAll("\"", "")
            val uniqFileName = uniqueFileId.concat(updatedFileName)
            val fullPath = location + File.separator + uniqFileName
            val writer = new FileOutputStream(fullPath)
            val bufferedWriter = new BufferedOutputStream(writer)
            val result = byteSource.map(s => {
              bufferedWriter.write(s.toArray)
            }).runWith(Sink.ignore)
            val result1 = byteSource.runWith(Sink.foreach(s=>bufferedWriter.write(s.toArray)))
            Await.result(result1, 5.seconds)
            bufferedWriter.flush()
            bufferedWriter.close()
            complete(uniqFileName)
            /*onSuccess(result) { x =>
              bufferedWriter.flush()
              bufferedWriter.close()
              complete("hello world")
            }*/
        }
      }
    }
  }
}
This code works and uploads the file to the given path. I generate new file names by appending a UUID to make sure they are unique, so I need to return the new file name to the caller. However, this method does not always return the file name. Sometimes it finishes with "Response has no content".
Can anyone tell me what I am doing wrong here?
There is no need to use the standard blocking streams when you have reactive streams for that purpose:
path("fileUpload") {
  post {
    fileUpload("fileUpload") {
      case (fileInfo, fileStream) =>
        val sink = FileIO.toPath(Paths.get("/tmp") resolve fileInfo.fileName)
        val writeResult = fileStream.runWith(sink)
        onSuccess(writeResult) { result =>
          result.status match {
            case Success(_) => complete(s"Successfully written ${result.count} bytes")
            case Failure(e) => throw e
          }
        }
    }
  }
}
This code writes the fileUpload multipart field to a file inside the /tmp directory. It simply streams the content of the input source into the file sink and returns a message once the write operation completes.
You may also want to tweak the dispatcher used for FileIO sources and sinks, as described in their scaladocs.
If you only need to store the uploaded file and do not need to do anything with the stream until the upload finishes, there is a much simpler way:
def tempDestination(fileInfo: FileInfo): File =
  File.createTempFile(fileInfo.fileName, ".tmp")

val route =
  storeUploadedFile("csv", tempDestination) {
    case (metadata, file) =>
      // do something with the file and file metadata ...
      file.delete()
      complete(StatusCodes.OK)
  }
See docs: https://doc.akka.io/docs/akka-http/current/routing-dsl/directives/file-upload-directives/storeUploadedFile.html