Wait for a "Process" in Scala to complete - scala

There is a function in my program that downloads a fairly heavy zip file (about 500 megabytes), extracts it, and then removes the zip file itself.
Obviously I want to wait for the download to complete, then wait for the unzip to complete, and only then remove the zip file (I just need the JSON files inside). This is the code that I currently use:
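import java.io.File
import java.net.URL
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.language.postfixOps
import scala.sys.process._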
/* other functions */
// downloading, unzipping and removing are in separate functions, but I
// aggregated them all here for simplicity
def downloadZipThenExtract(link: String, filePath: String): Future[Int] = {
val urlObject = new URL(link)
val file = new File(filePath)
Future {
val download: ProcessBuilder = urlObject #> file
val unzip: ProcessBuilder = s"unzip ${file.getPath} -d ${file.getParent}"
val delete: ProcessBuilder = s"rm ${file.getPath}"
/*
I've already tried this:
(download ### unzip ### delete) !
And every other solution, none of them worked
*/ // =>
download !
Thread.sleep(900000) // wait 15 minutes to download
unzip !
Thread.sleep(60000) // One minute to unzip
delete !
}
}
And as you can see, I found no other approach than freezing the thread to complete the download and unzipping, which of course sucks. So I wanted to know if you guys know any better approach, thanks.

I am not sure why you think that "freezing the thread" sucks. If you have to wait for the process to finish, then you have to wait for it.
But if it is just Thread.sleep that you see as problematic, then I have good news for you: you don't need it :)
download.! // `.!` blocks until the step has finished and returns its exit code
unzip.!
delete.!
I know you said you already tried "every other solution, and nothing worked", but if you have indeed tried this one, you gotta elaborate, because then I don't know what you mean when you say it "doesn't work".
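To make the blocking behaviour concrete, here is a minimal sketch. It assumes a Unix-like system with sh available, and the three commands are hypothetical stand-ins for the real download, unzip and rm steps. It uses the dotted form .! (which needs no postfixOps import) and chains the steps with #&&, so a later step only runs if the previous one exited with 0.

```scala
import scala.sys.process._

object SequentialProcesses {
  // Stand-in commands (hypothetical): replace with the real download, unzip and rm steps.
  val first: ProcessBuilder   = Seq("sh", "-c", "sleep 1; exit 0")
  val second: ProcessBuilder  = Seq("sh", "-c", "exit 0")
  val failing: ProcessBuilder = Seq("sh", "-c", "exit 3")

  // Chains the steps with #&& and runs them with `.!`, which blocks the calling
  // thread until the chain has exited and returns the resulting exit code.
  def runAll(steps: ProcessBuilder*): Int =
    steps.reduce(_ #&& _).!
}
```

A call such as SequentialProcesses.runAll(download, unzip, delete) would return only once the last executed step has exited, so no Thread.sleep is needed.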

I almost forgot about this question of mine. Since someone just upvoted it, I'd like to answer it in case anybody else faces this problem.
My solution was to use Akka and put three stream stages in my code: the first one downloads the data and writes it to the desired zip file, the second one extracts the zip file using Java NIO, and the last one just deletes the zip file.
Note that there's some extra code which you might not need, like the println calls in case of exceptions. I just wanted to share the idea and the solution.
Here is my code:
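import java.io.{File, FileOutputStream}
import java.net.URL
import java.nio.channels.Channels
import java.nio.file.Files
import java.util.zip.ZipFile
import akka.NotUsed
import akka.stream.Materializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.collection.JavaConverters._ // on Scala 2.13+ prefer scala.jdk.CollectionConverters._
import scala.concurrent.Future
import scala.sys.process._
def downloadFilesInto(fullPath: String, link: String)(implicit materializer: Materializer): Future[Int] = {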
val directory: File = new File(fullPath).getParentFile
val urlObj = new URL(link)
val fileOutputStream = new FileOutputStream(fullPath)
val readableChannel = Channels.newChannel(urlObj.openStream())
val source = Source.single(1)
val downloaderFlow: Flow[Int, Long, NotUsed] = Flow[Int].map { _ =>
val size: Long = fileOutputStream.getChannel.transferFrom(readableChannel, 0, Long.MaxValue)
fileOutputStream.close()
readableChannel.close()
size
}
val unzipFlow: Flow[Long, Int, NotUsed] = Flow[Long].map { size =>
println(s"file downloaded, properties:\n type: zip\n size: ${size / (1024 * 1024)} MB")
val zipFile = new ZipFile(new File(fullPath))
val entries = zipFile.entries().asScala.toList
val eachFileExtractionResult: List[Int] =
entries.map { entry =>
try {
val path = new File(directory + "/" + entry.getName).toPath
Files.copy(zipFile.getInputStream(entry), path)
0
} catch {
case ex: Exception =>
println(
s"""
| Caught an Exception while unzipping file: ${entry.getName}
| Type: ${ex.getClass}
| Cause: ${ex.getCause}
| Message: ${ex.getMessage}
|""".stripMargin
)
1
case th: Throwable =>
println(
s"""
| Caught a throwable while unzipping file: ${entry.getName}
| Type: ${th.getClass}
| Cause: ${th.getCause}
| Message: ${th.getMessage}
|""".stripMargin
)
2
}
}
val unsuccessfulTries = eachFileExtractionResult.filterNot(_ == 0)
if (unsuccessfulTries.isEmpty)
0
else unsuccessfulTries.head
}
val removeFlow: Flow[Int, Int, NotUsed] = Flow[Int].map {
case 0 => // successfully unzipped
s"rm $fullPath".!
case whatever => whatever
}
val sink = Sink.fold[Int, Int](0)(_ + _)
source.viaMat(downloaderFlow)(Keep.right)
.viaMat(unzipFlow)(Keep.right)
.viaMat(removeFlow)(Keep.right)
.toMat(sink)(Keep.right)
.run()
}
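For comparison, the same three steps can also be written with blocking JDK calls inside a single Future, without Akka or external commands. This is only a sketch under some assumptions: it needs Scala 2.13+ for scala.util.Using, the name fetchAndExtract is hypothetical, and it extracts next to the zip file as the code above does.

```scala
import java.net.URL
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.zip.ZipInputStream
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Using

object FetchAndExtract {
  // Hypothetical helper: downloads `link` to `zipPath`, extracts it next to the zip,
  // then deletes the zip. The Future yields the number of files extracted.
  def fetchAndExtract(link: URL, zipPath: Path)(implicit ec: ExecutionContext): Future[Int] = Future {
    val targetDir = zipPath.toAbsolutePath.normalize().getParent

    // 1. Download: Files.copy returns only after the whole stream has been written.
    Using.resource(link.openStream()) { in =>
      Files.copy(in, zipPath, StandardCopyOption.REPLACE_EXISTING)
    }

    // 2. Unzip: walk the entries one by one and copy each to disk.
    var extracted = 0
    Using.resource(new ZipInputStream(Files.newInputStream(zipPath))) { zip =>
      Iterator.continually(zip.getNextEntry).takeWhile(_ != null).foreach { entry =>
        val out = targetDir.resolve(entry.getName).normalize()
        // Refuse entries that would land outside the target directory.
        require(out.startsWith(targetDir), s"unsafe zip entry: ${entry.getName}")
        if (entry.isDirectory) Files.createDirectories(out)
        else {
          Files.createDirectories(out.getParent)
          Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING)
          extracted += 1
        }
      }
    }

    // 3. Delete the archive; this line is reached only if extraction did not throw.
    Files.delete(zipPath)
    extracted
  }
}
```

Because each step is an ordinary blocking call, the ordering is guaranteed by plain sequential code, and any failure surfaces as a failed Future rather than a numeric code.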

Related

Download pdf file from s3 using akka-stream-alpakka

I am trying to download pdf file from S3 using the akka-stream-alpakka connector. I have the s3 path and try to download the pdf using a wrapper method over the alpakka s3Client.
def getSource(s3Path: String): Source[ByteString, NotUsed] = {
val (source, _) = s3Client.download(s3Bucket, s3Path)
source
}
From my main code, I call the above method and try to convert it to a file
val file = new File("certificate.pdf")
val res: Future[IOResult] = getSource(data.s3PdfPath)
.runWith(FileIO.toFile(file))
However, instead of getting a file, I am stuck with a value of type IOResult. Can someone point out where I am going wrong?
def download(bucket: String, bucketKey: String, filePath: String) = {
val (s3Source: Source[ByteString, _], _) = s3Client.download(bucket, bucketKey)
val result = s3Source.toMat(FileIO.toPath(Paths.get(filePath)))(Keep.right)
.run()
result
}
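// needs scala.util.{Success, Failure} and an implicit ExecutionContext in scope
download(s3Bucket, key, newSigFilepath).onComplete {
case Success(ioResult) => println(s"${ioResult.count} bytes written to $newSigFilepath")
case Failure(e) => println(s"download failed: ${e.getMessage}")
}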
Inspect the IOResult, and if successful you can use your file:
res.foreach {
case IOResult(bytes, Success(_)) =>
println(s"$bytes bytes written to $file")
... // do whatever you want with your file
case _ =>
println("some error occurred.")
}

Not able to read file in scala

I am a beginner in Scala. I am trying to read a file but I get a java.io.FileNotFoundException. Can someone help?
package standardscala
case class TempData(day :Int,doy :Int, month:Int, year :Int, precip :Double, snow :Double, tave :Double, tmax :Double, tmin :Double )
object TempData {
def main(args: Array[String]): Unit = {
val source = scala.io.Source.fromFile("DATA/MN212.csv")
val lines = source.getLines().drop(1) // to get the lines of files,drop(1) to drop the header
val data= lines.map { line => val p = line.split(",")
TempData(p(0).toInt,p(1).toInt,p(2).toInt,p(4).toInt,p(5).toDouble,p(6).toDouble,p(7).toDouble,p(8).toDouble,p(9).toDouble)
}.toArray
source.close() //Closing the connection
data.take(5) foreach println
}
}
Try using an absolute path, and the problem should disappear.
One option would be to move your csv file into a resources folder and load it as a resource like:
val f = new File(getClass.getClassLoader.getResource("your/csv/file.csv").getPath)
Or you could try loading it from an absolute path!
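A relative path such as "DATA/MN212.csv" is normally resolved against the JVM's working directory, which in an IDE is often not the folder you expect. The sketch below (the names RelativePathCheck, resolve and readLines are hypothetical, and scala.util.Using needs Scala 2.13+) shows the absolute path that was actually tried before attempting to read.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.io.Source
import scala.util.Using

object RelativePathCheck {
  // Turns a possibly relative path into an absolute one, based on the working directory.
  def resolve(relative: String): Path = Paths.get(relative).toAbsolutePath

  // Returns the lines, or a Left with a message naming the absolute path that was tried.
  def readLines(relative: String): Either[String, List[String]] = {
    val path = resolve(relative)
    if (!Files.isRegularFile(path))
      Left(s"No file at $path (working directory: ${System.getProperty("user.dir")})")
    else
      Right(Using.resource(Source.fromFile(path.toFile))(_.getLines().toList))
  }
}
```

Calling RelativePathCheck.readLines("DATA/MN212.csv") and printing a Left result tells you where the program is really looking, which is usually enough to spot a wrong working directory.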
Please read this post about reading CSV by Alvin Alexander, writer of the Scala Cookbook:
object CSVDemo extends App {
println("Month, Income, Expenses, Profit")
val bufferedSource = io.Source.fromFile("/tmp/finance.csv")
for (line <- bufferedSource.getLines) {
val cols = line.split(",").map(_.trim)
// do whatever you want with the columns here
println(s"${cols(0)}|${cols(1)}|${cols(2)}|${cols(3)}")
}
bufferedSource.close
}
As Silvio Manolo pointed out, you should not use fromFile with an absolute path, as your code will then require the same file hierarchy wherever it runs. For a first draft this is acceptable, so you can move on and test the real job!

is it possible in scala/Akka to read the .xls file and .xlsx as a chunk?

Upload a file in chunk to a server including additional fields
def readFile(): Seq[ExcelFile] = {
logger.info(" readSales method initiated: ")
val source_test = source("E:/dd.xlsx")
println( " source_test "+source_test)
val source_test2 = Source.fromFile(source_test)
println( " source_test2 "+source_test)
//logger.info(" source: "+source)
for {
line <- source_test2.getLines().drop(1).toVector
values = line.split(",").map(_.trim)
// logger.info(" values are the: "+values)
} yield ExcelFile(Option(values(0)), Option(values(1)), Option(values(2)), Option(values(3)))
}
def source(filePath: String): String = {
implicit val codec = Codec("UTF-8")
codec.onMalformedInput(CodingErrorAction.REPLACE)
codec.onUnmappableCharacter(CodingErrorAction.REPLACE)
Source.fromFile(filePath).mkString
}
upload route,
path("upload"){
(post & extractRequestContext) { ctx => {
implicit val materializer = ctx.materializer
implicit val ec = ctx.executionContext
fileUpload("fileUploads") {
case (fileInfo, fileStream) =>
val path = "E:\\"
val sink = FileIO.toPath(Paths.get(path).resolve(fileInfo.fileName))
val wResult = fileStream.runWith(sink)
onSuccess(wResult) { rep => rep.status match {
case Success(_) =>
var ePath = path + File.separator + fileInfo.fileName
readFile(ePath)
_success message_
case Failure(e) => _faillure message_
} }
}
} }
}
I am using the code above. Is it possible in Scala or Akka to read the Excel file in chunks?
After looking at your code, it looks like you are having an issue with the post-processing (after upload) of the file.
If uploading a 3 GB file works even for one user, then I assume that it is already chunked or multipart.
The first problem is here: source_test2.getLines().drop(1).toVector, which creates a Vector (> 3 GB) holding every line in the file.
The other problem is that you are keeping the whole Seq[ExcelFile] in memory, which will be bigger than 3 GB (because of Java object overhead).
So whenever you call this readFile function, you use more than 6 GB of memory.
You should avoid creating such large objects in your application and use something like an Iterator instead of a Seq:
def readFile(): Iterator[ExcelFile] = {
val lineIterator = Source.fromFile("your_file_path").getLines
lineIterator.drop(1).map(line => {
val values = line.split(",").map(_.trim)
ExcelFile(
Option(values(0)),
Option(values(1)),
Option(values(2)),
Option(values(3))
)
})
}
The advantage of an Iterator is that it does not load everything into memory at once. You can keep using Iterators for further steps.
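One thing to watch with a lazy Iterator is that the underlying file must stay open until the iterator has been consumed. A possible way to handle that is a loan pattern, sketched below. It assumes the data is comma-separated text, as the question's parsing code does; Row is a hypothetical stand-in for ExcelFile, withRows is a made-up name, and scala.util.Using needs Scala 2.13+.

```scala
import java.nio.file.Path
import scala.io.Source
import scala.util.Using

object LazyRows {
  // Hypothetical row type standing in for the question's ExcelFile.
  final case class Row(a: Option[String], b: Option[String], c: Option[String], d: Option[String])

  // Loan pattern: the iterator is only valid inside `f`, because the file is closed afterwards.
  def withRows[A](path: Path)(f: Iterator[Row] => A): A =
    Using.resource(Source.fromFile(path.toFile, "UTF-8")) { src =>
      val rows = src.getLines().drop(1).map { line =>
        val v = line.split(",", -1).map(_.trim)
        // None instead of an exception when a column is missing
        def col(i: Int): Option[String] = if (i < v.length) Some(v(i)) else None
        Row(col(0), col(1), col(2), col(3))
      }
      f(rows)
    }
}
```

For example, LazyRows.withRows(path)(_.count(_.a.nonEmpty)) walks the file one line at a time and never holds more than the current row in memory.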

How to delete file right after processing it with Play Framework

I want to process a large local file with Play.
The file should be deleted from the filesystem right after it's been processed. It would be easy using sendFile method like this:
def index = Action {
val fileToServe = TemporaryFile(new java.io.File("/tmp/fileToServe.pdf"))
Ok.sendFile(content = fileToServe, onClose = () => fileToServe.clean)
}
But I'd like to process the file in a streaming way in order to reduce the memory footprint:
def index = Action {
val file = new java.io.File("/tmp/fileToServe.pdf")
val path: java.nio.file.Path = file.toPath
val source: Source[ByteString, _] = FileIO.fromPath(path)
Ok.sendEntity(HttpEntity.Streamed(source, Some(file.length()), Some("application/pdf")))
.withHeaders("Content-Disposition" → "attachment; filename=file.pdf")
}
In the latter case I can't figure out how to tell when the stream has finished, so that I can remove the file from the filesystem.
You could use watchTermination on the Source to delete the file once the stream has completed (onComplete needs an implicit ExecutionContext in scope). For example:
val source: Source[ByteString, _] =
FileIO.fromPath(path)
.watchTermination()((_, futDone) => futDone.onComplete {
case Success(_) =>
println("deleting the file")
java.nio.file.Files.delete(path)
case Failure(t) => println(s"stream failed: ${t.getMessage}")
})

How to download and save a file from the internet using Scala?

Basically I have a url/link to a text file online and I am trying to download it locally. For some reason, the text file that gets created/downloaded is blank. Open to any suggestions. Thanks!
def downloadFile(token: String, fileToDownload: String) {
val url = new URL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setRequestMethod("GET")
val in: InputStream = connection.getInputStream
val fileToDownloadAs = new java.io.File("src/test/resources/testingUpload1.txt")
val out: OutputStream = new BufferedOutputStream(new FileOutputStream(fileToDownloadAs))
val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
out.write(byteArray)
}
I know this is an old question, but I just came across a really nice way of doing this :
import sys.process._
import java.net.URL
import java.io.File
def fileDownloader(url: String, filename: String) = {
new URL(url) #> new File(filename) !!
}
Hope this helps.
You can now simply use fileDownloader function to download the files.
fileDownloader("http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words", "stop-words-en.txt")
Here is a naive implementation using scala.io.Source.fromURL and java.io.FileWriter:
def downloadFile(token: String, fileToDownload: String) {
try {
val src = scala.io.Source.fromURL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
val out = new java.io.FileWriter("src/test/resources/testingUpload1.txt")
out.write(src.mkString)
out.close
} catch {
case e: java.io.IOException => println(s"error occurred: ${e.getMessage}")
}
}
Your code works for me... Something else may be causing the empty file.
Here is a safer alternative to new URL(url) #> new File(filename) !!:
val url = new URL(urlOfFileToDownload)
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setConnectTimeout(5000)
connection.setReadTimeout(5000)
connection.connect()
if (connection.getResponseCode >= 400)
println("error")
else
url #> new File(fileName) !!
Two things:
When downloading from a URL object, if an error (404 for instance) is returned, then the URL object will throw a FileNotFoundException. And since this exception is generated from another thread (as URL happens to run on a separate thread), a simple Try or try/catch won't be able to catch the exception. Hence the preliminary check on the response code: if (connection.getResponseCode >= 400).
As a consequence of checking the response code, the connection might sometimes get stuck open indefinitely for improper pages (as explained here). This can be avoided by setting a timeout on the connection: connection.setReadTimeout(5000).
Flush the buffer and then close your output stream.
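In the question's code the BufferedOutputStream is never flushed or closed, so the buffered bytes may never reach the file. A minimal sketch of a copy that always closes both streams follows. The names SaveStream and save are hypothetical, and scala.util.Using needs Scala 2.13+.

```scala
import java.io.{BufferedOutputStream, FileOutputStream, InputStream}
import java.nio.file.Path
import scala.util.Using

object SaveStream {
  // Copies `in` to `target` in chunks and returns the number of bytes written.
  // Using.resources closes both streams; closing the BufferedOutputStream flushes it.
  def save(in: InputStream, target: Path): Long =
    Using.resources(in, new BufferedOutputStream(new FileOutputStream(target.toFile))) { (src, out) =>
      val buffer = new Array[Byte](8192)
      var total = 0L
      var n = src.read(buffer)
      while (n != -1) {
        out.write(buffer, 0, n)
        total += n
        n = src.read(buffer)
      }
      total
    }
}
```

In the question's terms this would be called as SaveStream.save(connection.getInputStream, fileToDownloadAs.toPath). Reading in chunks also avoids building one large byte array for the whole download.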