How to download and save a file from the internet using Scala? - scala

Basically I have a url/link to a text file online and I am trying to download it locally. For some reason, the text file that gets created/downloaded is blank. Open to any suggestions. Thanks!
def downloadFile(token: String, fileToDownload: String) {
val url = new URL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setRequestMethod("GET")
val in: InputStream = connection.getInputStream
val fileToDownloadAs = new java.io.File("src/test/resources/testingUpload1.txt")
val out: OutputStream = new BufferedOutputStream(new FileOutputStream(fileToDownloadAs))
val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
out.write(byteArray)
}

I know this is an old question, but I just came across a really nice way of doing this :
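import sys.process._
import java.net.URL
import java.io.File
def fileDownloader(url: String, filename: String): String = {
  // `#>` redirects the URL's content into the file; `.!!` runs it and blocks until it finishes
  (new URL(url) #> new File(filename)).!!
}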
Hope this helps.
You can now simply call the fileDownloader function to download files:
fileDownloader("http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words", "stop-words-en.txt")

Here is a naive implementation using scala.io.Source.fromURL and java.io.FileWriter:
def downloadFile(token: String, fileToDownload: String): Unit = {
  try {
    val src = scala.io.Source.fromURL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
    val out = new java.io.FileWriter("src/test/resources/testingUpload1.txt")
    try out.write(src.mkString)
    finally {
      out.close() // closing the writer also flushes it
      src.close()
    }
  } catch {
    case e: java.io.IOException => println("error occurred: " + e.getMessage)
  }
}
Your code works for me... There are other things that could produce an empty file.

Here is a safer alternative to new URL(url) #> new File(filename) !!:
val url = new URL(urlOfFileToDownload)
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setConnectTimeout(5000)
connection.setReadTimeout(5000)
connection.connect()
if (connection.getResponseCode >= 400)
println("error")
else
url #> new File(fileName) !!
Two things:
When downloading from a URL object, an error response (404 for instance) makes the URL stream throw a FileNotFoundException. Since this exception is raised on another thread (the URL happens to be read on a separate thread), a simple Try or try/catch around the call won't catch it. Hence the preliminary check of the response code: if (connection.getResponseCode >= 400).
As a consequence of checking the response code, the connection can sometimes hang open indefinitely on misbehaving pages. This can be avoided by setting a timeout on the connection: connection.setReadTimeout(5000).
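If you would rather keep everything on the calling thread, here is a rough sketch that skips sys.process altogether and copies the stream with Files.copy. The object name SafeDownload, the download helper and its timeoutMs parameter are made up for illustration, and it assumes Scala 2.13+ for Using. Because the exception is thrown on the caller's thread here, a plain Try does catch it.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.util.{Try, Using}

object SafeDownload {
  // Hypothetical helper: returns the number of bytes written, or a Failure describing the problem.
  def download(url: URL, target: Path, timeoutMs: Int = 5000): Try[Long] = Try {
    val connection = url.openConnection()
    connection.setConnectTimeout(timeoutMs)
    connection.setReadTimeout(timeoutMs)
    connection match {
      // Only HTTP(S) connections carry a status code to check.
      case http: HttpURLConnection if http.getResponseCode >= 400 =>
        throw new java.io.IOException(s"HTTP ${http.getResponseCode} for $url")
      case _ => // non-HTTP URL or a non-error response: carry on
    }
    // Using.resource closes the input stream even if the copy fails.
    Using.resource(connection.getInputStream) { in =>
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    }
  }
}
```

The caller can then pattern match on Success / Failure instead of printing "error".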

Flush the buffer and then close your output stream.
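To make that concrete, here is a minimal sketch of a copy loop that flushes and closes the output when it is done. The object StreamCopy and the helper copyAndClose are hypothetical names, not part of the question's code; the point is only that a BufferedOutputStream holds bytes in memory until it is flushed or closed, which is why the file in the question ends up empty.

```scala
import java.io.{InputStream, OutputStream}

object StreamCopy {
  // Hypothetical helper: copies `in` to `out` in chunks, then flushes and closes both.
  // Returns the number of bytes copied.
  def copyAndClose(in: InputStream, out: OutputStream): Long = {
    val buffer = new Array[Byte](8192)
    var total = 0L
    try {
      var read = in.read(buffer)
      while (read != -1) {
        out.write(buffer, 0, read)
        total += read
        read = in.read(buffer)
      }
      out.flush() // push any buffered bytes to the underlying stream
      total
    } finally {
      out.close() // close() on a BufferedOutputStream flushes as well
      in.close()
    }
  }
}
```

In the question's downloadFile, that would mean passing connection.getInputStream and the BufferedOutputStream to such a helper, or simply calling out.close() after out.write(byteArray).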

Related

Wait for a "Process" in Scala to complete

So there is this function in my program, that downloads a quite heavy zip file (about 500 megabytes) and then extracts the file and removes the zip file itself.
And obviously I want to wait for the file to download completely, then wait to unzip it completely and then remove the zip file itself (I just need the Json files inside). This is the code that I use currently:
import java.io.File
import java.net.URL
import scala.sys.process._
/* other functions */
// downloading, unzipping and removing are in separate functions, but I
// aggregated them all here for simplicity
def downloadZipThenExtract(link: String, filePath: String): Future[Int] = {
val urlObject = new URL(link)
val file = new File(filePath)
Future {
val download: ProcessBuilder = urlObject #> file
val unzip: ProcessBuilder = s"unzip ${file.getPath} -d ${file.getParent}"
val delete: ProcessBuilder = s"rm ${file.getPath}"
/*
I've already tried this:
(download ### unzip ### delete) !
And every other solution, none of them worked
*/ // =>
download !
Thread.sleep(900000) // wait 15 minutes to download
unzip !
Thread.sleep(60000) // One minute to unzip
delete !
}
}
And as you can see, I found no other approach than freezing the thread to complete the download and unzipping, which of course sucks. So I wanted to know if you guys know any better approach, thanks.
I am not sure why you think that "freezing the thread" sucks. If you have to wait for the process to finish, then you have to wait for it.
But if it is just Thread.sleep that you see as problematic, then I have good news for you: you don't need it :)
download !
unzip !
delete !
I know you said you already tried "every other solution, and nothing worked", but if you have indeed tried this one, you gotta elaborate, because then I don't know what you mean when you say it "doesn't work".
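For what it's worth, here is a small sketch of running the steps one after another and stopping as soon as one fails. SequentialSteps and runAll are names I made up; the only library behaviour relied on is that `!` blocks until the process exits and returns its exit code.

```scala
import scala.sys.process._

object SequentialSteps {
  // Runs each step in order. `!` blocks until the process exits and returns its exit code.
  // Stops at the first non-zero exit code and returns it; returns 0 if every step succeeded.
  def runAll(steps: Seq[ProcessBuilder]): Int =
    steps.foldLeft(0) { (exitCode, step) =>
      if (exitCode != 0) exitCode else step.!
    }
}
```

With the question's values this would be called as SequentialSteps.runAll(Seq(download, unzip, delete)). The library's own `#&&` combinator, as in (download #&& unzip #&& delete).!, should give a similar "run the next one only if the previous one succeeded" behaviour.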
I almost forgot about this question of mine. Since someone just upvoted it, I'd like to answer it in case anybody else faces this problem.
My solution was to use Akka Streams and put three flows in my code: the first one downloads the data and writes it to the desired zip file, the second one extracts the zip file using Java NIO, and the last one just deletes the zip file.
Note that there's some extra code which you might not need, like the printlns in case of exceptions. I just wanted to share the idea and the solution.
Here is my code:
def downloadFilesInto(fullPath: String, link: String)(implicit materializer: Materializer): Future[Int] = {
val directory: File = new File(fullPath).getParentFile
val urlObj = new URL(link)
val fileOutputStream = new FileOutputStream(fullPath)
val readableChannel = Channels.newChannel(urlObj.openStream())
val source = Source.single(1)
val downloaderFlow: Flow[Int, Long, NotUsed] = Flow[Int].map { _ =>
val size: Long = fileOutputStream.getChannel.transferFrom(readableChannel, 0, Long.MaxValue)
fileOutputStream.close()
readableChannel.close()
size
}
val unzipFlow: Flow[Long, Int, NotUsed] = Flow[Long].map { size =>
println(s"file downloaded, properties:\n type: zip\n size: ${size / (1024 * 1024)} MB")
val zipFile = new ZipFile(new File(fullPath))
val entries = zipFile.entries().asScala.toList
val eachFileExtractionResult: List[Int] =
entries.map { entry =>
try {
val path = new File(directory + "/" + entry.getName).toPath
Files.copy(zipFile.getInputStream(entry), path)
0
} catch {
case ex: Exception =>
println(
s"""
| Caught an Exception while unzipping file: ${entry.getName}
| Type: ${ex.getClass}
| Cause: ${ex.getCause}
| Message: ${ex.getMessage}
|""".stripMargin
)
1
case th: Throwable =>
println(
s"""
| Caught a throwable while unzipping file: ${entry.getName}
| Type: ${th.getClass}
| Cause: ${th.getCause}
| Message: ${th.getMessage}
|""".stripMargin
)
2
}
}
val unsuccessfulTries = eachFileExtractionResult.filterNot(_ == 0)
if (unsuccessfulTries.isEmpty)
0
else unsuccessfulTries.head
}
val removeFlow: Flow[Int, Int, NotUsed] = Flow[Int].map {
case 0 => // successfully unzipped
s"rm $fullPath".!
case whatever => whatever
}
val sink = Sink.fold[Int, Int](0)(_ + _)
source.viaMat(downloaderFlow)(Keep.right)
.viaMat(unzipFlow)(Keep.right)
.viaMat(removeFlow)(Keep.right)
.toMat(sink)(Keep.right)
.run()
}

Issue with try-finally in Scala

I have following scala code:
val file = new FileReader("myfile.txt")
try {
// do operations on file
} finally {
file.close() // close the file
}
How do I handle FileNotFoundException thrown when I read the file? If I put that line inside try block, I am not able to access the file variable inside finally.
For Scala 2.13:
You can use Using to acquire a resource and have it released automatically, without writing the cleanup code yourself, as long as it is an AutoCloseable:
import java.io.FileReader
import scala.util.{Try, Using}
val newStyle: Try[String] = Using(new FileReader("myfile.txt")) {
reader: FileReader =>
// do something with reader
"something"
}
newStyle
// will be
// Failure(java.io.FileNotFoundException: myfile.txt (No such file or directory))
// if the file is not found, or Success with some value otherwise; it will not throw
Scala 2.12:
You can wrap the reader creation in scala.util.Try; if creation fails, you get a Failure with the FileNotFoundException inside.
import java.io.FileReader
import scala.util.Try
val oldStyle: Try[String] = Try{
val file = new FileReader("myfile.txt")
try {
// do operations on file
"something"
} finally {
file.close() // close the file
}
}
oldStyle
// will be
// Failure(java.io.FileNotFoundException: myfile.txt (No such file or directory))
// or Success with your result of file reading inside
I recommend avoiding try ... catch blocks in Scala code: they are not type-safe in some cases and can lead to non-obvious results. For releasing a resource in older Scala versions, though, try-finally is the only way to do it.
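To actually handle the FileNotFoundException once it sits inside the Try, you can pattern match on the result. A minimal sketch, assuming Scala 2.13+ (for Using); the object ReadFirstLine and the helper firstLine are just illustrative names:

```scala
import java.io.{BufferedReader, FileNotFoundException, FileReader}
import scala.util.{Failure, Success, Using}

object ReadFirstLine {
  // Hypothetical helper: first line of the file, or None if the file is missing or empty.
  def firstLine(path: String): Option[String] =
    Using(new BufferedReader(new FileReader(path)))(_.readLine()) match {
      case Success(line)                     => Option(line) // readLine returns null for an empty file
      case Failure(_: FileNotFoundException) => None         // opening the file failed
      case Failure(other)                    => throw other  // anything else is rethrown
    }
}
```

The reader is closed by Using in every case, so no finally block is needed.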

File upload using Akka HTTP

I am trying to implement file upload functionality in my application using Akka HTTP. I am using akka-stream version 2.4.4.
Here is the code (modified from akka-doc)
path("fileupload") {
post {
extractRequestContext {
ctx => {
implicit val materializer = ctx.materializer
implicit val ec = ctx.executionContext
fileUpload("fileUpload") {
case (metadata, byteSource) =>
val location = FileUtil.getUploadPath(metadata)
val updatedFileName = metadata.fileName.replaceAll(" ", "").replaceAll("\"", "")
val uniqFileName = uniqueFileId.concat(updatedFileName)
val fullPath = location + File.separator + uniqFileName
val writer = new FileOutputStream(fullPath)
val bufferedWriter = new BufferedOutputStream(writer)
val result = byteSource.map(s => {
bufferedWriter.write(s.toArray)
}).runWith(Sink.ignore)
val result1 = byteSource.runWith(Sink.foreach(s=>bufferedWriter.write(s.toArray)))
Await.result(result1, 5.seconds)
bufferedWriter.flush()
bufferedWriter.close()
complete(uniqFileName)
/*onSuccess(result) { x =>
bufferedWriter.flush()
bufferedWriter.close()
complete("hello world")
}*/
}
}
}
}
}
This code works and uploads the file to the given path. I generate new file names by appending a UUID to make sure they are unique, so I need to return the new file name to the caller. However, this method does not always return the file name; sometimes it finishes with "Response has no content".
Can anyone let me know what I am doing wrong here?
There is no need to use the standard blocking streams when you have reactive streams for that purpose:
path("fileUpload") {
post {
fileUpload("fileUpload") {
case (fileInfo, fileStream) =>
val sink = FileIO.toPath(Paths.get("/tmp") resolve fileInfo.fileName)
val writeResult = fileStream.runWith(sink)
onSuccess(writeResult) { result =>
result.status match {
case Success(_) => complete(s"Successfully written ${result.count} bytes")
case Failure(e) => throw e
}
}
}
}
}
This code writes the fileUpload multipart field to a file inside the /tmp directory. It simply streams the content of the input source into the file sink and returns a message once the write completes.
You may also want to tweak the dispatcher used for FileIO sources and sinks, as described in their scaladocs.
If you only need to store the uploaded file and do not need to process the stream while it is being uploaded, there is a much simpler way:
def tempDestination(fileInfo: FileInfo): File =
File.createTempFile(fileInfo.fileName, ".tmp")
val route =
storeUploadedFile("csv", tempDestination) {
case (metadata, file) =>
// do something with the file and file metadata ...
file.delete()
complete(StatusCodes.OK)
}
See docs: https://doc.akka.io/docs/akka-http/current/routing-dsl/directives/file-upload-directives/storeUploadedFile.html

How to close enumerated file?

Say, in an action I have:
val linesEnu = {
val is = new java.io.FileInputStream(path)
val isr = new java.io.InputStreamReader(is, "UTF-8")
val br = new java.io.BufferedReader(isr)
import scala.collection.JavaConversions._
val rows: scala.collection.Iterator[String] = br.lines.iterator
Enumerator.enumerate(rows)
}
Ok.feed(linesEnu).as(HTML)
How to close readers/streams?
There is an onDoneEnumerating callback that functions like finally (it will always be called, whether or not the Enumerator fails). You can close the streams there.
val linesEnu = {
val is = new java.io.FileInputStream(path)
val isr = new java.io.InputStreamReader(is, "UTF-8")
val br = new java.io.BufferedReader(isr)
import scala.collection.JavaConversions._
val rows: scala.collection.Iterator[String] = br.lines.iterator
Enumerator.enumerate(rows).onDoneEnumerating {
is.close()
// ... Anything else you want to execute when the Enumerator finishes.
}
}
The IO tools provided by Enumerator give you this kind of resource management out of the box—e.g. if you create an enumerator with fromStream, the stream is guaranteed to get closed after running (even if you only read a single line, etc.).
So for example you could write the following:
import play.api.libs.iteratee._
val splitByNl = Enumeratee.grouped(
Traversable.splitOnceAt[Array[Byte], Byte](_ != '\n'.toByte) &>>
Iteratee.consume()
) compose Enumeratee.map(new String(_, "UTF-8"))
def fileLines(path: String): Enumerator[String] =
Enumerator.fromStream(new java.io.FileInputStream(path)).through(splitByNl)
It's a shame that the library doesn't provide a linesFromStream out of the box, but I personally would still prefer to use fromStream with hand-rolled splitting, etc. over using an iterator and providing my own resource management.

Downloading Image file using scala

I am trying to download an image file for a LaTeX formula. This is the code I am using:
var out: OutputStream = null;
var in: InputStream = null;
try {
val url = new URL("http://latex.codecogs.com/png.download?$$I=\frac{dQ}{dt}$$")
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setRequestMethod("GET")
in = connection.getInputStream
val localfile = "sample2.png"
out = new BufferedOutputStream(new FileOutputStream(localfile))
val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
out.write(byteArray)
} catch {
case e: Exception => println(e.printStackTrace())
} finally {
out.close
in.close
}
I am able to download, but the image is incomplete: the expected size is around 517 bytes, but only 275 bytes are downloaded. What might be going wrong? Attached are the incomplete and complete images. Please help me. I have used the same code to download files larger than 1 MB and it worked properly.
You're passing a bad string, the "\f" is interpreted as an escape sequence and gives you a single "form feed" character.
Better:
val url = new URL("http://latex.codecogs.com/png.download?$$I=\\frac{dQ}{dt}$$")
or
val url = new URL("""http://latex.codecogs.com/png.download?$$I=\frac{dQ}{dt}$$""")
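A quick way to see the difference between the three spellings is to compare the resulting strings. This is only an illustrative sketch; the object EscapeDemo and its value names are made up.

```scala
object EscapeDemo {
  val escaped: String = "\frac"     // "\f" is one form-feed character, followed by "rac"
  val doubled: String = "\\frac"    // the backslash is escaped, so it survives
  val raw: String = """\frac"""     // triple quotes: "\f" is not treated as an escape

  def main(args: Array[String]): Unit = {
    println(escaped.length)           // 4 characters: form feed, r, a, c
    println(doubled.length)           // 5 characters: \, f, r, a, c
    println(doubled == raw)           // true
  }
}
```

So with the plain literal the server never receives the backslash at all, which would explain why it renders a different (smaller) image.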
An alternative option is to use scala.sys.process, which is much cleaner:
import sys.process._
import java.net.URL
import java.io.File
(new URL("""http://latex.codecogs.com/png.download?$$I=\frac{dQ}{dt}$$""") #> new File("sample2.png")).!!
An example using the standard Java API, releasing the resource with Using (Scala 3 syntax):
import java.nio.file.Files
import java.nio.file.Paths
import java.net.URL
import scala.util.Using
@main def main() =
  val url = URL("http://webcode.me/favicon.ico")
  Using(url.openStream) { in =>
    Files.copy(in, Paths.get("favicon.ico"))
  }