Downloading Image file using scala - scala

I am trying to download an image file for a LaTeX formula. This is the code I am using:
var out: OutputStream = null;
var in: InputStream = null;
try {
  val url = new URL("http://latex.codecogs.com/png.download?$$I=\frac{dQ}{dt}$$")
  val connection = url.openConnection().asInstanceOf[HttpURLConnection]
  connection.setRequestMethod("GET")
  in = connection.getInputStream
  val localfile = "sample2.png"
  out = new BufferedOutputStream(new FileOutputStream(localfile))
  val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
  out.write(byteArray)
} catch {
  case e: Exception => println(e.printStackTrace())
} finally {
  out.close
  in.close
}
The download runs, but the image is incomplete: the expected size is around 517 bytes and only 275 bytes are downloaded. What might be going wrong? I have attached the incomplete and complete images. Please help me. I have used the same code to download files larger than 1 MB and it worked properly.

You're passing a bad string: in a regular string literal, "\f" is interpreted as an escape sequence and gives you a single "form feed" character.
Better:
val url = new URL("http://latex.codecogs.com/png.download?$$I=\\frac{dQ}{dt}$$")
or
val url = new URL("""http://latex.codecogs.com/png.download?$$I=\frac{dQ}{dt}$$""")
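To see the difference between the three literals, here is a small sketch. The object name EscapeDemo is only for illustration, and the URL prefix is left out so that just the formula part is compared:

```scala
object EscapeDemo {
  // Regular literal: the compiler turns \f into one form feed character (U+000C).
  val escaped: String = "I=\frac{dQ}{dt}"
  // Doubling the backslash keeps a literal backslash followed by 'f'.
  val doubled: String = "I=\\frac{dQ}{dt}"
  // Triple-quoted literal: the backslash is kept as written.
  val raw: String = """I=\frac{dQ}{dt}"""

  def main(args: Array[String]): Unit = {
    println(escaped.length == raw.length - 1) // true: "\f" collapsed into one character
    println(doubled == raw)                   // true: both contain backslash + "frac"
    println(escaped.contains('\f'))           // true: the form feed ended up in the string
  }
}
```

So the server in the question most likely received a formula containing a form feed followed by "rac{dQ}{dt}", which would explain the smaller image.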

An alternative is to use scala.sys.process, which is much more concise:
import sys.process._
import java.net.URL
import java.io.File
(new URL("""http://latex.codecogs.com/png.download?$$I=\frac{dQ}{dt}$$""") #> new File("sample2.png")).!!

An example using the standard Java API, with resources released by scala.util.Using (Scala 3 syntax):
import java.nio.file.Files
import java.nio.file.Paths
import java.net.URL
import scala.util.Using
@main def main() =
  val url = URL("http://webcode.me/favicon.ico")
  Using(url.openStream) { in =>
    Files.copy(in, Paths.get("favicon.ico"))
  }

Related

spark jupyter notebook does not show scala console output

1) I am learning streaming and have run into a problem: nothing that I print with println (via sendEvent) shows up on the console (Scala). I then inserted println("xyz") lines and found that they are printed only when they are not inside the 'while' block; otherwise nothing is printed, even from lines placed before the while loop. I added a few more println("xyz") lines and found that some of them are suppressed and only the last one is printed.
Previously I also ran into this twice with two different pieces of Storm streaming code: nothing was printed in the Jupyter notebook, but everything worked fine in the Scala shell.
2) I also wonder about calls to awaitTermination(), such as:
messages.writeStream.outputMode("append").format("console").option("truncate", false).start().awaitTermination() (I also get no output from console)
or "infinite loops" like the one in the code below:
var finished = false
while (!finished) {................. ..}
Are they waiting for a hard break such as a halt or Ctrl+C? How do I break out of them properly so that the next line gets executed? I am confused because the authors of the samples and tutorials explain nothing about this.
import java.util._
import scala.collection.JavaConverters._
import java.util.concurrent._
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.eventhubs.ConnectionStringBuilder
// Event hub configurations
// Replace values below with yours
val eventHubName = "<Event hub name>"
val eventHubNSConnStr = "<Event hub namespace connection string>"
val connStr = ConnectionStringBuilder (eventHubNSConnStr)
.setEventHubName(eventHubName).build
import com.microsoft.azure.eventhubs._
val pool = Executors.newFixedThreadPool(1)
val eventHubClient = EventHubClient.create(connStr.toString(), pool)
def sendEvent(message: String) = {
val messageData = EventData.create(message.getBytes("UTF-8"))
eventHubClient.get().send(messageData)
println("Sent event: " + message + "\n")
}
import twitter4j._
import twitter4j.TwitterFactory
import twitter4j.Twitter
import twitter4j.conf.ConfigurationBuilder
// Twitter application configurations
// Replace values below with yours
val twitterConsumerKey = "<CONSUMER KEY>"
val twitterConsumerSecret = "<CONSUMER SECRET>"
val twitterOauthAccessToken = "<ACCESS TOKEN>"
val twitterOauthTokenSecret = "<TOKEN SECRET>"
val cb = new ConfigurationBuilder()
cb.setDebugEnabled(true)
  .setOAuthConsumerKey(twitterConsumerKey)
  .setOAuthConsumerSecret(twitterConsumerSecret)
  .setOAuthAccessToken(twitterOauthAccessToken)
  .setOAuthAccessTokenSecret(twitterOauthTokenSecret)
val twitterFactory = new TwitterFactory(cb.build())
val twitter = twitterFactory.getInstance()
//Getting tweets with keyword "Azure" and sending them to Event Hub realtime
val query = new Query(" #Azure ")
query.setCount(100)
query.lang("en")
var finished = false
while (!finished) {
  val result = twitter.search(query)
  val statuses = result.getTweets()
  var lowestStatusId = Long.MaxValue
  for (status <- statuses.asScala) {
    if (!status.isRetweet()) {
      sendEvent(status.getText())
    }
    lowestStatusId = Math.min(status.getId(), lowestStatusId)
    Thread.sleep(2000)
  }
  query.setMaxId(lowestStatusId - 1)
}
// Closing connection to the Event Hub
eventHubClient.get().close()
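On the second point: a loop like while (!finished) only ends when something sets finished to true, and nothing in the code above ever does, so the cell never returns. Likewise, awaitTermination() blocks until the streaming query stops; the overload that takes a timeout in milliseconds returns after that time, and stop() on the query ends it. One possible way to keep such a loop stoppable is to run it on a background thread and let it poll a shared flag. This is only a sketch in plain Scala: StoppableLoop and its parameters are made-up names, and the step function stands in for the Twitter search and sendEvent calls.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

object StoppableLoop {
  // Hypothetical helper (not part of Spark or twitter4j): runs `step` on a
  // daemon thread until `finished` is set, pausing between iterations.
  def start(finished: AtomicBoolean, pauseMillis: Long)(step: () => Unit): Thread = {
    val worker = new Thread(() => {
      while (!finished.get()) {
        step()
        Thread.sleep(pauseMillis)
      }
    })
    worker.setDaemon(true)
    worker.start()
    worker
  }

  def main(args: Array[String]): Unit = {
    val finished = new AtomicBoolean(false)
    val sent = new AtomicInteger(0)
    // Stand-in for "search Twitter and call sendEvent".
    val worker = start(finished, pauseMillis = 10)(() => { sent.incrementAndGet(); () })
    Thread.sleep(100)  // the caller stays free to do other work meanwhile
    finished.set(true) // ask the loop to stop after its current iteration
    worker.join(1000)
    println(s"iterations: ${sent.get()}, still running: ${worker.isAlive}")
  }
}
```

In a notebook, the call to start would go in one cell and finished.set(true) in a later one, so that the code after the loop (closing the Event Hub client, for example) can actually run.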

Read from GZIPInputStream to String without using Source

I am using Scala. I need to read a large gzip file, remove its first line, and turn the rest into a string.
This is how I read the file:
val fis = new FileInputStream(filename)
val gz = new GZIPInputStream(fis)
Then I tried Source.fromInputStream(gz).getLines.drop(1).mkString(""), but it causes an out-of-memory error.
So I am thinking of reading line by line, perhaps into a byte array, and converting that into a single String at the end.
But I have no idea how to do this. Any suggestions? Any better method is also welcome.
If your gzipped file is huge, you can use a BufferedReader. Here is an example. It copies all characters from the gzipped file to an uncompressed file, skipping the first line.
import java.util.zip.GZIPInputStream
import java.io._
import java.nio.charset.StandardCharsets
import scala.annotation.tailrec
import scala.util.Try
val bufferSize = 4096
val pathToGzFile = "/tmp/text.txt.gz"
val pathToOutputFile = "/tmp/text_without_first_line.txt"
val charset = StandardCharsets.UTF_8
val inStream = new FileInputStream(pathToGzFile)
val outStream = new FileOutputStream(pathToOutputFile)
try {
  val inGzipStream = new GZIPInputStream(inStream)
  val inReader = new InputStreamReader(inGzipStream, charset)
  val outWriter = new OutputStreamWriter(outStream, charset)
  val bufferedReader = new BufferedReader(inReader)
  val closeables = Array[Closeable](inGzipStream, inReader,
    outWriter, bufferedReader)
  // Read the first line here so that copy never sees it, i.e. it is skipped
  val firstLine = bufferedReader.readLine()
  println(s"First line: $firstLine")
  @tailrec
  def copy(in: Reader, out: Writer, buffer: Array[Char]): Unit = {
    // Copy until the end of the file is reached
    val readChars = in.read(buffer, 0, buffer.length)
    if (readChars > 0) {
      out.write(buffer, 0, readChars)
      copy(in, out, buffer)
    }
  }
  // Copy chars from bufferedReader to outWriter using the buffer
  copy(bufferedReader, outWriter, Array.ofDim[Char](bufferSize))
  // Close all closeables
  closeables.foreach(c => Try(c.close()))
}
finally {
  Try(inStream.close())
  Try(outStream.close())
}
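The same idea can be written more compactly with scala.util.Using (Scala 2.13+), which closes the reader and writer even if the copy fails. This is a sketch rather than a drop-in replacement: GzipSkipFirstLine and copyWithoutFirstLine are made-up names, UTF-8 is assumed, and because it works line by line it rewrites line endings to the platform default.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.zip.GZIPInputStream
import scala.util.Using

object GzipSkipFirstLine {
  // Streams the decompressed lines of `source` into `target`, dropping the
  // first line. Only one line is held in memory at a time.
  // Returns the number of lines written.
  def copyWithoutFirstLine(source: Path, target: Path): Long =
    Using.resource(new BufferedReader(new InputStreamReader(
      new GZIPInputStream(Files.newInputStream(source)), StandardCharsets.UTF_8))) { reader =>
      Using.resource(Files.newBufferedWriter(target, StandardCharsets.UTF_8)) { writer =>
        reader.readLine() // discard the first line
        Iterator.continually(reader.readLine()).takeWhile(_ != null).foldLeft(0L) { (n, line) =>
          writer.write(line)
          writer.newLine()
          n + 1
        }
      }
    }
}
```

Writing to a file (or handing each line to whatever consumes the data) keeps memory use flat. Collecting everything into one String will still need as much heap as the whole decompressed content.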

Reading Basic File Attributes in Scala?

I'm trying to get basic file attributes using Scala, and my reference is this Java question:
Determine file creation date in Java
and this piece of code I'm trying to rewrite in Scala:
static void getAttributes(String pathStr) throws IOException {
Path p = Paths.get(pathStr);
BasicFileAttributes view
= Files.getFileAttributeView(p, BasicFileAttributeView.class)
.readAttributes();
System.out.println(view.creationTime()+" is the same as "+view.lastModifiedTime());
}
The thing I just can't figure out is this line of code. I don't understand how to pass a class in this way using Scala, or why Java insists on this in the first place instead of taking an actual constructed object as the parameter. Can someone please help me write this line so that it works? I must be using the wrong syntax:
val attr = Files.readAttributes(f,Class[BasicFileAttributeView])
Try this:
def attrs(pathStr: String) =
  Files.getFileAttributeView(
    Paths.get(pathStr),
    classOf[BasicFileAttributeView] // getFileAttributeView needs a view class, not BasicFileAttributes
  ).readAttributes
Get file creation date in Scala, from Basic Files Attributes:
// option 1,
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes
val pathStr = "/tmp/test.sql"
Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime
// res3: java.nio.file.attribute.FileTime = 2018-03-06T00:25:52Z
// option 2,
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributeView
val pathStr = "/tmp/test.sql"
{
Files
.getFileAttributeView(Paths.get(pathStr), classOf[BasicFileAttributeView])
.readAttributes.creationTime
}
// res20: java.nio.file.attribute.FileTime = 2018-03-07T19:00:19Z
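For completeness, classOf[T] is the Scala equivalent of Java's T.class. The method takes a class token because generic types are erased at runtime, so the token tells it which attribute type to return. Here is a self-contained sketch that can be run without an existing file; FileAttrs and basicAttributes are names made up for this example, and it uses a temporary file:

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.BasicFileAttributes

object FileAttrs {
  // classOf[BasicFileAttributes] plays the role of BasicFileAttributes.class in Java.
  def basicAttributes(path: Path): BasicFileAttributes =
    Files.readAttributes(path, classOf[BasicFileAttributes])

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("attrs-demo", ".txt") // throwaway file for the demo
    val attrs = basicAttributes(tmp)
    println(s"created:  ${attrs.creationTime}")
    println(s"modified: ${attrs.lastModifiedTime}")
    println(s"size:     ${attrs.size} bytes")
    Files.delete(tmp)
  }
}
```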

How to convert Source[ByteString, Any] to InputStream

akka-http represents a file uploaded using multipart/form-data encoding as Source[ByteString, Any]. I need to unmarshal it using Java library that expects an InputStream.
How Source[ByteString, Any] can be turned into an InputStream?
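As of version 2.x you can achieve this with the following code:
import akka.stream.scaladsl.StreamConverters
import java.io.InputStream
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.FiniteDuration
// an implicit Materializer must be in scope for runWith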
...
val inputStream: InputStream = entity.dataBytes
.runWith(
StreamConverters.asInputStream(FiniteDuration(3, TimeUnit.SECONDS))
)
See: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.1/scala/migration-guide-1.0-2.x-scala.html
Note: this was broken in version 2.0.2 and fixed in 2.4.2.
You could try using an OutputStreamSink that writes to a PipedOutputStream, and feed that into a PipedInputStream that your other code uses as its input stream. It's a rough idea, but it could work. The code would look like this:
import akka.util.ByteString
import akka.stream.scaladsl.Source
import java.io.PipedInputStream
import java.io.PipedOutputStream
import akka.stream.io.OutputStreamSink
import java.io.BufferedReader
import java.io.InputStreamReader
import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
object PipedStream extends App{
implicit val system = ActorSystem("flowtest")
implicit val mater = ActorFlowMaterializer()
val lines = for(i <- 1 to 100) yield ByteString(s"This is line $i\n")
val source = Source(lines)
val pipedIn = new PipedInputStream()
val pipedOut = new PipedOutputStream(pipedIn)
val flow = source.to(OutputStreamSink(() => pipedOut))
flow.run()
val reader = new BufferedReader(new InputStreamReader(pipedIn))
var line:String = reader.readLine
while(line != null){
println(s"Reader received line: $line")
line = reader.readLine
}
}
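The piped-stream part of this idea does not depend on Akka, so it can be tried in isolation. In the sketch below a plain thread plays the role of the stream sink; PipedDemo and readThroughPipe are made-up names. The writer and the reader must run on different threads, otherwise the pipe can deadlock once its internal buffer fills up.

```scala
import java.io.{BufferedReader, InputStreamReader, PipedInputStream, PipedOutputStream}
import java.nio.charset.StandardCharsets

object PipedDemo {
  // Writes `chunks` into a pipe from a producer thread and reads them back
  // as lines on the calling thread through an ordinary InputStream.
  def readThroughPipe(chunks: Seq[Array[Byte]]): List[String] = {
    val pipedIn = new PipedInputStream()
    val pipedOut = new PipedOutputStream(pipedIn)
    val producer = new Thread(() => {
      try chunks.foreach(chunk => pipedOut.write(chunk))
      finally pipedOut.close() // closing signals end of stream to the reader
    })
    producer.start()
    val reader = new BufferedReader(new InputStreamReader(pipedIn, StandardCharsets.UTF_8))
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    finally {
      reader.close()
      producer.join()
    }
  }
}
```

For example, PipedDemo.readThroughPipe(Seq("a\n".getBytes("UTF-8"), "b\n".getBytes("UTF-8"))) returns List("a", "b").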
You could extract an iterator from each ByteString and then get an InputStream from it. Something like this (pseudocode):
source.map { data: ByteString =>
data.iterator.asInputStream
}
Update
A more elaborate sample, starting from a Multipart.FormData:
def isSourceFromFormData(formData: Multipart.FormData): Source[InputStream, Any] =
formData.parts.map { part =>
part.entity.dataBytes
.map(_.iterator.asInputStream)
}.flatten(FlattenStrategy.concat)

How to download and save a file from the internet using Scala?

Basically I have a url/link to a text file online and I am trying to download it locally. For some reason, the text file that gets created/downloaded is blank. Open to any suggestions. Thanks!
def downloadFile(token: String, fileToDownload: String) {
val url = new URL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setRequestMethod("GET")
val in: InputStream = connection.getInputStream
val fileToDownloadAs = new java.io.File("src/test/resources/testingUpload1.txt")
val out: OutputStream = new BufferedOutputStream(new FileOutputStream(fileToDownloadAs))
val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
out.write(byteArray)
}
I know this is an old question, but I just came across a really nice way of doing this:
import sys.process._
import java.net.URL
import java.io.File
def fileDownloader(url: String, filename: String) = {
  (new URL(url) #> new File(filename)).!!
}
Hope this helps.
You can now simply call the fileDownloader function to download files:
fileDownloader("http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words", "stop-words-en.txt")
Here is a naive implementation using scala.io.Source.fromURL and java.io.FileWriter:
def downloadFile(token: String, fileToDownload: String): Unit = {
  try {
    val src = scala.io.Source.fromURL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
    val out = new java.io.FileWriter("src/test/resources/testingUpload1.txt")
    // Close both ends even if the write fails; closing the writer also flushes it
    try out.write(src.mkString)
    finally { out.close(); src.close() }
  } catch {
    case e: java.io.IOException => println("error occurred: " + e.getMessage)
  }
}
Your code works for me, so something else may be causing the empty file.
Here is a safer alternative to new URL(url) #> new File(filename) !!:
val url = new URL(urlOfFileToDownload)
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setConnectTimeout(5000)
connection.setReadTimeout(5000)
connection.connect()
if (connection.getResponseCode >= 400)
  println("error")
else
  (url #> new File(fileName)).!!
Two things:
1) When downloading from a URL object, an error response (404 for instance) makes the URL object throw a FileNotFoundException. Since this exception is raised on another thread (the transfer happens to run on a separate thread), a simple Try or try/catch around the call won't catch it. Hence the preliminary check of the response code: if (connection.getResponseCode >= 400).
2) As a consequence of checking the response code, the connection can sometimes stay open indefinitely for improper pages. This can be avoided by setting a timeout on the connection: connection.setReadTimeout(5000).
Flush the buffer and then close your output stream.
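A minimal sketch of that advice, assuming Scala 2.13+ for scala.util.Using; SaveStream and save are names made up for this example. A BufferedOutputStream keeps up to 8192 bytes in memory by default, so a small file may never reach the disk if the stream is neither flushed nor closed, which would match the blank file in the question. Closing the stream flushes it, and Using.resource closes it even when the copy throws.

```scala
import java.io.{BufferedOutputStream, InputStream}
import java.nio.file.{Files, Path}
import scala.util.Using

object SaveStream {
  // Copies everything from `in` to `target` and returns the number of bytes written.
  // The output stream is closed (and therefore flushed) when the block exits.
  // The caller remains responsible for closing `in`.
  def save(in: InputStream, target: Path): Long =
    Using.resource(new BufferedOutputStream(Files.newOutputStream(target))) { out =>
      val buffer = new Array[Byte](8192)
      var total = 0L
      var n = in.read(buffer)
      while (n != -1) {
        out.write(buffer, 0, n)
        total += n
        n = in.read(buffer)
      }
      total
    }
}
```

With a URL it could be used as Using.resource(new URL(link).openStream())(in => SaveStream.save(in, Paths.get("out.txt"))), where link and out.txt are placeholders.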