Read file in Scala: Stream closed

I am trying to read a file in Scala like this:
def parseFile(filename: String) = {
  val source = scala.io.Source.fromFile(filename)
  try {
    val lines = source.getLines().map(line => line.trim.toDouble)
    return lines
  } catch {
    // re-throw exception, but make sure source is closed
    case t: Throwable => {
      println("error during parsing of file")
      throw t
    }
  } finally {
    source.close()
  }
}
When I access the result later, I get a
java.io.IOException: Stream Closed
I understand that this arises because source.getLines() only returns a (lazy) Iterator[String], and I already close the BufferedSource in the finally clause.
How can I avoid this error, i.e. how can I evaluate the stream before closing the source?
EDIT: I tried to call source.getLines().toSeq which did not help.

You could try the following solution, which makes the code more functional and takes advantage of lazy evaluation.
First, define a helper function, using, which takes care of opening and closing the file.
import scala.language.reflectiveCalls

def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
  try f(param) finally param.close()
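Then you can refactor your code in a functional programming style:
import scala.io.Source
import scala.util.Try

using(Source.fromFile(filename)) { source =>
  // getLines() is lazy: processOrDoWhatYouWantForLines must consume the
  // iterator here, before using closes the source
  val lines = Try(source.getLines().map(line => line.trim.toDouble))
  val result = lines.flatMap(l => Try(processOrDoWhatYouWantForLines(l)))
  result.get
}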
The using function can be used to handle any resource that needs to be closed at the end of the operation.
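A runnable sketch of the same idea follows. It assumes the file holds one number per line; the object UsingDemo and the method sumFile are hypothetical names. The helper here is bounded by java.lang.AutoCloseable (which scala.io.Source implements) instead of a structural type, so no reflective call is needed. The important part is that sum consumes the whole iterator inside the using block, before close() runs.

```scala
import scala.io.Source
import scala.util.Try

object UsingDemo {
  // Loan pattern: pass the resource to f, then close it however f exits.
  // Bounded by AutoCloseable instead of a structural type (no reflection).
  def using[A <: AutoCloseable, B](param: A)(f: A => B): B =
    try f(param) finally param.close()

  // Hypothetical example: one number per line, summed. `sum` consumes the
  // lazy iterator while the source is still open; Try captures parse errors.
  def sumFile(filename: String): Try[Double] =
    Try {
      using(Source.fromFile(filename)) { source =>
        source.getLines().map(_.trim.toDouble).sum
      }
    }
}
```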

List is not lazy, so change:
val lines = source.getLines().map(line => line.trim.toDouble)
to
val lines = source.getLines().toList.map(line => line.trim.toDouble)
to force evaluation of every line before the source is closed.
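On Scala 2.13 or later, the standard library's scala.util.Using can replace the hand-written try/finally. A minimal sketch combining it with toList, under that version assumption (the object name ParseDemo is illustrative):

```scala
import scala.io.Source
import scala.util.Using

object ParseDemo {
  // Scala 2.13+: Using.resource closes the source when the block exits,
  // normally or by exception. toList reads and parses every line first,
  // so the returned List no longer depends on the (closed) stream.
  def parseFile(filename: String): List[Double] =
    Using.resource(Source.fromFile(filename)) { source =>
      source.getLines().map(_.trim.toDouble).toList
    }
}
```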

Related

Unable to print values of a Scala Future by using onComplete & andThen

I am trying to read incremental data from my data source using Scala and Spark. Before hitting the source tables, I calculate the min and max of the partition column in a Future, inside a class GetSourceMeta, as shown below.
def getBounds(keyIdMap: scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
  var boundsMap = scala.collection.mutable.Map[String, String]()
  keyIdMap.keys.foreach(table => if (!keyIdMap(table).contains("Invalid")) {
    val minMax = s"select max(insert_tms) maxTms, min(insert_tms) minTms from schema.${table} where source='DB2' and key_id in (${keyIdMap(table)})"
    println("MinMax: " + minMax)
    val boundsDF = spark.read.format("jdbc")
      .option("url", con.getConUrl())
      .option("dbtable", s"(${minMax}) as ctids")
      .option("user", con.getUserName())
      .option("password", con.getPwd())
      .load()
    try {
      val maxTms = boundsDF.select("minTms").head.getTimestamp(0).toString + "," + boundsDF.select("maxTms").head.getTimestamp(0).toString
      println("Bounds: " + maxTms)
      boundsMap += (table -> maxTms)
    } catch {
      case np: java.lang.NullPointerException => { println("No data found") }
      case e: Exception => { println(s"Unknown exception: $e") }
    }
  })
  boundsMap.foreach(println)
  boundsMap
}
I am calling the above method in my main method as:
object LoadToCopyDB {
  val conf = new SparkConf().setAppName("TEST_YEAR").set("some parameters")
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().config(conf).master("yarn").enableHiveSupport()
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .getOrCreate()
    val gsm = new GetSourceMeta()
    val minMaxKeyMap = gsm.getBounds(keyIdMap).onComplete {
      case Success(values) => values.foreach(println)
      case Failure(f) => f.printStackTrace
    }
    ...
  }
}
The onComplete didn't print any values, so I used andThen as below, but that didn't help either.
val bounds: Future[scala.collection.mutable.Map[String, String]] = gpMetaData.getBounds(incrementalIds) andThen {
  case Success(outval) => outval.foreach(println)
  case Failure(e) => println(e)
}
At first, the main thread exited before the Future getBounds could complete, so none of the println output from the Future appeared on the terminal. I found out that I need to make the main thread wait, using Await, for the Future to complete. But when I use Await in main along with onComplete:
Await.result(bounds, Duration.Inf)
The compiler gives an error:
Type mismatch, expected: Awaitable[NotInferedT], actual:Unit
If I declare the val minMaxKeyMap as Future[scala.collection.mutable.Map[String, String]], the compiler says: Expression of type Unit doesn't conform to expected type Future[mutable.map[String,String]]
I tried to print the values of bounds after the Await statement but that just prints an empty Map.
I can't figure out how to fix this. Could anyone tell me what I need to do to make the Future run properly?
In cases like this, it is always better to follow the types. The onComplete method returns Unit, not a Future, so its result can't be passed to Await.
If you want to get a Future back, you have to map or flatMap the value and return something, for example an Option. In this case it does not matter what you return: you only want Await to wait for the result and print a trace. You can handle a possible exception in recover. In your code it would look like this:
val minMaxKeyMap: Future[Option[Any]] = gsm.getBounds(keyIdMap).map { values =>
  values.foreach(println)
  None
}.recover {
  case e: Throwable =>
    e.printStackTrace()
    None
}
Note that the recover block has to return a value of a type compatible with the Future (here, an Option).
After that, you can Await the Future and the results will be printed. It is not the prettiest solution, but it will work in your case.
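A self-contained sketch of this approach, with a hypothetical getBounds that returns a fixed map instead of querying Spark. Unlike the answer, it passes the map itself through map and recover instead of returning None, so Await.result hands the values back to the caller:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureDemo {
  // Hypothetical stand-in for GetSourceMeta.getBounds: maps each table to "min,max".
  def getBounds(tables: Seq[String]): Future[Map[String, String]] = Future {
    tables.map(t => t -> s"$t-min,$t-max").toMap
  }

  def run(): Map[String, String] = {
    // onComplete returns Unit, so chain map/recover instead: each returns a
    // new Future that Await can block on.
    val printed: Future[Map[String, String]] =
      getBounds(Seq("a", "b"))
        .map { values => values.foreach(println); values }
        .recover { case e: Exception =>
          e.printStackTrace()
          Map.empty[String, String]
        }

    // Block until the Future completes (fine in main or a demo, not in library code).
    Await.result(printed, 10.seconds)
  }
}
```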

Is it possible to do finally with Try? [duplicate]

This question already has answers here: Simple Scala pattern for "using/try-with-resources" (Automatic Resource Management)
Closed 5 years ago.
I'm new to Scala and looked at the source code of Try.apply:
def apply[T](r: => T): Try[T] =
  try Success(r) catch {
    case NonFatal(e) => Failure(e)
  }
It simply catches non-fatal exceptions. But what if I need a finally clause? Is it possible to emulate it with Try in a functional way? I mean something like
try {
  // acquire lock
  // do something
} finally {
  // release lock
}
with
Try {
  // acquire lock
  // do something
}
// Now how to release?
Something similar was already answered here.
TL;DR:
There is no standard way to do that with the Try monad.
The usual workaround is something like this:
import scala.language.reflectiveCalls

def use[A <: { def close(): Unit }, B](resource: A)(code: A => B): B =
  try {
    code(resource)
  } finally {
    resource.close()
  }
You can use it like this:
import java.nio.file.{Files, Paths}

val path = Paths.get("/etc/myfile")
use(Files.newInputStream(path)) { inputStream =>
  val firstByte = inputStream.read()
  ...
}
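A small sketch of what use buys you: the resource is closed even when the body throws, and the exception still propagates, so wrapping the call in Try gives both a Try result and finally-style cleanup. The Tracked class is hypothetical, and this version of the helper is bounded by AutoCloseable instead of a structural type:

```scala
import scala.util.Try

object UseDemo {
  // The answer's `use`, bounded by AutoCloseable instead of a structural type.
  def use[A <: AutoCloseable, B](resource: A)(code: A => B): B =
    try code(resource) finally resource.close()

  // Hypothetical resource that records whether close() was called.
  final class Tracked extends AutoCloseable {
    var closed = false
    override def close(): Unit = closed = true
  }

  // The body throws; `use` still closes the resource, and the outer Try
  // turns the propagated exception into a Failure.
  def demo(): (Try[Int], Boolean) = {
    val res = new Tracked
    val result: Try[Int] = Try(use(res) { _ => throw new IllegalStateException("boom") })
    (result, res.closed)
  }
}
```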
Another approach, explained here, is to "extend" the standard Try by adding a method, withRelease:
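import scala.language.reflectiveCalls
import scala.util.{Failure, Success, Try}

implicit class TryOps[A <: { def close(): Unit }](res: Try[A]) {
  def withRelease(): Try[A] = res match {
    case Success(s) => s.close(); res // close the resource the block returned
    case Failure(_) => res // no resource was returned; a stream opened before the failure is not closed
  }
}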
Then:
Try {
  val inputStream = Files.newInputStream(path)
  ...
  inputStream
}.withRelease()
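On Scala 2.13 or later, scala.util.Using provides this combination directly: Using(resource)(f) returns a Try and closes the resource whether f succeeds or throws, which also covers a block that fails after the resource was opened. A minimal sketch under that version assumption (firstByte is an illustrative name):

```scala
import java.io.InputStream
import scala.util.{Try, Using}

object UsingTryDemo {
  // Scala 2.13+: Using(resource)(f) evaluates the resource, runs f, closes the
  // resource whether f succeeded or threw, and returns the outcome as a Try.
  // A failure while opening the resource also ends up as a Failure.
  def firstByte(open: => InputStream): Try[Int] =
    Using(open)(in => in.read())
}
```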
Since Try resolves to a value and it doesn't unwind the stack when something fails, you can simply perform the cleanup operations after Try has executed. For example:
val someLock: Lock = ??? // acquire some lock
val result = Try {
  // do something and return a result
}
someLock.release()
If you prefer, you can roll your own helper to keep everything in a single expression:
def withLock[A](f: Lock => A): Try[A] = {
  val lock: Lock = ??? // acquire the lock
  val res = Try(f(lock))
  lock.release()
  res
}
and then you can write:
val res = withLock { lock =>
// some operation
}
This is usually referred to as the loan pattern.
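A sketch of the loan pattern with a real lock, assuming java.util.concurrent.locks.Lock (which uses lock()/unlock() rather than the release() of the hypothetical Lock above). Wrapping Try(f) in try/finally guarantees the unlock even for fatal errors that Try does not catch:

```scala
import java.util.concurrent.locks.Lock
import scala.util.Try

object LockDemo {
  // Loan pattern for a lock: acquire, run f, always release.
  // Try turns non-fatal exceptions from f into a Failure; the finally
  // also releases the lock if a fatal error escapes Try.
  def withLock[A](lock: Lock)(f: => A): Try[A] = {
    lock.lock()
    try Try(f) finally lock.unlock()
  }
}
```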

Most idiomatic way to write try/catch/finally in Scala?

What is the best way to write the following in Scala? It doesn't look quite right to me: first the forward declaration of the two vals, then the long PrintWriter creation line, then the finally block. The only thing that's idiomatic is the catch block...
val outputStream = Try(fs.create(tmpFile))
val writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(outputStream.get)))
if (outputStream.isFailure) {
  logger.error(s"Couldn't open file: $tmpFile")
}
try {
  features.foreach {
    case (sectionName, modelRDD) =>
      writer.append("{" + sectionName + ", " + modelRDD.getNumPartitions + "}")
  }
} catch {
  case e: Exception =>
    logger.error(s"Got exception", e)
    throw e
} finally {
  outputStream.get.close()
  writer.close()
}
We can further use the context of the initial Try to execute the complete I/O operation:
First, we define a function that encapsulates our process:
import java.io.{OutputStream, PrintWriter}
import scala.util.Try

def safeFilePrint(tf: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
  val os = Try(tf)
  val write = {
    val writer = os.map(f => new PrintWriter(f))
    val writeOp = writer.map(op)
    val flushOp = writer.map(_.flush())
    writeOp.flatMap(_ => flushOp)
  }
  val close = os.map(_.close())
  write.flatMap(_ => close)
}
And usage:
val collection = Seq(...)
val writeResult = safeFilePrint(new FileOutputStream(new File("/tmp/foo.txt"))) { w =>
  collection.foreach(elem => w.write(elem.toString))
}
Note that, in contrast with the original code, we get a result for the write operation: writeResult will be either Success(()) if everything went well or Failure(exception) if something went wrong. Based on that, our application can decide what to do next.
One may wonder: "Where is the finally?" In Java, finally is used to ensure that some code (typically resource management) is executed even when an exception thrown in the try scope diverts execution to an exception-handling path.
In Scala, using constructs like Try, Either, or our own ADT, we lift error handling to the application level. finally becomes unnecessary because the program can treat a failure as just another valid state. Note also that close is a strict val, evaluated whether or not write succeeded, so the stream is closed even when the write fails.
I finally settled on this code after reading @maasg's answer, which highlights the monadic flow and is more "symmetric". It looks much, much better than the code in the OP!
def safePrintToStream(gen: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
  val os = Try(gen)
  val writer = os.map(stream => new PrintWriter(stream))
  val write = writer.map(op(_))
  val flush = writer.map(_.flush())
  val close = os.map(_.close())
  write.flatMap(_ => flush).flatMap(_ => close)
}
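A self-contained check of safePrintToStream's behavior, using a hypothetical TrackingStream that records whether close() was called. Because close is a strict val, the stream is closed both when op succeeds and when it throws:

```scala
import java.io.{ByteArrayOutputStream, OutputStream, PrintWriter}
import scala.util.Try

object SafePrintDemo {
  def safePrintToStream(gen: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
    val os = Try(gen)
    val writer = os.map(stream => new PrintWriter(stream))
    val write = writer.map(op)
    val flush = writer.map(_.flush())
    // `close` is a strict val, so it runs even when `write` failed.
    val close = os.map(_.close())
    write.flatMap(_ => flush).flatMap(_ => close)
  }

  // Hypothetical stream that records whether it was closed.
  final class TrackingStream extends ByteArrayOutputStream {
    var closed = false
    override def close(): Unit = { closed = true; super.close() }
  }
}
```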

How do I write this without using a Try/Catch block?

I am looking to rewrite this Scala function. I am new to the language, but I understand there is an alternative to using try/catch blocks. How would you rewrite this function?
def updateStationPost = Action { implicit request =>
  StationForm.bindFromRequest.fold(
    errors => { //needs to be revised!!
      BadRequest(html.updateStation(errors,
        Station(
          request.body.asFormUrlEncoded.get("id")(0).toLong,
          request.body.asFormUrlEncoded.get("operator")(0).toLong,
          request.body.asFormUrlEncoded.get("name")(0),
          try {
            request.body.asFormUrlEncoded.get("number")(0).toInt
          } catch {
            case e: Exception => { 0 } //this exception happens when trying to convert the number when there is nothing in the flash scope to convert.
          },
          request.body.asFormUrlEncoded.get("timezone")(0)
        ),
        Operators.retrieveJustOperators() //ugh... needs to be revised..
      ))
    },
    { case (stationFormObj) =>
      Stations.update(stationFormObj)
      Redirect(routes.StationsController.index)
    }
  )
}
A general way of managing this is to use Try to wrap code that could throw an exception. Some of the ways of using this are illustrated below:
import scala.io.StdIn
import scala.util.Try

def unpredictable(): Int = {
  Try(StdIn.readLine("Int please: ").toInt).getOrElse(0)
}
If the input is not a parseable integer, toInt throws a NumberFormatException. This code just returns 0 in that case, but you could put other statements there. As an alternative, you could use pattern matching to handle the situation.
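import scala.io.StdIn
import scala.util.{Failure, Success, Try}

def unpredictable(): Int = {
  Try(StdIn.readLine("Int please: ").toInt) match {
    case Success(i) => i
    case Failure(e) =>
      println(e.getMessage)
      0 // the Failure branch must also produce an Int
  }
}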
You can also just return a Try and let the caller decide how to handle the failure.
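A minimal sketch of returning the Try and deciding at the call site; ParseIntDemo, parseInt, and parseIntOr are illustrative names:

```scala
import scala.util.{Failure, Success, Try}

object ParseIntDemo {
  // Return the Try itself and let the caller choose how to handle failure.
  def parseInt(s: String): Try[Int] = Try(s.trim.toInt)

  // One caller-side choice: fall back to a default, as in the handler above.
  def parseIntOr(s: String, default: Int): Int =
    parseInt(s) match {
      case Success(i) => i
      case Failure(_) => default
    }
}
```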
How about:
import scala.util.control.Exception.handling
// Create a val like this, as you reuse it over and over
val form: Option[Map[String, Seq[String]]] = request.body.asFormUrlEncoded
// Create some helper functions like this (one handler per result type,
// so the fallback value has the same type as the parsed value)
val intHandler = handling[Int](classOf[NumberFormatException]) by (_ => 0)
val longHandler = handling[Long](classOf[NumberFormatException]) by (_ => 0L)
val intNFEHandler = (str: String) => intHandler apply str.toInt
val longNFEHandler = (str: String) => longHandler apply str.toLong
// You can use this instead of your try/catch. It is just syntactic sugar, but perhaps cleaner
intNFEHandler apply form.get("id")(0)
Here, if form were something like Option(Map("id" -> Seq.empty[String])), form.get("id")(0) would blow up with java.lang.IndexOutOfBoundsException.
I would suggest another helper:
// takes fieldNames and returns Option(fieldValue)
val fieldValueOpt = (fieldName: String) => form.flatMap(_.get(fieldName).flatMap(_.headOption))
Then write a validate method that pattern-matches on all the field-value Options, extracts the values, and creates your Station object.
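A sketch of such a validate method, written as a for-comprehension over Options instead of explicit pattern matching. Station here is a hypothetical case class mirroring the question's constructor arguments, and toLongOption/toIntOption assume Scala 2.13 or later. A missing id, operator, name, or timezone yields None; a missing or malformed number falls back to 0, as in the question:

```scala
object ValidateDemo {
  // Hypothetical stand-in for the question's Station model.
  final case class Station(id: Long, operator: Long, name: String, number: Int, timezone: String)

  type Form = Option[Map[String, Seq[String]]]

  // Same idea as fieldValueOpt above: the first value of a field, if present.
  def fieldValueOpt(form: Form)(fieldName: String): Option[String] =
    form.flatMap(_.get(fieldName).flatMap(_.headOption))

  // Every lookup and conversion yields an Option, so a missing or malformed
  // field gives None instead of an exception.
  def validate(form: Form): Option[Station] = {
    val field: String => Option[String] = fieldValueOpt(form)
    for {
      id       <- field("id").flatMap(_.toLongOption)
      operator <- field("operator").flatMap(_.toLongOption)
      name     <- field("name")
      timezone <- field("timezone")
    } yield Station(id, operator, name, field("number").flatMap(_.toIntOption).getOrElse(0), timezone)
  }
}
```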

Do Scala files need to be released before deleting?

In the code below, if I uncomment the for loop, the file no longer gets deleted:
val file = "myfile.csv"
//for (line <- Source.fromFile(file).getLines()) { }
new File(file).delete()
If so, is there some kind of close function that I should be calling?
There is some sort of close that you should be calling:
val file = "myfile.csv"
val source = Source.fromFile(file)
for (line <- source.getLines()) { }
source.close()
new File(file).delete()
but this is a bit tedious. If you rewrite the for loop as
source.getLines().foreach{ line => }
you can then
import scala.language.{implicitConversions, reflectiveCalls}

class CloseAfter[A <: { def close(): Unit }](a: A) {
  def closed[B](f: A => B): B = try { f(a) } finally { a.close() }
}
implicit def close_things[A <: { def close(): Unit }](a: A): CloseAfter[A] = new CloseAfter(a)
and now your code would become
val file = "myfile.csv"
Source.fromFile(file).closed(_.getLines().foreach { line => })
new File(file).delete()
(This pays off if you do it many times in your code, or if you already maintain your own library of helper functions: you can add the closing implicit there once and use it everywhere.)
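On Scala 2.13 or later, the same close-then-delete sequence can be written with scala.util.Using (readThenDelete is an illustrative name). Using.resource closes the Source before delete() runs; this matters most on Windows, where a file that is still open typically cannot be deleted:

```scala
import java.io.File
import scala.io.Source
import scala.util.Using

object DeleteDemo {
  // Read every line (here just counting them), then delete the file.
  // Using.resource has already closed the Source, and its file handle,
  // by the time delete() is called.
  def readThenDelete(path: String): (Int, Boolean) = {
    val lineCount = Using.resource(Source.fromFile(path))(_.getLines().size)
    (lineCount, new File(path).delete())
  }
}
```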
As others have said, yes, you need to close the Source when you're done with it. Another good solution is to use scala-arm, which closes the file for you automatically.
import resource._
val file = "myfile.csv"
for {
source <- managed(Source.fromFile(file))
line <- source.getLines()
} {
}
new File(file).delete
After reading "Why doesn't Scala Source close the underlying InputStream?", use scala-incubator/scala-io instead.
It includes a delete operation on a Path that takes care of everything. That library always ensures that files are safely closed after each use.