Do Scala files need to be released before deleting? - scala

In the code below, if I uncomment the for loop, the file no longer gets deleted:
val file = "myfile.csv"
//for (line <- Source.fromFile(file).getLines()) { }
new File(file).delete()
If so, is there some type of close function that I should be calling?

There is some sort of close that you should be calling:
val file = "myfile.csv"
val source = Source.fromFile(file)
for (line <- source.getLines()) { }
source.close()
new File(file).delete()
but this is a bit tedious. If you rewrite the for loop as
source.getLines().foreach { line => }
you can then define a small helper:
import scala.language.{implicitConversions, reflectiveCalls}

class CloseAfter[A <: { def close(): Unit }](a: A) {
  def closed[B](f: A => B) = try { f(a) } finally { a.close() }
}
implicit def close_things[A <: { def close(): Unit }](a: A) = new CloseAfter(a)
and now your code would become
val file = "myfile.csv"
Source.fromFile(file).closed(_.getLines().foreach { line => })
new File(file).delete()
(which would be a benefit if you're doing it many times in your code, or if you already maintain your own library of helpful functions and it would be easy to add the closing implicit just once there so you could use it everywhere).
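On Scala 2.13 and later, the standard library's scala.util.Using gives you much the same one-liner without rolling your own closing implicit (a minimal sketch, reusing the question's file name):
import scala.util.Using
import scala.io.Source
import java.io.File

val file = "myfile.csv"
// Using closes the Source in a finally block, whether or not the body throws
Using(Source.fromFile(file)) { source =>
  source.getLines().foreach { line => () } // empty body, as in the question
}
new File(file).delete()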

As others have said, yes, you need to close the Source when you're done with it. Another good solution is to use scala-arm to automagically close the file for you.
import resource._
val file = "myfile.csv"
for {
  source <- managed(Source.fromFile(file))
  line <- source.getLines()
} {
}
new File(file).delete()

After reading "Why doesn't Scala Source close the underlying InputStream?", use "scala-incubator / scala-io" instead.
It includes a delete operation on a Path which takes care of everything, and that library always ensures that files are safely closed after each use.

Related

Scala - Implement Object Factory Pattern

I am trying to implement the object factory design pattern in Scala. However, I am not able to understand how to return the object based on a condition.
I tried to return an Option, however it's not working as expected.
import java.io.File
import java.util.Properties
import scala.io.Source
abstract class FileSystem {
  def moveFile(propFileURI: String): Unit
}

object FileSystem {
  private class HDFSystem extends FileSystem {
    override def moveFile(propFileURI: String): Unit = {
      println(" HDFS move file")
    }
  }

  private class S3System extends FileSystem {
    override def moveFile(propFileURI: String): Unit = {
      println("S3 Move File ")
    }
  }

  // How to handle this??
  def apply(propFileURI: String): Option[FileSystem] = {
    val properties: Properties = new Properties()
    val source = Source.fromFile(System.getProperty("user.dir") + "\\src\\main\\resources\\" + propFileURI).reader
    properties.load(source)
    val srcPath = properties.getProperty("srcPath")
    val destPath = properties.getProperty("destPath")
    var Obj = None: Option[FileSystem]
    if (destPath.contains("hdfs")) {
      Obj = Option(new HDFSystem())
    }
    if (srcPath.contains("s3") && destPath.contains("s3")) {
      Obj = Option(new S3System())
    }
    Obj
  }

  def main(args: Array[String]): Unit = {
    val obj = FileSystem("test.properties")
    obj match {
      case test: FileSystem => test.moveFile("test.properties")
      case None => None
    }
  }
}
How to handle the Apply method based on condition I have mentioned? Do I really need to return option ?
There are ways to clean up your implementation; for example, the usage of var could be avoided:
if (destPath.contains("hdfs"))
  Some(new HDFSystem())
else if (srcPath.contains("s3") && destPath.contains("s3"))
  Some(new S3System())
else
  None
However, as far as I understand, the main point of the question is:
Do I really need to return Option?
This is a design decision with tradeoffs and no clear answer. Ask yourself how the system should react if the .properties file is missing the appropriate configuration key:
Could we construct a meaningful default FileSystem object? In this case there is no need for Option, just return the default and continue processing.
If we cannot construct a meaningful default, is there any point in continuing? Should the system crash? In which case we might throw.
If the system can continue operating despite not being able to construct FileSystem, then we could model this information as Option or Either etc., perhaps log the event, and continue processing.
These are just some considerations to take into account. Personally, I have found that a misconfigured system is hard to recover from.
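In that fail-fast spirit, a minimal sketch (the helper name and message are illustrative, not from the original code):
import java.util.Properties

// Crash early with a clear message instead of propagating a missing-config Option
def requiredProperty(properties: Properties, key: String): String =
  Option(properties.getProperty(key)).getOrElse(
    throw new IllegalStateException(s"Missing required configuration key: $key")
  )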
Option isn't Java's nullable reference: you pattern-match on Some if a value is present:
val obj = FileSystem("test.properties") // Option[FileSystem]
obj match {
  case Some(test) => test.moveFile("test.properties")
  case None =>
}
Also, in Scala almost everything is an expression, so if-else expressions, blocks, and function bodies evaluate to their last expression. We should also take IO errors into consideration, so Try could be better than Option:
import scala.util.{Try, Success, Failure}

def apply(propFileURI: String): Try[FileSystem] =
  Try {
    // reading the properties file could fail
    val p = new Properties()
    val source = Source.fromFile(System.getProperty("user.dir") + "\\src\\main\\resources\\" + propFileURI).reader
    p.load(source)
    p
  }.flatMap { properties =>
    // reading from the properties could fail
    val srcPath = properties.getProperty("srcPath")
    val destPath = properties.getProperty("destPath")
    if (destPath.contains("hdfs")) Success(new HDFSystem())
    else if (srcPath.contains("s3") && destPath.contains("s3")) Success(new S3System())
    else Failure(new Exception("Unable to recognize filesystem"))
  }

def main(args: Array[String]): Unit =
  FileSystem("test.properties") match {
    case Success(fileSystem) => fileSystem.moveFile("test.properties")
    case Failure(error) => error.printStackTrace()
  }
We could convert the Try to an Option at any moment with .toOption and pattern-match on it, but that way we lose the information about the error. We could also create a FileSystemError type for storing error information and return Either[FileSystemError, FileSystem] instead of Try. Lastly, we could just throw an Exception, but that way we return to Java-like practices that don't tell us an error can happen and surprise us at runtime.
What I would surely do is rename apply - we usually expect an object's apply to always return successfully, so if that isn't possible and we use some sort of smart constructor, we should give it some other name. E.g. here we could name it
def resolveFor(propFileURI: String): Either[FileSystemError, FileSystem] = ...
so that everyone using it would know what behavior to expect just from the signature.
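For illustration, a minimal sketch of such a smart constructor, assuming a simple FileSystemError case class and taking the already-extracted paths rather than the properties file (both are assumptions, not part of the original code):
final case class FileSystemError(message: String)

// Placed inside the FileSystem companion so the private implementations are visible
def resolveFor(srcPath: String, destPath: String): Either[FileSystemError, FileSystem] =
  if (destPath.contains("hdfs")) Right(new HDFSystem())
  else if (srcPath.contains("s3") && destPath.contains("s3")) Right(new S3System())
  else Left(FileSystemError(s"Unable to recognize filesystem for src=$srcPath, dest=$destPath"))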

Read file in Scala : Stream closed

I am trying to read a file in Scala like this:
def parseFile(filename: String): Iterator[Double] = {
  val source = scala.io.Source.fromFile(filename)
  try {
    val lines = source.getLines().map(line => line.trim.toDouble)
    return lines
  } catch {
    // re-throw the exception, but make sure the source is closed
    case t: Throwable => {
      println("error during parsing of file")
      throw t
    }
  } finally {
    source.close()
  }
}
When I access the result later, I get a
java.io.IOException: Stream Closed
I understand that this arises because source.getLines() only returns a (lazy) Iterator[String], and I have already closed the BufferedSource in the finally clause.
How can I avoid this error, i.e. how can I "evaluate" the Stream before closing the source?
EDIT: I tried to call source.getLines().toSeq which did not help.
Maybe you can try the following solution, which makes the code more functional and takes advantage of lazy evaluation.
First, define a helper function using, which takes care of open/close the file.
import scala.language.reflectiveCalls

def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
  try f(param) finally param.close()
Then, you can refactor your code in functional programming style:
import scala.io.Source
import scala.util.Try

using(Source.fromFile(filename)) { source =>
  val lines = Try(source.getLines().map(line => line.trim.toDouble))
  val result = lines.flatMap(l => Try(processOrDoWhatYouWantForLines(l)))
  result.get
}
Actually, the using function can be used for handling all resources which need to be closed at the end of the operation.
The iterator returned by getLines() is lazy, but List is not, so change:
val lines = source.getLines().map(line => line.trim.toDouble)
to
val lines = source.getLines().toList.map(line => line.trim.toDouble)
in order to force the computation before the source is closed.
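Putting the two together, a rough sketch of parseFile that forces the lines into a List while the source is still open and always closes it (same intent as the question's code, just strict):
import scala.io.Source

def parseFile(filename: String): List[Double] = {
  val source = Source.fromFile(filename)
  try {
    // toList materializes the lazy iterator before the source is closed
    source.getLines().map(_.trim.toDouble).toList
  } finally {
    source.close()
  }
}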

How to pass input in scala through command line

import scala.io._

object Sum {
  def main(args: Array[String]): Unit = {
    println("Enter some numbers and press ctrl-c")
    val input = Source.fromInputStream(System.in)
    val lines = input.getLines.toList
    println("Sum " + sum(lines))
  }

  def toInt(in: String): Option[Int] =
    try {
      Some(Integer.parseInt(in.trim))
    } catch {
      case e: NumberFormatException => None
    }

  def sum(in: Seq[String]) = {
    val ints = in.flatMap(s => toInt(s))
    ints.foldLeft(0)((a, b) => a + b)
  }
}
I am trying to run this program; after entering the input I press Ctrl + C, but it gives this message:
E:\Scala>scala HelloWord.scala
Enter some numbers and press ctrl-c
1 2 3
Terminate batch job (Y/N)?
As an additional observation, note the App trait, which makes an object executable without having to declare a main(...) function, for instance like this:
object Sum extends App {
  import scala.io._
  import scala.util._

  val nums = Source.stdin.getLines.flatMap(v => Try(v.toInt).toOption)
  println(s"Sum: ${nums.sum}")
}
Using Try, unsuccessful conversions from String to Int are turned into None and flattened out.
Also note that objects and classes are capitalized by convention, hence we write object Sum instead of object sum.
You can also use an external API. I really like scallop API
Try this piece of code. It should work as intended.
object Sum {
  def main(args: Array[String]) {
    val lines = io.Source.stdin.getLines
    val numbers = lines.map(_.toInt)
    println(s"Sum: ${numbers.sum}")
  }
}
Plus, the correct shortcut to end the input stream is Ctrl + D (on Windows, Ctrl + Z followed by Enter).
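Note that map(_.toInt) in the snippet above throws on non-numeric lines; a small sketch combining it with the Try-based filtering from the earlier answer (standard library only, nothing else assumed):
import scala.util.Try

object Sum {
  def main(args: Array[String]): Unit = {
    // Drop lines that are not valid integers instead of throwing
    val numbers = io.Source.stdin.getLines().flatMap(line => Try(line.trim.toInt).toOption)
    println(s"Sum: ${numbers.sum}")
  }
}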

Scala try with finally best practice

I have the following implementation where I'm trying to handle proper resource closing during any fatal exceptions:
private def loadPrivateKey(keyPath: String) = {
  def tryReadCertificate(file: File): Try[BufferedReader] = Try { new BufferedReader(new FileReader(file)) }

  def tryLoadPemParser(reader: BufferedReader): Try[PEMParser] = Try { new PEMParser(reader) }

  def createXXX(buffReader: BufferedReader, pemParser: PEMParser) = try {
    ...
  } finally {
    buffReader.close()
    pemParser.close()
  }

  tryReadCertificate(new File(keyPath, "myKey.pem")) match {
    case Success(buffReader) => tryLoadPemParser(buffReader) match {
      case Success(pemParser) => createXXX(buffReader, pemParser)
      case Failure(fail) =>
    }
    case Failure(fail) =>
  }
}
I already see that my nested case blocks are a mess. Is there a better way to do this? In the end, I just want to make sure that I close the BufferedReader and the PEMParser !
You could restructure your code a little like this, using a for-comprehension to clean up some of the nested case statements:
def tryReadCertificate(file: File): Try[BufferedReader] = Try { new BufferedReader(new FileReader(file)) }

def tryLoadPemParser(reader: BufferedReader): Try[PEMParser] = Try { new PEMParser(reader) }

def createXXX(buffReader: BufferedReader, pemParser: PEMParser) = {
  ...
}

val certReaderTry = tryReadCertificate(new File(keyPath, "myKey.pem"))

val pemParserTry = for {
  certReader <- certReaderTry
  pemParser <- tryLoadPemParser(certReader)
} yield {
  createXXX(certReader, pemParser)
  pemParser
}

certReaderTry foreach (_.close)
pemParserTry foreach (_.close)
Structured like this, you will only ever end up calling close on things you are sure were opened successfully.
And even better, if your PEMParser happened to extend java.io.Closeable, meaning that the Trys both wrapped Closeable objects, then you could swap those last two lines for a single line like this:
(certReaderTry.toOption ++ pemParserTry.toOption) foreach (_.close)
EDIT
In response to the OP's comment: In the first example, if tryReadCertificate succeeds, then certReaderTry will be a Success[BufferedReader], and because it's successful, calling foreach on it will yield the BufferedReader, which will then have close called on it. If certReaderTry is a Success, then (via the for-comprehension) we will call tryLoadPemParser, and if that also succeeds, we can move on to createXXX and assign the result of tryLoadPemParser to the pemParserTry val. Then, later, if pemParserTry is a Success, the same thing happens: foreach yields the PEMParser and we can close it. Per this example, as long as those Trys are successes and nothing unexpected happens (in createXXX for example) that would throw an exception all the way out, you can guarantee that the closing-related code at the end will do its job and close those resources.
EDIT2
If you wanted the value from createXXX in a separate Try, then you could do something like this:
val certReaderTry = tryReadCertificate(new File(keyPath, "myKey.pem"))
val pemParserTry = certReaderTry.flatMap(tryLoadPemParser)

val resultTry = for {
  certReader <- certReaderTry
  pemParser <- pemParserTry
} yield createXXX(certReader, pemParser)
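On Scala 2.13+, you could also avoid the nested Trys entirely with the standard library's scala.util.Using.Manager; a minimal sketch, assuming PEMParser is AutoCloseable (Bouncy Castle's PEMParser extends BufferedReader) and with createXXX standing in for the original elided code:
import scala.util.Using
import java.io.{BufferedReader, File, FileReader}

private def loadPrivateKey(keyPath: String) =
  Using.Manager { use =>
    // Both resources are registered with the manager and closed in reverse order
    val buffReader = use(new BufferedReader(new FileReader(new File(keyPath, "myKey.pem"))))
    val pemParser = use(new PEMParser(buffReader))
    createXXX(buffReader, pemParser) // the result is wrapped in a Try
  }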

"using" function

I've defined a 'using' function as follows:
def using[A, B <: {def close(): Unit}] (closeable: B) (f: B => A): A =
try { f(closeable) } finally { closeable.close() }
I can use it like this:
using(new PrintWriter("sample.txt")) { out =>
  out.println("hello world!")
}
Now I'm curious how to define the 'using' function so it takes any number of parameters and lets me access them separately:
using(new BufferedReader(new FileReader("in.txt")), new PrintWriter("out.txt")) { (in, out) =>
  out.println(in.readLine)
}
Starting Scala 2.13, the standard library provides a dedicated resource management utility: Using.
More specifically, the Using#Manager can be used when dealing with several resources.
In our case, we can manage different resources such as your PrintWriter or BufferedReader, as they both implement AutoCloseable, in order to read from one file and write to another and, no matter what, close both the input and the output resources afterwards:
import scala.util.Using
import java.io.{PrintWriter, BufferedReader, FileReader}
Using.Manager { use =>
  val in = use(new BufferedReader(new FileReader("input.txt")))
  val out = use(new PrintWriter("output.txt"))
  out.println(in.readLine)
}
// scala.util.Try[Unit] = Success(())
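As a side note, for a single resource the simpler forms are enough: Using(...) returns a Try, while Using.resource(...) rethrows any exception instead (a minimal sketch mirroring the PrintWriter example from the question):
import scala.util.{Try, Using}
import java.io.PrintWriter

val written: Try[Unit] =
  Using(new PrintWriter("sample.txt")) { out =>
    out.println("hello world!")
  }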
Someone has already done this—it's called Scala ARM.
From the readme:
import resource._
for (input <- managed(new FileInputStream("test.txt"))) {
  // Code that uses the input as a FileInputStream
}
I've been thinking about this and I thought maybe there was another way to address this. Here is my take on supporting "any number" of parameters (limited by what tuples provide):
object UsingTest {
  type Closeable = { def close(): Unit }

  final class CloseAfter[A <: Product](val x: A) {
    def closeAfter[B](block: A => B): B = {
      try {
        block(x)
      } finally {
        for (i <- 0 until x.productArity) {
          x.productElement(i) match {
            case c: Closeable => println("closing " + c); c.close()
            case _ =>
          }
        }
      }
    }
  }

  implicit def any2CloseAfter[A <: Product](x: A): CloseAfter[A] =
    new CloseAfter(x)

  def main(args: Array[String]): Unit = {
    import java.io._

    (new BufferedReader(new FileReader("in.txt")),
     new PrintWriter("out.txt"),
     new PrintWriter("sample.txt")) closeAfter { case (in, out, other) =>
      out.println(in.readLine)
      other.println("hello world!")
    }
  }
}
I think I'm reusing the fact that 22 tuple/product classes have been written in the library... I don't think this syntax is clearer than using nested using (no pun intended), but it was an interesting puzzle.
Using structural typing seems like overkill here, since java.lang.AutoCloseable is made for exactly this:
def using[A <: AutoCloseable, B](resource: A)(block: A => B): B =
  try block(resource) finally resource.close()
or, if you prefer extension methods:
implicit class UsingExtension[A <: AutoCloseable](val resource: A) extends AnyVal {
  def using[B](block: A => B): B = try block(resource) finally resource.close()
}
using2 is possible:
def using2[R1 <: AutoCloseable, R2 <: AutoCloseable, B](resource1: R1, resource2: R2)(block: (R1, R2) => B): B =
  using(resource1) { _ =>
    using(resource2) { _ =>
      block(resource1, resource2)
    }
  }
but imho quite ugly - I would prefer to simply nest these using statements in the client code.
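For example, nesting the plain using from above at the call site stays quite readable (a small sketch reusing the question's reader/writer pair):
import java.io.{BufferedReader, FileReader, PrintWriter}

using(new BufferedReader(new FileReader("in.txt"))) { in =>
  using(new PrintWriter("out.txt")) { out =>
    out.println(in.readLine())
  }
}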
Unfortunately, there isn't support for arbitrary-length parameter lists with arbitrary types in standard Scala.
You might be able to do something like this with a couple of language changes (to allow variable parameter lists to be passed as HLists; see here for about 1/3 of what's needed).
Right now, the best thing to do is just do what Tuple and Function do: implement usingN for as many N as you need.
Two is easy enough, of course:
def using2[A, B <: { def close(): Unit }, C <: { def close(): Unit }](closeB: B, closeC: C)(f: (B, C) => A): A = {
  try { f(closeB, closeC) } finally { closeB.close(); closeC.close() }
}
If you need more, it's probably worth writing something that'll generate the source code.
Here is an example that allows you to use the Scala for-comprehension as an automatic resource management block for any item that is a java.io.Closeable, but it could easily be expanded to work for any object with a close method.
This usage seems pretty close to the using statement and allows you to easily have as many resources defined in one block as you want.
object ResourceTest {
  import CloseableResource._
  import java.io._

  def test() {
    for (input <- new BufferedReader(new FileReader("/tmp/input.txt")); output <- new FileWriter("/tmp/output.txt")) {
      output.write(input.readLine)
    }
  }
}

class CloseableResource[T](resource: => T, onClose: T => Unit) {
  def foreach(f: T => Unit) {
    val r = resource
    try {
      f(r)
    } finally {
      try {
        onClose(r)
      } catch {
        case e: Throwable =>
          println("error closing resource")
          e.printStackTrace
      }
    }
  }
}

object CloseableResource {
  implicit def javaCloseableToCloseableResource[T <: java.io.Closeable](resource: T): CloseableResource[T] =
    new CloseableResource[T](resource, { _.close })
}
It is a good idea to detach the cleanup algorithm from the program path.
This solution lets you accumulate closeables in a scope.
The scope cleanup will happen after the block is executed, or the scope can be detached. The cleaning of the scope can then be done later.
This way we get the same convenience without being limited to single-threaded programming.
The utility class:
import java.io.Closeable

object ManagedScope {
  val scope = new ThreadLocal[Scope]()

  def managedScope[T](inner: => T): T = {
    val previous = scope.get()
    val thisScope = new Scope()
    scope.set(thisScope)
    try {
      inner
    } finally {
      scope.set(previous)
      if (!thisScope.detached) thisScope.close()
    }
  }

  def closeLater[T <: Closeable](what: T): T = {
    val theScope = scope.get()
    if (!(theScope eq null)) {
      theScope.closeables = theScope.closeables.:+(what)
    }
    what
  }

  def detachScope(): Scope = {
    val theScope = scope.get()
    if (theScope eq null) null
    else {
      theScope.detached = true
      theScope
    }
  }
}

class Scope {
  var detached = false
  var closeables: List[Closeable] = List()

  def close(): Unit = {
    for (c <- closeables) {
      try {
        if (!(c eq null)) c.close()
      } catch {
        case e: Throwable => {}
      }
    }
  }
}
The usage:
def checkSocketConnect(host: String, portNumber: Int): Unit = managedScope {
  // The closeLater function tags the closeable to be closed later
  val socket = closeLater(new Socket(host, portNumber))
  doWork(socket)
}

def checkFutureConnect(host: String, portNumber: Int): Unit = managedScope {
  // The closeLater function tags the closeable to be closed later
  val socket = closeLater(new Socket(host, portNumber))
  val future: Future[Boolean] = doAsyncWork(socket)
  // Detach the scope and use it in the future.
  val scope = detachScope()
  future.onComplete(v => scope.close())
}
This solution doesn't quite have the syntax you desire, but I think it's close enough :)
def using[A <: { def close(): Unit }, B](resources: List[A])(f: List[A] => B): B =
  try f(resources) finally resources.foreach(_.close())

using(List(new BufferedReader(new FileReader("in.txt")), new PrintWriter("out.txt"))) {
  case List(in: BufferedReader, out: PrintWriter) => out.println(in.readLine())
}
Of course the downside is that you have to type out the types BufferedReader and PrintWriter in the using block. You might be able to add some magic so that you just need List(in, out) by using multiple ORed type bounds for type A in using.
By defining some pretty hacky and dangerous implicit conversions you can get around having to type List (and another way to get around specifying types for the resources), but I haven't documented the detail as it's too dangerous IMO.
Here is my solution to resource management in Scala:
import scala.util.control.NonFatal

def withResources[T <: AutoCloseable, V](r: => T)(f: T => V): V = {
  val resource: T = r
  require(resource != null, "resource is null")
  var exception: Throwable = null
  try {
    f(resource)
  } catch {
    case NonFatal(e) =>
      exception = e
      throw e
  } finally {
    closeAndAddSuppressed(exception, resource)
  }
}

private def closeAndAddSuppressed(e: Throwable, resource: AutoCloseable): Unit = {
  if (e != null) {
    try {
      resource.close()
    } catch {
      case NonFatal(suppressed) =>
        e.addSuppressed(suppressed)
    }
  } else {
    resource.close()
  }
}
I used this in multiple Scala apps, including managing resources in Spark executors. One should be aware that there are other, even better ways to manage resources, like Resource in Cats Effect: https://typelevel.org/cats-effect/datatypes/resource.html, if you are OK with pure FP in Scala.
To answer your last question, you can definitely nest the resources like this:
withResources(r /* a File */)(
  r => {
    withResources(a /* another File */)(
      anotherR => {
        withResources(...)(...)
      }
    )
  }
)
This way, not only are those resources protected from leaking, they will also be released in the correct order (like a stack), the same behaviour as the Resource monad from Cats Effect.