What is the best way to write the following in Scala? It doesn't look quite right to me: first the forward declaration of the two vals, then the long PrintWriter creation line, then the finally block. The only thing that looks idiomatic is the catch block...
val outputStream = Try(fs.create(tmpFile))
val writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(outputStream.get)))
if (outputStream.isFailure) {
logger.error(s"Couldn't open file: $tmpFile")
}
try {
features.foreach {
case (sectionName, modelRDD) =>
writer.append("{" + sectionName + ", " + modelRDD.getNumPartitions + "}")
}
} catch {
case e: Exception =>
logger.error(s"Got exception", e)
throw e
} finally {
outputStream.get.close()
writer.close()
}
We can further use the context of the initial Try to execute the complete I/O operation:
First, we define a function that encapsulates our process:
def safeFilePrint(tf: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
val os = Try(tf)
val write = {
val writer = os.map(f => new PrintWriter(f))
val writeOp = writer.map(op)
val flushOp = writer.map(_.flush)
writeOp.flatMap(_ => flushOp)
}
val close = os.map(_.close)
write.flatMap(_ => close)
}
And usage:
val collection = Seq(...)
val writeResult = safeFilePrint(new FileOutputStream(new File("/tmp/foo.txt"))){w =>
collection.foreach(elem => w.write(elem))
}
Note that, in contrast with the original code, we now have a result for the write operation: writeResult will be Success(()) if everything went well, or Failure(exception) if something went wrong. Based on that, our application can decide what to do next.
One may wonder: "Where is the finally?" In Java, finally is used to ensure that some code (typically resource management) would be executed, even in the case that an exception thrown in the try scope would cause an exception-handling path to be followed.
In Scala, using constructs like Try, Either or our own ADT, we lift error handling to the application level. finally becomes unnecessary as our program is able to deal with the failure as just another valid state of the program.
I finally settled on this code after reading @maasg's answer, which highlights the monadic flow and is more "symmetric". It looks much, much better than the code in the OP!
def safePrintToStream(gen: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
val os = Try(gen)
val writer = os.map(stream => new PrintWriter(stream))
val write = writer.map(op(_))
val flush = writer.map(_.flush)
val close = os.map(_.close)
write.flatMap(_ => flush).flatMap(_ => close)
}
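A note on how this behaves: each val above is evaluated eagerly, so close runs whenever the stream was opened, even if op fails; the flatMap chain only combines the results. On Scala 2.13 or later, the same open/write/close shape can also be sketched with scala.util.Using. This is a minimal sketch under that version assumption, and safePrint is a hypothetical name, not part of the answer above.

```scala
import java.io.{OutputStream, PrintWriter}
import scala.util.{Try, Using}

object UsingSketch {
  // Hypothetical variant of safePrintToStream (assumes Scala 2.13+).
  // Using closes the writer, and with it the underlying stream,
  // whether or not `op` throws; the outcome is returned as a Try.
  def safePrint(gen: => OutputStream)(op: PrintWriter => Unit): Try[Unit] =
    Using(new PrintWriter(gen)) { writer =>
      op(writer)
      writer.flush()
    }
}
```

As before, the caller receives a Try[Unit] and decides what to do with a Failure.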
I am trying to read incremental data from my data source using Scala and Spark. Before hitting the source tables, I calculate the min and max of the partition column used in my code. This happens inside a Future, in a class GetSourceMeta, as given below.
def getBounds(keyIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
keyIdMap.keys.foreach(table => if(!keyIdMap(table).contains("Invalid")) {
val minMax = s"select max(insert_tms) maxTms, min(insert_tms) minTms from schema.${table} where source='DB2' and key_id in (${keyIdMap(table)})"
println("MinMax: " + minMax)
val boundsDF = spark.read.format("jdbc").option("url", con.getConUrl()).option("dbtable", s"(${minMax}) as ctids").option("user", con.getUserName()).option("password", con.getPwd()).load()
try {
val maxTms = boundsDF.select("minTms").head.getTimestamp(0).toString + "," + boundsDF.select("maxTms").head.getTimestamp(0).toString
println("Bounds: " + maxTms)
boundsMap += (table -> maxTms)
} catch {
case np: java.lang.NullPointerException => { println("No data found") }
case e: Exception => { println(s"Unknown exception: $e") }
}
}
)
boundsMap.foreach(println)
boundsMap
}
I am calling the above method in my main method as:
object LoadToCopyDB {
val conf = new SparkConf().setAppName("TEST_YEAR").set("some parameters")
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder().config(conf).master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
val gsm = new GetSourceMeta()
val minMaxKeyMap = gsm.getBounds(keyIdMap).onComplete {
case Success(values) => values.foreach(println)
case Failure(f) => f.printStackTrace
}
.
.
.
}
Well, the onComplete didn't print any values so I used andThen as below and that didn't help as well.
val bounds: Future[scala.collection.mutable.Map[String, String]] = gpMetaData.getBounds(incrementalIds) andThen {
case Success(outval) => outval.foreach(println)
case Failure(e) => println(e)
}
Earlier, the main thread exited without letting the Future returned by getBounds execute, so none of the println statements from the Future showed up on the terminal. I found out that I need to make the main thread Await in order for the Future to complete. But when I use Await in main along with onComplete:
Await.result(bounds, Duration.Inf)
The compiler gives an error:
Type mismatch, expected: Awaitable[NotInferedT], actual:Unit
If I declare the val minMaxKeyMap as Future[scala.collection.mutable.Map[String, String]], the compiler says: Expression of type Unit doesn't conform to expected type Future[mutable.Map[String,String]]
I tried to print the values of bounds after the Await statement but that just prints an empty Map.
I can't figure out how to fix this. Could anyone tell me what I need to do to make the Future run properly?
In cases like this, it is always better to follow the types. The method onComplete returns Unit, not a Future, so its result can't be passed to Await.
If you want to get a Future back, you have to map or flatMap the value and return something, an Option for example. In this case it does not matter what you return; you only want Await to wait for the result and print it, or print a stack trace. You can handle a possible exception in recover. In your code it would look like this:
val minMaxKeyMap: Future[Option[Any]] = gsm.getBounds(keyIdMap).map { values =>
values.foreach(println)
None
}.recover {
case e: Throwable =>
e.printStackTrace()
None
}
Note that the recover part has to return an instance of the same type as the map part.
After that, you can apply Await to the Future, and you will get the results printed. It is not the prettiest solution, but it will work in your case.
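To make the type point concrete, here is a minimal sketch without Spark. Unlike onComplete, andThen returns a Future carrying the original result, so it can be handed to Await. The names computeBounds and boundsWithLogging and the sample map are hypothetical stand-ins for getBounds and its data.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object AwaitSketch {
  // Hypothetical stand-in for getBounds: computes a small map asynchronously.
  def computeBounds(): Future[Map[String, String]] =
    Future(Map("table_a" -> "2020-01-01,2020-12-31"))

  // andThen runs the side effect and yields a Future with the original result,
  // whereas onComplete returns Unit and leaves nothing to await.
  def boundsWithLogging(): Future[Map[String, String]] =
    computeBounds().andThen {
      case Success(values) => values.foreach(println)
      case Failure(e)      => e.printStackTrace()
    }

  def main(args: Array[String]): Unit = {
    // Block the main thread until the Future completes (bounded, not Duration.Inf).
    val result = Await.result(boundsWithLogging(), 30.seconds)
    println(s"Got ${result.size} entries")
  }
}
```

The bounded timeout is a design choice for the sketch; Duration.Inf, as in the question, works the same way.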
I try to read a file in scala like this:
def parseFile(filename: String) = {
val source = scala.io.Source.fromFile(filename)
try {
val lines = source.getLines().map(line => line.trim.toDouble)
return lines
} catch {
// re-throw exception, but make sure source is closed
case t: Throwable => {
println("error during parsing of file")
throw t
}
} finally {
source.close()
}
}
When I access the result later, I get an
java.io.IOException: Stream Closed
I understand that this arises because source.getLines() only returns an (lazy) Iterator[String], and I already close the BufferedSource in the finally clause.
How can I avoid this error, i.e. how can I "evaluate" the Stream before closing the source?
EDIT: I tried to call source.getLines().toSeq which did not help.
You could try the following solution, which makes the code more functional and takes advantage of lazy evaluation.
First, define a helper function, using, which takes care of opening and closing the file.
def using[A <: {def close() : Unit}, B](param: A)(f: A => B): B =
try f(param) finally param.close()
Then, you can refactor your code in functional programming style:
using(Source.fromFile(filename)) {
source =>
val lines = Try(source.getLines().map(line => line.trim.toDouble))
val result = lines.flatMap(l => Try(processOrDoWhatYouWantForLines(l)))
result.get
}
Actually, the using function can be used for handling all resources which need to be closed at the end of the operation.
List is not lazy so change:
val lines = source.getLines().map(line => line.trim.toDouble)
to
val lines = source.getLines().toList.map(line => line.trim.toDouble)
in order to force computing.
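A minimal, self-contained sketch of that fix, assuming the file holds one number per line: toList is called before the finally block runs, so the whole file is read while the source is still open. The toSeq attempt from the question's EDIT probably failed because, in Scala versions before 2.13, calling toSeq on an Iterator yields a lazy Stream. The temporary file in main is only there to make the example runnable.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

object EagerParseSketch {
  def parseFile(path: Path): List[Double] = {
    val source = Source.fromFile(path.toFile, "UTF-8")
    // toList forces the iterator here, before finally closes the source.
    try source.getLines().map(_.trim.toDouble).toList
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data written to a temporary file.
    val tmp = Files.createTempFile("doubles", ".txt")
    Files.write(tmp, "1.5\n 2.25 \n3\n".getBytes(StandardCharsets.UTF_8))
    try println(parseFile(tmp))
    finally Files.delete(tmp)
  }
}
```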
I have the following implementation where I'm trying to handle proper resource closing during any fatal exceptions:
private def loadPrivateKey(keyPath: String) = {
def tryReadCertificate(file: File): Try[BufferedReader] = Try { new BufferedReader(new FileReader(file)) }
def tryLoadPemParser(reader: BufferedReader): Try[PEMParser] = Try { new PEMParser(reader) }
def createXXX(buffReader: BufferedReader, pemParser: PEMParser) = try {
...
} finally {
buffReader.close()
pemParser.close()
}
tryReadCertificate(new File(keyPath, "myKey.pem")) match {
case Success(buffReader) => tryLoadPemParser(buffReader) match {
case Success(pemParser) => createXXX(buffReader, pemParser)
case Failure(fail) =>
}
case Failure(fail) =>
}
}
I already see that my nested case blocks are a mess. Is there a better way to do this? In the end, I just want to make sure that I close the BufferedReader and the PEMParser !
You could restructure your code a little like this, using a for-comprehension to clean up some of the nested case statements:
def tryReadCertificate(file: File): Try[BufferedReader] = Try { new BufferedReader(new FileReader(file)) }
def tryLoadPemParser(reader: BufferedReader): Try[PEMParser] = Try { new PEMParser(reader) }
def createXXX(buffReader: BufferedReader, pemParser: PEMParser) = {
...
}
val certReaderTry = tryReadCertificate(new File(keyPath, "myKey.pem"))
val pemParserTry = for{
certReader <- certReaderTry
pemParser <- tryLoadPemParser(certReader)
} yield {
createXXX(certReader, pemParser)
pemParser
}
certReaderTry foreach(_.close)
pemParserTry foreach (_.close)
Structured like this, you will only ever end up calling close on things you are sure were opened successfully.
And even better, if your PEMParser happened to extend java.io.Closeable, meaning that the Trys both wrapped Closeable objects, then you could swap those last two lines for a single line like this:
(certReaderTry.toOption ++ pemParserTry.toOption) foreach (_.close)
EDIT
In response to the OP's comment: in the first example, if tryReadCertificate succeeds, then certReaderTry will be a Success[BufferedReader], and calling foreach on it will pass the BufferedReader to the function, which closes it. If certReaderTry is a Success, then (via the for-comprehension) we call tryLoadPemParser, and if that also succeeds, we move on to createXXX and yield the PEMParser into the pemParserTry val. Later, if pemParserTry is a Success, the same thing happens: foreach passes the PEMParser along and we close it. Per this example, as long as those Trys are successes and nothing else unexpected happens (in createXXX, for example) that would throw an exception all the way out, the closing code at the end will do its job and close those resources.
EDIT2
If you wanted the value from createXXX in a separate Try, then you could do something like this:
val certReaderTry = tryReadCertificate(new File(keyPath, "myKey.pem"))
val pemParserTry = certReaderTry.flatMap(tryLoadPemParser)
val resultTry = for{
certReader <- certReaderTry
pemParser <- pemParserTry
} yield createXXX(certReader, pemParser)
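One thing to keep in mind with the for-comprehension versions: Try.map catches non-fatal exceptions, so if createXXX throws inside the yield, the resulting Try is a Failure and a foreach on it will not close anything. On Scala 2.13 or later, Using.Manager is one way to register both resources and have them closed in reverse order regardless. This is a minimal sketch under that assumption; LineNumberReader stands in for PEMParser, and firstLine is a hypothetical placeholder for createXXX.

```scala
import java.io.{BufferedReader, LineNumberReader, StringReader}
import scala.util.{Try, Using}

object ManagerSketch {
  // LineNumberReader is only a stand-in for PEMParser (an assumption for
  // illustration); reading one line stands in for createXXX.
  def firstLine(text: String): Try[String] =
    Using.Manager { use =>
      // Every resource passed to `use` is closed when the block exits,
      // in reverse order of registration, even if the block throws.
      val reader = use(new BufferedReader(new StringReader(text)))
      val parser = use(new LineNumberReader(reader))
      parser.readLine()
    }
}
```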
Here below is my attempt to implement a class that provides functionality for compressing/decompressing strings:
object GZipHelper {
def deflate(txt: String): Try[String] = {
try {
val arrOutputStream = new ByteArrayOutputStream()
val zipOutputStream = new GZIPOutputStream(arrOutputStream)
zipOutputStream.write(txt.getBytes)
new Success(Base64.encodeBase64String(arrOutputStream.toByteArray))
} catch {
case e: Exception => new Failure(e)
}
}
def inflate(deflatedTxt: String): Try[String] = {
try {
val bytes = Base64.decodeBase64(deflatedTxt)
val zipInputStream = new GZIPInputStream(new ByteArrayInputStream(bytes))
new Success(IOUtils.toString(zipInputStream))
} catch {
case e: Exception => new Failure(e)
}
}
}
}
As you can see, the finally blocks that close GZIPOutputStream and GZIPInputStream are missing... how could I implement this in the Scala way? How could I improve the code?
Since you're using the "old fashioned" try statement and explicitly turning it into a scala.util.Try, there really is no reason not to add a finally block after your try.
In this specific case though, there is little point in closing, for example, your ByteArrayInputStream - it's not really an open resource and does not need to be closed. In which case you can simplify your code and make it much more idiomatic this way:
def inflate(deflatedTxt: String): Try[String] = Try {
val bytes = Base64.decodeBase64(deflatedTxt)
val zipInputStream = new GZIPInputStream(new ByteArrayInputStream(bytes))
IOUtils.toString(zipInputStream)
}
I personally would not declare bytes and zipInputStream since they're only used once, but that's a matter of preference.
The trick here is having a finally block with a call to scala.util.Try.apply - I'm not sure that's possible without going through a call to map that doesn't actually modify anything, which seems like a bit of an oversight to me. I was expecting to see an andThen or eventually method in scala.util.Try, but it doesn't seem to be there (yet?).
Just for completeness, here is the deflate method converted (original version was also missing the close() call on the GZIP class):
def deflate(txt: String): Try[String] = Try {
val arrOutputStream = new ByteArrayOutputStream()
val zipOutputStream = new GZIPOutputStream(arrOutputStream)
zipOutputStream.write(txt.getBytes)
zipOutputStream.close()
Base64.encodeBase64String(arrOutputStream.toByteArray)
}
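For a version that can be run without extra dependencies, here is a round-trip sketch that combines Try { ... } with an inner try/finally. It assumes java.util.Base64 and scala.io.Source as stand-ins for the Commons Codec and Commons IO calls used above, and the object name GzipRoundTrip is hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source
import scala.util.Try

object GzipRoundTrip {
  def deflate(txt: String): Try[String] = Try {
    val bytes = new ByteArrayOutputStream()
    val zip = new GZIPOutputStream(bytes)
    try zip.write(txt.getBytes(UTF_8))
    finally zip.close() // close() writes the GZIP trailer, so it must precede toByteArray
    Base64.getEncoder.encodeToString(bytes.toByteArray)
  }

  def inflate(deflated: String): Try[String] = Try {
    val decoded = Base64.getDecoder.decode(deflated)
    val zip = new GZIPInputStream(new ByteArrayInputStream(decoded))
    try Source.fromInputStream(zip, "UTF-8").mkString
    finally zip.close()
  }
}
```

Composing the two with flatMap, as in deflate(text).flatMap(inflate), should give back the original text wrapped in a Success.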
https://github.com/jsuereth/scala-arm/wiki/basic-usage
this looks like a good approach
You can use scala-compress https://github.com/gekomad/scala-compress
compress string:
val aString: String = "foo"
val compressed: Try[Array[Byte]] = zipString(aString, charSetName = "UTF-8")
decompress:
val compressedArray: Array[Byte] = ???
val decompressed: Try[Array[Byte]] = unzipString(compressedArray)
decompressed.map(bytes => new String(bytes, "UTF-8")) // decompressed is a Try, so map over it
Does anybody know a solution to this problem? I rewrote a try/catch/finally construct in a functional way, but now I can't close the stream :-)
import scala.util.control.Exception._
def gunzip() = {
logger.info(s"Gunziping file ${f.getAbsolutePath}")
catching(classOf[IOException], classOf[FileNotFoundException]).
andFinally(println("how can I close the stream ?")).
either ({
val is = new GZIPInputStream(new FileInputStream(f))
Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
}) match {
case Left(e) =>
val msg = s"IO error reading file ${f.getAbsolutePath} ! on host ${Setup.smtpHost}"
logger.error(msg, e)
MailClient.send(msg, msg)
new Array[Byte](0)
case Right(v) => v
}
}
I rewrote it based on Senia's solution like this :
def gunzip() = {
logger.info(s"Gunziping file ${file.getAbsolutePath}")
def closeAfterReading(c: InputStream)(f: InputStream => Array[Byte]) = {
catching(classOf[IOException], classOf[FileNotFoundException])
.andFinally(c.close())
.either(f(c)) match {
case Left(e) => {
val msg = s"IO error reading file ${file.getAbsolutePath} ! on host ${Setup.smtpHost}"
logger.error(msg, e)
new Array[Byte](0)
}
case Right(v) => v
}
}
closeAfterReading(new GZIPInputStream(new FileInputStream(file))) { is =>
Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
}
}
I prefer this construction for such cases:
def withCloseable[T <: Closeable, R](t: T)(f: T => R): R = {
allCatch.andFinally{t.close} apply { f(t) }
}
def read(f: File) =
withCloseable(new GZIPInputStream(new FileInputStream(f))) { is =>
Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
}
Now you could wrap it with Try and recover on some exceptions:
val result =
Try { read(f) }.recover{
case e: FileNotFoundException => recover(e) // must come first: it is a subclass of IOException
case e: IOException => recover(e) // logging, default value
}
val array = result.get // Exception here!
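The following self-contained sketch shows the same pattern against an in-memory stream, so the closing behaviour can be observed directly. TrackedStream and readAll are hypothetical helpers added for the demonstration, and Iterator.continually is used in place of Stream.continually.

```scala
import java.io.{ByteArrayInputStream, Closeable, IOException, InputStream}
import scala.util.Try
import scala.util.control.Exception.allCatch

object WithCloseableSketch {
  // Same shape as above: run f, then close t whether or not f threw.
  def withCloseable[T <: Closeable, R](t: T)(f: T => R): R =
    allCatch.andFinally(t.close()).apply(f(t))

  // Hypothetical stream that records whether close() was called.
  final class TrackedStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
    @volatile var closed = false
    override def close(): Unit = { closed = true; super.close() }
  }

  // Reads the stream byte by byte until end of input (-1).
  def readAll(in: InputStream): Array[Byte] =
    Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray

  def main(args: Array[String]): Unit = {
    val stream = new TrackedStream("abc".getBytes("UTF-8"))
    // Exceptions propagate out of withCloseable; Try turns them into values.
    val result = Try(withCloseable(stream)(in => readAll(in))).recover {
      case _: IOException => Array.emptyByteArray
    }
    println(s"read ${result.map(_.length)} bytes, closed = ${stream.closed}")
  }
}
```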
Take "scala-arm" and Apache "commons-io", then do the following:
val result =
for {fis <- resource.managed(new FileInputStream(f))
gis <- resource.managed(new GZIPInputStream(fis))}
yield IOUtils.toString(gis, "UTF-8")
result.acquireFor(identity) fold (reportExceptions _, v => v)
One way to handle it would be to keep a mutable list of things that have been opened and need to be closed later:
val cs: Buffer[Closeable] = new ArrayBuffer();
def addClose[C <: Closeable](c: C) = { cs += c; c; }
catching(classOf[IOException], classOf[FileNotFoundException]).
andFinally({ cs.foreach(_.close()) }).
either ({
val is = addClose(new GZIPInputStream(new FileInputStream(f)))
Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
}) // ...
Update: You could use the scala-conduit library (I'm the author) for this purpose. (The library is currently not considered production ready.) The main aim of pipes (AKA conduits) is to construct composable components with well-defined resource handling. Each pipe repeatedly receives input and produces output. Optionally, it also produces a final result when it finishes. Pipes have finalizers that are run after a pipe finishes, either on its own or when its downstream pipe finishes. Your example could be reworked (using Java NIO) as follows:
/**
* Filters buffers until a given character is found. The last buffer
* (truncated up to the character) is also included.
*/
def untilPipe(c: Byte): Pipe[ByteBuffer,ByteBuffer,Unit] = ...
// Create a new source that chunks a file as ByteBuffer's.
// (Note that the buffer changes on every step.)
val source: Source[ByteBuffer,Unit] = ...
// Sink that prints bytes to the standard output.
// You would create your own sink doing whatever you want.
val sink: Sink[ByteBuffer,Unit]
= NIO.writeChannel(Channels.newChannel(System.out));
runPipe(source >-> untilPipe(-1) >-> sink);
As soon as untilPipe(-1) finds -1 and finishes, its upstream source pipe's finalizer is run and the input is closed. If an exception occurs anywhere in the pipeline, the input is closed as well.
The full example can be found here.
I have one more proposal, for cases when a closeable object like java.io.Socket may run for a long time, so one has to wrap it in a Future. This is handy when you also want to control a timeout, for when the Socket is not responding.
object CloseableFuture {
type Closeable = {
def close(): Unit
}
private def withClose[T, F1 <: Closeable](f: => F1, andThen: F1 => Future[T]): Future[T] = future(f).flatMap(closeable => {
val internal = andThen(closeable)
internal.onComplete(_ => closeable.close())
internal
})
def apply[T, F1 <: Closeable](f: => F1, andThen: F1 => T): Future[T] =
withClose(f, {c: F1 => future(andThen(c))})
def apply[T, F1 <: Closeable, F2 <: Closeable](f1: => F1, thenF2: F1 => F2, andThen: (F1,F2) => T): Future [T] =
withClose(f1, {c1:F1 => CloseableFuture(thenF2(c1), {c2:F2 => andThen(c1,c2)})})
}
After I open a java.io.Socket and a java.io.InputStream for it and execute the code that reads from the WhoisServer, both of them get closed at the end. Full code:
CloseableFuture(
{new Socket(server.address, WhoisPort)},
(s: Socket) => s.getInputStream,
(socket: Socket, inputStream: InputStream) => {
val streamReader = new InputStreamReader(inputStream)
val bufferReader = new BufferedReader(streamReader)
val outputStream = socket.getOutputStream
val writer = new OutputStreamWriter(outputStream)
val bufferWriter = new BufferedWriter(writer)
bufferWriter.write(urlToAsk+System.getProperty("line.separator"))
bufferWriter.flush()
def readBuffer(acc: List[String]): List[String] = bufferReader.readLine() match {
case null => acc
case str => {
readBuffer(str :: acc)
}
}
val result = readBuffer(Nil).reverse.mkString("\r\n")
WhoisResult(urlToAsk, result)
}
)
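A more compact variant of the same idea, as a minimal sketch: it uses Future.andThen so that the returned Future completes only after close() has run, and it takes AutoCloseable instead of a structural type. The name withCloseAsync is hypothetical, and timeouts are left to the caller.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureCloseSketch {
  // Opens the resource and uses it asynchronously, then closes it.
  // andThen runs close() on success and on failure, and the resulting
  // Future keeps the original outcome of `use`.
  def withCloseAsync[C <: AutoCloseable, T](open: => C)(use: C => T)(
      implicit ec: ExecutionContext): Future[T] =
    Future(open).flatMap { resource =>
      Future(use(resource)).andThen { case _ => resource.close() }
    }
}
```

If opening the resource fails, there is nothing to close and the failure is carried by the returned Future.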