Data compression in Scala

Below is my attempt to implement a helper object for compressing and decompressing strings:
// Base64 and IOUtils are assumed to come from Apache Commons Codec and Commons IO
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import org.apache.commons.codec.binary.Base64
import org.apache.commons.io.IOUtils
import scala.util.{Failure, Success, Try}
object GZipHelper {
  def deflate(txt: String): Try[String] = {
    try {
      val arrOutputStream = new ByteArrayOutputStream()
      val zipOutputStream = new GZIPOutputStream(arrOutputStream)
      zipOutputStream.write(txt.getBytes)
      Success(Base64.encodeBase64String(arrOutputStream.toByteArray))
    } catch {
      case e: Exception => Failure(e)
    }
  }
  def inflate(deflatedTxt: String): Try[String] = {
    try {
      val bytes = Base64.decodeBase64(deflatedTxt)
      val zipInputStream = new GZIPInputStream(new ByteArrayInputStream(bytes))
      Success(IOUtils.toString(zipInputStream))
    } catch {
      case e: Exception => Failure(e)
    }
  }
}
As you can see, the finally blocks that close GZIPOutputStream and GZIPInputStream are missing. How could I implement this the "Scala" way? How could I improve the code?

Since you're using the "old fashioned" try statement and explicitly turning it into a scala.util.Try, there really is no reason not to add a finally block after your try.
In this specific case though, there is little point in closing, for example, your ByteArrayInputStream - it's not really an open resource and does not need to be closed. In which case you can simplify your code and make it much more idiomatic this way:
def inflate(deflatedTxt: String): Try[String] = Try {
  val bytes = Base64.decodeBase64(deflatedTxt)
  val zipInputStream = new GZIPInputStream(new ByteArrayInputStream(bytes))
  IOUtils.toString(zipInputStream)
}
I personally would not declare bytes and zipInputStream since they're only used once, but that's a matter of preference.
The trick here is having a finally block with a call to scala.util.Try.apply - I'm not sure that's possible without going through a call to map that doesn't actually modify anything, which seems like a bit of an oversight to me. I was expecting to see an andThen or eventually method in scala.util.Try, but it doesn't seem to be there (yet?).
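For what it's worth, on Scala 2.13 or later scala.util.Using seems to fill this gap: it runs a block, closes the resource afterwards and returns a Try. Below is a minimal sketch under a few assumptions that are not in the original post: java.util.Base64 replaces the Commons Codec class, the text is UTF-8, the JVM is Java 9+ (for readAllBytes), and the object name GzipUsing is made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.util.{Try, Using}

// Hypothetical name; a sketch, not the original GZipHelper.
object GzipUsing {
  // Gzips txt and returns the compressed bytes as a Base64 string.
  def deflate(txt: String): Try[String] = {
    val bytes = new ByteArrayOutputStream()
    // Using closes the GZIPOutputStream, which writes the gzip trailer,
    // before the byte array is read in the map below.
    Using(new GZIPOutputStream(bytes))(_.write(txt.getBytes(UTF_8)))
      .map(_ => Base64.getEncoder.encodeToString(bytes.toByteArray))
  }

  // Decodes the Base64 string and gunzips it back to text.
  def inflate(deflatedTxt: String): Try[String] =
    Try(Base64.getDecoder.decode(deflatedTxt)).flatMap { bytes =>
      Using(new GZIPInputStream(new ByteArrayInputStream(bytes))) { in =>
        new String(in.readAllBytes(), UTF_8) // readAllBytes needs Java 9+
      }
    }
}
```

Both methods return a Failure instead of throwing, so a round trip can be chained with flatMap: GzipUsing.deflate(text).flatMap(GzipUsing.inflate).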

Just for completeness, here is the deflate method converted (original version was also missing the close() call on the GZIP class):
def deflate(txt: String): Try[String] = Try {
  val arrOutputStream = new ByteArrayOutputStream()
  val zipOutputStream = new GZIPOutputStream(arrOutputStream)
  zipOutputStream.write(txt.getBytes)
  zipOutputStream.close()
  Base64.encodeBase64String(arrOutputStream.toByteArray)
}

https://github.com/jsuereth/scala-arm/wiki/basic-usage
this looks like a good approach

You can use scala-compress: https://github.com/gekomad/scala-compress
Compress a string:
val aString: String = "foo"
val compressed: Try[Array[Byte]] = zipString(aString, charSetName = "UTF-8")
Decompress:
val compressedArray: Array[Byte] = ???
val decompressed: Try[Array[Byte]] = unzipString(compressedArray)
// decompressed is a Try, so map over it instead of passing it to the String constructor
val text: Try[String] = decompressed.map(bytes => new String(bytes, "UTF-8"))

Related

How to perform a try and catch in ZIO?

These are my first steps with ZIO. I am trying to convert this readfile function to a ZIO-compatible version.
The snippet below compiles, but I am not closing the source in the ZIO version. How do I do that?
def run(args: List[String]) =
  myAppLogic.exitCode
val myAppLogic =
  for {
    _ <- readFileZio("C:\\path\\to\\file.csv")
    _ <- putStrLn("Hello! What is your name?")
    name <- getStrLn
    _ <- putStrLn(s"Hello, ${name}, welcome to ZIO!")
  } yield ()
def readfile(file: String): String = {
  val source = scala.io.Source.fromFile(file)
  try source.getLines.mkString finally source.close()
}
def readFileZio(file: String): zio.Task[String] = {
  val source = scala.io.Source.fromFile(file)
  ZIO.fromTry[String] {
    Try { source.getLines.mkString }
  }
}
The simplest solution is the bracket function, which serves much the same purpose as a try-finally block. Called on the effect that acquires the resource, it takes as its first argument an effect that releases the resource (in your case the Source) and as its second an effect that uses it.
So you could rewrite readFileZio like this:
def readFileZio(file: String): Task[List[String]] =
  ZIO(Source.fromFile(file))
    .bracket(
      s => URIO(s.close()),
      // getLines() is lazy, so force it with toList before the source is released
      s => ZIO(s.getLines().toList)
    )
Another option is ZManaged, a data type that encapsulates acquiring and releasing a resource:
def managedSource(file: String): ZManaged[Any, Throwable, BufferedSource] =
  Managed.make(ZIO(Source.fromFile(file)))(s => URIO(s.close()))
And then you could use it like this:
def readFileZioManaged(file: String): Task[List[String]] =
  managedSource(file).use(s => ZIO(s.getLines().toList))

make scala future wait to modify a variable

I am stuck on how to obtain a ListBuffer of strings when the buffer is populated inside Scala Futures that are created in a loop.
Here is a simple (KISS) example:
def INeedThatListBuffer(): ListBuffer[String] = {
  var myCollections: ListBuffer[String] = new ListBuffer[String]()
  for (day <- daysInAWeek) {
    val myFuture: Future[String] = Future {
      // use 'day' do some stuff and get me a result
      ???
    }
    myFuture.onComplete {
      case Success(result) =>
        myCollections += result
    }
  }
  myCollections
}
My problem is that the returned ListBuffer is sometimes empty and sometimes holds the content I expected. Clearly, the method returns before the futures have completed.
Just to add:
I don't want to use Await on the future.
Passing myCollections around as a Future does not work, as nothing guarantees that myFuture completes before myCollections is evaluated.
Kindly help me out.
Thanks
The following returns a Future instead. If you don't want to wait for it to complete, you can always inspect its current state with .value, which is None until the future completes:
def INeedThatListBuffer(): Future[ListBuffer[String]] = {
  def buildFutureFromDay(day: String): Future[String] = Future { ??? }
  Future
    .sequence(daysInAWeek.map(buildFutureFromDay))
    .map(_.foldLeft(ListBuffer[String]())(_ += _))
}
You need to await at some point. Either await each future as it is created, or change the ListBuffer[String] to a ListBuffer[Future[String]] and await the whole buffer later.
// requires import scala.concurrent.duration._ and an implicit ExecutionContext in scope
val myFutureCollection: ListBuffer[Future[String]] = new ListBuffer[Future[String]]()
val myFuture: Future[String] = Future(???)
myFutureCollection += myFuture
val eventualBuffer: Future[ListBuffer[String]] = Future.sequence(myFutureCollection)
val buffer: ListBuffer[String] = Await.result(eventualBuffer, 2.seconds)
P.S.: You can use val instead of var for the list buffer, as it is already mutable.
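To make the two answers above concrete, here is a small self-contained sketch of the Future.sequence approach without any shared mutable buffer. The names CollectFutures, days and process are placeholders invented for this example; process stands in for whatever work the question does per day.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object CollectFutures {
  // Hypothetical input, standing in for daysInAWeek from the question.
  val days: List[String] = List("Mon", "Tue", "Wed")

  // Hypothetical per-day computation.
  def process(day: String): Future[String] = Future(day.toUpperCase)

  // Completes only once every per-day future has completed;
  // the results keep the order of `days`.
  def collectAll(): Future[List[String]] = Future.sequence(days.map(process))
}
```

The caller can keep composing on the returned Future with map or flatMap, or, at the very edge of the program, block once with Await.result(CollectFutures.collectAll(), 2.seconds) after importing scala.concurrent.Await and scala.concurrent.duration._.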

Most idiomatic way to write try/catch/finally in Scala?

What is the best way to write the following in Scala? It doesn't look quite right to me: first the forward declaration of the two vals, then the long PrintWriter creation line, then the finally block. The only thing that's idiomatic is the catch block...
val outputStream = Try(fs.create(tmpFile))
val writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(outputStream.get)))
if (outputStream.isFailure) {
  logger.error(s"Couldn't open file: $tmpFile")
}
try {
  features.foreach {
    case (sectionName, modelRDD) =>
      writer.append("{" + sectionName + ", " + modelRDD.getNumPartitions + "}")
  }
} catch {
  case e: Exception =>
    logger.error(s"Got exception", e)
    throw e
} finally {
  outputStream.get.close()
  writer.close()
}
We can further use the context of the initial Try to execute the complete I/O operation:
First, we define a function that encapsulates our process:
def safeFilePrint(tf: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
  val os = Try(tf)
  val write = {
    val writer = os.map(f => new PrintWriter(f))
    val writeOp = writer.map(op)
    val flushOp = writer.map(_.flush())
    writeOp.flatMap(_ => flushOp)
  }
  val close = os.map(_.close())
  write.flatMap(_ => close)
}
And usage:
val collection = Seq(...)
val writeResult = safeFilePrint(new FileOutputStream(new File("/tmp/foo.txt"))) { w =>
  collection.foreach(elem => w.write(elem))
}
Note that, in contrast with the original code, we have a result for the write operation: writeResult will be Success(()) if everything went well, or Failure(exception) if something went wrong. Based on that, our application can decide what to do next.
One may wonder: "Where is the finally?" In Java, finally is used to ensure that some code (typically resource management) would be executed, even in the case that an exception thrown in the try scope would cause an exception-handling path to be followed.
In Scala, using constructs like Try, Either or our own ADT, we lift error handling to the application level. finally becomes unnecessary as our program is able to deal with the failure as just another valid state of the program.
I finally settled on the following code after reading @maasg's answer, which highlights the monadic flow and is more "symmetric". It looks much, much better than the code in the OP!
def safePrintToStream(gen: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
  val os = Try(gen)
  val writer = os.map(stream => new PrintWriter(stream))
  val write = writer.map(op(_))
  val flush = writer.map(_.flush())
  val close = os.map(_.close())
  write.flatMap(_ => flush).flatMap(_ => close)
}
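To see what happens when the write itself fails, here is a small sketch that exercises safePrintToStream against an in-memory stream. SafePrintDemo and TrackingStream are hypothetical names made up for this check, and the function body is repeated only so the snippet compiles on its own.

```scala
import java.io.{ByteArrayOutputStream, OutputStream, PrintWriter}
import scala.util.Try

object SafePrintDemo {
  // Same shape as safePrintToStream above, repeated so the sketch is self-contained.
  def safePrintToStream(gen: => OutputStream)(op: PrintWriter => Unit): Try[Unit] = {
    val os = Try(gen)
    val writer = os.map(stream => new PrintWriter(stream))
    val write = writer.map(op(_))
    val flush = writer.map(_.flush())
    val close = os.map(_.close())
    write.flatMap(_ => flush).flatMap(_ => close)
  }

  // Hypothetical test stream that records whether close() was called.
  final class TrackingStream extends ByteArrayOutputStream {
    var closed = false
    override def close(): Unit = { closed = true; super.close() }
  }

  // Happy path: the text is written, flushed and the stream is closed.
  val okStream = new TrackingStream
  val ok: Try[Unit] = safePrintToStream(okStream)(_.write("hello"))

  // Failing write: the result is a Failure, yet the stream is still closed,
  // because each Try step is evaluated eagerly regardless of the previous one.
  val failingStream = new TrackingStream
  val failed: Try[Unit] =
    safePrintToStream(failingStream)(_ => throw new IllegalStateException("boom"))
}
```

The close step runs even when op throws because os is still a Success at that point, which is what stands in for finally here.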

Read file in Scala : Stream closed

I try to read a file in Scala like this:
def parseFile(filename: String) = {
  val source = scala.io.Source.fromFile(filename)
  try {
    val lines = source.getLines().map(line => line.trim.toDouble)
    lines
  } catch {
    // re-throw exception, but make sure source is closed
    case t: Throwable => {
      println("error during parsing of file")
      throw t
    }
  } finally {
    source.close()
  }
}
When I access the result later, I get a
java.io.IOException: Stream Closed
I understand that this arises because source.getLines() only returns a (lazy) Iterator[String], and I already close the BufferedSource in the finally clause.
How can I avoid this error, i.e. how can I "evaluate" the Stream before closing the source?
EDIT: I tried to call source.getLines().toSeq which did not help.
Maybe you can try the following solution, which makes the code more functional and takes advantage of lazy evaluation.
First, define a helper function, using, which takes care of opening and closing the file.
// structural type; in Scala 2, import scala.language.reflectiveCalls to avoid a feature warning
def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
  try f(param) finally param.close()
Then you can refactor your code in a functional programming style:
using(Source.fromFile(filename)) { source =>
  val lines = Try(source.getLines().map(line => line.trim.toDouble))
  val result = lines.flatMap(l => Try(processOrDoWhatYouWantForLines(l)))
  result.get
}
Actually, the using function can be used for handling all resources which need to be closed at the end of the operation.
List is not lazy so change:
val lines = source.getLines().map(line => line.trim.toDouble)
to
val lines = source.getLines().toList.map(line => line.trim.toDouble)
in order to force computing.
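Put together, the fix looks roughly like the sketch below. The toSeq attempt from the question probably did not help because, on Scala 2.12 and earlier, Iterator.toSeq returns a lazy Stream, so nothing was read before the close. ParseDoubles and demo are hypothetical names, and the temporary file only exists to make the example runnable.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source

object ParseDoubles {
  // toList reads every line inside the try, so closing the source afterwards is safe.
  def parseFile(path: Path): List[Double] = {
    val source = Source.fromFile(path.toFile)
    try source.getLines().map(_.trim.toDouble).toList
    finally source.close()
  }

  // Hypothetical helper: writes sample data to a temporary file, parses it, then deletes it.
  def demo(): List[Double] = {
    val tmp = Files.createTempFile("doubles", ".txt")
    try {
      Files.write(tmp, "1.5\n 2.0 \n3\n".getBytes("UTF-8"))
      parseFile(tmp)
    } finally Files.delete(tmp)
  }
}
```

The returned List is fully materialized, so it can be used long after the file has been closed.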

Scala try with finally best practice

I have the following implementation where I'm trying to handle proper resource closing during any fatal exceptions:
private def loadPrivateKey(keyPath: String) = {
  def tryReadCertificate(file: File): Try[BufferedReader] = Try { new BufferedReader(new FileReader(file)) }
  def tryLoadPemParser(reader: BufferedReader): Try[PEMParser] = Try { new PEMParser(reader) }
  def createXXX(buffReader: BufferedReader, pemParser: PEMParser) = try {
    ...
  } finally {
    buffReader.close()
    pemParser.close()
  }
  tryReadCertificate(new File(keyPath, "myKey.pem")) match {
    case Success(buffReader) => tryLoadPemParser(buffReader) match {
      case Success(pemParser) => createXXX(buffReader, pemParser)
      case Failure(fail) =>
    }
    case Failure(fail) =>
  }
}
I already see that my nested case blocks are a mess. Is there a better way to do this? In the end, I just want to make sure that I close the BufferedReader and the PEMParser!
You could restructure your code a little like this, using a for-comprehension to clean up some of the nested case statements:
def tryReadCertificate(file: File): Try[BufferedReader] = Try { new BufferedReader(new FileReader(file)) }
def tryLoadPemParser(reader: BufferedReader): Try[PEMParser] = Try { new PEMParser(reader) }
def createXXX(buffReader: BufferedReader, pemParser: PEMParser) = {
  ...
}
val certReaderTry = tryReadCertificate(new File(keyPath, "myKey.pem"))
val pemParserTry = for {
  certReader <- certReaderTry
  pemParser <- tryLoadPemParser(certReader)
} yield {
  createXXX(certReader, pemParser)
  pemParser
}
certReaderTry foreach (_.close)
pemParserTry foreach (_.close)
Structured like this, you will only ever end up calling close on things you are sure were opened successfully.
And even better, if your PEMParser happened to extend java.io.Closeable, meaning that the Trys both wrapped Closeable objects, then you could swap those last two lines for a single line like this:
(certReaderTry.toOption ++ pemParserTry.toOption) foreach (_.close)
EDIT
In response to the OP's comment: in the first example, if tryReadCertificate succeeds, then certReaderTry will be a Success[BufferedReader], and because it's successful, calling foreach on it will yield the BufferedReader, which will then have close called on it. If certReaderTry is a Success, then (via the for-comprehension) we call tryLoadPemParser, and if that also succeeds, we move on to createXXX and the resulting PEMParser ends up in the pemParserTry val. Later, if pemParserTry is a Success, the same thing happens: foreach yields the PEMParser and we can close it. In this example, as long as those Trys are successes and nothing unexpected happens (in createXXX, for example) that throws an exception all the way out, the closing code at the end is guaranteed to run and close those resources.
EDIT2
If you wanted the value from createXXX in a separate Try, then you could do something like this:
val certReaderTry = tryReadCertificate(new File(keyPath, "myKey.pem"))
val pemParserTry = certReaderTry.flatMap(tryLoadPemParser)
val resultTry = for {
  certReader <- certReaderTry
  pemParser <- pemParserTry
} yield createXXX(certReader, pemParser)
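If Scala 2.13 or later is available, scala.util.Using.Manager may be worth a look as well, since it also closes the resources when the body throws. The sketch below is only an illustration under that assumption: it uses plain java.io readers instead of PEMParser, and ManagedReaders and firstLine are made-up names.

```scala
import java.io.{BufferedReader, Reader}
import scala.util.{Try, Using}

object ManagedReaders {
  // Hypothetical stand-in for loadPrivateKey: every resource passed to `use`
  // is closed in reverse order of registration, whether or not the body throws.
  def firstLine(open: => Reader): Try[String] =
    Using.Manager { use =>
      val raw = use(open)                         // fails here => Failure, nothing to close
      val buffered = use(new BufferedReader(raw)) // registered second, closed first
      buffered.readLine()
    }
}
```

The result is a Try, so a failure to open the first resource and a failure inside the body both surface as a Failure without any nested pattern matching.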