How do you prevent objects from leaving their enclosing scope - scala

I ran into this while using a PDF library, but there have been plenty of other occasions on which I would have found something like this useful.
There are many situations in which you have a resource (that needs to be closed) and you use that resource to obtain objects that are only valid as long as the resource is open and hasn't been released yet.
Let's say the b reference in the code below is only valid while a is open:
val a = open()
try {
val b = a.someObject()
} finally {
a.close()
}
Now, this code is fine, but this code isn't:
val b = {
val a = open()
try {
a.someObject()
} finally {
a.close()
}
}
With that code I would have a reference to something belonging to resource a, while a is no longer open.
Ideally I'd like to have something like this:
// Not producing an instance of A yet, just capturing the way A needs
// to be opened.
val a = Safe(open()) // Safe[A]
// Just building a function that opens a and extracts b, returning a Safe[B]
val b = a.map(_.someObject()) // Safe[B]
// Shouldn't compile since B is not safe to extract without being in the scope
// of an open A.
b.extract
// The c variable will hold something that is able to exist outside the scope of
// an open A.
val c = b.map(_.toString)
// So this should compile
c.extract

In your example, it is typical that an exception is thrown when you access a stream that is already closed. There is util.Try, which is made for exactly this use case:
scala> import scala.util._
import scala.util._
scala> val s = Try(io.Source.fromFile("exists"))
s: scala.util.Try[scala.io.BufferedSource] = Success(non-empty iterator)
// returns a safe value
scala> s.map(_.getLines().toList)
res21: scala.util.Try[List[String]] = Success(List(hello))
scala> s.map(_.close())
res22: scala.util.Try[Unit] = Success(())
scala> val data = s.map(_.getLines().toList)
data: scala.util.Try[List[String]] = Failure(java.io.IOException: Stream Closed)
// not safe anymore, thus you won't get access to the data with map
scala> data.map(_.length)
res24: scala.util.Try[Int] = Failure(java.io.IOException: Stream Closed)
Like other monads, Try encourages you not to access the wrapped value directly: you compose higher-order functions (map, flatMap, ...) to operate on its value, and failures propagate instead of escaping.
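If you additionally want to make it hard for a b-like value to escape the resource's scope in the first place, the classic loan pattern gets close. A minimal sketch, assuming the resource implements AutoCloseable (scala.io.Source does, via Closeable; your PDF library's type may differ):
def using[A <: AutoCloseable, B](open: => A)(f: A => B): B = {
  val resource = open
  try f(resource)
  finally resource.close()
}
// values tied to the resource are derived while it is open;
// only the final plain result leaves the block
val firstLine = using(io.Source.fromFile("exists")) { source =>
  source.getLines().next()
}
The resource reference itself never leaves the function that opens it, so the only way to "extract" something is to reduce it to a plain value inside the block.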

Related

Calling function library scala

I'm looking to call the ATR function from this Scala wrapper for ta-lib, but I can't figure out how to use the wrapper correctly.
package io.github.patceev.talib
import com.tictactec.ta.lib.{Core, MInteger, RetCode}
import scala.concurrent.Future
object Volatility {
def ATR(
highs: Vector[Double],
lows: Vector[Double],
closes: Vector[Double],
period: Int = 14
)(implicit core: Core): Future[Vector[Double]] = {
val arrSize = highs.length - period + 1
if (arrSize < 0) {
Future.successful(Vector.empty[Double])
} else {
val begin = new MInteger()
val length = new MInteger()
val result = Array.ofDim[Double](arrSize)
core.atr(
0, highs.length - 1, highs.toArray, lows.toArray, closes.toArray,
period, begin, length, result
) match {
case RetCode.Success =>
Future.successful(result.toVector)
case error =>
Future.failed(new Exception(error.toString))
}
}
}
}
Would someone be able to explain how to use the function and print the result to the console?
Many thanks in advance.
Regarding syntax, Scala is one of many languages where you call functions and methods by passing arguments in parentheses (mostly, but let's keep it simple for now):
def myFunction(a: Int): Int = a + 1
myFunction(1) // myFunction is called and returns 2
On top of this, Scala allows you to specify multiple parameter lists, as in the following example:
def myCurriedFunction(a: Int)(b: Int): Int = a + b
myCurriedFunction(2)(3) // myCurriedFunction returns 5
You can also partially apply myCurriedFunction (see the sketch below), but again, let's keep it simple for the time being. The main idea is that you can have multiple lists of arguments passed to a function.
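For reference, partial application looks like this in Scala 2 (the trailing underscore turns the remaining parameter list into a function):
val addTwo: Int => Int = myCurriedFunction(2) _
addTwo(3) // 5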
Building on top of this, Scala allows you to define a list of implicit parameters, which the compiler will automatically retrieve for you based on some scoping rules. Implicit parameters are used, for example, by Futures:
// this defines how and where callbacks are run;
// the compiler will automatically "inject" it for you where needed
implicit val ec: scala.concurrent.ExecutionContext = scala.concurrent.ExecutionContext.global
Future(4).map(_ + 1) // this will eventually result in a Future(5)
Note that both Future and map have a second parameter list that allows you to specify an implicit execution context. With one in scope, the compiler will "inject" it for you at the call site, so you don't have to write it explicitly. You could still have done so, and the result would have been
Future(4)(ec).map(_ + 1)(ec)
That said, I don't know the specifics of the library you are using, but the idea is that you have to instantiate a value of type Core and either bind it to an implicit val or pass it explicitly.
The resulting code will be something like the following
val highs: Vector[Double] = ???
val lows: Vector[Double] = ???
val closes: Vector[Double] = ???
implicit val core: Core = ??? // instantiate core
val resultsFuture = Volatility.ATR(highs, lows, closes) // core is passed implicitly
for (results <- resultsFuture; result <- results) {
println(result)
}
Note that depending on your situation you may also have to provide an implicit ExecutionContext to run this code (because you are extracting the Vector[Double] from a Future). Choosing the right execution context is a separate issue, but to play around you may want to use the global execution context.
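For experimentation, the global context is one standard library import away (equivalent to the implicit val shown earlier):
import scala.concurrent.ExecutionContext.Implicits.global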
Extra
Regarding some of the points I've left open, here are some pointers that hopefully will turn out to be useful:
Operators
Multiple Parameter Lists (Currying)
Implicit Parameters
Scala Futures

Infinite loop when replacing concrete value by parameter name

I have the following two objects (in Scala, using Spark):
1. The main object
object Omain {
def main(args: Array[String]) {
odbscan
}
}
2. The object odbscan
object odbscan {
val conf = new SparkConf().setAppName("Clustering").setMaster("local")
conf.set("spark.driver.maxResultSize", "3g")
val sc = new SparkContext(conf)
val param_user_minimal_rating_count = 2
/*** Connection ***/
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val sql = "SELECT id, data FROM user_profile"
val options = connectMysql.getOptionsMap(sql)
val uSQL = sqlcontext.load("jdbc", options)
val users = uSQL.rdd.map { x =>
val v = x.toString().substring(1, x.toString().size - 1).split(",")
var ap: Map[Int, Double] = Map()
if (v.size > 1)
ap = v(1).split(";").map { y => (y.split(":")(0).toInt, y.split(":")(1).toDouble) }.toMap
(v(0).toInt, ap)
}.filter(_._2.size >= param_user_minimal_rating_count)
println(users.collect().mkString("\n"))
}
When I execute this code I obtain an infinite loop, until I change:
filter(_._2.size >= param_user_minimal_rating_count)
to
filter(_._2.size >= 1)
or any other literal value; in that case the code works and my result is displayed.
What I think is happening here is that Spark serializes functions in order to send them over the wire, and because your function (the one you're passing to map) calls the accessor param_user_minimal_rating_count of object odbscan, the entire odbscan object needs to be serialized and sent along with it. Deserializing and then using that deserialized object causes the code in its body to be executed again, which triggers an infinite loop of serializing --> sending --> deserializing --> executing --> serializing --> ...
Probably the easiest fix here is changing that val to final val param_user_minimal_rating_count = 2, so the compiler will inline the value. But note that this only works for literal constants. For more information see constant value definitions and constant expressions.
Another, better solution is to refactor your code so that no instance variables are used in lambda expressions (see the sketch below). Referencing a val defined in an object or class gets the whole object serialized, so try to refer only to vals that are local to a method. And most importantly, don't execute your business logic from a constructor or the body of an object or class.
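A minimal sketch of that refactoring (loadUsers is a hypothetical helper standing in for the RDD construction from the question): copy the value into a local val inside a method, so the closure captures only that value rather than the enclosing object:
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
object odbscanRefactored {
  // hypothetical helper that builds the same RDD as in the question
  def loadUsers(sc: SparkContext): RDD[(Int, Map[Int, Double])] = ???
  def run(sc: SparkContext): Unit = {
    // local val: the closure below captures only this Int,
    // so Spark never needs to serialize the enclosing object
    val minimalRatingCount = 2
    val filtered = loadUsers(sc).filter(_._2.size >= minimalRatingCount)
    println(filtered.collect().mkString("\n"))
  }
}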
Your problem is somewhere else.
The only difference between the two snippets is the definition of val Eps = 5 outside of the map, which does not change the control flow of your code at all.
Please post more context so we can help.

Best way to handle Error on basic Array

val myArray = Array("1", "2")
val error = myArray(5)//throws an ArrayOutOfBoundsException
myArray does not have a fixed size, which explains how a call like the one performed on the second line above might happen.
First, I never really understood the reasons for using error handling for expected errors. Am I wrong to consider this practice bad, resulting from poor coding skills or an inclination towards laziness?
What would be the best way to handle the above case?
What I am leaning towards:
a basic check (a condition) to prevent accessing the data as depicted;
use Option;
use Try or Either;
use a try-catch block.
1 Avoid addressing elements by index
Scala offers a rich set of collection operations that are applied to Arrays through the ArrayOps implicit conversion. This lets us use combinators like map, flatMap, take, drop, ... on arrays instead of addressing elements by index.
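For instance, on the array from the question, all of these are total operations that never throw for out-of-range positions:
val myArray = Array("1", "2")
myArray.headOption // Some("1"): first element, if any
myArray.take(5)    // Array("1", "2"): at most 5 elements, no exception
myArray.drop(4)    // Array(): empty, no exception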
2 Prevent access out of range
An example I've seen often when parsing CSV-like data (in Spark):
case class Record(id:String, name: String, address:String)
val RecordSize = 3
val csvData = // some comma separated data
val records = csvData.map(line => line.split(","))
.collect{case arr if (arr.size == RecordSize) =>
Record(arr(0), arr(1), arr(2))}
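A hypothetical end-to-end run of that snippet (sc and the sample lines are assumptions; Record and RecordSize are as defined above):
val csvData = sc.parallelize(Seq("1,John,Main St", "2,Jane")) // second line is malformed
val records = csvData.map(_.split(","))
  .collect { case arr if arr.size == RecordSize => Record(arr(0), arr(1), arr(2)) }
records.collect().foreach(println) // prints only Record(1,John,Main St); the malformed line was dropped by the size guard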
3 Use checks that fit in the current context
If we are using monadic constructs to compose access to some resource, use a fitting way of lifting errors into the application flow.
For example, imagine we are retrieving user preferences from some repository and we want the first one:
Option
def getUserById(id:ID):Option[User]
def getPreferences(user:User) : Option[Array[Preferences]]
val topPreference = for {
user <- getUserById(id)
preferences <- getPreferences(user)
topPreference <- preferences.lift(0)
} yield topPreference
(or even better, applying advice #1):
val topPreference = for {
user <- getUserById(id)
preferences <- getPreferences(user)
topPreference <- preferences.headOption
} yield topPreference
Try
def getUserById(id:ID): Try[User]
def getPreferences(user:User) : Try[Array[Preferences]]
val topPreference = for {
user <- getUserById(id)
preferences <- getPreferences(user)
topPreference <- Try(preferences(0))
} yield topPreference
As general guidance: Use the principle of least power.
If possible, use error-free combinators: array.drop(4).take(1)
If all that matters is having an element or not, use Option
If we need to preserve the reason why we could not find an element, use Try.
Let the types and context of the program guide you.
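Putting the three levels side by side on the array from the question:
val myArray = Array("1", "2")
myArray.drop(4).take(1)    // Array(): error-free combinators, no failure to model
myArray.lift(5)            // None: Option, when presence is all that matters
scala.util.Try(myArray(5)) // Failure(java.lang.ArrayIndexOutOfBoundsException): Try keeps the reason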
If indexing myArray can be expected to error on occasion, then it sounds like Option would be the way to go.
myArray.lift(1) // Option[String] = Some("2")
myArray.lift(5) // Option[String] = None
You could use Try() but why bother if you already know what the error is and you're not interested in catching or reporting it?
Use arr.lift (available in the standard library), which returns an Option instead of throwing an exception.
If that doesn't suit you, define a helper such as safely below: it accesses the element safely instead of accidentally throwing exceptions in the middle of the code.
implicit class ArrUtils[T](arr: Array[T]) {
import scala.util.Try
def safely(index: Int): Option[T] = Try(arr(index)).toOption
}
Usage:
arr.safely(4)
REPL
scala> val arr = Array(1, 2, 3)
arr: Array[Int] = Array(1, 2, 3)
scala> implicit class ArrUtils[T](arr: Array[T]) {
import scala.util.Try
def safely(index: Int): Option[T] = Try(arr(index)).toOption
}
defined class ArrUtils
scala> arr.safely(4)
res5: Option[Int] = None
scala> arr.safely(1)
res6: Option[Int] = Some(2)

How to get truly atomic update for TrieMap.getOrElseUpdate

As I understand it, TrieMap.getOrElseUpdate is still not truly atomic, and this fix only addressed the returned result (before it, different callers could receive different instances), so the updater function might still be called several times, as the documentation (for 2.11.7) says:
Note: This method will invoke op at most once. However, op may be invoked without the result being added to the map if a concurrent process is also trying to add a value corresponding to the same key k.
(I've checked this manually for 2.11.7: op can still be invoked more than once.)
How can I guarantee a one-time call (I want to use TrieMap for factories)?
I think this solution should work for my requirements:
trait LazyComp { val get: Int }
val map = new TrieMap[String, LazyComp]()
val count = new AtomicInteger() //just for test, you don't need it
def getSingleton(key: String) = {
val v = new LazyComp {
lazy val get = {
//compute something
count.incrementAndGet() //just for test, you don't need it
}
}
map.putIfAbsent(key, v).getOrElse(v).get
}
I believe lazy val actually uses synchronized internally (see the sketch below). Also, the code inside get should be safe from exceptions.
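Roughly, a lazy val expands to double-checked locking along these lines (a simplified sketch, not the exact code scalac emits; SIP-20 discusses the real scheme and its costs):
final class LazyCell(compute: () => Int) {
  @volatile private[this] var initialized = false
  private[this] var value: Int = _
  def get: Int = {
    if (!initialized) synchronized {
      if (!initialized) {
        value = compute() // runs at most once per instance
        initialized = true
      }
    }
    value
  }
}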
However, performance could be improved in future: SIP-20
Test:
scala> (0 to 10000000).par.map(_ => getSingleton("zzz")).last
res8: Int = 1
P.S. Java has the computeIfAbsent method on ConcurrentHashMap, which I could use as well.
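For completeness, that Java route works from Scala too; a sketch assuming Scala 2.12+, where Scala lambdas convert to Java functional interfaces:
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
val calls = new AtomicInteger() // just for observing how often the function runs
val jmap = new ConcurrentHashMap[String, Int]()
// the whole invocation is performed atomically, so the mapping
// function is applied at most once per key
jmap.computeIfAbsent("zzz", _ => calls.incrementAndGet())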

Make a Scala interpreter oblivious between interpret calls

Is it possible to configure a Scala interpreter (tools.nsc.IMain) so that it "forgets" the previously executed code, whenever I run the next interpret() call?
Normally when it compiles the sources, it wraps them in nested objects, so all the previously defined variables, functions and bindings are available.
It would suffice to not generate the nested objects (or to throw them away), although I would prefer a solution which would even remove the previously compiled classes from the class loader again.
Is there a setting, or a method, or something I can overwrite, or an alternative to IMain that would accomplish this? I need to be able to still access the resulting objects / classes from the host VM.
Basically I want to isolate subsequent interpret() calls without something as heavyweight as creating a new IMain for each iteration.
Here is one possible answer. Basically there is a method reset() which calls the following things (mostly private, so either you buy the whole package or you don't):
clearExecutionWrapper()
resetClassLoader()
resetAllCreators()
prevRequests.clear()
referencedNameMap.clear()
definedNameMap.clear()
virtualDirectory.clear()
In my case, I am using a custom execution wrapper, so that needs to be set up again; imports are also handled through a regular interpret cycle, so either add them again or, better, just prepend them via the execution wrapper.
I would like to keep my bindings, but they are also gone:
import tools.nsc._
import interpreter.IMain
object Test {
private final class Intp(cset: nsc.Settings)
extends IMain(cset, new NewLinePrintWriter(new ConsoleWriter, autoFlush = true)) {
override protected def parentClassLoader = Test.getClass.getClassLoader
}
object Foo {
def bar() { println("BAR") }
}
def run() {
val cset = new nsc.Settings()
cset.classpath.value += java.io.File.pathSeparator + sys.props("java.class.path")
val i = new Intp(cset)
i.initializeSynchronous()
i.bind[Foo.type]("foo", Foo)
val res0 = i.interpret("foo.bar(); val x = 33")
println(s"res0: $res0")
i.reset()
val res1 = i.interpret("println(x)")
println(s"res1: $res1")
i.reset()
val res2 = i.interpret("foo.bar()")
println(s"res2: $res2")
}
}
This will find Foo in the first iteration, correctly forget x in the second iteration, but then in the third iteration, it can be seen that the foo binding is also lost:
foo: Test.Foo.type = Test$Foo$@8bf223
BAR
x: Int = 33
res0: Success
<console>:8: error: not found: value x
println(x)
^
res1: Error
<console>:8: error: not found: value foo
foo.bar()
^
res2: Error
The following seems to be fine:
for(j <- 0 until 3) {
val user = "foo.bar()"
val synth = """import Test.{Foo => foo}
""".stripMargin + user
val res = i.interpret(synth)
println(s"res$j: $res")
i.reset()
}