I am trying to read incremental data from my data source using Scala and Spark. Before querying the source tables, I calculate the min and max of the partition column used in my code. This happens inside a Future in a class called GetSourceMeta, as shown below.
def getBounds(keyIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
keyIdMap.keys.foreach(table => if(!keyIdMap(table).contains("Invalid")) {
val minMax = s"select max(insert_tms) maxTms, min(insert_tms) minTms from schema.${table} where source='DB2' and key_id in (${keyIdMap(table)})"
println("MinMax: " + minMax)
val boundsDF = spark.read.format("jdbc").option("url", con.getConUrl()).option("dbtable", s"(${minMax}) as ctids").option("user", con.getUserName()).option("password", con.getPwd()).load()
try {
val maxTms = boundsDF.select("minTms").head.getTimestamp(0).toString + "," + boundsDF.select("maxTms").head.getTimestamp(0).toString
println("Bounds: " + maxTms)
boundsMap += (table -> maxTms)
} catch {
case np: java.lang.NullPointerException => { println("No data found") }
case e: Exception => { println(s"Unknown exception: $e") }
}
}
)
boundsMap.foreach(println)
boundsMap
}
I am calling the above method in my main method as:
object LoadToCopyDB {
val conf = new SparkConf().setAppName("TEST_YEAR").set("some parameters")
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder().config(conf).master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
val gsm = new GetSourceMeta()
val minMaxKeyMap = gsm.getBounds(keyIdMap).onComplete {
case Success(values) => values.foreach(println)
case Failure(f) => f.printStackTrace
}
.
.
.
}
Well, onComplete didn't print any values, so I used andThen as below, and that didn't help either.
val bounds: Future[scala.collection.mutable.Map[String, String]] = gpMetaData.getBounds(incrementalIds) andThen {
case Success(outval) => outval.foreach(println)
case Failure(e) => println(e)
}
At first, the main thread exited without letting the getBounds Future run, so none of the Future's println output appeared in the terminal. I found out that I need to make the main thread wait with Await so that the Future can complete. But when I use Await in main together with onComplete:
Await.result(bounds, Duration.Inf)
The compiler gives an error:
Type mismatch, expected: Awaitable[NotInferedT], actual:Unit
If I declare minMaxKeyMap as Future[scala.collection.mutable.Map[String, String]], the compiler says: Expression of type Unit doesn't conform to expected type Future[mutable.map[String,String]]
I tried to print the values of bounds after the Await statement but that just prints an empty Map.
I can't work out how to fix this. Could anyone tell me what I need to do to make the Future run properly?
In cases like this, it is always better to follow the types. The onComplete method returns Unit, not a Future, so its result can't be passed to Await.
If you want a Future that you can wait on, use map or flatMap to transform the value and return, for example, an Option. Here it does not matter what you return: you only want Await to wait for the result and print a trace. You can handle a possible exception in recover. In your code it would look like this:
val minMaxKeyMap: Future[Option[Any]] = gsm.getBounds(keyIdMap).map { values =>
values.foreach(println)
None
}.recover {
case e: Throwable =>
e.printStackTrace()
None
}
Note that the recover block has to return a value of the same type as the mapped Future (here an Option).
After that, you can apply Await to the Future and you will see the results printed. It is not the prettiest solution, but it will work in your case.
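For illustration, here is a minimal self-contained sketch of the same pattern. It assumes a hypothetical `BoundsDemo.getBounds` stub that returns made-up bounds instead of querying Spark over JDBC, and it uses an immutable Map for simplicity. The point is the types: `map`/`recover` give back a `Future` that `Await.result` can wait on, while `onComplete` only registers a callback and returns `Unit`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object BoundsDemo {
  // Hypothetical stand-in for GetSourceMeta.getBounds: no Spark or JDBC involved,
  // every valid table simply gets the same made-up "min,max" string.
  def getBounds(keyIdMap: Map[String, String]): Future[Map[String, String]] = Future {
    keyIdMap.collect {
      case (table, ids) if !ids.contains("Invalid") => table -> "2020-01-01,2020-12-31"
    }
  }

  // map/recover return a new Future, so the caller can wait on it.
  def printedBounds(keyIdMap: Map[String, String]): Future[Option[Map[String, String]]] =
    getBounds(keyIdMap)
      .map { values =>
        values.foreach(println)
        Some(values)
      }
      .recover { case e: Throwable =>
        e.printStackTrace()
        None
      }

  def run(): Option[Map[String, String]] = {
    val keyIds = Map("orders" -> "1,2,3", "items" -> "Invalid")
    // onComplete only registers a callback; its result is Unit, not a Future.
    val registered: Unit = getBounds(keyIds).onComplete(_ => ())
    // Blocks until the mapped Future has completed, then returns its value.
    Await.result(printedBounds(keyIds), Duration.Inf)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because `run` waits on the mapped future, `main` cannot exit before the values have been printed.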
Related
I'm seeing some strange behaviour and I'm not sure exactly where I'm going wrong. The tester2 function returns a Future[Option[Boolean]]. I want to wait for it to complete and, once it has, return a List[String] based on the cases inside the reset function. The problem is that instead of a List[String] I get a Future[List[String]], and I don't understand why the match behaves like this.
This is the exact line where I get the error:
val les = Await.ready(tester2(5),Duration.Inf).map(reset).forEach(println)
object HelloWorld {
def main(args: Array[String]) {
val exp = tester2(5).map(reset)
val les = Await.ready(tester2(5),Duration.Inf).map(reset).forEach(println)
println(s"what do you say ${les}")
}
def reset (x: Option[Boolean]): List[String] =
x match {
case None => List("abc","def")
case Some(false) => List("abc","def")
case Some(true) => List("def","abc")
}
def tester():Future[Option[Message]]={
Future{
Thread.sleep(5000)
Option(Message("abc","def","ghi"))
}
}
def tester2(param:Int):Future[Option[Boolean]]={
Future{
Thread.sleep(5000)
if(param>10){
Some(true)
}else{
Some(false)
}
}
}
If tester2 returns a Future of an Option of a Boolean
def tester2(param:Int):Future[Option[Boolean]] = ???
and you want to turn that value into a List[String], you need to say "when this future completes and there is a real Option[Boolean], then do this thing". That is what map does on a Future: it says "once the future completes, run this code". So you can do this:
def reset (in :Future[Option[Boolean]]) = in.map { optionOfBoolean :Option[Boolean] =>
optionOfBoolean match {
case None => ...
case Some(true) => ...
}
}
Scala also lets you combine the map and the match into one step by passing the case clauses directly to map:
def reset (in :Future[Option[Boolean]]) = in map {
case None => List("abc", "bcd")
case Some(true) => List("d3", "d4")
case Some(false) => List("sds", "dssds")
}
Since I can't see your error, I can't help further, but something like this should work:
val booleanResult :Future[Option[Boolean]] = tester2(...)
val futureListStr :Future[List[String]] = reset(booleanResult)
val answer :List[String] = Await.result(futureListStr, scala.concurrent.duration.Duration.Inf)
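Putting the pieces together, a runnable sketch might look like the following. It reuses the lists from the question's `reset`, assumes the global execution context, and shortens the sleep to 100 ms; `ResetDemo` and `answer` are illustrative names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ResetDemo {
  // Same shape as the question's tester2, with a shorter sleep.
  def tester2(param: Int): Future[Option[Boolean]] = Future {
    Thread.sleep(100)
    Some(param > 10)
  }

  // Case clauses passed straight to map; the result is still a Future.
  def reset(in: Future[Option[Boolean]]): Future[List[String]] = in.map {
    case None        => List("abc", "def")
    case Some(false) => List("abc", "def")
    case Some(true)  => List("def", "abc")
  }

  // Await.result blocks and unwraps the Future into a plain List[String].
  def answer(param: Int): List[String] =
    Await.result(reset(tester2(param)), Duration.Inf)

  def main(args: Array[String]): Unit = answer(5).foreach(println)
}
```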
Use Await.result to extract the result value.
final def result[T](awaitable: Awaitable[T], atMost: Duration): T
Await and return the result (of type T) of an Awaitable.
awaitable: the Awaitable to be awaited
atMost: maximum wait time, which may be negative (no waiting is done), Duration.Inf for unbounded waiting, or a finite positive duration
returns: the result value if awaitable is completed within the specific maximum wait time
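A quick sketch contrasting the two methods, which may explain the original line: `Await.ready` returns the very awaitable it was given (its declared result type is `awaitable.type`), so calling `.map(reset)` on it still produces a `Future`. `Await.result` returns the value itself and throws a `TimeoutException` if `atMost` elapses first. `ReadyVsResult` is an illustrative name.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object ReadyVsResult {
  def demo(): (Boolean, Option[Int], Int, Boolean) = {
    val f: Future[Option[Int]] = Future(Some(42))

    // Await.ready hands back the same Future object once it has completed.
    val readied: Future[Option[Int]] = Await.ready(f, Duration.Inf)
    val sameFuture = readied eq f

    // Await.result hands back the value inside the Future.
    val value: Option[Int] = Await.result(f, Duration.Inf)

    // Mapping the "ready" future still yields a Future, which needs its own Await.result.
    val mapped: Future[Int] = readied.map(_.getOrElse(0) + 1)
    val mappedValue = Await.result(mapped, 1.second)

    // A future that never completes makes Await.result throw TimeoutException.
    val never = Promise[Int]().future
    val timedOut = Try(Await.result(never, 10.millis)).failed.toOption
      .exists(_.isInstanceOf[TimeoutException])

    (sameFuture, value, mappedValue, timedOut)
  }
}
```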
I'm trying to fetch rows from a database and return them as a Future. But when I access the future's result, I don't get any output.
Below is the code snippet:
def getAll(): Future[Iterable[Employee]] = {
Future{
fetchEmployees()
}(ec)
}
def fetchEmployees(): Iterable[Employee]={
var empList = ListBuffer[Employee]()
db.withConnection{ conn =>
val statement = conn.createStatement()
val rs = statement.executeQuery("Select * from Employee")
while (rs.next()){
println(rs.getString("EmpCode")+" "+rs.getString("FirstName")+" "+rs.getString("LastName")+" "+rs.getString("Department"))
val emp = Employee(rs.getString("EmpCode"),rs.getString("FirstName"),rs.getString("LastName"),rs.getString("Department"))
empList.appended(emp)
}
}
empList
}
This is where I try to access the returned Future:
def findAll: Future[Iterable[EmployeeResource]] = {
println("Inside resource handler")
repository.getAll().map(iterableEmp => {
iterableEmp.foreach(emp => println(s"Name is ${emp.firstName}"))
iterableEmp.map(emp=>createResource(emp))
})(ec)
}
Prints nothing.
Look at the documentation for the appended method, and note its name: it is not "append" (an imperative command) but "appended", which describes a new sequence rather than a change to the existing one:
def appended[B >: A](elem: B): ListBuffer[B]
A copy of this sequence with an element appended.
Your code:
empList.appended(emp)
creates a new ListBuffer, but you discard its result and your initial list buffer is never actually modified. (It is always a good idea to switch on the -Ywarn-value-discard scalac option!)
You need to use the += operator (or the addOne method).
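A minimal sketch of the difference, assuming Scala 2.13 (where `ListBuffer.appended` and `addOne` exist) and a simplified hypothetical `Employee` with two fields instead of a database row:

```scala
import scala.collection.mutable.ListBuffer

object AppendedDemo {
  // Hypothetical, trimmed-down version of the question's Employee.
  final case class Employee(code: String, firstName: String)

  def withAppended(rows: Seq[Employee]): ListBuffer[Employee] = {
    val buf = ListBuffer[Employee]()
    rows.foreach(e => buf.appended(e)) // builds a new buffer each time and throws it away
    buf                                // still empty
  }

  def withPlusEq(rows: Seq[Employee]): ListBuffer[Employee] = {
    val buf = ListBuffer[Employee]()
    rows.foreach(e => buf += e) // mutates buf in place (same as buf.addOne(e))
    buf
  }
}
```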
I want to handle some exceptions in ZIO using catchAll or catchSome, as below:
object Test extends App {
def run(args: List[String]) =
myApp.fold(_ => 1, _ => 0)
val myApp =
for {
_ <- putStrLn(unsafeRun(toINT("3")).toString)
} yield ()
def toINT(s: String): IO[IOException, Int]= {
IO.succeed(s.toInt).map(v => v).catchAll(er =>IO.fail(er))
}
The code succeeds when I pass a validly formatted number, but it doesn't handle the exception when I pass an invalid one. Any ideas?
s.toInt gets evaluated outside of the IO monad. What happens is that you evaluate s.toInt first and try to pass the result of that to IO.succeed, but an exception has already been thrown before you can pass anything to IO.succeed. The name of succeed already basically says that you are sure that whatever you pass it is a plain value that cannot fail.
The docs suggest using Task.effect, IO.effect, or ZIO.effect for lifting an effect that can fail into ZIO.
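The same eager-versus-deferred evaluation problem can be reproduced without ZIO. The sketch below is an analogy using only `scala.concurrent` and `scala.util`, not ZIO's API: `Future.successful` takes its argument strictly, much as the answer describes for `IO.succeed`, so a bad `s.toInt` throws before any Future exists, whereas `Future.apply` and `Try` take a by-name argument and capture the exception as a failure. `EagerVsDeferred` and its methods are illustrative names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success, Try}

object EagerVsDeferred {
  // s.toInt is evaluated before Future.successful is even called.
  def eager(s: String): Future[Int] = Future.successful(s.toInt)

  // s.toInt is passed by name and evaluated inside the Future.
  def deferred(s: String): Future[Int] = Future(s.toInt)

  // Same idea as ZIO.fromTry(Try(s.toInt)), using Try alone.
  def asTry(s: String): Try[Int] = Try(s.toInt)

  def describe(s: String): (String, String, String) = {
    val e = Try(eager(s)) match {
      case Failure(ex) => s"thrown to caller: ${ex.getClass.getSimpleName}"
      case Success(_)  => "returned a Future"
    }
    val d = Try(Await.result(deferred(s), 1.second)) match {
      case Failure(ex) => s"failed Future: ${ex.getClass.getSimpleName}"
      case Success(v)  => s"value $v"
    }
    val t = asTry(s).fold(ex => s"Failure: ${ex.getClass.getSimpleName}", v => s"Success $v")
    (e, d, t)
  }
}
```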
Here is a program that worked for me:
val program =
for {
int <- toINT("3xyz")
_ <- putStrLn(int.toString)
} yield ()
def toINT(s: String): Task[Int] = {
ZIO.fromTry(Try(s.toInt))
}
rt.unsafeRun(program.catchAll(t => putStrLn(t.getMessage)))
According to the Scala Language Specification (§6.19), "An enumerator sequence always starts with a generator". Why?
I sometimes find this restriction to be a hindrance when using for-comprehensions with monads, because it means you can't do things like this:
def getFooValue(): Future[Int] = {
for {
manager = Manager.getManager() // could throw an exception
foo <- manager.makeFoo() // method call returns a Future
value = foo.getValue()
} yield value
}
Indeed, scalac rejects this with the error message '<-' expected but '=' found.
If this was valid syntax in Scala, one advantage would be that any exception thrown by Manager.getManager() would be caught by the Future monad used within the for-comprehension, and would cause it to yield a failed Future, which is what I want. The workaround of moving the call to Manager.getManager() outside the for-comprehension doesn't have this advantage:
def getFooValue(): Future[Int] = {
val manager = Manager.getManager()
for {
foo <- manager.makeFoo()
value = foo.getValue()
} yield value
}
In this case, an exception thrown by foo.getValue() will yield a failed Future (which is what I want), but an exception thrown by Manager.getManager() will be thrown back to the caller of getFooValue() (which is not what I want). Other possible ways of handling the exception are more verbose.
I find this restriction especially puzzling because in Haskell's otherwise similar do notation, there is no requirement that a do block should begin with a statement containing <-. Can anyone explain this difference between Scala and Haskell?
Here's a complete working example showing how exceptions are caught by the Future monad in for-comprehensions:
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Try, Success, Failure}
class Foo(val value: Int) {
def getValue(crash: Boolean): Int = {
if (crash) {
throw new Exception("failed to get value")
} else {
value
}
}
}
class Manager {
def makeFoo(crash: Boolean): Future[Foo] = {
if (crash) {
throw new Exception("failed to make Foo")
} else {
Future(new Foo(10))
}
}
}
object Manager {
def getManager(crash: Boolean): Manager = {
if (crash) {
throw new Exception("failed to get manager")
} else {
new Manager()
}
}
}
object Main extends App {
def getFooValue(crashGetManager: Boolean,
crashMakeFoo: Boolean,
crashGetValue: Boolean): Future[Int] = {
for {
manager <- Future(Manager.getManager(crashGetManager))
foo <- manager.makeFoo(crashMakeFoo)
value = foo.getValue(crashGetValue)
} yield value
}
def waitForValue(future: Future[Int]): Unit = {
val result = Try(Await.result(future, Duration("10 seconds")))
result match {
case Success(value) => println(s"Got value: $value")
case Failure(e) => println(s"Got error: $e")
}
}
val future1 = getFooValue(false, false, false)
waitForValue(future1)
val future2 = getFooValue(true, false, false)
waitForValue(future2)
val future3 = getFooValue(false, true, false)
waitForValue(future3)
val future4 = getFooValue(false, false, true)
waitForValue(future4)
}
Here's the output:
Got value: 10
Got error: java.lang.Exception: failed to get manager
Got error: java.lang.Exception: failed to make Foo
Got error: java.lang.Exception: failed to get value
This is a trivial example, but I'm working on a project in which we have a lot of non-trivial code that depends on this behaviour. As far as I understand, this is one of the main advantages of using Future (or Try) as a monad. What I find strange is that I have to write
manager <- Future(Manager.getManager(crashGetManager))
instead of
manager = Manager.getManager(crashGetManager)
(Edited to reflect @RexKerr's point that the monad is doing the work of catching the exceptions.)
For-comprehensions do not catch exceptions. Try does, and it has the methods needed to take part in a for-comprehension, so you can write:
for {
manager <- Try { Manager.getManager() }
...
}
But then it's expecting Try all the way down unless you manually or implicitly have a way to switch container types (e.g. something that converts Try to a List).
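A runnable sketch of the Try version, assuming simplified hypothetical `getManager`, `makeFoo` and `getValue` that return plain values (no Future), so the whole comprehension stays in Try:

```scala
import scala.util.Try

object TryForDemo {
  // Hypothetical synchronous versions of the question's Manager/Foo methods.
  final class Foo(val value: Int) {
    def getValue(crash: Boolean): Int =
      if (crash) throw new Exception("failed to get value") else value
  }
  final class Manager {
    def makeFoo(crash: Boolean): Foo =
      if (crash) throw new Exception("failed to make Foo") else new Foo(10)
  }
  def getManager(crash: Boolean): Manager =
    if (crash) throw new Exception("failed to get manager") else new Manager

  // Every step is wrapped in Try, so the comprehension starts with a generator
  // and any exception becomes a Failure instead of escaping to the caller.
  def fooValue(crashManager: Boolean, crashFoo: Boolean, crashValue: Boolean): Try[Int] =
    for {
      manager <- Try(getManager(crashManager))
      foo     <- Try(manager.makeFoo(crashFoo))
      value   <- Try(foo.getValue(crashValue))
    } yield value
}
```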
So I'm not sure your premises are right. Any assignment you made in a for-comprehension can just be made early.
(Also, there is no point doing an assignment inside a for comprehension just to yield that exact value. Just do the computation in the yield block.)
(Also, just to illustrate that multiple types can play a role in for comprehensions so there's not a super-obvious correct answer for how to wrap an early assignment in terms of later types:
// List and Option, via implicit conversion
for {i <- List(1,2,3); j <- Option(i).filter(_ <2)} yield j
// Custom compatible types with map/flatMap
// Use :paste in the REPL to define A and B together
class A[X] { def flatMap[Y](f: X => B[Y]): A[Y] = new A[Y] }
class B[X](x: X) { def map[Y](f: X => Y): B[Y] = new B(f(x)) }
for{ i <- (new A[Int]); j <- (new B(i)) } yield j.toString
Even if you take the first type you still have the problem of whether there is a unique "bind" (way to wrap) and whether to doubly-wrap things that are already the correct type. There could be rules for all these things, but for-comprehensions are already hard enough to learn, no?)
Haskell translates the equivalent of for { manager = Manager.getManager(); ... } to the equivalent of lazy val manager = Manager.getManager(); for { ... }. This seems to work:
scala> lazy val x: Int = throw new Exception("")
x: Int = <lazy>
scala> for { y <- Future(x + 1) } yield y
res8: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@fedb05d
scala> Try(Await.result(res8, Duration("10 seconds")))
res9: scala.util.Try[Int] = Failure(java.lang.Exception: )
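The same lazy-val idea as a compilable sketch outside the REPL. `getManager` here is a hypothetical stand-in that returns an Int; note that the lazy value still has to be wrapped by a generator (`Future(manager)`), so the comprehension itself still starts with `<-`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object LazyValDemo {
  // Hypothetical stand-in for Manager.getManager(): throws when asked to.
  def getManager(crash: Boolean): Int =
    if (crash) throw new Exception("failed to get manager") else 10

  def fooValue(crash: Boolean): Future[Int] = {
    // Not evaluated here; first forced inside the Future below,
    // so an exception becomes a failed Future rather than escaping.
    lazy val manager = getManager(crash)
    for {
      m <- Future(manager)
    } yield m + 1
  }

  def outcome(crash: Boolean): Try[Int] = Try(Await.result(fooValue(crash), 10.seconds))
}
```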
I think the reason this can't be done is that for-comprehensions are syntactic sugar for calls to flatMap and map (a guard condition additionally desugars to a call to withFilter). A plain value definition on its own gives the compiler nothing to call these methods on. That is why using Try, as Rex Kerr pointed out, works: Try provides map and flatMap.
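To make the desugaring concrete, here is a small sketch (using Option so it runs synchronously; the names are illustrative) of a for-comprehension next to a hand-written flatMap/map chain roughly equivalent to what the compiler generates. The value definition `b = a + 1` is folded into a `map` over the preceding generator that pairs `a` with `b`, which is why it needs a generator in front of it.

```scala
object DesugarDemo {
  def half(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None

  // Sugared form: generator, value definition, generator, yield.
  def sugared(start: Option[Int]): Option[Int] =
    for {
      a <- start
      b = a + 1
      c <- half(b)
    } yield c + 1

  // Roughly the translation: the value definition becomes part of a map
  // over the first generator, followed by flatMap/map for the rest.
  def desugared(start: Option[Int]): Option[Int] =
    start
      .map { a => val b = a + 1; (a, b) }
      .flatMap { case (_, b) => half(b).map(c => c + 1) }
}
```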
I have the following code snippet that I use to read a record from the database and I'm using ReactiveMongo for this.
val futureList: Future[Option[BSONDocument]] = collection.find(query).cursor[BSONDocument].headOption
val os: Future[Option[Exam]] = futureList.map {
(list: Option[BSONDocument]) => list match {
case Some(examBSON) => {
val id = examBSON.getAs[Int]("id").get
val text = examBSON.getAs[String]("text").get
val description = examBSON.getAs[String]("description").get
val totalQuestions = examBSON.getAs[Int]("totalQuestions").get
val passingScore = examBSON.getAs[Int]("passingScore").get
Some(Exam(id, text, description, totalQuestions, passingScore))
}
case None => None
}
}.recover {
case t: Throwable => // Log exception
None
}
I do not want to change my method signature to return a Future. I want to get the value inside the Future and return it to the caller.
Then you need to block, using Await:
import scala.concurrent.Await
import scala.concurrent.duration._
val os: Future[Option[Exam]] = ???
val result = Await.result(os, 10.seconds)
result.getOrElse(/* some default */)
Note that this blocks the current thread until the future completes or the timeout expires, in which case a TimeoutException is thrown. Note also that blocking somewhat defeats the purpose of an asynchronous computation, but it may be acceptable depending on your use case.
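A self-contained sketch of the blocking approach, assuming a hypothetical `findExam` stub in place of the ReactiveMongo query, a trimmed-down `Exam` case class, and a made-up `defaultExam` as the fallback value:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object BlockingLookup {
  final case class Exam(id: Int, text: String, description: String,
                        totalQuestions: Int, passingScore: Int)

  // Hypothetical stand-in for the ReactiveMongo query: no database involved.
  def findExam(id: Int): Future[Option[Exam]] = Future {
    if (id == 1) Some(Exam(1, "Scala", "Basics", 10, 7)) else None
  }

  // Hypothetical fallback used when nothing is found.
  val defaultExam: Exam = Exam(0, "", "", 0, 0)

  // Synchronous signature: blocks the calling thread for up to 10 seconds.
  def examOrDefault(id: Int): Exam =
    Await.result(findExam(id), 10.seconds).getOrElse(defaultExam)
}
```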
If you don't need the result immediately, you can attach a callback using onComplete:
import scala.util.{Failure, Success}
os onComplete {
case Success(someOption) => myMethod(someOption)
case Failure(t) => println("Error")
}
Note that the onComplete callback fires only once the future has completed, so the result is not immediately accessible; also, onComplete itself returns Unit.
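A sketch of the callback style. Since `onComplete` returns `Unit`, the callback has to pass the value on itself; here it completes a `Promise` purely so the demo can observe the outcome (the question's `myMethod` would go in its place). `CallbackDemo` and `describeWhenDone` are illustrative names.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object CallbackDemo {
  // Registers a callback and returns immediately; the callback runs later,
  // on the execution context, once the future has completed.
  def describeWhenDone(f: Future[Option[Int]]): Future[String] = {
    val p = Promise[String]()
    f.onComplete {
      case Success(someOption) => p.success(s"got $someOption")
      case Failure(t)          => p.success(s"Error: ${t.getMessage}")
    }
    p.future
  }

  def demo(): (String, String) = {
    val ok  = describeWhenDone(Future(Some(1)))
    val bad = describeWhenDone(Future.failed(new RuntimeException("boom")))
    (Await.result(ok, 1.second), Await.result(bad, 1.second))
  }
}
```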