Hi I am trying to process data in a file.
This the code I am using below.
I have a list of Futures and trying to get the output from these futures.
Everything is fine but the last line of return is executing before OnSuccess.
How can I change that behaviour without having a blocking operation.
def processRow(rowNumber: Int, row: String, delimiter: String, rules: List[Rule]): RowMessage = {
var cells = row.split(delimiter)
var passedRules = new ListBuffer[RuleResult]()
val failedRules = new ListBuffer[RuleResult]()
val rulesFuture = rules.map {
i => Future {
val cells = row.split(delimiter);
//some processing....
}
}
val f1 = Future.sequence(rulesFuture)
f1 onComplete {
case Success(results) => for (result <- results) (result.map(x => {
if (x.isPassFailed) {
passedRules += x
}
else {
failedRules += x
}
}))
case Failure(t) => println("An error has occured: " + t.getMessage)
}
return new RowMessage(passedRules.toList, failedRules.toList)
}
You can't avoid blocking and return a plain RowMessage. You need to return a Futureas well.
def processRow(rowNumber: Int, row: String, delimiter: String, rules: List[Rule]): Future[RowMessage] = {
val cells = row.split(delimiter)
Future.traverse(rules) { i =>
Future {
//some processing....
}
} map { results =>
val (passed, failed) = results.partition(_.isPassFailed)
new RowMessage(passed, failed)
}
}
Also think about your algorithm to avoid mutable state, especially when you change it from different Futures.
Future.traverse is equivalent of your map + Future.sequence. Then instead of onComplete, just map your Future to modify the list. You can split it easly using partition instead of what you've been doing.
You don't need to use return, in fact you shouldn't unless you know what you are doing.
Btw isPassFailed doesn't sound like a reasonable method name to me, especially considering that when it's true you are adding it to passed rules.
Related
I am trying to read incremental data from my data source using Scala-Spark. Before hitting the source tables, I am trying to calculate the min & max of partition column that I use in my code in a Future which is present in a class: GetSourceMeta as given below.
def getBounds(keyIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
keyIdMap.keys.foreach(table => if(!keyIdMap(table).contains("Invalid")) {
val minMax = s"select max(insert_tms) maxTms, min(insert_tms) minTms from schema.${table} where source='DB2' and key_id in (${keyIdMap(table)})"
println("MinMax: " + minMax)
val boundsDF = spark.read.format("jdbc").option("url", con.getConUrl()).option("dbtable", s"(${minMax}) as ctids").option("user", con.getUserName()).option("password", con.getPwd()).load()
try {
val maxTms = boundsDF.select("minTms").head.getTimestamp(0).toString + "," + boundsDF.select("maxTms").head.getTimestamp(0).toString
println("Bounds: " + maxTms)
boundsMap += (table -> maxTms)
} catch {
case np: java.lang.NullPointerException => { println("No data found") }
case e: Exception => { println(s"Unknown exception: $e") }
}
}
)
boundsMap.foreach(println)
boundsMap
}
I am calling the above method in my main method as:
object LoadToCopyDB {
val conf = new SparkConf().setAppName("TEST_YEAR").set("some parameters")
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder().config(conf).master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
val gsm = new GetSourceMeta()
val minMaxKeyMap = gsm.getBounds(keyIdMap).onComplete {
case Success(values) => values.foreach(println)
case Failure(f) => f.printStackTrace
}
.
.
.
}
Well, the onComplete didn't print any values so I used andThen as below and that didn't help as well.
val bounds: Future[scala.collection.mutable.Map[String, String]] = gpMetaData.getBounds(incrementalIds) andThen {
case Success(outval) => outval.foreach(println)
case Failure(e) => println(e)
}
Earlier the main thread exits without letting the Future: getBounds execute. Hence I couldn't find any println statements from the Future displayed on the terminal. I found out that I need to keep the main thread Await inorder to complete the Future. But when I use Await in main along with onComplete:
Await.result(bounds, Duration.Inf)
The compiler gives an error:
Type mismatch, expected: Awaitable[NotInferedT], actual:Unit
If I declare the val minMaxKeyMap as Future[scala.collection.mutable.Map[String, String] the compiler says: Expression of type Unit doesn't conform to expected type Future[mutable.map[String,String]]
I tried to print the values of bounds after the Await statement but that just prints an empty Map.
I couldn't understand how can to fix this. Could anyone let me know what do I do to make the Future run properly ?
In this kind of cases, is always better to follow the types. The method onComplete only returns Unit, it won´t return a future hence it can´t be passed using Await.
In case you want to return a Future of any type you will have to map or flatmap the value and return an option, for example. In this case, does not matter what you return, you only want Await method to wait for this result and print a trace. You can treat the possible exception in the recover. It would be like that in your code:
val minMaxKeyMap:Future[Option[Any] = gsm.getBounds(keyIdMap).map { values =>
values.foreach(println)
None
}.recover{
case e: Throwable =>
e. printStackTrace
None
}
Note that the recover part has to return an instance of the type.
After that, you can apply the Await to the Future, and you will get the results printed. Is not the prettiest solution but it will work in your case.
In the following function, I am passing an Option. Depending on whether the Option is Some or None, I need to call the a specific API but the rest of the code is the same for both Some and None. I don't know how to remove the code duplication though. How could I re-write the code in functional-programming style?
def getRowsByPartitionKeyId(id:I, pagingStateOption:Option[PagingState]):Tuple2[Option[List[M]],Option[PagingState]] = {
pagingStateOption match {
case Some(pagingState:PagingState) => {
val resultSet = session.execute(whereClause
.setFetchSize(1)
.setPagingState(pagingState)) //THIS IS THE ONLY DIFFERENCE IN THE TWO CODE LEGS
val it = resultSet.iterator();//resultSet is an iterator
val newPagingState:PagingState = resultSet.getExecutionInfo.getPagingState
if(it.hasNext){
val resultSetAsList:List[Row] = asScalaIterator(it).toList
val resultSetAsModelList = rowToModel(resultSetAsList.head)
Tuple2(Some(List(resultSetAsModelList)),Some(pagingState))
}
else {
Tuple2(None, None)
}
}
case None =>{
val resultSet = session.execute(whereClause
.setFetchSize(1)) //get one row from ResultSet. Cassandra might return more or less though
val it = resultSet.iterator();//resultSet is an iterator
val pagingState:PagingState = resultSet.getExecutionInfo.getPagingState
if(it.hasNext){
val resultSetAsList:List[Row] = asScalaIterator(it).toList
val resultSetAsModelList = rowToModel(resultSetAsList.head)
Tuple2(Some(List(resultSetAsModelList)),Some(pagingState))
}
else {
Tuple2(None, None)
}
}
}
def getRowsByPartitionKeyId(
id:I,
pagingStateOption:Option[PagingState]
): (Option[List[M]], Option[PagingState]) = {
val resultSet = session.execute(pagingStateOption match {
case Some(pagingState: PagingState) =>
whereClause.setFetchSize(1).setPagingState(pagingState)
case None =>
whereClause.setFetchSize(1)
})
val it = resultSet.iterator();//resultSet is an iterator
val newPagingState:PagingState = resultSet.getExecutionInfo.getPagingState
if (it.hasNext) {
val resultSetAsList:List[Row] = asScalaIterator(it).toList
val resultSetAsModelList = rowToModel(resultSetAsList.head)
Tuple2(Some(List(resultSetAsModelList)),Some(pagingState))
} else {
Tuple2(None, None)
}
}
Got it. I forgot that everything in Scala returns a value, even match, so I can do this
val resultSet = pagingStateOption match {
case Some(pagingState: PagingState) => {
println("got paging state:" +pagingState)
session.execute(whereClause
.setFetchSize(1)
.setPagingState(pagingState)) //get one row from ResultSet. Cassandra might return more or less though
}
case None => {
session.execute(whereClause
.setFetchSize(1)) //get one row from ResultSet. Cassandra might return more or less though
}
}
I am confused on how one would conditionally update a document based on a previous query using only futures.
Let say I want to push to some value into an array in a document only if that array has a size less than a given Integer.
I am using this function to get the document, after getting the document I am pushing values - what I am unable to do is do that conditionally.
def joinGroup(actionRequest: GroupRequests.GroupActionRequest): Future[GroupResponse.GroupActionCompleted] = {
//groupisNotFull() is a boolean future
groupIsNotFull(actionRequest.groupId).map(
shouldUpdate => {
if(shouldUpdate){
Logger.info(actionRequest.initiator + " Joining Group: " + actionRequest.groupId)
val selector = BSONDocument("group.groupid" -> BSONDocument("$eq" -> actionRequest.groupId))
val modifier = BSONDocument("$push" -> BSONDocument("group.users" -> "test-user"))
val updateResult = activeGroups.flatMap(_.update(selector, modifier))
.map(res => {
GroupActionCompleted(
actionRequest.groupId,
actionRequest.initiator,
GroupConstants.Actions.JOIN,
res.ok,
GroupConstants.Messages.JOIN_SUCCESS
)
})
.recover {
case e: Throwable => GroupActionCompleted(
actionRequest.groupId,
actionRequest.initiator, GroupConstants.Actions.JOIN,
success = false,
GroupConstants.Messages.JOIN_FAIL
)
}
updateResult
}
else {
val updateResult = Future.successful(
GroupActionCompleted(
actionRequest.groupId,
actionRequest.initiator,
GroupConstants.Actions.JOIN,
success = false,
GroupConstants.Messages.JOIN_FAIL
))
updateResult
}
}
)
}
//returns a Future[Boolean] based on if there is room for another user.
private def groupIsNotFull(groupid: String): Future[Boolean] = {
findGroupByGroupId(groupid)
.map(group => {
if (group.isDefined) {
val fGroup = group.get
fGroup.group.users.size < fGroup.group.groupInformation.maxUsers
} else {
false
}
})
}
I am confused on why I cannot do this. The compilation error is:
Error:type mismatch;
found : scala.concurrent.Future[response.group.GroupResponse.GroupActionCompleted]
required: response.group.GroupResponse.GroupActionCompleted
for both the if and else branch 'updateResult'.
As a side question.. is this the proper way of updating documents conditionally - that is querying for it, doing some logic then executing another query?
Ok got it - you need to flatMap the first Future[Boolean] like this:
groupIsNotFull(actionRequest.groupId).flatMap( ...
Using flatMap, the result will be a Future[T] with map you would get a Future[Future[T]]. The compiler knows you want to return a Future[T] so its expecting the map to return a T and you are trying to return a Future[T] - so it throws the error. Using flatMap will fix this.
Some further clarity on map vs flatmap here: In Scala Akka futures, what is the difference between map and flatMap?
I believe the problem is because the joinGroup2 function return type is Future[Response], yet you are returning just a Response in the else block. If you look at the signature of the mapTo[T] function, it returns a Future[T].
I think you need to wrap the Response object in a Future. Something like this:
else {
Future { Response(false, ERROR_REASON) }
}
Btw you have a typo: Respose -> Response
I have a method that does a couple of database look up and performs some logic.
The MyType object that I return from the method is as follows:
case class MyResultType(typeId: Long, type1: Seq[Type1], type2: Seq[Type2])
The method definition is like this:
def myMethod(typeId: Long, timeInterval: Interval) = async {
// 1. check if I can find an entity in the database for typeId
val myTypeOption = await(db.run(findMyTypeById(typeId))) // I'm getting the headOption on this result
if (myTypeOption.isDefined) {
val anotherDbLookUp = await(doSomeDBStuff) // Line A
// the interval gets split and assume that I get a List of thse intervals
val intervalList = splitInterval(interval)
// for each of the interval in the intervalList, I do database look up
val results: Seq[(Future[Seq[Type1], Future[Seq[Type2])] = for {
interval <- intervalList
} yield {
(getType1Entries(interval), getType2Entries(interval))
}
// best way to work with the results so that I can return MyResultType
}
else {
None
}
}
Now the getType1Entries(interval) and getType2Entries(interval) each returns a Future of Seq(Type1) and Seq(Type2) entries!
My problem now is to get the Seq(Type1) and Seq(Type2) out of the Future and stuff that into the MyResultType case class?
You could refer to this question you asked
Scala transforming a Seq with Future
so you get the
val results2: Future[Seq([Iterable[Type1], [Iterable[Type2])] = ???
and then call await on it and you have no Futures at all, you can do what you want.
I hope I understood the question correctly.
Oh and by the way you should map myTypeOption instead of checking if it's defined and returning None if it's not
if (myTypeOption.isDefined) {
Some(x)
} else {
None
}
can be simply replaced with
myTypeOption.map { _ => // ignoring what actually was inside option
x // return whatever you want, without wrapping it in Some
}
If I understood your question correctly, then this should do the trick.
def myMethod(typeId: Long, timeInterval: Interval): Option[Seq[MyResultType]] = async {
// 1. check if I can find an entity in the database for typeId
val myTypeOption = await(db.run(findMyTypeById(typeId))) // I'm getting the headOption on this result
if (myTypeOption.isDefined) {
// the interval gets split and assume that I get a List of thse intervals
val intervalList = splitInterval(interval)
// for each of the interval in the intervalList, I do database look up
val results: Seq[(Future[Seq[Type1]], Future[Seq[Type2]])] = for {
interval <- intervalList
} yield {
(getType1Entries(interval), getType2Entries(interval))
}
// best way to work with the results so that I can return MyResultType
Some(
await(
Future.sequence(
results.map{
case (l, r) =>
l.zip(r).map{
case (vl, vr) => MyResultType(typeId, vl, vr)
}
})))
}
else {
None
}
}
There are two parts to your problem, 1) how to deal with two dependent futures, and 2) how to extract the resulting values.
When dealing with dependent futures, I normally compose them together:
val future1 = Future { 10 }
val future2 = Future { 20 }
// results in a new future with type (Int, Int)
val combined = for {
a <- future1
b <- future2
} yield (a, b)
// then you can use foreach/map, Await, or onComplete to do
// something when your results are ready..
combined.foreach { ((a, b)) =>
// do something with the result here
}
To extract the results I generally use Await if I need to make a synchronous response, use _.onComplete() if I need to deal with potential failure, and use _.foreach()/_.map() for most other circumstances.
I have 3 futures of type Response. The first future returns a JsValue which defines if future 2 and future 3 shall be executed or only future 3 shall be executed.
Pseudocode:
If Future 1 then {future2 and future 3}
else future 3
Iam trying to do this in a play framwork action which means in order to use the result of the futures later I cant use onSuccess, onFailure and onComplete because all of them return Unit and not the actual JsValue from the last future.
I tried to do this with map() and andThen but I am a Scala noob and I guess i wasn't able to do it because I always just missed a little point. Here is my current approach which does not work!
def addSuggestion(indexName: String, suggestion: String) = Action.async {
val indexExistsQuery: IndexExistsQuery = new IndexExistsQuery(indexName);
val addSuggestionQuery: AddSuggestionQuery = new AddSuggestionQuery(indexName, suggestion)
val indexCreationQuery: CreateCompletionsFieldQuery = new CreateCompletionsFieldQuery(indexName)
val futureResponse: Future[Response] = for {
responseOne <- EsClient.execute(indexExistsQuery)
responseTwo <- EsClient.execute(indexCreationQuery) if (indexExistsQuery.getBooleanResult(indexExistsQuery.getResult(responseOne)))
responseThree <- EsClient.execute(addSuggestionQuery)
} yield responseThree
futureResponse.map { response =>
Ok("Feed title: " + (response.json))
}
I created some pseudocode:
checkIndexExists() map {
case true => Future.successful()
case false => createIndex()
} flatMap { _ =>
query()
}.map { response =>
Ok("Feed title: " + (response.json))
}.recover {
case _ => Ok("bla")
}
First you fire up the query if the index exists.
Then you map that future how to work with that Future[Boolean] if it successful. Since you use map, you kind of extract the Boolean. If the index exists, you just create a future that is already complete. If the index not exists you need to fire up the index creation command. Now you have the situation that you have nested Future's (Future[Future[Response]]). Using flatMap you remove one dimension, so that you only have Future[Response]. That can be mapped to a Play result.
Update (the implementation of MeiSign):
EsClient.execute(indexExistsQuery) map { response =>
if (indexExistsQuery.getBooleanResult(indexExistsQuery.getResult(response))) Future.successful(Response)
else EsClient.execute(indexCreationQuery)
} flatMap { _ =>
EsClient.execute(addSuggestionQuery)
} map { response: Response =>
Ok("Feed title: " + (response.json))
}
I found this solution but I dont think that it is a good solution because Iam using Await.result() which is a blocking operation.
If anyone knows how to refactor this code without blocking operations please let me know.
def addSuggestion(indexName: String, suggestion: String) = Action.async {
val indexExistsQuery: IndexExistsQuery = new IndexExistsQuery(indexName);
val addSuggestionQuery: AddSuggestionQuery = new AddSuggestionQuery(indexName, suggestion)
val indexCreationQuery: CreateCompletionsFieldQuery = new CreateCompletionsFieldQuery(indexName)
val indexExists: Boolean = indexExistsQuery.getBooleanResult(indexExistsQuery.getResult(Await.result(EsClient.execute(indexExistsQuery), 5.second)))
if (indexExists) {
EsClient.execute(addSuggestionQuery).map { response => Ok("Feed title: " + (response.json)) }
} else {
val futureResponse: Future[Response] = for {
responseTwo <- EsClient.execute(indexCreationQuery)
responseThree <- EsClient.execute(addSuggestionQuery)
} yield responseThree
futureResponse.map { response =>
{
Ok("Feed title: " + (response.json))
}
}.recover { case _ => Ok("bla") }
}
}