Return keyword inside the inline function in Scala - scala

I've heard about to not use Return keyword in Scala, because it might change the flow of the program like;
// this will return only 2 because of return keyword
List(1, 2, 3).map(value => return value * 2)
Here is my case; I've recursive case class, and using DFS to do some calculation on it. So, maximum depth could be 5. Here is the model;
case class Line(
lines: Option[Seq[Line]],
balls: Option[Seq[Ball]],
op: Option[String]
)
I'm using DFS approach to search this recursive model. But at some point, if a special value exist in the data, I want to stop iterating over the data left and return the result directly instead. Here is an example;
Line(
lines = Some(Seq(
Line(None, Some(Seq(Ball(1), Ball(3))), Some("s")),
Line(None, Some(Seq(Ball(5), Ball(2))), Some("d")),
Line(None, Some(Seq(Ball(9))), None)
)),
balls = None,
None
)
In this data, I want to return as like "NOT_OKAY" if I run into the Ball(5), which means I do not need to any operation on Ball(2) and Ball(9) anymore. Otherwise, I will apply a calculation to the each Ball(x) with the given operator.
I'm using this sort of DFS method;
def calculate(line: Line) = {
// string here is the actual result that I want, Boolean just keeps if there is a data that I don't want
def dfs(line: Line): (String, Boolean) = {
line.balls.map{_.map { ball =>
val result = someCalculationOnBall(ball)
// return keyword here because i don't want to iterate values left in the balls
if (result == "NOTREQUIRED") return ("NOT_OKAY", true)
("OKAY", false)
}}.getOrElse(
line.lines.map{_.map{ subLine =>
val groupResult = dfs(subLine)
// here is I'm using return because I want to return the result directly instead of iterating the values left in the lines
if (groupResult._2) return ("NOT_OKAY", true)
("OKAY", false)
}}
)
}
.... rest of the thing
}
In this case, I'm using return keyword in the inline functions, and change the behaviour of the inner map functions completely. I've just read somethings about not using return keyword in Scala, but couldn't make sure this will create a problem or not. Because in my case, I don't want to do any calculation if I run into a value that I don't want to see. Also I couldn't find the functional way to get rid of return keyword.
Is there any side effect like stack exception etc. to use return keyword here? I'm always open to the alternative ways. Thank you so much!

Related

How to do a case insensitive match for command line arguments in scala?

I'm working on a command line tool written in Scala which is executed as:
sbt "run --customerAccount 1234567"
Now, I wish to make this flexible to accept "--CUSTOMERACCOUNT" or --cUsToMerAccount or --customerACCOUNT ...you get the drift
Here's what the code looks like:
lazy val OptionsParser: OptionParser[Args] = new scopt.OptionParser[Args]("scopt") {
head(
"XML Generator",
"Creates XML for testing"
)
help("help").text(s"Prints this usage message. $envUsage")
opt[String]('c', "customerAccount")
.text("Required: Please provide customer account number as -c 12334 or --customerAccount 12334")
.required()
.action { (cust, args) =>
assert(cust.nonEmpty, "cust is REQUIRED!!")
args.copy(cust = cust)
}
}
I assume the opt[String]('c', "customerAccount") does the pattern matching from the command line and will match with "customerAccount" - how do I get this to match with "--CUSTOMERACCOUNT" or --cUsToMerAccount or --customerACCOUNT? What exactly does the args.copy (cust = cust) do?
I apologize if the questions seem too basic. I'm incredibly new to Scala, have worked in Java and Python earlier so sometimes I find the syntax a little hard to understand as well.
You'd normally be parsing the args with code like:
OptionsParser.parse(args, Args())
So if you want case-insensitivity, probably the easiest way is to canonicalize the case of args with something like
val canonicalized = args.map(_.toLowerCase)
OptionsParser.parse(canonicalized, Args())
Or, if you for instance wanted to only canonicalize args starting with -- and before a bare --:
val canonicalized =
args.foldLeft(false -> List.empty[String]) { (state, arg) =>
val (afterDashes, result) = state
if (afterDashes) true -> (arg :: result) // pass through unchanged
else {
if (arg == "==") true -> (arg :: result) // move to afterDash state & pass through
else {
if (arg.startsWith("--")) false -> (arg.toLowerCase :: result)
else false -> (arg :: result) // pass through unchanged
}
}
}
._2 // Extract the result
.reverse // Reverse it back into the original order (if building up a sequence, your first choice should be to build a list in reversed order and reverse at the end)
OptionsParser.parse(canonicalized, Args())
Re the second question, since Args is (almost certainly) a case class, it has a copy method which constructs a new object with (most likely, depending on usage) different values for its fields. So
args.copy(cust = cust)
creates a new Args object, where:
the value of the cust field in that object is the value of the cust variable in that block (this is basically a somewhat clever hack that works with named method arguments)
every other field's value is taken from args

Creating Seq after waiting for all results from map/foreach in Scala

I am trying to loop over inputs and process them to produce scores.
Just for the first input, I want to do some processing that takes a while.
The function ends up returning just the values from the 'else' part. The 'if' part is done executing after the function returns the value.
I am new to Scala and understand the behavior but not sure how to fix it.
I've tried inputs.zipWithIndex.map instead of foreach but the result is the same.
def getscores(
inputs: inputs
): Future[Seq[scoreInfo]] = {
var scores: Seq[scoreInfo] = Seq()
inputs.zipWithIndex.foreach {
case (f, i) => {
if (i == 0) {
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
scores = scores.:+(score)
})
})
} else {
scores = scores.:+(
scoreInfo(
id = "",
score = 5
)
)
}
}
}
Future {
scores
}
}
For what you need, I would drop the mutable variable and replace foreach with map to obtain an immutable list of Futures and recover to handle exceptions, followed by a sequence like below:
def getScores(inputs: Inputs): Future[List[ScoreInfo]] = Future.sequence(
inputs.zipWithIndex.map{ case (input, idx) =>
if (idx == 0)
getGeoScore(input).map(_.getOrElse(defaultScore)).recover{ case e => errorHandling(e) }
else
Future.successful(ScoreInfo("", 5))
})
To capture/print the result, one way is to use onComplete:
getScores(inputs).onComplete(println)
The part your missing is understanding a tricky element of concurrency, and that is that the order of execution when using multiple futures is not guaranteed.
If your block here is long running, it will take a while before appending the score to scores
// long operation that returns Future[Option[scoreInfo]]
getgeoscore(f).foreach(gso => {
gso.foreach(score => {
// stick a println("here") in here to see what happens, for demonstration purposes only
scores = scores.:+(score)
})
})
Since that executes concurrently, your getscores function will also simultaneously continue its work iterating over the rest of inputs in your zipWithindex. This iteration, especially since it's trivial work, likely finishes well before the long-running getgeoscore(f) completes the execution of the Future it scheduled, and the code will exit the function, moving on to whatever code is next after you called getscores
val futureScores: Future[Seq[scoreInfo]] = getScores(inputs)
futureScores.onComplete{
case Success(scoreInfoSeq) => println(s"Here's the scores: ${scoreInfoSeq.mkString(",")}"
}
//a this point the call to getgeoscore(f) could still be running and finish later, but you will never know
doSomeOtherWork()
Now to clean this up, since you can run a zipWithIndex on your inputs parameter, I assume you mean it's something like a inputs:Seq[Input]. If all you want to do is operate on the first input, then use the head function to only retrieve the first option, so getgeoscores(inputs.head) , you don't need the rest of the code you have there.
Also, as a note, if using Scala, get out of the habit of using mutable vars, especially if you're working with concurrency. Scala is built around supporting immutability, so if you find yourself wanting to use a var , try using a val and look up how to work with the Scala's collection library to make it work.
In general, that is when you have several concurrent futures, I would say Leo's answer describes the right way to do it. However, you want only the first element transformed by a long running operation. So you can use the future return by the respective function and append the other elements when the long running call returns by mapping the future result:
def getscores(inputs: Inputs): Future[Seq[ScoreInfo]] =
getgeoscore(inputs.head)
.map { optInfo =>
optInfo ++ inputs.tail.map(_ => scoreInfo(id = "", score = 5))
}
So you neither need zipWithIndex nor do you need an additional future or join the results of several futures with sequence. Mapping the future just gives you a new future with the result transformed by the function passed to .map().

Get the first elements (take function) of a DStream

I look for a way to retrieve the first elements of a DStream created as:
val dstream = ssc.textFileStream(args(1)).map(x => x.split(",").map(_.toDouble))
Unfortunately, there is no take function (as on RDD) on a dstream //dstream.take(2) !!!
Could someone has any idea on how to do it ?! thanks
You can use transform method in the DStream object then take n elements of the input RDD and save it to a list, then filter the original RDD to be contained in this list. This will return a new DStream contains n elements.
val n = 10
val partOfResult = dstream.transform(rdd => {
val list = rdd.take(n)
rdd.filter(list.contains)
})
partOfResult.print
The previous suggested solution did not compile for me as the take() method returns an Array, which is not serializable thus Spark streaming will fail with a java.io.NotSerializableException.
A simple variation on the previous code that worked for me:
val n = 10
val partOfResult = dstream.transform(rdd => {
rdd.filter(rdd.take(n).toList.contains)
})
partOfResult.print
Sharing a java based solution that is working for me. The idea is to use a custom function, which can send the top row from a sorted RDD.
someData.transform(
rdd ->
{
JavaRDD<CryptoDto> result =
rdd.keyBy(Recommendations.volumeAsKey)
.sortByKey(new CryptoComparator()).values().zipWithIndex()
.map(row ->{
CryptoDto purchaseCrypto = new CryptoDto();
purchaseCrypto.setBuyIndicator(row._2 + 1L);
purchaseCrypto.setName(row._1.getName());
purchaseCrypto.setVolume(row._1.getVolume());
purchaseCrypto.setProfit(row._1.getProfit());
purchaseCrypto.setClose(row._1.getClose());
return purchaseCrypto;
}
).filter(Recommendations.selectTopinSortedRdd);
return result;
}).print();
The custom function selectTopinSortedRdd looks like below:
public static Function<CryptoDto, Boolean> selectTopInSortedRdd = new Function<CryptoDto, Boolean>() {
private static final long serialVersionUID = 1L;
#Override
public Boolean call(CryptoDto value) throws Exception {
if (value.getBuyIndicator() == 1L) {
System.out.println("Value of buyIndicator :" + value.getBuyIndicator());
return true;
}
else {
return false;
}
}
};
It basically compares all incoming elements, and returns true only for the first record from the sorted RDD.
This seems to be always an issue with DStreams as well as regular RDDs.
If you don't want (or can't) to use .take() (especially in DStreams) you can think outside the box here and just use reduce instead. That is a valid function for both DStreams as well as RDD's.
Think about it. If you use reduce like this (Python example):
.reduce( lambda x, y : x)
Then what happens is: For every 2 elements you pass in, always return only the first. So if you have a million elements in your RDD or DStream it will shrink it to one element in the end which is the very first one in your RDD or DStream.
Simple and clean.
However keep in mind that .reduce() does not take order into consideration. However you can easily overcome this with a custom function instead.
Example: Let's assume your data looks like this x = (1, [1,2,3]) and y = (2, [1,2]). A tuple x where the 2nd element is a list. If you are sorting by the longest list for example then your code could look like below maybe (adapt as needed):
def your_reduce(x,y):
if len(x[1]) > len(y[1]):
return x
else:
return y
yourNewRDD = yourOldRDD.reduce(your_reduce)
Accordingly you will get '(1, [1,2,3])' as that has the longer list. There you go!
This has caused me some headaches in the past until I finally tried this. Hopefully this helps.

How to use reduce or fold to avoid mutable state

I have a mutable variable in my code that I want to avoid by using some of aggregation function. Unfortunatelly I couldn't find solution for the following pseudocode.
def someMethods(someArgs) = {
var someMutableVariable = factory
val resources = getResourcesForVariable(someMutableVariable)
resources foreach (resource => {
val localTempVariable = getSomeOtherVariable(resource)
someMutableVariable = chooseBetteVariable(someMutableVariable, localTempVariable)
})
someMutableVariable
}
I have two places in my code where I need to build some variable, then in loop compare it with other possibilities and if it worse then replace it with this newly possibility.
If you resources variable supports it:
//This is the "currently best" and "next" in list being folded over
resources.foldLeft(factory)((cur, next) =>
val local = getSomeOther(next)
//Since this function returns the "best" between the two, you're solid
chooseBetter(local, cur)
}
and then you don't have mutable state.

How does the following Java "continue" code translate to Scala?

for (String stock : allStocks) {
Quote quote = getQuote(...);
if (null == quoteLast) {
continue;
}
Price price = quote.getPrice();
if (null == price) {
continue;
}
}
I don't necessarily need a line by line translation, but I'm looking for the "Scala way" to handle this type of problem.
You don't need continue or breakable or anything like that in cases like this: Options and for comprehensions do the trick very nicely,
val stocksWithPrices =
for {
stock <- allStocks
quote <- Option(getQuote(...))
price <- Option(quote.getPrice())
} yield (stock, quote, price);
Generally you try to avoid those situations to begin with by filtering before you even start:
val goodStocks = allStocks.view.
map(stock => (stock, stock.getQuote)).filter(_._2 != null).
map { case (stock, quote) => (stock,quote, quote.getPrice) }.filter(_._3 != null)
(this example showing how you'd carry along partial results if you need them). I've used a view so that results will be computed as-needed, instead of creating a bunch of new collections at each step.
Actually, you'd probably have the quotes and such return options--look around on StackOverflow for examples of how to use those instead of null return values.
But, anyway, if that sort of thing doesn't work so well (e.g. because you are generating too many intermediate results that you need to keep, or you are relying on updating mutable variables and you want to keep the evaluation pattern simple so you know what's happening when) and you can't conceive of the problem in a different, possibly more robust way, then you can
import scala.util.control.Breaks._
for (stock <- allStocks) {
breakable {
val quote = getQuote(...)
if (quoteLast eq null) break;
...
}
}
The breakable construct specifies where breaks should take you to. If you put breakable outside a for loop, it works like a standard Java-style break. If you put it inside, it acts like continue.
Of course, if you have a very small number of conditions, you don't need the continue at all; just use the else of the if-statement.
Your control structure here can be mapped very idiomatically into the following for loop, and your code demonstrates the kind of filtering that Scala's for loop was designed for.
for {stock <- allStocks.view
quote = getQuote(...)
if quoteLast != null
price = quote.getPrice
if null != price
}{
// whatever comes after all of the null tests
}
By the way, Scala will automatically desugar this into the code from Rex Kerr's solution
val goodStocks = allStocks.view.
map(stock => (stock, stock.getQuote)).filter(_._2 != null).
map { case (stock, quote) => (stock,quote, quote.getPrice) }.filter(_._3 != null)
This solution probably doesn't work in general for all different kinds of more complex flows that might use continue, but it does address a lot of common ones.
If the focus is really on the continue and not on the null handling, just define an inner method (the null handling part is a different idiom in scala):
def handleStock(stock: String): Unit {
val quote = getQuote(...)
if (null == quoteLast) {
return
}
val price = quote.getPrice();
if (null == price) {
return
}
}
for (stock <- allStocks) {
handleStock(stock)
}
The simplest way is to embed the skipped-over code in an if with reversed-sense to what you have.
See http://www.scala-lang.org/node/257