Problem with creating auxiliary tail recursion in a UDF in Spark Scala - scala

I am working on a small project where I take data from Kafka and send each record to a UDF. Inside the UDF we have a while loop that I need to replace with tail recursion.
while (condition) {
  fields
  body
}
to
def whileReplacement(dummy: Int): Int = {
  if (!condition) return 1
  body
  whileReplacement(dummy)
}
But I am getting a java.io.NotSerializableException. I do not understand what is causing the error or how to solve it. If you have a better approach to solve this, please share it. Thank you.

Previously I just declared and called the recursive function inside the UDF function itself.
The problem was solved by placing the recursive function outside of the UDF function. I think defining the function at the top level lets Spark serialize it and ship it to the executors, which solves the serialization problem in this case. This is just my understanding; I am not sure what was really happening. If anyone knows, please explain.
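For illustration, a minimal sketch of that arrangement (the object, function, and column names are placeholders, not the original code): the tail-recursive helper lives in a standalone serializable object, so the UDF's closure does not capture a non-serializable enclosing class.

import org.apache.spark.sql.functions.udf

// Top-level helper: referencing it does not drag a non-serializable
// outer class into the UDF's closure.
object LoopHelper extends Serializable {
  // Stands in for the replaced while loop; the condition and body
  // are placeholders for whatever the real UDF computes.
  @scala.annotation.tailrec
  def loop(n: Int, acc: Int = 0): Int =
    if (n <= 0) acc            // the !condition branch: return the result
    else loop(n - 1, acc + 1)  // the loop body, then the tail call
}

// The UDF only closes over the serializable top-level object.
val loopUdf = udf((n: Int) => LoopHelper.loop(n))
// df.withColumn("out", loopUdf(df("some_int_column")))  // hypothetical usage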

Related

Writing a function that can operate on RDD and Seq in Scala

I am trying to write functions that can receive both Spark RDD and Scala native Seq, so that I can showcase the performance difference of the two approaches. However, I couldn't figure out a common type or interface for the aforesaid function parameters. Let's imagine something simple like computing the mean using a map operation. Both RDD and Seq have this operation. I've tried using the type Either[RDD[Int], Seq[Int]] but it just doesn't typecheck :/.
Any pointer would be very appreciated :)
Basically, you can't. They don't share any common superclass, besides AnyRef I guess. Their map functions have completely different signatures (parameters, etc.) even though they share a name (and a purpose).
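That said, if you only need a single entry point for the demo, a small typeclass can paper over the missing common interface. This is just a sketch of one workaround, not something from the answer above; the names are mine.

import org.apache.spark.rdd.RDD

// One instance per container type we care about.
trait HasMean[C] {
  def mean(c: C): Double
}

object HasMean {
  implicit val seqMean: HasMean[Seq[Int]] = new HasMean[Seq[Int]] {
    def mean(c: Seq[Int]): Double = c.map(_.toDouble).sum / c.size
  }
  implicit val rddMean: HasMean[RDD[Int]] = new HasMean[RDD[Int]] {
    def mean(c: RDD[Int]): Double = c.map(_.toDouble).mean()
  }
}

// A single generic entry point for both containers.
def meanOf[C](c: C)(implicit m: HasMean[C]): Double = m.mean(c)

// meanOf(Seq(1, 2, 3))                  // 2.0
// meanOf(sc.parallelize(Seq(1, 2, 3)))  // 2.0, assuming a SparkContext sc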

Is a while loop already implemented with pass-by-name parameters? : Scala

The Scala Tour Of Scala docs explain pass-by-name parameters using a whileLoop function as an example.
def whileLoop(condition: => Boolean)(body: => Unit): Unit =
  if (condition) {
    body
    whileLoop(condition)(body)
  }
var i = 2
whileLoop(i > 0) {
  println(i)
  i -= 1
} // prints 2 1
The section explains that if the condition is not met then the body is not evaluated, thus improving performance by not evaluating a body of code that isn't being used.
Does Scala's implementation of while already use pass-by-name parameters?
If there's a reason or a specific case where it's not possible for the implementation to use pass-by-name parameters, please explain; I haven't been able to find any information on this so far.
EDIT: As per Valy Dia's (https://stackoverflow.com/users/5826349/valy-dia) answer, I would like to add another question...
Would a method implementation of the while statement perform better than the statement itself if it's possible not to evaluate the body at all for certain cases? If so, why use the while statement at all?
I will try to test this, but I'm new to Scala so it might take me some time. If someone would like to explain, that would be great.
Cheers!
The while statement is not a method, so the terminology "by-name parameter" is not really relevant... Having said that, the while statement has the following structure:
while (condition) {
  body
}
where the condition is repeatedly evaluated and the body is evaluated only when the condition holds, as these small examples show:
scala> while(false){ throw new Exception("Boom") }
// Does nothing
scala> while(true){ throw new Exception("Boom") }
// java.lang.Exception: Boom
scala> while(throw new Exception("boom")){ println("hello") }
// java.lang.Exception: boom
Would a method implementation of the while statement perform better than the statement itself if it's possible not to evaluate the body at all for certain cases?
No. The built-in while also does not evaluate the body at all unless it has to, and it is going to compile to much more efficient code (because it does not need to introduce the "thunks"/closures/lambdas/anonymous functions that are used to implement "pass-by-name" under the hood).
The example in the book was just showing how you could implement it with functions if there was no built-in while statement.
I assumed that they were also inferring that the while statement's body will be evaluated whether or not the condition was met
No, that would make the built-in while totally useless. That is not what they were driving at. They wanted to say that you can do this kind of thing with "call-by-name" (as opposed to "call-by-value", not as opposed to what the while loop does -- because the latter also works like that).
The main takeaway is that you can build something that looks like a control structure in Scala, because you have syntactic sugar like "call-by-name" and "last argument group taking a function can be called with a block".
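As a concrete illustration of that takeaway (my own sketch, not code from the thread), here is a do/until-style construct built from nothing but by-name parameters and a trailing argument block:

// A user-defined control structure: it looks like syntax, but it is
// just a method. Neither parameter is evaluated until it is used.
def repeatUntil(body: => Unit)(condition: => Boolean): Unit = {
  body
  if (!condition) repeatUntil(body)(condition)
}

var i = 3
repeatUntil {
  println(i)  // prints 3 2 1
  i -= 1
} (i == 0)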

Spark (python) - explain the difference between user defined functions and simple functions

I am a Spark beginner. I am using Python and Spark dataframes. I just learned about user defined functions (udf) that one has to register first in order to use it.
Question: in what situation do you want to create a udf vs. just a simple (Python) function?
Thank you so much!
Your code will be neater if you use UDFs: udf takes a function and a return type (defaulting to string if none is given) and creates a column expression, which means you can write nice things like:
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

my_function_udf = udf(my_function, DoubleType())
myDf.withColumn("function_output_column", my_function_udf("some_input_column"))
This is just one example of how you can use a UDF to treat a function as a column. They also make it easy to introduce things like lists or maps into your function logic via a closure, which is explained very well here.

Head of empty list in Scala

I've made this recursive method in Scala that returns a list made of all distinct elements of another list.
object es20 extends App {
  def filledList: List[Int] = List()

  @scala.annotation.tailrec
  def distinct(l: List[Int]): List[Int] = {
    if (l.isEmpty) filledList
    if (filledList.forall(_ != l.head)) l.head :: filledList
    distinct(l.tail)
  }

  println(distinct(List(1, 1, 5, 6, 6, 3, 8, 3))) // Should print List(1,5,6,3,8)
}
However, when I compile the code and then I run it, there's this exception:
java.util.NoSuchElementException: head of empty list
I thought that this exception was handled by the condition if (l.isEmpty).
How can I fix the code?
In Scala, a method returns the last expression of its block. In your case you have three expressions: two if-expressions whose results are discarded, and the call to distinct. So both checks are evaluated and distinct(l.tail) is always called, whether or not the list is empty.
To fix it you can use an if/else construct, pattern match on the input list, or operate on headOption.
In any case, I doubt this code is correct: you are checking something against filledList, which is always empty.
You can fix this particular error by inserting else before the second if. However, as mentioned in the other answer, your code isn't correct and won't work anyway; you need to rewrite it.
Also, I understand that you are just trying to write this function as an exercise (if not, just use list.distinct), but I submit that implementing quadratic solutions to trivially linear problems is never a good exercise to begin with.
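For reference, here is one way the exercise could be written correctly, along the lines the answers suggest: a sketch using an accumulator parameter instead of the always-empty filledList (still quadratic because of contains, as the last answer points out).

@scala.annotation.tailrec
def distinct(l: List[Int], acc: List[Int] = Nil): List[Int] = l match {
  case Nil                       => acc.reverse      // keep first-seen order
  case h :: t if acc.contains(h) => distinct(t, acc) // skip duplicates
  case h :: t                    => distinct(t, h :: acc)
}

// distinct(List(1, 1, 5, 6, 6, 3, 8, 3))  // List(1, 5, 6, 3, 8)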

recursive variable needs type

I have code where I want to update an RDD as below:
val xRDD = xRDD.zip(tempRDD)
This gave me the error: recursive value xRDD needs type
I want to maintain the xRDD over iterations and modifying it with tempRDD in each iteration. How can I achieve it?
Thanks in advance.
The compiler is telling you that you're attempting to define a variable in terms of itself: xRDD is used in its own definition. To say this another way, you're attempting to use something that doesn't exist yet in order to define it.
Edit:
If you have a list of functions that each produce a new RDD you'd like to zip with the previous one, perhaps you should look at a fold:
listMyActions.foldLeft(origRDD) { (rdd, f) =>
  val tempRDD = f(rdd)
  // foldLeft needs a fixed accumulator type, so map the zipped pairs
  // back to the element type (summing here is just an assumed example).
  rdd.zip(tempRDD).map { case (a, b) => a + b }
}
Don't forget that vals are immutable, this means that you can't reassign something to a previously defined variable. However if you want to do this, you can replace it for a var, which is not recommended, this question is more related to Scala's feature than to Apache-Spark's one. Besides, if you want more information you can consult this post Use of def val and vars in scala.