Scala Class parameters - scala

I have recently come across a Scala class in my project which has curried parameters, like this:
class A(a: Int, b: Int)(c: Int) {
  /**
  Some definition
  **/
}
What is the advantage of this kind of parameterization?
When I create an object as
val obj = new A(10, 20)
I am getting a runtime error.
Can anyone explain?

What is the advantage of this kind of parameterization?
I can think of two possible advantages:
Delayed construction.
You can construct part of the instance in one area of the code base...
val almostA = new A(10, 20)(_: Int)
...and complete it in a different area of the code.
val realA = almostA(30)
Default parameter value.
Your example doesn't do this, but a curried parameter can reference an earlier parameter.
class A(a: Int, b: Int)(c: Int = a) { ...
When I create an object as val obj = new A(10, 20) I am getting a runtime error.
You should be getting a compile-time error. You either have to complete the constructor parameters, new A(10, 20)(30), or you have to leave a placeholder, new A(10, 20)(_: Int). You can leave the arguments of the second parameter group empty, new A(10, 20)(), only if it has a default value (see #2 above).
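Putting both points together, here is a small, self-contained sketch (only the class name A comes from the question; the rest is illustrative):
class A(a: Int, b: Int)(c: Int = a) {
  override def toString = s"A($a, $b)($c)"
}

// Delayed construction: fix the first parameter list now, supply the second later.
val almostA = new A(10, 20)(_: Int)   // Int => A
val realA   = almostA(30)             // A(10, 20)(30)

// Default parameter value: the second list falls back to the earlier parameter `a`.
val defaulted = new A(10, 20)()       // A(10, 20)(10)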

Related

Variable inside dataframe foreach gives null pointer exception in Scala

I'm having some issues when trying to execute a class function inside a "dataframe.foreach" function. My custom class is persisting the data into a DynamoDB table.
What happens is that if I have the following code, it won't work and will raise a "Null Pointer Exception" that points to the line of code where the "writer.writeRow(r)" is executed:
object writeToDynamoDB extends App {
  val df: DataFrame = ...
  val writer: DynamoDBWriter = new DDBWriter(...)
  df
    .foreach(
      r => writer.writeRow(r)
    )
}
If I use the same code, but with the code inside a code block or an if clause, it will work:
object writeToDynamoDB extends App {
  val df: DataFrame = ...
  if (true) {
    val writer: DynamoDBWriter = new DDBWriter(...)
    df
      .foreach(
        r => writer.writeRow(r)
      )
  }
}
I guess it has something to do with variable scope. Even in IntelliJ the color of the variable is purple + italic in the first case and "regular" grey in the second case. I read about it, and we have method, field and local scope in Scala, but I can't relate that to what I'm trying to do.
Some questions after this introduction:
Can anyone explain why Scala and/or Spark behave this way?
The solution here is to put the code inside a function, a code block or a "fake" if clause, as far as I know. Is there any possible issue regarding Spark properties (shuffles, etc.)?
Is there any other way to do this type of operations?
Hope I was clear.
Thanks in advance.
Regards
Your issue is caused by delayed initialization when using the App trait. The Spark docs strongly discourage extending App:
Note that applications should define a main() method instead of extending scala.App. Subclasses of scala.App may not work correctly.
The reason can be found in the Javadocs of the App trait itself:
It should be noted that this trait is implemented using the DelayedInit functionality, which means that fields of the object will not have been initialized before the main method has been executed.
This basically means that writer is a field of the object which is only assigned when the delayed-init body (the object's main) runs. On the executors that body never runs, so the closure passed to foreach reads the field while it is still uninitialized, i.e. null.
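A minimal, Spark-free sketch of that effect (hypothetical object names; assuming Scala 2, where App is implemented via DelayedInit):
object Holder extends App {
  // This assignment lives in the delayed-init body, which only runs when Holder's main() is invoked.
  val writer: String = "initialized"
}

object Check {
  def main(args: Array[String]): Unit =
    // Holder's main() never ran here, so the field still has its default value:
    println(Holder.writer) // prints "null"
}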
If you put the respective code into a block, writer becomes a local variable and is initialized at the time the block is evaluated. That way your closure captures the correct value of writer. In this case it no longer matters when the code is evaluated, because everything gets evaluated together.
The correct and recommended solution is to use a standard main method for your Spark applications:
object writeToDynamoDB {
  def main(args: Array[String]): Unit = {
    val df: DataFrame = ...
    val writer: DynamoDBWriter = new DDBWriter(...)
    df.foreach(r => writer.writeRow(r))
  }
}

Why this map function does not give traits' simple names

I am trying to get the names of all the traits a class extends using getInterfaces, which returns an array of the implemented interfaces. When I manually access each member of the array, the method getName returns simple names like this:
trait A
trait B
class C() extends A, B
val c = C()
val arr = c.getClass.getInterfaces
arr(0).getName // : String = A
arr(1).getName // : String = B
However, when I use the map function on arr, the resulting array contains a cryptic version of the traits' names:
arr.map(t => t.getName) // : Array[String] = Array(repl$.rs$line$1$A, repl$.rs$line$2$B)
The goal of this question is not how to get a resulting array that contains simple names (for that purpose, I can just use arr.map(t => t.getSimpleName)). What I'm curious about is why accessing the array manually and using map do not yield consistent results. Am I wrong to think that both ways are equivalent?
I believe you are running things in the Scala REPL or Ammonite.
When you define:
trait A
trait B
class C() extends A, B
classes A, B and C aren't defined at the top level of the root package. The REPL creates an isolated environment, compiles the code and loads the results into some inner "anonymous" namespace.
Except that isolation is not complete: where this bytecode was created is reflected in the class name. So apparently there was something similar (not necessarily identical) to:
// repl$ suggests an object
object repl {
  // .rs sounds like a nested object(?)
  object rs {
    // $line sounds like a nested class
    class line { /* ... */ }
    // $line$1 sounds like the first anonymous instance of line
    new line { trait A }
    // import from above
    // $line$2 sounds like the second anonymous instance of line
    new line { trait B }
    // import from above
    // ...
  }
}
which was done because of how scoping works in the REPL: each new line creates a new scope in which previous definitions are visible and new ones are added (possibly shadowing an old definition). This can be achieved by emitting each new piece of code as the body of a new anonymous class, compiling it, loading it onto the classpath, instantiating it and importing its content. By putting each new line into a separate class, the REPL is able to compile and run things in steps, without waiting for you to tell it that the script is complete and closed.
When you access class names with runtime reflection you see the artifacts of how things are being evaluated. One path might go through the REPL's prettifiers, which hide such things, while the other bypasses them, so you see the raw value as the JVM sees it.
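As a sanity check (not part of the original answer), the same reflection code compiled as a plain top-level program shows no wrapper prefixes:
trait A
trait B
class C extends A, B

@main def check(): Unit =
  val arr = C().getClass.getInterfaces
  println(arr.map(_.getName).toList)       // List(A, B): no repl$... prefixes outside the REPL
  println(arr.map(_.getSimpleName).toList) // List(A, B)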
The problem is not with map but rather with Array, specifically its toString method (which is one of the many reasons for not using Array).
Actually, in this case it is even worse, since the REPL does some weird things to try to pretty-print Arrays, which in this case didn't work well (and, IMHO, just adds to the confusion).
You can fix this problem by calling mkString directly, like:
val arr = c.getClass.getInterfaces
val result = arr.map(t => t.getName)
val text = result.mkString("[", ", ", "]")
println(text)
However, I would rather suggest not using Array at all; instead, convert it to a proper collection (e.g. List) as soon as possible, like:
val interfaces = c.getClass.getInterfaces.toList
interfaces.map(t => t.getName)
Note: about the other reasons for not using Arrays:
They are mutable.
They are invariant.
They are not part of the collections hierarchy, so you can't use them in generic methods (well, you actually can, but that requires more tricks).
Their equals is by reference instead of by value.
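A small illustrative sketch of that difference in a plain program (outside the REPL's pretty-printer):
val arr = Array("A", "B")
println(arr)                          // something like [Ljava.lang.String;@1b6d3586 (Array's default toString)
println(arr.toList)                   // List(A, B)
println(arr.mkString("[", ", ", "]")) // [A, B]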

Generic class wrapper in scala

Hello, I would like to create a generic wrapper in Scala in order to track changes to a value of any type. I haven't found any other way so far, so I was thinking of creating a class, and I have been trying to use Dynamic, but it has some limitations.
import scala.language.dynamics // needed to extend Dynamic without a feature error

case class Wrapper[T](value: T) extends Dynamic {
  private val valueClass = value.getClass

  def applyDynamic(id: String)(parameters: Any*) = {
    val objectParameters = parameters map (x => x.asInstanceOf[Object])
    val parameterClasses = objectParameters map (_.getClass)
    val method = valueClass.getMethod(id, parameterClasses: _*)
    val res = method.invoke(value, objectParameters: _*)
    // TODO: Logic that will eventually create some kind of event about the method invoked.
    new Wrapper(res)
  }
}
With this code I have trouble invoking the plus ("+") method on two integers, and I don't understand why. Isn't there a "+" method in the Int class? The error I am getting when I try the addition with either a Wrapper or an Int is:
var wrapped1 = Wrapper(1)
wrapped1 = wrapped1 + Wrapper(2) // or just 2
type mismatch;
found : Wrapper[Int]/Int
required: String
Why is it expecting a string?
If possible it would also be nice to be able to work with both the Wrapper[T] and the T methods seamlessly, e.g.
val a = Wrapper[Int](1)
val b = Wrapper[Int](2)
val c = 3
a + b // Wrapper[Int].+(Wrapper[Int])
a + c // Wrapper[Int].+(Int)
c + a // Int.+(Wrapper[Int])
Well, if you're trying to make a proxy that will catch any change to the desired values, you will probably fail without agents (https://dzone.com/articles/java-agent-1), because it would force you to modify bytecode so that code accepting final classes and primitives accepts your proxy instead, and it would require more than intercepting changes of "just the class": you would also have to cover the classes of all members, perform origin-of-value analysis, and so on. It is by no means a trivial problem.
Another approach is to produce diffs of case classes by comparing instances at certain points of execution. There is a generic implementation like that which uses derivation to compute diffs: https://github.com/ivan71kmayshan27/ShapelesDerivationExample. I believe you could come up with an easier solution with Magnolia. Actually, this one cannot work for plain (non-case) classes unless you write your own macro, and it has some problems regarding ordered versus unordered collections.
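A minimal sketch of that snapshot-comparison idea, assuming Scala 2.13+ (for productElementNames) and case classes only; the Person type and the diff helper are purely illustrative:
final case class Person(name: String, age: Int)

// Compare two snapshots of the same case class field by field.
def diff[A <: Product](before: A, after: A): List[String] =
  before.productElementNames
    .zip(before.productIterator)
    .zip(after.productIterator)
    .collect { case ((field, oldValue), newValue) if oldValue != newValue =>
      s"$field: $oldValue -> $newValue"
    }
    .toList

diff(Person("Ann", 30), Person("Ann", 31)) // List("age: 30 -> 31")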

Scala syntax in Kafka

I am reading the source code of the class kafka.core.log.LogSegment, where the Scala syntax confuses me a lot. I know I could make it clear if I learned Scala in a systematic way, but I just don't have that much time since my project awaits.
Definition of the methods:
@volatile private var _maxTimestampSoFar: Option[Long] = None // **#pos 0: constructor??**
def maxTimestampSoFar_=(timestamp: Long): Unit = _maxTimestampSoFar = Some(timestamp) // **definition 1**
def maxTimestampSoFar: Long = { // **definition 2**
  if (_maxTimestampSoFar.isEmpty)
    _maxTimestampSoFar = Some(timeIndex.lastEntry.timestamp)
  _maxTimestampSoFar.get
}
Where they are called:
if (largestTimestamp > maxTimestampSoFar) { // **#pos 3: getter**
  maxTimestampSoFar = largestTimestamp // **#pos 4: set the value?**
  offsetOfMaxTimestampSoFar = shallowOffsetOfMaxTimestamp
}
My confusion comes down to the following:
What is the purpose of a method with an extra "_" after its identifier, like the maxTimestampSoFar_ here?
When I checked the usages of definition 1 and definition 2, their occurrences overlap. Can I conclude that they are regarded as the same method, like overloaded twins? But since they have different parameters, why do we need a difference in the identifier?
As for the places where the method is called, is my understanding correct? Is pos 4 the place where definition 1 of the method is called? Is the argument then passed just by using "="?
If the second assumption is correct, then at pos 0 above, is that a call to Option's constructor? Is it like calling a default constructor?
Hope anyone can help me. Appreciate that.
The method name also contains the equals sign, so it is maxTimestampSoFar_=. That is how setters are defined in Scala (see "Scala getters/setters - best practice?").
Yes, what looks like an assignment at pos 4 will invoke the method defined in 1.
Option[Long] can contain either None or Some(<long value>); pos 0 in the code initializes the variable with the value None.
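A self-contained sketch of the getter/setter convention (simplified names, not the actual Kafka code):
class Segment {
  private var _maxTimestampSoFar: Option[Long] = None

  // Getter: called as `seg.maxTimestampSoFar`.
  def maxTimestampSoFar: Long = _maxTimestampSoFar.getOrElse(-1L)

  // Setter: `seg.maxTimestampSoFar = ts` is rewritten by the compiler into `seg.maxTimestampSoFar_=(ts)`.
  def maxTimestampSoFar_=(timestamp: Long): Unit =
    _maxTimestampSoFar = Some(timestamp)
}

val seg = new Segment
seg.maxTimestampSoFar = 42L    // invokes maxTimestampSoFar_=
seg.maxTimestampSoFar          // 42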

Scala reflection on function parameter names

I have a class which takes a function
case class FunctionParser1Arg[T, U](func:(T => U))
def testFunc(name1:String):String = name1
val res = FunctionParser1Arg(testFunc)
I would like to know the type signature information on the function from inside the case class. I want to know both the parameter name and type. I have had success in finding the type using the runtime mirror objects, but not the name. Any suggestions?
Ok, let's say you got the symbol for the instance func points to:
import scala.reflect.runtime.universe._
import scala.reflect.runtime.{currentMirror => m}
val im = m reflect res.func // Instance Mirror
You can get the apply method from its type members:
val apply = newTermName("apply")
val applySymbol = im.symbol.typeSignature member apply
And since we know it's a method, make it a method symbol:
val applyMethod = applySymbol.asMethod
Its parameters can be found through paramss, and since we know there's only one parameter in one parameter list, we can get the first parameter of the first parameter list:
val param = applyMethod.paramss(0)(0)
Then what you are asking for is:
val name = param.name.decoded // if you want "+" instead of "$plus", for example
val tpe = param.typeSignature // `type` is a keyword, so pick a different name for the val
It's possible that you think that's the wrong answer because you got x$1 instead of name1, but what is passed to the constructor is not the named method testFunc but an anonymous function representing that method, created through a process called eta expansion. You can't find out the parameter name of the method because you are not actually passing the method.
If that's what you need, I suggest you use a macro instead. With a macro, you'll be able to see exactly what is being passed at compile time and get the name from it.
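As a side note, here is a tiny illustration of the eta expansion mentioned above (no reflection involved):
def testFunc(name1: String): String = name1

// Eta expansion: the method is wrapped in an anonymous function value,
// roughly equivalent to `s => testFunc(s)`; the compiler's synthetic
// parameter gets a name like x$1, so `name1` is not preserved.
val f: String => String = testFunc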