There is an Apache Spark Scala project (runnerProject) that uses another one in the same package (sourceProject). The aim of the source project is to get the name and version of the Databricks job that is running.
The problem with the following method is that when it is called from the runnerProject, it returns the sourceProject's details, not the runnerProject's name and version.
sourceProject's method:
class EnvironmentInfo(appName: String) {
override def getJobDetails(): (String, String) = {
val classPackage = getClass.getPackage
val jobName = classPackage.getImplementationTitle
val jobVersion = classPackage.getImplementationVersion
(jobName, jobVersion)
}
}
runnerProject uses sourceProject as a package:
import com.sourceProject.environment.{EnvironmentInfo}
class runnerProject {
def start(
environment: EnvironmentInfo
): Unit = {
// using the returned parameters of the environment
}
How can this issue be worked around in a way that the getJobDetails() runs in the sourceProject, so that it can be called from other projects as well, not just the runnerProject. And also, it should return the details about the "caller" job.
Thank you in advance! :)
Try the following, it gets the calling class name from the stack trace, and uses that to get the actual class, and it's package.
class EnvironmentInfo(appName: String) {
override def getJobDetails(): (String, String) = {
val callingClassName = Thread.currentThread.getStackTrace()(2).getClassName
val classPackage = Class.forName(callingClassName).getPackage
val jobName = classPackage.getImplementationTitle
val jobVersion = classPackage.getImplementationVersion
(jobName, jobVersion)
}
}
It will work if you call it directly, it might give you the wrong package if you call it from within a lambda function.
Related
This is more of a Scala concept doubt than Spark. I have this Spark initialization code :
object EntryPoint {
val spark = SparkFactory.createSparkSession(...
val funcsSingleton = ContextSingleton[CustomFunctions] { new CustomFunctions(Some(hashConf)) }
lazy val funcs = funcsSingleton.get
//this part I want moved to another place since there are many many UDFs
spark.udf.register("funcName", udf {funcName _ })
}
The other class, CustomFunctions looks like this
class CustomFunctions(val hashConfig: Option[HashConfig], sark: Option[SparkSession] = None) {
val funcUdf = udf { funcName _ }
def funcName(colValue: String) = withDefinedOpt(hashConfig) { c =>
...}
}
^ class is wrapped in Serializable interface using ContextSingleton which is defined like so
class ContextSingleton[T: ClassTag](constructor: => T) extends AnyRef with Serializable {
val uuid = UUID.randomUUID.toString
#transient private lazy val instance = ContextSingleton.pool.synchronized {
ContextSingleton.pool.getOrElseUpdate(uuid, constructor)
}
def get = instance.asInstanceOf[T]
}
object ContextSingleton {
private val pool = new TrieMap[String, Any]()
def apply[T: ClassTag](constructor: => T): ContextSingleton[T] = new ContextSingleton[T](constructor)
def poolSize: Int = pool.size
def poolClear(): Unit = pool.clear()
}
Now to my problem, I want to not have to explicitly register the udfs as done in the EntryPoint app. I create all udfs as needed in my CustomFunctions class and then register dynamically only the ones that I read from user provided config. What would be the best way to achieve it? Also, I want to register the required udfs outside the main app but that throws my the infamous TaskNotSerializable exception. Serializing the big CustomFunctions is not a good idea, hence wrapped it up in ContextSingleton but my problem of registering udfs outside cannot be solved that way. Please suggest the right approach.
I have a number of test files, each with their own tests. Until now, each one has a trait in which I create a configuration object.
Building that object now takes the better part of two minutes, as it has to do a lot of work calling a number of databases (don't ask - really, it has to be done).
Is there a way to share this object (tldConfigMap, below) between multiple test suites in multiple files without having to build it over and over?
Here's how I was doing it - as you can see, when brought in as a trait, load() will be called every time:
trait TLDMapAcceptanceIsLoadedSpec extends org.scalatest.fixture.FlatSpecLike with TLDConfigMap {
val tldConfigMap: Map[String, TLDConfig] = load(withAttributes = true).right.get
type FixtureParam = Map[String, TLDConfig]
def withFixture(test: OneArgTest) = {
withFixture(test.toNoArgTest(tldConfigMap)) // "loan" the fixture to the test
}
}
You could just put it into an object (assuming it's really config and isn't changed by tests):
object TldConfigMap {
val tldConfigMap: Map[String, TLDConfig] = load(withAttributes = true).right.get
}
trait TLDMapAcceptanceIsLoadedSpec extends org.scalatest.fixture.FlatSpecLike with TLDConfigMap {
def tldConfigMap: Map[String, TLDConfig] = TldConfigMap.tldConfigMap // or just use it directly
type FixtureParam = Map[String, TLDConfig]
def withFixture(test: OneArgTest) = {
withFixture(test.toNoArgTest(tldConfigMap)) // "loan" the fixture to the test
}
}
Consider we have the following:
class Base { def name = "Base" }
class Successor extends Base {
override def name = "Successor"
}
I have tried to do the following (took from How to call a superclass method using Java reflection):
import java.lang.invoke.{MethodHandles, MethodHandle, MethodType}
object TestApp {
def main(args: Array[String]) {
val a = new Successor;
val h1 = MethodHandles.lookup().findSpecial(classOf[Base],
"name",
MethodType.methodType(classOf[String]),
classOf[Successor]);
println(h1.invoke(a));
}
}
but I get a runtime exception:
java.lang.IllegalAccessException: no private access for invokespecial: class Successor, from TestApp$
I was told that it is possible that Java reflection may not work correctly for Scala. Is it true? Or I simply do something wrong?
Actually, you can NOT even do it in Java. Note, in the answer of "How to call a superclass method using Java reflection", it works because the Test extends the Base: public class Test extends Base {...}.
It looks like it is possible and Java reflection works for Scala as well, I just didn't read all answers for How to call a superclass method using Java reflection.
The following code works:
object TestApp {
def main(args: Array[String]) {
val a = new Successor;
val impl_lookup = classOf[MethodHandles.Lookup].getDeclaredField("IMPL_LOOKUP")
impl_lookup.setAccessible(true)
val lkp = impl_lookup.get(null).asInstanceOf[MethodHandles.Lookup];
val h1 = lkp.findSpecial(classOf[Base],
"name",
MethodType.methodType(classOf[String]),
classOf[Successor])
println(h1.invoke(a)) // prints "Base"
println(a.name) // prints "Successor"
}
}
Thanks to Jesse Glick for this solution.
I need to write an integration test and it requires starting a server executable. I want to make location of the server configurable, so that I could set it on my box and on integration server.
ConfigMapWrapperSuite seems to be doing exactly what I want:
#WrapWith(classOf[ConfigMapWrapperSuite])
class ConsulTest(configMap: ConfigMap) extends FlatSpec with ShouldMatchers {
val consulPath = configMap("consul.path")
"Consul" should "list keys under root" in {
...
}
But when I set my IDE (IntelliJ) to execute all tests in the project, I get an exception saying that constructor with Map parameter not found. Looking into source code of scalatest revealed:
final class ConfigMapWrapperSuite(clazz: Class[_ <: Suite]) extends Suite {
private lazy val wrappedSuite = {
val constructor = clazz.getConstructor(classOf[Map[_, _]])
constructor.newInstance(Map.empty)
}
So in contrary to what documentation says, suite must have constructor with a Map and not ConfigMap.
Ok, I changed constructor to take a Map[String,String] but now I get NoSuchElementException at val consulPath = configMap("consul.path"). Lookung up the stack down to ConfigMapWrapperSuite and I see that constructor.newInstance(Map.empty) WTF? So wrapped suite class is instantiated with empty map, and than another time, during the suite run with actual map of parameters? How do I suppose to get parameters if I'm given an empty map?
I looked up scalatest's unit tests. They are so rudimentary that actually retrieving a value from configMap is not performed.
I do not want to use ConfigMapFixture because it will make me initializing every single test with the same code.
So, how do I not only pass but also get global setting in test suite?
Scalatest version: 3.0.0-M15
Ok, answering my own question. ConfigMapWrapperSuite seems to be not used too much and essentially is broken.
Instead I've used BeforeAndAfterAllConfigMap as in here:
class ConsulTest extends FlatSpec with ShouldMatchers with OneInstancePerTest with BeforeAndAfterAllConfigMap {
var consulProcess: Process = null
override def beforeAll(conf: ConfigMap): Unit = {
consulProcess = Seq("bin/"+exe, "agent", "-advertise", "127.0.0.1", "-config-file", "bin/config.json").run()
}
override def afterAll(conf: ConfigMap): Unit = {
consulProcess.destroy()
}
You need override instance
#WrapWith(classOf[ConfigMapWrapperSuite])
class AcceptanceTest(configMap: Map[String, Any])
extends FlatSpec
with OneInstancePerTest {
override def newInstance = new AcceptanceTest(configMap)
}
I've seen some questions regarding Scala and variable scoping (such as Scala variable scoping question)
However, I'm having trouble getting my particular use-case to work.
Let's say I have a trait called Repo:
trait Repo {
val source: String
}
And then I have a method to create an implementation of Repo...
def createRepo(source: String) =
new Repo {
val source: String = source
}
Of course I have two source variables in use, one at the method level and one inside of the Repo implementation. How can I refer to the method-level source from within my Repo definition?
Thanks!
Not sure if this is the canonical way, but it works:
def createRepo(source: String) = {
val sourceArg = source
new Repo {
val source = sourceArg
}
}
Or, you could just give your paramenter a different name that doesn't clash.
Or, make a factory:
object Repo {
def apply(src: String) = new Repo { val source = src }
}
def createRepo(source: String) = Repo(source)
In addition to the solutions of Luigi, you might also consider changing Repo from a trait to a class,
class Repo(val source: String)
def createRepo(source: String) = new Repo(source)