I'm trying to make one Scala program spawn another Scala program. I managed to obtain the java executable from System.getProperty("java.home"), I obtained some path from System.getProperty("java.class.path") (the sbt-launcher.jar location), and with a ClassLoader I got the project/target/scala-2.11/classes directory.
However, I am still unable to run the program. The JVM complains that it cannot find the Scala library's classes:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/concurrent/ExecutionContext
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scala.concurrent.ExecutionContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I am looking for a way to add those files to the classpath, but I want it to be portable. I am not looking for solutions like hardcoding the Scala location on the local computer, nor do I want to rely on environment variables or parameters other than the ones that already exist. I also don't want to rely on SBT's or Activator's presence in the user's environment.
Since the parent JVM process can use them, their location has to be stored somewhere, and I'll be thankful for help with finding that location.
To successfully spawn one Scala App from another I had to fix several issues with my code:
1. correct main class:
object ChildApp extends App {
  println("success")
}
To make sure that ChildApp is runnable by Java, it has to be an object. Scala has no concept of static, but an object's methods (main included) are compiled into static methods.
2. correct class name:
While ChildApp.getClass.getName returns ChildApp$, that name refers to the object itself (so that the otherwise static-method-only class can be passed around). On the java command line, though, the name must not have the $; in other words, I had to remove the trailing $ before passing it to the process builder.
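For illustration, a minimal sketch of that conversion (producing the mainName value used in the launch arguments later):
// ChildApp.getClass.getName returns "ChildApp$"; the class to launch
// on the java command line is the same name without the trailing '$'.
val mainName = ChildApp.getClass.getName.stripSuffix("$")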
3. complete class path
I haven't found all the used JARs within System.getProperty("java.class.path"):
val pcp = System getProperty "java.class.path" split File.pathSeparator // sbt-launcher.jar only
I haven't found them in SystemClassLoader either:
val scp = ClassLoader.getSystemClassLoader.asInstanceOf[URLClassLoader].getURLs.map(_.toString) // same as above
I did find the compiled files from my project using the Class' resources:
// format like jar:file:/(your-project/compiled.jar)!some/package/ChildApp.class
lazy val jarClassPathPattern = "jar:(file:)?([^!]+)!.+".r
// format like file:/(your-project/compiled/some/package/ChildApp).class
lazy val fileClassPathPattern = "file:(.+).class".r
// pathToClass is the ChildApp.class resource URL (one of the formats above);
// clazz is the ChildApp class
val jcp = jarClassPathPattern.findFirstMatchIn(pathToClass).map { matcher =>
  val jarDir = Paths.get(matcher.group(2)).getParent
  s"$jarDir/*"
}.toSet
val fcp = fileClassPathPattern.findFirstMatchIn(pathToClass).map { matcher =>
  val suffix = "/" + clazz.getName
  val fullPath = matcher.group(1)
  fullPath.substring(0, fullPath.length - suffix.length)
}.toList
Finally, I found where all those dependencies were stored:
// use App class' ClassLoader instead of system one
val lcp = ChildApp.getClass.getClassLoader.asInstanceOf[URLClassLoader].getURLs.map(_.toString)
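Combining everything found above into the single classPath string used later is then straightforward (a sketch; it assumes the jcp, fcp and lcp values from the snippets above):
import java.io.File

// Join all discovered locations with the platform path separator
// (':' on Unix, ';' on Windows).
val classPath = (jcp ++ fcp ++ lcp).mkString(File.pathSeparator)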
4. bonus - JVM params and java location
val jvmArgs = ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toList // needs scala.collection.JavaConverters._
lazy val javaHome = System getProperty "java.home"
lazy val java = Seq(
  Paths.get(javaHome, "bin", "java"),
  Paths.get(javaHome, "bin", "java.exe")
).filter(Files.exists(_)).head
Then you have everything you need for ProcessBuilder / Process:
val executable = java.toString
val arguments = jvmArgs ++ List("-cp", classPath, mainName) ++ mainClassArguments
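From there, actually spawning the child process can look like this (a sketch using scala.sys.process; executable and arguments are the values built above):
import scala.sys.process._

// Launch the child JVM and block until it exits; its output is
// forwarded to this process's console.
val exitCode = Process(executable +: arguments).!
println(s"child exited with code $exitCode")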
PS. I checked several times: those additional JARs are passed neither via the CLASSPATH environment variable nor with the -cp parameter (sbt-launcher.jar's MANIFEST file didn't have anything either). So if anyone knows how they are passed and why my solution actually works, please explain.
In the book Programming in Scala, 5th Edition, the author says the following about two classes:
Neither ChecksumAccumulator.scala nor Summer.scala are scripts, because they end in a definition. A script, by contrast, must end in a result expression.
The ChecksumAccumulator.scala is as follows:
import scala.collection.mutable

class CheckSumAccumulator:
  private var sum = 0
  def add(b: Byte): Unit = sum += b
  def checksum(): Int = ~(sum & 0xFF) + 1

object CheckSumAccumulator:
  private val cache = mutable.Map.empty[String, Int]
  def calculate(s: String): Int =
    if cache.contains(s) then
      cache(s)
    else
      val acc = new CheckSumAccumulator
      for c <- s do
        acc.add((c >> 8).toByte)
        acc.add(c.toByte)
      val cs = acc.checksum()
      cache += (s -> cs)
      cs
whereas the Summer.scala is as follows:
import CheckSumAccumulator.calculate

object Summer:
  def main(args: Array[String]): Unit =
    for arg <- args do
      println(arg + ": " + calculate(arg))
But when I run the Summer.scala file, I get a different error than the one mentioned by the author:
➜ learning-scala git:(main) ./scala3-3.0.0-RC3/bin/scala Summer.scala
-- [E006] Not Found Error: /Users/avirals/dev/learning-scala/Summer.scala:1:7 --
1 |import CheckSumAccumulator.calculate
| ^^^^^^^^^^^^^^^^^^^
| Not found: CheckSumAccumulator
longer explanation available when compiling with `-explain`
1 error found
Error: Errors encountered during compilation
➜ learning-scala git:(main)
The author suggested the error would be about the file not ending in a result expression.
I also tried to compile CheckSumAccumulator only and then run Summer.scala as a script without compiling it:
➜ learning-scala git:(main) ./scala3-3.0.0-RC3/bin/scalac CheckSumAccumulator.scala
➜ learning-scala git:(main) ✗ ./scala3-3.0.0-RC3/bin/scala Summer.scala
<No output, given no input>
➜ learning-scala git:(main) ✗ ./scala3-3.0.0-RC3/bin/scala Summer.scala Summer of love
Summer: -121
of: -213
love: -182
It works.
Obviously, when I compile both and then run Summer.scala, it works as expected. However, the distinction between Summer.scala as a script and as a normal file is unclear to me.
Let's start top-down...
The most common way to compile Scala is to use a build tool like SBT/Maven/Mill/Gradle/etc. The build tool helps with a few things: downloading dependencies/libraries, downloading the Scala compiler (optional), setting up the classpath, and most importantly running the scalac compiler and passing all flags to it. Additionally, it can package compiled class files into JARs and other formats and do much more. The most relevant parts here are the classpath and the compilation flags.
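For reference, this is roughly what those responsibilities look like as a minimal build.sbt (the version numbers and the dependency are illustrative):
// build.sbt: sbt downloads the compiler and the dependencies listed here,
// assembles the classpath, and invokes scalac with the given flags.
scalaVersion := "2.13.8"
libraryDependencies += "org.typelevel" %% "cats-core" % "2.9.0"
scalacOptions ++= Seq("-deprecation", "-feature")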
If you strip away the build tool, you can compile your project by manually invoking scalac with all the required arguments, making sure your working directory matches the package structure, i.e. that you are in the right directory. This can be tedious because you need to download all libraries manually and make sure they are on the classpath.
So far, the build tool and manual compiler invocation are very similar to what you can also do in Java.
If you want an ad-hoc way of running some Scala code, there are two options: the scala command lets you run scripts or a REPL, simply compiling your uncompiled code before executing it.
However, there are some caveats. Essentially the REPL and scripts are the same: Scala wraps your code in an anonymous object and then runs it. This way you can write any expression without having to follow the convention of a main function or the App trait (which provides main). The runner will compile the script you are trying to run, but it has no idea about imported classes; you can either compile them beforehand or make one large script that contains all the code. Of course, if it starts getting too large, it's time to make a proper project.
So in a sense there is no such thing as a script vs a normal file, because both contain Scala code. The file you run with scala is a script if it is uncompiled code (XXX.scala) and a "normal" compiled class (XXX.class) otherwise. If you ignore the object wrapping mentioned above, the rest is the same; only the steps to compile and run differ.
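As a tiny illustration, here is a hypothetical script file (assuming the Scala 2 script runner, where args is in scope):
// Greet.scala - run without compiling: scala Greet.scala World
// No object or main method is needed; the runner wraps this expression
// in an anonymous object before compiling and running it.
println("Hello, " + args.mkString(" "))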
Here is the traditional Scala 2.x runner code snippet with all the possible options:
def runTarget(): Option[Throwable] = howToRun match {
  case AsObject =>
    ObjectRunner.runAndCatch(settings.classpathURLs, thingToRun, command.arguments)
  case AsScript if isE =>
    ScriptRunner(settings).runScriptText(combinedCode, thingToRun +: command.arguments)
  case AsScript =>
    ScriptRunner(settings).runScript(thingToRun, command.arguments)
  case AsJar =>
    JarRunner.runJar(settings, thingToRun, command.arguments)
  case Error =>
    None
  case _ =>
    // We start the repl when no arguments are given.
    if (settings.Wconf.isDefault && settings.lint.isDefault) {
      // If user is agnostic about -Wconf and -Xlint, enable -deprecation and -feature
      settings.deprecation.value = true
      settings.feature.value = true
    }
    val config = ShellConfig(settings)
    new ILoop(config).run(settings)
    None
}
This is what's getting invoked when you run scala.
In Dotty/Scala 3 the idea is similar but split into multiple classes (a REPL driver and a script runner), and the classpath logic might be different; the script runner invokes the REPL.
I have a multi-module project structured like this:
+- multiModuleProject
   +- module1
   +- dir1
      +- subDirModule1
      +- subDirModule2
   +- module3
   +- build.sbt
I want both subDirModule1 and subDirModule2 to be their own modules outright.
I added something like this to the build.sbt
lazy val subDir1 = Project(id = "dir1/subDirModule1", base = file("dir1/subDirModule1")
lazy val subDir1 = Project(id = "dir1/subDirModule2", base = file("dir1/subDirModule2")
I can't get it to work; I keep getting:
[error] java.lang.RuntimeException: Invalid project ID: Expected ID character
[error] dir1/subDirModule1
[error] ^
But I'm sure I've seen a slash being used in another project I've worked on. What's going wrong here?
The slash is used as a separator between the project ID and a configuration or task (and has been for a long time), so I suspect you are misremembering; if slashes were allowed in IDs, you'd have to escape them all the time, and I at least don't remember ever seeing that. You can of course use a slash in the path (the base argument), just not in the ID:
lazy val subDir1 = Project(id = "subDir1", base = file("dir1/subDirModule1"))
and then use e.g.
sbt> subDir1/compile
You can of course use whatever name you want, but usually the val name and the id will be the same.
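For completeness, a minimal build.sbt sketch with both submodules and a root project that aggregates them (names are illustrative):
lazy val subDir1 = Project(id = "subDir1", base = file("dir1/subDirModule1"))
lazy val subDir2 = Project(id = "subDir2", base = file("dir1/subDirModule2"))

// Aggregation makes tasks run on the root also run on the submodules.
lazy val root = Project(id = "multiModuleProject", base = file("."))
  .aggregate(subDir1, subDir2)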
I'm trying to execute an HDFS-specific command from inside a Scala script executed by Spark in cluster mode. Below is the command:
import scala.sys.process._

val cmd = Seq("hdfs", "dfs", "-copyToLocal", "/tmp/file.dat", "/path/to/local")
val result = cmd.!!
The job fails at this stage with the error as below:
java.io.FileNotFoundException: /var/run/cloudera-scm-agent/process/2087791-yarn-NODEMANAGER/log4j.properties (Permission denied)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
However, when I run the same command separately in Spark shell, it executes just fine and the file is copied as well.
scala> val cmd = Seq("hdfs","dfs","-copyToLocal","/tmp/file_landing_area/file.dat","/tmp/local_file_area")
cmd: Seq[String] = List(hdfs, dfs, -copyToLocal, /tmp/file_landing_area/file.dat, /tmp/local_file_area)
scala> val result = cmd.!!
result: String = ""
I don't understand the permission denied error, especially since it shows up as a FileNotFoundException. Totally confusing.
Any ideas?
As per the error, it is trying to read log4j.properties from the /var/run/cloudera-scm-agent process directory, which I suspect is a configuration issue, or it is not pointing to the correct file.
Executing HDFS commands via Seq and .!! is not good practice. It is useful only in the Spark shell; using the same approach in code is not advisable. Instead, use the Hadoop FileSystem API to move data from or to HDFS. The sample code below is just for reference and might help you.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val hdfspath = new Path("hdfs:///user/nikhil/test.csv")
val localpath = new Path("file:///home/cloudera/test/")
// Obtain the FileSystem for the HDFS path and copy the file locally
val fs = hdfspath.getFileSystem(conf)
fs.copyToLocalFile(hdfspath, localpath)
Please use the link below for more reference regarding the Hadoop FileSystem API.
https://hadoop.apache.org/docs/r2.9.0/api/org/apache/hadoop/fs/FileSystem.html#copyFromLocalFile(boolean,%20boolean,%20org.apache.hadoop.fs.Path,%20org.apache.hadoop.fs.Path)
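The reverse direction works the same way (a sketch reusing conf and fs from above; the paths are illustrative):
// Upload a local file into HDFS with the same FileSystem handle.
fs.copyFromLocalFile(
  new Path("file:///home/cloudera/test/test.csv"),
  new Path("hdfs:///user/nikhil/")
)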
I wrote a Spark Streaming application built with sbt. It works perfectly fine locally, but after deploying it on the cluster, it complains about a class I wrote which is clearly in the fat jar (checked using jar tvf). The following is my project structure; the XXX object is the one Spark complains about:
src
`-- main
    `-- scala
        |-- packageName
        |   `-- XXX object
        `-- mainMethodEntryObject
My submit command:
$SPARK_HOME/bin/spark-submit \
  --class mainMethodEntryObject \
  --master REST_URL \
  --deploy-mode cluster \
  hdfs:///FAT_JAR_PRODUCED_BY_SBT_ASSEMBLY
Specific error message:
java.lang.NoClassDefFoundError: Could not initialize class XXX
I ran into this issue for a reason similar to this user:
http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-td18972.html
I was calling a method on an object that had a few variables defined on the object itself, including spark and a logger, like this
val spark = SparkSession
  .builder()
  .getOrCreate()
val logger = LoggerFactory.getLogger(this.getClass.getName)
The function I was calling called another function on the object, which called another function, which called yet another function on the object, inside of a flatMap call on an RDD.
I was getting the NoClassDefFoundError in a stack trace where the previous two function calls were functions on the class that Spark was telling me did not exist.
Based on the conversation linked above, my hypothesis was that the global spark reference wasn't getting initialized by the time the function that used it was called (the one that resulted in the NoClassDefFoundError).
After quite a few experiments, I found that this pattern worked to resolve the problem.
// Move global definitions here
object MyClassGlobalDef {
  val spark = SparkSession
    .builder()
    .getOrCreate()
  val logger = LoggerFactory.getLogger(this.getClass.getName)
}

// Force the globals object to be initialized
import MyClassGlobalDef._

object MyClass {
  // Functions here
}
It's kind of ugly, but Spark seems to like it.
It's difficult to say without the code, but it looks like a problem with serialization of your XXX object. I can't say I understand perfectly why, but the point is that the object is not shipped to the executor.
The solution that worked for me is to convert your object into a class that extends Serializable and just instantiate it where you need it. So basically, if I'm not wrong, you have
object test {
  def foo = ...
}
which would be used as test.foo in your main, but you need at minimum
class Test extends Serializable {
  def foo = ...
}
and then in your main have val test = new Test at the beginning and that's it.
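Put together, a self-contained sketch of the pattern (class and method names are illustrative):
import org.apache.spark.sql.SparkSession

class Test extends Serializable {
  def foo(s: String): String = s.reverse // placeholder logic
}

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    val test = new Test // instantiated once in the driver
    // test is captured by the closure, serialized, and shipped to executors
    spark.sparkContext.parallelize(Seq("ab", "cd"))
      .map(test.foo)
      .collect()
      .foreach(println)
  }
}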
It is related to serialization. I fixed this by making the given class extend Serializable and adding a serialVersionUID field to it.
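In Scala that looks roughly like this (a sketch; the class and method are hypothetical):
// @SerialVersionUID pins the ID so serialized instances stay compatible
// across recompilations; the annotation lives in the scala package.
@SerialVersionUID(1L)
class MyHelper extends Serializable {
  def process(s: String): String = s.trim
}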
I have created two files in my src/main/resources folder
application-dev.conf
which contains
dev {
  oracle {
    host = "foo"
  }
}
and
application-qa.conf
which contains
qa {
  oracle {
    host = "bar"
  }
}
I read this configuration with the following code
import java.io.File
import com.typesafe.config.ConfigFactory

val env = args.lift(0).getOrElse("dev")
val parsedConfig = ConfigFactory.parseFile(new File(s"src/main/resources/application-${env}.conf"))
val conf = ConfigFactory.load(parsedConfig)
val config = conf.getConfig(env)
println(config.getString("oracle.host"))
Everything works great at development time and I am able to read the right configuration file based on the environment specified; if I don't specify anything, dev is chosen.
However, when I package my application as an assembly using sbt assembly and then run it from the command line with java -jar ./target/MyApplication.jar, I get an error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'dev'
at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:170)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
My objective is that I should have multiple configuration files and I should be able to choose between them at dev time and also when the application is packaged as a jar.
Based on the suggestion below I modified my code to
val env = args.lift(0).getOrElse("dev")
val parsedConfig = ConfigFactory.parseFile(new File(getClass.getResource(s"/application-${env}.conf").toURI))
val conf = ConfigFactory.load(parsedConfig)
val config = conf.getConfig(env)
println(config.getString("oracle.host"))
It works in dev, but when I run the assembly it throws an exception:
Exception in thread "main" java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)
The config file is not in src/main/resources in your assembly. It usually is at the root of the classpath (unless you have configured the plugin to package it somewhere else).
Try using it like below:
ConfigFactory.load(s"application-${env}.conf")
This loads the config file from the classpath.
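Putting it together, the loading code becomes something like this sketch, which works both at development time and from the assembled jar, since in both cases the file is a classpath resource:
import com.typesafe.config.ConfigFactory

val env = args.lift(0).getOrElse("dev")
// load() resolves the resource from the classpath: src/main/resources
// during development, the jar root in the assembly.
val conf = ConfigFactory.load(s"application-${env}.conf")
val config = conf.getConfig(env)
println(config.getString("oracle.host"))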