I'm using Apache Toree (the version from GitHub). When I try to execute a query against a PostgreSQL table, I get intermittent Scala compiler errors (when I run the same cell twice, the errors are gone and the code runs fine).
I am looking for advice on how to debug these errors. The errors look weird (they appear in the notebook, not on stdout).
error: missing or invalid dependency detected while loading class file 'QualifiedTableName.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'QualifiedTableName.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'FunctionIdentifier.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'FunctionIdentifier.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'DefinedByConstructorParams.class'.
...
The code is simple: extract a dataset from a postgres table:
%AddDeps org.postgresql postgresql 42.1.4 --transitive
val props = new java.util.Properties()
props.setProperty("driver", "org.postgresql.Driver")

val df = spark.read.jdbc(
  url = "jdbc:postgresql://postgresql/database?user=user&password=password",
  table = "table",
  predicates = Array("1=1"),
  connectionProperties = props)

df.show()
I checked the obvious things (both Toree and Apache Spark use Scala 2.11.8, and I built Apache Toree with APACHE_SPARK_VERSION=2.2.0, which matches the Spark I downloaded).
For reference, this is the part of the Dockerfile I used to set up Toree and Spark:
RUN wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz && tar -zxf spark-2.2.0-bin-hadoop2.7.tgz && chmod -R og+rw /opt/spark-2.2.0-bin-hadoop2.7 && chown -R a1414.a1414 /opt/spark-2.2.0-bin-hadoop2.7
RUN (curl https://bintray.com/sbt/rpm/rpm > /etc/yum.repos.d/bintray-sbt-rpm.repo)
RUN yum -y install --nogpgcheck sbt
RUN (unset http_proxy; unset https_proxy; yum -y install --nogpgcheck java-1.8.0-openjdk-devel.i686)
RUN (git clone https://github.com/apache/incubator-toree && cd incubator-toree && make clean release APACHE_SPARK_VERSION=2.2.0 ; exit 0)
RUN (. /opt/rh/rh-python35/enable; cd /opt/incubator-toree/dist/toree-pip ;python setup.py install)
RUN (. /opt/rh/rh-python35/enable; jupyter toree install --spark_home=/opt/spark-2.2.0-bin-hadoop2.7 --interpreters=Scala)
I had a similar issue, but it appeared to resolve itself by merely reevaluating the cell in the Jupyter notebook, or by restarting the kernel and then reevaluating the cell. Annoying.
As said in cchantep's comment, you are probably using a different Scala version than the one used to build Spark.
The easiest solution is to check which version Spark uses and switch to it, e.g. on a Mac:
brew switch scala 2.11.8
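If you are not sure which Scala version a given Spark distribution was built with, a quick check (a sketch; the exact banner wording can vary between Spark releases) is:
spark-shell --version
whose startup banner includes a "Using Scala version ..." line; or, from inside a running spark-shell, print the version of the Scala runtime the REPL is using:
scala> scala.util.Properties.versionString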
Related
If you follow the steps at the official Scala 3 sites, like Dotty or Scala Lang, they recommend using Coursier to install Scala 3. The problem is that neither of these explains how to run a compiled Scala 3 application after following the steps.
Scala 2:
> cs install scala
> scalac HelloScala2.scala
> scala HelloScala2
Hello, Scala 2!
Scala 3:
> cs install scala3-compiler
> scala3-compiler HelloScala3.scala
Now how do you run the compiled application with Scala 3?
Currently there does not seem to be a way to launch a runner for Scala 3 using coursier, see this issue. As a workaround, you can install the binaries from the GitHub release page. Scroll all the way down past the contribution list to find the .zip file, download it, and unpack it to some local folder. Then put the unpacked bin directory on your PATH. After a restart you will get the scala command (and scalac etc.) in the terminal.
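For example, on Linux or macOS the steps might look roughly like this (the archive and folder names are illustrative; use whatever the release page actually contains):
> unzip scala3-3.0.0.zip -d ~/scala3
> export PATH="$PATH:$HOME/scala3/scala3-3.0.0/bin"
> scalac HelloScala3.scala
> scala HelloScala3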
Another workaround is using the java runner directly with a classpath from coursier by this command:
java -cp $(cs fetch -p org.scala-lang:scala3-library_3:3.0.0):. myMain
Replace myMain with the name of your #main def function. If it is in a package myPack you need to say myPack.myMain (as usual).
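For reference, a minimal sketch of such an entry point (the file, package, and method names are only illustrative):
// HelloScala3.scala
package myPack

@main def myMain(): Unit =
  println("Hello, Scala 3!")
After compiling it with scala3-compiler, the java command above would be invoked with myPack.myMain as the main class.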
Finally, it seems that it is possible to run a Scala application the same way as with Scala 2, using the scala3 launcher from Coursier:
cs install scala3
Then, you can compile it with scala3-compiler and run it with scala3:
scala3-compiler Main.scala
scala3 Main.scala
This work-around seems to work for me:
cs launch scala3-repl:3+ -M dotty.tools.MainGenericRunner -- YourScala3File.scala
This way, you don't even have to compile the source code first.
In case your source depends on third-party libraries, you can specify the dependencies like this:
cs launch scala3-repl:3+ -M dotty.tools.MainGenericRunner -- -classpath \
$(cs fetch --classpath io.circe:circe-generic_3:0.14.1):. \
YourScala3File.scala
This would be an example where you use the circe library that's compiled with Scala 3. You should be able to specify multiple third-party libraries with the fetch sub-command.
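For instance, fetching more than one dependency could look like this (the libraries and versions are only illustrative):
cs launch scala3-repl:3+ -M dotty.tools.MainGenericRunner -- -classpath \
  $(cs fetch --classpath io.circe:circe-generic_3:0.14.1 org.typelevel:cats-core_3:2.6.1):. \
  YourScala3File.scala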
There are multiple binary-incompatible Scala 2 versions; however, the documentation says installation is done either via an IDE or via sbt.
DOWNLOAD SCALA 2
Then, install Scala:...either by installing an IDE such as IntelliJ, or sbt, Scala's build tool.
Spark 3 needs Scala 2.12.
Spark 3.1.2 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
Then how can we make sure the Scala version is 2.12 if we install sbt?
Or is the documentation inaccurate, and should it instead say "to use a specific version of Scala, you need to download that Scala version on your own"?
Updates
As per the answer by mario-galic, in ONE-CLICK INSTALL FOR SCALA it is said:
Installing Scala has always been a task more challenging than necessary, with the potential to drive away beginners. Should I install Scala itself? sbt? Some other build tools? What about a better REPL like Ammonite? Oh and before all that I need to install Java?
To solve this problem, the Scala Center contracted Alexandre Archambault in January 2020 to add a one-click install of Scala through coursier. For example, on Linux, all we now need is:
$ curl -Lo cs https://git.io/coursier-cli-linux && chmod +x cs && ./cs setup
The Scala version is specified in the build.sbt file so SBT will download the appropriate version of Scala as necessary.
I personally use SDKMAN! to install Java and then SBT.
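For the Spark 3 case above, a minimal build.sbt might look like this (the exact versions are illustrative; sbt downloads the matching Scala automatically, so no system-wide Scala install is needed):
// build.sbt
scalaVersion := "2.12.15"

// %% appends the _2.12 binary suffix that matches scalaVersion
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2" % "provided"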
The key concept to understand is the difference between system-wide installation and project-specific version. System-wide installation ends up somewhere on the PATH like
/usr/local/bin/scala
and can be installed in various ways, personally I recommend coursier one-click install for Scala
curl -Lo cs https://git.io/coursier-cli-linux && chmod +x cs && ./cs setup
Project-specific versions are specified by the scalaVersion sbt setting, which downloads Scala to the coursier cache location. To see the Scala version and location used by a particular project, try show scalaInstance; inspect scalaInstance describes the underlying task:
inspect scalaInstance
[info] Task: sbt.internal.inc.ScalaInstance
[info] Description:
[info] Defines the Scala instance to use for compilation, running, and testing.
Scala should be binary compatible within a minor version, so Spark 3 (or any other software) built against some Scala 2.12.x version should work with any other Scala 2.12.x version (versions follow major.minor.patch). Note that binary compatibility is not guaranteed for internal compiler APIs, so, for example, when publishing compiler plugins the best practice is to publish against the full, specific Scala version. Notice how the kind-projector compiler plugin is published against the full Scala version 2.13.6:
https://repo1.maven.org/maven2/org/typelevel/kind-projector_2.13.6/
whilst cats-core, an application-level library, is published against any Scala 2.13.x version:
https://repo1.maven.org/maven2/org/typelevel/cats-core_2.13/
Similarly, Spark is published against any Scala 2.12.x version:
https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/
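In build.sbt terms, the difference shows up roughly like this (versions are illustrative):
// application-level library: %% resolves the _2.13 binary suffix automatically
libraryDependencies += "org.typelevel" %% "cats-core" % "2.6.1"

// compiler plugin: published per full Scala version, hence CrossVersion.full
addCompilerPlugin("org.typelevel" % "kind-projector" % "0.13.2" cross CrossVersion.full)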
Regarding system-wide installation, one trick I use for quickly switching versions is to put scala-runners on the PATH; different versions can then be launched via the --scala-version argument:
scala --scala-version 2.12.14
Using coursier or scala-runners you can even switch the JDK quickly via -C--jvm, for example:
scala --scala-version 2.12.14 -C--jvm=11
For a project, there should be no need to manually download a specific version of Scala. sbt, either directly or indirectly via an IDE, will download all the dependencies behind the scenes for you, so the only thing to specify is the sbt setting scalaVersion.
Using Python as an analogy to Scala, and Pipenv as an analogy to sbt, python_version in the Pipfile is similar to scalaVersion in build.sbt. After executing pipenv shell and pipenv install you end up with a project-specific shell environment with a project-specific Python version and dependencies. sbt similarly downloads a project-specific Scala version and dependencies based on build.sbt, although it has no need for lock files or for modifying your shell environment.
So I am editing MovieLensALS.scala and I want to recompile just the examples jar with my modified MovieLensALS.scala.
I used build/mvn -pl :spark-examples_2.10 compile followed by build/mvn -pl :spark-examples_2.10 package, both of which finish normally. I have SPARK_PREPEND_CLASSES=1 set.
But when I re-run MovieLensALS using bin/spark-submit --class org.apache.spark.examples.mllib.MovieLensALS examples/target/scala-2.10/spark-examples-1.4.0-hadoop2.4.0.jar --rank 5 --numIterations 20 --lambda 1.0 --kryo data/mllib/sample_movielens_data.txt, I get a java.lang.StackOverflowError, even though all I added to MovieLensALS.scala is a println saying that this is the modified file, with no other modifications whatsoever.
My scala version is 2.11.8 and spark version is 1.4.0 and I am following the discussion on this thread to do what I am doing.
Help will be appreciated.
So I ended up figuring it out myself. I compiled using mvn compile -rf :spark-examples_2.10 followed by mvn package -rf :spark-examples_2.10 to generate the .jar file. Note that the jar file produced here is spark-examples-1.4.0-hadoop2.2.0.jar.
On the other hand, the StackOverflowError was caused by a long lineage. For that I could either use checkpoints or reduce numIterations; I did the latter. I followed this for it.
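For the checkpointing alternative, a minimal sketch (the checkpoint directory is an assumption; use any path writable by the job):
// With a checkpoint directory set, MLlib's ALS periodically checkpoints its
// intermediate RDDs, which truncates the long lineage that otherwise overflows the stack.
sc.setCheckpointDir("/tmp/spark-checkpoints")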
We are creating a customized version of Spark, since we are changing some lines of code in ALS.scala. We build the customized Spark version using the following command:
./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
However, upon using the customized version of Spark, we run into this error:
Do you guys have some idea on what causes the error and how we might solve the issue?
I am actually using a jar file on the local machine, built with sbt (sbt compile then sbt clean package) and placed here: /Users/user/local/kernel/kernel-0.1.5-SNAPSHOT/lib.
However, in the Hadoop environment the installation is different, so I use Maven to build Spark, and that's where the error comes in. I am thinking that this error might depend on using Maven to build Spark, as there are some reports like this:
https://issues.apache.org/jira/browse/SPARK-2075
or probably on building spark assembly files
I installed Scala^Z3 on my Mac OSX (Mountain Lion, JDK 7, Scala 2.10, Z3 4.3) successfully (following this: http://lara.epfl.ch/w/ScalaZ3). Everything went fine except that I cannot run any example from this website (http://lara.epfl.ch/w/jniz3-scala-examples) without getting this nasty error:
java.lang.NoClassDefFoundError: scala/reflect/ClassManifest
at .<init>(<console>:8)
at .<clinit>(<console>)
at .<init>(<console>:7)
...
Caused by: java.lang.ClassNotFoundException: scala.reflect.ClassManifest
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 29 more
I think this happens because of an incompatibility between Scala 2.9.x and 2.10.x in handling reflection, since I was able to run the same set of examples under Scala 2.9.x. My question is: is there any way to get around this and run Scala^Z3 under Scala 2.10?
From looking at the project properties and build file (https://github.com/psuter/ScalaZ3/blob/master/project/build.properties and https://github.com/psuter/ScalaZ3/blob/master/project/build/scalaZ3.scala) I infer that ScalaZ3 is currently provided for Scala 2.9.2 only. There is no cross-version support at the moment.
You might try to get the code and compile it yourself after changing the version to Scala 2.10.0 in the "build.properties" file.
See this page for instructions on how to compile it: https://github.com/psuter/ScalaZ3.
If you're lucky, the code will compile as is under scala 2.10. If you're not, there might be some small fixes to do. Cross your fingers.
If you are not in a hurry, you could also bug the Scala^Z3 authors and ask them for scala 2.10 version of the library.
I'm copying the instructions from my response to your issue on GitHub, as it may help someone in the future.
The current status is that the old sbt project does not seem to mix well with Scala 2.10. Here are the instructions for a "manual" compilation of the project, for Linux. This works for me with Z3 4.3 (grabbed from the Z3 git repo) and Scala 2.10. After installing Z3 per the original instructions:
First compile the Java files:
$ mkdir bin
$ find src/main -name '*.java' -exec javac -d bin {} +
Then compile the C files. For this, you need to generate the JNI headers first, then compile the shared library. The options in the commands below are for Linux. To find out where the JNI headers are, I run new java.io.File(System.getProperty("java.home")).getParent in a Scala console (and add /include to the result).
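That is, something along these lines in the REPL (the resulting path of course depends on your JDK installation):
scala> new java.io.File(System.getProperty("java.home")).getParent + "/include"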
$ javah -classpath bin -d src/c z3.Z3Wrapper
$ gcc -o lib-bin/libscalaz3.so -shared -Wl,-soname,libscalaz3.so \
-I/usr/lib/jvm/java-6-sun-1.6.0.26/include \
-I/usr/lib/jvm/java-6-sun-1.6.0.26/include/linux \
-Iz3/4.3/include -Lz3/4.3/lib \
-g -lc -Wl,--no-as-needed -Wl,--copy-dt-needed -lz3 -fPIC -O2 -fopenmp \
src/c/*.[ch]
Now compile the Scala files:
$ find src/main -name '*.scala' -exec scalac -classpath bin -d bin {} +
You'll get "feature warnings", which is typical when moving to 2.10, and another warning about a non-exhaustive pattern match.
Now let's make a jar file out of everything...
$ cd bin
$ jar cvf scalaz3.jar z3
$ cd ..
$ jar uf bin/scalaz3.jar lib-bin/libscalaz3.so
...and now you should have bin/scalaz3.jar containing everything you need. Let's try it out:
$ export LD_LIBRARY_PATH=z3/4.3/lib
$ scala -cp bin/scalaz3.jar
scala> z3.scala.version
Hope this helps!
This does not directly answer the question, but might help others trying to build scalaz3 with scala 2.10.
I built ScalaZ3 with Scala 2.10.1 and Z3 4.3.0 on Windows 7. I tested it with the integer constraints example at http://lara.epfl.ch/w/jniz3-scala-examples and it is working fine.
Building Z3
The Z3 4.3.0 download at CodePlex does not include the libZ3.lib file, so I had to download the source and build it on my machine. The build process is quite simple.
Building ScalaZ3
Currently, build.properties has sbt version 0.7.4 and Scala version 2.9.2. This builds fine. (I had to make some minor modifications to the build.scala file: modify z3LibPath(z3VN).absolutePath to z3LibPath(z3VN).absolutePath + "\\libz3.lib" in the gcc task.)
Now, if I change the Scala version to 2.10.1 in build.properties, I get an "Error compiling sbt component 'compiler-interface'" error on launching sbt. I have no clue why this happens.
I then changed the sbt version to 0.12.2 and the Scala version to 2.10.1, and started with fresh source. I also added a build.sbt in the project root folder containing scalaVersion := "2.10.1". This is required because in sbt 0.12.2 the build.properties file is supposed to specify only the sbt version. More info about the sbt version differences is at https://github.com/harrah/xsbt/wiki/Migrating-from-SBT-0.7.x-to-0.10.x.
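In other words, the added file is simply (a sketch):
// build.sbt in the project root
scalaVersion := "2.10.1"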
I then get the error Z3Wrapper.java:27: cannot find symbol LibraryChecksum. This happens because the file LibraryChecksum.java, which is supposed to be generated by the build (project\build\build.scala), is not generated. It looks like the package task does not execute the tasks in project\build\build.scala: the compute-checksum, javah, and gcc tasks are not run. This may be happening because sbt 0.12.2 expects the build.scala file to be directly under the project folder.
I then copied the LibraryChecksum.java generated by the previous build, and the build then goes through. The generated jar file does not contain scalaz3.dll.
I then executed the javah and gcc tasks manually. The commands for these tasks can be copied from the log of a successful build with Scala 2.9.2 (I made appropriate modifications to the commands for Scala 2.10.1). Here too I had to make some changes: I had to explicitly add the full path of scala-library.jar to the classpath of the javah task.
I then added the lib-bin\scalaz3.dll to the jar file using jar uf target\scala-2.10\scalaz3.jar lib-bin/scalaz3.dll.