We have recently upgraded our Azure Databricks runtime from 6.4 to 7.3 LTS, and the issue is that my Scala bulk insert for SQL Server MI is not working anymore.
The code I am running is as below:
%scala
// Config comes from the old azure-sqldb-spark connector
import com.microsoft.azure.sqldb.spark.config.Config

val bulkCopyConfig = Config(Map(
  "url"               -> jdbchostURL,
  "databaseName"      -> jdbcDatabase,
  "user"              -> jdbcUsername,
  "password"          -> jdbcPassword,
  "dbTable"           -> jdbcTableName,
  "bulkCopyBatchSize" -> "200000",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))
The error that I am getting is:
java.lang.NoClassDefFoundError: scala/Product$class
I have checked GitHub and other sites, but no specific answer to the problem is available.
Please help.
I have tried multiple solutions but to no avail. I am expecting a solution that can fix the above error and make it possible to run Scala 2.11 code on Scala 2.12 easily.
This kind of error is usually caused by an incompatible connector. Make sure that your SQL Server connector is compiled for Scala 2.12 (com.microsoft.azure:spark-mssql-connector_2.12:1.1.0 for DBR 7.3/Spark 3.0; see the compatibility table on GitHub). It's impossible to use a connector compiled for Scala 2.11 on DBR 7.3.
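For reference, a minimal write with the Scala 2.12 connector could look like the sketch below. The df, jdbchostURL, jdbcUsername, jdbcPassword and jdbcTableName names are placeholders matching the question, and the option names should be double-checked against the spark-mssql-connector documentation for the version you actually install; the old azure-sqldb-spark bulkCopy* options no longer apply.

// Sketch only: write a DataFrame through the Spark 3.0 / Scala 2.12 connector
// (com.microsoft.azure:spark-mssql-connector_2.12:1.1.0 attached to the cluster)
df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("append")
  .option("url", jdbchostURL)          // JDBC URL, including the database name
  .option("dbtable", jdbcTableName)
  .option("user", jdbcUsername)
  .option("password", jdbcPassword)
  .option("tableLock", "true")         // rough equivalent of bulkCopyTableLock
  .option("batchsize", "200000")       // rough equivalent of bulkCopyBatchSize
  .save()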
P.S. It makes sense to move to DBR 10.4 already, as 7.3 will reach end of life in about half a year.
Related
I am new to Apache Spark, and I am using Scala and MongoDB to learn it.
https://docs.mongodb.com/spark-connector/current/scala-api/
I am trying to read an RDD from my MongoDB database; my notebook script is as below:
import com.mongodb.spark.config._
import com.mongodb.spark._
val readConfig = ReadConfig(Map("uri" -> "mongodb+srv://root:root@mongodbcluster.td5gp.mongodb.net/test_database.test_collection?retryWrites=true&w=majority"))
val testRDD = MongoSpark.load(sc, readConfig)
print(testRDD.collect)
At the print(testRDD.collect) line, I got this error:
java.lang.NoSuchMethodError:
com.mongodb.internal.connection.Cluster.selectServer(Lcom/mongodb/selector/ServerSelector;)Lcom/mongodb/internal/connection/Server;
And more than 10 lines of "at ...".
Used libraries:
org.mongodb.spark:mongo-spark-connector_2.12:3.0.1
org.mongodb.scala:mongo-scala-driver_2.12:4.2.3
Is this a problem with MongoDB's internal libraries, and how can I fix it?
Many thanks
I suspect that there is a conflict between mongo-spark-connector and mongo-scala-driver. The former uses Mongo Java driver 4.0.5, but the latter is based on version 4.2.3. I would recommend trying with mongo-spark-connector only.
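For example, a minimal dependency setup (a sketch, assuming an sbt build; in a notebook, attach just the single org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 coordinate) would keep only the connector and drop the mongo-scala-driver entry, letting the connector pull in the Mongo Java driver version it was built against:

// build.sbt (sketch): only the Spark connector, no separate mongo-scala-driver
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "3.0.1"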
I was also facing the same problem. I solved it using the mongo-spark-connector_2.12:3.0.1 jar, and along with that I also added the Scalaj HTTP 2.4.2 jar. It's working fine now.
I'm trying to run the main class in a Scala sbt project. Running the class results in a StackOverflowError; the stack trace is pasted below.
I am sure this is not a code issue, because for the same project I was able to run sbt package by setting the memory in the sbt conf file as described here: https://stackoverflow.com/q/55874883.
I tried to set the parameters in IntelliJ > Settings > Scala Compile Server, but it didn't help to overcome the error.
JDK: Profile Default
JVM maximum heap size, MB: 2024
JVM options: -server -Xmx2G -Xss20m -XX:MaxPermSize=1000m -XX:ReservedCodeCacheSize=1000m
IntelliJ:
IntelliJ IDEA 2019.1 (Community Edition)
Build #IC-191.6183.87, built on March 27, 2019
JRE: 1.8.0_202-release-1483-b39 amd64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Windows 10 10.0
sbt version: 1.2.8
Scala version: 2.11.8
Error:scalac: Error: org.jetbrains.jps.incremental.scala.remote.ServerException
java.lang.StackOverflowError
at scala.tools.nsc.transform.TailCalls$TailCallElimination.transform(TailCalls.scala:273)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.transform(TailCalls.scala:209)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.noTailTransform(TailCalls.scala:214)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.transform(TailCalls.scala:403)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.transform(TailCalls.scala:209)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.noTailTransform(TailCalls.scala:214)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.transform(TailCalls.scala:403)
at scala.tools.nsc.transform.TailCalls$TailCallElimination.transform(TailCalls.scala:209)
redacted...
I've had the same issue with IntelliJ IDEA Ultimate 2020.1.1.
None of the above options worked for me, but with a hint from the above answer I found there was also a different setting for the Scala compiler; after changing it, the error stopped.
I increased the stack size of the Scala Compile Server: Preferences -> Compiler -> Scala Compiler -> Scala Compile Server. Then change the JVM options accordingly (in my case, -server -Xss1024m).
I've had the same issue with IntelliJ IDEA Community Edition 2019.3.4.
In the end, what worked for me was this solution. Basically, go to Settings -> Build, Execution, Deployment -> Compiler
Then, on "User-local build process VM options", set the stack size to a greater value with -Xss.
In my case, I finally managed to run the tests by setting it to -Xss2048m.
I hope this helps.
Go to Configure → Edit Custom VM Options and add your changes there
If you're able to run sbt package (so you have sufficient heap size and -Xss configured for sbt) but running the class still throws java.lang.StackOverflowError, try going to
Settings -> Build, Execution, Deployment -> sbt
and tick
sbt shell -> use for: project reload and builds
Try those options.
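For completeness, if the StackOverflowError also shows up when compiling through sbt itself rather than through the IDE's compile server, the stack size can be raised for sbt's own JVM as well, for example with a .jvmopts file in the project root containing one JVM option per line (a sketch; the exact mechanism depends on your sbt launcher, and the values are only examples):

-Xmx2G
-Xss64m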
I am trying to run my Scala job on my local machine (a MacBook Pro, macOS 10.13.3) and I am getting an error at runtime.
My versions:
scala: 2.11.12
spark: 2.3.0
hadoop: 3.0.0
I installed everything through brew.
The exception is:
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
It happens at these lines:
val conf = new SparkConf()
  .setAppName(getName)
  .setMaster("local[2]")
val context = new SparkContext(conf)
The last line is where the exception is thrown.
My theory is that the Hadoop and Spark versions aren't working together, but I can't find online which Hadoop version should be used with Spark 2.3.0.
Thank you.
So I figured out my problem.
So first, yes, I don't need Hadoop installed. Thanks for pointing that out.
And second, I had Java 10 installed instead of Java 8. Removing it solved the rest of the problems.
Thank you everyone!
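If anyone else hits the same StringIndexOutOfBoundsException, a quick sanity check is to print which JVM the Spark driver is actually running on; Spark 2.3 expects Java 8, so the version string should look like 1.8.0_x (the exception apparently comes from code that parses the version string assuming that format). A minimal sketch:

// Sketch: confirm the JVM Spark runs on is Java 8 (version string 1.8.0_x)
println(System.getProperty("java.version"))
println(System.getProperty("java.home"))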
I have my Scala code running in Spark, connecting to Neo4j on my Mac. I wanted to test it on my Windows machine but cannot seem to get it to run; I keep getting this error:
Spark context Web UI available at http://192.168.43.4:4040
Spark context available as 'sc' (master = local[*], app id = local-1508360735468).
Spark session available as 'spark'.
Loading neo4jspark.scala...
<console>:23: error: object neo4j is not a member of package org
import org.neo4j.spark._
^
Which gives subsequent errors of:
changeScoreList: java.util.List[Double] = []
<console>:87: error: not found: value neo
val initialDf2 = neo.cypher(noBbox).partitions(5).batch(10000).loadDataFrame
^
<console>:120: error: not found: value neo
I am not sure what I am doing wrong. I am executing it like this:
spark-shell --conf spark.neo4j.bolt.password=TestNeo4j --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 -i neo4jspark.scala
It says it finds all the dependencies, yet the code throws the error when using neo. I am not sure what else to try, or why this doesn't work on my Windows box but does on my Mac. Spark version 2.2 is the same, Neo4j is up and running with the same version, Scala too, even Java (save for a few minor revision differences).
This is a known issue (with a related one here), the fix for which is part of the Spark 2.2.1 release.
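Once you are on a Spark release that contains the fix and the --packages resolution works, the usage from the question should look roughly like this (a sketch based on the neo4j-spark-connector 2.0.x API; noBbox is the Cypher query string from the original neo4jspark.scala):

import org.neo4j.spark._

// Connection settings come from the spark.neo4j.bolt.* conf options
// passed to spark-shell; build the helper from the active SparkContext
val neo = Neo4j(sc)

val initialDf2 = neo.cypher(noBbox).partitions(5).batch(10000).loadDataFrame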
I’m having a weird issue with sbt-assembly if anyone could help
When trying to create fat jar to deploy to Spark with shading applied to shapeless libraries, I am seeing some classes not being renamed when ran in an Ubuntu machine while everything gets renamed fine when the sbt assembly is ran in Mac.
Here’s the shading config
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("shapeless.**" -> "shadedshapeless.@1")
    .inLibrary("com.chuusai" % "shapeless_2.11" % "2.3.2")
    .inLibrary("com.github.pureconfig" % "pureconfig_2.11" % "0.7.0")
    .inProject
)
When run on a Mac, these classes are renamed for the pattern shapeless/Generic*:
Renamed shapeless/Generic$.class -> shadedshapeless/Generic$.class
Renamed shapeless/Generic.class -> shadedshapeless/Generic.class
Renamed shapeless/Generic1$.class -> shadedshapeless/Generic1$.class
Renamed shapeless/Generic1$class.class -> shadedshapeless/Generic1$class.class
Renamed shapeless/Generic1.class -> shadedshapeless/Generic1.class
Renamed shapeless/Generic10$class.class -> shadedshapeless/Generic10$class.class
Renamed shapeless/Generic10.class -> shadedshapeless/Generic10.class
Renamed shapeless/Generic1Macros$$anonfun$1.class -> shadedshapeless/Generic1Macros$$anonfun$1.class
Renamed shapeless/Generic1Macros$$anonfun$2.class -> shadedshapeless/Generic1Macros$$anonfun$2.class
Renamed shapeless/Generic1Macros.class -> shadedshapeless/Generic1Macros.class
Renamed shapeless/GenericMacros$$anonfun$23.class -> shadedshapeless/GenericMacros$$anonfun$23.class
Renamed shapeless/GenericMacros.class -> shadedshapeless/GenericMacros.class
but when run on Ubuntu, for the pattern shapeless/Generic* only these are renamed:
Renamed shapeless/GenericMacros$$anonfun$23.class -> shadedshapeless/GenericMacros$$anonfun$23.class
Renamed shapeless/Generic1Macros$$anonfun$1.class -> shadedshapeless/Generic1Macros$$anonfun$1.class
Renamed shapeless/Generic1$.class -> shadedshapeless/Generic1$.class
Renamed shapeless/Generic1.class -> shadedshapeless/Generic1.class
I chose the pattern shapeless/Generic* because when I provide the fat jar (produced on Ubuntu) to spark-submit, I get the error right away (probably coming from PureConfig):
Exception in thread "main" java.lang.NoClassDefFoundError: shadedshapeless/Generic
No error occurs when the fat jar produced on the Mac is fed to spark-submit.
I'm not sure why shading works differently on Ubuntu than on macOS, but one issue that I see is that you are shading only part of shapeless. I don't think this is a good idea, and it can give you problems because you'll mix different versions of shapeless. My suggestion is to try shading shapeless entirely for PureConfig. Just add
assemblyShadeRules in assembly := Seq(ShadeRule.rename("shapeless.**" -> "new_shapeless.@1").inAll)
to your sbt file. I have tested this solution on Ubuntu 16.04 with PureConfig 0.7.2 and Spark 2.1.0. See the FAQ section of PureConfig about this issue.
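As a quick way to check that the rule actually took effect on both machines, you could list the relevant entries of the assembled jar, for example with a small sketch like this (the jar path is hypothetical; adjust it to your project's assembly output):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Print every shapeless-related entry; after shading, only renamed entries should remain
val jar = new JarFile("target/scala-2.11/myapp-assembly.jar")
jar.entries().asScala
  .map(_.getName)
  .filter(n => n.startsWith("shapeless/") || n.startsWith("new_shapeless/"))
  .foreach(println)
jar.close()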