databricks-connect, py4j.protocol.Py4JJavaError: An error occurred while calling o342.cache - pyspark

The connection to Databricks works fine, and working with DataFrames goes smoothly (operations like join, filter, etc.).
The problem appears when I call cache on a DataFrame.
py4j.protocol.Py4JJavaError: An error occurred while calling o342.cache.
: failed to read class descriptor
Caused by: java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$client53442a94a3$$anonfun$mapPartitions$1$$anonfun$apply$23
at java.lang.ClassLoader.findClass(
at org.apache.spark.util.ParentClassLoader.findClass(
at java.lang.ClassLoader.loadClass(
at org.apache.spark.util.ParentClassLoader.loadClass(
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(
at org.apache.spark.util.Utils$.classForName(Utils.scala:257)
at org.apache.spark.sql.util.ProtoSerializer$$anon$4.readClassDescriptor(ProtoSerializer.scala:4304)
... 71 more
I use Java 8 as required; clearing pycache doesn't help.
The same code submitted as a job to Databricks works fine.
It looks like a local problem at the Python-JVM bridge level, but the Java version (8) and the Python version (3.7) are as required. Switching to Java 13 produces much the same message.
Versions: databricks-connect==6.2.0, openjdk version "1.8.0_242", Python 3.7.6
The behavior depends on how the DataFrame is created: if its source is external, it works fine; if it is created locally, the error appears.
# works fine
df = spark.read.csv("dbfs:/some.csv")
df.cache()
# ERROR in 'cache' line
df = spark.createDataFrame([("a",), ("b",)])
df.cache()

This is a known issue, and I think a recent patch fixed it. It was seen on Azure; I am not sure whether you are using Azure or AWS, but it has been solved. Please check the issue.


Apache Flink Kryo serializer - ClassNotFoundException

I have a project in Apache Flink 1.8.1, with Scala 2.11 and Java 8. I used to use Maven for compiling and all the dependency management, but switched to Gradle... which leads me to this problem below:
at java.lang.ClassLoader.loadClass(
... 3 frames excluded
at c.e.k.u.DefaultClassResolver.readName(
... 15 common frames omitted
Wrapped by: c.e.kryo.KryoException: Unable to find class:
Serialization trace:
eventOutputTag (
at c.e.k.u.DefaultClassResolver.readName(
at c.e.k.u.DefaultClassResolver.readClass(
at c.e.kryo.Kryo.readClass(
at c.e.kryo.Kryo.readClassAndObject(
at o.a.f.a.j.t.r.k.KryoSerializer.deserialize(
at o.a.f.s.r.s.StreamElementSerializer.deserialize(
at o.a.f.s.r.s.StreamElementSerializer.deserialize(
First, the error message has a missing 'c'. The class path should be ''... I checked the files using that code and there's no missing 'c' in my import statements...
I also edited the Flink conf file to use a parent-first strategy...
Further background info:
I have another file called ProjectContext, which has an ArrayList<ProjectPayload>. It also has the eventOutputTag (as mentioned in the serialization trace)... When I comment out the ArrayList<ProjectPayload> and its getters/setters, EVERYTHING WORKS!
When I put the instance variable and its getters/setters back in ProjectContext, the ClassNotFoundException occurs...
Furthermore, I sprinkled in tons of print statements, and I was able to create an instance of ProjectPayload and log it out fine.
### Edit (June 30, 2020) ###
In light of this serialization issue, I added this code:
env.getConfig.registerTypeWithKryoSerializer(classOf[ProjectPayload], classOf[JavaSerializer[ProjectPayload]])
and now I have this awkward (but similar) error:
"j.l.ClassNotFoundException: \u0005sr\\"v
at java.lang.ClassLoader.loadClass(
... 3 frames excluded
at c.e.k.u.DefaultClassResolver.readName(
... 15 common frames omitted
Wrapped by: c.e.kryo.KryoException: Unable to find class: \u0005sr\\"v
Serialization trace:
allMyPayloads (
at c.e.k.u.DefaultClassResolver.readName(
at c.e.k.u.DefaultClassResolver.readClass(
at c.e.kryo.Kryo.readClass(
at c.e.kryo.Kryo.readClassAndObject(
at o.a.f.a.j.t.r.k.KryoSerializer.deserialize(
at o.a.f.s.r.s.StreamElementSerializer.deserialize(
at o.a.f.s.r.s.StreamElementSerializer.deserialize(
at o.a.f.r.p.NonReusingDeserializationDelegate....
It turns out \u0005 is the Unicode character 'ENQUIRY', and \u00008 leads to gibberish in Google search results... I will report back later.
### Edit (July 1, 2020) ###
Some progress: I was initializing the ArrayList<ProjectPayload> inside the ProjectContext. When I removed that initialization, moved it outside, and then set the ArrayList value, my code got much further along. Then it complained about a HashMap<String, String> instance variable as well. I ended up deleting it since it wasn't used.
That now brings me to an IndexOutOfBoundsException:
j.l.IndexOutOfBoundsException: Index: 93, Size: 9
at java.util.ArrayList.rangeCheck(
at java.util.ArrayList.get(
at c.e.k.u.MapReferenceResolver.getReadObject(
at c.e.kryo.Kryo.readReferenceOrNull(
at c.e.kryo.Kryo.readObjectOrNull(
... 12 common frames omitted
Wrapped by: c.e.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 93, Size: 9
Serialization trace:
fooBarStr (
at c.e.kryo.Kryo.readClassAndObject(
at o.a.f.a.j.t.r.k.KryoSerializer.deserialize(
at o.a.f.s.r.s.StreamElementSerializer.deserialize(
at o.a.f.s.r.s.StreamElementSerializer.deserialize(
at o.a.f.r.i.n.a.s.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRec...
and this Github issue on Kryo:
Try this:
env.getConfig.registerTypeWithKryoSerializer(classOf[ProjectPayload], classOf[JavaSerializer[ProjectPayload]])
env.getConfig.registerTypeWithKryoSerializer(classOf[ProjectContext], classOf[JavaSerializer[ProjectContext]])
and make sure you are importing

WSO2 Integration Studio v6.5.0's built-in Kafka template throws NoClassDefFoundError

I've installed WSO2 Integration Studio version 6.5.0 on my Windows workstation and created a project using the built-in Kafka Consumer and Producer template.
Then I configured the project with my own Kafka server settings (topic name "myTopic").
I then right-clicked the composite application and chose Export Project Artifacts and Run.
The Console window displayed at the very top the following messages:
[2019-06-25 09:23:45,499] [micro-integrator] INFO - LibraryArtifactDeployer Synapse Library named '{org.wso2.carbon.connector}kafkaTransport' has been deployed from file : C:\IntegrationStudio\runtime\microesb\tmp\carbonapps\-1234\\kafkaTransport-connector_2.0.6\
[2019-06-25 09:23:45,517] [micro-integrator] INFO - SynapseImportFactory Successfully created Synapse Import: kafkaTransport
[2019-06-25 09:23:45,533] [micro-integrator] ERROR - ClassMediatorFactory
Error in instantiating class :
java.lang.NoClassDefFoundError: org/apache/kafka/common/header/Headers
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(
at java.lang.Class.getConstructor0(
[snipped rest for clarity]
I've tried reinstalling Integration Studio and running it with elevated rights, to no avail.
I expected the project to be deployed normally.
EDIT: after copying:
to the EI_HOME/lib directory, the exception changed to:
org.apache.axis2.deployment.DeploymentException: kafka/consumer/ConsumerTimeoutException
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(
at org.wso2.carbon.application.deployer.synapse.SynapseAppDeployer.deployArtifactType(
at org.wso2.carbon.application.deployer.synapse.SynapseAppDeployer.deployArtifacts(
at org.wso2.carbon.application.deployer.internal.ApplicationManager.deployCarbonApp(
at org.wso2.carbon.application.deployer.CappAxis2Deployer.deploy(
at org.apache.axis2.deployment.repository.util.DeploymentFileData.deploy(
at org.apache.axis2.deployment.DeploymentEngine.doDeploy(
at org.apache.axis2.deployment.repository.util.WSInfoList.update(
[snipped for clarity]
Caused by: org.apache.axis2.deployment.DeploymentException: kafka/consumer/ConsumerTimeoutException
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(
... 87 more
Caused by: java.lang.NoClassDefFoundError: kafka/consumer/ConsumerTimeoutException
at org.wso2.carbon.inbound.endpoint.protocol.kafka.KAFKAPollingConsumer.startsMessageListener(
at org.wso2.carbon.inbound.endpoint.protocol.kafka.KAFKAProcessor.init(
at org.apache.synapse.inbound.InboundEndpoint.init(
at org.apache.synapse.deployers.InboundEndpointDeployer.deploySynapseArtifact(
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(
... 87 more
Caused by: java.lang.ClassNotFoundException: kafka.consumer.ConsumerTimeoutException cannot be found by synapse-core_2.1.7.wso2v111
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
... 92 more
Have you copied the required jars from the kafka_home/libs folder to EI_home/lib? If yes, please share your code so we can look at the issue in detail.
According to this documentation, the recommended version of Kafka is kafka_2.9.2- You can download it from the link below. Please copy those jars to EI_HOME/lib. There is a GitHub issue for this as well.
I may be a bit late, but we have been using the custom inbound endpoint for Kafka. We faced exactly the same issue, and that was the only way to fix it.
You could use to configure it.

How to save data-frame in MySQL using PySpark

I am new to Apache Spark. I have a use case where I have to save data frame data in MySQL. I got the below code to do the same:
But when I ran the code, I got the below error:
File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
I might be missing some minute detail. How can I fix this?
The error clearly indicates that Spark cannot locate the JDBC driver class. You will have to include the JAR file that provides com.mysql.jdbc.Driver using
pyspark --jars <jar-file-location>
See this question - How to add third-party Java JAR files for use in PySpark.
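Before passing a JAR with --jars, it can help to confirm that the archive really contains the class named in the exception. The following is a minimal sketch using only the Python standard library. The helper name jar_contains_class and the demo file are hypothetical, and the demo archive is an empty stand-in built on the spot, not a real MySQL connector.

```python
import os
import tempfile
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR (a zip archive) has an entry for class_name."""
    # A class such as com.mysql.jdbc.Driver is stored in the archive
    # as the entry com/mysql/jdbc/Driver.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Demo with a stand-in archive (hypothetical file, not a real connector JAR).
demo_dir = tempfile.mkdtemp()
demo_jar = os.path.join(demo_dir, "demo-connector.jar")
with zipfile.ZipFile(demo_jar, "w") as jar:
    jar.writestr("com/mysql/jdbc/Driver.class", b"")

print(jar_contains_class(demo_jar, "com.mysql.jdbc.Driver"))
print(jar_contains_class(demo_jar, "oracle.jdbc.OracleDriver"))
```

If the check fails for the JAR you intend to pass, the ClassNotFoundException would be expected regardless of how the JAR is added.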

Spark executor is throwing error "java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver"

I am trying to import a table from my Oracle database using Spark, and I am using Scala to do it.
My JDBC driver is ojdbc7.jar, and it is added to both the spark.driver.extraClassPath and spark.executor.extraClassPath parameters in the configuration file:
spark.driver.extraClassPath :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/s
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/s
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
I can successfully import the table and print its schema. But any operation such as count or show() throws the error below:
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
at java.lang.ClassLoader.findClass(
at java.lang.ClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
... 21 more
This error occurred because Spark could not locate ojdbc7.jar on every core node. Placing the jar in a shared location such as /usr/lib/spark/jars resolves the issue.
You can also do a few other things, including adding the jar file's full path as a dependency (artifact) under the spark section of the interpreter settings.
If you just want %jdbc to work, update the jdbc section of the interpreter settings: add the jar file's full path as an artifact under the dependencies, and update default.driver, default.url, default.user, and default.password accordingly.
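As an alternative to copying the jar onto every node, spark-submit has a --jars option (a comma-separated list of jars to include on the driver and executor classpaths) and a --driver-class-path option. The sketch below only assembles such a command line so it can be inspected. The function name, the application jar, the main class, and the driver path are hypothetical, and whether this is sufficient on a given cluster is not confirmed by the answer above.

```python
import shlex


def spark_submit_with_jars(app_jar, main_class, extra_jars):
    """Build a spark-submit argument list that ships extra JARs with the job."""
    jars = ",".join(extra_jars)        # --jars takes a comma-separated list
    driver_cp = ":".join(extra_jars)   # class paths are colon-separated on Linux
    return [
        "spark-submit",
        "--class", main_class,
        "--jars", jars,
        "--driver-class-path", driver_cp,
        app_jar,
    ]


# Hypothetical paths and class name, for illustration only.
args = spark_submit_with_jars(
    "my-app.jar", "com.example.ImportTable", ["/opt/jdbc/ojdbc7.jar"]
)
print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems with paths that contain spaces.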

'new HiveContext' is wanting an X11 display? com.trend.iwss.jscan?

Spark 1.6.2 (YARN master)
Package name: com.example.spark.Main
Basic SparkSQL code
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
conf.setAppName("SparkSQL w/ Hive")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._
// val rdd = <some RDD making>
val df = rdd.toDF()
And stacktrace...
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(
at java.awt.Window.<init>(
at java.awt.Frame.<init>(
at java.awt.Frame.<init>(
at com.trend.iwss.jscan.runtime.BaseDialog.getActiveFrame(
at com.trend.iwss.jscan.runtime.AllowDialog.make(
at com.trend.iwss.jscan.runtime.PolicyRuntime.showAllowDialog(
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopActionInner(
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopAction(
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopAction(
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime.checkURL(
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime._preFilter(
at com.trend.iwss.jscan.runtime.PolicyRuntime.preFilter(
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime.preFilter(
at org.apache.commons.logging.LogFactory$
at Method)
at org.apache.commons.logging.LogFactory.getProperties(
at org.apache.commons.logging.LogFactory.getConfigurationFile(
at org.apache.commons.logging.LogFactory.getFactory(
at org.apache.commons.logging.LogFactory.getLog(
at org.apache.hadoop.hive.shims.HadoopShimsSecure.<clinit>(
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(
at org.apache.hadoop.hive.shims.ShimLoader.createShim(
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(
at org.apache.spark.sql.hive.client.ClientWrapper.overrideHadoopShims(ClientWrapper.scala:116)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:69)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
at java.lang.reflect.Constructor.newInstance(
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:345)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:233)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:236)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.example.spark.Main1$.main(Main.scala:52)
at com.example.spark.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/ will be used
This same code was working a week ago on a fresh HDP cluster, and it works fine in the sandbox... the only thing I remember doing was trying to change around the JAVA_HOME variable, but I am fairly sure I undid those changes.
I'm at a loss - not sure how to start tracking down the issue.
The cluster is headless, so of course it has no X11 display, but what part of new HiveContext even needs to pop up a JFrame?
Based on the logs, I'd guess I messed up the Java configuration and something within org.apache.hadoop.hive.shims.HadoopShimsSecure.<clinit>( got triggered, so a Java security dialog is trying to appear, but I don't know.
I can't do X11 forwarding. I tried export SPARK_OPTS="-Djava.awt.headless=true" before a spark-submit, and that didn't help.
Tried these, but again, can't forward and don't have a display
Getting a HeadlessException: No X11 DISPLAY variable was set
"No X11 DISPLAY variable" - what does it mean?
The error seems to be reproducible on two of the Spark clients.
Only on one machine did I try changing JAVA_HOME.
Did an Ambari Hive service-check. Didn't fix it.
Can connect fine to Hive database via Hive/Beeline CLI
As far as the Spark code is concerned, this seemed to mitigate the error.
val conf = new SparkConf()
conf.set("spark.executor.extraJavaOptions" , "-Djava.awt.headless=true")
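The setting above reaches the executor JVMs. The stack trace in the question passes through SparkSubmit.main, which suggests the exception was raised in the driver JVM, so it may also be worth passing the same flag at launch with spark-submit's --driver-java-options. This is a sketch under that assumption, not something the answer confirms. The function name and the jar name are hypothetical; the main class is the one named in the question.

```python
HEADLESS = "-Djava.awt.headless=true"


def headless_submit_args(app_jar, main_class):
    """Build spark-submit arguments that request headless AWT on both sides."""
    return [
        "spark-submit",
        "--class", main_class,
        # Driver JVM options must be given at launch time.
        "--driver-java-options", HEADLESS,
        # Executor JVM options can be passed as a regular Spark property.
        "--conf", "spark.executor.extraJavaOptions=" + HEADLESS,
        app_jar,
    ]


# The jar name is a placeholder.
cmd = headless_submit_args("spark-app.jar", "com.example.spark.Main")
print(cmd)
```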
Original answer
Found this post: Spring 3.0.5 - java.awt.HeadlessException - com.trend.iwss.jscan
Basically, Trend Micro is inserting a com.trend.iwss.jscan package into the JAR files that are downloaded via Maven through a company firewall, and I have no control over that.
(link not working)
Wayback Machine to the rescue...
If anyone else has input, I would also like to hear it.
When some .JAR files are downloaded via IWSA, a directory filled with .class files unrelated to what is being downloaded is added to the JAR file (com\trend\iwss\jscan\runtime\).
This happens because, if a JAR file is originally unsigned, IWSA inserts some code into the applet to monitor and restrict potentially harmful actions.
For IWSS/IWSA, every "get" request is the same, so it cannot know whether you are downloading an archive or an applet that will be executed by your browser.
This code is added for security reasons, to monitor the behavior of the "possible" applet and make sure that it does not harm the machine or its environment.
To prevent this issue, please follow these steps:
1. Log on to the IWSS web console.
2. Go to HTTP > Applets and ActiveX > Policies > Java Applet Security Rules.
3. Under Java Applet Security, change the value of "No signature" to either "Pass" or "Block", depending on what you want to do with the unsigned .JAR files.
4. Click Save.
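JARs that were already downloaded through the gateway keep the injected classes, so it may be useful to find them in a local repository and download them again afterwards. The following sketch walks a directory and reports every JAR with entries under com/trend/iwss/jscan/. The function name and the demo files are hypothetical; in practice the root might be a local Maven repository such as ~/.m2/repository.

```python
import os
import tempfile
import zipfile

INJECTED_PREFIX = "com/trend/iwss/jscan/"


def find_injected_jars(root):
    """Return sorted paths of JARs under root that contain the injected package."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if any(e.startswith(INJECTED_PREFIX) for e in jar.namelist()):
                        hits.append(path)
            except zipfile.BadZipFile:
                # Skip files that are not valid zip archives.
                continue
    return sorted(hits)


# Demo with two stand-in archives (hypothetical files).
demo_root = tempfile.mkdtemp()
clean = os.path.join(demo_root, "clean.jar")
tainted = os.path.join(demo_root, "tainted.jar")
with zipfile.ZipFile(clean, "w") as jar:
    jar.writestr("org/example/App.class", b"")
with zipfile.ZipFile(tainted, "w") as jar:
    jar.writestr("com/trend/iwss/jscan/runtime/BaseDialog.class", b"")

print(find_injected_jars(demo_root))
```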