Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ - scala

Hi, I'm trying to run Spark on my local laptop.
I created a Maven project in IntelliJ IDEA, and my main class contains the single line below. When I try to run the project I get the error shown below.
val spark = SparkSession.builder().master("local").getOrCreate()
21/11/02 18:02:35 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module #0x34e9fd99) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module #0x34e9fd99
at org.apache.spark.storage.StorageUtils$.(StorageUtils.scala:213)
at org.apache.spark.storage.BlockManagerMasterEndpoint.(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.(SparkContext.scala:460)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
at Main$.main(Main.scala:8)
at Main.main(Main.scala)
My dependencies in pom.xml:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.1.3</version>
<scope>provided</scope>
</dependency>
</dependencies>
Any idea how to resolve this problem?

At the time this answer was originally written, Spark did not support Java 17, only Java 8/11 (source: https://spark.apache.org/docs/latest/).
In my case, uninstalling Java 17 and installing Java 8 (e.g. OpenJDK 8) fixed the problem, and I was able to start using Spark on my laptop.
UPDATE:
Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
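If you prefer to stay on Java 17 with a Spark release that supports it (3.3.0 or later), a commonly reported workaround for the exact error above is to export the JDK internals Spark touches, e.g. by adding --add-exports java.base/sun.nio.ch=ALL-UNNAMED to the VM options of the IntelliJ run configuration (newer Spark versions pass similar flags through their own launch scripts). Below is a minimal sketch of the driver program under that assumption; the app name is illustrative, not taken from the question.
// Minimal local SparkSession sketch. On Java 17 the JVM must be started with
// "--add-exports java.base/sun.nio.ch=ALL-UNNAMED", otherwise the
// IllegalAccessError shown above is thrown at session creation.
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LocalTest") // illustrative app name
      .master("local[*]")
      .getOrCreate()

    spark.range(5).show() // quick sanity check that the session works
    spark.stop()
  }
}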

Related

MyBatis-Guice fails to initialize with java 17

I am getting the following exception when starting MyBatis with Java 17.
java.lang.NoSuchMethodError: 'void org.mybatis.guice.AbstractMyBatisModule.bindInterceptor(com.google.inject.matcher.Matcher, com.google.inject.matcher.Matcher, org.aopalliance.intercept.MethodInterceptor[])'
Maven dependencies:
<dependency>
<groupId>org.mybatis</groupId>
<artifactId>mybatis</artifactId>
<version>3.5.11</version>
</dependency>
<dependency>
<groupId>org.mybatis</groupId>
<artifactId>mybatis-guice</artifactId>
<version>3.18</version>
</dependency>
<dependency>
<groupId>com.google.inject</groupId>
<artifactId>guice</artifactId>
<version>5.1.0</version>
</dependency>
I tried downgrading to mybatis-guice version 3.12 but it did not help.
It works in IntelliJ but does not work on a standalone server.
The issue was a third-party dependency that was pulling in Guice no_aop 4.0.1. Once that was excluded, it worked fine.
Verify that no conflicting Guice dependencies are present using:
mvn dependency:tree
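If the tree output is hard to read, another way to spot a stray Guice artifact is to print, at runtime, which jar the Guice classes are actually loaded from. A hedged sketch, written in Scala to match the rest of this page; the object name is illustrative:
object GuiceOrigin {
  def main(args: Array[String]): Unit = {
    // Prints the jar the Guice API classes were loaded from; an old no_aop 4.0.1
    // artifact showing up here points at the conflicting transitive dependency.
    val location = classOf[com.google.inject.Module].getProtectionDomain.getCodeSource.getLocation
    println(location) // expect something like .../guice-5.1.0.jar
  }
}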

Hadoop 3 gcs-connector doesn't work properly with latest version of spark 3 standalone mode

I wrote a simple Scala application which reads a Parquet file from a GCS bucket. The application uses:
JDK 17
Scala 2.12.17
Spark SQL 3.3.1
gcs-connector of hadoop3-2.2.7
The connector is taken from Maven and imported via sbt (the Scala build tool). I'm not using the latest version, 2.2.9, because of this issue.
The application works perfectly in local mode, so I tried to switch to standalone mode.
These are the steps I took:
Downloaded Spark 3.3.1 from here
Started the cluster manually like here
I tried to run the application again and faced this error:
[error] Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
[error] at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
[error] at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
[error] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
[error] at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
[error] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
[error] at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
[error] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
[error] at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
[error] at org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(HadoopInputFile.java:44)
[error] at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
[error] at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:484)
[error] ... 14 more
[error] Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
[error] at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
[error] at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
[error] ... 24 more
Somehow it cannot detect the connector's file system: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
My Spark configuration is pretty basic:
spark.app.name = "Example app"
spark.master = "spark://YOUR_SPARK_MASTER_HOST:7077"
spark.hadoop.fs.defaultFS = "gs://YOUR_GCP_BUCKET"
spark.hadoop.fs.gs.impl = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
spark.hadoop.fs.AbstractFileSystem.gs.impl = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
spark.hadoop.google.cloud.auth.service.account.enable = true
spark.hadoop.google.cloud.auth.service.account.json.keyfile = "src/main/resources/gcp_key.json"
I've found out that the Maven version of the GCS Hadoop connector is missing dependencies internally.
I've fixed it by either:
downloading the connector from https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage and providing it to the Spark configuration on startup (but this is not recommended for production use, as the site clearly states), or
providing the missing dependencies for the connector.
For the second option, I unpacked the gcs-connector jar file, found its pom.xml, copied the dependencies into a new standalone pom.xml, and downloaded them with the mvn dependency:copy-dependencies -DoutputDirectory=/path/to/pyspark/jars/ command.
Here is the example pom.xml that I created; please note I am using the 2.2.9 version of the connector:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<name>TMP_PACKAGE_NAME</name>
<description>
jar dependencies of gcs hadoop connector
</description>
<!--'com.google.oauth-client:google-oauth-client:jar:1.34.1'
-->
<groupId>TMP_PACKAGE_GROUP</groupId>
<artifactId>TMP_PACKAGE_NAME</artifactId>
<version>0.0.1</version>
<dependencies>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>hadoop3-2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client-jackson2</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>31.1-jre</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client</artifactId>
<version>1.34.1</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>util</artifactId>
<version>2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>util-hadoop</artifactId>
<version>hadoop3-2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcsio</artifactId>
<version>2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.auto.value</groupId>
<artifactId>auto-value-annotations</artifactId>
<version>1.10.1</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.google.flogger</groupId>
<artifactId>flogger</artifactId>
<version>0.7.4</version>
</dependency>
<dependency>
<groupId>com.google.flogger</groupId>
<artifactId>google-extensions</artifactId>
<version>0.7.4</version>
</dependency>
<dependency>
<groupId>com.google.flogger</groupId>
<artifactId>flogger-system-backend</artifactId>
<version>0.7.4</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.10</version>
</dependency>
</dependencies>
</project>
Hope this helps.
This is caused by the fact that Spark uses an old Guava library version and you used a non-shaded GCS connector jar. To make it work, you just need to use the shaded GCS connector jar from Maven, for example: https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.2.9/gcs-connector-hadoop3-2.2.9-shaded.jar
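For completeness, here is a hedged sketch of how the session from the question could be built against the shaded jar; the jar path, master URL, and bucket path are placeholders, and the spark.hadoop.* keys mirror the configuration shown in the question.
import org.apache.spark.sql.SparkSession

object GcsReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Example app")
      .master("spark://YOUR_SPARK_MASTER_HOST:7077") // placeholder master URL
      // Ship the *shaded* connector to the executors (placeholder path).
      .config("spark.jars", "/path/to/gcs-connector-hadoop3-2.2.9-shaded.jar")
      .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
      .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
      .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
      .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "src/main/resources/gcp_key.json")
      .getOrCreate()

    val df = spark.read.parquet("gs://YOUR_GCP_BUCKET/path/to/file.parquet") // placeholder path
    df.show()
    spark.stop()
  }
}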

Error while using SparkSession or sqlcontext

I am new to Spark. I am just trying to parse a JSON file using SparkSession or SQLContext.
But whenever I run it, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.internal.config.package$.CATALOG_IMPLEMENTATION()Lorg/apache/spark/internal/config/ConfigEntry; at
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$sessionStateClassName(SparkSession.scala:930) at
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:112) at
org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110) at
org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:535) at
org.apache.spark.sql.SparkSession.read(SparkSession.scala:595) at
org.apache.spark.sql.SQLContext.read(SQLContext.scala:504) at
joinAssetsAndAd$.main(joinAssetsAndAd.scala:21) at
joinAssetsAndAd.main(joinAssetsAndAd.scala)
So far I have created a Scala project in the Eclipse IDE, configured it as a Maven project, and added the Spark and SQL dependencies.
My dependencies:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.0</version>
</dependency>
</dependencies>
Could you please explain why I am getting this error and how to correct it?
Use the same version for spark-core and spark-sql. Change the spark-sql version to 2.1.0.
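Once both artifacts are aligned on the _2.11 builds at version 2.1.0, a basic JSON read works; here is a minimal hedged sketch, with the file path as a placeholder.
import org.apache.spark.sql.SparkSession

object JsonParseSketch {
  def main(args: Array[String]): Unit = {
    // Assumes matching spark-core_2.11 and spark-sql_2.11 2.1.0 artifacts on the classpath.
    val spark = SparkSession.builder()
      .appName("json-parse") // illustrative name
      .master("local[*]")
      .getOrCreate()

    val df = spark.read.json("src/main/resources/sample.json") // placeholder path
    df.printSchema()
    df.show()
    spark.stop()
  }
}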

Getting exception : java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;) while using data frames

I am receiving a "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)" error while using DataFrames in a Scala app run on Spark. However, if I work only with RDDs and not DataFrames, no such error comes up with the same POM and settings. While going through other posts with the same error, it is mentioned that the Scala version has to be 2.10 since Spark is not compatible with Scala 2.11, and I am using Scala 2.10 with Spark 2.0.0.
Below is the snippet from the POM:
<properties>
<spark-assembly>/usr/lib/spark/lib/spark-assembly.jar</spark-assembly>
<encoding>UTF-8</encoding>
<hadoop.version>2.7.1</hadoop.version>
<hbase.version>1.1.1</hbase.version>
<scala.version>2.10.5</scala.version>
<scala.tools.version>2.10</scala.tools.version>
<spark.version>2.0.0</spark.version>
<phoenix.version>4.7.0-HBase-1.1</phoenix.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
Error:
16/10/19 02:57:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at com.abc.xyz.Compare$.main(Compare.scala:64)
at com.abc.xyz.Compare.main(Compare.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
16/10/19 02:57:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;)
16/10/19 02:57:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Change the Scala version:
<scala.version>2.11.8</scala.version>
<scala.tools.version>2.11</scala.tools.version>
and add:
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
<version>${scala.version}</version>
</dependency>
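A quick hedged way to confirm which Scala and Spark versions the application is actually running against (helpful when chasing binary-incompatibility errors like this one):
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints the Scala library version on the runtime classpath and Spark's own version constant.
    println(scala.util.Properties.versionString) // e.g. "version 2.11.8"
    println(org.apache.spark.SPARK_VERSION)      // e.g. "2.0.0"
  }
}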
I also faced this error, and it is purely a version issue.
Either your Scala version is not compatible, or you are using the correct version but the IntelliJ libraries still hold the old one.
Quick fix:
I was using Spark 2.2.0 and Scala 2.10.4, which I then changed to Scala 2.11.8. After that, do the following:
1) Right-click on the IntelliJ module
2) Open Module Settings
3) Go to Libraries and clear all of them
4) Rebuild
After doing the above, the issue was resolved for me.

Scala package throws java.lang.UnsupportedClassVersionError

Our Java application has dependencies on Spark, which is written in Scala. The build tool is Maven, and I am running from within Eclipse. The JDK_HOME used to compile the application on the command line with Maven, and the JRE used to run within Eclipse, are both 1.7.0_15.
The Maven POM contains the following:
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
...
<configuration>
<scalaVersion>1.10.5</scalaVersion>
<args>
<arg>-target:jvm-1.7</arg>
</args>
</configuration>
</plugin>
I understand that Spark is built using Scala 2.10.
The Maven dependencies include the following:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop</artifactId>
<version>2.1.0.Beta4</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark_2.10</artifactId>
<version>2.1.0.Beta4</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-xml</artifactId>
<version>2.11.0-M4</version>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-parser-combinators_2.12.0-M2</artifactId>
<version>1.0.4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.0</version>
</dependency>
At runtime, the following exception is thrown:
Exception in thread "main" java.lang.UnsupportedClassVersionError: scala/util/parsing/combinator/PackratParsers : Unsupported major.minor version 52.0
I cannot find a 2.10.* version of the scala-parser-combinators jar.
Can anyone assist with the solution?
Thanks!
The scala-parser-combinators_2.12.0-M2 module is part of the Scala 2.12 distribution.
Scala 2.12 targets Java 8 (bytecode major version 52), hence the error.
Your best bet is to either use an older Spark distribution or switch to Java 8 (Java 7 has been at end of life since April 2015).
EDIT (addressing the question edit): you cannot find an older version of the scala-parser-combinators library because it was split out into a stand-alone module at some point after Scala 2.10. You can attempt to simply exclude this dependency in your POM, but there is no guarantee that your chosen Spark version will be compatible with this older library version.
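If it helps to confirm the mismatch, here is a small hedged check of the running JVM against the class-file version named in the error (52.0 corresponds to Java 8); it is written in Scala to match the rest of this page, and the object name is illustrative.
object JvmCheck {
  def main(args: Array[String]): Unit = {
    // Prints the JVM version and the highest class-file version it accepts:
    // 51.0 on Java 7, 52.0 on Java 8.
    println(System.getProperty("java.version"))       // e.g. 1.7.0_15 in the question
    println(System.getProperty("java.class.version")) // prints 51.0 on a Java 7 runtime
  }
}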