How to run a spark example program in Intellij IDEA - scala

First on the command line from the root of the downloaded spark project I ran
mvn package
It was successful.
Then an IntelliJ project was created by importing the Spark pom.xml.
In the IDE the example class looks fine: all of the libraries are resolved.
However, attempting to run main() fails with a ClassNotFoundException on SparkContext.
Why can IntelliJ not simply load and run this Maven-based Scala program? And what can be done as a workaround?
SparkContext resolves in the editor, but it is not found at run time.
The test was run by right-clicking inside main() and selecting Run GroupByTest.
It gives:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:36)
at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at com.intellij.rt.execution.application.AppMain.main(
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkContext
at Method)
at java.lang.ClassLoader.loadClass(
at sun.misc.Launcher$AppClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
... 7 more
Here is the run configuration:

The Spark library isn't on your classpath.
Run sbt/sbt assembly,
and then add "/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar" to your project.
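Before changing the project, it can help to confirm the diagnosis by asking the JVM whether the class is visible at run time. Here is a minimal sketch in plain Java; the names ClasspathCheck and isLoadable are made up for illustration and are not part of Spark or IntelliJ. Run it with the same run configuration (same module classpath) as the failing example.

```java
// Hypothetical diagnostic helper, not part of Spark or IntelliJ.
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace in the question
        String name = args.length > 0 ? args[0] : "org.apache.spark.SparkContext";
        System.out.println(name + " loadable: " + isLoadable(name));
        // The effective runtime classpath, for comparison with what the IDE shows
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this reports false while the editor resolves the class, the dependency is probably available at compile time only, which matches the error above.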

This may help: IntelliJ-Runtime-error-tt11383. Change the scope of the module dependencies from provided to compile. This worked for me.

You need to add the Spark dependency. If you are using Maven, declare it in your pom.xml with provided scope.
This way you'll have the dependency for compiling and testing purposes but not in the "jar-with-dependencies" artifact.
But if you want to execute the whole application in a standalone cluster running inside IntelliJ, you can add a Maven profile that declares the same dependency with compile scope.
I also added an option to my application to start a local cluster if --local is passed:
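import org.apache.spark.{SparkConf, SparkContext}
private def sparkContext(appName: String, isLocal: Boolean): SparkContext = {
  val sparkConf = new SparkConf().setAppName(appName)
  if (isLocal) sparkConf.setMaster("local") // --local: run Spark in-process
  new SparkContext(sparkConf)
}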
Finally, enable the "local" profile in IntelliJ to get the proper dependencies: go to the "Maven Projects" tab and enable the profile.


Hadoop 3 gcs-connector doesn't work properly with latest version of spark 3 standalone mode

I wrote a simple Scala application which reads a Parquet file from a GCS bucket. The application uses:
JDK 17
Scala 2.12.17
Spark SQL 3.3.1
gcs-connector of hadoop3-2.2.7
The connector is taken from Maven, imported via sbt (Scala build tool). I'm not using the latest, 2.2.9, version because of this issue.
The application works perfectly in local mode, so I tried to switch to the standalone mode.
What I did is these steps:
Downloaded Spark 3.3.1 from here
Started the cluster manually like here
I tried to run the application again and faced this error:
[error] Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class not found
[error] at org.apache.hadoop.conf.Configuration.getClass(
[error] at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
[error] at org.apache.hadoop.fs.FileSystem.createFileSystem(
[error] at org.apache.hadoop.fs.FileSystem.access$300(
[error] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
[error] at org.apache.hadoop.fs.FileSystem$Cache.get(
[error] at org.apache.hadoop.fs.FileSystem.get(
[error] at org.apache.hadoop.fs.Path.getFileSystem(
[error] at org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(
[error] at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(
[error] at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:484)
[error] ... 14 more
[error] Caused by: java.lang.ClassNotFoundException: Class not found
[error] at org.apache.hadoop.conf.Configuration.getClassByName(
[error] at org.apache.hadoop.conf.Configuration.getClass(
[error] ... 24 more
Somehow it cannot detect connector's file system: java.lang.ClassNotFoundException: Class not found
My Spark configuration is pretty basic:
= "Example app"
spark.master = "spark://YOUR_SPARK_MASTER_HOST:7077"
spark.hadoop.fs.defaultFS = "gs://YOUR_GCP_BUCKET"
= ""
= ""
= true
= "src/main/resources/gcp_key.json"
I found out that the Maven version of the GCS Hadoop connector is missing some of its dependencies.
I fixed it in either of two ways:
downloading the connector from here and providing it to the Spark configuration on startup (though, as the site clearly states, this is not recommended for production use);
providing the missing dependencies for the connector.
For the second option, I unpacked the GCS Hadoop connector jar file, looked for its pom.xml, copied the dependencies into a new standalone XML file, and downloaded them using the mvn dependency:copy-dependencies -DoutputDirectory=/path/to/pyspark/jars/ command.
Here is the example pom.xml that I created. Please note that I am using version 2.2.9 of the connector.
<project xmlns=""
jar dependencies of gcs hadoop connector
hope this helps
This is caused by the fact that Spark uses an old version of the Guava library and you used a non-shaded GCS connector jar. To make it work, you just need to use the shaded GCS connector jar from Maven (the artifact published with the shaded classifier).
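When a library conflict like this is suspected, it can be useful to see which jar a given class is actually loaded from. Here is a minimal sketch in plain Java; JarLocator and locationOf are illustrative names, and the Guava class mentioned in the comment is only an example of what one might pass in.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper; the names are illustrative only.
class JarLocator {
    /** Returns the jar or directory a class was loaded from, if the JVM reports one. */
    static Optional<String> locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader report no code source
        if (source == null || source.getLocation() == null) {
            return Optional.empty();
        }
        return Optional.of(source.getLocation().toString());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumption: pass e.g. com.google.common.base.Preconditions to inspect Guava
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(name);
        System.out.println(name + " -> " + locationOf(type).orElse("(no code source reported)"));
    }
}
```

Run on the driver with the application's classpath, this shows whether a shared class comes from Spark's own jars or from the connector.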

Exception in thread "main" java.lang.IllegalAccessError: class$

Hi, I am trying to run Spark on my local laptop.
I created a Maven project in IntelliJ IDEA. My main class has the single line below, and when I try to run the project I get the error shown after it.
val spark = SparkSession.builder().master("local").getOrCreate()
21/11/02 18:02:35 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
Exception in thread "main" java.lang.IllegalAccessError: class$ (in unnamed module #0x34e9fd99) cannot access class (in module java.base) because module java.base does not export to unnamed module #0x34e9fd99
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
at Main$.main(Main.scala:8)
at Main.main(Main.scala)
My dependency in pom
Any idea how to resolve this problem?
At the time of writing this answer, Spark did not support Java 17, only Java 8/11.
In my case uninstalling Java 17 and installing Java 8 (e.g. OpenJDK 8) fixed the problem and I started using Spark on my laptop.
Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
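Since the supported Java versions differ between Spark releases, it is worth checking which Java version the IDE actually launches the program with. Here is a minimal sketch in plain Java; JavaVersionCheck and majorVersion are made-up names. It reads the standard java.specification.version system property, which is "1.8" on Java 8 and a plain number such as "11" or "17" on later versions.

```java
// Hypothetical helper; compare the result with the versions listed
// in the documentation of the Spark release you are using.
class JavaVersionCheck {
    /** Parses "1.8" to 8, "11" to 11, "17" to 17. */
    static int majorVersion(String specVersion) {
        // Java 8 and earlier use the legacy "1.x" numbering
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Running on Java " + major);
    }
}
```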

Apache Flink: How to sink stream to Google Cloud Storage File System

I was trying to write some data streams into a file in the Google Cloud Storage file system as follows (using Flink 1.8 and Scala 2.11):
data.addSink(new BucketingSink[(String, Int)]("gs://url-try/try.txt"))
But I'm facing the following error:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
Caused by: java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(
... 8 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 'gs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(
... 11 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(
... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.HdfsConfiguration
at java.lang.ClassLoader.loadClass(
at sun.misc.Launcher$AppClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
... 13 more
I have seen several questions about that and I also did:
env variables:
file flink-conf.yaml:
- fs.hdfs.hadoopconf: src/main/resources/core-site.xml
> <property>
> <name></name>
> <value>
> GoogleHadoopFileSystem</value>
> <description>The FileSystem for gs: (GCS) uris.</description>
> </property>
> <property>
> <name></name>
> <value></value>
> <description>The AbstractFileSystem for gs: (GCS)
> uris.</description>
These are my pom dependencies:
Any help?
According to the stack trace you posted, you are having issues writing to a GCS bucket using Flink and Scala.
There is a similar post where the issue is resolved; please check it.
Do not hesitate to come back if you have further questions.
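When reading a nested trace like the one in the question, the innermost "Caused by" is usually the most actionable line. There it is a ClassNotFoundException for org.apache.hadoop.hdfs.HdfsConfiguration, which suggests that Hadoop HDFS classes are missing from the classpath rather than a problem specific to the 'gs' scheme. Here is a minimal sketch in plain Java for extracting that innermost cause programmatically; RootCause is a made-up name and the exception chain in main is synthetic.

```java
// Hypothetical helper for reading nested "Caused by" chains.
class RootCause {
    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Synthetic chain mirroring the shape of the trace in the question
        Throwable inner = new ClassNotFoundException("org.apache.hadoop.hdfs.HdfsConfiguration");
        Throwable outer = new RuntimeException("Error while creating FileSystem", inner);
        System.out.println("Root cause: " + of(outer));
    }
}
```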

Maven dependency given but still one class not found on execution

I have added Maven dependencies for CDK in pom.xml, but I still get a class-not-found error when executing the jar file.
dyna218-128:spark4vs laeeqahmed$ java -cp target/spark4vs-1.0-SNAPSHOT.jar se.uu.farmbio.spark4vs.RunPrediction
Exception in thread "main" java.lang.NoClassDefFoundError: org/openscience/cdk/interfaces/IAtomContainer
Caused by: java.lang.ClassNotFoundException: org.openscience.cdk.interfaces.IAtomContainer
at Method)
at java.lang.ClassLoader.loadClass(
at sun.misc.Launcher$AppClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
The pom.xml is as follows:
<dependency><!-- SVM depedency -->
Maven dependencies are used for building the project. The Maven Jar Plugin does not include them when the JAR is packaged, so you can't run it without additional work.
However, there are several solutions. For example, you can use the Maven One Jar Plugin and package all dependencies into the JAR, but this is not always usable.
You can create an archive with jar-dependencies.
You can merge all jars into one with the Maven Shade Plugin.
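Whichever packaging option is chosen, it is easy to verify that the resulting JAR really contains the missing class. Here is a minimal sketch in plain Java using the standard java.util.jar API; JarContents, entryName and contains are made-up names for illustration.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper; names are illustrative.
class JarContents {
    /** Converts a fully qualified class name to its entry path inside a jar. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath contains the given class. */
    static boolean contains(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarContents <jar file> <class name>");
            return;
        }
        System.out.println(args[1] + " in " + args[0] + ": " + contains(args[0], args[1]));
    }
}
```

For the jar in the question, one would pass target/spark4vs-1.0-SNAPSHOT.jar and org.openscience.cdk.interfaces.IAtomContainer as the two arguments.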

Arquillian/JUnit tests run from console but not inside Eclipse

I've set up our project with some JUnit tests that are run by Arquillian inside the full JBoss server (inside a profile called jboss-remote-6). I pretty much did everything as described in the manual.
If I execute mvn test in the console, everything is properly executed and the assertions are checked.
But when I try to run the JUnit test case inside Eclipse, it fails with the following exception:
org.jboss.arquillian.impl.client.deployment.ValidationException: DeploymentScenario contains targets not maching any defined Container in the registry. _DEFAULT_
at org.jboss.arquillian.impl.client.deployment.DeploymentGenerator.validate(
at org.jboss.arquillian.impl.client.deployment.DeploymentGenerator.generateDeployment(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
I set the Maven profile for this project to "jbossas-remote-6", as stated in the pom.xml. What am I doing wrong? Google couldn't help on this specific one.
There are various things I did to make this work. My role model was the jboss-javaee6 Maven archetype, which is also using Arquillian for unit testing the code in a remote JBoss 6 server. I did the following steps:
Add arquillian.xml
I added the arquillian.xml in src/test/resources:
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="" xmlns:xsi=""
<container qualifier="jbossas-remote" default="true">
<property name="httpPort">8080</property>
ShrinkWrap a WebArchive instead of JavaArchive
Using return ShrinkWrap.create(WebArchive.class, "test.war") instead of JavaArchive.class made the addAsWebInfResource() method available, where I could add the empty generated beans.xml.
Adjust pom.xml to reduce CLASSPATH length
Eclipse was constantly breaking with javaw.exe giving a CreateProcess error=87 message. This was caused by the CLASSPATH being too long for the console command. Since the dependency jboss-as-client pulled in a huge number of dependencies, I changed it to jboss-as-profileservice-client, which works fine and has far fewer dependencies.
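To see how large the classpath is before and after such a dependency change, a small test or main method can print its size. Here is a minimal sketch in plain Java; ClasspathStats and entryCount are made-up names. It only measures the classpath of the JVM it runs in, so it is useful when launched from Maven, where the long classpath still works.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative.
class ClasspathStats {
    /** Counts the entries in a classpath string split on the given separator. */
    static int entryCount(String classpath, String separator) {
        if (classpath.isEmpty()) {
            return 0;
        }
        // Quote the separator so it is treated literally, not as a regex
        return classpath.split(Pattern.quote(separator)).length;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems
        System.out.println("entries:    " + entryCount(cp, File.pathSeparator));
        System.out.println("characters: " + cp.length());
    }
}
```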
Another important thing is to have a file in the src/test/resources directory, as stated in the Arquillian docs. But that was already the case here. I guess the arquillian.xml made the difference - this file was never mentioned in the docs, just saw it in the archetype.
This is my Maven profile for remote JBoss testing:
I hope my answer will be useful to somebody. :)
Note there is also an open issue related to running tests in Eclipse: