Which version of the Spark library supports SparkSession? - scala

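Spark code using SparkSession:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession // SparkSession lives in the spark-sql artifact
val conf = SparkSession.builder
  .enableHiveSupport() // <- enable Hive support.
  .getOrCreate()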
My pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<name>Scala-tools Maven2 Repository</name>
<id>make-assembly</id> <!-- this is used for inheritance merges -->
<phase>install</phase> <!-- bind to the packaging phase -->
I have a problem. I wrote Spark code that uses SparkSession, but SparkSession cannot be found in the Spark SQL library, so I can't run the code. Which version of Spark provides SparkSession? My pom.xml is shown above.

You need both the core and SQL artifacts.

You need Spark 2.0 to use SparkSession. For now, it is available in the Maven Central snapshot repository:
groupId = org.apache.spark
artifactId = spark-core_2.11
version = 2.0.0-SNAPSHOT
The same version has to be specified for the other Spark artifacts. Note that 2.0 is still in beta and is expected to be stable in about a month, AFAIK.
Update: alternatively, you can use the Cloudera fork of Spark 2.0:
groupId = org.apache.spark
artifactId = spark-core_2.11
version = 2.0.0-cloudera1-SNAPSHOT
The Cloudera repository has to be specified in your Maven repositories list:


Spark java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2

I am trying to spark-submit a fat jar, which I developed using Spark 2.4.6 and Scala 2.11.12, to a local cluster. Upon submitting to the cluster, I receive this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2
My spark submit command (run in cmd prompt):
spark-submit --class main.app --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6 my_app_name-1.0-SNAPSHOT-jar-with-dependencies.jar
Other details:
Scala version: 2.11.12
Spark 2.4.6
When I submit using Spark 3.0.0 (i.e. pointing my SPARK_HOME to the Spark 3.0.0 directory and submitting), it works fine, but when I submit using Spark 2.4.6 (i.e. pointing my SPARK_HOME to the Spark 2.4.6 directory and submitting), I get that error.
I have to use 2.4.6 (this cannot be changed).
My pom file:
<name>Scala-Tools Maven2 Repository</name>
<name>Scala-Tools Maven2 Repository</name>
<!-- https://mvnrepository.com/artifact/org.junit.jupiter/junit-jupiter-api -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-avro -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10 -->
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-tools -->
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-streams -->
<!-- https://mvnrepository.com/artifact/com.databricks/spark-csv -->
<!-- see http://davidb.github.com/scala-maven-plugin -->
<recompileMode>incremental</recompileMode> <!-- NOTE: incremental compilation although faster requires passing to MAVEN_OPTS="-XX:MaxPermSize=128m" -->
<!-- addScalacArgs>-feature</addScalacArgs -->
<arg>-Yresolve-term-conflict:object</arg> <!-- required for package/object name conflict in Jenkins jar -->
My Main App File
package main
import java.nio.file.{Files, Paths}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.to_avro
import org.apache.spark.sql.functions.{date_format, struct}
object app {
  def main(args: Array[String]): Unit = {
    // Builder settings are not shown in the original post.
    val spark = SparkSession.builder().getOrCreate()
    val accessKeyId = System.getenv("AWS_ACCESS_KEY_ID")
    val secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
    val person_df = spark.read.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").load("s3_parquet_path_here")
    val person_df_reformatted = person_df.withColumn("registration_dttm_string", date_format(person_df("registration_dttm"), "MM/dd/yyyy hh:mm"))
    val person_df_final = person_df_reformatted.select("registration_dttm_string", "id", "first_name", "last_name", "email", "gender", "ip_address", "cc", "country", "birthdate", "salary", "title", "comments")
    val person_avro_schema = new String(Files.readAllBytes(Paths.get("input\\person_schema.avsc")))
    person_df_final.write.format("avro").mode("overwrite").option("avroSchema", person_avro_schema).save("output/person.avro")
    print("\n" + "=====================successfully wrote avro to local path=====================" + "\n")
    // Serialize each row to Avro and publish it to Kafka as the "value" column.
    person_df_final.select(to_avro(struct("registration_dttm_string", "id", "first_name", "last_name", "email", "gender", "ip_address", "cc", "country", "birthdate", "salary", "title", "comments")) as "value")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "spark_topic_test")
      .save()
    print("\n" + "========================Successfully wrote to avro consumer on localhost kafka consumer========================" + "\n"+ "\n")
  }
}
First, you have problems with dependencies:
- You don't need com.databricks:spark-csv_2.11. CSV support has been part of Spark itself for a long time.
- You don't need any Kafka dependencies except org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6.
- spark-sql and spark-core need to be declared with <scope>provided</scope>.
- It's better to use the same version of the Spark dependencies as the Spark you're using for submission.
Second, the problem could come from an incorrect Scala version (for example, if you didn't run mvn clean after changing it). If, as you say, the code works with Spark 3.0, then it must have been compiled with Scala 2.12, while 2.4.6 works only with 2.11.
I strongly recommend getting rid of unnecessary dependencies, using the provided scope, running mvn clean, etc.
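One way to see which side of the mismatch you are on is to probe the classpath directly. The following is only a minimal sketch using plain JVM reflection, not anything from the original post: the object name ClasspathCheck and the helper isLoadable are made up for illustration, and the probed class names are simply the ones mentioned in this thread.

```scala
object ClasspathCheck {
  /** True if `className` can be loaded (without initializing it) from this classpath. */
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      // ClassNotFoundException: the class is absent.
      // LinkageError (includes NoClassDefFoundError): it is present but cannot be linked.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Class names taken from this thread; replace with whatever your stack trace names.
    val probes = Seq(
      "org.apache.spark.sql.SparkSession",
      "org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2"
    )
    probes.foreach(name => println(s"$name loadable: ${isLoadable(name)}"))
  }
}
```

If you package this into the jar and submit it with each SPARK_HOME in turn, a class that is loadable under one installation but not the other points to a jar compiled against a different Spark version than the one running it.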
I hit the same error and solved it by using a jar built for the same Scala and Spark versions as my installation. The package you are using (org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6) looks consistent with your Spark, so you could try changing it to a nearby version (e.g. org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0).
My Spark is "version 2.4.4 using Scala version 2.11.12". When I read an Avro file using the following jar (spark-avro_2.12), I got exactly the same error:
spark-shell --packages org.apache.spark:spark-avro_2.12:3.1.2
It was fixed after changing to "spark-shell --packages com.databricks:spark-avro_2.11:2.4.0".
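The check described above can be done mechanically before submitting. This is a rough sketch under a few assumptions: CoordinateCheck, Coordinate, parse and mismatches are hypothetical names, the Scala suffix is assumed to be of the 2.x form (such as _2.11), and the Spark version is only compared for artifacts in the org.apache.spark group, since third-party packages follow their own versioning.

```scala
object CoordinateCheck {
  final case class Coordinate(group: String, name: String, scalaBinary: Option[String], version: String)

  // Matches artifact IDs that end in a Scala 2.x binary suffix such as "_2.11".
  private val Suffixed = """(.+)_(\d+\.\d+)""".r

  /** Parses "group:artifact:version"; returns None for any other shape. */
  def parse(coordinate: String): Option[Coordinate] =
    coordinate.split(":") match {
      case Array(group, artifact, version) =>
        artifact match {
          case Suffixed(name, binary) => Some(Coordinate(group, name, Some(binary), version))
          case _                      => Some(Coordinate(group, artifact, None, version))
        }
      case _ => None
    }

  /** Lists the ways a --packages coordinate disagrees with the Spark installation. */
  def mismatches(coordinate: String, scalaBinary: String, sparkVersion: String): Seq[String] =
    parse(coordinate) match {
      case None => Seq(s"not a group:artifact:version coordinate: $coordinate")
      case Some(c) =>
        val scalaIssue = c.scalaBinary
          .filter(_ != scalaBinary)
          .map(b => s"built for Scala $b, but Spark runs Scala $scalaBinary")
        val sparkIssue =
          if (c.group == "org.apache.spark" && c.version != sparkVersion)
            Some(s"artifact version ${c.version}, but Spark is $sparkVersion")
          else None
        scalaIssue.toSeq ++ sparkIssue.toSeq
    }
}
```

For example, CoordinateCheck.mismatches("org.apache.spark:spark-avro_2.12:3.1.2", "2.11", "2.4.4") reports both a Scala and a Spark mismatch, which is the combination that produced the error in my case.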

spark scala maven error: SparkConf does not have a constructor

I have created a Maven project to run a word-count Spark Scala program. When I create my SparkConf, it gives me the error "org.apache.spark.SparkConf does not have constructor". The same happens for SparkContext
(org.apache.spark.SparkContext has no constructor).
I have imported both SparkContext and SparkConf and have written the constructor calls in the proper format. This could be a Maven issue, but no Maven-related error shows up.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object WordCount {
  def main(args: Array[String]): Unit = {
    val cf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(cf)
    val rawData = sc.textFile("C:/Users/siddharth.shankar/Documents/input.txt")
    // Split each line into words, then count occurrences per word.
    val words = rawData.flatMap(line => line.split(" "))
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)
  }
}
Here is my pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<name>SparkSample Maven Webapp</name>
I don't know what the issue is here, because the same program runs without errors as a regular Spark Scala (non-Maven) application.
Check whether the Scala version is the same in both cases.
It seems like a version issue. I ran this code with Maven and it works fine with Scala 2.11.
It also works with spark-submit.
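To compare the two setups, you can print the Scala version each one actually runs with. This is just a small sketch using the standard library; the object name ScalaVersionReport and the helper binaryVersion are made up for illustration.

```scala
object ScalaVersionReport {
  /** Reduces a full version such as "2.11.12" to its binary version "2.11". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    val full = scala.util.Properties.versionNumberString
    println(s"Scala runtime: $full (binary ${binaryVersion(full)})")
    // Where scala-library was loaded from; may be unavailable under some class loaders.
    val location = Option(classOf[Option[_]].getProtectionDomain.getCodeSource)
      .flatMap(source => Option(source.getLocation))
    println(s"scala-library location: ${location.getOrElse("unknown")}")
  }
}
```

Run it once from the Maven project and once from the non-Maven setup. If the binary versions differ, the Spark artifacts in the pom (the _2.10 or _2.11 suffix) need to match the one you compile with.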
Try adding:
<!-- see http://davidb.github.com/scala-maven-plugin -->
Using these dependencies should work fine.

Why "java.lang.ClassNotFoundException: Failed to find data source: kinesis" with spark-streaming-kinesis-asl dependency?

My setup:
I have already added this to my .pom file:
However, when I run my spark-streaming code to consume data from kinesis, it returns:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kinesis.
I got a similar error when I consumed data from Kafka and solved it by specifying the dependent jar in the submit command. But that doesn't seem to work this time:
sudo -u hdfs spark2-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.3.0 --class com.package.newkinesis --master yarn sparktest-1.0-SNAPSHOT.jar
How to address this issue? Any help is appreciated.
My code:
val spark = SparkSession
  .builder()
  .config("spark.driver.memory", "3g")
  .getOrCreate()
val kinesis = spark.readStream
  .format("kinesis") // the data source name reported in the exception
  .option("streamName", kinesisStreamName)
  .option("endpointUrl", kinesisEndpointUrl)
  .option("initialPosition", "TRIM_HORIZON")
  .option("awsAccessKey", awsAccessKeyId)
  .option("awsSecretKey", awsSecretKey)
  .load()
My full .pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
tl;dr It won't work.
You are using the spark-streaming-kinesis-asl_2.11 dependency, which is for the older Spark Streaming (DStream) API, together with the newer Spark Structured Streaming API (spark.readStream), hence the exception.
You have to find a compatible Spark Structured Streaming data source for AWS Kinesis, since one is not officially supported by the Apache Spark project.

NoClassDefFoundError: HikariCP with Maven

I'm creating a Maven plugin (to hook into Spigot/Bukkit/BungeeCord) and am attempting to connect to a database. On startup, I get this error:
Exception encountered when loading plugin: WarCore
java.lang.NoClassDefFoundError: com/zaxxer/hikari/HikariDataSource
I've tried using the Maven dependency plugin and the Maven assembly plugin, but I cannot find a solution.
Here's my current pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
Let me know if you have any solutions, thanks in advance!

Scala signature error for Scala module in IntelliJ Idea Maven project

Disclaimer: I am new to Scala and am trying to create a sample Scala Maven project using the simple Scala archetype in IntelliJ IDEA. The IntelliJ version is 14.1.2.
Below is my pom file, I did change the Scala version to 2.11.6 from 2.7 which the archetype generates by default.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<name>Scala-Tools Maven2 Repository</name>
<name>Scala-Tools Maven2 Repository</name>
When running Maven test, I keep getting the following error:
[WARNING] error: error while loading JUnit4, Scala signature JUnit4 has wrong version
[WARNING] expected: 5.0
[WARNING] found: 4.1 in JUnit4.class
I am not sure how to fix this problem.
Just make sure you use an up-to-date scala-archetype-simple, because IDEA requires a recent scala-archetype-simple to work with. By default, IDEA does not offer the correct scala-archetype-simple to choose from, so you need to type in the right one yourself.
It is like.
I had this issue as well, and while trying to correct the version number manually I found that you can simply build a new project from a terminal or CMD.
Using the mvn commands ensures that your project is built with the correct version.
mvn -B archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.mycompany.app -DartifactId=my-app
Reference: http://maven.apache.org/guides/getting-started/index.html