Unable to execute Hive queries using spark-submit - hivecontext

I am not able to run Hive queries using the spark-submit command, but the same queries execute fine in spark-shell. I am using AWS EMR as the cluster.
Below is my code, written in the Eclipse Scala IDE:
import org.apache.spark.{SparkConf, SparkContext}

object HiveTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    // HiveContext is deprecated since Spark 2.0 but still available
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext.implicits._
    sqlContext.sql("select * from stream_table")
  }
}
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka -->
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
spark-submit command
spark-submit --master local[2] --class HiveTest
[hadoop@ip-10-134-23-168 jars]$ spark-submit --master local[2] --class HiveTest ./word-count-0.1-SNAPSHOT-jar-with-dependencies.jar
18/02/12 10:58:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/02/12 10:58:49 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

Since the Spark version is 2.0, it is better to use a SparkSession object instead of SparkContext or SQLContext, and you have to create the SparkSession with Hive support enabled.
It runs in spark-shell because there the Spark session and sc are created with Hive support enabled.
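A minimal sketch of that suggestion, assuming Spark 2.x with the spark-hive module on the classpath and a reachable Hive metastore. The application name is a placeholder; stream_table is the table from the question.

```scala
import org.apache.spark.sql.SparkSession

object HiveTest {
  def main(args: Array[String]): Unit = {
    // Build a session with Hive support so spark.sql can resolve Hive tables.
    val spark = SparkSession.builder()
      .appName("HiveTest") // placeholder application name
      .enableHiveSupport()
      .getOrCreate()

    // Query the table from the question and print the first rows.
    spark.sql("select * from stream_table").show()

    spark.stop()
  }
}
```

The master is deliberately not set in code here, so it can be passed on the spark-submit command line as in the question.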

The reason for the failure is the classpath. When I run spark-submit with the jar-with-dependencies, Spark's default classpath is not used. Adding the provided scope to the Spark dependencies in the POM resolved the issue.
Dependencies with the scope provided are not bundled into the dependency jar (word-count-0.1-SNAPSHOT-jar-with-dependencies.jar here). They are used only for compilation.
Changed pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka -->
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
spark-submit command
spark-submit --master local[2] --class HiveWordCountScala


spark scala maven error: SparkConf does not have a constructor

I have created a Maven project to run a word count Spark Scala program. When I create my SparkConf, I get the error "org.apache.spark.SparkConf does not have a constructor", and similarly for SparkContext ("org.apache.spark.SparkContext has no constructor").
I have imported both SparkContext and SparkConf and called the constructors in the proper format. This could be a Maven issue, but no Maven-related error shows up.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object WordCount {
  def main(args: Array[String]): Unit = {
    val cf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(cf)
    val rawData = sc.textFile("C:/Users/siddharth.shankar/Documents/input.txt")
    val words = rawData.flatMap(line => line.split(" "))
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)
  }
}
Here is my pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<name>SparkSample Maven Webapp</name>
I don't know what the issue is, because the same program runs without errors as a regular Spark Scala application (without Maven).
Check whether the Scala version is the same in both cases.
It seems like a version issue. I ran this code with Maven and it works fine with Scala 2.11.
It also works with spark-submit.
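One way to compare the Scala versions is to print the version of the Scala library that is actually on the classpath in each setup. This is only a sketch; the object name is made up for illustration.

```scala
object PrintScalaVersion {
  def main(args: Array[String]): Unit = {
    // Version of the scala-library jar found on the runtime classpath.
    println(s"Scala library version: ${scala.util.Properties.versionNumberString}")
  }
}
```

Run it once from the Maven project and once from the non-Maven project. If the two outputs differ in the major version (for example 2.10 versus 2.11), the Spark artifacts in the POM are probably built for a different Scala version than the one compiling the code.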
Try adding:
<!-- see http://davidb.github.com/scala-maven-plugin -->
Using these dependencies should work fine.

Why "java.lang.ClassNotFoundException: Failed to find data source: kinesis" with spark-streaming-kinesis-asl dependency?

My setup:
I have already added this to my .pom file:
However, when I run my spark-streaming code to consume data from kinesis, it returns:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kinesis.
I got a similar error when consuming data from Kafka and solved it by specifying the dependency jar in the submit command. That does not seem to work this time:
sudo -u hdfs spark2-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.3.0 --class com.package.newkinesis --master yarn sparktest-1.0-SNAPSHOT.jar
How to address this issue? Any help is appreciated.
My code:
val spark = SparkSession
  .builder()
  .config("spark.driver.memory", "3g")
  .getOrCreate()
val kinesis = spark.readStream
  .format("kinesis") // the data source name reported in the exception
  .option("streamName", kinesisStreamName)
  .option("endpointUrl", kinesisEndpointUrl)
  .option("initialPosition", "TRIM_HORIZON")
  .option("awsAccessKey", awsAccessKeyId)
  .option("awsSecretKey", awsSecretKey)
  .load()
My full .pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
tl;dr It won't work.
You are using the spark-streaming-kinesis-asl_2.11 dependency, which is for the older Spark Streaming (DStream) API, together with the newer Spark Structured Streaming API (spark.readStream), hence the exception.
You have to find a compatible Spark Structured Streaming data source for AWS Kinesis. The Apache Spark project does not officially provide one.
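For comparison, here is a rough sketch of how the DStream API from spark-streaming-kinesis-asl is typically used. It assumes Spark 2.3.0 or later (for initialPosition) and AWS credentials resolved through the default provider chain. The object name, checkpoint application name, batch interval, and argument order are assumptions made for illustration, not taken from the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.{KinesisInitialPositions, KinesisInputDStream}

object KinesisDStreamSketch {
  def main(args: Array[String]): Unit = {
    // Assumed argument order: stream name, endpoint URL, region name.
    // Fails with a MatchError if fewer than three arguments are given.
    val Array(streamName, endpointUrl, regionName) = args.take(3)

    val conf = new SparkConf().setAppName("KinesisDStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // DStream-based receiver; this is what the kinesis-asl artifact provides.
    val stream = KinesisInputDStream.builder
      .streamingContext(ssc)
      .streamName(streamName)
      .endpointUrl(endpointUrl)
      .regionName(regionName)
      .initialPosition(new KinesisInitialPositions.TrimHorizon())
      .checkpointAppName("kinesis-dstream-sketch") // assumed name for the KCL checkpoint table
      .checkpointInterval(Seconds(10))
      .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
      .build()

    // Records arrive as Array[Byte]; decode them and print a sample per batch.
    stream.map(bytes => new String(bytes, "UTF-8")).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This uses StreamingContext rather than spark.readStream, which is why the artifact does not register a "kinesis" format for Structured Streaming.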

Spark Maven fail to find ml classes

I wrote Spark code that uses SparkSession, but I can't run it.
I think I am missing some dependencies in my pom.xml, or something else:
import org.apache.spark.sql.SparkSession
val spark = SparkSession
pom.xml for scala 2.11
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<!-- optional dependencies -->
<!-- logs -->
<!-- tests -->
I got the same error when I tried to add:
These imports work:
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.Word2VecModel
These imports don't work:
import org.apache.spark.ml.feature.CountVectorizerModel
import org.apache.spark.ml.feature.StopWordsRemover
They fail with the error "Cannot resolve symbol".
You would need these dependencies:
For more information, check out https://mvnrepository.com/artifact/org.apache.spark
Add the dependency below:

How to run maven web application with jetty server in eclipse

I have created a Maven project to run with a Jetty server. The project runs fine from the command line, but when I run the same project in Eclipse I get a build failure. What is the problem?
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.910 s
[INFO] Finished at: 2017-07-06T15:58:41+05:30
[INFO] Final Memory: 11M/204M
[INFO] ------------------------------------------------------------------------
[ERROR] No plugin found for prefix 'jetty' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (C:\Users\abc\.m2\repository), central (https://repo.maven.apache.org/maven2)] -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoPluginFoundForPrefixException
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- other dependencies-->
<!-- https://mvnrepository.com/artifact/captcha.simplecaptcha/simplecaptcha -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml-schemas -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-excelant -->
<!-- https://mvnrepository.com/artifact/org.jfree/jfreechart -->
<!-- https://mvnrepository.com/artifact/com.itextpdf/itextpdf -->
<!-- end other dependencies -->
<!-- https://mvnrepository.com/artifact/org.apache.struts.xwork/xwork-core -->
<!-- Spring 3 dependencies -->
<!-- Struts 2 + Spring 3 need this jar, ContextLoaderListener -->
<!-- Struts 2 + Spring integration plugins -->
<!-- Mockito -->
<!-- Hibernate -->
<!-- h2 database -->
<!-- Jersey core Servlet 2.x implementation -->
<!-- Jersey JSON Jackson (2.x) entity providers support module -->
<!-- The project's artifact is automatically deployed if no deployable
is defined. However, we define it here so that we can specify the
context (we don't want the version to be included in the context). -->
<!-- specify the dependent jdbc driver here -->
<!--all executions are ignored if -Dmaven.test.skip=true-->
<!-- It creates integration test data before running the tests -->
<!-- <groupId>org.mortbay.jetty</groupId>
<version>8.1.2.v20120308</version> -->
You can run the following command from the same folder as your pom.xml file:
mvn jetty:run
This starts Jetty and serves your project on http://localhost:8080/.
Jetty continues to run until you stop it. While it runs, it periodically scans for changes to your project files. If you save changes and recompile your class files, Jetty redeploys your webapp, and you can test the changes immediately.

Error: Could not find or load main class in scala

I installed the Eclipse Scala plugins and the Eclipse Maven plugin for Scala.
I am new to Scala, so I made sure the environment was working by testing a Scala hello world project. It works as expected.
However, I am having difficulty running the project that I checked out from the company's repository. No matter what I do (clean, build, clean install via Maven, etc.), I get "Error: Could not find or load main class com.company.team.spark.sqlutil.testQuery" when trying to run even a small hello world program inside the project. My hunch is that Eclipse is unable to create class files for the project because of a POM issue, but I have not been able to nail it down after several tries. Please help me figure this out.
Version: Eclipse Luna Release (4.4.0)
Build id: 20140612-0600
scala - 2.10.6
Scala code - testQuery.scala
package com.company.team.spark.sqlutil

object testQuery {
  def main(args: Array[String]): Unit = {
    print("Hello")
  }
}
Below is the POM I used.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- <dependency>
</dependency> -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
<!-- https://mvnrepository.com/artifact/com.databricks/spark-csv_2.10 -->
Link to image of project structure
Use the goals compile install, as required by scala-maven-plugin. You might be using clean install, which deletes the generated .class files from /bin, so Eclipse cannot find or load the main class.
I was able to resolve the issue after switching to the standalone Scala IDE instead of Eclipse integrated with the Scala IDE plugin.
I also changed the pom.xml to the following:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<description>My wonderfull scala app</description>
<name>My License</name>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
<!-- https://mvnrepository.com/artifact/com.databricks/spark-csv_2.11 -->
<!-- Test -->
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<!-- https://mvnrepository.com/artifact/com.typesafe.scala-logging/scala-logging_2.11 -->
<!-- see http://davidb.github.com/scala-maven-plugin -->
<!-- If you have classpath issue like NoDefClassError,... -->
<!-- useManifestOnlyJar>false</useManifestOnlyJar -->