I'm trying to compile a simple scala program and I'm using StreamingContext , here is a snippet of my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerStageCompleted
import org.apache.spark.streaming.StreamingContext._ //error:object streaming is not a member of package org.apache.spark
object FileCount {
def main(args: Array[String]) {
val conf = new SparkConf()
.setAppName("File Count")
.setMaster("local")
val sc = new SparkContext(conf)
val textFile = sc.textFile(args(0))
val ssc = new StreamingContext(sc, Seconds(10)) //error : not found: type StreamingContext
sc.stop()
}
}
I have these two error:
object streaming is not a member of package org.apache.spark
and
not found: type StreamingContext
any help please !!
If you are using sbt, add the following library dependencies:
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided"
If you are using maven, add the below to pom.xml
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.1.0</version>
<scope>provided</scope>
</dependency>
You'll need to add the dependency of spark-streaming into your build manager.
You need to add the correct dependency corresponds to your import statement. And hope obviously you have import the spark-streaming dependencies. except that we need this dependency also.
Here are the dependencies based on your dependency management tool.
For Maven : Add following to pom.xml
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.1.0</version>
</dependency>
For SBT : Add following to build.sbt
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided"
For Gradle
provided group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.1.0'
TIP : use grepcode.com to find the appropriate dependency by searching your import statement. It is a nite site!
NOTE : dependency versions can be changed & updated with the time.
I have added missing dependency after that its work for me, which are
"org.apache.spark" %% "spark-mllib" % SparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"
Related
Hello i am trying use Hive with spark but when i try executing, it shows this error
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
This is my source code
package com.spark.hiveconnect
import java.io.File
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
object sourceToHIve {
case class Record(key: Int, value: String)
def main(args: Array[String]){
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
val spark = SparkSession
.builder()
.appName("Spark Hive Example")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate()
import spark.implicits._
import spark.sql
sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
sql("LOAD DATA LOCAL INPATH '/usr/local/spark3/examples/src/main/resources/kv1.txt' INTO TABLE src")
sql("SELECT * FROM src").show()
spark.close()
}
}
This is my build.sbt file.
name := "SparkHive"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
And i also have hive running in the console.
Can anyone help me with this?
Thank You.
try adding
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.5"
The Major Problem is that the class "org.apache.hadoop.hive.conf.HiveConf" can not be loaded.
you can insert ther following code for testing.
Class.forName("org.apache.hadoop.hive.conf.HiveConf",true,
Thread.currentThread().getContextClassLoader);
And an error will occur in this line.
To be exactly,the fundament problem is your pom may not support hive on spark.
you may check the following dependency.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.3</version>
</dependency>
The class "org.apache.hadoop.hive.conf.HiveConf" locates in this dependcy.
I tried to create RDD using sc.parallelize. It gave an exception as :
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
at org.apache.spark.SparkContext.parallelize(SparkContext.scala:718)
at df_avro.SampleDf$.main(SampleDf.scala:25)
at df_avro.SampleDf.main(SampleDf.scala)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.2
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:808)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 4 more
This was the code :
val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("TheApp")
val spark = SparkSession.builder()
.config(conf)
.getOrCreate()
val sc = spark.sparkContext
val rowArray: Array[Row] = Array(
Row(1,"hello", true),
Row(2,"goodbye", false)
)
val rows: RDD[Row] = sc.parallelize(rowArray)
println(rows.count())
Why is this causing an exception. Am I missing something ?
Dependencies used :
val spark2Version ="2.2.1"
// additional libraries
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % spark2Version,
"org.apache.spark" %% "spark-sql" % spark2Version,
"org.apache.spark" %% "spark-streaming" % spark2Version )
Can add below dependency overrides inside your build.sbt & try again.
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.6.5"
u can go to
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
and see
def setupModule(context: SetupContext) {
val MajorVersion = version.getMajorVersion
val MinorVersion = version.getMinorVersion
context.getMapperVersion match {
case version#VersionExtractor(MajorVersion, minor) if minor < MinorVersion =>
throw new JsonMappingException("Jackson version is too old " + version)
case version#VersionExtractor(MajorVersion, minor) =>
// Under semantic versioning, this check would not be needed; however Jackson
// occasionally has functionally breaking changes across minor versions
// (2.4 -> 2.5 as an example). This may be the fault of the Scala module
// depending on implementation details, so for now we'll just declare ourselves
// as incompatible and move on.
if (minor > MinorVersion) {
throw new JsonMappingException("Incompatible Jackson version: " + version)
}
case version =>
throw new JsonMappingException("Incompatible Jackson version: " + version)
}
so it's clearly;u use a higher version than this;u can see the jar's version;
So changing your dependency's version can solve it;
I use 2.11 scala,2.4.1 spark,3.1.4 hadoop;
Then I use
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version> 2.6.7</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version> 2.6.7</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version> 2.6.7</version>
</dependency>
I'm trying to read properties file and got stuck with error which is given below.I have written an Scala package where i'm trying to read properties file and call into abc.scala program.Any help will be appreciated.
File:- xyz.properties
driver = "oracle.jdbc.driver.OracleDriver"
url = "jdbc:oracle:thin:#xxxx:1521/xxxx.xxxx"
username = "xxx"
password = "xxx"
input_file = "C:\\Users\\xxx\\test\\src\\main\\resources\\xxxx.xlsx"
build.sbt
name := "xxx.xxxx.xxxxx"
scalaVersion := "2.10.6"
ivyScala := ivyScala.value map{ _.copy(overrideScalaVersion = true) }
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"com.databricks" %% "spark-csv" % "1.5.0",
"org.apache.commons" % "commons-configuration2" % "2.1.1",
"commons-beanutils" % "commons-beanutils" % "1.9.3",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.scala-lang" % "scala-xml" % "2.11.0-M4" )
Package
package com.xxx.zzzz.xxx1
import java.io.File
import org.apache.commons.configuration2.builder.fluent.{Configurations, Parameters}
object Configuration {
var config = new Configurations()
var configs = config.properties(new File("xyz.properties"))
var inputFile = configs.getString("input")
var userName = configs.getString("user_name")
var password = configs.getString("passwd")
var driver = configs.getString("driver")
var url = configs.getString("Url")
}
Main Program abc.scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import package com.xxx.zzzz.xxx1.Configurations
import org.apache.commons.beanutils.PropertyUtils
object ItalyPanelData {
def main(args: Array[String]): Unit = {
//Logger.getRootLogger().setLevel(Level.OFF)
println("Inside main program"+ Configuration.driver)
//Set the properties for spark to connect the oracle database
val dbProp = new java.util.Properties
dbProp.setProperty("driver", Configuration.driver)
dbProp.setProperty("user", Configuration.userName)
dbProp.setProperty("password", Configuration.password)
//Create a connection to connect spark
val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
//exception handlying
try {
//Create dataframe boject
val df = sqlContext.read
.option("location", Configuration.inputFile) //Initiating input path
.option("sheetName", "xyz") //Give the SheetName
.option("useHeader", "true") //It takes the header name from excel sheet
.option("treatEmptyValuesAsNulls", "true")
.option("inferSchema", "true")
.option("addColorColumns", "false")
.load()
// Write into oracale database
df.write.mode("append").jdbc(Configuration.url, "xyz", dbProp)
}
catch {
case e: Throwable => e.printStackTrace();
}
}
}
Error
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V
at org.apache.commons.configuration2.beanutils.BeanHelper.initBeanUtilsBean(BeanHelper.java:631)
at org.apache.commons.configuration2.beanutils.BeanHelper.<clinit>(BeanHelper.java:89)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.sun.proxy.$Proxy0.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
at org.apache.commons.configuration2.builder.fluent.Parameters.createParametersProxy(Parameters.java:294)
at org.apache.commons.configuration2.builder.fluent.Parameters.fileBased(Parameters.java:185)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileParams(Configurations.java:602)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileParams(Configurations.java:614)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileBasedBuilder(Configurations.java:132)
at org.apache.commons.configuration2.builder.fluent.Configurations.propertiesBuilder(Configurations.java:238)
at org.apache.commons.configuration2.builder.fluent.Configurations.properties(Configurations.java:282)
at com.rxcorp.italy.config.Configuration$.<init>(Configuration.scala:8)
at com.rxcorp.italy.config.Configuration$.<clinit>(Configuration.scala)
at com.rxcorp.paneldataloading.ItalyPanelData$.main(abc.scala:12)
Such exceptions are an indication of a version incompatibility.
Meaning: the code that you have written (or more likely: the one of the libraries under the surface) wants to call a method
org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(BeanIntrospector[]);
but the thing is: at runtime, the class file for PropertyUtilsBean does not contain that method.
Thus: you have to step back and understand the components in your stack, and check out their version requirements on the Apache commons library.
And you get more ideas when looking into the javadoc for that method; as it says: Since: 1.9 there.
In other words: this method was added Apache commons 1.9; so some piece of your stack expects at least that version of commons; but your classpath in the JVM that executes the whole thing ... has an older version.
So: check the classpath for apache commons; and most likely you are good by simply updating to a newer version of apache commons. (and yes, maybe that will mean more "hard" debug work; as at least your build settings include a newer version of apache commons).
I guess I have a similar problem. Apache commons configuration 2.7 is used in our project together with apache commons BeanUtils 1.9.
Unfortunately another library we use is jxls-reader 2.0.+ and this one references commons-digester3 library.
So the beanutils 1.9 as well as the commons-digester3 lib both have a class packaged org.apache.commons.beanutils.PropertyUtilsBean. But commons-digester3's version does not have the above mentioned method bringing us to the same dilemma as you have.
For now we can be lucky as our windows servers loading the "correct" version of beanutils first whereas some developers using a mac have it the other way round where the digester3 package is loaded first bringing up the no-such-method-error you have.
Not sure what can be our workaround here.
Anyway check if you have the class two times on your classpath and find out who's using it by checking all your pom.xmls of dependent libs on the classpath. Finally you might be lucky to remove some library if its not needed by your code (chances are low :-( though)
Update 10thNov: I exluded the commons-digester3 from the jxls-reader dependency:
<dependency>
<groupId>org.jxls</groupId>
<artifactId>jxls-reader</artifactId>
<version>2.0.3</version>
<exclusions>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-digester3</artifactId>
</exclusion>
</exclusions>
</dependency>
So that the commons-digester with classifier "with-deps" from jxls-reader won't get resolved and I pull it in explicitely in our pom.xml but only the normal jar without packaged classes of commons-logging, commons-beanutils...
My Environment: scala 2.11.7, spark 1.2.0 on CDH
spark-assembly-1.2.0-cdh5.3.8-hadoop2.5.0-cdh5.3.8.jar
I get data from mongo with spark. But saveAsNewAPIHadoopFile method could not be find. Only saveAsTextFile, saveAsObjectFile methods are available for save.
val mongoConfig = new Configuration()
mongoConfig.set("mongo.input.uri", "mongodb://192.168.0.211:27017/chat.article")
mongoConfig.set("mongo.input.query","{customerId: 'some mongo id', usage: {'$gt': 30}")
val articleRDD = sc.newAPIHadoopRDD(mongoConfig, classOf[MongoInputFormat], classOf[Text], classOf[BSONObject])
val outputConfig = new Configuration()
outputConfig.set("mongo.input.uri", "mongodb://192.168.0.211:27017/chat.recomm")
articleRDD.saveAsNewAPIHadoopFile("", classOf[Object], classOf[BSONObject],
classOf[MongoOutputFormat[Object, BSONObject]], outputConfig)
This is my screen capture in Intellij IDEA
Following is my build.sbt:
libraryDependencies += "org.mongodb.mongo-hadoop" % "mongo-hadoop-core" % "1.4.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.5.0-cdh5.3.8"
【spark-assembly-1.2.0-cdh5.3.8-hadoop2.5.0-cdh5.3.8.jar】 is not in sbt. I found in cdh home directory and moved it to my project dir manually.
Because the method is not in that package but rather in the following one :
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-hadoop-core</artifactId>
<version>1.4.0-SNAPSHOT</version>
</dependency>
You might want to check compatibility of the mongo-hadoop-core package so you can use the proper one for Spark 1.2
I'm trying to get the DataStax spark cassandra connector working. I've created a new SBT project in IntelliJ, and added a single class. The class and my sbt file is given below. Creating spark contexts seem to work, however, the moment I uncomment the line where I try to create a cassandraTable, I get the following compilation error:
Error:scalac: bad symbolic reference. A signature in CassandraRow.class refers to term catalyst
in package org.apache.spark.sql which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling CassandraRow.class.
Sbt is kind of new to me, and I would appreciate any help in understanding what this error means (and of course, how to resolve it).
name := "cassySpark1"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector" % "1.1.0" withSources() withJavadoc()
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.1.0-alpha2" withSources() withJavadoc()
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
And my class:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
object HelloWorld { def main(args:Array[String]): Unit ={
System.setProperty("spark.cassandra.query.retry.count", "1")
val conf = new SparkConf(true)
.set("spark.cassandra.connection.host", "cassandra-hostname")
.set("spark.cassandra.username", "cassandra")
.set("spark.cassandra.password", "cassandra")
val sc = new SparkContext("local", "testingCassy", conf)
> //val foo = sc.cassandraTable("keyspace name", "table name")
val rdd = sc.parallelize(1 to 100)
val sum = rdd.reduce(_+_)
println(sum) } }
You need to add spark-sql to dependencies list
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"
Add library dependency in your project's pom.xml file. It seems they have changed the Vector.class dependencies location in the new refactoring.