My environment: Scala 2.11.7, Spark 1.2.0 on CDH
spark-assembly-1.2.0-cdh5.3.8-hadoop2.5.0-cdh5.3.8.jar
I get data from MongoDB with Spark, but the saveAsNewAPIHadoopFile method cannot be found. Only the saveAsTextFile and saveAsObjectFile methods are available for saving.
val mongoConfig = new Configuration()
mongoConfig.set("mongo.input.uri", "mongodb://192.168.0.211:27017/chat.article")
mongoConfig.set("mongo.input.query", "{customerId: 'some mongo id', usage: {'$gt': 30}}")
val articleRDD = sc.newAPIHadoopRDD(mongoConfig, classOf[MongoInputFormat], classOf[Text], classOf[BSONObject])
val outputConfig = new Configuration()
outputConfig.set("mongo.output.uri", "mongodb://192.168.0.211:27017/chat.recomm")
articleRDD.saveAsNewAPIHadoopFile("", classOf[Object], classOf[BSONObject],
  classOf[MongoOutputFormat[Object, BSONObject]], outputConfig)
Following is my build.sbt:
libraryDependencies += "org.mongodb.mongo-hadoop" % "mongo-hadoop-core" % "1.4.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.5.0-cdh5.3.8"
The spark-assembly-1.2.0-cdh5.3.8-hadoop2.5.0-cdh5.3.8.jar is not managed by sbt; I found it in the CDH home directory and moved it into my project directory manually.
The method is not in that package, but rather in the following one:
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-hadoop-core</artifactId>
<version>1.4.0-SNAPSHOT</version>
</dependency>
You might want to check the compatibility of the mongo-hadoop-core package so you can use the proper one for Spark 1.2.
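If you are on sbt rather than Maven, a rough equivalent is sketched below. This only mirrors the Maven coordinates above; the snapshot resolver is an assumption, so point it at wherever that artifact is actually published:
// Hedged sbt sketch: the resolver URL is an assumption; coordinates mirror the Maven snippet above.
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
libraryDependencies += "org.mongodb" % "mongo-hadoop-core" % "1.4.0-SNAPSHOT"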
I tried to create an RDD using sc.parallelize, and it gave an exception:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
at org.apache.spark.SparkContext.parallelize(SparkContext.scala:718)
at df_avro.SampleDf$.main(SampleDf.scala:25)
at df_avro.SampleDf.main(SampleDf.scala)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.2
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:808)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 4 more
This was the code:
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("TheApp")
val spark = SparkSession.builder()
  .config(conf)
  .getOrCreate()
val sc = spark.sparkContext
val rowArray: Array[Row] = Array(
  Row(1, "hello", true),
  Row(2, "goodbye", false)
)
val rows: RDD[Row] = sc.parallelize(rowArray)
println(rows.count())
Why is this causing an exception? Am I missing something?
Dependencies used:
val spark2Version = "2.2.1"
// additional libraries
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % spark2Version,
"org.apache.spark" %% "spark-sql" % spark2Version,
"org.apache.spark" %% "spark-streaming" % spark2Version )
You can add the dependency overrides below to your build.sbt and try again.
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.5"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.6.5"
You can go to com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule (JacksonModule.scala:64) and see:
def setupModule(context: SetupContext) {
  val MajorVersion = version.getMajorVersion
  val MinorVersion = version.getMinorVersion
  context.getMapperVersion match {
    case version @ VersionExtractor(MajorVersion, minor) if minor < MinorVersion =>
      throw new JsonMappingException("Jackson version is too old " + version)
    case version @ VersionExtractor(MajorVersion, minor) =>
      // Under semantic versioning, this check would not be needed; however Jackson
      // occasionally has functionally breaking changes across minor versions
      // (2.4 -> 2.5 as an example). This may be the fault of the Scala module
      // depending on implementation details, so for now we'll just declare ourselves
      // as incompatible and move on.
      if (minor > MinorVersion) {
        throw new JsonMappingException("Incompatible Jackson version: " + version)
      }
    case version =>
      throw new JsonMappingException("Incompatible Jackson version: " + version)
  }
So clearly you are using a higher Jackson version than the Scala module allows; you can check the version on the jar itself. Changing your dependency versions can solve it.
I use Scala 2.11, Spark 2.4.1, and Hadoop 3.1.4. Then I use:
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.6.7</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.6.7</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.6.7</version>
</dependency>
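If you want to confirm which Jackson version actually ends up on the runtime classpath before and after changing the dependencies, a small hedged check like the one below works; it only assumes jackson-databind is resolvable, since PackageVersion ships inside that artifact:
// Prints the jackson-databind version the JVM actually loaded; this is the version
// that the setupModule check quoted above will compare against.
object JacksonVersionCheck {
  def main(args: Array[String]): Unit = {
    println(com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION)
  }
}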
Hi, I am a noob to Scala and IntelliJ, and I am just trying to do this in Scala:
import org.apache.spark
import org.apache.spark.sql.SQLContext
import com.databricks.spark.xml.XmlReader
object SparkSample {
  def main(args: Array[String]): Unit = {
    val conf = new spark.SparkConf()
    conf.setAppName("Datasets Test")
    conf.setMaster("local[2]")
    val sc = new spark.SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "shop")
      .load("shops.xml") /* NoSuchMethod error here */
    val selectedData = df.select("author", "_id")
    df.show
  }
}
Basically I am trying to convert XML into a Spark DataFrame, and I am getting a NoSuchMethodError at '.load("shops.xml")'.
Below is the SBT:
version := "0.1"
scalaVersion := "2.11.3"
val sparkVersion = "2.0.0"
val sparkXMLVersion = "0.3.3"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion exclude("jline", "2.12"),
  "org.apache.spark" %% "spark-sql" % sparkVersion excludeAll(ExclusionRule(organization = "jline"), ExclusionRule("name", "2.12")),
  "com.databricks" %% "spark-xml" % sparkXMLVersion
)
Below is the trace:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.types.DecimalType$.Unlimited()Lorg/apache/spark/sql/types/DecimalType;
at com.databricks.spark.xml.util.InferSchema$.<init>(InferSchema.scala:50)
at com.databricks.spark.xml.util.InferSchema$.<clinit>(InferSchema.scala)
at com.databricks.spark.xml.XmlRelation$$anonfun$1.apply(XmlRelation.scala:46)
at com.databricks.spark.xml.XmlRelation$$anonfun$1.apply(XmlRelation.scala:46)
at scala.Option.getOrElse(Option.scala:120)
at com.databricks.spark.xml.XmlRelation.<init>(XmlRelation.scala:45)
at com.databricks.spark.xml.DefaultSource.createRelation(DefaultSource.scala:66)
at com.databricks.spark.xml.DefaultSource.createRelation(DefaultSource.scala:44)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
Can someone point out the error? It seems like a dependency issue to me.
spark-core seems to be working fine, but not spark-sql.
I had Scala 2.12 before but changed to 2.11 because spark-core was not resolving.
tl;dr I think it's a Spark version mismatch issue. Use spark-xml 0.4.1.
Quoting spark-xml's Requirements (highlighting mine):
This library requires Spark 2.0+ for 0.4.x.
For version that works with Spark 1.x, please check for branch-0.3.
That says to me that spark-xml 0.3.3 works with Spark 1.x (not the Spark 2.0.0 you requested).
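In sbt terms, the hedged fix is just bumping the spark-xml version in the build above, keeping Scala 2.11 and Spark 2.0.0 from the question:
// Sketch of the corrected dependency; the rest of the build stays as in the question.
val sparkXMLVersion = "0.4.1"
libraryDependencies += "com.databricks" %% "spark-xml" % sparkXMLVersion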
I followed the instructions at http://mongodb.github.io/mongo-scala-driver/ to install it for Scala 2.11, but my first class cannot run.
import org.mongodb.scala.MongoClient
def main(args: Array[String]): Unit = {
  val mongoClient: MongoClient = if (args.isEmpty) MongoClient() else MongoClient("mongodb://localhost")
}
Exception:
Exception in thread "main" java.lang.NoSuchMethodError: com.mongodb.connection.DefaultClusterFactory.create(Lcom/mongodb/connection/ClusterSettings;Lcom/mongodb/connection/ServerSettings;Lcom/mongodb/connection/ConnectionPoolSettings;Lcom/mongodb/connection/StreamFactory;Lcom/mongodb/connection/StreamFactory;Ljava/util/List;Lcom/mongodb/event/ClusterListener;Lcom/mongodb/event/ConnectionPoolListener;Lcom/mongodb/event/ConnectionListener;Lcom/mongodb/event/CommandListener;Ljava/lang/String;Lcom/mongodb/client/MongoDriverInformation;)Lcom/mongodb/connection/Cluster;
at com.mongodb.async.client.MongoClients.createMongoClient(MongoClients.java:188)
at com.mongodb.async.client.MongoClients.create(MongoClients.java:181)
at com.mongodb.async.client.MongoClients.create(MongoClients.java:123)
at org.mongodb.scala.MongoClient$.apply(MongoClient.scala:102)
at org.mongodb.scala.MongoClient$.apply(MongoClient.scala:77)
at org.mongodb.scala.MongoClient$.apply(MongoClient.scala:51)
at org.mongodb.scala.MongoClient$.apply(MongoClient.scala:43)
at db.mongo.client.MongoClientExample$.main(MongoClientExample.scala:18)
at db.mongo.client.MongoClientExample.main(MongoClientExample.scala)
I have had the same problem, and it was due to a wrong dependency on the underlying Java driver. The Mongo Spark connector had a dependency on version 3.2.2 of the Java driver, which apparently was also used by the Scala driver. I solved the problem by defining the dependencies as follows in sbt:
val mongoScalaDriver = "org.mongodb.scala" %% "mongo-scala-driver" % "2.1.0"
val mongoJavaDriver = "org.mongodb" % "mongo-java-driver" % "3.4.2"
val mongoSpark = Seq(mongoScalaDriver, mongoJavaDriver, "org.mongodb.spark" %% "mongo-spark-connector" % "2.0.0")
I then use mongoSpark in my library project. Also, I explicitly added a dependency on version 3.4.2 of the Java driver.
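For completeness, a hedged sketch of how that mongoSpark Seq would be wired into build.sbt (assuming the vals above live in the same build definition):
// Sketch: adds the Scala driver, the pinned Java driver, and the Spark connector defined above.
libraryDependencies ++= mongoSpark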
I finally fixed the problem with one additional line in build.sbt:
libraryDependencies += "org.mongodb.scala" %% "mongo-scala-bson" % "2.1.0"
I'm trying to read a properties file and got stuck with the error given below. I have written a Scala package where I try to read the properties file and call it from the abc.scala program. Any help will be appreciated.
File: xyz.properties
driver = "oracle.jdbc.driver.OracleDriver"
url = "jdbc:oracle:thin:#xxxx:1521/xxxx.xxxx"
username = "xxx"
password = "xxx"
input_file = "C:\\Users\\xxx\\test\\src\\main\\resources\\xxxx.xlsx"
build.sbt
name := "xxx.xxxx.xxxxx"
scalaVersion := "2.10.6"
ivyScala := ivyScala.value map{ _.copy(overrideScalaVersion = true) }
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"com.databricks" %% "spark-csv" % "1.5.0",
"org.apache.commons" % "commons-configuration2" % "2.1.1",
"commons-beanutils" % "commons-beanutils" % "1.9.3",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.scala-lang" % "scala-xml" % "2.11.0-M4" )
Package
package com.xxx.zzzz.xxx1
import java.io.File
import org.apache.commons.configuration2.builder.fluent.{Configurations, Parameters}
object Configuration {
  val config = new Configurations()
  val configs = config.properties(new File("xyz.properties"))
  // Keys must match the entries in xyz.properties
  val inputFile = configs.getString("input_file")
  val userName = configs.getString("username")
  val password = configs.getString("password")
  val driver = configs.getString("driver")
  val url = configs.getString("url")
}
Main Program abc.scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.xxx.zzzz.xxx1.Configuration
import org.apache.commons.beanutils.PropertyUtils
object ItalyPanelData {
  def main(args: Array[String]): Unit = {
    //Logger.getRootLogger().setLevel(Level.OFF)
    println("Inside main program: " + Configuration.driver)
    // Set the properties for Spark to connect to the Oracle database
    val dbProp = new java.util.Properties
    dbProp.setProperty("driver", Configuration.driver)
    dbProp.setProperty("user", Configuration.userName)
    dbProp.setProperty("password", Configuration.password)
    // Create a connection to connect Spark
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Exception handling
    try {
      // Create the DataFrame object
      val df = sqlContext.read
        .option("location", Configuration.inputFile) // Initiating input path
        .option("sheetName", "xyz") // Give the sheet name
        .option("useHeader", "true") // It takes the header name from the Excel sheet
        .option("treatEmptyValuesAsNulls", "true")
        .option("inferSchema", "true")
        .option("addColorColumns", "false")
        .load()
      // Write into the Oracle database
      df.write.mode("append").jdbc(Configuration.url, "xyz", dbProp)
    }
    catch {
      case e: Throwable => e.printStackTrace()
    }
  }
}
Error
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(Lorg/apache/commons/beanutils/BeanIntrospector;)V
at org.apache.commons.configuration2.beanutils.BeanHelper.initBeanUtilsBean(BeanHelper.java:631)
at org.apache.commons.configuration2.beanutils.BeanHelper.<clinit>(BeanHelper.java:89)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.sun.proxy.$Proxy0.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
at org.apache.commons.configuration2.builder.fluent.Parameters.createParametersProxy(Parameters.java:294)
at org.apache.commons.configuration2.builder.fluent.Parameters.fileBased(Parameters.java:185)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileParams(Configurations.java:602)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileParams(Configurations.java:614)
at org.apache.commons.configuration2.builder.fluent.Configurations.fileBasedBuilder(Configurations.java:132)
at org.apache.commons.configuration2.builder.fluent.Configurations.propertiesBuilder(Configurations.java:238)
at org.apache.commons.configuration2.builder.fluent.Configurations.properties(Configurations.java:282)
at com.rxcorp.italy.config.Configuration$.<init>(Configuration.scala:8)
at com.rxcorp.italy.config.Configuration$.<clinit>(Configuration.scala)
at com.rxcorp.paneldataloading.ItalyPanelData$.main(abc.scala:12)
Such exceptions are an indication of a version incompatibility.
Meaning: the code that you have written (or, more likely, one of the libraries under the surface) wants to call the method
org.apache.commons.beanutils.PropertyUtilsBean.addBeanIntrospector(BeanIntrospector);
but the thing is: at runtime, the class file for PropertyUtilsBean does not contain that method.
Thus, you have to step back, understand the components in your stack, and check their version requirements on the Apache Commons libraries.
You get more ideas when looking at the javadoc for that method; it says "Since: 1.9".
In other words: this method was added in Commons BeanUtils 1.9, so some piece of your stack expects at least that version, but the classpath of the JVM that executes the whole thing has an older version.
So: check the classpath for commons-beanutils; most likely you are good after simply updating to a newer version. (And yes, maybe that will mean more "hard" debugging work, even though your build settings already include a newer version.)
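In sbt terms, a hedged way to make sure the newer BeanUtils wins (the question's build.sbt already lists 1.9.3, so this override only guards against an older transitive copy being picked up) would be:
// Sketch: pin the commons-beanutils version that provides addBeanIntrospector (added in 1.9).
dependencyOverrides += "commons-beanutils" % "commons-beanutils" % "1.9.3"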
I guess I have a similar problem. Apache Commons Configuration 2.7 is used in our project together with Apache Commons BeanUtils 1.9.
Unfortunately, another library we use is jxls-reader 2.0.+, and that one references the commons-digester3 library.
Both beanutils 1.9 and the commons-digester3 lib package a class org.apache.commons.beanutils.PropertyUtilsBean, but commons-digester3's version does not have the above-mentioned method, which brings us to the same dilemma you have.
For now we are lucky, as our Windows servers load the "correct" version of beanutils first, whereas some developers using a Mac have it the other way round: the digester3 package is loaded first, producing the NoSuchMethodError you have.
Not sure what our workaround can be here.
Anyway, check whether you have the class twice on your classpath, and find out who is using it by checking the pom.xml of every dependent lib on the classpath. Finally, you might be lucky enough to remove some library if it is not needed by your code (chances are low :-( though).
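A quick hedged way to see which jar a duplicated class is actually being loaded from at runtime (plain JVM API, nothing project-specific) is something like:
// Prints the code source (jar) of the PropertyUtilsBean class the classloader picked.
object WhichJar {
  def main(args: Array[String]): Unit = {
    val cls = Class.forName("org.apache.commons.beanutils.PropertyUtilsBean")
    println(cls.getProtectionDomain.getCodeSource.getLocation)
  }
}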
Update, 10th Nov: I excluded commons-digester3 from the jxls-reader dependency:
<dependency>
<groupId>org.jxls</groupId>
<artifactId>jxls-reader</artifactId>
<version>2.0.3</version>
<exclusions>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-digester3</artifactId>
</exclusion>
</exclusions>
</dependency>
That way, the commons-digester3 artifact with classifier "with-deps" from jxls-reader does not get resolved, and I pull it in explicitly in our pom.xml, but only the plain jar without the bundled classes of commons-logging, commons-beanutils, and so on.
I'm trying to compile a simple Scala program that uses StreamingContext. Here is a snippet of my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerStageCompleted
import org.apache.spark.streaming.StreamingContext._ // error: object streaming is not a member of package org.apache.spark
object FileCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(args(0))
    val ssc = new StreamingContext(sc, Seconds(10)) // error: not found: type StreamingContext
    sc.stop()
  }
}
I have these two errors:
object streaming is not a member of package org.apache.spark
and
not found: type StreamingContext
Any help please!
If you are using sbt, add the following library dependencies:
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided"
If you are using Maven, add the following to pom.xml:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.1.0</version>
<scope>provided</scope>
</dependency>
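With the dependency in place, the code side still needs the type in scope; a minimal hedged sketch of the imports the snippet above needs (StreamingContext and Seconds both live in org.apache.spark.streaming) is:
// In addition to the wildcard StreamingContext._ import, bring the type and Seconds into scope.
import org.apache.spark.streaming.{Seconds, StreamingContext}
val ssc = new StreamingContext(sc, Seconds(10))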
You'll need to add the spark-streaming dependency to your build tool.
You need to add the dependency that corresponds to your import statement, and presumably you have already added the spark-streaming dependency; besides that, we need this dependency as well.
Here are the dependencies based on your dependency management tool.
For Maven: add the following to pom.xml
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.1.0</version>
</dependency>
For SBT: add the following to build.sbt
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided"
For Gradle
provided group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.1.0'
TIP: use grepcode.com to find the appropriate dependency by searching for your import statement. It is a nice site!
NOTE: dependency versions may change and be updated over time.
I added the missing dependencies, after which it works for me. They are:
"org.apache.spark" %% "spark-mllib" % SparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"