I'm trying to query MongoDB using the Spark SQL shell. I am limited to SQL only: no Scala, Python, etc. I intend to use Thrift eventually, but for a proof of concept I am using spark-sql. I'm using EMR with Spark version 2.4.4. More info:
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_242
Branch HEAD
Compiled by user ec2-user on 2019-12-14T00:54:30Z
Revision 5f788d5e8f90539ee331702c753fa250727128f4
Url git#aws157git.com:/pkg/Aws157BigTop
Type --help for more information.
I start my shell with the MongoDB Spark connector's Maven coordinates:
spark-sql --packages org.mongodb.spark:mongo-spark-connector_2.12:2.4.1 --conf spark.mongodb.input.uri=mongodb://something.real/development?readPreference=secondary
Spark SQL seems to recognise the jar, via the logs:
org.mongodb.spark#mongo-spark-connector_2.12 added as a dependency
Then I run
CREATE TEMPORARY VIEW mongo
USING com.mongodb.spark.sql.DefaultSource
OPTIONS (
collection 'accounts'
);
And I get the following error:
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at com.mongodb.spark.rdd.partitioner.DefaultMongoPartitioner$.<init>(DefaultMongoPartitioner.scala:64)
at com.mongodb.spark.rdd.partitioner.DefaultMongoPartitioner$.<clinit>(DefaultMongoPartitioner.scala)
at com.mongodb.spark.config.ReadConfig$.<init>(ReadConfig.scala:48)
at com.mongodb.spark.config.ReadConfig$.<clinit>(ReadConfig.scala)
at com.mongodb.spark.sql.DefaultSource.constructRelation(DefaultSource.scala:91)
at com.mongodb.spark.sql.DefaultSource.createRelation(DefaultSource.scala:50)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:93)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:371)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:274)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Any idea how to set up this view using SQL only? Ideally it would work without any startup command-line options other than the Maven coordinates.
Looks like a mismatch between Scala versions: the EMR build of Spark 2.4.4 runs on Scala 2.11 (see the version output above), while mongo-spark-connector_2.12 is built for Scala 2.12.
Starting spark-sql with spark-sql --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 worked alright.
To create the Mongo view using SQL only:
CREATE TEMPORARY VIEW source
USING mongo
OPTIONS (
uri 'mongodb://…',
collection 'my_collection'
);
Related
I'm trying to read tables from Postgres, but I'm facing the error below.
Note: I cannot reference external files from local storage, since it is a private workspace.
JDBC URL, e.g.:
"url":"jdbc:postgresql://xxxx-xxxxx-postgresql-prod01.cluster-xxxx.xx-xx-1.rds.amazonaws.com:0000/db_xxx_txxx",
The error I'm getting is:
java.lang.ClassNotFoundException: org.postgresql.Driver
An error was encountered:
An error occurred while calling o153.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:46)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:102)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:102)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:102)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:340)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
I've tried the code below.
# read_table is a local helper; per the traceback (o153.jdbc) it ends up calling DataFrameReader.jdbc
tables = read_table(
    url=URL,
    table="information_schema.tables",
    driver=DRIVER,
    user=USER,
    password=PASS
)
You need to add the Postgres driver to the classpath first.
Copy the JAR onto the cluster or S3, and then in the first cell execute:
%%configure -f
{
  "conf": {
    "spark.jars": "s3://JAR-LOCATION/postgresql.jar"
  }
}
Ref. Postgres JAR with EMR and Jupyter Notebooks
Alternatively, you can configure it while creating the SparkSession.
spark = SparkSession.builder.config('spark.driver.extraClassPath', '/JAR-LOCATION/postgresql.jar').getOrCreate()
Update: based on your comment, since you can't push a JAR, you can use a Maven dependency instead:
%%configure -f
{
  "conf": {
    "spark.jars.packages": "org.postgresql:postgresql:42.4.3"
  }
}
I am currently using the John Snow Labs Spark NLP library to train a custom NER model. Training completes successfully and the model is saved to disk. When I try to load the model for the next step, to actually tag some sample data, I run into the following error. I am using Ubuntu on Windows 10, spark-nlp_2.12:3.1.2 with Spark 3.1.2, Scala 2.12.10, and OpenJDK 8.
I have also tried the same in PySpark and I get the exact same error.
PySpark error:
java.lang.NoSuchMethodException: org.apache.spark.ml.PipelineModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Any help is appreciated. The Scala version of the error:
java.lang.NoSuchMethodException: org.apache.spark.ml.PipelineModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
val loaded_ner_model = NerDLModel.load("./NER_bert_prod_20200221")
  .setInputCols("sentence", "token", "bert")
  .setOutputCol("ner")
I have a joined Dataset (DS) in Spark. I am using Scala. I want to partition the Dataset on a column "date". I am using the following syntax, and the code compiles without error:
joined.repartition(joined.col("date")).show()
The Scala repartition function is defined as follows:
def repartition(partitionExprs : org.apache.spark.sql.Column*) : org.apache.spark.sql.Dataset[T] = { /* compiled code */ }
However, when I execute the code in Spark, I get the following error at runtime. What am I doing wrong?
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ScalaRunTime$.wrapRefArray([Ljava/lang/Object;)Lscala/collection/immutable/ArraySeq;
at example.Stocks$.main(Stocks.scala:38)
at example.Stocks.main(Stocks.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
You are probably compiling against the wrong version of Scala (and probably of Spark as well): the missing wrapRefArray signature that returns scala.collection.immutable.ArraySeq only exists in Scala 2.13, so your code was most likely built for 2.13 while the cluster runs an older Scala. You can verify the running versions with spark-submit --version.
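As an illustration only, a minimal build.sbt sketch that pins the compile-time versions to the cluster's; the version numbers below are placeholders, not taken from the question, so substitute whatever spark-submit --version reports:

// build.sbt -- align compile-time Scala/Spark with the cluster runtime
ThisBuild / scalaVersion := "2.12.15"  // placeholder: must match the cluster's Scala line

libraryDependencies ++= Seq(
  // "provided": compile against Spark but use the cluster's own Spark jars at run time
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided"  // placeholder: match the cluster's Spark version
)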
If you can isolate the failing code, maybe try running the same thing in spark-shell; I usually use it for troubleshooting Spark jobs.
I have Scala code which uses the HTable class of HBase. I am building it as a JAR and running it with spark-submit like below:
spark2-submit --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/hbase/lib/* --class commontest scala-maven-plugin-0.0.1-SNAPSHOT.jar
I am passing the HBase classpath using extraClassPath, but I am still getting the error below. Has anyone hit this error?
Exception in thread "main" java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
at org.apache.spark.util.JsonProtocol$.<init>(JsonProtocol.scala:59)
at org.apache.spark.util.JsonProtocol$.<clinit>(JsonProtocol.scala)
at org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:270)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:121)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at KPICommonDeviceDayUsage$.main(KPICommonDeviceDayUsage.scala:339)
at KPICommonDeviceDayUsage.main(KPICommonDeviceDayUsage.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.NoSuchMethodError:
com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
This exception indicates that multiple versions of the Jackson libraries are available at run time: most likely the wildcard /opt/cloudera/.../lib/hbase/lib/* on spark.driver.extraClassPath pulls in HBase's own Jackson jars, which conflict with the version Spark's jackson-module-scala expects.
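One way to confirm which copy wins is to check where the conflicting classes are loaded from at run time. A diagnostic sketch, not a fix, assuming you paste it into a spark-shell launched with the same extraClassPath:

// Print which JAR each Jackson class is actually loaded from, to see whether
// Spark's bundled jars or the HBase lib directory is first on the classpath.
def locate(className: String): Unit = {
  val cls = Class.forName(className)
  val src = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
  println(s"$className -> ${src.getOrElse("(unknown source)")}")
}

locate("com.fasterxml.jackson.databind.ObjectMapper")
locate("com.fasterxml.jackson.module.scala.DefaultScalaModule")

Once you know which JAR is shadowing Spark's Jackson, list only the HBase jars you actually need on extraClassPath instead of the whole lib/* directory, or shade Jackson in your own build.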
I am able to run the Spark shell with bin/spark-shell --packages com.databricks:spark-xml_2.11:0.3.0 to analyze XML files, for example:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
.format("com.databricks.spark.xml")
.option("rowTag", "book")
.load("books.xml")
but how can I do the same in Zeppelin? Does Zeppelin need some parameter at startup to import com.databricks.spark.xml?
Now I am getting:
java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.xml
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:220)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC.<init>(<console>:41)
at $iwC.<init>(<console>:43)
at <init>(<console>:45)
at .<init>(<console>:49)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:709)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:674)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:667)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
In Zeppelin, you need to load those dependencies before the SparkContext is created.
In a separate paragraph, add and run the following:
%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-xml_2.11:0.3.0")
If this gives you an error of the type "You have to add dependencies before starting your SparkContext", just restart the interpreter or Zeppelin.
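Once the dependency has been loaded, a following %spark paragraph can read the XML the same way as in spark-shell. A sketch reusing the books.xml example from the question; sc is provided by Zeppelin:

// %spark paragraph, run after the %dep paragraph above
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("books.xml")
df.printSchema()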
The above solution is valid for older versions of Zeppelin. For versions >0.9, %dep has been removed; please refer to the latest documentation.