spark-shell throws NoSuchMethodException if I define a class in REPL and then call newInstance via reflection.
Spark context available as 'sc' (master = yarn, app id = application_1656488084960_0162).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_141)
Type in expressions to have them evaluated.
Type :help for more information.
scala> class Demo {
     |   def demo(s: String): Unit = println(s)
     | }
defined class Demo
scala> classOf[Demo].newInstance().demo("OK")
java.lang.InstantiationException: Demo
at java.lang.Class.newInstance(Class.java:427)
... 47 elided
Caused by: java.lang.NoSuchMethodException: Demo.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.newInstance(Class.java:412)
... 47 more
But the same code works fine in the native Scala REPL:
Welcome to Scala 2.12.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
scala> class Demo {
     |   def demo(s: String): Unit = println(s)
     | }
defined class Demo
scala> classOf[Demo].newInstance().demo("OK")
OK
What's the difference between the spark-shell REPL and the native Scala REPL?
I guess the Demo class might be treated as an inner class in the spark-shell REPL.
But ... how do I solve the problem?
In the Scala 2.12.4 REPL the class is nested inside objects, so it has a zero-argument constructor accessible via .newInstance(). In the Spark 3.0.3 shell the class is nested inside classes, so there is no zero-argument constructor: the constructor of Demo accepts an instance of the outer class and must be accessed via .getConstructors()(0).newInstance(...). Start the Scala REPL and the Spark shell with ./scala -Xprint:typer and ./spark-shell -Xprint:typer respectively and you'll see the difference.
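Conceptually (a simplified sketch, not verbatim -Xprint:typer output), the plain Scala REPL wraps each snippet in nested objects, while spark-shell runs its REPL with class-based wrapping (-Yrepl-class-based), so the same class ends up nested inside classes:

// plain Scala REPL: object-based wrapping, so Demo keeps a no-arg constructor
object $line3 { object $read { object $iw { class Demo { /* ... */ } } } }

// spark-shell: class-based wrapping, so Demo's constructor takes the enclosing $iw instance
class $read { class $iw { class Demo { /* ... */ } } }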
So in Spark shell try
classOf[Demo].getDeclaredMethod("demo", classOf[String])
  .invoke(
    classOf[Demo].getConstructors()(0).newInstance($lineXX.$read.INSTANCE.$iw.$iw),
    "OK"
  )
// OK
// resYY: Object = null
(XX is the number of the line where Demo is defined.)
See details in "Spark adds hidden parameter to constructor of a Scala class".
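Alternatively, if Demo doesn't need anything defined earlier in the session, you can sidestep the REPL wrapping entirely with :paste -raw, which compiles the pasted source as a real top-level compilation unit (a sketch; the package name demo is arbitrary):

scala> :paste -raw
// Entering paste mode (ctrl-D to finish)

package demo

class Demo {
  def demo(s: String): Unit = println(s)
}

// Exiting paste mode, now interpreting.

scala> classOf[demo.Demo].newInstance().demo("OK")
OK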
Why is SparkHadoopUtil not accessible here, whereas it is accessible in a lower version of Spark, even though it is imported?
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.2
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.deploy.SparkHadoopUtil
scala> import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.conf.Configuration
scala>
scala>
scala> val hadoopConf: Configuration = SparkHadoopUtil.get.conf
<console>:25: error: object SparkHadoopUtil in package deploy cannot be accessed in package org.apache.spark.deploy
       val hadoopConf: Configuration = SparkHadoopUtil.get.conf
                                       ^
scala>
That's because the SparkHadoopUtil class has been changed to a private class in Spark 3. Here's the difference between the source of Spark 2.4 and Spark 3.0.
Spark 2.4:
@DeveloperApi
class SparkHadoopUtil extends Logging {
Spark 3.0:
private[spark] class SparkHadoopUtil extends Logging {
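If all you need is the Hadoop configuration, a public alternative that still works in Spark 3 is to take it from the SparkContext instead (standard API; spark is the session that spark-shell provides):

import org.apache.hadoop.conf.Configuration

val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration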
I was reading the section about accumulators in the Spark documentation: http://spark.apache.org/docs/latest/rdd-programming-guide.html#accumulators
I tried to run the sample code in spark-shell:
download the spark zip file
unzip file and cd to the directory
execute ./bin/spark-shell
scala> val accum = sc.longAccumulator("My Accumulator")
accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
...
10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
scala> accum.value
res2: Long = 10
But no luck: I tried Spark 2 and Spark 3, and both throw an exception. Could you tell me why?
This is spark 2
Spark context available as 'sc' (master = local[*], app id = local-1609254493356).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 15.0.1)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val accum = sc.longAccumulator("My Accumulator")
accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
at org.apache.spark.util.FieldAccessFinder$$anon$4$$anonfun$visitMethodInsn$7.apply(ClosureCleaner.scala:845)
at org.apache.spark.util.FieldAccessFinder$$anon$4$$anonfun$visitMethodInsn$7.apply(ClosureCleaner.scala:828)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
This is spark 3
Spark context Web UI available at http://192.168.57.243:4040
Spark context available as 'sc' (master = local[*], app id = local-1609254662877).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 15.0.1)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val accum = sc.longAccumulator("My Accumulator")
accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
java.lang.IllegalAccessException: Can not set final $iw field $Lambda$2159/0x0000000801470488.arg$1 to $iw
at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwFinalFieldIllegalAccessException(UnsafeFieldAccessorImpl.java:76)
at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwFinalFieldIllegalAccessException(UnsafeFieldAccessorImpl.java:80)
at java.base/jdk.internal.reflect.UnsafeQualifiedObjectFieldAccessorImpl.set(UnsafeQualifiedObjectFieldAccessorImpl.java:79)
at java.base/java.lang.reflect.Field.set(Field.java:793)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:398)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2362)
at org.apache.spark.rdd.RDD.$anonfun$foreach$1(RDD.scala:985)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:984)
... 47 elided
Java 15.0.1 is the problem. Java 11 is the highest version supported:
Spark runs on Java 8/11, Scala 2.12, Python 2.7+/3.4+ and R 3.5+. Java 8 prior to version 8u92 support is deprecated as of Spark 3.0.0.
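You can confirm which JVM the shell is actually running on from inside spark-shell itself (plain JVM system properties, nothing Spark-specific):

scala> println(System.getProperty("java.version"))
// expect something like 1.8.0_xxx or 11.x on a supported setup

Point JAVA_HOME at a Java 8 or 11 installation before launching ./bin/spark-shell and the accumulator example runs as documented.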
I am connecting to HBase using Spark. I have added all the dependencies, but I am still getting this exception. Please help me work out which JAR I need to add to resolve this issue.
SPARK_MAJOR_VERSION is set to 2, using Spark2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0.2.6.5.0-292-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0.2.6.5.0-292-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/09/17 05:34:36 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4041
Spark context available as 'sc' (master = local[*], app id = local-1537162476668).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}
def catalog = s"""{
    |"table":{"namespace":"default", "name":"Contacts"},
    |"rowkey":"key",
    |"columns":{
    |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
    |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
    |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
    |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
    |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
    |}
    |}""".stripMargin

def withCatalog(cat: String): DataFrame = {
  spark.sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

val df = withCatalog(catalog)
df.registerTempTable("contacts")
val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
query.show()
// Exiting paste mode, now interpreting.
warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/protobuf/generated/MasterProtos$MasterService$BlockingInterface
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Below are the JARs available in the Spark jars folder:
hbase-0.94.2.jar
hbase-annotations-1.2.0.jar
hbase-client-2.1.0.jar
hbase-common-2.1.0.jar
hbase-hadoop-compat-2.1.0.jar
hbase-hadoop2-compat-2.1.0.jar
hbase-it-1.1.2.2.6.5.0-292.jar
hbase-prefix-tree-1.1.2.2.6.5.0-292.jar
hbase-procedure-1.1.2.2.6.5.0-292.jar
hbase-protocol-2.1.0.jar
hbase-server-2.1.0.jar
hbase-spark-1.2.0-cdh5.8.3.jar
hbase-spark-1.1.2.2.6.5.0-292.jar
hbase-thrift-1.1.2.2.6.5.0-292.jar
hive-hbase-handler-0.12.0-cdh5.1.3.jar
hive-hbase-handler-3.1.0.jar
protobuf-java-3.5.1.jar
Please suggest which JAR I am missing from the jars folder in order to connect to HBase.
It seems you are missing the shc-core JAR, which is implemented by Hortonworks and is used to read and write DataFrames to/from HBase.
Since you are importing the package from the Hortonworks SHC connector
import org.apache.spark.sql.execution.datasources.hbase._
you need to add that JAR to your Spark application.
Steps to get the shc-core connector JAR:
First, clone the hortonworks-spark hbase connector (shc) GitHub repository, then check out the branch matching the HBase and Hadoop versions in your environment, and build it using
mvn clean install -DskipTests
After the build you will have the JAR under ~/.m2/repository/com/hortonworks/shc/.
Use this JAR for your Spark application.
You can either add it to your Spark jars folder or pass it to spark-submit/spark-shell with the --jars flag.
Then try to run your code again.
I followed the same steps and was able to read from HBase with this catalog.
Example
spark-shell --jars shc-core-1.1.3-2.4-s_2.11.jar
SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4040
Spark context available as 'sc' (master = yarn, app id =
application_1592322799672_0007).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0.7.0.3.0-79
      /_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}
def catalog = s"""{
    |"table":{"namespace":"default", "name":"Contacts"},
    |"rowkey":"key",
    |"columns":{
    |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
    |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
    |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
    |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
    |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
    |}
    |}""".stripMargin

def withCatalog(cat: String): DataFrame = {
  spark.sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

val df = withCatalog(catalog)
df.registerTempTable("contacts")
val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
query.show()
// Exiting paste mode, now interpreting.
warning: there was one deprecation warning; re-run with -deprecation for details
Hive Session ID = 5cc02976-98c4-447f-9ba0-e70c4a3c4ab1
+------------+-------------+
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory, HBaseAdmin, HTable, Put, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor, HColumnDescriptor}
catalog: String
withCatalog: (cat: String)org.apache.spark.sql.DataFrame
df: org.apache.spark.sql.DataFrame = [rowkey: string, officeAddress: string ... 3 more fields]
query: org.apache.spark.sql.DataFrame = [personalName: string, officeAddress: string]
scala> query.show()
+------------+-------------+
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+
scala>
Stack Versions:
HBase 2.2.0
Hadoop 3.1.1
Spark 2.4.0
Scala 2.11.12
Is it possible to execute initialCommands in the console task silently, i.e. as if
:silent
val $session = new foo.bar.Session()
import $session._
import $session.lib._
:silent
Putting these commands in initialCommands doesn't work, though, because :<command> commands apparently cannot be used in initialCommands:
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_102).
Type in expressions for evaluation. Or try :help.
scala> <console>:2: error: illegal start of definition
:silent
^
Interpreter encountered errors during initialization!
[error] (Thread-1) java.lang.InterruptedException
java.lang.InterruptedException
at java.util.concurrent.SynchronousQueue.put(SynchronousQueue.java:879)
at scala.tools.nsc.interpreter.SplashLoop.run(InteractiveReader.scala:77)
at java.lang.Thread.run(Thread.java:745)
Unfortunately, as of 0.13.13, sbt runs the initialCommands early, while it's creating the interpreter, and before the console has a chance to bind the interpreter as $intp.
This is close:
$ sbt -Dscala.repl.maxprintstring=-1
[info] Set current project to sbt-test (in build file:/home/apm/tmp/sbt-test/)
> console
[info] Starting scala interpreter...
[info]
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111).
Type in expressions for evaluation. Or try :help.
scala> ...
scala> Future(42)
...
scala> $intp.isettings.max
maxAutoprintCompletion maxPrintString
scala> $intp.isettings.maxPrintString = 1000
$intp.isettings.maxPrintString: Int = 1000
scala> "hi"*1000
res0: String = hihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihihi...
scala> Future(42)
res1: scala.concurrent.Future[Int] = Future(Success(42))
It's a misfeature that setting maxPrintString to zero doesn't truncate everything, including the ellipsis, which is always residual.
I'm unaware of an sbt option to do that. In the absence of a better solution, you could hide all your setup behind a nice-looking import as follows:
object console {
  object setup {
    val bar = foo.bar
    bar.init()
  }
}
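With that object compiled into your project, the setup collapses to a single import, which does work in initialCommands because it is an ordinary expression; e.g. in build.sbt (a sketch, assuming the object above is on the console classpath):

initialCommands in console := "import console.setup._"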
Edit 1:
Note that this is equivalent to the original code you wrote: it puts a thing named bar in scope, which points to foo.bar. You can also use the same technique with types to group whatever imports you need into a single one. This is the mechanism used by Predef to magically get scala.collection.immutable.Set (both the type and the value) into scope.
Edit 2:
Regarding "I guess your technique can't achieve that with a single import": it still works. Suppose Session is defined as follows:
trait Session {
  val v: Int // illustrative member types added so the sketch compiles
  def f: Int
  lazy val l: Int = 0 // lazy vals cannot be abstract, so it gets a value here
  object o {}
  type T
}
then
val $session = new foo.bar.Session()
import $session._
becomes
object console {
  object setup {
    val $session = new foo.bar.Session()
    val v = $session.v
    def f = $session.f
    lazy val l = $session.l
    val o = $session.o
    type T = $session.T
  }
}
You can apply this transformation recursively for lib._ and whatever other imports you have until you've built the exact same scope.
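For example, for import $session.lib._ you would additionally alias lib and the members of it that you use (a sketch; helper is a hypothetical member of lib):

object console {
  object setup {
    val $session = new foo.bar.Session()
    val lib = $session.lib
    val helper = lib.helper // repeat for each member of lib._ you rely on
  }
}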
I'm trying to run some examples in Apache Spark to learn more about it, but when I run them in spark-shell I receive this error:
java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
Here's the full execution and the error trace. I hope you can help me.
pcitbu@pcitbumint /usr/spark/spark-2.0.1-bin-hadoop2.7 $ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/10/25 09:52:38 WARN SparkConf:
SPARK_WORKER_INSTANCES was detected (set to '3').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --num-executors to specify the number of executors
- Or set SPARK_EXECUTOR_INSTANCES
- spark.executor.instances to configure the number of instances in the spark config.
16/10/25 09:52:38 WARN Utils: Your hostname, pcitbumint resolves to a loopback address: 127.0.1.1; using 192.168.0.119 instead (on interface ens33)
16/10/25 09:52:38 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/10/25 09:52:38 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.0.119:4040
Spark context available as 'sc' (master = local[*], app id = local-1477381958561).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val file = sc.textFile("README.md")
file: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> val counts = file.flatMap(line => line.split(" "))
counts: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:26
scala> .map(word => (word, 1))
res0: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:29
scala> .reduceByKey(_ + _)
java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
at org.apache.hadoop.hdfs.HdfsConfiguration.addDeprecatedKeys(HdfsConfiguration.java:66)
at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:31)
at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:116)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1440)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:563)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:291)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$29.apply(SparkContext.scala:992)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$29.apply(SparkContext.scala:992)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:146)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:327)
... 48 elided
Please check your Hadoop version, more particularly the version of hadoop-common-x.x.x.jar. A couple of points:
Did you build this Spark distribution from source yourself? If yes, check the Hadoop version you built it with.
If you have a pre-built version, check the version of hadoop-common inside your install dir, i.e. /usr/spark/spark-2.0.1-bin-hadoop2.7.
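To see which JAR the conflicting class is actually loaded from, you can ask the JVM from inside spark-shell (standard java.lang reflection, nothing Spark-specific):

scala> println(classOf[org.apache.hadoop.conf.Configuration].getProtectionDomain.getCodeSource.getLocation)

If that location points at a hadoop-common older than the Hadoop 2.7 jars this Spark build expects (Configuration.addDeprecations does not exist in Hadoop 1.x), that mismatch explains the NoSuchMethodError.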