Using fastutill ObjectArrayList in Spark application throws exception - scala

I am using a fastutill ObjectArrayList[object] instead of ArrayBuffer in scala in my spark streamign applicatiuon (inside a mapwithstae function)
I use the lates version..using below sbt
libraryDependencies += "it.unimi.dsi" % "fastutil" % "7.0.13"
However, it throws an exception
yarn.ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: it/unimi/dsi/fastutil/objects/ObjectArrayList
java.lang.NoClassDefFoundError: it/unimi/dsi/fastutil/objects/ObjectArrayList
at MapRRecoverableConsumer$$anonfun$5.apply(MapRRecoverableConsumer.scala:756)
at MapRRecoverableConsumer$$anonfun$5.apply(MapRRecoverableConsumer.scala:293)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:864)
at MapRRecoverableConsumer$.main(MapRRecoverableConsumer.scala:863)
at MapRRecoverableConsumer.main(MapRRecoverableConsumer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.lang.ClassNotFoundException: it.unimi.dsi.fastutil.objects.ObjectArrayList
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Ok..just added as jars parameter in spark submit and it resolved that..

Related

Mongo-Hadoop connector issue

I run file jar Word Count but have this error
NoClassDefFoundError: org/Apache/commons/lang/StringUtils ..
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
at com.mongodb.hadoop.util.MongoConfigUtil.getMongoURIs(MongoConfigUtil.java:409)
at com.mongodb.hadoop.util.MongoConfigUtil.getOutputURIs(MongoConfigUtil.java:598)
at com.mongodb.hadoop.MongoOutputFormat.checkOutputSpecs(MongoOutputFormat.java:32)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:277)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1571)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1568)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1568)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1589)
at WordCount.main(WordCount.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Error when launching the graph (DSEGraphFrame )

I have a dse graph in validation/prod environement.
The problem occurs when I try to launch a DSEGraphFrame query using Spark in Scala.
val graph = spark.dseGraph("my_graph")
generates the following exception:
Exception in thread "main"
com.datastax.driver.core.exceptions.InvalidQueryException: The method
DseGraphRpc.getSchemaBlob does not exist. Make sure that the required
component for that method is active/enabled
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:40)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:26)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:284)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:37)
at com.sun.proxy.$Proxy27.execute(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:37)
at com.sun.proxy.$Proxy28.execute(Unknown Source)
at com.datastax.bdp.util.rpc.RpcUtil.callInternal(RpcUtil.java:57)
at com.datastax.bdp.util.rpc.RpcUtil.call(RpcUtil.java:40)
at com.datastax.bdp.graph.spark.DseGraphRpc.callGetSchema(DseGraphRpc.java:45)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$getSchemaFromServer$1.apply(DseGraphFrame.scala:586)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$getSchemaFromServer$1.apply(DseGraphFrame.scala:586)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:115)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:114)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:158)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:114)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$.getSchemaFromServer(DseGraphFrame.scala:586)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrameBuilder$.apply(DseGraphFrameBuilder.scala:257)
at com.datastax.bdp.graph.spark.graphframe.SparkSessionFunctions.dseGraph(SparkSessionFunctions.scala:20)
What could I do to run DSEGraphFrame properly?
The problem comes from a node in dse cluster wich the graph isn't activated

pyspark - Spark hbase connector throws failed to find data source

I'm trying to connect to hbase from Pyspark using SHC API by referring the below link.
https://community.hortonworks.com/questions/143802/read-hbase-with-pyspark-from-jupyter-notebook.html
Sample code:
spark = SparkSession.builder.appName("Hbase Read").getOrCreate()
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'
catalog = ''.join("""{
"table":{"namespace":"default", "name":"table"},
"rowkey":"key",
"columns":{
"firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
"secondcol":{"cf":"cf", "col":"col1", "type":"int"}
}
}""".split())
df = spark.read \
.options(catalog=catalog) \
.format(data_source_format) \
.load()
df.show()
Spark-submit:
spark-submit --packages com.hortonworks:shc:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ TestHbaseRead.py
i'm getting this error.
Error log:
: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.execution.datasources.hbase. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.hbase.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
... 13 more

java.lang.ClassNotFoundException: com.google.common.base.Function

Appreciate your help on the error below. I got it when I restored my spring boot in eclipse. My main app is getting it:
Exception in thread "main" java.lang.NoClassDefFoundError:
com/google/common/base/Function at
java.lang.ClassLoader.defineClass1(Native Method) at
java.lang.ClassLoader.defineClass(ClassLoader.java:760) at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) at
java.net.URLClassLoader.access$100(URLClassLoader.java:73) at
java.net.URLClassLoader$1.run(URLClassLoader.java:368) at
java.net.URLClassLoader$1.run(URLClassLoader.java:362) at
java.security.AccessController.doPrivileged(Native Method) at
java.net.URLClassLoader.findClass(URLClassLoader.java:361) at
java.lang.ClassLoader.loadClass(ClassLoader.java:424) at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at
java.lang.ClassLoader.loadClass(ClassLoader.java:357) at
java.lang.Class.getDeclaredMethods0(Native Method) at
java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at
java.lang.Class.getDeclaredMethod(Class.java:2128) at
org.springframework.boot.devtools.restart.MainMethod.getMainMethod(MainMethod.java:57)
at
org.springframework.boot.devtools.restart.MainMethod.getMainMethod(MainMethod.java:45)
at
org.springframework.boot.devtools.restart.MainMethod.(MainMethod.java:39)
at
org.springframework.boot.devtools.restart.Restarter.getMainClassName(Restarter.java:145)
at
org.springframework.boot.devtools.restart.Restarter.(Restarter.java:135)
at
org.springframework.boot.devtools.restart.Restarter.initialize(Restarter.java:531)
at
org.springframework.boot.devtools.restart.RestartApplicationListener.onApplicationStartedEvent(RestartApplicationListener.java:64)
at
org.springframework.boot.devtools.restart.RestartApplicationListener.onApplicationEvent(RestartApplicationListener.java:46)
at
org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:166)
at
org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:138)
at
org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:121)
at
org.springframework.boot.context.event.EventPublishingRunListener.publishEvent(EventPublishingRunListener.java:111)
at
org.springframework.boot.context.event.EventPublishingRunListener.started(EventPublishingRunListener.java:60)
at
org.springframework.boot.SpringApplicationRunListeners.started(SpringApplicationRunListeners.java:48)
at
org.springframework.boot.SpringApplication.run(SpringApplication.java:302)
at
org.springframework.boot.SpringApplication.run(SpringApplication.java:1185)
at
org.springframework.boot.SpringApplication.run(SpringApplication.java:1174)
at com.myagency.App.main(App.java:25) Caused by:
java.lang.ClassNotFoundException: com.google.common.base.Function

How do I configure the YARN address for yarn-client mode in spark?

From a remote scala program, using Spark 1.3, how do I initialize the sparkContext so that I can connect to Spark running on YARN? i.e. where do I put the address of the YARN node(s)?
Currently my program contains:
val conf = new SparkConf().setMaster("yarn-client").setAppName("MyApp")
val sc = new SparkContext(conf)
and it yields
[error] (run-main-0) java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1959)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:179)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:310)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
at SparkExampleLocalDriver$.main(SparkExample.scala:9)
at SparkExampleLocalDriver.main(SparkExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:217)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:212)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1959)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:179)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:310)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
at SparkExampleLocalDriver$.main(SparkExample.scala:9)
at SparkExampleLocalDriver.main(SparkExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:195)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:213)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:212)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1959)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:179)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:310)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
at SparkExampleLocalDriver$.main(SparkExample.scala:9)
at SparkExampleLocalDriver.main(SparkExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
Your Spark binary doesn't contain YARN related classes
Use pre-built binary for hadoop
http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz
If you are compiling source, include yarn and hadoop profiles
./make-distribution.sh --tgz --skip-java-test -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0