Intermittent org.apache.spark.SparkException: Could not find spark-version-info.properties in K8s - scala

My application uses K8s cronjob to schedule the application run, consequently creating a pod for each occurrence.
In most of the cases, the application runs well, but in some of them, it fails with the following error:
java.lang.ExceptionInInitializerError
2022-12-23T10:45:32.899555393Z at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2022-12-23T10:45:32.899572625Z at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
2022-12-23T10:45:32.899576309Z at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2022-12-23T10:45:32.899590250Z at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
2022-12-23T10:45:32.899601166Z at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:600)
2022-12-23T10:45:32.899604720Z at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
2022-12-23T10:45:32.899608169Z at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:737)
2022-12-23T10:45:32.899610934Z at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
2022-12-23T10:45:32.899613557Z at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
2022-12-23T10:45:32.899629628Z at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
2022-12-23T10:45:32.899633296Z at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
2022-12-23T10:45:32.899637988Z at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
[...]
2022-12-23T10:45:32.899959490Z Caused by: java.lang.ExceptionInInitializerError
2022-12-23T10:45:32.899972035Z at org.apache.spark.package$.<init>(package.scala:93)
2022-12-23T10:45:32.899985422Z at org.apache.spark.package$.<clinit>(package.scala)
2022-12-23T10:45:32.899990812Z at org.apache.spark.SparkContext.$anonfun$new$1(SparkContext.scala:183)
2022-12-23T10:45:32.899996770Z at org.apache.spark.internal.Logging.logInfo(Logging.scala:54)
2022-12-23T10:45:32.899998834Z at org.apache.spark.internal.Logging.logInfo$(Logging.scala:53)
2022-12-23T10:45:32.900029297Z at org.apache.spark.SparkContext.logInfo(SparkContext.scala:73)
2022-12-23T10:45:32.900031985Z at org.apache.spark.SparkContext.<init>(SparkContext.scala:183)
2022-12-23T10:45:32.900033999Z at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2526)
2022-12-23T10:45:32.900049879Z at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
2022-12-23T10:45:32.900055477Z at scala.Option.getOrElse(Option.scala:189)
2022-12-23T10:45:32.900061375Z at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
[...]
2022-12-23T10:45:32.900111374Z at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
2022-12-23T10:45:32.900134359Z at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
2022-12-23T10:45:32.900137775Z at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
2022-12-23T10:45:32.900145849Z at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
2022-12-23T10:45:32.900159046Z at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
2022-12-23T10:45:32.900178955Z at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
2022-12-23T10:45:32.900183168Z at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
2022-12-23T10:45:32.900206355Z at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
2022-12-23T10:45:32.900214981Z at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
2022-12-23T10:45:32.900227137Z at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
2022-12-23T10:45:32.900377715Z Caused by: org.apache.spark.SparkException: Could not find spark-version-info.properties
2022-12-23T10:45:32.900381131Z at org.apache.spark.package$SparkBuildInfo$.<init>(package.scala:62)
2022-12-23T10:45:32.900396317Z at org.apache.spark.package$SparkBuildInfo$.<clinit>(package.scala)
2022-12-23T10:45:32.900399396Z ... 25 more
How spark sesssion is being build
lazy val spark: SparkSession = SparkSession.builder
.appName("My application")
.master("local") //Error occur here
.getOrCreate()
Spark version: 2.4.8
Scala version: 2.12
Anyone already had a similar problem?

Related

Azure Databricks, could not initialize class org.apache.spark.eventhubs.EventHubsConf

I'm pretty new to Scala and I'm trying to create a notebook to elaborate data written in an Azure Event Hub. This is my code:
import org.apache.spark.eventhubs._
val connectionString = ConnectionStringBuilder("MY-CONNECTION-STRING")
.setEventHubName("EVENT-HUB-NAME")
.build
val eventHubsConf = EventHubsConf(connectionString)
.setStartingPosition(EventPosition.fromEndOfStream)
val eventhubs = spark.readStream
.format("eventhubs")
.options(eventHubsConf.toMap)
.load()
And I get the following error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.eventhubs.EventHubsConf$
Cluster Configuration:
Databricks Runtime Version: 7.0 (includes Apache Spark 3.0.0, Scala 2.12)
Driver & Worker Type: 14.0 GB Memory, 4 Cores, 0.75 DBU
Standard_DS3_v2
I have installed the following library:
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.17
cluster libraries
The other JAR installed is to resolve a problem with Logging
The code crashes as soon as I try to create eventHubsConf.
Complete traceback:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.eventhubs.EventHubsConf$
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:7)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:70)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:72)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:74)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:76)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:78)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw$$iw.<init>(command-2632683088190841:80)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw$$iw.<init>(command-2632683088190841:82)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw$$iw.<init>(command-2632683088190841:84)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$$iw.<init>(command-2632683088190841:86)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read.<init>(command-2632683088190841:88)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$.<init>(command-2632683088190841:92)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$read$.<clinit>(command-2632683088190841)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$eval$.$print$lzycompute(<notebook>:7)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$eval$.$print(<notebook>:6)
at line14a6ae940dd14957b7172a4cf8f6cdd348.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
at es Scala 2.12 but yoscala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:202)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:396)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:238)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:233)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:275)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
It seems that your Runtime includes Scala 2.12 but your package is from scala 2.11
Try installing this one
com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17

App can't run from eclipse which created by 'jhipster --skipClient'

java.lang.IllegalStateException: Error processing condition on org.springframework.boot.autoconfigure.context.MessageSourceAutoConfiguration
at org.springframework.boot.autoconfigure.condition.SpringBootCondition.matches(SpringBootCondition.java:64)
at org.springframework.context.annotation.ConditionEvaluator.shouldSkip(ConditionEvaluator.java:108)
at org.springframework.context.annotation.ConfigurationClassBeanDefinitionReader$TrackedConditionEvaluator.shouldSkip(ConfigurationClassBeanDefinitionReader.java:441)
at org.springframework.context.annotation.ConfigurationClassBeanDefinitionReader.loadBeanDefinitionsForConfigurationClass(ConfigurationClassBeanDefinitionReader.java:129)
at org.springframework.context.annotation.ConfigurationClassBeanDefinitionReader.loadBeanDefinitions(ConfigurationClassBeanDefinitionReader.java:118)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.processConfigBeanDefinitions(ConfigurationClassPostProcessor.java:328)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.postProcessBeanDefinitionRegistry(ConfigurationClassPostProcessor.java:233)
at org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanDefinitionRegistryPostProcessors(PostProcessorRegistrationDelegate.java:271)
at org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanFactoryPostProcessors(PostProcessorRegistrationDelegate.java:91)
at org.springframework.context.support.AbstractApplicationContext.invokeBeanFactoryPostProcessors(AbstractApplicationContext.java:692)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:530)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:386)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
at com.einfochips.demo.TestApp.main(TestApp.java:63)
Caused by: java.lang.IllegalStateException: Failed to introspect Class [org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration] from ClassLoader [sun.misc.Launcher$AppClassLoader#659e0bfd]

Error while running spark on standalone cluster

I'm trying to run a simple Spark code on standalone cluster. Below is the code:
from pyspark import SparkConf,SparkContext
if __name__ == "__main__":
conf = SparkConf().setAppName("even-numbers").setMaster("spark://sumit-Inspiron-N5110:7077")
sc = SparkContext(conf)
inp = sc.parallelize([1,2,3,4,5])
even = inp.filter(lambda x: (x % 2 == 0)).collect()
for i in even:
print(i)
but, I'm getting error stating " Could not parse Master URL":
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7fb27e864850>'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2760)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
18/01/07 16:59:47 INFO ShutdownHookManager: Shutdown hook called
18/01/07 16:59:47 INFO ShutdownHookManager: Deleting directory /tmp/spark-0d71782f-617f-44b1-9593-b9cd9267757e
I also tried setting the master as 'local', but it didn't work. Can someone help?
And yes, the command to run the job is
./bin/spark-submit even.py
Replace your following line
sc = SparkContext(conf)
with
sc = SparkContext(conf=conf)
you should have it solved.

NoSuchMethodError in spark submit

I submitted my jar with dependencies to spark using spark-submit. In main method of my jar I want to create HttpAsyncCliens instance and execute some request (apache http async client library):
val httpClient = HttpAsyncClients.custom.setMaxConnTotal(10).setMaxConnPerRoute(10).build
httpClient.start()
httpClient.execute(new HttpGet("https://google.com"), new FutureCallback[HttpResponse] {
/* callbacks */
})
It throws exceptions:
Exception in thread "pool-1-thread-1" java.lang.NoSuchMethodError:
org.apache.http.util.Asserts.check(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:315)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:191)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.lang.Thread.run(Thread.java:745) Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.http.util.Asserts.check(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:74)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:107)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:91)
at spark.Application$.main(Application.scala:37)
at spark.Application.main(Application.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
It seems like there is no http-core dependency in my jar but I can call this method (org.apache.http.util.Asserts.check(ZLjava/lang/String;Ljava/lang/Object;)V) in code before or after http client creation and request:
org.apache.http.util.Asserts.check(true, "test", "test2") // it produces no exception
val httpClient = HttpAsyncClients.custom.setMaxConnTotal(10).setMaxConnPerRoute(10).build
httpClient.start()
httpClient.execute(new HttpGet("https://google.com"), new FutureCallback[HttpResponse] {
/* callbacks */
}) // it produces exception
Why I got NoSuchMethodError if I can call this method from same classpath in code?
Apache httpasyncclient v4.1
The solution here is to sync up with the version of httpcomponents.httpcore that is in the immediate classpath of spark. For version 1.6 the version is 4.0.1.
You will have to unzip the spark jar to view the META-INF, etc. After some detective work things will be alright!

:INIT_FAILURE, Fail to create InputInitializerManager

I am using EMR services from Amazon Web Services and am attempting to run a count query on an external table that I've built. The data for the table is stored in mongodb and the table is an external table in Hive. The query I'm trying to run is
select user_id, count (*) from myTable group by user_id;
I can query select * from myTable but I cannot do any other queries. When I try to I get this error:
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container FAILED -1 0 0 -1 0 0
Reducer 2 container KILLED 1 0 0 1 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 0.03 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1476467351971_0008_2_00, diagnostics=[Vertex vertex_1476467351971_0008_2_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3986)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:204)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2818)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2765)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2747)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1888)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2242)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2228)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ip-172-31-33-88.ec2.internal:8020/tmp/hive/hadoop/8c1eca9a-84ba-4d79-b39c-e633f1c6a646/hive_2016-10-14_18-57-38_728_4862537458701624850-1/hadoop/_tez_scratch_dir/157c0e27-0bfa-4acf-b0e9-b2fed595a8de/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:298)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:131)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:198)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:175)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:213)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:205)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:583)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:492)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:469)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:411)
... 32 more
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 55 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1476467351971_0008_2_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1476467351971_0008_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1476467351971_0008_2_00, diagnostics=[Vertex vertex_1476467351971_0008_2_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3986)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:204)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2818)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2765)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2747)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1888)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2242)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2228)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ip-172-31-33-88.ec2.internal:8020/tmp/hive/hadoop/8c1eca9a-84ba-4d79-b39c-e633f1c6a646/hive_2016-10-14_18-57-38_728_4862537458701624850-1/hadoop/_tez_scratch_dir/157c0e27-0bfa-4acf-b0e9-b2fed595a8de/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:298)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:131)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:198)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:175)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:213)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:205)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:583)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:492)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:469)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:411)
... 32 more
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 55 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1476467351971_0008_2_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1476467351971_0008_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
hive>
I am using a single node EMR cluster from AWS with the "Core Hadoop" configuration, but I had the same errors using a three node cluster. I have already added the Mongo Hadoop Core, Mongo Hadoop Hive, Mongo Java driver to the master node.. These are the jars I'm using:
mongo-hadoop-hive-2.0.1.jar
mongo-java-driver-3.3.0.jar
remotecontent?filepath=org%2Fmongodb%2Fmongo-hadoop%2Fmongo-hadoop-core%2F2.0.1%2Fmongo-hadoop-core-2.0.1.jar
These jars were downloaded from Maven.
A similar question had previously been asked on StackOverflow and was resolved by adding the src for Apache Tez 0.8.4 and building it. Apache Tez 0.8.4 is already on the cluster though, at least according to the EMR configuration we chose.
We are using Hadoop 2.7.2 and Hive 2.1.0.