We are trying to upgrade from Wildfly 18 to 22 and are experiencing some trouble with threads left dangling in the jgroups thread group. When the number of threads reaches the default max number (200) everything stops up (naturally). This takes two to three days in our test environment.
In the thread dump we are left with 200 dangling threads that look like this:
at jdk.internal.misc.Unsafe.park(java.base#11.0.8/Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base#11.0.8/LockSupport.java:234)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base#11.0.8/CompletableFuture.java:1798)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base#11.0.8/ForkJoinPool.java:3128)
at java.util.concurrent.CompletableFuture.timedGet(java.base#11.0.8/CompletableFuture.java:1868)
at java.util.concurrent.CompletableFuture.get(java.base#11.0.8/CompletableFuture.java:2021)
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:125)
at org.infinispan.util.concurrent.CompletionStages.join(CompletionStages.java:80)
at org.infinispan.interceptors.distribution.L1NonTxInterceptor.lambda$handleDataWriteCommand$10(L1NonTxInterceptor.java:399)
at org.infinispan.interceptors.distribution.L1NonTxInterceptor$$Lambda$1806/0x0000000841ace040.apply(Unknown Source)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:125)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:88)
at org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:33)
at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base#11.0.8/CompletableFuture.java:859)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base#11.0.8/CompletableFuture.java:837)
at java.util.concurrent.CompletableFuture.postComplete(java.base#11.0.8/CompletableFuture.java:506)
at java.util.concurrent.CompletableFuture.complete(java.base#11.0.8/CompletableFuture.java:2073)
at org.infinispan.interceptors.distribution.PrimaryOwnerOnlyCollector.primaryResult(PrimaryOwnerOnlyCollector.java:30)
at org.infinispan.interceptors.distribution.TriangleDistributionInterceptor.lambda$forwardToPrimary$6(TriangleDistributionInterceptor.java:526)
at org.infinispan.interceptors.distribution.TriangleDistributionInterceptor$$Lambda$1953/0x0000000841d51040.apply(Unknown Source)
at java.util.concurrent.CompletableFuture.uniHandle(java.base#11.0.8/CompletableFuture.java:930)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(java.base#11.0.8/CompletableFuture.java:907)
at java.util.concurrent.CompletableFuture.postComplete(java.base#11.0.8/CompletableFuture.java:506)
at java.util.concurrent.CompletableFuture.complete(java.base#11.0.8/CompletableFuture.java:2073)
at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:67)
at org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:45)
at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1402)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1305)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:131)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1445)
at org.jgroups.JChannel.up(JChannel.java:784)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:913)
at org.jgroups.protocols.FRAG3.up(FRAG3.java:165)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:876)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:243)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1049)
at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:772)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:753)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:405)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:592)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:186)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:254)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:281)
at org.jgroups.protocols.Discovery.up(Discovery.java:300)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1396)
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#11.0.8/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#11.0.8/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#11.0.8/Thread.java:834)
These threads are hanging on a CompletableFuture.get(...) with a infinispan-hardcoded timeout of 24 hours. This is probably not something that should happen, which is confirmed by the exception we get when the timeout is finally reached:
2021-03-03 09:00:46,638 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (jgroups-263,xxxxxxxxxxxx) ISPN000136: Error executing command ReadWriteKeyCommand on Cache 'sessionmanager.session', writing keys [xxxx#xxxxxx]: java.lang.IllegalStateException: This should never happen!
at org.infinispan#11.0.8.Final//org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:127)
at org.infinispan#11.0.8.Final//org.infinispan.util.concurrent.CompletionStages.join(CompletionStages.java:80)
at org.infinispan#11.0.8.Final//org.infinispan.interceptors.distribution.L1NonTxInterceptor.lambda$handleDataWriteCommand$10(L1NonTxInterceptor.java:399)
at org.infinispan#11.0.8.Final//org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:125)
at org.infinispan#11.0.8.Final//org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:88)
at org.infinispan#11.0.8.Final//org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:33)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.infinispan#11.0.8.Final//org.infinispan.interceptors.distribution.PrimaryOwnerOnlyCollector.primaryResult(PrimaryOwnerOnlyCollector.java:30)
at org.infinispan#11.0.8.Final//org.infinispan.interceptors.distribution.TriangleDistributionInterceptor.lambda$forwardToPrimary$6(TriangleDistributionInterceptor.java:526)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:67)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:45)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1402)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1305)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:131)
at org.infinispan#11.0.8.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1445)
at org.jgroups#4.2.10.Final//org.jgroups.JChannel.up(JChannel.java:784)
at org.jgroups#4.2.10.Final//org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:913)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.FRAG3.up(FRAG3.java:165)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.pbcast.GMS.up(GMS.java:876)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:243)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1049)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:772)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:753)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.UNICAST3.up(UNICAST3.java:405)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:592)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.FailureDetection.up(FailureDetection.java:186)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:254)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.MERGE3.up(MERGE3.java:281)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.Discovery.up(Discovery.java:300)
at org.jgroups#4.2.10.Final//org.jgroups.protocols.TP.passMessageUp(TP.java:1396)
at org.jgroups#4.2.10.Final//org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
at org.infinispan#11.0.8.Final//org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:125)
... 44 more
Have anyone encountered a similar problem? What can possibly lead to this situation?
Related
・Python3.8
・JDK 11
I've started learning pyflink and write a code instructed by official web which is https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/intro_to_datastream_api/
And here is my code
from pyflink.common.serialization import JsonRowDeserializationSchema,JsonRowSerializationSchema
from pyflink.common import WatermarkStrategy, Row
from pyflink.common.serialization import Encoder
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer,FlinkKafkaProducer
def streaming():
env = StreamExecutionEnvironment.get_execution_environment()
deserialization_schema =JsonRowDeserializationSchema.builder().type_info(
type_info=Types.ROW([Types.INT(), Types.STRING()])).build()
kafka_consumer = FlinkKafkaConsumer(
topics='test',
deserialization_schema=deserialization_schema,
properties={'bootstrap.servers': 'localhost:9092','group.id': 'test_group'})
ds = env.add_source(kafka_consumer)
ds = ds.map(lambda a: Row(a % 4, 1),
output_type=Types.ROW([Types.LONG(), Types.LONG()])) \
.key_by(lambda a: a[0]) \
.reduce(lambda a, b: Row(a[0], a[1] + b[1]))
serialization_schema = JsonRowSerializationSchema.builder().with_type_info(
type_info=Types.ROW([Types.LONG(), Types.LONG()])).build()
kafka_sink = FlinkKafkaProducer(
topic='test_sink_topic',
serialization_schema=serialization_schema,
producer_config={'bootstrap.servers': 'localhost:9092',
'group.id': 'test_group'})
ds.add_sink(kafka_sink)
env.execute('datastream_api_demo')
if __name__ == '__main__':
streaming()
Firstly it said to me to specify jarfile. So I downloaded flink-connector-kafka and kafka-clients jarfile for each from https://mvnrepository.com/artifact/org.apache.flink and put them into pyflink/lib directory.
And now I'm at next step getting this error;
(pyflink_demo) C:\work\pyflink_demo>python Kafka_stream_Kafka.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/C:/work/pyflink_demo/Lib/site-packages/pyflink/lib/flink-dist_2.11-1.14.4.jar) to field java.util.P
roperties.serialVersionUID
WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Traceback (most recent call last):
File "Kafka_stream_Kafka.py", line 38, in <module>
streaming()
File "Kafka_stream_Kafka.py", line 33, in streaming
env.execute('datastream_api_demo')
File "C:\work\pyflink_demo\lib\site-packages\pyflink\datastream\stream_execution_environment.py", line 691, in execute
return JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
File "C:\work\pyflink_demo\lib\site-packages\py4j\java_gateway.py", line 1285, in __call__
return_value = get_return_value(
File "C:\work\pyflink_demo\lib\site-packages\pyflink\util\exceptions.py", line 146, in deco
return f(*a, **kw)
File "C:\work\pyflink_demo\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o0.execute.
: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:258)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
at akka.dispatch.OnComplete.internal(Future.scala:300)
at akka.dispatch.OnComplete.internal(Future.scala:297)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:252)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:242)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:233)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:684)
at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
... 5 more
Caused by: java.lang.RuntimeException: Failed to create stage bundle factory! INFO:root:Initializing Python harness: C:\work\pyflink_demo\lib\site-packages\pyflink\fn_execution\beam\bea
m_boot.py --id=4-1 --provision_endpoint=localhost:51794
INFO:root:Starting up Python harness in loopback mode.
at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:566)
at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:255)
at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:131)
at org.apache.flink.streaming.api.operators.python.AbstractOneInputPythonFunctionOperator.open(AbstractOneInputPythonFunctionOperator.java:116)
at org.apache.flink.streaming.api.operators.python.PythonProcessOperator.open(PythonProcessOperator.java:59)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:110)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:711)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:687)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: Process died with exit code 0
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:451)
at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:436)
at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:303)
at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.createStageBundleFactory(BeamPythonFunctionRunner.java:564)
... 14 more
Caused by: java.lang.IllegalStateException: Process died with exit code 0
at org.apache.beam.runners.fnexecution.environment.ProcessManager$RunningProcess.isAliveOrThrow(ProcessManager.java:75)
at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:112)
at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
... 22 more
I tried to figure out what's going on and found very similar question What's wrong with my Pyflink setup that Python UDFs throw py4j exceptions?
It says that was caused by network proxy problem. JVM and python uses local socket communication. So set local communication with no proxy.
I set environment valuable "no_proxy" but it doesn't work.
enter image description here
Could anyone provide solution for this?
There is no useful information in the exception stack to help to identify the problem. This should be caused by a known issue(FLINK-26543, already solved, however still not released). This issue only occurs in loopback mode which is enabled by default when executing the job locally.
For now, you could try to force the job run in process mode instead of loopback mode by setting environment variable _python_worker_execution_mode to process. After doing this, you should see the root cause of the failure.
Besides, there is also a small issue in your code. I guess you meant ds.map(lambda a: Row(a[0] % 4, 1), output_type=Types.ROW([Types.LONG(), Types.LONG()])) instead of ds.map(lambda a: Row(a % 4, 1), output_type=Types.ROW([Types.LONG(), Types.LONG()])) as it doesn't support % operation in Row object.
I have tried the script. I am not quite sure what caused the error. Try to start kafka first and create the topics, before running the script. Or start kafka and run the script a second time after first failure.
I have a pyspark application that will transform csv to parquet and before this happen I'm copying some S3 object from a bucket to another.
pyspark with spark 2.4, emr 5.27, maximizeResourceAllocation set to true
I have various csv files size, from 80kb to 500mb.
Nonetheless, my EMR cluster (it doesn't fail on local with spark-submit) fails at 70% completion on a file that is 166mb (a previous at 480mb succeeded).
The job is simple:
def organise_adwords_csv():
s3 = boto3.resource('s3')
bucket = s3.Bucket(S3_ORIGIN_RAW_BUCKET)
for obj in bucket.objects.filter(Prefix=S3_ORIGIN_ADWORDS_RAW + "/"):
key = obj.key
copy_source = {
'Bucket': S3_ORIGIN_RAW_BUCKET,
'Key': key
}
key_tab = obj.key.split("/")
if len(key_tab) < 5:
print("continuing from length", obj)
continue
file_name = ''.join(key_tab[len(key_tab)-1:len(key_tab)])
if file_name == '':
print("continuing", obj)
continue
table = file_name.split("_")[1].replace("-", "_")
new_path = "{0}/{1}/{2}".format(S3_DESTINATION_ORDERED_ADWORDS_RAW_PATH, table, file_name)
print("new_path", new_path) <- the last print will end here
try:
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
print("copy done")
except Exception as e:
print(e)
print("an exception occured while copying")
if __name__=='__main__':
organise_adwords_csv()
print("copy Final done") <- never printed
spark = SparkSession.builder.appName("adwords_transform") \
...
but, in the stdout, no errors / exception are showing.
In stderr logs:
19/10/09 16:16:57 INFO ApplicationMaster: Waiting for spark context initialization...
19/10/09 16:18:37 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/10/09 16:18:37 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
)
19/10/09 16:18:37 INFO ShutdownHookManager: Shutdown hook called
I'm completely blind, I don't understand what is failing / why.
How can I figure that out? On local it works like a charm (but super slow of course)
Edit:
After many tries I can confirm that the function:
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
make the EMR cluster timeout, even tho it processed 80% of the files already.
Does anyone have a recommendation about this?
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
This will fail for any source object larger than 5 GB. please use multipart upload in AWS. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#multipartupload
I want to load data into titan ,which's backend is hbase.So I changed the example from https://github.com/thinkaurelius/titan/blob/titan10/titan-core/src/main/java/com/thinkaurelius/titan/example/GraphOfTheGodsFactory.java
a little.But it ocurred many Exceptions when i connect.
Exception in thread "main" com.thinkaurelius.titan.core.TitanException: Could not open global configuration
at com.thinkaurelius.titan.diskstorage.Backend.getStandaloneGlobalConfiguration(Backend.java:451)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.(GraphDatabaseConfiguration.java:1322)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:84)
at com.thinkaurelius.titan.core.TitanFactory$Builder.open(TitanFactory.java:139)
at titantest.TitanLoad.main(TitanLoad.java:106)
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureTableExists(HBaseStoreManager.java:759)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:831)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:456)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KeyColumnValueStoreManager.openDatabase(KeyColumnValueStoreManager.java:29)
at com.thinkaurelius.titan.diskstorage.Backend.getStandaloneGlobalConfiguration(Backend.java:449)
... 5 more
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:229)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:202)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:299)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:278)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)
at org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:135)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:845)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:600)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:364)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:281)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseAdmin1_0.tableExists(HBaseAdmin1_0.java:70)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureTableExists(HBaseStoreManager.java:753)
... 9 more
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:434)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1139)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1106)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:293)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:147)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
... 19 more
Here is the code:
TitanGraph g = TitanFactory.build().set("storage.backend","hbase").set("storage.hostname","192.168.200.121,192.168.200.115")
.set("storage.hbase.table","mytitan").set("storage.hbase.ext.zookeeper.znode.parent","/hbase-unsecure").set("storage.port",2181).open();
BTW,I can connect to the Hbase cluster,it looks fine.
I am using EMR services from Amazon Web Services and am attempting to run a count query on an external table that I've built. The data for the table is stored in mongodb and the table is an external table in Hive. The query I'm trying to run is
select user_id, count (*) from myTable group by user_id;
I can query select * from myTable but I cannot do any other queries. When I try to I get this error:
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container FAILED -1 0 0 -1 0 0
Reducer 2 container KILLED 1 0 0 1 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 0.03 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1476467351971_0008_2_00, diagnostics=[Vertex vertex_1476467351971_0008_2_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3986)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:204)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2818)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2765)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2747)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1888)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2242)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2228)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ip-172-31-33-88.ec2.internal:8020/tmp/hive/hadoop/8c1eca9a-84ba-4d79-b39c-e633f1c6a646/hive_2016-10-14_18-57-38_728_4862537458701624850-1/hadoop/_tez_scratch_dir/157c0e27-0bfa-4acf-b0e9-b2fed595a8de/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:298)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:131)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:198)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:175)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:213)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:205)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:583)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:492)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:469)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:411)
... 32 more
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 55 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1476467351971_0008_2_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1476467351971_0008_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1476467351971_0008_2_00, diagnostics=[Vertex vertex_1476467351971_0008_2_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3986)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:204)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2818)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2765)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2747)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1888)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2242)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2228)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ip-172-31-33-88.ec2.internal:8020/tmp/hive/hadoop/8c1eca9a-84ba-4d79-b39c-e633f1c6a646/hive_2016-10-14_18-57-38_728_4862537458701624850-1/hadoop/_tez_scratch_dir/157c0e27-0bfa-4acf-b0e9-b2fed595a8de/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:298)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:131)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:198)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:175)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:213)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:205)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:583)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:492)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:469)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:411)
... 32 more
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.hive.input.HiveMongoInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 55 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1476467351971_0008_2_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1476467351971_0008_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
hive>
I am using a single node EMR cluster from AWS with the "Core Hadoop" configuration, but I had the same errors using a three node cluster. I have already added the Mongo Hadoop Core, Mongo Hadoop Hive, Mongo Java driver to the master node.. These are the jars I'm using:
mongo-hadoop-hive-2.0.1.jar
mongo-java-driver-3.3.0.jar
remotecontent?filepath=org%2Fmongodb%2Fmongo-hadoop%2Fmongo-hadoop-core%2F2.0.1%2Fmongo-hadoop-core-2.0.1.jar
These jars were downloaded from Maven.
A similar question had previously been asked on StackOverflow and was resolved by adding the src for Apache Tez 0.8.4 and building it. Apache Tez 0.8.4 is already on the cluster though, at least according to the EMR configuration we chose.
We are using Hadoop 2.7.2 and Hive 2.1.0.
Exception in thread "main" org.apache.gora.util.GoraException: java.io.IOException
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.io.IOException
at org.apache.gora.cassandra.store.CassandraStore.initialize(CassandraStore.java:88)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 8 more
Caused by: java.lang.NullPointerException
at org.apache.gora.cassandra.store.CassandraMapping.<init>(CassandraMapping.java:117)
at org.apache.gora.cassandra.store.CassandraMappingManager.get(CassandraMappingManager.java:84)
at org.apache.gora.cassandra.store.CassandraClient.initialize(CassandraClient.java:84)
at org.apache.gora.cassandra.store.CassandraStore.initialize(CassandraStore.java:85)
... 10 more
I just run nutch2.0 on cassandra. It's the output of crawl, and the output of TestGoreStorage is as following:
Starting!
Exception in thread "main" org.apache.gora.util.GoraException: java.io.IOException
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.storage.TestGoraStorage.main(TestGoraStorage.java:204)
Caused by: java.io.IOException
at org.apache.gora.cassandra.store.CassandraStore.initialize(CassandraStore.java:88)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 3 more
Caused by: java.lang.NullPointerException
at org.apache.gora.cassandra.store.CassandraMapping.<init>(CassandraMapping.java:117)
at org.apache.gora.cassandra.store.CassandraMappingManager.get(CassandraMappingManager.java:84)
at org.apache.gora.cassandra.store.CassandraClient.initialize(CassandraClient.java:84)
at org.apache.gora.cassandra.store.CassandraStore.initialize(CassandraStore.java:85)
... 5 more
I can connect cassandra with cassandra-cli, and just check out the nutch from svn.
Here is the effect config in gora.properties:
gora.datastore.default=org.apache.gora.cassandra.store.CassandraStore
gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver
gora.sqlstore.jdbc.url=jdbc:hsqldb:hsql://210.44.138.8/nutchtest
gora.sqlstore.jdbc.user=sa
gora.sqlstore.jdbc.password=
gora.cassandrastore.servers=210.44.138.8:9160
and the config in gora-cassandra-mapping:
<keyspace name="webpage" cluster="My Cluster" host="210.44.138.8">
<family name="p"/>
<family name="f"/>
<family name="sc" type="super"/>
</keyspace>
210.44.138.8 is a node of my cluster, and the name of cluster is "My Cluster",
more info: closed firewall, run in eclipse. I'm very pleasure if someone give me any help.
I'm not sure if I had the exact same problem, but I found that in the gora-cassandra-mapping.xml file I had to add a keyspace attribute (keyspace="ks1") to the class element:
<keyspace name="ks1" cluster="My Cluster" host="1.2.3.4">
...
</keyspace>
<class keyspace="ks1" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
...
</class>