Data Fusion Provisioning of Dataproc Cluster Fails - google-cloud-dataproc

I've created a simple pipeline that reads from a SQL Server table and writes to a BigQuery table. I then configured it to use Spark, deployed it, and ran it. It starts by provisioning the Dataproc cluster, and I can see that it fairly quickly creates three VMs: one master and two workers. The main cluster-creation operation stays in "provisioning", though, both in the Dataproc UI and in the Data Fusion UI. After about 17 minutes it fails.
I've tried this in both an Enterprise instance and a Basic instance. I've made sure that the instance service account has the "Cloud Data Fusion API Service Agent" role. I've also run the preview, which completes in around 20 seconds and succeeds.
This is the log:
2019-06-21 10:59:37,011 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#121] - Executing PROVISION subtask REQUESTING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 10:59:42,087 - INFO [provisioning-service-3:i.c.c.r.s.p.d.DataprocProvisioner#171] - Creating Dataproc cluster cdap-loadfromb-a7999324-9413-11e9-a296-564a3b7813c8 with system labels {goog-datafusion-version=6_0, cdap-version=6_0_1-1559673739218, goog-datafusion-edition=basic}
2019-06-21 10:59:45,446 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#125] - Completed PROVISION subtask REQUESTING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 10:59:45,461 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#121] - Executing PROVISION subtask POLLING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 10:59:46,402 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#125] - Completed PROVISION subtask POLLING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
(...)
2019-06-21 11:17:31,345 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#121] - Executing PROVISION subtask REQUESTING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:32,753 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#125] - Completed PROVISION subtask REQUESTING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:32,769 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#121] - Executing PROVISION subtask POLLING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:33,588 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#125] - Completed PROVISION subtask POLLING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:33,601 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask#112] - Completed PROVISION task for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:35,946 - DEBUG [provisioning-service-4:i.c.c.i.p.t.ProvisioningTask#121] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:37,219 - ERROR [provisioning-service-4:i.c.c.i.p.t.ProvisioningTask#151] - DEPROVISION task failed in REQUESTING_DELETE state for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
com.google.api.gax.rpc.FailedPreconditionException: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Cannot delete cluster 'cdap-loadfromb-a7999324-9413-11e9-a296-564a3b7813c8' while it has other pending delete operations.
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:59) ~[na:na]
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72) ~[na:na]
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60) ~[na:na]
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:95) ~[na:na]
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:61) ~[na:na]
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) ~[na:na]
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:900) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:811) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:675) ~[com.google.guava.guava-13.0.1.jar:na]
at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:492) ~[na:na]
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:467) ~[na:na]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41) ~[na:na]
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684) ~[na:na]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41) ~[na:na]
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:392) ~[na:na]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:475) ~[na:na]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) ~[na:na]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:557) ~[na:na]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:478) ~[na:na]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:590) ~[na:na]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[na:na]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[na:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Cannot delete cluster 'cdap-loadfromb-a7999324-9413-11e9-a296-564a3b7813c8' while it has other pending delete operations.
at io.grpc.Status.asRuntimeException(Status.java:526) ~[na:na]
... 19 common frames omitted
2019-06-21 11:17:37,235 - DEBUG [provisioning-service-4:i.c.c.i.p.t.ProvisioningTask#159] - Terminated DEPROVISION task for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8 due to exception.

Because the Dataproc cluster remains in "provisioning", my suspicion is that the network used for the Dataproc cluster is not configured to let the cluster nodes communicate with each other.
For more information on this, see https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/network#overview.

Make sure Data Fusion has the default network access. If you are using a new VPC without the default firewall rules, you might run into this problem. As a quick test, try running Data Fusion on the default VPC network by setting the following property:
"system.profile.properties.network=default"

Related

Spring Boot and MongoDB GitLab CI/CD

I have set up a Spring Boot API that performs CRUD operations against MongoDB.
I need to run the following commands in CI, with MongoDB hosted automatically by GitLab,
mvn clean install -B
and
mvn clean test
from .gitlab-ci.yml.
com.mongodb.MongoSocketOpenException: Exception opening socket
at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:70) ~[mongodb-driver-core-3.11.2.jar:na]
at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:128) ~[mongodb-driver-core-3.11.2.jar:na]
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:117) ~[mongodb-driver-core-3.11.2.jar:na]
at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
at java.base/java.net.Socket.connect(Socket.java:609) ~[na:na]
at com.mongodb.internal.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:64) ~[mongodb-driver-core-3.11.2.jar:na]
at com.mongodb.internal.connection.SocketStream.initializeSocket(SocketStream.java:79) ~[mongodb-driver-core-3.11.2.jar:na]
at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:65) ~[mongodb-driver-core-3.11.2.jar:na]
... 3 common frames omitted
I'm looking for a .gitlab-ci.yml file that starts the Mongo server and creates the DB before running mvn clean install.
Current .gitlab-ci.yml:
image: maven:3.6-jdk-11
stages:
  - build
cache:
  paths:
    - target/
services:
  - mongo:latest
build:
  stage: build
  script:
    - "mvn clean install -B"
  artifacts:
    paths:
      - ./target/*************
Thanks for the answers and comments.
I was able to solve the problem using embed.mongo: the normal API run uses the actual hosted Mongo store, while the tests run against embed.mongo.
Dependency added:
<dependency>
    <groupId>de.flapdoodle.embed</groupId>
    <artifactId>de.flapdoodle.embed.mongo</artifactId>
    <version>1.48.0</version>
    <scope>test</scope>
</dependency>
Common Mongo config
@Configuration
@EnableAutoConfiguration(exclude = { EmbeddedMongoAutoConfiguration.class })
public class MongoConfig {
}
Annotations used for the test class
@ContextConfiguration(classes = {MongoConfig.class})
@RunWith(SpringJUnit4ClassRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
@AutoConfigureDataMongo

Zeppelin throwing NullPointerException while configuring

I am trying to set up zeppelin-0.8.0 on my Windows 8 R2 OS. I already have Spark running from my console, i.e. SPARK_HOME, JAVA_HOME, and HADOOP_HOME are set up and working fine. But when I try to execute println("hello") in the Zeppelin Spark interpreter, it throws the error below.
I have already set SPARK_HOME and JAVA_HOME in the zeppelin-env.cmd file.
Error
DEBUG [2019-01-22 10:05:34,129] ({pool-2-thread-2} RemoteInterpreterManagedProcess.java[start]:153) - callbackServer is serving now
INFO [2019-01-22 10:05:34,143] ({pool-2-thread-2} RemoteInterpreterManagedProcess.java[start]:190) - Run interpreter process [C:\Software\Zepplin\zepplin\bin\interpreter.cmd, -d, C:\Software\Zepplin\zepplin/interpreter/spark, -c, 10.188.16
DEBUG [2019-01-22 10:05:34,419] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:298) - When was unexpected at this time.
INFO [2019-01-22 10:05:34,435] ({Exec Default Executor} RemoteInterpreterManagedProcess.java[onProcessFailed]:250) - Interpreter process failed {}
org.apache.commons.exec.ExecuteException: Process exited with an error: 255 (Exit value: 255)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
at java.lang.Thread.run(Thread.java:748)
ERROR [2019-01-22 10:06:34,177] ({pool-2-thread-2} Job.java[run]:190) - Job failed
java.lang.RuntimeException: When was unexpected at this time.
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:205)
at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:64)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:111)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:164)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR [2019-01-22 10:06:52,103] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2294) - Error
java.lang.RuntimeException: When was unexpected at this time.
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:205)
at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:64)

Spark in cluster mode throws an error if a SparkContext is not started

I have a Spark job that initializes the spark context only if it is really necessary:
val conf = new SparkConf()
val jobs: List[Job] = ??? // get some jobs
if (jobs.nonEmpty) {
  val sc = new SparkContext(conf)
  sc.parallelize(jobs).foreach(....)
} else {
  // do nothing
}
It worked fine on YARN when the deploy mode was 'client':
spark-submit --master yarn --deploy-mode client
Then I switched the deploy mode to 'cluster', and it started to crash whenever jobs.isEmpty:
spark-submit --master yarn --deploy-mode cluster
Below is the error text:
INFO yarn.Client: Application report for application_1509613523426_0017 (state: ACCEPTED)
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: FAILED)
17/11/02 11:37:17 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1509613523426_0017 failed 2 times due to AM Container for appattempt_1509613523426_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
java.io.FileNotFoundException: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: dev
     start time: 1509622629354
     final status: FAILED
     tracking URL: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017
     user: xxx
Exception in thread "main" org.apache.spark.SparkException: Application application_1509613523426_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/02 11:37:17 INFO util.ShutdownHookManager: Shutdown hook called
17/11/02 11:37:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a5b20def-0218-4b0c-b9f8-fdf8a1802e95
Is this a bug in YARN support, or am I missing something?
The SparkContext is responsible for communicating with the cluster manager. If the application is submitted to the cluster but the context is never created, YARN cannot determine the state of the application, which is why you get an error.
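A minimal sketch of one way to restructure the job so it works in cluster mode (the object name, app name, and Job type here are placeholders, not code from the question): create the SparkContext unconditionally so the YARN ApplicationMaster can register, and simply skip the work when the job list is empty.
import org.apache.spark.{SparkConf, SparkContext}

object MaybeEmptyJob {
  // Hypothetical stand-in for the question's Job type.
  case class Job(id: Int)

  def main(args: Array[String]): Unit = {
    val jobs: List[Job] = List.empty // e.g. nothing to do on this run

    // Create the context unconditionally so YARN can track the application,
    // even when there is no work to schedule.
    val sc = new SparkContext(new SparkConf().setAppName("maybe-empty-job"))
    try {
      if (jobs.nonEmpty) {
        sc.parallelize(jobs).foreach(job => println(s"processing ${job.id}"))
      }
    } finally {
      sc.stop() // shut down cleanly so YARN records a successful run
    }
  }
}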

How to restart Spark Streaming job from checkpoint on Dataproc?

This is a follow-up to Spark streaming on dataproc throws FileNotFoundException.
Over the past few weeks (not sure exactly since when), restarting a Spark Streaming job, even with the "kill dataproc.agent" trick, throws this exception:
17/05/16 17:39:02 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at stream-event-processor-m/10.138.0.3:8032
17/05/16 17:39:03 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1494955637459_0006
17/05/16 17:39:04 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:140)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:826)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:826)
at scala.Option.map(Option.scala:146)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:826)
at com.thumbtack.common.model.SparkStream$class.main(SparkStream.scala:73)
at com.thumbtack.skyfall.StreamEventProcessor$.main(StreamEventProcessor.scala:19)
at com.thumbtack.skyfall.StreamEventProcessor.main(StreamEventProcessor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/16 17:39:04 INFO org.spark_project.jetty.server.ServerConnector: Stopped ServerConnector@5555ffcf{HTTP/1.1}{0.0.0.0:4479}
17/05/16 17:39:04 WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/05/16 17:39:04 ERROR org.apache.spark.util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1360)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:87)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1797)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1290)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1796)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:140)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:826)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:826)
at scala.Option.map(Option.scala:146)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:826)
at com.thumbtack.common.model.SparkStream$class.main(SparkStream.scala:73)
at com.thumbtack.skyfall.StreamEventProcessor$.main(StreamEventProcessor.scala:19)
at com.thumbtack.skyfall.StreamEventProcessor.main(StreamEventProcessor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:140)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:826)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:826)
at scala.Option.map(Option.scala:146)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:826)
at com.thumbtack.common.model.SparkStream$class.main(SparkStream.scala:73)
at com.thumbtack.skyfall.StreamEventProcessor$.main(StreamEventProcessor.scala:19)
at com.thumbtack.skyfall.StreamEventProcessor.main(StreamEventProcessor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Job output is complete
How do I restart a Spark Streaming job from its checkpoint on a Dataproc cluster?
We've recently added auto-restart capabilities to Dataproc jobs (available in the gcloud beta track and in the v1 API).
To take advantage of auto-restart, a job must be able to recover and clean up, so it will not work for most jobs without modification. However, it does work out of the box with Spark Streaming jobs that use checkpoint files.
The restart-dataproc-agent trick should no longer be necessary. Auto-restart is resilient against job crashes, Dataproc agent failures, and VM restart-on-migration events.
Example:
gcloud beta dataproc jobs submit spark ... --max-failures-per-hour 1
See:
https://cloud.google.com/dataproc/docs/concepts/restartable-jobs
If you want to test out recovery, you can simulate a VM migration by resetting the master VM [1]. After this, you should be able to describe the job [2] and see an ATTEMPT_FAILURE entry in statusHistory.
[1] gcloud compute instances reset <cluster-name>-m
[2] gcloud dataproc jobs describe

Spark Job failing with exit status 15

I am trying to run a simple word count job in Spark, but I am getting an exception while running the job.
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1446699275562_0006/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1446699275562_0006_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.cloudera
start time: 1446910483956
final status: FAILED
tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1446699275562_0006
user: cloudera
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:626)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:651)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I checked the logs with the following command:
yarn logs -applicationId application_1446699275562_0006
Here is the log:
15/11/07 07:35:09 ERROR yarn.ApplicationMaster: User class threw exception: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/WordCountOutput already exists
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/WordCountOutput already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1053)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:954)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:863)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1290)
at org.com.td.sparkdemo.spark.WordCount$.main(WordCount.scala:23)
at org.com.td.sparkdemo.spark.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
15/11/07 07:35:09 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/WordCountOutput already exists)
15/11/07 07:35:14 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
The exception clearly indicates that the WordCountOutput directory already exists, but I made sure that the directory was not there before running the job.
Why am I getting this error even though the directory was not there before running my job?
I came across the same issue and fixed it by adding the setting shown below:
// Skip output-spec validation so an existing output directory does not fail the job
SparkConf sparkConf = new SparkConf().setAppName("Sentiment Scoring").set("spark.hadoop.validateOutputSpecs", "false");
Thanks,
Sathish.
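As an alternative (a rough sketch, not taken from the answers above), you can remove a stale output directory through the Hadoop FileSystem API before writing; the input argument, object name, and output path below are placeholders based on the question's log:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object WordCountCleanOutput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    // Placeholder paths: args(0) is the input, output matches the directory from the log.
    val input = args(0)
    val output = new Path("hdfs://quickstart.cloudera:8020/user/cloudera/WordCountOutput")

    // Delete any leftover output directory so saveAsTextFile does not fail
    // with FileAlreadyExistsException.
    val fs = FileSystem.get(output.toUri, sc.hadoopConfiguration)
    if (fs.exists(output)) fs.delete(output, true)

    sc.textFile(input)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(output.toString)

    sc.stop()
  }
}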
For us this was caused by "running beyond physical memory limits". After increasing executor memory, the issue was fixed.
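For example (the memory values below are placeholders, not from the original answer), executor and driver memory can be raised on the spark-submit command line:
spark-submit --master yarn --deploy-mode cluster --executor-memory 4g --driver-memory 2g ...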
This error mostly occurs when a wrong Spark parameter is passed in the spark-submit command. Please check the configuration parameters. In my case, I passed a malformed namenode address for the HDFS location I needed to read resources from.
Before (YARN threw error 15):
dfs://masternamenode/TESTCGNATDATA/
After (able to run the application):
hdfs://masternamenode/TESTCGNATDATA/