spark-submit failed due to Exception in thread "main" java.lang.NoSuchMethodException - scala

I got the error below when I tried to submit my Spark job for testing purposes.
jianrui#spark:~$ sudo $SPARK_HOME/bin/spark-submit --class com.test.spark.FirstScalaExample --master spark://spark.sparkstreaming.i10.internal.cloudapp.net:7077 /opt/spark/FirstScalaExample-0.0.1.jar
Exception in thread "main" java.lang.NoSuchMethodException: com.test.spark.FirstScalaExample.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:42)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-04-06 13:13:00 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-06 13:13:00 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-7f47cab1-f8b3-4731-bd67-e0d0ad013617
[Scala version - 2.11.6]
[Hadoop version - 2.7.5]
[Spark version - 2.3.0]
Note:
I specified the Hadoop native library with "export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native" in spark-env.sh, and I only extracted Spark rather than installing it.
I don't know what exactly the problem is.
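For reference, spark-submit resolves the --class target via reflection and expects a static main(Array[String]) method, which in Scala only a top-level object provides. A minimal sketch of an entry point that satisfies this (the package and class name are taken from the command above; the job body is a placeholder):

package com.test.spark

import org.apache.spark.sql.SparkSession

// spark-submit looks up main([Ljava.lang.String;) reflectively, so the
// entry point must be an `object` with this exact signature, not a `class`.
object FirstScalaExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FirstScalaExample").getOrCreate()
    // ... job logic ...
    spark.stop()
  }
}

If the entry point is declared as a class, or main has a different signature, the NoSuchMethodException above is exactly what spark-submit throws.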

Related

Spark application erroring out because the driver is out of memory even though lots of free memory is still available

Can someone please help me figure out why my simple Spark application requires huge driver memory? Even though I allocated about 112 GB, my application fails at about 67 GB.
Thanks in advance
The Spark driver uses huge memory to run a simple application.
I allocated about 112 GB of memory for the application in spark-submit.
At the start of the job, I see the message below in the logs:
1019 [main] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore started with capacity 67.0 GiB
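(Aside: the 67.0 GiB capacity itself is expected rather than a symptom. Spark's unified memory region is roughly (heap − 300 MB reserved) × spark.memory.fraction, which defaults to 0.6; a rough sketch of the arithmetic, assuming Spark's default constants:

// Approximate sizing of Spark's unified memory region (defaults assumed).
val heapBytes = 112L * 1024 * 1024 * 1024   // --driver-memory=112G
val reserved  = 300L * 1024 * 1024          // reserved system memory
val fraction  = 0.6                         // spark.memory.fraction default
val capacity  = (heapBytes - reserved) * fraction / math.pow(1024, 3)
println(f"$capacity%.1f GiB")               // prints ≈ 67.0 GiB

so the MemoryStore line is consistent with the 112 GB allocation.)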
My application fails with this error message:
java.lang.IllegalStateException: dag-scheduler-event-loop has already been stopped accidentally.
at org.apache.spark.util.EventLoop.post(EventLoop.scala:107)
at org.apache.spark.scheduler.DAGScheduler.taskStarted(DAGScheduler.scala:283)
at org.apache.spark.scheduler.TaskSetManager.prepareLaunchingTask(TaskSetManager.scala:539)
at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:478)
at scala.Option.map(Option.scala:230)
at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:455)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:395)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:390)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:390)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:381)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:587)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:582)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:582)
at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:555)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:555)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$5(CoarseGrainedSchedulerBackend.scala:359)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:955)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:351)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:162)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
59376410 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Cancelling stage 1
59376410 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Killing all running tasks in stage 1: Stage cancelled
59376415 [dispatcher-CoarseGrainedScheduler] ERROR org.apache.spark.scheduler.DAGSchedulerEventProcessLoop - DAGSchedulerEventProcessLoop failed; shutting down SparkContext
Scala code snippet:
val df = spark.read.parquet(data_path)
df.rdd.foreachPartition(p => {
  // code to process the partition...
})
Job submit command:
spark-submit --master "spark://x.x.x.x:7077" --driver-cores=4 --driver-memory=112G --conf spark.driver.maxResultSize=0 --conf spark.rpc.message.maxSize=2047 --conf spark.driver.host=x.x.x.x --class myclass.processor --packages "..,org.apache.hadoop:hadoop-azure:3.3.1" --deploy-mode client

Remote Spark Connection - Scala: Could not find BlockManagerMaster

Spark Master and Worker are both running on localhost. I started the Master and Worker nodes with:
sbin/start-all.sh
Log of the Master node invocation:
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/bin/java -cp /Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/conf/:/Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host 192.168.0.38 --port 7077 --webui-port 8080
Log of the Worker node invocation:
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/bin/java -cp /Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/conf/:/Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://192.168.0.38:7077
I have the following configuration in conf/spark-env.sh:
SPARK_MASTER_HOST=192.168.0.38
Content of /etc/hosts:
127.0.0.1 localhost
::1 localhost
255.255.255.255 broadcasthost
Scala code that I am invoking to establish the remote Spark connection:
val sparkConf = new SparkConf()
  .setAppName(AppConstants.AppName)
  .setMaster("spark://192.168.0.38:7077")

val sparkSession = SparkSession.builder()
  .appName(AppConstants.AppName)
  .config(sparkConf)
  .enableHiveSupport()
  .getOrCreate()
While executing the code from my IDE, I get the following exception in the console:
2018-10-04 14:43:33,426 ERROR [main] spark.SparkContext (Logging.scala:logError(91)) - Error initializing SparkContext.
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
........
Caused by: org.apache.spark.SparkException: Could not find BlockManagerMaster.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:132)
.......
2018-10-04 14:43:33,432 INFO [stop-spark-context] spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
........
Caused by: org.apache.spark.SparkException: Could not find BlockManagerMaster.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:132)
........
The log from /logs/master shows the following error:
18/10/04 14:43:13 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.rpc.RpcEndpointRef; local class incompatible: stream classdesc serialVersionUID = 1835832137613908542, local class serialVersionUID = -1329125091869941550
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
.......
What changes should be made to be able to connect to Spark remotely?
Spark Versions:
Spark: spark-2.3.1-bin-hadoop2.7
Build dependencies:
Scala: 2.11
Spark-hive: 2.2.2
Maven: org.spark-project.hive : hive-metastore = 1.x;
Logs:
Console log
Spark Master-Node log
I know this is an old post, but I'm sharing my answer to save someone else precious time.
I faced a similar issue two days back, and after much hacking I found that the root cause was the Scala version I was using in my Maven project.
I was using Spark 2.4.3, which internally uses Scala 2.11, whereas my Scala project was compiled with Scala 2.12. This Scala version mismatch was the reason for the above error.
When I downgraded the Scala version in my Maven project, it started working. Hope it helps.
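For illustration, the relevant pom.xml pieces might look like this (a sketch; the exact scala-library patch version is an assumption, the point is that both dependencies stay on the 2.11 binary line that Spark 2.4.3 is built against):

<!-- Spark 2.4.3 artifacts carry a _2.11 suffix: built for Scala 2.11 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<!-- scala-library must stay on the same 2.11 binary line, not 2.12.x -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.12</version>
</dependency>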

Spark in cluster mode throws error if a SparkContext is not started

I have a Spark job that initializes the SparkContext only if it is really necessary:
val conf = new SparkConf()
val jobs: List[Job] = ??? // get some jobs

if (jobs.nonEmpty) {
  val sc = new SparkContext(conf)
  sc.parallelize(jobs).foreach(....)
} else {
  // do nothing
}
It worked fine on YARN when the deploy mode was 'client':
spark-submit --master yarn --deploy-mode client
Then I switched the deploy mode to 'cluster', and it started to crash whenever jobs.isEmpty:
spark-submit --master yarn --deploy-mode cluster
Below is the error text:
INFO yarn.Client: Application report for application_1509613523426_0017 (state: ACCEPTED)
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: FAILED)
17/11/02 11:37:17 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1509613523426_0017 failed 2 times due to AM Container for appattempt_1509613523426_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
java.io.FileNotFoundException: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: dev
start time: 1509622629354
final status: FAILED
tracking URL: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017
user: xxx
Exception in thread "main" org.apache.spark.SparkException: Application application_1509613523426_0017 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/02 11:37:17 INFO util.ShutdownHookManager: Shutdown hook called
17/11/02 11:37:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a5b20def-0218-4b0c-b9f8-fdf8a1802e95
Is this a bug in YARN support, or am I missing something?
The SparkContext is responsible for communication with the cluster manager. If the application is submitted to the cluster but a context is never created, YARN cannot determine the state of the application - this is why you get an error.
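One way around it (a sketch, not the only fix; Job, the job-fetching logic, and the per-job processing are placeholders carried over from the question) is to create the context unconditionally, so YARN always has something to track, and simply do no work when the list is empty:

import org.apache.spark.{SparkConf, SparkContext}

object ConditionalJobRunner {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // Create the context up front: in cluster mode YARN infers the
    // application state from it, even when there is nothing to do.
    val sc = new SparkContext(conf)
    try {
      val jobs: List[Job] = ??? // get some jobs, as in the question
      if (jobs.nonEmpty) {
        sc.parallelize(jobs).foreach(job => /* process job */ ())
      }
    } finally {
      sc.stop() // shut down cleanly so YARN records a final status
    }
  }
}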

neo4j - graphaware plugins

I downloaded the GraphAware NLP, OpenNLP, and framework plugins and copied the jar files to the plugins directory.
As per the Neo4j steps, I included the following lines in the neo4j.conf file:
dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
com.graphaware.runtime.enabled=true
com.graphaware.module.NLP.2=com.graphaware.nlp.module.NLPBootstrapper
After inserting these lines, localhost:7474 does not start.
But when I comment these lines out, localhost starts and works properly, though without the plugins.
Version: Enterprise 3.1.3
Error on localhost after commenting those lines out:
Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: No GraphAware Runtime is registered with the given database
Error in log file:
2017-11-07 10:41:03.839+0000 INFO ======== Neo4j 3.1.3 ========
2017-11-07 10:41:04.120+0000 INFO Starting...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/share/neo4j/lib/slf4j-nop-1.7.22.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/neo4j/plugins/nlp-opennlp-3.1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]
2017-11-07 10:41:04.985+0000 INFO Bolt enabled on localhost:7687.
2017-11-07 10:41:05.010+0000 INFO Initiating metrics...
2017-11-07 10:41:07.374+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2017-11-07 10:41:07.444+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 2, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2017-11-07 10:41:07.523+0000 INFO Registering module NLP with GraphAware Runtime.
2017-11-07 10:41:07.523+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2017-11-07 10:41:21.893+0000 INFO Starting GraphAware...
2017-11-07 10:41:21.894+0000 INFO Loading module metadata...
2017-11-07 10:41:21.894+0000 INFO Loading metadata for module NLP
2017-11-07 10:41:21.946+0000 INFO Module NLP seems to have been registered for the first time.
2017-11-07 10:41:21.947+0000 INFO Module NLP seems to have been registered for the first time, will try to initialize...
2017-11-07 10:41:21.947+0000 INFO InitializeUntil set to 9223372036854775807 and it is 1510051281947. Will initialize.
2017-11-07 10:41:24.709+0000 INFO Started.
2017-11-07 10:41:24.811+0000 INFO Mounted REST API at: /db/manage
2017-11-07 10:41:24.823+0000 INFO [c.g.s.f.b.GraphAwareServerBootstrapper] started
2017-11-07 10:41:24.825+0000 INFO Mounted unmanaged extension [com.graphaware.server] at [/graphaware]
Exception in thread "GraphAware Starter" java.lang.RuntimeException: Error while initializing model of class: class opennlp.tools.namefind.TokenNameFinderModel
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadModel(OpenNLPPipeline.java:503)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.lambda$loadNamedEntitiesFinders$2(OpenNLPPipeline.java:161)
at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1691)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadNamedEntitiesFinders(OpenNLPPipeline.java:158)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.init(OpenNLPPipeline.java:118)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.<init>(OpenNLPPipeline.java:108)
at com.graphaware.nlp.processor.opennlp.PipelineBuilder.build(PipelineBuilder.java:79)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.createPhrasePipeline(OpenNLPTextProcessor.java:106)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.init(OpenNLPTextProcessor.java:56)
at com.graphaware.nlp.processor.TextProcessorsManager.lambda$initiateTextProcessors$0(TextProcessorsManager.java:61)
at java.util.HashMap$Values.forEach(HashMap.java:980)
at com.graphaware.nlp.processor.TextProcessorsManager.initiateTextProcessors(TextProcessorsManager.java:60)
at com.graphaware.nlp.processor.TextProcessorsManager.<init>(TextProcessorsManager.java:37)
at com.graphaware.nlp.NLPManager.init(NLPManager.java:95)
at com.graphaware.nlp.module.NLPModule.initialize(NLPModule.java:52)
at com.graphaware.runtime.manager.ProductionTxDrivenModuleManager.initialize(ProductionTxDrivenModuleManager.java:57)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.initializeIfAllowed(BaseTxDrivenModuleManager.java:128)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.handleNoMetadata(BaseTxDrivenModuleManager.java:72)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.handleNoMetadata(BaseTxDrivenModuleManager.java:39)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:143)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:125)
at com.graphaware.runtime.TxDrivenRuntime.loadMetadata(TxDrivenRuntime.java:130)
at com.graphaware.runtime.ProductionRuntime.loadMetadata(ProductionRuntime.java:80)
at com.graphaware.runtime.BaseGraphAwareRuntime.startModules(BaseGraphAwareRuntime.java:154)
at com.graphaware.runtime.TxDrivenRuntime.startModules(TxDrivenRuntime.java:146)
at com.graphaware.runtime.ProductionRuntime.startModules(ProductionRuntime.java:70)
at com.graphaware.runtime.BaseGraphAwareRuntime.start(BaseGraphAwareRuntime.java:134)
at com.graphaware.runtime.bootstrap.RuntimeKernelExtension.lambda$start$8(RuntimeKernelExtension.java:117)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadModel(OpenNLPPipeline.java:499)
... 29 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at opennlp.tools.ml.model.AbstractModelReader.getParameters(AbstractModelReader.java:140)
at opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:78)
at opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:62)
at opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:85)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:32)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:29)
at opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:309)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:239)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:173)
at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:103)
at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadModel(OpenNLPPipeline.java:499)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.lambda$loadNamedEntitiesFinders$2(OpenNLPPipeline.java:161)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline$$Lambda$239/1188677545.accept(Unknown Source)
at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1691)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadNamedEntitiesFinders(OpenNLPPipeline.java:158)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.init(OpenNLPPipeline.java:118)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.<init>(OpenNLPPipeline.java:108)
at com.graphaware.nlp.processor.opennlp.PipelineBuilder.build(PipelineBuilder.java:79)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.createPhrasePipeline(OpenNLPTextProcessor.java:106)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.init(OpenNLPTextProcessor.java:56)
at com.graphaware.nlp.processor.TextProcessorsManager.lambda$initiateTextProcessors$0(TextProcessorsManager.java:61)
at com.graphaware.nlp.processor.TextProcessorsManager$$Lambda$234/2094381213.accept(Unknown Source)
at java.util.HashMap$Values.forEach(HashMap.java:980)
at com.graphaware.nlp.processor.TextProcessorsManager.initiateTextProcessors(TextProcessorsManager.java:60)
at com.graphaware.nlp.processor.TextProcessorsManager.<init>(TextProcessorsManager.java:37)
at com.graphaware.nlp.NLPManager.init(NLPManager.java:95)
at com.graphaware.nlp.module.NLPModule.initialize(NLPModule.java:52)
at com.graphaware.runtime.manager.ProductionTxDrivenModuleManager.initialize(ProductionTxDrivenModuleManager.java:57)
Please help me out.
You do not have sufficient memory for the NLP plugins to load, hence the NLP module is not registered and thus not available once the database has started.
As stated in the NLP plugin README, you need at least 4 GB of heap for the modules to run; adapt it in your neo4j.conf and restart.
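For reference, heap size is controlled in neo4j.conf by the standard Neo4j 3.x settings (the 4g value mirrors the README's minimum; size up as your hardware allows):

dbms.memory.heap.initial_size=4g
dbms.memory.heap.max_size=4g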

Spark RDD method "saveAsTextFile" throwing exception even after deleting the output directory: org.apache.hadoop.mapred.FileAlreadyExistsException

I am calling this method on an RDD[String], with the destination in the arguments (Scala).
Even after deleting the directory before starting, the process gives this error.
I am running this process on an EMR cluster with the output location on AWS S3.
Below is the command used:
spark-submit --deploy-mode cluster --class com.hotwire.hda.spark.prd.pricingengine.PRDPricingEngine --conf spark.yarn.submit.waitAppCompletion=true --num-executors 21 --executor-cores 4 --executor-memory 20g --driver-memory 8g --driver-cores 4 s3://bi-aws-users/sbatheja/hotel-shopper-0.0.1-SNAPSHOT-jar-with-dependencies.jar -d 3 -p 100 --search-bucket s3a://hda-prod-business.hotwire.hotel.search --prd-output-path s3a://bi-aws-users/sbatheja/PRD/PriceEngineOutput/
Log:
16/07/07 11:27:47 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/07 11:27:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/07 11:27:47 INFO SparkContext: Successfully stopped SparkContext
16/07/07 11:27:47 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory s3a://bi-aws-users/sbatheja/PRD/PriceEngineOutput already exists)
16/07/07 11:27:47 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/07 11:27:47 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/07 11:27:47 INFO AMRMClientImpl: Waiting for application to be successfully unregistered.
16/07/07 11:27:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/07 11:27:47 INFO ApplicationMaster: Deleting staging directory .sparkStaging/application_1467889642439_0001
16/07/07 11:27:47 INFO ShutdownHookManager: Shutdown hook called
16/07/07 11:27:47 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1467889642439_0001/spark-7f836950-a040-4216-9308-2bb4565c5649
It creates a "_temporary" directory in the output location, which contains empty part files.
In short:
Make sure the Scala versions of spark-core and scala-library are consistent.
I encountered the same problem.
When I saved a file to HDFS, it threw the exception org.apache.hadoop.mapred.FileAlreadyExistsException.
Then I checked the HDFS directory; there was an empty temporary folder: TARGET_DIR/_temporary/0.
You can submit the job with detailed configuration output: ./spark-submit --verbose.
Then look at the full context and log; there must be another error that caused this one.
With my job in the RUNNING state, the first error was thrown:
17/04/23 11:47:02 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
The job was then retried and re-executed. On re-execution, it found the directory that had just been created, and so it also threw the "directory already exists" error.
This confirmed that the first error was a version-compatibility issue.
The Spark version was 2.1.0, so the corresponding spark-core Scala version is 2.11, but the scala-library dependency was on Scala version 2.12.xx.
Once the two Scala versions are made consistent (usually by modifying the scala-library version), the first exception is solved and the job can FINISH normally.
pom.xml example:
<!-- Spark -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

<!-- Scala -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.7</version>
</dependency>