Database schema is different - Exception in OrientDB - Baasbox - orientdb

I am getting the following exception on starting up baasbox (it has embedded orientdb).
This exception happens when I connect to the database using console.sh/connect:plocal and then run a few select queries, but doesn't happen the first time I startup baasbox.
Any help is appreciated.
Orientdb version: 1.7.10
com.orientechnologies.orient.core.exception.OConfigurationException: Database schema is different. Please export your old database with the previous version of OrientDB and reimport it using the current one.
at com.orientechnologies.orient.core.metadata.schema.OSchemaShared.fromStream(OSchemaShared.java:800) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.type.ODocumentWrapperNoClass.reload(ODocumentWrapperNoClass.java:70) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.metadata.schema.OSchemaShared.load(OSchemaShared.java:921) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.metadata.OMetadataDefault$1.call(OMetadataDefault.java:115) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.metadata.OMetadataDefault$1.call(OMetadataDefault.java:110) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.common.concur.resource.OSharedContainerImpl.getResource(OSharedContainerImpl.java:53) ~[orient-commons-1.7.10.jar:1.7.10]
Wrapped by: com.orientechnologies.common.exception.OException: Error on creation of shared resource
at com.orientechnologies.common.concur.resource.OSharedContainerImpl.getResource(OSharedContainerImpl.java:55) ~[orient-commons-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:110) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.metadata.OMetadataDefault.load(OMetadataDefault.java:68) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.open(ODatabaseRecordAbstract.java:291) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.orientechnologies.orient.core.db.ODatabaseWrapperAbstract.open(ODatabaseWrapperAbstract.java:49) ~[orientdb-core-1.7.10.jar:1.7.10]
at com.baasbox.db.DbHelper.open(DbHelper.java:353) ~[classes/:na]
at com.baasbox.Global.onStart(Global.java:169) ~[classes/:na]
at play.core.j.JavaGlobalSettingsAdapter.onStart(JavaGlobalSettingsAdapter.scala:18) [play_2.10.jar:2.2.4]
at play.api.GlobalPlugin.onStart(GlobalSettings.scala:203) [play_2.10.jar:2.2.4]
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:88) [play_2.10.jar:2.2.4]
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:88) [play_2.10.jar:2.2.4]
at scala.collection.immutable.List.foreach(List.scala:318) [scala-library.jar:na]
at play.api.Play$$anonfun$start$1.apply$mcV$sp(Play.scala:88) [play_2.10.jar:2.2.4]
at play.api.Play$$anonfun$start$1.apply(Play.scala:88) [play_2.10.jar:2.2.4]
at play.api.Play$$anonfun$start$1.apply(Play.scala:88) [play_2.10.jar:2.2.4]
at play.utils.Threads$.withContextClassLoader(Threads.scala:18) [play_2.10.jar:2.2.4]
at play.api.Play$.start(Play.scala:87) [play_2.10.jar:2.2.4]
at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$1$$anonfun$1.apply(ApplicationProvider.scala:139) [play_2.10.jar:2.2.4]
at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$1$$anonfun$1.apply(ApplicationProvider.scala:112) [play_2.10.jar:2.2.4]
at scala.Option.map(Option.scala:145) [scala-library.jar:na]
at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$1.apply(ApplicationProvider.scala:112) [play_2.10.jar:2.2.4]
at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$1.apply(ApplicationProvider.scala:110) [play_2.10.jar:2.2.4]
at scala.util.Success.flatMap(Try.scala:200) [scala-library.jar:na]
at play.core.ReloadableApplication$$anonfun$get$1.apply(ApplicationProvider.scala:110) [play_2.10.jar:2.2.4]
at play.core.ReloadableApplication$$anonfun$get$1.apply(ApplicationProvider.scala:102) [play_2.10.jar:2.2.4]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) [scala-library.jar:na]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1361) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library.jar:na]

I got this reply from the baasbox team:
When you use console.sh (the OrientDB console tool), be sure the BaasBox is not running, otherwise you can have data corruption.
However now it is clear that the error occurs after you have accessed the db via the console using a different version of the ODB tool.
I was using a different version of Orientdb tool (console.sh v2.0) than the orientdb version 1.7.10 used in baasbox. This caused the corruption and may have messed up the DB schema.

Related

How to load local file using sc.textFile in spark?

I've been trying to load local file using sc.textFile()in spark.
I already read [question]:How to load local file in sc.textFile, instead of HDFS
I have local file in /home/spark/data.txt on Centos 7.0
When I use val data = sc.textFile("file:///home/spark/data.txt").collect, I got a error as below.
16/12/27 12:15:56 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID
36,): java.io.FileNotFoundException: File file:/home/spark/data.txt does not
exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/12/27 12:15:56 ERROR TaskSetManager: Task 0 in stage 5.0 failed 4
times; aborting job org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 5.0 failed 4 times, most recent
failure: Lost task 0.3 in stage 5.0 (TID 42,):
java.io.FileNotFoundException: File file:/home/spark/data.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace: at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257) at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871) at
org.apache.spark.SparkContext.runJob(SparkContext.scala:1884) at
org.apache.spark.SparkContext.runJob(SparkContext.scala:1897) at
org.apache.spark.SparkContext.runJob(SparkContext.scala:1911) at
org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893) at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358) at
org.apache.spark.rdd.RDD.collect(RDD.scala:892) ... 48 elided Caused
by: java.io.FileNotFoundException: File file:/home/spark/data.txt does not exist
at
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140)
at
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767) at
org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109)
at
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209) at
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85) at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Apparently there is a file in this path. If I use wrong path, then the error is like below.
val data = sc.textFile("file:///data.txt").collect
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: file:/data.txt at
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121) at
org.apache.spark.rdd.RDD.partitions(RDD.scala:246) at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121) at
org.apache.spark.rdd.RDD.partitions(RDD.scala:246) at
org.apache.spark.SparkContext.runJob(SparkContext.scala:1911) at
org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893) at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358) at
org.apache.spark.rdd.RDD.collect(RDD.scala:892)
I don't know why it doesn't work.
Any ideas?
copy that file to your $SPARK_HOME folder and use this command:val data = sc.textFile("data.txt").collect
use this val data = sc.textFile("/home/spark/data.txt") this should work
and set master as local.
Your data file needs to exist in 'home/spark/data.txt' on ALL executer nodes.I know it's kind of preposterous. To fix it, you have the following options:
Move the data file to HDFS
Copy the data file on all the executer nodes
Load the file in pure Scala (not Spark) and then use sc.parallelize() to create the RDDs.
The Problem is our local is different from spark local. so when you run your pyspark, it's mandatory to mention your code must be run in your local machine, especially when you use AWS EC2. So simply run
./pyspark --master local[n]
after that your local and spark local will be the same.....
don't forget to use(file:///....)

Spark doesn't distribute provided driver across the cluster

I have created spark cluster using spark-ec2.
Now I want to submit a task that gets some data from postgres, enriches it, and dumps it back in new table, so I try to do that with the following command:
PYSPARK_PYTHON=/usr/bin/python2.7 ./spark/bin/spark-submit --jars=/root/jars/postgresql-9.4.1208.jre7.jar --py-files=serializers.py parse_pageviews.py s3n://logs
But I get the following exception
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:393)
at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Did not find registered driver with class org.postgresql.Driver
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
As I understand, --jars option should serve the following driver across the cluster, but for some reason, workers can't find it. (if I'm not mistaken)
Here are some code fragments where I'm accessing postgres
# get `attribution_orders` df and register it as table so we can query it
orders = sqlc.read.jdbc(postgres_url, "attribution_orders")
sqlc.registerDataFrameAsTable(orders, "attribution_orders")
# processing...
# write processed data back to postgres table
sales_cycles = sqlc.createDataFrame(mapped)
sales_cycles.write.jdbc(postgres_url, 'scs', mode='overwrite')
So am I doing something wrong? How can I distribute the postgres driver and access it across the cluster? Thanks!
I believe this is a classloader issue, instead of adding the jar using --jars try adding it via yarn.application.classpath in yarn-site.xml.

kafka.common.KafkaStorageException: Failed to change the log file suffix from to .deleted for log segment 0

Does anybody know what this error means ?
kafka.common.KafkaStorageException: Failed to change the log file suffix from to .deleted for log segment 0
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:259)
at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:729)
at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:720)
at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:488)
at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:488)
at scala.collection.immutable.List.foreach(List.scala:318)
at kafka.log.Log.deleteOldSegments(Log.scala:488)
at kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:411)
at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:442)
at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:440)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:440)
at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:182)
at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:99)
at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Kafka seems to work but I'd like to understand what's the message about.
Seems that this happens when kafka tries to rename the file when its still open. This is the report of the issue.
The workaround is to delete your kafka logs:
C:\tmp\kafka-logs
C:\tmp\zookeeper
At the time of writing this answer the existing bug is still outstanding.

Play framework hanging

Related to question in
Rest server (Play Framework) gets "Read Timed out" exception during load test
I have similar situation using play framework 2.2.4:
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
I am testing play with jmeter in my local PC. I hit with 2000 threads and timeout exception occurs, and Socket is somehow not closing.
How do I catch this exception and close the hanging sockets?
StackTrace:
va.util.concurrent.TimeoutException: Futures timed out after [5 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [scala-library.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [scala-library.jar:na]
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) ~[scala-library.jar:na]
at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:169) ~[akka-actor_2.10.jar:2.2.0]
at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640) [scala-library.jar:na]
at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:167) ~[akka-actor_2.10.jar:2.2.0]
at scala.concurrent.Await$.result(package.scala:107) ~[scala-library.jar:na]
at play.core.j.FPromiseHelper$.get(FPromiseHelper.scala:53) ~[play_2.10.jar:2.2.4]
at play.core.j.FPromiseHelper.get(FPromiseHelper.scala) ~[play_2.10.jar:2.2.4]
at play.libs.F$Promise.get(F.java:363) ~[play_2.10.jar:2.2.4]
wrap response>>
at com.ganda.common.controllers.filters.WrapResponse.wrap(WrapResponse.java:50) ~[na:na]
db filter below>>>
at com.ganda.common.controllers.filters.CredentialsFilter.call(CredentialsFilter.java:72) ~[na:na]
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:91) ~[play_2.10.jar:2.2.4]
at play.core.j.JavaAction$$anon$3.apply(JavaAction.scala:90) ~[play_2.10.jar:2.2.4]
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82) ~[play_2.10.jar:2.2.4]
at play.core.j.FPromiseHelper$$anonfun$flatMap$1.apply(FPromiseHelper.scala:82) ~[play_2.10.jar:2.2.4]
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:251) ~[scala-library.jar:na]
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249) ~[scala-library.jar:na]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library.jar:na]
at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:37) ~[play_2.10.jar:2.2.4]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42) ~[akka-actor_2.10.jar:2.2.0]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) ~[akka-actor_2.10.jar:2.2.0]
Wrapped by: play.api.Application$$anon$1: Execution exception[[TimeoutException: Futures timed out after [5 seconds]]]
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10.jar:2.2.4]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10.jar:2.2.4]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:264) [play_2.10.jar:2.2.4]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:264) [play_2.10.jar:2.2.4]
at scala.Option.map(Option.scala:145) [scala-library.jar:na]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:264) [play_2.10.jar:2.2.4]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:260) [play_2.10.jar:2.2.4]
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:344) [scala-library.jar:na]
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:343) [scala-library.jar:na]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library.jar:na]
at play.api.libs.iteratee.Execution$$anon$1.execute(Execution.scala:43) [play-iteratees_2.10.jar:2.2.4]
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) [scala-library.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) [scala-library.jar:na]
at scala.concurrent.Promise$class.complete(Promise.scala:55) [scala-library.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) [scala-library.jar:na]
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) [scala-library.jar:na]
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) [scala-library.jar:na]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1361) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library.jar:na]
What is happening here is that the Play promise is timeouting probably due to slow down introduced by load.
So this reveals a performance issue in your application, you could if it's acceptable increase the promise timeout or profile and improve performances of the promise processing so that it proceeds faster.

Jboss CDI with RestEasy

I get the following exception, when i try to use cdi in resteasy application
JBWEB000287: Exception sending context initialized event to listener instance of class org.jboss.resteasy.plugins.server.servlet.ResteasyBootstrap: java.lang.RuntimeException: Unable to instantiate InjectorFactory implementation.
at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:154) [resteasy-jaxrs-2.3.7.Final-redhat-2.jar:2.3.7.Final-redhat-2]
at org.jboss.resteasy.plugins.server.servlet.ResteasyBootstrap.contextInitialized(ResteasyBootstrap.java:28) [resteasy-jaxrs-2.3.7.Final-redhat-2.jar:2.3.7.Final-redhat-2]
at org.apache.catalina.core.StandardContext.contextListenerStart(StandardContext.java:3339) [jbossweb-7.2.2.Final-redhat-1.jar:7.2.2.Final-redhat-1]
at org.apache.catalina.core.StandardContext.start(StandardContext.java:3777) [jbossweb-7.2.2.Final-redhat-1.jar:7.2.2.Final-redhat-1]
at org.jboss.as.web.deployment.WebDeploymentService.doStart(WebDeploymentService.java:156) [jboss-as-web-7.3.0.Final-redhat-14.jar:7.3.0.Final-redhat-14]
at org.jboss.as.web.deployment.WebDeploymentService.access$000(WebDeploymentService.java:60) [jboss-as-web-7.3.0.Final-redhat-14.jar:7.3.0.Final-redhat-14]
at org.jboss.as.web.deployment.WebDeploymentService$1.run(WebDeploymentService.java:93) [jboss-as-web-7.3.0.Final-redhat-14.jar:7.3.0.Final-redhat-14]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_71]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_71]
at org.jboss.threads.JBossThread.run(JBossThread.java:122)
Caused by: java.lang.InstantiationException: org.jboss.resteasy.cdi.CdiInjectorFactory
at java.lang.Class.newInstance(Class.java:364) [rt.jar:1.7.0_71]
at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:146) [resteasy-jaxrs-2.3.7.Final-redhat-2.jar:2.3.7.Final-redhat-2]
... 12 more
I am using JBoss EAP 6.2. Resteasy version is 2.3.7.Final which is bundled with Jboss. I have beans.xml under WEB-INF. Kindly help me in this regard.