Akka cluster running on a single machine uses too much space in /dev/shm

I am running the official Akka cluster sample: https://github.com/akka/akka-samples/tree/2.5/akka-sample-cluster-scala.
My OS is Linux Mint 19 with the latest kernel.
I cannot fully run the Worker Dial-in Example (Transformation Example) because there is not enough space in /dev/shm, although I have more than 2 GB available.
The problem is that when I launch the first frontend node, it uses a few KB. When I launch the second one, it uses a few MB. When I launch the third one, it uses a few hundred MB. I cannot launch the fourth one at all; it throws an error that brings the whole cluster down:
[info] Warning: space is running low in /dev/shm (tmpfs) threshold=167,772,160 usable=95,424,512
[info] Warning: space is running low in /dev/shm (tmpfs) threshold=167,772,160 usable=45,088,768
[info] [ERROR] [11/05/2018 21:03:56.156] [ClusterSystem-akka.actor.default-dispatcher-12] [akka://ClusterSystem@127.0.0.1:57246/] swallowing exception during message send
[info] io.aeron.exceptions.RegistrationException: IllegalStateException : Insufficient usable storage for new log of length=50335744 in /dev/shm (tmpfs)
[info] at io.aeron.ClientConductor.onError(ClientConductor.java:174)
[info] at io.aeron.DriverEventsAdapter.onMessage(DriverEventsAdapter.java:81)
[info] at org.agrona.concurrent.broadcast.CopyBroadcastReceiver.receive(CopyBroadcastReceiver.java:100)
[info] at io.aeron.DriverEventsAdapter.receive(DriverEventsAdapter.java:56)
[info] at io.aeron.ClientConductor.service(ClientConductor.java:660)
[info] at io.aeron.ClientConductor.awaitResponse(ClientConductor.java:696)
[info] at io.aeron.ClientConductor.addPublication(ClientConductor.java:371)
[info] at io.aeron.Aeron.addPublication(Aeron.java:259)
[info] at akka.remote.artery.aeron.AeronSink$$anon$1.<init>(AeronSink.scala:103)
[info] at akka.remote.artery.aeron.AeronSink.createLogicAndMaterializedValue(AeronSink.scala:100)
[info] at akka.stream.impl.GraphStageIsland.materializeAtomic(PhasedFusingActorMaterializer.scala:630)
[info] at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:450)
[info] at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:415)
[info] at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:406)
[info] at akka.stream.scaladsl.RunnableGraph.run(Flow.scala:588)
[info] at akka.remote.artery.Association.runOutboundOrdinaryMessagesStream(Association.scala:726)
[info] at akka.remote.artery.Association.runOutboundStreams(Association.scala:657)
[info] at akka.remote.artery.Association.associate(Association.scala:649)
[info] at akka.remote.artery.AssociationRegistry.association(Association.scala:989)
[info] at akka.remote.artery.ArteryTransport.association(ArteryTransport.scala:724)
[info] at akka.remote.artery.ArteryTransport.send(ArteryTransport.scala:710)
[info] at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:591)
[info] at akka.actor.ActorRef.tell(ActorRef.scala:124)
[info] at akka.actor.ActorSelection$.rec$1(ActorSelection.scala:265)
[info] at akka.actor.ActorSelection$.deliverSelection(ActorSelection.scala:269)
[info] at akka.actor.ActorSelection.tell(ActorSelection.scala:46)
[info] at akka.actor.ScalaActorSelection.$bang(ActorSelection.scala:280)
[info] at akka.actor.ScalaActorSelection.$bang$(ActorSelection.scala:280)
[info] at akka.actor.ActorSelection$$anon$1.$bang(ActorSelection.scala:198)
[info] at akka.cluster.ClusterCoreDaemon.gossipTo(ClusterDaemon.scala:1330)
[info] at akka.cluster.ClusterCoreDaemon.gossip(ClusterDaemon.scala:1047)
[info] at akka.cluster.ClusterCoreDaemon.gossipTick(ClusterDaemon.scala:1010)
[info] at akka.cluster.ClusterCoreDaemon$$anonfun$initialized$1.applyOrElse(ClusterDaemon.scala:496)
[info] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
[info] at akka.actor.Actor.aroundReceive(Actor.scala:517)
[info] at akka.actor.Actor.aroundReceive$(Actor.scala:515)
[info] at akka.cluster.ClusterCoreDaemon.aroundReceive(ClusterDaemon.scala:295)
[info] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
[info] at akka.actor.ActorCell.invoke(ActorCell.scala:557)
[info] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
[info] at akka.dispatch.Mailbox.run(Mailbox.scala:225)
[info] at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
[info] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[info] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[info] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[info] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
It seems to be sending a huge message (48 MB+?) to every node.
So what is going on here? What is the root cause, and how should I fix it?
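One hedged observation on the numbers in the log above: the failing allocation, length=50335744 bytes, is exactly three 16 MiB buffers plus 4096 bytes. That would fit Aeron's log buffer layout (three term buffers plus a metadata section, with a 16 MiB default term length), in which case the space is a buffer file reserved in /dev/shm for each publication rather than a 48 MB message. The layout constants in this sketch are assumptions based on Aeron's defaults; only the reported total comes from the log.

```scala
object AeronLogSize {
  // Assumed Aeron defaults (not stated in the log above).
  val TermBufferLength: Long = 16L * 1024 * 1024 // 16 MiB per term buffer
  val TermCount: Int = 3                         // three term buffers per log
  val MetadataLength: Long = 4096L               // trailing metadata section

  // Size of one log buffer file under those assumptions.
  def logLength(termLength: Long = TermBufferLength): Long =
    TermCount * termLength + MetadataLength

  def main(args: Array[String]): Unit = {
    val reported = 50335744L // "length=" value from the RegistrationException
    println(s"computed=${logLength()} reported=$reported match=${logLength() == reported}")
  }
}
```

If this reading is right, every outbound stream to every peer reserves about 48 MiB, which would explain why usage grows so quickly as nodes join on one machine. Options worth checking against the documentation for your versions include a smaller Aeron term buffer length (the aeron.term.buffer.length system property) or a TCP-based Artery transport (akka.remote.artery.transport = tcp).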

Related

OCI error "/opt/docker/bin/my_job" : no such file or directory using sbt docker:publishLocal

If you use sbt docker:publishLocal to build a Docker image from your Scala project, you will see output like the following:
[info] Packaging /home/user123/myUser/repos/my_job/target/scala-2.12/app_internal_2.12-0.1.jar ...
[info] Done packaging.
[info] Sending build context to Docker daemon 129.7MB
[info] Step 1/7 : FROM openjdk:11-jre
[info] ---> 8c8b7f0ab84c
[info] Step 2/7 : LABEL MAINTAINER="no_name@my.org"
[info] ---> Using cache
[info] ---> d5caf9a92999
[info] Step 3/7 : WORKDIR /opt/docker
[info] ---> Using cache
[info] ---> d887eeb10e8e
[info] Step 4/7 : ADD --chown=root:root opt /opt
[info] ---> 1b43a84a5e32
[info] Step 5/7 : USER root
[info] ---> Running in 282c7f7de8ad
[info] Removing intermediate container 282c7f7de8ad
[info] ---> 11fed4892683
[info] Step 6/7 : ENTRYPOINT ["/opt/docker/bin/my_job"]
[info] ---> Running in 1d297dd1e960
[info] Removing intermediate container 1d297dd1e960
[info] ---> 1923a8df3fcf
[info] Step 7/7 : CMD []
[info] ---> Running in 3d9f7a4a262b
[info] Removing intermediate container 3d9f7a4a262b
[info] ---> d67ed46fd3fe
[info] Successfully built d67ed46fd3fe
[info] Successfully tagged docker_app_internal:0.1
[info] Built image docker_app_internal with tags [0.1]
[success] Total time: 25 s, completed Mar 27, 2019 10:23:35 AM
The error in the title can be confusing, especially because this works:
docker run -it --entrypoint=/bin/bash docker_app_internal:0.1 -i
while this does not:
docker run docker_app_internal:0.1
Thanks to @Muki for creating this helpful project.
Refer: https://github.com/sbt/sbt-native-packager
If your project's root folder name differs from the main class name, the entrypoint generated by sbt docker:publishLocal becomes /your/linuxpath/bin/rootFolder, but the file actually created in the Docker image is /your/linuxpath/bin/main-class (if your main class is named MainClass).
To fix this, set the entrypoint explicitly in build.sbt:
dockerEntrypoint := Seq("/opt/docker/bin/main-class")
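As a possible alternative to hard-coding the path, here is a build.sbt sketch that pins the generated script name instead. It assumes that sbt-native-packager's executableScriptName key controls the file created under bin/ and that the default Docker entrypoint is derived from that key; check both against the plugin version you use. The name my_job is taken from the entrypoint shown in the build output above.

```scala
// build.sbt (sketch, not a verified fix for every setup)
enablePlugins(JavaAppPackaging, DockerPlugin)

// Assumption: the start script is written to <install dir>/bin/<executableScriptName>,
// so pinning the name keeps the script and the entrypoint in agreement.
executableScriptName := "my_job"
```

Either way, listing the bin directory inside the image (for example from the /bin/bash entrypoint shown earlier) is a quick way to confirm which script name was actually generated.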

Kafka-Manager Web UI not loading

I have started kafka-manager on a CentOS VM; its logs are below.
[info] o.a.z.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[info] o.a.z.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[info] o.a.z.ZooKeeper - Client environment:java.compiler=
[info] o.a.z.ZooKeeper - Client environment:os.name=Linux
[info] o.a.z.ZooKeeper - Client environment:os.arch=amd64
[info] o.a.z.ZooKeeper - Client environment:os.version=3.10.0-862.el7.x86_64
[info] o.a.z.ZooKeeper - Client environment:user.name=root
[info] o.a.z.ZooKeeper - Client environment:user.home=/root
[info] o.a.z.ZooKeeper - Client environment:user.dir=/root/Confluent_kafka/kafka-manager-1.3.3.21
[info] o.a.z.ZooKeeper - Initiating client connection, connectString=localhost:3181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@73687e45
[info] o.a.z.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:3181. Will not attempt to authenticate using SASL (unknown error)
[info] o.a.z.ClientCnxn - Socket connection established to localhost/127.0.0.1:3181, initiating session
[info] k.m.a.KafkaManagerActor - zk=localhost:3181
[info] k.m.a.KafkaManagerActor - baseZkPath=/kafka-manager
[info] o.a.z.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:3181, sessionid = 0x16565ff95660000, negotiated timeout = 60000
[info] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
[info] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
[info] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
[info] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
[info] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
[info] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
[info] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
[info] play.api.Play - Application started (Prod)
[info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Shutting down kafka manager
Kafka Manager starts without errors, but the web UI does not load at all.
IPv6 is disabled, and netstat shows this:
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 2670/java
Can someone help with this?

Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project hive-exec: There are test failures

When I try to build Hive from source with these commands:
$ git clone https://git-wip-us.apache.org/repos/asf/hive.git
$ cd hive
$ mvn clean package -Pdist
I receive these errors after running "mvn clean package -Pdist":
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project hive-exec: There are test failures.
[ERROR]
[ERROR] Please refer to /home/elbasir/hive/ql/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hive-exec
Can anybody help me solve this? I have been searching the internet for a week without success.
Here is the failure I got:
Running org.apache.hadoop.hive.ql.parse.repl.load.message.PrimaryToReplicaResourceFunctionTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.573 sec <<< FAILURE! - in org.apache.hadoop.hive.ql.parse.repl.load.message.PrimaryToReplicaResourceFunctionTest
createDestinationPath(org.apache.hadoop.hive.ql.parse.repl.load.message.PrimaryToReplicaResourceFunctionTest) Time elapsed: 1.919 sec <<< FAILURE!
java.lang.AssertionError:
Expected: is "hdfs://somehost:9000/someBasePath/withADir/replicaDbName/somefunctionname/9223372036854775807/ab.jar"
but: was "hdfs://somehost:9000/someBasePath/withADir/replicadbname/somefunctionname/0/ab.jar"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Assert.java:865)
at org.junit.Assert.assertThat(Assert.java:832)
at org.apache.hadoop.hive.ql.parse.repl.load.message.PrimaryToReplicaResourceFunctionTest.createDestinationPath(PrimaryToReplicaResourceFunctionTest.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:316)
at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:88)
at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:96)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:300)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:131)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.access$100(PowerMockJUnit47RunnerDelegateImpl.java:59)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner$TestExecutorStatement.evaluate(PowerMockJUnit47RunnerDelegateImpl.java:147)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.evaluateStatement(PowerMockJUnit47RunnerDelegateImpl.java:107)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:288)
at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:86)
at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:49)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:208)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:147)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:121)
at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:33)
at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:45)
at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:123)
at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:121)
at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:53)
at org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:59)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Results :
Failed tests:
PrimaryToReplicaResourceFunctionTest.createDestinationPath:100 Expected: is "hdfs://somehost:9000/someBasePath/withADir/replicaDbName/somefunctionname/9223372036854775807/ab.jar"
but: was "hdfs://somehost:9000/someBasePath/withADir/replicadbname/somefunctionname/0/ab.jar"
Tests run: 3305, Failures: 1, Errors: 0, Skipped: 14
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hive ............................................... SUCCESS [ 3.223 s]
[INFO] Hive Shims Common .................................. SUCCESS [ 10.496 s]
[INFO] Hive Shims 0.23 .................................... SUCCESS [ 7.779 s]
[INFO] Hive Shims Scheduler ............................... SUCCESS [ 3.569 s]
[INFO] Hive Shims ......................................... SUCCESS [ 1.963 s]
[INFO] Hive Storage API ................................... SUCCESS [01:32 min]
[INFO] Hive Common ........................................ SUCCESS [01:07 min]
[INFO] Hive Service RPC ................................... SUCCESS [ 5.318 s]
[INFO] Hive Serde ......................................... SUCCESS [03:32 min]
[INFO] Hive Standalone Metastore .......................... SUCCESS [ 37.830 s]
[INFO] Hive Metastore ..................................... SUCCESS [02:29 min]
[INFO] Hive Vector-Code-Gen Utilities ..................... SUCCESS [ 0.298 s]
[INFO] Hive Llap Common ................................... SUCCESS [ 7.068 s]
[INFO] Hive Llap Client ................................... SUCCESS [ 22.578 s]
[INFO] Hive Llap Tez ...................................... SUCCESS [ 16.250 s]
[INFO] Spark Remote Client ................................ SUCCESS [ 26.667 s]
[INFO] Hive Query Language ................................ FAILURE [ 01:09 h]
[INFO] Hive Llap Server ................................... SKIPPED
[INFO] Hive Service ....................................... SKIPPED
[INFO] Hive Accumulo Handler .............................. SKIPPED
[INFO] Hive JDBC .......................................... SKIPPED
[INFO] Hive Beeline ....................................... SKIPPED
[INFO] Hive CLI ........................................... SKIPPED
[INFO] Hive Contrib ....................................... SKIPPED
[INFO] Hive Druid Handler ................................. SKIPPED
[INFO] Hive HBase Handler ................................. SKIPPED
[INFO] Hive JDBC Handler .................................. SKIPPED
[INFO] Hive HCatalog ...................................... SKIPPED
[INFO] Hive HCatalog Core ................................. SKIPPED
[INFO] Hive HCatalog Pig Adapter .......................... SKIPPED
[INFO] Hive HCatalog Server Extensions .................... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client .................. SKIPPED
[INFO] Hive HCatalog Webhcat .............................. SKIPPED
[INFO] Hive HCatalog Streaming ............................ SKIPPED
[INFO] Hive HPL/SQL ....................................... SKIPPED
[INFO] Hive Llap External Client .......................... SKIPPED
[INFO] Hive Shims Aggregator .............................. SKIPPED
[INFO] Hive TestUtils ..................................... SKIPPED
[INFO] Hive Packaging ..................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

Spark S3 I/O - [S3ServiceException] S3 HEAD request failed

I want to save a Spark DataFrame to AWS S3 and read it back. I searched a lot but found little of use.
The code I have written looks like this:
val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "**********")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "********************")
import spark.implicits._
spark.read.textFile("s3n://myBucket/testFile").show(false)
List(1,2,3,4).toDF.write.parquet("s3n://myBucket/test/abc.parquet")
But when I run it, I get the following error:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/myBucket/testFile' - ResponseCode=403, ResponseMessage=Forbidden
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:245)
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:119)
[info] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info] at java.lang.reflect.Method.invoke(Method.java:498)
[info] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
[info] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
[info] at org.apache.hadoop.fs.s3native.$Proxy15.retrieveMetadata(Unknown Source)
[info] at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
[info] ...
[info] Cause: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/myBucket/testFile' - ResponseCode=403, ResponseMessage=Forbidden
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:477)
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718)
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599)
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535)
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987)
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332)
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
[info] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info] ...
[info] Cause: org.jets3t.service.impl.rest.HttpException:
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:475)
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718)
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599)
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535)
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987)
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332)
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
[info] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info] ...
I am using
Spark: 2.1.0
Scala: 2.11.2
AWS Java SDK: 1.11.126
Any help is appreciated!
I tried the following on Spark 2.1.1 and it worked fine for me.
Step 1: Download the following jars:
-- hadoop-aws-2.7.3.jar
-- aws-java-sdk-1.7.4.jar
Note:
If you cannot find these jars, you can get them from hadoop-2.7.3.
Step 2: Place the above jars into $SPARK_HOME/jars/
Step 3: Code:
import org.apache.spark.sql.SparkSession
// toDF needs a SparkSession and its implicits, so build a session instead of a bare SparkContext
val spark = SparkSession.builder().master("local").appName("My App").getOrCreate()
import spark.implicits._
val sc = spark.sparkContext
// hadoopConfiguration is a member of the SparkContext instance
sc.hadoopConfiguration.set("fs.s3a.access.key", "***********")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "******************")
val input = sc.textFile("s3a://mybucket/*.txt")
List(1,2,3,4).toDF.write.parquet("s3a://mybucket/abc.parquet")
Set the secrets in the Spark conf itself, with an option like "spark.hadoop.fs.s3n...", so that Spark propagates them along with the work.
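A minimal sketch of that suggestion, assuming the s3a connector and the key names used in the code above (fs.s3a.access.key and fs.s3a.secret.key). Spark copies any property prefixed with "spark.hadoop." into the Hadoop configuration, so the helper only adds that prefix. The object name S3aOptions is hypothetical.

```scala
object S3aOptions {
  // Builds Spark options for the s3a credentials. Spark strips the
  // "spark.hadoop." prefix and sets the rest on the Hadoop configuration.
  def fromCredentials(accessKey: String, secretKey: String): Map[String, String] =
    Map(
      "spark.hadoop.fs.s3a.access.key" -> accessKey,
      "spark.hadoop.fs.s3a.secret.key" -> secretKey
    )
}
```

With Spark on the classpath, these could be applied when building the session, for example S3aOptions.fromCredentials(key, secret).foldLeft(SparkSession.builder())((b, kv) => b.config(kv._1, kv._2)). Reading the two values from environment variables rather than hard-coding them keeps the secrets out of source control.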

Scala/Spark tests failing with "copyAndReset must return a zero value copy"

After upgrading Spark from 1.6 to 2.1, running my tests produces many occurrences of the following error:
[info] java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy
[info] at scala.Predef$.assert(Predef.scala:170)
[info] at org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:163)
[info] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info] at java.lang.reflect.Method.invoke(Method.java:497)
[info] at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1075)
[info] at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
[info] at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
[info] at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
[info] at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
[info] ...
Any ideas what's happening and how I can fix this?