Connect to Cassandra using Spark in Java/Scala

I am using Cassandra 3.2.1 with Spark and have included all the required jars. When I try to connect to Cassandra from Java through Spark, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.augmentString(Ljava/lang/String;)Lscala/collection/immutable/StringOps;
at akka.util.Duration$.<init>(Duration.scala:76)
at akka.util.Duration$.<clinit>(Duration.scala)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:120)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:426)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:103)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:98)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1837)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1828)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
at spark.Sample.run(Sample.java:13)
at spark.Sample.main(Sample.java:23)
Any idea what I am missing here?
The jars and my sample code are shown in the image below; I don't know where I am making a mistake.
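For reference, since the actual code is only visible in the linked screenshot, here is a minimal sketch of the kind of connection code involved, assuming the DataStax spark-cassandra-connector is on the classpath; host, keyspace, and table names are placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object Sample {
  def main(args: Array[String]): Unit = {
    // Point Spark at the Cassandra node (placeholder host).
    val conf = new SparkConf()
      .setAppName("CassandraSample")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // cassandraTable comes from the connector's implicits (the import above).
    val rows = sc.cassandraTable("my_keyspace", "my_table")
    println(rows.count())

    sc.stop()
  }
}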

Related

Spark + Kafka Integration error. NoClassDefFoundError: org/apache/spark/sql/internal/connector/SimpleTableProvider

I am using Kafka 2.5.0 and Spark 3.0.0. I'm trying to import some data from Kafka into Spark. The following code snippet gives me an error:
spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "topic1").load()
The error I get says
java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/connector/SimpleTableProvider
This error is usually caused by a Spark/Kafka dependency conflict.
Check in the Maven repository that the spark-sql-kafka artifact you use matches your Spark version and its supported Scala version.
If the error still occurs, share more details such as the groupId, artifactId, and version of your dependencies.
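As a sketch of what a matching build looks like (assuming sbt; with Maven the coordinates are the same): Spark 3.0.0 is built against Scala 2.12, so the Kafka source artifact must use the same Scala suffix and Spark version.
// build.sbt (sketch): keep the Kafka connector at the same Spark version and Scala suffix.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "3.0.0" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.0.0"
)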

Native snappy library not available

I'm doing a lot of joins on some DataFrames using Spark in Scala. When I try to get the count of the final DataFrame I'm generating, I get the following exception. I'm running the code in spark-shell.
I've tried passing configuration parameters like the following when starting spark-shell, but none of them worked. Is there anything I'm missing here?
--conf "spark.driver.extraLibraryPath=/usr/hdp/2.6.3.0-235/hadoop/lib/native/"
--jars /usr/hdp/current/hadoop-client/lib/snappy-java-1.0.4.1.jar
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
Try updating the Hadoop jars from 2.6.3 to 2.8.0 or 3.0.0. There was a bug in the earlier Hadoop versions: the native snappy library was not available.
After replacing the Hadoop core jar, you should be able to perform snappy compression/decompression.
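If you want to confirm what the running JVM actually sees before swapping jars, a quick check like this in spark-shell can help (a diagnostic sketch, not part of the original answer):
// Check whether libhadoop was loaded and, if so, whether it was built with snappy support.
import org.apache.hadoop.util.NativeCodeLoader
println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded)
if (NativeCodeLoader.isNativeCodeLoaded)
  println("built with snappy:  " + NativeCodeLoader.buildSupportsSnappy())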

spark-submit netty NoSuchMethodError when using jdbc

I have been trying to run my job using spark-submit, and I have a problem when I use JDBC to fetch a DataFrame from PostgreSQL.
First of all, the JDBC driver is inside my job jar, but I had to load the driver like this inside my code:
sparkSession.read.option("driver", "org.postgresql.Driver").jdbc(jdbcdn, query, props)
This works fine and the connection to the database is made; I know this because, if the server is not found, I receive the appropriate exception from the driver.
But if the connection succeeds, I always receive the following exception and the job hangs:
17/05/31 10:56:16 ERROR server.TransportRequestHandler: Error sending result StreamResponse{streamId=/jars/bibi-1.0.0-spark.jar, byteCount=3345077, body=FileSegmentManagedBuffer{file=/srv/jobs/bibi-1.0.0-spark.jar, offset=0, length=3345077}} to /127.0.0.1:50087; closing connection
io.netty.handler.codec.EncoderException: java.lang.NoSuchMethodError: io.netty.channel.DefaultFileRegion.<init>(Ljava/io/File;JJ)V
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:658)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:716)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:651)
at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:658)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:716)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:706)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:741)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:240)
at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194)
at org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:150)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: io.netty.channel.DefaultFileRegion.<init>(Ljava/io/File;JJ)V
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.convertToNetty(FileSegmentManagedBuffer.java:133)
at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33)
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
... 34 more
I tried the following (I am using Gradle):
Excluding the netty dependencies from my project
Including a version of netty in my shadowJar
Relocating the included netty
But nothing I tried had any effect.
What puzzles me is the problem I had with registering the driver: none of the standard ways you can find online work with Spark/Scala/JDBC, and I had to use the code above.
It seems to me that the JDBC call runs in its own environment, and whatever I do in my project's Gradle build has no effect on that environment.
Since the option("driver", "org.postgresql.Driver") setting was hard to find, I wonder if there is something undocumented here and whether I have to find a way to tell the JDBC runtime which netty version to use.
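For context, the full read call looks roughly like this (a sketch with placeholder connection details, not the actual job code):
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("PgRead").getOrCreate()

val props = new Properties()
props.setProperty("user", "dbuser")       // placeholder credentials
props.setProperty("password", "dbpass")

// The driver class is passed explicitly, as described above.
val df = spark.read
  .option("driver", "org.postgresql.Driver")
  .jdbc("jdbc:postgresql://dbhost:5432/mydb", "my_table", props)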
OK, so I continued my search and finally found what was going on.
I installed the Spark master and the Hadoop server myself; since the Hadoop jars were going to be on the same server, I installed the Spark build without Hadoop.
The Hadoop jars were added to the Spark classpath using the "hadoop classpath" command.
The thing is that Hadoop 2.7.3 ships with netty 3.6.2/4.0.23.Final while Spark ships with netty 3.8.0/4.0.42.Final.
Both ended up on the classpath, which caused the problem.
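To see which jar a conflicting class is actually loaded from, a quick check like the following can help (a diagnostic sketch, not part of the original answer):
// Paste into spark-shell: prints the jar that provided the netty class from the stack trace.
println(classOf[io.netty.channel.DefaultFileRegion]
  .getProtectionDomain.getCodeSource.getLocation)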
What I did was copy both netty jars from Spark over every place they appear in Hadoop, essentially upgrading the netty version used by Hadoop.
I haven't seen a problem so far, but I only use a fraction of what Hadoop can do, and issues may arise.
EDIT: Another quick fix is to use the spark-with-hadoop tarball and NOT add the hadoop classpath; that way each uses its own jars without conflicting with the other.
This is actually what I ended up doing, because I had another jar conflict when accessing the Spark UI, and it could not be corrected by copying jars as I did with netty.
The conclusion is: NEVER use the spark-without-hadoop download.

NoSuchMethod exception in Flink when using dataset with custom object array

I have a problem with Flink
java.lang.NoSuchMethodError: org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(Lorg/apache/flink/api/common/typeinfo/TypeInformation;)Lorg/apache/flink/api/java/typeutils/ObjectArrayTypeInfo;
at LowLevel.FlinkImplementation.FlinkImplementation$$anon$6.<init>(FlinkImplementation.scala:28)
at LowLevel.FlinkImplementation.FlinkImplementation.<init>(FlinkImplementation.scala:28)
at IRLogic.GmqlServer.<init>(GmqlServer.scala:15)
at it.polimi.App$.main(App.scala:20)
at it.polimi.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
The line with the problem is this one:
implicit val regionTypeInformation =
  api.scala.createTypeInformation[FlinkDataTypes.FlinkRegionType]
In FlinkRegionType I have an Array of a custom object.
I developed the app with the Maven plugin in the IDE and everything works fine, but when I move to the Flink version I downloaded from the website, I get the error above.
I am using Flink 0.9.
I was thinking that some library might be missing, but I am using Maven to handle everything. Moreover, stepping through the code of ObjectArrayTypeInfo.java, it does not seem to be the problem.
A NoSuchMethodError commonly indicates a version mismatch between the libraries a Flink program was compiled against and the system the program is executed on, especially when the same code works in an IDE setup where the compile-time and runtime libraries are the same.
In that case, you should check the versions of the Flink dependencies, for example in the Maven POM file.
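As a sketch of what that means in practice (assuming an sbt build and a cluster running Flink 0.9.1; adjust to whatever version is installed), the point is that the program's Flink dependencies must match the Flink distribution it runs on:
// build.sbt (sketch): pin the Flink modules to the version installed on the cluster.
libraryDependencies ++= Seq(
  "org.apache.flink" % "flink-java"    % "0.9.1",
  "org.apache.flink" % "flink-scala"   % "0.9.1",
  "org.apache.flink" % "flink-clients" % "0.9.1"
)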

Cassandra client Hector API: java.lang.NoClassDefFoundError: org/apache/log4j/Level

I am getting java.lang.NoClassDefFoundError: org/apache/log4j/Level.
I'm not a Java guy, but I can read code. What should I do to get rid of this exception?
This exception really has nothing to do with the Cassandra Hector API, so why is it bothering me?
Thanks!
You are missing the log4j jar on your classpath. Specify the classpath with the -cp option on the java command when you run your app.
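For example (a sketch with hypothetical jar names and main class), the log4j jar just needs to appear on the classpath when you launch:
java -cp myapp.jar:lib/log4j-1.2.17.jar:lib/hector-core.jar com.example.MyApp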