java.io.InvalidClassException: org.apache.flink.api.common.operators.ResourceSpec; incompatible types for field cpuCores - apache-beam

I am using Beam version 1.24 with a Flink session cluster on version 1.11 and beam-runners-flink-1.9. When I run the job with the remote FlinkRunner in streaming mode, I get the following error. Any insights will be appreciated. I couldn't include the full stack trace because Stack Overflow wouldn't let me post it.
Thanks,
Rao.
Caused by: java.io.InvalidClassException: org.apache.flink.api.common.operators.ResourceSpec; incompatible types for field cpuCores

This was caused by a Flink/Beam version incompatibility; using the Maven runner artifact that matches the Flink version of the cluster solved the issue. Here is the Maven snippet:
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-flink-1.11</artifactId>
    <version>2.25.0</version>
</dependency>

Related

Connection failed to Kafka with IBM InfoSphere

While trying to read a Kafka topic in an InfoSphere job, I got the error
Kafka_Customer: java.lang.NoClassDefFoundError: org.apache.kafka.clients.consumer.ConsumerRebalanceListener
at com.ibm.is.cc.kafka.runtime.KafkaProcessor.validateConfiguration (KafkaProcessor.java: 145)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.initializeProcessor (CC_JavaAdapter.java: 1008)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.getSerializedData (CC_JavaAdapter.java: 705)
Kafka_Customer: Java runtime exception occurred: java.lang.NoClassDefFoundError: org.apache.kafka.clients.consumer.ConsumerRebalanceListener (com.ibm.is.cc.kafka.runtime.KafkaProcessor::validateConfiguration, file KafkaProcessor.java, line 145)
I should add the missing JAR file, but where, and how can I see which version is needed? I couldn't find anything after a lot of googling.
kafka-clients JAR versions are backwards compatible down to 0.10.2; however, I would assume that the Kafka processors should already ship with this class, so you may want to reach out to IBM support.
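The missing ConsumerRebalanceListener class ships in the kafka-clients artifact. If you do need to pull the JAR yourself to place it on the connector's classpath, a Maven dependency along these lines would fetch it; this is a minimal sketch, and the 0.10.2.x version shown is an assumption that should be aligned with your Kafka installation:
<!-- kafka-clients provides org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
     the version below is only an example, match it to your Kafka installation -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.2.1</version>
</dependency>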

Error through remote Spark Job: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

Problem
I am trying to run a remote Spark job through IntelliJ with a Spark HDInsight cluster (HDI 4.0). In my Spark application I am trying to read an input stream from a folder of Parquet files in Azure Blob Storage using Spark's built-in Structured Streaming readStream function.
The code works as expected when I run it on a Zeppelin notebook attached to the HDInsight cluster. However, when I deploy my Spark application to the cluster, I encounter the following error:
java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
Subsequently, I am unable to read any data from blob storage.
The little information I found online suggested that this is caused by a version conflict between Spark and Hadoop. The application is run with Spark 2.4 prebuilt for Hadoop 2.7.
Fix
To fix this, I SSH into each head and worker node of the cluster and manually downgrade the Hadoop dependencies from 3.1.x to 2.7.3 to match the version in my local spark/jars folder. After doing this, I am able to deploy my application successfully. Downgrading the cluster from HDI 4.0 is not an option, as it is the only cluster that can support Spark 2.4.
Summary
To summarize, could the issue be that I am using a Spark download prebuilt for Hadoop 2.7? Is there a better way to fix this conflict instead of manually downgrading the Hadoop versions on the cluster's nodes or changing the Spark version I am using?
After troubleshooting some previous methods I had attempted before, I've come across the following fix:
In my pom.xml I excluded the hadoop-client dependency transitively imported by the spark-core jar. That dependency was version 2.6.5, which conflicted with the cluster's version of Hadoop. Instead, I import the version I require:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
After making this change, I encountered the error java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0. Further research revealed this was due to a problem with the Hadoop configuration on my local machine. Per this article's advice, I modified the winutils.exe version I had under C://winutils/bin to be the version I required and also added the corresponding hadoop.dll. After making these changes, I was able to successfully read data from blob storage as expected.
TLDR
The issue was the auto-imported hadoop-client dependency, which was fixed by excluding it and adding the correct winutils.exe and hadoop.dll under C://winutils/bin.
This no longer required downgrading the Hadoop versions within the HDInsight cluster or changing my downloaded Spark version.
Problem:
I was facing the same issue while running a fat JAR built with Hadoop 2.7 and Spark 2.4 on a cluster with Hadoop 3.x.
I was using the Maven Shade plugin.
Observation:
While building the fat JAR, it included the jar org.apache.hadoop:hadoop-hdfs:jar:2.6.5, which contains the class org.apache.hadoop.hdfs.web.HftpFileSystem.
This was causing the problem on Hadoop 3.
Solution:
I excluded this jar while building the fat JAR, as in the sketch below, and the issue got resolved.
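The original answer does not include the build snippet. With the maven-shade-plugin, an artifact can be kept out of the fat JAR via artifactSet excludes; this is a minimal sketch, where the plugin version and the rest of the shade configuration are assumptions to be adapted to your build:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <artifactSet>
                    <!-- keep the conflicting HDFS client out of the shaded jar -->
                    <excludes>
                        <exclude>org.apache.hadoop:hadoop-hdfs</exclude>
                    </excludes>
                </artifactSet>
            </configuration>
        </execution>
    </executions>
</plugin>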

Spark + Kafka Integration error. NoClassDefFoundError: org/apache/spark/sql/internal/connector/SimpleTableProvider

I am using Kafka 2.5.0 and Spark 3.0.0. I'm trying to import some data from Kafka into Spark. The following code snippet gives me an error:
spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "topic1").load()
The error I get says
java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/connector/SimpleTableProvider
This error is mainly due to a Spark/Kafka dependency conflict.
Check in the Maven repository that the Kafka connector artifact you use matches your Spark and Scala versions.
If the error still occurs, please share more details such as the groupId, artifactId, and version you are using.
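For Spark 3.0.0 built against Scala 2.12, the Structured Streaming Kafka source is normally pulled in with a dependency along these lines; the _2.12 suffix here is an assumption, so match it to the Scala version of your Spark build:
<!-- Kafka source for Structured Streaming; Scala suffix and version must match your Spark build -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
    <version>3.0.0</version>
</dependency>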

Can't find class in uber jar

I am on Hortonworks Distribution 2.4 (effectively Hadoop 2.7.1 and Spark 1.6.1).
I am packaging my own version of Spark (2.1.0) in the uber JAR while the cluster is on 1.6.1. In the process, I am sending all required libraries through a fat JAR (built using Maven, the uber-jar approach).
However, spark-submit (through the Spark 2.1.0 client) fails with a NoClassDefFoundError on the Jersey client. Upon listing my uber JAR contents, I can see the exact class file in the jar, yet Spark/YARN can't find it.
Here goes.
The error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
And here is my attempt to find the class in the jar file:
jar -tf uber-xxxxx-something.jar | grep jersey | grep ClientCon
com/sun/jersey/api/client/ComponentsClientConfig.class
com/sun/jersey/api/client/config/ClientConfig.class
... Other files
What could be going on here? Any suggestions or ideas, please.
EDIT
The Jersey client section of the pom goes here:
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.19.3</version>
</dependency>
EDIT
I also want to point out that my code is compiled with Scala 2.12 with the compatibility level set to 2.11, while the cluster is perhaps on 2.10. I say perhaps because I believe cluster nodes don't necessarily need Scala binaries installed; YARN just launches the components' jar/class files without using Scala binaries. I wonder if that's playing a role here.

Error upgrading to Mongo java driver 3.2.2

We have migrated to MongoDB 3.2.6. What would be the compatible jar versions for the dependencies below?
mongo java driver version (org.mongodb)
spring data mongo version (org.springframework.data)
spring data commons version (spring-data-commons)
I have tried upgrading to 3.2.2 for the Java driver and 1.9.4.RELEASE for Spring Data and Spring Data Commons, but I'm facing Maven compatibility issues. Below are two issues I'm unable to resolve as of now.
Kindly suggest what could be the problem.
Issue 1:
The type org.springframework.data.repository.query.QueryByExampleExecutor cannot be resolved. It is indirectly referenced from required .class files
Issue 2:
Error occured processing XML 'Invalid default: public abstract java.lang.Class
org.springframework.data.mongodb.repository.config.EnableMongoRepositories.repositoryBaseClass()'. See Error Log for more details
I tried mvn clean dependency:tree and it succeeds, but mvn clean compile fails with the Issue 1 error mentioned above.
Answer:
I was able to resolve both issues by upgrading spring-data-commons to 1.12.1; this resolves the compile-time issues mentioned above.
Below are my current settings:
mongo-java-driver 3.2.2, spring-data-mongodb 1.9.4.RELEASE, spring-data-commons 1.12.1.RELEASE.
As per the Spring Data Commons documentation, I also upgraded my Spring Framework version to 4.2.8.RELEASE.
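Expressed as Maven coordinates, that working combination would look roughly like the following; the artifact IDs are the standard ones for these projects and should be double-checked against your own pom:
<!-- versions taken from the settings described above -->
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.2.2</version>
</dependency>
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-mongodb</artifactId>
    <version>1.9.4.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-commons</artifactId>
    <version>1.12.1.RELEASE</version>
</dependency>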
I am now facing the issue below, which I couldn't resolve. Any ideas will be appreciated.
Issue1:
06:37:33.943 [localhost-startStop-1] DEBUG o.s.c.t.c.AnnotationAttributesReadingVisitor - Failed to class-load type while reading annotation metadata. This is a non-fatal error, but certain annotation metadata may be unavailable.
java.lang.ClassNotFoundException: javax.annotation.Nullable
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1305) ~[catalina.jar:na]
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1139) ~[catalina.jar:na]
at org.springframework.core.type.classreading.RecursiveAnnotationAttributesVisitor.visitEnd(RecursiveAnnotationAttributesVisitor.java:47) ~[spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.asm.ClassReader.readAnnotationValues(ClassReader.java:1802) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.asm.ClassReader.readMethod(ClassReader.java:976) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.asm.ClassReader.accept(ClassReader.java:695) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.asm.ClassReader.accept(ClassReader.java:508) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.core.type.classreading.SimpleMetadataReader.<init>(SimpleMetadataReader.java:64) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.core.type.classreading.SimpleMetadataReaderFactory.getMetadataReader(SimpleMetadataReaderFactory.java:98) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.core.type.classreading.CachingMetadataReaderFactory.getMetadataReader(CachingMetadataReaderFactory.java:102) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.core.type.classreading.SimpleMetadataReaderFactory.getMetadataReader(SimpleMetadataReaderFactory.java:93) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.core.type.filter.AbstractTypeHierarchyTraversingFilter.match(AbstractTypeHierarchyTraversingFilter.java:121) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.core.type.filter.AbstractTypeHierarchyTraversingFilter.match(AbstractTypeHierarchyTraversingFilter.java:105) [spring-core-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.data.repository.config.RepositoryConfigurationDelegate$LenientAssignableTypeFilter.match(RepositoryConfigurationDelegate.java:202) [spring-data-commons-1.12.1.RELEASE.jar:na]
at org.springframework.context.annotation.ClassPathScanningCandidateComponentProvider.isCandidateComponent(ClassPathScanningCandidateComponentProvider.java:346) [spring-context-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.context.annotation.ClassPathScanningCandidateComponentProvider.findCandidateComponents(ClassPathScanningCandidateComponentProvider.java:280) [spring-context-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.data.repository.config.RepositoryConfigurationDelegate.multipleStoresDetected(RepositoryConfigurationDelegate.java:167) [spring-data-commons-1.12.1.RELEASE.jar:na]
at org.springframework.data.repository.config.RepositoryConfigurationDelegate.<init>(RepositoryConfigurationDelegate.java:88) [spring-data-commons-1.12.1.RELEASE.jar:na]
at org.springframework.data.repository.config.RepositoryBeanDefinitionRegistrarSupport.registerBeanDefinitions(RepositoryBeanDefinitionRegistrarSupport.java:80) [spring-data-commons-1.12.1.RELEASE.jar:na]
I tried adding the jsr305 dependency from com.google.code.findbugs, but I'm still seeing the same exceptions.
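For context, javax.annotation.Nullable is provided by the findbugs jsr305 artifact, so the attempted dependency would look something like this; the 3.0.2 version here is an assumption:
<!-- provides javax.annotation.Nullable; version is only an example -->
<dependency>
    <groupId>com.google.code.findbugs</groupId>
    <artifactId>jsr305</artifactId>
    <version>3.0.2</version>
</dependency>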