Exception on samza KafkaSystemFactory.getAdmin - scala

I am running Samza to consume messages off of a given Kafka topic in Scala. In order to run, I created a samza-read.properties file which contains:
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=byte
systems.kafka.consumer.auto.offset.reset=largest
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.producer.bootstrap.servers=localhost:9092
Yet, when I run my program I keep getting the exception:
java.lang.NoClassDefFoundError: kafka/common/ReplicaNotAvailableException
at org.apache.samza.system.kafka.KafkaSystemFactory.getAdmin(KafkaSystemFactory.scala:106)
I believe this has to do with systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory, but maybe someone has run into this exception before. Any help is greatly appreciated!

Looks like you have one of the following problems with your build:
You are missing the Kafka jar (e.g. the org.apache.kafka kafka_2.xx jar) from your classpath
The version of the Kafka jar on your classpath is incompatible with what getAdmin is expecting
You possibly have two versions of the Kafka jar (one correct, one incorrect) and the JVM is picking up the incorrect one; the fix here is to exclude the bad version in your build (see the sketch below)
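A minimal build.sbt sketch of that fix, assuming an sbt build; the Samza and Kafka versions below are assumptions and must match whatever your Samza release was compiled against:
// build.sbt (sketch): declare exactly one known-good Kafka jar next to samza-kafka.
// Versions are assumptions; align them with your Samza release.
libraryDependencies ++= Seq(
  "org.apache.samza" %% "samza-kafka" % "0.10.0",
  // the core Kafka jar is what provides kafka.common.ReplicaNotAvailableException
  "org.apache.kafka" %% "kafka" % "0.8.2.1"
)
Run sbt dependencyTree (via sbt's dependency tree plugin) or mvn dependency:tree for a Maven build to see whether a second Kafka jar is pulled in transitively; if so, add an exclude("org.apache.kafka", ...) on the dependency that drags it in.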

Related

Databricks - java.lang.NoClassDefFoundError: org/json/JSONException

We can't figure out the following issue: we are trying to use Apache Hudi to save data to storage. The problem is that when we upload a fat jar that includes the org.json package in its dependencies, the df.save() call fails with
java.lang.NoClassDefFoundError: org/json/JSONException
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
Even if I go to the cluster libraries and explicitly add this dependency, it still fails on save. On the other hand, when I just create new JSONException("hello") in my notebook, everything seems to work fine. What could cause this behaviour? Thanks
This is probably because the jar is not making its way to the executor nodes; try addJar (https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar-java.lang.String-)
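A minimal sketch, assuming a Databricks notebook where spark is already defined; the jar path and artifact name are placeholders for wherever the org.json jar actually lives:
// Ship the jar that contains org.json.JSONException to the executors at runtime.
// The path below is an assumption; point it at the real location (DBFS, S3, local FS).
spark.sparkContext.addJar("/dbfs/FileStore/jars/json-20090211.jar")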
What version of Hudi are you using? There is a problem with JSON in version 0.6.0 and there is an open issue. I suggest you use version 0.5.2 for now.
It turns out the problem was a classpath mismatch between the metastore service and the Spark process, because they run in separate JVMs. The problem was fixed with an init script that downloads the jar into the classpath folder.

Connection Failed to Kafka with IBM InfoSphere

While trying to read a Kafka topic in an InfoSphere job, I got the error
Kafka_Customer: java.lang.NoClassDefFoundError: org.apache.kafka.clients.consumer.ConsumerRebalanceListener
at com.ibm.is.cc.kafka.runtime.KafkaProcessor.validateConfiguration (KafkaProcessor.java: 145)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.initializeProcessor (CC_JavaAdapter.java: 1008)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.getSerializedData (CC_JavaAdapter.java: 705)
Kafka_Customer: Java runtime exception occurred: java.lang.NoClassDefFoundError: org.apache.kafka.clients.consumer.ConsumerRebalanceListener (com.ibm.is.cc.kafka.runtime.KafkaProcessor::validateConfiguration, file KafkaProcessor.java, line 145)
I should add the jar file, which is missing, but where, and how can I see which version is needed? I couldn't find anything after a lot of googling.
kafka-clients JAR versions are backwards-compatible down to 0.10.2; however, I would assume that the IBM Kafka processors should already ship this class, so you may want to reach out to IBM support.
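If you do need to supply the jar yourself, this is a sketch of the artifact that contains the missing class; the version shown is an assumption, so pick one compatible with your broker and the connector:
// org.apache.kafka.clients.consumer.ConsumerRebalanceListener ships in kafka-clients.
// The version below is an assumption.
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0"
The same coordinates can be used to download the jar from Maven Central and drop it on the connector's classpath.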

ClassNotFoundException while creating Spark Session

I am trying to create a SparkSession in a unit test case using the code below:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.appName("local").master("local").getOrCreate()
but while running the tests, I am getting the below error:
java.lang.ClassNotFoundException: org.apache.hadoop.fs.GlobalStorageStatistics$StorageStatisticsProvider
I have tried adding the dependency, but to no avail. Can someone point out the cause of and solution to this issue?
This can happen for two reasons.
1. You may have incompatible versions of the Spark and Hadoop stacks. For example, HBase 0.9 is incompatible with Spark 2.0; that results in class/method-not-found exceptions.
2. You may have multiple versions of the same library because of dependency hell. You may need to run a dependency tree report to make sure this is not the case; see the build sketch below.
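A minimal build.sbt sketch for a local-mode unit test, assuming an sbt project; the versions are assumptions, and the point is only that the Spark and Hadoop artifacts come from one compatible pairing and appear exactly once on the test classpath:
// build.sbt (sketch): keep Spark and Hadoop versions aligned for local unit tests.
// Versions are assumptions; Spark 2.4.x builds were typically paired with Hadoop 2.7.x.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.8" % Test,
  "org.apache.hadoop" % "hadoop-client" % "2.7.3" % Test
)
Then run sbt dependencyTree (via sbt's dependency tree plugin) and check that only one version of each Hadoop artifact shows up.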

Spark and Prediction IO: NoClassDefFoundError Despite Dependency Existing

Problem:
I am attempting to train a PredictionIO project using Spark 1.6.1 and PredictionIO 0.9.5, but the job fails immediately after the executors begin to work. This happens both in a stand-alone Spark cluster and in a Mesos cluster. In both cases I am deploying to the cluster from a remote client, i.e. I am running pio train -- --master [master on some other server].
Symptoms:
In the driver logs, shortly after the first [Stage 0:> (0 + 0) / 2] message, the executors die due to java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.ProtobufUtil
Investigation:
Found the class-in-question within the pio-assembly jar:
jar -tf pio-assembly-0.9.5.jar | grep ProtobufUtil
org/apache/hadoop/hbase/protobuf/ProtobufUtil$1.class
org/apache/hadoop/hbase/protobuf/ProtobufUtil.class
When submitting, this jar is deployed with the project and can be found within the executors
Adding --jars pio-assembly-0.9.5.jar to pio train does not fix the problem
Creating an uber jar with pio build --clean --uber-jar does not fix the problem
Setting SPARK_CLASSPATH on the slaves to a local copy of pio-assembly-0.9.5.jar does solve the problem
As far as I am aware, SPARK_CLASSPATH is deprecated and should be replaced with --jars when submitting. I'd rather not be dependent on a deprecated feature. Is there something I am missing when calling pio train, or with my infrastructure? Is there a defect (e.g. a race condition) in how the executors fetch the dependencies from the driver?
The problem is that java.lang.NoClassDefFoundError: Could not initialize class doesn't actually mean the dependency is missing; it is a poorly named exception, and the real problem is that the class loader had trouble initializing the class. The actual cause is reported the first time as a java.lang.ExceptionInInitializerError, which is typically thrown from a static initializer block. It is easy to confuse java.lang.NoClassDefFoundError with java.lang.ClassNotFoundException, but the latter is what actually means the dependency is missing (this question and others provide more details).
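A small, self-contained Scala sketch of that behaviour (the names are hypothetical, nothing PredictionIO-specific): the first access fails with ExceptionInInitializerError carrying the real cause, and every later access fails with the misleading NoClassDefFoundError even though the class is on the classpath.
// A Scala object's constructor body compiles into a static initializer, so a
// failure here reproduces the "Could not initialize class" pattern.
object BrokenInit {
  val config: String = sys.error("real cause: static initialization failed")
}
object Demo extends App {
  // First access: java.lang.ExceptionInInitializerError (wraps the real cause)
  try BrokenInit.config catch { case t: Throwable => println(t) }
  // Second access: java.lang.NoClassDefFoundError: Could not initialize class BrokenInit$
  try BrokenInit.config catch { case t: Throwable => println(t) }
}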

I can't start Cassandra Server in Eclipse (Unknown commitlog version 4)

I'm trying to run Cassandra in Eclipse, but I'm getting this exception:
java.lang.IllegalStateException: Unknown commitlog version 4
Exception encountered during startup: Unknown commitlog version 4
at org.apache.cassandra.db.commitlog.CommitLogDescriptor.getMessagingVersion(CommitLogDescriptor.java:81)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:118)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:93)
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:146)
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:126)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:305)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
What am I doing wrong?
Sounds like a version mismatch, possibly from downgrading Cassandra?
What version of Cassandra are you using in Eclipse? Also, did you have another version running and sharing the same commitlogs? It is likely you have commitlogs written by one version of Cassandra being read by another. (That was my experience.)
Adding the solution, as provided by @LyubenTodorov in the comments:
To solve this either change your commitlog_directory or empty your current commitlog dir (default is /var/lib/cassandra/commitlog)