Regarding the Apache Kafka messaging queue.
I have downloaded Apache Kafka from the Kafka download page. I've extracted it to /opt/apache/installed/kafka-0.7.0-incubating-src.
The quickstart page says you need to start ZooKeeper and then start Kafka by running:
> bin/kafka-server-start.sh config/server.properties
I'm using a separate ZooKeeper server, so I edited config/server.properties to point to that ZooKeeper instance.
When I run Kafka, as instructed on the quickstart page, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/Kafka
Caused by: java.lang.ClassNotFoundException: kafka.Kafka
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: kafka.Kafka. Program will exit.
I used telnet to make sure the ZooKeeper instance is accessible from the machine that Kafka runs on. Everything is OK.
Why am I getting this error?
You must first build Kafka by running the following commands:
> ./sbt update
> ./sbt package
Only then will Kafka be ready for use.
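A minimal end-to-end sketch, assuming the 0.7.0 source layout and using the path from the original question:
cd /opt/apache/installed/kafka-0.7.0-incubating-src
./sbt update        # fetch dependencies
./sbt package       # compile and build the Kafka jars
bin/kafka-server-start.sh config/server.properties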
You should know that
./sbt update
./sbt package
will produce Kafka binaries for Scala 2.8.0 by default. If you need them for a different version, you need to run
./sbt "++2.9.2 update"
./sbt "++2.9.2 package"
replacing 2.9.2 with the desired version number. This will produce the appropriate binaries. In general, when you switch versions, you should run
./sbt clean
to clean up the binaries from previous versions.
Actually, in addition, you might also need to run this command:
./sbt "++2.9.2 assembly-package-dependency"
This command resolves all the dependencies needed to run Kafka and creates a jar that contains just those. The start scripts then add this jar to the classpath, and you should have all the classes you need.
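Putting the pieces together, a hedged rebuild sequence when targeting a specific Scala version might look like this (2.9.2 is only an example):
./sbt clean                                    # drop binaries from previous versions
./sbt "++2.9.2 update"
./sbt "++2.9.2 package"
./sbt "++2.9.2 assembly-package-dependency"    # bundle the runtime dependencies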
It seems that without the SCALA_VERSION environment variable, the startup script doesn't know how to locate the necessary libraries. Try the following from the Kafka installation directory:
SCALA_VERSION=2.9.3 bin/kafka-server-start.sh config/server.properties
See http://kafka.apache.org/documentation.html#quickstart.
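If you are unsure which Scala version your build actually produced, one way to check (the core/target layout is an assumption based on typical Kafka source builds):
ls core/target          # e.g. a scala-2.9.3 directory means SCALA_VERSION=2.9.3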
Just to add to the previous answer: if you're running IntelliJ and want to run Kafka inside IntelliJ and/or step through it, make sure to run
> ./sbt idea
I easily spent half a day trying to create the IntelliJ project from scratch, and it turns out that single command was all I needed to get it working. Also, make sure you have the Scala plugin for IntelliJ installed.
You can also use the binary downloads provided by Apache.
For example, download Kafka version 0.9.0.1 from this link.
For other versions, download from link2 and choose the binary packages instead. These are already-built versions; there is no need to build them again using Scala.
You, however, are using the source download.
You've downloaded the source version. Download the binary package of Kafka and proceed with your testing.
You can find the following two options on the Kafka downloads page (https://kafka.apache.org/downloads.html):
Source download
Binary downloads
You have downloaded "kafka-0.7.0-incubating-src", which is the source code. Download the binary package of Kafka instead, for example:
Scala 2.10 - kafka_2.10-0.10.1.1.tgz (asc, md5)
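A hedged fetch-and-run sequence for that binary package (the archive URL is an assumption; pick a mirror from the downloads page):
wget https://archive.apache.org/dist/kafka/0.10.1.1/kafka_2.10-0.10.1.1.tgz
tar -xzf kafka_2.10-0.10.1.1.tgz
cd kafka_2.10-0.10.1.1
bin/kafka-server-start.sh config/server.properties   # no build step needed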
Related
We can't figure out the following issue: we are trying to use Apache Hudi to save data to storage. The problem is that when we upload a fat jar which includes the org.json package in its dependencies, the df.save() call fails with
java.lang.NoClassDefFoundError: org/json/JSONException
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
Even if I go to the cluster libraries and explicitly add this dependency, it still fails on save. On the other hand, when I just create a new JSONException("hello") in my notebook, everything seems to work fine. What could cause this behaviour? Thanks
This is probably because the jar is not making its way to the executor nodes. Try addJar (https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar-java.lang.String-).
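A hedged sketch of both routes (the paths and the main class are placeholders, not from the original post):
# ship the org.json jar to every executor at submit time:
spark-submit --class com.example.HudiJob \
  --jars /path/to/json-20180813.jar \
  my-assembly.jar
# or from the driver at runtime (Scala): sc.addJar("/path/to/json-20180813.jar")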
What version of Hudi are you using? There is a problem with JSON in version 0.6.0 and there is an open issue. I suggest you use version 0.5.2 for now.
Turns out the problem was a classpath difference between the metastore service and the Spark process, because they run in separate JVMs. It was fixed with an init script that downloads the jar to the classpath folder, roughly as sketched below.
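A hedged sketch of such an init script, assuming a Databricks-style cluster where jars dropped into /databricks/jars end up on every JVM's classpath (the coordinates and target directory are assumptions):
#!/bin/bash
# fetch org.json and place it where both the metastore service and the
# Spark process pick it up
wget -q -O /databricks/jars/json-20180813.jar \
  https://repo1.maven.org/maven2/org/json/json/20180813/json-20180813.jar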
I am not able to get kafka-connect-jdbc working with Kafka 0.10.1 on an HDInsight cluster. Here are the steps I have gone through so far:
I cloned the repo and ran mvn install; after struggling with the dependencies (not SNAPSHOTs), I got the jar.
I moved it to ./libs and was able to see io.confluent.connect.jdbc.JdbcSourceConnector & io.confluent.connect.jdbc.JdbcSinkConnector when hitting GET /connector-plugins.
I created a source connector to Azure SQL Server and a topic.
I am getting the following error once the source connector is created:
[2018-02-14 15:46:16,960] ERROR Error while starting connector azure-source-connector-test (org.apache.kafka.connect.runtime.WorkerConnector:108)
java.lang.NoSuchFieldError: SYSTEM
at io.confluent.connect.jdbc.source.JdbcSourceConnectorConfig.(JdbcSourceConnectorConfig.java:184)
at io.confluent.connect.jdbc.JdbcSourceConnector.start(JdbcSourceConnector.java:69)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:100)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:125)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:182)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:165)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:773)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startWork(DistributedHerder.java:747)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.handleRebalanceCompleted(DistributedHerder.java:708)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:204)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:174)
at java.lang.Thread.run(Thread.java:748)
I tried different versions/branches of kafka-connect-jdbc but all trials end with the same error.
This question has the same issue, so I tried to force the build to use connect-api-0.10.1.2.6.2.3-1.jar via systemPath in pom.xml, but even though the build goes through, it still has the same issue.
Any idea?
Please note that I am a better data miner than programmer.
I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from https://github.com/sryza/aas), and I run into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD".
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
First, when I tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product", but I solved it by setting scala-library to compile scope in Maven.
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7 and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15, and different versions of Java (1.7), Scala (2.10) and Spark, but with no success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book show a --master argument to spark-shell, but you will need to specify arguments appropriate for your environment. If you don't have Hadoop installed, you need to start spark-shell locally. To execute the samples you can simply pass paths with a local file reference (file:///) rather than an HDFS reference (hdfs://).
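For example, a local run might look like this (the data path is a placeholder):
spark-shell --master local[*]
# then, inside the REPL:
#   val rawData = sc.textFile("file:///home/me/linkage/block_1.csv")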
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is treated as a compiled library rather than a standalone application. You can make the compiled JAR available to spark-shell by passing it to the --jars property, while Maven is used for compiling and managing dependencies.
In the book the author describes how the simplesparkproject can be executed:
use maven to compile and package the project
cd simplesparkproject/
mvn package
start the spark-shell with the jar dependencies
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within the spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are provided by the Spark runtime environment; therefore these dependencies are marked with scope provided in the pom.xml.
<!--<scope>provided</scope>-->
You can remove the provided scope (for example by commenting it out, as shown above); then you will be able to run the samples within IDEA. But note that you can no longer provide this jar as a dependency for the spark-shell.
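Alternatively, if you prefer to keep the provided scope, a hedged option is to package the project and let spark-submit supply the Spark classes (the jar path under target/ and whether MyApp defines a main method are assumptions):
mvn package
spark-submit --master local[2] --class com.cloudera.datascience.MyApp \
  target/simplesparkproject-0.0.1.jar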
Note: I am using Maven 3.0.5 and Java 7+. I had problems with the plugin versions under Maven 3.3.X.
I would like to run a Samza application (one that uses the RocksDB KV store) from SBT. When I do ./sbt "run " I receive the following error:
java.lang.ExceptionInInitializerError
(snip)
Caused by: java.lang.RuntimeException: librocksdbjni-linux64.so was not found inside JAR.
(snip)
I assume that since I run it with ./sbt run, sbt runs the classes directly, without assembling a JAR.
The dependencies are set correctly, and I've got the librocksdbjni-linux64.so inside RocksDB JAR.
Do I have to create an assembly before running?
How can I test in this case without creating an assembly?
Well, librocksdbjni-linux64.so sounds like a native library, and those usually require a little extra fiddling in order to be recognized and loaded, even if they are on the path. Check this question.
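One hedged thing to try with sbt specifically: fork a separate JVM for run, so the native library is extracted and loaded from a regular classpath entry instead of sbt's internal class loader (whether this fixes your exact setup is an assumption):
./sbt 'set fork := true' run
# or, permanently, add to build.sbt:  fork := true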
The instructions for running Kafka from the quickstart page are not working for me.
http://kafka.apache.org/07/quickstart.html
Kafka builds fine
05:55:01/kafka-0.8.1-src:58 $sbt package
[info] Set current project to kafka-0-8-1-src (in build file:/shared/kafka-0.8.1-src/)
[info] Packaging /shared/kafka-0.8.1-src/target/scala-2.10/kafka-0-8-1-src_2.10-0.1-SNAPSHOT.jar ...
[info] Done packaging.
[success] Total time: 0 s, completed Apr 17, 2014 5:55:07 AM
But it does not run:
05:55:07/kafka-0.8.1-src:59 $bin/zookeeper-server-start.sh config/zookeeper.properties
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/zookeeper/server/quorum/QuorumPeerMain
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.quorum.QuorumPeerMain
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Similar errors occur for kafka-server-start.sh and all the other scripts inside bin
You downloaded kafka-0.8.1-src.tgz from the download page, but the instructions on the quickstart link are meant for the binary download. Download one from the Binary downloads section of the http://kafka.apache.org/downloads.html page and try again; it should work. Alternatively, if you want to build from the src.tgz package you downloaded, run ./gradlew jar, which will download all the required dependencies; a sketch follows.
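A hedged sketch of the source-build path (the bootstrap step follows the 0.8.1 README as far as I recall; verify against the README in your download):
cd kafka-0.8.1-src
gradle              # one-time bootstrap of the wrapper
./gradlew jar       # downloads dependencies and builds the Kafka jars
bin/zookeeper-server-start.sh config/zookeeper.properties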
I was facing the same problem on Windows 10. What I did is:
Don't download/install ZooKeeper separately; download only kafka_2.12-1.1.0 (or a higher version)
Create a temp folder (like this: E:\DevApplications\kafka\temp)
Open zookeeper.properties (I have it at E:\DevApplications\kafka\kafka_2.12-1.1.0\config)
Update dataDir (for me: dataDir=E:/DevApplications/kafka/temp); note the forward slashes
Open CMD and run zookeeper-server-start.bat with zookeeper.properties as the second parameter, like:
.\zookeeper-server-start.bat ..\..\config\zookeeper.properties
Once ZooKeeper is started, start the Kafka server by typing
.\kafka-server-start.bat ..\..\config\server.properties
Hope this helps.
To add to Chandra Kant's solution: if you have a proxy connection in your network, use the command below
./gradlew -Dhttp.proxyHost=<PROXY-HOST> -Dhttp.proxyPort=<PROXY-PORT> jar
Thanks @Chandra Kant, it helped me a lot.
You can also hit this exception if you try starting Kafka 0.9.0.0 with a Java version lower than 1.7. Set your $JAVA_HOME to 1.7 or above and make sure JAVA_HOME/bin is on your PATH.
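A quick hedged check on a typical Linux box (the JDK path is only an example):
java -version                                        # must report 1.7 or above
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
bin/kafka-server-start.sh config/server.properties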