mongo kafka connect source - mongodb

I am using kafka connect in order to read data from mongo and write them to kafka topic.
I am using the mongo kafka source connector.
I am getting the following error :
ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:115)
java.lang.NoClassDefFoundError: com/mongodb/ConnectionString
at com.mongodb.kafka.connect.source.MongoSourceConfig.createConfigDef(MongoSourceConfig.java:209)
at com.mongodb.kafka.connect.source.MongoSourceConfig.<clinit>(MongoSourceConfig.java:138)
at com.mongodb.kafka.connect.MongoSourceConnector.config(MongoSourceConnector.java:56)
at org.apache.kafka.connect.connector.Connector.validate(Connector.java:129)
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:282)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:188)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:109)
Caused by: java.lang.ClassNotFoundException: com.mongodb.ConnectionString
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 7 more
It seems that there is amising clas in the jar. In order to get the jar I used two different methods but I am gettng the same error. First I used the download fro: the maven repository and then I clone the source code from the github repo and I build the jar by myself. I pushed the jar to the plugins.path.
When I unzip the generated jar and go through the calsses I can't find the mentioned class: com.mongodb.ConnectionString
I used the following config files
worker.properties :
rest.port=18083
# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Note: symlinks will be followed to discover dependencies or plugins.
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java/plugins
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
bootstrap.servers=127.0.0.1:9092
mongo-connector.properties:
name=mongo
tasks.max=1
connector.class =com.mongodb.kafka.connect.MongoSourceConnector
database=
collection=alerts
key.converter = org.apache.kafka.connect.storage.StringConverter
value.converter = org.apache.kafka.connect.storage.StringConverter
topic.prefix=someprefix
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
pipeline=[]
batch.size=0
change.stream.full.document=updateLookup
then I launched the connector by the following command :
/usr/local/kafka/bin/connect-standalone.sh worker.properties mongo-connector.properties
Any idea how to fix this

You have to place the connector's JAR file under plugin.path which in your case is /usr/share/java/plugins.
The instructions are already present in Confluent's documentation:
A Kafka Connect plugin is:
an uber JAR containing all of the classfiles for the plugin and its
third-party dependencies in a single JAR file; or a directory on the
file system that contains the JAR files for the plugin and its
third-party dependencies. However, a plugin should never contain any
libraries that are provided by Kafka Connect’s runtime.
Kafka Connect finds the plugins using its plugin path, which is a
comma-separated list of directories defined in the Kafka Connect’s
worker configuration. To install a plugin, place the plugin directory
or uber JAR (or a symbolic link that resolves to one of those) in a
directory listed on the plugin path, or update the plugin path to
include the absolute path of the directory containing the plugin.

Im creating this answer as I tooksome time to find out the solution, as pointed out by scalacode, the easiest solution is to download the jar from confluent, not from maven.
https://www.confluent.io/hub/mongodb/kafka-connect-mongodb

Related

how to add classpath to oozie java action with sharedlib and external jars

I have a java application which needs the library of hadoop, hdfs, hive and spark, also some external libraries,
I've read this page but I'm still confused about the order of overriding sharedlib,
in the job configure, I have
oozie.use.system.libpath=false
oozie.action.sharelib.for.java=spark,hive2,hive
I also put the external jars under the /lib of the workspace directory.
Now I got this problem, in my jar I used class from json4s-native, so I put them in the myworkspace/lib path, but under the oozie/share/lib/spark,also has library of json4s-jackson, after a run the java action, throw an error of
Launcher exception: java.lang.NoClassDefFoundError: org/json4s/native/JsonMethods$
How can I get oozie use the library in my /lib path first?

How to add external jar files to a spark scala project

I am trying to use an LSH implementation of Scala(https://github.com/marufaytekin/lsh-spark) in my Spark project.I cloned the repository with some changes to the sbt file (added Organisation)
To use this implementation , I compiled it using sbt compile and moved the jar file to the "lib" folder of my project and updated the sbt configuration file of my project , which looks like this ,
Now when I try to compile my project using sbt compile , It fails to load the external jar file ,showing the error message "unresolved dependency: com.lendap.spark.lsh.LSH#lsh-scala_2.10;0.0.1-SNAPSHOT: not found".
Am i following the right steps for adding an external jar file ?
How do i solve the dependency issue
As an alternative, you can build the lsh-spark project and add the jar in your spark application.
To add the external jars, addJar option can be used while executing spark application. Refer Running spark application on yarn
This issue isn't related to spark but to sbt configuration.
Make sure you followed the correct folder structure imposed by sbt and added your jar in the lib folder, as explained here - lib folder should be at the same level as build.sbt (cf. this post).
You might also want to check out this SO post.

Akka and Typesafe config versions issue

I tried to use akka 2.1.0 on a Tomcat server. But I got an error asking me to put the config library on the classpath too.
Well that's not the issue. I put the config library of Typesafe, version 1.0.0 (the latest) in the lib folder. However, I always get the error
8d31597e-1b6e-4be5-9773-4fb7e0591312akka.ConfigurationException: Akka JAR version [2.1.0] does not match the provided config version [2.0]
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:172)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:465)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:93)
The jar files in the lib folder are :
akka-actor_2.10-2.1.0.jar
config-1.0.0.jar
scala-library-2.10.0.jar
Where does this problem come from ?
It loads a configuration file containing akka.version=2.0 but is expecting 2.1.0.
You might have mistakenly defined akka.version in your application.conf. Remove that setting. Otherwise you have a akka-actor 2.0 jar file in your classspath.

kafka NoClassDefFoundError kafka/Kafka

Regarding Apache-Kafka messaging queue.
I have downloaded Apache Kafka from the Kafka download page. I've extracted it to /opt/apache/installed/kafka-0.7.0-incubating-src.
The quickstart page says you need to start zookeeper and then start Kafka by running: >bin/kafka-server-start.sh config/server.properties
I'm using a separate Zookeeper server, so i edited config/server.properties to point to that Zookeeper instance.
When i run Kafka, as instructed in the quickstart page, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/Kafka
Caused by: java.lang.ClassNotFoundException: kafka.Kafka
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: kafka.Kafka. Program will exit.
I used telnet to make sure the Zookeeper instance is accessible from the machine that Kafka runs on. Everything is OK.
Why am i getting this error?
You must first build Kafka by running the following commands:
> ./sbt update
> ./sbt package
Only then will Kafka be ready for use.
You should know that
./sbt update
./sbt package
will produce Kafka binaries for Scala 2.8.0 by default. If you need it for a different version, you need to do
./sbt "++2.9.2 update"
./sbt "++2.9.2 package"
replacing 2.9.2 with the desired version number. This will make the appropriate binaries. In general, when you switch versions, you should run
./sbt clean
to clean up the binaries from previous versions.
Actually, in addition, you might also need to perform this command
./sbt "++2.9.2 assembly-package-dependency"
This command resolves all the dependencies for running Kafka, and creates a jar that contains just these. Then the start scripts would add this to the class path and you should have all your desired classes.
It seems that without the SCALA_VERSION environment variable, the executable doesn't know how to load the libraries necessary. Try the following from the Kafka installation directory:
SCALA_VERSION=2.9.3 bin/kafka-server-start.sh config/server.properties
See http://kafka.apache.org/documentation.html#quickstart.
Just to add to the previous answer, if you're running IntelliJ, and want to run Kafka inside IntelliJ and/or step through it, make sure to run
> ./sbt idea
I spent easily half a day trying to create the IntelliJ project from scratch, and it turns out that single command was all I needed to get it working. Also, make sure you have the Scala plugin for IntelliJ installed.
You can also use the binary downloads provided by Apache.
For example download kafka version - 0.9.0.1 from this link.
For other version download from link2 and download the binary versions instead. These are already built version. Not need to build again using Scala.
Use are using the source download instead.
You've downloaded the source version. Download the binary package of Kafka and proceed with your testing.
You can find following two options on Kafka downloads page
https://kafka.apache.org/downloads.html
Source download:
Binary downloads
You have downloaded "kafka-0.7.0-incubating-src" it' source code
Download the binary package of Kafka
Scala 2.10 - kafka_2.10-0.10.1.1.tgz (asc, md5)

Jar works with standalone Hadoop, but not on the actual cluster (java.lang.ClassNotFoundException: org.jfree.data.xy.XYDataset)

I am trying to build my project using Eclipse on Windows and execute on a Linux cluster. The project depends on some external jars, which I enclosed using eclipse's "Export->Runnable JAR -> Package required library into jar" build option. I checked the jar contains the classes within a folder structure, and the external jars are in the root folder.
On Hadoop standalone, Cygwin and Linux, this works fine but on an actual Hadoop Linux cluster it fails, when it tries to access a class from the first external jar, throwing up a ClassNotFoundException.
Is there a way to force Hadoop to search the jar, I thought this would work.
10/07/16 11:44:59 INFO mapred.JobClient: Task Id : attempt_201007161003_0005_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.jfree.data.xy.XYDataset
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at org.akintayo.analysis.ecg.preprocess.ReadPlotECG.plotECG(ReadPlotECG.java:27)
at org.akintayo.analysis.ecg.preprocess.BuildECGImages.writeECGImages(BuildECGImages.java:216)
at org.akintayo.analysis.ecg.preprocess.BuildECGImages.converSingleECGToImage(BuildECGImages.java:305)
at org.akintayo.analysis.ecg.preprocess.BuildECGImages.main(BuildECGImages.java:457)
at org.akintayo.hadoop.HadoopECGPreprocessByFile$MapTest.map(HadoopECGPreprocessByFile.java:208)
at org.akintayo.hadoop.HadoopECGPreprocessByFile$MapTest.map(HadoopECGPreprocessByFile.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Java can not use jars that are in other jar:/ (classloaders can't handle this)
So what you have to do is to install those packages separately on each machine in cluster, or if not possible add jars on the run, to do this you have to add option -libjars mylib.jar when running hadoop jar myjar.jar -libjars mylib.jar and this should work.
Wojtek's answer is correct. Using -libjars will put your external jars in the distributed cache and make them available to all of your Hadoop nodes.
However, if your external jars are not changing frequently, you may find it more convenient to copy the jar files to the node's hadoop/lib manually. Once you restart Hadoop your external jar will be added to the classpath of your jobs.