How to run mongodb connector in Kafka? - mongodb

I have downloaded the Kafka binaries and the JAR file of the connector, then specified the path to the connector JAR and configured the config file with the connection to Mongo.
After that I launched ZooKeeper, ran Kafka, and created a topic. How do I launch the source connector?

There are connect-standalone and connect-distributed scripts in the Kafka bin folder for running Kafka Connect.
You'll also need to edit the respective properties files to make sure the Mongo connector plugin gets loaded
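For example, a minimal standalone setup might look like this (the plugin directory and the mongo-source.properties file name are placeholders, not names mandated by the connector):
# in config/connect-standalone.properties: point plugin.path at the directory holding the connector JARs
plugin.path=/opt/kafka/plugins
# start a standalone worker with the worker config plus your Mongo connector config
bin/connect-standalone.sh config/connect-standalone.properties config/mongo-source.properties
The same plugin.path setting goes into connect-distributed.properties for distributed mode; there the connector config is submitted to the Connect REST API instead of being passed on the command line.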

Related

AWS MSK. Kafka Connect - plugin class loading

I am using Kafka Connect in MSK.
I have defined a plugin that points to a zip file in s3 - this works fine.
I have implemented SMT and uploaded the SMT jar into the same bucket and folder as the zip file of the plugin.
I define a new connector and this time I add the SMT using transforms.
I get an error message that the Class com.x.y.z.MySMT could not be found.
I verified that the jar is valid and contains the SMT.
Where should I put the SMT jar in order to make Kafka Connect load it?
Pushing the SMT jar into the zip (under /lib) solved the class not found issue.
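For reference, the layout inside the plugin zip then looks roughly like this (file names are illustrative):
my-connector-plugin.zip
  lib/
    my-connector.jar
    my-smt.jar   <- the SMT jar added under /lib, next to the connector JARs
  ...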

Message transforms for the MirrorMaker 2.0

I am running a dedicated MirrorMaker cluster and want to apply my SMT transformation to the records. Could you advise where I should put the jar with my code, i.e. where should I define the plugin.path property?
where should I define plugin.path property?
The worker property file when you start either connect-mirror-maker or connect-distributed
where should I put the jar
You need to make a subfolder under a directory listed in plugin.path, then put JARs there.
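A sketch with example paths (the directory names are placeholders):
# in the worker properties file passed to connect-mirror-maker.sh or connect-distributed.sh
plugin.path=/opt/kafka/plugins
# layout on disk: one subfolder per plugin, JARs inside it
/opt/kafka/plugins/
  my-smt/
    my-smt.jar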

Where to keep the zipped file of kafka connect s3-source connector

I have downloaded the s3-source connector zip file as described on the Confluent web page. I do not understand where to place the extracted files. I am getting the following error.
Please guide me. To load the connector, I am using this command -
confluent local load s3-source -- -d /etc/kafka-connect-s3/confluentinc-kafka-connect-s3-source-1.3.2/etc/quickstart-s3-source.properties
I am not getting where to place the extracted files
If you used confluent-hub install, it would put them in the correct place for you.
Otherwise, you can put them wherever you like, as long as you update plugin.path in the Connect properties to include the parent directory of the connector's JARs.
Extract the zip file (whether it's a source or a sink connector) and place the whole folder, with all the JARs inside it, under the plugin.path you have set.
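For example (the install path shown is the Confluent Hub default and may differ on your setup):
# option 1: let the Confluent Hub client place the connector and update the config for you
confluent-hub install confluentinc/kafka-connect-s3-source:latest
# option 2: extract the zip manually and point plugin.path at the parent directory
plugin.path=/usr/share/confluent-hub-components
/usr/share/confluent-hub-components/
  confluentinc-kafka-connect-s3-source-1.3.2/
    lib/   <- all the connector JARs live here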

classpath is empty. please build the project first

I was trying to run Kafka on a Windows machine, and when I try to start ZooKeeper I get this weird error:
classpath is empty. please build the project first e.g. by running 'gradlew jarall'
If anyone else is facing this issue:
Note: Do not download the source files from Apache Kafka; download the binary release.
Download Kafka from here: Link
Also follow this link for any additional information
Also this group has some additional information
I had the exact same problem and I finally solved it.
The problem is that you have a space character in your path (inside folder names), which causes the "dirname" command to receive more than one argument.
Therefore, to solve it, you only need to remove the spaces from the folder names within your Kafka folder path.
Follow the steps below for Windows & Kafka 0.9.0.0 (the same steps will work with lower versions of Kafka).
First download the binary from:
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
Extract it to a folder of your choice, and then:
Step 1: create new directories in your kafka directory
- kafka-logs
- zookeeper
Your directory after step 1 will be:
- bin
- config
- kafka-logs
- libs
- site-docs
- zookeeper
Step 2: Open config/server.properties and change the property below (a filled-in example follows these steps)
- log.dirs={fullpath}/kafka-logs
Step 3: Open config/zookeeper.properties and change the property below
- dataDir={fullpath}/zookeeper
Step 4: Create a run.bat file under the bin/windows folder with the following script:
start zookeeper-server-start.bat ..\..\config\zookeeper.properties
TIMEOUT 10
start kafka-server-start.bat ..\..\config\server.properties
exit
You can change timeout for your convenience.
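For example, if the Kafka folder is C:\kafka_2.11-0.9.0.0 (a path without spaces), the {fullpath} placeholders from steps 2 and 3 would be filled in like this (forward slashes are accepted in the properties files on Windows):
log.dirs=C:/kafka_2.11-0.9.0.0/kafka-logs
dataDir=C:/kafka_2.11-0.9.0.0/zookeeper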
Here I think you downloaded the Kafka source; you need to download the binary.
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
Follow the steps below to resolve this error.
Step 1: Go into the downloaded Kafka folder
cd kafka-2.5.0-src
Step 2: Run Gradle
./gradlew jar
Step 3: Once the build is successful, start ZooKeeper and the Kafka server
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
Now Kafka will be running on localhost:9092
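As a quick check that the broker came up, listing topics against it should work (the --bootstrap-server flag is available in Kafka 2.5.0):
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list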
Had the same problem and it was because I downloaded the source file instead of the binary file.
Simply ensure there are no white spaces in your folder hierarchy
for example:
instead of -> "c:\desktop\work files\kafka_2.12-2.7.0"
use this -> "c:\desktop\work-files\kafka_2.12-2.7.0"
this worked for me!
If you are using the Kafka source to run the Kafka server on a Windows 10 machine, you need to build the source first using the steps below.
Please note: you need to have the Gradle build tool installed and the path variable set before following these steps.
Open the command prompt and navigate to the Kafka home directory
C:\kafka-1.1.1-src>
Enter the command 'gradle' and press Enter
C:\kafka-1.1.1-src>gradle
Once the build is successful enter the below command
C:\kafka-1.1.1-src>gradlew jar
Now enter the below command to start the server
C:\kafka-1.1.1-src>.\bin\windows\kafka-server-start.bat .\config\server.properties
If everything went fine, the Kafka server will start and log its startup output in the command prompt.
Ensure that you have no white space or special characters in the path.
Step 1: Navigate to the \confluent-community-5.5.0-2.12\confluent-5.5.0\bin\windows folder.
Step 2: Open the kafka-run-class.bat file.
Step 3: Search for rem Classpath addition for core in this bat file.
Step 4: Now add the code below just above the rem Classpath addition for core line.
rem Classpath addition for LSB style path
if exist %BASE_DIR%\share\java\kafka\* (
call :concat %BASE_DIR%\share\java\kafka\*
)
Using Windows 10:
Download and extract the Kafka binaries and change config/server.properties; for me it changed from
log.dirs=/tmp/kafka-logs
to
log.dirs=D:\Elastic_search\kafka_2.11-0.9.0.0\kafka-logs
Create the new directory, kafka-logs.
Run
.\bin\windows\kafka-server-start.bat .\config\server.properties
in your root kafka_2.11-0.9.0.0 folder with CMD "again"
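Note that ZooKeeper has to be running before the broker starts; from the same root folder that is, for example:
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
followed by the kafka-server-start.bat command above.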
I found that the bit of code below, which adds to the classpath, was missing from \bin\windows\kafka-run-class.bat in the previous version I was using (Confluent 4.0.0 vs 5.3.1).
rem Classpath addition for LSB style path
if exist %BASE_DIR%\share\java\kafka\* (
call :concat %BASE_DIR%\share\java\kafka\*
)
I followed the link https://janschulte.wordpress.com/2013/10/13/apache-kafka-0-8-on-windows/ to configure Kafka and it worked. But I used the same (old) version as mentioned in the post. For now I need Kafka for my project, so I decided to proceed with that version.
A few things the author missed out in the explanation; please find them below.
1) After downloading the sbt Windows installer, you need to restart the system, not only the shell, for the changes to take effect.
2) Add the following at lines 66-67 of kafka-run-class.sh:
JAVA="java"
$JAVA $KAFKA_OPTS $KAFKA_JMX_OPTS -cp cygpath -wp $CLASSPATH "$#"
(Make sure your Java is configured in the environment variables.)
3) Go to the appropriate path to run the ZooKeeper command:
bin/zookeeper-server-start.sh config/zookeeper.properties
Tag me if you have any doubts! Happy to Help!
I suffered the same issue. Download the ZooKeeper tar file as well.
Downloading ZooKeeper into the same folder and then typing the same commands worked for me.
Be sure that you are using the right path to the zookeeper.properties file. In my case I was using the full path for the .bat file and a wrong relative path for the .properties file.
Having a wrong path to zookeeper.properties will produce the error that you mentioned.
Notice that I have used the binary, not the kafka source.
For me the issue was when unzipping the files. I moved them to another folder, and something went wrong. I unzipped again keeping the directory structure, and it worked.
thanks to orlando mendez for the advice!
https://www.youtube.com/watch?v=7F9tBwTUSeY
This happened to me when the Kafka folder path was too long. Try it with a shorter path like "D:\kafka_2.12-2.7.0"
Please download binary package, not source code.
I faced the same issue; this is what worked for me:
I downloaded the binary version and created the directory as C:/kafka
Changed the properties files:
Changes in zookeeper.properties - dataDir=C:/kafka/zookeeper-data
Changes in server.properties - log.dirs=C:/kafka/kafka-logs
All the directories got created automatically.
This should work.
Video for reference -
https://www.youtube.com/watch?v=3XjfYH5Z0f0
Download the Kafka binaries, not the source, or make sure there are no whitespace characters in the file paths.
This site describes a solution that worked for me.
The solution was to modify a bat file so that java knows the path of several jar libs.
Of course I downloaded the binary and not the source files from Confluent.

Analytics for Apache Hadoop - what files are uploaded for Analyzing data with Oozie?

The Analytics for Apache Hadoop documentation lists the following steps for analysing data with Oozie:
Analyzing data with Oozie
Install required drivers.
Use webHDFS to upload the workflow related files to HDFS.
For example, upload the files to /user/biblumix/apps/oozie
...
Source: https://www.ng.bluemix.net/docs/services/AnalyticsforHadoop/index.html
Question: What files are typically uploaded in step 2? The wording suggests that the files are oozie files (e.g. xml files). However, the link takes you to the section Upload your data.
I performed some testing, and I had to upload a workflow.xml in addition to the data files that my oozie job processes.
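For reference, a sketch of the webHDFS upload in step 2 (host, port, and user are placeholders; WebHDFS answers the initial CREATE with a 307 redirect pointing at a DataNode):
# step 1: ask the NameNode where to write the file
curl -i -X PUT "http://namenode-host:50070/webhdfs/v1/user/biblumix/apps/oozie/workflow.xml?op=CREATE&user.name=biblumix"
# step 2: PUT the file to the Location URL returned in the redirect
curl -i -X PUT -T workflow.xml "<Location-URL-from-step-1>"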