Kafka Connect on Windows failed to find the connector class - apache-kafka

I have installed Kafka on my Windows machine and everything works until I load a connector configuration file that uses the io.debezium MySQL connector. Everything I found on Google about this error points to plugin.path, so I tried every workaround I could find, in vain: renaming the jar folder to literally c:/mypluginfolder, using "\" and then "/" in my paths, using absolute paths, adding the directory to the classpath, etc. The logs even say that some of the Debezium plugins are being added before it crashes, so technically the server does see that path. Help a fellow out, I've been at a loss for more than 2 weeks. Thank you.
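For reference, a minimal sketch of the kind of plugin.path setup I have been trying (c:/mypluginfolder is the placeholder folder name from above, and the Debezium subfolder layout is my assumption about how the worker expects plugins to be laid out):
plugin.path=c:/mypluginfolder
# Kafka Connect then looks for each plugin in its own subfolder, e.g.:
# c:/mypluginfolder/debezium-connector-mysql/  (containing the Debezium jars and their dependencies)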
The cmd output: https://controlc.com/880d73f2
My standalone.props and connector: https://controlc.com/4ee0164f
PS: Sorry for the controlc links, I don't know how to format questions; I'm new here.

Related

Classpath is empty error when running a ZooKeeper instance

I am trying to follow the instructions on https://kafka.apache.org/quickstart to start a Kafka install and then send some messages from a Scala client.
I am using a Windows system.
I am getting this error (see screencap) when I run the ZooKeeper instance.
The most probable reason is that your directory path has a space - “Development Tools”. Try running this from a path that has no spaces; I suspect the space is causing path issues in the shell script.
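For example, something along these lines (a rough sketch for Windows cmd; the Kafka folder name here is just an example, adjust it to the version you downloaded):
:: move the extracted Kafka folder to a path without spaces, then start ZooKeeper
move "C:\Development Tools\kafka_2.13-3.0.0" C:\kafka
cd C:\kafka
bin\windows\zookeeper-server-start.bat config\zookeeper.properties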
Also, I assume that you downloaded the binary and not the source files?
Hope it works and let us know.

Divolte-collector with MapR, Storm, Kafka and Cassandra

I am not sure if I can get help for this on here, but I thought it was worth a try.
I have a 3-node cluster on AWS running MapR M3, and I installed Storm, Kafka, Divolte-collector and Cassandra. I would like to try some of the clickstream examples, and I am running into an issue with the tcp-consumer example. Being quite new to Java and distributed processing, I also have some clarification questions. I am not quite sure where to post this because it feels Divolte-collector specific, and I also have some gaps in my understanding of Javadoc and of building and running jar files; but I figured someone could point me to some resources or help with some clarifications. I can't get the JSON string to appear in the console where netcat is listening for clicks:
Divolte tcp-kafka-consumer example
Everything works until the netcat part (step 7), and my knowledge gap is with step 6.
Step 1: install and configure Divolte Collector
Install works and the hello-world click collection is promising :-)
Step 2: download, unpack and run Kafka
# In one terminal session
cd kafka_2.10-0.8.1.1/bin
./zookeeper-server-start.sh ../config/zookeeper.properties
# Leave Zookeeper running and in another terminal session, do:
cd kafka_2.10-0.8.1.1/bin
./kafka-server-start.sh ../config/server.properties
No errors, plus I tested the Kafka examples, so that seems to be working as well.
Step 3: start Divolte Collector
Go into the bin directory of your installation and run:
cd divolte-collector-0.2/bin
./divolte-collector
Step 3 went without a hitch; I can load the default divolte-collector test page.
Step 4: host your Javadoc files
Set up an HTTP server that serves the Javadoc files you generated or downloaded for the examples. If you have Python installed, you can use this:
cd <your-javadoc-directory>
python -m SimpleHTTPServer
OK, so I can reach the Javadoc pages.
Step 5: listen on TCP port 1234
nc -kl 1234
Note: when using netcat (nc) as TCP server, make sure that you configure the Kafka consumer to use only 1 thread, because nc won't handle multiple incoming connections.
I tested netcat by opening the port and sending messages, so I figured I don't have any port issues on AWS.
Step 6: run the example
cd divolte-examples/tcp-kafka-consumer
mvn clean package
java -jar target/tcp-kafka-consumer-*-jar-with-dependencies.jar
Note: for this to work, you need to have the avro-schema project installed into your local Maven repository.
I installed the avro-schema project with mvn clean install in the avro project that comes with the examples, as per the instructions here.
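Concretely, that was (assuming the avro-schema project sits alongside the other examples, as it does in my checkout):
cd divolte-examples/avro-schema
mvn clean install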
Step 7: click around and check that you see events being flushed to the console where you run netcat
When you click around the Javadoc pages, your console should show events in JSON format similar to this:
I don't see the clicks in my netcat window :(
Investigating the issue, I viewed the console and network tabs using Chrome developer tools; it seems Divolte is running, but I am not sure how to dig further. This is the console view. Any ideas or pointers?
Thanks anyways
Initializing Divolte.
divolte.js:140 Divolte base URL detected http://ec2-x-x-x-x.us-west-x.compute.amazonaws.com:8290/
divolte.js:280 Divolte party/session/pageview identifiers ["0:i6i3g0jy:nxGMDVdU9~f1wF3RGqwmCKKICn4d1Sb9", "0:i6qx4rmi:IXc1i6Qcr17pespL5lIlQZql956XOqzk", "0:6ZIHf9BHzVt_vVNj76KFjKmknXJixquh"]
divolte.js:307 Module initialized. Object {partyId: "0:i6i3g0jy:nxGMDVdU9~f1wF3RGqwmCKKICn4d1Sb9", sessionId: "0:i6qx4rmi:IXc1i6Qcr17pespL5lIlQZql956XOqzk", pageViewId: "0:6ZIHf9BHzVt_vVNj76KFjKmknXJixquh", isNewPartyId: false, isFirstInSession: false…}
divolte.js:21 Signalling event: pageView 0:6ZIHf9BHzVt_vVNj76KFjKmknXJixquh0
allclasses-frame.html:9 GET http://ec2-x-x-x-x.us-west-x.compute.amazonaws.com:8000/resources/fonts/dejavu.css
overview-summary.html:200 GET http://localhost:8290/divolte.js net::ERR_CONNECTION_REFUSED
(Intro: I work on Divolte Collector)
It seems that you are running the example on an AWS instance somewhere. If you are using the pre-packaged JavaDoc files that come with the examples, they have hard-coded the divolte location as http://localhost:8290/divolte.js. So if you are running somewhere other than localhost, you should probably create your own JavaDoc for the example, using the correct hostname for the Divolte Collector server.
You can do so using this command. Be sure to run it from the directory where your source tree is rooted, and of course change localhost to the hostname where you are running the collector.
javadoc -d YOUR_OUTPUT_DIRECTORY \
-bottom '<script src="//localhost:8290/divolte.js" defer async></script>' \
-subpackages .
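For example, with the EC2 hostname from your console output it would look something like this (javadoc-out is just an example output directory):
javadoc -d javadoc-out \
-bottom '<script src="//ec2-x-x-x-x.us-west-x.compute.amazonaws.com:8290/divolte.js" defer async></script>' \
-subpackages .
You can then serve javadoc-out with python -m SimpleHTTPServer as in step 4 and click around those pages instead.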
As an alternative, you could also just try to run the examples locally first (possibly in a virtual machine, if you are on a Windows machine).
There doesn't seem to be anything MapR-specific about the issue you are seeing so far. The Kafka-based examples and pipeline should work in any environment that has the required components installed. This doesn't touch MapR-FS or anything else MapR-specific. Writing to the distributed filesystem is another story.
We don't compile Divolte Collector against MapR Hadoop currently, but incidentally I have given it a run on the MapR sandbox VM. When installing from the RPM distribution, create a /etc/divolte/divolte-env.sh with the following env var setting:
HADOOP_CONF_DIR=/usr/share/divolte/lib/guava-18.0.jar:/usr/share/divolte/lib/avro-1.7.7.jar:$(hadoop classpath)
Obviously this is a bit of a hack to get around classpath peculiarities and we hope to provide a distribution compiled against MapR that works out of the box in the future.
Also, you need Java 8 to run Divolte. If you install this from the Oracle RPM, add the proper JAVA_HOME to divolte-env.sh as well, e.g.:
JAVA_HOME=/usr/java/jdk1.8.0_31
With these settings I'm able to run the server and collect Avro files on MapR-FS, create an external Hive table on those files and run a query.

Starting warden after ZooKeeper on MapR

I am installing MapR and I am stuck at starting warden after starting ZooKeeper on a single node.
# service mapr-warden start
Error: warden can not be started. See /opt/mapr/logs/warden.log for details
This file contains no details. Does anybody have a hint? Thanks =)
If you aren't getting anything in warden.log, then it's likely that the warden JVM is never even being started by the mapr-warden init script.
In some MapR versions, the mapr-warden init script will log some details into /opt/mapr/logs/wardeninit.log. You can try checking there.
However, I will also caution that currently the logging done by the init script is sparse and not necessarily user-friendly to read. If you can't discern the cause from the contents of wardeninit.log, you can post them here and maybe I can help.
Another thing you can do is edit /etc/init.d/mapr-warden and add "set -x" towards the top of the file, right before the "BASEMAPR=" line, then try starting warden again and you'll get a bunch of shell debugging output on your screen. If you copy and paste that output here that should be enough to tell the root cause of the problem.
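If you'd rather script that edit, something like this should work (assuming GNU sed and that the BASEMAPR= assignment starts at the beginning of a line):
sed -i '/^BASEMAPR=/i set -x' /etc/init.d/mapr-warden
service mapr-warden start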
One more thing to mention, you may be better off using the http://answers.mapr.com forum as that is MapR specific and I think there may be more users there that could help.
Was configure.sh (/opt/mapr/server/configure.sh -C nodeA -Z nodeA) run on the node? Did ZooKeeper come up successfully?
service mapr-zookeeper status
Even when using MapR on a single node, configure.sh is still required. In fact, without configure.sh, warden, ZooKeeper, CLDB and other MapR components will lack their configuration and in many cases will fail to start.
You must run configure.sh after installing the software packages (deb or rpm).
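So on a single node the sequence is roughly (replace nodeA with your own hostname):
/opt/mapr/server/configure.sh -C nodeA -Z nodeA
service mapr-zookeeper start
service mapr-zookeeper status
service mapr-warden start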

Unable to view any folders in DFS Locations when connecting to Hadoop from Eclipse

I have set up Hadoop 1.2.1 on Windows with Cygwin installed.
I have started the sshd service.
I also started the namenode, datanode and MapReduce daemons (job tracker, task tracker). I am able to see the namenode, datanode and MapReduce running status through the following URLs.
When I try connecting to Hadoop through Eclipse, I am able to. Though I can connect to Hadoop from Eclipse, I do not see any folders when opening DFS Locations; it displays as (0) (refer to Pic #1), which I guess means no directories/files are available. I checked the same against the namenode storage (refer to Pic #2).
Even when I create a directory through the Cygwin terminal (refer to Pic #4), I am not able to see it under DFS Locations in Eclipse.
That said, I tried the WordCount example, setting the input and output paths as follows:
// specify input and output dirs
FileInputFormat.addInputPath(conf, new Path("Input"));
FileOutputFormat.setOutputPath(conf, new Path("Output"));
When I run that against the HDFS location from Eclipse, I get the following exception:
13/10/30 06:52:44 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:47110/user/Administrator/Input
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:47110/user/Administrator/Input
Questions:
Why am I not able to see the directory that I created through the Cygwin terminal, or any folders for that matter?
What does "hdfs://localhost:47110" point to?
Am I getting the above exception because it doesn't see the directory on the datanode?
What input path should I set?
Please advise me on this.
Thanks in advance.
First, you should check all the settings of your Hadoop cluster from scratch, because this problem suggests that you have not configured Eclipse properly against the Hadoop cluster.
See the following link, which should help you:
https://www.youtube.com/watch?v=TavehEdfNDk
Also check whether your DFS is connected to your cluster, i.e. whether you are able to store files in your DFS or not.
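For example, from the Cygwin terminal you can check that HDFS is reachable and create the input directory the job expects (sample.txt stands for any local file you want to process):
hadoop fs -ls /
hadoop fs -mkdir /user/Administrator/Input
hadoop fs -put sample.txt /user/Administrator/Input
hadoop fs -ls /user/Administrator/Input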

Talend, MongoDB connection

I am facing a problem with a MongoDB connection.
I have successfully imported the tMongo components into my Talend Open Studio 5.1.1 by copying the mongo-1.3.jar file to the lib/java folder, and my MongoDB jobs run successfully. The problem is that even if I provide a fake server path (IP) and fake port for MongoDB, my job runs without an error and gives me 1 row with no data, and the same goes for the right IP and port.
How do I resolve this?
I think the connection is not working. As you may know, MongoDB only checks whether the connection actually works when you perform a query on it.
(Yes, it doesn't check for a successful connection when you just connect to it.)
I would suggest instead adding the MongoDB components present in Talend for Big Data by following the steps below.
The components provided for MongoDB are:
tMongoDBInput, tMongoDBOutput, tMongoDBConnection, etc.
Alternatively, you can download the components from http://www.talendforge.org/exchange/ and search for Mongo instead of using Talend Big Data, but I would suggest using Talend for Big Data for this.
The components come in zipped format; unzip them. In Talend Big Data you will find the components in the Components folder.
Copy these unzipped components to the installation path of TOS:
C:\Talend\TOS_DI-Win32-r84309-V5.1.1\plugins\org.talend.designer.components.localprovider_5.1.1.r84309\components
Copy the mongo-1.3.jar file from the component folder into C:\Talend\TOS_DI-Win32-r84309-V5.1.1\lib\java
On many systems you might not be able to see this file; in that case, run with ADMINISTRATOR privileges.
Optional for a few systems: inside index.xml add
save index.xml
Restart TOS
Then you will be able to use them as normal components.
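On Windows the copy steps boil down to something like this (C:\Downloads\mongo-components is just an example of where you unzipped the download; the target paths are the ones from above):
xcopy /E /I "C:\Downloads\mongo-components\*" "C:\Talend\TOS_DI-Win32-r84309-V5.1.1\plugins\org.talend.designer.components.localprovider_5.1.1.r84309\components"
copy "C:\Downloads\mongo-components\mongo-1.3.jar" "C:\Talend\TOS_DI-Win32-r84309-V5.1.1\lib\java"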
Cheers!
The reason for the job running without any error could be the connection / metadata you have used for the Mongo connector. It should not be possible for the job to run without any error after giving a fake path.
I guess you might have configured (re-modified) the repository connection but used built-in metadata for the component.