How to set Java parameters for the Confluent distribution Kafka JVM? - confluent-platform

The Confluent installation guide doesn't say how to modify JVM options, for example tuning the GC or enabling JMX.
Am I supposed to add an environment variable? If so, which one?

You can find the relevant environment variables by grepping the scripts in the Confluent bin directory:
$ grep jmx confluent-4.0.0/bin/kafka-run-class
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT "
So use environment variables like KAFKA_JMX_OPTS for JMX settings, or KAFKA_OPTS for generic JVM settings.
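For example, a minimal sketch of exporting these before starting the broker; kafka-run-class also reads KAFKA_HEAP_OPTS and KAFKA_JVM_PERFORMANCE_OPTS, and the values below are illustrative, not recommendations:
$ export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"              # heap sizing
$ export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC"    # GC tuning
$ export JMX_PORT=9999                                # picked up by the KAFKA_JMX_OPTS logic above
$ export KAFKA_OPTS="-Djava.io.tmpdir=/var/tmp"       # any other generic -D flags
$ confluent-4.0.0/bin/kafka-server-start confluent-4.0.0/etc/kafka/server.properties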

Related

How do I view metrics for a confluentinc/cp-kafka container?

Hi, I have a Kafka container built from the image 'confluentinc/cp-kafka:6.1.0'.
How do I view the metrics from the container?
You can add an environment variable for JMX_PORT, then attach a tool like jconsole or VisualVM to that port.
This is mentioned in the docs, but I think it might be partly incorrect (at least, trying to use /jmx on Zookeeper; the variable is only JMX_PORT and shouldn't be different in the container).
If you want to use Prometheus/Grafana, then you'll need to extend the container to add the JMX exporter.
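For illustration, a sketch of exposing JMX when running the container directly; the port and hostname are illustrative, and other required broker settings (e.g. KAFKA_ZOOKEEPER_CONNECT, advertised listeners) are omitted here:
$ docker run -d --name kafka \
    -p 9101:9101 \
    -e JMX_PORT=9101 \
    -e KAFKA_JMX_HOSTNAME=localhost \
    confluentinc/cp-kafka:6.1.0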
1. I set up Kafka using https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html#ce-docker-quickstart. This launches Kafka with JMX enabled. This installation provides Confluent Control Center, so you can view metrics there. However, I wanted the raw metrics exposed by JMX, so I proceeded to the next steps.
2. I installed VisualVM from https://visualvm.github.io/download.html. (You can also use jconsole, available in the JDK's bin folder on your local machine, but I had connectivity issues running jconsole against the container's JMX.)
3. Install the VisualVM-MBeans plugin in VisualVM.
4. Add a JMX connection using the KAFKA_JMX_HOSTNAME:KAFKA_JMX_PORT values from your docker-compose.yml in step 1.
5. Bingo, you can see the metrics from Confluent Kafka running in the container!
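Assuming the quickstart's docker-compose.yml sets KAFKA_JMX_HOSTNAME=localhost and KAFKA_JMX_PORT=9101 (check your own file, as the values may differ), attaching from the host looks like:
$ jconsole localhost:9101
or add localhost:9101 as a JMX connection in VisualVM.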

Error starting kafka server: "Different application unexpected at this time"

I am following a quick-start guide to get Kafka up and running.
I have Zookeeper running, but when I try to start the Kafka server with the following command:
.\bin\windows\kafka-server-start.bat .\config\server.properties
I get the following error:
\Teradata\Client\16.20\bin\ was unexpected at this time.
I can't even begin to understand how the two might be related.
Any tip is appreciated; I am very much stuck.
In my case I needed to change the CLASSPATH environment variable, replacing Program Files with Progra~1 and Program Files (x86) with Progra~2.
This solved the problem.
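A sketch of the idea in a Windows command prompt (the entries shown are hypothetical; your actual CLASSPATH will differ):
:: inspect the current value first
echo %CLASSPATH%
:: replace long names containing spaces with their 8.3 short forms, e.g.
::   C:\Program Files\...       ->  C:\Progra~1\...
::   C:\Program Files (x86)\... ->  C:\Progra~2\...
set CLASSPATH=C:\Progra~1\SomeTool\lib\sometool.jar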
This isn't specific to Kafka or Zookeeper; it's about how their startup scripts resolve the Java home and classpath.
You need to fix your PATH variable in the Windows settings so that it doesn't contain any spaces.
It looks like the Kafka startup script needs every entry in the CLASSPATH to be present. I was getting the same error because of Meld, which I had deleted while the CLASSPATH still held a reference to it.
Removing that entry from the CLASSPATH worked.
If you are using the Apache Kafka package structure as-is, then fix the paths and the error will resolve itself.
The downloaded Apache Kafka folder structure is: kafka_2.11-0.9.0.0 -> bin -> windows -> all *.bat files, and kafka_2.11-0.9.0.0 -> config -> all *.properties files.
To start Zookeeper, go to kafka_2.11-0.9.0.0 -> bin -> windows, type cmd in the window's address bar, press Enter, and run:
zookeeper-server-start.bat ..\..\config\zookeeper.properties
Output: Zookeeper will start working.
To start the broker, open a cmd window in the same folder (kafka_2.11-0.9.0.0 -> bin -> windows) and run:
kafka-server-start.bat ..\..\config\server.properties
Output: the broker will start working.
Note: you can update your own configuration using these properties files, e.g. changing the port.
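For instance, a minimal excerpt of the kind of settings you might change in config\server.properties (all values illustrative; forward slashes avoid the escaping rules of .properties files on Windows):
# illustrative excerpt of config\server.properties
broker.id=0
# change the broker port here, e.g. from the default 9092
listeners=PLAINTEXT://localhost:9092
log.dirs=C:/kafka-logs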
For more information, you can check the blog: Role of Apache ZooKeeper in Kafka — Monitoring & Configuration.

Kafka broker.id: env variable vs config file precedence

I'm setting up a Kafka cluster in which I set broker.id=-1 so that broker ids are automatically generated, but in some cases I want to set them using environment variables (i.e. KAFKA_BROKER_ID).
If I do so, will the nodes with the KAFKA_BROKER_ID env variable use it, or auto-generate their ids?
That depends on how you are deploying your Kafka installation.
Out of the box, Kafka does not use environment variables or system properties to configure the broker id, so you need to put the value into the .properties file
(among other evidence: grepping for KAFKA_BROKER_ID in the Kafka source returns nothing).
KAFKA_BROKER_ID appears to be added by multiple Docker images; you'd need to check with the author of the one you are using.
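For illustration, the confluentinc images are one such family: their entrypoint maps KAFKA_-prefixed environment variables onto server.properties keys (KAFKA_BROKER_ID -> broker.id). A sketch, assuming that image family and omitting other required settings:
$ docker run -d \
    -e KAFKA_BROKER_ID=2 \
    -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
    confluentinc/cp-kafka:6.1.0
Outside such an image, set broker.id directly in server.properties; broker.id=-1 together with broker.id.generation.enable=true (the default) auto-generates an id.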

How to run a MapReduce job using the java -jar command

I wrote a MapReduce job in Java and set the configuration:
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
configuration.set("mapreduce.job.tracker", "localhost:54311");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "localhost:8032");
I ran it in several ways:
Case 1: using the hadoop and yarn commands: works fine.
Case 2: using Eclipse: works fine.
Case 3: using java -jar after removing all the configuration.set() calls, leaving only:
Configuration configuration = new Configuration();
This runs successfully, but the job status is not displayed in YARN (default port 8088).
Case 4: using java -jar with the configuration above: error. Stack trace:
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at com.my.cache.run.MyTool.run(MyTool.java:38)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.my.main.Main.main(Main.java:45)
Please tell me how to run a MapReduce job using the java -jar command while still being able to check its status and logs in YARN (default port 8088).
Why I need this: I want to create a web service that submits a MapReduce job (without shelling out to the yarn or hadoop commands from Java).
In my opinion, it's quite difficult to run a Hadoop application without the hadoop command. You are better off using hadoop jar than java -jar.
I think you don't have a Hadoop environment on the machine where you run java -jar. First, you must make sure Hadoop is running well on your machine.
Personally, I prefer setting the configuration in mapred-site.xml, core-site.xml, yarn-site.xml, and hdfs-site.xml. I know of a clear tutorial for installing a Hadoop cluster here.
At this step, you can monitor HDFS on port 50070, the YARN cluster on port 8088, and the MapReduce job history on port 19888.
Then, you should verify that your HDFS and YARN environments are running well. For HDFS, try simple commands like mkdir, copyToLocal, and copyFromLocal; for YARN, try the sample wordcount project (see the sketch below).
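A minimal smoke test might look like this (the file names and paths are illustrative, and the examples-jar location varies by Hadoop version):
$ hdfs dfs -mkdir -p /tmp/smoke
$ hdfs dfs -copyFromLocal localfile.txt /tmp/smoke/
$ hdfs dfs -copyToLocal /tmp/smoke/localfile.txt ./roundtrip.txt
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /tmp/smoke /tmp/smoke-out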
After you have a Hadoop environment, you can create your own MapReduce application (you can use any IDE); you may need this tutorial for that. Compile it and package it as a jar.
Open your terminal and run this command:
hadoop jar <path to jar> <arg1> <arg2> ... <arg n>
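For example, a hypothetical invocation matching the class from the question's stack trace (the jar name and HDFS paths are made up for illustration):
$ hadoop jar my-mapreduce-job.jar com.my.main.Main /input /output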
Hope this is helpful.

How does one read the Zookeeper transaction log?

Are there any existing tools that help to read the Zookeeper transaction log? By default, it is in binary format, and I would like to read it in human-readable form.
I don't know if you have solved this already, but here is an answer:
cd into the Zookeeper directory.
If you want to read snapshots, use:
java -cp zookeeper-3.4.6.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar org.apache.zookeeper.server.SnapshotFormatter version-2/snapshot.xxx
If you want to read logs, use:
java -cp zookeeper-3.4.6.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar org.apache.zookeeper.server.LogFormatter version-2/log.xxx
You can use something like this:
java -cp $ZOOKEEPER_CLASSPATH org.apache.zookeeper.server.LogFormatter [zookeeper log file path]
Building on the previous two answers, using Zookeeper 3.5.6: from the /path/to/zookeeper/lib directory, which contains all the ZK and supporting jars, run:
java -cp '*' org.apache.zookeeper.server.LogFormatter /path/to/zookeeper/transaction/logs/version-2/log.xxx
(Quoting the * keeps the shell from expanding it, so java receives it as a wildcard classpath.)
Since Zookeeper version 3.6, there are tools in the Zookeeper distribution to read the transaction log and snapshots:
For transaction log:
bin/zkTxnLogToolkit.sh --dump /datalog/version-2/log.f3aa6
For snapshots:
./zkSnapShotToolkit.sh -d /data/zkdata/version-2/snapshot.fa01000186d
See the details in the official docs.
You can enable ZooKeeper audit logs in ZooKeeper 3.6.0 and upwards. To enable audit logging, set audit.enable=true in conf/zoo.cfg.
One thing to keep in mind is that logs from the different servers of the same ensemble should be aggregated, because each server only records the operations executed by clients connected to that particular server.
Full information here: https://zookeeper.apache.org/doc/r3.6.1/zookeeperAuditLogs.html
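A minimal sketch of the relevant conf/zoo.cfg addition (the rest of the file is your existing configuration):
# conf/zoo.cfg (ZooKeeper 3.6.0+): enable audit logging
audit.enable=true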