How does one read the ZooKeeper transaction log? - apache-zookeeper

Are there any existing tools that help read the ZooKeeper transaction log? By default it is in binary format, and I would like to read it in human-readable form.

I don't know if you have already solved this.
Answer:
cd into the ZooKeeper directory.
If you want to read snapshots, use:
java -cp zookeeper-3.4.6.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar org.apache.zookeeper.server.SnapshotFormatter version-2/snapshot.xxx
If you want to read transaction logs, use:
java -cp zookeeper-3.4.6.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar org.apache.zookeeper.server.LogFormatter version-2/log.xxx
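If you need to convert several files at once, a small shell sketch along the same lines (the jar versions and the version-2 path are taken from the commands above and are assumptions about your installation):
# dump every transaction log in the data directory to a matching .txt file
for f in version-2/log.*; do
  java -cp zookeeper-3.4.6.jar:lib/log4j-1.2.16.jar:lib/slf4j-log4j12-1.6.1.jar:lib/slf4j-api-1.6.1.jar \
    org.apache.zookeeper.server.LogFormatter "$f" > "$(basename "$f").txt"
done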

You can use something like this:
java -cp $ZOOKEEPER_CLASSPATH org.apache.zookeeper.server.LogFormatter [zookeeper log file path]

Building on the previous two answers, using ZooKeeper 3.5.6: from the /path/to/zookeeper/lib directory, which contains all the ZooKeeper and supporting jars, run:
java -cp * org.apache.zookeeper.server.LogFormatter /path/to/zookeeper/transaction/logs/version-2/log.xxx

Since ZooKeeper 3.6, the distribution ships tools to read the transaction log and snapshots:
For transaction log:
bin/zkTxnLogToolkit.sh --dump /datalog/version-2/log.f3aa6
For snapshots:
./zkSnapShotToolkit.sh -d /data/zkdata/version-2/snapshot.fa01000186d
See the details in the official docs.
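For example, a small sketch (assuming the dataDir layout shown above) that dumps whichever transaction log is newest:
# pick the most recent transaction log and print it in human-readable form
latest=$(ls -t /datalog/version-2/log.* | head -n 1)
bin/zkTxnLogToolkit.sh --dump "$latest"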

You can enable ZooKeeper audit logs from ZooKeeper 3.6.0 upwards. To enable audit logging, set audit.enable=true in conf/zoo.cfg.
One thing to keep in mind is that logs from the different servers of the same ensemble should be aggregated, because each server only records the operations executed by clients connected to that particular server.
Full information here: https://zookeeper.apache.org/doc/r3.6.1/zookeeperAuditLogs.html
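As a rough sketch of that aggregation (the hostnames and the audit log path below are placeholders, not taken from the docs), you could copy the audit log from each server and merge the results:
# collect the audit log from every ensemble member, then sort the merged file by timestamp
for host in zk1 zk2 zk3; do
  scp "$host:/path/to/zookeeper/logs/zookeeper_audit.log" "audit-$host.log"
done
sort audit-*.log > audit-merged.log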

Related

How to send logs from Google Stackdriver to Kafka

I see many docs and posts about how to send logs to Stackdriver, but almost no information about how to do the opposite: send logs from Stackdriver to Kafka.
In my case, our Ops want to collect the logs from our web servers using Google's Stackdriver agents and push them to Stackdriver ... However, for my stream-processing needs I want to get the logs into Kafka to use its unparalleled abilities to retain and reprocess data by any number of consumers, something that I cannot do with PubSub.
So, what are the options for doing this? I only saw a couple of possible avenues, and neither sounds too good:
based on this post: (https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76) push data into PubSub first, and then read from it using either the Kafka connector or my own Kafka consumer. I hate the thought of adding yet another hop (serialize/deserialize/ack/etc.) between the source of data and Kafka ....
I noticed a brief mention in passing of adding a plugin to Google's version of Fluentd (which is what the Stackdriver log collection agent is based on) here: https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76 . Not many details, so it is hard to tell how involved this approach is ...
Any other options?
Thank you!
Enter the Kafka console and publish a few messages (the "elements") there. Once you have added them on the Kafka side, check whether they are reflected in Cloud Shell by running:
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
It takes some time to sync with the Kafka console, so you will usually only see the results after running this command a couple of times.
Here you run the commands in Cloud Shell and see the output in the Kafka VM SSH session.
(Screenshot 1 omitted.)
Now verify the exact opposite direction, where you run the command in the Kafka VM and see the output in Cloud Shell. It takes some time for the output to be reflected, and you may have to run gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10 a couple of times to see it. Your output will look like this:
(Screenshot 2 omitted.)
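For context, the "elements" added on the Kafka side are just messages sent with the console producer; a minimal sketch of that step (the broker address and topic name are placeholders, and the Pub/Sub wiring comes from the connector described in the linked post):
# in the Kafka VM: publish a few test messages for the connector to forward to Pub/Sub
kafka-console-producer.sh --broker-list localhost:9092 --topic to-pubsub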
The Kafka plugin is deprecated. For more information, refer to https://cloud.google.com/stackdriver/docs/deprecations
Note: This functionality is only available for agents running on Linux. It is not available on Windows.
Kafka is monitored via JMX. The Monitoring agent supports Kafka version 0.8.2 and higher.
On your VM instance, download kafka-082.conf from the GitHub configuration repository and place it in the directory /etc/stackdriver/collectd.d/:
(cd /etc/stackdriver/collectd.d/ && sudo curl -O https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/kafka-082.conf)
The downloaded plugin configuration file assumes that your Kafka server is configured to accept JMX connections on port 9999. If you have configured Kafka with a different JMX port, as root, edit the file and follow the instructions to change the JMX port settings.
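For instance, a hedged one-liner for that edit (it assumes the port number 9999 appears literally in the file; check the file's own comments first):
# as root: replace the default JMX port 9999 with your own port, e.g. 9998
sudo sed -i 's/9999/9998/g' /etc/stackdriver/collectd.d/kafka-082.conf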
After adding the configuration file, restart the Monitoring agent by running the following command:
sudo service stackdriver-agent restart
What is monitored:
https://cloud.google.com/monitoring/api/metrics_agent#agent-kafka

Zookeeper stopped working after "Zookeeper audit is disabled"

I wanted to install Apache Kafka but got stuck installing ZooKeeper.
I extracted all the files and created the environment variables as well, and now it stops working after
"Zookeeper audit is disabled."
After following these links,
https://www.programmersought.com/article/22066571206/
https://zookeeper.apache.org/doc/r3.6.2/zookeeperAuditLogs.html
I updated both files (zkServer.cmd and conf/zoo.cfg): added the line "-Dzookeeper.audit.enable=true" in the zkServer.cmd file and "audit.enable=true" in conf/zoo.cfg.
Now the output has changed to "Zookeeper audit is enabled.", but it still doesn't do anything and stops there like before.
(Screenshot: output after running the zkServer command with the files edited.)
I even tried changing one file at a time, but I get the same output and it still stops there.
Can anyone help me understand the problem and provide a solution as well?
Thank you so much.
zookeeper-server-start or zkServer.cmd do not return, by design. They tail the logs and wait for client connections.
You must start a second terminal to run the Kafka broker, as mentioned in the Kafka documentation.
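A minimal sketch of the two-terminal startup on Windows, assuming the stock scripts shipped with the Kafka distribution:
:: terminal 1: start ZooKeeper (this stays in the foreground by design)
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
:: terminal 2: start the Kafka broker
.\bin\windows\kafka-server-start.bat .\config\server.properties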
Kill all Java services.
On Linux, run killall -9 java
Then re-run the ZooKeeper and Kafka servers.
It works.

Error starting Kafka server: "Different application unexpected at this time"

I am following a quick start guide for getting Kafka up and running.
I have ZooKeeper running, but when I try to start the Kafka server with the following command:
.\bin\windows\kafka-server-start.bat .\config\server.properties
I get the following error
\Teradata\Client\16.20\bin\ was unexpected at this time.
I can't even begin to understand how the two might be related.
Any tip is appreciated. I am very much stuck....
In my case I needed to change the CLASSPATH environment variable, replacing Program Files with Progra~1 and Program Files (x86) with Progra~2.
This solved the problem.
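For illustration, roughly what that change looks like (the Java path here is only an example, not taken from the question):
:: before: an entry containing spaces breaks the startup scripts
:: set CLASSPATH=C:\Program Files\Java\jre1.8.0_202\lib\ext\QTJava.zip
:: after: the same entry using the 8.3 short names
set CLASSPATH=C:\Progra~1\Java\jre1.8.0_202\lib\ext\QTJava.zip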
This isn't specific to Kafka or Zookeeper, but to how they load the Java home and classpath.
You need to fix your PATH variable in the Windows settings so that it does not contain any spaces.
It looks like the Kafka process needs every entry in the CLASSPATH to actually exist. I was getting the same error for Meld, which I had deleted while the CLASSPATH still had a reference to it.
Removing that entry from the CLASSPATH worked.
If you are using the Apache Kafka package structure as it is, then fix the paths as described below and the error will be resolved.
The downloaded Apache Kafka folder structure: kafka_2.11-0.9.0.0 -> bin -> windows -> all *.bat files.
kafka_2.11-0.9.0.0 -> config -> all *.properties files.
So, to start ZooKeeper, go to kafka_2.11-0.9.0.0 -> bin -> windows, type cmd in the Explorer address bar and press Enter, then run:
zookeeper-server-start.bat ..\..\config\zookeeper.properties
Output: ZooKeeper will start working.
To start the broker, go to the same folder again,
kafka_2.11-0.9.0.0 -> bin -> windows, type cmd in the Explorer address bar and press Enter, then run:
kafka-server-start.bat ..\..\config\server.properties
Output: the broker will start working.
Note: you can adjust the configuration using these properties files, e.g. change the port.
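For example, a minimal sketch of changing the broker port in config\server.properties (9092 is the usual default; depending on the Kafka version the setting is listeners or the older port key):
# listen on 9093 instead of the default 9092
listeners=PLAINTEXT://:9093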
"I can't even begin to understand how the two might be related."
For more information, you can check the blog: Role of Apache ZooKeeper in Kafka — Monitoring & Configuration

How to redirect Apache Spark logs from the driver and the slaves to the console of the machine that launches the Spark job using log4j?

I'm trying to build an Apache Spark application that normalizes CSV files from HDFS (changes the delimiter, fixes broken lines). I use log4j for logging, but all the logs just print on the executors, so the only way I can check them is with the yarn logs -applicationId command. Is there any way I can redirect all the logs (from the driver and from the executors) to my gateway node (the one which launches the Spark job) so I can check them during execution?
You should have the executors' log4j props configured to write files local to themselves. Streaming back to the driver would cause unnecessary latency in processing.
If you plan on being able to "tail" the logs in near real time, you would need a solution like Splunk or Elasticsearch, with agents such as Splunk Forwarders, Fluentd, or Filebeat on each box that watch the configured log paths and push the data to a destination indexer, which parses and extracts the log fields.
Now, there are other alternatives like StreamSets, NiFi, or Knime (all open source), which offer more instrumentation for collecting event-processing failures and effectively allow for "dead letter queues" to handle errors in a specific way. The part I like about those tools: no programming required.
I think it is not possible. When you execute Spark in local mode you can see the logs in the console. Otherwise you have to alter the log4j properties to set the log file path.
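A minimal log4j.properties sketch along those lines, assuming log4j 1.x and a path that is writable on every node (both are assumptions):
# write all logs to a local rolling file instead of the console
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/executor.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n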
As per https://spark.apache.org/docs/preview/running-on-yarn.html#configuration,
YARN has two modes for handling container logs after an application has completed. If log aggregation is turned on (with the yarn.log-aggregation-enable config in yarn-site.xml file), container logs are copied to HDFS and deleted on the local machine.
You can also view the container log files directly in HDFS using the HDFS shell or API. The directory where they are located can be found by looking at your YARN configs (yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix in yarn-site.xml).
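For example, with the default settings those two properties usually resolve to /tmp/logs and the suffix logs, so a hedged sketch of browsing the aggregated logs looks like:
# list the aggregated container logs for one application (the path depends on your yarn-site.xml values)
hdfs dfs -ls /tmp/logs/$USER/logs/application_id_example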
I am not sure whether the log aggregation from worker nodes happens in real time, though.
There is an indirect way to achieve this. Enable the following property in yarn-site.xml:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
This will store all the logs of your submitted applications in an HDFS location. Then, using the following command, you can download the logs into a single aggregated file.
yarn logs -applicationId application_id_example > app_logs.txt
I came across this GitHub repo, which downloads the driver and container logs separately. Clone this repository: https://github.com/hammerlab/yarn-logs-helpers
git clone --recursive https://github.com/hammerlab/yarn-logs-helpers.git
In your .bashrc (or equivalent), source .yarn-logs-helpers.sourceme:
$ source /path/to/repo/.yarn-logs-helpers.sourceme
Then download the aggregated logs, nicely segregated into driver and container logs, with this command:
yarn-container-logs application_example_id

Cluster (multi-server) setup for ZooKeeper Exhibitor

I'm a newbie to ZooKeeper. I'm trying to set up a clustered ZooKeeper installation with Exhibitor to modify the data. I have tried the server setup with 3 nodes, but data modifications are not reflected on all the ZooKeeper instances.
I referred to the following URL and also set up the servers the same way, but no luck; something is missing in that config for it to run correctly.
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
Exhibitor startup command is:
java -jar exhibitor-war-1.0-jar-with-dependencies.jar -c file --nodemodification true --port 9090
Do I need to add any other config to get my data modifications reflected on all the ZooKeeper instances?
Thanks in advance for your kind time!
I have a similar scenario, and I'm on Windows, so I hit some problems due to the fact that Exhibitor is Unix-oriented: it tries to restart zkServer.sh (instead of zkServer.cmd). So I:
1. manually started the ZK ensemble (all instances get data modifications from each other);
2. set up Exhibitor on top of every ZK instance, with a single shared network config file.
Hope it helps. If not, give more details.
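For reference, a minimal three-node zoo.cfg along the lines of the multi-server setup guide linked above (hostnames and dataDir are placeholders); every instance needs this same server list plus its own myid file:
# conf/zoo.cfg, identical on all three servers
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888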