Apache Zookeeper not able to read config file, despite providing one - apache-kafka

When I start up Apache ZooKeeper with this command:
sudo ./zookeeper-server-start.sh ../config/zookeeper.properties
I notice that on startup, it will emit this warning:
[2018-06-27 11:21:47,038] WARN Either no config or no quorum defined in config, running in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
There's not much in there, but I have double-checked, and the config is indeed there. Here are its contents:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
What led me to this discovery is that I am currently having issues where my brokers and ZooKeepers are both randomly shutting down for no apparent reason. The only explanation for the shutdown I am given is:
Signal was killed at: [2018-06-27 13:39:52,423] INFO Terminating process due to signal SIGHUP (kafka.Kafka$)
But that's probably another question. My real question is: why can't my Kafka ZooKeeper find and keep the config that I am feeding it as a parameter?

Related

Unable to start a Zookeeper server

I am trying to initialize/start a ZooKeeper server on Windows. I have created a logs folder to store the data and also modified the path inside the zoo.cfg file to point to the logs folder. When I run the command .\zkServer.cmd start, I receive an error.
This is my zoo.cfg file
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=C:\zookeeper\logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpHost=0.0.0.0
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true

Minimum privileges for CSI sidecar

I'm building my own CSI driver following the CSI standards, and I'm wondering about the security context to set for the CSI sidecar containers.
I'm going to use:
Node Driver Registrar
CSI provisioner
CSI attacher
CSI liveness probe.
Some of them need to run as root, and I'm wondering how to configure the security context to assign them the minimum Linux capabilities and to make sure root capabilities are granted for the minimum amount of time.
Am I forced to set the security context as follows? Is there any way to restrict it further?
securityContext:
  allowPrivilegeEscalation: true
  privileged: false
  runAsNonRoot: true
  capabilities:
    drop:
      - all
    add:
      - SYS_ADMIN
Thanks in advance,
Antonio
Based on research about Kubernetes and Linux capabilities, it looks like you've already found the least possible privileges.
Your example contains the minimum needed capability, CAP_SYS_ADMIN, which is used primarily for mounting and unmounting filesystems.
In more detail, CAP_SYS_ADMIN is used to:
Perform a range of system administration operations including: quotactl(2), mount(2), umount(2), pivot_root(2), swapon(2), swapoff(2), sethostname(2), and setdomainname(2);
use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
exceed /proc/sys/fs/file-max, the system-wide limit on the number of open files, in system calls that open files (e.g., accept(2), execve(2), open(2), pipe(2));
call setns(2) (requires CAP_SYS_ADMIN in the target namespace);
employ the TIOCSTI ioctl(2) to insert characters into the input queue of a terminal other than the caller's controlling terminal;
employ the obsolete nfsservctl(2) system call;
employ the obsolete bdflush(2) system call;
perform various privileged block-device ioctl(2) operations;
perform various privileged filesystem ioctl(2) operations;
perform privileged ioctl(2) operations on the /dev/random device (see random(4));
perform administrative operations on many device drivers;
Source - capabilities(7) — Linux manual page
There is also a very good article with many details about Docker images and security aspects: Towards unprivileged container builds.
One more article that explains Linux capabilities and their application, with examples, may also be helpful: HackTricks - Linux Capabilities.
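If you want to double-check what a sidecar actually ends up with at runtime, a rough sketch of one way to do it (this assumes kubectl access, that the image ships a shell and grep, and the pod/container names here are only placeholders) is to read the container's effective capability mask:
kubectl exec -it <csi-node-pod> -c node-driver-registrar -- grep CapEff /proc/1/status
# decode the printed mask on any host with the libcap tools installed
capsh --decode=<CapEff value from above>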

what is in zookeeper dataDir and how to clean it up?

I found my ZooKeeper dataDir is huge. I would like to understand:
What is in the dataDir?
How do I clean it up? Does it automatically clean up after a certain period?
Thanks
According to Zookeeper's administrator guide:
The ZooKeeper Data Directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble. These are the snapshot and transactional log files. As changes are made to the znodes these changes are appended to a transaction log, occasionally, when a log grows large, a snapshot of the current state of all znodes will be written to the filesystem. This snapshot supercedes all previous logs.
So in short, for your first question, you can assume that dataDir is used to store Zookeeper's state.
As for your second question, there is no automatic cleanup. From the doc:
A ZooKeeper server will not remove old snapshots and log files, this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contains details on calling conventions (arguments, etc...).
In the following example the last <count> snapshots and their corresponding logs are retained and the others are deleted. The value of <count> should typically be greater than 3 (although not required, this provides 3 backups in the unlikely event a recent log has become corrupted). This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.
java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>
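For example, a rough crontab sketch (the installation paths, the shared dataDir/snapDir and the retention count of 3 are assumptions; adjust them to your setup):
# /etc/cron.d/zookeeper-purge: keep the last 3 snapshots, run nightly at 03:00
0 3 * * * root java -cp /opt/zookeeper/zookeeper.jar:/opt/zookeeper/lib/log4j.jar:/opt/zookeeper/conf org.apache.zookeeper.server.PurgeTxnLog /var/lib/zookeeper /var/lib/zookeeper -n 3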
If this is a dev instance, I guess you could just purge the folder almost completely (except for some files like myid, if it's there). But for a production instance you should follow the cleanup procedure shown above.

Zookeeper dataLogDir config invalid

I built a ZooKeeper cluster and it runs very well, but I found that the log directory I set in zoo.cfg does not seem to be taking effect. Below is my config for the log directory and the snapshot directory.
dataDir=/var/lib/zookeeper
dataLogDir=/var/lib/zookeeper/logs
However, the file zookeeper.out is generated in /var/lib/zookeeper rather than in the dedicated log folder /var/lib/zookeeper/logs.
I restarted ZooKeeper on every server many times, but it made no difference.
This happens because zookeeper.out contains a different type of log (the application log), not the one specified by dataLogDir, which relates to the transaction log.
dataLogDir
This option will direct the machine to write the transaction log to
the dataLogDir rather than the dataDir. This allows a dedicated log
device to be used, and helps avoid competition between logging and
snapshots.
By checking zkServer.sh you'll see that zookeeper.out is related to _ZOO_DAEMON_OUT which depends on ZOO_LOG_DIR which is set by default by zkEnv.sh. Depending on your environment and zookeeper (ZK) version the zookeeper.out file might land in different places (according to this answer even in the working directory from which ZK is started).
For application logging you had better configure the log4j.properties file, since ZK uses log4j.
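As a rough sketch, if you want zookeeper.out (and the rest of the application log output) to land in /var/lib/zookeeper/logs, you could export ZOO_LOG_DIR before starting the server (the path is taken from the question; exactly how zkEnv.sh picks it up can vary by ZK version):
export ZOO_LOG_DIR=/var/lib/zookeeper/logs
./zkServer.sh restart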

Cannot Start Mesos/Marathon Cluster

Physical Machine: 192.168.10.1 ( Mesos, Zookeeper, Marathon )
Virtual Machine: 192.168.122.10 ( Mesos, Zookeeper )
Virtual Machine: 192.168.122.46 ( Mesos, Zookeeper )
The OS for all three machines is Fedora 23 Server.
The two networks are already inter-routed by default as the virtual machines all reside on the physical machine.
There is no firewall setup.
Mesos Election LOG:
Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address.
I can set this manually, however I cannot set this dynamically... the --ip_discovery_command flag is not recognized.
What I wanted to do was link the below script to that flag.
if [[ $(ip addr) == *enp8s0* ]]; then
    # print the address of enp8s0 if that interface exists...
    ip addr show enp8s0 | awk -F'/| ' '/inet/ { print $6 }'
else
    # ...otherwise fall back to eth0
    ip addr show eth0 | awk -F'/| ' '/inet/ { print $6 }'
fi
When I do set this manually (not what I want to do), the Mesos page at IP:5050 comes up, but then the mesos-master fails after 1 minute due to this:
F0427 17:03:27.975260 6914 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
# 0x7f8360fa9edd (unknown)
# 0x7f8360fabc50 (unknown)
# 0x7f8360fa9ad3 (unknown)
# 0x7f8360fac61e (unknown)
# 0x7f83619a85dd (unknown)
# 0x7f83619e7c30 (unknown)
# 0x55a885ee3b2e (unknown)
# 0x7f8361a11c0e (unknown)
# 0x7f8361a5d75e (unknown)
# 0x7f8361a7077a (unknown)
# 0x7f83618f4aae (unknown)
# 0x7f8361a70768 (unknown)
# 0x7f8361a548d0 (unknown)
# 0x7f8361fc832c (unknown)
# 0x7f8361fd42a5 (unknown)
# 0x7f8361fd472f (unknown)
# 0x7f8360a5e60a start_thread
# 0x7f835fefda4d __clone Aborted (core dumped)
ZooKeeper is set up like this:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1:192.168.10.1:2888:3888
server.2:192.168.122.46:2888:3888
server.3:192.168.122.10:2888:3888
and I have no idea how to verify that it is working properly...
I'm honestly at the end of my rope, pulling out my hair for the past week on this due to poor documentation, a lack of proper architecture explanations (primarily Marathon), horribly organized logs (Mesos), systemd being unable to properly parse a bash script and use its output as a variable, and a lack of instructions all around.
Am I doing something wrong? I appreciate any assistance I can get. Let me know if you need anything I have not yet provided and I will post it right away.
EDIT:
I fixed the issue with Marathon by adding two additional Marathon servers on the VMs so that they could form a quorum.
EDIT2:
I am now having issues where the Mesos server keeps rapidly re-electing a leader... but depending on the outcome I will look into this later...
If you follow the installation docs closely, I think you should get it to work.
For example, your "Master bound to loopback" problem is IMHO related to incorrect/incomplete settings. See:
Hostname (optional)
If you're unable to resolve the hostname of the machine directly (e.g., if on a different network or using a VPN), set /etc/mesos-master/hostname to a value that you can resolve, for example, an externally accessible IP address or DNS hostname. This will ensure all links from the Mesos console work correctly.
You will also want to set this property in /etc/marathon/conf/hostname.
Furthermore, I'd recommend also setting the master's IP address in the /etc/mesos-master/ip file. Always make sure that the hostnames resolve to a non-local IP address, e.g. by adding entries to the /etc/hosts file on each host.
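A hedged sketch of what that could look like on the physical machine (the IP is the one from this setup; host1 is just the placeholder hostname used in the /etc/hosts example below):
echo 192.168.10.1 > /etc/mesos-master/ip
echo host1 > /etc/mesos-master/hostname
echo host1 > /etc/marathon/conf/hostname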
Basically, the /etc/hosts file should look similar to this (replace the hostnames with the actual ones):
127.0.0.1 localhost
192.168.10.1 host1
192.168.122.10 host2
192.168.122.46 host3
If you just want to test a Mesos cluster, you could also use a preconfigured Vagrant solution like tobilg/coreos-mesos-cluster.
Regarding the ZooKeeper setup, make sure that you created a myid file in each node's dataDir (here /var/lib/zookeeper/data/myid) containing the numeric id you set for that node; e.g. for 192.168.10.1 the sole content of the file needs to be 1.
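A minimal sketch, assuming the dataDir from the zoo.cfg above:
echo 1 > /var/lib/zookeeper/data/myid   # on 192.168.10.1 (server.1)
echo 2 > /var/lib/zookeeper/data/myid   # on 192.168.122.46 (server.2)
echo 3 > /var/lib/zookeeper/data/myid   # on 192.168.122.10 (server.3)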
Before debugging the masters, check that the ZooKeeper cluster works correctly, and that a leader is elected. Make sure that /etc/mesos/zk contains the right ZooKeeper connection string on each host, e.g.
zk://192.168.10.1:2181,192.168.122.10:2181,192.168.122.46:2181/mesos
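One way to verify that the ensemble is up and a leader has been elected (assuming nc is installed and ZooKeeper's four-letter-word commands are enabled) is the srvr command:
echo srvr | nc 192.168.10.1 2181 | grep Mode
# repeat for the other two nodes; one should report "Mode: leader", the others "Mode: follower"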
If ZK works, then restart the services and check the masters' logs. Do the same with the slaves.
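On Fedora that would roughly mean (assuming the Mesosphere packages installed systemd units named mesos-master, mesos-slave and marathon):
sudo systemctl restart mesos-master
journalctl -u mesos-master -f
# repeat with mesos-slave (and marathon) on the respective hosts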
References:
https://open.mesosphere.com/reference/mesos-master/
https://open.mesosphere.com/reference/mesos-slave/
https://mesosphere.github.io/marathon/docs/