Is zookeeper reconfig expected to update the zoo.cfg.dynamic file? - apache-zookeeper

I'm setting up a distributed ZooKeeper cluster based on version 3.5.2. Specifically, I'm using the reconfig command to dynamically update the configuration whenever the cluster membership changes (e.g. one of the nodes goes down).
What I observe is that the zoo.cfg.dynamic file is not updated even when the reconfig (add/remove) command executes successfully. Is this the expected behavior? Essentially I'm looking for guidance on whether we should manage the zoo.cfg.dynamic file through a separate script (updating it in lock-step with the reconfig command) or whether we can rely on the reconfig command to do this for us. My preference/expectation is the latter.
Here is a sample command:
reconfig -remove 6 -add server.5=125.23.63.23:1234:1235;1236

From the reconfig documentation:
Dynamic configuration parameters are stored in a separate file on the server (which we call the dynamic configuration file). This file is linked from the static config file using the new dynamicConfigFile keyword.
So I can practically start with any file name to host the ensemble list, and simply point the 'dynamicConfigFile' keyword at that file.
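For illustration, a minimal static zoo.cfg might then contain little more than the usual local settings plus that pointer (the paths here are only examples):
tickTime=2000
dataDir=/var/lib/zookeeper
dynamicConfigFile=/opt/zookeeper/conf/zoo.cfg.dynamic
The server lines, including the client ports after the semicolon, live in the dynamic file itself.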
When the reconfig command is run, a new dynamic-config file (e.g. zoo.cfg.dynamic.00000112) is generated containing the updated list of servers, in a form like the following:
server.1=125.23.63.23:2780:2783:participant;2791
server.2=125.23.63.24:2781:2784:participant;2792
server.3=125.23.63.25:2782:2785:participant;2793
The zoo.cfg file is then automatically updated so that the 'dynamicConfigFile' keyword points to the new file (zoo.cfg.dynamic.00000112). The previous dynamic-config file remains in the config directory but is no longer referenced by the main config.
So overall there is no need to update any file in lock-step with the reconfig command; reconfig takes care of it all. The only remaining overhead to sort out up front is a periodic purge of the old dynamic-config files, for example along the lines of the sketch below.
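A minimal sketch of such a purge in Scala, assuming the dynamic files sit next to zoo.cfg and follow the zoo.cfg.dynamic.<version> naming shown above (the conf directory path is hypothetical):
import java.io.File
import scala.io.Source

// Hypothetical conf directory; adjust to your deployment.
val confDir = new File("/opt/zookeeper/conf")

// Find the dynamic file currently referenced by zoo.cfg; that one must be kept.
val src = Source.fromFile(new File(confDir, "zoo.cfg"))
val keep = try src.getLines().collectFirst {
  case line if line.startsWith("dynamicConfigFile=") =>
    new File(line.split("=", 2)(1).trim).getName
} finally src.close()

// Delete every other versioned zoo.cfg.dynamic.<version> file.
Option(confDir.listFiles()).getOrElse(Array.empty[File])
  .filter(f => f.getName.startsWith("zoo.cfg.dynamic.") && !keep.contains(f.getName))
  .foreach(_.delete())
Running something like this periodically (e.g. from cron) keeps only the dynamic file that zoo.cfg currently points to.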

Reading a file from local file system after reading it from hadoop file system

I am trying to read a file from my local EMR file system. It exists at the path /emr/myFile.csv. However, I keep getting a FileNotFoundException. Here is the line of code that I use to read it:
val myObj: File = new File("/emr/myFile.csv")
I added a file:// prefix to the file path as well because I have seen that work for others, but that still did not work. So I also tried reading directly from the Hadoop file system, where it is stored at /emr/CNSMR_ACCNT_BAL/myFile.csv, because I thought it might be looking in HDFS by default. However, that also results in a FileNotFoundException. Here is the code for that:
val myObj: File = new File("/emr/CNSMR_ACCNT_BAL/myFile.csv")
How can I read this file into a File?
For your first problem:
When you submit a Hadoop job, the application master can be created on any of your worker nodes, including the master node (depending on your configuration).
If you are using EMR, the application master is by default created on one of your worker (CORE) nodes, not on the master.
When you say file:///emr/myFile.csv exists on your local file system (I'm assuming that means the master node), your program looks for that file on the node where the application master is running, and it is evidently not there, otherwise you wouldn't get the error.
For your second problem:
When you try to access a file in HDFS through java.io.File, it won't be able to reach it, because File only knows about the local file system.
You need to use the Hadoop FileSystem API (org.apache.hadoop.fs.FileSystem) to interact with an HDFS file.
Use an HDFS URI, i.e. hdfs://<namenode>:<port>/emr/CNSMR_ACCNT_BAL/myFile.csv.
If your core-site.xml contains a value for fs.defaultFS, you don't need the namenode and port; hdfs:///emr/CNSMR_ACCNT_BAL/myFile.csv is enough.
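As a rough sketch (untested against your cluster), reading the HDFS copy with the FileSystem API from Scala could look like this, using the path from your question:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

// Picks up fs.defaultFS from the core-site.xml on the cluster's classpath.
val conf = new Configuration()
val fs = FileSystem.get(conf)

// Open the file through HDFS rather than through java.io.File.
val in = fs.open(new Path("hdfs:///emr/CNSMR_ACCNT_BAL/myFile.csv"))
val lines = try Source.fromInputStream(in).getLines().toList finally in.close()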
So what's the better option for accessing a file in a Hadoop cluster?
The answer depends on your use case, but in most cases putting the file in HDFS is much better, because you don't have to worry about where your application master is; every node has access to HDFS.
Hope that resolves your problem.

How do I upgrade apache kafka in linux

I have a novice question about upgrading Kafka. This is the first time I'm upgrading Kafka on Linux.
My current version is "kafka_2.11-1.0.0.tgz"; when I initially set it up, it created a folder named kafka_2.11-1.0.0.
Now I have downloaded a new version, "kafka_2.12-2.3.0.tgz". Extracting it will create a new folder, kafka_2.12-2.3.0, which results in two independent Kafka installations, each with its own server.properties.
As per the documentation, I have to update server.properties with the two properties below:
inter.broker.protocol.version=2.3
log.message.format.version=2.3
How does this work if the new version is going to be installed in a new directory with a new server.properties?
How can I merge server.properties and do the upgrade? Please share any documents or steps you have.
It's fairly simple to upgrade Kafka.
It would have been easier if you had separated the config files from the binary directories; as it stands, from what I understand, your config files live inside the untarred package folder.
You can put the config files in /etc/kafka the next time you package it on your Linux server.
What you can do here is: after untarring your kafka_2.12-2.3.0.tgz file, copy the former server.properties (and any other config files you use) over the ones in the 2.3.0 directory tree.
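Assuming the standard tarball layout (config files under config/), that boils down to something like:
tar -xzf kafka_2.12-2.3.0.tgz
cp kafka_2.11-1.0.0/config/server.properties kafka_2.12-2.3.0/config/server.properties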
But be careful with the inter.broker.protocol.version and log.message.format.version parameters: you must first set them to the former version before doing your rolling restart (and changing log.message.format.version is not mandatory; double-check the documentation on that one).
Since you are on 1.0 now, first set:
inter.broker.protocol.version=1.0
log.message.format.version=1.0
then restart your brokers one by one (using the new package folder this time).
Then edit them again to:
inter.broker.protocol.version=2.3
log.message.format.version=2.3
and do a second rolling restart.
Then you should be good.
More details here:
https://kafka.apache.org/documentation/#upgrade_2_3_0
Yannick

Zookeeper dataLogDir config invalid

I built a ZooKeeper cluster and it runs very well. But the log directory I set in zoo.cfg doesn't seem to take effect. Below is my configuration for the log and snapshot directories.
dataDir=/var/lib/zookeeper
dataLogDir=/var/lib/zookeeper/logs
However, the zookeeper.out file is generated in /var/lib/zookeeper rather than in the log subdirectory /var/lib/zookeeper/logs.
I restarted ZooKeeper on every server many times, but it made no difference.
This happens because zookeeper.out is a different kind of log (the application log), not the one controlled by dataLogDir, which concerns the transaction log. From the documentation:
dataLogDir
This option will direct the machine to write the transaction log to
the dataLogDir rather than the dataDir. This allows a dedicated log
device to be used, and helps avoid competition between logging and
snapshots.
By checking zkServer.sh you'll see that zookeeper.out comes from _ZOO_DAEMON_OUT, which depends on ZOO_LOG_DIR, which in turn is given a default by zkEnv.sh. Depending on your environment and ZooKeeper (ZK) version, the zookeeper.out file might land in different places (in some setups even in the working directory from which ZK was started).
For the application log you are better off configuring the log4j.properties file, since ZK uses log4j.
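For what it's worth, a common way to relocate zookeeper.out is to set ZOO_LOG_DIR before starting the server (the path is just an example, and the exact behavior depends on your ZK version and packaging):
export ZOO_LOG_DIR=/var/lib/zookeeper/logs
In many ZK versions zkEnv.sh also sources conf/zookeeper-env.sh if it exists, which is a convenient place to put this.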

"Injecting" configuration files at startup

I have a number of legacy services running which read their configuration files from disk and a separate daemon which updates these files as they change in zookeeper (somewhat similar to confd).
For most of these types of configuration we would love to move to a more environment-variable-like model, where the config is fixed for the lifetime of the pod. However, we need to keep the outside config files as the source of truth while services are transitioning from the legacy model to Kubernetes. I'm curious if there is a clean way to do this in Kubernetes.
A simplified version of the current model that we are pursuing is:
Create a Docker image which has a utility for fetching config files and writing them to disk. Once done, it writes a /donepath/done file.
The main image waits until the done file exists, then allows the normal service startup to proceed.
Use an emptyDir volume and volume mounts to get the config from the helper container into the main one.
I keep seeing instances of this problem where I "just" need to get a couple of files into the Docker image at startup (to allow per-env/canary/etc. variance), and running all of this machinery each time seems like a burden to throw on devs. I'm curious whether there is a simpler way to do this that already exists in Kubernetes, or is on the horizon.
You can use the ADD instruction in your Dockerfile. It is used as ADD <file> /path/in/image. This lets you quickly add a complete file to your image. The file you want to add needs to be in the build context, typically the same directory as the Dockerfile, when you build the image. You can also add a tar file this way, which will be unpacked during the build.
Another option is the ENV instruction in your Dockerfile. This exposes the data as an environment variable.
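As an illustrative sketch only (the file name, path, and variable are made up), the two instructions look like this in a Dockerfile:
# Copy a config file from the build context into the image.
ADD myservice.conf /etc/myservice/myservice.conf
# Bake a value in as an environment variable instead.
ENV CONFIG_ENV=canary
Either way the data is fixed at build time, so per-environment variance means building (or at least tagging) a different image per environment.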

Standalone replica sets mongodb

I need to create a standalone replica set in mongo. I followed the steps here: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
Everything worked as expected, but I was wondering how I could configure this in the mongodb.conf file so I don't have to do these steps manually every time. Is something like this possible via the conf file? I know there is a replSet parameter you can put in the conf file, but I wasn't sure how to specify which ports to use for the different replica sets. Thanks!
Most of the command-line parameters you specify can also be set in the configuration file - you can see how here: http://docs.mongodb.org/manual/reference/configuration-options/
Specifically, notice that you can set port, replSet, and dbpath from the configuration file.
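For example, using the legacy (pre-YAML) config format from those docs, a single-member set could be declared with just the following (values are placeholders):
port = 27017
dbpath = /srv/mongodb/rs0-0
replSet = rs0
Note that the replica set still has to be initiated once with rs.initiate() from the mongo shell; the config file only ensures mongod comes up with the same port, data path, and replica set name on every restart.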
There is also a good article on Replica set configuration here: http://docs.mongodb.org/manual/reference/replica-configuration/