Zookeeper - three nodes and nothing but errors - apache-zookeeper

I have three ZooKeeper nodes. All ports are open and the IP addresses are correct. Below is my config file. All nodes were bootstrapped by Chef, and all have the same install and config file.
# The number of milliseconds of each tick
tickTime=3000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# Place the dataLogDir to a separate physical disc for better performance
# dataLogDir=/disk2/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=111.111.111:2888:3888
server.2=111.111.112:2888:3888
server.3=111.111.113:2888:3888
Here is the error from one of the nodes. I am rather confused about how I could get an error, since the config is rather vanilla. All three nodes are doing the same thing.
2012-07-16 05:16:57,558 - INFO [main:QuorumPeerConfig#90] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
2012-07-16 05:16:57,567 - INFO [main:QuorumPeerConfig#310] - Defaulting to majority quorums
2012-07-16 05:16:57,572 - FATAL [main:QuorumPeerMain#83] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /etc/zookeeper/conf/zoo.cfg
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:110)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:99)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.lang.IllegalArgumentException: serverid replace this text with the cluster-unique zookeeper's instance id (1-255) is not a number
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:333)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:106)
... 2 more

You need to create a file named myid in ZooKeeper's data directory (dataDir), one on each server, consisting of a single line that contains only that machine's id. So the myid file of server 1 would contain the text "1" and nothing else. The id must be unique within the ensemble and should have a value between 1 and 255.
See more at http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup

server.1=111.111.111:2888:3888
server.2=111.111.112:2888:3888
server.3=111.111.113:2888:3888
If those are your servers and IPs, create a myid file on each node under the data directory (dataDir=/var/lib/zookeeper): value 1 on 111.111.111, 2 on 111.111.112 and 3 on 111.111.113.
If you put the value in double quotes ("1") in the myid file, you will get a NumberFormatException, and you will get "Invalid config, exiting abnormally" if the myid file is created with any extension.
Therefore, create the myid file without any extension and put the plain integer 1, 2 or 3 (without double quotes) on the corresponding server.
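Writing the id by hand on each node is easy to get wrong, so the steps above can be scripted. The following is a minimal Python sketch, assuming zoo.cfg uses plain `server.N=host:port:port` lines and that each node knows its own address exactly as it is written there; the helper names `server_ids` and `write_myid` are hypothetical and not part of ZooKeeper. It writes a file named exactly `myid` (no extension) whose only content is the bare number.

```python
import re
from pathlib import Path

# Matches lines such as "server.1=111.111.111:2888:3888" from zoo.cfg.
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*([^:\s]+):\d+:\d+")


def server_ids(zoo_cfg_text):
    """Map each host in the server.N entries to its id N (hypothetical helper)."""
    ids = {}
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            ids[match.group(2)] = int(match.group(1))
    return ids


def write_myid(data_dir, zoo_cfg_text, this_host):
    """Write <data_dir>/myid containing only this host's id: no quotes, no extension."""
    ids = server_ids(zoo_cfg_text)
    if this_host not in ids:
        raise ValueError(f"{this_host} is not listed in any server.N line")
    server_id = ids[this_host]
    if not 1 <= server_id <= 255:
        raise ValueError(f"id {server_id} is outside the range 1-255")
    path = Path(data_dir) / "myid"  # exactly "myid", not "myid.txt"
    path.write_text(f"{server_id}\n")  # bare number, e.g. 2, never "2"
    return path
```

For example, calling `write_myid("/var/lib/zookeeper", open("/etc/zookeeper/conf/zoo.cfg").read(), "111.111.112")` on the second node would create `/var/lib/zookeeper/myid` containing `2`.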

Related

Kafka connector's task status differs when queried against different Kafka Connect nodes in a clustered environment

We have a 3-node Kafka Connect cluster running version 5.5.4 in distributed mode. We are observing a strange issue with the connector's task status.
The REST calls to node 1 and 2 are returning different results.
The first node returned this result:
{
"connector":{
"state":"RUNNING",
"worker_id":"x.com:8083"
},
"name":"connector",
"type":"source",
"tasks":[
]
}
Yes, the tasks list is empty, whereas the other node returned this result:
{
"connector":{
"state":"RUNNING",
"worker_id":"x.com:8083"
},
"name":"connector...",
"type":"source",
"tasks":[
{
"id":0,
"state":"RUNNING",
"worker_id":"x.com:8083"
}
]
}
As mentioned in this doc https://docs.confluent.io/home/connect/userguide.html#kconnect-internal-topics, I have configured group.id, config.storage.topic, offset.storage.topic and status.storage.topic with identical values on all 3 nodes.
I went through the connect-statuses-0 data directory, and the sizes of the log, index and timeindex files are identical on node 1 and node 2. I don't know what the .snapshot file is, but on the first node I see only one, owned by the root user/group, whereas on the second node I see two: one owned by the root user/group and the other owned by our custom-created user. I'm not sure this has anything to do with the problem.
Please help me identify the root cause of this problem. If I need to check any configuration, please let me know.
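One way to narrow this down is to query the same status endpoint on every worker and compare the task lists side by side, instead of spot-checking two nodes. A minimal Python sketch using only the standard library, assuming each worker exposes the Kafka Connect REST API at `GET /connectors/<name>/status`; the worker URLs and the helper names (`fetch_status`, `task_view`, `diverging_workers`) are hypothetical.

```python
import json
import urllib.request


def fetch_status(worker_url, connector):
    """GET <worker_url>/connectors/<connector>/status from one Connect worker."""
    url = f"{worker_url.rstrip('/')}/connectors/{connector}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def task_view(status):
    """Reduce a status document to a sorted list of (task id, state, worker_id)."""
    return sorted(
        (task["id"], task["state"], task["worker_id"])
        for task in status.get("tasks", [])
    )


def diverging_workers(statuses):
    """Given {worker_url: status}, return workers whose task view differs from the first one."""
    items = list(statuses.items())
    if not items:
        return []
    reference = task_view(items[0][1])
    return [worker for worker, status in items[1:] if task_view(status) != reference]


# Example usage (hypothetical worker URLs):
#   workers = ["http://node1:8083", "http://node2:8083", "http://node3:8083"]
#   statuses = {w: fetch_status(w, "connector") for w in workers}
#   print(diverging_workers(statuses))
```

If only one worker keeps reporting an empty task list, that points at that worker's view of the status topic rather than at the connector itself.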

fluentbit writes to /var/log/messages

I'm running fluentbit (td-agent-bit) on a CentOS system in order to ship all logs to a centralized system. Every time fluentbit pushes a record to the remote location, it also adds a record to /var/log/messages, leading to a huge log file.
Jul 21 08:48:53 hostname td-agent-bit: [2020/07/21 08:48:53] [ info] [out_azure] customer_id=XXXXXXXXXXXXXXXXXXXXXXXX, HTTP status=200
Any idea how I can stop a service (td-agent-bit) from writing to /var/log/messages? I couldn't find any configuration parameter (e.g. verbose) in the fluentbit documentation. Thanks!
Your log_level is "info", which includes a lot of pipeline messages. You can lower the log level to "error" only inside the output section of the plugin, e.g.:
[OUTPUT]
name azure
match *
log_level error
Note: you can also lower the global log_level in the main [SERVICE] section.
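The answer relies on log levels acting as a threshold, with a plugin's own log_level taking precedence over the [SERVICE] one. The Python sketch below only models that behaviour to show why an `[ info]` line such as the `HTTP status=200` message is no longer printed at `error`; it is not Fluent Bit code, and the precedence rule in the hypothetical `effective_level` helper is a simplifying assumption.

```python
# Fluent Bit log levels, from least to most verbose ("off" disables logging).
LEVELS = ["off", "error", "warn", "info", "debug", "trace"]


def effective_level(service_level, plugin_level=None):
    """Assumed rule: a plugin's own log_level, when set, overrides the [SERVICE] log_level."""
    return plugin_level if plugin_level is not None else service_level


def is_emitted(message_level, threshold):
    """True if a message at message_level passes a logger configured at threshold."""
    if threshold == "off":
        return False
    return LEVELS.index(message_level) <= LEVELS.index(threshold)


# An "[ info] [out_azure] ... HTTP status=200" line with the output set to "error":
#   is_emitted("info", effective_level("info", "error"))  -> False
```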

rsyslog 5.8 imfile outside /var/log not picking up log files

I would like to pick up logs of different types from various locations other than /var/log and send them to a central location.
Using RH 6.6 and rsyslog 5.8, the configuration works fine when the path is within /var/log. If I use another path, like /opt/appname/log/file.log, the rsyslog client does not pick up the log. I do not see any error or message when running rsyslogd in debug mode.
Example:
Client:
...
$InputFileName /opt/appname/test.log
$InputFileTag APPNAME1
$InputFileStateFile stat-APPNAME1
$InputFileSeverity info
$InputFilePersistStateInterval 200
$InputFileFacility local3 # also tried with other local facilities
$InputRunFileMonitor
...
Server:
...
$template HostAudit, "/opt/logs/%HOSTNAME%/test.log" # tried a different path
$template auditFormat, "%msg%\n"
local3.* ?HostAudit;auditFormat
...
Any recommendations? I appreciate your help!
Bill
I would first try these:
Verify that the state file names are unique
Verify that every $InputFileName points to an existing regular file
Remove some of the files that you want to be monitored from the configuration. It could be that there is a problem with only one of the monitored files. That would make rsyslog ignore the rest of the files.
I had this with "$InputFileStateFile tomcat-log" for each of the individual tomcat logs. Each state file name needs to be unique. For me it worked by changing it to instances of:
"$InputFileStateFile tomcat-manager"
"$InputFileStateFile tomcat-localhost"
etc...
Another option is to just add numbers to the end of the state file name.
"$InputFileStateFile tomcat-log1"
"$InputFileStateFile tomcat-log2"

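The first two checks in the list above can be automated. A minimal Python sketch, assuming a legacy-format config where each directive sits on its own line and `#` starts a comment; `check_imfile_config` is a hypothetical helper, not an rsyslog tool. It flags `$InputFileName` paths that are not existing regular files and `$InputFileStateFile` names that are used more than once.

```python
import os
from collections import Counter


def check_imfile_config(conf_text):
    """Return a list of problems found in legacy-style imfile directives."""
    problems = []
    state_files = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)
        directive = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if directive == "$InputFileName" and not os.path.isfile(value):
            problems.append(f"line {lineno}: {value} is not an existing regular file")
        elif directive == "$InputFileStateFile":
            state_files.append(value)
    # Every monitored file needs its own state file name.
    for name, count in Counter(state_files).items():
        if count > 1:
            problems.append(f"state file name {name!r} is used {count} times")
    return problems
```

Running it against the client config that contains the imfile block, e.g. `print(check_imfile_config(open(path).read()))` with `path` pointing at that file, prints an empty list when both checks pass.
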
mesos-master crash with zookeeper cluster

I am deploying a 3-node ZooKeeper cluster, which I use to make my Mesos masters highly available. I downloaded the zookeeper-3.4.6.tar.gz tarball, uncompressed it to /opt, renamed it to /opt/zookeeper, entered the directory, edited conf/zoo.cfg (pasted below), created a myid file in dataDir (which is set to /var/lib/zookeeper in zoo.cfg), and started ZooKeeper using ./bin/zkServer.sh start, and it went well. I started all 3 nodes one by one and they all seem fine. I can connect to the server with ./bin/zkCli.sh with no problem.
But when I start Mesos (3 masters and 3 slaves; each node runs one master and one slave), the masters soon crash one by one, and on the web page http://mesos_master:5050 the Slaves tab displays no slaves. When I run only one ZooKeeper node, everything is fine, so I think the problem is in the ZooKeeper cluster.
I have 3 PV hosts on my Ubuntu server, all running Ubuntu 14.04 LTS:
node-01, node-02, node-03,
I have /etc/hosts in all three nodes like this:
172.16.2.70 node-01
172.16.2.81 node-02
172.16.2.80 node-03
I installed ZooKeeper and Mesos on all three nodes. The ZooKeeper configuration file looks like this (on all three nodes):
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node-01:2888:3888
server.2=node-02:2888:3888
server.3=node-03:2888:3888
They start normally and run well. Then I start the mesos-master service with the command line ./bin/mesos-master.sh --zk=zk://172.16.2.70:2181,172.16.2.81:2181,172.16.2.80:2181/mesos --work_dir=/var/lib/mesos --quorum=2, and after a few seconds it gives me errors like this:
F0817 15:09:19.995256 2250 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
# 0x7fa2b8be71a2 google::LogMessage::Fail()
# 0x7fa2b8be70ee google::LogMessage::SendToLog()
# 0x7fa2b8be6af0 google::LogMessage::Flush()
# 0x7fa2b8be9a04 google::LogMessageFatal::~LogMessageFatal()
# 0x7fa2b81a899a mesos::internal::master::fail()
# 0x7fa2b8262f8f _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
# 0x7fa2b823fba7 _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
# 0x7fa2b820f9f3 _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
# 0x7fa2b826305c _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x499480 process::Future<>::fail()
# 0x7fa2b806b4b4 process::Promise<>::fail()
# 0x7fa2b826011b process::internal::thenf<>()
# 0x7fa2b82a0757 _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b82962d9 std::_Bind<>::operator()<>()
# 0x7fa2b827ee89 std::_Function_handler<>::_M_invoke()
I0817 15:09:20.098639 2248 http.cpp:283] HTTP GET for /master/state.json from 172.16.2.84:54542 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36'
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b827efaf _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
# 0x7fa2b82a07fe _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b82e4419 process::internal::run<>()
# 0x7fa2b82da22a process::Future<>::fail()
# 0x7fa2b83136b5 std::_Mem_fn<>::operator()<>()
# 0x7fa2b830efdf _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b8307d7f _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
# 0x7fa2b82fe431 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
# 0x7fa2b830f065 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x7fa2b82da202 process::Future<>::fail()
# 0x7fa2b82d2d82 process::Promise<>::fail()
Aborted
Sometimes the warning looks like this instead, and then it crashes with the same output as above:
0817 15:09:49.745750 2104 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
I want to know whether ZooKeeper is deployed and running correctly in my case, and how I can locate the problem. Any answers and suggestions are welcome. Thanks.
Actually, in my case it was because I hadn't opened firewall port 5050 to allow the three servers to communicate with each other. After updating the firewall rule, it started working as expected.
I ran into the same issue. I tried different approaches and options, and finally the --ip option worked for me. Initially I had used the --hostname option.
mesos-master --ip=192.168.0.13 --quorum=2 --zk=zk://m1:2181,m2:2181,m3:2181/mesos --work_dir=/opt/mm1 --log_dir=/opt/mm1/logs
You need to check that all mesos/zookeeper master nodes can communicate correctly. For that, you need:
Zookeeper ports open: TCP 2181, 2888, 3888
Mesos port open: TCP 5050
ping available (ICMP echo reply and echo request, types 0 and 8)
If you use FQDN instead of IP in your config, check that the DNS resolution is working correctly as well.
Give each Mesos master its own work_dir; do not use a shared work_dir for all masters, because of ZooKeeper.
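The connectivity checklist above can be tested from each node with a short script. A minimal Python sketch using only the standard library; the helper names are hypothetical. `port_open` attempts a plain TCP connect to each port listed in the answer, and `zk_mode` sends ZooKeeper's `srvr` four-letter command to the client port (assuming that command is enabled on the servers) and reads back the `Mode:` line, which in a healthy 3-node ensemble should show one leader and two followers. A closed 2888 on a follower is expected, because only the current leader listens on the quorum port.

```python
import socket

# Ports from the answer above: ZooKeeper client/quorum/election ports and the Mesos master port.
PORTS = {2181: "zk client", 2888: "zk quorum", 3888: "zk election", 5050: "mesos master"}


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def parse_mode(srvr_output):
    """Extract the 'Mode:' value (leader/follower/standalone) from a srvr reply."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def zk_mode(host, port=2181, timeout=3.0):
    """Send 'srvr' and return the node's mode; raises OSError if the node is unreachable."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def report(hosts):
    """Print the open/closed state of every port in PORTS for each host."""
    for host in hosts:
        for port, label in PORTS.items():
            state = "open" if port_open(host, port) else "CLOSED"
            print(f"{host}:{port} ({label}) {state}")


# Example usage with the hosts from the question:
#   report(["node-01", "node-02", "node-03"])
#   print(zk_mode("node-01"))
```

Running the same script on every node matters, since a firewall can allow traffic in one direction but not the other.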

Writing from PIG to MongoDB - error 2116 - mongodb schema not found

I am using Hadoop on Windows Server 2008 - Hortonworks distribution
We are using Pig and trying to write data into MongoDB, but I am not able to read from or write to MongoDB. I'm not sure what the issue is; we get error 2116, which states that the MongoDB schema is empty.
Commands to read and write:
register 'D:\hdp\pig-0.12.1.2.1.1.0-1621\lib\mongo-hadoop-core-1.2.0.jar'
register 'D:\mongo-hadoop-2.2-1.2.0\mongo-hadoop-2.2-1.2.0\mongo-hadoop-1.2.0.jar'
register 'D:\mongo-hadoop-2.2-1.2.0\mongo-hadoop-2.2-1.2.0\mongo-hadoop-pig-1.2.0.jar'
register 'D:\hdp\hadoop-2.4.0.2.1.1.0-1621\lib\mongo-2.6.1.jar'
set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;
SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;
SalesLoading = load 'mongodb://localhost/benvenuedb.SalesData' using com.mongodb.hadoop.pig.MongoLoader();
store SalesLoading into 'mongodb://localhost:27017/benvenuedb.SalesData1' using com.mongodb.hadoop.pig.MongoStorage();
Error Messages
Pig Stack Trace
---------------
ERROR 2116:
<line 5, column 0> Output Location Validation Failed for: 'mongodb://127.0.0.1:27017/benvenuedb.SalesData More info to follow:
The value of property mongo.pig.output.schema must not be null
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias salesLoading
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1637)
at org.apache.pig.PigServer.registerQuery(PigServer.java:577)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:541)
at org.apache.pig.Main.main(Main.java:156)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2116:
<line 5, column 0> Output Location Validation Failed for: 'mongodb://127.0.0.1:27017/benvenuedb.SalesData More info to follow:
The value of property mongo.pig.output.schema must not be null
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:303)
at org.apache.pig.PigServer.compilePp(PigServer.java:1382)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
at org.apache.pig.PigServer.execute(PigServer.java:1299)
at org.apache.pig.PigServer.access$400(PigServer.java:124)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1632)
... 8 more
Caused by: java.lang.IllegalArgumentException: The value of property mongo.pig.output.schema must not be null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:971)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:953)
at com.mongodb.hadoop.pig.MongoStorage.setStoreLocation(MongoStorage.java:249)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:68)
... 20 more
I ran netstat -an to see the open ports.
The local address is 10.69.148.89; I do not see port 27017 open on this IP, but 127.0.0.1 has 27017 open. There must be something simple we are overlooking.
We need some help; we have spent over 2 days with no resolution.
Have you tried setting the property it says is missing?
The value of property mongo.pig.output.schema must not be null
There are some issues with writing to MongoDB from Pig, especially when you use the Hortonworks Windows distribution. I have broken this into three steps:
Write to HDFS as a JSON file using JsonStorage();
Move the HDFS file to the Windows filesystem;
Load the JSON file into MongoDB.
I am open to other approaches if anyone has done this a different way.
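Steps 2 and 3 can be glued together by merging the `part-*` files that `JsonStorage()` writes into one newline-delimited JSON file. A minimal Python sketch, assuming the Pig output directory has already been copied from HDFS to the local disk (for example with `hdfs dfs -get`); `merge_part_files` is a hypothetical helper. It only reads `part-*` files, so any metadata files in the directory are ignored, and it checks that every non-empty line is a JSON object.

```python
import json
from pathlib import Path


def merge_part_files(part_dir, out_file):
    """Concatenate part-* files into one newline-delimited JSON file.

    Each non-empty line must parse as a JSON object; returns the number of documents written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for part in sorted(Path(part_dir).glob("part-*")):
            with open(part, encoding="utf-8") as f:
                for lineno, line in enumerate(f, start=1):
                    line = line.strip()
                    if not line:
                        continue
                    doc = json.loads(line)  # raises ValueError on malformed JSON
                    if not isinstance(doc, dict):
                        raise ValueError(f"{part.name}:{lineno} is not a JSON object")
                    out.write(json.dumps(doc) + "\n")
                    count += 1
    return count
```

The resulting file is in the one-document-per-line format that `mongoimport` reads by default, e.g. `mongoimport --db benvenuedb --collection SalesData1 --file sales.json`.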