I am using Kubernetes, with a Hadoop cluster running in the namespace 'platform'.
I have an example application running in the namespace 'example'.
The example application needs to talk to HBase. When it does so, we see the following error:
java.net.UnknownHostException: hregion-0.hregion
at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:233)
at org.apache.hadoop.hbase.client.ConnectionImplementation.getClient(ConnectionImplementation.java:1192)
at org.apache.hadoop.hbase.client.ClientServiceCallable.setStubByServiceName(ClientServiceCallable.java:44)
at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:229)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:386)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:360)
at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1078)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:403)
at org.apache.hadoop.hbase.client.HBaseAdmin$6.rpcCall(HBaseAdmin.java:445)
at org.apache.hadoop.hbase.client.HBaseAdmin$6.rpcCall(HBaseAdmin.java:442)
at org.apache.hadoop.hbase.client.RpcRetryingCallable.call(RpcRetryingCallable.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3084)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3076)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:442)
The command
> nslookup hregion-0.hregion
fails on the client machine, because the hregion service lives in the 'platform' namespace; the same command succeeds from a machine in that namespace.
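As a sanity check, the namespace-qualified name does resolve from the 'example' namespace even though the short name does not (the FQDN here follows the standard Kubernetes naming scheme and matches the pod's /etc/hosts entry shown further down):
> nslookup hregion-0.hregion.platform.svc.cluster.local
This returns the pod's IP (10.1.14.53 in our case), which is what pointed us at the short-name registration as the problem.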
We suspected that the HBase region server had registered itself with ZooKeeper under an incomplete name, and verified this by connecting to the ZooKeeper server:
[zk: localhost:2181(CONNECTED) 8] ls /hbase/rs
[hregion-0.hregion,16020,1560851357442]
The ConnectionUtils.getStubKey method simply calls java.net.InetAddress.getByName(hostname), and it is this call that fails.
Here is some ZooKeeper debugging info (this is from the HBase master):
hbase(main):001:0> zk_dump
HBase is rooted at /hbase
Active master address: hmaster-0.hmaster.platform.svc.cluster.local,16000,1560851357485
Backup master addresses:
Region server holding hbase:meta: hregion-0.hregion,16020,1560851357442
Region servers:
hregion-0.hregion,16020,1560851357442
On the hregion-0 server, we have the following entries in /etc/hosts:
# Kubernetes-managed hosts file.
127.0.0.1 localhost
10.1.14.53 hregion-0.hregion.platform.svc.cluster.local hregion-0
And the /etc/resolv.conf file looks like this:
nameserver 10.96.0.10
search platform.svc.cluster.local svc.cluster.local cluster.local mycompany.com
options ndots:5
How do I fix this? I assume I need to tell HBase to register its nodes in ZooKeeper using their fully qualified domain names, but how?
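For what it's worth, the knob that usually governs this is the hbase.regionserver.hostname property (the master already registers its FQDN, as the zk_dump above shows). A minimal hbase-site.xml sketch, assuming that property is honoured by the HBase version in use, with the FQDN taken from the pod's /etc/hosts entry:
<property>
  <name>hbase.regionserver.hostname</name>
  <!-- assumed value: each pod needs its own FQDN, e.g. injected via an env var -->
  <value>hregion-0.hregion.platform.svc.cluster.local</value>
</property>
In a StatefulSet this value has to differ per pod, so it would normally be templated from the pod name rather than hard-coded as shown.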
I am using Storm v1.1.0 and setting up a Storm cluster across several machines; let's say I have 5 machines:
machine1: Zookeeper
machine2: Nimbus
machine3: Supervisor1
machine4: Supervisor2
machine5: UI
And the configuration of each machine is the following:
machine1
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
machine2
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
machines(3-4)
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
nimbus.seeds: ["NimbusIP"]
machine5
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
nimbus.seeds: ["NimbusIP"]
ui.port: 8080
Everything is running okay with no errors; I also checked the logs and found nothing wrong, and all of the daemons start and run fine.
The problem is that the UI isn't showing anything and gives an error in the console.
machine5
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
nimbus.seeds: ["NimbusIP"]
ui.port: 8080
This configuration should be present on machines 2-5. From your description,
machine 2 is missing nimbus.seeds and ui.port, and machines 3-4 are missing ui.port.
Try keeping the same configuration throughout your Storm cluster.
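Once the configs are aligned, a quick way to confirm that a node can actually reach Nimbus is to run a client command from it; a sketch, assuming you run it from the Storm installation directory:
# run on any node that has the unified storm.yaml; it fails quickly if
# storm.zookeeper.servers or nimbus.seeds is wrong
bin/storm list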
I am using CentOS 6 and ZooKeeper 3.4.6 as a pseudo-distributed cluster. I get a connection refused error: Cannot open channel to 3 at election address /192.168.10.129:3883
The current config is as follows:
zookeep1/data/myid:1
zookeep2/data/myid:2
zookeep3/data/myid:3
zookeep1/zoo.cfg:
dataDir=/home/lwj/solrcloud/zookeep1/data/
clientPort=2181
server.1=127.0.0.1:2881:3881
server.2=127.0.0.1:2882:3882
server.3=127.0.0.1:2883:3883
zookeep2/zoo.cfg:
dataDir=/home/lwj/solrcloud/zookeep2/data/
clientPort=2182
server.1=127.0.0.1:2881:3881
server.2=127.0.0.1:2882:3882
server.3=127.0.0.1:2883:3883
zookeep3/zoo.cfg:
dataDir=/home/lwj/solrcloud/zookeep3/data/
clientPort=2183
server.1=127.0.0.1:2881:3881
server.2=127.0.0.1:2882:3882
server.3=127.0.0.1:2883:3883
That is the error I am getting.
It looks like you are running 3 instances of ZooKeeper on the same host. Check the state of port 3882 on your host: netstat -a | grep 3882
Clear the iptables rules on the host to free the ports (e.g. iptables -F).
Also, it's good practice to use the IP 192.168.10.129 rather than localhost (127.0.0.1) in the configuration.
For ZooKeeper port details, see:
what is zookeeper port and its usage?
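A sketch of what the first instance's zoo.cfg could look like after switching to the host IP (the IP is taken from the error message; the timing settings are the stock sample values and may already be present in your files; the other two instances differ only in dataDir and clientPort):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/lwj/solrcloud/zookeep1/data/
clientPort=2181
server.1=192.168.10.129:2881:3881
server.2=192.168.10.129:2882:3882
server.3=192.168.10.129:2883:3883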
I am deploying a ZooKeeper cluster with 3 nodes. I use it to keep my Mesos masters highly available. I downloaded the zookeeper-3.4.6.tar.gz tarball, uncompressed it to /opt, renamed it to /opt/zookeeper, entered the directory, edited conf/zoo.cfg (pasted below), created a myid file in dataDir (which is set to /var/lib/zookeeper in zoo.cfg), and started ZooKeeper with ./bin/zkServer.sh start, and it went well. I started all 3 nodes one by one and they all seemed fine. I used ./bin/zkCli.sh to connect to the server with no problem.
But when I start Mesos (3 masters and 3 slaves; each node runs a master and a slave), the masters soon crash, one by one, and on the web page http://mesos_master:5050 the slaves tab shows no slaves. When I run only one ZooKeeper, everything is fine, so I think it is the ZooKeeper cluster's problem.
I have 3 PV hosts on my Ubuntu server, all running Ubuntu 14.04 LTS:
node-01, node-02, node-03
I have the following /etc/hosts on all three nodes:
172.16.2.70 node-01
172.16.2.81 node-02
172.16.2.80 node-03
I installed ZooKeeper and Mesos on all three nodes. The ZooKeeper configuration file looks like this (on all three nodes):
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node-01:2888:3888
server.2=node-02:2888:3888
server.3=node-03:2888:3888
They start normally and run well. Then I start the mesos-master service with the command line ./bin/mesos-master.sh --zk=zk://172.16.2.70:2181,172.16.2.81:2181,172.16.2.80:2181/mesos --work_dir=/var/lib/mesos --quorum=2, and after a few seconds it gives me errors like this:
F0817 15:09:19.995256 2250 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
# 0x7fa2b8be71a2 google::LogMessage::Fail()
# 0x7fa2b8be70ee google::LogMessage::SendToLog()
# 0x7fa2b8be6af0 google::LogMessage::Flush()
# 0x7fa2b8be9a04 google::LogMessageFatal::~LogMessageFatal()
# 0x7fa2b81a899a mesos::internal::master::fail()
# 0x7fa2b8262f8f _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
# 0x7fa2b823fba7 _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
# 0x7fa2b820f9f3 _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
# 0x7fa2b826305c _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x499480 process::Future<>::fail()
# 0x7fa2b806b4b4 process::Promise<>::fail()
# 0x7fa2b826011b process::internal::thenf<>()
# 0x7fa2b82a0757 _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b82962d9 std::_Bind<>::operator()<>()
# 0x7fa2b827ee89 std::_Function_handler<>::_M_invoke()
I0817 15:09:20.098639 2248 http.cpp:283] HTTP GET for /master/state.json from 172.16.2.84:54542 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36'
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b827efaf _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
# 0x7fa2b82a07fe _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b82e4419 process::internal::run<>()
# 0x7fa2b82da22a process::Future<>::fail()
# 0x7fa2b83136b5 std::_Mem_fn<>::operator()<>()
# 0x7fa2b830efdf _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b8307d7f _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
# 0x7fa2b82fe431 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
# 0x7fa2b830f065 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x7fa2b82da202 process::Future<>::fail()
# 0x7fa2b82d2d82 process::Promise<>::fail()
Aborted
Sometimes the warning looks like this instead, and then it crashes with the same output as above:
0817 15:09:49.745750 2104 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
I want to know whether ZooKeeper is deployed and running correctly in my case, and how I can locate where the problem is. Any answers and suggestions are welcome. Thanks.
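One quick way to check whether the ensemble itself is healthy (a sketch, using the install path above) is to ask each server for its role; in a healthy 3-node ensemble one node reports leader and the other two report follower:
# run on each of node-01, node-02, node-03
/opt/zookeeper/bin/zkServer.sh status
# or use the four-letter-word interface (works in 3.4.6)
echo stat | nc node-01 2181 | grep Mode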
Actually, in my case, it was because I didn't open firewall port 5050 to allow the three servers to communicate with each other. After updating the firewall rule, it started working as expected.
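For reference, a sketch of the kind of rule that was missing (assuming plain iptables is the firewall in use; adapt to whatever front end manages it):
# allow the Mesos master port (5050) between the three servers
iptables -I INPUT -p tcp --dport 5050 -j ACCEPT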
I ran into the same issue. I tried different ways and different options, and finally the --ip option worked for me; initially I had used the --hostname option:
mesos-master --ip=192.168.0.13 --quorum=2 --zk=zk://m1:2181,m2:2181,m3:2181/mesos --work_dir=/opt/mm1 --log_dir=/opt/mm1/logs
You need to check that all Mesos/ZooKeeper master nodes can communicate correctly. For that, you need the following (a firewall sketch follows this list):
ZooKeeper ports open: TCP 2181, 2888, 3888
Mesos port open: TCP 5050
ping available (ICMP messages 0 and 8)
If you use FQDNs instead of IPs in your config, check that DNS resolution is working correctly as well.
Give each Mesos master its own work_dir; do not use a shared work_dir for all masters, because of ZK.
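A sketch of opening those ports with ufw on Ubuntu 14.04 (as used in the question); whether ufw is actually the firewall in play here is an assumption:
# run on each master node
ufw allow 2181/tcp   # ZooKeeper client port
ufw allow 2888/tcp   # ZooKeeper peer port
ufw allow 3888/tcp   # ZooKeeper leader election port
ufw allow 5050/tcp   # Mesos master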
Kafka 0.8
I am following the quickstart guide, and when I get to Step 2 and run bin/kafka-server-start.sh config/server.properties, I see the following exception:
[2013-08-06 09:55:14,603] INFO 0 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2013-08-06 09:55:14,657] ERROR Error while electing or becoming leader on broker 0 (kafka.server.ZookeeperLeaderElector)
java.net.SocketException: invalid argument
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
at kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:84)
at kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:35)
at kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:35)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
at kafka.controller.ControllerChannelManager.<init>(ControllerChannelManager.scala:35)
at kafka.controller.KafkaController.startChannelManager(KafkaController.scala:503)
at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:467)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:215)
at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:89)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:106)
at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
What could I be doing wrong? Please advise.
It is likely to be either a problem with name resolution, or with leftover settings from 0.7.
If you are migrating from 0.7, see the migration guide.
If you're starting fresh, ensure that there is an accurate entry in /etc/hosts for your hostname.
e.g. given an /etc/hostname file containing
yourhostname
and an interface (/sbin/ifconfig) listening on an example IP of 10.181.11.14,
/etc/hosts should correctly map that name to the listening interface:
10.181.11.14 yourhostname.yourdomain.com yourhostname someotheralias
You can test it by telnetting to the Kafka port and ensuring that the connection does not time out:
telnet yourhostname.yourdomain.com 9092
Trying 10.181.11.14...
Connected to yourhostname.yourdomain.com.
Escape character is '^]'.
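If editing /etc/hosts is not convenient, Kafka 0.8's server.properties also has a host.name setting that controls the hostname the broker binds to and publishes to ZooKeeper; a sketch, with the value being only an example:
# config/server.properties
host.name=yourhostname.yourdomain.com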