I followed the instructions to bootstrap a new Ceph cluster (I'm new to Ceph) and got the following error:
sudo cephadm bootstrap --mon-ip <mon-ip>
INFO:cephadm:Verifying podman|docker is present...
INFO:cephadm:Verifying lvm2 is present...
INFO:cephadm:Verifying time synchronization is in place...
INFO:cephadm:Unit systemd-timesyncd.service is enabled and running
INFO:cephadm:Repeating the final host check...
INFO:cephadm:podman|docker (/usr/bin/podman) is present
INFO:cephadm:systemctl is present
INFO:cephadm:lvcreate is present
INFO:cephadm:Unit systemd-timesyncd.service is enabled and running
INFO:cephadm:Host looks OK
INFO:root:Cluster fsid: e08484be-72c1-11ea-a13e-0050563f093a
INFO:cephadm:Verifying IP *<mon-ip>* port 3300 ...
INFO:cephadm:Verifying IP *<mon-ip>* port 6789 ...
ERROR: Failed to infer CIDR network for mon ip *<mon-ip>*; pass --skip-mon-network to configure it later
What does it mean, and how do I fix it?
cephadm is still fairly new. I tracked this issue a few days ago in:
https://tracker.ceph.com/issues/44828
Please run
ceph config set mon public_network <mon_network>
after the bootstrap has finished.
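Putting the error message and this answer together, a minimal sketch of the workaround (the CIDR value is only an example; use the network your monitor IP actually lives in):
sudo cephadm bootstrap --mon-ip <mon-ip> --skip-mon-network
ceph config set mon public_network <mon_network>   # e.g. 192.168.1.0/24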
Is this the exact command you ran?
sudo cephadm bootstrap --mon-ip *<mon-ip>*
If so, you actually need to replace *<mon-ip>* with the actual IP address that you want the monitor daemon to listen on.
For future reference, on that page, any command with a variable surrounded by asterisks is something you need to replace with an address/host/hostname etc. that applies to your environment.
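For example (the address below is purely illustrative; substitute the real IP of the host you are bootstrapping):
sudo cephadm bootstrap --mon-ip 192.168.1.100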
Related
sudo kubeadm init
I0609 02:20:26.963781 3600 version.go:252] remote version is much newer: v1.21.1; falling back to: stable-1.18
W0609 02:20:27.069495 3600 configset.go:202]
WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.19
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Hi and welcome to Stack Overflow.
"Port in use" means that there's a process running that uses that port. So you need to stop that process. Since you already ran kubeadm init once, it must have already changed a number of things.
First run kubeadm reset to undo all of the changes from the first time you ran it.
Then run systemctl restart kubelet.
Finally, when you run kubeadm init you should no longer get the error.
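In other words, the sequence looks roughly like this (run with root privileges):
sudo kubeadm reset                 # undo everything the earlier kubeadm init set up
sudo systemctl restart kubelet
sudo kubeadm init                  # the preflight checks should now pass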
If, even after following the above steps, you still get this error:
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Then, remove the etcd folder (/var/lib/etcd) before you run kubeadm init.
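For example, something along these lines (careful: this deletes the local etcd data):
sudo rm -rf /var/lib/etcd          # only on a node you intend to re-initialize from scratch
sudo kubeadm init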
Note:
This solution worked for other users.
The warning itself is not an issue; it just indicates that kubeadm no longer validates the KubeletConfiguration and KubeProxyConfiguration that it feeds to the kubelet and kube-proxy components.
I also got this issue; additionally, I had to kill the kubelet manually, using
$ pkill kubelet
kubeadm init worked without issues after this.
I am building a master locally, using /kubernetes/docs/getting-started-guides/docker-multinode/master.sh.
The master is done! Some detailed information:
K8S_VERSION is set to: 1.2.4
ETCD_VERSION is set to: 2.2.1
FLANNEL_VERSION is set to: 0.5.5
FLANNEL_IFACE is set to: wlan0
FLANNEL_IPMASQ is set to: true
MASTER_IP is set to: 192.168.1.104
ARCH is set to: amd64
Detecting your OS distro ...
Starting bootstrap docker ...
Starting k8s ...
4fa730ca61727c165345885639cbef4d5a34b5d0fa3a361c13b0abc3ead0dd58
{ "Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}
DOCKER_OPTS="$DOCKER_OPTS --mtu=1450 --bip=10.1.48.1/24"
Next I add a Vagrant VirtualBox slave to the master.
Trying to use /kubernetes/docs/getting-started-guides/docker-multinode/worker.sh brings up an issue:
K8S_VERSION is set to: 1.2.4
FLANNEL_VERSION is set to: 0.5.5
FLANNEL_IFACE is set to: wlan0
FLANNEL_IPMASQ is set to: true
MASTER_IP is set to: 192.168.1.104
ARCH is set to: amd64
Detecting your OS distro ...
Starting bootstrap docker ...
Starting k8s ...
Error response from daemon: lstat /var/lib/docker-bootstrap/aufs/mnt/8881a46cdd1fe51b7f0f86e8659bc8f9c905089b0d86a273e85d134c637041c1/tmp/flannel/subnet.env: no such file or directory
I have two VMs:
one is on public_network 192.168.1.110,
the other is on private_network 192.168.50.2.
Both give the same error output.
PS: can I use worker.sh on AWS EC2?
The question is a bit miscellaneous; I hope someone can help me sort it out.
I was lucky enough to get past the worker node, but I don't know the exact reason.
I have now packaged the Vagrant box on my Google Drive; the URL is https://drive.google.com/file/d/0B9WzutmgDDrodkpLRGM2c0J1RXc/view?usp=sharing
Before you bring it up on a public or private network, you should download a pre-compiled Kubernetes release and copy it into the Vagrant folder.
Use vagrant init kube-slave to start.
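A rough sketch of that workflow, assuming the box from the link above was saved as kube-slave.box (the file name is only illustrative):
vagrant box add kube-slave kube-slave.box   # register the downloaded box
vagrant init kube-slave                     # generate a Vagrantfile for it
vagrant up                                  # boot the slave VM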
I am trying to configure a two-node Cassandra cluster on Windows Server 2008 R2.
I installed the Cassandra Community edition on both servers (10.xxx.0.1, 10.xxx.0.2).
I then stopped the service and edited the configuration file (cassandra.yaml) in the conf folder.
The changes are:
cluster_name
commented the num_tokens
gave the tokens in initial_token,
seeds as 10.xxx.0.1,10.xxx.0.2,
listen_addresses are their respective ip addresses which are 10.xxx.0.1,10.xxx.0.2,
rpc_addresses as 0.0.0.0,
endpoint_snitch as GossipingPropertyFileSnitch (the gossiping snitch)
I also changed the cassandra-rackdc.properties file to dc=DC1, rack=RAC1.
I then saved the file, started the service again, and opened cqlsh, but it is not connecting. Below is the error:
2015-10-12 16:20:13 Commons Daemon procrun stderr initialized
If rpc_address is set to a wildcard address (0.0.0.0), then you must set broadcast_rpc_address to a value other than 0.0.0.0
Fatal configuration error; unable to start. See log for stacktrace.
..
ERROR 21:20:14 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: If rpc_address is set to a wildcard address (0.0.0.0), then you must set broadcast_rpc_address to a value other than 0.0.0.0
at org.apache.cassandra.config.DatabaseDescriptor.applyAddressConfig(DatabaseDescriptor.java:285) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:443) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:136) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:168) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.10.jar:2.1.10]
If you put 0.0.0.0 in rpc_address, you have to set broadcast_rpc_address as well, as described in http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html. I think the right broadcast_rpc_address is the node's own IP address.
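In cassandra.yaml on each node that would look roughly like this (use each node's own address, e.g. 10.xxx.0.2 on the second server):
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.xxx.0.1    # this node's own IP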
I am deploying a ZooKeeper cluster which has 3 nodes. I use it to keep my Mesos masters highly available. I download the zookeeper-3.4.6.tar.gz tarball, uncompress it to /opt, rename it to /opt/zookeeper, enter the directory, edit conf/zoo.cfg (pasted below), create a myid file in dataDir (which is set to /var/lib/zookeeper in zoo.cfg), and start ZooKeeper using ./bin/zkServer.sh start, and it goes well. I start all 3 nodes one by one and they all seem fine. I use ./bin/zkCli.sh to connect to the server, no problem.
But when I start Mesos (3 masters and 3 slaves, each node runs a master and a slave), the masters soon crash one by one, and on the web page http://mesos_master:5050, Slaves tab, no slaves are displayed. But when I run only one ZooKeeper, these are all fine, so I think it's the ZooKeeper cluster's problem.
I have 3 PV hosts on my Ubuntu server, all running Ubuntu 14.04 LTS:
node-01, node-02, node-03
I have /etc/hosts in all three nodes like this:
172.16.2.70 node-01
172.16.2.81 node-02
172.16.2.80 node-03
I installed ZooKeeper and Mesos on all three nodes. The ZooKeeper configuration file looks like this (on all three nodes):
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node-01:2888:3888
server.2=node-02:2888:3888
server.3=node-03:2888:3888
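For reference, each node's myid file in dataDir must contain the number from its server.N entry above; a minimal sketch, given the dataDir setting shown:
echo 1 > /var/lib/zookeeper/myid    # on node-01; use 2 on node-02 and 3 on node-03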
They can be started normally and run well. Then I start the mesos-master service using the command line ./bin/mesos-master.sh --zk=zk://172.16.2.70:2181,172.16.2.81:2181,172.16.2.80:2181/mesos --work_dir=/var/lib/mesos --quorum=2, and after a few seconds it gives me errors like this:
F0817 15:09:19.995256 2250 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
# 0x7fa2b8be71a2 google::LogMessage::Fail()
# 0x7fa2b8be70ee google::LogMessage::SendToLog()
# 0x7fa2b8be6af0 google::LogMessage::Flush()
# 0x7fa2b8be9a04 google::LogMessageFatal::~LogMessageFatal()
# 0x7fa2b81a899a mesos::internal::master::fail()
# 0x7fa2b8262f8f _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
# 0x7fa2b823fba7 _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
# 0x7fa2b820f9f3 _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
# 0x7fa2b826305c _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x499480 process::Future<>::fail()
# 0x7fa2b806b4b4 process::Promise<>::fail()
# 0x7fa2b826011b process::internal::thenf<>()
# 0x7fa2b82a0757 _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b82962d9 std::_Bind<>::operator()<>()
# 0x7fa2b827ee89 std::_Function_handler<>::_M_invoke()
I0817 15:09:20.098639 2248 http.cpp:283] HTTP GET for /master/state.json from 172.16.2.84:54542 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36'
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b827efaf _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
# 0x7fa2b82a07fe _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
# 0x7fa2b8296507 std::function<>::operator()()
# 0x7fa2b82e4419 process::internal::run<>()
# 0x7fa2b82da22a process::Future<>::fail()
# 0x7fa2b83136b5 std::_Mem_fn<>::operator()<>()
# 0x7fa2b830efdf _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
# 0x7fa2b8307d7f _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
# 0x7fa2b82fe431 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
# 0x7fa2b830f065 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
# 0x4a44e7 std::function<>::operator()()
# 0x49f3a7 _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
# 0x7fa2b82da202 process::Future<>::fail()
# 0x7fa2b82d2d82 process::Promise<>::fail()
Aborted
Sometimes the warning looks like this, and then it crashes with the same output as above:
0817 15:09:49.745750 2104 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
I want to know whether ZooKeeper is deployed and running well in my case, and how I can locate where the problem is. Any answers and suggestions are welcome. Thanks.
Actually, in my case it was because I didn't open firewall port 5050 to allow the three servers to communicate with each other. After updating the firewall rules, it started to work as expected.
I ran into the same issue. I tried different ways and different options, and finally the --ip option worked for me. Initially I had used the --hostname option:
mesos-master --ip=192.168.0.13 --quorum=2 --zk=zk://m1:2181,m2:2181,m3:2181/mesos --work_dir=/opt/mm1 --log_dir=/opt/mm1/logs
You need to check that all Mesos/ZooKeeper master nodes can communicate correctly (see the firewall sketch after this list). For that, you need:
ZooKeeper ports open: TCP 2181, 2888, 3888
Mesos port open: TCP 5050
ping available (ICMP message types 0 and 8)
If you use FQDN instead of IP in your config, check that the DNS resolution is working correctly as well.
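A sketch of opening those ports with ufw on Ubuntu 14.04 (an assumption; adapt to iptables or whatever firewall you actually use):
sudo ufw allow 2181/tcp    # ZooKeeper client port
sudo ufw allow 2888/tcp    # ZooKeeper peer port
sudo ufw allow 3888/tcp    # ZooKeeper leader election port
sudo ufw allow 5050/tcp    # Mesos master port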
Split your Mesos masters' work_dir into different directories; do not use a shared work_dir for all masters, because of ZooKeeper.
I'm using Python. I did a yum install memcached followed by an easy_install python-memcached.
I used the simple test program from help(memcache). When I wasn't getting the proper answers, I threw in some print statements:
[~/test]$ cat m2.py
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
x = mc.set("some_key", "Some value")
print 'Just set a key and value into the cache (suposedly)'
value = mc.get("some_key")
print 'Just retrieved that value from the cache using the key'
print 'X %s' % x
print 'Value %s' % value
[~/test]$ python m2.py
Just set a key and value into the cache (suposedly)
Just retrieved that value from the cache using the key
X 0
Value None
[~/test]$
The question now is, what have I failed to do in my installation? It appears to be working from an API perspective, but it fails to put anything into the memcache shared area.
I'm using a VirtualBox VM running CentOS:
[~]# cat /proc/version
Linux version 2.6.32-358.6.2.el6.i686 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Thu May 16 18:12:13 UTC 2013
Is there a daemon that is supposed to be running? I don't see an obviously named one when I do a ps.
I tried to get pylibmc installed on my VM but was unable to find a working installation, so for now I will see if I can get the above stuff working first.
I discovered that if I run straight from the interactive Python console I get a bit more output if I set debug=1:
>>> mc = memcache.Client(['127.0.0.1:11211'], debug=1)
>>> mc.stats
{}
>>> mc.set('test','value')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
0
>>> mc.get('test')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
When I try to use telnet, per the example, to connect to the port, I get a connection refused:
[root@~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root@~]#
I tried the instructions I found on the net for configuring telnet so localhost wouldn't be disabled:
vi /etc/xinetd.d/telnet
service telnet
{
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
disable = no
}
And then ran the commands to restart the service(s):
service iptables stop
service xinetd stop
service iptables start
service xinetd start
service iptables stop
I ran both cases (iptables started and stopped), but it had no effect, so I am out of ideas. What do I need to do so that the port is allowed, if that is even the problem?
Or is there a memcached service that needs to be running that would open up the port?
Well, this is what it took to get it working (a series of manual steps):
1) su -
cd /var/run
mkdir memcached # this was missing
In the memcached file I added "-l 127.0.0.1" to the OPTIONS statement. It's apparently a listen option. Do this for steps 2 & 3. I'm not certain which file is actually used at runtime.
2) cd /etc/sysconfig
cp memcached memcached.old
vi memcached
3) cd /etc/init.d
cp memcached memcached.old
vi memcached
4) Try some commands to see if the server starts now
/etc/init.d/memcached start
/etc/init.d/memcached status
/etc/init.d/memcached stop
/etc/init.d/memcached restart
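To confirm the daemon is actually listening after starting it, a quick check using the memcached text protocol (stats is a built-in command):
telnet 127.0.0.1 11211
stats                               # memcached prints its statistics; type quit to exit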
I tried opening a browser, but it never seemed to actually display anything, so I don't really know how valid this approach is. I'm not running Apache or anything like that, so perhaps it's not relevant to my cause. Perhaps I would have to supply a ?key=blah or something.
5) http://127.0.0.1:11211
6) Now it should be ready to go. If you run the test shown with the following, it should work; at least it did for me. Doing help(memcache) will display a simple program; just paste that in and it should work just fine.
[~]$ python
>>> import memcache
>>> help(memcache)