Kubeadm - no port 6443 after cluster creation - kubernetes

I'm trying to create a Kubernetes HA cluster using kubeadm.
Kubeadm version: v1.11.1
I'm following these instructions: kubeadm ha
Everything went fine except the final step: nodes can't see each other on port 6443.
sudo netstat -an | grep 6443
Shows nothing.
In journalctl -u kubelet I see the following error:
reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://<LB>:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-172-19-111-200.ec2.internal&limit=500&resourceVersion=0: dial tcp 172.19.111.200:6443: connect: connection refused
List of Docker containers running on the instance:
sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e3eabb527a92 0e4a34a3b0e6 "kube-scheduler --ad…" 19 hours ago Up 19 hours k8s_kube-scheduler_kube-scheduler-ip-172-19-111-200.ec2.internal_kube-system_31eabaff7d89a40d8f7e05dfc971cdbd_1
123e78fa73c7 55b70b420785 "kube-controller-man…" 19 hours ago Up 19 hours k8s_kube-controller-manager_kube-controller-manager-ip-172-19-111-200.ec2.internal_kube-system_85384ca66dd4dc0adddc63923e2425a8_1
e0aa05e74fb9 1d3d7afd77d1 "/usr/local/bin/kube…" 19 hours ago Up 19 hours k8s_kube-proxy_kube-proxy-xh5dg_kube-system_f6bc49bc-959e-11e8-be29-0eaa4481e274_0
f5eac0b8fe7b k8s.gcr.io/pause:3.1 "/pause" 19 hours ago Up 19 hours k8s_POD_kube-proxy-xh5dg_kube-system_f6bc49bc-959e-11e8-be29-0eaa4481e274_0
541011b3e83a k8s.gcr.io/pause:3.1 "/pause" 19 hours ago Up 19 hours k8s_POD_etcd-ip-172-19-111-200.ec2.internal_kube-system_84d934eebaace20c70e0f268eb100028_0
a5e203947686 k8s.gcr.io/pause:3.1 "/pause" 19 hours ago Up 19 hours k8s_POD_kube-scheduler-ip-172-19-111-200.ec2.internal_kube-system_31eabaff7d89a40d8f7e05dfc971cdbd_0
89dbcdda659c k8s.gcr.io/pause:3.1 "/pause" 19 hours ago Up 19 hours k8s_POD_kube-apiserver-ip-172-19-111-200.ec2.internal_kube-system_4202bb793950ae679b2a433ea8711d18_0
5948e629d90e k8s.gcr.io/pause:3.1 "/pause" 19 hours ago Up 19 hours k8s_POD_kube-controller-manager-ip-172-19-111-200.ec2.internal_kube-system_85384ca66dd4dc0adddc63923e2425a8_0
Forwarding is enabled in sysctl:
sudo sysctl -p
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.ip_forward = 1

Nodes can't see each other on port 6443.
It seems like your API server is not running.
The error stating :6443: connect: connection refused points toward the API server not running.
This is further confirmed by your list of running Docker containers on the instance: you are missing the API server container. Note that you have the related "/pause" container, but the container running "kube-apiserver --..." is missing. Your scheduler and controller-manager appear to run correctly, but the API server does not.
Now you have to dig in and see what prevented your API server from starting properly. Check the kubelet logs on all control-plane nodes.
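A hedged sketch of where to look, assuming a default kubeadm layout with the Docker runtime:
# Apiserver-related errors reported by the kubelet
sudo journalctl -u kubelet --no-pager | grep -i apiserver | tail -n 50
# See whether the apiserver container was created but keeps crashing
sudo docker ps -a | grep kube-apiserver
# If a crashed container shows up, read its logs (replace <container-id>)
sudo docker logs <container-id>
# The static pod manifest the kubelet uses to start the apiserver
cat /etc/kubernetes/manifests/kube-apiserver.yaml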

This also happens if your Linux kernel is not configured to handle IPv4/IPv6 transparently.
An IPv4 address configured while kube-apiserver listens on an IPv6 interface breaks in this way.
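One way to check, as a hedged sketch (the sysctl below is the standard Linux dual-stack knob, and --advertise-address / --bind-address are standard kube-apiserver flags in the kubeadm-generated manifest):
# 0 means IPv6 sockets also accept IPv4-mapped connections (transparent dual-stack)
sysctl net.ipv6.bindv6only
# Check that the advertised/bound address matches the family the apiserver actually listens on
grep -E 'advertise-address|bind-address' /etc/kubernetes/manifests/kube-apiserver.yaml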

Related

How to properly query Kafka REST Proxy?

I'm running a Dockerized distribution of the Confluent Platform:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e6963904b485 confluentinc/cp-enterprise-control-center:7.0.1 "/etc/confluent/dock…" 11 hours ago Up 11 hours 0.0.0.0:9021->9021/tcp, :::9021->9021/tcp control-center
49ade0e752b4 confluentinc/cp-ksqldb-cli:7.0.1 "/bin/sh" 11 hours ago Up 11 hours ksqldb-cli
95b0982c0159 confluentinc/ksqldb-examples:7.0.1 "bash -c 'echo Waiti…" 11 hours ago Up 11 hours ksql-datagen
e28e3b937f6e confluentinc/cp-ksqldb-server:7.0.1 "/etc/confluent/dock…" 11 hours ago Up 11 hours 0.0.0.0:8088->8088/tcp, :::8088->8088/tcp ksqldb-server
af92bfb84cb1 confluentinc/cp-kafka-rest:7.0.1 "/etc/confluent/dock…" 11 hours ago Up 11 hours 0.0.0.0:8082->8082/tcp, :::8082->8082/tcp rest-proxy
318a999e76dc cnfldemos/cp-server-connect-datagen:0.5.0-6.2.0 "/etc/confluent/dock…" 11 hours ago Up 11 hours 0.0.0.0:8083->8083/tcp, :::8083->8083/tcp, 9092/tcp connect
0c299fbda7c5 confluentinc/cp-schema-registry:7.0.1 "/etc/confluent/dock…" 11 hours ago Up 11 hours 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp schema-registry
a33075002386 confluentinc/cp-server:7.0.1 "/etc/confluent/dock…" 11 hours ago Up 11 hours 0.0.0.0:9092->9092/tcp, :::9092->9092/tcp, 0.0.0.0:9101->9101/tcp, :::9101->9101/tcp broker
135f832fbccb confluentinc/cp-zookeeper:7.0.1 "/etc/confluent/dock…" 11 hours ago Up 11 hours 2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp zookeeper
Kafka REST Proxy is running on port 8082
When I issue an HTTP GET call against the REST proxy:
curl --silent -X GET http://10.0.0.253:8082/kafka/clusters/ | jq
All I get is:
{
"error_code": 404,
"message": "HTTP 404 Not Found"
}
Given my configuration, what can I change to actually get some useful information out of Kafka REST Proxy?
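For comparison, the standalone REST Proxy container normally serves the v2 endpoints at the root path and the v3 endpoints under /v3 rather than /kafka; a hedged sketch against the host and port from the setup above:
# v2 API: list topics
curl --silent http://10.0.0.253:8082/topics | jq
# v3 API: list clusters (note the /v3 prefix)
curl --silent http://10.0.0.253:8082/v3/clusters | jq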

Keycloak LetsEncrypt Nginx Reverse Proxy Docker Compose

I am trying to set up a Keycloak instance with an SSL connection behind an nginx proxy. My docker ps output:
d7fd473cc77b jboss/keycloak "/opt/jboss/tools/do…" 34 minutes ago Up 8 minutes 0.0.0.0:8080->8080/tcp, 8443/tcp auth
76e757bbe129 mariadb "sh -c ' echo 'CREAT…" 34 minutes ago Up 8 minutes 0.0.0.0:3306->3306/tcp backend-database
d99e23470955 stilliard/pure-ftpd:hardened-latest "/bin/sh -c '/run.sh…" 34 minutes ago Up 8 minutes 0.0.0.0:21->21/tcp, 0.0.0.0:30000->30000/tcp, 30001-30009/tcp ftp-server
95f4fbdea0de wordpress:latest "docker-entrypoint.s…" 35 minutes ago Up 35 minutes 80/tcp wordpress
b3e40ca6de48 mariadb:latest "docker-entrypoint.s…" 35 minutes ago Up 35 minutes 3306/tcp database
e5c12bb5ba52 nginx "/docker-entrypoint.…" 37 minutes ago Up 37 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp nginx-web
c0ac90a6c408 jrcs/letsencrypt-nginx-proxy-companion "/bin/bash /app/entr…" 37 minutes ago Up 37 minutes nginx-letsencrypt
33ae7de5f598 jwilder/docker-gen "/usr/local/bin/dock…" 37 minutes ago Up 37 minutes nginx-gen
As you can see in the console output above, I am also running a WordPress instance in a Docker container, and it works like a charm: no problems with unsigned or invalid SSL certificates, everything is fine. But when I try to open the Keycloak web interface via the domain with the corresponding port (in my case 8080), I get the following error:
Error code: SSL_ERROR_RX_RECORD_TOO_LONG
And when I try to open the web interface via the IP address, also with the corresponding port, I get a message that the connection isn't secure.
Hopefully this is enough information for you guys to figure out what I've done wrong.
So far,
Daniel
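For reference, the jwilder/docker-gen + letsencrypt-nginx-proxy-companion stack shown above terminates SSL on the nginx container (ports 80/443) and routes to upstream containers based on environment variables, which is why hitting HTTPS on port 8080 (plain HTTP on Keycloak) yields SSL_ERROR_RX_RECORD_TOO_LONG. A hedged sketch of how Keycloak is commonly attached to such a stack (the domain, email, and network name are hypothetical placeholders):
# Must join the same Docker network as nginx-web / nginx-gen / nginx-letsencrypt.
# VIRTUAL_HOST and VIRTUAL_PORT are read by jwilder/docker-gen,
# LETSENCRYPT_HOST and LETSENCRYPT_EMAIL by the letsencrypt companion,
# PROXY_ADDRESS_FORWARDING tells Keycloak it sits behind a reverse proxy.
docker run -d --name auth \
  --network nginx-proxy \
  -e VIRTUAL_HOST=auth.example.com \
  -e VIRTUAL_PORT=8080 \
  -e LETSENCRYPT_HOST=auth.example.com \
  -e LETSENCRYPT_EMAIL=admin@example.com \
  -e PROXY_ADDRESS_FORWARDING=true \
  jboss/keycloak
Keycloak is then reached at https://auth.example.com (port 443 on nginx), not at :8080 directly.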

kubeadm init kubelet complains default bind address already in use

kubeadm version 1.12.2
$ sudo kubeadm init --config kubeadm_new.config --ignore-preflight-errors=all
/var/log/syslog shows:
Nov 15 08:44:13 khteh-T580 kubelet[5101]: I1115 08:44:13.438374 5101 server.go:1013] Started kubelet
Nov 15 08:44:13 khteh-T580 kubelet[5101]: I1115 08:44:13.438406 5101 server.go:133] Starting to listen on 0.0.0.0:10250
Nov 15 08:44:13 khteh-T580 kubelet[5101]: E1115 08:44:13.438446 5101 kubelet.go:1287] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Nov 15 08:44:13 khteh-T580 kubelet[5101]: E1115 08:44:13.438492 5101 server.go:753] Starting health server failed: listen tcp 127.0.0.1:10248: bind: address already in use
Nov 15 08:44:13 khteh-T580 kubelet[5101]: I1115 08:44:13.438968 5101 server.go:318] Adding debug handlers to kubelet server.
Nov 15 08:44:13 khteh-T580 kubelet[5101]: F1115 08:44:13.439455 5101 server.go:145] listen tcp 0.0.0.0:10250: bind: address already in use
I have tried sudo systemctl stop kubelet and manually killing the kubelet process, but to no avail. Any advice and insights are appreciated.
Here is what you can do:
Try the following command to find out which process is holding port 10250:
[root@master admin]# ss -lntp | grep 10250
LISTEN 0 128 :::10250 :::* users:(("kubelet",pid=23373,fd=20))
It will give you the PID and name of the process holding the port. If it is an unwanted process, you can kill it so that the port becomes available for the kubelet to use.
After killing the process, run the command above again; it should return nothing.
Just to be on the safe side, run kubeadm reset and then kubeadm init again, and it should go through.
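Putting the steps together as a hedged sketch (the PID is the one from the example ss output above; the --config file is the one from the question):
# Find whatever is holding the kubelet ports
sudo ss -lntp | grep -E ':10250|:10248'
# Kill the stale process (PID taken from the ss output)
sudo kill -9 23373
# Verify the ports are free, then reset and re-initialise
sudo ss -lntp | grep -E ':10250|:10248'
sudo kubeadm reset
sudo kubeadm init --config kubeadm_new.config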
Have you tried using netstat to see which other process is already bound to that port?
sudo netstat -tulpn | grep 10250
For me, it later turned out that there were 2 extra core-dns-xxxxx containers stuck in "Terminating" in my cluster.
Deleting them forcefully solved the problem for me:
kubectl delete pod core-dns-xxxx -n kube-system --force --grace-period=0
Thanks.
I ditched kubeadm and use microk8s.

mesos slaves are not connecting with mesos masters cluster

I have a setup with 3 Mesos masters and 3 Mesos slaves. After making all the required configuration I can see that the 3 Mesos masters are part of a cluster maintained by ZooKeeper.
Now I have set up 3 Mesos slaves, and when I start the mesos-slave service I expect the slaves to show up on the Mesos masters' web UI page. But I cannot see any of them in the Slaves tab.
SELinux, the firewall, and iptables are all disabled, and SSH works between the nodes.
[cloud-user@slave1 ~]$ sudo systemctl status mesos-slave -l
mesos-slave.service - Mesos Slave
Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled)
Active: active (running) since Sat 2016-01-16 16:11:55 UTC; 3s ago
Main PID: 2483 (mesos-slave)
CGroup: /system.slice/mesos-slave.service
├─2483 /usr/sbin/mesos-slave --master=zk://10.0.0.2:2181,10.0.0.6:2181,10.0.0.7:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins
├─2493 logger -p user.info -t mesos-slave[2483]
└─2494 logger -p user.err -t mesos-slave[2483]
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628670 2497 detector.cpp:482] A new leading master (UPID=master@127.0.0.1:5050) is detected
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628732 2497 slave.cpp:729] New master detected at master@127.0.0.1:5050
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628825 2497 slave.cpp:754] No credentials provided. Attempting to register without authentication
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628844 2497 slave.cpp:765] Detecting new master
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.628872 2497 status_update_manager.cpp:176] Pausing sending status updates
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: E0116 16:11:55.628922 2503 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.629093 2502 slave.cpp:3215] master@127.0.0.1:5050 exited
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: W0116 16:11:55.629107 2502 slave.cpp:3218] Master disconnected! Waiting for a new master to be elected
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: E0116 16:11:55.983531 2503 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
Jan 16 16:11:57 slave1.novalocal mesos-slave[2494]: E0116 16:11:57.465049 2503 process.cpp:1911] Failed to shutdown socket with fd 11: Transport endpoint is not connected
So the problematic line is:
Jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: I0116 16:11:55.629093 2502 slave.cpp:3215] master@127.0.0.1:5050 exited
Specifically, note that it detects the master as having the IP address 127.0.0.1. The Mesos agent[1] sees that IP address and tries to connect, which fails (the master isn't running on the same machine as the agent).
This happens because the master announces what it thinks its IP address is into ZooKeeper. In your case, the master thinks its IP is 127.0.0.1 and stores that into ZooKeeper. Mesos has several configuration flags to control this behavior, mainly --hostname, --no-hostname_lookup, --ip, and --ip_discovery_command, as well as the environment variable LIBPROCESS_IP. See http://mesos.apache.org/documentation/latest/configuration/ for details about what they do.
The best thing you can do to make sure things work out of the box is to make sure the machines have resolvable hostnames. Mesos does a reverse-DNS lookup of the box's hostname to figure out which IP others should use to contact it.
If you can't get the hostnames set up properly, I would recommend setting --hostname and --ip manually, which should cause Mesos to announce exactly what you want, as in the sketch below.
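For example, a hedged sketch assuming the Mesosphere packages (which read one flag value per file under /etc/mesos-master); the IP and hostname below are hypothetical:
# On each master: pin the advertised IP and hostname, then restart the service
echo "10.0.0.2"            | sudo tee /etc/mesos-master/ip
echo "master1.example.com" | sudo tee /etc/mesos-master/hostname
sudo systemctl restart mesos-master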
[1]The mesos slave has been renamed to agent, see: https://issues.apache.org/jira/browse/MESOS-1478

Kubernetes starts giving errors after few hours of uptime

I have installed K8S on OpenStack following this guide.
The installation went fine and I was able to run pods, but after some time my applications stop working. I can still create pods, but requests won't reach the services, either from outside the cluster or from within the pods. Basically, something in the networking gets messed up. The iptables -L -vnt nat output still shows the proper configuration, but things won't work.
To get it working again I have to rebuild the cluster; removing all services and replication controllers doesn't help.
I tried to look into the logs. Below is the journal for kube-proxy:
Dec 20 02:12:18 minion01.novalocal systemd[1]: Started Kubernetes Proxy.
Dec 20 02:15:52 minion01.novalocal kube-proxy[1030]: I1220 02:15:52.269784 1030 proxier.go:487] Opened iptables from-containers public port for service "default/opensips:sipt" on TCP port 5060
Dec 20 02:15:52 minion01.novalocal kube-proxy[1030]: I1220 02:15:52.278952 1030 proxier.go:498] Opened iptables from-host public port for service "default/opensips:sipt" on TCP port 5060
Dec 20 03:05:11 minion01.novalocal kube-proxy[1030]: W1220 03:05:11.806927 1030 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [1433/544]) [2432] Reason: Details:<nil> Code:0}
Dec 20 03:06:08 minion01.novalocal kube-proxy[1030]: W1220 03:06:08.177225 1030 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [1476/207]) [2475] Reason: Details:<nil> Code:0}
..
..
..
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448570 1030 proxier.go:161] Failed to ensure iptables: error creating chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: too many open files:
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: W1220 16:01:23.448749 1030 iptables.go:203] Error checking iptables version, assuming version at least 1.4.11: %vfork/exec /usr/sbin/iptables: too many open files
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448868 1030 proxier.go:409] Failed to install iptables KUBE-PORTALS-CONTAINER rule for service "default/kubernetes:"
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448906 1030 proxier.go:176] Failed to ensure portal for "default/kubernetes:": error checking rule: fork/exec /usr/sbin/iptables: too many open files:
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: W1220 16:01:23.449006 1030 iptables.go:203] Error checking iptables version, assuming version at least 1.4.11: %vfork/exec /usr/sbin/iptables: too many open files
Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.449133 1030 proxier.go:409] Failed to install iptables KUBE-PORTALS-CONTAINER rule for service "default/repo-client:"
I found a few posts relating to "failed to install iptables", but they don't seem relevant, since everything works initially and only gets messed up after a few hours.
What version of Kubernetes is this? A long time ago (~1.0.4) we had a bug in the kube-proxy where it leaked sockets/file-descriptors.
If you aren't running a 1.1.3 binary, consider upgrading.
Also, you should be able to use lsof to figure out who has all of the files open.
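For example, a hedged sketch assuming kube-proxy runs as a host process, as in the journal above:
# Count open file descriptors held by kube-proxy
sudo lsof -p "$(pidof kube-proxy)" | wc -l
# Compare against the per-process limit
grep 'open files' /proc/"$(pidof kube-proxy)"/limits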