I am using Kubernetes v1.2.2 on docker-machine and I want to connect to a multi-container pod from an ordinary single-container pod running on the same node.
Each pod has a service providing a clusterIP. I can access the single-container pod (postgres-service) from the multi-container pod (postgres-bw-service) via the clusterIP of the corresponding service, but not vice versa. I checked the kube-proxy logs and noticed that no service endpoints are set for the service of my multi-container pod:
I0408 16:20:06.693273 1 iptables.go:177] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: no such file or directory
E0408 16:20:06.703401 1 server.go:340] Can't get Node "default", assuming iptables proxy: Get http://127.0.0.1:8080/api/v1/nodes/default: dial tcp 127.0.0.1:8080: connection refused
I0408 16:20:06.711666 1 server.go:200] Using iptables Proxier.
I0408 16:20:06.712420 1 server.go:213] Tearing down userspace rules.
I0408 16:20:06.822631 1 conntrack.go:36] Setting nf_conntrack_max to 262144
I0408 16:20:06.822833 1 conntrack.go:41] Setting conntrack hashsize to 65536
I0408 16:20:06.823646 1 conntrack.go:46] Setting nf_conntrack_tcp_timeout_established to 86400
E0408 16:20:06.826470 1 event.go:202] Unable to write event: 'Post http://127.0.0.1:8080/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8080: connection refused' (may retry after sleeping)
E0408 16:20:08.627546 1 event.go:202] Unable to write event: 'Post http://127.0.0.1:8080/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8080: connection refused' (may retry after sleeping)
E0408 16:20:18.629129 1 event.go:202] Unable to write event: 'Post http://127.0.0.1:8080/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8080: connection refused' (may retry after sleeping)
I0408 16:20:20.925349 1 proxier.go:484] Setting endpoints for "default/kubernetes:https" to [10.0.2.15:6443]
I0408 16:20:20.925592 1 proxier.go:484] Setting endpoints for "default/postgres-service:postgres-tcp" to [172.17.0.3:5432]
I0408 16:20:20.925841 1 proxier.go:565] Not syncing iptables until Services and Endpoints have been received from master
I0408 16:20:20.931213 1 proxier.go:421] Adding new service "default/kubernetes:https" at 10.0.0.1:443/TCP
I0408 16:20:20.931901 1 proxier.go:421] Adding new service "default/postgres-bw-service:postgres" at 10.0.0.139:5432/TCP
I0408 16:20:20.932084 1 proxier.go:421] Adding new service "default/postgres-bw-service:kafka" at 10.0.0.139:9092/TCP
I0408 16:20:20.932203 1 proxier.go:421] Adding new service "default/postgres-bw-service:zookeeper" at 10.0.0.139:2181/TCP
I0408 16:20:20.932331 1 proxier.go:421] Adding new service "default/postgres-service:postgres-tcp" at 10.0.0.241:5432/TCP
I0408 16:20:22.860305 1 proxier.go:494] Removing endpoints for "default/postgres-service:postgres-tcp"
I0408 16:20:26.582664 1 proxier.go:484] Setting endpoints for "default/postgres-service:postgres-tcp" to [172.17.0.2:5432]
Note the log entry Setting endpoints for "default/postgres-service:postgres-tcp" to [172.17.0.2:5432]; there is no similar entry for postgres-bw-service. I assume this is why I cannot access the multi-container pod, but I do not know what is causing it.
One more strange thing: I cannot access a pod via the clusterIP of its own service; this fails for both the multi-container pod and the single-container pod.
Any help appreciated
EDIT
kubectl get ep gives me
kubectl get ep postgres-service
NAME ENDPOINTS AGE
postgres-service 172.17.0.2:5432 17h
kubectl get ep postgres-bw-service
NAME ENDPOINTS AGE
postgres-bw-service <none> 17h
The issue was that the app labels of my pod did not match the app selector in my service definition.
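For illustration, a minimal sketch of the kind of mismatch that produces an empty endpoint list (the label values and image here are hypothetical, not copied from my manifests):
# Service selects pods labelled app=postgres-bw ...
apiVersion: v1
kind: Service
metadata:
  name: postgres-bw-service
spec:
  selector:
    app: postgres-bw            # must match the pod's labels exactly
  ports:
  - name: postgres
    port: 5432
---
# ... but the pod carries a different app label, so the endpoint list stays empty
apiVersion: v1
kind: Pod
metadata:
  name: postgres-bw
  labels:
    app: postgres-bw-pod        # mismatch -> kubectl get ep shows <none>
spec:
  containers:
  - name: postgres
    image: postgres:9.5         # image is illustrative
Once the pod's labels and the service's selector agree, the endpoints are populated and the clusterIP becomes reachable.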
Related
I have an issue where CoreDNS pods on some nodes are in CrashLoopBackOff state due to errors trying to reach the Kubernetes internal service.
This is a new K8s cluster deployed using Kubespray; the network layer is Weave, with Kubernetes version 1.12.5 on OpenStack.
I've already tested the connection to the endpoints and have no issue reaching 10.2.70.14:6443, for example.
But telnet from the pods to 10.233.0.1:443 is failing.
Thanks in advance for the help
kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.233.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.2.70.14:6443,10.2.70.18:6443,10.2.70.27:6443 + 2 more...
Session Affinity: None
Events: <none>
And from CoreDNS logs:
E0415 17:47:05.453762 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:05.456909 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:06.453258 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.233.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
Also, checking out the logs of kube-proxy from one of the problematic nodes revealed the following errors:
I0415 19:14:32.162909 1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.36:6443
I0415 19:14:32.162979 1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.36:6443: 1 ActiveConn, 0 InactiveConn
I0415 19:14:32.162989 1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.18:6443
I0415 19:14:32.163017 1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.18:6443: 1 ActiveConn, 0 InactiveConn
E0415 19:14:32.215707 1 proxier.go:430] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)
I had exactly the same problem, and it turned out that my Kubespray config was wrong, specifically the nginx ingress setting ingress_nginx_host_network.
As it turns out, you have to set ingress_nginx_host_network: true (it defaults to false).
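If you do rerun Kubespray, the setting belongs in the ingress addon group vars; the path below follows the default sample inventory layout and is an assumption, not taken from my setup:
# inventory/mycluster/group_vars/k8s-cluster/addons.yml   (path assumed)
ingress_nginx_enabled: true
ingress_nginx_host_network: true    # run the ingress controller on the host network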
If you do not want to rerun the whole Kubespray playbook, edit the nginx ingress DaemonSet:
$ kubectl -n ingress-nginx edit ds ingress-nginx-controller
Add --report-node-internal-ip-address to the command line:
spec:
  containers:
  - args:
    - /nginx-ingress-controller
    - --configmap=$(POD_NAMESPACE)/ingress-nginx
    - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
    - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
    - --annotations-prefix=nginx.ingress.kubernetes.io
    - --report-node-internal-ip-address # <- new
Set the following two properties at the same level as e.g. serviceAccountName: ingress-nginx:
serviceAccountName: ingress-nginx
hostNetwork: true # <- new
dnsPolicy: ClusterFirstWithHostNet # <- new
Then save and quit with :wq, and check the pod status with kubectl get pods --all-namespaces.
Source:
https://github.com/kubernetes-sigs/kubespray/issues/4357
I had a running k8s cluster for 2 days and then it started behaving strangely.
My specific question is about kube-proxy: it is not updating iptables.
From the kube-proxy logs, I can see that it failed to connect to the kube-apiserver (in my case the path is kube-proxy --> HAProxy --> k8s API server), yet the pod is shown as RUNNING.
Question: I expect the kube-proxy pod to be reported as down if it is not able to register with the apiserver for events.
How do I achieve this behavior via liveness probes?
Note: After killing the pod, kube-proxy works fine.
kube-proxy logs
sudo docker logs 1de375c94fd4 -f
W0910 15:18:22.091902 1 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0910 15:18:22.091962 1 feature_gate.go:226] feature gates: &{{} map[]}
time="2018-09-10T15:18:22Z" level=warning msg="Running modprobe ip_vs failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-33-generic/modules.dep.bin'\nmodprobe: WARNING: Module ip_vs not found in directory /lib/modules/4.15.0-33-generic`, error: exit status 1"
time="2018-09-10T15:18:22Z" level=error msg="Could not get ipvs family information from the kernel. It is possible that ipvs is not enabled in your kernel. Native loadbalancing will not work until this is fixed."
I0910 15:18:22.185086 1 server.go:409] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
I0910 15:18:22.186885 1 server_others.go:140] Using iptables Proxier.
W0910 15:18:22.438408 1 server.go:601] Failed to retrieve node info: nodes "$(node_name)" not found
W0910 15:18:22.438494 1 proxier.go:306] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
I0910 15:18:22.438595 1 server_others.go:174] Tearing down inactive rules.
I0910 15:18:22.861478 1 server.go:444] Version: v1.10.2
I0910 15:18:22.867003 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 2883584
I0910 15:18:22.867046 1 conntrack.go:52] Setting nf_conntrack_max to 2883584
I0910 15:18:22.867267 1 conntrack.go:83] Setting conntrack hashsize to 720896
I0910 15:18:22.893396 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0910 15:18:22.893505 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0910 15:18:22.893737 1 config.go:102] Starting endpoints config controller
I0910 15:18:22.893749 1 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
I0910 15:18:22.893742 1 config.go:202] Starting service config controller
I0910 15:18:22.893765 1 controller_utils.go:1019] Waiting for caches to sync for service config controller
I0910 15:18:22.993904 1 controller_utils.go:1026] Caches are synced for endpoints config controller
I0910 15:18:22.993921 1 controller_utils.go:1026] Caches are synced for service config controller
W0910 16:13:28.276082 1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: watch of *core.Endpoints ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Unexpected watch close - watch lasted less than a second and no items received
W0910 16:13:28.276083 1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: watch of *core.Service ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Unexpected watch close - watch lasted less than a second and no items received
E0910 16:13:29.276678 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:29.276677 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:30.277201 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:30.278009 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:31.277723 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:31.278574 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:32.278197 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:32.279134 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:33.278684 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:33.279587 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
Question: I expect the kube-proxy pod to be reported as down if it is not able to register with the apiserver for events.
The kube-proxy is not supposed to go down. It listens for events from the kube-apiserver and performs whatever it needs to do when a change/deployment happens. The rationale I can think of is that it may be caching information to keep the iptables on your system consistent. Kubernetes is designed in such a way that if your kube-apiserver or other master components go down, traffic should still flow to the nodes with no downtime.
How do I achieve this behavior via liveness probes?
You can always add liveness probes to the kube-proxy DaemonSet but it's not a recommended practice:
spec:
  containers:
  - command:
    - /usr/local/bin/kube-proxy
    - --config=/var/lib/kube-proxy/config.conf
    image: k8s.gcr.io/kube-proxy-amd64:v1.11.2
    imagePullPolicy: IfNotPresent
    name: kube-proxy
    resources: {}
    securityContext:
      privileged: true
    livenessProbe:
      exec:
        command:              # exec takes an argv list, so run curl through a shell
        - sh
        - -c
        - curl <apiserver>:10256/healthz
      initialDelaySeconds: 5
      periodSeconds: 5
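Note that the exec probe above assumes a curl binary is available inside the kube-proxy image, which may not be the case. A lighter-weight alternative (a sketch, not from the original answer) is an httpGet probe against kube-proxy's own health endpoint, which it serves on port 10256 by default:
    livenessProbe:
      httpGet:
        host: 127.0.0.1       # kube-proxy runs with hostNetwork, so localhost works
        path: /healthz
        port: 10256
      initialDelaySeconds: 5
      periodSeconds: 5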
Make sure that --healthz-port is enabled on the kube-apiserver.
I have a Kubernetes cluster (v1.10) using flannel (not sure if relevant, but it might be) as the CNI provider. When I try to apply kube-dns it goes into CrashLoopBackOff, and the logs for the kubedns container show, repeatedly:
I0423 17:46:47.045712 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:47.545729 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:48.045723 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:48.545749 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0423 17:46:49.019286 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0423 17:46:49.019325 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
I0423 17:46:49.045731 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
F0423 17:46:49.545707 1 dns.go:167] Timeout waiting for initialization
Nothing in my kube-dns manifest refers to port 443, and the kube-apiserver is configured for 6443. What is it trying to connect to that is refusing the connection?
I also don't know whether it has anything to do with the kube-dns pod having an IP of 10.88.0.3:
kubectl -n kube-system -o wide get pods
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-564f9d98-lt9js 2/3 CrashLoopBackOff 13 18m 10.88.0.3 worker1
kube-flannel-ds-5bqm6 1/1 Running 0 35m 10.240.0.12 controller2
kube-flannel-ds-djmld 1/1 Running 0 35m 10.240.0.11 controller1
kube-flannel-ds-nbfhp 1/1 Running 0 35m 10.240.0.23 worker3
kube-flannel-ds-prxdr 1/1 Running 0 35m 10.240.0.22 worker2
kube-flannel-ds-x9cdq 1/1 Running 0 35m 10.240.0.21 worker1
kube-flannel-ds-zjbgb 1/1 Running 0 35m 10.240.0.13 controller3
Again, where is this coming from? It's not something I have configured, and it does not sit within either my service network or my pod network CIDR range:
kubernetes_dns_domain: kubernetes.local
kubernetes_dns_ip: "{{ kubernetes_cluster_subnet }}.10"
kubernetes_cluster_subnet: 10.96.0
kubernetes_pod_network_cidr: 10.244.0.0/16
kubernetes_service_ip: "{{ kubernetes_cluster_subnet }}.1"
kubernetes_service_ip_range: "{{ kubernetes_cluster_subnet }}.0/24"
kubernetes_service_node_port_range: 30000-32767
kubernetes_secure_port: 6443
I'm thoroughly confused and would be grateful for any explanation of what is going on.
kube_dns_version: 1.14.10
flannel_version: v0.10.0
I'm struggling to understand how to correctly configure kube-dns with flannel on kubernetes 1.10 and containerd as the CRI.
kube-dns fails to run, with the following error:
kubectl -n kube-system logs kube-dns-595fdb6c46-9tvn9 -c kubedns
I0424 14:56:34.944476 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:35.444469 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
I0424 14:56:35.944444 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.444462 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.944507 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
F0424 14:56:37.444434 1 dns.go:209] Timeout waiting for initialization
kubectl -n kube-system describe pod kube-dns-595fdb6c46-9tvn9
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 47m (x181 over 3h) kubelet, worker1 Readiness probe failed: Get http://10.244.0.2:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 27m (x519 over 3h) kubelet, worker1 Back-off restarting failed container
Normal Killing 17m (x44 over 3h) kubelet, worker1 Killing container with id containerd://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 12m (x178 over 3h) kubelet, worker1 Liveness probe failed: Get http://10.244.0.2:10054/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 2m (x855 over 3h) kubelet, worker1 Back-off restarting failed container
There is indeed no route to the 10.96.0.1 endpoint:
ip route
default via 10.240.0.254 dev ens160
10.240.0.0/24 dev ens160 proto kernel scope link src 10.240.0.21
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.0.0/16 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
What is responsible for configuring the cluster service address range and associated routes? Is it the container runtime, the overlay network (flannel in this case), or something else? Where should it be configured?
The 10-containerd-net.conflist configures the bridge between the host and my pod network. Can the service network be configured here too?
cat /etc/cni/net.d/10-containerd-net.conflist
{
  "cniVersion": "0.3.1",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/16",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
Edit:
Just came across this from 2016:
As of a few weeks ago (I forget the release but it was a 1.2.x where x != 0) (#24429) we fixed the routing such that any traffic that arrives at a node destined for a service IP will be handled as if it came to a node port. This means you should be able to set up static routes for your service cluster IP range to one or more nodes and the nodes will act as bridges. This is the same trick most people do with flannel to bridge the overlay.
It's imperfect but it works. In the future we will need to get more precise with the routing if you want optimal behavior (i.e. not losing the client IP), or we will see more non-kube-proxy implementations of services.
Is that still relevant? Do I need to set up a static route for the service CIDR? Or is the issue actually with kube-proxy rather than flannel or containerd?
My flannel configuration:
cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
And kube-proxy:
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-proxy \
--cluster-cidr=10.244.0.0/16 \
--feature-gates=SupportIPVSProxyMode=true \
--ipvs-min-sync-period=5s \
--ipvs-sync-period=5s \
--ipvs-scheduler=rr \
--kubeconfig=/etc/kubernetes/kube-proxy.conf \
--logtostderr=true \
--master=https://192.168.160.1:6443 \
--proxy-mode=ipvs \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Edit:
Having looked at the kube-proxy debugging steps, it appears that kube-proxy cannot contact the master. I suspect this is a large part of the problem. I have 3 controller/master nodes behind an HAProxy load balancer, which is bound to 192.168.160.1:6443 and forwards round-robin to each of the masters on 10.240.0.1[1|2|3]:6443. This can be seen in the output/configs above.
In kube-proxy.service I have specified --master=192.168.160.1:6443. Why are connections being attempted to port 443? Can I change this? There doesn't seem to be a port flag. Does it need to be port 443 for some reason?
There are two components to this answer, one about running kube-proxy and one about where those :443 URLs are coming from.
First, about kube-proxy: please don't run kube-proxy as a systemd service like that. It is designed to be launched by kubelet in the cluster so that the SDN addresses behave rationally, since they are effectively "fake" addresses. By running kube-proxy outside the control of kubelet, all kinds of weird things are going to happen unless you expend a huge amount of energy to replicate the way that kubelet configures its subordinate docker containers.
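For reference, kubeadm-style clusters run kube-proxy as a DaemonSet managed by the cluster itself; the sketch below shows roughly what that looks like (the image tag, ConfigMap name, and ServiceAccount are assumptions, not taken from this cluster):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      labels:
        k8s-app: kube-proxy
    spec:
      hostNetwork: true
      serviceAccountName: kube-proxy                 # assumed to exist with the usual RBAC bindings
      containers:
      - name: kube-proxy
        image: k8s.gcr.io/kube-proxy-amd64:v1.10.2   # match your cluster version
        command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        securityContext:
          privileged: true
        volumeMounts:
        - name: kube-proxy-config
          mountPath: /var/lib/kube-proxy
      volumes:
      - name: kube-proxy-config
        configMap:
          name: kube-proxy                           # assumed ConfigMap holding config.conf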
Now, about that :443 URL:
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
...
Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
That 10.96.0.1 is from the Service CIDR of your cluster, which is (and should be) separate from the Pod CIDR, which in turn should be separate from the Nodes' subnets, etc. The .1 of the cluster's Service CIDR is reserved (or at least traditionally allocated) for the kubernetes.default.svc.cluster.local Service, whose single Service port is 443.
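A quick way to confirm this (a diagnostic sketch, not part of the original answer) is to inspect that Service and its endpoints directly; the ClusterIP should be the .1 of the Service CIDR and the endpoints should be the apiserver addresses on 6443:
kubectl get svc kubernetes -o wide
kubectl get endpoints kubernetes
kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}{":"}{.spec.ports[0].port}{"\n"}'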
I'm not super sure why the --master flag doesn't supersede the value in /etc/kubernetes/kube-proxy.conf but since that file is very clearly only supposed to be used by kube-proxy, why not just update the value in the file to remove all doubt?
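Assuming /etc/kubernetes/kube-proxy.conf is a standard kubeconfig (as the --kubeconfig flag suggests), the field to update would look roughly like this; the cluster name and CA path are illustrative, not taken from the question:
apiVersion: v1
kind: Config
clusters:
- name: kubernetes                                     # name is illustrative
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt  # path assumed
    server: https://192.168.160.1:6443                 # point this at the HAProxy VIP
# users and contexts omitted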
I have a kubeadm cluster deployed in a CentOS VM. While trying to deploy an ingress controller following the GitHub guide, I noticed that I'm unable to see logs:
kubectl logs -n ingress-nginx nginx-ingress-controller-697f7c6ddb-x9xkh --previous
Error from server: Get https://192.168.56.34:10250/containerLogs/ingress-nginx/nginx-ingress-controller-697f7c6ddb-x9xkh/nginx-ingress-controller?previous=true: dial tcp 192.168.56.34:10250: getsockopt: connection timed out
On 192.168.56.34 (node1), netstat returns:
tcp6 0 0 :::10250 :::* LISTEN 1068/kubelet
In fact, I'm unable to see any logs regardless of the status of the pod.
I disabled both firewalld and SELinux.
I used a proxy to let Kubernetes download images; I have since removed the proxy.
When navigating to the URL in the error above, I get Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=proxy).
I'm also able to fetch my nodes:
kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 32d v1.9.3
k8s-node1 Ready <none> 30d v1.9.3
k8s-node2 NotReady <none> 32d v1.9.3
getsockopt: connection timed out
is 99.99999% a firewall issue. If it were "connection refused", then showing the output of netstat would be meaningful, but (as you can see) kubelet is listening on that port just fine; it's the network path between the machine that is running kubectl and 192.168.56.34 that is not configured to allow the traffic.
The apiserver expects that everyone who would want to view logs (or use kubectl exec) can reach that port on every Node in the cluster; so be sure you don't just fix the firewall rule(s) for that one Node -- fix it for all of them.
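As a quick check (these commands are a generic sketch, not part of the original answer), you can test reachability of the kubelet port from the master and, if firewalld turns out to be active after all, open it on every node:
# from the master (where the apiserver runs)
nc -vz 192.168.56.34 10250
curl -k https://192.168.56.34:10250/      # any HTTP response, even 401, proves the port is reachable
# on each node, if firewalld is running
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --reload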
This message is from the apiserver running on your master. The command kubectl logs, running on your local machine, fetches logs via the apiserver. So the error message reveals a firewall misconfiguration between the master and the node(s) on port 10250.