Readiness probe fails on management proxy pod - deployment

I'm trying to set up a SQL Server Big Data Cluster (BDC) on AKS, but the process does not seem to move beyond a certain point. The AKS cluster is a 3-node cluster built on a Standard_E8_v3 VM scale set.
Here is a list of the pods:
C:\Users\rgn>kubectl get pods -n mssql-cluster
NAME READY STATUS RESTARTS AGE
control-qm754 3/3 Running 0 35m
controldb-0 2/2 Running 0 35m
controlwd-wxrlg 1/1 Running 0 32m
logsdb-0 1/1 Running 0 32m
logsui-mqfcv 1/1 Running 0 32m
metricsdb-0 1/1 Running 0 32m
metricsdc-9frbb 1/1 Running 0 32m
metricsdc-jr5hk 1/1 Running 0 32m
metricsdc-ls7mf 1/1 Running 0 32m
metricsui-pn9qf 1/1 Running 0 32m
mgmtproxy-x4ctb 2/2 Running 0 32m
When I run describe against the mgmtproxy-x4ctb pod, the output below is what I see. Even though the status indicates the pod is Running, it is not actually ready (the readiness probe is failing). I believe this is the reason the deployment is not proceeding.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned mssql-cluster/mgmtproxy-x4ctb to aks-agentpool-34156060-vmss000002
Normal Pulling 11m kubelet, aks-agentpool-34156060-vmss000002 Pulling image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
Normal Pulled 11m kubelet, aks-agentpool-34156060-vmss000002 Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
Normal Created 11m kubelet, aks-agentpool-34156060-vmss000002 Created container service-proxy
Normal Started 11m kubelet, aks-agentpool-34156060-vmss000002 Started container service-proxy
Normal Pulling 11m kubelet, aks-agentpool-34156060-vmss000002 Pulling image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
Normal Pulled 11m kubelet, aks-agentpool-34156060-vmss000002 Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
Normal Created 11m kubelet, aks-agentpool-34156060-vmss000002 Created container fluentbit
Normal Started 11m kubelet, aks-agentpool-34156060-vmss000002 Started container fluentbit
Warning Unhealthy 10m (x6 over 11m) kubelet, aks-agentpool-34156060-vmss000002 Readiness probe failed: cat: /var/run/container.ready: No such file or directory
I have tried the deployment twice, but both times it was not able to move beyond this point. From the link it looks like this problem has only existed since last month. Can someone point me in the right direction?
Log listing from the proxy pod:
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwxr-x' mode: [/var/opt /var/log /var/run/secrets /var/run/secrets/keytabs /var/run/secrets/certificates /var/run/secrets/credentials /var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwx---' mode: [/var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/mgmtproxy.json
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/agent.json
2020/06/13 16:25:35.777955 Changed the container umask from '-----w--w-' to '--------w-'
2020/06/13 16:25:35.778031 Setting the directories for 'supervisor:supervisor' owner with '-rwxrwx---' mode: [/var/log/supervisor/log /var/opt/supervisor /var/log/supervisor /var/run/supervisor]
2020/06/13 16:25:35.778170 Setting the directories for 'fluentbit:fluentbit' owner with '-rwxrwx---' mode: [/var/opt/fluentbit /var/log/fluentbit /var/run/fluentbit]
2020/06/13 16:25:35.778411 Agent configuration: {"PodType":"mgmtproxy","ContainerName":"fluentbit","GrpcPort":8311,"HttpsPort":8411,"ScaledSetKind":"ReplicaSet","securityPolicy":"certificate","dnsServicesToWaitFor":null,"cronJobs":null,"serviceJobs":null,"healthModules":null,"logRotation":{"agentLogMaxSize":500,"agentLogRotateCount":3,"serviceLogRotateCount":10},"fileMap":{"fluentbit-certificate.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem","fluentbit-privatekey.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem","krb5.conf":"/etc/krb5.conf","nsswitch.conf":"/etc/nsswitch.conf","resolv.conf":"/etc/resolv.conf","smb.conf":"/etc/samba/smb.conf"},"userPermissions":{"agent":{"user":"agent","group":"agent","mode":"0770","modeSetgid":false,"directories":[]},"fluentbit":{"user":"fluentbit","group":"","mode":"","modeSetgid":false,"directories":[]},"fundamental":{"user":"agent","group":"agent","mode":"0775","modeSetgid":false,"directories":["/var/opt","/var/log","/var/run/secrets","/var/run/secrets/keytabs","/var/run/secrets/certificates","/var/run/secrets/credentials"]},"supervisor":{"user":"supervisor","group":"supervisor","mode":"0770","modeSetgid":false,"directories":["/var/log/supervisor/log"]}},"fileIgnoreList":["agent-certificate.pem","agent-privatekey.pem"],"InstanceId":"t4KLx1m5vDsHCHc038KgKHH5HOcQVR0Z","ContainerId":"","StartServicesImmediately":false,"DisableFileDownloads":false,"DisableHealthChecks":false,"serviceFencingEnabled":false,"isPrivileged":true,"IsConfigurationManagerEnabled":false,"LWriter":{"filename":"/var/log/agent/agent.log","maxsize":500,"maxage":0,"maxbackups":10,"localtime":true,"compress":false}}
2020/06/13 16:25:36.316209 Attempting to join cluster...
2020/06/13 16:25:36.316301 Source directory /var/opt/secrets/certificates/ca does not exist
2020/06/13 16:25:36.316520 [Reaper] Starting the signal loop for reaper
2020/06/13 16:25:40.642164 [Reaper] Received SIGCHLD signal. Starting process reaper.
2020/06/13 16:25:40.652703 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.943805 Cluster join successful.
2020/06/13 16:25:40.943846 Stopping gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.944704 Getting manifest from controller...
2020/06/13 16:25:40.964774 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-certificate.pem' from controller...
2020/06/13 16:25:40.964816 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-privatekey.pem' from controller...
2020/06/13 16:25:40.987309 Stored 1206 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem
2020/06/13 16:25:40.992108 Stored 1694 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem
2020/06/13 16:25:40.992235 Agent is ready.
2020/06/13 16:25:40.992348 Starting supervisord with command: '[supervisord --nodaemon -c /etc/supervisord.conf]'
2020/06/13 16:25:40.992719 Started supervisord with pid=1437
2020/06/13 16:25:40.993030 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.996580 Starting HTTPS listener on 0.0.0.0:8411
2020/06/13 16:25:41.998667 [READINESS] Not all supervisord processes are ready. Attempts: 1, Max attempts: 250
2020/06/13 16:25:41.999567 Loading go plugin plugins/bdc.so
2020/06/13 16:25:41.999588 Loading go plugin plugins/platform.so
2020/06/13 16:25:41.999600 Starting the health monitoring, number of modules: 2, services: ["fluentbit","agent"]
2020/06/13 16:25:41.999605 Starting the health service
2020/06/13 16:25:41.999609 Starting the health durable store
2020/06/13 16:25:41.999614 Loading existing health properties from /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:41.999642 No existing file path for file: /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:42.640719 Adding a new plugin plugins/bdc.so
2020/06/13 16:25:43.302872 Adding a new plugin plugins/platform.so
2020/06/13 16:25:43.302932 Created a health module watcher for service 'fluentbit'
2020/06/13 16:25:43.302948 Starting a new watcher for health module: fluentbit
2020/06/13 16:25:43.302983 Starting a new watcher for health module: agent
2020/06/13 16:25:43.302992 Health monitoring started
2020/06/13 16:25:53.000908 [READINESS] All services marked as ready.
2020/06/13 16:25:53.000966 [READINESS] Container is now ready.
2020/06/13 16:26:01.995093 [MONITOR] Service states: map[fluentbit:RUNNING]
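For reference, the readiness probe above is just a cat of a marker file, so one way to check by hand whether the agent ever marks the container as ready is to look for that file directly. A minimal sketch (namespace, pod, and container names taken from the output above):
kubectl exec -n mssql-cluster mgmtproxy-x4ctb -c service-proxy -- ls -l /var/run/container.ready
kubectl logs -n mssql-cluster mgmtproxy-x4ctb -c service-proxy --tail=50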

All,
It finally got sorted out.
There were several issues with respect to our Azure policy and our network policy:
(1) New IP addresses were not being allowed to be assigned to the load balancer.
(2) The gateway proxy was not getting new IP addresses because we had run out of our quota of 10 allowed IPs.
(3) The desktop from which I started the deployment was not able to reach the controller service IP address and port.
We addressed the above one after the other and we are in the final stages.
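For anyone hitting the same symptoms, a quick way to spot issues (1) and (3) is to check whether the external IPs ever get assigned and whether the controller endpoint is reachable from the client machine. A minimal sketch (the namespace is the one from the question; the angle-bracket placeholders must be filled in from the service listing):
# A LoadBalancer service stuck at EXTERNAL-IP <pending> points to the IP assignment/quota problem.
kubectl get svc -n mssql-cluster
# Once an external IP shows up, verify the controller endpoint is reachable from the deployment machine.
curl -k https://<controller-external-ip>:<controller-port>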
Given that the IP addresses are static but generated on the fly, they cannot be provisioned in advance. How did others handle this with their network/Azure infrastructure teams?
Thanks,
rgn

Related

istio bookinfo sample application deployment failed for "ImagePullBackOff"

I am trying to deploy Istio's sample Bookinfo application using the command below:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
from here
but each time I am getting an ImagePullBackOff error like this:
NAME READY STATUS RESTARTS AGE
details-v1-c74755ddf-m878f 2/2 Running 0 6m32s
productpage-v1-778ddd95c6-pdqsk 2/2 Running 0 6m32s
ratings-v1-5564969465-956bq 2/2 Running 0 6m32s
reviews-v1-56f6655686-j7lb6 1/2 ImagePullBackOff 0 6m32s
reviews-v2-6b977f8ff5-55tgm 1/2 ImagePullBackOff 0 6m32s
reviews-v3-776b979464-9v7x5 1/2 ImagePullBackOff 0 6m32s
For error details, I have run:
kubectl describe pod reviews-v1-56f6655686-j7lb6
which returns these events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m41s default-scheduler Successfully assigned default/reviews-v1-56f6655686-j7lb6 to minikube
Normal Pulled 7m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 7m39s kubelet Created container istio-init
Normal Started 7m39s kubelet Started container istio-init
Warning Failed 5m39s kubelet Failed to pull image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0": rpc error: code = Unknown desc = context deadline exceeded
Warning Failed 5m39s kubelet Error: ErrImagePull
Normal Pulled 5m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 5m39s kubelet Created container istio-proxy
Normal Started 5m39s kubelet Started container istio-proxy
Normal BackOff 5m36s (x3 over 5m38s) kubelet Back-off pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Warning Failed 5m36s (x3 over 5m38s) kubelet Error: ImagePullBackOff
Normal Pulling 5m25s (x2 over 7m38s) kubelet Pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Do I need to build the Dockerfile first and push the image to a local repository? There are no clear instructions there, or I failed to find any.
Can anybody help?
If you check on Docker Hub, the image is there:
https://hub.docker.com/r/istio/examples-bookinfo-reviews-v1/tags
So the error you need to deal with is context deadline exceeded while pulling the image from Docker Hub. This is likely a networking issue (it is a generic Go error saying the operation took too long). Depending on where your cluster is running, you can manually do a docker pull from the nodes, and that should work.
EDIT: for minikube do a minikube ssh and then a docker pull
Solved the problem with the command below:
minikube ssh docker pull istio/examples-bookinfo-reviews-v1:1.17.0
from this GitHub issue here
Also see: How to use local docker images with Minikube?
Hope this helps somebody.
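As an alternative for locally built images (as referenced above), recent minikube releases can load an image from the host's Docker daemon straight into the cluster. A minimal sketch, assuming the image is already present on the host and that your minikube version supports the image subcommand:
docker pull istio/examples-bookinfo-reviews-v1:1.17.0
minikube image load istio/examples-bookinfo-reviews-v1:1.17.0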

Pods starting but not working in Kubernetes

I created a Kubernetes deployment with 3 Pods, and all of them are shown as running fine, but when I try to reach them I cannot. I tried curling the internal IP of the Pods, and in the describe section I could see this error: MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache.
Errors below:
5m51s Normal RegisteredNode node/ip-10-1-1-4 Node ip-10-1-1-4 event: Registered Node ip-10-1-1-4 in Controller
57m Normal Scheduled pod/nginx-deployment-585449566-9bqp7 Successfully assigned default/nginx-deployment-585449566-9bqp7 to ip-10-1-1-4
57m Warning FailedMount pod/nginx-deployment-585449566-9bqp7 MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache: timed out waiting for the condition
57m Normal Pulling pod/nginx-deployment-585449566-9bqp7 Pulling image "nginx:latest"
56m Normal Pulled pod/nginx-deployment-585449566-9bqp7 Successfully pulled image "nginx:latest" in 12.092210534s
56m Normal Created pod/nginx-deployment-585449566-9bqp7 Created container nginx
56m Normal Started pod/nginx-deployment-585449566-9bqp7 Started container nginx
57m Normal Scheduled pod/nginx-deployment-585449566-9hlhz Successfully assigned default/nginx-deployment-585449566-9hlhz to ip-10-1-1-4
57m Warning FailedMount pod/nginx-deployment-585449566-9hlhz MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache: timed out waiting for the condition
57m Normal Pulling pod/nginx-deployment-585449566-9hlhz Pulling image "nginx:latest"
56m Normal Pulled pod/nginx-deployment-585449566-9hlhz Successfully pulled image "nginx:latest" in 15.127984291s
56m Normal Created pod/nginx-deployment-585449566-9hlhz Created container nginx
56m Normal Started pod/nginx-deployment-585449566-9hlhz Started container nginx
57m Normal Scheduled pod/nginx-deployment-585449566-ffkwf Successfully assigned default/nginx-deployment-585449566-ffkwf to ip-10-1-1-4
57m Warning FailedMount pod/nginx-deployment-585449566-ffkwf MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache: timed out waiting for the condition
57m Normal Pulling pod/nginx-deployment-585449566-ffkwf Pulling image "nginx:latest"
56m Normal Pulled pod/nginx-deployment-585449566-ffkwf Successfully pulled image "nginx:latest" in 9.459864756s
56m Normal Created pod/nginx-deployment-585449566-ffkwf Created container nginx
You can add an additional RBAC role permission to your Pod's service account (references 1, 2, 3).
Also make sure that you have workload identity set up (reference 4).
This can also happen when the apiserver is under high load; you could use more, smaller nodes to spread your pods, and increase your resource requests.
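If apiserver load is the suspicion, increasing the requests is a one-liner. A minimal sketch using the deployment name from the events above (the values are placeholders to adjust):
kubectl set resources deployment nginx-deployment --requests=cpu=100m,memory=128Mi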
This error message is a bit misleading, since it suggests a K8s cluster internal connectivity problem. In reality it is an RBAC permission problem.
The default service account within the namespace you are deploying to is not authorized to mount the secret that you are trying to mount into your Pod.
To solve this, you have to add an additional RBAC role permission to your Pod's service account.
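A minimal sketch of that kind of RBAC addition with kubectl, assuming the default service account in the default namespace (the role and binding names are made up; adjust names and namespace to your deployment):
# Grant the Pod's service account read access to Secrets in its namespace.
kubectl create role secret-reader --verb=get,list,watch --resource=secrets -n default
kubectl create rolebinding default-sa-secret-reader --role=secret-reader --serviceaccount=default:default -n default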

Troubles while installing GlusterFS on Kubernetes cluster using Heketi

I am trying to install GlusterFS on my Kubernetes cluster using Heketi. I start gk-deploy, but it reports that the pods aren't found:
Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
gluster-s3 pod ... not found.
Creating initial resources ... Error from server (AlreadyExists): error when creating "/heketi/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view not labeled
OK
node/sapdh2wrk1 not labeled
node/sapdh2wrk2 not labeled
node/sapdh2wrk3 not labeled
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... pods not found.
I've started gk-deploy more than once.
I have 3 nodes in my Kubernetes cluster, and it seems like the pods can't start up on any of them, but I don't understand why.
Pods are created but aren't ready:
kubectl get pods
NAME READY STATUS RESTARTS AGE
glusterfs-65mc7 0/1 Running 0 16m
glusterfs-gnxms 0/1 Running 0 16m
glusterfs-htkmh 0/1 Running 0 16m
heketi-754dfc7cdf-zwpwn 0/1 ContainerCreating 0 74m
Here are the events of one GlusterFS Pod; they end with warnings:
Events:
Type Reason Age From Message
Normal Scheduled 19m default-scheduler Successfully assigned default/glusterfs-65mc7 to sapdh2wrk1
Normal Pulled 19m kubelet, sapdh2wrk1 Container image "gluster/gluster-centos:latest" already present on machine
Normal Created 19m kubelet, sapdh2wrk1 Created container
Normal Started 19m kubelet, sapdh2wrk1 Started container
Warning Unhealthy 13m (x12 over 18m) kubelet, sapdh2wrk1 Liveness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active glusterd.service
Warning Unhealthy 3m58s (x35 over 18m) kubelet, sapdh2wrk1 Readiness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active glusterd.service
GlusterFS 5.8-100.1 is installed and started on every node, including the master.
What is the reason the Pods don't start up?
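Not an answer, but to narrow this down it may help to run the same check the probes run and to look at glusterd inside one of the pods. A minimal sketch (pod name taken from the listing above, and assuming the image ships the systemd tooling that the probe itself uses):
kubectl exec glusterfs-65mc7 -- systemctl is-active glusterd.service
kubectl exec glusterfs-65mc7 -- journalctl -u glusterd.service --no-pager -n 50
kubectl logs glusterfs-65mc7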

Kubernetes cluster VirtualBox issues with networking (NAT and Host-only adapters)

I am trying to set up a Kubernetes cluster (two nodes: 1 master, 1 worker) on VirtualBox. My host computer runs Windows 10, and in VirtualBox I have installed Ubuntu 18.10 (codename Cosmic).
I have configured two adapters on each VM: one NAT adapter and one host-only adapter. I did that because I need to access some internal resources using the host IP (NAT), and I also need a stable network between the host and the virtual machines (host-only network).
I have installed Kubernetes v1.12.4 and successfully joined the worker to the master node.
NAME STATUS ROLES AGE VERSION
kubernetes-master Ready master 36m v1.12.4
kubernetes-slave Ready <none> 25m v1.12.4
I am using Flannel for networking.
All pods seems to be ok.
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-server-7bb6997d9c-kdcld 1/1 Running 0 27m
kube-system coredns-576cbf47c7-btrvb 1/1 Running 1 38m
kube-system coredns-576cbf47c7-zfscv 1/1 Running 1 38m
kube-system etcd-kubernetes-master 1/1 Running 1 38m
kube-system kube-apiserver-kubernetes-master 1/1 Running 1 38m
kube-system kube-controller-manager-kubernetes-master 1/1 Running 1 38m
kube-system kube-flannel-ds-amd64-29p96 1/1 Running 1 28m
kube-system kube-flannel-ds-amd64-sb2fq 1/1 Running 1 37m
kube-system kube-proxy-59v6b 1/1 Running 1 38m
kube-system kube-proxy-bfd78 1/1 Running 0 28m
kube-system kube-scheduler-kubernetes-master 1/1 Running 1 38m
I have deployed nginx to verify that everything is working
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 41m
nginx-http ClusterIP 10.111.151.28 <none> 80/TCP 29m
However, when I try to reach nginx I get a timeout. kubectl describe pod gives me the following events.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned default/nginx-server-7bb6997d9c-kdcld to kubernetes-slave
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dbb2595628fc2579c29779e31e27e27eaeff2dbcf2bdb68467c47f22a3590bd0" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "801e0f3f8ca4a9b7cc21d87d41141485e1b1da357f2d89e1644acf0ecf634016" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "77214c757449097bfbe05b24ebb5fd3c7f1d96f7e3e9a3cd48f3b37f30224feb" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ebffdd723083d916c0910489e12368dc4069dd99c24a3a4ab1b1d4ab823866ff" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d87b93815380246a05470e597a88d50eb31c132a50e30000ab41a456d1e65107" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3ef233ef0a6c447134c7b027747a701d6576a80e76c9cc8ffd8287e8ee5f02a4" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "6b621aab3c57154941b37360240228fe939b528855a5fe8cd9536df63d41ed93" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "fa992bde90e0a1839180666bedaf74965fb26f3dccb33a66092836a25882ab44" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m kubelet, kubernetes-slave Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "81f74f687e17d67bd2853849f84ece33a118744278d78ac7af3bdeadff8aa9c7" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 32m (x2 over 32m) kubelet, kubernetes-slave (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "29188c3e73d08e81b08b2258254dc2691fcaa514ecc96e9df86f2e61ba455b76" network for pod "nginx-server-7bb6997d9c-kdcld": NetworkPlugin cni failed to set up pod "nginx-server-7bb6997d9c-kdcld_default" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 32m (x11 over 32m) kubelet, kubernetes-slave Pod sandbox changed, it will be killed and re-created.
Normal Pulling 32m kubelet, kubernetes-slave pulling image "nginx"
Normal Pulled 32m kubelet, kubernetes-slave Successfully pulled image "nginx"
Normal Created 32m kubelet, kubernetes-slave Created container
I have tried the exact same installation with only a bridged adapter configured on the virtual machines, and then everything works as expected.
I believe it's a configuration issue; however, I am unable to solve it. Can someone advise me?
As I mentioned in a deleted comment, I recreated this on my Ubuntu 18.04 host. I created two Ubuntu 18.10 VMs, each with two adapters (one NAT and one host-only adapter). I have the same configuration as you have specified here. Everything works fine.
What I had to do was add the second adapter manually; I did it using netplan before running kubeadm init and kubeadm join on the node.
Just in case you did not do that: add the host-only adapter network to the YAML file in /etc/netplan/50-cloud-init.yaml, then run sudo netplan generate and sudo netplan apply (see the netplan sketch below). For nginx I used the deployment from the official Kubernetes documentation. Then I exposed the service:
kubectl create service nodeport nginx --tcp=80:80
Curling my node IP address on the NodePort from the host machine works fine.
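For reference, a minimal netplan sketch for the host-only adapter, assuming the second interface is enp0s8 and a typical VirtualBox host-only subnet (adjust the interface name and address to your setup):
cat <<'EOF' | sudo tee /etc/netplan/51-host-only.yaml
network:
  version: 2
  ethernets:
    enp0s8:
      addresses: [192.168.56.10/24]
EOF
sudo netplan generate && sudo netplan apply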
This was just to demonstrate what I did so that it works in my environment. Judging from the describe output of your pod, it seems like there is something wrong with Flannel itself:
/run/flannel/subnet.env: no such file or directory
I checked this file on the master and it looks like this:
/run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Check if the file is there, and if this does not help you, we can troubleshoot further if you provide more information. However, there are too many unknowns, so I had to guess in some places; my advice would be to destroy it all and try again with the information I have provided, and to run nginx with a NodePort and not a ClusterIP type of Service. A ClusterIP will only be reachable from inside the cluster - for example from a node.
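If the file is missing on the worker, the flannel pod running on that node usually logs why. A minimal sketch (pod names taken from the listing above; pick the one scheduled on kubernetes-slave):
# On the worker node itself:
ls -l /run/flannel/subnet.env
# See which flannel pod runs on which node, then check its logs:
kubectl -n kube-system get pods -o wide | grep flannel
kubectl -n kube-system logs kube-flannel-ds-amd64-29p96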
Please let me bump this thread. A long time ago I had configured 1 NAT adapter for internet and 1 host-only adapter for remote SSH, and got the same errors, especially when setting up Rancher Longhorn.
Now I don't build it like that. First, I build a gateway server using CentOS with iptables (1 NAT adapter, 1 host-only adapter).
Then the other VMs have just one host-only interface, connected directly to the gateway server.

Kubernetes pod deployment error FailedSync| Error syncing pod

Environment:
VirtualBox on a Windows 10 desktop machine.
Two Ubuntu VMs; one VM is the master and the other one is a k8s (1.7) worker.
I can see that the two nodes are "Ready" when I run kubectl get nodes. But even when deploying a very simple nginx pod, I get these error messages from the pod describe output:
"Normal | SandboxChanged | Pod sandbox changed, it will be killed and re-created." and "Warning | FailedSync | Error syncing pod".
But if I run the Docker container directly on the worker, the container comes up and runs fine. Does anyone have a suggestion for what I can check?
k8s-master#k8smaster-VirtualBox:~$ kubectl get pods
NAME                            READY   STATUS             RESTARTS   AGE
movie-server-1517284798-lbb01   0/1     CrashLoopBackOff   6          16m
k8s-master#k8smaster-VirtualBox:~$ kubectl describe pod movie-server-1517284798-lbb01
--- clip ---
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  Failed          Error: failed to start container "movie-server": Error response from daemon: {"message":"cannot join network of a non running container: 3f59947dbd404ecf2f6dd0b65dd9dad8b25bf0c418aceb8cf666ad0761402b53"}
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  BackOff         Back-off restarting failed container
kubelet, master-virtualbox                                 Normal   SandboxChanged  Pod sandbox changed, it will be killed and re-created.
kubelet, master-virtualbox  spec.containers{movie-server}  Normal   Pulled          Container image "nancyfeng/movie-server:0.1.0" already present on machine
kubelet, master-virtualbox  spec.containers{movie-server}  Normal   Created         Created container
kubelet, master-virtualbox                                 Warning  FailedSync      Error syncing pod
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  Failed          Error: failed to start container "movie-server": Error response from daemon: {"message":"cannot join network of a non running container: 72ba77b25b6a3969e8921214f0ca73ffaab4c82d8a2852e3d1b1f3ac5dde6ce1"}
--- clip ---
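Not a definitive answer, but since the error says the application container cannot join the network of a container that is no longer running, it may help to look at the sandbox (pause) container and the kubelet logs on the node. A minimal sketch (pod name and container ID prefix taken from the output above):
kubectl describe pod movie-server-1517284798-lbb01
# On the node, check whether the sandbox container for the pod exited:
docker ps -a | grep 3f59947dbd40
sudo journalctl -u kubelet --no-pager -n 100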