Kubernetes pods: unable to mount NFS volume

I am having trouble mounting an NFS volume inside a Docker container.
I have one minion (labcr3) and one pod (httpd-php).
Below you can find all the relevant details.
Can you help me?
Thanks and regards.
The physical host (minion) is able to mount the NFS volume:
[root@LABCR3 ~]# df -h /NFS
Filesystem Size Used Avail Use% Mounted on
192.168.240.1:/mnt/VOL1/nfs-ds 500G 0 500G 0% /NFS
Here privileged mode is enabled for Kubernetes on the minion:
[root@LABCR3 ~]# cat /etc/kubernetes/config | grep PRIV
KUBE_ALLOW_PRIV="--allow_privileged=true"
Describe output for the pod:
[root@labcr1 pods]# kubectl describe pod httpd-php
Name: httpd-php
Namespace: default
Image(s): centos7-httpd-php-alive-new
Node: labcr3/10.99.1.203
Labels: cont=httpd-php-mxs,name=httpd-php
Status: Pending
Reason:
Message:
IP:
Replication Controllers:
Containers:
httpd-php-mxs:
Image: centos7-httpd-php-alive-new
State: Waiting
Reason: Image: centos7-httpd-php-alive-new is ready, container is creating
Ready: False
Restart Count: 0
Conditions:
Type Status
Ready False
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 04 Dec 2015 15:29:48 +0100 Fri, 04 Dec 2015 15:29:48 +0100 1 {scheduler } scheduled Successfully assigned httpd-php to labcr3
Fri, 04 Dec 2015 15:31:53 +0100 Fri, 04 Dec 2015 15:38:09 +0100 4 {kubelet labcr3} failedMount Unable to mount volumes for pod "httpd-php_default": exit status 32
Fri, 04 Dec 2015 15:31:53 +0100 Fri, 04 Dec 2015 15:38:09 +0100 4 {kubelet labcr3} failedSync Error syncing pod, skipping: exit status 32
Kubelet logs on the minion:
-- Logs begin at ven 2015-12-04 14:25:39 CET, end at ven 2015-12-04 15:34:39 CET. --
dic 04 15:33:58 LABCR3 kubelet[1423]: E1204 15:33:58.986220 1423 pod_workers.go:111] Error syncing pod 7915461d-9a93-11e5-b8eb-d4bed9b48f94, skipping:
dic 04 15:33:58 LABCR3 kubelet[1423]: E1204 15:33:58.973581 1423 kubelet.go:1190] Unable to mount volumes for pod "httpd-php_default": exit status 32;
dic 04 15:33:58 LABCR3 kubelet[1423]: Output: mount.nfs: Connection timed out
dic 04 15:33:58 LABCR3 kubelet[1423]: Mounting arguments: labsn1:/mnt/VOL1/nfs-ds /var/lib/kubelet/pods/7915461d-9a93-11e5-b8eb-d4bed9b48f94/volumes/kube
dic 04 15:33:58 LABCR3 kubelet[1423]: E1204 15:33:58.973484 1423 mount_linux.go:103] Mount failed: exit status 32
dic 04 15:31:53 LABCR3 kubelet[1423]: E1204 15:31:53.939521 1423 pod_workers.go:111] Error syncing pod 7915461d-9a93-11e5-b8eb-d4bed9b48f94, skipping:
dic 04 15:31:53 LABCR3 kubelet[1423]: E1204 15:31:53.927865 1423 kubelet.go:1190] Unable to mount volumes for pod "httpd-php_default": exit status 32;
dic 04 15:31:53 LABCR3 kubelet[1423]: Output: mount.nfs: Connection timed out
dic 04 15:31:53 LABCR3 kubelet[1423]: Mounting arguments: labsn1:/mnt/VOL1/nfs-ds /var/lib/kubelet/pods/7915461d-9a93-11e5-b8eb-d4bed9b48f94/volumes/kube
dic 04 15:31:53 LABCR3 kubelet[1423]: E1204 15:31:53.927760 1423 mount_linux.go:103] Mount failed: exit status 32
dic 04 15:21:31 LABCR3 kubelet[1423]: E1204 15:21:31.119117 1423 reflector.go:183] watch of *api.Service ended with: 401: The event in requested in
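One observation from the logs above: the kubelet is trying to mount labsn1:/mnt/VOL1/nfs-ds, while the mount that works on the host is 192.168.240.1:/mnt/VOL1/nfs-ds, so the hostname labsn1 may not resolve or be reachable from the minion. A quick sanity check is to try the kubelet's exact mount by hand on the minion (a sketch; /mnt/nfstest is an arbitrary scratch directory):
showmount -e labsn1
mkdir -p /mnt/nfstest && mount -t nfs labsn1:/mnt/VOL1/nfs-ds /mnt/nfstest
If this also times out, the problem is NFS/DNS reachability rather than Kubernetes.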
UPDATE 12/18/2015
Yes, this is the first pod with an NFS mount.
NFS is mounted on the minion, so it is working fine, and there is no ACL applied to the NFS export.
This morning I changed the config like this:
apiVersion: v1
kind: Pod
metadata:
  name: httpd-php
  labels:
    name: httpd-php
    cont: httpd-php-mxs
spec:
  containers:
  - name: httpd-php-mxs
    image: "centos7-httpd-php-alive-new"
    command: ["/usr/sbin/httpd","-DFOREGROUND"]
    imagePullPolicy: "Never"
    ports:
    - containerPort: 80
    volumeMounts:
    - mountPath: "/NFS/MAXISPORT/DOCROOT"
      name: mynfs
  volumes:
  - name: mynfs
    persistentVolumeClaim:
      claimName: nfs-claim
[root@labcr1 KUBE]# cat pv/nfspv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nfspv
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /mnt/VOL1/nfs-ds
    server: 10.99.1.202
[root@labcr1 KUBE]# cat pv/nfspvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
[root@labcr1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
httpd-php 1/1 Running 1 29m
[root@labcr1 ~]# kubectl get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON
nfspv <none> 536870912000 RWX Bound default/nfs-claim
[root@labcr1 ~]# kubectl get pvc
NAME LABELS STATUS VOLUME
nfs-claim map[] Bound nfspv
[root@labcr1 ~]# kubectl describe pod httpd-php
Name: httpd-php
Namespace: default
Image(s): centos7-httpd-php-alive-new
Node: labcr3/10.99.1.203
Labels: cont=httpd-php-mxs,name=httpd-php
Status: Running
Reason:
Message:
IP: 172.17.76.2
Replication Controllers: <none>
Containers:
httpd-php-mxs:
Image: centos7-httpd-php-alive-new
State: Running
Started: Fri, 18 Dec 2015 15:12:39 +0100
Ready: True
Restart Count: 1
Conditions:
Type Status
Ready True
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {kubelet labcr3} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {scheduler } scheduled Successfully assigned httpd-php to labcr3
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {kubelet labcr3} implicitly required container POD created Created with docker id 3b6d520e66b0
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {kubelet labcr3} implicitly required container POD started Started with docker id 3b6d520e66b0
Fri, 18 Dec 2015 14:44:21 +0100 Fri, 18 Dec 2015 14:44:21 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} started Started with docker id 859ea73a6cdd
Fri, 18 Dec 2015 14:44:21 +0100 Fri, 18 Dec 2015 14:44:21 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} created Created with docker id 859ea73a6cdd
Fri, 18 Dec 2015 15:12:38 +0100 Fri, 18 Dec 2015 15:12:38 +0100 1 {kubelet labcr3} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 18 Dec 2015 15:12:38 +0100 Fri, 18 Dec 2015 15:12:38 +0100 1 {kubelet labcr3} implicitly required container POD created Created with docker id bdfed6fd4c97
Fri, 18 Dec 2015 15:12:38 +0100 Fri, 18 Dec 2015 15:12:38 +0100 1 {kubelet labcr3} implicitly required container POD started Started with docker id bdfed6fd4c97
Fri, 18 Dec 2015 15:12:39 +0100 Fri, 18 Dec 2015 15:12:39 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} created Created with docker id ab3a39784b4e
Fri, 18 Dec 2015 15:12:39 +0100 Fri, 18 Dec 2015 15:12:39 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} started Started with docker id ab3a39784b4e
Now everything is up and running, but inside the pod I can't see the NFS volume mounted:
[root@httpd-php /]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-253:2-3413-f90c89a604c59b6aacbf95af649ef48eb4df829fdfa4b51eee4bfe75a6a156c3 99G 444M 93G 1% /
tmpfs 5.9G 0 5.9G 0% /dev
shm 64M 0 64M 0% /dev/shm
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
tmpfs 5.9G 0 5.9G 0% /run/secrets
/dev/mapper/centos_labcr2-var 248G 3.8G 244G 2% /etc/hosts
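For reference, the kubelet performs the NFS mount on the node and then bind-mounts it into the container, so there are two places to check (a sketch; the pod UID and path layout are illustrative, and the container ID is taken from the events above):
# on the minion: the kubelet's mount point for the pod volume
mount | grep nfs-ds
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/
# the view from inside the running container
docker exec ab3a39784b4e df -h /NFS/MAXISPORT/DOCROOT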

Related

Kubernetes pod stuck pending, but lacks events that tell me why

I have a simple alpine:node Kubernetes pod attempting to start from a deployment, on a cluster with a large surplus of resources on every node. It's failing to move out of Pending status. When I run kubectl describe, I get no events that explain why this is happening. What are the next steps for debugging a problem like this?
Here are some commands:
kubectl get events
60m Normal SuccessfulCreate replicaset/frontend-r0ktmgn9-dcc95dfd8 Created pod: frontend-r0ktmgn9-dcc95dfd8-8wn9j
36m Normal ScalingReplicaSet deployment/frontend-r0ktmgn9 Scaled down replica set frontend-r0ktmgn9-6d57cb8698 to 0
36m Normal SuccessfulDelete replicaset/frontend-r0ktmgn9-6d57cb8698 Deleted pod: frontend-r0ktmgn9-6d57cb8698-q52h8
36m Normal ScalingReplicaSet deployment/frontend-r0ktmgn9 Scaled up replica set frontend-r0ktmgn9-58cd8f4c79 to 1
36m Normal SuccessfulCreate replicaset/frontend-r0ktmgn9-58cd8f4c79 Created pod: frontend-r0ktmgn9-58cd8f4c79-fn5q4
kubectl describe po/frontend-r0ktmgn9-58cd8f4c79-fn5q4 (some parts redacted)
Name: frontend-r0ktmgn9-58cd8f4c79-fn5q4
Namespace: default
Priority: 0
Node: <none>
Labels: app=frontend
pod-template-hash=58cd8f4c79
Annotations: kubectl.kubernetes.io/restartedAt: 2021-05-14T20:02:11-05:00
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/frontend-r0ktmgn9-58cd8f4c79
Containers:
frontend:
Image: [Redacted]
Port: 3000/TCP
Host Port: 0/TCP
Environment: [Redacted]
Mounts: <none>
Volumes: <none>
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
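For anyone else debugging: two generic checks that can still surface a scheduling reason when describe shows no events (a sketch; the pod name is the one above):
kubectl get pod frontend-r0ktmgn9-58cd8f4c79-fn5q4 -o jsonpath='{.status.conditions}'
kubectl get events --all-namespaces --field-selector involvedObject.name=frontend-r0ktmgn9-58cd8f4c79-fn5q4
Since Node is <none>, the pod was never scheduled, and the PodScheduled condition often carries the scheduler's message.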
I use loft virtual clusters, so the above commands were run in a virtual cluster context, where this pod's deployment is the only resource. When run from the main cluster itself:
kubectl describe nodes
Name: autoscale-pool-01-8bwo1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=g-8vcpu-32gb
beta.kubernetes.io/os=linux
doks.digitalocean.com/node-id=d7c71f70-35bd-4854-9527-28f56adfb4c4
doks.digitalocean.com/node-pool=autoscale-pool-01
doks.digitalocean.com/node-pool-id=c31388cc-29c8-4fb9-9c52-c309dba972d3
doks.digitalocean.com/version=1.20.2-do.0
failure-domain.beta.kubernetes.io/region=nyc1
kubernetes.io/arch=amd64
kubernetes.io/hostname=autoscale-pool-01-8bwo1
kubernetes.io/os=linux
node.kubernetes.io/instance-type=g-8vcpu-32gb
region=nyc1
topology.kubernetes.io/region=nyc1
wireguard_capable=false
Annotations: alpha.kubernetes.io/provided-node-ip: 10.116.0.3
csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"246129007"}
io.cilium.network.ipv4-cilium-host: 10.244.0.171
io.cilium.network.ipv4-health-ip: 10.244.0.198
io.cilium.network.ipv4-pod-cidr: 10.244.0.128/25
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 14 May 2021 19:56:44 -0500
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: autoscale-pool-01-8bwo1
AcquireTime: <unset>
RenewTime: Fri, 14 May 2021 21:33:44 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 14 May 2021 19:57:01 -0500 Fri, 14 May 2021 19:57:01 -0500 CiliumIsUp Cilium is running on this node
MemoryPressure False Fri, 14 May 2021 21:30:33 -0500 Fri, 14 May 2021 19:56:44 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 14 May 2021 21:30:33 -0500 Fri, 14 May 2021 19:56:44 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 14 May 2021 21:30:33 -0500 Fri, 14 May 2021 19:56:44 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 14 May 2021 21:30:33 -0500 Fri, 14 May 2021 19:57:04 -0500 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
Hostname: autoscale-pool-01-8bwo1
InternalIP: 10.116.0.3
ExternalIP: 134.122.31.92
Capacity:
cpu: 8
ephemeral-storage: 103176100Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32941864Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 95087093603
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 29222Mi
pods: 110
System Info:
Machine ID: a98e294e721847469503cd531b9bc88e
System UUID: a98e294e-7218-4746-9503-cd531b9bc88e
Boot ID: a16de75d-7532-441d-885a-de90fb2cb286
Kernel Version: 4.19.0-11-amd64
OS Image: Debian GNU/Linux 10 (buster)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.4.3
Kubelet Version: v1.20.2
Kube-Proxy Version: v1.20.2
ProviderID: digitalocean://246129007
Non-terminated Pods: (28 in total) [Redacted]
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2727m (34%) 3202m (40%)
memory 9288341376 (30%) 3680Mi (12%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
Name: autoscale-pool-02-8mly8
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m-2vcpu-16gb
beta.kubernetes.io/os=linux
doks.digitalocean.com/node-id=eb0f7d72-d183-4953-af0c-36a88bc64921
doks.digitalocean.com/node-pool=autoscale-pool-02
doks.digitalocean.com/node-pool-id=18a37926-d208-4ab9-b17d-b3f9acb3ce0f
doks.digitalocean.com/version=1.20.2-do.0
failure-domain.beta.kubernetes.io/region=nyc1
kubernetes.io/arch=amd64
kubernetes.io/hostname=autoscale-pool-02-8mly8
kubernetes.io/os=linux
node.kubernetes.io/instance-type=m-2vcpu-16gb
region=nyc1
topology.kubernetes.io/region=nyc1
wireguard_capable=true
Annotations: alpha.kubernetes.io/provided-node-ip: 10.116.0.12
csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"237830322"}
io.cilium.network.ipv4-cilium-host: 10.244.3.115
io.cilium.network.ipv4-health-ip: 10.244.3.96
io.cilium.network.ipv4-pod-cidr: 10.244.3.0/25
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 20 Mar 2021 18:14:37 -0500
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: autoscale-pool-02-8mly8
AcquireTime: <unset>
RenewTime: Fri, 14 May 2021 21:33:44 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 06 Apr 2021 16:24:45 -0500 Tue, 06 Apr 2021 16:24:45 -0500 CiliumIsUp Cilium is running on this node
MemoryPressure False Fri, 14 May 2021 21:33:35 -0500 Tue, 13 Apr 2021 18:40:21 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 14 May 2021 21:33:35 -0500 Wed, 05 May 2021 15:16:08 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 14 May 2021 21:33:35 -0500 Tue, 06 Apr 2021 16:24:40 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 14 May 2021 21:33:35 -0500 Tue, 06 Apr 2021 16:24:49 -0500 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
Hostname: autoscale-pool-02-8mly8
InternalIP: 10.116.0.12
ExternalIP: 157.230.208.24
Capacity:
cpu: 2
ephemeral-storage: 51570124Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16427892Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47527026200
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 13862Mi
pods: 110
System Info:
Machine ID: 7c8d577266284fa09f84afe03296abe8
System UUID: cf5f4cc0-17a8-4fae-b1ab-e0488675ae06
Boot ID: 6698c614-76a0-484c-bb23-11d540e0e6f3
Kernel Version: 4.19.0-16-amd64
OS Image: Debian GNU/Linux 10 (buster)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.4.4
Kubelet Version: v1.20.5
Kube-Proxy Version: v1.20.5
ProviderID: digitalocean://237830322
Non-terminated Pods: (73 in total) [Redacted]
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1202m (60%) 202m (10%)
memory 2135Mi (15%) 5170Mi (37%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>

How do I change the Kubernetes DiskPressure status from true to false?

After creating a simple nginx deployment, my pod status shows as Pending. When I run the kubectl get pods command, I get the following:
NAME READY STATUS RESTARTS AGE
nginx-deployment-6b474476c4-dq26w 0/1 Pending 0 50m
nginx-deployment-6b474476c4-wjblx 0/1 Pending 0 50m
If I check on my node health, I get:
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: kubernetes-master
AcquireTime: <unset>
RenewTime: Wed, 05 Aug 2020 12:43:57 +0530
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 05 Aug 2020 09:12:31 +0530 Wed, 05 Aug 2020 09:12:31 +0530 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 05 Aug 2020 12:43:36 +0530 Tue, 04 Aug 2020 23:01:43 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 05 Aug 2020 12:43:36 +0530 Tue, 04 Aug 2020 23:02:06 +0530 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 05 Aug 2020 12:43:36 +0530 Tue, 04 Aug 2020 23:01:43 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 05 Aug 2020 12:43:36 +0530 Tue, 04 Aug 2020 23:02:06 +0530 KubeletReady kubelet is posting ready status. AppArmor enabled
You can remove the disk-pressure taint using the command below, but ideally you should investigate why the kubelet is reporting disk pressure. The node may be out of disk space.
kubectl taint nodes <nodename> node.kubernetes.io/disk-pressure-
This will get the nginx pods out of the Pending state.
@manjeet,
What's the output of df -kh on the node?
Find the disk/partition/PV that is under pressure, increase it, then restart the kubelet and remove the taint. Things should work.
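A sketch of that sequence (the node name is a placeholder; pruning unused images is just one way to free space):
df -kh                    # find the filesystem that is nearly full
docker image prune -a     # free space, or grow the disk/partition
systemctl restart kubelet
kubectl taint nodes <nodename> node.kubernetes.io/disk-pressure-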

How do I find out what image is running in a Kubernetes VM on GCE?

I've created a Kubernetes cluster in Google Compute Engine using cluster/kube-up.sh. How can I find out what Linux image GCE used to create the virtual machines? I've logged into some nodes using SSH, and the usual commands (uname -a, etc.) don't tell me.
The default config file at kubernetes/cluster/gce/config-default.sh doesn't seem to offer any clues.
It uses something called the Google Container-VM Image. Check out the blog post announcing it here:
https://cloudplatform.googleblog.com/2016/09/introducing-Google-Container-VM-Image.html
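You can also read the image straight off the node objects with kubectl; status.nodeInfo.osImage is the relevant field (a sketch):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\n"}{end}'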
There are two simple ways to look at it:
In the Kubernetes GUI-based dashboard, click on the nodes.
From the command line of the Kubernetes master node, use kubectl describe pods/{pod-name}.
(Make sure to select the correct namespace, if you are using any.)
Here is a sample output; look at the "Image" field of the output:
kubectl describe pods/fedoraapache
Name: fedoraapache
Namespace: default
Image(s): fedora/apache
Node: 127.0.0.1/127.0.0.1
Labels: name=fedoraapache
Status: Running
Reason:
Message:
IP: 172.17.0.2
Replication Controllers: <none>
Containers:
fedoraapache:
Image: fedora/apache
State: Running
Started: Thu, 06 Aug 2015 03:38:37 -0400
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Thu, 06 Aug 2015 03:38:35 -0400 Thu, 06 Aug 2015 03:38:35 -0400 1 {scheduler } scheduled Successfully assigned fedoraapache to 127.0.0.1
Thu, 06 Aug 2015 03:38:35 -0400 Thu, 06 Aug 2015 03:38:35 -0400 1 {kubelet 127.0.0.1} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Thu, 06 Aug 2015 03:38:36 -0400 Thu, 06 Aug 2015 03:38:36 -0400 1 {kubelet 127.0.0.1} implicitly required container POD created Created with docker id 98aeb13c657b
Thu, 06 Aug 2015 03:38:36 -0400 Thu, 06 Aug 2015 03:38:36 -0400 1 {kubelet 127.0.0.1} implicitly required container POD started Started with docker id 98aeb13c657b
Thu, 06 Aug 2015 03:38:37 -0400 Thu, 06 Aug 2015 03:38:37 -0400 1 {kubelet 127.0.0.1} spec.containers{fedoraapache} created Created with docker id debe7fe1ff4f
Thu, 06 Aug 2015 03:38:37 -0400 Thu, 06 Aug 2015 03:38:37 -0400 1 {kubelet 127.0.0.1} spec.containers{fedoraapache} started Started with docker id debe7fe1ff4f

Kubernetes pod on Google Container Engine continually restarts, is never ready

I'm trying to get a Ghost blog deployed on GKE, working from the "persistent disks with WordPress" tutorial. I have a working container that runs fine manually on a GKE node:
docker run -d --name my-ghost-blog -p 2368:2368 us.gcr.io/my_project_id/my-ghost-blog
I can also correctly create a pod using the following method from another tutorial:
kubectl run ghost --image=us.gcr.io/my_project_id/my-ghost-blog --port=2368
When I do that, I can curl the blog on the internal IP from within the cluster, and I get the following output from kubectl describe pod:
Name: ghosty-nqgt0
Namespace: default
Image(s): us.gcr.io/my_project_id/my-ghost-blog
Node: very-long-node-name/10.240.51.18
Labels: run=ghost
Status: Running
Reason:
Message:
IP: 10.216.0.9
Replication Controllers: ghost (1/1 replicas created)
Containers:
ghosty:
Image: us.gcr.io/my_project_id/my-ghost-blog
Limits:
cpu: 100m
State: Running
Started: Fri, 04 Sep 2015 12:18:44 -0400
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
...
The problem arises when I instead try to create the pod from a YAML file, per the WordPress tutorial. Here's the YAML:
apiVersion: v1
kind: Pod
metadata:
  name: ghost
  labels:
    name: ghost
spec:
  containers:
  - image: us.gcr.io/my_project_id/my-ghost-blog
    name: ghost
    env:
    - name: NODE_ENV
      value: production
    - name: VIRTUAL_HOST
      value: myghostblog.com
    ports:
    - containerPort: 2368
When I run kubectl create -f ghost.yaml, the pod is created, but is never ready:
> kubectl get pod ghost
NAME READY STATUS RESTARTS AGE
ghost 0/1 Running 11 3m
The pod continuously restarts, as confirmed by the output of kubectl describe pod ghost:
Name: ghost
Namespace: default
Image(s): us.gcr.io/my_project_id/my-ghost-blog
Node: very-long-node-name/10.240.51.18
Labels: name=ghost
Status: Running
Reason:
Message:
IP: 10.216.0.12
Replication Controllers: <none>
Containers:
ghost:
Image: us.gcr.io/my_project_id/my-ghost-blog
Limits:
cpu: 100m
State: Running
Started: Fri, 04 Sep 2015 14:08:20 -0400
Ready: False
Restart Count: 10
Conditions:
Type Status
Ready False
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 04 Sep 2015 14:03:20 -0400 Fri, 04 Sep 2015 14:03:20 -0400 1 {scheduler } scheduled Successfully assigned ghost to very-long-node-name
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD created Created with docker id dbbc27b4d280
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD started Started with docker id dbbc27b4d280
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id ceb14ba72929
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id ceb14ba72929
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 04 Sep 2015 14:03:30 -0400 Fri, 04 Sep 2015 14:03:30 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id 0b8957fe9b61
Fri, 04 Sep 2015 14:03:30 -0400 Fri, 04 Sep 2015 14:03:30 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id 0b8957fe9b61
Fri, 04 Sep 2015 14:03:40 -0400 Fri, 04 Sep 2015 14:03:40 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id edaf0df38c01
Fri, 04 Sep 2015 14:03:40 -0400 Fri, 04 Sep 2015 14:03:40 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id edaf0df38c01
Fri, 04 Sep 2015 14:03:50 -0400 Fri, 04 Sep 2015 14:03:50 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id d33f5e5a9637
...
This cycle of created/started goes on forever unless I kill the pod. The only difference from the successful pod is the lack of a replication controller, but I don't expect that to be the problem, since the tutorial mentions nothing about one.
Why is this happening? How can I create a working pod from a config file? And where would I find more verbose logs about what is going on?
If the same docker image is working via kubectl run but not working in a pod, then something is wrong with the pod spec. Compare the full output of the pod as created from spec and as created by rc to see what differs by running kubectl get pods <name> -o yaml for both. Shot in the dark: is it possible the env vars specified in the pod spec are causing it to crash on startup?
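Concretely, that comparison could look something like this (pod names taken from the question; file names are arbitrary):
kubectl get pod ghost -o yaml > pod-from-spec.yaml
kubectl get pod ghosty-nqgt0 -o yaml > pod-from-rc.yaml
diff pod-from-spec.yaml pod-from-rc.yaml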
Maybe you could use a different restartPolicy in the YAML file?
What you have, I believe, is equivalent to
restartPolicy: Never
with no replication controller. You may try adding this line to the YAML and setting it to Always (and this will provide you with an RC), or to OnFailure.
https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/pod-states.md#restartpolicy
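For illustration, restartPolicy sits at the top level of the pod spec, alongside containers (a sketch based on the ghost pod above):
spec:
  restartPolicy: OnFailure
  containers:
  - image: us.gcr.io/my_project_id/my-ghost-blog
    name: ghost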
Container logs may also be useful; check them with kubectl logs:
Usage:
kubectl logs [-p] POD [-c CONTAINER]
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_logs.html
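For example, to see the output of the previous (crashed) container instance of the ghost pod, using the -p flag from the usage above:
kubectl logs -p ghost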

Kubernetes pod status always "Pending"

I am following the Fedora getting-started guide (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/fedora/fedora_ansible_config.md) and trying to run the pod fedoraapache, but kubectl always shows fedoraapache as Pending:
POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS
fedoraapache fedoraapache fedora/apache 192.168.226.144/192.168.226.144 name=fedoraapache Pending
Since it is pending, I cannot run kubectl logs fedoraapache. So
I instead run kubectl describe pod fedoraapache, which shows the following errors:
Fri, 20 Mar 2015 22:00:05 +0800 Fri, 20 Mar 2015 22:00:05 +0800 1 {kubelet 192.168.226.144} implicitly required container POD created Created with docker id d4877bdffd4f2a13a17d4cc93c27c1c93d5494807b39ee8a823f5d9350e404d4
Fri, 20 Mar 2015 22:00:05 +0800 Fri, 20 Mar 2015 22:00:05 +0800 1 {kubelet 192.168.226.144} failedSync Error syncing pod, skipping: API error (500): Cannot start container d4877bdffd4f2a13a17d4cc93c27c1c93d5494807b39ee8a823f5d9350e404d4: (exit status 1)
Fri, 20 Mar 2015 22:00:15 +0800 Fri, 20 Mar 2015 22:00:15 +0800 1 {kubelet 192.168.226.144} implicitly required container POD created Created with docker id 1c32b4c6e1aad0e575f6a155aebefcd5dd96857b12c47a63bfd8562fba961747
Fri, 20 Mar 2015 22:00:15 +0800 Fri, 20 Mar 2015 22:00:15 +0800 1 {kubelet 192.168.226.144} implicitly required container POD failed Failed to start with docker id 1c32b4c6e1aad0e575f6a155aebefcd5dd96857b12c47a63bfd8562fba961747 with error: API error (500): Cannot start container 1c32b4c6e1aad0e575f6a155aebefcd5dd96857b12c47a63bfd8562fba961747: (exit status 1)
Fri, 20 Mar 2015 22:00:15 +0800 Fri, 20 Mar 2015 22:00:15 +0800 1 {kubelet 192.168.226.144} failedSync Error syncing pod, skipping: API error (500): Cannot start container 1c32b4c6e1aad0e575f6a155aebefcd5dd96857b12c47a63bfd8562fba961747: (exit status 1)
Fri, 20 Mar 2015 22:00:25 +0800 Fri, 20 Mar 2015 22:00:25 +0800 1 {kubelet 192.168.226.144} failedSync Error syncing pod, skipping: API error (500): Cannot start container 8b117ee5c6bf13f0e97b895c367ce903e2a9efbd046a663c419c389d9953c55e: (exit status 1)
Fri, 20 Mar 2015 22:00:25 +0800 Fri, 20 Mar 2015 22:00:25 +0800 1 {kubelet 192.168.226.144} implicitly required container POD created Created with docker id 8b117ee5c6bf13f0e97b895c367ce903e2a9efbd046a663c419c389d9953c55e
Fri, 20 Mar 2015 22:00:25 +0800 Fri, 20 Mar 2015 22:00:25 +0800 1 {kubelet 192.168.226.144} implicitly required container POD failed Failed to start with docker id 8b117ee5c6bf13f0e97b895c367ce903e2a9efbd046a663c419c389d9953c55e with error: API error (500): Cannot start container 8b117ee5c6bf13f0e97b895c367ce903e2a9efbd046a663c419c389d9953c55e: (exit status 1)
Fri, 20 Mar 2015 22:00:35 +0800 Fri, 20 Mar 2015 22:00:35 +0800 1 {kubelet 192.168.226.144} implicitly required container POD failed Failed to start with docker id 4b463040842b6a45db2ab154652fd2a27550dbd2e1a897c98473cd0b66d2d614 with error: API error (500): Cannot start container 4b463040842b6a45db2ab154652fd2a27550dbd2e1a897c98473cd0b66d2d614: (exit status 1)
Fri, 20 Mar 2015 22:00:35 +0800 Fri, 20 Mar 2015 22:00:35 +0800 1 {kubelet 192.168.226.144} implicitly required container POD created Created with docker id 4b463040842b6a45db2ab154652fd2a27550dbd2e1a897c98473cd0b66d2d614
Fri, 20 Mar 2015 21:42:35 +0800 Fri, 20 Mar 2015 22:00:35 +0800 109 {kubelet 192.168.226.144} implicitly required container POD pulled Successfully pulled image "kubernetes/pause:latest"
Fri, 20 Mar 2015 22:00:35 +0800 Fri, 20 Mar 2015 22:00:35 +0800 1 {kubelet 192.168.226.144} failedSync Error syncing pod, skipping: API error (500): Cannot start container 4b463040842b6a45db2ab154652fd2a27550dbd2e1a897c98473cd0b66d2d614: (exit status 1)
There are several reasons a container can fail to start:
The container command itself fails and exits -> check your Docker image and startup script to make sure they work. Use
sudo docker ps -a to find the offending container and
sudo docker logs <container> to check for failures inside the container.
A dependency is not there: that happens, for example, when one tries to mount a volume that is not present, such as Secrets that have not been created yet
--> make sure the dependent volumes are created.
The kubelet is unable to start the container we use for holding the network namespace. Some things to try (see the sketch after this list):
Can you manually pull and run gcr.io/google_containers/pause:0.8.0? (This is the image used for the network-namespace container at head right now.)
As mentioned already, /var/log/kubelet.log should have more detail; but the log location is distro-dependent, so check https://github.com/GoogleCloudPlatform/kubernetes/wiki/Debugging-FAQ#checking-logs.
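A sketch of that manual check on the node (image name as given above):
sudo docker pull gcr.io/google_containers/pause:0.8.0
sudo docker run gcr.io/google_containers/pause:0.8.0
sudo docker ps -a | head     # then sudo docker logs <id> on any failed container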
Step one is to describe the pod and see the problems:
$ kubectl describe pod <pod-name>
Or, if you are using the master node, you can configure it to run pods with:
$ kubectl taint nodes --all node-role.kubernetes.io/master-
Check whether the kubelet is running on your machine. I came across this problem once and discovered that the kubelet was not running, which explains why the pod status was stuck in "Pending". The kubelet runs as a systemd service in my environment, so if that is also the case for you, the following command will help you check its status:
systemctl status kubelet
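If it turns out not to be running, starting it and reading its journal usually shows why it stopped (assuming a systemd-based setup, as above):
sudo systemctl start kubelet
journalctl -u kubelet -e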