Kubernetes cronjob never scheduling, no errors

I have a kubernetes cluster on 1.18:
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
I am following the documentation for 1.18 cronjobs. I have the following yaml saved in hello_world.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
I created the cronjob with
kubectl create -f hello_world.yaml
cronjob.batch/hello created
However the jobs are never scheduled, despite the cronjob being created:
kubectl get cronjobs
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        <none>          5m48s
kubectl get jobs
NAME                    COMPLETIONS   DURATION   AGE
not-my-job-1624413720   1/1           43m        7d11h
not-my-job-1624500120   1/1           42m        6d11h
not-my-job-1624586520   1/1           43m        5d11h
I notice that the last job ran 5 days ago, when our certificates expired, resulting in developers getting the following error:
"Unable to connect to the server: x509: certificate has expired or is not yet valid"
We regenerated the certs using the following procedure from IBM, which seemed to work at the time. These are the main commands; we also backed up configuration files etc. as per the linked doc:
kubeadm alpha certs renew all
systemctl daemon-reload && systemctl restart kubelet
I am sure the certificate expiration and renewal has caused some issue, but I see no smoking gun.
kubectl describe cronjob hello
Name: hello
Namespace: default
Labels: <none>
Annotations: <none>
Schedule: */1 * * * *
Concurrency Policy: Allow
Suspend: False
Successful Job History Limit: 3
Failed Job History Limit: 1
Starting Deadline Seconds: <unset>
Selector: <unset>
Parallelism: <unset>
Completions: <unset>
Pod Template:
Labels: <none>
Containers:
hello:
Image: busybox
Port: <none>
Host Port: <none>
Args:
/bin/sh
-c
date; echo Hello from the Kubernetes cluster
Environment: <none>
Mounts: <none>
Volumes: <none>
Last Schedule Time: <unset>
Active Jobs: <none>
Events: <none>
Any help would be greatly appreciated! Thanks.
EDIT: here is some more info:
sudo kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 30, 2022 13:31 UTC   364d                                    no
apiserver                  Jun 30, 2022 13:31 UTC   364d            ca                      no
apiserver-etcd-client      Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Jun 30, 2022 13:31 UTC   364d            ca                      no
controller-manager.conf    Jun 30, 2022 13:31 UTC   364d                                    no
etcd-healthcheck-client    Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-peer                  Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-server                Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
front-proxy-client         Jun 30, 2022 13:31 UTC   364d            front-proxy-ca          no
scheduler.conf             Jun 30, 2022 13:31 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jun 23, 2030 13:21 UTC   8y              no
etcd-ca                 Jun 23, 2030 13:21 UTC   8y              no
front-proxy-ca          Jun 23, 2030 13:21 UTC   8y              no
ls -alt /etc/kubernetes/pki/
total 68
-rw-r--r-- 1 root root 1058 Jun 30 13:31 front-proxy-client.crt
-rw------- 1 root root 1679 Jun 30 13:31 front-proxy-client.key
-rw-r--r-- 1 root root 1099 Jun 30 13:31 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1090 Jun 30 13:31 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1229 Jun 30 13:31 apiserver.crt
-rw------- 1 root root 1679 Jun 30 13:31 apiserver.key
drwxr-xr-x 4 root root 4096 Sep 9 2020 ..
drwxr-xr-x 3 root root 4096 Jun 25 2020 .
-rw------- 1 root root 1675 Jun 25 2020 sa.key
-rw------- 1 root root 451 Jun 25 2020 sa.pub
drwxr-xr-x 2 root root 4096 Jun 25 2020 etcd
-rw-r--r-- 1 root root 1038 Jun 25 2020 front-proxy-ca.crt
-rw------- 1 root root 1675 Jun 25 2020 front-proxy-ca.key
-rw-r--r-- 1 root root 1025 Jun 25 2020 ca.crt
-rw------- 1 root root 1679 Jun 25 2020 ca.key

I found a solution to this one after trying a lot of different things; I forgot to update the question at the time. The certs were renewed after they had already expired. I guess this stopped a synchronisation of the certs across the different components in the cluster, and nothing could talk to the API.
This is a three-node cluster. I cordoned the worker nodes, stopped the kubelet service on them, stopped the Docker containers and the Docker service, started new Docker containers, started the kubelet, uncordoned the nodes, and then carried out the same procedure on the master node. This forced the synchronisation of certs and keys across the different components.
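Roughly, the per-node sequence looked like the following (a sketch; Docker as the container runtime and the node names are assumptions from my setup):
# from the master, for each worker node:
kubectl cordon <worker-node>
# on the worker node itself:
systemctl stop kubelet
docker stop $(docker ps -q)   # stop the running containers
systemctl restart docker      # restart the runtime; fresh containers come up with the new certs
systemctl start kubelet
# back on the master:
kubectl uncordon <worker-node>
# finally, repeat the stop/restart steps on the master node itself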

We encountered this just now: we updated the certificates and cronjobs stopped scheduling.
We only needed to restart the master node, which saved us a lot of hassle, so try that first.

Related

Kubelet certificate expired but worker nodes working fine, when will we see the issue

In my v1.23.1 test cluster I see that a worker node certificate expired some time ago, but the worker node is still taking workload and is in Ready status.
How is this certificate being used, and when will we see an issue from the expired certificate?
# curl -v https://localhost:10250 -k 2>&1 |grep 'expire date'
* expire date: Oct 4 18:02:14 2021 GMT
# openssl x509 -text -noout -in /var/lib/kubelet/pki/kubelet.crt |grep -A2 'Validity'
Validity
Not Before: Oct 4 18:02:14 2020 GMT
Not After : Oct 4 18:02:14 2021 GMT
Update 1:
The cluster is running on-prem with CentOS Stream 8, built with the kubeadm tool. I was able to schedule workload on all the worker nodes: I created an nginx deployment and scaled it to 50 pods, and I can see nginx pods on all the worker nodes.
I can also reboot the worker nodes without any issue.
Update 2:
kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0303 11:17:18.261639 698383 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jan 16, 2023 16:15 UTC   318d            ca                      no
apiserver                  Jan 16, 2023 16:15 UTC   318d            ca                      no
apiserver-kubelet-client   Jan 16, 2023 16:15 UTC   318d            ca                      no
controller-manager.conf    Jan 16, 2023 16:15 UTC   318d            ca                      no
front-proxy-client         Jan 16, 2023 16:15 UTC   318d            front-proxy-ca          no
scheduler.conf             Jan 16, 2023 16:15 UTC   318d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Oct 02, 2030 18:44 UTC   8y              no
front-proxy-ca          Oct 02, 2030 18:44 UTC   8y              no
Thanks
Update 3
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server10 Ready control-plane,master 519d v1.23.1
server11 Ready control-plane,master 519d v1.23.1
server12 Ready control-plane,master 519d v1.23.1
server13 Ready <none> 519d v1.23.1
server14 Ready <none> 519d v1.23.1
server15 Ready <none> 516d v1.23.1
server16 Ready <none> 516d v1.23.1
server17 Ready <none> 516d v1.23.1
server18 Ready <none> 516d v1.23.1
# kubectl get pods -o wide
nginx-dev-8677c757d4-4k9xp 1/1 Running 0 4d12h 10.203.53.19 server17 <none> <none>
nginx-dev-8677c757d4-6lbc6 1/1 Running 0 4d12h 10.203.89.120 server14 <none> <none>
nginx-dev-8677c757d4-ksckf 1/1 Running 0 4d12h 10.203.124.4 server16 <none> <none>
nginx-dev-8677c757d4-lrz9h 1/1 Running 0 4d12h 10.203.124.41 server16 <none> <none>
nginx-dev-8677c757d4-tllx9 1/1 Running 0 4d12h 10.203.151.70 server11 <none> <none>
# grep client /etc/kubernetes/kubelet.conf
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
# ls -ltr /var/lib/kubelet/pki
total 16
-rw------- 1 root root 1679 Oct 4 2020 kubelet.key
-rw-r--r-- 1 root root 2258 Oct 4 2020 kubelet.crt
-rw------- 1 root root 1114 Oct 4 2020 kubelet-client-2020-10-04-14-50-21.pem
lrwxrwxrwx 1 root root 59 Jul 6 2021 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2021-07-06-01-44-10.pem
-rw------- 1 root root 1114 Jul 6 2021 kubelet-client-2021-07-06-01-44-10.pem
Those kubelet certificates are called kubelet serving certificates. They are used when the kubelet acts as a "server" instead of a "client".
For example, the kubelet serves metrics to the metrics server. So if metrics-server is enabled to use secure TLS and those certificates have expired, metrics-server will not be able to establish a proper connection to the kubelet to collect the metrics. If you are using the K8s Dashboard, the Dashboard will not be able to show CPU and memory consumption on the page. That is when you will see an issue from those expired certificates.
Reference: https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#client-and-serving-certificates
Those certificates will not auto-rotate on expiry, and they cannot be rotated with kubeadm certs renew either. To renew them, you need to add serverTLSBootstrap: true to your cluster config. With this set, when the serving certificate expires, the kubelet sends a CSR to the cluster, and you can approve it with kubectl certificate approve.
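A rough sketch of that renewal flow (the ConfigMap location and the CSR name below are illustrative, not from a real cluster):
# KubeletConfiguration (e.g. in the kubeadm kubelet-config ConfigMap), then restart the kubelet:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
# the kubelet then files a serving-certificate CSR that has to be approved manually:
kubectl get csr
kubectl certificate approve csr-xxxxx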

Why doesn't clustering on k8s through redis-ha work?

I'm trying to create a Redis cluster along with Node.js (ioredis/cluster), but it doesn't seem to work.
It's v1.11.8-gke.6 on GKE.
I'm doing exactly what the redis-ha docs say:
~  helm install --set replicas=3 --name redis-test stable/redis-ha
NAME: redis-test
LAST DEPLOYED: Fri Apr 26 00:13:31 2019
NAMESPACE: yt
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
redis-test-redis-ha-configmap 3 0s
redis-test-redis-ha-probes 2 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
redis-test-redis-ha-server-0 0/2 Init:0/1 0 0s
==> v1/Role
NAME AGE
redis-test-redis-ha 0s
==> v1/RoleBinding
NAME AGE
redis-test-redis-ha 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-test-redis-ha ClusterIP None <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-0 ClusterIP 10.7.244.34 <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-1 ClusterIP 10.7.251.35 <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-2 ClusterIP 10.7.252.94 <none> 6379/TCP,26379/TCP 0s
==> v1/ServiceAccount
NAME SECRETS AGE
redis-test-redis-ha 1 0s
==> v1/StatefulSet
NAME READY AGE
redis-test-redis-ha-server 0/3 0s
NOTES:
Redis can be accessed via port 6379 and Sentinel can be accessed via port 26379 on the following DNS name from within your cluster:
redis-test-redis-ha.yt.svc.cluster.local
To connect to your Redis server:
1. Run a Redis pod that you can use as a client:
kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
2. Connect using the Redis CLI:
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
~  k get pods | grep redis-test
redis-test-redis-ha-server-0 2/2 Running 0 1m
redis-test-redis-ha-server-1 2/2 Running 0 1m
redis-test-redis-ha-server-2 2/2 Running 0 54s
~  kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
Defaulting container name to redis.
Use 'kubectl describe pod/redis-test-redis-ha-server-0 -n yt' to see all of the containers in this pod.
/data $ redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
redis-test-redis-ha.yt.svc.cluster.local:6379> set test key
(error) READONLY You can't write against a read only replica.
But in the end, only one random pod is writable when I connect. I checked the logs on a few containers and everything seems fine there. I tried to run cluster info in redis-cli but I get ERR This instance has cluster support disabled everywhere.
Logs:
~  k logs pod/redis-test-redis-ha-server-0 redis
1:C 25 Apr 2019 20:13:43.604 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:13:43.604 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:13:43.604 # Configuration loaded
1:M 25 Apr 2019 20:13:43.606 * Running mode=standalone, port=6379.
1:M 25 Apr 2019 20:13:43.606 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 25 Apr 2019 20:13:43.606 # Server initialized
1:M 25 Apr 2019 20:13:43.606 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 25 Apr 2019 20:13:43.627 * DB loaded from disk: 0.021 seconds
1:M 25 Apr 2019 20:13:43.627 * Ready to accept connections
1:M 25 Apr 2019 20:14:11.801 * Replica 10.7.251.35:6379 asks for synchronization
1:M 25 Apr 2019 20:14:11.801 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are '8311a1ca896e97d5487c07f2adfd7d4ef924f36b' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:11.802 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:17.825 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:17.825 * Background RDB transfer started by pid 55
55:C 25 Apr 2019 20:14:17.826 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:17.926 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:17.926 # Slave 10.7.251.35:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:17.926 * Streamed RDB transfer with replica 10.7.251.35:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:18.828 * Synchronization with replica 10.7.251.35:6379 succeeded
1:M 25 Apr 2019 20:14:42.711 * Replica 10.7.252.94:6379 asks for synchronization
1:M 25 Apr 2019 20:14:42.711 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are 'af453adde824b2280ba66adb40cc765bf390e237' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:42.711 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:48.976 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:48.977 * Background RDB transfer started by pid 125
125:C 25 Apr 2019 20:14:48.978 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:49.077 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:49.077 # Slave 10.7.252.94:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:49.077 * Streamed RDB transfer with replica 10.7.252.94:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:49.761 * Synchronization with replica 10.7.252.94:6379 succeeded
~  k logs pod/redis-test-redis-ha-server-1 redis
1:C 25 Apr 2019 20:14:11.780 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:14:11.781 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:14:11.781 # Configuration loaded
1:S 25 Apr 2019 20:14:11.786 * Running mode=standalone, port=6379.
1:S 25 Apr 2019 20:14:11.791 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 25 Apr 2019 20:14:11.791 # Server initialized
1:S 25 Apr 2019 20:14:11.791 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 25 Apr 2019 20:14:11.792 * DB loaded from disk: 0.001 seconds
1:S 25 Apr 2019 20:14:11.792 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 25 Apr 2019 20:14:11.792 * Ready to accept connections
1:S 25 Apr 2019 20:14:11.792 * Connecting to MASTER 10.7.244.34:6379
1:S 25 Apr 2019 20:14:11.792 * MASTER <-> REPLICA sync started
1:S 25 Apr 2019 20:14:11.792 * Non blocking connect for SYNC fired the event.
1:S 25 Apr 2019 20:14:11.793 * Master replied to PING, replication can continue...
1:S 25 Apr 2019 20:14:11.799 * Trying a partial resynchronization (request c2827ffe011d774db005a44165bac67a7e7f7d85:6006176).
1:S 25 Apr 2019 20:14:17.824 * Full resync from master: af453adde824b2280ba66adb40cc765bf390e237:722
1:S 25 Apr 2019 20:14:17.824 * Discarding previously cached master state.
1:S 25 Apr 2019 20:14:17.852 * MASTER <-> REPLICA sync: receiving streamed RDB from master
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Flushing old data
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Finished with success
What am I missing or is there a better way to do clustering?
Not the best solution, but I figured I can just use Sentinel instead of finding another way (or maybe there is no other way). It has support in most languages so it shouldn't be very hard (except redis-cli; I can't figure out how to query the Sentinel server).
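As a side note, Sentinel can be queried from redis-cli by connecting to the Sentinel port instead of the Redis port; a sketch, assuming the default master group name mymaster used by the chart:
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local -p 26379
redis-test-redis-ha.yt.svc.cluster.local:26379> SENTINEL masters
redis-test-redis-ha.yt.svc.cluster.local:26379> SENTINEL get-master-addr-by-name mymaster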
This is how I got this done with ioredis (Node.js; sorry if you're not familiar with ES6 syntax):
import * as IORedis from 'ioredis';
import Redis from 'ioredis';
import { redisHost, redisPassword, redisPort } from './config';

export function getRedisConfig(): IORedis.RedisOptions {
  // I'm not sure how to set this properly.
  // ioredis/cluster automatically resolves all pods by hostname, but this doesn't,
  // so I have to explicitly specify all pods (or resolve them all by hostname).
  return {
    sentinels: process.env.REDIS_CLUSTER.split(',').map(d => {
      const [host, port = 26379] = d.split(':');
      return { host, port: Number(port) };
    }),
    name: process.env.REDIS_MASTER_NAME || 'mymaster',
    ...(redisPassword ? { password: redisPassword } : {}),
  };
}

export async function initializeRedis() {
  if (process.env.REDIS_CLUSTER) {
    const cluster = new Redis(getRedisConfig());
    return cluster;
  }
  // For the dev environment
  const client = new Redis(redisPort, redisHost);
  if (redisPassword) {
    await client.auth(redisPassword);
  }
  return client;
}
In env:
env:
- name: REDIS_CLUSTER
  value: redis-redis-ha-server-1.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-0.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-2.redis-redis-ha.yt.svc.cluster.local:26379
You may want to protect it with a password.

How do I find out what image is running in a Kubernetes VM on GCE?

I've created a Kubernetes cluster in Google Compute Engine using cluster/kube-up.sh. How can I find out what Linux image GCE used to create the virtual machines? I've logged into some nodes using SSH and the usual commands (uname -a etc) don't tell me.
The default config file at kubernetes/cluster/gce/config-default.sh doesn't seem to offer any clues.
It uses something called the Google Container-VM Image. Check out the blog post announcing it here:
https://cloudplatform.googleblog.com/2016/09/introducing-Google-Container-VM-Image.html
There are two simple ways to look at it:
1. In the Kubernetes GUI-based dashboard, click on the nodes.
2. From the command line of the Kubernetes master node, use kubectl describe pods/{pod-name} (make sure to select the correct namespace, if you are using one).
Here is a sample output; look at the "Image" field of the output (see also the note after the sample output):
kubectl describe pods/fedoraapache
Name: fedoraapache
Namespace: default
Image(s): fedora/apache
Node: 127.0.0.1/127.0.0.1
Labels: name=fedoraapache
Status: Running
Reason:
Message:
IP: 172.17.0.2
Replication Controllers: <none>
Containers:
fedoraapache:
Image: fedora/apache
State: Running
Started: Thu, 06 Aug 2015 03:38:37 -0400
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Thu, 06 Aug 2015 03:38:35 -0400 Thu, 06 Aug 2015 03:38:35 -0400 1 {scheduler } scheduled Successfully assigned fedoraapache to 127.0.0.1
Thu, 06 Aug 2015 03:38:35 -0400 Thu, 06 Aug 2015 03:38:35 -0400 1 {kubelet 127.0.0.1} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Thu, 06 Aug 2015 03:38:36 -0400 Thu, 06 Aug 2015 03:38:36 -0400 1 {kubelet 127.0.0.1} implicitly required container POD created Created with docker id 98aeb13c657b
Thu, 06 Aug 2015 03:38:36 -0400 Thu, 06 Aug 2015 03:38:36 -0400 1 {kubelet 127.0.0.1} implicitly required container POD started Started with docker id 98aeb13c657b
Thu, 06 Aug 2015 03:38:37 -0400 Thu, 06 Aug 2015 03:38:37 -0400 1 {kubelet 127.0.0.1} spec.containers{fedoraapache} created Created with docker id debe7fe1ff4f
Thu, 06 Aug 2015 03:38:37 -0400 Thu, 06 Aug 2015 03:38:37 -0400 1 {kubelet 127.0.0.1} spec.containers{fedoraapache} started Started with docker id debe7fe1ff4f
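Note that kubectl describe pods shows the container images running in pods, not the VM boot image of the node itself, which is what the question asks about. More recent kubectl versions report the node's OS image directly; a sketch, assuming a reasonably current cluster:
kubectl get nodes -o wide                                     # includes an OS-IMAGE column
kubectl describe node <node-name> | grep -A 6 'System Info'  # shows OS Image among other fields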

KUBERNETES PODS: unable to mount NFS volume

I have some trouble mounting an nfs volume inside a docker container.
I have one minion (labcr3) and one pod (httpd-php).
Below you can find all relevant details.
Can you help me ?
Thanks and Regards
The physical host (minion) is able to mount the nfs volume.
[root@LABCR3 ~]# df -h /NFS
Filesystem Size Used Avail Use% Mounted on
192.168.240.1:/mnt/VOL1/nfs-ds 500G 0 500G 0% /NFS
Here is privileged mode enabled for Kubernetes on the minion:
[root@LABCR3 ~]# cat /etc/kubernetes/config | grep PRIV
KUBE_ALLOW_PRIV="--allow_privileged=true"
Describe of pod
[root@labcr1 pods]# kubectl describe pod httpd-php
Name: httpd-php
Namespace: default
Image(s): centos7-httpd-php-alive-new
Node: labcr3/10.99.1.203
Labels: cont=httpd-php-mxs,name=httpd-php
Status: Pending
Reason:
Message:
IP:
Replication Controllers:
Containers:
httpd-php-mxs:
Image: centos7-httpd-php-alive-new
State: Waiting
Reason: Image: centos7-httpd-php-alive-new is ready, container is creating
Ready: False
Restart Count: 0
Conditions:
Type Status
Ready False
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 04 Dec 2015 15:29:48 +0100 Fri, 04 Dec 2015 15:29:48 +0100 1 {scheduler } scheduled Successfully assigned httpd-php to labcr3
Fri, 04 Dec 2015 15:31:53 +0100 Fri, 04 Dec 2015 15:38:09 +0100 4 {kubelet labcr3} failedMount Unable to mount volumes for pod "httpd-php_default": exit status 32
Fri, 04 Dec 2015 15:31:53 +0100 Fri, 04 Dec 2015 15:38:09 +0100 4 {kubelet labcr3} failedSync Error syncing pod, skipping: exit status 32
Kubelet logs on the minion:
-- Logs begin at ven 2015-12-04 14:25:39 CET, end at ven 2015-12-04 15:34:39 CET. --
dic 04 15:33:58 LABCR3 kubelet[1423]: E1204 15:33:58.986220 1423 pod_workers.go:111] Error syncing pod 7915461d-9a93-11e5-b8eb-d4bed9b48f94, skipping:
dic 04 15:33:58 LABCR3 kubelet[1423]: E1204 15:33:58.973581 1423 kubelet.go:1190] Unable to mount volumes for pod "httpd-php_default": exit status 32;
dic 04 15:33:58 LABCR3 kubelet[1423]: Output: mount.nfs: Connection timed out
dic 04 15:33:58 LABCR3 kubelet[1423]: Mounting arguments: labsn1:/mnt/VOL1/nfs-ds /var/lib/kubelet/pods/7915461d-9a93-11e5-b8eb-d4bed9b48f94/volumes/kube
dic 04 15:33:58 LABCR3 kubelet[1423]: E1204 15:33:58.973484 1423 mount_linux.go:103] Mount failed: exit status 32
dic 04 15:31:53 LABCR3 kubelet[1423]: E1204 15:31:53.939521 1423 pod_workers.go:111] Error syncing pod 7915461d-9a93-11e5-b8eb-d4bed9b48f94, skipping:
dic 04 15:31:53 LABCR3 kubelet[1423]: E1204 15:31:53.927865 1423 kubelet.go:1190] Unable to mount volumes for pod "httpd-php_default": exit status 32;
dic 04 15:31:53 LABCR3 kubelet[1423]: Output: mount.nfs: Connection timed out
dic 04 15:31:53 LABCR3 kubelet[1423]: Mounting arguments: labsn1:/mnt/VOL1/nfs-ds /var/lib/kubelet/pods/7915461d-9a93-11e5-b8eb-d4bed9b48f94/volumes/kube
dic 04 15:31:53 LABCR3 kubelet[1423]: E1204 15:31:53.927760 1423 mount_linux.go:103] Mount failed: exit status 32
dic 04 15:21:31 LABCR3 kubelet[1423]: E1204 15:21:31.119117 1423 reflector.go:183] watch of *api.Service ended with: 401: The event in requested in
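Since the kubelet's error is mount.nfs: Connection timed out for labsn1:/mnt/VOL1/nfs-ds, a quick way to reproduce what the kubelet is doing is to run the same NFS mount by hand on the minion; a sketch (the mount point /mnt/nfstest is just an example):
[root@LABCR3 ~]# mkdir -p /mnt/nfstest
[root@LABCR3 ~]# mount -t nfs labsn1:/mnt/VOL1/nfs-ds /mnt/nfstest
[root@LABCR3 ~]# umount /mnt/nfstest
If that also times out, the problem is name resolution or network reachability between the minion and labsn1 rather than Kubernetes itself.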
UPDATE 12/18/2015
Yes, this is the first pod with an NFS mount.
NFS is mounted on the minion, so that is working fine, and there is no ACL applied to the NFS export.
This morning I changed the config like this:
apiVersion: v1
kind: Pod
metadata:
  name: httpd-php
  labels:
    name: httpd-php
    cont: httpd-php-mxs
spec:
  containers:
  - name: httpd-php-mxs
    image: "centos7-httpd-php-alive-new"
    command: ["/usr/sbin/httpd","-DFOREGROUND"]
    imagePullPolicy: "Never"
    ports:
    - containerPort: 80
    volumeMounts:
    - mountPath: "/NFS/MAXISPORT/DOCROOT"
      name: mynfs
  volumes:
  - name: mynfs
    persistentVolumeClaim:
      claimName: nfs-claim
[root@labcr1 KUBE]# cat pv/nfspv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nfspv
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /mnt/VOL1/nfs-ds
    server: 10.99.1.202
[root@labcr1 KUBE]# cat pv/nfspvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
[root@labcr1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
httpd-php 1/1 Running 1 29m
[root@labcr1 ~]# kubectl get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON
nfspv <none> 536870912000 RWX Bound default/nfs-claim
[root@labcr1 ~]# kubectl get pvc
NAME LABELS STATUS VOLUME
nfs-claim map[] Bound nfspv
[root@labcr1 ~]# kubectl describe pod httpd-php
Name: httpd-php
Namespace: default
Image(s): centos7-httpd-php-alive-new
Node: labcr3/10.99.1.203
Labels: cont=httpd-php-mxs,name=httpd-php
Status: Running
Reason:
Message:
IP: 172.17.76.2
Replication Controllers: <none>
Containers:
httpd-php-mxs:
Image: centos7-httpd-php-alive-new
State: Running
Started: Fri, 18 Dec 2015 15:12:39 +0100
Ready: True
Restart Count: 1
Conditions:
Type Status
Ready True
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {kubelet labcr3} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {scheduler } scheduled Successfully assigned httpd-php to labcr3
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {kubelet labcr3} implicitly required container POD created Created with docker id 3b6d520e66b0
Fri, 18 Dec 2015 14:44:20 +0100 Fri, 18 Dec 2015 14:44:20 +0100 1 {kubelet labcr3} implicitly required container POD started Started with docker id 3b6d520e66b0
Fri, 18 Dec 2015 14:44:21 +0100 Fri, 18 Dec 2015 14:44:21 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} started Started with docker id 859ea73a6cdd
Fri, 18 Dec 2015 14:44:21 +0100 Fri, 18 Dec 2015 14:44:21 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} created Created with docker id 859ea73a6cdd
Fri, 18 Dec 2015 15:12:38 +0100 Fri, 18 Dec 2015 15:12:38 +0100 1 {kubelet labcr3} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 18 Dec 2015 15:12:38 +0100 Fri, 18 Dec 2015 15:12:38 +0100 1 {kubelet labcr3} implicitly required container POD created Created with docker id bdfed6fd4c97
Fri, 18 Dec 2015 15:12:38 +0100 Fri, 18 Dec 2015 15:12:38 +0100 1 {kubelet labcr3} implicitly required container POD started Started with docker id bdfed6fd4c97
Fri, 18 Dec 2015 15:12:39 +0100 Fri, 18 Dec 2015 15:12:39 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} created Created with docker id ab3a39784b4e
Fri, 18 Dec 2015 15:12:39 +0100 Fri, 18 Dec 2015 15:12:39 +0100 1 {kubelet labcr3} spec.containers{httpd-php-mxs} started Started with docker id ab3a39784b4e
Now the minion is up and running, but inside the pod I can't see the NFS volume mounted:
[root@httpd-php /]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-253:2-3413-f90c89a604c59b6aacbf95af649ef48eb4df829fdfa4b51eee4bfe75a6a156c3 99G 444M 93G 1% /
tmpfs 5.9G 0 5.9G 0% /dev
shm 64M 0 64M 0% /dev/shm
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
tmpfs 5.9G 0 5.9G 0% /run/secrets
/dev/mapper/centos_labcr2-var 248G 3.8G 244G 2% /etc/hosts

Kubernetes pod on Google Container Engine continually restarts, is never ready

I'm trying to get a ghost blog deployed on GKE, working off of the "persistent disks with WordPress" tutorial. I have a working container that runs fine manually on a GKE node:
docker run -d --name my-ghost-blog -p 2368:2368 -d us.gcr.io/my_project_id/my-ghost-blog
I can also correctly create a pod using the following method from another tutorial:
kubectl run ghost --image=us.gcr.io/my_project_id/my-ghost-blog --port=2368
When I do that I can curl the blog on the internal IP from within the cluster, and get the following output from kubectl describe pod:
Name: ghosty-nqgt0
Namespace: default
Image(s): us.gcr.io/my_project_id/my-ghost-blog
Node: very-long-node-name/10.240.51.18
Labels: run=ghost
Status: Running
Reason:
Message:
IP: 10.216.0.9
Replication Controllers: ghost (1/1 replicas created)
Containers:
ghosty:
Image: us.gcr.io/my_project_id/my-ghost-blog
Limits:
cpu: 100m
State: Running
Started: Fri, 04 Sep 2015 12:18:44 -0400
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
...
The problem arises when I instead try to create the pod from a yaml file, per the Wordpress tutorial. Here's the yaml:
apiVersion: v1
kind: Pod
metadata:
  name: ghost
  labels:
    name: ghost
spec:
  containers:
  - image: us.gcr.io/my_project_id/my-ghost-blog
    name: ghost
    env:
    - name: NODE_ENV
      value: production
    - name: VIRTUAL_HOST
      value: myghostblog.com
    ports:
    - containerPort: 2368
When I run kubectl create -f ghost.yaml, the pod is created, but is never ready:
> kubectl get pod ghost
NAME READY STATUS RESTARTS AGE
ghost 0/1 Running 11 3m
The pod continuously restarts, as confirmed by the output of kubectl describe pod ghost:
Name: ghost
Namespace: default
Image(s): us.gcr.io/my_project_id/my-ghost-blog
Node: very-long-node-name/10.240.51.18
Labels: name=ghost
Status: Running
Reason:
Message:
IP: 10.216.0.12
Replication Controllers: <none>
Containers:
ghost:
Image: us.gcr.io/my_project_id/my-ghost-blog
Limits:
cpu: 100m
State: Running
Started: Fri, 04 Sep 2015 14:08:20 -0400
Ready: False
Restart Count: 10
Conditions:
Type Status
Ready False
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 04 Sep 2015 14:03:20 -0400 Fri, 04 Sep 2015 14:03:20 -0400 1 {scheduler } scheduled Successfully assigned ghost to very-long-node-name
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD created Created with docker id dbbc27b4d280
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD started Started with docker id dbbc27b4d280
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id ceb14ba72929
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id ceb14ba72929
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 04 Sep 2015 14:03:30 -0400 Fri, 04 Sep 2015 14:03:30 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id 0b8957fe9b61
Fri, 04 Sep 2015 14:03:30 -0400 Fri, 04 Sep 2015 14:03:30 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id 0b8957fe9b61
Fri, 04 Sep 2015 14:03:40 -0400 Fri, 04 Sep 2015 14:03:40 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id edaf0df38c01
Fri, 04 Sep 2015 14:03:40 -0400 Fri, 04 Sep 2015 14:03:40 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id edaf0df38c01
Fri, 04 Sep 2015 14:03:50 -0400 Fri, 04 Sep 2015 14:03:50 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id d33f5e5a9637
...
This cycle of created/started goes on forever if I don't kill the pod. The only difference from the successful pod is the lack of a replication controller, but I don't expect that to be the problem, since the tutorial mentions nothing about an rc.
Why is this happening? How can I create a working pod from a config file? And where would I find more verbose logs about what is going on?
If the same Docker image works via kubectl run but not in a pod created from a spec, then something is wrong with the pod spec. Compare the full output of the pod as created from the spec and as created by the rc to see what differs, by running kubectl get pods <name> -o yaml for both. Shot in the dark: is it possible the env vars specified in the pod spec are causing it to crash on startup?
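A concrete way to do that comparison (the file names are arbitrary; pod names taken from the question):
kubectl get pod ghost -o yaml > ghost-from-spec.yaml
kubectl get pod ghosty-nqgt0 -o yaml > ghost-from-run.yaml
diff ghost-from-spec.yaml ghost-from-run.yaml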
Maybe you could use a different restartPolicy in the yaml file?
What you have, I believe, is equivalent to
restartPolicy: Never
with no replication controller. You may try adding this line to the yaml and setting it to Always (this will provide you with an RC), or to OnFailure.
https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/pod-states.md#restartpolicy
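For reference, restartPolicy sits at the pod spec level, as a sibling of containers; a sketch against the yaml from the question:
spec:
  restartPolicy: OnFailure   # or Always / Never
  containers:
  - image: us.gcr.io/my_project_id/my-ghost-blog
    name: ghost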
Container logs may be useful, with kubectl logs
Usage:
kubectl logs [-p] POD [-c CONTAINER]
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_logs.html
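Since the container keeps restarting, the logs of the previous (crashed) instance are usually the interesting ones; a sketch using the pod name from the question:
kubectl logs -p ghost   # -p/--previous prints the logs of the last terminated container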