Kubernetes job debug command

I wrote a Job and I always get an init error. I have noticed that if I remove the related command, everything goes fine and I do not get any init error.
My question is: how can I debug commands that need to run in the job? I use pod describe, but all I can see is an exit status code 2.
apiVersion: batch/v1
kind: Job
metadata:
  name: database-import
spec:
  template:
    spec:
      initContainers:
      - name: download-dump
        image: google/cloud-sdk:alpine
        command: ##### ERROR HERE!!!
        - bash
        - -c
        - "gsutil cp gs://webshop-254812-sbg-data-input/pg/spryker-stg.gz /data/spryker-stage.gz"
        volumeMounts:
        - name: application-default-credentials
          mountPath: "/secrets/"
          readOnly: true
        - name: data
          mountPath: "/data/"
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secrets/application_default_credentials.json
      containers:
      - name: database-import
        image: postgres:9.6-alpine
        command:
        - bash
        - -c
        - "gunzip -c /data/spryker-stage.gz | psql -h postgres -Uusername -W spy_ch "
        env:
        - name: PGPASSWORD
          value: password
        volumeMounts:
        - name: data
          mountPath: "/data/"
      volumes:
      - name: application-default-credentials
        secret:
          secretName: application-default-credentials
      - name: data
        emptyDir: {}
      restartPolicy: Never
  backoffLimit: 4
And this is the job describe:
Name: database-import
Namespace: sbg
Selector: controller-uid=a70d74a2-f596-11e9-a7fe-025000000001
Labels: app.kubernetes.io/managed-by=tilt
Annotations: <none>
Parallelism: 1
Completions: 1
Start Time: Wed, 23 Oct 2019 15:11:40 +0200
Pods Statuses: 1 Running / 0 Succeeded / 3 Failed
Pod Template:
Labels: app.kubernetes.io/managed-by=tilt
controller-uid=a70d74a2-f596-11e9-a7fe-025000000001
job-name=database-import
Init Containers:
download-dump:
Image: google/cloud-sdk:alpine
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
gsutil cp gs://webshop-254812-sbg-data-input/pg/spryker-stg.gz /data/spryker-stage.gz
Environment:
GOOGLE_APPLICATION_CREDENTIALS: /secrets/application_default_credentials.json
Mounts:
/data/ from data (rw)
/secrets/ from application-default-credentials (ro)
Containers:
database-import:
Image: postgres:9.6-alpine
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
gunzip -c /data/spryker-stage.gz | psql -h postgres -Uusername -W
spy_ch
Environment:
PGPASSWORD: password
Mounts:
/data/ from data (rw)
Volumes:
application-default-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: application-default-credentials-464thb4k85
Optional: false
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 2m5s job-controller Created pod: database-import-9tsjw
Normal SuccessfulCreate 119s job-controller Created pod: database-import-g68ld
Normal SuccessfulCreate 109s job-controller Created pod: database-import-8cx6v
Normal SuccessfulCreate 69s job-controller Created pod: database-import-tnjnh

The command to see the logs of an init container run in a Job is:
kubectl logs -f <pod name> -c <initContainer name>
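For example, using the init container name from the Job above and one of the pod names from the describe events (these names will differ on every retry):
kubectl logs -f database-import-9tsjw -c download-dump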

You can check the logs using:
kubectl logs <pod name>
where <pod name> is the pod of the completed or running Job.
The logs give you a better idea of the error, so you can more easily debug the Job running on Kubernetes.
If you are using a Kubernetes cluster on GKE and have enabled Stackdriver monitoring, you can use it for debugging as well.
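As a concrete sketch, assuming the Job from the question: list the pods the Job created via the job-name label shown in the describe output, then pull the logs of a specific pod (and of its init container, if needed; the pod name comes from the events above):
kubectl get pods -l job-name=database-import
kubectl logs database-import-9tsjw
kubectl logs database-import-9tsjw -c download-dump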

Init:Error -> the init container has failed to execute.
That is because there are some errors in the command section of your initContainers. The Kubernetes documentation describes how the yaml should be prepared.
I have fixed your yaml file.
apiVersion: batch/v1
kind: Job
metadata:
  name: database-import
spec:
  template:
    spec:
      containers:
      - name: database-import
        image: postgres:9.6-alpine
        command:
        - bash
        - "-c"
        - "gunzip -c /data/spryker-stage.gz | psql -h postgres -Uusername -W spy_ch "
        env:
        - name: PGPASSWORD
          value: password
        volumeMounts:
        - name: data
          mountPath: "/data/"
      initContainers:
      - name: download-dump
        image: google/cloud-sdk:alpine
        command:
        - /bin/bash
        - "-c"
        - "gsutil cp gs://webshop-254812-sbg-data-input/pg/spryker-stg.gz /data/spryker-stage.gz"
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secrets/application_default_credentials.json
        volumeMounts:
        - name: application-default-credentials
          mountPath: "/secrets/"
          readOnly: true
        - name: data
          mountPath: "/data/"
      volumes:
      - name: application-default-credentials
        secret:
          secretName: application-default-credentials
      - name: data
        emptyDir: {}
      restartPolicy: Never
  backoffLimit: 4
Result after kubectl apply -f job.yaml
job.batch/database-import created
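To verify the Job afterwards, a quick sketch: check its status and stream the init container logs (kubectl logs also accepts a job/<name> reference and picks one of the Job's pods):
kubectl describe job database-import
kubectl logs job/database-import -c download-dump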
Let me know if it works now.
EDIT
Use kubectl describe job <name of your job> and add the results; then we will see why it is not working.

Related

Kubernetes livenessProbe some container stops with failure and others in success. What is the cause?

Deep dive into this question: I have a scheduled cron job and a never-ending container in the same pod. To end the never-ending container when the cron job has done its work, I'm using a liveness probe.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-failed
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      activeDeadlineSeconds: 300
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: docker-http-server
            image: katacoda/docker-http-server:latest
            ports:
            - containerPort: 80
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
            livenessProbe:
              exec:
                command:
                - sh
                - -c
                - if test -f "/cache/stop"; then exit 1; fi;
              initialDelaySeconds: 5
              periodSeconds: 5
          - name: busy
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - sh
            - -c
            args:
            - echo start > /cache/start; sleep 15; echo stop > /cache/stop;
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
          restartPolicy: Never
          volumes:
          - name: cache-volume
            emptyDir:
              sizeLimit: 10Mi
As you can see, the cron job writes the /cache/stop file and the never-ending container is stopped. The problem is that with some images the never-ending container stops in failure.
Is there a way to stop every container in success?
Name: pod-failed-27827190
Namespace: default
Selector: controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
Labels: controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
job-name=pod-failed-27827190
Annotations: batch.kubernetes.io/job-tracking:
Controlled By: CronJob/pod-failed
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Mon, 28 Nov 2022 11:30:00 +0100
Active Deadline Seconds: 300s
Pods Statuses: 0 Active (0 Ready) / 0 Succeeded / 5 Failed
Pod Template:
Labels: controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
job-name=pod-failed-27827190
Containers:
docker-http-server:
Image: katacoda/docker-http-server:latest
Port: 80/TCP
Host Port: 0/TCP
Liveness: exec [sh -c if test -f "/cache/stop"; then exit 1; fi;] delay=5s timeout=1s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/cache from cache-volume (rw)
busy:
Image: busybox
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
echo start > /cache/start; sleep 15; echo stop > /cache/stop;
Environment: <none>
Mounts:
/cache from cache-volume (rw)
Volumes:
cache-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: 10Mi
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 2m5s job-controller Created pod: pod-failed-27827190-8tqxk
Normal SuccessfulCreate 102s job-controller Created pod: pod-failed-27827190-4gj2s
Normal SuccessfulCreate 79s job-controller Created pod: pod-failed-27827190-5wgfg
Normal SuccessfulCreate 56s job-controller Created pod: pod-failed-27827190-lzv8k
Normal SuccessfulCreate 33s job-controller Created pod: pod-failed-27827190-fr8v5
Warning BackoffLimitExceeded 9s job-controller Job has reached the specified backoff limit
As you can see, the image katacoda/docker-http-server:latest is failing with the liveness probe. This doesn't happen with nginx, for example.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-failed
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      activeDeadlineSeconds: 300
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
            livenessProbe:
              exec:
                command:
                - sh
                - -c
                - if test -f "/cache/stop"; then exit 1; fi;
              initialDelaySeconds: 5
              periodSeconds: 5
          - name: busy
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - sh
            - -c
            args:
            - echo start > /cache/start; sleep 15; echo stop > /cache/stop;
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
          restartPolicy: Never
          volumes:
          - name: cache-volume
            emptyDir:
              sizeLimit: 10Mi
Of course, the never-ending image that I'm pulling ends in failure, and I have no control over that image.
Is there a way to force a success status for the job/pod?
It depends on the exit code of the container's main process. Every container receives a TERM signal when Kubernetes wants to stop it, to give it the chance to end gracefully. This also applies when the reason is a failed liveness probe. I guess nginx exits with exit code 0 while your katacoda http server returns a non-zero code. Looking at the docs of the Go ListenAndServe method, it clearly states that it ends with a non-nil error: https://pkg.go.dev/net/http#Server.ListenAndServe
You could override the container's default command with a bash script that starts the application and then waits until the stop file is written:
containers:
- name: docker-http-server
  image: katacoda/docker-http-server:latest
  command:
  - "sh"
  - "-c"
  - "/app & while true; do if [ -f /cache/stop ]; then exit 0; fi; sleep 1; done;"
Here, "/app" is the start command of the katacode http server container.

Rabbitmq containers throwing error "Bad characters in cookie"

I am trying to set up a RabbitMQ cluster, and when the containers start they fail with the error [error] CRASH REPORT Process <0.200.0> with 0 neighbours crashed with reason: "Bad characters in cookie" in auth:init_no_setcookie/0 line 313. This suggests that the Erlang cookie value passed in is not valid:
kubectl -n demos get pods
NAME READY STATUS RESTARTS AGE
mongodb-deployment-6499999-vpcjh 1/1 Running 0 12h
rabbitmq-0 0/1 CrashLoopBackOff 9 25m
rabbitmq-1 0/1 CrashLoopBackOff 9 24m
rabbitmq-2 0/1 CrashLoopBackOff 9 23m
And when I query the logs for one of the pods:
kubectl -n demos logs -p rabbitmq-0 --previous
I get:
WARNING: '/var/lib/rabbitmq/.erlang.cookie' was populated from
'$RABBITMQ_ERLANG_COOKIE', which will no longer happen in 3.9 and later! (https://github.com/docker-library/rabbitmq/pull/424)
Configuring logger redirection
02:04:47.506 [error] Bad characters in cookie
02:04:47.512 [error]
02:04:47.506 [error] Supervisor net_sup had child auth started with auth:start_link() at undefined exit with reason "Bad characters in cookie" in auth:init_no_setcookie/0 line 313 in context start_error
02:04:47.506 [error] CRASH REPORT Process <0.200.0> with 0 neighbours crashed with reason: "Bad characters in cookie" in auth:init_no_setcookie/0 line 313
02:04:47.522 [error] BOOT FAILED
BOOT FAILED
02:04:47.523 [error] ===========
===========
02:04:47.523 [error] Exception during startup:
Exception during startup:
02:04:47.524 [error]
02:04:47.524 [error] supervisor:children_map/4 line 1250
....
....
....
This is how I am generating the cookie in bash:
dd if=/dev/urandom bs=30 count=1 | base64
And in the secrets manifest I have:
metadata:
  name: rabbit-secret
  namespace: demos
type: Opaque
data:
  # echo -n "cookie-value" | base64
  RABBITMQ_ERLANG_COOKIE: <encoded_cookie_value_here>
And in the statefulset I have:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: demos
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      initContainers:
      - name: config
        image: busybox
        imagePullPolicy: "IfNotPresent"
        command: ['/bin/sh', '-c', 'cp /tmp/config/rabbitmq.conf /config/rabbitmq.conf && ls -l /config/ && cp /tmp/config/enabled_plugins /etc/rabbitmq/enabled_plugins']
        volumeMounts:
        - name: config
          mountPath: /tmp/config/
          readOnly: false
        - name: config-file
          mountPath: /config/
        - name: plugins-file
          mountPath: /etc/rabbitmq/
      containers:
      - name: rabbitmq
        image: rabbitmq:3.8-management
        imagePullPolicy: "IfNotPresent"
        ports:
        - containerPort: 4369
          name: discovery
        - containerPort: 5672
          name: amqp
        env:
        - name: RABBIT_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: RABBIT_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: RABBITMQ_NODENAME
          value: rabbit@$(RABBIT_POD_NAME).rabbitmq.$(RABBIT_POD_NAMESPACE).svc.cluster.local
        - name: RABBITMQ_USE_LONGNAME
          value: "true"
        - name: RABBITMQ_CONFIG_FILE
          value: "/config/rabbitmq"
        - name: RABBITMQ_ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              name: rabbit-secret
              key: RABBITMQ_ERLANG_COOKIE
        - name: K8S_HOSTNAME_SUFFIX
          value: .rabbitmq.$(RABBIT_POD_NAMESPACE).svc.cluster.local
        volumeMounts:
        - name: data
          mountPath: /var/lib/rabbitmq
          readOnly: false
        - name: config-file
          mountPath: /config/
        - name: plugins-file
          mountPath: /etc/rabbitmq/
      volumes:
      - name: config-file
        emptyDir: {}
      - name: plugins-file
        emptyDir: {}
      - name: config
        configMap:
          name: rabbitmq-config
          defaultMode: 0755
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "cinder-csi"
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  namespace: labs
spec:
  clusterIP: None
  ports:
  - port: 4369
    targetPort: 4369
    name: discovery
  - port: 5672
    targetPort: 5672
    name: amqp
  selector:
    app: rabbitmq
What am I missing?
Is there a recommended way of generating the cookie, or is it something else to do with the K8s cluster itself?
I have followed the example given here, with the only difference being that I am generating my cookie on my local machine and not on the k8s host.
This requires you to create the Secret. Just to bring this up, you can run a one-off imperative command to create a random Secret:
kubectl create secret generic rabbitmq \
--from-literal=erlangCookie=$(dd if=/dev/urandom bs=30 count=1 | base64)
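If you create the Secret from a manifest instead, it is worth double-checking that the stored value decodes back to the cookie you expect (a sketch using the names from the question):
kubectl -n demos get secret rabbit-secret -o jsonpath='{.data.RABBITMQ_ERLANG_COOKIE}' | base64 -d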
The error comes from rabbitmq's docker-entrypoint.sh source file:
if [ "${RABBITMQ_ERLANG_COOKIE:-}" ]; then
cookieFile='/var/lib/rabbitmq/.erlang.cookie'
if [ -e "$cookieFile" ]; then
if [ "$(cat "$cookieFile" 2>/dev/null)" != "$RABBITMQ_ERLANG_COOKIE" ]; then
echo >&2
echo >&2 "warning: $cookieFile contents do not match RABBITMQ_ERLANG_COOKIE"
echo >&2
fi
else
echo "$RABBITMQ_ERLANG_COOKIE" > "$cookieFile"
fi
chmod 600 "$cookieFile"
fi
Therefore you can delete the /var/lib/rabbitmq/.erlang.cookie file; the script will recreate it from the contents of the $RABBITMQ_ERLANG_COOKIE environment variable.
If you are working on a production system, you should be very careful: test it on your local system first and gain experience.
This code is executed only when rabbitmq restarts, and it won't run again after it has been executed.
You can also find the actual cookie value that rabbitmq is using from the Erlang console via erlang:get_cookie() -> Cookie | nocookie.
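A minimal sketch of that approach, using the pod and namespace names from the question (note that /var/lib/rabbitmq comes from a PVC, so a stale cookie file survives pod restarts, and kubectl exec only works while the container is running):
kubectl -n demos exec rabbitmq-0 -- cat /var/lib/rabbitmq/.erlang.cookie
kubectl -n demos exec rabbitmq-0 -- rm /var/lib/rabbitmq/.erlang.cookie
kubectl -n demos delete pod rabbitmq-0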

kubectl exec permission denied

I have a pod running a mariadb container and I would like to back up my database, but it fails with a Permission denied error.
kubectl exec my-owncloud-mariadb-0 -it -- bash -c "mysqldump --single-transaction -h localhost -u myuser -ppassword mydatabase > owncloud-dbbackup_`date +"%Y%m%d"`.bak"
And the result is:
bash: owncloud-dbbackup_20191121.bak: Permission denied
command terminated with exit code 1
I can't run sudo mysqldump because I get "sudo: command not found".
I tried to export the backup file to different locations: /home, the directory where mysqldump is located, /usr, ...
Here is the yaml of my pod:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-11-20T14:16:58Z"
  generateName: my-owncloud-mariadb-
  labels:
    app: mariadb
    chart: mariadb-7.0.0
    component: master
    controller-revision-hash: my-owncloud-mariadb-77495ddc7c
    release: my-owncloud
    statefulset.kubernetes.io/pod-name: my-owncloud-mariadb-0
  name: my-owncloud-mariadb-0
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: my-owncloud-mariadb
    uid: 47f2a129-8d4e-4ae9-9411-473288623ed5
  resourceVersion: "2509395"
  selfLink: /api/v1/namespaces/default/pods/my-owncloud-mariadb-0
  uid: 6a98de05-c790-4f59-b182-5aaa45f3b580
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: mariadb
              release: my-owncloud
          topologyKey: kubernetes.io/hostname
        weight: 1
  containers:
  - env:
    - name: MARIADB_ROOT_PASSWORD
      valueFrom:
        secretKeyRef:
          key: mariadb-root-password
          name: my-owncloud-mariadb
    - name: MARIADB_USER
      value: myuser
    - name: MARIADB_PASSWORD
      valueFrom:
        secretKeyRef:
          key: mariadb-password
          name: my-owncloud-mariadb
    - name: MARIADB_DATABASE
      value: mydatabase
    image: docker.io/bitnami/mariadb:10.3.18-debian-9-r36
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - exec mysqladmin status -uroot -p$MARIADB_ROOT_PASSWORD
      failureThreshold: 3
      initialDelaySeconds: 120
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: mariadb
    ports:
    - containerPort: 3306
      name: mysql
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - exec mysqladmin status -uroot -p$MARIADB_ROOT_PASSWORD
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /bitnami/mariadb
      name: data
    - mountPath: /opt/bitnami/mariadb/conf/my.cnf
      name: config
      subPath: my.cnf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-pbgxr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: my-owncloud-mariadb-0
  nodeName: 149.202.36.244
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1001
    runAsUser: 1001
  serviceAccount: default
  serviceAccountName: default
  subdomain: my-owncloud-mariadb
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-my-owncloud-mariadb-0
  - configMap:
      defaultMode: 420
      name: my-owncloud-mariadb
    name: config
  - name: default-token-pbgxr
    secret:
      defaultMode: 420
      secretName: default-token-pbgxr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-11-20T14:33:22Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-11-20T14:34:03Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-11-20T14:34:03Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-11-20T14:33:22Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://3898b6a20bd8c38699374b7db7f04ccef752ffd5a5f7b2bc9f7371e6a27c963a
    image: bitnami/mariadb:10.3.18-debian-9-r36
    imageID: docker-pullable://bitnami/mariadb@sha256:a89e2fab7951c622e165387ead0aa0bda2d57e027a70a301b8626bf7412b9366
    lastState: {}
    name: mariadb
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-11-20T14:33:24Z"
  hostIP: 149.202.36.244
  phase: Running
  podIP: 10.42.2.56
  qosClass: BestEffort
  startTime: "2019-11-20T14:33:22Z"
Is there something I'm missing?
You might not have permission to write to that location inside the container. Try the command below and use /tmp or some other location where you can dump the backup file:
kubectl exec my-owncloud-mariadb-0 -it -- bash -c "mysqldump --single-transaction -h localhost -u myuser -ppassword mydatabase > /tmp/owncloud-dbbackup_`date +"%Y%m%d"`.bak"
Given the pod YAML file you've shown, you can't usefully use kubectl exec to make a database backup.
You're getting a shell inside the pod and running mysqldump there to write out the dump file somewhere else inside the pod. You can't write it to the secret directory or the configmap directory, so your essential choices are either to write it to the pod filesystem (which will get deleted as soon as the pod exits, including if Kubernetes decides to relocate the pod within the cluster) or the mounted database directory (and your backup will survive exactly as long as the data it's backing up).
I'd run mysqldump from outside the pod. One good approach would be to create a separate Job that mounted some sort of long-term storage (or relied on external object storage; if you're running on AWS, for example, S3), connected to the database pod, and ran the backup that way. That has the advantage of being fairly self-contained (so you can debug it without interfering with your live database) and also totally automated (you could launch it from a Kubernetes CronJob).
kubectl exec doesn't seem to have the same flags docker exec does to control the user identity, so you're dependent on there being some path inside the container that its default user can write to. /tmp is typically world-writable so if you just want that specific command to work I'd try putting the dump file into /tmp/owncloud-dbbackup_....
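Another sketch in the same spirit, avoiding any write inside the pod at all: run mysqldump in the container but redirect its stdout on your local machine, so the dump lands directly on your workstation:
kubectl exec my-owncloud-mariadb-0 -- mysqldump --single-transaction -h localhost -u myuser -ppassword mydatabase > owncloud-dbbackup_$(date +%Y%m%d).bak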

Where to execute kube-proxy command?

From this article, I can specify 'userspace' as my proxy mode, but I am unable to understand what command I need to use for it and at what stage. After creating the deployment or the service?
I am running a minikube cluster currently.
kube-proxy is a process that runs on each kubernetes node to manage network connections coming into and out of kubernetes.
You don't run the command as such, but your deployment method (usually kubeadm) configures the options for it to run.
As @Hang Du mentioned, in minikube you can modify its options by editing the kube-proxy configmap and changing mode to userspace:
kubectl -n kube-system edit configmap kube-proxy
Then delete the Pod.
kubectl -n kube-system get pod
kubectl -n kube-system delete pod kube-proxy-XXXXX
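To confirm the change was picked up, you can for instance check the mode in the configmap and look at the logs of the recreated kube-proxy pod (a sketch):
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=20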
If you are using minikube, you can find a DaemonSet named kube-proxy like followings:
$ kubectl get ds -n kube-system kube-proxy -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  ...
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  ...
spec:
  ...
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: k8s.gcr.io/kube-proxy:v1.15.0
        imagePullPolicy: IfNotPresent
        name: kube-proxy
        ...
        volumeMounts:
        - mountPath: /var/lib/kube-proxy
          name: kube-proxy
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
      dnsPolicy: ClusterFirst
      ...
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-proxy
        name: kube-proxy
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
      ...
Look at .spec.template.spec.containers[].command; the container runs the kube-proxy command. You can provide the flag --proxy-mode=userspace in the command array.
- command:
  - /usr/local/bin/kube-proxy
  - --config=/var/lib/kube-proxy/config.conf
  - --hostname-override=$(NODE_NAME)
  - --proxy-mode=userspace
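After editing the DaemonSet you can watch the kube-proxy pods being replaced, for example with (a sketch; assumes the default RollingUpdate strategy):
kubectl -n kube-system rollout status daemonset/kube-proxy
kubectl -n kube-system get pods -l k8s-app=kube-proxy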

Can't modify ETCD manifest for Kubernetes static pod

I'd like to modify the etcd pod to listen on 0.0.0.0 (or the host machine IP) instead of 127.0.0.1.
I'm working on a migration from a single-master to a multi-master Kubernetes cluster, but I have run into an issue: after I modified /etc/kubernetes/manifests/etcd.yaml with the correct settings and restarted the kubelet and even the docker daemon, etcd is still working on 127.0.0.1.
Inside the docker container I'm still seeing that etcd started with --listen-client-urls=https://127.0.0.1:2379 instead of the host IP.
cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.22.9:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.22.9:2380
    - --initial-cluster=test-master-01=https://192.168.22.9:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://192.168.22.9:2379
    - --listen-peer-urls=https://192.168.22.9:2380
    - --name=test-master-01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd-amd64:3.2.18
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[192.168.22.9]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
status: {}
[root@test-master-01 centos]# kubectl -n kube-system get po etcd-test-master-01 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/config.hash: c3eef2d48a776483adc00311df8cb940
    kubernetes.io/config.mirror: c3eef2d48a776483adc00311df8cb940
    kubernetes.io/config.seen: 2019-05-24T13:50:06.335448715Z
    kubernetes.io/config.source: file
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: 2019-05-24T14:08:14Z
  labels:
    component: etcd
    tier: control-plane
  name: etcd-test-master-01
  namespace: kube-system
  resourceVersion: "6288"
  selfLink: /api/v1/namespaces/kube-system/pods/etcd-test-master-01
  uid: 5efadb1c-7e2d-11e9-adb7-fa163e267af4
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://127.0.0.1:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://127.0.0.1:2380
    - --initial-cluster=test-master-01=https://127.0.0.1:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --listen-peer-urls=https://127.0.0.1:2380
    - --name=test-master-01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd-amd64:3.2.18
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
First check your kubelet option --pod-manifest-path and put your corrected yaml in this path.
To make sure the etcd pod has been deleted, move the yaml file out of the pod-manifest-path and wait until the pod has been removed (check with docker ps -a). Then put your corrected yaml file back into the pod-manifest-path.
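A rough sketch of that procedure, assuming the default manifest path /etc/kubernetes/manifests (check the kubelet's --pod-manifest-path flag to be sure):
mv /etc/kubernetes/manifests/etcd.yaml /tmp/
docker ps -a | grep etcd        # wait until the etcd container is gone
mv /tmp/etcd.yaml /etc/kubernetes/manifests/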
I reviewed my automation scripts step by step and found that I had made a backup of the etcd yaml in the same folder with a .bak extension. It looks like the kubelet daemon loads all the files inside the manifests folder regardless of the file extension.
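So if you hit the same thing, it should be enough to move any backup copies out of the manifests directory, for example (the .bak filename and destination here are assumptions based on the description above):
mv /etc/kubernetes/manifests/etcd.yaml.bak /root/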