Multiple kube-shcedulers - kubernetes

I'm trying to create a kube-shceduler that will be used by my organisation pod deployment (a data-center). The objective would be to have the pods of any deployment be split equally between the two sites of the data-center. To archive this I created a new kube-shceduler manifest (see below), in which i changed the Ports that are used and added a new configuration file (see below) using the --config parameter. The problem is that even after setting the --port parameter to a new port that isn't already used by the original kube-scheduler, it still tries to use the old one. As such the new kube scheduler can't start:
I1109 20:27:59.225996 1 registry.go:173] Registering SelectorSpread plugin
I1109 20:27:59.226097 1 registry.go:173] Registering SelectorSpread plugin
I1109 20:27:59.926994 1 serving.go:331] Generated self-signed cert in-memory
failed to create listener: failed to listen on 0.0.0.0:10251: listen tcp 0.0.0.0:10251: bind: address already in use
How can I force the use of the specified port OR how can I delete the original kube-scheduler so that there won't be any conflict between the two. The first solution would be preferable.
Configuration kube-scheduler (manifest - /etc/kubernetes/manifest/kube-scheduler.yaml) :
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-scheduler
tier: control-plane
name: kube-scheduler
namespace: kube-system
spec:
containers:
- command:
- kube-scheduler
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=true
#- --port=0
image: k8s.gcr.io/kube-scheduler:v1.19.3
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10259
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: kube-scheduler
resources:
requests:
cpu: 100m
startupProbe:
failureThreshold: 24
httpGet:
host: 127.0.0.1
path: /healthz
port: 10259
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
volumeMounts:
- mountPath: /etc/kubernetes/scheduler.conf
name: kubeconfig
readOnly: true
hostNetwork: true
priorityClassName: system-node-critical
volumes:
- hostPath:
path: /etc/kubernetes/scheduler.conf
type: FileOrCreate
name: kubeconfig
status: {}
Configuration kube-custom-scheduler.yaml (/etc/kubernetes/manifest/kube-custom-scheduler.yaml):
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-custom-scheduler
tier: control-plane
name: kube-custom-scheduler
namespace: kube-system
spec:
containers:
- command:
- kube-scheduler
- --config=/etc/kubernetes/scheduler-custom.conf
- --master=true
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=false
- --secure-port=10269
- --scheduler-name=kube-custom-shceduler
- --port=10261
image: k8s.gcr.io/kube-scheduler:v1.19.3
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10269
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: kube-scheduler
resources:
requests:
cpu: 100m
startupProbe:
failureThreshold: 24
httpGet:
host: 127.0.0.1
path: /healthz
port: 10269
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
volumeMounts:
- mountPath: /etc/kubernetes/scheduler.conf
name: kubeconfig
readOnly: true
- mountPath: /etc/kubernetes/scheduler-custom.conf
name: customconfig
readOnly: true
hostNetwork: true
priorityClassName: system-node-critical
volumes:
- hostPath:
path: /etc/kubernetes/scheduler.conf
type: FileOrCreate
name: kubeconfig
- hostPath:
path: /etc/kubernetes/scheduler-custom.conf
type: FileOrCreate
name: customconfig
status: {}
Custom configuration mentioned in the kube-custom-scheduler.yaml (/etc/kubernetes/scheduler-custom.conf):
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
- name: PodTopologySpread
args:
defaultConstraints:
- maxSkew: 1
topologyKey: topology.kube.io/datacenter
whenUnsatisfiable: ScheduleAnyway
Please don't hesistate if you need more information about the cluster.

According to https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/ , the --port would be overwritten by the file specified by --config. Adding HealthzBindAddress and MetricsBindAddress to /etc/kubernetes/scheduler-custom.conf works:
healthzBindAddress: 0.0.0.0:10261
metricsBindAddress: 0.0.0.0:10261
If you want to remove the default scheduler, you can move kube-scheduler.yaml out of /etc/kubernetes/manifests/.

Related

Can't read ingress-nginx controller pod's logs after configuring filebeat

currently we are working on persist ingress-nginx logs,we implemented filebeat sidecar put forlogs to logstash,but we cannot read acces.log and error.log this path{/var/log/nginx/access.log} now we required read logs for this path, please give solution for this issue
ingress-nginx deployed manifest files like this
filebeat configmap
apiVersion: v1
kind: ConfigMap
metadata:
name: filebeat-configmap
namespace: ingress-nginx
data:
filebeat.yml: |
filebeat:
config:
modules:
path: /usr/share/filebeat/modules.d/*.yml
reload:
enabled: true
modules:
- module: nginx
access:
var.paths: ["/var/log/nginx/access.log*"]
error:
var.paths: ["/var/log/nginx/error.log*"]
output:
logstash:
hosts: ["logstash-logstash-headless:9600"]
loadbalance: true
deployment file
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
selector:
matchLabels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
revisionHistoryLimit: 10
minReadySeconds: 0
template:
metadata:
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
spec:
dnsPolicy: ClusterFirst
containers:
- name: controller
image: k8s.gcr.io/ingress-nginx/controller:v0.41.2#sha256:1f4f402b9c14f3ae92b11ada1dfe9893a88f0faeb0b2f4b903e2c67a0c3bf0de
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
- --annotations-prefix=nginx.ingress.kubernetes.io
- --enable-ssl-passthrough
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
runAsUser: 101
allowPrivilegeEscalation: true
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: LD_PRELOAD
value: /usr/local/lib/libmimalloc.so
livenessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 3
ports:
- name: http
containerPort: 80
protocol: TCP
- name: https
containerPort: 443
protocol: TCP
- name: webhook
containerPort: 8443
protocol: TCP
- name: prometheus
containerPort: 10254
protocol: TCP
volumeMounts:
- name: webhook-cert
mountPath: /usr/local/certificates/
readOnly: true
- name: nginx-logs
mountPath: var/log/nginx
resources:
requests:
cpu: 100m
memory: 90Mi
- name: filebeat-nginx
image: docker.elastic.co/beats/filebeat:7.13.0
volumeMounts:
- name: nginx-logs
mountPath: var/log/nginx
- name: filebeat-config
mountPath: /usr/share/filebeat/filebeat.yml
subPath: filebeat.yml
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: ingress-nginx
terminationGracePeriodSeconds: 300
volumes:
- name: nginx-logs
- name: webhook-cert
secret:
secretName: ingress-nginx-admission
- name: filebeat-config
configMap:
name: filebeat-configmap
items:
- key: filebeat.yml
path: filebeat.yml
It looks like there is a typo in the file path in the following code:
- name: nginx-logs
mountPath: var/log/nginx
resources:
requests:
cpu: 100m
memory: 90Mi
- name: filebeat-nginx
image: docker.elastic.co/beats/filebeat:7.13.0
volumeMounts:
- name: nginx-logs
mountPath: var/log/nginx
The file paths need to be changed from var/log/nginx to /var/log/nginx (forward slash in the beginning of the path)

Configure pod resource request for k8s etcd pod

When running k8s 1.18 alongside with a default “on cluster” etcd pod deployment, what is the way to assign a resource (CPU/memory) request, or influence the pod spec for the etcd container?
The default configuration provides no resource requests or limits.
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system etcd-172-25-87-82-hybrid.com 0 (0%) 0 (0%) 0 (0%) 0 (0%) 77m
I’m aware of how one can pass extra args to etcd via kubeadm extraArgs config but these do not cover the etcd pod resources.
etcd:
local:
extraArgs:
heartbeat-interval: "1000"
election-timeout: "5000"
The question can be extended to the other resources in the kube-system namespace eg coredns, etc.
After init cluster, you can find generated /etc/kubernetes/manifests/etcd.yaml. Tried to edit it? The kubelet should pick the changes and restart the etcd instance.
root#kube-1:~# cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.154.0.33:2379
creationTimestamp: null
labels:
component: etcd
tier: control-plane
name: etcd
namespace: kube-system
spec:
containers:
- command:
- etcd
- --advertise-client-urls=https://10.154.0.33:2379
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --client-cert-auth=true
- --data-dir=/var/lib/etcd
- --initial-advertise-peer-urls=https://10.154.0.33:2380
- --initial-cluster=kube-1=https://10.154.0.33:2380
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --listen-client-urls=https://127.0.0.1:2379,https://10.154.0.33:2379
- --listen-metrics-urls=http://127.0.0.1:2381
- --listen-peer-urls=https://10.154.0.33:2380
- --name=kube-1
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-client-cert-auth=true
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --snapshot-count=10000
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
image: k8s.gcr.io/etcd:3.4.13-0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /health
port: 2381
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: etcd
resources: {}
startupProbe:
failureThreshold: 24
httpGet:
host: 127.0.0.1
path: /health
port: 2381
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
hostNetwork: true
priorityClassName: system-node-critical
volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /var/lib/etcd
type: DirectoryOrCreate
name: etcd-data
status: {}

K8s init container restart itself

I have pod that includes one init container and one app container,
between them there is a volume with shared folder.
My issue is that once a day or even more, the init container run itself and therefore it delete the node modules from the volume, then the main app crash because of missing modules.
The app container is not making any restarts, only the init container.
is anyone familiar with this issue in k8s? why those restarts happens only in the init container?
Thanks :)
edit:
the deployment yaml file -
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "25"
creationTimestamp: "2020-05-19T06:48:18Z"
generation: 25
labels:
apps: ******
commit: ******
name: *******
namespace: fleet
resourceVersion: "24059934"
selfLink: *******
uid: *******
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: *******
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: *******
commit: *******
revision: *******
spec:
containers:
image: XXXXXXXXXXXX
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /ping
port: http
scheme: HTTP
initialDelaySeconds: 120
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 30
name: *******
ports:
- containerPort: 1880
name: http
protocol: TCP
readinessProbe:
failureThreshold: 20
httpGet:
path: /ping
port: http
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 30
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/breeze
name: workdir
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: flowregistrykey
initContainers:
image: XXXXXXXXXXXX
imagePullPolicy: IfNotPresent
name: get-flow-json
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /work-dir
name: workdir
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- emptyDir: {}
name: workdir
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-06-01T12:30:10Z"
lastUpdateTime: "2020-06-01T12:30:10Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2020-05-19T06:48:18Z"
lastUpdateTime: "2020-06-01T12:45:05Z"
message: ReplicaSet "collection-associator.sandbox.services.collection-8784dcb9d"
has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 25
readyReplicas: 1
replicas: 1
updatedReplicas: 1
and this image explain the problem-
the pod was created 7 days ago, but the files inside were created today, and there is no node_modules folder - because only the init container run again and not the app container so there was no mpm install
the init container run itself and therefore it delete the node modules from the volume
initContainer is only run when a pod restart. Don't treat it as service or application. It should be a script, a job only for setup before your application.
then the main app crash because of missing modules.
node_modules is not dynamic loading. It's loaded when you npm start
You might want to try livenessProbe.
Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.
initContainers:
- name: set-delete-time
command: ["/bin/sh","-c"]
args:
- |
# Let's set 3600
echo 3600 > deleteTime
...
containers:
- name: node-app
livenessProbe:
exec:
command:
# If deleteTime > 0 exit 0
# Else /tmp/deleteTime - $periodSeconds > /tmp/deleteTime
- cat
- /tmp/deleteTime
- ...
initialDelaySeconds: 5
periodSeconds: 5

Kubernetes StatefulSet - obtain spec.replicas metadata and reference elsewhere in configuration

I am configuring a StatefulSet where I want the number of replicas (spec.replicas as shown below) available to somehow pass as a parameter into the application instance. My application needs spec.replicas to determine the numer of replicas so it knows what rows to load from a MySQL table. I don't want to hard-code the number of replicas in both spec.replicas and the application parameter as that will not work when scaling the number of replicas up or down, since the application parameter needs to adjust when scaling.
Here is my StatefulSet config:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
labels:
run: my-app
name: my-app
namespace: my-ns
spec:
replicas: 3
selector:
matchLabels:
run: my-app
serviceName: my-app
podManagementPolicy: Parallel
template:
metadata:
labels:
run: my-app
spec:
containers:
- name: my-app
image: my-app:latest
command:
- /bin/sh
- /bin/start.sh
- dev
- 2000m
- "0"
- "3" **Needs to be replaced with # replicas**
- 127.0.0.1
- "32990"
imagePullPolicy: Always
livenessProbe:
httpGet:
path: /health
port: 8081
initialDelaySeconds: 180
periodSeconds: 10
timeoutSeconds: 3
readinessProbe:
failureThreshold: 10
httpGet:
path: /ready
port: 8081
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 3
ports:
- containerPort: 8080
protocol: TCP
resources:
limits:
memory: 2500Mi
imagePullSecrets:
- name: snapshot-pull
restartPolicy: Always
I have read the Kubernetes docs and the spec.replicas field is scoped at the pod or container level, never the StatefulSet, at least as far as I have seen.
Thanks in advance.
You could use a yaml anchor to do this:
Check out:
https://helm.sh/docs/chart_template_guide/yaml_techniques/#yaml-anchors
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
labels:
run: my-app
name: my-app
namespace: my-ns
spec:
replicas: &numReplicas 3
selector:
matchLabels:
run: my-app
serviceName: my-app
podManagementPolicy: Parallel
template:
metadata:
labels:
run: my-app
spec:
containers:
- name: my-app
image: my-app:latest
command:
- /bin/sh
- /bin/start.sh
- dev
- 2000m
- "0"
- *numReplicas
- 127.0.0.1
- "32990"
imagePullPolicy: Always
livenessProbe:
httpGet:
path: /health
port: 8081
initialDelaySeconds: 180
periodSeconds: 10
timeoutSeconds: 3
readinessProbe:
failureThreshold: 10
httpGet:
path: /ready
port: 8081
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 3
ports:
- containerPort: 8080
protocol: TCP
resources:
limits:
memory: 2500Mi
imagePullSecrets:
- name: snapshot-pull
restartPolicy: Always
Normally you would use the downward api for this kind of thing. https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
However it is currently not possible for kubernetes to propagate deployment/statefulset spec data into the pod spec with the downward api, nor should it be. If you are responsible for this software I'd set up some internal functionality so that it can find it's peers and determine their count periodically.

Kubernetes deployment incurs downtime

When running a deployment I get downtime. Requests failing after a variable amount of time (20-40 seconds).
The readiness check for the entry container fails when the preStop sends SIGUSR1, waits for 31 seconds, then sends SIGTERM. In that timeframe the pod should be removed from the service as the readiness check is set to fail after 2 failed attempts with 5 second intervals.
How can I see the events for pods being added and removed from the service to find out what's causing this?
And events around the readiness checks themselves?
I use Google Container Engine version 1.2.2 and use GCE's network load balancer.
service:
apiVersion: v1
kind: Service
metadata:
name: myapp
labels:
app: myapp
spec:
type: LoadBalancer
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
- name: https
port: 443
targetPort: https
protocol: TCP
selector:
app: myapp
deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
strategy:
type: RollingUpdate
revisionHistoryLimit: 10
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
version: 1.0.0-61--66-6
spec:
containers:
- name: myapp
image: ****
resources:
limits:
cpu: 100m
memory: 250Mi
requests:
cpu: 10m
memory: 125Mi
ports:
- name: http-direct
containerPort: 5000
livenessProbe:
httpGet:
path: /status
port: 5000
initialDelaySeconds: 30
timeoutSeconds: 1
lifecycle:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["sleep 31;"]
- name: haproxy
image: travix/haproxy:1.6.2-r0
imagePullPolicy: Always
resources:
limits:
cpu: 100m
memory: 100Mi
requests:
cpu: 10m
memory: 25Mi
ports:
- name: http
containerPort: 80
- name: https
containerPort: 443
env:
- name: "SSL_CERTIFICATE_NAME"
value: "ssl.pem"
- name: "OFFLOAD_TO_PORT"
value: "5000"
- name: "HEALT_CHECK_PATH"
value: "/status"
volumeMounts:
- name: ssl-certificate
mountPath: /etc/ssl/private
livenessProbe:
httpGet:
path: /status
port: 443
scheme: HTTPS
initialDelaySeconds: 30
timeoutSeconds: 1
readinessProbe:
httpGet:
path: /readiness
port: 81
initialDelaySeconds: 0
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 2
lifecycle:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["kill -USR1 1; sleep 31; kill 1"]
volumes:
- name: ssl-certificate
secret:
secretName: ssl-c324c2a587ee-20160331
When the probe fails, the prober will emit a warning event with reason as Unhealthy and message as xx probe errored: xxx.
You should be able to find those events using either kubectl get events or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6 (filter pods by its label).