Specifying a role in a Kubernetes cronjob

We have a Kubernetes cluster with the Istio proxy running.
At first I created a cronjob which reads from the database and updates a value if found. It worked fine.
Then it turned out we already had a service that does the database update, so I changed the database code into a service call:
conn, err := grpc.Dial("service:3550", grpc.WithInsecure())
if err != nil {
    log.Fatalf("failed to dial service: %v", err)
}
client := protobuf.NewServiceClient(conn)
client.Update(ctx)
But Istio rejects the calls with an RBAC error. It just rejects them and doesn't say why.
Is it possible to add a role to a cronjob? How can we do that?
The mTLS MeshPolicy is PERMISSIVE.
The Kubernetes version is 1.17 and the Istio version is 1.3.
API Version:  authentication.istio.io/v1alpha1
Kind:         MeshPolicy
Metadata:
  Creation Timestamp:  2019-12-05T16:06:08Z
  Generation:          1
  Resource Version:    6578
  Self Link:           /apis/authentication.istio.io/v1alpha1/meshpolicies/default
  UID:                 25f36b0f-1779-11ea-be8c-42010a84006d
Spec:
  Peers:
    Mtls:
      Mode:  PERMISSIVE
The cronjob (describe output):
Name:                          cronjob
Namespace:                     serve
Labels:                        <none>
Annotations:                   <none>
Schedule:                      */30 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  1
Failed Job History Limit:      3
Pod Template:
  Labels:  <none>
  Containers:
   service:
    Image:      service:latest
    Port:       <none>
    Host Port:  <none>
    Environment:
      JOB_NAME:  (v1:metadata.name)
    Mounts:      <none>
  Volumes:       <none>
Last Schedule Time:  Tue, 17 Dec 2019 09:00:00 +0100
Active Jobs:         <none>
Events:
Edit:
I have turned off RBAC for my namespace in ClusterRbacConfig and now it works. So my conclusion is that cronjobs are affected by roles, and it should be possible to add a role and call other services.
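For reference, under Istio 1.3's v1alpha1 RBAC such a role is expressed as a ServiceRole plus a ServiceRoleBinding. A minimal sketch, assuming the cronjob pod runs under a dedicated service account (all names below are placeholders, not from the original post):
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRole
metadata:
  name: service-update
  namespace: serve
spec:
  rules:
  - services: ["service.serve.svc.cluster.local"]  # the service the cronjob calls
    methods: ["*"]
---
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: bind-cronjob-to-service
  namespace: serve
spec:
  subjects:
  - user: "cluster.local/ns/serve/sa/cronjob-sa"   # service account of the cronjob pod
  roleRef:
    kind: ServiceRole
    name: service-update
Note that the source principal is only populated when the call travels over mTLS, so even with a PERMISSIVE mesh policy the cronjob's sidecar must originate mTLS for the subject to match.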

The cronjob needs proper permissions to run if RBAC is enabled.
One of the solutions in this case would be to add a ServiceAccount with enough privileges to the cronjob configuration file.
Since you already have existing services in the namespace, you can check whether there is an existing ServiceAccount for that namespace with:
$ kubectl get serviceaccounts -n serve
If there is an existing ServiceAccount, you can add it to your cronjob manifest, like in this example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: adwords-api-scale-up-cron-job
spec:
  schedule: "*/2 * * * *"
  jobTemplate:
    spec:
      activeDeadlineSeconds: 100
      template:
        spec:
          serviceAccountName: scheduled-autoscaler-service-account
          containers:
          - name: adwords-api-scale-up-container
            image: bitnami/kubectl:1.15-debian-9
            command:
            - bash
            args:
            - "-xc"
            - |
              kubectl scale --replicas=2 --v=7 deployment/adwords-api-deployment
            volumeMounts:
            - name: kubectl-config
              mountPath: /.kube/
              readOnly: true
          volumes:
          - name: kubectl-config
            hostPath:
              path: $HOME/.kube # Replace $HOME with an absolute path
          restartPolicy: OnFailure
Then, under Pod Template, the Service Account should be visible:
$ kubectl describe cronjob adwords-api-scale-up-cron-job
Name:                          adwords-api-scale-up-cron-job
Namespace:                     default
Labels:                        <none>
Annotations:                   kubectl.kubernetes.io/last-applied-configuration:
                                 {"apiVersion":"batch/v1beta1","kind":"CronJob","metadata":{"annotations":{},"name":"adwords-api-scale-up-cron-job","namespace":"default"},...
Schedule:                      */2 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Active Deadline Seconds:       100s
Pod Template:
  Labels:           <none>
  Service Account:  scheduled-autoscaler-service-account
  Containers:
   adwords-api-scale-up-container:
    Image:      bitnami/kubectl:1.15-debian-9
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
    Args:
      -xc
      kubectl scale --replicas=2 --v=7 deployment/adwords-api-deployment
    Environment:  <none>
    Mounts:
      /.kube/ from kubectl-config (ro)
  Volumes:
   kubectl-config:
    Type:          HostPath (bare host directory volume)
    Path:          $HOME/.kube
    HostPathType:
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>
In case of a custom RBAC configuration I suggest referring to the Kubernetes documentation.
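For example, a minimal sketch of a custom Role bound to the cronjob's ServiceAccount (the names, resources, and verbs below are placeholders to adapt, not from the original answer):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cronjob-role
  namespace: serve
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cronjob-role-binding
  namespace: serve
subjects:
- kind: ServiceAccount
  name: cronjob-sa        # the account referenced by serviceAccountName
  namespace: serve
roleRef:
  kind: Role
  name: cronjob-role
  apiGroup: rbac.authorization.k8s.io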
Hope this helps.

Related

Recreate job or delete old job on every helm install

I have created a Kubernetes job and it got created, but it is failing on deployment to the Kubernetes cluster. When I try to redeploy it using Helm, the job is not redeployed (the old job is not deleted and a new one created, unlike a microservice deployment).
How can I achieve this redeployment of the job without manually deleting it in Kubernetes? Could I tell it to recreate the container?
job.yaml contains:
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-init-job"
  namespace: {{ .Release.Namespace }}
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: disabled
        "helm.sh/hook-delete-policy": before-hook-creation
        "helm.sh/hook": pre-install,pre-upgrade,pre-delete
        "helm.sh/hook-weight": "-5"
    spec:
      serviceAccountName: {{ .Release.Name }}-init-service-account
      containers:
      - name: app-installer
        image: some image
        command:
        - /bin/bash
        - -c
        - echo Hello executing k8s init-container
        securityContext:
          readOnlyRootFilesystem: true
      restartPolicy: OnFailure
The job is not getting redeployed:
kubectl get jobs -n namespace
NAME            COMPLETIONS   DURATION   AGE
test-init-job   0/1           13h        13h
kubectl describe job test-init-job -n test
Name:           test-init-job
Namespace:      test
Selector:       controller-uid=86370470-0964-42d5-a9c1-00c8a462239f
Labels:         app.kubernetes.io/managed-by=Helm
Annotations:    meta.helm.sh/release-name: test
                meta.helm.sh/release-namespace: test
Parallelism:    1
Completions:    1
Start Time:     Fri, 14 Oct 2022 18:03:31 +0530
Pods Statuses:  0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:       controller-uid=86370470-0964-42d5-a9c1-00c8a462239f
                job-name=test-init-job
  Annotations:  helm.sh/hook: pre-install,pre-upgrade
                helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
                helm.sh/hook-weight: -5
                linkerd.io/inject: disabled
  Service Account:  test-init-service-account
  Containers:
   test-app-installer:
    Image:      artifactory/test-init-container:1.0.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      test.baseUrl:         baser
      test.authType:        basic
      test.basic.username:  jack
      test.basic.password:  password
    Mounts:
      /etc/test/config from test-installer-config-vol (ro)
  Volumes:
   test-installer-config-vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      test-installer-config-files
    Optional:  false
Events:        <none>
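An observation, not from the original thread: Helm reads hook annotations from the metadata of the hook resource itself, but in the job.yaml above they sit under the pod template's metadata, so the Job is installed as a regular release resource and the before-hook-creation delete policy never applies. A sketch of the expected placement (the rest of the spec unchanged):
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-init-job"
  namespace: {{ .Release.Namespace }}
  annotations:
    # hook annotations belong here, on the Job itself
    "helm.sh/hook": pre-install,pre-upgrade,pre-delete
    "helm.sh/hook-delete-policy": before-hook-creation
    "helm.sh/hook-weight": "-5"
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: disabled
    spec:
      serviceAccountName: {{ .Release.Name }}-init-service-account
      containers:
      - name: app-installer
        image: some image
        command:
        - /bin/bash
        - -c
        - echo Hello executing k8s init-container
        securityContext:
          readOnlyRootFilesystem: true
      restartPolicy: OnFailure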

Getting error mkdir: cannot create directory ‘/bitnami/rabbitmq’: Permission denied when creating Kubernetes pod of Rabbitmq

While learning Kubernetes going by the book Kubernetes for Developers, I am stuck at this point now.
I am trying to start a RabbitMQ pod, but after a lot of troubleshooting I have managed to get only to this point and still have no clue what to fix to get rid of the permission denied error.
# kubectl get pods
NAME                        READY   STATUS             RESTARTS   AGE
rabbitmq-56c67d8d7d-s8vp5   0/1     CrashLoopBackOff   5          5m40s
If I look at the logs of this container, this is what I found:
# kubectl logs rabbitmq-56c67d8d7d-s8vp5
21:22:58.49
21:22:58.50 Welcome to the Bitnami rabbitmq container
21:22:58.51 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
21:22:58.51 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
21:22:58.52 Send us your feedback at containers@bitnami.com
21:22:58.52
21:22:58.52 INFO ==> ** Starting RabbitMQ setup **
21:22:58.54 INFO ==> Validating settings in RABBITMQ_* env vars..
21:22:58.56 INFO ==> Initializing RabbitMQ...
21:22:58.57 INFO ==> Generating random cookie
mkdir: cannot create directory ‘/bitnami/rabbitmq’: Permission denied
Here is my rabbitmq-deployment.yml
---
# EXPORT SERVICE INTERFACE
kind: Service
apiVersion: v1
metadata:
  name: message-queue
  labels:
    app: rabbitmq
    role: master
    tier: queue
spec:
  ports:
  - port: 5672
    targetPort: 5672
  selector:
    app: rabbitmq
    role: master
    tier: queue
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-pv-claim
  labels:
    app: rabbitmq
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
      role: master
      tier: queue
  template:
    metadata:
      labels:
        app: rabbitmq
        role: master
        tier: queue
    spec:
      nodeSelector:
        boardType: x86vm
      containers:
      - name: rabbitmq
        image: bitnami/rabbitmq:3.7
        envFrom:
        - configMapRef:
            name: bitnami-rabbitmq-config
        ports:
        - name: queue
          containerPort: 5672
        - name: queue-mgmt
          containerPort: 15672
        livenessProbe:
          exec:
            command:
            - rabbitmqctl
            - status
          initialDelaySeconds: 120
          timeoutSeconds: 5
          failureThreshold: 6
        readinessProbe:
          exec:
            command:
            - rabbitmqctl
            - status
          initialDelaySeconds: 10
          timeoutSeconds: 3
          periodSeconds: 5
        volumeMounts:
        - name: rabbitmq-storage
          mountPath: /bitnami
      volumes:
      - name: rabbitmq-storage
        persistentVolumeClaim:
          claimName: rabbitmq-pv-claim
This is the rabbitmq-storage-class.yml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rabbitmq-storage-class
  labels:
    app: rabbitmq
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
and persistant-volume.yml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-pv-claim
  labels:
    app: rabbitmq
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /bitnami
Logs:
# kubectl describe pods rabbitmq-5f7f787479-fpg6g
Name:         rabbitmq-5f7f787479-fpg6g
Namespace:    default
Priority:     0
Node:         kube-worker-vm2/192.168.1.36
Start Time:   Mon, 03 May 2021 12:29:17 +0100
Labels:       app=rabbitmq
              pod-template-hash=5f7f787479
              role=master
              tier=queue
Annotations:  cni.projectcalico.org/podIP: 192.168.222.4/32
              cni.projectcalico.org/podIPs: 192.168.222.4/32
Status:       Running
IP:           192.168.222.4
IPs:
  IP:           192.168.222.4
Controlled By:  ReplicaSet/rabbitmq-5f7f787479
Containers:
  rabbitmq:
    Container ID:   docker://bbdbb9c5d4b6737519d3dcf4bdda242a7fe904f2336334afe686e9b204fd6d5c
    Image:          bitnami/rabbitmq:3.7
    Image ID:       docker-pullable://bitnami/rabbitmq@sha256:8b6057997b74ebc81e934dd6c94e9da745635faa2d79b382cfda27b9176e0e6d
    Ports:          5672/TCP, 15672/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 03 May 2021 12:30:48 +0100
      Finished:     Mon, 03 May 2021 12:30:48 +0100
    Ready:          False
    Restart Count:  4
    Liveness:       exec [rabbitmqctl status] delay=120s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [rabbitmqctl status] delay=10s timeout=3s period=5s #success=1 #failure=3
    Environment Variables from:
      bitnami-rabbitmq-config  ConfigMap  Optional: false
    Environment:   <none>
    Mounts:
      /bitnami from rabbitmq-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4qmxr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rabbitmq-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-pv-claim
    ReadOnly:   false
  default-token-4qmxr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4qmxr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  boardType=x86vm
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  2m20s                 default-scheduler  Successfully assigned default/rabbitmq-5f7f787479-fpg6g to kube-worker-vm2
  Normal   Created    96s (x4 over 2m18s)   kubelet            Created container rabbitmq
  Normal   Started    95s (x4 over 2m17s)   kubelet            Started container rabbitmq
  Warning  BackOff    65s (x12 over 2m16s)  kubelet            Back-off restarting failed container
  Normal   Pulled     50s (x5 over 2m18s)   kubelet            Container image "bitnami/rabbitmq:3.7" already present on machine
When creating an image, the image creator often chooses to run the process as a user other than root. This is the case for your image, and that user does not have write permissions on the /bitnami directory. You can verify this by commenting out the volume.
To fix the issue, you need to set a security context for your pod: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
Not sure about the exact syntax, but something like this should do the trick:
spec:
  securityContext:
    fsGroup: 1001 # the userid that is used in the image
  nodeSelector:
    boardType: x86vm
  containers:
  - name: rabbitmq
    image: bitnami/rabbitmq:3.7
    envFrom:
    - configMapRef:
        name: bitnami-rabbitmq-config
This makes the directory writable by the user in the image.
Another thing: a Deployment is for stateless services by design. If you have state to keep, always use a StatefulSet. It's very similar to a Deployment from a configuration point of view, but Kubernetes treats it very differently. See https://www.youtube.com/watch?v=Vrxr-7rjkvM for a good explanation.
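As an illustration, a minimal StatefulSet sketch reusing the labels and claim size from the manifests above (not a drop-in replacement; a StatefulSet also expects a headless Service, assumed here to be message-queue):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: message-queue   # headless service the StatefulSet attaches to
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      securityContext:
        fsGroup: 1001          # non-root user of the bitnami image
      containers:
      - name: rabbitmq
        image: bitnami/rabbitmq:3.7
        volumeMounts:
        - name: rabbitmq-storage
          mountPath: /bitnami
  volumeClaimTemplates:        # one PVC is created and kept per replica
  - metadata:
      name: rabbitmq-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi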
As per the Bitnami documentation, it depends on the Kubernetes distribution.
Quote from the documentation:
Adjust permissions of persistent volume mountpoint
As the image runs as non-root by default, it is necessary to adjust the ownership of the persistent volume so that the container can write data into it.
By default, the chart is configured to use Kubernetes Security Context to automatically change the ownership of the volume. However, this feature does not work in all Kubernetes distributions. As an alternative, this chart supports using an initContainer to change the ownership of the volume before mounting it in the final destination.
You can enable this initContainer by setting volumePermissions.enabled to true.
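Outside the Helm chart, the same idea can be reproduced by hand with an initContainer in the Deployment's pod spec; a sketch, assuming UID/GID 1001 as used by the Bitnami images:
      initContainers:
      - name: volume-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1001:1001 /bitnami"]
        securityContext:
          runAsUser: 0   # runs as root only to fix ownership of the volume
        volumeMounts:
        - name: rabbitmq-storage
          mountPath: /bitnami
This goes next to containers: in the pod spec, and runs to completion before the rabbitmq container starts.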

"Must specify limits.cpu" error during pod deployment even though cpu limit is specified

I am trying to run a test pod with OpenShift CLI:
$ oc run nginx --image=nginx --limits=cpu=2,memory=4Gi
deploymentconfig.apps.openshift.io/nginx created
$ oc describe deploymentconfig.apps.openshift.io/nginx
Name:           nginx
Namespace:      myproject
Created:        12 seconds ago
Labels:         run=nginx
Annotations:    <none>
Latest Version: 1
Selector:       run=nginx
Replicas:       1
Triggers:       Config
Strategy:       Rolling
Template:
Pod Template:
  Labels:       run=nginx
  Containers:
   nginx:
    Image:      nginx
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     2
      memory:  4Gi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Deployment #1 (latest):
  Name:         nginx-1
  Created:      12 seconds ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=nginx-1,deploymentconfig=nginx,run=nginx
  Labels:       openshift.io/deployment-config.name=nginx,run=nginx
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
  Type     Reason             Age                From                         Message
  ----     ------             ----               ----                         -------
  Normal   DeploymentCreated  12s                deploymentconfig-controller  Created new replication controller "nginx-1" for version 1
  Warning  FailedCreate       1s (x12 over 12s)  deployer-controller          Error creating deployer pod: pods "nginx-1-deploy" is forbidden: failed quota: quota-svc-myproject: must specify limits.cpu,limits.memory
I get "must specify limits.cpu,limits.memory" error, despite both limits being present in the same describe output.
What might be the problem and how do I fix it?
I found a solution!
Part of the error message was "Error creating deployer pod". It means that the problem is not with my pod, but with the deployer pod which performs my pod deployment.
It seems the quota in my project affects deployer pods as well.
I couldn't find a way to set deployer pod limits with the CLI, so I made a DeploymentConfig:
kind: "DeploymentConfig"
apiVersion: "v1"
metadata:
name: "test-app"
spec:
template:
metadata:
labels:
name: "test-app"
spec:
containers:
- name: "test-app"
image: "nginxinc/nginx-unprivileged"
resources:
limits:
cpu: "2000m"
memory: "20Gi"
ports:
- containerPort: 8080
protocol: "TCP"
replicas: 1
selector:
name: "test-app"
triggers:
- type: "ConfigChange"
- type: "ImageChange"
imageChangeParams:
automatic: true
containerNames:
- "test-app"
from:
kind: "ImageStreamTag"
name: "nginx-unprivileged:latest"
strategy:
type: "Rolling"
resources:
limits:
cpu: "2000m"
memory: "20Gi"
As you can see, two sets of limits are specified here: for the container and for the deployment strategy.
With this configuration it worked fine!
It looks like you have specified a resource quota, and the values you specified for limits seem to be larger than it allows. Describe the resource quota with oc describe quota quota-svc-myproject and adjust your configs accordingly.
A good reference could be https://docs.openshift.com/container-platform/3.11/dev_guide/compute_resources.html
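A related pattern, my addition rather than something from the answers above: a LimitRange in the project gives containers default limits, so auxiliary pods such as the deployer pass the quota check without per-pod configuration. A sketch with placeholder values:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: myproject
spec:
  limits:
  - type: Container
    default:           # applied as limits when a container specifies none
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:    # applied as requests when a container specifies none
      cpu: "100m"
      memory: "256Mi"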

Kubernetes metrics-server FailedDiscoveryCheck

I was hoping to get a little help; my Google-Fu didn't get me much closer. I'm trying to install the metrics server on my fedora-coreos Kubernetes 4-node cluster like so:
kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
The service never seems to start:
kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-03-04T16:53:33Z
  Resource Version:    1611816
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 65d9a56a-c548-4d7e-a647-8ce7a865a266
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-03-04T16:53:33Z
    Message:               failing or missing response from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: bad status from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: 403
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>
Diagnosing, here is what I have found googling around:
kubectl get deploy,svc -n kube-system | egrep metrics-server
deployment.apps/metrics-server   1/1     1            1           8m7s
service/metrics-server           ClusterIP   10.3.230.59   <none>   443/TCP   8m7s

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request

kubectl get all --all-namespaces | grep -i metrics-server
kube-system   pod/metrics-server-75b5d446cd-zj4jm         1/1     Running   0        9m11s
kube-system   service/metrics-server                      ClusterIP   10.3.230.59   <none>   443/TCP   9m11s
kube-system   deployment.apps/metrics-server              1/1     1         1        9m11s
kube-system   replicaset.apps/metrics-server-75b5d446cd   1       1         1        9m11s
kubectl logs -f metrics-server-75b5d446cd-zj4jm -n kube-system
I0304 16:53:36.475657 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
W0304 16:53:38.229267 1 authentication.go:296] Cluster doesn't provide requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
I0304 16:53:38.267760 1 secure_serving.go:116] Serving securely on [::]:4443
kubectl get -n kube-system deployment metrics-server -o yaml | grep -i args -A 10
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-server","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"metrics-server"}},"template":{"metadata":{"labels":{"k8s-app":"metrics-server"},"name":"metrics-server"},"spec":{"containers":[{"args":["--cert-dir=/tmp","--secure-port=4443","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"],"image":"k8s.gcr.io/metrics-server-amd64:v0.3.6","imagePullPolicy":"IfNotPresent","name":"metrics-server","ports":[{"containerPort":4443,"name":"main-port","protocol":"TCP"}],"securityContext":{"readOnlyRootFilesystem":true,"runAsNonRoot":true,"runAsUser":1000},"volumeMounts":[{"mountPath":"/tmp","name":"tmp-dir"}]}],"nodeSelector":{"beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64"},"serviceAccountName":"metrics-server","volumes":[{"emptyDir":{},"name":"tmp-dir"}]}}}}
creationTimestamp: "2020-03-04T16:53:33Z"
generation: 1
labels:
  k8s-app: metrics-server
name: metrics-server
namespace: kube-system
resourceVersion: "1611810"
selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
uid: 006e758e-bd33-47d7-8378-d3a8081ee8a8
spec:
--
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        name: metrics-server
        ports:
        - containerPort: 4443
          name: main-port
Finally, my deployment config:
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"
I'm at a loss as to what it could be. I can't get the metrics service to start, and when I run a basic kubectl top node to display any info, all I get is:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
I have searched the internet and tried adding the args: and command: lines below, but no luck:
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
Can anyone shed light on how to fix this? Thanks.
I've reproduced your issue. I have used Calico as CNI.
$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
fedora-master     Ready    master   6m27s   v1.17.3
fedora-worker-1   Ready    <none>   4m48s   v1.17.3
fedora-worker-2   Ready    <none>   4m46s   v1.17.3
fedora-master:~/metrics-server$ kubectl describe apiservice v1beta1.metrics.k8s.io
Status:
  Conditions:
    Last Transition Time:  2020-03-12T16:04:59Z
    Message:               failing or missing response from https://10.99.122.196:443/apis/metrics.k8s.io/v1beta1: Get https://10.99.122.196:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
fedora-master:~/metrics-server$ kubectl top pod
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
When you have only one node in the cluster, the default settings in the metrics-server repo work correctly. The issue occurs when you have more than 2 nodes. I've used 1 master and 2 workers to reproduce it. Below is an example deployment which works correctly (it has all the required args). First, please remove your current metrics-server YAMLs (kubectl delete -f deploy/kubernetes) and execute:
$ git clone https://github.com/kubernetes-sigs/metrics-server
$ cd metrics-server/deploy/kubernetes/
$ vi metrics-server-deployment.yaml
Paste the below YAML:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      hostNetwork: true
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        args:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        - --cert-dir=/tmp
        - --secure-port=4443
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"
save and quit using :wq
$ cd ~/metrics-server
$ kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
Wait a while for metrics-server to gather a few metrics from nodes.
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
...
Metadata:
  Creation Timestamp:  2020-03-12T16:57:58Z
...
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-03-12T16:58:01Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
After a few minutes you can use top:
$ kubectl top nodes
NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
fedora-master     188m         9%     1315Mi          17%
fedora-worker-1   109m         5%     982Mi           13%
fedora-worker-2   84m          4%     969Mi           13%
If you still encounter some issues, please add - --v=6 to the deployment and provide logs from the metrics-server pod:
containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.1
  args:
  - /metrics-server
  - --v=6
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls
You need to carefully check the logs of the calico-node pods. In my case I had some other network interfaces, and the autodetection mechanism in Calico was detecting the wrong interface (IP address). You need to consult this documentation: https://projectcalico.docs.tigera.io/reference/node/configuration.
What I did in my case was simply:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=cidr=172.16.8.0/24
The cidr is my "working network". After this, all calico-node pods restarted and suddenly everything was fine.

How to schedule a cronjob which executes a kubectl command?

I would like to run the following kubectl command every 5 minutes:
kubectl patch deployment runners -p '{"spec":{"template":{"spec":{"containers":[{"name":"jp-runner","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}' -n jp-test
For this, I have created a cronjob as below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - kubectl patch deployment runners -p '{"spec":{"template":{"spec":{"containers":[{"name":"jp-runner","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}' -n jp-test
          restartPolicy: OnFailure
But it is failing to start the container, showing the message:
Back-off restarting failed container
And with the error code 127:
State:       Terminated
  Reason:    Error
  Exit Code: 127
From what I checked, error code 127 means that the command doesn't exist. How could I run the kubectl command as a cron job then? Am I missing something?
Note: I had posted a similar question ( Scheduled restart of Kubernetes pod without downtime ) , but that was more of having the main deployment itself as a cronjob, here I'm trying to run a kubectl command (which does the restart) using a CronJob - so I thought it would be better to post separately
kubectl describe cronjob hello -n jp-test:
Name:                       hello
Namespace:                  jp-test
Labels:                     <none>
Annotations:                kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"batch/v1beta1","kind":"CronJob","metadata":{"annotations":{},"name":"hello","namespace":"jp-test"},"spec":{"jobTemplate":{"spec":{"templ...
Schedule:                   */5 * * * *
Concurrency Policy:         Allow
Suspend:                    False
Starting Deadline Seconds:  <unset>
Selector:                   <unset>
Parallelism:                <unset>
Completions:                <unset>
Pod Template:
  Labels:  <none>
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      kubectl patch deployment runners -p '{"spec":{"template":{"spec":{"containers":[{"name":"jp-runner","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}' -n jp-test
    Environment:     <none>
    Mounts:          <none>
  Volumes:           <none>
Last Schedule Time:  Wed, 27 Feb 2019 14:10:00 +0100
Active Jobs:         hello-1551273000
Events:
  Type    Reason            Age  From                Message
  ----    ------            ---- ----                -------
  Normal  SuccessfulCreate  6m   cronjob-controller  Created job hello-1551272700
  Normal  SuccessfulCreate  1m   cronjob-controller  Created job hello-1551273000
  Normal  SawCompletedJob   16s  cronjob-controller  Saw completed job: hello-1551272700
Normal SawCompletedJob 16s cronjob-controller Saw completed job: hello-1551272700
kubectl describe job hello -v=5 -n jp-test
Name:           hello-1551276000
Namespace:      jp-test
Selector:       controller-uid=fa009d78-3a97-11e9-ae31-ac1f6b1a0950
Labels:         controller-uid=fa009d78-3a97-11e9-ae31-ac1f6b1a0950
                job-name=hello-1551276000
Annotations:    <none>
Controlled By:  CronJob/hello
Parallelism:    1
Completions:    1
Start Time:     Wed, 27 Feb 2019 15:00:02 +0100
Pods Statuses:  0 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=fa009d78-3a97-11e9-ae31-ac1f6b1a0950
           job-name=hello-1551276000
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      kubectl patch deployment runners -p '{"spec":{"template":{"spec":{"containers":[{"name":"jp-runner","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}' -n jp-test
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age              From            Message
  ----     ------                ----             ----            -------
  Normal   SuccessfulCreate      7m               job-controller  Created pod: hello-1551276000-lz4dp
  Normal   SuccessfulDelete      1m               job-controller  Deleted pod: hello-1551276000-lz4dp
  Warning  BackoffLimitExceeded  1m (x2 over 1m)  job-controller  Job has reached the specified backoff limit

Name:           hello-1551276300
Namespace:      jp-test
Selector:       controller-uid=ad52e87a-3a98-11e9-ae31-ac1f6b1a0950
Labels:         controller-uid=ad52e87a-3a98-11e9-ae31-ac1f6b1a0950
                job-name=hello-1551276300
Annotations:    <none>
Controlled By:  CronJob/hello
Parallelism:    1
Completions:    1
Start Time:     Wed, 27 Feb 2019 15:05:02 +0100
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=ad52e87a-3a98-11e9-ae31-ac1f6b1a0950
           job-name=hello-1551276300
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      kubectl patch deployment runners -p '{"spec":{"template":{"spec":{"containers":[{"name":"jp-runner","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}' -n jp-test
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age  From            Message
  ----    ------            ---- ----            -------
  Normal  SuccessfulCreate  2m   job-controller  Created pod: hello-1551276300-8d5df
Long story short, BusyBox doesn't have kubectl installed.
You can check it yourself using kubectl run -i --tty busybox --image=busybox -- sh, which will run a BusyBox pod as an interactive shell.
I would recommend using bitnami/kubectl:latest.
Also keep in mind that you will need to set proper RBAC, as otherwise you will get Error from server (Forbidden): services is forbidden.
You could use something like this:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: jp-test
  name: jp-runner
rules:
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - 'patch'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: jp-runner
  namespace: jp-test
subjects:
- kind: ServiceAccount
  name: sa-jp-runner
  namespace: jp-test
roleRef:
  kind: Role
  name: jp-runner
  apiGroup: ""
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-jp-runner
  namespace: jp-test
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-jp-runner
          containers:
          - name: hello
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl patch deployment runners -p '{"spec":{"template":{"spec":{"containers":[{"name":"jp-runner","env":[{"name":"START_TIME","value":"'$(date +%s)'"}]}]}}}}' -n jp-test
          restartPolicy: OnFailure
You need to make the CronJob's container download the cluster configuration so that you can then run kubectl commands against the cluster. Here is an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: drupal-cron
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: drupal-cron
            image: juampynr/digital-ocean-cronjob:latest
            env:
            - name: DIGITALOCEAN_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: api
                  key: key
            command: ["/bin/bash","-c"]
            args:
            - doctl kubernetes cluster kubeconfig save drupster;
              POD_NAME=$(kubectl get pods -l tier=frontend -o=jsonpath='{.items[0].metadata.name}');
              kubectl exec $POD_NAME -c drupal -- vendor/bin/drush core:cron;
          restartPolicy: OnFailure
I posted an answer describing how I did this in a different thread: https://stackoverflow.com/a/62321138/1120652