Kubernetes failed job with no pods - kubernetes

I see a failed job that created no pods, and there is no information in the events. Since there are no pods, I could not check the logs.
Here is the description of the job which failed.
kubectl describe job time-limited-rbac-1604010900 -n add-ons
Name: time-limited-rbac-1604010900
Namespace: add-ons
Selector: controller-uid=0816b9b3-814c-4802-83cf-5d5f3456701d
Labels: controller-uid=0816b9b3-814c-4802-83cf-5d5f3456701d
job-name=time-limited-rbac-1604010900
Annotations: <none>
Controlled By: CronJob/time-limited-rbac
Parallelism: 1
Completions: <unset>
Start Time: Thu, 29 Oct 2020 15:35:08 -0700
Active Deadline Seconds: 280s
Pods Statuses: 0 Running / 0 Succeeded / 1 Failed
Pod Template:
Labels: controller-uid=0816b9b3-814c-4802-83cf-5d5f3456701d
job-name=time-limited-rbac-1604010900
Service Account: time-limited-rbac
Containers:
time-limited-rbac:
Image: bitnami/kubectl:latest
Port: <none>
Host Port: <none>
Command:
/bin/bash
Args:
/var/tmp/time-limited-rbac.sh
Environment: <none>
Mounts:
/var/tmp/ from script (rw)
Volumes:
script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: time-limited-rbac-script
Optional: false
Events: <none>
Here is the description of CronJob.
apiVersion: v1
items:
- apiVersion: batch/v1beta1
kind: CronJob
metadata:
annotations:
meta.helm.sh/release-name: time-limited-rbac
meta.helm.sh/release-namespace: add-ons
labels:
app.kubernetes.io/name: time-limited-rbac
name: time-limited-rbac
spec:
concurrencyPolicy: Replace
failedJobsHistoryLimit: 1
jobTemplate:
metadata:
creationTimestamp: null
spec:
activeDeadlineSeconds: 280
backoffLimit: 3
parallelism: 1
template:
metadata:
creationTimestamp: null
spec:
containers:
- args:
- /var/tmp/time-limited-rbac.sh
command:
- /bin/bash
image: bitnami/kubectl:latest
imagePullPolicy: Always
name: time-limited-rbac
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/tmp/
name: script
dnsPolicy: ClusterFirst
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: time-limited-rbac
serviceAccountName: time-limited-rbac
terminationGracePeriodSeconds: 0
volumes:
- configMap:
defaultMode: 356
name: time-limited-rbac-script
name: script
schedule: '*/5 * * * *'
successfulJobsHistoryLimit: 3
suspend: false
Is there any way to tune this CronJob to avoid such scenarios? We are hitting this issue at least once or twice every day.
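For anyone debugging a similar case, here is a minimal sketch of two checks. It assumes the batch/v1beta1 CronJob controller of that era, which names each Job after the CronJob plus the scheduled time as a Unix timestamp in seconds, and it assumes GNU date. The helper names job_scheduled_time and job_failure_reason are hypothetical. The first decodes the name suffix so it can be compared with the Job's Start Time; the second prints whatever reason is recorded in the Job's status conditions, which is where a reason such as DeadlineExceeded would appear even when the pods are already gone.

```shell
# Decode the numeric suffix of a CronJob-created Job name as a UTC time.
# Assumes the suffix is a Unix timestamp in seconds and GNU date is available.
job_scheduled_time() {
  local job_name="$1"
  local epoch="${job_name##*-}"   # text after the last dash
  date -u -d "@${epoch}" +%Y-%m-%dT%H:%M:%SZ
}

# Print the reason(s) recorded in the Job's status conditions.
job_failure_reason() {
  local job_name="$1" namespace="$2"
  kubectl get job "$job_name" -n "$namespace" \
    -o jsonpath='{.status.conditions[*].reason}'
}
```

For the Job above, job_scheduled_time time-limited-rbac-1604010900 prints 2020-10-29T22:35:00Z, which is 15:35:00 at UTC-7, a few seconds before the reported Start Time. job_failure_reason time-limited-rbac-1604010900 add-ons would then query the live cluster.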

Related

Azure AKS backup using Velero

I noticed that Velero can only back up AKS PVCs if those PVCs are disks and not Azure file shares. To handle this I tried to use restic to back up the file shares themselves, but it gives me a strange log:
This is what my actual pod looks like:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
backup.velero.io/backup-volumes: grafana-data
deployment.kubernetes.io/revision: "17"
And the log of my backup:
time="2020-05-26T13:51:54Z" level=info msg="Adding pvc grafana-data to additionalItems" backup=velero/grafana-test-volume cmd=/velero logSource="pkg/backup/pod_action.go:67" pluginName=velero
time="2020-05-26T13:51:54Z" level=info msg="Backing up item" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:169" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Executing custom action" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:330" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Skipping item because it's already been backed up." backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:163" name=grafana-data namespace=grafana resource=persistentvolumeclaims
As you can see, somehow it did not back up the grafana-data volume, since it says it is already in the backup (which it actually is not).
My azurefile StorageClass looks like this:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1beta1","kind":"StorageClass","metadata":{"annotations":{},"labels":{"kubernetes.io/cluster-service":"true"},"name":"azurefile"},"parameters":{"skuName":"Standard_LRS"},"provisioner":"kubernetes.io/azure-file"}
creationTimestamp: "2020-05-18T15:18:18Z"
labels:
kubernetes.io/cluster-service: "true"
name: azurefile
resourceVersion: "1421202"
selfLink: /apis/storage.k8s.io/v1/storageclasses/azurefile
uid: e3cc4e52-c647-412a-bfad-81ab6eb222b1
mountOptions:
- nouser_xattr
parameters:
skuName: Standard_LRS
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate
As you can see, I actually patched the storage class to include the nouser_xattr mount option, which was suggested earlier.
When I check the restic pod logs I see the following:
E0524 10:22:08.908190 1 reflector.go:156] github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117: Failed to list *v1.PodVolumeBackup: Get https://10.0.0.1:443/apis/velero.io/v1/namespaces/velero/podvolumebackups?limit=500&resourceVersion=1212830: dial tcp 10.0.0.1:443: i/o timeout
I0524 10:22:08.909577 1 trace.go:116] Trace[1946538740]: "Reflector ListAndWatch" name:github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117 (started: 2020-05-24 10:21:38.908988405 +0000 UTC m=+487217.942875118) (total time: 30.000554209s):
Trace[1946538740]: [30.000554209s] [30.000554209s] END
When I check the PodVolumeBackup resources I see the contents below. I don't know what is expected here, though.
➜ ~ kubectl -n velero get podvolumebackups -o yaml
apiVersion: v1
items: []
kind: List
metadata:
resourceVersion: ""
selfLink: ""
To summarize, I installed Velero like this:
velero install \
--provider azure \
--plugins velero/velero-plugin-for-microsoft-azure:v1.0.1 \
--bucket $BLOB_CONTAINER \
--secret-file ./credentials-velero \
--backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID \
--snapshot-location-config apiTimeout=5m,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP \
--use-restic \
--wait
The end result is the deployment described below
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
backup.velero.io/backup-volumes: app-upload
deployment.kubernetes.io/revision: "18"
creationTimestamp: "2020-05-18T16:55:38Z"
generation: 10
labels:
app: app
velero.io/backup-name: mekompas-tenant-production-20200518020012
velero.io/restore-name: mekompas-tenant-production-20200518020012-20200518185536
name: app
namespace: mekompas-tenant-production
resourceVersion: "427893"
selfLink: /apis/extensions/v1beta1/namespaces/mekompas-tenant-production/deployments/app
uid: c1961ec3-b7b1-4f81-9aae-b609fa3d31fc
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: app
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2020-05-18T20:24:19+02:00"
creationTimestamp: null
labels:
app: app
spec:
containers:
- image: nginx:1.17-alpine
imagePullPolicy: IfNotPresent
name: app-nginx
ports:
- containerPort: 80
name: http
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/www/html
name: app-files
- mountPath: /etc/nginx/conf.d
name: nginx-vhost
- env:
- name: CONF_DB_HOST
value: db.mekompas-tenant-production
- name: CONF_DB
value: mekompas
- name: CONF_DB_USER
value: mekompas
- name: CONF_DB_PASS
valueFrom:
secretKeyRef:
key: DATABASE_PASSWORD
name: secret
- name: CONF_EMAIL_FROM_ADDRESS
value: noreply@mekompas.nl
- name: CONF_EMAIL_FROM_NAME
value: mekompas
- name: CONF_EMAIL_REPLYTO_ADDRESS
value: slc@mekompas.nl
- name: CONF_UPLOAD_PATH
value: /uploads
- name: CONF_SMTP_HOST
value: smtp.sendgrid.net
- name: CONF_SMTP_PORT
value: "587"
- name: CONF_SMTP_USER
value: apikey
- name: CONF_SMTP_PASSWORD
valueFrom:
secretKeyRef:
key: MAIL_PASSWORD
name: secret
image: me.azurecr.io/mekompas/php-fpm-alpine:1.12.0
imagePullPolicy: Always
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- cp -r /app/. /var/www/html && chmod -R 777 /var/www/html/templates_c
&& chmod -R 777 /var/www/html/core/lib/htmlpurifier-4.9.3/library/HTMLPurifier/DefinitionCache
name: app-php
ports:
- containerPort: 9000
name: upstream-php
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/www/html
name: app-files
- mountPath: /uploads
name: app-upload
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: registrypullsecret
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- name: app-upload
persistentVolumeClaim:
claimName: upload
- emptyDir: {}
name: app-files
- configMap:
defaultMode: 420
name: nginx-vhost
name: nginx-vhost
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-05-18T18:12:20Z"
lastUpdateTime: "2020-05-18T18:12:20Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2020-05-18T16:55:38Z"
lastUpdateTime: "2020-05-20T16:03:48Z"
message: ReplicaSet "app-688699c5fb" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 10
readyReplicas: 1
replicas: 1
updatedReplicas: 1
Best,
Pim
Have you added nouser_xattr to your StorageClass mountOptions list?
This requirement is documented in GitHub issue 1800.
Also mentioned on the restic integration page (check under the Azure section), where they provide this snippet to patch your StorageClass resource:
kubectl patch storageclass/<YOUR_AZURE_FILE_STORAGE_CLASS_NAME> \
--type json \
--patch '[{"op":"add","path":"/mountOptions/-","value":"nouser_xattr"}]'
If you have no existing mountOptions list, you can try:
kubectl patch storageclass azurefile \
--type merge \
--patch '{"mountOptions": ["nouser_xattr"]}'
Ensure the pod template of the Deployment resource includes the annotation backup.velero.io/backup-volumes. Annotations on Deployment resources will propagate to ReplicaSet resources, but not to Pod resources.
Specifically, in your example the annotation backup.velero.io/backup-volumes: app-upload should be a child of spec.template.metadata.annotations, rather than a child of metadata.annotations.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
# *** move velero annotation from here ***
labels:
app: app
name: app
namespace: mekompas-tenant-production
spec:
template:
metadata:
annotations:
# *** velero annotation goes here in order to end up on the pod ***
backup.velero.io/backup-volumes: app-upload
labels:
app: app
spec:
containers:
- image: nginx:1.17-alpine
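The same change could also be applied without editing the manifest by hand. This is a minimal sketch, assuming a merge patch against the Deployment is acceptable in your setup; the helper names velero_annotation_patch and annotate_pod_template are hypothetical. It only uses the kubectl patch flags already shown above for the StorageClass.

```shell
# Build a merge patch that sets the Velero annotation on the pod template.
# $1 is the volume name (or comma-separated names), e.g. "app-upload".
velero_annotation_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"%s"}}}}}' "$1"
}

# Apply the patch to a Deployment. Changing the pod template triggers a rollout.
annotate_pod_template() {
  local namespace="$1" deployment="$2" volumes="$3"
  kubectl -n "$namespace" patch deployment "$deployment" \
    --type merge \
    --patch "$(velero_annotation_patch "$volumes")"
}
```

For the Deployment in the question this would be called as annotate_pod_template mekompas-tenant-production app app-upload. The new pods should then carry the annotation, which you can confirm with kubectl describe pod.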

A pod cannot connect to a service by IP

My ingress pod is having trouble reaching two ClusterIP services by IP. There are plenty of other ClusterIP services it has no trouble reaching, including ones in the same namespace. Another pod has no problem reaching the service (I tried the default backend in the same namespace and it was fine).
Where should I look? Here are my actual services, it cannot reach the first but can reach the second:
- apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2019-08-23T16:59:10Z"
labels:
app: pka-168-emtpy-id
app.kubernetes.io/instance: palletman-pka-168-emtpy-id
app.kubernetes.io/managed-by: Tiller
helm.sh/chart: pal-0.0.1
release: palletman-pka-168-emtpy-id
name: pka-168-emtpy-id
namespace: palletman
resourceVersion: "108574168"
selfLink: /api/v1/namespaces/palletman/services/pka-168-emtpy-id
uid: 539364f9-c5c7-11e9-8699-0af40ce7ce3a
spec:
clusterIP: 100.65.111.47
ports:
- port: 80
protocol: TCP
targetPort: 8080
selector:
app: pka-168-emtpy-id
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
- apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2019-03-05T19:57:26Z"
labels:
app: production
app.kubernetes.io/instance: palletman
app.kubernetes.io/managed-by: Tiller
helm.sh/chart: pal-0.0.1
release: palletman
name: production
namespace: palletman
resourceVersion: "81337664"
selfLink: /api/v1/namespaces/palletman/services/production
uid: e671c5e0-3f80-11e9-a1fc-0af40ce7ce3a
spec:
clusterIP: 100.65.82.246
ports:
- port: 80
protocol: TCP
targetPort: 8080
selector:
app: production
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
My ingress pod:
apiVersion: v1
kind: Pod
metadata:
annotations:
sumologic.com/format: text
sumologic.com/sourceCategory: 103308/CT/LI/kube_ingress
sumologic.com/sourceName: kube_ingress
creationTimestamp: "2019-08-21T19:34:48Z"
generateName: ingress-nginx-65877649c7-
labels:
app: ingress-nginx
k8s-addon: ingress-nginx.addons.k8s.io
pod-template-hash: "2143320573"
name: ingress-nginx-65877649c7-5npmp
namespace: kube-ingress
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: ingress-nginx-65877649c7
uid: 97db28a9-c43f-11e9-920a-0af40ce7ce3a
resourceVersion: "108278133"
selfLink: /api/v1/namespaces/kube-ingress/pods/ingress-nginx-65877649c7-5npmp
uid: bcd92d96-c44a-11e9-8699-0af40ce7ce3a
spec:
containers:
- args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/nginx-default-backend
- --configmap=$(POD_NAMESPACE)/ingress-nginx
- --publish-service=$(POD_NAMESPACE)/ingress-nginx
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.13
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: ingress-nginx
ports:
- containerPort: 80
name: http
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-dg5wn
readOnly: true
dnsPolicy: ClusterFirst
nodeName: ip-10-55-131-177.eu-west-1.compute.internal
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 60
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-dg5wn
secret:
defaultMode: 420
secretName: default-token-dg5wn
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-08-21T19:34:48Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2019-08-21T19:34:50Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2019-08-21T19:34:48Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://d597673f4f38392a52e9537e6dd2473438c62c2362a30e3d58bf8a98e177eb12
image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.13
imageID: docker-pullable://gcr.io/google_containers/nginx-ingress-controller@sha256:c9d2e67f8096d22564a6507794e1a591fbcb6461338fc655a015d76a06e8dbaa
lastState: {}
name: ingress-nginx
ready: true
restartCount: 0
state:
running:
startedAt: "2019-08-21T19:34:50Z"
hostIP: 10.55.131.177
phase: Running
podIP: 172.6.218.18
qosClass: BestEffort
startTime: "2019-08-21T19:34:48Z"
It could be connectivity to the node where your Pod is running (or something related to the network overlay). You can check where that pod is running:
$ kubectl get pod -o=json | jq '.items[0].spec.nodeName'
Check if the node is 'Ready':
$ kubectl get node <node-from-above>
If it's ready, then ssh into the node to further troubleshoot:
$ ssh <node-from-above>
Is your overlay pod running on the node? (Calico, Weave, CNI, etc)
You can troubleshoot further by connecting to the pod/container:
# From <node-from-above>
$ docker exec -it <container-id-in-pod> bash
# Check connectivity (ping, dig, curl, etc)
Alternatively, use the kubectl command line (if you have network connectivity to the node):
$ kubectl exec -it <pod-id> -c <container-name> -- bash
# Troubleshoot...
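The first two checks above can be scripted. This is a minimal sketch using kubectl's JSONPath output instead of jq; the function names pod_node, node_ready_status and check_pod_node are hypothetical, and it assumes your kubeconfig already points at the right cluster.

```shell
# Print the node a pod is scheduled on.
pod_node() {
  local namespace="$1" pod="$2"
  kubectl -n "$namespace" get pod "$pod" -o jsonpath='{.spec.nodeName}'
}

# Print the status of the node's Ready condition: True, False, or Unknown.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Combine both checks for one pod.
check_pod_node() {
  local node
  node="$(pod_node "$1" "$2")"
  echo "node=${node} ready=$(node_ready_status "$node")"
}
```

For the ingress pod in the question this would be check_pod_node kube-ingress ingress-nginx-65877649c7-5npmp. A Ready node does not rule out an overlay problem, so the manual checks from inside the node and the container still apply.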

Kubernetes CronJob in GKE stops scheduling the job after a few weeks

I have this YAML for a CronJob running in Google Kubernetes Engine:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
creationTimestamp: 2019-04-22T18:20:51Z
name: cron-field-velocity-field-details-manager
namespace: master
resourceVersion: "73643714"
selfLink: /apis/batch/v1beta1/namespaces/master/cronjobs/cron-field-velocity-field-details-manager
uid: 5be9e8d5-652b-11e9-bf91-42010a9600af
spec:
concurrencyPolicy: Forbid
failedJobsHistoryLimit: 1
jobTemplate:
metadata:
creationTimestamp: null
spec:
template:
metadata:
creationTimestamp: null
labels:
app: cron-field-velocity-field-details-manager
chart: field-velocity-field-details-manager-0.0.1
heritage: Tiller
release: master-field-velocity-field-details-manager
spec:
containers:
- args:
- ./field-velocity-field-details-manager.dll
command:
- dotnet
image: taranisag/field-velocity-field-details-manager:master.993b179
imagePullPolicy: IfNotPresent
name: cron-field-velocity-field-details-manager
resources:
requests:
cpu: "2"
memory: 2Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: regsecret
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
schedule: '* 2,14 * * *'
successfulJobsHistoryLimit: 3
suspend: false
status:
lastScheduleTime: 2019-06-20T02:00:00Z
It worked for a few weeks, meaning the job ran twice a day, but it stopped running a week ago.
There was no indication of an error, and the last run completed successfully.
Is there something I defined wrong in the YAML?

Kubernetes: Error kubectl edit deployment

I'm trying to edit a deployment in Kubernetes with:
kubectl -n <namespace> edit deployment <deployment_name>
After entering the command, a vi window for editing appears, and I make some changes, for example in the command section or in the volumeMounts section.
But I get the following error:
A copy of your changes has been stored to "/tmp/kubectl-edit-hv5dh.yaml"
error: map: map[] does not contain declared merge key: name
Can someone help with this?
Attached is the edited deployment file of the apiserver:
kubectl -n federation-system edit deployment apiserver
(the lines between ** ** are the ones I added)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
federation.alpha.kubernetes.io/federation-name: fed
creationTimestamp: 2018-04-01T13:26:40Z
generation: 1
labels:
app: federated-cluster
name: apiserver
namespace: federation-system
resourceVersion: "393140"
selfLink: /apis/extensions/v1beta1/namespaces/federation-system/deployments/apiserver
uid: <uid>
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: federated-cluster
module: federation-apiserver
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
annotations:
federation.alpha.kubernetes.io/federation-name: fed
creationTimestamp: null
labels:
app: federated-cluster
module: federation-apiserver
name: apiserver
spec:
containers:
- command:
- /fcp
- federation-apiserver
- --admission-control=NamespaceLifecycle
- --advertise-address=<master-ip>
- --bind-address=0.0.0.0
- --client-ca-file=/etc/federation/apiserver/ca.crt
- --etcd-servers=http://localhost:2379
- --secure-port=8443
- --tls-cert-file=/etc/federation/apiserver/server.crt
- --tls-private-key-file=/etc/federation/apiserver/server.key
**- --enable-admission-plugins=SchedulingPolicy
- --admission-control-config-file=/etc/kubernetes/admission/config.yml**
image: gcr.io/k8s-jkns-e2e-gce-federation/fcp-amd64:v1.9.0-alpha.3
imagePullPolicy: IfNotPresent
name: apiserver
ports:
- containerPort: 8443
name: https
protocol: TCP
- containerPort: 8080
name: local
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/federation/apiserver
name: apiserver-credentials
readOnly: true
**volumeMounts:
- mountPath: /etc/kubernetes/admission
name: admission-config**
- command:
- /usr/local/bin/etcd
- --data-dir
- /var/etcd/data
image: gcr.io/google_containers/etcd:3.1.10
imagePullPolicy: IfNotPresent
name: etcd
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
imagePullSecrets:
- {}
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- name: apiserver-credentials
secret:
defaultMode: 420
secretName: apiserver-credentials
**- name: admission-config
configMap:
name: admission**
status:
availableReplicas: 1
conditions:
- lastTransitionTime: 2018-04-01T13:26:40Z
lastUpdateTime: 2018-04-01T13:26:40Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: 2018-04-01T13:26:40Z
lastUpdateTime: 2018-04-01T13:27:20Z
message: ReplicaSet "apiserver-8484fd45f8" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
It happened after I created the ConfigMap file:
kubectl create -f scheduling-policy-admission.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: admission
namespace: federation-system
data:
config.yml: |
apiVersion: apiserver.k8s.io/v1alpha1
kind: AdmissionConfiguration
plugins:
- name: SchedulingPolicy
path: /etc/kubernetes/admission/scheduling-policy-config.yml
scheduling-policy-config.yml: |
kubeconfig: /etc/kubernetes/admission/opa-kubeconfig
opa-kubeconfig: |
clusters:
- name: opa-api
cluster:
server: http://opa.federation-system.svc.cluster.local:8181/v0/data/kubernetes/placement
users:
- name: scheduling-policy
user:
token: deadbeefsecret
contexts:
- name: default
context:
cluster: opa-api
user: scheduling-policy
current-context: default
I'm trying to configure Admission Controller in the Federation API.
Thanks,
dnsPolicy: ClusterFirst
# DELETE imagePullSecrets:
# DELETE - {}
restartPolicy: Always
I would strongly recommend removing that imagePullSecrets block. Since those objects have a mergeKey of name, but that object has no name, it could very easily cause the error you are experiencing. If the YAML was given to your editor in that condition, then I am almost certain that is a Kubernetes bug: it should always(?) allow round-tripping YAML via kubectl edit, if for no other reason than this situation right here.
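If kubectl edit keeps rejecting the change, one possible workaround is to remove the block with a JSON Patch, which is not a strategic merge patch and so does not rely on merge keys. This is a minimal sketch under that assumption; I have not verified it against this API server version, and the names REMOVE_PULL_SECRETS_PATCH and remove_image_pull_secrets are hypothetical. It removes the entire imagePullSecrets list, which is only appropriate when, as here, its single entry is the empty {}.

```shell
# JSON Patch that removes the whole imagePullSecrets list from the pod template.
REMOVE_PULL_SECRETS_PATCH='[{"op":"remove","path":"/spec/template/spec/imagePullSecrets"}]'

# Apply the patch to a Deployment in the given namespace.
remove_image_pull_secrets() {
  local namespace="$1" deployment="$2"
  kubectl -n "$namespace" patch deployment "$deployment" \
    --type json \
    --patch "$REMOVE_PULL_SECRETS_PATCH"
}
```

For the deployment in the question this would be remove_image_pull_secrets federation-system apiserver, after which kubectl edit can be retried.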

Kubernetes isn't allowing me to create a pod with securityContext runAsUser

Summary:
I have pods with security context runAsUser: 1337 that fail to start because they are disallowed by policy. I have altered admission-control without success (as suggested here and here).
What else do I need to force through this kind of security context?
Details
I'm working through the https://istio.io/docs/samples/bookinfo.html example to start porting over to istio.
I have a deployment named details-v1 (see below) from which a replica set and pod have been created. The pod is stuck in pending.
NAME READY STATUS RESTARTS AGE
details-v1-3207759430-nt9tt 0/2 Pending 0 34m
describe on the pod shows the cause of the error:
FailedValidation Error validating pod details-v1-3207759430-nt9tt.azs-master from api, ignoring: spec.initContainers[1].securityContext.privileged: Forbidden: disallowed by policy
In order to get this far, I have already made changes to the kube-apiserver:
/usr/local/bin/kube-apiserver \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota \
--allow-privileged=true \
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"creationTimestamp":null,"name":"details-v1","namespace":"azs-master"},"spec":{"replicas":1,"strategy":{},"template":{"metadata":{"annotations":{"alpha.istio.io/sidecar":"injected","alpha.istio.io/version":"jenkins#ubuntu-16-04-build-12ac793f80be71-0.1.6-dab2033","pod.beta.kubernetes.io/init-containers":"[{\"args\":[\"-p\",\"15001\",\"-u\",\"1337\"],\"image\":\"docker.io/istio/init:0.1\",\"imagePullPolicy\":\"Always\",\"name\":\"init\",\"securityContext\":{\"capabilities\":{\"add\":[\"NET_ADMIN\"]}}},{\"args\":[\"-c\",\"sysctl -w kernel.core_pattern=/tmp/core.%e.%p.%t \\u0026\\u0026 ulimit -c unlimited\"],\"command\":[\"/bin/sh\"],\"image\":\"alpine\",\"imagePullPolicy\":\"Always\",\"name\":\"enable-core-dump\",\"securityContext\":{\"privileged\":true}}]"},"creationTimestamp":null,"labels":{"app":"details","version":"v1"}},"spec":{"containers":[{"image":"istio/examples-bookinfo-details-v1","imagePullPolicy":"IfNotPresent","name":"details","ports":[{"containerPort":9080}],"resources":{}},{"args":["proxy","sidecar","-v","2"],"env":[{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"POD_IP","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}}],"image":"docker.io/istio/proxy_debug:0.1","imagePullPolicy":"Always","name":"proxy","resources":{},"securityContext":{"runAsUser":1337},"volumeMounts":[{"mountPath":"/etc/certs","name":"istio-certs","readOnly":true}]}],"volumes":[{"name":"istio-certs","secret":{"secretName":"istio.default"}}]}}},"status":{}}
creationTimestamp: 2017-06-23T13:30:00Z
generation: 1
labels:
app: details
version: v1
name: details-v1
namespace: azs-master
resourceVersion: "29678612"
selfLink: /apis/extensions/v1beta1/namespaces/azs-master/deployments/details-v1
uid: 0eacea4a-5818-11e7-af0e-0a55ca98bb17
spec:
replicas: 1
selector:
matchLabels:
app: details
version: v1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
annotations:
alpha.istio.io/sidecar: injected
alpha.istio.io/version: jenkins#ubuntu-16-04-build-12ac793f80be71-0.1.6-dab2033
pod.alpha.kubernetes.io/init-containers: '[{"name":"init","image":"docker.io/istio/init:0.1","args":["-p","15001","-u","1337"],"resources":{},"imagePullPolicy":"Always","securityContext":{"capabilities":{"add":["NET_ADMIN"]}}},{"name":"enable-core-dump","image":"alpine","command":["/bin/sh"],"args":["-c","sysctl
-w kernel.core_pattern=/tmp/core.%e.%p.%t \u0026\u0026 ulimit -c unlimited"],"resources":{},"imagePullPolicy":"Always","securityContext":{"privileged":true}}]'
pod.beta.kubernetes.io/init-containers: '[{"name":"init","image":"docker.io/istio/init:0.1","args":["-p","15001","-u","1337"],"resources":{},"imagePullPolicy":"Always","securityContext":{"capabilities":{"add":["NET_ADMIN"]}}},{"name":"enable-core-dump","image":"alpine","command":["/bin/sh"],"args":["-c","sysctl
-w kernel.core_pattern=/tmp/core.%e.%p.%t \u0026\u0026 ulimit -c unlimited"],"resources":{},"imagePullPolicy":"Always","securityContext":{"privileged":true}}]'
creationTimestamp: null
labels:
app: details
version: v1
spec:
containers:
- image: istio/examples-bookinfo-details-v1
imagePullPolicy: IfNotPresent
name: details
ports:
- containerPort: 9080
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
- args:
- proxy
- sidecar
- -v
- "2"
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
image: docker.io/istio/proxy_debug:0.1
imagePullPolicy: Always
name: proxy
resources: {}
securityContext:
runAsUser: 1337
terminationMessagePath: /dev/termination-log
volumeMounts:
- mountPath: /etc/certs
name: istio-certs
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- name: istio-certs
secret:
defaultMode: 420
secretName: istio.default
status:
conditions:
- lastTransitionTime: 2017-06-23T13:30:00Z
lastUpdateTime: 2017-06-23T13:30:00Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 1
replicas: 1
unavailableReplicas: 1
updatedReplicas: 1
Kubernetes server version: 1.5.6
The Pending state indicates that this was blocked by the Kubelet, which also needs the --allow-privileged flag.
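To confirm whether the kubelet on a node was started with that flag, something like the following could be run on the node. This is a minimal sketch, assuming a Linux node where the kubelet runs as a process named kubelet and pgrep is available; the function names has_allow_privileged and kubelet_cmdline are hypothetical.

```shell
# Succeed if a command line contains --allow-privileged or --allow-privileged=true.
has_allow_privileged() {
  case " $1 " in
    *" --allow-privileged=true "*|*" --allow-privileged "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the running kubelet's command line from /proc (Linux only).
kubelet_cmdline() {
  tr '\0' ' ' < "/proc/$(pgrep -o -x kubelet)/cmdline"
}
```

On the node, has_allow_privileged "$(kubelet_cmdline)" && echo enabled shows whether the flag is set. If it is missing, add it to the kubelet's startup arguments and restart the kubelet.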