I'm running into a missing resources issue when submitting a Workflow. The Kubernetes namespace my-namespace has a quota enabled, and for whatever reason the pods being created after submitting the workflow are failing with:
pods "hello" is forbidden: failed quota: team: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
I'm submitting the following Workflow:
apiVersion: "argoproj.io/v1alpha1"
kind: "Workflow"
metadata:
name: "hello"
namespace: "my-namespace"
spec:
entrypoint: "main"
templates:
- name: "main"
container:
image: "docker/whalesay"
resources:
requests:
memory: 0
cpu: 0
limits:
memory: "128Mi"
cpu: "250m"
Argo is running on Kubernetes 1.19.6 and was deployed with the official Helm chart version 0.16.10. Here are my Helm values:
controller:
  workflowNamespaces:
    - "my-namespace"
  resources:
    requests:
      memory: 0
      cpu: 0
    limits:
      memory: 500Mi
      cpu: 0.5
  pdb:
    enabled: true
  # See https://argoproj.github.io/argo-workflows/workflow-executors/
  # docker container runtime is not present in the TKGI clusters
  containerRuntimeExecutor: "k8sapi"
workflow:
  namespace: "my-namespace"
  serviceAccount:
    create: true
  rbac:
    create: true
server:
  replicas: 2
  secure: false
  resources:
    requests:
      memory: 0
      cpu: 0
    limits:
      memory: 500Mi
      cpu: 0.5
  pdb:
    enabled: true
executer:
  resources:
    requests:
      memory: 0
      cpu: 0
    limits:
      memory: 500Mi
      cpu: 0.5
Any ideas on what I may be missing? Thanks, Weldon
Update 1: I tried another namespace without quotas enabled and got past the missing resources issue. However, I now see: Failed to establish pod watch: timed out waiting for the condition. Here's what the spec looks like for this pod. You can see the wait container has no resources set; this is the container causing the issue reported in this question.
spec:
  containers:
    - command:
        - argoexec
        - wait
      env:
        - name: ARGO_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
          value: k8sapi
      image: argoproj/argoexec:v2.12.5
      imagePullPolicy: IfNotPresent
      name: wait
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /argo/podmetadata
          name: podmetadata
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: default-token-v4jlb
          readOnly: true
    - image: docker/whalesay
      imagePullPolicy: Always
      name: main
      resources:
        limits:
          cpu: 250m
          memory: 128Mi
        requests:
          cpu: "0"
          memory: "0"
Try deploying the workflow in another namespace if you can, and verify whether it works there. If possible, also try removing the quota from that namespace.
Instead of (or alongside) the quota you can also use a LimitRange:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: 250m
      defaultRequest:
        cpu: 50m
        memory: 64Mi
      type: Container
With this in place, any container that does not specify its own resource requests and limits gets these defaults: requests of 50m CPU and 64Mi memory, and limits of 250m CPU and 512Mi memory.
https://kubernetes.io/docs/concepts/policy/limit-range/
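For example, with that LimitRange applied in the namespace, a container admitted with resources: {} (like the wait container above) ends up with roughly the following effective resources, derived from the LimitRange defaults:

resources:
  requests:
    cpu: 50m        # from defaultRequest
    memory: 64Mi    # from defaultRequest
  limits:
    cpu: 250m       # from default
    memory: 512Mi   # from default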
Related
My Jenkins agent is deployed on Kubernetes; here is the agent YAML:
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    jenkins: slave
    cluster: dev-monitor-platform
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: jenkins
                operator: In
                values:
                  - ci
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0
  containers:
    - name: slave-docker
      image: harbor.mycompany.net/jenkins/docker:19.03-git
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 50m
          memory: 256Mi
        limits:
          cpu: 100m
          memory: 512Mi
      securityContext:
        privileged: true
      command:
        - cat
      tty: true
      volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-sock
        - mountPath: /root/.m2
          name: jenkins-maven-m2
        - mountPath: /home/jenkins/
          name: workspace-volume
          readOnly: false
    - name: jnlp
      image: harbor.mycompany.net/jenkins/inbound-agent:alpine-jdk11
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 50m
          memory: 256Mi
        limits:
          cpu: 100m
          memory: 512Mi
      volumeMounts:
        - mountPath: /home/jenkins/
          name: workspace-volume
          readOnly: false
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
    - name: workspace-volume
      emptyDir: {}
    - name: jenkins-maven-m2
      nfs:
        path: /export/mid-devops/jenkins/m2
        server: xxx.xxx.xxx.xxx
The master itself pulls code quickly.
However, when the agent pulls the pipeline code, it always gets stuck for about one and a half minutes.
When the agent pulls the application code, it is fast.
This happens every time, and I have no idea why.
I expect the agent not to get stuck when checking out code.
I have a deployment with the volumes and limits shown below.
The problem is that Kubernetes rejects the pod with this error:
pods "app-app-96d5dc969-2g6zp" is forbidden:
exceeded quota: general-resourcequota, requested: limits.ephemeral-storage=1280Mi,
used: limits.ephemeral-storage=0, limited: limits.ephemeral-storage=1Gi
As I understand it, the nodes have a 1Gi limit for ephemeral storage, but what is 1280Mi?
Is it correct that Kubernetes allocates some amount of ephemeral storage for each volume?
...
spec:
  containers:
    - resources:
        limits:
          cpu: 1
          memory: 3Gi
          ephemeral-storage: "1Gi"
        requests:
          cpu: 1
          memory: 3Gi
          ephemeral-storage: "1Gi"
      volumeMounts:
        - name: app-logs
          mountPath: /app/log
        - name: app-tmp
          mountPath: /tmp
        - name: app-backups
          mountPath: /app/backups
        - name: app-logback
          mountPath: /app/config/logback.xml
          subPath: logback.xml
        - name: app-mdc
          mountPath: /app/config/mdc.properties
          subPath: mdc.properties
  volumes:
    - name: app-logs
      emptyDir: {}
    - name: app-tmp
      emptyDir: {}
    - name: app-backups
      emptyDir: {}
    - name: app-logback
      configMap:
        name: "{{ include "app.fullname" . }}-app-logback"
    - name: app-mdc
      configMap:
        name: "{{ include "app.fullname" . }}-app-mdc"
Resource quotas for the namespace:
kubectl describe quota
Name:                      general-resourcequota
Namespace:                 app
Resource                   Used  Hard
--------                   ----  ----
configmaps                 5     15
limits.cpu                 0     4
limits.ephemeral-storage   0     1Gi
limits.memory              0     8Gi
pods                       0     10
requests.cpu               0     2
requests.memory            0     4Gi
requests.storage           0     50Gi
services                   1     20
services.loadbalancers     1     5
services.nodeports         2     5
Your namespace has a quota that caps limits.ephemeral-storage at 1Gi:
limits.ephemeral-storage   0   1Gi
The message says that admitting your deployment would exceed that limit, taking the namespace to 1280Mi (1.25Gi).
Reduce your limit, for example to 700Mi, to stay within the 1Gi cap and your pod will be scheduled. Note that the quota aggregates resource consumption across the whole namespace, not per pod.
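For instance, lowering the limit on the app container along these lines would fit under the quota, provided whatever else contributes to the 1280Mi total is also accounted for (the 700Mi figure is just the suggestion above, adjust to your needs):

resources:
  limits:
    cpu: 1
    memory: 3Gi
    ephemeral-storage: "700Mi"   # was 1Gi; keeps the namespace under the 1Gi limits.ephemeral-storage quota
  requests:
    cpu: 1
    memory: 3Gi
    ephemeral-storage: "700Mi"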
You need to check the resource quota set at the namespace level where you are running the pod.
When I want to run the following YAML file, I get the following error:
error: error parsing pod2.yaml: error converting YAML to JSON: yaml: line 8: mapping values are not allowed in this context
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    name: wp
     image: wordpress
     resources:
       requests:
         memory: "64Mi"
         cpu: "250m"
       limits:
         memory: "128Mi"
         cpu: "500m"
You need to fix the indentation, and containers is a list:
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: wp
      image: wordpress
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
Trying to deploy heapster-controller to get Heapster + Grafana + InfluxDB working on Kubernetes. Getting error messages while trying to deploy using the heapster-controller.yaml file:
heapster-controller.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.1.0-beta1
  namespace: kube-system
  labels:
    k8s-app: heapster
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: heapster
  template:
    metadata:
      labels:
        k8s-app: heapster
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
        - image: gcr.io/google_containers/heapster:v1.1.0-beta1
          name: heapster
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 200m
            requests:
              cpu: 100m
              memory: 200m
          command:
            - /heapster
            - --source=kubernetes.summary_api:''
            - --sink=influxdb:http://monitoring-influxdb:8086
            - --metric_resolution=60s
        - image: gcr.io/google_containers/heapster:v1.1.0-beta1
          name: eventer
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 200m
            requests:
              cpu: 100m
              memory: 200m
          command:
            - /eventer
            - --source=kubernetes:''
            - --sink=influxdb:http://monitoring-influxdb:8086
        - image: gcr.io/google_containers/addon-resizer:1.0
          name: heapster-nanny
          resources:
            limits:
              cpu: 50m
              memory: 100Mi
            requests:
              cpu: 50m
              memory: 100Mi
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - /pod_nanny
            - --cpu=100m
            - --extra-cpu=0m
            - --memory=200
            - --extra-memory=200Mi
            - --threshold=5
            - --deployment=heapster-v1.1.0-beta1
            - --container=heapster
            - --poll-period=300000
        - image: gcr.io/google_containers/addon-resizer:1.0
          name: eventer-nanny
          resources:
            limits:
              cpu: 50m
              memory: 100Mi
            requests:
              cpu: 50m
              memory: 100Mi
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - /pod_nanny
            - --cpu=100m
            - --extra-cpu=0m
            - --memory=200
            - --extra-memory=200Ki
            - --threshold=5
            - --deployment=heapster-v1.1.0-beta1
            - --container=eventer
            - --poll-period=300000
Deployment goes through, but then I get an error:
[root@node236 influxdb]# kubectl get pods -o wide --namespace=kube-system
NAME                                     READY   STATUS              RESTARTS   AGE   NODE
heapster-v1.1.0-beta1-3082378092-t6inb   2/4     RunContainerError   0          1m    node262.local.net
[root@node236 influxdb]#
Display the log for the failed container:
[root@node236 influxdb]# kubectl logs --namespace=kube-system heapster-v1.1.0-beta1-3082378092-t6inb
Error from server: a container name must be specified for pod heapster-v1.1.0-beta1-3082378092-t6inb, choose one of: [heapster eventer heapster-nanny eventer-nanny]
[root@node236 influxdb]#
Where am I possibly going wrong?
Any feedback appreciated!!
Alex
The correct syntax is kubectl logs <pod> <container>.
In your example, kubectl logs heapster-v1.1.0-beta1-3082378092-t6inb heapster --namespace=kube-system will show the logs of the "heapster" container within the named pod.
Thanks a lot for the feedback. I think my problem lies around tls-certs. Need to dig deeper.
Thanks so much yet again!!
My pods cannot resolve the external world (for example, for mail). How can I add the Google nameservers to the cluster? For info, the host resolves external names without problems and has nameservers configured.
The problem was that the liveness check made SkyDNS fail, so I changed it as below.
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v10
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v10
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v10
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v10
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
        - name: etcd
          image: gcr.io/google_containers/etcd:2.0.9
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
          command:
            - /usr/local/bin/etcd
            - -data-dir
            - /var/etcd/data
            - -listen-client-urls
            - http://127.0.0.1:2379,http://127.0.0.1:4001
            - -advertise-client-urls
            - http://127.0.0.1:2379,http://127.0.0.1:4001
            - -initial-cluster-token
            - skydns-etcd
          volumeMounts:
            - name: etcd-storage
              mountPath: /var/etcd/data
        - name: kube2sky
          image: gcr.io/google_containers/kube2sky:1.12
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
          args:
            # command = "/kube2sky"
            - --domain=cluster.local
        - name: skydns
          image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
          args:
            # command = "/skydns"
            - -machines=http://127.0.0.1:4001
            - -addr=0.0.0.0:53
            - -ns-rotate=false
            - -domain=cluster.local.
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 15
          #readinessProbe:
          #  httpGet:
          #    path: /healthz
          #    port: 8080
          #    scheme: HTTP
          #  initialDelaySeconds: 1
          #  timeoutSeconds: 5
        - name: healthz
          image: gcr.io/google_containers/exechealthz:1.0
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 10m
              memory: 20Mi
            requests:
              cpu: 10m
              memory: 20Mi
          args:
            - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
            - -port=8080
          ports:
            - containerPort: 8080
              protocol: TCP
      volumes:
        - name: etcd-storage
          emptyDir: {}
      dnsPolicy: Default  # Don't use cluster DNS.
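As for adding the Google nameservers: in this SkyDNS-based setup, one option is SkyDNS's upstream forwarding flag. A sketch of the skydns container args with it added (verify that -nameservers is supported by this skydns image version before relying on it):

          args:
            - -machines=http://127.0.0.1:4001
            - -addr=0.0.0.0:53
            - -ns-rotate=false
            - -domain=cluster.local.
            # assumption: forward non-cluster queries to the Google public resolvers
            - -nameservers=8.8.8.8:53,8.8.4.4:53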