$ kubectl version --short
Client Version: v1.20.2
Server Version: v1.19.6-eks-49a6c0
I have the following Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stats-service
  namespace: my-system
  labels:
    app: stats-service
spec:
  selector:
    matchLabels:
      app: stats-service
  template:
    metadata:
      labels:
        app: stats-service
    spec:
      containers:
        - name: stats-service
          image: 0123456789.dkr.ecr.us-east-1.amazonaws.com/stats-service:3.12.1
          resources:
            requests:
              memory: "1024m"
              cpu: "512m"
            limits:
              memory: "2048m"
              cpu: "1024m"
          ports:
            - name: http
              containerPort: 5000
              protocol: TCP
          startupProbe:
            httpGet:
              path: /manage/health
              port: 5000
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /manage/health
              port: 5000
            failureThreshold: 3
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /manage/health
              port: 5000
            failureThreshold: 6
            periodSeconds: 10
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: test
            - name: JAVA_OPTS
              value: "my_java_opts"
When I apply it I get the following warning, and the Pod never gets created. What does it mean and how do I resolve it? In my case, I'm running an EKS Fargate (only) cluster. Thanks!
$ kubectl describe pod stats-service-797784dfd5-tvh84
...
Warning FailedCreatePodSandBox 12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"read init-p: connection reset by peer\"": unknown
NOTES:
The warning seems to be related to the spec.template.spec.containers.resources.limits block; if I remove that block, the Pod gets created.
A lot of the solutions I read online say to reset Docker, which obviously doesn't apply in my case.
You are using the wrong notation for your resources. As per the Kubernetes documentation on the Meaning of memory:
Limits and requests for memory are measured in bytes. You can express
memory as a plain integer or as a fixed-point number using one of
these suffixes: E, P, T, G, M, K. You can also use the power-of-two
equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
If you want:
Requests: 1 GB RAM and 0.5 vCPU/core
Limits: 2 GB RAM and 1 vCPU/core
This should work:
resources:
  requests:
    memory: "1G"
    cpu: "0.5"
  limits:
    memory: "2G"
    cpu: "1"
The following is equivalent, but using different notations:
resources:
  requests:
    memory: "1024M"
    cpu: "500m"
  limits:
    memory: "2048M"
    cpu: "1000m"
Notice that the above example uses M for memory, not m: a lowercase m is the milli suffix, so "1024m" amounts to roughly one byte of memory, which is far too little for the container sandbox to start.
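After switching to the correct suffixes, a quick way to confirm the fix (assuming the manifest is saved locally as deployment.yaml; adjust the filename and labels to your setup):
# Re-apply the corrected manifest and watch the Pod come up
kubectl apply -f deployment.yaml
kubectl -n my-system get pods -l app=stats-service --watch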
Related
I'm trying to deploy Metabase on my GKE cluster, but I get Readiness probe failed.
When I build and run it locally, a request to localhost:3000/api/health returns status 200, but on k8s it doesn't work.
Dockerfile (I created my own to build and push to my GitLab registry):
FROM metabase/metabase:v0.41.6
EXPOSE 3000
CMD ["/app/run_metabase.sh" ]
my deployment.yaml
apiVersion: apps/v1  # extensions/v1beta1 is deprecated; Deployment now lives in apps/v1
kind: Deployment
metadata:
  name: metaba-dev
spec:
  selector:
    matchLabels:
      app: metaba-dev
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
      maxSurge: 100%
  template:
    metadata:
      labels:
        app: metaba-dev
    spec:
      restartPolicy: Always
      imagePullSecrets:
        - name: gitlab-login
      containers:
        - name: metaba-dev
          image: registry.gitlab.com/team/metabase:dev-{{BUILD_NUMBER}}
          command: ["/app/run_metabase.sh"]
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 10
          imagePullPolicy: Always
          ports:
            - name: metaba-dev-port
              containerPort: 3000
      terminationGracePeriodSeconds: 90
I got this error from
kubectl describe pod metaba-dev
Warning Unhealthy 61s (x3 over 81s) kubelet Readiness probe failed: Get "http://10.207.128.197:3000/api/health": dial tcp 10.207.128.197:3000: connect: connection refused
Warning Unhealthy 61s (x3 over 81s) kubelet Liveness probe failed: Get "http://10.207.128.197:3000/api/health": dial tcp 10.207.128.197:3000: connect: connection refused
kubectl logs
Picked up JAVA_TOOL_OPTIONS: -Xmx1g -Xms1g -Xmx1g
Warning: environ value jdk-11.0.13+8 for key :java-version has been overwritten with 11.0.13
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-01-28 15:32:23,966 INFO metabase.util :: Maximum memory available to JVM: 989.9 MB
2022-01-28 15:33:09,703 INFO util.encryption :: Saved credentials encryption is ENABLED for this Metabase instance. 🔐
For more information, see https://metabase.com/docs/latest/operations-guide/encrypting-database-details-at-rest.html
Here is the solution:
I increased initialDelaySeconds to 1200 and checked the logs. The cause was a network issue: my pod could not connect to the database. I had not seen that cause earlier because the pod kept restarting, so each time I looked I was reading a fresh log.
Try changing your initialDelaySeconds from 60 to 100.
You should also always set resource requests and limits on your container to help avoid probe failures: when your app starts hitting its CPU limit, Kubernetes throttles the container, which can make the probe endpoint too slow to answer in time. For example:
containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "256Mi"  # limits must be >= requests; the original 100Mi limit would be rejected
        cpu: "500m"
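For reference, a sketch of the adjusted probe section of the container (the paths, port, and period mirror the manifest in the question; the 100-second delay is the value suggested above, so tune it to your actual startup time):
readinessProbe:
  httpGet:
    path: /api/health
    port: 3000
  initialDelaySeconds: 100  # give Metabase time to finish its startup and migrations before the first check
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  initialDelaySeconds: 100
  periodSeconds: 10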
I'm running into a missing resources issue when submitting a Workflow. The Kubernetes namespace my-namespace has a quota enabled, and for whatever reason the pods being created after submitting the workflow are failing with:
pods "hello" is forbidden: failed quota: team: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
I'm submitting the following Workflow,
apiVersion: "argoproj.io/v1alpha1"
kind: "Workflow"
metadata:
name: "hello"
namespace: "my-namespace"
spec:
entrypoint: "main"
templates:
- name: "main"
container:
image: "docker/whalesay"
resources:
requests:
memory: 0
cpu: 0
limits:
memory: "128Mi"
cpu: "250m"
Argo is running on Kubernetes 1.19.6 and was deployed with the official Helm chart version 0.16.10. Here are my Helm values:
controller:
  workflowNamespaces:
    - "my-namespace"
  resources:
    requests:
      memory: 0
      cpu: 0
    limits:
      memory: 500Mi
      cpu: 0.5
  pdb:
    enabled: true
  # See https://argoproj.github.io/argo-workflows/workflow-executors/
  # docker container runtime is not present in the TKGI clusters
  containerRuntimeExecutor: "k8sapi"
workflow:
  namespace: "my-namespace"
  serviceAccount:
    create: true
  rbac:
    create: true
server:
  replicas: 2
  secure: false
  resources:
    requests:
      memory: 0
      cpu: 0
    limits:
      memory: 500Mi
      cpu: 0.5
  pdb:
    enabled: true
executer:
  resources:
    requests:
      memory: 0
      cpu: 0
    limits:
      memory: 500Mi
      cpu: 0.5
Any ideas on what I may be missing? Thanks, Weldon
Update 1: I tried another namespace without quotas enabled and got past the missing resources issue. However, I now see: Failed to establish pod watch: timed out waiting for the condition. Here's what the spec looks like for this pod; you can see the wait container is missing resources. This is the container causing the issue reported in this question.
spec:
  containers:
    - command:
        - argoexec
        - wait
      env:
        - name: ARGO_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
          value: k8sapi
      image: argoproj/argoexec:v2.12.5
      imagePullPolicy: IfNotPresent
      name: wait
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /argo/podmetadata
          name: podmetadata
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: default-token-v4jlb
          readOnly: true
    - image: docker/whalesay
      imagePullPolicy: Always
      name: main
      resources:
        limits:
          cpu: 250m
          memory: 128Mi
        requests:
          cpu: "0"
          memory: "0"
Try deploying the workflow in another namespace if you can, and verify whether it works.
If possible, try removing the quota for the respective namespace.
Instead of a quota, you can also use a LimitRange:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: 250m
      defaultRequest:
        cpu: 50m
        memory: 64Mi
      type: Container
With this in place, any container that does not specify resource requests or limits gets these defaults: a request of 50m CPU and 64Mi memory, and a limit of 250m CPU and 512Mi memory.
https://kubernetes.io/docs/concepts/policy/limit-range/
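To apply and verify the defaults, something like the following should work (limit-range.yaml is just an assumed filename for the manifest above, and <pod-name> is a placeholder):
kubectl apply -n my-namespace -f limit-range.yaml
kubectl describe limitrange default-limit-range -n my-namespace
# Any new pod created without resources now gets the defaults injected; check with:
kubectl get pod <pod-name> -n my-namespace -o jsonpath='{.spec.containers[*].resources}'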
I am trying to add a file to a pod's disk during initialization of the pod, but without luck. Below is the deployment file I use to deploy the pod. The file gets downloaded to the persistent volume, but the pod never reaches the ready state. After a few seconds, the pod fails and gets rebuilt, which kicks off the whole process again.
Any help would be appreciated.
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: mapserver
spec:
  selector:
    matchLabels:
      app: mapserver
  template:
    metadata:
      labels:
        app: mapserver
    spec:
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: mapserver-pv-claim
      containers:
        - name: maptiles
          image: klokantech/tileserver-gl
          command: ["/bin/sh"]
          args:
            - -c
            - |
              echo "[INFO] Startingcontainer"; if [ $(DOWNLOAD_MBTILES) = "true" ]; then
                echo "[INFO] Download MBTILES_PLANET_URL";
                rm /data/*
                cd /data/
                curl -k -sSL -X GET -u user:ww $(MBTILES_PLANET_URL) -O
                echo "[INFO] Download finished";
              fi;
          env:
            - name: MBTILES_PLANET_URL
              value: 'https://abc-dev/nexus/repository/xyz-raw/2017-07-03_europe_netherlands.mbtiles'
            - name: DOWNLOAD_MBTILES
              value: 'true'
          livenessProbe:
            failureThreshold: 120
            httpGet:
              path: /health
              port: 80
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 5
          ports:
            - containerPort: 80
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 120
            httpGet:
              path: /health
              port: 80
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 300m
              memory: 3Gi
            requests:
              cpu: 100m
              memory: 1Gi
          volumeMounts:
            - mountPath: "/data"
              name: storage
I am trying to add a file to a pod's disk during initialization of the pod but without luck.
In that case you might want to use InitContainers instead.
Judging from your manifest, your main command runs (downloads the file) and then exits, which terminates the container (and the accompanying pod) in the process. The Deployment then restarts the exited pod and the cycle repeats. If you instead prepopulate the data with an init container (using the same definition and the same PV you currently give the main container), the init container runs to completion, and your normal container, whose command/entrypoint should be a non-exiting main process, can then use the data; see the sketch below.
Note: if you don't want to use init containers, or just as a quick test, you could append a regular non-exiting command after your download step. Also check whether you need to start the container with a tty, depending on your use case and how you want to keep the container running.
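A minimal sketch of that approach, assuming the tileserver-gl image's default entrypoint starts the tile server and serves from /data (the init container name is illustrative; only the relevant parts of the pod template are shown, and probes, resources, and the rest stay as in your manifest):
spec:
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: mapserver-pv-claim
  initContainers:
    - name: download-mbtiles          # runs to completion before the main container starts
      image: klokantech/tileserver-gl
      command: ["/bin/sh", "-c"]
      args:
        - |
          if [ "$(DOWNLOAD_MBTILES)" = "true" ]; then
            rm -f /data/*
            cd /data/
            curl -k -sSL -X GET -u user:ww "$(MBTILES_PLANET_URL)" -O
          fi
      env:
        - name: MBTILES_PLANET_URL
          value: 'https://abc-dev/nexus/repository/xyz-raw/2017-07-03_europe_netherlands.mbtiles'
        - name: DOWNLOAD_MBTILES
          value: 'true'
      volumeMounts:
        - mountPath: /data
          name: storage
  containers:
    - name: maptiles
      image: klokantech/tileserver-gl  # no command/args override, so the image's own server keeps running
      volumeMounts:
        - mountPath: /data
          name: storage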
When running a deployment I get downtime: requests fail after a variable amount of time (20-40 seconds).
The readiness check for the entry container fails once the preStop hook sends SIGUSR1; the hook then waits 31 seconds before sending SIGTERM. In that timeframe the pod should be removed from the service, since the readiness check is set to fail after 2 failed attempts at 5-second intervals.
How can I see the events for pods being added and removed from the service to find out what's causing this?
And events around the readiness checks themselves?
I use Google Container Engine version 1.2.2 and use GCE's network load balancer.
service:
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
    - name: https
      port: 443
      targetPort: https
      protocol: TCP
  selector:
    app: myapp
deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: 1.0.0-61--66-6
    spec:
      containers:
        - name: myapp
          image: ****
          resources:
            limits:
              cpu: 100m
              memory: 250Mi
            requests:
              cpu: 10m
              memory: 125Mi
          ports:
            - name: http-direct
              containerPort: 5000
          livenessProbe:
            httpGet:
              path: /status
              port: 5000
            initialDelaySeconds: 30
            timeoutSeconds: 1
          lifecycle:
            preStop:
              exec:
                # SIGTERM triggers a quick exit; gracefully terminate instead
                command: ["sleep 31;"]
        - name: haproxy
          image: travix/haproxy:1.6.2-r0
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 25Mi
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
          env:
            - name: "SSL_CERTIFICATE_NAME"
              value: "ssl.pem"
            - name: "OFFLOAD_TO_PORT"
              value: "5000"
            - name: "HEALT_CHECK_PATH"
              value: "/status"
          volumeMounts:
            - name: ssl-certificate
              mountPath: /etc/ssl/private
          livenessProbe:
            httpGet:
              path: /status
              port: 443
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
          readinessProbe:
            httpGet:
              path: /readiness
              port: 81
            initialDelaySeconds: 0
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          lifecycle:
            preStop:
              exec:
                # SIGTERM triggers a quick exit; gracefully terminate instead
                command: ["kill -USR1 1; sleep 31; kill 1"]
      volumes:
        - name: ssl-certificate
          secret:
            secretName: ssl-c324c2a587ee-20160331
When a probe fails, the prober emits a Warning event with reason Unhealthy and a message such as xx probe errored: xxx.
You should be able to find those events using either kubectl get events or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6 (filtering the pods by their labels).
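For example (the label and service names below are the ones from your manifests; adjust them to match your rollout):
# Watch pods being added to / removed from the service's endpoints
kubectl get endpoints myapp --watch
# Stream cluster events and look for Unhealthy / probe-related warnings during the rollout
kubectl get events --watch
# Inspect the Events section of the pods behind the service
kubectl describe pods -l app=myapp,version=1.0.0-61--66-6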
My pods cannot resolve names in the outside world (for example, for mail). How can I add Google's nameservers to the cluster? For the record, the host resolves them without a problem and has nameservers configured.
The problem was that the liveness check was making skydns fail; I changed it as shown below.
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v10
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v10
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v10
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v10
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
        - name: etcd
          image: gcr.io/google_containers/etcd:2.0.9
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
          command:
            - /usr/local/bin/etcd
            - -data-dir
            - /var/etcd/data
            - -listen-client-urls
            - http://127.0.0.1:2379,http://127.0.0.1:4001
            - -advertise-client-urls
            - http://127.0.0.1:2379,http://127.0.0.1:4001
            - -initial-cluster-token
            - skydns-etcd
          volumeMounts:
            - name: etcd-storage
              mountPath: /var/etcd/data
        - name: kube2sky
          image: gcr.io/google_containers/kube2sky:1.12
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
          args:
            # command = "/kube2sky"
            - --domain=cluster.local
        - name: skydns
          image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
          args:
            # command = "/skydns"
            - -machines=http://127.0.0.1:4001
            - -addr=0.0.0.0:53
            - -ns-rotate=false
            - -domain=cluster.local.
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 15
          #readinessProbe:
          #  httpGet:
          #    path: /healthz
          #    port: 8080
          #    scheme: HTTP
          #  initialDelaySeconds: 1
          #  timeoutSeconds: 5
        - name: healthz
          image: gcr.io/google_containers/exechealthz:1.0
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 10m
              memory: 20Mi
            requests:
              cpu: 10m
              memory: 20Mi
          args:
            - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
            - -port=8080
          ports:
            - containerPort: 8080
              protocol: TCP
      volumes:
        - name: etcd-storage
          emptyDir: {}
      dnsPolicy: Default  # Don't use cluster DNS.
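Once the DNS pod is healthy again, external resolution can be verified from inside the cluster, for example with a throwaway test pod (syntax for reasonably recent kubectl versions; busybox is just an example image, and <pod-name> is a placeholder):
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup google.com
# or from any existing pod that has nslookup available:
kubectl exec -it <pod-name> -- nslookup google.com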