I'm using K8S 1.14 and Helm 3.3.1.
I have an app which works when deployed without probes. Then I set two trivial probes:
livenessProbe:
  exec:
    command:
    - ls
    - /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  exec:
    command:
    - ls
    - /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
When I deploy via helm upgrade, the command eventually (~5 mins) fails with:
Error: UPGRADE FAILED: release my-app failed, and has been rolled back due to atomic being set: timed out waiting for the condition
But in the events log there is no trace of any probe:
5m21s Normal ScalingReplicaSet deployment/my-app Scaled up replica set my-app-7 to 1
5m21s Normal Scheduled pod/my-app-7-6 Successfully assigned default/my-app-7-6 to gke-foo-testing-foo-testing-node-po-111-r0cu
5m21s Normal LoadBalancerNegNotReady pod/my-app-7-6 Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-222-default-my-app-80-54]
5m21s Normal SuccessfulCreate replicaset/my-app-7 Created pod: my-app-7-6
5m20s Normal Pulling pod/my-app-7-6 Pulling image "my-registry/my-app:v0.1"
5m20s Normal Pulled pod/my-app-7-6 Successfully pulled image "my-registry/my-app:v0.1"
5m20s Normal Created pod/my-app-7-6 Created container my-app
5m20s Normal Started pod/my-app-7-6 Started container my-app
5m15s Normal Attach service/my-app Attach 1 network endpoint(s) (NEG "k8s1-222-default-my-app-80-54" in zone "europe-west3-a")
19s Normal ScalingReplicaSet deployment/my-app Scaled down replica set my-app-7 to 0
19s Normal SuccessfulDelete replicaset/my-app-7 Deleted pod: my-app-7-6
19s Normal Killing pod/my-app-7-6 Stopping container my-app
Hence the question: what are the probes doing and where?
Try deleting the release and then re-applying it: helm del --purge <APPNAME> (that's the Helm 2 syntax; on Helm 3 use helm uninstall <APPNAME>).
Also, which Helm version are you using? Try upgrading to at least v3.2.1; there's an open issue about failures like this after previously failed upgrades: https://github.com/helm/helm/issues/5939
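For reference, a minimal sketch of that delete-and-redeploy flow (my-app and ./chart are placeholders for your release and chart):
# Helm 2
helm del --purge my-app
# Helm 3
helm uninstall my-app
helm upgrade --install --atomic my-app ./chart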
I reproduced the same scenario here and everything went fine: the release was deployed and the pod is running. Did you check inside the container whether /mnt really exists?
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m41s Successfully assigned default/nginx-deployment2-5cdd568667-blsc7 to minikube
Normal Pulling 3m41s kubelet, minikube Pulling image "nginx"
Normal Pulled 3m38s kubelet, minikube Successfully pulled image "nginx" in 2.769840982s
Normal Created 3m38s kubelet, minikube Created container nginx
Normal Started 3m38s kubelet, minikube Started container nginx
NAME READY STATUS RESTARTS AGE
nginx-deployment2-5cdd568667-blsc7 1/1 Running 0 4m59s
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment2
spec:
  selector:
    matchLabels:
      app: ameba
  replicas: 1
  template:
    metadata:
      labels:
        app: ameba
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: nginx-port
        livenessProbe:
          exec:
            command:
            - ls
            - /mnt
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          exec:
            command:
            - ls
            - /mnt
          initialDelaySeconds: 5
          periodSeconds: 5
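If you want to rule out the obvious, you can also exec into the running pod and run the probe command by hand (pod name taken from your events):
kubectl exec -it my-app-7-6 -- ls /mnt
If that fails, both probes will fail for the same reason.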
I don't know if your image includes bash, but if you just want to verify that the directory exists, you can do the same thing with other shell commands. Try this:
livenessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - ls /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - ls /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
In bash you can also use the test built-in:
[[ -d /mnt ]] (the -d test verifies that the directory /mnt exists)
As an alternative, there is also the command stat:
stat /mnt
If you want to check whether the directory contains a specific file, use the complete path with the filename included.
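For example, a readiness probe that waits for a specific file could look like the sketch below; /mnt/ready is a hypothetical filename standing in for whatever your app creates:
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - test -f /mnt/ready # exit code 0 only once the file exists
  initialDelaySeconds: 5
  periodSeconds: 5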
Related
I'm using AWS EKS Fargate to deploy my work. After applying the deployment YAML file, everything goes well for the first 10 minutes, but after that I can no longer access the pod with kubectl exec <podname> -- bash. When I run kubectl describe pod <podname>, both the readinessProbe and the livenessProbe return messages like the ones below:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 16m fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 15m fargate-scheduler Successfully assigned k8s-fargate/k8s-api-5765846f76-d7nws to fargate-ip-10-0-130-250.ap-east-1.compute.internal
Normal Pulling 15m kubelet Pulling image "awsaccid.dkr.ecr.ap-east-1.amazonaws.com/k8s-api-test:1.0.0"
Normal Pulled 14m kubelet Successfully pulled image "awsaccid.dkr.ecr.ap-east-1.amazonaws.com/k8s-api-test:1.0.0" in 1m18.703187993s
Normal Created 14m kubelet Created container k8s-api
Normal Started 14m kubelet Started container k8s-api
Warning Unhealthy 2m18s kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "c2a2e9750a44684104a7e76a92bf7abe814ba29f306b092a48e17b90aab7f2dd": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: resource temporarily unavailable: unknown
Warning Unhealthy 2m13s kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "bcb60f638e2c364adc8694bc12f00660e2b0d7647d3861d3462727976d2df08c": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: resource temporarily unavailable: unknown
Warning Unhealthy 2m8s kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "988d29870b88fdcaae3cedf1071e79d2a786638c801364d71b6c7886f0be79e1": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: resource temporarily unavailable: unknown
Moreover, the livenessProbe hasn't restarted the pod even though it is unhealthy.
I spent a whole day on this but still failed to solve it. Does anyone know the problem? Thank you so much.
Here's my deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: k8s-fargate
  name: k8s-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-api
  template:
    metadata:
      labels:
        app: k8s-api
    spec:
      volumes:
      - name: k8s-properties
        configMap:
          name: k8s-properties
      containers:
      - name: k8s-api
        image: awsaccountid.dkr.ecr.ap-east-1.amazonaws.com/k8s-test:1.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8443
        resources:
          requests:
            memory: "1024Mi"
            cpu: "200m"
          limits:
            memory: "2500Mi"
            cpu: "1000m"
        volumeMounts:
        - name: k8s-properties
          mountPath: "/usr/local/folder"
          readOnly: false
        livenessProbe:
          exec:
            command:
            - cat
            - /usr/local/folder/file
          initialDelaySeconds: 5
          periodSeconds: 30
        readinessProbe:
          exec:
            command:
            - cat
            - /usr/local/folder/file
          initialDelaySeconds: 5
          periodSeconds: 5
Problem solved by creating a new Docker image. I still have no idea what the error was, but it likely comes from the container image itself.
I'm following the tutorial at https://docs.openfaas.com/tutorials/first-python-function/.
Currently, I have the right image:
$ docker images | grep hello-openfaas
wm/hello-openfaas latest bd08d01ce09b 34 minutes ago 65.2MB
$ faas-cli deploy -f ./hello-openfaas.yml
Deploying: hello-openfaas.
WARNING! You are not using an encrypted connection to the gateway, consider using HTTPS.
Deployed. 202 Accepted.
URL: http://IP:8099/function/hello-openfaas
There is a step that forewarns me to do some setup (in my case, I'm using Kubernetes and minikube and don't want to push to a remote container registry, so I should enable the use of images from the local library on Kubernetes), and I see the hint:
see the helm chart for how to set the ImagePullPolicy
I'm not sure how to configure it correctly; the final result indicates I failed.
Unsurprisingly, I couldn't access the function service. I found some clues in https://docs.openfaas.com/deployment/troubleshooting/#openfaas-didnt-start which might help to diagnose the problem.
$ kubectl logs -n openfaas-fn deploy/hello-openfaas
Error from server (BadRequest): container "hello-openfaas" in pod "hello-openfaas-558f99477f-wd697" is waiting to start: trying and failing to pull image
$ kubectl describe -n openfaas-fn deploy/hello-openfaas
Name: hello-openfaas
Namespace: openfaas-fn
CreationTimestamp: Wed, 16 Mar 2022 14:59:49 +0800
Labels: faas_function=hello-openfaas
Annotations: deployment.kubernetes.io/revision: 1
prometheus.io.scrape: false
Selector: faas_function=hello-openfaas
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 0 max unavailable, 1 max surge
Pod Template:
Labels: faas_function=hello-openfaas
Annotations: prometheus.io.scrape: false
Containers:
hello-openfaas:
Image: wm/hello-openfaas:latest
Port: 8080/TCP
Host Port: 0/TCP
Liveness: http-get http://:8080/_/health delay=2s timeout=1s period=2s #success=1 #failure=3
Readiness: http-get http://:8080/_/health delay=2s timeout=1s period=2s #success=1 #failure=3
Environment:
fprocess: python3 index.py
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: hello-openfaas-558f99477f (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 29m deployment-controller Scaled up replica set hello-openfaas-558f99477f to 1
hello-openfaas.yml
version: 1.0
provider:
  name: openfaas
  gateway: http://IP:8099
functions:
  hello-openfaas:
    lang: python3
    handler: ./hello-openfaas
    image: wm/hello-openfaas:latest
    imagePullPolicy: Never
I created a new project, hello-openfaas2, to reproduce this error:
$ faas-cli new --lang python3 hello-openfaas2 --prefix="wm"
Folder: hello-openfaas2 created.
# I add `imagePullPolicy: Never` to `hello-openfaas2.yml`
$ faas-cli build -f ./hello-openfaas2.yml
$ faas-cli deploy -f ./hello-openfaas2.yml
Deploying: hello-openfaas2.
WARNING! You are not using an encrypted connection to the gateway, consider using HTTPS.
Deployed. 202 Accepted.
URL: http://192.168.1.3:8099/function/hello-openfaas2
$ kubectl logs -n openfaas-fn deploy/hello-openfaas2
Error from server (BadRequest): container "hello-openfaas2" in pod "hello-openfaas2-7c67488865-7d7vm" is waiting to start: image can't be pulled
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-kp7vf 1/1 Running 0 47h
...
openfaas-fn env-6c79f7b946-bzbtm 1/1 Running 0 4h28m
openfaas-fn figlet-54db496f88-957xl 1/1 Running 0 18h
openfaas-fn hello-openfaas-547857b9d6-z277c 0/1 ImagePullBackOff 0 127m
openfaas-fn hello-openfaas-7b6946b4f9-hcvq4 0/1 ImagePullBackOff 0 165m
openfaas-fn hello-openfaas2-7c67488865-qmrkl 0/1 ImagePullBackOff 0 13m
openfaas-fn hello-openfaas3-65847b8b67-b94kd 0/1 ImagePullBackOff 0 97m
openfaas-fn hello-python-554b464498-zxcdv 0/1 ErrImagePull 0 3h23m
openfaas-fn hello-python-8698bc68bd-62gh9 0/1 ImagePullBackOff 0 3h25m
From https://docs.openfaas.com/reference/yaml/, I know I put imagePullPolicy in the wrong place; there is no such keyword in its schema.
I also tried eval $(minikube docker-env) and still get the same error.
I have a feeling that faas-cli deploy can be replaced by helm; both ultimately run the image (whether remote or local) in the Kubernetes cluster, so I could use the Helm chart to set the pull policy there. Even though the details are still not clear to me, this discovery inspires me.
So far, after eval $(minikube docker-env)
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
wm/hello-openfaas2 0.1 03c21bd96d5e About an hour ago 65.2MB
python 3-alpine 69fba17b9bae 12 days ago 48.6MB
ghcr.io/openfaas/figlet latest ca5eef0de441 2 weeks ago 14.8MB
ghcr.io/openfaas/alpine latest 35f3d4be6bb8 2 weeks ago 14.2MB
ghcr.io/openfaas/faas-netes 0.14.2 524b510505ec 3 weeks ago 77.3MB
k8s.gcr.io/kube-apiserver v1.23.3 f40be0088a83 7 weeks ago 135MB
k8s.gcr.io/kube-controller-manager v1.23.3 b07520cd7ab7 7 weeks ago 125MB
k8s.gcr.io/kube-scheduler v1.23.3 99a3486be4f2 7 weeks ago 53.5MB
k8s.gcr.io/kube-proxy v1.23.3 9b7cc9982109 7 weeks ago 112MB
ghcr.io/openfaas/gateway 0.21.3 ab4851262cd1 7 weeks ago 30.6MB
ghcr.io/openfaas/basic-auth 0.21.3 16e7168a17a3 7 weeks ago 14.3MB
k8s.gcr.io/etcd 3.5.1-0 25f8c7f3da61 4 months ago 293MB
ghcr.io/openfaas/classic-watchdog 0.2.0 6f97aa96da81 4 months ago 8.18MB
k8s.gcr.io/coredns/coredns v1.8.6 a4ca41631cc7 5 months ago 46.8MB
k8s.gcr.io/pause 3.6 6270bb605e12 6 months ago 683kB
ghcr.io/openfaas/queue-worker 0.12.2 56e7216201bc 7 months ago 7.97MB
kubernetesui/dashboard v2.3.1 e1482a24335a 9 months ago 220MB
kubernetesui/metrics-scraper v1.0.7 7801cfc6d5c0 9 months ago 34.4MB
nats-streaming 0.22.0 12f2d32e0c9a 9 months ago 19.8MB
gcr.io/k8s-minikube/storage-provisioner v5 6e38f40d628d 11 months ago 31.5MB
functions/markdown-render latest 93b5da182216 2 years ago 24.6MB
functions/hubstats latest 01affa91e9e4 2 years ago 29.3MB
functions/nodeinfo latest 2fe8a87bf79c 2 years ago 71.4MB
functions/alpine latest 46c6f6d74471 2 years ago 21.5MB
prom/prometheus v2.11.0 b97ed892eb23 2 years ago 126MB
prom/alertmanager v0.18.0 ce3c87f17369 2 years ago 51.9MB
alexellis2/openfaas-colorization 0.4.1 d36b67b1b5c1 2 years ago 1.84GB
rorpage/text-to-speech latest 5dc20810eb54 2 years ago 86.9MB
stefanprodan/faas-grafana 4.6.3 2a4bd9caea50 4 years ago 284MB
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-kp7vf 1/1 Running 0 6d
kube-system etcd-minikube 1/1 Running 0 6d
kube-system kube-apiserver-minikube 1/1 Running 0 6d
kube-system kube-controller-manager-minikube 1/1 Running 0 6d
kube-system kube-proxy-5m8lr 1/1 Running 0 6d
kube-system kube-scheduler-minikube 1/1 Running 0 6d
kube-system storage-provisioner 1/1 Running 1 (6d ago) 6d
kubernetes-dashboard dashboard-metrics-scraper-58549894f-97tsv 1/1 Running 0 5d7h
kubernetes-dashboard kubernetes-dashboard-ccd587f44-lkwcx 1/1 Running 0 5d7h
openfaas-fn base64-6bdbcdb64c-djz8f 1/1 Running 0 5d1h
openfaas-fn colorise-85c74c686b-2fz66 1/1 Running 0 4d5h
openfaas-fn echoit-5d7df6684c-k6ljn 1/1 Running 0 5d1h
openfaas-fn env-6c79f7b946-bzbtm 1/1 Running 0 4d5h
openfaas-fn figlet-54db496f88-957xl 1/1 Running 0 4d19h
openfaas-fn hello-openfaas-547857b9d6-z277c 0/1 ImagePullBackOff 0 4d3h
openfaas-fn hello-openfaas-7b6946b4f9-hcvq4 0/1 ImagePullBackOff 0 4d3h
openfaas-fn hello-openfaas2-5c6f6cb5d9-24hkz 0/1 ImagePullBackOff 0 9m22s
openfaas-fn hello-openfaas2-8957bb47b-7cgjg 0/1 ImagePullBackOff 0 2d22h
openfaas-fn hello-openfaas3-65847b8b67-b94kd 0/1 ImagePullBackOff 0 4d2h
openfaas-fn hello-python-6d6976845f-cwsln 0/1 ImagePullBackOff 0 3d19h
openfaas-fn hello-python-b577cb8dc-64wf5 0/1 ImagePullBackOff 0 3d9h
openfaas-fn hubstats-b6cd4dccc-z8tvl 1/1 Running 0 5d1h
openfaas-fn markdown-68f69f47c8-w5m47 1/1 Running 0 5d1h
openfaas-fn nodeinfo-d48cbbfcc-hfj79 1/1 Running 0 5d1h
openfaas-fn openfaas2-fun 1/1 Running 0 15s
openfaas-fn text-to-speech-74ffcdfd7-997t4 0/1 CrashLoopBackOff 2235 (3s ago) 4d5h
openfaas-fn wordcount-6489865566-cvfzr 1/1 Running 0 5d1h
openfaas alertmanager-88449c789-fq2rg 1/1 Running 0 3d1h
openfaas basic-auth-plugin-75fd7d69c5-zw4jh 1/1 Running 0 3d2h
openfaas gateway-5c4bb7c5d7-n8h27 2/2 Running 0 3d2h
openfaas grafana 1/1 Running 0 4d8h
openfaas nats-647b476664-hkr7p 1/1 Running 0 3d2h
openfaas prometheus-687648749f-tl8jp 1/1 Running 0 3d1h
openfaas queue-worker-7777ffd7f6-htx6t 1/1 Running 0 3d2h
$ kubectl get -o yaml -n openfaas-fn deploy/hello-openfaas2
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "6"
    prometheus.io.scrape: "false"
  creationTimestamp: "2022-03-17T12:47:35Z"
  generation: 6
  labels:
    faas_function: hello-openfaas2
  name: hello-openfaas2
  namespace: openfaas-fn
  resourceVersion: "400833"
  uid: 9c4e9d26-23af-4f93-8538-4e2d96f0d7e0
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      faas_function: hello-openfaas2
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io.scrape: "false"
      creationTimestamp: null
      labels:
        faas_function: hello-openfaas2
        uid: "969512830"
      name: hello-openfaas2
    spec:
      containers:
      - env:
        - name: fprocess
          value: python3 index.py
        image: wm/hello-openfaas2:0.1
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /_/health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 2
          successThreshold: 1
          timeoutSeconds: 1
        name: hello-openfaas2
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /_/health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 2
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: false
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2022-03-17T12:47:35Z"
    lastUpdateTime: "2022-03-17T12:47:35Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2022-03-20T12:16:56Z"
    lastUpdateTime: "2022-03-20T12:16:56Z"
    message: ReplicaSet "hello-openfaas2-5d6c7c7fb4" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 6
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 1
In one shell,
docker@minikube:~$ docker run --name wm -ti wm/hello-openfaas2:0.1
2022/03/20 13:04:52 Version: 0.2.0 SHA: 56bf6aac54deb3863a690f5fc03a2a38e7d9e6ef
2022/03/20 13:04:52 Timeouts: read: 5s write: 5s hard: 0s health: 5s.
2022/03/20 13:04:52 Listening on port: 8080
...
and in another shell:
docker@minikube:~$ docker ps | grep wm
d7796286641c wm/hello-openfaas2:0.1 "fwatchdog" 3 minutes ago Up 3 minutes (healthy) 8080/tcp wm
When you specify an image without a registry URL, it defaults to Docker Hub. When you use the :latest tag, the imagePullPolicy defaults to Always, so Kubernetes will try to pull the newest image every time the pod starts unless you explicitly override the policy.
So to use locally built images, don't use the latest tag.
To make minikube pull images from your local machine, you need to do a few things:
Point your docker client to the VM's docker daemon: eval $(minikube docker-env)
Configure image pull policy: imagePullPolicy: Never
There is a flag to pass in to use insecure registries in minikube VM. This must be specified when you create the machine: minikube start --insecure-registry
Note that you have to run eval $(minikube docker-env) in each terminal you want to use, since it only sets the environment variables for the current shell session.
This flow works:
# Start minikube and set docker env
minikube start
eval $(minikube docker-env)
# Build image
docker build -t foo:1.0 .
# Run in minikube
kubectl run hello-foo --image=foo:1.0 --image-pull-policy=Never
You can read more at the minikube docs.
If your image has a latest tag, the Pod's ImagePullPolicy will be automatically set to Always. Each time the pod is created, Kubernetes tries to pull the newest image.
Try not tagging the image as latest, or manually set the Pod's imagePullPolicy to Never.
If you're using a static manifest to create a Pod, the setting will look like the following:
containers:
- name: test-container
  image: testImage:latest
  imagePullPolicy: Never
From the comments on the initial post, I gathered that:
The issue is that the container runtime of your Minikube cluster is distinct from that of your host, where you built your function image (not always the case: minikube can run with the docker driver, which, I think, implies the host docker runtime is shared with the cluster).
The container runtime in use by Minikube is docker (it could have been cri-o, in which case the following steps won't apply; those using cri-o may switch to docker, as I'm not sure image loading is possible with cri-o).
You can try to build your function image from a shell inside your Minikube instance.
Or you can:
export your image (docker save -o image.tar my/image)
copy this to your minikube instance (scp -i ~/.minikube/machines/minikube/id_rsa image.tar docker@$(minikube ip):)
open a shell (ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip))
load that image (docker load -i image.tar)
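Recent minikube releases also bundle this whole export/copy/load sequence into one command (treat availability as version-dependent; check minikube image --help):
minikube image load wm/hello-openfaas2:0.1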
Then, make sure your openfaas was deployed with faasnetes.imagePullPolicy=Never or IfNotPresent, as I doubt setting the imagePullPolicy directly in your function would work (I haven't read about this in their docs, which instead mention, as you pointed out, overriding this during the openfaas deployment). Checking your deployment's YAML definition (kubectl get -o yaml -n openfaas-fn deploy/hello-openfaas) should confirm you're not using Always. If that's already the case, no need to dig further: just make sure your image is imported, with name and tag matching those referenced by your function.
... Answering your last comment: you're not sure how openfaas was deployed. One way to make sure the proper option was set is to look at the gateway deployment in the openfaas namespace (kubectl get -o yaml -n openfaas deploy/gateway).
In there, you should find a container named "operator". That container should include a few environment variables, one of which may be image_pull_policy (we can see this by looking at the chart sources). You want that environment variable set to IfNotPresent; add it or edit it if needed.
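As a rough sketch only (the exact layout is an assumption based on the chart, not verified against your cluster, and the image tag is a placeholder), the edited fragment of the gateway Deployment could look like:
      containers:
      - name: operator
        image: ghcr.io/openfaas/faas-netes:<tag>
        env:
        - name: image_pull_policy
          value: IfNotPresent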
Checking your last edit, we can see the Deployment object created by your function says:
image: wm/hello-openfaas2:0.1
imagePullPolicy: Always
So for sure: you do need to reconfigure openfaas, adding that image_pull_policy environment variable.
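If openfaas was installed with Helm, a hedged way to apply that (assuming the chart repo was added as openfaas, the release is named openfaas in the openfaas namespace, and faasnetes.imagePullPolicy is the value the chart maps to that environment variable):
helm upgrade openfaas openfaas/openfaas \
  --namespace openfaas \
  --reuse-values \
  --set faasnetes.imagePullPolicy=IfNotPresent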
I had successfully created a custom Kafka connector image containing Confluent Hub connectors.
I am trying to create a pod and a service to launch it in GCP with Kubernetes.
How should I configure the YAML file? I took the next part of the code from the quick-start guide. This is what I've tried:
Dockerfile:
FROM confluentinc/cp-kafka-connect-base:latest
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components,/usr/share/java/kafka-connect-jdbc"
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.2.6
RUN confluent-hub install --no-prompt debezium/debezium-connector-mysql:1.7.1
RUN confluent-hub install --no-prompt debezium/debezium-connector-postgresql:1.7.1
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-oracle-cdc:1.5.0
RUN wget -O /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib/mysql-connector-java-8.0.26.jar https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar
Modified part of confluent-platform.yaml:
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 1
  image:
    application: maxprimeaery/kafka-connect-jdbc:latest #confluentinc/cp-server-connect:7.0.1
    init: confluentinc/confluent-init-container:2.2.0-1
  configOverrides:
    server:
    - config.storage.replication.factor=1
    - offset.storage.replication.factor=1
    - status.storage.replication.factor=1
  podTemplate:
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
    probe:
      liveness:
        periodSeconds: 10
        failureThreshold: 5
        timeoutSeconds: 500
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
And this is the error I get in the console for the connect-0 pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45m default-scheduler Successfully assigned confluent/connect-0 to gke-my-kafka-cluster-default-pool-6ee97fb9-fh9w
Normal Pulling 45m kubelet Pulling image "confluentinc/confluent-init-container:2.2.0-1"
Normal Pulled 45m kubelet Successfully pulled image "confluentinc/confluent-init-container:2.2.0-1" in 17.447881861s
Normal Created 45m kubelet Created container config-init-container
Normal Started 45m kubelet Started container config-init-container
Normal Pulling 45m kubelet Pulling image "maxprimeaery/kafka-connect-jdbc:latest"
Normal Pulled 44m kubelet Successfully pulled image "maxprimeaery/kafka-connect-jdbc:latest" in 23.387676944s
Normal Created 44m kubelet Created container connect
Normal Started 44m kubelet Started container connect
Warning Unhealthy 41m (x5 over 42m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 41m kubelet Container connect failed liveness probe, will be restarted
Warning Unhealthy 5m (x111 over 43m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 404
Warning BackOff 17s (x53 over 22m) kubelet Back-off restarting failed container
Should I create a separate pod and service for the custom Kafka connector, or do I have to configure the code above?
UPDATE to my question
I've found out how to configure it in Kubernetes by adding this to the Connect pod:
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-server-connect:7.0.1
    init: confluentinc/confluent-init-container:2.2.0-1
  configOverrides:
    server:
    - config.storage.replication.factor=1
    - offset.storage.replication.factor=1
    - status.storage.replication.factor=1
  build:
    type: onDemand
    onDemand:
      plugins:
        locationType: confluentHub
        confluentHub:
        - name: kafka-connect-jdbc
          owner: confluentinc
          version: 10.2.6
        - name: kafka-connect-oracle-cdc
          owner: confluentinc
          version: 1.5.0
        - name: debezium-connector-mysql
          owner: debezium
          version: 1.7.1
        - name: debezium-connector-postgresql
          owner: debezium
          version: 1.7.1
      storageLimit: 4Gi
  podTemplate:
    resources:
      requests:
        cpu: 200m
        memory: 1024Mi
    probe:
      liveness:
        periodSeconds: 180 #DONT CHANGE THIS
        failureThreshold: 5
        timeoutSeconds: 500
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
But I still can't add the MySQL connector from the Maven repo.
I also tried making a new Docker image, but it doesn't work. I also tried this new part of the code:
locationType: url #NOT WORKING. NO IDEA HOW TO CONFIGURE THAT
url:
- name: mysql-connector-java
  archivePath: https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar
  checksum: sha512sum #definitely wrong
After some retries I found out that I just had to wait a little bit longer.
probe:
  liveness:
    periodSeconds: 180 #DONT CHANGE THIS
    failureThreshold: 5
    timeoutSeconds: 500
The periodSeconds: 180 setting gives the pod more time to become Running, and then I can just use my own image:
image:
  application: maxprimeaery/kafka-connect-jdbc:5.0
  init: confluentinc/confluent-init-container:2.2.0-1
And the build part can be removed after those changes.
Is there a way to request the status of a readinessProbe by using a service name linked to a deployment? In an initContainer, for example?
Imagine we have a deployment X using a readinessProbe, and a service linked to it so we can request, for example, http://service-X:8080.
Now we create a deployment Y; in its initContainer we want to know if deployment X is ready. Is there a way to ask for something like deployment-X.ready or service-X.ready?
I know that the correct way to handle dependencies is to let Kubernetes do it for us, but I have a container which doesn't crash, and I have no control over it...
You can add an nginx proxy sidecar to deployment Y.
Point deployment Y's initContainer readinessProbe at a port on nginx, and have that port proxied to deployment Y's readinessProbe.
Instead of a readinessProbe, you can just use an initContainer.
You create pod/deployment X, make service X, and create an initContainer that looks for service X.
If it finds it, it lets the pod start.
If it doesn't, it keeps looking until service X is created.
Just a simple example, we create nginx deployment by using kubectl apply -f nginx.yaml.
nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
Then we create the initContainer.
initContainer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup my-nginx; do echo waiting for myapp-pod2; sleep 2; done;']
The initContainer will look for the my-nginx service; until you create it, the pod will stay in Init:0/1 status.
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/1 0 15m
After you add the service, for example by using kubectl expose deployment/my-nginx, the initContainer will find the my-nginx service and the pod will be created.
NAME READY STATUS RESTARTS AGE
myapp-pod 1/1 Running 0 35m
Result:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/myapp-pod to kubeadm2
Normal Pulled 20s kubelet, kubeadm2 Container image "busybox:1.28" already present on machine
Normal Created 20s kubelet, kubeadm2 Created container init-myservice
Normal Started 20s kubelet, kubeadm2 Started container init-myservice
Normal Pulled 20s kubelet, kubeadm2 Container image "busybox:1.28" already present on machine
Normal Created 20s kubelet, kubeadm2 Created container myapp-container
Normal Started 20s kubelet, kubeadm2 Started container myapp-container
Let me know if that answers your question.
I finally found a solution by following this link:
https://blog.giantswarm.io/wait-for-it-using-readiness-probes-for-service-dependencies-in-kubernetes/
We first need to create a ServiceAccount in Kubernetes to allow listing endpoints from an initContainer. After this, we ask for the available endpoints; if there is at least one, the dependency is ready (in my case).
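A minimal sketch of that pattern, assuming deployment X is exposed through a service named service-x in the same namespace (all names and the kubectl image are placeholders):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: endpoint-reader
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoint-reader
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpoint-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader
subjects:
- kind: ServiceAccount
  name: endpoint-reader
Then, in deployment Y's pod spec, the initContainer waits until the service has at least one ready endpoint:
serviceAccountName: endpoint-reader
initContainers:
- name: wait-for-x
  image: bitnami/kubectl:latest # any image that ships kubectl works
  command:
  - sh
  - -c
  - until [ -n "$(kubectl get endpoints service-x -o jsonpath='{.subsets[*].addresses[*].ip}')" ]; do echo waiting for service-x; sleep 2; done
Endpoints only list an address once its pod passes its readinessProbe, which is exactly the signal we were after.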
I'm trying to figure out a way to restart a container on failure and NOT remove it and create a new container in its place. It would be a plus to be able to retry restarting it, say, 3 times, and then stop the pod.
I have a statefulset that looks like this (I removed some insignificant parts):
apiVersion: "apps/v1beta1"
kind: StatefulSet
metadata:
name: cassandra-stateful
spec:
serviceName: cassandra
replicas: 1
template:
metadata:
labels:
app: cassandra-stateful
spec:
# Only one Cassandra node should exist for one Kubernetes node.
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- cassandra
topologyKey: "kubernetes.io/hostname"
containers:
- name: cassandra
image: localrepo/cassandra-kube
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
lifecycle:
preStop:
exec:
command: ["pkill java && while ps -p 1 > /dev/null; do sleep 1; done"]
The reason I know it's recreating the containers is that I'm purposefully killing my process with:
pkill java && while ps -p 1 > /dev/null; do sleep 1; done
If I do a describe on the pod, I can see it recreates the container instead of restarting it:
$ kubectl describe po cassandra-stateful-0
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
11m 11m 1 default-scheduler Normal Scheduled Successfully assigned cassandra-stateful-0 to node-136-225-226-236
11m 11m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Created Created container with id cf5bbdc2989e231cdad4bb16dd26ad55b9a016200842cc3b2a3915f3d618737f
11m 11m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Started Started container with id cf5bbdc2989e231cdad4bb16dd26ad55b9a016200842cc3b2a3915f3d618737f
4m 4m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Created Created container with id fb4869eb91313512dc56608a6ef3d24590c88234a0ef453cd7c16dcf625e1f37
4m 4m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Started Started container with id fb4869eb91313512dc56608a6ef3d24590c88234a0ef453cd7c16dcf625e1f37
Is there any rule that makes this possible?