Creating a pod and service for a custom Kafka Connect image with Kubernetes

I have successfully created a custom Kafka Connect image containing Confluent Hub connectors, and I am trying to create a pod and service to launch it on GCP with Kubernetes.
How should I configure the YAML file? I took the next part of the code from the quick-start guide. This is what I've tried:
Dockerfile:
# Start from the Connect base image and install connectors from Confluent Hub
FROM confluentinc/cp-kafka-connect-base:latest
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components,/usr/share/java/kafka-connect-jdbc"
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.2.6
RUN confluent-hub install --no-prompt debezium/debezium-connector-mysql:1.7.1
RUN confluent-hub install --no-prompt debezium/debezium-connector-postgresql:1.7.1
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-oracle-cdc:1.5.0
# The JDBC connector does not ship the MySQL driver, so fetch it from Maven Central
RUN wget -O /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib/mysql-connector-java-8.0.26.jar https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar
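For completeness, the image referenced in the YAML below is assumed to have been built from this Dockerfile and pushed to Docker Hub, roughly like this:
docker build -t maxprimeaery/kafka-connect-jdbc:latest .
docker push maxprimeaery/kafka-connect-jdbc:latest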
Modified part of confluent-platform.yaml:
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 1
  image:
    application: maxprimeaery/kafka-connect-jdbc:latest # confluentinc/cp-server-connect:7.0.1
    init: confluentinc/confluent-init-container:2.2.0-1
  configOverrides:
    server:
      - config.storage.replication.factor=1
      - offset.storage.replication.factor=1
      - status.storage.replication.factor=1
  podTemplate:
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
    probe:
      liveness:
        periodSeconds: 10
        failureThreshold: 5
        timeoutSeconds: 500
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
And this is the error I get in the console for the connect-0 pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45m default-scheduler Successfully assigned confluent/connect-0 to gke-my-kafka-cluster-default-pool-6ee97fb9-fh9w
Normal Pulling 45m kubelet Pulling image "confluentinc/confluent-init-container:2.2.0-1"
Normal Pulled 45m kubelet Successfully pulled image "confluentinc/confluent-init-container:2.2.0-1" in 17.447881861s
Normal Created 45m kubelet Created container config-init-container
Normal Started 45m kubelet Started container config-init-container
Normal Pulling 45m kubelet Pulling image "maxprimeaery/kafka-connect-jdbc:latest"
Normal Pulled 44m kubelet Successfully pulled image "maxprimeaery/kafka-connect-jdbc:latest" in 23.387676944s
Normal Created 44m kubelet Created container connect
Normal Started 44m kubelet Started container connect
Warning Unhealthy 41m (x5 over 42m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 41m kubelet Container connect failed liveness probe, will be restarted
Warning Unhealthy 5m (x111 over 43m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 404
Warning BackOff 17s (x53 over 22m) kubelet Back-off restarting failed container
Should I create a separate pod and service for the custom Kafka connector, or do I have to configure the code above?
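For debugging, the failing checks are HTTP probes against the Connect container, so its logs and REST endpoint can also be inspected directly while the pod is starting (a sketch, assuming Kafka Connect's default REST port 8083 and that curl is available in the image):
kubectl logs connect-0 -n confluent
kubectl exec -n confluent connect-0 -- curl -s http://localhost:8083/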
UPDATE to my question
I've found out how to configure it in Kubernetes by adding this to the Connect resource:
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: confluent
spec:
  replicas: 1
  image:
    application: confluentinc/cp-server-connect:7.0.1
    init: confluentinc/confluent-init-container:2.2.0-1
  configOverrides:
    server:
      - config.storage.replication.factor=1
      - offset.storage.replication.factor=1
      - status.storage.replication.factor=1
  build:
    type: onDemand
    onDemand:
      plugins:
        locationType: confluentHub
        confluentHub:
          - name: kafka-connect-jdbc
            owner: confluentinc
            version: 10.2.6
          - name: kafka-connect-oracle-cdc
            owner: confluentinc
            version: 1.5.0
          - name: debezium-connector-mysql
            owner: debezium
            version: 1.7.1
          - name: debezium-connector-postgresql
            owner: debezium
            version: 1.7.1
      storageLimit: 4Gi
  podTemplate:
    resources:
      requests:
        cpu: 200m
        memory: 1024Mi
    probe:
      liveness:
        periodSeconds: 180 # DON'T CHANGE THIS
        failureThreshold: 5
        timeoutSeconds: 500
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
But I still can't add the MySQL connector (the JDBC driver) from the Maven repo.
I also tried making a new Docker image, but it doesn't work. I also tried this new bit of config:
locationType: url # NOT WORKING. NO IDEA HOW TO CONFIGURE THAT
url:
  - name: mysql-connector-java
    archivePath: https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar
    checksum: sha512sum # definitely wrong
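Presumably the checksum field expects the actual SHA-512 digest of the archive rather than the name of the tool; a sketch of computing it locally (the hex string printed by the last command is what would go into checksum):
wget -q https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar
sha512sum mysql-connector-java-8.0.26.jar   # prints "<digest>  mysql-connector-java-8.0.26.jar"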

After some retries I found out that I just had to wait a little bit longer.
probe:
  liveness:
    periodSeconds: 180 # DON'T CHANGE THIS
    failureThreshold: 5
    timeoutSeconds: 500
This part, periodSeconds: 180, gives the pod more time to become Running, and then I can just use my own image:
image:
  application: maxprimeaery/kafka-connect-jdbc:5.0
  init: confluentinc/confluent-init-container:2.2.0-1
And the build part can be removed after those changes.

Related

K8s EKS Error - Readiness probe errored: rpc error

I'm using AWS EKS Fargate to deploy my work. After applying the deployment YAML file, everything goes well for the first 10 minutes, but after that I can no longer access the pod with kubectl exec <podname> -- bash. When typing kubectl describe pod <podname>, both the readinessProbe and livenessProbe return messages similar to the ones below:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 16m fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 15m fargate-scheduler Successfully assigned k8s-fargate/k8s-api-5765846f76-d7nws to fargate-ip-10-0-130-250.ap-east-1.compute.internal
Normal Pulling 15m kubelet Pulling image "awsaccid.dkr.ecr.ap-east-1.amazonaws.com/k8s-api-test:1.0.0"
Normal Pulled 14m kubelet Successfully pulled image "awsaccid.dkr.ecr.ap-east-1.amazonaws.com/k8s-api-test:1.0.0" in 1m18.703187993s
Normal Created 14m kubelet Created container k8s-api
Normal Started 14m kubelet Started container k8s-api
Warning Unhealthy 2m18s kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "c2a2e9750a44684104a7e76a92bf7abe814ba29f306b092a48e17b90aab7f2dd": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: resource temporarily unavailable: unknown
Warning Unhealthy 2m13s kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "bcb60f638e2c364adc8694bc12f00660e2b0d7647d3861d3462727976d2df08c": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: resource temporarily unavailable: unknown
Warning Unhealthy 2m8s kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "988d29870b88fdcaae3cedf1071e79d2a786638c801364d71b6c7886f0be79e1": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: resource temporarily unavailable: unknown
Moreover, the livenessProbe hasn't restarted the pod even though it is unhealthy.
I spent a whole day on this but still failed to solve it. Does anyone know the problem? Thank you so much.
Here's my deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: k8s-fargate
  name: k8s-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-api
  template:
    metadata:
      labels:
        app: k8s-api
    spec:
      volumes:
        - name: k8s-properties
          configMap:
            name: k8s-properties
      containers:
        - name: k8s-api
          image: awsaccountid.dkr.ecr.ap-east-1.amazonaws.com/k8s-test:1.0.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8443
          resources:
            requests:
              memory: "1024Mi"
              cpu: "200m"
            limits:
              memory: "2500Mi"
              cpu: "1000m"
          volumeMounts:
            - name: k8s-properties
              mountPath: "/usr/local/folder"
              readOnly: false
          livenessProbe:
            exec:
              command:
                - cat
                - /usr/local/folder/file
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            exec:
              command:
                - cat
                - /usr/local/folder/file
            initialDelaySeconds: 5
            periodSeconds: 5
The problem was solved by creating a new Docker image. I still have no idea what caused the error, but the problem likely came from the container image itself.
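As a quick sanity check before rebuilding an image in a case like this, the exact command the probes run can be tried by hand against a live pod (a sketch using the names from the deployment above):
kubectl -n k8s-fargate exec deploy/k8s-api -- cat /usr/local/folder/file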

Metrics server is currently unable to handle the request

I am new to Kubernetes and was trying to apply horizontal pod autoscaling to my existing application. After following other Stack Overflow answers, I learned that I need to install metrics-server, and I was able to, but somehow it's not working and is unable to handle requests.
I followed a few more suggestions but am still unable to resolve the issue. I will really appreciate any help here.
Please let me know of any further details you need to help me :) Thanks in advance.
Steps followed:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
kubectl get deploy,svc -n kube-system | egrep metrics-server
deployment.apps/metrics-server 1/1 1 1 2m6s
service/metrics-server ClusterIP 10.32.0.32 <none> 443/TCP 2m6s
kubectl get pods -n kube-system | grep metrics-server
metrics-server-64cf6869bd-6gx88 1/1 Running 0 2m39s
vi ana_hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: ana-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: common-services-auth
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 160
k apply -f ana_hpa.yaml
horizontalpodautoscaler.autoscaling/ana-hpa created
k get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
ana-hpa StatefulSet/common-services-auth <unknown>/160%, <unknown>/80% 1 10 0 4s
k describe hpa ana-hpa
Name: ana-hpa
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 12 Apr 2022 17:01:25 +0530
Reference: StatefulSet/common-services-auth
Metrics: ( current / target )
resource memory on pods (as a percentage of request): <unknown> / 160%
resource cpu on pods (as a percentage of request): <unknown> / 80%
Min replicas: 1
Max replicas: 10
StatefulSet pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 38s (x8 over 2m23s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Warning FailedComputeMetricsReplicas 38s (x8 over 2m23s) horizontal-pod-autoscaler invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Warning FailedGetResourceMetric 23s (x9 over 2m23s) horizontal-pod-autoscaler failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
kubectl get --raw /apis/metrics.k8s.io/v1beta1
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl edit deployments.apps -n kube-system metrics-server
Add hostNetwork: true
deployment.apps/metrics-server edited
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
k describe pod metrics-server-5dc6dbdb8-42hw9 -n kube-system
Name: metrics-server-5dc6dbdb8-42hw9
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: pusntyn196.apac.avaya.com/10.133.85.196
Start Time: Tue, 12 Apr 2022 17:08:25 +0530
Labels: k8s-app=metrics-server
pod-template-hash=5dc6dbdb8
Annotations: <none>
Status: Running
IP: 10.133.85.196
IPs:
IP: 10.133.85.196
Controlled By: ReplicaSet/metrics-server-5dc6dbdb8
Containers:
metrics-server:
Container ID: containerd://024afb1998dce4c0bd5f4e58f996068ea37982bd501b54fda2ef8d5c1098b4f4
Image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
Image ID: k8s.gcr.io/metrics-server/metrics-server#sha256:5ddc6458eb95f5c70bd13fdab90cbd7d6ad1066e5b528ad1dcb28b76c5fb2f00
Port: 4443/TCP
Host Port: 4443/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Running
Started: Tue, 12 Apr 2022 17:08:26 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g6p4g (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-g6p4g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 2s
node.kubernetes.io/unreachable:NoExecute op=Exists for 2s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m31s default-scheduler Successfully assigned kube-system/metrics-server-5dc6dbdb8-42hw9 to pusntyn196.apac.avaya.com
Normal Pulled 2m32s kubelet Container image "k8s.gcr.io/metrics-server/metrics-server:v0.6.1" already present on machine
Normal Created 2m31s kubelet Created container metrics-server
Normal Started 2m31s kubelet Started container metrics-server
kubectl get --raw /apis/metrics.k8s.io/v1beta1
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
kubectl logs -f metrics-server-5dc6dbdb8-42hw9 -n kube-system
E0412 11:43:54.684784 1 configmap_cafile_content.go:242] kube-system/extension-apiserver-authentication failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
E0412 11:44:27.001010 1 configmap_cafile_content.go:242] key failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
k logs -f metrics-server-5dc6dbdb8-42hw9 -n kube-system
I0412 11:38:26.447305 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0412 11:38:26.899459 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0412 11:38:26.899477 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0412 11:38:26.899518 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0412 11:38:26.899545 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0412 11:38:26.899546 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0412 11:38:26.899567 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0412 11:38:26.900480 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0412 11:38:26.900811 1 secure_serving.go:266] Serving securely on [::]:4443
I0412 11:38:26.900854 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0412 11:38:26.900965 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0412 11:38:26.999960 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0412 11:38:26.999989 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0412 11:38:26.999970 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0412 11:38:27.000087 1 configmap_cafile_content.go:242] kube-system/extension-apiserver-authentication failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
E0412 11:38:27.000118 1 configmap_cafile_content.go:242] key failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Edit metrics server deployment yaml
Add - --kubelet-insecure-tls
k apply -f metric-server-deployment.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
I also tried adding the following to the metrics-server deployment:
    command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP
This can easily be resolved by editing the deployment YAML and adding hostNetwork: true after dnsPolicy: ClusterFirst:
kubectl edit deployments.apps -n kube-system metrics-server
insert:
hostNetwork: true
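In context it ends up roughly like this inside the pod template (a sketch; only the relevant fields are shown):
spec:
  template:
    spec:
      dnsPolicy: ClusterFirst
      hostNetwork: true   # added line
      containers:
      - name: metrics-server
        # ... rest unchanged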
I hope this helps somebody with a bare-metal cluster:
$ helm --repo https://kubernetes-sigs.github.io/metrics-server/ --kubeconfig=$HOME/.kube/loc-cluster.config -n kube-system --set args='{--kubelet-insecure-tls}' upgrade --install metrics-server metrics-server
$ helm --kubeconfig=$HOME/.kube/loc-cluster.config -n kube-system uninstall metrics-server
Update: I deployed the metrics-server using the same command. Perhaps you can start fresh by removing existing resources and running:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
=======================================================================
It appears the --kubelet-insecure-tls flag was not configured correctly for the pod template in the deployment. The following should fix this:
Edit the existing deployment in the cluster with kubectl edit deployment/metrics-server -nkube-system.
Add the flag to the spec.containers[].args list, so that the deployment looks like this:
...
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls   # <======= ADD IT HERE
        image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
...
Simply save your changes and let the deployment roll out the updated pods. You can use watch -n1 kubectl get deployment/metrics-server -n kube-system and wait for the UP-TO-DATE column to show 1.
Like this:
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 16m
Verify with kubectl top nodes. It will show something like
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
docker-desktop 222m 5% 1600Mi 41%
I've just verified this to work on a local setup. Let me know if this helps :)
Please configure the aggregation layer correctly and carefully; you can use this link for help: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: <name of the registration object>
spec:
  group: <API group name this extension apiserver hosts>
  version: <API version this extension apiserver hosts>
  groupPriorityMinimum: <priority this APIService for this group, see API documentation>
  versionPriority: <prioritizes ordering of this version within a group, see API documentation>
  service:
    namespace: <namespace of the extension apiserver service>
    name: <name of the extension apiserver service>
  caBundle: <pem encoded ca cert that signs the server cert used by the webhook>
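For metrics-server specifically, the APIService shipped in the upstream components.yaml fills this template in roughly as follows (shown for reference; it uses insecureSkipTLSVerify rather than a caBundle):
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system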
It would be helpful to provide kubectl version return value.
For me, on EKS with helmfile, I had to set the following in values.yaml for the metrics-server chart:
containerPort: 10250
The value defaulted to 4443 for an unknown reason when I first deployed the chart.
See doc:
https://github.com/kubernetes-sigs/metrics-server/blob/master/charts/metrics-server/values.yaml#L62
https://aws.amazon.com/premiumsupport/knowledge-center/eks-metrics-server/#:~:text=confirm%20that%20your%20security%20groups
Then kubectl top nodes and kubectl describe apiservice v1beta1.metrics.k8s.io were working.
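A minimal sketch of that override (the value name comes from the linked chart values.yaml; the repo URL matches the helm example earlier in this thread):
# values.yaml
containerPort: 10250

helm upgrade --install metrics-server metrics-server \
  --repo https://kubernetes-sigs.github.io/metrics-server/ \
  -n kube-system -f values.yaml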
First of all, execute the following command:
kubectl get apiservices
and check the availability (status) of the kube-system/metrics-server service.
In case the availability is True:
Add hostNetwork: true to the spec of your metrics-server deployment by executing the following command:
kubectl edit deployment -n kube-system metrics-server
It should look like the following:
...
spec:
  hostNetwork: true
...
Setting hostNetwork to true means that the Pod uses the network namespace of the host it is running on.
In case the availability is False (MissingEndpoints):
Download metrics-server:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
Remove (legacy) metrics server:
kubectl delete -f components.yaml
Edit the downloaded file and add - --kubelet-insecure-tls to the args list:
...
  labels:
    k8s-app: metrics-server
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls # add this line
...
Create service once again:
kubectl apply -f components.yaml

metrics-service in kubernetes not working

I'm running Kubernetes on an EC2 machine on AWS.
The node runs Ubuntu.
My metrics-server version:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml
components.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  serviceAccountName: metrics-server
  volumes:
  # mount in tmp so we can safely use from-scratch images and/or read-only containers
  - name: tmp-dir
    emptyDir: {}
  containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
    imagePullPolicy: IfNotPresent
    args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-type=InternalIP,ExternalIP,Hostname
    - --kubelet-insecure-tls
Even after adding the args, the error appears.
Error:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
or
error: metrics not available yet
No matter how long I wait, that error appears.
My kops version: Version 1.18.0 (git-698bf974d8)
I use Calico networking.
Please help...
++
I tried wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
Viewing the logs:
kubectl logs -n kube-system deploy/metrics-server
"Failed to scrape node" err="GET "https://172.20.51.226:10250/stats/summary?only_cpu_and_memory=true": bad status code "401 Unauthorized"" node="ip-172-20-51-226.ap-northeast-2.compute.internal"
"Failed probe" probe="metric-storage-ready" err="not metrics to serve"
Download the components.yaml file manually:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Then edit the args section under Deployment:
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
and add two more lines there:
    - --kubelet-insecure-tls=true
    - --kubelet-preferred-address-types=InternalIP
The kubelet's port 10250 uses HTTPS, and the connection is verified with a TLS certificate. Adding --kubelet-insecure-tls tells the metrics server not to verify the kubelet's serving certificate.
After this modification, just apply the manifest:
kubectl apply -f components.yaml
Wait a minute and you will see the metrics-server pod come up.
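A quick way to watch for that (the deployment name matches the manifest above):
kubectl -n kube-system rollout status deployment/metrics-server
kubectl top nodes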
The last comment is useful. You can also edit the deployment directly; adding the line "--kubelet-insecure-tls=true" was enough for me:
Edit deploy:
$ kubectl edit deployment.apps/metrics-server -n kube-system
Add the line:
- --kubelet-insecure-tls=true
Similar result:
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  - --kubelet-insecure-tls=true
And save with ":wq" and enjoy.
~$ kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-6d4b75cb6d-k8dmc 3m 18Mi
coredns-6d4b75cb6d-wxxn6 3m 17Mi
kube-apiserver-k8s-master1 82m 306Mi
kube-apiserver-k8s-master2 65m 247Mi
kube-controller-manager-k8s-master1 32m 47Mi
kube-controller-manager-k8s-master2 4m 19Mi
kube-proxy-9dbgk 1m 9Mi
kube-proxy-bwhdm 1m 14Mi
kube-proxy-fz8v8 1m 15Mi
kube-proxy-vcnrc 1m 9Mi
kube-scheduler-k8s-master1 7m 18Mi
kube-scheduler-k8s-master2 4m 16Mi
metrics-server-79576f7ff-97tpc 6m 15Mi
metrics-server-79576f7ff-qzczp 4m 13Mi
~$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master1 318m 15% 1047Mi 55%
k8s-master2 208m 10% 1002Mi 52%
k8s-worker1 30m 3% 804Mi 42%
k8s-worker2 35m 3% 550Mi 29%

Kubernetes probe events not visible but failing

I'm using K8S 1.14 and Helm 3.3.1.
I have an app which works when deployed without probes. Then I set two trivial probes:
livenessProbe:
  exec:
    command:
    - ls
    - /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  exec:
    command:
    - ls
    - /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
When I deploy via helm upgrade, the command eventually (~5 mins) fails with:
Error: UPGRADE FAILED: release my-app failed, and has been rolled back due to atomic being set: timed out waiting for the condition
But in the events log there is no trace of any probe:
5m21s Normal ScalingReplicaSet deployment/my-app Scaled up replica set my-app-7 to 1
5m21s Normal Scheduled pod/my-app-7-6 Successfully assigned default/my-app-7-6 to gke-foo-testing-foo-testing-node-po-111-r0cu
5m21s Normal LoadBalancerNegNotReady pod/my-app-7-6 Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-222-default-my-app-80-54]
5m21s Normal SuccessfulCreate replicaset/my-app-7 Created pod: my-app-7-6
5m20s Normal Pulling pod/my-app-7-6 Pulling image "my-registry/my-app:v0.1"
5m20s Normal Pulled pod/my-app-7-6 Successfully pulled image "my-registry/my-app:v0.1"
5m20s Normal Created pod/my-app-7-6 Created container my-app
5m20s Normal Started pod/my-app-7-6 Started container my-app
5m15s Normal Attach service/my-app Attach 1 network endpoint(s) (NEG "k8s1-222-default-my-app-80-54" in zone "europe-west3-a")
19s Normal ScalingReplicaSet deployment/my-app Scaled down replica set my-app-7 to 0
19s Normal SuccessfulDelete replicaset/my-app-7 Deleted pod: my-app-7-6
19s Normal Killing pod/my-app-7-6 Stopping container my-app
Hence the question: what are the probes doing and where?
Try deleting the release and then re-applying it: helm del --purge <APPNAME>
Also, which Helm version are you using? Try upgrading to v3.2.1; there's an open issue that addresses this problem with previously failed upgrades: https://github.com/helm/helm/issues/5939
I reproduced the same scenario here and everything went fine. The release was deployed and the pod is running. Did you check inside the container whether /mnt really exists?
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m41s Successfully assigned default/nginx-deployment2-5cdd568667-blsc7 to minikube
Normal Pulling 3m41s kubelet, minikube Pulling image "nginx"
Normal Pulled 3m38s kubelet, minikube Successfully pulled image "nginx" in 2.769840982s
Normal Created 3m38s kubelet, minikube Created container nginx
Normal Started 3m38s kubelet, minikube Started container nginx
NAME READY STATUS RESTARTS AGE
nginx-deployment2-5cdd568667-blsc7 1/1 Running 0 4m59s
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment2
spec:
  selector:
    matchLabels:
      app: ameba
  replicas: 1
  template:
    metadata:
      labels:
        app: ameba
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: nginx-port
        livenessProbe:
          exec:
            command:
            - ls
            - /mnt
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          exec:
            command:
            - ls
            - /mnt
          initialDelaySeconds: 5
          periodSeconds: 5
I don't know if your image includes bash, but if you just want to verify that the directory exists, you can do the same thing with other shell commands. Try this:
livenessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - ls /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - ls /mnt
  initialDelaySeconds: 5
  periodSeconds: 5
In bash you can also use the test built-in:
[[ -d /mnt ]] (the -d flag checks whether the directory /mnt exists).
As an alternative, there is also the stat command:
stat /mnt
If you want to check whether the directory contains a specific file, use the complete path with the filename included.
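As a sketch, a probe built on the test built-in (assuming the image ships bash) would look like this:
livenessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - '[[ -d /mnt ]]'
  initialDelaySeconds: 5
  periodSeconds: 5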

Kubernetes pod stuck in waiting state

Trying to start this pod
apiVersion: v1
kind: Pod
metadata:
  name: tinyproxy
spec:
  containers:
  - name: master
    image: asdrepo.isus.emc.com:8091/francisbesset/tinyproxy
    env:
    - name: MASTER
      value: "true"
    ports:
    - containerPort: 6379
    resources:
      limits:
        cpu: "0.1"
    volumeMounts:
    - mountPath: /tinyproxy-data
      name: data
  volumes:
  - name: data
    emptyDir: {}
This gets stuck in the Pending state. I looked in the troubleshooting guide, but this pod does not seem to have any events:
$ kubectl describe pods tinyproxy
Name: tinyproxy
Namespace: default
Node: /
Labels: name=tinyproxy
Status: Pending
IP:
Controllers: <none>
Containers:
master:
Image: asdrepo.isus.emc.com:8091/francisbesset/tinyproxy
Port: 6379/TCP
QoS Tier:
cpu: Guaranteed
memory: BestEffort
Limits:
cpu: 100m
Requests:
cpu: 100m
Environment Variables:
MASTER: true
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
No events.
Also
$ kubectl get events
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
13m 13m 1 10.0.0.5 Node Normal Starting {kubelet 10.0.0.5} Starting kubelet.
13m 13m 2 10.0.0.5 Node Warning MissingClusterDNS {kubelet 10.0.0.5} kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. pod: "kube-proxy-10.0.0.5_kube-system(9fa6e0ea64b9f19ad6996367402408eb)". Falling back to DNSDefault policy.
13m 13m 1 10.0.0.5 Node Normal NodeHasSufficientDisk {kubelet 10.0.0.5} Node 10.0.0.5 status is now: NodeHasSufficientDisk
13m 13m 1 10.0.0.5 Node Normal Starting {kubelet 10.0.0.5} Starting kubelet.
13m 13m 1 10.0.0.5 Node Normal NodeHasSufficientDisk {kubelet 10.0.0.5} Node 10.0.0.5 status is now: NodeHasSufficientDisk
13m 13m 1 k8-dvawxybzux-0-a7m3diiryehx-kube-minion-itahxn4icom6 Node Normal Starting {kube-proxy k8-dvawxybzux-0-a7m3diiryehx-kube-minion-itahxn4icom6} Starting kube-proxy.
The proxy does seem to be running and is not restarting
bash-4.3# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d6dd779b301f gcr.io/google_containers/hyperkube:v1.2.0 "/hyperkube proxy --m" 15 minutes ago Up 15 minutes k8s_kube-proxy.d87e83d4_kube-proxy-10.0.0.5_kube-system_9fa6e0ea64b9f19ad6996367402408eb_caae92ac
8191770f15d9 gcr.io/google_containers/pause:2.0 "/pause" 15 minutes ago Up 15 minutes k8s_POD.6059dfa2_kube-proxy-10.0.0.5_kube-system_9fa6e0ea64b9f19ad6996367402408eb_e4da5a30
How do I debug this?
It looks like the scheduler service did not start (this is in an OpenStack VM). All services were supposed to be configured and started automatically. This worked after I started the service manually.
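For anyone hitting the same symptom (a pod stuck in Pending with no events), two quick checks can confirm the scheduler is the culprit; the second assumes a systemd-managed control plane:
kubectl get componentstatuses        # the scheduler component should report Healthy
systemctl status kube-scheduler      # run on the control-plane/master host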