How to add a LoadBalancer to a cluster on DigitalOcean - Kubernetes

I created a cluster on DigitalOcean using kubeadm and 3 droplets. Since this is not a managed Kubernetes cluster from DigitalOcean, how do I manually set up a LoadBalancer?
I've tried adding an external load balancer by adding the following lines to a deployment config file:
...
replicaCount: 1
image:
  repository: turfff/node-replicas
  tag: latest
  pullPolicy: IfNotPresent
...
service:
  type: LoadBalancer
  port: 80
  targetPort: 8080
...
However, when I run the configuration and check the created services:
kubectl get svc
NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes                        ClusterIP      10.96.0.1       <none>        443/TCP        13d
mongo-mongodb-replicaset          ClusterIP      None            <none>        27017/TCP      3h15m
mongo-mongodb-replicaset-client   ClusterIP      None            <none>        27017/TCP      3h15m
nodejs-nodeapp                    LoadBalancer   10.109.213.98   <pending>     80:31769/TCP   61m
kubectl describe svc nodejs-nodeapp
Name: nodejs-nodeapp
Namespace: default
Labels: app.kubernetes.io/instance=nodejs
app.kubernetes.io/managed-by=Tiller
app.kubernetes.io/name=nodeapp
app.kubernetes.io/version=1.0
helm.sh/chart=nodeapp-0.1.0
Annotations: <none>
Selector: app.kubernetes.io/instance=nodejs,app.kubernetes.io/name=nodeapp
Type: LoadBalancer
IP: 10.109.213.98
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31769/TCP
Endpoints: 10.244.2.19:8080
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
mongo-mongodb-replicaset-0        1/1     Running   0          3h18m
mongo-mongodb-replicaset-1        1/1     Running   0          3h17m
mongo-mongodb-replicaset-2        1/1     Running   0          3h16m
nodejs-nodeapp-7b89db8888-sjcbq   1/1     Running   0          65m
kubectl describe pod nodejs-nodeapp
Name: nodejs-nodeapp-7b89db8888-sjcbq
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: worker-02/206.81.3.65
Start Time: Sun, 14 Jun 2020 11:21:07 +0100
Labels: app.kubernetes.io/instance=nodejs
app.kubernetes.io/name=nodeapp
pod-template-hash=7b89db8888
Annotations: <none>
Status: Running
IP: 10.244.2.19
Controlled By: ReplicaSet/nodejs-nodeapp-7b89db8888
Containers:
nodeapp:
Container ID: docker://f0d4d01f....
Image: turfff/node-replicas:latest
Image ID: docker-pullable://turfff/node-replicas@sha256:34d...
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 14 Jun 2020 11:21:08 +0100
Ready: True
Restart Count: 0
Liveness: http-get http://:http/sharks delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/sharks delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
MONGO_USERNAME: <set to the key 'MONGO_USERNAME' in secret 'nodejs-auth'> Optional: false
MONGO_PASSWORD: <set to the key 'MONGO_PASSWORD' in secret 'nodejs-auth'> Optional: false
MONGO_HOSTNAME: <set to the key 'MONGO_HOSTNAME' of config map 'nodejs-config'> Optional: false
MONGO_PORT: <set to the key 'MONGO_PORT' of config map 'nodejs-config'> Optional: false
MONGO_DB: <set to the key 'MONGO_DB' of config map 'nodejs-config'> Optional: false
MONGO_REPLICASET: <set to the key 'MONGO_REPLICASET' of config map 'nodejs-config'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nodejs-nodeapp-token-4wxvd (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
nodejs-nodeapp-token-4wxvd:
Type: Secret (a volume populated by a Secret)
SecretName: nodejs-nodeapp-token-4wxvd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
It fails to create a load balancer. How do I manually set up the LoadBalancer?

I would not recommend configuring load balancers manually. You can automate this by installing the DigitalOcean cloud controller manager, which is the Kubernetes cloud controller manager implementation for DigitalOcean. Read more about cloud controller managers here.
The DigitalOcean cloud controller manager runs the service controller, which is responsible for watching Services of type LoadBalancer and creating DO load balancers to satisfy their requirements. Here are examples of how it's used.
Here is a yaml file that you can use to deploy this on your Kubernetes cluster. It needs a DigitalOcean API token placed in the access-token: section of the manifest.
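For reference, a minimal sketch of that token Secret, based on the manifests shipped with the CCM releases at the time (verify the name, namespace, and key against the release you actually deploy):

apiVersion: v1
kind: Secret
metadata:
  name: digitalocean        # name the CCM manifests expect
  namespace: kube-system
stringData:
  access-token: "<your DigitalOcean API token>"

Once the CCM is running, a Service of type LoadBalancer like the one above should move from <pending> to a real external IP as the DO load balancer is provisioned.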

Related

How to fix HTTP 502 from external reverse proxy with upstream to ingress-nginx

Currently I have a cluster with a single controller and a single worker, plus an nginx reverse proxy (HTTP only) outside the cluster.
The controller is at 192.168.1.65
The worker is at 192.168.1.61
The reverse proxy is at 192.168.1.93 and has a public IP
Here are my ingress-nginx services:
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n ingress-nginx get services
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.102.58.7     192.168.1.186   80:31097/TCP,443:31116/TCP   56m
ingress-nginx-controller-admission   ClusterIP      10.108.233.49   <none>          443/TCP                      56m
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n ingress-nginx describe svc ingress-nginx-controller
Name: ingress-nginx-controller
Namespace: ingress-nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.3.0
Annotations: <none>
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.102.58.7
IPs: 10.102.58.7
LoadBalancer Ingress: 192.168.1.186
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31097/TCP
Endpoints: 10.244.0.23:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 31116/TCP
Endpoints: 10.244.0.23:443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
That 192.168.1.186 is assigned by MetalLB.
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl get IPAddressPools -A
NAMESPACE        NAME     AGE
metallb-system   pool01   99m
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n metallb-system describe IPAddressPool pool01
Name: pool01
Namespace: metallb-system
Labels: <none>
Annotations: <none>
API Version: metallb.io/v1beta1
Kind: IPAddressPool
Metadata:
Creation Timestamp: 2022-07-26T09:08:10Z
Generation: 1
Managed Fields:
API Version: metallb.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:addresses:
f:autoAssign:
f:avoidBuggyIPs:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2022-07-26T09:08:10Z
Resource Version: 41021
UID: 2a0dcfb2-bf8f-4b1a-b459-380e78959586
Spec:
Addresses:
192.168.1.186 - 192.168.1.191
Auto Assign: true
Avoid Buggy I Ps: false
Events: <none>
I deployed hello-app in namespace 'dev':
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n dev get all
NAME                             READY   STATUS    RESTARTS      AGE
pod/hello-app-5c554f556c-v2gx9   1/1     Running   1 (20m ago)   63m

NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/hello-service   ClusterIP   10.111.161.2   <none>        8081/TCP   62m

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hello-app   1/1     1            1           63m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-app-5c554f556c   1         1         1       63m
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n dev describe service hello-service
Name: hello-service
Namespace: dev
Labels: app=hello
Annotations: <none>
Selector: app=hello
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.111.161.2
IPs: 10.111.161.2
Port: <unset> 8081/TCP
TargetPort: 8080/TCP
Endpoints: 10.244.0.22:8080
Session Affinity: None
Events: <none>
Local tests of that service:
bino@k8s-worker-1:~$ curl http://10.111.161.2:8081
Hello, world!
Version: 2.0.0
Hostname: hello-app-5c554f556c-v2gx9
bino@k8s-worker-1:~$ curl http://10.244.0.22:8080
Hello, world!
Version: 2.0.0
Hostname: hello-app-5c554f556c-v2gx9
And the ingress resource of that service:
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n dev describe ingress hello-app-ingress
Name: hello-app-ingress
Labels: <none>
Namespace: dev
Address: 192.168.1.61
Ingress Class: nginx
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
bino.k8s.jcamp.cloud
/ hello-service:8081 (10.244.0.22:8080)
Annotations: ingress.kubernetes.io/rewrite-target: /
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 23m (x3 over 24m) nginx-ingress-controller Scheduled for sync
When I open http://bino.k8s.jcamp.cloud I get a 502.
My nginx reverse proxy conf:
server {
    listen 80 default_server;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://192.168.1.186;
    }
}
The nginx error log says:
2022/07/26 06:24:21 [error] 1593#1593: *6 connect() failed (113: No route to host) while connecting to upstream, client: 203.161.185.210, server: , request: "GET /favicon.ico HTTP/1.1", upstream: "http://192.168.1.186:80/favicon.ico", host: "bino.k8s.jcamp.cloud", referrer: "http://bino.k8s.jcamp.cloud/"
From describing the ingress-nginx-controller pod:
bino@corobalap  ~/k0s-sriwijaya/ingress-nginx/testapp  kubectl -n ingress-nginx describe pod ingress-nginx-controller-6dc865cd86-9fmsk
Name: ingress-nginx-controller-6dc865cd86-9fmsk
Namespace: ingress-nginx
Priority: 0
Node: k8s-worker-1/192.168.1.61
Start Time: Tue, 26 Jul 2022 16:11:05 +0700
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=6dc865cd86
Annotations: kubernetes.io/psp: 00-k0s-privileged
Status: Running
IP: 10.244.0.23
IPs:
IP: 10.244.0.23
Controlled By: ReplicaSet/ingress-nginx-controller-6dc865cd86
Containers:
controller:
Container ID: containerd://541446c98b55312376aba4744891baa325dca26410abe5f94707d270d378d881
Image: registry.k8s.io/ingress-nginx/controller:v1.3.0@sha256:d1707ca76d3b044ab8a28277a2466a02100ee9f58a86af1535a3edf9323ea1b5
Image ID: registry.k8s.io/ingress-nginx/controller@sha256:d1707ca76d3b044ab8a28277a2466a02100ee9f58a86af1535a3edf9323ea1b5
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Running
Started: Tue, 26 Jul 2022 16:56:40 +0700
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Tue, 26 Jul 2022 16:11:09 +0700
Finished: Tue, 26 Jul 2022 16:56:26 +0700
Ready: True
Restart Count: 1
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-controller-6dc865cd86-9fmsk (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nfmrc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ingress-nginx-admission
Optional: false
kube-api-access-nfmrc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning NodeNotReady 44m node-controller Node is not ready
Warning FailedMount 43m kubelet MountVolume.SetUp failed for volume "webhook-cert" : object "ingress-nginx"/"ingress-nginx-admission" not registered
Warning FailedMount 43m kubelet MountVolume.SetUp failed for volume "webhook-cert" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 43m kubelet MountVolume.SetUp failed for volume "kube-api-access-nfmrc" : failed to sync configmap cache: timed out waiting for the condition
Normal SandboxChanged 43m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 43m kubelet Container image "registry.k8s.io/ingress-nginx/controller:v1.3.0@sha256:d1707ca76d3b044ab8a28277a2466a02100ee9f58a86af1535a3edf9323ea1b5" already present on machine
Normal Created 43m kubelet Created container controller
Normal Started 43m kubelet Started container controller
Warning Unhealthy 42m (x2 over 42m) kubelet Liveness probe failed: Get "http://10.244.0.23:10254/healthz": dial tcp 10.244.0.23:10254: connect: connection refused
Warning Unhealthy 42m (x3 over 43m) kubelet Readiness probe failed: Get "http://10.244.0.23:10254/healthz": dial tcp 10.244.0.23:10254: connect: connection refused
Normal RELOAD 42m nginx-ingress-controller NGINX reload triggered due to a change in configuration
And here is the nft ruleset:
bino@k8s-worker-1:~$ su -
Password:
root@k8s-worker-1:~# systemctl status nftables.service
● nftables.service - nftables
Loaded: loaded (/lib/systemd/system/nftables.service; enabled; vendor preset: enabled)
Active: active (exited) since Tue 2022-07-26 05:56:17 EDT; 46min ago
Docs: man:nft(8)
http://wiki.nftables.org
Process: 186 ExecStart=/usr/sbin/nft -f /etc/nftables.conf (code=exited, status=0/SUCCESS)
Main PID: 186 (code=exited, status=0/SUCCESS)
CPU: 34ms
Warning: journal has been rotated since unit was started, output may be incomplete.
Complete ruleset is at https://pastebin.com/xd58rcQp
Kindly tell me what to do, what to check, or what to learn in order to fix this problem.
Sincerely
-bino-
My bad ...
There was a name mismatch between the IP pool definition YAML and the L2 advertisement YAML.
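For reference, a minimal L2Advertisement that matches the pool above might look like the sketch below (the metadata name is arbitrary; the entry under ipAddressPools must match the IPAddressPool name exactly):

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - pool01   # must equal the IPAddressPool's metadata.name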

"[error] [upstream] connection timed out after 10 seconds" failed when fluent-bit tries to communicate with fluentd in Kubernetes

I'm using fluent-bit to collect logs and pass them to fluentd for processing in a Kubernetes environment. The fluent-bit instances are controlled by a DaemonSet and read logs from docker containers.
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
There is also a fluent-bit service running:
Name: monitoring-fluent-bit-dips
Namespace: dips
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=fluent-bit-dips
app.kubernetes.io/version=1.8.10
helm.sh/chart=fluent-bit-0.19.6
Annotations: meta.helm.sh/release-name: monitoring
meta.helm.sh/release-namespace: dips
Selector: app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=fluent-bit-dips
Type: ClusterIP
IP Families: <none>
IP: 10.43.72.32
IPs: <none>
Port: http 2020/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.144:2020,10.42.1.155:2020,10.42.2.186:2020 + 1 more...
Session Affinity: None
Events: <none>
The fluentd service description is as below:
Name: monitoring-logservice
Namespace: dips
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=logservice
app.kubernetes.io/version=1.9
helm.sh/chart=logservice-0.1.2
Annotations: meta.helm.sh/release-name: monitoring
meta.helm.sh/release-namespace: dips
Selector: app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=logservice
Type: ClusterIP
IP Families: <none>
IP: 10.43.44.254
IPs: <none>
Port: http 24224/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.143:24224
Session Affinity: None
Events: <none>
But the fluent-bit logs don't reach fluentd, and I get the following error:
[error] [upstream] connection #81 to monitoring-fluent-bit-dips:24224 timed out after 10 seconds
I tried several things:
re-deploying the fluent-bit pods
re-deploying the fluentd pod
upgrading fluent-bit from 1.7.3 to 1.8.10
This is a Kubernetes environment where fluent-bit was able to communicate with fluentd at a very early stage of the deployment. Apart from that, these same fluent versions work when I deploy locally in a docker-desktop environment.
My guesses are:
fluent-bit cannot manage the volume of logs it processes
the fluent services are unable to communicate once the services are restarted
Does anyone have experience with this, or any idea how to debug this issue more deeply?
Update: following is the description of the running fluentd pod.
Name: monitoring-logservice-5b8864ffd8-gfpzc
Namespace: dips
Priority: 0
Node: sl-sy-k3s-01/10.16.1.99
Start Time: Mon, 29 Nov 2021 13:09:13 +0530
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/name=logservice
pod-template-hash=5b8864ffd8
Annotations: kubectl.kubernetes.io/restartedAt: 2021-11-29T12:37:23+05:30
Status: Running
IP: 10.42.0.143
IPs:
IP: 10.42.0.143
Controlled By: ReplicaSet/monitoring-logservice-5b8864ffd8
Containers:
logservice:
Container ID: containerd://102483a7647fd2f10bead187eddf69aa4fad72051d6602dd171e1a373d4209d7
Image: our.private.repo/dips/logservice/splunk:1.9
Image ID: our.private.repo/dips/logservice/splunk@sha256:531f15f523a251b93dc8a25056f05c0c7bb428241531485a22b94896974e17e8
Ports: 24231/TCP, 24224/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Mon, 29 Nov 2021 13:09:14 +0530
Ready: True
Restart Count: 0
Liveness: exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOME_ENV_VARS
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from monitoring-logservice-token-g9kwt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
monitoring-logservice-token-g9kwt:
Type: Secret (a volume populated by a Secret)
SecretName: monitoring-logservice-token-g9kwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Try changing your fluent-bit config so that it points to the fluentd service as monitoring-logservice.dips:24224.
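As a sketch, assuming the forward output plugin is what ships logs to fluentd (adjust the Match pattern to your tags), the output stanza would look something like:

[OUTPUT]
    Name    forward
    Match   kube.*
    Host    monitoring-logservice.dips
    Port    24224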
https://docs.fluentbit.io/manual/pipeline/filters/kubernetes
filters: |
  [FILTER]
      Name         kubernetes
      Match        kube.*
      Kube_URL     https://kubernetes.default:443
      tls.verify   Off
In my case, the issue was a Kubernetes apiserver SSL error.

Why do I keep getting error "5 pod has unbound immediate PersistentVolumeClaims"?

I am following the book Kubernetes for Developers, and it seems the book may be heavily outdated now.
Recently I have been trying to get Prometheus up and running on Kubernetes following the instructions from the book, which suggested installing and using Helm to get Prometheus and Grafana up and running.
helm install monitor stable/prometheus --namespace monitoring
This resulted in:
NAME                                               READY   STATUS             RESTARTS   AGE   IP               NODE              NOMINATED NODE   READINESS GATES
monitor-kube-state-metrics-578cdbb5b7-pdjzw        0/1     CrashLoopBackOff   14         36m   192.168.23.1     kube-worker-vm3   <none>           <none>
monitor-prometheus-alertmanager-7b4c476678-gr4s6   0/2     Pending            0          35m   <none>           <none>            <none>           <none>
monitor-prometheus-node-exporter-5kz8x             1/1     Running            0          14h   192.168.1.13     rockpro64         <none>           <none>
monitor-prometheus-node-exporter-jjrjh             1/1     Running            1          14h   192.168.1.35     osboxes           <none>           <none>
monitor-prometheus-node-exporter-k62fn             1/1     Running            1          14h   192.168.1.37     kube-worker-vm3   <none>           <none>
monitor-prometheus-node-exporter-wcg2k             1/1     Running            1          14h   192.168.1.36     kube-worker-vm2   <none>           <none>
monitor-prometheus-pushgateway-6898f8475b-sk4dz    1/1     Running            0          36m   192.168.90.200   osboxes           <none>           <none>
monitor-prometheus-server-74d7dc5d4c-vlqmm         0/2     Pending            0          14h   <none>           <none>            <none>           <none>
For the Prometheus server I checked why it is Pending:
# kubectl describe pod monitor-prometheus-server-74d7dc5d4c-vlqmm -n monitoring
Name: monitor-prometheus-server-74d7dc5d4c-vlqmm
Namespace: monitoring
Priority: 0
Node: <none>
Labels: app=prometheus
chart=prometheus-13.8.0
component=server
heritage=Helm
pod-template-hash=74d7dc5d4c
release=monitor
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/monitor-prometheus-server-74d7dc5d4c
Containers:
prometheus-server-configmap-reload:
Image: jimmidyson/configmap-reload:v0.4.0
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://127.0.0.1:9090/-/reload
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from monitor-prometheus-server-token-n49ls (ro)
prometheus-server:
Image: prom/prometheus:v2.20.1
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention.time=15d
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=15s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from monitor-prometheus-server-token-n49ls (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monitor-prometheus-server
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: monitor-prometheus-server
ReadOnly: false
monitor-prometheus-server-token-n49ls:
Type: Secret (a volume populated by a Secret)
SecretName: monitor-prometheus-server-token-n49ls
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28m (x734 over 14h) default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 3m5s (x23 over 24m) default-scheduler 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
However, this message (0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.) also appears for all the other Node.js StatefulSets and RabbitMQ Deployments I have tried to create. For RabbitMQ and Node.js I figured out that I needed to create a PersistentVolume and a StorageClass, whose name I had to specify in the PV and PVC, and then it all worked. But now that it is the Prometheus server, do I have to do the same for Prometheus as well? Why is that not handled by Helm?
Has something changed in the Kubernetes API recently, such that I always have to explicitly create a PV and StorageClass for a PVC?
Unless you configure your cluster with dynamic volume provisioning, you will have to create the PV manually each time. Even if you are not on a cloud, you can set up dynamic storage providers. There are a number of options for providers, and you can find many here. Ceph and Minio are popular providers.
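If you stay with manual provisioning, a minimal sketch of a StorageClass/PV pair that a PVC could bind to might look like this (the class name, PV name, size, and hostPath below are all hypothetical; hostPath volumes are only sensible on test clusters):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: manual                           # hypothetical; the PVC must request this class
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server-pv             # hypothetical name
spec:
  storageClassName: manual
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /data/prometheus               # node-local path, for testing only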

NEG says Pods are 'unhealthy', but actually the Pods are healthy

I'm trying to apply gRPC load balancing with Ingress on GCP, and for this I referenced this example. The example shows gRPC load balancing working in two ways (one with an Envoy sidecar, and the other an HTTP mux handling both gRPC and the HTTP health check on the same Pod). However, the Envoy proxy example doesn't work.
What confuses me is that the Pods are running and healthy (confirmed by kubectl describe and kubectl logs):
$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
fe-deployment-757ffcbd57-4w446   2/2     Running   0          4m22s
fe-deployment-757ffcbd57-xrrm9   2/2     Running   0          4m22s
$ kubectl describe pod fe-deployment-757ffcbd57-4w446
Name: fe-deployment-757ffcbd57-4w446
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc/10.128.0.64
Start Time: Thu, 26 Sep 2019 16:15:18 +0900
Labels: app=fe
pod-template-hash=757ffcbd57
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container fe-envoy; cpu request for container fe-container
Status: Running
IP: 10.56.1.29
Controlled By: ReplicaSet/fe-deployment-757ffcbd57
Containers:
fe-envoy:
Container ID: docker://b4789909494f7eeb8d3af66cb59168e009c582d412d8ca683a7f435559989421
Image: envoyproxy/envoy:latest
Image ID: docker-pullable://envoyproxy/envoy@sha256:9ef9c4fd6189fdb903929dc5aa0492a51d6783777de65e567382ac7d9a28106b
Port: 8080/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/envoy
Args:
-c
/data/config/envoy.yaml
State: Running
Started: Thu, 26 Sep 2019 16:15:19 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Liveness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data/certs from certs-volume (rw)
/data/config from envoy-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
fe-container:
Container ID: docker://a533224d3ea8b5e4d5e268a616d73762b37df69f434342459f35caa8fac32dab
Image: salrashid123/grpc_only_backend
Image ID: docker-pullable://salrashid123/grpc_only_backend@sha256:ebfac594116445dd67aff7c9e7a619d73222b60947e46ef65ee6d918db3e1f4b
Port: 50051/TCP
Host Port: 0/TCP
Command:
/grpc_server
Args:
--grpcport
:50051
--insecure
State: Running
Started: Thu, 26 Sep 2019 16:15:20 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
certs-volume:
Type: Secret (a volume populated by a Secret)
SecretName: fe-secret
Optional: false
envoy-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: envoy-configmap
Optional: false
default-token-c7nqc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c7nqc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m25s default-scheduler Successfully assigned default/fe-deployment-757ffcbd57-4w446 to gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc
Normal Pulled 4m25s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Container image "envoyproxy/envoy:latest" already present on machine
Normal Created 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Created container
Normal Started 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Started container
Normal Pulling 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc pulling image "salrashid123/grpc_only_backend"
Normal Pulled 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Successfully pulled image "salrashid123/grpc_only_backend"
Normal Created 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Created container
Normal Started 4m23s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Started container
Warning Unhealthy 4m10s (x2 over 4m20s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 4m9s (x2 over 4m19s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Liveness probe failed: HTTP probe failed with statuscode: 503
$ kubectl describe pod fe-deployment-757ffcbd57-xrrm9
Name: fe-deployment-757ffcbd57-xrrm9
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9/10.128.0.22
Start Time: Thu, 26 Sep 2019 16:15:18 +0900
Labels: app=fe
pod-template-hash=757ffcbd57
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container fe-envoy; cpu request for container fe-container
Status: Running
IP: 10.56.0.23
Controlled By: ReplicaSet/fe-deployment-757ffcbd57
Containers:
fe-envoy:
Container ID: docker://255dd6cab1e681e30ccfe158f7d72540576788dbf6be60b703982a7ecbb310b1
Image: envoyproxy/envoy:latest
Image ID: docker-pullable://envoyproxy/envoy@sha256:9ef9c4fd6189fdb903929dc5aa0492a51d6783777de65e567382ac7d9a28106b
Port: 8080/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/envoy
Args:
-c
/data/config/envoy.yaml
State: Running
Started: Thu, 26 Sep 2019 16:15:19 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Liveness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data/certs from certs-volume (rw)
/data/config from envoy-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
fe-container:
Container ID: docker://f6a0246129cc89da846c473daaa1c1770d2b5419b6015098b0d4f35782b0a9da
Image: salrashid123/grpc_only_backend
Image ID: docker-pullable://salrashid123/grpc_only_backend@sha256:ebfac594116445dd67aff7c9e7a619d73222b60947e46ef65ee6d918db3e1f4b
Port: 50051/TCP
Host Port: 0/TCP
Command:
/grpc_server
Args:
--grpcport
:50051
--insecure
State: Running
Started: Thu, 26 Sep 2019 16:15:20 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
certs-volume:
Type: Secret (a volume populated by a Secret)
SecretName: fe-secret
Optional: false
envoy-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: envoy-configmap
Optional: false
default-token-c7nqc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c7nqc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m8s default-scheduler Successfully assigned default/fe-deployment-757ffcbd57-xrrm9 to gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9
Normal Pulled 5m8s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Container image "envoyproxy/envoy:latest" already present on machine
Normal Created 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Created container
Normal Started 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Started container
Normal Pulling 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 pulling image "salrashid123/grpc_only_backend"
Normal Pulled 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Successfully pulled image "salrashid123/grpc_only_backend"
Normal Created 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Created container
Normal Started 5m6s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Started container
Warning Unhealthy 4m53s (x2 over 5m3s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 4m52s (x2 over 5m2s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Liveness probe failed: HTTP probe failed with statuscode: 503
$ kubectl get services
NAME             TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)           AGE
fe-srv-ingress   NodePort       10.123.5.165   <none>         8080:30816/TCP    6m43s
fe-srv-lb        LoadBalancer   10.123.15.36   35.224.69.60   50051:30592/TCP   6m42s
kubernetes       ClusterIP      10.123.0.1     <none>         443/TCP           2d2h
$ kubectl describe service fe-srv-ingress
Name: fe-srv-ingress
Namespace: default
Labels: type=fe-srv
Annotations: cloud.google.com/neg: {"ingress": true}
cloud.google.com/neg-status:
{"network_endpoint_groups":{"8080":"k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2"},"zones":["us-central1-a"]}
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"cloud.google.com/neg":"{\"ingress\": true}","service.alpha.kubernetes.io/a...
service.alpha.kubernetes.io/app-protocols: {"fe":"HTTP2"}
Selector: app=fe
Type: NodePort
IP: 10.123.5.165
Port: fe 8080/TCP
TargetPort: 8080/TCP
NodePort: fe 30816/TCP
Endpoints: 10.56.0.23:8080,10.56.1.29:8080
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Create 6m47s neg-controller Created NEG "k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2" for default/fe-srv-ingress-8080/8080 in "us-central1-a".
Normal Attach 6m40s neg-controller Attach 2 network endpoint(s) (NEG "k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2" in zone "us-central1-a")
But the NEG says they are unhealthy (so the Ingress also reports the backend as unhealthy).
I couldn't find what caused this. Does anyone know how to solve this?
Test environment:
GKE, 1.13.7-gke.8 (VPC enabled)
Default HTTP(S) load balancer on Ingress
YAML files I used (same as the example previously mentioned):
envoy-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-configmap
  labels:
    app: fe
data:
  config: |-
    ---
    admin:
      access_log_path: /dev/null
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 9000
    node:
      cluster: service_greeter
      id: test-id
    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address: { address: 0.0.0.0, port_value: 8080 }
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            config:
              stat_prefix: ingress_http
              codec_type: AUTO
              route_config:
                name: local_route
                virtual_hosts:
                - name: local_service
                  domains: ["*"]
                  routes:
                  - match:
                      path: "/echo.EchoServer/SayHello"
                    route: { cluster: local_grpc_endpoint }
              http_filters:
              - name: envoy.lua
                config:
                  inline_code: |
                    package.path = "/etc/envoy/lua/?.lua;/usr/share/lua/5.1/nginx/?.lua;/etc/envoy/lua/" .. package.path
                    function envoy_on_request(request_handle)
                      if request_handle:headers():get(":path") == "/_ah/health" then
                        local headers, body = request_handle:httpCall(
                        "local_admin",
                        {
                          [":method"] = "GET",
                          [":path"] = "/clusters",
                          [":authority"] = "local_admin"
                        },"", 50)
                        str = "local_grpc_endpoint::127.0.0.1:50051::health_flags::healthy"
                        if string.match(body, str) then
                          request_handle:respond({[":status"] = "200"},"ok")
                        else
                          request_handle:logWarn("Envoy healthcheck failed")
                          request_handle:respond({[":status"] = "503"},"unavailable")
                        end
                      end
                    end
              - name: envoy.router
                typed_config: {}
          tls_context:
            common_tls_context:
              tls_certificates:
              - certificate_chain:
                  filename: "/data/certs/tls.crt"
                private_key:
                  filename: "/data/certs/tls.key"
      clusters:
      - name: local_grpc_endpoint
        connect_timeout: 0.05s
        type: STATIC
        http2_protocol_options: {}
        lb_policy: ROUND_ROBIN
        common_lb_config:
          healthy_panic_threshold:
            value: 50.0
        health_checks:
        - timeout: 1s
          interval: 5s
          interval_jitter: 1s
          no_traffic_interval: 5s
          unhealthy_threshold: 1
          healthy_threshold: 3
          grpc_health_check:
            service_name: "echo.EchoServer"
            authority: "server.domain.com"
        hosts:
        - socket_address:
            address: 127.0.0.1
            port_value: 50051
      - name: local_admin
        connect_timeout: 0.05s
        type: STATIC
        lb_policy: ROUND_ROBIN
        hosts:
        - socket_address:
            address: 127.0.0.1
            port_value: 9000
fe-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fe-deployment
  labels:
    app: fe
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: fe
    spec:
      containers:
      - name: fe-envoy
        image: envoyproxy/envoy:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /_ah/health
            scheme: HTTPS
            port: fe
        readinessProbe:
          httpGet:
            path: /_ah/health
            scheme: HTTPS
            port: fe
        ports:
        - name: fe
          containerPort: 8080
          protocol: TCP
        command: ["/usr/local/bin/envoy"]
        args: ["-c", "/data/config/envoy.yaml"]
        volumeMounts:
        - name: certs-volume
          mountPath: /data/certs
        - name: envoy-config-volume
          mountPath: /data/config
      - name: fe-container
        image: salrashid123/grpc_only_backend # This runs a gRPC secure/insecure server using the port argument (:50051). Port 50051 is also exposed in the Dockerfile.
        imagePullPolicy: Always
        ports:
        - containerPort: 50051
          protocol: TCP
        command: ["/grpc_server"]
        args: ["--grpcport", ":50051", "--insecure"]
      volumes:
      - name: certs-volume
        secret:
          secretName: fe-secret
      - name: envoy-config-volume
        configMap:
          name: envoy-configmap
          items:
          - key: config
            path: envoy.yaml
fe-srv-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: fe-srv-ingress
  labels:
    type: fe-srv
  annotations:
    service.alpha.kubernetes.io/app-protocols: '{"fe":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: NodePort
  ports:
  - name: fe
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: fe
fe-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: fe-ingress
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - hosts:
    - server.domain.com
    secretName: fe-secret
  rules:
  - host: server.domain.com
    http:
      paths:
      - path: /echo.EchoServer/*
        backend:
          serviceName: fe-srv-ingress
          servicePort: 8080
I had to allow traffic from the IP ranges specified as the health check source in the documentation pages (130.211.0.0/22 and 35.191.0.0/16); I saw it here: https://cloud.google.com/kubernetes-engine/docs/how-to/standalone-neg
And I had to allow it for both the default network and the new (regional) network the cluster lives in.
When I added these firewall rules, health checks could reach the pods exposed in the NEG used as a regional backend within a backend service of our HTTP(S) load balancer.
There may be a more restrictive firewall setup, but I just cut corners and allowed anything from the IP range declared to be the health check source range on the page referenced above.
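As a sketch, the kind of permissive rule I mean would look something like this (the rule name and target port are illustrative; repeat per network via --network):

gcloud compute firewall-rules create allow-lb-health-checks \
  --network=default \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8080 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16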
A GCP committer says this is a kind of bug, so there is no way to fix it at this time.
The related issue is this, and a pull request is now in progress.

How to run Tiller on a Kubernetes cluster on AWS EKS

I created an EKS Kubernetes cluster with Terraform. It all went fine: the cluster was created and there is one EC2 machine in it. However, I can't init Helm and install Tiller there. All the code is at https://github.com/amorfis/aws-eks-terraform
As stated in README.md, after cluster creation I update ~/.kube/config, create the RBAC resources, and try to init Helm. However, its pod is still Pending:
$> kubectl --namespace kube-system get pods
NAME                             READY   STATUS    RESTARTS   AGE
coredns-7554568866-8mnsm         0/1     Pending   0          3h
coredns-7554568866-mng65         0/1     Pending   0          3h
tiller-deploy-77c96688d7-87rb8   0/1     Pending   0          1h
The other 2 coredns pods are Pending as well.
What am I missing?
UPDATE: Output of describe:
$> kubectl describe pod tiller-deploy-77c96688d7-87rb8 --namespace kube-system
Name: tiller-deploy-77c96688d7-87rb8
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=helm
name=tiller
pod-template-hash=3375224483
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/tiller-deploy-77c96688d7
Containers:
tiller:
Image: gcr.io/kubernetes-helm/tiller:v2.12.2
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from tiller-token-b9x6d (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tiller-token-b9x6d:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-token-b9x6d
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Try allowing the master to run pods, according to this issue from GitHub:
kubectl taint nodes --all node-role.kubernetes.io/master-
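To confirm whether a taint is what is blocking scheduling, you can inspect the node before and after removing it (generic kubectl, nothing specific to this repo; substitute your node name):

kubectl get nodes
kubectl describe node <node-name> | grep -i -A2 taint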