kube-registry-proxy doesn't expose port 5000 on any nodes - kubernetes

I'm using the private Docker registry addon in my Kubernetes cluster, and I would like to expose port 5000 on each node so that I can pull images from localhost:5000 easily. So I placed a pod manifest file /etc/kubernetes/manifests/kube-registry-proxy.manifest on every node to start a local proxy for port 5000. This worked when I deployed Kubernetes manually on bare-metal Ubuntu a few months ago, but it fails when I try Kargo: nothing listens on port 5000.
I'm using Kargo with the Calico network plugin. The Docker registry's configuration is:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: kube-system-kube-registry-pv
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /registry
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: kube-registry-pvc
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-registry-v0
  namespace: kube-system
  labels:
    k8s-app: kube-registry
    version: v0
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-registry
    version: v0
  template:
    metadata:
      labels:
        k8s-app: kube-registry
        version: v0
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: registry
        image: registry:2.5.1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: REGISTRY_HTTP_ADDR
          value: :5000
        - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
          value: /var/lib/registry
        volumeMounts:
        - name: image-store
          mountPath: /var/lib/registry
        ports:
        - containerPort: 5000
          name: registry
          protocol: TCP
      volumes:
      - name: image-store
        persistentVolumeClaim:
          claimName: kube-registry-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: kube-registry
  namespace: kube-system
  labels:
    k8s-app: kube-registry
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeRegistry"
spec:
  selector:
    k8s-app: kube-registry
  ports:
  - name: registry
    port: 5000
    protocol: TCP
I created the pod manifest file /etc/kubernetes/manifests/kube-registry-proxy.manifest before running Kargo:
apiVersion: v1
kind: Pod
metadata:
  name: kube-registry-proxy
  namespace: kube-system
spec:
  containers:
  - name: kube-registry-proxy
    image: gcr.io/google_containers/kube-registry-proxy:0.3
    resources:
      limits:
        cpu: 100m
        memory: 50Mi
    env:
    - name: REGISTRY_HOST
      value: kube-registry.kube-system.svc.cluster.local
    - name: REGISTRY_PORT
      value: "5000"
    - name: FORWARD_PORT
      value: "5000"
    ports:
    - name: registry
      containerPort: 5000
      hostPort: 5000
kube-registry-proxy is running on all nodes, but nothing is listening on port 5000. Some output:
ubuntu@k8s15m1:~$ kubectl get all --all-namespaces | grep registry-proxy
kube-system po/kube-registry-proxy-k8s15m1 1/1 Running 1 1h
kube-system po/kube-registry-proxy-k8s15m2 1/1 Running 0 1h
kube-system po/kube-registry-proxy-k8s15s1 1/1 Running 0 1h
ubuntu@k8s15m1:~$ docker ps | grep registry
756fcf674288 gcr.io/google_containers/kube-registry-proxy:0.3 "/usr/bin/run_proxy" 19 minutes ago Up 19 minutes k8s_kube-registry-proxy.bebf6da1_kube-registry-proxy-k8s15m1_kube-system_a818b22dc7210ecd31414e328ae28e43_7221833c
ubuntu@k8s15m1:~$ docker logs 756fcf674288 | tail
waiting for kube-registry.kube-system.svc.cluster.local to come online
starting proxy
ubuntu@k8s15m1:~$ netstat -ltnp | grep 5000
ubuntu@k8s15m1:~$ curl -v localhost:5000/v1/
* Trying 127.0.0.1...
* connect to 127.0.0.1 port 5000 failed: Connection refused
* Failed to connect to localhost port 5000: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 5000: Connection refused
ubuntu@k8s15m1:~$ kubectl get po kube-registry-proxy-k8s15m1 --namespace=kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-registry-proxy-k8s15m1 1/1 Running 3 1h 10.233.69.64 k8s15m1
ubuntu@k8s15m1:~$ curl -v 10.233.69.64:5000/v1/
* Trying 10.233.69.64...
* Connected to 10.233.69.64 (10.233.69.64) port 5000 (#0)
> GET /v1/ HTTP/1.1
> Host: 10.233.69.64:5000
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Tue, 14 Mar 2017 16:41:56 GMT
< Content-Length: 19
<
404 page not found
* Connection #0 to host 10.233.69.64 left intact

I think there are a couple of things going on here.
Foremost, be aware that Kubernetes Services come in 3 flavors: ClusterIP (which is the default), NodePort (which sounds very much like what you were expecting to happen), and LoadBalancer (which I won't mention further, but the docs do).
I would expect that if you update your Service to explicitly request type: NodePort, you'll get closer to what you had in mind (but be aware that, unless you've changed it, the NodePort range is limited to 30000-32767).
Thus:
apiVersion: v1
kind: Service
metadata:
  name: kube-registry
  namespace: kube-system
  labels:
    k8s-app: kube-registry
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeRegistry"
spec:
  type: NodePort          # <--- updated line
  selector:
    k8s-app: kube-registry
  ports:
  - name: registry
    port: 5000
    nodePort: 30500       # <-- you can specify this, or omit it
    protocol: TCP
If you have an opinion about the port you want the Service to listen on, feel free to specify it, or just leave it off and Kubernetes will pick one from the available space.
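If you leave it off, you can check afterwards which port Kubernetes allocated, for example:
kubectl -n kube-system get svc kube-registry
The PORT(S) column will show something like 5000:3XXXX/TCP, where the second number is the node port you can then use from any node.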
I'm going to mention this next thing for completeness, but what I'm about to say is a bad practice, so please don't do it. You can also have the Pods listen directly on the actual TCP/IP stack of the Node, by specifying hostPort; so in your case, it would be hostPort: 5000 right below containerPort: 5000, causing the Pod to behave like a normal docker -p 5000:5000 command would. But doing that makes scheduling Pods a nightmare, so please don't.
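For completeness only, that change would look something like the following in the registry container's ports section (just a sketch of what I described above; again, not recommended):
        ports:
        - containerPort: 5000
          hostPort: 5000
          name: registry
          protocol: TCP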
Secondly, about your 404 from curl:
I'm going to assume, based on the output of your curl command, that 10.233.69.x is your Pod network (the IP you hit is the proxy Pod's own address), which explains why port 5000 responded with anything. The request was in the right spirit, but /v1/ was an incorrect URI to attempt. The Docker Registry API docs contain a section about checking that it is a V2 API instance. My favorite curl of a registry is https://registry.example.com/v2/_catalog because it returns the name of every image therein, confirming that my credentials are correct, that the registry server is operating correctly, and so on.
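For example, against the Pod IP from your output (the same checks work against the Service once it is reachable):
curl -v http://10.233.69.64:5000/v2/
curl -s http://10.233.69.64:5000/v2/_catalog
A healthy registry:2 answers the first with HTTP 200 (and an empty JSON body) and the second with a JSON list of the repositories it holds.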
I know it's a lot to take in, so if you feel I glossed over something, let me know and I'll try to address it. Good luck!

Related

wget : can't connect to remote host when connecting to kubernetes service in separate namespace

I have set up my API for Prometheus monitoring using client instrumentation, but when I check the endpoint in the Prometheus UI it is in DOWN status, as below:
I checked for connectivity issues by exec'ing into the Prometheus pod and trying a wget. When I do so I get:
/prometheus $ wget metrics-clusterip.labs.svc:9216
Connecting to metrics-clusterip.labs.svc:9216 (10.XXX.XX.XXX:9216)
wget: can't connect to remote host (10.XXX.XX.XX): Connection refused
I have an existing MongoDB instance with a Prometheus exporter sidecar and it's working perfectly (i.e. its endpoint is UP in the Prometheus UI). As a check I tried connecting to this MongoDB instance's service, and it turns out the Prometheus pod can indeed connect (as expected):
wget mongodb-metrics.labs.svc:9216
Connecting to mongodb-metrics.labs.svc:9216 (10.XXX.XX.X:9216)
wget: can't open 'index.html': File exists
This is what I have also tried:
I tried both nslookup metrics-clusterip.svc.labs and nslookup metrics-clusterip.svc.labs:9216 from the Prometheus pod, but I got the same error:
nslookup metrics-clusterip.svc.labs:9216
Server: 169.XXX.XX.XX
Address: 169.XXX.XX.XX:XX
** server can't find metrics-clusterip.svc.labs:9216: NXDOMAIN
*** Can't find metrics-clusterip.svc.labs:9216: No answer
However, when I port-forward the service I can successfully query the metrics endpoint, which shows that the metrics endpoint is up in the API container:
kubectl port-forward svc/metrics-clusterip 9216
NB: Both the API routes and the metrics endpoint use the same port (9216).
Check if the DNS pod is running:
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-XXXXXXXXX-XXXXX 1/1 Running 0 294d
coredns-XXXXXXXXX-XXXXX 1/1 Running 0 143d
This is the configuration for my deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reg
  labels:
    app: reg
  namespace: labs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reg
      release: loki
  template:
    metadata:
      labels:
        app: reg
        release: loki
    spec:
      containers:
      - name: reg
        image: xxxxxx/sre-ops:dev-latest
        imagePullPolicy: Always
        ports:
        - name: reg
          containerPort: 9216
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 128Mi
      nodeSelector:
        kubernetes.io/hostname: xxxxxxxxxxxx
      imagePullSecrets:
      - name: xxxx
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-clusterip
  namespace: labs
  labels:
    app: reg
    release: loki
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: '9216'
    prometheus.io/scrape: "true"
spec:
  type: ClusterIP
  selector:
    app: reg
    release: loki
  ports:
  - port: 9216
    targetPort: reg
    protocol: TCP
    name: reg
And the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: reg
  namespace: loki
  labels:
    app: reg
    release: loki
spec:
  selector:
    matchLabels:
      app: reg
      release: loki
  endpoints:
  - port: reg
    path: /metrics
    interval: 15s
  namespaceSelector:
    matchNames:
    - "labs"
I have also tried to pipe the DNS pod logs to a file, but I am not sure what I should be looking for to get more detail:
kubectl logs --namespace=kube-system coredns-XXXXXX-XXXX >> logs.txt
What am I missing?

Why does my NodePort service change its port number?

I am trying to install Velero for k8s. During the installation, when trying to install MinIO, I changed its service type from ClusterIP to NodePort. My pods run successfully and I can also see that the NodePort service is up and running.
master-k8s@masterk8s-virtual-machine:~/velero-v1.9.5-linux-amd64$ kubectl get pods -n velero -owide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE                       NOMINATED NODE   READINESS GATES
minio-8649b94fb5-vk7gv   1/1     Running   0          16m   10.244.1.102   node1k8s-virtual-machine   <none>           <none>
master-k8s@masterk8s-virtual-machine:~/velero-v1.9.5-linux-amd64$ kubectl get svc -n velero
NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
minio   NodePort   10.111.72.207   <none>        9000:31481/TCP   53m
When I try to access my service, the port number changes from 31481 to 45717 by itself. Every time I correct the port number and hit enter, it changes back to a new port and I am not able to access my application.
This is my MinIO service file:
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  type: NodePort
  ports:
  - port: 9000
    targetPort: 9000
    protocol: TCP
  selector:
    component: minio
What have I done so far?
I looked at the logs and everything shows success, no errors. I also tried it with a LoadBalancer service. With the LoadBalancer the port does not change, but I am still not able to access the application.
I found nothing on Google about this issue.
I also checked pods and services in all namespaces to see if these port numbers are already in use. No services use these ports.
What do I want?
Can you please help me find out what causes my application to change its port? Where is the issue and how do I fix it? How can I access the application dashboard?
Update Question
This is the full file; it may help to find my mistake.
apiVersion: v1
kind: Namespace
metadata:
  name: velero
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      component: minio
  template:
    metadata:
      labels:
        component: minio
    spec:
      volumes:
      - name: storage
        emptyDir: {}
      - name: config
        emptyDir: {}
      containers:
      - name: minio
        image: minio/minio:latest
        imagePullPolicy: IfNotPresent
        args:
        - server
        - /storage
        - --config-dir=/config
        env:
        - name: MINIO_ACCESS_KEY
          value: "minio"
        - name: MINIO_SECRET_KEY
          value: "minio123"
        ports:
        - containerPort: 9002
        volumeMounts:
        - name: storage
          mountPath: "/storage"
        - name: config
          mountPath: "/config"
---
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  # ClusterIP is recommended for production environments.
  # Change to NodePort if needed per documentation,
  # but only if you run Minio in a test/trial environment, for example with Minikube.
  type: NodePort
  ports:
  - port: 9002
    nodePort: 31482
    targetPort: 9002
    protocol: TCP
  selector:
    component: minio
---
apiVersion: batch/v1
kind: Job
metadata:
  namespace: velero
  name: minio-setup
  labels:
    component: minio
spec:
  template:
    metadata:
      name: minio-setup
    spec:
      restartPolicy: OnFailure
      volumes:
      - name: config
        emptyDir: {}
      containers:
      - name: mc
        image: minio/mc:latest
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - "mc --config-dir=/config config host add velero http://minio:9000 minio minio123 && mc --config-dir=/config mb -p velero/velero"
        volumeMounts:
        - name: config
          mountPath: "/config"
Edit 2: Logs of the pod
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-01-25T00-19-54Z (go1.19.4 linux/amd64)
Status: 1 Online, 0 Offline.
API: http://10.244.1.108:9000 http://127.0.0.1:9000
Console: http://10.244.1.108:33045 http://127.0.0.1:33045
Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.
Edit 3: Logs of the pod
master-k8s@masterk8s-virtual-machine:~/velero-1.9.5$ kubectl logs minio-8649b94fb5-qvzfh -n velero
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-01-25T00-19-54Z (go1.19.4 linux/amd64)
Status: 1 Online, 0 Offline.
API: http://10.244.2.131:9000 http://127.0.0.1:9000
Console: http://10.244.2.131:36649 http://127.0.0.1:36649
Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.
You can set the nodePort number inside the port config so that it isn't assigned automatically.
Try this Service:
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  type: NodePort
  ports:
  - port: 9000
    nodePort: 31481
    targetPort: 9000
    protocol: TCP
  selector:
    component: minio
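After applying it, you can verify that the pinned port sticks, for example:
kubectl get svc minio -n velero
The PORT(S) column should then consistently show 9000:31481/TCP.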

Kubernetes : RabbitMQ pod is spammed with connections from kube-system

I'm currently learning Kubernetes and all its quirks.
I'm currently using a RabbitMQ Deployment, Service and pod in my cluster to exchange messages between apps in the cluster. However, I saw an abnormal number of restarts of the RabbitMQ pod.
After installing Prometheus and Grafana to investigate the problem, I saw that the RabbitMQ pod would consume more and more memory and CPU until it got killed by the OOM killer every two hours or so. The graph looks like this:
Graph of CPU consumption in my cluster (rabbitmq in red)
After that I looked into the RabbitMQ pod's UI and saw that an app in my cluster (IP 10.224.0.5) was constantly creating new connections. This IP corresponds to my kube-system pods and my Prometheus instance, as shown by the following output:
k get all -A -o wide | grep 10.224.0.5
E1223 12:13:48.231908 23198 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
E1223 12:13:48.311831 23198 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
kube-system pod/azure-ip-masq-agent-xh9jk 1/1 Running 0 25d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/cloud-node-manager-h5ff5 1/1 Running 0 25d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/csi-azuredisk-node-sf8sn 3/3 Running 0 3d15h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/csi-azurefile-node-97nbt 3/3 Running 0 19d 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
kube-system pod/kube-proxy-2s5tn 1/1 Running 0 3d15h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
monitoring pod/prometheus-prometheus-node-exporter-dztwx 1/1 Running 0 20h 10.224.0.5 aks-agentpool-37892177-vmss000001 <none> <none>
Also, I noticed that these connections seem to be blocked by RabbitMQ, as the field connection.blocked in the client properties is set to true, as shown in the following image:
Screenshot of a connection's details from the RabbitMQ pod's UI
I saw in the documentation that RabbitMQ starts to block connections when it runs low on resources, but I set the CPU and memory limits to 1 CPU and 1 GiB of RAM, and the connections are blocked from the start anyway.
On the cluster, I'm also using Keda, which uses the RabbitMQ pod and polls it every second to see if there are any messages in a queue (I set pollingInterval to 1 in the YAML). But as I said earlier, it's not Keda that's creating all the connections, it's kube-system. Unless Keda uses one of the components listed in the output above to poll RabbitMQ, and Keda's polling interval does not correspond to seconds (which is highly unlikely, as the docs state this polling interval is given in seconds), I don't know at all what's going on with all these connections.
The following section contains the YAML of all the components that might be involved in this problem (Keda and RabbitMQ):
rabbitMQ Replica Count.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    component: rabbitmq
  name: rabbitmq-controller
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: taskQueue
        component: rabbitmq
    spec:
      containers:
      - image: rabbitmq:3.11.5-management
        name: rabbitmq
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: http
        resources:
          limits:
            cpu: 1
            memory: 1Gi
rabbitMQ Service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: rabbitmq
  name: rabbitmq-service
spec:
  type: LoadBalancer
  ports:
  - port: 5672
    targetPort: 5672
    name: amqp
  - port: 15672
    targetPort: 15672
    name: http
  selector:
    app: taskQueue
    component: rabbitmq
Keda ScaledJob, Secret and TriggerAuthentication (sample data is just a placeholder for fields that I do not want revealed):
apiVersion: v1
kind: Secret
metadata:
  name: keda-rabbitmq-secret
data:
  host: sample-host # base64 encoded value of format amqp://guest:password@localhost:5672/vhost
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-rabbitmq-conn
  namespace: default
spec:
  secretTargetRef:
  - parameter: host
    name: keda-rabbitmq-secret
    key: host
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: builder-job-scaler
  namespace: default
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 600
    backoffLimit: 5
    template:
      spec:
        volumes:
        - name: shared-storage
          emptyDir: {}
        initContainers:
        - name: sourcesfetcher
          image: sample image
          volumeMounts:
          - name: shared-storage
            mountPath: /mnt/shared
          env:
          - name: SHARED_STORAGE_MOUNT_POINT
            value: /mnt/shared
          - name: RABBITMQ_ENDPOINT
            value: sample host
          - name: RABBITMQ_QUEUE_NAME
            value: buildOrders
        containers:
        - name: builder
          image: sample image
          volumeMounts:
          - name: shared-storage
            mountPath: /mnt/shared
          env:
          - name: SHARED_STORAGE_MOUNT_POINT
            value: /mnt/shared
          - name: MINIO_ENDPOINT
            value: sample endpoint
          - name: MINIO_PORT
            value: sample port
          - name: MINIO_USESSL
            value: "false"
          - name: MINIO_ROOT_USER
            value: sample user
          - name: MINIO_ROOT_PASSWORD
            value: sample password
          - name: BUCKET_NAME
            value: "hex"
          - name: SERVER_NAME
            value: sample url
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 500m
              memory: 512Mi
        restartPolicy: OnFailure
  pollingInterval: 1
  maxReplicaCount: 2
  minReplicaCount: 0
  rollout:
    strategy: gradual
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: buildOrders
      mode: QueueLength
      value: "1"
    authenticationRef:
      name: keda-trigger-auth-rabbitmq-conn
Any help would be very much appreciated!

Accessing an external InfluxDb Database from a microk8s pod using selectorless service and manual endpoint?

Gist: I am struggling to get a pod to connect to a service outside the cluster.
Basically, the pod manages to resolve the ClusterIP of the selectorless service, but traffic does not go through. Traffic does go through if I hit the ClusterIP of the selectorless service from the cluster host.
I'm fairly new to microk8s and k8s in general. I hope I am making some sense though...
Background:
I am attempting to move parts of my infrastructure from a docker-compose setup on one virtual machine, to a microk8s cluster (with 2 nodes).
In the docker-compose setup, I have a Grafana container connecting to an InfluxDB container.
kubectl version:
Client Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.2-3+9ad9ee77396805", GitCommit:"9ad9ee77396805781cd0ae076d638b9da93477fd", GitTreeState:"clean", BuildDate:"2021-09-30T09:52:57Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
I now want to set up a Grafana container on the microk8s cluster and have it connect to the InfluxDB that is still running on the docker-compose VM.
All of these VM's are running on an ESXi host.
InfluxDb is exposed at 10.1.2.220:8086
microk8s-master has ip 10.1.2.50
microk8s-slave-1 has ip 10.1.2.51
I have enabled ingress and dns. I have also enabled metallb, though I don't intend to use it here.
I have configured a selectorless service, a remote endpoint and an egress Network Policy (currently allowing all).
From microk8s-master and slave-1, I can:
telnet directly to 10.1.2.220:8086 successfully
telnet to the ClusterIP (10.152.183.26):8086 of the service, successfully reaching InfluxDB
wget ClusterIP:8086
Inside the pod, if I do a wget to influxdb-service:8086, it resolves to the ClusterIP, but after that it times out.
I can, however, reach (wget) services pointing to other pods in the same namespace.
Update:
I have been able to get it to work through a workaround, but I don't think this is the correct way.
My temporary solution is to expose the selectorless service on MetalLB, then use that exposed IP inside the pod.
Service and Endpoints for InfluxDB:
---
apiVersion: v1
kind: Service
metadata:
  name: influxdb-service
  labels:
    app: grafana
spec:
  ports:
  - protocol: TCP
    port: 8086
    targetPort: 8086
---
apiVersion: v1
kind: Endpoints
metadata:
  name: influxdb-service
subsets:
- addresses:
  - ip: 10.1.2.220
  ports:
  - port: 8086
The service and endpoint show up fine:
eso@microk8s-master:~/k8s-grafana$ microk8s.kubectl get endpoints
NAME ENDPOINTS AGE
neo4j-service-lb 10.1.166.176:7687,10.1.166.176:7474 25h
influxdb-service 10.1.2.220:8086 127m
questrest-service 10.1.166.178:80 5d
kubernetes 10.1.2.50:16443,10.1.2.51:16443 26d
grafana-service 10.1.237.120:3000 3h11m
eso@microk8s-master:~/k8s-grafana$ microk8s.kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 26d
questrest-service ClusterIP 10.152.183.56 <none> 80/TCP 5d
neo4j-service-lb LoadBalancer 10.152.183.166 10.1.2.60 7474:31974/TCP,7687:32688/TCP 25h
grafana-service ClusterIP 10.152.183.75 <none> 3000/TCP 3h13m
influxdb-service ClusterIP 10.152.183.26 <none> 8086/TCP 129m
eso@microk8s-master:~/k8s-grafana$ microk8s.kubectl get networkpolicy
NAME POD-SELECTOR AGE
grafana-allow-egress-influxdb app=grafana 129m
test-egress-influxdb app=questrest 128m
Describe:
eso@microk8s-master:~/k8s-grafana$ microk8s.kubectl describe svc influxdb-service
Name: influxdb-service
Namespace: default
Labels: app=grafana
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.152.183.26
IPs: 10.152.183.26
Port: <unset> 8086/TCP
TargetPort: 8086/TCP
Endpoints: 10.1.2.220:8086
Session Affinity: None
Events: <none>
eso@microk8s-master:~/k8s-grafana$ microk8s.kubectl describe endpoints influxdb-service
Name: influxdb-service
Namespace: default
Labels: <none>
Annotations: <none>
Subsets:
Addresses: 10.1.2.220
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
<unset> 8086 TCP
Events: <none>
eso@microk8s-master:~/k8s-grafana$ microk8s.kubectl describe networkpolicy grafana-allow-egress-influxdb
Name: grafana-allow-egress-influxdb
Namespace: default
Created on: 2021-11-03 20:53:00 +0000 UTC
Labels: <none>
Annotations: <none>
Spec:
PodSelector: app=grafana
Not affecting ingress traffic
Allowing egress traffic:
To Port: <any> (traffic allowed to all ports)
To: <any> (traffic not restricted by destination)
Policy Types: Egress
Grafana.yml:
eso@microk8s-master:~/k8s-grafana$ cat grafana.yml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  claimRef:
    name: grafana-pvc
    namespace: default
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /mnt/MainVol/grafana
    server: 10.2.0.1
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  volumeName: grafana-pv
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
        supplementalGroups:
        - 0
      containers:
      - name: grafana
        image: grafana/grafana:7.5.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: http-grafana
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 2
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 3000
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 250m
            memory: 750Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-pv
      volumes:
      - name: grafana-pv
        persistentVolumeClaim:
          claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: http-grafana
  selector:
    app: grafana
  #sessionAffinity: None
  #type: LoadBalancer
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: "g2.some.domain.com"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana-service
            port:
              number: 3000
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: grafana-allow-egress-influxdb
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: grafana
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Egress
As I haven't gotten much response, I'll answer the question with my "workaround". I am still not sure this is the best way to do it, though.
I got it to work by exposing the selectorless service on MetalLB, then using that exposed IP inside Grafana:
kind: Service
apiVersion: v1
metadata:
  name: influxdb-service-lb
  #namespace: ingress
spec:
  type: LoadBalancer
  loadBalancerIP: 10.1.2.61
  # selector:
  #   app: grafana
  ports:
  - name: http
    protocol: TCP
    port: 8086
    targetPort: 8086
---
apiVersion: v1
kind: Endpoints
metadata:
  name: influxdb-service-lb
subsets:
- addresses:
  - ip: 10.1.2.220
  ports:
  - name: influx
    protocol: TCP
    port: 8086
I then use the load balancer IP (10.1.2.61) in Grafana.
Update October 2022
In response to a comment above, I have added a diagram of how I believe this works.

Two kubernetes deployments in the same namespace are not able to communicate

I'm deploying the ELK stack (OSS) to a Kubernetes cluster. The Elasticsearch deployment and service start correctly and the API is reachable. The Kibana deployment starts but can't access Elasticsearch:
From Kibana container logs:
{"type":"log","#timestamp":"2019-05-08T22:49:26Z","tags":["error","elasticsearch","admin"],"pid":1,"message":"Request error, retrying\nHEAD http://elasticsearch:9200/ => getaddrinfo ENOTFOUND elasticsearch elasticsearch:9200"}
{"type":"log","#timestamp":"2019-05-08T22:50:44Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch:9200/"}
{"type":"log","#timestamp":"2019-05-08T22:50:44Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
Both deployments are in the same namespace, "observability". I also tried to reference the elasticsearch container as elasticsearch.observability.svc.cluster.local, but that's not working either.
What am I doing wrong? How do I reference the elasticsearch container from the kibana container?
More info:
kubectl --context=19team-observability-admin-context -n observability get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-9d495b84f-j2297 1/1 Running 0 15s
kibana-65bc7f9c4-s9cv4 1/1 Running 0 15s
kubectl --context=19team-observability-admin-context -n observability get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch NodePort 10.104.250.175 <none> 9200:30083/TCP,9300:30059/TCP 1m
kibana NodePort 10.102.124.171 <none> 5601:30124/TCP 1m
I start my containers with command
kubectl --context=19team-observability-admin-context -n observability apply -f .\elasticsearch.yaml -f .\kibana.yaml
elasticsearch.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: observability
spec:
  type: NodePort
  ports:
  - name: "9200"
    port: 9200
    targetPort: 9200
  - name: "9300"
    port: 9300
    targetPort: 9300
  selector:
    app: elasticsearch
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: elasticsearch
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      initContainers:
      - name: set-vm-max-map-count
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ['sysctl', '-w', 'vm.max_map_count=262144']
        securityContext:
          privileged: true
        resources:
          requests:
            memory: "512Mi"
            cpu: "1"
          limits:
            memory: "724Mi"
            cpu: "1"
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.7.1
        ports:
        - containerPort: 9200
        - containerPort: 9300
        resources:
          requests:
            memory: "3Gi"
            cpu: "1"
          limits:
            memory: "3Gi"
            cpu: "1"
kibana.yaml
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: observability
spec:
  type: NodePort
  ports:
  - name: "5601"
    port: 5601
    targetPort: 5601
  selector:
    app: observability_platform_kibana
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: observability_platform_kibana
  name: kibana
  namespace: observability
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: observability_platform_kibana
    spec:
      containers:
      - env:
        # THIS IS WHERE WE SET CONNECTION BETWEEN KIBANA AND ELASTIC
        - name: ELASTICSEARCH_HOSTS
          value: http://elasticsearch:9200
        - name: SERVER_NAME
          value: kibana
        image: docker.elastic.co/kibana/kibana-oss:6.7.1
        name: kibana
        ports:
        - containerPort: 5601
        resources:
          requests:
            memory: "512Mi"
            cpu: "1"
          limits:
            memory: "724Mi"
            cpu: "1"
      restartPolicy: Always
UPDATE 1
As gonzalesraul proposed, I've created a second service for Elasticsearch with the ClusterIP type:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch
  name: elasticsearch-local
  namespace: observability
spec:
  type: ClusterIP
  ports:
  - port: 9200
    protocol: TCP
    targetPort: 9200
  selector:
    app: elasticsearch
Service is created:
kubectl --context=19team-observability-admin-context -n observability get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch NodePort 10.106.5.94 <none> 9200:31598/TCP,9300:32018/TCP 26s
elasticsearch-local ClusterIP 10.101.178.13 <none> 9200/TCP 26s
kibana NodePort 10.99.73.118 <none> 5601:30004/TCP 26s
And I reference Elasticsearch as "http://elasticsearch-local:9200".
Unfortunately it does not work; in the kibana container:
{"type":"log","#timestamp":"2019-05-09T10:13:54Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch-local:9200/"}
Do not use a NodePort service; instead use a ClusterIP. If you need to expose your service as a NodePort, create a second service alongside it, for instance:
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch
  name: elasticsearch-local
  namespace: observability
spec:
  type: ClusterIP
  ports:
  - port: 9200
    protocol: TCP
    targetPort: 9200
  selector:
    app: elasticsearch
Then update the kibana manifest to point to the ClusterIP service:
# ...
# THIS IS WHERE WE SET CONNECTION BETWEEN KIBANA AND ELASTIC
- name: ELASTICSEARCH_HOSTS
  value: http://elasticsearch-local:9200
# ...
NodePort services do not create a 'DNS entry' (e.g. elasticsearch.observability.svc.cluster.local) on Kubernetes.
Edit the SERVER_NAME value in kibana.yaml and set it to kibana:5601.
I think if you don't do this, by default it tries to go to port 80.
This is what kibana.yaml looks like now:
...
    spec:
      containers:
      - env:
        - name: ELASTICSEARCH_HOSTS
          value: http://elasticsearch:9200
        - name: SERVER_NAME
          value: kibana:5601
        image: docker.elastic.co/kibana/kibana-oss:6.7.1
        imagePullPolicy: IfNotPresent
        name: kibana
...
And this is the output now:
{"type":"log","#timestamp":"2019-05-09T10:37:16Z","tags":["status","plugin:console#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-05-09T10:37:16Z","tags":["status","plugin:interpreter#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-05-09T10:37:16Z","tags":["status","plugin:metrics#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-05-09T10:37:16Z","tags":["status","plugin:tile_map#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-05-09T10:37:16Z","tags":["status","plugin:timelion#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-05-09T10:37:16Z","tags":["status","plugin:elasticsearch#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","#timestamp":"2019-05-09T10:37:17Z","tags":["listening","info"],"pid":1,"message":"Server running at http://0:5601"}
UPDATE
I just tested it on a bare-metal cluster (bootstrapped through kubeadm), and it worked again.
This is the output:
{"type":"log","#timestamp":"2019-05-09T11:09:59Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
{"type":"log","#timestamp":"2019-05-09T11:10:01Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"Unable to revive connection: http://elasticsearch:9200/"}
{"type":"log","#timestamp":"2019-05-09T11:10:01Z","tags":["warning","elasticsearch","admin"],"pid":1,"message":"No living connections"}
{"type":"log","#timestamp":"2019-05-09T11:10:04Z","tags":["status","plugin:elasticsearch#6.7.1","info"],"pid":1,"state":"green","message":"Status changed from red to green - Ready","prevState":"red","prevMsg":"Unable to connect to Elasticsearch."}
{"type":"log","#timestamp":"2019-05-09T11:10:04Z","tags":["info","migrations"],"pid":1,"message":"Creating index .kibana_1."}
{"type":"log","#timestamp":"2019-05-09T11:10:06Z","tags":["info","migrations"],"pid":1,"message":"Pointing alias .kibana to .kibana_1."}
{"type":"log","#timestamp":"2019-05-09T11:10:06Z","tags":["info","migrations"],"pid":1,"message":"Finished in 2417ms."}
{"type":"log","#timestamp":"2019-05-09T11:10:06Z","tags":["listening","info"],"pid":1,"message":"Server running at http://0:5601"}
Note that it went from "No living connections" to "Server running". I am running the nodes on GCP, and I had to open the firewalls for it to work. What's your environment?
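If you are also on GCE, opening internal traffic between the nodes looks roughly like this (just a sketch; the rule name and source ranges here are assumptions, adjust them to your VPC and pod CIDR):
gcloud compute firewall-rules create k8s-allow-internal \
  --network=default \
  --allow=tcp,udp,icmp \
  --source-ranges=10.128.0.0/9,10.244.0.0/16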