Strimzi kafka accessing it privately with in GKE - apache-kafka

I have two clusters,
one cluster with my application Microservices and other with the strimzi kafka installed. Both are the private GKE clusters .
My challenge exactly is to how to connect to this kafka from my application. There are around 10 Microservices running each has to connect to the kafka.
I have an approach for now by making the Strimzi kafka as a Nodeport service and providing the Ip and nodeIp in the application code.
The problem with this approach is that if the GKE nodes get auto updated I will have to reconfigure the code.
Also one more critical condition is that , The Kafka should be only accessed by our application ., It shouldn't be available for the public Internet.

I had a similar situation and how I solved it.
Create an internal loadbalancer to be accessible from another AKS cluster. Also allowing access to specific subnet as well.
Strimzi supports externalizing the bootstrap service.
example:
...
spec:
kafka:
replicas: 3
listeners:
plain: {}
tls: {}
external:
type: loadbalancer
tls: false
...
template:
externalBootstrapService:
metadata:
annotations:
service.beta.kubernetes.io/azure-load-balancer-internal: "true"
service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "apps-subnet"
perPodService:
metadata:
annotations:
service.beta.kubernetes.io/azure-load-balancer-internal: "true"
service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "apps-subnet"
for more details you can check internal load balancers section in this link:
https://strimzi.io/blog/2019/05/13/accessing-kafka-part-4/

Related

What is the purpose of headless service in Kubernetes?

I am naive in Kubernetes world. I was going through a interesting concept called headless service.
I have read it, understand it, and I can create headless service. But I am still not convinced about use cases. Like why do we need it. There are already three types of service clusterIP, NodePort and loadbalancer service with their separate use cases.
Could you please tell me what is exactly which headless service solve and all those other three services could not solve it.
I have read it that headless is mainly used with the application which is stateful like dB based pod for example cassandra, MongoDB etc. But my question is why?
A headless service doesn't provide any sort of proxy or load balancing -- it simply provides a mechanism by which clients can look up the ip address of pods. This means that when they connect to your service, they're connecting directly to the pods; there's no intervening proxy.
Consider a situation in which you have a service that matches three pods; e.g., I have this Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: example
name: example
spec:
replicas: 3
selector:
matchLabels:
app: example
template:
metadata:
labels:
app: example
spec:
containers:
- image: docker.io/traefik/whoami:latest
name: whoami
ports:
- containerPort: 80
name: http
If I'm using a typical ClusterIP type service, like this:
apiVersion: v1
kind: Service
metadata:
labels:
app: example
name: example
spec:
ports:
- name: http
port: 80
targetPort: http
selector:
app: example
Then when I look up the service in a client pod, I will see the ip address of the service proxy:
/ # host example
example.default.svc.cluster.local has address 10.96.114.63
However, when using a headless service, like this:
apiVersion: v1
kind: Service
metadata:
labels:
app: example
name: example-headless
spec:
clusterIP: None
ports:
- name: http
port: 80
targetPort: http
selector:
app: example
I will instead see the addresses of the pods:
/ # host example-headless
example-headless.default.svc.cluster.local has address 10.244.0.25
example-headless.default.svc.cluster.local has address 10.244.0.24
example-headless.default.svc.cluster.local has address 10.244.0.23
By removing the proxy from the equation, clients are aware of the actual pod ips, which may be important for some applications. This also simplifies the path between clients and the service, which may have performance benefits.
Kubernetes Services of type ClusterIP, NodePort and LoadBalancer have one thing in common: They loadbalance between all pods that match the service's selector, so you can talk to all pods via a virtual ip.
That's nice because you can have multiple pods of the same application and all requests will be spread between those, avoiding overloading one pod while others are still idle.
For some applications you might require to talk to the pods directly instead of over an virtual ip. Still you'll want a stable hostname that points to the same pod, even if the pod's ip changes, e.g. because it needs to be rescheduled on a different host.
Use cases are mostly databases where applications should always connect to the same instance for data and session consistency.
A headless service does not provide a virtual IP covering all the endpoints in its endpoint slice. If you query that service DNS, you will get all the endpoint IP addresses back. A regular service on the other hand will have sort of a virtual IP, so clients connecting to it are not aware of the underlying endpoints.
To answer your question, why this is important for a stateful. Usually, members of a statefulset need to be aware of each other. Let's say you want to run a distributed database in something like a cluster formation. The members of this cluster need to have a way to discover each other. This is where the headless service comes into play. Because the individual database pods can get the address of the other database pods by using the headless service.
The headless service also provides a stable network identify for each member of the statefulset. You can read about that here. This is, again, useful as the members of the statefulset need to coordinate with each other in some way. Let's say the pod-0 will always take the initial leader role, and every other member knows it should report itself to pod-0 to form a cluster like configuration.

AKS Kubernetes questions

Can someone please explain how POD to POD works in AKS?
from the docs, I can see it uses kube proxy component to send the traffic to the desired POD.
But I have been told that I must use clusterIP service and bind all the relevant POD's together.
So what is real flow? Or I missed something. below a couple of questions to be more clear.
Questions:
how POD to POD inside one node can talk to each other? what is the flow?
how POD to POD inside a cluster (different nodes) can talk to each other? what is the flow?
if it's possible it will be highly appreciated if you can describe the flows for #1 and #2 in the deployment of kubenet and CNI.
Thanks a lot!
for pod to pod communication we use services. so first we need to understand,
why we need service: what actually do service for us that, they resolve the dns name and give us the the exact ip that we need to connect a specific pod. now as you want to communicate with pod to pod you need to create a ClusterIP service.
ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType. with ClusterIP service you can't access a pod from outside the cluster for this reason we use clusterip service if we want the communication between pod to pod only.
kube-proxy is the network proxy that runs on each node in your cluster.
it maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
every service maintain iptables.And kube-proxy handled these ip tables for every service. so yes, kube-proxy is the most vital point for network setup in our k8s cluster.
how the network policy works in kubernetes:
all Pods can communicate with all other Pods without using network address translation (NAT).
all Nodes can communicate with all Pods without NAT.
the IP that a Pod sees itself as is the same IP that others see it as.
with those point:
Container-to-Container networking
Pod-to-Pod networking
Pod-to-Service networking
Internet-to-Service networking
It handles transmission of packets between pod to pods, and also with the outside world. It acts like a network proxy and load balancer for pods running on the node by implementing load-balancing using NAT in iptables.
The kube-proxy process stands in between the Kubernetes network and the pods that are running on that particular node. It is responsible for ensuring that communication is maintained efficiently across all elements of the cluster. When a user creates a Kubernetes service object, the kube-proxy instance is responsible to translate that object into meaningful rules in the local iptables rule set on the worker node. iptables is used to translate the virtual IP assigned to the service object to all of the pod IPs mapped by the service.
i hope it's clear your idea about kube proxy.
lets see a example how it's works.
here i used headless service so that i can connect a specific pod.
---
apiVersion: v1
kind: Service
metadata:
name: my-service
namespace: default
spec:
clusterIP: None
selector:
app: my-test
ports:
- port: 80
name: rest
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: my-sts
spec:
serviceName: my-service
replicas: 3
selector:
matchLabels:
app: my-test
template:
metadata:
labels:
app: my-test
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
name: web
---
this will create 3 pods. as : my-sts-0, my-sts-1, my-sts-2. now if we want to connect to the pod my-sts-0 just use this dns name my-sts-0.my-service.default.svc:80 . and the service will resolve the dns name and will provide the exact podip of my-sts-0. now if you need to comminucate from my-sts-1 to my-sts-0, you can just use this dns name.
The template is like my_pod_name.my_Service_Name.my_Namespace.svc.cluster-domain.example , but you can skip the cluster-domain.example part. Only Service_Name.Namespace.svc will work fine.
ref

Kubernetes: Create an Ingress for Kafka

I have Confluent Kafka 5.3.1 pods running with Zookeeper with SSL on AWS.
How can I make the Ingress work? I want to give access to an external user, to my Kafka topic. This should work as a Layer-4 ingress as it is on port 9092~3.
I am trying to find some documentation online.
Thank you
Ingress is only an L7 resource. So you can make use of a plain LoadBalancer Service. Since you are running on AWS, you can either use an ELB or an NLB. Make sure you use the right annotation on your Service, for example for an NLB:
metadata:
name: my-service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"

Exposing Kafka cluster in Kubernetes using LoadBalancer service

Suppose if I have 3 node Kafka cluster setup. Then how do I expose it outside a cloud using Load Balancer service? I have read reference material but have a few doubts.
Say for example below is a service for a broker
apiVersion: v1
kind: Service metadata:
name: kafka-0
annotations: dns.alpha.kubernetes.io/external: kafka-0.kafka.my.company.com
spec:
externalTrafficPolicy: Local
type: LoadBalancer
ports:
- port: 9092
name: outside
targetPort: 9092
selector: app: kafka kafka-pod-id: "0"
What is port and targetPort?
Do I setup LoadBalancer service for each of the brokers?
Do these multiple brokers get mapped to single public IP address of cloud LB?
How does a service outside k8s/cloud access individual broker? By using public-ip:port? or by using kafka-<pod-id>.kafka.my.company.com:port?. Also which port is used here? port or targetPort?
How do I specify this configuration in Kafka broker's Advertised.listeners property? As port can be different for services inside k8s cluster and outside it.
Please help.
Based on the information you provided I will try give you some answers, eventually give some advise.
1) port: is the port number which makes a service visible to other services running within the same K8s cluster. In other words, in case a service wants to invoke another service running within the same Kubernetes cluster, it will be able to do so using port specified against port in the service spec file.
targetPort: is the port on the POD where the service is running. Your application needs to be listening for network requests on this port for the service to work.
2/3) Each Broker should be exposed as LoadBalancer and be configured as headless service for internal communication. There should be one addiational LoadBalancer with external ip for external connection.
Example of Service
apiVersion: v1
kind: Service
metadata:
name: kafka-0
annotations: dns.alpha.kubernetes.io/external: kafka-0.kafka.my.company.com
spec:
ports:
- port: 9092
name: kafka-port
protocol: TCP
selector:
pod-name: kafka-0
type: LoadBalancer
4) You have to use kafka-<pod-id>.kafka.my.company.com:port
5) It should be set to the external addres so that clients can connect to it. This article might help with understanding.
Similar case was on Github, it might help you also - https://github.com/kow3ns/kubernetes-kafka/issues/3
In addition, You could also think about Ingress - https://tothepoint.group/blog/accessing-kafka-on-google-kubernetes-engine-from-the-outside-world/

Understanding subnetting in Kubernetes cluster

When using GKE, I found that a all the nodes in the Kubernetes cluster must be in the same network and the same subnet. So, I wanted to understand the correct way to design networking.
I have two services A and B and they have no relation between them. My plan was to use a single cluster in a single region and have two nodes for each of the services A and B in different subnets in the same network.
However, it seems like that can't be done. The other way to partition a cluster is using namespaces, however I am already using partitioning development environment using namespaces.
I read about cluster federation https://kubernetes.io/docs/concepts/cluster-administration/federation/, however it my services are small and I don't need them in multiple clusters and in sync.
What is the correct way to setup netowrking for these services? Should I just use the same network and subnet for all the 4 nodes to serve the two services A and B?
You can restrict the incoming (or outgoing) traffic making use of labels and networking policies.
In this way the pods would be able to receive the traffic merely if it has been generated by a pod belonging to the same application or with any logic you want to implement.
You can follow this step to step tutorial that guides you thorough the implementation of a POC.
kubectl run hello-web --labels app=hello \
--image=gcr.io/google-samples/hello-app:1.0 --port 8080 --expose
Example of Network policy
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: hello-allow-from-foo
spec:
policyTypes:
- Ingress
podSelector:
matchLabels:
app: hello
ingress:
- from:
- podSelector:
matchLabels:
app: foo