ZooKeeper vs Consul with SolrCloud - apache-zookeeper

I am a little unsure of the difference between using ZooKeeper and Consul with a SolrCloud solution.
Can I use Consul for configuration management and service discovery instead of using ZooKeeper?
Is there a reason why I would use both ZooKeeper and Consul, e.g. ZooKeeper for configuration management and Consul for service discovery?

Related

Kafka in Kubernetes Cluster - How can services in the K8S cluster be accessed using SASL and services outside the K8S cluster be accessed using SSL

I want to deploy the Kafka cluster on the Kubernetes environment and have the services within the Kubernetes cluster connect to Kafka in SASL_PLAINTEXT mode and the services outside the Kubernetes cluster connect to Kafka in SASL_SSL mode. However, I found that after setting this up, external services cannot connect to Kafka. Does Kafka not allow internal services to connect to external services differently? My Kafka version is 2.3.1 and I would be grateful if you could answer my questions.
It's possible, yes. You'd need to set up two listeners and advertised.listeners entries on the brokers, with a protocol of SASL_PLAINTEXT for one set and SASL_SSL for the other.
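A minimal sketch of what such a broker configuration could look like, assuming the internal listener lives on 9092 and the external one on 9093; the listener names, host names, and ports here are placeholders rather than settings taken from the question:

# server.properties (illustrative values only)
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://kafka-0.kafka-headless.my-namespace.svc.cluster.local:9092,EXTERNAL://kafka.example.com:9093
listener.security.protocol.map=INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_SSL
inter.broker.listener.name=INTERNAL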
It's Kubernetes network policies that control how cluster access happens, not only Kafka. You'll also need a NodePort or Ingress for any external app to reach services within the cluster. The Strimzi operator covers both, by the way.

Unable to connect to Kafka on Kubernetes externally

I have deployed kafka on minikube by following https://docs.bitnami.com/tutorials/deploy-scalable-kafka-zookeeper-cluster-kubernetes.
I'm able to create kafka topics, publish and consume messages from the topics through kubectl commands. When I installed kafka through helm chart, kafka.kafka.svc.cluster.local is the DNS name within the cluster.
helm install kafka bitnami/kafka --set zookeeper.enabled=false --set replicaCount=3 --set externalZookeeper.servers=zookeeper.kafka.svc.cluster.local -n kafka
I have tried in multiple ways, but not able to access this kafka cluster outside. I'm trying to publish messages to a kafka topic through sample producer code in IntelliJ, but the bootstrap server kafka.kafka.svc.cluster.local is not reachable.
but the bootstrap server kafka.kafka.svc.cluster.local is not reachable.
That's an internal CoreDNS record; it only resolves inside the cluster. You'll need to define a headless service with an exposed NodePort, and optionally a TCP LoadBalancer that directs ingress traffic into the cluster, along with an appropriate NetworkPolicy. Search the config for "external" - https://github.com/bitnami/charts/tree/master/bitnami/kafka#traffic-exposure-parameters
Kafka is not unique in this way, so I suggest learning more about accessing k8s services outside the cluster. Or switch back to Docker Compose if you simply want to test a local Kafka environment with containers.
Note that the advertised listeners setting of each broker pod would need to return its individual, external broker address back to the client.
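As a rough sketch only (verify the exact value names against the traffic-exposure section of the chart README linked above), enabling the chart's external access generally looks something like:

helm upgrade kafka bitnami/kafka -n kafka --reuse-values \
  --set externalAccess.enabled=true \
  --set externalAccess.service.type=NodePort \
  --set externalAccess.autoDiscovery.enabled=true \
  --set rbac.create=true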

Registering Zuul on Eureka in Kubernetes cluster

I have a Kubernetes cluster on Linux with one master node and two slave nodes. I have installed & created services for a eureka-server and Zuul with multiple replicas which are accessible by NodePorts. In order to enable load balancing, we need to register Zuul service in Eureka.
Can anybody let me know how we can register Zuul on eureka-server?
If you look at the configuration for Zuul Service Discovery you can see that there is an option:
eureka.serviceUrl.default=http://${region}.${eureka.eurekaServer.domainName}:7001/${eureka.eurekaServer.context}
You would have to point that option to your eureka-server Kubernetes Service. Based on the DNS Kubernetes convention it would be something like this:
eureka-server-service.<k8s-namespace>.svc.cluster.local:<port-of-service-you-exposed>
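For example, if your Zuul service is a Spring Cloud application using the Spring Cloud Eureka client (an assumption about your setup), the equivalent property would look roughly like this, with the namespace and port being placeholders for whatever you actually exposed:

eureka.client.serviceUrl.defaultZone=http://eureka-server-service.my-namespace.svc.cluster.local:8761/eureka/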

Access kafka broker outside k8 minikube cluster

I have a landoop kafka image running on a Pod on minikube k8 cluster on my mac. I have 2 different services to expose the port 8081 for schema registry and 9092 for broker. I have mapped the ports 8081 -> 30081 and 9092 -> 30092 in my NodePort services so that I can access it from outside the cluster.
But when I try to run a console consumer or my consumer app, Kafka never consumes messages.
To verify broker 9092 port is reachable outside k8 cluster:
nc <exposed-ip> 30092, it says the port is open.
To verify Schema registry 8081 is reachable:
curl -X GET http://192.168.99.100:30081/subjects
It returns the schemas that are available.
I have a couple of questions.
1) Can we not access Kafka outside the k8s cluster in the above-mentioned way? If so, am I doing it wrong in some way?
2) If the port is open, doesn't that mean the broker is available?
Any help is appreciated. Thanks.
Accessing a Kafka cluster from outside a container network is rather complicated if you cannot route directly from the outside to the pod.
When you first connect to a Kafka cluster you connect to a single broker, and the broker returns the list of all brokers and partitions in the Kafka cluster. The Kafka client then uses that list to interact with the brokers where the specific topic lives.
The problem is that, by default, the broker list contains the internal IP of each Kafka broker, which in your case is the container network IP. You can override this value by setting advertised.listeners in each broker's configuration.
To make a Kafka cluster available from outside Kubernetes you need to configure a NodePort service for each of your brokers and set the advertised.listeners setting of each broker to the external IP of the corresponding NodePort service. But note that this adds additional latency and failure points when you try to use Kafka from inside your Kubernetes cluster.
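For illustration only (using the minikube IP and NodePorts from the question as placeholders), each broker would then advertise its own external endpoint, e.g.:

# broker 0, exposed via a NodePort service mapping 30092 -> 9092
advertised.listeners=PLAINTEXT://192.168.99.100:30092
# broker 1 would get its own NodePort service and advertise e.g. PLAINTEXT://192.168.99.100:30093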
You need to set the advertised listeners for Kafka. For the landoop docker images this can be set via the environment flag
-e ADV_HOST=192.168.99.100

How to expose a headless service for a StatefulSet externally in Kubernetes

Using kubernetes-kafka as a starting point with minikube.
This uses a StatefulSet and a headless service for service discovery within the cluster.
The goal is to expose the individual Kafka Brokers externally which are internally addressed as:
kafka-0.broker.kafka.svc.cluster.local:9092
kafka-1.broker.kafka.svc.cluster.local:9092
kafka-2.broker.kafka.svc.cluster.local:9092
The constraint is that this external service be able to address the brokers specifically.
What's the right (or one possible) way of going about this? Is it possible to expose an external service per kafka-x.broker.kafka.svc.cluster.local:9092?
We solved this in 1.7 by changing the headless service to Type=NodePort and setting externalTrafficPolicy=Local. This bypasses the internal load balancing of a Service, and traffic destined for a specific node on that node port will only work if a Kafka pod is on that node.
apiVersion: v1
kind: Service
metadata:
  name: broker
spec:
  externalTrafficPolicy: Local
  ports:
  - nodePort: 30000
    port: 30000
    protocol: TCP
    targetPort: 9092
  selector:
    app: broker
  type: NodePort
For example, we have two nodes nodeA and nodeB, nodeB is running a kafka pod. nodeA:30000 will not connect but nodeB:30000 will connect to the kafka pod running on nodeB.
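One way to see this in practice, assuming kubectl access and the stock Kafka console scripts (the label selector and node IP below are placeholders):

kubectl get pods -l app=broker -o wide          # shows which node each kafka pod landed on
kafka-console-producer.sh --broker-list <nodeB-ip>:30000 --topic test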
https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-typenodeport
Note this was also available in 1.5 and 1.6 as a beta annotation, more can be found here on feature availability: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
Note also that while this ties a kafka pod to a specific external network identity, it does not guarantee that your storage volume will be tied to that network identity. If you are using the VolumeClaimTemplates in a StatefulSet then your volumes are tied to the pod while kafka expects the volume to be tied to the network identity.
For example, if the kafka-0 pod restarts and kafka-0 comes up on nodeC instead of nodeA, kafka-0's PVC (if using VolumeClaimTemplates) has data meant for nodeA, and the broker running on kafka-0 starts rejecting requests thinking that it is nodeA, not nodeC.
To fix this, we are looking forward to Local Persistent Volumes but right now we have a single PVC for our kafka StatefulSet and data is stored under $NODENAME on that PVC to tie volume data to a particular node.
https://github.com/kubernetes/features/issues/121
https://kubernetes.io/docs/concepts/storage/volumes/#local
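For reference, the local-volume approach in the second link pins a PersistentVolume to a specific node roughly like this (all names, sizes, and paths below are made up for illustration):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: kafka-0-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/kafka-0       # directory that must already exist on the node
  nodeAffinity:                    # required for local volumes; ties the PV to one node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - nodeA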
Solutions so far weren't quite satisfying enough for myself, so I'm going to post an answer of my own. My goals:
Pods should still be dynamically managed through a StatefulSet as much as possible.
Create an external service per Pod (i.e. Kafka broker) for producer/consumer clients and avoid load balancing.
Create an internal headless service so that each Broker can communicate with each other.
Starting with Yolean/kubernetes-kafka, the only thing missing is exposing the service externally, and there are two challenges in doing so:
Generating unique labels per Broker pod so that we can create an external service for each of the Broker pods.
Telling the Brokers to communicate to each other using the internal Service while configuring Kafka to tell the producer/consumers to communicate over the external Service.
Per pod labels and external services:
To generate labels per pod, this issue was really helpful. Using it as a guide, we add the following line to the init.sh property in 10broker-config.yml:
kubectl label pods ${HOSTNAME} kafka-set-component=${HOSTNAME}
We keep the existing headless service, but we also generate an external Service per pod using the label (I added them to 20dns.yml):
apiVersion: v1
kind: Service
metadata:
  name: broker-0
  namespace: kafka
spec:
  type: NodePort
  ports:
  - port: 9093
    nodePort: 30093
  selector:
    kafka-set-component: kafka-0
Configure Kafka with internal/external listeners
I found this issue incredibly useful in trying to understand how to configure Kafka.
This again requires updating the init.sh and server.properties properties in 10broker-config.yml with the following:
Add the following to the server.properties to update the security protocols (currently using PLAINTEXT):
listener.security.protocol.map=INTERNAL_PLAINTEXT:PLAINTEXT,EXTERNAL_PLAINTEXT:PLAINTEXT
inter.broker.listener.name=INTERNAL_PLAINTEXT
Dynamically determine the external IP and external port for each Pod in init.sh:
EXTERNAL_LISTENER_IP=<your external addressable cluster ip>
EXTERNAL_LISTENER_PORT=$((30093 + ${HOSTNAME##*-}))
Then configure listeners and advertised.listeners IPs for EXTERNAL_LISTENER and INTERNAL_LISTENER (also in the init.sh property):
sed -i "s/#listeners=PLAINTEXT:\/\/:9092/listeners=INTERNAL_PLAINTEXT:\/\/0.0.0.0:9092,EXTERNAL_PLAINTEXT:\/\/0.0.0.0:9093/" /etc/kafka/server.properties
sed -i "s/#advertised.listeners=PLAINTEXT:\/\/your.host.name:9092/advertised.listeners=INTERNAL_PLAINTEXT:\/\/$HOSTNAME.broker.kafka.svc.cluster.local:9092,EXTERNAL_PLAINTEXT:\/\/$EXTERNAL_LISTENER_IP:$EXTERNAL_LISTENER_PORT/" /etc/kafka/server.properties
Obviously, this is not a full solution for production (for example, it does not address security for the externally exposed brokers), and I'm still refining my understanding of how to let internal producers/consumers also communicate with the brokers.
However, so far this is the best approach for my understanding of Kubernetes and Kafka.
Note: I completely rewrote this post a year after the initial posting:
1. Some of what I wrote is no longer relevant given updates to Kubernetes, and I figured it should be deleted to avoid confusing people.
2. I now know more about both Kubernetes and Kafka and should be able to give a better explanation.
Background Contextual Understanding of Kafka on Kubernetes:
Let's say a Service of type ClusterIP and a StatefulSet are used to deploy a 5-pod Kafka cluster on a Kubernetes cluster. Because a StatefulSet was used to create the pods, they each automatically get the following 5 inner-cluster DNS names, and the Kafka Service of type ClusterIP gives another inner-cluster DNS name.
[M,$,*] kafka-0.my-kafka-headless-service.my-namespace.svc.cluster.local
[M,$  ] kafka-1.my-kafka-headless-service.my-namespace.svc.cluster.local
[M,  *] kafka-2.my-kafka-headless-service.my-namespace.svc.cluster.local
[M,  *] kafka-3.my-kafka-headless-service.my-namespace.svc.cluster.local
[M,$  ] kafka-4.my-kafka-headless-service.my-namespace.svc.cluster.local
kafka-service.my-namespace.svc.cluster.local
^ Let's say you have 2 Kafka topics: $ and *
Each Kafka topic is replicated 3 times across the 5 pod Kafka cluster
(the ASCII diagram above shows which pods hold the replicas of the $ and * topics, M represents metadata)
4 useful bits of background knowledge:
1. .svc.cluster.local is the inner-cluster DNS FQDN suffix, but pods are automatically configured (via their DNS search domains) to complete it, so you can omit it when talking via inner-cluster DNS.
2. kafka-x.my-kafka-headless-service.my-namespace inner cluster DNS name resolves to a single pod.
3. kafka-service.my-namespace kubernetes service of type cluster IP acts like an inner cluster Layer 4 Load Balancer, and will round-robin traffic between the 5 kafka pods.
4. A critical Kafka specific concept to realize is when a Kafka client talks to a Kafka cluster it does so in 2 phases. Let's say a Kafka client wants to read the $ topic from the Kafka cluster.
Phase 1: The client reads the Kafka cluster's metadata. This is synchronized across all 5 Kafka pods, so it doesn't matter which one the client talks to; therefore it can be useful to do the initial communication using kafka-service.my-namespace (which load balances and only forwards to a random healthy Kafka pod).
Phase 2: The metadata tells the Kafka client which Kafka brokers/nodes/servers/pods have the topic of interest; in this case $ exists on 0, 1, and 4. So for Phase 2 the client will only talk directly to the Kafka brokers that have the data it needs.
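You can watch Phase 1 happen with a metadata listing, for example with kcat/kafkacat (assuming it is installed and the service name from the example above is used as the bootstrap address):

kcat -b kafka-service.my-namespace:9092 -L
# -L prints the metadata returned in Phase 1: the full broker list and,
# per topic/partition, which broker the client must contact in Phase 2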
How to Externally Expose Pods of a Headless Service/Statefulset and Kafka specific Nuance:
Let's say I have a 3-pod HashiCorp Consul cluster spun up on a Kubernetes cluster, I configure it so the web UI is enabled, and I want to reach that web UI from the LAN / externally expose it. There's nothing special about the fact that the pods are behind a headless service. You can use a Service of type NodePort or LoadBalancer to expose them as you normally would any pod, and the NP or LB will round-robin incoming traffic between the 3 Consul pods.
Because Kafka communication happens in 2 phases, this introduces some nuances where the normal method of externally exposing the statefulset's headless service using a single service of type LB or NP might not work when you have a Kafka Cluster of more than 1 Kafka pod.
1. The Kafka client expects to speak directly to the Kafka broker during Phase 2 communications. So instead of 1 Service of type NodePort, you might want 6 services of type NodePort/LB: 1 that will round-robin LB traffic for Phase 1, and 5 with a 1:1 mapping to individual pods for Phase 2 communication. (If you run kubectl get pods --show-labels against the 5 Kafka pods, you'll see that each pod of the StatefulSet has a unique label, statefulset.kubernetes.io/pod-name=kafka-0, and that allows you to manually create 1 NP/LB service that maps to 1 pod of a StatefulSet; a sketch of such a per-pod service follows this list.) (Note this alone isn't enough.)
2. When you install a Kafka cluster on Kubernetes, it's common for its default configuration to only support Kafka clients inside the Kubernetes cluster. Remember the metadata from Phase 1 of a Kafka client talking to a Kafka cluster: the Kafka cluster may have been configured so that its "advertised listeners" are made of inner-cluster DNS names. So when the LAN client talks to an externally exposed Kafka cluster via NP/LB, it succeeds on Phase 1 but fails on Phase 2, because the metadata returned by Phase 1 gave inner-cluster DNS names as the means of communicating directly with the pods during Phase 2, which aren't resolvable by clients outside the cluster and thus only work for Kafka clients inside the cluster. So it's important to configure your Kafka cluster so the "advertised.listeners" returned by the Phase 1 metadata are resolvable by clients both external and internal to the cluster.
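To make point 1 concrete, a per-pod Service using that built-in label could look roughly like this (namespace, ports, and names are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: kafka-0-external
  namespace: my-namespace
spec:
  type: NodePort
  ports:
  - port: 9094
    targetPort: 9094
    nodePort: 32000
  selector:
    statefulset.kubernetes.io/pod-name: kafka-0   # built-in label the StatefulSet controller adds to each pod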
Clarity on where the Problem caused by Kafka Nuance Lies:
For Phase 2 of communication between Kafka Client -> Broker, you need to configure the "advertised.listeners" to be externally resolvable.
This is difficult to pull off using Standard Kubernetes Logic, because what you need is for kafka-0 ... kafka-4 to each have a unique configuration/each to have a unique "advertised.listeners" that's externally reachable. But by default statefulsets are meant to have cookie-cutter configurations that are more or less identical.
Solution to the Problem caused by Kafka Nuances:
The Bitnami Kafka Helm chart has some custom logic that allows each pod in the StatefulSet to have a unique "advertised.listeners" configuration.
Bitnami offers hardened containers; according to Quay.io, 2.5.0 only has a single High CVE. It runs as non-root, has reasonable documentation, and can be externally exposed*: https://quay.io/repository/bitnami/kafka?tab=tags
The last project I was on, I went with Bitnami because security was the priority and we only had Kafka clients internal to the Kubernetes cluster. I ended up having to figure out how to externally expose it in a dev env so someone could run some kind of test, and I remember being able to get it to work; I also remember it wasn't super simple. That being said, if I were to do another Kafka-on-Kubernetes project I'd recommend looking into the Strimzi Kafka Operator, as it's more flexible in terms of options for externally exposing Kafka, and it has a great 5-part deep-dive write-up on the different options for externally exposing a Kafka cluster running on Kubernetes using Strimzi (via NP, LB, or Ingress). (I'm not sure what Strimzi's security looks like, though, so I'd recommend using something like AnchoreCLI to do a left-shift CVE scan of the Strimzi images before trying a PoC.)
https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/
Change the service from a headless ClusterIP into a NodePort, which would forward requests arriving at any of the nodes on a set port (30092 in my example) to port 9092 on the Kafkas. You would hit one of the pods at random, but I guess that is fine.
20dns.yml becomes (something like this):
# A no longer headless service to create DNS records
---
apiVersion: v1
kind: Service
metadata:
  name: broker
  namespace: kafka
spec:
  type: NodePort
  ports:
  - port: 9092
    nodePort: 30092
  # [podname].broker.kafka.svc.cluster.local
  selector:
    app: kafka
Disclaimer: You might need two services: one headless for the internal DNS names and one NodePort for the external access. I haven't tried this myself.
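If you do split it into two Services, a sketch of the headless one might look like this (untested; it only yields per-pod [podname].broker-internal... DNS records if the StatefulSet's serviceName points at it):

apiVersion: v1
kind: Service
metadata:
  name: broker-internal
  namespace: kafka
spec:
  clusterIP: None      # headless: no load balancing, per-pod DNS records instead
  ports:
  - port: 9092
  selector:
    app: kafka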
From the kubernetes kafka documentation:
Outside access with hostport
An alternative is to use the hostport for the outside access. When
using this only one kafka broker can run on each host, which is a good
idea anyway.
In order to switch to hostport the kafka advertise address needs to be
switched to the ExternalIP or ExternalDNS name of the node running the
broker. In kafka/10broker-config.yml switch to
OUTSIDE_HOST=$(kubectl get node "$NODE_NAME" -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}')
OUTSIDE_PORT=${OutsidePort}
and in kafka/50kafka.yml add the hostport:
- name: outside
containerPort: 9094
hostPort: 9094
I solved this problem by creating a separate StatefulSet for each broker and a separate Service of type NodePort for each broker. Internal communication can happen on each individual service name. External communication can happen on the NodePort address.
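A rough sketch of that layout for a single broker; every name, image, and port below is illustrative rather than a tested manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-0
spec:
  serviceName: kafka-0-internal     # per-broker internal Service (not shown) used for broker-to-broker traffic
  replicas: 1
  selector:
    matchLabels:
      app: kafka-0
  template:
    metadata:
      labels:
        app: kafka-0
    spec:
      containers:
      - name: kafka
        image: confluentinc/cp-kafka    # placeholder image
        ports:
        - containerPort: 9092
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-0-external
spec:
  type: NodePort
  ports:
  - port: 9092
    nodePort: 32400
  selector:
    app: kafka-0

Each such broker would then advertise its internal service name to in-cluster clients and the node address plus its NodePort to external clients, as described in the earlier answers.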