I have a Kafka cluster in Kubernetes created using Strimzi.
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: {{ .Values.cluster.kafka.name }}
spec:
  kafka:
    version: 2.7.0
    replicas: 3
    storage:
      deleteClaim: true
      size: {{ .Values.cluster.kafka.storagesize }}
      type: persistent-claim
    rack:
      topologyKey: failure-domain.beta.kubernetes.io/zone
    template:
      pod:
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '9404'
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        authentication:
          type: tls
        configuration:
          bootstrap:
            loadBalancerIP: {{ .Values.cluster.kafka.bootstrapipaddress }}
          brokers:
            {{- range $key, $value := (split "," .Values.cluster.kafka.brokersipaddress) }}
            - broker: {{ (split "=" .)._0 }}
              loadBalancerIP: {{ (split "=" .)._1 | quote }}
            {{- end }}
    authorization:
      type: simple
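For reference, the values this Helm template consumes might look like the sketch below; the "<brokerId>=<ip>" comma-separated format for brokersipaddress is implied by the split calls above, and all addresses here are hypothetical.

cluster:
  kafka:
    name: my-cluster
    storagesize: 100Gi
    bootstrapipaddress: 10.240.1.10
    # one "<brokerId>=<loadBalancerIP>" pair per broker, comma-separated
    brokersipaddress: "0=10.240.1.11,1=10.240.1.12,2=10.240.1.13"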
The cluster is created and up, and I am able to create topics and produce/consume to/from a topic.
The issue is that if I exec into one of the Kafka broker pods, I see intermittent errors:
INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.35 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-9]
INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.159 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-11]
INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.4 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-10]
INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.128 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-1]
After inspecting these IPs [10.240.0.35, 10.240.0.159, 10.240.0.4, 10.240.0.128] I figured out that they all belong to pods from the kube-system namespace, which are implicitly created as part of the Kafka cluster deployment.
Any idea what can be wrong?
I do not think this is necessarily wrong. You seem to have some application somewhere trying to connect to the broker without properly configured TLS. But as the connection is forwarded, the IP probably gets masked, so it does not show the real external IP anymore. These can be all kinds of things, from misconfigured clients up to some healthchecks trying to just open a TCP connection (depending on your environment, the load balancer can do this, for example).
Unfortunately, it is a bit hard to find out where they really come from. You can try to trace them through the logs of whoever owns the IP address they came from, as that forwarded them from someone else, etc. You could also try to enable TLS debug in Kafka with the Java system property javax.net.debug=ssl. But that might help only in some cases with misconfigured clients, not with plain TCP probes, and it will also make it hard to find the right place in the logs because it will also dump the replication traffic etc., which uses TLS as well.
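If it helps, that system property can be set on the brokers through the Kafka custom resource; a minimal sketch, assuming a Strimzi version whose jvmOptions supports javaSystemProperties:

spec:
  kafka:
    jvmOptions:
      javaSystemProperties:
        # dumps TLS handshake details to the broker log; very verbose, enable only while debugging
        - name: javax.net.debug
          value: ssl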
Related
I am trying to connect to a Kafka broker set up on Azure AKS from an on-prem Rancher k8s cluster over the internet. I have created a loadbalancer listener on the Azure Kafka cluster. It creates 4 public IPs using the Azure load balancer service.
- name: external
  port: 9093
  type: loadbalancer
  tls: true
  authentication:
    type: tls
  configuration:
    bootstrap:
      loadBalancerIP: 172.22.199.99
      annotations:
        external-dns.alpha.kubernetes.io/hostname: bootstrap.example.com
    brokers:
      - broker: 0
        loadBalancerIP: 172.22.199.100
        annotations:
          external-dns.alpha.kubernetes.io/hostname: kafka-0.example.com
      - broker: 1
        loadBalancerIP: 172.22.199.101
        annotations:
          external-dns.alpha.kubernetes.io/hostname: kafka-1.example.com
      - broker: 2
        loadBalancerIP: 172.22.199.102
        annotations:
          external-dns.alpha.kubernetes.io/hostname: kafka-2.example.com
    brokerCertChainAndKey:
      secretName: source-kafka-listener-cert
      certificate: tls.crt
      key: tls.key
Now, to connect from on-prem, I have opened the firewall for only the bootstrap LB IP. My understanding is that bootstrap will in turn route the request to the individual brokers, but that is not happening. When I try to connect, a connection is established with the bootstrap load balancer IP, but after that I get a timeout error.
2022-08-22 08:14:04,659 INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics) [kafka-admin-client-thread | adminclient-1]
2022-08-22 08:14:04,659 ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed) [main] org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:70)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:51)
Please let me know if I have to open the firewall for the individual brokers as well.
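For context: Kafka clients use the bootstrap address only for the initial metadata request; after that they connect directly to each broker's advertised address, so the broker load balancer IPs need to be reachable from the client as well. A client configuration for the listener above might look like this sketch (the truststore/keystore paths and passwords are placeholders):

bootstrap.servers=bootstrap.example.com:9093
security.protocol=SSL
# the listener uses authentication type tls (mTLS), so the client presents its own certificate
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit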
I'm using the Bitnami Helm chart for Keycloak and trying to achieve high availability with 3 Keycloak replicas, using DNS ping.
Chart version: 5.2.8
Image version: 15.1.1-debian-10-r10
Helm repo: https://charts.bitnami.com/bitnami -> bitnami/keycloak
The modified parameters of the values.yaml file are as follows:
global:
image:
  registry: docker.io
  repository: bitnami/keycloak
  tag: 15.1.1-debian-10-r10
  pullPolicy: IfNotPresent
  pullSecrets: []
  debug: true
proxyAddressForwarding: true
serviceDiscovery:
  enabled: true
  protocol: dns.DNS_PING
  properties:
    - dns_query=keycloak.identity.svc.cluster.local
  transportStack: tcp
cache:
  ownersCount: 3
  authOwnersCount: 3
replicaCount: 3
ingress:
  enabled: true
  hostname: my-keycloak.keycloak.example.com
  apiVersion: ""
  ingressClassName: "nginx"
  path: /
  pathType: ImplementationSpecific
  annotations: {}
  tls: false
  extraHosts: []
  extraTls: []
  secrets: []
  existingSecret: ""
  servicePort: http
When logging in to the Keycloak UI, after entering the username and password, the login does not happen; it redirects back to the login page.
From the pod logs I see the following error:
0:07:05,251 WARN [org.keycloak.events] (default task-1) type=CODE_TO_TOKEN_ERROR, realmId=master, clientId=security-admin-console, userId=null, ipAddress=10.244.5.46, error=invalid_code, grant_type=authorization_code, code_id=157e0483-67fa-4ea4-a964-387f3884cbc9, client_auth_method=client-secret
When I checked this error in forums, some suggestions were to set proxyAddressForwarding to true, but with this as well the issue remains the same.
Apart from this I have tried some other versions of the Helm chart, but with those the UI itself does not load correctly, with page-not-found errors.
Update
I get the above error, i.e. CODE_TO_TOKEN_ERROR, in the logs when I use the headless service with ingress. But if I use a service of type ClusterIP with ingress, the error is as follows:
06:43:37,587 WARN [org.keycloak.events] (default task-6) type=LOGIN_ERROR, realmId=master, clientId=null, userId=null, ipAddress=10.122.0.26, error=expired_code, restart_after_timeout=true, authSessionParentId=453870cd-5580-495d-8f03-f73498cd3ace, authSessionTabId=1d17vpIoysE
Another piece of information I would like to add is that I see the following INFO in all the Keycloak pod logs at startup.
05:27:10,437 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 58) my-keycloak-0: no members discovered after 3006 ms: creating cluster as coordinator
This sounds like the 3 members have not found each other and have not formed a Keycloak cluster.
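One way to verify that suspicion is to check whether the dns_query address resolves to the individual pod IPs; JGroups DNS_PING needs per-pod A records, which is what a headless service provides. A quick check, assuming default names:

kubectl run dnscheck --rm -it --restart=Never --image=busybox -- \
  nslookup keycloak.identity.svc.cluster.local
# a headless service returns one A record per Keycloak pod;
# a plain ClusterIP service returns only the single service IP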
One common scenario that may lead to such a situation is when the node that issued the access code is not the one that received the code-to-token request. So the client gets the access code from node 1, but the second request reaches node 2, and the value is not yet in this node's cache. The safest approach to prevent such a scenario is to set up a session-sticky load balancer.
I suggest you try setting service.spec.sessionAffinity to ClientIP. Its default value is None.
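A minimal sketch of that change; the service name and namespace are assumptions based on the pod names and dns_query above:

kubectl patch service my-keycloak -n identity \
  -p '{"spec":{"sessionAffinity":"ClientIP"}}'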
This part of the error, expired_code, might indicate a mismatch in timekeeping between the server and the client.
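A quick way to compare clocks across the replicas (pod names assumed from the log output above, namespace from the dns_query value):

for p in my-keycloak-0 my-keycloak-1 my-keycloak-2; do
  kubectl exec -n identity "$p" -- date -u
done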
My setup is like this: I have a producer service provisioned as part of a minikube cluster, and it is trying to publish messages to a Kafka instance running on the host machine.
I have written a Kafka Service and Endpoints YAML as follows:
kind: Service
apiVersion: v1
metadata:
  name: kafka
spec:
  ports:
    - name: "broker"
      protocol: "TCP"
      port: 9092
      targetPort: 9092
      nodePort: 0
---
kind: Endpoints
apiVersion: v1
metadata:
  name: kafka
  namespace: default
subsets:
  - addresses:
      - ip: 10.0.2.2
    ports:
      - name: "broker"
        port: 9092
The IP address of the host machine as seen from inside the minikube cluster, used in the Endpoints above, was acquired with the following command:
minikube ssh "route -n | grep ^0.0.0.0 | awk '{ print \$2 }'"
The problem I am facing is that the topic is getting created when the producer tries to publish a message for the first time, but no messages are getting written to that topic.
Digging into the pod logs, I found that the producer is trying to connect to the Kafka instance on localhost or something similar (not really sure of it):
2020-05-17T19:09:43.021Z [warn] org.apache.kafka.clients.NetworkClient [] -
[Producer clientId=/system/sharding/kafkaProducer-greetings/singleton/singleton/producer]
Connection to node 0 (omkara/127.0.1.1:9092) could not be established. Broker may not be available.
Following this, I suspected that I probably needed to modify server.properties with the following change:
listeners=PLAINTEXT://localhost:9092
This, however, only resulted in a change of the IP address in the log:
2020-05-17T19:09:43.021Z [warn] org.apache.kafka.clients.NetworkClient [] -
[Producer clientId=/system/sharding/kafkaProducer-greetings/singleton/singleton/producer]
Connection to node 0 (omkara/127.0.0.1:9092) could not be established. Broker may not be available.
I am not sure what IP address must be mentioned here, or what an alternate solution might be, and whether it is even possible to connect from inside the Kubernetes cluster to a Kafka instance installed outside of it.
Since the producer's Kafka client is not on the same network as the broker, we need to configure an additional listener, like so:
listeners=INTERNAL://0.0.0.0:9093,EXTERNAL://0.0.0.0:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
advertised.listeners=INTERNAL://localhost:9093,EXTERNAL://10.0.2.2:9092
inter.broker.listener.name=INTERNAL
We can verify the messages in the topic like so:
kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic greetings --from-beginning
{"name":"Alice","message":"Namastey"}
You can find a detailed explanation of understanding and provisioning Kafka listeners here.
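To sanity-check the path the in-cluster producer takes, one can run a console producer from a throwaway pod against the kafka Service defined earlier; a sketch, assuming the bitnami/kafka image with its tools on the PATH:

kubectl run kafka-probe --rm -it --restart=Never --image=bitnami/kafka -- \
  kafka-console-producer.sh --broker-list kafka:9092 --topic greetings
# the 'kafka' Service forwards to the host (10.0.2.2), and the broker advertises
# EXTERNAL://10.0.2.2:9092 back, so the follow-up metadata connection also succeeds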
I tried to build a Kafka cluster using Strimzi (0.14) in a Kubernetes cluster.
I used the examples that come with Strimzi, i.e. examples/kafka/kafka-persistent.yaml.
This YAML file looks like this:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.3.0
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.3"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 12Gi
          deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 9Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
kubectl apply -f examples/kafka/kafka-persistent.yaml
Both the ZooKeeper nodes and the Kafka brokers were brought up.
However, I saw errors in the Kafka broker logs:
[SocketServer brokerId=1] Failed authentication with /10.244.5.94 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-1-ListenerName(REPLICATION)-SSL-0]
Does anyone know how to fix this problem?
One of the things which can cause this is if your cluster is using a different DNS suffix for service domains (the default is .cluster.local). You need to find out the right DNS suffix and use the environment variable KUBERNETES_SERVICE_DNS_DOMAIN in the Strimzi Cluster Operator deployment to override the default value.
If you exec into one of the Kafka or ZooKeeper pods and run hostname -f, it should show you the full hostname, from which you can identify the suffix.
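For illustration, the override goes into the Cluster Operator Deployment; a sketch, assuming the default strimzi-cluster-operator names and a hypothetical suffix:

spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          env:
            # replace the value with the DNS suffix reported by 'hostname -f'
            - name: KUBERNETES_SERVICE_DNS_DOMAIN
              value: my.custom.suffix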
(I pasted this from the comments as a full answer since it helped to solve the question.)
For testing purposes I am trying to create a Kafka cluster on my local minikube. The cluster must be reachable from outside Kubernetes.
When I produce/consume from inside the pods there is no problem, everything works just fine.
When I produce from my local machine with
bin/kafka-console-producer.sh --topic mytopic --broker-list 192.168.99.100:32767
where 192.168.99.100 is my minikube IP and 32767 is the node port of the Kafka service, I get the following error message:
>testmessage
>[2018-04-30 11:55:04,604] ERROR Error when sending message to topic ams_stream with key: null, value: 11 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for ams_stream-0: 1506 ms has passed since batch creation plus linger time
When I consume from my local machine I get the following warnings:
[2018-04-30 10:22:30,680] WARN Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-04-30 10:23:46,057] WARN Connection to node 8 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-04-30 10:25:01,542] WARN Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-04-30 10:26:17,008] WARN Connection to node 5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
The broker IDs are right, so it looks like I can at least reach the brokers.
Edit:
I think the problem may be that the service is routing me "randomly" to any of my brokers, whereas it needs to route me to the leader of the topic partition.
Could this be the problem? Does anybody know a way around it?
Additional information:

- I'm using the wurstmeister/kafka and digitalwonderland/zookeeper images
- I started using the DellEMC tutorial (and the linked one from defuze.org)

This did not work out for me, so I made some changes in the kafka-service.yml (1) and the kafka-cluster.yml (2).

kafka-service.yml:

- added a fixed NodePort
- removed id from the selector

kafka-cluster.yml:

- added replicas to the specification
- removed id from the label
- changed the broker id to be generated from the last number of the IP
- replaced the deprecated values advertised_host_name / advertised_port with:
  - listeners (pod-ip:9092) for communication inside the k8s cluster
  - advertised_listeners (minikube-ip:node-port) for communication with applications outside Kubernetes
1 - kafka-service.yml:
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
  labels:
    name: kafka
spec:
  type: NodePort
  ports:
    - port: 9092
      nodePort: 32767
      targetPort: 9092
      protocol: TCP
  selector:
    app: kafka
2 - kafka-cluster.yml:
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: kafka-b
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: wurstmeister/kafka
          ports:
            - containerPort: 9092
          env:
            - name: HOSTNAME_COMMAND
              value: "ifconfig |grep 'addr:172' |cut -d':' -f 2 |cut -d ' ' -f 1"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zk1:2181
            - name: BROKER_ID_COMMAND
              value: "ifconfig |grep 'inet addr:172' | cut -d'.' -f '4' | cut -d' ' -f '1'"
            - name: KAFKA_ADVERTISED_LISTENERS
              value: "INTERNAL://192.168.99.100:32767"
            - name: KAFKA_LISTENERS
              value: "INTERNAL://_{HOSTNAME_COMMAND}:9092"
            - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
              value: "INTERNAL:PLAINTEXT"
            - name: KAFKA_INTER_BROKER_LISTENER_NAME
              value: "INTERNAL"
            - name: KAFKA_CREATE_TOPICS
              value: mytopic:1:3