How to reach Cloudera Kafka Broker on private network from outside? - apache-kafka

I have a cluster inside a VPN which contains a server with a private IP. I'm trying to set up Kafka communication between an external server and my private server. My approach is to set up an iptables rule that maps a public IP to my private IP. I also opened ports 9092 and 9093 to make them reachable from outside. I can now connect successfully to my server from the external server using the public IP:
telnet <public_ip> 9092
Connected to <public_ip>
My Kafka broker is part of a Cloudera cluster and I created it with Cloudera Manager. The configuration is the following:
kafka.properties:
listeners=PLAINTEXT://<private_ip>:9092,SSL://<private_ip>:9093
advertised.listeners=PLAINTEXT://<private_ip>:9092,SSL://<private_ip>:9093
advertised.host.name: <public_ip>
Using this broker configuration, communication works perfectly inside the cluster using either the public_ip or the private_ip of the Kafka broker host.
So I now have a working broker that can be reached via the public_ip, and an external server that can reach the public_ip and its required ports. But when I try to connect to the broker from the external server, I get the following error:
NO BROKERS AVAILABLE
There is no further information about the error. On my external server I use the kafka-python package, where I configure the producer with:
"bootstrap_servers": ["<public_ip>:9092"]
on an existing topic of my Kafka broker.
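For reference, the producer is created roughly like this (a minimal kafka-python sketch; the topic name my-topic and the payload are placeholders):
from kafka import KafkaProducer
# NoBrokersAvailable ("NO BROKERS AVAILABLE") is raised here when none of the bootstrap servers respond
producer = KafkaProducer(bootstrap_servers=["<public_ip>:9092"])
producer.send("my-topic", b"test message")
producer.flush()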
Specifications:
Private host:
Cloudera: CDH 5.12.0
Kafka: 2.2.0-1.2.2.0
ZooKeeper: 3.4.5
External host:
Kafka Python package: kafka-python==1.4.2
The problem is very similar to this post, but in that case a forwarded port with a public IP is used. Is there any possibility to do it with iptables? Has anyone managed to do it on a Cloudera cluster?
Thank you in advance.

The question isn't specific to Cloudera or Python, and I don't think Cloudera Manager has a setting that will set this up for you.
advertised.listeners will have to be a publicly resolvable address that clients can use to reach each broker individually (e.g. two brokers cannot share the same advertised listener and both be reached through one port forward from the public address to the internal address).
Your setup is very similar to Kafka running in Docker or on cloud providers such as AWS, in that you're interacting over two networks, so refer to this blog for more information.
Also, unless you set up firewall rules to prevent arbitrary access, don't expose brokers over the PLAINTEXT protocol.
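As a rough sketch of the layout that blog describes, assuming a single broker whose private port 9092 is NATed from the public IP (the INTERNAL/EXTERNAL names and port 19092 are arbitrary choices, not required values):
listeners=INTERNAL://<private_ip>:19092,EXTERNAL://<private_ip>:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
advertised.listeners=INTERNAL://<private_ip>:19092,EXTERNAL://<public_ip>:9092
Clients inside the cluster keep using the INTERNAL listener, while external clients bootstrap against <public_ip>:9092 and get that same address back in the metadata.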

Related

how to send message to Kafka cluster from outside of vpn

We have 3 Kafka brokers in a cluster running in our private VPN. Our client wants to send messages to our Kafka cluster. Our admin has created a public IP for only one of the 3 Kafka brokers. The client cluster is able to communicate with our VPN through this public IP (checked using Wireshark), but we are not receiving any messages.
Do we need to create public IPs for the remaining 2 Kafka brokers as well?
Can anyone suggest what configs we should change to make this work?
Clients are required to communicate directly with the leader broker of each partition, so all brokers need to advertise an address for themselves that is resolvable through the VPN.
I think you're using the VPN term wrong. You probably mean VPC or some kind of private network.
As mentioned by OneCricketeer, you must assign a public address to every broker. You must also set the advertised.listeners property (or KAFKA_ADVERTISED_LISTENERS) to the public IP address, or to a public DNS name pointing to that address; otherwise your client will not be able to connect.
More details: https://www.confluent.io/blog/kafka-listeners-explained/
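For example, with three brokers each mapped to its own public address, each broker's server.properties would advertise its own externally resolvable endpoint (hypothetical names for illustration):
broker1: advertised.listeners=PLAINTEXT://kafka1.public.example.com:9092
broker2: advertised.listeners=PLAINTEXT://kafka2.public.example.com:9092
broker3: advertised.listeners=PLAINTEXT://kafka3.public.example.com:9092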

Setting up kafka in aws ec2

I have set up Kafka on an EC2 instance and assigned an Elastic IP address to that instance. I am able to start ZooKeeper and Kafka and create topics, but I am not able to connect to the broker from my local machine. When I searched, I understood that I need to configure the listener and advertised host name in the server properties file. I tried entering the public Elastic IP address, but it's not working.
Where am I going wrong, and what values do I need to configure? I want a basic single-node, single-broker Kafka setup on EC2.
You need to configure your advertised.listeners per https://rmoff.net/2018/08/02/kafka-listeners-explained/
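A minimal sketch for that single-broker case, assuming the Elastic IP is <elastic_ip> and port 9092 is open in the instance's security group:
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://<elastic_ip>:9092
The broker binds to all interfaces (the Elastic IP is NATed, so it cannot be bound directly) and advertises the Elastic IP so that external clients get a reachable address back in the metadata.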

unable to remove default kafka configuration in cloudera

In a PoC Cloudera cluster with Kafka (1 broker), I'm trying to send messages to the cluster from an external producer through a public IP. My cluster nodes are hosted in Azure and all of them are reachable from my local machine, where the external producer is running.
Following this great article https://rmoff.net/2018/08/02/kafka-listeners-explained/ I figured out the Kafka listeners config I need, since the broker's public IP is only reachable from my local machine and, inside the cluster, the nodes can only communicate using their internal hostnames:
listeners=INTERNAL://0.0.0.0:19092,EXTERNAL://0.0.0.0:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
advertised.listeners=INTERNAL://internal-broker-node-hostname:19092,EXTERNAL://public-broker-ip:9092
inter.broker.listener.name=INTERNAL
With this configuration, when I try to start the broker I get this error:
org.apache.kafka.common.config.ConfigException: Only one of inter.broker.listener.name and security.inter.broker.protocol should be set.
This makes sense, as the Kafka documentation for the inter.broker.listener.name property says:
Name of listener used for communication between brokers. If this is unset, the listener name is defined by security.inter.broker.protocol. It is an error to set this and security.inter.broker.protocol properties at the same time.
The problem I'm facing is that I'm unable to unset security.inter.broker.protocol through Cloudera Manager, as I can only switch between the options in a radio button.
How can I unset that property?
INTERNAL and EXTERNAL are just strings.
So you could name the internal listener PLAINTEXT, which satisfies the security.inter.broker.protocol value CDH sets, and just redefine the external listener as the advertised one:
listeners=PLAINTEXT://0.0.0.0:19092,EXTERNAL://0.0.0.0:9092
listener.security.protocol.map=EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
advertised.listeners=PLAINTEXT://internal-broker-node-hostname:19092,EXTERNAL://public-broker-ip:9092
You'll also want to make sure your external broker IPs are static

Consume from a Kafka Cluster through SSH Tunnel

We are trying to consume from a Kafka cluster using the Java client. The cluster is behind a jump host, so the only way to access it is through an SSH tunnel. But we are not able to read, because once the consumer fetches the metadata it uses the original hosts to connect to the brokers. Can this behaviour be overridden? Can we ask the Kafka client not to use the metadata?
Not as far as I know.
The trick I used when I needed to do something similar was:
set up a virtual interface for each Kafka broker
open a tunnel to each broker so that broker n is bound to virtual interface n
configure your /etc/hosts file so that the advertised hostname of broker n resolves to the IP of virtual interface n.
E.g.
Kafka brokers:
broker1 (advertised as broker1.mykafkacluster)
broker2 (advertised as broker2.mykafkacluster)
Virtual interfaces:
veth1 (192.168.1.1)
veth2 (192.168.1.2)
Tunnels:
broker1: ssh -L 192.168.1.1:9092:broker1.mykafkacluster:9092 jumphost
broker2: ssh -L 192.168.1.2:9092:broker2.mykafkacluster:9092 jumphost
/etc/hosts:
192.168.1.1 broker1.mykafkacluster
192.168.1.2 broker2.mykafkacluster
If you configure your system like this, you should be able to reach all the brokers in your Kafka cluster.
Note: if you configured your Kafka brokers to advertise an IP address instead of a hostname, the procedure can still work, but you need to configure the virtual interfaces with the same IP addresses that the brokers advertise.
You don't actually have to add virtual interfaces to access the brokers via an SSH tunnel if they advertise a hostname. It's enough to add an entry to /etc/hosts on your client and bind the tunnel to the added name.
Assuming broker.kafkacluster is the advertised hostname of your broker:
/etc/hosts:
127.0.2.1 broker.kafkacluster
Tunnel:
ssh -L broker.kafkacluster:9092:broker.kafkacluster:9092 <brokerhostip/name>
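Once the hosts entry and the tunnel are in place, a client on your machine can bootstrap through the tunnelled name as if it were on the remote network. As a quick check with kafka-python (the question uses the Java client, but the connection mechanics are the same; the topic name is a placeholder):
from kafka import KafkaConsumer
# broker.kafkacluster resolves via /etc/hosts, i.e. through the SSH tunnel
consumer = KafkaConsumer("my-topic", bootstrap_servers=["broker.kafkacluster:9092"])
for message in consumer:
    print(message.offset, message.value)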
Try sshuttle like this:
sshuttle -r user@host broker-1-ip:port broker-2-ip:port broker-3-ip:port
Of course, the list of brokers depends on the advertised.listeners broker setting.
Absolutely best solution for me was to use kafkatunnel (https://github.com/simple-machines/kafka-tunnel). Worked like a charm.
Changing the /etc/hosts file is NOT the right way.
Quoting Confluent blog post:
I saw a Stack Overflow answer suggesting to just update my hosts file…isn’t that easier?
This is nothing more than a hack to work around a misconfiguration instead of actually fixing it.
You need to set advertised.listeners (or KAFKA_ADVERTISED_LISTENERS if you’re using Docker images) to the external address (host/IP) so that clients can correctly connect to it. Otherwise, they’ll try to connect to the internal host address—and if that’s not reachable, then problems ensue.
Confluent blog post
Additionally you can have a look at this Pull Request on GitHub where I wrote an integration test to connect to Kafka via SSH. It should be easy to understand even if you don't know Golang.
There you have a full client and server example (see TestSSH). The test brings up actual Docker containers and runs assertions against them.
TL;DR I had to configure the KAFKA_ADVERTISED_LISTENERS when connecting over SSH so that the host advertised by each broker would be one reachable from the SSH host. This is because the client connects to the SSH host first and then from there it connects to a Kafka broker. So the host in the advertised.listeners must be reachable from the SSH server.

Having Kafka connected with ip as well as service name - Openshift

In our OpenShift ecosystem, we have a Kafka instance sourced from wurstmeister/kafka. As of now I am able to have Kafka accessible within the OpenShift system using the below parameters:
KAFKA_LISTENERS=PLAINTEXT://:9092
KAFKA_ADVERTISED_HOST_NAME=kafka_service_name
And of course, the params for the port and ZooKeeper are there.
I am able to access Kafka from the pods within the OpenShift system, but I am unable to access the Kafka service from the host machine, even though I am able to reach the Kafka pod using its IP and can telnet to the pod with telnet Pod_IP 9092.
When I try to connect using a Kafka producer from the host machine, I get the following error:
[2017-08-07 07:45:13,925] WARN Error while fetching metadata with correlation id 2 : {tls21=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
And when I try to connect with a Kafka consumer from the host machine using the IP, the output is blank.
Note: as of now, it's a single OpenShift server, and the use case is dev testing.
Maybe you want to take a look at this PoC for having Kafka on OpenShift?
https://github.com/EnMasseProject/barnabas