Consume from a Kafka Cluster through SSH Tunnel - apache-kafka

We are trying to consume from a Kafka Cluster using the Java Client. The Cluster is a behind a Jump host and hence the only way to access is through a SSH Tunnel. But we are not able read because once the consumer fetches metadata it uses the original hosts to connect to brokers. Can this behaviour be overridden? Can we ask Kafka Client to not use the metadata?

Not as far as I know.
The trick I used when I needed to do something similar was:
setup a virtual interface for each Kafka broker
open a tunnel to each broker so that broker n is bound to virtual interface n
configure your /etc/hosts file so that the advertised hostname of broker n is resolved to the ip of the virtual interface n.
Es.
Kafka brokers:
broker1 (advertised as broker1.mykafkacluster)
broker2 (advertised as broker2.mykafkacluster)
Virtual interfaces:
veth1 (192.168.1.1)
veth2 (192.168.1.2)
Tunnels:
broker1: ssh -L 192.168.1.1:9092:broker1.mykafkacluster:9092 jumphost
broker2: ssh -L 192.168.1.2:9092:broker1.mykafkacluster:9092 jumphost
/etc/hosts:
192.168.1.1 broker1.mykafkacluster
192.168.1.2 broker2.mykafkacluster
If you configure your system like this you should be able reach all the brokers in your Kafka cluster.
Note: if you configured your Kafka brokers to advertise an ip address instead of a hostname the procedure can still work but you need to configure the virtual interfaces with the same ip address that the broker advertises.

You don't actually have to add virtual interfaces to acces the brokers via SSH tunnel if they advertise a hostname. It's enough to add a hosts entry in /etc/hosts of your client and bind the tunnel to the added name.
Assuming broker.kafkacluster is the advertised.hostname of your broker:
/etc/hosts:
127.0.2.1 broker.kafkacluster
Tunnel:
ssh -L broker.kafkacluster:9092:broker.kafkacluster:9092 <brokerhostip/name>

Try sshuttle like this:
sshuttle -r user#host broker-1-ip:port broker-2-ip:port broker-3-ip:port
Of course, the list of broker depends on advertised listeners broker setting.

Absolutely best solution for me was to use kafkatunnel (https://github.com/simple-machines/kafka-tunnel). Worked like a charm.

Changing the /etc/hosts file is NOT the right way.
Quoting Confluent blog post:
I saw a Stack Overflow answer suggesting to just update my hosts file…isn’t that easier?
This is nothing more than a hack to work around a misconfiguration instead of actually fixing it.
You need to set advertised.listeners (or KAFKA_ADVERTISED_LISTENERS if you’re using Docker images) to the external address (host/IP) so that clients can correctly connect to it. Otherwise, they’ll try to connect to the internal host address—and if that’s not reachable, then problems ensue.
Confluent blog post
Additionally you can have a look at this Pull Request on GitHub where I wrote an integration test to connect to Kafka via SSH. It should be easy to understand even if you don't know Golang.
There you have a full client and server example (see TestSSH). The test is bringing up actual Docker containers and it runs assertions against them.
TL;DR I had to configure the KAFKA_ADVERTISED_LISTENERS when connecting over SSH so that the host advertised by each broker would be one reachable from the SSH host. This is because the client connects to the SSH host first and then from there it connects to a Kafka broker. So the host in the advertised.listeners must be reachable from the SSH server.

Related

unable to connect to kafka broker (via zookeeper) using Conduktor client

Able to connect successfully to local kafka broker/cluster running locally (dockerized) using Conduktor, but when trying to connect to Kafka cluster running on Unix VM, getting below error.
Error:
"The broker [...] is reachable but Kafka can't connect. Ensure you have access to the advertised listeners of the the brokers and the proper authorization"
Appreciate any assistance.
running locally (dockerized)
When running in docker, you need to ensure that the ports are accessible from outside of your container. To verify this, try doing a telnet <ip> <port> and check if you are able to connect.
Since the error message says, the broker is reachable, I suppose you would be able to successfully telnet to the broker.
Next, check your broker config called advertised.listeners. Here you need to mention your IP:Port combination where IP is what you will be giving in your client program i.e. Conduktor.
An example for that would be
advertised.listeners=PLAINTEXT://1.2.3.4:9092
and then restart your broker and reconnect. If you are using ssl then you need to provide some extra configuration. See Configuring Kafka brokers for more.
Try to add in /etc/hosts (Unix-like) or C:\Windows\System32\drivers\etc\hosts (windows-like) the Kafka server in such manner kafka_server_ip kafka_server_name_in_dns (e.g. 10.10.0.1 kafka).

Configuring kafka listeners

I have this question regarding configuring kafka listeners properties correctly -
listeners and advertised.listeners.
In my config I am setting below props:
listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092
The clients connect using u-kafkatst-kafkadev-5.sd.xxx.com:9092. Do I need to have the same value in listener and advertised.listeners. Here u-kafkatst-kafkadev-5.sd.xxx.com is a dns record that points to the host where kafka broker is running.
What are the situations where I would want to keep them same and different?
Thanks!
The advertised.listeners property is important if you are doing anything other than connecting to a broker directly on the same network. If you are using Docker, Kubernetes, IaaS (AWS, GCP, etc) then you need to expose the external address for the client to know where to connect to.
This article explains it all in depth.

How to reach Cloudera Kafka Broker on private network from outside?

I have a cluster inside a VPN which contains a server with private IP. I'm trying to set up a Kafka communication between an external server to my private server. My approach is to set an IP table where a public IP is pointing my private IP. Also, I opened the port 9092 and 9093 to make it reachable from outside. Now I am available to connect successfully to my server with the public IP from the external server.
telnet <public_ip> 9092
Connected to <public_ip>
My kafka broker is under a cloudera cluster and I created it with Cloudera Manager. The configuration is the following:
kafka.properties:
listeners=PLAINTEXT://<private_ip>:9092,SSL://<private_ip>:9093
advertised.listeners=PLAINTEXT://<private_ip>:9092,SSL://<private_ip>:9093
advertised.host.name:
<public_ip>
Using this broker configuration the comunication works perfectly inside the cluster either using the public_ip or private_ip of the kafka broker host.
What I see now is that I have a working broker that can be used with a public_ip and a external server that is able to reach the public_ip and it's required ports. But when I try to connect to the broker from a external server, I have the following error:
NO BROKERS AVAILABLE
There's no more information of the error. On my external server I have the kafka python package where I configure the producer as:
"bootstrap_servers": ["<publi_ip>:9092"]
on a existing TOPIC of my kafka broker.
Especifications:
private host
cloudera: CDH 5.12.0
kafka: kafka 2.2.0-1.2.2.0
zookeeper: Zookeeper 3.4.5
external host
kafka Python package: kafka-python==1.4.2
The problem is very similar to this post. But in this case he uses a forwarded port with public ip. Is any possibility to do it with ip tables? Anyone has managed to do it on a cloudera cluster?
Thank you in advance.
The question isn't specific to Cloudera or Python. And I don't think Cloudera Manager has some setting that'll set this up for you.
advertised.listeners will have to be a publicly resolvable address that can be used to access each broker individually by clients (e.g two brokers cannot have the same listener setting and be used from a port forward from the public address to the internal address)
Your setup is very similar to Kafka running in Docker or Cloud providers such as AWS, in that you're interacting over two networks, so refer to this blog for more information
Also, unless you setup some other firewall settings to prevent random access, don't expose brokers in the plaintext protocol

Is there a way to start a Zookeeper server using my static ip instead of localhost

I've started learning some big data tools for a new project, and right now I'm on Kafka and Zookeeper.
I have them both install on my local machine, and I can start them up and start producing and consuming messages just fine. Now, I want to try it having two machines, one with a kafka broker, zookeepr and a producer, and the other with a consumer. Lets call them Machine A and Machine B.
Machine A has runs the Zookeeper server, the broker and a producer. Machine B runs a consumer. From what I think I understand, I should be able to setup the consumer to listen to a topic from the producer on Machine A, using Zookeeper. Since both machines are on the same network (i.e. my local home network), I thought I could change the kafka broker server.properties to use my static ip address for Machine A, and then have the consumer on Machine B connect to it.
My problem, is that zookeeper keeps spinning up on localhost, and connecting to 0.0.0.0/0.0.0.0:2181 so when my broker tries to connect to it using my static ip address (i.e 192.168.x.x), it times out. I have looked all over for a solution, but I cannot find anything that tells me how to configure the Zookeeper sever to start on a different ip address.
Maybe my understanding of these technologies is simply wrong, but I thought this would be a fairly simple thing to do. Does anyone know any way to resolve this? Or else if I'm doing it completely wrong, what is the correct approach
zookeeper keeps spinning up on localhost, and connecting to 0.0.0.0/0.0.0.0:2181
Well, that is the bind address.
You need to also (preferably) have a static IP for Zookeeper, then set zookeeper.connect within the server.properties file of Kafka to reach to that other machine's external address.
From the Zookeeper configuration file, you would make sure you have the myid file and have a line in the property file that looks like this (without the double brackets)
server.{{ myid }}={{ ip_address }}:2888:3888
You wouldn't find this in the Kafka documentation, but it is in the Zookeeper documentation
However, if Kafka and Zookeeper are on the same machine, this isn't necessary.
Your external consumer should be setting bootstrap.servers property and the Kafka IP address(es) w/ port 9092.
Your problem might me related instead to the advertised.listeners setting within Kafka.
For example, start with listeners=PLAINTEXT://:9092
As of Zookeeper 3.3.0 (see Advanced Configuration):
clientPortAddress : New in 3.3.0: the address (ipv4, ipv6 or hostname)
to listen for client connections; that is, the address that clients
attempt to connect to. This is optional, by default we bind in such a
way that any connection to the clientPort for any
address/interface/nic on the server will be accepted
So you could use:
clientPortAddress=127.0.0.1

Unable to connect broker - kafka Tool

I am facing below error message when i was trying to connect and see the topic/consumer details of one of my kafka clusters we have.
we have 3 brokers in the cluster which I able to see but the topic and its partitions.
Note : I have kafka 1.0 and kafka tool version is 2.0.1
I had the same issue on my MacBook Pro. The tool was using "tshepo-mbp" as the hostname which it could not resolve. To get it to work I added 127.0.0.1 tshepo-mbp to the /etc/hosts file.
kafka tool is most likely using the hostname to connect to the broker and cannot reach it. You maybe connecting to the zookeeper host by IP address but make sure you can connect/ping the host name of the broker from the machine running the kafka tool.
If you cannot ping the broker either fix the network issues or as a workaround edit the host file on your client to let it know how to reach the broker by its name
This issue occurs if you have not set listeners and advertised.listeners property in server.properties file.
For Ex:
config/server.properties
...
listeners=PLAINTEXT://:9092
...
advertised.listeners=PLAINTEXT://<public-ip/host-name>:9092
...
To fix this issue, we need to change the server.properties file.
$ vim /usr/local/etc/kafka/server.properties
Here update the listeners value from
listeners=PLAINTEXT://:9092
to
listeners=PLAINTEXT://localhost:9092
source:https://medium.com/#Ankitthakur/apache-kafka-installation-on-mac-using-homebrew-a367cdefd273
For better visibility (even already commented the same in early days thread)
In my case, I got to know when I used Kafkatool from my local machine, tool tris to find out Kafka broker port which was blocked from my cluster admins for my local machine, that is the reason I was not able to connect.
Resolution:
Either ask the admin to open the port for intranet if they can, if they can not you can use tunnelling for your testing purpose or time being for your port.
Hope this would help a few.