How to connect to someone else's public Kafka Topic - apache-kafka

Apologies if this is a very basic question.
I'm just starting to get to grips with Kafka and have been given a Kafka endpoint and topic to push messages to, but I'm not actually sure where, when writing the consumer, to specify the endpoint. At the moment I've only created a consumer and producer for a broker running locally on my machine, so I was able to set the bootstrap server to my local host and port.
I have an inkling that it may be something to do with the advertised listeners settings but I am unsure how it works.
Again, sorry if this seems like a very basic question, but I couldn't find the answer.
Thank you!

Advertised listeners are a broker setting. If someone else set up Kafka, then all you need to do is change the bootstrap address.
If it's "public" over the internet, then chances are you will also need to configure certificates and authentication.
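For example, a minimal consumer configuration for a remote cluster might look like the sketch below. The host name, port, and SASL settings are placeholders; use whatever the cluster owner actually gives you, since the security mechanism varies per cluster:

```properties
# consumer.properties - point at the remote cluster instead of localhost
bootstrap.servers=broker1.example.com:9093
group.id=my-consumer-group
# Often required for clusters exposed over the internet:
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<user>" password="<password>";
```

Apart from `bootstrap.servers`, the consumer code itself is identical to what you ran locally.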

Connecting to a public cluster is the same as connecting to a local deployment.
I'm assuming that you are provided with the FQDN of the cluster and the topic name.
You need to add the FQDN to the bootstrap.servers property of your consumer and subscribe to the topic using the subscribe() method.
You might want to look into the client.dns.lookup property if you want to change the discovery strategy.
Additionally, you might have to configure a keystore and a truststore, depending on the security configuration of the cluster.
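As a side note on client.dns.lookup: with the use_all_dns_ips setting, the client tries every IP address the bootstrap name resolves to rather than only the first. You can see what a name resolves to with a small standard-library sketch (localhost here is just a stand-in for your cluster's FQDN):

```python
import socket

def resolve_all(host, port):
    """Return every distinct IP address the host name resolves to,
    i.e. the candidate set that client.dns.lookup=use_all_dns_ips would try."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

print(resolve_all("localhost", 9092))
```

If the FQDN fronts several brokers behind one DNS name, this is why use_all_dns_ips gives the client more than one connection candidate.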

Related

Can two independent bootstrap servers subscribe to the same Kafka topic?

I'm getting messages from an unknown origin in a Kafka topic in my dev environment and it's screwing things up. I'm guessing that our configuration is to blame but I'm not sure.
At work we share the same topic names and consumer group IDs across different environments, so in prod we have bootstrap server aws.our-prod.com, and in dev we have aws.our-dev.com. So basically two unconnected domains.
I didn't set this up, but the duplicate topic/group naming across environments seems awfully suspicious to me.
I think this is the problem. Is my hunch correct?
Check the ZooKeeper nodes of both environments' Kafka clusters and make sure they are different.
Check your DNS/host mapping; both domain names could be mapped to the same IP address (if you run locally, check the /etc/hosts file).
These scenarios can happen, and there can be more. But a topic is not shared between clusters in different environments by itself, for sure.
Another consumer may be listening with the group ID used by your project, or another project may be sending messages to your topic.
Basics:
Do not use the same Kafka cluster for the PROD and DEV environments.
If you still want to use the same Kafka cluster, then at least use a different group ID or topic name in PROD and DEV respectively.
Resolving the current issue:
PROD environment:
Consumers: change the topic from 't1' to 't2'; the group ID can stay the same.
Producers: change the topic from 't1' to 't2'.
Result: the unknown producer will keep sending to the old topic 't1'.
If that does not resolve it, then the problem is with your current set of producers.
No, duplicate properties are not an issue in themselves.
For example, in a Spring app, you'd define defaults at the top level
application.properties
topic=t
consumerGroup=g
And override profile/environment properties
application-dev.properties
spring.kafka.bootstrap-servers=dev-cluster:9092
application-prod.properties
spring.kafka.bootstrap-servers=prod-cluster:9092
This is very common, and there is nothing wrong with it.
It's possible you've got a producer somewhere in your environment that has a misconfigured (or altogether missing) production config file and has fallen back to the dev properties, or that defined its dev bootstrap servers in the non-environment-specific config; but as a consumer, or as the Kafka administrator, there's no way you'd know that.
Assuming you are able to consume the messages, and the producer's code is in source control, you can look for clues there; otherwise, you're effectively left looking for active network connections to the broker at the TCP level and doing packet analysis.
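The "defaults plus per-profile overrides" layering can be illustrated outside of Spring too. A minimal sketch, where the file names mirror Spring's convention but the merge logic is just plain dictionaries, not Spring's actual implementation:

```python
defaults = {            # application.properties
    "topic": "t",
    "consumerGroup": "g",
}
profiles = {            # application-<profile>.properties
    "dev":  {"spring.kafka.bootstrap-servers": "dev-cluster:9092"},
    "prod": {"spring.kafka.bootstrap-servers": "prod-cluster:9092"},
}

def effective_config(profile):
    """Profile-specific values override the shared defaults, which is the
    precedence Spring applies to application-<profile>.properties files."""
    return {**defaults, **profiles[profile]}

print(effective_config("dev"))
```

Note that both profiles still see the same topic and group ID from the defaults; only the bootstrap servers differ, which is exactly the setup described above.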

Right way to configure listeners property in kafka brokers

We have a cluster of 5 brokers and have configured server.properties as below
listeners=PLAINTEXT://kafka1:9092
advertised.listeners=PLAINTEXT://kafka1:9092
I have added entries like the below in the /etc/hosts file of all the brokers, producers, and consumers:
<private-ip-of-broker1> kafka1
This works for us for the most part and we don't have to remember private IPs of the bootstrap servers when configuring new consumers.
I would like to know if this is an okay way to communicate among kafka brokers and clients?
Since I am not a DevOps guy, I am not sure if this could potentially cause hidden problems. Please comment on this.
Another thing is that I am seeing random disconnections between Kafka brokers and clients, leading to different problems. I just want to rule out the possibility that this setup is somehow causing them.
I have added entries like below in /etc/hosts file of all the brokers, producers and consumers
This is NOT okay. Please do not do this.
If you cannot resolve the hosts via your bootstrap.servers property alone, then the listeners are not correct.
Please read this explanation of Kafka Listeners for all details you could want.
we don't have to remember private IPs of the bootstrap servers when configuring new consumers
You could use a service discovery tool to accommodate for this problem. Consul is a popular one, then you would just point at kafka.service.consul:9092 and it "just works" via the magic of DNS.
Or you could standardize on a Kafka client library that is already pre-configured with at least the bootstrap servers setting, then release this "library" internally to your developers for use.
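The "internal library" idea amounts to shipping a small factory that bakes in the bootstrap servers so application teams never type broker addresses. A hypothetical sketch, where the function name, the Consul service name, and the defaults are all assumptions for illustration:

```python
def base_kafka_config(overrides=None):
    """Company-wide client defaults; application teams only add
    what differs per app (group ID, serializers, and so on)."""
    config = {
        "bootstrap.servers": "kafka.service.consul:9092",  # resolved via Consul DNS
        "client.dns.lookup": "use_all_dns_ips",
    }
    if overrides:
        config.update(overrides)
    return config

consumer_config = base_kafka_config({"group.id": "billing-service"})
print(consumer_config["bootstrap.servers"])
```

The point is that a broker migration then means releasing a new version of this one helper, not chasing hard-coded addresses through every application.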

Configuring kafka listeners

I have this question regarding configuring kafka listeners properties correctly -
listeners and advertised.listeners.
In my config I am setting below props:
listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://u-kafkatst-kafkadev-5.sd.xxx.com:9092
The clients connect using u-kafkatst-kafkadev-5.sd.xxx.com:9092. Do I need to have the same value in listeners and advertised.listeners? Here u-kafkatst-kafkadev-5.sd.xxx.com is a DNS record that points to the host where the Kafka broker is running.
What are the situations where I would want to keep them same and different?
Thanks!
The advertised.listeners property is important if you are doing anything other than connecting to a broker directly on the same network. If you are using Docker, Kubernetes, or IaaS (AWS, GCP, etc.), then you need to advertise the external address so the client knows where to connect.
This article explains it all in depth.
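To make the distinction concrete, here is a sketch of a broker that must be reachable both from inside a private network and from outside; all host names are placeholders. listeners is what the broker binds to, while advertised.listeners is what it tells clients to dial back:

```properties
# Bind on all interfaces, one port per audience
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
# Internal clients are told the private name, external clients the public one
advertised.listeners=INTERNAL://broker1.internal:9092,EXTERNAL://broker1.example.com:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
inter.broker.listener.name=INTERNAL
```

When there is only one network between broker and clients, as in the question above, keeping both values the same (as you have) is perfectly fine.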

Unable to connect broker - kafka Tool

I am facing the below error message when trying to connect and see the topic/consumer details of one of our Kafka clusters.
We have 3 brokers in the cluster, which I am able to see, but not the topics and their partitions.
Note: I have Kafka 1.0 and Kafka Tool version 2.0.1.
I had the same issue on my MacBook Pro. The tool was using "tshepo-mbp" as the hostname, which it could not resolve. To get it to work I added 127.0.0.1 tshepo-mbp to the /etc/hosts file.
Kafka Tool is most likely using the hostname to connect to the broker and cannot reach it. You may be connecting to the ZooKeeper host by IP address, but make sure you can connect to/ping the hostname of the broker from the machine running Kafka Tool.
If you cannot ping the broker, either fix the network issues or, as a workaround, edit the hosts file on your client to let it know how to reach the broker by its name.
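A quick way to check whether the broker's advertised host name resolves from your machine at all; "localhost" here is just a placeholder for whatever name the tool reports:

```python
import socket

def can_resolve(host):
    """True if this machine can turn the host name into an IP address."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

print(can_resolve("localhost"))   # expected True everywhere
print(can_resolve("tshepo-mbp"))  # False unless it is mapped in /etc/hosts
```

If the advertised name does not resolve, no amount of client configuration will help until the name mapping (or the broker's advertised.listeners) is fixed.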
This issue occurs if you have not set the listeners and advertised.listeners properties in the server.properties file.
For Ex:
config/server.properties
...
listeners=PLAINTEXT://:9092
...
advertised.listeners=PLAINTEXT://<public-ip/host-name>:9092
...
To fix this issue, we need to change the server.properties file.
$ vim /usr/local/etc/kafka/server.properties
Here update the listeners value from
listeners=PLAINTEXT://:9092
to
listeners=PLAINTEXT://localhost:9092
Source: https://medium.com/@Ankitthakur/apache-kafka-installation-on-mac-using-homebrew-a367cdefd273
For better visibility (already mentioned in a comment earlier in this thread):
In my case, when I used Kafka Tool from my local machine, the tool tried to reach the Kafka broker port, which was blocked by my cluster admins for my local machine; that is the reason I was not able to connect.
Resolution:
Either ask the admins to open the port on the intranet if they can; if they cannot, you can use tunnelling for your testing purposes, or temporarily for your port.
Hope this helps a few.

Seamless Kafka broker lookup

I want to know if there is a best way/practice to look up the Kafka broker from the consumer/producer so that we don't need to change the consumer/producer code when the application is moved from one environment (like Dev) to another (like ST, UAT, Prod). Currently all the examples show that the consumer/producer needs to know the IP address and port of the Kafka brokers in the cluster.
Thanks in advance for suggestions and views.
You can use domain names in place of IP addresses in the Kafka configuration and then just change how the domain names resolve separately.
However, these parameters should not be hard-coded. They should be in properties files that can be edited without recompiling the apps.
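A minimal sketch of keeping the broker address out of the code: read it from an environment variable (or a properties file) with a fallback, so moving from Dev to Prod is a deployment change rather than a code change. The variable name KAFKA_BOOTSTRAP_SERVERS is an assumption for illustration, not a Kafka convention:

```python
import os

def bootstrap_servers(default="localhost:9092"):
    """Resolve the broker address from the environment so the same
    artifact works in Dev, UAT, and Prod without recompiling."""
    return os.environ.get("KAFKA_BOOTSTRAP_SERVERS", default)

# Deployment tooling would set this per environment:
os.environ["KAFKA_BOOTSTRAP_SERVERS"] = "kafka-prod.example.com:9092"
print(bootstrap_servers())  # -> kafka-prod.example.com:9092
```

Combined with environment-specific DNS (as suggested above), the code never needs to know which cluster it is talking to.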