Can two independent bootstrap servers subscribe to the same Kafka topic? - apache-kafka

I'm getting messages from an unknown origin in a Kafka topic in my dev environment and it's screwing things up. I'm guessing that our configuration is to blame but I'm not sure.
At work we share the same topic names and consumer group IDs across different environments: in prod the bootstrap server is aws.our-prod.com, and in dev it's aws.our-dev.com. So basically two unconnected domains.
I didn't set this up, but the duplicate topic/group naming across environments seems awfully suspicious to me.
I think this is the problem. Is my hunch correct?

Check the ZooKeeper nodes behind each environment's Kafka cluster and make sure they are different.
Check your DNS/host mappings: both domain names could be mapped to the same IP address. (If you run locally, check the /etc/hosts file.)
These scenarios can happen, and there may be more. But the same topic is definitely not shared between clusters in different environments.
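To rule out the DNS scenario, one can check what each bootstrap hostname actually resolves to (the hostnames below are the ones from the question):

```shell
# Compare what the two bootstrap hostnames resolve to; if both map to
# the same IP, "dev" clients may really be talking to prod (or vice versa).
getent hosts aws.our-prod.com
getent hosts aws.our-dev.com

# When running locally, also check for overriding entries:
grep -E 'our-(prod|dev)' /etc/hosts
```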

Other consumers may be listening with the group ID your project uses, or other projects may be sending messages to your topic.

Basics:
Do not use the same Kafka cluster for the PROD and DEV environments.
If you still want to use the same cluster, then at least use a different group ID or topic name in PROD and DEV respectively.
RESOLVING THE CURRENT ISSUE
PROD Environment
Consumers: change the topic from 't1' to 't2'; the group ID can stay the same.
Producers: change the topic from 't1' to 't2'.
Result: the unknown producer will keep sending to the old topic 't1'.
If that doesn't resolve it, the problem lies with your current set of producers.
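A sketch of that migration with the Kafka CLI (topic name, partition and replica counts are placeholders, and the commands assume a recent Kafka version):

```shell
# Create the new topic 't2' on the prod cluster, then repoint your own
# producers and consumers to it.
kafka-topics --bootstrap-server aws.our-prod.com:9092 --create \
  --topic t2 --partitions 6 --replication-factor 3

# Afterwards, watch 't1': anything still arriving there must come from
# the unknown producer.
kafka-console-consumer --bootstrap-server aws.our-prod.com:9092 --topic t1
```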

No, duplicate properties are not an issue in themselves.
For example, in a Spring app, you'd define defaults at the top level:
application.properties:
    topic=t
    consumerGroup=g
And override them with profile/environment-specific properties:
application-dev.properties:
    spring.kafka.bootstrap-servers=dev-cluster:9092
application-prod.properties:
    spring.kafka.bootstrap-servers=prod-cluster:9092
This is very common, and there is nothing wrong with it.
It's possible that a producer somewhere in your environment has a misconfigured (or altogether missing) production config file and has fallen back to the dev properties, or has defined its dev bootstrap servers in the non-environment-specific config. But as a consumer, or as the Kafka administrator, there's no way you'd know that.
Assuming you are able to consume the messages, and the producer's code is in source control, you can look for trails there; otherwise, you're effectively left looking for active network connections to the broker at the TCP level and doing packet analysis.
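At the TCP level, something like the following can show who is connected (these commands assume Linux, a broker listening on port 9092, and root access for packet capture):

```shell
# On the broker host: list established connections to the Kafka port,
# including the remote address of each connected client.
ss -tn state established '( sport = :9092 )'

# Capture traffic on that port for offline packet analysis (stop with Ctrl-C).
tcpdump -i any -w kafka-clients.pcap port 9092
```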

Related

How to connect to someone else's public Kafka Topic

Apologies if this is a very basic question.
I'm just starting to get to grips with Kafka and have been given a Kafka endpoint and topic to push messages to, but I'm not actually sure where, when writing the consumer, to specify the endpoint. At the moment I've only had experience creating a consumer for a broker and producer running locally on my machine, so I was able to do this by setting the bootstrap server to my localhost and port.
I have an inkling that it may be something to do with the advertised listeners settings but I am unsure how it works.
Again sorry if this seems like a very basic question but I couldn't find the answer
Thank you!
Advertised listeners are a broker setting. If someone else set up Kafka, then all you need to do is change the bootstrap address.
If it's "public" over the internet, then chances are you might also need to configure certificates and authentication.
Connecting to a public cluster is the same as connecting to a local deployment.
I'm assuming that you are provided with an FQDN for the cluster and the topic name.
You need to add the FQDN to the bootstrap.servers property of your consumer and subscribe to the topics using subscribe().
You might want to look into the client.dns.lookup property if you want to change the discovery strategy.
Additionally, you may have to configure a keystore and a truststore, depending on the security configuration of the cluster.
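As a sketch, a consumer configuration for a TLS-secured public cluster might look like the following (hostname, port and file paths are placeholders; the exact settings depend on what the cluster operator requires):

```properties
bootstrap.servers=kafka.example-provider.com:9093
security.protocol=SSL
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=changeit
# If SASL authentication is also required:
# security.protocol=SASL_SSL
# sasl.mechanism=PLAIN
# sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user" password="secret";
```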

How to convince Kafka brokers to assign workers to my topic?

We've got a (Confluent) managed Kafka installation with 5 brokers and 2 connector hosts. I have two topics which never get any consumers assigned to them, despite repeated starting and stopping of the connectors that are supposed to listen to them. This configuration was running until recently (and no, nothing has changed; we've done an audit to confirm).
What, if anything, can I do to force assignment of consumers to these topics?
This problem occurred for two reasons: (1) I had mistakenly installed new versions of the PostgreSQL and SQL Server JDBC connectors (which conflicted with the already-installed versions), and (2) I did not understand that source connectors do not have consumer groups assigned to them.
Getting past the "well, duh, of course sources don't have consumers", another vital piece of information (thank you to the support team at Confluent for this) is that you can see where your connector is up to by reading the internal topic connect-offsets. Once you have that, you can check your actual DB query, see what it is returning, and then (if necessary) reset your connector's offset.
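For reference, the connect-offsets topic can be read with the console consumer (the broker address is a placeholder; printing the key shows which connector each offset record belongs to):

```shell
kafka-console-consumer --bootstrap-server broker:9092 \
  --topic connect-offsets \
  --from-beginning \
  --property print.key=true
```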

Right way to configure listeners property in kafka brokers

We have a cluster of 5 brokers and have configured server.properties as below
listeners=PLAINTEXT://kafka1:9092
advertised.listeners=PLAINTEXT://kafka1:9092
I have added entries like below in /etc/hosts file of all the brokers, producers and consumers
"Private:IP:kafka:broker1" kafka1
This works for us for the most part, and we don't have to remember the private IPs of the bootstrap servers when configuring new consumers.
I would like to know if this is an okay way for Kafka brokers and clients to communicate.
Since I am not a DevOps guy, I am not sure whether this could cause hidden problems. Please comment on this.
Another thing is that I am seeing random disconnections between Kafka brokers and clients, leading to different problems. I just want to rule out the possibility that this setup is somehow the cause.
I have added entries like below in /etc/hosts file of all the brokers, producers and consumers
This is NOT okay. Please do not do this
If you cannot resolve the hosts via your bootstrap.servers property alone, then the listeners are not correct.
Please read this explanation of Kafka Listeners for all details you could want.
we don't have to remember private IPs of the bootstrap servers when configuring new consumers
You could use a service discovery tool to accommodate for this problem. Consul is a popular one, then you would just point at kafka.service.consul:9092 and it "just works" via the magic of DNS.
Or you could standardize on a Kafka client library that is already pre-configured with at least the bootstrap servers setting, and release this "library" internally for your developers to use.
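A minimal sketch of such an internal "wrapper" library, which bakes in the per-environment bootstrap servers so application teams never hard-code broker addresses (the environment names and addresses here are made up for illustration):

```python
# Hypothetical internal client library: the bootstrap servers come
# from the library, not from application code.
_BOOTSTRAP_BY_ENV = {
    "dev": "dev-cluster:9092",
    "prod": "prod-cluster:9092",
}

def kafka_client_config(environment, overrides=None):
    """Return a base client config for the given environment.

    Callers can add settings such as group.id, but they cannot
    accidentally point a dev app at the prod cluster.
    """
    if environment not in _BOOTSTRAP_BY_ENV:
        raise ValueError("unknown environment: %s" % environment)
    config = {"bootstrap.servers": _BOOTSTRAP_BY_ENV[environment]}
    config.update(overrides or {})
    return config

print(kafka_client_config("dev", {"group.id": "g"}))
# {'bootstrap.servers': 'dev-cluster:9092', 'group.id': 'g'}
```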

Which Kafka broker configuration gets precedence when there's a conflict?

In Apache Kafka, each broker has its own configuration file. Some of the config entries, such as broker ID, are evidently unique to each node.
However, others such as topic retention time or maximum message size should be global to the entire cluster.
In case two brokers have conflicting configurations, which value gets precedence? Or am I wrong to assume that some config entries should be global?
Kafka does not check that each broker has exactly the same configuration.
That said, as you've pointed out, some settings could conflict, and in that case my guess is at best a crash, or worse, undefined behaviour!
There is KIP-226 in progress that addresses some of these issues, but if you're going to deploy many brokers it's recommended to use some automation (Kubernetes, Mesos) to ensure the configuration is consistent across all of them.
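A small sketch of such a consistency check: given each broker's parsed server.properties as a dict, report cluster-wide settings whose values disagree between brokers (the sample list of cluster-wide setting names is illustrative, not exhaustive):

```python
# Settings that should be identical on every broker (sample only).
CLUSTER_WIDE = {"log.retention.hours", "message.max.bytes"}

def find_conflicts(broker_configs):
    """Return {setting: {broker: value}} for cluster-wide settings
    that have different values on different brokers."""
    conflicts = {}
    for setting in sorted(CLUSTER_WIDE):
        values = {broker: conf[setting]
                  for broker, conf in broker_configs.items()
                  if setting in conf}
        if len(set(values.values())) > 1:
            conflicts[setting] = values
    return conflicts

configs = {
    "broker1": {"broker.id": "1", "log.retention.hours": "168"},
    "broker2": {"broker.id": "2", "log.retention.hours": "72"},
}
print(find_conflicts(configs))
# {'log.retention.hours': {'broker1': '168', 'broker2': '72'}}
```

Per-broker settings such as broker.id are deliberately excluded, since they are supposed to differ.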

Seamless Kafka broker lookup

I want to know whether there is a best way/practice to look up the Kafka broker from a consumer/producer, so that we don't need to change the consumer/producer code when the application moves from one environment (like dev) to another (like ST, UAT, prod). Currently all the examples show that the consumer/producer needs to know the IP address and port number of a Kafka broker in the cluster.
Thanks in advance for suggestions and views.
You can use domain names in place of IP addresses in the Kafka configuration and then just change how the domain names resolve separately.
However, these parameters should not be hard-coded. They should live in properties files that can be edited without recompiling the apps.
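A sketch of that idea: load the broker address from an external properties file at startup, so moving between environments only means swapping the file. This is a minimal key=value parser (it ignores comments and blank lines and does not handle escapes or line continuations); the file path and key name are examples.

```python
def load_properties(path):
    """Minimal parser for key=value properties files."""
    props = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Usage: the consumer/producer reads the address at startup, e.g.
# props = load_properties("/etc/myapp/kafka.properties")
# bootstrap = props["bootstrap.servers"]
```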