Seamless Kafka broker lookup - apache-kafka

I want to know whether there is a best practice for looking up the Kafka broker from a consumer/producer, so that we don't need to change the consumer/producer code when the application moves from one environment (like Dev) to another (like ST, UAT, Prod). Currently all the examples show that the consumer/producer needs to know the IP address and port number of the Kafka brokers in the cluster.
Thanks in advance for suggestions and views.

You can use domain names in place of IP addresses in the Kafka configuration and then just change how the domain names resolve separately.
However, these parameters should not be hard-coded. They should be in properties files that can be edited without recompiling the apps.
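As a minimal sketch of that idea, the class below reads client settings from an external file using only `java.util.Properties`. The class name `KafkaClientConfig`, the `kafka.config` system property, and the default file name are assumptions made for illustration, not part of any Kafka API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class KafkaClientConfig {
    // Parses client settings and fails fast if the broker address is missing.
    static Properties read(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (props.getProperty("bootstrap.servers") == null) {
            throw new IllegalStateException("bootstrap.servers is not set");
        }
        return props;
    }

    // Loads the settings from a file that lives outside the compiled application.
    static Properties load(Path file) {
        try (Reader reader = Files.newBufferedReader(file)) {
            return read(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical convention: each environment starts the app with
        // -Dkafka.config=/path/to/its/own/file.properties
        Path file = Paths.get(System.getProperty("kafka.config", "kafka-client.properties"));
        Properties props = load(file);
        System.out.println("Connecting to " + props.getProperty("bootstrap.servers"));
    }
}
```

The resulting `Properties` object can be passed to a Kafka producer or consumer constructor, so moving between environments only means pointing at a different file.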

Related

How to add blacklist IPs in Apache kafka?

I am sending some Filebeat data to Kafka. I want Kafka to accept data only from specified IPs. Can anyone tell me how to configure Kafka so that it only accepts the IPs I specify?
A better way would probably be to implement Access Control Lists (ACLs). That way if your filebeat process moves servers you don't have to arbitrarily change the accepted IP list on the Kafka machines.
However, if you actually want to create an accept-list of IPs, this isn't a Kafka feature but something you'd implement at the networking layer on your Kafka machine, with a rule to accept traffic from certain hosts to the Kafka port. For example, I found this iptables guide which shows how to accept traffic for a given service (SSH in the example, but you could amend it to Kafka) only from a particular IP.

Can two independent bootstrap servers subscribe to the same Kafka topic?

I'm getting messages from an unknown origin in a Kafka topic in my dev environment and it's screwing things up. I'm guessing that our configuration is to blame but I'm not sure.
At work we share the same topic name and consumer group IDs across different environments, so that in prod we have the bootstrap server aws.our-prod.com, and in dev we have aws.our-dev.com. So basically two unconnected domains.
I didn't set this up, but the duplicate topic/group naming across environments seems awfully suspicious to me.
I think this is the problem. Is my hunch correct?
Check the ZooKeeper nodes used by the Kafka brokers in both environments and make sure they are different.
Check your DNS mapping/host mapping; both domain names could be mapped to the same IP address. (If you run locally, check the /etc/hosts file.)
Those scenarios can happen, and there may be others. But the same topic is certainly not shared between separate clusters in different environments.
Other consumers may be listening with the group ID used by your project, or other projects may be sending messages to your topic.
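The DNS check mentioned above can be sketched with the JDK's `InetAddress`. The class and method names here are made up for illustration; the two hostnames are passed on the command line.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.TreeSet;

final class BootstrapDnsCheck {
    // Resolves a hostname to the set of IP address strings it maps to.
    static Set<String> resolve(String host) throws UnknownHostException {
        Set<String> ips = new TreeSet<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            ips.add(address.getHostAddress());
        }
        return ips;
    }

    // Returns the addresses present in both sets; a non-empty result means
    // the two names lead to at least one common machine.
    static Set<String> overlap(Set<String> first, Set<String> second) {
        Set<String> shared = new TreeSet<>(first);
        shared.retainAll(second);
        return shared;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Usage: java BootstrapDnsCheck.java <dev-host> <prod-host>
        Set<String> shared = overlap(resolve(args[0]), resolve(args[1]));
        System.out.println(shared.isEmpty()
                ? "No shared addresses"
                : "Both names resolve to: " + shared);
    }
}
```

An empty overlap does not prove the clusters are separate (a load balancer or proxy could still sit in between), but a non-empty one is a strong hint that both names reach the same brokers.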
Basics:
Do not use the same Kafka cluster for the PROD and DEV environments.
If you still want to use the same Kafka cluster, then at least use a different group ID or topic name on PROD and DEV respectively.
RESOLVING CURRENT ISSUE
PROD Environment
Consumers: change topic from 't1' to 't2', groupid can be same.
Producers: change topic from 't1' to 't2'.
Result: the unknown producer will keep sending to the old topic 't1'.
If the issue is still not resolved, then the problem is with the current set of producers.
No, duplicate properties are not an issue in themselves.
For example, in a Spring app, you'd define defaults at the top level:
application.properties
topic=t
consumerGroup=g
And override them in the profile/environment-specific properties:
application-dev.properties
spring.kafka.bootstrap-servers=dev-cluster:9092
application-prod.properties
spring.kafka.bootstrap-servers=prod-cluster:9092
And this is very common, and nothing wrong with it.
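To illustrate the defaults-plus-override idea outside Spring, here is a small sketch using plain `java.util.Properties`. It is not how Spring resolves profiles internally; the class name `LayeredConfig` is hypothetical, and the keys and values are the ones from the files above.

```java
import java.util.Properties;

final class LayeredConfig {
    // Builds a view where environment-specific values win and anything
    // not overridden falls back to the shared defaults.
    static Properties withOverrides(Properties defaults, Properties overrides) {
        // The constructor argument is only consulted by getProperty()
        // when a key is missing from the new object itself.
        Properties merged = new Properties(defaults);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("topic", "t");
        defaults.setProperty("consumerGroup", "g");

        Properties dev = new Properties();
        dev.setProperty("spring.kafka.bootstrap-servers", "dev-cluster:9092");

        Properties effective = withOverrides(defaults, dev);
        System.out.println(effective.getProperty("topic"));
        System.out.println(effective.getProperty("spring.kafka.bootstrap-servers"));
    }
}
```

The topic and group names are identical in every environment; only the bootstrap address decides which cluster the client talks to.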
It's possible that a producer somewhere in your environment has a misconfigured (or altogether missing) production config file and has fallen back to the dev properties, or has its dev bootstrap servers defined in the non-environment-specific config. As a consumer or the Kafka administrator, though, there's no way you'd know that.
Assuming you are able to consume the messages and the producer's code is in source control, you can look for clues there. Otherwise, you're effectively left looking for active network connections to the broker at the TCP level and doing packet analysis.

How to connect to someone else's public Kafka Topic

Apologies if this is a very basic question.
I'm just starting to get to grips with Kafka and have been given a Kafka endpoint and topic to push messages to, but I'm not actually sure where, when writing the consumer, to specify the endpoint. At the moment I've only had experience creating a consumer for a broker and producer running locally on my machine, so I was able to do this by setting the bootstrap server to my localhost and port.
I have an inkling that it may be something to do with the advertised listeners settings but I am unsure how it works.
Again sorry if this seems like a very basic question but I couldn't find the answer
Thank you!
Advertised listeners are a broker setting. If someone else set up Kafka, then all you need to do is change the bootstrap address.
If it's "public" over the internet, then chances are you might also need to configure certificates & authentication
Connecting to a public cluster is the same as connecting to a local deployment.
I'm assuming that you are provided with an FQDN of the cluster and the topic name.
You need to add the FQDN to the bootstrap.servers property of your consumer and subscribe to the topics using subscribe().
You might want to look into the client.dns.lookup property if you want to change the discovery strategy.
Additionally, you might have to configure a keystore and a truststore, depending on the security configuration of the cluster.
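A rough sketch of what such a consumer configuration could look like is shown below. The property keys are standard Kafka client settings, but the class name, the endpoint, the group ID, and the truststore path are placeholders, and whether SSL is needed at all depends on how the cluster owner has set it up.

```java
import java.util.Properties;

final class RemoteConsumerProps {
    // Collects the settings a consumer typically needs for a remote,
    // TLS-protected cluster. All argument values are supplied by the caller.
    static Properties build(String bootstrapFqdn, String groupId,
                            String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapFqdn);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Try every IP address the bootstrap name resolves to.
        props.setProperty("client.dns.lookup", "use_all_dns_ips");
        // Assumption: the cluster expects TLS without client authentication.
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute what the cluster owner gave you.
        Properties props = build("kafka.example.com:9093", "my-group",
                "/path/to/truststore.jks", "changeit");
        props.list(System.out);
    }
}
```

With the Kafka client library on the classpath, these properties would be handed to `new KafkaConsumer<>(props)`, followed by a `subscribe()` call with the topic name.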

Right way to configure listeners property in kafka brokers

We have a cluster of 5 brokers and have configured server.properties as below
listeners=PLAINTEXT://kafka1:9092
advertised.listeners=PLAINTEXT://kafka1:9092
I have added entries like below in /etc/hosts file of all the brokers, producers and consumers
"Private:IP:kafka:broker1" kafka1
This works for us for the most part and we don't have to remember private IPs of the bootstrap servers when configuring new consumers.
I would like to know if this is an okay way to communicate among kafka brokers and clients?
Since I am not a DevOps guy, I am not sure if this could potentially cause hidden problems. Please comment on this.
Another thing is that I am seeing random disconnections among Kafka broker and clients leading to different problems. I just want to clear the possibility that this is somehow causing problems.
I have added entries like below in /etc/hosts file of all the brokers, producers and consumers
This is NOT okay. Please do not do this
If you cannot resolve the hosts via your bootstrap.servers property alone, then the listeners are not correct.
Please read this explanation of Kafka Listeners for all details you could want.
we don't have to remember private IPs of the bootstrap servers when configuring new consumers
You could use a service discovery tool to address this problem. Consul is a popular one; you would then just point at kafka.service.consul:9092 and it "just works" via the magic of DNS.
Alternatively, you could standardize on a Kafka client library that is already pre-configured with at least the bootstrap servers setting, and release this "library" internally for your developers to use.
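One possible shape for such an internal library is sketched here. Everything except the Consul DNS name from the answer is an assumption: the class name, the `KAFKA_BOOTSTRAP_SERVERS` override variable, and the choice to expose plain `Properties`.

```java
import java.util.Map;
import java.util.Properties;

final class InternalKafkaDefaults {
    // Service-discovery name taken from the answer above.
    static final String DEFAULT_BOOTSTRAP = "kafka.service.consul:9092";

    // Lets an environment variable (hypothetical name) override the default,
    // which is handy for local testing.
    static String bootstrapServers(Map<String, String> env) {
        String override = env.get("KAFKA_BOOTSTRAP_SERVERS");
        return (override == null || override.isEmpty()) ? DEFAULT_BOOTSTRAP : override;
    }

    // Developers only supply what is specific to their application.
    static Properties consumerProps(String groupId, Map<String, String> env) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers(env));
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        consumerProps("demo-group", System.getenv()).list(System.out);
    }
}
```

Application teams then depend on this artifact and never hard-code broker addresses or /etc/hosts entries themselves.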

ActiveMQ Deployment model

I have gone through (not fully) ActiveMQ and tried to figure out the deployment model for my application.
I am a bit confused about that.
I want to make the system highly available and decided to use the following. Please correct me if anything is wrong, or point out any disadvantages of the model.
Deployment model:
Will deploy brokers on M1 and M2 respectively.
Use a hardware load balancer (either F5 or Zeus) to connect to one of the brokers (M1 or M2) based on the load.
Want to publish messages using the load balancer URL.
I have gone through network of brokers, and we would need to maintain some topology. I feel that makes the system more complicated if it grows horizontally, so it is better to have one load balancer to distribute the load.
Questions
Will the above model send a message to either one of the brokers?
The consumer will be deployed in Tomcat (I think I need to use embedded brokers configured for either M1 or M2). Is it possible to use the load balancer URL instead of M1 or M2?
Is it possible to have a single Web Console admin to monitor both M1 and M2?
Is there any performance issue with using Spring's features to consume messages?
Sorry to shoot out so many questions. Please help me to correct the deployment model.
I think the best way to get some load balancing across several ActiveMQ servers is to have a network of brokers, and your consumers/producers (in your webapps) should use failover.
So if a producer p1 sends a message to a queue on broker 1, the consumer c1 can read the message on broker 2.
[Edit] I have never tried using a hardware load balancer instead of the ActiveMQ failover protocol. It should work: just try it and tell us.
3- I do not think it is possible to have only one Web Console to monitor both of your brokers.
4- As far as I am concerned I do not have any performance issue with my Spring configuration.
There are a lot of questions there.
The first thing to do is start simple. If your application's load is being handled with just one broker, consider setting up high availability through a master-slave setup. For this you do not need a load balancer - the ActiveMQ client library has a failover mechanism where you can define the URLs to a set of brokers that the client should attempt to connect to.
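As a small illustration of the failover mechanism, the helper below assembles a failover transport URI from a list of brokers. The `failover:(...)` syntax is ActiveMQ's; the helper class itself, the host names m1 and m2, and the port 61616 (ActiveMQ's usual OpenWire port) are assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FailoverUri {
    // Turns ["m1:61616", "m2:61616"] into
    // "failover:(tcp://m1:61616,tcp://m2:61616)".
    static String of(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one broker is required");
        }
        return hostPorts.stream()
                .map(hostPort -> "tcp://" + hostPort)
                .collect(Collectors.joining(",", "failover:(", ")"));
    }

    public static void main(String[] args) {
        // Hypothetical hosts standing in for the M1 and M2 machines.
        System.out.println(of(Arrays.asList("m1:61616", "m2:61616")));
    }
}
```

With the ActiveMQ client on the classpath, the resulting string would be used as the broker URL of an `ActiveMQConnectionFactory`, and the client reconnects to another listed broker if the current one goes away.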
If you are looking at setting up an infrastructure where one broker will not be able to deal with the message load (you can test the maximum throughput of your broker using the performance module), I would advise you to read up on how networks of brokers work. If you do go down this path, you really need to understand ActiveMQ.
On monitoring, a web console can only show you the internals of a single broker. To get insight around what is going on around a set of brokers you will need a monitoring tool such as FuseHQ/Hyperic that is able to aggregate JMX information from a number of boxes.
Performance with Spring is not a problem as long as you configure it correctly (see the section on PooledConnectionFactory).
I see that you are a new user, so if this answers your question, please tick it.