I am receiving a huge amount of traffic on one of my topics and am unable to identify the source. Is there any way to find the IP or domain of the producer sending the traffic, from Kafka/ZooKeeper, using metrics or commands?
I have to produce messages based on virtual IPs (those IPs all target the same Kafka cluster behind them).
So I need to extract the IP from the URL (producer request) to route the message to a specific topic before the message is persisted to Kafka.
**Example**
Static IPs available on the host machine:
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5
Destination topics
Dest02 for IP 192.168.0.2
Dest03 for IP 192.168.0.3
Dest04 for IP 192.168.0.4
Dest05 for IP 192.168.0.5
So I publish record 001 to topicA in the service (virtual IP set in the producer config: 192.168.0.2)
=> record 001 is routed to the Dest02 destination topic
If you wonder why I want to route my messages this way, it's because I cannot change the upstream service (producer) or the downstream services (consumers).
One more thing: I need to base this logic on the virtual IP, as it is the discriminator I use to make the decision. Otherwise I would not be able to know where to route my message.
Thanks for your help
I am investigating SMTs with an HTTP source connector to try to catch the message before it's written to the Kafka brokers, but maybe that's not a good approach.
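Whatever component ends up doing the routing (an SMT, a small proxy, or something else), the decision itself is a lookup from virtual IP to destination topic. A minimal sketch of that lookup follows; the function name `destination_topic` is hypothetical, and the Dest04 and Dest05 pairings are assumed to follow the same pattern as the two spelled out in the example.

```python
import ipaddress

# Mapping taken from the example above. The Dest04 and Dest05 pairings are
# an assumption: the question only spells out the first two.
ROUTES = {
    "192.168.0.2": "Dest02",
    "192.168.0.3": "Dest03",
    "192.168.0.4": "Dest04",
    "192.168.0.5": "Dest05",
}


def destination_topic(virtual_ip: str) -> str:
    """Return the destination topic for the virtual IP a record arrived on."""
    # ip_address() raises ValueError if the string is not a valid IP address.
    ip = str(ipaddress.ip_address(virtual_ip.strip()))
    try:
        return ROUTES[ip]
    except KeyError:
        raise LookupError(f"no destination topic configured for {ip}") from None
```

With this mapping, `destination_topic("192.168.0.2")` returns `"Dest02"`, and an IP that is not in the table raises `LookupError` rather than silently picking a default topic.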
I am sending some Filebeat data to Kafka. What I want is for Kafka to accept data only from specified IPs. Can anyone tell me how I can configure Kafka to accept traffic only from the IPs I specify?
A better way would probably be to implement Access Control Lists (ACLs). That way if your filebeat process moves servers you don't have to arbitrarily change the accepted IP list on the Kafka machines.
However, if you actually want to create an accept-list of IPs, this isn't a Kafka feature but something you'd implement at the networking layer on your Kafka machine, with a rule to accept traffic from certain hosts to the Kafka port. For example, I found this iptables guide which shows how to accept traffic for a given service (SSH in the example, but you could amend it to Kafka) only from a particular IP.
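As a rough sketch of what such an accept-list could look like, the helper below builds the iptables commands for a set of allowed source IPs followed by a final rule that drops everything else on the Kafka port. The function name `kafka_accept_rules`, the port 9092, and the choice of the INPUT chain are assumptions for illustration; it only builds the argument lists and does not run them.

```python
import ipaddress

KAFKA_PORT = 9092  # assumed: the port your Kafka listener is bound to


def kafka_accept_rules(allowed_ips, port=KAFKA_PORT):
    """Build iptables argument lists: accept listed IPs, drop everyone else."""
    rules = []
    for ip in allowed_ips:
        # IPv4Address() raises ValueError on a malformed address
        # (iptables handles IPv4 only; IPv6 would need ip6tables).
        addr = ipaddress.IPv4Address(ip)
        rules.append([
            "iptables", "-A", "INPUT", "-p", "tcp",
            "-s", str(addr), "--dport", str(port), "-j", "ACCEPT",
        ])
    # Rules are evaluated in order, so the catch-all DROP must come last.
    rules.append([
        "iptables", "-A", "INPUT", "-p", "tcp",
        "--dport", str(port), "-j", "DROP",
    ])
    return rules
```

Printing `" ".join(rule)` for each entry gives commands you can review before applying them. Remember that the other brokers in the cluster also need to reach this port, so their addresses belong in the allowed list too.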
We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios, we have to change all client configurations, a time-consuming and stressful operation.
I think we can set up a config server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses centrally, so that we only have to change them in one place without touching all the clients, but I don't know if it is the best approach. Obviously, the clients must be programmed to get the broker list from the config server.
Is there a better approach?
It's worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients. Only one reachable address from the list is needed for the initial connection; after that, the advertised.listeners values from the broker configs in the cluster are what the clients actually use.
The answer to your question is yes, use service discovery. That could be Spring Cloud Config, but the more general option would be HashiCorp Consul or another service that uses DNS (Kubernetes uses CoreDNS by default, for example, or AWS Route 53).
Then you edit the /etc/resolv.conf of each machine (assuming Linux) the client is running on to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
You could use a load balancer (with a friendly DNS name like kafka.domain.com) that points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
As soon as you add new brokers, you only change the load balancer endpoints and not the client configuration.
Additionally please note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.
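To illustrate why one reachable bootstrap address is enough, here is a small sketch that walks a bootstrap.servers-style string and returns the first entry that accepts a TCP connection. This is not how the Kafka client is implemented (real clients choose their own order and then fetch cluster metadata); the function names `parse_bootstrap` and `first_reachable` are hypothetical.

```python
import socket


def parse_bootstrap(servers: str):
    """Split 'host1:port1,host2:port2' into a list of (host, port) tuples."""
    pairs = []
    for entry in servers.split(","):
        # rpartition splits on the last ':'; bracketed IPv6 is not handled here.
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def first_reachable(servers: str, timeout: float = 2.0):
    """Return the first (host, port) that accepts a TCP connection, else None."""
    for host, port in parse_bootstrap(servers):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            # Refused, timed out, or unresolvable: try the next entry.
            continue
    return None
```

A check like this is also handy as a quick smoke test that a load balancer or DNS name in front of the cluster actually leads to a live broker port.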
I currently configure bootstrap.servers in my Kafka producer/consumer/streams apps with a list of broker IPs. But I'd like to move to a single URL entry that is resolved by DNS lookup to the IP of a broker currently known to be up (the DNS actively checks the brokers in the cluster and responds to lookups with an IP and a short TTL [10s]). This gives me more flexibility to add brokers in the future, and I can keep the same config in my apps across all environments/stages. Is this a recommended approach, or does it remove resiliency on the client side by not having a strict list of brokers? I assume this config would only be used to initially "discover" the cluster and the partition leader brokers.
If anything, I'd say this adds a single point of failure on the single address you're providing, unless it's actually a load-balanced reverse proxy.
Another possibility that's worked somewhat well internally is using Consul service discovery, with Consul agents running on each broker. This way, you can do service discovery as well as health checks and easier monitoring setup, e.g. having Prometheus jmx_exporter on the brokers, and Prometheus Server scraping those values for all kafka.service.consul addresses.
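Before relying on a single DNS name for bootstrap.servers, it can help to see which addresses that name currently resolves to. The sketch below lists every IPv4 address the resolver returns; the function name `resolve_all` is hypothetical. Kafka clients have a client.dns.lookup setting that controls whether all returned addresses are tried, so check the documentation for your client version.

```python
import socket


def resolve_all(host: str, port: int) -> list:
    """Return the sorted, de-duplicated IPv4 addresses that host resolves to."""
    # getaddrinfo raises socket.gaierror if the name cannot be resolved.
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return sorted({sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in infos})
```

If the name only ever returns one address, the concern about a single point of failure applies directly; if it returns several, the client has alternatives to fall back on.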
I have 3 Kafka brokers running in an isolated network region. My client cannot connect to them directly, so I have to use VIPs (virtual IPs) to connect to the brokers.
For example:
My brokers' IPs are: 10.5.1.5, 10.5.1.6, 10.5.1.7.
My VIPs are: 200.100.1.5, 200.100.1.6, 200.100.1.7. They are paired one to one with the brokers.
So when I set the bootstrap list to 200.100.1.5, the cluster responds with a mix of VIPs and broker IPs, such as 10.5.1.5, 10.5.1.6, 200.100.1.5, 200.100.1.6 ..., and then the connection fails, because my program cannot reach the brokers' IPs, only the VIPs.
My current configuration is as follows; it returns both the IP and the VIP:
listeners=INTERNAL://:9092,EXTERNAL_PLAINTEXT://:8080
advertised.listeners=EXTERNAL_PLAINTEXT://200.100.1.5:8080,INTERNAL://10.5.1.5:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL_PLAINTEXT:PLAINTEXT
inter.broker.listener.name=INTERNAL
How can I make Kafka respond with only the VIP list?
I've got the answer; it could be the following:
advertised.listeners=PLAINTEXT://200.100.1.5:8080
listeners=PLAINTEXT://10.5.1.5:9092
And remove the listener.security.protocol.map and inter.broker.listener.name settings.
You can use the broker setting called advertised.listeners to tell your brokers to include a different IP/hostname in their response to clients.
advertised.listeners:
Listeners to publish to ZooKeeper for clients to use, if different
than the listeners config property. In IaaS environments, this may
need to be different from the interface to which the broker binds. If
this is not set, the value for listeners will be used. Unlike
listeners it is not valid to advertise the 0.0.0.0 meta-address.
In your example, for the first broker you can have:
advertised.listeners=PLAINTEXT://200.100.1.5:9092
listeners=PLAINTEXT://10.5.1.5:9092
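To make the listener-to-address relationship concrete, here is a simplified model (not Kafka's own code) that parses an advertised.listeners value and looks up the address published under a given listener name. It assumes, as I understand Kafka's listener handling, that a client is handed the advertised address of the listener it connected on. The function names `parse_listeners` and `advertised_for` are hypothetical; the sample value is the one from the question.

```python
def parse_listeners(value: str) -> dict:
    """Parse 'NAME://host:port,NAME2://host2:port2' into {name: (host, port)}."""
    endpoints = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        # Split on the last ':' so an empty host (e.g. '://:9092') also works.
        host, _, port = address.rpartition(":")
        endpoints[name] = (host, int(port))
    return endpoints


# advertised.listeners value taken from the question above.
ADVERTISED = "EXTERNAL_PLAINTEXT://200.100.1.5:8080,INTERNAL://10.5.1.5:9092"


def advertised_for(listener_name: str, advertised: str = ADVERTISED):
    """Return the (host, port) published under the given listener name."""
    return parse_listeners(advertised)[listener_name]
```

Under that assumption, a client arriving on the EXTERNAL_PLAINTEXT listener would be given 200.100.1.5:8080, while one arriving on INTERNAL would be given 10.5.1.5:9092, so it is worth checking which port the VIP actually forwards to.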