How to see which Kafka client application connected from a certain IP address?

Kafka clients are authenticated and can be restricted (authorized) to connect only from allowed IP addresses, but multiple applications may be deployed on the same app server. It would therefore help if a Kafka admin could match a given connection (currently visible only via netstat on the Kafka broker machine!) to a specific application, either by an application tag passed explicitly by the Kafka client, or by the command name that started the client application (visible via the ps command on Unix), passed by the local operating system through the Kafka client to the broker. Does such a possibility already exist?
That would imply that connections are held as browsable objects somewhere within Kafka, either in an internal topic or in its ZooKeeper.
Alternatively, displaying at least the authorized principal that initiated the connection would also do. The question applies to both consumers and producers.
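For the "application tag" part, one hook that already exists is the client.id setting: every official Kafka client can pass an arbitrary identifier, which brokers include in their request logs and per-client quota metrics, so an admin can map a connection back to the application that opened it. A minimal sketch in Java; the broker address, tag, topic, and payload are made-up placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TaggedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9092");
            // client.id is an arbitrary tag; brokers expose it in request
            // logs and per-client metrics, which lets an admin tie a TCP
            // connection back to a specific application.
            props.put(ProducerConfig.CLIENT_ID_CONFIG, "billing-service");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("some-topic", "key", "value"));
            }
        }
    }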

Related

How to connect to Kafka brokers via proxy over tcp (don't want to use kafka rest)

A screenshot (not reproduced here) illustrates what we are trying to achieve.
Our deployment servers can't connect to the internet directly without a proxy, so we need a way to send messages to a Kafka cluster outside our organization. Please note that we do not want to use Kafka REST.
Connecting to Kafka is more involved than a single TCP connection, and it doesn't support this scenario: when you first connect to the bootstrap servers, they return the addresses of the actual connections the client needs to make, based on the broker properties (advertised listeners).
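To make that mechanism concrete, here is a rough sketch using the standard Java AdminClient to print the addresses the brokers advertise; these, not the proxy address, are what the client subsequently connects to. The bootstrap host name is a placeholder:

    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.Node;

    public class ShowAdvertisedBrokers {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Only used for the first connection; everything after that
            // uses the addresses the brokers advertise back.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "bootstrap.example.com:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // These are the advertised.listeners addresses; a plain TCP
                // proxy in front of the bootstrap host does not rewrite
                // them, so the client will try to reach them directly.
                for (Node node : admin.describeCluster().nodes().get()) {
                    System.out.printf("broker %d -> %s:%d%n", node.id(), node.host(), node.port());
                }
            }
        }
    }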

Kafka Post Deployment - Handling ever-growing clients

We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios, we have to change all client configurations, a time-consuming and stressful operation.
I think we can set up a config server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses in a centralized manner, so we only have to change them in one place without touching all the clients, but I don't know if it is the best approach. Obviously, the clients must be programmed to get the broker list from the config server.
Is there a better approach?
Worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients: only the first available address in the list is used for the initial connection, and after that the advertised.listeners setting in each broker's config is what the clients actually use.
The answer to your question is to use service discovery, yes. That could be Spring Cloud Config, but the more general option would be HashiCorp Consul or another service that uses DNS (Kubernetes uses CoreDNS by default, for example, and AWS has Route53).
Then you edit the /etc/resolv.conf of each machine the client is running on (assuming Linux) to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
You could use a load balancer (with a friendly DNS name like kafka.domain.com) which points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
As soon as you add new brokers, you only change the load balancer endpoints, not the client configuration.
Additionally, please note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.
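With either approach, the client side reduces to pointing bootstrap.servers at the stable name. A minimal sketch reusing the example name kafka.domain.com from above; the group ID and topic are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class StableNameConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // A single stable DNS/load-balancer name: adding or replacing
            // brokers behind it requires no client-side change.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.domain.com:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("example-topic"));
                consumer.poll(Duration.ofSeconds(1));
            }
        }
    }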

Client Local Queue in Red Hat AMQ

We have a network of Red Hat AMQ 7.2 brokers in a master/slave configuration. The client applications publish and subscribe to topics on the broker cluster.
How do we handle the situation where the network connectivity between the client application and the broker cluster goes down? Does Red Hat AMQ have a native solution, such as a client-local queue and a JMS-to-JMS bridge between the local queue and the remote broker, so that a network connectivity failure will not result in loss of messages?
It would be possible for you to craft a solution where your clients use a local broker and that local broker bridges messages to the remote broker. The local broker will, of course, never lose network connectivity with the local clients since everything is local. However, if the local broker loses connectivity with the remote broker it will act as a buffer and store messages until connectivity with the remote broker is restored. Once connectivity is restored then the local broker will forward the stored messages to the remote broker. This will allow the producers to keep working as if nothing has actually failed. However, you would need to configure all this manually.
That said, even if you don't implement such a solution there is absolutely no need for any message loss even when clients encounter a loss of network connectivity. If you send durable (i.e. persistent) messages then by default the client will wait for a response from the broker telling the client that the broker successfully received and persisted the message to disk. More complex interactions might require local JMS transactions and even more complex interactions may require XA transactions. In any event, there are ways to eliminate the possibility of message loss without implementing some kind of local broker solution.
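The "wait for a response" behavior described in that last paragraph is just the default synchronous send of a persistent message in JMS. A minimal sketch; how the ConnectionFactory is obtained (JNDI, a broker-specific class) and the queue name are deployment-specific placeholders:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class DurableSend {
        // The ConnectionFactory is assumed to be provided by the caller.
        public static void send(ConnectionFactory factory) throws Exception {
            try (Connection connection = factory.createConnection()) {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("example.queue");
                MessageProducer producer = session.createProducer(queue);
                // PERSISTENT delivery plus a synchronous send: the call
                // blocks until the broker confirms it has received and
                // stored the message, so a connectivity failure surfaces
                // as an exception rather than a silently lost message.
                producer.setDeliveryMode(DeliveryMode.PERSISTENT);
                TextMessage message = session.createTextMessage("payload");
                producer.send(message);
            }
        }
    }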

How many bootstrap servers to provide for large Kafka cluster

I have a use case where my Kafka cluster will have 1000 brokers and I am writing a Kafka client.
In order to write the client, I need to provide a broker list.
Question is, what are the recommended guidelines for providing the broker list in the client?
Is there any proxy-like service available in Kafka that we can give to the client?
- the proxy would know all the brokers in the cluster and connect the client to the appropriate broker
- like in the Redis world, where we have twemproxy (nutcracker)
- can the Confluent REST API act as a proxy?
Is it recommended to provide a specific number of brokers in the client, for example a list of 3 brokers even though the cluster has 1000 nodes?
- what if the provided brokers crash?
- what if the provided brokers restart and their location/IP changes?
The list of broker URLs you pass to the client is only used to bootstrap the client. From there, the client automatically learns about all other available brokers and connects to the correct brokers it needs to "talk to".
Thus, if the client is already running when those bootstrap brokers go down, it will not even notice. Only if all of those brokers are down at the same time when you start up the client will it "hang", unable to connect to the cluster, and eventually time out.
It's recommended to provide at least 3 broker URLs to "survive" the outage of 2 brokers, but you can also provide more if you need a higher level of resilience.
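In client-config terms, the recommendation looks something like the sketch below (host names are placeholders). Any one reachable address at startup is enough; the client learns about the other brokers from cluster metadata:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class BootstrapConfig {
        public static Properties bootstrapProps() {
            Properties props = new Properties();
            // Three bootstrap addresses out of a 1000-broker cluster: the
            // client only needs one of them to answer at startup, then
            // discovers the rest of the cluster from broker metadata.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                    "broker1.example.com:9092,broker2.example.com:9092,broker3.example.com:9092");
            return props;
        }
    }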

Zookeeper - what will happen if I pass only some of the nodes from the zk cluster (ensemble) in the connection string?

I have a ZooKeeper cluster consisting of N nodes (which know about each other). What if I pass the addresses of only M < N of the nodes in the zk client connection string? What will be the cluster's behavior?
In a more specific case, what if I pass the host address of only 1 zk node from the cluster? Is it possible then for the zk client to connect to other hosts from the cluster? What if this one host is down? Will the client be able to connect to other ZooKeeper nodes in the ensemble?
The other question is, is it possible to limit the client to using only specific nodes from the ensemble?
What if I pass only M < N of the nodes' addresses in the zk client connection string? What will be the cluster's behavior?
ZooKeeper clients will connect only to the M nodes specified in the connection string. The ZooKeeper ensemble's back-end interactions (leader election and processing write transaction proposals) will continue to be processed by all N nodes in the cluster. Any of the N nodes still could become the ensemble leader. If a ZooKeeper server receives a write transaction request, and that server is not the current leader, then it will forward the request to the current leader.
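Concretely, the connection string simply enumerates the candidate servers the client may pick from. A sketch with M = 2 servers listed; host names and the session timeout are placeholders:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class SubsetClient {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // Only zk1 and zk2 are listed, so the client will only ever
            // open a socket to one of these two, even if the ensemble has
            // more nodes. Writes may still be forwarded internally to a
            // leader that is not in this list.
            ZooKeeper zk = new ZooKeeper(
                    "zk1.example.com:2181,zk2.example.com:2181",
                    15000, // session timeout in ms
                    (WatchedEvent event) -> {
                        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                            connected.countDown();
                        }
                    });
            connected.await();
            System.out.println("connected, session id: " + zk.getSessionId());
            zk.close();
        }
    }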
In a more specific case, what if I pass the host address of only 1 zk node from the cluster? Is it possible then for the zk client to connect to other hosts from the cluster? What if this one host is down? Will the client be able to connect to other ZooKeeper nodes in the ensemble?
No, the client would only be able to connect to the single address specified in the connection string. That address effectively becomes a single point of failure for the application, because if the server goes down, the client will not have any other options for establishing a connection.
The other question is, is it possible to limit client to use only specific nodes from the ensemble?
Yes, you can limit the nodes that the client considers for establishing a connection by listing only those nodes in the client's connection string. However, keep in mind that any of the N nodes in the cluster could still become the leader, and then all client write requests will get forwarded to that leader. In that sense, the client is using the other nodes indirectly, but the client is not establishing a direct socket connection to those nodes.
The ZooKeeper Overview page in the Apache documentation has further discussion of client and server behavior in a ZooKeeper cluster. For example, there is a relevant quote in the Implementation section:
As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.