How to connect to Kafka brokers via a proxy over TCP (don't want to use Kafka REST) - apache-kafka

Please see the attached screenshot of what we are trying to achieve.
Our deployment servers can't connect to the internet directly without a proxy, so we need a way to send messages to a Kafka cluster outside our organization. Please note that we do not want to use Kafka REST.

Connecting to Kafka is not that simple, and the protocol doesn't support this scenario: when you first connect to the bootstrap servers, they return the addresses of the actual brokers the client must connect to, taken from each broker's advertised.listeners property.
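You can see this metadata exchange for yourself. Here is a minimal sketch using Kafka's Java AdminClient (the proxy hostname is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class MetadataCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // The proxy address only serves the initial bootstrap/metadata request...
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-proxy.internal:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // ...after which the client is told each broker's advertised address
            // and will open direct TCP connections to those, bypassing the proxy.
            admin.describeCluster().nodes().get()
                 .forEach(node -> System.out.println(node.host() + ":" + node.port()));
        }
    }
}
```

Whatever the addresses printed here are, those are the hosts the client will actually dial, which is why a single TCP proxy in front of the cluster breaks unless the brokers advertise addresses that route back through it.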

Related

How to get the dedicated Apache Kafka MirrorMaker 2.0 REST API exposed

I am trying to reach a dedicated MirrorMaker 2.0 cluster to see the status of its connectors, tasks, etc. In the README in their Git repository, the Apache Kafka people claim that when run with dedicated.mode.enable.internal.rest=true, MirrorMaker nodes start an internal listener port to communicate with each other.
My question is: is there a way to advertise this port externally, so I can send curl requests to the dedicated MirrorMaker nodes the way one normally would, e.g. curl http://localhost:8083/connectors, to see the running connectors?
I have already tried multiple solutions I found online; they simply do not work. It seems to me this is impossible when you start MirrorMaker 2.0 with ./bin/connect-mirror-maker. I know it is possible if I add every single required connector manually to an existing Kafka Connect cluster, but that's not what I am looking for.
I am also curious whether there is a way to add the dedicated MirrorMaker cluster's connectors to an already running Kafka Connect cluster.
This is important because we would like to use curl responses to check the task status of MirrorMaker.
Thanks.
You should be able to run connect-distributed as normal, have its REST API available, and then configure and monitor MM2 without using its dedicated scripts. Likewise, this is how you'd add the MM2 connectors to any other existing Connect cluster.
Ideally, though, you should monitor via JMX instead of curl; there you get a count of the running tasks. Or add Jolokia or the Prometheus JMX Exporter to run their own HTTP server, then curl that and grep for the task metrics.
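As a sketch of the first suggestion (the Connect URL, cluster addresses, and connector name are placeholders, and it assumes a Connect worker is already running), you can submit the MM2 source connector to an existing Connect cluster and then hit the usual status endpoint:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Mm2ViaConnect {
    public static void main(String[] args) throws Exception {
        String connectUrl = "http://localhost:8083"; // hypothetical Connect REST endpoint
        String connectorJson = """
            {
              "name": "mm2-source",
              "config": {
                "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
                "source.cluster.alias": "source",
                "target.cluster.alias": "target",
                "source.cluster.bootstrap.servers": "source-kafka:9092",
                "target.cluster.bootstrap.servers": "target-kafka:9092",
                "topics": ".*"
              }
            }""";

        HttpClient http = HttpClient.newHttpClient();
        // Create the MM2 source connector on the existing Connect cluster.
        HttpRequest create = HttpRequest.newBuilder(URI.create(connectUrl + "/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();
        System.out.println(http.send(create, HttpResponse.BodyHandlers.ofString()).body());

        // Then the normal status endpoint works, exactly as it would with curl.
        HttpRequest status = HttpRequest.newBuilder(
                URI.create(connectUrl + "/connectors/mm2-source/status")).build();
        System.out.println(http.send(status, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

(The JSON text block requires Java 15+; on older JDKs, concatenate the string instead. A full MM2 setup also involves the checkpoint and heartbeat connectors, omitted here.)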

How to add blacklisted IPs in Apache Kafka?

I am sending some Filebeat data to Kafka. What I want is for Kafka to only accept data from specified IPs. Can anyone tell me how to configure Kafka for the particular IPs I specify?
A better way would probably be to implement Access Control Lists (ACLs). That way, if your Filebeat process moves servers, you don't have to keep changing the accepted IP list on the Kafka machines.
However, if you actually want to create an accept-list of IPs, this isn't a Kafka feature but something you'd implement at the networking layer on your Kafka machines, with a rule to accept traffic to the Kafka port only from certain hosts. For example, I found this iptables guide which shows how to accept traffic for a given service (SSH in the example, but you could amend it to Kafka) only from a particular IP.
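For the ACL route, here is a minimal sketch using the Java AdminClient (the principal, topic name, and IP are placeholders, and it assumes an authorizer is enabled on the brokers). Note that ACL entries carry a host field, so you can combine the principal restriction with an IP restriction:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class FilebeatAcl {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Allow the "filebeat" principal to write to the topic, but only
            // from one host; use "*" as the host to allow writes from anywhere.
            AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "filebeat-logs", PatternType.LITERAL),
                new AccessControlEntry("User:filebeat", "10.0.0.5",
                        AclOperation.WRITE, AclPermissionType.ALLOW));
            admin.createAcls(List.of(binding)).all().get();
        }
    }
}
```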

How to see which Kafka client application connected from a certain IP address?

Although Kafka clients are authenticated, and can be restricted (authorized) to connect only from allowed IP addresses, multiple applications may be deployed on the same app servers. It would be helpful if a Kafka admin could match a given connection (visible only via netstat on the Kafka server machine!) to a specific application, either by an application tag passed explicitly by the Kafka client, or by the command name that started the client application (visible via the ps command on Unix), passed by the client's local operating system through the Kafka client to the broker. Does such a possibility already exist?
That would imply that connections are held as browsable objects somewhere within Kafka, either in some internal topic or in its ZooKeeper.
Alternatively, at least displaying the authorized principal that initiated the connection would also do. The question applies to both consumers and producers.
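For what it's worth, the closest built-in hook to such an "application tag" is the standard client.id property, which the client sends to the broker with every request and which appears in broker-side request logs and per-client metrics. A minimal sketch (hostnames and names are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TaggedClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // client.id travels with every request to the broker, so each of the
        // applications sharing an app server can identify itself distinctly
        // in broker request logs and per-client quota/metric names.
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "billing-service");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send records as usual
        }
    }
}
```

This still relies on each application setting the tag honestly; it is not a browsable registry of connections on the broker side.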

Kafka communication using Cloudera

On-prem ---> HAProxy (AWS) ---> Kafka (AWS). We can allow external communication using the advertised.listeners property, and we can use listeners for internal communication. If we enable both settings, the communication does not happen properly. We are using Kafka version 0.10.2.
I believe there is some setting to be made through ZooKeeper to control the broker communication. How can we do it using Cloudera?
See https://rmoff.net/2018/08/02/kafka-listeners-explained/. You need to set up two listeners: one for the external IP through which your clients connect, and one for the internal AWS network through which your brokers communicate with each other.
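A sketch of what that looks like in each broker's server.properties (the hostnames, IPs, and ports are placeholders; listener names beyond the security protocols require Kafka 0.10.2+, per KIP-103):

```properties
# Two listeners: EXTERNAL for clients coming in via HAProxy,
# INTERNAL for broker-to-broker traffic inside the AWS network.
listeners=EXTERNAL://0.0.0.0:9092,INTERNAL://0.0.0.0:19092
advertised.listeners=EXTERNAL://haproxy.example.com:9092,INTERNAL://10.0.0.11:19092
listener.security.protocol.map=EXTERNAL:PLAINTEXT,INTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```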

How many bootstrap servers to provide for large Kafka cluster

I have a use case where my Kafka cluster will have 1000 brokers, and I am writing a Kafka client.
In order to write the client, I need to provide a broker list.
My question is: what are the recommended guidelines for the broker list given to the client?
Is there any proxy-like service available in Kafka that we could give to the client?
- that proxy would know all the brokers in the cluster and connect the client to the appropriate broker
- like in the Redis world, where we have twemproxy (nutcracker)
- can confluent-rest-api act as a proxy?
Is it recommended to provide a specific number of brokers to the client, for example a list of 3 brokers even though the cluster has 1000 nodes?
- what if the provided brokers crash?
- what if the provided brokers restart and their location/IP changes?
The list of broker URLs you pass to the client is only used to bootstrap it. The client automatically learns about all other available brokers and connects to the ones it actually needs to talk to.
Thus, if the client is already running and those bootstrap brokers go down, the client will not even notice. Only if all of those brokers are down at the same time when you start up the client will it hang, because it cannot connect to the cluster, and it will eventually time out.
It's recommended to provide at least 3 broker URLs so you survive the outage of 2 brokers, but you can provide more if you need a higher level of resilience.
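As a minimal sketch (broker hostnames are placeholders), the client configuration looks the same regardless of cluster size:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Three bootstrap entries are enough even for a 1000-broker cluster:
        // the client fetches full cluster metadata from whichever one answers.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker1.example.com:9092,broker2.example.com:9092,broker3.example.com:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // producer.send(...) then goes directly to each partition leader,
            // not through the bootstrap brokers specifically.
        }
    }
}
```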