How to proxy Apache Kafka producer requests on the Kafka broker, and redirect to a separate Kafka cluster?
In my specific case, it's not possible to update the clients that write to this cluster, which means it's not feasible to:
Update the bootstrap broker configuration in the client
Rewrite the client code to use the Confluent REST Proxy
Therefore, I'm looking for a proxy that works at the level of the Kafka protocol.
Here are some potential options I've discovered so far:
Kafka-proxy
EnvoyProxy
Does anyone have any experience with the tools above (or alternative tools) that would allow me to redirect a binary TCP Kafka request to a separate Kafka cluster?

Since late 2021, Envoy proxy supports this for Produce requests (via the kafka_mesh network filter), although it is still a work in progress.
In general, your proxy will need to understand the Kafka protocol and maintain the necessary connections to the upstream Kafka cluster(s).
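To give a feel for what such a setup looks like, here is a minimal sketch of an Envoy listener using the kafka_mesh filter, following the shape of the Envoy contrib documentation. The hostnames, ports, cluster name, and topic prefix are placeholders to adapt to your environment:

```yaml
static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 19092          # port your un-modifiable clients already point at
      filter_chains:
        - filters:
            - name: envoy.filters.network.kafka_mesh
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.kafka_mesh.v3alpha.KafkaMesh
                advertised_host: "envoy.example.com"   # what clients see as the broker
                advertised_port: 19092
                upstream_clusters:
                  - cluster_name: target_cluster
                    bootstrap_servers: "kafka-b1:9092,kafka-b2:9092"
                    partition_count: 10
                forwarding_rules:
                  - target_cluster: target_cluster
                    topic_prefix: "myprefix"           # topics matching this prefix go to target_cluster
```

Envoy terminates the Kafka protocol itself here (it embeds its own producer), which is why only Produce requests are supported so far.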

Related

Accessing Kafka in Kubernetes from SvelteKit

I am building a SvelteKit application with Kafka support. I created a kafka.js file and tested the Kafka support against a local Kafka setup, which worked. But when I replaced the topic name with a topic running in the Kafka of our Kubernetes cluster, I am not seeing any response.
How do I test the connection between Kafka in the Kubernetes cluster and the JS web application? Any hints would be much appreciated. Console logging alone has not helped so far, because Kafka itself is never reached.
Any two pods deployed in the same namespace can communicate using local service names. So, do the brokers have a Service resource?
For example, assuming you are in the namespace default and you have a Service named kafka-svc, you'd set bootstrap.servers: kafka-svc.default.svc.cluster.local:9092.
You also need to configure Kafka's advertised.listeners. Related blog - https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/
But this requires a Node.js (or other language) backend, not the SvelteKit UI itself (a browser cannot open a raw TCP connection to a broker; it can only use some HTTP bridge). Your HTTP options include a custom server, the Confluent REST Proxy, the Strimzi Kafka Bridge, etc. But you were already told this.
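To tie the two config pieces together, here is a sketch of the broker side, assuming a Service named kafka-svc in the default namespace (names and the listener label are placeholders; adapt them to your deployment):

```properties
# server.properties - advertise the in-cluster Service DNS name so that
# clients inside Kubernetes can reach the broker after metadata discovery
listeners=INTERNAL://0.0.0.0:9092
advertised.listeners=INTERNAL://kafka-svc.default.svc.cluster.local:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```

The advertised address matters because clients reconnect to whatever host the broker advertises in its metadata, not to the address they bootstrapped from.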

Exposing a public kafka cluster

If I were to create a public Kafka cluster that accepts messages from multiple clients, but whose messages are processed purely by a separate backend, what would be the right way to design it?
A more concrete example: let's say I have 50 Kafka brokers. How do I:
Configure clients without manually adding the IPs of all 50 Kafka brokers?
Load-balance messages across Kafka brokers based on load, if possible?
Set up additional clients with quotas in an easier/automated way?
You can use HashiCorp Consul, one of the open-source service-discovery tools, to register your Kafka brokers; you end up with a single endpoint and don't need to list multiple brokers in your clients. Several other open-source service-discovery tools are available as well.
There are a few options for balancing: use the kafka-assigner tool to rebalance partitions across brokers, or the open-source Kafka Cruise Control tool to balance the cluster for you automatically.
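As an illustration of the Consul approach, each broker host could register a service definition like the one below (the name, address, and check interval are placeholder values, not recommendations):

```json
{
  "service": {
    "name": "kafka",
    "port": 9092,
    "address": "10.0.0.11",
    "check": {
      "tcp": "10.0.0.11:9092",
      "interval": "10s"
    }
  }
}
```

Clients can then bootstrap from a single DNS name such as kafka.service.consul:9092, which Consul resolves to healthy brokers only. Note that this is sufficient for bootstrapping, but Kafka clients still connect directly to individual brokers afterwards, so the brokers' advertised addresses must also be reachable.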

How many bootstrap servers to provide for large Kafka cluster

I have a use case where my Kafka cluster will have 1000 brokers and I am writing Kafka client.
In order to write the client, I need to provide a broker list.
The question is: what are the recommended guidelines for providing the broker list to the client?
Is there any proxy-like service available in Kafka that we can give to clients?
- that proxy would know all the brokers in the cluster and connect the client to the appropriate broker.
- like in the Redis world, where we have twemproxy (nutcracker)
- can the Confluent REST Proxy act as such a proxy?
Is it recommended to provide a specific number of brokers in the client, for example a list of 3 brokers even though the cluster has 1000 nodes?
- what if the provided brokers crash?
- what if the provided brokers restart and their location/IP changes?
The list of broker URLs you pass to the client is only used to bootstrap it. From the bootstrap brokers, the client automatically learns about all the other available brokers and connects to whichever brokers it needs to "talk to".
Thus, if the client is already running and those bootstrap brokers go down, the client will not even notice. Only if all of the listed brokers are down at the same time when you start the client will it "hang", unable to connect to the cluster, and eventually time out.
It's recommended to provide at least 3 broker URLs so that bootstrapping survives the outage of 2 brokers, but you can provide more if you need a higher level of resilience.
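The bootstrapping behaviour described above can be sketched in a few lines of Python. This is an illustrative model, not the real client implementation; fetch_metadata stands in for a metadata request to a single broker:

```python
def discover_brokers(bootstrap_servers, fetch_metadata):
    """Return the full broker list learned from the first reachable
    bootstrap server; raise if none of them can be contacted."""
    errors = []
    for server in bootstrap_servers:
        try:
            # One successful metadata request is enough: it returns
            # every live broker in the cluster, not just this one.
            return fetch_metadata(server)
        except ConnectionError as exc:
            errors.append((server, exc))  # try the next bootstrap server
    raise RuntimeError(f"all bootstrap servers unreachable: {errors}")


# Simulated cluster: 1000 brokers, but only 3 are listed for bootstrap.
all_brokers = [f"broker-{i}:9092" for i in range(1000)]

def fake_metadata(server, up=frozenset({"broker-2:9092"})):
    if server not in up:
        raise ConnectionError(f"{server} is down")
    return all_brokers

# Two of the three bootstrap brokers are down; discovery still succeeds
# and the client learns about all 1000 brokers.
brokers = discover_brokers(
    ["broker-0:9092", "broker-1:9092", "broker-2:9092"], fake_metadata
)
```

This is why listing 3 brokers out of 1000 is enough: the list only has to contain one reachable broker at client startup.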

What are the benefits of Apache Kafka's native binary TCP protocol over its REST API?

As per Apache Kafka's documentation, Kafka uses a binary TCP protocol for its native API communication, but a URL-based REST API is also available for languages that don't have a native Kafka client. I was wondering whether there is any benefit to the native binary TCP protocol (used by the native API) over REST-based communication with the broker nodes? I was also wondering whether the REST API still maintains the exactly-once property?
Edit:
The REST API guide is here: https://www.confluent.io/blog/a-comprehensive-open-source-rest-proxy-for-kafka which explains how to produce and consume Kafka messages over the REST API
There is no REST API included in Apache Kafka for producing or consuming messages the way the native Kafka protocol client (implemented in Java) does.
There is a REST API in Apache Kafka for configuring Kafka Connect.
There are a number of third-party REST Proxy implementations (such as the Confluent REST Proxy) which allow pub/sub over a REST interface, but these are separate open-source projects outside of Apache Kafka.
If you mean to ask what the advantages are of using the native Kafka Java Producer/Consumer API rather than these third-party REST/HTTP proxy implementations, then here are some reasons:
One benefit is greater parallelism. A Kafka client will typically open up TCP connections to multiple brokers in the cluster and send or fetch data in parallel across multiple partitions of the same topic.
Another benefit is better network utilization as HTTP headers can add a lot of size to otherwise small messages while Kafka’s wire protocol is a compact binary protocol.
Kafka clients handle load balancing, failover, and cluster expansion or contraction automatically while REST clients typically require a third party load balancer to achieve the same functionality.
Kafka clients can send their own authentication credentials for access control and bandwidth throttling (quotas), while all REST clients appear to the Kafka cluster as one single Kafka client and therefore share common ACL privileges.
Kafka client libraries buffer and batch messages together into a smaller number of Kafka produce or fetch requests, while HTTP can only batch data if the programmer thought to publish it as a single batch.
The native Kafka protocol supports more than just what the Producer/Consumer API exposes. There is also an Admin API for creating topics and modifying topic configurations. These functions are not (yet) exposed through the most popular REST Proxy implementations.
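The batching point above maps directly onto a few standard producer settings; a sketch with illustrative values (tune them for your own workload, these are not recommendations):

```properties
# Producer-side batching: accumulate records per partition and send them
# in one produce request instead of one request per message.
batch.size=65536        # max bytes per partition batch
linger.ms=10            # wait up to 10 ms for a batch to fill before sending
compression.type=lz4    # compress whole batches on the wire
```

None of this is available to a plain HTTP client, which is a large part of the throughput gap between the native protocol and a REST proxy.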

Apache Kafka Consumer Connection

I am looking at the docs of Apache Kafka.
The consumer connects to Kafka by using the IP address/port of the ZooKeeper nodes.
Is it possible to use the IP address/port of a broker instead?
Yes, when using the SimpleConsumer API you get to manage consumption directly from the brokers. See usage example here
Of course you can.
Connecting to Kafka via the ZooKeeper address is what the high-level consumer API does,
while connecting to a broker directly is what the low-level (SimpleConsumer) API does.
Maybe this can help you
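Note that with the modern consumer API (Kafka 0.9 and later), the ZooKeeper connection is gone entirely: consumers talk only to brokers. A minimal config sketch, where the hostnames are placeholders:

```properties
# consumer.properties - connect straight to brokers, no ZooKeeper needed
bootstrap.servers=broker1:9092,broker2:9092
group.id=my-consumer-group
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
```

Group coordination and offset storage are handled by the brokers themselves, so the old high-level/SimpleConsumer distinction no longer applies.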