Get list of Kafka consumers for a specific topic

We have a distributed, multi region, multi zone Kafka cluster. We are the platform owners and maintain and administer the cluster. There are applications which utilize our platform for their upstream/downstream data.
Now, how can we list the consumers that are reading a specific topic?
So far, I understand that we can list all consumer groups, describe each one, and then search for the topic in the output.
Is there a simpler solution, or any other options available out there?

Without auditing/tracing via authorization plugins, describing each group is the best you can do out of the box.
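For illustration, here is a minimal sketch of that describe-every-group loop using the Java AdminClient (bootstrap address and topic name are placeholders):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ConsumerGroupListing;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collection;
    import java.util.Map;
    import java.util.Properties;

    public class ConsumersOfTopic {
        public static void main(String[] args) throws Exception {
            String topic = "my-topic"; // placeholder topic name
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 1. List every consumer group on the cluster.
                Collection<ConsumerGroupListing> groups =
                        admin.listConsumerGroups().all().get();

                for (ConsumerGroupListing group : groups) {
                    // 2. The keys of the committed-offsets map tell us which
                    //    topic-partitions the group has consumed. (Note: this
                    //    also matches groups that consumed the topic in the
                    //    past; describeConsumerGroups shows active members.)
                    Map<TopicPartition, OffsetAndMetadata> offsets =
                            admin.listConsumerGroupOffsets(group.groupId())
                                 .partitionsToOffsetAndMetadata().get();

                    if (offsets.keySet().stream()
                               .anyMatch(tp -> tp.topic().equals(topic))) {
                        System.out.println(group.groupId() + " consumes " + topic);
                    }
                }
            }
        }
    }

Note this makes one request per group, so on clusters with thousands of groups it can be slow; there is no single "consumers of topic X" query.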
Related blog that covers using Zipkin for client tracing - https://www.confluent.io/blog/importance-of-distributed-tracing-for-apache-kafka-based-applications/
At several jobs, I've seen gatekeeping processes such as OpenPolicyAgent, Apache Ranger (for LDAP integration), internal web onboarding portals, etc., that were required for getting access to any Kafka topic.

Related

How to implement an interceptor inside the Kafka broker for multi-tenancy in Confluent Cloud

I want to use the multi-tenancy feature as shown in the Confluent blog - https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/.
When producers produce to the same topic, say 'topic-a', their data needs to be isolated. Internally, the broker may forward each message to a different topic according to the tenant ID for isolation. When consumers consume these messages from 'topic-a', they should only get the messages associated with their own tenant ID. The blog shows that this is possible to implement, using an interceptor within the Kafka broker.
But in Confluent Cloud, I don't see any option to implement this.
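For what it's worth, Apache Kafka's public interceptor API is client-side only (ProducerInterceptor/ConsumerInterceptor); the broker-side interceptor described in the blog is Confluent-internal machinery, which is why Confluent Cloud exposes no option to plug in your own. As a rough, client-side-only sketch of the routing idea (the header name and topic naming convention are assumptions):

    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.header.Header;

    import java.nio.charset.StandardCharsets;
    import java.util.Map;

    // Reroutes records sent to the logical topic (e.g. "topic-a") to a
    // per-tenant physical topic before they leave the producer. This runs
    // in the client, so it is NOT a substitute for broker-enforced isolation.
    public class TenantRoutingInterceptor implements ProducerInterceptor<String, String> {

        @Override
        public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
            Header header = record.headers().lastHeader("tenant-id"); // assumed header name
            if (header == null) {
                return record; // no tenant info: pass through unchanged
            }
            String tenant = new String(header.value(), StandardCharsets.UTF_8);
            String isolatedTopic = record.topic() + "." + tenant; // e.g. topic-a.acme
            return new ProducerRecord<>(isolatedTopic, null, record.timestamp(),
                    record.key(), record.value(), record.headers());
        }

        @Override public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }
        @Override public void close() { }
        @Override public void configure(Map<String, ?> configs) { }
    }

It would be registered via the producer config interceptor.classes; enforcing isolation on the consumer side would still require ACLs on the per-tenant topics.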

Kafka P2P Header based routing

I have a requirement to send an event to multiple systems based on their system code. The destination systems can grow in future, and they should be able to subscribe only to the events they are interested in. It's a security mandate, so as the producer we need to ensure this.
We could use a RabbitMQ headers exchange with multiple shovel configurations to different queues in different vhosts or clusters, but I am looking for a similar pattern with Kafka.
If we maintain a separate topic per destination and authorise each consumer on its corresponding topic, then as the producer I need to implement the topic-routing logic, and the number of topics will keep growing as destinations are added.
The other option is to use AWS SNS and subscribe multiple SQS queues, routing messages based on filter policies.
Could anyone think of a better solution to this problem?
"send an event to multiple systems based on their system code"
Using the Kafka Streams API, you can use branching to route data to different topics based on Predicate logic; see the sketch below.
Once the data is in their respective topics, the "multiple systems" can consume them.
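A minimal sketch of that branching approach using the Kafka Streams split()/Branched API (Kafka 2.8+); the topic names and the system-code-in-key convention are assumptions:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Branched;
    import org.apache.kafka.streams.kstream.KStream;

    import java.util.Properties;

    public class SystemCodeRouter {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> events = builder.stream("events"); // assumed input topic

            // Route each record to a per-system topic based on the system
            // code carried in the record key (placeholder convention).
            events.split()
                  .branch((key, value) -> "SYS-A".equals(key),
                          Branched.withConsumer(s -> s.to("events-sys-a")))
                  .branch((key, value) -> "SYS-B".equals(key),
                          Branched.withConsumer(s -> s.to("events-sys-b")))
                  .defaultBranch(Branched.withConsumer(s -> s.to("events-unrouted")));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "system-code-router");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            new KafkaStreams(builder.build(), props).start();
        }
    }

Combined with ACLs restricting each system to its own output topic, this keeps the producer unaware of the routing while the broker enforces the security mandate.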

Which team handles the Kafka producer code when an event is generated?

I have a basic knowledge of Kafka topics, producers, consumers, and brokers.
I would like to understand how this works in the real world.
For example, consider the use case below:
User interacts with a web application
When the user clicks on something, an event is generated
So there will be one Kafka producer running which writes messages to a topic when an event is generated
Then a consumer (for example, a Spark application) reads from the topic and processes the data
Whose responsibility is it to take care of the producer code? A front-end Java/web developer's? Because web developers are familiar with events and Tomcat server logs.
Can anyone explain, in terms of developers, who is responsible for taking care of each section?
In a "standard" scenario, following people/roles are involved:
Infrastructure Dev: Setup Kafka Instance (f.e. openshift/strimzi)
manage topics, users
Frontend Dev: Creating the frontend (f.e. react)
Backend Dev: Implementing backendsystem (f.e. asp .net core)
handle DB Connections, logging, monitoring, IAM, business logic, handling Events, Produce kafka Events, ...)
App Dev anyone writing or managing the "other apps" (f.e.spark application). Consumes (commit) the kafka Events
Since there are plenty of implementations of the Kafka producer/consumer API, it's fairly language-agnostic (see some libs). But you are right that the dev implementing the Kafka-related features should at least be familiar with pub/sub.
Be aware that we are talking about roles, so there are not necessarily four people involved; it could also be just one person doing the whole job. Also, this is just a generic real-world scenario and can be completely different in your specific use case/environment.
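To make the backend dev's "producing Kafka events" duty concrete, here is a minimal sketch (broker address, topic name, and the event payload are placeholders):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class ClickEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The backend fires one record per UI click; key = user id,
                // so all events of a user land in the same partition.
                producer.send(new ProducerRecord<>("click-events", "user-42",
                        "{\"action\":\"click\",\"target\":\"buy-button\"}"));
            }
        }
    }

In practice this would sit behind the backend's HTTP endpoint that the frontend calls, not in the frontend itself.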

How do I limit a Kafka client to only access one customer's data?

I'm evaluating Apache Kafka for publishing some event streams (and commands) between my services running on machines.
However, most of those machines are owned by customers, on their premises, and connected to their networks.
I don't want a machine owned by one customer to have access to another customer's data.
I see that Kafka has an access control module, which looks like it lets you restrict a client's access based on topic.
So, I could create a topic per customer and restrict each customer to just their own topic. This seems like a design I could regret in the future, because I've seen recommendations to keep the number of Kafka topics in the low thousands at most.
Another design is to create a partition per customer. However, I don't see a way to restrict access if I do that.
Is there a way out of this quandary?
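Regarding the topic-per-customer option: Kafka's ACLs support prefixed resource patterns, so a single rule can grant a customer's principal access to an entire topic prefix. A minimal sketch via the Java Admin API (the principal, prefix, and broker address are placeholders, and the cluster must have an authorizer enabled):

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    import java.util.List;
    import java.util.Properties;

    public class GrantCustomerAcl {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

            try (Admin admin = Admin.create(props)) {
                // Allow the principal "User:customer-42" to READ any topic
                // whose name starts with "customer-42." (a prefixed ACL).
                AclBinding binding = new AclBinding(
                        new ResourcePattern(ResourceType.TOPIC, "customer-42.", PatternType.PREFIXED),
                        new AccessControlEntry("User:customer-42", "*",
                                AclOperation.READ, AclPermissionType.ALLOW));
                admin.createAcls(List.of(binding)).all().get();
            }
        }
    }

ACLs apply at the topic level, not the partition level, which is why the partition-per-customer design has no equivalent restriction.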

Kafka instead of Zookeeper for cluster management

I am writing a clustered application sitting on top of Kafka -- it uses Kafka exclusively for interprocess communication and coordination. I could use Zookeeper to manage my cluster, but it would not be very difficult to use Kafka topics for that instead. And the more I think about it, it seems like, other than for historical reasons, Kafka could drop Zookeeper and just use a topic-based solution.
For example, there could be a special topic or topics in Kafka where you publish all of the same data currently kept track of in Zookeeper. Brokers, topics, partitions, leaders, etc. -- it seems like this is just as easily tracked via Kafka topics as via Zookeeper.
I know that in Kafka 0.9.0 there's some movement away from Zookeeper, more towards this model, and remember that my question is less about Kafka development and more about figuring out which direction to go in my application.
I'm not asking for an opinion -- what I want to know is whether there are any specific functions provided by Zookeeper that are going to be difficult with a Kafka/topic-based approach to coordination. I can't think of anything.
Even heartbeat monitoring -- which was the reason I started looking at Zookeeper in the first place -- could work: you could have a client-connection topic, and clients could publish to it when they join the cluster, publish heartbeats at a given interval, and publish as they leave.
Let us start from a high-level view: you have two distributed systems which store data. Zookeeper organizes its data in nodes, in a kind of directory-like structure. Kafka stores messages within topics.
From a bird's-eye view, Kafka is built for high throughput and scalability, while one of Zookeeper's main design goals is consistency. Zookeeper is meant to be a Distributed Coordination Service for Distributed Applications, while Kafka can be thought of as a distributed commit log.
So the answer to your question is, surprisingly: 'It depends'. For coordinating a distributed system I would use Zookeeper: that's what it was built for. You could do this with Kafka as well, but there are a couple of things which need to be done manually that come out of the box if you are using Zookeeper.
Some examples:
Consistency: the ZK client can choose whether it needs strong or eventual consistency
Ephemeral nodes: together with ZK watches, a great way to react to failing services
Sequential consistency: it is not guaranteed that you receive Kafka messages in the order you wrote them to the broker (ordering is only guaranteed within a partition)
ACLs: never used them, but at least it's something which is not offered out of the box by Kafka
Sequence nodes
A pretty nice overview of what you can do with Zookeeper is the list of Zookeeper recipes: https://zookeeper.apache.org/doc/trunk/recipes.html
[EDIT]: Heartbeating an application using Kafka is of course possible, but ephemeral nodes in Zookeeper are, in my eyes, the easier option.
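For comparison, here is a minimal sketch of that ephemeral-node heartbeating with the plain ZooKeeper Java client (the connection string and paths are placeholders, and the parent path is assumed to exist):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class EphemeralHeartbeat {
        public static void main(String[] args) throws Exception {
            // Connect; the watcher lambda ignores events in this sketch.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });

            // An ephemeral node lives only as long as this session: if the
            // process dies or loses its session, the node vanishes and any
            // watchers are notified -- liveness tracking without explicit
            // heartbeat messages. Parent path /cluster/members must exist.
            zk.create("/cluster/members/node-1", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            Thread.sleep(Long.MAX_VALUE); // stay alive; ZK tracks membership
        }
    }

With a Kafka-topic approach you would instead have to publish periodic heartbeats and make every watcher implement its own timeout logic.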
This is currently being worked on in the scope of KIP-500.