What's the use of ClientQuotaCallback in kafka-clients? - apache-kafka

I found this line in its Javadoc: "Quota callback interface for brokers that enables customization of client quota computation". But it doesn't have any child class. Why? I also googled it but couldn't find an example.

In Kafka, it was decided to define all broker pluggable APIs as Java interfaces, even though the server side is actually written in Scala. For that reason, there are a few interfaces in kafka-clients that are not related to the clients at all.
Anything under org.apache.kafka.server are pluggable APIs for the brokers. These can be used to customize some behaviours on the broker side:
http://kafka.apache.org/20/javadoc/org/apache/kafka/server/policy/package-summary.html
http://kafka.apache.org/20/javadoc/org/apache/kafka/server/quota/package-summary.html
For example, ClientQuotaCallback allows customizing the way quotas are computed by Kafka brokers: you can define quotas for groups of clients, or have quotas scale when topics/partitions are created. KIP-257 details exactly how this all works.
Of course, for these to work you need to build implementations of these interfaces and put them on the classpath of your brokers. They are not something that can be used by clients directly.
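For illustration, here is a minimal sketch of what such an implementation could look like. It follows the interface described in KIP-257 (method names and signatures may differ slightly between Kafka versions) and simply applies one fixed quota to every client instead of the per-entity quotas configured via the admin tools:

import java.util.Collections;
import java.util.Map;

import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.server.quota.ClientQuotaCallback;
import org.apache.kafka.server.quota.ClientQuotaEntity;
import org.apache.kafka.server.quota.ClientQuotaType;

// Sketch only: every client shares a single static quota bucket.
public class StaticQuotaCallback implements ClientQuotaCallback {

    private volatile double limitBytesPerSecond = 1024.0 * 1024.0; // 1 MB/s, arbitrary default

    @Override
    public void configure(Map<String, ?> configs) {
        // A real implementation could read a custom broker property here
    }

    @Override
    public Map<String, String> quotaMetricTags(ClientQuotaType quotaType,
                                               KafkaPrincipal principal, String clientId) {
        // All clients map to the same metric tags, hence one shared quota
        return Collections.singletonMap("quota-group", "all-clients");
    }

    @Override
    public Double quotaLimit(ClientQuotaType quotaType, Map<String, String> metricTags) {
        return limitBytesPerSecond;
    }

    @Override
    public void updateQuota(ClientQuotaType quotaType, ClientQuotaEntity entity, double newValue) {
        // Quotas configured via the admin tools are ignored in this sketch
    }

    @Override
    public void removeQuota(ClientQuotaType quotaType, ClientQuotaEntity entity) {
        // Ignored for the same reason
    }

    @Override
    public boolean quotaResetRequired(ClientQuotaType quotaType) {
        return false;
    }

    @Override
    public boolean updateClusterMetadata(Cluster cluster) {
        // Return true if quota bounds must be recomputed after a metadata change
        return false;
    }

    @Override
    public void close() {
    }
}

The class is then packaged into a jar, dropped on the brokers' classpath and enabled via the client.quota.callback.class broker property.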

Related

Get list of kafka consumers for a specific topic

We have a distributed, multi region, multi zone Kafka cluster. We are the platform owners and maintain and administer the cluster. There are applications which utilize our platform for their upstream/downstream data.
Now, how can we list the consumers that are reading from a specific topic?
So far, I understand that we can list all consumer groups, describe each of them, and then search for the topic in the output.
Is there any simpler or other available solutions out there?
Without auditing/tracing via authorization plugins, describing each group is the best you can do out of the box.
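For example, the "list then describe" approach can be automated with the Java AdminClient (bootstrap server and topic name below are placeholders, and only groups that have committed offsets for the topic will show up):

import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupListing;

public class GroupsForTopic {
    public static void main(String[] args) throws Exception {
        String topic = "my-topic";                                              // placeholder
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // 1. List all consumer groups on the cluster
            for (ConsumerGroupListing group : admin.listConsumerGroups().all().get()) {
                // 2. Check each group's committed offsets for the topic
                boolean readsTopic = admin.listConsumerGroupOffsets(group.groupId())
                        .partitionsToOffsetAndMetadata().get()
                        .keySet().stream()
                        .anyMatch(tp -> tp.topic().equals(topic));
                if (readsTopic) {
                    System.out.println(group.groupId());
                }
            }
        }
    }
}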
Related blog that covers using Zipkin for client tracing - https://www.confluent.io/blog/importance-of-distributed-tracing-for-apache-kafka-based-applications/
In several jobs, I've seen gatekeeping processes such as OpenPolicyAgent, Apache Ranger (for LDAP integration), internal onboarding web portals, etc. that were required for getting access to any Kafka topic.

Kafka producer code will be handled by which team when an event is generated

I have a basic knowledge of kafka topic/producer/consumer and broker.
I would like to understand how this works in the real world.
For example, consider the use case below:
User interacts with a web application
When the user clicks on something, an event is generated
So there will be one Kafka producer running which writes messages to a topic when an event is generated
Then a consumer (for example, a Spark application) reads from the topic and processes the data
Whose responsibility is it to take care of the producer code? A front-end Java/web developer's? Because web developers are familiar with events and Tomcat server logs.
Can anyone explain this in terms of developers and who is responsible for each part?
In a "standard" scenario, following people/roles are involved:
Infrastructure Dev: Setup Kafka Instance (f.e. openshift/strimzi)
manage topics, users
Frontend Dev: Creating the frontend (f.e. react)
Backend Dev: Implementing backendsystem (f.e. asp .net core)
handle DB Connections, logging, monitoring, IAM, business logic, handling Events, Produce kafka Events, ...)
App Dev anyone writing or managing the "other apps" (f.e.spark application). Consumes (commit) the kafka Events
Since there are plenty of implementations of the producer/consumer Kafka API, it's fairly language agnostic (see the available client libraries). But you are right that the dev implementing the Kafka-related features should at least be familiar with pub/sub.
Be aware we are talking about roles, so there are not necessarily four people involved; it could also just be one person doing the whole job. Also, this is just a generic real-world scenario and can be completely different in your specific use case/environment.
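To make the "produce Kafka events" responsibility above concrete, here is a minimal, hypothetical sketch of the kind of code that would typically sit with the backend dev: a small publisher the web layer calls whenever a click event is generated (broker address, topic name and payload format are made up for illustration):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClickEventPublisher {

    private final Producer<String, String> producer;

    public ClickEventPublisher() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Called by the web layer whenever the user clicks something
    public void publishClick(String userId, String clickedElement) {
        String payload = "{\"user\":\"" + userId + "\",\"element\":\"" + clickedElement + "\"}";
        producer.send(new ProducerRecord<>("user-click-events", userId, payload)); // topic is a placeholder
    }

    public void close() {
        producer.close();
    }
}

The Spark application on the consuming side would then be owned by the app dev role described above.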

Artemis Bridges/Federation

I'm looking to understand the differences between the various options for moving messages, i.e. diverts, bridges & federation. As I understand it, diverts work within the same broker and can be mixed with bridges. A bridge, on the other hand, can be used to move messages to a different broker instance (a JMS-compliant one).
Federation, from what I've read, looks similar to bridging, in that messages can be moved/pulled from an upstream broker. Quick guidance on when to use which feature would be helpful.
Thanks a lot for your help!
Bridges are the most basic way to move messages from one broker to another. However, each bridge can only move messages from one queue to one address, and each bridge must be created manually in broker.xml or programmatically via the management interface. Many messaging use-cases involve dynamically created addresses and queues, so manually creating bridges is not feasible. Furthermore, many messaging use-cases involve lots of addresses and queues, and manually creating the corresponding bridges would be undesirable.
Federation uses bridges behind the scenes, but it allows configuring one element in broker.xml to apply to lots of addresses and queues (even those created dynamically). Federation also allows upstream & downstream configurations whereas bridges can only be configured to "push" messages from one broker to another.
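As an illustration of why the per-queue configuration becomes a burden, a single core bridge in broker.xml looks roughly like this (names are placeholders and the exact schema depends on your Artemis version; see the "Core Bridges" chapter of the Artemis documentation):

<!-- Sketch only: moves messages from one local queue to one address on a remote broker -->
<connectors>
   <connector name="remote-broker">tcp://remote-host:61616</connector>
</connectors>

<bridges>
   <bridge name="orders-bridge">
      <queue-name>orders</queue-name>                  <!-- local source queue -->
      <forwarding-address>orders</forwarding-address>  <!-- address on the remote broker -->
      <static-connectors>
         <connector-ref>remote-broker</connector-ref>
      </static-connectors>
   </bridge>
</bridges>

One such element is needed per queue, whereas a single federation policy can match many addresses/queues, including ones created dynamically.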

architecture pattern for microservices

I have a microservices architecture whose logs have to be sent to a remote Kafka topic.
Next to it, the consumer of this topic will send the logs to an ELK stack (run by another team).
I want to have a dedicated microservice (fwk-proxy-elasticsearch) whose responsibility is to collect the logs from the other ones and send them to the remote Kafka topic.
What's the best protocol to dispatch all the logs aggregated from my microservices to the fwk-proxy-elasticsearch microservice?
I want this pattern to avoid duplicating the security configuration of the remote Kafka topic; I want to centralize it in a single place.
May I use the Vert.x event bus for that? Or is Kafka better? Or some other tool?
May I use Vert.x to send messages from JVM to JVM?
Moreover, in a microservice architecture, is it a good pattern to centralize a use case in a dedicated microservice (a remote HTTP connection, for example)?
From my point of view, it allows business microservices to focus on business issues and not worry about the protocol over which the result has to be sent.
Thanks!
I believe you can use both the Vert.x event bus and Kafka to propagate the logs; there are pros and cons to each approach.
While I understand the reasoning behind this decision, I would still consider a dedicated solution built for this purpose, like Fluentd, which is able to aggregate the logs and push them into multiple sources (including Kafka, via the dedicated plugin). I'm sure there are other similar solutions.
There are a couple of important benefits that I see if you use a dedicated solution, instead of building it yourself:
The level of configurability, which is definitely useful in the future (with a home-grown solution, you would need to write code each time you want to add something new)
The number of destinations where you can export the logs
Support for a hybrid architecture - with a few config updates, you will be able to grab logs from non-JVM microservices
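For instance, a Fluentd setup that receives logs from the microservices and forwards them to the remote Kafka topic could look roughly like this (directive names follow the fluent-plugin-kafka documentation for the kafka2 output; broker address and topic are placeholders, so treat this as a sketch and check your plugin version):

# Receive logs forwarded by the microservices (e.g. via a Fluentd/Fluent Bit forward output)
<source>
  @type forward
  port 24224
</source>

# Push everything to the remote Kafka topic; this is the single place
# holding the Kafka security configuration
<match app.**>
  @type kafka2
  brokers remote-kafka:9092
  default_topic microservice-logs
  <format>
    @type json
  </format>
  <buffer topic>
    flush_interval 3s
  </buffer>
</match>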

Is Kafka suitable for running a public API?

I have an event stream that I want to publish. It's partitioned into topics, continually updates, will need to scale horizontally (and not having a SPOF is nice), and may require replaying old events in certain circumstances. All the features that seem to match Kafka's capabilities.
I want to publish this to the world through a public API that anyone can connect to and get events. Is Kafka a suitable technology for exposing as a public API?
I've read the Documentation page, but not gone any deeper yet. ACLs seem to be sensible.
My concerns
Consumers will be anywhere in the world. I can't see that being a problem given Kafka's architecture. The rate of messages probably won't be more than 10 per second.
Is integration with zookeeper an issue?
Are there any arguments against letting subscriber clients connect that I don't control?
Are there any arguments against letting subscriber clients connect that I don't control?
One of the issues that I would consider is possible group.id collisions.
Let's say that you have one single topic to be used by the world for consuming your messages.
Now if one of your clients has a multi-node system and wants to avoid reading the same message twice, they would set the same group.id on both nodes, forming a consumer group.
But what if someone else in the world uses the same group.id? They would affect the first client, causing it to lose messages. There seems to be no security at that level.
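To illustrate how easy such a collision is: the group is identified purely by the group.id string each consumer puts in its configuration, so any two consumers using the same value join the same group and split the partitions between them, no matter who runs them. A minimal sketch (broker address, group id and topic are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PublicFeedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "public-kafka.example.com:9092"); // placeholder
        // Nothing stops a completely unrelated party from picking exactly this string:
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("public-events")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}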