As the title suggests, I am looking for a way to prevent Kafka consumers from publishing data to my Kafka topic, similar to how we can have a read-only user in a database.
My use case requires me to get data from some vendors, enrich it, and publish it to our Kafka topic. The data from this topic will be read by a few consumers. As I understand it, I have to give the consumers the same username and password that the producer uses. So is there some way to stop the consumers from publishing data to the Kafka topic?
E.g. restricting based on the username / IP address.
By default (with no authorizer configured), Kafka does not restrict who can produce or consume, so there is no such feature out of the box.
Apache Ranger and Open Policy Agent are examples of systems that let you implement much richer ACL policies than the ZooKeeper-backed authorizer built in to Kafka. In my experience, Ranger has the best support for IP whitelisting.
The built-in ACLs are fairly basic, but they can at least distinguish operations per principal and host, e.g. allowing Read but not Write on a topic, which is what you need here.
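If your brokers do have the built-in authorizer enabled, the read-only case can be expressed directly as ACLs. Below is a minimal sketch using the Java AdminClient; it assumes an admin connection is available and uses placeholder names (a topic called enriched-data and a consumer principal User:consumer-user), so treat it as an illustration rather than a drop-in solution:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.List;
import java.util.Properties;

public class ReadOnlyConsumerAcl {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093"); // placeholder, plus whatever SASL/SSL settings your cluster needs

        try (AdminClient admin = AdminClient.create(props)) {
            ResourcePattern topic =
                new ResourcePattern(ResourceType.TOPIC, "enriched-data", PatternType.LITERAL);

            // Allow the consumer principal to Read and Describe the topic. Since no Write ACL is
            // granted, produce requests from this principal are denied (assuming
            // allow.everyone.if.no.acl.found is left at its default of false).
            AclBinding read = new AclBinding(topic,
                new AccessControlEntry("User:consumer-user", "*", AclOperation.READ, AclPermissionType.ALLOW));
            AclBinding describe = new AclBinding(topic,
                new AccessControlEntry("User:consumer-user", "*", AclOperation.DESCRIBE, AclPermissionType.ALLOW));

            // Note: the consumer also needs Read on its consumer group (ResourceType.GROUP) to commit offsets.
            admin.createAcls(List.of(read, describe)).all().get();
        }
    }
}
```

The same bindings can also be created from the command line with kafka-acls.sh.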
I am trying to plot an overall topology for my Kafka cluster (i.e., producers-->topics-->consumers).
For the mapping from topics to consumers, I'm able to obtain it using the kafka-consumer-groups.sh script.
However, for the mapping from producers to topics, I understand there is no equivalent script in vanilla Kafka.
Question:
Does the Schema Registry allow us to associate metadata with producers and/or topics or otherwise create a mapping of all producers producing to a particular topic?
Schema Registry has no such functionality.
The closest I've seen to something like this is distributed tracing (the Brave library) or Cloudera's SMM tool, which requires authorized Kafka clients so it can trace requests and map producer client.id values to topics, and consumer instances to groups.
There's also the Stream Registry project.
I helped with its initial version, with the vision of managing client state/discovery, but I think it has since taken a different direction and the documentation is not maintained.
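For the half of the topology that is discoverable, the mapping that kafka-consumer-groups.sh prints can also be pulled programmatically, which helps if you are plotting the topology from code. A sketch with the Java AdminClient (the broker address is a placeholder):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import org.apache.kafka.common.TopicPartition;

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class TopicToConsumerMapping {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // All consumer groups known to the cluster
            Collection<ConsumerGroupListing> groups = admin.listConsumerGroups().all().get();
            List<String> groupIds = groups.stream()
                .map(ConsumerGroupListing::groupId)
                .collect(Collectors.toList());

            // For each group, list the topics its members are currently assigned
            Map<String, ConsumerGroupDescription> descriptions =
                admin.describeConsumerGroups(groupIds).all().get();

            descriptions.forEach((groupId, description) -> {
                List<String> topics = description.members().stream()
                    .flatMap(member -> member.assignment().topicPartitions().stream())
                    .map(TopicPartition::topic)
                    .distinct()
                    .collect(Collectors.toList());
                System.out.println(groupId + " -> " + topics);
            });
        }
    }
}
```

There is still no producer-side equivalent, as noted above.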
We are doing a stateful operation. Our cluster is managed: every time internal topics need to be created, we have to ask the admin team to unlock topic creation so that the Kafka Streams app can create them. We have control over the target cluster, not the source cluster.
So I wanted to understand: on which cluster, source or target, are the internal topics created?
AFAIK, there is only one cluster that the Kafka Streams app connects to, and all topics (source/target/internal) are created there.
So far, Kafka Streams applications support connecting to only one cluster, defined by BOOTSTRAP_SERVERS_CONFIG in the Streams configuration.
As answered above, all source topics reside on those brokers, and all internal topics (changelog/repartition topics) are created in the same cluster. The Kafka Streams app will create the target topic in the same cluster as well.
It is worth looking into the broker logs to understand and analyze the actual root cause.
As the other answers suggest, there is only one cluster that the Kafka Streams application connects to. Internal topics are created by the Kafka Streams application and are only used by the application that created them. However, there could be security-related configuration on the broker side that prevents the streaming application from creating these topics:
If security is enabled on the Kafka brokers, you must grant the underlying clients admin permissions so that they can create internal topics. For more information, see Streams Security.
Quoted from here
Another point to keep in mind is that internal topics are created automatically by the Streams application; there is no explicit configuration for auto-creation of internal topics.
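To make the single-cluster point concrete, here is a minimal Streams sketch (the broker address, application id and topic names are placeholders). Whatever you put in BOOTSTRAP_SERVERS_CONFIG is the cluster the source topic is read from, the cluster the target topic is written to, and the cluster where the changelog/repartition internal topics are created:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class SingleClusterStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The ONLY cluster this app talks to: source, target and internal topics all live here.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "target-broker1:9092"); // placeholder
        // Internal topics are prefixed with this id, e.g. my-enrichment-app-...-changelog
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-enrichment-app");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("source-topic");   // read from this cluster
        source.groupByKey()
              .count()                                                     // stateful: backed by a changelog internal topic
              .toStream()
              .to("target-topic", Produced.with(Serdes.String(), Serdes.Long())); // written to the same cluster

        new KafkaStreams(builder.build(), props).start();
    }
}
```

If the brokers are locked down, the principal this app runs as needs permission to create those <application.id>-prefixed topics, which is why you keep having to ask the admin team to unlock it.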
NiFi and Kafka are now both available in Cloudera Data Platform (CDP) Public Cloud. NiFi is great at talking to everything, and Kafka is a mainstream message bus, so I just wondered:
What are the minimal steps needed to produce/consume data to/from Kafka from Apache NiFi within CDP Public Cloud?
Ideally I am looking for steps that work in any cloud, for instance Amazon AWS and Microsoft Azure.
I am satisfied with answers that follow best practices and work with the default configuration of the platform, but common alternatives are welcome as well.
There will be multiple form factors available in the future; for now I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with Kafka. (The answer still works if both are on the same Data Hub.)
Prerequisites
Data Hub(s) with NiFi and Kafka
Permission to access these (e.g. add processor, create Kafka topic)
Know your Workload User Name (CDP Management Console > click your name (bottom left) > click Profile)
You should have set your Workload Password in the same location
These steps allow you to Produce data from NiFi to Kafka in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kafka Data Hub Cluster:
Gather the FQDNs of the brokers and the ports used.
If you have Streams Messaging Manager: go to the Brokers tab to see the FQDN and port already together.
If you cannot use Streams Messaging Manager: go to the Hardware tab of your Data Hub with Kafka and get the FQDNs of the relevant nodes (currently these are called "broker"), then add :portnumber behind each one. The default port is 9093.
Combine them in this format: FQDN:port,FQDN:port,FQDN:port. It should now look something like this:
broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
In NiFi GUI:
Make sure you have some data in NiFi to produce, for example by using the GenerateFlowFile processor
Select the relevant processor for writing to Kafka, for example PublishKafka_2_0, and configure it as follows:
Settings
Automatically terminate relationships: tick both success and failure
Properties
Kafka Brokers: The combined list we created earlier
Security Protocol: SASL_SSL
SASL Mechanism: PLAIN
SSL Context Service: Default NiFi SSL Context Service
Username: your Workload User Name (see prerequisites above)
Password: your Workload Password
Topic Name: dennis
Use Transactions: false
Max Metadata Wait Time: 30 sec
Connect your GenerateFlowFile processor to your PublishKafka_2_0 processor and start the flow
These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation. Note that it is best practice to create topics explicitly (this example relies on Kafka automatically creating topics when they are first produced to).
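If you would rather create the topic explicitly than rely on auto-creation, here is a small sketch using the Kafka AdminClient with the same broker list and SASL settings as the NiFi processor (the workload credentials are placeholders, the partition/replication counts are just examples, and depending on your environment you may also need to point the client at a truststore for the SSL part):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopicExplicitly {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.abc:9093,broker2.abc:9093,broker3.abc:9093");
        // Same security settings as the PublishKafka_2_0 processor above
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<workload-user>\" password=\"<workload-password>\";");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name from the example above; partitions/replication are illustrative values
            admin.createTopics(List.of(new NewTopic("dennis", 3, (short) 3))).all().get();
        }
    }
}
```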
These steps allow you to Consume data with NiFi from Kafka in CDP Public Cloud
A good check to see if data was written to Kafka, is consuming it again.
In NiFi GUI:
Create a Kafka consumption processor, for instance ConsumeKafka_2_0, and configure its Properties as follows:
Kafka Brokers, Security Protocol, SASL Mechanism, SSL Context Service, Username, Password, Topic Name: All the same as in our producer example above
Consumer Group: 1
Offset Reset: earliest
Create another processor, or a funnel to send the messages to, and start the consumption processor.
And that is it: within 30 seconds you should see the data you published to Kafka flowing into NiFi again.
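If you also want to double-check the topic from outside NiFi, a minimal Java consumer with the same connection settings looks roughly like this (the workload credentials are placeholders again, and the same truststore caveat as above may apply):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class VerifyTopic {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.abc:9093,broker2.abc:9093,broker3.abc:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<workload-user>\" password=\"<workload-password>\";");
        props.put("group.id", "verify-dennis");     // separate group, so NiFi's consumer offsets are untouched
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("dennis"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}
```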
Full disclosure: I am an employee of Cloudera, the driving force behind NiFi.
Is there a way to collect Kafka Producer configs from the Kafka cluster?
I know that these settings are stored on the client itself.
I am interested at least in client.id and a collection of topics that the producer is publishing to.
There is no such tool provided by Apache Kafka (or Confluent) to acquire this information.
I worked on a team that built a tool called the Stream Registry that did provide a centralized location for this information.
Maybe you can have a look at kafkacat (see its GitHub page).
We find it very helpful in troubleshooting Kafka issues.
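Since there is no cluster-side API for producer configs, the practical workaround is to have producers identify themselves with a descriptive client.id: brokers key request logging and per-client quotas/metrics on that value, which is usually the closest you can get to a central view of who produces what. A sketch (the broker address, client.id and topic name are placeholders):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IdentifiableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        // A stable, descriptive client.id makes this producer recognizable in broker-side
        // request logs and client metrics, even though the rest of its config stays client-side.
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "billing-service-producer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("some-topic", "key", "value"));
        }
    }
}
```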
I have a scenario where we forward our application logs to a Kafka topic using fluentd agents.
The Kafka team has introduced Kerberos authentication, and since our fluentd version does not support this authentication, I cannot forward the logs directly.
We have now introduced a new Kafka server without authentication and created a topic there. I want to forward messages from this topic on the new server to another topic on another server using Kafka connectors.
I want to know how I can achieve this.
There are several different tools that enable you to stream messages from a Kafka topic on one cluster to a different cluster (a sketch of the underlying pattern follows the list), including:
MirrorMaker (open source, part of Apache Kafka)
Confluent's Replicator (commercial tool, 30 day free trial)
uReplicator (open sourced from Uber)
Mirus (open sourced from Salesforce)
Brucke (open source)
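For what it's worth, all of these tools boil down to consuming from one cluster and producing to another. Purely to illustrate that pattern (not as a replacement for MirrorMaker and friends, which handle offset tracking, failover and scaling for you), here is a minimal hand-rolled bridge sketch; the cluster addresses, topic names and Kerberos settings are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class TopicBridge {
    public static void main(String[] args) {
        // Source: the new, unauthenticated cluster
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "new-kafka:9092");      // placeholder
        consumerProps.put("group.id", "log-bridge");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
        consumerProps.put("value.deserializer", ByteArrayDeserializer.class.getName());

        // Target: the Kerberized cluster (Kerberos credentials come from a JAAS config/keytab, omitted here)
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "secure-kafka:9093");   // placeholder
        producerProps.put("security.protocol", "SASL_PLAINTEXT");      // or SASL_SSL, depending on the cluster
        producerProps.put("sasl.mechanism", "GSSAPI");
        producerProps.put("sasl.kerberos.service.name", "kafka");
        producerProps.put("key.serializer", ByteArraySerializer.class.getName());
        producerProps.put("value.serializer", ByteArraySerializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("app-logs"));                   // placeholder topic on the source cluster
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Re-publish each record to the target cluster as-is
                    producer.send(new ProducerRecord<>("app-logs", record.key(), record.value()));
                }
            }
        }
    }
}
```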
Disclaimer: I work for Confluent.