Apache NiFi to/from Confluent Cloud

I'm trying to publish custom DB data (derived from Microsoft SQL CDC tables joined with other tables; how it's derived is a topic for another day) to a Kafka cluster.
I'm able to publish and consume messages between Apache NiFi and Apache Kafka.
But I'm unable to publish messages from Apache NiFi to Kafka in Confluent Cloud.
Is it possible to publish/consume messages from Apache NiFi (server-A) to Confluent Cloud using the API Key that's created there?
If yes, what are the corresponding properties in Apache NiFi's PublishKafkaRecord and ConsumeKafkaRecord processors?
If no, please share any other idea to overcome the constraint.

Yes, NiFi uses the plain Kafka Clients Java API; it can work with any Kafka environment.
Confluent Cloud gives you all the client properties you will need, such as SASL configs for username + password.
Using PublishKafka_2_6 as an example,
Obviously, "Kafka Brokers" is the Bootstrap Brokers, then you have "Username" and "Password" settings for the SASL connection.
Set "Security Protocol" to SASL_SSL and "SASL Mechanism" to PLAIN.
"Delivery Guarantee" will set producer acks.
For any extra client properties, use the + button above the property list to add Dynamic Properties (see the NiFi documentation).
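For reference, those NiFi properties map onto the same settings you would pass to a plain Java producer. A minimal sketch, assuming placeholder bootstrap servers, API key, and secret taken from your Confluent Cloud cluster settings:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ConfluentCloudProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "Kafka Brokers" in NiFi == bootstrap.servers (placeholder endpoint below)
        props.put("bootstrap.servers", "pkc-xxxxx.region.provider.confluent.cloud:9092");
        // "Security Protocol" and "SASL Mechanism" in NiFi
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // "Username" = API key, "Password" = API secret
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<API_KEY>\" password=\"<API_SECRET>\";");
        // "Delivery Guarantee" in NiFi maps to producer acks
        props.put("acks", "all");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}
```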
As for "please share any other idea to overcome the constraint": use Debezium (Kafka Connect) instead.

Related

Spring Cloud Data Flow Kafka Source

I am new to Spring Cloud Data Flow and need to listen for messages on a topic from an external Kafka cluster. This external Kafka topic in Confluent Cloud would be my Source, which I need to pass on to my Sink application.
I am also using Kafka as my underlying message broker, which is a separate Kafka instance deployed on Kubernetes. I'm just not sure what the best approach is to connect to this external Kafka instance. Is there an existing Kafka Source app that I can use, or do I need to create my own Source application to connect to it? Or is it just some configuration that I need to set up to get connected?
Any examples would be helpful. Thanks in advance!

Can Kafka Connect consume data from a separate kerberized Kafka instance and then route to Splunk?

My pipeline is:
Kerberized Kafka --> Logstash (hosted on a different server) --> Splunk.
Can I replace the Logstash component with Kafka Connect?
Could you point me to a resource/guide where I can use kerberized Kafka as a source for my Kafka connect (which is hosted separately)?
From the documentation, what I understood is that if Kafka Connect is hosted on the same cluster as that of Kafka, that's quite possible. But I don't have that option right now, as our Kafka cluster is multi-tenant and hence not approved for additional processes on the cluster.
Kerberos keytabs aren't usually machine- or JVM-specific, so yes, Kafka Connect can be configured much like Logstash, since both are JVM processes that use the native Kafka protocol.
You shouldn't run Connect on the brokers anyway.
If you can't add Kafka Connect to an existing Kafka cluster, you will have to spin up a separate Kafka Connect (Cluster or standalone).
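As a rough sketch of what the Kerberos side looks like, the same client settings apply whether they live in a Connect worker config or a plain Java consumer. Everything below (broker hostname, keytab path, principal, topic name) is a placeholder:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KerberizedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder
        props.put("group.id", "connect-like-consumer");
        // Kerberos over TLS; use SASL_PLAINTEXT if the cluster does not use TLS
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "GSSAPI");
        props.put("sasl.kerberos.service.name", "kafka");
        // Keytab-based login, so no interactive kinit is needed on the Connect host
        props.put("sasl.jaas.config",
            "com.sun.security.auth.module.Krb5LoginModule required "
            + "useKeyTab=true storeKey=true "
            + "keyTab=\"/etc/security/keytabs/myuser.keytab\" "   // placeholder path
            + "principal=\"myuser@EXAMPLE.COM\";");               // placeholder principal
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("logs"));
            consumer.poll(Duration.ofSeconds(5)).forEach(r -> System.out.println(r.value()));
        }
    }
}
```

In a Connect worker, these same keys typically go into the worker properties, prefixed with consumer. for a sink connector's consumer (e.g. consumer.security.protocol, consumer.sasl.jaas.config).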

Read/Write with Nifi to Kafka in Cloudera Data Platform CDP public cloud

NiFi and Kafka are now both available in Cloudera Data Platform (CDP) Public Cloud. NiFi is great at talking to everything and Kafka is a mainstream message bus, so I just wondered:
What are the minimal steps needed to produce/consume data to/from Kafka from Apache NiFi within CDP Public Cloud?
I would ideally look for steps that work in any cloud, for instance Amazon AWS and Microsoft Azure.
I am satisfied with answers that follow best practices and work with the default configuration of the platform, but common alternatives are welcome as well.
There will be multiple form factors available in the future; for now I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with Kafka. (The answer still works if both are on the same Data Hub.)
Prerequisites
Data Hub(s) with NiFi and Kafka
Permission to access these (e.g. add processor, create Kafka topic)
Know your Workload User Name (CDP Management Console > click your name (bottom left) > click Profile)
You should have set your Workload Password in the same location
These steps allow you to Produce data from NiFi to Kafka in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kafka Data Hub Cluster:
Gather the FQDN links of the brokers, and the used ports.
If you have Streams Messaging Manager: Go to the brokers tab to see the FQDN and port already together
If you cannot use Streams Messaging Manager: Go to the hardware tab of your Data Hub with Kafka and get the FQDN of the relevant nodes. (Currently these are called broker). Then add :portnumber behind each one. The default port is 9093.
Combine the links together in this format: FQDN:port,FQDN:port,FQDN:port. It should now look something like this:
broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
In NiFi GUI:
Make sure you have some data in NiFi to produce, for example by using the GenerateFlowFile processor
Select the relevant processor for writing to Kafka, for example PublishKafka_2_0, and configure it as follows:
Settings
Automatically terminate relationships: tick both success and failure
Properties
Kafka Brokers: The combined list we created earlier
Security Protocol: SASL_SSL
SASL Mechanism: PLAIN
SSL Context Service: Default NiFi SSL Context Service
Username: your Workload User Name (see prerequisites above)
Password: your Workload Password
Topic Name: dennis
Use Transactions: false
Max Metadata Wait Time: 30 sec
Connect your GenerateFlowFile processor to your PublishKafka_2_0 processor and start the flow
These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation. Note that it is best practice to create topics explicitly (this example relies on Kafka automatically creating a topic when it is first produced to).
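If you do want to create the topic explicitly, a minimal sketch using the Kafka AdminClient is below. It reuses the placeholder broker list, workload credentials, and topic name from the steps above; depending on the environment you may also need ssl.truststore.* settings, which NiFi covers via its SSL Context Service:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.abc:9093,broker2.abc:9093,broker3.abc:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<workload-user>\" password=\"<workload-password>\";");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3 -- adjust to your cluster size
            NewTopic topic = new NewTopic("dennis", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```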
These steps allow you to Consume data with NiFi from Kafka in CDP Public Cloud
A good check to see if data was written to Kafka, is consuming it again.
In NiFi GUI:
Create a Kafka consumption processor, for instance ConsumeKafka_2_0, and configure its properties as follows:
Kafka Brokers, Security Protocol, SASL Mechanism, SSL Context Service, Username, Password, Topic Name: All the same as in our producer example above
Consumer Group: 1
Offset Reset: earliest
Create another processor, or a funnel to send the messages to, and start the consumption processor.
And that is it: within 30 seconds you should see that the data you published to Kafka is now flowing into NiFi again.
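For reference, a minimal Java sketch of the consumer-side equivalents of those NiFi properties (same placeholder brokers and workload credentials as above; "Consumer Group" maps to group.id and "Offset Reset" to auto.offset.reset):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CdpConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.abc:9093,broker2.abc:9093,broker3.abc:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<workload-user>\" password=\"<workload-password>\";");
        props.put("group.id", "1");                 // "Consumer Group" in ConsumeKafka_2_0
        props.put("auto.offset.reset", "earliest"); // "Offset Reset" in ConsumeKafka_2_0
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("dennis"));
            consumer.poll(Duration.ofSeconds(30)).forEach(r -> System.out.println(r.value()));
        }
    }
}
```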
Full disclosure: I am an employee of Cloudera, the driving force behind NiFi.

Generating timestamp-based document IDs in Kafka Connect

I am sending data from Kafka to Couchbase using the Kafka Connect Couchbase sink connector (https://github.com/apache/kafka & https://github.com/couchbase/kafka-connect-couchbase).
I am using Couchbase v5.1.0 and Kafka 2.12.
I have not enabled any kind of document ID generation in Kafka Connect (in the file quickstart-couchbase-sink.properties), so the connector is using the whole document as the key. I want to generate the key as topic-partition-offset-randomString-timestamp.
How can this be achieved? I found something here - https://docs.confluent.io/current/connect/kafka-connect-elasticsearch/configuration_options.html - but I don't see a key.ignore option anywhere in the Kafka or kafka-connect-couchbase code.
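One hedged option is a custom Single Message Transform (SMT) that rewrites the record key from the record coordinates before the sink connector sees it. Only the Kafka Connect Transformation API used here is standard; the class name and key format are made up for illustration, and whether the Couchbase sink then uses that key as the document ID depends on the connector's own document ID settings:

```java
import java.util.Map;
import java.util.UUID;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.transforms.Transformation;

/** Hypothetical SMT: replaces the record key with topic-partition-offset-random-timestamp. */
public class CoordinateKey<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        // The offset is only known on the sink side (SinkRecord); fall back to -1 otherwise.
        long offset = (record instanceof SinkRecord) ? ((SinkRecord) record).kafkaOffset() : -1L;
        String key = String.format("%s-%d-%d-%s-%d",
                record.topic(),
                record.kafkaPartition(),
                offset,
                UUID.randomUUID().toString().substring(0, 8),
                record.timestamp() != null ? record.timestamp() : System.currentTimeMillis());
        // Keep the value as-is, swap in the new string key.
        return record.newRecord(record.topic(), record.kafkaPartition(),
                Schema.STRING_SCHEMA, key,
                record.valueSchema(), record.value(),
                record.timestamp());
    }

    @Override
    public ConfigDef config() { return new ConfigDef(); }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
```

You would package this as a JAR on the Connect plugin path and reference it in the connector config via transforms=coordinateKey and transforms.coordinateKey.type=<your.package>.CoordinateKey.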

Is there any way to forward Kafka messages from topic on one server to topic on another server?

I have a scenario where we are forwarding our application logs to a Kafka topic using Fluentd agents.
Since the Kafka team introduced Kerberos authentication and our Fluentd version does not support it, I can no longer forward the logs directly.
We have now introduced a new Kafka server without authentication and created a topic there. I want to forward messages from this topic on the new server to a topic on another server using Kafka connectors.
How can I achieve this?
There are several tools that let you stream messages from a Kafka topic on one cluster to a topic on a different cluster, including:
MirrorMaker (open source, part of Apache Kafka)
Confluent's Replicator (commercial tool, 30-day free trial)
uReplicator (open-sourced by Uber)
Mirus (open-sourced by Salesforce)
Brucke (open source)
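Conceptually, all of these do the same thing: consume from one cluster and produce to the other. Purely to illustrate the idea (not a replacement for the tools above), here is a minimal hand-rolled sketch; cluster addresses and the topic name are placeholders, and you would add SASL/Kerberos settings to whichever side requires them:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TopicForwarderSketch {
    public static void main(String[] args) {
        Properties src = new Properties();
        src.put("bootstrap.servers", "source-cluster:9092");   // placeholder
        src.put("group.id", "forwarder");
        src.put("auto.offset.reset", "earliest");
        src.put("key.deserializer", StringDeserializer.class.getName());
        src.put("value.deserializer", StringDeserializer.class.getName());

        Properties dst = new Properties();
        dst.put("bootstrap.servers", "target-cluster:9092");   // placeholder
        dst.put("key.serializer", StringSerializer.class.getName());
        dst.put("value.serializer", StringSerializer.class.getName());
        // Add security.protocol / sasl.* settings here if the target cluster requires them.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(src);
             KafkaProducer<String, String> producer = new KafkaProducer<>(dst)) {
            consumer.subscribe(Collections.singletonList("app-logs"));
            while (true) {
                consumer.poll(Duration.ofSeconds(1)).forEach(record ->
                        producer.send(new ProducerRecord<>("app-logs", record.key(), record.value())));
            }
        }
    }
}
```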
Disclaimer: I work for Confluent.