confluent kafka s3 connector worker failed in connecting to kafka authenticated by krb5 - kerberos

I am working on leveraging confluent kafka s3 connector to download kafka record and save them as parquet file in minio. It worked fine with a dummy kafka without authentication.
Now I'm trying to verify the flow with a real kafka instance that requires kerberos authentication.
I assured my kerberos credentials are in place and set up properly.
env | grep OPTS
KAFKA_OPTS=-Djava.security.krb5.conf=/plugins/krb5.conf -Djava.security.auth.login.config=/plugins/kafka_client_jaas.conf
in my connector.properties file I specified the following
#connecting to kafka
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
Now I start the connector-standalone.sh
root#2c553a4e0b7c:/opt/bitnami/kafka/bin# ./connect-standalone.sh /plugins/connector.properties /plugins/s3-sink.properties
[2022-09-21 10:37:43,559] WARN [Consumer clientId=connector-consumer-s3-sink-0, groupId=connect-s3-sink] Bootstrap broker broker:9030 (id: -1 rack: n
ull) disconnected (org.apache.kafka.clients.NetworkClient:1024)
It seems that the krb5 authentication is not enabled.
I've tried kafka-console-consumer.sh with the same krb configs and credentials and it all worked ok.
It's likely that the three lines of configuration in connector.properties did not take effective. Are the correct configurations to to notify worker to use krb5?
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka

Kafka-connect, Bootstrap broker disconnected
Answers to this question apply to here as well.
There's no document or whatsoever that talks about CONNECT_CONSUMER env vars must be set and repeat the jvm parameters.

Related

Kafka-Zookeeper Authentication without SASL

I am trying to enable SASL_PLAINTEXT authentication between Kafka broker and client, while not requiring it between Kafka and Zookeeper. Currently I am using confluent offering of Kafka, with CDH Zookeeper. Is there a way to pass in a flag to disable SASL for Kafka <-> Zookeeper?
You need to set the environment variable ZOOKEEPER_SASL_ENABLED=false. This is configured for Kafka broker

Trouble with Apache Kafka to Allow External Connections

I'm just having a difficult time with Kafka right now, but I feel like I'm close.
I have two VMs on FreeNAS running locally. Both Running Ubuntu 18.04 LTS.
VM Graylog: 192.168.1.25. Running Graylog Server. Working well retrieving rsyslogs and apache from itself.
VM Kafka: 192.168.1.16. Running Kafka.
My goal is to have VM Graylog pull logs from VM Kafka, via a Graylog Kafka UDP input. The secondary goal is to replicate this, except tha the Kafka instance will sit on my VPS server feeding apache logs from a website. Of course, I want to test this in a dev environment first.
I am able to have my VM Kafka server successfully listen through this line of code:
/opt/kafka_2.13-2.6.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic rsyslog_kafka --from-beginning
This is my 60-kafka.conf file:
module(load="omkafka")
template(name="json"
type="list"
option.json="on") {
constant(value="{")
constant(value="\"#timestamp\":\"") property(name="timereported" dateFormat="rfc33$
constant(value="\",\"#version\":\"1")
constant(value="\",\"message\":\"") property(name="msg")
constant(value="\",\"host\":\"") property(name="hostname")
constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
constant(value="\",\"programname\":\"") property(name="programname")
constant(value="\",\"procid\":\"") property(name="procid")
constant(value="\"}\n")
}
action(
broker=["192.168.1.16:9092"]
type="omkafka"
topic="rsyslog_kafka"
template="json"
)
I'm using the default server.properties file which doesn't contain any listeners, just the defaults. I do understand I need to set the listeners and advertised.listeners.
I've attempted the following settings to no avail:
Attempt 1:
listeners = PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://192.168.1.16:9092
Attempt 2:
listeners = PLAINTEXT://127.0.0.1:9092
advertised.listeners=PLAINTEXT://192.168.1.16:9092
This after reloading both Kafka and Rsyslog and confirming their statuses are active.
Example errors when attempting to read messages.
Bunch of these
[2020-08-20 00:52:42,248] WARN [Consumer clientId=consumer-console-consumer-70205-1, groupId=console-consumer-70205] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Followed by an infinite amount of these:
[2020-08-20 00:48:50,598] WARN [Consumer clientId=consumer-console-consumer-11975-1, groupId=console-consumer-11975] Error while fetching metadata with correlation id 254 : {rsyslog_kafka=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I feel like I'm close. Perhaps there is something I'm just understanding. I've read lots of similar articles where they say just replace the IP addresses with your server. I feel like I've done that, with no success.
You need to set listeners to PLAINTEXT://0.0.0.0:9092 in order to bind externally.
The advertised listener ought to be set to an address that your consumers will be able to use to discover the cluster
Note: Docker Compose might be easier than VMs

Kafka zookeeper authentication not working

I am trying to enable SASL username and password for a Kafka cluster with no ssl. I followed the steps on this Stackoverflow:
Kafka SASL zookeeper authentication
SASL authentication seems to be working for Kafka brokers. consumers and producers have to authenticate before writing to or reading from a topic. So far so good.
The problem is with creating and deleting topics on kafka. when I try to use the following command for example:
~/kafka/bin/kafka-topics.sh --list --zookeeper 10.x.y.z:2181
I am able to list all topics in the kafka cluster and create or delete any topic with no authentication at all.
I tried to follow the steps here:
Super User Authentication and Authorization
but nothing seem to work.
Any help in this matter is really appreciated.
Thanks & Regards,
Firas Khasawneh
You need to add zookeeper.set.acl=true to your Kafka server.properties so that Kafka will create everything in zookeeper with ACL set. For the topics which are already there, there will be no ACL and everyone can remove them directly from zookeeper.
Actually because of that mess, I had to delete everything from my zookeeper and Kafka and start from scratch.
But once everything is set, you can open zookeeper shell to verify that the ACL is indeed set:
KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/your/jaas.conf" bin/zookeeper-shell.sh XXXXX:2181
From the shell you can run: getAcl /brokers/topics and check that not anyone from world have cdrwa
On a side note, the link you provided doesn't seem to reflect how the current version of Kafka stores information in zookeeper. I briefly looked at the codes and for those kafka-topics.sh commands, the topics information is from /brokers/topics instead of /config/topics

Not Kerberized Kafka broker connection to Kerberized Zookeeper

I couldn't find any info about this issue, so I'd be glad if someone could help me on this.
I have a Kerberized cluster with services such as Hbase, MapReduce, HDFS, Zookeeper,... all kerberized and working.
Let's imagine I want to add some kafka brokers to the cluster, but I do not want to Kerberize Kafka, since a shot in the testicles makes me feel better than the idea of a kerberized Kafka.
I don't know if I'm missing something, some parameter... probably I am.. but can the zookeeper be told that also has to accept PLAINTEXT petitions for some nodes, or for some specific directories, such as kafka in the example:
zookeeper:2181/kafka
Resuming, the question is:
Is there any option to include a non kerberized Kafka Broker and make it work against the already kerberized Zookeeper in the cluster?
If you need configuration like:
[zookeeper] <----- SASL ----> [kafka] <----- non-authenticated request ---> [clients]
then yes, it's possible. You need just to
Create principal (with keytabs) for brokers that will be used to communicate with Zookeeper.
Configure Zookeeper ACLs, setting cdrwa access to the node zookeeper:2181/kafka to that user
Copy the keytab to brokers and configure Kafka jaas file like this:
ZookeeperClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/path/to/keytab"
principal="user#REALM";
};
Then, set zookeeper.set.acl=true in Kafka configuration, but do not set any authorizer.class.name (this would enable authentication for Kafka consumers and producers)

Why do we need to mention Zookeeper details even though Apache Kafka configuration file already has it?

I am using Apache Kafka in (Plain Vanilla) Hadoop Cluster for the past few months and out of curiosity I am asking this question. Just to gain additional knowledge about it.
Kafka server.properties file already has the below parameter :
zookeeper.connect=localhost:2181
And I am starting Kafka Server/Broker with the following command :
bin/kafka-server-start.sh config/server.properties
So I assume that Kafka automatically infers the Zookeeper details by the time we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the zookeeper properties while we create Kafka topics the syntax for which is given below for your reference :
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test
As per the Kafka documentation we need to start zookeeper before starting Kafka server. So I don't think Kafka can be started by commenting out the zookeeper details in Kafka's server.properties file
But atleast can we use Kafka to create topics and to start Kafka Producer/Consumer without explicitly mentioning about zookeeper in their respective commands ?
The zookeeper.connect parameter in the Kafka properties file is needed for having each Kafka broker in the cluster connecting to the Zookeeper ensemble.
Zookeeper will keep information about connected brokers and handling the controller election. Other than that, it keeps information about topics, quotas and ACL for example.
When you use the kafka-topics.sh tool, the topic creation happens at Zookeeper level first and then thanks to it, information are propagated to Kafka brokers and topic partitions are created and assigned to them (thanks to the elected controller). This connection to Zookeeper will not be needed in the future thanks to the new Admin Client API which provides some admin operations executed against Kafka brokers directly. For example, there is a opened JIRA (https://issues.apache.org/jira/browse/KAFKA-5561) and I'm working on it for having the tool using such API for topic admin operations.
Regarding producer and consumer ... the producer doesn't need to connect to Zookeeper while only the "old" consumer (before 0.9.0 version) needs Zookeeper connection because it saves topic offsets there; from 0.9.0 version, the "new" consumer saves topic offsets in real topics (__consumer_offsets). For using it you have to use the bootstrap-server option on the command line insteand of the zookeeper one.